Rabbitmq cluster should recover from partitioning

Bug #1354319 reported by Bogdan Dobrelya
This bug affects 3 people
Affects             Status         Importance  Assigned to      Milestone
Fuel for OpenStack  Fix Committed  High        Bogdan Dobrelya
5.0.x               Fix Committed  High        Bogdan Dobrelya

Bug Description

Related bug: https://bugs.launchpad.net/fuel/+bug/1348548. That report was split in two because Corosync and RabbitMQ clustering should be addressed separately.
The reproduction steps described in #1348548 are also enough to reproduce a split brain in the RabbitMQ cluster.
As a result, the RabbitMQ cluster is split into two partitions:

Cluster status of node 'rabbit@node-2' ...
[{nodes,[{disc,['rabbit@node-2','rabbit@node-3','rabbit@node-4']}]},
{running_nodes,['rabbit@node-3','rabbit@node-2']},
{partitions,[{'rabbit@node-3',['rabbit@node-4']},{'rabbit@node-2',['rabbit@node-4']}]}]...done.

Cluster status of node 'rabbit@node-4' ...
[{nodes,[{disc,['rabbit@node-2','rabbit@node-3','rabbit@node-4']}]},{running_nodes,['rabbit@node-4']},
{partitions,[{'rabbit@node-4',['rabbit@node-2','rabbit@node-3']}]}]...done.
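To spot this state without reading the Erlang terms by eye, the `partitions` entry can be extracted from the status output. The following is a minimal sketch, assuming the output has the same shape as the two listings above; the function name `parse_partitions` and the `SAMPLE` constant are hypothetical and not part of RabbitMQ or Fuel.

```python
import re


def parse_partitions(status_text):
    """Return {node: [peers it is partitioned from]} from cluster_status text.

    Assumes the Erlang-term output format shown in this report.
    An empty dict means no partitions were reported.
    """
    marker = "{partitions,"
    start = status_text.find(marker)
    if start == -1:
        return {}
    # Find the bracket that closes the partitions list by tracking nesting depth.
    open_idx = status_text.index("[", start)
    depth = 0
    for i in range(open_idx, len(status_text)):
        ch = status_text[i]
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                body = status_text[open_idx + 1:i]
                break
    else:
        raise ValueError("unbalanced brackets in partitions list")
    result = {}
    # Each entry looks like {'rabbit@node-3',['rabbit@node-4']}.
    for node, peers in re.findall(r"\{'([^']+)',\s*\[([^\]]*)\]\}", body):
        result[node] = re.findall(r"'([^']+)'", peers)
    return result


# Status as seen from rabbit@node-2, copied from this report.
SAMPLE = """Cluster status of node 'rabbit@node-2' ...
[{nodes,[{disc,['rabbit@node-2','rabbit@node-3','rabbit@node-4']}]},
{running_nodes,['rabbit@node-3','rabbit@node-2']},
{partitions,[{'rabbit@node-3',['rabbit@node-4']},{'rabbit@node-2',['rabbit@node-4']}]}]...done.
"""

partitions = parse_partitions(SAMPLE)
if partitions:
    print("cluster is partitioned:", partitions)
```

A monitoring check could run this against the output collected on each node and alert whenever the result is non-empty.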

RabbitMQ has built-in autoheal and pause_minority partition-handling policies that deal with partitions automatically, so we should use one of them instead of the default 'ignore'.
See https://www.rabbitmq.com/partitions.html for details.
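For illustration, the policy is selected with the `cluster_partition_handling` key of the `rabbit` application in the classic Erlang-term `rabbitmq.config`. The sketch below renders that stanza for one of the three modes named above. The helper `render_partition_handling` is hypothetical, and how fuel-library actually writes the setting (for example through its Puppet templates) is not shown in this report, so treat this as an assumption about the resulting config rather than the actual change.

```python
# The three modes mentioned in this bug description.
VALID_MODES = ("ignore", "pause_minority", "autoheal")


def render_partition_handling(mode):
    """Render a minimal classic-format rabbitmq.config that sets the policy."""
    if mode not in VALID_MODES:
        raise ValueError("unsupported partition handling mode: %r" % (mode,))
    # Erlang term syntax: a list of {application, [settings]} tuples, ended by a dot.
    return "[{rabbit, [{cluster_partition_handling, %s}]}]." % mode


print(render_partition_handling("autoheal"))
```

A real `rabbitmq.config` usually carries other settings in the same `rabbit` list, so the key would be merged into the existing file rather than replacing it.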

Tags: ha rabbitmq
Changed in fuel:
importance: Undecided → High
status: New → Triaged
assignee: nobody → Bogdan Dobrelya (bogdando)
milestone: none → 5.1
tags: added: ha
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/112791

Changed in fuel:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/112791
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=6051006e213225d7ebf8254f8385aa3dd9909eed
Submitter: Jenkins
Branch: master

commit 6051006e213225d7ebf8254f8385aa3dd9909eed
Author: Bogdan Dobrelya <email address hidden>
Date: Fri Aug 8 11:25:39 2014 +0300

    Make rabbitmq autoheal partitions

    Default policy is 'ignore' and it does nothing in order
    to recover from partitioning allowing many partitioned
    rabbit clusters to operate as is.
    Auto-heal policy will merge all partitions into the winner
    one once exited from partitioned state
    (e.g. connectivity restored).

    Closes-bug: #1354319

    Change-Id: I33823a3abfd42b75fa6bc73d6f3cd038a2163fd6
    Signed-off-by: Bogdan Dobrelya <email address hidden>

Changed in fuel:
status: In Progress → Fix Committed
Kevin Benton (kevinbenton) wrote :

Can this be back-ported to 5.0? This is a very subtle issue that breaks OpenStack operation in strange ways: both partitions accept data, so the problem is not apparent when examining logs.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/5.0)

Fix proposed to branch: stable/5.0
Review: https://review.openstack.org/115518

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/5.0)

Reviewed: https://review.openstack.org/115518
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=eaf994b7500a621254a220fa22d22a1985e8dad2
Submitter: Jenkins
Branch: stable/5.0

commit eaf994b7500a621254a220fa22d22a1985e8dad2
Author: Bogdan Dobrelya <email address hidden>
Date: Fri Aug 8 11:25:39 2014 +0300

    Make rabbitmq autoheal partitions

    Default policy is 'ignore' and it does nothing in order
    to recover from partitioning allowing many partitioned
    rabbit clusters to operate as is.
    Auto-heal policy will merge all partitions into the winner
    one once exited from partitioned state
    (e.g. connectivity restored).

    Closes-bug: #1354319

    Change-Id: I33823a3abfd42b75fa6bc73d6f3cd038a2163fd6
    Signed-off-by: Bogdan Dobrelya <email address hidden>

tags: added: rabbitmq