Openstack HA, rabbitmq cluster in partition state after isolation of data/control interface

Bug #1404067 reported by venu kolli
Affects              Status                    Importance  Assigned to  Milestone
Juniper Openstack    Status tracked in Trunk
  R2.0               Fix Committed             High        venu kolli
  Trunk              Fix Committed             High        venu kolli

Bug Description

Rabbitmq cluster in partition state after isolation of data/control interface.

Issue observed on R2.0 build 12 with Sanju's fixes.

After isolating the data/control interface on node 1 and bringing the interface back up, the rabbit cluster is still in a partitioned state.

root@vse2100-2:/var/log/rabbitmq# rabbitmqctl cluster_status
Cluster status of node 'rabbit@vse2100-2-ctrl' ...
[{nodes,
     [{disc,
          ['rabbit@vse2100-2-ctrl','rabbit@vse2100-3-ctrl',
           'rabbit@vse2100-4-ctrl']}]},
 {running_nodes,
     ['rabbit@vse2100-4-ctrl','rabbit@vse2100-3-ctrl',
      'rabbit@vse2100-2-ctrl']},
 {partitions,
     [{'rabbit@vse2100-2-ctrl',
          ['rabbit@vse2100-3-ctrl','rabbit@vse2100-4-ctrl']}]}]
...done.
root@vse2100-2:/var/log/rabbitmq#
root@vse2100-2:/var/log/rabbitmq# rabbitmqctl
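
The partition shown above can also be detected programmatically. The following is a minimal sketch, not part of the committed fix, that scans the text printed by `rabbitmqctl cluster_status` for a non-empty `partitions` entry. The function name `parse_partitions` and the regular expressions are illustrative assumptions; they target the Erlang-term output format shown in this report and may need adjusting for other RabbitMQ versions.

```python
import re

# Node names may be printed quoted ('rabbit@host-1') or unquoted (rabbit@host).
NODE = r"'?([A-Za-z0-9_@.\-]+)'?"
# One partition entry: {Node, [Peer, Peer, ...]}
ENTRY = re.compile(r"\{\s*" + NODE + r"\s*,\s*\[([^\]]*)\]\s*\}")


def parse_partitions(status_text):
    """Return {node: [peers it is partitioned from]} from cluster_status text.

    Hypothetical helper: an empty dict means no partition was reported.
    """
    start = status_text.find("{partitions,")
    if start == -1:
        return {}
    tail = status_text[start + len("{partitions,"):]
    partitions = {}
    for match in ENTRY.finditer(tail):
        node, peers = match.group(1), match.group(2)
        partitions[node] = re.findall(NODE, peers)
    return partitions


# Output copied from this bug report.
SAMPLE = """Cluster status of node 'rabbit@vse2100-2-ctrl' ...
[{nodes,
     [{disc,
          ['rabbit@vse2100-2-ctrl','rabbit@vse2100-3-ctrl',
           'rabbit@vse2100-4-ctrl']}]},
 {running_nodes,
     ['rabbit@vse2100-4-ctrl','rabbit@vse2100-3-ctrl',
      'rabbit@vse2100-2-ctrl']},
 {partitions,
     [{'rabbit@vse2100-2-ctrl',
          ['rabbit@vse2100-3-ctrl','rabbit@vse2100-4-ctrl']}]}]
...done.
"""

if __name__ == "__main__":
    for node, peers in parse_partitions(SAMPLE).items():
        print("%s is partitioned from: %s" % (node, ", ".join(peers)))
```

A healthy cluster prints `{partitions,[]}`, for which the helper returns an empty dict.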

Tags: ha
Sanju Abraham (asanju) wrote :

The fix addresses the issue of the rabbitmq cluster becoming partitioned on interface and link failures. In such cases the cluster does not fully recover, even with the autoheal flag set. As per the rabbitmq documentation, some of the fixes around autoheal recovery were made in 3.3.0; until then, the only way to restore the partition is to restart rabbit on the node that has the latest transaction ID.
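
As an illustration of the manual recovery described above, here is a hedged sketch that restarts the RabbitMQ application on one node using `rabbitmqctl stop_app` followed by `rabbitmqctl start_app`. It is not the code from the committed change. The helper name `restart_rabbit_app` is hypothetical, the choice of `stop_app`/`start_app` (rather than a full service restart) is an assumption, and the operator is assumed to have already decided which node to restart.

```python
import subprocess


def restart_rabbit_app(node, run=subprocess.check_call):
    """Stop and start the RabbitMQ application on `node`.

    Hypothetical helper. `node` is the Erlang node name, for example
    'rabbit@vse2100-2-ctrl'. `run` is injectable so the command list
    can be inspected without executing anything.
    """
    commands = [
        # Stops the RabbitMQ application; the Erlang VM keeps running.
        ["rabbitmqctl", "-n", node, "stop_app"],
        # Starts it again so the node rejoins its cluster peers.
        ["rabbitmqctl", "-n", node, "start_app"],
    ]
    for cmd in commands:
        run(cmd)
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    restart_rabbit_app(
        "rabbit@vse2100-2-ctrl",
        run=lambda cmd: print(" ".join(cmd)),
    )
```

Running `rabbitmqctl cluster_status` afterwards and checking that `partitions` is empty confirms whether the restart cleared the partition.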

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/5787
Committed: http://github.org/Juniper/contrail-provisioning/commit/c83c74461362897611eed47e3f38a92e17ab4dc9
Submitter: Zuul
Branch: R2.0

commit c83c74461362897611eed47e3f38a92e17ab4dc9
Author: Sanju Abraham <email address hidden>
Date: Thu Dec 18 19:38:14 2014 -0800

Close-Bug#1404067. The fixes addresses the issue of rabbitmq cluster partitioned on interface and link failures. In such cases, with autoheal flag does not fully recover. As per the documentation from rabbitmq, some of the fixes around recovery for autoheal is done in 3.3.0 and till that time the only way the partition can be restored is to restart rabbit on the node where is has the latest transaction ID.

Change-Id: Ibc417806c66eefaaeb2b907fc25ce3c0a40f5bf9

tags: added: ha
Changed in juniperopenstack:
importance: Undecided → High
assignee: nobody → Sanju Abraham (asanju)
information type: Proprietary → Public