no-quorum-policy=ignore regardless of cluster size is dangerous and may exacerbate split brain

Bug #1354452 reported by Gareth Woolridge
This bug affects 1 person
Affects: hacluster (Juju Charms Collection)
Status: Fix Released
Importance: High
Assigned to: Liam Young

Bug Description

We recently experienced a split-brain scenario in our HA environment where every node in the HA cluster grabbed the VIP after one of the instances crashed and its hardware restarted.

We run converged infrastructure, with the core OpenStack HA services deployed as 3 instances each under LXC across 3 physical nodes. This failure scenario was observed on all HA services when one physical node suffered a hardware-related reboot.

Running crm status on these nodes showed that the cluster was not quorate and that each node reported the other 2 nodes as offline.

Bouncing corosync and pacemaker on the HA nodes restored normal operation; we then analysed the logs for likely causes, without much success.

However, crm configure show reveals "no-quorum-policy=ignore" to be set across our HA clusters; this is confirmed to be set by the charm as part of configure_cluster.

An internet search suggests this setting is required for a 2-node cluster (otherwise services would stop whenever one node went down), but that it should not be set on larger clusters, where it is not safe:

"Setting no-quorum-policy="ignore" is required in 2-node Pacemaker clusters for the following reason: if quorum enforcement is enabled, and one of the two nodes fails, then the remaining node can not establish a majority of quorum votes necessary to run services, and thus it is unable to take over any resources. The appropriate workaround is to ignore loss of quorum in the cluster. This is safe and necessary only in 2-node clusters. Do not set this property in Pacemaker clusters with more than two nodes. "

source: http://docs.openstack.org/high-availability-guide/content/_setting_basic_cluster_properties.html
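
To make the arithmetic concrete: in our 3-node cluster a majority is 2 votes, so a single failed node leaves a quorate pair behind, while a node partitioned on its own loses quorum. With no-quorum-policy=ignore that isolated node carries on running resources regardless, which is presumably how every node ended up holding the VIP.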

We have manually set no-quorum-policy=stop for now on our 3-node cluster (via crm configure property no-quorum-policy=stop). Should the charm set this value appropriately depending on cluster size?
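
For what it's worth, a minimal sketch of the size-dependent selection we have in mind, assuming the charm shells out to crmsh and can count its peers (configure_quorum_policy and node_count are hypothetical names, not the charm's actual code):

    import subprocess

    def configure_quorum_policy(node_count):
        # A 2-node cluster can never hold a majority after one failure,
        # so quorum loss has to be ignored for the survivor to keep its
        # resources. With 3 or more nodes a majority can survive a
        # single failure, so the safe behaviour on quorum loss is to
        # stop resources on the non-quorate side.
        policy = 'ignore' if node_count < 3 else 'stop'
        subprocess.check_call(
            ['crm', 'configure', 'property',
             'no-quorum-policy={}'.format(policy)])

Called with node_count=3 this sets no-quorum-policy=stop, matching our manual workaround; on a 2-node deployment it preserves the current ignore behaviour.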

Tags: openstack

James Page (james-page)
Changed in hacluster (Juju Charms Collection):
importance: Undecided → High
status: New → Triaged
Liam Young (gnuoy)
Changed in hacluster (Juju Charms Collection):
assignee: nobody → Liam Young (gnuoy)
tags: added: openstack
James Page (james-page)
Changed in hacluster (Juju Charms Collection):
status: Triaged → Fix Committed
Changed in hacluster (Juju Charms Collection):
status: Fix Committed → Fix Released