no-quorum-policy=ignore regardless of cluster size is dangerous and may exacerbate split brain
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| hacluster (Juju Charms Collection) | Fix Released | High | Liam Young | |
Bug Description
We recently experienced a split-brain scenario in our HA environment: after one instance crashed and the hardware restarted, every node in the cluster grabbed the VIP.
Our infrastructure co-locates the core OpenStack HA services, with 3 instances of each running under LXC across 3 physical nodes. This failure was observed on all HA services when one physical node suffered a hardware-related reboot.
`crm status` on these nodes showed the cluster was not quorate and that the other 2 nodes were offline.
Bouncing corosync+pacemaker on the HA nodes restored normal operation; we then analysed the logs for likely causes, without much success.
`crm configure show`, however, shows "no-quorum-policy=ignore" is set.
An internet search suggests this setting is required for a 2-node cluster (otherwise the service would fail whenever one node was down), but that it should not be set for larger clusters, where it is unsafe:
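The quorum arithmetic explains why the 2-node case is special. Pacemaker/Corosync treats a partition as quorate only when it holds a strict majority of the expected votes; the sketch below (my own illustration, not charm code) shows the threshold:

```shell
# Majority quorum threshold: a partition needs floor(n/2) + 1 votes.
quorum() { echo $(( $1 / 2 + 1 )); }

quorum 2   # threshold is 2: losing either node loses quorum, hence "ignore" for 2 nodes
quorum 3   # threshold is 2: a 3-node cluster survives one node loss with quorum intact
```

With 3 nodes the surviving pair keeps quorum on its own, so ignoring quorum loss buys nothing and only allows a partitioned minority to keep running resources.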
"Setting no-quorum-
source: http://
We have manually set no-quorum-policy on the affected clusters as a workaround.
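For reference, a sketch of how the property can be inspected and corrected with the crm shell on a cluster of 3 or more nodes (assumes crmsh is installed and the commands are run on a cluster node; this is an illustration, not the charm's fix):

```shell
# Inspect the current cluster properties, including no-quorum-policy
crm configure show

# On clusters of 3+ nodes, stop resources when quorum is lost instead of
# ignoring the loss ("stop" is Pacemaker's default no-quorum-policy)
crm configure property no-quorum-policy=stop
```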
Related branches
- Liam Young (community): Approve
- James Page: Needs Resubmitting
Diff: 637 lines (+268/-169), 4 files modified:
- README.md (+6/-2)
- config.yaml (+22/-1)
- hooks/charmhelpers/contrib/network/ip.py (+2/-0)
- hooks/hooks.py (+238/-166)
Changed in hacluster (Juju Charms Collection):
- importance: Undecided → High
- status: New → Triaged

Changed in hacluster (Juju Charms Collection):
- assignee: nobody → Liam Young (gnuoy)
- tags: added: openstack

Changed in hacluster (Juju Charms Collection):
- status: Triaged → Fix Committed

Changed in hacluster (Juju Charms Collection):
- status: Fix Committed → Fix Released