adding a 3rd hacluster unit frequently makes ha-relation-changed loop on 'crm node list'
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack HA Cluster Charm | Confirmed | High | Unassigned |
hacluster (Juju Charms Collection) | Invalid | High | Unassigned |
Bug Description
FYI, this happens when deploying HA OpenStack with the 15.01 release
charms, using hacluster in unicast mode. It's a staged deployment
where we first deploy all HA services with 2 units, relate them
(hacluster to each OpenStack service), and finally add a 3rd unit to
every HA'd service.
We repeatedly see hacluster fail to settle on the 3rd unit (/2).
Drilling down, we found that the 3rd unit is carrying an incomplete
corosync.conf with "two_node: 1" and only 2 nodes in the unicast
nodelist, while the other units are already running with the
3-node setup: http://
The charm then loops on 'crm node list', which never settles; not even
a manual kill and restart of corosync and pacemaker helps, as /2 cannot
join the 3-node cluster that the other units expect.
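
For context, that "loop" is the usual poll-until-settled pattern, sketched here purely to illustrate the failure mode (this is not the charm's actual hook code, and the node names are placeholders). If /2's corosync believes it is in a 2-node cluster while its peers expect 3 members, the condition below is never met and the hook spins forever.

```python
import subprocess
import time

def wait_for_crm_nodes(expected, interval=5):
    """Poll 'crm node list' until every expected node name appears
    (simplified check; real output also carries node ids and states)."""
    while True:
        try:
            out = subprocess.check_output(["crm", "node", "list"],
                                          universal_newlines=True)
        except subprocess.CalledProcessError:
            out = ""
        if all(name in out for name in expected):
            return out
        time.sleep(interval)

# e.g. wait_for_crm_nodes({"juju-machine-0", "juju-machine-1", "juju-machine-2"})
```
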
Manually copying corosync.conf from /0 to /2 and restarting corosync
and pacemaker works: 'crm node list' then succeeds and the unit joins
the cluster.
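
If the workaround has to be repeated across services, it can be scripted. The helper below is a hypothetical sketch: the keystone-hacluster unit names and the standard corosync path are assumptions, stand-ins for whichever subordinate is affected.

```python
#!/usr/bin/env python3
"""Copy a known-good corosync.conf from unit /0 to the stuck unit /2
and restart the cluster stack so it can rejoin."""
import subprocess

GOOD_UNIT = "keystone-hacluster/0"   # assumed unit with the complete 3-node config
STUCK_UNIT = "keystone-hacluster/2"  # assumed unit looping on 'crm node list'

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.check_call(cmd)

# Fetch the good config locally, then push it to the stuck unit.
run(["juju", "scp", GOOD_UNIT + ":/etc/corosync/corosync.conf", "corosync.conf"])
run(["juju", "scp", "corosync.conf", STUCK_UNIT + ":/tmp/corosync.conf"])

# Install it and restart corosync and pacemaker so the unit can rejoin.
run(["juju", "ssh", STUCK_UNIT,
     "sudo install -m 644 /tmp/corosync.conf /etc/corosync/corosync.conf && "
     "sudo service corosync restart && sudo service pacemaker restart"])
```
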
Changed in hacluster (Juju Charms Collection):
milestone: none → 15.04
tags: added: openstack
Changed in hacluster (Juju Charms Collection):
milestone: 15.04 → 15.07
Changed in hacluster (Juju Charms Collection):
milestone: 15.07 → 15.10
Changed in hacluster (Juju Charms Collection):
milestone: 15.10 → 16.01
Changed in hacluster (Juju Charms Collection):
milestone: 16.01 → 16.04
Changed in hacluster (Juju Charms Collection):
milestone: 16.04 → 16.07
Changed in hacluster (Juju Charms Collection):
milestone: 16.07 → 16.10
Changed in hacluster (Juju Charms Collection):
milestone: 16.10 → 17.01
Changed in charm-hacluster:
importance: Undecided → High
status: New → Confirmed
Changed in hacluster (Juju Charms Collection):
status: Confirmed → Invalid
FYI, this looks like a race: the same deployment, repeated, sometimes
fails on different hacluster subordinates (keystone, cinder, glance).