hook failed: "leader-settings-changed" (double bootstrap-pxc due to a leadership change before cluster bootstrap completion)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Percona Cluster Charm | Confirmed | Critical | Unassigned |
Bug Description
Ubuntu 16.04, HWE-edge kernel
OpenStack version: Pike
Using percona-cluster r275, we deployed 3 units. 2 units (mysql/0 and mysql/2) failed in leader-
The juju log shows the leader UUID and unit UUID don't match.
2017-11-14 19:27:19 DEBUG leader-
2017-11-14 19:27:19 DEBUG leader-
2017-11-14 19:27:19 DEBUG leader-
2017-11-14 19:27:19 DEBUG leader-
2017-11-14 19:27:19 DEBUG leader-
2017-11-14 19:27:19 DEBUG leader-
2017-11-14 19:27:19 DEBUG leader-
2017-11-14 19:27:19 DEBUG leader-
2017-11-14 19:27:19 DEBUG leader-
2017-11-14 19:27:19 DEBUG leader-
2017-11-14 19:27:19 DEBUG leader-
2017-11-14 19:27:19 DEBUG leader-
2017-11-14 19:27:19 ERROR juju.worker.
I've attached the juju crashdump log.
tags: added: cpe-onsite
tags: added: cdo-qa
Changed in charm-percona-cluster:
importance: Undecided → Critical
milestone: none → 17.11
tags: added: cdo-qa-blocker foundation-engine
I can confirm that alitvinov and I saw exactly this behavior a few weeks ago on stable/17.08:
Leader UUID ('0dde54c4-c961-11e7-9775-2f72db67cbda') != Unit UUID ('b3e1b7e8-c95f-11e7-91fa-56c8581eb616')
In our case, all leader-get invocations returned the same result (so the Juju leader bucket was consistent across units):
http://paste.ubuntu.com/25962876/
As the code comment suggests, this is case 1, and it seems that there were two concurrent bootstrap attempts:
elif lead_cluster_state_uuid != cluster_state_uuid:
    # this may mean 2 things:
    # 1) the units have diverged, which it's bad and we do stop.
    # 2) cluster_state_uuid could not be retrieved because it
    #    hasn't been bootstrapped, mysqld is stopped, etc.
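To make the two cases concrete, here is a minimal Python sketch of that comparison. It assumes the leader's UUID is read from leader settings and that the local UUID is `None` when it cannot be retrieved; the function `classify_uuid_state` and its return labels are hypothetical and are not the charm's actual code.

```python
from typing import Optional


def classify_uuid_state(lead_uuid: Optional[str], local_uuid: Optional[str]) -> str:
    """Hypothetical helper mirroring the two cases in the charm's comment."""
    if lead_uuid is None:
        # The leader has not published a cluster UUID yet: nothing to compare.
        return "leader-not-bootstrapped"
    if local_uuid is None:
        # Case 2: local UUID unavailable (not bootstrapped, mysqld stopped, ...).
        return "local-unknown"
    if lead_uuid != local_uuid:
        # Case 1: both UUIDs are known and differ, so the units have diverged.
        return "diverged"
    return "in-sync"


print(classify_uuid_state("0dde54c4-c961-11e7-9775-2f72db67cbda",
                          "b3e1b7e8-c95f-11e7-91fa-56c8581eb616"))
```

With the pair of UUIDs reported above, both values are known and differ, so the sketch classifies the state as `"diverged"` (case 1).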
The question is: how is that even possible with a single leader?
render_config_restart_on_changed -> bootstrap_pxc
https://github.com/openstack/charm-percona-cluster/blob/stable/17.08/hooks/percona_hooks.py#L323-L327
render_config_restart_on_changed(clustered, hosts,
                                 bootstrap=not bootstrapped)
if is_leader():
    log("Leader unit - bootstrap required=%s" % (not bootstrapped),
        DEBUG)
https://github.com/openstack/charm-percona-cluster/blob/stable/17.08/hooks/percona_hooks.py#L221-L224
bootstrap_pxc()
notify_bootstrapped()
# NOTE(dosaboy): this will not actually do anything if no cluster
# relation id exists yet.
If both units passed the is_leader() gate, then they were both leaders at the same time.
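The bug title points at another way to get two bootstraps: a leadership change after the first leader has bootstrapped but before it has published the result. The single-process Python model below illustrates that interleaving. It is a hypothetical sketch, not charm or Juju code: the `bootstrap-uuid` key, the unit names and the helper bodies are assumptions, and it relies only on the fact that `leader-set` can be run only by the current leader.

```python
import uuid
from typing import Dict, Optional

# Hypothetical in-memory stand-ins for Juju leader settings and per-unit state.
leader_settings: Dict[str, Optional[str]] = {}
current_leader = "mysql/2"
local_uuid: Dict[str, Optional[str]] = {"mysql/0": None, "mysql/1": None, "mysql/2": None}


def is_leader(unit: str) -> bool:
    return unit == current_leader


def bootstrap_pxc(unit: str) -> None:
    # Bootstrapping a fresh cluster produces a new cluster state UUID.
    local_uuid[unit] = str(uuid.uuid4())


def notify_bootstrapped(unit: str) -> None:
    # Models leader-set: it only takes effect while the unit still leads.
    if is_leader(unit):
        leader_settings["bootstrap-uuid"] = local_uuid[unit]


def config_changed(unit: str) -> None:
    # Check-then-act: the leadership check and the publish are not atomic.
    bootstrapped = "bootstrap-uuid" in leader_settings
    if is_leader(unit) and not bootstrapped:
        bootstrap_pxc(unit)
        notify_bootstrapped(unit)


# mysql/2 leads and bootstraps, then loses leadership before publishing.
if is_leader("mysql/2"):
    bootstrap_pxc("mysql/2")
current_leader = "mysql/1"
notify_bootstrapped("mysql/2")  # no-op: mysql/2 is no longer the leader
config_changed("mysql/1")       # new leader sees no UUID and bootstraps again

print(local_uuid["mysql/2"] != local_uuid["mysql/1"])  # True: two clusters
```

In this model no two units are ever leaders at the same time, yet mysql/2 and mysql/1 end up with different cluster UUIDs and only mysql/1's is in leader settings, the same kind of mismatch as the "Leader UUID != Unit UUID" error reported above.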
===
Each time a new leader is elected, the operation resolver in the uniter produces a resignation operation:
https://github.com/juju/juju/blob/juju-2.2.6/worker/uniter/leadership/resolver.go#L56-L60
Normally, the current leader keeps its lease by renewing it:
https://github.com/juju/juju/blob/juju-2.2.6/worker/leadership/tracker.go#L249
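For readers unfamiliar with the lease model, here is a hedged Python sketch of a time-based leadership lease: the holder keeps leadership by renewing before expiry, another unit can claim only after the lease lapses, and a unit that still believes it leads but no longer holds a live lease should resign. The `Lease` type, the 60-second duration and both helpers are illustrative assumptions, not Juju's tracker or resolver implementation.

```python
from dataclasses import dataclass
from typing import Optional

LEASE_DURATION = 60.0  # seconds; illustrative, not Juju's actual setting


@dataclass
class Lease:
    holder: Optional[str] = None
    expires_at: float = 0.0


def claim(lease: Lease, unit: str, now: float) -> bool:
    """Grant or renew the lease; refuse while another unit's lease is live."""
    if lease.holder not in (None, unit) and now < lease.expires_at:
        return False
    lease.holder = unit
    lease.expires_at = now + LEASE_DURATION
    return True


def should_resign(lease: Lease, unit: str, believes_leader: bool, now: float) -> bool:
    """A unit that thinks it leads but holds no live lease should step down."""
    holds_live_lease = lease.holder == unit and now < lease.expires_at
    return believes_leader and not holds_live_lease


lease = Lease()
claim(lease, "mysql/2", 0.0)    # mysql/2 becomes leader
claim(lease, "mysql/2", 30.0)   # renewal extends the lease to t=90
print(claim(lease, "mysql/1", 60.0))   # False: mysql/2's lease is still live
print(claim(lease, "mysql/1", 100.0))  # True: the lease lapsed at t=90
print(should_resign(lease, "mysql/2", True, 100.0))  # True
```

In this model the claim-or-refuse rule keeps two units from holding leadership simultaneously, but it says nothing about work a former leader started before its lease lapsed.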
===
Looking at the logs:
unit-mysql-0 - never elected leader
unit-mysql-1 - elected leader once, at 2017-11-14 17:27:07; never resigned
unit-mysql-2 - first leader ever for this application; elected leader once, at 2017-11-14 17:17:00; resigned at 2017-11-14 17:21:33
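The leadership terms above can be checked mechanically. This small Python sketch uses only the timestamps listed above and treats each term as a half-open interval (an assumption made for illustration); it tests whether the two terms overlap and measures the gap between them.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

Term = Tuple[datetime, Optional[datetime]]  # (elected, resigned or None)


def ts(text: str) -> datetime:
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


# Leadership terms copied from the summary above; None means "never resigned".
terms = {
    "mysql/2": (ts("2017-11-14 17:17:00"), ts("2017-11-14 17:21:33")),
    "mysql/1": (ts("2017-11-14 17:27:07"), None),
}


def overlaps(a: Term, b: Term) -> bool:
    # Half-open intervals [start, end) overlap iff each starts before the other ends.
    a_end = a[1] or datetime.max
    b_end = b[1] or datetime.max
    return a[0] < b_end and b[0] < a_end


gap: timedelta = terms["mysql/1"][0] - terms["mysql/2"][1]
print(overlaps(terms["mysql/2"], terms["mysql/1"]))  # False
print(gap)  # 0:05:34
```

By this summary, the logs do not show two simultaneous leaders, and the `Leader unit - bootstrap required=True` line logged by mysql/2 at 17:17:28 falls inside its own term.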
unit-mysql-2.log
# there is no leader-elected hook implemented in percona-cluster, so this is fine
post-install queued leader-elected event
2017-11-14 17:16:33 INFO juju.worker.uniter resolver.go:104 found queued "leader-elected" hook
2017-11-14 17:16:33 DEBUG juju.worker.uniter agent.go:17 [AGENT-STATUS] executing: running leader-elected hook
2017-11-14 17:16:35 INFO juju-log Unknown hook leader-elected - skipping.
2017-11-14 17:17:00 INFO juju.worker.uniter.operation runhook.go:113 ran "leader-elected" hook
2017-11-14 17:17:00 DEBUG juju.worker.uniter.operation executor.go:100 committing operation "run leader-elected hook"
...
2017-11-14 17:17:28 DEBUG juju-log Leader unit - bootstrap required=True
2017-11-14 17:17:32 DEBUG config-changed Unknown oper...
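To pull the same leadership and bootstrap events out of a full unit log, a short Python filter is enough. It assumes each line follows the `<date> <time> <LEVEL> <module> <message>` layout seen in the excerpt above and skips anything that does not match; the `LogLine` type, the `leadership_events` function and the two search strings are hypothetical choices for illustration.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class LogLine(NamedTuple):
    when: str
    level: str
    module: str
    message: str


# Assumed layout: "<date> <time> <LEVEL> <module> <message>".
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ([A-Z]+) (\S+) (.*)$")
PATTERNS = ("leader-elected", "bootstrap required")


def leadership_events(lines: Iterable[str]) -> Iterator[LogLine]:
    """Yield parsed lines whose message mentions one of PATTERNS."""
    for raw in lines:
        match = LINE_RE.match(raw.rstrip("\n"))
        if match is None:
            continue  # not in the assumed layout (e.g. a continuation line)
        entry = LogLine(*match.groups())
        if any(p in entry.message for p in PATTERNS):
            yield entry


sample = [
    '2017-11-14 17:16:33 INFO juju.worker.uniter resolver.go:104 found queued "leader-elected" hook',
    "2017-11-14 17:16:35 INFO juju-log Unknown hook leader-elected - skipping.",
    "2017-11-14 17:17:28 DEBUG juju-log Leader unit - bootstrap required=True",
    "2017-11-14 17:17:32 DEBUG config-changed Unknown oper...",
]
for event in leadership_events(sample):
    print(event.when, event.level, event.message)
```

Fed each unit's log from the crashdump (for example via `open(path)`), it lists that unit's leader-elected and bootstrap decisions in file order.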