OpenStack Percona Cluster Charm

Bug #1732257
Comment #2

Comment 2 for bug 1732257

Revision history for this message

Dmitrii Shcherbakov (dmitriis) wrote on 2017-11-14: Re: hook failed: "leader-settings-changed"

I can confirm that alitvinov and I have seen exactly that behavior a few weeks ago on stable/17.08.

Leader UUID ('0dde54c4-c961-11e7-9775-2f72db67cbda') != Unit UUID ('b3e1b7e8-c95f-11e7-91fa-56c8581eb616')

In our case all leader-get invocations returned the same result (so the Juju leader bucket is consistent across units).

http://paste.ubuntu.com/25962876/

As the code suggests, this is case 1 and it seems that there were two concurrent bootstrap attempts.

    elif lead_cluster_state_uuid != cluster_state_uuid:
        # this may mean 2 things:
        # 1) the units have diverged, which it's bad and we do stop.
        # 2) cluster_state_uuid could not be retrieved because it
        # hasn't been bootstrapped, mysqld is stopped, etc.

The question is how is that even possible with a single leader?

render_config_restart_on_changed -> bootstrap_pxc

https://github.com/openstack/charm-percona-cluster/blob/stable/17.08/hooks/percona_hooks.py#L323-L327
    if is_leader():
        log("Leader unit - bootstrap required=%s" % (not bootstrapped),
            DEBUG)
        render_config_restart_on_changed(clustered, hosts,
                                         bootstrap=not bootstrapped)

https://github.com/openstack/charm-percona-cluster/blob/stable/17.08/hooks/percona_hooks.py#L221-L224
            bootstrap_pxc()
            # NOTE(dosaboy): this will not actually do anything if no cluster
            # relation id exists yet.
            notify_bootstrapped()

If both units have passed is_leader gate than they were both leaders at the same time.

===

Each time a new leader is elected there will be a resignation operation by the operation resolver in uniter:

https://github.com/juju/juju/blob/juju-2.2.6/worker/uniter/leadership/resolver.go#L56-L60

Normally, a current leader will keep its lease (lease renewal).

https://github.com/juju/juju/blob/juju-2.2.6/worker/leadership/tracker.go#L249

===

Looking at the logs:

unit-mysql-0 - never elected as a leader
unit-mysql-1 - elected as a once leader at 2017-11-14 17:27:07, never resigned
unit-mysql-2 - first leader ever for this app, elected as a leader once at 2017-11-14 17:17:00, resigned at 2017-11-14 17:21:33

unit-mysql-2.log

# there is no leader-elected hook implemented in percona-cluster so this is fine

post-install queued leader-elected event
2017-11-14 17:16:33 INFO juju.worker.uniter resolver.go:104 found queued "leader-elected" hook

2017-11-14 17:16:33 DEBUG juju.worker.uniter agent.go:17 [AGENT-STATUS] executing: running leader-elected hook

2017-11-14 17:16:35 INFO juju-log Unknown hook leader-elected - skipping.

2017-11-14 17:17:00 INFO juju.worker.uniter.operation runhook.go:113 ran "leader-elected" hook
2017-11-14 17:17:00 DEBUG juju.worker.uniter.operation executor.go:100 committing operation "run leader-elected hook"

...

2017-11-14 17:17:28 DEBUG juju-log Leader unit - bootstrap required=True

2017-11-14 17:17:32 DEBUG config-changed Unknown operation bootstrap-pxc.

2017-11-14 17:17:43 DEBUG juju-log Bootstrap PXC Succeeded
...

2017-11-14 17:21:27 DEBUG juju.worker.leadership tracker.go:230 notifying mysql/2 ticket of impending loss of mysql leadership
2017-11-14 17:21:27 DEBUG juju.worker.leadership tracker.go:215 mysql/2 waiting for mysql leadership release
2017-11-14 17:21:27 DEBUG juju.worker.leadership tracker.go:269 mysql/2 is not mysql leader

2017-11-14 17:21:33 DEBUG juju.worker.uniter.operation executor.go:100 committing operation "resign leadership"
2017-11-14 17:21:33 DEBUG juju.worker.uniter.operation executor.go:69 running operation run leader-settings-changed hook

unit-mysql-1.log:

2017-11-14 17:21:24 INFO juju.worker.leadership tracker.go:185 mysql/1 promoted to leadership of mysql

2017-11-14 17:27:07 INFO juju.worker.uniter.operation runhook.go:113 ran "leader-elected" hook
...
2017-11-14 17:27:09 DEBUG juju-log Leader unit - bootstrap required=True

2017-11-14 17:27:24 DEBUG juju-log Bootstrap PXC Succeeded

...

2017-11-14 17:29:48 DEBUG juju-log cluster:28: Bootstrap PXC Succeeded

2017-11-14 17:29:48 DEBUG juju-log cluster:28: Notifying peers that percona is bootstrapped (uuid=0dde54c4-c961-11e7-9775-2f72db67cbda)

...

2017-11-14 17:29:32 DEBUG juju-log cluster:28: Leader unit - bootstrap required=True

2017-11-14 17:29:48 DEBUG juju-log cluster:28: Bootstrap PXC Succeeded

2017-11-14 17:30:02 DEBUG juju-log cluster:28: Notifying peers that percona is bootstrapped (uuid=0dde54c4-c961-11e7-9775-2f72db67cbda)

2017-11-14 17:30:30 DEBUG juju-log cluster:28: Notifying peers that percona is bootstrapped (uuid=0dde54c4-c961-11e7-9775-2f72db67cbda)

2017-11-14 17:32:13 DEBUG juju-log cluster:28: Notifying peers that percona is bootstrapped (uuid=0dde54c4-c961-11e7-9775-2f72db67cbda)

====

So there was never a renewal on the unit-mysql-2:

mysql/2: 2017-11-14 17:13:32 DEBUG worker.uniter.jujuc server.go:178 running hook tool "leader-set"

mysql/2: 2017-11-14 17:13:34 DEBUG worker.uniter.jujuc server.go:178 running hook tool "leader-set"

mysql/2: 2017-11-14 17:13:35 DEBUG worker.uniter.jujuc server.go:178 running hook tool "leader-set"

# All leader-set hook tool invocations for the old leader mysql/2 happened after the above

- 2017-11-14 17:17:00 mysql/2 -> leader

- 2017-11-14 17:21:24 mysql/1 -> promoted to leader

- 2017-11-14 17:21:33 mysql/2 -> resigned leadership
- 2017-11-14 17:21:33 mysql/2 -> leader-settings-changed operation started (not execution)

- 2017-11-14 17:27:03 mysql/2 -> leader-settings-changed execution began by agent

- 2017-11-14 17:27:06 mysql/2 -> unit agent reported that this is not a leader:
DEBUG juju.worker.leadership tracker.go:269 mysql/2 is not mysql leader

- 2017-11-14 17:27:07 mysql/1 -> leader-elected

< ?

- 2017-11-14 17:27:11 mysql/2 -> leader-settings-changed hook hook execution ended, operation committed by uniter
DEBUG juju.worker.uniter.operation executor.go:100 committing operation "run leader-settings-changed hook"

# All leader-set hook tool invocations for the new leader mysql/1 happened after the above

mysql/1 2017-11-14 17:29:48 DEBUG worker.uniter.jujuc server.go:178 running hook tool "leader-set"

mysql/1 2017-11-14 17:30:02 DEBUG worker.uniter.jujuc server.go:178 running hook tool "leader-set"

mysql/1 2017-11-14 17:30:31 DEBUG worker.uniter.jujuc server.go:178 running hook tool "leader-set"

mysql/1 2017-11-14 17:32:13 DEBUG worker.uniter.jujuc server.go:178 running hook tool "leader-set"

I can confirm that alitvinov and I have seen exactly that behavior a few weeks ago on stable/17.08.

Leader UUID ('0dde54c4-c961-11e7-9775-2f72db67cbda') != Unit UUID ('b3e1b7e8-c95f-11e7-91fa-56c8581eb616')

In our case all leader-get invocations returned the same result (so the Juju leader bucket is consistent across units).

http://paste.ubuntu.com/25962876/

As the code suggests, this is case 1 and it seems that there were two concurrent bootstrap attempts.

The question is how is that even possible with a single leader?

render_config_restart_on_changed -> bootstrap_pxc

If both units have passed is_leader gate than they were both leaders at the same time.

===

Each time a new leader is elected there will be a resignation operation by the operation resolver in uniter:

https://github.com/juju/juju/blob/juju-2.2.6/worker/uniter/leadership/resolver.go#L56-L60

Normally, a current leader will keep its lease (lease renewal).

https://github.com/juju/juju/blob/juju-2.2.6/worker/leadership/tracker.go#L249

===

Looking at the logs:

unit-mysql-2.log

# there is no leader-elected hook implemented in percona-cluster so this is fine

post-install queued leader-elected event
2017-11-14 17:16:33 INFO juju.worker.uniter resolver.go:104 found queued "leader-elected" hook

2017-11-14 17:16:33 DEBUG juju.worker.uniter agent.go:17 [AGENT-STATUS] executing: running leader-elected hook

2017-11-14 17:16:35 INFO juju-log Unknown hook leader-elected - skipping.

...

2017-11-14 17:17:28 DEBUG juju-log Leader unit - bootstrap required=True

2017-11-14 17:17:32 DEBUG config-changed Unknown operation bootstrap-pxc.

2017-11-14 17:17:43 DEBUG juju-log Bootstrap PXC Succeeded
...

unit-mysql-1.log:

2017-11-14 17:21:24 INFO juju.worker.leadership tracker.go:185 mysql/1 promoted to leadership of mysql

2017-11-14 17:27:07 INFO juju.worker.uniter.operation runhook.go:113 ran "leader-elected" hook
...
2017-11-14 17:27:09 DEBUG juju-log Leader unit - bootstrap required=True

2017-11-14 17:27:24 DEBUG juju-log Bootstrap PXC Succeeded

...

2017-11-14 17:29:48 DEBUG juju-log cluster:28: Bootstrap PXC Succeeded

2017-11-14 17:29:48 DEBUG juju-log cluster:28: Notifying peers that percona is bootstrapped (uuid=0dde54c4-c961-11e7-9775-2f72db67cbda)

...

2017-11-14 17:29:32 DEBUG juju-log cluster:28: Leader unit - bootstrap required=True

2017-11-14 17:29:48 DEBUG juju-log cluster:28: Bootstrap PXC Succeeded

2017-11-14 17:30:02 DEBUG juju-log cluster:28: Notifying peers that percona is bootstrapped (uuid=0dde54c4-c961-11e7-9775-2f72db67cbda)

2017-11-14 17:30:30 DEBUG juju-log cluster:28: Notifying peers that percona is bootstrapped (uuid=0dde54c4-c961-11e7-9775-2f72db67cbda)

2017-11-14 17:32:13 DEBUG juju-log cluster:28: Notifying peers that percona is bootstrapped (uuid=0dde54c4-c961-11e7-9775-2f72db67cbda)

====

So there was never a renewal on the unit-mysql-2:

mysql/2: 2017-11-14 17:13:32 DEBUG worker.uniter.jujuc server.go:178 running hook tool "leader-set"

mysql/2: 2017-11-14 17:13:34 DEBUG worker.uniter.jujuc server.go:178 running hook tool "leader-set"

mysql/2: 2017-11-14 17:13:35 DEBUG worker.uniter.jujuc server.go:178 running hook tool "leader-set"

# All leader-set hook tool invocations for the old leader mysql/2 happened after the above

- 2017-11-14 17:17:00 mysql/2 -> leader

- 2017-11-14 17:21:24 mysql/1 -> promoted to leader

- 2017-11-14 17:21:33 mysql/2 -> resigned leadership
- 2017-11-14 17:21:33 mysql/2 -> leader-settings-changed operation started (not execution)

- 2017-11-14 17:27:03 mysql/2 -> leader-settings-changed execution began by agent

- 2017-11-14 17:27:06 mysql/2 -> unit agent reported that this is not a leader:
 DEBUG juju.worker.leadership tracker.go:269 mysql/2 is not mysql leader

- 2017-11-14 17:27:07 mysql/1 -> leader-elected

< ?

# All leader-set hook tool invocations for the new leader mysql/1 happened after the above

mysql/1 2017-11-14 17:29:48 DEBUG worker.uniter.jujuc server.go:178 running hook tool "leader-set"

mysql/1 2017-11-14 17:30:02 DEBUG worker.uniter.jujuc server.go:178 running hook tool "leader-set"

mysql/1 2017-11-14 17:30:31 DEBUG worker.uniter.jujuc server.go:178 running hook tool "leader-set"

mysql/1 2017-11-14 17:32:13 DEBUG worker.uniter.jujuc server.go:178 running hook tool "leader-set"