[series-upgrade] post-series-upgrade hook may fail due to race with principle
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack HA Cluster Charm |
New
|
Undecided
|
Unassigned |
Bug Description
Seen on openstack-dashboard during series upgrade.
The race hazard with hacluster starting in the post-series upgrade hook was hit again.
unit-openstack-
unit-openstack-
unit-openstack-
unit-openstack-
unit-openstack-
unit-openstack-
unit-openstack-
unit-openstack-
unit-openstack-
unit-openstack-
unit-openstack-
unit-openstack-
unit-openstack-
unit-openstack-
unit-openstack-
unit-openstack-
unit-openstack-
unit-openstack-
unit-openstack-
From the log directly:
2021-06-18 15:07:58 DEBUG post-series-upgrade active
2021-06-18 15:07:58 INFO juju-log Unit is ready
2021-06-18 15:07:58 WARNING post-series-upgrade Traceback (most recent call last):
2021-06-18 15:07:58 WARNING post-series-upgrade File "/var/lib/
2021-06-18 15:07:58 WARNING post-series-upgrade main()
2021-06-18 15:07:58 WARNING post-series-upgrade File "/var/lib/
2021-06-18 15:07:58 WARNING post-series-upgrade hooks.execute(
2021-06-18 15:07:58 WARNING post-series-upgrade File "/var/lib/
2021-06-18 15:07:58 WARNING post-series-upgrade self._hooks[
2021-06-18 15:07:58 WARNING post-series-upgrade File "/var/lib/
2021-06-18 15:07:58 WARNING post-series-upgrade series_
2021-06-18 15:07:58 WARNING post-series-upgrade File "/var/lib/
2021-06-18 15:07:58 WARNING post-series-upgrade resume_
2021-06-18 15:07:58 WARNING post-series-upgrade File "/var/lib/
2021-06-18 15:07:58 WARNING post-series-upgrade _pause_
2021-06-18 15:07:58 WARNING post-series-upgrade File "/var/lib/
2021-06-18 15:07:58 WARNING post-series-upgrade f(assess_
2021-06-18 15:07:58 WARNING post-series-upgrade File "/var/lib/
2021-06-18 15:07:58 WARNING post-series-upgrade raise Exception("Couldn't resume: {}".format("; ".join(messages)))
2021-06-18 15:07:58 WARNING post-series-upgrade Exception: Couldn't resume: haproxy didn't resume cleanly.
2021-06-18 15:07:58 ERROR juju.worker.
The haproxy log for the period just shows:
Jun 18 13:25:21 juju-182b52-0-lxd-6 haproxy[439531]: [WARNING] 133/180728 (439531) : All workers exited. Exiting... (143)
Jun 18 15:07:58 juju-182b52-0-lxd-6 haproxy[2374]: [NOTICE] 168/150758 (2374) : New worker #1 (2383) forked
Jun 18 15:38:48 juju-182b52-0-lxd-6 haproxy[2383]: [WARNING] 168/153848 (2383) : Server dash_secure/
check duration: 1ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Which may mean that the detection of the resumption wasn't clean??
At the same time pacemaker was logging:
Jun 18 15:07:42 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
Jun 18 15:07:42 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
Jun 18 15:07:47 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_ping) info: Reporting our current digest to juju-182b52-
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-controld [472] (process_lrm_event) notice: Result of monitor operation for res_horizon_haproxy on juju-182b52-
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-controld [472] (process_lrm_event) notice: juju-182b52-
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: --- 0.66.51 2
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: +++ 0.66.52 (null)
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: + /cib: @num_updates=52
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: + /cib/status/
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-attrd [470] (attrd_peer_update) notice: Setting fail-count-
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-attrd [470] (attrd_peer_update) notice: Setting last-failure-
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: --- 0.66.52 2
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: +++ 0.66.53 (null)
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: + /cib: @num_updates=53
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: ++ /cib/status/
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: --- 0.66.53 2
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: +++ 0.66.54 (null)
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: + /cib: @num_updates=54
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: ++ /cib/status/
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-execd [469] (cancel_
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-controld [472] (do_lrm_rsc_op) info: Performing key=8:98:
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-execd [469] (log_execute) info: executing - rsc:res_
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-controld [472] (process_lrm_event) info: Result of monitor operation for res_horizon_haproxy on juju-182b52-
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-controld [472] (process_lrm_event) notice: juju-182b52-
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: --- 0.66.54 2
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: +++ 0.66.55 (null)
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: + /cib: @num_updates=55
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: + /cib/status/
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-execd [469] (log_finished) info: finished - rsc:res_
s
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-controld [472] (process_lrm_event) notice: Result of stop operation for res_horizon_haproxy on juju-182b52-
_horizon_
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: --- 0.66.55 2
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: +++ 0.66.56 (null)
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: + /cib: @num_updates=56
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: + /cib/status/
/lrm_rsc_
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
rsion=0.66.56)
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-controld [472] (do_lrm_rsc_op) info: Performing key=23:
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-execd [469] (log_execute) info: executing - rsc:res_
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: --- 0.66.56 2
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: +++ 0.66.57 (null)
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: + /cib: @num_updates=57
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: + /cib/status/
/lrm_rsc_
193;23:
Jun 18 15:07:57 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
rsion=0.66.57)
Jun 18 15:07:58 juju-182b52-0-lxd-6 pacemaker-execd [469] (log_finished) info: finished - rsc:res_
ms
Jun 18 15:07:58 juju-182b52-0-lxd-6 pacemaker-controld [472] (process_lrm_event) notice: Result of start operation for res_horizon_haproxy on juju-182b52-
s_horizon_
Jun 18 15:07:58 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
Jun 18 15:07:58 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: --- 0.66.57 2
Jun 18 15:07:58 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: +++ 0.66.58 (null)
Jun 18 15:07:58 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: + /cib: @num_updates=58
Jun 18 15:07:58 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: + /cib/status/
/lrm_rsc_
Jun 18 15:07:58 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
rsion=0.66.58)
Jun 18 15:07:58 juju-182b52-0-lxd-6 pacemaker-controld [472] (do_lrm_rsc_op) info: Performing key=7:98:
Jun 18 15:07:58 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
Jun 18 15:07:58 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: --- 0.66.58 2
Jun 18 15:07:58 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: +++ 0.66.59 (null)
Jun 18 15:07:58 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: + /cib: @num_updates=59
Jun 18 15:07:58 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: + /cib/status/
/lrm_rsc_
=-1, @rc-code=193, @op-status=-1, @last-rc-
Jun 18 15:07:58 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
rsion=0.66.59)
Jun 18 15:07:58 juju-182b52-0-lxd-6 pacemaker-controld [472] (process_lrm_event) notice: Result of monitor operation for res_horizon_haproxy on juju-182b52-
res_horizon_
Jun 18 15:07:58 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
Jun 18 15:07:58 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: --- 0.66.59 2
Jun 18 15:07:58 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: +++ 0.66.60 (null)
Jun 18 15:07:58 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: + /cib: @num_updates=60
Jun 18 15:07:58 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: + /cib/status/
/lrm_rsc_
Jun 18 15:07:58 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
rsion=0.66.60)
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-attrd [470] (attrd_peer_update) notice: Setting last-failure-
| from juju-182b52-
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-attrd [470] (attrd_peer_update) notice: Setting fail-count-
u-182b52-11-lxd-6
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
e[@id='
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: --- 0.66.60 2
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: +++ 0.66.61 (null)
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: -- /cib/status/
']/nvpair[
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: + /cib: @num_updates=61
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
version=0.66.61)
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: --- 0.66.61 2
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: +++ 0.66.62 (null)
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: -- /cib/status/
']/nvpair[
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: + /cib: @num_updates=62
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
version=0.66.62)
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: --- 0.66.62 2
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: +++ 0.66.63 (null)
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: -- /cib/status/
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: + /cib: @num_updates=63
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
[@id='res_
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-controld [472] (delete_resource) info: Removing resource res_horizon_haproxy for 5d0932c1-
2-11-lxd-6
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-execd [469] (cancel_
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-controld [472] (lrm_remove_
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-controld [472] (notify_deleted) info: Notifying 5d0932c1-
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
e[@id='
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-execd [469] (process_
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-controld [472] (process_lrm_event) info: Result of monitor operation for res_horizon_haproxy on juju-182b52-
=res_horizon_
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-controld [472] (process_lrm_event) info: Deletion of resource 'res_horizon_
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: --- 0.66.62 2
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: +++ 0.66.63 (null)
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: -- /cib/status/
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: + /cib: @num_updates=63
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
[@id='res_
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
e[@id='
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: --- 0.66.63 2
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: Diff: +++ 0.67.0 (null)
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: + /cib: @epoch=67, @num_updates=0
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_perform_op) info: + /cib/configurat
p-options-
Jun 18 15:07:59 juju-182b52-0-lxd-6 pacemaker-based [466] (cib_process_
, version=0.67.0)
etc.
It is resolved with "juju resolved <unit>" and then it completes so there's an issue with it not being resumable.