hacluster charm upgrade will not fix existing duplicate VIP issue
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack HA Cluster Charm | In Progress | Medium | Unassigned | |
Bug Description
* Bug Description *
In Bug #1838528 we fixed an issue where pacemaker resources need to be stopped before they can be removed. This led to duplicate VIP resources with the same IP address being created and potentially allocated to different nodes.
The fix for this issue was to stop each resource before deleting it, as resources won't delete unless they are stopped.
However, this fix only works if hacluster is upgraded before the principal charm. If you upgrade the principal charm first and cause the problem, upgrading hacluster later will not fix it.
* Bug Cause *
This is because the code to reconfigure CRM only executes in the ha_relation_changed function, which is only called in the event of an actual ha-relation-
This means that the CRM configuration is not re-applied in the event of either upgrade_charm or config_changed, so an environment that upgrades its principal charm first and hacluster second will trigger the issue and never fix it.
Secondly, this generally means that any kind of config change handled by the ha_relation_changed code won't be applied when it is made, but may be applied later when a charm just happens to trigger a relation change.
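The effect can be illustrated with a toy hook dispatcher. This is a minimal sketch and not the charm's actual code: the `HookRegistry` class, the `crm_reconfigured` list and the handler bodies are hypothetical stand-ins, and only the hook and function names are taken from the description above.

```python
from collections import defaultdict


class HookRegistry:
    """Toy stand-in for a charm hook dispatcher (hypothetical, for illustration)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def hook(self, *names):
        # Register the decorated function under one or more hook names.
        def decorator(func):
            for name in names:
                self._handlers[name].append(func)
            return func
        return decorator

    def execute(self, name):
        # Run every handler registered for the given hook name.
        for func in self._handlers[name]:
            func()


hooks = HookRegistry()
crm_reconfigured = []  # records which hook runs re-applied the CRM config


@hooks.hook("ha-relation-changed")
def ha_relation_changed():
    # In this sketch, the only place where the CRM configuration is re-applied.
    crm_reconfigured.append("ha-relation-changed")


@hooks.hook("upgrade-charm")
def upgrade_charm():
    # New charm code is in place, but nothing re-applies the CRM configuration.
    pass


hooks.execute("upgrade-charm")
print(crm_reconfigured)  # [] : the upgrade alone never reconfigures CRM
```

Until something fires the relation hook, the fixed code path is never reached, which matches the behaviour described above.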
This has been hit in multiple production environments and is critical because the duplicate VIPs cause random problems in the environment.
This problem applies to any charm using hacluster, nova-cloud-
* Suggested Fix *
We should iterate on ha_relation_changed during upgrade_charm and probably also config_changed.
This is a heavy-weight function, though, so we should make sure it is actually needed by config-changed. I don't see the harm in using it for upgrade-charm, but we should make sure it correctly respects and works with the logic used and recommended in the deployment upgrade guide to 'pause' hacluster, etc.
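One possible shape for this is sketched below. It is a rough illustration under stated assumptions, not a proposed patch: `reapply_ha_config` and its parameters are hypothetical names, the relation ids are passed in rather than read from Juju so the example runs on its own, and skipping the work while the unit is paused is only one guess at how the 'pause' logic should be respected.

```python
def reapply_ha_config(relation_ids, handler, is_paused):
    """Re-run the per-relation HA handler for every 'ha' relation.

    relation_ids: iterable of relation ids (in a charm these could come
                  from hookenv.relation_ids('ha'))
    handler:      callable taking a single relation id
    is_paused:    callable returning True while the unit is paused
    Returns the list of relation ids that were processed.
    """
    if is_paused():
        # Assumption: do nothing while the operator has paused the unit.
        return []
    processed = []
    for rid in relation_ids:
        handler(rid)
        processed.append(rid)
    return processed


# Usage with made-up relation ids and a handler that only records calls.
seen = []
print(reapply_ha_config(["ha:1", "ha:2"], seen.append, lambda: False))
print(reapply_ha_config(["ha:1"], seen.append, lambda: True))
```

The first call prints `['ha:1', 'ha:2']` and the second prints `[]`, since the paused check short-circuits before any handler runs.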
We should also update the OpenStack charm deployment guide to actually mention upgrading hacluster; right now it is not mentioned:
https:/
* Reproduction steps *
(1) deploy a xenial-queens openstack cloud using nova-cloud-
(2) juju upgrade-charm nova-cloud-
(3) #observe "crm_mon" on nova-cloud-
(4) juju upgrade-charm hacluster #wait for completion
(5) #observe duplicate VIPs still exist
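When checking for duplicates in steps (3) and (5), it may help to group VIP resources by address. The helper below is a small sketch: `find_duplicate_vips` is a hypothetical function, the resource names and IP addresses are invented for the example, and extracting the name-to-address mapping from the cluster is not shown.

```python
from collections import defaultdict


def find_duplicate_vips(resources):
    """Return {ip: [resource names]} for every IP used by more than one resource.

    resources: mapping of VIP resource name -> IP address.
    """
    by_ip = defaultdict(list)
    for name, ip in resources.items():
        by_ip[ip].append(name)
    return {ip: sorted(names) for ip, names in by_ip.items() if len(names) > 1}


# Invented example data: two resources share the same address.
example = {
    "res_vip_old": "10.0.0.10",
    "res_vip_new": "10.0.0.10",
    "res_other": "10.0.0.20",
}
print(find_duplicate_vips(example))  # {'10.0.0.10': ['res_vip_new', 'res_vip_old']}
```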
* Workaround *
You can manually run the ha_relation_joined hook since it iterates over all relations and does not use the context of the currently changed relation.
juju run --application nova-cloud-
tags: | added: sts |
tags: | added: seg |
tags: | added: charm-upgrade |
Changed in charm-hacluster: | |
status: | New → Triaged |
importance: | Undecided → Medium |
This is a bit of a side effect of the change in behaviour in Juju to not run the config-changed hook after upgrade-charm if configuration has not actually changed.
Iterating the ha relation is fine as part of a charm upgrade hook event:
    for rid in hookenv.relation_ids('ha'):
        ha_joined(rid)
(this code is executing during config-changed)