upgrade-series prepare puts units into failed state if a subordinate does not support the target series
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Fix Released | High | Yang Kelvin Liu |
OpenStack Charm Guide | Triaged | High | Peter Matulis |
Bug Description
Using a 2.9.38 client, controller, and model. Upgrading focal to jammy with upgrade-series prepare on a machine with an lldpd subordinate charm started executing the pre-series-upgrade hooks but then failed, because that charm doesn't support jammy. All subordinate and principal units went into an endless loop with error status and couldn't be fixed or resolved.
Juju shouldn't trigger prepare in this case: pre-checks are necessary and should prevent the operator from proceeding unless --force is specified.
Now the cloud is blocked and the series upgrade can't complete for:
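The kind of pre-check requested above could look roughly like the following sketch. It is not how Juju implements the check. The function names series_supported and unsupported_charms and the "charm:series,series" input format are hypothetical, and the caller has to supply the series lists itself since nothing here queries Juju.

```shell
# Hypothetical pre-check sketch (POSIX sh). Nothing here talks to Juju;
# the caller passes in each charm and the series it supports.

# series_supported TARGET LIST
# LIST is a comma-separated string such as "focal,jammy".
# Succeeds when TARGET is one of the entries in LIST.
series_supported() {
  case ",$2," in
    *",$1,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# unsupported_charms TARGET "charm:series,series" ...
# Prints the name of every charm whose list lacks TARGET and returns
# non-zero if at least one name was printed.
# Note: plain sh has no "local", so these variables are global.
unsupported_charms() {
  target="$1"
  shift
  rc=0
  for entry in "$@"; do
    name="${entry%%:*}"   # text before the first colon
    list="${entry#*:}"    # text after the first colon
    if ! series_supported "$target" "$list"; then
      printf '%s\n' "$name"
      rc=1
    fi
  done
  return "$rc"
}
```

With illustrative input such as `unsupported_charms jammy "keystone:focal,jammy" "hacluster:focal"`, the function prints hacluster and returns non-zero, so a wrapper script could stop before running upgrade-series prepare on that machine.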
nova-compute-
ceilometer-
ovn-chassis-
juju upgrade-series 9 complete
ERROR machine "9" can not complete, it is either not prepared or already completed
juju upgrade-series 9 prepare jammy -y
ERROR Upgrade series is currently being prepared for machine "9".
description: | updated |
Changed in juju: | |
importance: | Undecided → High |
milestone: | none → 2.9.43 |
status: | New → Triaged |
tags: | added: subordinate upgrade-charm |
Changed in charm-guide: | |
importance: | Undecided → High |
status: | New → Triaged |
Changed in charm-guide: | |
assignee: | nobody → Peter Matulis (petermatulis) |
Changed in juju: | |
milestone: | 2.9.43 → 2.9.44 |
Changed in juju: | |
milestone: | 2.9.44 → 2.9.45 |
Changed in juju: | |
status: | Fix Committed → Fix Released |
I hit this issue in production with series-upgrade of a Yoga OpenStack cloud from focal->jammy. The issue for us was the hacluster charm. The 2.0.3 charm only supports focal while the 2.4 charm supports both focal+jammy. It still occurs on 2.9.42.
It's not clear from the original description, but the "upgrade-series prepare" does error with the following, after typing yes to confirm the upgrade:
"ERROR charm "hacluster" does not support jammy, force not used"
However, the pre-series-upgrade hooks run in the background anyway, even though the juju client exits after that error. The hacluster unit then goes into the failed state.
In our debugging, db.machineUpgradeSeriesLocks is empty. You can re-run prepare with --force, which then creates a lock; however, the units are still stuck in a failed state. It seems that this still leaves the object in a state the transaction won't allow, perhaps because the hooks already ran.
= Workaround =
If you only attempted the "prepare" on a single unit, you can force-remove that unit, scale it back out, upgrade the hacluster charm and then proceed with a series upgrade. I was not able to find a way to get the broken unit out of the broken state.
juju remove-unit keystone/0 --force
juju add-unit keystone
juju upgrade-charm hacluster --channel 2.4/stable
= Reproducer =
You can deploy a simple bundle with keystone and hacluster to reproduce the issue. I have attached the bundle as keystone-
juju add-model keystone1
juju deploy ./keystone-
juju upgrade-series 0 prepare jammy
= Expectations =
- This is a critical issue that needs prioritising for a new 2.9.43 release.
- However, we also need to determine whether this situation can be fixed easily: people are very likely to get stuck, and removing and re-adding the broken unit is error-prone in practice and best avoided if possible.