nova-status upgrade check fails on Object ID linkage
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | In Progress | High | Unassigned |
Bug Description
Description
===========
When upgrading from 2023.1 to 2023.2, running nova-status upgrade check fails with exit code 2.
According to the documentation [1], this command was run with the new codebase (2023.2) but before any service (api/conductor/
At that point all compute services are up and healthy:
# openstack compute service list
+------
| ID | Binary | Host | Zone | Status | State | Updated At |
+------
| 001ea1ce-
| 8df25103-
| d85b115a-
+------
Steps to reproduce
==================
* Run cluster on 2023.1
* Perform upgrade to 2023.2 but do not restart nova services (as assumed by the documentation)
* Run nova-status upgrade check
Expected result
===============
Upgrade check passes
Actual result
=============
+------
| Check: Object ID linkage |
| Result: Failure |
| Details: Compute node objects without service_id linkage were found |
| in the database. Ensure all non-deleted compute services |
| have started with upgraded code. |
+------
[1] https:/
just leaving some context before i finish for tonight.
when i originally asked for this check, the intent was to detect the case where we were going to rely on this linkage in the next release, and to notify the operator that one of the compute nodes had not been upgraded.
after chatting about this on irc, it turns out that logic was flawed.
in the current release we don't depend on this being set.
before the compute nodes are upgraded and restarted it will never be set, and the help text for this
command says it should be run before the services are restarted to execute the new code.
the check as written also does not support clouds that have ironic deployed, as they will not have a compute service id set in the compute node record (and can't until we remove the hash ring code).
for those reasons i think we should likely revert this status check.
alternatively we can modify it to filter out ironic compute nodes and add a min compute service version check at the start, i.e. only run it if all compute services are upgraded to at least bobcat.
if they are 2023.2+ and we filter out ironic, a missing linkage means that you have db corruption, as something removed the compute service id from the compute node record.
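
to make the alternative concrete, here is a minimal sketch of the modified check logic. it is not nova code: the `ComputeService` and `ComputeNode` dataclasses, the `check_object_id_linkage` function, the `"skipped"` result and the `ASSUMED_BOBCAT_SERVICE_VERSION` value are all hypothetical stand-ins for illustration, and the real check would read these records from the database.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical placeholder, NOT nova's real service version for 2023.2 (Bobcat).
ASSUMED_BOBCAT_SERVICE_VERSION = 100


@dataclass
class ComputeService:
    """Simplified stand-in for a nova-compute service record."""
    host: str
    version: int
    deleted: bool = False


@dataclass
class ComputeNode:
    """Simplified stand-in for a compute node record."""
    host: str
    hypervisor_type: str
    service_id: Optional[int]
    deleted: bool = False


def check_object_id_linkage(
    services: List[ComputeService],
    nodes: List[ComputeNode],
    min_version: int = ASSUMED_BOBCAT_SERVICE_VERSION,
) -> str:
    """Return "skipped", "success" or "failure" for the linkage check."""
    # Step 1: only run the check once every non-deleted compute service
    # reports at least the minimum version. Before that, service_id is
    # expected to be unset, so a failure would be a false alarm.
    live_services = [s for s in services if not s.deleted]
    if any(s.version < min_version for s in live_services):
        return "skipped"

    # Step 2: ignore deleted nodes and ironic nodes, since ironic nodes
    # are not expected to carry a service_id.
    unlinked = [
        n for n in nodes
        if not n.deleted
        and n.hypervisor_type != "ironic"
        and n.service_id is None
    ]

    # Step 3: anything still unlinked at this point is unexpected.
    return "failure" if unlinked else "success"
```

with this shape, running the check before the computes are restarted returns "skipped" rather than a failure, which matches when the help text says the command should be run.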