nova-status upgrade check fails on Object ID linkage

Bug #2039597 reported by Dmitriy Rabotyagov
Affects: OpenStack Compute (nova)
Status: In Progress

Bug Description


When upgrading from 2023.1 to 2023.2, running nova-status upgrade check fails with exit code 2.

As the documentation [1] instructs, the command was run with the new codebase (2023.2) installed but before any service (api/conductor/scheduler/compute) was restarted, so the services were still running the 2023.1 codebase.

At that point all compute services are up and healthy:

# openstack compute service list
| ID | Binary | Host | Zone | Status | State | Updated At |
| 001ea1ce-363f-41d1-9ce3-59ff966452a7 | nova-conductor | aio1 | internal | enabled | up | 2023-10-17T18:14:38.000000 |
| 8df25103-65c9-4892-be05-ebed7f3c1ad4 | nova-scheduler | aio1 | internal | enabled | up | 2023-10-17T18:14:40.000000 |
| d85b115a-cd8a-4ac9-82bc-f7a5f457cedc | nova-compute | aio1 | nova | enabled | up | 2023-10-17T18:14:39.000000 |

Steps to reproduce

* Run cluster on 2023.1
* Upgrade to 2023.2 but do not restart the nova services (as the documentation expects)
* Run nova-status upgrade check

Expected result

Upgrade check passes

Actual result

| Check: Object ID linkage |
| Result: Failure |
| Details: Compute node objects without service_id linkage were found |
| in the database. Ensure all non-deleted compute services |
| have started with upgraded code. |
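For context on the exit code 2 mentioned above: the documented return codes of nova-status upgrade check are 0 (success), 1 (warning), 2 (failure) and 255 (unexpected error). A minimal sketch of wrapping the command so that the exit code is reported in words follows; the helper names `describe_exit_code` and `run_upgrade_check` are made up for illustration and are not part of nova.

```python
import subprocess

# Return codes documented for `nova-status upgrade check`.
EXIT_CODE_MEANINGS = {
    0: "all checks passed",
    1: "at least one check returned a warning",
    2: "at least one check failed; investigate before upgrading",
    255: "unexpected error",
}


def describe_exit_code(code: int) -> str:
    """Translate an exit code into a short human-readable message."""
    return EXIT_CODE_MEANINGS.get(code, f"unknown exit code {code}")


def run_upgrade_check() -> int:
    """Run the check and return its exit code.

    Hypothetical helper: requires nova-status on PATH and a valid nova config.
    """
    result = subprocess.run(["nova-status", "upgrade", "check"], check=False)
    return result.returncode
```

On a host with nova installed, `print(describe_exit_code(run_upgrade_check()))` would report the failure seen in this bug as a failed check rather than a bare number.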


sean mooney (sean-k-mooney) wrote :

just leaving some context before i finish for tonight.
when i originally asked for this check, the intent was to detect the case where we were going to rely on this in the next release, and to notify the operator that one of the compute nodes was not upgraded.

chatting about this on irc, it became clear that logic was flawed.
in the current release we don't depend on this being set.

before the compute nodes are upgraded and restarted it will never be set, and the help text for this
command says it should be run before the services are restarted to execute the new code.

the check as written will also not support clouds that have ironic deployed, as they will not have a compute service id set in the compute node record (and can't until we remove the hash ring code).

for those reasons i think we should likely revert this status check.

alternatively we can modify it to filter out ironic compute nodes and add a min compute service version check at the start, i.e. only run it if all compute services are upgraded to at least bobcat.

if they are 2023.2+ and we filter out ironic, a failure means that you have db corruption, as something removed the compute service id from the compute node record.
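The alternative described above could look roughly like the following. This is only a sketch of the proposed logic, not nova's actual check: the `Service` and `ComputeNode` dataclasses, the function name, and the `BOBCAT_MIN_VERSION` value are hypothetical stand-ins for nova's real objects, and the numeric version is a placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional

# Placeholder only: the real service version that corresponds to 2023.2
# (bobcat) is not stated in this report.
BOBCAT_MIN_VERSION = 100


@dataclass
class Service:
    host: str
    version: int


@dataclass
class ComputeNode:
    host: str
    hypervisor_type: str
    service_id: Optional[int]
    deleted: bool = False


def check_service_id_linkage(services: List[Service],
                             nodes: List[ComputeNode]) -> str:
    """Return 'skipped', 'success' or 'failure' for the linkage check."""
    # Only meaningful once every compute service runs at least bobcat code;
    # before that, a missing service_id is expected.
    if any(s.version < BOBCAT_MIN_VERSION for s in services):
        return "skipped"

    # Ironic nodes are filtered out: they have no service_id linkage.
    unlinked = [
        n for n in nodes
        if not n.deleted
        and n.hypervisor_type != "ironic"
        and n.service_id is None
    ]
    return "failure" if unlinked else "success"
```

With this ordering, the situation in this bug (services still on 2023.1 code) would be skipped instead of failing, and a failure would only be reported in the case that points at a missing linkage on fully upgraded, non-ironic nodes.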

Changed in nova:
status: New → Triaged
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master

Changed in nova:
status: Triaged → In Progress