reinstalling a compute node and then upgrading from pike to queens fails
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Ubuntu Cloud Archive | New | Undecided | Unassigned | |
Bug Description
Hi,
Until recently I had a working xenial/pike cloud using neutron-ovs, with several compute nodes, including a ppc64 compute node named bagon. I needed to reinstall that node, so I did the following:
1. nova service-delete <id of the compute service on bagon>
2. neutron agent-delete <uuid of the openvswitch agent on bagon>
3. Re-commission the node and deploy the nova-compute application on it
Some time later, I upgraded the cloud to queens. This apparently caused the node to stop working. It logged the following error (nova-compute.log on bagon):
2018-04-09 06:25:26.099 128068 ERROR nova.scheduler.
Full stack trace : https:/
I tracked down the problem and found it was due to the following mismatch:
mysql> select uuid,host,deleted from compute_nodes where host='bagon';
+------
| uuid | host | deleted |
+------
| 2d236848-
| 92232041-
+------
2 rows in set (0.00 sec)
mysql> use nova_api;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select uuid,name from resource_providers where name like 'bagon%';
+------
| uuid | name |
+------
| 92232041-
+------
1 row in set (0.00 sec)
The nova.compute_nodes table has 2 records for bagon, as expected: one is the old, deleted record and the other is the current, live record.
The problem, as you can see above, is that the nova_api.
I manually updated the UUID in the resource_providers table, and bagon started working fine.
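For anyone wanting to check for the same condition, here is a minimal sketch of the comparison and the manual fix, run against an in-memory SQLite stand-in rather than the real databases. Several things here are assumptions and not taken from the report: the table schemas are reduced to the columns queried above, both tables sit in one database (in the real deployment they live in `nova` and `nova_api`), `deleted = 0` is taken to mark the live record, the UUIDs are hypothetical placeholders (the real ones are truncated above), and the resource provider is assumed to still carry the UUID of the deleted record. The function names are made up for illustration.

```python
import sqlite3

# Hypothetical placeholder UUIDs; the real values are truncated in the report.
OLD_UUID = "00000000-0000-0000-0000-000000000001"
LIVE_UUID = "00000000-0000-0000-0000-000000000002"


def build_demo_db():
    """Create an in-memory stand-in for the two tables (simplified schemas)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE compute_nodes (uuid TEXT, host TEXT, deleted INTEGER);
        CREATE TABLE resource_providers (uuid TEXT, name TEXT);
        """
    )
    # One soft-deleted record and one live record for the same host.
    conn.executemany(
        "INSERT INTO compute_nodes VALUES (?, ?, ?)",
        [(OLD_UUID, "bagon", 1), (LIVE_UUID, "bagon", 0)],
    )
    # Assumed faulty state: the provider still points at the old UUID.
    conn.execute(
        "INSERT INTO resource_providers VALUES (?, ?)", (OLD_UUID, "bagon")
    )
    return conn


def find_mismatches(conn):
    """Return (host, live_uuid, provider_uuid) rows where the UUIDs differ."""
    return conn.execute(
        """
        SELECT cn.host, cn.uuid, rp.uuid
        FROM compute_nodes AS cn
        JOIN resource_providers AS rp ON rp.name LIKE cn.host || '%'
        WHERE cn.deleted = 0 AND cn.uuid <> rp.uuid
        """
    ).fetchall()


def repair(conn, live_uuid, stale_uuid):
    """Point the resource provider at the live compute node's UUID."""
    conn.execute(
        "UPDATE resource_providers SET uuid = ? WHERE uuid = ?",
        (live_uuid, stale_uuid),
    )


if __name__ == "__main__":
    conn = build_demo_db()
    for host, live, stale in find_mismatches(conn):
        print(f"{host}: provider has {stale}, live node is {live}")
        repair(conn, live, stale)
    print("remaining mismatches:", find_mismatches(conn))
```

On a real cluster the same comparison would need to be run across the two databases, and any manual `UPDATE` should only be attempted after taking a backup.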
I can't try to reproduce this because I can't downgrade the cluster to run the pike => queens upgrade a second time, but hopefully you can.
Thanks!