Originally reported in RH bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1584315
Reproduced on OSP12 (Pike).
After resizing an instance but before confirm, update_available_resource will fail on the source compute due to bug 1774249. If nova compute is restarted at this point before the resize is confirmed, the update_available_resource period task will never have succeeded, and therefore ResourceTracker's compute_nodes dict will not be populated at all.
When confirm calls _delete_allocation_after_move() it will fail with ComputeHostNotFound because there is no entry for the current node in ResourceTracker. The error looks like:
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [req-4f7d5d63-fc05-46ed-b505-41050d889752 09abbd4893bb45eea8fb1d5e40635339 d4483d13a6ef41b2ae575ddbd0c59141 - default default] [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0] Setting instance vm_state to ERROR: ComputeHostNotFound: Compute host compute-1.localdomain could not be found.
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0] Traceback (most recent call last):
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7445, in _error_out_instance_on_exception
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0] yield
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3757, in _confirm_resize
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0] migration.source_node)
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3790, in _delete_allocation_after_move
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0] cn_uuid = rt.get_node_uuid(nodename)
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0] File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 155, in get_node_uuid
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0] raise exception.ComputeHostNotFound(host=nodename)
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0] ComputeHostNotFound: Compute host compute-1.localdomain could not be found.
if fix previous bug, this should be avoidable ?
otherwise, delete a migration record should update compude node and without compute node this should raise some exception, so it's a work as designed behaviour?