compute: stop handling virt lifecycle events in cleanup_host()
When rebooting a compute host, guest VMs may be shut down
automatically by the hypervisor, and the virt driver sends events to
the compute manager to handle them. If the compute service is still up
while this happens, it will try to call the stop API to power off the
instance and update the database to show the instance as stopped.
When the compute service comes back up and events arrive from the virt
driver indicating that the guest VMs are running, nova will see that
the vm_state on the instance in the nova database is STOPPED and shut
down the instance by calling the stop API (basically ignoring what the
virt driver / hypervisor tells nova the state of the guest VM is).
Alternatively, if the compute service shuts down after changing the
instance task_state to 'powering-off' but before the stop API cast
completes, the instance can be left in a strange vm_state/task_state
combination that requires the admin to manually reset the task_state
to recover the instance.
Let's try to avoid some of this mess by disconnecting the event
handling when the compute service is shutting down, as we already do
for neutron VIF plugging events. There could still be races if the
compute service shuts down after the hypervisor (e.g. libvirtd), but
this is at least a best effort to mitigate the potential damage.
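The idea can be sketched as follows. This is a simplified illustration,
not nova's actual code: the class and method names (ComputeManager,
cleanup_host, register_event_listener) mirror nova's structure, but
FakeDriver and the event strings are hypothetical stand-ins.

```python
class FakeDriver:
    """Hypothetical stand-in for a virt driver that emits lifecycle events."""

    def __init__(self):
        self._listener = None

    def register_event_listener(self, callback):
        # Passing None detaches the listener.
        self._listener = callback

    def emit(self, event):
        # Deliver the event only if a listener is registered; otherwise
        # the event is dropped on the floor.
        if self._listener is not None:
            self._listener(event)


class ComputeManager:
    def __init__(self, driver):
        self.driver = driver
        self.handled = []
        # Normal operation: forward hypervisor lifecycle events to the
        # handler that syncs instance power state with the database.
        self.driver.register_event_listener(self._handle_event)

    def _handle_event(self, event):
        self.handled.append(event)

    def cleanup_host(self):
        # On service shutdown, detach the listener so late events from
        # the hypervisor (e.g. guests being shut down by libvirtd during
        # a host reboot) no longer trigger stop API calls.
        self.driver.register_event_listener(None)
```

After cleanup_host() runs, any further events the hypervisor emits are
simply ignored rather than being turned into database updates or stop
API casts.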
Reviewed: https://review.openstack.org/192244
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7bc4be781564c6b9e7a519aecea84ddbee6bd935
Submitter: Jenkins
Branch: stable/juno
commit 7bc4be781564c6b9e7a519aecea84ddbee6bd935
Author: Matt Riedemann <email address hidden>
Date: Wed Apr 15 11:51:26 2015 -0700
compute: stop handling virt lifecycle events in cleanup_host()
Closes-Bug: #1444630
Related-Bug: #1293480
Related-Bug: #1408176
Conflicts:
	nova/compute/manager.py
	nova/tests/unit/compute/test_compute_mgr.py
Change-Id: I1a321371dff7933cdd11d31d9f9c2a2f850fd8d9
(cherry picked from commit d1fb8d0fbdd6cb95c43b02f754409f1c728e8cd0)