compute: stop handling virt lifecycle events in cleanup_host()
When rebooting a compute host, guest VMs may be shut down
automatically by the hypervisor, and the virt driver sends events to
the compute manager to handle them. If the compute service is still up
while this happens, it will try to call the stop API to power off the
instance and update the database to show the instance as stopped.
When the compute service comes back up and events arrive from the virt
driver indicating that the guest VMs are running, nova will see that
the vm_state on the instance in the nova database is STOPPED and shut
down the instance by calling the stop API (basically ignoring what the
virt driver / hypervisor tells nova the state of the guest VM is).
Alternatively, if the compute service shuts down after changing the
instance task_state to 'powering-off' but before the stop API cast
completes, the instance can be left in a strange vm_state/task_state
combination that requires the admin to manually reset the task_state
to recover the instance.
Let's try to avoid some of this mess by disconnecting the event
handling when the compute service is shutting down, as we already do
for neutron VIF plugging events. There could still be races if the
compute service shuts down after the hypervisor (e.g. libvirtd), but
this is at least a best effort to mitigate the potential damage.
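The idea can be sketched as follows. This is a simplified illustration,
not nova's actual code: the class and method names (ComputeManager,
cleanup_host, register_event_listener) mirror nova's structure, but
FakeDriver and the event strings are hypothetical stand-ins.

```python
class FakeDriver:
    """Hypothetical stand-in for a virt driver that emits lifecycle events."""

    def __init__(self):
        self._listener = None

    def register_event_listener(self, callback):
        # Passing None detaches the listener.
        self._listener = callback

    def emit(self, event):
        # Deliver the event only if a listener is registered; otherwise
        # the event is dropped on the floor.
        if self._listener is not None:
            self._listener(event)


class ComputeManager:
    def __init__(self, driver):
        self.driver = driver
        self.handled = []
        # Normal operation: forward hypervisor lifecycle events to the
        # handler that syncs instance power state with the database.
        self.driver.register_event_listener(self._handle_event)

    def _handle_event(self, event):
        self.handled.append(event)

    def cleanup_host(self):
        # On service shutdown, detach the listener so late events from
        # the hypervisor (e.g. guests being shut down by libvirtd during
        # a host reboot) no longer trigger stop API calls.
        self.driver.register_event_listener(None)
```

After cleanup_host() runs, any further events the hypervisor emits are
simply ignored rather than being turned into database updates or stop
API casts.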
Reviewed: https://review.openstack.org/192244
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7bc4be781564c6b9e7a519aecea84ddbee6bd935
Submitter: Jenkins
Branch: stable/juno
commit 7bc4be781564c6b9e7a519aecea84ddbee6bd935
Author: Matt Riedemann <email address hidden>
Date: Wed Apr 15 11:51:26 2015 -0700
compute: stop handling virt lifecycle events in cleanup_host()
Closes-Bug: #1444630
Related-Bug: #1293480
Related-Bug: #1408176
Conflicts:
	nova/compute/manager.py
	nova/tests/unit/compute/test_compute_mgr.py
Change-Id: I1a321371dff7933cdd11d31d9f9c2a2f850fd8d9
(cherry picked from commit d1fb8d0fbdd6cb95c43b02f754409f1c728e8cd0)