OpenStack Compute (nova)

Comment 12 for bug 1662867

OpenStack Infra (hudson-openstack) wrote on 2018-05-31: Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/571424
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5a4c6913a37f912489543abd5e12a54feeeb89e2
Submitter: Zuul
Branch: stable/queens

commit 5a4c6913a37f912489543abd5e12a54feeeb89e2
Author: Matt Riedemann <email address hidden>
Date: Wed Mar 14 16:43:22 2018 -0400

libvirt: handle DiskNotFound during update_available_resource

    The update_available_resource periodic task in the compute manager
    eventually calls through to the resource tracker and virt driver
    get_available_resource method, which gets the guests running on
    the hypervisor, and builds up a set of information about the host.
    This includes disk information for the active domains.

    However, the periodic task can race with instances being deleted
    concurrently: the hypervisor can still report the domain after the
    driver has already deleted the backing files as part of deleting
    the instance. This leads to failures when running "qemu-img info"
    on the disk path, which is now gone.

    When that happens, the entire periodic update fails.

    This change simply tries to detect the specific failure from
    'qemu-img info' and translate it into a DiskNotFound exception which
    the driver can handle. In this case, if the associated instance is
    undergoing a task state transition such as moving to another host or
    being deleted, we log a message and continue. If the instance is in
    a steady state (task_state is not set), then we consider it a
    failure and re-raise the exception.
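
The handling described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the commit: the DiskNotFound and QemuImgError classes are local stand-ins, the helper names (qemu_img_info, total_disk_size), the dict-based instance records, and the injected `run` callable are all hypothetical, and the check on exit code 1 plus the "No such file or directory" text is an assumption about how the missing-file failure shows up.

```python
import logging

LOG = logging.getLogger(__name__)


class DiskNotFound(Exception):
    """Local stand-in for the driver-level 'disk is gone' exception."""


class QemuImgError(Exception):
    """Hypothetical wrapper for a failed 'qemu-img info' invocation."""

    def __init__(self, exit_code, stderr):
        super().__init__(stderr)
        self.exit_code = exit_code
        self.stderr = stderr


def qemu_img_info(path, run):
    """Query a disk via the injected callable `run`.

    Only the specific "file is missing" failure is translated into
    DiskNotFound; any other failure propagates unchanged.
    """
    try:
        return run(path)
    except QemuImgError as exc:
        # Assumption: a missing file is reported with exit code 1 and
        # this message on stderr.
        if exc.exit_code == 1 and 'No such file or directory' in exc.stderr:
            raise DiskNotFound(path) from exc
        raise


def total_disk_size(instances, run):
    """Sum disk sizes, tolerating disks that vanish mid-transition."""
    total = 0
    for inst in instances:
        try:
            total += qemu_img_info(inst['disk_path'], run)
        except DiskNotFound:
            if inst.get('task_state') is not None:
                # The instance is being deleted, migrated, etc., so a
                # missing disk is expected: log and skip it.
                LOG.info('Disk %s not found while task_state=%s; skipping',
                         inst['disk_path'], inst['task_state'])
                continue
            # Steady state: a missing disk is a real failure.
            raise
    return total
```

With this shape, one instance in the middle of a delete no longer aborts the whole pass over the host, while a disk that disappears from an instance with no task_state still surfaces as an error.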

    Note that we could add the deleted=False filter to the instance query
    in _get_disk_over_committed_size_total but that doesn't help us in
    this case because the hypervisor says the domain is still active
    and the instance is not actually considered deleted in the DB yet.

    Change-Id: Icec2769bf42455853cbe686fb30fda73df791b25
    Closes-Bug: #1662867
    (cherry picked from commit 5f16e714f58336344752305f94451e7c7c55742c)
