OpenStack Compute (nova)

Comment 38 for bug 1853009

OpenStack Infra (hudson-openstack) wrote on 2022-06-04: Fix merged to nova (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/nova/+/811808
Committed: https://opendev.org/openstack/nova/commit/cbbca58504275f194ec55eeb89dad4a496d98060
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit cbbca58504275f194ec55eeb89dad4a496d98060
Author: Mark Goddard <email address hidden>
Date: Mon Nov 18 12:06:47 2019 +0000

    Prevent deletion of a compute node belonging to another host

    There is a race condition in nova-compute with the ironic virt driver as
    nodes get rebalanced. It can lead to compute nodes being removed in the
    DB and not repopulated. Ultimately this prevents these nodes from being
    scheduled to.

    The main race condition involved is in update_available_resource in
    the compute manager. When the list of compute nodes is queried, there is
    a compute node belonging to the host that it does not expect to be
    managing, i.e. it is an orphan. Between that time and deleting the
    orphan, the real owner of the compute node takes ownership of it (in
    the resource tracker). However, the node is still deleted as the first
    host is unaware of the ownership change.

    This change prevents this from occurring by filtering on the host when
    deleting a compute node. If another compute host has taken ownership of
    a node, it will have updated the host field and this will prevent
    deletion from occurring. The first host sees this has happened via the
    ComputeHostNotFound exception, and avoids deleting its resource
    provider.

    Co-Authored-By: melanie witt <email address hidden>

    Conflicts:
        nova/db/sqlalchemy/api.py

    NOTE(melwitt): The conflict is because change
    I9f414cf831316b624132d9e06192f1ecbbd3dd78 (db: Copy docs from
    'nova.db.*' to 'nova.db.sqlalchemy.*') is not in Wallaby.

    NOTE(melwitt): Differences from the cherry picked change from calling
    nova.db.api => nova.db.sqlalchemy.api directly are due to the alembic
    migration in Xena which looks to have made the nova.db.api interface
    obsolete.

    Closes-Bug: #1853009
    Related-Bug: #1841481

    Change-Id: I260c1fded79a85d4899e94df4d9036a1ee437f02
    (cherry picked from commit a8492e88783b40f6dc61888fada232f0d00d6acf)
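
The race and the host-filtered delete described in the commit message can be illustrated with a small, self-contained sketch. This is not nova's code: `FakeComputeNodeTable`, `delete_unfiltered`, `delete_filtered` and `reap_orphan` are hypothetical names invented for illustration, and the local `ComputeHostNotFound` class only stands in for the nova exception named in the message. In a real database the host check and the delete would presumably be a single filtered statement; the in-memory version below is only meant to show the control flow.

```python
class ComputeHostNotFound(Exception):
    """Local stand-in for the nova exception named in the commit message."""


class FakeComputeNodeTable:
    """Hypothetical in-memory stand-in for the compute node records."""

    def __init__(self):
        self._rows = {}  # node id -> name of the host that owns the node

    def upsert(self, node_id, host):
        # Create the record, or update its host field (taking ownership).
        self._rows[node_id] = host

    def get_host(self, node_id):
        return self._rows.get(node_id)

    def delete_unfiltered(self, node_id):
        # Behaviour before the fix: delete by id only, whoever owns the node.
        self._rows.pop(node_id, None)

    def delete_filtered(self, node_id, host):
        # Behaviour after the fix: delete only if `host` still owns the node.
        if self._rows.get(node_id) != host:
            raise ComputeHostNotFound(host)
        del self._rows[node_id]


def reap_orphan(table, node_id, host):
    """Try to delete an orphan on behalf of `host`.

    Returns True if the record was deleted (so the caller may also clean up
    the resource provider), False if another host has taken ownership.
    """
    try:
        table.delete_filtered(node_id, host)
    except ComputeHostNotFound:
        return False
    return True


# Scenario from the commit message: host-a decides node-1 is an orphan,
# then host-b takes ownership before host-a issues the delete.
table = FakeComputeNodeTable()
table.upsert("node-1", "host-a")
table.upsert("node-1", "host-b")  # rebalance: host-b now owns the node
deleted = reap_orphan(table, "node-1", "host-a")

# The same interleaving with an id-only delete loses the record.
racy = FakeComputeNodeTable()
racy.upsert("node-1", "host-a")
racy.upsert("node-1", "host-b")
racy.delete_unfiltered("node-1")
```

With the filtered delete, `deleted` is False and the record still belongs to `host-b`; with the id-only delete, the record is gone even though `host-b` had taken it over.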
