commit a8492e88783b40f6dc61888fada232f0d00d6acf
Author: Mark Goddard <email address hidden>
Date: Mon Nov 18 12:06:47 2019 +0000
Prevent deletion of a compute node belonging to another host
There is a race condition in nova-compute with the ironic virt driver as
nodes get rebalanced. It can lead to compute nodes being removed in the
DB and not repopulated. Ultimately this prevents these nodes from being
scheduled to.
The main race condition involved is in update_available_resource in
the compute manager. When the list of compute nodes is queried, there is
a compute node belonging to the host that it does not expect to be
managing, i.e. it is an orphan. Between that time and deleting the
orphan, the real owner of the compute node takes ownership of it (in
the resource tracker). However, the node is still deleted as the first
host is unaware of the ownership change.
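The interleaving described above can be replayed as a minimal, deterministic sketch. The in-memory `db` dict, the host names and the `delete_by_id` helper are hypothetical stand-ins rather than nova code; the steps only reproduce the order of events in a single thread.

```python
# Hypothetical stand-in for the compute_nodes table: node id -> owning host.
db = {1: "host-a"}


def delete_by_id(node_id):
    """Unfiltered delete: removes the row no matter who owns it now."""
    db.pop(node_id, None)


# 1. host-a lists its compute nodes and decides node 1 is an orphan
#    (assumed here: its driver no longer reports the node after a rebalance).
orphans = [node_id for node_id, host in db.items() if host == "host-a"]

# 2. Before host-a acts, host-b's resource tracker takes ownership.
db[1] = "host-b"

# 3. host-a deletes based on its stale view; host-b's record is lost.
for node_id in orphans:
    delete_by_id(node_id)

node_lost = 1 not in db
print(node_lost)
```

The delete in step 3 never rechecks ownership, which is why the record disappears even though host-b now owns it.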
This change prevents this from occurring by filtering on the host when
deleting a compute node. If another compute host has taken ownership of
a node, it will have updated the host field and this will prevent
deletion from occurring. The first host sees this has happened via the
ComputeHostNotFound exception, and avoids deleting its resource
provider.
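The host-filtered delete can be sketched with the standard library's sqlite3 module. This is an illustration under stated assumptions, not the nova implementation: the table layout, the `delete_compute_node` helper and the locally defined `ComputeHostNotFound` class are hypothetical, and only the exception name is taken from the commit message.

```python
import sqlite3


class ComputeHostNotFound(Exception):
    """Illustrative exception; only the name comes from the commit message."""


def delete_compute_node(conn, node_id, host):
    """Delete the row only if it is still owned by `host`."""
    cur = conn.execute(
        "DELETE FROM compute_nodes WHERE id = ? AND host = ?",
        (node_id, host),
    )
    conn.commit()
    if cur.rowcount == 0:
        # No row matched: the node is gone or another host owns it now.
        raise ComputeHostNotFound(host)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compute_nodes (id INTEGER PRIMARY KEY, host TEXT)")
conn.execute("INSERT INTO compute_nodes (id, host) VALUES (1, 'host-a')")

# host-b takes ownership before host-a gets around to deleting the orphan.
conn.execute("UPDATE compute_nodes SET host = 'host-b' WHERE id = 1")

try:
    delete_compute_node(conn, 1, "host-a")
    skipped = False
except ComputeHostNotFound:
    # At this point the first host would also leave the resource provider alone.
    skipped = True

remaining = conn.execute("SELECT id, host FROM compute_nodes").fetchall()
```

Because the ownership check and the delete happen in one statement, there is no window between reading the owner and removing the row.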
Reviewed: https://review.opendev.org/c/openstack/nova/+/694802
Committed: https://opendev.org/openstack/nova/commit/a8492e88783b40f6dc61888fada232f0d00d6acf
Submitter: "Zuul (22348)"
Branch: master
Co-Authored-By: melanie witt <email address hidden>
Closes-Bug: #1853009
Related-Bug: #1841481
Change-Id: I260c1fded79a85d4899e94df4d9036a1ee437f02