Reviewed: https://review.opendev.org/c/openstack/nova/+/695012
Committed: https://opendev.org/openstack/nova/commit/59d9871e8a0672538f8ffc43ae99b3d1c4b08909
Submitter: "Zuul (22348)"
Branch: master
commit 59d9871e8a0672538f8ffc43ae99b3d1c4b08909
Author: Mark Goddard <email address hidden>
Date: Tue Nov 19 14:45:02 2019 +0000
Add functional regression test for bug 1853009
Bug 1853009 describes a race condition involving multiple nova-compute
services with ironic. As the compute services start up, the hash ring
rebalances, and the compute services have an inconsistent view of which
is responsible for a compute node.
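To illustrate how such an inconsistent view can arise, here is a minimal sketch of a consistent hash ring. It assumes that each service builds its ring only from the services it currently sees as up. ToyHashRing, the replica count and the node names are hypothetical; this is not the hash ring implementation used by nova or ironic.

```python
import bisect
import hashlib


class ToyHashRing:
    """Minimal consistent hash ring; a hypothetical stand-in, not nova's code."""

    def __init__(self, hosts, replicas=32):
        # Place several virtual points per host on the ring to spread nodes out.
        points = sorted(
            (self._hash(f"{host}-{i}"), host)
            for host in hosts
            for i in range(replicas)
        )
        self._positions = [pos for pos, _ in points]
        self._hosts = [host for _, host in points]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def owner(self, node):
        # The first point clockwise from the node's hash owns the node,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect(self._positions, self._hash(node))
        return self._hosts[idx % len(self._hosts)]


# host1 built its ring before host2 registered; host2 already sees both.
host1_view = ToyHashRing(["host1"])
host2_view = ToyHashRing(["host1", "host2"])

nodes = [f"node-{n}" for n in range(20)]
for node in nodes:
    a, b = host1_view.owner(node), host2_view.owner(node)
    if a != b:
        print(f"{node}: host1 thinks {a}, host2 thinks {b}")
```

Until both services see the same membership, host1 considers itself responsible for every node, while host2 assigns a share of them to itself, so the two services act on conflicting ownership decisions.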
The sequence of actions here is adapted from a real-world log [1], where
multiple nova-compute services were started simultaneously. In some
cases mocks are used to simulate race conditions.
There are three main issues with the behaviour:
* host2 deletes the orphaned compute node after host1 has taken
ownership of it.
* host1 assumes that another compute service will not delete its nodes.
Once a node is in rt.compute_nodes, it is not removed again unless the
node is orphaned. This prevents host1 from recreating the compute
node.
* host1 assumes that another compute service will not delete its
resource providers (RPs). Once an RP is in the provider tree, it is not
removed.
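The first two issues above can be modelled with a toy resource tracker. This is a hypothetical sketch: ToyResourceTracker, sync_node() and delete_orphan_node() are illustrative names, a shared dict plays the role of the database, and the compute_nodes attribute stands in for rt.compute_nodes. It does not reproduce nova's ResourceTracker API.

```python
class ToyResourceTracker:
    """Hypothetical model of a per-service cache like rt.compute_nodes."""

    def __init__(self, host, db):
        self.host = host
        self.db = db              # shared "database": node -> owning host
        self.compute_nodes = {}   # this service's local cache

    def sync_node(self, node):
        if node in self.compute_nodes:
            # Cached, so assume the record still exists and never recreate it.
            return
        self.db[node] = self.host
        self.compute_nodes[node] = {"host": self.host}

    def delete_orphan_node(self, node):
        # Deletes the record without checking who owns it now.
        self.db.pop(node, None)
        self.compute_nodes.pop(node, None)


db = {}
host1 = ToyResourceTracker("host1", db)
host2 = ToyResourceTracker("host2", db)

host2.sync_node("node-1")           # host2 manages node-1 at first
host1.sync_node("node-1")           # ring rebalances; host1 takes ownership
host2.delete_orphan_node("node-1")  # host2's stale view: node-1 is orphaned
host1.sync_node("node-1")           # no-op: node-1 is still in host1's cache
print(db)  # {}
```

After this sequence the record is gone from the shared store, yet host1 still lists node-1 in its cache, so it never recreates it. The provider tree issue follows the same pattern with resource providers in place of compute node records.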
This functional test documents the current behaviour, with the idea that
it can be updated as this behaviour is fixed.
[1] http://paste.openstack.org/show/786272/
Co-Authored-By: Matt Riedemann <email address hidden>
Change-Id: Ice4071722de54e8d20bb8c3795be22f1995940cd
Related-Bug: #1853009
Related-Bug: #1853159