Comment 9 for bug 1853009

OpenStack Infra (hudson-openstack) wrote: Related fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/695012
Committed: https://opendev.org/openstack/nova/commit/59d9871e8a0672538f8ffc43ae99b3d1c4b08909
Submitter: "Zuul (22348)"
Branch: master

commit 59d9871e8a0672538f8ffc43ae99b3d1c4b08909
Author: Mark Goddard <email address hidden>
Date: Tue Nov 19 14:45:02 2019 +0000

    Add functional regression test for bug 1853009

    Bug 1853009 describes a race condition involving multiple nova-compute
    services with ironic. As the compute services start up, the hash ring
    rebalances, and the compute services have an inconsistent view of which
    is responsible for a compute node.
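    The inconsistent view can be illustrated with the toy Python sketch
    below. It uses a hypothetical consistent hash ring (ToyHashRing, with
    MD5-based placement and 32 points per host), not the hash ring that nova
    or ironic actually use. It assumes that, mid-startup, host1 has already
    seen host2 join while host2 has not yet seen host1, so both hosts
    consider some of the same nodes their own.

```python
import bisect
import hashlib


def _hash(key):
    # Stable 64-bit integer derived from MD5, so placement is reproducible.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class ToyHashRing:
    """Hypothetical consistent hash ring; not nova's or ironic's implementation."""

    def __init__(self, hosts, replicas=32):
        # Each host is placed at `replicas` points around the ring.
        self._ring = sorted(
            (_hash(f"{host}-{i}"), host) for host in hosts for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def owner(self, node):
        # The owner is the first host point clockwise from the node's hash.
        idx = bisect.bisect(self._points, _hash(node)) % len(self._ring)
        return self._ring[idx][1]


nodes = [f"node-{i}" for i in range(100)]

# Mid-startup: host1 already sees host2, but host2 has not yet seen host1.
view_of_host1 = ToyHashRing(["host1", "host2"])
view_of_host2 = ToyHashRing(["host2"])

# Nodes that both services believe they are responsible for.
contested = [
    node
    for node in nodes
    if view_of_host1.owner(node) == "host1" and view_of_host2.owner(node) == "host2"
]
print(f"{len(contested)} of {len(nodes)} nodes are claimed by both hosts")
```

    Once both services see the same membership, their rings are identical
    and every node has exactly one owner; the window in between is where
    the race occurs.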

    The sequence of actions here is adapted from a real world log [1], where
    multiple nova-compute services were started simultaneously. In some
    cases mocks are used to simulate race conditions.

    There are three main issues with the behaviour:

    * host2 deletes the orphaned compute node after host1 has taken
      ownership of it.

    * host1 assumes that another compute service will not delete its nodes.
      Once a node is in rt.compute_nodes, it is not removed again unless the
      node is orphaned. This prevents host1 from recreating the compute
      node.

    * host1 assumes that another compute service will not delete its
      resource providers. Once a resource provider (RP) is in the provider
      tree, it is not removed. This prevents host1 from recreating the RP.
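    A minimal toy model of the first two issues is sketched below. It
    assumes a shared dict in place of the database and a hypothetical
    ToyComputeService class whose compute_nodes set is modelled on
    rt.compute_nodes; none of these names are nova's real API. host2
    deletes "orphans" based on a stale snapshot, and host1 then skips the
    node because it is already cached. The third issue follows the same
    pattern with resource providers in the provider tree.

```python
class ToyComputeService:
    """Hypothetical stand-in for a nova-compute service and its resource tracker."""

    def __init__(self, host, db):
        self.host = host
        self.db = db                # shared "database": node name -> owning host
        self.compute_nodes = set()  # local cache, modelled on rt.compute_nodes

    def ensure_node(self, node):
        # Flawed assumption: a cached node is never re-checked against the
        # database, so a record deleted by another service is not recreated.
        if node in self.compute_nodes:
            return
        self.db[node] = self.host
        self.compute_nodes.add(node)

    def delete_orphans(self, db_snapshot, my_nodes):
        # db_snapshot was read earlier and may be stale. Any node it lists for
        # this host that the hash ring no longer assigns here is deleted,
        # even if another host has taken ownership since the snapshot.
        for node, host in db_snapshot.items():
            if host == self.host and node not in my_nodes:
                self.db.pop(node, None)


db = {"node-a": "host2"}  # host2 managed node-a before the rebalance
host1 = ToyComputeService("host1", db)
host2 = ToyComputeService("host2", db)

snapshot = dict(db)                             # host2 reads its nodes
host1.ensure_node("node-a")                     # host1 takes ownership
host2.delete_orphans(snapshot, my_nodes=set())  # issue 1: deletes host1's node
host1.ensure_node("node-a")                     # issue 2: cached, not recreated

print(db)  # prints {}
```

    After this sequence node-a is absent from the database even though
    host1 still believes it owns the node.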

    This functional test documents the current behaviour, with the idea that
    it can be updated as this behaviour is fixed.

    [1] http://paste.openstack.org/show/786272/

    Co-Authored-By: Matt Riedemann <email address hidden>

    Change-Id: Ice4071722de54e8d20bb8c3795be22f1995940cd
    Related-Bug: #1853009
    Related-Bug: #1853159