mnaser reported a weird case where an instance was found
in both cell0 (deleted there) and in cell1 (not deleted
there but in error state from a failed build). It's unclear
how this could happen besides some weird clustered rabbitmq
issue where maybe the schedule and build request to conductor
happens twice for the same instance and one picks a host and
tries to build and the other fails during scheduling and is
buried in cell0.
To avoid a split brain situation like this, we add a sanity
check in _bury_in_cell0 to make sure the instance mapping is
not pointing at a cell when we go to update it to cell0.
Similarly a check is added in the schedule_and_build_instances
flow (the code is moved to a private method to make it easier
to test).
Worst case is this is unnecessary but doesn't hurt anything,
best case is this helps avoid split brain clustered rabbit
issues.
Closes-Bug: #1775934
Change-Id: I335113f0ec59516cb337d34b6fc9078ea202130f
(cherry picked from commit 5b552518e1abdc63fb33c633661e30e4b2fe775e)
Reviewed: https://review.opendev.org/752279
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=efc35b1c5293c7c6c85f8cf9fd9d8cd8de71d1d5
Submitter: Zuul
Branch: stable/train
commit efc35b1c5293c7c6c85f8cf9fd9d8cd8de71d1d5
Author: Matt Riedemann <email address hidden>
Date: Fri Sep 20 17:07:35 2019 -0400
Sanity check instance mapping during scheduling
mnaser reported a weird case where an instance was found
in both cell0 (deleted there) and in cell1 (not deleted
there but in error state from a failed build). It's unclear
how this could happen besides some weird clustered rabbitmq
issue where maybe the schedule and build request to conductor
happens twice for the same instance and one picks a host and
tries to build and the other fails during scheduling and is
buried in cell0.
To avoid a split brain situation like this, we add a sanity
check in _bury_in_cell0 to make sure the instance mapping is
not pointing at a cell when we go to update it to cell0.
Similarly a check is added in the schedule_and_build_instances
flow (the code is moved to a private method to make it easier
to test).
Worst case is this is unnecessary but doesn't hurt anything,
best case is this helps avoid split brain clustered rabbit
issues.
Closes-Bug: #1775934
Change-Id: I335113f0ec59516cb337d34b6fc9078ea202130f
(cherry picked from commit 5b552518e1abdc63fb33c633661e30e4b2fe775e)
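
The sanity check described in the commit message can be sketched roughly as follows. This is an illustration only, not the code from the nova change: the CellMapping and InstanceMapping classes and the map_instance_to_cell helper are hypothetical stand-ins, and the sketch ignores the database and any concurrency between conductors. It only shows the guard itself: refuse to repoint a mapping that already points at a cell.

```python
import logging
from dataclasses import dataclass
from typing import Optional

LOG = logging.getLogger(__name__)


@dataclass
class CellMapping:
    # Hypothetical stand-in for a cell record (not nova's object).
    name: str


@dataclass
class InstanceMapping:
    # Hypothetical stand-in: which cell, if any, an instance lives in.
    instance_uuid: str
    cell_mapping: Optional[CellMapping] = None


def map_instance_to_cell(mapping: InstanceMapping,
                         target: CellMapping) -> bool:
    """Point the mapping at ``target`` only if it is not mapped yet.

    Returns True if the mapping was updated, False if it was already
    pointing at a cell and was therefore left untouched.
    """
    if mapping.cell_mapping is not None:
        # Another request already claimed this instance for a cell;
        # updating it again would leave the instance in two cells.
        LOG.error(
            "Instance %s is already mapped to cell %s; refusing to "
            "remap it to %s.",
            mapping.instance_uuid, mapping.cell_mapping.name, target.name)
        return False
    mapping.cell_mapping = target
    return True


# Simulate the duplicated request: the first one schedules to cell1,
# the second one fails scheduling and tries to bury the instance in cell0.
cell0 = CellMapping("cell0")
cell1 = CellMapping("cell1")
im = InstanceMapping("hypothetical-uuid")
first = map_instance_to_cell(im, cell1)
second = map_instance_to_cell(im, cell0)
```

In this sketch the second call is rejected and the mapping stays on cell1, which is the outcome the commit message is aiming for when the same build request is processed twice.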