conductor: Recreate volume attachments during a reschedule
When an instance with attached volumes fails to spawn, cleanup code
within the compute manager (_shutdown_instance called from
_build_resources) will delete the volume attachments referenced by
the bdms in Cinder. As a result we should check and if necessary
recreate these volume attachments when rescheduling an instance.
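To make the failure mode concrete, here is a heavily simplified, hypothetical sketch of that cleanup step, assuming volume_api is Nova's cinder API wrapper and the bdms are BlockDeviceMapping objects; the real code in nova.compute.manager does considerably more than this:

    def delete_build_attachments(context, volume_api, bdms):
        # Hypothetical helper approximating the cleanup done when a build
        # fails: delete the Cinder attachments referenced by the bdms.
        for bdm in bdms:
            if bdm.is_volume and bdm.attachment_id:
                # The volume is unreserved in Cinder, but the bdm still
                # records the now-deleted attachment_id, which a later
                # reschedule would otherwise try to reuse.
                volume_api.attachment_delete(context, bdm.attachment_id)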
Note that there are a few different ways to fix this bug by
making changes to the compute manager code, either by not deleting
the volume attachment on failure before rescheduling [1] or by
performing the get/create check during each build after the
reschedule [2].
The problem with *not* cleaning up the attachments is that if we don't
reschedule, then we've left orphaned "reserved" volumes in Cinder
(or we have to add special logic to tell compute when to clean up
attachments).
The problem with checking the existence of the attachment on every
new host we build on is that we'd be needlessly performing that check
for initial creates even when we never need to reschedule, unless,
again, we add special logic to avoid it (like checking whether we've
rescheduled at all).
Also, either of those options involves changes to the compute service,
which means that older computes might not have the fix.
So ultimately it seems that the best way to handle this is:
1. Only deal with this on reschedules.
2. Let the cell conductor orchestrate it since it's already dealing
with the reschedule. Then the compute logic doesn't need to change.
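As a rough illustration of what that conductor-side step could look like, below is a minimal, hedged sketch; the helper name is hypothetical, and only the attachment_get/attachment_create calls are modelled on Nova's cinder volume API (the actual change is the one linked in the review below):

    from nova import exception

    def recreate_missing_attachments(context, volume_api, instance, bdms):
        # Hypothetical helper run only from the conductor's reschedule
        # path: recreate any volume attachment the failed host deleted.
        for bdm in bdms:
            if not (bdm.is_volume and bdm.attachment_id):
                continue
            try:
                # If the attachment still exists there is nothing to do.
                volume_api.attachment_get(context, bdm.attachment_id)
            except exception.VolumeAttachmentNotFound:
                # The failed build deleted it; create a fresh attachment
                # and store its id so the next host can use it.
                attachment = volume_api.attachment_create(
                    context, bdm.volume_id, instance.uuid)
                bdm.attachment_id = attachment['id']
                bdm.save()

Because this only runs on the conductor's reschedule path, first-time builds pay no extra Cinder calls and older compute nodes need no changes.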
Reviewed:  https://review.openstack.org/612496
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3a0f26c822912dbaad7c8a1d6a3c599025afab37
Submitter: Zuul
Branch: stable/queens
commit 3a0f26c822912dbaad7c8a1d6a3c599025afab37
Author: Lee Yarwood <email address hidden>
Date: Mon Jul 30 13:41:35 2018 +0100
conductor: Recreate volume attachments during a reschedule
When an instance with attached volumes fails to spawn, cleanup code
within the compute manager (_shutdown_instance called from
_build_resources) will delete the volume attachments referenced by
the bdms in Cinder. As a result we should check and if necessary
recreate these volume attachments when rescheduling an instance.
Note that there are a few different ways to fix this bug by
making changes to the compute manager code, either by not deleting
the volume attachment on failure before rescheduling [1] or by
performing the get/create check during each build after the
reschedule [2].
The problem with *not* cleaning up the attachments is that if we don't
reschedule, then we've left orphaned "reserved" volumes in Cinder
(or we have to add special logic to tell compute when to clean up
attachments).
The problem with checking the existence of the attachment on every
new host we build on is that we'd be needlessly performing that check
for initial creates even when we never need to reschedule, unless,
again, we add special logic to avoid it (like checking whether we've
rescheduled at all).
Also, either of those options involves changes to the compute service,
which means that older computes might not have the fix.
So ultimately it seems that the best way to handle this is:
1. Only deal with this on reschedules.
2. Let the cell conductor orchestrate it since it's already dealing
with the reschedule. Then the compute logic doesn't need to change.
[1] https://review.openstack.org/#/c/587071/3/nova/compute/manager.py@1631
[2] https://review.openstack.org/#/c/587071/4/nova/compute/manager.py@1667
Conflicts:
      nova/tests/unit/conductor/test_conductor.py
NOTE(mriedem): There was a minor conflict due to not having change
I56fb1fd984f06a58c3a7e8c2596471991950433a in Queens.

Change-Id: I739c06bd02336bf720cddacb21f48e7857378487
Closes-bug: #1784353
(cherry picked from commit 41452a5c6adb8cae54eef24803f4adc468131b34)
(cherry picked from commit d3397788fe2d9267c34698d9459b0abe3f215046)