Concurrent migration of VMs with the same multiattach volume fails
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | In Progress | Low | Unassigned |
Bug Description
Steps to reproduce:
1. Create multiple VMs.
2. Create a multiattach volume.
3. Attach the volume to all VMs.
4. Shut down all VMs and migrate them all at the same time.
5. One of the VM migrations fails (a scripted sketch of these steps follows the list).
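A scripted sketch of these steps, assuming an openstacksdk clouds.yaml profile, pre-existing VMs named vm1..vm3, and a volume type named 'multiattach' whose multiattach="<is> True" extra spec is set (the cloud name, server names, and volume type name are placeholders, not values from this report):

import openstack

# Placeholder cloud name; any admin-capable clouds.yaml entry works.
conn = openstack.connect(cloud='devstack-admin')

# 1. The VMs are assumed to exist already (vm1..vm3 are placeholders).
servers = [conn.compute.find_server(name) for name in ('vm1', 'vm2', 'vm3')]

# 2. Create a multiattach volume; 'multiattach' must be a volume type whose
#    extra spec multiattach="<is> True" is set.
volume = conn.block_storage.create_volume(size=1, name='shared-vol',
                                          volume_type='multiattach')
conn.block_storage.wait_for_status(volume, status='available')

# 3. Attach the shared volume to every VM.
for server in servers:
    conn.compute.create_volume_attachment(server, volume_id=volume.id)

# 4. Shut the VMs down, then trigger all cold migrations back to back.
for server in servers:
    conn.compute.stop_server(server)
for server in servers:
    conn.compute.migrate_server(server)

# 5. One of the migrations can then fail with the error shown below.

Cold migration is asynchronous, so issuing the migrate calls back to back is normally enough to make the per-VM attachment_create requests overlap on the shared volume.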
The nova-compute log is as follows:
2022-04-11 16:49:46.685 23871 ERROR nova.compute. [traceback truncated]
VM migration process:
1. On the source node, Nova calls Cinder's attachment_create, which sets the multiattach volume's status to 'reserved'.
2. The VM is migrated.
3. On the destination node, Nova calls Cinder's attachment_update, which stores the connection_info in the attachment record and sets the volume status to 'attaching'.
4. On the destination node, Nova calls Cinder's attachment_complete, which sets the volume status to 'in-use' (a sketch of these API calls follows this list).
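A rough sketch of the equivalent Cinder API calls using python-cinderclient (microversion 3.44 or later is assumed so attachment_complete is available); the credentials, IDs, and connector contents below are placeholders rather than values from this report:

from cinderclient import client as cinder_client
from keystoneauth1 import session as ks_session
from keystoneauth1.identity import v3 as ks_v3

# Placeholder credentials; real values depend on the deployment.
auth = ks_v3.Password(auth_url='http://controller:5000/v3',
                      username='admin', password='secret',
                      project_name='admin', user_domain_name='Default',
                      project_domain_name='Default')
cinder = cinder_client.Client('3.44', session=ks_session.Session(auth=auth))

volume_id = 'VOLUME_UUID'        # placeholder
instance_uuid = 'INSTANCE_UUID'  # placeholder
dest_connector = {'host': 'dest-host', 'ip': '192.0.2.10',
                  'initiator': 'iqn.1994-05.com.example:dest',
                  'multipath': False, 'os_type': 'linux', 'platform': 'x86_64'}

# Step 1 (source node): attachment_create with no connector reserves the
# volume; the multiattach volume moves to 'reserved'.
attachment = cinder.attachments.create(volume_id, None, instance_uuid)

# Step 3 (destination node): attachment_update supplies the destination
# host's connector; the volume moves to 'attaching'.
cinder.attachments.update(attachment['id'], dest_connector)

# Step 4 (destination node): attachment_complete marks the attachment as
# finished; the volume moves to 'in-use'.
cinder.attachments.complete(attachment['id'])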
The migration fails because the multiattach volume's status changes to 'attaching' after step 3 of this process. If another VM's migration reaches step 1 while the volume is still 'attaching', Cinder judges the volume status to be invalid and the attachment_create call fails.
The reservation is rejected in Cinder's _attachment_reserve (cinder/volume/api.py):

def _attachment_reserve(self, ctxt, vref, instance_uuid=None):
    # NOTE(jdg): Reserved is a special case, we're avoiding allowing
    # creation of other new reserves/attachments while in this state
    # so we avoid contention issues with shared connections

    # Multiattach of bootable volumes is a special case with it's own
    # policy, check that here right off the bat
    if (vref.get('multiattach', False) and
            vref.status == 'in-use' and vref.bootable):
        ctxt.authorize(attachment_policy.MULTIATTACH_BOOTABLE_VOLUME_POLICY,
                       target_obj=vref)

    # FIXME(JDG): We want to be able to do things here like reserve a
    # volume for Nova to do BFV WHILE the volume may be in the process of
    # downloading image, we add downloading here; that's easy enough but
    # we've got a race between with the attaching/detaching that we do
    # locally on the Cinder node. Just come up with an easy way to
    # determine if we're attaching to the Cinder host for some work or if
    # we're being used by the outside world.
    expected = {'multiattach': vref.multiattach,
                'status': (('available', 'in-use', 'downloading')
                           if vref.multiattach
                           else ('available', 'downloading'))}

    result = vref.conditional_update({'status': 'reserved'}, expected)

    if not result:
        override = False
        if instance_uuid and vref.status in ('in-use', 'reserved'):
            # Refresh the volume reference in case multiple instances were
            # being concurrently attached to the same non-multiattach
            # volume.
            vref = objects.Volume.get_by_id(ctxt, vref.id)
            for attachment in vref.volume_attachment:
                # Allow the reservation if this volume is already attached
                # to the same instance (e.g. it is being migrated).
                if attachment.instance_uuid == instance_uuid:
                    override = True
                    break

        if not override:
            msg = (_('Volume %(vol_id)s status must be %(statuses)s to '
                     'reserve, but the status is %(current)s.') %
                   {'vol_id': vref.id,
                    'statuses': utils.build_or_str(expected['status']),
                    'current': vref.status})
            raise exception.InvalidVolume(reason=msg)

    values = {'volume_id': vref.id,
              'volume_host': vref.host,
              'attach_status': 'reserved',
              'instance_uuid': instance_uuid}
    db_ref = self.db.volume_attach(ctxt.elevated(), values)
    return objects.VolumeAttachment.get_by_id(ctxt, db_ref['id'])
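In this code path, a multiattach volume that is currently 'attaching' matches neither the expected statuses passed to conditional_update ('available', 'in-use', 'downloading') nor the ('in-use', 'reserved') override check, so the reservation raises InvalidVolume. A minimal, self-contained sketch of that decision (illustrative only, not Cinder code; the names are made up for this example):

# Illustrative constants mirroring _attachment_reserve for a multiattach volume.
EXPECTED_MULTIATTACH = ('available', 'in-use', 'downloading')
OVERRIDE_STATUSES = ('in-use', 'reserved')

def reservation_allowed(status, attached_to_same_instance=True):
    """Return True if attachment_create would be accepted for this status."""
    if status in EXPECTED_MULTIATTACH:
        return True  # conditional_update succeeds, volume becomes 'reserved'
    # conditional_update failed; the override only rescues these statuses
    return attached_to_same_instance and status in OVERRIDE_STATUSES

print(reservation_allowed('in-use'))     # True: the normal concurrent case
print(reservation_allowed('attaching'))  # False: the race reported in this bug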
Tags: migration multiattach
Seems valid indeed; we somehow need to ensure the concurrency mechanism verifies the correct attachment status.