BDMNotFound raised and stale block devices left over when simultaneously rebooting and deleting an instance
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Undecided | Lee Yarwood |
Queens | Fix Released | Undecided | Lee Yarwood |
Rocky | Fix Committed | Undecided | Lee Yarwood |
Stein | Fix Committed | Undecided | Lee Yarwood |
Train | Fix Committed | Undecided | Lee Yarwood |
Bug Description
Description
===========
Simultaneous requests to reboot and delete an instance _will_ race, as only the call to delete takes a lock against the instance.uuid.
One possible outcome, seen in the wild with the Libvirt driver, is that the soft reboot request eventually turns into a hard reboot that reconnects volumes the delete request has already disconnected. The delete request then unmaps these volumes on the Cinder side, leaving stale devices on the host. Additionally, the reboot operation raises BDMNotFound because the delete operation has already deleted the BDMs.
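The interleaving described above can be sketched as a small, deterministic model. This is only an illustration under stated assumptions: `FakeHost`, its methods, and the local `BDMNotFound` class are hypothetical stand-ins invented for this sketch, not Nova, Libvirt, or Cinder code.

```python
class BDMNotFound(Exception):
    """Hypothetical stand-in for the exception named in this report."""


class FakeHost:
    """Toy model of one instance's volumes (assumption, not Nova code)."""

    def __init__(self, volumes):
        self.bdms = set(volumes)          # block device mappings in the DB
        self.mapped = set(volumes)        # volumes mapped on the backend
        self.host_devices = set(volumes)  # block devices present on the host

    def disconnect_volumes(self):
        self.host_devices.clear()

    def connect_volumes(self, volumes):
        self.host_devices.update(volumes)

    def unmap_and_delete_bdms(self):
        self.mapped.clear()
        self.bdms.clear()

    def get_bdm(self, volume):
        if volume not in self.bdms:
            raise BDMNotFound(volume)
        return volume

    def stale_devices(self):
        # Devices still on the host whose backing volume is no longer mapped.
        return self.host_devices - self.mapped


def racy_interleaving():
    host = FakeHost({"vol-1"})
    reboot_view = set(host.bdms)       # reboot: reads volumes before the delete
    host.disconnect_volumes()          # delete: disconnects volumes on the host
    host.connect_volumes(reboot_view)  # reboot: hard reboot reconnects them
    host.unmap_and_delete_bdms()       # delete: unmaps volumes, removes BDMs
    error = None
    try:
        for volume in reboot_view:     # reboot: looks up BDMs that are now gone
            host.get_bdm(volume)
    except BDMNotFound as exc:
        error = exc
    return host.stale_devices(), error


stale, error = racy_interleaving()
print("stale devices:", stale)
print("reboot error:", type(error).__name__)
```

With no lock shared between the two operations, nothing prevents this ordering, which is why both symptoms can appear together.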
Steps to reproduce
==================
$ nova reboot $instance && nova delete $instance
Expected result
===============
The instance reboots and is then deleted without any errors raised.
Actual result
=============
BDMNotFound raised and stale block devices left over.
Environment
===========
1. Exact version of OpenStack you are running. See the following
list for all releases: http://
1599e3cf68779ea
2. Which hypervisor did you use?
(For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
What's the version of that?
Libvirt + QEMU/kvm
2. Which storage type did you use?
(For example: Ceph, LVM, GPFS, ...)
What's the version of that?
N/A
3. Which networking type did you use?
(For example: nova-network, Neutron with OpenVSwitch, ...)
N/A
Logs & Configs
==============
Changed in nova:
assignee: nobody → Lee Yarwood (lyarwood)
status: New → In Progress
Reviewed: https://review.opendev.org/673463
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9ad54f3dacbd372271f441baea5380f913072dde
Submitter: Zuul
Branch: master
commit 9ad54f3dacbd372271f441baea5380f913072dde
Author: Lee Yarwood <email address hidden>
Date: Mon Jul 29 16:25:45 2019 +0100
compute: Take an instance.uuid lock when rebooting
Previously simultaneous requests to reboot and delete an instance could
race as only the latter took a lock against the uuid of the instance.
With the Libvirt driver this race could potentially result in attempts
being made to reconnect previously disconnected volumes on the host.
Depending on the volume backend being used this could then result in
stale block devices pointing to unmapped volumes being left on the host
that in turn could cause failures later on when connecting newly mapped
volumes.
This change avoids this race by ensuring any request to reboot an
instance takes an instance.uuid lock within the compute manager,
serialising requests to reboot and then delete the instance.
Closes-Bug: #1838392
Change-Id: Ieb59de10c63bb067f92ec054535766cdd722dae2
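The serialisation described in the commit message can be illustrated with a minimal sketch that uses only the Python standard library. It assumes a per-instance lock keyed on the uuid. The names `instance_lock`, `reboot_instance`, `delete_instance`, and `run_demo` are hypothetical and do not reproduce Nova's compute manager or its locking helpers.

```python
import threading
import time
from collections import defaultdict

# One lock per instance uuid (hypothetical registry for this sketch).
_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()


def instance_lock(uuid):
    """Return the lock shared by every operation on the given uuid."""
    with _locks_guard:
        return _locks[uuid]


def reboot_instance(uuid, events):
    # The fix: reboot now takes the same per-uuid lock as delete.
    with instance_lock(uuid):
        events.append(("reboot", "start"))
        time.sleep(0.05)  # simulate a slow soft-to-hard reboot
        events.append(("reboot", "end"))


def delete_instance(uuid, events):
    with instance_lock(uuid):
        events.append(("delete", "start"))
        time.sleep(0.01)  # simulate volume disconnect and BDM cleanup
        events.append(("delete", "end"))


def run_demo(uuid):
    events = []
    threads = [
        threading.Thread(target=reboot_instance, args=(uuid, events)),
        threading.Thread(target=delete_instance, args=(uuid, events)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return events


events = run_demo("fake-instance-uuid")
print(events)
```

Whichever thread acquires the lock first, its start and end events are recorded back to back, so the other operation never runs in the middle of it. The lock does not choose which request wins; it only guarantees that the two do not overlap.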