port binding 'migrating_to' attribute not cleaned up on failed live migration if using local shared disk
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack Compute (nova) | Fix Released | Low | Matt Riedemann | |
Pike | Confirmed | Low | Unassigned | |
Queens | Fix Committed | Low | Brian Haley | |
Bug Description
This code was added back in Newton: https:/
That plumbs a 'migrating_to' attribute in the port binding profile during live migration. It's needed on the neutron side when live migrating an instance with floating IPs using DVR.
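As a rough illustration of what "plumbing" the attribute means, here is a minimal sketch that models a port as a plain dict. The `binding:profile` key and the `migrating_to` attribute come from the port binding described above; the helper names `set_migrating_to` and `clear_migrating_to` are hypothetical and are not nova's actual functions.

```python
import copy

BINDING_PROFILE = "binding:profile"  # port attribute that holds the binding profile
MIGRATING_ATTR = "migrating_to"      # attribute named in this bug report


def set_migrating_to(port, dest_host):
    """Hypothetical helper: return a copy of the port whose binding
    profile records the live migration destination host."""
    updated = copy.deepcopy(port)
    profile = dict(updated.get(BINDING_PROFILE) or {})
    profile[MIGRATING_ATTR] = dest_host
    updated[BINDING_PROFILE] = profile
    return updated


def clear_migrating_to(port):
    """Hypothetical helper: return a copy of the port with the
    migrating_to attribute removed from the binding profile."""
    updated = copy.deepcopy(port)
    profile = dict(updated.get(BINDING_PROFILE) or {})
    profile.pop(MIGRATING_ATTR, None)  # no error if it was never set
    updated[BINDING_PROFILE] = profile
    return updated
```

In this model, a successful or failed live migration should always end with the equivalent of `clear_migrating_to()` having been applied to every port of the instance.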
When live migration completes, either successfully or due to failure, the migrating_to attribute should be cleaned up. This happens via the setup_networks_
The problem is that on a failed live migration, that cleanup only happens if the instance is not using shared local disk storage because of this do_cleanup flag:
This is based purely on code inspection since I don't have a multinode DVR setup with the rbd imagebackend for the libvirt driver to test this out (we could create a CI job to do all that if we wanted to). But it seems pretty obvious that the ComputeManager.
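The suspected failure path can be sketched as a toy model. This is not nova's code: `rollback_live_migration` and its `is_shared_local_disk` parameter are hypothetical names, and the only behavior modeled is the one described above, namely that the destination-side cleanup (and with it the removal of `migrating_to`) is skipped when `do_cleanup` is false.

```python
def rollback_live_migration(binding_profile, is_shared_local_disk):
    """Toy model of the failed-migration path described in this report.

    Returns the binding profile as it would look after rollback.
    """
    profile = dict(binding_profile)

    # Assumption taken from the description: destination cleanup is only
    # requested when the instance is NOT on shared local disk storage.
    do_cleanup = not is_shared_local_disk

    if do_cleanup:
        # Destination-side network teardown runs, which clears the attribute.
        profile.pop("migrating_to", None)

    # When do_cleanup is False nothing removes migrating_to, so it is
    # left behind in the binding profile.
    return profile


before = {"migrating_to": "dest-host"}
non_shared = rollback_live_migration(before, is_shared_local_disk=False)
shared = rollback_live_migration(before, is_shared_local_disk=True)
```

Under these assumptions, `non_shared` ends up without the attribute while `shared` still carries the stale `migrating_to` entry, which is the behavior this bug is about.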
Having said all this, this code has been in nova since Newton and the DVR migrating_to changes have been in neutron since Mitaka, and no one has reported this problem. So either it's not widely used, or it doesn't cause much of a problem if we don't clean up the migrating_to entry in the binding profile on a failed live migration, although I'd think neutron should clean up the floating IP router gateway that DVR creates on the dest host.
FWIW, calling setup_networks_on_host() from post_live_migration_at_destination() was added way back in Essex:
https://github.com/openstack/nova/commit/0c7a54b3b44f849bf397bb4068ab16c576c3559c
So it was primarily a nova-network thing (was Quantum even being used in Essex?).