Live migration of a VM using the single port binding workflow is broken in Train as a result of the introduction of SR-IOV live migration
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | High | sean mooney |
Train | Fix Released | High | sean mooney |
Ussuri | Fix Released | High | Billy Olsen |
Ubuntu Cloud Archive | Fix Released | Undecided | Unassigned |
Train | Fix Released | Undecided | Unassigned |
Ussuri | Fix Released | Undecided | Unassigned |
Victoria | Fix Released | Undecided | Unassigned |
networking-opencontrail | New | Undecided | Unassigned |
nova (Ubuntu) | Fix Released | Undecided | Unassigned |
Focal | Fix Released | High | Unassigned |
Groovy | Fix Released | Undecided | Unassigned |
Bug Description
[Impact]
Live migration of instances in an environment that uses neutron backends that do not support multiple port bindings will fail with error 'NotImplemented', effectively rendering live-migration inoperable in these environments.
This is fixed by first checking to ensure the backend supports the multiple port bindings before providing the port bindings.
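The guard described above can be sketched roughly as follows. This is a minimal, self-contained illustration rather than the actual nova patch: `MigrateData`, `FakeNetworkAPI`, `supports_multiple_port_bindings` and `build_migrate_data` are hypothetical names chosen for this example, and the dictionaries stand in for the real port binding objects.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MigrateData:
    """Hypothetical stand-in for the data handed from destination to source."""
    # None means "fall back to the single port binding workflow".
    vifs: Optional[List[dict]] = None


class FakeNetworkAPI:
    """Hypothetical network API wrapper (assumption, not nova's real class)."""

    def __init__(self, multiple_bindings_supported: bool):
        self._supported = multiple_bindings_supported

    def supports_multiple_port_bindings(self) -> bool:
        return self._supported


def build_migrate_data(network_api: FakeNetworkAPI, port_ids: List[str]) -> MigrateData:
    """Populate per-port binding data only when the backend supports it."""
    data = MigrateData()
    if network_api.supports_multiple_port_bindings():
        data.vifs = [{"port_id": port_id} for port_id in port_ids]
    # Otherwise vifs stays None and later code must not touch binding details.
    return data
```

With a backend that lacks multiple port bindings, `build_migrate_data(FakeNetworkAPI(False), ["p1"]).vifs` stays `None`, so nothing downstream is handed partially populated binding data.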
[Test Plan]
1. Deploy a Train/Ussuri OpenStack cloud with at least 2 compute nodes using an SDN that does not support multiple port bindings (e.g. opencontrail).
2. Attempt to perform a live migration of an instance.
3. Observe that, without this fix, the live migration fails with the trace below (NotImplemented).
[Where problems could occur]
This affects the live migration code, so likely problems would arise in this area. Specifically, the check introduced is guarding information provided for instances using SR-IOV indirect migration.
Regressions would likely occur in the form of live migration errors around features that rely on multiple port bindings (e.g. SR-IOV) rather than in the more generic/common use case. Errors may be seen in standard network providers that are included with distro packaging, but may also be seen in scenarios where proprietary SDNs are used.
[Original Description]
It was working in Queens but fails in Train. nova-compute at the target aborts with the exception:
Traceback (most recent call last):
File "/usr/lib/
res = self.dispatcher
File "/usr/lib/
return self._do_
File "/usr/lib/
result = func(ctxt, **new_args)
File "/usr/lib/
function_name, call_dict, binary, tb)
File "/usr/lib/
self.
File "/usr/lib/
six.
File "/usr/lib/
return f(self, context, *args, **kw)
File "/usr/lib/
return function(self, context, *args, **kwargs)
File "/usr/lib/
kwargs[
File "/usr/lib/
File "/usr/lib/
six.
return function(self, context, *args, **kwargs)
File "/usr/lib/
bdm.save()
File "/usr/lib/
self.
File "/usr/lib/
six.
File "/usr/lib/
migrate_data)
File "/usr/lib/
instance, network_info, migrate_data)
File "/usr/lib/
vif_
File "/usr/lib/
vif['type'] = self.vif_type
File "/usr/lib/
self.
File "/usr/lib/
_("Cannot load '%s' in the base class") % attrname)
NotImplementedE
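The last frames of the trace suggest that a field (`vif_type`) was read on an object where it had never been set, and that the base class had no way to load it. A minimal sketch of that failure mode is given here; `BaseObject` and `VIFMigrateData` are simplified stand-ins written for this example, not the real nova or oslo.versionedobjects classes, and the field list is an assumption.

```python
class BaseObject:
    """Simplified stand-in for a versioned object with lazy-loadable fields."""

    fields = ()

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails, i.e. the field is unset.
        if name in type(self).fields:
            return self.obj_load_attr(name)
        raise AttributeError(name)

    def obj_load_attr(self, attrname):
        # The base class cannot lazy-load anything on its own.
        raise NotImplementedError(
            "Cannot load '%s' in the base class" % attrname)


class VIFMigrateData(BaseObject):
    """Hypothetical per-port migration data with only some fields populated."""

    fields = ("port_id", "vif_type")

    def get_dest_vif(self):
        # Reading an unset field triggers obj_load_attr and therefore the error.
        return {"id": self.port_id, "type": self.vif_type}


vif = VIFMigrateData()
vif.port_id = "p1"  # vif_type is deliberately left unset
try:
    vif.get_dest_vif()
except NotImplementedError as exc:
    print("live migration would abort here:", exc)
```

In this sketch the error comes from reading the unset field, not from the migration logic, which is why skipping the population of this data for backends without multiple port bindings avoids it.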
steps to reproduce:
- train centos 7 based deployment: 1 controller, 2 computes, libvirt + qemu-kvm, ceph shared storage, neutron with contrail vrouter virtual network;
- create and start a vm;
- live migrate it between computes.
expected result: vm migrates successfully.
rpm -qa | grep nova:
python2-
openstack-
python2-
openstack-
Changed in nova: | |
assignee: | nobody → Sergey Galas' (shrike742) |
Changed in nova: | |
assignee: | Sergey Galas' (shrike742) → Kirill Egorov (kegorov-progmaticlab) |
status: | New → In Progress |
Changed in nova: | |
assignee: | Kirill Egorov (kegorov-progmaticlab) → sean mooney (sean-k-mooney) |
status: | Triaged → In Progress |
Changed in nova (Ubuntu Groovy): | |
status: | New → Fix Released |
Changed in nova (Ubuntu Focal): | |
status: | New → Triaged |
importance: | Undecided → High |
description: | updated |
As requested in https://review.opendev.org/#/c/742180/4/nova/objects/migrate_data.py@97
can you please provide additional logs and reproduction steps?
Specifically, the nova-compute logs from the source and destination compute nodes, plus the conductor logs and, ideally, the neutron server logs for this instance.