Force live migrate doesn't claim resources on the target host
Bug #1712008 reported by Lajos Katona
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Critical | Matt Riedemann |
Pike | Fix Committed | Critical | Matt Riedemann |
Bug Description
During a forced live migration, nova doesn't claim the resources on the target host as expected. See the sequence:
* Boot a VM.
* Force live migrate the VM.
* Check the allocations:
** the claims are still on the source host.
** on the destination there is no claim.
This situation doesn't change after running the periodics.
The test that contains the expected assertions (commented out now):
https:/
nova commit: 08ec8a1ad3f3492
tags: added: live-migration placement
tags: added: pike-rc-potential
Changed in nova:
status: New → Triaged
The problem is here:
https://github.com/openstack/nova/blob/16.0.0.0rc1/nova/conductor/tasks/live_migrate.py#L51-L56
When a host is forced, conductor bypasses the call to scheduler_client.select_destinations, which is the code that eventually creates the allocation on the destination host:
https://github.com/openstack/nova/blob/16.0.0.0rc1/nova/scheduler/client/report.py#L147
And due to this change:
https://review.openstack.org/#/c/491012/
If all of your computes are upgraded, the resource tracker isn't going to "heal" the allocations on the target host during its update_available_resources periodic task.
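The control flow described above can be modelled with a small toy sketch. Everything in it is a hypothetical stand-in (`FakePlacement`, this `select_destinations`, and `live_migrate` are not nova's real code). It only illustrates why skipping the scheduler leaves the destination without an allocation:

```python
# Toy model only: these names are hypothetical stand-ins, not nova's real API.
class FakePlacement:
    """Maps host name -> set of instance IDs holding an allocation there."""

    def __init__(self):
        self.allocations = {}

    def claim(self, host, instance):
        self.allocations.setdefault(host, set()).add(instance)


def select_destinations(placement, instance, candidates):
    """Stand-in for the scheduler: pick a host and claim resources on it."""
    host = candidates[0]
    placement.claim(host, instance)
    return host


def live_migrate(placement, instance, candidates, forced_host=None):
    """Stand-in for the conductor task: a forced host skips the scheduler."""
    if forced_host is None:
        return select_destinations(placement, instance, candidates)
    # Forced path: the scheduler is bypassed, so nothing claims on the target.
    return forced_host


placement = FakePlacement()
placement.claim("source", "vm-1")  # allocation made when the VM was booted

dest = live_migrate(placement, "vm-1", ["dest"], forced_host="dest")
print(dest, placement.allocations)
```

In this model the forced call returns the requested host but the allocation map still lists only the source, which matches the symptom in the bug description.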
Thinking of solutions:
1. Both paths are going to eventually call check_can_live_migrate_destination on the destination compute host so we could create the allocation there, although that gets tricky since it could overwrite any allocations that the scheduler created via select_destinations if a host isn't forced.
2. Just call placement from conductor when a host is forced, somewhere in this else block:
https://github.com/openstack/nova/blob/16.0.0.0rc1/nova/conductor/tasks/live_migrate.py#L56
That's probably the cleanest since it wouldn't overwrite any allocations by the scheduler, since the scheduler isn't called, and it would actually make the destination host allocations correct before the RT could heal them, assuming not all compute nodes are upgraded yet.
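A rough sketch of option 2, under loud assumptions: the function name, the `scheduler_claimed` flag, and the dict-of-sets allocation model are all hypothetical and do not reflect nova's conductor or placement client. The point is only that the claim is added on the forced path and left alone otherwise:

```python
# Toy model only: hypothetical names, not nova's conductor or placement client.
def claim_on_forced_host(allocations, instance, dest, scheduler_claimed):
    """Add a destination allocation only when the scheduler was bypassed.

    `allocations` maps host name -> set of instance IDs allocated on it.
    """
    if scheduler_claimed:
        # Unforced path: the scheduler already claimed, so leave it untouched.
        return allocations
    # Forced path: copy the map and record the claim on the requested host.
    updated = {host: set(instances) for host, instances in allocations.items()}
    updated.setdefault(dest, set()).add(instance)
    return updated


before = {"source": {"vm-1"}}
after = claim_on_forced_host(before, "vm-1", "dest", scheduler_claimed=False)
print(after)
```

What happens to the source host's allocation once the migration completes is outside the scope of this sketch.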