Allocations may not be removed from dest node during failed migrations

Bug #1712411 reported by Matt Riedemann
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
High
Matt Riedemann
Pike
Fix Committed
High
Matt Riedemann

Bug Description

This could also be true for cold migrate/resize/unshelve, but I'm specifically looking at live migration here.

As of this change in Pike:

https://review.openstack.org/#/c/491012/

Once all computes are upgraded, the resource tracker will no longer "heal" allocations in Placement for it's local node, meaning creating allocations for the node if the instance is on it, or removing allocations for the instance if the instance is not on the node.

During live migration, conductor will call the scheduler to select a host which is also going to claim resources against the dest node:

https://github.com/openstack/nova/blob/16.0.0.0rc1/nova/conductor/tasks/live_migrate.py#L181

https://github.com/openstack/nova/blob/16.0.0.0rc1/nova/scheduler/filter_scheduler.py#L287

https://github.com/openstack/nova/blob/16.0.0.0rc1/nova/scheduler/client/report.py#L147

The problem during live migration is once the scheduler picks a host, conductor performs some additional checks:

https://github.com/openstack/nova/blob/16.0.0.0rc1/nova/conductor/tasks/live_migrate.py#L194

Which could fail, and then conductor will retry the scheduler to get another host, until one is found and passes the pre-migration checks, or the number of retries are exhausted.

The problem is the allocation created in Placement for the destination node, which failed some later pre-migration check, is never cleaned up if the update_available_resource periodic task in the compute manager doesn't clean it up (again, once all computes are upgraded to Pike). This leaves the destination node having resources claimed against it which aren't really on the node.

We could rollback the allocation in conductor on a failure, or we could put some other kind of periodic cleanup task in the compute service which looks for failed migrations where the destination node in the migration record is for that node, and removes any failed allocations for that node and the given instance.

Matt Riedemann (mriedem)
Changed in nova:
status: New → Triaged
Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → High
Revision history for this message
Matt Riedemann (mriedem) wrote :

Semi-related, but probably a separate bug, is that there is the ability to cancel an in-progress live migration and that also does not remove the allocation on the destination node for the instance.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Another semi-related bug Alex Xu pointed out, is that we don't cleanup allocations for the current host when doing a reschedule. The scheduler will claim allocations against the next host, but won't remove them for the current host.

Revision history for this message
Alex Xu (xuhj) wrote :

@Matt, yea, I filed a bug for that https://bugs.launchpad.net/nova/+bug/1712718

Revision history for this message
Matt Riedemann (mriedem) wrote :

After discussing this bug in the nova-scheduler meeting today, we agreed to just fix the problem in the conductor live migration task at the point of failure, rather than work on some periodic cleanup task.

Matt Riedemann (mriedem)
Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/498627

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/498861

Changed in nova:
status: Triaged → In Progress
Revision history for this message
Matt Riedemann (mriedem) wrote :

Following up on some earlier comments, this is already handled for cold migration / resize via this change: https://review.openstack.org/#/c/497606/

Revision history for this message
Matt Riedemann (mriedem) wrote :

Looks like evacuate has some gaps still though, which should go into a new bug:

1. If the user specifies a host with evacuate, conductor bypasses the scheduler and we don't create allocations on the destination host:

https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/conductor/manager.py#L749

This could eventually lead to the claim failing on the destination compute:

https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L2795

2. If the user does not specify a host with evacuate, and the claim (or rebuild) fails on the destination node, we don't cleanup the allocation on the destination node even if the instance isn't spawned on it:

https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L2795

^ is pretty obvious that we should cleanup because the claim for resources failed.

This generic exception handling is harder to know if we should cleanup though since we'd need to know if the guest was spawned on it:

https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L2812

But since we don't set the instance.host/node to the current host/node it won't be reported there anyway:

https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L2822-L2824

I've reported this all as bug 1713786.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Bug 1713796 is reported for some things missing in unshelve instance scenarios.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/498627
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0b05655ef8b62bcb21345e778baeed3114c11571
Submitter: Jenkins
Branch: master

commit 0b05655ef8b62bcb21345e778baeed3114c11571
Author: Matt Riedemann <email address hidden>
Date: Mon Aug 28 20:07:30 2017 -0400

    Add functional recreate test for live migration pre-check fails

    This adds a functional test to recreate bug 1712411 where
    the allocations created against the destination node are not
    cleaned up by conductor when a MigrationPreCheckError happens
    and a rechedule to another host happens (or max retries are
    exceeded which is the case here with only two computes).

    Change-Id: Ieee45521c7e362b7dd053b20d3c39dea330ca080
    Relatd-Bug: #1712411

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/500907

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/500908

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/498861
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=94b904abbad1c9655b6dec1a2e58d73bc913ed47
Submitter: Jenkins
Branch: master

commit 94b904abbad1c9655b6dec1a2e58d73bc913ed47
Author: Matt Riedemann <email address hidden>
Date: Tue Aug 29 12:28:43 2017 -0400

    Cleanup allocations on invalid dest node during live migration

    Starting in Pike, the scheduler creates an allocation on a
    chosen destination node during live migration. This happens
    before the destination node pre-checks occur, which could
    fail and trigger a retry to the scheduler. The allocations
    created in Placement against the failed destination node
    were not being cleaned up though.

    This change adds some cleanup code to the live migration task
    in conductor to clean the allocations for the failed destination
    node before retrying.

    The functional recreate test for the bug is updated to show
    that the bug is fixed now.

    Also updates the docstring in the SchedulerReportClient
    remove_provider_from_instance_allocation method so that we
    no longer have to enumerate all of the places that call it.

    Change-Id: I41e5e1fa9938b5e04f7e20f78ccd77eca658885f
    Closes-Bug: #1712411

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/500907
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=fd59e9adc62bec4502e764ff380a318d27a58b57
Submitter: Jenkins
Branch: stable/pike

commit fd59e9adc62bec4502e764ff380a318d27a58b57
Author: Matt Riedemann <email address hidden>
Date: Mon Aug 28 20:07:30 2017 -0400

    Add functional recreate test for live migration pre-check fails

    This adds a functional test to recreate bug 1712411 where
    the allocations created against the destination node are not
    cleaned up by conductor when a MigrationPreCheckError happens
    and a rechedule to another host happens (or max retries are
    exceeded which is the case here with only two computes).

    Change-Id: Ieee45521c7e362b7dd053b20d3c39dea330ca080
    Relatd-Bug: #1712411
    (cherry picked from commit 0b05655ef8b62bcb21345e778baeed3114c11571)

tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/500908
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a1462d2608bd0c3ee9cd8686b145bb1d84ffab3b
Submitter: Jenkins
Branch: stable/pike

commit a1462d2608bd0c3ee9cd8686b145bb1d84ffab3b
Author: Matt Riedemann <email address hidden>
Date: Tue Aug 29 12:28:43 2017 -0400

    Cleanup allocations on invalid dest node during live migration

    Starting in Pike, the scheduler creates an allocation on a
    chosen destination node during live migration. This happens
    before the destination node pre-checks occur, which could
    fail and trigger a retry to the scheduler. The allocations
    created in Placement against the failed destination node
    were not being cleaned up though.

    This change adds some cleanup code to the live migration task
    in conductor to clean the allocations for the failed destination
    node before retrying.

    The functional recreate test for the bug is updated to show
    that the bug is fixed now.

    Also updates the docstring in the SchedulerReportClient
    remove_provider_from_instance_allocation method so that we
    no longer have to enumerate all of the places that call it.

    Change-Id: I41e5e1fa9938b5e04f7e20f78ccd77eca658885f
    Closes-Bug: #1712411
    (cherry picked from commit 94b904abbad1c9655b6dec1a2e58d73bc913ed47)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.0.1

This issue was fixed in the openstack/nova 16.0.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.0.0b1

This issue was fixed in the openstack/nova 17.0.0.0b1 development milestone.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.