_rollback_live_migration does not remove allocations from destination node

Bug #1715182 reported by Matt Riedemann
This bug affects 1 person
Affects                   Status         Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released   High        Matt Riedemann
Pike                      Fix Committed  High        Matt Riedemann

Bug Description

This is a follow-on to bug 1712411, where pre_live_migration fails on the destination host here:

https://github.com/openstack/nova/blob/0e52b3fe686ce1fc43fd3790711731bc806c6ad0/nova/compute/manager.py#L5456

The source node then starts rolling back things like volume connections on the destination host:

https://github.com/openstack/nova/blob/0e52b3fe686ce1fc43fd3790711731bc806c6ad0/nova/compute/manager.py#L5836

The tricky part is that we may not be able to clean up the allocations from the _rollback_live_migration method, since that method is also passed to the virt driver as a callback in case live migration fails in the driver:

https://github.com/openstack/nova/blob/0e52b3fe686ce1fc43fd3790711731bc806c6ad0/nova/compute/manager.py#L5467

At that point we might be unsure of what is actually running on the destination node and consuming resources. However, instance.host and instance.node should still point to the source node at that point of failure, so removing the allocations on the destination node from within _rollback_live_migration should be OK, though it might require some investigation.
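
To make the proposed cleanup concrete, here is a minimal sketch of the idea, not nova's actual code. FakePlacement, Instance and rollback_live_migration are hypothetical stand-ins invented for illustration; the only thing taken from the description above is the reasoning that the destination node's allocation can be dropped as long as the instance record still points at the source node.

```python
from dataclasses import dataclass, field


@dataclass
class FakePlacement:
    """Hypothetical in-memory stand-in: {consumer_uuid: {node: resources}}."""
    allocations: dict = field(default_factory=dict)

    def claim(self, consumer, node, resources):
        self.allocations.setdefault(consumer, {})[node] = dict(resources)

    def remove_node_allocation(self, consumer, node):
        # Drop only the named node's share; other nodes stay untouched.
        self.allocations.get(consumer, {}).pop(node, None)


@dataclass
class Instance:
    uuid: str
    host: str
    node: str


def rollback_live_migration(placement, instance, dest_node):
    """Hypothetical rollback step: free what was claimed on the destination."""
    if instance.node == dest_node:
        # The record already points at the destination, so leave it alone.
        return False
    placement.remove_node_allocation(instance.uuid, dest_node)
    return True


placement = FakePlacement()
inst = Instance("uuid-1", host="src-host", node="src-node")
# During a live migration the instance holds allocations on both nodes.
placement.claim(inst.uuid, "src-node", {"VCPU": 1, "MEMORY_MB": 512})
placement.claim(inst.uuid, "dest-node", {"VCPU": 1, "MEMORY_MB": 512})

removed = rollback_live_migration(placement, inst, "dest-node")
print(placement.allocations[inst.uuid])
# {'src-node': {'VCPU': 1, 'MEMORY_MB': 512}}
```

The guard on instance.node mirrors the caveat above: the cleanup is only assumed safe while the instance is still recorded on the source node.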

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/507677

Matt Riedemann (mriedem)
Changed in nova:
status: Triaged → In Progress
assignee: nobody → Matt Riedemann (mriedem)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/507687

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/507677
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6c3a58ce8001c588fe225d60ec198c7141d74700
Submitter: Zuul
Branch: master

commit 6c3a58ce8001c588fe225d60ec198c7141d74700
Author: Matt Riedemann <email address hidden>
Date: Tue Sep 26 17:03:12 2017 -0400

    Add recreate test for live migrate rollback not cleaning up dest allocs

    We rollback from a failed live migration in two cases:

    1. The pre_live_migration on the destination host fails. The
       _do_live_migration method calls _rollback_live_migration
       explicitly to cleanup the dest host.

    2. The live migration in the virt driver fails, and the virt driver
       calls back to _rollback_live_migration in the ComputeManager.

    Either way, the instance is not on the destination host, so just like
    how we remove volume connections and unplug vifs from the destination
    host, we need to also remove allocations for the destination node in
    Placement.

    This change adds a test to show that we are not currently cleaning up
    allocations on the destination node when we rollback from a live
    migration failure.

    Change-Id: Icbd5d7ff41aa04f8f7934fdce9668762691a4a69
    Related-Bug: #1715182
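
The two rollback paths listed in this commit message can be sketched as follows. This is an illustrative outline only: do_live_migration, the callables passed to it, and the recover keyword are hypothetical names and do not reproduce the real ComputeManager or virt driver signatures. It shows why a cleanup placed in the shared rollback function runs in both cases.

```python
def do_live_migration(pre_live_migration, driver_live_migration, rollback):
    """Run a migration; `rollback` is reachable from either failure path."""
    try:
        pre_live_migration()
    except Exception:
        # Case 1: destination prep failed, so the caller rolls back itself.
        rollback("pre_live_migration")
        raise
    # Case 2: the driver receives the same callback and invokes it on failure.
    driver_live_migration(recover=rollback)


calls = []


def rollback(reason):
    # A real rollback would also remove the destination node allocations here.
    calls.append(reason)


def failing_pre():
    raise RuntimeError("dest prep failed")


def ok_pre():
    pass


def failing_driver(recover):
    # Simulate the driver detecting a failure and calling back.
    recover("driver")


try:
    do_live_migration(failing_pre, failing_driver, rollback)
except RuntimeError:
    pass
do_live_migration(ok_pre, failing_driver, rollback)
print(calls)  # ['pre_live_migration', 'driver']
```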

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/509923

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/509926

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/507687
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f90c61cd88edce74e3dbfd069beb2c33793d3371
Submitter: Jenkins
Branch: master

commit f90c61cd88edce74e3dbfd069beb2c33793d3371
Author: Matt Riedemann <email address hidden>
Date: Tue Sep 26 17:37:19 2017 -0400

    Remove dest node allocations during live migration rollback

    When a live migration fails or is cancelled, either during
    pre_live_migration on the destination host or during the
    actual live migration itself, we rollback from the failure/abort
    by doing things like removing volume connections from the
    destination host and re-setup the network on the source host.

    As part of the rollback from a failed or cancelled live migration,
    we also need to remove the allocations created in Placement for the
    destination node, since the instance is not on the destination
    node.

    Change-Id: I7b70cf8d5233bd25bf865a1b2789640758493c2b
    Closes-Bug: #1715182
    Closes-Bug: #1714237

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/509923
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=81c4c17f2a0b9cd3a4d27c0b06092ef456c8979f
Submitter: Zuul
Branch: stable/pike

commit 81c4c17f2a0b9cd3a4d27c0b06092ef456c8979f
Author: Matt Riedemann <email address hidden>
Date: Tue Sep 26 17:03:12 2017 -0400

    Add recreate test for live migrate rollback not cleaning up dest allocs

    We rollback from a failed live migration in two cases:

    1. The pre_live_migration on the destination host fails. The
       _do_live_migration method calls _rollback_live_migration
       explicitly to cleanup the dest host.

    2. The live migration in the virt driver fails, and the virt driver
       calls back to _rollback_live_migration in the ComputeManager.

    Either way, the instance is not on the destination host, so just like
    how we remove volume connections and unplug vifs from the destination
    host, we need to also remove allocations for the destination node in
    Placement.

    This change adds a test to show that we are not currently cleaning up
    allocations on the destination node when we rollback from a live
    migration failure.

    Change-Id: Icbd5d7ff41aa04f8f7934fdce9668762691a4a69
    Related-Bug: #1715182
    (cherry picked from commit 6c3a58ce8001c588fe225d60ec198c7141d74700)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.0.0b1

This issue was fixed in the openstack/nova 17.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/509926
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=76a3465271bdf48ebf579df77bb93739f39745c0
Submitter: Zuul
Branch: stable/pike

commit 76a3465271bdf48ebf579df77bb93739f39745c0
Author: Matt Riedemann <email address hidden>
Date: Tue Sep 26 17:37:19 2017 -0400

    Remove dest node allocations during live migration rollback

    When a live migration fails or is cancelled, either during
    pre_live_migration on the destination host or during the
    actual live migration itself, we rollback from the failure/abort
    by doing things like removing volume connections from the
    destination host and re-setup the network on the source host.

    As part of the rollback from a failed or cancelled live migration,
    we also need to remove the allocations created in Placement for the
    destination node, since the instance is not on the destination
    node.

    Change-Id: I7b70cf8d5233bd25bf865a1b2789640758493c2b
    Closes-Bug: #1715182
    Closes-Bug: #1714237
    (cherry picked from commit f90c61cd88edce74e3dbfd069beb2c33793d3371)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.0.3

This issue was fixed in the openstack/nova 16.0.3 release.
