[pike] Nova host disable and Live Migrate all instances fail.

Bug #1718455 reported by Steve Searles
This bug affects 4 people
Affects | Status | Importance | Assigned to | Milestone
OpenStack Compute (nova) | Fix Released | High | Sylvain Bauza |
Pike | Fix Committed | High | Matt Riedemann |

Bug Description

Disabling a host in Horizon and live-migrating its instances off fails with the following error if the instances were created with a single boot request, e.g. by creating 10 cirros instances in one launch through the Horizon dashboard. The instances are all KVM and backed by Cinder volumes.

2017-09-19 19:02:30.588 19741 DEBUG nova.scheduler.filter_scheduler [req-4268ea83-0657-40cc-961b-f0ae9fb3019e 385c60230b3f49da930dda4d089eda6b 723aa12337a44f818b6d1e1a59f16e49 - default default] There are 1 hosts available but 10 instances requested to build. select_destinations /usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py:101

Steps to reproduce:

1. Create 10 instances via the horizon dashboard at the same time.
2. Set a compute host disabled.
3. Choose the migrate host option and select live-migrate as the method.
4. Every live migration fails once the request reaches the scheduler, with the error above.

NOTE: Creating 10 separate instances with individual "openstack server create" commands, or running Launch Instance 10 times in Horizon, does not reproduce the failure; the scheduler evacuates the host as expected.

Nova version on controller:

ii nova-api 2:16.0.0-0ubuntu1~cloud0 all OpenStack Compute - API frontend
ii nova-common 2:16.0.0-0ubuntu1~cloud0 all OpenStack Compute - common files
ii nova-conductor 2:16.0.0-0ubuntu1~cloud0 all OpenStack Compute - conductor service
ii nova-consoleauth 2:16.0.0-0ubuntu1~cloud0 all OpenStack Compute - Console Authenticator
ii nova-novncproxy 2:16.0.0-0ubuntu1~cloud0 all OpenStack Compute - NoVNC proxy
ii nova-placement-api 2:16.0.0-0ubuntu1~cloud0 all OpenStack Compute - placement API frontend
ii nova-scheduler 2:16.0.0-0ubuntu1~cloud0 all OpenStack Compute - virtual machine scheduler
ii python-nova 2:16.0.0-0ubuntu1~cloud0 all OpenStack Compute Python libraries
ii python-novaclient

Revision history for this message
Matt Riedemann (mriedem) wrote :

We should have a functional scenario test for this since we had a similar bug/fix in pike:

https://review.openstack.org/#/c/491439/

But that missed this other part because it's just a unit test:

https://review.openstack.org/#/c/491439/3/nova/scheduler/filter_scheduler.py@81

Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Sylvain Bauza (sylvain-bauza)
Matt Riedemann (mriedem)
no longer affects: nova/ocata
tags: added: scheduler
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/506092

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/506093

Changed in nova:
assignee: Sylvain Bauza (sylvain-bauza) → Matt Riedemann (mriedem)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/506092
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=016d3efa393f864585ecc13317e46415a5cba825
Submitter: Jenkins
Branch: master

commit 016d3efa393f864585ecc13317e46415a5cba825
Author: Sylvain Bauza <email address hidden>
Date: Thu Sep 21 10:12:51 2017 +0200

    Add a regression test for bug 1718455

    Moving an instance that was created concurrently by a multiple instances create
    call no longer works in Pike because of a wrong RequestSpec field lookup.
    Verifying that regression so the next change will fix it.

    Change-Id: I26d1c90578b3dfb183bbf77ac758c2743dbced28
    Related-Bug: #1718455

Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → Sylvain Bauza (sylvain-bauza)
Changed in nova:
assignee: Sylvain Bauza (sylvain-bauza) → Matt Riedemann (mriedem)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/508590

Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → Sylvain Bauza (sylvain-bauza)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/508591

Matt Riedemann (mriedem)
Changed in nova:
importance: Medium → High
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/506093
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=87ca0d8af0d6440a7effd4da9e47321b3a335442
Submitter: Jenkins
Branch: master

commit 87ca0d8af0d6440a7effd4da9e47321b3a335442
Author: Sylvain Bauza <email address hidden>
Date: Thu Sep 21 10:52:46 2017 +0200

    Ensure instance can migrate when launched concurrently

    When we fixed the problem in If7da79356174be57481ef246618221e3b2ff8200
    we forgot to modify a specific check in select_destinations().

    Since _schedule() is returning the correct number of needed hosts but
    we were still using the wrong number of instances to verify, the
    conditional in select_destinations() was always incorrect.

    Note that we needed to modify test.py and fake driver because:

     - _do_check_can_live_migrate_destination was using CONF.host
     - check_can_live_migrate_destination in the fake driver was
       incorrectly trying to set a None value to the object while
       libvirt fixed that earlier (block_migration=None when the
       user specifies block_migration='auto' in the API)
     - pre_live_migration in the fake driver was not returning
       migrate_data, which is passed through live_migration and
       _post_live_migration to set the migration object status

    Change-Id: Iff839f3478ebe77bf3e2c4becbe9b9b62fff5035
    Closes-Bug: #1718455
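
The mismatch described in this commit message can be illustrated with a simplified sketch. This is not the nova code: `RequestSpec`, `_schedule`, and both `select_destinations_*` functions below are hypothetical stand-ins, and the sketch assumes the stored request spec still carries the boot-time `num_instances` (10 in the report) while only one instance is being moved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RequestSpec:
    """Hypothetical stand-in for the request spec saved at boot time."""
    num_instances: int  # e.g. 10 when 10 instances were booted in one request


class NoValidHost(Exception):
    pass


def _needed(spec: RequestSpec, instance_uuids: Optional[List[str]]) -> int:
    # Number of instances actually being scheduled in this call.
    return len(instance_uuids) if instance_uuids else spec.num_instances


def _schedule(hosts: List[str], count: int) -> List[str]:
    # Simplified: return at most `count` hosts, one per instance being moved.
    return hosts[:count]


def _check(selected: List[str], expected: int) -> List[str]:
    if len(selected) < expected:
        raise NoValidHost(
            "There are %d hosts available but %d instances requested to build."
            % (len(selected), expected))
    return selected


def select_destinations_buggy(spec, hosts, instance_uuids):
    selected = _schedule(hosts, _needed(spec, instance_uuids))
    # Assumed bug: verifies against the boot-time count, not this request.
    return _check(selected, spec.num_instances)


def select_destinations_fixed(spec, hosts, instance_uuids):
    needed = _needed(spec, instance_uuids)
    selected = _schedule(hosts, needed)
    # Verifies against the same count that was used for scheduling.
    return _check(selected, needed)
```

With a spec of `num_instances=10`, one candidate host, and a single UUID, the buggy variant raises `NoValidHost` with the same wording as the log line in the description, while the fixed variant returns the one host.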

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/508590
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2b0c1867dad6e92b593f16ac62b9325bcacf91a9
Submitter: Jenkins
Branch: stable/pike

commit 2b0c1867dad6e92b593f16ac62b9325bcacf91a9
Author: Sylvain Bauza <email address hidden>
Date: Thu Sep 21 10:12:51 2017 +0200

    Add a regression test for bug 1718455

    Moving an instance that was created concurrently by a multiple instances create
    call no longer works in Pike because of a wrong RequestSpec field lookup.
    Verifying that regression so the next change will fix it.

    Change-Id: I26d1c90578b3dfb183bbf77ac758c2743dbced28
    Related-Bug: #1718455
    (cherry picked from commit 016d3efa393f864585ecc13317e46415a5cba825)

tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/508591
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b97c433f7c24615a81a1342e75fd42fe3c15e3d4
Submitter: Jenkins
Branch: stable/pike

commit b97c433f7c24615a81a1342e75fd42fe3c15e3d4
Author: Sylvain Bauza <email address hidden>
Date: Thu Sep 21 10:52:46 2017 +0200

    Ensure instance can migrate when launched concurrently

    When we fixed the problem in If7da79356174be57481ef246618221e3b2ff8200
    we forgot to modify a specific check in select_destinations().

    Since _schedule() is returning the correct number of needed hosts but
    we were still using the wrong number of instances to verify, the
    conditional in select_destinations() was always incorrect.

    Note that we needed to modify test.py and fake driver because:

     - _do_check_can_live_migrate_destination was using CONF.host
     - check_can_live_migrate_destination in the fake driver was
       incorrectly trying to set a None value to the object while
       libvirt fixed that earlier (block_migration=None when the
       user specifies block_migration='auto' in the API)
     - pre_live_migration in the fake driver was not returning
       migrate_data, which is passed through live_migration and
       _post_live_migration to set the migration object status

    Change-Id: Iff839f3478ebe77bf3e2c4becbe9b9b62fff5035
    Closes-Bug: #1718455
    (cherry picked from commit 87ca0d8af0d6440a7effd4da9e47321b3a335442)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.0.0b1

This issue was fixed in the openstack/nova 17.0.0.0b1 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.0.2

This issue was fixed in the openstack/nova 16.0.2 release.
