live-migration cinder boot volume target_lun id incorrect

Bug #1288039 reported by Walt Boring
This bug affects 5 people
Affects                    Status         Importance   Assigned to   Milestone
OpenStack Compute (nova)   Fix Released   High         Anthony Lee
Juno                       Fix Released   Undecided    Unassigned
Kilo                       Fix Released   Undecided    Unassigned

Bug Description

When nova runs the _post_live_migration cleanup on the source host, the block_device_mapping it uses contains incorrect data.

I can reproduce this 100% of the time with a cinder iSCSI backend, such as 3PAR.

This is a fresh install on 2 new servers with no attached storage from Cinder and no VMs.
I create a cinder volume from an image.
I create a VM booted from that cinder volume. That VM shows up on host1 with a LUN id of 0.
I live migrate that VM. The VM moves to host2 and has a LUN id of 0. The LUN on host1 is now gone.

I create another cinder volume from an image.
I create another VM booted from the 2nd cinder volume. The VM shows up on host1 with a LUN id of 0.
I live migrate that VM. The VM moves to host2 and has a LUN id of 1.
_post_live_migration is called on host1 to clean up, and it fails because it asks cinder to detach the volume on host1 with a target_lun id of 1, which doesn't exist. It is supposed to be asking cinder to detach LUN 0.
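The sequence above can be modelled with a small toy simulation. This is only a sketch under stated assumptions: ToyBackend, live_migrate and the "lowest free LUN per host" allocation rule are hypothetical stand-ins chosen to mirror the observed behaviour, not nova, cinder or 3PAR code.

```python
# Toy model of an iSCSI backend that numbers LUNs per host.
# All class and function names here are hypothetical, for illustration only.


class ToyBackend:
    """Assigns each export the lowest free LUN id on the given host."""

    def __init__(self):
        self.exports = {}  # host -> {lun_id: volume_id}

    def attach(self, host, volume_id):
        luns = self.exports.setdefault(host, {})
        lun = 0
        while lun in luns:
            lun += 1
        luns[lun] = volume_id
        return {"serial": volume_id, "data": {"target_lun": lun}}

    def detach(self, host, connection_info):
        lun = connection_info["data"]["target_lun"]
        luns = self.exports.get(host, {})
        if luns.get(lun) != connection_info["serial"]:
            raise LookupError(
                f"LUN {lun} on {host} is not {connection_info['serial']}"
            )
        del luns[lun]


def live_migrate(backend, bdm, source, dest):
    """Mimic the reported flow: the BDM ends up holding the destination's info."""
    bdm["connection_info"] = backend.attach(dest, bdm["connection_info"]["serial"])
    # Buggy cleanup: detaches on the source using the destination's data.
    backend.detach(source, bdm["connection_info"])


backend = ToyBackend()

# First VM: LUN 0 on host1 and LUN 0 on host2, so cleanup happens to work.
bdm1 = {"connection_info": backend.attach("host1", "vol-1")}
live_migrate(backend, bdm1, "host1", "host2")

# Second VM: LUN 0 on host1, but LUN 1 on host2 because LUN 0 is taken there.
bdm2 = {"connection_info": backend.attach("host1", "vol-2")}
try:
    live_migrate(backend, bdm2, "host1", "host2")
    second_migration_error = None
except LookupError as exc:
    second_migration_error = str(exc)

print(second_migration_error)
print(backend.exports["host1"])
```

In this model the first migration succeeds only because the LUN ids coincide on both hosts. The second one fails during cleanup and leaves the source LUN behind.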

First migrate
HOST2
2014-03-04 19:02:07.870 WARNING nova.compute.manager [req-24521cb1-8719-4bc5-b488-73a4980d7110 admin admin] pre_live_migrate: {'block_device_mapping': [{'guest_format': None, 'boot_index': 0, 'mount_device': u'vda', 'connection_info': {u'driver_volume_type': u'iscsi', 'serial': u'83fb6f13-905e-45f8-a465-508cb343b721', u'data': {u'target_discovered': True, u'qos_specs': None, u'target_iqn': u'iqn.2000-05.com.3pardata:20810002ac00383d', u'target_portal': u'10.10.120.253:3260', u'target_lun': 0, u'access_mode': u'rw'}}, 'disk_bus': u'virtio', 'device_type': u'disk', 'delete_on_termination': False}]}
HOST1
2014-03-04 19:02:16.775 WARNING nova.compute.manager [-] _post_live_migration: block_device_info {'block_device_mapping': [{'guest_format': None, 'boot_index': 0, 'mount_device': u'vda', 'connection_info': {u'driver_volume_type': u'iscsi', u'serial': u'83fb6f13-905e-45f8-a465-508cb343b721', u'data': {u'target_discovered': True, u'qos_specs': None, u'target_iqn': u'iqn.2000-05.com.3pardata:20810002ac00383d', u'target_portal': u'10.10.120.253:3260', u'target_lun': 0, u'access_mode': u'rw'}}, 'disk_bus': u'virtio', 'device_type': u'disk', 'delete_on_termination': False}]}

Second Migration
This is in _post_live_migration on host1. It calls post_live_migration in libvirt's driver.py with the volume information returned for the new connection on host2, hence target_lun = 1. It should instead call libvirt's driver.py to clean up the original volume connection on the source host, which has target_lun = 0.
2014-03-04 19:24:51.626 WARNING nova.compute.manager [-] _post_live_migration: block_device_info {'block_device_mapping': [{'guest_format': None, 'boot_index': 0, 'mount_device': u'vda', 'connection_info': {u'driver_volume_type': u'iscsi', u'serial': u'f0087595-804d-4bdb-9bad-0da2166313ea', u'data': {u'target_discovered': True, u'qos_specs': None, u'target_iqn': u'iqn.2000-05.com.3pardata:20810002ac00383d', u'target_portal': u'10.10.120.253:3260', u'target_lun': 1, u'access_mode': u'rw'}}, 'disk_bus': u'virtio', 'device_type': u'disk', 'delete_on_termination': False}]}
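To check which target_lun a logged block_device_info payload carries, the dict literal can be parsed directly. The sketch below uses trimmed copies of the two host1 payloads from this report (fields not needed for the comparison are dropped). The helper name target_luns is hypothetical.

```python
import ast

# Trimmed copies of the two _post_live_migration payloads logged on host1;
# fields not needed for the comparison were dropped for brevity.
FIRST_MIGRATION = (
    "{'block_device_mapping': [{'boot_index': 0, 'mount_device': u'vda', "
    "'connection_info': {u'serial': u'83fb6f13-905e-45f8-a465-508cb343b721', "
    "u'data': {u'target_lun': 0}}}]}"
)
SECOND_MIGRATION = (
    "{'block_device_mapping': [{'boot_index': 0, 'mount_device': u'vda', "
    "'connection_info': {u'serial': u'f0087595-804d-4bdb-9bad-0da2166313ea', "
    "u'data': {u'target_lun': 1}}}]}"
)


def target_luns(payload):
    """Map each volume serial in a logged block_device_info dict to its target_lun."""
    info = ast.literal_eval(payload)
    return {
        bdm["connection_info"]["serial"]: bdm["connection_info"]["data"]["target_lun"]
        for bdm in info["block_device_mapping"]
    }


for name, payload in (("first", FIRST_MIGRATION), ("second", SECOND_MIGRATION)):
    print(name, target_luns(payload))
```

The second payload reports target_lun 1 on the source host, even though that volume was exported to host1 as LUN 0.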

summary: - live-migration cinder boot volume target_lun id
+ live-migration cinder boot volume target_lun id incorrect
Changed in nova:
status: New → Confirmed
Rohan (kanaderohan)
Changed in nova:
assignee: nobody → Rohan (kanaderohan)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/79368

Changed in nova:
status: Confirmed → In Progress
tags: added: live-migrate
Mark McLoughlin (markmc)
Changed in nova:
importance: Undecided → High
Ankit Agrawal (ankitagrawal) wrote :

Hi Walt,

I am not able to reproduce this issue with the latest master code. Is it still reproducible?

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/79368
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Changed in nova:
assignee: Rohan (kanaderohan) → nobody
status: In Progress → New
Changed in nova:
status: New → Confirmed
Changed in nova:
assignee: nobody → Bartosz Fic (bartosz-fic)
Mike Perez (thingee)
tags: added: volumess
tags: added: volumes
removed: volumess
Changed in nova:
assignee: Bartosz Fic (bartosz-fic) → Anthony Lee (anthony-mic-lee)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/211051

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/202770
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8b649aa86fb26e998d66e75e5cebfd19c396942d
Submitter: Jenkins
Branch: master

commit 8b649aa86fb26e998d66e75e5cebfd19c396942d
Author: Anthony Lee <email address hidden>
Date: Thu Jul 16 13:02:00 2015 -0700

    Fix live-migrations usage of the wrong connector information

    During the post_live_migration step for the Nova libvirt driver
    an incorrect assumption is being made about the connector
    information being sent to _disconnect_volume. It is assumed that
    the connection information on the source and destination is the
    same but that is not always the case. The BDM, where the
    connector information is being retrieved from only contains the
    connection information for the destination. This will not work
    when trying to disconnect volumes from the source during live
    migration as the properties such as the target_lun and
    initiator_target_map could be different. This ends up leaving
    behind dangling LUNs and possibly removing the incorrect
    volume's LUNs.

    The solution proposed here utilizes the connection_info that
    can be retrieved for a host from Cinder's initialize_connection
    API. This connection information contains the correct data for
    the source host and allows volume LUNs to be removed properly.

    Change-Id: I3dfb75eb58dfbc66b218bcee473af4c2ac282eb6
    Closes-Bug: #1475411
    Closes-Bug: #1288039
    Closes-Bug: #1423772
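
The approach described in the commit message can be sketched roughly as follows. This is not the merged nova change: FakeVolumeAPI, its simplified initialize_connection signature, and source_connection_infos are hypothetical stand-ins, and the sketch assumes the 'serial' field of connection_info holds the Cinder volume ID.

```python
# Sketch of the idea in the commit message: ask the volume service for the
# source host's connection info instead of reusing what the BDM holds.
# FakeVolumeAPI and the helper below are hypothetical, not nova's code.


class FakeVolumeAPI:
    """Stands in for Cinder: returns per-host connection info for a volume."""

    def __init__(self, luns_by_host):
        self._luns = luns_by_host  # {(host, volume_id): target_lun}

    def initialize_connection(self, volume_id, connector):
        lun = self._luns[(connector["host"], volume_id)]
        return {"serial": volume_id, "data": {"target_lun": lun}}


def source_connection_infos(volume_api, block_device_mapping, source_connector):
    """Look up connection info for the source host rather than trusting the BDM."""
    infos = []
    for bdm in block_device_mapping:
        # Assumption: 'serial' carries the Cinder volume ID.
        volume_id = bdm["connection_info"]["serial"]
        infos.append(volume_api.initialize_connection(volume_id, source_connector))
    return infos


VOLUME = "f0087595-804d-4bdb-9bad-0da2166313ea"
volume_api = FakeVolumeAPI({("host1", VOLUME): 0, ("host2", VOLUME): 1})

# After migration the BDM holds the destination's data (target_lun 1).
bdms = [{"connection_info": {"serial": VOLUME, "data": {"target_lun": 1}}}]

source_infos = source_connection_infos(volume_api, bdms, {"host": "host1"})
print(source_infos[0]["data"]["target_lun"])
```

With the source host's connector, the lookup yields target_lun 0, which is the LUN that needs to be disconnected on host1.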

Changed in nova:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/kilo)

Reviewed: https://review.openstack.org/211051
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=587092c909e15e983f7aef31d7bc0862271a32c7
Submitter: Jenkins
Branch: stable/kilo

commit 587092c909e15e983f7aef31d7bc0862271a32c7
Author: Anthony Lee <email address hidden>
Date: Thu Jul 16 13:02:00 2015 -0700

    Fix live-migrations usage of the wrong connector information

    During the post_live_migration step for the Nova libvirt driver
    an incorrect assumption is being made about the connector
    information being sent to _disconnect_volume. It is assumed that
    the connection information on the source and destination is the
    same but that is not always the case. The BDM, where the
    connector information is being retrieved from only contains the
    connection information for the destination. This will not work
    when trying to disconnect volumes from the source during live
    migration as the properties such as the target_lun and
    initiator_target_map could be different. This ends up leaving
    behind dangling LUNs and possibly removing the incorrect
    volume's LUNs.

    The solution proposed here utilizes the connection_info that
    can be retrieved for a host from Cinder's initialize_connection
    API. This connection information contains the correct data for
    the source host and allows volume LUNs to be removed properly.

    --

    NOTE(sahid): The TODO comment in the original change on master is
    omitted here since os-brick wasn't used by nova in kilo so leaving
    it in the backport would be confusing.

    Change-Id: I3dfb75eb58dfbc66b218bcee473af4c2ac282eb6
    Closes-Bug: #1475411
    Closes-Bug: #1288039
    Closes-Bug: #1423772

tags: added: in-stable-kilo
Thierry Carrez (ttx)
Changed in nova:
milestone: none → liberty-3
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/228517

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/juno)

Reviewed: https://review.openstack.org/228517
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9d2abbd9ab60ca873650759feaba98b4d8d35566
Submitter: Jenkins
Branch: stable/juno

commit 9d2abbd9ab60ca873650759feaba98b4d8d35566
Author: Anthony Lee <email address hidden>
Date: Thu Jul 16 13:02:00 2015 -0700

    Fix live-migrations usage of the wrong connector information

    During the post_live_migration step for the Nova libvirt driver
    an incorrect assumption is being made about the connector
    information being sent to _disconnect_volume. It is assumed that
    the connection information on the source and destination is the
    same but that is not always the case. The BDM, where the
    connector information is being retrieved from only contains the
    connection information for the destination. This will not work
    when trying to disconnect volumes from the source during live
    migration as the properties such as the target_lun and
    initiator_target_map could be different. This ends up leaving
    behind dangling LUNs and possibly removing the incorrect
    volume's LUNs.

    The solution proposed here utilizes the connection_info that
    can be retrieved for a host from Cinder's initialize_connection
    API. This connection information contains the correct data for
    the source host and allows volume LUNs to be removed properly.

    Conflicts:
            nova/tests/unit/virt/libvirt/test_driver.py

    NOTE(mriedem): The conflicts are due to the tests being moved
    in Kilo and 41f80226e0a1f73af76c7968617ebfda0aeb40b1 not being
    in stable/juno (renamed conn var to drvr in libvirt tests).

    Change-Id: I3dfb75eb58dfbc66b218bcee473af4c2ac282eb6
    Closes-Bug: #1475411
    Closes-Bug: #1288039
    Closes-Bug: #1423772
    (cherry picked from commit 587092c909e15e983f7aef31d7bc0862271a32c7)

tags: added: in-stable-juno
Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-3 → 12.0.0