disconnect_volume not called when rebase failures are encountered during swap_volume

Bug #1685185 reported by Lee Yarwood
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Medium
Lee Yarwood
Newton
Fix Committed
Medium
Lee Yarwood
Ocata
Fix Committed
Medium
Lee Yarwood

Bug Description

Description
===========

At present disconnect_volume is not called when rebase failures are encountered during swap_volume. This results in the new volume remaining connected to the compute host prior to terminate_connection then being called. This can in turn lead to left over devices remaining on the compute host for some volume backends such as LVM/iSCSI.

Steps to reproduce
==================

Downstream, the easiest way to reproduce this is via https://bugzilla.redhat.com/show_bug.cgi?id=1401755 :

# sudo setenforce 1
# nova update-volume ${instance_uuid} \
                     ${attached_NFS_volume_id} \
                     ${unattached_iSCSI_volume_id}

Upstream, I've been unable to get devstack to even work correctly with SELinux in enforcing mode so I've been unable to reproduce the rollback this way.

Expected result
===============

New volume disconnected from compute host.

Actual result
=============

New volume remains connected to compute host.

Environment
===========
1. Exact version of OpenStack you are running. See the following
  list for all releases: http://docs.openstack.org/releases/

   N/A

2. Which hypervisor did you use?
   (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
   What's the version of that?

   Libvirt + KVM

2. Which storage type did you use?
   (For example: Ceph, LVM, GPFS, ...)
   What's the version of that?

   NFS + LVM/iSCSI

3. Which networking type did you use?
   (For example: nova-network, Neutron with OpenVSwitch, ...)

   N/A

Logs & Configs
==============

Example Newton trace :

2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher [req-f31f6110-e880-41e8-be1a-2c41ea7fd9ce 75fbc7a6db34480091d2a53e2e20b695 62e53e5e804e49a9890928a5a4846f60 - - -] Exception during message handling: internal error: unable to execute QEMU command 'drive-mirror': Could not open '/dev/disk/by-id/dm-uuid-mpath-360014053aac4f90daef4d76baa773169': Permission denied
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 138, in _dispatch_and_reply
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher incoming.message))
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 183, in _dispatch
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher return self._do_dispatch(endpoint, method, ctxt, args)
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 127, in _do_dispatch
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher result = func(ctxt, **new_args)
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/exception.py", line 110, in wrapped
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher payload)
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher self.force_reraise()
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/exception.py", line 89, in wrapped
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher return f(self, context, *args, **kw)
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 359, in decorated_function
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher LOG.warning(msg, e, instance=instance)
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher self.force_reraise()
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 328, in decorated_function
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher return function(self, context, *args, **kwargs)
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 387, in decorated_function
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher kwargs['instance'], e, sys.exc_info())
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher self.force_reraise()
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 375, in decorated_function
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher return function(self, context, *args, **kwargs)
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5033, in swap_volume
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher resize_to)
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4980, in _swap_volume
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher self.volume_api.unreserve_volume(context, new_volume_id)
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher self.force_reraise()
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4958, in _swap_volume
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher resize_to)
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1434, in swap_volume
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher self._swap_volume(guest, disk_dev, conf.source_path, resize_to)
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1391, in _swap_volume
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher dev.rebase(new_path, copy=True, reuse_ext=True)
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 546, in rebase
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher self._disk, base, self.REBASE_DEFAULT_BANDWIDTH, flags=flags)
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 183, in doit
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher result = proxy_call(self._autowrap, f, *args, **kwargs)
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 141, in proxy_call
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher rv = execute(f, *args, **kwargs)
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 122, in execute
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher six.reraise(c, e, tb)
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 80, in tworker
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher rv = meth(*args, **kwargs)
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib64/python2.7/site-packages/libvirt.py", line 930, in blockRebase
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher if ret == -1: raise libvirtError ('virDomainBlockRebase() failed', dom=self)
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher libvirtError: internal error: unable to execute QEMU command 'drive-mirror': Could not open '/dev/disk/by-id/dm-uuid-mpath-360014053aac4f90daef4d76baa773169': Permission denied
2017-04-12 09:37:53.744 3077 ERROR oslo_messaging.rpc.dispatcher

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/458807

Changed in nova:
assignee: nobody → Lee Yarwood (lyarwood)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/458807
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=809065485c19fd535db6740bb21b867c41c008fe
Submitter: Jenkins
Branch: master

commit 809065485c19fd535db6740bb21b867c41c008fe
Author: Lee Yarwood <email address hidden>
Date: Thu Apr 20 19:43:32 2017 +0100

    libvirt: Always disconnect_volume after rebase failures

    Previously failures when rebasing onto a new volume would leave the
    volume connected to the compute host. For some volume backends such as
    iSCSI the subsequent call to terminate_connection would then result in
    leftover devices remaining on the host.

    This change simply catches any error associated with the rebase and
    ensures that disconnect_volume is called for the new volume prior to
    terminate_connection finally being called.

    Change-Id: I5997000a0ba6341c4987405cdc0760c3b471bd59
    Closes-bug: #1685185

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/460296

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/460482

Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/460296
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=fbcf8d673342570a1518dbf8d88f289c2c39cd30
Submitter: Jenkins
Branch: stable/ocata

commit fbcf8d673342570a1518dbf8d88f289c2c39cd30
Author: Lee Yarwood <email address hidden>
Date: Thu Apr 20 19:43:32 2017 +0100

    libvirt: Always disconnect_volume after rebase failures

    Previously failures when rebasing onto a new volume would leave the
    volume connected to the compute host. For some volume backends such as
    iSCSI the subsequent call to terminate_connection would then result in
    leftover devices remaining on the host.

    This change simply catches any error associated with the rebase and
    ensures that disconnect_volume is called for the new volume prior to
    terminate_connection finally being called.

    NOTE(lyarwood): Conflict caused by MIN_LIBVIRT_VERSION being 1.2.1 in
    stable/ocata making I81c32bbea0f04ca876f2078ef2ae0e1975473584
    unsuitable. The is_job_complete polling loop removed by that change in
    master is now moved into the try block of this change ensuring we catch
    any errors that might be thrown while waiting for the async pivot. The
    log.exception message also requires translation in Ocata.

    Conflicts:
            nova/virt/libvirt/driver.py

    Change-Id: I5997000a0ba6341c4987405cdc0760c3b471bd59
    Closes-bug: #1685185
    (cherry picked from commit 809065485c19fd535db6740bb21b867c41c008fe)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.0.0.0b2

This issue was fixed in the openstack/nova 16.0.0.0b2 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.0.6

This issue was fixed in the openstack/nova 15.0.6 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/newton)

Reviewed: https://review.openstack.org/460482
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8fb411c9b2af9bb5fef3cc272517c54bab821cd4
Submitter: Jenkins
Branch: stable/newton

commit 8fb411c9b2af9bb5fef3cc272517c54bab821cd4
Author: Lee Yarwood <email address hidden>
Date: Thu Apr 20 19:43:32 2017 +0100

    libvirt: Always disconnect_volume after rebase failures

    Previously failures when rebasing onto a new volume would leave the
    volume connected to the compute host. For some volume backends such as
    iSCSI the subsequent call to terminate_connection would then result in
    leftover devices remaining on the host.

    This change simply catches any error associated with the rebase and
    ensures that disconnect_volume is called for the new volume prior to
    terminate_connection finally being called.

    NOTE(lyarwood): Conflict caused by MIN_LIBVIRT_VERSION being 1.2.1 in
    stable/ocata making I81c32bbea0f04ca876f2078ef2ae0e1975473584
    unsuitable. The is_job_complete polling loop removed by that change in
    master is now moved into the try block of this change ensuring we catch
    any errors that might be thrown while waiting for the async pivot. The
    log.exception message also requires translation in Ocata.

    Conflicts:
            nova/virt/libvirt/driver.py

    Change-Id: I5997000a0ba6341c4987405cdc0760c3b471bd59
    Closes-bug: #1685185
    (cherry picked from commit 809065485c19fd535db6740bb21b867c41c008fe)
    (cherry picked from commit fbcf8d673342570a1518dbf8d88f289c2c39cd30)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 14.0.8

This issue was fixed in the openstack/nova 14.0.8 release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.