OpenStack Compute (nova)

libvirt: nova's detach_volume silently fails sometimes

Bug #1452840 reported by Nicolas Simonds on 2015-05-07

This bug affects 2 people

Affects                   Status     Importance  Assigned to  Milestone
Libvirt Python            New        Undecided   Unassigned
OpenStack Compute (nova)  Confirmed  Low         Unassigned

Bug Description

This behavior has been observed on the following platforms:

* Nova Icehouse, Ubuntu 12.04, QEMU 1.5.3, libvirt 1.1.3.5, with the Cinder Icehouse NFS driver, CirrOS 0.3.2 guest
* Nova Icehouse, Ubuntu 12.04, QEMU 1.5.3, libvirt 1.1.3.5, with the Cinder Icehouse RBD (Ceph) driver, CirrOS 0.3.2 guest
* Nova master, Ubuntu 14.04, QEMU 2.0.0, libvirt 1.2.2, with the Cinder master iSCSI driver, CirrOS 0.3.2 guest

Nova's "detach_volume" fires the detach method into libvirt, which claims success, but "virsh domblklist" shows that the device is still attached. Nova then finishes the teardown and releases the resources. This causes I/O errors in the guest, and subsequent volume-attach requests from Nova fail spectacularly because they try to use a resource that is still in use.

This appears to be a race condition, in that it does occasionally work fine.
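The mismatch described above (libvirt reports success while the disk is still listed) can be checked programmatically by inspecting the domain XML instead of trusting the return value of the detach call. The following is a minimal sketch, not Nova's actual code: the helper names `disk_targets` and `is_detached` and the sample XML are made up for illustration. On a real host the XML string would come from libvirt-python's `virDomain.XMLDesc()`.

```python
import xml.etree.ElementTree as ET


def disk_targets(domain_xml):
    """Return the set of disk target names (e.g. {"vda", "vdb"}) found in a
    libvirt domain XML string. Hypothetical helper, for illustration only."""
    root = ET.fromstring(domain_xml.strip())
    targets = set()
    for target in root.findall("./devices/disk/target"):
        dev = target.get("dev")
        if dev:
            targets.add(dev)
    return targets


def is_detached(domain_xml, dev):
    """True if no disk with target name `dev` is present in the domain XML."""
    return dev not in disk_targets(domain_xml)


# Made-up domain XML resembling a guest with a root disk and one volume.
SAMPLE_XML = """
<domain type='kvm'>
  <name>instance-00000001</name>
  <devices>
    <disk type='file' device='disk'>
      <source file='/example/instances/disk'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='block' device='disk'>
      <source dev='/dev/disk/by-path/example'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""

if __name__ == "__main__":
    # vdb is still present here, which is the state this bug leaves behind
    # even though the detach call returned successfully.
    print(sorted(disk_targets(SAMPLE_XML)))
    print(is_detached(SAMPLE_XML, "vdb"))
```

A check like this only observes the symptom; it does not explain why libvirt leaves the device attached.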

Steps to Reproduce:

This script will usually trigger the error condition:

    #!/bin/bash -vx

    : Setup
    img=$(glance image-list --disk-format ami | awk '/cirros-0.3.2-x86_64-uec/ {print $2}')
    vol1_id=$(cinder create 1 | awk '($2=="id"){print $4}')
    sleep 5

    : Launch
    nova boot --flavor m1.tiny --image "$img" --block-device source=volume,id="$vol1_id",dest=volume,shutdown=preserve --poll test

    : Measure
    nova show test | grep "volumes_attached.*$vol1_id"

    : Poke the bear
    nova volume-detach test "$vol1_id"
    sudo virsh list --all --uuid | xargs -r -n 1 sudo virsh domblklist
    sleep 10
    sudo virsh list --all --uuid | xargs -r -n 1 sudo virsh domblklist
    vol2_id=$(cinder create 1 | awk '($2=="id"){print $4}')
    nova volume-attach test "$vol2_id"
    sleep 1

    : Measure again
    nova show test | grep "volumes_attached.*$vol2_id"

Expected behavior:

The volumes attach/detach/attach properly

Actual behavior:

The second attachment fails, and n-cpu throws the following exception:

    Failed to attach volume at mountpoint: /dev/vdb
    Traceback (most recent call last):
      File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1057, in attach_volume
        virt_dom.attachDeviceFlags(conf.to_xml(), flags)
      File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 183, in doit
        result = proxy_call(self._autowrap, f, *args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 141, in proxy_call
        rv = execute(f, *args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 122, in execute
        six.reraise(c, e, tb)
      File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 80, in tworker
        rv = meth(*args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 517, in attachDeviceFlags
        if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
    libvirtError: operation failed: target vdb already exists

Workaround:

"sudo virsh detach-disk $SOME_UUID $SOME_DISK_ID" appears to cause the guest to properly detach the device, and also seems to ward off whatever gremlins caused the problem in the first place; i.e., the problem gets much less likely to present itself after firing a virsh command.

Matt Riedemann (mriedem) wrote on 2015-05-07:

What version of libvirt/qemu was used with nova master?

tags: added: libvirt volumes

Matt Riedemann (mriedem) wrote on 2015-05-07:

Oh, never mind: libvirt 1.2.2 with nova master on Ubuntu. Have you tried testing against a newer or the latest libvirt/qemu?

Nicolas Simonds (nicolas.simonds) wrote on 2015-05-07:

Addendum:

Between runs of the test script, I clean up with:

nova delete test ; cinder list | awk '/avail/ {print $2}' | xargs -r cinder delete

Nicolas Simonds (nicolas.simonds) wrote on 2015-05-07:

No, I'm testing with stock Ubuntu Trusty and devstack with no local.conf, i.e., all defaults, all the time.

description: updated

Sylvain Bauza (sylvain-bauza) on 2015-05-07

Changed in nova:
status:	New → Confirmed
importance:	Undecided → Low

Nicolas Simonds (nicolas.simonds) wrote on 2015-05-08:

In an attempt to gain insight, I altered Nova's detach_volume method to recheck+retry+log indefinitely, to see how many tries it would take for the detach to eventually succeed.

The answer is, "never, unless another request comes in on a different greenthread to alter the guest's configuration". The provided test script attaches another volume after ten seconds. So, after Nova has futilely tried to detach the volume (/dev/vdb) for ten seconds, an attach request comes in, succeeds (on /dev/vdc), unsticks libvirt with regard to detaching the volume, and everything gets cleaned up.
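The recheck+retry experiment described above can be approximated with a bounded loop. This is only a sketch under stated assumptions, not the patch that was actually applied to Nova: `detach_with_verification` is a hypothetical name, and `detach` and `is_attached` are caller-supplied callables standing in for the libvirt detach call and a check against the live domain definition. Unlike the experiment, which retried indefinitely, the loop gives up after a fixed number of attempts so that teardown is never performed on a device the guest still holds.

```python
import time


def detach_with_verification(detach, is_attached, attempts=5, interval=1.0,
                             sleep=time.sleep):
    """Call detach(), then verify with is_attached(); retry up to `attempts`
    times. Returns the number of attempts used, or raises RuntimeError if the
    device is still attached afterwards. Hypothetical helper for illustration.
    """
    for attempt in range(1, attempts + 1):
        detach()
        if not is_attached():
            return attempt
        # Device is still listed despite the "successful" detach; wait and retry.
        sleep(interval)
    raise RuntimeError(
        "device still attached after %d detach attempts" % attempts)


if __name__ == "__main__":
    # Fake guest that only lets go of the device on the third check.
    state = {"checks": 0}

    def fake_is_attached():
        state["checks"] += 1
        return state["checks"] < 3

    used = detach_with_verification(lambda: None, fake_is_attached,
                                    sleep=lambda seconds: None)
    print("detached after %d attempt(s)" % used)
```

Raising instead of returning silently lets the caller skip releasing the backing storage, which is the step that caused the guest I/O errors in this report. Note that, per the observation above, retrying alone did not unstick libvirt, so a loop like this would mainly turn a silent failure into a visible one.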

jimmy.zhao (jimmy-zhao) on 2017-03-09

Changed in nova:
status:	Confirmed → In Progress

Takashi Natsume (natsume-takashi) on 2017-06-05

Changed in nova:
status:	In Progress → Confirmed

Sean Dague (sdague) wrote on 2017-06-27:

Automatically discovered version icehouse in description. If this is incorrect, please update the description to include 'nova version: ...'

tags: added: openstack-version.icehouse
