extend_volume for libvirt / iscsi volumes fails due to faulty debug code

Bug #1936439 reported by MarkMielke
This bug affects 3 people
Affects                  | Status       | Importance | Assigned to | Milestone
OpenStack Compute (nova) | Fix Released | Undecided  | MarkMielke  |

Bug Description

Python module nova/virt/libvirt/volume/iscsi.py has the following code:

    def extend_volume(self, connection_info, instance, requested_size):
        """Extend the volume."""
        LOG.debug("calling os-brick to extend iSCSI Volume", instance=instance)
        new_size = self.connector.extend_volume(connection_info['data'])
        LOG.debug("Extend iSCSI Volume %s; new_size=%s",
                  connection_info['data']['device_path'],
                  new_size, instance=instance)
        return new_size

When device_path is not present in connection_info['data'], the code above fails with a KeyError in the second LOG.debug() call:

2021-07-15 16:03:41.137 1546583 WARNING nova.compute.manager [req-9fee5153-b004-4606-800a-bb82cb87eeb9 2fbdb548b5444008b47cf373ae16aeeb 0d40f63055ab45c6975233bdbe8737ac - default default] [instance: 96e18906-3d4f-4f77-890c-53d8ea59e26b] Extend volume failed, volume_id=bd6cf322-da11-4f23-bb77-d92e83cda0fe, reason: 'device_path'
2021-07-15 16:03:41.170 1546583 ERROR oslo_messaging.rpc.server [req-9fee5153-b004-4606-800a-bb82cb87eeb9 2fbdb548b5444008b47cf373ae16aeeb 0d40f63055ab45c6975233bdbe8737ac - default default] Exception during message handling: KeyError: 'device_path'
...
2021-07-15 16:03:41.170 1546583 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/volume/iscsi.py", line 88, in extend_volume
2021-07-15 16:03:41.170 1546583 ERROR oslo_messaging.rpc.server connection_info['data']['device_path'],
2021-07-15 16:03:41.170 1546583 ERROR oslo_messaging.rpc.server KeyError: 'device_path'
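The failure does not depend on debug logging being enabled. Python evaluates the arguments to LOG.debug() before the call happens, so the dictionary lookup raises even when DEBUG messages are filtered out. A minimal standalone reproduction, assuming the standard logging module in place of oslo.log and a plain dict standing in for connection_info (the `target_portal` value and the `log_extend` function are hypothetical, for illustration only):

```python
import logging

logging.basicConfig(level=logging.WARNING)  # DEBUG messages are filtered out
LOG = logging.getLogger(__name__)

# Hypothetical connection_info whose 'data' dict has no 'device_path' key,
# as reported after a live migration.
connection_info = {"data": {"target_portal": "192.0.2.10:3260"}}


def log_extend(connection_info, new_size):
    # The subscript is evaluated before LOG.debug() runs, so the KeyError
    # is raised regardless of the configured log level.
    LOG.debug("Extend iSCSI Volume %s; new_size=%s",
              connection_info["data"]["device_path"], new_size)
    return new_size


try:
    log_extend(connection_info, 2 * 1024 ** 3)
except KeyError as exc:
    print("KeyError:", exc)  # prints: KeyError: 'device_path'
```

This is why lowering the log level does not work around the bug; only changing the arguments passed to the log call does.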

If this code is commented out, the use case works correctly. Other code in Nova and os-brick also acknowledges that device_path may not be set, such as this check in nova/virt/libvirt/driver.py:

                # NOTE(lyarwood): Find the path to provide to qemu-img
                if 'device_path' in connection_info['data']:
                    path = connection_info['data']['device_path']

It seems like this is a leftover code path that does not handle the case where device_path must be derived at runtime and is not captured in connection_info['data'].
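The membership check quoted from driver.py can be applied to any log or diagnostic message that mentions the device path. A small sketch, assuming a plain dict for connection_info; the helper name `volume_label` and the placeholder string are hypothetical and not part of Nova:

```python
def volume_label(connection_info):
    """Return a printable label for a volume without assuming device_path.

    Hypothetical helper, not part of Nova: it mirrors the
    "'device_path' in connection_info['data']" check from
    nova/virt/libvirt/driver.py instead of indexing the key directly.
    """
    data = connection_info.get("data", {})
    if "device_path" in data:
        return data["device_path"]
    # The path may only be derivable at runtime (e.g. by os-brick),
    # so a missing key is treated as normal rather than as an error.
    return "<device_path not set>"


print(volume_label({"data": {"device_path": "/dev/sdb"}}))  # prints: /dev/sdb
print(volume_label({"data": {}}))  # prints: <device_path not set>
```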

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/801003

Changed in nova:
status: New → In Progress
MarkMielke (mark-mielke)
Changed in nova:
assignee: nobody → MarkMielke (mark-mielke)
MarkMielke (mark-mielke) wrote :

After it is merged to master, please backport it to prior releases. I have been applying local fixes for the last few releases, and finally decided to open this issue and submit a fix to benefit others. We use SolidFire volumes, and this seems to be what leaves device_path missing under libvirt / iSCSI. os-brick generates a device path as needed.

Lee Yarwood (lyarwood) wrote :

Can you update the bug with an example `openstack server event list $instance` where you see connection_info['data']['device_path'] unset?

MarkMielke (mark-mielke) wrote :

I wondered the same thing myself. I took a guess, and it seems the problem is introduced by live migration. I confirmed I could live-extend the volume by +1G before any migration. But after one live migration, I reproduced the problem. Here is the output you asked for:

+------------------------------------------+--------------------------------------+----------------+----------------------------+
| Request ID | Server ID | Action | Start Time |
+------------------------------------------+--------------------------------------+----------------+----------------------------+
| req-34d9edb8-c96a-4ae3-8eb4-e15368206f4e | 6f6486e3-9c8a-41a5-9efe-26898bb73079 | extend_volume | 2021-07-29T04:52:56.000000 |
| req-70908b98-37ca-484f-ace0-a7152b4cdc5a | 6f6486e3-9c8a-41a5-9efe-26898bb73079 | live-migration | 2021-07-29T04:51:32.000000 |
| req-65f68e19-94e0-4ab0-ab69-39d44d79bacb | 6f6486e3-9c8a-41a5-9efe-26898bb73079 | extend_volume | 2021-07-29T04:51:01.000000 |
| req-b20ed3f1-db1c-431b-96bd-12b30a4dba35 | 6f6486e3-9c8a-41a5-9efe-26898bb73079 | create | 2021-07-26T02:53:47.000000 |
+------------------------------------------+--------------------------------------+----------------+----------------------------+

I would note, though, that while I agree there are *two* bugs rather than one, I still have to deal with device_path being missing on basically all of my instances. :-) So the debug statement may be a clue to a different problem, but a debug statement causing a failure is a problem in its own right.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/801003
Committed: https://opendev.org/openstack/nova/commit/ad60f23be3d562422b350aade04aa92ade39fb32
Submitter: "Zuul (22348)"
Branch: master

commit ad60f23be3d562422b350aade04aa92ade39fb32
Author: Mark Mielke <email address hidden>
Date: Thu Jul 15 18:34:36 2021 -0400

    extend_volume of libvirt/volume/iscsi should not use device_path

    The connection_info['data']['device_path'] field is not always
    available. In cases that it was not available, it would cause
    the debug code to raise a KeyError instead of proceeding.

    Other similar debug messages in the same file do not include
    device_path. As a simple fix, just drop the device_path from
    the log.

    Closes-Bug: #1936439

    Change-Id: Id0539d2ee909d86ffef07ae566697db8ae0f83b4
    Signed-off-by: Mark Mielke <email address hidden>
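As the commit message describes, the fix removes device_path from the log message so that only values that are always present are logged. A rough sketch of that shape, assuming the standard logging module instead of oslo.log (so the `instance=` keyword argument from the original is dropped, since stdlib loggers do not accept it) and a hypothetical `FakeConnector` standing in for the os-brick connector. This is an illustration, not the merged Nova code:

```python
import logging

LOG = logging.getLogger(__name__)


class FakeConnector:
    """Hypothetical stand-in for an os-brick connector (illustration only)."""

    def __init__(self, new_size):
        self.new_size = new_size

    def extend_volume(self, connection_data):
        # Pretend the backend reports this size after the extend.
        return self.new_size


class ISCSIVolumeSketch:
    """Hypothetical sketch, not Nova's actual libvirt iSCSI volume driver."""

    def __init__(self, connector):
        self.connector = connector

    def extend_volume(self, connection_info, instance, requested_size):
        """Extend the volume, logging only values that are always present."""
        LOG.debug("calling os-brick to extend iSCSI Volume")
        new_size = self.connector.extend_volume(connection_info["data"])
        # device_path is intentionally left out: it may be absent from
        # connection_info['data'], for example after a live migration.
        LOG.debug("Extend iSCSI Volume; new_size=%s", new_size)
        return new_size
```

With this shape, the call succeeds whether or not device_path is present, because nothing in the log arguments depends on it.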

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 24.0.0.0rc1

This issue was fixed in the openstack/nova 24.0.0.0rc1 release candidate.

Telmo Morais (shipsnbolts) wrote:

I'm using nova 23.1.1 (Wallaby), and the same bug happens.
Commenting out the debug line allows the process to finish without error.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/836604

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/836605

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/836606

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/836607

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/865214

rune32bit (none2021) wrote :

The connection_info['data']['device_path'] field is lost when an
instance is live-migrated, at the pre_live_migration stage. This
causes the debug code to raise a KeyError instead of proceeding.

As a fix, store device_path in the bdm table.

Review: https://review.opendev.org/c/openstack/nova/+/865214

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by "xielijie <xielijay@126.com>" on branch: master
Review: https://review.opendev.org/c/openstack/nova/+/865214
Reason: https://review.opendev.org/c/openstack/nova/+/843680 solved

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/train)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/836607
Reason: stable/train branch of nova projects' have been tagged as End of Life. All open patches have to be abandoned in order to be able to delete the branch.
