A multiattach volume that is in-use cannot be detached after being retyped to another backend

Bug #1994018 reported by shiyawei
This bug affects 1 person
Affects: Cinder
Status: New
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

First step: we create a multiattach-capable volume whose volume type uses the Ceph backend;
second step: we attach it to a VM;
third step: while the volume is in-use, we retype it to another backend, such as Huawei;
fourth step: we try to detach the volume from the VM, but the detach fails and we hit the bug (a rough client-side sketch of these steps follows below).
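For reference, a rough sketch of these steps using python-cinderclient. The credentials, the volume type names (ceph-multiattach, huawei-iscsi) and the attach step are placeholders and assumptions, not values taken from this report:

    from keystoneauth1 import loading, session
    from cinderclient import client as cinder_client

    # Build an authenticated session; all credentials below are placeholders.
    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(
        auth_url='http://controller:5000/v3',
        username='admin', password='secret', project_name='admin',
        user_domain_name='Default', project_domain_name='Default')
    cinder = cinder_client.Client('3.50', session=session.Session(auth=auth))

    # Step 1: create a volume whose (Ceph-backed) type has multiattach enabled.
    vol = cinder.volumes.create(size=1, name='repro-multiattach',
                                volume_type='ceph-multiattach')

    # Step 2: attach the volume to a VM through Nova, e.g.
    #   openstack server add volume <server> <volume>

    # Step 3: while the volume is in-use, retype it to another backend.
    cinder.volumes.retype(vol, 'huawei-iscsi', 'on-demand')

    # Step 4: detaching the volume from the VM now fails with VolumeNotFound.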

The error in the nova-compute log looks like this:

./nova-compute.log:2022-07-13 14:41:57.299 7 ERROR nova.virt.block_device [req-d4796bec-486d-4d02-ad0e-c507484b315a 25d564a674bc40f8aecb1ceeae3ff61d 1c599ef556bd40039b5dbf464ff418ab - default default] [instance: ec8fa5d4-bf6e-4329-9473-593ba19b53ee] Failed to detach volume 27feb4bf-aa9e-4209-907c-822480fa6234 from /dev/vdb: nova.exception.VolumeNotFound: Volume 9b7b3b2c-6164-411f-8a4a-11c7751e1dfc could not be found.

We located where this bug happens in the code (nova/virt/libvirt/driver.py):

    def _should_disconnect_target(self, context, instance, multiattach,
                                  vol_driver, volume_id):
        # NOTE(jdg): Multiattach is a special case (not to be confused
        # with shared_targets). With multiattach we may have a single volume
        # attached multiple times to *this* compute node (ie Server-1 and
        # Server-2). So, if we receive a call to delete the attachment for
        # Server-1 we need to take special care to make sure that the Volume
        # isn't also attached to another Server on this Node. Otherwise we
        # will indiscriminantly delete the connection for all Server and that's
        # no good. So check if it's attached multiple times on this node
        # if it is we skip the call to brick to delete the connection.
        if not multiattach:
            return True

        # NOTE(deiter): Volume drivers using _HostMountStateManager are another
        # special case. _HostMountStateManager ensures that the compute node
        # only attempts to mount a single mountpoint in use by multiple
        # attachments once, and that it is not unmounted until it is no longer
        # in use by any attachments. So we can skip the multiattach check for
        # volume drivers that based on LibvirtMountedFileSystemVolumeDriver.
        if isinstance(vol_driver, fs.LibvirtMountedFileSystemVolumeDriver):
            return True

        connection_count = 0
        volume = self._volume_api.get(context, volume_id)

The failing call is: volume = self._volume_api.get(context, volume_id)

This call cannot find the volume, because volume_id comes from the connection_info that Nova stored for the attachment. Once the volume has been retyped to another backend, the volume_id in the connection_info is changed to the new id, but Cinder only keeps the old volume id; in the connection_info the old id only survives in the serial field.

for example:

When the Ceph volume is attached to a VM, its connection_info is:

connection_info: {"driver_volume_type": "rbd", "data": {"name": "volumes/volume-cfbd9c4d-90b3-4118-9405-9b5912e6614a", "hosts": ["192.168.167.218", "192.168.167.219", "192.168.167.220"], "ports": ["6789", "6789", "6789"], "cluster_name": "ceph", "auth_enabled": true, "auth_username": "admin", "secret_type": "ceph", "secret_uuid": "e8556aec-5b56-42f9-bdf8-b43c21f0c8d7", "volume_id": "cfbd9c4d-90b3-4118-9405-9b5912e6614a", "discard": true, "keyring": null, "qos_specs": null, "access_mode": "rw", "encrypted": false}, "status": "reserved", "instance": "bd0b7b08-0c4c-46e7-8810-472f7f765f6f", "attached_at": "", "detached_at": "", "volume_id": "cfbd9c4d-90b3-4118-9405-9b5912e6614a", "serial": "cfbd9c4d-90b3-4118-9405-9b5912e6614a", "multiattach": true}

After we retype it to the Huawei backend, its connection_info is:
{"driver_volume_type": "iscsi", "data": {"target_discovered": false, "hostlun_id": 5, "mappingview_id": "47", "lun_id": "2091", "target_iqn": "iqn.2006-08.com.huawei:oceanstor:2100ccbbfef80d62::20000:172.16.0.249", "target_portal": "172.16.0.249:3260", "target_lun": 5, "qos_specs": null, "access_mode": "rw", "encrypted": false, "device_path": "/dev/disk/by-id/scsi-36ccbbfe100f80d6276afb5410000082b"}, "status": "reserved", "instance": "bd0b7b08-0c4c-46e7-8810-472f7f765f6f", "attached_at": "", "detached_at": "", "volume_id": "93010a1b-9bbe-4222-a4a4-bf7392bde116", "multiattach": true, "serial": "cfbd9c4d-90b3-4118-9405-9b5912e6614a"}

We can see that after the retype the volume_id in the connection_info has changed from cfbd9c4d-90b3-4118-9405-9b5912e6614a to 93010a1b-9bbe-4222-a4a4-bf7392bde116, while the serial field still holds the original id.
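A minimal sketch of one possible mitigation on the Nova side, assuming a hypothetical helper name _lookup_volume (not an actual patch): fall back to the original id kept in the serial field when the id from connection_info no longer exists in Cinder.

    from nova import exception

    def _lookup_volume(self, context, connection_info, volume_id):
        # Hypothetical helper, for illustration only: try the id stored in
        # connection_info first, then fall back to the original volume id
        # that survives the retype in the 'serial' field.
        try:
            return self._volume_api.get(context, volume_id)
        except exception.VolumeNotFound:
            serial = connection_info.get('serial')
            if serial and serial != volume_id:
                return self._volume_api.get(context, serial)
            raise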

Revision history for this message
Sofia Enriquez (lsofia-enriquez) wrote :

Hi shiyawei,
would you mind sharing the cinder version you're using?
Thanks in advance.
Sofia

Changed in cinder:
importance: Undecided → Medium
tags: added: attached in-use migrate multiattach retype
Revision history for this message
Gorka Eguileor (gorka) wrote :

This seems to be a bug in the cinder/volume/api.py code:

        if src_is_multiattach != tgt_is_multiattach:
            if volume.status != "available":
                msg = _('Invalid volume_type passed, retypes affecting '
                        'multiattach are only allowed on available volumes, '
                        'the specified volume however currently has a status '
                        'of: %s.') % volume.status
                LOG.info(msg)
                raise exception.InvalidInput(reason=msg)

            # If they are retyping to a multiattach capable, make sure they
            # are allowed to do so.
            if tgt_is_multiattach:
                context.authorize(vol_policy.MULTIATTACH_POLICY,
                                  target_obj=volume)

According to what was agreed, the code should be:

        if src_is_multiattach and volume.status != "available":
            msg = _('Invalid volume_type passed, retypes affecting '
                    'multiattach are only allowed on available volumes, '
                    'the specified volume however currently has a status '
                    'of: %s.') % volume.status
            LOG.info(msg)
            raise exception.InvalidInput(reason=msg)

        # If they are retyping to a multiattach capable, make sure they
        # are allowed to do so.
        if tgt_is_multiattach:
            context.authorize(vol_policy.MULTIATTACH_POLICY,
                              target_obj=volume)

But that would prevent any multiattach volume from being retyped while in-use. What we really should do is add a condition to the DB conditional update so that the retype is only blocked when the source volume is multiattach and has more than one attachment.
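A simplified, Python-level illustration of that intent (not the agreed patch, which would express the condition as part of the race-free DB conditional update). The volume.volume_attachment attribute is assumed here to expose the volume's current attachments:

        # Illustration only: block an in-use retype of a multiattach volume
        # only when it actually has more than one attachment.
        if src_is_multiattach and volume.status != "available":
            if len(volume.volume_attachment) > 1:
                msg = _('Retype of an in-use multiattach volume is only '
                        'allowed while it has a single attachment; this '
                        'volume currently has %d attachments.'
                        ) % len(volume.volume_attachment)
                LOG.info(msg)
                raise exception.InvalidInput(reason=msg)

The real fix would fold this attachment-count condition into the conditional update filters so it is evaluated atomically in the database.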

Revision history for this message
shiyawei (shiyawei) wrote :

Hi Gorka,
This bug does not happen in that retype check, because the new volume type we created also supports multiattach; since src_is_multiattach and tgt_is_multiattach are then equal, the check is skipped and the in-use retype is allowed (see the small illustration below).
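To make that concrete, in this scenario both the source (Ceph) and the target (Huawei) volume types have multiattach enabled, so the guard quoted above is never entered:

    # Illustration: with both types multiattach-capable, the condition that
    # protects in-use retypes evaluates to False and the whole block is skipped.
    src_is_multiattach = True   # Ceph type is multiattach
    tgt_is_multiattach = True   # new Huawei type is also multiattach
    assert not (src_is_multiattach != tgt_is_multiattach)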
