We definitely should not allow a delete to fail from a user's perspective.
My suggestion of a patch to not delete the attachment when detach fails during instance delete (if delete_on_termination=False) is intended to be better than what we have today, not necessarily perfect.
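To illustrate the idea, here is a rough sketch of what that could look like in the detach path of instance delete. This is not actual nova code; the helper name and call site are made up for illustration, and the real change would live wherever the compute manager drives the detach today:

    # Rough sketch only -- the helper name is invented for illustration.
    def _detach_volume_on_delete(self, context, bdm, instance):
        try:
            # This is the detach that can fail today.
            self._detach_volume(context, bdm, instance,
                                destroy_bdm=bdm.delete_on_termination)
        except Exception:
            LOG.warning('Failed to detach volume %(vol)s while deleting '
                        'instance %(inst)s',
                        {'vol': bdm.volume_id, 'inst': instance.uuid},
                        instance=instance)
            if not bdm.delete_on_termination:
                # Proposed behavior: on detach failure, keep the Cinder
                # attachment record instead of deleting it, so the volume
                # is not left looking detached while it may still be
                # connected on the host; cleanup can happen later.
                return
        if bdm.attachment_id:
            # Only remove the attachment when detach succeeded or the
            # volume is going to be deleted with the instance anyway.
            self.volume_api.attachment_delete(context, bdm.attachment_id)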
We could consider doing a periodic like Dan mentions. We already do something similar with our "cleanup running deleted instances" periodic. The volume attachment cleanup could be hooked into that if it doesn't already do it.
From what I can tell, our periodic is already capable of taking care of it, but it's not enabled [1][2]:
elif action == 'reap':
    LOG.info("Destroying instance with name label "
             "'%s' which is marked as "
             "DELETED but still present on host.",
             instance.name, instance=instance)
    bdms = objects.BlockDeviceMappingList.get_by_instance_uuid(
        context, instance.uuid, use_slave=True)
    self.instance_events.clear_events_for_instance(instance)
    try:
        self._shutdown_instance(context, instance, bdms,
                                notify=False)
        self._cleanup_volumes(context, instance, bdms,
                              detach=False)

def _cleanup_volumes(self, context, instance, bdms, raise_exc=True,
                     detach=True):
    original_exception = None
    for bdm in bdms:
        if detach and bdm.volume_id:
            try:
                LOG.debug("Detaching volume: %s", bdm.volume_id,
                          instance_uuid=instance.uuid)
                destroy = bdm.delete_on_termination
                self._detach_volume(context, bdm, instance,
                                    destroy_bdm=destroy)
            except Exception as exc:
                original_exception = exc
                LOG.warning('Failed to detach volume: %(volume_id)s '
                            'due to %(exc)s',
                            {'volume_id': bdm.volume_id, 'exc': exc})

        if bdm.volume_id and bdm.delete_on_termination:
            try:
                LOG.debug("Deleting volume: %s", bdm.volume_id,
                          instance_uuid=instance.uuid)
                self.volume_api.delete(context, bdm.volume_id)
            except Exception as exc:
                original_exception = exc
                LOG.warning('Failed to delete volume: %(volume_id)s '
                            'due to %(exc)s',
                            {'volume_id': bdm.volume_id, 'exc': exc})

    if original_exception is not None and raise_exc:
        raise original_exception
Currently we're calling _cleanup_volumes with detach=False. I'm not sure of the reason for that, but if we determine there are no problems with it, we could change it to detach=True, in combination with not deleting the attachment on instance delete when delete_on_termination=False.
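In code terms that would just mean flipping the flag in the 'reap' branch quoted above, along these lines (a sketch of the change, not a tested patch):

    self._shutdown_instance(context, instance, bdms,
                            notify=False)
    # Proposed: also detach leftover volumes here, instead of only
    # deleting the delete_on_termination ones.
    self._cleanup_volumes(context, instance, bdms,
                          detach=True)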
[1] https://github.com/openstack/nova/blob/a2964417822bd1a4a83fa5c27282d2be1e18868a/nova/compute/manager.py#L10579
[2] https://github.com/openstack/nova/blob/a2964417822bd1a4a83fa5c27282d2be1e18868a/nova/compute/manager.py#L3183-L3211