ImageNotFound error occurs after live migration

Bug #1920886 reported by Nobuhiro MIKI
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: Medium
Assigned to: Nobuhiro MIKI

Bug Description

Description
===========

After a live migration with a shared instance path, stopping the instance,
deleting the glance image, and then starting the instance again unexpectedly
raises an ImageNotFound error. The issue can be reproduced in a typical devstack
configuration. The reproduction procedure and log messages are described in
detail in the sections below.

Steps to reproduce
==================

Set up two hosts by following the devstack multi-node lab guide [1]. In this
example the first host is named node1 and the second host node2. Mount the same
NFS export on node1 and node2 so that the instance path
/opt/stack/data/nova/instances/ is shared.

$ cat <<EOF | sudo tee -a /etc/fstab
{nfs server ip}:/ /opt/stack/data/nova/instances nfs4 defaults,nofail 0 0
EOF
$ sudo mount -a -v
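
As a quick sanity check (not part of the original procedure), it can help to
confirm on both hosts that the instance path is really backed by the same NFS
export before running stack.sh:

$ findmnt /opt/stack/data/nova/instances
# node1 and node2 should both report the same {nfs server ip}:/ source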

$ git clone https://github.com/openstack/devstack.git
$ cd devstack
$ ./stack.sh

$ nova boot --flavor m1.small --image cirros-0.5.2-x86_64-disk \
            --nic net-id=$(neutron net-list | awk '/private/ {print $2;}') \
            --availability-zone=nova:node1 nmiki.test
$ nova live-migration 22a86da9-f6db-493d-aca7-3fcdc2d318a8 node2
$ nova stop 22a86da9-f6db-493d-aca7-3fcdc2d318a8
$ qemu-img info /opt/stack/data/nova/instances/22a86da9-f6db-493d-aca7-3fcdc2d318a8/disk
$ glance image-delete 6291c220-4bb5-48e6-9071-2ddb0c09a61e   # the cirros image the instance was booted from
$ nova start 22a86da9-f6db-493d-aca7-3fcdc2d318a8
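
(Optional, not part of the original steps.) After the live-migration command
above, the instance's placement can be double-checked before stopping it, using
the same admin credentials already needed for the migration:

$ nova show 22a86da9-f6db-493d-aca7-3fcdc2d318a8 | grep hypervisor_hostname
# OS-EXT-SRV-ATTR:hypervisor_hostname should report node2 once the live migration has completed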

Expected result
===============

All of the commands complete successfully and, at the end, the VM is running.
In particular, deleting the glance image should not matter, because the instance
disk and its cached backing file already exist on the shared storage.

Actual result
=============

The `nova start 22a86da9-f6db-493d-aca7-3fcdc2d318a8` command fails with a
nova.exception.ImageNotFound error.

Looking at the output of the qemu-img command, the backing file of the instance
disk still points at node1's image cache directory even after the live migration
has finished.
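
For reference, the relevant line of that qemu-img output looks roughly like the
following in this environment (the path is inferred from the directory listing
shown in the comments below; treat it as illustrative rather than verbatim):

$ qemu-img info /opt/stack/data/nova/instances/22a86da9-f6db-493d-aca7-3fcdc2d318a8/disk
...
backing file: /opt/stack/data/nova/instances/_base_192.168.0.1/1da480fd0bd6d365cce06421659d922b1bbed441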

Since few environments combine a shared NFS instance path with a per-node
image_cache.subdirectory_name, I expect only a small number of users to be
affected.

Environment
===========

- Ubuntu 20.04.1 LTS
- devstack (commit e049a319701ee77b27b9382bdce8e175abf65eed)
- nova (commit 3de7fb7c327db348d04d15d4cd3c4f811a336126)

Configs
=======

# local.conf for node1
[[local|localrc]]
ADMIN_PASSWORD=nomoresecret
DATABASE_PASSWORD=stackdb
RABBIT_PASSWORD=stackqueue
SERVICE_PASSWORD=$ADMIN_PASSWORD
LOGFILE=$DEST/logs/stack.sh.log
LOGDAYS=2
SWIFT_HASH=66a3d6b56c1f479c8b4e70ab5c2000f5
SWIFT_REPLICAS=1
SWIFT_DATA_DIR=$DEST/data
[[post-config|$NOVA_CONF]]
[image_cache]
subdirectory_name = _base_\\\$my_ip
remove_unused_original_minimum_age_seconds = 180

# local.conf for node2
[[local|localrc]]
ADMIN_PASSWORD=nomoresecret
DATABASE_PASSWORD=stackdb
MYSQL_PASSWORD=$DATABASE_PASSWORD
RABBIT_PASSWORD=stackqueue
SERVICE_PASSWORD=$ADMIN_PASSWORD
DATABASE_TYPE=mysql
SERVICE_HOST={node1 ip}
MYSQL_HOST=$SERVICE_HOST
RABBIT_HOST=$SERVICE_HOST
GLANCE_HOSTPORT=$SERVICE_HOST:9292
ENABLED_SERVICES=n-cpu,q-agt,c-vol,placement-client
[[post-config|$NOVA_CONF]]
[image_cache]
subdirectory_name = _base_\\\$my_ip
remove_unused_original_minimum_age_seconds = 180

Logs
====

$ sudo journalctl -u devstack@n-cpu
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/image/glance.py", line 375, in download
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server image_chunks = self._client.call(
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/image/glance.py", line 190, in call
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server result = getattr(controller, method)(*args, **kwargs)
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/glanceclient/common/utils.py", line 628, in inner
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server return RequestIdProxy(wrapped(*args, **kwargs))
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/glanceclient/v2/images.py", line 249, in data
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server resp, image_meta = self.http_client.get(url)
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/keystoneauth1/adapter.py", line 395, in get
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server return self.request(url, 'GET', **kwargs)
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/glanceclient/common/http.py", line 380, in request
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server return self._handle_response(resp)
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/glanceclient/common/http.py", line 120, in _handle_response
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server raise exc.from_response(resp, resp.content)
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server glanceclient.exc.HTTPNotFound: HTTP 404 Not Found: No image found with ID 6291c220-4bb5-48e6-9071-2ddb0c09a61e
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred:
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/exception_wrapper.py", line 71, in wrapped
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server _emit_versioned_exception_notification(
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server self.force_reraise()
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server raise self.value
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/exception_wrapper.py", line 63, in wrapped
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server return f(self, context, *args, **kw)
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 183, in decorated_function
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server LOG.warning("Failed to revert task state for instance. "
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server self.force_reraise()
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server raise self.value
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 154, in decorated_function
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/utils.py", line 1434, in decorated_function
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 211, in decorated_function
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server compute_utils.add_instance_fault_from_exc(context,
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server self.force_reraise()
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server raise self.value
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 200, in decorated_function
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 3149, in start_instance
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server self._power_on(context, instance)
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 3117, in _power_on
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server self.driver.power_on(context, instance,
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3614, in power_on
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server self._hard_reboot(context, instance, network_info, block_device_info,
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3492, in _hard_reboot
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server self._create_images_and_backing(context, instance, instance_dir,
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 10061, in _create_images_and_backing
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server self._try_fetch_image_cache(disk,
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 9946, in _try_fetch_image_cache
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server image.cache(fetch_func=fetch_func,
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/imagebackend.py", line 274, in cache
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server self.create_image(fetch_func_sync, base, size,
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/imagebackend.py", line 637, in create_image
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server prepare_template(target=base, *args, **kwargs)
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py", line 360, in inner
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/imagebackend.py", line 271, in fetch_func_sync
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server fetch_func(target=target, *args, **kwargs)
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/utils.py", line 395, in fetch_image
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server images.fetch_to_raw(context, image_id, target, trusted_certs)
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/images.py", line 115, in fetch_to_raw
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server fetch(context, image_href, path_tmp, trusted_certs)
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/images.py", line 105, in fetch
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server IMAGE_API.download(context, image_href, dest_path=path,
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/image/glance.py", line 1295, in download
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server return session.download(context, image_id, data=data,
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/image/glance.py", line 378, in download
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server _reraise_translated_image_exception(image_id)
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/image/glance.py", line 1028, in _reraise_translated_image_exception
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server raise new_exc.with_traceback(exc_trace)
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/image/glance.py", line 375, in download
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server image_chunks = self._client.call(
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/image/glance.py", line 190, in call
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server result = getattr(controller, method)(*args, **kwargs)
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/glanceclient/common/utils.py", line 628, in inner
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server return RequestIdProxy(wrapped(*args, **kwargs))
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/glanceclient/v2/images.py", line 249, in data
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server resp, image_meta = self.http_client.get(url)
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/keystoneauth1/adapter.py", line 395, in get
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server return self.request(url, 'GET', **kwargs)
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/glanceclient/common/http.py", line 380, in request
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server return self._handle_response(resp)
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/glanceclient/common/http.py", line 120, in _handle_response
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server raise exc.from_response(resp, resp.content)
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server nova.exception.ImageNotFound: Image 6291c220-4bb5-48e6-9071-2ddb0c09a61e could not be found.
Mar 23 12:33:44 node2 nova-compute[393558]: ERROR oslo_messaging.rpc.server

[1] https://docs.openstack.org/devstack/latest/guides/multinode-lab.html

Revision history for this message
Lee Yarwood (lyarwood) wrote :

> subdirectory_name = _base_\\\$my_ip

Can you give an actual example of the directory you're suggesting here? I can't parse this, or tell whether it's under the shared directory used by the instance disks. Additionally, what's your use case for *not* hosting this on the share?

Revision history for this message
Nobuhiro MIKI (nmiki) wrote :

The following is an actual example. Referring to the nova documentation [1], subdirectory_name is set so that the image cache directory is split per node. Please note that the entire /opt/stack/data/nova/instances/ tree is shared.

I know this is a strange use case. I found this issue through trial and error while verifying live migration.

$ grep subdirectory_name /etc/nova/nova-cpu.conf
subdirectory_name = _base_$my_ip

$ find /opt/stack/data/nova/instances/
/opt/stack/data/nova/instances/
/opt/stack/data/nova/instances/locks
/opt/stack/data/nova/instances/locks/nova-storage-registry-lock
/opt/stack/data/nova/instances/locks/nova-1da480fd0bd6d365cce06421659d922b1bbed441
/opt/stack/data/nova/instances/compute_nodes
/opt/stack/data/nova/instances/22a86da9-f6db-493d-aca7-3fcdc2d318a8
/opt/stack/data/nova/instances/22a86da9-f6db-493d-aca7-3fcdc2d318a8/disk.info
/opt/stack/data/nova/instances/22a86da9-f6db-493d-aca7-3fcdc2d318a8/disk
/opt/stack/data/nova/instances/22a86da9-f6db-493d-aca7-3fcdc2d318a8/console.log
/opt/stack/data/nova/instances/_base_192.168.0.1 # for node1
/opt/stack/data/nova/instances/_base_192.168.0.1/1da480fd0bd6d365cce06421659d922b1bbed441
/opt/stack/data/nova/instances/_base_192.168.0.2 # for node2
/opt/stack/data/nova/instances/_base_192.168.0.2/1da480fd0bd6d365cce06421659d922b1bbed441

[1] https://github.com/openstack/nova/blob/3de7fb7c327db348d04d15d4cd3c4f811a336126/nova/conf/imagecache.py#L57

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :
Changed in nova:
assignee: nobody → Balazs Gibizer (balazs-gibizer)
assignee: Balazs Gibizer (balazs-gibizer) → Nobuhiro MIKI (nmiki)
status: New → Confirmed
importance: Undecided → Medium
Changed in nova:
status: Confirmed → In Progress
Revision history for this message
Lee Yarwood (lyarwood) wrote :

I'm still not convinced by this use case. You're using a shared instance path (/opt/stack/data/nova/instances/) with individual compute image caches hosted within it that are duplicating cached images?!

/opt/stack/data/nova/instances/_base_192.168.0.1/1da480fd0bd6d365cce06421659d922b1bbed441
/opt/stack/data/nova/instances/_base_192.168.0.2/1da480fd0bd6d365cce06421659d922b1bbed441

How is this better than hosting the cached image once for all computes?

Revision history for this message
Nobuhiro MIKI (nmiki) wrote :

> I'm still not convinced by this use case. You're using a shared instance path (/opt/stack/data/nova/instances/) with individual compute image caches hosted within it that are duplicating cached images?!

Yes, exactly. I'm sorry for making you deal with this inefficient use case.

This may sound silly, but I was trying to figure out what would happen
if I split the image cache directory per compute node, in the sense of
distributing I/O for software-defined storage.

Anyway, since nova allows this configuration, I think it would be
operator-friendly if it either worked as expected or at least left some message
in the log.
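
As a rough illustration of the condition such a message could flag, here is an
operator-side sketch (not nova code; the disk path, the node2 IP, and the
per-node cache layout are taken from this environment, and "backing-filename"
is the key qemu-img uses in its JSON output) that warns when an instance disk's
backing file lives outside the local node's cache directory:

$ DISK=/opt/stack/data/nova/instances/22a86da9-f6db-493d-aca7-3fcdc2d318a8/disk
$ LOCAL_CACHE=/opt/stack/data/nova/instances/_base_192.168.0.2   # node2's own cache
$ BACKING=$(qemu-img info --output=json "$DISK" \
    | python3 -c 'import json,sys; print(json.load(sys.stdin).get("backing-filename", ""))')
$ case "$BACKING" in
    "$LOCAL_CACHE"/*) echo "backing file is in the local cache" ;;
    *) echo "WARNING: backing file $BACKING is not under $LOCAL_CACHE" ;;
  esac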
