We have flavor classes that use different nova disk backends and are separated using host aggregates. For example, we have a flavor named l1.tiny which uses imagebackend, and s1.small which uses the rbd backend. The hypervisors configured for imagebackend are added to the host aggregate where l1.* instances are scheduled, and the rbd hypervisors are in an aggregate where s1.* instances are scheduled.
When resizing an instance from l1.tiny to s1.small, the instance fails to resize and enters error state. The root disk is also lost during the failed resize. The host of the instance is set to one of the s1.* aggregate HVs, and the imagebackend disk is no longer present on the original l1.* hypervisor.
The error provided in 'instance show' is:
| fault | {u'message': u'[errno 2] error opening image 5a8ab7a3-3e59-442c-a603-2c24652788cb_disk at snapshot None', u'code': 500, u'details': u' File "/openstack/venvs/nova-untagged/local/lib/python2.7/site-packages/nova/compute/manager.py", line 204, in decorated_function\n return function(self, context, *args, **kwargs)\n File "/openstack/venvs/nova-untagged/local/lib/python2.7/site-packages/nova/compute/manager.py", line 4062, in finish_resize\n self._set_instance_obj_error_state(context, instance)\n File "/openstack/venvs/nova-untagged/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__\n self.force_reraise()\n File "/openstack/venvs/nova-untagged/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise\n six.reraise(self.type_, self.value, self.tb)\n File "/openstack/venvs/nova-untagged/local/lib/python2.7/site-packages/nova/compute/manager.py", line 4050, in finish_resize\n disk_info, image_meta)\n File "/openstack/venvs/nova-untagged/local/lib/python2.7/site-packages/nova/compute/manager.py", line 4012, in _finish_resize\n old_instance_type)\n File "/openstack/venvs/nova-untagged/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__\n self.force_reraise()\n File "/openstack/venvs/nova-untagged/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise\n six.reraise(self.type_, self.value, self.tb)\n File "/openstack/venvs/nova-untagged/local/lib/python2.7/site-packages/nova/compute/manager.py", line 4007, in _finish_resize\n block_device_info, power_on)\n File "/openstack/venvs/nova-untagged/local/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7454, in finish_migration\n fallback_from_host=migration.source_compute)\n File "/openstack/venvs/nova-untagged/local/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3160, in _create_image\n fallback_from_host)\n File 
"/openstack/venvs/nova-untagged/local/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3264, in _create_and_inject_local_root\n backend.create_snap(libvirt_utils.RESIZE_SNAPSHOT_NAME)\n File "/openstack/venvs/nova-untagged/local/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 941, in create_snap\n return self.driver.create_snap(self.rbd_name, name)\n File "/openstack/venvs/nova-untagged/local/lib/python2.7/site-packages/nova/virt/libvirt/storage/rbd_utils.py", line 392, in create_snap\n with RBDVolumeProxy(self, str(volume), pool=pool) as vol:\n File "/openstack/venvs/nova-untagged/local/lib/python2.7/site-packages/nova/virt/libvirt/storage/rbd_utils.py", line 78, in __init__\n driver._disconnect_from_rados(client, ioctx)\n File "/openstack/venvs/nova-untagged/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__\n self.force_reraise()\n File "/openstack/venvs/nova-untagged/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise\n six.reraise(self.type_, self.value, self.tb)\n File "/openstack/venvs/nova-untagged/local/lib/python2.7/site-packages/nova/virt/libvirt/storage/rbd_utils.py", line 74, in __init__\n read_only=read_only))\n File "rbd.pyx", line 1392, in rbd.Image.__init__ (/build/ceph-12.2.2/obj-x86_64-linux-gnu/src/pybind/rbd/pyrex/rbd.c:13540)\n', u'created': u'2018-11-14T11:03:11Z'} |
We are currently seeing this behavior on Ocata. I'm not certain whether more recent nova releases are also affected.
Based on an IRC conversation, this is a more general bug than the title suggests.
This will likely affect resize and cold migration between any two hosts where the image backend changes,
e.g. any combination of lvm, rbd, and image where the value differs between the hosts.
This may or may not also affect live migration.
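
To make the affected cases concrete, here is a minimal sketch that lists which source/destination host pairs would cross an image backend boundary. It is not part of nova: the host names, the HOST_IMAGE_BACKENDS mapping, and both helper functions are hypothetical, and the mapping would have to be filled in by hand from each hypervisor's configured backend. The "qcow2" value for the l1.* host is an assumption, since the report only says "imagebackend".

```python
from itertools import permutations

# Hypothetical inventory: compute host -> configured image backend.
# Host names and values are illustrative only; "qcow2" for the l1.* host
# is an assumption, the report does not state the exact backend type.
HOST_IMAGE_BACKENDS = {
    "hv-l1-01": "qcow2",
    "hv-s1-01": "rbd",
    "hv-s1-02": "rbd",
    "hv-lvm-01": "lvm",
}


def backend_changes(source_host, dest_host, backends=HOST_IMAGE_BACKENDS):
    """Return True if a resize or cold migration between the two hosts
    would move the instance to a different image backend."""
    return backends[source_host] != backends[dest_host]


def risky_pairs(backends=HOST_IMAGE_BACKENDS):
    """Return all ordered (source, destination) host pairs whose image
    backends differ, i.e. the pairs this bug is suspected to affect."""
    return sorted(
        (src, dst)
        for src, dst in permutations(backends, 2)
        if backends[src] != backends[dst]
    )


if __name__ == "__main__":
    for src, dst in risky_pairs():
        print(f"{src} ({HOST_IMAGE_BACKENDS[src]}) -> "
              f"{dst} ({HOST_IMAGE_BACKENDS[dst]})")
```

A check like this could be run before a resize such as l1.tiny to s1.small to flag the move as unsafe until the bug is fixed.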