We still see this in CI runs periodically:

http://logs.openstack.org/70/552170/2/check/legacy-tempest-dsvm-multinode-live-migration/474e6b5/logs/screen-n-cpu.txt.gz?level=INFO#_Mar_13_22_24_03_425580

We can see that in this case, the instance in question has its files deleted right before the libvirt driver, via the update_available_resource periodic task, gets to processing that instance:
Mar 13 22:24:03.367217 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: INFO nova.virt.libvirt.driver [None req-d0181533-0010-4268-bea5-fda2a392e8f1 tempest-LiveMigrationRemoteConsolesV26Test-2100428245 tempest-LiveMigrationRemoteConsolesV26Test-2100428245] [instance: 0460cf87-16f1-4aa3-a964-7cab159327dc] Deleting instance files /opt/stack/data/nova/instances/0460cf87-16f1-4aa3-a964-7cab159327dc_del
Mar 13 22:24:03.367725 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: INFO nova.virt.libvirt.driver [None req-d0181533-0010-4268-bea5-fda2a392e8f1 tempest-LiveMigrationRemoteConsolesV26Test-2100428245 tempest-LiveMigrationRemoteConsolesV26Test-2100428245] [instance: 0460cf87-16f1-4aa3-a964-7cab159327dc] Deletion of /opt/stack/data/nova/instances/0460cf87-16f1-4aa3-a964-7cab159327dc_del complete
Mar 13 22:24:03.425580 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager [None req-d0181533-0010-4268-bea5-fda2a392e8f1 tempest-LiveMigrationRemoteConsolesV26Test-2100428245 tempest-LiveMigrationRemoteConsolesV26Test-2100428245] Error updating resources for node ubuntu-xenial-rax-iad-0002937850.: InvalidDiskInfo: Disk info file is invalid: qemu-img failed to execute on /opt/stack/data/nova/instances/0460cf87-16f1-4aa3-a964-7cab159327dc/disk : Unexpected error while running command.
Mar 13 22:24:03.425969 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /opt/stack/data/nova/instances/0460cf87-16f1-4aa3-a964-7cab159327dc/disk --force-share
Mar 13 22:24:03.426340 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: Exit code: 1
Mar 13 22:24:03.426705 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: Stdout: u''
Mar 13 22:24:03.427044 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: Stderr: u"qemu-img: Could not open '/opt/stack/data/nova/instances/0460cf87-16f1-4aa3-a964-7cab159327dc/disk': Could not open '/opt/stack/data/nova/instances/0460cf87-16f1-4aa3-a964-7cab159327dc/disk': No such file or directory\n"
Mar 13 22:24:03.427380 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager Traceback (most recent call last):
Mar 13 22:24:03.427724 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager File "/opt/stack/new/nova/nova/compute/manager.py", line 7320, in update_available_resource_for_node
Mar 13 22:24:03.428084 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager rt.update_available_resource(context, nodename)
Mar 13 22:24:03.428410 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 664, in update_available_resource
Mar 13 22:24:03.428653 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager resources = self.driver.get_available_resource(nodename)
Mar 13 22:24:03.428898 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 6432, in get_available_resource
Mar 13 22:24:03.429219 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager disk_over_committed = self._get_disk_over_committed_size_total()
Mar 13 22:24:03.429576 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 7977, in _get_disk_over_committed_size_total
Mar 13 22:24:03.429850 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager config, block_device_info)
Mar 13 22:24:03.430176 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 7896, in _get_instance_disk_info_from_config
Mar 13 22:24:03.430498 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager virt_size = disk_api.get_disk_size(path)
Mar 13 22:24:03.430768 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager File "/opt/stack/new/nova/nova/virt/disk/api.py", line 140, in get_disk_size
Mar 13 22:24:03.431081 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager return images.qemu_img_info(path).virtual_size
Mar 13 22:24:03.431316 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager File "/opt/stack/new/nova/nova/virt/images.py", line 82, in qemu_img_info
Mar 13 22:24:03.431546 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager raise exception.InvalidDiskInfo(reason=msg)
Mar 13 22:24:03.431785 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager InvalidDiskInfo: Disk info file is invalid: qemu-img failed to execute on /opt/stack/data/nova/instances/0460cf87-16f1-4aa3-a964-7cab159327dc/disk : Unexpected error while running command.
Mar 13 22:24:03.432017 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /opt/stack/data/nova/instances/0460cf87-16f1-4aa3-a964-7cab159327dc/disk --force-share
Mar 13 22:24:03.432266 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager Exit code: 1
Mar 13 22:24:03.432498 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager Stdout: u''
Mar 13 22:24:03.432742 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager Stderr: u"qemu-img: Could not open '/opt/stack/data/nova/instances/0460cf87-16f1-4aa3-a964-7cab159327dc/disk': Could not open '/opt/stack/data/nova/instances/0460cf87-16f1-4aa3-a964-7cab159327dc/disk': No such file or directory\n"
Mar 13 22:24:03.432981 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager
We should likely just catch the ProcessExecutionError, make a best-effort check of the exit code (1) and the error message for "No such file or directory", and if that's the case, skip that instance rather than failing the entire process of building the current set of resources for the host.
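As a rough sketch of that idea (the helper names below are hypothetical and only illustrate the shape of the check; the real change would live in the nova code paths shown in the traceback above):

import json

from oslo_concurrency import processutils


def get_virt_size_or_none(path):
    """Best-effort qemu-img info; returns None if the disk file vanished."""
    try:
        out, _ = processutils.execute('qemu-img', 'info',
                                      '--output=json', path)
    except processutils.ProcessExecutionError as e:
        # A concurrently deleted instance loses its files between the time
        # we list the domains and the time we inspect each disk; qemu-img
        # then exits with code 1 and ENOENT on stderr, as in the logs above.
        if e.exit_code == 1 and 'No such file or directory' in (e.stderr or ''):
            return None
        raise  # anything else is still a real failure
    return json.loads(out)['virtual-size']


def sum_virtual_sizes(disk_paths):
    # Skip instances whose files are already gone instead of failing the
    # whole update_available_resource run for the host.
    total = 0
    for path in disk_paths:
        size = get_virt_size_or_none(path)
        if size is None:
            continue  # instance is being deleted; ignore it this period
        total += size
    return total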
I thought about adding the deleted=False filter to this query:

https://github.com/openstack/nova/blob/96fdce7cab7c50736fb96bcaa622dac825031a2f/nova/virt/libvirt/driver.py#L7932
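For clarity, that would have looked roughly like this (a hypothetical sketch only; the actual query at the link above builds its filters from the driver's view of the host, and the deleted=False part is the addition being considered):

from nova import objects


def get_local_instances(ctx, driver_uuids):
    # Hypothetical: restrict the uuid lookup backing
    # _get_disk_over_committed_size_total() to non-deleted instances.
    filters = {'uuid': driver_uuids, 'deleted': False}
    return objects.InstanceList.get_by_filters(ctx, filters)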
But at this point libvirt has already told us the domain exists on the host, so filtering by deleted=False doesn't really help: as far as the hypervisor is concerned the instance isn't deleted yet, it's still on the host, but it's in the process of being deleted and its files have already been cleaned up by the time we hit the error.