When testing a Xeon server for virtual GPU support, I saw that Nova raises an exception because the i915 driver doesn't provide a name for its mdev types:
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager Traceback (most recent call last):
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/manager.py", line 9824, in _update_available_resource_for_node
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager startup=startup)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 896, in update_available_resource
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager self._update_available_resource(context, resources, startup=startup)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/oslo_concurrency/lockutils.py", line 360, in inner
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager return f(*args, **kwargs)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 981, in _update_available_resource
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager self._update(context, cn, startup=startup)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1233, in _update
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager self._update_to_placement(context, compute_node, startup)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/retrying.py", line 49, in wrapped_f
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager return Retrying(*dargs, **dkw).call(f, *args, **kw)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/retrying.py", line 206, in call
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager return attempt.get(self._wrap_exception)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/retrying.py", line 247, in get
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager six.reraise(self.value[0], self.value[1], self.value[2])
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/six.py", line 703, in reraise
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager raise value
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/retrying.py", line 200, in call
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1169, in _update_to_placement
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager self.driver.update_provider_tree(prov_tree, nodename)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 7857, in update_provider_tree
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager provider_tree, nodename, allocations=allocations)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 8250, in _update_provider_tree_for_vgpu
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager inventories_dict = self._get_gpu_inventories()
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 7028, in _get_gpu_inventories
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager count_per_dev = self._count_mdev_capable_devices(enabled_vgpu_types)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6984, in _count_mdev_capable_devices
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager types=enabled_vgpu_types)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 7268, in _get_mdev_capable_devices
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager device = self._get_mdev_capabilities_for_dev(name, types)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 7253, in _get_mdev_capabilities_for_dev
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager 'name': cap['name'],
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager KeyError: 'name'
For example, this mdev type directory on the host has no "name" attribute:
[root@mymachine ~]# ll /sys/class/mdev_bus/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V5_8/
total 0
-r--r--r--. 1 root root 4096 Sep 22 14:18 available_instances
--w-------. 1 root root 4096 Sep 23 06:01 create
-r--r--r--. 1 root root 4096 Sep 23 05:43 description
-r--r--r--. 1 root root 4096 Sep 22 14:18 device_api
drwxr-xr-x. 2 root root 0 Sep 23 06:01 devices
The kernel driver API documentation (https://www.kernel.org/doc/html/latest/driver-api/vfio-mediated-device.html) says that the "name" attribute is optional:
"name
This attribute should show human readable name. This is optional attribute."
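To illustrate what an optional attribute means for a consumer, here is a minimal Python sketch that reads one mdev type directory from sysfs and treats a missing "name" file as absent rather than as an error. The function name read_mdev_type and the returned dict layout are hypothetical (this is not Nova or libvirt code); only the attribute file names come from the directory listing above.

```python
import os


def read_mdev_type(type_dir):
    """Read the attributes of one mdev type directory (hypothetical helper)."""

    def read_required(attr):
        # Attributes we expect every driver to expose; let errors propagate.
        with open(os.path.join(type_dir, attr)) as f:
            return f.read().strip()

    def read_optional(attr):
        # Optional attributes: a missing file simply means "not provided".
        try:
            return read_required(attr)
        except FileNotFoundError:
            return None

    return {
        'type': os.path.basename(os.path.normpath(type_dir)),
        'name': read_optional('name'),
        'device_api': read_required('device_api'),
        'available_instances': int(read_required('available_instances')),
    }
```

On the host above, calling this on the i915-GVTg_V5_8 directory would give a dict whose 'name' is None instead of failing.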
The fix should be easy, since we don't use this attribute in Nova.
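A minimal sketch of the kind of change this implies, assuming the parsed capability is a plain dict (as the cap['name'] lookup in the traceback suggests). The helper name and the other keys shown are illustrative assumptions, not the actual Nova patch; the only point is that the optional key is no longer indexed directly.

```python
def mdev_type_from_cap(cap):
    """Build a type description from a parsed capability dict (hypothetical)."""
    return {
        # Assumed mandatory keys, for illustration only.
        'type': cap['type'],
        'availableInstances': cap['availableInstances'],
        # 'name' is optional in the kernel API, so cap['name'] can raise
        # KeyError; .get() returns None when the driver omits it.
        'name': cap.get('name'),
    }
```

Since the attribute is unused, dropping the 'name' entry entirely would work just as well as tolerating its absence.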
Fix proposed to branch: master
Review: https://review.opendev.org/753574