OpenStack Nova Compute NVIDIA vGPU Plugin Charm

Bug #2008146
Comment #10

Billy Olsen (billy-olsen) wrote on 2023-03-10:

The vGPU enablement support in the nova-compute-nvidia-gpu charm was designed around mediated devices, following the upstream Nova documentation and implementation described at https://docs.openstack.org/nova/zed/admin/virtual-gpu.html.

The GPU card's SR-IOV capabilities were not included in this work and will need to be worked into the plan in the future. Note that the mediated device bits were patched in the HWE kernel so that the mediated device approach continues to be supported.

Additionally, the OpenStack Nova code is designed to use mediated devices, setting the devices up again and attaching them to the right parent devices. Libvirt offers persistent mdev devices, but Nova does not currently leverage this.

The SR-IOV-formatted vGPU devices remain an untested feature from the OpenStack perspective at this point in time, and no guarantees are provided. The time-sliced mediated device options are the ones that were implemented.
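
To check which mediated device types a compute host currently exposes, something along these lines may help. It is only a sketch, assuming the kernel publishes mdev-capable parents under /sys/class/mdev_bus as the upstream Nova documentation describes. The function name and the optional sysfs-root argument are hypothetical and exist only to make dry runs possible.

```shell
#!/bin/sh
# Sketch: print "<parent PCI address> <mdev type> <available instances>"
# for every mediated device type found under the mdev bus.
# The optional first argument (a sysfs root) is a hypothetical hook for
# dry runs; on a real host it defaults to /sys.
list_mdev_types() {
    sysfs_root="${1:-/sys}"
    for type_dir in "$sysfs_root"/class/mdev_bus/*/mdev_supported_types/*; do
        # Skip the unexpanded glob when no mdev-capable device exists.
        [ -d "$type_dir" ] || continue
        parent=$(basename "$(dirname "$(dirname "$type_dir")")")
        mdev_type=$(basename "$type_dir")
        available=$(cat "$type_dir/available_instances" 2>/dev/null || echo "?")
        echo "$parent $mdev_type $available"
    done
}
```

Calling `list_mdev_types` with no argument reads the live /sys tree. If it prints nothing, the host is not exposing any mediated device types at the moment.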

-- Note --

The workaround provided in the description allows the card to be observed appropriately, and it can potentially be used. However, as odufourc noted in comment #7, it is not persisted across reboots of the server. This can be worked around by using Juju's cloud-init snippet for instances and running the command in the `per-boot` section.

You should be able to leverage this with the following snippet in the `per-boot` section:

# Enable SR-IOV on every NVIDIA PCI device (assumes PCI domain 0000)
for card in $(lspci -nn | awk '/NVIDIA/ {print "0000:" $1}'); do
    echo "Enabling SR-IOV on $card"
    /usr/lib/nvidia/sriov-manage -e "$card"
done
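
After a reboot, one way to confirm the per-boot snippet took effect is to read the virtual function count back from sysfs. This is a rough sketch, assuming the card exposes the standard PCI `sriov_numvfs` attribute once SR-IOV is enabled. The function name and the PCI address in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch: for each PCI address given, print "<address> <enabled VF count>",
# or "<address> no-sriov" when the sriov_numvfs attribute is not readable.
# The first argument is the sysfs root (normally /sys).
report_vfs() {
    sysfs_root="$1"
    shift
    for addr in "$@"; do
        numvfs_file="$sysfs_root/bus/pci/devices/$addr/sriov_numvfs"
        if [ -r "$numvfs_file" ]; then
            echo "$addr $(cat "$numvfs_file")"
        else
            echo "$addr no-sriov"
        fi
    done
}
```

For example, `report_vfs /sys 0000:41:00.0` would print the number of virtual functions currently enabled on that card. A count of 0 after a reboot suggests the per-boot command did not run.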