Comment 0 for bug 2008146

Andy Wu (qch2012) wrote: Charm does not create vGPU virtual functions

Tested this on a node with an NVIDIA Tesla A10 card and the vGPU software package nvidia-vgpu-ubuntu-525_525.85.07_amd64.deb.

Channel: yoga/stable
OS: jammy

After attaching the vGPU driver to nova-compute-nvidia-vgpu and rebooting the node, the nova-compute-nvidia-vgpu unit is active with the status "Unit is ready: NVIDIA GPU found; installed NVIDIA software: 525.85.07".

Running nvidia-smi on the node confirms that the driver is installed successfully:
   ubuntu@ps6-rb2-n1:~$ sudo nvidia-smi
   Thu Feb 23 01:20:24 2023
   +-----------------------------------------------------------------------------+
   | NVIDIA-SMI 525.85.07    Driver Version: 525.85.07    CUDA Version: N/A      |
   |-------------------------------+----------------------+----------------------+
   | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
   | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
   |                               |                      |               MIG M. |
   |===============================+======================+======================|
   |   0  NVIDIA A10          On   | 00000000:25:00.0 Off |                    0 |
   |  0%   33C    P8    22W / 150W |      0MiB / 23028MiB |      0%      Default |
   |                               |                      |                  N/A |
   +-------------------------------+----------------------+----------------------+

   +-----------------------------------------------------------------------------+
   | Processes:                                                                  |
   |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
   |        ID   ID                                                   Usage      |
   |=============================================================================|
   |  No running processes found                                                 |
   +-----------------------------------------------------------------------------+

However, the list-vgpu-types action returns no output:

ubuntu@ps6-infra1:~$ juju run-action --wait nova-compute-nvidia-vgpu/5 list-vgpu-types
unit-nova-compute-nvidia-vgpu-5:
  UnitId: nova-compute-nvidia-vgpu/5
  id: "346"
  results:
    output: ""
  status: completed

On the node, the GPU's PCI bus address is 25:00.0:
   ubuntu@ps6-rb2-n1:~$ lspci -nn | grep -i Nvidia
   25:00.0 3D controller [0302]: NVIDIA Corporation GA102GL [A10] [10de:2236] (rev a1)
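Plain lspci omits the PCI domain, while the sysfs path and the sriov-manage argument used below need the full form 0000:25:00.0. A minimal sketch, assuming `lspci -Dnn` (which prints the domain and numeric IDs) and the 10de vendor ID from the output above; nvidia_pci_addrs is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the full PCI address (domain:bus:device.function)
# of each NVIDIA device. Expects `lspci -Dnn` output on stdin; 10de is the
# vendor ID shown in the [10de:2236] field above.
nvidia_pci_addrs() {
    # Field 1 is the address; keep only lines with a [10de:xxxx] ID bracket.
    awk '/\[10de:/ { print $1 }'
}
```

Usage: `lspci -Dnn | nvidia_pci_addrs`. Reading stdin rather than calling lspci directly keeps the parsing testable against saved output.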

However, no virtual functions have been created under the device:

  cd /sys/bus/pci/devices/0000\:25\:00.0/
  ls | grep virtfn
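Besides listing the virtfn* links, the kernel exposes the standard SR-IOV attributes sriov_totalvfs (VFs the device supports) and sriov_numvfs (VFs currently enabled) in the same sysfs directory. A minimal sketch of both checks; the helper names vf_count and sriov_state and the overridable SYSFS_PCI root are hypothetical, added so the functions can be tried against a fake directory:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting the SR-IOV state of one PCI device.
# SYSFS_PCI defaults to the real sysfs tree and can be overridden for testing.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

# Print how many virtfnN entries the device currently has.
vf_count() {
    local dev="$SYSFS_PCI/$1" n=0 link
    for link in "$dev"/virtfn*; do
        # With no match the glob stays literal, so skip paths that do not exist.
        [ -e "$link" ] || [ -L "$link" ] || continue
        n=$((n + 1))
    done
    echo "$n"
}

# Print the kernel's SR-IOV counters ("n/a" if an attribute is missing).
sriov_state() {
    local dev="$SYSFS_PCI/$1" total enabled
    total=$(cat "$dev/sriov_totalvfs" 2>/dev/null) || total="n/a"
    enabled=$(cat "$dev/sriov_numvfs" 2>/dev/null) || enabled="n/a"
    echo "total=$total enabled=$enabled"
}
```

For example, `vf_count 0000:25:00.0` run before the manual step would be expected to print 0 on this node, matching the empty ls output.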

I have to create the virtual functions manually:
   /usr/lib/nvidia/sriov-manage -e 0000:25:00.0
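Until the charm creates the VFs itself, the manual step can be wrapped so it is safe to re-run. A sketch assuming only the sriov-manage path and the -e flag used above; ensure_vfs and the SRIOV_MANAGE/SYSFS_PCI overrides are hypothetical, and on a real node it has to run as root:

```shell
#!/usr/bin/env bash
# Hypothetical workaround wrapper: enable VFs with sriov-manage only when the
# device has none yet. SRIOV_MANAGE and SYSFS_PCI can be overridden for testing.
SRIOV_MANAGE="${SRIOV_MANAGE:-/usr/lib/nvidia/sriov-manage}"
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

ensure_vfs() {
    local addr="$1"
    # compgen -G succeeds if the glob matches at least one path.
    if compgen -G "$SYSFS_PCI/$addr/virtfn*" > /dev/null; then
        echo "$addr: virtual functions already present"
        return 0
    fi
    echo "$addr: no virtual functions, running $SRIOV_MANAGE -e $addr"
    "$SRIOV_MANAGE" -e "$addr"
}
```

For example, `ensure_vfs 0000:25:00.0`. Checking for virtfn entries first means a second run does nothing instead of invoking sriov-manage again.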

After that, the virtual functions are visible:
  ls | grep virtfn
   virtfn0
   virtfn1
   virtfn10
   virtfn11

Re-running list-vgpu-types now returns the available vGPU types:

ubuntu@ps6-infra1:~$ juju run-action --wait nova-compute-nvidia-vgpu/5 list-vgpu-types
unit-nova-compute-nvidia-vgpu-5:
  UnitId: nova-compute-nvidia-vgpu/5
  id: "348"
  results:
    output: |-
      nvidia-588, 0000:25:02.3, NVIDIA A10-1B, num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
      nvidia-589, 0000:25:02.3, NVIDIA A10-2B, num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12
      nvidia-590, 0000:25:02.3, NVIDIA A10-1Q, num_heads=4, frl_config=60, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
      nvidia-591, 0000:25:02.3, NVIDIA A10-2Q, num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=7680x4320, max_instance=12
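When scripting against this output, the type id for a given profile can be extracted with awk. A sketch, assuming the comma-separated layout shown above (type id, PCI address, profile name, then attributes); vgpu_type_for is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read list-vgpu-types output on stdin and print the
# type id and PCI address of every line whose profile matches $1.
vgpu_type_for() {
    local profile="$1"
    # Lines look like: nvidia-591, 0000:25:02.3, NVIDIA A10-2Q, num_heads=4, ...
    # Strip the YAML indentation from field 1, then compare field 3.
    awk -F', ' -v want="$profile" '{ sub(/^[ \t]+/, "", $1) } $3 == want { print $1, $2 }'
}
```

Usage: `juju run-action --wait nova-compute-nvidia-vgpu/5 list-vgpu-types | vgpu_type_for "NVIDIA A10-2Q"`; given the output above this prints `nvidia-591 0000:25:02.3`. The YAML header lines have no third field, so they never match.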