The allocation of VGPUs has a race problem
Bug #1836204 reported by Alex Xu
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Compute (nova) | Triaged | High | Alex Xu | |
Bug Description
VGPUs are allocated by this method: https:/
That method lists the assigned mdevs by listing the libvirt domains.
But if two concurrent requests come into this method, they will both see the same set of assigned mdevs, so they may both pick the same free mdev.
So there is a race window between:
https:/
and the point where we create the domain in libvirt:
https:/
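The check-then-act race described above can be reproduced with a small, self-contained simulation. This is a sketch, not nova code: `ALL_MDEVS`, `domains`, `list_assigned_mdevs()`, and `allocate_mdev()` are hypothetical stand-ins for the host's mdevs, the libvirt domains, listing those domains, and creating a domain. A `threading.Barrier` forces both requests to finish their lookup before either one "creates its domain", which makes the bad interleaving deterministic instead of timing-dependent.

```python
import threading

# Hypothetical stand-ins: mdevs on the host, and the "libvirt domains"
# represented as instance name -> assigned mdev.
ALL_MDEVS = ["mdev-a", "mdev-b", "mdev-c"]
domains = {}
domains_lock = threading.Lock()  # protects the dict itself, not the allocation

# Forces both requests to finish their lookup before either creates a domain.
barrier = threading.Barrier(2)


def list_assigned_mdevs():
    """Mimic listing the libvirt domains to find mdevs already in use."""
    with domains_lock:
        return set(domains.values())


def allocate_mdev(instance):
    # Check: derive the free mdevs from the domains that exist right now.
    assigned = list_assigned_mdevs()
    free = [m for m in ALL_MDEVS if m not in assigned]
    chosen = free[0]
    # Race window: nothing records this choice until the domain exists.
    barrier.wait()
    # Act: "create the domain"; only now does the choice become visible.
    with domains_lock:
        domains[instance] = chosen
    return chosen


results = {}


def worker(name):
    results[name] = allocate_mdev(name)


threads = [threading.Thread(target=worker, args=(n,)) for n in ("vm1", "vm2")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # both instances were handed the same mdev
```

Both threads compute their free list from an empty set of domains, so both choose `mdev-a`; in real code the window is much narrower, but nothing prevents the same outcome.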
Changed in nova:
assignee: nobody → Alex Xu (xuhj)
Changed in nova:
status: New → Triaged
importance: Undecided → High
tags: added: libvirt
This is of high importance not because the race is particularly likely in the current code, but because we need to establish a framework for fixing it that we can then reuse for other similar types of hardware.
In general, the fix is to claim (earmark for use by a specific instance) specific hardware artifacts [1] on the compute node in instance_claim, which is under COMPUTE_RESOURCE_SEMAPHORE. But only the virt driver can know what needs to be done to effect that claim for its specific hypervisor. And today instance_claim doesn't talk to the virt driver at all.
So the solution discussed in IRC [2] is to establish a new ComputeDriver interface, working title claim_for_instance() (and possibly a corresponding unclaim_for_instance() for rollbacks), which will be invoked from instance_claim (and _move_claim).
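A minimal sketch of the proposed hook, assuming a plain `threading.Lock` as a stand-in for COMPUTE_RESOURCE_SEMAPHORE. `ComputeDriver`, `ResourceTracker`, `RecordingDriver`, and the `other_checks` callback are simplified hypothetical classes for illustration, not nova's actual ones. What the sketch shows is the ordering: the driver-specific claim happens while the lock is held, and if a later step of the claim fails, the driver claim is rolled back through unclaim_for_instance().

```python
import abc
import threading

# Stand-in for COMPUTE_RESOURCE_SEMAPHORE: a process-local lock.
COMPUTE_RESOURCE_SEMAPHORE = threading.Lock()


class ComputeDriver(abc.ABC):
    """Hypothetical subset of a virt driver interface."""

    @abc.abstractmethod
    def claim_for_instance(self, instance_uuid, allocations):
        """Earmark specific host devices for this instance."""

    @abc.abstractmethod
    def unclaim_for_instance(self, instance_uuid):
        """Undo claim_for_instance(), e.g. when a later claim step fails."""


class ResourceTracker:
    def __init__(self, driver):
        self.driver = driver

    def instance_claim(self, instance_uuid, allocations, other_checks=lambda: None):
        with COMPUTE_RESOURCE_SEMAPHORE:
            # Device selection runs while the lock is held, so no other
            # claim can observe the same free devices in between.
            self.driver.claim_for_instance(instance_uuid, allocations)
            try:
                # Hypothetical placeholder for the rest of the claim logic.
                other_checks()
            except Exception:
                self.driver.unclaim_for_instance(instance_uuid)
                raise


class RecordingDriver(ComputeDriver):
    """Toy driver that only records which hooks were called."""

    def __init__(self):
        self.calls = []

    def claim_for_instance(self, instance_uuid, allocations):
        self.calls.append(("claim", instance_uuid))

    def unclaim_for_instance(self, instance_uuid):
        self.calls.append(("unclaim", instance_uuid))


def failing_check():
    raise RuntimeError("not enough memory")


driver = RecordingDriver()
rt = ResourceTracker(driver)
rt.instance_claim("vm1", {"VGPU": 1})
try:
    rt.instance_claim("vm2", {"VGPU": 1}, other_checks=failing_check)
except RuntimeError:
    pass
print(driver.calls)  # → [('claim', 'vm1'), ('claim', 'vm2'), ('unclaim', 'vm2')]
```

Keeping the rollback inside the same locked section means a failed claim never leaves a device earmarked for an instance that will not be spawned on this host.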
Using VGPUs-in-libvirt as an example, claim_for_instance would use an in-memory dict to associate a specific mdev with the specific instance for each VGPU in the allocation. This mapping could then be deleted during spawn, since the information can subsequently be gleaned from the domain XML.
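For the VGPU case, the in-memory mapping might look roughly like the sketch below. `FakeLibvirtVGPUClaims` and its methods are hypothetical: a set of mdevs stands in for what the real driver would read back from domain XML, and free mdevs are computed from both existing domains and outstanding in-memory claims, so a second request cannot pick an mdev that is claimed but not yet part of a domain. The caller holds a lock that stands in for COMPUTE_RESOURCE_SEMAPHORE.

```python
import threading


class FakeLibvirtVGPUClaims:
    """Hypothetical sketch of VGPU claim tracking in a libvirt-style driver."""

    def __init__(self, host_mdevs):
        self.host_mdevs = list(host_mdevs)
        # instance UUID -> mdevs earmarked for it but not yet in a domain
        self._claimed = {}
        # mdevs attached to defined domains; stands in for parsing domain XML
        self._in_domains = set()

    def _free_mdevs(self):
        taken = set(self._in_domains)
        for mdevs in self._claimed.values():
            taken.update(mdevs)
        return [m for m in self.host_mdevs if m not in taken]

    def claim_for_instance(self, instance_uuid, vgpu_count):
        # The caller must hold the compute resource lock while this runs.
        free = self._free_mdevs()
        if len(free) < vgpu_count:
            raise RuntimeError("not enough free mdevs")
        self._claimed[instance_uuid] = free[:vgpu_count]
        return self._claimed[instance_uuid]

    def unclaim_for_instance(self, instance_uuid):
        # Rollback: release the earmark if the rest of the claim failed.
        self._claimed.pop(instance_uuid, None)

    def spawn(self, instance_uuid):
        # Once the domain exists, its XML records the mdevs, so the
        # in-memory mapping is no longer needed and is deleted here.
        mdevs = self._claimed.pop(instance_uuid)
        self._in_domains.update(mdevs)
        return mdevs


lock = threading.Lock()  # stand-in for COMPUTE_RESOURCE_SEMAPHORE
driver = FakeLibvirtVGPUClaims(["mdev-a", "mdev-b", "mdev-c"])
results = {}


def claim(name):
    with lock:
        results[name] = driver.claim_for_instance(name, 1)


threads = [threading.Thread(target=claim, args=(n,)) for n in ("vm1", "vm2")]
for t in threads:
    t.start()
for t in threads:
    t.join()

driver.spawn("vm1")
driver.spawn("vm2")
with lock:
    print(driver.claim_for_instance("vm3", 1))  # → ['mdev-c']
```

Which of vm1 and vm2 receives `mdev-a` depends on thread scheduling, but the two always receive different mdevs, and after both spawns the only free mdev left for vm3 is `mdev-c`.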
[1] where "hardware" encompasses things like VFs (don't get pedantic on me)
[2] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-07-11.log.html#t2019-07-11T12:39:18