Proper handling of suspended/stopped VMs in scheduling
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Public Cloud WG | Won't Fix | Undecided | Tobias Rydberg | |
Bug Description
Currently, the Nova scheduler accounts for suspended VMs exactly as it does for running VMs. For example, if a compute node has 128GB RAM available for guests and four 32GB guests scheduled on it that are all suspended (and hence consume no RAM and no CPU), the Nova scheduler still considers that node "full" and won't assign any more guests to it, under the assumption that suspended guests can wake up at any time. The same applies to guests that are currently stopped. In summary, guests in the SHUTOFF and ACTIVE states are treated identically.
While that assumption is a sensible default, it should be configurable. For example, we could have an option like shutoff_ram_ratio (I can't think of a better name right now) that defaults to 1.0, meaning a suspended/stopped VM is treated exactly like a running one for scheduling purposes. If an operator then changes that value to, say, 0.2, suspended VMs would count toward scheduling with only 20% of their configured RAM allocation. A similar factor could be applied to CPU cores (but obviously not to disk space, as that remains utilized even while a guest is suspended).
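To make the proposal concrete, here is a minimal sketch of how such a ratio might change the RAM the scheduler counts as consumed on a node. The option name shutoff_ram_ratio comes from the proposal above; everything else (the function, the guest representation) is hypothetical, not actual Nova code.

```python
# Illustrative sketch only: shows how a shutoff_ram_ratio option could
# discount suspended/stopped guests in the scheduler's RAM accounting.

SHUTOFF_RAM_RATIO = 0.2  # operator-configured; 1.0 reproduces today's behavior


def effective_ram_mb(guests):
    """RAM (in MB) the scheduler would count as consumed on a node.

    `guests` is a list of (flavor_ram_mb, vm_state) tuples.
    Suspended/stopped guests count at SHUTOFF_RAM_RATIO of their flavor RAM.
    """
    total = 0.0
    for flavor_ram_mb, state in guests:
        if state in ("SHUTOFF", "SUSPENDED"):
            total += flavor_ram_mb * SHUTOFF_RAM_RATIO
        else:
            total += flavor_ram_mb
    return total


# The example from the description: four 32GB guests, all suspended.
guests = [(32768, "SUSPENDED")] * 4
# With ratio 0.2 the node counts as ~26GB used instead of the full 128GB,
# so it is no longer considered "full".
print(effective_ram_mb(guests))
```

With the default ratio of 1.0 the function returns 131072 MB for the same guest list, which matches the current "node is full" behavior.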
Ideally, just like {ram,disk,
Changed in openstack-publiccloud-wg:
status: New → Won't Fix
But if we don't support https://bugs.launchpad.net/openstack-publiccloud-wg/+bug/1791681 and the user resumes the guest, leaving the compute node overcommitted, is that acceptable? Because we don't plan on supporting https://bugs.launchpad.net/openstack-publiccloud-wg/+bug/1791681 to auto-migrate the guest on resume if the node is overcommitted.