Allocations are "doubled up" on same host resize even though there is only 1 server on the host
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Triaged | High | Unassigned |
Bug Description
This is a long-standing known issue from at least Pike, when the nova FilterScheduler started using placement to create allocations during server create and move (e.g. resize) operations.
In Pike, a resize to the same host resulted in allocations against the compute node provider in placement for both the old and new flavors, with both sets tied to the instance as the resource consumer.
Move operation allocation handling was improved in Queens with this blueprint:
https:/
With that change, the source node allocations are moved to the migration record as the consumer and the target node allocations are made against the instance as the consumer.
That is also true of a resize to the same host; however, we still have the issue that the compute node resource provider usage is effectively "doubled up" during the resize, because it shows usage for two flavors when really only one is in use.
The reported resource usage on the compute node provider during a same-host resize should be the *maximum* of the old and new flavors, not their sum.
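To make the arithmetic concrete, here is a minimal plain-Python sketch (illustration only, not nova code) contrasting what placement reports today with what this report argues it should report, using the m1.tiny -> m1.small numbers from the recreate below:

# Illustration only; flavor sizes come from the m1.tiny -> m1.small recreate below.
old_flavor = {'VCPU': 1, 'MEMORY_MB': 512, 'DISK_GB': 1}    # m1.tiny
new_flavor = {'VCPU': 1, 'MEMORY_MB': 2048, 'DISK_GB': 20}  # m1.small

# Today: the migration-record consumer and the instance consumer both hold
# allocations against the same provider, so the reported usage is the sum.
doubled = {rc: old_flavor[rc] + new_flavor[rc] for rc in old_flavor}
# -> {'VCPU': 2, 'MEMORY_MB': 2560, 'DISK_GB': 21}

# Proposed: report the per-resource-class maximum, since only one guest exists.
expected = {rc: max(old_flavor[rc], new_flavor[rc]) for rc in old_flavor}
# -> {'VCPU': 1, 'MEMORY_MB': 2048, 'DISK_GB': 20}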
Here is a simple recreate with devstack (created from master today):
1. we start with no resource usage on the single node provider:
stack@stein:~$ openstack resource provider usage show e2bc5091-
+----------------+-------+
| resource_class | usage |
+----------------+-------+
| VCPU           | 0     |
| MEMORY_MB      | 0     |
| DISK_GB        | 0     |
+----------------+-------+
2. create a server and show there is usage:
stack@stein:~$ openstack flavor list
+----+-----------+-------+------+-----------+-------+-----------+
| ID | Name      | RAM   | Disk | Ephemeral | VCPUs | Is Public |
+----+-----------+-------+------+-----------+-------+-----------+
| 1  | m1.tiny   | 512   | 1    | 0         | 1     | True      |
| 2  | m1.small  | 2048  | 20   | 0         | 1     | True      |
| 3  | m1.medium | 4096  | 40   | 0         | 2     | True      |
| 4  | m1.large  | 8192  | 80   | 0         | 4     | True      |
| 5  | m1.xlarge | 16384 | 160  | 0         | 8     | True      |
| c1 | cirros256 | 256   | 0    | 0         | 1     | True      |
| d1 | ds512M    | 512   | 5    | 0         | 1     | True      |
| d2 | ds1G      | 1024  | 10   | 0         | 1     | True      |
| d3 | ds2G      | 2048  | 10   | 0         | 2     | True      |
| d4 | ds4G      | 4096  | 20   | 0         | 4     | True      |
+----+-----------+-------+------+-----------+-------+-----------+
stack@stein:~$ openstack server create --flavor m1.tiny --image cirros-
stack@stein:~$ openstack resource provider usage show e2bc5091-
+----------------+-------+
| resource_class | usage |
+----------------+-------+
| VCPU           | 1     |
| MEMORY_MB      | 512   |
| DISK_GB        | 1     |
+----------------+-------+
3. resize the server and check usage:
stack@stein:~$ openstack server resize resize-same-host --flavor m1.small
stack@stein:~$ openstack server list
+------
| ID | Name | Status | Networks | Image | Flavor |
+------
| d7d743d8-
+------
stack@stein:~$ openstack resource provider usage show e2bc5091-
+----------------+-------+
| resource_class | usage |
+----------------+-------+
| VCPU           | 2     |
| MEMORY_MB      | 2560  |
| DISK_GB        | 21    |
+----------------+-------+
And here we see the old and new flavor usage is cumulative on the single node provider (VCPU: 1 + 1 = 2, MEMORY_MB: 512 + 2048 = 2560, DISK_GB: 1 + 20 = 21).
4. confirm the resize and verify the usage is just the new m1.small flavor:
stack@stein:~$ openstack server resize resize-same-host --confirm
stack@stein:~$ openstack server list
+------
| ID | Name | Status | Networks | Image | Flavor |
+------
| d7d743d8-
+------
stack@stein:~$ openstack resource provider usage show e2bc5091-
+----------------+-------+
| resource_class | usage |
+----------------+-------+
| VCPU           | 1     |
| MEMORY_MB      | 2048  |
| DISK_GB        | 20    |
+----------------+-------+
stack@stein:~$
===
Same-host resize is disabled by default but can be important in at least two cases:
1. Servers in an affinity (same-host) group cannot be resized if resize to the same host is not allowed.
2. In "edge" deployment scenarios with only 1 or 2 compute hosts, being able to resize on the same host is critical. What is probably even more critical in those edge scenarios is not reporting resource usage that is not really there, since it could result in scheduling failures against a host that would otherwise have fit the request (see the sketch below).
Changed in nova:
assignee: nobody → Zhenyu Zheng (zhengzhenyu)
Changed in nova:
assignee: Zhenyu Zheng (zhengzhenyu) → nobody
One hacky way we could handle this is in conductor: after we've moved the instance allocations for the old_flavor to the migration record, if the selected host is the same host the instance is already on, we just fix the allocations so they are the max of the two flavors. We'd need to sort out whether we still use 2 consumers or only 1. It might make sense to only have the instance consumer for the same-host resize case, but there is logic in the nova-compute service since Queens that expects the source node allocations to be tracked by the migration record consumer, so that logic would have to be audited so it doesn't blow up depending on what conductor does.
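A rough sketch of that conductor-side adjustment could look like the following (a hypothetical helper, not existing nova code; it deliberately ignores the consumer question called out above):

# Hypothetical helper, not existing nova code: once conductor knows the target
# is the same host, size the allocations as the per-resource-class maximum of
# the two flavors instead of leaving both full sets in place.
def merge_same_host_allocations(old_resources, new_resources):
    merged = {}
    for rc in set(old_resources) | set(new_resources):
        merged[rc] = max(old_resources.get(rc, 0), new_resources.get(rc, 0))
    return merged

# merge_same_host_allocations({'VCPU': 1, 'MEMORY_MB': 512, 'DISK_GB': 1},
#                             {'VCPU': 1, 'MEMORY_MB': 2048, 'DISK_GB': 20})
# -> {'VCPU': 1, 'MEMORY_MB': 2048, 'DISK_GB': 20}
# Whether this set lives under the instance consumer, the migration consumer,
# or both is exactly the open question above.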
Another wrinkle we have to worry about is that resize can reschedule if the selected host fails the resize. So we could have a case where the scheduler picks 3 hosts:
1. first selected host is the same host, but fails, so we reschedule to host 2
2. second host fails, we reschedule to host 3
3. the resize passes on the 3rd host (2nd alternate)
In those cases the alternate hosts are *not* the same host, so how would we deal with the allocations then? The old flavor allocations still need to be on the source host and the new flavor allocations need to be on the destination host.