Reviewed: https://review.opendev.org/744958
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=44376d2e212e0f9405a58dc7fc4d5b38d70ac42e
Submitter: Zuul
Branch: master
commit 44376d2e212e0f9405a58dc7fc4d5b38d70ac42e
Author: Stephen Finucane <email address hidden>
Date: Wed Aug 5 14:27:06 2020 +0100
Don't unset Instance.old_flavor, new_flavor until necessary
Since change Ia6d8a7909081b0b856bd7e290e234af7e42a2b38, the resource
tracker's 'drop_move_claim' method has been capable of freeing up
resource usage. However, this relies on accurate resource reporting.
It transpires that there's a race whereby the resource tracker's
'update_available_resource' periodic task can end up not accounting for
usage from migrations that are in the process of being completed. The
root cause is the resource tracker's reliance on the stashed flavor in a
given migration record [1]. Previously, this information was deleted by
the compute manager at the start of the confirm migration operation [2].
The compute manager would then call the virt driver [3], which could
take a not insignificant amount of time to return, before finally
dropping the move claim. If the periodic task ran between the clearing
of the stashed flavor and the return of the virt driver, it would find a
migration record with no stashed flavor and would therefore ignore this
record for accounting purposes [4], resulting in an incorrect record for
the compute node, and an exception when 'drop_move_claim' attempts
to free up resources that aren't being tracked.
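
To make the race concrete, here is a minimal, hypothetical Python
sketch (the names and numbers are invented for illustration and do not
match Nova's actual code) of a periodic accounting pass that silently
skips a migration whose stashed flavor has already been cleared:

    # Toy model of the race; not Nova code.
    import threading
    import time

    class Migration:
        def __init__(self, old_flavor_mem_mb):
            # Stand-in for the stashed old flavor on the migration record.
            self.old_flavor_mem_mb = old_flavor_mem_mb

    migration = Migration(old_flavor_mem_mb=2048)

    def update_available_resource():
        # Periodic task: sum usage from in-progress migrations.
        usage = 0
        if migration.old_flavor_mem_mb is not None:
            usage += migration.old_flavor_mem_mb
        # If the stashed flavor was already cleared, the migration is
        # skipped and the node's usage is under-reported.
        print('periodic task sees usage: %d MB' % usage)

    def confirm_resize_buggy():
        # Confirm operation as ordered before this change.
        migration.old_flavor_mem_mb = None  # flavor cleared up front...
        time.sleep(0.5)                     # ...while the slow virt driver call runs
        # drop_move_claim() would now try to free 2048 MB that the
        # periodic task is no longer tracking -> exception.

    t = threading.Thread(target=confirm_resize_buggy)
    t.start()
    time.sleep(0.1)
    update_available_resource()  # fires mid-confirm: prints 0 MB
    t.join()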
The solution to this issue is pretty simple. Instead of unsetting the
old flavor record from the migration at the start of the various move
operations, do it afterwards.
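
Continuing the same hypothetical sketch, the fix is purely a reordering
of the confirm path: the slow virt driver call and the claim drop happen
while the stashed flavor is still in place, and only then is it unset:

    def confirm_resize_fixed():
        # Confirm operation with the reordering applied.
        time.sleep(0.5)  # slow virt driver call runs first
        # drop_move_claim() frees the usage here, while the periodic task
        # can still see the stashed flavor and account for it.
        migration.old_flavor_mem_mb = None  # unset only afterwards

A periodic task that fires during the driver call now still finds the
stashed flavor on the migration record, so the node's usage stays
accurate and 'drop_move_claim' has tracked resources to free.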
[1] https://github.com/openstack/nova/blob/6557d67/nova/compute/resource_tracker.py#L1288
[2] https://github.com/openstack/nova/blob/6557d67/nova/compute/manager.py#L4310-L4315
[3] https://github.com/openstack/nova/blob/6557d67/nova/compute/manager.py#L4330-L4331
[4] https://github.com/openstack/nova/blob/6557d67/nova/compute/resource_tracker.py#L1300
Change-Id: I4760b01b695c94fa371b72216d398388cf981d28
Signed-off-by: Stephen Finucane <email address hidden>
Partial-Bug: #1879878
Related-Bug: #1834349
Related-Bug: #1818914