ResourceTracker._update should restore previous old_resources value if ComputeNode.save fails
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Medium | Matt Riedemann |
Stein | Fix Committed | Medium | Matt Riedemann |
Bug Description
This is a follow-up to bug 1834694, with the debug information here:
https:/
This is on an overloaded system where conductor and mysql are having problems and database connections are getting dropped.
On the first start of the compute service, the compute node record is created without the free_disk_gb field set.
Later in the _update() method in ResourceTracker the _resource_change method returns True and updates the self.old_resources value:
Then the ComputeNode.save() fails with a DB error here:
That kills the update_
Later when update_
So we don't try to call ComputeNode.save() again but instead call _update_
This can create the resource provider with inventory in the placement service.
As a result, the scheduler can get the compute node resource provider back from placement even though the compute node record was never updated, which results in hitting this code in the scheduler:
That leaves some of the HostState fields unset which in turn results in issues like bug 1834691 and bug 1834694.
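The sequence described above can be reproduced with a small toy model. This is only a sketch under stated assumptions, not nova's code: FlakyComputeNode, NaiveTracker, the dict-based resources and the placement_updates counter are hypothetical stand-ins, and only the names _update, _resource_change, old_resources and save come from this report.

```python
import copy


class FlakyComputeNode:
    """Hypothetical stand-in for ComputeNode; save() fails on the first call."""

    def __init__(self):
        self.free_disk_gb = None
        self.save_attempts = 0
        self.saved = False

    def save(self):
        self.save_attempts += 1
        if self.save_attempts == 1:
            raise RuntimeError("simulated dropped DB connection")
        self.saved = True


class NaiveTracker:
    """Toy tracker that caches new resources before the save is known to work."""

    def __init__(self):
        self.old_resources = None
        self.placement_updates = 0  # hypothetical counter for placement syncs

    def _resource_change(self, resources):
        # Returns True on a change and, as a side effect, updates the cache.
        if resources != self.old_resources:
            self.old_resources = copy.deepcopy(resources)
            return True
        return False

    def _update(self, node, resources):
        if self._resource_change(resources):
            node.free_disk_gb = resources["free_disk_gb"]
            node.save()  # may raise; the cache has already been updated
        self.placement_updates += 1  # stand-in for the placement update


node = FlakyComputeNode()
tracker = NaiveTracker()
resources = {"free_disk_gb": 100}

try:
    tracker._update(node, resources)  # first run: save() raises
except RuntimeError:
    pass  # mimics the caller catching and swallowing the exception

tracker._update(node, resources)  # second run: no change detected, no save
```

In this toy model the second run skips save() entirely because the cache already matches, yet it still performs the placement step, so the node record is never persisted.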
We could deal with the RT issues in a few ways, such as not allowing the compute service to start if we can't create and update the compute node (rather than just catching and swallowing Exception in the ComputeManager), but that might have other side effects. An easy fix here is to make sure we roll back the changes to old_resources in the RT if compute_node.save() fails.
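A minimal sketch of the rollback idea follows, assuming a simplified tracker rather than nova's actual ResourceTracker. FlakyNode, RollbackTracker, the dict-based resources and the placement_updates counter are hypothetical; the point is only that the cached old_resources value is restored when save() raises, so the next run detects the change again and retries the save.

```python
import copy


class FlakyNode:
    """Hypothetical stand-in for ComputeNode; save() fails on the first call."""

    def __init__(self):
        self.free_disk_gb = None
        self.save_attempts = 0
        self.saved = False

    def save(self):
        self.save_attempts += 1
        if self.save_attempts == 1:
            raise RuntimeError("simulated dropped DB connection")
        self.saved = True


class RollbackTracker:
    """Toy tracker that restores old_resources if the save fails."""

    def __init__(self):
        self.old_resources = None
        self.placement_updates = 0  # hypothetical counter for placement syncs

    def _resource_change(self, resources):
        if resources != self.old_resources:
            self.old_resources = copy.deepcopy(resources)
            return True
        return False

    def _update(self, node, resources):
        # Remember the cached value so it can be restored on failure.
        previous = copy.deepcopy(self.old_resources)
        if self._resource_change(resources):
            node.free_disk_gb = resources["free_disk_gb"]
            try:
                node.save()
            except Exception:
                # Roll back so the next run sees the change and saves again.
                self.old_resources = previous
                raise
        self.placement_updates += 1  # stand-in for the placement update


node = FlakyNode()
tracker = RollbackTracker()
resources = {"free_disk_gb": 100}

try:
    tracker._update(node, resources)  # first run: save() raises, cache restored
except RuntimeError:
    pass  # mimics the caller catching and swallowing the exception

tracker._update(node, resources)  # second run: change detected, save retried
```

With the rollback in place, the second run in this sketch calls save() again and only then reaches the placement step.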
tags: added: db scheduler
Changed in nova: assignee: Matt Riedemann (mriedem) → Chris Dent (cdent)
Changed in nova: assignee: Chris Dent (cdent) → Matt Riedemann (mriedem)
Fix proposed to branch: master
Review: https://review.opendev.org/668263