Too many errors can trigger compute failed_builds to get incremented
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | New | Undecided | Unassigned |
OpenStack Security Advisory | New | Undecided | Unassigned |
Bug Description
So let's analyze what can cause a compute manager's failed_builds to get incremented, and point out which of those causes should not increment it (an increment can then have the 'nice' effect of auto-disabling a nova-compute service).
So the return code of self._do_
From reading the code, some exceptions that are unrelated to nova-compute can trigger this:
- Unable to base64 decode injected files.
- Failure of notify_
- exception.
- exception.
exception.
exception.
cursive_
exception.
exception.
- exception.
- Anything that pops out of _build_resources
- Failed to allocate network
And many more?
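The distinction being asked for can be pictured with a minimal Python sketch. This is not nova's code: `BuildFailureTracker`, `RequestFault`, `HostFault`, and the threshold value are hypothetical names invented for illustration, and resetting the counter on a successful build is an assumption. The only point is that failures attributable to the request leave the counter alone, while failures attributable to the host count toward the auto-disable.

```python
class RequestFault(Exception):
    """Hypothetical: failure caused by the request (bad image, bad injected file)."""


class HostFault(Exception):
    """Hypothetical: failure caused by the compute node itself."""


class BuildFailureTracker:
    """Toy stand-in for the failed_builds counter (not nova's implementation)."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # assumed: 0 means "never auto-disable"
        self.failed_builds = 0
        self.disabled = False

    def record(self, build):
        """Run build() and count only host faults toward the threshold."""
        try:
            build()
        except HostFault:
            self.failed_builds += 1
            if self.threshold and self.failed_builds >= self.threshold:
                self.disabled = True
            return "host_fault"
        except Exception:
            # Request faults, and anything that cannot be attributed to the
            # host with certainty, do not touch the counter.
            return "not_counted"
        # Assumption: a successful build resets the consecutive-failure count.
        self.failed_builds = 0
        return "ok"


def undecodable_injected_file():
    raise RequestFault("injected file is not valid base64")


def broken_hypervisor():
    raise HostFault("hypervisor connection lost")
```

With this split, any number of calls to `record(undecodable_injected_file)` leaves `failed_builds` at zero, whereas repeated `record(broken_hypervisor)` calls eventually set `disabled`.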
summary: | To many errors can trigger compute failed_builds to get incremented → Too many errors can trigger compute failed_builds to get incremented |
description: | updated |
information type: | Public → Private Security |
information type: | Private Security → Public |
I just want to add my concern over the impact of this bug. I work with Josh and observed how this was discovered. We had a misconfigured image service for a little while, which left behind an image that could not be booted. As a result, all of the hypervisors in our staging environment except one were disabled. That's not what was intended by this setting. The only failures that should increment this counter are those that are *specifically* faults caused by the compute node itself. If that can't be determined 100% of the time, the counter must not be incremented; otherwise users can DoS the entire cloud simply by uploading an image whose root disk is too big for a flavor.
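The impact described above can be illustrated with a toy simulation. It is only a sketch under stated assumptions: `simulate` is a hypothetical helper, the round-robin placement is a simplification rather than nova's scheduler, and the host count and threshold are arbitrary. It models the behavior being complained about, where every failed build counts against the host it landed on regardless of cause.

```python
def simulate(num_hosts, threshold, bad_requests):
    """Return how many hosts end up disabled after a run of unbootable requests.

    Every request fails for a reason unrelated to the host (for example an
    image that cannot be booted), yet each failure still increments the
    counter of whichever host it was placed on.
    """
    failed_builds = {host: 0 for host in range(num_hosts)}
    disabled = set()

    for i in range(bad_requests):
        enabled = [h for h in failed_builds if h not in disabled]
        if not enabled:
            break  # nothing left to schedule on

        # Simplified placement: rotate over the hosts that are still enabled.
        host = enabled[i % len(enabled)]

        failed_builds[host] += 1
        if failed_builds[host] >= threshold:
            disabled.add(host)

    return len(disabled)


if __name__ == "__main__":
    # 5 hosts, disable after 3 failures: 15 bad boots take out every host.
    print(simulate(num_hosts=5, threshold=3, bad_requests=15))
```

In this model a single user needs only `num_hosts * threshold` failing boot requests to disable every compute service, and none of those failures says anything about the health of the hosts.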