scheduler: build failure high negative weighting
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Compute (nova) | Incomplete | Undecided | Unassigned | |
| OpenStack Nova Cloud Controller Charm | Fix Released | High | James Page | |
| OpenStack Security Advisory | Won't Fix | Undecided | Unassigned | |
| nova (Ubuntu) | Triaged | High | Unassigned | |
Bug Description
Whilst debugging a Queens cloud which seemed to be landing all new instances on 3 out of 9 hypervisors (resulting in three very heavily overloaded servers), I noticed that the weight applied by the build failure weigher is -1000000.0 * number of failures:
https:/
This means that a server which has any sort of build failure instantly drops to the bottom of the weighed list of hypervisors for scheduling of instances.
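To illustrate the effect, here is a minimal sketch of the ranking, not nova's actual code. It assumes a simplified model in which a host's total weight is a capacity score (free RAM in MB here, chosen only for illustration) minus the penalty quoted above; the names `Host`, `weight` and `rank` are hypothetical, and nova's real weighers also normalise values before applying multipliers.

```python
from dataclasses import dataclass

# Assumption: the multiplier quoted in this report (-1000000.0 per failure).
BUILD_FAILURE_MULTIPLIER = 1_000_000.0


@dataclass
class Host:
    """Hypothetical stand-in for a hypervisor as seen by the scheduler."""
    name: str
    free_ram_mb: int        # stand-in for the ordinary capacity weighers
    failed_builds: int = 0  # recent build failures recorded for the host


def weight(host: Host) -> float:
    """Simplified total weight: capacity score minus the failure penalty."""
    return host.free_ram_mb - BUILD_FAILURE_MULTIPLIER * host.failed_builds


def rank(hosts):
    """Return hosts best-first, in the order a scheduler would try them."""
    return sorted(hosts, key=weight, reverse=True)


hosts = [
    Host("hv1", free_ram_mb=2_048),                     # nearly full
    Host("hv2", free_ram_mb=262_144, failed_builds=1),  # nearly empty, one failure
    Host("hv3", free_ram_mb=4_096),                     # nearly full
]

ranked = [h.name for h in rank(hosts)]
print(ranked)  # hv2 comes last despite having by far the most free RAM
```

In this model a single failure outweighs any realistic difference in capacity, so the nearly empty host is only chosen once every other host has been ruled out.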
Why might an instance fail to build? It could be a timeout due to load, or it could be due to a bad image (one that won't actually boot under qemu). This second cause could be triggered by an end user of the cloud, inadvertently causing all instances to be pushed to a small subset of hypervisors (which is what I think happened in our case).
This feels like quite a dangerous default to have, given the potential to DoS hypervisors, intentionally or otherwise.
ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: nova-scheduler 2:17.0.7-0ubuntu1
ProcVersionSign
Uname: Linux 4.15.0-43-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.5
Architecture: amd64
Date: Fri Mar 1 13:57:39 2019
NovaConf: Error: [Errno 13] Permission denied: '/etc/nova/
PackageArchitec
ProcEnviron:
TERM=screen-
PATH=(custom, no user)
XDG_RUNTIME_
LANG=C.UTF-8
SHELL=/bin/bash
SourcePackage: nova
UpgradeStatus: No upgrade log present (probably fresh install)
| Changed in nova (Ubuntu): | |
| status: | New → Won't Fix |
| information type: | Private Security → Public Security |
| Changed in charm-nova-cloud-controller: | |
| status: | New → In Progress |
| importance: | Undecided → High |
| assignee: | nobody → James Page (james-page) |
| tags: | added: sts |
| Changed in charm-nova-cloud-controller: | |
| milestone: | none → 19.04 |
| status: | Fix Committed → Confirmed |
| status: | Confirmed → Fix Released |

As a side note, it's really hard to see the calculated weights for each host in the scheduler, as the weighting is stripped before the debug log message is made here:
https://github.com/openstack/nova/blob/master/nova/scheduler/filter_scheduler.py#L460
I figured out what was happening by logging the list of WeighedHosts rather than the encapsulated objects.
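A rough sketch of that debugging approach follows. The `HostState` and `WeighedHost` classes here are simplified stand-ins written for this example, not nova's own classes, and `describe` is a hypothetical helper; the point is only that logging the wrapper keeps the weight, while logging the unwrapped objects loses it.

```python
import logging

logging.basicConfig(level=logging.DEBUG)
LOG = logging.getLogger("scheduler-debug")


class HostState:
    """Stand-in for the object the scheduler wraps (assumed shape)."""

    def __init__(self, host):
        self.host = host

    def __repr__(self):
        return f"HostState({self.host})"


class WeighedHost:
    """Stand-in wrapper pairing a host object with its computed weight."""

    def __init__(self, obj, weight):
        self.obj = obj
        self.weight = weight


def describe(weighed_hosts):
    """Return (host name, weight) pairs suitable for a debug log line."""
    return [(w.obj.host, w.weight) for w in weighed_hosts]


weighed = [
    WeighedHost(HostState("hv3"), 4096.0),
    WeighedHost(HostState("hv2"), -737856.0),
]

# Unwrapping first discards the weights, so the log shows only host names.
stripped = [w.obj for w in weighed]
LOG.debug("Weighed hosts (weights lost): %s", stripped)

# Logging from the wrappers keeps each weight next to its host.
LOG.debug("Weighed hosts (with weights): %s", describe(weighed))
```

With the second form, a host carrying a large negative weight stands out immediately in the debug output.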