scheduler: build failure high negative weighting
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack Compute (nova) |
Incomplete
|
Undecided
|
Unassigned | ||
OpenStack Nova Cloud Controller Charm |
Fix Released
|
High
|
James Page | ||
OpenStack Security Advisory |
Won't Fix
|
Undecided
|
Unassigned | ||
nova (Ubuntu) |
Triaged
|
High
|
Unassigned |
Bug Description
Whilst debugging a Queens cloud which seems to be landing all new instances on 3 out of 9 hypervisors (which resulted in three very heavily overloaded servers) I noticed that the weighting of the build failure weighter is -1000000.0 * number of failures:
https:/
This means that a server which has any sort of build failure instantly drops to the bottom of the weighed list of hypervisors for scheduling of instances.
Why might a instance fail to build? Could be a timeout due to load, might also be due to a bad image (one that won't actually boot under qemu). This second cause could be triggered by an end user of the cloud inadvertently causing all instances to be pushed to a small subset of hypervisors (which is what I think happened in our case).
This feels like quite a dangerous default to have given the potential to DOS hypervisors intentionally or otherwise.
ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: nova-scheduler 2:17.0.7-0ubuntu1
ProcVersionSign
Uname: Linux 4.15.0-43-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.5
Architecture: amd64
Date: Fri Mar 1 13:57:39 2019
NovaConf: Error: [Errno 13] Permission denied: '/etc/nova/
PackageArchitec
ProcEnviron:
TERM=screen-
PATH=(custom, no user)
XDG_RUNTIME_
LANG=C.UTF-8
SHELL=/bin/bash
SourcePackage: nova
UpgradeStatus: No upgrade log present (probably fresh install)
Changed in nova (Ubuntu): | |
status: | New → Won't Fix |
information type: | Private Security → Public Security |
Changed in charm-nova-cloud-controller: | |
status: | New → In Progress |
importance: | Undecided → High |
assignee: | nobody → James Page (james-page) |
tags: | added: sts |
Changed in charm-nova-cloud-controller: | |
milestone: | none → 19.04 |
status: | Fix Committed → Confirmed |
status: | Confirmed → Fix Released |
as a side note its really hard to see the calculated weights for each host in the scheduler as the weighting is stripped before the debug log message is made here:
https:/ /github. com/openstack/ nova/blob/ master/ nova/scheduler/ filter_ scheduler. py#L460
I figured what was happening out by logging the list of WeighedHosts rather than the encapsulated obj's