Scheduler Race Condition at high volume
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack Compute (nova) |
Fix Released
|
High
|
Vish Ishaya | ||
Folsom |
Fix Released
|
High
|
Vish Ishaya | ||
nova (Ubuntu) |
Fix Released
|
Undecided
|
Unassigned |
Bug Description
When trying to rapidly schedule hundres of instances the nova scheduler will not honor cpu_allocation_
using 'euca-run-instances -n' or a shell loop around 'nova boot' will produce this I've seen it with numbers as low as 100 though others report it taking >300 to reproduce.
On a cloud with a single nova-scheduler host configured with cpu_allocation_
for i in `seq 1 100`;do nova boot --image someimage --availability-zone nova:nova-1 --flavor 1 chaff-$i > /dev/null & done
Operating system is Ubuntu 12.04, using package 2012.2-
Related branches
- Openstack Ubuntu Testers: Pending requested
-
Diff: 121 lines (+100/-4)1 file modifieddebian/changelog (+100/-4)
tags: | removed: folsom-backport-potential |
Changed in nova: | |
milestone: | none → grizzly-1 |
status: | Fix Committed → Fix Released |
affects: | ubuntu → nova (Ubuntu) |
Changed in nova (Ubuntu): | |
status: | Confirmed → Fix Released |
Changed in nova: | |
milestone: | grizzly-1 → 2013.1 |
A few commments here:
a) the availability-zone scheduling uses forced_host which skips the scheduler logic.
b) euca-run-instances -n actually uses a different code path than nova boot with a shell loop. I have not been able to reproduce your issue in this case.
c) The issue with the shell loop is reproducible and I have a patch which appears to fix it in the case of one scheduler. Multiple schedulers could be trickier.