Inefficient multi-cell instance list
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Compute (nova) | Fix Released | Medium | Dan Smith | |
| Queens | New | Undecided | Unassigned | |
| Rocky | New | Undecided | Unassigned | |
Bug Description
This is based on some performance and scale testing done by Huawei, reported in this dev ML thread:
http://
In that scenario, they have 10 cells with 10000 instances in each cell. They then run through a few GET /servers/detail scenarios with multiple cells and varying limits.
The thread discussion pointed out that they were wasting time pulling 1000 records (the default [api]/max_limit) from all 10 cells and then throwing away 9000 of those results, so the per-cell DB query time was small, but the sqla/ORM/python processing of those records was chewing up the time.
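To illustrate the waste described above, here is a minimal Python sketch of the naive cross-cell listing. The `Cell` class and `query_instances` method are illustrative stand-ins, not nova's actual internals:

```python
class Cell:
    """Stand-in for one nova cell database (illustrative only)."""

    def __init__(self, instances):
        # Pre-sorted, as a DB query with ORDER BY would return them.
        self.instances = sorted(instances, key=lambda i: i["created_at"])

    def query_instances(self, limit):
        # Simulates the per-cell DB query; the sqla/ORM overhead in the
        # real code scales with the number of records materialized here.
        return self.instances[:limit]


def list_instances_naive(cells, limit):
    # Pull up to `limit` records from EVERY cell: with 10 cells and
    # limit=1000, this materializes 10000 records...
    results = []
    for cell in cells:
        results.extend(cell.query_instances(limit=limit))
    # ...then sorts the merged list and throws away all but `limit`
    # of them (9000 wasted records in the scenario above).
    results.sort(key=lambda inst: inst["created_at"])
    return results[:limit]
```

The DB time per cell is bounded, but Python still pays to build and sort all 10000 records before discarding 90% of them.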
Dan Smith has a series of changes here:
https:/
These allow us to batch the DB queries per cell: the limit is distributed across the 10 cells (e.g. 1000 / 10 = 100 records per cell per batch), which cuts the time spent roughly in half (from around 11 seconds to around 6 seconds).
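The batching idea can be sketched as follows. This is a simplified illustration, not the actual nova patch: `Cell` and `query_instances` are hypothetical names, and the real series also fetches further batches from cells whose first batch is exhausted, which is omitted here:

```python
import heapq


class Cell:
    """Stand-in for one nova cell database (illustrative only)."""

    def __init__(self, instances):
        # Pre-sorted, as a DB query with ORDER BY would return them.
        self.instances = sorted(instances, key=lambda i: i["created_at"])

    def query_instances(self, limit):
        return self.instances[:limit]


def list_instances_batched(cells, limit):
    # Distribute the limit across the cells: with 10 cells and
    # limit=1000, each cell is initially asked for only
    # 1000 // 10 = 100 records instead of the full 1000.
    batch_size = max(1, limit // len(cells))
    per_cell = [cell.query_instances(limit=batch_size) for cell in cells]
    # Merge the already-sorted per-cell batches lazily and stop once
    # `limit` records have been produced. (The real code would go back
    # for more batches if the first round cannot satisfy the limit.)
    merged = heapq.merge(*per_cell, key=lambda i: i["created_at"])
    return [inst for _, inst in zip(range(limit), merged)]
```

With 10 cells and a limit of 1000, this materializes 1000 records total rather than 10000, which is where the roughly 2x speedup comes from.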
This is clearly a performance issue for which we have a fix, and we arguably should backport that fix.
Note this is less of an issue for deployments that leverage the [api]/instance_
https:/
Changed in nova:
status: Triaged → In Progress
The only argument against backporting is that we identified this as a potential situation at PTG in Denver (the first one), and said we would deal with it if/when it came up. At the time we had the most information from CERN, which is mostly immune to this situation.
That said, the batching is a lot less complicated than I originally expected, and there isn't really any technical reason not to backport it, so I think we should.