nova-api startup does not scan cells looking for minimum nova-compute service version
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Compute (nova) | Fix Released | Medium | Dan Smith | |
| Pike | Won't Fix | Medium | Unassigned | |
| Queens | Won't Fix | Medium | Unassigned | |
| Rocky | Fix Committed | Medium | Matt Riedemann | |
Bug Description
This CI job failed devstack setup because nova-api took longer than 60 seconds to start (it took 64 seconds):
http://
Looking at what could be taking time there, it was noticed that this message is logged repeatedly:
Dec 05 20:14:00.919520 ubuntu-
That's coming from here:
Which is when the compute rpcapi client is initialized, which happens when nova.compute.
Which happens for most of the API extensions, e.g.:
So that init and DB query happens num_workers * num_extensions times (we have 2 workers in this case and it looks like there are at least 29 instantiations of the compute API code in the extensions).
The bigger problem is that in this case, nova-api is configured to point at cell0 for its [database] connection:
[database]
connection = mysql+pymysql:
And there will not be nova-compute services in the cell0 database (if configured properly).
So this query is always going to return 0, at least for devstack:
We should really be scanning the cells to get the minimum nova-compute version using this:
But even on the first startup of nova-api, before any computes are started and registered with a cell, that initial query will return 0, which means we won't cache the result and will continue to run the query and log that message extensions * workers times.
So there are really kind of two issues here:
1. We're not iterating cells properly for that version check. This is the more important issue.
2. We're needlessly running this query on initial startup, which slows startup down (and contributes to timeouts in the devstack jobs on slow nodes) and produces a lot of excessive logging.
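Issue 1 amounts to taking the minimum across every cell rather than querying only the cell the API database connection points at. A rough sketch of that shape (the cell dicts and helper below are hypothetical stand-ins for nova's real cell-targeting machinery):

```python
# Hypothetical sketch of scanning all cells for the minimum
# nova-compute service version; not the actual nova implementation.

def min_version_in_cell(cell):
    """Stand-in for querying one cell's database for its minimum
    nova-compute service version (0 means no computes in that cell)."""
    return cell['min_compute_version']


def minimum_version_all_cells(cells):
    versions = []
    for cell in cells:
        v = min_version_in_cell(cell)
        if v > 0:  # skip cells (like cell0) that have no computes
            versions.append(v)
    # 0 only if no cell anywhere has a registered compute yet
    return min(versions) if versions else 0


cells = [
    {'name': 'cell0', 'min_compute_version': 0},
    {'name': 'cell1', 'min_compute_version': 30},
    {'name': 'cell2', 'min_compute_version': 27},
]
```

With this shape, cell0 returning 0 no longer masks the real minimum (27 here), and an all-zero result genuinely means no computes are registered yet.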
Another thing that is probably contributing to the slow nova-api start time is that every nova.compute.api.API constructs a SchedulerReportClient, which grabs an in-memory lock per API worker during init:
Dec 05 20:14:27.694593 ubuntu-xenial-ovh-bhs1-0000959981 <email address hidden>[23459]: DEBUG oslo_concurrency.lockutils [None req-dfdfad07-2ff4-43ed-9f67-2acd59687e0c None None] Lock "placement_client" released by "nova.scheduler.client.report._create_client" :: held 0.006s {{(pid=23462) inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:339}}
We could probably be smarter about this, either by making it a singleton in the API or by only initializing it on first access, since most of the API extensions aren't even going to use that SchedulerReportClient.
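The lazy-singleton idea could look something like the following sketch (the class here is a trivial stand-in for the real, expensive-to-construct client; none of these names come from nova):

```python
# Hypothetical sketch: construct one shared report client per process,
# lazily on first access, instead of one per compute API instantiation.
import threading


class SchedulerReportClient:
    """Stand-in for the real (expensive-to-construct) client."""
    instances = 0

    def __init__(self):
        SchedulerReportClient.instances += 1


_client = None
_client_lock = threading.Lock()


def get_report_client():
    """Return the shared client, creating it only on first access."""
    global _client
    if _client is None:
        with _client_lock:
            # Re-check under the lock so concurrent first callers
            # don't each construct a client.
            if _client is None:
                _client = SchedulerReportClient()
    return _client
```

API code would call `get_report_client()` at the point of use, so extensions that never talk to placement never pay the construction (or lock) cost at startup.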