Performance degradation archiving DB with large numbers of FK related records
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Compute (nova) | In Progress | Undecided | melanie witt | |
| Antelope | New | Undecided | Unassigned | |
| Wallaby | New | Undecided | Unassigned | |
| Xena | New | Undecided | Unassigned | |
| Yoga | New | Undecided | Unassigned | |
| Zed | New | Undecided | Unassigned | |
Bug Description
Observed downstream in a large-scale cluster with constant create/delete
server activity and hundreds of thousands of deleted instances rows.
Currently, we archive deleted rows in batches of max_rows parents +
their child rows in a single database transaction. Doing it that way
limits how high a value of max_rows the caller can specify, because of
the size of the database transaction it can generate.
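
To make the shape of the problem concrete, here is a minimal sketch of that pattern, assuming a simplified schema and a SQLAlchemy engine. It is not Nova's actual archive code: the copy-to-shadow-tables step is omitted, and archive_batch and the exact column handling are illustrative placeholders.

```python
import sqlalchemy as sa

def archive_batch(engine, max_rows):
    """Illustrative only: archive up to max_rows parent rows plus ALL child
    rows that reference them, inside one transaction (the pattern above).
    """
    meta = sa.MetaData()
    instances = sa.Table("instances", meta, autoload_with=engine)
    instance_faults = sa.Table("instance_faults", meta, autoload_with=engine)

    # One transaction covers the parents AND every FK-related child row.
    with engine.begin() as conn:
        parent_ids = [
            row.uuid
            for row in conn.execute(
                sa.select(instances.c.uuid)
                .where(instances.c.deleted != 0)   # soft-deleted parents
                .limit(max_rows)
            )
        ]
        if not parent_ids:
            return 0

        # In real archiving the rows would be copied to shadow tables before
        # deletion; that step is omitted here. The key point is that the
        # number of child rows per parent is unbounded, so the total
        # transaction size is NOT limited by max_rows.
        conn.execute(
            instance_faults.delete().where(
                instance_faults.c.instance_uuid.in_(parent_ids)
            )
        )
        conn.execute(instances.delete().where(instances.c.uuid.in_(parent_ids)))
        return len(parent_ids)
```

Because the transaction boundary encloses the children as well as the parents, max_rows alone cannot bound the amount of work done per batch.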
For example, in a large-scale deployment with hundreds of thousands of
deleted rows and constant server creation and deletion activity, a
value of max_rows=1000 might exceed the database's configured maximum
packet size, or time out due to a database deadlock, forcing the operator
to use a much lower max_rows value such as 100 or 50.
And when the operator has, for example, 500,000 deleted instances rows (and
millions of deleted rows total) to archive, being forced to use a
max_rows value several orders of magnitude lower than the number of
rows they need to archive is a poor user experience and also makes it
unclear whether archiving is actually making progress.
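
Back-of-the-envelope arithmetic with the figures above shows the scale mismatch (the numbers are the examples from this report, not measurements):

```python
deleted_instances = 500_000   # parent rows awaiting archival
forced_max_rows = 50          # batch size the operator is forced to use

# ~10,000 archive passes are needed just for the instances table, before
# counting the millions of FK-related child rows archived along the way.
print(deleted_instances // forced_max_rows)  # 10000
```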
This issue was fixed in the openstack/nova 28.0.0.0rc1 release candidate.