Query for OPAC copies can be extremely slow
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Evergreen |
Fix Released
|
Medium
|
Unassigned |
Bug Description
Evergreen 2.9 (at least)
Okay, we are in the process of upgrading to a new server, and in running some speed tests, we found an anomaly. In the end, it turned out to be a multi-layered issue:
Problem: certain OPAC copy querys were ~1000x slower than the rest (e.g. 7 seconds vs 7ms)
Cause: Bad query plan
This much was pretty obvious, but why a bad plan, and why wasn't it consistent? Here are two examples of the generated queries, one slow and one fast (slightly simplified from real-life versions):
--SLOW:
SELECT *
FROM asset.copy AS "acp"
LEFT JOIN asset.copy_part_map AS "acpm" ON ( "acpm".target_copy = "acp".id )
LEFT JOIN biblio.
INNER JOIN config.copy_status AS "ccs" ON ( "ccs".id = "acp".status AND "ccs".opac_visible = 't' )
LEFT JOIN action.circulation AS "circ" ON ( "circ".target_copy = "acp".id AND "circ".checkin_time IS NULL )
INNER JOIN asset.copy_location AS "acpl" ON ( "acpl".id = "acp".location AND "acpl".deleted = 'f' AND "acpl".opac_visible = 't' )
INNER JOIN actor.org_unit AS "aou" ON ( "aou".id = "acp".circ_lib AND "aou".id IN (SELECT (actor.
INNER JOIN asset.call_number AS "acn" ON ( "acn".id = "acp".call_number AND ( "acn".deleted = 'f' ) AND ( "acn".record = '801073' ) )
INNER JOIN asset.call_
INNER JOIN asset.call_
WHERE ( "aou".opac_visible = 't' ) AND ( "acp".opac_visible = 't' AND "acp".deleted = 'f' )
ORDER BY evergreen.
LIMIT 10 OFFSET 0;
--FAST:
SELECT *
FROM asset.copy AS "acp"
LEFT JOIN asset.copy_part_map AS "acpm" ON ( "acpm".target_copy = "acp".id )
LEFT JOIN biblio.
INNER JOIN asset.call_number AS "acn" ON ( "acn".id = "acp".call_number AND ( "acn".deleted = 'f' ) AND ( "acn".record = '801073' ) )
INNER JOIN asset.call_
INNER JOIN asset.call_
LEFT JOIN action.circulation AS "circ" ON ( "circ".target_copy = "acp".id AND "circ".checkin_time IS NULL )
INNER JOIN actor.org_unit AS "aou" ON ( "aou".id = "acp".circ_lib AND "aou".id IN (SELECT (actor.
INNER JOIN config.copy_status AS "ccs" ON ( "ccs".id = "acp".status AND "ccs".opac_visible = 't' )
INNER JOIN asset.copy_location AS "acpl" ON ( "acpl".id = "acp".location AND "acpl".deleted = 'f' AND "acpl".opac_visible = 't' )
WHERE ( "acp".deleted = 'f' AND "acp".opac_visible = 't' ) AND ( "aou".opac_visible = 't' )
ORDER BY evergreen.
LIMIT 10 OFFSET 0;
These queries are identical in everything but the JOIN order, which generally doesn't matter, since the planner will try to optimize the JOINs away, except...
-- Deeper cause: Too many joins can not be "collapsed"
There is a (configurable) limit on the number of JOINs which can be optimized in the query plan. In this case, the most important JOIN for efficiency is the call number, because without that, we JOIN in a *lot* of extra rows (like every copy and circ, I think). If that JOIN is too far down the list, we hit our optimization limit, and the planner never "sees" it to make the right plan.
---- Potential solution #1: Increase join_collapse_limit (or in some similar cases, from_collapse_
The default limit for JOIN collapsing is '8'. In this case, if we bump it (join_collapse_
------ Problem with solution #1: O(n!) complexity
With this bump, we don't generate 12.5% more plans, we generate 900% more plans, as every possible JOIN order is considered (factorial complexity). If we go to '10', then we're at 9,000% more plans (or something like that). Point is, we can't just bump this value without thinking long and hard about it.
---- Even deeper cause: JSON queries cannot be ordered in Perl (object order is preserved in direct JSON)
The reason the order is inconsistent in the first place is because this begins as a JSON query represented in Perl. In our case, the "acn" join comes first in the Perl, but that doesn't matter since the Perl JOIN hash doesn't give us any real ordering.
------ Potential solution #2: Find a way to specify JOIN order in Perl
If order can be maintained, then more selective JOINs can reliably come first in the resulting query, and we will hit this issue less often, or at least the performance will be consistent. For example, if we run the same queries (more or less) in srfsh as direct JSON, we don't have this problem [*] .
-------- Problems with solution #2: Ugh, where to start with this one... (or maybe there's a way?)
The 'Tie::IxHash' is one example of an ordered hash-like Perl structure, but retrofitting our infrastructure would be a huge undertaking. Perhaps slightly more feasible would be to allow an array-of-objects optional JOIN syntax, then only retrofit known cases (like this one) where order matters. All this said, the parser syntax is pretty rich already, so maybe some other option exists already to help cases like this?
------ Potential solution #3: Restructure this query to use "acn" as the core table
This is untested, but I imagine it would work. Still, it won't *always* work if the best plans rely on selectivity of two or more tables.
------ Potential workarounds
Give up and run this query as raw SQL in 'storage', make a DB function out of it, etc.
Anyway, just throwing this out for discussion. For our needs, we plan to bump up the 'join_collapse_
Sincerely,
Dan
* Here's a (fast) query for srfsh (moving the "acn" join to the end of the list will make it slow):
request open-ils.cstore open-ils.
Changed in evergreen: | |
status: | New → Confirmed |
assignee: | nobody → Galen Charlton (gmc) |
importance: | Undecided → Medium |
milestone: | none → 3.0.2 |
Changed in evergreen: | |
assignee: | Galen Charlton (gmc) → nobody |
Changed in evergreen: | |
status: | Fix Committed → Fix Released |
Dan, where in the code is this query generated? I wonder if we can't remove some of the joins if they provide unnecessary information in some contexts...