open-ils.storage.actor.org_unit.closed_date.overlap can hang and gives inconsistent results
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Evergreen | Confirmed | High | Unassigned | |
Bug Description
Evergreen Versions: 3.2.10+ (Tested on 3.2.10 and Master)
OpenSRF Version: N/A
PostgreSQL Version: Tested with PostgreSQL 9.6 and PostgreSQL 12 (likely N/A)
O/S Version(s): Tested on Ubuntu 16.04 and 18.04 (likely N/A)
open-ils.storage.actor.org_unit.closed_date.overlap can enter an infinite loop with certain closed date data, causing the request to hang indefinitely.
This function will also return inconsistent start and end times for certain overlaps using the same data. That is, two overlap checks that I would expect to return the same start and end dates return different ones. I do not think this is intentional.
I am attaching a SQL script that can be loaded on the Concerto dataset that will reliably reproduce these issues. After loading the data, you can expose the issue with srfsh.
To recreate the loop issue, run the following in srfsh:
request open-ils.storage open-ils.storage.actor.org_unit.closed_date.overlap <org_unit_id> "<timestamp>" (substitute an org unit ID and timestamp from the attached sample data)
To recreate the inconsistent start and end dates, the following two commands will suffice:
request open-ils.storage open-ils.storage.actor.org_unit.closed_date.overlap <org_unit_id> "<first timestamp>"
request open-ils.storage open-ils.storage.actor.org_unit.closed_date.overlap <org_unit_id> "<second timestamp>" (substitute an org unit ID and two timestamps from the attached sample data that fall within the same closing)
You can obtain similar results with this data using other dates as well.
tags: added: closeddates sampledata
Changed in evergreen:
status: New → Confirmed
importance: Undecided → High
Here is a SQL query that you can use to find the queries run by the storage drone that is looping. Scan the output for lookups on the actor.hours_of_operation and actor.org_unit_closed tables. Note that you will have to run the SQL several times to find all of the affected processes: these queries are short-lived, and there is a decent chance that you will miss some if you only run the query once or twice. If you run this on a database with a lot going on, i.e. a production database, then I would suggest running it several times before executing any of the following steps to make sure that you are catching the processes stuck in the loop.
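The query attached to the bug is not reproduced here, but a minimal sketch against PostgreSQL's standard `pg_stat_activity` view (the actual attached query may differ) might look like:

```sql
-- Sketch only, not the attached query: list active sessions whose current
-- query touches the tables involved in the loop, along with the client
-- address and port needed to locate the drone process on the app server.
SELECT pid, client_addr, client_port, state, query
  FROM pg_stat_activity
 WHERE query ILIKE '%actor.org_unit_closed%'
    OR query ILIKE '%actor.hours_of_operation%';
```

The `client_addr` and `client_port` columns are what the lsof step below uses to map a database session back to the looping drone.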
You can find the process IDs of any looping storage drones by copying the client port column value, logging into the machine at the client IP address, and running the following command:
lsof -iTCP | grep {client port}
Where {client port} is the port number reported by the query.
The second column of the output from lsof is the drone's process ID (PID). You can terminate the drone with the following command:
kill {PID}
Where {PID} is, of course, the numeric process ID reported by lsof.
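Putting the last two steps together: the PID is the second whitespace-separated field of lsof output, so it can be extracted with awk. This is a sketch; the sample lsof line, port number, and PID below are hypothetical.

```shell
#!/bin/sh
# Hypothetical output line from `lsof -iTCP | grep 54321` on the app server,
# showing a storage drone connected from client port 54321 to the database.
line='perl 12345 opensrf 6u IPv4 0t0 TCP 10.0.0.9:54321->10.0.0.5:5432 (ESTABLISHED)'

# The PID is the second column of lsof output.
pid=$(echo "$line" | awk '{print $2}')
echo "$pid"

# After double-checking the PID, the drone could then be stopped with:
#   kill "$pid"
```

Running this prints `12345`, the PID you would pass to kill.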