archiving stat cats with aged circulation

Bug #798255 reported by Tim Spindler
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Evergreen
Status tracked in Main
Main
Fix Released
Medium
Unassigned

Bug Description

Wishlist Item

We have a real need to retain patron stat cats and copy stat cats for statistical purposes. I have attached an sql statement that sets up a trigger to capture copy stat cat and patron stat cat information when transactions are deleted and moved to the aged circulation table. I set up an archiving mechanism for the copy stat cats because copy stat cats may be deleted changed over time and having it linked to a production table could cause statistical info to be lost. In looking at course reserves, for instance, we have libraries who may be interested in using a copy stat cat like anthro101 which might be used for 2 semesters and then removed. Under the current design, getting circ statistics would be lost once the stat cat is separated from the copy and deleted.

Revision history for this message
Tim Spindler (tspindler-cwmars) wrote :
Revision history for this message
Tim Spindler (tspindler-cwmars) wrote :

Found an error in my attached code. Attached is the corrected statement.

Revision history for this message
Jason Stephenson (jstephenson) wrote :

Tim,

Add a Developer's Certificate of Origin, and I'll push to have this code added. It might need a switch (YAOUS) so that people can turn it on or off.

Otherwise, it'll sit here as a dusty patch that people can use if they want to.

Jason

Revision history for this message
Tim Spindler (tspindler-cwmars) wrote :

Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

Signed-off-by: Tim Spindler, <email address hidden>

Revision history for this message
Tim Spindler (tspindler-cwmars) wrote :

I also found their is something wrong with my view extend_reporter.all_circulation. I may have carried over a bug that was identified already (but can't find the reference). I have also since added code to archive the copy stat cat. This is also important for our member libraries who will assign a temporary stat cat for course reserves. The code is updated as follows and I have removed the "all_circulation" view until I can fix the code.

BEGIN;

SET client_encoding = 'UTF8';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;

CREATE TABLE extend_reporter.aged_patron_stat_cats (
    id bigserial NOT NULL,
    aged_circ_id bigint NOT NULL,
    stat_cat text NOT NULL,
    stat_cat_entry text NOT NULL
);

ALTER TABLE ONLY extend_reporter.aged_patron_stat_cats
    ADD CONSTRAINT aged_patron_stat_cats_pkey PRIMARY KEY (id);

CREATE TABLE extend_reporter.aged_copy_stat_cats (
    id bigserial NOT NULL,
    aged_circ_id bigint NOT NULL,
    stat_cat text NOT NULL,
    stat_cat_entry text NOT NULL
);

ALTER TABLE ONLY extend_reporter.aged_copy_stat_cats
    ADD CONSTRAINT aged_copy_stat_cats_pkey PRIMARY KEY (id);

CREATE OR REPLACE VIEW extend_reporter.circulation_patron_stat_cats AS
  SELECT c.id, s.name, m.stat_cat_entry
  FROM action.circulation c JOIN (actor.stat_cat_entry_usr_map m JOIN actor.stat_cat s ON m.stat_cat = s.id) ON c.usr = m.target_usr
  UNION ALL
  SELECT aged_circ_id, stat_cat, stat_cat_entry FROM extend_reporter.aged_patron_stat_cats;

CREATE OR REPLACE VIEW extend_reporter.circulation_copy_stat_cats AS
  SELECT c.id, s.name as stat_cat, e.value as stat_cat_entry
  FROM action.circulation c JOIN (asset.stat_cat_entry_copy_map m JOIN (asset.stat_cat s JOIN asset.stat_cat_entry e ON s.id=e.stat_cat) ON m.stat_cat = s.id ) ON c.target_copy = m.owning_copy
  UNION ALL
  SELECT aged_circ_id, stat_cat, stat_cat_entry FROM extend_reporter.aged_copy_stat_cats;

CREATE OR REPLACE FUNCTION extend_reporter.age_circ_stat_cat_on_delete () RETURNS TRIGGER AS $$
DECLARE
found char := 'N';
BEGIN
    -- If there are any renewals for this circulation, don't archive or delete
    -- it yet. We'll do so later, when we archive and delete the renewals.
    SELECT 'Y' INTO found
    FROM action.circulation
    WHERE parent_circ = OLD.id
    LIMIT 1;

    IF found = 'Y' THEN
        RETURN NULL; -- don't delete
 END IF;

    -- Archive a copy of the old row to extend_reporter.aged_stat_cats
    INSERT INTO extend_reporter.aged_copy_stat_cats
  (aged_circ_id, stat_cat, stat_cat_entry)
      SELECT id, stat_cat, stat_cat_entry
  FROM extend_reporter.circulation_copy_stat_cats WHERE id = OLD.id;

    INSERT INTO extend_reporter.aged_patron_stat_cats
  (aged_circ_id, stat_cat, stat_cat_entry)
      SELECT id, name, stat_cat_entry
  FROM extend_reporter.circulation_patron_stat_cats WHERE id = OLD.id;

    RETURN OLD;

END;

$$ LANGUAGE 'plpgsql';

CREATE TRIGGER action_circulation_aging_stat_cat_tgr
 BEFORE DELETE ON action.circulation
 FOR EACH ROW
 EXECUTE PROCEDURE extend_reporter.age_circ_stat_cat_on_delete ();

END

Revision history for this message
Jason Stephenson (jstephenson) wrote :

I will attempt to "branchify" this and see about adding YAOUS or some way to control what patron stat_cats get saved and what ones don't. It could be that some folks are putting personally identifiable information in stat_cats and preserving those stat_cats would defeat the purpose of aging and anonymizing patron data. (Well, I know some users are storing personally identifiable info in some stat_cats, so there does need to some configuration for this.)

Revision history for this message
Thomas Berezansky (tsbere) wrote :

I wrote a patch up to add a flag to statistical categories for whether or not they should be archived with circulations (aged or active) or not, as well as keeping the circ-time copy location around.

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/tsbere/statistical_archive

tags: added: pullrequest
Revision history for this message
Dan Scott (denials) wrote :

Tested with two different patrons and a handful of copies, each of which had both archivable and non-archivable stat cats, ran a number of circs, deleted the patrons, and all seemed to work as promised. The archivable stats were archived, and the circ-time copy location was present in the action.all_circulation view.

Committed to master. Thanks Thomas!

Revision history for this message
Dan Scott (denials) wrote :

Also, thanks Tim for inspiring this work in the first place!

Revision history for this message
Dan Scott (denials) wrote :

Jason Stephenson reported that the database upgrade for this branch was taking over 24 hours to complete; in looking at the upgrade, it issues two update events that affect every row in the action.circulation table, and the table has four triggers that fire on UPDATE events. In theory, disabling all triggers on action.circulation during the upgrade should speed the process up considerably.

Accordingly, I've pushed user/dbs/speedier_0663_upgrade to working to try to diminish the impact of the upgrade (and have set the status of this bug back to "Incomplete" until the change gets integrated).

Revision history for this message
Jason Stephenson (jstephenson) wrote :

While Dan's changes make this upgrade script faster, it still has some issues that need to be worked out:

UPDATE 4238722
psql:0663.schema.archive_circ_stat_cats.sql:46: ERROR: cannot ALTER TABLE "circulation" because it has pending trigger events

Looks like we need to reorder some of the queries. I'll do this and post another branch based on Dan's when I have something that works, probably later today.

Revision history for this message
Thomas Berezansky (tsbere) wrote :

For the record, I am currently testing an alternate attempt to improve the speed of the script. If it works I will branchify it.

Revision history for this message
Thomas Berezansky (tsbere) wrote :

My alternate attempt. Not fully tested due to production issues today, but putting it out there anyway.

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/tsbere/faster_0663_upgrade

Revision history for this message
Thomas Berezansky (tsbere) wrote :

I have force-pushed my alternate attempt after adjusting it slightly.

It is now taking < 30 minutes reliably on our database server, and my last test run (where the data was, I believe, no longer only on disk, having rolled back a previous test run) took just over 15 minutes.

The database I tested in has 3,953,845 rows in action.circulation.

Revision history for this message
Jason Stephenson (jstephenson) wrote :

I concur that Thomas's branch is much faster, and acceptably faster at that, on my development server.

Using a copy of our production database with 4,544,711 circ transactions, the upgrade script has gone from taking over a day to just three hours to run. This is on a virtual machine with 6GB of RAM allotted.

I'll sign off on Thomas's branch, since it is a performance improvement of something that already exists and not necessarily a new feature. (We have a rule between us about things like that to avoid any problems with the community.)

It would be good if others could test the script, too. Just to make sure that we haven't somehow magically optimized it for our data, but left others broken. I seriously doubt it, but you never know until you try.

Revision history for this message
Jason Stephenson (jstephenson) wrote :

Signed and pushed to collab branch:

working/collab/dyrcona/faster_0663_upgrade

Revision history for this message
Thomas Berezansky (tsbere) wrote :

The faster upgrade script ran very quickly when we did our production upgrade, for the record.

Revision history for this message
Thomas Berezansky (tsbere) wrote :

Pushed the faster upgrade script.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.