Metabib Display Fields
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Evergreen |
Fix Released
|
Wishlist
|
Unassigned |
Bug Description
See also bug #1009980 for some history. This bug takes the original idea and extends it.
The goal is to support extraction of display fields from MARC records, similar to facet, search, and browse fields, during record ingest, with the intention of creating a consistent, configurable mechanism for producing bib record display field values.
Among other things, this will allow us to stop using MVR's (commonly used in user interfaces) and reporter.
The ingest procedure will extract values for all config.
Working branch to follow.
Bill Erickson (berick) wrote : | #1 |
Bill Erickson (berick) wrote : | #2 |
... and IDL stuff.
Mike Rylander (mrylander) wrote : | #3 |
Awesome, Bill!
I pushed a collab branch with some minor schema realignment and a fix for repeatable display fields. See: http://
Bill Erickson (berick) wrote : | #4 |
Thanks, Mike. Changes pulled w/ sign-off and some additional fixes applied into user/berick/
Bill Erickson (berick) wrote : | #5 |
Further testing, updates, and questions:
Accessing display fields via the various auto-APIs is working well.
E.g. with no authtoken (~100ms)
request open-ils.fielder open-ils.
E.g. with an authtoken (~35ms)
request open-ils.pcrud open-ils.
---
Each of these returns lists of fields for each record, which raises a question. Given a set of display fields for a record, how do we choose which to show by default in a UI? In the MVR/simple-record world, we have something called "title". Do we need conventions (configuration?) for determining which title (proper, uniform, ...), author (personal, corporate...) , etc. to display by default? (This does not preclude access to all fields, of course, but it allows us to display a saner/smaller default set, which is best for most UIs).
If such a convention exists and is generally universal, it would be extremely convenient to have a DB view or similar which codified this, such that clients could say, "give me the title", while still having access to all display fields. I imagine this would be useful for reporting as well.
------------------
TODO:
-- pgtap test
-- re-ingest for upgrades. I'll need assistance with this. The re-ingest in my upgrade script is a step, maybe, but I'm sure there are better ways.
-- right now, most fields are set as display_fields (see below). Do we need to trim those down any?
UPDATE config.
-- Do we need any default xpath or normalizers for display fields?
Mike Rylander (mrylander) wrote : | #6 |
Bill,
There's no convention yet that would choose the best of field to represent the class. Only title and author really fall into that category of having a "main" value. I wonder if, perhaps, the simplest solution is to augment config.search_class with a "representative
As for reporting, things are a little more complicated. There are some array-type fields for ISBN and such which would need more consideration, but for (nominally) unique-per-record fields, the existing reporting fields could be augmented to provide better data, for sure.
Thoughts?
Bill Erickson (berick) wrote : | #7 |
"representative
+1 from me as it solves that problem once and for all.
--
Agreed that creating a display_
Bill Erickson (berick) wrote : | #8 |
I've pushed some code to add support for representative_
Dan Wells (dbw2) wrote : | #9 |
Bill, I think this is certainly an area of need, and the direction you have chosen is definitely viable. I have concerns about continually increasing disk space requirements, but if this ultimately does wholly replace reporter.
That said (and I am only raising this point for discussion), I wonder if we should be thinking bigger. In some parts of the code, we transform to MODS, which is often only sort of what we need, then try to further manipulate the MODS output (through normalizers or other stock functions) to get a final result. In other places (especially the OPAC), we have given up on MODS, and instead deal with the MARC directly.
This approach has some known weaknesses:
1) Duplication of logic in multiple places (reporter, MVR, OPAC)
2) Inconsistency when new logic is not propagated completely
3) Not every tool is equally suited to manipulate XML/MARC data
While it has come up a few times in various past discussions, I would like to propose that we reconsider adopting a custom XSL approach, which would mean at least one and perhaps many XSL stylesheets dedicated to MARC data extraction, for indexing, display, or either/both.
In this case, I am imagining that the data we would want in various client interfaces is exactly what is already in the results list of the OPAC. It would therefore seem very beneficial for the data extraction and selection logic to be consolidated into a single place. And, while not a holy grail, XSL happens to be really good at transforming XML data, especially when speed is a factor.
In some ways, we are already doing this, as our MODS XSL slowly drifts away from the original standard version. What I am asking is that we be more explicit about it, and also that perhaps we also separate some logic into some more special-purpose sheets. I know there will be some concerns about multiple transforms, but the cost of the transforms are very cheap compared to many things we already do, and in fact are even cheaper when the XSL is trim and focused. In a basic test on my aging DB server, a simplified in-DB transform which did a MODS-style title and author extraction only went at a clip of around 1,000 records per second.
Again, this is all just for discussion, as I haven't done enough testing (or thinking) to wholesale endorse any of this. Also, I know there are plenty of in-between approaches, and I recognize that this proposal isn't even necessarily in conflict with what we are doing here, but in fact could be done in a more or less complementary way. Still, it appears we find ourselves defining best practices for display of fundamental things like 'title' and 'author', so I think we should consider moving that discussion up (down?) a level into the MARCXML transformation logic itself.
Thoughts?
Bill Erickson (berick) wrote : | #10 |
The type of customization and configurability you're describing, Dan, is fully supported by the display_field code. We're adding a display_xpath column for storing display-specific value extraction. That XPATH can be run against MARCXML, MODS, or any number of local custom style sheets (config.
FWIW, field values can also be run through index normalizers.
As far as the space goes, I do think we can find some savings, e.g. reporter.
Dan Wells (dbw2) wrote : | #11 |
Thanks for the response, Bill. I know my post was a little long-winded and hedged, so I'll to try to respond and summarize my views at the same time.
1) My main push was to get us to consider officially adopting some custom stylesheets as part of this development. It sounds like you are on board with the idea, so I can definitely put some man-power behind that to try and get something going in tandem.
2) My secondary (unclear) goal was to try and advocate for some amount of optimization and/or refactoring as we are all looking at this code again. I know that config.
On our system, we can reingest right around 10 records per second on a single thread. As noted in my previous post, I can extract display fields on the fly at a rate of around 1000 per second, also on a single thread. Obviously, the reingest does much more work, but 100 times more? What are the bottlenecks, and how do they apply to display field extraction? Time will tell as more testing gets done, but I am afraid if we keep this addition too generic, we might leave a lot of performance on the table.
It certainly makes sense to leverage existing infrastructure, but given that display fields are potentially a much simpler beast, perhaps we should also ensure that we can provide a path through the code which is properly streamlined to take advantage of this simplicity.
Bill Erickson (berick) wrote : | #12 |
Thanks for clarifying, Dan. I can't really speak to the ingest internals, at least not without doing some research first. It seems likely there's room for optimization, but I'm not sure.
Is it fair to say that if this feature produced a net zero change in ingest speed, that your concerns would be addressed? Or, do you see this feature as something that must be addressed as a separate process from ingest so that it can run significantly faster and, if so, why exactly? (The obvious benefit would be quicker propagation of display field modifications, but I don't know if that justifies using a separate structure).
I'm curious, Dan, in your 1k records / second test, were you writing output to database rows or to STDOUT?
Bill Erickson (berick) wrote : | #13 |
I was curious, so I ran some tests in my desktop dev vm w/ a stock Postgres 9.1 install. I ran batches of ingests of 100 records, repeated 10 times, the first batch without display field extraction and the second with:
Total Time: 74540.091 ms — no display fields
Total Time: 77274.786 ms — with display fields
If my math is right, that's a 3.7% increase in overall time. Or, roughly an additional minute for every 30 minutes of ingest time.
Dan Wells (dbw2) wrote : | #14 |
Bill, I do think display field changes will happen considerably more frequently and more commonly than index/browse/facet field changes, but it will probably still be not terribly often in the grand scheme. So, you are right, that alone probably doesn't justify a separate structure.
I guess the opportunity I am feeling is that I believe display extraction code can be done very simply if we allow the XSL to do the heavy lifting, especially if we place limits on the resulting transformed structure. If an experimental effort to write such code proved successful, it would not only provide a possible path for fast display field propagation, but also inform ways we might tweak the existing extraction process for more general speed gains.
As I said to start, I am just opening the discussion and "testing the waters" on this idea. If there is enough interest and/or no clear opposition, I can produce some code (which would probably clarify things and generate more discussion) and see if this has any real legs, or is just a dead-end. Thoughts?
P.S. Yes, the 1k/second was just a quick and dirty "on the fly" extraction test to STDOUT, no writing of rows. I know that writing rows will slow that to some degree :)
Bill Erickson (berick) wrote : | #15 |
If you have a structure in mind, I'm very curious. Faster propagation of display fields would be a great thing to have. (Ditto a faster DB upgrade for this feature). Thanks for keeping the thoughts rolling, Dan.
Kathy Lussier (klussier) wrote : | #16 |
We have a lot of interest in this code. Is this something that is ready for testing or is everyone still working out ways and means?
Thanks!
Kathy
Mike Rylander (mrylander) wrote : | #17 |
I'd like to spit-ball on a couple things above...
1) Speed concerns - I agree with having the capability to do a display-only reingest. We can leverage the existing infrastructure that uses flags, and extend it with a wrapper function that targets specifically a display reingest. (Likewise, a browse reingest.) With that, we have one code path (internally) during full ingest of a record, and a way to invoke display-only from the outside. In fact, this may tie into my thoughts on queued reingest, where the reingest might be typed (optionally) to allow faster processing. We might not gain /all/ possible speed increases, but I think it's likely we'll gain most of them.
2) Space concerns - I think we should consider leveraging something akin to the browse data and uncontrolled attribute value tables, which store unique, post-normalization values to which other, slimmer tables link. There is the option of reducing the number of secondary XPath expressions, and thus collapsing, say, facets, browse data and display data into a single table, and having all three use cases share the same underlying strings. I'm not sure about that, though .... there are cases where facets need to be different due to syntax constraints on search (today). Also, keeping the underlying structures separate means it's safer and easier to do parallel search+
Thoughts?
Changed in evergreen: | |
status: | New → Triaged |
importance: | Undecided → Wishlist |
Mike Rylander (mrylander) wrote : | #18 |
I've pushed a master-rebased version of this branch (no conflicts!) and will be digging into it a bit more soon. Branch here: http://
Changed in evergreen: | |
assignee: | Bill Erickson (erickson-esilibrary) → Mike Rylander (mrylander) |
Changed in evergreen: | |
assignee: | Mike Rylander (mrylander) → nobody |
Kathy Lussier (klussier) wrote : | #19 |
We had somebody at MassLNC who wanted to take a look at this branch, and I loaded it on a Sandbox this morning. However, after loading it, I get an Internal Server error when navigating to the OPAC.
I don't see any errors in the Apache logs (not sure if we have any Apache logging configured), but I've pasted the errors I see in the osrfsys log at http://
Mike Rylander (mrylander) wrote : | #20 |
Kathy,
The paste suggests that either the DB load failed (all those "schema does not exist" errors) or the opensrf.xml file was pointing at the wrong database. Since this is an all-DB feature so far, the former is more likely. Where there any errors captured during the database creation?
Kathy Lussier (klussier) wrote : | #21 |
The current error I'm seeing during database creation is:
psql:002.
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
Dan Wells (dbw2) wrote : | #22 |
'representative
Working branch:
working/
Terran McCanna (tmccanna) wrote : | #23 |
This one hasn't gotten any attention in a while - does it need a pullrequest tag?
Mike Rylander (mrylander) wrote : | #24 |
Terran,
No, there's certainly enough drift that it will need some work. It's rather low priority ATM, also, since making use of it everywhere will be a large undertaking, and disruptive to ongoing work like the web staff client.
Bill Erickson (berick) wrote : | #25 |
I was curious about the state of this, so I rebased the most recent branch, updated a few functions to match current master, did some light squashing, and added a sign-off for Dan's SQL fix commit:
Code still works. And it's still fast (and very satisfying) to do stuff like this and get a list of display fields for a bib record:
srfsh# request open-ils.pcrud open-ils.
(Now even faster w/ anon-pcrud mode -- ~10ms).
What's our bar for getting this merged? If we want to demonstrate its use in live code, I can think of a few web staff UI's that would be fairly simple to port to display fields. (I'm thinking specifically of a few that use normalized reporter data for title, author, etc., display). In any event, I can help push this along if there's interest / feedback on what's left.
Other TODO items include PGTAP tests and release notes.
Galen Charlton (gmc) wrote : | #26 |
Thanks for rebasing, Bill.
I think it would be better if the cmf.display_field column defaulted to FALSE and that we supplied some stock cmf definitions that are specifically tuned for display; this would reduce potential requirements conflict between setting up definitions for indexing versus rules for generating human-readable display labels.
For example: the "author" of a work could be taken from author|personal, author|corporate, and so forth. While the set of author|* cmf rows is fine for indexing, a single display field definition that extracts the first mods:name (or however we choose to set it up) would be easier to use.
As a minimal set, I suggest title, author, and publication date (which is particularly interesting given that nowadays both the 260 and 264 must be consulted). We might even go so far as create a 'display' cmf field class.
That's not to say that I think it should be verboten to have a cmf row that's used for both search/browse/facet generation and a display field, but I do think that existing stock definitions should be checked individually before turning on display_field.
One thing that I think would be nice to have but which I don't consider a blocker for merging: a database function to perform a display field reingest that takes bib ID and a list of display field IDs.
Galen Charlton (gmc) wrote : | #27 |
(Also, I can and will help out a bit with the seed data)
Bill Erickson (berick) wrote : | #28 |
Thanks for the feedback, Galen. Agreed on all points. Seed data help would certainly be appreciated.
I don't have a strong opinion on creating a "display" field class, but if we do we'll need an alternate way to specify what class each display field is part of. As it stands, config.
On that note, there's another wrinkle I want to iron. One of the goals for this bug is to provide the basis for replacing MVR's. This will mean, for example, creating display fields for ISBN's and allowing UI's to grab displayable ISBN's. As it stands, I don't believe the code supports that use case.
It can say "give me the representative title" or "give me all of the displayable identifier fields", but not "give me the list of displayable ISBN's". That is, unless the caller assumes the field has a certain name, like "isbn", which is not a controlled vocabulary at this point. Are we ready to make the leap to defining a well-known base set of field names for display fields?
In any event, a display-focused reingest function will make all future steps easier, so I'll look at that. We can change the function guts as needed.
Galen Charlton (gmc) wrote : Re: [Bug 1251394] Re: Metabib Display Fields | #29 |
> It can say "give me the representative title" or "give me all of the
> displayable identifier fields", but not "give me the list of displayable
> ISBN's". That is, unless the caller assumes the field has a certain
> name, like "isbn", which is not a controlled vocabulary at this point.
> Are we ready to make the leap to defining a well-known base set of field
> names for display fields?
I think we should -- it would be preferable to using opaque IDs to
refer to them.
Looking at it from the POV of replacing MVRs, we've got several fields
that don't fit into the author/
classes, including publisher, pubdate, edition, and toc. While I'm
not strongly attached to the notion of using 'display' as a new field
class, if we added it, we could refer to fields like this:
display|
display|title
display|pubdate
and so forth.
> In any event, a display-focused reingest function will make all future
> steps easier, so I'll look at that. We can change the function guts as
> needed.
Great, thanks!
Mike Rylander (mrylander) wrote : | #30 |
I don't see a problem with the template's conception of field names being required to match extant CMF rows. We're effectively doing that by hard coding xpath to extract fields now in any case.
[thinking...] Extending the representative field concept to point through a new table mapping from the display field controlled-names to actual fields (a la config.
CREATE TABLE config.
id serial,
name text,
cmf int references config.
multi bool default false
);
The templates would reference config.
In fact, we could use a view over metabib.
The first view would be a 3-column view (record id, field name, json data) and the simple-record view would pivot specific rows from that to a wide projection with prescribed columns.
This provides us vocab control, programmatic mapping, and backward compatibility, I think...
With that, I don't see a good argument for creating a new display class -- we have the catch-all keyword class, and can just map through to that.
Thoughts?
Mike Rylander (mrylander) wrote : | #31 |
Also, the new table described above could (should?) inform the ingest process about whether it should attempt to store multiple values from a display_field CMF, forcing some structure on "bad" records.
Bill Erickson (berick) wrote : | #32 |
Hey Mike, I think this sounds good. This gets us all the way back, as alluded, to a reporter.
Am I right to assume the 3-column view could contain both controlled fields plus any locally added custom fields, while the simple-record view would only include the controlled fields (a la reporter.
And of course we no longer need the representative_
For the 3-column view, it sounds like we're essentially morphing the "mfde" IDL (view) class from the working branch, teaching it to understand config.
Bill Erickson (berick) wrote : | #33 |
Pushed a few commits to:
* Adds the config.
* Tweaks metabib.
One concern about my implementation of metabib.
* Adds a few stub seed data values to the YYYY file for testing.
* Adds a ZZZ.UNDO.*.sql file for reverting all SQL changes added by the XXX and YYYY files. Very useful for testing. This will eventually go away.
* Beware the base schema is not in sync w/ the update scripts.
Bill Erickson (berick) wrote : | #34 |
Fleshing this out a little more. With an eye toward making data retrieval as flexible as possible, given this data will be used by a variety of clients, here's a breakdown of the current way clients can fetch the display data.
1. metabib.
Thin table of display entry values.
2. metabib.
One row per value, combined with data from config.
3. metabib.
Same as flat_display_entry, except there's only 1 row per config.
4. metabib.
Analogous to reporter.
One row per source (bib record) with a column per well-known field. As with compressed_
===
Next steps are to flesh out the stock display field configuration data, then finalize the columns for metabib.
Still using user/berick/
Bill Erickson (berick) wrote : | #35 |
And last for today, a proof of concept using wide_display_entry as a drop-in replacement for reporter.
(BTW, some of these views could have better names. At some point I get lazy w/ names).
Bill Erickson (berick) wrote : | #36 |
Rebased to current master. Pushed a commit adding an egBibDisplay Angular service that can de-JSON-ify the wide_display_entry data and present flat_display_
Bill Erickson (berick) wrote : | #37 |
After experimenting and sleeping on this a bit, I have some thoughts.
Adding the new config.
I'd like to revisit Galen's suggestion of adding a new "display" metabib class. I see a few benefits to this:
1. It creates a clear separation between display fields and index, etc. fields. I think in practice, few metabib_field's would be both a display and index field and separating them from the start will make them easier to manage and faster to re-ingest for display-only ingests.
2. We can use the config.
3. No new config.
My only concern is whether adding a new metabib class (which we rarely do) will have unexpected consequences in the code.
Thoughts appreciated.
Mike Rylander (mrylander) wrote : | #38 |
Hi Bill,
First, thanks for continually pushing this forward. I personally really appreciate it.
One of the near-term future benefits of display fields, which I'm hoping to work on in the next few months, is the ability to highlight search terms in the fields that were actually searched and matched by the user query. If we add display-centered adornments to the existing searched/
We could add a different, external mapping table to tie search and display fields together, but then we're back to having a mapping table and we lose the direct knowledge.
I don't actually agree that display and search fields will be mostly orthogonal. In fact, display will be primarily a subset of search. We may certainly display less than we search (think: subfield h of 245), but I don't think we'd ever show more than we'd allow searching on, except in circumstances that would likely be covered by existing (or useful) facet or browse fields. For instance, facets are a step toward display fields already, in that the values we extract for facet limiting are the same we use for display of facet data. The only facet field in stock that is not also a search field is the authority id facet. Browse is in a similar position: there are 5 of 17 browse fields that are not search fields, but one is because of a sorting need special to browse and could probably be addressed without the separate field, and the others are driven by how cataloging is performed rather than separate needs for search/browse as performed by a user per se.
I do see where you're coming from WRT reingest, but I think we have much bigger fish to fry in that area. Separating out display fields for that reason will not make any of the existing TODOs any easier, and they still need design and doing. (I think queued ingest along with a wholesale rewrite of the ingest process will be needed, and I hope that starts sooner rather than later, but that's a separate concern and I don't think it should have much weight on this bug's direction.) That said, I do think special care should be taken to make sure that display-only reingest is as efficient as possible on day one.
WRT config, I don't see the UI issue the way you describe. It may be just a difference in design philosophy, but I don't see why we need to have a UI per table. I think the existing (or, perhaps, angularized) CMF config UI can be made to deal with display mapping. When configuring a field, if display_field is true, offer a modal action that creates or changes the mapping. We save a UI, and lead the user to what they need at the time they need it. And as far as having a mapping table in place, it will be tiny and not factor into performance (not that you claimed it would, I just want to make that clear.)
Also, I'm glad you mention the code/maintenance effect of adding a new class. Currently, the classes are not special -- they can all be used for any purpose, an...
Mike Rylander (mrylander) wrote : | #39 |
Also, I should note that I think the code as-is is pretty darn close to commit-ready, IMO.
I want to think more about if/how we might start fast-pathing all non-full reingests -- hopefully we can shard what we have and put pieces back together as compossible units without incurring too much overhead (extra UDF calls) or doing too much repeated work (XML re-parsing), while also making each part simpler. I'll try to find some time in the next couple weeks to look at that ... I'm hopeful, but not overly optimistic, that it's doable with restricted tuits.
Changed in evergreen: | |
assignee: | nobody → Mike Rylander (mrylander) |
Mike Rylander (mrylander) wrote : | #40 |
I've rebased this to master and added some reingest streamlining. I got rid of the apparently redundant field definitions, but if they had a reason to live, please let me know about that. Branch at:
Bill Erickson (berick) wrote : | #41 |
Thanks, Mike. Streamlining++
The redundant definitions were added as a step toward addressing Galen's concerns from #26, i.e. that some stock index definitions will not match what we want to use for display data. The XPATH is identical because we never got around to actually creating the display-friendly definitions.
Also, why the return of the 'representative
Mike Rylander (mrylander) wrote : | #42 |
Updated with minor fixes. I'm pullrequesting this now...
Bill, do you think you'll have time to check this out soon-ish?
TIA
Changed in evergreen: | |
assignee: | Mike Rylander (mrylander) → nobody |
tags: | added: pullrequest webstaffclient |
Mike Rylander (mrylander) wrote : | #43 |
Re the representative_
Mike Rylander (mrylander) wrote : | #44 |
I'll get rid of the rep-field stuff in the upgrade and baseline, and adjust the baseline seed data a bit with display_xpath and joiners.
FWIW, I think we can drop the subject|topic display mapping ... the complete one (after setting the joiner to --) is really what we want, I think, to start with anyway.
Mike Rylander (mrylander) wrote : | #45 |
I've updated the branch as described.
Other than making sure that all the new views are represented somewhere in the baseline schema, I think it's pretty much reified.
For reference, regarding streamlining of reingest, I'm getting about 100 records/second for display-only reingest as shown in the YYYY upgrade file on a slow VM and a small data set.
Bill Erickson (berick) wrote : | #46 |
Thanks, Mike! I'm going to look at this tomorrow.
Changed in evergreen: | |
assignee: | nobody → Bill Erickson (berick) |
Bill Erickson (berick) wrote : | #47 |
Here be another branch:
Stuff I did:
copied some remaining bits into the base schema / seed data
other misc. cleanup
replaced wide entry references to 'topic_subject' with the new 'creators' field.
added display_xpath to 'author' field to upgrade script
remove ZZZZ.UNDO script
made the reingest call more generic in case we make more field changes (and for release notes)
expanded inline documentation for webstaff egBibDisplay service
remove webstaff items-out proof-of-concept code (see below)
sign-off's for Mike's commits.
release notes
TODO:
More sign-offs needed.
Other thoughts:
From where I'm sitting, the code itself is ready. More eyes are great, obviously, but I think it's there. The seed data is very slim, though. As it stands, the data we have is only useful in addition to other data, for example it's not yet a full replacement for reporter.
Having said that, though, I'm OK with merging as-is, with the understanding we'll be expanding the seed data and using it in the code, coming soon to a release near you. If we go this route, we should probably remove (comment-out) the display reingest from the upgrade script, since it's probably not yet warranted.
Changed in evergreen: | |
assignee: | Bill Erickson (berick) → nobody |
Changed in evergreen: | |
assignee: | nobody → Mike Rylander (mrylander) |
Mike Rylander (mrylander) wrote : | #48 |
Merged to master! And the people did feast upon the lambs, and sloths, and carp, and anchovies, and orangutans, and breakfast cereals, and fruit bats, and large chu...
Thanks, Dan(s), Bill, and all!
Changed in evergreen: | |
assignee: | Mike Rylander (mrylander) → nobody |
status: | Triaged → Fix Committed |
Changed in evergreen: | |
status: | Fix Committed → Fix Released |
Changed in evergreen: | |
milestone: | none → 3.0-beta |
Branch started at
http:// git.evergreen- ils.org/ ?p=working/ Evergreen. git;a=shortlog; h=refs/ heads/user/ berick/ lp1251394- metabib- display- fields
It's largely a copy/paste job from facet_entry, w/ some slight modifications. It seems to work, in as much as values are extracted, but it will need lots of eyes.
TODO:
-- API / Views(?) for easier programmatic access.
-- managing the re-ingest on upgrade