The type of customization and configurability you're describing, Dan, is fully supported by the display_field code. We're adding a display_xpath column to store the XPath used for display-specific value extraction. That XPath can be run against MARCXML, MODS, or the output of any number of local custom style sheets (config.xml_transform). One or more custom style sheets that do what the TPAC does (and more) would be a great addition to this code and would ideally be developed in tandem to reduce re-index churn.
FWIW, field values can also be run through index normalizers.
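To make the idea concrete, here is a rough Python sketch of what "run an XPath against MARCXML, then pass the value through normalizers" looks like. This is not the Evergreen code; the function names, the sample record, and the trailing-punctuation normalizer are all hypothetical and only there for illustration.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML namespace, bound to a prefix the XPath can use.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Made-up sample record, for illustration only.
MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Harry Potter and the half-blood prince /</subfield>
    <subfield code="c">J.K. Rowling.</subfield>
  </datafield>
</record>"""


def strip_trailing_punctuation(value):
    """Hypothetical normalizer: drop trailing ISBD punctuation and spaces."""
    return value.rstrip(" /:;,.")


def extract_display_values(xml_text, display_xpath, normalizers=()):
    """Evaluate display_xpath against the document and normalize each hit.

    Hypothetical helper. ElementTree only supports a subset of XPath,
    which is enough for simple tag/attribute selections like this one.
    """
    root = ET.fromstring(xml_text)
    values = []
    for node in root.findall(display_xpath, MARC_NS):
        text = (node.text or "").strip()
        for normalize in normalizers:
            text = normalize(text)
        if text:
            values.append(text)
    return values


# What a display_xpath value for the title might look like (an assumption).
title_xpath = ".//marc:datafield[@tag='245']/marc:subfield[@code='a']"
titles = extract_display_values(MARCXML, title_xpath, [strip_trailing_punctuation])
print(titles)
```

The same helper would work against MODS or a custom transform's output by swapping the namespace map and the XPath, which is the point of keeping the XPath in a column rather than in code.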
As for disk space, I do think we can find some savings, e.g. reporter.simple_record, but we may also be trading some disk space for speed (pre-extracted data is a lot faster than on-the-fly MVR generation) and configurability.