Evergreen

db-seed.po files need cleanup to remove duplicate IDs from generated localized seed data

Bug #1681864 reported by Dan Scott on 2017-04-11

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Evergreen	Fix Released	Medium	Unassigned	Evergreen 3.0-beta
	2.12	Fix Released	Medium	Unassigned	Evergreen 2.12.5

Bug Description

This bug continues #1680312 - running through the translation process on master / rel_2_12, we find the localized 950.data.seed-values.sql files can contain duplicate rows:

cd build/i18n

# generate the new base POT
make newpot

# update the translation for fr-CA
make LOCALE=fr-CA updatepo

# create an updated SQL file to load fr-CA
make LOCALE=fr-CA project

# load the generated fr-CA SQL
psql -f project/locale/fr-CA/950.data.seed-values.sql evergreen

At this last step, I ran into some errors about duplicates in config.i18n_core. However, the load was successful, and the errors didn't seem to be harmful. To determine why we were getting duplicates, I examined the results of the "updatepo" step (which adds and merges the new strings with an existing translation) vs. a "newpo" step (which generates a fresh, untranslated PO file).

With the "updatepo" step, you get entries like:

# id::cmf.label__26 id::cza.label__1 id::cza.label__10
# id::cza.label__1 id::cza.label__10 id::cza.label__19
#: 950.data.seed-values.sql:173 950.data.seed-values.sql:410
#: 950.data.seed-values.sql:431
msgid "Title Control Number"
msgstr "Numéro de contrôle pour les titres"

With the "newpo" step, you get entries like:

# id::cmf.label__26 id::cza.label__1 id::cza.label__10
#: 950.data.seed-values.sql:173 950.data.seed-values.sql:410
#: 950.data.seed-values.sql:431
msgid "Title Control Number"
msgstr ""

So it looks like in the merge process, "pot2po" is holding onto prior ID rows, resulting in duplicates getting fed into the final SQL file. The problem could be cleaned up by running the output SQL through "sort | uniq", but that's fixing the symptom rather than the root of the problem. More investigation into what pot2po is doing may be required to fix the root of the problem.

Tags:

Dan Scott (denials) on 2017-04-11

Changed in evergreen:
status:	New → Triaged
milestone:	none → 2.12.1

Revision history for this message

Ben Shum (bshum) wrote on 2017-04-12:

Looks like we may want to insert a step for "pocommentclean" which is part of translate toolkit and can be used to remove all existing comments in a given file. While the manpage says that you can only use pocommentclean against po directories and not individual files, my local testing with translate toolkit 1.13.0 on Ubuntu 16.04 yielded successfully cleaned files on a case by case basis.

So running "pocommentclean build/i18n/po/db.seed/fr-CA.po" resulted in a file fr-CA.po with no comments in it. Followed by running the individual command to update one PO file, like "pot2po --progress verbose -i po/db.seed/db.seed.pot -o po/db.seed/fr-CA.po -t po/db.seed/fr-CA.po" which resulted in an fr-CA.po file with the corrected IDs only (and all the updated IDs that were missing from the file too).

We'd have to figure out where best to implement the task to clean out the existing IDs and replace them all. And then with git version control tracking all our changes, the final update that gets committed would appear only to replace any mistakes in the file. Any existing IDs that are unchanged would remain unchanged with the git history.

Further testing and coding to come...

tags:	added: i18n
Changed in evergreen:
status:	Triaged → Confirmed
importance:	Undecided → Medium

Revision history for this message

Ben Shum (bshum) wrote on 2017-04-17:

Working branch started up: user/bshum/lp161864-i18n-db-seed-fixing

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/bshum/lp161864-i18n-db-seed-fixing

All this does is add a pocommentclean step in the middle of updatepo make target that will selectively wipe out comments for a given db.seed locale file. Tested only on Ubuntu 16.04 presently with translate toolkit 1.13.0. There may be differences on other platforms in how this ends up behaving.

The test procedure would involve running "make newpot" in build/i18n first, to deploy the new template changes for IDs to the base db.seed.pot file. Then run "make LOCALE=ar-JO updatepo" or "make update_all_locales" to hit all of them at once, to apply the changes throughout between the templates and locale PO files.

As a potential hiccup, it looks like updatepo hasn't been called between POT and PO in a very long time (since it's not a step in the wiki instructions for release making predating 2.6). So it has been shown for me that running updatepo to apply this fix also nets a ton of other PO updates and changes. Testing this fix will require some finesse with managing the various i18n pieces that need updating and repair.

Not putting a pullrequest on this yet, while we investigate further and discuss any other potential ramifications.

Evergreen Bug Maintenance (bugmaster) on 2017-04-24

Changed in evergreen:
milestone:	2.12.1 → 2.12.2

Evergreen Bug Maintenance (bugmaster) on 2017-05-24

Changed in evergreen:
milestone:	2.12.2 → 2.12.3

Evergreen Bug Maintenance (bugmaster) on 2017-06-22

Changed in evergreen:
milestone:	2.12.3 → 2.12.4

Revision history for this message

Ben Shum (bshum) wrote on 2017-07-18:

Setting a pullrequest on this now. The change should assist with fixing up the IDs during the next PO template sync and PO files.

tags:

added: pullrequest

Evergreen Bug Maintenance (bugmaster) on 2017-07-28

Changed in evergreen:
milestone:	2.12.4 → 2.12.5

Revision history for this message

Chris Sharp (chrissharp123) wrote on 2017-08-10:

Signoff branch here:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/csharp/lp1681864-i18n-db-seed-fixing

This was useful when testing the fix for bug 1160347.

tags:

added: signedoff

Revision history for this message

Galen Charlton (gmc) wrote on 2017-08-10:

Pushed to master. Thanks, Ben and Chris!

Changed in evergreen:
status:	Confirmed → Fix Committed
milestone:	2.12.5 → 3.0-alpha

Revision history for this message

Galen Charlton (gmc) wrote on 2017-08-10:

Advice on whether to backport to rel_2_12?

Revision history for this message

Ben Shum (bshum) wrote on 2017-08-10:

Probably wouldn't hurt to have it in rel_2_12. +1 to backport

Revision history for this message

Galen Charlton (gmc) wrote on 2017-08-17:

Pushed to rel_2_12

Evergreen Bug Maintenance (bugmaster) on 2017-09-07

Changed in evergreen:
status:	Fix Committed → Fix Released

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.