oxfam error: Converted text is not utf-8

Bug #1035308 reported by Alexander Bittner
Affects: KARL3
Status: Fix Released
Importance: Low
Assigned to: Carlos de la Guardia
Milestone: (none listed)

Bug Description

* detected via error monitor
* we see a lot of these errors
* some of them are related to blobcache files:

Thu Aug 9 05:38:27 2012 ERROR karl Converted text is not utf-8: /srv/multikarl/production/10/var/blob_cache/oxfam/16/5467.03986be2159cf0aa.blob

-> this file is a binary file

* and some are related to temp files (the majority):

Thu Aug 9 11:17:54 2012 ERROR karl Converted text is not utf-8: /tmp/tmpZURL94

-> however, we can't have a look into these temporary files, since they only seem to exist for a short period of time

Paul Everitt (paul-agendaless) wrote :

We get these a lot also, but we just tolerate them. But you're right, we need to either fix it or stop having it raise an alarm.

I'm assigning to Carlos but not until next week. I hope to get a full traceback from the next time it happens. Of course, our inability to get the source file producing this problem is an issue. We might have to put something in place that captures the offending file and copies it to var for analysis.
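One way to capture the offending file before a short-lived temp file disappears is to copy it aside at the moment the error is detected. The following is only a minimal sketch: the function name `quarantine_copy` and the `quarantine_dir` argument are hypothetical, not part of KARL, and the real hook point in the converter code is not known from this report.

```python
import os
import shutil
import time
import uuid


def quarantine_copy(src_path, quarantine_dir):
    """Copy src_path into quarantine_dir for later analysis.

    Hypothetical helper, not KARL code. Returns the destination path,
    or None if the copy failed (for example, the temp file is already gone).
    """
    try:
        os.makedirs(quarantine_dir, exist_ok=True)
        # Timestamp plus a random suffix so repeated captures never collide.
        stamp = time.strftime("%Y%m%dT%H%M%S")
        name = "%s-%s-%s" % (stamp, uuid.uuid4().hex[:8], os.path.basename(src_path))
        dest = os.path.join(quarantine_dir, name)
        shutil.copyfile(src_path, dest)
        return dest
    except OSError:
        # Never let the diagnostic copy break the conversion itself.
        return None
```

Called right where the "Converted text is not utf-8" message is emitted, this would leave a copy under a directory such as var for inspection. Disk usage would need watching, given how often the error fires.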

Changed in karl3:
assignee: nobody → Carlos de la Guardia (cguardia)
importance: Undecided → Medium
milestone: none → m116
Changed in karl3:
milestone: m116 → m117
Changed in karl3:
importance: Medium → Low
JimPGlenn (jpglenn09)
Changed in karl3:
milestone: m117 → m118
JimPGlenn (jpglenn09)
Changed in karl3:
milestone: m118 → m119
Paul Everitt (paul-agendaless) wrote :

We need to slow everything down until we get clarity on the Q4 budget.

Changed in karl3:
milestone: m119 → m120
Paul Everitt (paul-agendaless) wrote :

Carlos, think you can work on this one during the next 2 weeks (M120)?

Carlos de la Guardia (cguardia) wrote :

This will happen any time the text in an uploaded file does not match the encoding expected by the converter that its file type triggers. When this happens, the text is coerced into ASCII and processing continues. This text is used for indexing the file, so text search will still work for this file, except for the non-compliant characters.
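The fallback described here can be illustrated roughly as follows. This is a sketch under assumptions, not the actual KARL converter code: the function name `to_index_text` is hypothetical, and dropping the offending bytes with the `ignore` error handler is just one plausible way to coerce to ASCII.

```python
def to_index_text(raw):
    """Decode converter output for indexing.

    Hypothetical helper. Returns (text, was_utf8). If the bytes are not
    valid UTF-8, fall back to ASCII and drop every byte that does not fit.
    """
    try:
        return raw.decode("utf-8"), True
    except UnicodeDecodeError:
        # Coerce to ASCII: non-ASCII bytes are discarded, the rest survives.
        return raw.decode("ascii", "ignore"), False
```

For example, Latin-1 bytes such as `b"caf\xe9 menu"` are not valid UTF-8, so the fallback yields `"caf menu"`: the file stays searchable for everything except the dropped characters.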

This could probably be a warning instead of an error, but aside from that I don't think there's much that can be done, since users can upload whatever they want and there can't be a guarantee that they will always stick to some encoding.
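Downgrading the message could look roughly like the sketch below. It assumes the message goes through Python's standard `logging` module under a logger named `karl` (the name that appears in the log lines above) and that the error monitor only reacts to records at ERROR level or higher; the function name `report_bad_encoding` is hypothetical.

```python
import logging

log = logging.getLogger("karl")


def report_bad_encoding(path, logger=log):
    """Record a non-UTF-8 conversion result without raising an alarm.

    Hypothetical helper. WARNING sits below ERROR, so a handler or monitor
    that filters at ERROR level ignores the record, while the message is
    still available to handlers configured at WARNING or lower.
    """
    logger.warning("Converted text is not utf-8: %s", path)
```

The message text is unchanged, so existing log searches keep working; only the severity differs.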

Any suggestions?

Changed in karl3:
status: New → In Progress
Paul Everitt (paul-agendaless) wrote :

I agree with your diagnosis. Thus, the real problem to solve is simple: get the stupid error monitor to shut up. :) Make it a warning and we'll close the ticket.

Carlos de la Guardia (cguardia) wrote :

Done. Deployed to cguardia-1035308-converted-text-error

Changed in karl3:
status: In Progress → Fix Committed
JimPGlenn (jpglenn09) wrote :

Looks good.

tags: added: branch-default tested ux2
Paul Everitt (paul-agendaless) wrote :

I am going to teleport these into the future so we can test them. Breaking the rules a little.

Changed in karl3:
milestone: m120 → m123
tags: removed: ux2
Paul Everitt (paul-agendaless) wrote :

Already tested and released in r3.99

tags: added: r3.104
tags: added: r3.99
removed: r3.104
Changed in karl3:
status: Fix Committed → Fix Released