Coding system bug - Julian Bradfield
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
VM | Fix Released | High | Uday Reddy | |
Bug Description
Julian Bradfield reports (vm.info, 2011-1-1)
OK, here's the first real bug. The symptoms are corruption of data
when converting mime objects to other types. It's something I
encountered a while ago in old vm, and fixed in mine, but I'm not sure
what design of fix is correct, so I'll describe it rather than just
sending in a patch.
When a mime object is type-converted, it's fed to an external
program. The write to that program is (following an earlier partial
discovery of this problem by me) done in binary.
However, the read *from* the program is done in the default coding
system, which for most people is probably utf-8. That seems right for
text, but not if you're converting to another binary object.
However...it's not even right for text, because the output of the program
is decoded again by vm later on when it parses the new mime part
that's been created by conversion.
So the output should be read in binary, regardless of what it is.
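To illustrate why the read has to be binary, here is a minimal sketch in Python rather than Emacs Lisp. It only models the general failure and is not VM's actual code: the helper names are hypothetical, and Python's lossy `errors="replace"` decoding stands in for whatever the default coding system does to bytes that are not valid text.

```python
def read_as_text(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Model a text-mode read: decode on the way in, re-encode for storage.

    Hypothetical helper; invalid byte sequences are replaced, which is lossy.
    """
    return raw.decode(encoding, errors="replace").encode(encoding)


def read_as_binary(raw: bytes) -> bytes:
    """Model a binary read: the bytes are kept exactly as the program wrote them."""
    return raw


# Pretend this is the converter's output: bytes that are not valid UTF-8.
converter_output = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0xFF, 0xD8])

text_mode_result = read_as_text(converter_output)
binary_mode_result = read_as_binary(converter_output)

print(text_mode_result == converter_output)    # False: the data was altered
print(binary_mode_result == converter_output)  # True: the data survived
```

Pure ASCII output survives both paths unchanged, which is presumably why the problem can go unnoticed for a long time.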
The question then is what to do in the explicit decoding; or rather
what charset the type-conversion should assign to the new mime part,
so that decoding is done correctly. At present, no charset is
assigned, so the decoding is done in binary, which is wrong if the
conversion program's output was utf-8 text: your type-converted
stuff gets corrupted. (In my case, converting msword to text.)
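The corruption described here can be reproduced in a few lines of Python. This is an analogy, not VM code: `decode_part` is a hypothetical helper, and "decoding in binary" is modelled as one character per byte (Latin-1), which is an assumption about what an unassigned charset amounts to.

```python
from typing import Optional


def decode_part(raw: bytes, charset: Optional[str]) -> str:
    """Decode a MIME part body.

    With no charset assigned, fall back to a byte-per-character decoding
    (modelled here with Latin-1), as a stand-in for "binary".
    """
    if charset is None:
        return raw.decode("latin-1")
    return raw.decode(charset)


# Pretend the converter (e.g. msword to text) wrote UTF-8 text.
converter_output = "café".encode("utf-8")

print(decode_part(converter_output, None))     # cafÃ©  (corrupted)
print(decode_part(converter_output, "utf-8"))  # café   (correct)
```

The two-byte UTF-8 sequence for "é" is shown as two separate characters when no charset is assigned, which matches the kind of damage seen in the converted text.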
My solution to this was to allow the coding-system to be specified
explicitly as an extra element of the elements of
vm-mime-
However, I'm not sure that's correct. Arguably, the coding system
should simply be the default for text output types, and binary for
others, because any program outputting text is going to pick up the
Unix locale, and any program not outputting text isn't.
If that's agreed, I'll do that and send along a patch.
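For concreteness, the proposed rule (locale default for text output types, binary for everything else) might look like the following. This is a sketch in Python under stated assumptions, not the eventual patch: the function names are hypothetical, `None` stands for "binary, leave the bytes alone", and `locale.getpreferredencoding` stands in for the default coding system.

```python
import locale
from typing import Optional, Union


def coding_for_converted_part(output_type: str,
                              locale_encoding: Optional[str] = None) -> Optional[str]:
    """Pick the charset to assign to a type-converted MIME part.

    text/* output gets the locale's encoding; anything else gets None,
    meaning the part is treated as binary.
    """
    major = output_type.split("/", 1)[0].strip().lower()
    if major == "text":
        return locale_encoding or locale.getpreferredencoding(False)
    return None


def decode_converted_part(raw: bytes, output_type: str,
                          locale_encoding: Optional[str] = None) -> Union[str, bytes]:
    """Read the converter output in binary, then decode once, by type."""
    coding = coding_for_converted_part(output_type, locale_encoding)
    return raw if coding is None else raw.decode(coding)


print(coding_for_converted_part("text/plain", "utf-8"))       # utf-8
print(coding_for_converted_part("application/pdf", "utf-8"))  # None
```

The point of the design is that the raw output is decoded exactly once, and only when the output type says it is text.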
Changed in vm:
status: In Progress → Fix Committed
tags: added: 7.19
Changed in vm:
status: Fix Committed → Fix Released
no longer affects: vm/8.1.x
> However, I'm not sure that's correct. Arguably, the coding system
> should simply be the default for text output types, and binary for
> others, because any program outputting text is going to pick up the
> Unix locale, and any program not outputting text isn't.
>
> If that's agreed, I'll do that and send along a patch.
This solution sounds right to me. Please do send me a patch whenever
you are ready.