> Another thing worth mentioning is that not all invalid Unicode is Latin1, of course. It could just as well be CP-850 or KOI-8R or ISO2022-JP or KSC-5601 or GB2312 or any of at least a hundred other legacy encodings.
True, but it is essentially impossible for us to autodetect which encoding it is. That's the crux of the problem with legacy encodings.
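To make that concrete, here is a minimal Python sketch. The sample bytes and the candidate list are made up for illustration and are not taken from the report. It shows that the same byte string is rejected as UTF-8 yet decodes without error under several single-byte legacy encodings, each giving different text, so nothing in the bytes themselves says which one was intended.

```python
# Hypothetical sample: "café" as a Latin-1 system would have written it.
raw = b"caf\xe9"

# A few of the legacy encodings mentioned above, by their Python codec names.
candidates = ["latin-1", "cp850", "koi8_r"]


def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def decode_all(data, encodings):
    """Return {encoding: text} for every encoding that accepts the bytes."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            # This encoding rejects the bytes, so it can be ruled out.
            pass
    return results


print("valid UTF-8:", is_valid_utf8(raw))
decoded = decode_all(raw, candidates)
for name, text in decoded.items():
    print(f"{name}: {text!r}")
```

Every candidate succeeds and each one yields a different last character. The most a detector can do is guess from statistics or context, and on short strings such as file names that guess is unreliable.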
Also, I still haven't heard whether you tried trunk to see if it has the same problem. Trunk should avoid touching the encoding as much as possible, which in theory should allow it to work with any encoding.