> > Another thing worth mentioning is that not all invalid Unicode
> > is Latin1, of course. It could just as well be CP-850 or KOI-8R
> > or ISO2022-JP or KSC-5601 or GB2312 or any of at least a
> > hundred other legacy encodings.
>
> True, but it is essentially impossible for us to autodetect which
> encoding it is. That's the crux of the problem with legacy encodings.
That is precisely the point I was trying to make. Some of the comments above allude to an automatic translation based on the assumption that any invalid UTF-8 is Latin-1. That assumption is false, and such a translation should be avoided. The only robust approach to this problem is "I don't know what encoding you used in the file name, so I can't show it correctly, but I sure can play the music in that file". Anything else is bound to fall on its face in new and interesting ways (unless of course you manage to solve the very ambitious task of correctly guessing encodings, which is hardly of central importance for a media player).
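To make that concrete, here is a minimal sketch of the "can't show it, can still play it" approach. It assumes Python 3 and file names handled as raw bytes; `display_name` and `roundtrip` are hypothetical helper names for illustration, not anything taken from Exaile's code.

```python
def display_name(raw: bytes) -> str:
    """Return a label that is safe to show; it never decides whether the file can be opened."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Unknown legacy encoding: mark the undecodable bytes with U+FFFD
        # instead of guessing Latin-1 (or anything else).
        return raw.decode("utf-8", errors="replace")


def roundtrip(raw: bytes) -> bytes:
    """Carry the name through a str and back without losing any byte."""
    as_text = raw.decode("utf-8", errors="surrogateescape")
    return as_text.encode("utf-8", errors="surrogateescape")
```

The point of the design is that the original bytes remain the key used for opening the file, while the decoded string is only ever used for display. A name such as `b"caf\xe9.mp3"` is shown with a replacement character, yet `roundtrip` hands back exactly the same bytes.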
> Also, I still haven't heard whether you tried trunk to see
> if it has the same problem. In theory trunk should be
> avoiding touching the encoding as much as possible,
> which should allow it to work with any encoding.
> Theoretically, anyway.
I'm afraid I won't necessarily be able to make that kind of investment in this bug report. On my production system, I had to settle on a different music player because of this issue, and I'm not too familiar with Exaile, its development model, or Python. I'll be happy to offer input on how to reproduce this bug if somebody else would like to create test cases. Because this is a thorny issue, there should probably be several test cases for it in the test suite. And anyway, several other people have chimed in and reported that they have this problem too.
Still, if you could provide a pointer to a brief howto for running the trunk version in a virtual machine with Ubuntu 9.04 or 9.10 prerelease, I could try to find the time to do that.
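As input for whoever writes those test cases, the following sketch generates file names in a few legacy encodings. It assumes Python 3; the sample titles, the choice of encodings, and the helper names are all illustrative and do not come from Exaile's test suite.

```python
# Hypothetical sample titles, keyed by the legacy codec used to encode them.
SAMPLES = {
    "latin-1": "café.mp3",
    "cp850": "café.mp3",
    "koi8_r": "музыка.mp3",
    "gb2312": "音乐.mp3",
}


def legacy_fixture_names() -> dict:
    """Map each codec name to the raw bytes a legacy system would have written."""
    return {codec: title.encode(codec) for codec, title in SAMPLES.items()}


def is_valid_utf8(raw: bytes) -> bool:
    """True if the bytes decode as strict UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

None of these names is valid UTF-8, and the same title yields different bytes under Latin-1 and CP-850. Decoding the KOI8-R name as Latin-1 "succeeds" but produces the wrong text, which is the failure mode a Latin-1 fallback would hide.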