media file content and filename encoding is not consistent
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Exaile | Confirmed | High | Unassigned |
Bug Description
I see that there have been a number of bugs opened and fixed recently that deal with character encoding ([ticket:90], [ticket:165], [ticket:247], [ticket:292]). It's really a pain for a variety of historical reasons, but Exaile is going to have to deal with it. I'm going to try to explain the problems as I see them, and present the design for a solution. It's going to be more complicated than you want...
The two major problems are with '''MP3 tags''' and with '''filenames'''.
First, filenames. UNIX never defined an encoding for filenames. The only restrictions were that a name couldn't contain a slash (/) or a NUL byte. This meant that people who wanted to save files with non-English names had to make something up. It's a long story, but fairly recently the GNOME/GTK+ developers decided to assume UTF-8 and allow users to override this if needed. Check out the [http://
To handle '''filename encoding''' robustly, Exaile will have to allow a per-file option specifying which encoding to use. With nothing set, it should use UTF-8, but it is possible for someone to have a French song named in Latin-1 and a Polish song named in Latin-2. It might be nice to have a per-directory setting. Then again, these files are non-standard, so perhaps an easy way to set the filename encoding for many files at once would be enough. Yes, ideally both of those filenames would be in UTF-8 (and a conversion option would be nice), but Exaile should be able to open the file no matter what. A simpler option would be to not care, but I think there are problems with SQLite and non-UTF-8 strings. Something else to consider is external album art, whose filenames have the same problem.
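A minimal Python sketch of the "open it no matter what" idea, assuming filenames are handled as raw bytes (which is what `os.listdir()` returns when given a bytes path) and decoded only for display. The `decode_filename` helper and its `override` parameter are hypothetical, not part of Exaile. A user-set encoding is tried first, then UTF-8, and finally Latin-1 as a last resort, because Latin-1 can decode any byte string.

```python
from typing import List, Optional, Tuple


def decode_filename(raw_name: bytes, override: Optional[str] = None) -> Tuple[str, str]:
    """Return (display_name, encoding_used) for a raw on-disk filename.

    Hypothetical helper: `override` stands in for a per-file or
    per-directory setting chosen by the user.
    """
    candidates: List[str] = [override] if override else []
    candidates.append("utf-8")  # the GNOME/GTK+ default assumption
    for encoding in candidates:
        try:
            return raw_name.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    # Latin-1 maps every byte to a code point, so this fallback never fails
    # (though the result may be mojibake).
    return raw_name.decode("latin-1"), "latin-1"


# The raw bytes stay the key for opening the file; only the label is decoded.
french = "Chanson d'été.mp3".encode("latin-1")
print(decode_filename(french))                # ("Chanson d'été.mp3", 'latin-1')
polish = "Zażółć.mp3".encode("iso-8859-2")
print(decode_filename(polish, "iso-8859-2"))  # ('Zażółć.mp3', 'iso-8859-2')
```

Because the raw bytes are kept, they could be stored in SQLite as a BLOB next to the decoded display name. Python's `sqlite3` module stores `bytes` values that way, which would sidestep the worry about non-UTF-8 TEXT. Whether that fits Exaile's database schema is an open question this ticket does not settle.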
For '''media tags''', the problem is more complicated. For id3v1 and v1.1, the tags are "supposed" to be in ISO-8859-1 (Latin-1), but often they are not. In id3v2 versions before v2.4, each text frame declares its own encoding, and the only choices are ISO-8859-1 and UTF-16 (UCS-2). This means 8-bit text "should" be ISO-8859-1. The uncommon id3v2.4 also allows UTF-8. But lots of these text strings are not encoded correctly. Check out [http://
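The following Python sketch shows where a per-track override could plug in. The encoding-byte values come from the ID3 specifications: 0 is ISO-8859-1 and 1 is UTF-16 with a BOM, and id3v2.4 adds 2 for UTF-16BE and 3 for UTF-8. The id3v1 layout also comes from the specification: the last 128 bytes of the file, starting with `TAG` and followed by a 30-byte title field. The function names and the `override` parameter are hypothetical. The override is applied only where the tag claims ISO-8859-1, because that is where mislabeled legacy 8-bit text ends up (Windows-1251 Cyrillic, for example).

```python
from typing import Optional

# ID3v2 text-encoding byte -> Python codec name.
# 0 and 1 exist in id3v2.2/2.3; 2 and 3 were added in id3v2.4.
ID3V2_ENCODINGS = {0: "latin-1", 1: "utf-16", 2: "utf-16-be", 3: "utf-8"}


def decode_id3v2_text(frame_data: bytes, override: Optional[str] = None) -> str:
    """Decode the body of an ID3v2 text frame (encoding byte + text)."""
    encoding_byte, payload = frame_data[0], frame_data[1:]
    encoding = ID3V2_ENCODINGS[encoding_byte]
    if encoding_byte == 0 and override:
        # The tag claims Latin-1, but the user knows better.
        encoding = override
    # The "utf-16" codec consumes the BOM; trailing NUL terminators are dropped.
    return payload.decode(encoding).rstrip("\x00")


def decode_id3v1_title(tag: bytes, encoding: str = "latin-1") -> str:
    """Decode the title of a 128-byte id3v1 tag (which has no encoding byte)."""
    if len(tag) != 128 or not tag.startswith(b"TAG"):
        raise ValueError("not an id3v1 tag")
    # Bytes 3..32 hold the title, padded with NULs or spaces.
    return tag[3:33].rstrip(b"\x00 ").decode(encoding)
```

In practice, this logic would sit on top of whatever tag-reading library Exaile uses rather than parse frames by hand.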
APEv1 tags are rare (and ASCII-only), while APEv2 tags and Vorbis comments (for Ogg, FLAC and Speex) are UTF-8 all the time. WMA and AAC have their own tagging standards (See the [http://
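Vorbis comments are UTF-8 by definition, so a reader can decode them strictly and needs no override. Below is a Python sketch of parsing a raw Vorbis comment block. The layout follows the Vorbis I specification: little-endian 32-bit lengths for the vendor string and the comment count, followed by length-prefixed `KEY=value` entries. `parse_vorbis_comment` is a hypothetical name. The sketch assumes the block starts at the vendor length, with any Ogg packet header or FLAC metadata-block header already removed.

```python
import struct
from typing import List, Tuple


def parse_vorbis_comment(block: bytes) -> Tuple[str, List[Tuple[str, str]]]:
    """Return (vendor, [(FIELD, value), ...]) from a Vorbis comment block."""
    pos = 0

    def read_length() -> int:
        nonlocal pos
        (length,) = struct.unpack_from("<I", block, pos)
        pos += 4
        return length

    def read_utf8(length: int) -> str:
        nonlocal pos
        # Strict decoding: invalid UTF-8 here means a broken file,
        # not a legacy encoding to guess at.
        text = block[pos:pos + length].decode("utf-8")
        pos += length
        return text

    vendor = read_utf8(read_length())
    comments = []
    for _ in range(read_length()):
        key, _sep, value = read_utf8(read_length()).partition("=")
        comments.append((key.upper(), value))  # field names are case-insensitive
    return vendor, comments
```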
For '''radio streams''' the filename option is unneeded, but the tag option should be kept.
So, in short:
|| Type || Default Encoding || Overridable? ||
|| audio filenames || UTF-8 || per file, per directory(?) ||
|| coverart filenames || UTF-8 || per file, per directory(?) ||
|| mp3 id3v1 || ISO-8859-1 || per track, per directory, global for tag type(?) ||
|| mp3 id3v1.1 || ISO-8859-1 || per track, per directory, global for tag type(?) ||
|| mp3 id3v2 || ISO-8859-1 || per track, per directory, global for tag type(?) ||
|| mp3 id3v2.3 || ISO-8859-1 || per track, per directory, global for tag type(?) ||
|| mp3 id3v2.4 || UTF-8 || per track, per directory, global for tag type(?) ||
|| APE & Vorbis || UTF-8 || per track? ||
|| WMA || Unknown || per track? ||
|| AAC || Unknown || per track? ||
|| Radio Streams || From above tag type|| per track, global for tag type(?) ||
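One way to read the "Overridable?" column is as a lookup order. A per-track setting wins, then a per-directory one, then a global setting for the tag type, and finally the default from the table. The following is a minimal Python sketch under that assumption. The `EncodingSettings` class, its fields, and the tag-type keys are all hypothetical. `None` marks the formats whose default the table lists as unknown.

```python
import os
from typing import Dict, Optional

# Defaults from the table; WMA and AAC are deliberately absent ("Unknown").
DEFAULT_ENCODINGS: Dict[str, str] = {
    "filename": "utf-8",
    "coverart-filename": "utf-8",
    "id3v1": "latin-1",
    "id3v1.1": "latin-1",
    "id3v2": "latin-1",
    "id3v2.3": "latin-1",
    "id3v2.4": "utf-8",
    "ape": "utf-8",
    "vorbis": "utf-8",
}


class EncodingSettings:
    """Hypothetical user overrides, checked from most to least specific."""

    def __init__(self) -> None:
        self.per_track: Dict[str, str] = {}      # track path -> encoding
        self.per_directory: Dict[str, str] = {}  # directory -> encoding
        self.per_tag_type: Dict[str, str] = {}   # tag type -> encoding

    def resolve(self, track_path: str, tag_type: str) -> Optional[str]:
        if track_path in self.per_track:
            return self.per_track[track_path]
        directory = os.path.dirname(track_path)  # immediate parent only
        if directory in self.per_directory:
            return self.per_directory[directory]
        if tag_type in self.per_tag_type:
            return self.per_tag_type[tag_type]
        return DEFAULT_ENCODINGS.get(tag_type)  # None means "unknown"


settings = EncodingSettings()
settings.per_directory["/music/russian"] = "cp1251"
print(settings.resolve("/music/russian/kino.mp3", "id3v1"))  # cp1251
print(settings.resolve("/music/other/song.mp3", "id3v1"))    # latin-1
```

Keying the per-track setting by path alone means that one override covers every tag in the file. A track carrying both id3v1 and id3v2 tags in different encodings would need a key that also includes the tag type. A radio stream would call `resolve` with the tag type its metadata uses, matching the last row of the table.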
Yes, this is a pain. But either you have to make it easy for users to use their existing data, or make it easy to change it. I hope it's not too discouraging!
This ticket was migrated from the old Trac: re #293
Changed in exaile:
importance: Undecided → High
status: New → Confirmed
description: updated
@Adam
Thanks for the good, detailed report.
I also hope to fix this problem.
Maybe https://bugs.launchpad.net/exaile/+bug/135950 makes the same suggestion.