On Windows, file names are encoded in a locale-specific legacy encoding for the CJKV and Thai locales, but a ZIP archive does not record which encoding was used for its file names. When such an archive is extracted on a system that uses a different encoding (e.g. UTF-8 on Linux), the file names come out garbled, and the unzip command replaces the characters it cannot represent with question marks. In practice no algorithm can detect the encoding reliably, and file names are too short to give a detector enough data, so detection is even less reliable here than it is for web pages in a browser.
The upstream solution to this problem is documented in bug #580961, but it is not something ordinary users can apply directly. This patch therefore adds a -O switch that lets the user specify the file name encoding of archives created on Windows, as a distribution-specific locale hack.
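
The failure described above can be reproduced outside of unzip. The following is a minimal sketch in Python, not the patch itself: the function names and the sample file name are hypothetical, and GBK is assumed as the Windows-side encoding purely for illustration. It shows that the same bytes decode to garbage under the wrong encoding and are recovered once the original encoding is named explicitly.

```python
# Illustrative sketch only: the helper names and the sample file name are
# hypothetical, and GBK stands in for "whatever legacy encoding Windows used".

def misdecode(name: str, real_encoding: str, assumed_encoding: str) -> str:
    """Encode a name as the archive creator would, then decode it wrongly."""
    raw = name.encode(real_encoding)           # bytes as stored in the archive
    return raw.decode(assumed_encoding, errors="replace")


def recover(garbled: str, assumed_encoding: str, real_encoding: str) -> str:
    """Undo a wrong decoding by naming the real encoding explicitly.

    This only works when the wrong decoding was lossless (for example CP437,
    which maps every byte to a character), not after bytes were replaced.
    """
    return garbled.encode(assumed_encoding).decode(real_encoding)


original = "中文.txt"

# Decoded as CP437: every byte maps to some character, so the result is
# garbage but no information is lost.
garbled = misdecode(original, "gbk", "cp437")

# Decoded as UTF-8: the bytes are invalid, so they become U+FFFD
# replacement characters and the original name can no longer be recovered.
lossy = misdecode(original, "gbk", "utf-8")

# Telling the decoder the real encoding restores the name.
restored = recover(garbled, "cp437", "gbk")

print(repr(garbled))
print(repr(lossy))
print(restored)
```

The recovery step only succeeds because the encoding is supplied by the user rather than guessed, which is the role the -O switch plays. With the patched unzip, the analogous invocation would presumably be something like unzip -O CP936 archive.zip; which charset names are accepted is an assumption here and depends on the local conversion library.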