[Bug 203609] Re: unzip should use encoding according to locale, not utf-8
Unxed
203609 at bugs.launchpad.net
Thu Jul 16 10:16:08 UTC 2020
*** This bug is a duplicate of bug 580961 ***
https://bugs.launchpad.net/bugs/580961
I wrote a patch for unzip that fixes this issue:
https://sourceforge.net/p/infozip/patches/29/
The same patch for p7zip:
https://sourceforge.net/p/p7zip/bugs/187/
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to unzip in Ubuntu.
https://bugs.launchpad.net/bugs/203609
Title:
unzip should use encoding according to locale, not utf-8
Status in unzip package in Ubuntu:
Confirmed
Status in unzip package in Debian:
Confirmed
Bug description:
ZIP files do not include information on the encoding of their
filenames, so most ZIP archivers use the native (system) encoding.
This is why ZIP files created on Windows often cannot be extracted
correctly on Linux. For example, the Korean version of Windows uses
the 'cp949' (extended euc-kr) encoding to zip and unzip files, the
Japanese version of Windows uses 'shift-jis', and so on.
Recently, encoding selection options were added to unzip. Two of them
can be set through environment variables:
    export UNZIP='-O cp949'
    export ZIPINFO='-O cp949'
These settings make unzip use cp949 instead of utf-8, the native Linux
encoding, which improves compatibility with archives created on Windows.
So I propose that Ubuntu apply the settings above according to the
system locale. If the system uses ko_KR.UTF-8, cp949 should be
selected; for ja_JP.UTF-8, shift-jis should be used. zh_CN and other
locales can be configured in the same way.
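The proposal above could be sketched as a small shell fragment, for example one sourced from a login profile. This is only an illustration: the function name legacy_zip_encoding is hypothetical, the table covers only the two locales the report names, and it assumes an unzip build that accepts the -O option described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of unzip): map a locale name to the
# legacy Windows encoding named in the bug report. Only the two
# locales given there are listed; zh_CN and others would be added
# here once their encodings are agreed on.
legacy_zip_encoding() {
    case "$1" in
        ko_KR*) echo "cp949" ;;
        ja_JP*) echo "shift-jis" ;;
        *)      ;;  # unknown locale: print nothing, keep unzip's default
    esac
}

# Effective locale, following the usual precedence:
# LC_ALL, then LC_CTYPE, then LANG.
current_locale=${LC_ALL:-${LC_CTYPE:-${LANG:-}}}
encoding=$(legacy_zip_encoding "$current_locale")

# Export the defaults only when a mapping exists.
# Assumes an unzip build that understands -O.
if [ -n "$encoding" ]; then
    export UNZIP="-O $encoding"
    export ZIPINFO="-O $encoding"
fi
```

With LANG=ko_KR.UTF-8 this exports the same two variables shown above; for a locale that is not in the table it leaves the environment untouched, so unzip keeps its default behaviour.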
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/unzip/+bug/203609/+subscriptions