[Bug 203609] Re: unzip should use encoding according to locale, not utf-8

Thu Jul 16 10:16:08 UTC 2020

*** This bug is a duplicate of bug 580961 ***
    https://bugs.launchpad.net/bugs/580961

I wrote a patch for unzip that fixes this issue:
https://sourceforge.net/p/infozip/patches/29/

The same patch for p7zip:
https://sourceforge.net/p/p7zip/bugs/187/

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to unzip in Ubuntu.
https://bugs.launchpad.net/bugs/203609

Title:
  unzip should use encoding according to locale, not utf-8

Status in unzip package in Ubuntu:
  Confirmed
Status in unzip package in Debian:
  Confirmed

Bug description:
  ZIP files don't include information about the encoding of their
  filenames, so most ZIP archivers use the native (system) encoding.
  This is why ZIP files archived on Windows can't be unarchived
  correctly on Linux. For example, the Korean version of Windows uses
  the 'cp949' (extended euc-kr) encoding to zip and unzip files. The
  Japanese version of Windows uses 'shift-jis', and so on.

  Recently, encoding selection options were added to unzip. Two of them
  can be controlled by environment variables.

  export UNZIP='-O cp949'
  export ZIPINFO='-O cp949'

  These settings make unzip use cp949 instead of UTF-8, the native Linux
  encoding, which improves compatibility with Windows.

  So I propose that Ubuntu apply the settings above according to its
  locale. If the system uses ko_KR.UTF-8, cp949 should be selected.
  For ja_JP.UTF-8, shift-jis should be used. zh_CN and other locales
  can be configured in the same way.
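
The proposal could be sketched as a small shell snippet, for instance in a profile script. This is only an illustration, not the packaged behaviour: the helper name legacy_zip_encoding is hypothetical, the cp949 and shift-jis names are taken from the description above, and the cp936 entry for zh_CN is an assumption. It also assumes an unzip build that accepts the -O option.

```shell
# Hypothetical helper: map a locale name to the legacy Windows encoding
# that ZIP files from that region are likely to use.
legacy_zip_encoding() {
    case "$1" in
        ko_KR*) echo cp949 ;;
        ja_JP*) echo shift-jis ;;
        zh_CN*) echo cp936 ;;    # assumption, not named in the bug report
        *)      return 1 ;;      # unknown locale: leave unzip defaults alone
    esac
}

# Effective locale: LC_ALL overrides LC_CTYPE, which overrides LANG.
current_locale=${LC_ALL:-${LC_CTYPE:-${LANG:-}}}

# Only export the variables when the locale has a known mapping.
if enc=$(legacy_zip_encoding "$current_locale"); then
    export UNZIP="-O $enc"
    export ZIPINFO="-O $enc"
fi
```

With LANG=ko_KR.UTF-8 this would set the same UNZIP and ZIPINFO values as the manual export lines above. For locales without a mapping, nothing is exported.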
