[Bug 1422290] Re: Default charsets handling for Windows archives in CJKV+th locale

Nobuto Murata nobuto at nobuto-murata.org
Wed Jul 8 06:41:15 UTC 2015


I have sent an enhancement request to upstream through
http://www.info-zip.org/zip-bug.html since the issue is still
reproducible with 6.1c19-BETA, which you can try from:
https://launchpad.net/~nobuto/+archive/ubuntu/build-test/+build/7630500

Putting a copy of the request here for your reference.

====

This is an enhancement request. Thanks to ICONV_MAPPING (the -O/-I
options), we can specify the character encoding when extracting zip
files. However, when unzip is run by a GUI application (e.g. file-roller
on Linux), the user has no way to pass -I/-O. Therefore we cannot
properly extract zip files created on a localized Windows system from a
GUI.

A workaround would be exporting UNZIP and ZIPINFO variables with "-O
<local charset on Windows>" per locale on login by putting [1] under
/etc/profile.d/.

[1] http://bazaar.launchpad.net/~nobuto/ubuntu/vivid/unzip/fallback-encoding/view/head:/debian/unzip-default-charset.sh
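
For illustration, the workaround could be sketched roughly as follows.
This is not a copy of [1]: the function name fallback_charset and the
exact locale patterns are assumptions made for this example, and the
code pages are the ones listed in the bug description below. It assumes
a POSIX shell and an unzip built with -O support.

```shell
# Hypothetical sketch of a /etc/profile.d/ snippet: pick a fallback
# Windows code page from the login locale and pass it to unzip/zipinfo
# through their environment variables.

# Print the code page for a locale name, or nothing if no fallback applies.
fallback_charset() {
    case "$1" in
        ja_JP*) echo CP932 ;;   # Japanese
        ko_KR*) echo CP949 ;;   # Korean
        zh_CN*) echo CP936 ;;   # Chinese, simplified
        zh_TW*) echo CP950 ;;   # Chinese, traditional
        th_TH*) echo CP874 ;;   # Thai
        vi_VN*) echo CP1258 ;;  # Vietnamese
    esac
}

# Effective locale for character handling: LC_ALL, then LC_CTYPE, then LANG.
charset=$(fallback_charset "${LC_ALL:-${LC_CTYPE:-${LANG:-}}}")

# Locale guard: do nothing for other locales, and never override a value
# the user has already set.
if [ -n "$charset" ]; then
    [ -n "${UNZIP:-}" ]   || export UNZIP="-O $charset"
    [ -n "${ZIPINFO:-}" ] || export ZIPINFO="-O $charset"
fi
unset charset
```

With something like this in place, a GUI started from a ja_JP login
session would inherit UNZIP="-O CP932", while users in other locales
would see no change.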

It would be nice if unzip had a fallback charset mapping per locale out
of the box. I have created a test case that handles three types of zip
files in the ja_JP locale.

[2] http://bazaar.launchpad.net/~nobuto/ubuntu/vivid/unzip/fallback-encoding/view/head:/debian/tests/fallback-encoding
(Without [1], the third test case, FAT and CP932, will fail.)

$ unzip -v 
UnZip 6.1c19-BETA (2015-04-15) by Info-ZIP.  Maintainer: Steven M. Schweda
 Copyright (c) 1990-2015 Info-ZIP.  For software license: unzip --license
 See README for details.  More info: http://info-zip.org/UnZip.html

Compiled with GCC 4.9.2 for Unix (GNU/Linux x86_64).

UnZip special compilation options:
        ARCHIVE_STDIN        (Allow streaming archive from stdin)
        ICONV_MAPPING        (ISO/OEM (iconv, -I/-O) conversion supported)
        IZ_HAVE_UXUIDGID     (UID, GID > 16-bit ("ux" extra block) supported)
        SET_DIR_ATTRIB       (Setting directory attributes supported)
        SYMLINKS             (Symbolic links supported, if RTL and file sys do)
        TIMESTAMP            (Restoring file timestamps supported)
        UNIXBACKUP           (-B creates backup files)
        USE_EF_UT_TIME       (Use Universal Time, if available)
        UNSHRINK_SUPPORT     (PKZIP/Zip 1.x Shrink compression)
        DEFLATE64_SUPPORT    (PKZIP 4.x Deflate64(tm) compression)
        UNICODE_SUPPORT [wide-chars, char coding: UTF-8] (handle UTF-8 paths)
        MBCS-support         (Multibyte character support, MB_CUR_MAX = 6)
        LARGE_FILE_SUPPORT   (Large files over 2 GiB supported)
        ZIP64_SUPPORT        (Archives using Zip64 for large files supported)
        BZIP2_SUPPORT        (PKZIP 4.6+, bzip2 lib ver 1.0.6, 6-Sept-2010)
        LZMA_SUPPORT         (PKZIP 6.3+, LZMA compression, ver 9.20)
        PPMD_SUPPORT         (PKZIP 6.3+, PPMd compression, ver 9.20)
        VMS_TEXT_CONV        (Conversion of VMS var-len rec fmt text supported)
        IZ_CRYPT_TRAD        (Traditional (weak) encryption, ver 3.0)

Traditional Zip Encryption notice:
        The traditional zip encryption code of this program is not
        copyrighted, and is put in the public domain.  It was originally
        written in Europe, and, to the best of our knowledge, can be freely
        distributed in both source and object forms from any country,
        including the USA under License Exception TSU of the U.S. Export
        Administration Regulations (section 740.13(e)) of 6 June 2002.

UnZip and ZipInfo environment options:
           UNZIP:  [none]
        UNZIPOPT:  [none]
         ZIPINFO:  [none]
      ZIPINFOOPT:  [none]

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to unzip in Ubuntu.
https://bugs.launchpad.net/bugs/1422290

Title:
  Default charsets handling for Windows archives in CJKV+th locale

Status in unzip package in Ubuntu:
  Triaged
Status in unzip package in Debian:
  Confirmed

Bug description:
  With the current unzip package in Ubuntu, we need to specify the
  charset explicitly to extract zip files sent from localized Windows
  systems.

  For example, for zip files sent from Japanese localized Windows:
  $ zipinfo -O CP932 sent-from-localized-windows.zip
  $ unzip -O CP932 sent-from-localized-windows.zip

  This method won't work for GUI applications like file-roller, because
  users have no way to specify the charset from the GUI.

  The attached branch adds default charset handling for Windows archives
  in CJKV+th locales, inspired by the Ubuntu Kylin approach.

  As a result of bug #580961, two options have been added as an Ubuntu patch.
  > -O CHARSET specify a character encoding for DOS, Windows and OS/2 archives
  > -I CHARSET specify a character encoding for UNIX and other archives

  Then Ubuntu Kylin added a default encoding via environment variables for their distribution.
  http://bazaar.launchpad.net/~ubuntukylin-members/ubuntukylin-default-settings/trunk/revision/171

  Now, as Ubuntu, we can go further in a better way:
   - per-user settings based on each user's locale instead of global settings
   - no interference with other locales, thanks to a locale guard

  I only add "-O", so there is no behavior change for zip files created
  on Ubuntu or other Linux/UNIX systems. This branch just handles zip
  files created on localized Windows systems seamlessly.

  The charsets list is taken from:
  https://msdn.microsoft.com/en-us/goglobal/bb964654
  and
  msdos/msdos.c in unzip package:
     case 932: /* Japanese */
     case 949: /* Korean */
     case 936: /* Chinese, simple */
     case 950: /* Chinese, traditional */
     case 874: /* Thai */
     case 1258: /* Vietnamese */

  (Copied from @nobuto's branch description.)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/unzip/+bug/1422290/+subscriptions


