[Bug 580961] Re: unzip fails to deal correctly with filename encodings

Vladimir Skvortsov 580961 at bugs.launchpad.net
Sun Feb 10 15:42:46 UTC 2013


Ubuntu 12.10 (UI in US English, UTF-8 locale)

It seems that if you KNOW which platform the zip file came from and
which codepage it used, you can unzip the archive without losing
non-ASCII filenames that are not encoded in UTF-8.

I ran one experiment: unpacking a zip file that was created on Korean
Windows 7 and has Korean characters in both the archive name and the
names of the compressed files.

First, let's get the locale settings:

$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Let's check the version of the unzip utility:

$ unzip --help
UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP.
...
Usage: unzip [-Z] [-opts[modifiers]] file[.zip] [list] [-x xlist] [-d exdir]
Default action is to extract files in list, except those in xlist, to exdir;
file[.zip] may be a wildcard. -Z => ZipInfo mode ("unzip -Z" for usage).
...
-O CHARSET specify a character encoding for DOS, Windows and OS/2 archives
-I CHARSET specify a character encoding for UNIX and other archives

Look at this option in particular:

-O CHARSET  specify a character encoding for DOS, Windows and OS/2 archives

Note that it is not -0 (zero), it is -O (capital letter O)!

In my case Korean Windows uses the EUC-KR codepage. The zip file is
named "2013년 설날".

So my command line looks like this:

$ unzip -O EUC-KR "2013년 설날"

I checked the unpacked files and it works! All file names show the
correct Korean characters, with no strange characters.
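
For anyone who would rather do the same re-decoding from a script, here
is a minimal Python sketch of the idea behind -O. It relies on the
standard zipfile module, which decodes any member name lacking the UTF-8
flag as cp437; recover_name and list_names are hypothetical helper names,
and EUC-KR is the default only because that is what the archive in this
experiment used.

```python
import zipfile

# General-purpose flag bit 11: the member name is stored as UTF-8.
UTF8_FLAG = 0x800


def recover_name(raw_name, flag_bits, charset="euc-kr"):
    """Undo zipfile's cp437 guess and decode the name with `charset`.

    Names flagged as UTF-8 are already correct and are returned as-is.
    """
    if flag_bits & UTF8_FLAG:
        return raw_name
    # Encoding back to cp437 restores the original bytes, because the
    # cp437 codec maps every byte value to a distinct character.
    return raw_name.encode("cp437").decode(charset)


def list_names(path, charset="euc-kr"):
    """Return the member names of the archive at `path`, re-decoded."""
    with zipfile.ZipFile(path) as zf:
        return [recover_name(info.filename, info.flag_bits, charset)
                for info in zf.infolist()]
```

Calling list_names() on the archive path should then return readable
names. As with -O, you still have to know the right charset: a wrong one
either raises UnicodeDecodeError or produces different garbage.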

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to unzip in Ubuntu.
https://bugs.launchpad.net/bugs/580961

Title:
  unzip fails to deal correctly with filename encodings

Status in File Roller:
  Confirmed
Status in One Hundred Paper Cuts:
  Invalid
Status in The Linux Mint Distribution:
  Triaged
Status in Ubuntu Japanese Kaizen Project:
  Fix Committed
Status in unzip - free software .zip unarchiver:
  Unknown
Status in “unzip” package in Ubuntu:
  Triaged
Status in “unzip” source package in Natty:
  Won't Fix
Status in “unzip” package in Debian:
  Confirmed
Status in Gentoo Linux:
  Won't Fix
Status in “unzip” package in Mandriva:
  Unknown
Status in “unzip” package in openSUSE:
  Fix Released

Bug description:
  Binary package hint: unzip

  This is a fairly annoying bug that's been around and known since at
  least 2005.  It's very visible, as it will very often make exchange of
  zip files with Windows users impossible, for example.  As such, it
  gathered its fair share of "me too" and "how dare you haven't fixed
  this yet!!111!" comments.

  Problem description:
  zip/unzip and the specification fall short when dealing with non-ASCII filenames not encoded in UTF-8

  test case:
  do an "unzip -l" on the file http://tinyurl.com/2aofpxs and witness the question marks

  affected programs:
  the problem is in unzip itself, but it affects GUI front ends such as xarchiver, file-roller, etc. that rely on unzip for decompression

  suggested solutions (most are workarounds, not proper fixes):
   a) reintroduce patch for codepage-based zip filenames: bug 477755, http://tinyurl.com/2aqdbqg (Ubuntu blueprint)
   b) unzip filename according to locale: bug 203609
   c) Ubuntu JP has a patch, probably not generally applicable, bug 269482
   d) Russian altlinux distro uses natspec lib and patched zip binary

  natspec was mentioned in bug 477755 comment #2 and may indeed be a
  proper fix, but it needs closer inspection (I haven't really looked
  yet).  As discussed in
  https://bugzilla.gnome.org/show_bug.cgi?id=306403 there is no
  failsafe, straightforward way to fix this in all cases.
  Nonetheless, the current situation can and should be improved.
  There are some good ideas floating around.  It needs somebody to pull
  them together.

  It's unfortunate that the FOSS community so far hasn't been able to
  fix this rather visible problem.  I'm opening this ticket as a master
  bug and clean slate to document the issue and current status.  Please
  don't ruin it by making the above-mentioned unhelpful comments; they
  actually slow things down!  Please don't nominate for a release.

  Unless you're a dev and can provide a patch, you should think VERY
  carefully before doing anything but

  1) subscribe yourself to this ticket
  2) mark this bug as affecting you
  3) tell me via mail about other bugs you think are a duplicate of this one, discussing the same problem

  1) to 3) will showcase to the devs how many people are affected and
  that is the only real chance we have for somebody to take a serious
  look.  "Me too" comments do the opposite, so again, please don't do
  it.




