[Bug 580961] Re: unzip fails to deal correctly with filename encodings
Vladimir Skvortsov
580961 at bugs.launchpad.net
Sun Feb 10 15:42:46 UTC 2013
Ubuntu 12.10 (UI with US English-UTF-8 codepage)
It seems that if you KNOW which software platform and codepage a zip
file comes from, you can unzip the archive without losing non-ASCII
filenames that are not encoded in UTF-8.
I ran one experiment: unpacking a zip file that was created on Korean
Windows 7 and contains Korean characters in both the archive name and
the names of the compressed files.
First, let's get the locale-specific info:
$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
Let's check the version of the unzip utility:
$ unzip --help
UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP.
...
Usage: unzip [-Z] [-opts[modifiers]] file[.zip] [list] [-x xlist] [-d exdir]
Default action is to extract files in list, except those in xlist, to exdir;
file[.zip] may be a wildcard. -Z => ZipInfo mode ("unzip -Z" for usage).
...
-O CHARSET specify a character encoding for DOS, Windows and OS/2 archives
-I CHARSET specify a character encoding for UNIX and other archives
Note the following option:
-O CHARSET specify a character encoding for DOS, Windows and OS/2
archives
It is not -0 (zero), it is -O (capital letter O)!
In my case, Korean Windows uses the EUC-KR codepage. The zip file is
named "2013년 설날".
So my command line looks like this:
$ unzip -O EUC-KR "2013년 설날"
I checked the unpacked files and it works! All file names show the
correct Korean characters, with no strange characters.
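Where an unzip build with -O is not at hand, the same idea can be
sketched in Python. This is a minimal sketch, not part of the original
experiment. It assumes that Python's zipfile module decodes member names
lacking the UTF-8 flag as CP437, and that you already know the real
codepage (EUC-KR here). The helper names recover_name and list_names are
hypothetical.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: name is stored as UTF-8


def recover_name(info, encoding):
    """Return the member name decoded with the archive's real codepage."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename  # zipfile already decoded it as UTF-8
    # zipfile decoded the raw bytes as CP437; undo that step, then decode
    # the original bytes with the codepage the archive was created in.
    return info.filename.encode("cp437").decode(encoding)


def list_names(archive, encoding="euc-kr"):
    """List member names of a zip file (path or file object)."""
    with zipfile.ZipFile(archive) as zf:
        return [recover_name(info, encoding) for info in zf.infolist()]
```

A call such as list_names("archive.zip", "euc-kr") (the path is only a
placeholder) would then show the names as the sender saw them. Passing
the wrong codepage either raises UnicodeDecodeError or yields different
garbage, so the codepage still has to be known in advance.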
--
https://bugs.launchpad.net/bugs/580961
Title:
  unzip fails to deal correctly with filename encodings
Status in File Roller:
  Confirmed
Status in One Hundred Paper Cuts:
  Invalid
Status in The Linux Mint Distribution:
  Triaged
Status in Ubuntu Japanese Kaizen Project:
  Fix Committed
Status in unzip - free software .zip unarchiver:
  Unknown
Status in “unzip” package in Ubuntu:
  Triaged
Status in “unzip” source package in Natty:
  Won't Fix
Status in “unzip” package in Debian:
  Confirmed
Status in Gentoo Linux:
  Won't Fix
Status in “unzip” package in Mandriva:
  Unknown
Status in “unzip” package in openSUSE:
  Fix Released
Bug description:
Binary package hint: unzip
This is a fairly annoying bug that has been around and known since at
least 2005. It's very visible, as it often makes exchanging zip files
with Windows users impossible, for example. As such, it has gathered
its fair share of "me too" and "how dare you haven't fixed
this yet!!111!" comments.
Problem description:
zip/unzip and the specification fall short when dealing with non-ASCII filenames not encoded in UTF-8
test case:
do an "unzip -l" on the file http://tinyurl.com/2aofpxs and witness the question marks
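A quick way to check whether a given archive is affected is to look for
member names that contain non-ASCII bytes but do not carry the UTF-8
flag. The following is a minimal sketch under the assumption that
Python's zipfile module decodes unflagged names as CP437; the function
name legacy_encoded_members is hypothetical.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11 ("language encoding flag")


def legacy_encoded_members(archive):
    """Return names that have high bytes but lack the UTF-8 flag."""
    affected = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            has_utf8_flag = bool(info.flag_bits & UTF8_FLAG)
            # CP437 maps bytes below 0x80 to plain ASCII, so a non-ASCII
            # character here means the stored name had a high byte whose
            # real encoding the archive does not declare.
            if not has_utf8_flag and not info.filename.isascii():
                affected.append(info.filename)
    return affected
```

An empty result means every name is either pure ASCII or flagged as
UTF-8. Any name it returns is shown as CP437 mojibake, and its real
codepage still has to be guessed or known from the sender.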
affected programs:
the problem is in unzip itself, but it affects GUIs like xarchiver, file-roller, etc. that rely on unzip for decompression
suggested solutions (most are workarounds, not proper fixes):
a) reintroduce patch for codepage-based zip filenames: bug 477755, http://tinyurl.com/2aqdbqg (Ubuntu blueprint)
b) unzip filename according to locale: bug 203609
c) Ubuntu JP has a patch, probably not generally applicable, bug 269482
d) Russian altlinux distro uses natspec lib and patched zip binary
natspec was mentioned in bug 477755 comment #2 and may indeed be a
proper fix, but it needs closer inspection (I haven't really looked
yet). As discussed in https://bugzilla.gnome.org/show_bug.cgi?id=306403
there is no failsafe, straightforward way to fix this in all cases.
Nonetheless, the current situation can and should be improved.
There are some good ideas floating around. Somebody needs to pull
them together and wrap them up.
It's unfortunate the FOSS community so far hasn't been able to fix
this rather visible problem. I'm opening this ticket as a master bug
and clean slate to document the issue and current status. Please
don't ruin it by making the above-mentioned unhelpful comments; they
actually slow things down! Please don't nominate for a release.
Unless you're a dev and can provide a patch, you should think VERY
carefully before doing anything but
1) subscribe yourself to this ticket
2) mark this bug as affecting you
3) tell me via mail about other bugs you think are a duplicate of this one, discussing the same problem
1) to 3) will show the devs how many people are affected, and that is
the only real chance we have for somebody to take a serious look.
"Me too" comments do the opposite, so again, please don't post them.