[Bug 10979] Re: unzip does not support UTF-8 filenames
Vladimir Skvortsov
10979 at bugs.launchpad.net
Sun Feb 10 15:51:47 UTC 2013
*** This bug is a duplicate of bug 580961 ***
https://bugs.launchpad.net/bugs/580961
Tested on Ubuntu 12.10 (UI locale: US English, UTF-8).
It seems that if you KNOW which software platform the zip file comes from
and which codepage it used, you can unzip the archive without losing
non-ASCII filenames that are not encoded in UTF-8.
I ran one experiment: unpacking a zip file that was created on Korean
Windows 7 and contains Korean characters in both the archive name and the
names of the compressed files.
First, let's get the locale-specific info:
$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
Let's check the version of the unzip utility:
$ unzip --help
UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP.
...
Usage: unzip [-Z] [-opts[modifiers]] file[.zip] [list] [-x xlist] [-d exdir]
Default action is to extract files in list, except those in xlist, to exdir;
file[.zip] may be a wildcard. -Z => ZipInfo mode ("unzip -Z" for usage).
...
-O CHARSET specify a character encoding for DOS, Windows and OS/2 archives
-I CHARSET specify a character encoding for UNIX and other archives
Note the following option:
-O CHARSET specify a character encoding for DOS, Windows and OS/2 archives
It is not -0 (zero); it is -O (capital letter O)!
In my case, Korean Windows uses the EUC-KR codepage, and the zip file is
named "2013년 설날".
So my command line looks like this:
$ unzip -O EUC-KR "2013년 설날"
I checked the unpacked files, and it works! All filenames show the correct
Korean characters, with no strange characters.
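For anyone without the patched unzip, the same idea (decode the stored
names with the codepage you know the archive came from) can be sketched in
Python. This is only an illustration under stated assumptions, not what
unzip -O does internally: the helper names recover_name and
extract_with_codepage are hypothetical, and the sketch relies on Python's
zipfile module decoding names as CP437 by default when the archive's UTF-8
flag is not set.

```python
import zipfile
from pathlib import Path


def recover_name(mangled, source_encoding="euc-kr"):
    """Undo zipfile's CP437 fallback and decode with the real codepage.

    Hypothetical helper: `mangled` is a name as returned by zipfile for an
    entry whose UTF-8 flag is not set.
    """
    return mangled.encode("cp437").decode(source_encoding)


def extract_with_codepage(archive, dest, source_encoding="euc-kr"):
    """Extract `archive` into `dest`, fixing names that lack the UTF-8 flag.

    Assumption: the archive is trusted. No path sanitization is done here.
    """
    extracted = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            name = info.filename
            # Bit 11 (0x800) of the general-purpose flags marks a UTF-8 name;
            # without it, zipfile has decoded the raw bytes as CP437.
            if not info.flag_bits & 0x800:
                name = recover_name(name, source_encoding)
            target = Path(dest) / name
            if info.is_dir():
                target.mkdir(parents=True, exist_ok=True)
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as dst:
                dst.write(src.read())  # reads the whole entry; fine for small files
            extracted.append(name)
    return extracted
```

With this sketch, extract_with_codepage("2013년 설날.zip", "out", "euc-kr")
would play the role of the unzip -O EUC-KR command above. As with -O, a
wrong guess for the codepage either fails to decode or produces different
garbage, so you still need to know where the archive came from.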
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to unzip in Ubuntu.
https://bugs.launchpad.net/bugs/10979
Title:
unzip does not support UTF-8 filenames
Status in File Roller:
New
Status in “unzip” package in Ubuntu:
Confirmed
Status in “unzip” package in Debian:
Confirmed
Status in “unzip” package in Gentoo Linux:
Fix Released
Status in “unzip” package in Mandriva:
Confirmed
Bug description:
When unzip extracts a file, it handles the filename as 7-bit,
so filenames with non-Latin-1 characters are broken.
I described this in Gentoo Bugzilla #69945 and reported it via the zip-bug form.
http://bugs.gentoo.org/show_bug.cgi?id=69945
To manage notifications about this bug go to:
https://bugs.launchpad.net/file-roller/+bug/10979/+subscriptions