[Bug 1947451] [NEW] unzip, manpage doesn't describe -I -O switches

Sat Oct 16 12:41:20 UTC 2021

Public bug reported:

unzip 6.0-21ubuntu1.1
Linux Mint 19.3 (Ubuntu 18.04)

ZIP files don't store information about the character set used to encode
filenames in the archive. When a user extracts a ZIP file created on
Windows with unzip on Linux, filenames can be corrupted if they contain
characters from the extended ASCII table...

unzip can handle this: it has the switches "-I" and "-O" to specify the
encoding of filenames in the archive.

$ unzip -h | grep CHARSET
  -O CHARSET  specify a character encoding for DOS, Windows and OS/2 archives
  -I CHARSET  specify a character encoding for UNIX and other archives

Information about these switches is missing from the manual page (man
unzip).

This is how to list, on Ubuntu, the files in a ZIP file created on a
Czech edition of Windows:

$ unzip -l -O CP852 archive-win.zip

The file archive-win.zip is in format 2.0.
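
As a rough illustration of the corruption described above, here is a
small Python sketch. It is not how unzip is implemented. The filename is
made up (the real archive's names are not shown in this report), and the
wrong code page a reader might assume is taken to be CP437 purely for
demonstration. It shows that the stored bytes are intact and only the
decoding step goes wrong, which is why a charset hint like "-O CP852"
is enough to fix the listing.

```python
# Hypothetical Czech filename, used only for illustration.
name = "účet.txt"

# Bytes as a Czech Windows/DOS tool would plausibly store them
# (assumption: the CP852 code page, as in the unzip example above).
raw = name.encode("cp852")

# A reader that assumes a different code page (CP437 here, as an
# assumption) decodes the same bytes into the wrong characters.
garbled = raw.decode("cp437")

# Going back to the raw bytes and decoding them with the real code
# page recovers the original name.
recovered = garbled.encode("cp437").decode("cp852")

print(repr(garbled))
print(recovered)
```

Nothing is lost in the archive itself; the reader only needs to be told
which code page the names were written in.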

Another problem is that it is not possible to instruct zip to create a
ZIP archive with a specific encoding, so on Linux I cannot create a ZIP
file whose file names are encoded in the CP852 code page. Such an
archive could be opened on Czech Windows without issue...

Yet another problem is that it is not possible to create a ZIP file in
an older format, like 2.0. zip creates archives in format 3.0, which
seems like it could be a problem for Windows users because such a file
uses UTF-8 characters. I am not sure what the problem is here, but there
is a problem... I need a computer with Windows to check the details.

** Affects: unzip (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to unzip in Ubuntu.
https://bugs.launchpad.net/bugs/1947451
