[Bug 2066389] [NEW] unzip uses Russian Cyrillic CP866 as the OEM encoding, even if the Russian locale is not selected in the system

Unxed 2066389 at bugs.launchpad.net
Wed May 22 15:11:28 UTC 2024


Public bug reported:

The built-in .zip archiver in older versions of Windows encoded file
names in new archives using the DOS (OEM) or Windows (ANSI) code page
that matched the current regional settings. Many such archives still
exist.

The problem is that Ubuntu's unzip is stuck with CP866 for such archives.
Have a look at 20-unzip60-alt-iconv-utf8.patch, especially at the mapping
from the system charset to the charset unzip expects to find in the
archive:

+/* A mapping of local <-> archive charsets used by default to convert filenames
+ * of DOS/Windows Zip archives. Currently very basic. */
+static CHARSET_MAP dos_charset_map[] = {
+    { "ANSI_X3.4-1968", "CP850" },
+    { "ISO-8859-1", "CP850" },
+    { "CP1252", "CP850" },
+    { "UTF-8", "CP866" },
+    { "KOI8-R", "CP866" },
+    { "KOI8-U", "CP866" },
+    { "ISO-8859-5", "CP866" }
+};

As you can see, CP866 is selected on every system that uses UTF-8 as its
system charset (almost any modern system), regardless of the configured
language. This is definitely not correct behavior.
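
To make the effect concrete, here is a minimal sketch of a lookup over that
table. The struct field names and the function default_archive_charset() are
assumptions made for illustration; the patch excerpt above shows only the
table, not the code that consults it.

```c
#include <stddef.h>
#include <string.h>

/* Field names are hypothetical; only the table contents come from the patch. */
typedef struct {
    const char *local_charset;    /* charset reported by the system locale */
    const char *archive_charset;  /* charset assumed for names in the archive */
} CHARSET_MAP;

static const CHARSET_MAP dos_charset_map[] = {
    { "ANSI_X3.4-1968", "CP850" },
    { "ISO-8859-1", "CP850" },
    { "CP1252", "CP850" },
    { "UTF-8", "CP866" },
    { "KOI8-R", "CP866" },
    { "KOI8-U", "CP866" },
    { "ISO-8859-5", "CP866" }
};

/* Hypothetical helper: return the archive charset paired with the given
 * local charset, or NULL if the table has no entry for it. */
static const char *default_archive_charset(const char *local_charset)
{
    size_t i;

    if (local_charset == NULL)
        return NULL;
    for (i = 0; i < sizeof(dos_charset_map) / sizeof(dos_charset_map[0]); i++) {
        if (strcmp(local_charset, dos_charset_map[i].local_charset) == 0)
            return dos_charset_map[i].archive_charset;
    }
    return NULL;
}
```

With LANG=en_US.UTF-8, as in the environment reported below, the system
charset is "UTF-8", so a lookup of this kind yields CP866 although nothing in
the locale refers to Russian.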

The correct behavior is to determine the relevant OEM or ANSI code page
from the system locale and use it. See this PR for a reference
implementation:
https://github.com/p7zip-project/p7zip/pull/232
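
As a rough sketch of that idea (not the code from the PR above), the OEM code
page could be chosen by the language and territory part of the locale name
instead of by its charset. The type LOCALE_OEM_MAP, the function
oem_charset_for_locale() and the short table are assumptions for
illustration; a real table would need many more entries and a matching ANSI
column.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping from a locale name prefix to the OEM code page
 * that Windows uses by default for that language or region. */
typedef struct {
    const char *locale_prefix;
    const char *oem_charset;
} LOCALE_OEM_MAP;

static const LOCALE_OEM_MAP locale_oem_map[] = {
    { "en_US", "CP437" },
    { "de",    "CP850" },
    { "fr",    "CP850" },
    { "pl",    "CP852" },
    { "ru",    "CP866" },
    { "ja",    "CP932" }
};

/* Return the OEM charset for a locale name such as "ru_RU.UTF-8",
 * or NULL if the locale is not covered by the table. */
static const char *oem_charset_for_locale(const char *locale_name)
{
    size_t i;

    if (locale_name == NULL)
        return NULL;
    for (i = 0; i < sizeof(locale_oem_map) / sizeof(locale_oem_map[0]); i++) {
        size_t n = strlen(locale_oem_map[i].locale_prefix);

        if (strncmp(locale_name, locale_oem_map[i].locale_prefix, n) == 0) {
            /* Accept the prefix only at a component boundary, so that
             * "ru" matches "ru_RU.UTF-8" but not a longer language code. */
            char next = locale_name[n];

            if (next == '\0' || next == '_' || next == '.' || next == '@')
                return locale_oem_map[i].oem_charset;
        }
    }
    return NULL;
}
```

The locale name could come from setlocale(LC_CTYPE, NULL) after the program
has called setlocale(LC_ALL, ""). With this table "en_US.UTF-8" maps to CP437
and "ru_RU.UTF-8" to CP866, so CP866 is used only when a Russian locale is
actually selected.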

Upstream issue:
https://sourceforge.net/p/infozip/bugs/43/#951c

ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: unzip 6.0-28ubuntu4
ProcVersionSignature: User Name 6.8.0-31.31-generic 6.8.1
Uname: Linux 6.8.0-31-generic x86_64
ApportVersion: 2.28.1-0ubuntu2
Architecture: amd64
CasperMD5CheckMismatches: ./boot/grub/grub.cfg
CasperMD5CheckResult: fail
CurrentDesktop: ubuntu:GNOME
Date: Wed May 22 11:05:59 2024
InstallationDate: Installed on 2024-04-29 (23 days ago)
InstallationMedia: Ubuntu 24.04 LTS "Noble Numbat" - Release amd64 (20240424)
ProcEnviron:
 LANG=en_US.UTF-8
 PATH=(custom, no user)
 SHELL=/bin/bash
 TERM=xterm-256color
 XDG_RUNTIME_DIR=<set>
SourcePackage: unzip
UpgradeStatus: No upgrade log present (probably fresh install)

** Affects: unzip (Ubuntu)
     Importance: Undecided
         Status: Confirmed


** Tags: amd64 apport-bug noble wayland-session

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to unzip in Ubuntu.
https://bugs.launchpad.net/bugs/2066389
