[Bug 2066389] Re: unzip uses Russian Cyrillic CP866 as the OEM encoding, even if the Russian locale is not selected in the system
Launchpad Bug Tracker
2066389 at bugs.launchpad.net
Fri Jun 14 15:29:42 UTC 2024
This bug was fixed in the package unzip - 6.0-28ubuntu5
---------------
unzip (6.0-28ubuntu5) oracular; urgency=medium
  [ Ivan Sorokin ]
  * Add 30-fix-code-pages.patch with the following fixes (LP: #2066389):
    - Fixed bit 11 of General purpose flag support on systems with UTF-8
      system charset.
    - Fixed OEM code page being always assumed Russian/Cyrillic CP866 on
      any UTF-8 system.
    - Added proper OEM code page detection based on system locale setting.
    - Removed translation from ISO 8859-1 to local charset; assumption that
      any non-unicode archive uses it is for sure wrong as it can be any
      charset used on archive creator's local system; also do not treat
      PKZIP for UNIX 2.51 archives as having ISO 8859-1 charset for the
      same reasons.
    - Enabled UTF-8 output by default on Unix systems.
  [ Dmitry Shachnev ]
  * Add tests for unicode file names in different encodings.
-- Dmitry Shachnev <mitya57 at ubuntu.com> Tue, 11 Jun 2024 21:48:13 +0300
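For readers unfamiliar with the first changelog item: bit 11 of the general purpose bit flag is the "language encoding flag" in PKWARE's APPNOTE, and when it is set the entry's file name and comment are stored as UTF-8, so no OEM code page conversion should be applied. The following is only a minimal sketch of that check, not code from the patch; the helper names are hypothetical, and the flag offset (bytes 6 and 7, little-endian) refers to the standard local file header layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 11 of the general purpose bit flag: names and comments are UTF-8. */
#define ZIP_GPFLAG_UTF8 (1u << 11)

/* Hypothetical helper (not part of unzip): reads the 16-bit little-endian
 * general purpose bit flag from a local file header. The header starts with
 * a 4-byte signature and a 2-byte "version needed" field, so the flag
 * occupies bytes 6 and 7. The caller must supply at least 8 bytes. */
uint16_t zip_local_header_flags(const unsigned char *hdr)
{
    assert(hdr != NULL);
    return (uint16_t)(hdr[6] | (hdr[7] << 8));
}

/* Hypothetical helper: true when the entry name is already UTF-8 and
 * must not be converted from an OEM code page. */
bool zip_name_is_utf8(uint16_t general_purpose_flag)
{
    return (general_purpose_flag & ZIP_GPFLAG_UTF8) != 0;
}
```

Only when zip_name_is_utf8() returns false does the question of which OEM code page to assume arise at all.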
** Changed in: unzip (Ubuntu)
Status: Confirmed => Fix Released
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to unzip in Ubuntu.
https://bugs.launchpad.net/bugs/2066389
Title:
unzip uses Russian Cyrillic CP866 as the OEM encoding, even if the
Russian locale is not selected in the system
Status in unzip package in Ubuntu:
Fix Released
Bug description:
The built-in .zip archiver in Windows encodes file names in new archives
using the DOS (OEM) code page that corresponds to the current regional
settings. Lots of such archives exist.
The problem is that Ubuntu's unzip is stuck with CP866 for such archives. Have a look at
20-unzip60-alt-iconv-utf8.patch,
especially the mapping from the system charset to the charset unzip expects to find in the archive:
+/* A mapping of local <-> archive charsets used by default to convert filenames
+ * of DOS/Windows Zip archives. Currently very basic. */
+static CHARSET_MAP dos_charset_map[] = {
+    { "ANSI_X3.4-1968", "CP850" },
+    { "ISO-8859-1", "CP850" },
+    { "CP1252", "CP850" },
+    { "UTF-8", "CP866" },
+    { "KOI8-R", "CP866" },
+    { "KOI8-U", "CP866" },
+    { "ISO-8859-5", "CP866" }
+};
As you can see, CP866 is selected on every system that has UTF-8 as its
system charset (almost any modern system), regardless of the configured
language. This is definitely not correct behavior.
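To make the effect of that table concrete, here is a small standalone sketch of a lookup over it. The CHARSET_MAP definition is not quoted in this report, so the struct layout and field names below are assumptions, and assumed_archive_charset() is a hypothetical helper rather than a function from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout; the patch's real CHARSET_MAP definition is not shown
 * in this report, so these field names are illustrative only. */
typedef struct {
    const char *local_charset;
    const char *archive_charset;
} CHARSET_MAP;

/* Entries copied from the patch excerpt above. */
static const CHARSET_MAP dos_charset_map[] = {
    { "ANSI_X3.4-1968", "CP850" },
    { "ISO-8859-1", "CP850" },
    { "CP1252", "CP850" },
    { "UTF-8", "CP866" },
    { "KOI8-R", "CP866" },
    { "KOI8-U", "CP866" },
    { "ISO-8859-5", "CP866" }
};

/* Hypothetical helper: returns the archive charset the table assumes for
 * a given system charset, or NULL when the table has no entry. */
const char *assumed_archive_charset(const char *local_charset)
{
    size_t i;

    assert(local_charset != NULL);
    for (i = 0; i < sizeof dos_charset_map / sizeof dos_charset_map[0]; i++) {
        if (strcmp(dos_charset_map[i].local_charset, local_charset) == 0)
            return dos_charset_map[i].archive_charset;
    }
    return NULL;
}
```

The lookup key is only the charset, so en_US.UTF-8, de_DE.UTF-8 and ru_RU.UTF-8 all resolve to the same "UTF-8" entry and therefore to CP866.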
The correct behavior is to determine the relevant OEM or ANSI code page from the system locale and use it. See this PR for a reference implementation:
https://github.com/p7zip-project/p7zip/pull/232
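As a rough illustration of the idea (not the code from that PR or from the eventual unzip fix), the locale name can be matched against a table keyed by language, optionally with territory, instead of by charset. Everything below is a sketch: OEM_MAP and oem_codepage_for_locale() are hypothetical names, and the table is deliberately abbreviated to a few languages whose default Windows OEM code pages are well known.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *locale_prefix;  /* "language" or "language_TERRITORY" */
    const char *oem_codepage;
} OEM_MAP;

/* Abbreviated, illustrative table. More specific prefixes must come
 * before shorter ones that they start with (en_US before en). */
static const OEM_MAP oem_map[] = {
    { "en_US", "CP437" },
    { "en",    "CP850" },
    { "de",    "CP850" },
    { "fr",    "CP850" },
    { "pl",    "CP852" },
    { "cs",    "CP852" },
    { "el",    "CP737" },
    { "tr",    "CP857" },
    { "ru",    "CP866" },
    { "uk",    "CP866" },
    { "be",    "CP866" }
};

/* Hypothetical helper: maps a locale name such as "ru_RU.UTF-8" to an OEM
 * code page name, or returns NULL when the table has no matching entry. */
const char *oem_codepage_for_locale(const char *locale)
{
    size_t i;

    assert(locale != NULL);
    for (i = 0; i < sizeof oem_map / sizeof oem_map[0]; i++) {
        size_t len = strlen(oem_map[i].locale_prefix);
        char next;

        if (strncmp(locale, oem_map[i].locale_prefix, len) != 0)
            continue;
        /* The prefix must end at a component boundary, so that "be"
         * does not match an unrelated locale such as "ber_DZ". */
        next = locale[len];
        if (next == '\0' || next == '_' || next == '.' || next == '@')
            return oem_map[i].oem_codepage;
    }
    return NULL;
}
```

A caller could pass the name returned by setlocale(LC_CTYPE, NULL) after calling setlocale(LC_CTYPE, ""). With the LANG=en_US.UTF-8 environment from this report, this sketch would pick CP437 rather than CP866; what to do for locales missing from the table is left to the caller.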
Upstream issue:
https://sourceforge.net/p/infozip/bugs/43/#951c
ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: unzip 6.0-28ubuntu4
ProcVersionSignature: Ubuntu 6.8.0-31.31-generic 6.8.1
Uname: Linux 6.8.0-31-generic x86_64
ApportVersion: 2.28.1-0ubuntu2
Architecture: amd64
CasperMD5CheckMismatches: ./boot/grub/grub.cfg
CasperMD5CheckResult: fail
CurrentDesktop: ubuntu:GNOME
Date: Wed May 22 11:05:59 2024
InstallationDate: Installed on 2024-04-29 (23 days ago)
InstallationMedia: Ubuntu 24.04 LTS "Noble Numbat" - Release amd64 (20240424)
ProcEnviron:
LANG=en_US.UTF-8
PATH=(custom, no user)
SHELL=/bin/bash
TERM=xterm-256color
XDG_RUNTIME_DIR=<set>
SourcePackage: unzip
UpgradeStatus: No upgrade log present (probably fresh install)
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/unzip/+bug/2066389/+subscriptions