[Bug 2058775] Re: coreutils: printf formatting bug for nb_NO and nn_NO locales
Thomas Dreibholz
2058775 at bugs.launchpad.net
Fri Mar 22 21:07:27 UTC 2024
In a hexdump, printf appears to emit 3 bytes for each thousands
separator:
#!/bin/sh
for l in de_DE en_US nb_NO nn_NO ; do
    echo "LC_NUMERIC=$l.UTF-8"
    for n in 1 100 1000 10000 100000 1000000 10000000 ; do
        LC_NUMERIC=$l.UTF-8 /usr/bin/printf "<%'8d>" $n | hexdump -C
    done
done
Output:
LC_NUMERIC=nb_NO.UTF-8
00000000  3c 20 20 20 20 20 20 20  31 3e                    |<       1>|
0000000a
00000000  3c 20 20 20 20 20 31 30  30 3e                    |<     100>|
0000000a
00000000  3c 20 31 e2 80 af 30 30  30 3e                    |< 1...000>|
0000000a
00000000  3c 31 30 e2 80 af 30 30  30 3e                    |<10...000>|
0000000a
00000000  3c 31 30 30 e2 80 af 30  30 30 3e                 |<100...000>|
0000000b
00000000  3c 31 e2 80 af 30 30 30  e2 80 af 30 30 30 3e     |<1...000...000>|
0000000f
00000000  3c 31 30 e2 80 af 30 30  30 e2 80 af 30 30 30 3e  |<10...000...000>|
00000010
LC_NUMERIC=nn_NO.UTF-8
00000000  3c 20 20 20 20 20 20 20  31 3e                    |<       1>|
0000000a
00000000  3c 20 20 20 20 20 31 30  30 3e                    |<     100>|
0000000a
00000000  3c 20 31 e2 80 af 30 30  30 3e                    |< 1...000>|
0000000a
00000000  3c 31 30 e2 80 af 30 30  30 3e                    |<10...000>|
0000000a
00000000  3c 31 30 30 e2 80 af 30  30 30 3e                 |<100...000>|
0000000b
00000000  3c 31 e2 80 af 30 30 30  e2 80 af 30 30 30 3e     |<1...000...000>|
0000000f
00000000  3c 31 30 e2 80 af 30 30  30 e2 80 af 30 30 30 3e  |<10...000...000>|
00000010
However, the issue occurs in both Konsole and XTerm. So, the bytes
"0xe2 0x80 0xaf" inserted by printf for the thousands separator seem to
be incorrect? "0xe2 0x80 0xaf" is the UTF-8 encoding of U+202F NARROW
NO-BREAK SPACE ->
https://www.fileformat.info/info/unicode/char/202f/index.htm .
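To double-check the byte sequence, it can be decoded by hand. The
following is only a minimal sketch: decode_utf8_3 is a hypothetical
helper name (not part of coreutils), and it assumes a well-formed
three-byte UTF-8 sequence of the form 1110xxxx 10xxxxxx 10xxxxxx.

```shell
#!/bin/sh
# Hypothetical helper: decode one three-byte UTF-8 sequence, given as
# three hex byte values, and print the resulting code point.
# Assumes well-formed input (1110xxxx 10xxxxxx 10xxxxxx); no validation.
decode_utf8_3() {
    cp=$(( (($1 & 0x0f) << 12) | (($2 & 0x3f) << 6) | ($3 & 0x3f) ))
    printf 'U+%04X\n' "$cp"
}

# The separator seen in the hexdump above.
decode_utf8_3 0xe2 0x80 0xaf

# The same character written with octal escapes: one character on
# screen, but three bytes on output.
printf '\342\200\257' | wc -c | tr -d ' '
```

So the separator is a single character that occupies three bytes. In
the hexdump, the padded fields are 8 bytes long, which suggests that
the field width is being counted in bytes rather than in characters.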
--
https://bugs.launchpad.net/bugs/2058775
Title:
coreutils: printf formatting bug for nb_NO and nn_NO locales
Status in coreutils package in Ubuntu:
New
Bug description:
I just discovered a printf bug for at least the nb_NO and nn_NO
locales when printing numbers with a thousands separator. To reproduce:
#!/bin/bash
for l in de_DE en_US nb_NO nn_NO ; do
    echo "LC_NUMERIC=$l.UTF-8"
    for n in 1 100 1000 10000 100000 1000000 10000000 ; do
        LC_NUMERIC=$l.UTF-8 /usr/bin/printf "<%'10d>\n" $n
    done
done
The expected output of "%'10d" is a right-aligned number string that is
10 characters wide.
The output of the test script is fine for e.g. LC_NUMERIC=de_DE.UTF-8
and LC_NUMERIC=en_US.UTF-8:
LC_NUMERIC=de_DE.UTF-8
<         1>
<       100>
<     1.000>
<    10.000>
<   100.000>
< 1.000.000>
<10.000.000>
LC_NUMERIC=en_US.UTF-8
<         1>
<       100>
<     1,000>
<    10,000>
<   100,000>
< 1,000,000>
<10,000,000>
However, for LC_NUMERIC=nb_NO.UTF-8 and LC_NUMERIC=nn_NO.UTF-8, the
formatting is wrong:
LC_NUMERIC=nb_NO.UTF-8
<         1>
<       100>
<   1 000>
<  10 000>
< 100 000>
<1 000 000>
<10 000 000>
LC_NUMERIC=nn_NO.UTF-8
<         1>
<       100>
<   1 000>
<  10 000>
< 100 000>
<1 000 000>
<10 000 000>
I reproduced the issue with coreutils-8.32-4.1ubuntu1.1 (Ubuntu 22.04)
as well as coreutils-9.3-5.fc39.x86_64 (Fedora 39).
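A possible workaround is to pad by character count instead of relying
on the field width of printf. This is only a minimal sketch under
stated assumptions: utf8_len and pad_left are hypothetical helper names
(not part of coreutils), the input is assumed to be valid UTF-8, and
the count is in code points, not display columns, so combining or
double-width characters would still be misaligned.

```shell
#!/bin/sh
# Hypothetical helper: count the characters (code points) in a UTF-8
# string by deleting continuation bytes (0x80-0xBF) and counting the
# bytes that remain. Assumes valid UTF-8 input.
utf8_len() {
    printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | wc -c | tr -d ' '
}

# Hypothetical helper: right-align string $2 in a field that is $1
# characters wide. Strings longer than the field are printed unpadded.
pad_left() {
    width=$1
    text=$2
    len=$(utf8_len "$text")
    pad=$(( width - len ))
    if [ "$pad" -gt 0 ] ; then
        printf "%${pad}s" ''
    fi
    printf '%s' "$text"
}

# Usage: format the number without a field width, then pad it.
# The second value contains U+202F, written with octal escapes.
for s in 1 "$(printf '1\342\200\257000')" ; do
    printf '<'
    pad_left 10 "$s"
    printf '>\n'
done
```

In the test script, the number could be formatted with "%'d" first and
the result passed to pad_left, so that every locale yields a field of
10 characters.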
ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: coreutils 8.32-4.1ubuntu1.1
ProcVersionSignature: Ubuntu 6.5.0-26.26~22.04.1-generic 6.5.13
Uname: Linux 6.5.0-26-generic x86_64
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
CasperMD5CheckResult: pass
CurrentDesktop: KDE
Date: Fri Mar 22 21:33:13 2024
InstallationDate: Installed on 2022-11-29 (479 days ago)
InstallationMedia: Kubuntu 22.04.1 LTS "Jammy Jellyfish" - Release amd64 (20220809.1)
SourcePackage: coreutils
UpgradeStatus: No upgrade log present (probably fresh install)