[Bug 846628] Re: gnu sort extremely slow in non C locale
Pádraig Brady
P at draigBrady.com
Thu Jul 10 08:50:52 UTC 2014
coreutils maintainer here.
Honoring locale sorting rules takes a lot of extra logic.
POSIX and long-standing behavior dictate that this remains the default.
Only you know what your data is, so it's up to you
to better describe it by passing LC_ALL=C to sort as above.
Given that, GNU sort should perform better than other implementations,
due to automatically using multiple threads where appropriate, etc.
The same data classification methods apply to grep etc.
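This toy Python sketch shows why the locale can change the order even for pure ASCII data. It contrasts byte-wise ordering (what LC_ALL=C gives) with a hypothetical, much simplified model of locale collation. In that model, '.' is skipped on the first comparison pass and is used only to break ties. The model and the names c_locale_key and toy_locale_key are assumptions for illustration only. Real glibc collation uses multi-level tables derived from ISO 14651, and its handling of punctuation varies between locales and glibc versions. That extra per-comparison work is also part of why non-C locales sort more slowly.

```python
"""Toy comparison of byte-wise (C locale) ordering versus a simplified,
hypothetical model of locale collation. This is not glibc's real algorithm."""

PUNCTUATION = "."  # assumption: only '.' is treated as ignorable in this model


def c_locale_key(line: str) -> bytes:
    # LC_ALL=C: compare raw bytes, so '.' (0x2E) sorts before digits and letters.
    return line.encode("ascii")


def toy_locale_key(line: str) -> tuple[str, bytes]:
    # Hypothetical simplification: compare first with punctuation removed,
    # then fall back to the raw bytes only to break ties.
    primary = "".join(ch for ch in line if ch not in PUNCTUATION)
    return (primary, line.encode("ascii"))


if __name__ == "__main__":
    lines = ["ab", "a.c"]
    print(sorted(lines, key=c_locale_key))
    print(sorted(lines, key=toy_locale_key))
```

With these two keys, the input ["ab", "a.c"] comes out in opposite orders. That is the same kind of disagreement that could make sort -c accept a file under LANG=C and reject it under a UTF-8 locale.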
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to coreutils in Ubuntu.
https://bugs.launchpad.net/bugs/846628
Title:
gnu sort extremely slow in non C locale
Status in “coreutils” package in Ubuntu:
New
Bug description:
I tried sorting an ASCII file of about 300 MB and 8 million lines
with GNU sort, and it was taking forever.
After 10 minutes I stopped it. I tried another sort program and it
finished in about 40 seconds.
I then took the output of that second sort and checked it with GNU
sort, which reported that some lines were out of order.
It flagged the following lines:
....bbbbbbbbbwbbwwbwwwwwww.ww...1
....bbbbbbbbbwbbwwbwwwwwwwww....0
....bbbbbbbbbwbwwbwbwwwww.ww..w.1
But they are not out of order, as far as I can tell. Then I suspected the locale. Indeed, my locale was set to:
LANG=en_CA.UTF-8
setting it to:
LANG=C
made GNU sort both finish in 40 seconds and confirm the
proper order.
Since the file is 100% ASCII (it contains only the 6 characters ".01bw\n"),
I think it is a bug that the locale makes any difference.
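A short Python sketch can check the claim that these lines are in order under LANG=C. For plain ASCII text, C-locale ordering is simply byte-wise comparison, so it suffices to confirm that the lines are non-decreasing as bytes. The helper names is_sorted_bytewise and first_difference are hypothetical. The sketch only approximates what sort -c checks in the C locale and says nothing about the en_CA.UTF-8 result.

```python
def is_sorted_bytewise(lines: list[str]) -> bool:
    # Non-decreasing byte order, as in the C locale (equal lines are allowed).
    encoded = [line.encode("ascii") for line in lines]
    return all(a <= b for a, b in zip(encoded, encoded[1:]))


def first_difference(a: str, b: str) -> int:
    # Index of the first position where two lines differ,
    # or the shorter length if one line is a prefix of the other.
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return min(len(a), len(b))


# The three lines quoted in the report, copied verbatim.
reported = [
    "....bbbbbbbbbwbbwwbwwwwwww.ww...1",
    "....bbbbbbbbbwbbwwbwwwwwwwww....0",
    "....bbbbbbbbbwbwwbwbwwwww.ww..w.1",
]

if __name__ == "__main__":
    print(is_sorted_bytewise(reported))
    for a, b in zip(reported, reported[1:]):
        i = first_difference(a, b)
        # Show which characters decide the byte-wise order of each pair.
        print(i, repr(a[i]), repr(b[i]))
```

Each adjacent pair is decided at the first differing character, where '.' (0x2E) < 'b' (0x62) < 'w' (0x77) in byte order.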
Best regards,
Bijan
ProblemType: Bug
DistroRelease: Ubuntu 11.04
Package: coreutils 8.5-1ubuntu6
ProcVersionSignature: Ubuntu 2.6.38-11.48-generic 2.6.38.8
Uname: Linux 2.6.38-11-generic i686
Architecture: i386
Date: Sat Sep 10 15:59:07 2011
InstallationMedia: Ubuntu 11.04 "Natty Narwhal" - Release i386 (20110427.1)
ProcEnviron:
LANGUAGE=en_CA:en
LANG=en_CA.UTF-8
SHELL=/bin/bash
SourcePackage: coreutils
UpgradeStatus: No upgrade log present (probably fresh install)