[Bug 1547466] Re: grep switches into binary mode while processing a text file
teo1978
teo8976 at gmail.com
Fri Apr 29 12:41:24 UTC 2016
> This has been the case for a long time.
Nope. Just a few months.
> If you try to show non-UTF-8
> data in an UTF-8 locale you'll just see garbage (or other encoding
> mismatches)
That doesn't mean that the file should be processed as binary. Also,
before the regression, grep worked as expected. I guess it
might fail to find matches for non-ASCII characters encoded in a non-UTF-8
encoding (though I don't see why it couldn't decode each file according
to its encoding and match the contents as Unicode text), but when grepping
for "foo" it would find matches for the string "foo" both in
UTF-8-encoded files and in ISO-8859-encoded files.
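A plain ASCII pattern such as "foo" matches in both kinds of file because UTF-8 and ISO-8859-1 encode ASCII characters as the same single bytes, so a byte-level search finds the same sequence either way. The minimal Python sketch below illustrates this; the sample text is made up for the example.

```python
# Hypothetical sample line: some non-ASCII text plus the ASCII word "foo".
text = "café foo naïve"

utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("iso-8859-1")

# ASCII characters are encoded identically in both encodings,
# so a byte search for "foo" succeeds in either file.
pattern = "foo".encode("ascii")
print(pattern in utf8_bytes, pattern in latin1_bytes)  # True True

# The non-ASCII characters are encoded differently, so the byte streams differ.
print(utf8_bytes == latin1_bytes)  # False

# Reading the ISO-8859-1 bytes as UTF-8 hits an encoding error, which is
# what a UTF-8 locale sees when it scans an ISO-8859-1 file.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("invalid UTF-8 at byte offset", err.start)
```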
> This bug is about switching to binary mode in the 'C' locale only.
Then I wonder why somebody marked the one I reported as a duplicate of
this one.
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to grep in Ubuntu.
https://bugs.launchpad.net/bugs/1547466
Title:
grep switches into binary mode while processing a text file
Status in grep package in Ubuntu:
Fix Released
Status in grep source package in Xenial:
Fix Committed
Status in grep source package in Yakkety:
Fix Released
Status in grep package in Debian:
Fix Released
Bug description:
I noticed this starting to happen in Xenial about two days ago. When
running sbuild (and now on the buildd, too), the build breaks when trying
to compile a generated file. I traced the problem down to grep
suddenly acting weird: with no language set (or a non-UTF-8 locale),
it starts printing some lines of a source file and then
suddenly stops by printing "Binary file ... matches".
With the attached file, the difference can be observed (running
Xenial):
LANG=C grep -v xxx grant_table.h
and
LANG=C.UTF-8 grep -v xxx grant_table.h
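The symptom above (some text output, then "Binary file ... matches") can be modelled roughly as follows. This is a simplified Python sketch, not grep's actual code. It assumes that, as in glibc, the C locale accepts only ASCII bytes as valid characters, and that the affected grep stops printing at the first selected line that contains an encoding error. The function names, the `all_bytes_valid` flag (standing in for the upstream fix, which treats every byte as a valid character in the C locale) and the sample file contents are all hypothetical.

```python
def is_valid_in_c_locale(line: bytes, all_bytes_valid: bool) -> bool:
    """Return True if every byte of `line` counts as a character.

    Assumption: before the fix only ASCII bytes (< 0x80) were valid
    in the C locale; after the fix every byte is valid.
    """
    return all_bytes_valid or all(b < 0x80 for b in line)


def simulate_grep_v(lines, pattern: bytes, name: str, all_bytes_valid: bool):
    """Hypothetical model of `LANG=C grep -v PATTERN NAME`."""
    out = []
    for line in lines:
        if pattern in line:
            continue  # -v: skip lines that DO contain the pattern
        if not is_valid_in_c_locale(line, all_bytes_valid):
            # Modelled buggy behaviour: report a binary match and stop.
            out.append(f"Binary file {name} matches")
            break
        # latin-1 maps each byte to one character, standing in for raw output.
        out.append(line.decode("latin-1"))
    return out


# Hypothetical file contents with one UTF-8 character in a comment.
sample = [b"#define FOO 1", b"/* caf\xc3\xa9 */", b"#define xxx 3", b"#define BAR 2"]

buggy = simulate_grep_v(sample, b"xxx", "grant_table.h", all_bytes_valid=False)
fixed = simulate_grep_v(sample, b"xxx", "grant_table.h", all_bytes_valid=True)
print(buggy)       # ['#define FOO 1', 'Binary file grant_table.h matches']
print(len(fixed))  # 3
```

In a UTF-8 locale the two bytes `\xc3\xa9` form a valid character, which is why the `LANG=C.UTF-8` run does not switch to binary mode.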
SRU INFORMATION
===============
Upstream fixes:
- http://git.savannah.gnu.org/cgit/grep.git/commit/?id=d8a366218 (but depends on previous patches and is not sufficient by itself)
- http://git.savannah.gnu.org/cgit/grep.git/commit/?id=d8a366218 (tests+doc)
Test case:
Call grep on a file or a string with non-ASCII characters in the C locale:
$ echo 'héll☺ ≥x' | LC_ALL=C grep .
In xenial this just prints "Binary file (standard input) matches"; with the fix it should print the actual input line (with some garbled output, of course, since the UTF-8 characters cannot be displayed in the C locale).
Regression potential: grep is used in tons of places; during
xenial we had to add a "use grep -a" workaround to a lot of
packages to fix the fallout from grep 2.23, which introduced this. That
said, since "Binary file matches" does not give any more
information than the actual matching line, and scripts that get along
with this answer most likely just check the exit code anyway (which
does not change), the risk is bearable.
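The two points in the paragraph above, that the exit status is the same whether or not grep reports a binary match and that `-a` (`--text`) forces text output, can be checked with a short Python sketch that runs the test-case input through the system grep. It assumes a `grep` binary on `PATH`; the helper name `run_grep` is hypothetical, and the output without `-a` is deliberately not checked because it depends on the installed grep version.

```python
import os
import subprocess

# Input from the SRU test case, encoded as UTF-8 bytes.
data = "héll☺ ≥x\n".encode("utf-8")
# Run in the C locale, keeping the rest of the environment (e.g. PATH).
env = {**os.environ, "LC_ALL": "C"}


def run_grep(*args):
    """Run grep on `data` in the C locale; return (exit status, stdout bytes)."""
    proc = subprocess.run(["grep", *args], input=data, env=env, capture_output=True)
    return proc.returncode, proc.stdout


# Exit status 0 means "a line matched", whether or not grep
# classified the input as binary.
status, _ = run_grep("-q", ".")
print("exit status without -a:", status)

# -a treats the input as text, so the raw line is printed even by
# grep versions affected by this bug.
status_a, out_a = run_grep("-a", ".")
print("exit status with -a:", status_a, "| output equals input:", out_a == data)
```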
We will soon do a test rebuild in yakkety with gcc-6 and grep 2.25,
and will sift through the results to identify new FTBFS that are due
to grep 2.25. This SRU should not be released until this happens.