[Bug 1165536] [NEW] grep doesn't handle \w correctly

matteo sisti sette 1165536 at bugs.launchpad.net
Sat Apr 6 20:39:41 UTC 2013


Public bug reported:

Put the following text into a file called testfile.txt:

-------testfile.txt-------
if ($this->_touchOnly) //log if touchOnly
only log the output of the controller if $silent
if (window.console) console.log(data);

ERunActions::runAction($logOutput=false,$silent=false)
self::logTrace('Action output: ' . $output,$controller->id,$actionId);
self::logError($msg,$controller->id,$actionId);
-------end of testfile.txt

Try this command:
$ grep -Re "[^\w]log[^\w]" testfile.txt

Expected result:
Should find matches only in lines 1, 2 and 3, highlighting the word "log".

Observed result:
Also finds matches in lines 5, 6 and 7, highlighting the strings "logO", "logT" and "logE" respectively.


According to the man page:
    The symbol \w is a synonym for [_[:alnum:]]

And in turn:
    For example, [[:alnum:]] means the character class of numbers and
    letters in the current locale. In the C locale and ASCII character set
    encoding, this is the same as [0-9A-Za-z].
    

It looks like, instead, \w is interpreted as only numbers and LOWERCASE
letters, but not uppercase letters.
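
The hypothesis above can be checked with a short shell sketch. It assumes GNU grep and a scratch file created with mktemp; the temporary path and the variable names are illustrative and do not come from the report. One point not stated in the report: in a POSIX bracket expression a backslash is an ordinary character, so [^\w] reads as "any character other than a backslash or w", which may explain the extra matches better than a lowercase-only \w would.

```shell
# Recreate testfile.txt from the report in a scratch file (path is arbitrary).
testfile=$(mktemp)
cat > "$testfile" <<'EOF'
if ($this->_touchOnly) //log if touchOnly
only log the output of the controller if $silent
if (window.console) console.log(data);

ERunActions::runAction($logOutput=false,$silent=false)
self::logTrace('Action output: ' . $output,$controller->id,$actionId);
self::logError($msg,$controller->id,$actionId);
EOF

# Outside brackets, \w covers uppercase letters too: the whole identifier matches.
uppercase=$(printf 'logOutput\n' | grep -oE '\w+')

# Pattern from the report: inside [...] the backslash is literal, so the
# bracket only excludes "\" and "w". Counts matching lines.
reported=$(grep -c '[^\w]log[^\w]' "$testfile")

# Likely intent: \W (non-word character) used outside a bracket expression.
intended=$(grep -c '\Wlog\W' "$testfile")

# Same intent spelled with the POSIX class quoted from the man page.
portable=$(grep -c '[^_[:alnum:]]log[^_[:alnum:]]' "$testfile")

echo "reported=$reported intended=$intended portable=$portable uppercase=$uppercase"
rm -f "$testfile"
```

If this reading is right, the first count covers six lines (1, 2, 3, 5, 6 and 7), while the \W and [^_[:alnum:]] variants cover only lines 1, 2 and 3, which is the result the report expected.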


I know it seems unbelievable to be finding a bug in grep in 2013, but compare with
http://regexr.com?34e4i
which shows the expected result.

ProblemType: Bug
DistroRelease: Ubuntu 12.10
Package: grep 2.12-2
ProcVersionSignature: Ubuntu 3.5.0-27.46-generic 3.5.7.7
Uname: Linux 3.5.0-27-generic i686
NonfreeKernelModules: nvidia
ApportVersion: 2.6.1-0ubuntu10
Architecture: i386
Date: Sat Apr  6 22:27:39 2013
InstallationDate: Installed on 2010-06-23 (1018 days ago)
InstallationMedia: Ubuntu 10.04 LTS "Lucid Lynx" - Release i386 (20100429)
MarkForUpload: True
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: grep
UpgradeStatus: Upgraded to quantal on 2013-01-13 (83 days ago)

** Affects: grep (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: apport-bug i386 quantal running-unity

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to grep in Ubuntu.
https://bugs.launchpad.net/bugs/1165536
