Help with a regular expression???

Kevin O'Gorman kogorman at gmail.com
Wed Jan 27 14:06:52 UTC 2010


On Wed, Jan 27, 2010 at 5:36 AM, Ray Parrish <crp at cmc.net> wrote:

> Hello,
>
> I am working on some code to take a textually formatted list and change
> it into an HTML formatted list. I have input that looks like the
> following:
>
>  - visual wxWindows frame design,
>  - object inspector and explorer,
>  - syntax highlighting editor with code completion, call tips and code
>    browsing for Python code,
>  - syntax highlighting editor for C, C++, HTML, XML, config files (INI
>    style),
>  - documentation generation,
>  - an integrated Python debugger,
>  - integrated help,
>  - a Python Shell,
>  - an explorer able to browse, open/edit, inspect and interact with
>    various data sources including files, CVS, Zope, FTP, DAV and SSH,
>  - an UML view generator.
>
> What I need to do is replace each occurrence of 4 spaces followed by any
> character with an underscore, a space, and the original fifth character
> on the line. That marks the line as a continuation for a following
> routine, which will concatenate the pieces on the continued lines onto
> the previous line segments.
>
> The part I do not know how to do is preserving the fifth character in an
> assignment like the following [the entire data section above will be in
> one variable]
>
> Data=${Data/    [a-z,A-Z]/_ }
>
> As you can likely see, that code line will not work yet, as I do not
> know how to specify that whatever character gets found after the fourth
> space is to be part of the replace term. I'm not even sure that I have
> properly specified the search term to match the single fifth character
> either.
>
It looks like you are trying to use bash patterns. Those are not regular
expressions at all.
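For this particular job, bash patterns can still do the work, because the fifth character never has to be captured. If the pattern matches only the newline plus the four indent spaces, whatever follows them is left in place. Below is a minimal sketch using bash's `${var//pattern/string}` expansion. The sample text and the assumption that continuation lines start with exactly four spaces right after a newline are illustrative, not taken from the original posts.

```shell
#!/usr/bin/env bash
# Assumption: continuation lines start with exactly four spaces, as the
# question describes, and item lines start with " - ".
Data=' - syntax highlighting editor with code completion, call tips and code
    browsing for Python code,
 - documentation generation,'

old=$'\n    '   # a newline followed by the four indent spaces
new=$'\n_ '     # the same newline followed by the "_ " marker

# ${var//pattern/string} replaces every match. The fifth character is not
# part of the pattern, so it is never consumed and stays where it was.
Data=${Data//"$old"/$new}

printf '%s\n' "$Data"
```

This prints the first and third lines unchanged and turns the second into `_ browsing for Python code,`. Unlike a regular expression with `[^ ]`, a plain pattern like this does not check that the fifth character is a non-space, and it misses a continuation on the very first line of `$Data`, since no newline precedes it.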
However that may be, I would suggest a Perl filter, which is a one-liner
suitable for a pipeline:
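   perl -p -e 's/    ([^ \n])/_ $1/g'
translation:
   -p: run the code once for every input line, then print the line
   -e: the code for that loop follows on the command line
   s/   /   /g: do a substitution (g = every match on a line, not just
       the first)
   (  ): remember (capture) whatever matches inside the parentheses
   [^ \n]: a character class matching any one character that is neither
       a space nor a newline (a plain [^ ] would also match the newline
       at the end of a line, so a line ending in four spaces would gain
       a stray "_ ")
   $1  : the first remembered thing (\1 also works here, but under -w
       Perl warns that it is better written as $1)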

Hope this helps.

++ kevin

-- 
Kevin O'Gorman, PhD


More information about the ubuntu-users mailing list