Text processing tools and/or languages
Peter Flynn
peter at silmaril.ie
Thu Jan 20 21:41:25 UTC 2022
On 20/01/2022 15:42, Chris Green wrote:
> I'm looking for tools (if there are any) for processing a text file
> line by line sequentially.
Pretty much all the standard Unix text tools do exactly this.
> As it goes through the file it needs to make decisions based on the
> contents of the line(s) of text and change its state as it goes.
> The decisions it makes depend on the state it's in.
awk is the obvious choice for me, but others would reach for one of the
common scripting languages like Perl or Python.
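As an illustration of the "change state as it goes" idea, here is a minimal sketch in Python. It assumes a message has a header block ending in a line of dashes, a body, and a trailer starting with a "-- " line. Those markers, the Subject rewrite and the names HEADER_END, TRAILER_START, SUBJECT and process() are all hypothetical, because the real message format isn't given.

```python
import re

# Hypothetical markers: the actual forum message format is not known.
HEADER_END = re.compile(r"^-{5,}$")    # assumed line that closes the header block
TRAILER_START = re.compile(r"^-- $")   # assumed line that opens the trailer
SUBJECT = re.compile(r"^Subject:\s*(.*)$")

def process(lines):
    """Yield output lines, switching state as header/body/trailer markers are seen."""
    state = "header"
    for line in lines:
        line = line.rstrip("\n")
        if state == "header":
            m = SUBJECT.match(line)
            if m:
                # a matched header line is modified and kept
                yield "== " + m.group(1) + " =="
            elif HEADER_END.match(line):
                state = "body"
            # all other header lines are dropped
        elif state == "body":
            if TRAILER_START.match(line):
                state = "trailer"
            else:
                yield line
        # in the "trailer" state every line is discarded
```

For example, feeding it `["Subject: Hello", "From: x", "-----", "Body line 1", "", "Body line 2", "-- ", "sig"]` yields the modified subject line followed by the three body lines. The same structure translates directly into awk, using a variable for the state and one pattern-action rule per case.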
A lot may depend on the nature of the data and what you want to do with
it. Picking the right tool for the job isn't always simple, although
there is a tendency for people to stick to one tool they know well, and
shoe-horn every task into the constraints of that tool :-)
> Basically I'm processing some (fairly) fixed format messages from a
> forum to remove some matched header and trailer lines, modify and
> output a few other matched lines and simply output the body of the
> message.
>
> The (most) difficult bit is removing blank lines before something.
Because many tools don't read ahead to the next line, you will need to
set some kind of flag to record what type of line the previous line was,
and make your decisions on that basis.
There are some languages with built-in features for exactly this kind of
processing, which understand the concept of "lines of importance
separated by blank lines". OmniMark and Saxon are the two I have used most.
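Several of the general-purpose tools have a simpler version of this: awk's "paragraph mode" (setting RS to the empty string) and Perl's `$/ = ""` both treat each blank-line-separated block as one record. A rough Python equivalent, sketched here with a hypothetical blocks() helper, groups consecutive non-blank lines using itertools.groupby.

```python
from itertools import groupby

def blocks(lines):
    """Yield each run of non-blank lines as a list; runs of blank lines are separators."""
    for is_blank, group in groupby(lines, key=lambda l: l.strip() == ""):
        if not is_blank:
            yield [l.rstrip("\n") for l in group]
```

Working on whole blocks can sidestep the look-ahead problem, because each decision sees a complete block rather than a single line.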
Peter
More information about the ubuntu-users
mailing list