AWK experts - how would I code around this in awk...
Doug Robinson
dkrr at telus.net
Thu Feb 18 23:47:23 UTC 2010
Doug Robinson wrote:
> Alex Janssen wrote:
>
>> Steve Flynn wrote:
>>
>>
>>> I have a text file with lines of differing length.
>>>
>>> I want to parse the entire file and make each line (for example) 20 bytes long.
>>>
>>> If a record is too short as in the following example: (rec1,2 and 3
>>> are just so I can refer to them easily - the example file should be
>>> just the numeric portion).
>>>
>>> rec1 123456789012345
>>> rec2 67890
>>> rec3 12345678901234567890
>>>
>>> ... then I need to append rec2 to rec1.
>>>
>>> Obviously after appending rec2 to rec1, the next line to be read
>>> should be rec3. After completion, the entire file would consist of two
>>> records in this example case, both 20 bytes long.
>>>
>>>
>>>
>>> I should point out that the complete file may well be in the hundreds
>>> of millions of records so holding the entire thing in memory is
>>> probably not a good idea.
>>>
>>> Any idea on how I would go about this in awk?
>>>
>>> If you believe awk to not be a good candidate for this, I'm open to
>>> suggestions on alternatives.
>>>
>>>
>>> (as a side note, this is for some data which I need to parse which has
>>> embedded CR/LF's in it, thus splitting what should be one record into
>>> perhaps multiple rows... I need a quick (and easy) way of stitching
>>> it back together.)
>>>
>>>
>>>
>>>
>>>
>> Maybe a bash script that removes all CR's and LF's and uses echo to
>> reinsert them every 20 characters would do the job.
>>
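>> #######script
>> OLDFILE="whatever"
>> NEWFILE="whatever-new"
>> : >"$NEWFILE"    # start with an empty output file
>> # IFS= and -r keep whitespace and backslashes intact; the || test
>> # catches a final chunk shorter than 20 characters.
>> while IFS= read -r -n20 LINE || [ -n "$LINE" ]
>> do
>>     echo "$LINE" >>"$NEWFILE"
>> done < <(tr -d "\n\r" <"$OLDFILE")    # process substitution, bash only
>> exit 0
>> ###########end script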
>>
>> Alex
>>
>>
>>
> Geee - years & years since I thought in awk; but do you mean this?
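> BEGIN {
>     i = 1        # position of the current line within its group of three
> }
>
> # Join every three input lines into one output line, space-separated.
> {
>     if (i++ >= 3) {
>         printf ("%s\n", $1)        # third line: end the output record
>         i = 1
>     } else {
>         printf ("%s ", $1)
>     }
> }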
>
> Hundreds of thousands of records may take a while just to read & write!
>
> dkr
>
>
>
or perhaps this:
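BEGIN {
    len = 0        # characters written so far on the current output line
}
END {
    if (len > 0) print ""        # terminate a final short record
}
{
    sub(/\r$/, "")        # drop a stray CR left over from CR/LF
    printf ("%s", $0)
    len = len + length ($0)
    if (len >= 20) {        # record complete: end the line and start over
        printf ("\n")
        len = 0
    }
}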
anyway this is the idea - good luck with AWK - fun fun fun
dkr
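
For anyone finding this thread later, here is a minimal sketch of the same idea with a small buffer, so that a fragment running past the record boundary is carried over to the next record instead of producing an over-long line. The function name rejoin and the width argument are made up for illustration. The default of 20 comes from the example in the question. Only the pending fragment is held in memory, so the size of the file should not matter.

```shell
#!/bin/sh
# rejoin WIDTH: read split records on stdin, write WIDTH-character lines.
# "rejoin" is a hypothetical name; WIDTH defaults to 20 as in the example.
rejoin() {
    awk -v width="${1:-20}" '
    {
        sub(/\r$/, "")      # strip a trailing CR left by CR/LF line endings
        buf = buf $0        # append this fragment to the pending record
        while (length(buf) >= width) {
            print substr(buf, 1, width)     # emit one full record
            buf = substr(buf, width + 1)    # carry the remainder forward
        }
    }
    END {
        if (buf != "") print buf            # flush a short trailing record
    }'
}

# Usage (file names taken from the script earlier in the thread):
#   rejoin 20 <whatever >whatever-new
```

This assumes every complete record is exactly the given width. If real records vary in length, some other marker would be needed to tell where one ends.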