Solved: Re: Diff two address lists?

Tue Jul 31 13:48:28 UTC 2012

On Mon, Jul 30, 2012 at 7:49 AM, Patton <p.echols at comcast.net> wrote:
> On 07/28/2012 10:11 AM, Jonesy wrote:
>>
>> On Fri, 27 Jul 2012 20:08:22 -0700, Patton wrote:
>>>
>>> I have a need to compare two lists of email addresses.  Each one is a
>>> text file with address per line.  What I need to output is a list of
>>> email addresses that are only in list B, I do not want the addresses
>>> that are in both or are only in A.
>>
>>   _IF_ (big "if") the two lists are pure email addys with no further
>> embellishments/contents, something viz:
>>
>>      grep -v -f "A"_file "B"_file
>>
>> Or, maybe I'm rong.  Haven't had my coffee yet....
>>
>> Jonesy
>
>
> Thanks, but since the lists had columns for "first name" "Last name" and
> "Full Name", I think that would not work.  I ended up doing a script that
> did something similar
>
> For those searching the archives with the same problem,
>
> #! /bin/bash
> # Crosswalk file1 against file2
> # output lines of file2 not in file1 to file3
> # Use Syntax parse-csv.sh file1 file2 file3
>
> FILE_1="$1"
> FILE_2="$2"
> FILE_3="$3"
>
> cp "$FILE_2" TX_FILE_1.csv
>
> # for each line, parse the fields into script variables
> while IFS='\",\"' read a b c d
>     do
>
> # Now grep the TX file and print everything but the line with the email
> # in the line of the first input file ($d)
>
> grep -v -F -- "$d" TX_FILE_1.csv > TX_FILE_2.csv
> mv TX_FILE_2.csv TX_FILE_1.csv
>
> # Next for debugging, to see what the script thinks the variable is
> echo "Dee is equal $d"
>
>     done < "$FILE_1" # end the loop
>
> cp TX_FILE_1.csv "$FILE_3"
>
> =====================
>
> There might have been a more elegant way, but this worked. Anyone wants to
> comment, I appreciate learning new stuff.

I seem to recall a program in the diff family that did this directly.
Probably out of my dim past somewhere.  However, it should not be hard
to do with the existing utilities with very little scripting, if you
can work on sorted files.
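
The half-remembered program may well be comm(1), which compares two
sorted files line by line.  That is a guess about which tool is meant,
not something established in this thread.  A minimal sketch, assuming
one bare address per line; the file names and sample addresses are
made up for illustration:

```shell
#!/bin/bash
# Hypothetical sample data: one bare email address per line.
tmp=$(mktemp -d)
printf '%s\n' carol@example.com alice@example.com > "$tmp/a.txt"
printf '%s\n' bob@example.com alice@example.com dave@example.com > "$tmp/b.txt"

# comm expects both inputs sorted in the same collation order,
# so sort and compare under the same locale (C here).
LC_ALL=C sort "$tmp/a.txt" > "$tmp/a.sorted"
LC_ALL=C sort "$tmp/b.txt" > "$tmp/b.sorted"

# -1 suppresses lines found only in the first file,
# -3 suppresses lines common to both; what is left is "only in B".
comm_only_b=$(LC_ALL=C comm -13 "$tmp/a.sorted" "$tmp/b.sorted")
printf '%s\n' "$comm_only_b"

rm -rf "$tmp"
```

With extra columns in the files, the address column would have to be
extracted first.  The manual route with diff also works, on the same
sorted inputs.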

So:
1 sort both files, and work on the sorted versions.  The rest of this
can be done in a pipeline.  Actually the whole thing can be done in a
pipeline if you're VERY clever with named pipes, redirection and such
but I won't go into that.
2 use 'diff -u'  this will emit the differences (along with two header
lines and some context lines), and tag lines that are unique to one
file or the other with a single leading "-" or "+".
3 grep the result with either "^-" (hat minus) or "^+" (hat plus) to
pick the lines unique to A or B at your discretion, skipping the "---"
and "+++" header lines at the top.
4 "cut -b2-" to remove the one-character tag that diff put on the lines.
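
The steps above can be sketched roughly as follows.  This assumes one
bare address per line, and the file names and sample addresses are
invented for the example.  Unified diff output tags each changed line
with a single "+" or "-" and no following space, so the sketch greps
for "^+" and cuts from byte 2; the two header lines are dropped with
tail before grepping.

```shell
#!/bin/bash
# Hypothetical sample data: one bare email address per line.
workdir=$(mktemp -d)
printf '%s\n' alice@example.com bob@example.com carol@example.com \
    > "$workdir/list_a.txt"
printf '%s\n' bob@example.com dave@example.com alice@example.com erin@example.com \
    > "$workdir/list_b.txt"

# Step 1: sort both files (-u also drops duplicate lines).
LC_ALL=C sort -u "$workdir/list_a.txt" > "$workdir/a.sorted"
LC_ALL=C sort -u "$workdir/list_b.txt" > "$workdir/b.sorted"

# Steps 2-4: unified diff, drop the two "---"/"+++" header lines,
# keep the lines tagged "+" (present only in B), strip the tag.
# diff exits non-zero when the files differ, which is expected here,
# so the pipeline status is ignored.
only_in_b=$(diff -u "$workdir/a.sorted" "$workdir/b.sorted" \
    | tail -n +3 | grep '^+' | cut -b2-) || true
printf '%s\n' "$only_in_b"

rm -rf "$workdir"
```

Swapping '^+' for '^-' gives the lines that are only in A instead.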

++ kevin

-- 
Kevin O'Gorman, PhD