Is it possible to use the Compare Tool for this?

General questions about using TextPad


Kelly
Posts: 34
Joined: Sat May 28, 2011 8:59 am
Location: Ellsworth, ME


Post by Kelly »

I have two lists. The first list (UMM) contains 75 towns and cities in Maine with parcel layers provided by the University of Maine at Machias; the second list (MEGIS) contains 235 towns and cities in Maine with parcel layers provided by the Maine Office of GIS.

The Compare Files... tool produces output that I find slightly bewildering, and it falls short of this goal:

Create separate lists showing:
1) Only those towns and cities which are in UMM, but not MEGIS
2) Only those towns and cities which are in MEGIS, but not UMM
3) Only those towns and cities which are in both MEGIS and UMM

I looked for a macro or add-on that might help create the uncommon/common sets but couldn't find one, and searching the forums didn't turn anything up either. So I'm still wondering: is it possible to use the Compare Tool for this?

Thank you very much for any hints and pointers.

Kindest regards,

Kelly
ak47wong
Posts: 703
Joined: Tue Aug 12, 2003 9:37 am
Location: Sydney, Australia

Post by ak47wong »

You could use the Compare Files tool, but it's not ideal for this. Here's one solution, which doesn't actually use TextPad and which also requires a separate program, namely the comm utility from UNIX.

The only prerequisite is that your files, UMM.txt and MEGIS.txt, must first be sorted. Then follow these steps:
  1. Download this set of utilities: http://sourceforge.net/projects/unxutils/
  2. Extract the file usr\local\wbin\comm.exe from the archive UnxUtils.zip.
  3. At the Windows Command Prompt, in the folder that contains both files (with comm.exe copied there or placed on your PATH), enter these commands:
    comm -2 -3 UMM.txt MEGIS.txt >UMM_only.txt
    comm -1 -3 UMM.txt MEGIS.txt >MEGIS_only.txt
    comm -1 -2 UMM.txt MEGIS.txt >common.txt
This produces the following three files:
1) Only those towns and cities which are in UMM, but not MEGIS: UMM_only.txt
2) Only those towns and cities which are in MEGIS, but not UMM: MEGIS_only.txt
3) Only those towns and cities which are in both MEGIS and UMM: common.txt
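
For anyone curious about what comm does with those switches, here is a minimal Python sketch (an illustration only, not comm's actual source) of the single pass it can make over two sorted lists. It assumes both lists are already sorted in plain code-point order, which for ASCII text is the same byte order comm uses in the C locale. The helper name `comm_split` and the sample town lists are made up for illustration.

```python
def comm_split(left, right):
    """Hypothetical helper: split two *sorted* lists of lines into
    (only in left, only in right, in both), mirroring the three outputs of
    comm -2 -3, comm -1 -3 and comm -1 -2."""
    only_left, only_right, common = [], [], []
    i = j = 0
    # Walk both lists once; the sorted order is what makes one pass enough.
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            only_left.append(left[i])    # would go to UMM_only.txt
            i += 1
        elif left[i] > right[j]:
            only_right.append(right[j])  # would go to MEGIS_only.txt
            j += 1
        else:
            common.append(left[i])       # would go to common.txt
            i += 1
            j += 1
    # Whatever remains in either list has no partner in the other one.
    only_left.extend(left[i:])
    only_right.extend(right[j:])
    return only_left, only_right, common


if __name__ == "__main__":
    # Made-up sample data, already sorted.
    umm = ["Bangor", "Ellsworth", "Machias"]
    megis = ["Augusta", "Bangor", "Machias", "Portland"]
    labels = ("UMM only", "MEGIS only", "both")
    for label, lines in zip(labels, comm_split(umm, megis)):
        print(label + ":", ", ".join(lines))
```

If either list were unsorted, this pass would put lines in the wrong bucket, which is why the files have to be sorted first.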

Perhaps ben_josephs can offer a Perl script to do the same thing!
Kelly
Posts: 34
Joined: Sat May 28, 2011 8:59 am
Location: Ellsworth, ME

Post by Kelly »

ak47wong,

Thank you very much for the kind reply and suggestion.

I'll give comm a try once I have replaced the CMOS backup battery (CR2032) for my Ubuntu 12.04 LTS box (old Dell Dimension 8300 desktop) - hopefully later today :)

The first battery lasted from 2003 to about 2010, when I retired the 8300, but its replacement CR2032 (from Radio Shack) lasted less than 9 months!

Ah, but now I see I was a little confused: this is for Windows! Good to learn. And for anybody else (like me ;) looking for more detail than the output of

comm --help

here's one link: http://www.gnu.org/software/coreutils/m ... ation.html

I'll still want to get that battery soon :)

Thanks again! Very nice of you.

Kelly
ben_josephs
Posts: 2457
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

ak47wong wrote:Perhaps ben_josephs can offer a Perl script to do the same thing!
I could offer such a script; it would be easy to write.

But I'm not sure there would be much benefit in writing a new command-line tool when, as you have pointed out, a perfectly good one already exists. And it would require the installation of Perl, which might be of no other use to the OP, who I suspect is not a programmer.

I have no idea whether the versions of the Unix tools that you recommend are good implementations. I use Cygwin (http://www.cygwin.com/), which is a huge, regularly updated collection of Linux tools for Windows.
Kelly
Posts: 34
Joined: Sat May 28, 2011 8:59 am
Location: Ellsworth, ME

Post by Kelly »

Hi Ben, you suspect correctly! I'm no programmer ("Hello, world!" is about as far as I got ;)

And just to be sure: is the default sort provided by TextPad 7.0.9 sufficient for comm's purposes? For example, should the 'Strip trailing spaces from lines when saving' check box be checked in the Properties dialog box for both files being compared?
ben_josephs
Posts: 2457
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Yes: if your files use only single-byte characters (not Unicode), then I believe TextPad's ascending, case-sensitive sort is what is required.

And yes, if the files may contain different amounts of trailing white space, you should strip it, because comm treats lines that differ only in the amount of white space as different.
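
As a rough illustration of that preparation step, the Python sketch below strips trailing white space from each line and then sorts in ascending, case-sensitive order. It compares by Unicode code point, which for plain single-byte ASCII text matches a byte-by-byte comparison; whether it matches TextPad's sort exactly for other characters is an assumption not checked here. The helper name `prepare` and the sample lines are hypothetical.

```python
def prepare(lines):
    """Hypothetical helper: strip trailing white space (spaces, tabs, the
    newline) from every line, then sort ascending, case-sensitively,
    by code point."""
    return sorted(line.rstrip() for line in lines)


if __name__ == "__main__":
    raw = ["Machias  \n", "Bangor\t\n", "bar Harbor\n", "Bar Harbor\n"]
    # Without stripping, "Machias  " and "Machias" are different lines to comm.
    print("Machias  " == "Machias")
    for line in prepare(raw):
        print(repr(line))
```

Case-sensitive ordering puts every capitalized name ahead of any lower-case one, so the mistyped "bar Harbor" sorts after "Machias".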
ben_josephs
Posts: 2457
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Of course, if you're using the command line, you could use the Windows or Linux command-line sort to sort the files, and sed to remove the trailing white space.
Kelly
Posts: 34
Joined: Sat May 28, 2011 8:59 am
Location: Ellsworth, ME

Post by Kelly »

Thanks Ben! Actually, TextPad's sorting and trailing-space stripping proved sufficient for comm.

The -1, -2 and -3 switches are good to know about, but running comm without any switches produced all three sets in a single output, as three columns (it just needed a couple of extra tabs between the columns).
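
The three-column layout comes from how comm indents its default output: lines only in the first file start at the left margin, lines only in the second file are preceded by one tab, and lines in both files are preceded by two tabs. Here is a minimal Python sketch of that layout, assuming both lists are already sorted; the helper name `comm_columns` and the sample lists are hypothetical.

```python
def comm_columns(left, right):
    """Hypothetical helper: return comm-style default output for two sorted
    lists. Column 1 (only in left) has no prefix, column 2 (only in right)
    gets one tab, column 3 (in both) gets two tabs."""
    out = []
    i = j = 0
    while i < len(left) or j < len(right):
        if j == len(right) or (i < len(left) and left[i] < right[j]):
            out.append(left[i])                # column 1
            i += 1
        elif i == len(left) or right[j] < left[i]:
            out.append("\t" + right[j])        # column 2
            j += 1
        else:
            out.append("\t\t" + left[i])       # column 3: equal lines
            i += 1
            j += 1
    return out


if __name__ == "__main__":
    umm = ["Bangor", "Ellsworth", "Machias"]
    megis = ["Augusta", "Bangor", "Machias", "Portland"]
    print("\n".join(comm_columns(umm, megis)))
```

Because the columns are only one tab apart, longer town names can push text past the next tab stop, which is why adding a couple of extra tabs can make the output easier to read.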

Very cool little utility - thank you both for your help.

Kelly
jeffy
Posts: 323
Joined: Mon Mar 03, 2003 9:04 am
Location: Philadelphia

Post by jeffy »

Great find! Thanks ak47wong!
ak47wong
Posts: 703
Joined: Tue Aug 12, 2003 9:37 am
Location: Sydney, Australia

Post by ak47wong »

Thanks jeffy, but it wasn't really a "find" on my part; the comm utility has been a part of UNIX for 40 years now :)
Kelly
Posts: 34
Joined: Sat May 28, 2011 8:59 am
Location: Ellsworth, ME

Post by Kelly »

Hi ak47wong, I'm with Jeff: it was a great find for me, with many thanks to you for sharing it :)

Best,

Kelly