Delete duplicated lines

General questions about using TextPad

Luoji

Post by Luoji »

Hello,

It seems it is only possible to delete duplicate lines when you sort a file. I tried to do it using regular expressions, but I didn't succeed:

replace:
\(^.*\n\)\([.]*\)\1
by:
\1\2

produces the following error message:

Unmatched '( or }'

Does someone have another idea?

Regards,

Luoji.
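Two details in the expression above work against the intended idea. Inside a bracket expression, `[.]` matches only a literal period, and in most regex engines `.` does not match a newline, so the middle group cannot span the lines between a line and its repeat. The sketch below uses Python's `re` module rather than TextPad's regex engine (whose syntax differs), so it only illustrates the intended pattern: capture a whole line, lazily skip any whole lines after it, then require the same line again through the backreference `\1`. The names `DUP_LINE` and `remove_duplicate_lines` are hypothetical.

```python
import re

# Group 1: one whole line, including its trailing newline.
# Group 2: any number of whole lines after it (lazy, so the nearest repeat wins).
# \1: a later line identical to group 1.
# re.MULTILINE lets ^ match at the start of every line, not just the text.
DUP_LINE = re.compile(r"^(.*\n)((?:.*\n)*?)\1", re.MULTILINE)


def remove_duplicate_lines(text: str) -> str:
    """Delete later copies of repeated lines, keeping first occurrences in order."""
    if not text.endswith("\n"):
        text += "\n"  # the last line needs a newline to match \1
    # One pass removes at most one repeat per match, so repeat until stable.
    while True:
        reduced = DUP_LINE.sub(r"\1\2", text)
        if reduced == text:
            return text
        text = reduced


print(remove_duplicate_lines("apple\nbanana\napple\ncherry\nbanana\n"), end="")
# prints apple, banana and cherry, one per line
```

Because the engine backtracks over every candidate pair of lines and the substitution is repeated until nothing changes, this approach gets slow on large files; it is meant to show the pattern, not to replace a hash-based tool.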
Roy Beatty

Re: Delete duplicated lines

Post by Roy Beatty »

Yes, it would be nice to have a Sort option to "Delete subsequent lines with duplicate keys." Then you could add a padded index column, run the sort on a key other than the index column, and then sort on the index column to restore the starting positions of your records.

I don't see a way to do this, but it would be a great enhancement. You could then use it to massage a data file and cull duplicate keys before loading it into a SQL table with a no-duplicates constraint.

If you find a way, please post your method!

Roy
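Roy's proposed workflow (index column, sort on the key, drop later lines with the same key, sort back on the index) can be sketched in Python. This assumes one record per line and a caller-supplied function that extracts the key; the function `drop_duplicate_keys` and the comma-separated sample rows are hypothetical. The sketch relies on a stable sort so that the first record for each key survives, which is worth checking before repeating the same steps with an editor's sort command.

```python
def drop_duplicate_keys(records, key):
    """Keep the first record for each key, in the records' original order."""
    # Step 1: add an index "column". Here it is an integer, so no zero-padding
    # is needed; padding matters only when the index is sorted as text.
    indexed = list(enumerate(records))
    # Step 2: sort on the key. Python's sort is stable, so among equal keys
    # the record with the lowest original index comes first.
    indexed.sort(key=lambda pair: key(pair[1]))
    # Step 3: delete subsequent lines with duplicate keys.
    kept = []
    previous = object()  # sentinel that never equals a real key
    for index, record in indexed:
        k = key(record)
        if k != previous:
            kept.append((index, record))
            previous = k
    # Step 4: sort on the index column to restore the starting positions.
    kept.sort(key=lambda pair: pair[0])
    return [record for _, record in kept]


rows = ["42,Smith", "17,Jones", "42,Smythe", "08,Brown"]
print(drop_duplicate_keys(rows, key=lambda line: line.split(",")[0]))
# ['42,Smith', '17,Jones', '08,Brown']
```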
Jeff Rozycki

Re: Delete duplicated lines

Post by Jeff Rozycki »

I have an awk script that deletes duplicate lines, but I am having trouble setting up TextPad to run it as a tool. Right now I have to save the file I want to dedup and run the awk script from a Cygwin Bash shell.
How do I set up TextPad to run the awk script? Any ideas? Here is the awk script:

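#! D:/Applications/Cygwin/bin/awk -f
# Print each distinct input line once, keeping the first occurrence
# of each line and the original line order.

# Runs once per input line: store the line only the first time it is seen.
{
    if (data[$0]++ == 0)
        lines[++count] = $0
}

# After all input has been read, print the stored lines in order.
END {
    for (i = 1; i <= count; i++)
        print lines[i]
}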
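For readers without Cygwin, the same order-preserving algorithm as the awk script can be sketched in Python. The function name `dedup_lines` is hypothetical, and how to register it as a TextPad tool is not covered here. A set plays the role of the `data` array (lines already seen) and a list plays the role of `lines` (first occurrences in input order).

```python
def dedup_lines(lines):
    """Return the lines with later duplicates removed, first occurrences in order."""
    seen = set()      # like data[] in the awk script
    unique = []       # like lines[] in the awk script
    for line in lines:
        if line not in seen:      # like data[$0]++ == 0
            seen.add(line)
            unique.append(line)   # like lines[++count] = $0
    return unique
```

As a filter, it could be applied to standard input with `sys.stdout.writelines(dedup_lines(sys.stdin))` after `import sys`.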
Jeff Rozycki

Re: Delete duplicated lines

Post by Jeff Rozycki »

Search the forum for "dedup": in another thread I posted a corrected awk script for removing duplicate lines, along with instructions for running it from TextPad.