Delete duplicated lines

Posted: Wed May 09, 2001 6:21 am
by Luoji
Hello,

It seems that duplicated lines can only be deleted when you sort a file. I tried to do it using regular expressions, but I didn't succeed:

replace:
\(^.*\n\)\([.]*\)\1
by:
\1\2

This reports the following error message:

Unmatched '( or }'

Does someone have another idea?

Regards,

Luoji.
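
For readers who can run a script outside the editor, here is a minimal sketch of the backreference idea in Python's `re` module rather than TextPad's regex dialect. It only collapses runs of identical consecutive lines, which is a narrower job than the expression above attempts, and it does not explain or fix the TextPad error. The function name `collapse_adjacent_duplicates` is made up for illustration.

```python
import re

# A whole line (including its newline) followed by one or more exact repeats.
# With re.MULTILINE, "^" anchors at the start of every line.
ADJACENT_DUPLICATES = re.compile(r"^(.*\n)\1+", re.MULTILINE)


def collapse_adjacent_duplicates(text: str) -> str:
    """Keep only the first line of each run of identical consecutive lines.

    Assumption: every line ends with a newline. A final line without one
    is never treated as a duplicate of the line before it.
    """
    return ADJACENT_DUPLICATES.sub(r"\1", text)


if __name__ == "__main__":
    sample = "apple\napple\napple\nbanana\napple\n"
    print(collapse_adjacent_duplicates(sample), end="")
```

Because only neighbouring repeats are matched, the last "apple" in the sample survives. Duplicates that are not adjacent still require sorting first, or a different technique.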

Re: Delete duplicated lines

Posted: Wed May 09, 2001 2:21 pm
by Roy Beatty
Yes, it would be nice to have a Sort option to "Delete subsequent lines with duplicate keys." Then you could add a padded index column, run the sort on a key other than the index column, and then sort on the index column to restore the starting positions of your records.

I don't see a way to do this, but it would be a great enhancement. You could then use it to massage a data file, culling duplicate keys before loading it into a SQL table with a no-duplicates constraint.

If you find a way, please post your method!

Roy
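
The index-column procedure described in the post above can be sketched outside the editor. The following is a rough Python illustration, not a TextPad feature: the function name `drop_duplicate_keys`, the `key` callback, and the comma-separated sample records are all assumptions made for the example.

```python
from typing import Callable, List


def drop_duplicate_keys(lines: List[str], key: Callable[[str], str]) -> List[str]:
    """Delete subsequent lines with duplicate keys, keeping the original order."""
    # Step 1: add an index column that records each line's starting position.
    indexed = list(enumerate(lines))

    # Step 2: sort on the key. Python's sort is stable, so lines that share
    # a key stay in their original relative order.
    by_key = sorted(indexed, key=lambda pair: key(pair[1]))

    # Step 3: keep only the first line of each group of equal keys.
    kept = []
    previous_key = None
    for index, line in by_key:
        current_key = key(line)
        if not kept or current_key != previous_key:
            kept.append((index, line))
        previous_key = current_key

    # Step 4: sort on the index column to restore the starting positions,
    # then drop the index again.
    kept.sort(key=lambda pair: pair[0])
    return [line for _, line in kept]


if __name__ == "__main__":
    # Hypothetical records whose key is the first comma-separated field.
    records = ["42,alice", "17,bob", "42,carol", "08,dave", "17,erin"]
    for record in drop_duplicate_keys(records, key=lambda line: line.split(",")[0]):
        print(record)
```

With the sample records, the lines for carol and erin are dropped because keys 42 and 17 have already appeared, and the survivors come out in their original order.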

Re: Delete duplicated lines

Posted: Wed May 30, 2001 5:42 pm
by Jeff Rozycki
I have an awk script which will delete duplicate lines, but I am having a problem setting up TextPad to run it as a tool. For now, I have to save the file I want to dedup and run the awk script from a Cygwin Bash shell.
How do I set up TextPad to run the awk script? Any ideas? Here is the awk script:

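#! D:/Applications/Cygwin/bin/awk -f
# Runs for every input line: store a line only the first time it is seen.
{
    if (data[$0]++ == 0)
        lines[++count] = $0
}

# After all input has been read, print the stored lines in their original order.
END {
    for (i = 1; i <= count; i++)
        print lines[i]
}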
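
For comparison, the approach the awk script is aiming for (remember each line the first time it appears and emit lines in first-seen order) might look like this in Python. This is a sketch only, and the name `unique_lines` is invented for the example. It does not address the question of wiring a script into TextPad as a tool.

```python
from typing import Iterable, Iterator


def unique_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line once, in order of first appearance."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line
```

Used as a filter, it could be driven with `sys.stdout.writelines(unique_lines(sys.stdin))` after `import sys`. Unlike the awk version, it writes each line as soon as it is first seen instead of buffering everything until the end of input.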

Re: Delete duplicated lines

Posted: Fri Jun 01, 2001 1:37 pm
by Jeff Rozycki
Search the forum for "dedup". In another thread, I posted a corrected awk script to remove duplicate lines, along with how to implement it.