How to delete a duplicate line (all instances)

Post by **erpankajgupta** » Tue Jul 29, 2008 8:39 am

Hi,
I have a file that looks like this:

A12345
A123
B23456
A12345
B23456
A2345
A123456
A12345
B23456

In the above file, A12345 and B23456 each occur more than once. I want to delete every instance of any line that has one or more duplicates, including the first occurrence, rather than keeping one copy.
The expected result is as below:

A123
A2345
A123456

(Please note that the expected result does not contain any instance of "A12345" or "B23456".)

Is this possible with macros? If so, please give me the detailed steps.

Thanks & regards,
Panki

Post by **ben_josephs** » Tue Jul 29, 2008 11:06 am

If you were happy to sort the records of the file you could sort them and delete duplicates, but that would leave one copy of each duplicated record.

TextPad macros aren't powerful enough to do this. You need a proper scripting language, but TextPad doesn't provide one. If you use Perl you can use this script that takes two passes over the input to do it:

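```perl
use strict ;
use warnings ;

# First pass: read the whole input and count how often each line occurs.
my @file = <> ;

my %counts ;

for my $line ( @file )
{
  $counts{ $line } ++ ;
}

# Second pass: print only the lines that occur exactly once,
# in their original order.
for my $line ( @file )
{
  if ( $counts{ $line } == 1 )
  {
    print $line ;
  }
}
```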

You can run this on the command line or as a TextPad tool.

Post by **Bob Hansen** » Wed Jul 30, 2008 4:06 am

You could also sort them, but clear the checkmark that deletes duplicates, and then manually delete ALL copies of each duplicated line instead of letting the sort remove them.

If you only have the two duplicated codes, as in your example, this will be easy to do.

To keep the original order, you could insert a leading incrementing sequence number on each line before sorting (zero-padded to a fixed width, so that the text starts at the same column on every line). Then sort on the column position just after the number, manually delete the duplicated lines, re-sort from column 1 to put the remaining lines back into their original order, and finally remove the leading sequence numbers.
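For anyone who would rather script those steps than do them by hand, here is a rough Perl sketch of the same number / sort / delete / re-sort idea. The sub name `drop_duplicated_lines` is made up for illustration, and the sample data is the list from the first post; treat it as one possible way to do it, not a tested TextPad tool.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): drop every line
# that occurs more than once, following the manual steps described above.
sub drop_duplicated_lines {
    my @lines = @_;

    # Step 1: tag each line with its original position.
    my $n = 0;
    my @numbered = map { [ $n++, $_ ] } @lines;

    # Step 2: sort on the text, so identical lines end up adjacent.
    my @sorted = sort { $a->[1] cmp $b->[1] } @numbered;

    # Step 3: walk the groups of identical lines and keep only
    # the groups that contain exactly one line.
    my @kept;
    my $i = 0;
    while ( $i < @sorted ) {
        my $j = $i;
        $j++ while $j < @sorted && $sorted[$j][1] eq $sorted[$i][1];
        push @kept, $sorted[$i] if $j - $i == 1;
        $i = $j;
    }

    # Step 4: re-sort on the position to restore the original order,
    # then drop the position tags again.
    return map { $_->[1] } sort { $a->[0] <=> $b->[0] } @kept;
}

# Sample data from the first post.
my @sample = qw(A12345 A123 B23456 A12345 B23456 A2345 A123456 A12345 B23456);
print "$_\n" for drop_duplicated_lines(@sample);
```

With the sample list this should print A123, A2345 and A123456, one per line, in their original order.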

Post by **aznap** » Wed Jul 30, 2008 5:18 am

I wonder if there is a way to sort the list and then do a search and replace?

In the help file it says:

For example \(tu\) \1 matches the string "tu tu".

I couldn't make it work by sorting a list and then searching for the same pattern on consecutive lines. However, I stripped out trailing spaces, replaced all \n with |||, and then in the Find what box entered
\(.+|||\)\1
and in the Replace with box entered nothing.

That got rid of doubles.

When the duplicates are gone, replace ||| with \n again to put the list items back on separate lines.

To get rid of triples, put in the Find what box
\(.+|||\)\1\1
and you could add more "\1"s for more repeats.

I suppose if you knew there would never be more than X repeats, you could create a macro that first got rid of trailing spaces, then sorted the list, then replaced \n with |||, then did a find/replace for X repeats, then a find/replace for X-1 repeats, then for X-2 repeats, and so on, until you are deleting only doubles, and then ended with the replace of ||| with \n.

There might be a problem if the expected maximum of repeats is some large number?
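Outside TextPad, a regex engine that allows a repeat count on a backreference avoids the "maximum of X repeats" problem. Here is a minimal Perl sketch, assuming the list is sorted first so that identical lines are adjacent; the sub name `drop_runs_after_sort` is invented for this example. Note that, like the sort-based approach above, it returns the surviving lines in sorted order, not in their original order.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): sort the lines,
# then delete every run of two or more identical consecutive lines
# with a single substitution.
sub drop_runs_after_sort {
    my @lines = @_;

    # Strip trailing whitespace (including any newline), as in the
    # manual procedure, then sort and rejoin one line per row.
    s/\s+\z// for @lines;
    my $text = join '', map { "$_\n" } sort @lines;

    # (.*\n) captures one whole line including its newline;
    # \1+ then demands one or more further copies of that exact line.
    # The whole run, first copy included, is replaced with nothing.
    $text =~ s/^(.*\n)\1+//mg;

    return split /\n/, $text;
}

# Sample data from the first post.
my @sample = qw(A12345 A123 B23456 A12345 B23456 A2345 A123456 A12345 B23456);
print "$_\n" for drop_runs_after_sort(@sample);
```

Because the newline is part of the captured group, a short line such as A123 cannot be mistaken for the start of a longer one such as A12345.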

Maybe there is a better way? Like maybe marking the repeated lines so that they all start with the same character(s), then re-sorting the list so all of those lines fall together so they can be manually deleted?

Hope this helps.
