Finding repeated lines

General questions about using TextPad

Henrique Serra

Finding repeated lines

Post by Henrique Serra »

Hi. I need to find and eliminate duplicate lines in a file with more than 5000 lines. I know this can be partly achieved with the "sort" + "eliminate duplicate lines" command. However, what I need to do is slightly different.

Wherever there is a duplicate line I need to eliminate both the duplicate AND the original line. Since the file is very large and the strings quite long, it is a rather tedious process to do it by hand.
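
To make the requirement concrete, here is a minimal sketch outside TextPad, in Python. It assumes the file fits in memory and that lines are compared exactly (no trimming or case folding). The function name `drop_repeated_lines` and the sample data are made up for illustration.

```python
from collections import Counter


def drop_repeated_lines(lines):
    """Return only the lines that occur exactly once, in their original order."""
    counts = Counter(lines)  # how many times each distinct line appears
    return [line for line in lines if counts[line] == 1]


sample = ["apple", "banana", "apple", "cherry", "banana", "banana", "date"]
print(drop_repeated_lines(sample))  # ['cherry', 'date']
```

Unlike "sort" + "eliminate duplicate lines", this drops every copy of a repeated line, including the first one, and it does not require sorting.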

I've tried to understand the RE syntax for "find and replace" but could not come up with a solution. Can anybody help?

Henrique Serra
serra@cpd.ufmt.br
Ed Orchard

Re: Finding repeated lines

Post by Ed Orchard »

Are there only ever pairs of identical lines? If so, there is a (clunky) solution.
If there could be more than two identical lines, it won't work.

1. Select a character that is unused in the file (¬ in this example).
2. Sort without deleting duplicates.
3. Combine pairs of lines:
   Search for: ^\(.*\)\n\(.*\)$
   Replace all with: \1¬\2
4. Mark repeating pairs:
   Search for: ^\([^¬]*\)¬\1$
   Mark All
5. Delete the marked lines:
   Edit/Cut other/Bookmarked lines
6. Remove the ¬:
   Search for: ¬
   Replace all with: \n
7. Add a dummy line at the start of the file.
8. Repeat steps 3 to 6. The dummy line shifts the pairing by one line, so duplicates that straddled two pairs on the first pass now land in the same pair.
9. Remove the dummy line.

Voilà.
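
The logic of the steps above can be sketched in Python to show why the second pass with a dummy line is needed. This is only an illustration of the pairing idea, not a TextPad macro: it works on a list of lines instead of regular expressions, and the names `drop_equal_pairs` and `remove_pairs` are made up for the example. Like the procedure itself, it assumes a line never appears more than twice.

```python
def drop_equal_pairs(lines):
    """Steps 3 to 6: pair up lines 1+2, 3+4, ... and drop pairs whose halves match."""
    kept = []
    for i in range(0, len(lines), 2):
        pair = lines[i:i + 2]
        if len(pair) == 2 and pair[0] == pair[1]:
            continue  # identical pair: remove both the original and the duplicate
        kept.extend(pair)
    return kept


def remove_pairs(lines):
    ordered = sorted(lines)                 # step 2: sort, keeping duplicates
    first = drop_equal_pairs(ordered)       # first pass over pairs (1,2), (3,4), ...
    dummy = object()                        # step 7: dummy entry that equals no line
    second = drop_equal_pairs([dummy] + first)  # step 8: pairing shifted by one
    return second[1:]                       # step 9: remove the dummy again


print(remove_pairs(["b", "a", "c", "b", "d", "d"]))  # ['a', 'c']
```

In the sample, the two "d" lines fall into the same pair on the first pass, while the two "b" lines straddle two pairs and are only caught after the dummy line shifts the pairing. The result comes back sorted, as it does in TextPad. With three identical lines one copy survives, which is the limitation mentioned at the top of this post.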
Henrique Serra

Re: Finding repeated lines

Post by Henrique Serra »

Yes, Ed, your answer completely addressed the issue. Your clever solution works great. Thank you so very much for your help.

Henrique Serra