Deleting Duplicate Lines Using Tagged Expressions

General questions about using TextPad

Adrian
Posts: 2
Joined: Tue Jul 03, 2012 3:52 pm

Post by Adrian »

Hello,

I would like to remove duplicate lines from a text file. For example, if my input file looks like this:

line1
row2
row2
row3
line4

I would like to generate output like this:

line1
row2
row3
line4
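
In the example, every duplicate sits directly below its original. Outside TextPad, that transformation can be sketched in a few lines of Python. This is only an illustration, assuming lines are separated by `\n` and that only adjacent repeats should be removed; the function name is made up for the example:

```python
from itertools import groupby


def drop_adjacent_duplicates(text: str) -> str:
    """Collapse each run of identical consecutive lines into one line."""
    lines = text.split("\n")
    # groupby() yields one (key, group) pair per run of equal items,
    # so keeping only the keys keeps one copy of each run.
    return "\n".join(line for line, _run in groupby(lines))


print(drop_adjacent_duplicates("line1\nrow2\nrow2\nrow3\nline4"))
```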

Using Google, I found a short regular expression for this. I enabled POSIX regular expression syntax and tried to replace

 ^(.*)\n\1 
by

 \1 

Unfortunately, I received the error "Invalid regular expression". I tried a very similar expression in a different editor and it worked. What am I doing wrong?
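
For comparison, the pattern does work in engines whose back-references may follow a newline. A sketch using Python's `re` module is below. Two changes to the original pattern are deliberate: a trailing `$` so that only whole lines are compared, and `(\n\1)+` so that three or more identical lines collapse in a single pass. The function name is hypothetical.

```python
import re

# With re.MULTILINE, ^ and $ match at every line boundary, and "." never
# matches "\n", so (.*) captures exactly one line. The back-reference \1
# is then allowed to follow a newline, which TextPad does not accept.
DUPLICATE_LINES = re.compile(r"^(.*)(\n\1)+$", re.MULTILINE)


def collapse_duplicate_lines(text: str) -> str:
    """Replace each run of identical adjacent lines with a single copy."""
    return DUPLICATE_LINES.sub(r"\1", text)


print(collapse_duplicate_lines("line1\nrow2\nrow2\nrow3\nline4"))
```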

Thank you very much in advance,

Adrian
ak47wong
Posts: 703
Joined: Tue Aug 12, 2003 9:37 am
Location: Sydney, Australia

Post by ak47wong »

TextPad's regular expression engine doesn't allow the use of backreferences in the search string. Your options are:
  1. Use the other text editor you tried.
  2. Try this tool.
  3. Use the Sort function in TextPad (Tools > Sort) and select Delete duplicate lines, provided sorting the file at the same time is acceptable.
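
To illustrate what option 3 produces, here is a rough Python equivalent of sorting with duplicate deletion, next to a variant that keeps the original order for cases where sorting is not acceptable. Both are sketches: the plain code-point ordering is an assumption and may differ from TextPad's own sort settings, and the function names are hypothetical.

```python
def sort_and_dedupe(text: str) -> str:
    """Sort the lines and keep one copy of each distinct line.

    Plain code-point ordering is an assumption here; TextPad's own sort
    options (case sensitivity, sort keys) may order lines differently.
    """
    return "\n".join(sorted(set(text.splitlines())))


def dedupe_keep_order(text: str) -> str:
    """Keep the first occurrence of each line, in the original order."""
    # dict keys preserve insertion order (Python 3.7+), so this drops
    # later repeats without sorting anything.
    return "\n".join(dict.fromkeys(text.splitlines()))


sample = "line1\nrow2\nrow2\nrow3\nline4"
print(sort_and_dedupe(sample))
print(dedupe_keep_order(sample))
```

Note that the order-preserving variant also removes duplicates that are not adjacent.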
ben_josephs
Posts: 2464
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

In fact, TextPad does allow back-references in a search string; it just doesn't allow them to refer back over a newline.

For example,
\<([^ ]+) \1\>
matches repeated words within a line.
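
The same repeated-word pattern can be tried in Python's `re` module, which has no `\<` or `\>` anchors; the sketch below substitutes `\b` for both, which is an approximation rather than an exact translation. The function name is hypothetical.

```python
import re

# \b stands in for TextPad's \< (start of word) and \> (end of word).
# [^ \n]+ keeps each "word" on a single line, mirroring the rule that
# TextPad back-references cannot reach back over a newline.
REPEATED_WORD = re.compile(r"\b([^ \n]+) \1\b")


def find_repeated_words(text: str) -> list[str]:
    """Return each word that is immediately repeated after a single space."""
    return REPEATED_WORD.findall(text)


print(find_repeated_words("this is is a test test."))
```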
Adrian
Posts: 2
Joined: Tue Jul 03, 2012 3:52 pm

Post by Adrian »

Thank you very much for the fast response. Now that I know why my expression failed, I was able to solve the problem with a three-step approach:

1. Replace all newlines with a unique string:

Search:  \n
Replace: XXX

2. Replace duplicate "lines":

Search:  XXX(.*)XXX\1
Replace: XXX\1

3. Replace the unique strings with newlines again:

Search:  XXX
Replace: \n

It is a little bit ugly, but it works for me and may help someone else ;-)
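
To see what each of the three steps does, the workaround can be mimicked in Python. This is a sketch, not TextPad behaviour, and it changes a few details on purpose: it uses the control character `\x1f` instead of `XXX` (assumed never to occur in the text), adds a separator before the first line so that line can be compared too, and requires each duplicate to end at a separator. That last change keeps a line when the next line merely starts with the same text, which the unanchored `XXX(.*)XXX\1` pattern does not guarantee. The function name is hypothetical.

```python
import re

# Assumption: the text never contains U+001F (ASCII "unit separator").
# A single character is used instead of "XXX" so that "one line" can be
# written simply as [^\x1f]*, i.e. anything that is not a separator.
SEP = "\x1f"
DUPLICATE_RUN = re.compile(r"\x1f([^\x1f]*)(?:\x1f\1)+(?=\x1f|$)")


def dedupe_three_steps(text: str) -> str:
    # Step 1: newlines -> separator. The extra leading separator lets the
    # first line be compared in the same way as every other line.
    flat = SEP + text.replace("\n", SEP)
    # Step 2: collapse each run of identical "lines" to one copy. The
    # lookahead requires the last repeat to end at a separator (or the end
    # of the text), so only complete lines count as duplicates.
    # SEP is concatenated as a literal character; "\1" is the group.
    flat = DUPLICATE_RUN.sub(SEP + r"\1", flat)
    # Step 3: separators -> newlines again, dropping the leading one.
    return flat[len(SEP):].replace(SEP, "\n")


print(dedupe_three_steps("line1\nrow2\nrow2\nrow3\nline4"))
```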

Kind regards,

Adrian