General questions about using TextPad
by Adrian » Tue Jul 03, 2012 4:23 pm
Hello,
I would like to remove duplicate lines from a text file. Thus, if my input file looks like this:
I would like to generate output like this:
Using Google I found a small regular expression. Consequently, I activated POSIX regular expressions and tried to replace
by
Unfortunately, I received the error "Invalid regular expression". I tried a very similar expression in a different editor and it worked. What am I doing wrong?
Thank you very much in advance,
Adrian
by ak47wong » Tue Jul 03, 2012 5:07 pm
TextPad's regular expression engine doesn't allow the use of backreferences in the search string. Your options are:
Use the other text editor you tried.
Try this tool.
Use the Sort function in TextPad (Tools > Sort) and select "Delete duplicate lines", provided sorting the file at the same time is acceptable.
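For readers without TextPad at hand, the effect of the sort-and-delete-duplicates option can be sketched in Python. This is only an illustration: the function names are made up for this example, and TextPad's own sort order (case handling, locale) may differ from Python's default string ordering. The second function shows an alternative that keeps the original line order, which the Sort dialog does not offer.

```python
def sort_and_dedupe(text: str) -> str:
    """Sort the lines and drop duplicates (roughly what the Sort option does)."""
    lines = text.splitlines()
    # set() removes duplicates; sorted() uses plain code-point order,
    # which is an assumption and may not match TextPad's ordering.
    return "\n".join(sorted(set(lines)))


def dedupe_keep_order(text: str) -> str:
    """Drop duplicate lines but keep the first occurrence of each in place."""
    lines = text.splitlines()
    # dict preserves insertion order, so the first occurrence wins.
    return "\n".join(dict.fromkeys(lines))


if __name__ == "__main__":
    sample = "b\na\nb\nc\na"
    print(sort_and_dedupe(sample))
    print(dedupe_keep_order(sample))
```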
by ben_josephs » Tue Jul 03, 2012 9:34 pm
In fact, TextPad does allow back-references in a search string; it just doesn't allow them to refer back over a newline.
For example,
\<([^ ]+) \1\>
matches repeated words within a line.
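To show what such a within-line back-reference does, here is a rough Python equivalent. Python's re module has no \< and \> anchors, so this sketch substitutes \b (word boundary), which is close but not identical; the function name is invented for the example.

```python
import re

# Approximation of the TextPad pattern \<([^ ]+) \1\>
# \b stands in for the word-start and word-end anchors (an assumption).
REPEATED_WORD = re.compile(r"\b([^ ]+) \1\b")


def find_repeated_words(line: str) -> list[str]:
    """Return each word that is immediately followed by itself on this line."""
    # With exactly one capture group, findall returns the group contents.
    return REPEATED_WORD.findall(line)


if __name__ == "__main__":
    print(find_repeated_words("this is is a test"))
```

Note that "this is" alone is not reported: the leading boundary stops the pattern from starting inside "this", which is the job \< does in the TextPad expression.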
by Adrian » Wed Jul 04, 2012 4:09 pm
Thank you very much for the fast response. Now that I know why my expression failed, I was able to solve the problem with a three-step approach:
1. Replacing all newlines with a unique string.
2. Replacing duplicate "lines":
Search: XXX(.*)XXX\1
Replace: XXX\1
3. Replacing the unique strings by newlines again.
It is a little bit ugly, but it works for me and may help someone else.
Kind regards,
Adrian
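The three steps above can be sketched in Python for anyone who wants to try the idea outside TextPad. This is a sketch under several assumptions, not Adrian's exact procedure: it uses a single control character as the unique string instead of XXX, pads the text with that marker at both ends, adds a lookahead so a short line is not mistaken for the start of a longer one (for example "ab" followed by "abc"), and repeats the replacement until nothing changes, which corresponds to pressing Replace All more than once. Like the search expression in the post, it only removes duplicates that are directly adjacent.

```python
import re

# Hypothetical placeholder for the "unique string"; it must not occur in the text.
SEP = "\x1f"

# SEP, a "line" (captured), SEP, the same line again, and then another SEP.
# The trailing lookahead is an addition to the posted expression.
DUPLICATE = re.compile(SEP + "([^" + SEP + "]*)" + SEP + r"\1(?=" + SEP + ")")


def dedupe_adjacent_lines(text: str) -> str:
    """Remove directly repeated lines using the newline-placeholder trick."""
    if SEP in text:
        raise ValueError("placeholder occurs in the text; choose another one")
    # Step 1: replace newlines by the placeholder (padded at both ends).
    joined = SEP + text.replace("\n", SEP) + SEP
    # Step 2: collapse "SEP line SEP line" to "SEP line" until stable.
    while True:
        collapsed = DUPLICATE.sub(lambda m: SEP + m.group(1), joined)
        if collapsed == joined:
            break
        joined = collapsed
    # Step 3: drop the padding and turn the placeholders back into newlines.
    return joined[1:-1].replace(SEP, "\n")


if __name__ == "__main__":
    print(dedupe_adjacent_lines("a\na\nb\nb\nb\na"))
```

If the duplicates are scattered through the file rather than adjacent, sorting first (or a different method altogether) is needed.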