
Delete Duplicate Lines (without sorting)

Posted: Mon Jan 12, 2004 8:59 pm
by talleyrand
I've been using TP for a few years now and this is the first time I've thought about this feature, so it'll get only somewhat of a vote from me, but I think it'd be nice to have the delete duplicate lines functionality (for a file or highlighted lines) outside of the sort tool. I can always shell out to an external program to accomplish this (which I'll do for now), but in the future, if it's not a bother, I could see it being handy.

Posted: Mon Jan 12, 2004 10:54 pm
by csalsa
This function could easily be written in a scripting language like Perl or Python. Why not write such a script and call it from the 'Tools' menu?

Personally I find Python friendlier than Perl, and Python is compiled to bytecode at runtime.
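
A minimal sketch of what such a script might look like, assuming that "duplicate" means an exact match once the line ending is ignored, and that the first occurrence is the one to keep. The function name `dedupe_lines` and the sample data are made up for illustration:

```python
def dedupe_lines(lines):
    """Yield each line the first time it appears, preserving input order."""
    seen = set()
    for line in lines:
        key = line.rstrip("\r\n")  # compare content only, ignoring the line ending
        if key not in seen:
            seen.add(key)
            yield line


# Hypothetical sample input: "b" and "a" each appear twice.
sample = ["b\n", "a\n", "b\n", "c\n", "a\n"]
result = list(dedupe_lines(sample))
print("".join(result), end="")
```

To run it as a filter, something like `sys.stdout.writelines(dedupe_lines(sys.stdin))` (after `import sys`) should be enough. The set lookup keeps this to a single pass over the file, and no sorting is involved.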

Posted: Mon Jan 12, 2004 11:09 pm
by MudGuard
Full multiline regex-support is much more important.

And with that, it would be a simple replacement:

^(.*)$\n\1\n
by
\1\n
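
For anyone who wants to try this replacement outside the editor, here is a rough equivalent using Python's `re` module. It assumes every line ends with `\n`, and the function name `drop_adjacent_duplicate` is just for illustration. Python's regex flavor is not necessarily the one the editor would use:

```python
import re

# MULTILINE makes ^ and $ match at the start and end of every line.
ADJACENT_DUP = re.compile(r"^(.*)$\n\1\n", re.MULTILINE)


def drop_adjacent_duplicate(text):
    """Apply the replacement once: a line followed by an identical line becomes one line."""
    return ADJACENT_DUP.sub(r"\1\n", text)


print(repr(drop_adjacent_duplicate("a\na\nb\n")))  # adjacent pair is collapsed
print(repr(drop_adjacent_duplicate("a\nb\na\n")))  # non-adjacent copies are left alone
print(repr(drop_adjacent_duplicate("a\na\na\n")))  # a run of three needs a second pass
```

A single pass collapses pairs only, so a run of three identical lines still leaves two behind, and copies separated by other lines are not touched at all.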

Posted: Tue Jan 13, 2004 6:44 am
by Bob Hansen
I think that will only work in one instance, if the duplicate lines are directly in sequence with no other lines in between.

Thoughts:
Assume line 1 has duplicates in the file.

I think that will only work if line 2 is the only duplicate.

What if the duplicate line is line 3 or higher?

What if there are two or more duplicates of line 1 - lines 4,7,15?

Posted: Tue Jan 13, 2004 9:15 am
by MudGuard
Then use

^(.*)$\n(.*\n)?\1\n
by
\1\n\2

If necessary, repeat until no more occurrences exist...
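
A sketch of the "repeat until nothing changes" idea in Python's `re` module, again assuming every line ends with `\n`. Two things here are my own assumptions rather than part of the pattern as posted: the optional group `(.*\n)?` spans at most one line in between, so this version uses a lazy repeated group `((?:.*\n)*?)` to allow any number of lines; and the replacement is written as `\1\n\2` so that a newline separates the kept line from the lines captured in between. The function name is hypothetical:

```python
import re

# Group 1: the line to keep. Group 2: any whole lines between it and its next copy.
DUP_WITH_GAP = re.compile(r"^(.*)$\n((?:.*\n)*?)\1\n", re.MULTILINE)


def delete_duplicate_lines(text):
    """Repeat the substitution until a pass finds nothing left to remove."""
    while True:
        text, count = DUP_WITH_GAP.subn(r"\1\n\2", text)
        if count == 0:
            return text


# Line 1 has copies at lines 4 and 7, as in the question above.
print(repr(delete_duplicate_lines("a\nb\nc\na\nd\ne\na\n")))
```

Each pass removes the later copy and keeps the first occurrence in place, so the original order survives. On large files this is likely to be much slower than a single-pass script, since every pass rescans the text.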

Posted: Tue Jan 13, 2004 6:21 pm
by Bob Hansen
AHA!
Full multiline regex-support
Not used to having that tool here. Thanks MudGuard for a good example.

Posted: Mon Apr 26, 2004 6:06 pm
by iangalbraith
I just picked up on this thread. The ability to use full multiline regex, as in MudGuard's example, would be enormously helpful. De-duping of records in a file is a regular requirement of mine; often hundreds of duplicates are present. (I can always sort to get them adjacent, so their initial order is not important.) The need to write, or even to use, somebody else's code is a pain when in principle a regex find/replace pair could do everything.

Ian