Sorting, erasing multiple lines

Ideas for new features

bluebang
Posts: 4
Joined: Mon Sep 03, 2007 8:38 am

Post by bluebang »

Especially when manipulating big files, a sorting option
'delete all lines with a duplicate occurrence of the same sort key'
would be useful as an extension of
'delete duplicate lines'.

Example:

Peter    England
Julia    France
Hans     Germany
Gerlinde Germany
Margit   Norway
Nils     Norway
Victor   Poland
Marek    Poland
John     USA

Task: derive from this table a list of the 'touched' countries.
Activating this option while sorting on column 9 onward would result in:

Peter    England
Julia    France
Hans     Germany
Margit   Norway
Victor   Poland
John     USA

Searching this forum, I found several questions and suggestions about sorting and extraction that would become obsolete with this extension.
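As a rough illustration, the proposed option can be sketched in Python: a stable sort on the text from column 9 onward, followed by dropping every line whose key equals that of the line before it. The function name `sort_unique_by_column`, the 1-based column convention, trimming the padding from the key, and keeping the first line of each group are assumptions made for this sketch, not TextPad behaviour.

```python
# Sketch of the proposed "delete lines with a duplicate sort key" option.
# Assumptions (not TextPad's actual behaviour): the key is the text from a
# 1-based start column to the end of the line, trimmed of padding; the sort
# is stable; the first line of each group of equal keys is kept.

TABLE = [
    "Peter    England",
    "Julia    France",
    "Hans     Germany",
    "Gerlinde Germany",
    "Margit   Norway",
    "Nils     Norway",
    "Victor   Poland",
    "Marek    Poland",
    "John     USA",
]


def sort_unique_by_column(lines: list[str], start_col: int) -> list[str]:
    """Sort on the text from start_col onward; keep one line per key."""

    def key(line: str) -> str:
        # Trim so that differing amounts of padding do not split a key.
        return line[start_col - 1:].strip()

    result: list[str] = []
    previous_key = None
    for line in sorted(lines, key=key):  # sorted() is stable
        current_key = key(line)
        if current_key != previous_key:
            result.append(line)
            previous_key = current_key
    return result


if __name__ == "__main__":
    for line in sort_unique_by_column(TABLE, 9):
        print(line)
```

Because `sorted()` is stable, Hans is kept rather than Gerlinde only because his line comes first in the input.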
MudGuard
Posts: 1295
Joined: Sun Mar 02, 2003 10:15 pm
Location: Munich, Germany

Post by MudGuard »

But then the question arises: which of the lines with the doubled (tripled/quadrupled/...) sort keys is to be kept?


If you only want to find a list of (unique) country names:

Select the column with the country names (block select), copy it into a new file, and sort it with the option to delete duplicate lines.
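The same workaround can be sketched in Python, assuming the country column runs from column 9 to the end of each line. The helper name `unique_countries` is hypothetical; it stands in for block select, copy and sort. Note that only the key column survives: the rest of each line is lost.

```python
# Sketch of the manual workaround: "block select" the country column,
# then sort it and delete duplicate lines.
# Assumption: the column runs from 1-based column 9 to the end of the line.

TABLE = [
    "Peter    England",
    "Julia    France",
    "Hans     Germany",
    "Gerlinde Germany",
    "Margit   Norway",
    "Nils     Norway",
    "Victor   Poland",
    "Marek    Poland",
    "John     USA",
]


def unique_countries(lines: list[str], start_col: int) -> list[str]:
    # Block select: keep only the column slice of each line.
    column = [line[start_col - 1:].strip() for line in lines]
    # Sort, deleting duplicate lines.
    return sorted(set(column))


if __name__ == "__main__":
    print("\n".join(unique_countries(TABLE, 9)))
```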
bluebang
Posts: 4
Joined: Mon Sep 03, 2007 8:38 am

Post by bluebang »

The algorithm would keep the first entry, which may be first as the result of a previous sort. This list is only meant as an example; there is a lot more redundant data that may be valuable and should remain in the file.
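To illustrate how a previous sort decides which line is kept, here is a sketch using `itertools.groupby` after a stable sort on the key. The helper `first_per_key` and the column-9 key are assumptions for illustration. Sorting the table by name first changes which person is kept for Germany and Poland, while the full lines (names included) stay in the result.

```python
from itertools import groupby

TABLE = [
    "Peter    England",
    "Julia    France",
    "Hans     Germany",
    "Gerlinde Germany",
    "Margit   Norway",
    "Nils     Norway",
    "Victor   Poland",
    "Marek    Poland",
    "John     USA",
]


def first_per_key(lines: list[str], start_col: int) -> list[str]:
    """Stable-sort on the key column and keep the first line of each group."""

    def key(line: str) -> str:
        return line[start_col - 1:].strip()

    # groupby() only merges adjacent equal keys, hence the sort first.
    return [next(group) for _, group in groupby(sorted(lines, key=key), key=key)]


if __name__ == "__main__":
    # Original order: Hans, Margit and Victor are kept.
    print(first_per_key(TABLE, 9))
    # After a previous sort by name: Gerlinde, Margit and Marek are kept.
    print(first_per_key(sorted(TABLE), 9))
```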
nvj1662
Posts: 53
Joined: Thu May 17, 2007 10:02 am

Post by nvj1662 »

I suspect all this can be achieved with a regular expression. If you post your requirement in the regular expression forum, I'm sure one of the regex gurus will give you the answer.
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

TextPad's regular expression recogniser doesn't allow back-references (such as \1) that refer back over a newline. So, within a TextPad regular expression, you can't refer back to text on a previous line that matched a subexpression of the same regular expression.

But you can do it with WildEdit (http://www.textpad.com/products/wildedit/), which uses a far more powerful regex recogniser. Repeatedly run this replacement:
Find what: ([^ ]+ +)(.+)\r?\n[^ ]+ +\2
Replace with: $1$2

[X] Regular expression
[X] Replacement format

Options
[X] '.' does not match a newline character
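For readers without WildEdit, the repeated replacement can be sketched with Python's `re` module, which (unlike TextPad's recogniser) accepts a back-reference across a newline. This is a translation of the pattern above, not WildEdit itself: Python writes the replacement as `\1\2` instead of `$1$2`, and its `.` does not match a newline by default, like the option above. The loop stands in for running the replacement repeatedly and stops when a pass changes nothing. The function name and the extra `Ola` line (made-up data giving three equal keys in a row) are assumptions for illustration.

```python
import re

# The pattern from the post, unchanged. Like the WildEdit option above,
# Python's '.' does not match a newline unless re.DOTALL is given.
PATTERN = re.compile(r"([^ ]+ +)(.+)\r?\n[^ ]+ +\2")


def remove_repeated_keys(text: str) -> str:
    """Apply the replacement until a pass makes no change."""
    while True:
        # Each pass deletes a line whose key equals the key of the line
        # above it; matches cannot overlap, so runs of 3+ need more passes.
        text, count = PATTERN.subn(r"\1\2", text)
        if count == 0:
            return text


if __name__ == "__main__":
    sample = (
        "Hans     Germany\n"
        "Gerlinde Germany\n"
        "Margit   Norway\n"
        "Nils     Norway\n"
        "Ola      Norway\n"
        "John     USA\n"
    )
    print(remove_repeated_keys(sample), end="")
```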
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Or, in WildEdit, run this just once:
Find what: ([^ ]+ +)(.+)(\r?\n[^ ]+ +\2)+
Replace with: $1$2

[X] Regular expression
[X] Replacement format

Options
[X] '.' does not match a newline character
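The single-pass variant translates to Python the same way. In this sketch (function name hypothetical, `Ola` again made-up data) the repeated group `(\r?\n[^ ]+ +\2)+` swallows every following line that carries the same key, so one substitution is enough even for three equal keys in a row.

```python
import re

# Single-pass version: the repeated group consumes every following line
# whose key equals group 2, so no loop is needed.
PATTERN = re.compile(r"([^ ]+ +)(.+)(\r?\n[^ ]+ +\2)+")


def remove_repeated_keys_once(text: str) -> str:
    # Keep only the first line of each run of equal keys.
    return PATTERN.sub(r"\1\2", text)


if __name__ == "__main__":
    sample = (
        "Hans     Germany\n"
        "Gerlinde Germany\n"
        "Margit   Norway\n"
        "Nils     Norway\n"
        "Ola      Norway\n"
        "John     USA\n"
    )
    print(remove_repeated_keys_once(sample), end="")
```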