How to delete lines based on frequency of appearance

Pat Parker · Post by **Pat Parker** » Mon Feb 20, 2012 2:57 am

I have the following text:

§ 508 Harm Done by Indigenous Wild Animal After Its Escape
§ 509 Harm Done by Abnormally Dangerous Domestic Animals
1. A keeps a dog,
2. A keeps a dog
§ 510 Effect of Contributing Actions of Third Persons, Animals and Forces of Nature
1. A keeps on his
2. A keeps in his
3. A is the owner
4. An elephant
§ 511 Liability to Trespassers
§ 512 Liability to Trespassers for Negligence
1. A's pasture
2. A chains a
§ 513 Liability to Licensees and Invitees
§ 514 Harborers of Wild Animals or Abnormally Dangerous Domestic Animals
§ 515 Plaintiff's Conduct
§ 516 Watchdogs
§ 517 Animals Kept in Pursuance of a Public Duty
§ 518 Liability for Harm Done by Domestic Animals That Are Not Abnormally Dangerous
§ 519 General Principle
1. A, with
§ 520 Abnormally Dangerous Activities

Where several consecutive lines start with the § symbol, I want to delete all but the last line of each such group. The results should be as follows:

§ 509 Harm Done by Abnormally Dangerous Domestic Animals
1. A keeps a dog,
2. A keeps a dog
§ 510 Effect of Contributing Actions of Third Persons, Animals and Forces of Nature
1. A keeps on his
2. A keeps in his
3. A is the owner
4. An elephant
§ 512 Liability to Trespassers for Negligence
1. A's pasture
2. A chains a
§ 519 General Principle
1. A, with
§ 520 Abnormally Dangerous Activities

Is this at all possible?

Pat Parker · Post by **Pat Parker** » Mon Feb 20, 2012 4:28 am

I solved it by looking at other examples.

I did the following search:
(^§.*)\n(.*^§)

Then I did Mark All and deleted the marked lines, so that only the relevant sections were left.

ben_josephs · Post by **ben_josephs** » Mon Feb 20, 2012 8:59 am

Thanks for reporting your solution. It's a clever trick which I don't recall ever using.

How it works isn't obvious. It relies on two facts:

1. TextPad only marks the first line of each multi-line match; and

2. each match attempt starts one character after the start of the previous attempt. Thus after a successful match the search continues one character after the start of that match: it doesn't skip to the end of the match.
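
To see why the second fact matters, here is a minimal Python sketch that imitates the behaviour described above. It is not TextPad's implementation: Python's `re` module stands in for TextPad's engine, and the function name `mark_first_lines` and the sample text are made up for illustration. The loop restarts the search one character after the start of each match and records the line on which the match began.

```python
import re

def mark_first_lines(text, pattern):
    """Return 0-based numbers of the lines on which a match of `pattern` starts.

    Assumption: imitates the described Mark All behaviour by restarting
    the search one character after the START of each match, so that
    overlapping matches are all found.
    """
    regex = re.compile(pattern, re.MULTILINE)
    marked = set()
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        # Only the first line of a multi-line match is marked.
        marked.add(text.count("\n", 0, m.start()))
        pos = m.start() + 1
    return sorted(marked)

# Hypothetical sample: three consecutive heading lines, then mixed content.
text = "§ 1 a\n§ 2 b\n§ 3 c\n1. x\n§ 4 d\n"
pattern = r"^§.*\n§"

overlapping = mark_first_lines(text, pattern)

# For contrast: an ordinary scan that resumes at the END of each match.
non_overlapping = [
    text.count("\n", 0, m.start())
    for m in re.finditer(pattern, text, re.MULTILINE)
]

# Lines that survive once the marked lines are deleted.
kept = [line for i, line in enumerate(text.splitlines()) if i not in overlapping]
```

In this sketch the overlapping scan marks both of the first two heading lines, while the ordinary scan marks only the first, because its first match has already consumed the § that begins the second line.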

Note that your use of parentheses without backslashes assumes POSIX regular expression syntax (which is a Good Thing).

The parentheses in your solution serve only to capture matches. But you don't need to capture anything, so they're not necessary. Thus the solution can be simplified to
^§.*\n.*^§

Note that \n.*^ matches

\n            a newline
.*            any text within a line (i.e., not containing a newline) (see below)
^             the beginning of a line

where .* matches:

.             any character other than newline
*             ... any (possibly zero) number of times

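So after a newline, .*^ can match only the empty string: . cannot cross a newline, so the only beginning of a line that .* can reach is the one immediately after the \n. Hence \n.*^ is equivalent to a simple newline: \n.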
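
That equivalence can be checked outside TextPad. The sketch below uses Python's `re` module with `re.MULTILINE`, which is an assumption standing in for TextPad's engine; the sample string is made up and deliberately contains an empty line. It compares where the long and short forms match.

```python
import re

# Hypothetical sample with an empty line and no trailing newline.
sample = "a\nb\n\nc"

# Spans matched by the long form: newline, any text within a line, line start.
long_form = [m.span() for m in re.finditer(r"\n.*^", sample, re.MULTILINE)]

# Spans matched by a bare newline.
short_form = [m.span() for m in re.finditer(r"\n", sample)]

print(long_form == short_form)
```

Both lists hold the same spans, each exactly one character long, so in this engine the `.*^` part never consumes anything.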

So this simpler version of your solution also works:
^§.*\n§
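
For anyone doing the same job in a script rather than in TextPad, here is a hedged Python sketch of the same idea. It does not rely on marking lines. Instead it uses a lookahead, a feature of Python's `re` module, to delete every § line whose next line also starts with §. The function name `keep_last_heading` and the shortened sample text are made up for illustration.

```python
import re

def keep_last_heading(text, marker="§"):
    """Delete each line starting with `marker` whose next line also starts with it.

    The lookahead (?=...) checks the next line without consuming it, so
    that line can itself be tested (and deleted) on the next match.
    """
    m = re.escape(marker)
    pattern = re.compile(rf"^{m}.*\n(?={m})", re.MULTILINE)
    return pattern.sub("", text)

# Hypothetical, shortened version of the text in the first post.
sample = (
    "§ 508 A\n"
    "§ 509 B\n"
    "1. x\n"
    "§ 511 C\n"
    "§ 512 D\n"
    "§ 513 E\n"
)

result = keep_last_heading(sample)
print(result)
```

Because the lookahead consumes nothing, an ordinary left-to-right substitution is enough here, with no need for overlapping matches.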

Pat Parker · Post by **Pat Parker** » Mon Feb 20, 2012 11:53 pm

Ben:

Thanks. I have very little knowledge or experience with regular expressions, so I certainly stumbled into that solution.

One thing that has perplexed me for some time is how to select a range of text spanning multiple lines using a regular expression. For example, in the text below, I want to select the END OF EXAM line, the ID: nnn line (the value will change) and all the lines between them, and delete them, so that all I have left are the some text and more text lines. I currently do this by recording a macro, but would love to know if it's possible with regular expressions.

some text
END OF EXAM
Page 6 of 6
Item 6
ID: 001
more text

ben_josephs · Post by **ben_josephs** » Tue Feb 21, 2012 8:00 am

If the number of lines between the END OF EXAM and ID: nnn lines is fixed at, say, 2, then you can match them all explicitly:
END OF EXAM\n.*\n.*\nID: [0-9][0-9][0-9]\n
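
As a quick check of what that expression consumes, here is a small Python sketch that applies the same pattern to the sample text from the question. Python's `re` module is an assumption standing in for TextPad's engine, and the names `FIXED_BLOCK` and `cleaned` are made up.

```python
import re

# Same pattern as above: the END OF EXAM line, exactly two arbitrary
# lines, then the ID line, each including its trailing newline.
FIXED_BLOCK = re.compile(r"END OF EXAM\n.*\n.*\nID: [0-9][0-9][0-9]\n")

text = (
    "some text\n"
    "END OF EXAM\n"
    "Page 6 of 6\n"
    "Item 6\n"
    "ID: 001\n"
    "more text\n"
)

# Replacing the match with nothing deletes the whole block.
cleaned = FIXED_BLOCK.sub("", text)
print(cleaned)
```

Each `.*\n` accounts for exactly one intermediate line, which is why the count has to be known in advance.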

If the number of lines is not fixed then you can't do this in TextPad using regular expressions alone, as TextPad's regex recogniser is incapable of matching an arbitrary number of newlines. So your solution using a macro might be the best solution.
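
If running the text through a script is an option, a regex engine that can repeat a whole-line group handles the variable case. The following is a minimal sketch in Python, not something TextPad can do. The names `BLOCK` and `cleaned` and the extended sample text are made up for illustration.

```python
import re

# The END OF EXAM line, then as few whole lines as possible, then the ID line.
# (?:.*\n)*? repeats "one full line" lazily, so the match stops at the
# first ID line rather than running on to a later one.
BLOCK = re.compile(r"^END OF EXAM\n(?:.*\n)*?ID: [0-9]{3}\n", re.MULTILINE)

# Hypothetical sample: one block with three lines in between, one with none.
text = (
    "some text\n"
    "END OF EXAM\n"
    "Page 6 of 6\n"
    "Item 6\n"
    "Extra line\n"
    "ID: 001\n"
    "more text\n"
    "END OF EXAM\n"
    "ID: 002\n"
    "last text\n"
)

cleaned = BLOCK.sub("", text)
print(cleaned)
```

The lazy quantifier is the important design choice: a greedy `(?:.*\n)*` would extend to the last ID line in the file and delete the text between the blocks as well.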