Exclude string, not just chars, from search

General questions about using TextPad


gdutoit
Posts: 7
Joined: Tue May 08, 2007 4:01 pm
Location: Cape Town

Post by gdutoit »

I want to find everything between a start and end code, e.g.:
<startcode> - whole lot of text including angle brackets etc - <endcode>

Of course if I search for: <startcode>[^<endcode>]*<endcode>
it's going to stop at any of the characters between [ and ]

How do I make it exclude only the entire string <endcode>?

(I suspect the answer may be elementary, but I'm afraid I haven't been able to find/figure it out.)
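
The behaviour described above can be reproduced outside TextPad. The following is a minimal sketch using Python's re module (an assumption for illustration; it is not TextPad's engine, though bracket expressions behave the same way here). The two sample strings are made up.

```python
import re

# A bracket expression is a set of single characters, so [^<endcode>]
# means "any one character other than <, e, n, d, c, o or >".
pattern = re.compile(r"<startcode>[^<endcode>]*<endcode>")

# Hypothetical sample text: the first body contains the letters o, n, e,
# which are all in the excluded set; the second body contains none of them.
blocked = "<startcode>one<endcode>"
allowed = "<startcode> 1 + 2 = 3 <endcode>"

blocked_match = pattern.search(blocked)   # no match: 'o' stops the repeat
allowed_match = pattern.search(allowed)   # matches the whole string

print(blocked_match)
print(allowed_match.group(0))
```

So the class rejects ordinary letters such as "e" or "o" anywhere in the body, not just the string <endcode>.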
ben_josephs
Posts: 2464
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

If the text to be matched spans an unknown number of lines you can't do that directly in TextPad, as it is incapable of matching text containing an arbitrary number of newlines.

If the text is all on one line, then <startcode>.*<endcode> will match everything from the first <startcode> to the last <endcode>, inclusive, because .* matches greedily; it matches as much as possible.

And <startcode>(.*)<endcode> will match the same thing, while capturing what is between <startcode> and <endcode> so that you can use that captured text in a replacement, where it is represented as \1.

This assumes you are using POSIX regular expression syntax:
Configure | Preferences | Editor

[X] Use POSIX regular expression syntax

If you need to match text that spans an arbitrary number of newlines, you might try WildEdit (http://www.textpad.com/products/wildedit/), which uses a far more powerful regular expression engine than TextPad.
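
As a rough illustration of the greedy match and the \1 capture described above, here is a sketch in Python's re module. This is an assumption for demonstration only: Python is neither TextPad nor WildEdit, and the sample line and the square-bracket replacement are invented.

```python
import re

# Greedy: .* grabs as much as it can, then backs off only as far as needed.
pattern = re.compile(r"<startcode>(.*)<endcode>")

# Hypothetical one-line sample with two coded portions.
line = "before <startcode>a<endcode> middle <startcode>b<endcode> after"

match = pattern.search(line)
whole = match.group(0)   # from the first <startcode> to the last <endcode>
inner = match.group(1)   # the captured text between them

# \1 in the replacement stands for the captured group.
replaced = pattern.sub(r"[\1]", line)

print(whole)
print(inner)
print(replaced)
```

The capture runs from the first start code to the last end code, so everything in between, including the inner <endcode> and <startcode>, ends up in \1.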
gdutoit
Posts: 7
Joined: Tue May 08, 2007 4:01 pm
Location: Cape Town

Post by gdutoit »

However, if the line contains two instances of text between <startcode> and <endcode>, e.g.:

<startcode> text <endcode> blah blah <startcode> more text <endcode>

this greedy search will find the entire line instead of the individual coded portions. That's why I'm looking for a "not <endcode>" option in the search string, similar to, e.g.

{[^{}]*}

to find the individual portions in curly brackets in

{text} text {text} text
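
For comparison, the curly-bracket search works because its delimiters are single characters, so a negated class is enough. A small sketch in Python's re module (assumed here purely for illustration, with the braces escaped to be safe):

```python
import re

# Each delimiter is one character, so [^{}] really does mean
# "anything that is not a delimiter".
brace_pattern = re.compile(r"\{[^{}]*\}")

portions = brace_pattern.findall("{text} text {text} text")
print(portions)
```

The same trick cannot be applied to <endcode>, because a class can only exclude individual characters, never a sequence of them.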
ben_josephs
Posts: 2464
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

You can't do this in TextPad, but you can in WildEdit, with a non-greedy repeat:
<startcode>.*?<endcode>

In WildEdit .*? matches non-greedily; it matches the shortest possible substring that allows the whole expression to match.
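
The difference between the two repeats can be seen in any engine that supports *?. The sketch below uses Python's re module as a stand-in (an assumption; WildEdit uses Boost, but the greedy and non-greedy quantifiers behave alike on this input), applied to the sample line from earlier in the thread.

```python
import re

line = "<startcode> text <endcode> blah blah <startcode> more text <endcode>"

# Greedy: one match covering the whole line.
greedy = re.findall(r"<startcode>.*<endcode>", line)

# Non-greedy: each repeat stops at the first <endcode> that lets the
# whole expression match, so the two coded portions come out separately.
lazy = re.findall(r"<startcode>.*?<endcode>", line)

print(greedy)
print(lazy)
```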
gdutoit
Posts: 7
Joined: Tue May 08, 2007 4:01 pm
Location: Cape Town

Post by gdutoit »

Thanks man, that will certainly save some frustration.

(I'm surprised, though, that an option for non-greedy search or something along the lines of [^"string"]*, where everything between quotes is excluded, isn't standard fare in REs.)
ben_josephs
Posts: 2464
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

They are available in many recent regular expression recognisers. I showed you a non-greedy quantifier (*?). To find a match that doesn't contain text that matches a particular regular subexpression you can use negative lookahead assertions ((?!...)). For example, to solve your problem:
<startcode>(?:(?!<startcode>|<endcode>).)*<endcode>
(This also handles nested occurrences of <startcode>...<endcode> properly.)

Both of these constructs are available in WildEdit.
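
To see what the lookahead version does with nested codes, here is a hedged sketch in Python's re module (assumed for illustration; it is not WildEdit, but it supports the same (?!...) and (?:...) constructs). The nested sample string is invented.

```python
import re

# At every position, (?!<startcode>|<endcode>) checks that neither code
# starts here; only then does the dot consume one character.
tempered = re.compile(r"<startcode>(?:(?!<startcode>|<endcode>).)*<endcode>")
lazy = re.compile(r"<startcode>.*?<endcode>")

# Hypothetical nested sample.
nested = "<startcode> outer <startcode> inner <endcode> tail <endcode>"

tempered_matches = tempered.findall(nested)
lazy_match = lazy.search(nested).group(0)

print(tempered_matches)
print(lazy_match)
```

On this input the lookahead version returns only the innermost pair, whereas the non-greedy version starts at the outer <startcode> and stops at the first <endcode>, pairing codes that do not belong together.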

For functionality to be added to a regular expression recogniser it isn't sufficient that the proposed functionality is convenient. It has to fit into the underlying regular expression concept in such a way that its essential efficiency is maintained.
gdutoit
Posts: 7
Joined: Tue May 08, 2007 4:01 pm
Location: Cape Town

Post by gdutoit »

Thanks again.

I haven't used WildEdit (it seemed to me that a utility like BK ReplacEm, which is free and can do long lists of replacements on multiple files, makes more sense).

But it seems I should give WildEdit a try. Will download the trial version immediately!
gdutoit
Posts: 7
Joined: Tue May 08, 2007 4:01 pm
Location: Cape Town

Post by gdutoit »

Well I gave WildEdit a try, and it's all there in the Help!

Suppose I should've gone there first, and saved you some trouble. (But, in mitigation, I didn't expect WildEdit's functionality to be different from TextPad.)
ben_josephs
Posts: 2464
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

This, posted here in various forms a number of times, may be of interest:

There are many regular expression tutorials on the web, and you will find recommendations for some of them if you search this forum.

A standard reference for regular expressions is

Friedl, Jeffrey E F
Mastering Regular Expressions, 3rd ed
O'Reilly, 2006
ISBN 10: 0-596-52812-4
http://regex.info/

But be aware that the regular expression recogniser used by TextPad is rather weak by the standards of recent tools, so you may get frustrated if you discover a handy trick that doesn't work in TextPad. The recogniser that WildEdit (http://www.textpad.com/products/wildedit/) uses (Boost) is far more powerful.

Edit: updated to 3rd edition.
Last edited by ben_josephs on Thu May 10, 2007 2:39 pm, edited 1 time in total.
MudGuard
Posts: 1295
Joined: Sun Mar 02, 2003 10:15 pm
Location: Munich, Germany

Post by MudGuard »

There is a third ed from last August (which I haven't got, so I can't say whether it is better than 2nd ...)
ben_josephs
Posts: 2464
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

So there is! Thanks. Earlier posting updated.