Exclude string, not just chars, from search
I want to find everything between a start and end code, e.g.:
<startcode> - whole lot of text including angle brackets etc - <endcode>
Of course if I search for: <startcode>[^<endcode>]*<endcode>
it's going to stop at any of the characters between [ and ]
How do I make it exclude only the entire string <endcode>?
(I suspect the answer may be elementary, but I'm afraid I haven't been able to find/figure it out.)
-
ben_josephs
- Posts: 2464
- Joined: Sun Mar 02, 2003 9:22 pm
If the text to be matched spans an unknown number of lines you can't do that directly in TextPad, as it is incapable of matching text containing an arbitrary number of newlines.
If the text is all on one line, then <startcode>.*<endcode> will match everything from the first <startcode> to the last <endcode>, inclusive, because .* matches greedily; it matches as much as possible.
And <startcode>(.*)<endcode> will match the same thing, while capturing what is between <startcode> and <endcode> so that you can use that captured text in a replacement, where it is represented as \1.
This assumes you are using Posix regular expression syntax:
Configure | Preferences | Editor
[X] Use POSIX regular expression syntax
If you need to match text that spans an arbitrary number of newlines, you might try WildEdit (http://www.textpad.com/products/wildedit/), which uses a far more powerful regular expression engine than TextPad.
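As a rough illustration of the greedy match and the \1 capture described above, here is a small sketch in Python. Python's re module is only a stand-in here (it is not the engine TextPad uses), and the sample line and the [...] replacement are made up for the example.

```python
import re

# Hypothetical one-line sample; Python's re stands in for TextPad's engine.
line = "before <startcode> some <b>text</b> <endcode> after"

# (.*) is greedy: it grabs as much as it can before the final <endcode>.
pattern = re.compile(r"<startcode>(.*)<endcode>")

match = pattern.search(line)
whole = match.group(0)   # everything from <startcode> to <endcode>, inclusive
inner = match.group(1)   # only the captured text between the two codes

# \1 in the replacement stands for the captured text.
replaced = pattern.sub(r"[\1]", line)

print(whole)
print(inner)
print(replaced)
```

With only one coded portion on the line, the greedy and non-greedy forms give the same result; the difference shows up when a line holds two or more portions.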
However, if the line contains two instances of text between <startcode> and <endcode>, e.g.:
<startcode> text <endcode> blah blah <startcode> more text <endcode>
this greedy search will find the entire line instead of the individual coded portions. That's why I'm looking for a "not <endcode>" option in the search string, similar to, e.g.
{[^{}]*}
to find the individual portions in curly brackets in
{text} text {text} text
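The difference between excluding characters and excluding a string can be seen in a short Python sketch. Python's re module is an assumption made for illustration only; TextPad's own engine behaves differently in other respects, but bracket expressions work the same way in both.

```python
import re

braces = "{text} text {text} text"
line = "<startcode> text <endcode> blah blah <startcode> more text <endcode>"

# [^{}] excludes two single characters, which is exactly what is wanted here.
brace_parts = re.findall(r"\{[^{}]*\}", braces)

# [^<endcode>] is also just a set of single characters: < e n d c o >.
# The match stalls at the first 'e' or 'o' in the text, so nothing is found.
char_class_parts = re.findall(r"<startcode>[^<endcode>]*<endcode>", line)

# The greedy form swallows the whole line as a single match.
greedy_parts = re.findall(r"<startcode>.*<endcode>", line)

print(brace_parts)
print(char_class_parts)
print(greedy_parts)
```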

-
ben_josephs
- Posts: 2464
- Joined: Sun Mar 02, 2003 9:22 pm
They are available in many recent regular expression recognisers. I showed you a non-greedy quantifier (*?). To find a match that doesn't contain text that matches a particular regular subexpression you can use negative lookahead assertions ((?!...)). For example, to solve your problem:
<startcode>(?:(?!<startcode>|<endcode>).)*<endcode>
(This also handles nested occurrences of <startcode>...<endcode> properly.)
Both of these constructs are available in WildEdit.
For functionality to be added to a regular expression recogniser it isn't sufficient that the proposed functionality is convenient. It has to fit into the underlying regular expression concept in such a way that its essential efficiency is maintained.
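Both constructs mentioned above can be tried in any engine that supports them. The following is a minimal sketch using Python's re module, chosen here purely as an assumption for illustration (it is neither TextPad's recogniser nor the Boost engine in WildEdit); the sample strings are invented.

```python
import re

line = "<startcode> text <endcode> blah blah <startcode> more text <endcode>"
nested = "<startcode> outer <startcode> inner <endcode> tail <endcode>"
multi = "<startcode> first line\nsecond line <endcode>"

LAZY = r"<startcode>.*?<endcode>"
# Each '.' is consumed only if neither code starts at that position.
TEMPERED = r"<startcode>(?:(?!<startcode>|<endcode>).)*<endcode>"

# On a flat line both forms find the two coded portions separately.
lazy_flat = re.findall(LAZY, line)
tempered_flat = re.findall(TEMPERED, line)

# With nesting they differ: the lazy form starts at the outer <startcode>,
# while the lookahead form matches only the innermost pair.
lazy_nested = re.findall(LAZY, nested)
tempered_nested = re.findall(TEMPERED, nested)

# In Python, '.' matches a newline only when re.DOTALL is set.
tempered_multi = re.findall(TEMPERED, multi, re.DOTALL)

print(lazy_flat)
print(lazy_nested)
print(tempered_nested)
print(tempered_multi)
```

Whether "handles nesting properly" should mean the innermost pair, as here, depends on what you want to do with the matches.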
-
ben_josephs
- Posts: 2464
- Joined: Sun Mar 02, 2003 9:22 pm
This, posted here in various forms a number of times, may be of interest:
There are many regular expression tutorials on the web, and you will find recommendations for some of them if you search this forum.
A standard reference for regular expressions is
Friedl, Jeffrey E F
Mastering Regular Expressions, 3rd ed
O'Reilly, 2006
ISBN 10: 0-596-52812-4
http://regex.info/
But be aware that the regular expression recogniser used by TextPad is rather weak by the standards of recent tools, so you may get frustrated if you discover a handy trick that doesn't work in TextPad. The recogniser that WildEdit (http://www.textpad.com/products/wildedit/) uses (Boost) is far more powerful.
Edit: updated to 3rd edition.
Last edited by ben_josephs on Thu May 10, 2007 2:39 pm, edited 1 time in total.
-
ben_josephs
- Posts: 2464
- Joined: Sun Mar 02, 2003 9:22 pm