Regex bug with lookahead AND lookbehind
Posted: Thu Aug 22, 2013 2:38 am
by jeffy
I'm pretty sure I found a regex-related bug here. Could someone please confirm? I'm using TextPad 7.0.9.
Put the cursor somewhere in the middle word and search for '$' down (regex, without the quotes). The cursor goes to the end of the line. Then search for '(?<=\S)\t(?=\S)' down. It wraps around and finds the first tab (a tab between two non-whitespace characters).
Now do it again, but this time search for '(?<=\S)\t(?=\S)' up. It doesn't work (says not found).
However, if you put the cursor at the start of the line (actually, I think anywhere before the tab itself) and then search for the same thing, it works.
Thanks for checking!
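For reference, what the pattern should match can be checked outside the editor. This is only a minimal sketch in Python, whose `re` module is a different engine from the one in the editor; the sample line is made up for illustration.

```python
import re

# Hypothetical sample line: "a", TAB, "b", SPACE, TAB, "c", TAB, "d".
line = "a\tb \tc\td"

# A tab with a non-whitespace character on both sides.
pattern = re.compile(r"(?<=\S)\t(?=\S)")

# Offsets of every qualifying tab. The tab at offset 4 follows a space,
# so the look-behind rejects it.
offsets = [m.start() for m in pattern.finditer(line)]
print(offsets)  # [1, 6]
```

The set of matches does not depend on search direction, so an upward search from the end of this line would be expected to stop at offset 6.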
Posted: Thu Aug 22, 2013 6:58 pm
by ben_josephs
You're right. In fact it appears that all look-behind expressions fail when searching backwards. I suspect that fixing this would be a non-trivial job.
Negative look-behind ( (?<!...) ) appears not to work at all. It always matches, so the regex as a whole behaves as if the look-behind assertion wasn't there.
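To make "as if the look-behind assertion wasn't there" concrete, here is a small sketch of what a negative look-behind is expected to do, again in Python's `re` (an assumption for illustration, not the editor's engine) and on a made-up line.

```python
import re

# Hypothetical sample line: "a", TAB, "b", SPACE, TAB, "c".
line = "a\tb \tc"

# A tab that is NOT preceded by a non-whitespace character.
with_assertion = [m.start() for m in re.finditer(r"(?<!\S)\t", line)]

# The same pattern with the assertion removed.
without_assertion = [m.start() for m in re.finditer(r"\t", line)]

print(with_assertion)     # [4]
print(without_assertion)  # [1, 4]
```

The behaviour reported above corresponds to getting the second list when the first one was asked for.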
Posted: Fri Aug 23, 2013 5:05 pm
by jeffy
I'm glad I'm not the only one. Thanks.
I wonder why it's a non-trivial change, although I expect the explanation is non-trivial.

Posted: Mon Oct 07, 2013 2:51 pm
by jeffy
Another example of this problem I just encountered:
Code:
(?<=[ \t])\bIIMeta\b\s*\(\s*\b(\w+)(|(?:<[?\w ]+>)|(?:<[^<]*<[?\w ]+>[^>]*>)|(?:<[^<]*<[^<]*<[?\w ]+>[^>]*>[^>]*>))\s+(\w+)\b(?!>)\s*\)
This finds a Java function signature with exactly 1 parameter (including up to three levels of generics after the type). It should find either of these:
Code:
public IIMeta(IIMeta ii_toCopy) {
public IIMeta(String s_instanceName) {
specifically selecting only "IIMeta(...)", but it doesn't work.
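As a cross-check that the pattern itself is sound, it can be run against the two lines in another engine. The sketch below uses Python's `re`, which is an assumption for illustration only; the third sample line with a generic parameter is hypothetical and not from the post.

```python
import re

# The regex from the post, split over several literals for readability only.
SIGNATURE = re.compile(
    r"(?<=[ \t])\bIIMeta\b\s*\(\s*\b(\w+)"
    r"(|(?:<[?\w ]+>)|(?:<[^<]*<[?\w ]+>[^>]*>)"
    r"|(?:<[^<]*<[^<]*<[?\w ]+>[^>]*>[^>]*>))"
    r"\s+(\w+)\b(?!>)\s*\)"
)

lines = [
    "public IIMeta(IIMeta ii_toCopy) {",
    "public IIMeta(String s_instanceName) {",
    "public IIMeta(List<String> names) {",  # hypothetical, one level of generics
]

# The matched text excludes the leading space because it is only asserted
# by the look-behind, not consumed.
matches = [SIGNATURE.search(line).group(0) for line in lines]
for text in matches:
    print(text)
```

In this engine each line yields only the "IIMeta(...)" part, which suggests the failure is in how the editor applies the pattern rather than in the pattern.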
However, removing the back-reference (despite selecting the initial whitespace character, which is what I'm trying to avoid) does work:
Code:
[ \t]\bIIMeta\b\s*\(\s*\b(\w+)(|(?:<[?\w ]+>)|(?:<[^<]*<[?\w ]+>[^>]*>)|(?:<[^<]*<[^<]*<[?\w ]+>[^>]*>[^>]*>))\s+(\w+)\b(?!>)\s*\)
Also, the backreference DOES work if you first click the very top of the document, and then search down, but it only finds the first instance, and then gets stuck again.
Posted: Mon Oct 07, 2013 3:56 pm
by ben_josephs
If you prefix your regex with the modifier (?x) you can add white space without changing the meaning, making the regex (perhaps) somewhat easier to read:
Code:
(?x) (?<=[ \t]) \b IIMeta \b \s* \( \s* \b (\w+) ( | (?:<[?\w ]+>) | (?:<[^<]*<[?\w ]+>[^>]*>) | (?:<[^<]*<[^<]*<[?\w ]+>[^>]*>[^>]*>) ) \s+ (\w+) \b (?!>) \s* \)
For the sake of simplicity, remove the Java template stuff, which isn't required in your example:
Code:
(?x) (?<=[ \t]) \b IIMeta \b \s* \( \s* \b (\w+) \s+ (\w+) \b (?!>) \s* \)
It is now apparent that the word boundary anchors (\b) are redundant, and can be removed:
Code:
(?x) (?<=[ \t]) IIMeta \s* \( \s* (\w+) \s+ (\w+) (?!>) \s* \)
As can the (?!>) look-ahead:
Code:
(?x) (?<=[ \t]) IIMeta \s* \( \s* (\w+) \s+ (\w+) \s* \)
And some parentheses:
Code:
(?x) (?<=[ \t]) IIMeta \s* \( \s* \w+ \s+ \w+ \s* \)
It is now clear that your regex matches the function name and its single parenthesised typed parameter.
I don't see anything wrong.
(By the way, the expression (?<=[ \t]) is a look-behind assertion, not a back-reference.)
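The claim that the removed pieces were redundant can be spot-checked by comparing the first reduced form with the final one. This is a rough sketch in Python's `re`, assuming its (?x) handling is close enough to the editor's for these patterns; the last two sample lines are hypothetical negatives.

```python
import re

# First reduced form and final form, copied from the steps above.
reduced = re.compile(
    r"(?x) (?<=[ \t]) \b IIMeta \b \s* \( \s* \b (\w+) \s+ (\w+) \b (?!>) \s* \)"
)
final = re.compile(r"(?x) (?<=[ \t]) IIMeta \s* \( \s* \w+ \s+ \w+ \s* \)")

samples = [
    "public IIMeta(IIMeta ii_toCopy) {",
    "public IIMeta(String s_instanceName) {",
    "public IIMeta() {",           # hypothetical: no parameter, should not match
    "public XIIMeta(String s) {",  # hypothetical: no space or tab before IIMeta
]

def span(regex, text):
    """Return the (start, end) of the first match, or None if there is none."""
    m = regex.search(text)
    return m.span() if m else None

# One (reduced, final) pair of spans per sample line.
spans = [(span(reduced, s), span(final, s)) for s in samples]
for sample, pair in zip(samples, spans):
    print(sample, pair)
```

Both forms select the same span on the first two lines and reject the other two. A handful of samples is not a proof, but it agrees with the step-by-step argument.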
Posted: Wed Oct 09, 2013 1:40 pm
by bbadmin
The Boost regular expression engine used by TextPad does not support backwards searches. (Its author says it would require a whole new state machine implementation and the dropping of lots of features.) The workaround that TextPad implements is to iterate backwards, a character at a time, and try to match the search pattern forwards from there. This works in most cases, but not with look-behind assertions.
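The workaround described above can be sketched roughly as follows. This is not TextPad's code: it is a Python illustration, and it rests on the assumption that each forward attempt cannot see the text before its starting offset. That assumption is not confirmed in this thread, but it would produce both symptoms reported earlier. The function names and the sample line are hypothetical.

```python
import re

def search_backward_sliced(pattern, text, start):
    """Step backward from start; match forward on a slice beginning at each offset."""
    regex = re.compile(pattern)
    for pos in range(start, -1, -1):
        m = regex.match(text[pos:])  # assumption: text before pos is hidden
        if m:
            return pos, pos + m.end()
    return None

def search_backward_in_context(pattern, text, start):
    """Same loop, but the whole text stays visible to look-behind assertions."""
    regex = re.compile(pattern)
    for pos in range(start, -1, -1):
        m = regex.match(text, pos)  # pos only sets where matching begins
        if m:
            return m.start(), m.end()
    return None

line = "alpha\tbeta gamma"  # the tab is at offset 5
end = len(line)

positive = r"(?<=\S)\t(?=\S)"  # tab between two non-whitespace characters
negative = r"(?<!\S)\t"        # tab NOT preceded by a non-whitespace character

print(search_backward_sliced(positive, line, end))      # None
print(search_backward_in_context(positive, line, end))  # (5, 6)
print(search_backward_sliced(negative, line, end))      # (5, 6)
print(search_backward_in_context(negative, line, end))  # None
```

With the sliced variant, a positive look-behind at the start of each slice has nothing to look at and always fails, while a negative look-behind always succeeds, which matches "not found" for the first pattern and "always matches" for the second.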