
I Can Consistently Crash TP 7.0.4 with RE

Posted: Wed Apr 24, 2013 5:38 pm
by kengrubb
I can consistently crash TP 7.0.4 using an RE Find, and I have sent the .dmp file to support@textpad.com.

I am fairly adept with RE, and thus far the move to Perl RE isn't stopping me. In fact, it makes a lot of things easier.

I wanted to change the first \ in a line to a tab, and then change the last \ in a line to a tab.

The RE Find syntax that crashes TP 7.0.4 is this:

^([^\]{1,})\\

I wanted to change it to this:

\1\t

I ended up changing \ to ÿ (\xFF), made my changes, then changed all remaining ÿ back to \

Can anyone help me with a better workaround?

Posted: Wed Apr 24, 2013 6:54 pm
by bbadmin
Ken, "^([^\]{1,})\\" is an invalid regular expression, and that is what triggers the crash. The crash itself will be fixed. Any literal "\" must be input as "\\", otherwise it escapes the next character. In this case it causes the closing "]" to be treated as a literal, so the opening "[" is never closed.

Try searching for: "([^\\\r\n]*)\\(.+)\\"
and replacing with: "$1\t$2\t"
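
To see the same failure mode outside TextPad, here is a minimal sketch that uses Python's re module as a stand-in. That is an assumption: TextPad 7 uses the Boost regex library and its error text differs, but Python also reads "\]" inside a character set as a literal "]". The helper name compiles is made up for this illustration.

```python
import re

bad = r"^([^\]{1,})\\"    # "\]" is a literal "]", so the "[" is never closed
good = r"^([^\\]{1,})\\"  # "\\" is a literal backslash, then "]" closes the set

def compiles(pattern):
    """Return True if Python's regex compiler accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(bad))   # False
print(compiles(good))  # True
```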

Posted: Wed Apr 24, 2013 7:04 pm
by ben_josephs
I can reproduce that. The Boost regex library throws an exception: "Unmatched [ or [^ in character class declaration" but TextPad doesn't catch it.

You need to quote the backslash in the character set:
^([^\\]{1,})\\

Your regex requires a matched line to begin with a character that isn't a backslash; I assume that's intentional.

It's equivalent to
^([^\\]+)\\

But negated character sets match newline characters, so the whole regex matches across newlines. You need
^([^\\\r\n]+)\\
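
The newline point can be checked with a small sketch in Python's re module (an assumption, since TextPad uses Boost, but negated sets include newlines there too). The sample text is made up, and re.MULTILINE is used so that ^ matches at the start of each line, as it does in an editor.

```python
import re

# Two lines; only the second one contains backslashes.
text = "no backslash here\nC:\\dir\\file"

loose = re.search(r"^([^\\]+)\\", text, re.MULTILINE)
tight = re.search(r"^([^\\\r\n]+)\\", text, re.MULTILINE)

# The loose set runs over the line break to reach the first backslash.
print(repr(loose.group(1)))  # 'no backslash here\nC:'
# Excluding \r and \n keeps the match on a single line.
print(repr(tight.group(1)))  # 'C:'
```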

The placeholder symbol is now $, not \. So your replacement should be
$1\t

If the lines always contain at least two backslashes, you can do it all in one go:
Find what: ^([^\\\r\n]+)\\(.*)\\
Replace with: $1\t$2\t
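
A rough way to try the one-pass replacement outside TextPad is the Python sketch below. Two things are assumptions here: the sample path is invented, and Python's re.sub writes group references as \1 and \2 where TextPad 7 uses $1 and $2.

```python
import re

line = r"C:\Users\ken\file.txt"       # hypothetical input with three backslashes
pattern = r"^([^\\\r\n]+)\\(.*)\\"    # first backslash, greedy middle, last backslash

# \1 and \2 are the captured groups; \t in the template becomes a tab.
result = re.sub(pattern, r"\1\t\2\t", line, flags=re.MULTILINE)

print(repr(result))  # 'C:\tUsers\\ken\tfile.txt'
```

Because (.*) is greedy, the second literal backslash in the pattern lands on the last backslash in the line, so any backslashes in between are left alone.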

Posted: Wed Apr 24, 2013 7:05 pm
by ben_josephs
Snap!

Posted: Mon Apr 29, 2013 7:56 pm
by kengrubb
Very much appreciated.

So "Not the literal \" is:

[^\\]

Rather than:

[^\]

This actually may clear up one or two similar issues where I was unable to find what I wanted with RE from the past.

Posted: Mon Apr 29, 2013 9:00 pm
by ben_josephs
The regex recogniser used by TextPad before version 7 did not require a backslash in a character set to be quoted with a backslash. So [^\] was the correct expression for any one character that isn't a backslash. (Note that this is not the same as "not a backslash": the expression must match a character.)

The regex recogniser used by TextPad version 7 allows a character set to contain constructs that use backslash as an escape, such as \n and \x41. Therefore if a backslash is being used as a literal it must be quoted. So [^\\] is now the correct expression for any one character that isn't a backslash.
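
As an illustration of escapes inside a character set, here is a short sketch in Python's re module, which follows the same Perl-style convention on this point (an assumption as far as TextPad's exact behaviour goes). The sample strings are made up.

```python
import re

# \x41 is "A" and \n is a newline; both work inside a character set.
in_set = re.findall(r"[\x41\n]", "A-B\nC")

# A literal backslash inside a set has to be written as \\.
not_backslash = re.findall(r"[^\\]", "a\\b")

print(in_set)         # ['A', '\n']
print(not_backslash)  # ['a', 'b']
```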

Note that the backslash has a split personality. With Perl-style regexes (as in TextPad 7) there is a consistent rule. Before an alphanumeric character the backslash escapes the character: it changes it from a literal to something special. Before a non-alphanumeric character the backslash quotes the character: it changes it from something special to a literal.
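
The two roles of the backslash can be seen side by side in this small sketch. It uses Python's re module, which applies the same rule for these cases; treating it as a proxy for TextPad 7 is an assumption.

```python
import re

# Backslash before an alphanumeric character: a literal becomes special.
plain_d = bool(re.fullmatch(r"d", "7"))     # "d" is just the letter d
escaped_d = bool(re.fullmatch(r"\d", "7"))  # "\d" means any digit

# Backslash before a non-alphanumeric character: a special becomes literal.
plain_dot = bool(re.fullmatch(r".", "x"))    # "." means any one character
quoted_dot = bool(re.fullmatch(r"\.", "x"))  # "\." is a literal dot

print(plain_d, escaped_d)     # False True
print(plain_dot, quoted_dot)  # True False
```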