Why RE "137[^\x5d]" found pattern "137]"

Posted: Tue Oct 16, 2007 1:22 am
by tn7077
My file contained the following 2 lines:
1. source\fi_output_msg.c(408): case INTERROGATE_TRK_REQ_FC: /* fUNCTION CODE 137 */
2. source\library_lookup.c(779): 5142, /* itbl_atan[ 0][137] */

I used POSIX regular expression syntax and found the following:
a. RE "137\x5d" found "137]" in line 2 as expected.
b. RE "137[\x5d]" found no occurrence. I expected it to find "137]" in line 2 as in a.
c. RE "137[^\x5d]" found both "137 " in line 1 and "137]" in line 2. I expected it to find only "137 " in line 1.

I wanted to search for "137" not followed by ']'.

Questions
1. Are my expectations correct?
2. Am I missing something?

Artifacts
TextPad version 5.0.3
Windows XP with SP2 installed

Posted: Tue Oct 16, 2007 5:50 am
by ben_josephs
137\x5d matches 137]. You can use the simpler regex 137] for that.
137[\x5d] matches 137 followed by any one of the characters \ x 5 d. This is not what you want.
137[^\x5d] matches 137 followed by any one character that is not one of \ x 5 d. This also is not what you want.
What you want is 137[^]].
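
The three cases can be reproduced outside TextPad. The following is only a sketch in Python's re module, which is not the engine TextPad uses: Python does treat \x5d inside brackets as an escape, so TextPad's reading of the set (the four literal characters \ x 5 d) is spelled here as [\\x5d]. The helper matching_line_numbers is a made-up name for illustration.

```python
import re

# The two lines from the original post.
lines = [
    r"source\fi_output_msg.c(408): case INTERROGATE_TRK_REQ_FC: /* fUNCTION CODE 137 */",
    r"source\library_lookup.c(779): 5142, /* itbl_atan[ 0][137] */",
]


def matching_line_numbers(pattern):
    """Return the 1-based numbers of the lines in which pattern is found."""
    regex = re.compile(pattern)
    return [n for n, line in enumerate(lines, start=1) if regex.search(line)]


# Assumption: [\\x5d] in Python stands in for TextPad's reading of [\x5d],
# i.e. a set holding the four characters \ x 5 d.
literal_set = matching_line_numbers(r"137[\\x5d]")

# The negated set: 137 followed by anything that is not \ x 5 d.
negated_literal_set = matching_line_numbers(r"137[^\\x5d]")

# A ] placed first in the set (after ^) is a literal ].
not_bracket = matching_line_numbers(r"137[^]]")

print(literal_set, negated_literal_set, not_bracket)
```

Note that 137[^]] still requires some character after 137, so a 137 at the very end of a line would not be found.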

Posted: Tue Oct 16, 2007 3:46 pm
by tn7077
Thanks, Ben.

Here's what I learned then:

1. "\" inside [] is a literal "\", not an escape character.

2. It looks to me that
to find "137[", one uses RE "137\["
to find "137]", one uses RE "137]"

Question: when does one need to use "\x<hexdigit><hexdigit>"?
An example would be appreciated.

Thanks again.

Posted: Tue Oct 16, 2007 5:53 pm
by ben_josephs
You need to quote a literal [ to indicate that it's not the beginning of a character class expression. A ] is only special to the right of a [, so one on its own doesn't need to be quoted. A similar rule applies to ( and ). On the other hand, a { is special only when it starts a legal interval operator. This anomaly is the result of the history of regex development.
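
A small sketch of these quoting rules, again using Python's re module as a stand-in rather than TextPad's recogniser (the two behave alike on these particular points, but that is an assumption about TextPad, not something tested there):

```python
import re

text = "5142, /* itbl_atan[ 0][137] */"

# A literal [ has to be quoted ...
open_bracket = re.search(r"\[137", text)

# ... but a ] on its own needs no quoting.
close_bracket = re.search(r"137]", text)

# A { that does not start a legal interval is taken literally.
brace = re.search(r"x{", "x{y}")

# An unquoted [ starts a character class that never ends, so the
# pattern is rejected.
try:
    re.compile(r"137[")
    unquoted_open_ok = True
except re.error:
    unquoted_open_ok = False

print(open_bracket.group(), close_bracket.group(), brace.group(), unquoted_open_ok)
```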

You use the \xdd notation to represent characters that are difficult (for example, most control characters, such as ESC - \x1B) or impossible (for example, NUL - \x00) to represent literally.
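
As an illustration of the \xdd notation, here is a sketch in Python's re module (not TextPad). The sample strings are invented for the example: one holds ANSI colour sequences, which begin with ESC, and the other has an embedded \x00 byte.

```python
import re

# Invented sample: a word wrapped in ANSI colour sequences (each starts with ESC).
colored = "\x1b[31mwarning\x1b[0m"

# \x1b in the pattern matches the ESC control character.
esc_positions = [m.start() for m in re.finditer(r"\x1b", colored)]

# Invented sample with an embedded \x00 character.
record = "abc\x00def"
nul_position = re.search(r"\x00", record).start()

print(esc_positions, nul_position)
```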

When to use hex codes for characters?

Posted: Tue Oct 16, 2007 5:55 pm
by Kaizyn
http://www.regular-expressions.info/quickstart.html

Basically, if you want to match a non-printable character, use the hex codes. Otherwise, they're not needed.

Posted: Tue Oct 16, 2007 6:31 pm
by ben_josephs
But be aware that the regular expression recogniser used by TextPad is very old and rather weak by the standards of recent tools. Much of what you will find at that site and the many other regex sites on the web is not available in TextPad, so you may get frustrated if you discover a handy trick that doesn't work in TextPad. The recogniser that WildEdit (http://www.textpad.com/products/wildedit/) uses (Boost) is far more powerful.

Posted: Tue Oct 16, 2007 7:12 pm
by tn7077
Appreciate the info.