Visual Basic & Regular Expressions

texasmeds2 · Post by **texasmeds2** » Sat Sep 13, 2003 12:08 am

Has anyone made a module, or is there an easy way to transfer TextPad's regular expressions into VB? I am just now learning VB and have a decent knowledge of regular expressions and would hate to let it all go to waste.

Thanks,

Darin

P.S. My other user name of Texasmeds would not let me log in or send me a password. It said I was not supplying the same email address.

s_reynisson · Post by **s_reynisson** » Sat Sep 13, 2003 12:41 am

Google it?
http://www.google.com/search?hl=en&ie=U ... xpressions

texasmeds2 · Post by **texasmeds2** » Sat Sep 13, 2003 1:05 am

Hey, thanks for the rapid reply.

I have done that in the past. The link you posted shows 167,000 results, while adding "textpad" to it knocks that down to 384. Most of what is there is just talk about the text editor of choice, etc.

Maybe I am just missing something or trying to make it harder than it is, but isn't there a difference between basic regular expressions and Textpad's? It sure looks different from what I spent years learning.

Darin

s_reynisson · Post by **s_reynisson** » Sat Sep 13, 2003 1:18 am

I think they're the same all over, esp. if you go to
Configure->Preferences->Editor->Use POSIX...

I liked this link, from the horse's mouth sort of...
http://support.microsoft.com/default.as ... s%3B818802

Bob Hansen · Post by **Bob Hansen** » Sat Sep 13, 2003 1:59 am

VBScript supports many more syntax options than TextPad. You will be able to keep almost all that you have learned with TextPad, but you will have more tools to work with. Especially helpful will be the Positive/Negative LookAhead tools.

Look at this list of available metacharacters:

Character.............Description
===========================================
\ Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character "n". '\n' matches a newline character. The sequence '\\' matches "\" and "\(" matches "(".

^ Matches the position at the beginning of the input string. If the RegExp object's Multiline property is set, ^ also matches the position following '\n' or '\r'.

$ Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position preceding '\n' or '\r'.

* Matches the preceding character or subexpression zero or more times. For example, zo* matches "z" and "zoo". * is equivalent to {0,}.

+ Matches the preceding character or subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.

? Matches the preceding character or subexpression zero or one time. For example, "do(es)?" matches "do" or "does". ? is equivalent to {0,1}.

{n} n is a nonnegative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob," but matches the two o's in "food".

{n,} n is a nonnegative integer. Matches at least n times. For example, 'o{2,}' does not match the "o" in "Bob" and matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.

{n,m} m and n are nonnegative integers, where n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that you cannot put a space between the comma and the numbers.

? When this character immediately follows any of the other quantifiers (*,+, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much of the searched string as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all 'o's.

. Matches any single character except "\n". To match any character including the '\n', use a pattern such as '[\s\S]'.

(pattern) Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0...$9 properties in JScript. To match parentheses characters ( ), use '\(' or '\)'.

(?:pattern) Matches pattern but does not capture the match, that is, it is a non-capturing match that is not stored for possible later use. This is useful for combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'.

(?=pattern) Positive lookahead matches the search string at any point where a string matching pattern begins. This is a non-capturing match, that is, the match is not captured for possible later use. For example, 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000" but not "Windows" in "Windows 3.1". Note the space inside the pattern: the lookahead is tested at the exact position where the preceding match ends, so the space in "Windows 2000" has to be matched before the lookahead. Lookaheads do not consume characters, that is, after a match occurs, the search for the next match begins immediately following the last match, not after the characters that comprised the lookahead.

(?!pattern) Negative lookahead matches the search string at any point where a string not matching pattern begins. This is a non-capturing match, that is, the match is not captured for possible later use. For example 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1" but does not match "Windows" in "Windows 2000". Lookaheads do not consume characters, that is, after a match occurs, the search for the next match begins immediately following the last match, not after the characters that comprised the lookahead.

x|y Matches either x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".

[xyz] A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".

[^xyz] A negative character set. Matches any character not enclosed. For example, '[^abc]' matches the 'p' in "plain".

[a-z] A range of characters. Matches any character in the specified range. For example, '[a-z]' matches any lowercase alphabetic character in the range 'a' through 'z'.

[^a-z] A negative range of characters. Matches any character not in the specified range. For example, '[^a-z]' matches any character not in the range 'a' through 'z'.

\b Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".

\B Matches a nonword boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".

\cx Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range of A-Z or a-z. If not, c is assumed to be a literal 'c' character.

\d Matches a digit character. Equivalent to [0-9].

\D Matches a nondigit character. Equivalent to [^0-9].

\f Matches a form-feed character. Equivalent to \x0c and \cL.

\n Matches a newline character. Equivalent to \x0a and \cJ.

\r Matches a carriage return character. Equivalent to \x0d and \cM.

\s Matches any white space character including space, tab, form-feed, and so on. Equivalent to [ \f\n\r\t\v].

\S Matches any non-white space character. Equivalent to [^ \f\n\r\t\v].

\t Matches a tab character. Equivalent to \x09 and \cI.

\v Matches a vertical tab character. Equivalent to \x0b and \cK.

\w Matches any word character including underscore. Equivalent to '[A-Za-z0-9_]'.

\W Matches any nonword character. Equivalent to '[^A-Za-z0-9_]'.

\xn Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' & "1". Allows ASCII codes to be used in regular expressions.

\num Matches num, where num is a positive integer. A reference back to captured matches. For example, '(.)\1' matches two consecutive identical characters.

\n Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, n is an octal escape value if n is an octal digit (0-7).

\nm Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by literal m. If neither of the preceding conditions exists, \nm matches octal escape value nm when n and m are octal digits (0-7).

\nml Matches octal escape value nml when n is an octal digit (0-3) and m and l are octal digits (0-7).

\un Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).

==========================
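
For anyone who wants to poke at a few of the entries above, here is a minimal sketch. It uses Python's `re` module purely as a stand-in, on the assumption that these particular constructs (non-greedy quantifiers, non-capturing groups, lookahead, backreferences, multiline anchors) behave the same way there as in the VBScript RegExp object; it is not VB code, and the variable names are only illustrative.

```python
import re

# Greedy vs. non-greedy: 'o+' takes every "o", 'o+?' takes as few as possible.
greedy = re.match(r"o+", "oooo").group()
lazy = re.match(r"o+?", "oooo").group()

# A non-capturing group keeps the alternation local without storing a submatch.
words = re.findall(r"industr(?:y|ies)", "industry and industries")

# Lookahead checks what follows without consuming it.
# The space sits inside the pattern so it lines up with "Windows 2000".
pos = re.search(r"Windows (?=95|98|NT|2000)", "Windows 2000")
neg = re.search(r"Windows (?!95|98|NT|2000)", "Windows 2000")
old = re.search(r"Windows (?!95|98|NT|2000)", "Windows 3.1")

# Backreference: '(.)\1' finds two consecutive identical characters.
doubled = re.search(r"(.)\1", "food").group()

# Multiline: ^ also matches right after a line break, not just at the start.
starts = re.findall(r"^\w+", "one two\nthree four", flags=re.MULTILINE)

print(greedy, lazy, words, doubled, starts)
print(pos is not None, neg is None, old is not None)
```

The lookahead lines show the "does not consume" point: the text matched by `pos` is just "Windows " and stops before "2000".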

An excellent source of information is Mastering Regular Expressions, 2nd edition, by Jeffrey E. F. Friedl, published by O'Reilly, copyright 2002. Be sure to check for the 2nd edition. Price about $40 USD.

Try looking at this Microsoft MSDN information (source of syntax info above):
http://msdn.microsoft.com/library/defau ... syntax.asp

Hope this was helpful...............good luck,
Bob

texasmeds2 · Post by **texasmeds2** » Sat Sep 13, 2003 5:52 am

Hey you guys. Thanks so much for the info. It will help a lot I am sure.

Darin

texasmeds2 · Post by **texasmeds2** » Tue Sep 16, 2003 6:08 pm

s_reynisson wrote: I think they're the same all over, esp. if you go to
Configure->Preferences->Editor->Use POSIX...

Hey all,

I did not notice this portion of this post initially, only the link. I am glad you posted this because it's a prime example of what I was trying to say in my initial post of not being able to transport what I know into VB.

When I change the preferences as stated above, this sequence no longer works......

SAMPLE TEXT TO MANIPULATE WITH EXPRESSION:

http://ftp3.ru.freebsd.org/pub/pc/windo ... n/release/
http://www.gfa.net/pub/gfa/gfabasic/win ... /sysinfo2/
http://nctuccca.nctu.edu.tw/ftp/Windows/cygwin/release/
http://www.gfa.net/pub/gfa/gfabasic/win ... unk/hello/
http://www.gfa.net/pub/gfa/gfabasic/win ... ions/mort/
http://www.gfa.net/pub/gfa/gfabasic/win ... s/tsunami/

Find: \(\.[0-9 a-z]+/\).+
Replace: \1

Which would give me this normally.....

http://ftp3.ru.freebsd.org/
http://www.gfa.net/
http://nctuccca.nctu.edu.tw/
http://www.gfa.net/
http://www.gfa.net/
http://www.gfa.net/

but with the options changed (which I'm sure you are right in changing) it does not work. Which leads me right back to my original question, which was.... "Has anyone made a module, or is there an easy way to transfer TextPad's regular expressions into VB?"

Darin

Post by **MudGuard** » Tue Sep 16, 2003 6:36 pm

Of course, if you switch to POSIX syntax, you have to USE POSIX syntax, i.e. remove the \ in front of the ( and )
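
The point about the backslashes can be sketched in code. This is only an illustration in Python (not VB), and it assumes the Find expression in the post above was written with `\(` and `\)` as group delimiters, which is what this reply implies. The helper name `textpad_to_posix` and the example.com URL are hypothetical; the helper handles parentheses only, since that is the one difference named here.

```python
import re

def textpad_to_posix(pattern):
    # Hypothetical helper: swap escaped and bare parentheses, keep the rest.
    def swap(match):
        token = match.group()
        if token in (r"\(", r"\)"):
            return token[1]        # escaped group delimiter -> bare one
        if token in ("(", ")"):
            return "\\" + token    # bare literal paren -> escaped literal
        return token               # any other escape (e.g. \.) is kept
    return re.sub(r"\\.|[()]", swap, pattern)

# Default-syntax expression, with \( \) marking the group to keep.
find_default = r"\(\.[0-9 a-z]+/\).+"
find_posix = textpad_to_posix(find_default)

urls = [
    "http://nctuccca.nctu.edu.tw/ftp/Windows/cygwin/release/",
    "http://www.example.com/pub/tools/",  # hypothetical URL
]

# Keep the last ".xx/" of the host and drop everything after it.
trimmed = [re.sub(find_posix, r"\1", url) for url in urls]

print(find_posix)
for url in trimmed:
    print(url)
```

With the parentheses unescaped, the same expression trims each URL back to its host, which is the result described in the post above.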

texasmeds2 · Post by **texasmeds2** » Tue Sep 16, 2003 8:10 pm

Mudguard,

Thanks for your reply. So bottom line, will my regular expressions (The way I know them) be able to be plugged in to VB as is? Sorry for seeming so anal about this, it's just something I need to know before deciding to go that route with VB.

Also, if the answer is no..... then again I have to ask my original question, which (although there has been some great input which I truly appreciate) has still not been directly answered, and that is.... "Has anyone made a module, or is there an easy way to transfer TextPad's regular expressions into VB?"

Darin