Replacement tags when first tag empty...

daveokeeffe
Posts: 10
Joined: Tue May 17, 2005 1:53 pm

Post by daveokeeffe »

Hi,

I had a big code conversion job to do, and it required TP's regexp, so I decided to write a wee tutorial for the others in my office. Sadly, it didn't turn out as I expected it to.

Basically...

My search expression matches two groups. The first group may or may not be matched.
My replacement expression disregards the first match completely, and sticks the second match onto the end of some plain text.

In the case where the first match is not fulfilled, the second match gets shifted to the first, thus completely destroying my replacement expression!

Does anyone know of a workaround for this? It makes sense that it would happen, but I'd really hoped it wouldn't.

Here's a full copy of my 'tutorial'. Comments and criticisms on it are welcome, if you're willing...

Thanks in advance, I'll not be so bold as to offer $25 for an answer! (some people eh?)

Task:

Replace all references to the public final statics contained within myContainer
(across all files) with defines


Problem:

The statics can be referenced from within the class, outside the class
through the uninstantiated object, outside the class by referencing an
instantiated myContainer object, or outside the class by using an IQ
function that returns the global instance of the class (i.e. getContainer()).

Constructing the Search Expression:

1. Match all vars/functions using a character class: [A-Za-z0-9_]+

2. Lock this character class to the ends of words using the \< and \> anchors: \<[A-Za-z0-9_]+\>

3. Account for the brackets used in functions, making it a 0 or more match: [\(\)]*

4. Putting 2 + 3 together: \<[A-Za-z0-9_]+\>[\(\)]*

5. Make the match require a period immediately after the var/function match:
\<[A-Za-z0-9_]+\>[\(\)]*\.

6. Group this match together so we can make it a zero or more match. This will
allow the full regexp to match the single usage of the vars as well as the
Object.member style reference:
\(\<[A-Za-z0-9_]+\>[\(\)]*\.\)*

7. Use alternation, written \(thingone\|thingtwo\) (the brackets and pipe must be escaped), to
match all the vars:
\(kState_Game\|kState_Attract\|kState_Max\|kCat_Nil\|kCat_Nor\|kCat_For\|kCat_Adv\|kCat_Ers\|kCat_Max\|kRSt_Nor\|kRSt_Erase\|kRSt_EraseEffect\|kRSt_Wait\)

8. Combine match 6 with match 7 to create the full regular expression
\(\<[A-Za-z0-9_]+\>[\(\)]*\.\)*\(kState_Game\|kState_Attract\|kState_Max\|kCat_Nil\|kCat_Nor\|kCat_For\|kCat_Adv\|kCat_Ers\|kCat_Max\|kRSt_Nor\|kRSt_Erase\|kRSt_EraseEffect\|kRSt_Wait\)


Test the above regexp on the test strings below. Strings referencing the members via
objects will match twice, but this doesn't matter as we'll be using a replacement
expression to do the actual work. This means that as soon as one string is matched
and replaced, the second match is removed.
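
As a rough cross-check outside TextPad, steps 1 to 8 can be assembled in Python's re module. This is only a sketch under some assumptions: Python writes groups and alternation without backslashes, \b stands in for the \< and \> anchors, and the engine is not TextPad's recogniser, so results in TextPad may differ. The names NAMES, PREFIX, SEARCH and matched_text are made up for the illustration.

```python
import re

# Step 7: the member names to match (taken from the tutorial's list).
NAMES = [
    "kState_Game", "kState_Attract", "kState_Max",
    "kCat_Nil", "kCat_Nor", "kCat_For", "kCat_Adv", "kCat_Ers", "kCat_Max",
    "kRSt_Nor", "kRSt_Erase", "kRSt_EraseEffect", "kRSt_Wait",
]

# Steps 1-6: an optional, repeatable "object." or "function()." prefix.
# \b is used here in place of TextPad's \< and \> (an assumption of this sketch).
PREFIX = r"(\b[A-Za-z0-9_]+\b[()]*\.)*"

# Step 8: the prefix group followed by the alternation of names.
SEARCH = re.compile(PREFIX + "(" + "|".join(NAMES) + ")")

def matched_text(line):
    """Return the text the expression matches in line, or None if no match."""
    m = SEARCH.search(line)
    return m.group(0) if m else None

print(matched_text("kCat_Nil"))
print(matched_text("myContainer.kState_Max"))
print(matched_text("getContainer().kRSt_Wait"))
print(matched_text("myContainer.weDONTmatchTHIS"))
```

Running the test pieces through a second engine like this is a cheap way to tell a mistake in the expression apart from a quirk of one particular recogniser.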


Constructing the Replacement Expression:

We need access to the name of the member variable we're catching, and we use the match
tags to do this (\1 \2 \x...). These match tags contain the text matched by the groups
in your search expression. Groups are defined by brackets, and we have two groups.
The first is the object match, the second is the variable match.

Replacement expression: CONTAINER_\2


Select the test pieces below and test the expression.



PROBLEM:

The member vars that are referenced without object name are replaced with 'CONTAINER_\2',
showing that the \1 contains what I had hoped would be \2.
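
For contrast, here is a sketch of how the same two groups behave in an engine that numbers groups strictly by the position of their opening bracket. It uses Python's re module as a stand-in, not TextPad's recogniser. There, a group that takes no part in the match is simply unset, and \2 still refers to the variable name. The shortened name list and the helper name convert are assumptions for illustration.

```python
import re

# Two capturing groups, as in the tutorial: (object prefix)*(member name).
# Only three of the member names are listed to keep the sketch short.
SEARCH = re.compile(
    r"(\b[A-Za-z0-9_]+\b[()]*\.)*(kState_Game|kState_Attract|kState_Max)"
)

def convert(line):
    """Replace each reference with CONTAINER_<member name>, using group 2."""
    return SEARCH.sub(r"CONTAINER_\2", line)

m = SEARCH.search("kState_Game")
print(m.group(1))  # the prefix group took no part in this match
print(m.group(2))  # the member name is still group 2

print(convert("kState_Game"))
print(convert("myContainer.kState_Attract"))
print(convert("getContainer().kState_Max"))
```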



Test Pieces:

// a member we don't want to match
weDONTmatchTHIS

// list of members we DO want to match
kState_Game
kState_Attract
kState_Max

kCat_Nil
kCat_Nor
kCat_For
kCat_Adv
kCat_Ers
kCat_Max

kRSt_Nor
kRSt_Erase
kRSt_EraseEffect
kRSt_Wait


// different usages
myContainer.weDONTmatchTHIS

myContainer.kState_Game
myContainer.kState_Attract
myContainer.kState_Max

myContainer.kCat_Nil
myContainer.kCat_Nor
myContainer.kCat_For
myContainer.kCat_Adv
myContainer.kCat_Ers
myContainer.kCat_Max

myContainer.kRSt_Nor
myContainer.kRSt_Erase
myContainer.kRSt_EraseEffect
myContainer.kRSt_Wait



getContainer().weDONTmatchTHIS

getContainer().kState_Game
getContainer().kState_Attract
getContainer().kState_Max

getContainer().kCat_Nil
getContainer().kCat_Nor
getContainer().kCat_For
getContainer().kCat_Adv
getContainer().kCat_Ers
getContainer().kCat_Max

getContainer().kRSt_Nor
getContainer().kRSt_Erase
getContainer().kRSt_EraseEffect
getContainer().kRSt_Wait
ben_josephs
Posts: 2459
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

What you are looking for is a way to group subexpressions without capturing them. The regular expression recogniser used by TextPad provides no way to do this. More modern recognisers do support non-capturing groups, using the syntax (?:...). One example is Helios's own WildEdit (http://www.textpad.com/products/wildedit/), which uses the Boost recogniser.
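
To show what a non-capturing group buys you, here is a minimal sketch in Python, whose re module supports (?:...). It has not been checked against WildEdit or Boost, and the shortened name list and the helper name convert are assumptions. Because the prefix is wrapped in (?:...), the member name becomes the only numbered group and is referenced as \1.

```python
import re

# (?:...) groups the prefix so that * can apply to it, without capturing it.
# The member names are therefore the first (and only) capturing group.
SEARCH = re.compile(
    r"(?:\b[A-Za-z0-9_]+\b[()]*\.)*(kState_Game|kState_Attract|kState_Max)"
)

def convert(line):
    """Replace each reference with CONTAINER_<member name>, using group 1."""
    return SEARCH.sub(r"CONTAINER_\1", line)

for line in ["kState_Game", "myContainer.kState_Attract", "getContainer().kState_Max"]:
    print(convert(line))
```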

A tip to aid your sanity: Use POSIX regular expression syntax:
Configuration | Preferences | Editor

[X] Use POSIX regular expression syntax

Also, if you reverse the order of kRSt_Erase and kRSt_EraseEffect you may avoid a problem that seems sometimes to arise with TextPad's recogniser when you feed it an alternation containing two items that match at the same position, where the first matches a shorter subexpression than the second.
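
The effect of alternation order is easy to see in engines that try alternatives from left to right. The sketch below uses Python's re module, which behaves that way. Whether TextPad's recogniser misbehaves for the same reason is not established here, and the variable names are made up.

```python
import re

text = "kRSt_EraseEffect"

# Shorter alternative first: the engine stops at the first alternative that matches.
short_first = re.match(r"kRSt_Erase|kRSt_EraseEffect", text).group(0)

# Longer alternative first: the whole name is matched.
long_first = re.match(r"kRSt_EraseEffect|kRSt_Erase", text).group(0)

print(short_first)  # kRSt_Erase
print(long_first)   # kRSt_EraseEffect
```

Listing the longer name first costs nothing and sidesteps the question in any engine.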
ben_josephs
Posts: 2459
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

[\(\)]* is probably not what you wanted. It matches a (possibly empty) sequence of open or close parentheses or backslashes. [...] is a quoting operator.
ben_josephs
Posts: 2459
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Here's a partial solution that you can use if you know the maximum number of dotted prefixes. This version assumes a maximum of two. Note the empty element in the alternation. And note that it uses POSIX syntax!
Find what: (|[A-Za-z0-9_]+[()]*\.|[A-Za-z0-9_]+[()]*\.[A-Za-z0-9_]+[()]*\.)(kState_Game|kState_Attract) (etc.)
Replace with: CONTAINER_\2

[X] Regular expression

I don't think you need the \<...\>.

[Edit: corrected typo in regex.]
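
The bounded-prefix idea can be tried outside TextPad as well. This sketch runs the same expression through Python's re module (not TextPad's recogniser), with the list cut to the two names shown above; the names WORD, SEARCH and convert are made up. Since the first group always takes part in the match, even if only through its empty alternative, \2 always refers to the member name.

```python
import re

# One dotted prefix, e.g. "obj." or "func().".
WORD = r"[A-Za-z0-9_]+[()]*\."

# Group 1: no prefix, one prefix, or two prefixes. Group 2: the member name.
SEARCH = re.compile(
    "(|" + WORD + "|" + WORD + WORD + ")(kState_Game|kState_Attract)"
)

def convert(line):
    """Replace each reference with CONTAINER_<member name>, using group 2."""
    return SEARCH.sub(r"CONTAINER_\2", line)

print(convert("kState_Game"))
print(convert("getContainer().kState_Attract"))
print(convert("a.b.kState_Game"))
```

The limit is real: with three or more dotted prefixes, only the last two are consumed and the rest are left in place.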
daveokeeffe
Posts: 10
Joined: Tue May 17, 2005 1:53 pm

Post by daveokeeffe »

Thanks Ben, this is all good stuff. Pity about TextPad's inability to discard groups, but no matter. I get the feeling I should get with the program and use POSIX syntax? I'll give it a go, but I promise nothing :p

You're right about the [\(\)]; plain old \(\) is a more exact match for that.

Thanks for all the suggestions, it's all good info...