Regex replacement bug with accent char? (v7.1.0)

General questions about using TextPad

jeffy
Posts: 323
Joined: Mon Mar 03, 2003 9:04 am
Location: Philadelphia

Post by jeffy »

Could someone please try this regex replacement, and tell me if they have success?

Take this paragraph:

The value of each position is retrieved by `pos.getValue()` or `pos.isPath()`--I think `1` is a "wall" and `0` is the "path". (As an aside: The huge 2d-array should really contain one-bit `booleans`, instead of 4-byte `ints`, but *looking* at the array's code makes sense with `int`s, and doesn't with booleans... Note that it should at least be changed to `byte`s.)
Then try to replace this regex: `\b(\w+)`\b
with this replacement: <CODE>$1</CODE>

It works with replace-all, but I can't get it to work with replace-next: it does nothing. Escaping the accents doesn't help either; then the regex doesn't match anything.

Am I missing something here?
ben_josephs
Posts: 2464
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

\b matches a word boundary, that is, the zero-length string between a word character (anything that matches \w) and a non-word character.
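
To see where those boundaries fall, here is a minimal sketch in Python. It assumes Python's `re` module, which uses the same definition of \b for this ASCII sample; it does not run TextPad's own regex engine, and `sample` is only an illustrative string.

```python
import re

# Illustrative string: a back-ticked word followed directly by a word character.
sample = "`int`s"

# \b is zero-length, so each match is a position rather than a character.
boundaries = [m.start() for m in re.finditer(r"\b", sample)]

# Positions 1 and 4 surround "int"; 5 and 6 surround the trailing "s".
# Position 0 is not a boundary because the string starts with a non-word character.
print(boundaries)  # [1, 4, 5, 6]
```

Position 5, between the closing back-tick and the `s`, is the only kind of place where a \b after a back-tick can succeed.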

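Back-tick (`) is a non-word character. So `\b matches a back-tick only if it's followed by a word character. Most of the closing back-ticks in your paragraph are followed by a space or punctuation, so the trailing \b fails there. Your regex matches the `int` in `int`s and the `byte` in `byte`s, but nothing else. Remove the \b at the end of your regex.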

Also, the \b between ` and \w is redundant, so you can remove that as well. Try this:
`(\w+)`
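
For anyone who wants to check the difference outside TextPad, here is a rough sketch in Python. It is written under a few assumptions: Python's `re` module stands in for TextPad's engine, `excerpt` is a shortened version of the paragraph from the first post, the replacement is written as \1 because Python does not use $1, and `count=1` is only an approximation of replace-next.

```python
import re

# Shortened excerpt of the paragraph from the first post (an assumption for brevity).
excerpt = (
    'I think `1` is a "wall" and `0` is the "path". '
    "Looking at the array's code makes sense with `int`s, "
    "but it should at least be changed to `byte`s."
)

# Pattern from the question: the trailing \b demands a word character
# right after the closing back-tick.
original = re.compile(r"`\b(\w+)`\b")

# Suggested pattern: both \b assertions removed.
fixed = re.compile(r"`(\w+)`")

original_matches = original.findall(excerpt)
fixed_matches = fixed.findall(excerpt)

print(original_matches)  # ['int', 'byte']
print(fixed_matches)     # ['1', '0', 'int', 'byte']

# Replace only the first match, roughly what replace-next would do.
first_replaced = fixed.sub(r"<CODE>\1</CODE>", excerpt, count=1)
print(first_replaced)
```

With the original pattern the first two back-ticked items are skipped entirely, which may be why stepping through matches from the top of the paragraph looked like nothing was happening.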
jeffy
Posts: 323
Joined: Mon Mar 03, 2003 9:04 am
Location: Philadelphia

Post by jeffy »

Goodness. My brain was fried last night. Of course.

I honestly think there's still something funky related to accents in regexes going on, as I was doing other related work last night, but of course that word boundary is in the wrong place.

Thanks!
kengrubb
Posts: 324
Joined: Thu Dec 11, 2003 5:23 pm
Location: Olympia, WA, USA

Post by kengrubb »

ben_josephs is the Mace Windu of RE.
(2[Bb]|[^2].|.[^Bb])

That is the question.