Regex replacement bug with accent char? (v7.1.0)

Posted: Fri Mar 07, 2014 4:37 am
by jeffy
Could someone please try this regex replacement, and tell me if they have success?

Take this paragraph:

The value of each position is retrieved by `pos.getValue()` or `pos.isPath()`--I think `1` is a "wall" and `0` is the "path". (As an aside: The huge 2d-array should really contain one-bit `booleans`, instead of 4-byte `ints`, but *looking* at the array's code makes sense with `int`s, and doesn't with booleans... Note that it should at least be changed to `byte`s.)
And try to replace this: `\b(\w+)`\b
With this: <CODE>$1</CODE>

It works with replace-all, but I can't get it to work with replace-next. It does nothing. Escaping the accents doesn't match anything.

Am I missing something here?

Posted: Fri Mar 07, 2014 8:28 am
by ben_josephs
\b matches a word boundary, that is, the zero-length string between a word character (anything that matches \w) and a non-word character.
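To make the zero-length nature of \b concrete, here is a minimal sketch in Python's re module (an assumption for illustration; it is not the editor discussed in this thread, though \b and \w behave the same way here). It lists every position in a short sample string where \b matches.

```python
import re

# Sample: back-tick, "int", back-tick, "s"
sample = "`int`s"

# \b consumes no characters, so each match is an empty string;
# m.start() is the index of the gap where the boundary sits.
boundaries = [m.start() for m in re.finditer(r"\b", sample)]

print(boundaries)  # [1, 4, 5, 6]
```

There is no boundary at index 0, because the string starts with a back-tick (a non-word character). There is one at index 5, between the closing back-tick and the s, which is the only reason a trailing \b after a back-tick can ever match.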

Back-tick (`) is a non-word character. So `\b only matches a back-tick if it's followed by a word character. Your regex matches the `int` in `int`s and the `byte` in `byte`s, but nothing else, because those are the only closing back-ticks that are followed by a word character. Remove the \b at the end of your regex.

Also, the \b between ` and \w is redundant, so you can remove that as well. Try this:
`(\w+)`
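The difference between the two patterns can be checked with a quick sketch, again in Python's re module rather than the editor from the thread. The sample string is an abridged stand-in for the paragraph above, and the variable names are made up for illustration. Note that Python writes the back-reference as \1 where the thread uses $1.

```python
import re

# Abridged stand-in for the paragraph in the first post (an assumption).
sample = (
    "I think `1` is a wall and `0` is the path. "
    "Use `booleans`, not `ints`; `int`s and `byte`s."
)

original_pattern = r"`\b(\w+)`\b"  # trailing \b needs a word char after the back-tick
fixed_pattern = r"`(\w+)`"         # both \b removed

original_hits = re.findall(original_pattern, sample)
fixed_hits = re.findall(fixed_pattern, sample)

print(original_hits)  # ['int', 'byte']
print(fixed_hits)     # ['1', '0', 'booleans', 'ints', 'int', 'byte']

# Replace-all with the fixed pattern; \1 is Python's spelling of $1.
replaced = re.sub(fixed_pattern, r"<CODE>\1</CODE>", sample)
print(replaced)
```

The original pattern only finds the two spans where a letter directly follows the closing back-tick; the fixed pattern finds every back-ticked word.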

Posted: Fri Mar 07, 2014 1:34 pm
by jeffy
Goodness. My brain was fried last night. Of course.

I honestly think there's still something funky related to accents in regexes going on, as I was doing other related work last night, but of course that word boundary is in the wrong place.

Thanks!

Posted: Fri Mar 07, 2014 4:27 pm
by kengrubb
ben_josephs is the Mace Windu of RE.