Why is character class [a-z] not case sensitive?

General questions about using TextPad

dougwong55
Posts: 13
Joined: Tue Feb 15, 2005 1:00 am

Post by dougwong55 »

The character class [a-z] matches all letters, upper and lower case, unless the Match case box is checked. I would expect it to match only lower-case letters.

Here is the example text:
A
a
B
b
C
c

Here is my RE, Match case was not checked:
^[a-z]

When I do a Find All, all the lines are marked. Why does it happen this way? More than once I've been bitten by this apparent bug. The same thing happens if I use the POSIX syntax.
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

This is not a bug.

The regex [a-z] matches lower-case letters. If Match case is not selected, it still matches lower-case letters but ignores case; that is, it matches both lower-case and upper-case letters. That is as it should be.
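
The same behaviour can be reproduced outside TextPad. The sketch below uses Python's re module as a stand-in (an assumption: TextPad's engine is not Python's, though both follow this convention), with re.IGNORECASE playing the role of an unchecked Match case box:

```python
import re

# Sample text from the original question, one letter per line.
text = "A\na\nB\nb\nC\nc\n"

# Match case unchecked ~ re.IGNORECASE: [a-z] matches either case.
ignoring_case = re.findall(r"^[a-z]", text, re.MULTILINE | re.IGNORECASE)

# Match case checked ~ no IGNORECASE flag: only a, b and c match.
matching_case = re.findall(r"^[a-z]", text, re.MULTILINE)

print(ignoring_case)  # ['A', 'a', 'B', 'b', 'C', 'c']
print(matching_case)  # ['a', 'b', 'c']
```

The pattern is identical in both calls; only the case option changes what it matches.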

POSIX syntax? You are using a very old version of TextPad with a very weak regex engine. The current version of TextPad has a much more powerful (Perl-compatible) regex engine, though Match case behaves in the same, correct way.
dougwong55
Posts: 13
Joined: Tue Feb 15, 2005 1:00 am

Post by dougwong55 »

I would expect [a-z] to match only those characters in the set and thus would not expect it to match A-Z. This has caused me much confusion.

Correct me if I'm wrong, but I thought [:lower:] is the POSIX usage. It's supported in TextPad 8.2.0, 64-bit.
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

[a-z] does match only the characters a to z... unless you've told the recogniser to ignore case, in which case it does exactly that: it ignores case.

If you're using TextPad 8 you can use
(?-i:[a-z])
or
(?-i:[[:lower:]])
as a whole regex or anywhere within a larger regex to force case-sensitivity for that expression (or subexpression), whatever the Match case setting.
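
As a rough illustration of the scoped (?-i:...) modifier, here is a sketch in Python (3.6 or later), whose re module accepts the same syntax. This is an assumption-laden stand-in for TextPad: re.IGNORECASE simulates an unchecked Match case box, the word list is invented for the example, and Python's re does not support [[:lower:]], so only the [a-z] form is shown.

```python
import re

text = "A\na\nB\nb\nC\nc\n"

# IGNORECASE is on globally, but (?-i:...) switches it off inside the group.
lower_only = re.findall(r"^(?-i:[a-z])", text, re.MULTILINE | re.IGNORECASE)
print(lower_only)  # ['a', 'b', 'c']

# Within a larger regex: the first letter must be lower case,
# while "ello" is still matched ignoring case.
pattern = re.compile(r"(?-i:[a-z])ello", re.IGNORECASE)
results = {word: bool(pattern.fullmatch(word)) for word in ["hello", "hELLO", "Hello"]}
print(results)  # {'hello': True, 'hELLO': True, 'Hello': False}
```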

[a-z] matches any of the 26 lower-case letters in the (ASCII) range a to z.
[[:lower:]] matches any lower-case letter, including, for example, é and α.
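
The difference between the two classes can be sketched in Python. Note the assumption: Python's re module has no [[:lower:]] class, so str.islower() is used here as an approximate stand-in for it, and the sample letters are chosen only for illustration.

```python
import re

letters = ["a", "z", "é", "α", "A"]

# ASCII range: only the 26 letters a to z (case-sensitive here).
in_ascii_range = [ch for ch in letters if re.fullmatch(r"[a-z]", ch)]

# Unicode lower-case test, standing in for [[:lower:]].
is_lower = [ch for ch in letters if ch.islower()]

print(in_ascii_range)  # ['a', 'z']
print(is_lower)        # ['a', 'z', 'é', 'α']
```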

POSIX syntax was a term used to distinguish two styles of regex in old versions of TextPad. I didn't realise you were referring to POSIX-standard character classes.