Lowercase everything unless it's within single quotes

General questions about using TextPad

Art Metzer
Posts: 27
Joined: Mon Mar 06, 2006 5:31 pm

Post by Art Metzer »

I would like to convert all characters in a file to lowercase, unless a character appears inside single quotes, in which case I would like to leave that character alone.

Pairs of single quotes will always appear on one line.

Don't worry about interwoven or unpaired single quotes; assume it's a straightforward exercise.

For example:

Before:

Code:

SELECT COUNT(*)
FROM Table
Where FLAG = 'Y'
AND DATE > '23-Jun-2006'

After:

Code:

select count(*)
from table
where flag = 'Y'
and date > '23-Jun-2006'

Thanks.
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Find what: ([^']*)('([^']*)'|)
Replace with: \L\1\E\2

[X] Regular expression

This assumes you are using POSIX regular expression syntax:
Configure | Preferences | Editor

[X] Use POSIX regular expression syntax
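
For readers who want to try the same idea outside TextPad, here is a minimal sketch in Python. It is not part of the original answer: the function name lowercase_outside_quotes is hypothetical, and because Python's re module has no \L...\E in replacement strings, a callback does the lowercasing instead.

```python
import re

# Same pattern as the "Find what" above: a run of unquoted text (group 1),
# followed by either a quoted string or nothing (group 2).
PATTERN = re.compile(r"([^']*)('([^']*)'|)")


def lowercase_outside_quotes(text: str) -> str:
    """Lowercase group 1 and keep group 2 exactly as it was found."""
    return PATTERN.sub(lambda m: m.group(1).lower() + m.group(2), text)


if __name__ == "__main__":
    sample = (
        "SELECT COUNT(*)\n"
        "FROM Table\n"
        "Where FLAG = 'Y'\n"
        "AND DATE > '23-Jun-2006'"
    )
    print(lowercase_outside_quotes(sample))
```

In Python, [^']* also matches newlines, so one match may span several lines. With properly paired quotes the result should be the same as a line-by-line replacement.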
Art Metzer
Posts: 27
Joined: Mon Mar 06, 2006 5:31 pm

Post by Art Metzer »

Beautiful, Ben!

One question, though, for my own edification: what's the purpose of that pipe character near the end of the "Find what" string?
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Try it without.

It's the alternation operator. It means: match either what is on my left or what is on my right. In this case, that is: match '([^']*)' or nothing. This allows the whole regular expression to match whether or not the unquoted text is followed by quoted text. If it isn't followed by quoted text, it's followed by a newline (or the end of the file), so ([^']*)('([^']*)'|$) also works. So, incidentally, does ([^']*)(('([^']*)')?). But these are longer, and the last one sometimes captures an extra subexpression because of the extra pair of parentheses, so it's potentially slower.
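
As a rough check of the claim that all three forms behave the same, the sketch below applies each one to the same text in Python. This is an illustration only: TextPad's engine and Python's re module are different implementations, and the names VARIANTS and apply_variant are made up for the example.

```python
import re

# The three forms discussed above. In each, group 1 is the unquoted text
# and group 2 is the quoted text (or the empty string).
VARIANTS = {
    "empty alternative": r"([^']*)('([^']*)'|)",
    "end-of-text alternative": r"([^']*)('([^']*)'|$)",
    "optional group": r"([^']*)(('([^']*)')?)",
}


def apply_variant(pattern: str, text: str) -> str:
    """Lowercase group 1 and leave group 2 untouched."""
    return re.sub(pattern, lambda m: m.group(1).lower() + m.group(2), text)


if __name__ == "__main__":
    line = "Where FLAG = 'Y' AND Name = 'Art' OR X"
    for label, pattern in VARIANTS.items():
        print(f"{label}: {apply_variant(pattern, line)}")
```

Group 2 always takes part in the match in all three forms, so m.group(2) is a string (possibly empty) rather than None.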

There is a symmetry between the optionality of the quoted text (that is, it might be absent) and the possible zero length of the unquoted text that precedes it (that is, it might be absent). Thus the regex ([^']+)('([^']*)'|) is wrong, because it fails to match when quoted text is at the beginning of a line (or immediately following other quoted text), that is, when the unquoted text preceding it is empty.
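
The failure of the + version can be reproduced with a small Python sketch. It is an approximation of what a Replace All does, not TextPad itself, and the helper name replace_all is hypothetical. When a line starts with quoted text, the + version cannot match at the opening quote, so the search resumes one character later, inside the quotes, and the roles of quoted and unquoted text get swapped.

```python
import re

GOOD = r"([^']*)('([^']*)'|)"  # unquoted text may be empty
BAD = r"([^']+)('([^']*)'|)"   # unquoted text must be at least one character


def replace_all(pattern: str, text: str) -> str:
    """Lowercase group 1 and keep group 2, for every match in text."""
    return re.sub(pattern, lambda m: m.group(1).lower() + m.group(2), text)


if __name__ == "__main__":
    line = "'Y' = FLAG"
    print("good:", replace_all(GOOD, line))
    print("bad: ", replace_all(BAD, line))
```

With the * version the quoted Y should come through untouched. With the + version it ends up lowercased, because the first match starts at the Y rather than at the quote before it.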

See TextPad's help under
Reference Information | Regular Expressions and
How to... | Find and Replace Text | Use Regular Expressions.