Automated (Macro?) upper case to lower case

GIS_USER
Posts: 7
Joined: Wed Jul 07, 2004 3:18 pm

Post by GIS_USER »

Is there a way to quickly convert HTML tags from upper case to lower case across several documents? I need this done for all tags in multiple files.

Here is a small portion of what I need done...

<BODY LINK="#3366CC" BGCOLOR="WHITE" MARGINWIDTH=0 MARGINHEIGHT=0>
<TABLE WIDTH="100%" CELLPADDING=0 CELLSPACING=0 BORDER=0>

to

<body link="#3366cc" bgcolor="white" marginwidth=0 marginheight=0>
<table width="100%" cellpadding=0 cellspacing=0 border=0>

Thanks!
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Here's one way, using a regular expression search and replace:
Find what: <[^>]+>
Replace with: \L&
(& stands for the whole match; \L converts what follows to lower case.)
This assumes you have selected
Configure | Preferences | Editor | Use POSIX regular expression syntax
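
For batch use outside TextPad, the same search and replace can be sketched in Python. This is only an illustration under assumptions: the function names, the `*.html` glob and the UTF-8 encoding are hypothetical and not from the thread. Like the expression above, it lowercases everything between < and >, attribute values included.

```python
import re
from pathlib import Path

TAG = re.compile(r"<[^>]+>")  # same pattern as the "Find what" field


def lowercase_tags(text):
    """Lower-case every match of the pattern, attribute values included."""
    return TAG.sub(lambda m: m.group(0).lower(), text)


def convert_folder(folder, pattern="*.html", encoding="utf-8"):
    """Rewrite matching files in place (hypothetical helper; back up first)."""
    for path in Path(folder).glob(pattern):
        original = path.read_text(encoding=encoding)
        converted = lowercase_tags(original)
        if converted != original:
            path.write_text(converted, encoding=encoding)
```

Calling `lowercase_tags` on the BODY line from the question yields the lower-case line the poster asked for; `convert_folder` applies the same function to every matching file in one folder.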
MudGuard
Posts: 1295
Joined: Sun Mar 02, 2003 10:15 pm
Location: Munich, Germany

Post by MudGuard »

ben, your expression replaces too much - imagine this:

<IMG SRC="BLA.PNG" ALT="Picture of George Washington">
would end up as

<img src="bla.png" alt="picture of george washington">
Not only the alt text would be changed, but also the URL - which could lead to broken links.

Another thing - attributes like alt or title might contain the > character.

HTML can't be parsed with regexes alone - this is the reason why I do not offer a better solution...

What can be done is changing the element name:

Search for

</?[A-Za-z]+
Replace by

\L&
Automatically finding and changing the attribute names is much more difficult, because attribute values may contain quotes, blanks, = and >
e.g.

title="title='> > >'"
or

title='title=">">">"'
are both valid HTML attributes but give lots of trouble when trying to use regexes on them...
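
As a rough illustration of the difference, here is a Python sketch that applies both patterns to a tag containing one of the awkward attributes above. The function names and the sample tag are made up for this example. The element-name pattern leaves attributes alone, while the whole-tag pattern stops at the first > inside the attribute value and never reaches HREF.

```python
import re

ELEMENT_NAME = re.compile(r"</?[A-Za-z]+")  # element names only
WHOLE_TAG = re.compile(r"<[^>]+>")          # the earlier whole-tag pattern


def lowercase_element_names(text):
    """Lower-case only the element name after < or </."""
    return ELEMENT_NAME.sub(lambda m: m.group(0).lower(), text)


def lowercase_whole_tags(text):
    """Lower-case everything from < up to the first >."""
    return WHOLE_TAG.sub(lambda m: m.group(0).lower(), text)


# Hypothetical sample: a quoted attribute value that contains > characters.
tricky = '''<A TITLE="title='> > >'" HREF="X.HTML">Link</A>'''

print(lowercase_element_names(tricky))
# <a TITLE="title='> > >'" HREF="X.HTML">Link</a>

print(lowercase_whole_tags(tricky))
# <a title="title='> > >'" HREF="X.HTML">Link</a>
```

Even the element-name pattern is not fully safe: it would also change anything that merely looks like the start of a tag inside a comment, a script or an attribute value.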
MudGuard
Posts: 1295
Joined: Sun Mar 02, 2003 10:15 pm
Location: Munich, Germany

Post by MudGuard »

Forgot something:

There are tools that have an HTML parser inside and can therefore do the job properly, e.g.
HTML Tidy
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

MudGuard wrote: "ben, your expression replaces too much"
You're quite right, of course. But it's OK for the example given. Apologies if it misled.
MudGuard wrote: "HTML can't be parsed with regexes alone"
Indeed. Anything with arbitrarily nested structures can't be.
GIS_USER
Posts: 7
Joined: Wed Jul 07, 2004 3:18 pm

Post by GIS_USER »

Thanks for the info, everyone. MudGuard was right about the following scenario:

<IMG SRC="BLA.PNG" ALT="Picture of George Washington">

becomes

<img src="bla.png" alt="picture of george washington">

Of course, ideally the end result would be:

<img src="bla.png" alt="Picture of George Washington">

UNIX constraints also require file names (e.g. bla.png) to be lower case; any non-code text, however, should remain the same. We are translating our HTML to XHTML.
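
A parser-based approach of the kind MudGuard mentioned can get close to this result. The following is a minimal sketch using Python's standard `html.parser` module, which reports element and attribute names in lower case. The class name, the function name and the `lower_values` parameter are hypothetical. The sketch re-serialises each tag, so quoting and spacing inside tags are normalised, and it is not a full XHTML converter.

```python
from html import escape
from html.parser import HTMLParser


class NameLowercaser(HTMLParser):
    """Re-emit HTML with lower-case element and attribute names."""

    def __init__(self, lower_values=()):
        # convert_charrefs=False passes entities in text through unchanged
        super().__init__(convert_charrefs=False)
        # Attributes whose values are lower-cased too (an assumption)
        self.lower_values = set(lower_values)
        self.out = []

    def _open_tag(self, tag, attrs, end):
        # The parser has already lower-cased tag and attribute names
        parts = [tag]
        for name, value in attrs:
            if value is None:          # attribute without a value
                parts.append(name)
                continue
            if name in self.lower_values:
                value = value.lower()
            # Values arrive unescaped, so escape them again on output
            parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + end)

    def handle_starttag(self, tag, attrs):
        self._open_tag(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._open_tag(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def lowercase_names(html_text, lower_values=()):
    """Return html_text with lower-case element and attribute names."""
    parser = NameLowercaser(lower_values)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(lowercase_names(
    '<IMG SRC="BLA.PNG" ALT="Picture of George Washington">',
    lower_values={"src"},
))
# <img src="bla.png" alt="Picture of George Washington">
```

Lower-casing `src` or `href` values is only safe if the files themselves are renamed to match, so `lower_values` is empty by default.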
MudGuard
Posts: 1295
Joined: Sun Mar 02, 2003 10:15 pm
Location: Munich, Germany

Post by MudGuard »

GIS_USER wrote: "UNIX constraints also require file names (e.g. bla.png) to be lower case."
No. UNIX has case-sensitive file names; it does not require them to be lower case. The same goes for all derivatives of UNIX that I know (including HP-UX, AIX, SINIX, Linux).

You can, for example, have the following files in one folder:

bla
blA
bLa
bLA
Bla
BlA
BLa
BLA

without any problem.
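
In practice this means that a reference in the HTML must match the file name exactly, letter case included. A small Python sketch of such a check follows; the function name and the sample names are hypothetical, and collecting the references from the HTML is left out.

```python
def case_mismatches(references, filenames):
    """Return (reference, actual_name) pairs that differ only in letter case."""
    existing = set(filenames)
    by_folded = {}
    for name in filenames:
        by_folded.setdefault(name.lower(), []).append(name)

    problems = []
    for ref in references:
        if ref in existing:
            continue  # exact match, nothing to report
        for actual in by_folded.get(ref.lower(), []):
            problems.append((ref, actual))
    return problems


print(case_mismatches(["BLA.PNG", "bla"], ["bla.png", "bla", "BLA"]))
# [('BLA.PNG', 'bla.png')]
```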