Automated (Macro?) upper case to lower case
Posted: Wed Jul 07, 2004 3:32 pm
by GIS_USER
Is there a way to quickly convert HTML tags from upper case to lower case across several documents? I need this done for all tags in multiple files.
Here is a small portion of what I need done...
<BODY LINK="#3366CC" BGCOLOR="WHITE" MARGINWIDTH=0 MARGINHEIGHT=0>
<TABLE WIDTH="100%" CELLPADDING=0 CELLSPACING=0 BORDER=0>
to
<body link="#3366cc" bgcolor="white" marginwidth=0 marginheight=0>
<table width="100%" cellpadding=0 cellspacing=0 border=0>
Thanks!
Posted: Wed Jul 07, 2004 4:30 pm
by ben_josephs
Here's one way, using regular expression search and replace:
Find what: <[^>]+>
Replace with: \L&
using
Configure | Preferences | Editor | Use POSIX regular expression syntax
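For anyone who would rather script this than use the editor, here is a rough Python sketch of the same search and replace. The function name lowercase_whole_tags is made up for illustration. Like the expression above, it lowercases everything between each < and the next >, attribute values included.

```python
import re

# Same pattern as the "Find what" expression above.
TAG = re.compile(r"<[^>]+>")

def lowercase_whole_tags(html):
    """Lowercase everything between each '<' and the next '>'."""
    # The lambda plays the role of the \L& replacement: the whole match, lowercased.
    return TAG.sub(lambda m: m.group(0).lower(), html)

sample = '<BODY LINK="#3366CC" BGCOLOR="WHITE" MARGINWIDTH=0 MARGINHEIGHT=0>'
print(lowercase_whole_tags(sample))
# prints: <body link="#3366cc" bgcolor="white" marginwidth=0 marginheight=0>
```

Text outside the tags is left alone, because the pattern only matches from < to >.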
Posted: Wed Jul 07, 2004 5:22 pm
by MudGuard
ben, your expression replaces too much - imagine this:
<IMG SRC="BLA.PNG" ALT="Picture of George Washington">
would end up as
<img src="bla.png" alt="picture of george washington">
Not only the alt text would be changed, but also the URL - which could lead to broken links.
Another thing - attributes like alt or title might contain the > character.
HTML can't be parsed with regexes alone - this is the reason why I do not offer a better solution...
What can be done is changing the element name:
Search for
Replace by
Automatically finding and changing the attribute names is much more difficult, because attribute values may contain quotes, blanks, = and >
e.g.
or
are both valid HTML attributes but give lots of trouble when trying to use regexes on them...
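To make the two points in this post concrete, here is a small Python sketch. The pattern ELEMENT_NAME and the sample tag are my own illustrations, not the expressions or examples from the post. The first part lowercases element names only; the second shows how a > inside an attribute value cuts a whole-tag match short.

```python
import re

# Hypothetical pattern: "<" or "</" followed by an element name.
# Caveat: it would also fire on "<" followed by letters inside text,
# comments or attribute values, so it is not a real HTML parser.
ELEMENT_NAME = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)")

def lowercase_element_names(html):
    """Lowercase element names, leaving attributes untouched."""
    return ELEMENT_NAME.sub(lambda m: "<" + m.group(1) + m.group(2).lower(), html)

# Whole-tag pattern: everything from "<" to the next ">".
WHOLE_TAG = re.compile(r"<[^>]+>")

sample = '<IMG ALT="a > b" SRC="X.PNG">'

print(lowercase_element_names(sample))
# prints: <img ALT="a > b" SRC="X.PNG">

print(WHOLE_TAG.findall(sample))
# prints: ['<IMG ALT="a >']  (the match stops at the > inside the ALT value)
```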
Posted: Wed Jul 07, 2004 5:27 pm
by MudGuard
Forgot something:
There are tools that have an HTML parser inside and can therefore do the job properly, e.g.
HTML Tidy
Posted: Wed Jul 07, 2004 8:27 pm
by ben_josephs
MudGuard wrote:ben, your expression replaces too much
You're quite right, of course. But it's OK for the example given. Apologies if it misled.
HTML can't be parsed with regexes alone
Indeed. Anything with arbitrarily nested structures can't be.
Posted: Thu Jul 08, 2004 12:51 pm
by GIS_USER
Thanks for the info everyone. MudGuard was correct in assuming the following scenario:
<IMG SRC="BLA.PNG" ALT="Picture of George Washington">
<img src="bla.png" alt="picture of george washington">
Of course, ideally the end result would be:
<img src="bla.png" alt="Picture of George Washington">
UNIX constraints also require file names (i.e. bla.png) to be lower case; any non-code text, however, would remain the same. We are translating our HTML to XHTML.
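A minimal sketch of the parser-based approach suggested earlier in the thread, using Python's standard html.parser module rather than HTML Tidy. The names TagLowercaser, lowercase_tags and URL_ATTRS are made up for illustration, and the set of attributes treated as file names is an assumption you would need to adjust. It rebuilds each tag from the parsed pieces, so element and attribute names come out lower case and other attribute values keep their case.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: attributes whose values are file names/URLs and should be lowercased.
URL_ATTRS = {"src", "href", "background"}

class TagLowercaser(HTMLParser):
    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; in text as written.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _start(self, tag, attrs, closer):
        # HTMLParser has already lowercased the tag and attribute names.
        parts = [tag]
        for name, value in attrs:
            if value is None:          # attribute without a value
                parts.append(name)
                continue
            if name in URL_ATTRS:
                value = value.lower()
            # Values arrive unescaped, so escape them again when writing.
            parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + closer)

    def handle_starttag(self, tag, attrs):
        self._start(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._start(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")

def lowercase_tags(html):
    parser = TagLowercaser()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)

print(lowercase_tags('<IMG SRC="BLA.PNG" ALT="Picture of George Washington">'))
# prints: <img src="bla.png" alt="Picture of George Washington">
```

Note that this sketch writes every attribute value in double quotes (MARGINWIDTH=0 becomes marginwidth="0"), which XHTML requires anyway. It does not expand attributes without a value, add missing end tags, or rename the files on disk, so a dedicated tool is still the safer choice for a full XHTML conversion.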
Posted: Thu Jul 08, 2004 3:37 pm
by MudGuard
UNIX constraints also require file names (ie. bla.png) to be lower case.
No. UNIX has case-sensitive file names. The same goes for all derivatives of UNIX that I know (including HP-UX, AIX, SINIX, Linux).
You can, for example, have the following files in one folder:
bla
blA
bLa
bLA
Bla
BlA
BLa
BLA
without any problem.
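If you want to check this on your own machine, here is a small Python sketch. The helper names case_variants and count_files_created are made up for illustration. It creates the eight names above in a temporary folder and counts how many separate files exist afterwards.

```python
import itertools
import os
import tempfile

def case_variants(word):
    """All upper/lower-case spellings of word (8 for a 3-letter word)."""
    choices = [(ch.lower(), ch.upper()) for ch in word]
    return ["".join(combo) for combo in itertools.product(*choices)]

def count_files_created(names):
    """Create one file per name in a temporary folder and count the entries."""
    with tempfile.TemporaryDirectory() as folder:
        for name in names:
            with open(os.path.join(folder, name), "w") as handle:
                handle.write(name)
        return len(os.listdir(folder))

names = case_variants("bla")
print(names)
print(count_files_created(names))
```

On a case-sensitive file system the count should be 8. On a case-insensitive one (the usual setup on Windows, for example) the names all refer to the same file, so expect a smaller number.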