Global S/R HTML Tag Management

Posted: Fri Dec 03, 2004 9:19 pm
by yowbooks
I'm producing eBooks and I need to use WildEdit to clean my HTML files down to semantically pure text. I need to remove attributes from HTML tags.

REMOVE HTML TAG ATTRIBUTES and ARGUMENTS
I need to globally search for all occurrences of the <P> tag, with or without various attributes, such as:

<P STYLE="text-align:center" align="center">
<p align="center">

which I need to change to a semantically clean tag:
<p>

REMOVE TAGS
In other cases, I need to globally remove a specific tag altogether. For example:

<a href="http://dx.doi.org/10.1572/1597720046">Palm eBook</a>

I need to remove the open and close tags, but leave the literal text as shown below.

Palm eBook

If you will help me with this, you will make my life immeasurably easier.

Regards, Marshall Masters

Posted: Fri Dec 03, 2004 9:32 pm
by MudGuard
for the p stuff:

search for
<p[^>]*>
replace by
<p>

might be problematic if you have > characters within attribute values - regex is good for many things, but not for everything.

for the remove tag: same method as above, but as you do not clearly specify what to remove, no code from me
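For anyone who wants to try the <p> pattern outside WildEdit, here is a minimal sketch in Python's re module. The function name clean_p_tags is made up for illustration, and two things are additions that are not in the pattern above: the IGNORECASE flag (so <P ...> is caught too) and the \b after the p, which is assumed to be wanted so that tags such as <pre> are left alone. The caveat about > inside attribute values still applies.

```python
import re

# Pattern from the post, plus two assumed tweaks:
#   \b          stops the match from spilling into <pre>, <param>, ...
#   IGNORECASE  so <P ...> and <p ...> are both matched
P_OPEN = re.compile(r"<p\b[^>]*>", re.IGNORECASE)

def clean_p_tags(html: str) -> str:
    """Replace every opening <p ...> tag with a bare <p>."""
    return P_OPEN.sub("<p>", html)

if __name__ == "__main__":
    sample = (
        '<P STYLE="text-align:center" align="center">Title</p>\n'
        '<p align="center">Body</p>\n'
        '<pre>left alone</pre>'
    )
    print(clean_p_tags(sample))
```

Closing </p> tags are not touched, because the pattern only matches text that starts with <p and not </p.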

Posted: Fri Dec 03, 2004 9:55 pm
by s_reynisson
To change
<a href="http://dx.doi.org/10.1572/1597720046">Palm eBook</a>
into
Palm eBook

search for
<a href=[^>]+>(.+?)</a>
replace by
$1
I'm using POSIX extended Regular expression and Replacement format.
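The same search and replace can be sketched in Python for comparison. This is only an illustration under a few assumptions: the function name strip_anchors is hypothetical, Python writes the backreference as \1 where WildEdit's replacement format uses $1, and the IGNORECASE and DOTALL flags are additions so that <A HREF=...> and link text spanning a line break are handled as well.

```python
import re

# Same pattern as in the post; group 1 captures the link text.
# The non-greedy .+? stops at the first </a>, so two links on
# one line are handled separately.
A_TAG = re.compile(r"<a href=[^>]+>(.+?)</a>", re.IGNORECASE | re.DOTALL)

def strip_anchors(html: str) -> str:
    """Drop the <a href=...> and </a> tags but keep the text between them."""
    return A_TAG.sub(r"\1", html)

if __name__ == "__main__":
    link = '<a href="http://dx.doi.org/10.1572/1597720046">Palm eBook</a>'
    print(strip_anchors(link))  # prints: Palm eBook
```

Anchors that do not start with href as the first attribute (for example <a name="top">) are not matched by this pattern and stay as they are.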

Many thanks MudGuard

Posted: Sat Dec 04, 2004 5:54 am
by yowbooks
I couldn't get your search string to work, but since I'm still at the bumbling fool stage, I took a wild guess and changed it to:

<p[^>]*.>

Worked like a champ.
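For readers comparing the two patterns, the sketch below shows how they differ in Python's re module. WildEdit's engine and settings may behave differently, so treat this as an illustration only. The extra dot requires at least one character between <p and the closing >, which means the variant matches tags that carry attributes but skips a bare <p>. That is harmless here, since a bare <p> needs no cleaning.

```python
import re

ORIGINAL = re.compile(r"<p[^>]*>", re.IGNORECASE)   # pattern as first posted
VARIANT = re.compile(r"<p[^>]*.>", re.IGNORECASE)   # pattern with the extra dot

samples = [
    '<P STYLE="text-align:center" align="center">',
    '<p align="center">',
    '<p>',
]

for tag in samples:
    # fullmatch checks that the pattern covers the whole tag
    print(tag, bool(ORIGINAL.fullmatch(tag)), bool(VARIANT.fullmatch(tag)))
```

In this sketch both patterns match the two tags with attributes, and only the original pattern matches the bare <p>.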

Many many many thanks, Marshall

Thank you s_reynisson

Posted: Sat Dec 04, 2004 6:04 am
by yowbooks
The strip routine for the anchor worked beautifully. When this stuff works, it is flat out amazing.

Many thanks, Marshall