Replace all HTML tags with a comma

Posted: Wed Nov 23, 2016 12:40 am
by rrhandle
I need to replace the tags here with commas so I can turn it into a .csv file. I know I will end up with multiple commas, but that is OK: I can do another replace and change ,, to ,

Found it

Posted: Wed Nov 23, 2016 3:49 am
by rrhandle
<.*?>
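
For anyone who wants to try the same two-step replace outside the editor, here is a minimal sketch in Python. The function name `tags_to_commas` and the sample row are made up for illustration, and the expression is the simple one above, so it assumes no tag contains a > inside an attribute value or spans several lines.

```python
import re

# The simple expression from the post above: "<", as few characters as
# possible, then ">". In Python "." does not match a newline by default.
TAG = re.compile(r"<.*?>")

def tags_to_commas(html: str) -> str:
    """Replace every tag with a comma, then collapse runs of commas."""
    with_commas = TAG.sub(",", html)
    # Second pass: turn ",," (or longer runs) into a single ",".
    return re.sub(r",{2,}", ",", with_commas)

print(tags_to_commas("<tr><td>Alice</td><td>42</td></tr>"))  # prints ,Alice,42,
```

Leading and trailing commas are left in place here; strip them separately if the .csv consumer objects.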

Posted: Wed Nov 23, 2016 10:39 pm
by MudGuard
<span title="-->">

is legal in HTML; your regex would only find the green part (<span title="-->), not the red (">).

If you can guarantee that there is no > in an attribute value, then your regex works. But it does not work with html generally.
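
The failure is easy to reproduce. A small sketch in Python (the sample string is the tag above plus some invented text and a closing tag) shows the match stopping at the first >, which sits inside the attribute value:

```python
import re

simple = re.compile(r"<.*?>")

# The tag from the post above, followed by hypothetical content.
sample = '<span title="-->">text</span>'

found = simple.findall(sample)
replaced = simple.sub(",", sample)

print(found)     # ['<span title="-->', '</span>']
print(replaced)  # ,">text,
```

The leftover `">` ends up in the output as if it were data.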

Posted: Thu Nov 24, 2016 9:04 am
by ben_josephs
<([^"]|"[^"]*")+?>

Posted: Fri Nov 25, 2016 7:40 am
by MudGuard
<img title="-->" alt='-->'>

or, to make it more tricky,

<img title="'-->" alt='"-->'>

;-)


If you really want to cover all variants, the regex must be quite complicated ...

Up to html 4 (i.e. as long as html was based on sgml), even <p/bla/ is legal html, considered the same as <p>bla</p> ...

Posted: Fri Nov 25, 2016 8:45 am
by ben_josephs
<([^"']|"[^"]*"|'[^']*')+?>

Leaving out the trailing > is just being unreasonable!

Have you tried the example for macros?
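
To compare the two quote-aware expressions from this thread against MudGuard's samples, here is a rough Python check. The helper `matches_whole_tag` is a made-up name, and the capturing groups are written as non-capturing `(?:...)`, which does not change what is matched.

```python
import re

# Handles double-quoted attribute values only (the earlier suggestion).
DOUBLE_ONLY = re.compile(r'<(?:[^"]|"[^"]*")+?>')
# Handles both double- and single-quoted values (the later suggestion).
BOTH_QUOTES = re.compile(r"""<(?:[^"']|"[^"]*"|'[^']*')+?>""")

samples = [
    '<span title="-->">',
    """<img title="-->" alt='-->'>""",
    """<img title="'-->" alt='"-->'>""",
]

def matches_whole_tag(pattern, tag):
    """True if the first match found in `tag` covers the entire tag."""
    m = pattern.search(tag)
    return m is not None and m.group(0) == tag

for tag in samples:
    print(tag, matches_whole_tag(DOUBLE_ONLY, tag), matches_whole_tag(BOTH_QUOTES, tag))
```

In this check the double-quote-only version covers just the first sample, while the version with both quote styles covers all three.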

Posted: Sun Feb 12, 2017 4:46 pm
by pbaumann
You could use a variation of the first example for macros. The basic version selects the entire tag. The example explains how to use the DEL key to delete an HTML tag, and that you can repeat this until all HTML tags are gone. So in addition to the DEL key, you add a step that inserts a comma.

If the macro is defined properly, it will run until all the replacements have taken place.

Posted: Fri Mar 24, 2017 6:01 pm
by AmigoJack
I can top that: an HTML element (or "tag") can span multiple lines, and attribute values don't have to be enclosed in quotation marks under certain conditions. This is perfectly legal:

<a
 href="one"
 title=two
 style='color: yellow'
>

Posted: Fri Mar 24, 2017 11:07 pm
by ben_josephs
The regular expression I suggested above handles that.
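
A quick way to confirm this outside the editor is the following Python sketch. It assumes Python's `re` semantics, where a negated character class such as `[^"']` matches a newline but `.` does not unless `re.DOTALL` is set; other engines and editor settings may treat `.` differently.

```python
import re

TAG = re.compile(r"""<(?:[^"']|"[^"]*"|'[^']*')+?>""")
SIMPLE = re.compile(r"<.*?>")

# AmigoJack's multi-line tag, with one unquoted attribute value.
multi_line_tag = """<a
 href="one"
 title=two
 style='color: yellow'
>"""

quote_aware = TAG.search(multi_line_tag)
simple = SIMPLE.search(multi_line_tag)

print(quote_aware is not None and quote_aware.group(0) == multi_line_tag)  # True
print(simple)  # None
```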