Replace all HTML tags with a comma
Posted: Wed Nov 23, 2016 12:40 am
by rrhandle
I need to replace the tags here with commas so I can turn it into a .csv file. I know I will end up with several commas in a row, but that is OK: I can do another replace and change ,, to ,
Found it
Posted: Wed Nov 23, 2016 3:49 am
by rrhandle
<.*?>
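A minimal sketch of the whole job in Python's re module, assuming it behaves like the editor's regex engine for this pattern (engines differ on details such as whether . matches a newline). The function name tags_to_commas is hypothetical. It also collapses runs of commas with ",+", because a single replace-all of ",," with "," leaves ",," behind for runs of three or more commas and would have to be repeated.

```python
import re

# The non-greedy pattern from the post: "<", as few characters as possible, then ">".
TAG = re.compile(r"<.*?>")

def tags_to_commas(html: str) -> str:
    """Replace every tag with a comma, then collapse each run of commas into one."""
    with_commas = TAG.sub(",", html)
    # ",+" handles runs of any length in one pass.
    return re.sub(r",+", ",", with_commas)

print(tags_to_commas("<tr><td>Alice</td><td>42</td></tr>"))  # ,Alice,42,
```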
Posted: Wed Nov 23, 2016 10:39 pm
by MudGuard
<span title="-->">
is legal in HTML, but your regex would match only the part up to the first > (the one inside the attribute value) and leave the trailing "> unmatched.
If you can guarantee that no attribute value contains a >, then your regex works. But it does not work for HTML in general.
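To see the failure concretely, here is a small Python sketch (assuming Python's re behaves like the editor for this pattern). The lazy match ends at the first >, which sits inside the quoted title value, so the replacement leaves "> in the output.

```python
import re

TAG = re.compile(r"<.*?>")
html = '<span title="-->">text</span>'

# The lazy match stops at the first ">", which is inside the quoted attribute value.
matches = TAG.findall(html)
leftover = TAG.sub(",", html)

print(matches)   # ['<span title="-->', '</span>']
print(leftover)  # ,">text,
```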
Posted: Thu Nov 24, 2016 9:04 am
by ben_josephs
<([^"]|"[^"]*")+?>
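The idea behind this pattern is that a double-quoted value is consumed as one unit, so a > inside it can never end the match. A sketch in Python, assuming its re module stands in for the editor's engine; the group is written as (?:...) rather than (...) only so that findall() returns whole matches, which does not change what is matched.

```python
import re

# ben_josephs' pattern with a non-capturing group.
TAG = re.compile(r'<(?:[^"]|"[^"]*")+?>')

html = '<span title="-->">text</span>'
print(TAG.findall(html))   # ['<span title="-->">', '</span>']
print(TAG.sub(",", html))  # ,text,
```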
Posted: Fri Nov 25, 2016 7:40 am
by MudGuard
<img title="-->" alt='-->'>
or, to make it more tricky,
<img title="'-->" alt='"-->'>
;-)
If you really want to cover all variants, the regex will have to be quite complicated ...
Up to HTML 4 (i.e. as long as HTML was based on SGML), even <p/bla/ is legal HTML, equivalent to <p>bla</p> ...
Posted: Fri Nov 25, 2016 8:45 am
by ben_josephs
<([^"']|"[^"]*"|'[^']*')+?>
Leaving out the trailing > is just being unreasonable!
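A sketch comparing both of ben_josephs' patterns on MudGuard's two counterexamples, again using Python's re as a stand-in for the editor (an assumption). The first pattern treats a single quote as an ordinary character. On the first sample it stops at the > inside alt='-->'. On the second it finds no match at all, because it reaches an opening double quote inside the single-quoted value that is never closed. The revised pattern consumes both kinds of quoted value as units and matches both tags whole.

```python
import re

# First pattern: only double-quoted values are skipped as a unit.
DOUBLE_ONLY = re.compile(r'<(?:[^"]|"[^"]*")+?>')
# Revised pattern: double- and single-quoted values are both skipped as a unit.
BOTH_QUOTES = re.compile(r'''<(?:[^"']|"[^"]*"|'[^']*')+?>''')

samples = [
    '''<img title="-->" alt='-->'>''',
    '''<img title="'-->" alt='"-->'>''',
]

for html in samples:
    print("double-only:", DOUBLE_ONLY.findall(html))
    print("both quotes:", BOTH_QUOTES.findall(html))
```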
Have you tried the example for macros?
Posted: Sun Feb 12, 2017 4:46 pm
by pbaumann
You could adapt the first macro example. Its basic version selects an entire tag; the example shows how to press the DEL key to delete that HTML tag and how to repeat this until all HTML tags are gone. In your version, after the DEL key you would simply insert a comma.
If the macro is defined properly, it will keep running until all the replacements have been made.
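The macro loop described above (find the next tag, replace it with a comma, repeat until none is left) can be mimicked outside the editor. This Python sketch is only an analogy, not the editor's macro language; the function name replace_one_at_a_time is hypothetical, and the pattern is ben_josephs' revised regex from earlier in the thread. The loop always terminates because each pass removes at least one < and the inserted comma adds none.

```python
import re

# ben_josephs' revised pattern; using it here is an assumption of this sketch.
TAG = re.compile(r'''<(?:[^"']|"[^"]*"|'[^']*')+?>''')

def replace_one_at_a_time(text: str) -> str:
    """Find the first tag, replace it with a comma, and loop until no tag is left."""
    while True:
        match = TAG.search(text)
        if match is None:  # no tags left, so the "macro" stops
            return text
        text = text[:match.start()] + "," + text[match.end():]

print(replace_one_at_a_time("<p>a</p><p>b</p>"))  # ,a,,b,
```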
Posted: Fri Mar 24, 2017 6:01 pm
by AmigoJack
I can top that: an HTML element (or "tag") can span multiple lines, and under certain conditions attribute values don't have to be quoted at all. This is perfectly legal:
<a
href="one"
title=two
style='color: yellow'
>
Posted: Fri Mar 24, 2017 11:07 pm
by ben_josephs
The regular expression I suggested above handles that.
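A quick check of that claim in Python's re (assumed here to stand in for the editor's engine). A negated class such as [^"'] also matches newlines, so the match can cross line breaks without any special flag. An unquoted value like title=two is consumed character by character, which is safe because HTML does not allow > in an unquoted attribute value. Comments, CDATA and script contents are not covered by this sketch.

```python
import re

TAG = re.compile(r'''<(?:[^"']|"[^"]*"|'[^']*')+?>''')

# AmigoJack's multi-line example, followed by some text and a closing tag.
html = """<a
href="one"
title=two
style='color: yellow'
>link</a>"""

print(TAG.sub(",", html))  # ,link,
```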