
Keep only the text between certain tags (<a ...> and </a>)

Posted: Wed Sep 05, 2007 5:21 pm
by steve1040
How do I keep only the text that is between <a ...> and </a>?

The only constant part of the website address is: href="http://www.website.com/xxxxx/wrwrwr/

The only text I want to keep in the following example is:
Current Gen finder (CG)

<b><a
href="http://www.website.com/xxxxx/wrwrwr/Pla ... l">Current
Gen finder (CG)</a></b> - A bunch of text here weder erer ere find skjewkrje wdlwerje that you are looking for.<o:p></o:p></p>

Posted: Thu Sep 06, 2007 12:36 pm
by steve1040
^bump

Posted: Thu Sep 06, 2007 1:16 pm
by ben_josephs
This is not something to which TextPad is ideally suited. In particular, its regular expression engine is incapable of matching text containing an arbitrary number of newlines.

But you can do it with WildEdit (http://www.textpad.com/products/wildedit/), which uses a far more powerful regular expression engine than TextPad. Try something like
Find what: .*?<a.+?>(.*?)</a>.*?(?=<a|\Z)
Replace with: $1\n

[X] Regular expression
[X] Replacement format

Options
[ ] '.' does not match a newline character [i.e., not selected]
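
For readers without WildEdit, the same find/replace can be approximated in Python. This is only a sketch under assumptions: Python's re module is not the engine WildEdit uses, re.DOTALL stands in for leaving "'.' does not match a newline character" unselected, and the function name keep_link_text and the shortened sample text are made up for illustration. It also collapses the line break inside the link text, which the replacement above does not do.

```python
import re

# Same pattern as in "Find what" above. re.DOTALL lets '.' match newlines,
# so an <a> element may straddle several lines.
ANCHOR_RE = re.compile(r".*?<a.+?>(.*?)</a>.*?(?=<a|\Z)", re.DOTALL)


def keep_link_text(html: str) -> str:
    """Replace each chunk of input with the text of the <a> element in it.

    Hypothetical helper: mirrors "Replace with: $1\\n", plus one extra step
    that collapses runs of whitespace (including newlines) to single spaces.
    """
    return ANCHOR_RE.sub(lambda m: " ".join(m.group(1).split()) + "\n", html)


# Shortened sample based on the posts in this thread; the elided URL is
# kept exactly as it was posted.
sample = (
    '<b><a\n'
    'href="http://www.website.com/xxxxx/wrwrwr/Pla ... l">Current\n'
    'Gen finder (CG)</a></b> - A bunch of text here.<o:p></o:p></p>\n'
    '<b><a href="http://www.website.com/xxxxx/wrwrwr/Pla ... l">'
    'Current Gen finder (CG)1</a></b> - More text.</p>\n'
)

print(keep_link_text(sample), end="")
```

Like the original expression, <a.+?> also matches other tags whose names start with "a", so treat this as a quick clean-up rather than a real HTML parser.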

Posted: Fri Sep 07, 2007 11:59 am
by nitinmukesh123
<b><a href="http://www.website.com/xxxxx/wrwrwr/Pla ... l">Current Gen finder (CG)</a></b> - A bunch of text here weder erer ere find skjewkrje wdlwerje that you are looking for.<o:p></o:p></p>
<b><a href="http://www.website.com/xxxxx/wrwrwr/Pla ... l">Current Gen finder (CG)1</a></b> - A bunch of text here weder erer ere find skjewkrje wdlwerje that you are looking for.<o:p></o:p></p>
Find what: .*<a.*>(.*)</a>.*\n
Replace with: \1\n

Result
Current Gen finder (CG)
Current Gen finder (CG)1
I'm a newbie at regex, so I'm not sure it will work for all the conditions you might have.

TextPad v4.7.3

Posted: Fri Sep 07, 2007 12:30 pm
by ben_josephs
That doesn't work if the <a> elements straddle newlines, as in the original poster's example, or if there is more than one <a> element on a line.
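
Both failure cases can be reproduced outside TextPad. The sketch below runs the line-based expression from the previous post through Python's re module, which is an assumption: TextPad's engine is different, although the behaviour on these two inputs should be comparable. The function name and the sample strings are made up for illustration.

```python
import re

# The line-based expression from the previous post. Without re.DOTALL,
# '.' never matches a newline, so each match is confined to one line.
LINE_RE = re.compile(r".*<a.*>(.*)</a>.*\n")


def keep_link_text_per_line(text: str) -> str:
    """Hypothetical helper applying the line-based find/replace."""
    return LINE_RE.sub(r"\1\n", text)


# Two <a> elements on one line: the greedy '.*' skips ahead to the last
# one, so the text of the first link is lost.
two_on_one_line = '<a href="x">One</a> and <a href="y">Two</a>\n'

# An <a> element that straddles newlines: no single line contains both
# the opening tag and </a>, so nothing matches and nothing is replaced.
straddling = '<b><a\nhref="x">Current\nGen finder (CG)</a></b> - text\n'

print(repr(keep_link_text_per_line(two_on_one_line)))
print(keep_link_text_per_line(straddling) == straddling)
```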

Posted: Tue Sep 25, 2007 1:40 pm
by textpad-fan
Detagger can do that and much more.