find a href links using a regular expression

General questions about using TextPad

Moderators: AmigoJack, bbadmin, helios, Bob Hansen, MudGuard

abasey
Posts: 7
Joined: Fri Dec 23, 2005 5:06 pm

find a href links using a regular expression

Post by abasey »

I have found some code on the internet relating to this which should work, but I can't get it to work with WildEdit. Does anyone have a regular expression that could pull out all the possible different patterns of a href links?



I need to use a regular expression to do a search and replace, because the links could vary in very minor ways:


For example:

A href="http://www.myserver.com <http://www.myserver.com/> "

A href="http://myserver.com <http://myserver.com/> "

A href="https://www.myserver.com <https://www.myserver.com/> " (note the https vs http)

A href=https://myserver.com <https://myserver.com/>


In addition, there could be a space (or multiple spaces) between ‘href’ and ‘=’ and between ‘=’ and ‘http(s)’.
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

It's not clear exactly what your requirements are, but this will match all of your example expressions, with optional spaces around the "=" (and with the initial "<"s omitted):

A href *= *"?https?://[a-z0-9_.-]+ <https?://[a-z0-9_.-]+/> *"?
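A quick way to check which of the example lines this expression matches is a short Python sketch. It assumes Python's `re` treats this pattern the same way WildEdit does (the pattern uses only literals, `?`, `*`, `+` and character classes) and uses straight double quotes for the quote characters. As written it is case-sensitive, so a lower-case `a href` or an upper-case host name would not match.

```python
import re

# ben_josephs's expression with straight double quotes.
# " *= *" allows any number of spaces around "=", '"?' makes each quote
# optional, and "https?" accepts both http and https.
PATTERN = re.compile(r'A href *= *"?https?://[a-z0-9_.-]+ <https?://[a-z0-9_.-]+/> *"?')

EXAMPLES = [
    'A href="http://www.myserver.com <http://www.myserver.com/> "',
    'A href="http://myserver.com <http://myserver.com/> "',
    'A href = "https://www.myserver.com <https://www.myserver.com/> "',
    'A href=https://myserver.com <https://myserver.com/>',
]


def matches(line: str) -> bool:
    """Return True if the whole line matches the pattern."""
    return PATTERN.fullmatch(line) is not None


for line in EXAMPLES:
    print(matches(line), line)
```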
abasey
Posts: 7
Joined: Fri Dec 23, 2005 5:06 pm

no modifications

Post by abasey »

Hello ben_josephs and thanks for the help.

I don't know what I am doing wrong... I have tried lots of different expressions, and yours seems the most concise, but I still cannot get any replacements to occur. The HTML files are being searched, but no changes are being made.

I am new to WildEdit. Are there options I need to pay attention to, or settings that could be preventing what I need to do?

I am basically moving from a Mac server to a PC server, and I need to change all the links to point to the new server.

Anything else you could help me with would be great!

Thanks
Abasey
talleyrand
Posts: 624
Joined: Mon Jul 21, 2003 6:56 pm
Location: Kansas City, MO, USA

Post by talleyrand »

Well, ben_josephs has provided a regular expression to find your URLs, not change them. What are you using as your replacement text?
I choose to fight with a sack of angry cats.
abasey
Posts: 7
Joined: Fri Dec 23, 2005 5:06 pm

Post by abasey »

talleyrand,

OK, I guess I didn't really know what I needed. Here it is:

As I said before, I need to find any pattern of an a href link, but only the ones that read

a href="http://alumni.indiana.edu, or any variation of this in terms of extra spaces, capital vs. lowercase, etc. Of course, there may be more code after the initial a href="http://alumni.indiana.edu, but that is the only portion of the link I need to change. I need to change it to a href="http://alumnitest.alumni.iu.edu

Also any http will need to stay http and any https will need to stay https

Examples of code I need to change:
<a href="http://alumni.indiana.edu/subscribe/">Subscribe</a>
<a href="https://alumni.indiana.edu/access/">Look up</a>
<A HREF="https://alumni.indiana.edu/career">Search</A>

<IMG SRC="http://www.alumni.indiana.edu/magazine/ ... /cover.jpg

So in all these example links, alumni.indiana.edu would need to be changed to alumnitest.alumni.indiana.edu while keeping the http or https the same.

Also, will this still find what I need: A href *= *"?https?://[a-z0-9_.-]+ <https?://[a-z0-9_.-]+/> *"? I assume it will need some modifications.

Thanks for your help, and sorry if this is confusing, because it is for me... lol
This is my first time doing this sort of thing.

abasey@indiana.edu
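Here is a sketch of the kind of modification needed, written with Python's `re` module rather than WildEdit (its regex dialect and options may differ, so treat the translation as an assumption). It anchors on an `href=` or `src=` attribute so plain page text is skipped, accepts spaces around "=" and any letter case, keeps the scheme, and also handles the www. host from the IMG example. The helper name `rewrite_links` is hypothetical, and because this post names both alumnitest.alumni.iu.edu and alumnitest.alumni.indiana.edu as targets, the new host is passed in rather than hard-coded.

```python
import re

# Group 1 keeps the attribute text exactly as written (name, spaces, quote);
# group 2 keeps the scheme, so http stays http and https stays https.
LINK_HOST = re.compile(
    r"""(\b(?:href|src)\s*=\s*["']?)"""  # group 1: href= or src=, optional quote
    r"(https?://)"                        # group 2: scheme, left unchanged
    r"(?:www\.)?alumni\.indiana\.edu"     # old host, dots escaped so they are literal
    r"(?![\w.-])",                        # host must end here, e.g. before "/" or a quote
    re.IGNORECASE,                        # HREF/href, HTTP/http, Alumni/alumni
)


def rewrite_links(html: str, new_host: str) -> str:
    """Hypothetical helper: swap the old host for new_host inside href/src URLs only."""
    return LINK_HOST.sub(lambda m: m.group(1) + m.group(2) + new_host, html)


page = '<A HREF="https://alumni.indiana.edu/career">Search</A> alumni.indiana.edu'
print(rewrite_links(page, "alumnitest.alumni.iu.edu"))
# <A HREF="https://alumnitest.alumni.iu.edu/career">Search</A> alumni.indiana.edu
```

Anchoring on the attribute rather than the whole tag means it does not matter whether the tag is A or IMG, or what follows the host name.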
talleyrand
Posts: 624
Joined: Mon Jul 21, 2003 6:56 pm
Location: Kansas City, MO, USA

Post by talleyrand »

You could win the jackpot with this question: will the URLs you want to change always be alumni.indiana.edu => alumnitest.alumni.iu.edu?

If so, it'd be trivial to make that change. Your Find what would be
alumni.indiana.edu
and your Replace with would be
alumnitest.alumni.iu.edu

This would allow the method of access (http, https, gopher, ftp, wais, etc.) to remain the same, as well as the resource name, while changing only the host name. It also allows me to show off my knowledge of arcane protocols ;)
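The same Find/Replace as a minimal Python sketch (OLD, NEW and the sample `page` are illustrative). It also shows two things to keep in mind across thousands of pages: a plain replace changes every occurrence, including text outside links, and if the search runs in regular-expression mode the dots should be escaped (alumni\.indiana\.edu), because an unescaped `.` matches any character.

```python
import re

OLD = "alumni.indiana.edu"
NEW = "alumnitest.alumni.iu.edu"

page = 'Visit alumni.indiana.edu or <a href="https://alumni.indiana.edu/x">x</a>'

# Plain (non-regex) replace: every occurrence changes, including the page text
# outside the link; the scheme before the host is untouched.
literal = page.replace(OLD, NEW)
print(literal)

# In regular-expression mode an unescaped "." matches any character,
# so the raw host name also matches look-alike text.
loose = re.compile(OLD)
strict = re.compile(re.escape(OLD))  # equivalent to alumni\.indiana\.edu

print(bool(loose.search("alumniXindiana-edu")))   # True
print(bool(strict.search("alumniXindiana-edu")))  # False
```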
abasey
Posts: 7
Joined: Fri Dec 23, 2005 5:06 pm

Post by abasey »

That's what I thought too. But there are instances where alumni.indiana.edu is merely text on the page and not part of a link, and I am dealing with thousands of webpages. So I only want to change alumni.indiana.edu to alumnitest.iu.edu when it is found within a URL.

I agree it's a trivial change, but there are so many variations that I thought a regular expression would be the way to go.

What do you think?

Thanks
Abasey
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Does the :// identify the instances that need to be changed?

If so, then you can replace ://alumni.indiana.edu/ with ://alumnitest.alumni.indiana.edu/.

In one of your examples the host was www.alumni.indiana.edu. What does that get changed to?
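ben_josephs's suggestion needs no regular expression at all. As a minimal Python sketch (the sample `page` text is made up from the examples in this thread), a plain replace anchored on `://` leaves ordinary page text alone and never touches the scheme. It assumes every link to the host has a `/` right after the host name, and, as the question above points out, it does not cover www.alumni.indiana.edu.

```python
page = (
    'Visit alumni.indiana.edu today. '
    '<a href="https://alumni.indiana.edu/access/">Look up</a> '
    '<IMG SRC="http://www.alumni.indiana.edu/magazine/cover.jpg">'
)

# Plain page text has no "://" in front of the host, so it is left alone,
# and whatever precedes "://" (http or https) is outside the replaced text.
# "://www.alumni..." does not contain "://alumni...", so the www host is skipped.
fixed = page.replace("://alumni.indiana.edu/", "://alumnitest.alumni.indiana.edu/")
print(fixed)
```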
abasey
Posts: 7
Joined: Fri Dec 23, 2005 5:06 pm

same thing

Post by abasey »

hi ben_josephs

Yeah, it gets changed to the same thing.

I will check on it when I get back to the office tomorrow... and I'll probably have some more questions.

Thanks
abasey
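Since www.alumni.indiana.edu maps to the same target, ben_josephs's `://` replacement can be extended with an optional `www.` once regular-expression mode is used. A minimal Python sketch follows; the function name `retarget` is hypothetical, the case-insensitive flag is an assumption about what the pages need, and like the plain version it relies on a `/` following the host.

```python
import re

# "://", an optional "www.", the old host with its dots escaped, then "/".
# Whatever precedes "://" (http or https) is outside the match and is kept.
OLD_URL_HOST = re.compile(r"://(?:www\.)?alumni\.indiana\.edu/", re.IGNORECASE)


def retarget(text: str) -> str:
    """Hypothetical helper: point both old hosts at the test host."""
    return OLD_URL_HOST.sub("://alumnitest.alumni.indiana.edu/", text)


print(retarget('<IMG SRC="http://www.alumni.indiana.edu/magazine/cover.jpg">'))
# <IMG SRC="http://alumnitest.alumni.indiana.edu/magazine/cover.jpg">
```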