Newbie - find / replace question

Posted: Thu Jun 28, 2007 11:36 pm
by kessa
Hi Guys,

I wonder if someone can help me with the following?

I'm trying to do a find and replace to update some encoded characters, but I can't seem to get it to work, so I suspect I'm doing something wrong (but I don't know what).

Here's what I'm using for the find command:

Code:

\<(?:(&shy;)|(&nbsp;)|(&lt;)|(&gt;)|(&amp;)|(')|(&quot;)|(&Aacute;))\>
... and here's what I'm trying to replace it with...

Code:

?1(&# 173;):?2(&# 160;):?3(&# 38;):?4(&# 62;):?5(&# 38;):?6(&# 39;):?7(&# 34;):?8(&# 193;)
Note: I've included a space between the # and the number so that the code displays in the forum - in the actual code there is no space.

Does anyone know where I'm going wrong?

Thanks
Kessa

Posted: Fri Jun 29, 2007 2:43 am
by Bob Hansen
We need to see samples of the "before" and "after" to help you with this.

Posted: Fri Jun 29, 2007 9:07 am
by ben_josephs
\< matches at the beginning of a word, defined as the empty string between a non-word character (or beginning of the line) and a word character.
\> matches at the end of a word, defined as the empty string between a word character and a non-word character (or end of the line).

& is not a word character, so \< cannot match in front of it.
; is not a word character, so \> cannot match behind it.

(\< and \> appear to be a little buggy, but that's another matter.)
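
To see why the anchored version can never match, here is a minimal Python sketch. Python's `re` module has no `\<` or `\>`, so `WORD_START` and `WORD_END` below are emulations built from `\b` and lookarounds (an assumption, not TextPad's implementation), and the sample text is made up for illustration.

```python
import re

# Emulated anchors (assumption: Python has no \< or \> of its own).
WORD_START = r"\b(?=\w)"   # empty string followed by a word character
WORD_END = r"\b(?<=\w)"    # empty string preceded by a word character

ENTITIES = r"(?:&nbsp;|&amp;|&quot;)"

anchored = re.compile(WORD_START + ENTITIES + WORD_END)
plain = re.compile(ENTITIES)

# Hypothetical sample text containing three entities.
sample = "fish&nbsp;&amp;&nbsp;chips"


def count_matches(pattern, text):
    """Return how many non-overlapping matches the pattern finds."""
    return len(pattern.findall(text))


# The start anchor demands a word character next, but every alternative
# begins with '&', so the anchored pattern is unsatisfiable.
print(count_matches(anchored, sample))
print(count_matches(plain, sample))
```

Dropping the anchors is enough here because each entity already starts with `&` and ends with `;`, so there is little risk of matching inside a longer word.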

Try
Find what: (&shy;)|(&nbsp;)|(&lt;)|(&gt;)|(&amp;)|(')|(&quot;)|(&Aacute;)
Replace with: ?1(&# 173;):?2(&# 160;):?3(&# 60;):?4(&# 62;):?5(&# 38;):?6(&# 39;):?7(&# 34;):?8(&# 193;) [Remove the spaces]

[X] Regular expression
[X] Replacement format
Or simply
Find what: &shy;|&nbsp;|&lt;|&gt;|&amp;|'|&quot;|&Aacute;
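
For comparison, the same "which alternative matched?" replacement can be sketched outside TextPad. The Python below is only an illustration: the names `NUMERIC` and `to_numeric` are hypothetical, and the numeric codes are the standard code points for each character (for example 60 for `<` and 38 for `&`). A dictionary lookup plays the role of the `?n(...)` conditions.

```python
import re

# Named entity (or literal apostrophe) -> numeric character reference.
NUMERIC = {
    "&shy;": "&#173;",
    "&nbsp;": "&#160;",
    "&lt;": "&#60;",
    "&gt;": "&#62;",
    "&amp;": "&#38;",
    "'": "&#39;",
    "&quot;": "&#34;",
    "&Aacute;": "&#193;",
}

# One alternation over all keys; re.escape guards any regex metacharacters.
PATTERN = re.compile("|".join(re.escape(key) for key in NUMERIC))


def to_numeric(text):
    """Replace each known entity with its numeric reference in one pass."""
    return PATTERN.sub(lambda match: NUMERIC[match.group(0)], text)


print(to_numeric("Caf&Aacute;&nbsp;&amp; 'x'"))
```

Because the substitution is a single pass, the `&` inside an inserted `&#38;` is not scanned again.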

Posted: Fri Jun 29, 2007 12:04 pm
by kessa
Hi Ben,

Thanks for that - I'll give it a shot when I get home tonight.

1 quick question - do I also need to remove the opening "(?:" and the closing ")" ?

Cheers
Kessa

Posted: Fri Jun 29, 2007 12:54 pm
by ben_josephs
No, you can leave them in if you like. On their own they have no effect. They only matter when the whole expression sits inside a larger regex, as in your original version, where they limit how far the alternation extends. They do no harm either, since (?:...) groups don't capture subexpressions.
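
A small Python sketch of both points, using made-up patterns rather than the ones from this thread: inside a larger regex the group limits the alternation, and a `(?:...)` group adds no capture group.

```python
import re

# Inside a larger regex, the group limits how far '|' reaches.
grouped = re.compile(r"<(?:b|i)>")    # "<b>" or "<i>"
ungrouped = re.compile(r"<b|i>")      # "<b" or "i>"

print(bool(grouped.fullmatch("<i>")))     # group keeps '<' and '>' on both branches
print(bool(ungrouped.fullmatch("<i>")))   # without it, neither branch covers "<i>"

# A non-capturing group does not add to the capture-group count,
# so numbered references in the replacement are unaffected.
print(grouped.groups)
print(re.compile(r"<(b|i)>").groups)
```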

Posted: Fri Jun 29, 2007 7:55 pm
by kessa
Hi Ben,
Just a quickie to say that unfortunately that didn't work:

Here's the result I got back:

Code:

	Character Encoding:  windows-1252
	Root folder:  C:\inetpub\wwwroot\myfolder\xml
	File Filter:  myfile.xml
	Regular Expression:  true
	Replacement Format:  true
	Match Case: false
	Match Words: false
	Search Subfolders:  false
}
C:/inetpub/wwwroot/myfolder/xml/myfile.xml: 0 replacements made
Number of files searched: 1
Number of files modified: 0
Total changes made: 0
Any ideas?

Thanks
Kessa

Posted: Fri Jun 29, 2007 9:30 pm
by ben_josephs
No, because you haven't shown us the text you're searching.

Posted: Fri Jul 13, 2007 4:08 pm
by kessa
Hi Ben,

It's quite a big file and so I've reduced it to just 3 examples which hopefully should provide enough info?

I've tried adding it to this post, but the forum converts the encoding, so you may not get the same results. Instead, I've put a copy of the file up here (http://www.discoveryvillas.co.uk/temp.txt) temporarily so that you can take a look. I've saved it as a .txt file for the time being, but the file itself is normally .xml. I've also had to remove the

Code:

<?xml version="1.0" encoding="ISO-8859-1"?>
from the first line, otherwise IE renders it as XML anyway.

Cheers!
Kessa

Posted: Fri Jul 20, 2007 3:16 pm
by ben_josephs
It works here.

Did you use
Find what: (&shy;)|(&nbsp;)|(&lt;)|(&gt;)|(&amp;)|(')|(&quot;)|(&Aacute;)
Replace with: ?1(&# 173;):?2(&# 160;):?3(&# 60;):?4(&# 62;):?5(&# 38;):?6(&# 39;):?7(&# 34;):?8(&# 193;) [Remove the spaces]

[X] Regular expression
[X] Replacement format
?

Posted: Wed Jul 25, 2007 8:51 pm
by kessa
Hi Ben,

Just a quick update on this - it seems that it works when I change the Character Encoding from the default to ISO-8859-1 - not sure why?
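
The thread does not establish why the encoding setting made a difference. As background only, the Python sketch below shows where windows-1252 and ISO-8859-1 actually differ, using Python's own codecs (`cp1252` and `iso-8859-1`); how TextPad treats these bytes is not covered by this sketch, and the helper names are hypothetical.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if the bytes decode cleanly under the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def differing_bytes():
    """Byte values that decode differently (or fail) in cp1252 vs ISO-8859-1."""
    result = []
    for value in range(256):
        raw = bytes([value])
        latin = raw.decode("iso-8859-1")   # every byte is defined in ISO-8859-1
        try:
            windows = raw.decode("cp1252")
        except UnicodeDecodeError:
            windows = None                 # byte is undefined in Python's cp1252
        if windows != latin:
            result.append(value)
    return result


# The two encodings disagree only in the 0x80-0x9F range; the ASCII
# characters that make up the entities themselves are identical in both.
print([hex(v) for v in differing_bytes()])
print(decodes_as(b"\x81", "cp1252"), decodes_as(b"\x81", "iso-8859-1"))
```

If a file declared as ISO-8859-1 contains bytes in that range, reading it as windows-1252 can give different characters or decoding problems, which is one possible (unconfirmed) reason a tool might behave differently between the two settings.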

Cheers
Kieran