Selecting blocks of tagged (e.g. XML) text

Post by **TobyR** » Fri Aug 18, 2006 8:18 am

Hi all,
I know there have been a *lot* of posts about finding and replacing text that is delimited by tags (e.g. HTML, XML) but spans multiple lines.

I have a large document (8k lines) in which I use various tags, e.g. <PROJ1> to start a block on Project 1, and </> to close an open tag (one generic close tag is all I need).

In most respects the large size of the document is not a problem for TextPad. However, I run into problems when I try to find blocks of text using the method recommended elsewhere in this forum, i.e. temporarily replacing every '\n' with an identifier (~#~). After the temporary replacement, what I try to do is find all text lying between a close tag and the next open tag of interest (<PROJ1> here) and delete it.

So:

Find what: </>.*<PROJ1>
Replace with: </>\n<PROJ1>

This works for small blocks of text, but when I try it on my main file I get the message 'recursion too deep', which I can sort of understand because there can be a lot of lines between the blocks of interest.
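For illustration only, here is a rough Python sketch of the temporary-marker workaround (it is not how TextPad works internally). Python's '.' does not match a newline by default, so folding every newline into the ~#~ marker is what lets the pattern span lines. The sketch uses the non-greedy .*? instead of the .* above so that each gap is removed separately, and it assumes ~#~ never appears in the real text. The function name strip_with_marker is hypothetical.

```python
import re

MARKER = "~#~"  # temporary stand-in for newlines; assumed absent from the real text


def strip_with_marker(text: str) -> str:
    """Delete everything between each </> and the next <PROJ1>, using the
    replace-newlines-with-a-marker workaround."""
    flat = text.replace("\n", MARKER)  # one long line, so '.' can span the old lines
    # Non-greedy .*? stops at the first <PROJ1> after each </>.
    flat = re.sub(r"</>.*?<PROJ1>", "</>" + MARKER + "<PROJ1>", flat)
    return flat.replace(MARKER, "\n")  # restore the real newlines


sample = "<PROJ1>alpha</>junk\nmore junk<PROJ2>beta</>\n<PROJ1>gamma</>tail"
print(repr(strip_with_marker(sample)))  # '<PROJ1>alpha</>\n<PROJ1>gamma</>tail'
```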

I tried WildEdit, but (a) I couldn't get it to do what I want, and (b) since the trial version is limited to 10 KB documents, I have no way of knowing whether the recursion problem will ... recur!
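If regex recursion limits on big files are the worry, one option is to skip regular expressions altogether. This is a different technique from the regex replace, sketched here under stated assumptions: a single left-to-right scan with str.find involves no backtracking and no recursion, so its work should grow roughly in proportion to the file size. It is intended to make the same edit as replacing </>.*?<PROJ1> with </>\n<PROJ1> where '.' also matches newlines. The function and parameter names are hypothetical.

```python
def strip_between(text: str, close: str = "</>", open_tag: str = "<PROJ1>") -> str:
    """Replace each span from `close` to the next `open_tag` with
    close + newline + open_tag, scanning the text once without regexes."""
    out = []  # pieces of the result, joined at the end
    pos = 0   # start of the not-yet-copied remainder
    while True:
        c = text.find(close, pos)
        if c == -1:
            break  # no more close tags
        o = text.find(open_tag, c + len(close))
        if o == -1:
            break  # a close tag with no later open tag: leave the tail alone
        out.append(text[pos:c])              # keep text up to the close tag
        out.append(close + "\n" + open_tag)  # drop the gap, keep both tags
        pos = o + len(open_tag)              # resume after the open tag
    out.append(text[pos:])
    return "".join(out)


sample = "<PROJ1>alpha</>junk\nmore junk<PROJ2>beta</>\n<PROJ1>gamma</>tail"
print(repr(strip_between(sample)))  # '<PROJ1>alpha</>\n<PROJ1>gamma</>tail'
```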

Can anyone help me out here? I really feel (and I believe the number of other posts on the subject supports this) that the ability to handle such tags is important for TextPad's continued success, as marked-up documents such as XML appear to be the future of document interchange.

Post by **ben_josephs** » Fri Aug 18, 2006 9:17 am

In WildEdit, without replacing \n with anything:

Find what: </>.*?<PROJ1>
Replace with: </>\n<PROJ1>

[X] Regular expression
[X] Replacement format

Options
[ ] '.' does not match a newline character [i.e., not selected]
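For checking these settings outside WildEdit, here is a Python sketch of the same find/replace. It assumes Python's re module behaves like WildEdit's engine for this simple pattern. Leaving "'.' does not match a newline character" unticked corresponds roughly to Python's re.DOTALL flag; without it, the match cannot cross line breaks.

```python
import re

PATTERN = r"</>.*?<PROJ1>"  # the "Find what" field
REPL = "</>\n<PROJ1>"       # the "Replace with" field (a real newline)

text = "<PROJ1>one</>\nnoise\n<PROJ1>two</>"

# Option unticked: '.' also matches newlines, so the gap spanning lines is removed.
print(repr(re.sub(PATTERN, REPL, text, flags=re.DOTALL)))  # '<PROJ1>one</>\n<PROJ1>two</>'

# Option ticked (Python's default): '.' stops at newlines, so nothing matches here.
print(repr(re.sub(PATTERN, REPL, text)))  # '<PROJ1>one</>\nnoise\n<PROJ1>two</>'
```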

Note the use of a non-greedy repeat (.*?).
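To see why the non-greedy repeat matters, here is a small Python comparison (Python's re supports both forms; the sample text is made up). With three Project 1 blocks, the greedy .* runs from the first </> to the last <PROJ1>, deleting the middle block along with the gaps, whereas .*? stops at the nearest <PROJ1>, so each gap is removed on its own.

```python
import re

SAMPLE = "<PROJ1>a</>x<PROJ1>b</>y<PROJ1>c</>"
REPL = "</>\n<PROJ1>"

# Greedy: one match from the first </> to the LAST <PROJ1>; block "b" is lost.
greedy = re.sub(r"</>.*<PROJ1>", REPL, SAMPLE, flags=re.DOTALL)
# Non-greedy: each match ends at the NEAREST <PROJ1>; every block survives.
lazy = re.sub(r"</>.*?<PROJ1>", REPL, SAMPLE, flags=re.DOTALL)

print(repr(greedy))  # '<PROJ1>a</>\n<PROJ1>c</>'
print(repr(lazy))    # '<PROJ1>a</>\n<PROJ1>b</>\n<PROJ1>c</>'
```

The same difference would apply to the TextPad pattern </>.*<PROJ1> once the newlines have been folded into a marker.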

Post by **TobyR** » Fri Aug 18, 2006 11:21 am

Thanks, I'll give it a go! A few quick questions, if you don't mind.

1) Are you confident this will work for large files? (I can't test it with the eval version!)
2) In WildEdit, how can I simply select one file and output the results to a second (new) file?
3) Just out of interest, what's a 'non-greedy' repeat? Is it particular to WildEdit?

Thanks again!

Post by **ben_josephs** » Fri Aug 18, 2006 11:50 am

1) I presume so. If it works on a small file, I don't see why it shouldn't work on a big one. I don't have a registered copy of WildEdit, as I only use it to check answers to questions about its regular expressions.
2) I don't know. See 1.
3) Try searching for "non-greedy repeat" (with the quotes) in WildEdit's help. Many modern regex recognisers support it.
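Question 2 is left open in the thread. As an alternative outside WildEdit, a short Python script can read one file and write the result to a new one, leaving the original untouched. This is only a sketch: the function name filter_file, the UTF-8 default, and the example file names below are assumptions, and it says nothing about WildEdit's own options.

```python
import re
from pathlib import Path

# Same find/replace as the WildEdit settings: non-greedy, '.' matches newlines.
GAP = re.compile(r"</>.*?<PROJ1>", re.DOTALL)


def filter_file(src: str, dst: str, encoding: str = "utf-8") -> int:
    """Read src, remove each gap between a </> and the next <PROJ1>, and
    write the result to dst; src itself is not modified.
    Returns the number of gaps removed."""
    text = Path(src).read_text(encoding=encoding)
    result, count = GAP.subn("</>\n<PROJ1>", text)
    Path(dst).write_text(result, encoding=encoding)
    return count
```

For example, filter_file("projects.txt", "proj1_only.txt") would leave projects.txt as it is and write the filtered text to proj1_only.txt (both names hypothetical).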