
Newline characters in brackets

Posted: Thu Nov 20, 2008 9:26 am
by Jo3p
Hi,

I'm trying to clean up a huge XML file (150 MB, 30,000 root elements, 200,000+ lines) by throwing out all the tags I don't need.

My strategy was to match every root element using regular expressions, but I can't get this to work because:

1. I can't include newline characters inside brackets;
2. I can't seem to replace every newline with an empty string;

I replaced every newline character with an empty string using UltraEdit, but then I couldn't open the file in TextPad: "Line too long". When I finally removed the newlines between subroot elements, but not between root elements, the file opened fine, but then TextPad started crashing on me every time I used regular expressions...

My questions are:

- Why can't I use newline characters inside brackets with regular expressions?
- Is there any workaround for this? (Other than moving the newlines outside the brackets, a solution I found on this forum that is of no use to me.)

Thanks in advance!

Jo3p

Posted: Thu Nov 20, 2008 3:55 pm
by Bob Hansen
It would be easiest for us to help if you showed a sample of the file, before and after the changes, along with the exact Search/Replace strings that you are using.

Why do you need to put end of line inside brackets?

With regular expression enabled, using POSIX, you can replace \n\n with nothing which will remove many blank lines. You may need to run multiple times.
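
For anyone who wants to check the effect outside TextPad, here is a rough sketch in Python. It assumes Python's `re` module, not TextPad's POSIX engine, and the sample text is made up. It shows that replacing `\n\n` with nothing also joins the lines on either side of a blank line, while replacing a run of newlines with a single `\n` keeps them apart.

```python
import re

# Made-up sample: three tags separated by blank lines.
sample = "<a>\n\n<b>\n\n\n<c>\n"

# Replacing each pair of newlines with nothing also joins the neighbouring lines.
joined = sample.replace("\n\n", "")

# Collapsing every run of newlines to one newline keeps one tag per line.
collapsed = re.sub(r"\n{2,}", "\n", sample)

print(repr(joined))
print(repr(collapsed))
```

The second form only needs one pass, whereas a plain `\n\n` replacement may have to be repeated until no pairs remain.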

Have you checked out the TextPad HELP section on using Regular Expressions? Many expressions are common to other versions of RegEx, but there are some limitations also.

Posted: Thu Nov 20, 2008 3:57 pm
by talleyrand
TP uses a dated regular expression parser which probably accounts for why you cannot use a newline. You might try WildEdit, also from Helios, as it uses the more modern and powerful Boost regex engine. Sorry I can't speak more forcefully but my regex-fu is weak.
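
To illustrate what a newer engine allows, here is a minimal sketch using Python's `re` module (not WildEdit or Boost, which I have not tested here). The `<record>`, `<keep>` and `<drop>` tag names are invented for the example. In this engine a bracket expression can match newlines, so a single pattern can span a whole multi-line element.

```python
import re

# Made-up sample with two multi-line root elements.
xml = (
    "<record>\n  <keep>1</keep>\n  <drop>x</drop>\n</record>\n"
    "<record>\n  <keep>2</keep>\n</record>\n"
)

# [\s\S] is a bracket expression that matches any character, newline included.
record_re = re.compile(r"<record>[\s\S]*?</record>")

# A negated bracket expression such as [^<] also matches newlines in Python.
drop_re = re.compile(r"\s*<drop>[^<]*</drop>")

records = record_re.findall(xml)   # one string per root element
cleaned = drop_re.sub("", xml)     # the same text without the <drop> elements
```

Regex-based editing like this only works for simple, non-nested tags; it is a sketch of the bracket behaviour, not a general XML cleaner.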

SUVs while flexible are not earth movers ...

Posted: Thu Nov 27, 2008 6:54 am
by LDR
:roll: TextPad is dandy. I use it all the time.

But, you need the right tool for the job! :idea:

Since you seem to understand regex:

Try gawk (available through Cygwin, http://Cygwin.com) or Perl:

Free and Open Source Software (FOSS). :D
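
In the same "right tool" spirit, here is a sketch of a streaming cleanup. It uses Python's standard `xml.etree.ElementTree.iterparse` instead of gawk or Perl, purely as an illustration. It assumes the file is well-formed, that the repeated elements sit inside a single wrapping element, and that the unwanted tags are direct children of each record. The names `record`, `drop`, `UNWANTED` and `clean_records` are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

UNWANTED = {"drop"}  # hypothetical tag names to throw out


def clean_records(source, record_tag="record"):
    """Yield each record as a string with its unwanted direct children removed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        for child in list(elem):
            if child.tag in UNWANTED:
                elem.remove(child)
        yield ET.tostring(elem, encoding="unicode")
        elem.clear()  # drop the record's contents so memory use stays low


# Usage with a tiny in-memory sample instead of the real 150 MB file.
sample = io.BytesIO(
    b"<root><record><keep>1</keep><drop>x</drop></record>"
    b"<record><keep>2</keep></record></root>"
)
result = list(clean_records(sample))
```

Because the parser reads the input incrementally, nothing depends on line length or on where the newlines fall.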