Hi,
I'm trying to clean up a huge XML file (150 MB, 30,000 root elements, 200,000+ lines) by throwing out all the tags I don't need.
My strategy was to match every root element using regular expressions, but I can't get this to work because:
1. I can't include newline characters inside brackets;
2. I can't seem to replace every newline with an empty string;
I have replaced every newline character with an empty string using UltraEdit, but then I can't open the file in TextPad: "Line too long". When I finally removed the newlines between subroot elements, but not between root elements, the file opened fine, but then TextPad started crashing on me every time I used regular expressions...
My questions are:
- Why can't I use newline characters inside brackets with regular expressions?
- Is there any workaround for this? (Other than taking the newlines outside the brackets, a solution I found on this forum that is of no use to me.)
Thanks in advance!
Jo3p
Newline characters in brackets
- Bob Hansen
- Posts: 1516
- Joined: Sun Mar 02, 2003 8:15 pm
- Location: Salem, NH
It would be easiest for us to help if you showed a sample of the file, before and after the changes, along with the exact Search/Replace strings that you are using.
Why do you need to put end of line inside brackets?
With regular expressions enabled, using POSIX syntax, you can replace \n\n with \n, which will remove many blank lines. You may need to run it multiple times.
Have you checked out the TextPad HELP section on using Regular Expressions? Many expressions are common to other versions of RegEx, but there are some limitations also.
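For comparison outside TextPad, here is a minimal Python sketch of the same blank-line cleanup. It is only an illustration, not TextPad syntax: the function name `collapse_blank_lines` is made up for this example, and it assumes the file uses LF or CRLF line endings. Because the pattern matches a whole run of line breaks at once, a single pass is enough.

```python
import re


def collapse_blank_lines(text: str) -> str:
    """Collapse every run of two or more line breaks into a single newline."""
    # (?:\r?\n){2,} matches two or more consecutive line breaks (LF or CRLF).
    return re.sub(r"(?:\r?\n){2,}", "\n", text)


if __name__ == "__main__":
    sample = "<a>\n\n\n<b/>\r\n\r\n</a>\n"
    print(repr(collapse_blank_lines(sample)))
```

Note that lines containing only spaces or tabs are not treated as blank by this pattern.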
Hope this was helpful.............good luck,
Bob
- talleyrand
- Posts: 624
- Joined: Mon Jul 21, 2003 6:56 pm
- Location: Kansas City, MO, USA
TextPad uses a dated regular expression parser, which probably accounts for why you cannot use a newline inside brackets. You might try WildEdit, also from Helios, as it uses the more modern and powerful Boost regex engine. Sorry I can't speak more forcefully, but my regex-fu is weak.
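To show what a newer engine allows, here is a small Python sketch (Python's `re` module, not WildEdit or Boost) in which a bracket expression does match newlines. The helper `strip_element` and the tag names in the demo are hypothetical. Treat it as a rough illustration only: a regex like this breaks on nested elements with the same name, on self-closing tags, and on CDATA sections.

```python
import re


def strip_element(xml_text: str, tag: str) -> str:
    """Remove every <tag>...</tag> element, even when it spans several lines."""
    t = re.escape(tag)
    # [\s\S] is a bracket expression that matches any character, newlines
    # included; *? keeps the match lazy so it stops at the first closing tag.
    pattern = re.compile(rf"<{t}\b[^>]*>[\s\S]*?</{t}>")
    return pattern.sub("", xml_text)


if __name__ == "__main__":
    record = "<item>\n  <name>x</name>\n  <notes>\n    long\n    text\n  </notes>\n</item>\n"
    print(strip_element(record, "notes"))
```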
I choose to fight with a sack of angry cats.
SUVs, while flexible, are not earth movers ...
TextPad is dandy. I use it all the time.
But you need the right tool for the job!
Since you seem to understand regex:
Try gawk under Cygwin (http://Cygwin.com), or Perl.
Both are Free and Open Source Software (FOSS).
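In the same "right tool" spirit, a scripting language with a streaming XML parser avoids both the newline problem and the file-size problem. The sketch below uses Python's standard `xml.etree.ElementTree.iterparse` rather than gawk or Perl. The function `drop_tags` and the tag names `rec`, `keep` and `junk` are invented for the example, and it assumes the records are direct children of the document element. It also drops attributes on the document element and any whitespace that followed a removed tag.

```python
import io
import xml.etree.ElementTree as ET


def drop_tags(source, sink, record_tag, unwanted):
    """Stream records from source, strip unwanted elements, write them to sink."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the document element
    sink.write(f"<{root.tag}>\n")
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            # Snapshot the subtree first so removal does not disturb iteration.
            for parent in list(elem.iter()):
                for child in list(parent):
                    if child.tag in unwanted:
                        parent.remove(child)
            elem.tail = None  # ignore whitespace between records
            sink.write(ET.tostring(elem, encoding="unicode") + "\n")
            root.clear()  # release finished records to keep memory flat
    sink.write(f"</{root.tag}>\n")


if __name__ == "__main__":
    src = io.BytesIO(b"<root><rec><keep>1</keep><junk>x</junk></rec></root>")
    out = io.StringIO()
    drop_tags(src, out, "rec", {"junk"})
    print(out.getvalue())
```

For a real file, `source` would be a path or a file opened in binary mode and `sink` a file opened for writing text; only one record is held in memory at a time.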