Newline characters in brackets



Jo3p
Posts: 1
Joined: Thu Nov 20, 2008 12:42 am


Post by Jo3p »

Hi,

I'm trying to clean up a huge XML file (150 MB, 30,000 root elements, 200,000+ lines) by throwing out all the tags I don't need.

My strategy was to match every root element using regular expressions, but I can't get this to work because:

1. I can't include newline characters inside brackets;
2. I can't seem to replace every newline with an empty string.
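For comparison, here is a minimal sketch of the root-element strategy in Python's `re` module, which does let a single match span several lines. It assumes a hypothetical root element named `record` that never nests inside itself; the real tag name from the file would go in its place.

```python
import re

# Hypothetical sample: "record" stands in for whatever the real root element is called.
SAMPLE = """<records>
<record id="1">
  <name>Alpha</name>
  <junk>drop me</junk>
</record>
<record id="2">
  <name>Beta</name>
</record>
</records>
"""

# re.DOTALL lets "." also match "\n", so one match can cover several lines.
# The non-greedy ".*?" stops at the first closing tag instead of the last one,
# and "\b" keeps "<record" from also matching "<records>".
RECORD_RE = re.compile(r"<record\b[^>]*>.*?</record>", re.DOTALL)


def find_records(text):
    """Return every <record>...</record> block, embedded newlines included."""
    return RECORD_RE.findall(text)


if __name__ == "__main__":
    for block in find_records(SAMPLE):
        print(repr(block))
```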

I replaced every newline character with an empty string using UltraEdit, but then TextPad couldn't open the file: "Line too long". When I finally removed only the newlines in between sub-root elements, but not those in between root elements, the file opened fine, but then TextPad started crashing on me every time I used regular expressions...
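If the goal is one root element per line (the layout that finally opened), a sketch in Python could strip the line breaks inside each root element while leaving the ones between root elements alone. Again `record` is a hypothetical placeholder tag, and the function works on the file contents held as one string, which assumes there is memory to spare for 150 MB of text.

```python
import re

# Assumption: "record" is a placeholder for the real root element name.
RECORD_RE = re.compile(r"<record\b[^>]*>.*?</record>", re.DOTALL)
# A newline plus any indentation (or further newlines) that follows it.
NEWLINE_RUN = re.compile(r"\n\s*")


def one_record_per_line(text):
    """Remove line breaks inside each root element but keep those between them."""
    return RECORD_RE.sub(lambda m: NEWLINE_RUN.sub("", m.group(0)), text)


SAMPLE = """<records>
<record id="1">
  <name>Alpha</name>
</record>
<record id="2">
  <name>Beta</name>
</record>
</records>
"""

if __name__ == "__main__":
    print(one_record_per_line(SAMPLE))
```

Each root element ends up on a line of its own, so no single line grows to the size of the whole file.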

My questions are:

- Why can't I use newline characters inside brackets with regular expressions?
- Is there any workaround for this? (Other than taking the newlines outside the brackets, a solution I found on this forum that is of no use to me.)
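In engines that support it, a newline inside brackets is ordinary. The sketch below uses Python's `re` module only to illustrate that behaviour; it says nothing about what TextPad's own engine accepts. A negated class such as `[^<]` already includes the newline, `\n` can be listed explicitly, and `[\s\S]` is a common idiom for "any character at all".

```python
import re

text = "<b>first\nsecond</b>"

# "." does not match "\n" unless re.DOTALL is given, so this finds nothing.
dot_match = re.search(r"<b>.*</b>", text)

# A negated class such as [^<] matches any character except "<", newline included.
negated_match = re.search(r"<b>[^<]*</b>", text)

# \n may also be listed explicitly inside brackets: word characters or newline.
explicit_match = re.search(r"<b>[\w\n]*</b>", text)

# [\s\S] ("whitespace or non-whitespace") matches any character, newline included.
any_match = re.search(r"<b>[\s\S]*?</b>", text)

if __name__ == "__main__":
    print(dot_match)  # None
    print(negated_match.group(0))
    print(explicit_match.group(0))
    print(any_match.group(0))
```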

Thanks in advance!

Jo3p
Bob Hansen
Posts: 1516
Joined: Sun Mar 02, 2003 8:15 pm
Location: Salem, NH

Post by Bob Hansen »

It would be easiest for us to help if you showed a sample of the file, before and after the changes, along with the exact Search/Replace strings you are using.

Why do you need to put end-of-line characters inside brackets?

With regular expressions enabled, using POSIX syntax, you can replace \n\n with nothing, which will remove many blank lines. You may need to run it multiple times.
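A sketch of the same cleanup in Python, assuming the file was read in text mode (so Windows `\r\n` line endings arrive as `\n`). A `{2,}` quantifier collapses a run of any length in a single pass. Note that it replaces each run with one newline rather than with nothing; replacing `\n\n` with nothing would also join the line before the gap to the line after it. Lines containing only spaces are not treated as blank here.

```python
import re


def squeeze_blank_lines(text):
    """Collapse every run of two or more newlines into a single newline.

    The quantifier {2,} handles runs of any length in one pass, which is
    what repeating a plain "\\n\\n" replacement achieves over several passes.
    """
    return re.sub(r"\n{2,}", "\n", text)


if __name__ == "__main__":
    print(repr(squeeze_blank_lines("a\n\n\n\nb\nc\n\n")))
```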

Have you checked out the TextPad HELP section on using Regular Expressions? Many expressions are common to other RegEx flavors, but there are also some limitations.
Hope this was helpful.............good luck,
Bob
talleyrand
Posts: 624
Joined: Mon Jul 21, 2003 6:56 pm
Location: Kansas City, MO, USA

Post by talleyrand »

TP uses a dated regular expression parser, which probably accounts for why you cannot use a newline. You might try WildEdit, also from Helios, as it uses the more modern and powerful Boost regex engine. Sorry I can't speak more forcefully, but my regex-fu is weak.
I choose to fight with a sack of angry cats.
LDR
Posts: 9
Joined: Thu Jul 03, 2003 6:04 am

SUVs while flexible are not earth movers ...

Post by LDR »

TextPad is dandy. I use it all the time.

But you need the right tool for the job!

Since you seem to understand regex:

Try Cygwin with gawk, or Perl: http://Cygwin.com

Free and Open Source Software (FOSS).
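Along the same lines, here is a sketch in Python (swapped in for gawk/Perl, and using an XML parser instead of regular expressions). It streams the file with `xml.etree.ElementTree.iterparse`, so the 150 MB never has to sit in an editor, and writes each root element on its own line with only the wanted child tags. The names `record` and `records` and the `KEEP` set are hypothetical placeholders.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical names: "record" is the root element to keep, KEEP lists the
# child tags worth retaining. Assumes no XML namespaces (namespaced tags
# would appear as "{uri}name").
ROOT_TAG = "record"
KEEP = {"name", "price"}


def filter_records(source, out):
    """Stream `source`; write each <record> with only its KEEP children to `out`."""
    out.write("<records>\n")
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != ROOT_TAG:
            continue
        slim = ET.Element(ROOT_TAG, elem.attrib)
        for child in elem:
            if child.tag in KEEP:
                child.tail = None  # drop the whitespace that followed the tag
                slim.append(child)
        # encoding="unicode" makes tostring return str; one record per line.
        out.write(ET.tostring(slim, encoding="unicode") + "\n")
        # Free the processed record so memory use stays roughly flat.
        elem.clear()
    out.write("</records>\n")


SAMPLE = b"""<records>
<record id="1">
  <name>Alpha</name>
  <junk>drop me</junk>
  <price>3</price>
</record>
</records>"""

if __name__ == "__main__":
    buf = io.StringIO()
    filter_records(io.BytesIO(SAMPLE), buf)
    print(buf.getvalue())
```

For a real file, something like `with open("big.xml", "rb") as src, open("slim.xml", "w", encoding="utf-8") as dst: filter_records(src, dst)` would do, with both file names made up.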