Deleting batches by lines

harrycornelius · Post by **harrycornelius** » Thu Feb 08, 2007 3:21 pm

Hi All

I have a rather strange request.

I need to delete lines 2-40, 42-80, 82-120 and so on...

And I then need to delete lines 51-2000, 2051-4000, 4051-6000 and so on...

Can anyone shed some light on the best way to do this? If I can crack this one, it will mean a great deal.

Thanks to everyone in advance.

Harry

Bob Hansen · Post by **Bob Hansen** » Thu Feb 08, 2007 5:25 pm

Regarding this section:

And I then need to delete lines 51-2000, 2051-4000, 4051-6000 and so on...

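It looks like you are trying to keep the first 50 lines of every 2000 and delete the 1950 lines that follow.

Since TextPad RegEx cannot apply a repeat count to \n, you could replace \n with another unique character that does not occur in the file, like "~", and then do a search for something like ([^~]*~){1950} blocks. (A plain .* would be greedy and run past the next "~" once the whole file is a single line.)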

Go to line 51
Find, find next, delete, find next, find next, delete, find next, find next, delete, etc....

When done, replace the "~" with \n and you should be done.
-----------------------------------------

Hmmmm, won't work, because you will still need to move past another 50 lines before each next find. Maybe do two finds. The first one finds the 50 lines to keep, then the next one finds the 1950 to be deleted.
Find 50, find 1950, delete, find 50, find 1950, delete, etc.

This probably sounds more confusing than I mean it to. I don't have access to TextPad to test this out right now, but maybe the ideas are making sense and you can make it happen.

ben_josephs will probably show up with a much easier method for you.

ben_josephs · Post by **ben_josephs** » Thu Feb 08, 2007 10:44 pm

These suggestions may be easier, but it could be that they're not the suggestions you want.

This isn't the sort of thing that TextPad is best at. I would write a quick script to do it. For example, in Perl, for the first job:

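```perl
# Job 1: keep lines 1, 41, 81, ... and drop the 39 lines after each.
my $i = 1 ;
while ( my $line = <> )    # read one line at a time instead of slurping the file
{ if ( ( $i % 40 ) == 1 )
  { print $line ;
  }
  ++ $i ;
}
```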

or, more obscurely,

```perl
# <> used alone as the loop test reads the next line into $_
for ( my $i = 1 ; <> ; ++ $i )
{ ( ( $i % 40 ) == 1 ) and print $_ ;
}
```

For the second job:

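```perl
# Job 2: keep lines 1-50, 2001-2050, 4001-4050, ... and drop the rest.
my $i = 1 ;
while ( my $line = <> )    # read one line at a time instead of slurping the file
{ if ( ( ( $i - 1 ) % 2000 ) < 50 )
  { print $line ;
  }
  ++ $i ;
}
```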

or

```perl
for ( my $i = 1 ; <> ; ++ $i )
{ ( ( ( $i - 1 ) % 2000 ) < 50 ) and print $_ ;
}
```
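For readers without Perl, the same two keep-rules can be sketched in Python. This is only an illustration of the modulo tests above, not part of the original reply; the names `keep_lines`, `first_of_every_40` and `first_50_of_every_2000` are made up for the example. It handles the input one line at a time, so memory use should not depend on the file size.

```python
import io
from typing import Callable, Iterable, Iterator


def keep_lines(lines: Iterable[str], keep: Callable[[int], bool]) -> Iterator[str]:
    """Yield only the lines whose 1-based line number satisfies keep()."""
    for number, line in enumerate(lines, start=1):
        if keep(number):
            yield line


def first_of_every_40(number: int) -> bool:
    # Job 1: keep lines 1, 41, 81, ... (deletes 2-40, 42-80, ...)
    return number % 40 == 1


def first_50_of_every_2000(number: int) -> bool:
    # Job 2: keep lines 1-50, 2001-2050, ... (deletes 51-2000, 2051-4000, ...)
    return (number - 1) % 2000 < 50


# Small self-check on a synthetic file whose lines are just their own numbers.
sample = io.StringIO("".join(f"{n}\n" for n in range(1, 4101)))
kept = [int(s) for s in keep_lines(sample, first_of_every_40)]
print(kept[:4])  # [1, 41, 81, 121]
```

On a real file, `keep_lines` could be fed an open file object and its output written straight to a second file.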

Alternatively, you could use WildEdit (http://www.textpad.com/products/wildedit/):

Find what: (.*\r?\n)(.*\r?\n){39}
Replace with: $1

[X] Regular expression
[X] Replacement format

Options
[X] '.' does not match a newline character

and

Find what: ((.*\r?\n){50})(.*\r?\n){1950}
Replace with: $1

The \r?\n bit allows it to cope with files with either DOS or Unix line endings.

You'll have to buy a licence for WildEdit to use it for files of the size you've indicated.
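The logic of the two patterns above can be tried out on a small sample with Python's `re` module before running them on a large file. This is a sketch under assumptions: Python's engine is not WildEdit's, Python writes the replacement as `\1` rather than `$1`, and the helper name `thin` is hypothetical. By default `.` does not match a newline in Python, which corresponds to the option ticked above.

```python
import re


def thin(text: str, keep: int, drop: int) -> str:
    """Keep the first `keep` lines of every (keep + drop)-line block.

    Same shape as the WildEdit patterns: group 1 holds the lines to keep,
    the remaining repetitions match the lines to delete.
    """
    pattern = r"((.*\r?\n){%d})(.*\r?\n){%d}" % (keep, drop)
    return re.sub(pattern, r"\1", text)


# DOS line endings: keep 1 line, drop the next 2.
print(repr(thin("a\r\nb\r\nc\r\nd\r\ne\r\nf\r\n", 1, 2)))  # 'a\r\nd\r\n'

# Unix line endings, first job at full scale on an 80-line sample.
sample = "".join(f"{n}\n" for n in range(1, 81))
print(repr(thin(sample, 1, 39)))  # '1\n41\n'
```

In this sketch, a trailing group of lines shorter than `keep + drop` is not matched and is therefore left in place, so it is worth checking the end of the output file.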

harrycornelius · Post by **harrycornelius** » Tue Feb 13, 2007 12:18 pm

Thanks for the help.

I have purchased WildEdit, and will try it in there.

I'll pass on any relevant info.

Thanks everyone,

Harry

harrycornelius · Post by **harrycornelius** » Wed Feb 14, 2007 5:03 pm

Hi Ben

That has worked a treat, I can't thank you enough as I've been trying to find a solution to this for ages.

I'm writing an article for a company that provide height data to local authorities - and I'll give WildEdit/TextPad a plug, if this is OK with you. This article goes out to all Local Authorities in the UK.

It means that a previously 100 MB plus height data file at 5 m intervals now becomes a much more usable file at 65 KB and 200 m grid intervals - which is good enough for most applications, especially where the land coverage is large. These grids are then draped with JPEG aerial photography and brought into VRML, where buildings, trees, digital photos etc. can be added.

I thought you might be interested in the application.

Thanks again,
Harry

ben_josephs · Post by **ben_josephs** » Wed Feb 14, 2007 5:51 pm

You're welcome.

I have no problem with your plugging Helios's products. I have no connection with them except that I use TextPad.

harrycornelius · Post by **harrycornelius** » Tue May 01, 2007 10:30 am

OK, I'm now trying to create a 100m grid using the following syntax:

Stage1: (.*\r?\n)(.*\r?\n){19}
Replace with: $1

and

Stage2: ((.*\r?\n){100})(.*\r?\n){1900}
Replace with: $1

The first stage works fine, but the second stage 'hangs' and gives me a 'memory exhausted' message. Does anyone have any ideas?

Thanks,
Harry
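The repeat counts in the two stages follow from the grid arithmetic, which can be sketched as below. The 5 m source spacing comes from Harry's earlier post; the figure of 2000 points per source row is an assumption inferred from the line counts in this thread (40 x 50 = 20 x 100 = 2000), and the function name `thinning_stages` is hypothetical.

```python
from typing import Tuple


def thinning_stages(source_spacing_m: int, target_spacing_m: int,
                    points_per_row: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return ((keep, delete) for stage 1, (keep, delete) for stage 2)."""
    factor, remainder = divmod(target_spacing_m, source_spacing_m)
    if remainder or points_per_row % factor:
        raise ValueError("target spacing and row length must divide evenly")
    # Stage 1 thins along each row: keep 1 point, delete the next factor - 1.
    stage1 = (1, factor - 1)
    # Stage 2 thins the rows: keep one thinned row, delete the next factor - 1 rows.
    row_after = points_per_row // factor
    stage2 = (row_after, row_after * (factor - 1))
    return stage1, stage2


# Assumed row length of 2000 points (see the note above).
print(thinning_stages(5, 200, 2000))  # ((1, 39), (50, 1950))
print(thinning_stages(5, 100, 2000))  # ((1, 19), (100, 1900))
```

Under that assumption the counts 19, 100 and 1900 used for the 100 m grid are consistent with the 39, 50 and 1950 used for the 200 m grid.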

ben_josephs · Post by **ben_josephs** » Tue May 01, 2007 11:59 am

I don't actually use WildEdit, so I haven't got a licence for it. Therefore I can't test it on files as big as yours.

But you might try
((?:.*\r?\n){100})(?:.*\r?\n){1900}
which doesn't capture subexpressions you're not going to use.

If your files have DOS line endings you can use
((?:.*\r\n){100})(?:.*\r\n){1900}
and if they have Unix line endings you can use
((?:.*\n){100})(?:.*\n){1900}
although I doubt that will make much difference.
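As a quick check that dropping the unused captures does not change the result, the capturing and non-capturing forms can be compared on a scaled-down sample in Python. This is a sketch only: Python's `re` is a different engine from WildEdit's, the helper name `apply_pattern` is made up, and whether the change cures the 'memory exhausted' message in WildEdit is not confirmed in this thread.

```python
import re

# Same patterns as above, with the repeat counts left as parameters.
CAPTURING = r"((.*\r?\n){%d})(.*\r?\n){%d}"
NON_CAPTURING = r"((?:.*\r?\n){%d})(?:.*\r?\n){%d}"


def apply_pattern(template: str, keep: int, drop: int, text: str) -> str:
    """Replace each (keep + drop)-line block with its first `keep` lines."""
    return re.sub(template % (keep, drop), r"\1", text)


sample = "".join(f"row{n}\n" for n in range(1, 13))
a = apply_pattern(CAPTURING, 2, 4, sample)
b = apply_pattern(NON_CAPTURING, 2, 4, sample)
print(a == b)  # True

# Number of capture groups the engine has to track in each form.
print(re.compile(CAPTURING % (2, 4)).groups)      # 3
print(re.compile(NON_CAPTURING % (2, 4)).groups)  # 1
```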
