Retaining only specific lines of text

Post by **terrypin** » Tue Jan 24, 2012 3:07 pm

Hi,

Just re-discovered this forum which I joined years ago and had forgotten about!

I was initially going to call this subject
'Deleting many lines of text between specified characters?'
but I suppose they're essentially equivalent?

Anyway, although a TextPad veteran I'm a relative Regex novice and hoping one of the experts can help please. I have a text file that looks like this:

--- Start paste ---
[BlackfordLane.jpg]
File name = BlackfordLane.jpg
Directory = C:\Docs\My Videos\PROJECTS\Thames Path Walk Projects\TP03 Project\Geograph Photos\GeoDay2\
Compression = JPEG, quality: 87, subsampling OFF
Resolution = 96 x 96 DPI
File date/time = 19/01/2012 / 15:01:23

- IPTC -
Object Name - s bridge over the River Thames is not a footbridge but carries pipes.

- COMMENT -
Thames Path on Blackford Lane heading towards Blackford Farm, east of Castle Eaton.

[Castle Eaton Church.jpg]
File name = Castle Eaton Church.jpg
Directory = C:\Docs\My Videos\PROJECTS\Thames Path Walk Projects\TP03 Project\Geograph Photos\GeoDay2\
Compression = JPEG, quality: 87, subsampling OFF
Resolution = 72 x 72 DPI
File date/time = 19/01/2012 / 14:03:55

- EXIF -
Make - FUJIFILM
Model - FinePix2600Zoom
Orientation - Top left
XResolution - 72
YResolution - 72
ResolutionUnit - Inch

- COMMENT -
Castle Eaton Church

[CastleEaton-2.jpg]
File name = CastleEaton-2.jpg
Directory = C:\Docs\My Videos\PROJECTS\Thames Path Walk Projects\TP03 Project\Geograph Photos\GeoDay2\
Compression = JPEG, quality: 75
Resolution = 0 x 0 DPI
File date/time = 18/01/2012 / 15:40:05

- COMMENT -
The Red Lion, Castle Eaton
A warm welcoming pub on a cold winter's day, with the River Thames running at the bottom of the garden.
--- End paste ---

And this is what I want to get as a result:

--- Start paste ---
BlackfordLane.jpg
Thames Path on Blackford Lane heading towards Blackford Farm, east of Castle Eaton.

Castle Eaton Church.jpg
Castle Eaton Church

CastleEaton-2.jpg
The Red Lion, Castle Eaton
A warm welcoming pub on a cold winter's day, with the River Thames running at the bottom of the garden.
--- End paste ---

My first line of attack is to try for a Regex expression that will Find everything (for example) between the ']' of '[BlackfordLane.jpg]' and the '-' of '- COMMENT -'? That would leave only a little tidying up, I think.

But so far that's eluded me after some hours. The best I could come up with was the following to delete all lines from File name... to File date/time.

Find:
File name = .*\nDirectory = .*\nCompression = .*\nResolution = .*\nImage dimensions = .*\nPrint size = .*\nColor depth = .*\nNumber of unique colors = .*\nDisk size = .*\nCurrent memory size = .*\nFile date/time = .*\n

Replace:
(Nothing)

But that's only part of the task and seems very inelegant.

Any suggestions please?

I'm beginning to suspect TextPad can't do it and that I'll have to find the Filenames and Comments with separate Regex expressions then use my macro program to iterate down the entire file and assemble the cumulative result. But I hope someone can prove me wrong please!

--
Terry, East Grinstead, UK
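
For what it's worth, the fallback described in the post above (walking down the file and assembling the file names and comments as you go) is only a few lines in a scripting language. Here is a minimal Python sketch, not a TextPad feature. The function names `extract_names_and_comments` and `format_records` are made up for illustration, and it assumes every record starts with a `[name]` line and that section markers look like `- COMMENT -` or `- EXIF -`, as in the sample.

```python
def extract_names_and_comments(text):
    """Return a list of (file_name, comment_lines) pairs from the metadata dump."""
    records = []
    name = None
    comment = []
    in_comment = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # A new "[name.jpg]" header closes the previous record, if any.
            if name is not None:
                records.append((name, comment))
            name, comment, in_comment = stripped[1:-1], [], False
        elif stripped == "- COMMENT -":
            in_comment = True
        elif stripped.startswith("- ") and stripped.endswith(" -"):
            # Any other section marker, such as "- EXIF -", ends the comment.
            in_comment = False
        elif in_comment and stripped:
            # Non-blank line inside the comment section: keep it.
            comment.append(stripped)
    if name is not None:
        records.append((name, comment))
    return records


def format_records(records):
    """Join each name with its comment lines, one blank line between records."""
    return "\n\n".join("\n".join([name, *comment]) for name, comment in records)
```

Because the comment is collected line by line until the next header, a two-line comment like the one for CastleEaton-2.jpg should come through intact. A comment line that itself looks like `[something]` or `- something -` would be misread, so treat this as a starting point only.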

Post by **terrypin** » Tue Jan 24, 2012 4:20 pm

Update:
Shortly after the above post I discovered WildEdit, via another thread. Duly downloaded and a licence ordered. I'm sure it will easily handle my requirement and other similar tasks I anticipate, once I've learnt how to use it.

Still like to know if TextPad could do it...

BTW, I see that WildEdit's RegEx type is Perl by default. I've been using POSIX in TextPad so would I find either of WildEdit's other types
POSIX Extended Regular Expression Syntax
POSIX Basic Regular Expression Syntax
more familiar?

I'm only an occasional RegEx user, a novice, so I'd like an easy ride.

--
Terry, East Grinstead, UK

Post by **PeteTheBloke** » Wed Jan 25, 2012 2:15 pm

I haven't read every word in your request, but it looks like you want the File name field and the comment field?

I tidy up files like this all the time with TP. If it's a regular job make a macro. Here's the procedure I'd follow:

1. Bookmark all lines starting 'File name'
Find-> regex ^File name (the ^ ties it to the start of a line) ->mark all

2. Remove the new line after - COMMENT -
Replace - COMMENT -\n with '' (i.e. nothing, empty string). \n is regex for a new line.

3. Mark all those lines
Find-> COMMENT ->mark all

Cut all bookmarked lines.
Select all.
Paste bookmarked lines.

This leaves you with only the lines you want and a simple search /replace gets rid of the extra text you don't need.
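
To make the effect of those steps concrete, here is a rough Python emulation of the bookmark procedure. It is a sketch only: `keep_marked_lines` is a hypothetical name, and it assumes step 2 keeps the `- COMMENT -` text on the joined line so that step 3 still has something to find.

```python
NAME_PREFIX = "File name = "
MARKER = "- COMMENT -"


def keep_marked_lines(text):
    """Emulate: join marker with next line, keep 'bookmarked' lines, strip labels."""
    # Step 2: remove the new line after the marker, so the first comment
    # line ends up on the same line as "- COMMENT -".
    joined = text.replace(MARKER + "\n", MARKER)
    # Steps 1 and 3: "bookmark" the file-name lines and the marker lines,
    # then keep only those (the cut / select all / paste sequence).
    marked = [
        ln for ln in joined.splitlines()
        if ln.startswith(NAME_PREFIX) or MARKER in ln
    ]
    # Final search/replace: drop the labels that are no longer needed.
    cleaned = []
    for ln in marked:
        if ln.startswith(NAME_PREFIX):
            cleaned.append(ln[len(NAME_PREFIX):])
        else:
            cleaned.append(ln.replace(MARKER, "", 1))
    return cleaned
```

Note that only the line directly after the marker is joined, so with a comment of two or more lines this keeps just the first one.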

Post by **terrypin** » Wed Jan 25, 2012 3:35 pm

PeteTheBloke wrote:I haven't read every word in your request, but it looks like you want the File name field and the comment field?

Thanks, that's exactly right.

PeteTheBloke wrote:I tidy up files like this all the time with TP. If it's a regular job make a macro. Here's the procedure I'd follow:

1. Bookmark all lines starting 'File name'
Find-> regex ^File name (the ^ ties it to the start of a line) ->mark all

OK.

PeteTheBloke wrote:2. Remove the new line after - COMMENT -
Replace - COMMENT -\n with '' (i.e. nothing, empty string). \n is regex for a new line.

I assume you mean:
Replace - COMMENT -\n with '- COMMENT -' as that's what your next step seems to imply?

PeteTheBloke wrote:3. Mark all those lines
Find-> COMMENT ->mark all

OK. But that doesn't cover the case when there are two or more lines of comment, as in the case of CastleEaton-2.jpg?

PeteTheBloke wrote:Cut all bookmarked lines.
Select all.
Paste bookmarked lines.

This leaves you with only the lines you want and a simple search /replace gets rid of the extra text you don't need.

I'll play around and see if I can extend that to cover more than one comment line and maybe macro some of it.

Just to be sure I've understood, is my summary in the following correct please?
http://dl.dropbox.com/u/4019461/PeterStep1.jpg
http://dl.dropbox.com/u/4019461/PeterStep2.jpg
http://dl.dropbox.com/u/4019461/PeterStep3.jpg
http://dl.dropbox.com/u/4019461/PeterStep4.jpg

--
Terry, East Grinstead, UK

Post by **PeteTheBloke** » Wed Jan 25, 2012 6:51 pm

Terry

Your observations are all correct in every regard.

You seem to have solved most of it. These multi-liners are often a pain in the neck, and my solution is to replace double new lines with a placeholder, something like FRED, and then turn FRED back into new lines afterwards.

In a rush. You'll get it.
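
One possible reading of the placeholder trick, sketched in Python rather than as TextPad steps: protect the blank-line separators with a token, join the remaining lines, then restore the separators so that each section sits on a single line. `PLACEHOLDER`, `JOINER` and both function names are assumptions for illustration, and the sketch assumes sections are separated by exactly one blank line and that neither token occurs in the file.

```python
PLACEHOLDER = "<<FRED>>"  # any token that never occurs in the file
JOINER = "\x1f"           # stand-in for the new lines inside one section


def flatten_sections(text):
    """Put every blank-line-separated section on a single line."""
    text = text.strip("\n")
    text = text.replace("\n\n", PLACEHOLDER)  # protect the blank-line separators
    text = text.replace("\n", JOINER)         # join the lines inside each section
    return text.replace(PLACEHOLDER, "\n")    # restore: one section per line


def names_and_comments(text):
    """Keep the [name] from header sections and every line of COMMENT sections."""
    out = []
    for section in flatten_sections(text).split("\n"):
        parts = section.split(JOINER)
        if parts[0].startswith("[") and parts[0].endswith("]"):
            out.append(parts[0][1:-1])
        elif parts[0] == "- COMMENT -":
            out.extend(parts[1:])
    return out
```

Once a whole comment section is on one line, a line-based "mark all" picks up every comment line at once, which is the multi-line case that was causing trouble.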

Post by **kengrubb** » Sat Jan 28, 2012 12:47 am

I have licenses for both TP and WE.

For much of my day, I live in TP. However, there are some things for which it's much easier to use WE.

TP is a precision-guided missile whereas WE is an area-effect precision-guided missile.

At times it's simply a feeling that WE is the better tool. My sense is that Perl regex in WE handles multi-line matches much better than TP.
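
As an illustration of the multi-line point, here is a Perl-style pattern that does the whole job in one replace. It is written for Python's `re` module, whose syntax is Perl-like, and has not been tried in WildEdit or TextPad, so treat the flags and the replacement syntax as assumptions there. `keep_names_and_comments` is a hypothetical name, and the sketch assumes every `[name]` block contains a `- COMMENT -` section.

```python
import re

BLOCK = re.compile(
    r"^\[([^\]\n]+)\]\n"  # the "[name.jpg]" header line; group 1 is the name
    r".*?"                # everything after it, matched as lazily as possible
    r"^- COMMENT -\n",    # up to and including the comment marker line
    re.MULTILINE | re.DOTALL,
)


def keep_names_and_comments(text):
    """Replace each header-to-marker span with just the file name."""
    return BLOCK.sub(r"\1\n", text)
```

The lazy `.*?` combined with the dot-matches-newline flag is what lets a single match span many lines. If a block had no comment section, the match would run on into the next block and swallow its file name, so that case would need separate handling.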