
Is this possible?

Posted: Tue Mar 12, 2013 12:19 pm
by terrypin
I want to delete the duplicate file names (always JPGs). The same file name appears as the first and last entry of every group, and I want to keep only the first. I also want that remaining file name to be followed by a tab and then the descriptive text, all on one line.

So for example I want this:

20030820-113000-TP01.jpg
Shortly after leaving Kemble station, on the way to Thames Head, the recognised Thames source.
20030820-113000-TP01.jpg

20030820-120300-TP03-43.jpg
Wed 20th Aug 2003. At the source.
20030820-120300-TP03-43.jpg

20030820-120000-TP02.jpg
At the dry source, Wed 20th August 2003.
Here's another line to screw things up.
20030820-120000-TP02.jpg

20030820-120300-TP03.jpg
Source
20030820-120300-TP03.jpg

to become this:

20030820-113000-TP01.jpg Shortly after leaving Kemble station, on the way to Thames Head, the recognised Thames source.

20030820-120300-TP03-43.jpg Wed 20th Aug 2003. At the source.

20030820-120000-TP02.jpg At the dry source, Wed 20th August 2003. Here's another line to screw things up.

20030820-120300-TP03.jpg Source

I see my tabs have not been replicated here. This is what that example looks like in TextPad:
https://dl.dropbox.com/u/4019461/TextPad-RE-1.jpg


So far my attempts have failed. Is it possible in TextPad please?

--
Terry, East Grinstead, UK

Posted: Tue Mar 12, 2013 2:01 pm
by ak47wong
Here's one way to do it. First enable POSIX regular expression syntax in Configure > Preferences > Editor.

Step 1: Collapse each group into a single tab-separated line
Find what: (.)\n(.)
Replace with: \1\t\2
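To see what step 1 does, here is a rough equivalent using Python's `re` module on the trickiest group from the question. This is only a sketch, not TextPad itself. It assumes, as the step does, that `.` never matches a newline, which is Python's default.

```python
import re

# One group from the question: opening filename, two description lines,
# closing filename, then the blank line that separates groups.
group = (
    "20030820-120000-TP02.jpg\n"
    "At the dry source, Wed 20th August 2003.\n"
    "Here's another line to screw things up.\n"
    "20030820-120000-TP02.jpg\n"
    "\n"
)

# Step 1: a newline with a non-newline character on both sides lies inside
# a group, so it becomes a tab. The newline ending a group's last line is
# followed by the blank separator line, so that one is left alone.
step1 = re.sub(r"(.)\n(.)", r"\1\t\2", group)
print(repr(step1))
```

Because each match consumes the first character of the following line, a description line only one character long would keep its trailing newline in this Python version.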

Step 2: Delete the final tab and file name from each line
Find what: (.*)\t.*
Replace with: \1
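Step 2 in the same hedged Python form, starting from what step 1 leaves for that group. The point is that the greedy `(.*)` runs to the last tab on the line, so only the trailing duplicate filename is removed.

```python
import re

# Output of step 1 for the same group: one tab-separated line that still
# ends with the duplicate filename.
joined = (
    "20030820-120000-TP02.jpg\tAt the dry source, Wed 20th August 2003."
    "\tHere's another line to screw things up.\t20030820-120000-TP02.jpg\n\n"
)

# Step 2: (.*) is greedy, so it backtracks only as far as the LAST tab;
# \t.* then matches the duplicate filename, which is dropped.
# '.' does not match '\n', so each match stays within one line.
step2 = re.sub(r"(.*)\t.*", r"\1", joined)
print(repr(step2))
```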

Step 3: If necessary, replace the tab before the extra "line to screw things up" with a space
Find what: ([^\t]*\t)(.*)\t(.*)
Replace with: \1\2_\3 [replace the underscore with a space]
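A Python sketch of step 3 on the line left by step 2. One deliberate difference from the thread: the class is `[^\t\n]` rather than `[^\t]`, because in Python a negated class also matches newlines. The assumption is that TextPad keeps the match on one line, and excluding `\n` reproduces that when the pattern is run over a whole file.

```python
import re

# Keep the first tab (after the filename); turn the last tab into a space.
STEP3 = r"([^\t\n]*\t)(.*)\t(.*)"

line = (
    "20030820-120000-TP02.jpg\tAt the dry source, Wed 20th August 2003."
    "\tHere's another line to screw things up.\n"
)

step3 = re.sub(STEP3, r"\1\2 \3", line)
print(repr(step3))
```

Since `(.*)` is greedy, one pass fixes only the last extra tab on a line. A group with three or more description lines would need the replacement run again until nothing matches.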

Posted: Tue Mar 12, 2013 2:48 pm
by ben_josephs
I'd already typed this, so I might as well post it, to show a very similar, but slightly different, approach.

1. Delete the filename at the end of each group:
Find what: .*\.jpg\n\n
Replace with: \n
2. Replace the newline following each filename with a tab:
Find what: \.jpg\n
Replace with: .jpg\t
3. Join the lines of each group:
Find what: (.)\n(.)
Replace with: \1 \2
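The three steps above can be checked end to end with a rough Python equivalent run over all four groups from the question. This assumes, as the TextPad steps seem to, that every group, including the last, is followed by a blank line. Without that, step 1 would miss the final closing filename.

```python
import re

text = (
    "20030820-113000-TP01.jpg\n"
    "Shortly after leaving Kemble station, on the way to Thames Head, the recognised Thames source.\n"
    "20030820-113000-TP01.jpg\n"
    "\n"
    "20030820-120300-TP03-43.jpg\n"
    "Wed 20th Aug 2003. At the source.\n"
    "20030820-120300-TP03-43.jpg\n"
    "\n"
    "20030820-120000-TP02.jpg\n"
    "At the dry source, Wed 20th August 2003.\n"
    "Here's another line to screw things up.\n"
    "20030820-120000-TP02.jpg\n"
    "\n"
    "20030820-120300-TP03.jpg\n"
    "Source\n"
    "20030820-120300-TP03.jpg\n"
    "\n"
)

# 1. Delete each closing filename plus the blank line after it, then put one
#    newline back so groups stay separated by a blank line.
text = re.sub(r".*\.jpg\n\n", "\n", text)
# 2. The only ".jpg\n" left belongs to an opening filename: newline -> tab.
text = re.sub(r"\.jpg\n", ".jpg\t", text)
# 3. Join any remaining lines inside a group with a space.
text = re.sub(r"(.)\n(.)", r"\1 \2", text)
print(text)
```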

Posted: Tue Mar 12, 2013 3:58 pm
by terrypin
Thanks both, much appreciate the fast responses.

Both work a treat. I was scratching my head over the failure of the third step in each - until I remembered this setting!

