Finding DEADSPACE that could be tabs, spaces, even carriage returns

General questions about using TextPad


no.cache
Posts: 165
Joined: Thu May 15, 2003 2:52 pm

Post by no.cache »

Hi friends! Okay, it's time for me to return to CLEAN-UP DUTY and I need help with (what else heh) Mr. Regular Expression. I'm producing a mailing list, and I need to get rid of multiple lines of extraneous garbage. For this example I'll use the boundary words PURPLE and GRAPE.

"GRAPE" represents the first instance of a string of (something) that I want to keep, and it will always be preceded by a COLON. I'm stuck on what falls between that COLON and GRAPE, because the deadspace my OCR produces shows up variously as space(s), tab(s), or (in rare cases) carriage return(s), e.g.:
~~~~~~:=deadspace=GRAPE~~~~~~~
or
~~~~~~:=carriagereturn(s)=
GRAPE~~~~~~~
or even
~~~~~~:=deadspace(s)carriagereturn(s)=
GRAPE~~~~~~~

I've (crudely) gotten as far as matching the literal instance of PURPLE up to =onespace=GRAPE, e.g.:
PURPLE.*\n.*\n.*\n.*\n.*: GRAPE

but since I can't reliably know how that =deadspace= between the colon and GRAPE will express itself, I'm stuck on what wildcard string I can use to locate it. It's complicated by the fact that there _could_ be carriage return(s) somewhere after the colon.
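For comparison, here is a sketch of the colon-to-GRAPE gap in an engine that does handle line breaks. It assumes Python's `re` module, which is not TextPad's engine, so behavior may differ. A single character class of space, tab, `\r` and `\n` covers every kind of deadspace described above. The sample strings are made up, and GRAPE is just the thread's placeholder word.

```python
import re

# A colon, then any run of spaces, tabs, or line breaks (\r or \n),
# then the literal word GRAPE. In Python's re, a character class that
# contains \n matches line breaks without needing any special flag.
DEADSPACE_TO_GRAPE = re.compile(r":[ \t\r\n]*GRAPE")

samples = [
    "Name:  \tGRAPE juice",        # spaces and a tab
    "Name:\nGRAPE juice",          # a bare line break
    "Name: \t \r\n  GRAPE juice",  # mixed whitespace around a CRLF
]

for text in samples:
    match = DEADSPACE_TO_GRAPE.search(text)
    # repr() makes the invisible deadspace visible in the output
    print(repr(match.group(0)) if match else "no match")
```

Each sample prints the matched span, for example `':\nGRAPE'` for the second one.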

Thanks for any help you can provide!

Skye
Bob Hansen
Posts: 1516
Joined: Sun Mar 02, 2003 8:15 pm
Location: Salem, NH

Post by Bob Hansen »

I have a question before I look at this... I'm not sure that TP's version of regex supports matching across return codes (line breaks).

Is it possible for you to do a temporary search and replace of the return codes? Maybe change them to ~ or | or some other character that would be unique?

After running the regex, we could do another search and replace to turn the ~ or | back into return codes.
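Bob's workaround (swap line breaks for a unique character, run the regex on the flattened text, then swap them back) might look like this in Python. The `|` placeholder and the `replace_across_lines` helper are hypothetical. The trick only works if the placeholder never appears in the real text, which is why the function refuses to run otherwise.

```python
import re

PLACEHOLDER = "|"  # hypothetical marker; must not appear anywhere in the file

def replace_across_lines(text: str, pattern: str, repl: str) -> str:
    """Temporarily flatten line breaks, apply a regex, then restore them."""
    if PLACEHOLDER in text:
        raise ValueError("placeholder already occurs in the text")
    # Step 1: turn every line break into the placeholder character.
    flat = text.replace("\n", PLACEHOLDER)
    # Step 2: run the regex on the single flattened line.
    flat = re.sub(pattern, repl, flat)
    # Step 3: turn the remaining placeholders back into line breaks.
    return flat.replace(PLACEHOLDER, "\n")

doc = "Name:  \n  GRAPE juice\nNext line"
# Inside a character class, | is a literal, so [ \t|]* means
# "spaces, tabs, or former line breaks".
print(replace_across_lines(doc, r":[ \t|]*GRAPE", ": GRAPE"))
```

Because the line break between the colon and GRAPE falls inside the matched span, it disappears when the placeholders are restored, so the example prints `Name: GRAPE juice` followed by `Next line`.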
no.cache
Posts: 165
Joined: Thu May 15, 2003 2:52 pm

Post by no.cache »

Hi Bob,

As you can see from my GRAPE example, I've already resigned myself to S/R'ing multiple \n's . . . one more won't kill me.

Would I be looking for ~~~:=deadspaceortabs=\n
=morepossibledeadspaceortabs=GRAPE

?

I just hoped there was a more efficient S/R that could grab that (possible) carriage return. I can't be too picky about this, though, because I know those \n's are prickly heh.

Lead on.

Skye Girl
jeffy
Posts: 323
Joined: Mon Mar 03, 2003 9:04 am
Location: Philadelphia

Post by jeffy »

Carriage (how the heck do you spell that?!) returns are not well supported in TextPad regular expressions. Replacing \n before running an RE is the best option I know of.
no.cache
Posts: 165
Joined: Thu May 15, 2003 2:52 pm

Post by no.cache »

Jeffy, okay, could you help me _just_ locate the blank space before a \n, whether it's tab(s) or space(s)?

This is just driving me nuts. I've tried [:blank:]*\n and can't get it to reliably locate _just_ the dead space. Arrrrrrghhhh!!!!!

:oops:

Skye
jeffy
Posts: 323
Joined: Mon Mar 03, 2003 9:04 am
Location: Philadelphia

Post by jeffy »

Try

[a space]$

$ means "the end of a line" and is more reliable than searching for \n.
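To illustrate why `$` can be more dependable than a literal `\n`, here is a small comparison using Python's `re` with `re.MULTILINE` (a stand-in; TextPad's engine may behave differently). `$` is a zero-width anchor, so the line break is not consumed, and it also matches at the very end of a file that has no final newline, which ` \n` misses. The sample text is made up.

```python
import re

text = "first line \nsecond line \nlast line "  # no newline after the last line

# " $" with MULTILINE: zero-width anchor, so the newline is not part of the match
anchored = re.findall(r" $", text, flags=re.MULTILINE)

# " \n": the newline becomes part of the match, and the last line is missed
literal = re.findall(r" \n", text)

print(len(anchored), len(literal))  # 3 2
```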

Hope this is what you're looking for.
jeffy
Posts: 323
Joined: Mon Mar 03, 2003 9:04 am
Location: Philadelphia

Post by jeffy »

Also, I'm thinking you might want

[ ]+$

Use that where there could be more than one space before the end of the line.

Going further...

[ a-z]+$ would find one or more characters, each either a space or a lowercase letter, immediately before the end of the line.

Replace "+" with "*" if you need zero or more.
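jeffy's three variants side by side, again using Python's `re` as a stand-in for TextPad (an assumption; the sample word is made up). The comments show what each pattern matches on that one-line input.

```python
import re

line = "Skye   "  # three trailing spaces

# [ ]+$ : one or more spaces immediately before the end of the line
print(repr(re.search(r"[ ]+$", line).group(0)))     # '   '

# [ a-z]+$ : one or more characters that are each a space OR a lowercase letter
print(repr(re.search(r"[ a-z]+$", line).group(0)))  # 'kye   '

# [ ]*$ : zero or more, so it also succeeds (with an empty match)
# on a line that has no trailing space at all
print(repr(re.search(r"[ ]*$", "Skye").group(0)))   # ''
```

The `*` version never fails, which is handy for replacements but useless for finding lines that actually have trailing space; that is why `+` is the usual choice for searching.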
jeffy
Posts: 323
Joined: Mon Mar 03, 2003 9:04 am
Location: Philadelphia

Post by jeffy »

Check out my TextPad Regular Expression FAQ if you need more information:

http://www.jeffyjeffy.com/code/textpad/documentation/regular_expression_faq.html
jeffy
Posts: 323
Joined: Mon Mar 03, 2003 9:04 am
Location: Philadelphia

Post by jeffy »

Hey, I just rapid fire created three posts, so what's one more...?

:' )
no.cache
Posts: 165
Joined: Thu May 15, 2003 2:52 pm

Post by no.cache »

Jeff, Bob . . . got it!

[ \t]+$

I don't know why it didn't work when I first tried it (doubtless I didn't have the expression right), but I just tried it again, this time testing it by inserting various combinations of tabs and spaces, and it works great.
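The working pattern `[ \t]+$` can be applied to a whole file at once in Python with `re.sub` and `re.MULTILINE`. This is a sketch: the helper name is hypothetical, and it assumes `\n` line endings (Python's text mode converts `\r\n` to `\n` when reading). In Python, `$` matches only before `\n`, so a tab or space sitting just before a `\r` would be left alone.

```python
import re

# Trailing deadspace: one or more spaces or tabs right before the end of a line.
TRAILING_DEADSPACE = re.compile(r"[ \t]+$", flags=re.MULTILINE)

def strip_trailing_deadspace(text: str) -> str:
    """Delete spaces/tabs at the end of every line, keeping the line breaks."""
    return TRAILING_DEADSPACE.sub("", text)

messy = "Name: \t\nGRAPE juice  \t \nNo trailing space"
print(repr(strip_trailing_deadspace(messy)))
# 'Name:\nGRAPE juice\nNo trailing space'
```

Spaces inside a line (such as the one in "GRAPE juice") are untouched, because `$` forces the match to end at a line boundary.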

:D

Also guys, I know TP would perform these S/R's more efficiently if I removed those carriage returns first . . . but you cannot imagine (yes you can heh) how much harder my first edit of these crummy OCR'd files gets if they don't keep at least some crude semblance of their original layout.

I gotta stitch as the last step or I'll lose my mind. :wink:
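The plan here is to keep the line breaks while editing and "stitch" lines together as the final step. One way that last step could look in Python, assuming (hypothetically) that a break should be removed only when the line ends in a colon, as in the GRAPE examples; `stitch_after_colon` is a made-up helper name.

```python
import re

# A colon, optional spaces/tabs, then one or more line breaks (LF or CRLF),
# each possibly followed by spaces/tabs at the start of the next line.
BROKEN_AFTER_COLON = re.compile(r":[ \t]*(?:\r?\n[ \t]*)+")

def stitch_after_colon(text: str) -> str:
    """Join a line that ends in a colon to the line that follows it."""
    return BROKEN_AFTER_COLON.sub(": ", text)

page = "Name:\t\n\n   GRAPE juice\nCity: Salem"
print(stitch_after_colon(page))
```

This prints `Name: GRAPE juice` and `City: Salem` on two lines: the colon with breaks after it is stitched, while the ordinary line break before "City" survives because it does not follow a colon.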

On to the next headache.

Skye-a-Watha
no.cache
Posts: 165
Joined: Thu May 15, 2003 2:52 pm

Post by no.cache »

jeffy wrote:
> Check out my TextPad Regular Expression FAQ if you need more information:
>
> http://www.jeffyjeffy.com/code/textpad/documentation/regular_expression_faq.html

The World Famous JeffyJeff FAQ! :D I had completely forgotten about this fantastic page. Got it bookmarked now, Jeff.

Hugs,
Skye King
jeffy
Posts: 323
Joined: Mon Mar 03, 2003 9:04 am
Location: Philadelphia

Post by jeffy »

You just single-handedly made my month, Skye.

:' )
no.cache
Posts: 165
Joined: Thu May 15, 2003 2:52 pm

Post by no.cache »

Cool! :D

But trust me, the worst is yet to come. As Arnold said:
I'll be back.

groan :wink:
Nial
Posts: 29
Joined: Fri May 09, 2003 12:04 pm

Text manipulation

Post by Nial »

> But trust me, the worst is yet to come. As Arnold said:
> I'll be back.

Skye,

Looking at all your posts asking for help about regexps, I'd say
you'd probably be better off having a look at Perl. It's designed
to do exactly the sort of things you're trying to do, and isn't
hard to pick up (if you take things one step at a time).

It's also got better regexp handling than TextPad.

See 'Links' on my web site for some Perl books, a web based
tutorial and details on where to download Perl free.

http://www.nialstewartdevelopments.co.uk

Nial.
no.cache
Posts: 165
Joined: Thu May 15, 2003 2:52 pm

Post by no.cache »

I'll definitely look into that Nial. Right now I'm jammed by a deadline but I'll come over and visit once I'm done with this first OCR project, because there will be dozens to follow.

Skye-a-Watha