Finding DEADSPACE that could be tabs, spaces, even carrier r
Hi friends! Okay, it's time for me to return to CLEAN-UP DUTY and I need help with (what else heh) Mr. Regular Expression. I'm producing a mailing list, and I need to get rid of multiple lines of extraneous garbage. For this example I'll use the boundary words PURPLE and GRAPE.
"GRAPE" represents the first instance of a string of (something) that I want to keep, and it will always be preceded by a COLON. I'm stuck on what falls between that COLON and GRAPE, because the deadspace rendered by my OCR shows up variously as space(s), tab(s), or (in rare cases) carriage return(s), e.g.
~~~~~~:=deadspace=GRAPE~~~~~~~
or
~~~~~~:=carrierreturn(s)=
GRAPE~~~~~~~
or even
~~~~~~:=deadspace(s)carrierreturn(s)=
GRAPE~~~~~~~
I've (crudely) gotten as far as matching the literal instance of PURPLE up to =onespace=GRAPE, eg.
PURPLE.*\n.*\n.*\n.*\n.*: GRAPE
but since I can't reliably know how that =deadspace= between the colon and GRAPE will express itself, I'm stuck on what wildcard string I can use to locate it. It's complicated by the fact that there _could_ be carriage return(s) somewhere after the colon.
Thanks for any help you can provide!
Skye
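For readers following along outside TextPad, the gap described above can be sketched in Python. This is only an illustration under stated assumptions, not TextPad syntax: the name `collapse_gap` and the sample strings are made up for the example, and it assumes the gap holds nothing but spaces, tabs, carriage returns, or line feeds.

```python
import re

# A colon, then any run (possibly empty) of spaces, tabs, CRs or LFs,
# provided the boundary word GRAPE follows. The lookahead leaves GRAPE in place.
GAP = re.compile(r":[ \t\r\n]*(?=GRAPE)")


def collapse_gap(text: str) -> str:
    """Normalize every colon-to-GRAPE gap to a colon plus one space (hypothetical helper)."""
    return GAP.sub(": ", text)


samples = [
    "~~~:   GRAPE~~~",            # spaces only
    "~~~:\t\tGRAPE~~~",           # tabs only
    "~~~:\nGRAPE~~~",             # line break only
    "~~~: \t\r\n\r\nGRAPE~~~",    # mixture of all of them
]
for sample in samples:
    print(repr(collapse_gap(sample)))
```

Because the character class lists the line-break characters explicitly, one pattern covers all three layouts shown in the post.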
- Bob Hansen
I have a question before looking at this... I am not sure that the TP version of regex supports matches that wrap around return codes.
Is it possible for you to do a temporary search and replace of the return codes? Maybe change them to ~ or | or some other character that would be unique?
After doing the regex, we could go back and do another search and replace to turn the ~ or | back into return codes.
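The three-step workaround suggested here (swap the return codes for a unique character, run the regex on the flattened text, swap them back) might look roughly like this in Python. It is a sketch only: the function name `fix_gap_via_placeholder` is hypothetical, `|` is assumed never to occur in the OCR text, and Windows line endings are normalized to `\n` along the way.

```python
import re

PLACEHOLDER = "|"  # assumption: this character never appears in the OCR text


def fix_gap_via_placeholder(text: str) -> str:
    """Flatten line breaks, fix the colon-to-GRAPE gap, then restore line breaks."""
    if PLACEHOLDER in text:
        raise ValueError("placeholder already occurs in the text; pick another one")

    # Step 1: replace return codes with the placeholder (CRLF is normalized to LF first).
    flat = text.replace("\r\n", "\n").replace("\n", PLACEHOLDER)

    # Step 2: a single-line pattern now suffices. The gap may contain
    # spaces, tabs, or placeholders standing in for line breaks.
    gap = re.compile(r":[ \t" + re.escape(PLACEHOLDER) + r"]*GRAPE")
    flat = gap.sub(": GRAPE", flat)

    # Step 3: turn the remaining placeholders back into return codes.
    return flat.replace(PLACEHOLDER, "\n")


sample = "PURPLE\nline2\nname: \t\n\n GRAPE rest\nnext\n"
print(repr(fix_gap_via_placeholder(sample)))
```

Line breaks outside the gap survive the round trip, so the rough shape of the page is kept.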
Hi Bob,
As you can see from my GRAPE example, I've already resigned myself to S/R'ing multiple \n's . . . one more won't kill me.
Would I be looking for ~~~:=deadspaceortabs=\n
=morepossibledeadspaceortabs=GRAPE
?
I just hoped there was a more efficient S/R that could grab that (possible) carrier return. I can't be too picky about this however because I know those \n's are prickly heh.
Lead on.
Skye Girl
Check out my TextPad Regular Expression FAQ if you need more information:
http://www.jeffyjeffy.com/code/textpad/documentation/regular_expression_faq.html
Jeff, Bob . . . got it!
[ \t]+$
Now I don't know why when I first tried this it didn't work (doubtless I didn't have the expression correct) but I just tried it again — this time testing it by inserting various combinations of tabs and spaces — and it works great.
Also guys, I know TP would perform these S/R's more efficiently if I were to remove those carrier returns . . . but you cannot imagine (yes you can heh) how much more difficult it makes my first edit of these crummy OCR'd files not to at least have some crude semblance of their graphic shape.
I gotta stitch as the last step or I'll lose my mind.
On to the next headache.
Skye-a-Watha
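For comparison, the `[ \t]+$` expression from the post above behaves the same way in Python once `$` is allowed to match at the end of every line. A minimal sketch, assuming LF line endings (with CRLF the `\r` would sit between the whitespace and `$` and block the match); the name `strip_trailing` is made up for the example.

```python
import re

# re.MULTILINE lets "$" match before each newline, not only at the end of the text.
TRAILING = re.compile(r"[ \t]+$", re.MULTILINE)


def strip_trailing(text: str) -> str:
    """Delete runs of spaces and tabs at the end of each line (hypothetical helper)."""
    return TRAILING.sub("", text)


sample = "name: \t \nGRAPE\t\nend"
print(repr(strip_trailing(sample)))
```

The line breaks themselves are untouched, so the stitching of lines can still be done as the last step.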
jeffy wrote:
> Check out my TextPad Regular Expression FAQ if you need more information:
> http://www.jeffyjeffy.com/code/textpad/documentation/regular_expression_faq.html
The World Famous JeffyJeff FAQ! I had completely forgotten about this fantastic page. Got it bookmarked now, Jeff.
Hugs,
Skye King
Text manipulation
> But trust me, the worst is yet to come. As Arnold said:
> I'll be back.
Skye,
Looking at all your posts asking for help about regexps, I'd say
you'd probably be better off having a look at Perl. It's designed
to do exactly the sort of things you're trying to do, and isn't
hard to pick up (if you take things one step at a time).
It's also got better regexp handling than textpad.
See 'Links' on my web site for some Perl books, a web based
tutorial and details on where to download Perl free.
http://www.nialstewartdevelopments.co.uk
Nial.