Find [.,]\n ...but not if what follows \n is the word This

General questions about using TextPad

no.cache
Posts: 165
Joined: Thu May 15, 2003 2:52 pm

Post by no.cache »

If the word This follows that \n, then I want to ignore that hit. Can I even do this? :oops:

I tried
[.,]\n[^This]
but it's grabbing the first character of what follows \n — and I need it to leave what follows the \n in place (unless what follows is the word This).

I hope that makes sense.

brblbrblbrbl :roll:

Skye
no.cache
Posts: 165
Joined: Thu May 15, 2003 2:52 pm

Post by no.cache »

In other words . . .

text text text,
Purple Monkey

is finding
,\nP
(the comma, the line break, and the "P" of Purple, leaving "urple Monkey" behind)

when I only want it to find
,
- - - - - - - - - - - - - - - -
and the strings . . .

text text text,
This Purple Monkey

I want it to ignore altogether because "This" follows the ,\n
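To see why [^This] misbehaves, here is a small Python sketch. Python's re module stands in for TextPad's engine here, which is an assumption, although bracket classes work the same way in both. A class such as [^This] matches exactly one character that is not T, h, i or s, so it always consumes the first letter of the next line, and it also rejects lines that merely begin with h, i, s or T.

```python
import re

# Skye's attempt, transcribed into Python's re syntax.
attempt = re.compile(r"[.,]\n[^This]")

good = "text text text,\nPurple Monkey"
skip = "text text text,\nThis Purple Monkey"
oops = "text text text,\nsomething else"

# The class [^This] consumes one character, so the match includes the "P".
print(repr(attempt.search(good).group()))   # ',\nP'

# "This" is skipped only because its first letter, T, is in the class...
print(attempt.search(skip))                 # None

# ...which also skips any line that starts with h, i, s or T.
print(attempt.search(oops))                 # None
```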
Matthew Pearce
Posts: 7
Joined: Tue Jun 10, 2003 10:16 am

[.,] sb [\.,]

Post by Matthew Pearce »

I think the first part of your expression should be [\.,], since the full stop is a regular-expression metacharacter in its own right (matching any single character) and therefore needs to be escaped with a backslash.

Matthew
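A quick check in Python's re module suggests the escape is harmless but not required there, because inside square brackets the dot loses its wildcard meaning. This sketch covers Python only. Whether TextPad's own engine behaves the same way is an assumption worth testing in TextPad itself.

```python
import re

text = "one. two, three"

# Inside square brackets the dot is an ordinary character in Python's re,
# so the escaped and unescaped classes find exactly the same matches.
print(re.findall(r"[.,]", text))    # ['.', ',']
print(re.findall(r"[\.,]", text))   # ['.', ',']

# Outside brackets the dot is the "any character" wildcard and does need escaping.
print(len(re.findall(r".", text)))  # 15
print(re.findall(r"\.", text))      # ['.']
```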
no.cache
Posts: 165
Joined: Thu May 15, 2003 2:52 pm

Post by no.cache »

Matthew, could you explain what

[.,] sb [\.,]

does? I take it that was what you meant to put in the body of your post (I saw it peeking out on the subject line)?

Thanks! How about a NOT THIS WORD? Is something like this even possible in TextPad? I ask because there are plenty of NOT THIS CHARACTER workarounds.

Just had an idea: Could you string together the characters, something like:

[^T+h+i+s]

I know that's garbage, but I use it as a model only. Thanks!

Skye
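On the NOT THIS WORD question, engines that support lookahead (Python's re, Perl and others) can express it directly. This is a sketch in Python syntax, not TextPad syntax, and it does not assume that TextPad's regex engine supports (?!...). The lookahead checks what follows without consuming it, so nothing after the line break is swallowed. The last line shows what [^T+h+i+s] actually means: one character that is not T, +, h, i or s.

```python
import re

# (?!This\b) is a negative lookahead: it checks that the word "This" does not
# come next, but consumes nothing, so the next line is left untouched.
pattern = re.compile(r"[.,]\n(?!This\b)")

samples = [
    "text text text,\nPurple Monkey",       # matches ",\n" only
    "text text text,\nThis Purple Monkey",  # ignored
    "text text text,\nsomething else",      # matches (unlike [^This])
]
for s in samples:
    m = pattern.search(s)
    print(repr(m.group()) if m else None)

# [^T+h+i+s] is still a one-character class: any single char except T, +, h, i, s.
print(re.findall(r"[^T+h+i+s]", "Thus+"))  # ['u']
```

The \b after This means a line starting with a longer word such as "Thistle" is still matched, which is usually what "not this word" intends.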
Ed
Posts: 103
Joined: Tue Mar 04, 2003 9:09 am
Location: Devon, UK

Post by Ed »

Hi Skye
What you need is
\([.]\|,\)\n\([^T][^h][^i][^s]\)
and in the replacement, use \2 to put back the text that is not "This". An example of "before and after" from you would help in giving a solution.
Ed
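Here is Ed's expression transcribed into Python syntax (\( \) become ( ) and \| becomes |), with a hypothetical helper, show, for inspecting the groups. It does skip "This". However, each of the four classes is tested separately, so it also skips any line whose first letter is T or whose second letter is h, and it needs at least four characters after the line break. What to put around \2 in the replacement depends on the result you want.

```python
import re

# Ed's \([.]\|,\)\n\([^T][^h][^i][^s]\) in Python's re syntax.
ed = re.compile(r"([.]|,)\n([^T][^h][^i][^s])")

def show(s):
    # Hypothetical helper: return the captured groups, or None if no match.
    m = ed.search(s)
    return m.groups() if m else None

print(show("text,\nPurple Monkey"))  # (',', 'Purp')
print(show("text,\nThis Monkey"))    # None
print(show("text,\nShop Monkey"))    # None: second letter is h
print(show("text,\nTom Monkey"))     # None: first letter is T

# Replacing with \2 alone keeps the four captured letters but drops ",\n".
print(ed.sub(r"\2", "text,\nPurple Monkey"))  # textPurple Monkey
```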
no.cache
Posts: 165
Joined: Thu May 15, 2003 2:52 pm

Post by no.cache »

Ed wrote:
Hi Skye
What you need is
\([.]\|,\)\n\([^T][^h][^i][^s]\)
and when you replace use \2 to replace the text that is not "This" - an example of "before and after" from you would be helpful in giving a solution
Ed

EGADS! (runs from the room leaving a diminishing trail ha ha).

Wow. Okay, let me get that example for you. Give me just a bit here, been working since 3 a.m. (don't ask) . . .

Skye :P
no.cache
Posts: 165
Joined: Thu May 15, 2003 2:52 pm

Post by no.cache »

Okay Ed, here is a somewhat different version of the This question that amounts to the same thing: how do you put a brake before the first character following the space . . . without also grabbing that first character and (essentially) erasing it?

These strings are GOOD:
MC PETERSON
MC CARTHY
MACPHERSON
MACARTHUR

These strings are NOT GOOD, and need "spacing tidying":
MCPETERSON . . . add space
MCCARTHY . . . add space
MAC PHERSON . . . remove space
MAC ARTHUR . . . remove space


By the way, these are mercifully the only two issues in the "MC n' MAC" family of proper names. I'm certain we can do this through Textpad, but noodling around with your sample produced some error messages, so I know I'm not doing it correctly. Thanks again Ed!
Skye

no.cache
Posts: 165
Joined: Thu May 15, 2003 2:52 pm

Post by no.cache »

For example:

(space)MC\([A-Z]\) on (space)MCTIERNY is finding

(space)MCT

and there goes my "T", when I only wanted it to grab the (space)MC. I hope that helps define what I mean by braking the search.

Mc Skye
MudGuard
Posts: 1295
Joined: Sun Mar 02, 2003 10:15 pm
Location: Munich, Germany

Post by MudGuard »

You can use \1 in the replacement part to put back whatever was matched by the first \(\), \2 for the second, \3 for the third ... \9 for the ninth.
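Applied to Skye's (space)MCTIERNY example, here is a minimal Python sketch on a made-up sample line. Python's re.sub accepts the same \1 syntax in its replacement string. The letter is captured and written straight back, so nothing is lost.

```python
import re

text = "JOHN MCTIERNY AND MARY MC TIERNY"

# Without a group, the matched letter is consumed and the replacement drops it.
lost = re.sub(r" MC[A-Z]", " MC ", text)
print(lost)   # JOHN MC IERNY AND MARY MC TIERNY

# With a group, \1 writes the captured letter back after the new space.
fixed = re.sub(r" MC([A-Z])", r" MC \1", text)
print(fixed)  # JOHN MC TIERNY AND MARY MC TIERNY
```

The already spaced " MC TIERNY" is untouched in both cases, because a space is not in [A-Z].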
Ed
Posts: 103
Joined: Tue Mar 04, 2003 9:09 am
Location: Devon, UK

Post by Ed »

You're not telling us the whole story here, are you? Your two examples seem to be doing different things.

Now, if all you want to do is replace
MCPETERSON
MCCARTHY
MAC PHERSON
MAC ARTHUR

with

MC PETERSON
MC CARTHY
MACPHERSON
MACARTHUR

then find all:
^MC\([^ ]\)
replace with
MC \1

then find all:
^MAC<space>
replace with
MAC

...but I don't think that's much help, is it?
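Here is a Python sketch of Ed's two passes, run against Skye's sample names. It assumes the MAC find pattern ends with a space. As printed in the thread, the two MAC lines look identical, which would change nothing. It also uses re.MULTILINE so that ^ anchors at the start of every line, mimicking how ^ behaves on each line in an editor.

```python
import re

names = "\n".join([
    "MC PETERSON", "MC CARTHY", "MACPHERSON", "MACARTHUR",  # already good
    "MCPETERSON", "MCCARTHY", "MAC PHERSON", "MAC ARTHUR",  # need tidying
])

# Pass 1: at line start, MC followed by a non-space gets a space inserted.
step1 = re.sub(r"^MC([^ ])", r"MC \1", names, flags=re.MULTILINE)
# Pass 2: at line start, MAC followed by a space loses the space.
step2 = re.sub(r"^MAC ", "MAC", step1, flags=re.MULTILINE)
print(step2)
```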
no.cache
Posts: 165
Joined: Thu May 15, 2003 2:52 pm

Post by no.cache »

Ed, MudGuard . . .

I'm trying to clean up some scanned-in (OCR) text that contains a whole bunch of stuff (names, addresses, arcane number strings, stray punctuation, dates, and much more). Through no fault of my scanner or my software, the documents are an unholy mess. These are commercially purchased reports, and I have no control whatsoever over either the accuracy of the original typist or the reports' friendliness to a scanner. In their miserly greed to save (negligible) real estate on each page and so save on postage, two years ago the publisher began to randomly tighten the leading (the spacing) between lines of text, to the point now where they shamelessly allow the characters of adjacent lines not only to touch but at times to overlap.

When I'm not devising ways of torturing the typesetter responsible for committing these crimes upon page layout, I'm contemplating personally sending him the entire section of my PageMaker manual describing the basics of leading, which begins with: Do not overlap lines or you really will look like an amateur.

The lines frequently touch one another (for an OCR this is the Kiss of Death) but it's even worse!

The moron who types these documents, in addition to being a terrible typist, follows no single convention for capitalization, abbreviation, or dates!

Mr. McIntyre and Mr. Mac Arthur can show up as (take your pick):

Mcintyre, McIntyre, MCINTYRE, MC INTYRE, and (pant pant) Mc Intyre.

Mac Arthur meets the same fate:
Macarthur, MacArthur, MACARTHUR, MAC ARTHUR and Mac Arthur.

Still with me? tsk. You did ask.

There are about, oh 30 crisis points in these documents, one of which is the CAPITALIZATION torture performed on the Houses of MC and MAC. I'm particular about last names that begin with Mc~ or Mac~ (no! we'd never guess why): I want my Mc with a lower-case c followed by an UPPER CASE letter; and the same for Mac — the letter that follows Mac~ in UPPER CASE. Examples from the Wasteland of my brain:

Mc Laughlin
Mac Arthur

Now in TextPad I can easily select the entire document and give it a sex change with CHANGE CASE/CAPITALIZE (the one where the first letter of every word is capitalized), but first a space has to be inserted if there isn't one already, and that, gentlemen, is what I was trying to accomplish with a single S/R expression. I don't know if it will be a McCarthy or a McKillip; a MacFinnegan or a Mac Intosh or a Macintosh or a MACINTOSH or (yes, he has thus offended*) macintosh; but if I could force my space, CHANGE CASE/CAPITALIZE would act efficiently on the two words that result . . . no matter what the devil they were.

And now I'm so pooped from having typed this out I may just change ALL their names to BAKER and make them Methodists phhhllllttttttt.

Skye

*it is my sincerest prayer that no one gives this person access to foreign characters
Ed
Posts: 103
Joined: Tue Mar 04, 2003 9:09 am
Location: Devon, UK

Post by Ed »

Well, why didn't you say earlier...
After your capitalizing trick, find:
\<m\(a\{0,1\}\)c\([a-z]\)
and replace with
M\l\1c\u\2
(Regexp, no Match Case)

Is there really anyone with a first name of Mac? Or do you really want to change MacArthur to Mac Arthur? We can arrange that if you want. :D
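Python has no \l or \u case-conversion escapes in re.sub replacement strings, so this sketch does the same job with a small function. The translation is mine: \< becomes \b, a\{0,1\} becomes a?, and "no Match Case" becomes re.IGNORECASE. The last example shows the kind of false hit Ed's question hints at: any word beginning with Mac or Mc gets the treatment.

```python
import re

# \<m\(a\{0,1\}\)c\([a-z]\) in Python syntax, case-insensitive.
MC_MAC = re.compile(r"\bm(a?)c([a-z])", re.IGNORECASE)

def fix_case(m):
    # Same effect as M\l\1c\u\2: lower-case the optional "a",
    # upper-case the letter after the c.
    return "M" + m.group(1).lower() + "c" + m.group(2).upper()

for text in ["James Mcintyre", "Douglas Macarthur", "Mc Intyre", "Machine Shop"]:
    print(MC_MAC.sub(fix_case, text))
```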
no.cache
Posts: 165
Joined: Thu May 15, 2003 2:52 pm

Post by no.cache »

Hi Ed,

That is a very cool string and I thank you for it, but it is not accomplishing quite what I wanted. Example:

JAMES MCINTYRE
becomes
James Mcintyre (after CHANGE CASE/CAPITALIZE)
becomes
James McIntyre (after Ed's Patented Acme S/R string)
but what I wanted was
James Mc Intyre

It's the space, you see. If I could find a way to insert the bloody space only where no space already follows, then all I would have to do is the CAPITALIZATION.

The trick is locating the ones that are not already spaced. Example:

The search would locate
JAMES MCINTYRE
but it would ignore
JAMES MC INTYRE


. . . and would be intelligent enough to give JAMES MCINTYRE and SARAH MACLAUGHLIN both their needed spaces, thus

JAMES MC INTYRE
SARAH MAC LAUGHLIN
are now prep'd to ride the same CAPITALIZATION train as the rest of the document.

As I boil this all down it has to do with selectively adding one space.

Or put another way, let me tell you the crude way I'm doing it now, which forces a double S/R step because I have no way of discriminating who already has a space:

FIND (space)MC
CHG (space)MC(space)
————then
FIND (space)MC(space)(space)
CHG (space)MC(space)

. . . and the whole thing repeated for MAC, followed by CHANGE CASE on the whole document.

It's not the end of the world if I have to perform 4 separate S/R passes when there is probably just one graceful way to handle them. Earlier I fooled around with the | pipe but couldn't get that to work. I hope I've stated this better than my previous posts. After 6 hours of this you're ready for the rubber room heh.

Skye
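For comparison, here is Skye's four-pass workaround written as plain string replacements in Python. The function name crude_spacing is hypothetical. It only catches names preceded by a space, and like any blind replace it will also split ordinary words that start with MC or MAC.

```python
def crude_spacing(text):
    # Passes 1 and 2: force a space after " MC", then collapse the doubled space.
    text = text.replace(" MC", " MC ")
    text = text.replace(" MC  ", " MC ")
    # Passes 3 and 4: the same for " MAC".
    text = text.replace(" MAC", " MAC ")
    text = text.replace(" MAC  ", " MAC ")
    return text

print(crude_spacing("JAMES MCINTYRE, JAMES MC INTYRE, SARAH MACLAUGHLIN"))
```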
Ed
Posts: 103
Joined: Tue Mar 04, 2003 9:09 am
Location: Devon, UK

Post by Ed »

The stuff I posted was for use AFTER capitalization.

Try find
\<m\(a\{0,1\}\)c *\([a-z]\)
Replace with
M\l\1c \u\2

That'll leave exactly one space after Mc or Mac, whether the name was written Mc, Mac, Mc<space> or Mac<space>, and capitalize the letter that follows.
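Here is Ed's final expression sketched in Python, with \< written as \b, a\{0,1\} as a?, and "no Match Case" as re.IGNORECASE. str.title() stands in for TextPad's CHANGE CASE/CAPITALIZE step, which is an assumption about how that command treats plain names. Because " *" absorbs whatever spaces are already there and the replacement always writes exactly one back, names that were already spaced come out unchanged.

```python
import re

# \<m\(a\{0,1\}\)c *\([a-z]\) in Python syntax; " *" absorbs existing spaces.
MC_MAC_SPACE = re.compile(r"\bm(a?)c *([a-z])", re.IGNORECASE)

def add_space(m):
    # Same effect as the TextPad replacement M\l\1c \u\2
    return "M" + m.group(1).lower() + "c " + m.group(2).upper()

def tidy(line):
    # Capitalize first (str.title() as a stand-in), then fix Mc/Mac spacing.
    return MC_MAC_SPACE.sub(add_space, line.title())

for raw in ["JAMES MCINTYRE", "JAMES MC INTYRE", "SARAH MACLAUGHLIN", "SARAH MAC  LAUGHLIN"]:
    print(tidy(raw))
```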
no.cache
Posts: 165
Joined: Thu May 15, 2003 2:52 pm

Post by no.cache »

Great Scott it works! it works!

You are amazing Mac Ed. It's brilliant, and it's going to be heading for Macroland very shortly.

Skye-a-Watha
