Ed, Mudgard . . .
I'm trying to clean up some scanned-in (OCR) text that contains a whole bunch of stuff (names, addresses, arcane number strings, stray punctuation, dates, and much more). Through no fault of either my scanner or my software, the documents are an unholy mess. These are commercially purchased reports, and I have no control whatsoever over either the accuracy of the original typist -or- the resulting reports' friendliness to a scanner. In their miserly greed to save (negligible) real estate on each page of the publication in order to save on postage, two years ago they began to randomly tighten the leading (the spacing between lines of text), to the point now where the publisher shamelessly allows the characters on adjacent lines to not only touch but at times overlap.
When I'm not devising ways of torturing the typesetter responsible for committing these crimes upon page layout, I'm contemplating personally sending him the entire section of my PageMaker manual describing the basics of leading, which begins with:
Do not overlap lines or you really will look like an amateur.
The lines frequently touch one another (for OCR this is the Kiss of Death) but it gets even worse!
The moron who types these documents, in addition to being a terrible typist, follows no single convention for capitalization, abbreviation, or dates!
Mr. McIntyre and Mr. Mac Arthur can show up as (take your pick):
Mcintyre, McIntyre, MCINTYRE, MC INTYRE, and (pant pant) Mc Intyre.
Mac Arthur meets the same fate:
Macarthur, MacArthur, MACARTHUR, MAC ARTHUR and Mac Arthur.
Still with me? tsk. You did ask.
There are about, oh, 30 crisis points in these documents, one of which is the CAPITALIZATION torture performed on the Houses of MC and MAC. I'm particular about last names that begin with Mc~ or Mac~ (no! we'd never guess why): I want my Mc with a lower-case c, then a space, then an UPPER CASE letter; and the same for Mac: a space, then the letter that follows Mac~ in UPPER CASE. Examples from the Wasteland of my brain:
Mc Laughlin
Mac Arthur
Now in TextPad I can easily select the entire document and give it a sex change to CHANGE CASE/CAPITALIZE (the one where the first letter of every word is capitalized), but there has to be a space inserted after the Mc or Mac if it wasn't there already. That, gentlemen, is what I was trying to find a single S/R (search and replace) expression to accomplish. I don't know if it will be a McCarthy or a McKillip; a MacFinnegan or a Mac Intosh or a Macintosh or a MACINTOSH or (yes, he has thus offended*) macintosh; but if I could force my space, the CHANGE CASE/CAPITALIZE would act efficiently on the two words that would have been made . . . no matter what the devil they were.
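For anyone who would rather script the job than wrestle it into one S/R expression, here is a minimal Python sketch of the idea described above. It is not TextPad syntax, and it does the space insertion and the capitalization in one pass rather than relying on CHANGE CASE/CAPITALIZE afterwards. The names PREFIX, split_prefix and normalize_mc_mac are made up for illustration. Big assumption: every word that starts with Mc or Mac is a surname, so ordinary words such as "machine" or "mackerel" will be split too, and a stand-alone "Mac" will swallow the word that follows it. Review the matches, or run it only on the name fields.

```python
import re

# Hypothetical pattern: a word starting with "mc" or "mac" (any case),
# optionally separated by spaces from the rest of the name.
# ASSUMPTION: every such word is a surname prefix. Words like "machine"
# will be split as well, so check the results by eye.
PREFIX = re.compile(r"\b(ma?c)[ ]*([a-z]+)\b", re.IGNORECASE)


def split_prefix(match):
    """Rebuild one match as 'Mc Xxxx' or 'Mac Xxxx'."""
    prefix = match.group(1).capitalize()  # "MC" -> "Mc", "mac" -> "Mac"
    rest = match.group(2).capitalize()    # "INTYRE" -> "Intyre"
    return f"{prefix} {rest}"


def normalize_mc_mac(text):
    """Force a single space after Mc/Mac and fix the case of both halves."""
    return PREFIX.sub(split_prefix, text)


if __name__ == "__main__":
    for name in ["Mcintyre", "MCINTYRE", "MC INTYRE", "macintosh", "MacFinnegan"]:
        print(name, "->", normalize_mc_mac(name))
```

With this sketch, normalize_mc_mac("MCINTYRE and macarthur") gives "Mc Intyre and Mac Arthur". Text outside the matched names is left exactly as it was, so the rest of the document still needs its own capitalization pass.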
And now I'm so pooped from having typed this out I may just change ALL their names to BAKER and make them Methodists phhhllllttttttt.
Skye
*It is my sincerest prayer that no one gives this person access to foreign characters.