As much as I love TextPad, I prefer reading Gutenberg books formatted.
I have been chugging along, tagging a few texts from Gutenberg for formatting, using the Clip Library to make sure I don't forget any of those pesky closing tags.
Often these texts show emphasis either through ALL CAPS or _faux_ underlining, which I change to more reader-friendly emphasis also using the Clip Library.
While the "underlined" text can be dealt with quite easily using a single search/replace for the whole book, I'm finding changing the all-caps text a bit more tedious.
First of all, I have been having trouble figuring out a single regular expression to find (a) every single all-caps word, whether it is followed by a space, punctuation, or a hard return; and also (b) every group of all-caps words, likewise whatever follows it. (I know this isn't the right forum for that problem, and I'm still playing around with that anyway and hope to figure out the solution on my own.)
Second -- and finally we get to the subject line -- after finding the set of all-caps words that need to be changed to different emphasis, they both need to be made either lower or title case as necessary, and have the correct tags applied (either italic or bold, depending on the context).
I would love if it were possible to either
(preferably) -- make a macro that would both lowercase and then use a specific clipping
or if that is not possible (as I suspect)
to assign a keyboard shortcut to a specific clipping so I am not jumping back and forth from the keyboard to the mouse constantly.
As I said, I have done a couple of these the hard way already -- meaning select the all caps words, then either Ctrl+L or Ctrl+Shift+U, as needed, and then mouse over to the Clip Library and double click on the correct clipping.
While the results are ok, I am pretty much determined to figure out a better way before I get stuck doing this to another long file.
I suppose another solution would be if I could create a macro with a hot spot like the clippings do, but I also suspect that is not possible in TextPad.
Any suggestions? Am I missing something?
macro with clipping?
Re: macro with clipping?
I don't fully understand the description of your problem, but these ideas may give you some pointers.
First, use Posix syntax:
Configure | Preferences | Editor
[X] Use POSIX regular expression syntax
It makes the world go round a deal faster.
For all of this you must match case-sensitively:
[X] Match case
To find all fully-capitalised words you can use this regular expression:
\<[A-Z]+\>
or, if you want to exclude single-letter words, such as I, you can use
\<[A-Z]{2,}\>
To find all sequences of fully-capitalised words on a single line you can use something like this:
\<[A-Z]+\>([[:punct:][:space:]]+\<[A-Z]+\>)*
or, to exclude single-letter words standing alone,
\<[A-Z]{2,}\>|\<[A-Z]+\>([[:punct:][:space:]]+\<[A-Z]+\>)+
To make them all lower case and enclose them in <caps> tags (you don't say what the tags should look like), you can use this:
Find what: \<[A-Z]+\>([[:punct:][:space:]]+\<[A-Z]+\>)*
Replace with: <caps>\L\0</caps>
You can't specify optional new lines using TextPad's weak regular expression recogniser (although you can with WildEdit (http://www.textpad.com/products/wildedit/)). Perhaps the easiest way to handle sequences of upper-case words that span lines is to handle the individual lines first, as above, and then deal with the spans:
Find what: </caps>([[:punct:][:space:]]*)\n([[:punct:]]*)<caps>
Replace with: \1\n\2
If you want to untag occurrences of tagged I on its own, use
Find what: <caps>i</caps>
Replace with: I
Does this solve part of your problem?
As for the rest:
aznap wrote: they both need to be made either lower or title case as necessary, and made the correct tags applied (either italic or bold, depending on the context).
Can you explain this more clearly?
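For anyone who wants to try the same recipe outside TextPad, here is a minimal Python sketch of the three search/replace steps above (tag and lower-case, join runs split by a line break, untag a lone I). It is only an approximation: Python's re module has no \< or \> anchors, so they are emulated with lookarounds, [[:punct:]] is approximated by string.punctuation, and the function names are made up for illustration.

```python
import re
import string

# Assumptions: \< and \> are emulated with lookarounds, and [[:punct:]]
# is approximated by string.punctuation. None of this is TextPad syntax.
WORD_START = r"(?<!\w)(?=\w)"
WORD_END = r"(?<=\w)(?!\w)"
PUNCT = re.escape(string.punctuation)
SEPARATOR = rf"[{PUNCT} \t]+"  # punctuation, spaces, tabs; never a newline

CAPS_WORD = rf"{WORD_START}[A-Z]+{WORD_END}"
CAPS_RUN = re.compile(rf"{CAPS_WORD}(?:{SEPARATOR}{CAPS_WORD})*")
SPLIT_SPAN = re.compile(rf"</caps>([{PUNCT} \t]*)\n([{PUNCT}]*)<caps>")


def tag_caps(text: str) -> str:
    """Lower-case each run of all-caps words on a line and wrap it in <caps>."""
    return CAPS_RUN.sub(lambda m: f"<caps>{m.group(0).lower()}</caps>", text)


def join_split_spans(text: str) -> str:
    """Merge two tagged runs that are separated only by a line break."""
    return SPLIT_SPAN.sub(r"\1\n\2", text)


def untag_lone_i(text: str) -> str:
    """Restore the pronoun I where it was tagged on its own."""
    return text.replace("<caps>i</caps>", "I")


def convert(text: str) -> str:
    """Apply the three steps in the same order as the replacements above."""
    return untag_lone_i(join_split_spans(tag_caps(text)))


if __name__ == "__main__":
    print(convert("from OUR\nENEMIES on OUR ACCOUNT, if I may"))
```

The order matters in the same way as in the editor: the per-line tagging has to run before the step that joins runs across a line break, because that step looks for the closing and opening tags the first step produced.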
Re: macro with clipping?
ben_josephs wrote: Does this solve part of your problem?
I think it nearly gets me there (see below).
ben_josephs wrote: As for the rest:
aznap wrote: they both need to be made either lower or title case as necessary, and made the correct tags applied (either italic or bold, depending on the context).
Can you explain this more clearly?
Sorry for the first rambling question.
I don't think a search/replace by itself can accomplish what I need, since some of the text needs to change from all caps to lower case, other text needs to change from all caps to just capitalized/title case, and still other all-caps phrases need to be changed to mostly lower case except for a few proper nouns. I think these decisions cannot be automated.
That being said, a regular expression search that will simply find all caps words or phrases would help quite a bit.
After thinking about it some more, I don't think I need to be able to find all caps phrases that go from one line to another, since I have been doing these searches after removing all hard returns except between paragraphs.
I have been using PML to tag the text so it can be read in eReader, but the specific markup doesn't matter, it could just as easily be HTML.
I don't have it set up as POSIX, but continuing from your example I have found that:
Find what: \(\<[A-Z[:space:][:punct:]]+[A-Z]+\>\)
Replace with: <i>\L\1</i>
(or for PML -- Replace with: \\i\L\1\\i)
both italicizes and makes lower case both single words and phrases.
From this point I think it would be easy enough to then search for <i> (or \i in PML) and Ctrl+F to manually cap those words that need it with Ctrl+Shift+U.
Thus this mess from Common Sense:
Alas, we have been long led away by ancient prejudices, and made large sacrifices to superstition. We have boasted the protection of Great Britain, without considering, that her motive was INTEREST not ATTACHMENT; that she did not protect us from OUR ENEMIES on OUR ACCOUNT, but from HER ENEMIES on HER OWN ACCOUNT, from those who had no quarrel with us on any OTHER ACCOUNT, and who will always be our enemies on the SAME ACCOUNT. Let Britain wave her pretensions to the continent, or the continent throw off the dependance, and we should be at peace with France and Spain were they at war with Britain. The miseries of Hanover last war ought to warn us against connexions.
It hath lately been asserted in parliament, that the colonies have no relation to each other but through the parent country, i. e. that Pennsylvania and the Jerseys, and so on for the rest, are sister colonies by the way of England; this is certainly a very round-about way of proving relationship, but it is the nearest and only true way of proving enemyship, if I may so call it. France and Spain never were, nor perhaps ever will be our enemies as AMERICANS, but as our being the SUBJECTS OF GREAT BRITAIN.
after the two steps becomes:
Alas, we have been long led away by ancient prejudices, and made large sacrifices to superstition. We have boasted the protection of Great Britain, without considering, that her motive was \iinterest\i not \iattachment\i; that she did not protect us from \iour enemies\i on \iour account\i, but from \iher enemies\i on \iher own account\i, from those who had no quarrel with us on any \iother account\i, and who will always be our enemies on the \isame account\i. Let Britain wave her pretensions to the continent, or the continent throw off the dependance, and we should be at peace with France and Spain were they at war with Britain. The miseries of Hanover last war ought to warn us against connexions.
It hath lately been asserted in parliament, that the colonies have no relation to each other but through the parent country, i. e. that Pennsylvania and the Jerseys, and so on for the rest, are sister colonies by the way of England; this is certainly a very round-about way of proving relationship, but it is the nearest and only true way of proving enemyship, if I may so call it. France and Spain never were, nor perhaps ever will be our enemies as \iAmericans\i, but as our being the \isubjects of Great Britain\i.
This still doesn't quite get me where I wanted to be, but it is far, far better than where I was before. No kind of macro was going to be able to recognize and capitalize proper nouns for me in any case.
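The first of the two steps (the search/replace, before any hand capitalising of proper nouns) can also be tried outside TextPad. What follows is a minimal Python sketch, not anything TextPad provides: the name italicise_caps is made up for illustration, \< and \> are emulated with lookarounds because Python's re module lacks them, and [[:punct:]] is approximated by string.punctuation.

```python
import re
import string

# Assumption: this pattern is a Python emulation of
#   \<[A-Z[:space:][:punct:]]+[A-Z]+\>
# with lookarounds standing in for the \< and \> word anchors.
PUNCT = re.escape(string.punctuation)
CAPS_PHRASE = re.compile(
    rf"(?<!\w)(?=\w)[A-Z\s{PUNCT}]+[A-Z]+(?<=\w)(?!\w)"
)


def italicise_caps(text: str) -> str:
    """Lower-case every all-caps word or phrase and wrap it in PML \\i markers."""
    return CAPS_PHRASE.sub(lambda m: "\\i" + m.group(0).lower() + "\\i", text)


if __name__ == "__main__":
    print(italicise_caps("her motive was INTEREST not ATTACHMENT; that she"))
```

As in the editor, proper nouns come out lower-cased (for example "great britain"), so the second, manual pass is still needed.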
What I was hoping for was to make a couple macros which I could define keyboard shortcuts for, so one would make selected text lower case and italicized and another would make selected text lower case and bold.
ISTM that being able to use clippings either with a macro or a keyboard shortcut would be useful for other things as well.
No worries. This works.
Thanks!
aznap wrote: That being said, a regular expression search that will simply find all caps words or phrases would help quite a bit.
I gave you one.
aznap wrote: I don't have it set up as POSIX, but continuing from your example I have found that:
Find what: \(\<[A-Z[:space:][:punct:]]+[A-Z]+\>\)
Replace with: <i>\L\1</i>
(or for PML -- Replace with: \\i\L\1\\i)
both italicizes and makes lower case both single words and phrases.
as does my suggestion.
Your regex excludes single-letter words standing alone. Otherwise it's equivalent to the one I suggested. Note that you don't need the outer parentheses in the regex; you can use \0 in the replacement expression to represent the entire matched text. Your regex is equivalent to
\<[A-Z[:space:][:punct:]]{2,}\>
which is simpler and clearer.
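One way to gain some confidence in that equivalence before running it over a whole book is to compare the two patterns on a sample. This is a rough Python sketch under the same assumptions as before (lookarounds in place of \< and \>, string.punctuation in place of [[:punct:]]); the names ORIGINAL, SIMPLIFIED and same_matches are hypothetical, and agreeing on one sample is not a proof.

```python
import re
import string

# Assumption: both patterns are Python emulations of the TextPad
# expressions, with lookarounds standing in for \< and \>.
PUNCT = re.escape(string.punctuation)
START = r"(?<!\w)(?=\w)"
END = r"(?<=\w)(?!\w)"

ORIGINAL = re.compile(rf"{START}[A-Z\s{PUNCT}]+[A-Z]+{END}")
SIMPLIFIED = re.compile(rf"{START}[A-Z\s{PUNCT}]{{2,}}{END}")


def same_matches(text: str) -> bool:
    """Report whether both patterns find exactly the same phrases in text."""
    return ORIGINAL.findall(text) == SIMPLIFIED.findall(text)


if __name__ == "__main__":
    sample = "was INTEREST not ATTACHMENT; from OUR ENEMIES on HER OWN ACCOUNT, if I may."
    print(SIMPLIFIED.findall(sample))
    print(same_matches(sample))
```

In this Python emulation the two patterns can still disagree when a capitalised word is immediately followed by an underscore, because the underscore counts as both punctuation and a word character, so it may be worth re-running the check on any text that still carries _underline_ markers.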
ben_josephs wrote: as does my suggestion.
Your regex excludes single-letter words standing alone. Otherwise it's equivalent to the one I suggested. Note that you don't need the outer parentheses in the regex; you can use \0 in the replacement expression to represent the entire matched text. Your regex is equivalent to
\<[A-Z[:space:][:punct:]]{2,}\>
which is simpler and clearer.
You are right, yours works just as well and is a simpler expression.
I admit I didn't try that one because I was afraid it would capture trailing spaces or punctuation, which I didn't want. I didn't take into account that the end-of-word marker would avoid that unwanted error.
Thanks again. Learning to use a tool better makes it a better tool.