RE search-replace dashes

General questions about using TextPad

Post by Paul Havemann »

I'm trying to tidy up a large body of text that has 'em' dashes in it (typed as two hyphens, '--') by adding a space on each side of every dash. But I want to skip dashes that already have spaces around them. Here's an example with both cases in it:

This is--in my opinion--the best White Castle burger I've ever had -- and I've eaten a lot of them.

should become:

This is -- in my opinion -- the best White Castle burger I've ever had -- and I've eaten a lot of them.

I've tried various regular expressions as suggested in the Help file, but I can't find the solution.

Thanks in advance!

Post by Stephan »

Click "Search | Replace" (or hit F8)


Find what:
\([^[:space:]]\)\(--\)\([^[:space:]]\)

Replace with:
\1 \2 \3

Note: there are no spaces before '\1' or after '\3' in the line above; the only spaces are the two around '\2'.

BTW, make sure the 'regular expression' box is checked.

That works on your example, so it should work on the rest of your text too...
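For anyone who wants to try the same substitution outside TextPad, here is a minimal sketch using Python's re module. It assumes Python's syntax, where groups are written ( ) rather than \( \), and where \S (any non-whitespace character) stands in for [^[:space:]], since Python's re does not support POSIX classes like [:space:]. The names DASH_PATTERN and add_dash_spaces are hypothetical.

```python
import re

# Same idea as the TextPad pattern, in Python syntax:
# group 1 = one non-whitespace char, group 2 = "--", group 3 = one non-whitespace char.
DASH_PATTERN = re.compile(r"(\S)(--)(\S)")

def add_dash_spaces(text: str) -> str:
    """Put a space on each side of '--' when it is glued to the surrounding text."""
    # r"\1 \2 \3" mirrors the TextPad replacement string.
    return DASH_PATTERN.sub(r"\1 \2 \3", text)

before = ("This is--in my opinion--the best White Castle burger "
          "I've ever had -- and I've eaten a lot of them.")
print(add_dash_spaces(before))
```

Dashes that already have spaces around them are left alone, because the characters directly on either side of '--' must be non-whitespace for the pattern to match.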

Hope that helped,

Stephan

Post by Paul Havemann »

Works like a champ. Thanks much!

Perhaps, one day, I'll figure out *why* it works. ;}

Post by Stephan »

Well, it's not that complicated (I just finished reading "Mastering Regular Expressions"):

'\(' and '\)' mark the start and end of a group of characters _and_ remember what that group matched, so it can be referenced later.

Now, the RE

\([^[:space:]]\)\(--\)\([^[:space:]]\)

has 3 such parts:

1. \([^[:space:]]\)

That is, match one character that is not whitespace. The outer [] delimit a character class, the '^' negates it, and [:space:] is a named class for whitespace characters (space, tab, newline and so on).
Note that this matches exactly one character, no more and no less.

2. \(--\)

This matches and 'remembers' just '--'.

3. \([^[:space:]]\)

Hey, we've already seen that: it's the same as part 1.
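To see the three parts in action, here is a small Python sketch. Python's re module has no [:space:] class, so this assumes [^\s] as the stand-in for [^[:space:]], and plain ( ) in place of \( \). The helper name parts is hypothetical.

```python
import re

# Part 1 and part 3: exactly one character that is NOT whitespace.
# Python's re lacks [:space:], so [^\s] plays the role of [^[:space:]].
# Part 2: the literal "--".
pattern = re.compile(r"([^\s])(--)([^\s])")

def parts(sample: str):
    """Return what each of the three groups captured, or None if there is no match."""
    m = pattern.search(sample)
    return m.groups() if m else None

for sample in ["is--in", "ab--cd", "had -- and", "x--"]:
    print(repr(sample), "->", parts(sample))
```

Only one character is captured on each side ('b' and 'c' for "ab--cd", not 'ab' and 'cd'), and a dash with a space next to it, or with nothing after it, does not match at all.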

Whatever each group matched is stored and can be referenced as \1, \2, and so on.
The groups are numbered in the order their opening '\(' appears, counting from the left: the first '\(' goes into \1, the second into \2, and so on.

So if you search for '\(aaa\(bbb\)\)' in

aaaabbbbb

and replace it with '*\1#\2+' you'll end up with

a*aaabbb#bbb+bb

as \1 is 'aaabbb' and \2 is 'bbb'. Because the groups are nested, \1 includes everything \2 matched.
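The nested-group example can be reproduced in Python, where groups are written ( ) instead of \( \) but are numbered the same way, by the position of their opening parenthesis. This is only a sketch for checking the result, not TextPad itself.

```python
import re

# Outer group (\1) wraps "aaa" plus the inner group (\2), which is "bbb".
# Groups are numbered by where their opening parenthesis appears, left to right.
pattern = re.compile(r"(aaa(bbb))")

text = "aaaabbbbb"
m = pattern.search(text)
print(m.span(), m.group(1), m.group(2))   # (1, 7) aaabbb bbb

# Same replacement as in the post: '*\1#\2+'
print(pattern.sub(r"*\1#\2+", text))       # a*aaabbb#bbb+bb
```

The match starts at the second 'a' (index 1), which is why one 'a' is left in front of the '*' and two 'b's are left after the '+'.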

Come to think of it, there's probably a more elegant solution:
Find '\>--\<' and replace it with ' -- ' (a space on each side).

It works with your example, too: \> matches the end of a word and \< matches the beginning of a word, so this only hits a '--' that sits directly between two words.
Check the help file for details.
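Python's re has no \< or \> anchors. A rough equivalent, assuming Python's idea of a word character (\w) is close enough to TextPad's, is \b--\b: since '-' is not a word character, \b just before '--' requires a word character in front of it, and \b just after requires one behind it. The sketch below compares both approaches; the function names are hypothetical. Because the word-boundary version does not consume the neighbouring characters, it also handles a single letter sandwiched between two dashes, which the first pattern misses in a single pass of re.sub. On the other hand, it skips dashes next to punctuation such as a closing quote.

```python
import re

def spaced_by_groups(text: str) -> str:
    # First pattern from the thread: consumes one non-space char on each side.
    return re.sub(r"(\S)(--)(\S)", r"\1 \2 \3", text)

def spaced_by_word_boundaries(text: str) -> str:
    # \b--\b: '--' must follow a word character and precede one.
    # The \b anchors are zero-width, so neighbouring characters are not consumed.
    return re.sub(r"\b--\b", " -- ", text)

example = ("This is--in my opinion--the best White Castle burger "
           "I've ever had -- and I've eaten a lot of them.")
for text in [example, "a--b--c", '"Stop!"--he said']:
    print(spaced_by_groups(text))
    print(spaced_by_word_boundaries(text))
```

On the original example both give the same result; they only differ in edge cases like the last two samples.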

Happy regexing!

Stephan