Regular Expressions for Find/Replace


BBowers
Posts: 17
Joined: Fri Jan 05, 2007 4:19 pm
Location: Colorado


Post by BBowers »

I have the following text:
-110353,103080,This is a test,This is a test of more text\n
More test with numbers 9090909\n
520-393-2700\n
-110353,103080,This is a test,This is a test of more text\n
More test with numbers 9090909\n
520-393-2700\n
-110353,103080,This is a test,This is a test of more text\n
More test with numbers 9090909\n
520-393-2700\n

I want it to come out as:
-110353,103080,This is a test,This is a test of more text,More test with numbers 9090909,520-393-2700\n
-110353,103080,This is a test,This is a test of more text,More test with numbers 9090909,520-393-2700\n
-110353,103080,This is a test,This is a test of more text,More test with numbers 9090909,520-393-2700\n
-110353,103080,This is a test,This is a test of more text,More test with numbers 9090909,520-393-2700\n

Note: I want to add a comma at the end of each line. I thought I could do it with Find: \n and Replace with: , but obviously
that just gets rid of all the \n's and I end up with one big line.
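
For readers following along outside TextPad, here is a minimal Python sketch of why the plain newline-to-comma replacement collapses everything into one line. Python and the variable names are assumptions made for illustration; they are not part of the original question.

```python
# Two sample records, each spread over three lines as in the post above.
text = (
    "-110353,103080,This is a test,This is a test of more text\n"
    "More test with numbers 9090909\n"
    "520-393-2700\n"
) * 2

# Replacing every newline with a comma removes all line breaks,
# including the ones that should separate one record from the next.
collapsed = text.replace("\n", ",")

print("\n" in collapsed)  # no line breaks are left
print(collapsed)
```

The replacement has no way to tell the newline that ends a record from the ones inside it, which is why a pattern spanning all three lines is needed.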

I'm new to TextPad and I can't quite figure out the Regular Expressions yet.
Bob Hansen
Posts: 1516
Joined: Sun Mar 02, 2003 8:15 pm
Location: Salem, NH

Post by Bob Hansen »

I am thinking that the "\n" you show is not real text in the lines, but your representation of the line endings.

Try this:

Find what:
^-(.*)\n(.*)\n(.*)\n

Replace with:
-\1,\2,\3,\n

Go to beginning of document.
Select Active Document,
Click on Replace Next. Replace All will not work.
Repeat doing Replace Next. You could make a macro to do Replace Next so you don't need to do it manually

Regular Expressions is checked, using POSIX syntax.
Hope this was helpful.............good luck,
Bob
ben_josephs
Posts: 2464
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

In what sense does Replace All not work?
Bob Hansen
Posts: 1516
Joined: Sun Mar 02, 2003 8:15 pm
Location: Salem, NH

Post by Bob Hansen »

When I do Replace All, lines 1 and 3 work, line 2 does not. I get the following:


-110353,103080,This is a test,This is a test of more text,More test with numbers 9090909,520-393-2700,\n

-110353,103080,This is a test,This is a test of more text\n
More test with numbers 9090909\n
520-393-2700\n

-110353,103080,This is a test,This is a test of more text,More test with numbers 9090909,520-393-2700,\n
I have separated them and included "\n" for clarification.
I have not taken the time to understand why that happens yet.
Does Replace All work for you?
Hope this was helpful.............good luck,
Bob
ben_josephs
Posts: 2464
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Ah, I see what you mean. I was using
Find what: ^(-.*)\n(.*)\n(.*)
Replace with: \1,\2,\3,
which does work. Note that, in this version, the newline at the end of the third line is not matched and is, therefore, not replaced.
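
The same find and replace can be tried outside TextPad. Below is a minimal sketch using Python's re module, which is an assumption for illustration and not something used in this thread. Python needs the re.MULTILINE flag so that ^ matches at the start of every line, not only at the start of the text.

```python
import re

# Three sample records, each spread over three lines.
text = (
    "-110353,103080,This is a test,This is a test of more text\n"
    "More test with numbers 9090909\n"
    "520-393-2700\n"
) * 3

# Same pattern as above; the newline after the third line is not matched,
# so it is left in place and keeps the records on separate lines.
pattern = re.compile(r"^(-.*)\n(.*)\n(.*)", re.MULTILINE)
joined = pattern.sub(r"\1,\2,\3,", text)

print(joined)
```

Each record should come out as a single line ending in a comma, with the original line break after the third line untouched.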

This is a bug. It's caused, perhaps, by the recogniser failing to notice that it's at the beginning of a line when it's positioned immediately after a newline that's just been inserted in the course of a global replacement.
BBowers
Posts: 17
Joined: Fri Jan 05, 2007 4:19 pm
Location: Colorado

Post by BBowers »

Thanks for the info... both of you. Even after reading the Help file, I still don't quite understand the code.

Would you mind terribly, if I asked you to explain in layman's language what each of the RE codes mean?

I know the ^ by itself means Start of Line, and that the \n is the line return, but the other symbols have me a bit stumped, especially how they're arranged. Likewise, what is the -\1,\2,\3,\n doing?

Sorry to be so stumped, but as I said I'm brand new at this.
Bob Hansen
Posts: 1516
Joined: Sun Mar 02, 2003 8:15 pm
Location: Salem, NH

Post by Bob Hansen »

For BBowers, explanation of my solution:

Summary: Find a line that starts with a hyphen, followed by two other lines. Replace those three lines with one line, showing the contents of each line followed by a comma.

Search for:
^-(.*)\n(.*)\n(.*)\n

^.....Start at the beginning of the line
-......a hyphen
(......Start a tagged expression. This will be the start of \1
. .... any character
* .....any quantity of the preceding character
) .....End of tagged expression. This will be the end of \1
\n......End of line
(.*)\n ....Tagged expression 2 (\2) is any number of any characters, and followed by end of line
(.*)\n ....Tagged expression 3 (\3) is any number of any characters, and followed by end of line


(Tagged expressions) are numbered sequentially as they appear in the Search string.
The first (contents) = \1 and the second (contents) is \2 etc.

Replace with:
-\1,\2,\3,\n

- ..... hyphen
\1.......Expression 1, the first set of (contents between parentheses)
, .......comma
\2, .....Expression 2 followed by a comma
\3, .....Expression 3 followed by a comma
\n ....end of line
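
The breakdown above can be written out as an annotated pattern. This is a sketch in Python, which is an assumption for illustration; TextPad itself has no such verbose mode. The re.VERBOSE flag lets the pattern carry comments, and re.MULTILINE makes ^ match at the start of each line.

```python
import re

# Two sample records, each spread over three lines.
text = (
    "-110353,103080,This is a test,This is a test of more text\n"
    "More test with numbers 9090909\n"
    "520-393-2700\n"
) * 2

pattern = re.compile(
    r"""
    ^       # start at the beginning of a line
    -       # a hyphen
    (.*)    # tagged expression 1: the rest of the first line
    \n      # end of line
    (.*)    # tagged expression 2: the whole second line
    \n      # end of line
    (.*)    # tagged expression 3: the whole third line
    \n      # end of line
    """,
    re.MULTILINE | re.VERBOSE,
)

# Hyphen, then each tagged expression followed by a comma, then a newline.
result = pattern.sub(r"-\1,\2,\3,\n", text)
print(result)
```

Python's re.sub looks for matches in the original text, not in the text as it is being rewritten, so this sketch does not reproduce the Replace All behaviour described earlier in the thread.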


The best reference book on RegEx is "Mastering Regular Expressions", 2nd edition, by Jeffrey Friedl, published by O'Reilly. Be sure to use the 2nd edition; the earlier edition is not as complete.

Note that TextPad is limited in its support of RegEx, but still has many more abilities than other programs. WildEdit has more RegEx capabilities.

And much of RegEx that I have learned has been from ben_josephs who provided a working solution for you. Thanks again, Ben
Hope this was helpful.............good luck,
Bob
ben_josephs
Posts: 2464
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Bob has already answered your question. My answer is almost identical, but as I've already typed it, I'll post it anyway, in case a different slant is helpful.

This assumes you are using POSIX regular expression syntax:
Configure | Preferences | Editor

[X] Use POSIX regular expression syntax

^(-.*)\n(.*)\n(.*) matches:


1.    ^        the beginning of a line

2.    (-.*)    the contents of a line beginning with a hyphen, that is:
2.1     (        beginning of 1st capturing subexpression
2.2     -        a hyphen
2.3     .*       the rest of the line, that is:
2.3.1     .        any character except newline
2.3.2     *        repeated any number (possibly zero) of times
2.4     )        end of capturing subexpression

3.    \n       a newline

4.    (.*)     the contents of a line, that is:
4.1     (        beginning of 2nd capturing subexpression
4.2     .*       a whole line, that is:
4.2.1     .        any character except newline
4.2.2     *        repeated any number (possibly zero) of times
4.3     )        end of capturing subexpression

5.    \n       a newline

6.    (.*)     the contents of a line, that is:
6.1     (        beginning of 3rd capturing subexpression
6.2     .*       the whole line, that is:
6.2.1     .        any character except newline
6.2.2     *        repeated any number (possibly zero) of times
6.3     )        end of capturing subexpression
Each parenthesised subexpression captures the text substring it matches. The substring matched by the 1st parenthesised subexpression can be referred to as \1 in the replacement expression, the 2nd as \2, and so on up to \9.

Thus the contents of the first line are matched by the 1st parenthesised subexpression, and are referred to as \1. Similarly, the contents of the second line are matched by the 2nd parenthesised subexpression, and are referred to as \2. And the contents of the third line are matched by the 3rd parenthesised subexpression, and are referred to as \3.
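
To see what each parenthesised subexpression captures, the pattern can be applied to a single record and the groups inspected one by one. This is a minimal sketch in Python, which is an assumption for illustration; in TextPad the captures are only visible through \1, \2 and \3 in the replacement.

```python
import re

# One sample record spread over three lines.
record = (
    "-110353,103080,This is a test,This is a test of more text\n"
    "More test with numbers 9090909\n"
    "520-393-2700\n"
)

match = re.match(r"^(-.*)\n(.*)\n(.*)", record)

# group(1), group(2) and group(3) play the role of \1, \2 and \3.
for number in (1, 2, 3):
    print(number, repr(match.group(number)))
```

None of the captured substrings contains a newline, because . does not match a newline and each \n sits outside the parentheses.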

Look in TextPad's help under
Reference Information | Regular Expressions,
Reference Information | Replacement Expressions and
How to... | Find and Replace Text | Use Regular Expressions.

There are many regular expression tutorials on the web, and you will find recommendations for some of them if you search this forum.

A standard reference for regular expressions is

Friedl, Jeffrey E F
Mastering Regular Expressions, 2nd ed
O'Reilly, 2002
ISBN: 0596002890
http://regex.info/

But be aware that the regular expression recogniser used by TextPad is rather weak by the standards of recent tools, so you may get frustrated if you discover a handy trick that doesn't work in TextPad.
BBowers
Posts: 17
Joined: Fri Jan 05, 2007 4:19 pm
Location: Colorado

Post by BBowers »

Thanks a lot, both of you. Just what I needed. Also thanks for the references for Regular Expressions. I'll definitely look into Jeffrey Friedl's book.