Regular Expressions for Find/Replace
Posted: Sat Apr 14, 2007 1:39 pm
by BBowers
I have the following text:
-110353,103080,This is a test,This is a test of more text\n
More test with numbers 9090909\n
520-393-2700\n
-110353,103080,This is a test,This is a test of more text\n
More test with numbers 9090909\n
520-393-2700\n
-110353,103080,This is a test,This is a test of more text\n
More test with numbers 9090909\n
520-393-2700\n
I want it to come out as:
-110353,103080,This is a test,This is a test of more text,More test with numbers 9090909,520-393-2700\n
-110353,103080,This is a test,This is a test of more text,More test with numbers 9090909,520-393-2700\n
-110353,103080,This is a test,This is a test of more text,More test with numbers 9090909,520-393-2700\n
-110353,103080,This is a test,This is a test of more text,More test with numbers 9090909,520-393-2700\n
Note: I want to add a comma at the end of each line. I thought I could do it with Find: \n Replace with: , but obviously that just gets rid of all the \n's and I end up with one big line.
I'm new to TextPad and I can't quite figure out the Regular Expressions yet.
Posted: Sat Apr 14, 2007 4:01 pm
by Bob Hansen
I am thinking that the "\n" you show is not real text in the lines, but your representation of the line endings.
Try this:
Find what:
^-(.*)\n(.*)\n(.*)\n
Replace with:
-\1,\2,\3,\n
Go to the beginning of the document.
Select Active Document.
Click on Replace Next. Replace All will not work.
Keep clicking Replace Next until every group of lines is done. You could record a macro to do Replace Next so you don't need to do it manually.
Regular Expressions is checked, using POSIX syntax.
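For anyone who wants to try the same pattern outside TextPad, here is a minimal sketch in Python. It assumes Python's re module as a stand-in engine (it is not TextPad's recogniser), and the variable names are made up for illustration. Python's sub replaces every match in one pass, so it says nothing about how TextPad's Replace All behaves.

```python
import re

# Sample text from the original question; "\n" is a real line ending here.
text = (
    "-110353,103080,This is a test,This is a test of more text\n"
    "More test with numbers 9090909\n"
    "520-393-2700\n"
) * 3

# In Python, ^ only matches at the start of every line with re.MULTILINE.
pattern = re.compile(r"^-(.*)\n(.*)\n(.*)\n", re.MULTILINE)

# Python's replacement template uses the same \1, \2, \3 notation,
# and turns \n into a real newline.
result = pattern.sub(r"-\1,\2,\3,\n", text)
print(result)
```

Each group of three lines should come out as a single line ending in a comma.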
Posted: Sat Apr 14, 2007 6:17 pm
by ben_josephs
In what sense does Replace All not work?
Posted: Sat Apr 14, 2007 8:28 pm
by Bob Hansen
When I do Replace All, the first and third groups of lines are joined, but the second is not. I get the following:
-110353,103080,This is a test,This is a test of more text,More test with numbers 9090909,520-393-2700,\n
-110353,103080,This is a test,This is a test of more text\n
More test with numbers 9090909\n
520-393-2700\n
-110353,103080,This is a test,This is a test of more text,More test with numbers 9090909,520-393-2700,\n
I have separated them and included "\n" for clarification.
I have not taken the time to understand why that happens yet.
Does Replace All work for you?
Posted: Sat Apr 14, 2007 8:57 pm
by ben_josephs
Ah, I see what you mean. I was using
Find what: ^(-.*)\n(.*)\n(.*)
Replace with: \1,\2,\3,
which does work. Note that, in this version, the newline at the end of the third line is not matched and is, therefore, not replaced.
This is a bug. It's caused, perhaps, by the recogniser failing to notice that it's at the beginning of a line when it's positioned immediately after a newline that's just been inserted in the course of a global replacement.
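As a rough cross-check, here is a small Python sketch comparing the two patterns. It assumes Python's re module, which is a different engine from TextPad's, so it does not reproduce the Replace All problem; it only illustrates that leaving the final newline unmatched gives the same text as matching it and putting it back. The names are illustrative.

```python
import re

text = (
    "-110353,103080,This is a test,This is a test of more text\n"
    "More test with numbers 9090909\n"
    "520-393-2700\n"
) * 3

# Version that matches the third newline and re-inserts it.
with_newline = re.sub(
    r"^-(.*)\n(.*)\n(.*)\n", r"-\1,\2,\3,\n", text, flags=re.MULTILINE
)

# Version that stops before the third newline, so it is left untouched.
without_newline = re.sub(
    r"^(-.*)\n(.*)\n(.*)", r"\1,\2,\3,", text, flags=re.MULTILINE
)

print(with_newline == without_newline)
```

In the second version nothing is inserted immediately before the start of the next match, which is the situation the suspected bug seems to depend on.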
Posted: Sat Apr 14, 2007 9:21 pm
by BBowers
Thanks for the info... both of you. Even after reading the Help file, I still don't quite understand the code.
Would you mind terribly, if I asked you to explain in layman's language what each of the RE codes mean?
I know the ^ by itself means Start of Line, and that the \n is the line return, but the other symbols have me a bit stumped, especially how they're arranged. Likewise, what is the -\1,\2,\3,\n doing?
Sorry to be so stumped, but as I said I'm brand new at this.
Posted: Sat Apr 14, 2007 10:13 pm
by Bob Hansen
For BBowers, explanation of my solution:
Summary: Find a line that starts with a hyphen, followed by two other lines. Replace those three lines with one line, showing the contents of each line followed by a comma.
Search for:
^-(.*)\n(.*)\n(.*)\n
^.....Start at the beginning of the line
-......a hyphen
(......Start a tagged expression This will be start of \1
. .... any character
* .....any quantity (including none) of the preceding character
) .....End of tagged expression This will be the end of \1
\n......End of line
(.*)\n ....Tagged expression 2 (\2) is any number of any characters, and followed by end of line
(.*)\n ....Tagged expression 3 (\3) is any number of any characters, and followed by end of line
(Tagged expressions) are numbered sequentially as they appear in the Search string.
The first (contents) = \1 and the second (contents) is \2 etc.
Replace with:
-\1,\2,\3,\n
- ..... hyphen
\1.......Expression 1, the first set of (contents between parentheses)
, .......comma
\2, .....Expression 2 followed by a comma
\3, .....Expression 3 followed by a comma
\n ....end of line
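To see what each tagged expression holds, here is a minimal Python sketch. It assumes Python's re module rather than TextPad, and the names record, match and rebuilt are only for illustration. It matches one group of three lines and then rebuilds the replacement by hand.

```python
import re

record = (
    "-110353,103080,This is a test,This is a test of more text\n"
    "More test with numbers 9090909\n"
    "520-393-2700\n"
)

match = re.match(r"^-(.*)\n(.*)\n(.*)\n", record)

# Print the contents of \1, \2 and \3 in order of appearance.
for number, contents in enumerate(match.groups(), start=1):
    print(f"\\{number} = {contents!r}")

# Same result as the replacement -\1,\2,\3,\n built piece by piece.
rebuilt = "-" + ",".join(match.groups()) + ",\n"
print(rebuilt)
```

Note that the hyphen sits outside the first set of parentheses, so \1 does not contain it. That is why the replacement has to put the hyphen back.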
The best reference book on RegEx is "Mastering Regular Expressions", 2nd edition, by Jeffrey Friedl, published by O'Reilly. Be sure to use the 2nd edition; the earlier edition is not as complete.
Note that TextPad is limited in its support of RegEx, but still has many more abilities than other programs. WildEdit has more RegEx capabilities.
And much of RegEx that I have learned has been from ben_josephs who provided a working solution for you. Thanks again, Ben
Posted: Sat Apr 14, 2007 10:24 pm
by ben_josephs
Bob has already answered your question. My answer is almost identical, but as I've already typed it, I'll post it anyway, in case a different slant is helpful.
This assumes you are using POSIX regular expression syntax:
Configure | Preferences | Editor
[X] Use POSIX regular expression syntax
^(-.*)\n(.*)\n(.*) matches
1. ^ the beginning of a line
2. (-.*) the contents of a line beginning with a hyphen, that is:
2.1 ( beginning of 1st capturing subexpression
2.2 - a hyphen
2.3 .* the rest of the line, that is:
2.3.1 . any character except newline
2.3.2 * repeated any number (possibly zero) of times
2.4 ) end of capturing subexpression
3. \n a newline
4. (.*) the contents of a line, that is:
4.1 ( beginning of 2nd capturing subexpression
4.2 .* a whole line, that is:
4.2.1 . any character except newline
4.2.2 * repeated any number (possibly zero) of times
4.3 ) end of capturing subexpression
5. \n a newline
6. (.*) the contents of a line, that is:
6.1 ( beginning of 3rd capturing subexpression
6.2 .* the whole line, that is:
6.2.1 . any character except newline
6.2.2 * repeated any number (possibly zero) of times
6.3 ) end of capturing subexpression
Each parenthesised subexpression captures the text substring it matches. The substring matched by the 1st parenthesised subexpression can be referred to as \1 in the replacement expression, the 2nd as \2, and so on up to \9.
Thus the contents of the first line are matched by the 1st parenthesised subexpression, and are referred to as \1. Similarly, the contents of the second line are matched by the 2nd parenthesised subexpression, and are referred to as \2. And the contents of the third line are matched by the 3rd parenthesised subexpression, and are referred to as \3.
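Two details in the breakdown above are easy to miss: the dot never crosses a newline, and the star may match nothing at all. A minimal Python sketch of both, assuming Python's re module as a stand-in for TextPad's recogniser and using a made-up sample with an empty second line:

```python
import re

pattern = re.compile(r"^(-.*)\n(.*)\n(.*)", re.MULTILINE)

# Hypothetical sample: the second line is empty.
sample = "-first line\n\nthird line\n"
m = pattern.search(sample)

# Each group stops at the end of its own line, because . excludes newline.
print(repr(m.group(1)))
# .* is allowed to match zero characters, so the empty line still matches.
print(repr(m.group(2)))
print(repr(m.group(3)))

# expand() fills \1, \2, \3 into a replacement template for this one match.
joined = m.expand(r"\1,\2,\3,")
print(joined)
```

The empty second line still counts as a match, so it shows up as two commas in a row in the joined line.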
Look in TextPad's help under
Reference Information | Regular Expressions,
Reference Information | Replacement Expressions and
How to... | Find and Replace Text | Use Regular Expressions.
There are many regular expression tutorials on the web, and you will find recommendations for some of them if you search this forum.
A standard reference for regular expressions is
Friedl, Jeffrey E F
Mastering Regular Expressions, 2nd ed
O'Reilly, 2002
ISBN: 0596002890
http://regex.info/
But be aware that the regular expression recogniser used by TextPad is rather weak by the standards of recent tools, so you may get frustrated if you discover a handy trick that doesn't work in TextPad.
Posted: Sat Apr 14, 2007 11:23 pm
by BBowers
Thanks a lot, both of you. Just what I needed. Also thanks for the references for Regular Expressions. I'll definitely look into Jeffrey Friedl's book.