Regular Expressions for Find/Replace
I have the following text:
-110353,103080,This is a test,This is a test of more text\n
More test with numbers 9090909\n
520-393-2700\n
-110353,103080,This is a test,This is a test of more text\n
More test with numbers 9090909\n
520-393-2700\n
-110353,103080,This is a test,This is a test of more text\n
More test with numbers 9090909\n
520-393-2700\n
I want it to come out as:
-110353,103080,This is a test,This is a test of more text,More test with numbers 9090909,520-393-2700\n
-110353,103080,This is a test,This is a test of more text,More test with numbers 9090909,520-393-2700\n
-110353,103080,This is a test,This is a test of more text,More test with numbers 9090909,520-393-2700\n
-110353,103080,This is a test,This is a test of more text,More test with numbers 9090909,520-393-2700\n
Note: I want to add a comma at the end of each line. I thought I could do it with Find: \n and Replace with: , but obviously
that just gets rid of all the \n's and I end up with one big line.
I'm new to TextPad and I can't quite figure out regular expressions yet.
- Bob Hansen
- Posts: 1516
- Joined: Sun Mar 02, 2003 8:15 pm
- Location: Salem, NH
- Contact:
I am thinking that the "\n" you show is not real text in the lines, but your representation of the line endings.
Try this:
Find what:
^-(.*)\n(.*)\n(.*)\n
Replace with:
-\1,\2,\3,\n
Go to the beginning of the document.
Select Active Document.
Click on Replace Next. Replace All will not work.
Keep clicking Replace Next. You could make a macro to do Replace Next so you don't need to do it manually.
This assumes Regular Expressions is checked, using POSIX syntax.
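For anyone who wants to try the same pattern outside TextPad, here is a minimal sketch in Python. It is only an illustration: Python's re module is not TextPad's engine, the variable names are made up for this example, and re.MULTILINE is assumed so that ^ matches at the start of every line rather than only at the start of the text.

```python
import re

# Sample data from the question: each record is spread over three lines.
text = (
    "-110353,103080,This is a test,This is a test of more text\n"
    "More test with numbers 9090909\n"
    "520-393-2700\n"
) * 3

# Same pattern as in the post. re.MULTILINE makes ^ match at every line start.
pattern = re.compile(r"^-(.*)\n(.*)\n(.*)\n", re.MULTILINE)

# The hyphen sits outside group 1, so the replacement has to put it back.
joined = pattern.sub(r"-\1,\2,\3,\n", text)
print(joined, end="")
```

In Python a single sub call rewrites every record, so the Replace Next workaround described above is specific to TextPad.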
Hope this was helpful.............good luck,
Bob
- ben_josephs
- Posts: 2464
- Joined: Sun Mar 02, 2003 9:22 pm
- Bob Hansen
- Posts: 1516
- Joined: Sun Mar 02, 2003 8:15 pm
- Location: Salem, NH
- Contact:
When I do Replace All, records 1 and 3 work, record 2 does not. I get the following:
I have separated them and included "\n" for clarification.
-110353,103080,This is a test,This is a test of more text,More test with numbers 9090909,520-393-2700,\n
-110353,103080,This is a test,This is a test of more text\n
More test with numbers 9090909\n
520-393-2700\n
-110353,103080,This is a test,This is a test of more text,More test with numbers 9090909,520-393-2700,\n
I have not taken the time to understand why that happens yet.
Does Replace All work for you?
Hope this was helpful.............good luck,
Bob
- ben_josephs
- Posts: 2464
- Joined: Sun Mar 02, 2003 9:22 pm
Ah, I see what you mean. I was using
Find what: ^(-.*)\n(.*)\n(.*)
Replace with: \1,\2,\3,
which does work. Note that, in this version, the newline at the end of the third line is not matched and is, therefore, not replaced.
The failure you see with Replace All is a bug. It's caused, perhaps, by the recogniser failing to notice that it's at the beginning of a line when it's positioned immediately after a newline that's just been inserted in the course of a global replacement.
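As a rough cross-check of the version that leaves the final newline unmatched, the sketch below applies it in Python. The function join_records, its trailing_comma parameter, and the shortened sample data are all hypothetical names and values chosen for this illustration, and Python's re module stands in for TextPad's recogniser, so it says nothing about the TextPad bug itself.

```python
import re

# The hyphen is inside group 1 and the final newline is left unmatched,
# so neither has to be reinserted by the replacement.
RECORD = re.compile(r"^(-.*)\n(.*)\n(.*)", re.MULTILINE)


def join_records(text: str, trailing_comma: bool = True) -> str:
    """Join each three-line record into one comma-separated line."""
    replacement = r"\1,\2,\3," if trailing_comma else r"\1,\2,\3"
    return RECORD.sub(replacement, text)


# Shortened stand-in for the data in the question (two records).
sample = "-1,2,a,b\nmore 9090909\n520-393-2700\n" * 2

print(join_records(sample), end="")
print(join_records(sample, trailing_comma=False), end="")
```

With trailing_comma=False the result has no comma before the line ending, which is the form shown in the "I want it to come out as" lines of the question.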
Thanks for the info... both of you. Even after reading the Help file, I still don't quite understand the code.
Would you mind terribly if I asked you to explain in layman's language what each of the RE codes means?
I know the ^ by itself means Start of Line, and that the \n is the line return, but the other symbols have me a bit stumped, especially how they're arranged. Likewise, what is the -\1,\2,\3,\n doing?
Sorry to be so stumped, but as I said I'm brand new at this.
- Bob Hansen
- Posts: 1516
- Joined: Sun Mar 02, 2003 8:15 pm
- Location: Salem, NH
- Contact:
For BBowers, explanation of my solution:
Summary: Find a line that starts with a hyphen, followed by two other lines. Replace those three lines with one line, showing the contents of each line followed by a comma.
Search for:
^-(.*)\n(.*)\n(.*)\n
^ ..... Start at the beginning of the line
- ..... a hyphen
( ..... Start a tagged expression. This will be the start of \1
. ..... any character
* ..... any quantity (including none) of the preceding character
) ..... End of tagged expression. This will be the end of \1
\n ..... End of line
(.*)\n ..... Tagged expression 2 (\2) is any number of any characters, followed by end of line
(.*)\n ..... Tagged expression 3 (\3) is any number of any characters, followed by end of line
(Tagged expressions) are numbered sequentially as they appear in the Search string.
The first (contents) is \1, the second (contents) is \2, etc.
Replace with:
-\1,\2,\3,\n
- ..... hyphen
\1 ..... Expression 1, the first set of (contents between parentheses)
, ..... comma
\2, ..... Expression 2 followed by a comma
\3, ..... Expression 3 followed by a comma
\n ..... end of line
The best reference book on RegEx is "Mastering Regular Expressions", 2nd edition, by Jeffrey Friedl, published by O'Reilly. Be sure to use the 2nd edition; the earlier edition is not as complete.
Note that TextPad is limited in its support of RegEx, but still has many more abilities than other programs. WildEdit has more RegEx capabilities.
Much of the RegEx that I have learned has been from ben_josephs, who provided a working solution for you. Thanks again, Ben.
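To see what each tagged expression actually holds, the following sketch matches one record in Python and prints the three groups. It assumes Python's re module as a stand-in for TextPad, and the variable names are invented for the example.

```python
import re

# One record from the question, spread over three lines.
record = (
    "-110353,103080,This is a test,This is a test of more text\n"
    "More test with numbers 9090909\n"
    "520-393-2700\n"
)

match = re.match(r"^-(.*)\n(.*)\n(.*)\n", record)
assert match is not None

# Groups are numbered by the order of their opening parentheses.
for number in (1, 2, 3):
    print(f"\\{number} = {match.group(number)!r}")

# Rebuild the line the same way the replacement -\1,\2,\3,\n does.
rebuilt = "-" + ",".join(match.groups()) + ",\n"
print(rebuilt, end="")
```

Note that \1 does not include the leading hyphen, because the hyphen is outside the parentheses. That is why the replacement string starts with a literal hyphen.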
Hope this was helpful.............good luck,
Bob
- ben_josephs
- Posts: 2464
- Joined: Sun Mar 02, 2003 9:22 pm
Bob has already answered your question. My answer is almost identical, but as I've already typed it, I'll post it anyway, in case a different slant is helpful.
This assumes you are using Posix regular expression syntax:
Configure | Preferences | Editor
[X] Use POSIX regular expression syntax
^(-.*)\n(.*)\n(.*) matches
1. ^ the beginning of a line
2. (-.*) the contents of a line beginning with a hyphen, that is:
  2.1 ( beginning of 1st capturing subexpression
  2.2 - a hyphen
  2.3 .* the rest of the line, that is:
    2.3.1 . any character except newline
    2.3.2 * repeated any number (possibly zero) of times
  2.4 ) end of capturing subexpression
3. \n a newline
4. (.*) the contents of a line, that is:
  4.1 ( beginning of 2nd capturing subexpression
  4.2 .* the whole line, that is:
    4.2.1 . any character except newline
    4.2.2 * repeated any number (possibly zero) of times
  4.3 ) end of capturing subexpression
5. \n a newline
6. (.*) the contents of a line, that is:
  6.1 ( beginning of 3rd capturing subexpression
  6.2 .* the whole line, that is:
    6.2.1 . any character except newline
    6.2.2 * repeated any number (possibly zero) of times
  6.3 ) end of capturing subexpression
Each parenthesised subexpression captures the text substring it matches. The substring matched by the 1st parenthesised subexpression can be referred to as \1 in the replacement expression, the 2nd as \2, and so on up to \9.
Thus the contents of the first line are matched by the 1st parenthesised subexpression, and are referred to as \1. Similarly, the contents of the second line are matched by the 2nd parenthesised subexpression, and are referred to as \2. And the contents of the third line are matched by the 3rd parenthesised subexpression, and are referred to as \3.
Look in TextPad's help under
Reference Information | Regular Expressions,
Reference Information | Replacement Expressions and
How to... | Find and Replace Text | Use Regular Expressions.
There are many regular expression tutorials on the web, and you will find recommendations for some of them if you search this forum.
A standard reference for regular expressions is
Friedl, Jeffrey E F
Mastering Regular Expressions, 2nd ed
O'Reilly, 2002
ISBN: 0596002890
http://regex.info/
But be aware that the regular expression recogniser used by TextPad is rather weak by the standards of recent tools, so you may get frustrated if you discover a handy trick that doesn't work in TextPad.
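The numbered breakdown of ^(-.*)\n(.*)\n(.*) earlier in this post can also be written directly into the pattern in tools that support commented regular expressions. The sketch below does this with Python's re.VERBOSE flag. This is a Python feature used for illustration, not something TextPad is claimed to offer, and the names ANNOTATED, COMPACT and sample are invented for the example.

```python
import re

# In verbose mode, whitespace in the pattern is ignored and # starts a comment.
ANNOTATED = re.compile(
    r"""
    ^        # 1. the beginning of a line
    (-.*)    # 2. group 1: a hyphen, then the rest of the line
    \n       # 3. a newline
    (.*)     # 4. group 2: the whole second line
    \n       # 5. a newline
    (.*)     # 6. group 3: the whole third line
    """,
    re.MULTILINE | re.VERBOSE,
)

# The same pattern written compactly, as in the post.
COMPACT = re.compile(r"^(-.*)\n(.*)\n(.*)", re.MULTILINE)

# Made-up sample: two three-line records.
sample = "-a,b\nsecond\nthird\n-c,d\nfourth\nfifth\n"

annotated_result = ANNOTATED.sub(r"\1,\2,\3,", sample)
compact_result = COMPACT.sub(r"\1,\2,\3,", sample)
print(annotated_result, end="")
```

Both forms compile to the same matcher, so the annotated one is only a readability aid while learning what each piece does.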