TEXTPAD SEARCH AND REPLACE QUERY
Moderators: AmigoJack, bbadmin, helios, Bob Hansen, MudGuard
TEXTPAD SEARCH AND REPLACE QUERY
I have a data file which contains lots of text. The format is something like this...
eg.
1,5,8,3,4,"aaa",2,4,"zzz",3,4
I'd like to convert this text to read down sequentially as follows...
1
5
8
3
4
"aaa"
2
4
"zzz"
3
4
That's easy to do in the above example as I replace the comma with a carriage return. My problems occur when commas appear in between the " " delimiters.
eg.
1,5,8,3,4,"aaa,vvv",2,4,"zzz",3,4
Using the same search and replace technique gives me...
1
5
8
3
4
"aaa
vvv"
2
4
zzz
3
4
I'd like that example to look like this...
1
5
8
3
4
"aaa,vvv"
2
4
zzz
3
4
Can anyone help me with this problem please? Do I require an external program or does TextPad have the ability to perform this? As I perform a lot of this type of data conversion, I'd like to eliminte the manual processes I'm currently going through.
eg.
1,5,8,3,4,"aaa",2,4,"zzz",3,4
I'd like to convert this text to read down sequentially as follows...
1
5
8
3
4
"aaa"
2
4
"zzz"
3
4
That's easy to do in the above example as I replace the comma with a carriage return. My problems occur when commas appear in between the " " delimiters.
eg.
1,5,8,3,4,"aaa,vvv",2,4,"zzz",3,4
Using the same search and replace technique gives me...
1
5
8
3
4
"aaa
vvv"
2
4
zzz
3
4
I'd like that example to look like this...
1
5
8
3
4
"aaa,vvv"
2
4
zzz
3
4
Can anyone help me with this problem please? Do I require an external program or does TextPad have the ability to perform this? As I perform a lot of this type of data conversion, I'd like to eliminte the manual processes I'm currently going through.
- Bob Hansen
- Posts: 1516
- Joined: Sun Mar 02, 2003 8:15 pm
- Location: Salem, NH
- Contact:
one solution
One way to solve your problem - not being a regex expert
Do as you have done with the find/replace "," for "\n"
Then write a simple macro to reconstruct the split lines:
Find "QUOTE"
goto end of line (END)
(DEL)
Move cursor down one line
Repeat to EOF.
You can write this into one MACRO.
Do as you have done with the find/replace "," for "\n"
Then write a simple macro to reconstruct the split lines:
Find "QUOTE"
goto end of line (END)
(DEL)
Move cursor down one line
Repeat to EOF.
You can write this into one MACRO.
-
- Posts: 2461
- Joined: Sun Mar 02, 2003 9:22 pm
I'd already typed this! It's basically ACRobin's solution, using a regex search and replacement instead of a macro. It's easier than hiding the internal commas before the main replacement and unhiding them afterwards.
You can't do that in one go. The quickest way might be to change all the commas blindly, and then change back the ones that shouldn't have been changed:
Change all commas to newlines:
This assumes you are using Posix regular expression syntax:
You can't do that in one go. The quickest way might be to change all the commas blindly, and then change back the ones that shouldn't have been changed:
Change all commas to newlines:
Then for each line that begins with a double quote symbol but doesn't end with one, change its newline back to a comma:Find what: ,
Replace with: \n
[X] Regular expression
Repeat the second relacement until there are no more matches.Find what: ^(".*[^"])\n
Replace with: \1,
[X] Regular expression
This assumes you are using Posix regular expression syntax:
Configuration | Preferences | Editor
[X] Use POSIX regular expression syntax
TEXTPAD SEARCH AND REPLACE QUERY
Many thanks for the quick replies and for a solution. Much appreciated.
I understand the logic behind the solution. I'm not totally sure of the search and replace syntax? I was looking up regular expression information but as I am inexperienced with TextPad, I'm not sure what...
Find what: ^(".*[^"])\n
Replace with: \1,
actually means? I can see there are wildcard type features involved and the ^ character refers to an anchor feature. Could I have a breakdown summary of the above regular expression please. In the meantime, I'm looking through the help file to sort it out. Thanks.
I understand the logic behind the solution. I'm not totally sure of the search and replace syntax? I was looking up regular expression information but as I am inexperienced with TextPad, I'm not sure what...
Find what: ^(".*[^"])\n
Replace with: \1,
actually means? I can see there are wildcard type features involved and the ^ character refers to an anchor feature. Could I have a breakdown summary of the above regular expression please. In the meantime, I'm looking through the help file to sort it out. Thanks.
- Bob Hansen
- Posts: 1516
- Joined: Sun Mar 02, 2003 8:15 pm
- Location: Salem, NH
- Contact:
Progressive attempts to explain previous syntax:
Find what: ^(".*[^"])\n
Replace with: \1,
----------------------------
First explanation
^ = Start at the beginning of the line
(...........) = Group all between paranthesis as the first element = \1
" = a quote symbol
. = any character
* = any number of the preceding character/element
[^..........] = exclude any that is in this group
\n = end of line
-----------------------------
Second explanation:
^ from start of line
(start of element 1)
".* = a double quote followed by any number of characters
[^"] = that is not a double quote
\n = to the end of the line.
-----------------------------------
Third explanation:
(".*[^"]) = a string that starts with double quote followed by any group of characters that is not a double quote
------------------------
Fourth explanation:
\1 = (".*[^"])
---------------------------
So this:
Find what: ^(".*[^"])\n
Replace with: \1,
Means: from the start of the line, look for a string that starts with double quote until the end of line, or the next double quote and replace it with the string that was located, followed by a comma.
===============================
===============================
With that said, when I tried it out from the previous instructions, it did not work for me. So I am suggesting this as an alternative:
Edited Note: Subsequent to this posting, ben_josephs pointed out that my Search string had an invisible trailing space character that caused this to fail for me. It was my typo error, not his solution. Sorry about that. In any event we were able to come up with a solution and explanations fo the various components that were used.
Step 1: As ben_josephs suggested, replace all commas with \n, that will result in this:
1
5
8
3
4
"aaa
vvv"
2
4
"zzz"
3
4
-----------------------------------
Step 2. Now do this:[
Search for: ^(".*)\n(.*")\n
Replace with: \1,\2\n
--------------------------------------
Explanation of my solution:
^" = take lines that have double quote as first character
( = start of element 1)
.* = followed any number of any characters
) = end of element 1
\n = to the end of one line
( = start of element 2)
.* = followed by any number of any characters
" = followed by a double quote
) = end of element 2
\n = that is at the end of the line.
Replace that found string with element 1, followed by a comma, followed by element 2, followed with a line ending.
OR:
Look for two lines in succession where the first line starts with double quote and the next line ends with double quote. Replace the line ending between them with a comma.
Result is:
1
5
8
3
4
"aaa,vvv"
2
4
"zzz"
3
4
Find what: ^(".*[^"])\n
Replace with: \1,
----------------------------
First explanation
^ = Start at the beginning of the line
(...........) = Group all between paranthesis as the first element = \1
" = a quote symbol
. = any character
* = any number of the preceding character/element
[^..........] = exclude any that is in this group
\n = end of line
-----------------------------
Second explanation:
^ from start of line
(start of element 1)
".* = a double quote followed by any number of characters
[^"] = that is not a double quote
\n = to the end of the line.
-----------------------------------
Third explanation:
(".*[^"]) = a string that starts with double quote followed by any group of characters that is not a double quote
------------------------
Fourth explanation:
\1 = (".*[^"])
---------------------------
So this:
Find what: ^(".*[^"])\n
Replace with: \1,
Means: from the start of the line, look for a string that starts with double quote until the end of line, or the next double quote and replace it with the string that was located, followed by a comma.
===============================
===============================
With that said, when I tried it out from the previous instructions, it did not work for me. So I am suggesting this as an alternative:
Edited Note: Subsequent to this posting, ben_josephs pointed out that my Search string had an invisible trailing space character that caused this to fail for me. It was my typo error, not his solution. Sorry about that. In any event we were able to come up with a solution and explanations fo the various components that were used.
Step 1: As ben_josephs suggested, replace all commas with \n, that will result in this:
1
5
8
3
4
"aaa
vvv"
2
4
"zzz"
3
4
-----------------------------------
Step 2. Now do this:[
Search for: ^(".*)\n(.*")\n
Replace with: \1,\2\n
--------------------------------------
Explanation of my solution:
^" = take lines that have double quote as first character
( = start of element 1)
.* = followed any number of any characters
) = end of element 1
\n = to the end of one line
( = start of element 2)
.* = followed by any number of any characters
" = followed by a double quote
) = end of element 2
\n = that is at the end of the line.
Replace that found string with element 1, followed by a comma, followed by element 2, followed with a line ending.
OR:
Look for two lines in succession where the first line starts with double quote and the next line ends with double quote. Replace the line ending between them with a comma.
Result is:
1
5
8
3
4
"aaa,vvv"
2
4
"zzz"
3
4
Last edited by Bob Hansen on Tue May 23, 2006 3:22 am, edited 1 time in total.
Hope this was helpful.............good luck,
Bob
Bob
TEXTPAD SEARCH AND REPLACE QUERY
Thanks for the prompt reply again. I tried both techniques and fully tested them. They both work splendidly well. I've managed to save the search & replace techniques you provided into macros. I can run these in a particular order that converts and cleans the data files into the exact format I require. Many thanks for your valuable advise.
-
- Posts: 2461
- Joined: Sun Mar 02, 2003 9:22 pm
In what way did it not work?Bob Hansen wrote:Find what: ^(".*[^"])\n
Replace with: \1,
...
when I tried it out from the previous instructions, it did not work for me.
This won't work if there are two or more commas embedded in one quoted item.Bob Hansen wrote:Search for: ^(".*)\n(.*")\n
Replace with: \1,\2\n
- Bob Hansen
- Posts: 1516
- Joined: Sun Mar 02, 2003 8:15 pm
- Location: Salem, NH
- Contact:
Hi Ben
1
5
8
3
4
"aaa
vvv"
2
4
"zzz"
3
4
When I then use the Search expression, I get message that "cannot find regular expression ^(".*[^"])\n " Did I miss something in the steps? Or is there a typo in the Search expression?
I tried to respond to the sample provided.
I did test it against successive strings with similar formats and it was successful.
And as above, I applied this to the strings after replacing the commas with \n.
--------------------------
In any event, at least Paul Moon is happy!
After replacing the commas with \n I ended up with this:In what way did it not work?
1
5
8
3
4
"aaa
vvv"
2
4
"zzz"
3
4
When I then use the Search expression, I get message that "cannot find regular expression ^(".*[^"])\n " Did I miss something in the steps? Or is there a typo in the Search expression?
The original sample does not show any instances where there are two or more commas in one quoted item, so I did not plan for it.This won't work if there are two or more commas embedded in one quoted item.
I tried to respond to the sample provided.
I did test it against successive strings with similar formats and it was successful.
And as above, I applied this to the strings after replacing the commas with \n.
--------------------------
In any event, at least Paul Moon is happy!
Hope this was helpful.............good luck,
Bob
Bob
-
- Posts: 2461
- Joined: Sun Mar 02, 2003 9:22 pm
TEXTPAD SEARCH AND REPLACE
Thanks again guys for the effort. When two or more commas exist between the quotes, I just rerun the macro and it cleans it up until all commas have been removed and the text appears as it should (on the one line).
- Bob Hansen
- Posts: 1516
- Joined: Sun Mar 02, 2003 8:15 pm
- Location: Salem, NH
- Contact:
YES! Good catch ben, because I have never seen one of your solutions not work before.Is there a space at the end of that regex? There shouldn't be.
I usually have Visible Spaces turned on, but that does not work in the Search/Find/Replace fields.
Mystery solved. I will modify my earlier postings.
Hope this was helpful.............good luck,
Bob
Bob