TEXTPAD SEARCH AND REPLACE QUERY

General questions about using TextPad

Moderators: AmigoJack, bbadmin, helios, Bob Hansen, MudGuard

Post Reply
Paul Moon
Posts: 9
Joined: Wed May 17, 2006 3:38 am
Location: Melbourne, Victoria, Australia

TEXTPAD SEARCH AND REPLACE QUERY

Post by Paul Moon »

I have a data file which contains lots of text. The format is something like this...

eg.
1,5,8,3,4,"aaa",2,4,"zzz",3,4

I'd like to convert this text to read down sequentially as follows...
1
5
8
3
4
"aaa"
2
4
"zzz"
3
4

That's easy to do in the above example as I replace the comma with a carriage return. My problems occur when commas appear in between the " " delimiters.

eg.
1,5,8,3,4,"aaa,vvv",2,4,"zzz",3,4

Using the same search and replace technique gives me...

1
5
8
3
4
"aaa
vvv"
2
4
zzz
3
4

I'd like that example to look like this...

1
5
8
3
4
"aaa,vvv"
2
4
zzz
3
4

Can anyone help me with this problem please? Do I require an external program or does TextPad have the ability to perform this? As I perform a lot of this type of data conversion, I'd like to eliminte the manual processes I'm currently going through.
User avatar
Bob Hansen
Posts: 1516
Joined: Sun Mar 02, 2003 8:15 pm
Location: Salem, NH
Contact:

Post by Bob Hansen »

1. Replace the commas within the quotes with tilde(~)
2. Replace the commas with \n
3. Replace the tildes with commas.
Hope this was helpful.............good luck,
Bob
ACRobin
Posts: 134
Joined: Fri Nov 04, 2005 9:51 pm
Location: Northumberland,UK
Contact:

one solution

Post by ACRobin »

One way to solve your problem - not being a regex expert

Do as you have done with the find/replace "," for "\n"

Then write a simple macro to reconstruct the split lines:
Find "QUOTE"
goto end of line (END)
(DEL)
Move cursor down one line
Repeat to EOF.

You can write this into one MACRO.
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

I'd already typed this! It's basically ACRobin's solution, using a regex search and replacement instead of a macro. It's easier than hiding the internal commas before the main replacement and unhiding them afterwards.

You can't do that in one go. The quickest way might be to change all the commas blindly, and then change back the ones that shouldn't have been changed:

Change all commas to newlines:
Find what: ,
Replace with: \n

[X] Regular expression
Then for each line that begins with a double quote symbol but doesn't end with one, change its newline back to a comma:
Find what: ^(".*[^"])\n
Replace with: \1,

[X] Regular expression
Repeat the second relacement until there are no more matches.

This assumes you are using Posix regular expression syntax:
Configuration | Preferences | Editor

[X] Use POSIX regular expression syntax
Paul Moon
Posts: 9
Joined: Wed May 17, 2006 3:38 am
Location: Melbourne, Victoria, Australia

TEXTPAD SEARCH AND REPLACE QUERY

Post by Paul Moon »

Many thanks for the quick replies and for a solution. Much appreciated.

I understand the logic behind the solution. I'm not totally sure of the search and replace syntax? I was looking up regular expression information but as I am inexperienced with TextPad, I'm not sure what...

Find what: ^(".*[^"])\n
Replace with: \1,

actually means? I can see there are wildcard type features involved and the ^ character refers to an anchor feature. Could I have a breakdown summary of the above regular expression please. In the meantime, I'm looking through the help file to sort it out. Thanks.
User avatar
Bob Hansen
Posts: 1516
Joined: Sun Mar 02, 2003 8:15 pm
Location: Salem, NH
Contact:

Post by Bob Hansen »

Progressive attempts to explain previous syntax:
Find what: ^(".*[^"])\n
Replace with: \1,

----------------------------
First explanation
^ = Start at the beginning of the line
(...........) = Group all between paranthesis as the first element = \1
" = a quote symbol
. = any character
* = any number of the preceding character/element
[^..........] = exclude any that is in this group
\n = end of line
-----------------------------
Second explanation:
^ from start of line
(start of element 1)
".* = a double quote followed by any number of characters
[^"] = that is not a double quote
\n = to the end of the line.
-----------------------------------
Third explanation:
(".*[^"]) = a string that starts with double quote followed by any group of characters that is not a double quote
------------------------
Fourth explanation:
\1 = (".*[^"])
---------------------------

So this:
Find what: ^(".*[^"])\n
Replace with: \1,

Means: from the start of the line, look for a string that starts with double quote until the end of line, or the next double quote and replace it with the string that was located, followed by a comma.

===============================
===============================

With that said, when I tried it out from the previous instructions, it did not work for me. So I am suggesting this as an alternative:

Edited Note: Subsequent to this posting, ben_josephs pointed out that my Search string had an invisible trailing space character that caused this to fail for me. It was my typo error, not his solution. Sorry about that. In any event we were able to come up with a solution and explanations fo the various components that were used.

Step 1: As ben_josephs suggested, replace all commas with \n, that will result in this:
1
5
8
3
4
"aaa
vvv"
2
4
"zzz"
3
4
-----------------------------------
Step 2. Now do this:[
Search for: ^(".*)\n(.*")\n
Replace with: \1,\2\n
--------------------------------------
Explanation of my solution:
^" = take lines that have double quote as first character
( = start of element 1)
.* = followed any number of any characters
) = end of element 1
\n = to the end of one line
( = start of element 2)
.* = followed by any number of any characters
" = followed by a double quote
) = end of element 2
\n = that is at the end of the line.

Replace that found string with element 1, followed by a comma, followed by element 2, followed with a line ending.

OR:
Look for two lines in succession where the first line starts with double quote and the next line ends with double quote. Replace the line ending between them with a comma.

Result is:
1
5
8
3
4
"aaa,vvv"
2
4
"zzz"
3
4
Last edited by Bob Hansen on Tue May 23, 2006 3:22 am, edited 1 time in total.
Hope this was helpful.............good luck,
Bob
Paul Moon
Posts: 9
Joined: Wed May 17, 2006 3:38 am
Location: Melbourne, Victoria, Australia

TEXTPAD SEARCH AND REPLACE QUERY

Post by Paul Moon »

Thanks for the prompt reply again. I tried both techniques and fully tested them. They both work splendidly well. I've managed to save the search & replace techniques you provided into macros. I can run these in a particular order that converts and cleans the data files into the exact format I require. Many thanks for your valuable advise.
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Bob Hansen wrote:Find what: ^(".*[^"])\n
Replace with: \1,
...
when I tried it out from the previous instructions, it did not work for me.
In what way did it not work?
Bob Hansen wrote:Search for: ^(".*)\n(.*")\n
Replace with: \1,\2\n
This won't work if there are two or more commas embedded in one quoted item.
User avatar
Bob Hansen
Posts: 1516
Joined: Sun Mar 02, 2003 8:15 pm
Location: Salem, NH
Contact:

Post by Bob Hansen »

Hi Ben
In what way did it not work?
After replacing the commas with \n I ended up with this:
1
5
8
3
4
"aaa
vvv"
2
4
"zzz"
3
4
When I then use the Search expression, I get message that "cannot find regular expression ^(".*[^"])\n " Did I miss something in the steps? Or is there a typo in the Search expression?
This won't work if there are two or more commas embedded in one quoted item.
The original sample does not show any instances where there are two or more commas in one quoted item, so I did not plan for it.
I tried to respond to the sample provided.
I did test it against successive strings with similar formats and it was successful.
And as above, I applied this to the strings after replacing the commas with \n.

--------------------------
In any event, at least Paul Moon is happy!
Hope this was helpful.............good luck,
Bob
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Bob Hansen wrote:"cannot find regular expression ^(".*[^"])\n "
Is there a space at the end of that regex? There shouldn't be.
Paul Moon
Posts: 9
Joined: Wed May 17, 2006 3:38 am
Location: Melbourne, Victoria, Australia

TEXTPAD SEARCH AND REPLACE

Post by Paul Moon »

Thanks again guys for the effort. When two or more commas exist between the quotes, I just rerun the macro and it cleans it up until all commas have been removed and the text appears as it should (on the one line).
User avatar
Bob Hansen
Posts: 1516
Joined: Sun Mar 02, 2003 8:15 pm
Location: Salem, NH
Contact:

Post by Bob Hansen »

Is there a space at the end of that regex? There shouldn't be.
YES! Good catch ben, because I have never seen one of your solutions not work before.

I usually have Visible Spaces turned on, but that does not work in the Search/Find/Replace fields.

Mystery solved. I will modify my earlier postings.
Hope this was helpful.............good luck,
Bob
Post Reply