Finding character within quotes

General questions about using TextPad

southiesl
Posts: 2
Joined: Mon Oct 09, 2006 8:33 pm

Finding character within quotes

Post by southiesl »

Hello all, I have a CSV file where I need to eliminate all the commas that are contained within quotes in a single entry. For example:

AdrTp,6,String,O,,"ADDR,PBOX,HOME,BIZZ,MLTO,DLVY",0/1,
AdrLine,6,Section,N/A,,N/A,0/5,
TradgSsn,5,Section,N/A,,N/A,0/1,
TradgSsnCd,6,String,O,,"ACHO,ACHC,ACHL,WAM1,WMAI,NNET,JNET,TOS1,TOS2",0/1,
Value,7,String,M,,,1/1,
StrtNm,6,String,O,,,0/1,

Becomes

AdrTp,6,String,O,,"ADDR / PBOX / HOME / BIZZ / MLTO / DLVY",0/1,
AdrLine,6,Section,N/A,,N/A,0/5,
TradgSsn,5,Section,N/A,,N/A,0/1,
TradgSsnCd,6,String,O,,"ACHO / ACHC / ACHL / WAM1 / WMAI / NNET / JNET / TOS1 / TOS2",0/1,
Value,7,String,M,,,1/1,
StrtNm,6,String,O,,,0/1,


Note: I added the bold just for effect, and the slash could be any character; it just needs to be something other than a comma, as commas are the delimiter for the rest of the file.

Thanks!
Steve
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Find what: ^([^"]*("[^"]*{2})*"[^"]*),
Replace with: \1 /

[X] Regular expression
Replace All repeatedly until it's all done.

This assumes you are using POSIX regular expression syntax:
Configure | Preferences | Editor

[X] Use POSIX regular expression syntax

It searches for:

1.    ^                        the beginning of a line
2.    [^"]*("[^"]*{2})*"[^"]*  text containing an odd number of quotes,
                                 that is:
2.1.      [^"]*                    text not containing quotes
2.2.      ("[^"]*{2})*             text containing an even number
                                       (possibly zero) of quotes
2.3.      "                        a quote
2.4.      [^"]*                    text not containing quotes
3.    ,                        a comma

It captures the text that matches items 1 and 2 so that it can be reused in the replacement as \1, and the replacement puts / in place of the comma.
Last edited by ben_josephs on Tue Oct 10, 2006 8:29 am, edited 1 time in total.
SteveH
Posts: 327
Joined: Thu Apr 03, 2003 11:37 am
Location: Edinburgh, Scotland

Post by SteveH »

I have one that works, but it is not ideal because you have to run the search and replace multiple times to catch all the instances.
Find what: (".*),(.*")
Replace with: \1 \2
This replaces each comma within quotes with a space. The brackets in the find expression create references (\1, \2, etc.) that can be used within the replacement expression.

Make sure you have the Regular expression checkbox ticked.
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

SteveH wrote:Find what: (".*),(.*")
Replace with:\1 \2
That will replace too many commas. It will replace all commas between the first quote in a line and the last one, including any that are not between paired quotes.
SteveH
Posts: 327
Joined: Thu Apr 03, 2003 11:37 am
Location: Edinburgh, Scotland

Post by SteveH »

That will replace too many commas.
Agreed. Just to clarify for southiesl, this would be a problem if you had multiple strings enclosed in quotes on a line. TextPad will find the longest matching string when searching.
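
To make the over-replacement concrete, here is a minimal sketch using Python's re module rather than TextPad's engine (an assumption; both match greedily here). The sample line and the helper name replace_until_stable are made up for illustration.

```python
import re

# SteveH's expression, copied verbatim; .* is greedy in Python's re as well.
GREEDY = re.compile(r'(".*),(.*")')

def replace_until_stable(line):
    """Re-run the replacement until the line stops changing."""
    while True:
        new_line = GREEDY.sub(r'\1 \2', line)
        if new_line == line:
            return line
        line = new_line

# Hypothetical line with two quoted strings and an unquoted field between them.
sample = 'x,"a,b",c,"d,e",y'
print(replace_until_stable(sample))  # x,"a b" c "d e",y
```

The commas on either side of the unquoted field c are lost as well, because they lie between the first quote on the line and the last one.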
southiesl
Posts: 2
Joined: Mon Oct 09, 2006 8:33 pm

THANKS

Post by southiesl »

Thanks guys, those worked great!
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

It's been pointed out that there's an error in the regular expression I suggested above. Despite this, the thing as a whole appears to work, although I'm not sure why!

The subexpression
"[^"]*{2}
should be
("[^"]*){2}
and the whole expression should be
^([^"]*(("[^"]*){2})*"[^"]*),

The expression is composed of:

1.    ^                          the beginning of a line
2.    [^"]*(("[^"]*){2})*"[^"]*  text containing an odd number of quotes
3.    ,                          a comma
2 is composed of:

2.1.    [^"]*                    text not containing quotes
2.2.    (("[^"]*){2})*           text containing an even number
                                     (possibly zero) of quotes
2.3.    "                        a quote
2.4.    [^"]*                    text not containing quotes
2.2 is composed of:

2.2.1     ("[^"]*){2}            text containing 2 quotes
2.2.2     *                        ... any number of times
2.2.1 is composed of:

2.2.1.1     "[^"]*               text containing 1 quote
2.2.1.2     {2}                  ... twice
2.2.1.1 is composed of:

2.2.1.1.1     "                  a quote
2.2.1.1.2     [^"]*              text not containing quotes
2.1, 2.4, 2.2.1.1.2 are each composed of:

...1            [^"]             anything that isn't a quote
...2            *                  ... any number (possibly zero) of times
Apologies for any confusion caused, and thanks, Ronny.
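
For anyone who wants to check the corrected expression outside TextPad, here is a minimal sketch in Python. It assumes Python's re module, which accepts the expression unchanged, and it processes one line at a time, as TextPad does. The function name replace_quoted_commas is made up for illustration, and the replacement " / " (with a trailing space) is an assumption chosen to reproduce the output requested in the first post.

```python
import re

# Corrected expression from this post, copied verbatim.
PATTERN = re.compile(r'^([^"]*(("[^"]*){2})*"[^"]*),')

def replace_quoted_commas(line, replacement=" / "):
    """Replace every comma that sits inside double quotes on one line."""
    while True:
        # ^ anchors each attempt at the start of the line, so a single
        # pass fixes at most one comma; repeat until nothing matches.
        line, count = PATTERN.subn(lambda m: m.group(1) + replacement, line)
        if count == 0:
            return line

sample = 'AdrTp,6,String,O,,"ADDR,PBOX,HOME,BIZZ,MLTO,DLVY",0/1,'
print(replace_quoted_commas(sample))
# AdrTp,6,String,O,,"ADDR / PBOX / HOME / BIZZ / MLTO / DLVY",0/1,
```

The loop plays the role of pressing Replace All repeatedly: each pass removes the last remaining comma that is preceded by an odd number of quotes.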
meisn
Posts: 11
Joined: Wed Oct 18, 2006 6:25 pm
Location: Germany

You're welcome!

Post by meisn »

ben_josephs wrote: Apologies for any confusion caused, and thanks, Ronny.
It was a pleasure to help you, Ben. :)

Cheers

Meisn