Regex stumper
John "Harry" "Timothy Jessica Mel" "Pat" Ginger
How do I split this string so that the output is
John
"Harry"
"Timothy Jessica Mel"
"Pat"
Ginger
In other words, split on a space only if it's NOT between "quotes".
Also, some strings could very well have a "quoted" value at the end, or could very well start with a quoted value,
e.g. "John" Harry "Timothy Jessica Mel" Pat "Ginger"
Weird data, I know...
UNIX please,
Thnx.
webM
- Bob Hansen
- Posts: 1516
- Joined: Sun Mar 02, 2003 8:15 pm
- Location: Salem, NH
- Contact:
Got it.........
How about this?
Three passes (Using underscore "_" to represent space character):
Start at beginning
First pass, replace "_" with "\n"
Start at beginning
Second pass replace _" with \n"
Start at beginning
Third pass, replace "_ with "\n
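As a rough sketch, the three passes above can be tried outside the editor with plain string replacement. Python is used here only for illustration, a real space stands in for the underscore, and the function name `three_pass_split` is made up for this example.

```python
def three_pass_split(line):
    """Apply the three literal find-and-replace passes in order."""
    # Pass 1: quote, space, quote -> quote, newline, quote
    line = line.replace('" "', '"\n"')
    # Pass 2: space, quote -> newline, quote
    line = line.replace(' "', '\n"')
    # Pass 3: quote, space -> quote, newline
    line = line.replace('" ', '"\n')
    return line


print(three_pass_split('John "Harry" "Timothy Jessica Mel" "Pat" Ginger'))
print(three_pass_split('"John" Harry "Timothy Jessica Mel" Pat "Ginger"'))
```

Note that these passes only look at spaces that sit next to a quote, so two unquoted words in a row would stay on one line.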
Works for me.......should work for you too.........good luck,
Bob
==================================
I'm a humble man..............and PROUD of it!
- Bob Hansen
- Posts: 1516
- Joined: Sun Mar 02, 2003 8:15 pm
- Location: Salem, NH
- Contact:
LOL Bob .. the example is in the first post....
And actually, I didn't want to replace the quotes; I wanted to split the string on whitespace ONLY if that space is NOT between "quotes".
Still crunching at it here .. this one is a back breaker/head banger using regex.
Right now I feel like giving this data file its first and only flying lesson >>> through the window.
webM
- Bob Hansen
- Posts: 1516
- Joined: Sun Mar 02, 2003 8:15 pm
- Location: Salem, NH
- Contact:
You mentioned that "the example is in the first post".
The solution I used was tested on your example, and gave the exact results that you showed.
If it's a problem to go through three passes, how about making a macro to do that for you:
CTRL-HOME
Find and Replace PASS1
CTRL-HOME
Find and Replace PASS2
CTRL-HOME
Find and Replace PASS3
============================
By demanding a single REGEX you are going to really test me. I look at these as challenges, and it forces me to learn new things. But I'm not in the mood right now, and time is tight. GIMME A BREAK, will you?
"GIMME A BREAK, will you?"
Take two or even three. Then come back and read my first post....
Your instructions replace the quotes with \n... not right; it should replace SPACES with \n if the space DOES NOT fall within quotes.
No sweat... but the reason why I was trying to do it with regex is because I have to write a Perl script to retrieve the file from the web, push the lines into @array, massage the data and write to a new file... I was only testing the outcome in TP before setting the script loose.
I was trying to employ the same 3/4 pass technique with regex in the script but having a hard time remembering when the " is passed so as not to split on the next space but continue till the next " then split.
The dope who came up with that data structure is job hunting now.... so much for that bright spark.
webM
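One way around the "remembering when the quote is passed" problem is to match the fields themselves instead of splitting on spaces. The sketch below is in Python rather than Perl, purely as an illustration, and it assumes quotes are always balanced and never escaped. `split_fields` is a hypothetical helper name, not something from this thread.

```python
import re

# A field is either a complete "quoted run" or a run of non-space characters.
FIELD = re.compile(r'"[^"]*"|\S+')


def split_fields(line):
    """Return the fields of a line, keeping the quotes on quoted fields."""
    return FIELD.findall(line)


for sample in ['John "Harry" "Timothy Jessica Mel" "Pat" Ginger',
               '"John" Harry "Timothy Jessica Mel" Pat "Ginger"']:
    # One field per output line, as in the first post.
    print("\n".join(split_fields(sample)))
```

Because the quoted alternative is tried first at each position, the regex engine does the bookkeeping about being inside or outside quotes. The alternation is ordinary regex syntax, so it should carry over to a Perl global match, though that is untested here.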
- s_reynisson
- Posts: 939
- Joined: Tue May 06, 2003 1:59 pm
Give this a "flying lesson"
Add as many middle names as you need, i.e. _*[a-zA-Z]* in pass 1.
Also beware that I'm only using a-z to grab a name...
Hmm, on my n-th edit here, but is there any way to grab all visual chars?
Might be handy to be sure you're getting all names.
Code:
Pass 1:
_("[a-zA-Z]+_*[a-zA-Z]*_*[a-zA-Z]*") -> \n\1
Pass 2:
"_([a-zA-Z]) -> "\n\1
Using POSIX and _=space
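For anyone wanting to try the two passes outside TextPad, here is a minimal sketch using Python's `re` module, with real spaces in place of the underscores. The translation to Python syntax and the name `two_pass_split` are assumptions made for this example only.

```python
import re

# Pass 1: a space followed by a quoted group of up to three names.
PASS1 = re.compile(r' ("[a-zA-Z]+ *[a-zA-Z]* *[a-zA-Z]*")')
# Pass 2: a closing quote, a space, then the first letter of the next name.
PASS2 = re.compile(r'" ([a-zA-Z])')


def two_pass_split(line):
    """Run pass 1, then pass 2, and return the resulting text."""
    line = PASS1.sub(r'\n\1', line)      # newline before each quoted group
    return PASS2.sub(r'"\n\1', line)     # newline after a closing quote


print(two_pass_split('"John" Harry "Timothy Jessica Mel" Pat "Ginger"'))
```

For names containing characters other than a-z, the letter classes would need widening; a negated class such as `[^"]` between the quotes is one common way to do that.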
Code:
"John" Harry "Timothy Jessica Mel" Pat "Ginger"
becomes
"John"
Harry
"Timothy Jessica Mel"
Pat
"Ginger"
- Bob Hansen
- Posts: 1516
- Joined: Sun Mar 02, 2003 8:15 pm
- Location: Salem, NH
- Contact:
Hello s_reynisson. It looks like you fell into the same trap that I did. He (webmasta) submitted another example in his first posting:
"John" Harry "Timothy Jessica Mel" Pat "Ginger"
This does not come out correctly with your solution, but it works well on the first sample.
I got this result using your solution on the second example:
"John"Harry
"Timothy Jessica Mel"Pat
"Ginger"
============================
My earlier version still works but ends up with a different result from what was displayed, because the first display was for the first sample. This result looks like what would be expected:
"John"
Harry
"Timothy Jessica Mel"
Pat
"Ginger"
Which looks good to me.
I would add one more pass replacing all " with nothing.
==============================
So I would like to resubmit:
Start at beginning
First pass, replace "_" with "\n"
Start at beginning
Second pass replace _" with \n"
Start at beginning
Third pass, replace "_ with "\n
Start at beginning
Fourth pass, replace " with nothing, delete them all.
Final result for both models:
John
Harry
Timothy Jessica Mel
Pat
Ginger
=====================================
Thanks for letting me take a break, but enough for tonight. good luck.
- s_reynisson
- Posts: 939
- Joined: Tue May 06, 2003 1:59 pm
hmm, I get
Code:
"John"
Harry
"Timothy Jessica Mel"
Pat
"Ginger"
from
"John" Harry "Timothy Jessica Mel" Pat "Ginger"
Code:
Using POSIX and _=space
p1 _("[a-zA-Z]+_*[a-zA-Z]*_*[a-zA-Z]*") -> \n\1
p2 "_([a-zA-Z]) -> "\n\1
I think the " are supposed to be left in.
Rey .. I gotta sleep and take a break from this.. tomorrow is another day..
First .. I keep getting invalid regex... phew...
POSIX is checked ...all underscores were replaced with spaces.. that means that the first regex starts with a space.
Next and MOST IMPORTANT .. TP is driving me over the wall .. I am already up it.
Been at this since 8 am this morn.. midnight now..
[rant]The darn s/r dialog is so small even on my 800x600 res .. cannot pull the box to expand it .. the s/r fields cut off the search terms, the arial text is hard to read in the search field, letters are so close together you can't select properly, eyes are sh*t right now... rave rave more rave[/rant]
Bob.. you need a vacation .. Rey is right.. the quotes are supposed to be left in...(split on space only when the space is not between quotes)
Don't sweat this... I won't get back to it for a couple of days at least..
Got another headache to deal with...Norton Internet Security's firewall popup blocker.
Thnx guys... will be back...
-
- Posts: 3
- Joined: Tue Sep 16, 2003 7:08 am
Nice one Milonguero! I'll just point out that you've used non-POSIX syntax, in case anybody comes across this thread in years to come. With POSIX syntax you lose all the backslashes:
Code:
Find what: *(([^"][^ "]*)|("[^"]*")) +
Replace with:\1\n
Keith MacDonald
Helios Software Solutions
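For readers who want to check the single-pass expression outside TextPad, here is a minimal sketch using Python's `re` module. Carrying the pattern over to Python unchanged is an assumption of this example, and `one_pass_split` is a made-up name.

```python
import re

# Optional leading spaces, then either an unquoted word or a "quoted run",
# then the run of spaces that separates it from the next field.
FIELD_THEN_SPACES = re.compile(r' *(([^"][^ "]*)|("[^"]*")) +')


def one_pass_split(line):
    """Replace each field plus its trailing spaces with the field and a newline."""
    return FIELD_THEN_SPACES.sub(r'\1\n', line)


print(one_pass_split('John "Harry" "Timothy Jessica Mel" "Pat" Ginger'))
print(one_pass_split('"John" Harry "Timothy Jessica Mel" Pat "Ginger"'))
```

The last field on a line is not followed by a space, so the pattern never matches it and it is left as it is, which is why no trailing newline is added.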
-
- Posts: 3
- Joined: Tue Sep 16, 2003 7:08 am
Well WTF .. It works on both examples in one fell swoop .. Keith, where were ya all day yesterday?? love ya.... thanx heaps... Suddenly my day seems like it's gonna be a good one.
John "Harry" "Timothy Jessica Mel" "Pat" Ginger
"John" Harry "Timothy Jessica Mel" Pat "Ginger"
John
"Harry"
"Timothy Jessica Mel"
"Pat"
Ginger
"John"
Harry
"Timothy Jessica Mel"
Pat
"Ginger"
Still can't understand why Rey's regex was returning invalid regex; I didn't change anything this morning, and Keith's regex works from the word go.
Bob, take that vacation, you need it.
Thnx again guys,
Last edited by webmasta on Wed Sep 17, 2003 12:08 am, edited 1 time in total.