REGEX Capturing multiple lines

General questions about using TextPad


randomwanderer
Posts: 10
Joined: Thu Feb 10, 2005 6:57 pm


Post by randomwanderer »

I am attempting to make full use of regex.
But first, can anyone tell me whether the following is possible? In the text below, I want to find P2.46 and then capture all characters between the preceding $ and the next $.


C THE FOLLOWING ROUTINE WILL STABILIZE THE DMM FOR VOLTAGE TRMS.
$
FOR, 'DMM STABILIZE' = 1 THRU 6, THEN $
MEASURE, (VOLTAGE-TRMS), AC SIGNAL,
VOLTAGE-TRMS RANGE 0.1V TO 10V, FREQ 2.4KHZ,
CNX HI P2.46 LO GND $
END, FOR $
C $



Therefore I want to end up with:
$
MEASURE, (VOLTAGE-TRMS), AC SIGNAL,
VOLTAGE-TRMS RANGE 0.1V TO 10V, FREQ 2.4KHZ,
CNX HI P2.46 LO GND $


Thanks to anyone who can tell me whether TextPad's implementation of regex makes this possible!

Thanks,
Forrest
talleyrand
Posts: 625
Joined: Mon Jul 21, 2003 6:56 pm
Location: Kansas City, MO, USA

Post by talleyrand »

I'm not a regex guru, but having followed the boards for a while I'm pretty sure the multi-line matching isn't going to work. A common workaround is to concatenate all lines together using a character, or sequence of characters, that does not occur in your file. Apply your regular expression to the concatenated line and then replace the sentinel values with newlines.

Having said all that, I think I just learned from ben_josephs (I could be wrong) that lookbehind assertions may not work in this case. Apparently they are hard to implement efficiently. Please correct me if I'm wrong.
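For comparison, this is what a lookaround version looks like in an engine that does support it. The sketch uses Python's `re` module, not TextPad, so it says nothing about what TextPad's POSIX mode accepts; the function name `statement_bodies` is made up for illustration. The lookbehind `(?<=\$)` and lookahead `(?=\$)` require a `$` on each side without including it in the match.

```python
import re

# (?<=\$) and (?=\$) check for a "$" on each side without consuming it.
# Python's re only accepts fixed-width lookbehinds; a single "\$" qualifies.
BODY = re.compile(r"(?<=\$)[^$]*P2\.46[^$]*(?=\$)")

def statement_bodies(text):
    """Return the text strictly between the two "$" around each P2.46 hit."""
    return BODY.findall(text)

# Trimmed copy of the sample from the question.
sample = (
    "$\nFOR, 'DMM STABILIZE' = 1 THRU 6, THEN $\n"
    "MEASURE, (VOLTAGE-TRMS), AC SIGNAL,\n"
    "VOLTAGE-TRMS RANGE 0.1V TO 10V, FREQ 2.4KHZ,\n"
    "CNX HI P2.46 LO GND $\nEND, FOR $\n"
)
print(statement_bodies(sample))
```

Note that the negated class `[^$]` is what keeps the match inside a single statement; the lookarounds only decide whether the `$` characters become part of the result.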

On a final note, if you're desperate for this sort of thing and don't mind installing the Python programming language on your machine, I could write a trivial program to do your search and replace.
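A minimal sketch of the kind of "trivial program" offered here, assuming Python 3. It is not talleyrand's code: `extract_statements`, `main`, the UTF-8 encoding and the two command-line arguments are all assumptions for illustration. It collects every `$`-delimited statement that mentions the pin, which is the end result asked for later in the thread.

```python
import re
import sys

def extract_statements(text, pin="P2.46"):
    """Return every $-delimited statement in `text` that mentions `pin`."""
    # In Python, [^$] also matches newlines, so no line-joining is needed.
    # re.escape turns the "." in the pin into a literal dot. The closing "$"
    # is checked with a lookahead instead of being consumed, because one "$"
    # both ends a statement and starts the next; it is re-added below.
    pattern = re.compile(r"\$[^$]*" + re.escape(pin) + r"[^$]*(?=\$)")
    return [m.group(0) + "$" for m in pattern.finditer(text)]

def main(src_path, dst_path, pin="P2.46"):
    """Hypothetical wrapper: write each matching statement to dst_path."""
    with open(src_path, encoding="utf-8") as src:  # encoding is an assumption
        statements = extract_statements(src.read(), pin)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(statements) + "\n")

if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Usage (hypothetical file names): `python extract_pin.py source.atl p2_46.txt`. The lookahead on the closing `$` is a deliberate choice: a pattern that consumes both `$` characters can skip a statement that immediately follows another match, because the two share a `$`.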
I choose to fight with a sack of angry cats.
Ed
Posts: 103
Joined: Tue Mar 04, 2003 9:09 am
Location: Devon, UK

Post by Ed »

with Posix regular expressions:-
search for
\n
replace with
¬
search for
\$[^$]+P2.46[^$]*\$
finds what you want

finally replace ¬ with \n
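Ed's three steps can be traced in Python as a sketch. It is only an illustration: in Python, `[^$]` already matches line breaks, so the join is unnecessary there; here it mimics a line-oriented search. The `¬` sentinel is assumed never to occur in the file, and `find_with_sentinel` is a hypothetical name. The dot in `P2.46` is escaped, as Ed recommends further down the thread.

```python
import re

SENTINEL = "\u00ac"  # "¬"; assumed never to appear in the source file

def find_with_sentinel(text):
    """Mimic the TextPad steps: join lines, search, restore line breaks."""
    # Step 1: replace every newline with the sentinel, giving one long line.
    joined = text.replace("\n", SENTINEL)
    # Step 2: search that single line for $ ... P2.46 ... $.
    matches = re.findall(r"\$[^$]+P2\.46[^$]*\$", joined)
    # Step 3: turn the sentinel back into newlines inside each match.
    return [m.replace(SENTINEL, "\n") for m in matches]
```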

But with an offer like that from talleyrand, I'd take it.
randomwanderer
Posts: 10
Joined: Thu Feb 10, 2005 6:57 pm

Wow that totally worked!!

Post by randomwanderer »

Thanks. I had been trying to figure that out for two days. I will learn a lot more when I take it apart and figure out how it works.

\$[^$]+P2.46[^$]*\$

But also, why does that not work when I do a search in files? It searches the file but says it was found once and then lists just the first line.
My goal is to end up with what is captured from above, however many times it's found. That way I have in a file all the complete statements that use that pin. (i.e. P2.46)

Thanks a million,
Forrest
Ed
Posts: 103
Joined: Tue Mar 04, 2003 9:09 am
Location: Devon, UK

Post by Ed »

\$[^$]+P2.46[^$]*\$
\$ - search for $; the \ means don't treat $ as an end-of-line marker
[^$]+ - 1 or more characters that are not a $
P2.46 - match this
[^$]* - 0 or more characters that are not a $
\$ - as above
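The same breakdown can be written directly into the pattern with Python's `re.VERBOSE` flag, which ignores whitespace and `#` comments inside the pattern. This is a sketch in Python rather than TextPad syntax; it escapes the dot (`P2\.46`), following Ed's later correction, and `PIN_STATEMENT` is an illustrative name.

```python
import re

# Ed's pattern with one comment per piece (re.VERBOSE ignores the layout).
PIN_STATEMENT = re.compile(r"""
    \$       # a literal "$" (the backslash stops it acting as an end-of-line anchor)
    [^$]+    # one or more characters that are not "$"
    P2\.46   # the pin name, with a literal dot
    [^$]*    # zero or more characters that are not "$"
    \$       # the closing literal "$"
""", re.VERBOSE)

print(PIN_STATEMENT.search("$ CNX HI P2.46 LO GND $").group())  # $ CNX HI P2.46 LO GND $
print(PIN_STATEMENT.search("$P2.46 LO GND $"))                  # None
```

The second call shows a side effect of the `+` in `[^$]+`: at least one character must sit between the opening `$` and `P2.46`. In the samples posted in this thread there is always at least a line break there, so it makes no difference in practice.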

I believe what you actually want is to search for everything else and delete it, whereas what you are doing is finding everything that matches.

I'm not sure of your terminology, are you searching many files or just the one?

After the replacement trick, perhaps you could try
search for
(\$[^$]+P2.46[^$]*\$)
replace with
\n\1\n
search for
(\$[^$]+P2.46[^$]*\$)
and press "Mark"
From the menu select Search>"Invert all bookmarks"
Then Edit>"Cut other">"Bookmarked lines"
Then substitute \n for ¬ as before

I've not tried it, but I reckon it's about right.

but take up talleyrand's offer!
randomwanderer
Posts: 10
Joined: Thu Feb 10, 2005 6:57 pm

Post by randomwanderer »

Ed,
Thanks so much for the help! I now understand a lot more than before.
But
this part
...after the replacement trick perhaps you could try
search for
(\$[^$]+P2.46[^$]*\$)
replace with
\n\1\n

It looks like the intention is to find the pattern and then put carriage returns before and after it. I guess that \1 means keep the found pattern intact. But that search/replace just inserts the characters '\1' once, and the matched text is gone.
Am I missing something here? And again, I sure do appreciate your help!

Forrest
Ed
Posts: 103
Joined: Tue Mar 04, 2003 9:09 am
Location: Devon, UK

Post by Ed »

I'm not sure I understand your reply, perhaps my instructions weren't clear.

I should have said:
with Posix regular expressions:-
search for
\n
replace ALL with
¬
search for
(\$[^$]+P2.46[^$]*\$)
replace ALL with
\n\1\n
search (not search and replace) for
(\$[^$]+P2.46[^$]*\$)
and press "Mark"
From the menu select Search>"Invert all bookmarks"
Then Edit>"Cut other">"Bookmarked lines"

finally replace ALL ¬ with \n
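As a rough check of what the corrected recipe does, here is a Python sketch that performs the same four steps on a string. The Mark / "Invert all bookmarks" / "Cut other" sequence is modelled as "keep only the lines that match"; that mapping, the absence of `¬` from the input, and the name `keep_only_matches` are assumptions.

```python
import re

SENTINEL = "\u00ac"  # "¬"; assumed absent from the input
PATTERN = re.compile(r"(\$[^$]+P2\.46[^$]*\$)")

def keep_only_matches(text):
    """Return only the P2.46 statements, with their original line breaks."""
    # 1. Replace ALL "\n" with "¬": the file becomes a single line.
    joined = text.replace("\n", SENTINEL)
    # 2. Replace ALL matches with "\n\1\n": each match lands on its own line.
    wrapped = PATTERN.sub(r"\n\1\n", joined)
    # 3. Mark matching lines, invert, cut the rest: keep only matching lines.
    kept = [line for line in wrapped.split("\n") if PATTERN.search(line)]
    # 4. Replace ALL "¬" with "\n" to restore the original line breaks.
    return "\n".join(line.replace(SENTINEL, "\n") for line in kept)
```

As written, step 2 (in Python, and presumably in TextPad too) can miss a matching statement that directly follows another match, since the closing `$` of the first is also the opening `$` of the second.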
randomwanderer
Posts: 10
Joined: Thu Feb 10, 2005 6:57 pm

Post by randomwanderer »

I go from this:

C¬ Remove UUT power.¬ $¬ 55 REMOVE, DC SIGNAL USING '+30VDC_UUT',¬ VOLTAGE 30.0 V,¬ CURRENT MAX 1.0 A,¬ CNX HI J1-1 LO J1-5 $¬ 60 REMOVE, DC SIGNAL USING '+6VDC_UUT',¬ VOLTAGE 6.0 V,¬ CURRENT MAX 1.0 A,¬ CNX HI J1-4 LO J1-5 $¬ 65 REMOVE, DC SIGNAL USING '-10VDC_UUT',¬ VOLTAGE -10.0 V,¬ CURRENT MAX 1.0 A,¬ CNX HI P2.46 LO J1-5 $¬ 70 REMOVE, DC SIGNAL USING '-25VDC_UUT',¬ VOLTAGE -25.0 V,¬ CURRENT MAX 1.0 A,¬ CNX HI J1-8 LO J1-5 $



To this, after replacing \$[^$]+P2.46[^$]*\$ with \n\1\n:



C¬ Remove UUT power.¬ $¬ 55 REMOVE, DC SIGNAL USING '+30VDC_UUT',¬ VOLTAGE 30.0 V,¬ CURRENT MAX 1.0 A,¬ CNX HI J1-1 LO J1-5 $¬ 60 REMOVE, DC SIGNAL USING '+6VDC_UUT',¬ VOLTAGE 6.0 V,¬ CURRENT MAX 1.0 A,¬ CNX HI J1-4 LO J1-5
\1
¬ 70 REMOVE, DC SIGNAL USING '-25VDC_UUT',¬ VOLTAGE -25.0 V,¬ CURRENT MAX 1.0 A,¬ CNX HI J1-8 LO J1-5 $


So the matching text is replaced with a literal '\1'.
That means I can't then search for \$[^$]+P2.46[^$]*\$, because that text has been replaced with a literal '\1'.

That is the part where I must be doing something wrong!

Again, thanks so much! If this can be made to work it would mean a lot!

Thanks,
Forrest
Orlando, FL
talleyrand
Posts: 625
Joined: Mon Jul 21, 2003 6:56 pm
Location: Kansas City, MO, USA

Post by talleyrand »

Make sure you are using POSIX syntax. Also, the parentheses around the regular expression are required: they create the group that \1 refers to in the replacement.

Doing that resulted in


C¬ Remove UUT power.¬ $¬ 55 REMOVE, DC SIGNAL USING '+30VDC_UUT',¬ VOLTAGE 30.0 V,¬ CURRENT MAX 1.0 A,¬ CNX HI J1-1 LO J1-5 $¬ 60 REMOVE, DC SIGNAL USING '+6VDC_UUT',¬ VOLTAGE 6.0 V,¬ CURRENT MAX 1.0 A,¬ CNX HI J1-4 LO J1-5 

¬ 70 REMOVE, DC SIGNAL USING '-25VDC_UUT',¬ VOLTAGE -25.0 V,¬ CURRENT MAX 1.0 A,¬ CNX HI J1-8 LO J1-5 $
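The role of the parentheses can be seen in a small Python sketch (Python's `re`, not TextPad; `wrap`, `WITH_GROUP` and `WITHOUT_GROUP` are illustrative names). With a group, `\1` in the replacement stands for the matched text. Without one, Python rejects the `\1` with an error, whereas TextPad, as reported above, inserted a literal `\1`.

```python
import re

WITH_GROUP = r"(\$[^$]+P2\.46[^$]*\$)"   # parentheses create group 1
WITHOUT_GROUP = r"\$[^$]+P2\.46[^$]*\$"  # no group, so \1 has nothing to refer to

def wrap(line, pattern):
    """Put each match on its own line using Ed's replacement string."""
    return re.sub(pattern, r"\n\1\n", line)

line = "J1-5 $¬ 65 REMOVE, CNX HI P2.46 LO J1-5 $¬ 70 REMOVE"
print(wrap(line, WITH_GROUP))

try:
    wrap(line, WITHOUT_GROUP)
except re.error as exc:  # Python refuses a reference to a missing group
    print("rejected:", exc)
```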
Ed
Posts: 103
Joined: Tue Mar 04, 2003 9:09 am
Location: Devon, UK

Post by Ed »

OK, I believe you are omitting the ( and ) around the search expression. I also made the mistake of omitting a \ before the ., although that won't have made much difference (an unescaped . matches any character, including a literal dot).

so search for
(\$[^$]+P2\.46[^$]*\$)
and replace with
\n\1\n

Ensure Posix is set on Configure>Preferences>Editor>"Use Posix..."
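The effect of the missing backslash can be shown with Python's `re` as a sketch; the pin labels other than `P2.46` are made up for illustration. An unescaped `.` matches any single character, so it only matters if the file contains labels that differ from `P2.46` in that one position.

```python
import re

LOOSE = re.compile(r"P2.46")    # unescaped "." matches any single character
STRICT = re.compile(r"P2\.46")  # "\." matches only a literal dot

for pin in ["P2.46", "P2-46", "P2146"]:  # hypothetical pin labels
    print(pin, bool(LOOSE.search(pin)), bool(STRICT.search(pin)))
# P2.46 True True
# P2-46 True False
# P2146 True False
```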
Lostgallifreyan
Posts: 25
Joined: Mon Feb 14, 2005 10:51 am

Post by Lostgallifreyan »

At the risk of being facetious, you could always paste that block into a file called X.htm, then pass it through Proxomitron to a browser... :twisted:

This would let you use the $ char in a bounds check, and do exactly what you asked.

Seriously, Prox has an awesome RegExp set, with a choice of greedy and non-greedy matching. On occasion I've done exactly what I've described here, where it seemed easier than trying to figure out what I might have missed in TextPad's methods. I don't actually pass it to a browser, though; I just use the test dialog...
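Proxomitron's own syntax isn't shown here; as a stand-in, this Python sketch compares greedy, non-greedy (lazy) and negated-class versions of the pattern on a made-up line. A lazy quantifier only shortens the match on the right: the match still starts at the leftmost `$` from which a match is possible. The `[^$]` class is what keeps the match within a single statement.

```python
import re

text = "$ A $ B P2.46 C $ D $"  # made-up line with four "$" delimiters

greedy = re.search(r"\$.*P2\.46.*\$", text).group()
lazy = re.search(r"\$.*?P2\.46.*?\$", text).group()
bounded = re.search(r"\$[^$]+P2\.46[^$]*\$", text).group()

print(greedy)   # $ A $ B P2.46 C $ D $
print(lazy)     # $ A $ B P2.46 C $
print(bounded)  # $ B P2.46 C $
```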
randomwanderer
Posts: 10
Joined: Thu Feb 10, 2005 6:57 pm

Post by randomwanderer »

It works! Of course, I was doing it wrong.
I thought I had POSIX on at home last night, but I did not; I had only set it at work.
That led me to believe that it was not working because of the parentheses, which I removed. A worthwhile mistake for me, though; I learned a lot from everyone's help.


I still have a problem, though, but I may be bumping against the limits of TextPad's capabilities.
When I do the following

(\$[^$]+P2.46[^$]*\$)
and press "Mark"

I then get the message

'Recursion too deep, the stack overflowed.'

It's a pretty big ATLAS source file. When I make it smaller it works. Do you think there is a way to get beyond that?
I am looking into the Proxomitron possible alternate solution.

Thanks
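If the regex engine struggles with very long lines, one alternative is to avoid regular expressions altogether. This Python sketch (hypothetical name `statements_with_pin`) splits the text on `$` and keeps the pieces that contain the pin; a plain substring test involves no backtracking, so recursion depth does not come into it. It is offered as an alternative under those assumptions, not as a fix for TextPad itself.

```python
def statements_with_pin(text, pin="P2.46"):
    """Return each $-delimited statement containing `pin`, with its "$" ends."""
    # Each "$" ends one statement and starts the next, so the pieces between
    # consecutive "$" characters are the statement bodies.
    pieces = text.split("$")
    # pieces[0] precedes the first "$" and pieces[-1] follows the last one,
    # so only the inner pieces have a "$" on both sides.
    return ["$" + body + "$" for body in pieces[1:-1] if pin in body]
```

Because nothing is consumed twice, adjacent statements that both mention the pin are each returned.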
Ed
Posts: 103
Joined: Tue Mar 04, 2003 9:09 am
Location: Devon, UK

Post by Ed »

It's a bit of a long shot but you might alter the last * to a +
Don't forget that there should also be a \ before the .
I tried the mark with a file over 300000 lines long without a problem. How big is your file?

Is it really the mark that's a problem?
I believe there may be a problem with line length when searching, but by the time you get to marking lines it's all been broken up into lines
Lostgallifreyan
Posts: 25
Joined: Mon Feb 14, 2005 10:51 am

Post by Lostgallifreyan »

Line length is the problem, as you said. I saw that stack overflow warning when working on the problem I posted about in another thread (80555 lines of unindexed forum archive, thread here about stable sorting).

The line count wasn't a problem with regexp macro preprocessing, but the largest posts were, each post was on one line for sorting, arranged with the same method described in this thread. I ended up finding workrounds that didn't raise that warning, basically by making sure that match tests were satisfied without searching too deeply.