Is this a reg exp bug or is it just me?

Ed
Posts: 103
Joined: Tue Mar 04, 2003 9:09 am
Location: Devon, UK

Post by Ed »

Search for (regexp non-Posix) ^.*: *\|\\n"
Replace All with nothing

Before:
"PcTestData:\n"
"PcAnswerPosX: 4027727.93728731\n"

After:
\n"
4027727.93728731

I was expecting:

4027727.93728731
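
For comparison only, here is a minimal sketch of the same search run through Python's `re` module. This assumes Python's engine as a stand-in and says nothing definite about how TextPad's own engine behaves. Python writes alternation as `|` where the non-POSIX TextPad syntax uses `\|`, and the variable names are just for illustration.

```python
import re

# Ed's two sample lines; "\\n" here is a literal backslash followed by n,
# and the "\n" in the middle is the real line break between them.
before = '"PcTestData:\\n"\n"PcAnswerPosX: 4027727.93728731\\n"'

# Same expression as in the post, with | instead of TextPad's non-POSIX \|.
# MULTILINE makes ^ match at the start of every line.
pattern = re.compile(r'^.*: *|\\n"', re.MULTILINE)

# "Replace All with nothing"
after = pattern.sub('', before)
print(repr(after))  # '\n4027727.93728731'
```

Under these assumptions the result is an empty first line followed by the number, which is the output Ed was expecting.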
s_reynisson
Posts: 940
Joined: Tue May 06, 2003 1:59 pm

Post by s_reynisson »

Can you use something like this? (w/POSIX)

^[^.0-9]*|\\n"

HTH
Then I open up and see
the person fumbling here is me
a different way to be
Bob Hansen
Posts: 1517
Joined: Sun Mar 02, 2003 8:15 pm
Location: Salem, NH

Post by Bob Hansen »

^[^.0-9]*|\\n"
did not work for me.

Try this:
Search for:
^.*: ([0-9]+\.[0-9]+)\\n"

Replace with:
\1


Explanation of Search RegEx:
^......................Start from beginning of line
.* ....................Any number of characters
: ..................... A colon followed by a space (there is a space after this colon)
( ......................Beginning of first tagged expression
[0-9]+ .............One or more digits
\. .....................A literal period (decimal point); an unescaped . would match any character
[0-9]+ ..............One or more digits
) .......................End of first tagged expression
\ ......................Treat next character as normal character
\n" ...................Specific string of characters

Explanation of Replace RegEx:
\1 ..................Contents of first tagged expression
======================================
This is assuming that the double quotes are really on the lines.
This does not eliminate the first line.

This will convert from:
"PcTestData:\n"
"PcAnswerPosX: 4027727.93728731\n"

To:
"PcTestData:\n"
4027727.93728731
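
A rough equivalent of this tagged-expression replace in Python's `re` module, offered as a sketch rather than a statement about TextPad's engine. The decimal point is written as `\.` so that it matches only a literal period, and the variable names are illustrative.

```python
import re

# The two sample lines; "\\n" is a literal backslash followed by n.
before = '"PcTestData:\\n"\n"PcAnswerPosX: 4027727.93728731\\n"'

# The parentheses tag the number so the replacement can refer to it as \1.
pattern = re.compile(r'^.*: ([0-9]+\.[0-9]+)\\n"', re.MULTILINE)

# Replace each matching line with the contents of the first tagged expression.
after = pattern.sub(r'\1', before)
print(after)
```

The first line has no ": " followed by digits, so it is left untouched, and only the second line is reduced to the number.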
Hope this was helpful.............good luck,
Bob
s_reynisson
Posts: 940
Joined: Tue May 06, 2003 1:59 pm

Post by s_reynisson »

Hmm, I thought Ed wanted the result:
I was expecting:

4027727.93728731
Including a single empty line before the line with the number.
Double checked, ^[^.0-9]*|\\n" works for that.
Anyway, covering both solutions can't hurt 8)
Bob Hansen
Posts: 1517
Joined: Sun Mar 02, 2003 8:15 pm
Location: Salem, NH

Post by Bob Hansen »

Hmmm back to s_reynisson.

It looks like I was wrong. :oops:

Your Regex just worked for me. It appears that I made an operational error in my testing.

I had done a Find using your RegEx and that did not work. I frequently have done Finds in the past so I would not do a Replace by mistake. But I just now did a Search/Replace for the RegEx, and left the Replace field blank, clicked on Replace All, and it did work as you showed, with the blank line above the number string.

:idea: Gotta go back and understand how TextPad is doing that, replacing something different than what shows up as Find. I may have to relearn the way I have been using RegEx with TextPad.

Thanks for your solution.
s_reynisson
Posts: 940
Joined: Tue May 06, 2003 1:59 pm

Post by s_reynisson »

Ok, I must be getting tired :shock:

"PcTestData:\n" 
"PcAnswerPosX: 4027727.93728731\n" 

With the two lines above as test input, TP simply stops on line 3 and does not report
"can not find regular expression...." :?:
Same if you put an empty line 1, it stops on that too.
Edit: tested a little more and

^.*|\\n" fails
^.+|\\n" works
So the solution ^[^.0-9]+|\\n" is ok if there are empty lines.
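
One possible reading of the `*` versus `+` difference, sketched in Python under the assumption that each line is matched on its own. This is not TextPad's engine, and `match_lengths` is a hypothetical helper. With `*` the expression can succeed on an empty line with a zero-length match, while `+` needs at least one character and so skips empty lines altogether.

```python
import re

# Test lines with an empty line in the middle; "\\n" is a literal backslash + n.
lines = ['"PcTestData:\\n"', '', '"PcAnswerPosX: 4027727.93728731\\n"']

star = re.compile(r'^[^.0-9]*')  # zero or more non-digit, non-period characters
plus = re.compile(r'^[^.0-9]+')  # one or more of the same

def match_lengths(pattern, lines):
    """Return the match length per line, or None where the pattern fails."""
    lengths = []
    for line in lines:
        m = pattern.match(line)
        lengths.append(None if m is None else len(m.group(0)))
    return lengths

star_lengths = match_lengths(star, lines)
plus_lengths = match_lengths(plus, lines)
print(star_lengths)  # the empty line gives a zero-length match
print(plus_lengths)  # the empty line gives no match at all
```

A zero-length match leaves nothing to highlight or replace, which may be why a search using `*` appears to stall on an empty line.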
Bob Hansen
Posts: 1517
Joined: Sun Mar 02, 2003 8:15 pm
Location: Salem, NH

Post by Bob Hansen »

Did some more testing on ^[^.0-9]*|\\n", seeing a number of strange (to me) things. Using POSIX, Conditions are text, and regex. Scope is active document. All tests are done from the top of the document.

Using Search, Replace dialog box:

1. If I have multiple lines in the document with some blank lines in between, doing "Replace All" works fine: all characters that are not digits/decimals are eliminated. (Brackets "[]", parentheses "()", operators "+-*\^", pipes "|", and colons ":" are all treated as digits/decimals; they are not eliminated.)

2. If I do Find Next, it will highlight the correct strings until it comes to a blank line. It stops looking at that point. If I move the cursor down to another line with text, then FindNext continues correctly until it finds a blank line and stops again.

3. If I do Replace Next, the first instance is replaced. But it does not find any other instances. This is probably related to #2 above, because the line it is on after the first replace is a blank line.

4. Using Find dialog box, selecting Mark All marks every line. Empty lines, all text, all digits, mixed text/digits.

I did not expect to see the extra "digit/decimal" characters retained.
I did not expect to see the Find Next process stop at a blank line; I thought it would continue to the end of the document.
I did not expect to see every line bookmarked with Mark All; it must be because of the invisible \n.

I suppose these results may be normal, but thought I would detail them for others to also understand this behavior. Can someone please provide an explanation for me?
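
A possible explanation for observations 1 and 4, sketched with Python's `re` module applied line by line. This is an assumption about how the expression reads, not a description of TextPad's internals, and `strip_line` and the sample strings are made up for illustration. Because of the leading `^`, the first alternative can only remove the run of non-digit characters at the very start of a line, so anything after the first digit or period is out of its reach. Because of the `*`, it also matches every line, if only with zero length.

```python
import re

# The POSIX-style expression from the thread, applied to one line at a time.
pattern = re.compile(r'^[^.0-9]*|\\n"')

def strip_line(line):
    """Hypothetical helper: 'Replace All with nothing' on a single line."""
    return pattern.sub('', line)

samples = [
    '"PcAnswerPosX: 4027727.93728731\\n"',  # line from the thread
    'abc: 12 (x+y) [3] | 4.5',              # symbols after the first digit
    '12 abc',                               # line that starts with a digit
    '',                                     # empty line
]

results = [strip_line(s) for s in samples]
for original, stripped in zip(samples, results):
    print(repr(original), '->', repr(stripped))

# Every sample matches, even the empty one, because * allows a zero-length match.
every_line_matches = all(pattern.match(s) is not None for s in samples)
print(every_line_matches)
```

In this sketch the brackets, parentheses, operators and pipes survive simply because they come after the first digit, not because they are treated as digits.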
======================
Editing note after posting: I see that s_reynisson also saw some of these anomalies. He snuck that posting in ahead of me while I was working on this. He also modified the code to continue past blank lines.......atta boy!

Looks like it may be better to use "+" for one or more rather than "*" for any quantity. Only in this instance or similar instances, or as a general rule in TextPad? If also in similar, can guidelines be provided to define "similar instances". What are the tradeoffs of "+" vs. "*" ?
Ed
Posts: 103
Joined: Tue Mar 04, 2003 9:09 am
Location: Devon, UK

Post by Ed »

Well, I came in this morning to find a flurry of responses. Thank you so much. I should point out that I don't actually need a solution to the regexp, but thanks anyway. But can someone tell me whether my original regexp is wrong or whether there is a bug in the TP regexp code? From Bob's research I think there are some anomalies. Perhaps Helios will take note. Having said that, it's hard to find fault with TP, especially now that Helios have been more actively dealing with problems.
Bob Hansen
Posts: 1517
Joined: Sun Mar 02, 2003 8:15 pm
Location: Salem, NH

Post by Bob Hansen »

Hi Ed. This is a fair question:
But can someone tell me either is my original regexp wrong or is there a bug in the TP regexp code?
At one point, I thought I had it figured out. Can't go into details right now, but now I also wonder if there may be a bug here. Will try to do some more testing, don't want to cry "Wolf!" Although I think you were provided with a solution, your question is still valid: on the surface, the Regex looks like it should work.

I have just spent hours testing various modifications of the RegEx you were asking about. I have been working in the area of the "non ordinary" characters "\" and "n" and the timing associated with those replacements.

I have come up with a number of combinations that step through properly giving correct results with Replace Next, but when doing Replace All, the "\" is still there. The best I can do still ends up with a "\" on the first line.

One quick observation: At times, when I add a real backslash "\", I may end up with a message that "missing ( or )" is the reason for failure. But the number of matching ( and ) is correct. And that is using \( and \) for non-Posix as you indicated in your first message.

:idea: I will try to do some more testing using Posix instead to see if both behave the same.