
Find string within range

Posted: Tue Mar 25, 2014 10:31 pm
by gcotterl
To find rows with '684556' between positions 718 and 1198, I'm using this regular expression:

^.{717,1198}684556

But a dialog box is displayed saying "The complexity of matching the regular expression exceeded the predefined bounds. Try refactoring the regular expression to make each choice made by the state machine unambiguous. This exception is thrown to prevent "eternal" matches that take an indefinite period time to locate."

What would be a better expression?
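A note on the bounds, assuming 718 and 1198 are meant as the first and last allowed starting columns of '684556' (the post doesn't say whether the whole string must also end by column 1198). A match that starts at 1-based column p is preceded by exactly p - 1 characters, so the quantifier would be {717,1197}. The pattern {717,1198} also accepts a start at column 1199. The Python sketch below uses Python's re module, not TextPad's engine, and the helper names are hypothetical. It builds the pattern from the column range and checks both versions against synthetic 1,274-character lines.

```python
import re

def column_range_pattern(needle: str, first: int, last: int) -> re.Pattern:
    """Hypothetical helper: match lines where `needle` starts at a 1-based
    column between `first` and `last` inclusive."""
    # A needle starting at column p is preceded by exactly p - 1 characters.
    return re.compile(rf"^.{{{first - 1},{last - 1}}}{re.escape(needle)}")

def line_with_needle_at(needle: str, column: int, width: int = 1274) -> str:
    """Hypothetical test data: one copy of `needle` at `column`, padded with '0'."""
    return ("0" * (column - 1) + needle).ljust(width, "0")

needle = "684556"
fixed = column_range_pattern(needle, 718, 1198)
original = re.compile(r"^.{717,1198}684556")

print(fixed.pattern)  # ^.{717,1197}684556
for column in (717, 718, 1198, 1199):
    line = line_with_needle_at(needle, column)
    # Report whether each pattern accepts a needle starting at this column.
    print(column, bool(fixed.search(line)), bool(original.search(line)))
```

Columns 718 and 1198 match under both patterns, column 717 matches neither, and column 1199 matches only the original {717,1198} version.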

Posted: Tue Mar 25, 2014 11:19 pm
by ben_josephs
I can't reproduce this.

Is that the exact entire regex?
Please describe the target text (but don't post huge quantities of it).

Posted: Tue Mar 25, 2014 11:32 pm
by gcotterl
Except for the string and the range, it has the same syntax as the regex in your previous reply; see:

http://forums.textpad.com/viewtopic.php ... ght=068345

In my file, each row has 1,274 characters; in some of the rows, the string is located within the bounds specified.
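Because every row has a fixed length, the same check can be done without a regex at all. This is a sketch that assumes the file can be processed outside TextPad with Python and, again, that 718 and 1198 are the allowed starting columns. Python's str.find(sub, start, end) reports only an occurrence lying entirely inside line[start:end], so there is no quantifier to backtrack over. The function name rows_with_needle is hypothetical.

```python
from typing import Iterable, Iterator, Tuple

def rows_with_needle(lines: Iterable[str], needle: str,
                     first: int, last: int) -> Iterator[Tuple[int, str]]:
    """Yield (1-based line number, line) for each line in which `needle`
    starts at a 1-based column between `first` and `last` inclusive."""
    start = first - 1                # 0-based index of column `first`
    end = last - 1 + len(needle)     # a needle starting at column `last` must still fit
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Search only line[start:end]: a plain substring scan, no backtracking.
        if line.find(needle, start, end) != -1:
            yield lineno, line
```

Passing an open text file as `lines` streams it one row at a time, so a large file is never loaded into memory at once. Counting the yielded pairs gives the number of matching rows.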

Posted: Wed Mar 26, 2014 8:04 am
by ben_josephs
Please answer my questions explicitly.

Is ^.{717,1198}684556 the exact and entire regex you're using?
Please provide one line that elicits the problem.

I have tried many things, but I can't make this search fail or even just run slowly.

Are you using TextPad 7.2.0?
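The claim that this search should not run slowly can be illustrated under one assumption: that the engine is a conventional backtracking matcher. The thread doesn't say how TextPad's engine is built. With ^ anchoring each attempt at the start of a line, a greedy .{717,1198} first takes 1,198 characters. It then gives them back one at a time and tries 684556 after each step, so a 1,274-character line with no match costs 1198 - 717 + 1 = 482 attempts. The function greedy_backtrack below is a hypothetical emulation of that process, cross-checked against Python's re module. It is not TextPad's code.

```python
import re
from typing import Optional, Tuple

def greedy_backtrack(line: str, lo: int, hi: int,
                     literal: str) -> Tuple[Optional[int], int]:
    """Emulate a backtracking engine on ^.{lo,hi}literal for one line
    (no newline inside `line`). Returns (characters taken by .{lo,hi} in the
    match, or None if there is no match; number of positions tried)."""
    attempts = 0
    # Greedy quantifier: start with as many characters as allowed, then back off.
    for k in range(min(hi, len(line)), lo - 1, -1):
        attempts += 1
        if line.startswith(literal, k):
            return k, attempts
    return None, attempts

no_match = "0" * 1274
print(greedy_backtrack(no_match, 717, 1198, "684556"))  # (None, 482)

# Two occurrences in range: greedy matching settles on the later one.
two = (("0" * 800 + "684556").ljust(1000, "0") + "684556").ljust(1274, "0")
k, tries = greedy_backtrack(two, 717, 1198, "684556")
m = re.match(r"^(.{717,1198})684556", two)
print(k, tries, m.end(1))  # 1000 199 1000
```

Each attempt compares at most six characters, so the work per row stays bounded and grows only linearly with the number of rows.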

Posted: Wed Mar 26, 2014 4:15 pm
by gcotterl
I'm using TextPad 7.2.0 (32-bit edition).
The exact and entire regex I'm using is: ^.{717,1198}684556
The "Regular Expression" box is ticked.
My file contains 985,945 rows and each row contains 1,274 characters.
The search string ('684556') exists in 175,296 rows within the specified range.
Here is one line:

00961805522013UNIT 1 CM 181/073 INT. IN COMM IN LOT 1-P OF TR 33576 MB 407/001 Y01105201249560004210COPPER CANYON RD PALM SPRINGS 000092262LND000090270STR000206550 000000000 000000000 000000000 000000000 000000000 000000000 000000000 000000000 000000000010000000002968200351280000003847103900100000005921045121000000296826826170000001680068263700000033946684556000000003320000009999999999900000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000210986000000000000002109860000000000000UNPAID UNPAID 00000905200000000000000000000000000UNPAID

However, TextPad finds thousands of lines containing the string before the "The complexity..." dialog box is displayed.

Posted: Wed Mar 26, 2014 6:30 pm
by ben_josephs
I'm still unable to reproduce your problem. I even got TextPad to mark 200 000 long lines matching your regex. It took 9 seconds.

Perhaps the issue only arises with the 32-bit version of TextPad. I'm using the 64-bit version.

Posted: Wed Mar 26, 2014 6:51 pm
by gcotterl
I just tried the regex again.

This time, when I clicked "Mark All", TextPad marked 22 rows in the first 13,822 rows before the "The complexity..." dialog box was displayed.

All of the rows look the same (except for the actual data) and no "weird" characters exist.

Posted: Wed Mar 26, 2014 6:55 pm
by gcotterl
When I placed the cursor on the row below the 22nd marked row and pressed "Find Next", the "The complexity..." dialog box was displayed immediately.

Posted: Wed Mar 26, 2014 8:23 pm
by ben_josephs
How many rows below the 22nd marked row is the first row that the regex should match?

If it's not too many, post all those lines from, but not including, the 22nd marked row up to and including the next one that the regex should match.

Enclose them in a [code]...[/code] block.

Posted: Wed Mar 26, 2014 9:25 pm
by gcotterl
About 603,000 rows are between the 22nd marked row and the next row that contains the string.

Posted: Thu Mar 27, 2014 12:01 pm
by ben_josephs
I still can't reproduce your problem.
Which line is the cursor on when the error message is displayed?
Please post that line and (if it's a different line) the next line on which the regex should match. Enclose them in [code]...[/code] blocks.
As I suggested earlier, the issue might arise only with the 32-bit version of TextPad.

Posted: Thu Mar 27, 2014 3:39 pm
by gcotterl
The cursor is on line 13810 (the first line that contains the search string).

Here are lines 13810 and 13811 (both containing the search string):

[code]00961805522013UNIT 1 CM 181/073 INT. IN COMM IN LOT 1-P OF TR 33576 MB 407/001 Y01105201249560004210COPPER CANYON RD PALM SPRINGS 000092262LND000090270STR000206550 000000000 000000000 000000000 000000000 000000000 000000000 000000000 000000000 000000000010000000002968200351280000003847103900100000005921045121000000296826826170000001680068263700000033946684556000000003320000009999999999900000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000210986000000000000002109860000000000000UNPAID UNPAID 00000905200000000000000000000000000UNPAID
00961805632013UNIT 2 CM 181/073 INT. IN COMM IN LOT 1-P OF TR 33576 MB 407/001 Y01105201249560004230COPPER CANYON RD PALM SPRINGS 000092262LND000094000STR000159000 000000000 000000000 000000000 000000000 000000000 000000000 000000000 000000000 000000000010000000002530000351280000003279103900100000005047045121000000253006826170000001680068263700000033946684556000000003320000009999999999900000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000183608000000000000001836080000000000000UNPAID UNPAID 00000905300000000000000000000000000UNPAID
[/code]

Posted: Thu Mar 27, 2014 10:16 pm
by ben_josephs
... and deselect "Disable BBCode in this post".

Posted: Thu Mar 27, 2014 10:48 pm
by gcotterl
Here you go:

The cursor is on line 13810 (the first line that contains the search string).

Here are lines 13810 and 13811 (both containing the search string):

Code:

00961805522013UNIT 1 CM 181/073 INT. IN COMM IN LOT 1-P OF TR 33576 MB 407/001 Y01105201249560004210COPPER CANYON RD PALM SPRINGS 000092262LND000090270STR000206550 000000000 000000000 000000000 000000000 000000000 000000000 000000000 000000000 000000000010000000002968200351280000003847103900100000005921045121000000296826826170000001680068263700000033946684556000000003320000009999999999900000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000210986000000000000002109860000000000000UNPAID UNPAID 00000905200000000000000000000000000UNPAID 
00961805632013UNIT 2 CM 181/073 INT. IN COMM IN LOT 1-P OF TR 33576 MB 407/001 Y01105201249560004230COPPER CANYON RD PALM SPRINGS 000092262LND000094000STR000159000 000000000 000000000 000000000 000000000 000000000 000000000 000000000 000000000 000000000010000000002530000351280000003279103900100000005047045121000000253006826170000001680068263700000033946684556000000003320000009999999999900000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000183608000000000000001836080000000000000UNPAID UNPAID 00000905300000000000000000000000000UNPAID 
 

Posted: Fri Mar 28, 2014 10:38 am
by ben_josephs
Sorry, I'm still unable to get 64-bit TextPad to fail in this way. I have no more ideas.