RegEx has strange result

General questions about using TextPad


obfusc88
Posts: 8
Joined: Mon Dec 18, 2006 8:50 pm


Post by obfusc88

I am using the Search/Replace strings below. I thought they were working OK, but found some random lines that were not being picked up. Here are four groups taken from two lines of the source file.


<span class="FirstColumnPrompt">District Code:</span></td><td><span class="Value">04</span>

<span class="FirstColumnPrompt">Document Type:</span></td><td><span class="Value">SUMMONS</span>

<span class="Prompt">Location Code:</span><span class="Value">02</span>

<span class="Prompt">Issued Date:</span><span class="Value">01/15/2010</span>

Function: strip out the prompt text and the value, and write each pair on its own line as:
Prompt: Value   (e.g. District Code: 04)

Search for: <span class=.[^>]*Prompt.[^>]*>(.[^:]*).[^"]*"Value">(.[^<])*</span>
Replace with: \1: \2\n
-----------------------------------------------

The basic problem is capturing group 2 (the Document Type line).
Stepping through with Find Next/Replace Next finds groups 1, 3 and 4, but never group 2.

The original file had lines 1 and 2 combined, and lines 3 and 4 combined. For testing I split the groups into four separate lines, each starting with <span...; the results were no different, it is just easier to read.

Here are some observations that make no sense to me:

Add a digit anywhere in Value, and group 2 is found (e.g. SUM7MONS or 3SUMMONS).
Remove any single letter from Value, and group 2 is found (e.g. UMMONS, SMMONS or SUMMNS).
Remove two or more letters from Value, and group 2 is not found.
Remove all letters from Value, and group 2 is found.
Replace SUMMONS with FRED, and group 2 is found.
Replace SUMMONS with FREDDIE, and group 2 is not found.
Add a digit anywhere in FREDDIE, and group 2 is found.
Add a letter to FREDDIE, and group 2 is found.
Add a letter to SUMMONS, and group 2 is found.
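These observations follow a single rule. Because the star sits outside the parentheses, `(.[^<])*` repeats a two-character unit, so the pattern can only match when Value has an even number of characters: SUMMONS and FREDDIE have 7, while FRED has 4 and an empty Value has 0. The sketch below checks this with Python's `re` module as a stand-in for TextPad's engine. That substitution is an assumption: the pattern uses only basic syntax, so I would expect the same behaviour, but I have not verified it in TextPad. The names `LINE2` and `matches` are illustrative only.

```python
import re

# The "Search for" pattern exactly as posted. The star is OUTSIDE group 2,
# so (.[^<])* repeats a two-character unit: any character, then a non-"<".
ORIGINAL = r'<span class=.[^>]*Prompt.[^>]*>(.[^:]*).[^"]*"Value">(.[^<])*</span>'

# Sample line 2 from the post, with the Value left as a placeholder
# (LINE2 is a hypothetical helper, not part of TextPad).
LINE2 = ('<span class="FirstColumnPrompt">Document Type:</span></td><td>'
         '<span class="Value">{}</span>')


def matches(value: str) -> bool:
    """True if the original pattern matches line 2 with this Value."""
    return re.search(ORIGINAL, LINE2.format(value)) is not None


# Values from the observations: only the even-length ones match.
for value in ["SUMMONS", "SUM7MONS", "SMMONS", "SUMNS", "", "FRED", "FREDDIE", "FREDDIEX"]:
    print(f"{value!r:12} length={len(value)} matched={matches(value)}")

# A repeated group keeps only its LAST iteration, so even on a line that
# does match, group 2 holds just the final two characters of the Value.
date_line = '<span class="Prompt">Issued Date:</span><span class="Value">01/15/2010</span>'
print(re.search(ORIGINAL, date_line).group(2))
```

In Python the last line prints `10`. On the lines that did match, `\2` would therefore have been only the final pair of characters (harmless for `04` and `02`, but not for `01/15/2010`), assuming TextPad also keeps the last iteration of a repeated group.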

Is this a RegEx bug or, more likely, a syntax error?
ben_josephs
Posts: 2459
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs

The star near the end of your regex is in the wrong place. Try
<span class=.[^>]*Prompt.[^>]*>(.[^:]*).[^"]*"Value">(.[^<]*)</span>
or, without the unnecessary dots,
<span class=[^>]*Prompt[^>]*>([^:]*)[^"]*"Value">([^<]*)</span>
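With the star inside the group, `([^<]*)` captures the whole Value regardless of its length. The sketch below applies the dot-free pattern and the original `\1: \2\n` replacement to the four sample lines, again using Python's `re.sub` as an assumed stand-in for TextPad's Replace All. `SAMPLE` and `FIXED` are illustrative names.

```python
import re

# The dot-free pattern from the reply: the star is now inside group 2,
# so ([^<]*) captures the entire Value text.
FIXED = r'<span class=[^>]*Prompt[^>]*>([^:]*)[^"]*"Value">([^<]*)</span>'

# The four sample lines from the first post (hypothetical test input).
SAMPLE = """<span class="FirstColumnPrompt">District Code:</span></td><td><span class="Value">04</span>
<span class="FirstColumnPrompt">Document Type:</span></td><td><span class="Value">SUMMONS</span>
<span class="Prompt">Location Code:</span><span class="Value">02</span>
<span class="Prompt">Issued Date:</span><span class="Value">01/15/2010</span>"""

# Mirrors the post's "Replace with" string; re.sub expands \1, \2 and \n.
result = re.sub(FIXED, r"\1: \2\n", SAMPLE)
print(result)
```

Because the sample's own line breaks are kept, this sketch prints a blank line between entries. On the original file, where two groups share one line, the `\n` in the replacement is what puts each pair on its own line.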
obfusc88
Posts: 8
Joined: Mon Dec 18, 2006 8:50 pm

Post by obfusc88

I knew it was me. What a dummy.

Thank you, that worked perfectly.

That was my first attempt at finding the first occurrence of a string that repeats a few times. I was feeling pretty good until I couldn't get it completely right. Thank you again.