RegEx has strange result
Posted: Thu Feb 11, 2010 3:15 am
I am using the following Search/Replace strings. I thought they were working OK, but I found some seemingly random lines that were not being picked up. I have provided four groups in two lines from the source file.
Function: Strip out the Class name and the value and combine on separate lines:
ClassName: Value
Search for: <span class=.[^>]*Prompt.[^>]*>(.[^:]*).[^"]*"Value">(.[^<])*</span>
Replace with: \1: \2\n
-----------------------------------------------
Basic problem is trying to capture line 2.
Stepping through with Find Next/Replace Next finds groups 1, 3 and 4. It does not find group 2.
Original file had lines 1,2 combined, and lines 3,4 combined. In testing I have split the groups into four separate lines starting with <span... no different results, just easier to read.
Here are some of my observations, that make no sense to me.
Add a digit anywhere in Value and group 2 is found, e.g. SUM7MONS or 3SUMMONS
Remove any single letter from Value and group 2 is found, e.g. UMMONS or SMMONS or SUMMNS
Remove two or more letters from Value and group 2 is not found.
Remove all letters from Value and group 2 is found.
Replace SUMMONS with FRED and group 2 is found.
Replace SUMMONS with FREDDIE and group 2 is not found.
Add a digit anywhere in FREDDIE and group 2 is found.
Add a letter to FREDDIE and group 2 is found.
Add a letter to SUMMONS and group 2 is found.
Is this a RegEx bug or, more likely, a syntax error?
Code: Select all
<span class="FirstColumnPrompt">District Code:</span></td><td><span class="Value">04</span>
<span class="FirstColumnPrompt">Document Type:</span></td><td><span class="Value">SUMMONS</span>
<span class="Prompt">Location Code:</span><span class="Value">02</span>
<span class="Prompt">Issued Date:</span><span class="Value">01/15/2010</span>
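
The observations above are consistent with a length-parity effect rather than an engine bug: in the posted search string the star sits outside the second group, so `(.[^<])*` repeats a two-character unit and can only consume a Value with an even number of characters. SUMMONS and FREDDIE have 7; FRED, UMMONS and SUM7MONS have 4, 6 and 8. Below is a minimal sketch that reproduces this on the four sample lines. It uses Python's `re` module as a stand-in for the editor's engine, which is an assumption; the `REVISED` pattern and the helper names are hypothetical and not from the original post.

```python
import re

# The four sample lines from the post, one prompt/value pair per line.
SAMPLE = [
    '<span class="FirstColumnPrompt">District Code:</span></td><td><span class="Value">04</span>',
    '<span class="FirstColumnPrompt">Document Type:</span></td><td><span class="Value">SUMMONS</span>',
    '<span class="Prompt">Location Code:</span><span class="Value">02</span>',
    '<span class="Prompt">Issued Date:</span><span class="Value">01/15/2010</span>',
]

# Shared front part of the search string, copied from the post.
PREFIX = r'<span class=.[^>]*Prompt.[^>]*>(.[^:]*).[^"]*"Value">'

# Search string as posted: the star is outside the group, so the
# two-character unit ".[^<]" repeats and only even-length values fit.
ORIGINAL = PREFIX + r'(.[^<])*</span>'

# Hypothetical revision (an assumption, not from the post): the star is
# inside the group, so the group captures the whole value.
REVISED = PREFIX + r'([^<]*)</span>'


def matched_lines(pattern, lines):
    """Return the 1-based numbers of the lines that the pattern matches."""
    rx = re.compile(pattern)
    return [n for n, line in enumerate(lines, start=1) if rx.search(line)]


def value_matches(pattern, value):
    """Check whether a sample-style line carrying this value is matched."""
    line = ('<span class="Prompt">Document Type:</span>'
            '<span class="Value">' + value + '</span>')
    return re.search(pattern, line) is not None


def convert(pattern, lines):
    """Build the 'ClassName: Value' text for every line that matches."""
    rx = re.compile(pattern)
    out = []
    for line in lines:
        m = rx.search(line)
        if m:
            out.append(m.expand(r'\1: \2'))
    return out


if __name__ == "__main__":
    print(matched_lines(ORIGINAL, SAMPLE))  # [1, 3, 4]
    print(matched_lines(REVISED, SAMPLE))   # [1, 2, 3, 4]
    for value in ("SUMMONS", "SUM7MONS", "UMMONS", "FRED", "FREDDIE", ""):
        print(repr(value), len(value), value_matches(ORIGINAL, value))
    print(convert(REVISED, SAMPLE))
```

In this sketch the posted pattern has a second problem even where it does match: a repeated group keeps only its last repetition, so `\2` for 01/15/2010 comes out as just `10`. Whether the editor behaves the same way for `\2` would need to be checked there.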