Regular Expression for multiple lines
I am trying to write a regular expression that will replace or delete multiple lines that always start and end with a fixed sequence of characters. For example:
START OF DATA
.
.
.
END OF DATA
The problem I have is that . does not match line terminators. Also, I can't have a class [.\n]. Any suggestions?
I checked the forum without any luck. Thanks.
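For comparison, here is how the same limitation looks in Python's `re` module. This is only an illustration under that assumption; TextPad's engine has its own syntax and may not support these constructs. By default `.` stops at a newline, while the `re.DOTALL` flag or the class `[\s\S]` lets a pattern span lines. Inside a character class `.` is a literal dot, which is why `[.\n]` does not mean "any character or newline". The sample text is made up.

```python
import re

text = "keep 1\nSTART OF DATA\na\nb\nEND OF DATA\nkeep 2\n"

# By default "." does not match "\n", so the multi-line block is not found.
no_flag = re.search(r"START OF DATA.*END OF DATA", text)

# With DOTALL, "." also matches newlines; ".*?" stops at the first END marker.
block = re.compile(r"START OF DATA.*?END OF DATA\n", re.DOTALL)
with_dotall = block.sub("", text)

# [\s\S] (whitespace or non-whitespace) matches any character without a flag.
alt = re.compile(r"START OF DATA[\s\S]*?END OF DATA\n")
with_class = alt.sub("", text)

print(no_flag)
print(repr(with_dotall))
print(repr(with_class))
```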
- Bob Hansen
- Posts: 1516
- Joined: Sun Mar 02, 2003 8:15 pm
- Location: Salem, NH
- Contact:
OK, I eliminated the new lines. I am left with a regular expression like this:
START.+END
where "START" and "END" are the fixed sequence of characters and any characters can be in between. I tested this in another regular expression processor (Java) and it works - but not in TextPad.
Any more ideas?
Thanks.
- Bob Hansen
- Posts: 1516
- Joined: Sun Mar 02, 2003 8:15 pm
- Location: Salem, NH
- Contact:
If you have other lines in the document besides the START OF DATA and END OF DATA groups, then be sure to only replace "\n" on the selected text for those lines.
If you replaced all \n with a unique code then you should have one long string that now replaces those blocks. Now search for "START.*" and replace with nothing.
Hope this was helpful.............good luck,
Bob
Hi Bob,
I need to specify a start and an end, e.g. "START.*END" and delete everything in-between as well as the start and end. Unfortunately, the file is very large and the pattern occurs many times. I can always do this manually. But I am sure the problem will come up again and I am looking for a simple general solution.
In addition "START.*END" does not work even if the start and end are on a single line. It seems to me that the regular expression processor is deficient - but, admittedly, I am not an expert in such matters.
Thanks.
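One thing to watch when the pattern occurs many times: once everything sits on a single line, a greedy `.*` runs from the first START to the last END and takes the wanted text in between with it. A small sketch in Python's `re` (used only as an illustration, with a made-up sample string) shows the difference from a non-greedy `.*?`. Whether a given TextPad version offers a non-greedy quantifier is not something this thread establishes.

```python
import re

line = "a START x END b START y END c"

# Greedy: matches from the first START through the last END.
greedy = re.sub(r"START.*END", "", line)

# Non-greedy: each START...END pair is matched and removed separately.
lazy = re.sub(r"START.*?END", "", line)

print(repr(greedy))
print(repr(lazy))
```

In the greedy result the text " b " between the two blocks is lost; in the non-greedy result it survives.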
- talleyrand
- Posts: 624
- Joined: Mon Jul 21, 2003 6:56 pm
- Location: Kansas City, MO, USA
- Contact:
How 'bout just a simple program to handle it?
Install Python and you're good to go.
Copy this code and paste it into TextPad.
Save it as something like C:\alg.py
Update the filename variables and your marker criteria. If you need a more complex search, say a regular expression, it wouldn't be hard to implement.
Then run it from a command prompt:
C:\>python alg.py
You can also run it straight out of TextPad. Set the command to wherever you installed Python (probably C:\python23\python.exe)
Code:
def processFile(fileName, outFileName, beginMarker, endMarker):
    """
    Copy fileName to outFileName, skipping every block of lines that
    runs from a beginMarker line through the next endMarker line.
    """
    ignore = False
    out = open(outFileName, 'w')
    for currentLine in open(fileName, 'r').readlines():
        # if the marker is not in a consistent case, uncomment the following
        # currentLine = currentLine.lower()
        print(currentLine[:-1])  # echo each line as it is read
        if (currentLine.find(beginMarker) >= 0):
            # start of a block: stop writing lines
            ignore = True
        if (not ignore):
            # write line
            out.write(currentLine)
        if (currentLine.find(endMarker) >= 0):
            # end of a block: resume writing with the next line
            ignore = False
    # clean up
    out.close()

def main():
    fileName = r'c:\bfellows\alg_test.txt'
    outFileName = r'c:\bfellows\alg_out.txt'
    beginMarker = 'START OF DATA'
    endMarker = 'END OF DATA'
    processFile(fileName, outFileName, beginMarker, endMarker)

if __name__ == '__main__':
    main()
I choose to fight with a sack of angry cats.
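As a follow-up to the remark that a regular-expression search "wouldn't be hard to implement", here is one possible sketch of that variant. It is not the script above: the function name `strip_blocks` and the sample data are hypothetical, it works on a list of lines instead of files, and it assumes a current Python 3.

```python
import re

def strip_blocks(lines, begin_pattern, end_pattern):
    """Return the lines that fall outside begin/end marker blocks.

    A line matching begin_pattern opens a block, a line matching
    end_pattern closes it, and both marker lines are dropped too.
    """
    begin = re.compile(begin_pattern)
    end = re.compile(end_pattern)
    kept = []
    ignore = False
    for line in lines:
        if not ignore and begin.search(line):
            ignore = True
        if not ignore:
            kept.append(line)
        elif end.search(line):
            # The end marker line itself is skipped; copying resumes after it.
            ignore = False
    return kept

sample = ["keep 1\n", "Start of Data\n", "x\n", "END OF DATA\n", "keep 2\n"]
# (?i) makes the markers case-insensitive; ^ anchors them to the line start.
result = strip_blocks(sample, r"(?i)^start of data", r"(?i)^end of data")
print(result)
```

Because the work is done line by line, the "`.` does not match a newline" problem never comes up.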
I reported a problem with 4.6.2 and regexp to replace text across multiple lines: http://textpad.com/forum/viewtopic.php? ... highlight=
... could be related .. ?
When someone suggested that I write a simple program, I came to the conclusion - perhaps inaccurately - that there isn't an interest in nailing down this problem.
My impression is that this is a problem with the regular expression processor. The regular expression "START.*END" does not work - period. This is the case whether START and END are on the same line or not. When I use the same regular expression in a Java program - it works!
- Bob Hansen
- Posts: 1516
- Joined: Sun Mar 02, 2003 8:15 pm
- Location: Salem, NH
- Contact:
alg wrote: The regular expression "START.*END" does not work - period. This is the case whether START and END are on the same line or not
I cannot duplicate this problem.
===================================
Regex works fine for the following test lines
START some data, more data, to the END
START some data, more data, to the END
START some data, more data, to the END
START some data, more data, to the END
START some data, more data, to the END
START some data, more data, to the END
START some data, more data, to the END
START some other data, more other data, to the END
START somemore data, more different data, to the END
START some new data, more new data, to the END
START some old data, more old data, to the END
START some repeat data, more repeat data, to the END
START some unique data, more unique data, to the END
Searching for "START.*END" (without quotes) replaces each line with a blank line.
Searching for "START.*END\n" (without quotes) deletes each line completely.
Selection of POSIX does not matter, same results in both instances.
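The blank-line versus deleted-line difference comes down to whether the pattern consumes the line's own newline. The same effect can be reproduced in Python's `re` (an illustration only, not TextPad's engine; the sample text is made up):

```python
import re

text = "first\nSTART some data, more data, to the END\nlast\n"

# Without \n in the pattern, the line's newline is left behind.
blank_left = re.sub(r"START.*END", "", text)

# With \n in the pattern, the whole line is removed.
line_gone = re.sub(r"START.*END\n", "", text)

print(repr(blank_left))
print(repr(line_gone))
```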
===============================
Steps taken to make this happen:
From the Main Menu, choose Search, Replace, and enter the value into the "Find What" field in the Replace window. Make sure the "Replace With" field is blank. In the Conditions box, select "text" and "Regular expression"; under Scope, select Active document. Click on Replace All.
Replacements happen as noted above.
Replacing "\n" with a unique value "~" also works on these lines. Using the Scope of Selected Text vs. Active document:
Select text, (lines 4-10)
Replace \n with ~ (combines to START.........END).
Select text (the block just modified START.....END)
Replace START.*END with nothing (all of line is replaced with blank line).
=====================================
If any of this is not working for you.....
What are your values, what steps are you taking? What are your results?
"does not work - period" is a bit vague. No error message? No cursor movement? No focus change? Specifics are helpful here.
========================
alg wrote: that there isn't an interest in nailing down this problem.
If that were the case, there would have been no responses to your request. There must be some interest; the item has been looked at over 125 times.
Hope this was helpful.............good luck,
Bob
- talleyrand
- Posts: 624
- Joined: Mon Jul 21, 2003 6:56 pm
- Location: Kansas City, MO, USA
- Contact:
alg wrote: When someone suggested that I write a simple program, I came to the conclusion - perhaps inaccurately - that there isn't an interest in nailing down this problem.
I didn't suggest you write a program; I suggested you run the program I just wrote. It appeared to be working fine for eliminating stuff. Bob already wrote a response, but I would second his findings: the regular expression appears to be working for me as well.
I choose to fight with a sack of angry cats.
- s_reynisson
- Posts: 939
- Joined: Tue May 06, 2003 1:59 pm
hello there (1st post)
I can't get this to work. You mean it should look like
replace
START OF DATA .*\n.*[^END]
with
START OF DATA
Then it replaces one line and waits?
Why don't you just add a new expression like # for all characters, including newline?!
That way I could edit something like
Code:
<hello>
I am trying to write a regular expression that will replace or delete multiple lines that always start and end with a fixed sequence of characters. For example:
START OF DATA
.
.
.
END OF DATA
The problem I have is that . does not match line terminators. Also, I can't have a class [.\n]. Any suggestions?
I check the forum without any luck. Thanks.
<bye>
by replacing
<hello>#*<bye>
with
<hello><bye>
Or even let it accept \n*
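A note on `[^END]`: in most regular expression dialects, square brackets define a character class, so `[^END]` matches a single character that is not E, N, or D. It does not mean "anything up to END". A quick check in Python's `re` (used only as an illustration, not TextPad's engine):

```python
import re

# [^END] consumes exactly one character that is not "E", "N", or "D".
singles = re.findall(r"[^END]", "SEND")

# The three-letter word END cannot match a one-character class.
whole_word = re.fullmatch(r"[^END]", "END")

# To stop at the literal word END, match lazily up to it instead.
inner = re.search(r"START(.*?)END", "START abc END").group(1)

print(singles)
print(whole_word)
print(repr(inner))
```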
The trick to deleting or replacing a sequence of characters that extends over multiple lines was given by Bob Hansen. First, replace all new lines with a unique code. Then use a regular expression like START.*END to find and replace the sequence that starts with START and ends with END. The final step is to replace any remaining unique codes with new lines.
I hope this helps.
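The three steps can be sketched in Python as a rough model of what the search-and-replace passes do. The function name `delete_blocks` and the `~` placeholder default are assumptions for illustration. The sketch also swaps in a non-greedy `.*?` for the thread's `START.*END`, so that several blocks in one file are removed separately and the text between them is kept.

```python
import re

def delete_blocks(text, start="START", end="END", placeholder="~"):
    """Remove every start...end block using the placeholder trick."""
    if placeholder in text:
        raise ValueError("placeholder already occurs in the text")
    # Step 1: replace every newline with the unique code.
    flat = text.replace("\n", placeholder)
    # Step 2: remove each start...end run from the single long line.
    pattern = re.escape(start) + ".*?" + re.escape(end)
    flat = re.sub(pattern, "", flat)
    # Step 3: turn the remaining codes back into newlines.
    return flat.replace(placeholder, "\n")

sample = "a\nSTART OF DATA\nx\ny\nEND OF DATA\nb\n"
cleaned = delete_blocks(sample, "START OF DATA", "END OF DATA")
print(repr(cleaned))
```

An empty line is left where each block used to be, because the newline after the end marker is not part of the match.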