Regular Expression for multiple lines

Post by alg »

I am trying to write a regular expression that will replace or delete multiple lines that always start and end with a fixed sequence of characters. For example:

START OF DATA
.
.
.
END OF DATA

The problem I have is that . does not match line terminators. Also, I can't have a class [.\n]. Any suggestions?

I checked the forum without any luck. Thanks.
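
For comparison outside TextPad, here is a minimal sketch in Python's re module. It is not TextPad syntax, and this thread does not establish whether TextPad's own engine supports these constructs. It shows the two usual fixes for "." not matching line terminators: the re.DOTALL flag, and the class [\s\S]. It also shows why [.\n] cannot work: inside brackets, "." is a literal dot. The sample text is made up.

```python
import re

# Made-up sample: one block between two ordinary lines.
text = "intro\nSTART OF DATA\nline 1\nline 2\nEND OF DATA\noutro\n"

# By default "." stops at "\n", so the block is not found.
assert re.search(r"START OF DATA.*END OF DATA", text) is None

# re.DOTALL lets "." match "\n" too. The lazy ".*?" stops at the first
# END OF DATA, and "\n?" also removes the line break after the block.
cleaned = re.sub(r"START OF DATA.*?END OF DATA\n?", "", text, flags=re.DOTALL)

# Without any flag, [\s\S] ("whitespace or non-whitespace") matches any
# character. [.\n] would not: inside a class, "." is just a literal dot.
same = re.sub(r"START OF DATA[\s\S]*?END OF DATA\n?", "", text)

print(repr(cleaned))  # 'intro\noutro\n'
```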

Post by Bob Hansen »

Have you tried changing the "\n" at the end of each line (except the last one, END OF DATA) to something unique like "Q3Qk"? That makes each section one long string and gets around the problem of an unknown number of "\n" characters.
Hope this was helpful.............good luck,
Bob
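
Bob's suggestion can be sketched in Python as an illustration only (in TextPad this is done with a Replace). "Q3Qk" is his example placeholder; any string that never occurs in the real data would do. The block is made-up sample data.

```python
import re

# Made-up block; the line break after END OF DATA is not part of it,
# matching "except the last one" in the suggestion.
block = "START OF DATA\nline 1\nline 2\nEND OF DATA"

# Step 1: turn every line break inside the block into the placeholder.
one_line = block.replace("\n", "Q3Qk")
print(one_line)  # START OF DATAQ3Qkline 1Q3Qkline 2Q3QkEND OF DATA

# Step 2: the block is now a single line, so a plain ".*" (which never
# crosses a line break) can match it from START to END.
match = re.fullmatch(r"START.*END OF DATA", one_line)
print(match is not None)  # True
```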

Post by alg »

OK, I eliminated the newlines. I am left with a regular expression like this:

START.+END

where "START" and "END" are the fixed sequences of characters and any characters can be in between. I tested this in another regular expression processor (Java) and it works, but not in TextPad.

Any more ideas?

Thanks.
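
One thing worth checking once the newlines have been replaced: if the whole file has become one long line and the pattern occurs many times, a greedy .+ runs from the first START to the last END and deletes everything in between, including text that should be kept. A minimal Python sketch with made-up data, where "~" stands in for the removed newlines. Python and Java both support the lazy form .+?, but whether TextPad's engine does is not established here.

```python
import re

# Made-up data: two blocks plus text to keep, already collapsed onto one
# line with "~" in place of the original line breaks.
line = "keep1~START a~b END~keep2~START c END~keep3"

# Greedy: ".+" grabs as much as possible, so the match runs to the LAST
# "END" and "keep2" is deleted along with both blocks.
greedy = re.sub(r"START.+END", "", line)
print(greedy)  # keep1~~keep3

# Lazy: ".+?" grabs as little as possible, so each match stops at the
# first "END" after its "START" and "keep2" survives.
lazy = re.sub(r"START.+?END", "", line)
print(lazy)    # keep1~~keep2~~keep3
```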

Post by Bekah »

Hi TextPad people,
I think I want to do the same thing as alg.
How do I include a newline in a replacement expression?
Thanks,
Bekah
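
As an analogy only (TextPad's replacement syntax is not covered by this sketch): in Python's re.sub, the escape \n in the replacement template is turned into a real newline. That is also one way the placeholder idea suggested in this thread could be undone. The "~" placeholder and the text are made up.

```python
import re

collapsed = "START OF DATA~line 1~END OF DATA"

# A raw string keeps the backslash, and re.sub itself turns the two
# characters "\n" in the replacement into a newline character.
restored = re.sub(r"~", r"\n", collapsed)
print(restored)
# START OF DATA
# line 1
# END OF DATA

# For a fixed placeholder, plain str.replace gives the same result.
assert restored == collapsed.replace("~", "\n")
```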

Post by Bob Hansen »

If you have other lines in the document besides the START OF DATA ... END OF DATA groups, be sure to replace "\n" only within the selected text for those lines.

Once you have replaced the \n in those lines with a unique code, each block should be one long string. Now search for "START.*" and replace it with nothing.
Hope this was helpful.............good luck,
Bob
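
The "selected text only" idea can be sketched in Python by collapsing just the lines from START OF DATA through END OF DATA, leaving every other line alone, and then deleting with START.* as described. The function name collapse_blocks, the "~" placeholder and the sample document are illustrative assumptions, and the sketch assumes each marker line begins with the marker text. Note that START.* would also hit any other line that happens to contain "START".

```python
import re


def collapse_blocks(text, begin="START OF DATA", end="END OF DATA", mark="~"):
    """Hypothetical helper: put each begin..end block on one line.

    Line breaks inside a block become `mark`; all other lines keep their
    own line breaks, much like replacing \\n only within a selection.
    """
    out, current = [], None
    for line in text.splitlines():
        if current is None and line.startswith(begin):
            current = [line]                # a block starts here
        elif current is not None:
            current.append(line)            # inside a block
        else:
            out.append(line)                # ordinary line, untouched
        if current is not None and line.startswith(end):
            out.append(mark.join(current))  # block ends: emit it as one line
            current = None
    if current is not None:                 # no END found: keep lines as-is
        out.extend(current)
    return "\n".join(out) + "\n"


doc = ("header\nSTART OF DATA\na\nb\nEND OF DATA\n"
       "middle\nSTART OF DATA\nc\nEND OF DATA\nfooter\n")
collapsed = collapse_blocks(doc)
print(collapsed)

# Replacing "START.*" with nothing leaves an empty line where each block was.
cleaned = re.sub(r"START.*", "", collapsed)
print(repr(cleaned))  # 'header\n\nmiddle\n\nfooter\n'
```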

Post by alg »

Hi Bob,

I need to specify a start and an end, e.g. "START.*END", and delete everything in between as well as the start and end. Unfortunately, the file is very large and the pattern occurs many times. I can always do this manually, but I am sure the problem will come up again and I am looking for a simple, general solution.

In addition, "START.*END" does not work even if the start and end are on a single line. It seems to me that the regular expression processor is deficient, but, admittedly, I am not an expert in such matters.

Thanks.

Post by talleyrand »

How 'bout just a simple program to handle it?

Install Python and you're good to go.

Copy this code and paste it into TextPad.
Save it as something like C:\alg.py
Update the filename variables and your marker criteria. If you need a more complex search, say a regular expression, it wouldn't be hard to implement.

C:\>python alg.py

You can also run it straight from TextPad. Set the command to wherever you installed Python (probably C:\python23\python.exe).

Code:

# Requires Python 2.7+ or 3.x (uses a two-file "with" statement).
def processFile(fileName, outFileName, beginMarker, endMarker, ignoreCase=False):
    """
    Copy fileName to outFileName, leaving out every block of lines that
    starts with a line containing beginMarker and ends with a line
    containing endMarker (the two marker lines are left out as well).
    """
    if ignoreCase:
        # Compare in lower case, but still write the lines unchanged.
        beginMarker, endMarker = beginMarker.lower(), endMarker.lower()

    ignore = False
    with open(fileName, 'r') as src, open(outFileName, 'w') as out:
        for currentLine in src:
            testLine = currentLine.lower() if ignoreCase else currentLine

            if beginMarker in testLine:
                ignore = True

            if not ignore:
                out.write(currentLine)

            if endMarker in testLine:
                ignore = False


def main():
    fileName = r'c:\bfellows\alg_test.txt'
    outFileName = r'c:\bfellows\alg_out.txt'
    beginMarker = 'START OF DATA'
    endMarker = 'END OF DATA'
    # Pass ignoreCase=True if the markers are not always in the same case.
    processFile(fileName, outFileName, beginMarker, endMarker)


if __name__ == '__main__':
    main()
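
Along the lines of the regular-expression extension mentioned above, here is one possible sketch that tests each line against a begin pattern and an end pattern with Python's re.search instead of a plain substring search. The names strip_blocks and process_file, their parameters and the sample lines are assumptions made up for illustration. As in the script, the marker lines themselves are dropped, and a block may begin and end on the same line.

```python
import re


# Hypothetical helper names; not from the original script.
def strip_blocks(lines, begin_pattern, end_pattern, ignore_case=False):
    """Yield only the lines that lie outside begin..end blocks.

    Both patterns are regular expressions searched anywhere in a line.
    """
    flags = re.IGNORECASE if ignore_case else 0
    begin_re = re.compile(begin_pattern, flags)
    end_re = re.compile(end_pattern, flags)
    inside = False
    for line in lines:
        if not inside and begin_re.search(line):
            inside = True
        if not inside:
            yield line
        if inside and end_re.search(line):
            inside = False


def process_file(file_name, out_file_name, begin_pattern, end_pattern):
    """Copy a file, leaving out every matching block."""
    with open(file_name) as src, open(out_file_name, "w") as dst:
        dst.writelines(strip_blocks(src, begin_pattern, end_pattern))


sample = ["keep 1\n", "Start Of Data\n", "drop me\n", "END OF DATA\n", "keep 2\n"]
kept = list(strip_blocks(sample, r"^START OF DATA", r"^END OF DATA",
                         ignore_case=True))
print(kept)  # ['keep 1\n', 'keep 2\n']
```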
I choose to fight with a sack of angry cats.

Post by trids »

I reported a problem with 4.6.2 and regexp to replace text across multiple lines: http://textpad.com/forum/viewtopic.php? ... highlight=

... could be related .. ?

Post by alg »

When someone suggested that I write a simple program, I came to the conclusion - perhaps inaccurately - that there isn't an interest in nailing down this problem.

My impression is that this is a problem with the regular expression processor. The regular expression "START.*END" does not work - period. This is the case whether START and END are on the same line or not. When I use the same regular expression in a Java program, it works!

Post by Bob Hansen »

alg wrote: The regular expression "START.*END" does not work - period. This is the case whether START and END are on the same line or not.
I cannot duplicate this problem.
===================================
Regex works fine for the following test lines

START some data, more data, to the END
START some data, more data, to the END
START some data, more data, to the END
START some data, more data, to the END
START some data, more data, to the END
START some data, more data, to the END
START some data, more data, to the END
START some other data, more other data, to the END
START somemore data, more different data, to the END
START some new data, more new data, to the END
START some old data, more old data, to the END
START some repeat data, more repeat data, to the END
START some unique data, more unique data, to the END


Searching for "START.*END" (without quotes) replaces each line with a blank line.
Searching for "START.*END\n" (without quotes) deletes each line completely.

Selection of POSIX does not matter, same results in both instances.
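
The same two results can be reproduced outside TextPad with Python's re.sub, because "." there also stops at a line break. This is an illustration only, using made-up lines in the same format as the test lines above:

```python
import re

text = "START some data, to the END\nkeep me\nSTART other data, to the END\n"

# Without "\n" the match ends before the line break, so a blank line remains.
blanked = re.sub(r"START.*END", "", text)
print(repr(blanked))  # '\nkeep me\n\n'

# Including "\n" in the pattern removes the line break as well.
deleted = re.sub(r"START.*END\n", "", text)
print(repr(deleted))  # 'keep me\n'
```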
===============================
Steps taken to make this happen:
From the main menu, choose Search, Replace.
In the Replace window, enter the value in the "Find What" field.
Make sure the "Replace With" field is blank.
In the Conditions box, select "text" and "Regular expression"; under Scope, select Active document.
Click Replace All.
Replacements happen as noted above.

Replacing "\n" with a unique value "~" also works on these lines, using a Scope of Selected Text instead of Active document:
Select text (lines 4-10).
Replace \n with ~ (this combines them into START.........END).
Select text (the block just modified, START.....END).
Replace START.*END with nothing (the whole line is replaced with a blank line).
=====================================
If any of this is not working for you.....
What are your values, what steps are you taking, and what are your results?
alg wrote: does not work - period
That is a bit vague: no error message? No cursor movement? No focus change? Specifics are helpful here.
========================
alg wrote: that there isn't an interest in nailing down this problem.
If that were the case, there would have been no responses to your request. There must be some interest; the item has been looked at over 125 times.
Hope this was helpful.............good luck,
Bob

Post by talleyrand »

alg wrote:When someone suggested that I write a simple program, I came to the conclusion - perhaps inaccurately - that there isn't an interest in nailing down this problem.
I didn't suggest that you write a program; I suggested that you run the program I had just written. It appeared to be working fine for eliminating stuff. Bob has already responded, but I would second his findings: the regular expression appears to be working for me as well.

Post by s_reynisson »

Just to confirm B&T's notes, I can not duplicate this problem.
Find and Replace with and without \n works fine.
What version of TP are you using? 4.7.2? HTH
Then I open up and see
the person fumbling here is me
a different way to be

Post by alg »

My apologies to everyone. My foot is in my mouth. I don't know what I was doing wrong. I thought I had tried the regular expression several times and it didn't work for me, but when I followed the posted instructions, it worked!

Post by walidaly »

Hello there (first post).
I can't get this to work. Do you mean it should look like this?
replace
START OF DATA .*\n.*[^END]
with
START OF DATA

Then it replaces one line and waits?
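
A note on [^END]: square brackets make a character class, so [^END] matches exactly one character that is not E, N or D; it does not mean "anything except the word END". A quick check in Python's re, plus one way to say "a run of characters containing no END" using a negative lookahead. The lookahead is Python syntax and may not be available in TextPad's engine.

```python
import re

# [^END] is ONE character that is not "E", "N" or "D".
print(bool(re.fullmatch(r"[^END]", "x")))    # True
print(bool(re.fullmatch(r"[^END]", "N")))    # False
print(bool(re.fullmatch(r"[^END]", "END")))  # False: three characters

# (?!END). consumes one character only if "END" does not start there,
# so repeating it matches a stretch of text that contains no "END".
no_end = re.compile(r"(?:(?!END).)*")
print(bool(no_end.fullmatch("some data")))   # True
print(bool(no_end.fullmatch("to the END")))  # False
```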

Why don't you just add a new expression, like #, that matches every character including newline?!

so that I can edit something like this:

Code:

<hello>
I am trying to write a regular expression that will replace or delete multiple lines that always start and end with a fixed sequence of characters. For example: 

START OF DATA 
. 
. 
. 
END OF DATA 

The problem I have is that . does not match line terminators. Also, I can't have a class [.\n]. Any suggestions? 

I check the forum without any luck. Thanks.
<bye>
So I would replace
<hello>#*<bye>
with
<hello><bye>

Or even let it accept \n*.
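
Python's re has no single "#" for this, but the class [\s\S] (or "." with the re.DOTALL flag) matches any character including a newline. A minimal sketch of the <hello>...<bye> replacement, assuming made-up sample text and using capture groups so the two tags are kept without retyping them. Whether TextPad accepts the same syntax is not established here.

```python
import re

doc = "<hello>\nline one\nline two\n<bye>\nafter\n"

# (<hello>) and (<bye>) are captured and written back via \1 and \2;
# [\s\S]*? lazily matches everything between them, newlines included.
result = re.sub(r"(<hello>)[\s\S]*?(<bye>)", r"\1\2", doc)
print(repr(result))  # '<hello><bye>\nafter\n'
```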

Post by alg »

The trick to deleting or replacing a sequence of characters that extends over multiple lines was given by Bob Hansen. Namely, first you replace all newlines with a unique code. Then you can use a regular expression like START.*END to find and replace the sequence that starts with START and ends with END. The final step is to replace any remaining unique codes with newlines.
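
The three steps can be put together in a short Python sketch. This is a hypothetical helper, not TextPad itself; the function name, the "~" placeholder and the sample text are assumptions. It refuses to run if the placeholder already appears in the text, and it uses the lazy .*? so that, with the whole file on one line, each match stops at the nearest END instead of the last one.

```python
import re


def delete_blocks(text, begin="START OF DATA", end="END OF DATA", mark="~"):
    """Hypothetical helper: delete every begin...end span, even across lines."""
    if mark in text:
        raise ValueError("placeholder %r already occurs in the text" % mark)
    # Step 1: replace every newline with the unique code.
    one_line = text.replace("\n", mark)
    # Step 2: delete each span; re.escape treats the markers as plain text.
    pattern = re.escape(begin) + ".*?" + re.escape(end)
    one_line = re.sub(pattern, "", one_line)
    # Step 3: turn the remaining codes back into newlines.
    return one_line.replace(mark, "\n")


text = ("a\nSTART OF DATA\nx\nEND OF DATA\n"
        "b\nSTART OF DATA\ny\nEND OF DATA\nc\n")
print(repr(delete_blocks(text)))  # 'a\n\nb\n\nc\n'
```

Each removed block leaves an empty line, because the line break just before START and the one just after END are not part of the match.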

I hope this helps.