Delete everything after "Text"

General questions about using TextPad


steve1040
Posts: 39
Joined: Fri Oct 13, 2006 2:19 am


Post by steve1040 »

I'm trying to clean up 100+ HTML files.

How would I go about searching for
"</BODY></HTML>"
and deleting everything from the next line to the end of each file?

Thanks
Steve
Bob Hansen
Posts: 1516
Joined: Sun Mar 02, 2003 8:15 pm
Location: Salem, NH

Post by Bob Hansen »

Since that string is usually near the end of a file, I am not sure you can do that just with RegEx in TextPad.

How does this approach work for you?

Search for "</BODY></HTML>" and Mark All lines above including this line.
Search/Invert All Bookmarks
Edit/Delete/Bookmarked Lines

You might then be able to make a macro to do that for you.
Last edited by Bob Hansen on Fri Oct 13, 2006 6:57 pm, edited 2 times in total.
Hope this was helpful.............good luck,
Bob
steve1040
Posts: 39
Joined: Fri Oct 13, 2006 2:19 am

Post by steve1040 »

I'm thinking I'll have to get a Unix account and do this there.
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

I don't see why you need a Unix account to do this.

You can readily do it using Helios's own WildEdit (http://www.textpad.com/products/wildedit/).

You can use Perl (for example, from http://www.activestate.com/Products/ActivePerl/), or your favourite scripting language, from a Windows command line.

If you want a Unix shell and Unix utilities, you can get bash and a vast collection of utilities from Cygwin (http://www.cygwin.com/).
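
As an illustration of the scripting-language route, here is a minimal sketch in Python (not something posted in the thread). The names MARKER, truncate_after_marker and clean_directory are made up for this example, and it assumes the tags appear exactly as written in the question (upper case, no whitespace between them). It works on raw bytes so that existing line endings are left alone.

```python
from pathlib import Path

# The closing tags named in the original question (assumed exact, case-sensitive).
MARKER = b"</BODY></HTML>"


def truncate_after_marker(data: bytes, marker: bytes = MARKER) -> bytes:
    """Return data up to the end of the first line that contains marker.

    If the marker is absent, or nothing follows its line, data is returned unchanged.
    """
    pos = data.find(marker)
    if pos == -1:
        return data
    # Locate the newline that ends the marker's line.
    line_end = data.find(b"\n", pos + len(marker))
    if line_end == -1:
        return data  # marker is on the last line already
    return data[: line_end + 1]


def clean_directory(folder: str, pattern: str = "*.html") -> int:
    """Truncate every matching file in folder; return how many files were changed."""
    changed = 0
    for path in Path(folder).glob(pattern):
        original = path.read_bytes()
        cleaned = truncate_after_marker(original)
        if cleaned != original:
            path.write_bytes(cleaned)
            changed += 1
    return changed
```

Calling clean_directory on a copy of the folder first would be prudent, since the files are overwritten in place.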
steve1040
Posts: 39
Joined: Fri Oct 13, 2006 2:19 am

Post by steve1040 »

ben_josephs wrote: I don't see why you need a Unix account to do this.

You can readily do it using Helios's own WildEdit (http://www.textpad.com/products/wildedit/).

I've downloaded WildEdit. How would I perform the task in this application?
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Find what: </BODY></HTML>.*
Replace with: </BODY></HTML>\n

[X] Regular expression
[X] Replacement format

Options
[ ] '.' does not match a newline character [i.e., not selected]
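
For comparison, the same find-and-replace can be sketched outside WildEdit. The following uses Python's re module, which is an analogue of these settings and not WildEdit's own engine; re.DOTALL plays the role of leaving the "'.' does not match a newline character" option unselected. TAIL and strip_tail are hypothetical names chosen for the example.

```python
import re

# DOTALL lets '.' match newline characters, so '.*' runs to the end of the text.
TAIL = re.compile(r"</BODY></HTML>.*", re.DOTALL)


def strip_tail(html: str) -> str:
    """Replace the closing tags and everything after them with the tags plus one newline."""
    return TAIL.sub("</BODY></HTML>\n", html, count=1)
```

Note that, like the WildEdit settings above, this also discards anything that follows the tags on the same line, and it adds a trailing newline if the file did not have one.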
Bob Hansen
Posts: 1516
Joined: Sun Mar 02, 2003 8:15 pm
Location: Salem, NH

Post by Bob Hansen »

Hi ben_josephs.

Nice simple solution, well done, and thanks for letting us know that .* in WildEdit also includes \n codes.

If I am understanding you, this means that WildEdit has the ability to scan past multiple \n codes, correct?
I missed that ability somewhere, sure solves a lot of issues.
I limited my solution to TextPad alone.

Thanks again, for the technical kick.....

I will have to spend more time with WildEdit.
Hope this was helpful.............good luck,
Bob
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Yes. The Boost regular expression recogniser that WildEdit uses (http://www.boost.org/libs/regex/doc/index.html) treats the text it is searching as a sequence of characters, not as a sequence of lines. This makes it possible for the regex . (dot) to match any character, including newline (if you want it to). It also makes it possible for \n to have full rights as a regular expression that can be incorporated as a component in other regular expressions without restriction.

The regular expression . (with the option "'.' does not match a newline character" not selected)
is equivalent to
the regular expression (.|\n) (with that option selected).
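
The equivalence can be tried out in any engine that has a comparable switch. A small sketch in Python follows; Python's re module is not the Boost library, but its default behaviour ('.' stops at newline) and its re.DOTALL flag ('.' matches newline) correspond to the option being selected and not selected respectively. The sample text and the helper name matched_text are invented for the illustration.

```python
import re

sample = "first line\nsecond line\nthird"

# '.' may match newline (the option not selected, in WildEdit terms).
dot_matches_all = re.compile(r"first.*third", re.DOTALL)

# '.' stops at newline (the option selected), so newline is added as an alternative.
explicit_newline = re.compile(r"first(.|\n)*third")

# '.' stops at newline and no alternative is offered: the match cannot cross lines.
single_line_only = re.compile(r"first.*third")


def matched_text(pattern, text):
    """Return the text the pattern matched, or None if it did not match."""
    m = pattern.search(text)
    return m.group(0) if m else None
```

Here the first two patterns match the whole three-line sample, while the third finds nothing.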