Delete everything after "Text"
I'm trying to clean up 100+ HTML files.
How would I go about searching for
"</BODY></HTML>"
and deleting everything from the next line to the end of each file?
Thanks
Steve
- Bob Hansen
- Posts: 1516
- Joined: Sun Mar 02, 2003 8:15 pm
- Location: Salem, NH
Since that string is usually near the end of a file, I am not sure you can do that just with RegEx in TextPad.
How does this approach work for you?
Search for "</BODY></HTML>" and Mark All lines above including this line.
Search/Invert All Bookmarks
Edit/Delete/Bookmarked Lines
You might then be able to make a macro to do that for you.
Last edited by Bob Hansen on Fri Oct 13, 2006 6:57 pm, edited 2 times in total.
Hope this was helpful.............good luck,
Bob
-
- Posts: 2461
- Joined: Sun Mar 02, 2003 9:22 pm
I don't see why you need a Unix account to do this.
You can readily do it using Helios's own WildEdit (http://www.textpad.com/products/wildedit/).
You can use Perl (for example, from http://www.activestate.com/Products/ActivePerl/), or your favourite scripting language, from a Windows command line.
If you want a Unix shell and Unix utilities, you can get bash and a vast collection of utilities from Cygwin (http://www.cygwin.com/).
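As one illustration of the "favourite scripting language" route, here is a minimal Python sketch of the task as Steve described it. It is not the method anyone in this thread posted. The function names (truncate_after_marker, clean_folder), the "*.html" pattern and the exact-case match on the marker are assumptions made for the example. It works on raw bytes so that line endings are left untouched, and it rewrites files in place, so try it on a copy of the folder first.

```python
from pathlib import Path

# Marker exactly as given in the question; the match is case-sensitive.
MARKER = b"</BODY></HTML>"


def truncate_after_marker(data: bytes, marker: bytes = MARKER) -> bytes:
    """Keep everything up to the end of the first line containing marker."""
    pos = data.find(marker)
    if pos == -1:
        return data  # marker not present: leave the content unchanged
    line_end = data.find(b"\n", pos + len(marker))
    if line_end == -1:
        return data  # marker is on the last line: nothing follows it
    return data[: line_end + 1]


def clean_folder(folder: str, pattern: str = "*.html") -> int:
    """Rewrite matching files in place and return how many were changed."""
    changed = 0
    for path in Path(folder).glob(pattern):
        original = path.read_bytes()
        cleaned = truncate_after_marker(original)
        if cleaned != original:
            path.write_bytes(cleaned)
            changed += 1
    return changed
```

Calling clean_folder with the path of the folder that holds the files would process every file matching the pattern. Files that do not contain the marker are left as they are.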
ben_josephs wrote:
I don't see why you need a Unix account to do this.
You can readily do it using Helios's own WildEdit (http://www.textpad.com/products/wildedit/).

I've downloaded WildEdit. How would I perform the task in this application?
-
- Posts: 2461
- Joined: Sun Mar 02, 2003 9:22 pm
- Bob Hansen
- Posts: 1516
- Joined: Sun Mar 02, 2003 8:15 pm
- Location: Salem, NH
Hi ben_josephs.
Nice simple solution, well done, and thanks for letting us know that .* in WildEdit also includes \n codes.
If I am understanding you, this means that WildEdit has the ability to scan past multiple \n codes, correct?
I missed that ability somewhere, sure solves a lot of issues.
I limited my solution to TextPad alone.
Thanks again, for the technical kick.....
I will have to spend more time with WildEdit.
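The WildEdit expression being praised here is not reproduced in this thread, so the following is only a sketch of the general idea, written with Python's re module rather than the Boost library that WildEdit uses. The pattern and the function name strip_tail are assumptions made for illustration. With the DOTALL flag, ".*" runs across newlines to the end of the text, so a single replacement removes everything after the marker line.

```python
import re

# Group 1 captures the marker plus the rest of its own line.
# With re.DOTALL, the trailing ".*" also matches newlines, so it
# consumes everything up to the end of the text.
TAIL = re.compile(r"(</BODY></HTML>[^\n]*\n?).*", re.DOTALL)


def strip_tail(text: str) -> str:
    """Replace the marker line plus everything after it with the marker line."""
    return TAIL.sub(r"\1", text, count=1)
```

Without the DOTALL flag the same pattern would stop at the end of the line after the marker, which is the line-by-line limitation described above for TextPad.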
Hope this was helpful.............good luck,
Bob
-
- Posts: 2461
- Joined: Sun Mar 02, 2003 9:22 pm
Yes. The Boost regular expression recogniser that WildEdit uses (http://www.boost.org/libs/regex/doc/index.html) treats the text it is searching as a sequence of characters, not as a sequence of lines. This makes it possible for the regex . (dot) to match any character, including newline (if you want it to). It also makes it possible for \n to have full rights as a regular expression that can be incorporated as a component in other regular expressions without restriction.
The regular expression . when the option "'.' does not match a newline character" is not selected
is equivalent to
the regular expression (.|\n) when the option "'.' does not match a newline character" is selected.
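The same equivalence can be checked in other regex engines. The sketch below uses Python's re module, not Boost, with re.DOTALL standing in for leaving WildEdit's option unselected. The sample text and the helper first_match are made up for the example.

```python
import re

sample = "one\ntwo\nthree"

# '.' is allowed to match a newline (the option is "not selected").
dot_matches_newline = re.compile(r".*", re.DOTALL)

# '.' is not allowed to match a newline, so the newline is added by hand.
explicit_newline = re.compile(r"(?:.|\n)*")

# '.' is not allowed to match a newline and nothing compensates for it.
dot_only = re.compile(r".*")


def first_match(pattern: re.Pattern, text: str) -> str:
    """Return the text matched at the start of the string."""
    return pattern.match(text).group(0)
```

Here the first two patterns both match the whole three-line sample, while the third stops at the end of the first line.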