MudGuard wrote: Find (checking Regular expression): . . .
Thank you, Mudguard; as usual you're AWESOME! In fact, you may have suggestions I haven't even thought of, so to that end I'm supplying some context:
This project is based on the HTML code of a web page on the site DigitalTrends, although I could easily find millions more just like it. There are 2,193,480 characters in this one page's code (now you know why I can't paste it here in the forums!). The HTML file is so huge that merely opening it in Dreamweaver crashes the program. Over 98% of the code is related to monetizing the site, and it is this advertising bloat code, in this page and others like it, that I hope to find an automated way to DELETE. As will soon be apparent, an all-purpose "style stripper" is too indiscriminate.
I'm giving you the link to a .ZIP file I just uploaded to EXPIREBOX, a FREE online temporary storage site. You'll need to download it quickly, because the link expires in 48 hours. The .ZIP file contains three files: 1) a PDF graphics reference, 2) an HTML file, and 3) a text version of the HTML file. I strongly recommend that you open the text version of the page first. I've changed nothing in the code; the page I downloaded is exactly as you see it, warts and all.
THE PROJECT
I'd like to have the means at my disposal to automate the removal of all advertising and related code and events (such as JavaScript, PHP, etc.), while keeping the page's essential visual look and style. This code contains hundreds of scripts and style rules related exclusively to monetizing the site, with class and ID selectors carrying labels such as
id="dt-video-container-2989522218"
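As a starting point for hunting down labels like the one above, here is a minimal Python sketch (Python is my choice here; the thread itself works in TextPad) that lists id/class values ending in a hyphen followed by a long run of digits. The 8-digit minimum and the function name are hypothetical, and every match still needs a human check, since a content element could use the same naming scheme.

```python
import re

# Hypothetical heuristic: ad-related labels on this page look like
# "<words>-<long run of digits>", e.g. dt-video-container-2989522218.
# The 8-digit minimum is an assumption, not a rule taken from the page.
# The lookbehind stops "data-id=" or "data-class=" from matching as "id="/"class=".
NUMERIC_SUFFIX_LABEL = re.compile(
    r'(?<![\w-])(?:id|class)="([A-Za-z][\w-]*?-\d{8,})"'
)


def find_numeric_suffix_labels(html: str) -> list[str]:
    """Return every id/class value that ends in a hyphen plus 8+ digits.

    Only single-token values match; class="a b-12345678" is not reported.
    """
    return NUMERIC_SUFFIX_LABEL.findall(html)


if __name__ == "__main__":
    sample = ('<div id="dt-video-container-2989522218">'
              '<p class="article-body">Guide text</p></div>')
    print(find_numeric_suffix_labels(sample))  # ['dt-video-container-2989522218']
```

The list this produces is a review list, not a delete list: it tells you which selectors to look at before deciding what to cut.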
I do not expect any single search-and-replace session to catch all of them. Indeed, I have to be careful that the selectors matched aren't also (or exclusively) related to the page's content, so for obvious reasons I want to make sure that the data I've cut is backed up in its own file. As Mudguard's example above indicates, this is one of those projects where even the most artful TextPad macro cannot escape the immutable fact that it takes multiple steps.
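For the back-up requirement, one possible approach is to record every cut before it is made. This is a minimal Python sketch; `cut_and_backup`, the backup file name and the separator line are hypothetical choices, and the pattern passed in is whatever regex you settle on for a given step.

```python
import re
import tempfile
from pathlib import Path


def cut_and_backup(html: str, pattern: str, backup_path: Path) -> str:
    """Remove every match of `pattern` from `html` and append each removed
    snippet to `backup_path`, so nothing is lost. Returns the cleaned HTML.

    The name, file layout and separator line are illustrative choices only.
    """
    regex = re.compile(pattern, re.IGNORECASE | re.DOTALL)
    removed: list[str] = []

    def _collect(match: re.Match) -> str:
        removed.append(match.group(0))  # remember the cut text
        return ""                       # and drop it from the page

    cleaned = regex.sub(_collect, html)
    if removed:  # only touch the backup file when something was cut
        with backup_path.open("a", encoding="utf-8") as backup:
            for snippet in removed:
                backup.write(snippet + "\n----- end of cut -----\n")
    return cleaned


if __name__ == "__main__":
    page = '<p>Keep me</p><div id="dt-ad-1234567890">buy now!</div>'
    with tempfile.TemporaryDirectory() as tmp:
        backup_file = Path(tmp) / "cut_backup.txt"
        print(cut_and_backup(page, r'<div id="dt-ad-\d+">.*?</div>', backup_file))  # <p>Keep me</p>
        print(backup_file.read_text(encoding="utf-8"))
```

One caveat: regular expressions cannot track nested tags, so a pattern like `<div ...>.*?</div>` stops at the first `</div>`, which may belong to an inner element. Reading the backup file after each run shows exactly what was removed.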
On line 64 you'll see an uninterrupted 56,947-character block of code, and this block was the catalyst for my thread. Solid blocks of data such as these, with no spaces or carriage returns, take only seconds to delete. In fact, virtually all web pages are monetized with two methods, and always include Google (and Amazon if necessary):
- iframes
- scripts
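Because a solid block like the 56,947-character one on line 64 has no line breaks, it shows up as a single extremely long line. Here is a minimal Python sketch for listing such lines; the 10,000-character threshold and the function name are arbitrary assumptions, and it checks length only, since a minified script line can still contain some spaces. Line numbers are counted from newline characters, so for an ordinary file they should line up with an editor's line numbers.

```python
def find_long_lines(text: str, min_length: int = 10_000) -> list[tuple[int, int]]:
    """Return (line_number, length) for every line of at least `min_length`
    characters. Line numbers start at 1 and are counted from newline breaks."""
    hits = []
    for number, line in enumerate(text.split("\n"), start=1):
        line = line.rstrip("\r")  # ignore the CR half of Windows CRLF endings
        if len(line) >= min_length:
            hits.append((number, len(line)))
    return hits


if __name__ == "__main__":
    sample = "<p>short</p>\n" + "x" * 60_000 + "\n<p>also short</p>"
    print(find_long_lines(sample))  # [(2, 60000)]
```

On the real page you would pass in the text file's contents, e.g. read with `Path(...).read_text(encoding="utf-8", errors="replace")`, and then inspect each reported line before deleting it.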
Iframes and scripts can be removed in seconds (at least I hope so!); it's the class and ID selectors that present the greatest challenge. This will be an ongoing project that will need to be edited and perfected over time as the web evolves.
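For the iframes and scripts, a minimal Python sketch of the removal might look like this. It assumes each element is written as an ordinary open/close tag pair, it removes every `<script>` and `<iframe>` (not only the advertising ones), and it returns the removed elements so they can be backed up rather than lost. The function and constant names are hypothetical.

```python
import re

# Matches a whole <script>...</script> or <iframe>...</iframe> element,
# attributes included, up to the first closing tag of the same name.
# \b keeps <noscript> and similar tags from matching.
SCRIPT_OR_IFRAME = re.compile(
    r"<(script|iframe)\b[^>]*>.*?</\1\s*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_scripts_and_iframes(html: str) -> tuple[str, list[str]]:
    """Return (cleaned_html, removed_elements)."""
    removed: list[str] = []

    def _collect(match: re.Match) -> str:
        removed.append(match.group(0))  # keep a copy for the backup file
        return ""

    return SCRIPT_OR_IFRAME.sub(_collect, html), removed


if __name__ == "__main__":
    page = ('<p>Guide text</p><script src="ads.js"></script>'
            '<iframe src="https://example.com/ad"></iframe><p>More text</p>')
    cleaned, cut = strip_scripts_and_iframes(page)
    print(cleaned)   # <p>Guide text</p><p>More text</p>
    print(len(cut))  # 2
```

Removing every script is blunt: some pages load images or parts of their layout through JavaScript, so the stripped copy may lose some of the visual style this project is trying to keep. Reviewing the returned list before discarding it is one way to catch that.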
So why am I doing this? Sometimes I like to download technical guides and store them, along with their graphics, for my own personal offline use as a reference I can annotate. I want to preserve each site's visual style because their respective page designs help me associate them in my mind. I'll post my own progress here, but for Mudguard and others who are interested: download my .ZIP file (within the next 48 hours) and let's play.
