Replace text inside an HTML document without opening it?

Posted: Wed Dec 14, 2016 6:38 am
by no.cache
I have an HTML file that I can't open in TextPad because it contains Unicode glyphs that I don't want TextPad to convert. Here's the little devil, and unfortunately there are dozens of these stinkers. They're always on 2 lines. The </a> string is always the same four characters; however, the data following the = sign is always different. Here's an example:

<h2><a name="x1000">Myanmar
</a></h2>

. . . and here is what it should look like:

<h2>Myanmar</h2>


Thanks friends. :P

Posted: Wed Dec 14, 2016 8:21 am
by AmigoJack
Replacing bytes in a file without opening it is impossible, no matter how you look at it. You're either asking for a different editor (Notepad++ supports Unicode) or a different version (TextPad 8 finally supports Unicode as well).

Posted: Wed Dec 14, 2016 10:36 am
by ben_josephs
Which Unicode characters do you suspect that the latest version of TextPad does not handle correctly?

If this is not a real problem, try
Find what: <h2><a name="[^"]*">Myanmar\n</a></h2>
Replace with: <h2>Myanmar</h2>

[X] Regular expression

Replace All
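
For anyone who would rather script the change than load the file into an editor, the same replacement can be sketched in Python. This is only a minimal sketch under some assumptions: the file is UTF-8, the heading text never contains a "<", and the heading differs from one instance to the next (so it is captured in a group rather than hard-coded as "Myanmar"). The names strip_heading_anchors and rewrite_file are hypothetical, made up for this illustration.

```python
import re
from pathlib import Path

# Matches <h2><a name="...">Heading, a line break, then </a></h2>.
# Group 1 captures the heading text so any heading is handled.
ANCHOR_HEADING = re.compile(r'<h2><a name="[^"]*">([^<\r\n]*)\r?\n</a></h2>')


def strip_heading_anchors(html: str) -> str:
    """Collapse each two-line anchored heading into <h2>Heading</h2>."""
    return ANCHOR_HEADING.sub(r"<h2>\1</h2>", html)


def rewrite_file(path: str) -> None:
    """Apply the replacement to a file in place (assumes UTF-8)."""
    p = Path(path)
    # newline="" leaves the file's existing line endings untouched.
    with p.open("r", encoding="utf-8", newline="") as f:
        text = f.read()
    with p.open("w", encoding="utf-8", newline="") as f:
        f.write(strip_heading_anchors(text))
```

Because the text is decoded and re-encoded as UTF-8, characters outside any 8-bit code page pass through unchanged. Work on a copy of the file first.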

Posted: Thu Dec 15, 2016 1:22 am
by no.cache
ben_josephs wrote: Which Unicode characters do you suspect that the latest version of TextPad does not handle correctly?
This (but see this¹).
AmigoJack wrote: You're either asking for a different editor <SNIP>
You're right AmigoJack. In fact, I didn't even realize there were these editors customized to work with Unicode specifically. In approximately 10 seconds I stripped the script, java, advertising and other crap from my HTML file, the one containing the Unicode glyphs, with no corruption (as far as I can see) using UniRed, a free Unicode editor weighing in at 1,156 KB and completely portable. But I tried a second Unicode editor as well . . .

RJ TextEd is a free Unicode editor that unpacks to a hefty 296 MB. It wasn't the program's resource consumption that I objected to, it was the program itself: the last time I demo'd a program this formidable was Photoshop. More features are always better imho. I just need the time to master the program's numerous features (thank God they have a forum lol).

¹I'll be building a 64-BIT WINDOWS 7 PRO computer over the holiday as I say farewell to Windows XP. Armed with 64 bits and (what I call) a mature operating system, one of the first things I am eager to do is finally install the Helios License I paid for a few months ago.

Posted: Thu Dec 15, 2016 11:14 am
by bbadmin
You haven't said which version of TextPad you are using, so it would be helpful to know that as well as the encoding of your HTML files.

Posted: Sun Dec 18, 2016 4:06 am
by no.cache
@bbadmin . . . I'm using version 7.5.1 [32-BIT] of TextPad (but not for long). I've been meaning to ask about this issue for years now: Is there no way TextPad can handle Unicode glyphs (symbols, bullets, arrows etc.)? I've tried just about everything to make it display the Unicode, but it always says "We need to convert your document to ANSI" (or words to that effect) and dumps null characters in their place. I wish I could use TextPad more than I can because of this issue. In answer to your questions:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<HEAD>
<META content="text/html; charset=UTF-8" http-equiv="Content-Type">

Posted: Sun Dec 18, 2016 2:04 pm
by ben_josephs
TextPad 7 does not properly support Unicode. It can only display characters from a single 8-bit code page in each document.

Try TextPad 8.
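
To illustrate why a single 8-bit code page loses characters, here is a small Python sketch. It is not TextPad's actual conversion routine; it only shows the general effect, assuming Windows-1252 ("cp1252") as the ANSI code page. The function name to_code_page is hypothetical.

```python
def to_code_page(text: str, code_page: str = "cp1252") -> str:
    """Round-trip text through one 8-bit code page.

    Characters that have no slot in the code page cannot be stored in
    a single byte; errors="replace" substitutes "?" for each of them.
    """
    return text.encode(code_page, errors="replace").decode(code_page)


if __name__ == "__main__":
    # A right arrow (U+2192) has no slot in cp1252 and is lost.
    print(to_code_page("A\u2192B"))
    # An accented e (U+00E9) is in cp1252 and survives.
    print(to_code_page("caf\u00e9"))
```

Anything outside the chosen code page is replaced, which is the same kind of loss described above for documents converted to ANSI.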