UTF-8 and text exported from the web

General questions about using TextPad

ineuw
Posts: 191
Joined: Sun Mar 18, 2007 3:23 pm

Post by ineuw »

Is there a way to open a UTF-8 text file exported from a web page in TextPad without losing extended characters such as é or ü, which end up as "?"?
TextPad 8.16.0 64bit in English and TextPad 9.1.0 64bit in French, on two separate Windows installations
bbadmin
Site Admin
Posts: 1020
Joined: Mon Feb 17, 2003 8:54 pm

Post by bbadmin »

If a web page is encoded in UTF-8, its header should contain this line:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

TextPad looks for that line and reads the contents accordingly. If the line is missing, it looks in the first part of the file for combinations of characters that can only be UTF-8. If both of those checks fail, it assumes the contents are in the default code page set in Windows. You can override that by selecting the encoding in the Open File dialog box.
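
To make the order of those checks concrete, here is a rough Python sketch of the same three-step fallback. It is only an illustration of the idea described above, not TextPad's actual code: the function name guess_encoding, the 4096-byte probe window, and the cp1252 default are all assumptions made for the example.

```python
import codecs
import re

# Matches charset=... inside a <meta> tag (hypothetical, simplified pattern).
META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?\s*([A-Za-z0-9_\-]+)', re.IGNORECASE
)


def guess_encoding(data: bytes, default: str = "cp1252", probe: int = 4096) -> str:
    """Guess an encoding using the three checks described in the post.

    `default` stands in for the Windows default code page (assumed cp1252);
    `probe` is an assumed size for "the first part of the file".
    """
    head = data[:probe]

    # 1. An explicit charset declaration in a <meta> tag wins.
    match = META_CHARSET.search(head)
    if match:
        return match.group(1).decode("ascii").lower()

    # 2. Otherwise, look for non-ASCII bytes that form valid UTF-8.
    #    Pure ASCII gives no evidence either way, so it falls through.
    if any(byte >= 0x80 for byte in head):
        decoder = codecs.getincrementaldecoder("utf-8")()
        try:
            # final=False tolerates a multi-byte sequence cut off by the
            # probe window; final=True is used when head is the whole file.
            decoder.decode(head, final=len(data) <= probe)
            return "utf-8"
        except UnicodeDecodeError:
            pass

    # 3. Both checks failed: fall back to the default code page.
    return default
```

A file with no meta line and only a few accented characters beyond the probe window would fall through to the default code page in this sketch, which would produce exactly the kind of damaged characters described in the question.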

Other types of UTF-8 encoded files can be saved with a Unicode byte order mark (BOM) in the first three bytes to prevent any ambiguity.
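
If the file is produced by a script rather than saved from an editor, the BOM can be added at export time. The sketch below uses Python's standard "utf-8-sig" codec, which prepends the three BOM bytes (EF BB BF); the helper names encode_with_bom and save_with_bom are made up for this example.

```python
import codecs


def encode_with_bom(text: str) -> bytes:
    """Encode text as UTF-8 with a leading byte order mark."""
    # "utf-8-sig" writes the BOM (EF BB BF) before the encoded text.
    return text.encode("utf-8-sig")


def save_with_bom(path: str, text: str) -> None:
    """Write text to path as UTF-8 with a BOM (hypothetical helper)."""
    with open(path, "wb") as handle:
        handle.write(encode_with_bom(text))


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM."""
    return data.startswith(codecs.BOM_UTF8)
```

Reading the file back with the same "utf-8-sig" codec strips the BOM again, so the mark does not end up in the text itself.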

Keith MacDonald
Helios Software Solutions
ineuw
Posts: 191
Joined: Sun Mar 18, 2007 3:23 pm

Post by ineuw »

Thanks for the reply. I will test each possibility.