UTF-8 and text exported from the web
Is there a way to open a text file in TextPad without losing extended characters such as é or ü (which end up as ?) when the file is UTF-8 encoded text exported from a web page?
I am using TextPad 8.16.0 64-bit (English) and TextPad 9.1.0 64-bit (French), on two separate Windows installations.
If a web page is encoded in UTF-8, its header should contain this line:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
TextPad looks for that line and reads the contents accordingly. If the line is missing, it looks in the first part of the file for byte combinations that can only be UTF-8. If both checks fail, it assumes the contents are in the default code page set in Windows. You can override that by selecting the encoding in the Open File dialog box.
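To make the order of those checks concrete, here is a rough Python sketch. It is not TextPad's actual code: the function name `guess_encoding`, the 4096-byte probe size, and the `cp1252` fallback (standing in for the Windows default code page) are all assumptions made for illustration.

```python
import codecs
import re

# Hypothetical pattern: finds charset=... inside a <meta ...> tag.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def guess_encoding(data: bytes, probe_size: int = 4096, default: str = "cp1252") -> str:
    """Guess an encoding in the order described above (illustrative only)."""
    head = data[:probe_size]

    # 1. An explicit charset declaration in the HTML header wins.
    match = META_CHARSET.search(head)
    if match:
        return match.group(1).decode("ascii").lower()

    # 2. Otherwise, if there are bytes above 0x7F, check whether they form
    #    valid UTF-8 sequences. Text in a legacy code page almost never does.
    if any(byte >= 0x80 for byte in head):
        decoder = codecs.getincrementaldecoder("utf-8")()
        try:
            # If the probe cut the file short, tolerate an incomplete
            # multi-byte sequence at the very end of the probe.
            decoder.decode(head, final=len(data) <= probe_size)
            return "utf-8"
        except UnicodeDecodeError:
            pass

    # 3. Both checks failed: fall back to the assumed default code page.
    return default
```

With this sketch, a file containing only ASCII falls through to the default, which is harmless because ASCII bytes read the same in UTF-8 and in the Windows code pages.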
For other types of UTF-8 encoded files, you can save them with a Unicode byte order mark (BOM) as the first three bytes (EF BB BF), to prevent any ambiguity.
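If the text is produced by a script rather than saved from an editor, the BOM can be added at export time. A minimal Python sketch, assuming the exporting side is under your control; the helper names `save_with_bom` and `has_utf8_bom` are made up for this example:

```python
import codecs
from pathlib import Path


def save_with_bom(path, text):
    # The "utf-8-sig" codec writes the bytes EF BB BF before the encoded text.
    Path(path).write_text(text, encoding="utf-8-sig")


def has_utf8_bom(path):
    # Compare the first three bytes of the file against the UTF-8 BOM.
    with open(path, "rb") as handle:
        return handle.read(3) == codecs.BOM_UTF8
```

Reading the file back with `encoding="utf-8-sig"` strips the BOM again, so it does not show up as a stray character at the start of the text.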
Keith MacDonald
Helios Software Solutions