UTF-8 and text exported from the web
Posted: Mon Jan 18, 2016 4:12 am
by ineuw
Is there a way to open a text file in TextPad without losing extended characters such as é or ü (which end up as ?) when the file is UTF-8 text exported from a web page?
Posted: Mon Jan 18, 2016 9:26 am
by bbadmin
If a web page is encoded in UTF-8, its header should contain this line:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
TextPad looks for that and reads the contents accordingly. If that line is missing, it looks in the first part of the file for combinations of characters that can only be UTF-8. If both of those checks fail, it assumes the contents are in the default code page set in Windows. You can override that by selecting the encoding on the Open File dialog box.
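As a rough illustration of that order of checks (not TextPad's actual code), here is a minimal Python sketch. The function name `guess_encoding`, the 4096-byte sample size, and the regular expression are all assumptions made for this example; the fallback uses whatever Python reports as the system's preferred encoding, which on Windows is normally the default code page.

```python
import codecs
import locale
import re

# Hypothetical pattern: finds charset=... inside a <meta ...> tag.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def guess_encoding(data: bytes,
                   fallback: str = locale.getpreferredencoding(False),
                   sample_size: int = 4096) -> str:
    """Guess an encoding using the three checks described above."""
    head = data[:sample_size]

    # 1. An explicit charset declaration in a <meta> tag wins.
    match = META_CHARSET.search(head)
    if match:
        return match.group(1).decode("ascii").lower()

    # 2. Non-ASCII bytes that form valid UTF-8 sequences are taken as UTF-8.
    if any(byte >= 0x80 for byte in head):
        decoder = codecs.getincrementaldecoder("utf-8")()
        try:
            # final=False tolerates a sequence cut off by the sample boundary;
            # if the sample is the whole file, a cut-off sequence is an error.
            decoder.decode(head, final=len(data) <= sample_size)
            return "utf-8"
        except UnicodeDecodeError:
            pass

    # 3. Otherwise assume the system default (the code page on Windows).
    return fallback
```

Pure ASCII content falls through to the fallback, which is harmless because ASCII bytes read the same in UTF-8 and in the usual Windows code pages. The result can be passed straight to `bytes.decode`, for example `data.decode(guess_encoding(data))`.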
For other types of UTF-8 encoded files, you can save them with a Unicode byte order mark (BOM) in the first three bytes to prevent any ambiguity.
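If the file is produced by a script rather than saved from an editor, the BOM can be added at export time. The sketch below uses Python's standard `utf-8-sig` codec, which writes the three bytes EF BB BF before the text; the helper names `save_with_bom` and `has_utf8_bom` are made up for this example.

```python
import codecs
from pathlib import Path


def save_with_bom(path: str, text: str) -> None:
    """Write text as UTF-8 preceded by the three-byte BOM (EF BB BF)."""
    Path(path).write_text(text, encoding="utf-8-sig")


def has_utf8_bom(path: str) -> bool:
    """Return True if the file starts with the UTF-8 BOM."""
    with open(path, "rb") as handle:
        return handle.read(3) == codecs.BOM_UTF8
```

Reading the file back with `encoding="utf-8-sig"` strips the BOM again, so it does not show up as a stray character at the start of the text.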
Keith MacDonald
Helios Software Solutions
Posted: Mon Jan 18, 2016 2:55 pm
by ineuw
Thanks for the reply. I will test each possibility.