V 5.0.3 : cannot open UTF-8 file without error

iltis
Posts: 2
Joined: Thu Jan 10, 2008 2:14 pm
Location: Switzerland

Post by iltis »

Hi,
I've got a UTF-8 encoded file.
When I open it, the editor (TextPad 5.0.3) says:
WARNING: <file> contains characters that do not exist in code page 1252 (ANSI - Latin I). They will be converted to the system default character, if you click OK.

But I don't want them to be converted!
Doing that also makes the BOM disappear.

I read everywhere that "TextPad automatically detects... UTF-8", so why doesn't this apply to my installation?
Opening the file (with conversion) and using Save As with UTF-8 does not add the BOM.
Creating a new UTF-8 file does not add the BOM either.
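To check outside the editor whether a file really starts with the BOM, and to write one that does, something like the following could be used. This is only a sketch in Python, not anything TextPad provides; the function names are made up for illustration, and it relies on Python's standard `utf-8-sig` codec, which writes the three-byte BOM (EF BB BF) on save and strips it on load.

```python
import codecs

def has_utf8_bom(path):
    """Return True if the file starts with the three-byte UTF-8 BOM (EF BB BF)."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8

def write_utf8_with_bom(path, text):
    """Write text as UTF-8; the 'utf-8-sig' codec prepends the BOM."""
    # newline="" keeps line endings exactly as given (e.g. PC-style \r\n).
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)

def read_utf8(path):
    """Read UTF-8 text; 'utf-8-sig' strips a leading BOM if one is present."""
    with open(path, "r", encoding="utf-8-sig", newline="") as f:
        return f.read()
```

Running `has_utf8_bom` on the file before and after a save in the editor would show whether the BOM was actually dropped.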

The document class I'm using has:
- the tick on "Write Unicode and UTF-8 BOM"
- default encoding on UTF-8
- create new file as PC

Can anyone tell me what I am doing wrong? Or is this yet another feature?

Thanks a lot for your help....
tranglos
Posts: 2
Joined: Mon Jul 05, 2004 8:58 pm

Post by tranglos »

You are not doing anything wrong; you have just come across a limitation in TextPad. TextPad correctly detects UTF-8 files (as the message indicates), but the editor itself is not capable of editing Unicode. So the UTF-8 file must be converted to the ANSI character set. TextPad will then convert ANSI back to UTF-8 when saving.

Often this works fine, but some documents contain characters that are not present in your current ANSI codepage (selected in Control Panel, Regional Options). They simply cannot be represented using that codepage at all. So TextPad must convert them to what it calls the "system default character", which really means these characters will be lost. Either that, or it cannot display the file at all.
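The effect can be reproduced outside TextPad. The sketch below uses Python's `cp1252` codec as a stand-in for the ANSI codepage; the helper name `lost_characters` and the sample string are invented for illustration, and the `?` that Python substitutes is only an assumption about what a "system default character" might look like, not TextPad's documented behaviour.

```python
def lost_characters(text, codepage="cp1252"):
    """Return the distinct characters in text that the code page cannot encode."""
    lost = []
    for ch in text:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            if ch not in lost:
                lost.append(ch)
    return lost

sample = "Dvořák, Łódź, café"

# Characters with no slot in code page 1252.
print(lost_characters(sample))

# A round trip through the code page: unencodable characters become '?'.
print(sample.encode("cp1252", errors="replace").decode("cp1252"))
```

Once the replacement has happened, the original characters cannot be recovered from the converted text, which is why saving back to UTF-8 does not restore them.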

If you often need to edit UTF-8 or UTF-16 files, you will need to find a text editor that fully supports these encodings - there are several free and shareware ones, but the quality of Unicode support varies.
iltis
Posts: 2
Joined: Thu Jan 10, 2008 2:14 pm
Location: Switzerland

Post by iltis »

Thanks for your answer. I've been looking for which codes are responsible for this dirty error message and found these:

What is it:                             Unicode (Hex):          UTF-8 (Hex)
Latin Small Letter T with Caron            00165                 C5A5
Latin Capital Letter C with Caron          0010C                 C48C
Latin Small Letter C with Caron            0010D                 C48D
Latin Small Letter C with Acute            00107                 C487
Latin Capital Letter D with Caron          0010E                 C48E
Latin Small Letter D with Caron            0010F                 C48F
Latin Small Letter L with Acute            0013A                 C4BA
Latin Capital Letter L with Caron          0013D                 C4BD
Latin Small Letter L with Caron            0013E                 C4BE
Latin Small Letter N with Caron            00148                 C588
Latin Small Letter O with double Acute     00151                 C591
Latin Small Letter U with double Acute     00171                 C5B1
Latin Small Letter S with Caron            00161                 C5A1
This is just an excerpt of what is missing I suppose.
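A list like this can be cross-checked with a few lines of Python. The sketch below is only an illustration: `describe` is a hypothetical helper, and Python's `cp1252` codec is assumed to match the code page TextPad reports. It prints each character's Unicode name, code point, UTF-8 bytes, and whether code page 1252 can encode it. In Python's table, U+0161 (small s with caron) is present in code page 1252, so by this check it would not be one of the characters causing the warning.

```python
import unicodedata

# Code points taken from the table above.
CODE_POINTS = [0x0165, 0x010C, 0x010D, 0x0107, 0x010E, 0x010F, 0x013A,
               0x013D, 0x013E, 0x0148, 0x0151, 0x0171, 0x0161]

def describe(cp, codepage="cp1252"):
    """Return (name, code point hex, UTF-8 hex, encodable in codepage?)."""
    ch = chr(cp)
    try:
        ch.encode(codepage)
        in_codepage = True
    except UnicodeEncodeError:
        in_codepage = False
    return (unicodedata.name(ch), f"{cp:04X}",
            ch.encode("utf-8").hex().upper(), in_codepage)

for cp in CODE_POINTS:
    name, code, utf8_hex, ok = describe(cp)
    print(f"{name:<45} U+{code}  {utf8_hex}  in cp1252: {ok}")
```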

Can anyone from TextPad tell me when these codes might be corrected?
TextPad has much better features than other editors, so it would be great if one could really use it for editing UTF-8 encoded files as well:
- First, allow reading UTF-8 files.
- Then also allow writing them back as UTF-8 (incl. BOM).

Thanks for your answer.......
bveldkamp

Post by bveldkamp »
