Problem with 1 or 2 UTF-8 characters

Posted: Thu Jun 22, 2017 11:53 am
by Plasm
Hi, I encountered another problem in 8.1.2 (64Bit) related to UTF-8 encoding:
If I open a file with only one or two UTF-8 characters, the file is loaded as ANSI which leads to a broken character presentation. Even if the file open dialog is used and the charset is set to UTF-8 explicitly, the file is loaded as ANSI.
If the file has at least three UTF-8 characters, everything works fine.

Example with German umlauts:
ä => Ã¤
äö => Ã¤Ã¶
äöü => äöü
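
The broken presentation is what you get when the UTF-8 bytes of an umlaut are decoded as ANSI. A minimal sketch, assuming "ANSI" here means Windows-1252 (the usual Western European code page); the helper name `misread_as_ansi` is made up for illustration. It only shows what a misread looks like, not which files TextPad misreads:

```python
def misread_as_ansi(text: str) -> str:
    """Encode text as UTF-8, then decode those same bytes as Windows-1252."""
    # Assumption: "ANSI" is code page 1252; other code pages give other garbage.
    return text.encode("utf-8").decode("cp1252")


for sample in ("ä", "äö", "äöü"):
    # Each umlaut is two bytes in UTF-8, so it turns into two ANSI characters.
    print(f"{sample} => {misread_as_ansi(sample)}")
```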

Best regards
Plasm

Posted: Mon Jun 26, 2017 8:45 am
by Plasm
To be clear: The problem occurs if there are only one/two UTF-8 characters amongst others.

Thus:
äbcdefghijklmnöpqrstuvwxyz => Ã¤bcdefghijklmnÃ¶pqrstuvwxyz (2x UTF-8)
äbcdefghijklmnöpqrstüvwxyz => äbcdefghijklmnöpqrstüvwxyz (3x UTF-8)

[No edit privilege, unfortunately]

Posted: Mon Aug 07, 2017 12:51 am
by bluesix
I can report the same issue.
When I re-open files saved as UTF-8, they open as ANSI (Latin 1) and are therefore displayed corrupted.

one, to three utf chars

Posted: Sun Sep 03, 2017 8:25 pm
by christiandittmann41
Hello!
I've tested your problem with Win10 and TP32. All is ok.
The error occurs only in the 64bit version.
So, the workaround is to use the 32bit version of TP.
Why do you think that you really need the 64bit version? This is ridiculous, no one edits such large files and in a dialog program speed is secondary...

So long
Christian, the kraut, from good old germany

Re: one, to three utf chars

Posted: Thu Sep 14, 2017 8:22 am
by AmigoJack
Thanks for this hint, although it doesn't make that much sense why a different platform compilation should behave differently in its logic.


christiandittmann41 wrote:Why do you think that you really need the 64bit version?
Because the system is 64bit and every process not being 64bit needs to be adapted, hence running effectively slower.
christiandittmann41 wrote:no one edits such large files
I do (i.e. 2.6 GiB files) and I am someone.
christiandittmann41 wrote:in a dialog program speed is secondary
So you mean speed is not important to you in your internet browser, your photo editor, your file manager and probably non-fullscreen games as well? I have my doubts.

Re: Problem with 1 or 2 UTF-8 characters

Posted: Wed Jun 05, 2019 7:50 am
by jmparatte
Plasm wrote:Hi, I encountered another problem in 8.1.2 (64Bit) related to UTF-8 encoding:
If I open a file with only one or two UTF-8 characters, the file is loaded as ANSI which leads to a broken character presentation. Even if the file open dialog is used and the charset is set to UTF-8 explicitly, the file is loaded as ANSI.
If the file has at least three UTF-8 characters, everything works fine.

Example with German umlauts:
ä => Ã¤
äö => Ã¤Ã¶
äöü => äöü

Best regards
Plasm
My workaround is to insert a comment containing non-ASCII characters at the beginning of the file, for example in a PHP file:

<?php //éèà
...
?>
The three non-ASCII characters "éèà", placed very near the beginning of the file, are analyzed and decoded correctly, so the file is detected as UTF-8.
If the same sequence is placed too far from the beginning, the encoding could be incorrectly determined.
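
How TextPad actually detects the encoding is not documented in this thread. Purely as a hypothetical illustration of why count and position could matter, here is a sketch of a detector that looks only at a prefix of the file and requires a minimum number of multi-byte characters. The function name `guess_encoding` and the parameters `sample_size` and `min_sequences` are invented, and the thresholds are guesses, not TextPad's values:

```python
import codecs


def guess_encoding(data: bytes, sample_size: int = 64, min_sequences: int = 3) -> str:
    """Hypothetical heuristic: not TextPad's real algorithm."""
    sample = data[:sample_size]
    # An incremental decoder tolerates a multi-byte sequence cut off at the sample end.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        text = decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return "cp1252"  # invalid UTF-8 in the sample: fall back to ANSI
    non_ascii = sum(1 for ch in text if ord(ch) > 0x7F)
    # Too little evidence inside the sample also falls back to ANSI.
    return "utf-8" if non_ascii >= min_sequences else "cp1252"


print(guess_encoding("<?php //éèà\n".encode("utf-8")))
print(guess_encoding("äeiöu".encode("utf-8")))
print(guess_encoding(("x" * 100 + "éèà").encode("utf-8")))
```

Under these assumed thresholds, the marker comment at the top is enough evidence, while two umlauts, or three characters beyond the sampled prefix, are not.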

Posted: Wed Jun 05, 2019 7:51 am
by Plasm
Problem still persists in 8.2.0 (64 Bit).

Test case:
- Create a new file
- Write: "äeiöu"
- Save the file as UTF-8 without BOM
- Close the file (or Textpad itself)
- Open the file by double-clicking on it, from the open dialog or via dragging it into Textpad (doesn't matter)
- Result: Textpad displays "Ã¤eiÃ¶u"

The file is saved correctly (tested with other editors). The problem occurs when opening the file.
If there are more than 2 UTF-8 characters, everything is fine. For example: "äeiöü" results in "äeiöü".
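
For anyone who wants to reproduce this without relying on an editor to save the files, the two test strings can be written and checked with a short script. This is only a sketch: the file names and helper names are made up, and the files go into a temporary directory:

```python
import codecs
import tempfile
from pathlib import Path


def write_utf8_no_bom(path: Path, text: str) -> bytes:
    """Write text as UTF-8 without a BOM and return the bytes read back from disk."""
    path.write_bytes(text.encode("utf-8"))  # "utf-8" adds no BOM, unlike "utf-8-sig"
    return path.read_bytes()


def is_valid_utf8_without_bom(data: bytes) -> bool:
    """True if data decodes strictly as UTF-8 and does not start with a BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


out_dir = Path(tempfile.mkdtemp())
for name, text in (("two_umlauts.txt", "äeiöu"), ("three_umlauts.txt", "äeiöü")):
    data = write_utf8_no_bom(out_dir / name, text)
    print(out_dir / name, data.hex(" "), is_valid_utf8_without_bom(data))
```

Both files are valid UTF-8 on disk, so any difference in how they are displayed comes from the detection at open time.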

BTW: I saved the file as .txt. The Text document class has UTF-8 charset and no BOM as default settings, if that matters.

Best regards
Plasm

Posted: Wed Jun 05, 2019 8:48 am
by jmparatte
Plasm wrote:...example: "äeiöü" results in "äeiöü"...
Decoding on open also fails with three non-ASCII characters when they are not consecutive.

Posted: Wed Jun 05, 2019 2:59 pm
by ben_josephs
In https://forums.textpad.com/viewtopic.php?t=13253 I suggested:

    Save your session in a workspace and open the file by opening the workspace.

Is that a suitable solution?

(I use workspaces for all my non-transient editing work.)