Problem with files encoded with Windows-1251 Cyrillic

General questions about using TextPad

Moderators: AmigoJack, bbadmin, helios, Bob Hansen, MudGuard

Fikri
Posts: 4
Joined: Tue Jul 23, 2019 5:52 pm
Location: Bulgaria

Problem with files encoded with Windows-1251 Cyrillic

Post by Fikri »

When I set the default encoding of all Document Classes to UTF-8 and open a file encoded in Windows-1251 (Cyrillic), TextPad 8.16.1 does not display it correctly (it shows question marks).

TextPad also reports the wrong encoding for the file in the status bar (and in View -> Document Properties, Alt+Enter): UTF-8. This is an error. When I open the same file in Notepad or Notepad++, they display it correctly, and their status bars correctly show: ANSI Windows-1251.

I use Notepad++ to convert the file to UTF-8 and then TextPad 8.16.1 displays it correctly.

I think you should fix this error.
new Thread(this).start();
AmigoJack
Posts: 500
Joined: Sun Oct 30, 2016 4:28 pm
Location: グリーン ヒル ゾーン

Re: Problem with files encoded with Windows-1251 Cyrillic

Post by AmigoJack »

Can you please attach said file here so we can reproduce your problem? Both TextPad and Notepad++ can only guess the encoding of a text file that has no BOM, so your file might have too many portions that aren't distinctive enough (e.g. Latin letters at the beginning).
bbadmin
Site Admin
Posts: 820
Joined: Mon Feb 17, 2003 8:54 pm

Re: Problem with files encoded with Windows-1251 Cyrillic

Post by bbadmin »

It is not possible to determine heuristically whether a file is encoded in Windows-1251, or in any other encoding except the UTF ones. (An exception: if the program reading the file knows its language, it can check by attempting to look up words in a lexicon of that language.) When you specify that the encoding of your files is UTF-8, TextPad attempts to decode them from UTF-8 into its internal encoding, but that fails because they are not in UTF-8.
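To illustrate the failure described above, here is a minimal Python sketch (not TextPad's code; the sample string is made up for the example). It encodes a short Cyrillic string as Windows-1251 and then tries to read the bytes back as UTF-8, both strictly and with replacement characters, which is roughly what produces the unreadable display.

```python
# Hypothetical sample text; any Cyrillic string behaves similarly.
text = "Привет, свят"

# Bytes as they would be stored in a Windows-1251 file.
raw = text.encode("cp1251")

# A strict UTF-8 decode rejects these bytes.
try:
    raw.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False

# A lenient decode substitutes U+FFFD for every invalid sequence.
lossy = raw.decode("utf-8", errors="replace")

# Decoding with the correct code page recovers the original text.
recovered = raw.decode("cp1251")

print(decoded_as_utf8)
print(lossy)
print(recovered)
```

The ASCII characters (the comma and the space) survive the wrong decode, which is why a file that is mostly Latin text can look almost correct under the wrong encoding.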

If the default code page of your PC is 1251, you should set the encoding in TextPad to ANSI. If you want to save the files as UTF-8, select that encoding in the Save As dialog box. When ANSI is set and a UTF-8 file is opened, TextPad attempts to detect the encoding. This fails if no UTF-8-specific character combinations are present, in which case the default encoding is assumed. To force UTF-8 recognition, you can save files with the Unicode byte order mark (BOM) in the first 3 bytes.
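The detection order described above can be sketched in Python as follows. This is only an illustration of the general idea, not TextPad's actual algorithm; the function name `guess_encoding` and the `ansi_fallback` parameter are invented for the example, and code page 1251 is assumed as the ANSI default.

```python
import codecs


def guess_encoding(data: bytes, ansi_fallback: str = "cp1251") -> str:
    """Guess an encoding: BOM first, then UTF-8 validity, else the ANSI default."""
    # A UTF-8 BOM (EF BB BF) in the first 3 bytes settles the question.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Pure ASCII has no UTF-8-specific byte sequences, so nothing
    # distinguishes it from ANSI; the default is assumed.
    if data.isascii():
        return ansi_fallback
    # Non-ASCII bytes that form valid UTF-8 sequences are taken as UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return ansi_fallback
    return "utf-8"


print(guess_encoding("Здравей".encode("cp1251")))
print(guess_encoding("Здравей".encode("utf-8")))
print(guess_encoding(b"plain ascii"))
print(guess_encoding(codecs.BOM_UTF8 + b"plain ascii"))
```

Note that the fallback is never verified: any byte sequence that is not valid UTF-8 is simply labeled with the ANSI default, which matches the point that single-byte code pages cannot be told apart without knowing the language.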

I hope this helps.
Fikri
Posts: 4
Joined: Tue Jul 23, 2019 5:52 pm
Location: Bulgaria

Re: Problem with files encoded with Windows-1251 Cyrillic

Post by Fikri »

AmigoJack wrote: Mon Apr 17, 2023 7:29 am
Can you please attach said file here to reproduce your problem?
See this file: TestFileWin-1251.txt, attached here.

I got this error:

ERROR:
Invalid file extension: TestFileWin-1251.txt
Last edited by AmigoJack on Mon Apr 17, 2023 11:02 pm, edited 1 time in total.
Reason: reducing full quote to relevant quote
bbadmin
Site Admin
Posts: 820
Joined: Mon Feb 17, 2003 8:54 pm

Re: Problem with files encoded with Windows-1251 Cyrillic

Post by bbadmin »

The forum now allows text files to be attached.

Does my previous response not solve your problem?
Fikri
Posts: 4
Joined: Tue Jul 23, 2019 5:52 pm
Location: Bulgaria

Re: Problem with files encoded with Windows-1251 Cyrillic

Post by Fikri »

By default, the encoding on my Windows 10 (and 11) systems is UTF-8. I set TextPad's default encoding for all Document Classes to ANSI, and now TextPad correctly displays both ANSI-encoded and UTF-8-encoded files. The problem is solved. Thanks.
new Thread(this).start();
Fikri
Posts: 4
Joined: Tue Jul 23, 2019 5:52 pm
Location: Bulgaria

Re: Problem with files encoded with Windows-1251 Cyrillic

Post by Fikri »

See this file: TestFileWin-1251.txt, attached here.
TestFileWin-1251.txt
new Thread(this).start();