8.0.2 How to get Tool Output to show Unicode?

bbadmin
Site Admin
Posts: 879
Joined: Mon Feb 17, 2003 8:54 pm

Post by bbadmin »

When the output of a Run command is captured, it is always redirected to a temporary file which is read into the Tool Output document when the tool terminates. At that point, heuristics are used to determine the codepage of the file. In your case, this fails to detect UTF-8, because there's insufficient data to distinguish that from ANSI text. This can be fixed by preceding the output with a UTF-8 BOM.
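The BOM workaround described above can be sketched in a few lines. This is only an illustration in Python, not anything TextPad ships; the helper name `write_with_bom` is hypothetical, and it assumes the tool is free to write raw bytes to its standard output.

```python
import codecs
import sys


def write_with_bom(text, stream=None):
    """Write text as UTF-8 bytes, preceded by the UTF-8 BOM (EF BB BF).

    `stream` is any binary stream; it defaults to the raw stdout buffer,
    which is what a captured tool would normally write to.
    """
    out = stream if stream is not None else sys.stdout.buffer
    # The three BOM bytes come first, so a reader sees them before
    # any of the actual output.
    out.write(codecs.BOM_UTF8 + text.encode("utf-8"))
    out.flush()
```

Calling `write_with_bom("José\n")` once at the start of the tool's output should be enough; later writes would then use plain UTF-8 without repeating the BOM.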

If you add the command to the Tools menu instead, you can choose to redirect to a temporary file using the "Suppress output until completed" option, which will have the same results. However, without that option, the output is piped directly into TextPad and converted to the codepage specified for the Tool Output document class.

While checking this out, I noticed that the codepage displayed on the status bar is not updated when the output has been captured via a pipe. It remains as whatever was set when output redirected to a temporary file was last loaded. This is harmless, but will be fixed in the next release.

I hope this helps.

Keith MacDonald
Helios Software Solutions
chrisjj
Posts: 149
Joined: Sat Jan 21, 2006 10:32 pm

Post by chrisjj »

bbadmin wrote:heuristics are used to determine the codepage of the file.
I assume that's a bug. The encoding should be determined by my setting in Preferences: http://i.imgur.com/nV9IB3A.png
bbadmin wrote:In your case, this fails to detect UTF-8, because there's insufficient data to distinguish that from ANSI text. This can be fixed by preceding the output with a UTF-8 BOM.
No, that doesn't fix it. That works around it by making me change my program, and creating output that is explicitly not recommended by the Unicode Standard.
bbadmin wrote:If you add the command to the Tools menu instead, you can choose to redirect to a temporary file using the "Suppress output until completed" option, which will have the same results. However, without that option, the output is piped directly into TextPad and converted to the codepage specified for the Tool Output document class.
I confirm 'without that option' works for an External Tool created as a Program. I.e. 8.1.0 has remedied the failure in that particular case.

[Image]
Thanks.

The failure remains in the External Tool created as a DOS Command:

[Image]
though that may be due to CMD.EXE rather than any fault of TextPad.

Are there plans to remedy TextPad Run command's failure to respect the Tool Output document class encoding setting?
AmigoJack
Posts: 533
Joined: Sun Oct 30, 2016 4:28 pm
Location: グリーン ヒル ゾーン

Post by AmigoJack »

bbadmin wrote:At that point, heuristics are used to determine the codepage of the file.
When opening a file (CTRL+O) I have a combobox to select an encoding[1] - why not have it also for
  1. the "Run" command, and
  2. the "Find In Files" command, which presumably has the same dilemma? It can never know for certain which encoding is used in a particular file. Does it even stick to the filename extensions defined in the Document Classes?
In both cases such a combobox would solve more problems than being forced to use workarounds.

Oh, and "DOS" has not existed anywhere since Windows 2000[2]. You can't show me any instance of Windows that has the caption "DOS". Even back in the DOS-based Windows versions you were executing a program inside Windows, not DOS.



[1] "Codepage" might still sound correct to you through API calls, but this term is Windows slang, and any UTF is surely not a codepage but an encoding. The naming throughout TextPad is inconsistent: opening a file through CTRL+O has the caption "Encoding", but the window with the "More..." list has the caption "Code Page"[3]. In Preferences it's called "Default encoding" again.

[2] Rename "DOS" to "CMD" and the caption "DOS Command" to e.g. "Use CMD". Unless, of course, the majority of users would be confused and they also still think of running an operating system in an operating system.

[3] Seeing this list I wonder if you will also add support for codepages 12000 and 12001 (UTF-32 and UTF-32BE) at a later time.
bbadmin
Site Admin
Posts: 879
Joined: Mon Feb 17, 2003 8:54 pm

Post by bbadmin »

TextPad's user interface correctly uses the term encoding, but the APIs doing the work internally convert from one "codepage" to another. Even though UTF-8 and UTF-16 encodings are not the same as codepages, Windows assigns codepage numbers to them for consistency.

CMD.EXE always outputs in the DOS/OEM code page, as I said in a previous reply. This allows it to support the line drawing characters for constructing forms.
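To illustrate why OEM output looks wrong when read as ANSI, here is a small Python sketch. It assumes code page 437 as the OEM code page and Windows-1252 as the ANSI code page; the actual pair depends on the system locale, and the helper name `compare_code_pages` is hypothetical.

```python
def compare_code_pages(raw, oem="cp437", ansi="cp1252"):
    """Decode the same bytes under an OEM and an ANSI code page.

    Returns a tuple (text_as_oem, text_as_ansi). The default code pages
    are assumptions; other locales use other pairs (e.g. cp850).
    """
    return raw.decode(oem), raw.decode(ansi)


# Byte 0xC4 is a horizontal line-drawing character in cp437,
# but the letter A with diaeresis in Windows-1252.
line_as_oem, line_as_ansi = compare_code_pages(b"\xc4")

# "é" is byte 0x82 in cp437; the same byte is a low quotation mark
# in Windows-1252.
e_as_oem, e_as_ansi = compare_code_pages("é".encode("cp437"))
```

The bytes are identical in both cases; only the code page used to interpret them changes what appears on screen.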

The ideal solution is to have an encoding option on the Run dialog box and for each user tool. We wanted to implement a workable solution in the latest release, but it was too late in the development cycle to commit to the necessary user interface changes.

Note that you can get most of the flexibility of the Run command in a User Tool by judicious use of tool parameter macros. For instance, use $File with PHP to have it run the code in the active document, and/or $Prompt to prompt for parameters at runtime.

In TextPad 8, Find in Files uses heuristics to determine the encoding of each document it searches. TextPad 7 assumes they are all in the ANSI codepage, because that's what it uses internally.

Keith MacDonald
Helios Software Solutions
chrisjj
Posts: 149
Joined: Sat Jan 21, 2006 10:32 pm

Post by chrisjj »

Keith, are there plans to remedy TextPad Run command's failure to respect the Tool Output document class encoding setting?
bbadmin wrote:In TextPad 8, Find in Files uses heuristics to determine the encoding of each document it searches.
Yowch. Thanks for the warning. That should really go in the user docs.
AmigoJack
Posts: 533
Joined: Sun Oct 30, 2016 4:28 pm
Location: グリーン ヒル ゾーン

Post by AmigoJack »

It just confirms what I already found out:
AmigoJack wrote:I can only assume that á alone is not enough for Textpad to think the encoding is meant to be UTF-8 - it just thinks it's ANSI. But my two additional Katakanas are enough as an indication to UTF-8.
chrisjj
Posts: 149
Joined: Sat Jan 21, 2006 10:32 pm

Post by chrisjj »

AmigoJack wrote:It just confirms what I already found out:
AmigoJack wrote:I can only assume that á alone is not enough for Textpad to think the encoding is meant to be UTF-8 - it just thinks it's ANSI. But my two additional Katakanas are enough as an indication to UTF-8.
It's poor heuristics that concludes JosÃ© is more likely than José.

Regardless, I think guessing of encoding is completely unacceptable for use of TextPad in program development. I look forward to a TextPad version that just does what it is told, and does not guess.
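The ambiguity under discussion can be reproduced with a short Python sketch. It assumes Windows-1252 as the ANSI code page; the function name `interpretations` is hypothetical and this says nothing about how TextPad's detector actually works.

```python
def interpretations(data, ansi="cp1252"):
    """Return the same bytes decoded as UTF-8 and as an ANSI code page.

    Both decodes succeed for short Western European text, so a detector
    that only sees these bytes has to guess which reading was intended.
    """
    return {
        "utf-8": data.decode("utf-8"),
        ansi: data.decode(ansi),
    }


# The five bytes 4A 6F 73 C3 A9 are valid in both encodings.
result = interpretations("José".encode("utf-8"))
```

Here `result["utf-8"]` is the intended name, while `result["cp1252"]` shows the two-character sequence that appears when the bytes C3 A9 are read as ANSI.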
AmigoJack
Posts: 533
Joined: Sun Oct 30, 2016 4:28 pm
Location: グリーン ヒル ゾーン

Post by AmigoJack »

chrisjj wrote:It's poor heuristics that concludes JosÃ© is more likely than José.
You seem to never have written code for an encoding sniffer, especially for European texts - it's far from being easy. The Win32 API fails at this as well.

(@bbadmin: if you're actually using IMultiLanguage2::DetectInputCodepage then experiment with inflating small texts, e.g. by repeating the same text until it grows to more than 400 characters - then chances are higher to detect the correct codepage).
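A minimal sketch of that inflation step, in Python for illustration only. The helper name `inflate` is hypothetical, the threshold of 400 is taken from the suggestion above, and the sketch counts bytes rather than characters; the call to the detector itself is deliberately left out.

```python
def inflate(sample, min_len=400):
    """Repeat `sample` until the result is longer than `min_len` bytes.

    The repeated buffer would be handed to the encoding detector in
    place of the short original; an empty sample is returned unchanged.
    """
    if not sample:
        return sample
    # Smallest repeat count whose total length strictly exceeds min_len.
    repeats = min_len // len(sample) + 1
    return sample * repeats
```

Repetition adds no new information, so whether it improves detection depends entirely on how the detector weighs its input.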
chrisjj
Posts: 149
Joined: Sat Jan 21, 2006 10:32 pm

Post by chrisjj »

AmigoJack wrote:
chrisjj wrote:It's poor heuristics that concludes JosÃ© is more likely than José.
You seem to never have written code for an encoding sniffer, especially for European texts
Correct. But that doesn't make TextPad's implementation less poor.
AmigoJack wrote: - it's far from being easy.
I believe it's far from easy to achieve a satisfactory result. Which is one reason I'd never bother to try it. ;-)
AmigoJack wrote:The Win32 API fails at this as well.
No problem to me. I don't use that API feature.

I do however use TextPad. Which is why I would like it to stop attempting to guess the encoding when I have explicitly set the encoding.
bbadmin
Site Admin
Posts: 879
Joined: Mon Feb 17, 2003 8:54 pm

Post by bbadmin »

Your options for specifying the encoding of output from tools in TextPad 8.1 are:
  1. Make the tool write a UTF-8 or UTF-16 BOM.
  2. Add it as a command to the Tools menu and set the required encoding for the Tool Output Document class.
  3. Make the tool write

    <head><meta charset="UTF-8"></head>
    (or the HTML 4.01 equivalent).
Sorry if those options still leave you stymied.
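For readers wondering how a BOM resolves the ambiguity, here is a sketch of BOM sniffing in Python. It is not TextPad's implementation; the function name `sniff_bom` is hypothetical, and only the UTF-8 and UTF-16 BOMs mentioned in option 1 are handled.

```python
import codecs

# Known byte order marks and the encoding each one signals.
# UTF-32 is intentionally left out: its little-endian BOM begins with
# the same two bytes as the UTF-16 little-endian BOM.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def sniff_bom(data):
    """Return (encoding, bom_length) if data starts with a known BOM.

    Returns (None, 0) when no BOM is present, in which case the caller
    has to fall back to a configured default or to heuristics.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A caller would decode `data[bom_length:]` with the returned encoding, so the BOM itself never appears in the displayed text.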
Last edited by bbadmin on Tue Nov 15, 2016 4:29 pm, edited 1 time in total.
chrisjj
Posts: 149
Joined: Sat Jan 21, 2006 10:32 pm

Post by chrisjj »

bbadmin wrote:Your options for specifying the encoding of output from tools in TextPad 8.1 are:
Are there plans to remedy TextPad Run command's failure to respect the Tool Output document class encoding setting?
bbadmin
Site Admin
Posts: 879
Joined: Mon Feb 17, 2003 8:54 pm

Post by bbadmin »

The intention is to add an option to the Run dialog box to specify the encoding. This will also be available for each tool added to the Tools menu.
chrisjj
Posts: 149
Joined: Sat Jan 21, 2006 10:32 pm

Post by chrisjj »

bbadmin wrote:The intention is to add an option to the Run dialog box to specify the encoding. This will also be available for each tool added to the Tools menu.
I'd really appreciate an answer to the question.

The question is: Are there plans to remedy TextPad Run command's failure to respect the Tool Output document class encoding setting?
bbadmin
Site Admin
Posts: 879
Joined: Mon Feb 17, 2003 8:54 pm

Post by bbadmin »

Why would you want it to do that, rather than be able to specify the encoding on the Run dialog box?
chrisjj
Posts: 149
Joined: Sat Jan 21, 2006 10:32 pm

Post by chrisjj »

bbadmin wrote:Why would you want it to do that, rather than be able to specify the encoding on the Run dialog box?
To set the default - as that Preferences option says.