8.0.2 How to get Tool Output to show Unicode?
When the output of a Run command is captured, it is always redirected to a temporary file which is read into the Tool Output document when the tool terminates. At that point, heuristics are used to determine the codepage of the file. In your case, this fails to detect UTF-8, because there's insufficient data to distinguish that from ANSI text. This can be fixed by preceding the output with a UTF-8 BOM.
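For illustration, here is a minimal sketch of a tool that precedes its output with a UTF-8 BOM. It is written in Python as an assumption; the original poster's tool and language are not known, and the helper name `write_with_bom` is hypothetical.

```python
import codecs
import io


def write_with_bom(stream, text):
    """Write a UTF-8 BOM followed by the UTF-8 encoded text to a binary stream."""
    stream.write(codecs.BOM_UTF8)        # the three bytes EF BB BF
    stream.write(text.encode("utf-8"))


# Demonstrated on an in-memory buffer. A real tool would pass
# sys.stdout.buffer instead, so that the captured output starts with the BOM.
buffer = io.BytesIO()
write_with_bom(buffer, "José\n")
captured = buffer.getvalue()
```

The BOM has to be written as raw bytes before any other output, which is why the sketch writes to a binary stream rather than relying on the console's text encoding.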
If you add the command to the Tools menu instead, you can choose to redirect to a temporary file using the "Suppress output until completed" option, which will have the same results. However, without that option, the output is piped directly into TextPad and converted to the codepage specified for the Tool Output document class.
While checking this out, I noticed that the codepage which is displayed on the statusbar is not updated when the output has been captured via a pipe. It remains as whatever was set when loading output redirected to a temporary file. This is harmless, but will be fixed in the next release.
I hope this helps.
Keith MacDonald
Helios Software Solutions
bbadmin wrote: heuristics are used to determine the codepage of the file.
I assume that's a bug. The encoding should be determined by my setting in Preferences: http://i.imgur.com/nV9IB3A.png
bbadmin wrote: In your case, this fails to detect UTF-8, because there's insufficient data to distinguish that from ANSI text. This can be fixed by preceding the output with a UTF-8 BOM.
No, that doesn't fix it. That works around it by making me change my program and create output that is explicitly not recommended by the Unicode Standard.
bbadmin wrote: If you add the command to the Tools menu instead, you can choose to redirect to a temporary file using the "Suppress output until completed" option, which will have the same results. However, without that option, the output is piped directly into TextPad and converted to the codepage specified for the Tool Output document class.
I confirm that 'without that option' works for an External Tool created as a Program, i.e. 8.1.0 has remedied the failure in that particular case.
Thanks.
The failure remains in the External Tool created as a DOS Command:
though that may be due to CMD.EXE rather than any fault of TextPad.
Are there plans to remedy the TextPad Run command's failure to respect the Tool Output document class encoding setting?
bbadmin wrote: At that point, heuristics are used to determine the codepage of the file.
When opening a file (CTRL+O) I have a combobox to select an encoding[1] - why not have it also for
- the "Run" command, and
- the "Find In Files" command, which presumably has the same dilemma? It can never know for sure which encoding is used in a particular file. Does it even stick to the filename extensions defined in the Document Classes?
Oh, and "DOS" nowhere exists anymore since Windows 2000[2]. You can't show me any instance of Windows that has the caption "DOS". Even back in the DOS-based Windows versions you were executing a program inside Windows, not DOS.
[1] "Codepage" might still sound correct to you thru API calls, but this term is Windows slang and any UTF is surely no codepage, but instead an encoding. The naming thru TextPad is inconsistent: opening a file thru CTRL+O has the caption "Encoding", but the window with the "More..." list has the caption "Code Page"[3]. In preferences it's called "Default encoding" again.
[2] Rename "DOS" to "CMD" and the caption "DOS Command" to e.g. "Use CMD". Unless, of course, the majority of users would be confused and they also still think of running an operating system in an operating system.
[3] Seeing this list I wonder if you will also add support for codepages 12000 and 12001 (UTF-32 and UTF-32BE) at a later time.
TextPad's user interface correctly uses the term encoding, but the APIs doing the work internally convert from one "codepage" to another. Even though UTF-8 and UTF-16 encodings are not the same as codepages, Windows assigns codepage numbers to them for consistency.
CMD.EXE always outputs in the DOS/OEM code page, as I said in a previous reply. This allows it to support the line drawing characters for constructing forms.
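To make the OEM versus ANSI difference concrete, here is a small Python sketch. It assumes code page 437 as the OEM code page and Windows-1252 as the ANSI code page, which are common on US-English systems; the actual pair depends on the Windows locale.

```python
# Assumed code pages: cp437 (OEM) and cp1252 (ANSI). Both depend on locale.
OEM = "cp437"
ANSI = "cp1252"

# Text as CMD.EXE might emit it: an accented letter and a line drawing character.
oem_bytes = "José ─".encode(OEM)      # é is byte 0x82, ─ is byte 0xC4

as_oem = oem_bytes.decode(OEM)        # decoded with the code page it was written in
as_ansi = oem_bytes.decode(ANSI)      # decoded as if it were ANSI text

print(repr(as_oem))
print(repr(as_ansi))
```

The same bytes are valid in both code pages, so nothing in the data itself says which one was intended. Only the reader's choice of code page determines whether the accented letter and the line drawing character survive.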
The ideal solution is to have an encoding option on the Run dialog box and for each user tool. However, we wanted to implement a workable solution in the latest release, but it was too late in the development cycle to commit to the necessary user interface changes.
Note that you can get most of the flexibility of the Run command in a User Tool by judicious use of tool parameter macros. For instance, use $File with PHP to have it run the code in the active document, and/or $Prompt to prompt for parameters at runtime.
In TextPad 8, Find in Files uses heuristics to determine the encoding of each document it searches. TextPad 7 assumes they are all in the ANSI codepage, because that's what it uses internally.
Keith MacDonald
Helios Software Solutions
Keith, are there plans to remedy the TextPad Run command's failure to respect the Tool Output document class encoding setting?
bbadmin wrote: In TextPad 8, Find in Files uses heuristics to determine the encoding of each document it searches.
Yowch. Thanks for the warning. That should really go in the user docs.
AmigoJack wrote: It just confirms what I already found out:
AmigoJack wrote: I can only assume that á alone is not enough for TextPad to think the encoding is meant to be UTF-8 - it just thinks it's ANSI. But my two additional Katakanas are enough as an indication to UTF-8.
It's poor heuristics that concludes JosÃ© is more likely than José.
Regardless, I think guessing of encoding is completely unacceptable for use of TextPad in program development. I look forward to a TextPad version that just does what it is told, and does not guess.
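The ambiguity being argued about can be reproduced in a few lines of Python. This is only a sketch of why a short sample is hard to classify, assuming Windows-1252 as the ANSI code page; it says nothing about the heuristics TextPad actually uses, and `is_valid_utf8` is a hypothetical helper.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


utf8_bytes = "José".encode("utf-8")     # b'Jos\xc3\xa9'
ansi_bytes = "José".encode("cp1252")    # b'Jos\xe9'

# The UTF-8 bytes are also perfectly legal Windows-1252 text,
# so an ANSI reading of them cannot be ruled out by validity alone.
misread = utf8_bytes.decode("cp1252")

print(repr(misread))
print(is_valid_utf8(utf8_bytes), is_valid_utf8(ansi_bytes))
```

The asymmetry is worth noting: ANSI text with accented letters is rarely valid UTF-8, whereas UTF-8 text is almost always valid ANSI, so a detector has to fall back on likelihood when the sample is short.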
chrisjj wrote: It's poor heuristics that concludes JosÃ© is more likely than José.
You seem to never have written code for an encoding sniffer, especially for European texts - it's far from being easy. The Win32 API fails at this as well.
(@bbadmin: if you're actually using IMultiLanguage2::DetectInputCodepage then experiment with inflating small texts, e.g. by repeating the same text until it grows to more than 400 characters - then chances are higher to detect the correct codepage).
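The inflating idea can be sketched as follows. This Python helper (the name `inflate` is hypothetical) only builds the repeated sample; it does not call any detection API, and whether the longer sample improves detection depends on the detector. It counts bytes rather than characters, which is an assumption about what the detector is handed.

```python
def inflate(sample: bytes, min_length: int = 400) -> bytes:
    """Repeat a short sample until it is longer than min_length bytes."""
    if not sample:
        return sample                     # nothing to repeat
    repeats = min_length // len(sample) + 1
    return sample * repeats


short_output = "José\n".encode("utf-8")   # 6 bytes
inflated = inflate(short_output)

print(len(short_output), len(inflated))
```

Only the copy handed to the detector is inflated; the original output is what would still be displayed.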
chrisjj wrote: It's poor heuristics that concludes JosÃ© is more likely than José.
AmigoJack wrote: You seem to never have written code for an encoding sniffer, especially for European texts
Correct. But that doesn't make TextPad's implementation less poor.
AmigoJack wrote: - it's far from being easy.
I believe it's far from easy to achieve a satisfactory result. Which is one reason I'd never bother to try it. ;-)
AmigoJack wrote: The Win32 API fails at this as well.
No problem to me. I don't use that API feature.
I do however use TextPad. Which is why I would like it to stop attempting to guess the encoding when I have explicitly set the encoding.
Your options for specifying the encoding of output from tools in TextPad 8.1 are:
- Make the tool write a UTF-8 or UTF-16 BOM.
- Add it as a command to the Tools menu and set the required encoding for the Tool Output Document class.
- Make the tool write the following (or the HTML 4.01 equivalent):
<head><meta charset="UTF-8"></head>
Last edited by bbadmin on Tue Nov 15, 2016 4:29 pm, edited 1 time in total.
bbadmin wrote: The intention is to add an option to the Run dialog box to specify the encoding. This will also be available for each tool added to the Tools menu.
I'd really appreciate an answer to the question.
The question is: Are there plans to remedy the TextPad Run command's failure to respect the Tool Output document class encoding setting?