Bug dealing with Unicode

Post by **schnitzi** » Tue Jul 29, 2003 5:17 am

I've used TextPad to open some files with Chinese characters in them, which TextPad doesn't fully support. That in itself is fine; my issue is with how TextPad responds in this situation. The message you get on loading is:

Warning: "myfilename" contains characters that do not exist in
code page 1252 (ANSI - Latin I). They will be converted to a
system default character, if you click OK.

[ OK ] [ Cancel ]

This message is rather vague. Will it a) convert these characters for display purposes only (i.e., the characters in the file are not changed), b) change the characters in the loaded file but not save it yet, or c) change the characters in the file and save it to disk that way?

That much is just an issue of clarity. The bug, IMHO, is that TextPad seems to choose option b but does not treat the buffer as changed and in need of saving. No asterisk appears in the title bar, and you can close the buffer without being prompted to save it. That is not how a buffer in which changes have been made should be treated.
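To make option b concrete, here is a minimal sketch assuming Python 3.9+ and using Python's own cp1252 codec as a stand-in for TextPad's conversion (TextPad's internals are not described in this thread). It shows what a lossy "convert to a system default character" step does to the text in memory. Python substitutes "?" for each character that code page 1252 cannot represent; the default character TextPad actually uses is not stated here. The function names are hypothetical.

```python
# Stand-in for TextPad's conversion (assumption): Python's cp1252 codec with
# the "replace" error handler, which substitutes b"?" for unencodable chars.

def load_as_cp1252(original_text: str) -> str:
    """Simulate loading text into a code page 1252 buffer (hypothetical)."""
    lossy_bytes = original_text.encode("cp1252", errors="replace")
    return lossy_bytes.decode("cp1252")


def unrepresentable_chars(text: str) -> list[str]:
    """Return the characters that do not exist in code page 1252."""
    missing = []
    for ch in text:
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    original = "Hello 中文"
    buffer_text = load_as_cp1252(original)
    print(unrepresentable_chars(original))  # ['中', '文']
    print(buffer_text)                      # Hello ??
    # The in-memory text no longer matches what is on disk.
    print(buffer_text == original)          # False
```

Characters that do exist in code page 1252 (for example "é") pass through unchanged, so only files containing unrepresentable characters end up in this state.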

Post by **schnitzi** » Tue Jul 29, 2003 5:53 am

I titled that post "Bug dealing with Unicode", but the characters involved are not necessarily Unicode; the same problem occurs with various UTF encodings, or with any other encoding containing characters outside TextPad's supported range.

Post by **devdanke** » Tue Aug 05, 2003 6:56 pm

Hi,
I noticed this same problem in earlier versions of TP. However, I think v4.7 no longer transforms Unicode characters into something else. In v4.7 TP seems to display these Unicode characters as a variety of ASCII characters, but TP leaves the underlying Unicode data unchanged. I'm interested to know if v4.7 solves this problem for you.

Post by **schnitzi** » Wed Aug 06, 2003 5:13 am

devdanke wrote: Hi,
I noticed this same problem in earlier versions of TP. However, I think v4.7 no longer transforms Unicode characters into something else. In v4.7 TP seems to display these Unicode characters as a variety of ASCII characters, but TP leaves the underlying Unicode data unchanged. I'm interested to know if v4.7 solves this problem for you.

I thought it did, for a second, when it displayed a Japanese file of mine with strange (ASCII, I assume) characters in place of the Japanese characters. But a Chinese file showed the same behavior as before: TextPad apparently converted the bad characters in the buffer without marking the file as changed. I made a trivial change to the file, saved it, and was then able to reload it without any warnings about bad characters.

Post by **ramonsky** » Fri Nov 21, 2003 2:18 pm

There's a poll on this now, which you can vote in. See "Unicode Conformance".

Post by **schnitzi** » Tue Nov 25, 2003 10:08 am

Just a /nota bene/ here -- I believe Unicode compliance to be a Worthy and Noble Cause, but the bug I described here at the top of this thread is still a separate issue that I think could be cleared up easily without my having to wait for full Unicode compliance. Not that I mind the mention of the other thread here; I just don't want this thread to go away because of it...

Post by **ramonsky** » Tue Nov 25, 2003 1:01 pm

Not so. Firstly, I don't know what Unicode "compliance" means. So far as I am aware, it is not a formally defined term.

On the other hand, Unicode "conformance" is precisely defined, and is EXACTLY what you need to fix that bug. Unicode conformance means explicitly this - thou shalt not corrupt characters - and it means no more than that.

There's a whole thread on what Unicode conformance ISN'T (and those features may indeed be a worthy and noble cause), but what Unicode conformance IS is precisely the bug-fix you want. Non-conformance allows TextPad to say that characters "will be converted to a system default character". Conformance would require it not to do that.

Or to put it another way, if your bug is fixed, TextPad will be Unicode conformant.

So you see, the Conformance poll is precisely relevant to your problem, and to no more.

Jill

Post by **schnitzi** » Wed Nov 26, 2003 5:30 am

ramonsky wrote: Not so. Firstly, I don't know what Unicode "compliance" means. So far as I am aware, it is not a formally defined term.

On the other hand, Unicode "conformance" is precisely defined, and is EXACTLY what you need to fix that bug. Unicode conformance means explicitly this - thou shalt not corrupt characters - and it means no more than that.

I was using "compliance" and "conformance" interchangeably -- my bad. Worse than that -- I only skimmed your description and thought you were talking about full Unicode support when you had clearly explained what you meant by "conformance".

ramonsky wrote: There's a whole thread on what Unicode conformance ISN'T (and those features may indeed be a worthy and noble cause), but what Unicode conformance IS is precisely the bug-fix you want. Non-conformance allows TextPad to say that characters "will be converted to a system default character". Conformance would require it not to do that.

Or to put it another way, if your bug is fixed, TextPad will be Unicode conformant.

So you see, the Conformance poll is precisely relevant to your problem, and to no more.

I am in favor of the Conformance poll, and of Unicode conformance itself. But I'm not sure Unicode conformance is precisely the bug-fix I want.

My complaint was about an inconsistency in how TextPad behaves in one particular situation. Full Unicode compliance^H^H^H^H^H^Hconformance would be ONE fix for it, but it may be a more extensive fix than the situation calls for. TextPad would no doubt be better for it in the end, but in the interim I would be happy with a simple fix for (what I view as) the inconsistency: convert the characters, and mark the buffer as changed. I'm having second thoughts about this, though, because (as you pointed out elsewhere) it could lead to a bad situation where TextPad prompts you, on exit, to save a file that you never changed and never wanted changed. I don't know. Maybe the current, seemingly inconsistent handling is best in the interim. I'll ponder it.

In any case, the message that's displayed should be fixed to state clearly how exactly TextPad is handling the situation. That's still a bug by any standard, IMHO.
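The interim fix suggested above (convert the characters, and mark the buffer as changed) could look roughly like the following minimal Python sketch. The `Buffer` class, its `modified` flag and the `open_in_cp1252` helper are hypothetical names for illustration, not TextPad's actual design, and Python's cp1252 codec again stands in for TextPad's own conversion.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    """Hypothetical editor buffer with a dirty flag."""
    name: str
    text: str
    modified: bool = False

    def title(self) -> str:
        # Mark unsaved changes with an asterisk in the title bar.
        return f"{self.name}*" if self.modified else self.name

    def needs_save_prompt(self) -> bool:
        # Closing a modified buffer should ask whether to save it.
        return self.modified


def open_in_cp1252(name: str, original_text: str) -> Buffer:
    # Lossy conversion: characters missing from cp1252 become "?".
    converted = original_text.encode("cp1252", errors="replace").decode("cp1252")
    # If anything was lost, the buffer no longer matches the file on disk,
    # so treat the conversion exactly like an edit the user made.
    return Buffer(name, converted, modified=(converted != original_text))


if __name__ == "__main__":
    buf = open_in_cp1252("myfilename", "Hello 中文")
    print(buf.title())              # myfilename*
    print(buf.needs_save_prompt())  # True
```

The trade-off mentioned above shows up directly: a file the user never touched opens already marked as modified, so TextPad would prompt to save it on exit. The conformance route avoids both problems by not converting the characters at all.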

Post by **Christine VACHER** » Thu Mar 04, 2004 3:28 am

I voted for ramonsky's poll, but its description is too long and mixes several issues of varying importance. That is bad tactics.

There is a very serious issue with TextPad corrupting common XML documents, such as those produced by Word 2003.
TextPad is inferior to Notepad in that respect.

Post by **phoenixlpr** » Wed Jan 25, 2006 2:33 pm

devdanke wrote: Hi,
I noticed this same problem in earlier versions of TP. However, I think v4.7 no longer transforms Unicode characters into something else. In v4.7 TP seems to display these Unicode characters as a variety of ASCII characters, but TP leaves the underlying Unicode data unchanged. I'm interested to know if v4.7 solves this problem for you.

I have the exact same problem with 4.7.1 and 4.7.3 too.

Any updates? This bug still seems to exist.
