Unicode yet again - when and how it should be approached

Posted: Fri Oct 17, 2008 10:47 pm
by smjg
This has gone on far too long. And while it's hard to believe that they're still trying to think up a strategy for getting it in, here's my proposal....
  1. Freeze all development of new features in TextPad apart from Unicode support. Bug fixes could still be allowed.
  2. Do some initial bug fixes:
    1. whereby Edit -> Copy Other -> As a [sic] HTML page declares a bogus encoding, whereas it should not declare an encoding at all.
    2. whereby, in certain circumstances, the encoding the user selects when opening a file is not honoured.
  3. Create a fork of the TextPad code. This would enable Unicode support to be developed while, in the meantime, bug fixes to the non-Unicode TextPad can still be made and released.
  4. Refactor the code base to work in UTF-32 (see the long-running thread for reasons) throughout. No code relying on 8-bit characters should remain, except for the little bits to read and write to files in 8-bit encodings.

    This would achieve basic Unicode support - Unicode conformance as already described plus the ability to display (subject to availability) and hopefully input any Unicode character.
  5. Once this is done, release the result as the next version of TextPad. Discard the non-Unicode TextPad code.
  6. Make a start at implementing some of the features that go beyond basic Unicode support. Once we've at least got somewhere, the feature freeze set at step 1 can be lifted.
Can anyone think of a better idea?

The only remaining question is when Helios should get going with this process. Going by the combination of existing constraints and popular demand, I feel there's only one right answer to this question: now.

Does everyone agree?
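
Step 4 above (fixed-width code points internally, with 8-bit encodings handled only at the file boundary) can be sketched in Python. This is an illustration under stated assumptions, not TextPad's code: the names load_buffer and save_buffer are hypothetical, and a list of integers stands in for a UTF-32 buffer.

```python
def load_buffer(raw: bytes, encoding: str) -> list[int]:
    """Decode bytes from disk into one integer per Unicode code point.

    Hypothetical helper: the list of ints stands in for a UTF-32 buffer.
    """
    return [ord(ch) for ch in raw.decode(encoding)]


def save_buffer(buffer: list[int], encoding: str) -> bytes:
    """Encode the code-point buffer back to bytes in the chosen file encoding."""
    return "".join(chr(cp) for cp in buffer).encode(encoding)


# Greek alpha, beta, gamma, stored on disk as UTF-8 (two bytes each).
raw = "\u03b1\u03b2\u03b3".encode("utf-8")
buf = load_buffer(raw, "utf-8")

# Internally every character is exactly one element, whatever the file encoding was.
assert buf == [0x3B1, 0x3B2, 0x3B3]

# Writing back in the same encoding reproduces the original bytes.
assert save_buffer(buf, "utf-8") == raw
```

Only the two boundary functions know about byte encodings; everything between them works on whole code points.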

Re: Unicode yet again - when and how it should be approached

Posted: Sun Nov 02, 2008 9:24 am
by SteveH
smjg wrote:The only remaining question is when Helios should get going with this process. Going by the combination of existing constraints and popular demand, I feel there's only one right answer to this question: now.

Does everyone agree?
I don't actually agree with this statement, much as I would love to have Unicode support in TextPad.

Clearly there is significant demand for this feature, but at the end of the day TextPad is Helios' baby and they can choose to develop it how they will, release the end result and let the market decide whether it is successful or not. This 'agreement' cuts both ways though; if current users require a feature that is not currently provided they are free to look elsewhere and this is what I've done. I still like and use TextPad but other editors are developed faster and include Unicode capabilities that I prefer to have available.

desirable yes, demand no

Posted: Sun Nov 02, 2008 2:46 pm
by Nicholas Jordan
After seconding Steve's caveat, I note that Java natively writes UTF-16, or so I think. The definition of UTF-8 / UTF-16 provides a fallback vector that should allow some progress in this direction, but Unihan is fraught with nasty areas due to commercial implementations not complying with standards and RFCs. Noteworthy is that TP exceeds any commercial tool I know of for its responsiveness and ability to handle 4k line buffers up to available memory of the underlying OS.

Given the skills displayed by Helios, I think it telling that built-in handling of Unihan has not been done already. I see no substantial basis for bringing the hammer down on Bellringer's Tomb here. Maybe a short FlashWar - fun - but as the owner of the rights, it is right that they can do whatever they want.

Re: desirable yes, demand no

Posted: Sun Nov 02, 2008 3:46 pm
by smjg
Nicholas Jordan wrote:Given the skills displayed by Helios, I think it telling that built-in handling of Unihan has not been done already. I see no substantial basis for bringing the hammer down on Bellringer's Tomb here. Maybe a short FlashWar - fun - but as the owner of the rights, it is right that they can do whatever they want.
But is Unihan handling part of basic Unicode support or full Unicode support? I personally think Helios should cross this bridge when they get to it....

And to both of you: "Helios might not want to" isn't really a counter-argument. What the developers want or don't want in a product plays little part in deciding its level of either mainstream popularity or commercial success.

Re: desirable yes, demand no

Posted: Sun Nov 02, 2008 4:41 pm
by SteveH
smjg wrote:What the developers want or don't want in a product plays little part in deciding its level of either mainstream popularity or commercial success.
Are you serious? I would argue that the features the developers include or not in an application are critical to its success or failure. The important caveat is that not all users are looking for the same features in their application.

Re: desirable yes, demand no

Posted: Sun Nov 02, 2008 5:22 pm
by smjg
SteveH wrote:I would argue that the features the developers include or not in an application are critical to its success or failure.
True, but surely that's a matter of matching the features to what the users want, rather than what the developers want?
SteveH wrote:The important caveat is that all users are not looking for the same features in their application.
That's true as well. But can you think of a feature that's more frequently requested for TextPad than Unicode support?

Re: desirable yes, demand no

Posted: Sun Nov 02, 2008 6:45 pm
by SteveH
smjg wrote:But can you think of a feature that's more frequently requested for TextPad than Unicode support?
I could at one time - check out this old poll.

Back at that time editable macros were the number 1 request, followed by code folding. I'm not hugely interested in either of these (unlike Unicode) but as 5.X has not introduced any new features :cry: in these areas I reckon there is a good chance these are still pretty popular too.

UTF8 encoding and "Unicode" are not REAL UNICODE!

Posted: Fri Dec 26, 2008 1:33 am
by S Peterson
I had several months of work wrapped up in a large HTML file, a way of encoding an etext in Unicode Greek, as well as Latin, Hebrew, Syriac, and occasional Coptic scripts.

This was done in Notepad++.

I was delighted to find, download, and configure TextPad. It offered a simpler interface than Notepad++ and apparently offered Unicode support. I could configure HTML to read/encode in UTF8, and even in UTF8 free of the BOM that messes up Notepad files (Notepad++ can also do this).

The important added feature, to my way of thinking, with this and similar bread-and-butter files, is the ability to set the backup timing. So today I set it at 3 Minutes.

With all the settings arranged to my liking, in an easy and clear process, I loaded the most difficult of the multilingual Unicode files I had developed.

It was appalling to discover that TextPad read even the basic Greek text in ???, not in Unicode Greek. It was only then that the warning appeared - "only characters in the 1252 character ASCII group have been encoded." I had such high hopes that I didn't immediately clear the file.

I am very afraid that the 3-minute backup may have kicked in, wiping out a month's worth of work, not yet sold and paid for.

I want to plead with you to post a warning very conspicuously, until you have a Unicode-ready editor that will deal with multiple codes in (at least) the UTF-8 group. When you have that, please specify the extent of the codes you can represent.

For example, Notepad++ uses a glyph something like a fleur-de-lys to represent a range of codes it doesn't draw, but it keeps the UTF-8 representations separated from each other, and they are recoverable. When loaded into a browser, all UTF-8 codes are viewable.

I have not checked my file yet, because if I did, I WOULD BE SHOUTING in frustration.

Please see the 2003 (!) discussion of a truly Unicode-compliant program.

Re: UTF8 encoding and "Unicode" are not REAL UNICO

Posted: Fri Dec 26, 2008 6:34 pm
by smjg
S Peterson wrote:I am very afraid that the 3-minute backup may have kicked in, wiping out a month's worth of work, not yet sold and paid for.
The automatic backup shouldn't be replacing the original file under any circumstances.
S Peterson wrote:I want to plead with you to post a warning very conspicuously, until you have a Unicode-ready editor that will deal with multiple codes in (at least) the UTF-8 group. When you have that, please specify the extent of the codes you can represent.
A Unicode-ready editor, by definition, can internally represent and therefore preserve all Unicode characters.
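
The difference between preserving characters and replacing them with question marks can be shown with a minimal Python sketch. It only illustrates the general behaviour of a lossy conversion to Windows-1252; it is an assumption that this resembles what happens inside TextPad.

```python
# A Greek word, written with escapes so the source itself is plain ASCII.
text = "\u03bb\u03cc\u03b3\u03bf\u03c2"

# Forcing the text into Windows-1252 replaces every unrepresentable character.
lossy = text.encode("cp1252", errors="replace")
assert lossy == b"?????"

# The original characters cannot be recovered from the question marks.
assert lossy.decode("cp1252") != text

# A Unicode encoding round-trips the same text without loss.
assert text.encode("utf-8").decode("utf-8") == text
```

Once the question marks have been written to disk, the original text is gone, which is why an editor that only cannot draw a character should still keep its code intact.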

This is getting silly. Correction: It already is very silly

Posted: Mon Aug 17, 2009 9:08 pm
by smjg
It would appear from my records that it was back in April 2008 that I wrote to Helios asking when this is going to happen. Needless to say, no answer came.

Thinking about it now, maybe the poll question should've been: When should Helios unbury its head from the sand?

Posted: Wed Aug 19, 2009 9:27 pm
by SteveH
I agree that it's disappointing that there has been no progress in this area but stand by my earlier assertion regarding mission-critical features:
if current users require a feature that is not currently provided they are free to look elsewhere

At least give warnings and remove false claims

Posted: Fri May 21, 2010 12:29 pm
by ccollins
Judging from the number of respondents to this poll, this must not be that important an issue. However, even if Helios does not want to give full UTF support, they should at least:

1. WARN if TextPad is going to change the format from (for example) PC UTF-8 to PC ANSI.

2. Get rid of the choices in the "Save As" "Encoding" drop down if it really isn't going to save in the encoding.

Both of these should be very easy to implement, and I believe, are a matter of integrity. 1. If you need to change the format of a file, at least tell the person you are doing it. 2. Don't claim to be able to do things you can't.
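
As a rough illustration of suggestion 1, here is a minimal Python sketch of the check an editor could run before saving. The function name save_warnings and the message wording are hypothetical; nothing here is taken from TextPad.

```python
import codecs


def save_warnings(text: str, opened_as: str, saving_as: str) -> list[str]:
    """Return the warnings an editor could show before writing the file.

    Hypothetical helper: reports an encoding change and any character
    that the target encoding cannot represent.
    """
    warnings = []
    # Compare canonical codec names so "UTF8" and "utf-8" count as the same.
    if codecs.lookup(opened_as).name != codecs.lookup(saving_as).name:
        warnings.append(f"encoding will change from {opened_as} to {saving_as}")
    try:
        text.encode(saving_as)
    except UnicodeEncodeError as exc:
        bad = text[exc.start]
        warnings.append(f"character {bad!r} cannot be saved as {saving_as}")
    return warnings


# Same encoding, representable text: nothing to report.
assert save_warnings("hello", "utf-8", "UTF8") == []

# Encoding changes, although no character is lost.
assert len(save_warnings("hello", "utf-8", "cp1252")) == 1

# Encoding changes and a Greek alpha would be lost.
assert len(save_warnings("\u03b1", "utf-8", "cp1252")) == 2
```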

PS. I am using 5.3.0. I'll change to 5.3.1 but the notes for it do not say anything about this issue.

PPS. I do not pay for TextPad right now. I feel as long as Helios claims TextPad can do things it can't, there is no reason for me to. As soon as Helios drops these false claims or fulfills them, I will start paying.

Re: At least give warnings and remove false claims

Posted: Fri May 21, 2010 2:39 pm
by smjg
ccollins wrote:Judging from the number of respondents to this poll, this must not be that important an issue.
I think it means that there are actually far more in the "It's too late - I've already given up on TextPad" category, who aren't active here because they're just not interested now. I don't suppose we'll ever know how many users TextPad has lost because of this.
ccollins wrote:However, even if Helios does not want to give full UTF support, they should at least:

1. WARN if TextPad is going to change the format from (for example) PC UTF-8 to PC ANSI.
It already does warn you when you open the file that it's about to mangle your characters. Does it sometimes change the save format to ANSI if you opened it as UTF-8? I've never seen this happen - can you give steps to reproduce?
ccollins wrote:2. Get rid of the choices in the "Save As" "Encoding" drop down if it really isn't going to save in the encoding.
Getting rid of it from the Open dialog, given that it doesn't work, is more important. In any case, it should stop going out of its way to prevent you from opening files as ANSI.

In any case, current TextPad is worse-behaved than editors that have no concept of Unicode at all.

Re: At least give warnings and remove false claims

Posted: Mon May 24, 2010 12:01 am
by ccollins
smjg wrote: It already does warn you when you open the file that it's about to mangle your characters. Does it sometimes change the save format to ANSI if you opened it as UTF-8? I've never seen this happen - can you give steps to reproduce?
Create a UTF-8 file that has no special characters (e.g. hello). Open it in TextPad. There are no warnings since there are no special characters. Modify the file (e.g. add world) in TP and save it. TextPad saves it as an ANSI file instead of a UTF-8 without a notification that it changed formats. I have several tools that complain because the file format has changed.

Re: At least give warnings and remove false claims

Posted: Mon May 24, 2010 12:17 am
by smjg
ccollins wrote:Create a UTF-8 file that has no special characters (e.g. hello). Open it in TextPad. There are no warnings since there are no special characters. Modify the file (e.g. add world) in TP and save it. TextPad saves it as an ANSI file instead of a UTF-8 without a notification that it changed formats. I have several tools that complain because the file format has changed.
But a UTF-8 file and an ANSI file are identical if there's no character above U+007F. So how have you come to that conclusion?

Could you be confusing encoding with the presence or absence of a BOM?
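
Both points can be checked with a short Python sketch. It assumes "ANSI" means Windows-1252 (cp1252), as on a Western-locale Windows install.

```python
import codecs

text = "hello world"

# Pure ASCII text produces exactly the same bytes in all three encodings.
assert text.encode("utf-8") == text.encode("cp1252") == text.encode("ascii")

# The only possible difference for such a file is a UTF-8 byte order mark.
with_bom = text.encode("utf-8-sig")
assert with_bom == codecs.BOM_UTF8 + text.encode("utf-8")
assert with_bom[:3] == b"\xef\xbb\xbf"

# Above U+007F the two encodings diverge: U+00E9 is two bytes in UTF-8, one in cp1252.
assert "\u00e9".encode("utf-8") == b"\xc3\xa9"
assert "\u00e9".encode("cp1252") == b"\xe9"
```

So a tool that complains after such a save is most likely reacting to a BOM that was added or dropped, not to a change of encoding.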