Unicode yet again - when and how it should be approached

When should TextPad support Unicode?

It's too late - I've already given up on TextPad: 9 votes (22%)
Get going with it right now and then get it out: 18 votes (44%)
Make it one of a handful of improvements for version 5.3: 2 votes (5%)
I can live with it waiting until you were going to release version 6.0 anyway: 11 votes (27%)
Maybe another ten years or so down the line: 1 vote (2%)
Never - TextPad wouldn't be TextPad without that silly "They will be converted to the system default character" message: 0 votes

Total votes: 41

smjg
Posts: 30
Joined: Mon Mar 08, 2004 10:34 am

Unicode yet again - when and how it should be approached

Post by smjg »

This has gone on far too long. And while it's hard to believe that they're still trying to think up a strategy for getting it in, here's my proposal....
  1. Freeze all development of new features in TextPad apart from Unicode support. Bug fixes could still be allowed.
  2. Do some initial bug fixes:
    1. whereby Edit -> Copy Other -> As a [sic] HTML page declares a bogus encoding, whereas it should not declare an encoding at all.
    2. whereby, in certain circumstances, the encoding the user selects when opening a file is not honoured.
  3. Create a fork of the TextPad code. This would enable Unicode support to be developed while, in the meantime, bug fixes to the non-Unicode TextPad can still be made and released.
  4. Refactor the code base to work in UTF-32 (see the long-running thread for reasons) throughout. No code relying on 8-bit characters should remain, except for the little bits to read and write to files in 8-bit encodings.

    This would achieve basic Unicode support - Unicode conformance as already described plus the ability to display (subject to availability) and hopefully input any Unicode character.
  5. Once this is done, release the result as the next version of TextPad. Discard the non-Unicode TextPad code.
  6. Make a start at implementing some of the features that go beyond basic Unicode support. Once we've at least got somewhere, the feature freeze set at step 1 can be lifted.
Can anyone think of a better idea?
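
Step 4 can be sketched roughly as follows. This is Python rather than anything resembling TextPad's own code, the names `load_buffer` and `save_buffer` are hypothetical, and a list of integer code points merely stands in for a UTF-32 buffer. It only illustrates the idea that byte encodings are handled at the file boundary and nowhere else.

```python
from typing import List


def load_buffer(raw: bytes, encoding: str) -> List[int]:
    """Decode file bytes once, at the boundary, into one code point per element."""
    return [ord(ch) for ch in raw.decode(encoding)]


def save_buffer(buffer: List[int], encoding: str) -> bytes:
    """Encode the code-point buffer back to bytes only when writing the file."""
    return "".join(chr(cp) for cp in buffer).encode(encoding)


# "a", Greek alpha and a Gothic letter outside the BMP: 1 + 2 + 4 bytes in UTF-8
raw = "a\u03b1\U00010348".encode("utf-8")
buf = load_buffer(raw, "utf-8")

print(len(raw))  # 7 bytes on disk
print(len(buf))  # 3 characters in the buffer, one element each
print(save_buffer(buf, "utf-8") == raw)  # True: the round trip is lossless
```

With a fixed-width internal form like this, cursor movement, selection and search can index characters directly, which is the usual argument for it over a variable-width internal encoding.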

The only remaining question is when Helios should get going with this process. Going by the combination of existing constraints and popular demand, I feel there's only one right answer to this question: now.

Does everyone agree?
SteveH
Posts: 327
Joined: Thu Apr 03, 2003 11:37 am
Location: Edinburgh, Scotland

Re: Unicode yet again - when and how it should be approached

Post by SteveH »

smjg wrote:The only remaining question is when Helios should get going with this process. Going by the combination of existing constraints and popular demand, I feel there's only one right answer to this question: now.

Does everyone agree?
I don't actually agree with this statement, much as I would love to have Unicode support in TextPad.

Clearly there is significant demand for this feature, but at the end of the day TextPad is Helios' baby and they can choose to develop it how they will, release the end result and let the market decide whether it is successful or not. This 'agreement' cuts both ways though; if current users require a feature that is not currently provided they are free to look elsewhere and this is what I've done. I still like and use TextPad but other editors are developed faster and include Unicode capabilities that I prefer to have available.
Running TextPad 5.4 on Windows XP SP3 and on OS X 10.7 under VMWare or Crossover.
Nicholas Jordan
Posts: 124
Joined: Mon Dec 20, 2004 12:33 am
Location: Central Texas ISO Latin-1

desirable yes, demand no

Post by Nicholas Jordan »

After seconding Steve's caveat, I note that Java natively writes UTF-16, or so I think it does. The definition of UTF-8 / UTF-16 provides a fallback vector that should allow some progress in this direction, but Unihan is fraught with nasty areas due to commercial implementations not complying with standards and RFCs. Noteworthy is that TP exceeds any commercial tool I know of for its responsiveness and ability to handle 4k line buffers up to available memory of the underlying OS.

Given the skills displayed by Helios, I think it telling that built-in handling of Unihan has not been done already. I see no substantial basis for bringing the hammer down on Bellringer's Tomb here. Maybe a short FlashWar - fun - but as the owner of the rights, it is right that they can do whatever they want.
smjg
Posts: 30
Joined: Mon Mar 08, 2004 10:34 am

Re: desirable yes, demand no

Post by smjg »

Nicholas Jordan wrote:Given the skills displayed by Helios, I think it telling that built-in handling of Unihan has not been done already. I see no substantial basis for bringing the hammer down on Bellringer's Tomb here. Maybe a short FlashWar - fun - but as the owner of the rights, it is right that they can do whatever they want.
But is Unihan handling part of basic Unicode support or full Unicode support? I personally think Helios should cross this bridge when they get to it....

And to both of you: "Helios might not want to" isn't really a counter-argument. What the developers want or don't want in a product plays little part in deciding its level of either mainstream popularity or commercial success.
SteveH
Posts: 327
Joined: Thu Apr 03, 2003 11:37 am
Location: Edinburgh, Scotland

Re: desirable yes, demand no

Post by SteveH »

smjg wrote:What the developers want or don't want in a product plays little part in deciding its level of either mainstream popularity or commercial success.
Are you serious? I would argue that the features the developers include or not in an application are critical to its success or failure. The important caveat is that not all users are looking for the same features in their application.
smjg
Posts: 30
Joined: Mon Mar 08, 2004 10:34 am

Re: desirable yes, demand no

Post by smjg »

SteveH wrote:I would argue that the features the developers include or not in an application are critical to its success or failure.
True, but surely that's a matter of matching the features to what the users want, rather than what the developers want?
SteveH wrote:The important caveat is that all users are not looking for the same features in their application.
That's true as well. But can you think of a feature that's more frequently requested for TextPad than Unicode support?
SteveH
Posts: 327
Joined: Thu Apr 03, 2003 11:37 am
Location: Edinburgh, Scotland

Re: desirable yes, demand no

Post by SteveH »

smjg wrote:But can you think of a feature that's more frequently requested for TextPad than Unicode support?
I could at one time - check out this old poll.

Back at that time editable macros were the number 1 request, followed by code folding. I'm not hugely interested in either of these (unlike Unicode) but as 5.X has not introduced any new features :cry: in these areas I reckon there is a good chance these are still pretty popular too.
S Peterson
Posts: 1
Joined: Fri Dec 26, 2008 12:54 am
Location: Eastern Pennsylvania, USA

UTF8 encoding and "Unicode" are not REAL UNICODE!

Post by S Peterson »

I had several months of work wrapped up in a large HTML file, a way of encoding an etext in Unicode Greek, as well as Latin, Hebrew, Syriac, and occasional Coptic scripts.

This was done in Notepad++.

I was delighted to find, download, and configure TextPad. It offered a simpler interface than Notepad++ and apparently offered Unicode support. I could configure HTML to read/encode in UTF8, and even in UTF8 free of the BOM that messes up Notepad files (Notepad++ can also do this).

The important added feature, to my way of thinking, with this and similar bread-and-butter files, is the ability to set the backup timing. So today I set it at 3 Minutes.

With all the settings arranged to my liking, in an easy and clear process, I loaded the most difficult of the multilingual Unicode files I had developed.

It was appalling to discover that TextPad read even the basic Greek text in ???, not in Unicode Greek. It was only then that the warning appeared - "only characters in the 1252 character ASCII group have been encoded." I had such high hopes that I didn't immediately clear the file.
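
What happened to the Greek text can be reproduced in a few lines of Python. This is only an illustration of a forced conversion to Windows-1252 (the "1252" in the warning); it makes no claim about how TextPad performs the conversion internally.

```python
text = "\u03b1\u03b2\u03b3"  # Greek alpha, beta, gamma

# Windows-1252 has no Greek letters, so a forced conversion substitutes "?"
as_cp1252 = text.encode("cp1252", errors="replace")
print(as_cp1252)  # b'???'

# After the conversion the original letters cannot be recovered
print(as_cp1252.decode("cp1252") == text)  # False

# UTF-8, by contrast, round-trips the same text unchanged
print(text.encode("utf-8").decode("utf-8") == text)  # True
```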

I am very afraid that the 3-minute backup may have kicked in, wiping out a month's worth of work, not yet sold and paid for.

I want to plead with you to post a warning very conspicuously, until you have a Unicode-ready editor that will deal with multiple codes in (at least) the UTF-8 group. When you have that, please specify the extent of the codes you can represent.

For example, Notepad++ uses a glyph something like a fleur-de-lys to represent a range of codes it doesn't draw, but it keeps the UTF-8 representations separated from each other, and they are recoverable. When loaded into a browser, all UTF-8 codes are viewable.
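
The behaviour described for Notepad++ amounts to keeping display separate from storage. A minimal sketch of that idea, assuming a hypothetical `render` function and a pretend font that lacks the Coptic block (none of this is taken from Notepad++ itself):

```python
from typing import Callable

PLACEHOLDER = "\ufffd"  # stand-in glyph for characters the font cannot draw


def render(buffer: str, drawable: Callable[[str], bool]) -> str:
    """Return what is shown on screen; the buffer itself is never modified."""
    return "".join(ch if drawable(ch) else PLACEHOLDER for ch in buffer)


buffer = "abc \u2c81\u2c83"  # Latin text followed by two Coptic letters

# Assumption for the example: the font covers nothing from U+2C80 upwards
shown = render(buffer, lambda ch: ord(ch) < 0x2C80)

print(shown == "abc \ufffd\ufffd")  # True: placeholders appear on screen
print(buffer.encode("utf-8").decode("utf-8") == "abc \u2c81\u2c83")  # True: saved data is intact
```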

I have not checked my file yet, because if I did, I WOULD BE SHOUTING in frustration.

Please see the 2003 (!) discussion of a truly Unicode-compliant program.
smjg
Posts: 30
Joined: Mon Mar 08, 2004 10:34 am

Re: UTF8 encoding and "Unicode" are not REAL UNICO

Post by smjg »

S Peterson wrote:I am very afraid that the 3-minute backup may have kicked in, wiping out a month's worth of work, not yet sold and paid for.
The automatic backup shouldn't be replacing the original file under any circumstances.
S Peterson wrote:I want to plead with you to post a warning very conspicuously, until you have a Unicode-ready editor that will deal with multiple codes in (at least) the UTF-8 group. When you have that, please specify the extent of the codes you can represent.
A Unicode-ready editor, by definition, can internally represent and therefore preserve all Unicode characters.
smjg
Posts: 30
Joined: Mon Mar 08, 2004 10:34 am

This is getting silly. Correction: It already is very silly

Post by smjg »

It would appear from my records that it was back in April 2008 that I wrote to Helios asking when this is going to happen. Needless to say, no answer came.

Thinking about it now, maybe the poll question should've been: When should Helios unbury its head from the sand?
SteveH
Posts: 327
Joined: Thu Apr 03, 2003 11:37 am
Location: Edinburgh, Scotland

Post by SteveH »

I agree that it's disappointing that there has been no progress in this area but stand by my earlier assertion regarding mission-critical features:
if current users require a feature that is not currently provided they are free to look elsewhere
ccollins
Posts: 10
Joined: Thu Feb 28, 2008 2:54 pm
Location: Ohio USA

At least give warnings and remove false claims

Post by ccollins »

Judging from the number of respondents to this poll, this must not be that important an issue. However, even if Helios does not want to give full UTF support, they should at least

1. WARN if TextPad is going to change the format from (for example) PC UTF-8 to PC ANSI.

2. Get rid of the choices in the "Save As" "Encoding" drop down if it really isn't going to save in the encoding.

Both of these should be very easy to implement, and I believe, are a matter of integrity. 1. If you need to change the format of a file, at least tell the person you are doing it. 2. Don't claim to be able to do things you can't.
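
To give an idea of how small such a check could be, here is a rough Python sketch. The function `save_warnings` and its messages are made up for illustration; the real editor would need its own equivalent at save time.

```python
from typing import List


def save_warnings(text: str, opened_as: str, saving_as: str) -> List[str]:
    """Collect the warnings a user should see before the file is written."""
    warnings = []
    # Point 1: say so when the encoding is about to change
    if saving_as.lower() != opened_as.lower():
        warnings.append(f"Encoding will change from {opened_as} to {saving_as}.")
    # Point 2: do not pretend to save text the target encoding cannot hold
    try:
        text.encode(saving_as)
    except UnicodeEncodeError as err:
        bad = text[err.start:err.end]
        warnings.append(f"{saving_as} cannot represent {bad!r}; it would be lost.")
    return warnings


print(save_warnings("hello world", "utf-8", "utf-8"))            # nothing to report
print(save_warnings("hello world", "utf-8", "cp1252"))           # format change only
print(save_warnings("\u03b1\u03b2\u03b3", "utf-8", "cp1252"))    # format change and data loss
```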

PS. I am using 5.3.0. I'll change to 5.3.1 but the notes for it do not say anything about this issue.

PPS. I do not pay for TextPad right now. I feel as long as Helios claims TextPad can do things it can't, there is no reason for me to. As soon as Helios drops these false claims or fulfills them, I will start paying.
smjg
Posts: 30
Joined: Mon Mar 08, 2004 10:34 am

Re: At least give warnings and remove false claims

Post by smjg »

ccollins wrote:Judging from the number of respondents to this poll, this must not be that important an issue.
I think it means that there are actually far more in the "It's too late - I've already given up on TextPad" category, who aren't active here because they're just not interested now. I don't suppose we'll ever know how many users TextPad has lost because of this.
ccollins wrote:However, even if Helios does not want to give full UTF support, they should at least

1. WARN if TextPad is going to change the format from (for example) PC UTF-8 to PC ANSI.
It already does warn you when you open the file that it's about to mangle your characters. Does it sometimes change the save format to ANSI if you opened it as UTF-8? I've never seen this happen - can you give steps to reproduce?
ccollins wrote:2. Get rid of the choices in the "Save As" "Encoding" drop down if it really isn't going to save in the encoding.
Getting rid of it from the Open dialog, given that it doesn't work, is more important. In any case, it should stop going out of its way to prevent you from opening files as ANSI.

In any case, current TextPad is worse-behaved than editors that have no concept of Unicode at all.
ccollins
Posts: 10
Joined: Thu Feb 28, 2008 2:54 pm
Location: Ohio USA

Re: At least give warnings and remove false claims

Post by ccollins »

smjg wrote: It already does warn you when you open the file that it's about to mangle your characters. Does it sometimes change the save format to ANSI if you opened it as UTF-8? I've never seen this happen - can you give steps to reproduce?
Create a UTF-8 file that has no special characters (e.g. hello). Open it in TextPad. There are no warnings since there are no special characters. Modify the file (e.g. add world) in TP and save it. TextPad saves it as an ANSI file instead of a UTF-8 without a notification that it changed formats. I have several tools that complain because the file format has changed.
smjg
Posts: 30
Joined: Mon Mar 08, 2004 10:34 am

Re: At least give warnings and remove false claims

Post by smjg »

ccollins wrote:Create a UTF-8 file that has no special characters (e.g. hello). Open it in TextPad. There are no warnings since there are no special characters. Modify the file (e.g. add world) in TP and save it. TextPad saves it as an ANSI file instead of a UTF-8 without a notification that it changed formats. I have several tools that complain because the file format has changed.
But a UTF-8 file and an ANSI file are identical if there's no character above U+007F. So how have you come to that conclusion?

Could you be confusing encoding with the presence or absence of a BOM?
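
Both points are easy to check for yourself. The Python below uses Windows-1252 as the "ANSI" code page, which is an assumption about the system in question:

```python
text = "hello world"

utf8 = text.encode("utf-8")
ansi = text.encode("cp1252")
print(utf8 == ansi)  # True: pure ASCII text is byte-for-byte identical in both

# The only thing that can differ is a byte order mark at the start of the file
with_bom = text.encode("utf-8-sig")
print(with_bom == b"\xef\xbb\xbf" + utf8)  # True: three extra bytes, then the same content

# The encodings diverge as soon as a character above U+007F appears
e_acute = "\u00e9"
print(e_acute.encode("utf-8") == e_acute.encode("cp1252"))  # False
```

So a tool that complains about a "changed format" on an ASCII-only file is most likely reacting to a BOM that was present before the save and absent after it.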