32 bit Unicode encoding

Ideas for new features



Do you want to be able to save files in 32-bit Unicode format?

Yes, that would be very important for me. (10 votes, 38%)
I'd like to see this option although it's not necessary for me right now. (6 votes, 23%)
I don't care, I'm an ASCII lover. (5 votes, 19%)
The author should use his time to work on more important features. (5 votes, 19%)
No, why would you ever want to use 32 bit character encoding? (0 votes)

Total votes: 26

Baraclese
Posts: 1
Joined: Wed Aug 06, 2003 5:14 pm


Post by Baraclese »

I have to use other editors to save files in UTF-32 format... which sucks, 'cause I love TextPad :)
LonelyPixel
Posts: 12
Joined: Mon Sep 15, 2003 9:17 pm
Location: Germany

Post by LonelyPixel »

Ehm, what is UCS-4 good for? I thought there was always enough space in UCS-2...

But I believe there should be REAL UCS-2/UTF-8 support first; then we could talk about the 'more advanced features'... unless UCS-4 would be easy to do while implementing UCS-2 functionality.

I personally wouldn't need this one; I'm waiting for an explanation before I vote on this... ;)
ramonsky
Posts: 88
Joined: Fri Nov 14, 2003 10:54 am

Post by ramonsky »

LonelyPixel wrote: "waiting for an explanation before I vote on this..."
Okey Dokey, here goes...

When you click on Save As you get a dialog box. It contains a drop-down menu called "Encoding". One of these encodings is UTF-8. So far so good.

Also on the list are "Unicode" and "Unicode (big endian)". These are misnamed - Unicode is a character set, not an encoding. Ideally, TP should refer to these encodings by their correct names: UCS-2LE and UCS-2BE respectively (collectively known as UCS-2).
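To make the naming concrete: UCS-2 writes every character as exactly one 16-bit unit, and the LE/BE suffix only says which byte comes first. Python has no built-in "UCS-2" codec, so the sketch below packs the units by hand. `encode_ucs2` is a hypothetical helper for illustration, not anything from TextPad.

```python
import struct


def encode_ucs2(text: str, big_endian: bool = False) -> bytes:
    """Encode text as UCS-2: exactly two bytes per character.

    Raises ValueError for characters above U+FFFF, which UCS-2
    cannot represent at all.
    """
    fmt = ">H" if big_endian else "<H"  # big- or little-endian 16-bit unit
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            raise ValueError(f"U+{cp:04X} is outside the UCS-2 range")
        out += struct.pack(fmt, cp)
    return bytes(out)


print(encode_ucs2("Aé").hex(" "))                   # 41 00 e9 00  (UCS-2LE)
print(encode_ucs2("Aé", big_endian=True).hex(" "))  # 00 41 00 e9  (UCS-2BE)
```

The two variants contain the same units with the byte order swapped. That is the only difference between the "Unicode" and "Unicode (big endian)" choices as the post describes them.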

However, notably ABSENT from the list are UTF-16LE, UTF-16BE (collectively known as UTF-16), UTF-32LE and UTF-32BE (collectively known as UTF-32). All of these are important for saving Unicode in a file. Baraclese's suggestion may seem minor, but it's a piece of cake to implement. Given access to TP's source code, I could code all of these in less than ten minutes (half an hour if you want them tested). It's a trivial enhancement, and it wouldn't significantly increase TP's size or reduce its efficiency.
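As a rough illustration of how small the change could be, here is a sketch that encodes text in each of the four missing formats and prepends the conventional byte order mark (BOM). It uses Python's standard codecs. The names `ENCODINGS`, `encode_for_save` and `save_as` are hypothetical, and writing a BOM by default is an assumption. This is not how TextPad actually saves files.

```python
import codecs

# The four encodings the post lists as missing, each paired with the
# byte order mark conventionally written at the start of the file.
ENCODINGS = {
    "UTF-16LE": ("utf-16-le", codecs.BOM_UTF16_LE),
    "UTF-16BE": ("utf-16-be", codecs.BOM_UTF16_BE),
    "UTF-32LE": ("utf-32-le", codecs.BOM_UTF32_LE),
    "UTF-32BE": ("utf-32-be", codecs.BOM_UTF32_BE),
}


def encode_for_save(text: str, name: str, with_bom: bool = True) -> bytes:
    """Encode text in the named format, optionally prefixed by its BOM."""
    codec, bom = ENCODINGS[name]
    data = text.encode(codec)
    return bom + data if with_bom else data


def save_as(path: str, text: str, name: str) -> None:
    """Write text to path in the named format (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(encode_for_save(text, name))


# "A" plus U+1D11E (MUSICAL SYMBOL G CLEF), which lies beyond U+FFFF.
for name in ENCODINGS:
    print(name, encode_for_save("A\U0001D11E", name).hex(" "))
```

In the UTF-16 outputs, U+1D11E becomes the surrogate pair D834 DD1E. In the UTF-32 outputs, it is a single four-byte unit.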

BUT ... since TextPad is not currently capable of storing Unicode characters which are not in the current Windows codepage, it's also an enhancement suggestion with decidedly limited usefulness. :(
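This shows why the missing encodings alone don't help much. Once text passes through a single-byte Windows codepage, any character outside that codepage is already gone. The sketch below simulates such a round trip. It assumes Windows-1252 (`cp1252`) as the active codepage and uses `?` as the replacement character, which is Python's behaviour for `errors="replace"`. `round_trip_through_codepage` is a hypothetical helper, not TextPad's code path.

```python
def round_trip_through_codepage(text: str, codepage: str = "cp1252") -> str:
    """Simulate storing text in a legacy codepage and reading it back.

    Characters the codepage cannot represent are replaced with '?'
    on encoding, so they cannot be recovered afterwards.
    """
    return text.encode(codepage, errors="replace").decode(codepage)


sample = "café Ωmega 日本"
print(round_trip_through_codepage(sample))           # café ?mega ??
print(sample.encode("utf-32").decode("utf-32") == sample)  # True: nothing lost
```

Saving the damaged string as UTF-32 afterwards would faithfully preserve the `?` characters, not the originals.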

(Technical note: TextPad doesn't interpret UTF-8 correctly when opening a file it didn't create, either. This is easy to demonstrate by messing around with a binary file editor.)
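One common heuristic an editor could use when opening a file it didn't create is to check for a BOM first, then try strict UTF-8, and only then fall back to a legacy codepage. The sketch below assumes exactly that order and a `cp1252` fallback. `guess_and_decode` is a hypothetical name, and this is not a description of TextPad's actual logic. Note that UTF-32 BOMs must be tested before UTF-16 ones, because `FF FE 00 00` begins with `FF FE`. For the same reason, a UTF-16LE file whose first character is U+0000 would be misread as UTF-32LE.

```python
import codecs

# Longer BOMs first: the UTF-32LE BOM (FF FE 00 00) starts with the
# UTF-16LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_and_decode(data: bytes, fallback: str = "cp1252") -> tuple[str, str]:
    """Return (encoding used, decoded text) for raw file contents."""
    for bom, codec in BOMS:
        if data.startswith(bom):
            return codec, data[len(bom):].decode(codec)
    try:
        # No BOM: strict UTF-8 decoding rejects malformed byte sequences,
        # so legacy-codepage text usually fails here.
        return "utf-8", data.decode("utf-8")
    except UnicodeDecodeError:
        return fallback, data.decode(fallback, errors="replace")


print(guess_and_decode("é".encode("utf-8")))  # ('utf-8', 'é')
print(guess_and_decode(b"\xe9"))              # ('cp1252', 'é')
```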

You asked what UCS-4 is good for. I shall explain. UCS-2 is the subset of Unicode consisting of the codepoints from U+0000 to U+FFFF inclusive, with each character saved as precisely two bytes. However, Unicode doesn't stop at U+FFFF: it goes all the way up to U+10FFFF, so all of the characters between U+010000 and U+10FFFF are as inexpressible in UCS-2 as they are in ASCII.

UCS-4, on the other hand, stores codepoints from U+00000000 to U+7FFFFFFF inclusive (a 31-bit range), saved as precisely four bytes per character. Thus, it can store every Unicode character ... as well as the very, very high codepoints beyond U+10FFFF which even Unicode doesn't claim.

The UTF- formats are slightly different, in that they can all store every Unicode character. UTF-8 is a variable-byte-width encoding (each character takes 1, 2, 3 or 4 bytes), and UTF-16 is a variable-word-width encoding (each character takes one 16-bit word, or two for characters above U+FFFF, which are written as a surrogate pair). UTF-32 is effectively the same as UCS-4 except that codepoints above U+10FFFF, and the surrogate codepoints U+D800 to U+DFFF, are illegal.
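The size differences are easy to check with Python's standard codecs. The sketch below reports how many bytes a few sample characters need in each encoding, using `None` where UCS-2 cannot represent the character at all. It also computes a UTF-16 surrogate pair by hand using the standard formula. `encoded_sizes` and `utf16_surrogates` are hypothetical helper names chosen for illustration.

```python
def encoded_sizes(ch: str) -> dict:
    """Bytes needed for one character in each encoding (None = impossible)."""
    cp = ord(ch)
    return {
        "code point": f"U+{cp:04X}",
        "UCS-2": 2 if cp <= 0xFFFF else None,
        "UTF-8": len(ch.encode("utf-8")),
        "UTF-16": len(ch.encode("utf-16-le")),  # -le avoids counting a BOM
        "UTF-32": len(ch.encode("utf-32-le")),
    }


def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Split a code point in U+10000..U+10FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary code points use surrogates")
    v = cp - 0x10000                 # a 20-bit value
    high = 0xD800 + (v >> 10)        # top 10 bits
    low = 0xDC00 + (v & 0x3FF)       # bottom 10 bits
    return high, low


for ch in ["A", "é", "€", "\U0001D11E"]:
    print(encoded_sizes(ch))
print([hex(u) for u in utf16_surrogates(0x1D11E)])  # ['0xd834', '0xdd1e']
```

UTF-8 grows from 1 to 4 bytes across these characters, UTF-16 jumps from 2 to 4 bytes for U+1D11E, and UTF-32 is always 4 bytes. Python also refuses to encode a lone surrogate such as `"\ud800"` as UTF-32, which matches the rule that surrogate codepoints are illegal there.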

I suggest that people vote for this because, as I said, it's a trivial enhancement, and the only difference you'd notice is a few extra choices on one particular pulldown menu.

However, I'd also like to point you in the direction of a more significant poll ... Unicode Conformance ... which is an important and non-trivial enhancement request. And, in point of fact, the suggestion of THIS thread is going to be pretty useless unless we get proper Unicode support (as in, not destroying characters) first.
mdemirha
Posts: 1
Joined: Mon Dec 22, 2003 6:00 pm

Post by mdemirha »

I really don't understand what is so hard about supporting Unicode character sets. I know there are some complications (sorting, searching, etc.), but they are not the highest priority for most people. Supporting Unicode characters (saving and displaying them) is very important for many users, I believe.

I once converted a 100,000-line ANSI program to Unicode all on my own in 2-3 weeks. It takes some work and it sometimes drives you mad, but the result is sooo beautiful. If you write your code well enough, the Windows API handles every damn task for you!

I was just about to order multiple licenses for myself and my group, but when I tried to edit a UTF-16 RC file in TextPad I was really disappointed. It is a shame that such a great tool does not have good Unicode support.

Anyway, I will keep an eye on the updates. I hope to see good Unicode support in version 5 ;)
zridling
Posts: 55
Joined: Tue Apr 08, 2003 5:33 am
Location: Chicago, US

Post by zridling »

This would be a great enhancement, indeed. I use UltraEdit when I work with Unicode 4.0.