32 bit Unicode encoding

Baraclese · Post by **Baraclese** » Wed Aug 06, 2003 5:35 pm

I've got to use other editors to save files in UTF-32 format, which sucks because I love TextPad.

LonelyPixel · Post by **LonelyPixel** » Mon Sep 15, 2003 9:32 pm

Ehm, what is UCS-4 good for? I thought there was more than enough space in UCS-2...

But I believe there should be REAL UCS-2/UTF-8 support first, then we could talk about the 'more advanced features'... unless UCS-4 would be easy to do when implementing UCS-2 functionality.

I personally wouldn't need this one, waiting for an explanation before I vote on this...

ramonsky · Post by **ramonsky** » Tue Dec 16, 2003 11:45 am

LonelyPixel wrote:waiting for an explanation before I vote on this...

Okey Dokey, here goes...

When you click on Save As you get a dialog box. It contains a drop-down menu called "Encoding". One of these encodings is UTF-8. So far so good.

Also on the list are "Unicode" and "Unicode (big endian)". These are misnamed - Unicode is a character set, not an encoding. Ideally, TP should refer to these encodings by their correct names: UCS-2LE and UCS-2BE respectively (collectively known as UCS-2).

However, notably ABSENT from the list are UTF-16LE, UTF-16BE (collectively known as UTF-16), UTF-32LE and UTF-32BE (collectively known as UTF-32). All of these are important for saving Unicode in a file. Baraclese's suggestion may seem trivial, and it is indeed a piece of cake to implement. Given access to TP's source code, I could code all of these in less than ten minutes (half an hour if you want them tested). It's a trivial enhancement, and wouldn't significantly affect either TP's size or efficiency.
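To give a rough idea of how little is involved, here is a minimal Python sketch of a UTF-32 writer. It is not TextPad's code (which isn't public); the function name `encode_utf32` and its `big_endian` and `bom` parameters are made up for illustration. Each code point simply becomes one 4-byte integer in the chosen byte order, optionally preceded by the byte order mark U+FEFF.

```python
import struct

def encode_utf32(text, big_endian=False, bom=True):
    """Hypothetical helper: encode text as UTF-32, four bytes per code point."""
    fmt = ">I" if big_endian else "<I"  # unsigned 32-bit, BE or LE
    out = bytearray()
    if bom:
        # The byte order mark is just U+FEFF written in the chosen byte order.
        out += struct.pack(fmt, 0xFEFF)
    for ch in text:
        cp = ord(ch)
        # Surrogate code points are reserved for UTF-16 and are not
        # valid characters, so refuse to write them.
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"lone surrogate U+{cp:04X} cannot be encoded")
        out += struct.pack(fmt, cp)
    return bytes(out)

if __name__ == "__main__":
    print(encode_utf32("A", bom=False).hex())
    print(encode_utf32("A", big_endian=True, bom=False).hex())
```

In practice Python's built-in `"utf-32-le"` and `"utf-32-be"` codecs already do this; the hand-written version is only there to show that the format has no hidden complexity.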

BUT ... since TextPad is not currently capable of storing Unicode characters which are not in the current Windows codepage, it's also an enhancement suggestion with decidedly limited usefulness.

(Technical note: TextPad doesn't interpret UTF-8 correctly when opening a file it didn't create either. This is easy to demonstrate by messing around with a binary file editor).

You asked what is UCS-4 good for? I shall explain. UCS-2 is the subset of Unicode consisting of the codepoints from U+0000 to U+FFFF inclusive. Each character is saved as precisely two bytes. However, Unicode doesn't stop at U+FFFF - it goes all the way up to U+10FFFF, so all of the characters between U+010000 and U+10FFFF are as inexpressible in UCS-2 as they are in ASCII. UCS-4, on the other hand, stores codepoints from U+00000000 to U+7FFFFFFF inclusive (it is a 31-bit code space), saved as precisely four bytes per character. Thus, it can store every Unicode character ... as well as the very, very high codepoints beyond U+10FFFF which even Unicode doesn't claim.

The UTF- formats are slightly different, in that they can all store every Unicode character. UTF-8 is a variable-byte-width encoding (each character takes 1, 2, 3 or 4 bytes), and UTF-16 is a variable-word-width encoding (each character takes 1 or 2 16-bit words). UTF-32 is effectively the same as UCS-4 except that codepoints above U+10FFFF are illegal.
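For anyone who wants to see the widths described above in action, here is a small Python sketch. The helper names `utf16_units` and `byte_widths` are invented for this example; the arithmetic in `utf16_units` is the standard surrogate-pair calculation UTF-16 uses for codepoints above U+FFFF, which is exactly the part UCS-2 lacks.

```python
def utf16_units(cp):
    """Hypothetical helper: return the 16-bit UTF-16 code units for a code point."""
    if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is not an encodable Unicode character")
    if cp < 0x10000:
        return [cp]                      # fits in one word, same as UCS-2
    v = cp - 0x10000                     # 20 bits remain
    return [0xD800 | (v >> 10),          # high surrogate: top 10 bits
            0xDC00 | (v & 0x3FF)]        # low surrogate: bottom 10 bits

def byte_widths(ch):
    """Return (UTF-8, UTF-16, UTF-32) sizes in bytes for one character."""
    return (len(ch.encode("utf-8")),
            len(ch.encode("utf-16-le")),
            len(ch.encode("utf-32-le")))

if __name__ == "__main__":
    for ch in ("A", "\u20ac", "\U0001D11E"):
        print(f"U+{ord(ch):04X}", byte_widths(ch),
              [hex(u) for u in utf16_units(ord(ch))])
```

U+1D11E (a musical symbol outside the first 65,536 codepoints) needs two 16-bit words in UTF-16 and four bytes in both UTF-8 and UTF-32, whereas UCS-2 has no way to represent it at all.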

I suggest that people vote for this because, as I said, it's a trivial enhancement, and the only difference you'd notice is a few extra choices on one particular pulldown menu.

However, I'd also like to point you in the direction of a more significant poll ... Unicode Conformance ... which is an important and non-trivial enhancement request. And, in point of fact, the suggestion of THIS thread is going to be pretty useless unless we get proper Unicode support (as in, not destroying characters) first.

mdemirha · Post by **mdemirha** » Mon Dec 22, 2003 6:08 pm

I really don't understand what is so hard about supporting Unicode character sets. I know there are some complications (sorting, searching, etc.), but they are not the highest priority for most people. Being able to save and display Unicode characters, though, is very important for many users, I believe.

I once converted 100,000 lines of an ANSI program to Unicode all on my own in 2-3 weeks. It requires some work and it sometimes drives you mad. But the result is sooo beautiful. If you write your code well enough, the Windows API handles every damn task for you!

I was just going to order multiple licenses for myself and my group, but when I tried to edit a UTF-16 RC file in TextPad I was really disappointed. It is a shame that such a great tool does not have good Unicode support.

Anyway, I will keep an eye on the updates. I hope to see good Unicode support in version 5.

zridling · Post by **zridling** » Mon Jan 05, 2004 9:18 am

This would be a great enhancement, indeed. I'm using UltraEdit when I work using Unicode 4.0.


Do you want to be able to save files in 32 bit unicode format?
