This may seem real stupid, but:
If I edit a reg entry in WordPad and save it in Unicode format (the only Unicode choice), all is OK.
If I want to edit it in TextPad, is it then Unicode, Unicode (big endian), or UTF-8? And what is big endian?
What the heck are the diffs? I've gotten so screwed up a couple of times that it took me forever to finally get it saved right...
Regards,
Chuck Billow
Unicode?
ben_josephs
Unicode is a character encoding standard. Each Unicode character is assigned a number (its code point) that can be stored in 21 bits.
There are a number of formats in which these 21-bit numbers can be stored and transmitted, including:
    UTF-32BE (32 bits, big-endian)
    UTF-32LE (32 bits, little-endian)
    UTF-16BE (16 bits, big-endian)
    UTF-16LE (16 bits, little-endian)
    UTF-8 (8 bits)
In UTF-32 (BE or LE), each character is represented as a single 32-bit (4-byte) number. In UTF-16 (BE or LE), each character is represented as one or two 16-bit (2-byte) numbers. In UTF-8, each character is represented as one, two, three or four 8-bit (1-byte) numbers.
For the multi-byte representations (UTF-32 and UTF-16), the byte order within each 32-bit or 16-bit number can be least significant byte first (LE, little-endian, Intel byte order) or most significant byte first (BE, big-endian, Sun byte order, network byte order).
For ASCII characters (whose values are in the range 0..127 and can be represented in a 7-bit number) the ASCII representation and UTF-8 representation are identical.
The representation and byte order can be indicated by the presence of a byte order mark (BOM - Unicode character U+FEFF) at the beginning of the text. In the above representations this is
    UTF-32BE 00 00 FE FF
    UTF-32LE FF FE 00 00
    UTF-16BE FE FF
    UTF-16LE FF FE
    UTF-8 EF BB BF
TextPad doesn't handle UTF-32 (BE or LE).
What TextPad (misleadingly) calls Unicode is UTF-16LE.
What TextPad (misleadingly) calls Unicode (big endian) is UTF-16BE.
What TextPad (correctly) calls UTF-8 is UTF-8.
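If you want to see these byte sequences for yourself, a short script can print them. The following is only a minimal sketch in Python, which has nothing to do with TextPad; the codec names are Python's own, and the sample characters and the helper names `hex_bytes`, `encoded_forms` and `bom_table` are made up for illustration.

```python
# Sample characters (chosen for illustration): plain ASCII, an accented
# Latin letter, and a character outside the 16-bit range.
SAMPLES = ["A", "\u00e9", "\U0001F600"]

# The five formats discussed above, under Python's codec names.
FORMATS = ["utf-32-be", "utf-32-le", "utf-16-be", "utf-16-le", "utf-8"]


def hex_bytes(data: bytes) -> str:
    """Render bytes as space-separated upper-case hex, e.g. 'FF FE'."""
    return " ".join(f"{b:02X}" for b in data)


def encoded_forms(ch: str) -> dict:
    """Return the bytes of one character in each format, as hex strings."""
    return {name: hex_bytes(ch.encode(name)) for name in FORMATS}


def bom_table() -> dict:
    """Encode U+FEFF in each format; the result is that format's BOM."""
    return {name: hex_bytes("\ufeff".encode(name)) for name in FORMATS}


if __name__ == "__main__":
    for ch in SAMPLES:
        print(f"U+{ord(ch):04X}")
        for name, hexed in encoded_forms(ch).items():
            print(f"  {name:<10} {hexed}")
    print("BOM (U+FEFF)")
    for name, hexed in bom_table().items():
        print(f"  {name:<10} {hexed}")
```

The explicitly ordered codecs (`utf-16-le`, `utf-16-be` and so on) do not add a BOM on their own, which is why encoding U+FEFF by hand reproduces the BOM table.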
For Windows registry files you need UTF-16LE with BOM. If you open an existing registry file in binary format, you will see that it begins with FF FE.
Select Configure | Preferences | Document Classes | <Class> | Write Unicode and UTF-8 BOM
or View | Document Properties | Preferences | Write Unicode and UTF-8 BOM
and use
File | Save As... | Encoding: Unicode
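If you ever need to produce or check such a file outside TextPad, the same thing can be done in a few lines of script. This is a rough sketch in Python under stated assumptions: the key `HKEY_CURRENT_USER\Software\ExampleKey` and its value are hypothetical placeholders, and `write_reg_file` and `detect_bom` are illustrative helper names. It only writes a file and reads it back; it does not touch the registry.

```python
import codecs
import os
import tempfile

# Hypothetical registry content, for illustration only.
REG_TEXT = (
    "Windows Registry Editor Version 5.00\r\n"
    "\r\n"
    "[HKEY_CURRENT_USER\\Software\\ExampleKey]\r\n"
    "\"ExampleValue\"=\"hello\"\r\n"
)


def write_reg_file(path: str, text: str) -> None:
    """Write text as UTF-16LE, preceded by the FF FE byte order mark."""
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF16_LE)       # the two bytes FF FE
        f.write(text.encode("utf-16-le"))  # this codec adds no BOM itself


def detect_bom(path: str) -> str:
    """Guess the format of a file from its leading byte order mark."""
    with open(path, "rb") as f:
        head = f.read(4)
    # Test the 4-byte UTF-32 marks first: the UTF-32LE mark (FF FE 00 00)
    # begins with the same two bytes as the UTF-16LE mark (FF FE).
    if head.startswith(codecs.BOM_UTF32_BE):
        return "UTF-32BE"
    if head.startswith(codecs.BOM_UTF32_LE):
        return "UTF-32LE"
    if head.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if head.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16BE"
    if head.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16LE"
    return "no BOM"


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "example.reg")
    write_reg_file(path, REG_TEXT)
    print(path, detect_bom(path))
```

Writing in binary mode with an explicit BOM avoids depending on the machine's native byte order, which is what Python's plain `utf-16` codec would use.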
Ben:
That was not only thorough, but even for me understandable.
Thanks,
Chuck