I am a complete novice with this...
Are there advantages or disadvantages to editing files in Unicode as opposed to ASCII?
That being said, if a file is in Unicode, is the default UTF-8 or UTF-16?
Regards,
Chuck Billow
Unicode?
Keeping it simple:
ASCII characters can be represented in 7 bits, so there are 128 different ones (codes 0 to 127). This is generally sufficient for US English texts and most programming languages.
ANSI characters are stored in 8 bits (1 byte). The first 128 are the same as ASCII, while the remaining 128 depend on the code page in use, so they differ between languages (e.g. é and ï in Western European code pages). Languages such as Chinese and Japanese need far more characters than that, so their code pages use one or two bytes per character.
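To make the code page point concrete, here is a minimal Python sketch. The codec names (cp1252, cp1251, shift_jis) are ones Python ships with; the helper name decode_with is only made up for this illustration.

```python
def decode_with(raw, codec):
    """Decode the same raw bytes using the given 8-bit code page."""
    return raw.decode(codec)

# 'A' (0x41) is ASCII and identical everywhere; 0xE9 depends on the code page.
raw = bytes([0x41, 0xE9])
for codec in ("cp1252", "cp1251"):
    print(codec, decode_with(raw, codec))

# In a Japanese code page an ASCII letter takes 1 byte, a kanji takes 2.
print(len("A".encode("shift_jis")), len("日".encode("shift_jis")))
```

So a file saved under one ANSI code page can show different accented characters when opened under another, which is the problem Unicode avoids.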
Unicode assigns every character a code point in the range 0 to 1,114,111 (U+10FFFF), which fits in 21 bits, so there could theoretically be 1,114,112 different ones, with the first 128 being the same as ASCII.
So, if you need to edit text files in more than one language, Unicode is the answer. Files can be stored in UTF-8 or UTF-16. The advantage of UTF-8 is that files containing only ASCII characters don't take up any extra space, but characters with code points > 127 are encoded in 2 to 4 bytes. In UTF-16, characters with code points < 65,536 are stored in 2 bytes, otherwise 4 bytes are required, but that only applies to characters outside the Basic Multilingual Plane, such as emoji and rarely used ideographs.
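If you want to check those sizes yourself, the following is a small Python sketch. The function name encoded_sizes is just an illustration, and it uses the "utf-16-le" codec so that the optional byte order mark is not counted.

```python
def encoded_sizes(text):
    """Return how many bytes the text occupies in UTF-8 and in UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" avoids the 2-byte byte order mark that "utf-16" prepends.
        "utf-16": len(text.encode("utf-16-le")),
    }

# An ASCII letter, an accented letter, a kanji and an emoji.
for sample in ("A", "é", "日", "😀"):
    print(repr(sample), encoded_sizes(sample))
```

The ASCII letter takes 1 byte in UTF-8 and 2 in UTF-16, while the kanji takes 3 bytes in UTF-8 and 2 in UTF-16, so which encoding is more compact depends on the text.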
Google will turn up plenty of reading matter on the subject, but this should be enough for you to decide if you actually need to use Unicode.
Keith MacDonald
Helios Software Solutions
Re: Unicode?
I tried to find similar questions (along with answers) which may help you in understanding what you asked:
[1] Are there advantages or disadvantages to using a 64bit operating system as opposed to 32bit?
CWBillow wrote: Are there advantages or disadvantages to editing files in Unicode as opposed to ASCII?
[2] If a file is an MP4, is the default resolution 800x600 or 1920x1080?
CWBillow wrote: if a file is in Unicode, is the default UTF-8 or UTF-16?
- Yes, each has both advantages and disadvantages. A 32bit operating system suffices for most needs if you need only about 3.5 GiB of RAM or less, while a 64bit operating system has to draw a line somewhere at which legacy stuff is no longer supported (e.g. 16bit executables or 32bit drivers).
- There is no default: a file is just a file, and its content has no default. Video files have a section that defines the output resolution; UTF-8, UTF-16 and UTF-32 files may also start with a recognition marker (the byte order mark, or BOM), but that is optional.
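As a rough sketch of how that optional marker can be checked, here is a bit of Python using the BOM constants from the standard codecs module. The function name sniff_bom is made up for this example, and a result of None only means no marker was found, not that the file is ASCII or ANSI.

```python
import codecs

# Longer marks are tested first: the UTF-32 LE mark begins with the
# same two bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data):
    """Return the encoding named by a leading byte order mark, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None

print(sniff_bom("hi".encode("utf-8-sig")))  # file saved with a UTF-8 marker
print(sniff_bom(b"hi"))                     # no marker at all
```

This is only a guess based on the first few bytes: a file without a marker may still be UTF-8, and a UTF-16 LE file whose first character is U+0000 would be mistaken for UTF-32 LE.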