"ANSI" characters?
Posted: Mon Jan 10, 2005 7:11 am
by KimmoA
Why does it say "ANSI characters"?
It's ASCII up to 128... then what does it use? Some ISO standard? Or some extended ASCII one? "ANSI ANSI"?

Posted: Mon Jan 10, 2005 9:53 am
by MudGuard
ASCII is a 7-bit code, so "It's ASCII up to 128" is wrong: ASCII only goes up to 127.
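As a quick check of the 7-bit limit, here is a minimal Python sketch (not part of the original thread; it relies only on the standard `ascii` codec). It decodes every byte value from 0 to 127 and then shows that byte 128 is rejected:

```python
# ASCII defines code points 0..127 only (7 bits).
all_ascii = bytes(range(128))
text = all_ascii.decode("ascii")          # succeeds: every byte is < 128
highest = max(ord(ch) for ch in text)

# Byte 128 (0x80) has the eighth bit set, so it is not ASCII.
try:
    bytes([128]).decode("ascii")
    byte_128_is_ascii = True
except UnicodeDecodeError:
    byte_128_is_ascii = False

print(highest, byte_128_is_ascii)
```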
The ANSI character set is a superset of ISO-8859-1 - in ISO-8859-1 the code positions 128 to 159 are not used, in ANSI they are used - e.g. ANSI 128 is the EURO currency symbol.
Posted: Mon Jan 10, 2005 10:09 am
by ben_josephs
A good question. Even Microsoft acknowledges that the term "ANSI character" is wrong (http://www.microsoft.com/globaldev/refe ... ssary.mspx):
ANSI: Acronym for the American National Standards Institute. The term “ANSI” as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community. The source of this comes from the fact that the Windows code page 1252 was originally based on an ANSI draft - which became International Organization for Standardization (ISO) Standard 8859-1. “ANSI applications” are usually a reference to non-Unicode or code page–based applications.
Windows-1252 (CP1252) differs from iso-8859-1 in code points 0x80..0x9F. In iso-8859-1 these are control characters; in windows-1252 most of them are printable characters.
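The difference in 0x80..0x9F can be seen with a short Python sketch (an illustration added here, not something from the thread). It assumes Python's built-in `iso-8859-1` and `cp1252` codecs are faithful to the two encodings; note that Python's `cp1252` table leaves a few positions in this range unassigned, which is why the loop catches decode errors:

```python
import unicodedata

c1_range = range(0x80, 0xA0)

# ISO-8859-1 maps each byte to the Unicode code point with the same number,
# so 0x80..0x9F become C1 control characters (general category "Cc").
latin1_categories = {
    unicodedata.category(bytes([b]).decode("iso-8859-1")) for b in c1_range
}

# Windows-1252 assigns printable characters to most of the same bytes.
cp1252_chars = {}
for b in c1_range:
    try:
        cp1252_chars[b] = bytes([b]).decode("cp1252")
    except UnicodeDecodeError:
        pass  # position unassigned in Python's cp1252 table

print(latin1_categories)
print(len(cp1252_chars), "of", len(c1_range), "bytes are assigned in cp1252")
# Print the code point rather than the character to stay console-safe.
print(f"0x80 in cp1252 is U+{ord(cp1252_chars[0x80]):04X}")
```

Byte 0x80 comes out as U+20AC, the euro sign, under `cp1252`, while under `iso-8859-1` the same byte is the control character U+0080.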
Perhaps TextPad tries to use the code page set in Windows' "Regional and Language Options". In English-speaking countries the default is windows-1252.
You can set the code page with
Configure | Preferences | Document Classes | <Class> | Font | Script
or
View | Document Properties | Font | Script
The correspondence is
Code:
1250 Central European
1251 Cyrillic
1252 Western; Latin 1
1253 Greek
1254 Turkish
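To illustrate why the script setting matters, the following Python sketch (added for illustration; the byte value 0xE6 is an arbitrary choice, and the `cpNNNN` codec names are Python's, not TextPad's) decodes one and the same byte under each of the code pages listed above:

```python
# Script names as listed above, keyed by Windows code page number.
scripts = {
    1250: "Central European",
    1251: "Cyrillic",
    1252: "Western; Latin 1",
    1253: "Greek",
    1254: "Turkish",
}

raw = b"\xe6"  # a single byte; only the interpretation changes

decoded = {cp: raw.decode(f"cp{cp}") for cp in scripts}

for cp, ch in decoded.items():
    # Print code points instead of glyphs so the output is console-safe.
    print(f"cp{cp} ({scripts[cp]}): U+{ord(ch):04X}")
```

The same byte yields a Latin letter with an acute accent in 1250, a Cyrillic letter in 1251, and a Greek letter in 1253, which is exactly why a file's bytes alone do not say which characters were meant.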
Posted: Mon Jan 10, 2005 10:12 am
by bbadmin
The term ANSI is used, because the characters defined for code points between 128 and 255 depend on the selected font script. For example, Western characters are different from Central European, and two bytes are needed for many Japanese characters. These characters are defined by ANSI standards - although Windows may not adhere to those standards.
Keith MacDonald
Helios Software Solutions
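The point about Japanese needing two bytes can be sketched in Python as well. This is an added illustration that assumes the Japanese script corresponds to Windows code page 932 (Python's `cp932` codec), which the thread itself does not state:

```python
# U+3042 is HIRAGANA LETTER A; cp932 is assumed to be the relevant code page.
hiragana_a = "\u3042"
encoded = hiragana_a.encode("cp932")
print(len(encoded), encoded.hex())

# ASCII letters still take a single byte in the same code page.
ascii_len = len("A".encode("cp932"))
print(ascii_len)
```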
Posted: Mon Jan 10, 2005 10:13 am
by ben_josephs
MudGuard wrote: The ANSI character set is a superset of ISO-8859-1 - in ISO-8859-1 the code positions 128 to 159 are not used, in ANSI they are used - e.g. ANSI 128 is the EURO currency symbol.
They are used. They're control characters. If you put
Code:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
in your web page and then include any of the characters in the range 0x80..0x9F, hoping that they will be interpreted as in windows-1252, the Prince of Darkness will appear in your sitting room.
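What goes wrong can be shown with a small Python sketch (an added illustration; the sample text is made up, and the decoder is assumed to take the `iso-8859-1` label literally, as Python's codec does):

```python
# Text containing the euro sign, saved by an editor as windows-1252.
page_bytes = "Price: 5 \u20ac".encode("cp1252")   # euro sign becomes byte 0x80

# A consumer that honours charset=iso-8859-1 literally decodes 0x80
# as the C1 control character U+0080, not as the euro sign.
as_latin1 = page_bytes.decode("iso-8859-1")
as_cp1252 = page_bytes.decode("cp1252")

print(f"read as iso-8859-1: U+{ord(as_latin1[-1]):04X}")
print(f"read as cp1252:     U+{ord(as_cp1252[-1]):04X}")

# The euro sign has no position in iso-8859-1 at all.
try:
    "\u20ac".encode("iso-8859-1")
    euro_in_latin1 = True
except UnicodeEncodeError:
    euro_in_latin1 = False
print(euro_in_latin1)
```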
Posted: Mon Jan 10, 2005 10:17 am
by ben_josephs
bbadmin wrote: These characters are defined by ANSI standards - although Windows may not adhere to those standards.
Nope. They're defined by MS, not ANSI. The encodings CP-125? are not the same as the encodings iso-8859-*.

Posted: Mon Jan 10, 2005 11:16 am
by KimmoA
Hehe... at least it seems less than obvious.
It has bugged me for a while. I knew, of course, what ASCII is. I knew that ANSI is a standards body.
I had read something about "extended" ASCII, to be called simply "ANSI", which would make it "ANSI ANSI" in my eyes...
If it's ISO-8859-1 or similar, a great burden of confusion would be taken away from the "back" of my mind.