"ANSI" characters?

KimmoA
Posts: 17
Joined: Thu Jan 06, 2005 2:51 am
Location: Sweden

"ANSI" characters?

Post by KimmoA »

Why does it say "ANSI characters"?

It's ASCII up to 128... then what does it use? Some ISO standard? Or some extended ASCII one? "ANSI ANSI"? :?
MudGuard
Posts: 1295
Joined: Sun Mar 02, 2003 10:15 pm
Location: Munich, Germany

Post by MudGuard »

ASCII is a 7-bit code, so "It's ASCII up to 128" is wrong: ASCII only goes up to 127.
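
The 7-bit limit is easy to check. The following is a minimal sketch in Python (not something from the original post; the variable names are only illustrative): byte 127 decodes as ASCII, byte 128 does not.

```python
# ASCII is a 7-bit code, so it only covers code points 0..127.
highest = bytes([127]).decode("ascii")  # DEL control character, still valid ASCII

try:
    bytes([128]).decode("ascii")
    beyond_ascii_ok = True
except UnicodeDecodeError:
    # Byte 128 has its high bit set and is not part of ASCII.
    beyond_ascii_ok = False

print(repr(highest), beyond_ascii_ok)
```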

The ANSI character set is a superset of ISO-8859-1 - in ISO-8859-1 the code positions 128 to 159 are not used, in ANSI they are used - e.g. ANSI 128 is the EURO currency symbol.
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

A good question. Even Microsoft acknowledges that the term "ANSI character" is wrong (http://www.microsoft.com/globaldev/refe ... ssary.mspx):
ANSI: Acronym for the American National Standards Institute. The term “ANSI” as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community. The source of this comes from the fact that the Windows code page 1252 was originally based on an ANSI draft, which became International Organization for Standardization (ISO) Standard 8859-1. “ANSI applications” are usually a reference to non-Unicode or code page–based applications.
Windows-1252 (CP1252) differs from iso-8859-1 in code points 0x80..0x9F. In iso-8859-1 these are control characters; in windows-1252 most of them are printable characters.
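
To make the 0x80..0x9F difference concrete, here is a small sketch using Python's built-in codecs. It assumes Python's `cp1252` and `iso-8859-1` codecs as stand-ins for the two encodings; it says nothing about how TextPad itself handles them.

```python
import unicodedata

defined_in_cp1252 = []    # bytes that windows-1252 maps to a character
undefined_in_cp1252 = []  # bytes that Python's cp1252 codec leaves unassigned

for b in range(0x80, 0xA0):
    raw = bytes([b])
    # ISO-8859-1 maps each byte to the same-numbered code point,
    # which in this range is always a control character (category "Cc").
    assert unicodedata.category(raw.decode("iso-8859-1")) == "Cc"
    try:
        defined_in_cp1252.append(raw.decode("cp1252"))
    except UnicodeDecodeError:
        undefined_in_cp1252.append(b)

# Byte 0x80 is the euro sign in windows-1252.
print(unicodedata.name(b"\x80".decode("cp1252")))
print(len(defined_in_cp1252), [hex(b) for b in undefined_in_cp1252])
```

In Python's codec, 27 of the 32 positions are assigned characters and 5 are left undefined, which matches "most of them are printable".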

Perhaps TextPad tries to use the code page set in Windows' "Regional and Language Options". In English-speaking countries the default is windows-1252.

You can set the code page with
Configure | Preferences | Document Classes | <Class> | Font | Script
or
View | Document Properties | Font | Script
The correspondence is

1250  Central European
1251  Cyrillic
1252  Western; Latin 1
1253  Greek
1254  Turkish
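
As an illustration of why the script choice matters, the sketch below decodes one and the same byte under each of the code pages listed above. It uses Python's codec names (`cp1250` to `cp1254`), which is an assumption made for the example and not a description of TextPad's internals.

```python
import unicodedata

# Script names as listed in the table above, keyed by Python's codec name.
SCRIPTS = {
    "cp1250": "Central European",
    "cp1251": "Cyrillic",
    "cp1252": "Western; Latin 1",
    "cp1253": "Greek",
    "cp1254": "Turkish",
}

raw = b"\xe0"  # a single byte above 127

# Decode the same byte once per code page.
decoded = {codec: raw.decode(codec) for codec in SCRIPTS}

for codec, ch in decoded.items():
    print(f"{codec}  {SCRIPTS[codec]:<18} U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

The byte stays the same; only the interpretation changes. For instance, it is a Latin "a with grave" under 1252 and a Cyrillic "a" under 1251.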
bbadmin
Site Admin
Posts: 854
Joined: Mon Feb 17, 2003 8:54 pm

Post by bbadmin »

The term ANSI is used, because the characters defined for code points between 128 and 255 depend on the selected font script. For example, Western characters are different from Central European, and two bytes are needed for many Japanese characters. These characters are defined by ANSI standards - although Windows may not adhere to those standards.

Keith MacDonald
Helios Software Solutions
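
The point about Japanese needing two bytes can be sketched as follows. The example assumes Windows code page 932 (Python's `cp932` codec) as the Japanese "ANSI" code page, which the post does not name.

```python
# Three Japanese characters, encoded with the Windows Japanese code page.
text = "日本語"
encoded = text.encode("cp932")

# Plain ASCII letters still take one byte each in the same code page.
ascii_encoded = "A".encode("cp932")

print(len(text), len(encoded), len(ascii_encoded))
```

Each of the three kanji takes two bytes, while the ASCII letter takes one, so such code pages mix single-byte and double-byte characters.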
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

MudGuard wrote:The ANSI character set is a superset of ISO-8859-1 - in ISO-8859-1 the code positions 128 to 159 are not used, in ANSI they are used - e.g. ANSI 128 is the EURO currency symbol.
They are used. They're control characters. If you put

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
in your web page and then include any of the characters in the range 0x80..0x9F, hoping that they will be interpreted as in windows-1252, the Prince of Darkness will appear in your sitting room.
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

bbadmin wrote:These characters are defined by ANSI standards - although Windows may not adhere to those standards.
Nope. They're defined by MS, not ANSI. The encodings CP-125? are not the same as the encodings iso-8859-*.
:-)
KimmoA
Posts: 17
Joined: Thu Jan 06, 2005 2:51 am
Location: Sweden

Post by KimmoA »

Hehe... at least it seems less than obvious.

It has bugged me for a while. I knew, of course, what ASCII is. I knew that ANSI is a standards body.

I had read something about "extended" ASCII, to be called simply "ANSI", which would make it "ANSI ANSI" in my eyes...

If it's ISO-8859-1 or similar, a great burden of confusion would be taken away from the "back" of my mind.