Why is the following incorrect syntax?
class CharactersInPlay
{
    public static void main (String[] args)
    {
        char a = 'u\16848'; //I tried escaping i.e. 'u\\16848', and, also other characters
        System.out.println(a);
    }
}
References
Bamum Supplement Unicode chart at http://www.unicode.org/charts/PDF/U16800.pdf
SCJP 1.6 Exam Prep by K. Sierra and B. Bates, p. 189
Character literals?
ben_josephs wrote:
Java's chars are 16 bits (4 hex digits) wide, enough to hold the characters of the basic multilingual plane.
If you need to use wider ("supplementary") characters you'll have to use surrogate pairs.
Or ints.

UTF-16 (16-bit Unicode Transformation Format) is a character encoding for Unicode capable of encoding 1,112,064 numbers (called code points) in the Unicode code space from 0 to 0x10FFFF. It produces a variable-length result of either one or two 16-bit code units per code point.
http://en.wikipedia.org/wiki/UTF-16/UCS ... U.2B10FFFF
A code point is a code value that is associated with a character in an encoding scheme. In the Unicode standard, code points are written in hexadecimal and prefixed with U+, such as U+0041 for the code point of the letter A. Unicode has code points that are grouped into 17 code planes.
The first code plane, called the basic multilingual plane, consists of the classic Unicode characters with code points U+0000 to U+FFFF. Sixteen additional planes, with code points U+10000 to U+10FFFF, hold the supplementary characters.
Core Java Vol. I (p. 43)
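To make the plane layout concrete, here is a minimal Java sketch. It is not taken from Core Java; the class name CodePointPlanes and the helper plane() are made up for illustration. It assumes the plane number is simply the code point shifted right by 16 bits, since each plane spans 0x10000 code points, and it uses the standard Character methods to report whether a code point is supplementary and how many chars it needs.

```java
class CodePointPlanes {
    // Hypothetical helper: each plane holds 0x10000 code points,
    // so the plane index is the code point divided by 0x10000.
    static int plane(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a Unicode code point: " + codePoint);
        }
        return codePoint >>> 16;
    }

    public static void main(String[] args) {
        // Sample values: letter A, last BMP code point, first supplementary
        // code point, the Bamum code point from the question, last code point.
        int[] samples = { 0x0041, 0xFFFF, 0x10000, 0x16848, 0x10FFFF };
        for (int cp : samples) {
            System.out.printf("U+%04X plane=%d supplementary=%b chars=%d%n",
                    cp,
                    plane(cp),
                    Character.isSupplementaryCodePoint(cp),
                    Character.charCount(cp));
        }
    }
}
```

For U+16848 this reports plane 1, supplementary, and two chars, which is why it cannot fit in a single char literal.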
Can you provide an example?
Jon
ben_josephs wrote:
Indeed. The page you refer to also explains
Code points from the other planes (called Supplementary Planes) are encoded in UTF-16 by pairs of 16-bit code units called a surrogate pair...
Did you read the whole of that page or the Oracle Java page it refers to?

Yes, but I am down with coffee today and require an example, if possible.
Jon
ben_josephs wrote:
How you do it depends on the requirements of whatever function you're going to pass it to.
And I've never done it, so I don't have an example.

Oh, I see. Well, I am reading the document at http://www.unicode.org/versions/Unicode6.1.0/ch02.pdf
Jon
Re: Character literals?
jon80 wrote:
Why is the following incorrect syntax?
char a = 'u\16848'; //I tried escaping i.e. 'u\\16848', and, also

First of all, it is an escaped u that starts a Unicode character:
'\u0064'
Codes above \uFFFF must be encoded in two parts.
For details see
http://java.sun.com/developer/technical ... lementary/
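As a rough illustration of the "two parts", the sketch below works out the surrogate pair for U+16848 by hand and compares it with what Character.toChars returns. The class name SurrogateSketch and the helpers high() and low() are hypothetical names chosen for this example; the arithmetic is the usual UTF-16 rule of subtracting 0x10000 and splitting the remaining 20 bits into two 10-bit halves.

```java
class SurrogateSketch {
    // High (leading) surrogate: 0xD800 plus the top 10 bits of (codePoint - 0x10000).
    static char high(int codePoint) {
        requireSupplementary(codePoint);
        return (char) (0xD800 + ((codePoint - 0x10000) >> 10));
    }

    // Low (trailing) surrogate: 0xDC00 plus the bottom 10 bits of (codePoint - 0x10000).
    static char low(int codePoint) {
        requireSupplementary(codePoint);
        return (char) (0xDC00 + ((codePoint - 0x10000) & 0x3FF));
    }

    private static void requireSupplementary(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException(
                    "not a supplementary code point: " + Integer.toHexString(codePoint));
        }
    }

    public static void main(String[] args) {
        int cp = 0x16848; // the Bamum code point from the question
        char[] fromLibrary = Character.toChars(cp);
        System.out.printf("manual:  %04X %04X%n", (int) high(cp), (int) low(cp));
        System.out.printf("library: %04X %04X%n", (int) fromLibrary[0], (int) fromLibrary[1]);
        // Both lines show D81A DC48.
    }
}
```

In practice there is no need to do this arithmetic yourself; Character.toChars does it, and the manual version is only here to show where the two 16-bit values come from.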
(Whether a Java char or a Character can hold Unicode characters above \uFFFF I don't know; I know they can be used in a String.)
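On that last point: a char (and so a Character) is a single 16-bit code unit, so it cannot hold a code point above U+FFFF on its own. An int or a String can. The following is a small sketch of both options for the character in the original question; the class name SupplementaryInPlay and the helper asString() are invented for this example, and whether the glyph actually shows up when printed depends on the console encoding and installed fonts.

```java
class SupplementaryInPlay {
    // The Bamum code point from the question, kept as an int because it does not fit in a char.
    static final int BAMUM = 0x16848;

    // Hypothetical helper: build a String holding exactly one code point.
    static String asString(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = asString(BAMUM);

        System.out.println(s.length());                      // 2 (UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length())); // 1 (actual character)
        System.out.println(s.codePointAt(0) == BAMUM);       // true

        // The same character written as a literal surrogate pair inside a String.
        String literal = "\uD81A\uDC48";
        System.out.println(s.equals(literal));               // true

        // May print '?' or a box if the console cannot display the glyph.
        System.out.println(s);
    }
}
```

So the fix for the original snippet is to declare the variable as an int or a String rather than a char.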