Character literals?

jon80 · Post by **jon80** » Tue Jul 17, 2012 7:01 am

Why is the following incorrect syntax?

class CharactersInPlay
{

public static void main (String[] args)
{
char a = 'u\16848'; //I tried escaping i.e. 'u\\16848', and, also other characters
System.out.println(a);
}

}

References
Bamum Supplement Unicode chart at http://www.unicode.org/charts/PDF/U16800.pdf.
SCJP 1.6 Exam Prep by K. Sierra and B. Bates, p. 189

ben_josephs · Post by **ben_josephs** » Tue Jul 17, 2012 8:40 am

Java's chars are 16 bits (4 hex digits) wide, enough to hold the characters of the basic multilingual plane.

If you need to use wider ("supplementary") characters you'll have to use surrogate pairs.

Or ints.
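A minimal sketch of the two options above. It assumes the character from the original question, U+16848; the class and method names are hypothetical. `Character.toChars(int)` and `String.codePointAt(int)` are standard JDK methods (Java 5+), and `\uD81A\uDC48` is the UTF-16 surrogate pair that U+16848 works out to.

```java
// Hypothetical helper class for illustration only
public class SupplementaryAsIntOrPair {

    // U+16848, from the Bamum Supplement block (outside the BMP)
    static final int CODE_POINT = 0x16848;

    // Option 1: keep the character as an int code point and let the JDK
    // produce the chars; toChars returns one char for BMP code points and
    // two (a surrogate pair) for supplementary ones
    static String fromInt(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Option 2: write the surrogate pair directly as two char escapes
    static String fromPair() {
        return "\uD81A\uDC48";
    }

    public static void main(String[] args) {
        String a = fromInt(CODE_POINT);
        String b = fromPair();
        System.out.println(a.equals(b));                            // true
        System.out.println(Integer.toHexString(a.codePointAt(0)));  // 16848
    }
}
```

Both strings hold the same single character. The int form is usually the easier one to work with, since the JDK computes the surrogates for you.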

jon80 · Post by **jon80** » Tue Jul 17, 2012 8:43 am

ben_josephs wrote:Java's chars are 16 bits (4 hex digits) wide, enough to hold the characters of the basic multilingual plane.

If you need to use wider ("supplementary") characters you'll have to use surrogate pairs.

Or ints.

UTF-16 (16-bit Unicode Transformation Format) is a character encoding for Unicode capable of encoding 1,112,064 numbers (called code points) in the Unicode code space from 0 to 0x10FFFF. It produces a variable-length result of either one or two 16-bit code units per code point.
http://en.wikipedia.org/wiki/UTF-16/UCS ... U.2B10FFFF
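To see the "one or two 16-bit code units per code point" rule from Java, here is a small sketch (class and method names are hypothetical). It uses `Character.charCount(int)`, `String.length()`, which counts chars (code units), and `String.codePointCount(int, int)`, which counts code points.

```java
// Hypothetical helper class for illustration only
public class CodeUnitsPerCodePoint {

    // Number of UTF-16 code units (Java chars) needed for a code point:
    // 1 for U+0000..U+FFFF, 2 for U+10000..U+10FFFF
    static int unitsFor(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(unitsFor(0x0041));   // 1 (LATIN CAPITAL LETTER A)
        System.out.println(unitsFor(0x16848));  // 2 (needs a surrogate pair)

        // "A" followed by U+16848: three chars, but only two code points
        String s = "A" + new String(Character.toChars(0x16848));
        System.out.println(s.length());                       // 3
        System.out.println(s.codePointCount(0, s.length()));  // 2
    }
}
```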

A code point is a code value that is associated with a character in an encoding scheme. In the Unicode standard, code points are written in hexadecimal and prefixed with U+, such as U+0041 for the code point of the letter A. Unicode has code points that are grouped into 17 code planes.

The first code plane, called the basic multilingual plane, consists of the classic Unicode characters with code points U+0000 to U+FFFF. Sixteen additional planes, with code points U+10000 to U+10FFFF, hold the supplementary characters.

Core Java, Vol. I (p. 43)
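Since each plane spans 0x10000 code points, the plane number is just the code point shifted right by 16 bits. A sketch under that reading of the quote (the class and method names are hypothetical); `Character.isValidCodePoint` and `Character.isSupplementaryCodePoint` are standard JDK methods.

```java
// Hypothetical helper class for illustration only
public class UnicodePlanes {

    // Plane 0 is the BMP; planes 1..16 hold the supplementary characters
    static int planeOf(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a code point: " + codePoint);
        }
        return codePoint >>> 16;
    }

    public static void main(String[] args) {
        int[] samples = {0x0041, 0xFFFF, 0x10000, 0x16848, 0x10FFFF};
        for (int cp : samples) {
            System.out.printf("U+%04X plane %d supplementary=%b%n",
                    cp, planeOf(cp), Character.isSupplementaryCodePoint(cp));
        }
    }
}
```

With these samples, U+0041 and U+FFFF land in plane 0, U+16848 in plane 1, and U+10FFFF in plane 16, the last of the 17.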

Can you provide an example?

ben_josephs · Post by **ben_josephs** » Tue Jul 17, 2012 8:58 am

Indeed. The page you refer to also explains
Code points from the other planes (called Supplementary Planes) are encoded in UTF-16 by pairs of 16-bit code units called a surrogate pair...
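The surrogate-pair arithmetic can be done by hand, which also shows where `\uD81A\uDC48` comes from for U+16848: subtract 0x10000, put the top 10 bits of the 20-bit result into a high surrogate (0xD800 + bits) and the bottom 10 bits into a low surrogate (0xDC00 + bits). This is a sketch of that standard UTF-16 rule; the class and method names are hypothetical. It is checked against the JDK's `Character.toChars`.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration only
public class SurrogatePairMath {

    // Split a supplementary code point (U+10000..U+10FFFF) into a surrogate pair
    static char[] toSurrogates(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not supplementary: " + codePoint);
        }
        int v = codePoint - 0x10000;                 // 20-bit value
        char high = (char) (0xD800 + (v >>> 10));    // top 10 bits
        char low = (char) (0xDC00 + (v & 0x3FF));    // bottom 10 bits
        return new char[] {high, low};
    }

    // Reverse direction: recombine a surrogate pair into a code point
    static int fromSurrogates(char high, char low) {
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        char[] pair = toSurrogates(0x16848);
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]);       // D81A DC48
        System.out.println(Arrays.equals(pair, Character.toChars(0x16848)));  // true
        System.out.println(Integer.toHexString(fromSurrogates(pair[0], pair[1]))); // 16848
    }
}
```

In real code, `Character.toChars(int)` and `Character.toCodePoint(char, char)` do the same conversions, so the manual version is only useful for understanding the encoding.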

Did you read the whole of that page or the Oracle Java page it refers to?

jon80 · Post by **jon80** » Tue Jul 17, 2012 9:02 am

ben_josephs wrote:Indeed. The page you refer to also explains
Code points from the other planes (called Supplementary Planes) are encoded in UTF-16 by pairs of 16-bit code units called a surrogate pair...

Did you read the whole of that page or the Oracle Java page it refers to?

Yes, but I am down with coffee today and require an example, if possible.

ben_josephs · Post by **ben_josephs** » Tue Jul 17, 2012 9:20 am

How you do it depends on the requirements of whatever function you're going to pass it to.
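As an illustration of why the target function matters: many JDK methods come in a `char` overload and an `int` (code point) overload, and only the latter sees a supplementary character as a whole. A sketch using standard methods (`Character.isHighSurrogate`, `Character.isLetter(char)` / `isLetter(int)`, `StringBuilder.appendCodePoint`); the class name is hypothetical.

```java
// Hypothetical class for illustration only
public class CodePointApis {

    public static void main(String[] args) {
        int cp = 0x16848;
        String s = new String(Character.toChars(cp));

        // char-based overloads see only one half of the pair
        char high = s.charAt(0);
        System.out.println(Character.isHighSurrogate(high)); // true
        System.out.println(Character.isLetter(high));        // false: a lone surrogate is not a letter

        // int-based overloads (Java 5+) take the whole code point; the result
        // depends on the Unicode data bundled with the running JDK
        System.out.println(Character.isLetter(cp));
        System.out.println(s.codePointAt(0) == cp);           // true

        // building text from a code point without computing surrogates by hand
        StringBuilder sb = new StringBuilder("x");
        sb.appendCodePoint(cp);
        System.out.println(sb.length());                      // 3
    }
}
```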

And I've never done it, so I don't have an example.

jon80 · Post by **jon80** » Tue Jul 17, 2012 9:21 am

ben_josephs wrote:How you do it depends on the requirements of whatever function you're going to pass it to.

And I've never done it, so I don't have an example.

Oh, I see. Well, I am reading the document at http://www.unicode.org/versions/Unicode6.1.0/ch02.pdf.

MudGuard · Post by **MudGuard** » Tue Jul 17, 2012 1:16 pm

jon80 wrote:Why is the following incorrect syntax?
char a = 'u\16848'; //I tried escaping i.e. 'u\\16848', and, also

First of all, it is an escaped u that starts a Unicode character:
'\u0064'
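A Unicode escape always consumes exactly four hex digits, so even with the backslash in the right place, `'\u16848'` is still rejected: it reads as `\u1684` followed by a plain `8`, which is two characters and therefore not a valid char literal. The same text inside a String compiles, which makes the split visible. A sketch (the class name is hypothetical):

```java
// Hypothetical class for illustration only
public class UnicodeEscapeLength {

    public static void main(String[] args) {
        char d = '\u0064';                 // exactly four hex digits: the letter d
        System.out.println(d);             // d

        // An escape takes exactly four hex digits, so the five-digit attempt
        // below becomes the character U+1684 followed by an ordinary '8'
        String s = "\u16848";
        System.out.println(s.length());                       // 2
        System.out.println(Integer.toHexString(s.charAt(0))); // 1684
        System.out.println(s.charAt(1));                      // 8
    }
}
```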

Codes above \uFFFF must be encoded in two parts.
For details see
http://java.sun.com/developer/technical ... lementary/
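Because the two parts are separate chars, code that walks a string one char at a time sees each half of the pair on its own. A sketch of walking by code point instead, using the standard `String.codePointAt` and `Character.charCount`; the class and method names are hypothetical.

```java
// Hypothetical helper class for illustration only
public class IterateCodePoints {

    // Walk a string one code point at a time: codePointAt combines a
    // surrogate pair into one int, and charCount says how far to advance
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            System.out.printf("index %d: U+%04X%n", i, cp);
            i += Character.charCount(cp);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // U+16848 written as its two surrogate escapes, between two BMP letters
        String s = "a\uD81A\uDC48b";
        System.out.println(s.length());          // 4 (chars)
        System.out.println(countCodePoints(s));  // 3 (code points)
    }
}
```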

(Whether a Java char or a Character can hold Unicode characters above \uFFFF I don't know; I know they can be used in a String.)
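The open question can be checked directly: a `char` is an unsigned 16-bit value, so `Character.MAX_VALUE` is U+FFFF, and a `Character` object just wraps one `char`. A sketch (class name hypothetical) showing that casting a supplementary code point to `char` silently truncates it, while a String holds it as two chars.

```java
// Hypothetical class for illustration only
public class CharRange {

    public static void main(String[] args) {
        // A char is an unsigned 16-bit value, so its largest value is U+FFFF
        System.out.println(Integer.toHexString(Character.MAX_VALUE)); // ffff

        // Casting a larger code point to char silently keeps only the low 16 bits
        int cp = 0x16848;
        char truncated = (char) cp;
        System.out.println(Integer.toHexString(truncated));           // 6848

        // A Character wraps a single char, so the same limit applies;
        // a String (or an int code point) can represent the full character
        String s = new String(Character.toChars(cp));
        System.out.println(s.length() + " chars, code point "
                + Integer.toHexString(s.codePointAt(0)));              // 2 chars, code point 16848
    }
}
```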

jon80 · Post by **jon80** » Tue Jul 17, 2012 1:27 pm

Okay, I will have a look. Thanks.
