You can specify Unicode characters using escape sequences, in individual character constants, as part of string constants (see String Constants), and even in C identifiers. Use the ‘\u’ escape sequence followed by exactly four hexadecimal digits, which give a 16-bit Unicode character code. If the code value is too big for 16 bits, use the ‘\U’ escape sequence followed by exactly eight hexadecimal digits, which give a 32-bit Unicode character code. (These escapes are called universal character names.) For example,
\u6C34      /* 16-bit code (UTF-16) */
\U0010ABCD  /* 32-bit code (UTF-32) */
One way to use these is in UTF-8 string constants (see UTF-8 String Constants). For instance,
u8"fóó \u6C34 \U0010ABCD"
You can also use them in wide character constants (see Wide Character Constants), like this:
u'\u6C34'      /* 16-bit code */
U'\U0010ABCD'  /* 32-bit code */
and in wide string constants (see Wide String Constants), like this:
u"\u6C34\u6C33"  /* 16-bit code */
U"\U0010ABCD"    /* 32-bit code */
And in an identifier:
int foo\u6C34bar = 0;
Codes in the range D800 through DFFF are not valid: Unicode reserves them for UTF-16 surrogates, so they never denote characters. Codes less than 00A0 are also forbidden, except for 0024 (‘$’), 0040 (‘@’), and 0060 (‘`’). The forbidden codes are ASCII characters and control characters, which you can write directly or specify with other escape sequences (see Character Constants).