26.2 Unicode support

The LispWorks character implementation covers the full code point range of the Unicode standard.

cl:char-code-limit is #x110000, so character codes run from 0 to #x10ffff, which is exactly the Unicode range. The surrogate code points (codes #xd800 to #xdfff) are illegal as character codes.

cl:code-char accepts integers from 0 up to, but not including, cl:char-code-limit. Any other value causes an error. For codes in the surrogate range it returns nil. Reading characters from streams and converting characters from foreign strings can produce characters anywhere in this range (depending on the external format used), but can never produce character objects corresponding to surrogate code points.
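
The rules above can be sketched as a small wrapper around cl:code-char. This is an illustrative sketch only: surrogate-code-p and safe-code-char are hypothetical helper names, not part of LispWorks. The wrapper returns nil rather than signaling an error for out-of-range input, and it checks the surrogate range explicitly so that it does not rely on cl:code-char returning nil there.

```lisp
;; Hypothetical helpers (not part of LispWorks), written in portable Common Lisp.

(defun surrogate-code-p (code)
  "True if CODE lies in the Unicode surrogate range #xD800 to #xDFFF."
  (<= #xd800 code #xdfff))

(defun safe-code-char (code)
  "Return the character whose code is CODE, or NIL if CODE is not an
integer in [0, char-code-limit) or is a surrogate code point."
  (and (integerp code)
       (< -1 code char-code-limit)     ; cl:code-char would signal an error otherwise
       (not (surrogate-code-p code))   ; surrogates are illegal as character codes
       (code-char code)))
```

For example, (safe-code-char 65) returns #\A, while (safe-code-char #xd800) and (safe-code-char -1) both return nil.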

text-string and simple-text-string take 32 bits per character and can store the full range of Unicode characters.

simple-char is now a synonym for cl:character, and is deprecated.

16-bit characters and 16-bit strings are implemented by the types bmp-char, bmp-string and simple-bmp-string. BMP stands for Basic Multilingual Plane, the first plane of Unicode (codes 0 to #xffff). If your application holds many strings whose characters all have codes below #x10000, you may want to use bmp-string to minimize memory usage. This works only if every character you ever store has a code below #x10000. If all of the codes are below 256, you can use base-string instead.
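
As a rough illustration of the choice between base-string, bmp-string and text-string, the sketch below finds the narrowest per-character width that can hold a given string. narrowest-char-width and compact-string are hypothetical names introduced here. The LispWorks-only part assumes that bmp-char is exported from the lw package and that a string made with element type character is a 32-bit text-string; check both against your LispWorks version.

```lisp
;; Hypothetical helper (not part of LispWorks), portable Common Lisp.
(defun narrowest-char-width (string)
  "Return 8, 16 or 32: the smallest per-character width in bits that
can hold every character of STRING."
  (let ((max-code (reduce #'max string :key #'char-code :initial-value 0)))
    (cond ((< max-code 256) 8)         ; all codes below 256: base-string suffices
          ((< max-code #x10000) 16)    ; all codes in the BMP: bmp-string suffices
          (t 32))))                    ; otherwise a text-string is needed

;; LispWorks-only sketch. Assumption: lw:bmp-char names the 16-bit
;; character type and 'character yields a 32-bit string.
#+lispworks
(defun compact-string (string)
  "Copy STRING into a fresh string of the narrowest suitable element type."
  (let ((element-type (ecase (narrowest-char-width string)
                        (8 'base-char)
                        (16 'lw:bmp-char)
                        (32 'character))))
    (replace (make-string (length string) :element-type element-type)
             string)))
```

A string containing only ASCII text needs 8 bits per character, one containing Greek or CJK characters from the BMP needs 16, and one containing any character with a code of #x10000 or above needs 32.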

Note: Character bits and font attributes are not supported. To deal with bits, use Gesture Spec objects (see make-gesture-spec and coerce-to-gesture-spec).

LispWorks User Guide and Reference Manual - 20 Sep 2017