26.3 Character and String types

26.3.1 Character types

LispWorks supports all the characters in the Unicode range [0, #x10ffff], excluding the surrogate range [#xd800, #xdfff]. Some LispWorks APIs may produce character objects corresponding to surrogate code points, but the interfaces that you should normally use to generate characters and strings in Common Lisp do not (that is, cl:code-char, reading from a stream, converting from a foreign string, and loading from or storing to strings).

The following subtypes of character are defined:

base-char

Characters with cl:char-code less than base-char-code-limit (256).

bmp-char

Characters with cl:char-code less than #x10000 (BMP stands for Basic Multilingual Plane in Unicode).

character

All characters.
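
The ranges above can be sketched in portable Common Lisp. The function name classify-char-code and the keywords it returns are hypothetical, and the limits are written out from this section rather than queried from LispWorks; in real code you would normally test a character with typep against the type names above.

```lisp
;; Hypothetical helper: map a character code to the narrowest character
;; subtype described in this section. The limits are copied from the
;; text, not read from the implementation.
(defun classify-char-code (code)
  (check-type code (integer 0))
  (cond ((<= #xd800 code #xdfff) :surrogate)     ; not a valid character
        ((< code 256)            :base-char)     ; below base-char-code-limit
        ((< code #x10000)        :bmp-char)      ; Basic Multilingual Plane
        ((<= code #x10ffff)      :character)     ; supplementary planes
        (t                       :out-of-range)))

(classify-char-code #x41)    ; => :BASE-CHAR
(classify-char-code #x3000)  ; => :BMP-CHAR
(classify-char-code #x1d11e) ; => :CHARACTER
```

Surrogate codes are checked first because they fall inside the BMP range numerically but are excluded from the set of valid characters.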

26.3.2 Compatibility notes

In LispWorks 6.1 and earlier versions, only characters with codes less than #x10000 are supported, and surrogate code points are allowed.

bmp-char is new in LispWorks 7.0, and matches the range of characters in LispWorks 6.1 and earlier versions, except that surrogate code points are no longer valid.

In LispWorks 6.1 and earlier versions there is simple-char, which is now a synonym for cl:character. Using cl:character is preferable and portable.

In LispWorks 6.1 and earlier versions character bits attributes are supported, and also some characters represent keyboard gestures. These are no longer supported.

26.3.3 Character Syntax

All simple characters have names that consist of U+ followed by the code of the character in hexadecimal, for example #\U+764F is (code-char #x764F).

The hexadecimal number must be 4 to 6 digits long. For example, #\U+a0 is illegal; use #\U+00a0 instead.
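
As a minimal sketch of the naming rule, the following portable function builds the U+ name for a given code, padding short codes to four hexadecimal digits. The name u+-name is hypothetical and is not part of LispWorks.

```lisp
;; Hypothetical helper: build the U+ name for a character code, padding
;; with leading zeros to at least four hexadecimal digits.
(defun u+-name (code)
  (check-type code (integer 0 #x10ffff))
  (format nil "U+~4,'0X" code))

(u+-name #xa0)    ; => "U+00A0"
(u+-name #x764f)  ; => "U+764F"
(u+-name #x1d11e) ; => "U+1D11E"
```

Codes above #xffff need five or six digits, which the format directive produces without extra padding.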

Additionally, Latin-1 characters have names derived from the ISO10646 name, for example:

(char-name (code-char 190))
=>
"Vulgar-Fraction-Three-Quarters"

Names are also provided for space characters:

(name-char "Ideographic-Space")
=>
#\Ideographic-Space

Note that surrogate characters, that is, those with codes in the inclusive range [#xd800, #xdfff], are not acceptable. Trying to read such a character, for example #\U+d835, produces an error.

26.3.4 Compatibility notes

In LispWorks 6.1 and earlier versions you can specify bits in character names. This is illegal in LispWorks 7.0 and later.

In LispWorks 6.1 and earlier versions character codes are limited to less than #x10000, and surrogate code points are allowed.

26.3.5 String types

String types are supplied which are capable of holding each of the character types mentioned above. The following string types are defined:

base-string

holds any base-char.

bmp-string

holds any bmp-char.

text-string

holds any cl:character (see Character types).

Compatibility note: bmp-string is new in 7.0. In LispWorks 6.1 and earlier versions there is augmented-string, which is now a synonym for text-string and is deprecated.

In LispWorks 6.1 and earlier versions, text-string could hold characters with codes less than #x10000.

The types above include non-simple strings - those which are displaced, adjustable or with a fill-pointer.
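
For illustration, here is a sketch of a non-simple string built with standard Common Lisp only. The variable name *buffer* is hypothetical. Whether an array with a fill pointer counts as simple is formally implementation-dependent in ANSI Common Lisp, so the second typep result is what one would expect rather than a guarantee.

```lisp
;; A string that is adjustable and has a fill pointer, so it is a
;; base-string but is not expected to be a simple-base-string.
(defparameter *buffer*
  (make-array 8 :element-type 'base-char :adjustable t :fill-pointer 0))

(vector-push-extend #\a *buffer*)
(vector-push-extend #\b *buffer*)

(typep *buffer* 'base-string)        ; true: the element type is base-char
(typep *buffer* 'simple-base-string) ; expected to be false
(length *buffer*)                    ; => 2 (the fill pointer, not the capacity)
```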

The Common Lisp type string itself is dependent on the value of *default-character-element-type* according to the rules for string construction described in String Construction. For example:

CL-USER 1 > (set-default-character-element-type 'base-char)
BASE-CHAR
 
CL-USER 2 > (coerce (list #\Ideographic-Space) 'string)
 
Error: #\Ideographic-Space is not of type BASE-CHAR.
  1 (abort) Return to level 0.
  2 Return to top loop level 0.
 
Type :b for backtrace or :c <option number> to proceed.
Type :bug-form "<subject>" for a bug report template or :? for other options.
 
CL-USER 3 : 1 > :a
 
CL-USER 4 > (set-default-character-element-type 'character)
CHARACTER
 
CL-USER 5 > (coerce (list #\Ideographic-Space) 'string)
" "

The following types are subtypes of cl:simple-string. Note that in the names of the string types, 'simple' refers to the string object and does not mean that the string's elements are simple-chars.

simple-base-string

holds any base-char.

simple-bmp-string

holds any bmp-char.

simple-text-string

holds any cl:character.

The Common Lisp type simple-string itself is dependent on the value of *default-character-element-type* according to the rules for string construction described in String Construction.

26.3.5.1 String types at runtime

The type string (and hence simple-string) is defined by ANSI Common Lisp to be a union of all the character array types. This makes a call like

(coerce s 'simple-string)

ambiguous because it needs to select a concrete type (such as simple-base-string or simple-text-string).

When LispWorks is running with *default-character-element-type* set to base-char, it expects that you will want strings with element type base-char, so functions like coerce treat references to simple-string as if they were (simple-array base-char (*)).

If you call set-default-character-element-type with a larger character type, then simple-string becomes a union of the array types that are subtypes of that character type.
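
One way to sidestep the ambiguity is to name a concrete type or element type instead of simple-string. The sketch below uses only standard Common Lisp; the variable names are hypothetical, and the coerce call assumes every character in the source string is a base-char.

```lisp
;; Ask for a concrete element type rather than relying on what
;; simple-string currently resolves to.
(defparameter *narrow*
  (make-string 3 :element-type 'base-char :initial-element #\a))
(defparameter *wide*
  (make-string 3 :element-type 'character :initial-element #\a))

(typep *narrow* 'simple-base-string)                         ; true
(typep (coerce *wide* 'simple-base-string) 'simple-base-string) ; true
(string= *narrow* *wide*)                                    ; true: same contents
```

Code written this way behaves the same regardless of the current value of *default-character-element-type*.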

26.3.5.2 String types at compile time

The compiler always does type inferencing for simple-string as if *default-character-element-type* were set to character.

For example, when you declare something to be of type simple-string, the compiler will never treat it as simple-base-string. Therefore calls like

(schar (the simple-string x) 0)

will work whether x is a simple-base-string, simple-bmp-string or simple-text-string.
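
A small sketch of the same point using a declaration rather than the. The function name first-char is hypothetical; the code is standard Common Lisp.

```lisp
;; Hypothetical function: the declaration promises only "some simple
;; string", so the compiled code has to handle any element type.
(defun first-char (x)
  (declare (type simple-string x))
  (schar x 0))

(first-char (make-string 2 :element-type 'base-char :initial-element #\a))
(first-char (make-string 2 :element-type 'character :initial-element #\b))
```

If you know the argument is always a simple-base-string, declaring that narrower type instead gives the compiler more to work with, at the cost of no longer accepting the wider string types.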


LispWorks User Guide and Reference Manual - 13 Feb 2015
