2.1 New data types

2.1.1 Character types

Previously, all characters were represented as data objects of type character. Character objects were encoded as single-byte characters that could have bits and font attributes.

In this release, Liquid Common Lisp adds new character types and subtypes to the Common Lisp type hierarchy to support both single-byte and double-byte characters.

The following new character type specifiers have been added to the Common Lisp type hierarchy:

The type specifier (character repertoire-name) specifies a character in a particular character repertoire.

A character repertoire is an unordered set of abstract characters; it does not specify the encoding of the characters. Liquid Common Lisp supports the following character repertoires:

(character :base) and (character :ascii)

Type specifiers denoting the base character repertoire.

For the HP and RS6000 platforms, the base character repertoire contains the standard ASCII characters.

For the SunOS and Solaris platforms, the base character repertoire contains those ASCII characters that are in Extended UNIX Code (EUC) codeset 0.

Base characters are represented as single-byte characters. The base character repertoire can overlap with characters from other repertoires, such as :english.

The type base-character is synonymous with (character :base).

(character :standard)

Type specifier denoting the repertoire of characters that consists of the 96 standard characters as defined by CLtL2. This type is a subtype of (character :base).

The type standard-char is synonymous with (character :standard) and is a subtype of base-character.

(character :english)

Type specifier denoting the English alphabet.

(character :kanji)

Type specifier denoting Japanese Kanji characters.

(character :katakana)

Type specifier denoting Japanese Katakana characters.

(character :hiragana)

Type specifier denoting Japanese Hiragana characters.

(character :sbcs)

Type specifier denoting any single-byte character.

(character :dbcs)

Type specifier denoting any double-byte character.
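The subtype relationships described above for the standard and base repertoires can be checked from Lisp. The following is a minimal sketch that uses only portable ANSI Common Lisp: the ANSI names base-char and extended-char stand in for base-character and extended-character, and the (character repertoire-name) specifiers are left out because they exist only in Liquid Common Lisp. The names *standard-is-base* and classify-character are hypothetical and are not part of any documented interface.

```lisp
;; Portable ANSI Common Lisp only. BASE-CHAR is the ANSI counterpart of
;; the base-character type described in the text (an assumption made so
;; that the sketch runs outside Liquid Common Lisp).

;; True when standard-char is a subtype of the base character type.
(defparameter *standard-is-base*
  (subtypep 'standard-char 'base-char))

;; Hypothetical helper: report the most specific of the three categories.
(defun classify-character (ch)
  "Return :STANDARD, :BASE, or :EXTENDED for the character CH."
  (cond ((typep ch 'standard-char) :standard)
        ((typep ch 'base-char) :base)
        (t :extended)))
```

Which characters fall into the :base and :extended categories depends on the implementation and platform, as described for the base character repertoire above.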

The type specifier extended-character denotes all characters that are not base characters. Extended characters are full-fledged Common Lisp characters; that is, you can use extended characters wherever you would use any Common Lisp character. In particular, you can construct symbol names and package names with extended characters, base characters, or a combination of extended and base characters.

For example, if you could type Greek characters as well as ASCII characters from your keyboard, you could have the following interaction with Lisp:

> (setq f  '(1 2 3))
(1 2 3)

> (car f)
1

> (defun g (c) (car c))
g

> (g f)
1

> (defvar *c* '(9 8 7))
*c*

> (g *c*)
9

> (setq mixed-string "c-987")
"c-987"

> mixed-string
"c-987"

Extended characters are represented as double-byte characters.

Table 2.1 shows which coded character set represents the extended characters for each supported platform.

Table 2.1 Platform-specific extended character sets

Platform         Coded Character Set
HP               JIS
RS6000           PC932 (Shift JIS)
SunOS/Solaris    EUC codeset 1

In Common Lisp, you can represent character objects by writing #\ followed by the character. For example, #\a represents a lowercase a. In Liquid Common Lisp, you can also represent characters as a hexadecimal character code; for example, the letter a can be represented as #\c61, which is the same as (int-char #x61).
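The correspondence between a character and its hexadecimal code can be sketched in portable terms. The example below is an illustration only: it uses the ANSI function code-char in place of int-char, which may not exist in other implementations, and it does not rely on the #\c reader syntax, which is specific to Liquid Common Lisp. The function names char-from-hex-code and hex-notation are hypothetical, and the sketch assumes an implementation whose character codes are ASCII-compatible.

```lisp
;; Hypothetical helper: build a character from a numeric code.
;; CODE-CHAR is used here as a stand-in for the INT-CHAR call in the text.
(defun char-from-hex-code (code)
  "Return the character whose code is CODE, or NIL if there is none."
  (code-char code))

;; Hypothetical helper: spell a character's code in the #\c hex style.
;; The result is a string, not something the standard reader understands.
(defun hex-notation (ch)
  "Return a string of the form #\\cXX holding the code of CH in hex."
  (format nil "#\\c~:@(~X~)" (char-code ch)))
```

Under the ASCII assumption, (char-from-hex-code #x61) and #\a denote the same character, and (hex-notation #\a) yields the string #\c61.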

Extended characters can be represented using the same hex notation; for example, you can enter a character of the Kanji character set by using any of the following notations:

(int-char #xc7ad)
#\cc7ad
#\Kanji ideogram

On streams of type character, strings of extended characters and character objects are printed out directly. On streams of type base-character, extended characters are printed in the hex notation. For example, if you write the extended character #\cc7ad, it is printed as #\cC7AD. Extended characters in strings and symbol names are printed in a similar syntax. For example, the string "abc#\cc7ad" is printed as #<"abc[C7AD]">, and the symbol name abc#\cc7ad is printed as #<Symbol |abc[C7AD]|>.
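The bracketed hex form used for strings can be imitated with a short function. This is a rough sketch and not the printer's actual implementation: it treats any character whose code does not fit in one byte as extended, it uses the host implementation's char-code values rather than the platform-specific coded character sets, and it produces only the text between the double quotes, without the surrounding #<...> wrapper. The names +single-byte-limit+ and escape-extended are hypothetical.

```lisp
;; Assumption: a character is "extended" when its code needs more than
;; one byte. The real printer decides this from the character's type.
(defconstant +single-byte-limit+ 256)

;; Hypothetical helper: copy STRING, replacing each extended character
;; with its code in upper-case hexadecimal between square brackets.
(defun escape-extended (string)
  "Return STRING with extended characters written as [XXXX]."
  (with-output-to-string (out)
    (loop for ch across string
          for code = (char-code ch)
          do (if (< code +single-byte-limit+)
                 (write-char ch out)
                 (format out "[~:@(~X~)]" code)))))
```

In an implementation that has a character with code #xC7AD, applying escape-extended to the three letters abc followed by that character gives abc[C7AD], which matches the form shown in the text.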

The type specifier augmented-character is equivalent to character except when used for character I/O operations. Specifically, when augmented-character is specified as the value of the :element-type keyword argument to open and make-lisp-stream, bits and font attributes are preserved; for all other element types, bits and font attributes are ignored. See Chapter 3, "Using Characters and Strings" for a complete description of the :element-type keyword option.

Figure 2.1 shows the additions to the character type hierarchy and their relationship to the type character, which is the union of the types base-character and extended-character:

Figure 2.1 Character type hierarchy


International Character Sets - 9 SEP 1996
