1 Introduction

1.2 Definition of terms

Some terms used in this guide define concepts that are new to the discussion of Common Lisp characters and strings. These new concepts represent abstractions that allow Common Lisp to provide a uniform treatment of data from many languages without specifying standard external encoding schemes.

A character repertoire is an unordered set of abstract characters, independent of coding, visual representation, or font. For example, the English alphabet and the Kanji characters are separate character repertoires.

Each character within a repertoire is identified by a unique numeric encoding. A coded character set is a mapping between the characters in one or more character repertoires and the numbers that serve as their character codes in that particular set.

A character contained in a repertoire can be a member of more than one coded character set. For example, the character #\a from the English repertoire is a member of both the ASCII and EBCDIC coded character sets. The JIS (Japanese Industrial Standard) coded character set contains characters from the Kanji, Katakana, Hiragana, Greek, and Cyrillic repertoires.
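
The relationship between a character and the coded character sets that contain it can be sketched with a small lookup table. This is only an illustration: the names *coded-character-sets*, code-in-set, and sets-containing are hypothetical and are not part of Liquid Common Lisp, and the table holds a two-character excerpt of the real ASCII and EBCDIC assignments.

```lisp
;; Illustrative excerpt only: each coded character set maps characters
;; to the codes they have in that set. The variable and function names
;; below are hypothetical.
(defparameter *coded-character-sets*
  '((:ascii  . ((#\a . #x61) (#\b . #x62)))
    (:ebcdic . ((#\a . #x81) (#\b . #x82)))))

(defun code-in-set (char set-name)
  "Return the code of CHAR in the coded character set SET-NAME, or NIL."
  (let ((mapping (cdr (assoc set-name *coded-character-sets*))))
    (cdr (assoc char mapping :test #'char=))))

(defun sets-containing (char)
  "Return the names of all coded character sets in the table that contain CHAR."
  (loop for (name . mapping) in *coded-character-sets*
        when (assoc char mapping :test #'char=)
          collect name))
```

With this table, (sets-containing #\a) lists both :ascii and :ebcdic, while (code-in-set #\a :ascii) and (code-in-set #\a :ebcdic) return different integers for the same abstract character.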

Within a given coded character set, individual member characters are distinguished by their character external code, a nonnegative integer.
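
As a rough illustration of characters being identified by nonnegative integers, the standard functions char-code and code-char convert between a character and the code the implementation uses for it. Note the assumption here: these functions expose the implementation's own code for a character, which is not necessarily the character external code described above. The helper name code-round-trip-p is hypothetical.

```lisp
(defun code-round-trip-p (char)
  "True if CHAR's code is a nonnegative integer below CHAR-CODE-LIMIT
and converting that code back yields the same character."
  (let* ((code (char-code char))
         (back (code-char code)))   ; CODE-CHAR may return NIL in general
    (and (integerp code)
         (not (minusp code))
         (< code char-code-limit)
         back
         (char= back char))))
```

The actual integer returned by (char-code #\a) depends on the implementation, so the sketch checks only the properties that hold everywhere.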

Each character data object that is classified as graphic, or displayable, is associated with a glyph. The glyph is the visual representation of the character.
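
Whether a character is graphic can be tested with the standard predicate graphic-char-p. The following minimal sketch wraps it in a hypothetical helper, describe-graphic, to make the classification explicit.

```lisp
(defun describe-graphic (char)
  "Classify CHAR as :GRAPHIC (has a glyph) or :NON-GRAPHIC."
  (if (graphic-char-p char)
      :graphic
      :non-graphic))
```

For example, #\a and #\Space are graphic characters, whereas #\Newline is not.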

When a character stream is opened, an external format is associated with it. The external format is one of possibly several implementation-specific external encodings of character data. The mapping between the external format and the internal code is implementation dependent. On some systems, the internal encoding for a character may be different from the encoding used to represent the character in a file on disk.
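
The following sketch shows where an external format enters when a file stream is opened, using only standard Common Lisp: the :external-format argument to with-open-file and the function stream-external-format. It uses the portable value :default, because the names of other external formats are implementation specific and are not assumed here. The function name round-trip-text and the file name in the usage note are hypothetical.

```lisp
(defun round-trip-text (path text)
  "Write TEXT to PATH, read it back, and return the line read together
with the external format associated with the input stream."
  (with-open-file (out path :direction :output
                            :if-exists :supersede
                            :external-format :default)
    (write-line text out))
  (with-open-file (in path :direction :input
                           :external-format :default)
    (values (read-line in)
            (stream-external-format in))))
```

A call such as (round-trip-text "char-demo.txt" "abc") returns the string that was written as its first value. The second value is whatever designator the implementation reports for its default external format.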

Liquid Common Lisp supports a distinguished character repertoire, called the base character repertoire, and one or more corresponding coded character sets. All other characters are called extended characters. Strings that consist solely of base characters are called base strings, whereas strings that can contain any character are called general strings. The type string is the union of these two subtypes.
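
The distinction between base strings and general strings can be checked with the standard type specifiers base-char, base-string, and string. The sketch below is a hedged illustration: the function name classify-string and its keyword results are hypothetical. In an implementation where every character is a base character, every string is reported as a base string.

```lisp
(defun classify-string (object)
  "Return :BASE-STRING, :GENERAL-STRING, or :NOT-A-STRING for OBJECT."
  (cond ((typep object 'base-string) :base-string)
        ((stringp object) :general-string)
        (t :not-a-string)))

(defun make-base-string (length initial-char)
  "Create a string whose elements are restricted to base characters."
  (make-string length :element-type 'base-char
                      :initial-element initial-char))
```

For instance, (classify-string (make-base-string 3 #\a)) returns :base-string, because the string was created with the element type base-char.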

Table 1.1 shows how the English character #\a and the Greek character #\j are classified.

Table 1.1 Classification of #\a and #\j

Platform       Character  Character Repertoire  Coded Character Set  External Format      Glyph
HP             #\a        English               ASCII, EBCDIC        ASCII                a
               #\j        Greek                 JIS                  HP15                 j
RS6000         #\a        English               ASCII, EBCDIC        ASCII                a
               #\j        Greek                 JIS                  Shift JIS            j
SunOS/Solaris  #\a        English               ASCII, EBCDIC        EUC ASCII codeset 0  a
               #\j        Greek                 JIS                  EUC codeset 1        j


International Character Sets - 9 SEP 1996
