In LispWorks 6.1 and earlier versions the external format :unicode is actually "raw UCS-2": it reads and writes only 16-bit characters (including character objects corresponding to surrogate code points). The :unicode format now maps to :utf-16 with the native endianness (by default). This interprets surrogate code points (#xd800 to #xdfff) differently: the old :unicode would read these as if they were actual characters, while :utf-16, and hence the new :unicode, tries to interpret them as encoding supplementary characters (codes #x10000 to #x10ffff). The latter behavior is probably what you need, so in most cases there is no need to replace usage of :unicode. There is no external format which interprets surrogate code points as characters now, but you can use any of the :bmp formats with :use-replacement t to read 16-bit characters without getting an error, although the result does not exactly match the input, because surrogate code points are replaced by the replacement character. The only format that can read anything without any loss is […]
:utf-16be and :utf-16le implement the big-endian (:utf-16be) and little-endian (:utf-16le) UTF-16. The system maps these formats to :utf-16-native or :utf-16-reversed as appropriate, depending on the byte order of the computer.
:utf-16 implements the UTF-16 standard, defaulting to UTF-16BE unless there is a BOM (Byte Order Mark).
:bmp-native and :bmp-reversed are the actual implementation formats. They implement reading 16-bit characters with the native byte order (:bmp-native) or the reversed byte order (:bmp-reversed). These formats never read supplementary characters. When they encounter a surrogate code point, they either signal an error or replace it by the replacement character, depending on the parameter :use-replacement.
:bmp implements 16-bit character reading and writing, defaulting to the native byte order.
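The byte-order and surrogate rules in the list above can be sketched outside LispWorks. The following is a minimal Python illustration, not LispWorks code: the function names pick_utf16_byte_order and read_bmp are hypothetical, the use_replacement argument only mimics the :use-replacement parameter, and U+FFFD is assumed as the replacement character (the character LispWorks actually substitutes may differ).

```python
import struct

# Assumption: U+FFFD stands in for "the replacement character"; the
# character LispWorks actually substitutes may differ.
REPLACEMENT = "\ufffd"


def pick_utf16_byte_order(data: bytes):
    """Mimic the :utf-16 rule: big-endian unless a BOM says otherwise.

    Returns (byte_order, payload) with any BOM stripped from payload.
    """
    if data[:2] == b"\xff\xfe":      # BOM stored little-endian
        return "little", data[2:]
    if data[:2] == b"\xfe\xff":      # BOM stored big-endian
        return "big", data[2:]
    return "big", data               # no BOM: default to UTF-16BE


def read_bmp(data: bytes, byte_order: str, use_replacement: bool = False) -> str:
    """Read 16-bit units as BMP characters, never combining surrogates."""
    if len(data) % 2:
        raise ValueError("input is not a whole number of 16-bit units")
    prefix = "<" if byte_order == "little" else ">"
    units = struct.unpack(f"{prefix}{len(data) // 2}H", data)
    chars = []
    for unit in units:
        if 0xD800 <= unit <= 0xDFFF:
            # A surrogate code point: either fail or substitute.
            if not use_replacement:
                raise ValueError(f"surrogate code point #x{unit:04x}")
            chars.append(REPLACEMENT)
        else:
            chars.append(chr(unit))
    return "".join(chars)
```

For example, pick_utf16_byte_order(b"\xff\xfeA\x00") selects little-endian and strips the BOM, and passing the remaining payload to read_bmp yields the single character "A". Feeding read_bmp a surrogate pair with use_replacement=True produces two replacement characters, which shows why the result no longer matches the input exactly.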
Notes: In LispWorks 6.1 and earlier versions, the :unicode external format is similar to :bmp now, but handles surrogate code points as if they represent characters. In LispWorks 7.0 and later :unicode maps to :utf-16, and there is no external format that reads surrogate code points as characters.
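To make the difference between the old and new :unicode behavior concrete, here is a small Python sketch (again an illustration, not LispWorks code; code_units and combine_surrogates are hypothetical helper names). It takes the four bytes that encode U+1F600 in little-endian UTF-16 and shows the two readings: two separate 16-bit values, as a raw UCS-2 reader would see them, versus one supplementary code point, as a UTF-16 reader would.

```python
def code_units(data: bytes, byte_order: str = "little"):
    """Split a byte string into 16-bit code units."""
    return [int.from_bytes(data[i:i + 2], byte_order)
            for i in range(0, len(data), 2)]


def combine_surrogates(high: int, low: int) -> int:
    """Standard UTF-16 arithmetic for a high/low surrogate pair."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+1F600 encoded as UTF-16, little-endian: units #xd83d #xde00.
data = bytes([0x3D, 0xD8, 0x00, 0xDE])

units = code_units(data)

# Raw UCS-2 view: each unit is treated as a character code on its own.
raw_ucs2_view = [hex(u) for u in units]

# UTF-16 view: the pair encodes a single supplementary character.
utf16_view = hex(combine_surrogates(*units))
```

Here raw_ucs2_view holds the two surrogate values and utf16_view holds the single code point in the supplementary range, which is the interpretation :utf-16 applies.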
LispWorks User Guide and Reference Manual - 13 Feb 2015