detect-unicode-bom

detect-utf32-bom

detect-utf8-bom Functions

Summary

Look for the Unicode Byte Order Mark (BOM) at the start of a buffer. If it is found, it is assumed to indicate the matching Unicode encoding.

Package

system

Signatures

detect-unicode-bom pathname ef-spec buffer length => new-ef-spec

detect-utf32-bom pathname ef-spec buffer length => new-ef-spec

detect-utf8-bom pathname ef-spec buffer length => new-ef-spec

Arguments

`pathname`⇩	Pathname identifying the location of `buffer`.
`ef-spec`⇩	An external format spec.
`buffer`⇩	A buffer whose contents are examined.
`length`⇩	Length (an integer) up to which `buffer` should be examined.

Values

new-ef-spec

A new external format spec created by merging ef-spec with the encoding that was found.

Description

These functions are called as part of open's encoding detection routine. Each one tries to detect the encoding only if ef-spec does not already supply it (that is, if the encoding in ef-spec is still :default).

detect-unicode-bom tries to detect UTF-16 encoding.

detect-utf32-bom tries to detect UTF-32 encoding.

detect-utf8-bom tries to detect UTF-8 encoding.

These functions work by checking whether the bytes in buffer (bounded by length) start with the Unicode character #xFEFF (the BOM) encoded in the relevant encoding. If they do, the function assumes that the file uses that encoding. detect-unicode-bom and detect-utf32-bom also deduce the byte order (little-endian or big-endian) if ef-spec does not include it.
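The byte comparison described above can be sketched in portable Common Lisp. This is only an illustration of the idea, not the LispWorks implementation: the names buffer-starts-with-p, sketch-utf8-bom-p, sketch-utf16-bom and sketch-utf32-bom are hypothetical, and the sketch returns a simple flag or byte-order keyword rather than a merged external format spec.

```lisp
;; Hypothetical helper (not part of LispWorks): true if BUFFER, examined
;; up to LENGTH bytes, begins with the byte sequence BOM.
(defun buffer-starts-with-p (buffer length bom)
  (and (>= length (length bom))
       (>= (length buffer) (length bom))
       ;; EVERY stops at the end of the shorter sequence, i.e. BOM.
       (every #'= bom buffer)))

;; #xFEFF encoded in UTF-8 is the three bytes EF BB BF.
(defun sketch-utf8-bom-p (buffer length)
  (buffer-starts-with-p buffer length '(#xEF #xBB #xBF)))

;; #xFEFF encoded in UTF-16: FE FF (big-endian) or FF FE (little-endian).
(defun sketch-utf16-bom (buffer length)
  (cond ((buffer-starts-with-p buffer length '(#xFE #xFF)) :big-endian)
        ((buffer-starts-with-p buffer length '(#xFF #xFE)) :little-endian)
        (t nil)))

;; #xFEFF encoded in UTF-32: 00 00 FE FF (big-endian)
;; or FF FE 00 00 (little-endian).
(defun sketch-utf32-bom (buffer length)
  (cond ((buffer-starts-with-p buffer length '(#x00 #x00 #xFE #xFF)) :big-endian)
        ((buffer-starts-with-p buffer length '(#xFF #xFE #x00 #x00)) :little-endian)
        (t nil)))

;; Example: a UTF-16 big-endian BOM followed by the letter A.
(sketch-utf16-bom (vector #xFE #xFF #x00 #x41) 4)   ; => :BIG-ENDIAN
```

Because length bounds the comparison, a buffer that is shorter than the BOM (or a length smaller than the BOM size) never matches.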

Note that a file starting with the bytes 0xff 0xfe 0x00 0x00 matches both the UTF-16 and the UTF-32 little-endian BOM, because the UTF-16 little-endian BOM (0xff 0xfe) is a prefix of the UTF-32 little-endian BOM. By default detect-utf32-bom is applied first, because it precedes detect-unicode-bom in *file-encoding-detection-algorithm*, so such a file is treated as UTF-32. You can change this behavior by altering the order of functions in *file-encoding-detection-algorithm*.
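The effect of the ordering can be illustrated with a small first-match-wins sketch in portable Common Lisp. It does not touch *file-encoding-detection-algorithm* itself: the function first-bom-match, the variable *ambiguous-bytes* and the labels :utf-32le and :utf-16le are hypothetical names used only for this example.

```lisp
;; Hypothetical sketch: CHECKS is a list of (label . bom-bytes) entries
;; tried in order. Returns the label of the first BOM that BUFFER
;; (examined up to LENGTH bytes) starts with, or NIL if none matches.
(defun first-bom-match (buffer length checks)
  (loop for (label . bom) in checks
        when (and (>= length (length bom))
                  (>= (length buffer) (length bom))
                  (every #'= bom buffer))
          return label))

;; The ambiguous prefix discussed above.
(defparameter *ambiguous-bytes* (vector #xFF #xFE #x00 #x00))

;; UTF-32 check first, mirroring the default order described above.
(first-bom-match *ambiguous-bytes* 4
                 '((:utf-32le #xFF #xFE #x00 #x00)
                   (:utf-16le #xFF #xFE)))            ; => :UTF-32LE

;; UTF-16 check first: the same bytes are now claimed by UTF-16.
(first-bom-match *ambiguous-bytes* 4
                 '((:utf-16le #xFF #xFE)
                   (:utf-32le #xFF #xFE #x00 #x00)))  ; => :UTF-16LE
```

Trying the longer BOM first is the only order in which the UTF-32 check can ever succeed for these bytes, since any buffer that begins with the UTF-32 little-endian BOM also begins with the UTF-16 one.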

pathname is ignored.