detect-unicode-bom

Summary

Looks for the Unicode Byte Order Mark, which if found is assumed to indicate the matching Unicode encoding.

Signature

detect-unicode-bom pathname ef-spec buffer length => new-ef-spec

detect-utf32-bom pathname ef-spec buffer length => new-ef-spec

detect-utf8-bom pathname ef-spec buffer length => new-ef-spec

Arguments

pathname

Pathname identifying the location of buffer.

ef-spec

An external format spec.

buffer

A buffer whose contents are examined.

length

Length (an integer) up to which buffer should be examined.

Values

new-ef-spec

A new external format spec created by merging ef-spec with the encoding that was found.

Description

These functions are called as part of open's encoding detection routine, and try to detect the encoding if it is not already supplied in the external-format argument.

detect-unicode-bom tries to detect UTF-16 encoding.

detect-utf32-bom tries to detect UTF-32 encoding.

detect-utf8-bom tries to detect UTF-8 encoding.

These functions work by checking whether the file starts with the Unicode character #xFEFF (BOM) encoded in the relevant encoding, and if it does assumes the file is encoded in this encoding. detect-unicode-bom and detect-utf32-bom also deduce the direction (little-endian or big-endian).

Note that files starting with 0xff 0xfe 0x00 0x00 can match both UTF-16 and UTF-32 little-endian. By default detect-utf32-bom is applied first, because it precedes detect-unicode-bom in *file-encoding-detection-algorithm*. You can change this behavior by altering the order of functions in *file-encoding-detection-algorithm*.

detect-unicode-bom

Functions

Summary

Package

Signature

Arguments

Values

Description

See also