Struct encoding::codec::utf_8::UTF8Encoding
pub struct UTF8Encoding;
UTF-8 (UCS Transformation Format, 8-bit).
This is a Unicode encoding compatible with ASCII (ISO/IEC 646:US) and able to represent every Unicode scalar value (every codepoint except the surrogates U+D800 to U+DFFF) uniquely and unambiguously. It has a variable-length design: a codepoint takes 1 byte (up to U+007F), 2 bytes (up to U+07FF), 3 bytes (up to U+FFFF) or 4 bytes (up to U+10FFFF), depending on its value. The first byte of each sequence is distinguishable from the "continuation" bytes that follow it, which makes UTF-8 self-synchronizing and easy to handle. It has a fixed byte order (there are no endianness variants), and sorting UTF-8 strings byte by byte gives the same order as sorting them by codepoints.
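The length classes, self-synchronization and ordering properties described above can be checked with the Rust standard library alone. The sketch below does not use `UTF8Encoding` itself; `utf8_len`, `is_continuation` and `sync_back` are hypothetical helpers written for illustration, not part of this crate.

```rust
/// Byte length of `c` in UTF-8, following the codepoint ranges above.
/// (std's `char::len_utf8` computes the same thing.)
fn utf8_len(c: char) -> usize {
    match c as u32 {
        0x0000..=0x007F => 1,
        0x0080..=0x07FF => 2,
        0x0800..=0xFFFF => 3,
        _ => 4, // up to U+10FFFF
    }
}

/// Continuation bytes always have the bit pattern 10xxxxxx.
fn is_continuation(b: u8) -> bool {
    b & 0b1100_0000 == 0b1000_0000
}

/// Walks back from byte `i` to the first byte of the codepoint containing it.
/// This works because a lead byte never looks like a continuation byte.
fn sync_back(bytes: &[u8], mut i: usize) -> usize {
    while i > 0 && is_continuation(bytes[i]) {
        i -= 1;
    }
    i
}

fn main() {
    for c in ['A', 'é', '€', '😀'].iter().copied() {
        assert_eq!(utf8_len(c), c.len_utf8());
        println!("U+{:04X} -> {} byte(s)", c as u32, utf8_len(c));
    }

    // Self-synchronization: start inside '€' (bytes 1..4) and recover its start.
    let bytes = "a€b".as_bytes();
    assert_eq!(sync_back(bytes, 2), 1);

    // Byte-wise ordering (how `&str` compares) equals codepoint ordering.
    let words = vec!["\u{10000}", "\u{FFFF}", "\u{80}", "z"];
    let mut by_bytes = words.clone();
    by_bytes.sort();
    let mut by_codepoints = words.clone();
    by_codepoints.sort_by(|a, b| a.chars().cmp(b.chars()));
    assert_eq!(by_bytes, by_codepoints);
}
```

Because Rust's `str` ordering is defined on the underlying bytes while `char` ordering is by codepoint, the final assertion is a direct check of the sorting property.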
The UTF-8 scanner used by this module is heavily based on Bjoern Hoehrmann's Flexible and Economical UTF-8 Decoder.
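Hoehrmann's decoder is a table-driven state machine that consumes one byte at a time. Reproducing its tables is beyond a short example, so the sketch below is a hand-written state machine in the same spirit; it is not this module's scanner nor Hoehrmann's code, and `State`, `step` and `decode` are hypothetical names. The only state carried between bytes is the number of continuation bytes still expected, the partially assembled codepoint, and the permitted range of the next byte.

```rust
/// Decoder state carried between bytes (hypothetical; not this crate's type).
#[derive(Clone, Copy)]
struct State {
    need: u8, // continuation bytes still expected
    cp: u32,  // codepoint bits accumulated so far
    lo: u8,   // smallest byte allowed next
    hi: u8,   // largest byte allowed next
}

const START: State = State { need: 0, cp: 0, lo: 0x80, hi: 0xBF };

/// Feeds one byte: Ok(Some(c)) completes a codepoint, Ok(None) needs more
/// bytes, Err(()) means the sequence is invalid.
fn step(st: &mut State, b: u8) -> Result<Option<char>, ()> {
    if st.need == 0 {
        // Lead byte. Narrowed second-byte ranges reject overlong forms
        // (E0, F0), surrogates (ED) and values above U+10FFFF (F4).
        let (need, lo, hi): (u8, u8, u8) = match b {
            0x00..=0x7F => return Ok(Some(b as char)),
            0xC2..=0xDF => (1, 0x80, 0xBF),
            0xE0 => (2, 0xA0, 0xBF),
            0xE1..=0xEC | 0xEE..=0xEF => (2, 0x80, 0xBF),
            0xED => (2, 0x80, 0x9F),
            0xF0 => (3, 0x90, 0xBF),
            0xF1..=0xF3 => (3, 0x80, 0xBF),
            0xF4 => (3, 0x80, 0x8F),
            _ => return Err(()), // 0x80..=0xC1 and 0xF5..=0xFF never start a sequence
        };
        // Keep only the payload bits of the lead byte.
        let mask: u8 = match need {
            1 => 0x1F,
            2 => 0x0F,
            _ => 0x07,
        };
        *st = State { need, cp: u32::from(b & mask), lo, hi };
        return Ok(None);
    }
    if b < st.lo || b > st.hi {
        *st = START;
        return Err(());
    }
    st.cp = (st.cp << 6) | u32::from(b & 0x3F);
    st.need -= 1;
    // Only the second byte has a narrowed range; later ones are 80..=BF.
    st.lo = 0x80;
    st.hi = 0xBF;
    if st.need > 0 {
        return Ok(None);
    }
    // The range checks above guarantee a valid scalar value here.
    char::from_u32(st.cp).map(Some).ok_or(())
}

/// Decodes a whole buffer, returning the index of the first bad byte on error
/// (or `bytes.len()` if the input ends in the middle of a sequence).
fn decode(bytes: &[u8]) -> Result<String, usize> {
    let mut st = START;
    let mut out = String::new();
    for (i, &b) in bytes.iter().enumerate() {
        match step(&mut st, b) {
            Ok(Some(c)) => out.push(c),
            Ok(None) => {}
            Err(()) => return Err(i),
        }
    }
    if st.need != 0 {
        return Err(bytes.len());
    }
    Ok(out)
}

fn main() {
    assert_eq!(decode("a€😀".as_bytes()), Ok("a€😀".to_string()));
    // Cross-check validity against the standard library on every two-byte input.
    for a in 0..=255u8 {
        for b in 0..=255u8 {
            let input = [a, b];
            assert_eq!(decode(&input).is_ok(), std::str::from_utf8(&input).is_ok());
        }
    }
    println!("ok");
}
```

Hoehrmann's version encodes the same decisions in lookup tables, so each byte costs a couple of table lookups instead of a `match` on the byte value; the explicit branches here trade that compactness for readability.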