Unicode collation algorithm

From HandWiki
Revision as of 16:55, 6 February 2024 by Wikisleeper

The Unicode collation algorithm (UCA) is an algorithm defined in Unicode Technical Standard #10 (UTS #10). It is a customizable method for producing binary sort keys from strings of text in any writing system and language that Unicode can represent. These keys can then be compared efficiently, byte by byte, to collate or sort the strings according to the rules of a given language, with options such as ignoring case or accents.[1] UTS #10 also specifies the Default Unicode Collation Element Table (DUCET), a data file that defines a default collation order; this order can be customized for different languages.[1][2] Such customizations are called tailorings, and some of them are provided by the Unicode Common Locale Data Repository (CLDR).[3]
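
The idea of a multi-level binary sort key can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the UCA itself: it uses a tiny hypothetical table (TOY_TABLE) in place of the DUCET, maps each precomposed (NFC) character to exactly one collation element of three weights (primary for the base letter, secondary for accents, tertiary for case), and ignores normalization, contractions, expansions, and variable weighting. The hypothetical function sort_key writes all primary weights first, then a zero separator, then the secondary weights, and so on, so plain byte comparison of the keys yields the collation order. Keeping fewer levels (the `strength` parameter, an assumed name) is how accents or case can be ignored.

```python
import struct

# Hypothetical toy table: character -> (primary, secondary, tertiary).
# Illustrative weights only; these are NOT the DUCET values.
TOY_TABLE = {
    "a": (0x20, 0x05, 0x05), "A": (0x20, 0x05, 0x06),
    "á": (0x20, 0x06, 0x05), "Á": (0x20, 0x06, 0x06),
    "b": (0x21, 0x05, 0x05), "B": (0x21, 0x05, 0x06),
    "c": (0x22, 0x05, 0x05), "C": (0x22, 0x05, 0x06),
}


def sort_key(text: str, strength: int = 3) -> bytes:
    """Build a binary sort key from the first `strength` levels (1 to 3)."""
    elements = [TOY_TABLE[ch] for ch in text]
    parts = []
    for level in range(strength):
        if level > 0:
            # Level separator: 0 is lower than every real weight, so a
            # string that is a prefix of another sorts first.
            parts.append(struct.pack(">H", 0))
        for element in elements:
            weight = element[level]
            if weight != 0:  # a zero weight would be ignorable at this level
                # Big-endian 16-bit weights compare correctly byte by byte.
                parts.append(struct.pack(">H", weight))
    return b"".join(parts)


words = ["cab", "Cab", "cáb", "abc"]
print(sorted(words, key=sort_key))  # ['abc', 'cab', 'Cab', 'cáb']
```

With all three levels, "cab", "Cab" and "cáb" tie on the primary level, the accent decides at the secondary level, and case decides at the tertiary level. Calling `sort_key(word, 1)` makes all three compare equal (accents and case ignored), while `sort_key(word, 2)` still distinguishes the accent but not the case.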

An open-source implementation of the UCA is included in International Components for Unicode (ICU).[4][5] ICU supports tailoring and ships the collation tailorings from CLDR.[6][2]
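
What a tailoring does can be illustrated with a simplified sketch that does not use ICU. Real ICU tailorings are written as rule strings; a Swedish-style rule reads roughly `&z < å < ä < ö`, meaning "after z, add å, then ä, then ö as new primary letters". The sketch below instead models a tailoring as a small override dictionary applied on top of a default table, and it keeps only primary weights, folding case with `lower()` as a stand-in for ignoring the tertiary level. DEFAULT_PRIMARY, SWEDISH_TAILORING and make_primary_key are hypothetical names, and the weights are illustrative rather than taken from the DUCET.

```python
import string

# Hypothetical default primary weights: a..z get 1..26. Accented letters
# share the primary weight of their base letter, as in the default order
# where ä differs from a only at the secondary (accent) level.
DEFAULT_PRIMARY = {ch: i + 1 for i, ch in enumerate(string.ascii_lowercase)}
DEFAULT_PRIMARY.update({"å": DEFAULT_PRIMARY["a"],
                        "ä": DEFAULT_PRIMARY["a"],
                        "ö": DEFAULT_PRIMARY["o"]})

# A tailoring overrides only some entries. In Swedish, å, ä and ö are
# separate letters that sort after z.
SWEDISH_TAILORING = {"å": 27, "ä": 28, "ö": 29}


def make_primary_key(tailoring=None):
    """Return a key function using the default table plus optional overrides."""
    table = {**DEFAULT_PRIMARY, **(tailoring or {})}

    def key(word: str) -> tuple:
        # Tuples compare element by element; a shorter prefix sorts first.
        return tuple(table[ch] for ch in word.lower())

    return key


words = ["öl", "zebra", "apa", "ärlig"]
print(sorted(words, key=make_primary_key()))                   # ['apa', 'ärlig', 'öl', 'zebra']
print(sorted(words, key=make_primary_key(SWEDISH_TAILORING)))  # ['apa', 'zebra', 'ärlig', 'öl']
```

The default table is left untouched and only the handful of letters whose order differs are overridden, which mirrors why a language tailoring can stay small relative to the full default table.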

See also

References

External links
