Unicode collation algorithm

The Unicode collation algorithm (UCA) is a customizable method, defined in Unicode Technical Standard #10, for producing binary keys from strings of text in any writing system and language that Unicode can represent. These keys can then be compared efficiently byte by byte to collate or sort the strings according to the rules of the language, with options for ignoring case, accents, etc.^[1] Unicode Technical Standard #10 also specifies the Default Unicode Collation Element Table (DUCET), a data file that defines a default collation ordering. The DUCET can be customized for different languages.^[1]^[2] Some such customizations can be found in the Unicode Common Locale Data Repository (CLDR).^[3]
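The idea of multi-level binary sort keys can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not an implementation of the UCA: the table `TOY_TABLE`, its weights, and the function `sort_key` are hypothetical names invented for this example, and the weights are not taken from the DUCET. The sketch also omits normalization, contractions, and variable weighting, which a real implementation would need.

```python
# Toy collation element table: character -> (primary, secondary, tertiary).
# The weights are invented for illustration and are NOT taken from DUCET.
# Primary distinguishes base letters, secondary accents, tertiary case.
TOY_TABLE = {
    "a": (1, 1, 1), "A": (1, 1, 2),
    "á": (1, 2, 1), "Á": (1, 2, 2),
    "b": (2, 1, 1), "B": (2, 1, 2),
    "c": (3, 1, 1), "C": (3, 1, 2),
}


def sort_key(text: str, strength: int = 3) -> bytes:
    """Build a binary key from the first `strength` weight levels.

    All primary weights come first, then a zero separator, then all
    secondary weights, and so on. Every weight is non-zero, so the
    separator sorts below any weight. Characters missing from the toy
    table raise KeyError.
    """
    elements = [TOY_TABLE[ch] for ch in text]
    levels = [bytes(e[level] for e in elements) for level in range(strength)]
    return b"\x00".join(levels)


words = ["b", "Á", "a", "A", "á"]

# Python compares bytes objects byte by byte, so the keys can be used
# directly as sort keys.
print(sorted(words, key=sort_key))  # ['a', 'A', 'á', 'Á', 'b']

# Lowering the strength ignores the later levels: at strength 1 both
# accents and case are ignored, at strength 2 only case is ignored.
print(sort_key("a", strength=1) == sort_key("Á", strength=1))  # True
print(sort_key("a", strength=2) == sort_key("A", strength=2))  # True
```

Because a difference at the primary level outranks any difference at later levels, "ab" sorts before "Ac" under these keys, whereas plain code point order would place "Ac" first.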

An open-source implementation of the UCA is included with International Components for Unicode (ICU).^[4]^[5] ICU supports tailoring, and the collation tailorings from CLDR are included in ICU.^[6]^[2]

References

[1] Whistler, Ken; Scherer, Markus; Davis, Mark (2022-08-26). "UTS #10: Unicode Collation Algorithm". https://www.unicode.org/reports/tr10/.
[2] Hosken, Martin (2021-09-23) (PDF). Unicode Sort Tailoring: Tutorial (1.3 ed.). SIL Writing Systems Technology. pp. 2–3. https://scriptsource.org/cms/scripts/render_download.php?format=file&media_id=..%2Fsites%2Fs%2Fmedia%2Fdatabase%2Fssproto%2Fentries%2Fpn%2Frn%2Fpnrnlhkrq9_sort_tutorial.pdf&filename=sort_tutorial.pdf. Retrieved 2023-08-16.
[3] "CLDR Releases/Downloads". https://cldr.unicode.org/index/downloads.
[4] "ICU - International Components for Unicode". https://icu.unicode.org/home.
[5] "Collations". https://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.sqlanywhere.12.0.1/dbadmin/natlang-s-7003956.html.
[6] "Customization". https://unicode-org.github.io/icu/userguide/collation/customization/.

External links

Unicode Collation Algorithm: Unicode Technical Standard #10
Mimer SQL Unicode Collation Charts

Tools

ICU Locale Explorer: an online demonstration of the Unicode Collation Algorithm using International Components for Unicode (not working as of 2023-08-16).
An ICU collation demo (not working as of 2023-08-16).
msort: a sort program that provides an unusual level of flexibility in defining collations and extracting keys.


Original source: https://en.wikipedia.org/wiki/Unicode_collation_algorithm.
