CJK Compatibility Ideographs

Short description: Unicode character block
Range: U+F900..U+FAFF (512 code points)
Plane: BMP
Scripts: Han
Assigned: 472 code points
Unused: 40 reserved code points
Source standards: KS X 1001, Big5, IBM 32, JIS X 0213, ARIB STD-B24, KPS 10721-2000
Unicode version history (total assigned, with the number added in that version):
  • 1.0.1: 302 (+302)
  • 3.2: 361 (+59)
  • 4.1: 467 (+106)
  • 5.2: 470 (+3)
  • 6.1: 472 (+2)
Note: [1][2]
The range was initially part of the Private Use Area in Unicode 1.0.0,[3] and was removed from it in Unicode 1.0.1.

CJK Compatibility Ideographs is a Unicode block that mostly contains Han characters which were encoded at more than one location in other established character encodings. These characters are encoded here, in addition to their CJK Unified Ideographs assignments, to retain round-trip compatibility between Unicode and those encodings. However, the block also contains 12 unified ideographs sourced from Japanese character sets from IBM.
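
As a rough cross-check of the block's size and assigned count, the following Python sketch counts the assigned code points in U+F900..U+FAFF with the standard library's unicodedata module. It assumes the interpreter's bundled Unicode data is version 6.1 or later (reported by unicodedata.unidata_version); the constant and function names are illustrative only.

```python
import unicodedata

# Block boundaries as stated in the article.
BLOCK_START, BLOCK_END = 0xF900, 0xFAFF

def assigned_in_block():
    """Return the code points in the block whose category is not Cn (unassigned)."""
    return [cp for cp in range(BLOCK_START, BLOCK_END + 1)
            if unicodedata.category(chr(cp)) != "Cn"]

assigned = assigned_in_block()
block_size = BLOCK_END - BLOCK_START + 1
reserved = block_size - len(assigned)

print(f"Unicode data version: {unicodedata.unidata_version}")
print(f"{len(assigned)} assigned, {reserved} reserved, {block_size} total")
```

With Unicode 6.1 or later data this should agree with the 472 assigned and 40 reserved code points given for the block; older data would report a smaller assigned count.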

The block has dozens of ideographic variation sequences registered in the Unicode Ideographic Variation Database (IVD).[4][5] These sequences specify the desired glyph variant for a given Unicode character.
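
The shape of such a sequence can be sketched in Python. UTS #37 defines an ideographic variation sequence as a base character followed by one variation selector from U+E0100..U+E01EF (VS17 to VS256). The helper names below are hypothetical, and the base and selector pairing is chosen only for illustration: the code does not consult the IVD, so it says nothing about whether that particular sequence is registered.

```python
import unicodedata

# Variation selectors available to ideographic variation sequences: VS17..VS256.
IVS_SELECTORS = range(0xE0100, 0xE01EF + 1)

def make_ivs(base, selector_index):
    """Build base + the selector at the given index (0 is VS17).

    This only assembles the sequence; it does not check IVD registration.
    """
    return base + chr(IVS_SELECTORS[selector_index])

def is_ivs_shaped(text):
    """True if text is exactly one character followed by one VS17..VS256 selector."""
    return len(text) == 2 and ord(text[1]) in IVS_SELECTORS

seq = make_ivs("\u8612", 0)  # hypothetical pairing, for illustration only
print([f"U+{ord(c):04X}" for c in seq])
print(is_ivs_shaped(seq))
# The selector has no decomposition, so normalisation keeps the sequence intact.
print(unicodedata.normalize("NFC", seq) == seq)
```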

Character sources

Sources for the original collection of CJK Compatibility Ideographs include:

  • South Korean KS X 1001 (U+F900–U+FA0B, 268 characters)
  • Taiwanese Big5 (U+FA0C–U+FA0D, 2 characters)
  • "IBM 32": 32 Japanese characters from IBM (U+FA0E–U+FA2D; see below)

In ensuing versions of the standard, more characters have been added to the block from:

  • South Korean KS X 1001 (U+FA2E–U+FA2F, 2 characters)
  • Japanese JIS X 0213 (U+FA30–U+FA6A, 59 characters)
  • Japanese ARIB STD-B24 (U+FA6B–U+FA6D, 3 characters)
  • North Korean KPS 10721-2000 (U+FA70–U+FAD9, 106 characters)
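
The ranges in the two lists above can be expressed as a small lookup table. The following Python sketch is only an illustration of that table (the names SOURCE_RANGES, source_of and range_sizes are made up here); it recomputes the character count of each range from its endpoints and maps a code point back to its source.

```python
# (first, last, source) triples copied from the two lists above.
SOURCE_RANGES = [
    (0xF900, 0xFA0B, "KS X 1001"),
    (0xFA0C, 0xFA0D, "Big5"),
    (0xFA0E, 0xFA2D, "IBM 32"),
    (0xFA2E, 0xFA2F, "KS X 1001"),
    (0xFA30, 0xFA6A, "JIS X 0213"),
    (0xFA6B, 0xFA6D, "ARIB STD-B24"),
    (0xFA70, 0xFAD9, "KPS 10721-2000"),
]

def source_of(code_point):
    """Return the source standard for a code point, or None if it is in no range."""
    for first, last, source in SOURCE_RANGES:
        if first <= code_point <= last:
            return source
    return None

def range_sizes():
    """Return the number of code points in each range, in table order."""
    return [last - first + 1 for first, last, _ in SOURCE_RANGES]

for (first, last, source), size in zip(SOURCE_RANGES, range_sizes()):
    print(f"U+{first:04X}..U+{last:04X}  {source:<15} {size}")
print("total:", sum(range_sizes()))
```

The per-range sizes computed this way match the counts given in the lists, and their sum matches the 472 assigned code points of the block.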

The "IBM 32" characters

IBM Japanese double-byte EBCDIC includes several kanji which do not exist in, or do not round-trip from, JIS X 0208. These were included as gaiji in extensions to Shift JIS and EUC-JP from IBM (e.g. code page 942), NEC, the Open Software Foundation, and Microsoft (e.g. Windows code page 932). However, they were not used as a source for the original Unified Repertoire and Ordering (URO). Instead, 32 of the IBM extension kanji, those which had not been included in the URO from other sources, were included in the CJK Compatibility Ideographs block in the range U+FA0E–U+FA2D.

Of these 32 characters:

  • 19 are unifiable with characters in the URO, and are therefore compatibility ideographs in the strict sense.
  • One (U+FA20) is a kyūjitai form of a kokuji whose extended shinjitai form exists in the URO (U+8612). Both are hyōgai kanji, and are variants of the jinmeiyō kanji U+8429 (i.e. Kummerowia). U+FA20 was assigned a canonical decomposition to U+8612, so normalisation replaces it with that character, even though the 龜 and 亀 components, while both forms of radical 213, are not usually considered unifiable.
  • The remaining 12 are kokuji characters which are actually unified ideographs: they have the Unified_Ideograph property and do not change under normalisation. Despite their inclusion in the CJK Compatibility Ideographs block and their algorithmically generated character names beginning with "CJK COMPATIBILITY IDEOGRAPH", they are not compatibility duplicates of characters in the original CJK Unified Ideographs block.[6][7] Eleven of these 12 have no duplicate anywhere, while U+FA23 was later unintentionally duplicated in CJK Unified Ideographs Extension B as U+27EAF 𧺯. They are as follows:
    U+FA0E, U+FA0F, U+FA11, U+FA13, U+FA14, U+FA1F, U+FA21, U+FA23, U+FA24, U+FA27, U+FA28, U+FA29
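
The split between the 12 unified ideographs and the other 20 characters can be observed from the decomposition mappings in the Unicode Character Database. The sketch below uses Python's standard unicodedata module and assumes its bundled data reflects the current mappings. The function name split_ibm32 is hypothetical, and the check looks only at decomposition mappings, since unicodedata does not expose the Unified_Ideograph property.

```python
import unicodedata

# The "IBM 32" range as given in the article.
IBM32 = range(0xFA0E, 0xFA2D + 1)

def split_ibm32():
    """Split the range into code points without and with a decomposition mapping.

    Returns (unified, decomposed): a list of code points with no mapping, and a
    dict from each remaining code point to the code point it decomposes to.
    """
    unified, decomposed = [], {}
    for cp in IBM32:
        mapping = unicodedata.decomposition(chr(cp))
        if mapping:
            # Singleton mappings look like "8612"; take the last field as hex.
            decomposed[cp] = int(mapping.split()[-1], 16)
        else:
            unified.append(cp)
    return unified, decomposed

unified, decomposed = split_ibm32()
print("no decomposition:", " ".join(f"U+{cp:04X}" for cp in unified))
print("decomposed:", len(decomposed))
print(f"U+FA20 -> U+{decomposed[0xFA20]:04X}")
# Normalisation replaces U+FA20 but leaves a unified ideograph such as U+FA0E alone.
print(unicodedata.normalize("NFC", "\ufa20") == "\u8612")
print(unicodedata.normalize("NFKC", "\ufa0e") == "\ufa0e")
```

Text that must keep the distinction between a compatibility ideograph and its unified counterpart therefore cannot be passed through any normalisation form, which is one motivation for the variation sequences mentioned earlier.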

History

The following Unicode-related documents record the purpose and process of defining specific characters in the CJK Compatibility Ideographs block:

Entries are grouped by Unicode version and final code points.[a] Each entry gives the document identifiers (L2 ID, then WG2 ID and IRG ID where present), followed by the document.

Version 1.0.1, U+F900..FA2D (302 characters):

  • WG2 N782: Ksar, Mike (1991-10-12), Attachment to N 767 WG2-Paris meeting copies of working papers
  • L2/03-399: Fok, Anthony (2003-10-13), Unihan reported errors / changes re kHKSCS entries
  • L2/03-367, WG2 N2667: Suignard, Michel; Muller, Eric; Jenkins, John (2003-10-22), CJK Ideograph source references corrections
  • L2/03-398: Nguyen, D. (2003-10-29), Unihan reported errors / changes re kCowles
  • L2/03-417: Muller, Eric (2003-10-31), Variation sequences for CJK Compatibility characters
  • L2/06-309R: Karlsson, Kent (2006-11-07), Bug in DerivedNumericValues.txt
  • L2/06-324R2: Moore, Lisa (2006-11-29), UTC #109 Minutes, "Add numeric values to 8 compatibility ideographs to match their canonical characters."
  • L2/08-373, WG2 N3525: Lunde, Ken; Muller, Eric (2008-10-06), Handling CJK compatibility characters with variation sequences
  • L2/08-425: Cook, Richard; Lunde, Ken (2008-11-18), IRG Use of IVD Collections
  • L2/09-003R: Moore, Lisa (2009-02-12), UTC #118 / L2 #215 Minutes
  • L2/09-080, WG2 N3590: Muller, Eric (2009-03-11), Difficulties with compatibility ideographs
  • L2/09-290: Muller, Eric (2009-08-07), Draft IVD registration for Compatibility Characters
  • L2/11-243, WG2 N4111: Sources for Orphaned CJK Ideographs, 2011-06-14
  • L2/11-254: Constable, Peter (2011-06-20), UTC Liaison Report from WG2
  • WG2 N4103: Unconfirmed minutes of WG 2 meeting 58, 2012-01-03
  • L2/17-090: Chung, Jaemin (2017-04-07), Proposal to add informative notes and cross-reference to U+F92C and U+F9B8
  • L2/17-103: Moore, Lisa (2017-05-18), UTC #151 Minutes

Version 3.2, U+FA30..FA6A (59 characters):

  • L2/99-016, WG2 N1935: Paterson, Bruce (1998-11-30), Editorial corrigenda on CJK compatibility ideographs, and other items
  • L2/99-240: Addition of fifty six KANJIs for compatibility, 1999-07-15
  • L2/99-232, WG2 N2003: Umamaheswaran, V. S. (1999-08-03), Minutes of WG 2 meeting 36, Fukuoka, Japan, 1999-03-09--15
  • L2/99-311: Addition of fifty six KANJIs for compatibility, 1999-08-23
  • L2/99-313, WG2 N2095: Sato, T. K. (1999-09-08), Addition of CJK ideographs which are already "unified"
  • L2/99-316: Whistler, Ken (1999-09-13), Comments on JCS proposal
  • L2/99-365: Moore, Lisa (1999-11-23), Comments on JCS Proposals
  • L2/99-383, WG2 N2142, IRG N710: The response to WG2 resolution M37.16: CJK compatibility ideographs from JIS (WG2 N2104), 1999-12-09
  • L2/00-010, WG2 N2103: Umamaheswaran, V. S. (2000-01-05), Minutes of WG 2 meeting 37, Copenhagen, Denmark: 1999-09-13--16
  • L2/99-260R: Moore, Lisa (2000-02-07), Minutes of the UTC/L2 meeting in Mission Viejo, October 26-28, 1999
  • L2/00-101, WG2 N2197: Sato, T. K. (2000-03-15), Update: CJK COMPATIBILITY IDEOGRAPH request
  • L2/00-172, WG2 N2221: Sato, T. K. (2000-04-20), JIS COMPATIBILITY IDEOGRAPHS (draft for ammendment-1) [sic]
  • WG2 N2221R: JIS COMPATIBILITY IDEOGRAPHS (draft for ammendment-1) [sic] revised, 2000-06-01
  • L2/00-190: Moore, Lisa (2000-06-22), UTC Rescinds Acceptance of Four Duplicate Radicals from JIS X 213
  • L2/00-234, WG2 N2203 (rtf, txt): Umamaheswaran, V. S. (2000-07-21), Minutes from the SC2/WG2 meeting in Beijing, 2000-03-21 -- 24
  • L2/00-337, WG2 N2273: JIS compatibility ideographs, 2000-09-19
  • L2/00-378, WG2 N2295: Sato, T. K. (2000-10-26), Feedback from Japan on N2281 -- working draft on pDAM 1 -- CJK Compatibility
  • L2/01-420: Whistler, Ken (2001-10-30), WG2 (Singapore) Resolution Consent Docket for UTC
  • L2/01-405R: Moore, Lisa (2001-12-12), Minutes from the UTC/L2 meeting in Mountain View, November 6-9, 2001
  • L2/06-321: Whistler, Ken (2006-10-03), UCD Bug re JIS 0213
  • L2/06-324R2: Moore, Lisa (2006-11-29), UTC #109 Minutes, "Give U+FA30..U+FA6A the ideographic property, and fix the wordbreak property."

Version 4.1, U+FA70..FAD9 (106 characters):

  • L2/01-050, WG2 N2253: Umamaheswaran, V. S. (2001-01-21), Minutes of the SC2/WG2 meeting in Athens, September 2000
  • L2/01-350, WG2 N2375: Proposal to add 160 Compatibility Hanja code table of D P R of Korea into CJK Compatibility Ideographs, 2001-09-03
  • L2/02-154, WG2 N2403: Umamaheswaran, V. S. (2002-04-22), Draft minutes of WG 2 meeting 41, Hotel Phoenix, Singapore, 2001-10-15/19
  • WG2 N2478: Proposed Disposition of comments on SC2 N 3584 (PDAM text for Amendment 2 to ISO/IEC 10646-1:2000), 2002-05-08
  • WG2 N2541: Proposed disposition of comments on SC2 N 3624 (FPDAM text for Amendment 2 to ISO/IEC 10646-1:2000), 2002-12-02
  • WG2 N2540: Freytag, Asmus (2002-12-05), Corrections to CJK Compatibility Ideographs Table in FPDAM
  • L2/02-465, WG2 N2566: Collins, Lee; Freytag, Asmus (2002-12-09), Review of DPRK Compatibility Ideographs
  • L2/02-471, WG2 N2572: CJK Compatibility Ideographs (Unicode 3.2, page 399), 2002-12-18
  • L2/02-472, WG2 N2573: Report of DPRK compatibility characters ad hoc meeting, 2002-12-11
  • L2/02-468, WG2 N2569: Suignard, Michel (2002-12-12), Proposed disposition of comments on SC2 N 3624 (FPDAM text for Amendment 2 to ISO/IEC 10646-1:2000)
  • L2/03-023, WG2 N2569R: Suignard, Michel (2003-01-27), Disposition of Comments Report on 10646-1/FPDAM 2
  • L2/03-346: Chang, Cora (2003-10-20), Analysis of characters in WG2 documents N2572, N2573
  • L2/03-346.1: Chang, Cora (2003-10-20), Analysis of characters in WG2 documents N2572, N2573 [spreadsheet without glyphs]
  • L2/04-207, WG2 N2776, IRG N1062: Proposal to add 106 Compatibility Hanjas of D P R of Korea to CJK Compatibility Ideographs, 2004-05-25
  • L2/04-330: Whistler, Ken (2004-08-03), WG2 Consent Docket
  • L2/04-316: Moore, Lisa (2004-08-19), UTC #100 Minutes
  • L2/05-050R, WG2 N2924R: Freytag, Asmus (2005-01-28), Charts - Amendments 1 and 2 to ISO/IEC 10646:2003
  • L2/10-367, WG2 N3899: KP1-0000, 2010-09-30
  • L2/11-243, WG2 N4111: Sources for Orphaned CJK Ideographs, 2011-06-14
  • L2/11-254: Constable, Peter (2011-06-20), UTC Liaison Report from WG2
  • WG2 N4103: Unconfirmed minutes of WG 2 meeting 58, 2012-01-03

Version 5.2, U+FA6B..FA6D (3 characters):

  • WG2 N3353 (pdf, doc): Umamaheswaran, V. S. (2007-10-10), Unconfirmed minutes of WG 2 meeting 51 Hanzhou, China; 2007-04-24/27
  • L2/07-387: Proposal to encode six CJK Ideographs in UCS, 2007-10-17
  • L2/08-184, WG2 N3318R (pdf, appendix): Revised proposal to encode six CJK Ideographs in UCS, 2008-03-25
  • L2/08-318, WG2 N3453 (pdf, doc): Umamaheswaran, V. S. (2008-08-13), Unconfirmed minutes of WG 2 meeting 52
  • L2/08-161R2: Moore, Lisa (2008-11-05), UTC #115 Minutes

Version 6.1, U+FA2E..FA2F (2 characters):

  • L2/10-087, WG2 N3747: A solution proposed by R.O.Korea for incorrectly mapped compatibility chars, 2010-03-19
  • L2/10-108: Moore, Lisa (2010-05-19), UTC #123 / L2 #220 Minutes
  • WG2 N3803 (pdf, doc): Unconfirmed minutes of WG 2 meeting no. 56, 2010-09-24

  a. Proposed code points and character names may differ from the final code points and names.

References

  1. "Unicode character database". The Unicode Standard. https://www.unicode.org/ucd/. Retrieved 2023-07-26. 
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. https://www.unicode.org/versions/enumeratedversions.html. Retrieved 2023-07-26. 
  3. The Unicode Standard, Version 1.0, Volume 1. Unicode Consortium. 1991. pp. 118–119. ISBN 0-201-56788-1. 
  4. "Ideographic Variation Database". Unicode Consortium. https://www.unicode.org/ivd/. 
  5. "UTS #37, Unicode Ideographic Variation Database". Unicode Consortium. https://www.unicode.org/reports/tr37/. 
  6. "PropList.txt". Unicode Consortium. https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt. 
  7. Freytag, Asmus; McGowan, Rick; Whistler, Ken (2021-06-14). "Known Anomalies in Unicode Character Names". Unicode Consortium. https://www.unicode.org/notes/tn27/. "These 12 characters are unified CJK ideographs, not compatibility ideographs, despite their names."