Add preliminary files with mappings from XCCS to Unihan set of Unicode. #2559
MattHeffron merged 4 commits into master
These were generated by script from the data in the Unihan folder of the Unicode Character Database. That data claims to give the mapping from Unicode (Unihan) to "Xerox" coding (2 bytes in octal). The mappings have not been validated at all for correctness or completeness. None of these files has a descriptive header. (Unfortunately, this kind of mapping information to "Xerox" exists *only* for Unihan characters.)
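For anyone wanting to reproduce or check the generation step, here is a minimal sketch in Python. It assumes the source field is the Unihan `kXerox` property, written as tab-separated lines with two colon-separated octal bytes (character set, then character code). The function name, the sample lines, and the packing of the two bytes into one 16-bit code are illustrative assumptions, not the script actually used for this PR.

```python
import re

# Assumed line shape: "U+XXXX<TAB>kXerox<TAB>OOO:OOO" (two octal bytes).
KXEROX_LINE = re.compile(r"^U\+([0-9A-F]{4,6})\tkXerox\t([0-7]{3}):([0-7]{3})$")


def parse_kxerox(lines):
    """Return a dict mapping a 16-bit XCCS code to a Unicode code point."""
    mapping = {}
    for line in lines:
        match = KXEROX_LINE.match(line.strip())
        if match is None:
            continue  # comments, blank lines, and other properties are skipped
        codepoint = int(match.group(1), 16)
        charset = int(match.group(2), 8)  # high byte: character set number
        char = int(match.group(3), 8)     # low byte: code within the set
        if charset > 0o377 or char > 0o377:
            continue  # not a valid byte, so ignore the entry
        mapping[(charset << 8) | char] = codepoint
    return mapping


# Made-up sample lines for illustration only, not verified Unihan data.
SAMPLE = [
    "# comment line",
    "U+4E00\tkXerox\t241:042",
    "U+4E01\tkOtherProperty\tignored",
    "U+4E03\tkXerox\t241:051",
]

xccs_to_unicode = parse_kxerox(SAMPLE)
```

Since the generated files were not validated, a round trip through a parser like this one would be one cheap way to spot malformed entries.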
The new files appear to be intact and readable.
The format seems good. The only issue is that we need a predicate that is true for the range of Unihan character set numbers, like the ones we have for Kanji and Chinese. That would be used to make sure that the glyphs for the bold and italic versions of the display fonts, if we ever got them, would not be faked up.
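One way to get such a predicate without hard-coding a range is to derive the set of character set numbers from the mapping data itself. The sketch below is in Python rather than Lisp, and every name in it is hypothetical. The sample codes are illustrative and do not represent the actual Unihan character sets.

```python
def make_charset_predicate(xccs_codes):
    """Build a predicate over character set numbers.

    The character set number is taken to be the high byte of each
    16-bit XCCS code (an assumption of this sketch).
    """
    charsets = frozenset(code >> 8 for code in xccs_codes)

    def in_charsets(charset):
        return charset in charsets

    return in_charsets


# Illustrative 16-bit codes only: (character set << 8) | character code.
SAMPLE_CODES = [
    (0o241 << 8) | 0o042,
    (0o241 << 8) | 0o051,
    (0o243 << 8) | 0o100,
]

is_unihan_charset = make_charset_predicate(SAMPLE_CODES)
```

With the sample above, `is_unihan_charset(0o241)` is true and `is_unihan_charset(0o242)` is false, because no sample code falls in character set 242.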
Unihan is the unification of Kanji, Chinese, Japanese, etc. characters. See: Han unification. The XCCS Standard Document map of character sets to languages (page 34, 2-8) appears quite incomplete.
rmkaplan left a comment
Why not? It turns out that the function CHINESECHARSETP was basically picking out just these character sets (except for 171, which KANJICHARSETP should cover), so I'll separately rename CHINESECHARSETP to UNIHANCHARSETP.