
Add preliminary files with mappings from XCCS to Unihan set of Unicode. #2559

Merged
MattHeffron merged 4 commits into master from mth67--Add_unihan_XCCS_mapping_files
Apr 17, 2026

Conversation

@MattHeffron
Member

These were generated by script from the data in the unihan folder of the Unicode Character Database.
That data claims to give the mapping from Unicode (Unihan) to "Xerox" coding (2 bytes in octal).
These files have not been validated at all for correctness/completeness.
None of these files has any descriptive header.

(Unfortunately, this kind of mapping information to "Xerox" exists *only* for Unihan characters.)

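The generation step described above might be sketched roughly as follows. This is a hedged illustration in Python, not the actual script: it assumes the Xerox mappings come from kXerox entries in the Unihan data files, formatted as tab-separated `U+xxxx`, `kXerox`, `hi:lo` rows with the two bytes in octal. The sample line is synthetic, and the layout of the generated mapping files is not reproduced here.

```python
def parse_kxerox(lines):
    """Collect {Unicode code point: (hi, lo) XCCS bytes} from kXerox rows."""
    mapping = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 3 and fields[1] == "kXerox":
            codepoint = int(fields[0][2:], 16)  # "U+4E00" -> 0x4E00
            # Assumption: value is two octal bytes separated by a colon
            hi, lo = (int(b, 8) for b in fields[2].split(":"))
            mapping[codepoint] = (hi, lo)
    return mapping

sample = [
    "# synthetic excerpt in Unihan tab-separated format",
    "U+4E00\tkXerox\t241:241",
]
print(parse_kxerox(sample))  # → {19968: (161, 161)}
```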
@MattHeffron MattHeffron self-assigned this Apr 10, 2026
@MattHeffron MattHeffron added the enhancement New feature or request label Apr 10, 2026
@pamoroso
Member

The new files appear to be intact and readable.

@rmkaplan
Contributor

The format seems good. The only issue is that we need a predicate that is true for the range of Unihan character set numbers, like the ones we have for Kanji and Chinese. It would be used to make sure that the glyphs for the bold and italic versions of the display fonts, if we ever got them, are not faked up.

@MattHeffron
Member Author

Unihan is the unification of Kanji, Chinese, Japanese, etc. characters. See: Han unification
But "many characters have regional variants assigned to different code points"

The XCCS Standard document's map of character sets to languages (page 34, 2-8) appears quite incomplete.
In addition, some character sets include mixes of Latin, symbol, Asian, Arabic, etc. characters, so managing glyph faking per character set might be insufficient.

@MattHeffron MattHeffron marked this pull request as ready for review April 15, 2026 19:05
Contributor

@rmkaplan rmkaplan left a comment


Why not? It turns out that the function CHINESECHARSETP was basically picking out just these character sets (except for 171, which KANJICHARSETP should cover), so I'll separately rename CHINESECHARSETP to UNIHANCHARSETP.
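For illustration, the predicate arrangement described here might look like the following. This is a sketch in Python rather than the actual Interlisp; every charset number below is a placeholder except 171, which this thread names as covered by KANJICHARSETP.

```python
# Placeholder charset numbers -- NOT the real XCCS assignments, except
# that 171 is the one this discussion assigns to KANJICHARSETP.
KANJI_CHARSETS = frozenset({171})
UNIHAN_CHARSETS = frozenset({101, 102, 103})  # hypothetical values

def KANJICHARSETP(charset):
    """True for the Kanji character sets."""
    return charset in KANJI_CHARSETS

def UNIHANCHARSETP(charset):
    """CHINESECHARSETP renamed: true for the remaining Unihan charsets."""
    return charset in UNIHAN_CHARSETS

def needs_real_glyphs(charset):
    """Hypothetical combined check: never fake bold/italic glyphs here."""
    return UNIHANCHARSETP(charset) or KANJICHARSETP(charset)
```

The design choice, as described, is that the existing Kanji predicate keeps covering charset 171, while the renamed UNIHANCHARSETP covers the rest of the sets the old CHINESECHARSETP selected.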

@MattHeffron MattHeffron merged commit 2814618 into master Apr 17, 2026
@MattHeffron MattHeffron deleted the mth67--Add_unihan_XCCS_mapping_files branch April 17, 2026 19:34

Labels

enhancement New feature or request

Projects

Status: Done


3 participants