Improve the ja character set per ARIB feedback by palemieux · Pull Request #614 · w3c/imsc

palemieux · 2025-09-07T03:48:19Z

Closes #613

imsc1/spec/ttml-ww-profiles.html

himorin · 2025-09-25T08:09:20Z

Looking at the liaison text and ARIB B-62 2.2E1 F1, I think line 3642 should target the character codes listed in this list of the Japanese Supplementary character set, e.g. When using the variants of the above subset, the "Variation selector" and "Variation sequence" ... are used in ARIB B-62 chapter 5.2.
CJK Compatibility Ideographs is the block U+F900..U+FAFF, which is different from CJK Unified Ideographs and does not cover the sections listed in this list.

Co-authored-by: himorin / Atsushi Shimono <atsushi@himor.in>

css-meeting-bot · 2025-09-25T15:18:47Z

The Timed Text Working Group just discussed Improve the ja character set per ARIB feedback w3c/imsc#614, and agreed to the following:

SUMMARY: Atsushi's comments accepted

The full IRC log of that discussion

<nigel> Topic: Improve the ja character set per ARIB feedback #614
<nigel> github: https://github.com//pull/614
<cpn> Pierre: Atsushi, if I accept both your comments, are you happy with the PR?
<cpn> Atsushi: The comment about ARIB was just a suggestion
<nigel> -> Atsushi's comment https://github.com//pull/614#issuecomment-3332729041
<cpn> Atsushi: The comment relates to the suggested change above.
<cpn> Pierre: I accepted the suggestion. Is the PR ok now?
<cpn> Atsushi: Yes
<nigel> SUMMARY: Atsushi's comments accepted

himorin · 2025-09-26T08:46:46Z

I've read through the whole updated text again.
Sorry, but it seems I missed line 3641 (CJK Compatibility Ideographs). I'm not sure where this came from: I could not find it in the liaison text from ARIB, and it does not match IVS either.
In the original text (from ARIB), this part is written as an additional note starting with *. Could we follow that format?

palemieux · 2025-09-26T16:45:32Z

In original text (from ARIB), this part is written as additional note starting with *, we could follow that format???

Ideographic Variation Selector is not a defined term in Unicode.

23.4 Variation Selectors, CJK Compatibility Ideographs states:

It is important to distinguish standardized variation sequences for CJK compatibility
ideographs from the variation sequences that are registered in the Ideographic
Variation Database (IVD). The former are normalization-stable representations of the
CJK compatibility ideographs; they are defined in StandardizedVariants.txt, and
there is precisely one variation sequence for each CJK compatibility ideograph.

I assumed ARIB meant standardized variation sequences for CJK compatibility ideographs.

himorin · 2025-09-29T06:36:44Z

In UTS #37, an IVS (Ideographic Variation Sequence) is defined as a sequence of two coded characters: the first an ideographic character, the second a variation selector. Because an IVS is a "sequence" of two Unicode code points that uses one variation selector, it is sometimes written as Ideographic Variation Selector (as in About IVD/IVS at CITPC), and that term was also used early in development, e.g. in some proposals.
Code points U+E0100 to U+E01EF are the ideographic-specific variation selectors.
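
As a rough illustration of the definition above, the following Python sketch finds candidate IVSes in a string: a base ideograph immediately followed by a code point in U+E0100..U+E01EF. The names `find_ivs` and `is_ideograph` are hypothetical, and the base-character test is a simplification (it checks the character name rather than the Unicode Ideographic property), so this is not a conformant UTS #37 check.

```python
import unicodedata

VS_SUPPLEMENT = range(0xE0100, 0xE01F0)  # U+E0100..U+E01EF inclusive


def is_ideograph(ch: str) -> bool:
    # Simplification (assumption): approximate the Ideographic property by
    # looking at the character name; a real implementation would consult
    # the Unicode Character Database.
    return unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH")


def find_ivs(text: str) -> list[tuple[int, int]]:
    """Return (base, selector) code point pairs for each candidate IVS."""
    pairs = []
    for base, selector in zip(text, text[1:]):
        if is_ideograph(base) and ord(selector) in VS_SUPPLEMENT:
            pairs.append((ord(base), ord(selector)))
    return pairs


# One sequence followed by the same ideograph without a selector.
sample = "\u8FBB\U000E0100 and \u8FBB"
print([(hex(b), hex(s)) for b, s in find_ivs(sample)])
```

Whether a given pair is actually registered is a separate question: that requires a lookup in the Ideographic Variation Database, which this sketch does not attempt.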

CJK Compatibility Ideographs are compatibility ideographs, most of which normalize to CJK Unified Ideographs, but which are required for backward compatibility with local character encodings. Some CJK Compatibility Ideographs are also included in the collections listed here; for example, the IBM 32 compatibility ideographs U+FA0E to U+FA2D are listed as part of collection 287 Common Japanese. Since Unicode 6.3, these ranges have an additional table using SVS (Standardized Variation Sequences, using Standardized Variation Selectors U+FE00 to U+FE02), as described in the section, which allows code points in CJK Compatibility Ideographs to be written as a CJK Unified Ideograph followed by one of these selectors.
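
The normalization behaviour described above can be observed with Python's standard `unicodedata` module. This sketch uses U+F900, a CJK compatibility ideograph whose canonical decomposition is U+8C48; the choice of example character is an assumption of this illustration and does not come from the liaison text, and whether this particular base and selector pair is listed in StandardizedVariants.txt is not verified here.

```python
import unicodedata

compat = "\uF900"                   # CJK COMPATIBILITY IDEOGRAPH-F900
unified = "\u8C48"                  # the unified ideograph it decomposes to
with_selector = unified + "\uFE00"  # unified ideograph followed by VS1

# Normalization replaces the compatibility ideograph with the unified one,
# so the distinction between the two is lost in normalized text.
print(unicodedata.normalize("NFC", compat) == unified)

# A base character followed by a variation selector is left untouched by
# normalization, which is why such sequences are normalization-stable.
print(unicodedata.normalize("NFC", with_selector) == with_selector)
```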

So, I believe the note included as the last line of list 2 in liaison text, shall be read as use the Ideographic-specific Variation Selector defined in Unicode, or Ideographic Variation Sequence (IVS) defined in Unicode.

palemieux · 2025-10-09T04:11:16Z

So, I believe the note included as the last line of list 2 in liaison text, shall be read as use the Ideographic-specific Variation Selector defined in Unicode, or Ideographic Variation Sequence (IVS) defined in Unicode.

Ideographic variation sequences are not part of Unicode and instead specified in UTS 37, but Standardized variation sequences are specified in Unicode.

Does ARIB STD-B62 reference UTS 37?

Are you sure that ARIB does not mean Standardized variation sequences?

Can ARIB provide examples of what they mean by "variation of Kanji character"?

palemieux · 2025-10-09T15:23:39Z

The basic question is: which clauses of the Unicode standard and what expected conformance does the following requirement from the ARIB liaison refer to?

For variation of Kanji characters, the Ideographic Variation Selector defined in [ISO10646] shall be used.

In particular, are specifications beyond ISO 10646 required to specify and/or conform to the requirement?

In addition, a few examples would be appreciated.

css-meeting-bot · 2025-10-09T15:25:56Z

The Timed Text Working Group just discussed Improve the ja character set per ARIB feedback w3c/imsc#614, and agreed to the following:

SUMMARY: @himorin to ask informally for clarification as per the above discussion.

The full IRC log of that discussion

<nigel> Subtopic: Improve the ja character set per ARIB feedback #614
<nigel> github: https://github.com//pull/614
<nigel> Pierre: [shares screen]
<nigel> .. Liaison from ARIB raises the question at hand.
<nigel> .. ARIB kindly suggested character set changes for ja, which is great.
<nigel> .. There's a note about Ideographic Variation Selector.
<nigel> .. However that is not a defined term.
<nigel> .. Atsushi and I have been discussing how to interpret it.
<nigel> .. We need to figure out what that means, so we don't write something different from
<nigel> .. what they intend.
<nigel> .. From Atsushi's last comment I think "ideographic variation sequence"?
<nigel> Atsushi: CJK compatibility ideographs are there for compatibility.
<nigel> .. There can be mismapping between character set and what Unicode says.
<nigel> .. For backward compatibility between local character set and unicode some characters
<nigel> .. have both mappings within [scribe missed].
<nigel> .. I believe that is not related to variation sequence or anything else.
<nigel> .. If someone wants to say about the variation selector usually we say
<nigel> .. "ideographic variation selector" or "ideographic variation sequence"
<nigel> .. so they should mean the same as each other. They are terms used interchangeably.
<nigel> .. I believe what the point means is that the ideographic variation sequences shall be used.
<nigel> Pierre: That's not part of main Unicode, it's part of UTS 37. Does ARIB reference UTS 37?
<nigel> Atsushi: Variation selector itself is in ISO10646
<nigel> Pierre: That's a much broader thing though, includes emoji selectors which I think we don't want.
<nigel> Atsushi: shows [Ideographic variation sequence] in Unicode 17.0.0
<nigel> Pierre: You have to know how to represent it.
<nigel> Atsushi: Representation is described in a separate database, not in ISO10646.
<nigel> Pierre: Before saying you must or should support this I want to know absolutely certainly that
<nigel> .. is what ARIB has in mind. Can we get a sample?
<nigel> .. I don't want to suggest a mandatory thing that's wrong or won't be used.
<nigel> Atsushi: I wonder if I can ask a "side" way from colleagues in NHK.
<nigel> Pierre: Please ask informally! I'm interested as an Editor in knowing which part of Unicode
<nigel> .. this "SHALL" exactly means.
<nigel> .. Just to clarify the terminology that doesn't exactly match the spec.
<nigel> Atsushi: Is it okay to reply to the liaison email by myself?
<nigel> Nigel: Yes I think that would be good. I'd suggest if you can write informally in response
<nigel> .. that we noticed this small difference in language and want to make sure that we understand
<nigel> .. correctly and ask for guidance or even sample data then that would help clear this up for us.
<nigel> .. I don't want to go around a whole formal liaison/response loop which will take a long time.
<nigel> Pierre: [drafts the essential request in the GitHub issue]
<nigel> SUMMARY: @himorin to ask informally for clarification as per the above discussion.

himorin · 2025-10-31T09:59:23Z

(Still waiting for a reply from ARIB colleagues.)

css-meeting-bot · 2025-11-11T01:28:47Z

The Timed Text Working Group just discussed Improve the ja character set per ARIB feedback w3c/imsc#614, and agreed to the following:

SUMMARY: Hold this PR open pending feedback and hopefully an example, and do not hold up CRS publication

The full IRC log of that discussion

<nigel> Subtopic: Improve the ja character set per ARIB feedback #614
<nigel> github: https://github.com//pull/614
<wschildbach> pierre: we added a recommended charset based on ARIB input.
<wschildbach> .. unfortunately, there is some vagueness in the liaison. We should make sure we get it right.
<wschildbach> .. we asked for more details but got no clarification.
<wschildbach> .. don't want to remove the text but we need clarification. This is informative (a should, not a shall), it is useful but not necessary.
<wschildbach> nigel: I think that the ideographic selector is not defined where it says it is.
<wschildbach> .. translation issue?
<wschildbach> atsushi: this is not a stopping issue
<wschildbach> nigel: if your colleague comes later, let's ask them
<wschildbach> .. is there a choice of terminology and we need to use the correct one?
<wschildbach> pierre: this is a complex part with many things falling underneath it.
<wschildbach> .. what would be most useful would be an example of what is meant.
<wschildbach> .. I find it a complex part of the unicode spec.
<wschildbach> .. as atsushi pointed out, terms may have changed. Ideally have an example.
<wschildbach> .. here is sample text that uses IVS, and here is what we expect the rendering to be.
<wschildbach> .. and we could include it in the spec actually.
<wschildbach> nigel: this is unresolved right now. so we are saying we can proceed to CRS without resolving?
<wschildbach> atsushi: agrees.
<wschildbach> nigel: we merge later.
<wschildbach> atsushi: this is not normative, so don't need another CRS
<wschildbach> nigel: we can put change in and request transition to rec
<wschildbach> .. implementation report will be empty. It is a formality.
<nigel> SUMMARY: Hold this PR open pending feedback and hopefully an example, and do not hold up CRS publication
<nigel> forcedDisplay and visibility="hidden" #484
<nigel> s/forced/Subtopic: forced

palemieux · 2026-02-12T04:40:26Z

@himorin I have updated the ja character set with the list you provided:

  (Basic Japanese Collection)
  Collection 285 at [ISO10646]
  (Japanese Non Ideographic Extension)
  Collection 286 at [ISO10646]
  (JIS2004 Ideographics Extension)
  Collection 371 at [ISO10646]
  (Additional symbols and characters (Part 1))
  Table 5-2 and 5-3 at [ARIB-STD-B62]

imsc1/spec/ttml-ww-profiles.html

himorin · 2026-02-12T06:54:42Z

Could you add back the IVS part as a suggestion? IVS is not related to CJK Compatibility Ideographs, as shown in the slide I've shared, but is used for all (CJK) ideographic characters, as stated in the liaison reply to Nigel's email (I could not find it in the archive).

palemieux · 2026-02-12T15:59:34Z

Could you add back about IVS part as suggestion? IVS is not related to CJK Compatibility Ideographs as slide I've shared, but is used for all of (CJK) Ideographic characters as a liaison reply to Nigel's email (could not find in archive,,,).

The slides mention that ARIB STD-B62 "picks 19 glyphs from IVD", not that it supports any and all contents of the IVD. Can you provide the list of 19?

palemieux · 2026-02-12T16:21:30Z

Add Table 7-8: Operational Ideographic Variation Sequence from
https://www.arib.or.jp/english/html/overview/doc/8-TR-B39v2_5-2p5-E1.pdf

css-meeting-bot · 2026-02-12T16:30:43Z

The Timed Text Working Group just discussed Japanese character set, and agreed to the following:

SUMMARY: @palemieux to add the 19 IVS, @himorin to check details with contact

The full IRC log of that discussion

<cpn> Subtopic: Japanese character set
<nigel> github: https://github.com//pull/614
<nigel> s/set/set #614
<cpn> Nigel: We have some feedback from ARIB
<cpn> ... What's the status now?
<cpn> Pierre: Atsushi has asked for text to be added that requires conformance with IVS
<cpn> Atsushi: What ARIB uses is standard IVS and IVD specified in ISO ?? spec
<cpn> ... I asked to remove CJK Compatibility Ideographs, and add a note on using IVS for ideographic characters. This is background material for that
<cpn> Pierre: My concern is IVD is huge, with lots of unrelated stuff. Can we just include a list of the 19 glyphs?
<cpn> Atsushi: ARIB-STD-B62 refers to IVD ...
<cpn> Pierre: My objection from the beginning about referencing IVD is that it's unbounded, and we don't want people to have to support all of IVD just to support the Japanese character set
<cpn> ... Can we copy the list?
<cpn> Atsushi: I believe so. I'm not sure exactly where they are
<atsushi> https://www.arib.or.jp/kikaku/kikaku_hoso/tr-b39.html
<cpn> Atsushi: I found the English version
<cpn> ... Look at the second table
<nigel> -> English translation of Fascicle 2 https://www.arib.or.jp/english/html/overview/doc/8-TR-B39v2_5-2p5-E1.pdf
<cpn> Nigel: Found it, Table 7-8, page 3-63, Fascicle 2
<nigel> Table 7-8 in section 7.4.4 includes the 19 IVS characters
<cpn> Pierre: I'll add it to the PR
<cpn> Nigel: In Section 8.4.1, it says the ideographic variation sequence is not operated.
<cpn> Atsushi: This is not a standard, but an operational recommendation
<cpn> ... In the discussion in Japan, they list commonly used variation selector characters. We mention Table 5.2 and 3 in the IMSC document
<cpn> ... I believe this is just a set of characters that are actually used in current broadcast systems
<cpn> Pierre: Not sure how to interpret that sentence, Nigel. Atsushi, what does the original say?
<cpn> Atsushi: I don't have access, I only have the text provided by Ohmata-san
<cpn> Nigel: I think those sequences have proven useful, but not actually required
<cpn> Pierre: I recommend drafting the PR and ask ARIB for feedback
<cpn> Atsushi: The table in 7.4.4 are commonly used in broadcasting in Japan. It describes fallback operation, commonly used for IVS and IVD characters.
<cpn> ... Not all fonts support IVD glyphs, as IVD includes several sets of variation sequences
<cpn> ... But in any case, I'll ask Ohmata-san
<cpn> Nigel: I'll also look at the re-ordering PR
<nigel> SUMMARY: @palemieux to add the 19 IVS, @himorin to check details with contact
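
The distinction discussed above, a bounded list of sequences versus the whole IVD, can be sketched as an allow-list check in Python. The names `OPERATIONAL_IVS` and `unlisted_sequences` are hypothetical, and the single entry in the set is a placeholder, not a value copied from Table 7-8 of ARIB TR-B39; the real 19 entries would have to be transcribed from that document.

```python
# Placeholder allow-list (assumption): the real 19 entries are in Table 7-8
# of ARIB TR-B39 and are NOT reproduced here.
OPERATIONAL_IVS = {
    (0x8FBB, 0xE0100),  # illustrative pair only
}


def unlisted_sequences(text: str) -> list[tuple[int, int]]:
    """Return (base, selector) pairs whose selector is in U+E0100..U+E01EF
    but which are not present in the allow-list."""
    found = []
    for base, selector in zip(text, text[1:]):
        pair = (ord(base), ord(selector))
        if 0xE0100 <= pair[1] <= 0xE01EF and pair not in OPERATIONAL_IVS:
            found.append(pair)
    return found


# The first sequence is in the placeholder list, the second is not.
print(unlisted_sequences("\u8FBB\U000E0100\u8FBB\U000E0101"))
```

With a fixed set like this, a document or renderer can be checked against a small, known collection without needing the full IVD.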

palemieux · 2026-02-12T16:57:18Z

@himorin Please share the revised ja character set with ARIB folks for their feedback.

nigelmegitt · 2026-02-13T10:11:47Z

Sorry to leak the PR Preview fix attempt into here, but since it's the only open PR that contains content I thought it worthwhile. I cherry-picked @himorin 's fix in #637 and the build step seemed to pass, but the PR description wasn't updated to add the Preview and Diff links. However, manually navigating to https://pr-preview.s3.amazonaws.com/w3c/imsc/pull/614.html shows that the PR Preview did actually build.

The diff link I was expecting is https://pr-preview.s3.amazonaws.com/w3c/imsc/614/9ea529d...875ec4c.html but it doesn't look like that's been created - at least I get an Access Denied page from it.

nigelmegitt · 2026-02-13T10:39:14Z

Link to W3C HtmlDiff page for this PR: https://services.w3.org/htmldiff?doc1=https%3A%2F%2Fwww.w3.org%2FTR%2Fttml-imsc1.3%2F&doc2=https%3A%2F%2Fpr-preview.s3.amazonaws.com%2Fw3c%2Fimsc%2Fpull%2F614.html

imsc1/spec/ttml-ww-profiles.html

Co-authored-by: Nigel Megitt <nigel.megitt@bbc.co.uk>

css-meeting-bot · 2026-02-26T16:23:41Z

The Timed Text Working Group just discussed Improve the ja character set per ARIB feedback w3c/imsc#614, and agreed to the following:

SUMMARY: PR to be merged.

The full IRC log of that discussion

<nigel> Topic: Improve the ja character set per ARIB feedback #614
<nigel> github: https://github.com//pull/614
<nigel> Nigel: We had some good input that we processed last meeting. What's the status?
<nigel> Pierre: [shares screen showing preview of the pull request]
<nigel> .. Atsushi, what do you suggest?
<nigel> Atsushi: ARIB TR document is some sort of operational manual which records the current situation.
<nigel> .. It is not normative.
<nigel> .. It could be changed.
<nigel> Pierre: Remove the TR-B39 sequence?
<nigel> Atsushi: Maybe just note that these are operationally used but not normative.
<nigel> Pierre: suggests removing the explicit list and just referencing TR-B39
<nigel> Atsushi: the note also could apply to CJK ideographic characters
<nigel> Pierre: Make the reference to the ARIB TR a note?
<nigel> Atsushi: Yes something like that, or suggest that IVS is used for CJK and operationally used IVSes are
<nigel> .. used in ARIB.
<nigel> Pierre: What's the down side of referencing ARIB-TR-B39?
<nigel> Atsushi: It's a link from normative text to a non-normative document.
<nigel> Pierre: The entire annex is just a SHOULD not a SHALL.
<nigel> Atsushi: SHOULD is normative too.
<nigel> Pierre: It's useful though.
<nigel> Atsushi: Yes, useful but the normative definition in ARIB STD is that IVS may be used, but operationally
<nigel> .. characters listed in ARIB TR are the ones currently used.
<nigel> Pierre: Exactly.
<nigel> Atsushi: That's why I'm afraid that the TR definition may be changed, so I want to turn that part into a non-normative note.
<nigel> Pierre: Sure, [makes edit that the IVS is a Note.]
<nigel> Pierre: Nigel, are you happy with this?
<nigel> Nigel: Yes. Are there any other issues related to the ARIB ja character set that we should be covering off here?
<nigel> Pierre: [checks] I think so
<nigel> Atsushi: Yes
<nigel> Nigel: Great, let's go for it then.
<nigel> Pierre: [pushes the change]
<nigel> Nigel: The note about 10646 and Unicode - why is that in the ja section? Oh, because it's only referenced in the ja character set
<nigel> Pierre: That's right
<nigel> Atsushi: There are corrections that are only in the ISO spec.
<nigel> .. I approved the PR
<nigel> Nigel: I approved it too
<nigel> Pierre: I have to fix the merge conflicts then I'll merge the PR.
<nigel> SUMMARY: PR to be merged.

Improve the ja character set per ARIB feedback

0438e9c

nigelmegitt added the agenda label Sep 9, 2025

nigelmegitt mentioned this pull request Jun 3, 2025

TTWG Meeting 2025-09-11 w3c/ttwg#315

Closed

palemieux requested a review from himorin September 11, 2025 15:54

palemieux mentioned this pull request Sep 11, 2025

ARIB liaison 2025-09-05: Section B. Common Character Sets - Table 2 #613

Closed

nigelmegitt mentioned this pull request Sep 23, 2025

TTWG Meeting 2025-09-25 w3c/ttwg#316

Closed

Merge branch 'main' into issues/0613-arib-liaison-chars

296506a

himorin reviewed Sep 25, 2025

imsc1/spec/ttml-ww-profiles.html

himorin reviewed Sep 25, 2025

imsc1/spec/ttml-ww-profiles.html

himorin reviewed Sep 25, 2025

imsc1/spec/ttml-ww-profiles.html

himorin reviewed Sep 25, 2025

imsc1/spec/ttml-ww-profiles.html

palemieux and others added 2 commits September 25, 2025 08:05

Update imsc1/spec/ttml-ww-profiles.html

056fa20

Co-authored-by: himorin / Atsushi Shimono <atsushi@himor.in>

Update imsc1/spec/ttml-ww-profiles.html

a42652f

Co-authored-by: himorin / Atsushi Shimono <atsushi@himor.in>

palemieux requested a review from himorin September 25, 2025 15:06

palemieux and others added 2 commits September 25, 2025 08:14

Update imsc1/spec/ttml-ww-profiles.html

9cd4103

Co-authored-by: himorin / Atsushi Shimono <atsushi@himor.in>

Update imsc1/spec/ttml-ww-profiles.html

101919e

Co-authored-by: himorin / Atsushi Shimono <atsushi@himor.in>

Change reference back

2c94e3e

palemieux mentioned this pull request Sep 25, 2025

Make the reference to The Unicode Standard undated #618

Closed

nigelmegitt mentioned this pull request Oct 7, 2025

TTWG Meeting 2025-10-09 w3c/ttwg#317

Closed

palemieux removed the agenda label Nov 11, 2025

nigelmegitt mentioned this pull request Jan 27, 2026

TTWG Meeting 2026-01-29 w3c/ttwg#326

Closed

nigelmegitt added the agenda label Feb 10, 2026

nigelmegitt mentioned this pull request Feb 10, 2026

TTWG Meeting 2026-02-12 w3c/ttwg#327

Closed

palemieux added 2 commits February 11, 2026 20:37

Update per feedback from @himorin

db761c6

Merge remote-tracking branch 'origin/main' into issues/0613-arib-liai…

c9619a8

…son-chars

himorin reviewed Feb 12, 2026

imsc1/spec/ttml-ww-profiles.html

Added Operational Ideographic Variation Sequence at [[ARIB-TR-B39]]

0935142

himorin and others added 2 commits February 13, 2026 09:57

adding exact file name for xml-schemas referencing

e92a35f

remove trailing blank lines

9ea529d

nigelmegitt reviewed Feb 13, 2026

imsc1/spec/ttml-ww-profiles.html

imsc1/spec/ttml-ww-profiles.html

Address #614 (comment)

b5c3d1f

himorin mentioned this pull request Feb 16, 2026

PR preview seems to be broken #635

Closed

Apply suggestion from @nigelmegitt

c284c16

Co-authored-by: Nigel Megitt <nigel.megitt@bbc.co.uk>

nigelmegitt mentioned this pull request Feb 24, 2026

TTWG Meeting 2026-02-26 w3c/ttwg#328

Closed

Per 2026-02-26 TTWG meeting

c1d4062

palemieux requested review from himorin and nigelmegitt February 26, 2026 16:19

himorin approved these changes Feb 26, 2026

nigelmegitt approved these changes Feb 26, 2026

Merge remote-tracking branch 'origin/main' into issues/0613-arib-liai…

86773b9

…son-chars

palemieux merged commit 8b73c56 into main Feb 26, 2026
2 of 3 checks passed
