From 18a352f2ebb3c1bb7f6be02f9d8af700aab6ea80 Mon Sep 17 00:00:00 2001
From: r12a
This document describes the basic requirements for Indic script layout and text support on the Web and in Digital Publications. These requirements provide information for Web technologies such as CSS, HTML, and SVG about how to support users of Indic scripts. The current document focuses on Devanagari, but there are plans to widen the scope to encompass additional Indian scripts as time goes on.
This document describes the basic requirements for Indic script layout and text support on the Web and in eBooks. These requirements provide information for Web technologies such as CSS, HTML and SVG about how to support users of Indic scripts. The current document focuses on Devanagari, but there are plans to widen the scope to encompass additional Indian scripts as time goes on.
The editor's draft of this document is being developed by the Indic Layout Task Force, part of the W3C Internationalization Interest Group. It is published by the Internationalization Working Group. The end target for this document is a Working Group Note.
If you wish to make comments regarding this document, please raise them as github issues. Only send comments by email if you are unable to raise issues on github (see links below). All comments are welcome.
To make it easier to track comments, please raise separate issues or emails for each comment, and point to the section you are commenting on using a URL for the dated version of the document.
shows the canonical equivalence:
There are two syllables in this word: SA+VIRAMA+KA+UU and LA. Note, however, that there are three Unicode grapheme clusters here: SA+VIRAMA, KA+UU and LA. Styling is done on the basis of the whole orthographic syllable, not the first character, nor even the first grapheme.
Rule 5: Breaking should not be allowed at numerical values such as currency values, year etc. e.g. “100.00” or “10,000”, nor in “12:59”
Canonical & Compatible Equivalence
Unicode Code charts – Devanagari & Devanagari Extended
-
+
Indic orthographic syllable boundaries
@@ -813,7 +817,7 @@ Various example use cases of ABNF based Indic orthographic syllable definiti
Typographic units
Guiding principles of Line breaking for Indian languages
Alignment of Initial letter of Indic scripts with hanging baseline
The part between the hanging baseline and the ascent of the initial letter may follow the mechanism below, where n = h/2:
In Indic scripts that have a hanging baseline, the top alignment point is the hanging baseline and the bottom alignment point is the text-after-edge; the hanging baselines of the initial letter and of the first line of text should be aligned.
@@ -1067,40 +1071,49 @@
In a vertical arrangement of characters, writing each character on a new line may not be suitable for Indian languages. Vertical arrangements of characters are sometimes used in Indian texts. To form correct arrangements, it is preferable to follow a tailored grapheme cluster approach.
- Variations of vertical arrangement of the characters in Hindi is represent below :
- -✔️ | ❌ | ✔️ | ❌ | |
---|---|---|---|---|
व क्ता |
+ व क् ता |
+ + | श क्ति |
+ श क् ति |
+
स्वा | CHCv- Rule 2 |
ग | C - Rule 2 |
त | C - Rule 2 |
म् | CH - Rule 3 |
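The breakdown in the rows above (स्वा, ग, त, म्) can be reproduced with a small segmenter. What follows is only a minimal sketch in Python, not the ABNF definition from the document: the function names `syllables` and `pattern` are hypothetical, the consonant ranges are taken from the Devanagari block, and nukta, ZWJ/ZWNJ and independent vowels receive no special treatment.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA, written H in the patterns


def is_consonant(ch):
    # Devanagari consonants KA..HA plus the precomposed nukta forms QA..YYA.
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F"


def syllables(text):
    """Split text into approximate orthographic syllables."""
    units = []
    i, n = 0, len(text)
    while i < n:
        start = i
        if is_consonant(text[i]):
            i += 1
            # Extend over (virama + consonant) pairs: C H C H C ...
            while i + 1 < n and text[i] == VIRAMA and is_consonant(text[i + 1]):
                i += 2
        else:
            i += 1
        # Absorb trailing combining marks: vowel sign, final virama, anusvara...
        while i < n and unicodedata.category(text[i]) in ("Mn", "Mc"):
            i += 1
        units.append(text[start:i])
    return units


def pattern(unit):
    """Label a unit with C (consonant), H (virama), v (vowel sign), ? (other)."""
    labels = []
    for ch in unit:
        if is_consonant(ch):
            labels.append("C")
        elif ch == VIRAMA:
            labels.append("H")
        elif "\u093E" <= ch <= "\u094C":
            labels.append("v")
        else:
            labels.append("?")
    return "".join(labels)


print(syllables("स्वागतम्"))                        # ['स्वा', 'ग', 'त', 'म्']
print([pattern(u) for u in syllables("स्वागतम्")])  # ['CHCv', 'C', 'C', 'CH']
```

The same function keeps SA+VIRAMA+KA+UU together as one unit followed by LA, and splits वक्ता into व and क्ता, which is the grouping a vertical arrangement would stack.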
Collation is one of the most important features for Indic languages. It determines the order in which a given culture indexes its characters. This is best seen in dictionary sort order, where words are sorted and arranged in a specific order for easy searching. Within a given script, each allo-script may have a different sort order. Thus in Hindi the conjunct glyph क्ष is sorted along with क, since the first letter of that conjunct is क, and on a similar principle ज्ञ is sorted along with ज. The same is not the case with Marathi and Nepali, which admit a different sort order.
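The difference can be illustrated with two sort keys. This is a rough sketch in Python, not a real collation implementation: plain code point order happens to group क्ष under क and ज्ञ under ज, as described for Hindi, while the hypothetical `tailored_key` treats each conjunct as a single unit. Placing those units after ह is an assumption made purely for illustration; actual Marathi or Nepali ordering should be taken from locale collation data.

```python
KSSA = "\u0915\u094D\u0937"  # क्ष = KA + VIRAMA + SSA
JNYA = "\u091C\u094D\u091E"  # ज्ञ = JA + VIRAMA + NYA
HA = "\u0939"                # ह

WORDS = ["खग", "क्षमा", "कमल", "ज्ञान", "जल", "हल"]


def tailored_key(word):
    """Sort key that treats क्ष and ज्ञ as single letters.

    Assumption for illustration only: both conjuncts are weighted
    just after ह. Code points are scaled by 4 to leave room for
    the two extra weights.
    """
    weights = []
    i = 0
    while i < len(word):
        if word.startswith(KSSA, i):
            weights.append(ord(HA) * 4 + 1)
            i += 3
        elif word.startswith(JNYA, i):
            weights.append(ord(HA) * 4 + 2)
            i += 3
        else:
            weights.append(ord(word[i]) * 4)
            i += 1
    return weights


# Code point order: the conjuncts sort with their first consonant.
print(sorted(WORDS))                    # ['कमल', 'क्षमा', 'खग', 'जल', 'ज्ञान', 'हल']
# Tailored order: the conjuncts sort as separate letters after ह.
print(sorted(WORDS, key=tailored_key))  # ['कमल', 'खग', 'जल', 'हल', 'क्षमा', 'ज्ञान']
```

A production system would normally rely on a collation library with locale tailorings rather than a hand-written key such as this one.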
Different scripts admit different sort orders and for all high end NLP applications. Sorting is @@ -1306,7 +1319,7 @@