159 changes: 97 additions & 62 deletions spell-check-custom-dictionary/README.md
@@ -1,112 +1,147 @@
# Explainer: Spell Check Custom Dictionary API
# Explainer: Spell Check Dictionary API

## Authors
- [Ziran Sun](mailto:zsun@igalia.com)
- [Brian Kardell](mailto:bkardell@igalia.com)
- Jihye Hong

* Ziran Sun \<zsun@igalia.com\>
* Jihye Hong
---

## Table of Contents
- [Introduction](#introduction)
- [User-Facing Problem](#user-problem)
- [Proposed Approach](#proposed-approach)
- [Accessibility, Internationalization, Privacy, and Security Considerations](#security)
- [Stakeholder Feedback / Opposition](#stakeholder)
- [References & acknowledgements](#reference)
## Introduction

## <a name="introduction"></a> Introduction
Browsers already provide spell checking, correction, and completion by comparing text against built‑in dictionaries (local or server‑side). This works well for general language, but breaks down on pages that rely heavily on domain‑specific terminology—product names, proper nouns, fictional universes, technical jargon, and other vocabulary that is valid *in context* but absent from standard dictionaries.

This explainer proposes a new API, Spell Check Custom Dictionary API, to allow pages to selectively suppress spell check violations, offer better spelling correction suggestions, and potentially enable user agents to explore additional future niceties.
This explainer proposes a lightweight mechanism for pages to supply such context‑specific terminology to the user agent. By giving the browser a list of known‑valid words, authors can reduce false positives, improve suggestion quality, and open the door to future enhancements that rely on domain‑aware text processing.

## <a name="user-problem"></a> User-Facing Problems
---

Browsers offer features for spell checking, correction, and completion. They work by comparing text against words in one or more installed dictionaries (stored locally or on a server). Misspelled words are then marked as spelling errors, and dictionary words may be offered as corrections, suggestions, or completions.
## User‑Facing Problems

However, there are specific cases where words are not in those existing dictionaries but still valid in the context of the page. For example,
Spell checkers routinely flag words that are correct within a site’s domain but unknown to general dictionaries. Examples include:

- A website dedicated to Pokémon might feature the names of various Pokémon characters, such as Pikachu and Charmander.
- A website that analyzes economic markets might include company and product names that do not appear in standard dictionaries but are valid in context.
- A Pokémon wiki containing names like *Pikachu* or *Charmander*.
- A financial analysis dashboard referencing company‑specific product names or tickers.
- A medical or scientific tool using specialized terminology.

When such valid words are marked as spelling errors, the result can be misleading, frustrating, and distracting.
False positives in these contexts are distracting and misleading, and they erode user trust. Browsers let *users* add custom words globally, but there is currently no way for *pages* to provide a per‑document dictionary that applies only within their own context and reduces those false positives.

It would be useful for websites to have options to treat those "site specific" words as if they were in the known dictionaries.
Authors need a way to treat domain‑specific words as “known” without requiring user intervention.

---

## <a name="proposed-approach"></a> Proposed Approach
## Proposed Approach: SpellCheckDictionary API

### Spell Check Custom Dictionary API
We propose introducing a per‑document, transient dictionary exposed via a new interface:

We are introducing a new interface: the SpellCheckCustomDictionary API. As the name suggests, it is a dictionary created and customized by websites and developers for spell-checking tasks. The proposed API allows a web page to add words to, and remove words from, the custom dictionary; these are typically words absent from the dictionaries the browser already has. During spell checking, the spell checker also checks words against this custom dictionary. This mechanism lets pages selectively suppress spell check violations on their own page.
### `SpellCheckDictionary`

Some browsers allow users to add and remove words via the browser's settings panel; that list is normally managed by the browser process. The SpellCheckCustomDictionary API is a different concept: it fills the gap by letting pages programmatically modify a dictionary on a per-document basis, and it is managed by the renderer process.
This interface provides a single observable array:

### Design Considerations on Interop
```js
SpellCheckDictionary.words = [
  "Igalia",
  "Wolvic",
  "Ziran",
  "SpellCheckDictionary"
];
```

Handling domain-specific terminology, phrases, and words is probably not a use case for spell checking alone. For example, Web Speech has introduced the [Contextual biasing API](https://github.com/WebAudio/web-speech-api/blob/main/explainers/contextual-biasing.md) to handle domain-specific terminology, proper nouns, and other words that are unlikely to appear in general conversation. Custom dictionaries may have other uses or requirements elsewhere, now or in the future.
Key characteristics:

Words/phrases in a custom dictionary for one component (e.g. spellchecker) in a specific domain could apply for another component (e.g. web speech). For words/phrases that are duplicate among the dictionaries of these components, it would make sense for developers to only update the word list once rather than looping through all the dictionaries in the components. For this reason, we are introducing generic interfaces for phrase and dictionary. A component has the option to bind to the dictionary for word/phrase updates.
- **Observable array**
The browser’s spell checker observes changes to `.words` and incorporates them into its checks.

### New API Components
- **Per‑document lifecycle**
The dictionary exists only for the lifetime of the document. Closing the tab or navigating away discards it.

#### `CustomPhrase` Interface
- **Renderer‑process managed**
Unlike user‑managed dictionaries, which live in browser settings, are global, and are typically managed by the browser process, this dictionary is scoped to the page, controlled programmatically, and managed by the renderer process.

Represents a single phrase and an optional dictionary that contains the extra parameters associated with the phrase. The optional Dictionary data type attribute is introduced here to accommodate use cases that may require further specific parameters. For example, an extra parameter `boost` for web speech.
- **Simple, efficient design**
A static interface with a single observable array allows:
  - fast bulk assignment,
  - efficient parsing of serialized lists,
  - straightforward garbage collection,
  - minimal API surface.
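
To make the observable-array behavior more concrete, the following sketch emulates it in plain JavaScript with a `Proxy` and a toy spell checker. Everything here is hypothetical and for illustration only: `createObservableWords`, `isFlagged`, and the tiny `builtInWords` set are assumptions, and a real user agent would observe the array natively and not with page script.

```javascript
// Hypothetical emulation of the proposed behavior; none of these helpers
// are part of the proposal or of any browser API.
const builtInWords = new Set(["the", "browser", "checks", "spelling"]);

// Wrap an array so that every mutation notifies `onChange`.
function createObservableWords(onChange) {
  return new Proxy([], {
    set(target, property, value) {
      target[property] = value;
      onChange(target);
      return true;
    },
    deleteProperty(target, property) {
      delete target[property];
      onChange(target);
      return true;
    },
  });
}

// Toy spell checker: a word is accepted if either dictionary knows it.
let pageWords = new Set();
const words = createObservableWords((current) => {
  // filter() skips the temporary hole that pop() leaves before length shrinks.
  pageWords = new Set(current.filter((w) => typeof w === "string"));
});

function isFlagged(word) {
  return !builtInWords.has(word) && !pageWords.has(word);
}

const before = isFlagged("Wolvic"); // true: unknown to the built-in list
words.push("Igalia", "Wolvic");
const after = isFlagged("Wolvic"); // false: now supplied by the page
words.pop();
const afterPop = isFlagged("Wolvic"); // true again: the entry was removed
```

The point of the sketch is only that ordinary array operations (`push`, `pop`, index assignment) are enough to keep the checker's view current; no separate add or remove methods are needed.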

```js
new CustomPhrase('Igalia', {boost: 2.0})
```
Note that "words" is loosely defined and may include spaces or special characters.
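
Because entries may contain spaces, a serialized list is probably easiest to split on line breaks and not on whitespace. The sketch below shows one way an author might prepare such a list before a bulk assignment. `parseWordList` is a hypothetical helper, and the final assignment appears only as a comment because `SpellCheckDictionary` is a proposal and is not assumed to exist in the environment running the code.

```javascript
// Hypothetical helper: turn a newline-separated list into unique, trimmed entries.
function parseWordList(serialized) {
  const entries = serialized
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return [...new Set(entries)]; // drop duplicates, keep first-seen order
}

const parsed = parseWordList("Igalia\nWolvic\n\nSpell Check Dictionary\nIgalia\n");
// parsed is ["Igalia", "Wolvic", "Spell Check Dictionary"]

// In a user agent that implements the proposal, the bulk assignment would be:
// SpellCheckDictionary.words = parsed;
```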

#### CustomDictionary Interface
A detailed description of the Chromium design is available in [The Per‑Document Design in Chromium](https://docs.google.com/document/d/1ND1a1Z4i6kXMHqMwEyRkHSj5VVTWgX5Ya0aNLgVQYGw/edit?tab=t.0#heading=h.kmfizh6cwyy4).

##### CustomDictionary.words
---

A `CustomPhrase` array. Since the words/phrases in the dictionary are mutable, we are adopting the concept of ObservableArray. This observable array can be modified like a JavaScript Array.
## Alternatives Considered

```json=
const customDict = new CustomDictionary();
const phraseData = [
{ phrase: 'Igalia' },
{ phrase: 'Wolvic' }
];
### 1. A Unified `CustomDictionary` Across Features

Domain‑specific vocabulary is not unique to spell checking. Web Speech, for example, includes a [Contextual biasing API](https://github.com/WebAudio/web-speech-api/blob/main/explainers/contextual-biasing.md) for transcription of rare or domain‑specific terms. Text‑to‑Speech may eventually need similar mechanisms for pronunciation.

We explored whether a unified `CustomDictionary` or shared class hierarchy could serve multiple features. However:

- The shared abstraction becomes little more than a marker interface.
- Chromium already ships the biasing feature unprefixed, and Firefox is close behind, limiting room for redesign.
- Browsers can already treat Web Speech terms as valid for spell checking (or vice versa) *without* additional API surface.

const phraseObjects = phraseData.map(p => new CustomPhrase(p.phrase));
customDict.words = phraseObjects;
Given these constraints, a unified abstraction adds complexity without clear benefit.

customDict.words.push(new CustomPhrase('Orca'));
customDict.words.pop();
### 2. Manual Synchronization Between Features

If authors *do* want to share vocabulary between APIs, they can already do so with little effort. Because Web Speech phrase objects carry more information than plain words, deriving the spell check list from them can be as simple as:

```js
SpellCheckDictionary.words =
recognition.phrases.map(it => it.phrase);
```

#### Component binding
This is not especially efficient, because it walks the phrase list a second time. Where that matters, it is not _much_ harder to build both arrays in a single loop:

An interface between a component and a `CustomDictionary`.
```js
// Populate both dictionaries in one pass
const phraseObjects = [];
const dictionaryWords = [];

```json=
const customDict = new CustomDictionary();
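// Assumption: `wordData` is the site's own list of { phrase, boost } entries.
wordData.forEach(item => {
  dictionaryWords.push(item.phrase);
  phraseObjects.push(new SpeechRecognitionPhrase(item.phrase, item.boost));
});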

const spellChecker = new SpellCheckBinding();
spellChecker.bind(customDict);
// Apply on assignment
SpellCheckDictionary.words = dictionaryWords;

const recognition = new SpeechRecognition();
recognition.bind(customDict);
recognition.phrases = phraseObjects;
```


customDict.words.push(new CustomPhrase('Interop', {boost: 2.0}));
If this pattern is still taxing or inefficient, we can always consider adding a convenience method later.
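
Until such a convenience method exists, the shared loop can live in a small author-side helper. The sketch below is an illustration under stated assumptions: `splitVocabulary`, the `{ phrase, boost }` entry shape, and the default boost of `1.0` are all hypothetical. It returns plain data so that it runs anywhere; in a supporting browser, the phrase entries would still need to be wrapped in whatever phrase objects the Web Speech API expects before assignment.

```javascript
// Hypothetical author-side helper; not part of either API.
function splitVocabulary(entries) {
  const dictionaryWords = [];
  const phraseData = [];
  for (const { phrase, boost = 1.0 } of entries) {
    dictionaryWords.push(phrase);       // spell checking only needs the text
    phraseData.push({ phrase, boost }); // speech biasing also carries a weight
  }
  return { dictionaryWords, phraseData };
}

const { dictionaryWords: siteWords, phraseData: sitePhrases } = splitVocabulary([
  { phrase: "Igalia", boost: 2.0 },
  { phrase: "Wolvic" },
]);
// siteWords is ["Igalia", "Wolvic"]
// sitePhrases is [{ phrase: "Igalia", boost: 2 }, { phrase: "Wolvic", boost: 1 }]
```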

```
For now, keeping the API minimal avoids premature abstraction.

---

## Accessibility, Internationalization, Privacy & Security

### Per-document based dictionary
- **Transient data**
The custom dictionary is discarded when the document or tab closes.

The spell check custom dictionary is transient and lives no longer than the associated document. It is implemented on a per-document basis. As an example, the design for Chromium is described in [The Per-Document Design in Chromium](https://docs.google.com/document/d/1ND1a1Z4i6kXMHqMwEyRkHSj5VVTWgX5Ya0aNLgVQYGw/edit?tab=t.0#heading=h.5z9kcz3slooe).
- **No dictionary probing**
Browsers already prevent pages from detecting the contents of built‑in dictionaries via style or DOM observation. This API does not introduce new probing vectors.

## <a name="security"></a> Accessibility, Internationalization, Privacy, and Security Considerations
- **No new network exposure**
The API does not require network access and does not introduce new privacy risks.

The *Spell Check Custom Dictionary* data is transient and will be released once a document or tab is closed in Chromium.
We do not foresee accessibility or internationalization issues beyond those already inherent in spell checking.

Browsers already use designs that prevent observation or detection of words in the local dictionaries through style or DOM observations.
---

We do not foresee this API introducing any network-related violations.
## Stakeholder Feedback / Opposition

*(To be filled as feedback is collected.)*

## <a name="stakeholder"></a> Stakeholder Feedback / Opposition
## <a name="reference"></a> References & Acknowledgements
---

## References & Acknowledgements

- Chromium design document: [The Per‑Document Design in Chromium](https://docs.google.com/document/d/1ND1a1Z4i6kXMHqMwEyRkHSj5VVTWgX5Ya0aNLgVQYGw/edit?tab=t.0#heading=h.kmfizh6cwyy4)
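- Web Speech API explainer: [Contextual biasing API](https://github.com/WebAudio/web-speech-api/blob/main/explainers/contextual-biasing.md)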
- Thanks to reviewers and collaborators across browser vendors and standards groups.