WIP: Use Finite State Transducers (FST) as the backing store for language models #458

adamreichold · 2025-03-24T08:45:46Z

FSTs have two nice properties: they compress the ngram data by exploiting common prefixes and suffixes, and they can be embedded into the binary in a form that is directly suitable for look-up. This avoids the separate decompression step and indirectly uses the memory mappings that the operating system supplies for all binaries.
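To make the prefix-sharing part concrete, here is a minimal sketch using only the standard library. It is a plain trie, not an FST: it shares common prefixes only, whereas an FST also shares common suffixes and serializes to a compact byte layout. The `PrefixTrie` type, its methods, and the `u64` values are hypothetical and are not taken from this PR.

```rust
use std::collections::BTreeMap;

/// One trie node: outgoing byte transitions plus an optional value.
#[derive(Default)]
struct Node {
    children: BTreeMap<u8, usize>,
    value: Option<u64>,
}

/// Hypothetical prefix-sharing store for ngram -> value pairs.
/// Illustration only: a real FST additionally merges shared suffixes.
struct PrefixTrie {
    nodes: Vec<Node>,
}

impl PrefixTrie {
    /// Creates a trie that contains only the root node.
    fn new() -> Self {
        PrefixTrie { nodes: vec![Node::default()] }
    }

    /// Inserts `key`, reusing the nodes of any prefix that is already stored.
    fn insert(&mut self, key: &str, value: u64) {
        let mut current = 0;
        for &byte in key.as_bytes() {
            let existing = self.nodes[current].children.get(&byte).copied();
            current = match existing {
                Some(index) => index,
                None => {
                    // No transition for this byte yet: allocate a new node.
                    let index = self.nodes.len();
                    self.nodes.push(Node::default());
                    self.nodes[current].children.insert(byte, index);
                    index
                }
            };
        }
        self.nodes[current].value = Some(value);
    }

    /// Looks up `key` by walking one transition per byte.
    fn get(&self, key: &str) -> Option<u64> {
        let mut current = 0;
        for byte in key.as_bytes() {
            current = *self.nodes[current].children.get(byte)?;
        }
        self.nodes[current].value
    }

    /// Number of nodes, including the root.
    fn node_count(&self) -> usize {
        self.nodes.len()
    }
}
```

Inserting "the", "then", "them" and "thin" stores 15 key bytes in 8 nodes (root included), because the shared "th" and "the" prefixes are stored once. A look-up walks the structure directly, with no decompression step in between.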

This is still WIP since I do not know how to regenerate the language models, and it also seems like the unique ngram models are built elsewhere.

TODO:

* Integrate unique ngram models
* Regenerate all language models
* Drop non-unified models

Closes #121

adamreichold force-pushed the fst-storage branch 3 times, most recently from 0a648a3 to eab2273 · March 24, 2025 09:22

adamreichold force-pushed the fst-storage branch from eab2273 to f16ad0c · March 24, 2025 12:47
