This repository packages the latest hama grapheme-to-phoneme (G2P) model for pure inference scenarios. It ships:

- a Python package built with uv, powered by ONNX Runtime
- a Bun/TypeScript package that runs under Node.js/Bun and the browser
- shared tokenizer + Hangul jamo helpers
- reproducible tests for both runtimes

The training stack continues to live in hama-training; this repo focuses purely on runtime ergonomics.
`assets/` contains the frozen `g2p_fp16.onnx` graph plus the decoder/encoder vocab. Each subpackage embeds a copy so it works out of the box.
Requirements: `uv>=0.3`, Python 3.9+.

```shell
cd python
uv sync --extra test
uv run pytest
```

Quick demo script (`python/example.py`):
```python
from hama import G2PModel


def main() -> None:
    model = G2PModel()
    result = model.predict("안녕하세요")
    print("IPA:", result.ipa)
    print("Alignments:", result.alignments)


if __name__ == "__main__":
    main()
```

Run it with:

```shell
uv run python python/example.py
```

The public API lives in `hama.__init__`:
- `split_text_to_jamo` / `join_jamo_tokens`: reversible Hangul disassembly
- `G2PModel.predict(text)`: returns an IPA string plus `phoneme -> char_index` alignments derived from attention weights
Pass `model_path` / `vocab_path` to `G2PModel` to point at custom checkpoints, and call `predict` repeatedly (the ONNX session is cached).
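As a sketch of how the alignments can be consumed, the snippet below maps each output phoneme back to its source character. The phoneme and alignment values here are illustrative stand-ins, not real model output:

```python
# Hypothetical data shaped like G2PModel.predict's result:
# alignments[i] holds the input character index for phonemes[i].
text = "안녕"
phonemes = ["a", "n", "n", "j", "ʌ", "ŋ"]  # illustrative IPA tokens
alignments = [0, 0, 1, 1, 1, 1]            # illustrative indices

# Group phonemes by the grapheme they were aligned to.
by_char: dict[str, list[str]] = {}
for phoneme, char_index in zip(phonemes, alignments):
    by_char.setdefault(text[char_index], []).append(phoneme)

for char, phs in by_char.items():
    print(char, "->", " ".join(phs))
```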
Requirements: `bun>=1.1`.

```shell
cd ts
bun install
bun run build
bun test
```

```shell
# Install the published package (instead of the local dist/)
bun add hama-js
# or
npm install hama-js
```

Node/Bun demo (`ts/example.js`):
```javascript
import { G2PNodeModel } from "./dist/node/index.js";

const run = async () => {
  const model = await G2PNodeModel.create();
  const result = await model.predict("안녕하세요");
  console.log("IPA:", result.ipa);
  console.log("Alignments:", result.alignments);
};

run().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

Execute it after building:

```shell
node ts/example.js
```

Using the published package instead of the local dist:

```javascript
import { G2PNodeModel } from "hama-js/g2p";
```

API overview:
- `G2PNodeModel.create({ modelPath?, maxInputLen?, maxOutputLen? })`
- `model.predict(text)`: returns `{ ipa, alignments }`
- Browser bundle: `import { G2PBrowserModel } from "hama-js/g2p/browser";` (loads `onnxruntime-web` and fetches the embedded ONNX file)
The package already copies `assets/g2p_fp16.onnx` + `g2p_vocab.json` into the `dist` folder, so Node/Bun resolves them via `import.meta.url`. For browser deployments, ensure the assets are hosted next to the bundle (the default URL resolves relative to the built module).
- Both runtimes use identical Hangul jamo logic so character indices map back to the original graphemes, even after jamo expansion.
- Input length defaults to 128 time steps to accommodate Korean + mixed tokens.
- Output alignment is derived from attention argmax, mirroring the training scripts.
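For reference, reversible jamo disassembly can be sketched with standard Unicode Hangul syllable arithmetic. This is a minimal illustration, not the exact `split_text_to_jamo` / `join_jamo_tokens` helpers shipped in the packages:

```python
import unicodedata

# Unicode Hangul syllable block constants (see the Unicode Standard).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28

def split_to_jamo(text: str) -> list[str]:
    """Decompose precomposed Hangul syllables into conjoining jamo."""
    out = []
    for ch in text:
        offset = ord(ch) - S_BASE
        if 0 <= offset < L_COUNT * V_COUNT * T_COUNT:
            out.append(chr(L_BASE + offset // (V_COUNT * T_COUNT)))
            out.append(chr(V_BASE + (offset % (V_COUNT * T_COUNT)) // T_COUNT))
            if offset % T_COUNT:  # only emit a tail consonant if present
                out.append(chr(T_BASE + offset % T_COUNT))
        else:
            out.append(ch)  # non-Hangul characters pass through unchanged
    return out

def join_jamo(tokens: list[str]) -> str:
    """NFC recomposes conjoining jamo back into precomposed syllables."""
    return unicodedata.normalize("NFC", "".join(tokens))

print(split_to_jamo("안녕"))             # six conjoining jamo
print(join_jamo(split_to_jamo("안녕")))  # round-trips to "안녕"
```

Because decomposition expands one syllable into two or three jamo, keeping a per-jamo record of the originating character index is what lets alignments map back to graphemes.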
```
assets/           # Shared ONNX + vocabulary
python/src/hama/  # Python runtime
python/tests/     # pytest suite
ts/src/           # TypeScript runtime (Node + browser)
ts/tests/         # bun test suite
```
- Publish `python/` via `uv publish` / PyPI, and `ts/` as `hama-js`.
- Integrate CI to run both `uv run pytest` and `bun test`.
- Wire up docs/examples + simple CLI wrappers if needed.