fix: preserve sentence punctuation in chunk_text, align clean_text defaults, fix version by voidborne-d · Pull Request #131 · KittenML/KittenTTS

voidborne-d · 2026-04-21T20:04:58Z

Summary

Three interrelated fixes for text processing and API consistency.

1. `chunk_text` destroys sentence-ending punctuation (relates to #67, #72)

Root cause: re.split(r'[.!?]+', text) splits on sentence-ending punctuation and removes it entirely. The resulting chunks lose all periods, question marks, and exclamation marks. ensure_punctuation() then adds a comma to each chunk, so the TTS model sees commas where periods/question marks should be — degrading prosody, intonation, and causing the model to treat declarative, interrogative, and exclamatory sentences identically.
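The failure mode can be reproduced in a few lines. This is a minimal sketch, not the project's code: `old_split` mirrors only the pre-fix regex, and `hypothetical_ensure_punctuation` is an assumed stand-in for `ensure_punctuation()`. Its exact rules are not shown in this PR, so it simply appends a comma when a chunk lacks terminal punctuation, as described above.

```python
import re


def old_split(text: str) -> list[str]:
    # Pre-fix pattern: the sentence-ending punctuation IS the delimiter,
    # so re.split consumes it and it never reaches the model.
    return [s.strip() for s in re.split(r'[.!?]+', text) if s.strip()]


def hypothetical_ensure_punctuation(chunk: str) -> str:
    # Assumed behaviour (hypothetical helper): if a chunk has no terminal
    # punctuation, append a comma.
    chunk = chunk.strip()
    if chunk and chunk[-1] not in ".!?,;:":
        chunk += ","
    return chunk


text = "It works. Does it? Yes!"
print([hypothetical_ensure_punctuation(c) for c in old_split(text)])
# ['It works,', 'Does it,', 'Yes,']
```

A statement, a question and an exclamation all reach the model ending in the same comma.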

This contributes to:

BUG:[ONNXRuntimeError] : 2 : INVALID_ARGUMENT #67: degraded output quality for longer texts, because punctuation-dependent prosody is lost across all chunks
"Synthesized speech tends to swallow the final sounds" (合成语音末尾容易吞音) #72: end-of-sentence audio issues, because the model receives a comma instead of a period or question mark

Fix: Replace the destructive re.split(r'[.!?]+', text) with re.split(r'(?<=[.!?])\s+', text) — a lookbehind pattern that splits on whitespace after sentence-ending punctuation, keeping the punctuation attached to the preceding sentence.
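The lookbehind split can be checked on its own. The sketch below wraps the new pattern in a hypothetical `split_sentences` helper. It is not the full `chunk_text`, whose grouping and length handling are not shown here.

```python
import re


def split_sentences(text: str) -> list[str]:
    # (?<=[.!?]) is a zero-width lookbehind: it requires the preceding
    # character to be . ! or ? but does not consume it. Only the
    # whitespace (\s+) is removed, so punctuation stays on the sentence.
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]


print(split_sentences("It works. Does it? Yes!"))
# ['It works.', 'Does it?', 'Yes!']
```

Because a split needs whitespace after the punctuation, decimals such as "3.14" stay intact. The lookbehind still splits after abbreviations like "Dr.", however. This PR does not say whether `chunk_text` guards against that case.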

2. `clean_text` default mismatch between public API and internal model

Users calling the public API (KittenTTS) never get text preprocessing (number expansion, abbreviation handling, etc.) unless they explicitly pass clean_text=True, because KittenTTS.generate() defaults to clean_text=False while the internal KittenTTS_1_Onnx.generate() defaults to True. Additionally, generate_to_file() didn't expose clean_text at all.

Fix: Align all KittenTTS wrapper defaults to True, add clean_text parameter to generate_to_file(), remove stale debug print() from generate().
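The wrapper's code is not reproduced in this description, so here is a minimal sketch of the aligned defaults under stated assumptions. `HypotheticalKittenTTS` and `_HypotheticalInnerModel` are illustrative stand-ins, not the library's classes. The `voice`/`speed` parameters and the 24 kHz rate are also assumed, and audio is written with the stdlib `wave` module rather than whatever KittenTTS uses internally.

```python
import inspect
import os
import struct
import tempfile
import wave


class _HypotheticalInnerModel:
    """Stand-in for KittenTTS_1_Onnx: records the flag, returns silence."""

    def generate(self, text, voice="default", speed=1.0, clean_text=True):
        self.last_clean_text = clean_text
        return [0.0] * 24000  # one second of placeholder audio (assumed 24 kHz)


class HypotheticalKittenTTS:
    """Sketch of the public wrapper after the fix (names are illustrative)."""

    def __init__(self):
        self.model = _HypotheticalInnerModel()

    def generate(self, text, voice="default", speed=1.0, clean_text=True):
        # Default now matches the inner model, so preprocessing runs unless
        # the caller opts out. No debug print() here.
        return self.model.generate(text, voice=voice, speed=speed,
                                   clean_text=clean_text)

    def generate_to_file(self, text, output_path, voice="default", speed=1.0,
                         sample_rate=24000, clean_text=True):
        # clean_text is exposed here and forwarded instead of being dropped.
        audio = self.generate(text, voice=voice, speed=speed,
                              clean_text=clean_text)
        with wave.open(output_path, "wb") as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)  # 16-bit PCM
            wf.setframerate(sample_rate)
            # Clamp floats to [-1, 1] and pack as little-endian int16.
            wf.writeframes(b"".join(
                struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
                for s in audio))


# Usage: check the defaults by introspection, then write a file.
for method in (HypotheticalKittenTTS.generate,
               HypotheticalKittenTTS.generate_to_file):
    assert inspect.signature(method).parameters["clean_text"].default is True

tts = HypotheticalKittenTTS()
out_path = os.path.join(tempfile.mkdtemp(), "out.wav")
tts.generate_to_file("Dr. Smith paid $5.", out_path)  # clean_text=True by default
```

Forwarding `clean_text` explicitly, instead of relying on each layer's own default, keeps the two layers consistent even if the inner default changes again.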

3. Version mismatch (#80)

__version__ in __init__.py says 0.1.0, while pyproject.toml and setup.py both say 0.8.1, causing wheel installation failures.

Fix: Update __version__ in __init__.py to 0.8.1.
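A check like the one `TestVersionConsistency` presumably performs can be sketched without touching the filesystem. `extract_versions` and `versions_consistent` are hypothetical helpers that take file contents as strings. The regexes assume simple one-line quoted assignments, and on Python 3.11+ `tomllib` would parse pyproject.toml more robustly.

```python
import re

# Assumed formats: one-line, single- or double-quoted version assignments.
# Note: the pyproject pattern would also match a `version` key in another
# table; a real TOML parser avoids that.
_PATTERNS = {
    "__init__.py": r'^__version__\s*=\s*["\']([^"\']+)["\']',
    "pyproject.toml": r'^version\s*=\s*["\']([^"\']+)["\']',
    "setup.py": r'version\s*=\s*["\']([^"\']+)["\']',
}


def extract_versions(sources: dict[str, str]) -> dict[str, str]:
    """Map each file name to the version string found in its contents."""
    found = {}
    for name, text in sources.items():
        match = re.search(_PATTERNS[name], text, flags=re.MULTILINE)
        if match is None:
            raise ValueError(f"no version string found in {name}")
        found[name] = match.group(1)
    return found


def versions_consistent(sources: dict[str, str]) -> bool:
    """True when every file declares the same version."""
    return len(set(extract_versions(sources).values())) == 1
```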

Tests

26 new regression tests (tests/test_chunk_text_and_api.py), all lightweight (no model/ONNX/GPU/espeak needed):

TestChunkTextPunctuationPreservation (5)
TestChunkTextLongSentences (2)
TestChunkTextEdgeCases (6)
TestEnsurePunctuation (6)
TestCleanTextDefaults (3)
TestVersionConsistency (2)
TestSourceAudit (2)

26 passed, 0 failed

…faults, fix version

Three interrelated fixes:

1. chunk_text destroys sentence-ending punctuation (affects KittenML#67, KittenML#72)
   - re.split(r'[.!?]+', text) strips all periods, question marks, and exclamation marks from the text before feeding it to the TTS model
   - The model needs this punctuation for correct prosody and intonation
   - Fix: use lookbehind split r'(?<=[.!?])\s+' to keep punctuation attached to the preceding sentence
2. clean_text default mismatch between public API and internal model
   - KittenTTS wrapper: clean_text=False (skips text preprocessing)
   - KittenTTS_1_Onnx: clean_text=True (runs text preprocessing)
   - Users calling the public API never get text preprocessing (number expansion, abbreviations, etc.) unless they explicitly pass clean_text=True
   - generate_to_file didn't expose clean_text at all
   - Fix: align wrapper defaults to True, add clean_text param to generate_to_file, remove debug print() from generate()
3. Version mismatch (KittenML#80)
   - __init__.py: 0.1.0
   - pyproject.toml: 0.8.1
   - setup.py: 0.8.1
   - Fix: update __init__.py to 0.8.1

Tests: 26 new regression tests covering punctuation preservation, edge cases, API defaults, version consistency, and source audit.

voidborne-d wants to merge 1 commit into KittenML:main from voidborne-d:fix/chunk-text-punctuation-and-api-defaults

voidborne-d commented Apr 21, 2026
