Suggestion: Preserve commas in Kokoro's preprocessing. #114
Description
I've been playing around more with Kokoro ever since I integrated it into KoboldCpp, and I've noticed that you replace commas (,) with dashes (--) in order to create a pause.
TTS.cpp/src/models/kokoro/model.cpp
Line 1416 in 724d97f
    normalized = replace_any(prompt, ",;:", "--");
Although the two might seem similar on the surface, the comma carries useful information that affects the characteristics of the generated audio.
Consider this sample phrase:
stealing from unsuspecting travelers, picking pockets, and conning the locals
There should be two pauses in a proper narration. This is from the hexgrad space:
good.mp4
Now compare this with the TTS.cpp output. Since it strips all commas and effectively replaces them with spaces, it generates this:
bad.mp4
Adding commas back solves the issue. I'm not sure if I missed something, so feel free to correct me. I do not use espeak, so I don't know whether there are negative repercussions there, but without espeak this results in an overall better narration for me personally.
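To make the suggestion concrete, here is a rough sketch of the change I have in mind. Note that `replace_any_sketch` is a hypothetical stand-in I wrote for illustration; I have not copied the real `replace_any` from TTS.cpp, so its actual behavior may differ. Whether `;` and `:` should still be mapped to `--` is also just my assumption; the only point is that `,` is dropped from the character set.

```cpp
#include <string>

// Hypothetical stand-in for TTS.cpp's replace_any (the real helper may differ).
// Replaces every character of `text` that appears in `chars` with `replacement`.
std::string replace_any_sketch(const std::string& text,
                               const std::string& chars,
                               const std::string& replacement) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        if (chars.find(c) != std::string::npos) {
            out += replacement;  // character is in the set: substitute it
        } else {
            out += c;            // otherwise keep it unchanged
        }
    }
    return out;
}

// Behavior matching the line quoted above: ',', ';' and ':' all become "--".
std::string normalize_current(const std::string& prompt) {
    return replace_any_sketch(prompt, ",;:", "--");
}

// Suggested variant (assumption): commas pass through, ';' and ':' still become "--".
std::string normalize_keep_commas(const std::string& prompt) {
    return replace_any_sketch(prompt, ";:", "--");
}
```

With the sample phrase from above, `normalize_keep_commas` returns the text unchanged, so both commas reach the model, while `normalize_current` turns each of them into `--`.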