Suggestion: Preserve commas in Kokoro's preprocessing. #114

@LostRuins

I've been playing around more with Kokoro ever since I integrated it into KoboldCpp, and I've noticed that you replace commas (,) with dashes (--) in order to create a pause.

normalized = replace_any(prompt, ",;:", "--");

Although the two might seem similar on the surface, the comma contains useful information that affects the characteristics of the generated audio.

Consider this sample phrase:
stealing from unsuspecting travelers, picking pockets, and conning the locals

There should be two pauses in a proper narration. This is the output from the hexgrad space:

good.mp4

Now compare this with the TTS.cpp output. Since it strips all commas and effectively replaces them with spaces, it generates this:

bad.mp4

Adding commas back solves the issue. I'm not sure if I missed something, so feel free to correct me. I do not use espeak so I don't know if there are negative repercussions there, but without espeak this results in an overall better narration for me personally.
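To make the suggestion concrete, here is a rough sketch of the two behaviors side by side. Note that `replace_any` below is my own hypothetical stand-in (the real TTS.cpp helper may differ), and `normalize_current` / `normalize_keep_commas` are names I made up for illustration. I'm also assuming that "adding commas back" simply means dropping the comma from the replaced character set; there may be a better place to do this in the pipeline.

```cpp
#include <string>

// Hypothetical stand-in for TTS.cpp's replace_any (the real helper is not
// shown in this issue). Every character of `text` that appears in `chars`
// is replaced by the string `replacement`; all other characters are copied.
std::string replace_any(const std::string& text, const std::string& chars,
                        const std::string& replacement) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        if (chars.find(c) != std::string::npos) {
            out += replacement;   // character is in the set: substitute it
        } else {
            out.push_back(c);     // character is not in the set: keep it
        }
    }
    return out;
}

// Behavior of the line quoted above: commas, semicolons and colons all
// become "--".
std::string normalize_current(const std::string& prompt) {
    return replace_any(prompt, ",;:", "--");
}

// Suggested variant (an assumption, not the project's code): leave commas
// untouched and only rewrite semicolons and colons.
std::string normalize_keep_commas(const std::string& prompt) {
    return replace_any(prompt, ";:", "--");
}
```

With the sample phrase above, `normalize_current` turns both commas into `--`, while `normalize_keep_commas` returns the phrase unchanged, so the commas still reach the model.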

Labels: enhancement (New feature or request)