A production-ready PowerShell toolchain for text-to-speech (TTS) synthesis. It orchestrates Piper for inference and FFmpeg for encoding, and is optimized for high-density audio asset generation.
- Dual-Stage Pipeline: Implements a WAV-to-Opus workflow, reducing storage requirements by up to 15x while maintaining vocal clarity.
- Batch Processing: Automated high-throughput synthesis from plaintext datasets.
- Manifest Generation: Exports a `map.json` linking source text to generated artifacts for database or Anki integration.
- Encoding Stability: Enforces UTF-8 console and input handling for multilingual support (German, Polish, etc.).
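The two stages of the pipeline can also be run by hand; a minimal sketch, assuming a German voice model file named `de_DE-thorsten-medium.onnx` in the current directory (the model name is an example, not a requirement of the scripts):

```powershell
# Stage 1: synthesize a WAV with Piper (reads the phrase from stdin)
"Guten Tag" | piper --model de_DE-thorsten-medium.onnx --output_file temp.wav

# Stage 2: re-encode to Opus with FFmpeg (24 kbps is a typical voice bitrate)
ffmpeg -i temp.wav -c:a libopus -b:a 24k output.opus

# The intermediate WAV is no longer needed
Remove-Item temp.wav
```

The Opus re-encode is where the storage savings come from: speech at a low Opus bitrate is far smaller than uncompressed PCM while staying intelligible.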
- Piper: Download the Windows binary from Piper GitHub Releases.
- FFmpeg: Required for Opus encoding. Download from ffmpeg.org.
- System PATH: Ensure both `piper` and `ffmpeg` are added to your Environment Variables.
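To confirm both tools resolve from PowerShell after editing the PATH, a quick sanity check (not part of the scripts themselves):

```powershell
# Each command should print the resolved executable path;
# a CommandNotFoundException means PATH is not set correctly
Get-Command piper
Get-Command ffmpeg
```

Remember to open a new terminal session after changing Environment Variables, since existing sessions keep the old PATH.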
Download the `.onnx` and `.json` files from the Piper Voices Repository and place them in your `$voiceDir`.
```powershell
.\say.ps1 -text "Ich möchte ein Studienkolleg besuchen" -lang "de" -speed 1.3
```
Expects a UTF-8 encoded `.txt` file with one phrase per line:

```powershell
.\mass_say.ps1 -inputFile "vocab.txt" -lang "de" -speed 1.2
```
- `-text` / `-inputFile`: The content to convert (mandatory).
- `-lang`: Language code (`en`, `de`, `dem` (emotional), `pl`). Defaults to `de`.
- `-speed`: The `length_scale` factor. Higher is slower (e.g., 1.5). Defaults to 1.2.
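Conceptually, a batch run boils down to a loop like the following sketch (file names, the model path, and the bitrate are illustrative, not the script's actual implementation):

```powershell
# Read the vocab list as UTF-8, skipping blank lines
$lines = Get-Content "vocab.txt" -Encoding UTF8 | Where-Object { $_.Trim() }

$id = 1
foreach ($line in $lines) {
    $wav  = "de_audio_$id.wav"
    $opus = "de_audio_$id.opus"

    # Synthesize with Piper, then transcode to Opus with FFmpeg
    $line | piper --model de_DE-thorsten-medium.onnx --output_file $wav
    ffmpeg -i $wav -c:a libopus -b:a 24k $opus
    Remove-Item $wav
    $id++
}
```

Because each phrase is independent, failures on one line do not have to abort the whole batch.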
Files are saved to your `$outputDir` in `.opus` format.

Mass processing generates a `map.json`:
```json
[
  {
    "id": 1,
    "text": "Guten Tag",
    "audio": "[sound:de_audio_1.opus]"
  }
]
```
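A manifest in this shape can be built with `ConvertTo-Json`; a self-contained sketch (field names follow the example above, everything else is illustrative):

```powershell
# One manifest entry per non-blank phrase in the vocab file
$phrases = Get-Content "vocab.txt" -Encoding UTF8 | Where-Object { $_.Trim() }

$entries = @()
$id = 1
foreach ($phrase in $phrases) {
    $entries += [pscustomobject]@{
        id    = $id
        text  = $phrase
        audio = "[sound:de_audio_$id.opus]"
    }
    $id++
}

# Serialize and write the manifest; -Depth 3 is ample for flat objects
$entries | ConvertTo-Json -Depth 3 | Set-Content "map.json" -Encoding UTF8
```

The `[sound:...]` syntax is Anki's media reference format, so the manifest can be pasted into an import CSV or consumed by an add-on directly.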
- Encoding: Ensure input files are UTF-8 encoded without BOM to prevent character corruption.
- FFmpeg: If encoding fails, verify that FFmpeg is accessible from the terminal and that you have write permissions for the destination directory.
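If a vocab file needs to be re-saved as UTF-8 without a BOM, the .NET encoding APIs give explicit control (note that Windows PowerShell 5.1's `-Encoding UTF8` writes a BOM, while PowerShell 7+ defaults to BOM-less UTF-8):

```powershell
# Re-write vocab.txt as UTF-8 without a byte-order mark
$text = Get-Content "vocab.txt" -Raw
[System.IO.File]::WriteAllText(
    "vocab.txt",
    $text,
    [System.Text.UTF8Encoding]::new($false)  # $false = no BOM
)
```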