feat(tts-native): platform-native TTS backends for macOS, Linux, Windows #487

Open
muunkky wants to merge 6 commits into PeonPing:main from muunkky:upstream-pr/tts-native

Contributor

@muunkky muunkky commented Apr 21, 2026

Summary

Ships scripts/tts-native.sh (Unix) and scripts/tts-native.ps1 (Windows SAPI5) so tts.enabled: true produces audible speech on every supported platform. Before this, the integration layer that landed in #442 called backend scripts that didn't exist yet, so the hook silently no-op'd.

Built and tested against the interfaces already committed in ADR-001 (docs/adr/ADR-001-tts-backend-architecture.md, landed in v2.20.1) and against the existing Resolve-TtsBackend / Invoke-TtsSpeak paths in peon.ps1 and peon.sh.

What's in the PR (7 files, +1736 / -5)

New files

  • scripts/tts-native.sh — Unix backend. Platform-branches on uname: Darwin uses say, Linux prefers piper when binary+model both present else espeak-ng, MINGW/MSYS2 delegates to tts-native.ps1 via powershell.exe. Rate / volume passed via awk -v variables to block injection from hostile config. Always exits 0.
  • scripts/tts-native.ps1 — Windows SAPI5 backend via System.Speech.Synthesis. begin/process/end blocks for pipeline-bound $InputText. SAPI rate mapping [int][math]::Round(($Rate-1.0)*10) clamped to -10..+10. Volume mapping [int][math]::Round($Vol*100) clamped to 0..100. Voice resolution via GetInstalledVoices() (case-insensitive). -ListVoices switch emits installed voice names. try/catch around synthesis, always exits 0.
  • tests/tts-native.bats — 42 BATS scenarios: platform branching (Darwin/Linux/MINGW/unknown), engine priority (piper vs espeak-ng), unit conversions (rate and volume across all three engines), --list-voices per platform, contract (empty stdin, shell metacharacter safety, missing positional defaults), piper sidecar edge cases.
  • tests/tts-native.Tests.ps1 — 40 Pester scenarios: rate/volume mapping and clamping, stdin pipeline binding (production path), voice selection + case-insensitivity, SAPI5 spaced voice names (e.g. Microsoft David Desktop), -ListVoices output, error containment.
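For reviewers who want to sanity-check the unit conversions by hand, here is a minimal shell sketch of the SAPI rate and volume mappings described above, with values passed through `awk -v` rather than interpolated into the program text. The function names `sapi_rate` and `sapi_volume` are hypothetical and do not come from the PR. The sketch rounds midpoints up, whereas `[math]::Round` defaults to banker's rounding, so exact .5 cases may differ from the real script.

```shell
#!/usr/bin/env bash
# Illustrative helpers only; names are hypothetical, not from tts-native.sh.

# Map a rate multiplier (1.0 = normal speed) onto SAPI's -10..+10 scale.
sapi_rate() {
  awk -v r="$1" 'BEGIN {
    x = (r - 1.0) * 10
    if (x < -10) x = -10
    if (x > 10) x = 10
    # Shift into a non-negative range so int() truncation acts as floor.
    print int(x + 10 + 0.5) - 10
  }'
}

# Map a 0.0..1.0 volume onto SAPI's 0..100 scale.
sapi_volume() {
  awk -v v="$1" 'BEGIN {
    x = v * 100
    if (x < 0) x = 0
    if (x > 100) x = 100
    print int(x + 0.5)
  }'
}

sapi_rate 1.5     # prints 5
sapi_volume 0.5   # prints 50
```

Because the value arrives as an awk variable and not as program text, a hostile config string is only ever converted to a number; it is never parsed as awk code.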

Modified

  • install.sh — 2 lines: remote-install curl for tts-native.sh + chmod +x. Local install already picks it up via the existing scripts/*.sh glob.
  • install.ps1 — new copy-or-download block for tts-native.ps1 matching the existing win-play.ps1 / win-notify.ps1 pattern. Also adds pwsh-with-fallback to the generated peon.cmd and peon bash shims (see below).
  • tests/adapters-windows.Tests.ps1 — structural tests for tts-native.ps1 (syntax, param shape, comment-based help, clamping, no ExecutionPolicy Bypass), install.ps1 copy-block assertion, and 2 tests for the pwsh-fallback shims.

Rides along: peon CLI shim resiliency

The generated peon.cmd and bash peon wrapper used to hardcode powershell -NoProfile -NonInteractive .... On environments where PSModulePath has PS 7 module dirs ahead of the 5.1 inbox paths (seen on dev boxes with CloudSDK or similar), PS 5.1 tries to load PS 7's incompatible Microsoft.PowerShell.Security module and fails with Get-ExecutionPolicy : module could not be loaded, breaking every peon ... invocation.

Both shims now probe for pwsh first (where pwsh in cmd, command -v pwsh in bash) and fall back to powershell.exe only when pwsh isn't on PATH. PS 5.1-only users see identical behavior. Covered by 2 structural tests.
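The bash side of the probe can be sketched roughly as follows. This is an assumption about its shape, not the generated shim itself; `pick_powershell` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: echo the PowerShell host the shim should launch.
# Prefers pwsh (PowerShell 7+) and falls back to Windows PowerShell 5.1.
pick_powershell() {
  if command -v pwsh >/dev/null 2>&1; then
    echo pwsh
  else
    echo powershell.exe
  fi
}

# Possible use inside a wrapper (the script path is a placeholder):
#   "$(pick_powershell)" -NoProfile -NonInteractive -File "$script" "$@"
```

On a machine without pwsh on PATH the helper returns powershell.exe, which matches the "PS 5.1-only users see identical behavior" claim.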

I kept this in the same commit because the shim fix surfaced while smoke-testing tts-native.ps1 through the full peon CLI pipeline. Happy to split into a separate commit if you'd prefer that shape.

What's NOT in the PR (consciously held back)

  • No VERSION or CHANGELOG.md bump — that's your call at release time.
  • No roadmap / .gitban/ changes.
  • Two unrelated hygiene fixes found while developing this feature (timezone comparison in tests/peon-engine.Tests.ps1, hardcoded /usr/bin/python3 in tests/setup.bash) are held for separate tiny PRs if you want them.

Test plan

  • BATS tests/tts-native.bats 42/42 green locally (Git Bash + bats-core)
  • Pester tests/tts-native.Tests.ps1 40/40 green
  • Pester tests/adapters-windows.Tests.ps1 22/22 green on the new/modified tests (no regressions expected on the other 416)
  • Live audibility smoke on Windows SAPI5 via the direct script ('hello' | pwsh -File scripts/tts-native.ps1)
  • CI: BATS on macos-latest, Pester on windows-latest (will run when PR opens)
  • Maintainer real-hardware smoke on a Mac (say) and a Linux host with espeak-ng ('hello' | bash scripts/tts-native.sh default 1.0 0.5)
  • Hook-return latency within ±50ms of the tts.enabled: false baseline (measure via [exit] duration_ms log line)

Notes for reviewers

  • BATS suite mocks the leaf engine commands (say, piper, aplay, espeak-ng, powershell.exe) via PATH-first stubs that log invocation shape to files. The real script body runs against the stubs; install_mock_tts_backend is unchanged.
  • Pester suite uses a PEON_TTS_DRY_RUN + PEON_TTS_TRACE_FILE mechanism on tts-native.ps1 so behavior can be asserted without actually speaking. Design-sanctioned testability hook.
  • This is a fork-maintained feature branch; happy to rebase, reshape, or split on request.
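As a rough illustration of the PATH-first stub approach mentioned in the first note, a stub generator might look like the sketch below. The helper name `make_stub` and the log format are assumptions for illustration; the actual BATS helpers in the PR may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create an executable stub named $2 in directory $1
# that appends its name and arguments to the log file $3, then exits 0.
make_stub() {
  local dir=$1 name=$2 log=$3
  mkdir -p "$dir"
  # $name and $log expand now; \$* is left for the stub to expand at run time.
  cat > "$dir/$name" <<EOF
#!/bin/sh
printf '%s\n' "$name \$*" >> "$log"
exit 0
EOF
  chmod +x "$dir/$name"
}
```

A test would then prepend the stub directory to PATH, run the real script, and assert on the contents of the log file instead of on audio output.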

Adds scripts/tts-native.sh (Unix) and scripts/tts-native.ps1 (Windows
SAPI5) so 'tts.enabled: true' produces actual audible speech. Before
this, the integration layer landed in PeonPing#442 was calling backend scripts
that didn't exist yet — the hook silently no-op'd.

scripts/tts-native.sh branches on uname:
- Darwin  -> 'say' with rate conversion (rate * 200 wpm)
- Linux   -> piper preferred when binary + model both present, else
             espeak-ng; silent exit 0 if neither installed
- MINGW*  -> delegates to scripts/tts-native.ps1 via powershell.exe
- other   -> silent exit 0 (debug-logs under PEON_DEBUG=1)
Rate / volume passed via 'awk -v' variables to block injection from
hostile config. Always exits 0 so TTS failure never fails the hook.
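
The branching above can be sketched as a pure function, which makes the priority order easy to test. The names `select_engine` and `say_wpm`, the 0/1 availability flags, and the `MSYS*` pattern are assumptions for illustration; the real script probes the system itself.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the engine choice.
# $1 = uname -s output, $2 = 1 if piper binary and model are both present,
# $3 = 1 if espeak-ng is installed.
select_engine() {
  local os=$1 has_piper=$2 has_espeak=$3
  case "$os" in
    Darwin) echo say ;;
    Linux)
      if [ "$has_piper" = 1 ]; then
        echo piper
      elif [ "$has_espeak" = 1 ]; then
        echo espeak-ng
      else
        echo none          # nothing installed: caller exits 0 silently
      fi
      ;;
    MINGW*|MSYS*) echo powershell ;;
    *) echo none ;;        # unknown platform: caller exits 0 silently
  esac
}

# Convert a rate multiplier to words per minute for 'say' (rate * 200),
# passing the value via awk -v so it is never parsed as awk code.
say_wpm() {
  awk -v r="$1" 'BEGIN { printf "%d\n", r * 200 + 0.5 }'
}
```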

scripts/tts-native.ps1 uses System.Speech.Synthesis (SAPI5, not WinRT):
- begin/process/end blocks accumulate stdin for ValueFromPipeline input
- Rate maps to SAPI -10..+10 via [int][math]::Round((Rate-1.0)*10)
- Vol maps to SAPI 0..100 via [int][math]::Round(Vol*100)
- Voice case-insensitive match against GetInstalledVoices()
- --list-voices / -ListVoices enumerates installed voices per platform
- try/catch around synthesis, always exits 0

Tests:
- tests/tts-native.bats: 42 scenarios (platform branching, engine
  priority, unit conversions, stdin handling including shell
  metacharacters, --list-voices per platform, piper sidecar edge cases)
- tests/tts-native.Tests.ps1: 40 Pester scenarios (rate/volume mapping,
  stdin pipeline binding, voice selection + case-insensitivity, spaced
  voice names, error containment, -ListVoices)
- tests/adapters-windows.Tests.ps1: structural assertions for
  tts-native.ps1 (syntax, param shape, help header, clamping, no
  ExecutionPolicy Bypass) and for the install.ps1 copy block

Installer wiring:
- install.sh: remote-install curl for tts-native.sh + chmod +x
  (local install already picks it up via scripts/*.sh glob)
- install.ps1: new copy-or-download block matching the existing
  win-play.ps1 / win-notify.ps1 pattern

Also fixes the peon CLI shims to prefer pwsh (PowerShell 7+) with a
fallback to powershell.exe. Environments with PS 7 installed can end up
with PSModulePath leaking PS 7 module dirs in front of the 5.1 inbox
paths; PS 5.1 then tries to load PS 7's incompatible
Microsoft.PowerShell.Security module and fails to resolve
Get-ExecutionPolicy, breaking every invocation through peon.cmd and the
bash 'peon' wrapper. pwsh uses its own isolated paths; 5.1-only users
see identical behavior. Covered by 2 structural tests in
adapters-windows.Tests.ps1.

muunkky added 5 commits April 21, 2026 22:36
The /usr/bin/python3 -> $PEON_PY portability fix from the TTS sprint
(card w3ciyq) got dropped during the squash onto upstream-pr/tts-native,
causing tts-native.bats test 815 to fail on CI.

Restores the PATH-based python3 resolver at file load time and swaps
all five /usr/bin/python3 callsites inside run_peon_tts() and
enable_debug_logging() to use "$PEON_PY".

Five pre-existing Pester failures on main (and on PR 487) date from
PR PeonPing#475: the tests lagged behind the install.ps1 refactor that
replaced raw `-like` matching with the Test-PathRuleMatch helper.

- adapters-windows.Tests.ps1 (3x "bindings marks active rule with
  asterisk"): regex updated from '\$marker.*-like.*\$rule\.pattern' to
  match the Test-PathRuleMatch call site.
- adapters-windows.Tests.ps1 ("evaluates path_rules against event
  cwd"): regex updated from 'cwd.*-like.*\$pattern' to the
  Test-PathRuleMatch call form.
- peon-packs.Tests.ps1 ("ide_rules evaluation runs after path_rules
  and before rotation"): IndexOf('\$config.pack_rotation') was
  matching the earlier '\$config.pack_rotation_mode' substring,
  making rotIdx smaller than ideIdx. Anchored on
  'elseif (\$config.pack_rotation' which only appears at the rotation
  fallback site.

A sixth failure ("status --verbose shows IDE rule and excluded
path context") is CI-only: it passes locally on Windows but fails on
the GitHub Windows runner, likely due to short-path (RUNNER~1) vs
long-path resolution in Test-PathRuleMatch when comparing $PWD.Path to
a config value. Left for a separate investigation.

…unner

The "status --verbose shows IDE rule and excluded path context" test
was CI-only-red (passed locally). Root cause: on GitHub Windows runners
$env:TEMP resolves to the short-name form (C:\Users\RUNNER~1\...), so
the config stored the short form while Set-Location + $PWD.Path inside
the spawned peon.ps1 shell returned the long form. Test-PathRuleMatch
compared them as strings and missed the match, so the expected
"path rules skipped here (exclude_dirs): ..." line never printed.

Resolve-Path both WorktreesDir and ProjectDir to their canonical
long-path form after creating them, so both sides of the comparison
stay in sync regardless of how $env:TEMP is spelled.

Adds an inline probe that echoes $PWD.Path and the configured
exclude pattern from the spawned shell, and emits the full captured
output as a -Because message on assertion failure. Purely diagnostic;
does not change the assertion semantics. Will be reverted once the
root cause of the CI-only failure is understood.

…ose test

Diagnostic probe on the GH Windows runner confirmed the mismatch:
  PWD.Path       = C:\Users\runneradmin\AppData\Local\Temp\...  (long)
  ExcludePattern = C:\Users\RUNNER~1\AppData\Local\Temp\...     (short)

On GitHub Windows runners, $env:TEMP resolves to the short 8.3 form
(C:\Users\RUNNER~1\...), and Resolve-Path preserves that form. But
Set-Location + $PWD.Path in the spawned shell expands the short name
to the long form, so Test-PathRuleMatch compared the short-form pattern
against the long-form PWD and missed.

Use Push-Location + $PWD.Path on both WorktreesDir and ProjectDir at
test-setup time so the config values match the form peon.ps1 will see.
Diagnostic Write-Host probe from the prior commit is removed now that
the root cause is understood.
@muunkky muunkky marked this pull request as ready for review April 22, 2026 05:53