Enhanced/Premium voices on iOS-app-on-Mac silently fail when AVSpeechUtterance.rate < default #114

@youtalk

Summary

When running the Mora app as "Designed for iPad on Apple Silicon Mac" (or Mac Catalyst), AVSpeechSynthesizer silently fails to render audio for Enhanced (.enhanced) or Premium (.premium) neural voices whenever AVSpeechUtterance.rate is set below AVSpeechUtteranceDefaultSpeechRate (0.5). The synthesizer fires didFinish within ~100 ms for multi-word phrases as if it had spoken them, but no audio reaches the speakers.

This is the root cause of the Mac-only "TTS makes no sound at all" symptom seen during the dictation read-aloud work in #112. The iPad runtime is not affected: the same code paths and rates work correctly on a physical iPad.

Reproduction

Tested 2026-04-26 on macOS 26 (Tahoe) with the alpha build:

  1. Install Samantha (Enhanced) via System Settings → Accessibility → Read & Speak → System voice → Customize. Confirm quality=2 is reported in [AudioSession] AppleTTSEngine: picked voice id=... log.
  2. Run Mora as "My Mac (Designed for iPad)" in Xcode (or the same build distributed to Mac Catalyst).
  3. Trigger any TTS path that uses .slow (slowRate=0.40), .normal (normalRate=0.46), or the new .verySlow (verySlowRate=0.34).
  4. Observe: no audio. OSLog shows TTS done "..." firing within a few hundred milliseconds (often about 100 ms) of the matching TTS: "..." line, far too soon for the phrase to have actually been played.

Sample log (warmup + dictation phase, captured during #112 testing):

07:13:05.089  AppleTTSEngine: picked voice id=com.apple.voice.enhanced.en-US.Samantha lang=en-US quality=2
07:13:05.089  TTS: "Which one says"
07:13:05.418  TTS done "Which one says"               (329 ms — borderline, perhaps inaudible)
07:13:10.621  TTS: "Two letters, one sound."
07:13:10.725  TTS done "Two letters, one sound."      (104 ms — physically impossible)
07:13:12.851  TTS: "mat"
07:13:12.943  TTS done "mat"                          (92 ms)

The same Samantha (Enhanced) voice plays correctly from the Read & Speak preview button in macOS System Settings, which rules out a corrupt voice file or an audio routing problem.
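The timing argument behind the log annotations can be checked mechanically. The following is a minimal sketch, not part of the Mora codebase: the function names and the 150 ms-per-word floor are assumptions chosen for illustration, and the timestamps are expected in the HH:mm:ss.SSS form used in the sample log.

```swift
import Foundation

/// Parses an "HH:mm:ss.SSS" timestamp into milliseconds since midnight.
/// Returns nil when the string does not match that shape.
func milliseconds(fromTimestamp stamp: String) -> Int? {
    let parts = stamp.split(separator: ":")
    guard parts.count == 3 else { return nil }
    let secondParts = parts[2].split(separator: ".")
    guard secondParts.count == 2,
          let hours = Int(parts[0]),
          let minutes = Int(parts[1]),
          let seconds = Int(secondParts[0]),
          let millis = Int(secondParts[1]) else { return nil }
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
}

/// Hypothetical heuristic: flags a "TTS:" / "TTS done" pair as suspicious
/// when the elapsed time is below an assumed minimum per spoken word.
/// The default of 150 ms per word is an illustrative guess, not a measured value.
func isImplausiblyShort(phrase: String,
                        start: String,
                        end: String,
                        minMsPerWord: Int = 150) -> Bool? {
    guard let t0 = milliseconds(fromTimestamp: start),
          let t1 = milliseconds(fromTimestamp: end) else { return nil }
    let wordCount = phrase.split(separator: " ").count
    return (t1 - t0) < wordCount * minMsPerWord
}
```

With these assumptions, all three phrases in the sample log are flagged, including the 329 ms "Which one says" case that the annotation calls borderline.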

Expected behavior

AVSpeechUtterance.rate < AVSpeechUtteranceDefaultSpeechRate should produce audibly slower speech, not silent (or imperceptibly short) playback. This works as expected on a physical iPad with the same code and the same Enhanced/Premium voice.

Current state

  • AppleTTSEngine.rateFor(_:voice:) returns verySlowRate / slowRate / normalRate for Enhanced/Premium voices. This is intentional: the slowdown is for a roughly 8-year-old ESL learner with dyslexia.
  • Compact (.default) voices are unaffected because the same function returns AVSpeechUtteranceDefaultSpeechRate for them (the original guard exists because Compact voices smear sibilants below 0.5).
  • No Mac-specific override is in place. Two candidate overrides were considered and rejected. Forcing the default rate for Enhanced voices was rejected because Mac is a dev-iteration-only target and shouldn't constrain the iPad behavior. Uninstalling the Enhanced voice via Settings was rejected because macOS Sequoia / Tahoe does not expose a clear "delete voice" UI in Read & Speak.
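The selection rule described in the bullets above can be sketched as follows. This is an illustration built only from the values quoted in this issue, not the actual AppleTTSEngine.rateFor(_:voice:) source: the VoiceQuality and Pace enums are local stand-ins (assumptions) for AVSpeechSynthesisVoiceQuality and TTSPace so the snippet runs without AVFoundation.

```swift
// Local stand-ins so the sketch is self-contained (assumptions, not Mora types).
enum VoiceQuality { case compact, enhanced, premium }
enum Pace { case verySlow, slow, normal }

// Values quoted in this issue.
let defaultRate: Float = 0.5    // AVSpeechUtteranceDefaultSpeechRate
let normalRate: Float = 0.46
let slowRate: Float = 0.40
let verySlowRate: Float = 0.34

/// Compact voices always get the default rate (they smear sibilants below 0.5);
/// Enhanced and Premium voices get the pace-specific slowdown.
func rate(for pace: Pace, quality: VoiceQuality) -> Float {
    switch quality {
    case .compact:
        return defaultRate
    case .enhanced, .premium:
        switch pace {
        case .verySlow: return verySlowRate
        case .slow: return slowRate
        case .normal: return normalRate
        }
    }
}
```

Under this rule every pace yields a rate below 0.5 for Enhanced/Premium voices, which is why all three paces hit the silent failure on Mac while Compact voices never do.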

Investigation paths (not yet attempted)

  1. Find the precise rate threshold below which the silent failure starts. The bug may be triggered by, say, rate < 0.48 specifically, in which case a tiny slowdown could still be safe on Mac.
  2. Test AVSpeechUtterance.preUtteranceDelay / postUtteranceDelay as an alternative to rate for "feels slower" pacing — these may not hit the same code path.
  3. Set AVSpeechUtterance.prefersAssistiveTechnologySettings = true, which uses the system VoiceOver / assistive-tech rate and may bypass the bug.
  4. Add a #if targetEnvironment(macCatalyst) / ProcessInfo.processInfo.isiOSAppOnMac guard that pins Enhanced/Premium to default rate on Mac runtime, leaving iPad unaffected. This is the safe-fallback path; deferred to allow a real fix to land first.
  5. File a Feedback Assistant report to Apple referencing this OSLog evidence.
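For path 1, a bisection over the rate range would narrow down the threshold in a handful of listening trials. The sketch below assumes the failure is monotonic (silent below some rate, audible at or above it), which has not been verified. The function name and the isAudible probe are hypothetical; in practice the probe would be a manual listen or a didFinish timing check for an utterance spoken at the given rate.

```swift
/// Bisects between a rate known to be silent (`low`) and a rate known to be
/// audible (`high`), returning the lowest rate observed to be audible.
/// Assumes audibility is monotonic in the rate; `isAudible` is a
/// caller-supplied probe (hypothetical, e.g. a manual listening check).
func findAudibleThreshold(low: Float,
                          high: Float,
                          tolerance: Float = 0.005,
                          isAudible: (Float) -> Bool) -> Float {
    var silent = low     // highest rate confirmed silent so far
    var audible = high   // lowest rate confirmed audible so far
    while audible - silent > tolerance {
        let mid = (silent + audible) / 2
        if isAudible(mid) {
            audible = mid
        } else {
            silent = mid
        }
    }
    return audible
}
```

Starting from verySlowRate (0.34) and the default rate (0.5), the interval shrinks below 0.005 after about five or six probes.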

Acceptance criteria

  • TTS is audible on iOS-app-on-Mac when an Enhanced or Premium voice is selected, including for .slow and .verySlow paces.
  • iPad behavior is unchanged: existing slowdown still applies.
  • The upstream issue is fixed by Apple, or it is worked around in code with a documented runtime guard, or both.
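A runtime guard of the kind named in the last criterion might look like the sketch below. It is one possible shape, not the agreed fix: the function names are hypothetical, and the Mac check is passed in as a parameter so the clamping rule can be exercised off-device. ProcessInfo.processInfo.isiOSAppOnMac and targetEnvironment(macCatalyst) are the platform checks mentioned in the investigation paths.

```swift
import Foundation

/// True when the process is an iPad/iOS app running on a Mac, either via
/// Mac Catalyst or as "Designed for iPad" on Apple Silicon.
func isIOSAppRunningOnMac() -> Bool {
    #if targetEnvironment(macCatalyst)
    return true
    #elseif canImport(Darwin)
    if #available(iOS 14.0, macOS 11.0, tvOS 14.0, watchOS 7.0, *) {
        return ProcessInfo.processInfo.isiOSAppOnMac
    }
    return false
    #else
    return false
    #endif
}

/// Hypothetical clamp: on Mac, pin Enhanced/Premium voices to the default
/// rate (0.5) when a slower rate was requested; leave everything else as-is.
func effectiveRate(requested: Float,
                   isEnhancedOrPremium: Bool,
                   onMac: Bool,
                   defaultRate: Float = 0.5) -> Float {
    if onMac && isEnhancedOrPremium && requested < defaultRate {
        return defaultRate
    }
    return requested
}
```

A call site would pass onMac: isIOSAppRunningOnMac(). Keeping the environment check separate from the clamp leaves the iPad path untouched and makes the Mac-only override easy to remove once an upstream fix lands.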

Reference

  • Packages/MoraEngines/Sources/MoraEngines/Speech/AppleTTSEngine.swift:rateFor(_:voice:) — current rate selection logic.
  • Packages/MoraEngines/Sources/MoraEngines/TTSEngine.swift: TTSPace.verySlow / .slow / .normal definitions.
  • PR ui+engines: dictation read-aloud UX (auto-TTS, replay, sentence coverage) #112 — work that surfaced this regression.
  • Auto-memory entry: project_mora_mac_tts_silent_fail.md (local-only, summarizes the decision to defer Mac handling).

🤖 Generated with Claude Code
