feat: implement voice mode improvements with continuous loop, audio error recovery, and TTS discovery #459

Open
devin-ai-integration[bot] wants to merge 31 commits into master from devin/1772755465-voice-mode-improvements

@devin-ai-integration devin-ai-integration bot commented Mar 6, 2026

Voice Mode Improvements

Summary

Implements the maple-voice-improvements spec with three main features:

1. Voice Mode — Continuous Loop State Machine

When a user starts recording on a TTS-capable platform (desktop/iOS), the app enters voice mode: a hands-free loop of Recording → Processing → Waiting → Generating → Playing → (500ms pause) → Recording. The mic button highlights when voice mode is active and acts as an exit button. Exit is also triggered on chat switch, new chat, or any overlay X button.
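The loop above can be sketched as a small transition table. This is a minimal illustration, not the code in UnifiedChat.tsx: the names `VoiceState`, `nextVoiceState`, and `delayBeforeEntering` are assumptions, and only the state order and the 500 ms pause come from the description.

```typescript
// Hypothetical names; only the state order and the 500 ms pause come from the PR text.
type VoiceState = "recording" | "processing" | "waiting" | "generating" | "playing";
type VoiceEvent = "advance" | "exit";

// Each state's successor in the hands-free loop.
const NEXT: Record<VoiceState, VoiceState> = {
  recording: "processing",
  processing: "waiting",
  waiting: "generating",
  generating: "playing",
  playing: "recording",
};

// Pause before the mic reopens after playback.
const RESTART_DELAY_MS = 500;

// null means voice mode is not active (the overlay is dismissed).
function nextVoiceState(current: VoiceState | null, event: VoiceEvent): VoiceState | null {
  if (event === "exit" || current === null) return null;
  return NEXT[current];
}

// Only the Playing -> Recording edge carries a delay.
function delayBeforeEntering(from: VoiceState, to: VoiceState): number {
  return from === "playing" && to === "recording" ? RESTART_DELAY_MS : 0;
}
```

Any exit trigger (mic button, chat switch, new chat, overlay X) maps to the single "exit" event here, which keeps the exit paths from diverging.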

2. Audio Error Recovery

On transcription failure, the audio blob is retained in memory. The overlay shows an error state with the original recording duration, a Retry button (re-sends the same blob), and a Discard button. This works both inside and outside voice mode.
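A rough sketch of the retain/retry/discard flow, written as a reducer. All names (`OverlayState`, `OverlayAction`, `reduceOverlay`, `AudioPayload`) are hypothetical, and `AudioPayload` stands in for the recorded Blob; the point is only that the same audio object travels from the failed attempt into the retry.

```typescript
// Stand-in for the recorded Blob; hypothetical type for illustration.
interface AudioPayload {
  size: number;
}

type OverlayState =
  | { kind: "idle" }
  | { kind: "processing"; audio: AudioPayload; durationSec: number }
  | { kind: "error"; audio: AudioPayload; durationSec: number; message: string };

type OverlayAction =
  | { type: "submit"; audio: AudioPayload; durationSec: number }
  | { type: "transcribeFailed"; message: string }
  | { type: "transcribeSucceeded" }
  | { type: "retry" }
  | { type: "discard" };

function reduceOverlay(state: OverlayState, action: OverlayAction): OverlayState {
  switch (action.type) {
    case "submit":
      return { kind: "processing", audio: action.audio, durationSec: action.durationSec };
    case "transcribeFailed":
      // Keep the audio and its original duration so the error UI can show both.
      return state.kind === "processing"
        ? { kind: "error", audio: state.audio, durationSec: state.durationSec, message: action.message }
        : state;
    case "retry":
      // Re-send the same retained audio rather than recording again.
      return state.kind === "error"
        ? { kind: "processing", audio: state.audio, durationSec: state.durationSec }
        : state;
    case "transcribeSucceeded":
    case "discard":
      // Drop the retained audio.
      return { kind: "idle" };
  }
}
```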

3. TTS Discovery Prompt & Enhanced Feedback

  • One-time prompt: After the first successful voice message on TTS-capable platforms, if TTS models aren't installed, show an inline prompt ("Enable voice responses?") with a Download button (~264 MB) and dismiss (X).
  • Generating animation: The speaker icon (TTSButton) pulses during TTS generation.
  • Audio cues: Play mic-on.wav (gentle ascending tone) when recording starts and mic-off.wav (confirmation tone) after sending successfully.
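The one-time prompt condition can be read as a single predicate. This is a sketch under assumed names (`DiscoveryInput` and its fields are not taken from the PR); it only restates the conditions listed above.

```typescript
// Hypothetical inputs; the real flags live in the app's state and TTS context.
interface DiscoveryInput {
  isTTSPlatform: boolean;         // desktop or iOS, per the summary
  ttsModelsInstalled: boolean;
  promptAlreadyShown: boolean;    // the prompt is shown at most once
  voiceMessageSucceeded: boolean; // a voice message was just sent successfully
}

function shouldShowTTSDiscoveryPrompt(input: DiscoveryInput): boolean {
  return (
    input.isTTSPlatform &&
    input.voiceMessageSucceeded &&
    !input.ttsModelsInstalled &&
    !input.promptAlreadyShown
  );
}
```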

Updates since last revision

Audio cue ordering & iOS silent switch fix

Root cause confirmed via debug buttons: the playAudioCue function itself works fine on mobile Safari — the issue was that iOS switches to a play-and-record audio session when getUserMedia activates the microphone, which mutes/interrupts any in-progress Web Audio playback.

Fixes:

  • playAudioCue now returns a Promise that resolves when the sound finishes playing (or immediately on error), enabling proper sequencing
  • Mic-on cue plays BEFORE mic activation: await playAudioCue("mic-on") is called before getUserMedia() in startRecording, so the sound completes before iOS switches audio sessions
  • Mic stream stops BEFORE mic-off cue: In stopRecording, stream tracks are released before calling transcribeAndSend (which plays mic-off), so the audio session is free for playback
  • iOS silent switch bypass: Sets navigator.audioSession.type = 'playback' (Safari 17+) before playing each cue, so audio plays regardless of the physical silent switch position
  • cancelGeneration() now calls stopPlayback(): Closes the AudioContext to prevent audio from playing if cancellation happens during the async arrayBuffer()/decodeAudioData() window (Devin Review fix)
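The ordering described in these fixes can be sketched with the browser calls injected as dependencies. The `RecordingDeps` interface and both function names are assumptions for illustration; in the app, `acquireMic` would wrap getUserMedia and `transcribeAndSend` would play the mic-off cue itself.

```typescript
type CueName = "mic-on" | "mic-off";

interface MicStream {
  stop(): void; // releases the stream's tracks
}

// Hypothetical dependency bundle so the ordering is visible without a browser.
interface RecordingDeps {
  playAudioCue(name: CueName): Promise<void>; // resolves when the sound ends (or on error)
  acquireMic(): Promise<MicStream>;           // wraps getUserMedia in the real app
  transcribeAndSend(): Promise<void>;         // plays mic-off after a successful send
}

async function startRecordingSequence(deps: RecordingDeps): Promise<MicStream> {
  // Finish the cue first: activating the mic changes the iOS audio session.
  await deps.playAudioCue("mic-on");
  return deps.acquireMic();
}

async function stopRecordingSequence(deps: RecordingDeps, stream: MicStream): Promise<void> {
  // Release the mic first so the audio session is free for the mic-off cue.
  stream.stop();
  await deps.transcribeAndSend();
}
```

Because `startRecordingSequence` awaits the cue, a cue promise that never settles would block mic activation, which is why the description has playAudioCue resolve on error as well.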

Debug buttons (temporary — to be removed)

Two test buttons ("Mic On" green, "Mic Off" red) are rendered above the input box for the user to verify audio cue playback on mobile Safari. These are explicitly temporary and will be removed in a follow-up change once audio cue behavior is confirmed end-to-end.

Earlier updates

TestFlight Bug Fixes (round 2)
  • iOS audio cues still not playing (Bug 2a): AudioContext on iOS starts in "suspended" state. Added ctx.resume() before fetching and decoding the WAV file. Without this, source.start(0) silently fails on iOS because the context never transitions to "running".
  • Previous recording duration flashes on re-entry (Bug 2b): setDuration(0) inside useEffect fires after the first render with stale state, causing a one-frame flash of the previous duration. Added a synchronous state reset during render (comparing prevEffectiveStateRef) that runs before React paints, eliminating the flash.
  • Compact playback overlay layout imbalanced (Bug 2c): Waveform was pushed to the top and status text to the bottom due to gap-6 spacing. Reduced to gap-2 in compact mode (isCompact ? "gap-2" : "gap-6") for a tighter, more balanced layout.
  • TTS auto-play after model download not triggering (Bug 2d): When ttsStatus transitions to "ready" mid-voice-loop, the user is already back in recording state with the mic active. The prevTtsStatusRef effect now stops the active recording (cleans up recorder and stream) before transitioning to "generating" state and calling speakAndWait.
Devin Review rounds 11–13
  • Timer flash on re-entering recording (round 11): Added setDuration(0) at the start of the recording effect in RecordingOverlay.tsx to prevent a one-frame flash of the previous recording's stale duration before requestAnimationFrame resets it.
  • Double mic-on audio cue (round 11): Removed playAudioCue("mic-on") from all 4 voice continuation caller sites in UnifiedChat.tsx. startRecording() is now the single source of truth for the mic-on cue, eliminating the stutter/echo caused by double playback.
  • AudioContext resource leak (round 12): audioContextRef.current is now assigned immediately after new AudioContext() creation (before decodeAudioData, resume, etc.) so stopPlayback() can close it if any subsequent operation throws. Previously, the ref was set after several async operations, leaving orphaned contexts on failure.
  • Empty preprocessed TTS text (round 12): speakInternal now throws Error("no_speakable_text") (instead of silently returning) when preprocessTextForTTS strips all content (e.g., code-only assistant responses). Both voice continuation catch blocks handle this by restarting recording with audio feedback, rather than silently activating the mic.
  • Voice mode exit on startRecording failure (round 13): startRecording's catch block now calls exitVoiceMode() when voice mode is active, preventing a stuck "Recording" overlay when the microphone is unavailable (permission denied, device busy, etc.).
  • Stale audioContextRef guard (round 13): The staleness check after audioContext.resume() now only nulls audioContextRef.current if it still points to this call's AudioContext, preventing a concurrent speakInternal call's ref from being clobbered.
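The two audioContextRef fixes (rounds 12 and 13) follow a common ownership pattern, sketched below with hypothetical helpers `registerContext` and `releaseIfOwned`. `ClosableContext` is a minimal stand-in for AudioContext so the sketch does not depend on browser types.

```typescript
// Minimal stand-ins; the real ref holds an AudioContext.
interface ClosableContext {
  close(): void;
}
interface Ref<T> {
  current: T | null;
}

// Round 12: publish the context to the ref right after creation, before any
// async step that might throw, so a later stopPlayback() can still close it.
function registerContext<C extends ClosableContext>(ref: Ref<C>, create: () => C): C {
  const ctx = create();
  ref.current = ctx;
  return ctx;
}

// Round 13: clear the ref only if it still points at this call's context, so a
// concurrent call that has already replaced the ref is left untouched.
function releaseIfOwned<C>(ref: Ref<C>, ctx: C): boolean {
  if (ref.current !== ctx) return false;
  ref.current = null;
  return true;
}
```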
TestFlight Bug Fixes (round 1)
  • TTS auto-play after model download (Bug 1): Added a prevTtsStatusRef effect that watches for ttsStatus transitioning from non-ready → "ready" while voice mode is active. When the user downloads a TTS model mid-voice-loop, this retroactively speaks the last assistant message instead of silently skipping TTS.
  • Compact overlay showing "Playing" state (Bug 2): The bottom input overlay (isCompact={true}) previously hid all status content and waveform — showing only a black overlay with an X button during TTS playback. Now shows status text (Playing, Generating, Waiting, Error) and animated waveform in compact mode for non-recording states.
  • iOS audio cues (Bug 3): Replaced new Audio('/audio/file.wav') with Web Audio API (AudioContext + fetch + decodeAudioData + BufferSource). HTMLAudioElement.play() is unreliable in iOS Tauri WebView due to autoplay restrictions.
  • Empty blob recovery race (Devin Review round 10): startRecordingRef.current() after empty blob recovery now uses setTimeout(0) to defer until React's setIsRecording(false) batch commits.
  • Prettier formatting fix for CI.
Earlier fixes (rounds 1–9)
  • Non-voice overlay dismissal (round 6): Added setVoiceState(null) after successful transcription in non-voice mode so the overlay dismisses.
  • Voice mode error guard (round 6): Voice continuation effect now checks errorRef.current before proceeding with TTS. If handleSendMessage failed, voice mode exits gracefully instead of speaking stale messages.
  • TTS failure loop break (round 7): speakInternal re-throws errors after handling them locally, allowing speakAndWait's .catch() to call exitVoiceMode().
  • speak() wrapper error handling (round 8): The speak() wrapper (used by TTSButton) now catches errors from speakInternal to prevent unhandled promise rejections. Only speakAndWait (voice mode) propagates errors.
  • exitVoiceMode stale TTS cleanup (round 5): Calls cancelTTSGeneration() and stopTTS() unconditionally instead of guarding on stale ttsIsGenerating/ttsIsPlaying.
  • Earlier fixes (rounds 1–4): recordingStartTimeRef for accurate duration capture, startRecordingRef for stale closure in handleVoiceDiscard, handleSendMessageRef/handleTTSDiscoveryRef for stale closures, isTauri() guard on iOS TTS platform check, ?? for 0-second duration display, recording restart on empty blob, mic leak fix.
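The error-propagation split from rounds 7, 8, and 12 can be sketched as follows. This is an illustration, not the TTSContext code: `stripForSpeech` is a hypothetical stand-in for preprocessTextForTTS that only removes fenced code, and `synthesize` is an injected placeholder for the actual TTS call.

```typescript
// Hypothetical stand-in for preprocessTextForTTS: removes fenced code blocks.
function stripForSpeech(text: string): string {
  return text.replace(/`{3}[\s\S]*?`{3}/g, "").trim();
}

type Synthesize = (text: string) => Promise<void>;

async function speakInternal(text: string, synthesize: Synthesize): Promise<void> {
  const speakable = stripForSpeech(text);
  // Throw instead of returning silently so callers can react.
  if (speakable.length === 0) throw new Error("no_speakable_text");
  await synthesize(speakable);
}

// Button path: swallow errors so nothing surfaces as an unhandled rejection.
async function speak(text: string, synthesize: Synthesize): Promise<void> {
  try {
    await speakInternal(text, synthesize);
  } catch {
    // Already handled inside speakInternal in the real code.
  }
}

// Voice-mode path: let errors propagate so the loop can exit or restart recording.
function speakAndWait(text: string, synthesize: Synthesize): Promise<void> {
  return speakInternal(text, synthesize);
}
```

If `speakAndWait` also swallowed errors, the voice loop could not tell a failed TTS run from a finished one, which is the infinite-loop case the round 7 fix addresses.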

Review & Testing Checklist for Human

⚠️ Risk Level: YELLOW — Complex async state machine with 14 rounds of incremental fixes + iOS-specific audio session changes; all iOS bugs tested blind without device access. Key risks: (1) await playAudioCue() blocks startRecording until sound finishes — verify no hang on iOS; (2) navigator.audioSession.type = 'playback' may conflict with subsequent getUserMedia's audio session switch; (3) debug buttons are temporary and should NOT be merged; (4) mic stream cleanup moved earlier in stopRecording — ensure transcription still works.

  • Audio cues on iOS with silent switch (both on/off): On iOS, test mic button with silent switch both ON and OFF. Verify mic-on.wav plays before mic activates and mic-off.wav plays after successful send, regardless of switch position.
  • Debug buttons (TEMPORARY): Two buttons ("Mic On" green, "Mic Off" red) are visible above the input box. These are for debugging only — verify they work on iOS, then remind reviewer to remove them before merging.
  • Audio cue timing (no hang): Verify tapping the mic button on iOS does not hang — the mic should activate shortly after the mic-on sound finishes playing. If it hangs, check if source.onended is firing.
  • Voice mode state machine: Test full loop (record → send → wait → TTS → auto-record) on desktop or iOS. Verify clean exit on: chat switch, new chat, mic button tap, X button, speaker icon tap.
  • cancelGeneration() behavior: During TTS generation in voice mode, tap the speaker icon to cancel. Verify no audio plays after cancellation (even if decoding was in progress).

Notes

  • Code changes:
    • TTSContext.tsx: cancelGeneration() now calls stopPlayback() to close AudioContext during async decoding window
    • UnifiedChat.tsx: playAudioCue() returns Promise, startRecording awaits mic-on cue before getUserMedia, stopRecording stops stream before mic-off cue, navigator.audioSession.type = 'playback' set before each cue playback, DEBUG BUTTONS added above input (temporary)
  • Audio files: frontend/public/audio/mic-on.wav and mic-off.wav (both 394888 bytes, confirmed served)
  • iOS silent switch bypass: Uses navigator.audioSession API (Safari 17+); degrades gracefully on older browsers
  • 16 rounds of fixes: 14 rounds Devin Review + 2 rounds TestFlight; tested blind on iOS TestFlight
  • Debug buttons: Temporary test UI for audio cue verification — to be removed in follow-up change

Link to Devin Session: https://app.devin.ai/sessions/0c853f0e1ba84474971875a61f616769
Requested by: @marksftw



…rror recovery, and TTS discovery

- Add voice mode state machine: Recording → Processing → Waiting → Generating → Playing → Recording loop
- Add audio error recovery with blob retention, retry, and discard
- Add audio cues (mic-on.wav, mic-off.wav) for recording state transitions
- Add TTS discovery prompt for first-time users on supported platforms
- Extend RecordingOverlay with 6 voice mode states (recording, processing, error, waiting, generating, playing)
- Update TTSContext with isGenerating, speakAndWait, cancelGeneration, and sequence ID tracking
- Update TTSButton with generating animation state
- Add voice mode exit on chat switch, new chat, and manual stop
- Highlight mic button when voice mode is active

Closes #458

Co-Authored-By: marks <markskram@protonmail.com>
@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Co-Authored-By: marks <markskram@protonmail.com>

cloudflare-workers-and-pages bot commented Mar 6, 2026

Deploying maple with Cloudflare Pages

Latest commit: bb814a4
Status: ✅  Deploy successful!
Preview URL: https://b0b3b0ae.maple-ca8.pages.dev
Branch Preview URL: https://devin-1772755465-voice-mode.maple-ca8.pages.dev


…VoiceDiscard

- Add recordingStartTimeRef to track when recording starts (RecordRTC has no startTime property)
- Add startRecordingRef to avoid stale closure in handleVoiceDiscard's empty dependency array
- capturedDuration now correctly reflects actual recording time instead of always being 0

Co-Authored-By: marks <markskram@protonmail.com>

… for savedDuration

- Replace all 3 startRecording() calls in voice continuation effect with startRecordingRef.current()
- Change recordingDuration || undefined to recordingDuration ?? undefined (both inputs)
  to correctly show 0-second durations in error UI

Co-Authored-By: marks <markskram@protonmail.com>
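The difference between the two operators for a 0-second recording, as a small sketch. The function names are illustrative only; the expressions are the ones quoted in the commit message.

```typescript
// Before: `||` treats 0 as falsy, so a 0-second duration is dropped.
function savedDurationWithOr(recordingDuration: number | null): number | undefined {
  return recordingDuration || undefined;
}

// After: `??` only falls back on null or undefined, so 0 is preserved.
function savedDurationWithNullish(recordingDuration: number | null): number | undefined {
  return recordingDuration ?? undefined;
}

console.log(savedDurationWithOr(0));      // undefined
console.log(savedDurationWithNullish(0)); // 0
```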

…g on empty blob

- isTTSPlatform now checks isTauri() && isIOS() to match TTSContext behavior
- Empty blob in voice mode now calls startRecordingRef.current() to restart

Co-Authored-By: marks <markskram@protonmail.com>

devin-ai-integration bot and others added 2 commits March 6, 2026 00:54
…ssage/handleTTSDiscovery

- exitVoiceMode: check recorderRef.current instead of isRecording state to avoid stale closure in event handler effect
- Add handleSendMessageRef and handleTTSDiscoveryRef to prevent stale closures in transcribeAndSend
- transcribeAndSend now calls through refs for latest handleSendMessage and handleTTSDiscovery

Co-Authored-By: marks <markskram@protonmail.com>
Co-Authored-By: marks <markskram@protonmail.com>

…e to avoid stale closure

These functions are idempotent, so guarding on ttsIsGenerating/ttsIsPlaying
was unnecessary and caused stale closures when exitVoiceMode was captured
in the event listener effect.

Co-Authored-By: marks <markskram@protonmail.com>

marksftw commented Mar 6, 2026

@TestFlight build

github-actions bot commented Mar 6, 2026

🚀 TestFlight deployment triggered! Check the Actions tab for progress.

…on generation error

- Set voiceState(null) after successful transcription in non-voice mode to dismiss overlay
- Add errorRef to track error state without stale closures
- Voice continuation effect checks errorRef.current and exits voice mode on error
  instead of speaking stale assistant message

Co-Authored-By: marks <markskram@protonmail.com>

github-actions bot commented Mar 6, 2026

✅ TestFlight deployment completed successfully!

…de exits

Without re-throwing, speakAndWait always resolves normally even on TTS failure,
causing the voice mode continuation effect's .catch() to never fire and creating
an infinite loop: TTS fails → recording restarts → repeat.

Co-Authored-By: marks <markskram@protonmail.com>

…n TTSButton

speak() is used by TTSButton which doesn't have try/catch. Only speakAndWait
(used by voice mode loop) needs error propagation to exit on TTS failure.

Co-Authored-By: marks <markskram@protonmail.com>

marksftw commented Mar 6, 2026

@TestFlight build

github-actions bot commented Mar 6, 2026

🚀 TestFlight deployment triggered! Check the Actions tab for progress.

github-actions bot commented Mar 6, 2026

✅ TestFlight deployment completed successfully!

github-actions bot commented Mar 6, 2026

✅ TestFlight deployment completed successfully!

devin-ai-integration bot and others added 2 commits March 6, 2026 04:01
…, compact layout gap, TTS auto-play after download

Bug 1: Audio cues not playing on iOS - AudioContext starts in 'suspended' state
on iOS; added ctx.resume() before fetching and playing the wav file.

Bug 2: Previous recording duration flashes on re-entry - setDuration(0) in
useEffect fires after the first render with stale state. Added synchronous
state reset during render when effectiveState transitions to 'recording'.

Bug 3: Compact playback overlay layout imbalanced - waveform too near top,
status text too near bottom. Reduced gap from gap-6 to gap-2 in compact mode.

Bug 4: TTS auto-play after model download not triggering - when ttsStatus
transitions to 'ready' mid-voice-loop, the user may already be recording.
Now stops active recording before transitioning to 'generating' state.

Co-Authored-By: marks <markskram@protonmail.com>
Co-Authored-By: marks <markskram@protonmail.com>

devin-ai-integration bot and others added 2 commits March 6, 2026 14:24
…c decoding window

Co-Authored-By: marks <markskram@protonmail.com>
… on iOS

- playAudioCue now returns a Promise that resolves when playback ends
- startRecording: await playAudioCue('mic-on') BEFORE getUserMedia
- stopRecording: stop mic stream tracks BEFORE calling transcribeAndSend (which plays mic-off)
- Set navigator.audioSession.type = 'playback' to bypass iOS silent switch (Safari 17+)

Co-Authored-By: marks <markskram@protonmail.com>

marksftw commented Mar 6, 2026

@TestFlight build

github-actions bot commented Mar 6, 2026

🚀 TestFlight deployment triggered! Check the Actions tab for progress.

github-actions bot commented Mar 6, 2026

✅ TestFlight deployment completed successfully!

devin-ai-integration bot and others added 5 commits March 6, 2026 15:13
The 'playback' session type is incompatible with getUserMedia on iOS.
Now saves the previous session type before setting 'playback' and
restores it in onended/catch/error paths so mic activation works.

Co-Authored-By: marks <markskram@protonmail.com>
'playback' session type caused two issues:
1. Blocked getUserMedia with InvalidStateError
2. Second mic-on cue was silent after a recording cycle

'play-and-record' bypasses the iOS silent switch AND is compatible
with mic capture, eliminating the need for save/restore logic.

Co-Authored-By: marks <markskram@protonmail.com>
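A feature-detected way to request the session type described in this commit might look like the sketch below. `NavigatorLike` and `preferPlayAndRecord` are hypothetical names; the sketch assumes only that `navigator.audioSession`, where present, exposes a writable `type` property, as the commit messages describe.

```typescript
// Minimal shape of the part of navigator this sketch touches.
interface NavigatorLike {
  audioSession?: { type: string };
}

// Returns true if the session type was set, false where the API is missing
// or the assignment throws. Callers can ignore the result and continue.
function preferPlayAndRecord(nav: NavigatorLike): boolean {
  if (!nav.audioSession) return false;
  try {
    // Chosen over "playback" because, per the commit above, it bypasses the
    // silent switch while staying compatible with getUserMedia.
    nav.audioSession.type = "play-and-record";
    return true;
  } catch {
    return false;
  }
}
```

In the app this would be called as `preferPlayAndRecord(navigator as NavigatorLike)` before playing a cue; on browsers without the API it is a no-op.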
Co-Authored-By: marks <markskram@protonmail.com>
The WAV files are now at the correct volume from the source.
Added ?v=2 query param to bypass mobile Safari's cached old files.

Co-Authored-By: marks <markskram@protonmail.com>
…bust mobile Safari cache

Co-Authored-By: marks <markskram@protonmail.com>

marksftw commented Mar 6, 2026

@TestFlight build

github-actions bot commented Mar 6, 2026

🚀 TestFlight deployment triggered! Check the Actions tab for progress.

github-actions bot commented Mar 6, 2026

✅ TestFlight deployment completed successfully!

devin-ai-integration bot and others added 2 commits March 6, 2026 19:08
…e debug buttons

Co-Authored-By: marks <markskram@protonmail.com>
Co-Authored-By: marks <markskram@protonmail.com>

Co-Authored-By: marks <markskram@protonmail.com>

marksftw commented Mar 6, 2026

@TestFlight build

github-actions bot commented Mar 6, 2026

🚀 TestFlight deployment triggered! Check the Actions tab for progress.

github-actions bot commented Mar 6, 2026

✅ TestFlight deployment completed successfully!

…oice loop

Co-Authored-By: marks <markskram@protonmail.com>

marksftw commented Mar 7, 2026

@TestFlight build

github-actions bot commented Mar 7, 2026

🚀 TestFlight deployment triggered! Check the Actions tab for progress.

github-actions bot commented Mar 7, 2026

✅ TestFlight deployment completed successfully!

…ckgrounding

Co-Authored-By: marks <markskram@protonmail.com>

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

View 46 additional findings in Devin Review.

Comment on lines +3108 to +3135
// continues playing via the system media controller. So when going to background:
// - Stop the mic recording if active (it will be killed by iOS anyway)
// - Do NOT stop TTS playback — let it continue in the background
// On foreground return, reset any stuck recording flags so the user can start fresh.
useEffect(() => {
  const handleVisibilityChange = () => {
    if (document.visibilityState === "hidden") {
      // App went to background — if actively recording, stop the mic cleanly.
      // iOS will kill the stream anyway; this prevents corrupted state.
      // Do NOT call exitVoiceMode() — that would also stop TTS playback
      // which should continue in the background via the system media controller.
      if (recorderRef.current) {
        const recorderToCleanup = recorderRef.current;
        const streamToCleanup = streamRef.current;
        recorderRef.current = null;
        streamRef.current = null;
        if (streamToCleanup) {
          streamToCleanup.getTracks().forEach((track) => track.stop());
        }
        recorderToCleanup.stopRecording(() => {
          // Resources already cleaned up synchronously above.
        });
        setIsRecording(false);
      }
    } else if (document.visibilityState === "visible") {
      // App came back to foreground — if recording flags are stuck
      // (e.g. iOS killed the stream while backgrounded), reset them
      // so the user can start a new recording.

🟡 Voice overlay shows stale "Recording" state after app returns from background

When the app is backgrounded during voice-mode recording, the visibilitychange handler (UnifiedChat.tsx:3108-3135) correctly stops the mic recorder and calls setIsRecording(false), but does NOT update voiceState (which remains "recording") or voiceModeRef. When the app returns to the foreground, the RecordingOverlay is still shown, because isRecording || voiceState evaluates as false || "recording", which is truthy at UnifiedChat.tsx:3547. The overlay displays "Recording" with an animated waveform and a Send button, but no microphone stream is active. Tapping Send calls stopRecording(true), which checks recorderRef.current && isRecording (both falsy) and silently does nothing. The user is stuck in a misleading overlay and must tap Cancel/X to dismiss it.

Prompt for agents
In frontend/src/components/UnifiedChat.tsx, inside the visibilitychange handler's "visible" branch (around line 3126-3134), after resetting the stuck recording flags, add logic to handle voice mode recovery. If voiceModeRef.current is true and voiceState is "recording" but there is no active recorder (recorderRef.current is null), either restart recording by calling startRecordingRef.current() after a short setTimeout, or update voiceState to reflect the interrupted state (e.g., set it to an appropriate state that prompts the user to tap to resume). The simplest fix is to restart recording automatically:

After line 3133 (setIsProcessingSend(false)), add:

  // Voice mode: if recording was interrupted by backgrounding, restart it
  if (voiceModeRef.current && voiceState === "recording") {
    setTimeout(() => {
      if (voiceModeRef.current) {
        startRecordingRef.current();
      }
    }, 300);
  }

Note that voiceState must be added to the effect's closure scope or accessed via a ref. Since the effect has an empty dependency array [], voiceState will be stale. You may need to use a voiceStateRef or restructure the effect.
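One way to sidestep the stale-closure problem noted above is to read everything through refs. The sketch below assumes a `voiceStateRef` that mirrors voiceState (this ref is hypothetical and would need to be added), and injects the timer so the logic can be exercised outside React; `recoverOnForeground` is an illustrative name, not existing code.

```typescript
interface Ref<T> {
  current: T;
}

type VoiceState = "recording" | "processing" | "error" | "waiting" | "generating" | "playing" | null;

interface ForegroundRefs {
  voiceModeRef: Ref<boolean>;
  voiceStateRef: Ref<VoiceState>; // hypothetical mirror of the voiceState value
  recorderRef: Ref<object | null>;
}

type Schedule = (fn: () => void, ms: number) => unknown;

// Returns true if a restart was scheduled. Intended for the "visible" branch
// of the visibilitychange handler, after the stuck flags have been reset.
function recoverOnForeground(
  refs: ForegroundRefs,
  startRecording: () => void,
  schedule: Schedule = (fn, ms) => setTimeout(fn, ms),
): boolean {
  const interrupted =
    refs.voiceModeRef.current &&
    refs.voiceStateRef.current === "recording" &&
    refs.recorderRef.current === null;
  if (!interrupted) return false;
  schedule(() => {
    // Re-check at fire time: the user may have exited voice mode meanwhile.
    if (refs.voiceModeRef.current) startRecording();
  }, 300);
  return true;
}
```

Inside the effect, the call would be `recoverOnForeground(refs, () => startRecordingRef.current())`, which works with the empty dependency array because only refs are read.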