fix: harden audio pipeline, improve barge-in, rewrite docs #15
Merged
abdul-abdi merged 13 commits into main on Mar 15, 2026
Audio stability: ambient-calibrated VAD thresholds, barge-in reverb guard, playback drain detection, session resumption poison counter. Docs: rewrite README with tools list, core pipeline, GCP deployment section, grounding strategy, and persona description. Consolidate architecture into root ARCHITECTURE.md. Remove stale docs/plans.
The .titled style mask created an invisible title bar with safe area insets that pushed content down, clipping the text input at the bottom. Since canBecomeKey is overridden and isMovableByWindowBackground handles dragging, .titled is unnecessary.
- Mouse: "precise" → "coordinate accuracy depends on Gemini's vision"
- Pipeline: fix chunk size (512 samples ~10ms, not 100ms), remove client-side VAD reference (removed in this branch), add actual barge-in params (1.3x threshold, 2 frames, 300ms reverb guard)
- Safety: rewrite entirely; do shell script blocked wholesale (not pattern-filtered), metacharacter list from actual code, timeouts are 60s for all scripts (not 120s for AppleScript), destructive action guardrail is prompt-level not code-enforced
- Grounding: screen feedback is continuous 2 FPS (not per-action), accessibility data provided to Gemini (not cross-referenced in code)
- Tools: add missing key_state to system category
- Terminal blocklist: list all 9 actual blocked apps from code
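The barge-in parameters listed under Pipeline (a threshold multiplier and a consecutive-frame count) can be illustrated with a minimal sketch. Everything named here is hypothetical: `BargeInDetector`, `reference_rms`, and `chunk_duration_ms` do not come from the PR, and the source does not say which level the multiplier scales, so a single reference RMS is assumed. The 300ms reverb guard is not modeled. The ~10ms chunk figure is consistent with a 48 kHz capture rate, which is also an assumption.

```rust
/// Duration in milliseconds of one audio chunk at a given sample rate.
pub fn chunk_duration_ms(samples_per_chunk: u32, sample_rate_hz: u32) -> f64 {
    samples_per_chunk as f64 * 1000.0 / sample_rate_hz as f64
}

/// Root-mean-square level of a frame of samples in the range [-1.0, 1.0].
pub fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = frame.iter().map(|s| s * s).sum();
    (sum_sq / frame.len() as f32).sqrt()
}

/// Hypothetical detector: confirms barge-in only after `required_frames`
/// consecutive frames exceed `reference_rms * multiplier`.
pub struct BargeInDetector {
    reference_rms: f32, // assumed reference level (not specified in the PR)
    multiplier: f32,
    required_frames: u32,
    streak: u32,
}

impl BargeInDetector {
    pub fn new(reference_rms: f32, multiplier: f32, required_frames: u32) -> Self {
        Self { reference_rms, multiplier, required_frames, streak: 0 }
    }

    /// Feed one mic frame captured during playback. Returns true once the
    /// required number of consecutive loud frames has been seen.
    pub fn push_frame(&mut self, frame: &[f32]) -> bool {
        if rms(frame) > self.reference_rms * self.multiplier {
            self.streak += 1;
        } else {
            self.streak = 0; // a quiet frame breaks the run
        }
        self.streak >= self.required_frames
    }
}
```

With `BargeInDetector::new(0.03, 1.3, 2)`, a single loud frame between quiet ones never triggers; only two loud frames in a row do. Requiring a run of frames trades a few milliseconds of latency for fewer false interruptions.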
Without GCP_PROJECT_ID, the proxy falls back to an in-memory device store, which loses all registered device tokens when the instance scales to zero. With GCP_PROJECT_ID set, it uses Firestore-backed storage and tokens persist across restarts.
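The selection rule can be sketched as a small pure function. The proxy's real types and language are not shown in this PR, so `DeviceStore` and `select_device_store` are hypothetical names, and treating an empty value as unset is an assumption.

```rust
/// Hypothetical store kinds; the real proxy's types are not shown in this PR.
#[derive(Debug, PartialEq)]
pub enum DeviceStore {
    /// Tokens live only in process memory and vanish when the instance stops.
    InMemory,
    /// Tokens persist in Firestore under the given project.
    Firestore { project_id: String },
}

/// Pick the store from the (optional) value of GCP_PROJECT_ID.
/// Assumption: an empty or whitespace-only value counts as unset.
pub fn select_device_store(gcp_project_id: Option<&str>) -> DeviceStore {
    match gcp_project_id.map(str::trim) {
        Some(id) if !id.is_empty() => DeviceStore::Firestore {
            project_id: id.to_string(),
        },
        _ => DeviceStore::InMemory,
    }
}
```

A caller would typically pass `std::env::var("GCP_PROJECT_ID").ok().as_deref()`, which keeps the function itself free of environment access and easy to test.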
Newer ADK versions require sessions to exist before run_async. InMemoryRunner was throwing SessionNotFoundError because it passed a random session_id that didn't exist. Switch to explicit Runner with InMemorySessionService and pre-create sessions before each agent invocation.
The daemon was exiting immediately after session.disconnect(), killing the processor before it could run end-of-session consolidation and memory agent ingest. Now waits up to 15s for the processor to finish, ensuring facts are extracted and synced to the cloud before the process exits.
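A bounded wait of this kind can be sketched with standard-library threads and a channel. This is an illustration only: the daemon's actual runtime and function names are not shown here, and `wait_for_processor` is a hypothetical helper.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `finish_work` on a background thread and waits at most `timeout`
/// for it to report completion. Returns true if it finished in time.
pub fn wait_for_processor<F>(finish_work: F, timeout: Duration) -> bool
where
    F: FnOnce() + Send + 'static,
{
    let (done_tx, done_rx) = mpsc::channel::<()>();
    thread::spawn(move || {
        finish_work();
        // Ignore a send error: the waiter may already have given up.
        let _ = done_tx.send(());
    });
    // Err covers both the timeout and a worker that panicked before sending.
    done_rx.recv_timeout(timeout).is_ok()
}
```

Calling it with `Duration::from_secs(15)` matches the limit described above: the process exits as soon as the work completes and never blocks longer than the timeout.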
Three fixes for end-of-session memory agent ingest never firing:
1. Add SIGTERM signal handler so the daemon catches terminate signals from the SwiftUI parent app (previously only caught SIGINT/Ctrl+C).
2. Call session.disconnect() BEFORE cancel.cancel(): the processor's select loop has a cancel.cancelled() branch that was firing first, causing it to skip the Disconnected event handler where consolidation and memory agent ingest run.
3. SwiftUI app waits up to 12s after SIGTERM for daemon to finish graceful shutdown before falling back to SIGINT.

Also tunes barge-in sensitivity: 1.3x→1.5x threshold multiplier, 2→3 consecutive frames. This fixes false interruptions on MacBook speakers.
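The ordering problem behind the second fix can be reproduced in a simplified, single-threaded model. This is a sketch under assumptions: the real processor is described as a select loop, whereas this stand-in polls a flag and a channel, and `SessionEvent`, `run_processor`, and `shutdown` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Receiver, TryRecvError};

#[derive(Debug)]
pub enum SessionEvent {
    Disconnected,
}

/// Simplified processor loop. Returns true if the end-of-session handler
/// ran, false if cancellation won or the channel closed.
pub fn run_processor(events: &Receiver<SessionEvent>, cancel: &AtomicBool) -> bool {
    loop {
        // Cancellation is checked first, mirroring a cancel branch that
        // fires before pending events are handled.
        if cancel.load(Ordering::SeqCst) {
            return false;
        }
        match events.try_recv() {
            // Consolidation and memory agent ingest would run here.
            Ok(SessionEvent::Disconnected) => return true,
            Err(TryRecvError::Empty) => std::thread::yield_now(),
            Err(TryRecvError::Disconnected) => return false,
        }
    }
}

/// Simulates shutdown in either order; returns whether consolidation ran.
pub fn shutdown(disconnect_first: bool) -> bool {
    let (tx, rx) = mpsc::channel();
    let cancel = AtomicBool::new(false);
    if disconnect_first {
        tx.send(SessionEvent::Disconnected).unwrap();
        let ran = run_processor(&rx, &cancel); // processor handles the event
        cancel.store(true, Ordering::SeqCst); // cancelling afterwards is safe
        ran
    } else {
        cancel.store(true, Ordering::SeqCst); // cancel wins the race
        tx.send(SessionEvent::Disconnected).unwrap();
        run_processor(&rx, &cancel)
    }
}
```

In this model `shutdown(true)` runs the handler and `shutdown(false)` skips it. In a concurrent daemon the disconnect-first order alone does not guarantee the handler finishes, which is why a bounded wait is still needed before the process exits.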
The memory_agent_handled flag was skipping Firestore sync under the assumption the memory agent wrote facts server-side. It doesn't — it only returns them. Now facts always sync to Firestore when config is present, regardless of whether the memory agent or local consolidation extracted them. SQLite saves still happen first as the local fallback.
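The resulting persistence rule (local save first, cloud sync whenever a cloud store is configured, no dependence on who extracted the facts) might look like the sketch below. `FactStore`, `RecordingStore`, and `persist_facts` are hypothetical names, and the in-memory store stands in for both SQLite and Firestore.

```rust
/// Minimal storage interface (hypothetical).
pub trait FactStore {
    fn save(&mut self, facts: &[String]);
}

/// In-memory stand-in used here in place of SQLite and Firestore.
#[derive(Default)]
pub struct RecordingStore {
    pub saved: Vec<String>,
}

impl FactStore for RecordingStore {
    fn save(&mut self, facts: &[String]) {
        self.saved.extend_from_slice(facts);
    }
}

/// Save facts locally first, then sync to the cloud store if one is
/// configured. There is deliberately no "memory agent handled it" check:
/// the sync decision depends only on whether cloud config is present.
/// Returns true if a cloud sync was performed.
pub fn persist_facts(
    facts: &[String],
    local: &mut dyn FactStore,
    cloud: Option<&mut dyn FactStore>,
) -> bool {
    local.save(facts); // local fallback always happens first
    match cloud {
        Some(store) => {
            store.save(facts);
            true
        }
        None => false,
    }
}
```

Keeping the extraction source out of the function signature makes it impossible for a flag like `memory_agent_handled` to suppress the sync again.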
Summary
Stabilizes the real-time audio pipeline, eliminates false barge-in triggers, reduces end-to-end latency, and rewrites all user-facing documentation for hackathon submission.
3acee7b - fix: use security CLI for keychain (no ACL prompts), harden VAD for barge-in

KeychainHelper rewrite (AuraApp/Sources/KeychainHelper.swift):
- /usr/bin/security CLI for all Keychain operations (save, read, delete).
- -A flag to allow any local application to read the item without triggering a password dialog.
- save(account:data:) and read(account:) wrappers preserved for compatibility.

VAD mode change (aura-voice/src/vad.rs):
- Quality mode to VeryAggressive mode: more tolerant of background noise, reduces false speech detection that was triggering unnecessary audio forwarding during playback.

42e968e - fix: harden audio pipeline, rewrite docs for hackathon submission

Audio capture (aura-voice/src/audio.rs)
Playback (aura-voice/src/playback.rs)
Barge-in & mic pipeline (aura-daemon/src/orchestrator.rs)
- PLAYBACK_THRESHOLD_MULTIPLIER from 1.8 to 1.3: the old value was too aggressive and swallowed real user speech. The new value filters speaker bleed (0.005-0.03 RMS) while passing direct speech (0.05-0.3 RMS).
- BARGE_IN_CONSECUTIVE_FRAMES from 3 to 2: with the smaller 512-sample chunks, 2 frames ≈ 20ms of sustained speech is sufficient confirmation.

Session management (aura-gemini/src/session.rs)
- SESSION_ROTATION_SECS = 600). Long-lived Gemini audio sessions exhibit gradual latency degradation; rotating the WebSocket with session resumption resets server-side state.
- DisconnectReason enum (Shutdown, GoAway, Rotate) replacing the previous bool return for cleaner control flow. GoAway clears the resumption handle (server is done with the session). Rotate keeps the handle for seamless resume. Shutdown is a clean exit.

Screen capture (aura-daemon/src/screen_capture.rs)
System prompt (aura-gemini/src/config.rs)
- run_javascript is now the preferred tool for all browser interactions (Safari/Chrome): faster and more reliable than coordinate-based clicking. Falls back to click(x, y) only for browser chrome or when JS fails.

Documentation (README.md, ARCHITECTURE.md)
- docs/ARCHITECTURE.md).

d869e61 - docs: add onboarding flow, keychain auth, and permission details to README

- identifier "com.aura.desktop") so TCC grants survive rebuilds without re-prompting.
- security CLI (not Security.framework) to avoid per-app ACL prompts between the SwiftUI app and Rust daemon, with background retry on registration failure.
- dev.sh vs bundle.sh in Build from source: dev.sh calls bundle.sh internally then installs and launches; bundle.sh builds without installing. Adds --dmg flag for distribution builds.

Test plan
- bash scripts/dev.sh
- run_javascript first
- dev.sh and confirm permissions are NOT re-prompted (stable DR)
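For reviewers, the DisconnectReason control flow described under Session management could be sketched roughly as follows. Only the enum variants and SESSION_ROTATION_SECS come from the description; `NextStep`, `next_step`, and `should_rotate` are hypothetical, and reconnecting after GoAway is an assumption (the description only says the handle is cleared).

```rust
/// Why a session ended (variant names taken from the PR description).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DisconnectReason {
    Shutdown,
    GoAway,
    Rotate,
}

/// Rotation interval from the PR description (10 minutes).
pub const SESSION_ROTATION_SECS: u64 = 600;

/// What the reconnect loop does next (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum NextStep {
    Exit,
    Reconnect { resumption_handle: Option<String> },
}

/// Map a disconnect reason and the current resumption handle to an action.
pub fn next_step(reason: DisconnectReason, handle: Option<String>) -> NextStep {
    match reason {
        // Clean exit: no reconnect.
        DisconnectReason::Shutdown => NextStep::Exit,
        // Server is done with the session: drop the handle.
        // Assumption: the client then starts a fresh session.
        DisconnectReason::GoAway => NextStep::Reconnect { resumption_handle: None },
        // Planned rotation: keep the handle so the session resumes.
        DisconnectReason::Rotate => NextStep::Reconnect { resumption_handle: handle },
    }
}

/// True once a session is old enough to be rotated.
pub fn should_rotate(session_age_secs: u64) -> bool {
    session_age_secs >= SESSION_ROTATION_SECS
}
```

Compared with a bool return, the three-variant enum forces every call site to decide explicitly what happens to the resumption handle.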