Audio translation everywhere and anywhere. Breaking the web's language barrier - translate the audio of any tab in real-time.

Interpreter — Live Speaker Translation

A Chrome extension + FastAPI backend that translates live audio from web chat apps (Google Meet, Zoom, Discord, etc.) in real time. Hear everything in your native language, or route translated audio into the call.

Architecture

```mermaid
flowchart LR
    subgraph ext ["Chrome Extension (MV3)"]
        A["Tab Audio\n(tabCapture)"] --> B["Offscreen Doc\n(PCM extract)"]
        B --> C["Service Worker\n(orchestrator)"]
        G["Translated Audio"] --> C
        C --> H["Offscreen Playback\n(selected output device)"]
        H --> I["BlackHole / Speakers"]
    end

    subgraph web ["Web Dashboard"]
        J["Sign In\n(Clerk Auth)"] --> K["Profile Setup\n(language, name)"]
        K --> L["Voice Warmup\n(record & clone)"]
    end

    C -- WebSocket --> D["FastAPI Backend"]
    D --> E["Speechmatics STT\n+ RT Translation"]
    E --> F["TTS Provider\n(MiniMax / Speechmatics)"]
    F --> G
    L -- "Voice profile" --> M["Convex DB"]
    D -- "Lookup voice profile" --> M
```

Audio routing flow:

| Route | What happens |
| --- | --- |
| Speakers (default) | You hear the translated audio locally |
| BlackHole → Meet | Translated audio is routed into the call as your "microphone" input |
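The "PCM extract" step in the offscreen document comes down to converting Web Audio's Float32 samples into 16-bit PCM before streaming them to the backend. A minimal sketch of that conversion (the exact frame format the backend expects is an assumption):

```python
import struct

def float32_to_pcm16(samples: list[float]) -> bytes:
    """Convert Float32 audio samples in [-1.0, 1.0] to 16-bit little-endian PCM."""
    out = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))                # clip out-of-range samples
        out += struct.pack("<h", int(s * 32767))  # scale to the int16 range
    return bytes(out)
```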

Quick Start

1. Backend

```shell
cd backend
cp .env.example .env    # Add your API keys
uv sync
uv run uvicorn main:app --reload --port 8000
```

2. Extension

```shell
cd extension
bun install
bun run dev
```

Then load in Chrome:

  1. Go to chrome://extensions
  2. Enable Developer Mode
  3. Click Load unpacked → select extension/dist folder
  4. Allow microphone permission when prompted (needed for device enumeration)

3. Web Dashboard (Voice Profile & Clone Setup)

```shell
cd web
npm install
npx convex dev          # Start Convex dev server (needs CONVEX_DEPLOYMENT in .env.local)
npm run dev             # Starts on http://localhost:5174
```

The web dashboard requires a .env.local file with:

```
VITE_CONVEX_URL=<your Convex deployment URL>
VITE_CLERK_PUBLISHABLE_KEY=<your Clerk publishable key>
VITE_SERVER_URL=http://localhost:8000
VITE_CONVEX_SITE_URL=<your Convex site URL>
CONVEX_DEPLOYMENT=<your Convex deployment>
```

4. Use It

  1. Open any web chat (Google Meet, YouTube, etc.)
  2. Click the Interpreter extension icon
  3. Pick source + target language
  4. Select an output device (speakers or BlackHole)
  5. Hit Start Translation
  6. Hear translated audio live 🎧

Notes:

  • Original tab audio passthrough is disabled in offscreen capture, so the extension should not play the original and the translated audio at the same time.
  • If output is set to BlackHole 2ch, local speakers are silent by design unless you monitor with a Multi-Output device.

Web Dashboard (Voice Profile & Cloning)

The web dashboard (web/) is a React app for managing user profiles and voice cloning. It lets users:

  • Sign in via Clerk authentication
  • Set a display name and preferred language
  • Record a voice sample and create a MiniMax voice clone

When voice cloning is enabled, the backend looks up the speaker's voice profile from Convex during translation so listeners hear translated speech rendered in the original speaker's cloned voice.

Stack: React 18, Vite, Convex (database + file storage), Clerk (auth)

How it connects:

  1. User signs in and creates a profile on the web dashboard.
  2. User records a voice sample; the app uploads it to Convex storage and sends it to MiniMax to create a voice clone (voiceProfileId).
  3. During a live call, the backend queries the Convex HTTP endpoint (GET /api/voice-profile?userId=...) to fetch the speaker's voiceProfileId.
  4. If a valid profile exists, TTS renders in the cloned voice; otherwise it falls back to a standard voice.

BlackHole Setup (Route Audio into Calls)

To let other meeting participants hear the translated audio:

Install BlackHole

```shell
brew install blackhole-2ch
```

Or download from existential.audio/blackhole.

Configure

  1. Extension: In the popup, select BlackHole 2ch as the Translation Output device
  2. Google Meet: Go to Meet Settings → Audio → set Microphone to BlackHole 2ch

Now when you start translation, the translated audio plays into BlackHole, which Meet picks up as your microphone input. Other participants hear the translation.

Tip: To hear the call yourself while routing audio into Meet, create a macOS Multi-Output Device in Audio MIDI Setup that combines your speakers + BlackHole.

API Keys Required

| Service | Credit | How to Get |
| --- | --- | --- |
| Speechmatics | $200 (code: VOICEAGENT200) | portal.speechmatics.com |
| MiniMax | $20 | minimax.io |

Latency Tuning

You can tune backend chunking and partial update behavior in backend/.env:

```
# Translation chunking
TRANSLATION_TRIGGER_CHAR_THRESHOLD=24

# Partial translated-text UI throttling
TRANSLATION_PARTIAL_MIN_DELTA_CHARS=12
TRANSLATION_PARTIAL_MIN_INTERVAL_MS=300

# Speechmatics finalization speed
SPEECHMATICS_MAX_DELAY=1.0
SPEECHMATICS_RT_WS_URL=wss://eu.rt.speechmatics.com/v2/

# Translation provider mode
USE_SPEECHMATICS_TRANSLATION=1
SPEECHMATICS_TRANSLATION_ENABLE_PARTIALS=1

# TTS provider mode
TTS_PROVIDER=speechmatics
# TTS_PROVIDER=minimax
# SPEECHMATICS_TTS_OUTPUT_FORMAT=wav_16000
# SPEECHMATICS_TTS_VOICE_ID=sarah
```

Guidance:

  • Keep USE_SPEECHMATICS_TRANSLATION=1 for the lowest end-to-end delay.
  • TTS_PROVIDER=speechmatics is now the default; switch to TTS_PROVIDER=minimax for broader multilingual voice coverage.
  • Lower TRANSLATION_TRIGGER_CHAR_THRESHOLD for faster responses; raise it for fewer, larger chunks.
  • Lower SPEECHMATICS_MAX_DELAY for faster final transcripts.
  • Raise the partial-update throttles if the live text rewrites itself too frequently.
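How these knobs interact can be sketched with two small checks (a simplified model of the chunking and throttling behavior, not the backend's actual code):

```python
TRANSLATION_TRIGGER_CHAR_THRESHOLD = 24
TRANSLATION_PARTIAL_MIN_DELTA_CHARS = 12
TRANSLATION_PARTIAL_MIN_INTERVAL_MS = 300

def should_translate(buffer: str) -> bool:
    """Emit a translation chunk once enough characters have accumulated."""
    return len(buffer) >= TRANSLATION_TRIGGER_CHAR_THRESHOLD

def should_emit_partial(prev_chars: int, new_chars: int, ms_since_last: float) -> bool:
    """Forward a partial UI update only if it grew enough and is spaced out enough."""
    return (new_chars - prev_chars >= TRANSLATION_PARTIAL_MIN_DELTA_CHARS
            and ms_since_last >= TRANSLATION_PARTIAL_MIN_INTERVAL_MS)
```

Lowering the thresholds makes both checks pass sooner (faster, choppier updates); raising them batches more text per chunk and per UI refresh.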

Tech Stack

  • Extension: React, TypeScript, Vite, CRXJS, Chrome MV3
  • Backend: Python, FastAPI, WebSocket, uv
  • Web Dashboard: React 18, TypeScript, Vite, Convex, Clerk
  • STT: Speechmatics Real-time API
  • Translation: Speechmatics RT Translation (recommended low-latency mode) or MiniMax M2 fallback
  • TTS: Speechmatics preview TTS (default) or MiniMax Speech 2.8 Turbo
  • Voice Cloning: MiniMax Voice Clone API (via web dashboard voice warmup flow)
  • Audio Routing: BlackHole (macOS virtual audio loopback)
