AI-Powered Voice Transcription for Windows
Live dictation, YouTube transcription, and file processing with LLM-powered summaries
Features • Screenshots • Quick Start • Usage • Configuration
Press a global hotkey (Ctrl+Alt+S by default) from anywhere on your system to instantly start recording. A sleek overlay appears with real-time audio visualization and transcription. Perfect for taking quick notes, writing emails, or dictating documents.
Paste any YouTube URL or search for videos directly within the app. Scriber downloads the audio and transcribes it with speaker diarization, making it ideal for podcasts, interviews, lectures, and video research.
Drag & drop audio or video files up to 2GB. Scriber automatically extracts audio from video formats (MP4, MOV, MKV, etc.) and transcribes them. Supports MP3, WAV, FLAC, M4A, and many more formats.
Generate summaries of your transcripts using Google Gemini or OpenAI GPT models. Customize the summarization prompt to get exactly the output format you need: bullet points, action items, or full prose.
Automatically identify and label different speakers in your transcripts with color-coded badges. Essential for meetings, interviews, and multi-person recordings.
Export your transcripts and summaries to PDF or DOCX with proper formatting. Markdown in summaries is rendered correctly, and speaker labels are preserved.
Quickly find any transcript with instant search across all your recordings. Each category (Live Mic, YouTube, Files) maintains its own searchable history.
Scriber runs silently in your system tray. Access recent recordings, view logs, or control the app with a right-click, without windows cluttering your desktop.
Instant voice-to-text with real-time audio visualization and recording history
Search YouTube or paste URLs to transcribe any video with speaker identification
Drag & drop audio/video files for automatic transcription
Full transcript view with AI summary, speaker labels, and export options
Configure transcription models, hotkeys, and API integrations
- Clone the repository
  git clone https://github.com/MyButtermilk/Scriber.git
  cd Scriber
- Run the launcher
  start.bat
This will automatically:
- Create a Python virtual environment
- Install all backend dependencies
- Install frontend dependencies (npm)
- Launch the application
- Access the Web UI
  The app opens automatically at http://localhost:5000. A tray icon appears for background control.
- Python 3.10+
- Node.js 18+
- FFmpeg (for video file processing)
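The prerequisites above can be checked before running the launcher. The following is a minimal sketch, not part of Scriber itself: the function name `missing_prerequisites` and the executable names `node`, `npm`, and `ffmpeg` are assumptions, and it only checks that the tools are on PATH, not that Node.js is version 18 or newer.

```python
import shutil
import sys

# Executable names are assumptions based on the prerequisites list.
REQUIRED_TOOLS = ("node", "npm", "ffmpeg")


def missing_prerequisites(min_python=(3, 10), tools=REQUIRED_TOOLS, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required, found {found}")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the defaults are enough.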
Press Ctrl+Alt+S (configurable) from anywhere to toggle recording. The live overlay shows:
- Real-time audio levels
- Interim transcription text
- Recording duration
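Because the hotkey is configurable, a spec string such as `ctrl+alt+s` (the format used by `SCRIBER_HOTKEY`) has to be split into modifiers and a key at some point. The sketch below illustrates one way to do that; `parse_hotkey` and the `MODIFIERS` set are hypothetical and do not describe how Scriber actually registers hotkeys.

```python
# Modifier names recognized by this sketch (an assumption, not Scriber's list).
MODIFIERS = {"ctrl", "alt", "shift", "win"}


def parse_hotkey(spec):
    """Split a spec such as 'ctrl+alt+s' into (sorted modifiers, key)."""
    parts = [p.strip().lower() for p in spec.split("+") if p.strip()]
    modifiers = sorted(p for p in parts if p in MODIFIERS)
    keys = [p for p in parts if p not in MODIFIERS]
    # A usable hotkey needs exactly one non-modifier key.
    if len(keys) != 1:
        raise ValueError(f"expected exactly one non-modifier key in {spec!r}")
    return modifiers, keys[0]
```

For example, `parse_hotkey("Ctrl+Alt+S")` yields the modifiers `alt` and `ctrl` with the key `s`, while a spec with no key or with two keys raises `ValueError`.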
| Tab | Purpose |
|---|---|
| Live Mic | View real-time transcription and recording history |
| YouTube | Paste URLs or search to transcribe videos |
| Files | Upload audio/video files for processing |
| Settings | Configure models, hotkeys, and API keys |
Right-click the tray icon to access:
- Recent Recordings: Click to copy transcript to clipboard
- View Logs: Debug issues with backend/frontend
- Open Web UI: Launch the browser interface
- Restart / Quit: Control the application
Scriber uses environment variables and a .env file for configuration. Key settings:
| Provider | Env Variable | Features |
|---|---|---|
| Soniox | SONIOX_API_KEY | Real-time streaming, speaker diarization |
| Deepgram | DEEPGRAM_API_KEY | Nova-2 model, fast processing |
| OpenAI | OPENAI_API_KEY | Whisper model |
| AssemblyAI | ASSEMBLYAI_API_KEY | Universal model |
| Azure | AZURE_SPEECH_KEY | Microsoft Speech Services |
| Gladia | GLADIA_API_KEY | Multi-language support |
| Speechmatics | SPEECHMATICS_API_KEY | Enterprise-grade accuracy |
| AWS | AWS_ACCESS_KEY_ID | Transcribe service |
| Provider | Env Variable |
|---|---|
| Google Gemini | GOOGLE_API_KEY |
| OpenAI | OPENAI_API_KEY |
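To see which providers from the tables above are usable, it can help to check which of their environment variables are actually set. This is a small sketch under stated assumptions: the lowercase provider identifiers and the `configured` helper are illustrative, and only the variable names come from the tables.

```python
import os

# Provider identifiers are hypothetical; the variable names come from the tables.
STT_KEYS = {
    "soniox": "SONIOX_API_KEY",
    "deepgram": "DEEPGRAM_API_KEY",
    "openai": "OPENAI_API_KEY",
    "assemblyai": "ASSEMBLYAI_API_KEY",
    "azure": "AZURE_SPEECH_KEY",
    "gladia": "GLADIA_API_KEY",
    "speechmatics": "SPEECHMATICS_API_KEY",
    "aws": "AWS_ACCESS_KEY_ID",
}
SUMMARY_KEYS = {"gemini": "GOOGLE_API_KEY", "openai": "OPENAI_API_KEY"}


def configured(providers, env=None):
    """Return the providers whose environment variable is set and non-blank."""
    env = os.environ if env is None else env
    return [name for name, var in providers.items() if env.get(var, "").strip()]


if __name__ == "__main__":
    print("STT providers:", configured(STT_KEYS))
    print("Summarization providers:", configured(SUMMARY_KEYS))
```

Note that `OPENAI_API_KEY` appears in both tables, so a single key enables OpenAI for both transcription and summarization in this sketch.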
# Recording
SCRIBER_HOTKEY=ctrl+alt+s
SCRIBER_DEFAULT_STT=soniox
SCRIBER_MIC_DEVICE=default
# Summarization
SCRIBER_AUTO_SUMMARIZE=0
SCRIBER_SUMMARIZATION_MODEL=gemini-2.0-flash
# YouTube
YOUTUBE_API_KEY=your_key_here
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   System Tray   │────▶│  Python Backend │◀────│  React Frontend │
│   (tray.py)     │     │  (web_api.py)   │     │  (Vite + React) │
└─────────────────┘     └─────────────────┘     └─────────────────┘
         │                       │                       │
         │                       ▼                       │
         │              ┌─────────────────┐              │
         │              │   SQLite DB     │              │
         │              │  (transcripts)  │              │
         │              └─────────────────┘              │
         │                       │                       │
         ▼                       ▼                       ▼
   Global Hotkeys          STT Pipeline              WebSocket
   Overlay Window      (Multiple Providers)     (Real-time Updates)
Key Components:
- src/tray.py: Entry point, manages process lifecycle
- src/web_api.py: aiohttp server with REST API + WebSocket
- src/pipeline.py: STT provider abstraction
- src/export.py: PDF/DOCX generation
- Frontend/: React 19 + Vite + Tailwind CSS
The live microphone recording follows this state flow:
stateDiagram-v2
[*] --> Idle
Idle --> Preparing : Hotkey pressed
Preparing --> Recording : Mic ready
Preparing --> Idle : Error (mic unavailable)
Recording --> Transcribing : Hotkey pressed (stop)
Recording --> Recording : Audio captured
Transcribing --> Completed : STT finished
Transcribing --> Failed : STT error
Completed --> [*]
Failed --> [*]
note right of Preparing
Shows "Preparing..." overlay
Initializes microphone
end note
note right of Recording
Shows audio visualization
Real-time transcription
end note
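The state flow above maps naturally onto a transition table. The following sketch encodes the diagram's edges directly; the event names (`hotkey`, `mic_ready`, and so on) are hypothetical labels chosen for illustration, not identifiers from Scriber's code.

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    PREPARING = "Preparing"
    RECORDING = "Recording"
    TRANSCRIBING = "Transcribing"
    COMPLETED = "Completed"
    FAILED = "Failed"


# (state, event) -> next state, one entry per edge in the diagram.
TRANSITIONS = {
    (State.IDLE, "hotkey"): State.PREPARING,
    (State.PREPARING, "mic_ready"): State.RECORDING,
    (State.PREPARING, "mic_error"): State.IDLE,
    (State.RECORDING, "audio"): State.RECORDING,
    (State.RECORDING, "hotkey"): State.TRANSCRIBING,
    (State.TRANSCRIBING, "stt_done"): State.COMPLETED,
    (State.TRANSCRIBING, "stt_error"): State.FAILED,
}


def step(state, event):
    """Return the next state, or raise ValueError for an edge the diagram lacks."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state.value}") from None


def run(events, state=State.IDLE):
    """Apply a sequence of events starting from the given state."""
    for event in events:
        state = step(state, event)
    return state
```

Keeping the edges in one dictionary means that an unexpected event, such as audio arriving while idle, fails loudly instead of silently changing state.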
Background transcription jobs (YouTube and File uploads) follow this workflow:
stateDiagram-v2
[*] --> Queued
Queued --> Downloading : Start processing
Downloading --> Extracting : Download complete (video)
Downloading --> Transcribing : Download complete (audio)
Downloading --> Failed : Download error
Downloading --> Stopped : User cancelled
Extracting --> Transcribing : Audio extracted
Extracting --> Failed : FFmpeg error
Transcribing --> Summarizing : Transcription complete (auto-summarize on)
Transcribing --> Completed : Transcription complete
Transcribing --> Failed : STT error
Transcribing --> Stopped : User cancelled
Summarizing --> Completed : Summary generated
Summarizing --> Completed : Summary failed (transcript still saved)
Completed --> [*]
Failed --> [*]
Stopped --> [*]
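The job workflow above branches on two things: whether the download is a video (which adds an extraction stage) and whether auto-summarize is on. The sketch below walks those stages for one job; `planned_stages`, `run_job`, and the `results` mapping are hypothetical names, and only the stage names and edges are taken from the diagram.

```python
# Stages where the diagram shows an error edge to Failed or a cancel edge to Stopped.
CAN_FAIL = {"Downloading", "Extracting", "Transcribing"}
CANCELLABLE = {"Downloading", "Transcribing"}


def planned_stages(is_video, auto_summarize):
    """Return the happy-path stages for one job."""
    stages = ["Queued", "Downloading"]
    if is_video:
        stages.append("Extracting")
    stages.append("Transcribing")
    if auto_summarize:
        stages.append("Summarizing")
    stages.append("Completed")
    return stages


def run_job(is_video, auto_summarize, results=None):
    """Walk the stages; results maps a stage name to 'error' or 'cancelled'."""
    results = results or {}
    visited = []
    for stage in planned_stages(is_video, auto_summarize):
        visited.append(stage)
        result = results.get(stage, "ok")
        if result == "ok":
            continue
        if result == "error" and stage in CAN_FAIL:
            return visited + ["Failed"]
        if result == "cancelled" and stage in CANCELLABLE:
            return visited + ["Stopped"]
        if result == "error" and stage == "Summarizing":
            continue  # Summary failed, but the transcript is still saved.
        raise ValueError(f"{result!r} is not a modelled outcome for stage {stage}")
    return visited
```

A failed summary deliberately does not fail the job here, mirroring the "Summary failed (transcript still saved)" edge.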
Each transcript record transitions through these statuses:
stateDiagram-v2
[*] --> recording : Live mic started
[*] --> processing : YouTube/File queued
recording --> completed : Stop + finalize
recording --> failed : Pipeline error
processing --> completed : Transcription done
processing --> failed : Error occurred
processing --> stopped : User cancelled
completed --> completed : Summary added
completed --> completed : Export generated
note right of completed
Content persisted to SQLite
Available for search/export
end note
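Since the statuses above are persisted to SQLite, the set of allowed values can be enforced at the database level. The schema below is purely illustrative: the table layout, column names, and helper functions are assumptions, and the real `transcripts.db` schema may differ.

```python
import sqlite3

# Hypothetical schema; only the status values come from the diagram.
SCHEMA = """
CREATE TABLE IF NOT EXISTS transcripts (
    id      INTEGER PRIMARY KEY,
    source  TEXT NOT NULL,
    status  TEXT NOT NULL CHECK (status IN
            ('recording', 'processing', 'completed', 'failed', 'stopped')),
    content TEXT NOT NULL DEFAULT '',
    summary TEXT
);
"""


def open_db(path=":memory:"):
    """Open a database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def create(conn, source):
    """Insert a record: live mic starts as 'recording', other sources as 'processing'."""
    status = "recording" if source == "live" else "processing"
    cur = conn.execute(
        "INSERT INTO transcripts (source, status) VALUES (?, ?)", (source, status)
    )
    conn.commit()
    return cur.lastrowid


def finish(conn, transcript_id, status, content=""):
    """Move a record to its final status; an unknown status violates the CHECK."""
    conn.execute(
        "UPDATE transcripts SET status = ?, content = ? WHERE id = ?",
        (status, content, transcript_id),
    )
    conn.commit()


def status_of(conn, transcript_id):
    """Return the current status string for one record."""
    row = conn.execute(
        "SELECT status FROM transcripts WHERE id = ?", (transcript_id,)
    ).fetchone()
    return row[0]
```

With the `CHECK` constraint in place, writing a status outside the five listed values raises `sqlite3.IntegrityError` rather than storing a value the UI cannot interpret.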
Real-time communication between backend and frontend:
sequenceDiagram
participant F as Frontend
participant B as Backend
participant S as STT Service
F->>B: Connect WebSocket
B->>F: state (current status)
Note over F,B: User starts recording
F->>B: toggle (via hotkey)
B->>F: session_started
loop Recording
B->>S: Audio stream
S->>B: Transcription (interim)
B->>F: transcript (text, isFinal=false)
S->>B: Transcription (final)
B->>F: transcript (text, isFinal=true)
end
Note over F,B: User stops recording
F->>B: toggle (via hotkey)
B->>F: transcribing
B->>F: session_finished
B->>F: history_updated
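On the receiving side, the message sequence above can be handled by folding each message into a small view state. This sketch is written in Python for consistency with the backend, although the real frontend is React. The JSON envelope with a `type` field, the `status` field, and the view-state keys are assumptions; only the message names and the `text` and `isFinal` fields come from the diagram.

```python
import json


def apply_message(view, raw):
    """Fold one JSON message from the backend into a copy of the view state."""
    msg = json.loads(raw)
    kind = msg.get("type")  # Assumed envelope field.
    view = dict(view)
    if kind == "state":
        view["status"] = msg.get("status", view.get("status"))
    elif kind == "session_started":
        view.update(status="recording", final_text="", interim_text="")
    elif kind == "transcript":
        if msg.get("isFinal"):
            # Final segments are appended; the interim preview is cleared.
            view["final_text"] = (view.get("final_text", "") + " " + msg["text"]).strip()
            view["interim_text"] = ""
        else:
            # Interim text replaces the previous preview instead of accumulating.
            view["interim_text"] = msg["text"]
    elif kind == "transcribing":
        view["status"] = "transcribing"
    elif kind == "session_finished":
        view["status"] = "idle"
    elif kind == "history_updated":
        view["history_stale"] = True  # Signal that the history list should be refetched.
    return view
```

Treating interim and final transcripts differently is the key design point: interim results overwrite each other, while final results accumulate into the transcript.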
Scriber/
├── src/
│   ├── tray.py              # System tray & process manager
│   ├── web_api.py           # Backend API server
│   ├── pipeline.py          # STT provider orchestration
│   ├── database.py          # SQLite persistence
│   ├── export.py            # PDF/DOCX export
│   ├── overlay.py           # Recording overlay window
│   └── config.py            # Configuration loader
├── Frontend/
│   └── client/
│       └── src/
│           ├── pages/       # React page components
│           ├── components/  # Reusable UI components
│           └── hooks/       # Custom React hooks
├── docs/
│   └── screenshots/         # App screenshots
├── start.bat                # Windows launcher
├── requirements.txt         # Python dependencies
└── transcripts.db           # Local database (auto-created)
| Issue | Solution |
|---|---|
| App doesn't start | Run python -m src.tray manually to see errors |
| No audio input | Check microphone selection in Settings |
| STT fails | Verify API key in Settings → API Configuration |
| Export fails | Install: pip install python-docx reportlab lxml |
| YouTube fails | Ensure YouTube API key is set in Settings |
MIT License - see LICENSE for details.
Made with ❤️ for efficient voice-to-text workflows




