Scriber

AI-Powered Voice Transcription for Windows
Live dictation, YouTube transcription, and file processing with LLM-powered summaries

Features • Screenshots • Quick Start • Usage • Configuration

Features

🎤 Live Dictation

Press a global hotkey (Ctrl+Alt+S by default) from anywhere on your system to instantly start recording. A sleek overlay appears with real-time audio visualization and transcription. Perfect for taking quick notes, writing emails, or dictating documents.

📺 YouTube Transcription

Paste any YouTube URL or search for videos directly within the app. Scriber downloads the audio and transcribes it with speaker diarization, making it ideal for podcasts, interviews, lectures, and video research.

📁 File Upload

Drag and drop audio or video files up to 2 GB. Supported audio formats include MP3, WAV, FLAC, M4A, and many more. For video formats (MP4, MOV, MKV, etc.), Scriber extracts the audio automatically before transcribing.

🤖 AI Summarization

Generate intelligent summaries of your transcripts using Google Gemini or OpenAI GPT models. Customize the summarization prompt to get exactly the output format you need—bullet points, action items, or full prose.

👥 Speaker Diarization

Automatically identify and label different speakers in your transcripts with color-coded badges. Essential for meetings, interviews, and multi-person recordings.

📤 Export Options

Export your transcripts and summaries to PDF or DOCX with proper formatting. Markdown in summaries is rendered correctly, and speaker labels are preserved.

🔍 Search & Filter

Quickly find any transcript with instant search across all your recordings. Each category (Live Mic, YouTube, Files) maintains its own searchable history.

🔔 System Tray Integration

Scriber runs silently in your system tray. Access recent recordings, view logs, or control the app with a right-click—no windows cluttering your desktop.

Screenshots

Live Mic Recording

Instant voice-to-text with real-time audio visualization and recording history

YouTube Transcription

Search YouTube or paste URLs to transcribe any video with speaker identification

File Upload

Drag & drop audio/video files for automatic transcription

Transcript Detail

Full transcript view with AI summary, speaker labels, and export options

Settings

Configure transcription models, hotkeys, and API integrations

Quick Start

Windows

1. Clone the repository

```
git clone https://github.com/MyButtermilk/Scriber.git
cd Scriber
```

2. Run the launcher

```
start.bat
```

This will automatically:
- Create a Python virtual environment
- Install all backend dependencies
- Install frontend dependencies (npm)
- Launch the application

3. Access the Web UI

The app opens automatically at http://localhost:5000. A tray icon appears for background control.

Requirements

- Python 3.10+
- Node.js 18+
- FFmpeg (for video file processing)
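
If the launcher fails early, a quick preflight check can show which requirement is missing. The following is a minimal sketch, not part of Scriber: the function name `missing_requirements` and the executable names `node` and `ffmpeg` are assumptions, and it only checks that the tools are on `PATH`, not that Node.js is version 18 or newer.

```python
import shutil
import sys

# Minimum Python version taken from the requirements list above.
MIN_PYTHON = (3, 10)
# Executable names are assumptions about how the tools appear on PATH.
REQUIRED_TOOLS = ("node", "ffmpeg")


def missing_requirements(python_version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    version = tuple(python_version or sys.version_info[:2])
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version[0]}.{version[1]}"
        )
    for tool in REQUIRED_TOOLS:
        # shutil.which returns None when the executable is not on PATH.
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Calling `missing_requirements()` with no arguments inspects the running interpreter and the real `PATH`; the optional parameters exist only so the check can be exercised without touching the system.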

Usage

Global Hotkey

Press Ctrl+Alt+S (configurable) from anywhere to toggle recording. The live overlay shows:

- Real-time audio levels
- Interim transcription text
- Recording duration

Web Interface

| Tab | Purpose |
| --- | --- |
| Live Mic | View real-time transcription and recording history |
| YouTube | Paste URLs or search to transcribe videos |
| Files | Upload audio/video files for processing |
| Settings | Configure models, hotkeys, and API keys |

System Tray

Right-click the tray icon to open a menu with these entries:

- Recent Recordings: Click an entry to copy its transcript to the clipboard
- View Logs: Debug issues with the backend or frontend
- Open Web UI: Launch the browser interface
- Restart / Quit: Control the application

Configuration

Scriber is configured through environment variables, which can be set in a `.env` file. Key settings:

Speech-to-Text Providers

| Provider | Env Variable | Features |
| --- | --- | --- |
| Soniox | `SONIOX_API_KEY` | Real-time streaming, speaker diarization |
| Deepgram | `DEEPGRAM_API_KEY` | Nova-2 model, fast processing |
| OpenAI | `OPENAI_API_KEY` | Whisper model |
| AssemblyAI | `ASSEMBLYAI_API_KEY` | Universal model |
| Azure | `AZURE_SPEECH_KEY` | Microsoft Speech Services |
| Gladia | `GLADIA_API_KEY` | Multi-language support |
| Speechmatics | `SPEECHMATICS_API_KEY` | Enterprise-grade accuracy |
| AWS | `AWS_ACCESS_KEY_ID` | Transcribe service |
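
To see which providers are usable in a given environment, the table can be turned into a lookup. This is a minimal sketch under stated assumptions: the lowercase provider identifiers (other than `soniox`, which appears in the app settings) and the function name `configured_providers` are hypothetical, and Scriber's own provider selection may work differently.

```python
import os

# Provider id -> environment variable, copied from the table above.
# The lowercase ids are illustrative and may not match Scriber's internal names.
STT_ENV_VARS = {
    "soniox": "SONIOX_API_KEY",
    "deepgram": "DEEPGRAM_API_KEY",
    "openai": "OPENAI_API_KEY",
    "assemblyai": "ASSEMBLYAI_API_KEY",
    "azure": "AZURE_SPEECH_KEY",
    "gladia": "GLADIA_API_KEY",
    "speechmatics": "SPEECHMATICS_API_KEY",
    "aws": "AWS_ACCESS_KEY_ID",
}


def configured_providers(env=None):
    """Return the providers whose key variable is set to a non-empty value."""
    env = os.environ if env is None else env
    return [
        name
        for name, var in STT_ENV_VARS.items()
        if env.get(var, "").strip()  # treat blank values as unset
    ]
```

Passing a plain dictionary instead of relying on `os.environ` makes it easy to check a `.env` file's contents before starting the app.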

AI Summarization

| Provider | Env Variable |
| --- | --- |
| Google Gemini | `GOOGLE_API_KEY` |
| OpenAI | `OPENAI_API_KEY` |

App Settings

```
# Recording
SCRIBER_HOTKEY=ctrl+alt+s
SCRIBER_DEFAULT_STT=soniox
SCRIBER_MIC_DEVICE=default

# Summarization
SCRIBER_AUTO_SUMMARIZE=0
SCRIBER_SUMMARIZATION_MODEL=gemini-2.0-flash

# YouTube
YOUTUBE_API_KEY=your_key_here
```
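
For readers who want to see how such settings might be consumed, here is a minimal sketch of a typed loader. It is not Scriber's actual `config.py`: the `AppSettings` class and `load_settings` function are hypothetical, the fallback values simply mirror the example above, and treating `1` as "enabled" for `SCRIBER_AUTO_SUMMARIZE` is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AppSettings:
    hotkey: str
    default_stt: str
    mic_device: str
    auto_summarize: bool
    summarization_model: str


def load_settings(env=None):
    """Build AppSettings from a mapping, falling back to the values shown above."""
    env = os.environ if env is None else env
    return AppSettings(
        hotkey=env.get("SCRIBER_HOTKEY", "ctrl+alt+s"),
        default_stt=env.get("SCRIBER_DEFAULT_STT", "soniox"),
        mic_device=env.get("SCRIBER_MIC_DEVICE", "default"),
        # Assumption: "1" enables the flag and any other value disables it.
        auto_summarize=env.get("SCRIBER_AUTO_SUMMARIZE", "0") == "1",
        summarization_model=env.get(
            "SCRIBER_SUMMARIZATION_MODEL", "gemini-2.0-flash"
        ),
    )
```

Keeping the parsing in one place means the string-to-boolean rule is defined once rather than repeated wherever the flag is read.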

🏗️ Architecture

```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   System Tray   │────▶│  Python Backend │◀────│  React Frontend │
│   (tray.py)     │     │  (web_api.py)   │     │  (Vite + React) │
└─────────────────┘     └─────────────────┘     └─────────────────┘
        │                       │                       │
        │                       ▼                       │
        │               ┌─────────────────┐             │
        │               │  SQLite DB      │             │
        │               │  (transcripts)  │             │
        │               └─────────────────┘             │
        │                       │                       │
        ▼                       ▼                       ▼
   Global Hotkeys      STT Pipeline             WebSocket
   Overlay Window      (Multiple Providers)    (Real-time Updates)
```

Key Components:

- `src/tray.py`: Entry point, manages process lifecycle
- `src/web_api.py`: aiohttp server with REST API + WebSocket
- `src/pipeline.py`: STT provider abstraction
- `src/export.py`: PDF/DOCX generation
- `Frontend/`: React 19 + Vite + Tailwind CSS

📊 State Machine Diagrams

Live Recording State Machine

The live microphone recording follows this state flow:

```mermaid
stateDiagram-v2
    [*] --> Idle

    Idle --> Preparing : Hotkey pressed
    Preparing --> Recording : Mic ready
    Preparing --> Idle : Error (mic unavailable)

    Recording --> Transcribing : Hotkey pressed (stop)
    Recording --> Recording : Audio captured

    Transcribing --> Completed : STT finished
    Transcribing --> Failed : STT error

    Completed --> [*]
    Failed --> [*]

    note right of Preparing
        Shows "Preparing..." overlay
        Initializes microphone
    end note

    note right of Recording
        Shows audio visualization
        Real-time transcription
    end note
```
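
The live recording flow above can be expressed as a small transition table. This is an illustrative sketch rather than Scriber's implementation: the lowercase state names and the event labels (`hotkey`, `mic_ready`, and so on) are hypothetical stand-ins for the diagram's edge captions.

```python
# (state, event) -> next state, transcribed from the diagram above.
# Event names are hypothetical labels for the diagram's edge captions.
LIVE_TRANSITIONS = {
    ("idle", "hotkey"): "preparing",
    ("preparing", "mic_ready"): "recording",
    ("preparing", "mic_error"): "idle",
    ("recording", "audio"): "recording",
    ("recording", "hotkey"): "transcribing",
    ("transcribing", "stt_finished"): "completed",
    ("transcribing", "stt_error"): "failed",
}


def step(state, event):
    """Return the next state, or raise ValueError for an edge not in the diagram."""
    try:
        return LIVE_TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None


def run(events, state="idle"):
    """Feed a sequence of events through the table and return the final state."""
    for event in events:
        state = step(state, event)
    return state
```

Note that the same `hotkey` event means "start" in `idle` and "stop" in `recording`, which is why the lookup key includes the current state.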

YouTube/File Transcription State Machine

Background transcription jobs (YouTube and File uploads) follow this workflow:

```mermaid
stateDiagram-v2
    [*] --> Queued

    Queued --> Downloading : Start processing

    Downloading --> Extracting : Download complete (video)
    Downloading --> Transcribing : Download complete (audio)
    Downloading --> Failed : Download error
    Downloading --> Stopped : User cancelled

    Extracting --> Transcribing : Audio extracted
    Extracting --> Failed : FFmpeg error

    Transcribing --> Summarizing : Transcription complete (auto-summarize on)
    Transcribing --> Completed : Transcription complete
    Transcribing --> Failed : STT error
    Transcribing --> Stopped : User cancelled

    Summarizing --> Completed : Summary generated
    Summarizing --> Completed : Summary failed (transcript still saved)

    Completed --> [*]
    Failed --> [*]
    Stopped --> [*]
```
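
Two edges in this workflow depend on a condition rather than only on the outcome of the step: whether the download is a video, and whether auto-summarize is on. A minimal sketch of that branching follows; the function name, the `ok`/`error`/`cancelled` outcome labels, and the two keyword flags are all illustrative and not taken from the Scriber codebase.

```python
def next_job_state(state, outcome, *, is_video=False, auto_summarize=False):
    """Return the next job state for a (state, outcome) pair from the diagram.

    `outcome` is one of "ok", "error", or "cancelled" (hypothetical labels).
    """
    table = {
        ("queued", "ok"): "downloading",
        # Video downloads need an audio extraction step first.
        ("downloading", "ok"): "extracting" if is_video else "transcribing",
        ("downloading", "error"): "failed",
        ("downloading", "cancelled"): "stopped",
        ("extracting", "ok"): "transcribing",
        ("extracting", "error"): "failed",
        # Summarization only runs when auto-summarize is enabled.
        ("transcribing", "ok"): "summarizing" if auto_summarize else "completed",
        ("transcribing", "error"): "failed",
        ("transcribing", "cancelled"): "stopped",
        ("summarizing", "ok"): "completed",
        # A failed summary still completes the job; the transcript is kept.
        ("summarizing", "error"): "completed",
    }
    try:
        return table[(state, outcome)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {outcome!r}") from None
```

For example, `next_job_state("downloading", "ok", is_video=True)` yields `"extracting"`, while the same call without the flag goes straight to `"transcribing"`.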

Transcript Status Lifecycle

Each transcript record transitions through these statuses:

```mermaid
stateDiagram-v2
    [*] --> recording : Live mic started
    [*] --> processing : YouTube/File queued

    recording --> completed : Stop + finalize
    recording --> failed : Pipeline error

    processing --> completed : Transcription done
    processing --> failed : Error occurred
    processing --> stopped : User cancelled

    completed --> completed : Summary added
    completed --> completed : Export generated

    note right of completed
        Content persisted to SQLite
        Available for search/export
    end note
```

WebSocket Message Flow

Real-time communication between backend and frontend:

```mermaid
sequenceDiagram
    participant F as Frontend
    participant B as Backend
    participant S as STT Service

    F->>B: Connect WebSocket
    B->>F: state (current status)

    Note over F,B: User starts recording
    F->>B: toggle (via hotkey)
    B->>F: session_started

    loop Recording
        B->>S: Audio stream
        S->>B: Transcription (interim)
        B->>F: transcript (text, isFinal=false)
        S->>B: Transcription (final)
        B->>F: transcript (text, isFinal=true)
    end

    Note over F,B: User stops recording
    F->>B: toggle (via hotkey)
    B->>F: transcribing
    B->>F: session_finished
    B->>F: history_updated
```
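
The interim/final distinction in the `transcript` messages determines how a client assembles text: interim results are provisional and get overwritten, while final results are appended. The sketch below illustrates that rule. It assumes messages are JSON objects with a `type` field, which the diagram does not confirm; only the names `text`, `isFinal`, `session_started`, `transcript`, and `session_finished` come from the diagram, and `apply_message` is a hypothetical helper.

```python
import json


def apply_message(view, raw):
    """Update a simple view-state dict from one JSON-encoded backend message."""
    msg = json.loads(raw)
    kind = msg.get("type")  # assumption: messages carry a "type" field
    if kind == "session_started":
        view.update(recording=True, final_text=[], interim="")
    elif kind == "transcript":
        if msg.get("isFinal"):
            # Final segments are appended and clear the pending interim text.
            view.setdefault("final_text", []).append(msg["text"])
            view["interim"] = ""
        else:
            # Interim text is provisional and overwritten by the next update.
            view["interim"] = msg["text"]
    elif kind == "session_finished":
        view["recording"] = False
    return view
```

A client would call this once per incoming WebSocket frame and render the joined `final_text` followed by `interim`.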

📦 Project Structure

```
Scriber/
├── src/
│   ├── tray.py           # System tray & process manager
│   ├── web_api.py        # Backend API server
│   ├── pipeline.py       # STT provider orchestration
│   ├── database.py       # SQLite persistence
│   ├── export.py         # PDF/DOCX export
│   ├── overlay.py        # Recording overlay window
│   └── config.py         # Configuration loader
├── Frontend/
│   └── client/
│       └── src/
│           ├── pages/    # React page components
│           ├── components/ # Reusable UI components
│           └── hooks/    # Custom React hooks
├── docs/
│   └── screenshots/      # App screenshots
├── start.bat             # Windows launcher
├── requirements.txt      # Python dependencies
└── transcripts.db        # Local database (auto-created)
```

🔧 Troubleshooting

| Issue | Solution |
| --- | --- |
| App doesn't start | Run `python -m src.tray` manually to see errors |
| No audio input | Check microphone selection in Settings |
| STT fails | Verify API key in Settings → API Configuration |
| Export fails | Install: `pip install python-docx reportlab lxml` |
| YouTube fails | Ensure YouTube API key is set in Settings |

📄 License

MIT License - see LICENSE for details.

Made with ❤️ for efficient voice-to-text workflows
