
MyButtermilk/Scriber



Scriber

AI-Powered Voice Transcription for Windows
Live dictation, YouTube transcription, and file processing with LLM-powered summaries

Features • Screenshots • Quick Start • Usage • Configuration


Features

🎀 Live Dictation

Press a global hotkey (Ctrl+Alt+S by default) from anywhere on your system to instantly start recording. A sleek overlay appears with real-time audio visualization and transcription. Perfect for taking quick notes, writing emails, or dictating documents.

📺 YouTube Transcription

Paste any YouTube URL or search for videos directly within the app. Scriber downloads the audio and transcribes it with speaker diarization, making it ideal for podcasts, interviews, lectures, and video research.
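How Scriber normalizes pasted links is not documented here. As a minimal sketch, a hypothetical helper `extract_video_id` could pull the video ID out of the common YouTube URL shapes using only the standard library:

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "music.youtube.com"}

def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube URL, or None (hypothetical helper)."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment: https://youtu.be/<id>
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None
    if host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            # Standard links carry the ID in the "v" query parameter.
            return parse_qs(parsed.query).get("v", [None])[0]
        for prefix in ("/shorts/", "/embed/", "/live/"):
            if parsed.path.startswith(prefix):
                video_id = parsed.path[len(prefix):].split("/")[0]
                return video_id or None
    return None
```

Extra query parameters such as timestamps (`&t=42`) are ignored, so the same video always maps to the same ID.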

πŸ“ File Upload

Drag & drop audio or video files up to 2GB. Scriber automatically extracts audio from video formats (MP4, MOV, MKV, etc.) and transcribes them. Supports MP3, WAV, FLAC, M4A, and many more formats.
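Audio extraction relies on FFmpeg (listed under Requirements), but the exact command Scriber runs is not shown in this README. The sketch below assumes a mono 16 kHz, 16-bit WAV target, which many STT services accept, and only builds the argument list so it can be passed to `subprocess.run`. The size check mirrors the documented 2 GB limit, interpreted here as 2 × 1024³ bytes (an assumption); the suffix set is illustrative, not the app's real list.

```python
from pathlib import Path
from typing import List

MAX_UPLOAD_BYTES = 2 * 1024 ** 3  # documented 2 GB limit; binary gigabytes are an assumption
VIDEO_SUFFIXES = {".mp4", ".mov", ".mkv", ".avi", ".webm"}  # illustrative, not exhaustive

def needs_extraction(path: Path) -> bool:
    """Video containers need their audio track pulled out before transcription."""
    return path.suffix.lower() in VIDEO_SUFFIXES

def build_ffmpeg_args(src: Path, dst: Path, sample_rate: int = 16000) -> List[str]:
    """Build an FFmpeg command that drops the video stream and writes mono PCM WAV."""
    return [
        "ffmpeg", "-y",           # overwrite dst if it already exists
        "-i", str(src),
        "-vn",                    # discard the video stream
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample to the target rate
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM
        str(dst),
    ]

def check_size(size_bytes: int) -> None:
    """Reject files above the upload limit before doing any work."""
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError(f"file is {size_bytes} bytes; limit is {MAX_UPLOAD_BYTES}")
```

Usage would be along the lines of `subprocess.run(build_ffmpeg_args(src, dst), check=True)`; building the list separately keeps the command testable without FFmpeg installed.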

🤖 AI Summarization

Generate intelligent summaries of your transcripts using Google Gemini or OpenAI GPT models. Customize the summarization prompt to get exactly the output format you need: bullet points, action items, or full prose.

👥 Speaker Diarization

Automatically identify and label different speakers in your transcripts with color-coded badges. Essential for meetings, interviews, and multi-person recordings.
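STT providers typically return diarized results as words or tokens tagged with a speaker number. Scriber's internal format is not documented here; assuming a hypothetical list of `(speaker, text)` tokens, consecutive tokens from the same speaker can be merged into labeled segments like this:

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

@dataclass
class Segment:
    speaker: int
    text: str

def group_by_speaker(tokens: Iterable[Tuple[int, str]]) -> List[Segment]:
    """Merge consecutive tokens that share a speaker into one segment."""
    segments: List[Segment] = []
    for speaker, word in tokens:
        if segments and segments[-1].speaker == speaker:
            segments[-1].text += " " + word  # same speaker: extend the open segment
        else:
            segments.append(Segment(speaker, word))  # speaker change: start a new one
    return segments

def format_segments(segments: List[Segment]) -> str:
    """Render one labeled line per segment."""
    return "\n".join(f"Speaker {s.speaker}: {s.text}" for s in segments)
```

The words are joined with single spaces, which assumes the provider returns bare words; providers that include their own leading spaces would need a plain concatenation instead.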

📤 Export Options

Export your transcripts and summaries to PDF or DOCX with proper formatting. Markdown in summaries is rendered correctly, and speaker labels are preserved.

🔍 Search & Filter

Quickly find any transcript with instant search across all your recordings. Each category (Live Mic, YouTube, Files) maintains its own searchable history.
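Transcripts live in a local SQLite database (`transcripts.db`, see Project Structure). The real schema is in `src/database.py` and is not reproduced here; the sketch below uses a hypothetical `transcripts` table with a `category` column to show a per-category, case-insensitive search with Python's built-in `sqlite3` module:

```python
import sqlite3
from typing import List, Tuple

def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Hypothetical schema; Scriber's actual table layout may differ.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS transcripts (
               id INTEGER PRIMARY KEY,
               category TEXT NOT NULL,  -- e.g. 'mic', 'youtube', 'file' (assumed values)
               title TEXT NOT NULL,
               content TEXT NOT NULL
           )"""
    )
    return conn

def search(conn: sqlite3.Connection, category: str, query: str) -> List[Tuple[int, str]]:
    """Return (id, title) rows in one category whose title or content contains query."""
    # Escape LIKE wildcards so user input is matched literally.
    escaped = query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    pattern = f"%{escaped}%"
    rows = conn.execute(
        """SELECT id, title FROM transcripts
           WHERE category = ?
             AND (title LIKE ? ESCAPE '\\' OR content LIKE ? ESCAPE '\\')
           ORDER BY id DESC""",
        (category, pattern, pattern),
    )
    return rows.fetchall()
```

SQLite's `LIKE` is case-insensitive for ASCII letters by default, which is enough for a simple search box; larger histories would benefit from SQLite's FTS5 full-text index instead.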

🔔 System Tray Integration

Scriber runs silently in your system tray. Access recent recordings, view logs, or control the app with a right-click, with no windows cluttering your desktop.


Screenshots

Live Mic Recording

Instant voice-to-text with real-time audio visualization and recording history

YouTube Transcription

Search YouTube or paste URLs to transcribe any video with speaker identification

File Upload

Drag & drop audio/video files for automatic transcription

Transcript Detail

Full transcript view with AI summary, speaker labels, and export options

Settings

Configure transcription models, hotkeys, and API integrations


Quick Start

Windows

  1. Clone the repository

    git clone https://github.com/MyButtermilk/Scriber.git
    cd Scriber
  2. Run the launcher

    start.bat

    This will automatically:

    • Create a Python virtual environment
    • Install all backend dependencies
    • Install frontend dependencies (npm)
    • Launch the application
  3. Access the Web UI

    The app opens automatically at http://localhost:5000. A tray icon appears for background control.

Requirements

  • Python 3.10+
  • Node.js 18+
  • FFmpeg (for video file processing)

Usage

Global Hotkey

Press Ctrl+Alt+S (configurable) from anywhere to toggle recording. The live overlay shows:

  • Real-time audio levels
  • Interim transcription text
  • Recording duration
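How the overlay computes its audio level is not specified in this README. One common way to drive such a meter, shown here as an assumption rather than Scriber's actual code, is to take the RMS of each block of signed 16-bit PCM samples, convert it to dBFS, and map that onto a 0..1 bar height:

```python
import math
from typing import Sequence

def rms_dbfs(samples: Sequence[int], floor_db: float = -96.0) -> float:
    """RMS level of signed 16-bit PCM samples in dBFS (0 dB = full scale)."""
    if not samples:
        return floor_db
    mean_square = sum(s * s for s in samples) / len(samples)
    rms = math.sqrt(mean_square) / 32768.0  # normalize to the 16-bit full-scale value
    if rms == 0:
        return floor_db  # log10(0) is undefined; report the floor for silence
    return max(floor_db, 20 * math.log10(rms))

def meter_fraction(level_db: float, floor_db: float = -60.0) -> float:
    """Map a dBFS value onto 0..1 for drawing a level bar, clamped at both ends."""
    return min(1.0, max(0.0, (level_db - floor_db) / -floor_db))
```

A logarithmic (dB) scale is used because a linear RMS bar barely moves for normal speech levels.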

Web Interface

Tab        Purpose
Live Mic   View real-time transcription and recording history
YouTube    Paste URLs or search to transcribe videos
Files      Upload audio/video files for processing
Settings   Configure models, hotkeys, and API keys

System Tray

Right-click the tray icon for these options:

  • Recent Recordings: Click to copy transcript to clipboard
  • View Logs: Debug issues with backend/frontend
  • Open Web UI: Launch the browser interface
  • Restart / Quit: Control the application

Configuration

Scriber uses environment variables and a .env file for configuration. Key settings:

Speech-to-Text Providers

Provider       Env Variable            Features
Soniox         SONIOX_API_KEY          Real-time streaming, speaker diarization
Deepgram       DEEPGRAM_API_KEY        Nova-2 model, fast processing
OpenAI         OPENAI_API_KEY          Whisper model
AssemblyAI     ASSEMBLYAI_API_KEY      Universal model
Azure          AZURE_SPEECH_KEY        Microsoft Speech Services
Gladia         GLADIA_API_KEY          Multi-language support
Speechmatics   SPEECHMATICS_API_KEY    Enterprise-grade accuracy
AWS            AWS_ACCESS_KEY_ID       Transcribe service
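The table pairs each provider with the environment variable that holds its key. As a hedged sketch, the provider named in `SCRIBER_DEFAULT_STT` could be checked against the keys actually set before a session starts. The variable names come from the table; the lowercase provider ids (other than `soniox`, which appears in the sample settings), the function name, and the fail-early behavior are hypothetical.

```python
import os
from typing import Mapping, Optional

# Provider id (assumed lowercase) -> environment variable from the table.
PROVIDER_KEYS = {
    "soniox": "SONIOX_API_KEY",
    "deepgram": "DEEPGRAM_API_KEY",
    "openai": "OPENAI_API_KEY",
    "assemblyai": "ASSEMBLYAI_API_KEY",
    "azure": "AZURE_SPEECH_KEY",
    "gladia": "GLADIA_API_KEY",
    "speechmatics": "SPEECHMATICS_API_KEY",
    "aws": "AWS_ACCESS_KEY_ID",
}

def resolve_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured STT provider, failing early if its key is missing."""
    env = os.environ if env is None else env
    provider = env.get("SCRIBER_DEFAULT_STT", "soniox").strip().lower()
    key_var = PROVIDER_KEYS.get(provider)
    if key_var is None:
        raise ValueError(f"unknown STT provider: {provider!r}")
    if not env.get(key_var):
        raise RuntimeError(f"{provider} selected but {key_var} is not set")
    return provider
```

Passing `env` explicitly keeps the check testable without touching the real process environment.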

AI Summarization

Provider        Env Variable
Google Gemini   GOOGLE_API_KEY
OpenAI          OPENAI_API_KEY

App Settings

# Recording
SCRIBER_HOTKEY=ctrl+alt+s
SCRIBER_DEFAULT_STT=soniox
SCRIBER_MIC_DEVICE=default

# Summarization
SCRIBER_AUTO_SUMMARIZE=0
SCRIBER_SUMMARIZATION_MODEL=gemini-2.0-flash

# YouTube
YOUTUBE_API_KEY=your_key_here
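The real loader is `src/config.py`, whose exact behavior is not described here. A minimal standard-library sketch of reading settings like the ones above from a `.env` file, assuming simple `KEY=value` lines, `#` comments, optional quotes, and `0`/`1` booleans (many projects use the `python-dotenv` package for this instead):

```python
from typing import Dict

def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines; skip blanks and # comments; strip matching quotes."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values

def as_bool(value: str) -> bool:
    """Interpret flags such as SCRIBER_AUTO_SUMMARIZE=0 or =1."""
    return value.strip().lower() in {"1", "true", "yes", "on"}
```

For example, `as_bool(parse_env(text).get("SCRIBER_AUTO_SUMMARIZE", "0"))` yields `False` for the sample settings above.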

πŸ—οΈ Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   System Tray   │────▶│  Python Backend │◀────│  React Frontend │
│   (tray.py)     │     │  (web_api.py)   │     │  (Vite + React) │
└─────────────────┘     └─────────────────┘     └─────────────────┘
        │                       │                       │
        │                       ▼                       │
        │               ┌─────────────────┐             │
        │               │  SQLite DB      │             │
        │               │  (transcripts)  │             │
        │               └─────────────────┘             │
        │                       │                       │
        ▼                       ▼                       ▼
   Global Hotkeys      STT Pipeline             WebSocket
   Overlay Window      (Multiple Providers)    (Real-time Updates)

Key Components:

  • src/tray.py: Entry point, manages process lifecycle
  • src/web_api.py: aiohttp server with REST API + WebSocket
  • src/pipeline.py: STT provider abstraction
  • src/export.py: PDF/DOCX generation
  • Frontend/: React 19 + Vite + Tailwind CSS

📊 State Machine Diagrams

Live Recording State Machine

The live microphone recording follows this state flow:

stateDiagram-v2
    [*] --> Idle
    
    Idle --> Preparing : Hotkey pressed
    Preparing --> Recording : Mic ready
    Preparing --> Idle : Error (mic unavailable)
    
    Recording --> Transcribing : Hotkey pressed (stop)
    Recording --> Recording : Audio captured
    
    Transcribing --> Completed : STT finished
    Transcribing --> Failed : STT error
    
    Completed --> [*]
    Failed --> [*]
    
    note right of Preparing
        Shows "Preparing..." overlay
        Initializes microphone
    end note
    
    note right of Recording
        Shows audio visualization
        Real-time transcription
    end note
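The diagram above maps directly onto a transition table. This sketch encodes exactly the edges shown; the event names (`hotkey`, `mic_ready`, and so on) are hypothetical labels for those edges, and any event the diagram does not allow in the current state is rejected:

```python
from enum import Enum, auto
from typing import Dict, Tuple

class State(Enum):
    IDLE = auto()
    PREPARING = auto()
    RECORDING = auto()
    TRANSCRIBING = auto()
    COMPLETED = auto()
    FAILED = auto()

# (current state, event) -> next state, one entry per edge in the diagram.
TRANSITIONS: Dict[Tuple[State, str], State] = {
    (State.IDLE, "hotkey"): State.PREPARING,
    (State.PREPARING, "mic_ready"): State.RECORDING,
    (State.PREPARING, "mic_error"): State.IDLE,
    (State.RECORDING, "audio"): State.RECORDING,
    (State.RECORDING, "hotkey"): State.TRANSCRIBING,
    (State.TRANSCRIBING, "stt_done"): State.COMPLETED,
    (State.TRANSCRIBING, "stt_error"): State.FAILED,
}

class LiveSession:
    def __init__(self) -> None:
        self.state = State.IDLE

    def handle(self, event: str) -> State:
        """Apply one event, raising ValueError if the diagram has no such edge."""
        try:
            self.state = TRANSITIONS[(self.state, event)]
        except KeyError:
            raise ValueError(f"event {event!r} not allowed in {self.state.name}") from None
        return self.state
```

Note that the same `hotkey` event starts or stops a session depending on the current state, which is how a single toggle hotkey fits the diagram.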

YouTube/File Transcription State Machine

Background transcription jobs (YouTube and File uploads) follow this workflow:

stateDiagram-v2
    [*] --> Queued
    
    Queued --> Downloading : Start processing
    
    Downloading --> Extracting : Download complete (video)
    Downloading --> Transcribing : Download complete (audio)
    Downloading --> Failed : Download error
    Downloading --> Stopped : User cancelled
    
    Extracting --> Transcribing : Audio extracted
    Extracting --> Failed : FFmpeg error
    
    Transcribing --> Summarizing : Transcription complete (auto-summarize on)
    Transcribing --> Completed : Transcription complete (auto-summarize off)
    Transcribing --> Failed : STT error
    Transcribing --> Stopped : User cancelled
    
    Summarizing --> Completed : Summary generated
    Summarizing --> Completed : Summary failed (transcript still saved)
    
    Completed --> [*]
    Failed --> [*]
    Stopped --> [*]
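The branches in this workflow depend on two facts about a job: whether the source is video (so FFmpeg extraction is needed) and whether auto-summarize is on. A hedged sketch of the diagram as a step function; the `Job` fields and the outcome names `ok`, `error`, and `cancel` are hypothetical, while the states and edges follow the diagram:

```python
from dataclasses import dataclass
from typing import List

TERMINAL = {"Completed", "Failed", "Stopped"}

@dataclass
class Job:
    is_video: bool
    auto_summarize: bool

def next_state(state: str, job: Job, outcome: str = "ok") -> str:
    """Advance one step; outcome is 'ok', 'error' or 'cancel'."""
    table = {
        ("Queued", "ok"): "Downloading",
        ("Downloading", "ok"): "Extracting" if job.is_video else "Transcribing",
        ("Downloading", "error"): "Failed",
        ("Downloading", "cancel"): "Stopped",
        ("Extracting", "ok"): "Transcribing",
        ("Extracting", "error"): "Failed",
        ("Transcribing", "ok"): "Summarizing" if job.auto_summarize else "Completed",
        ("Transcribing", "error"): "Failed",
        ("Transcribing", "cancel"): "Stopped",
        # The transcript is already saved, so a failed summary still completes the job.
        ("Summarizing", "ok"): "Completed",
        ("Summarizing", "error"): "Completed",
    }
    try:
        return table[(state, outcome)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {outcome!r}") from None

def run(job: Job) -> List[str]:
    """Follow 'ok' outcomes from Queued to a terminal state (the happy path)."""
    path = ["Queued"]
    while path[-1] not in TERMINAL:
        path.append(next_state(path[-1], job))
    return path
```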

Transcript Status Lifecycle

Each transcript record transitions through these statuses:

stateDiagram-v2
    [*] --> recording : Live mic started
    [*] --> processing : YouTube/File queued
    
    recording --> completed : Stop + finalize
    recording --> failed : Pipeline error
    
    processing --> completed : Transcription done
    processing --> failed : Error occurred
    processing --> stopped : User cancelled
    
    completed --> completed : Summary added
    completed --> completed : Export generated
    
    note right of completed
        Content persisted to SQLite
        Available for search/export
    end note
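In this lifecycle `failed` and `stopped` have no outgoing edges, and `completed` only loops back to itself, so a status change can be validated before it is written. A sketch assuming statuses are stored as the lowercase strings shown in the diagram, with `None` standing for a record that does not exist yet (the helper name is hypothetical):

```python
from typing import Dict, FrozenSet, Optional

# Allowed next statuses for each current status, one entry per diagram edge.
ALLOWED: Dict[Optional[str], FrozenSet[str]] = {
    None: frozenset({"recording", "processing"}),  # new record
    "recording": frozenset({"completed", "failed"}),
    "processing": frozenset({"completed", "failed", "stopped"}),
    "completed": frozenset({"completed"}),  # summaries and exports keep it completed
    "failed": frozenset(),
    "stopped": frozenset(),
}

def check_status_change(old: Optional[str], new: str) -> None:
    """Raise ValueError if the diagram does not allow old -> new."""
    if new not in ALLOWED.get(old, frozenset()):
        raise ValueError(f"illegal status change {old!r} -> {new!r}")
```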

WebSocket Message Flow

Real-time communication between backend and frontend:

sequenceDiagram
    participant F as Frontend
    participant B as Backend
    participant S as STT Service
    
    F->>B: Connect WebSocket
    B->>F: state (current status)
    
    Note over F,B: User starts recording
    F->>B: toggle (via hotkey)
    B->>F: session_started
    
    loop Recording
        B->>S: Audio stream
        S->>B: Transcription (interim)
        B->>F: transcript (text, isFinal=false)
        S->>B: Transcription (final)
        B->>F: transcript (text, isFinal=true)
    end
    
    Note over F,B: User stops recording
    F->>B: toggle (via hotkey)
    B->>F: transcribing
    B->>F: session_finished
    B->>F: history_updated
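On the receiving side, interim and final `transcript` messages have to be combined into the displayed text: each interim result replaces the previous one, and a final result is committed. The real frontend is React, so this Python sketch only illustrates the logic; the JSON field names (`type`, `text`, `isFinal`) are assumptions based on the diagram labels, not a documented wire format.

```python
import json
from dataclasses import dataclass, field
from typing import List

@dataclass
class LiveView:
    final_parts: List[str] = field(default_factory=list)
    interim: str = ""
    status: str = "idle"

    @property
    def text(self) -> str:
        """Committed text followed by the current, still-replaceable interim result."""
        return " ".join(self.final_parts + ([self.interim] if self.interim else []))

    def apply(self, raw: str) -> None:
        msg = json.loads(raw)
        kind = msg.get("type")
        if kind == "session_started":
            self.final_parts, self.interim, self.status = [], "", "recording"
        elif kind == "transcript":
            if msg.get("isFinal"):
                self.final_parts.append(msg["text"])
                self.interim = ""  # the final result supersedes the interim one
            else:
                self.interim = msg["text"]  # each interim replaces the previous one
        elif kind == "transcribing":
            self.status = "transcribing"
        elif kind == "session_finished":
            self.status = "idle"
        # Other message types (state, history_updated) are ignored in this sketch.
```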

📦 Project Structure

Scriber/
├── src/
│   ├── tray.py           # System tray & process manager
│   ├── web_api.py        # Backend API server
│   ├── pipeline.py       # STT provider orchestration
│   ├── database.py       # SQLite persistence
│   ├── export.py         # PDF/DOCX export
│   ├── overlay.py        # Recording overlay window
│   └── config.py         # Configuration loader
├── Frontend/
│   └── client/
│       └── src/
│           ├── pages/    # React page components
│           ├── components/ # Reusable UI components
│           └── hooks/    # Custom React hooks
├── docs/
│   └── screenshots/      # App screenshots
├── start.bat             # Windows launcher
├── requirements.txt      # Python dependencies
└── transcripts.db        # Local database (auto-created)

🔧 Troubleshooting

Issue               Solution
App doesn't start   Run python -m src.tray manually to see errors
No audio input      Check the microphone selection in Settings
STT fails           Verify the provider's API key in Settings → API Configuration
Export fails        Install the export dependencies: pip install python-docx reportlab lxml
YouTube fails       Ensure the YouTube API key is set in Settings

📄 License

MIT License - see LICENSE for details.


Made with ❤️ for efficient voice-to-text workflows
