TranscribeAI is a cross-platform desktop app that uses Large Language Models to transcribe audio files and scanned images/pages. It features:
- LLM-powered transcription (e.g. Google Gemini via `google-generativeai`)
- Resumable image workflows (skips already-done files, cleans up partial outputs)
- Real-time logs & progress in the UI
- Persistent settings (stores your API key with `electron-store`)
- Drag-resizable, searchable sidebar for managing transcripts
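
To illustrate what a resumable image workflow involves, here is a minimal sketch of the skip-and-clean-up decision. It is not TranscribeAI's actual implementation: the `planResume` function, the `ResumePlan` shape, and the naming convention (`<name>.txt` for a finished transcript, `<name>.txt.partial` for an interrupted one) are all assumptions made for this example.

```typescript
// Hypothetical convention, not taken from TranscribeAI's source:
// a finished transcript is "<stem>.txt", an interrupted one is "<stem>.txt.partial".
interface ResumePlan {
  toProcess: string[];    // inputs that still need transcription
  skipped: string[];      // inputs whose transcript already exists
  staleOutputs: string[]; // partial outputs to delete before re-running
}

// Strip the final extension: "page-01.png" -> "page-01".
function stem(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  return dot > 0 ? fileName.slice(0, dot) : fileName;
}

// Decide, for each input file, whether to skip it or (re)process it,
// and which leftover partial outputs should be cleaned up first.
function planResume(inputs: string[], existingOutputs: string[]): ResumePlan {
  const outputs = new Set(existingOutputs);
  const plan: ResumePlan = { toProcess: [], skipped: [], staleOutputs: [] };
  for (const input of inputs) {
    const done = `${stem(input)}.txt`;
    const partial = `${done}.partial`;
    if (outputs.has(partial)) plan.staleOutputs.push(partial);
    if (outputs.has(done)) plan.skipped.push(input);
    else plan.toProcess.push(input);
  }
  return plan;
}

// Example: "a.png" is finished, "b.png" was interrupted, "c.png" is new.
const examplePlan = planResume(
  ["a.png", "b.png", "c.png"],
  ["a.txt", "b.txt.partial"],
);
console.log(examplePlan);
```

Keeping the decision separate from the file system calls makes it easy to check: the caller lists the input and output folders, deletes whatever is in `staleOutputs`, and transcribes only `toProcess`.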
Note: A “headless” version (no UI) is also available and can be integrated into your own system; see https://github.com/Minitex/TranscribeAI
- Electron: Main process for file I/O and IPC
- React + TypeScript: Renderer UI, bundled with Vite for fast HMR
- Vite: Modern build tool for instant feedback and optimized production builds
- google-generativeai SDK: Interfaces with LLMs for high-quality transcription
- electron-store: Simple JSON storage for your Gemini API key
- Obtain your API keys
  - Gemini (required for audio/Gemini OCR): Sign in to Google AI Studio and create a key at the Google AI Studio API Key Console. Copy the generated key.
  - Mistral (only for Mistral OCR): Create an API key in your Mistral account by following the Mistral quickstart at docs.mistral.ai/getting-started/quickstart.
- Download TranscribeAI
  - Go to the TranscribeAI Releases page.
  - Choose the installer for your OS:
    - macOS: `.dmg`
    - Windows: `.exe`
    - Linux: `.AppImage` or `.tar.gz`
  - Download and run the installer. Because TranscribeAI is an open-source project and we don’t bundle a paid code-signing certificate, you may see a security warning the first time you run it:
    - macOS Gatekeeper (“Unidentified Developer” or “damaged and can’t be opened”): open System Preferences → Security & Privacy, then click Open Anyway next to the TranscribeAI entry.
    - Windows SmartScreen (“Windows protected your PC”): click More info, then Run anyway.
  - Follow the installer prompts to complete installation.
- Configure your API key(s)
  - Launch TranscribeAI.
  - Click the gear icon in the top-right corner to open Settings.
  - Paste your Google Gemini API key into the “API Key” field.
  - If you plan to use Mistral OCR, paste your Mistral API key into the “Mistral API Key” field.
  - Click Save.
- Run your first transcription
  - Click the file picker button to select an audio file or image folder, then choose the output folder for your transcripts.
  - Click Transcribe to begin transcription.
  - Monitor progress and logs in real time.
- In Settings, enter your Mistral API key (keep your Gemini key in place if you also use Gemini features).
- In the Image tab, pick the `mistral-ocr-latest` model.
- Select an input folder of images/PDFs and an output folder.
- Enable Batch mode to process files in batches, then start transcription. The app will call Mistral’s batch OCR and write outputs to your chosen folder.
- Adjust Batch size with the +/− controls (default 50, range 10–500) to balance throughput vs. request size.
- Note: single-image Mistral OCR calls are free; batch OCR requires a paid Mistral account.
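
As a rough illustration of the Batch size setting described above, the sketch below clamps a requested size to the documented range (default 50, range 10–500) and splits a file list into batches of that size. The function names `clampBatchSize` and `toBatches` are hypothetical and are not taken from TranscribeAI's source; how the app actually groups files or calls Mistral may differ.

```typescript
// Limits taken from the Batch size description; everything else is illustrative.
const DEFAULT_BATCH_SIZE = 50;
const MIN_BATCH_SIZE = 10;
const MAX_BATCH_SIZE = 500;

// Return a usable batch size: the default when none is given,
// otherwise the requested value rounded and clamped to [10, 500].
function clampBatchSize(requested?: number): number {
  if (requested === undefined || !Number.isFinite(requested)) {
    return DEFAULT_BATCH_SIZE;
  }
  return Math.min(MAX_BATCH_SIZE, Math.max(MIN_BATCH_SIZE, Math.round(requested)));
}

// Split a list of files into consecutive batches of at most `batchSize` items.
function toBatches<T>(items: T[], batchSize?: number): T[][] {
  const size = clampBatchSize(batchSize);
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Example: 120 hypothetical page images with the default batch size.
const pages = Array.from({ length: 120 }, (_, i) => `page-${i + 1}.png`);
console.log(toBatches(pages).map((batch) => batch.length));
```

A larger batch size means fewer, bigger requests; a smaller one means more requests that each carry fewer files, which is the throughput versus request size trade-off the setting exposes.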
- macOS: Download the latest `.dmg` from Releases, open it, and drag the app to Applications. Choose “Replace” if prompted. If Gatekeeper blocks the app, right-click it and choose Open once.
- Windows: Download the new `.exe` installer from Releases and run it; it overwrites the existing install. Tip: after first launch, right-click the TranscribeAI icon on the taskbar → Pin to taskbar so it’s easy to find next time.
- Linux (AppImage): Download the new `*.AppImage`, run `chmod +x` on it if needed, and replace your old AppImage file.
- No uninstall needed: Install over the top; your existing data and settings remain.

