A web application for transcribing audio and video files using Google's Gemini Flash model.
Live Application: https://gemini-transcribe.fly.dev/
- Specify the desired language of the transcript
 - Automatically detects and labels different speakers in the audio (Speaker 1, Speaker 2, etc.).
 - Instead of a timestamp for every word, the transcript is logically grouped into paragraphs with a single timestamp, making it much more readable.
 - Click on any timestamp to jump to that specific moment in the audio or video player.
 - Download the final transcript as a plain .txt file (with or without timestamps) or as a .srt subtitle file for use in video players.
 
- Prerequisites
 
- Node.js (v22 or later)
 - A Google AI API Key
 
- 
Clone & install
git clone https://github.com/mikeesto/gemini-transcribe.git cd gemini-transcribeInstall dependencies (using npm, pnpm, or yarn)
npm install - 
Environment variables
Create a
.envfile in the root of the project and add your Google API.GOOGLE_API_KEY="YOUR_API_KEY_HERE" - 
Run the development server
npm run dev
The application should now be running at http://localhost:5173.
 
Flash is a very interesting model to explore for audio transcription because...
- It can attempt to detect not only words but also silence, sentiment, and sounds beyond human voices
 - It can translate the transcription