A web application for transcribing audio recordings from offline meetings. Upload your phone recordings and get accurate text transcripts, processed entirely on your own machine.
- Upload audio files (MP3, WAV, MP4, M4A up to 200MB)
- Automatic transcription using Whisper AI model
- Export transcripts as TXT files
- Clean, responsive black & white interface
- Privacy-focused (local processing, no cloud APIs)
- File validation (type and size)
Frontend: React 19, Vite 7, Tailwind CSS 4, Axios, Lucide React
Backend: Node.js, Express 5, Whisper AI (@xenova/transformers), FFmpeg, Multer
AI-Audio-Transcriber/
├── .dockerignore
├── .gitignore
├── README.md
├── docker-compose.yml
├── screenshots/
├── backend/
│ ├── Dockerfile
│ ├── package.json
│ ├── src/
│ │ ├── server.js
│ │ └── transcribe.js
│ └── uploads/
└── frontend/
├── Dockerfile
├── index.html
├── package.json
├── postcss.config.js
├── tailwind.config.js
├── vite.config.js
└── src/
├── App.jsx
├── main.jsx
├── index.css
└── components/
├── Home.jsx
├── Header.jsx
└── Footer.jsx
Prerequisites:
- Node.js 18+
- FFmpeg installed on system (for local development)
- Docker & Docker Compose (for containerized setup)
Ubuntu/Debian:
sudo apt install ffmpeg
macOS:
brew install ffmpeg
Windows:
Download from ffmpeg.org
- Clone the repository:
git clone https://github.com/akhilachiju/AI-Audio-Transcriber.git
cd AI-Audio-Transcriber
- Install and run backend:
cd backend
npm install
npm run dev
- Install and run frontend (in a new terminal):
cd frontend
npm install
npm run dev
Access the application:
- Frontend: http://localhost:7070
- Backend API: http://localhost:7071
docker-compose up --build
Access the application:
- Frontend: http://localhost:7070
- Backend API: http://localhost:7071
- Cloud Platform (Recommended)
cd frontend && npm run build
cd backend && npm install --production
Deploy the frontend to Vercel/Netlify and the backend to Railway/Render.
- VPS/Server
# Build frontend
cd frontend && npm run build
# Run backend with PM2
cd backend
npm install --production
pm2 start src/server.js
Environment Variables:
PORT=7071
NODE_ENV=production
VITE_API_URL=https://your-api-domain.com
- Fast development with hot reload and modern build tooling
- Tailwind CSS for rapid UI development with black & white theme
- Axios for simple HTTP client
- Lucide React for consistent iconography
- JavaScript full-stack for consistency
- Multer for file upload handling with validation
- @xenova/transformers for JavaScript Whisper implementation
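The type-and-size validation Multer enforces can be sketched as a pair of pure checks plus the wiring that would call them. This is a sketch, not the actual backend code: the exact MIME allowlist and the helper names (`isAllowedAudio`, `isWithinSizeLimit`) are assumptions; only the 200MB cap and accepted formats come from the feature list above.

```javascript
// Sketch of upload validation for a Multer fileFilter.
// MIME strings and helper names are illustrative assumptions.
const MAX_BYTES = 200 * 1024 * 1024; // 200MB cap from the feature list

const ALLOWED_MIME = new Set([
  'audio/mpeg',  // MP3
  'audio/wav',
  'audio/x-wav', // WAV
  'video/mp4',   // MP4
  'audio/mp4',
  'audio/x-m4a', // M4A
]);

function isAllowedAudio(mimetype) {
  return ALLOWED_MIME.has(mimetype);
}

function isWithinSizeLimit(bytes) {
  return bytes > 0 && bytes <= MAX_BYTES;
}

// Wiring into Multer would look roughly like:
// const upload = multer({
//   dest: 'uploads/',
//   limits: { fileSize: MAX_BYTES },
//   fileFilter: (req, file, cb) => cb(null, isAllowedAudio(file.mimetype)),
// });
```

Multer's `limits.fileSize` rejects oversized uploads at the stream level, so the byte check above is only needed for pre-flight validation on the frontend or for defense in depth.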
Chosen: Local Whisper (@xenova/transformers)
Reasons:
- Privacy: audio files stay local, never sent to external services
- No API costs or usage limits
- No internet dependency after initial model download
- Full control over processing
Trade-offs:
- Higher resource usage (CPU/memory)
- Slower initial model load time
- Limited to whisper-small model for performance
- System FFmpeg for reliable audio format conversion
- Converts audio to optimal format for Whisper (16kHz mono PCM float32)
Available Whisper models (configurable in backend/src/transcribe.js):
- whisper-tiny.en: 75MB, fastest, English only
- whisper-base.en: 150MB, fast, English only
- whisper-small: 250MB, balanced (default), multilingual
- whisper-medium: 1.5GB, high accuracy, slower, multilingual
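A sketch of how that model choice could be wired up. The `resolveModel` helper is an assumption, and the `Xenova/*` model IDs follow the naming convention @xenova/transformers uses on the Hugging Face hub; verify them against the hub before relying on this.

```javascript
// Model table mirroring the options above; IDs assume the Xenova/* convention.
const WHISPER_MODELS = {
  'whisper-tiny.en': 'Xenova/whisper-tiny.en',
  'whisper-base.en': 'Xenova/whisper-base.en',
  'whisper-small':   'Xenova/whisper-small',
  'whisper-medium':  'Xenova/whisper-medium',
};

// Fall back to the balanced default when the name is unknown.
function resolveModel(name) {
  return WHISPER_MODELS[name] || WHISPER_MODELS['whisper-small'];
}

// Usage sketch (requires @xenova/transformers; downloads the model on first run):
// const { pipeline } = require('@xenova/transformers');
// const transcriber = await pipeline('automatic-speech-recognition', resolveModel('whisper-small'));
// const { text } = await transcriber(float32Samples, { chunk_length_s: 30 });
```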
- WebSocket connection to show transcription progress
- Visual progress bar with estimated time remaining
- Better loading states and user feedback
- Drag and drop file upload
- Audio player to preview files before transcription
- Copy to clipboard button for transcripts
- Display selected filename before upload
- Transcript editing with inline corrections
- Word count and character count for transcripts
- Timestamp display in transcripts
- Rate limiting to prevent abuse
- Automatic cleanup of uploaded files after processing
- Better error handling and user-friendly error messages
- Health check endpoint for monitoring
- Logging system for debugging
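For the health check item, one possible shape is a handler that reports process status; the route name and payload fields below are assumptions, not existing code.

```javascript
// Minimal health payload; field names are illustrative.
function healthPayload() {
  return {
    status: 'ok',
    uptime: process.uptime(), // seconds since the server started
    timestamp: new Date().toISOString(),
  };
}

// Express wiring sketch:
// app.get('/health', (_req, res) => res.json(healthPayload()));
```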
- Unit tests for transcription logic
- Integration tests for API endpoints
- E2E tests for upload and download flow
- File validation edge cases
- Queue system for multiple concurrent uploads
- Caching for frequently transcribed content
- Streaming transcription for large files
- Model warm-up on server start