A production-ready Django application for transcribing video and audio files using OpenAI's Whisper model, with containerization, async task processing, and comprehensive security features.
Video Transcriber is a modern, full-featured web application for transcribing video and audio content. It leverages OpenAI's Whisper, a state-of-the-art speech recognition model, combined with Docker containerization, PostgreSQL, Celery async tasks, and strict security controls to provide a reliable, scalable platform for media transcription.
- Multiple File Formats: MP4, MPEG, MOV, AVI, WebM, OGG, MP3, WAV, FLAC
- Long Video Support: Automatic chunking for videos over 15 minutes
- Configurable Models: Tiny, Base, Small (default), Medium, Large (accuracy vs. speed tradeoff)
- Timestamp Segments: Per-utterance transcripts with precise timing
- Multi-format Export: Download as TXT (plaintext) or SRT (subtitles for video players)
- Docker Containerization: Reproducible deployments with docker-compose
- PostgreSQL Database: Production-grade data persistence and concurrent access
- Celery Task Queue: Asynchronous transcription processing with Redis broker
- Graceful Error Handling: Automatic recovery from worker crashes and deleted records
- Persistent Model Cache: Pre-downloaded Whisper models survive container restarts
- User Authentication: Registration, login, password reset with email
- Authorization: Users can only access their own videos (prevents IDOR attacks)
- Brute-force Protection: Lock accounts after 5 failed login attempts (24-hour cooldown)
- CSRF Protection: All forms include CSRF tokens
- Secure Cookies: HTTPS-only cookies in production
- Input Validation: File type, size, and content-type validation
- Real-time Status Updates: AJAX polling shows transcription progress
- Pagination: Video list with 6 items per page
- Bootstrap UI: Responsive, mobile-friendly interface
- User-friendly Titles: Video filenames automatically used as titles
- Progress Tracking: Visual status indicators (Pending, Processing, Completed, Failed)
- Docker & Docker Compose (recommended for production)
- Python 3.12+ (for local development)
- FFmpeg (for audio extraction from video files)
- Clone the repository:

      git clone https://github.com/Co-vengers/video_transcriber.git
      cd video_transcriber

- Create environment configuration:

      cp video_transcriber/.env.example video_transcriber/.env
      # Edit video_transcriber/.env with your settings

- Build and start services:

      docker-compose up --build

- Access the application:
  - Web UI: http://localhost:8000
  - Admin: http://localhost:8000/admin (superuser required)
- Create admin user (optional, in another terminal). The command prompts interactively, so it needs a TTY and is run without `-T`:

      docker-compose exec web python manage.py createsuperuser
If you prefer running locally without Docker:
- Clone the repository:

      git clone https://github.com/Co-vengers/video_transcriber.git
      cd video_transcriber

- Create a virtual environment:

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

- Set up environment variables:

      cp video_transcriber/.env.example video_transcriber/.env
      # Edit .env with development settings (use SQLite for local dev)

- Run migrations:

      python manage.py migrate

- Create a superuser:

      python manage.py createsuperuser

- Start Redis (in a separate terminal):

      # Using Homebrew on macOS:
      brew services start redis
      # Or run directly:
      redis-server

- Start Celery worker (in a separate terminal):

      cd video_transcriber
      celery -A video_transcriber worker --loglevel=info

- Start Django development server:

      cd video_transcriber
      python manage.py runserver

- Access at http://localhost:8000
- Click "Register" to create a new account or "Login" with existing credentials
- Password reset available via email link
- Brute-force protection: Account locks after 5 failed attempts (24-hour cooldown)
- Click "Upload Video" from main menu
- Select a video or audio file (max 500 MB)
- Supported formats: MP4, MPEG, MOV, AVI, WebM, OGG, MP3, WAV, FLAC
- Choose Whisper model size:
  - Tiny: Fastest, ~39M parameters, basic accuracy
  - Base: Balanced, ~74M parameters
  - Small: Default, ~244M parameters, good accuracy/speed tradeoff
  - Medium: Slower, ~769M parameters, better accuracy
  - Large: Slowest, ~1.5B parameters, highest accuracy
- Click "Upload" to start transcription
- Videos appear in "Videos" list with status badge
- Status updates in real-time (Pending → Processing → Completed/Failed)
- Progress visible without page refresh
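The upload constraints listed above (500 MB maximum, the supported formats) could be enforced with a check along the following lines. This is a minimal sketch, not the project's actual `VideoUploadForm` logic: the function name `validate_upload`, the constant names, and the accepted content-type prefixes are assumptions made for illustration.

```python
from pathlib import Path

# Assumed constants mirroring the limits stated in this README.
MAX_UPLOAD_BYTES = 500 * 1024 * 1024  # 500 MB
ALLOWED_EXTENSIONS = {
    ".mp4", ".mpeg", ".mov", ".avi", ".webm", ".ogg", ".mp3", ".wav", ".flac",
}
# Assumption: accept any audio/video type, plus the generic Ogg container type.
ALLOWED_MIME_PREFIXES = ("video/", "audio/", "application/ogg")


def validate_upload(filename: str, size_bytes: int, content_type: str) -> list[str]:
    """Return a list of problems; an empty list means the upload passes."""
    errors = []
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        errors.append(f"Unsupported file extension: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        errors.append("File exceeds the 500 MB limit")
    if not content_type.startswith(ALLOWED_MIME_PREFIXES):
        errors.append(f"Unexpected content type: {content_type}")
    return errors
```

Collecting every problem in a list, instead of stopping at the first one, lets a form report all issues to the user in a single response.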
Once transcription completes, download in multiple formats:
- TXT: Plain text transcript (copy/paste friendly)
- SRT: SubRip format with timestamps (import into video players)
  - Format: `HH:MM:SS,mmm --> HH:MM:SS,mmm`
  - Compatible with VLC, YouTube, browser video players
- View all your transcribed videos with status
- Click video title to see full transcript and segments
- Delete videos to free up storage
- Videos are private (only you can see your transcripts)
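The SRT timestamp format mentioned above (`HH:MM:SS,mmm --> HH:MM:SS,mmm`) can be produced from per-segment timings roughly as follows. This is an illustrative sketch and not the contents of `exports.py`; the function names are hypothetical, and the segment keys `start`, `end`, and `text` are assumed to follow the shape of Whisper's segment output.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Convert a time in seconds to the SRT form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments (assumed keys: start, end, text) as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_timestamp(seg["start"])
        end = format_srt_timestamp(seg["end"])
        # Each cue: sequence number, time range, text, then a blank separator line.
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)
```

Working in whole milliseconds before splitting into hours, minutes, and seconds avoids rounding a value such as 59.9996 seconds into an invalid `60` in the seconds field.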
Access Django admin at /admin:
- Manage user accounts
- View/filter transcription jobs by status and date
- Search videos by title or username
- Monitor transcription history
| Endpoint | Method | Purpose |
|---|---|---|
| `/` | GET | Upload form |
| `/videos/` | GET | List user's videos (paginated) |
| `/videos/<id>/` | GET | View transcript and segments |
| `/videos/<id>/status/` | GET | JSON status (for AJAX polling) |
| `/videos/<id>/download/<fmt>/` | GET | Download transcript (fmt: txt or srt) |
| `/videos/<id>/delete/` | POST | Delete video |
| `/register/` | GET/POST | User registration |
| `/login/` | GET/POST | User login |
| `/logout/` | POST | User logout |
| `/password-reset/` | GET/POST | Password reset flow |
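The status endpoint in the table is what the browser polls. The same loop can be sketched in Python for scripted use. Everything below is an assumption for illustration: the JSON key `status`, the status strings, and both function names are hypothetical, and a real request would also need an authenticated session cookie, which this sketch omits.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumed terminal values, based on the status labels named in this README.
TERMINAL_STATES = {"Completed", "Failed"}


def fetch_status(url: str) -> str:
    """Fetch the JSON status document and return its 'status' field (assumed key)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)["status"]


def wait_for_transcription(
    get_status: Callable[[], str],
    interval_seconds: float = 3.0,
    max_polls: int = 200,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until a terminal state is seen or max_polls requests have been made."""
    status = get_status()
    for _ in range(max_polls - 1):
        if status in TERMINAL_STATES:
            break
        sleep(interval_seconds)
        status = get_status()
    return status
```

Passing `get_status` and `sleep` as parameters keeps the loop testable without a running server, for example `wait_for_transcription(lambda: fetch_status("http://localhost:8000/videos/1/status/"))` for a hypothetical video with id 1.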
- Web Framework: Django 5.1.7 (Python web framework)
- Application Server: Gunicorn 23.0.0 (production WSGI server)
- Database: PostgreSQL 16 (production relational database)
- Message Broker: Redis 7 (in-memory message queue)
- Task Queue: Celery 5.4.0 (asynchronous job processing)
- ML/AI Engine: OpenAI Whisper 20240930 (speech-to-text)
- Containerization: Docker & docker-compose (reproducible deployments)
- Frontend: Bootstrap 5 (responsive CSS framework)
┌─────────────────────────────────────────────────────────────┐
│                       Docker Compose                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────────┐  ┌──────────────┐  ┌──────────────┐    │
│  │   Web Service   │  │    Worker    │  │   Database   │    │
│  │   (Gunicorn)    │  │   (Celery)   │  │ (PostgreSQL) │    │
│  │    2 workers    │  │ concurrency=4│  │              │    │
│  └────────┬────────┘  └──────┬───────┘  └──────────────┘    │
│           │                  │                              │
│           └─────────┬────────┘                              │
│                     │                                       │
│             ┌───────▼────────┐                              │
│             │  Redis Broker  │                              │
│             │  (Task Queue)  │                              │
│             └────────────────┘                              │
│                                                             │
│  Persistent Volumes:                                        │
│  - whisper_cache: /cache/whisper (461MB model)              │
│  - postgres_data: PostgreSQL data                           │
│                                                             │
└─────────────────────────────────────────────────────────────┘
- User uploads video → Web service saves to `/media/videos/`
- Task is queued → Gunicorn sends the `process_transcription` task to Redis
- Worker picks up task → Celery worker loads Whisper model (cached)
- Chunked transcription → Long videos split into 10-minute chunks
- Results saved → Transcripts and segments stored in PostgreSQL
- Frontend updates → AJAX polling shows real-time status
- User downloads → Download as TXT or SRT subtitle format
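The chunking step in the flow above can be sketched as two small functions: one that plans chunk boundaries and one that merges per-chunk segments back onto a single timeline. This is a sketch under stated assumptions, not the code in `utils.py`. It assumes that only media longer than 15 minutes is split and that each chunk covers up to 10 minutes, as the feature list and data flow suggest; the function and constant names are hypothetical.

```python
CHUNK_THRESHOLD_SECONDS = 15 * 60  # assumed: split only media longer than 15 minutes
CHUNK_LENGTH_SECONDS = 10 * 60     # assumed: each chunk covers up to 10 minutes


def plan_chunks(duration: float) -> list[tuple[float, float]]:
    """Return (start, end) offsets in seconds for each chunk to transcribe."""
    if duration <= CHUNK_THRESHOLD_SECONDS:
        return [(0.0, duration)]
    chunks = []
    start = 0.0
    while start < duration:
        end = min(start + CHUNK_LENGTH_SECONDS, duration)
        chunks.append((start, end))
        start = end
    return chunks


def merge_chunk_segments(chunk_results: list[tuple[float, list[dict]]]) -> list[dict]:
    """Shift each chunk's segment times by the chunk's start offset and concatenate."""
    merged = []
    for offset, segments in chunk_results:
        for seg in segments:
            merged.append({
                "start": seg["start"] + offset,
                "end": seg["end"] + offset,
                "text": seg["text"],
            })
    return merged
```

The offset shift matters because a model run on a chunk reports times relative to that chunk's start; without it, every chunk's subtitles would begin again at zero.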
video_transcriber/
├── Dockerfile                    # Container image definition
├── docker-compose.yml            # Multi-service orchestration
├── requirements.txt              # Python dependencies
├── CHANGES_DOCUMENTATION.md      # Full changelog and rationale
│
├── video_transcriber/            # Django project config
│   ├── settings.py               # Django configuration
│   ├── urls.py                   # Main URL routing
│   ├── wsgi.py                   # WSGI application
│   ├── celery.py                 # Celery configuration
│   └── __init__.py               # Celery import
│
├── transcription/                # Django app (main logic)
│   ├── models.py                 # Video model with ownership
│   ├── views.py                  # All HTTP view handlers
│   ├── urls.py                   # App URL patterns
│   ├── forms.py                  # VideoUploadForm with validation
│   ├── tasks.py                  # Celery transcription task
│   ├── admin.py                  # Django admin configuration
│   ├── utils.py                  # Whisper transcription utilities
│   ├── exports.py                # TXT/SRT export functions
│   │
│   ├── management/
│   │   └── commands/
│   │       └── requeue_stale_transcriptions.py  # Stale task recovery
│   │
│   ├── migrations/               # Database schema versions
│   │   ├── 0001_initial.py
│   │   ├── 0002_video_user.py
│   │   ├── 0003_alter_video_file_alter_video_user.py
│   │   ├── 0004_video_status.py
│   │   ├── 0005_video_segments.py
│   │   └── 0006_fix_upload_to_path.py
│   │
│   ├── templates/                # HTML templates
│   │   ├── base.html             # Navigation, Bootstrap layout
│   │   ├── upload.html           # Video upload form
│   │   ├── video_list.html       # Paginated video gallery
│   │   ├── video_detail.html     # Transcript viewer, download
│   │   └── auth/                 # Authentication templates
│   │       ├── login.html
│   │       ├── register.html
│   │       ├── password_reset.html
│   │       └── lockout.html
│   │
│   ├── static/                   # CSS, JS, fonts
│   │   └── transcription/
│   │       └── favicon.svg
│   │
│   └── tests.py                  # Unit tests
│
├── media/                        # User uploads (not in repo)
│   └── videos/
│
├── .env.example                  # Environment template
├── .python-version               # Python 3.12
├── .gitignore                    # Excluded files
└── README.md                     # This file
- ✅ User registration with password validation
- ✅ Secure password reset via email
- ✅ CSRF tokens on all forms
- ✅ Permission checks (users can only access their own videos)
- ✅ IDOR prevention (returns 404 if accessing others' content)
- ✅ Brute-force protection (5 failed attempts → 24-hour lockout)
- ✅ HTTPS-only cookies in production
- ✅ Secure cookie flags (HttpOnly, SameSite)
- ✅ Security headers (X-Frame-Options=DENY, X-Content-Type-Options=nosniff)
- ✅ SQL injection prevention (parameterized queries via ORM)
- ✅ Environment-based secrets (not in code)
- ✅ File type validation (extension + MIME type)
- ✅ File size limits (500 MB max)
- ✅ Input sanitization on all forms
- ✅ Secure file storage outside web root
- ✅ Graceful error handling (no crashes on deleted records)
- ✅ Task timeouts (1 hour hard, 55-minute soft)
- ✅ Auto-requeue on worker loss
- ✅ Automatic stale task recovery on startup
- ✅ Non-root worker process (nobody:nogroup)
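The task timeout and requeue behavior in the checklist above maps onto standard Celery settings. The fragment below is a sketch of how such values might appear in `settings.py`, not a copy of the project's configuration. It assumes the Celery app loads Django settings with `namespace="CELERY"`, which is what gives the options their `CELERY_` prefix.

```python
# settings.py fragment (sketch). Assumes the Celery app is configured with
# app.config_from_object("django.conf:settings", namespace="CELERY").

CELERY_TASK_TIME_LIMIT = 60 * 60          # hard limit: the task is killed after 1 hour
CELERY_TASK_SOFT_TIME_LIMIT = 55 * 60     # soft limit: an exception is raised inside the task at 55 minutes
CELERY_TASK_ACKS_LATE = True              # acknowledge the message after the task finishes, not before it starts
CELERY_TASK_REJECT_ON_WORKER_LOST = True  # requeue the message if the worker process dies mid-task
```

Keeping the soft limit below the hard limit gives the task a window to mark the video as failed and clean up before it is terminated.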
# Security
SECRET_KEY=your-secret-key-here
# Debug Mode (False in production)
DEBUG=True
# Allowed Hosts
ALLOWED_HOSTS=localhost,127.0.0.1,0.0.0.0
# Database
DB_ENGINE=postgres # or 'sqlite' for development
DB_NAME=video_transcriber
DB_USER=postgres
DB_PASSWORD=postgres
DB_HOST=db # Docker service name
DB_PORT=5432
# Message Broker & Results
CELERY_BROKER_URL=redis://redis:6379/0
CELERY_RESULT_BACKEND=redis://redis:6379/0
# Stale Task Recovery
STALE_PROCESSING_MINUTES=45 # How old before marking as stale
RECOVERY_MODEL_SIZE=small # Model to use for requeue
For password reset emails:
EMAIL_BACKEND=django.core.mail.backends.smtp.EmailBackend
EMAIL_HOST=smtp.gmail.com
EMAIL_PORT=587
EMAIL_USE_TLS=True
EMAIL_HOST_USER=your-email@gmail.com
EMAIL_HOST_PASSWORD=your-app-password
DEFAULT_FROM_EMAIL=noreply@example.com
- GPU Support: Using a CUDA GPU significantly improves transcription speed
- Model Selection:
  - `tiny` (39M params): 5-10x faster, lower accuracy
  - `base` (74M params): 2-5x faster, decent accuracy
  - `small` (244M params): Default, good balance
  - `medium` (769M params): Slower, better accuracy
  - `large` (1.5B params): Very slow, best accuracy
- Long Videos: Automatically chunked (no manual splitting needed)
- Batch Processing: Queue multiple uploads for parallel processing
- Horizontal Scale: Add more worker containers for higher throughput
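- Concurrent Limit: Currently `--pool=solo --concurrency=4` (adjust as needed; note that Celery's solo pool runs one task at a time inside the worker process, so `--concurrency` has no effect while `--pool=solo` is set)
- Database: PostgreSQL handles concurrent access safely
- Cache: The loaded model stays in worker memory; the downloaded model files persist across container restarts via the `whisper_cache` volume, so they are not downloaded again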
- Set `DEBUG=False` in `.env`
- Generate a strong `SECRET_KEY` (use `python -c "from django.core.management.utils import get_random_secret_key; print(get_random_secret_key())"`)
- Configure `ALLOWED_HOSTS` with your domain
- Set up HTTPS (nginx reverse proxy or cloud provider)
- Configure email backend for password resets
- Use strong database password
- Enable PostgreSQL backups
- Monitor worker health and logs
- Set up log aggregation (e.g., CloudWatch, ELK)
- Configure resource limits in docker-compose
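Several checklist items above (`DEBUG`, `ALLOWED_HOSTS`, HTTPS-only cookies) depend on how `settings.py` reads the environment. The following is a minimal sketch of one way to do that with only the standard library. The helper names `env_bool` and `env_list` are hypothetical, and the project's actual settings module may use a different mechanism.

```python
import os


def env_bool(name: str, default: bool = False) -> bool:
    """Interpret common truthy strings ("1", "true", "yes", "on") as True."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def env_list(name: str, default: str = "") -> list[str]:
    """Split a comma-separated variable into a list, dropping blank entries."""
    raw = os.environ.get(name, default)
    return [item.strip() for item in raw.split(",") if item.strip()]


# Fail closed: debug mode stays off unless it is explicitly enabled.
DEBUG = env_bool("DEBUG", default=False)
ALLOWED_HOSTS = env_list("ALLOWED_HOSTS")

# Standard Django settings: send cookies over HTTPS only outside debug mode.
SESSION_COOKIE_SECURE = not DEBUG
CSRF_COOKIE_SECURE = not DEBUG
```

Parsing the boolean explicitly matters because any non-empty string, including `"False"`, is truthy in Python, so a naive `bool(os.environ["DEBUG"])` would leave debug mode on.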
Videos stuck in "Processing" status:
# Manually requeue stale videos
docker-compose exec -T web python manage.py requeue_stale_transcriptions --minutes=0
Worker not picking up tasks:
# Check Celery worker logs
docker-compose logs -f worker
# Restart worker
docker-compose restart worker
Database connection errors:
# Check PostgreSQL is healthy
docker-compose exec db pg_isready -U postgres
# View database service logs
docker-compose logs db
Redis connection issues:
# Verify Redis is accessible
docker-compose exec redis redis-cli ping
# Should return: PONG
Model download stuck:
- First run downloads 461MB model (~70 seconds)
- Model is cached in persistent volume `whisper_cache:/cache/whisper`
- Subsequent runs load from cache (~5 seconds)
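For the "stuck in Processing" case at the top of this troubleshooting list, the staleness test behind `STALE_PROCESSING_MINUTES` and the `--minutes` option could look roughly like this. It is a sketch and not the `requeue_stale_transcriptions` command itself: jobs are plain dictionaries here, and the `updated_at` field and the function name are hypothetical.

```python
from datetime import datetime, timedelta, timezone

STALE_PROCESSING_MINUTES = 45  # mirrors the value shown in the .env example


def find_stale_jobs(
    jobs: list[dict],
    now: datetime,
    minutes: int = STALE_PROCESSING_MINUTES,
) -> list[dict]:
    """Return jobs still marked Processing whose last update is at or before the cutoff."""
    cutoff = now - timedelta(minutes=minutes)
    return [
        job for job in jobs
        if job["status"] == "Processing" and job["updated_at"] <= cutoff
    ]
```

With `minutes=0` the cutoff equals the current time, so every job still marked Processing is selected, which is consistent with using `--minutes=0` to requeue everything that is stuck.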
Run unit tests:
# With Docker
docker-compose exec -T web python manage.py test
# Locally
python manage.py test transcription
Test coverage includes:
- ✅ Authorization (IDOR prevention)
- ✅ AJAX status endpoint
- ✅ Chunked transcription merging
- ✅ Task deletion safety
- ✅ Form validation
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch: `git checkout -b feat/amazing-feature`
- Make your changes and add tests if applicable
- Commit with conventional messages: `git commit -m "feat: description"`
- Push to your fork: `git push origin feat/amazing-feature`
- Open a Pull Request against the `main` branch
# Create local development environment
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Set up local .env with SQLite
cp video_transcriber/.env.example video_transcriber/.env
# Edit to use: DB_ENGINE=sqlite, remove CELERY_* vars for testing
# Run migrations
python manage.py migrate
# Run tests
python manage.py test
# Start development servers (in separate terminals)
redis-server
celery -A video_transcriber worker --loglevel=info
python manage.py runserver
This project is licensed under the MIT License. See the LICENSE file for details.
See CHANGES_DOCUMENTATION.md for detailed documentation of all changes, including:
- Infrastructure & containerization improvements
- Security hardening details
- Task reliability enhancements
- Feature additions and rationale
- Migration guide from previous version
See IMPROVEMENTS.md for planned future enhancements.
- OpenAI Whisper — State-of-the-art speech recognition
- Django — Web framework
- Celery — Async task queue
- PostgreSQL — Reliable database
- Bootstrap — Responsive CSS framework
- All contributors who have helped build and improve this tool
- GitHub: Co-vengers/video_transcriber
- Issues: GitHub Issues
- Team: Co-vengers
- Current Version: 2.0.0 (Production-ready)
- Python: 3.12+
- Django: 5.1.7
- Release Date: March 2026