Welcome to the VidLingo project! This platform is designed to automate the process of video translation and dubbing using a modular, AI-powered pipeline.
The entire VidLingo suite is managed via Docker Compose, allowing for easy, modular execution of each service. All generated media will appear in the local downloads folder.
- Build All Services: Builds the Docker images for all modules.
    docker-compose build
- Run the Downloader (Module 1): Downloads a video from a given URL into the shared ./downloads folder.
    docker-compose run yt-downloader "https://www.youtube.com/watch?v=your-video-id"
- Run the Transcriber (Module 2): Scans the ./downloads folder for videos and generates a timestamped transcription JSON file.
    docker-compose run transcriber
- Run the Translator (Module 3): Scans the ./downloads folder for transcription files and translates them using the configured cloud AI. You can specify the target language using the TARGET_LANGUAGE environment variable. Note: This module requires a GEMINI_API_KEY to be set in a .env file at the project root. See .env.example for format.
    docker-compose run --env TARGET_LANGUAGE=French translator
    # Or for Polish (default):
    # docker-compose run translator
- Run the TTS (Module 4): Generates a new audio track from the translated text using Microsoft Edge's text-to-speech engine, mixes it with the original background audio, and remuxes it into a final video.
    docker-compose run tts
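The translator step picks its target language from the TARGET_LANGUAGE environment variable and falls back to Polish when it is unset. A minimal Python sketch of that lookup follows. The function name resolve_target_language is hypothetical, and the actual translator service may read the variable differently.

```python
import os

# Default language, as stated in the translator usage notes.
DEFAULT_LANGUAGE = "Polish"


def resolve_target_language(env=None):
    """Return the requested target language, or the default if none is set.

    `env` is any mapping (os.environ by default); passing it explicitly
    makes the lookup easy to exercise without changing the real environment.
    """
    env = os.environ if env is None else env
    value = env.get("TARGET_LANGUAGE", "").strip()
    # An empty or whitespace-only value is treated the same as "unset".
    return value or DEFAULT_LANGUAGE


if __name__ == "__main__":
    print(resolve_target_language())
```

Treating a blank value as unset is a design choice in this sketch: it avoids sending an empty language name to the translation backend.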
If you want to process a local video file (instead of downloading from YouTube):
- Place your video file (e.g., my_local_video.mp4) directly into the C:\VidLingo\downloads folder on your host machine.
- Skip the yt-downloader step.
- Start the pipeline from the transcriber service:
    docker-compose run transcriber
    docker-compose run translator
    docker-compose run tts
The transcriber will automatically find your local video file in the downloads folder; the translator and tts commands then complete the dubbing process.
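For repeated local runs, the three commands can be wrapped in a small script. The following Python sketch builds the same docker-compose invocations in order and stops at the first failure. The names LOCAL_STAGES, build_commands, and run_pipeline are hypothetical helpers and are not part of VidLingo.

```python
import subprocess

# Stages for a local file; yt-downloader is skipped because the video
# is already in the downloads folder.
LOCAL_STAGES = ["transcriber", "translator", "tts"]


def build_commands(target_language=None):
    """Build one `docker-compose run` command (as an argument list) per stage."""
    commands = []
    for stage in LOCAL_STAGES:
        cmd = ["docker-compose", "run"]
        # Only the translator takes the optional language override.
        if stage == "translator" and target_language:
            cmd += ["--env", f"TARGET_LANGUAGE={target_language}"]
        cmd.append(stage)
        commands.append(cmd)
    return commands


def run_pipeline(target_language=None, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    for cmd in build_commands(target_language):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, so later stages do not
            # run on top of a failed earlier stage.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(target_language="French", dry_run=True)
```

With dry_run=True the script only prints the commands, which is a safe way to confirm the order before running the containers.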
This project is built with a modular, service-oriented architecture. Each service has its own README file for detailed information.
- Status: ✅ Complete
- Description: A containerized Python service for media acquisition. More details in its local README.
- Status: ✅ Complete
- Description: A high-performance transcription service using faster-whisper on CPU. More details in its local README.
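As a rough illustration of CPU transcription with faster-whisper, the Python sketch below writes timestamped segments to a JSON file. The model size, the int8 compute type, and the start/end/text JSON layout are assumptions made for this example; the transcriber's actual settings and output schema are documented in its own README.

```python
import json


def segments_to_records(segments):
    """Convert segment objects (with start, end, text attributes) to dicts."""
    return [
        {
            "start": round(seg.start, 2),
            "end": round(seg.end, 2),
            "text": seg.text.strip(),
        }
        for seg in segments
    ]


def transcribe_to_json(media_path, json_path, model_size="small"):
    """Transcribe a media file on CPU and write timestamped segments as JSON."""
    # Imported lazily so segments_to_records works without faster-whisper.
    from faster_whisper import WhisperModel

    # int8 is a common choice for CPU inference (assumed here, not confirmed
    # as the project's setting).
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(media_path)
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(segments_to_records(segments), fh, ensure_ascii=False, indent=2)
```

Keeping the serialization in its own function lets the JSON layout be checked without loading a model.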
- Status: ✅ Complete
- Description: A cloud-native translation service using Google's Gemini API for dubbing-ready text. More details in its local README.
- Configuration: Requires a GEMINI_API_KEY in a .env file at the project root.
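Before starting the translator, it can help to confirm that the key is actually present. The Python sketch below parses a simple KEY=VALUE file and checks for GEMINI_API_KEY. It assumes a plain .env layout with one assignment per line; read_env_file and has_gemini_key are hypothetical helpers, not part of the translator service.

```python
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_gemini_key(path=".env"):
    """Return True if the file defines a non-empty GEMINI_API_KEY."""
    return bool(read_env_file(path).get("GEMINI_API_KEY"))


if __name__ == "__main__":
    if not has_gemini_key():
        print("GEMINI_API_KEY is missing from .env; see .env.example")
```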
- Status: ✅ Complete
- Description: The final module, responsible for synthesizing dubbed audio using Microsoft Edge's TTS engine and mixing it into the final video. More details in its local README.
- Backend: Python 3.11
- AI / ML: faster-whisper, Google Gemini API, Microsoft Edge TTS
- Containerization: Docker, Docker Compose
- Core Libraries: yt-dlp, pydub, FFmpeg
- Automation: Git
This project is in its initial development phase. Contribution guidelines will be established as the project matures.
