MediTalk is a research project that adds natural conversational speech capabilities to MultiMeditron, the medical LLM from the EPFL LiGHT Laboratory, enabling voice-based medical interactions.

Prerequisites:
- Python 3.10+
- 48GB+ RAM (see the quick check after this list)
- HuggingFace access token (create one in your HuggingFace account settings)
- Access to meta-llama/Meta-Llama-3.1-8B-Instruct
- Access to ClosedMeditron/Mulimeditron-End2End-CLIP-medical (request from EPFL LiGHT lab)
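
A quick way to confirm the Python and memory requirements (a minimal sketch, assuming a Linux host; `free` is not available on macOS):

```bash
# Quick environment check (Linux): interpreter version and total RAM.
python3 --version                      # expect Python 3.10 or newer
free -g | awk '/Mem:/ {print $2 " GB of RAM"}'
```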
Create a .env file in the project root:

```
HUGGINGFACE_TOKEN=your_token
MULTIMEDITRON_HF_TOKEN=your_token
MULTIMEDITRON_MODEL=ClosedMeditron/Mulimeditron-End2End-CLIP-medical
ORPHEUS_MODEL=canopylabs/orpheus-3b-0.1-ft
WHISPER_MODEL=base
```
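
Optionally, a quick sanity check that every key from the example above made it into `.env`:

```bash
# Should print 5 when all five keys from the example above are present.
grep -cE '^(HUGGINGFACE_TOKEN|MULTIMEDITRON_HF_TOKEN|MULTIMEDITRON_MODEL|ORPHEUS_MODEL|WHISPER_MODEL)=' .env
```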
Set up the environments (first run only):

```
./scripts/setup-local.sh
```

Start all services:

```
./scripts/start-local.sh
```

Access the web interface at http://localhost:8503
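
Once startup finishes, you can confirm that the two user-facing services respond. The ports come from the services table below; no specific API routes are assumed, so this only checks that each server answers HTTP requests:

```bash
# Print the HTTP status code returned by the web UI and the controller.
curl -s -o /dev/null -w "Web UI (8503)     -> HTTP %{http_code}\n" http://localhost:8503
curl -s -o /dev/null -w "Controller (8000) -> HTTP %{http_code}\n" http://localhost:8000
```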
Stop services:

```
./scripts/stop-local.sh
```

Monitor:

```
./scripts/health-check.sh
tail -f logs/controller.log
```

For deployment on the EPFL RCP cluster, please refer to the LiGHT RCP Documentation for setup instructions. After setting up the environment, you can clone this repository and follow the deployment steps above to run MediTalk on the cluster.
| Service | Port | Description |
|---|---|---|
| Controller | 8000 | Orchestrates the LLM, TTS, and STT services |
| Web UI | 8503 | Streamlit interface |
| MultiMeditron | 5009 | Medical AI model |
| Whisper | 5007 | Speech-to-text |
| Orpheus | 5005 | Text-to-speech |
| Bark | 5008 | Text-to-speech |
| CSM | 5010 | Conversational text-to-speech |
| Qwen3-Omni | 5014 | Conversational text-to-speech |
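
If something seems down, you can probe each port from the table above to see which services are actually listening. This is a minimal sketch that relies on bash's built-in `/dev/tcp`, so no extra tools are required:

```bash
# "closed" usually means the service is still loading its model or
# failed to start; check the corresponding log in logs/.
for port in 8000 8503 5009 5007 5005 5008 5010 5014; do
  if (echo > "/dev/tcp/localhost/$port") 2>/dev/null; then
    echo "port $port: open"
  else
    echo "port $port: closed"
  fi
done
```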
Service won't start:

```
tail -f logs/<service>.log
```

Check for errors, missing dependencies or missing tokens.
Note: Some services may take several minutes to load models on first run.
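
To scan every service log at once instead of tailing them one by one, something like this works with the `logs/<service>.log` layout used above:

```bash
# Show the most recent error-like lines across all service logs.
grep -nEi "error|exception|traceback" logs/*.log | tail -n 40
```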
Missing ffmpeg:

```
sudo apt-get update && sudo apt-get install -y ffmpeg
./scripts/restart.sh
```

Model loading fails:
- Verify the HuggingFace token in `.env` (see the check below)
- Check disk space (models are large)
- Review service logs in the `logs/` directory
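
A minimal sketch for the first two checks, assuming the `huggingface_hub` Python package is installed in the active environment and that `.env` contains plain KEY=VALUE lines:

```bash
# Load .env into the shell, then verify the token and gated-model access.
set -a; source .env; set +a
python3 - <<'EOF'
import os
from huggingface_hub import whoami, model_info

token = os.environ["HUGGINGFACE_TOKEN"]
print("Token belongs to:", whoami(token=token)["name"])
# model_info raises an error if this account was not granted access.
print("Model access OK:", model_info(os.environ["MULTIMEDITRON_MODEL"], token=token).id)
EOF
df -h .   # models are large; confirm there is enough free disk space
```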
Project structure:

```
MediTalk/
├── .env                    # Environment configuration
├── scripts/                # Service management scripts
│   ├── setup-local.sh      # Install dependencies
│   ├── start-local.sh      # Start all services
│   ├── stop-local.sh       # Stop all services
│   ├── restart.sh          # Restart services
│   ├── health-check.sh     # Check service health
│   └── monitor-gpus.sh     # GPU monitoring
├── services/
│   ├── controller/         # Orchestration service
│   ├── modelMultiMeditron/ # Medical LLM
│   ├── modelWhisper/       # Speech-to-text
│   ├── modelOrpheus/       # TTS
│   ├── modelBark/          # TTS
│   ├── modelCSM/           # TTS (conversational)
│   ├── modelQwen3Omni/     # TTS (conversational)
│   └── webui/              # Streamlit interface
├── inputs/                 # Input files (conversation JSON files)
├── outputs/                # Generated audio files
└── logs/                   # Service logs
```
Acknowledgments:

- MultiMeditron - EPFL LiGHT Lab
- Orpheus - Canopy Labs
- Bark - Suno AI
- Whisper - OpenAI
- Qwen3-Omni - Alibaba Cloud
- FastAPI & Streamlit
Semester Project | Nicolas Teissier | LiGHT Laboratory | EPFL