A visual speech recognition (VSR) tool that reads your lips in real-time and types whatever you silently mouth.
SilenceVoice leverages state-of-the-art AI to translate visual lip movements into text, making communication accessible for people who are unable to speak and enabling silent dictation in quiet environments.
- Real-Time Lip Reading: Translates silent speech instantly using advanced computer vision.
- AI Correction: Uses Google Gemma-3-27B to refine raw phonetic detections into natural, grammatically correct sentences.
- Text-to-Speech: Built-in functionality to speak the corrected text aloud.
- Privacy-First: The core VSR model runs locally on your machine.
- Modern UI: A clean, accessible interface built with Next.js and Tailwind CSS.
- Framework: Next.js 16 (React 19)
- Styling: Tailwind CSS 4
- Language: TypeScript
- Framework: FastAPI (Python)
- Server: Uvicorn
- Visual Speech Recognition (VSR):
- Large Language Model (LLM):
- Model: Google Gemma-3-27B-IT
- Provider: Telus PaaS
- Role: Contextual correction and sentence refinement.
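The correction role described above can be pictured as a prompt wrapped around the raw detection. A minimal Python sketch — the prompt wording and function name are illustrative assumptions, not the prompt SilenceVoice actually sends to Gemma-3:

```python
def build_correction_prompt(raw_detection: str) -> str:
    """Illustrative prompt for the LLM correction step; the actual
    prompt used by SilenceVoice may differ."""
    return (
        "The following text is raw visual speech recognition output and "
        "may contain phonetic errors. Rewrite it as a natural, "
        "grammatically correct sentence, changing as little as possible:\n\n"
        f"{raw_detection}"
    )
```

The prompt string would then be sent to Gemma-3-27B-IT via the configured provider, and the model's reply used as the refined sentence.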
- Python 3.11+
- Node.js 18+
- Git
Clone the repository:

```bash
git clone https://github.com/your-username/silencevoice.git
cd silencevoice
```

Run the setup script to download the required VSR models (approx. 1GB):

```bash
./setup.sh
```

Install Python dependencies:

```bash
pip install -r requirements.txt
pip install -r backend_requirements.txt
```

Start the backend server:

```bash
python backend/main.py
```

The server will start at http://0.0.0.0:8000.
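Because the backend is FastAPI, it serves an auto-generated OpenAPI schema at /openapi.json (and interactive docs at /docs) by default, which makes a quick liveness check easy. A small stdlib-only sketch:

```python
import json
import urllib.request

def backend_is_up(base_url: str = "http://localhost:8000") -> bool:
    """Return True if the FastAPI backend answers its default
    /openapi.json endpoint with valid JSON, False otherwise."""
    try:
        with urllib.request.urlopen(base_url + "/openapi.json", timeout=2) as resp:
            json.load(resp)  # valid JSON means the API schema is being served
            return True
    except (OSError, ValueError):
        # connection refused, timeout, or non-JSON response
        return False
```

Run it after starting the server to confirm the API is reachable before launching the frontend.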
Open a new terminal window and navigate to the frontend directory:

```bash
cd frontend
npm install
```

Start the frontend application:

```bash
npm run dev
```

The application will be available at http://localhost:3000.
- Open your browser to http://localhost:3000.
- Allow camera access when prompted.
- Click "Start Recognition" to begin the session.
- Speak silently (mouth words without sound) into the camera.
- The VSR model produces a raw detection, which is then refined by Gemma-3.
- The final text will appear on screen.
- Toggle "Text-To-Speech On" to have the system read the text aloud automatically.
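Put together, the usage flow above amounts to: camera frames → raw VSR detection → LLM refinement → optional speech output. A toy Python mock of that pipeline — the function names and the sample correction table are illustrative only, not taken from the repository:

```python
def vsr_detect(frames) -> str:
    """Stand-in for the local VSR model: maps video frames to a raw,
    often phonetically noisy, transcript."""
    return "helo wrld"  # example of raw lip-reading output

def llm_correct(raw: str) -> str:
    """Stand-in for the Gemma-3 refinement step (sentence-level cleanup)."""
    fixes = {"helo": "hello", "wrld": "world"}
    return " ".join(fixes.get(w, w) for w in raw.split()).capitalize()

def pipeline(frames, tts_enabled: bool = False) -> str:
    """End-to-end flow: detect, refine, optionally speak."""
    text = llm_correct(vsr_detect(frames))
    if tts_enabled:
        print(f"[TTS] speaking: {text}")  # real app would call a TTS engine
    return text
```

In the real application the detection runs continuously on the camera stream and the refined text is rendered in the UI; this mock only illustrates the order of the stages.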