
SilenceVoice

A visual speech recognition (VSR) tool that reads your lips in real-time and types whatever you silently mouth.

SilenceVoice uses state-of-the-art AI to translate visual lip movements into text, making communication accessible for people who cannot speak and enabling silent dictation in quiet environments.

🚀 Features

  • Real-Time Lip Reading: Translates silent speech instantly using advanced computer vision.
  • AI Correction: Uses Google Gemma-3-27B to refine raw phonetic detections into natural, grammatically correct sentences.
  • Text-to-Speech: Built-in functionality to speak the corrected text aloud.
  • Privacy-First: The core VSR model runs locally on your machine.
  • Modern UI: A clean, accessible interface built with Next.js and Tailwind CSS.

🛠️ Technology Stack

Frontend

  • Framework: Next.js
  • Styling: Tailwind CSS

Backend

  • Framework: FastAPI (Python)
  • Server: Uvicorn

AI & Machine Learning

  • Visual Speech Recognition (VSR):
    • Based on Auto-AVSR architecture.
    • Trained on the LRS3 (Lip Reading Sentences 3) dataset.
    • Libraries: torch, torchvision, torchaudio, mediapipe (for face tracking), opencv-python.
  • Large Language Model (LLM):
    • Google Gemma-3-27B, used to refine raw VSR output into natural, grammatically correct sentences.
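To illustrate how the LLM fits into the stack: the VSR model emits a raw, often phonetically garbled transcription, which is wrapped in a correction instruction before being sent to Gemma-3-27B. The prompt wording below is a hypothetical sketch, not the project's actual prompt:

```python
def build_correction_prompt(raw_vsr_text: str) -> str:
    """Wrap raw VSR output in a correction instruction for the LLM.

    Hypothetical illustration of the correction step; the real prompt
    used by SilenceVoice may differ.
    """
    return (
        "The text below was produced by a lip-reading (VSR) model and may "
        "contain phonetic errors. Rewrite it as a natural, grammatically "
        "correct sentence that preserves the intended meaning.\n\n"
        f"Raw transcription: {raw_vsr_text}"
    )
```

The returned string would then be sent to Gemma-3-27B, and the model's reply displayed as the final text.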
📦 Installation

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • Git

1. Clone the Repository

```bash
git clone https://github.com/your-username/silencevoice.git
cd silencevoice
```

2. Setup Backend & Models

Run the setup script to download the required VSR models (approx. 1GB):

```bash
./setup.sh
```

Install Python dependencies:

```bash
pip install -r requirements.txt
pip install -r backend_requirements.txt
```

Start the Backend Server:

```bash
python backend/main.py
```

The server listens on http://0.0.0.0:8000 (all network interfaces, port 8000).

3. Setup Frontend

Open a new terminal window and navigate to the frontend directory:

```bash
cd frontend
npm install
```

Start the Frontend Application:

```bash
npm run dev
```

The application will be available at http://localhost:3000

🎮 Usage

  1. Open your browser to http://localhost:3000.
  2. Allow camera access when prompted.
  3. Click "Start Recognition" to begin the session.
  4. Speak silently (mouth words without sound) into the camera.
  5. The VSR model converts your lip movements into a raw transcription, which Gemma-3 then refines.
  6. The final text will appear on screen.
  7. Toggle "Text-To-Speech On" to have the system read the text aloud automatically.
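The steps above form a simple pipeline: camera frames go through the VSR model, the raw transcription is corrected by the LLM, and the result is optionally spoken aloud. A minimal sketch of that flow, where every function name is a hypothetical placeholder rather than the project's actual API:

```python
from typing import Callable, Iterable, Optional


def run_pipeline(
    frames: Iterable,                      # camera frames of the speaker's mouth
    vsr_model: Callable[[Iterable], str],  # frames -> raw phonetic transcription
    corrector: Callable[[str], str],       # raw text -> corrected sentence (LLM)
    speak: Optional[Callable[[str], None]] = None,  # optional TTS callback
) -> str:
    """One recognition pass: VSR -> LLM correction -> optional TTS.

    Hypothetical sketch of the flow described in the Usage steps; the
    real project structures this differently.
    """
    raw_text = vsr_model(frames)      # raw detection from the VSR model
    final_text = corrector(raw_text)  # refined by the LLM
    if speak is not None:             # "Text-To-Speech On" toggle
        speak(final_text)             # read the final text aloud
    return final_text                 # text shown on screen
```

For example, `run_pipeline(frames, model, correct, speak=tts.say)` would transcribe, correct, and speak one utterance.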

📄 License

MIT
