Mubarraqqq/STS_Robot
🗣️ Speech-to-Speech Retrieval-Augmented Assistant (Bruce)



Bruce is an intelligent voice assistant pipeline that listens, understands, retrieves, reasons, and speaks — enabling seamless speech-to-speech interaction powered by Retrieval-Augmented Generation (RAG), Whisper-based transcription, and wake-word activation via Porcupine.


🚀 Features

  • 🎙️ Wake Word Detection: Trigger interaction with the phrase “Hey Bruce” using Picovoice Porcupine.
  • 🧠 RAG-based Reasoning: Context-aware generation using an embedded vector database (faiss_index.idx) with chunked document retrieval.
  • 🗣️ Speech-to-Text + Text-to-Speech: Uses Whisper for transcription and pyttsx3 (or compatible engine) for natural voice synthesis.
  • Real-time Interaction: Listens for user input, queries a document corpus, and returns spoken responses in a human-like loop.

🧱 Project Structure

├── S2S_v8.py / S2S_v9.py # Main assistant logic
├── .env # Environment variables (API keys, config)
├── requirements.txt # Python dependencies
├── doc_chunks.pkl # Pickled document chunks for retrieval
├── embeddings.npy # Pre-computed document embeddings
├── faiss_index.idx # FAISS index for fast similarity search
├── Hey-Bruce_en_windows_v3_0_0.ppn # Porcupine wake word model
├── info.txt # Text corpus the RAG system retrieves from


🛠️ Setup Instructions

1. Set up Python environment

pip install -r requirements.txt

2. Add .env file

Create a .env file and add required environment variables:

TOGETHER_API_KEY=your_api_key
PORCUPINE_API_KEY=your_api_key
GROQ_API_KEY=your_api_key
OPEN_API=your_api_key

3. Run the assistant

python S2S_v9.py

🧠 How It Works

🔊 Wake Word Detection

Listens for the wake word “Hey Bruce” using a .ppn model via Porcupine.
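A minimal sketch of this loop, assuming the Picovoice access key lives in the PORCUPINE_API_KEY variable from the .env example above; the pvporcupine and sounddevice calls are those libraries' real APIs, but the frames helper and audio plumbing are illustrative, not the repo's exact code:

```python
# Hedged sketch of the wake-word loop, not the repo's exact implementation.
import os
import struct

def frames(pcm, frame_length):
    """Split a flat list of PCM samples into fixed-size frames, dropping any tail."""
    for i in range(0, len(pcm) - frame_length + 1, frame_length):
        yield pcm[i:i + frame_length]

def wait_for_wake_word():  # requires a microphone and a Picovoice access key
    import pvporcupine
    import sounddevice as sd
    porcupine = pvporcupine.create(
        access_key=os.environ["PORCUPINE_API_KEY"],
        keyword_paths=["Hey-Bruce_en_windows_v3_0_0.ppn"],
    )
    stream = sd.RawInputStream(samplerate=porcupine.sample_rate, channels=1,
                               dtype="int16", blocksize=porcupine.frame_length)
    with stream:
        while True:
            buf, _ = stream.read(porcupine.frame_length)
            pcm = struct.unpack_from("h" * porcupine.frame_length, buf)
            if porcupine.process(pcm) >= 0:
                porcupine.delete()
                return  # "Hey Bruce" detected
```

Porcupine consumes fixed-size 16-bit mono frames at porcupine.sample_rate, which is why the stream's blocksize is tied to porcupine.frame_length.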

🗣️ Speech-to-Text (STT)

Captures and transcribes user speech using OpenAI’s Whisper.
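The capture-then-transcribe step could look like the sketch below; the 5-second recording window and the "base" Whisper model size are assumptions, not settings taken from the repo:

```python
# Hedged sketch: record a fixed-length utterance, save it, transcribe with Whisper.
import wave
import numpy as np

SAMPLE_RATE = 16_000  # Whisper works on 16 kHz mono audio

def save_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write mono int16 samples to a WAV file that Whisper can read."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)            # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(np.asarray(samples, dtype=np.int16).tobytes())

def transcribe_once(seconds=5):  # needs a microphone and openai-whisper installed
    import sounddevice as sd
    import whisper
    audio = sd.rec(int(seconds * SAMPLE_RATE), samplerate=SAMPLE_RATE,
                   channels=1, dtype="int16")
    sd.wait()                         # block until recording finishes
    save_wav(audio.ravel(), "utterance.wav")
    model = whisper.load_model("base")
    return model.transcribe("utterance.wav")["text"]
```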

🔍 Contextual Retrieval

Searches document embeddings using FAISS to find the most relevant context from doc_chunks.pkl.

🧠 Language Generation

Sends the query plus retrieved context to a language model (e.g., a LLaMA model hosted on Together AI) and returns the response.
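The query + context step can be sketched as below; the prompt template is an assumption, and the endpoint and model name merely illustrate Together AI's OpenAI-compatible chat API, not the repo's exact call:

```python
# Hedged sketch of the RAG generation step.
import os

def build_prompt(question, context_chunks):
    """Combine the retrieved chunks and the user's question into one prompt."""
    context = "\n\n".join(context_chunks)
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {question}\nAnswer:")

def generate_answer(question, context_chunks):  # needs TOGETHER_API_KEY set
    import requests
    resp = requests.post(
        "https://api.together.xyz/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
        json={"model": "meta-llama/Llama-3-8b-chat-hf",  # illustrative model name
              "messages": [{"role": "user",
                            "content": build_prompt(question, context_chunks)}]},
        timeout=60,
    )
    return resp.json()["choices"][0]["message"]["content"]
```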

🔁 Text-to-Speech (TTS)

Converts the generated response back to voice using pyttsx3.
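The pyttsx3 calls below are that library's real API; the sentence splitter is an illustrative helper for smoother playback, not code from the repo:

```python
# Hedged TTS sketch: speak the response sentence by sentence.
import re

def split_sentences(text):
    """Break a long response into sentences so playback is easier to follow."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def speak(text):  # needs a working audio backend (SAPI5, espeak, or NSSpeech)
    import pyttsx3
    engine = pyttsx3.init()
    engine.setProperty("rate", 175)   # words per minute; default varies by engine
    for sentence in split_sentences(text):
        engine.say(sentence)
    engine.runAndWait()               # blocks until speech finishes
```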


📦 Dependencies

Key libraries (see requirements.txt):

  • openai-whisper
  • pyttsx3
  • faiss-cpu
  • python-dotenv
  • numpy
  • pickle (Python standard library; no install needed)
  • sounddevice
  • pvporcupine (Porcupine wake word detection)

🧪 Example Use Case

You: "Hey Bruce, what is retrieval-augmented generation?"
Assistant: (searches documents)
Response: "Retrieval-Augmented Generation (RAG) is a method that improves language models by providing external context from a document database..."


📍 Roadmap

  • Add multilingual support
  • Improve response latency via local quantized models
  • Add GUI interface
  • Deploy as web API for Raspberry Pi/Edge use

🔐 Disclaimer

  • Ensure you set up your own personalized Porcupine wake-word model (.ppn file) for the phrase "Hey Bruce".
  • Also make sure an info.txt file is present; it serves as the text-corpus knowledge base for the RAG system.

🤝 Acknowledgements


📜 License

This project is licensed under the MIT License. See LICENSE for more information.
