A FastAPI application that demonstrates Retrieval-Augmented Generation (RAG) concepts, combining:
- Audio/video transcription via AssemblyAI
- Semantic search over text sections with FAISS
- Optional use of LLaMA-based embeddings or a fake embeddings class
## Features

- Upload or record audio/video, transcribe with AssemblyAI, and get SRT/VTT subtitles.
- Semantic search over the transcribed text or your own documents (markdown-based); a minimal sketch follows this list.
- Modular design following clean-code principles, with separate classes for embeddings, search, and server.
- Flexible embeddings with support for llama.cpp or custom embedding providers.
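The semantic search idea, in minimal form: embed each text section, index the vectors with FAISS, and embed the query the same way to retrieve the nearest sections. This is only a rough sketch, not the project's actual `TermsSearchEngine`; random unit vectors stand in for a real embedding provider:

```python
import numpy as np
import faiss

DIM = 384  # embedding dimensionality (model-dependent)

def embed(texts: list[str]) -> np.ndarray:
    # Toy stand-in for a real embedding provider: random unit vectors.
    rng = np.random.default_rng(0)
    vecs = rng.standard_normal((len(texts), DIM)).astype("float32")
    faiss.normalize_L2(vecs)  # unit length, so inner product = cosine similarity
    return vecs

sections = ["Intro to RAG", "Setting up AssemblyAI", "Running llama.cpp"]
index = faiss.IndexFlatIP(DIM)   # exact inner-product (cosine) search
index.add(embed(sections))       # one vector per text section

# Embed the query the same way and retrieve the 2 nearest sections.
scores, ids = index.search(embed(["how do I transcribe audio?"]), 2)
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {sections[i]}")
```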
## Installation

- Clone this repository:

  ```bash
  git clone https://github.com/boringresearch/rag-demo.git
  cd rag-demo
  ```

- Create and activate a virtual environment (recommended):

  ```bash
  python -m venv venv
  source venv/bin/activate   # on Linux/Mac
  .\venv\Scripts\activate    # on Windows
  ```

- Install requirements (Python <= 3.11):

  ```bash
  pip install -r requirements.txt
  ```
- Configure environment:
  - Copy `.env.example` to `.env`
  - Insert your `ASSEMBLYAI_API_KEY` inside `.env`
  - Configure your embedding API URL in `.env` if using a custom embedding service

- Run the application:

  ```bash
  python src/main.py
  ```
## Usage

- The server listens on http://localhost:8002; open it in your browser.
- Upload an audio/video file or choose an example to see the transcription.
- Type a query in the search box to perform semantic search over the transcribed content or custom text.
## Setting Up the Embedding Server

This project uses llama.cpp for generating embeddings by default. To set up the embedding server:

- Install llama.cpp following the instructions in their repository
- Download a compatible model (e.g., a GGUF-format model)
- Run the llama-server with embeddings enabled:

  ```bash
  ./llama-server -m model-f16.gguf --embeddings -c 512 -ngl 99 --host 0.0.0.0
  ```

- Update your `.env` file with the correct embedding API URL (default: `http://localhost:8080`); an example `.env` is sketched below
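A `.env` along these lines should work. `ASSEMBLYAI_API_KEY` and `EMBEDDING_PROVIDER` appear elsewhere in this README; the URL variable name `EMBEDDING_API_URL` and the `llama` provider value are assumptions, so confirm the exact keys against `.env.example`:

```env
# AssemblyAI key used for transcription
ASSEMBLYAI_API_KEY=your-assemblyai-key-here

# Embeddings implementation; "fake" is mentioned in Troubleshooting below
# (the name of the llama.cpp-backed provider is assumed here)
EMBEDDING_PROVIDER=llama

# Assumed variable name for the llama.cpp server address
EMBEDDING_API_URL=http://localhost:8080
```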
## Embedding API Format

The llama.cpp server expects embedding requests in the following format:

```
POST /embedding
{
  "content": "text to embed"
}
```

The response will contain the embedding vector:
```
{
  "embedding": [0.123, 0.456, ...]
}
```
## Troubleshooting

If you encounter issues with the embedding API:
- Check that the llama-server is running with the `--embeddings` flag
- Verify that the API URL in your `.env` file matches the server address
- Test the API directly using curl:

  ```bash
  curl -X POST http://localhost:8080/embedding \
    -H "Content-Type: application/json" \
    -d '{"content":"test text"}'
  ```
- Check the server logs for any error messages
- Try the FakeEmbeddings provider for testing by setting `EMBEDDING_PROVIDER=fake` in your `.env` file
## Custom Embedding Providers

The project is designed to make it easy to switch between embedding providers:

- Create a new class that implements the `EmbeddingsBase` interface in `src/embeddings/` (a rough sketch follows this list)
- Update the `TermsSearchEngine` initialization in `src/server/app.py` to use your custom embeddings class
- Alternatively, set the `EMBEDDING_PROVIDER` environment variable to switch between the implemented providers
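For illustration only: the methods `EmbeddingsBase` actually declares live in `src/embeddings/base.py` and aren't reproduced in this README, so the `embed` name, signature, and 384-dimension output below are assumptions to mirror against the real interface:

```python
import numpy as np

from src.embeddings.base import EmbeddingsBase  # interface per the layout below

class MyEmbeddings(EmbeddingsBase):
    """Hypothetical custom provider; swap the body for a real API call."""

    def embed(self, texts: list[str]) -> np.ndarray:
        # Placeholder: random unit vectors stand in for real embeddings.
        rng = np.random.default_rng(0)
        vecs = rng.standard_normal((len(texts), 384)).astype("float32")
        return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
```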
## Project Structure

```
rag-demo/
├── LICENSE
├── README.md
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── requirements.txt
├── .env.example
├── src/
│ ├── main.py
│ ├── server/
│ │ ├── __init__.py
│ │ └── app.py
│ ├── embeddings/
│ │ ├── __init__.py
│ │ ├── base.py
│ │ ├── fake.py
│ │ └── llama.py
│ ├── search/
│ │ ├── __init__.py
│ │ └── terms_search_engine.py
│ └── templates/
│ └── index.html
├── static/
├── cache/
├── examples/
└── uploads/
```

## License

This project is licensed under the MIT License.