Multi-provider agentic RAG built with LangGraph and LangChain.
This project indexes a set of source documents, caches the vector index locally, and answers questions with a retrieval-first workflow that can rewrite queries, call retrieval as a tool, and generate grounded answers with citations.
- LangGraph-based agentic RAG flow
- Multi-provider chat model support
- Multi-provider embedding support
- OpenAI-compatible endpoint support
- Local vector index cache in
.cache/vectorstores - CLI with step-by-step execution tracing
- Source-aware answers with citations
googleopenaiopenai-compatibleanthropiclitellm
googleopenaiopenai-compatiblelitellm
git clone git@github.com:scldrn/RAGmain.git
cd RAGmain
python3 -m venv .venv
source .venv/bin/activate
pip install -e '.[dev]'Create a .env file with your provider config. Minimal Google AI Studio example:
CHAT_PROVIDER=google
CHAT_MODEL=gemini-2.5-flash
EMBEDDING_PROVIDER=google
EMBEDDING_MODEL=gemini-embedding-2-preview
INDEX_CACHE_DIR=.cache/vectorstores
GOOGLE_API_KEY=your_api_keyRun a query:
python -m agentic_rag --question "What does Lilian Weng say about reward hacking?"Run with trace output:
python -m agentic_rag \
--question "What does Lilian Weng say about reward hacking?" \
--show-stepsOpenAI:
CHAT_PROVIDER=openai
CHAT_MODEL=gpt-4.1-mini
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
OPENAI_API_KEY=your_api_keyAnthropic chat + OpenAI embeddings:
CHAT_PROVIDER=anthropic
CHAT_MODEL=claude-3-5-sonnet-20241022
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
ANTHROPIC_API_KEY=your_api_key
OPENAI_API_KEY=your_api_keyOpenAI-compatible local endpoint:
CHAT_PROVIDER=openai-compatible
CHAT_MODEL=local-model-name
CHAT_API_BASE=http://localhost:1234/v1
CHAT_API_KEY=lm-studio
EMBEDDING_PROVIDER=openai-compatible
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_API_BASE=http://localhost:1234/v1
EMBEDDING_API_KEY=lm-studioCommon options:
--question--url--chat-provider--chat-model--embedding-provider--embedding-model--chat-api-base--embedding-api-base--show-steps--diagram
Show full help:
python -m agentic_rag --help- Load and split source documents.
- Build or reuse a cached vector index.
- Let the graph decide whether to answer directly or retrieve.
- Retrieve relevant chunks as a tool call.
- Grade retrieved context.
- Rewrite the question if retrieval quality is weak.
- Generate a final cited answer.
The current default setup indexes a small set of Lilian Weng blog posts so the project works immediately as a reference RAG example.
You can add your own sources at runtime:
python -m agentic_rag \
--url https://example.com/doc1 \
--url https://example.com/doc2 \
--question "Summarize the main ideas"Run tests:
pytestMain files:
src/agentic_rag/providers.pysrc/agentic_rag/app.pysrc/agentic_rag/graph.pysrc/agentic_rag/cli.pysrc/agentic_rag/settings.py
- The first run for a new source/config combination builds embeddings and writes a cache file.
- Later runs reuse the cached vector index and are much cheaper/faster.
- Python
3.10+is recommended even though the project currently runs on3.9.