A RAG (Retrieval-Augmented Generation) system using LangGraph, Milvus vector database, and OpenAI/Cohere for intelligent document Q&A.
GraphRAG implements a graph-based workflow that stores documents in a vector database, retrieves relevant context using hybrid search, and generates responses using LLMs. The system supports multiple document collections organized by namespaces.
- Document Storage: Vector database with Milvus for scalable document storage
- Hybrid Search: Combines dense embeddings + sparse BM25 vectors for better retrieval
- Reranking: Optional Cohere reranker to improve search result quality
- Graph Workflow: LangGraph orchestrates the retrieval → generation pipeline
- Multi-namespace: Organize documents by project/collection for better isolation
- Summarization: Auto-generate collection summaries for context awareness
- Conversational Memory: Short-term (session) and long-term memory maintain context across turns and sessions, improving coherence for follow-up questions and letting the agent retain important facts or user preferences when appropriate.
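The dense + sparse fusion behind hybrid search can be illustrated with a toy example: cosine similarity over embeddings combined with a BM25-style term-overlap signal via a weighted sum. The helper names and the alpha weight below are illustrative, not part of this library's API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def term_overlap(query_terms, doc_terms):
    """Crude sparse signal: fraction of query terms present in the doc."""
    return len(set(query_terms) & set(doc_terms)) / len(set(query_terms))

def hybrid_score(dense_q, dense_d, query_terms, doc_terms, alpha=0.5):
    """Weighted fusion of dense and sparse scores (alpha is illustrative)."""
    return alpha * cosine(dense_q, dense_d) + (1 - alpha) * term_overlap(query_terms, doc_terms)

# Toy example: doc B is closer in embedding space AND shares the query terms
q = [1.0, 0.0]
doc_a = ([0.0, 1.0], ["milvus", "setup"])
doc_b = ([0.9, 0.1], ["vector", "search"])
score_a = hybrid_score(q, doc_a[0], ["vector", "search"], doc_a[1])
score_b = hybrid_score(q, doc_b[0], ["vector", "search"], doc_b[1])
```

Real hybrid search uses learned sparse/BM25 weights inside Milvus; this sketch only shows why combining the two signals ranks doc B above doc A.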
```
Query → [Store] → Retrieve → [Reranker] → [Agent] → Generate → Response
            ↓                                 ↑
        Milvus DB                         LangGraph
```
The workflow:

1. Store documents with embeddings in Milvus
2. Retrieve relevant docs using hybrid search (dense + sparse)
3. Rerank results with Cohere (optional)
4. Generate response using LLM with retrieved context
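The steps above can be sketched end-to-end with stub components standing in for Milvus, Cohere, and the LLM (every name below is illustrative, not this library's API):

```python
def score(query, text):
    """Stub relevance: count of words shared between query and document."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve(store, query, k=2):
    """Step 2: rank stored docs by relevance and keep the top-k."""
    return sorted(store, key=lambda d: -score(query, d["text"]))[:k]

def rerank(query, docs):
    """Step 3 (optional): a real system would call Cohere here."""
    return sorted(docs, key=lambda d: -score(query, d["text"]))

def generate(query, docs):
    """Step 4: a real system would prompt an LLM with the context."""
    context = "; ".join(d["text"] for d in docs)
    return f"Q: {query} | context: {context}"

# Step 1: "store" documents (a plain list stands in for Milvus)
store = [
    {"text": "Milvus stores dense and sparse vectors"},
    {"text": "LangGraph orchestrates the pipeline"},
    {"text": "Cohere reranks retrieved documents"},
]
docs = rerank("What stores vectors?", retrieve(store, "What stores vectors?"))
answer = generate("What stores vectors?", docs)
```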
Setup:

1. Install dependencies:

   ```shell
   poetry install
   ```

2. Set environment variables:

   ```shell
   OPENAI_API_KEY=your_key
   COHERE_API_KEY=your_key  # Optional
   ```

3. Start Milvus:

   ```shell
   docker run -d -p 19530:19530 milvusdb/milvus:latest
   ```

Quick start:

```python
from graphrag.store import Store
from graphrag.agent import GraphRAG
from langchain_core.documents import Document

# Initialize store
store = Store(
    uri="http://localhost:19530",
    database="my_db",
    collection="docs",
    k=4
)

# Add documents
docs = [Document(
    page_content="Your content",
    metadata={"namespace": "project1", "page_start": 1, "path": "doc.pdf"}
)]
store.add(docs)

# Create agent and query
agent = GraphRAG(store=store, llm="gpt-4o-mini", rerank=True)
result = agent.run("Your question here")
print(f"Answer: {result['response']}")
print(f"Sources: {len(result['context'])} documents")
```

All prompts used by the agent are customizable via PromptsConfig. Default prompts are used when no configuration is provided.
```python
from graphrag import PromptsConfig

custom_prompts = PromptsConfig(
    generate_response="Your generation prompt with {query}, {context}, {memory}",
    evaluate_context="Your evaluation prompt with {query}, {context}",
    refine_query="Your refinement prompt with {history}, {current_question}",
)

agent = GraphRAG(store=store, llm="gpt-4o-mini", prompts=custom_prompts)
```

The Store component manages document storage and retrieval:
- Stores documents with OpenAI embeddings in Milvus
- Supports similarity search with score thresholds
- Generates collection summaries for better context
- Query by metadata filters (namespace, page range, etc.)
The GraphRAG agent provides LangGraph-based workflow orchestration:
- Takes user queries and retrieves relevant documents
- Generates responses using specified LLM (GPT, Claude, etc.)
- Returns complete state with query, context, and response
- Configurable retrieval parameters and reranking
- Maintains conversational memory: short-term (session) context for coherent multi-turn dialogue, plus optional long-term storage that persists selected facts across sessions. Memory is integrated into retrieval and generation so relevant prior exchanges and saved facts surface when producing answers.
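The short-term memory described above can be sketched as a bounded buffer of recent turns that gets rendered into the generation prompt. This is a minimal illustration under assumed names; the agent's actual memory integration is internal.

```python
from collections import deque

class SessionMemory:
    """Keeps only the last max_turns (user, assistant) exchanges."""
    def __init__(self, max_turns=3):
        self.turns = deque(maxlen=max_turns)

    def add(self, user, assistant):
        self.turns.append((user, assistant))

    def as_context(self):
        """Render recent turns for inclusion in the generation prompt."""
        return "\n".join(f"User: {u}\nAssistant: {a}" for u, a in self.turns)

memory = SessionMemory(max_turns=2)
memory.add("What is Milvus?", "A vector database.")
memory.add("Does it scale?", "Yes, horizontally.")
memory.add("What about BM25?", "It provides sparse retrieval.")
# The oldest turn is evicted once the buffer is full
history = memory.as_context()
```

Long-term memory would add a persistence layer (e.g. writing selected facts back to the vector store), which the bounded buffer here deliberately omits.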
The reranker is an optional component for improving retrieval quality:
- Reorders retrieved documents by relevance to query
- Reduces noise and improves answer quality
- Configurable top-N results and models
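Conceptually, the reranker rescores each retrieved document against the query and keeps only the top-N. The sketch below uses a simple word-overlap stand-in for the scoring function; Cohere's actual reranker is a learned cross-encoder.

```python
def overlap_score(query, doc):
    """Stand-in relevance score: shared lowercase words with the query."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def rerank(query, docs, score_fn, top_n=2):
    """Rescore retrieved docs against the query and keep the best top_n."""
    scored = [(score_fn(query, d), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_n]]

docs = [
    "Milvus supports hybrid search",
    "Cats are popular pets",
    "Hybrid search combines dense and sparse retrieval",
]
top = rerank("hybrid search retrieval", docs, overlap_score)
```

Dropping off-topic results before generation is what "reduces noise": the LLM only ever sees the top-N context.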
Document metadata structure:

```python
metadata = {
    "namespace": "project_name",  # Required
    "page_start": 1,
    "page_end": 1,
    "path": "document.pdf"
}
```

Store API:

```python
# Document management
store.add(docs)                       # Add documents to collection
store.drop_collection()               # Remove entire collection

# Retrieval
store.retrieve(query, score=False)    # Basic similarity search
store.retrieve_with_reranker(query)   # Search + reranking
store.query('namespace == "proj"')    # Metadata filtering

# Utilities
store.summarize(model="gpt-4o")       # Generate collection summary
```

Agent API:

```python
# Main workflow
result = agent.run(query)  # Process query end-to-end

# Result structure
{
    "query": "user question",
    "context": [Document, ...],  # Retrieved documents
    "response": "generated answer"
}
```

Environment variables:

- OPENAI_API_KEY: Required for embeddings and LLM
- COHERE_API_KEY: Optional, for reranking functionality

Store configuration:

- uri: Milvus server URI (default: localhost:19530)
- database / collection: Database and collection names
- k: Number of documents to retrieve (default: 4)
- embedding_model: OpenAI embedding model (default: text-embedding-3-small)

Agent configuration:

- llm: Model name (gpt-4o-mini, claude-3-sonnet, etc.)
- rerank: Enable/disable Cohere reranking (default: False)
- prompts: PromptsConfig instance for custom prompts (default: built-in prompts)