DocChat is a production-ready multi-agent RAG (Retrieval-Augmented Generation) document Q&A system powered by IBM Watsonx, LangGraph, and Docling.
Upload a document (PDF, DOCX, TXT, or MD) and ask questions — DocChat's three-agent LangGraph workflow handles relevance checking, answer research, and factual verification automatically, looping back to re-research whenever verification fails.
Domain: RAG + Multi-Agent AI — Document Intelligence
LLMs: Meta Llama 3.3 70B (Research) + IBM Granite 3.3 8B (Relevance)
Embeddings: IBM Slate 125M English Retriever v2
Document Parser: Docling
UI: Gradio
┌──────────────────────────────────────────────────────────┐
│ Gradio Web UI (app.py) │
│ Upload PDF/DOCX/TXT/MD + Enter Question │
│ Session state caching (SHA-256 file hashes) │
└─────────────────────┬────────────────────────────────────┘
│
┌─────────────────────▼────────────────────────────────────┐
│ DocumentProcessor (Docling) │
│ Docling → Markdown → MarkdownHeaderTextSplitter │
│ SHA-256 content hashing → 7-day disk cache (.pkl) │
└─────────────────────┬────────────────────────────────────┘
│
┌─────────────────────▼────────────────────────────────────┐
│ Hybrid Retriever (BM25 + Vector) │
│ BM25Retriever (weight 0.4) │
│ ChromaDB + IBM Slate 125M embeddings (weight 0.6) │
│ EnsembleRetriever combines both │
└─────────────────────┬────────────────────────────────────┘
│
┌─────────────────────▼────────────────────────────────────┐
│ LangGraph AgentWorkflow (3 Agents) │
│ │
│ ┌───────────────────┐ │
│ │ RelevanceChecker │ IBM Granite 3.3 8B │
│ │ CAN_ANSWER / │ Classifies doc-query match │
│ │ PARTIAL / NO_MATCH│ │
│ └────────┬──────────┘ │
│ │ relevant → research │
│ │ irrelevant → END │
│ ┌────────▼──────────┐ │
│ │ ResearchAgent │ Llama 3.3 70B Instruct │
│ │ Generates draft │ Context-grounded answers │
│ │ answer from docs │ │
│ └────────┬──────────┘ │
│ │ │
│ ┌────────▼──────────┐ │
│ │ VerificationAgent │ Checks factual accuracy │
│ │ Supported: YES/NO│ vs source documents │
│ └────────┬──────────┘ │
│ │ verified → END │
│ │ failed → re_research (loop back) │
└──────────────────────────────────────────────────────────┘
docchat/
│
├── app.py # Gradio UI + session state management
├── agents/
│ ├── workflow.py # LangGraph AgentWorkflow (3 agents)
│ ├── research_agent.py # ResearchAgent — Llama 3.3 70B
│ ├── verification_agent.py # VerificationAgent — factual check
│ └── relevance_checker.py # RelevanceChecker — Granite 3.3 8B
├── document_processor/
│ └── file_handler.py # Docling parser + chunking + caching
├── retriever/
│ └── builder.py # Hybrid BM25 + ChromaDB retriever
├── config/
│ ├── settings.py # Pydantic settings (env vars)
│ └── constants.py # File size limits + allowed types
├── utils/
│ └── logging.py # Logger setup
├── document_cache/ # SHA-256 hashed .pkl cache files
├── test/ # Test PDFs + OCR test cases
└── requirements.txt
| Component | Technology |
|---|---|
| Agent Orchestration | LangGraph (StateGraph) |
| Research LLM | Meta Llama 3.3 70B Instruct (IBM Watsonx) |
| Relevance LLM | IBM Granite 3.3 8B Instruct (IBM Watsonx) |
| Embeddings | IBM Slate 125M English Retriever v2 (Watsonx) |
| Document Parser | Docling → Markdown |
| Text Splitter | MarkdownHeaderTextSplitter (H1, H2) |
| Vector Store | ChromaDB |
| Keyword Retriever | BM25Retriever |
| Hybrid Retrieval | EnsembleRetriever (BM25 0.4 + Vector 0.6) |
| UI | Gradio (Citrus theme + custom CSS/JS) |
| Caching | SHA-256 + Pickle (7-day expiry) |
| Config | Pydantic BaseSettings + .env |
# Classifies document-query relevance
response → "CAN_ANSWER" | "PARTIAL" | "NO_MATCH"
# Routing:
CAN_ANSWER / PARTIAL → proceed to ResearchAgent
NO_MATCH → return "Question not related to uploaded documents"# Generates grounded answers from retrieved context
model_id = "meta-llama/llama-3-3-70b-instruct"
params = {"max_tokens": 300, "temperature": 0.3}
# Context = combined top document chunks
context = "\n\n".join([doc.page_content for doc in documents])# Checks factual accuracy of draft answer vs source documents
# Reports: "Supported: YES/NO" + "Relevant: YES/NO"
# If verification fails → triggers re_research loop
"Supported: NO" or "Relevant: NO" → re_research
else → END
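And a sketch of how the verification report could be produced and turned into that routing decision; the report format follows the description above, while the function name and prompt are assumptions (pass in whichever Watsonx client the project uses for verification):

```python
# Illustrative sketch: check the draft against the sources and decide the next hop.
def verify_answer(llm, question: str, draft_answer: str, documents) -> dict:
    sources = "\n\n".join(doc.page_content for doc in documents)
    prompt = (
        "Given the sources and a draft answer, reply with exactly two lines:\n"
        "Supported: YES or NO\nRelevant: YES or NO\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\n\nDraft answer:\n{draft_answer}"
    )
    report = llm.invoke(prompt).content
    passed = "Supported: YES" in report and "Relevant: YES" in report
    return {"verification_report": report, "next": "end" if passed else "re_research"}
```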
class AgentState(TypedDict):
    question: str
    documents: List[Document]
    draft_answer: str
    verification_report: str
    is_relevant: bool
    retriever: EnsembleRetriever
# Node connections
workflow.set_entry_point("check_relevance")
workflow.add_conditional_edges("check_relevance", decide, {
"relevant": "research",
"irrelevant": END
})
workflow.add_edge("research", "verify")
workflow.add_conditional_edges("verify", decide_next, {
"re_research": "research", # Loop back if verification fails
"end": END
})# BM25 — keyword-based retrieval (weight 0.4)
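Putting the pieces together, a minimal sketch of how the compiled graph could be built and invoked; the node functions and routers (check_relevance_node, research_node, verify_node, decide, decide_next) are assumed to be defined in workflow.py, and the edge wiring repeats the snippet above:

```python
# Illustrative sketch: assemble, compile, and invoke the three-agent graph.
from langgraph.graph import StateGraph, END

workflow = StateGraph(AgentState)
workflow.add_node("check_relevance", check_relevance_node)  # RelevanceChecker
workflow.add_node("research", research_node)                # ResearchAgent
workflow.add_node("verify", verify_node)                    # VerificationAgent

workflow.set_entry_point("check_relevance")
workflow.add_conditional_edges("check_relevance", decide, {"relevant": "research", "irrelevant": END})
workflow.add_edge("research", "verify")
workflow.add_conditional_edges("verify", decide_next, {"re_research": "research", "end": END})

app = workflow.compile()
final_state = app.invoke({
    "question": "What was the regional average CFE in Asia Pacific in 2023?",
    "documents": [],
    "draft_answer": "",
    "verification_report": "",
    "is_relevant": False,
    "retriever": hybrid_retriever,
})
print(final_state["draft_answer"])
```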
# BM25 — keyword-based retrieval (weight 0.4)
bm25 = BM25Retriever.from_documents(docs)
# ChromaDB — semantic vector retrieval (weight 0.6)
vector_store = Chroma.from_documents(
    documents=docs,
    embedding=WatsonxEmbeddings(model_id="ibm/slate-125m-english-rtrvr-v2")
)
vector_retriever = vector_store.as_retriever(search_kwargs={"k": 10})  # k from VECTOR_SEARCH_K
# Ensemble — combines both for best coverage
hybrid_retriever = EnsembleRetriever(
    retrievers=[bm25, vector_retriever],
    weights=[0.4, 0.6]
)
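At query time the ensemble behaves like any LangChain retriever; the sample question below is illustrative:

```python
# Merge keyword and semantic hits for one question; both retrievers contribute results.
docs_for_question = hybrid_retriever.invoke(
    "Retrieve the data center PUE values for the Singapore facility."
)
for doc in docs_for_question:
    print(doc.metadata, doc.page_content[:80])
```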
# Docling parses PDF/DOCX/TXT/MD → Markdown
converter = DocumentConverter()
result = converter.convert(file_path)
markdown_text = result.document.export_to_markdown()
# Split on Markdown headers
splitter = MarkdownHeaderTextSplitter(headers_to_split_on=[("#", "Header 1"), ("##", "Header 2")])
chunks = splitter.split_text(markdown_text)
# SHA-256 content caching (7-day expiry)
file_hash = hashlib.sha256(file_bytes).hexdigest()
cache_path = f"document_cache/{file_hash}.pkl"class Settings(BaseSettings):
class Settings(BaseSettings):
    CHROMA_DB_PATH: str = "./chroma_db"
    VECTOR_SEARCH_K: int = 10
    HYBRID_RETRIEVER_WEIGHTS: list = [0.4, 0.6]
    CACHE_DIR: str = "document_cache"
    CACHE_EXPIRE_DAYS: int = 7

    class Config:
        env_file = ".env"
Step 1 — Install dependencies:

pip install -r requirements.txt

Step 2 — Set environment variables:
# .env file
OPENAI_API_KEY=your-key
WATSONX_API_KEY=your-key

Step 3 — Launch DocChat:
python app.py
# Opens at http://127.0.0.1:5000

Step 4 — Use the UI:
- Upload PDF/DOCX/TXT/MD file
- Enter your question
- Click Submit → get a grounded, verified answer! (a minimal Gradio sketch of this flow follows below)
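For readers curious how the upload-and-ask flow maps to Gradio, a stripped-down sketch; component labels and the answer_question handler are illustrative, not app.py's actual layout:

```python
# Illustrative sketch: a minimal upload + question + answer interface.
import gradio as gr

def answer_question(file, question):
    # In DocChat this hashes the uploaded file, checks the document cache,
    # builds the hybrid retriever, and runs the LangGraph workflow.
    return f"Answer for {question!r} grounded in {file.name}"

with gr.Blocks() as demo:
    file_input = gr.File(label="Upload PDF/DOCX/TXT/MD")
    question_box = gr.Textbox(label="Your question")
    answer_box = gr.Markdown()
    gr.Button("Submit").click(answer_question, [file_input, question_box], answer_box)

demo.launch()
```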
| Document | Sample Query |
|---|---|
| Google 2024 Environmental Report | "Retrieve the data center PUE efficiency values in Singapore 2nd facility in 2019 and 2022. Also retrieve regional average CFE in Asia Pacific in 2023" |
| DeepSeek-R1 Technical Report | "Summarize DeepSeek-R1 model's performance evaluation on all coding tasks against OpenAI o1-mini model" |
- Multi-agent RAG pipeline with LangGraph StateGraph
- Hybrid retrieval — BM25 + ChromaDB + IBM Watsonx embeddings
- Docling document parsing (PDF, DOCX, TXT, MD → Markdown)
- 3 specialized IBM Watsonx agents — relevance, research, verification
- Meta Llama 3.3 70B + IBM Granite 3.3 8B integration
- IBM Slate 125M embedding model for semantic search
- Conditional re-research loop when verification fails
- SHA-256 content hashing for intelligent document caching
- Session state management in Gradio
- Pydantic BaseSettings for configuration management
- Production-ready architecture with modular design
| Certification | Issuer | Platform |
|---|---|---|
| IBM Data Science Professional Certificate | IBM | Coursera |
| IBM Generative AI Professional Certificate | IBM | Coursera |
| IBM Agentic AI with RAG Certificate | IBM | Coursera |
| IBM RAG and Agentic AI Professional Certificate | IBM | Coursera |