Agentic document Q&A system powered by LangGraph, Qdrant, and Claude.
Ingest — upload PDFs, DOCX, or Markdown files. Documents are parsed, adaptively chunked by content type, embedded with FastEmbed, and stored in Qdrant with rich metadata.
Query — ask natural-language questions. A LangGraph ReAct agent autonomously searches, reasons across documents, and synthesizes grounded answers with citations — all streamed in real time.
Predecessor: docwise — same parsing/chunking/embedding layers, upgraded with a full LangGraph agentic pipeline.
- Autonomous agent — LangGraph state graph decides how many searches to run, when to ask for clarification, and when it has enough context to answer
- Multi-hop retrieval — agent can issue multiple `search_documents` and `get_document_chunks` calls across a single question
- Hybrid search — dense semantic search (FastEmbed) + BM25 keyword search fused with Reciprocal Rank Fusion (RRF)
- Real-time streaming — SSE streams each step (query plan, tool calls, tool results, answer) to the UI as it happens
- Multi-turn conversation — LangGraph `MemorySaver` checkpointer persists thread state across turns
- Human-in-the-loop — agent can pause and ask the user a clarifying question before continuing
- Grounded answers with citations — every answer references the source chunks it was drawn from
- MCP server — exposes 4 tools over HTTP/SSE so Claude Desktop can query your documents directly
- Document formats — PDF, DOCX, and Markdown supported out of the box
- Observability — LangSmith traces every LLM call, tool call, and multi-turn thread automatically
| Layer | Technology |
|---|---|
| LLM | Anthropic Claude or OpenAI (claude-sonnet-4-6 default) |
| Agent framework | LangGraph (state graph + MemorySaver checkpointer) |
| Vector store | Qdrant (Docker service, horizontally scalable) |
| Embeddings | FastEmbed (local, no API key needed) |
| Keyword search | BM25 via rank-bm25 |
| API | FastAPI + Server-Sent Events (SSE) |
| MCP server | FastMCP over HTTP/SSE |
| UI | Streamlit |
| Observability | LangSmith + structlog |
| Infra | Docker Compose |
| Language | Python 3.12 |
DocWise Agentic implements the ReAct pattern — the agent alternates between reasoning (deciding what to do) and acting (executing a tool), using the result of each action to inform the next reasoning step.
QueryUnderstanding → [ Reason → Act → Reason → Act → ... ] → AnswerSynthesis
| Step | Node | What happens |
|---|---|---|
| Pre-reason | `query_understanding` | Classifies query type (factual, analytical, multi_hop, conversational), reformulates the query for better retrieval, flags if clarification is needed |
| Reason | `agent_reason` | LLM receives the question + all accumulated context and decides: call a tool, ask clarification, or stop and answer |
| Act | `tool_executor` | Executes the chosen tool, appends results to state |
| Loop | `tool_executor` → `agent_reason` | Repeats until the LLM sets `done=True` or `MAX_AGENT_ITERATIONS` is reached |
| HitL | `clarification` | If the agent calls `ask_clarification`, the graph interrupts and waits for the user's response before resuming the loop |
| Synthesize | `answer_synthesis` | Produces a grounded answer with citations from all retrieved chunks |
The key difference from a fixed RAG pipeline: the agent decides at runtime how many searches to run and which documents to dig into, rather than following a predetermined retrieval path.
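The reason/act cycle above can be sketched in plain Python. This is a conceptual illustration only, not the project's LangGraph code; `llm_decide` and `run_tool` are hypothetical stand-ins for the LLM call and the tool dispatcher:

```python
MAX_AGENT_ITERATIONS = 5  # mirrors the config guard against infinite loops

def react_loop(question, llm_decide, run_tool):
    """Minimal ReAct loop: reason, act, repeat until done or budget is spent."""
    state = {"question": question, "tool_results": [], "iterations": 0, "done": False}
    while not state["done"] and state["iterations"] < MAX_AGENT_ITERATIONS:
        decision = llm_decide(state)  # Reason: LLM picks the next action
        if decision["action"] == "answer":
            state["done"] = True      # enough context: stop and synthesize
        else:
            result = run_tool(decision["action"], decision.get("args", {}))
            state["tool_results"].append(result)  # Act: accumulate context
        state["iterations"] += 1
    return state

# Tiny demo with stand-in callables: answer after two tool calls.
def _decide(state):
    if len(state["tool_results"]) < 2:
        return {"action": "search_documents", "args": {"query": state["question"]}}
    return {"action": "answer"}

def _tool(name, args):
    return {"tool": name, "chunks": ["..."]}

final = react_loop("What does RRF do?", _decide, _tool)
```

Each iteration feeds the accumulated `tool_results` back into the next reasoning step, which is what lets the agent decide at runtime when to stop searching.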
Each question goes through a LangGraph state graph. The agent runs a tool-use loop — at every iteration the LLM receives the original question, the reformulated query, and all context accumulated so far, then decides what to do next. It keeps calling tools until it has enough context to answer, or until MAX_AGENT_ITERATIONS is reached.
Tool selection is done by the LLM — each tool has a description and JSON schema that tells the model when to use it:
| Tool | When the agent uses it |
|---|---|
| `search_documents` | Any factual question — hybrid semantic + BM25 search across all ingested docs |
| `get_document_chunks` | When it needs the full text of a document or more context around a specific passage |
| `list_documents` | When the question is about what documents exist, or to discover doc IDs before a targeted search |
| `ask_clarification` | When the query is genuinely ambiguous and cannot be answered confidently — used sparingly |
Once the LLM decides it has sufficient context it stops calling tools and the graph transitions to AnswerSynthesis, which produces a grounded answer with citations. The ask_clarification tool triggers a human-in-the-loop pause — the agent waits for the user's response before continuing.
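Descriptions and schemas like these drive tool selection. Below is an illustrative sketch of what a tool definition might look like in the Anthropic Messages API `tools` format; the exact field descriptions and parameters are assumptions, not the project's actual definitions:

```python
# Illustrative tool definition (Anthropic Messages API "tools" format).
search_documents_tool = {
    "name": "search_documents",
    "description": (
        "Hybrid semantic + BM25 search across all ingested documents. "
        "Use for any factual question about document content."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Search query, reformulated for retrieval",
            },
            "doc_id": {
                "type": "string",
                "description": "Optional: restrict the search to one document",
            },
            "top_k": {
                "type": "integer",
                "description": "Number of chunks to return",
                "default": 5,
            },
        },
        "required": ["query"],
    },
}
```

The model never calls code directly: it emits a tool-use request matching this schema, and `tool_executor` runs the implementation and appends the result to state.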
What's New - DocWise Agentic vs. DocWise
| | DocWise | DocWise Agentic |
|---|---|---|
| Query pipeline | Fixed 3-step: retrieve → rank → answer | LangGraph state graph — agent decides tool calls |
| Retrieval | One hybrid search call | Multi-hop: agent can search multiple times |
| Conversation | Single-turn | Multi-turn with thread persistence |
| Vector store | ChromaDB (embedded) | Qdrant (Docker service, horizontally scalable) |
| Streaming | No | SSE — streams query plan, tool calls, answer |
| Human-in-the-loop | No | Clarification node — agent can ask user for more info |
| Observability | structlog only | LangSmith traces run/trace/thread automatically |
```text
Streamlit UI (:8511)
      │ HTTP + SSE
      ▼
FastAPI Backend (:8010)
      │
      ├─ Document Pipeline (from docwise)
      │     ParserRegistry → AdaptiveChunker → FastEmbed → Qdrant
      │
      └─ Agentic Query Pipeline (new)
            LangGraph State Graph
            QueryUnderstanding → AgentReason ──► search_documents
                                     ├──► get_document_chunks
                                     ├──► ask_clarification (HitL)
                                     └──► AnswerSynthesis → END

MCP Server (:8011/sse) ← Claude Desktop
Qdrant (:6333)
```
State is managed at two levels:
Within-turn — AgentState
A single TypedDict that flows through every node in the graph for the duration of one question. Each node reads from it and writes back into it. Chunks retrieved across multiple tool calls accumulate in retrieved_chunks so the final synthesis node always has the full picture.
| Field group | Fields | Purpose |
|---|---|---|
| Input | `question`, `thread_id` | The user's question and conversation identifier |
| Query plan | `query_plan` | Query type, reformulated query, intent — set by QueryUnderstanding |
| Tool-use loop | `tool_calls`, `tool_results` | Every tool invocation and its result, accumulated across iterations |
| Retrieved context | `retrieved_chunks` | All chunks collected across all `search_documents` / `get_document_chunks` calls |
| Output | `answer`, `citations` | Final answer text and source citations |
| HitL | `clarification`, `clarification_response` | Question the agent wants to ask, and the user's reply |
| Loop control | `iterations`, `done`, `error` | Guards the ReAct loop and signals completion |
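A sketch of what such a state object might look like, with field names taken from the table above; the exact value types are assumptions, not the project's `state.py`:

```python
from typing import Optional, TypedDict

class AgentState(TypedDict, total=False):
    # Input
    question: str
    thread_id: str
    # Query plan (set by QueryUnderstanding)
    query_plan: dict
    # Tool-use loop, accumulated across iterations
    tool_calls: list[dict]
    tool_results: list[dict]
    # Retrieved context for the synthesis node
    retrieved_chunks: list[dict]
    # Output
    answer: str
    citations: list[dict]
    # Human-in-the-loop
    clarification: Optional[str]
    clarification_response: Optional[str]
    # Loop control
    iterations: int
    done: bool
    error: Optional[str]

# At runtime a TypedDict is a plain dict, so nodes can read and write it freely.
state = AgentState(question="Hi", thread_id="t1", iterations=0, done=False)
```

`total=False` lets nodes fill fields incrementally: `query_understanding` sets `query_plan`, the loop grows `tool_results`, and `answer_synthesis` writes `answer` and `citations` at the end.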
Cross-turn — MemorySaver checkpointer
LangGraph serializes and persists the full AgentState after every node, keyed by thread_id. When the user sends a follow-up question in the same thread, the graph resumes from the last checkpoint — giving the agent full context of prior questions, tool calls, and answers without any extra code.
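Conceptually, the checkpointer is a per-thread snapshot store. The toy below is a simplified pure-Python illustration of that idea, not langgraph's actual `MemorySaver` implementation:

```python
import copy

class InMemoryCheckpointer:
    """Toy stand-in for a checkpointer: snapshot state per thread after each node."""

    def __init__(self):
        self._checkpoints = {}  # thread_id -> list of state snapshots

    def save(self, thread_id, state):
        # Deep-copy so later mutations don't rewrite saved history.
        self._checkpoints.setdefault(thread_id, []).append(copy.deepcopy(state))

    def latest(self, thread_id):
        snapshots = self._checkpoints.get(thread_id)
        return copy.deepcopy(snapshots[-1]) if snapshots else None

# Turn 1 ends; a follow-up in the same thread resumes from the last snapshot.
checkpointer = InMemoryCheckpointer()
checkpointer.save("thread-1", {"question": "What is RRF?", "answer": "..."})
resumed = checkpointer.latest("thread-1")
```

The real checkpointer is passed once at graph compile time and invoked with a `thread_id` in the run config, which is why multi-turn memory needs no extra application code.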
Three search modes
| Mode | How it works | When used |
|---|---|---|
| Semantic | Dense vector ANN via Qdrant — cosine similarity against BAAI/bge-small-en-v1.5 embeddings | Conceptual or paraphrased queries |
| Keyword | BM25 (Okapi) in-memory index rebuilt after every ingestion — exact and partial term matching | Precise terms, names, IDs |
| Hybrid | RRF fusion of semantic + keyword — default for all agent tool calls | All queries via search_documents |
Hybrid ranking with RRF
Both lists are fetched at 2× top_k before fusion. Each chunk gets a Reciprocal Rank Fusion score:
RRF score = 1 / (k + rank_semantic) + 1 / (k + rank_keyword)
Chunks that rank well in both lists are promoted; chunks with a zero BM25 score are dropped. The final list is re-sorted by RRF score and trimmed to top_k.
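The fusion step can be sketched in a few lines of pure Python. This is a minimal sketch: `k=60` is the conventional RRF constant (the project's actual constant is an assumption here), and the zero-BM25 drop described above is left to the caller:

```python
def rrf_fuse(semantic_ids, keyword_ids, top_k, k=60):
    """Fuse two ranked lists of chunk IDs with Reciprocal Rank Fusion."""
    scores = {}
    # Each list contributes 1 / (k + rank) for every chunk it contains;
    # chunks present in both lists sum both contributions and get promoted.
    for rank, cid in enumerate(semantic_ids, start=1):
        scores[cid] = scores.get(cid, 0.0) + 1.0 / (k + rank)
    for rank, cid in enumerate(keyword_ids, start=1):
        scores[cid] = scores.get(cid, 0.0) + 1.0 / (k + rank)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]

# "b" appears in both lists, so it outranks the semantic-only top hit "a".
fused = rrf_fuse(["a", "b", "c"], ["b", "d"], top_k=3)
```

Because the formula only uses ranks, not raw scores, RRF sidesteps the problem of calibrating cosine similarities against BM25 scores.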
Grounding
The AnswerSynthesis node is explicitly instructed to answer using only the retrieved chunks — no external knowledge. If the context is insufficient, the agent says so. Every answer is accompanied by structured citations that include source_file, page_number, and section_title, traceable back to the exact chunk in Qdrant.
Documents are not split with a fixed character window. The AdaptiveChunker inspects each parsed element's type and applies the appropriate strategy, preserving the structure and meaning of the original document.
| Element type | Strategy |
|---|---|
| `table`, `code`, `image` | Preserved whole — never split, structure must not be broken |
| `heading` | Merged with the immediately following text block so the chunk carries its section context |
| `text` (long) | Recursive split — tries paragraph → sentence → word → character boundaries in priority order, with configurable overlap between chunks |
| `list` | Item-aware split — groups items until `chunk_size` is reached, never splits mid-item |
This means a code block or table always lands in Qdrant as a single retrievable unit, and every text chunk knows which section heading it came from via the section_title metadata field.
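The dispatch and the recursive text strategy can be sketched as follows. This is a simplified illustration, not the project's `AdaptiveChunker`: it omits the configurable overlap and heading merge, and the separator list and `chunk_element` signature are assumptions:

```python
SEPARATORS = ["\n\n", ". ", " "]  # paragraph, then sentence, then word boundaries

def split_text(text, chunk_size):
    """Recursive split: use the coarsest boundary that yields small-enough pieces."""
    if len(text) <= chunk_size:
        return [text]
    for sep in SEPARATORS:
        if sep in text:
            parts, chunks, current = text.split(sep), [], ""
            for part in parts:
                candidate = current + sep + part if current else part
                if len(candidate) <= chunk_size:
                    current = candidate
                else:
                    if current:
                        chunks.append(current)
                    current = part
            if current:
                chunks.append(current)
            # Recurse on any piece still too large; it falls to a finer separator.
            return [c for chunk in chunks for c in split_text(chunk, chunk_size)]
    # Last resort: hard character split.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def chunk_element(element, chunk_size=500):
    """Dispatch by element type, mirroring the strategies in the table above."""
    if element["type"] in ("table", "code", "image"):
        return [element["content"]]  # never split structured elements
    return split_text(element["content"], chunk_size)
```

For example, `split_text("aa bb cc dd", 5)` splits on word boundaries into `["aa bb", "cc dd"]`, while a table element passes through `chunk_element` untouched as a single chunk.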
Every chunk stored in Qdrant carries the following payload fields:
| Field | Type | Description |
|---|---|---|
| `doc_id` | string | UUID assigned at ingestion — used to scope searches to a single document |
| `source_file` | string | Original filename — shown in citations |
| `element_type` | string | Content type: text, heading, table, code, list, or image |
| `section_title` | string | Nearest heading above this chunk — provides structural context |
| `page_number` | int | Page number from the source file (-1 if not applicable) |
| `chunk_index` | int | Position within the document (0-based) — used to reconstruct reading order |
| `content` | string | Raw chunk text — used for BM25 keyword search and returned in results |
doc_id, source_file, element_type, page_number, and chunk_index are indexed in Qdrant for fast filtered queries. The agent uses doc_id to restrict a search_documents call to a specific document when the question targets one file.
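A doc_id-scoped search against Qdrant's REST search endpoint (`POST /collections/{name}/points/search`) would carry a payload filter like the one below. This is a sketch of Qdrant's filter JSON with placeholder values; the project issues the equivalent through qdrant-client:

```python
# Request body for Qdrant's points search endpoint.
# The vector and doc_id values are placeholders, not real data.
search_body = {
    "vector": [0.12, -0.05, 0.33],  # query embedding (truncated placeholder)
    "limit": 5,
    "with_payload": True,
    "filter": {
        "must": [
            # Payload-index match on doc_id scopes the ANN search to one document.
            {"key": "doc_id", "match": {"value": "doc-uuid-here"}}
        ]
    },
}
```

Because `doc_id` is a payload-indexed field, Qdrant applies the filter during the ANN search rather than post-filtering results, so scoped queries stay fast.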
```bash
cp .env.example .env
# Add your ANTHROPIC_API_KEY to .env
cd docker
docker-compose up --build
```

| Service | URL |
|---|---|
| Streamlit UI | http://localhost:8511 |
| FastAPI docs | http://localhost:8010/docs |
| MCP SSE | http://localhost:8011/sse |
| Qdrant UI | http://localhost:6333/dashboard |
Terminal 1 — Start Qdrant

```bash
docker run -p 6333:6333 qdrant/qdrant
```

Terminal 2 — Start backend

```bash
cd backend
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp ../.env.example .env   # add ANTHROPIC_API_KEY
python main.py            # FastAPI on :8010
```

Terminal 3 — Start frontend

```bash
cd frontend
bash start.sh   # Streamlit on :8511
```

MCP server (optional, separate terminal)

```bash
cd backend
source .venv/bin/activate
python -m docwise_mcp.server   # MCP SSE on :8011
```

Copy .env.example to .env and set:
| Variable | Required | Default | Notes |
|---|---|---|---|
| `ANTHROPIC_API_KEY` | ✅ (Anthropic) | — | Claude API key |
| `OPENAI_API_KEY` | ✅ (OpenAI) | — | OpenAI API key |
| `LLM_PROVIDER` | — | `anthropic` | `anthropic` or `openai` |
| `LLM_MODEL` | — | `claude-sonnet-4-6` | Any model for the chosen provider (e.g. `gpt-4o`) |
| `EMBEDDING_PROVIDER` | — | `fastembed` | `fastembed` (local, no key needed) or `openai` |
| `QDRANT_HOST` | — | `localhost` | `qdrant` in Docker |
| `MAX_AGENT_ITERATIONS` | — | `5` | Guard against infinite loops |
| `LANGSMITH_API_KEY` | — | — | Optional — enables tracing |
| `LANGSMITH_TRACING` | — | `false` | Set `true` to enable |
| Method | Endpoint | Description |
|---|---|---|
| POST | `/ingest/file` | Upload + parse + store a document |
| GET | `/documents` | List all ingested documents |
| GET | `/documents/{doc_id}/chunks` | All chunks for a document |
| DELETE | `/documents/{doc_id}` | Delete a document |
| POST | `/query` | Agentic Q&A — SSE streaming |
| POST | `/query/sync` | Agentic Q&A — wait for full answer |
| POST | `/query/search` | Direct hybrid search (no agent) |
| GET | `/health` | Health check |
Interactive docs at http://localhost:8010/docs.
DocWise Agentic exposes 4 tools via MCP over HTTP/SSE:
| Tool | Description |
|---|---|
| `list_documents` | List all ingested documents |
| `search_documents` | Hybrid semantic + BM25 search |
| `ask_question` | Full agentic Q&A via LangGraph |
| `ingest_document` | Ingest a local file by path |
Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):
```json
{
  "mcpServers": {
    "docwise-agentic": {
      "url": "http://localhost:8011/sse"
    }
  }
}
```

You can use DocWise Agentic as a client three ways:
- Streamlit UI — web app with Upload, Chat, Explorer tabs
- FastAPI — REST API + Swagger at `/docs`
- Claude Desktop — via MCP — ask Claude to query your documents directly
Enable in .env:
```bash
LANGSMITH_API_KEY=your_key
LANGSMITH_TRACING=true
LANGSMITH_PROJECT=docwise-agentic
```
LangSmith automatically captures three levels:
| Level | What it tracks |
|---|---|
| Run | Each LLM call — tokens, latency, input/output |
| Trace | Full agent invocation — all nodes, total cost |
| Thread | Multi-turn session — all traces grouped by thread_id |
```text
docwise-agentic/
├── backend/
│   ├── agents/
│   │   ├── graph.py               # LangGraph state graph
│   │   ├── state.py               # AgentState TypedDict
│   │   ├── tools.py               # Tool schemas + implementations
│   │   ├── llm.py                 # Anthropic client singleton
│   │   └── nodes/
│   │       ├── query_understanding.py
│   │       ├── agent_reason.py
│   │       ├── tool_executor.py
│   │       ├── answer_synthesis.py
│   │       └── clarification.py
│   ├── api/
│   │   ├── app.py                 # FastAPI factory + lifespan
│   │   └── routes/
│   │       ├── ingest.py
│   │       ├── documents.py
│   │       └── query.py           # SSE streaming
│   ├── vectorstore/
│   │   ├── schema.py              # Qdrant collection schema
│   │   └── qdrant.py              # Qdrant adapter (semantic + BM25 + RRF)
│   ├── docwise_mcp/
│   │   └── server.py              # FastMCP HTTP/SSE server
│   ├── parsing/                   # inherited from docwise
│   ├── chunking/                  # inherited from docwise
│   ├── embeddings/                # inherited from docwise
│   ├── observability.py           # LangSmith setup
│   ├── config.py
│   ├── logger.py
│   └── main.py
├── frontend/
│   └── app.py                     # Streamlit — Upload + Chat + Explorer
├── docker/
│   ├── Dockerfile.backend
│   ├── Dockerfile.frontend
│   └── docker-compose.yml
└── docs/
    ├── REQUIREMENTS.md
    ├── ARCHITECTURE.md
    ├── PLAN.md
    └── DESCRIPTION.md
```
MIT — see LICENSE.
langgraph rag agentic-ai qdrant anthropic claude langsmith streamlit fastapi mcp multi-turn sse python