
DocWise Agentic

Agentic document Q&A system powered by LangGraph, Qdrant, and Claude.

Ingest — upload PDFs, DOCX, or Markdown files. Documents are parsed, adaptively chunked by content type, embedded with FastEmbed, and stored in Qdrant with rich metadata.

Query — ask natural-language questions. A LangGraph ReAct agent autonomously searches, reasons across documents, and synthesizes grounded answers with citations — all streamed in real time.

Predecessor: docwise. DocWise Agentic reuses the same parsing, chunking, and embedding layers and adds a full LangGraph agentic pipeline on top.


Features

  • Autonomous agent — LangGraph state graph decides how many searches to run, when to ask for clarification, and when it has enough context to answer
  • Multi-hop retrieval — agent can issue multiple search_documents and get_document_chunks calls across a single question
  • Hybrid search — dense semantic search (FastEmbed) + BM25 keyword search fused with Reciprocal Rank Fusion (RRF)
  • Real-time streaming — SSE streams each step (query plan, tool calls, tool results, answer) to the UI as it happens
  • Multi-turn conversation — LangGraph MemorySaver checkpointer persists thread state across turns
  • Human-in-the-loop — agent can pause and ask the user a clarifying question before continuing
  • Grounded answers with citations — every answer references the source chunks it was drawn from
  • MCP server — exposes 4 tools over HTTP/SSE so Claude Desktop can query your documents directly
  • Document formats — PDF, DOCX, and Markdown supported out of the box
  • Observability — LangSmith traces every LLM call, tool call, and multi-turn thread automatically

Tech Stack

| Layer | Technology |
|---|---|
| LLM | Anthropic Claude or OpenAI (claude-sonnet-4-6 default) |
| Agent framework | LangGraph (state graph + MemorySaver checkpointer) |
| Vector store | Qdrant (Docker service, horizontally scalable) |
| Embeddings | FastEmbed (local, no API key needed) |
| Keyword search | BM25 via rank-bm25 |
| API | FastAPI + Server-Sent Events (SSE) |
| MCP server | FastMCP over HTTP/SSE |
| UI | Streamlit |
| Observability | LangSmith + structlog |
| Infra | Docker Compose |
| Language | Python 3.12 |

ReAct Pattern (Reasoning + Acting)

DocWise Agentic implements the ReAct pattern — the agent alternates between reasoning (deciding what to do) and acting (executing a tool), using the result of each action to inform the next reasoning step.

QueryUnderstanding → [ Reason → Act → Reason → Act → ... ] → AnswerSynthesis
| Step | Node | What happens |
|---|---|---|
| Pre-reason | query_understanding | Classifies the query type (factual, analytical, multi_hop, conversational), reformulates the query for better retrieval, and flags whether clarification is needed |
| Reason | agent_reason | The LLM receives the question plus all accumulated context and decides: call a tool, ask for clarification, or stop and answer |
| Act | tool_executor | Executes the chosen tool and appends results to state |
| Loop | tool_executor → agent_reason | Repeats until the LLM sets done=True or MAX_AGENT_ITERATIONS is reached |
| HitL | clarification | If the agent calls ask_clarification, the graph interrupts and waits for the user's response before resuming the loop |
| Synthesize | answer_synthesis | Produces a grounded answer with citations from all retrieved chunks |

The key difference from a fixed RAG pipeline: the agent decides at runtime how many searches to run and which documents to dig into, rather than following a predetermined retrieval path.
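The runtime loop can be sketched in plain Python. The names here (`decide`, `synthesize`, the shape of the action dict) are illustrative stand-ins for the agent_reason and answer_synthesis nodes, not the project's actual API:

```python
MAX_AGENT_ITERATIONS = 5  # mirrors the config guard against infinite loops

def react_answer(question, decide, tools, synthesize):
    """Minimal ReAct loop: reason (decide) -> act (tool call) -> repeat.

    decide(question, context) returns either {"done": True} or
    {"tool": <name>, "args": <kwargs>}. In the real system 'decide'
    is one LLM call per iteration (the agent_reason node).
    """
    context = []
    for _ in range(MAX_AGENT_ITERATIONS):
        action = decide(question, context)        # Reason
        if action.get("done"):
            break
        result = tools[action["tool"]](**action["args"])  # Act
        context.append(result)                    # accumulate for synthesis
    return synthesize(question, context)          # answer_synthesis
```

The point of the sketch is the control flow: the number of tool calls is decided at runtime by `decide`, not fixed in advance.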


How the Agent Works

Each question goes through a LangGraph state graph. The agent runs a tool-use loop — at every iteration the LLM receives the original question, the reformulated query, and all context accumulated so far, then decides what to do next. It keeps calling tools until it has enough context to answer, or until MAX_AGENT_ITERATIONS is reached.

Tool selection is done by the LLM — each tool has a description and JSON schema that tells the model when to use it:

| Tool | When the agent uses it |
|---|---|
| search_documents | Any factual question — hybrid semantic + BM25 search across all ingested docs |
| get_document_chunks | When it needs the full text of a document or more context around a specific passage |
| list_documents | When the question is about what documents exist, or to discover doc IDs before a targeted search |
| ask_clarification | When the query is genuinely ambiguous and cannot be answered confidently — used sparingly |

Once the LLM decides it has sufficient context it stops calling tools and the graph transitions to AnswerSynthesis, which produces a grounded answer with citations. The ask_clarification tool triggers a human-in-the-loop pause — the agent waits for the user's response before continuing.
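For illustration, a search_documents definition in the Anthropic tool-use wire format (name, description, input_schema) might look like the following; the parameter names are assumptions, not the project's actual schema:

```python
# Sketch of one tool definition in the Anthropic tool-use format.
# The description text is what steers the model's tool choice.
search_documents_tool = {
    "name": "search_documents",
    "description": (
        "Hybrid semantic + BM25 search across all ingested documents. "
        "Use for any factual question about document content."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query"},
            "doc_id": {
                "type": "string",
                "description": "Optional: restrict the search to one document",
            },
            "top_k": {"type": "integer", "description": "Number of chunks to return"},
        },
        "required": ["query"],
    },
}
```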


What's New: DocWise Agentic vs. DocWise

| | DocWise | DocWise Agentic |
|---|---|---|
| Query pipeline | Fixed 3-step: retrieve → rank → answer | LangGraph state graph — agent decides tool calls |
| Retrieval | One hybrid search call | Multi-hop: agent can search multiple times |
| Conversation | Single-turn | Multi-turn with thread persistence |
| Vector store | ChromaDB (embedded) | Qdrant (Docker service, horizontally scalable) |
| Streaming | No | SSE — streams query plan, tool calls, answer |
| Human-in-the-loop | No | Clarification node — agent can ask the user for more info |
| Observability | structlog only | LangSmith traces run/trace/thread automatically |

Architecture

```
Streamlit UI (:8511)
      │  HTTP + SSE
      ▼
FastAPI Backend (:8010)
      │
      ├─ Document Pipeline (from docwise)
      │     ParserRegistry → AdaptiveChunker → FastEmbed → Qdrant
      │
      └─ Agentic Query Pipeline (new)
            LangGraph State Graph
              QueryUnderstanding → AgentReason ──► search_documents
                                              ├──► get_document_chunks
                                              ├──► ask_clarification (HitL)
                                              └──► AnswerSynthesis → END

MCP Server (:8011/sse)  ←  Claude Desktop
Qdrant (:6333)
```

Screenshots

Upload — Parse and ingest documents


Chat — Streaming agent steps, answer, and citations


Chat — Cross-document synthesis with multiple tool calls


Explorer — Browse chunks and inspect metadata



State & Memory Management

State is managed at two levels:

Within-turn — AgentState

A single TypedDict that flows through every node in the graph for the duration of one question. Each node reads from it and writes back into it. Chunks retrieved across multiple tool calls accumulate in retrieved_chunks, so the final synthesis node always has the full picture.

| Field group | Fields | Purpose |
|---|---|---|
| Input | question, thread_id | The user's question and conversation identifier |
| Query plan | query_plan | Query type, reformulated query, intent — set by QueryUnderstanding |
| Tool-use loop | tool_calls, tool_results | Every tool invocation and its result, accumulated across iterations |
| Retrieved context | retrieved_chunks | All chunks collected across all search_documents / get_document_chunks calls |
| Output | answer, citations | Final answer text and source citations |
| HitL | clarification, clarification_response | The question the agent wants to ask, and the user's reply |
| Loop control | iterations, done, error | Guards the ReAct loop and signals completion |
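The field groups above map naturally onto a TypedDict. A hedged sketch (the value types are guesses; the real definition lives in backend/agents/state.py):

```python
from typing import TypedDict

class AgentState(TypedDict, total=False):
    # Input
    question: str
    thread_id: str
    # Query plan (set by query_understanding)
    query_plan: dict
    # Tool-use loop, accumulated across iterations
    tool_calls: list[dict]
    tool_results: list[dict]
    # Retrieved context from all search/get_chunks calls
    retrieved_chunks: list[dict]
    # Output
    answer: str
    citations: list[dict]
    # Human-in-the-loop
    clarification: str
    clarification_response: str
    # Loop control
    iterations: int
    done: bool
    error: str
```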

Cross-turn — MemorySaver checkpointer

LangGraph serializes and persists the full AgentState after every node, keyed by thread_id. When the user sends a follow-up question in the same thread, the graph resumes from the last checkpoint — giving the agent full context of prior questions, tool calls, and answers without any extra code.
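The resume behaviour can be illustrated with a toy checkpoint store keyed by thread_id. This is a teaching sketch of the idea, not LangGraph's actual MemorySaver implementation:

```python
# Toy thread-keyed checkpoint store. Real MemorySaver checkpoints after
# every node and handles serialization; this only shows the keying.
class ToyCheckpointer:
    def __init__(self):
        self._checkpoints = {}

    def save(self, thread_id, state):
        # Snapshot the state for this thread; last write wins.
        self._checkpoints[thread_id] = dict(state)

    def load(self, thread_id):
        # A follow-up question in the same thread resumes from here;
        # a new thread starts from an empty state.
        return dict(self._checkpoints.get(thread_id, {}))
```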


Retrieval & Grounding

Three search modes

| Mode | How it works | When used |
|---|---|---|
| Semantic | Dense vector ANN via Qdrant — cosine similarity against BAAI/bge-small-en-v1.5 embeddings | Conceptual or paraphrased queries |
| Keyword | BM25 (Okapi) in-memory index rebuilt after every ingestion — exact and partial term matching | Precise terms, names, IDs |
| Hybrid | RRF fusion of semantic + keyword — default for all agent tool calls | All queries via search_documents |

Hybrid ranking with RRF

Both lists are fetched at 2× top_k before fusion. Each chunk gets a Reciprocal Rank Fusion score:

RRF score = 1 / (k + rank_semantic) + 1 / (k + rank_keyword)

Chunks that rank well in both lists are promoted; chunks with a zero BM25 score are dropped. The final list is re-sorted by RRF score and trimmed to top_k.
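A minimal RRF fusion over two ranked lists of chunk ids, following the formula above (k = 60 is the conventional RRF constant; the zero-BM25 drop is assumed to happen before fusion and is not shown):

```python
def rrf_fuse(semantic_ids, keyword_ids, top_k, k=60):
    """Fuse two ranked id lists with Reciprocal Rank Fusion.

    Each chunk scores 1/(k + rank) per list it appears in, so chunks
    ranked well in both lists are promoted above single-list hits.
    """
    scores = {}
    for ranked in (semantic_ids, keyword_ids):
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Re-sort by fused score and trim to top_k
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```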

Grounding

The AnswerSynthesis node is explicitly instructed to answer using only the retrieved chunks — no external knowledge. If the context is insufficient, the agent says so. Every answer is accompanied by structured citations that include source_file, page_number, and section_title, traceable back to the exact chunk in Qdrant.


Adaptive Chunking

Documents are not split with a fixed character window. The AdaptiveChunker inspects each parsed element's type and applies the appropriate strategy, preserving the structure and meaning of the original document.

| Element type | Strategy |
|---|---|
| table, code, image | Preserved whole — never split, structure must not be broken |
| heading | Merged with the immediately following text block so the chunk carries its section context |
| text (long) | Recursive split — tries paragraph → sentence → word → character boundaries in priority order, with configurable overlap between chunks |
| list | Item-aware split — groups items until chunk_size is reached, never splits mid-item |

This means a code block or table always lands in Qdrant as a single retrievable unit, and every text chunk knows which section heading it came from via the section_title metadata field.
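The recursive text strategy can be sketched as follows. This illustrates the boundary-priority idea only, without the configurable overlap the real AdaptiveChunker adds:

```python
def recursive_split(text, chunk_size, separators=("\n\n", ". ", " ")):
    """Split text at the coarsest boundary that fits in chunk_size:
    paragraph, then sentence, then word, then a hard character cut."""
    if len(text) <= chunk_size:
        return [text]
    for sep in separators:
        # Find the last occurrence of this separator inside the window
        idx = text.rfind(sep, 0, chunk_size)
        if idx > 0:
            head = text[: idx + len(sep)].strip()
            tail = text[idx + len(sep):]
            return [head] + recursive_split(tail, chunk_size, separators)
    # No separator in the window: fall back to a character boundary
    return [text[:chunk_size]] + recursive_split(text[chunk_size:], chunk_size, separators)
```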


Vector Store — Chunk Metadata

Every chunk stored in Qdrant carries the following payload fields:

| Field | Type | Description |
|---|---|---|
| doc_id | string | UUID assigned at ingestion — used to scope searches to a single document |
| source_file | string | Original filename — shown in citations |
| element_type | string | Content type: text, heading, table, code, list, or image |
| section_title | string | Nearest heading above this chunk — provides structural context |
| page_number | int | Page number from the source file (-1 if not applicable) |
| chunk_index | int | Position within the document (0-based) — used to reconstruct reading order |
| content | string | Raw chunk text — used for BM25 keyword search and returned in results |

doc_id, source_file, element_type, page_number, and chunk_index are indexed in Qdrant for fast filtered queries. The agent uses doc_id to restrict a search_documents call to a specific document when the question targets one file.
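A doc_id-scoped search uses a standard Qdrant payload filter; in the REST JSON shape it looks like the following (the helper name is hypothetical):

```python
def doc_scope_filter(doc_id):
    """Build a Qdrant payload filter (REST JSON shape) that restricts
    a search to chunks from one ingested document."""
    return {"must": [{"key": "doc_id", "match": {"value": doc_id}}]}
```

The same condition shape works for any of the indexed fields, e.g. element_type or page_number.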


Quick Start

Option A — Docker (recommended)

```bash
cp .env.example .env
# Add your ANTHROPIC_API_KEY to .env

cd docker
docker-compose up --build
```

| Service | URL |
|---|---|
| Streamlit UI | http://localhost:8511 |
| FastAPI docs | http://localhost:8010/docs |
| MCP SSE | http://localhost:8011/sse |
| Qdrant UI | http://localhost:6333/dashboard |

Option B — Local development

Terminal 1 — Start Qdrant

```bash
docker run -p 6333:6333 qdrant/qdrant
```

Terminal 2 — Start backend

```bash
cd backend
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp ../.env.example .env   # add ANTHROPIC_API_KEY
python main.py            # FastAPI on :8010
```

Terminal 3 — Start frontend

```bash
cd frontend
bash start.sh             # Streamlit on :8511
```

MCP server (optional, separate terminal)

```bash
cd backend
source .venv/bin/activate
python -m docwise_mcp.server   # MCP SSE on :8011
```

Configuration

Copy .env.example to .env and set:

| Variable | Required | Default | Notes |
|---|---|---|---|
| ANTHROPIC_API_KEY | ✅ (Anthropic) | | Claude API key |
| OPENAI_API_KEY | ✅ (OpenAI) | | OpenAI API key |
| LLM_PROVIDER | | anthropic | anthropic or openai |
| LLM_MODEL | | claude-sonnet-4-6 | Any model for the chosen provider (e.g. gpt-4o) |
| EMBEDDING_PROVIDER | | fastembed | fastembed (local, no key needed) or openai |
| QDRANT_HOST | | localhost | qdrant in Docker |
| MAX_AGENT_ITERATIONS | | 5 | Guard against infinite loops |
| LANGSMITH_API_KEY | | | Optional — enables tracing |
| LANGSMITH_TRACING | | false | Set true to enable |

API

| Method | Endpoint | Description |
|---|---|---|
| POST | /ingest/file | Upload + parse + store a document |
| GET | /documents | List all ingested documents |
| GET | /documents/{doc_id}/chunks | All chunks for a document |
| DELETE | /documents/{doc_id} | Delete a document |
| POST | /query | Agentic Q&A — SSE streaming |
| POST | /query/sync | Agentic Q&A — wait for the full answer |
| POST | /query/search | Direct hybrid search (no agent) |
| GET | /health | Health check |

Interactive docs at http://localhost:8010/docs.
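The /query SSE stream can be consumed with a few lines of parsing. This minimal parser handles only the event: and data: fields of the SSE format; the event names DocWise actually emits (query plan, tool calls, answer) should be checked against the stream itself:

```python
def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of SSE text lines.

    Per the SSE format, a blank line terminates an event; an omitted
    event: field defaults the event name to "message".
    """
    event, data = "message", []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and data:
            yield event, "\n".join(data)
            event, data = "message", []
```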


MCP (Claude Desktop)

DocWise Agentic exposes 4 tools via MCP over HTTP/SSE:

| Tool | Description |
|---|---|
| list_documents | List all ingested documents |
| search_documents | Hybrid semantic + BM25 search |
| ask_question | Full agentic Q&A via LangGraph |
| ingest_document | Ingest a local file by path |

Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):

```json
{
  "mcpServers": {
    "docwise-agentic": {
      "url": "http://localhost:8011/sse"
    }
  }
}
```

There are three ways to use DocWise Agentic as a client:

  1. Streamlit UI — web app with Upload, Chat, Explorer tabs
  2. FastAPI — REST API + Swagger at /docs
  3. Claude Desktop — via MCP — ask Claude to query your documents directly

LangSmith Tracing

Enable in .env:

```bash
LANGSMITH_API_KEY=your_key
LANGSMITH_TRACING=true
LANGSMITH_PROJECT=docwise-agentic
```

LangSmith automatically captures three levels:

| Level | What it tracks |
|---|---|
| Run | Each LLM call — tokens, latency, input/output |
| Trace | Full agent invocation — all nodes, total cost |
| Thread | Multi-turn session — all traces grouped by thread_id |

Project Structure

```
docwise-agentic/
├── backend/
│   ├── agents/
│   │   ├── graph.py          # LangGraph state graph
│   │   ├── state.py          # AgentState TypedDict
│   │   ├── tools.py          # Tool schemas + implementations
│   │   ├── llm.py            # Anthropic client singleton
│   │   └── nodes/
│   │       ├── query_understanding.py
│   │       ├── agent_reason.py
│   │       ├── tool_executor.py
│   │       ├── answer_synthesis.py
│   │       └── clarification.py
│   ├── api/
│   │   ├── app.py            # FastAPI factory + lifespan
│   │   └── routes/
│   │       ├── ingest.py
│   │       ├── documents.py
│   │       └── query.py      # SSE streaming
│   ├── vectorstore/
│   │   ├── schema.py         # Qdrant collection schema
│   │   └── qdrant.py         # Qdrant adapter (semantic + BM25 + RRF)
│   ├── docwise_mcp/
│   │   └── server.py         # FastMCP HTTP/SSE server
│   ├── parsing/              # inherited from docwise
│   ├── chunking/             # inherited from docwise
│   ├── embeddings/           # inherited from docwise
│   ├── observability.py      # LangSmith setup
│   ├── config.py
│   ├── logger.py
│   └── main.py
├── frontend/
│   └── app.py                # Streamlit — Upload + Chat + Explorer
├── docker/
│   ├── Dockerfile.backend
│   ├── Dockerfile.frontend
│   └── docker-compose.yml
└── docs/
    ├── REQUIREMENTS.md
    ├── ARCHITECTURE.md
    ├── PLAN.md
    └── DESCRIPTION.md
```

License

MIT — see LICENSE.


Tags

langgraph rag agentic-ai qdrant anthropic claude langsmith streamlit fastapi mcp multi-turn sse python

About

Full agentic RAG pipeline: adaptive chunking → Qdrant hybrid search (semantic + BM25 + RRF) → LangGraph ReAct agent → grounded answers with citations, streamed live. Multi-turn conversation, multi-hop reasoning, human-in-the-loop, LangSmith tracing, MCP server. LLM-agnostic — works with Claude or OpenAI.
