Agentic document Q&A system powered by LangGraph, Qdrant, and Claude.
Ingest — upload PDFs, DOCX, or Markdown files. Documents are parsed, adaptively chunked by content type, embedded with FastEmbed, and stored in Qdrant with rich metadata.
Query — ask natural-language questions. A LangGraph ReAct agent autonomously searches, reasons across documents, and synthesizes grounded answers with citations — all streamed in real time.
Predecessor: docwise — same parsing/chunking/embedding layers, upgraded with a full LangGraph agentic pipeline.
- Autonomous agent — LangGraph state graph decides how many searches to run, when to ask for clarification, and when it has enough context to answer
- Multi-hop retrieval — agent can issue multiple `search_documents` and `get_document_chunks` calls across a single question
- Hybrid search — dense semantic search (FastEmbed) + BM25 keyword search fused with Reciprocal Rank Fusion (RRF)
- Real-time streaming — SSE streams each step (query plan, tool calls, tool results, answer) to the UI as it happens
- Multi-turn conversation — LangGraph `MemorySaver` checkpointer persists thread state across turns
- Human-in-the-loop — agent can pause and ask the user a clarifying question before continuing
- Grounded answers with citations — every answer references the source chunks it was drawn from
- MCP server — exposes 4 tools over HTTP/SSE so Claude Desktop can query your documents directly
- Document formats — PDF, DOCX, and Markdown supported out of the box
- Observability — LangSmith traces every LLM call, tool call, and multi-turn thread automatically
| Layer | Technology |
|---|---|
| LLM | Anthropic Claude or OpenAI (claude-sonnet-4-6 default) |
| Agent framework | LangGraph (state graph + MemorySaver checkpointer) |
| Vector store | Qdrant (Docker service, horizontally scalable) |
| Embeddings | FastEmbed (local, no API key needed) |
| Keyword search | BM25 via rank-bm25 |
| API | FastAPI + Server-Sent Events (SSE) |
| MCP server | FastMCP over HTTP/SSE |
| UI | Streamlit |
| Observability | LangSmith + structlog |
| Infra | Docker Compose |
| Language | Python 3.12 |
DocWise Agentic implements the ReAct pattern — the agent alternates between reasoning (deciding what to do) and acting (executing a tool), using the result of each action to inform the next reasoning step.
QueryUnderstanding → [ Reason → Act → Reason → Act → ... ] → AnswerSynthesis
| Step | Node | What happens |
|---|---|---|
| Pre-reason | `query_understanding` | Classifies query type (factual, analytical, multi_hop, conversational), reformulates the query for better retrieval, flags if clarification is needed |
| Reason | `agent_reason` | LLM receives the question + all accumulated context and decides: call a tool, ask clarification, or stop and answer |
| Act | `tool_executor` | Executes the chosen tool, appends results to state |
| Loop | `tool_executor` → `agent_reason` | Repeats until the LLM sets `done=True` or `MAX_AGENT_ITERATIONS` is reached |
| HitL | `clarification` | If the agent calls `ask_clarification`, the graph interrupts and waits for the user's response before resuming the loop |
| Synthesize | `answer_synthesis` | Produces a grounded answer with citations from all retrieved chunks |
The key difference from a fixed RAG pipeline: the agent decides at runtime how many searches to run and which documents to dig into, rather than following a predetermined retrieval path.
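The reason/act cycle above can be sketched in plain Python. This is a conceptual illustration only, not the project's LangGraph code; `llm_decide` and `run_tool` are hypothetical stand-ins for the LLM call and the tool dispatcher:

```python
MAX_AGENT_ITERATIONS = 5  # mirrors the config guard against infinite loops

def react_loop(question, llm_decide, run_tool):
    """Minimal ReAct loop: reason, act, repeat until done or budget is spent."""
    state = {"question": question, "tool_results": [], "iterations": 0, "done": False}
    while not state["done"] and state["iterations"] < MAX_AGENT_ITERATIONS:
        decision = llm_decide(state)  # Reason: LLM picks the next action
        if decision["action"] == "answer":
            state["done"] = True      # enough context: stop and synthesize
        else:
            result = run_tool(decision["action"], decision.get("args", {}))
            state["tool_results"].append(result)  # Act: accumulate context
        state["iterations"] += 1
    return state

# Tiny demo with stand-in callables: answer after two tool calls.
def _decide(state):
    if len(state["tool_results"]) < 2:
        return {"action": "search_documents", "args": {"query": state["question"]}}
    return {"action": "answer"}

def _tool(name, args):
    return {"tool": name, "chunks": ["..."]}

final = react_loop("What does RRF do?", _decide, _tool)
```

Each iteration feeds the accumulated `tool_results` back into the next reasoning step, which is what lets the agent decide at runtime when to stop searching.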
Each question goes through a LangGraph state graph. The agent runs a tool-use loop — at every iteration the LLM receives the original question, the reformulated query, and all context accumulated so far, then decides what to do next. It keeps calling tools until it has enough context to answer, or until MAX_AGENT_ITERATIONS is reached.
Tool selection is done by the LLM — each tool has a description and JSON schema that tells the model when to use it:
| Tool | When the agent uses it |
|---|---|
| `search_documents` | Any factual question — hybrid semantic + BM25 search across all ingested docs |
| `get_document_chunks` | When it needs the full text of a document or more context around a specific passage |
| `list_documents` | When the question is about what documents exist, or to discover doc IDs before a targeted search |
| `ask_clarification` | When the query is genuinely ambiguous and cannot be answered confidently — used sparingly |
Once the LLM decides it has sufficient context it stops calling tools and the graph transitions to AnswerSynthesis, which produces a grounded answer with citations. The ask_clarification tool triggers a human-in-the-loop pause — the agent waits for the user's response before continuing.
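Descriptions and schemas like these drive tool selection. Below is an illustrative sketch of what a tool definition might look like in the Anthropic Messages API `tools` format; the exact field descriptions and parameters are assumptions, not the project's actual definitions:

```python
# Illustrative tool definition (Anthropic Messages API "tools" format).
search_documents_tool = {
    "name": "search_documents",
    "description": (
        "Hybrid semantic + BM25 search across all ingested documents. "
        "Use for any factual question about document content."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Search query, reformulated for retrieval",
            },
            "doc_id": {
                "type": "string",
                "description": "Optional: restrict the search to one document",
            },
            "top_k": {
                "type": "integer",
                "description": "Number of chunks to return",
                "default": 5,
            },
        },
        "required": ["query"],
    },
}
```

The model never calls code directly: it emits a tool-use request matching this schema, and `tool_executor` runs the implementation and appends the result to state.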
What's New - DocWise Agentic vs. DocWise
| | DocWise | DocWise Agentic |
|---|---|---|
| Query pipeline | Fixed 3-step: retrieve → rank → answer | LangGraph state graph — agent decides tool calls |
| Retrieval | One hybrid search call | Multi-hop: agent can search multiple times |
| Conversation | Single-turn | Multi-turn with thread persistence |
| Vector store | ChromaDB (embedded) | Qdrant (Docker service, horizontally scalable) |
| Streaming | No | SSE — streams query plan, tool calls, answer |
| Human-in-the-loop | No | Clarification node — agent can ask user for more info |
| Observability | structlog only | LangSmith traces run/trace/thread automatically |
```text
Streamlit UI (:8511)
      │ HTTP + SSE
      ▼
FastAPI Backend (:8010)
      │
      ├─ Document Pipeline (from docwise)
      │     ParserRegistry → AdaptiveChunker → FastEmbed → Qdrant
      │
      └─ Agentic Query Pipeline (new)
            LangGraph State Graph
            QueryUnderstanding → AgentReason ──► search_documents
                                     ├──► get_document_chunks
                                     ├──► ask_clarification (HitL)
                                     └──► AnswerSynthesis → END

MCP Server (:8011/sse) ← Claude Desktop
Qdrant (:6333)
```
State is managed at two levels:
Within-turn — AgentState
A single TypedDict that flows through every node in the graph for the duration of one question. Each node reads from it and writes back into it. Chunks retrieved across multiple tool calls accumulate in retrieved_chunks so the final synthesis node always has the full picture.
| Field group | Fields | Purpose |
|---|---|---|
| Input | `question`, `thread_id` | The user's question and conversation identifier |
| Query plan | `query_plan` | Query type, reformulated query, intent — set by QueryUnderstanding |
| Tool-use loop | `tool_calls`, `tool_results` | Every tool invocation and its result, accumulated across iterations |
| Retrieved context | `retrieved_chunks` | All chunks collected across all `search_documents` / `get_document_chunks` calls |
| Output | `answer`, `citations` | Final answer text and source citations |
| HitL | `clarification`, `clarification_response` | Question the agent wants to ask, and the user's reply |
| Loop control | `iterations`, `done`, `error` | Guards the ReAct loop and signals completion |
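A sketch of what such a state object might look like, with field names taken from the table above; the exact value types are assumptions, not the project's `state.py`:

```python
from typing import Optional, TypedDict

class AgentState(TypedDict, total=False):
    # Input
    question: str
    thread_id: str
    # Query plan (set by QueryUnderstanding)
    query_plan: dict
    # Tool-use loop, accumulated across iterations
    tool_calls: list[dict]
    tool_results: list[dict]
    # Retrieved context for the synthesis node
    retrieved_chunks: list[dict]
    # Output
    answer: str
    citations: list[dict]
    # Human-in-the-loop
    clarification: Optional[str]
    clarification_response: Optional[str]
    # Loop control
    iterations: int
    done: bool
    error: Optional[str]

# At runtime a TypedDict is a plain dict, so nodes can read and write it freely.
state = AgentState(question="Hi", thread_id="t1", iterations=0, done=False)
```

`total=False` lets nodes fill fields incrementally: `query_understanding` sets `query_plan`, the loop grows `tool_results`, and `answer_synthesis` writes `answer` and `citations` at the end.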
Cross-turn — MemorySaver checkpointer
LangGraph serializes and persists the full AgentState after every node, keyed by thread_id. When the user sends a follow-up question in the same thread, the graph resumes from the last checkpoint — giving the agent full context of prior questions, tool calls, and answers without any extra code.
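Conceptually, the checkpointer is a per-thread snapshot store. The toy below is a simplified pure-Python illustration of that idea, not langgraph's actual `MemorySaver` implementation:

```python
import copy

class InMemoryCheckpointer:
    """Toy stand-in for a checkpointer: snapshot state per thread after each node."""

    def __init__(self):
        self._checkpoints = {}  # thread_id -> list of state snapshots

    def save(self, thread_id, state):
        # Deep-copy so later mutations don't rewrite saved history.
        self._checkpoints.setdefault(thread_id, []).append(copy.deepcopy(state))

    def latest(self, thread_id):
        snapshots = self._checkpoints.get(thread_id)
        return copy.deepcopy(snapshots[-1]) if snapshots else None

# Turn 1 ends; a follow-up in the same thread resumes from the last snapshot.
checkpointer = InMemoryCheckpointer()
checkpointer.save("thread-1", {"question": "What is RRF?", "answer": "..."})
resumed = checkpointer.latest("thread-1")
```

The real checkpointer is passed once at graph compile time and invoked with a `thread_id` in the run config, which is why multi-turn memory needs no extra application code.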
Three search modes
| Mode | How it works | When used |
|---|---|---|
| Semantic | Dense vector ANN via Qdrant — cosine similarity against BAAI/bge-small-en-v1.5 embeddings | Conceptual or paraphrased queries |
| Keyword | BM25 (Okapi) in-memory index rebuilt after every ingestion — exact and partial term matching | Precise terms, names, IDs |
| Hybrid | RRF fusion of semantic + keyword — default for all agent tool calls | All queries via search_documents |
Hybrid ranking with RRF
Both lists are fetched at 2× top_k before fusion. Each chunk gets a Reciprocal Rank Fusion score:
RRF score = 1 / (k + rank_semantic) + 1 / (k + rank_keyword)
Chunks that rank well in both lists are promoted; chunks with a zero BM25 score are dropped. The final list is re-sorted by RRF score and trimmed to top_k.
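The fusion step can be sketched in a few lines of pure Python. This is a minimal sketch: `k=60` is the conventional RRF constant (the project's actual constant is an assumption here), and the zero-BM25 drop described above is left to the caller:

```python
def rrf_fuse(semantic_ids, keyword_ids, top_k, k=60):
    """Fuse two ranked lists of chunk IDs with Reciprocal Rank Fusion."""
    scores = {}
    # Each list contributes 1 / (k + rank) for every chunk it contains;
    # chunks present in both lists sum both contributions and get promoted.
    for rank, cid in enumerate(semantic_ids, start=1):
        scores[cid] = scores.get(cid, 0.0) + 1.0 / (k + rank)
    for rank, cid in enumerate(keyword_ids, start=1):
        scores[cid] = scores.get(cid, 0.0) + 1.0 / (k + rank)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]

# "b" appears in both lists, so it outranks the semantic-only top hit "a".
fused = rrf_fuse(["a", "b", "c"], ["b", "d"], top_k=3)
```

Because the formula only uses ranks, not raw scores, RRF sidesteps the problem of calibrating cosine similarities against BM25 scores.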
Grounding
The AnswerSynthesis node is explicitly instructed to answer using only the retrieved chunks — no external knowledge. If the context is insufficient, the agent says so. Every answer is accompanied by structured citations that include source_file, page_number, and section_title, traceable back to the exact chunk in Qdrant.
Documents are not split with a fixed character window. The AdaptiveChunker inspects each parsed element's type and applies the appropriate strategy, preserving the structure and meaning of the original document.
| Element type | Strategy |
|---|---|
| `table`, `code`, `image` | Preserved whole — never split, structure must not be broken |
| `heading` | Merged with the immediately following text block so the chunk carries its section context |
| `text` (long) | Recursive split — tries paragraph → sentence → word → character boundaries in priority order, with configurable overlap between chunks |
| `list` | Item-aware split — groups items until `chunk_size` is reached, never splits mid-item |
This means a code block or table always lands in Qdrant as a single retrievable unit, and every text chunk knows which section heading it came from via the section_title metadata field.
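The dispatch and the recursive text strategy can be sketched as follows. This is a simplified illustration, not the project's `AdaptiveChunker`: it omits the configurable overlap and heading merge, and the separator list and `chunk_element` signature are assumptions:

```python
SEPARATORS = ["\n\n", ". ", " "]  # paragraph, then sentence, then word boundaries

def split_text(text, chunk_size):
    """Recursive split: use the coarsest boundary that yields small-enough pieces."""
    if len(text) <= chunk_size:
        return [text]
    for sep in SEPARATORS:
        if sep in text:
            parts, chunks, current = text.split(sep), [], ""
            for part in parts:
                candidate = current + sep + part if current else part
                if len(candidate) <= chunk_size:
                    current = candidate
                else:
                    if current:
                        chunks.append(current)
                    current = part
            if current:
                chunks.append(current)
            # Recurse on any piece still too large; it falls to a finer separator.
            return [c for chunk in chunks for c in split_text(chunk, chunk_size)]
    # Last resort: hard character split.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def chunk_element(element, chunk_size=500):
    """Dispatch by element type, mirroring the strategies in the table above."""
    if element["type"] in ("table", "code", "image"):
        return [element["content"]]  # never split structured elements
    return split_text(element["content"], chunk_size)
```

For example, `split_text("aa bb cc dd", 5)` splits on word boundaries into `["aa bb", "cc dd"]`, while a table element passes through `chunk_element` untouched as a single chunk.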
Every chunk stored in Qdrant carries the following payload fields:
| Field | Type | Description |
|---|---|---|
| `doc_id` | string | UUID assigned at ingestion — used to scope searches to a single document |
| `source_file` | string | Original filename — shown in citations |
| `element_type` | string | Content type: text, heading, table, code, list, or image |
| `section_title` | string | Nearest heading above this chunk — provides structural context |
| `page_number` | int | Page number from the source file (-1 if not applicable) |
| `chunk_index` | int | Position within the document (0-based) — used to reconstruct reading order |
| `content` | string | Raw chunk text — used for BM25 keyword search and returned in results |
doc_id, source_file, element_type, page_number, and chunk_index are indexed in Qdrant for fast filtered queries. The agent uses doc_id to restrict a search_documents call to a specific document when the question targets one file.
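A doc_id-scoped search against Qdrant's REST search endpoint (`POST /collections/{name}/points/search`) would carry a payload filter like the one below. This is a sketch of Qdrant's filter JSON with placeholder values; the project issues the equivalent through qdrant-client:

```python
# Request body for Qdrant's points search endpoint.
# The vector and doc_id values are placeholders, not real data.
search_body = {
    "vector": [0.12, -0.05, 0.33],  # query embedding (truncated placeholder)
    "limit": 5,
    "with_payload": True,
    "filter": {
        "must": [
            # Payload-index match on doc_id scopes the ANN search to one document.
            {"key": "doc_id", "match": {"value": "doc-uuid-here"}}
        ]
    },
}
```

Because `doc_id` is a payload-indexed field, Qdrant applies the filter during the ANN search rather than post-filtering results, so scoped queries stay fast.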
```bash
cp .env.example .env
# Add your ANTHROPIC_API_KEY to .env
cd docker
docker-compose up --build
```

| Service | URL |
|---|---|
| Streamlit UI | http://localhost:8511 |
| FastAPI docs | http://localhost:8010/docs |
| MCP SSE | http://localhost:8011/sse |
| Qdrant UI | http://localhost:6333/dashboard |
Terminal 1 — Start Qdrant

```bash
docker run -p 6333:6333 qdrant/qdrant
```

Terminal 2 — Start backend

```bash
cd backend
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp ../.env.example .env   # add ANTHROPIC_API_KEY
python main.py            # FastAPI on :8010
```

Terminal 3 — Start frontend

```bash
cd frontend
bash start.sh   # Streamlit on :8511
```

MCP server (optional, separate terminal)

```bash
cd backend
source .venv/bin/activate
python -m docwise_mcp.server   # MCP SSE on :8011
```

Copy .env.example to .env and set:
| Variable | Required | Default | Notes |
|---|---|---|---|
| `ANTHROPIC_API_KEY` | ✅ (Anthropic) | — | Claude API key |
| `OPENAI_API_KEY` | ✅ (OpenAI) | — | OpenAI API key |
| `LLM_PROVIDER` | — | `anthropic` | `anthropic` or `openai` |
| `LLM_MODEL` | — | `claude-sonnet-4-6` | Any model for the chosen provider (e.g. `gpt-4o`) |
| `EMBEDDING_PROVIDER` | — | `fastembed` | `fastembed` (local, no key needed) or `openai` |
| `QDRANT_HOST` | — | `localhost` | `qdrant` in Docker |
| `MAX_AGENT_ITERATIONS` | — | `5` | Guard against infinite loops |
| `LANGSMITH_API_KEY` | — | — | Optional — enables tracing |
| `LANGSMITH_TRACING` | — | `false` | Set `true` to enable |
| Method | Endpoint | Description |
|---|---|---|
| POST | `/ingest/file` | Upload + parse + store a document |
| GET | `/documents` | List all ingested documents |
| GET | `/documents/{doc_id}/chunks` | All chunks for a document |
| DELETE | `/documents/{doc_id}` | Delete a document |
| POST | `/query` | Agentic Q&A — SSE streaming |
| POST | `/query/sync` | Agentic Q&A — wait for full answer |
| POST | `/query/search` | Direct hybrid search (no agent) |
| GET | `/health` | Health check |
Interactive docs at http://localhost:8010/docs.
DocWise Agentic exposes 4 tools via MCP over HTTP/SSE:
| Tool | Description |
|---|---|
| `list_documents` | List all ingested documents |
| `search_documents` | Hybrid semantic + BM25 search |
| `ask_question` | Full agentic Q&A via LangGraph |
| `ingest_document` | Ingest a local file by path |
Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):
```json
{
  "mcpServers": {
    "docwise-agentic": {
      "url": "http://localhost:8011/sse"
    }
  }
}
```

You can use DocWise Agentic as a client three ways:
- Streamlit UI — web app with Upload, Chat, Explorer tabs
- FastAPI — REST API + Swagger at `/docs`
- Claude Desktop — via MCP — ask Claude to query your documents directly
Enable in .env:
```bash
LANGSMITH_API_KEY=your_key
LANGSMITH_TRACING=true
LANGSMITH_PROJECT=docwise-agentic
```
LangSmith automatically captures three levels:
| Level | What it tracks |
|---|---|
| Run | Each LLM call — tokens, latency, input/output |
| Trace | Full agent invocation — all nodes, total cost |
| Thread | Multi-turn session — all traces grouped by thread_id |
```text
docwise-agentic/
├── backend/
│   ├── agents/
│   │   ├── graph.py               # LangGraph state graph
│   │   ├── state.py               # AgentState TypedDict
│   │   ├── tools.py               # Tool schemas + implementations
│   │   ├── llm.py                 # Anthropic client singleton
│   │   └── nodes/
│   │       ├── query_understanding.py
│   │       ├── agent_reason.py
│   │       ├── tool_executor.py
│   │       ├── answer_synthesis.py
│   │       └── clarification.py
│   ├── api/
│   │   ├── app.py                 # FastAPI factory + lifespan
│   │   └── routes/
│   │       ├── ingest.py
│   │       ├── documents.py
│   │       └── query.py           # SSE streaming
│   ├── vectorstore/
│   │   ├── schema.py              # Qdrant collection schema
│   │   └── qdrant.py              # Qdrant adapter (semantic + BM25 + RRF)
│   ├── docwise_mcp/
│   │   └── server.py              # FastMCP HTTP/SSE server
│   ├── parsing/                   # inherited from docwise
│   ├── chunking/                  # inherited from docwise
│   ├── embeddings/                # inherited from docwise
│   ├── observability.py           # LangSmith setup
│   ├── config.py
│   ├── logger.py
│   └── main.py
├── frontend/
│   └── app.py                     # Streamlit — Upload + Chat + Explorer
├── docker/
│   ├── Dockerfile.backend
│   ├── Dockerfile.frontend
│   └── docker-compose.yml
└── docs/
    ├── REQUIREMENTS.md
    ├── ARCHITECTURE.md
    ├── PLAN.md
    └── DESCRIPTION.md
```
MIT — see LICENSE.
langgraph rag agentic-ai qdrant anthropic claude langsmith streamlit fastapi mcp multi-turn sse python