A local, multi-agent chat framework built on LangGraph and Ollama. A supervisor LLM routes each user message to the right subagent — web search, document retrieval, or general chat — and streams the response back token by token.
```
User input
     │
     ▼
┌─────────────┐
│  Supervisor │   classifies intent → search | retriever | general
└──────┬──────┘
       │
   ┌───┴──────────────┬───────────────────┐
   ▼                  ▼                   ▼
Search agent     Retriever agent     General agent
   │ (web)          │ (document)        │ (conversation)
   ▼                  ▼                   ▼
search_tool      get_tree            END
(Serper)         get_node_summary
                 get_node_content
```
| Agent | Trigger | Tools |
|---|---|---|
| General | Conversational, factual, or default questions | — |
| Search | Questions requiring current or external information | Google Search via Serper |
| Retriever | Questions about an attached document | Tree traversal tools (see below) |
The supervisor only routes to Retriever when an active document session exists (i.e. a `file_id` was provided at startup). Otherwise it falls back to General.
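As an illustrative sketch of that fallback rule (the real supervisor is an LLM classifier; `route` and `classify` here are simplified stand-ins, not the project's actual functions):

```python
# Hypothetical sketch of the supervisor's routing guard. The route names
# match the README; a real call would ask the LLM to classify the message.
from typing import Optional

def route(intent: str, file_id: Optional[str]) -> str:
    """Map a classified intent to a subagent, falling back to General
    when no document session is active."""
    if intent == "retriever" and file_id is None:
        return "general"  # no active document session, so no Retriever
    if intent in ("search", "retriever", "general"):
        return intent
    return "general"      # unknown intents default to General

print(route("retriever", None))       # falls back to general
print(route("retriever", "doc-123"))  # active session, routes to retriever
```

The guard keeps the Retriever unreachable unless a document was attached at startup, which is why general chat works with no database configured.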
Instead of embeddings and vector similarity search, document retrieval is done via a hierarchical tree of nodes stored in PostgreSQL. Each node maps to a section of the original document (heading → subheading → content) and carries a title, summary, and content.
The Retriever agent:
- Calls `get_tree(file_id)` to fetch the full document structure.
- Inspects node titles to narrow down candidates.
- Calls `get_node_summary(node_id)` on likely candidates to confirm relevance.
- Fetches full `get_node_content(node_id)` only for the most relevant nodes.
- Synthesizes an answer from those nodes.
This avoids embedding costs and keeps retrieval interpretable and deterministic.
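The traversal can be sketched end to end with an in-memory stand-in for the tree. The tool names and node fields (title, summary, content) come from this README; the data, the title-matching heuristic, and the relevance check are fabrications standing in for LLM judgment and PostgreSQL queries:

```python
# In-memory stand-in for the tree_nodes data; the real tools query PostgreSQL.
TREE = {
    "n1": {"title": "Introduction", "summary": "...", "content": "..."},
    "n2": {"title": "Pricing", "summary": "Lists plan tiers.",
           "content": "Basic $10/mo, Pro $25/mo."},
}

def get_tree(file_id):
    return {nid: node["title"] for nid, node in TREE.items()}

def get_node_summary(node_id):
    return TREE[node_id]["summary"]

def get_node_content(node_id):
    return TREE[node_id]["content"]

def answer(question, file_id):
    # 1. Fetch structure; 2. narrow candidates by title match.
    titles = get_tree(file_id)
    candidates = [nid for nid, title in titles.items()
                  if title.lower() in question.lower()]
    # 3. In the real agent the LLM reads summaries to confirm relevance;
    #    here every candidate with a non-empty summary passes.
    confirmed = [nid for nid in candidates if get_node_summary(nid)]
    # 4. Fetch full content only for confirmed nodes, then synthesize.
    return " ".join(get_node_content(nid) for nid in confirmed)

print(answer("What is the pricing?", "demo"))
```

The point of the staged lookups is cost control: full node content is only pulled for nodes that survived the title and summary passes.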
The tree structure is produced by a separate pipeline: lazzyms/pdf-markdown-embed
That project converts a PDF → Markdown (via docling + OCR), splits it by Markdown headers into a nested tree, summarizes each leaf node with an LLM, and persists the result to PostgreSQL.
There is currently no direct integration between the two repositories. The workflow is manual:
- Run `pdf-markdown-embed` with `PROCESS_TYPE=vectorless` against your PDF.
- Note the `file_id` you add to the `FILES` variable in `.env` (be mindful when working with multiple files).
- Provide that `file_id` at startup of `cl-ai-chat` when prompted.
Both projects share the same PostgreSQL schema (tree_nodes table), so they only need to point at the same database.
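To make the shared-table idea concrete, here is a minimal sketch of what `tree_nodes` might look like. SQLite stands in for PostgreSQL, and every column beyond title, summary, and content is an assumption; the authoritative schema lives in pdf-markdown-embed:

```python
import sqlite3

# SQLite stands in for PostgreSQL. The id/file_id/parent_id columns are
# assumptions about the shared tree_nodes schema, not the real DDL.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tree_nodes (
        id        TEXT PRIMARY KEY,
        file_id   TEXT NOT NULL,
        parent_id TEXT,            -- NULL for the document root
        title     TEXT,
        summary   TEXT,
        content   TEXT
    )
""")
conn.execute("INSERT INTO tree_nodes VALUES ('root', 'f1', NULL, 'Doc', 's', 'c')")
conn.execute("INSERT INTO tree_nodes VALUES ('n1', 'f1', 'root', 'Intro', 's', 'c')")

# Both repositories would issue queries like this against the same database:
rows = conn.execute(
    "SELECT id, title FROM tree_nodes WHERE file_id = ?", ("f1",)
).fetchall()
print(rows)
```

Since one writes and the other only reads, pointing both at the same `DATABASE_URL` is the entire integration.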
- Multi-agent routing: supervisor LLM classifies each question and delegates to the right subagent.
- Interactive CLI with colored output, a live spinner that reflects the supervisor's routing decision, and real-time token streaming.
- Maintains conversation history within a session using LangGraph's `add_messages` reducer — no history is persisted between sessions.
- Automatically compresses long conversations using LLM-based summarization (triggered after 20 messages, keeping the 6 most recent verbatim).
- Optional document session: attach a pre-processed document by `file_id` to enable tree-based retrieval.
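The compression policy can be sketched in a few lines. The threshold names mirror `SUMMARY_THRESHOLD` and `KEEP_RECENT` from `main.py`, but the summarizer here is a stub where the real code calls the LLM:

```python
SUMMARY_THRESHOLD = 20  # compress once history exceeds this many messages
KEEP_RECENT = 6         # always keep this many recent messages verbatim

def summarize(messages):
    # Stub: the real implementation asks the LLM for a running summary.
    return f"[summary of {len(messages)} earlier messages]"

def maybe_compress(history):
    """Return history unchanged below the threshold; otherwise replace
    everything except the most recent messages with one summary entry."""
    if len(history) <= SUMMARY_THRESHOLD:
        return history
    older, recent = history[:-KEEP_RECENT], history[-KEEP_RECENT:]
    return [summarize(older)] + recent

history = [f"msg {i}" for i in range(25)]
compressed = maybe_compress(history)
print(len(compressed))  # 1 summary entry + 6 verbatim messages = 7
```

Keeping the last few messages verbatim preserves the immediate context the model needs, while the summary keeps the prompt from growing without bound.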
- Python 3.10+
- Ollama running locally with a model pulled
- A Serper API key for web search
- PostgreSQL instance (only required for document sessions — general and search modes work without it)
- `uv` as the package manager (recommended)
- Create and activate a virtual environment:

  ```shell
  python -m venv .venv
  source .venv/bin/activate
  ```

- Install dependencies:

  ```shell
  uv sync
  ```

- Create a `.env` file in the project root:

  ```
  SERPER_API_KEY=your_serper_key
  OLLAMA_MODEL=qwen3:latest
  DATABASE_URL=postgresql://user:password@localhost:5432/your_db  # only needed for document sessions
  ```

- Pull your Ollama model:

  ```shell
  ollama pull qwen3:latest
  ```

To use a different model, update `OLLAMA_MODEL` in your `.env` file. All agents read the model name from `config/settings.py` via the `OLLAMA_MODEL` env var.
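As a sketch of the env-var lookup (the project's `config/settings.py` actually uses Pydantic settings; this stdlib version only illustrates the fallback behavior):

```python
import os

# Illustrative only: mirrors the OLLAMA_MODEL env var with the README's
# default model name, using os.environ instead of Pydantic settings.
def get_model_name(default="qwen3:latest"):
    return os.environ.get("OLLAMA_MODEL", default)

os.environ.pop("OLLAMA_MODEL", None)
print(get_model_name())              # no env var set, default applies
os.environ["OLLAMA_MODEL"] = "llama3:8b"
print(get_model_name())              # env var overrides the default
```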
Run the app:

```shell
uv run main.py
```

At startup you will be asked for a `file_id`:

- Enter a `file_id` — starts a document session; questions about the document will be routed to the Retriever agent.
- Press Enter — starts in general chat mode; only the Search and General agents are active.
Type `exit`, `quit`, or `bye` to end the session.
The spinner updates to reflect the supervisor's routing decision in real time:
| Spinner text | What's happening |
|---|---|
| `Thinking…` | Supervisor is classifying the question |
| `Searching…` | Routed to Search agent |
| `Fetching document…` | Routed to Retriever agent |
| `Processing…` | Routed to General agent |
| `Searching for "<query>"…` | Search agent calling the web search tool |
| `Fetching "<node_id>"…` | Retriever agent calling a tree node tool |
| `Processing results…` | Search tool returned, agent synthesizing |
| `Analysing document…` | Retriever tool returned, agent synthesizing |
| Color | Meaning |
|---|---|
| Cyan (bright) | Your input prompt |
| Green | AI response tokens |
| Magenta (dim) | Model thinking / `<think>` blocks |
| Yellow (bright) | Spinner / tool call status |
| Yellow (dim) | Tool result preview |
| Blue (dim) | System notices (summarization, session info) |
| Red | Errors |
```
main.py                      # Entrypoint, REPL, streaming display, summarization
cli/
  session.py                 # Startup prompt — collects file_id from the user
agents/
  state.py                   # AgentState schema (messages, file_id, route)
  supervisor.py              # Supervisor LLM — classifies intent, sets route
  workflow.py                # LangGraph graph — nodes, edges, conditional routing
  subagents/
    general_agent.py         # Conversational agent (no tools)
    search_agent.py          # Web search agent
    retriever_agent.py       # Document retrieval agent (tree traversal)
prompts/
  supervisor.py              # Supervisor routing prompt
  general.py                 # General agent system prompt
  search.py                  # Search agent system prompt
  review_nodes.py            # Retriever agent system prompt (includes {file_id})
tools/
  search_tool.py             # Google search via Serper
  file_traversal_tool.py     # get_tree / get_node_summary / get_node_content
config/
  settings.py                # Pydantic settings (env vars)
utils/
  database.py                # SQLAlchemy engine factory
models/
  tree.py                    # Pydantic model for tree nodes
```
- Conversation history is session-scoped — restarting the process starts a fresh session.
- History is automatically compressed after 20 messages (the 6 most recent are kept verbatim). Tune `SUMMARY_THRESHOLD` and `KEEP_RECENT` at the top of `main.py`.
- If you add external dependencies, update `pyproject.toml` and run `uv sync`.
- Direct integration with pdf-markdown-embed — auto-process a PDF and start a chat session in one command.
- Persist conversation history across sessions.
- Settings: personalised behaviour and learning from past conversations.
Feel free to open issues or send PRs with improvements, examples, or bug fixes.