A LangGraph agent that interacts with a mock file system and database via MCP tools, with Tool Failure Recovery and Human-in-the-Loop (HITL) gating for destructive actions.
User Input
│
▼
┌─────────┐ tool_call / approval ┌───────────────┐
│ Planner │ ─────────────────────────► │ Tool Executor │
│ (LLM) │ ◄──────────────────────── │ (MCP tools) │
└─────────┘ result / error (retry) └───────────────┘
│ ▲
│ request_approval │ approved
▼ │
┌───────────┐ approved ┌─────────────────────────┐
│ HITL Gate │ ──────────►│ (back to tool executor) │
│ (human) │ └─────────────────────────┘
└───────────┘
│ denied
▼
[END — denied answer]
| Concern | Solution |
|---|---|
| Tool errors | Retry loop (max 3) with LLM reflection prompt |
| Destructive actions | HITL gate node — agent pauses, human approves/denies |
| Unknown tools | Caught at executor, returned as typed error |
| No API key | Stub LLM for offline/CI testing |
| Auditability | Every step logged in state["messages"] |
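All of these concerns flow through one shared typed state. A minimal sketch of what the `AgentState` in `agent/state.py` might look like — field names other than `messages` are assumptions inferred from the traces and flags discussed in this README:

```python
from typing import Any, Dict, List, Optional, TypedDict

class AgentState(TypedDict, total=False):
    """Shared graph state; every node reads and writes this dict (sketch)."""
    messages: List[Dict[str, Any]]    # audit trail: user / assistant / tool messages
    status: str                       # e.g. "success", "denied", "failed"
    final_answer: Optional[str]       # set by the planner or the error handler
    retry_count: int                  # incremented on each tool failure
    pending_approval: bool            # True while a destructive action awaits HITL
    approval_granted: Optional[bool]  # outcome of the HITL gate
```

Using `total=False` keeps every field optional, so nodes can populate the state incrementally as the graph runs.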
langgraph-agent/
├── agent/
│ ├── graph.py # LangGraph graph: nodes, edges, routing
│ ├── state.py # Typed AgentState (TypedDict)
│ └── prompts.py # LLM prompt templates
├── mcp_server/
│ └── mock_mcp.py # Mock file system + DB tools
├── tests/
│ └── test_agent.py # Pytest suite (unit + integration)
├── examples/
│ ├── input.json # Sample user inputs
│ └── output.json # Expected outputs with traces
├── main.py # CLI entry point
├── requirements.txt
└── .env.example
# 1. Clone / enter the repo
cd langgraph-agent
# 2. Create a virtual environment
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Configure environment
cp .env.example .env
# Edit .env — set OPENAI_API_KEY (or leave as sk-fake for stub mode)

# Happy path — read an existing file
python main.py "show me the q1 sales report"
# Retry loop — wrong path triggers reflection + search fallback
python main.py "read /wrong/path/sales.csv"
# HITL gate — destructive action pauses for human approval
python main.py "delete the q1 sales file"
# Database query
python main.py "show me all users in the database"

The examples/ folder contains sample payloads and expected response shapes:
| File | Description |
|---|---|
| `examples/input.json` | Array of sample user prompts (strings) you can pass to the agent. |
| `examples/output.json` | Example outputs with traces: `status`, `final_answer`, and a `messages_trace` for each scenario (happy path, retry/recovery, HITL deny, DB query). |
Try an input from the file:
# Using the first example input
python main.py "show me the q1 sales report"
# Or run each line from input.json (e.g. in a script)
# inputs: "show me the q1 sales report", "read /wrong/path/sales.csv", "delete the q1 sales file", "show me all users in the database"

The structure of the agent's final state (what you see in `output.json` under each example) is: `status`, `final_answer`, optional `retry_count` / `approval_granted`, and `messages` (full trace of user, assistant, and tool messages).
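An illustrative entry in that shape (all values here are placeholders, not actual output from the repo):

```json
{
  "status": "success",
  "final_answer": "...",
  "retry_count": 0,
  "messages_trace": [
    {"role": "user", "content": "show me the q1 sales report"},
    {"role": "assistant", "content": "{\"action\": \"tool_call\", \"tool\": \"read_file\", \"args\": {\"path\": \"...\"}}"},
    {"role": "tool", "content": "..."}
  ]
}
```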
# Run all tests (no API key needed — uses stub LLM)
pytest tests/ -v
# Run only MCP unit tests
pytest tests/test_agent.py::TestMockMCP -v
# Run only integration tests
pytest tests/test_agent.py::TestAgentIntegration -v

When a tool call fails, the agent doesn't give up immediately:
- `tool_executor_node` captures the error and increments `retry_count`
- Router checks: if `retry_count < MAX_RETRIES` (3), sends back to `planner`
- `planner_node` uses `REFLECTION_PROMPT` — tells the LLM what failed and asks it to reason about an alternative
- LLM typically recovers by calling `search_files` after a `FileNotFoundError`
- After `MAX_RETRIES`, `error_handler_node` produces a graceful failure message
read_file("/wrong/path") → FileNotFoundError
→ LLM reflects: "path was wrong, try searching"
→ search_files("sales") → ["/data/reports/q1_sales.csv", ...]
→ final_answer: "Found these files instead: ..."
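The routing decision behind this loop can be sketched as a plain function — a simplified, hypothetical version of the router in `agent/graph.py` (the `is_error` message field is an assumption):

```python
MAX_RETRIES = 3

def route_after_tool(state: dict) -> str:
    """Pick the next node after tool execution (simplified sketch)."""
    last = state["messages"][-1]
    if last.get("role") == "tool" and last.get("is_error"):
        if state.get("retry_count", 0) < MAX_RETRIES:
            return "planner"        # re-plan with the reflection prompt
        return "error_handler"      # retries exhausted: graceful failure
    return "planner"                # success: let the LLM produce an answer
```

In the actual graph this would be wired up as a conditional edge from the tool executor node.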
Destructive tools (`delete_file`, `update_db`, `delete_db_record`) never run automatically:
- LLM uses `"action": "request_approval"` instead of `"action": "tool_call"`
- `planner_node` detects `pending_approval=True` and routes to `hitl_gate_node`
- `hitl_gate_node` prints the action details and waits for human input
- Approved → routes back to `tool_executor_node` to execute
- Denied → sets `final_answer` with denial reason, routes to END

For CI/automated tests, set `HITL_AUTO_APPROVE=true` or `HITL_AUTO_DENY=true`.
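A minimal sketch of how the gate might honor those CI flags — the real `hitl_gate_node` may differ, and the `pending_action` field is an assumption:

```python
import os

def hitl_gate_node(state: dict) -> dict:
    """Block on human approval unless a CI override flag is set (sketch)."""
    if os.environ.get("HITL_AUTO_APPROVE", "").lower() == "true":
        approved = True
    elif os.environ.get("HITL_AUTO_DENY", "").lower() == "true":
        approved = False
    else:
        # Interactive path: blocking stdin prompt (see limitations below)
        reply = input(f"Approve destructive action {state.get('pending_action')}? [y/N] ")
        approved = reply.strip().lower() in ("y", "yes")
    state["approval_granted"] = approved
    if not approved:
        state["status"] = "denied"
        state["final_answer"] = "Action denied by human reviewer."
    return state
```

Checking the auto-approve flag before the auto-deny flag means that if both are set, approval wins; a real implementation should probably reject that configuration instead.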
- Stub LLM: The `_stub_llm` function uses keyword matching for offline testing. With a real `OPENAI_API_KEY` the actual GPT-4o-mini model is used.
- In-memory state: The mock file system and DB are module-level dicts — they reset between process runs but are shared within a run. A production version would use persistent storage.
- Synchronous HITL: The approval prompt is blocking stdin. Production would use an async approval queue (e.g., Slack bot, web UI, email).
- Single-agent: This is a single-LLM agent. A production system might use specialized sub-agents per tool domain.
- Max retries = 3: Hardcoded constant; should be configurable via environment variable.
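Making the retry cap configurable is a one-line change — for example, reading a hypothetical `MAX_RETRIES` environment variable with the current hardcoded value as the default:

```python
import os

# Fall back to the current hardcoded value of 3 when the variable is unset
MAX_RETRIES = int(os.environ.get("MAX_RETRIES", "3"))
```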