
Kontex CLI

Local HTTP proxy + dashboard for AI agent developers.
One command intercepts every LLM API call, saves a full snapshot locally, and opens a "Control Room" dashboard — no cloud, no config, no data leaves your machine.


If this is useful, a star on GitHub goes a long way — it helps other agent developers find it.


(Dashboard screenshot)


Why Kontex?

When you're building AI agents, you need to answer questions like:

  • Which LLM call caused the bad output?
  • What was the exact context when the agent went off-track?
  • Can I replay this run with a different response at step 3?

Kontex intercepts every call at the proxy layer, so you get full observability with zero changes to your agent code — just point your base URL at localhost:8080.


What it does

Your agent  →  localhost:8080  →  OpenAI / Anthropic / Ollama / any LLM API
                    │
                    ├── Saves raw prompt + response to .kontex.db (SQLite)
                    ├── Optionally trims context (lossless, toggleable)
                    └── Serves dashboard at GET /

Key features

  • Proxy: Intercepts every POST /* call and forwards it to your upstream LLM
  • Snapshots: Saves the full untrimmed prompt and response to SQLite; nothing is lost
  • Context trimmer: Structurally lossless trimming applied before the upstream call, toggleable from the dashboard
  • Session grouping: Groups related agent runs into sessions via a request header
  • Multi-agent graph: Swim-lane view showing every agent's trajectory and cross-agent links
  • Live pause: Pause a request mid-flight, inspect it, then resume with edited messages
  • Fork & replay: Branch from any snapshot with a human-edited response; downstream calls replay deterministically
  • Branch chain: Create a new agent task from any snapshot, staying in the same session

Requirements

  • Node.js 20+
  • npm 9+

Installation

Option A — global install (recommended)

npm install -g kontex-proxy
kontex start

Option B — clone and build

git clone https://github.com/pankaj-agrawalla/kontex-cli.git
cd kontex-cli
npm install
cd web && npm install && cd ..
npm run build

Configuration

Copy .env.example and edit as needed:

cp .env.example .env
# .env
KONTEX_PORT=8080           # Port for the proxy + dashboard (default: 8080)
UPSTREAM_URL=https://api.openai.com   # LLM API to forward requests to

To use with Ollama locally:

UPSTREAM_URL=http://localhost:11434

To use with Anthropic:

UPSTREAM_URL=https://api.anthropic.com

Usage

Start the server

kontex start

The browser opens automatically at http://localhost:8080.

Or with a custom port:

kontex start --port 9000

Point your agents at Kontex

Change your agent's base URL from the LLM provider to the Kontex proxy:

http://localhost:8080

No other code changes are required. All requests are transparently proxied.

Example — OpenAI SDK:

import OpenAI from "openai"

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "http://localhost:8080/v1",   // ← point at Kontex
})

Example — LangChain:

import { ChatOpenAI } from "@langchain/openai"

const llm = new ChatOpenAI({
  openAIApiKey: process.env.OPENAI_API_KEY,
  configuration: {
    baseURL: "http://localhost:8080/v1",  // ← point at Kontex
  },
})

Example — raw fetch:

await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json", "Authorization": `Bearer ${apiKey}` },
  body: JSON.stringify({ model: "gpt-4o", messages }),
})

Optional request headers

These headers unlock richer dashboard views. They are stripped before forwarding upstream — your LLM never sees them.

  • X-Kontex-Task-Id: Groups snapshots into a named agent task (a swim lane in the graph). Defaults to "default" if omitted.
  • X-Kontex-Session-Id: Groups all tasks from one run into a single session entry in the sidebar.
  • X-Kontex-Parent-Task-Id: Records a cross-agent link (draws an amber dashed edge). Send only on the first turn of a child agent.
  • X-Kontex-Fork-Id: Enables deterministic replay. Set it to the task ID you forked from.

Without any headers, everything still works — all snapshots land under the "default" task and appear in the dashboard.

With headers (recommended for multi-agent workflows):

const headers = {
  "X-Kontex-Task-Id": "coder-agent",
  "X-Kontex-Session-Id": "run-2024-001",
  // first turn of a child agent only:
  "X-Kontex-Parent-Task-Id": "planner-agent",
}
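
To replay deterministically, add X-Kontex-Fork-Id on top of the usual headers. A minimal sketch, assuming you forked from the planner-agent task; the replay task name here is illustrative:

const replayHeaders = {
  "X-Kontex-Task-Id": "planner-agent-v2",  // hypothetical name for the replayed task
  "X-Kontex-Session-Id": "run-2024-001",
  "X-Kontex-Fork-Id": "planner-agent",     // the task ID you forked from
}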

The Dashboard

Open http://localhost:8080 in your browser.

Sidebar (left)

  • Lists all sessions ordered newest-first
  • Each entry shows the session ID, timestamp, agent count, and snapshot count
  • Click a session to load its graph
  • Context trimmer toggle at the bottom — turn trimming on or off in real time

Graph (center)

  • One swim-lane column per agent task
  • Nodes = individual LLM calls (snapshots)
  • Gray edges = within the same agent
  • Amber dashed animated edges = cross-agent links (parent → child)
  • Amber-bordered nodes = human-edited snapshots
  • Click any node to open the snapshot drawer

Snapshot drawer (right)

Opens when you click a node. Shows:

  • The full conversation messages sent to the LLM
  • Live Pause — pauses the next request from this task mid-flight so you can inspect and edit messages before they reach the LLM (see the sketch after this list)
  • Fork & Edit — save a human-edited version of the messages; the next replay of this prompt hash will return your edited version instead of calling the LLM
  • Branch chain here — create a new agent task (in the same session) branching from this point, with an editable LLM response
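
Live Pause and the fork actions map onto plain HTTP endpoints (documented under Internal API below). A hedged sketch of pausing and resuming from code; the paths come from the endpoint table, but the resolve body shape is an assumption, so check the real schema before relying on it:

// pause the next request from this snapshot's task
await fetch(`http://localhost:8080/api/snapshots/${snapshotId}/pause`, { method: "POST" })

// ...inspect in the dashboard, then resume with edited messages
// (the { messages } body shape is assumed, not confirmed)
await fetch(`http://localhost:8080/api/snapshots/${snapshotId}/resolve`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ messages: editedMessages }),
})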

Found this useful in your stack? Share it with your team or post it in your AI/agent dev community — this project grows entirely through word of mouth.


Context trimmer

The trimmer applies three structurally lossless passes before forwarding to the upstream LLM:

  1. Tool result truncation — long tool/function responses are sliced to prevent runaway context growth
  2. Middle-turn compression — older assistant turns in the middle of a long conversation are shortened
  3. System prompt deduplication — repeated system content across turns is reduced

The raw untrimmed payload is always saved to the database — trimming only affects what is forwarded upstream.

Toggle it on/off live from the sidebar without restarting the server.
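
The same switch is exposed over HTTP, so scripts can flip it too. A minimal sketch using the documented /api/trimmer endpoints:

// read the current state: { enabled: boolean }
const { enabled } = await (await fetch("http://localhost:8080/api/trimmer")).json()
console.log("trimmer enabled:", enabled)

// flip it, then re-read to confirm the new state
await fetch("http://localhost:8080/api/trimmer/toggle", { method: "POST" })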


Multi-agent workflow example

const SESSION_ID = `run-${Date.now()}`

// Agent 1 — Planner
const plannerResponse = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${apiKey}`,
    "X-Kontex-Task-Id": "planner",
    "X-Kontex-Session-Id": SESSION_ID,
  },
  body: JSON.stringify({ model: "gpt-4o", messages: plannerMessages }),
})

// Agent 2 — Coder (links back to planner)
const coderResponse = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${apiKey}`,
    "X-Kontex-Task-Id": "coder",
    "X-Kontex-Session-Id": SESSION_ID,
    "X-Kontex-Parent-Task-Id": "planner",   // ← first turn only
  },
  body: JSON.stringify({ model: "gpt-4o", messages: coderMessages }),
})

This produces a dashboard with two swim lanes and an amber edge from Planner → Coder, grouped under one session.


Database

All data is stored in .kontex.db (SQLite) in the project root. The file is created automatically on first run.

To start completely fresh:

rm .kontex.db
kontex start

Schema

CREATE TABLE Snapshots (
  id                 TEXT PRIMARY KEY,   -- cuid
  task_id            TEXT NOT NULL,      -- from X-Kontex-Task-Id header
  parent_id          TEXT,               -- previous snapshot in the same task
  parent_task_id     TEXT,               -- from X-Kontex-Parent-Task-Id header
  session_id         TEXT,               -- from X-Kontex-Session-Id header
  prompt_hash        TEXT NOT NULL,      -- MD5 of messages array (for replay lookup)
  raw_prompt_payload TEXT NOT NULL,      -- original untrimmed JSON body
  llm_response       TEXT,               -- raw response from upstream
  is_human_edited    INTEGER DEFAULT 0,  -- 1 if created via fork
  created_at         INTEGER NOT NULL    -- Unix ms
);
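
Because the store is a plain SQLite file, any SQLite client can query it directly. A sketch using better-sqlite3 (not a project dependency, just one option), based on the schema above:

import Database from "better-sqlite3"

const db = new Database(".kontex.db", { readonly: true })

// ten most recent snapshots for one session, newest first
const rows = db
  .prepare(
    "SELECT id, task_id, created_at FROM Snapshots " +
    "WHERE session_id = ? ORDER BY created_at DESC LIMIT 10"
  )
  .all("run-2024-001")

console.log(rows)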

Internal API

These endpoints power the dashboard. You can also call them directly.

Method  Path                             Description
GET     /health                          Health check
GET     /api/sessions                    List all sessions
GET     /api/tasks                       List all task IDs
GET     /api/graph?session=<id>          Combined graph (nodes + edges) for a session
GET     /api/tasks/:id/graph             Graph for a single task
GET     /api/snapshots/:id               Full snapshot detail
POST    /api/snapshots/:id/pause         Pause the next request on this snapshot
POST    /api/snapshots/:id/resolve       Resume a paused request with edited messages
POST    /api/snapshots/:id/fork          Create a human-edited snapshot (same task)
POST    /api/snapshots/:id/fork-chain    Create a new task branching from this snapshot
GET     /api/trimmer                     Get trimmer state: { enabled: boolean }
POST    /api/trimmer/toggle              Toggle trimmer on/off
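
For example, pulling the newest session's graph from a script. The paths come from the table above; the assumption that each session entry carries an id field is mine, so adjust to the real response shape:

const sessions = await (await fetch("http://localhost:8080/api/sessions")).json()

// sessions are listed newest-first; `id` is an assumed field name
const graph = await (
  await fetch(`http://localhost:8080/api/graph?session=${sessions[0].id}`)
).json()

console.log(graph)  // combined nodes + edges for the session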

Development

Run the backend and frontend separately with hot reload:

# Terminal 1 — backend
npm run dev

# Terminal 2 — frontend
cd web && npm run dev

The Vite dev server runs on port 5173 and proxies /api to localhost:8080.


E2E test

Requires Ollama running locally with llama3.2:1b:

ollama pull llama3.2:1b
npm run build
npm run e2e

Simulates a 3-agent pipeline (Planner → Coder → Reviewer) and verifies snapshots, cross-agent edges, session grouping, fork/replay, and edge-case handling. Exits 0 on a full pass.


Stay in the loop

We're building something bigger around Kontex CLI — team dashboards, session sharing, and deeper agent observability are on the roadmap.

  • Watch this repo (GitHub Watch) to get notified on releases
  • Star it (GitHub Star) to show support and help others discover it
  • Open an issue to share what you're building — it directly shapes what gets built next

Contributing

Issues and PRs are welcome. Please open an issue first for significant changes.


License

MIT — see LICENSE.
