56 changes: 56 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,56 @@
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12", "3.13"]

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install uv
        uses: astral-sh/setup-uv@v4

      - name: Run tests
        run: python tests/test_flush.py

  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Check syntax
        run: |
          python -m py_compile hooks/core.py
          python -m py_compile hooks/adapters/cursor.py
          python -m py_compile hooks/adapters/claude_code.py
          python -m py_compile hooks/flush.py

  shellcheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: ShellCheck
        uses: ludeeus/action-shellcheck@master
        with:
          scandir: hooks
          additional_files: install.sh uninstall.sh
169 changes: 119 additions & 50 deletions README.md
@@ -1,8 +1,21 @@
# coding-agent-insights

[![CI](https://github.com/mazzucci/coding-agent-insights/actions/workflows/ci.yml/badge.svg)](https://github.com/mazzucci/coding-agent-insights/actions/workflows/ci.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)

Session tracing and observability for AI coding agents, powered by [Phoenix](https://github.com/Arize-ai/phoenix).

Every agent interaction — prompts, tool calls, file edits, shell commands, thinking steps — is captured automatically and sent to Phoenix for search, replay, and analysis. Currently supports Cursor IDE, with plans to expand to other coding agents.
Every agent interaction — prompts, tool calls, file edits, shell commands, thinking steps — is captured automatically and sent to Phoenix for search, replay, and analysis.

## Supported agents

| Agent | Mechanism | Status |
|---|---|---|
| **Cursor** | Hook events → JSONL buffer → flush on session end | ✅ Stable |
| **Claude Code** | Stop hook → JSONL transcript parser → flush | ✅ New |

Both agents produce **identical span structures** in Phoenix — same trace hierarchy, same attributes, same OpenInference conventions. You can compare sessions across agents in a single Phoenix project.

## How it works

@@ -11,21 +24,41 @@ flowchart LR
subgraph cursor [Cursor IDE]
A[Hook event] -->|stdin JSON| B[trace-hook.sh]
end
subgraph claude [Claude Code]
G[Stop hook] -->|transcript path| H[claude-code-trace-hook.sh]
end
B -->|append| C["/tmp/cursor-traces.jsonl"]
B -->|"on stop/sessionEnd"| D["flush.py (uv run)"]
D -->|read & transform| C
D -->|Phoenix SDK| E[Phoenix]
E --> F[Traces & Sessions UI]
B -->|"on stop/sessionEnd"| D["flush.py → CursorAdapter"]
H -->|read| I["~/.claude/projects/.../session.jsonl"]
H -->|parse| J["flush.py → ClaudeCodeAdapter"]
D --> E[core.py]
J --> E
E -->|Phoenix SDK| F[Phoenix]
F --> K[Traces & Sessions UI]
```

**Hot path (~5 ms):** Every Cursor hook event is piped to `trace-hook.sh`, a bash script that appends the raw JSON to a local buffer file.
### Cursor

**Hot path (~5 ms):** Every Cursor hook event is piped to `trace-hook.sh`, which appends the raw JSON to a local buffer file.

**Flush (on session end):** When a session ends, `flush.py` runs in the background. The Cursor adapter reads the buffer, normalises events, and hands them to the core engine for span construction and Phoenix export.

### Claude Code

**Flush (on session end):** When a session ends, `flush.py` runs in the background via `uv run`. It reads the buffer, groups events into per-turn traces with proper parent-child relationships, maps them to [OpenInference](https://github.com/Arize-ai/openinference) semantic conventions, and sends them to Phoenix.
**Stop hook:** Claude Code's `Stop` hook fires after each agent turn. The hook script extracts the `transcript_path` from the hook context and passes it to `flush.py`.

**Result:** Each conversation turn becomes a separate trace in Phoenix. All turns from the same Cursor tab are grouped into a Phoenix session, giving you a full conversational thread view.
**Transcript parsing:** The Claude Code adapter reads the JSONL transcript, parses user messages, assistant content blocks (text, thinking, tool_use), and tool results into normalised events. Tool use/result pairs are linked automatically.

### Shared core

Both adapters produce `NormalizedEvent` objects that the core engine processes uniformly: turn assignment, span building with monotonic timestamps, parent-child relationships, and Phoenix export using [OpenInference](https://github.com/Arize-ai/openinference) semantic conventions.

**Result:** Each conversation turn becomes a separate trace in Phoenix. All turns from the same session are grouped together, giving you a full conversational thread view — regardless of which agent generated them.

## What gets captured

### Cursor

| Hook event | Span name | Content |
|---|---|---|
| `sessionStart` | `session` | Composer mode, background agent flag |
@@ -41,6 +74,16 @@ flowchart LR
| `subagentStop` | `subagent:<type>` | Task, summary, tool count |
| `stop` / `sessionEnd` | `session.end` | Status, reason, duration |

### Claude Code

| Content block | Span name | Content |
|---|---|---|
| User message | *(first 120 chars of prompt)* | Full prompt text |
| `thinking` | `thinking` | Model reasoning text |
| `tool_use` + `tool_result` | `tool:<name>` | Tool input, output, duration |
| `text` (assistant) | `response` | Final response text |
| Summary | `session.end` | Session summary |

## Quick start

### Prerequisites
@@ -61,102 +104,128 @@ bash install.sh
The installer will:

1. Install `uv` if needed
2. Copy hook scripts to `~/.cursor/hooks/`
3. Merge hook config into `~/.cursor/hooks.json`
2. **Auto-detect** Cursor (`~/.cursor/`) and/or Claude Code (`~/.claude/`)
3. Copy hook scripts and configure each detected agent
4. Ask how you want to connect to Phoenix:
- **Local Docker** — spins up Phoenix v13.15.0
- **Local Docker** — spins up Phoenix v13.15.0 (with gRPC on port 4317)
- **Existing URL** — connects to your Phoenix instance
- **Skip** — configure later
5. Ask for a Phoenix project name (default: `cursor`)
5. Ask for a Phoenix project name (default: `coding-agent-insights`)

After install, Cursor will trace all agent sessions automatically.
After install, both agents will trace sessions automatically.

### Verify

Open Phoenix at [http://localhost:6006](http://localhost:6006) (or your custom URL), start a Cursor agent conversation, and watch traces appear in the project.
Open Phoenix at [http://localhost:6006](http://localhost:6006), start an agent conversation, and watch traces appear in the project.

## Configuration

All settings are in `~/.cursor/hooks/.coding-agent-insights.env`:
Settings are in `.coding-agent-insights.env` in each agent's hooks directory:

```bash
PHOENIX_HOST="http://localhost:6006"
PHOENIX_PROJECT="cursor"
# CURSOR_TRACES_DEBUG="true"
# CURSOR_TRACES_SKIP="field1,field2"
# CURSOR_TRACES_BUFFER="/tmp/cursor-traces.jsonl"
PHOENIX_PROJECT="coding-agent-insights"
AGENT_TYPE="cursor" # or "claude_code"
# TRACES_DEBUG="true"
# TRACES_SKIP="field1,field2"
# TRACES_LOG="/tmp/coding-agent-insights.log"
```

| Variable | Default | Purpose |
|---|---|---|
| `PHOENIX_HOST` | `http://localhost:6006` | Phoenix server URL |
| `PHOENIX_PROJECT` | `cursor` | Phoenix project name |
| `CURSOR_TRACES_DEBUG` | *(unset)* | Set to `true` for debug logging to `/tmp/cursor-traces.log` |
| `CURSOR_TRACES_SKIP` | *(unset)* | Comma-separated field names to redact from traces |
| `CURSOR_TRACES_BUFFER` | `/tmp/cursor-traces.jsonl` | Path to the event buffer file |
| `PHOENIX_PROJECT` | `coding-agent-insights` | Phoenix project name |
| `AGENT_TYPE` | `cursor` | Agent type (cursor / claude_code) |
| `TRACES_DEBUG` | *(unset)* | Set to `true` for debug logging |
| `TRACES_SKIP` | *(unset)* | Comma-separated field names to redact |
| `TRACES_LOG` | `/tmp/coding-agent-insights.log` | Debug log file path |

Legacy `CURSOR_TRACES_*` env vars are still supported for backward compatibility.
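
As an illustration of how the legacy fallback and `TRACES_SKIP` could be resolved, here is a minimal sketch. The helper names (`get_setting`, `skip_fields`, `redact`), the precedence (new name wins over the `CURSOR_`-prefixed one), the `[REDACTED]` placeholder, and top-level-only redaction are all assumptions made for this example, not the behaviour of `flush.py`.

```python
import os


def get_setting(name, default=None, env=None):
    """Look up NAME, then fall back to the legacy CURSOR_NAME variable."""
    env = os.environ if env is None else env
    for key in (name, f"CURSOR_{name}"):
        value = env.get(key)
        if value:
            return value
    return default


def skip_fields(env=None):
    """Parse the comma-separated TRACES_SKIP value into a set of field names."""
    raw = get_setting("TRACES_SKIP", "", env)
    return {part.strip() for part in raw.split(",") if part.strip()}


def redact(payload, fields):
    """Replace the values of the listed top-level keys (assumed behaviour)."""
    return {k: "[REDACTED]" if k in fields else v for k, v in payload.items()}
```

With only `CURSOR_TRACES_SKIP="token, api_key"` set, `skip_fields()` would return both names, and `redact()` would mask those keys while leaving the rest of the payload untouched.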

## Manual flush

Traces flush automatically when a session ends. To flush manually:
Traces flush automatically. To flush manually:

```bash
uv run ~/.cursor/hooks/flush.py
```

With debug output:
# Cursor
uv run hooks/flush.py --agent cursor

```bash
CURSOR_TRACES_DEBUG=true uv run ~/.cursor/hooks/flush.py
# Claude Code
uv run hooks/flush.py --agent claude_code --transcript ~/.claude/projects/.../session.jsonl
```

Check buffer size:
With debug output:

```bash
wc -l /tmp/cursor-traces.jsonl
TRACES_DEBUG=true uv run hooks/flush.py --agent cursor
```

## Phoenix features

### Traces

Each user turn (prompt + agent response cycle) becomes a trace. Tool calls, file edits, and shell executions appear as child spans with proper input/output attribution.
Each user turn (prompt + agent response cycle) becomes a trace. Tool calls, file edits, and thinking steps appear as child spans with proper input/output attribution.
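
Input/output attribution for tool spans depends on matching each `tool_use` block with its `tool_result`. The sketch below shows one way that pairing might work. The block fields (`id`, `name`, `input`, `tool_use_id`, `content`) follow the Anthropic content-block format; the line wrapper with a `message.content` list and the function name `pair_tool_calls` are assumptions, and the real parser in `claude_code.py` may differ.

```python
import json
from typing import Iterable


def pair_tool_calls(transcript_lines: Iterable[str]) -> list[dict]:
    """Match tool_use blocks to tool_result blocks by id (illustrative only)."""
    pending: dict[str, dict] = {}
    paired: list[dict] = []
    for raw in transcript_lines:
        raw = raw.strip()
        if not raw:
            continue
        message = json.loads(raw).get("message")
        if not isinstance(message, dict):
            continue
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-text message, no content blocks
        for block in content:
            if not isinstance(block, dict):
                continue
            if block.get("type") == "tool_use":
                pending[block["id"]] = {
                    "name": block["name"],
                    "input": block.get("input"),
                    "output": None,
                }
            elif block.get("type") == "tool_result":
                call = pending.pop(block.get("tool_use_id"), None)
                if call is not None:
                    call["output"] = block.get("content")
                    paired.append(call)
    # Calls that never received a result are kept with output=None.
    paired.extend(pending.values())
    return paired
```

Each returned dict carries what a `tool:<name>` span needs: the tool name, its input, and the matched output.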

### Sessions

All turns from the same Cursor conversation are grouped into a Phoenix session. The Sessions tab shows the conversational thread with first input and last output for each turn.
All turns from the same conversation are grouped into a Phoenix session. The Sessions tab shows the conversational thread with first input and last output for each turn.

### Cross-agent comparison

Both Cursor and Claude Code traces appear in the same Phoenix project. Compare how different agents handle the same tasks, analyse tool usage patterns, and identify which workflows are most effective.

### Golden datasets

Save exemplary traces to Phoenix datasets for future reference — proven prompt patterns, successful tool chains, or reference workflows. See the [coding-agent-insights skill](skills/insights/SKILL.md) for programmatic examples.

## Uninstall
## Architecture

```bash
bash uninstall.sh
```
hooks/
├── flush.py # Entrypoint: dispatches to adapter → core → Phoenix
├── core.py # Agent-agnostic engine: NormalizedEvent, turns, spans, posting
├── trace-hook.sh # Cursor: hot-path bash hook (~5ms)
├── claude-code-trace-hook.sh # Claude Code: Stop hook script
└── adapters/
├── __init__.py # Adapter registry
├── cursor.py # Cursor: buffer I/O + event normalisation
└── claude_code.py # Claude Code: JSONL transcript parser

tests/
└── test_flush.py # 35 tests covering core, both adapters, cross-adapter parity

install.sh # Multi-agent installer with auto-detection
uninstall.sh # Cleanup script
docker-compose.yml # Phoenix with HTTP (6006) + gRPC (4317)
```

This removes hook scripts and config entries. Optionally stops the Phoenix container and removes its data volume.
### Adapter pattern

## Architecture
Each adapter implements:
- `read_events()` → `list[NormalizedEvent]`
- Agent-specific I/O and event normalisation

```
~/.cursor/
├── hooks.json # Cursor hook config (managed by installer)
└── hooks/
├── trace-hook.sh # Bash hot-path: buffers events (~5ms)
├── flush.py # Python: transforms & sends to Phoenix
└── .coding-agent-insights.env # User settings (Phoenix URL, project, etc.)
The core engine handles everything else: turn assignment, span building with monotonic timestamps, parent-child relationships, session labels, and Phoenix export.
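
To make turn assignment and monotonic timestamps concrete, here is a minimal sketch. The `NormalizedEvent` fields, the rule that each prompt starts a new turn, and the 1000 ns bump are assumptions for illustration; the actual definitions live in `core.py` and may differ.

```python
from dataclasses import dataclass, field


@dataclass
class NormalizedEvent:
    # Hypothetical fields; the real class in core.py may look different.
    kind: str           # e.g. "prompt", "thinking", "tool", "response"
    timestamp_ns: int
    session_id: str
    payload: dict = field(default_factory=dict)
    turn: int = 0


def assign_turns(events):
    """Start a new turn at every user prompt (assumed rule)."""
    turn = 0
    for ev in events:
        if ev.kind == "prompt":
            turn += 1
        ev.turn = turn
    return events


def make_monotonic(events, step_ns=1_000):
    """Bump any timestamp that does not advance past the previous one."""
    last = None
    for ev in events:
        if last is not None and ev.timestamp_ns <= last:
            ev.timestamp_ns = last + step_ns
        last = ev.timestamp_ns
    return events
```

Strictly increasing timestamps keep child spans in a stable order even when an agent reports several events with the same clock reading.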

**Adding a new agent:** Create `adapters/your_agent.py` with a class that implements `read_events()` returning `NormalizedEvent` objects. Register it in `adapters/__init__.py`. The core engine handles the rest.
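
A new adapter might look roughly like the sketch below. Everything agent-specific is hypothetical: the `MyAgentAdapter` name, the `~/.my-agent/events.jsonl` path, the `type`/`ts`/`session` keys, and the stand-in `NormalizedEvent` fields. Only the `read_events()` contract and the no-argument instantiation by the registry come from this PR.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class NormalizedEvent:
    """Stand-in for the class in core.py; these fields are assumptions."""
    kind: str
    timestamp_ns: int
    session_id: str
    payload: dict = field(default_factory=dict)


class MyAgentAdapter:
    """Hypothetical adapter for an agent that logs one JSON object per line."""

    def __init__(self, log_path: Optional[Path] = None):
        # get_adapter() calls the class without arguments, so the path needs
        # a default. This location is made up for the example.
        self.log_path = Path(log_path) if log_path else Path.home() / ".my-agent" / "events.jsonl"

    def read_events(self) -> list[NormalizedEvent]:
        if not self.log_path.exists():
            return []
        events = []
        with self.log_path.open(encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue
                raw = json.loads(line)
                events.append(NormalizedEvent(
                    kind=raw.get("type", "unknown"),
                    timestamp_ns=int(raw.get("ts", 0)),
                    session_id=str(raw.get("session", "")),
                    payload={k: v for k, v in raw.items()
                             if k not in {"type", "ts", "session"}},
                ))
        return events


# Registration would mirror the existing entries in adapters/__init__.py.
ADAPTERS = {"my_agent": MyAgentAdapter}
```

In the real package the adapter would import `NormalizedEvent` from the core module rather than redefining it.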

## Tests

```bash
python tests/test_flush.py
```

- **trace-hook.sh** runs for every hook event. It sources `.coding-agent-insights.env`, appends the JSON payload to the buffer, and triggers `flush.py` on `stop`/`sessionEnd`.
- **flush.py** runs via `uv run` (isolated Python with `arize-phoenix-client`). It reads the buffer, splits events into turns, builds OpenInference-compliant spans, and posts them to Phoenix.
- **Buffer file** (`/tmp/cursor-traces.jsonl`) acts as a resilient intermediary. If Phoenix is unreachable, the buffer is preserved for retry.
The test suite covers:
- **Core engine** (20 tests): turn assignment, sequencing, timestamps, parent-child relationships, redaction, edge cases
- **Cursor adapter** (6 tests): event normalisation, all hook types, atomic buffer drain
- **Claude Code adapter** (8 tests): transcript parsing, tool use/result pairing, thinking blocks, multi-turn, timestamps
- **Cross-adapter parity** (1 test): both adapters produce consistent span structures
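
For context on the "atomic buffer drain" case listed above, here is one way such a drain could be written. This is a sketch under assumptions, not the Cursor adapter's code: the function name `drain_buffer`, the rename-then-read approach, and the `.draining` suffix are all invented for the example.

```python
import json
import os
from pathlib import Path


def drain_buffer(buffer_path: Path) -> list[dict]:
    """Claim the buffer file with an atomic rename, then parse it."""
    claimed = buffer_path.with_name(f"{buffer_path.name}.{os.getpid()}.draining")
    try:
        # os.replace is atomic within one filesystem, so a hook that appends
        # after this point creates a fresh buffer file instead of racing us.
        os.replace(buffer_path, claimed)
    except FileNotFoundError:
        return []  # nothing has been buffered yet

    events = []
    with claimed.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                events.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # skip a truncated or corrupt line
    claimed.unlink()
    return events
```

A test for this would write a few lines to a temporary buffer, drain it, and check that the events come back once and the buffer file is gone afterwards.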

## Contributing

1. Fork the repo
2. Make your changes
3. Test with a real Cursor session
3. Run `python tests/test_flush.py` (all 35 tests must pass)
4. Submit a PR

## License
3 changes: 2 additions & 1 deletion docker-compose.yml
@@ -3,7 +3,8 @@ services:
    image: arizephoenix/phoenix:13.15.0
    container_name: coding-agent-insights-phoenix
    ports:
      - "6006:6006"
      - "6006:6006" # HTTP (UI + REST API)
      - "4317:4317" # gRPC (OpenTelemetry collector)
    volumes:
      - phoenix-data:/data
    restart: unless-stopped
24 changes: 24 additions & 0 deletions hooks/adapters/__init__.py
@@ -0,0 +1,24 @@
"""
coding-agent-insights — adapter registry

Each adapter normalises agent-specific events into NormalizedEvent objects
that the core engine can process uniformly.
"""
from hooks.adapters.cursor import CursorAdapter
from hooks.adapters.claude_code import ClaudeCodeAdapter

ADAPTERS = {
    "cursor": CursorAdapter,
    "claude_code": ClaudeCodeAdapter,
}


def get_adapter(agent_type: str):
    """Return a new adapter instance for the given agent type."""
    adapter_cls = ADAPTERS.get(agent_type)
    if adapter_cls is None:
        raise ValueError(
            f"Unknown agent type: {agent_type!r}. "
            f"Available: {', '.join(ADAPTERS)}"
        )
    return adapter_cls()