GitHub-as-State-Machine Benchmark

Measures the latency cost of using GitHub's REST API as a state coordination mechanism in the Amber issue-handler workflow.

Thesis

Using GitHub (issues, PRs, labels, checks) as the state machine for agentic workflows introduces measurable overhead. This benchmark quantifies:

  1. Per-step API latency — how long each GitHub REST call takes
  2. Polling tax — time wasted waiting for GitHub to compute state (e.g. PR mergeable)
  3. Rate limit pressure — how fast you burn through 5,000 req/hr under concurrent agents
  4. Concurrency degradation — how latency changes as parallel agents increase
  5. Event propagation delay — the gap between "state changed" and "state visible via API" (measured in the sketch below)

The goal is to put numbers behind two claims: that timer-based polling should be replaced with event-driven webhooks, and that GitHub's API is a bottleneck worth measuring.
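
Point 5 is the subtlest to measure. Below is a minimal sketch of the write-then-poll-until-visible probe, assuming a plain requests client; the issue number and label name are placeholders, not names from the benchmark code.

import os
import time

import requests

API = "https://api.github.com"
HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}
REPO = f"{os.environ['GITHUB_ORG']}/{os.environ.get('GITHUB_REPO', 'gh-api-benchmark')}"

def propagation_delay(issue_number: int, label: str = "bench-probe") -> float:
    """Apply a label, then poll the read path until the change is visible."""
    t0 = time.monotonic()
    requests.post(
        f"{API}/repos/{REPO}/issues/{issue_number}/labels",
        headers=HEADERS,
        json={"labels": [label]},
    ).raise_for_status()
    while True:
        issue = requests.get(
            f"{API}/repos/{REPO}/issues/{issue_number}", headers=HEADERS
        ).json()
        if any(l["name"] == label for l in issue.get("labels", [])):
            return time.monotonic() - t0  # one write plus however many reads it took
        time.sleep(0.25)  # poll interval bounds the measurement resolution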

Workflow Mapping

Mermaid Step              → GitHub API Call                    → Type
─────────────────────────────────────────────────────────────────────
A: Issue + label          → POST /issues + POST /labels       → WRITE
B: Workflow dispatch      → POST /dispatches                  → WRITE
C: ambient-action         → (simulated delay)                 → COMPUTE
D: ACP session            → (simulated delay)                 → COMPUTE
E: Amber2 workflow        → (simulated delay)                 → COMPUTE
F: Load context           → GET /repos + GET /issues          → READ
G: Spec Kit               → (simulated delay)                 → COMPUTE
H: Reproduce              → (simulated delay)                 → COMPUTE
I: Implement fix          → (simulated delay)                 → COMPUTE
J: Lint/test/coverage     → POST + PATCH /check-runs          → WRITE
K: Emit report            → POST /issues/comments             → WRITE
L: Push branch            → POST /git/refs + PUT /contents    → WRITE
M: Create PR              → POST /pulls                       → WRITE
M: Poll mergeable         → GET /pulls (repeated)             → POLL ← event gap (see below)
N: Review                 → POST /pulls/reviews               → WRITE
O: Merge                  → PUT /pulls/merge                  → WRITE
P: Post-merge learning    → POST /issues/comments             → WRITE
Q: Close issue            → PATCH /issues                     → WRITE
Z: Cleanup                → DELETE /git/refs                  → WRITE
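
The lone POLL row is where the event gap bites: POST /pulls returns immediately, but the PR's mergeable field comes back null until a background job on GitHub's side computes it. A sketch of that loop, reusing the API, HEADERS, and REPO constants from the probe above (interval and timeout are illustrative):

import time

import requests

def wait_for_mergeable(pr_number: int, interval: float = 1.0,
                       timeout: float = 60.0) -> bool:
    """Re-fetch the PR until GitHub has computed mergeability."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        pr = requests.get(
            f"{API}/repos/{REPO}/pulls/{pr_number}", headers=HEADERS
        ).json()
        if pr.get("mergeable") is not None:  # null means "still computing"
            return pr["mergeable"]
        time.sleep(interval)  # every pass through here is pure poll tax
    raise TimeoutError(f"mergeable never resolved for PR #{pr_number}")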

Each WRITE call costs 5 points against GitHub's secondary rate limit budget of 900 points/minute. A single cycle makes ~14 write calls, i.e. 70 points, so the budget saturates at roughly 12 cycles per minute.
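
The same arithmetic as code, with the constants taken from GitHub's documented secondary limit and the WRITE rows in the mapping above:

POINTS_PER_WRITE = 5      # cost of a POST/PATCH/PUT/DELETE under the secondary limit
WRITES_PER_CYCLE = 14     # WRITE rows in the mapping table
BUDGET_PER_MINUTE = 900   # secondary rate limit, points per minute

points_per_cycle = POINTS_PER_WRITE * WRITES_PER_CYCLE    # 70
cycles_per_minute = BUDGET_PER_MINUTE / points_per_cycle  # ≈ 12.9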

Quick Start

# Install
uv sync

# Set up test repo (creates repo, labels, workflow)
GITHUB_TOKEN=ghp_xxx GITHUB_ORG=your-test-org \
  uv run python setup_repo.py

# Run benchmark (sequential then parallel ramp)
GITHUB_TOKEN=ghp_xxx GITHUB_ORG=your-test-org GITHUB_REPO=gh-api-benchmark \
  uv run python benchmark.py

# Analyze results
uv run python analyze.py benchmark_results.db ./reports/

Configuration

Environment variables:

Variable            Default            Description
──────────────────────────────────────────────────────────────────
GITHUB_TOKEN        (required)         PAT with repo+workflow+checks
GITHUB_ORG          (required)         Test org or username
GITHUB_REPO         gh-api-benchmark   Test repo name
SEQUENTIAL_CYCLES   5                  Baseline cycles (no concurrency)
PARALLEL_CYCLES     20                 Total parallel cycles
MAX_CONCURRENCY     5                  Max simultaneous cycles

Required PAT Scopes

  • repo — full control of private repos
  • workflow — update GitHub Action workflows
  • write:checks — create and update check runs

Output

SQLite database (benchmark_results.db) with three tables:

  • step_results — every individual API call or simulated step (queried directly in the sketch below)
  • cycle_summary — per-cycle aggregates (wall time, API time, sim time)
  • rate_limit_snapshots — periodic rate limit state
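
The raw tables can be queried without the analysis script. A minimal sketch, assuming step_results carries step_name and duration_ms columns (check the schema in benchmark.py for the real names):

import sqlite3

con = sqlite3.connect("benchmark_results.db")
by_step: dict[str, list[float]] = {}
for step, ms in con.execute("SELECT step_name, duration_ms FROM step_results"):
    by_step.setdefault(step, []).append(ms)
for step, xs in sorted(by_step.items()):
    xs.sort()
    p99 = xs[int(0.99 * (len(xs) - 1))]  # nearest-rank percentile
    print(f"{step}: p99 = {p99:.0f} ms")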

Analysis script produces:

  • benchmark_report.txt — human-readable summary
  • step_latency_stats.csv — per-step p50/p90/p99
  • rate_limit_timeline.csv — rate limit consumption over time
  • all_step_results.csv — raw data for external tools

Key Metrics to Watch

  1. Poll Tax % — what fraction of cycle time is spent polling for state (see the sketch after this list)
  2. GitHub Tax % — what fraction of non-compute time is API calls
  3. P99 latency per step — tail latency tells the real story
  4. Rate limit burn rate — projected req/hr extrapolated from the run
  5. Concurrency knee — at what parallelism level does latency spike
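
Metric 1 falls out of a single query. A sketch assuming step_results tags each row with the READ/WRITE/COMPUTE/POLL type from the mapping table (the call_type and duration_ms column names are guesses, not the benchmark's actual schema):

import sqlite3

con = sqlite3.connect("benchmark_results.db")
poll_ms, total_ms = con.execute(
    """SELECT
           SUM(CASE WHEN call_type = 'POLL' THEN duration_ms ELSE 0 END),
           SUM(duration_ms)
       FROM step_results"""
).fetchone()
print(f"Poll tax: {100 * poll_ms / total_ms:.1f}% of total step time")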

Dashboard

A Next.js app for launching runs, monitoring progress, and comparing results.

cd dashboard
npm install

# Required: pass the same GitHub credentials
GITHUB_TOKEN=ghp_xxx GITHUB_ORG=your-test-org GITHUB_REPO=gh-api-benchmark \
  npx next dev -p 3001

See dashboard/.env.example for required variables. The dashboard reads from the same benchmark_results.db in the project root.

Features:

  • Launch benchmark runs with preset or custom configurations
  • Live monitoring — polls every 3s while a run is active
  • Run detail — per-step latency charts, percentile distributions, rate limit timeline
  • Compare — side-by-side metrics across multiple runs

Architecture

benchmark.py          — orchestrator + instrumented GitHub client
├── Phase 1           — sequential baseline (N cycles, 1 at a time)
├── Phase 2           — parallel ramp (1→MAX_CONCURRENCY workers)
└── SQLite writer     — WAL mode, commit-per-cycle for crash safety

analyze.py            — reads SQLite, produces stats + CSVs
setup_repo.py         — one-time test repo initialization

dashboard/            — Next.js app for visualization and run management
├── src/app/api/      — API routes (runs CRUD, benchmark launcher, compare)
└── src/app/page.tsx  — single-page app with runs list, detail, and compare views
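
The SQLite-writer pattern called out above, as a minimal sketch (column names assumed): WAL mode lets the dashboard read while a run is writing, and committing once per cycle bounds crash loss to the in-flight cycle.

import sqlite3

con = sqlite3.connect("benchmark_results.db")
con.execute("PRAGMA journal_mode=WAL")  # readers no longer block the writer

def record_cycle(steps: list[tuple[str, str, float]]) -> None:
    """Insert a whole cycle's step rows, then commit exactly once."""
    con.executemany(
        "INSERT INTO step_results (step_name, call_type, duration_ms) "
        "VALUES (?, ?, ?)",
        steps,
    )
    con.commit()  # a crash loses at most the current cycle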
