30 commits

9473d09 feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… (levleontiev, Mar 17, 2026)
        …plus 19 further commits with the same subject (levleontiev, Mar 17, 2026)
a93b377 docs(readme): add quickstart pointer, update project layout, fix benc… (levleontiev, Mar 17, 2026)
a4ad21c fix(quickstart): fix review issues — reverse_proxy mode, correct conf… (levleontiev, Mar 18, 2026)
        …plus 7 further commits with the same subject (levleontiev, Mar 18, 2026)
348789e refactor: replace provider-failover recipe with circuit-breaker (Mar 18, 2026)
27 changes: 18 additions & 9 deletions README.md
@@ -92,6 +92,12 @@ Fairvisor integrates *alongside* Kong, nginx, and Envoy — it is not a replacem

## Quick start

> **Runnable quickstart:** `examples/quickstart/` — `docker compose up -d` and run your first enforce/reject test in under a minute. See [`examples/quickstart/README.md`](examples/quickstart/README.md).
>
> **Recipes:** `examples/recipes/` — deployable team budgets, runaway agent guard, and circuit-breaker examples.
>
> **Sample artifacts:** `fixtures/` — canonical request/response fixtures for enforce, reject (TPM, TPD, prompt-too-large), and provider-native error bodies (OpenAI, Anthropic, Gemini).

### 1. Create a policy

```bash
@@ -304,7 +310,7 @@ Policies are versioned JSON — commit them to Git, review changes in PRs, roll

**No external datastore.** All enforcement state lives in in-process shared memory (`ngx.shared.dict`). No Redis, no Postgres, no network round-trips in the decision path.

-> Reproduce: `git clone https://github.com/fairvisor/benchmark && cd benchmark && ./run-all.sh`
+> Reproduce: see [fairvisor/benchmark](https://github.com/fairvisor/benchmark) — the canonical benchmark source of truth for Fairvisor Edge performance numbers.

## Deployment

@@ -348,14 +354,16 @@ If the SaaS is unreachable, the edge keeps enforcing with the last-known policy
## Project layout

```
-src/fairvisor/         runtime modules (OpenResty/LuaJIT)
-cli/                   command-line tooling
-spec/                  unit and integration tests (busted)
-tests/e2e/             Docker-based E2E tests (pytest)
-examples/              sample policy bundles
-helm/                  Helm chart
-docker/                Docker artifacts
-docs/                  reference documentation
+src/fairvisor/         runtime modules (OpenResty/LuaJIT)
+cli/                   command-line tooling
+spec/                  unit and integration tests (busted)
+tests/e2e/             Docker-based E2E tests (pytest)
+examples/quickstart/   runnable quickstart (docker compose up -d)
+examples/recipes/      deployable policy recipes (team budgets, agent guard, circuit breaker)
+fixtures/              canonical request/response sample artifacts
+helm/                  Helm chart
+docker/                Docker artifacts
+docs/                  reference documentation
```

## Contributing
@@ -376,3 +384,4 @@ pytest tests/e2e -v # E2E (requires Docker)
---

**Docs:** [docs.fairvisor.com](https://docs.fairvisor.com/docs/) · **Website:** [fairvisor.com](https://fairvisor.com) · **Quickstart:** [5 minutes to enforcement](https://docs.fairvisor.com/docs/quickstart/)

108 changes: 108 additions & 0 deletions examples/quickstart/README.md
@@ -0,0 +1,108 @@
# Fairvisor Edge — Quickstart

Go from `git clone` to working policy enforcement in one step.

## Prerequisites

- Docker with Compose V2 (`docker compose version`)
- Port 8080 free on localhost

## Start

```bash
docker compose up -d
```

Wait for the edge service to report healthy:

```bash
docker compose ps
# edge should show "healthy"
```

## Verify enforcement

This quickstart runs in `FAIRVISOR_MODE=reverse_proxy`. Requests to `/v1/*`
are enforced by the TPM policy and forwarded to a local mock LLM backend.
No real API keys are required.

**Allowed request** — should return `200`:

```bash
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d @../../fixtures/normal_request.json
```

Expected response body shape matches `../../fixtures/allow_response.json`.

**Over-limit request** — should return `429`:

```bash
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d @../../fixtures/over_limit_request.json
```

Expected response body shape: `../../fixtures/reject_tpm_exceeded.json`.
The response will also include:
- `X-Fairvisor-Reason: tpm_exceeded`
- `Retry-After: 60`
- `RateLimit-Limit: 100` (matches the quickstart policy `tokens_per_minute`)
- `RateLimit-Remaining: 0`

## How the policy works

The quickstart policy (`policy.json`) enforces a TPM limit keyed on `ip:address`:

- `tokens_per_minute: 100` — allows roughly 2 small requests per minute
- `tokens_per_day: 1000` — daily cap
- `default_max_completion: 50` — pessimistic reservation per request when `max_tokens` is not set

Sending `over_limit_request.json` (which sets `max_tokens: 200000`) immediately
exceeds the 100-token per-minute budget and triggers a `429`.
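The admission arithmetic above can be sketched as a minimal model. This is illustrative Python, not the actual Lua implementation; the `TokenBucket` class and `reserve` method are hypothetical names, and the assumption is only what the policy fields describe: each request pessimistically reserves its prompt tokens plus its declared `max_tokens` (or `default_max_completion` when unset) against the per-minute budget.

```python
# Illustrative model of pessimistic token-bucket admission (not the Lua code).
class TokenBucket:
    """Minute-window token budget with pessimistic completion reservation."""

    def __init__(self, tokens_per_minute, default_max_completion):
        self.tokens_per_minute = tokens_per_minute
        self.default_max_completion = default_max_completion
        self.used = 0  # tokens reserved in the current minute window

    def reserve(self, prompt_tokens, max_tokens=None):
        # Reserve prompt tokens plus the declared (or default) completion cap.
        completion = max_tokens if max_tokens is not None else self.default_max_completion
        cost = prompt_tokens + completion
        if self.used + cost > self.tokens_per_minute:
            return False  # would exceed the per-minute budget -> 429
        self.used += cost
        return True

bucket = TokenBucket(tokens_per_minute=100, default_max_completion=50)
assert bucket.reserve(prompt_tokens=10)                          # 60 of 100 reserved
assert not bucket.reserve(prompt_tokens=10, max_tokens=200000)   # over-limit: rejected
```

A production implementation would presumably also reconcile the reservation against actual completion usage after the response, but the admission check above is the part that produces the immediate `429`.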

## Wrapper mode (real provider routing)

Wrapper mode routes requests to real upstream providers using provider-prefixed paths
and a composite Bearer token. It requires real provider API keys and cannot be
demonstrated with this mock stack.

**Path and auth format:**

```
POST /openai/v1/chat/completions
Authorization: Bearer CLIENT_JWT:UPSTREAM_KEY
```

Where:
- `CLIENT_JWT` — signed JWT identifying the calling client/tenant (used for policy enforcement)
- `UPSTREAM_KEY` — real upstream API key forwarded to the provider (e.g. `sk-...` for OpenAI)

Fairvisor strips the composite header, injects the correct provider auth before forwarding,
and **never returns upstream auth headers to the caller**
(see `../../fixtures/allow_response.json`).

**Provider-prefixed paths:**

| Path prefix | Upstream | Auth header injected |
|---|---|---|
| `/openai/v1/...` | `https://api.openai.com/v1/...` | `Authorization: Bearer UPSTREAM_KEY` |
| `/anthropic/v1/...` | `https://api.anthropic.com/v1/...` | `x-api-key: UPSTREAM_KEY` |
| `/gemini/v1beta/...` | `https://generativelanguage.googleapis.com/v1beta/...` | `x-goog-api-key: UPSTREAM_KEY` |

To run in wrapper mode, change the compose env to `FAIRVISOR_MODE: wrapper` and
supply real credentials in the `Authorization` header.
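The composite token format implies a simple parsing rule. The sketch below is an assumption based on the format shown above, not confirmed Fairvisor behaviour, and the function name is hypothetical: splitting on the *first* colon is safe because a JWT is dot-separated base64url and never contains a colon, while an upstream key might.

```python
# Hypothetical sketch: splitting "Bearer CLIENT_JWT:UPSTREAM_KEY" into its parts.
def split_composite_bearer(authorization: str) -> tuple[str, str]:
    scheme, _, token = authorization.partition(" ")
    if scheme != "Bearer" or ":" not in token:
        raise ValueError("expected 'Bearer CLIENT_JWT:UPSTREAM_KEY'")
    # Split on the FIRST colon only: JWTs cannot contain ':', keys might.
    client_jwt, _, upstream_key = token.partition(":")
    return client_jwt, upstream_key

jwt, key = split_composite_bearer("Bearer eyJhbGciOiJIUzI1NiJ9.e30.sig:sk-example-key")
```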

## Teardown

```bash
docker compose down
```

## Next steps

- See `../recipes/` for team budgets, runaway agent guard, and circuit-breaker (cost spike auto-shutdown) scenarios
- See `../../fixtures/` for all sample request/response artifacts
- See [fairvisor/benchmark](https://github.com/fairvisor/benchmark) for performance benchmarks
- See [docs/install/](../../docs/install/) for Kubernetes, VM, and SaaS deployment options
58 changes: 58 additions & 0 deletions examples/quickstart/docker-compose.yml
@@ -0,0 +1,58 @@
# Fairvisor Edge — Quickstart stack (standalone + reverse proxy mode)
#
# Usage:
# docker compose up -d
# curl -s http://localhost:8080/readyz # health check
# curl -s -X POST http://localhost:8080/v1/chat/completions \
# -H "Content-Type: application/json" \
# -d @../../fixtures/normal_request.json # expect 200
# curl -s -X POST http://localhost:8080/v1/chat/completions \
# -H "Content-Type: application/json" \
# -d @../../fixtures/over_limit_request.json # expect 429
#
# This stack runs in FAIRVISOR_MODE=reverse_proxy — requests to /v1/* are
# enforced by policy then forwarded to the local mock LLM backend.
# No real API keys required.
#
# Wrapper mode (routing by provider prefix, real upstream keys) is documented
# in README.md under "Wrapper mode". It requires real provider credentials and
# cannot be demonstrated with this mock stack.
#
# This file is also the base for the e2e-smoke CI check.
# CI extends it via tests/e2e/docker-compose.test.yml; do not diverge the
# service name, port, or volume contract without updating CI as well.

services:
edge:
image: ghcr.io/fairvisor/fairvisor-edge:latest
ports:
- "8080:8080"
environment:
FAIRVISOR_CONFIG_FILE: /etc/fairvisor/policy.json
FAIRVISOR_MODE: reverse_proxy
FAIRVISOR_BACKEND_URL: http://mock_llm:80
FAIRVISOR_SHARED_DICT_SIZE: 32m
FAIRVISOR_LOG_LEVEL: info
FAIRVISOR_WORKER_PROCESSES: "1"
volumes:
- ./policy.json:/etc/fairvisor/policy.json:ro
depends_on:
mock_llm:
condition: service_healthy
healthcheck:
test: ["CMD", "curl", "-sf", "http://127.0.0.1:8080/readyz"]
interval: 2s
timeout: 2s
retries: 15
start_period: 5s

mock_llm:
image: nginx:1.27-alpine
volumes:
- ./mock-llm.conf:/etc/nginx/nginx.conf:ro
healthcheck:
test: ["CMD", "wget", "-q", "-O", "-", "http://127.0.0.1:80/"]
interval: 2s
timeout: 2s
retries: 10
start_period: 5s
10 changes: 10 additions & 0 deletions examples/quickstart/mock-llm.conf
@@ -0,0 +1,10 @@
events {}
http {
server {
listen 80;
location / {
default_type application/json;
return 200 '{"id":"chatcmpl-qs","object":"chat.completion","choices":[{"index":0,"message":{"role":"assistant","content":"Hello from the mock backend!"},"finish_reason":"stop"}],"usage":{"prompt_tokens":10,"completion_tokens":8,"total_tokens":18}}';
}
}
}
31 changes: 31 additions & 0 deletions examples/quickstart/policy.json
@@ -0,0 +1,31 @@
{
"bundle_version": 1,
"issued_at": "2026-01-01T00:00:00Z",
"expires_at": "2030-01-01T00:00:00Z",
"policies": [
{
"id": "quickstart-tpm-policy",
"spec": {
"selector": {
"pathPrefix": "/v1/",
"methods": ["POST"]
},
"mode": "enforce",
"rules": [
{
"name": "tpm-limit",
"limit_keys": ["ip:address"],
"algorithm": "token_bucket_llm",
"algorithm_config": {
"tokens_per_minute": 100,
"tokens_per_day": 1000,
"burst_tokens": 100,
"default_max_completion": 50
}
}
]
}
}
],
"kill_switches": []
}
43 changes: 43 additions & 0 deletions examples/recipes/circuit-breaker/README.md
@@ -0,0 +1,43 @@
# Recipe: Circuit Breaker — Cost Spike Auto-Shutdown

Automatically block all LLM traffic when the aggregate token spend rate
exceeds a budget threshold, then self-reset after a cooldown period.

## How it works

- Normal traffic: per-org TPM limit enforced (`100000` tokens/min)
- Spike detection: if the rolling spend rate hits `500000` tokens/min,
  the circuit breaker opens and **all requests return `429`** with
  `X-Fairvisor-Reason: circuit_breaker_open`
- Auto-reset: after 10 minutes without breaker-triggering load, the
circuit resets automatically — no manual intervention needed
- `alert: true` logs the trip event to the Fairvisor audit log
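
Under those semantics, the breaker's behaviour can be modelled in a few lines. This is a behavioural sketch with hypothetical names, not the Fairvisor implementation; it assumes a one-minute spend window and that the cooldown clock starts when the breaker opens.

```python
# Behavioural model of the cost-spike circuit breaker (illustrative only).
class CircuitBreaker:
    def __init__(self, threshold_per_minute, auto_reset_after_minutes):
        self.threshold = threshold_per_minute
        self.cooldown = auto_reset_after_minutes * 60  # seconds
        self.opened_at = None
        self.window_start = 0.0
        self.window_spend = 0

    def record(self, tokens, now):
        # Roll the one-minute spend window, then add this request's tokens.
        if now - self.window_start >= 60:
            self.window_start, self.window_spend = now, 0
        self.window_spend += tokens
        if self.window_spend >= self.threshold:
            self.opened_at = now  # breaker opens: all requests get 429

    def is_open(self, now):
        if self.opened_at is None:
            return False
        if now - self.opened_at >= self.cooldown:
            self.opened_at = None  # auto-reset after the quiet cooldown
            return False
        return True

cb = CircuitBreaker(threshold_per_minute=500_000, auto_reset_after_minutes=10)
cb.record(600_000, now=0)        # spend spike trips the breaker
assert cb.is_open(now=1)         # traffic rejected while open
assert not cb.is_open(now=601)   # 10-minute cooldown elapsed: reset
```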

## Deploy

```bash
cp policy.json /etc/fairvisor/policy.json
```

## Expected behaviour

```bash
# Normal request — passes
curl -s -o /dev/null -w "%{http_code}" \
-H "Authorization: Bearer <jwt>:<upstream-key>" \
http://localhost:8080/v1/chat/completions \
-d '{"model":"gpt-4o","messages":[{"role":"user","content":"hi"}]}'
# → 200

# After spend spike trips the breaker:
# → 429 X-Fairvisor-Reason: circuit_breaker_open
# Retry-After: 600
```

## Tuning

| Field | Description |
|---|---|
| `spend_rate_threshold_per_minute` | Tokens/min rolling spend that opens the breaker |
| `auto_reset_after_minutes` | Cooldown before automatic reset (0 = manual only) |
| `tokens_per_minute` | Per-org steady-state limit (independent of breaker) |
37 changes: 37 additions & 0 deletions examples/recipes/circuit-breaker/policy.json
@@ -0,0 +1,37 @@
{
"bundle_version": 1,
"issued_at": "2026-01-01T00:00:00Z",
"expires_at": "2030-01-01T00:00:00Z",
"policies": [
{
"id": "cost-spike-guard",
"spec": {
"selector": {
"pathPrefix": "/v1/",
"methods": ["POST"]
},
"mode": "enforce",
"rules": [
{
"name": "per-org-tpm",
"limit_keys": ["jwt:org_id"],
"algorithm": "token_bucket_llm",
"algorithm_config": {
"tokens_per_minute": 100000,
"burst_tokens": 100000,
"default_max_completion": 2048
}
}
],
"circuit_breaker": {
"enabled": true,
"spend_rate_threshold_per_minute": 500000,
"action": "reject",
"alert": true,
"auto_reset_after_minutes": 10
}
}
}
],
"kill_switches": []
}
50 changes: 50 additions & 0 deletions examples/recipes/runaway-agent-guard/README.md
@@ -0,0 +1,50 @@
# Recipe: Runaway Agent Guard

Stop runaway agentic workflows before they exhaust your token budget or
billing limit.

## Problem

Autonomous agents (LangChain, AutoGPT, custom loops) can enter retry storms
or infinite planning loops. Without enforcement, a single runaway agent
can consume thousands of dollars of API budget in minutes.

## How it works

Two rules cooperate:

1. **Loop detector** — counts requests per `agent_id` in a sliding window.
If the agent fires more than 30 requests in 60 seconds, it trips a
120-second cooldown. This catches tight retry loops.

2. **TPM guard** — caps tokens per minute per agent. A burst-heavy agent
that passes the loop check still cannot drain the token pool.
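
The loop detector (rule 1) can be sketched as a sliding-window counter. This is a behavioural model with hypothetical names, not the Fairvisor Lua code; it assumes one detector instance per `agent_id` key.

```python
# Behavioural sketch of the per-agent loop detector (illustrative only).
from collections import deque

class LoopDetector:
    def __init__(self, max_requests=30, window_s=60, cooldown_s=120):
        self.max_requests = max_requests
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.hits = deque()       # timestamps of recent requests
        self.blocked_until = 0.0

    def allow(self, now):
        if now < self.blocked_until:
            return False          # still in cooldown -> 429
        while self.hits and now - self.hits[0] > self.window_s:
            self.hits.popleft()   # drop hits outside the sliding window
        self.hits.append(now)
        if len(self.hits) > self.max_requests:
            self.blocked_until = now + self.cooldown_s
            return False          # tight retry loop detected
        return True

detector = LoopDetector()
# 40 requests in 4 seconds: the first 30 pass, the rest hit the cooldown.
results = [detector.allow(t * 0.1) for t in range(40)]
assert all(results[:30]) and not any(results[30:])
```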

## Deploy

```bash
cp policy.json /etc/fairvisor/policy.json
```

## JWT shape expected

```json
{
"sub": "user-456",
"agent_id": "autoagent-prod-7",
"exp": 9999999999
}
```

## Kill switch for incidents

If an agent causes an incident, flip a kill switch without restarting edge:

```bash
# Via CLI
fairvisor kill-switch enable agent-id=autoagent-prod-7

# Or update the policy bundle with a kill_switch entry and hot-reload
```

See `docs/cookbook/kill-switch-incident-response.md` for the full incident playbook.