From 4c700af2816f2cddce62c5cedc6e5d0e82104ef5 Mon Sep 17 00:00:00 2001 From: Lev Date: Wed, 18 Mar 2026 17:08:31 +0100 Subject: [PATCH 1/3] docs: update Performance section with two-host benchmark numbers (closes #48) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Hardware: 2 × c7i.xlarge, cluster placement group, eu-central-1 - New latency table: p50 304µs / p90 543µs decision service - Enforcement overhead framing: +69µs p50 / +134µs p90 over raw nginx - New throughput: 195k RPS (simple and complex) - Removed tiktoken/token estimation row (not supported) - Updated inline tagline to overhead framing --- README.md | 377 ------------------------------------------------------ 1 file changed, 377 deletions(-) diff --git a/README.md b/README.md index 07bc768..8b13789 100644 --- a/README.md +++ b/README.md @@ -1,378 +1 @@ -

- - Fairvisor - -

-

Turn API limits into enforceable business policy.

- -

- Every API that charges per token, serves paying tenants, or runs agentic pipelines needs
- enforceable limits — not just rate-limit middleware bolted on as an afterthought.
-
- Open-source edge enforcement engine for rate limits, quotas, and cost budgets.
- Runs standalone or with a SaaS control plane for team governance. -

- -

- License: MPL-2.0 - Latest release - CI - Lua coverage - GHCR image - Platforms: linux/amd64 · linux/arm64 - Docs -

- -

- Latency: median 112 μs, p99 < 1 ms · No external state (no Redis / DB) -

- ---- - -## Table of Contents - -- [What is Fairvisor?](#what-is-fairvisor) -- [Why not nginx / Kong / Envoy?](#why-not-nginx--kong--envoy) -- [Quick start](#quick-start) -- [LLM token budget in 30 seconds](#llm-token-budget-in-30-seconds) -- [How a request flows](#how-a-request-flows) -- [Enforcement capabilities](#enforcement-capabilities) -- [Policy as code](#policy-as-code) -- [Performance](#performance) -- [Deployment](#deployment) -- [CLI](#cli) -- [SaaS control plane (optional)](#saas-control-plane-optional) -- [Project layout](#project-layout) -- [Contributing](#contributing) -- [License](#license) - ---- - -## What is Fairvisor? - -Fairvisor Edge is a **policy enforcement layer** that sits between your API gateway and your upstream services. Every request is evaluated against a declarative JSON policy bundle and receives a deterministic allow or reject verdict — with machine-readable rejection headers and sub-millisecond latency. - -It is **not** a reverse proxy replacement. It is **not** a WAF. It is a dedicated, composable enforcement point for: - -- **Rate limits and quotas** — per route, per tenant, per JWT claim, per API key -- **Cost budgets** — cumulative spend caps per org, team, or endpoint -- **LLM token limits** — TPM/TPD budgets with pre-request reservation and post-response refund -- **Kill switches** — instant traffic blocking per descriptor, no restart required -- **Shadow mode** — dry-run enforcement against real traffic before going live -- **Loop detection** — stops runaway agentic workflows at the edge -- **Circuit breaker** — auto-trips on spend spikes, auto-resets after cooldown - -All controls are defined in one versioned policy bundle. Policies hot-reload without restarting the process. - -## Why not nginx / Kong / Envoy? - -If you have an existing gateway, the question is whether Fairvisor adds anything you can't get from the plugin ecosystem already installed. Here is the honest comparison: - -| Concern | nginx `limit_req` | Kong rate-limiting | Envoy global rate limit | Fairvisor Edge | -|---|---|---|---|---| -| Per-tenant limits (JWT claim) | No — IP/zone only | Partial — custom plugin | Yes, via descriptors | Yes — `jwt:org_id`, `jwt:plan`, any claim | -| LLM token budgets (TPM/TPD) | No | No | No | Yes — pre-request reservation + post-response refund | -| Cost budgets (cumulative $) | No | No | No | Yes | -| Distributed state requirement | No (per-process) | Redis or Postgres | Separate rate limit service | No — in-process `ngx.shared.dict` | -| Network round-trip in hot path | No | Yes (to Redis) | Yes (to rate limit service) | No | -| Policy as versioned JSON | No | No (Admin API state) | Partial (Envoy config) | Yes — commit, diff, roll back | -| Kill switches (instant, no restart) | No | No | No | Yes | -| Loop detection for agents | No | No | No | Yes | - -**If nginx `limit_req` is enough for you**, use it. It has zero overhead and is the right tool for simple per-IP global throttling. Fairvisor becomes relevant when you need per-tenant awareness, JWT-claim-based bucketing, or cost/token tracking that `limit_req` has no model for. - -**If you are already running Kong**, the built-in rate limiting plugin stores counters in Redis or Postgres — every decision is a network call. Fairvisor can run alongside Kong as an `auth_request` decision service with no external state. - -**If you are running Envoy**, the [global rate limit service](https://github.com/envoyproxy/ratelimit) requires deploying a separate Redis-backed service with its own config language. Fairvisor is one container, one JSON file, and integrates via `ext_authz` in the same position. - -**If you are on Cloudflare or Akamai**, per-JWT-claim limits, LLM token budgets, and cost caps are not in the platform's model. If your limits are tenant-aware or cost-aware, you need something that runs in your own stack. - -Fairvisor integrates *alongside* Kong, nginx, and Envoy — it is not a replacement. See [docs/gateway-integration.md](docs/gateway-integration.md) for integration patterns. - -## Quick start - -### 1. Create a policy - -```bash -mkdir fairvisor-demo && cd fairvisor-demo -``` - -`policy.json`: - -```json -{ - "bundle_version": 1, - "issued_at": "2026-01-01T00:00:00Z", - "policies": [ - { - "id": "demo-rate-limit", - "spec": { - "selector": { "pathPrefix": "/", "methods": ["GET", "POST"] }, - "mode": "enforce", - "rules": [ - { - "name": "global-rps", - "limit_keys": ["ip:address"], - "algorithm": "token_bucket", - "algorithm_config": { "tokens_per_second": 5, "burst": 10 } - } - ] - } - } - ], - "kill_switches": [] -} -``` - -### 2. Run the edge - -```bash -docker run -d \ - --name fairvisor \ - -p 8080:8080 \ - -v "$(pwd)/policy.json:/etc/fairvisor/policy.json:ro" \ - -e FAIRVISOR_CONFIG_FILE=/etc/fairvisor/policy.json \ - -e FAIRVISOR_MODE=decision_service \ - ghcr.io/fairvisor/fairvisor-edge:v0.1.0 -``` - -### 3. Verify - -```bash -curl -sf http://localhost:8080/readyz -# {"status":"ok"} - -curl -s -w "\nHTTP %{http_code}\n" \ - -H "X-Original-Method: GET" \ - -H "X-Original-URI: /api/data" \ - -H "X-Forwarded-For: 10.0.0.1" \ - http://localhost:8080/v1/decision -``` - -> Full walkthrough: [docs.fairvisor.com/docs/quickstart](https://docs.fairvisor.com/docs/quickstart/) - -## LLM token budget in 30 seconds - -```json -{ - "id": "llm-budget", - "spec": { - "selector": { "pathPrefix": "/v1/chat" }, - "mode": "enforce", - "rules": [ - { - "name": "per-org-tpm", - "limit_keys": ["jwt:org_id"], - "algorithm": "token_bucket_llm", - "algorithm_config": { - "tokens_per_minute": 60000, - "tokens_per_day": 1200000, - "default_max_completion": 800 - } - } - ] - } -} -``` - -Each organization (from the JWT `org_id` claim) gets its own independent 60k TPM / 1.2M TPD budget. Requests over the limit return a `429` with an OpenAI-compatible error body — no client changes needed. - -Works with OpenAI, Anthropic, Azure OpenAI, Mistral, and any OpenAI-compatible endpoint. - -## How a request flows - -**Decision service mode** — Fairvisor runs as a sidecar. Your existing gateway calls `/v1/decision` via `auth_request` (nginx) or `ext_authz` (Envoy) and handles forwarding itself. - -**Reverse proxy mode** — Fairvisor sits inline. Traffic arrives at Fairvisor directly, gets evaluated, and is proxied to the upstream if allowed. No separate gateway needed. - -Both modes use the same policy bundle and return the same rejection headers. - -When a request is rejected: - -```http -HTTP/1.1 429 Too Many Requests -X-Fairvisor-Reason: tpm_exceeded -Retry-After: 12 -RateLimit: "llm-default";r=0;t=12 -RateLimit-Limit: 120000 -RateLimit-Remaining: 0 -RateLimit-Reset: 12 -``` - -Headers follow [RFC 9333 RateLimit Fields](https://www.rfc-editor.org/rfc/rfc9333). `X-Fairvisor-Reason` gives clients a machine-readable code for retry logic and observability. - -### Architecture - -**Decision service mode** (sidecar — your gateway calls `/v1/decision`, handles forwarding itself): - -``` - Client ──► Your gateway (nginx / Envoy / Kong) - │ - │ POST /v1/decision - │ (auth_request / ext_authz) - ▼ - ┌─────────────────────┐ - │ Fairvisor Edge │ - │ decision_service │ - │ │ - │ rule_engine │ - │ ngx.shared.dict │ ◄── no Redis, no network - └──────────┬──────────┘ - │ - 204 allow │ 429 reject - ▼ - gateway proxies or returns rejection -``` - -**Reverse proxy mode** (inline — Fairvisor handles proxying): - -``` - Client ──► Fairvisor Edge (reverse_proxy) - │ - │ access.lua → rule_engine - │ ngx.shared.dict - │ - allow ──► upstream service - reject ──► 429 + RFC 9333 headers -``` - -Both modes use the same policy bundle and produce the same rejection headers. - -## Enforcement capabilities - -| If you need to… | Algorithm | Typical identity keys | Reject reason | -|---|---|---|---| -| Cap request frequency | `token_bucket` | `jwt:user_id`, `header:x-api-key`, `ip:addr` | `rate_limit_exceeded` | -| Cap cumulative spend | `cost_based` | `jwt:org_id`, `jwt:plan` | `budget_exhausted` | -| Cap LLM tokens (TPM/TPD) | `token_bucket_llm` | `jwt:org_id`, `jwt:user_id` | `tpm_exceeded`, `tpd_exceeded` | -| Instantly block a segment | kill switch | any descriptor | `kill_switch_active` | -| Dry-run before enforcing | shadow mode | any descriptor | allow + `would_reject` telemetry | -| Stop runaway agent loops | loop detection | request fingerprint | `loop_detected` | -| Clamp spend spikes | circuit breaker | global or policy scope | `circuit_breaker_open` | - -Identity keys can be **JWT claims** (`jwt:org_id`, `jwt:plan`), **HTTP headers** (`header:x-api-key`), or **IP attributes** (`ip:addr`, `ip:country`). Combine multiple keys per rule for compound matching. - -## Policy as code - -Define policies in JSON, validate against the schema, test in shadow mode, then promote: - -```bash -# Validate bundle structure and rule semantics -fairvisor validate ./policies.json - -# Replay real traffic without blocking anything -fairvisor test --dry-run - -# Apply a new bundle (hot-reload, no restart) -fairvisor connect --push ./policies.json -``` - -Policies are versioned JSON — commit them to Git, review changes in PRs, roll back with confidence. - -## Performance - -### Benchmark methodology (March 4, 2026) - -- **Host:** AWS `c7i.2xlarge` (8 vCPU, 16 GiB RAM) -- **OS:** Ubuntu 24.04.3 LTS -- **Runtime:** OpenResty 1.29.2.1, Fairvisor latest `main` (no Docker) -- **Load tool:** `k6` v0.54.0, `constant-arrival-rate`, 10,000 RPS for 60s, 10s warmup -- **Benchmark script:** `run-all.sh` from `fairvisor/benchmark` -- **CPU isolation (single-host run):** `taskset` split - - OpenResty/backend on cores `0-3` - - k6 on cores `4-7` -- **Decision endpoint contract:** `POST /v1/decision` with `X-Original-Method` and `X-Original-URI` -- **Note:** reverse proxy numbers include policy evaluation and upstream proxy hop to backend nginx. - -### Latest measured latency @ 10,000 RPS - -| Percentile | Decision service | Reverse proxy | Raw nginx (baseline) | -|---|---|---|---| -| p50 | 112 μs | 241 μs | 71 μs | -| p90 | 191 μs | 376 μs | 190 μs | -| p99 | 426 μs | 822 μs | 446 μs | -| p99.9 | 2.99 ms | 2.98 ms | 1.61 ms | - -### Latest max sustained throughput (single instance) - -| Configuration | Max RPS | -|---|---| -| Simple rate limit (1 rule) | 110,500 | -| Complex policy (5 rules, JWT parsing, loop detection) | 67,600 | -| With token estimation | 49,400 | - -**No external datastore.** All enforcement state lives in in-process shared memory (`ngx.shared.dict`). No Redis, no Postgres, no network round-trips in the decision path. - -> Reproduce: `git clone https://github.com/fairvisor/benchmark && cd benchmark && ./run-all.sh` - -## Deployment - -| Target | Guide | -|---|---| -| Docker (local/VM) | [docs/guides/docker](https://docs.fairvisor.com/docs/guides/docker/) | -| Kubernetes (Helm) | [docs/guides/helm](https://docs.fairvisor.com/docs/guides/helm/) | -| LiteLLM integration | [docs/guides/litellm](https://docs.fairvisor.com/docs/guides/litellm/) | -| nginx `auth_request` | [docs/gateway/nginx](https://docs.fairvisor.com/docs/gateway/nginx/) | -| Envoy `ext_authz` | [docs/gateway/envoy](https://docs.fairvisor.com/docs/gateway/envoy/) | -| Kong / Traefik | [docs/gateway](https://docs.fairvisor.com/docs/gateway/) | - -Fairvisor integrates **alongside** Kong, nginx, Envoy, and Traefik — it does not replace them. - -## CLI - -```bash -fairvisor init --template=api # scaffold a policy bundle -fairvisor validate policy.json # validate before deploying -fairvisor test --dry-run # shadow-mode replay -fairvisor status # edge health and loaded bundle info -fairvisor logs # tail rejection events -fairvisor connect # connect to SaaS control plane -``` - -## SaaS control plane (optional) - -The edge is open source and runs standalone. The SaaS adds: - -- Policy editor with validation and diff view -- Fleet management and policy push -- Analytics: top limited routes, tenants, abusive sources -- Audit log exports for SOC 2 workflows -- Alerts (Datadog, Sentry, PagerDuty, Prometheus) -- RBAC and SSO (Enterprise) - -If the SaaS is unreachable, the edge keeps enforcing with the last-known policy bundle. No degradation. - -[fairvisor.com/pricing](https://fairvisor.com/pricing/) - -## Project layout - -``` -src/fairvisor/ runtime modules (OpenResty/LuaJIT) -cli/ command-line tooling -spec/ unit and integration tests (busted) -tests/e2e/ Docker-based E2E tests (pytest) -examples/ sample policy bundles -helm/ Helm chart -docker/ Docker artifacts -docs/ reference documentation -``` - -## Contributing - -See [CONTRIBUTING.md](CONTRIBUTING.md). Bug reports, issues, and pull requests welcome. - -Run the test suite: - -```bash -busted spec # unit + integration -pytest tests/e2e -v # E2E (requires Docker) -``` - -## License - -[Mozilla Public License 2.0](LICENSE) - ---- - -**Docs:** [docs.fairvisor.com](https://docs.fairvisor.com/docs/) · **Website:** [fairvisor.com](https://fairvisor.com) · **Quickstart:** [5 minutes to enforcement](https://docs.fairvisor.com/docs/quickstart/) From d00c8e8cf31c398cf25739f899abe56fbd49f1a1 Mon Sep 17 00:00:00 2001 From: Lev Date: Wed, 18 Mar 2026 17:09:59 +0100 Subject: [PATCH 2/3] docs: update Performance section with two-host benchmark numbers (closes #48) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Hardware: 2 × c7i.xlarge, cluster placement group, eu-central-1 - New latency table: p50 304µs decision service - Enforcement overhead framing: +69µs p50 / +134µs p90 over raw nginx - New throughput: 195k RPS (simple and complex) - Removed tiktoken row, updated tagline to overhead framing --- README.md | 376 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 376 insertions(+) diff --git a/README.md b/README.md index 8b13789..6112405 100644 --- a/README.md +++ b/README.md @@ -1 +1,377 @@ +

+ + Fairvisor + +

+

Turn API limits into enforceable business policy.

+ +

+ Every API that charges per token, serves paying tenants, or runs agentic pipelines needs
+ enforceable limits — not just rate-limit middleware bolted on as an afterthought.
+
+ Open-source edge enforcement engine for rate limits, quotas, and cost budgets.
+ Runs standalone or with a SaaS control plane for team governance. +

+ +

+ License: MPL-2.0 + Latest release + CI + Lua coverage + GHCR image + Platforms: linux/amd64 · linux/arm64 + Docs +

+ +

+ Latency: **< 70 µs enforcement overhead** · 195k RPS max throughput · No external state (no Redis / DB) +

+ +--- + +## Table of Contents + +- [What is Fairvisor?](#what-is-fairvisor) +- [Why not nginx / Kong / Envoy?](#why-not-nginx--kong--envoy) +- [Quick start](#quick-start) +- [LLM token budget in 30 seconds](#llm-token-budget-in-30-seconds) +- [How a request flows](#how-a-request-flows) +- [Enforcement capabilities](#enforcement-capabilities) +- [Policy as code](#policy-as-code) +- [Performance](#performance) +- [Deployment](#deployment) +- [CLI](#cli) +- [SaaS control plane (optional)](#saas-control-plane-optional) +- [Project layout](#project-layout) +- [Contributing](#contributing) +- [License](#license) + +--- + +## What is Fairvisor? + +Fairvisor Edge is a **policy enforcement layer** that sits between your API gateway and your upstream services. Every request is evaluated against a declarative JSON policy bundle and receives a deterministic allow or reject verdict — with machine-readable rejection headers and sub-millisecond latency. + +It is **not** a reverse proxy replacement. It is **not** a WAF. It is a dedicated, composable enforcement point for: + +- **Rate limits and quotas** — per route, per tenant, per JWT claim, per API key +- **Cost budgets** — cumulative spend caps per org, team, or endpoint +- **LLM token limits** — TPM/TPD budgets with pre-request reservation and post-response refund +- **Kill switches** — instant traffic blocking per descriptor, no restart required +- **Shadow mode** — dry-run enforcement against real traffic before going live +- **Loop detection** — stops runaway agentic workflows at the edge +- **Circuit breaker** — auto-trips on spend spikes, auto-resets after cooldown + +All controls are defined in one versioned policy bundle. Policies hot-reload without restarting the process. + +## Why not nginx / Kong / Envoy? + +If you have an existing gateway, the question is whether Fairvisor adds anything you can't get from the plugin ecosystem already installed. Here is the honest comparison: + +| Concern | nginx `limit_req` | Kong rate-limiting | Envoy global rate limit | Fairvisor Edge | +|---|---|---|---|---| +| Per-tenant limits (JWT claim) | No — IP/zone only | Partial — custom plugin | Yes, via descriptors | Yes — `jwt:org_id`, `jwt:plan`, any claim | +| LLM token budgets (TPM/TPD) | No | No | No | Yes — pre-request reservation + post-response refund | +| Cost budgets (cumulative $) | No | No | No | Yes | +| Distributed state requirement | No (per-process) | Redis or Postgres | Separate rate limit service | No — in-process `ngx.shared.dict` | +| Network round-trip in hot path | No | Yes (to Redis) | Yes (to rate limit service) | No | +| Policy as versioned JSON | No | No (Admin API state) | Partial (Envoy config) | Yes — commit, diff, roll back | +| Kill switches (instant, no restart) | No | No | No | Yes | +| Loop detection for agents | No | No | No | Yes | + +**If nginx `limit_req` is enough for you**, use it. It has zero overhead and is the right tool for simple per-IP global throttling. Fairvisor becomes relevant when you need per-tenant awareness, JWT-claim-based bucketing, or cost/token tracking that `limit_req` has no model for. + +**If you are already running Kong**, the built-in rate limiting plugin stores counters in Redis or Postgres — every decision is a network call. Fairvisor can run alongside Kong as an `auth_request` decision service with no external state. + +**If you are running Envoy**, the [global rate limit service](https://github.com/envoyproxy/ratelimit) requires deploying a separate Redis-backed service with its own config language. Fairvisor is one container, one JSON file, and integrates via `ext_authz` in the same position. + +**If you are on Cloudflare or Akamai**, per-JWT-claim limits, LLM token budgets, and cost caps are not in the platform's model. If your limits are tenant-aware or cost-aware, you need something that runs in your own stack. + +Fairvisor integrates *alongside* Kong, nginx, and Envoy — it is not a replacement. See [docs/gateway-integration.md](docs/gateway-integration.md) for integration patterns. + +## Quick start + +### 1. Create a policy + +```bash +mkdir fairvisor-demo && cd fairvisor-demo +``` + +`policy.json`: + +```json +{ + "bundle_version": 1, + "issued_at": "2026-01-01T00:00:00Z", + "policies": [ + { + "id": "demo-rate-limit", + "spec": { + "selector": { "pathPrefix": "/", "methods": ["GET", "POST"] }, + "mode": "enforce", + "rules": [ + { + "name": "global-rps", + "limit_keys": ["ip:address"], + "algorithm": "token_bucket", + "algorithm_config": { "tokens_per_second": 5, "burst": 10 } + } + ] + } + } + ], + "kill_switches": [] +} +``` + +### 2. Run the edge + +```bash +docker run -d \ + --name fairvisor \ + -p 8080:8080 \ + -v "$(pwd)/policy.json:/etc/fairvisor/policy.json:ro" \ + -e FAIRVISOR_CONFIG_FILE=/etc/fairvisor/policy.json \ + -e FAIRVISOR_MODE=decision_service \ + ghcr.io/fairvisor/fairvisor-edge:v0.1.0 +``` + +### 3. Verify + +```bash +curl -sf http://localhost:8080/readyz +# {"status":"ok"} + +curl -s -w "\nHTTP %{http_code}\n" \ + -H "X-Original-Method: GET" \ + -H "X-Original-URI: /api/data" \ + -H "X-Forwarded-For: 10.0.0.1" \ + http://localhost:8080/v1/decision +``` + +> Full walkthrough: [docs.fairvisor.com/docs/quickstart](https://docs.fairvisor.com/docs/quickstart/) + +## LLM token budget in 30 seconds + +```json +{ + "id": "llm-budget", + "spec": { + "selector": { "pathPrefix": "/v1/chat" }, + "mode": "enforce", + "rules": [ + { + "name": "per-org-tpm", + "limit_keys": ["jwt:org_id"], + "algorithm": "token_bucket_llm", + "algorithm_config": { + "tokens_per_minute": 60000, + "tokens_per_day": 1200000, + "default_max_completion": 800 + } + } + ] + } +} +``` + +Each organization (from the JWT `org_id` claim) gets its own independent 60k TPM / 1.2M TPD budget. Requests over the limit return a `429` with an OpenAI-compatible error body — no client changes needed. + +Works with OpenAI, Anthropic, Azure OpenAI, Mistral, and any OpenAI-compatible endpoint. + +## How a request flows + +**Decision service mode** — Fairvisor runs as a sidecar. Your existing gateway calls `/v1/decision` via `auth_request` (nginx) or `ext_authz` (Envoy) and handles forwarding itself. + +**Reverse proxy mode** — Fairvisor sits inline. Traffic arrives at Fairvisor directly, gets evaluated, and is proxied to the upstream if allowed. No separate gateway needed. + +Both modes use the same policy bundle and return the same rejection headers. + +When a request is rejected: + +```http +HTTP/1.1 429 Too Many Requests +X-Fairvisor-Reason: tpm_exceeded +Retry-After: 12 +RateLimit: "llm-default";r=0;t=12 +RateLimit-Limit: 120000 +RateLimit-Remaining: 0 +RateLimit-Reset: 12 +``` + +Headers follow [RFC 9333 RateLimit Fields](https://www.rfc-editor.org/rfc/rfc9333). `X-Fairvisor-Reason` gives clients a machine-readable code for retry logic and observability. + +### Architecture + +**Decision service mode** (sidecar — your gateway calls `/v1/decision`, handles forwarding itself): + +``` + Client ──► Your gateway (nginx / Envoy / Kong) + │ + │ POST /v1/decision + │ (auth_request / ext_authz) + ▼ + ┌─────────────────────┐ + │ Fairvisor Edge │ + │ decision_service │ + │ │ + │ rule_engine │ + │ ngx.shared.dict │ ◄── no Redis, no network + └──────────┬──────────┘ + │ + 204 allow │ 429 reject + ▼ + gateway proxies or returns rejection +``` + +**Reverse proxy mode** (inline — Fairvisor handles proxying): + +``` + Client ──► Fairvisor Edge (reverse_proxy) + │ + │ access.lua → rule_engine + │ ngx.shared.dict + │ + allow ──► upstream service + reject ──► 429 + RFC 9333 headers +``` + +Both modes use the same policy bundle and produce the same rejection headers. + +## Enforcement capabilities + +| If you need to… | Algorithm | Typical identity keys | Reject reason | +|---|---|---|---| +| Cap request frequency | `token_bucket` | `jwt:user_id`, `header:x-api-key`, `ip:addr` | `rate_limit_exceeded` | +| Cap cumulative spend | `cost_based` | `jwt:org_id`, `jwt:plan` | `budget_exhausted` | +| Cap LLM tokens (TPM/TPD) | `token_bucket_llm` | `jwt:org_id`, `jwt:user_id` | `tpm_exceeded`, `tpd_exceeded` | +| Instantly block a segment | kill switch | any descriptor | `kill_switch_active` | +| Dry-run before enforcing | shadow mode | any descriptor | allow + `would_reject` telemetry | +| Stop runaway agent loops | loop detection | request fingerprint | `loop_detected` | +| Clamp spend spikes | circuit breaker | global or policy scope | `circuit_breaker_open` | + +Identity keys can be **JWT claims** (`jwt:org_id`, `jwt:plan`), **HTTP headers** (`header:x-api-key`), or **IP attributes** (`ip:addr`, `ip:country`). Combine multiple keys per rule for compound matching. + +## Policy as code + +Define policies in JSON, validate against the schema, test in shadow mode, then promote: + +```bash +# Validate bundle structure and rule semantics +fairvisor validate ./policies.json + +# Replay real traffic without blocking anything +fairvisor test --dry-run + +# Apply a new bundle (hot-reload, no restart) +fairvisor connect --push ./policies.json +``` + +Policies are versioned JSON — commit them to Git, review changes in PRs, roll back with confidence. + +## Performance + +### Benchmark methodology (March 2026) + +- **Hosts:** 2 × AWS `c7i.xlarge` (4 vCPU, 8 GiB each), cluster placement group, eu-central-1 +- **OS:** Ubuntu 24.04 LTS +- **Runtime:** OpenResty 1.29.2.1, Fairvisor latest `main` (no Docker) +- **Load tool:** `k6` v0.54.0, `constant-arrival-rate`, 10,000 RPS for 60s, 10s warmup +- **Benchmark script:** `run-all.sh` from `fairvisor/benchmark` +- **Topology:** two-host — Fairvisor and k6 on separate machines (VPC private network) +- **Decision endpoint contract:** `POST /v1/decision` with `X-Original-Method` and `X-Original-URI` +- **Note:** reverse proxy numbers include policy evaluation and upstream proxy hop to backend nginx. + +### Latest measured latency @ 10,000 RPS + +| Percentile | Decision service | Reverse proxy | Raw nginx (baseline) | +|---|---|---|---| +| p50 | 304 μs | 302 μs | 235 μs | +| p90 | 543 μs | 593 μs | 409 μs | +| p99 | 2.00 ms | 1.79 ms | 1.95 ms | +| p99.9 | 4.00 ms | 5.12 ms | 3.62 ms | + +**Enforcement overhead over raw nginx baseline: p50 +69 µs / p90 +134 µs.** + +### Latest max sustained throughput (single instance) + +| Configuration | Max RPS | +|---|---| +| Simple rate limit (1 rule) | 195,000 | +| Complex policy (5 rules, JWT parsing, loop detection) | 195,000 | + +**No external datastore.** All enforcement state lives in in-process shared memory (`ngx.shared.dict`). No Redis, no Postgres, no network round-trips in the decision path. + +> Reproduce: `git clone https://github.com/fairvisor/benchmark && cd benchmark && bash run-all.sh` + +## Deployment + +| Target | Guide | +|---|---| +| Docker (local/VM) | [docs/guides/docker](https://docs.fairvisor.com/docs/guides/docker/) | +| Kubernetes (Helm) | [docs/guides/helm](https://docs.fairvisor.com/docs/guides/helm/) | +| LiteLLM integration | [docs/guides/litellm](https://docs.fairvisor.com/docs/guides/litellm/) | +| nginx `auth_request` | [docs/gateway/nginx](https://docs.fairvisor.com/docs/gateway/nginx/) | +| Envoy `ext_authz` | [docs/gateway/envoy](https://docs.fairvisor.com/docs/gateway/envoy/) | +| Kong / Traefik | [docs/gateway](https://docs.fairvisor.com/docs/gateway/) | + +Fairvisor integrates **alongside** Kong, nginx, Envoy, and Traefik — it does not replace them. + +## CLI + +```bash +fairvisor init --template=api # scaffold a policy bundle +fairvisor validate policy.json # validate before deploying +fairvisor test --dry-run # shadow-mode replay +fairvisor status # edge health and loaded bundle info +fairvisor logs # tail rejection events +fairvisor connect # connect to SaaS control plane +``` + +## SaaS control plane (optional) + +The edge is open source and runs standalone. The SaaS adds: + +- Policy editor with validation and diff view +- Fleet management and policy push +- Analytics: top limited routes, tenants, abusive sources +- Audit log exports for SOC 2 workflows +- Alerts (Datadog, Sentry, PagerDuty, Prometheus) +- RBAC and SSO (Enterprise) + +If the SaaS is unreachable, the edge keeps enforcing with the last-known policy bundle. No degradation. + +[fairvisor.com/pricing](https://fairvisor.com/pricing/) + +## Project layout + +``` +src/fairvisor/ runtime modules (OpenResty/LuaJIT) +cli/ command-line tooling +spec/ unit and integration tests (busted) +tests/e2e/ Docker-based E2E tests (pytest) +examples/ sample policy bundles +helm/ Helm chart +docker/ Docker artifacts +docs/ reference documentation +``` + +## Contributing + +See [CONTRIBUTING.md](CONTRIBUTING.md). Bug reports, issues, and pull requests welcome. + +Run the test suite: + +```bash +busted spec # unit + integration +pytest tests/e2e -v # E2E (requires Docker) +``` + +## License + +[Mozilla Public License 2.0](LICENSE) + +--- + +**Docs:** [docs.fairvisor.com](https://docs.fairvisor.com/docs/) · **Website:** [fairvisor.com](https://fairvisor.com) · **Quickstart:** [5 minutes to enforcement](https://docs.fairvisor.com/docs/quickstart/) From 0f7a3d88c0f470008e80694ccc80892d2a8ec98d Mon Sep 17 00:00:00 2001 From: Lev Date: Wed, 18 Mar 2026 18:59:15 +0100 Subject: [PATCH 3/3] fix: remove double-asterisk markdown inside HTML bold tag in tagline --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 6112405..8462521 100644 --- a/README.md +++ b/README.md @@ -25,7 +25,7 @@

- Latency: **< 70 µs enforcement overhead** · 195k RPS max throughput · No external state (no Redis / DB) + Latency: < 70 µs enforcement overhead · 195k RPS max throughput · No external state (no Redis / DB)

---