edgeandnode · suchapalaver · Jan 8, 2026
diff --git a/docs/architecture/ADR-001-static-allocations.md b/docs/architecture/ADR-001-static-allocations.md
@@ -0,0 +1,80 @@
+# ADR-001: Static Allocations via Box::leak
+
+## Status
+
+Accepted
+
+## Context
+
+The graph-gateway uses Axum as its HTTP framework. Axum requires router state to be `Clone + Send + Sync + 'static`, and the `State` extractor clones that state for every request that extracts it. Several gateway components are heavyweight singletons that:
+
+1. Are initialized once at startup
+2. Never need to be deallocated (process lifetime)
+3. Are expensive to clone (contain channels, cryptographic keys, etc.)
+
+These components include:
+
+- `ReceiptSigner` - TAP receipt signing with private keys
+- `Budgeter` - PID controller state for fee management
+- `Chains` - Chain head tracking with per-chain state
+- `Eip712Domain` (attestation domains) - EIP-712 signing domains
+
+## Decision
+
+Use `Box::leak()` to convert owned `Box<T>` into `&'static T` references for singleton components.
+
+```rust
+// Example from main.rs
+let receipt_signer: &'static ReceiptSigner = Box::leak(Box::new(ReceiptSigner::new(...)));
+
+let chains: &'static Chains = Box::leak(Box::new(Chains::new(...)));
+```
+
+## Consequences
+
+### Positive
+
+1. **Zero-cost sharing**: `&'static T` is `Copy`, so passing to handlers has no overhead
+2. **No Arc overhead**: Avoids atomic reference counting on every request
+3. **Simpler lifetimes**: No need to propagate lifetime parameters through handler types
+4. **Explicit intent**: Makes it clear these are process-lifetime singletons
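+
+The sharing argument can be sketched with the standard library alone. In this sketch `ReceiptSigner` and `Chains` are hypothetical stand-ins with placeholder fields, `AppState`, `build_state`, and `spawn_handlers` are illustrative names, and threads stand in for Axum handlers because `thread::spawn` imposes the same `Send + 'static` requirement on what it captures:
+
+```rust
+use std::thread;
+
+// Hypothetical stand-ins for the real singletons; fields are placeholders.
+struct ReceiptSigner {
+    key_id: u64,
+}
+
+struct Chains {
+    names: Vec<&'static str>,
+}
+
+// Every field is a `&'static` reference, so the state is `Copy`:
+// handing it to a handler copies two pointers, with no refcount updates.
+#[derive(Clone, Copy)]
+struct AppState {
+    receipt_signer: &'static ReceiptSigner,
+    chains: &'static Chains,
+}
+
+// Leak each singleton once at startup; the memory is reclaimed only
+// when the process exits.
+fn build_state() -> AppState {
+    let receipt_signer: &'static ReceiptSigner = Box::leak(Box::new(ReceiptSigner { key_id: 7 }));
+    let chains: &'static Chains = Box::leak(Box::new(Chains {
+        names: vec!["mainnet", "arbitrum-one"],
+    }));
+    AppState { receipt_signer, chains }
+}
+
+// Threads stand in for request handlers: `thread::spawn` demands
+// `Send + 'static` captures, which `AppState` satisfies by plain copy.
+fn spawn_handlers(state: AppState, n: usize) -> Vec<u64> {
+    let handles: Vec<_> = (0..n)
+        .map(|_| thread::spawn(move || state.receipt_signer.key_id + state.chains.names.len() as u64))
+        .collect();
+    handles.into_iter().map(|h| h.join().unwrap()).collect()
+}
+```
+
+An `Arc<T>` version would behave the same way, but each handoff would increment, and later decrement, an atomic counter.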
+
+### Negative
+
+1. **Memory never freed**: The leaked memory is never reclaimed. Acceptable because:
+   - Components live for the entire process lifetime anyway
+   - Total leaked memory is small and bounded (< 1 KB)
+   - Process termination reclaims all memory
+
+2. **Not suitable for tests**: Tests that need fresh state must use different patterns. So far this has caused few problems, but only because test coverage of these components is limited.
+
+## Alternatives Considered
+
+### `Arc<T>` (Rejected)
+
+```rust
+let receipt_signer: Arc<ReceiptSigner> = Arc::new(ReceiptSigner::new(...));
+```
+
+Problems:
+
+- Atomic operations on every clone (per-request overhead)
+- More complex to share across Axum handlers
+- Implies shared ownership when sole ownership is the intent
+
+### `once_cell::sync::Lazy` (Rejected)
+
+```rust
+static RECEIPT_SIGNER: Lazy<ReceiptSigner> = Lazy::new(|| ...);
+```
+
+Problems:
+
+- Requires initialization logic in static context
+- Cannot use async initialization
+- Configuration not available at static init time
+
+## References
+
+- [Axum State Documentation](https://docs.rs/axum/latest/axum/extract/struct.State.html)
+- [Box::leak documentation](https://doc.rust-lang.org/std/boxed/struct.Box.html#method.leak)
diff --git a/docs/architecture/ADR-002-type-state-pattern.md b/docs/architecture/ADR-002-type-state-pattern.md
@@ -0,0 +1,131 @@
+# ADR-002: Type-State Pattern for Indexer Processing
+
+## Status
+
+Accepted
+
+## Context
+
+Indexer information flows through multiple processing stages, with each stage enriching the data:
+
+1. **Raw** - Basic indexer info from network subgraph
+2. **Version resolved** - After fetching indexer-service version
+3. **Progress resolved** - After fetching indexing progress (block height)
+4. **Cost resolved** - After fetching cost model/fee info
+
+Processing order matters: version info is needed before querying progress, because different indexer-service versions expose different APIs, and progress is needed before cost resolution, because stale indexers are filtered out first.
+
+A naive approach would use `Option<T>` fields that get populated:
+
+```rust
+struct IndexingInfo {
+    indexer: IndexerId,
+    deployment: DeploymentId,
+    version: Option<Version>,      // Filled in stage 2
+    progress: Option<BlockNumber>, // Filled in stage 3
+    fee: Option<GRT>,              // Filled in stage 4
+}
+```
+
+This leads to `unwrap()` calls throughout the codebase and to runtime panics when a field is accessed before it has been populated.
+
+## Decision
+
+Use the type-state pattern with generic parameters to encode processing stage at compile time.
+
+```rust
+// Type markers for processing stages
+struct Unresolved;
+struct VersionResolved(Version);
+struct ProgressResolved { version: Version, block: BlockNumber }
+struct FullyResolved { version: Version, block: BlockNumber, fee: GRT }
+
+// Generic struct parameterized by stage
+struct IndexingInfo<Stage> {
+    indexer: IndexerId,
+    deployment: DeploymentId,
+    stage: Stage,
+}
+
+// Stage transitions are explicit methods
+impl IndexingInfo<Unresolved> {
+    fn resolve_version(self, version: Version) -> IndexingInfo<VersionResolved> {
+        IndexingInfo {
+            indexer: self.indexer,
+            deployment: self.deployment,
+            stage: VersionResolved(version),
+        }
+    }
+}
+```
+
+See `src/network/indexer_processing.rs` for the actual implementation.
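+
+A self-contained version of the pattern, with every transition and a consumer restricted to the final stage, might look like the following. The type aliases (`IndexerId`, `DeploymentId`, `Version`, `BlockNumber`, `FeeWei`), the `new` constructor, and `describe` are placeholders assumed for illustration and may differ from the actual implementation:
+
+```rust
+// Placeholder aliases standing in for the gateway's real types (assumption).
+type IndexerId = &'static str;
+type DeploymentId = &'static str;
+type Version = String;
+type BlockNumber = u64;
+type FeeWei = u128;
+
+// Stage markers: each carries only the data resolved so far.
+struct Unresolved;
+struct VersionResolved(Version);
+struct ProgressResolved { version: Version, block: BlockNumber }
+struct FullyResolved { version: Version, block: BlockNumber, fee: FeeWei }
+
+struct IndexingInfo<Stage> {
+    indexer: IndexerId,
+    deployment: DeploymentId,
+    stage: Stage,
+}
+
+impl IndexingInfo<Unresolved> {
+    fn new(indexer: IndexerId, deployment: DeploymentId) -> Self {
+        IndexingInfo { indexer, deployment, stage: Unresolved }
+    }
+    fn resolve_version(self, version: Version) -> IndexingInfo<VersionResolved> {
+        IndexingInfo { indexer: self.indexer, deployment: self.deployment, stage: VersionResolved(version) }
+    }
+}
+
+impl IndexingInfo<VersionResolved> {
+    fn resolve_progress(self, block: BlockNumber) -> IndexingInfo<ProgressResolved> {
+        let VersionResolved(version) = self.stage;
+        IndexingInfo { indexer: self.indexer, deployment: self.deployment, stage: ProgressResolved { version, block } }
+    }
+}
+
+impl IndexingInfo<ProgressResolved> {
+    fn resolve_cost(self, fee: FeeWei) -> IndexingInfo<FullyResolved> {
+        let ProgressResolved { version, block } = self.stage;
+        IndexingInfo { indexer: self.indexer, deployment: self.deployment, stage: FullyResolved { version, block, fee } }
+    }
+}
+
+// Only fully resolved info is accepted; passing an earlier stage is a
+// compile-time type error rather than a runtime `unwrap()` panic.
+fn describe(info: &IndexingInfo<FullyResolved>) -> String {
+    format!(
+        "{}@{} v{} block={} fee={}",
+        info.indexer, info.deployment, info.stage.version, info.stage.block, info.stage.fee
+    )
+}
+```
+
+Calling `describe` with an `IndexingInfo<VersionResolved>` is rejected by the compiler as a type mismatch, which is the guarantee the `Option`-based struct cannot give.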
+
+## Consequences
+
+### Positive
+
+1. **Compile-time safety**: Impossible to access version info before it's resolved
+2. **Self-documenting**: Function signatures show required processing stage
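+3. **No runtime overhead**: Stages are checked entirely at compile time; generics are monomorphized and the `Unresolved` marker is zero-sized, so each stage carries only the data resolved so far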
+4. **Explicit transitions**: Stage changes are visible method calls, not silent mutations
+
+### Negative
+
+1. **Verbose types**: `IndexingInfo<ProgressResolved>` is longer than `IndexingInfo`
+2. **Learning curve**: Pattern is less common, may confuse new contributors
+3. **More boilerplate**: Stage transition methods must be written explicitly
+
+## Pattern Usage
+
+```rust
+// Functions declare their required stage in the signature
+fn select_candidate(info: &IndexingInfo<FullyResolved>) -> Score {
+    // Safe to access info.stage.fee - compiler guarantees it exists
+    calculate_score(info.stage.fee, info.stage.block)
+}
+
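+// Processing pipeline: each fetch borrows the current stage and each
+// transition consumes it, so fetch into a local before transitioning
+// (borrowing `raw` inside `raw.resolve_version(...)` would use a moved value).
+async fn process_indexer(raw: IndexingInfo<Unresolved>) -> Result<IndexingInfo<FullyResolved>> {
+    let version = fetch_version(&raw.indexer).await?;
+    let with_version = raw.resolve_version(version);
+    let block = fetch_progress(&with_version).await?;
+    let with_progress = with_version.resolve_progress(block);
+    let fee = fetch_cost(&with_progress).await?;
+    Ok(with_progress.resolve_cost(fee))
+}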
+```
+
+## Alternatives Considered
+
+### Builder Pattern (Rejected)
+
+```rust
+IndexingInfoBuilder::new(indexer, deployment)
+    .version(v)
+    .progress(p)
+    .fee(f)
+    .build()
+```
+
+Problems:
+
+- Runtime validation only
+- `build()` must check all fields are set
+- No compile-time guarantee of processing order
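+
+The runtime nature of these checks shows up in a small sketch. `IndexingInfoBuilder` and `BuiltInfo` here are hypothetical, with placeholder field types, and the indexer and deployment fields are omitted for brevity; this mirrors the chained calls above rather than any builder in the codebase:
+
+```rust
+// Hypothetical builder: every field is optional until `build()` runs.
+#[derive(Default)]
+struct IndexingInfoBuilder {
+    version: Option<String>,
+    progress: Option<u64>,
+    fee: Option<u128>,
+}
+
+#[derive(Debug, PartialEq)]
+struct BuiltInfo {
+    version: String,
+    progress: u64,
+    fee: u128,
+}
+
+impl IndexingInfoBuilder {
+    fn version(mut self, v: &str) -> Self { self.version = Some(v.to_string()); self }
+    fn progress(mut self, p: u64) -> Self { self.progress = Some(p); self }
+    fn fee(mut self, f: u128) -> Self { self.fee = Some(f); self }
+
+    // The missing-field check happens only here, at runtime; nothing stops
+    // a caller from setting `fee` before `version` or skipping `progress`.
+    fn build(self) -> Result<BuiltInfo, &'static str> {
+        Ok(BuiltInfo {
+            version: self.version.ok_or("version not set")?,
+            progress: self.progress.ok_or("progress not set")?,
+            fee: self.fee.ok_or("fee not set")?,
+        })
+    }
+}
+```
+
+A chain that sets `fee` before `version` and never sets `progress` compiles without complaint; the mistake only surfaces as an `Err` from `build()`.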
+
+### Separate Structs (Rejected)
+
+```rust
+struct RawIndexingInfo { ... }
+struct ResolvedIndexingInfo { ... }
+```
+
+Problems:
+
+- Code duplication across struct definitions
+- Harder to share common logic
+- Type relationships not explicit
+
+## References
+
+- [Typestate Pattern in Rust](https://cliffle.com/blog/rust-typestate/)
+- [Parse, don't validate](https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/)
diff --git a/docs/architecture/ADR-003-pid-budget-controller.md b/docs/architecture/ADR-003-pid-budget-controller.md
@@ -0,0 +1,149 @@
+# ADR-003: PID Controller for Fee Budget Management
+
+## Status
+
+Accepted
+
+## Context
+
+The gateway must manage query fee budgets to balance:
+
+1. **Cost efficiency** - Minimize fees paid to indexers
+2. **Query success rate** - Ensure queries succeed by offering competitive fees
+3. **Responsiveness** - Adapt quickly to market conditions
+
+Static fee budgets fail because:
+
+- Too low: Indexers reject queries, degraded service
+- Too high: Overpaying, wasted budget
+- Market conditions change: Indexer fees fluctuate based on demand
+
+We need a dynamic system that automatically adjusts fee budgets based on observed success rates.
+
+## Decision
+
+Implement a PID (Proportional-Integral-Derivative) controller to dynamically adjust fee budgets based on query success rate.
+
+### PID Controller Overview
+
+The PID controller continuously adjusts the fee budget using three terms:
+
+```
+adjustment = Kp * error + Ki * integral + Kd * derivative
+
+where:
+  error = target_success_rate - actual_success_rate
+  integral = sum of past errors
+  derivative = rate of error change
+```
+
+- **P (Proportional)**: Immediate response to current error
+- **I (Integral)**: Corrects persistent bias over time
+- **D (Derivative)**: Dampens oscillations, smooths response
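+
+The three terms map onto a small discrete-time controller. The sketch below is illustrative, not the code in `src/budgets.rs`: it assumes one `update` call per time step (so `dt = 1`), keeps the integral as a plain running sum, and starts the derivative at zero so the first sample does not cause a spike:
+
+```rust
+// Discrete PID controller sketch (assumption: one update per time step).
+struct PidController {
+    kp: f64,
+    ki: f64,
+    kd: f64,
+    target: f64,
+    integral: f64,
+    prev_error: Option<f64>,
+}
+
+impl PidController {
+    fn new(kp: f64, ki: f64, kd: f64, target: f64) -> Self {
+        PidController { kp, ki, kd, target, integral: 0.0, prev_error: None }
+    }
+
+    // Returns Kp * error + Ki * integral + Kd * derivative.
+    fn update(&mut self, actual: f64) -> f64 {
+        let error = self.target - actual;
+        self.integral += error;
+        // With no previous sample, use a zero derivative instead of
+        // treating the first error as a jump from zero.
+        let derivative = self.prev_error.map_or(0.0, |prev| error - prev);
+        self.prev_error = Some(error);
+        self.kp * error + self.ki * self.integral + self.kd * derivative
+    }
+}
+```
+
+With the gains from the tuning table below (Kp = 0.1, Ki = 0.01, Kd = 0.05, target 0.95), a first observation of 0.85 gives 0.1 × 0.1 + 0.01 × 0.1 = 0.011. If the next observation hits the target, the derivative term contributes 0.05 × (0 - 0.1) = -0.005 and the output falls to -0.004, which damps the correction before it overshoots.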
+
+### Implementation
+
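+See `src/budgets.rs` for the implementation. In outline:
+
+```rust
+use std::sync::Mutex;
+
+pub struct Budgeter {
+    // `feedback` takes `&self` because the Budgeter is shared as a
+    // `&'static` singleton (ADR-001), so its mutable state sits behind a lock.
+    state: Mutex<BudgetState>,
+}
+
+struct BudgetState {
+    controller: PidController,
+    decay_buffer: DecayBuffer,
+    budget_per_query: f64,
+}
+
+impl Budgeter {
+    pub fn feedback(&self, success: bool) {
+        let mut state = self.state.lock().unwrap();
+        state.decay_buffer.record(success);
+        let success_rate = state.decay_buffer.success_rate();
+        let adjustment = state.controller.update(success_rate);
+        // The PID output is near zero at the target, so apply it as a
+        // relative change rather than as a raw multiplier.
+        state.budget_per_query *= 1.0 + adjustment;
+    }
+}
+```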
+
+### Decay Buffer
+
+Success rate is calculated with exponential decay, so recent observations weigh more heavily. With `i` the age of an observation (0 for the most recent):
+
+```
+weighted_sum = sum(success_i * decay^i)
+weighted_count = sum(decay^i)
+success_rate = weighted_sum / weighted_count
+```
+
+This provides:
+
+- Fast response to changing conditions
+- Natural forgetting of stale data
+- Bounded memory usage
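+
+Because each new observation multiplies all older weights by `decay`, both sums can be maintained incrementally in constant memory. A minimal sketch of the formula; `DecayedRate` is a hypothetical name, not the gateway's `DecayBuffer` API, and returning `None` before any observation is a choice made here:
+
+```rust
+// Exponentially decayed success rate kept as two running sums.
+struct DecayedRate {
+    decay: f64,
+    weighted_sum: f64,
+    weighted_count: f64,
+}
+
+impl DecayedRate {
+    fn new(decay: f64) -> Self {
+        DecayedRate { decay, weighted_sum: 0.0, weighted_count: 0.0 }
+    }
+
+    // Ageing every past observation by one step multiplies its weight by
+    // `decay`; the new observation enters with weight decay^0 = 1.
+    fn record(&mut self, success: bool) {
+        let value = if success { 1.0 } else { 0.0 };
+        self.weighted_sum = self.weighted_sum * self.decay + value;
+        self.weighted_count = self.weighted_count * self.decay + 1.0;
+    }
+
+    // weighted_sum / weighted_count, or None before the first observation.
+    fn success_rate(&self) -> Option<f64> {
+        if self.weighted_count == 0.0 {
+            None
+        } else {
+            Some(self.weighted_sum / self.weighted_count)
+        }
+    }
+}
+```
+
+With `decay = 0.5`, one success and one failure give a rate of 1/3 when the success is older and 2/3 when it is newer. With the configured `decay = 0.99`, an observation's weight halves after about 69 updates.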
+
+## Consequences
+
+### Positive
+
+1. **Self-tuning**: The budget converges toward the level that achieves the target success rate
+2. **Adaptive**: Responds to market changes without manual intervention
+3. **Stable when well tuned**: PID controllers are well understood, and their gains can be tuned to avoid oscillation
+4. **Observable**: Budget changes can be monitored via metrics
+
+### Negative
+
+1. **Tuning required**: PID gains (Kp, Ki, Kd) must be tuned for the system
+2. **Oscillation risk**: Poorly tuned controller can oscillate
+3. **Complexity**: More complex than static budgets
+4. **Cold start**: Initial budget must be set heuristically
+
+## Tuning Parameters
+
+Current parameters (may need adjustment based on production data):
+
+| Parameter | Value | Purpose                                   |
+| --------- | ----- | ----------------------------------------- |
+| Kp        | 0.1   | Proportional gain - immediate response    |
+| Ki        | 0.01  | Integral gain - bias correction           |
+| Kd        | 0.05  | Derivative gain - oscillation damping     |
+| Target    | 0.95  | Target success rate (95%)                 |
+| Decay     | 0.99  | Decay factor for success rate calculation |
+
+## Alternatives Considered
+
+### Static Budget (Rejected)
+
+```rust
+const BUDGET_PER_QUERY: GRT = GRT::from_wei(1_000_000);
+```
+
+Problems:
+
+- Cannot adapt to market conditions
+- Requires manual intervention to change
+- Either overpays or fails queries
+
+### Threshold-based Adjustment (Rejected)
+
+```rust
+if success_rate < 0.9 { budget *= 1.1; }
+if success_rate > 0.95 { budget *= 0.9; }
+```
+
+Problems:
+
+- Oscillates around thresholds
+- Step changes cause instability
+- No derivative term to dampen oscillations
+
+### Machine Learning Model (Rejected)
+
+Train a model to predict optimal budget based on features.
+
+Problems:
+
+- Requires training data
+- Black box behavior
+- Overkill for this use case
+
+## References
+
+- [PID Controller (Wikipedia)](https://en.wikipedia.org/wiki/PID_controller)
+- [Control Theory for Software Engineers](https://blog.acolyer.org/2015/05/01/feedback-control-for-computer-systems/)
diff --git a/src/auth.rs b/src/auth.rs
@@ -1,3 +1,32 @@
+//! API Key Authentication
+//!
+//! Handles API key validation, payment status checks, and domain authorization.
+//!
+//! # Authentication Flow
+//!
+//! 1. Extract API key from `Authorization: Bearer <key>` header
+//! 2. Parse and validate key format (32-char hex string → 16 bytes)
+//! 3. Look up key in `api_keys` map (from Studio API or Kafka)
+//! 4. Check payment status (`QueryStatus::Active`, `ServiceShutoff`, `MonthlyCapReached`)
+//! 5. Verify origin domain against authorized domains list
+//! 6. Return [`AuthSettings`] with user address and authorized subgraphs
+//!
+//! # Special API Keys
+//!
+//! Keys in `special_api_keys` bypass payment checks. Used for admin/monitoring.
+//!
+//! # Domain Authorization
+//!
+//! The `domains` field supports wildcards:
+//! - `"example.com"` → exact match only
+//! - `"*.example.com"` → matches `foo.example.com`, `bar.example.com`
+//! - Empty list → all domains authorized
+//!
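+//! A minimal sketch of these rules, for illustration only: `domain_authorized`
+//! is a hypothetical helper, not this module's API, and it assumes that a
+//! wildcard entry does not match the bare parent domain.
+//!
+//! ```rust
+//! // Returns true if `origin` is allowed by the configured `domains` list.
+//! fn domain_authorized(domains: &[&str], origin: &str) -> bool {
+//!     // An empty list authorizes every origin.
+//!     if domains.is_empty() {
+//!         return true;
+//!     }
+//!     domains.iter().any(|pattern| match pattern.strip_prefix("*.") {
+//!         // Wildcard: `origin` must be `<prefix>.<suffix>` with a non-empty
+//!         // prefix, so `*.example.com` rejects `example.com` itself.
+//!         Some(suffix) => origin
+//!             .strip_suffix(suffix)
+//!             .map_or(false, |rest| rest.len() > 1 && rest.ends_with('.')),
+//!         // Plain entry: exact match only.
+//!         None => *pattern == origin,
+//!     })
+//! }
+//! ```
+//!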
+//! # API Key Sources
+//!
+//! - [`studio_api`]: Poll HTTP endpoint periodically
+//! - [`kafka`]: Stream updates from Kafka topic
+
 pub mod kafka;
 pub mod studio_api;