
fix(security): F6 exempt peer endpoints from per-IP rate limit #106

Merged
TickTockBent merged 2 commits into main from fix/86-rate-limit-peer-bypass
Apr 27, 2026
@TickTockBent

Summary

Closes #86. The per-IP rate limiter was applied to all routes, including the inter-cluster endpoints /v1/gossip/message and /v1/bootstrap. Above ~6 ops/sec/peer the default 100 req/min limit 429'd legitimate peer traffic and broke the cluster — the burn-in symptom that forced the REPRAM_RATE_LIMIT=10000 workaround.

Threat model (resolved during PR design)

The interesting question on this PR was whether removing rate-limiting on peer endpoints opens a DoS vector. Walked through it explicitly:

  • HMAC verification is already in place on both endpoints when REPRAM_CLUSTER_SECRET is set (verifyGossipSignature in Go, verifySignature in TS). The rate limiter was running before auth and 429'ing valid peer traffic before HMAC could decide.
  • Open mode (REPRAM_CLUSTER_SECRET empty): operators have explicitly accepted no-auth on the cluster plane. Applying a client-tier rate limit there is inconsistent with that trust model — if you chose open mode you accepted possible hostile actors.

So peer endpoints bypass rate-limiting in both modes. HMAC handles secret-set; operator-chosen trust handles open-mode.
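
To make the two modes concrete, here is a minimal sketch of an HMAC gate in Go. It assumes a hex-encoded HMAC-SHA256 over the raw request body; the actual scheme used by `verifyGossipSignature` is not shown in this PR, and the names `sign` and `verifyBodySignature` are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns the hex-encoded HMAC-SHA256 of body under secret.
// Hypothetical helper: the real wire format is an assumption here.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verifyBodySignature reports whether sigHex is a valid signature for body.
// An empty secret models open mode, where every request is accepted.
func verifyBodySignature(secret, body []byte, sigHex string) bool {
	if len(secret) == 0 {
		return true // open mode: operator has accepted no-auth
	}
	got, err := hex.DecodeString(sigHex)
	if err != nil {
		return false // malformed signature
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	// hmac.Equal compares in constant time.
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("cluster-secret")
	body := []byte(`{"key":"k"}`)
	fmt.Println(verifyBodySignature(secret, body, sign(secret, body))) // valid peer
	fmt.Println(verifyBodySignature(secret, body, "deadbeef"))         // forged signature
	fmt.Println(verifyBodySignature(nil, body, ""))                    // open mode
}
```

With a secret set, unauthenticated traffic is rejected by this gate regardless of request rate, which is why the client-tier limiter adds nothing on these routes.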

What changed

  • internal/node/middleware.go: added peerEndpoints map. Middleware skips the rate-limit branch for /v1/gossip/message and /v1/bootstrap. Other checks (size, scanner, headers) still apply.
  • repram-mcp/src/node/middleware.ts: same change in check().

Tests

  • internal/node/middleware_test.go (+2): gossip + bootstrap stay open at burst=1 over 50 requests; client endpoints still 429.
  • repram-mcp/src/node/middleware.test.ts (+3): same coverage on TS.

Suite results

  • Go: all packages green
  • TS: 362/362 (was 359, +3 new)

Test plan

  • CI green
  • Re-run live wire-compat (./test/live-wire-compat/run.sh) — the 50 ops/sec gossip rate it generates would have been borderline pre-fix
  • Spot-check that scanner-detection and oversized-request rejection still fire on peer endpoints (covered by existing tests, but worth confirming)

Closes

Closes #86

// ticktockbent

Closes #86. The per-IP rate limiter was applied to all routes, including
the inter-cluster endpoints `/v1/gossip/message` and `/v1/bootstrap`.
Above ~6 ops/sec/peer the default 100 req/min limit would 429 legitimate
peer traffic and break the cluster — the burn-in symptom that forced the
REPRAM_RATE_LIMIT=10000 workaround.

Authentication on those endpoints is already correct: when
REPRAM_CLUSTER_SECRET is set, both `verifyGossipSignature` (Go) and
`verifySignature` (TS) HMAC-gate the request. The rate limiter was
running before auth and short-circuiting valid peer traffic.

Fix: peer endpoints bypass the rate limiter in both impls. Other
security checks (request size, scanner detection, security headers)
still apply.

Open-mode handling (REPRAM_CLUSTER_SECRET=""): peer endpoints bypass
rate limiting in this mode too. Operators who choose open mode have
explicitly accepted no-auth on the cluster plane; applying a
client-tier rate limit there is inconsistent with that trust model.
Decided per user direction during PR design.

Tests:
  internal/node/middleware_test.go (+2):
    TestPeerEndpointsBypassRateLimit — gossip + bootstrap stay open at
                                        burst=1 over 50 requests
    TestClientEndpointsStillRateLimited — bypass scoped strictly to
                                           peer endpoints

  repram-mcp/src/node/middleware.test.ts (+3):
    bypasses rate limit on /v1/gossip/message
    bypasses rate limit on /v1/bootstrap
    still rate-limits client endpoints when peer-bypass is in effect

Suite results:
  Go: all packages green
  TS: 362/362 (was 359, +3 new)

Closes #86

// ticktockbent

…rity

Addresses suggestions 1 and 2 from PR #106 review.

Suggestion 1 — guard against widening the bypass:
  The fix exempts peer endpoints from rate-limiting only; body-size,
  scanner detection, and security headers must still fire. No prior
  test would catch a refactor that accidentally moved any of those
  inside the !peerEndpoints gate. Added regression tests in both impls
  asserting peer endpoints return 413 on oversized bodies and 403 on
  scanner UAs.

Suggestion 2 — Go/TS parity on the peer-endpoint set:
  Go used a top-level peerEndpoints map; TS used an inline disjunction
  in check(). When a third peer endpoint is added the two shapes have
  to be edited differently. Extracted a top-level PEER_ENDPOINTS Set
  in TS to mirror the Go map. Single source of truth on each side; the
  cross-reference comment notes that both must be updated together.

Tests:
  internal/node/middleware_test.go (+2):
    TestPeerEndpointsSizeCheckStillApplies
    TestPeerEndpointsScannerCheckStillApplies

  repram-mcp/src/node/middleware.test.ts (+2):
    still enforces body size on peer endpoints
    still enforces scanner detection on peer endpoints

Suite results:
  Go: all packages green
  TS: 364/364 (was 362, +2 new)

// ticktockbent
TickTockBent merged commit ebf1a4f into main on Apr 27, 2026
4 checks passed
TickTockBent deleted the fix/86-rate-limit-peer-bypass branch on April 27, 2026 14:43
Development

Successfully merging this pull request may close these issues.

F6: rate limiter rejects peer-to-peer gossip traffic (default 100/min/IP triggers 429 storm)
