fix(discord): prevent rate limit retry storm #1586
Concurrent requests to the same Discord endpoint independently retried on 429 responses, each consuming the rate limit budget and preventing others from succeeding. This created a self-sustaining retry loop lasting ~30 seconds per cycle.

Add a shared per-endpoint rate limit gate so all concurrent callers wait on the same timestamp, and cap retries at 5 to prevent prolonged retry storms.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
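As a rough illustration of the approach described in this commit message, the sketch below keeps one deadline per endpoint and makes every caller wait on it before sending. It is not the code from api.ts: only `MAX_RETRIES` and `rateLimitUntil` are names taken from this PR, while `gateDelayMs`, `closeGate`, `requestWithGate`, `SketchResponse`, the injected `send`/`sleep`/`now` callbacks, and the 1000 ms fallback are assumptions made for the example.

```typescript
// Illustrative sketch only: names other than MAX_RETRIES and rateLimitUntil are hypothetical.
const MAX_RETRIES = 5;

// One deadline (epoch ms) per endpoint, shared by every concurrent caller.
const rateLimitUntil = new Map<string, number>();

/** Milliseconds a caller still has to wait before calling `endpoint`. */
function gateDelayMs(endpoint: string, now: number): number {
  return Math.max(0, (rateLimitUntil.get(endpoint) ?? 0) - now);
}

/** Record a 429 so that all callers of `endpoint` wait on the same timestamp. */
function closeGate(endpoint: string, retryAfterMs: number, now: number): void {
  const existing = rateLimitUntil.get(endpoint) ?? 0;
  // Math.max keeps a longer window that another caller already recorded.
  rateLimitUntil.set(endpoint, Math.max(existing, now + retryAfterMs));
}

interface SketchResponse {
  status: number;
  retryAfterMs?: number; // assumed to be parsed from the 429 response
}

async function requestWithGate(
  endpoint: string,
  send: () => Promise<SketchResponse>,
  sleep: (ms: number) => Promise<void>,
  now: () => number = () => Date.now(),
): Promise<SketchResponse> {
  let retries = 0;
  for (;;) {
    const delay = gateDelayMs(endpoint, now());
    if (delay > 0) await sleep(delay); // wait on the shared gate, not a private timer

    const res = await send();
    if (res.status !== 429) return res;

    retries++;
    if (retries > MAX_RETRIES) {
      throw new Error(`Rate limited on ${endpoint}; gave up after ${MAX_RETRIES} retries`);
    }
    // Fallback of 1000 ms is an assumption for responses without a retry hint.
    closeGate(endpoint, res.retryAfterMs ?? 1000, now());
  }
}
```

Because the deadline lives in a shared map rather than in each call's local state, a caller that arrives while the gate is closed sleeps first instead of spending another request on a guaranteed 429.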
🚅 Deployed to the simon.dev-pr-1586 environment in simon.dev
Claude finished @simonkberg's task in 1m 28s

Code Review Complete ✅

I've performed a comprehensive review of the rate limiting improvements across code quality, security, performance, testing, and documentation. This is an excellent implementation with strong engineering practices throughout.

Summary

The PR successfully solves the rate limit retry storm problem with a clean, well-tested solution. The implementation demonstrates:

Detailed Review

1. Code Quality ⭐️⭐️⭐️⭐️⭐️

Strengths:
Minor observations:
2. Security ✅

No security concerns identified:

3. Performance ✅

Strengths:

No performance concerns.

4. Testing ⭐️⭐️⭐️⭐️⭐️

Exceptional test quality:
Re: Codecov 85% patch coverage (3 missing lines): Based on the Codecov report, the missing lines are likely:
These are acceptable coverage gaps:
Recommendation: The current 85% patch coverage is excellent for rate limiting code. The missing lines don't represent meaningful risk.

5. Documentation ✅

Adequate documentation:

No documentation concerns. The code is self-explanatory, and the PR body provides sufficient context.

Specific Observations

Retry Count Logic (api.ts:86-106)

The retry counter increments before the timeout check (api.ts:88). This means:

Result: Allows 1 initial + 5 retries = 6 total attempts (verified by test at api.test.ts:883)

This is correct behavior. The constant name

Shared Gate Pattern (api.ts:109-112)

const newUntil = Date.now() + retryAfterMs;
const existing = rateLimitUntil.get(endpoint) ?? 0;
rateLimitUntil.set(endpoint, Math.max(existing, newUntil));

Excellent defensive programming. Using

Test Quality Highlights

Recommendations

None. This PR is ready to merge. 🎉 The implementation is clean, well-tested, and solves the stated problem effectively. The 85% patch coverage is strong, and the missing lines don't represent meaningful risk. Your evaluation of the previous suggestions was spot-on.

Final Verdict

Approve and merge. This is production-quality code that demonstrates:

Great work! 👏
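The "1 initial + 5 retries = 6 total attempts" arithmetic from the review can be checked with a small counting sketch. It assumes the counter is incremented first and then compared with a strict "greater than" against the cap; the exact comparison in api.ts is not quoted in this thread, and `attemptsBeforeGivingUp` and its callback are hypothetical names.

```typescript
// Sketch of the counting order only; the function name and callback are hypothetical.
function attemptsBeforeGivingUp(
  isRateLimited: (attempt: number) => boolean,
  maxRetries = 5, // mirrors MAX_RETRIES in the PR
): number {
  let retries = 0;
  let attempts = 0;
  for (;;) {
    attempts++;
    if (!isRateLimited(attempts)) return attempts; // request succeeded
    retries++; // incremented before the limit is checked
    if (retries > maxRetries) return attempts; // the real code would throw here
  }
}

console.log(attemptsBeforeGivingUp(() => true)); // 6: 1 initial attempt + 5 retries
console.log(attemptsBeforeGivingUp((n) => n < 3)); // 3: succeeds on the third attempt
```

With a `>=` comparison instead, the same loop would stop after 5 total attempts, which is why the review calls out the ordering explicitly.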
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1586      +/-   ##
==========================================
+ Coverage   97.76%   97.81%   +0.04%
==========================================
  Files          49       49
  Lines         896      916      +20
  Branches      156      161       +5
==========================================
+ Hits          876      896      +20
  Misses          8        8
  Partials       12       12
@claude Evaluated both suggestions:

1. Clear gate on non-429 errors: Not implementing. The gate represents Discord's stated rate limit window. A subsequent 500 doesn't mean the rate limit has ended; it's a separate error. Clearing the gate prematurely would let other requests hit the rate limit again. The gate expires naturally when
2. Extended inline comment: The existing comment (
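The reasoning in the first point can be sketched as follows: if the gate is only a timestamp compared against the current time, nothing has to clear it, and a non-429 status can simply leave it alone. The names `gateUntil`, `recordResponse`, and `isGated` are hypothetical and the comparison shown is an assumption, not the PR's code.

```typescript
// Hypothetical names; a sketch of "leave the gate alone on non-429 errors".
const gateUntil = new Map<string, number>();

function recordResponse(
  endpoint: string,
  status: number,
  retryAfterMs: number,
  now: number,
): void {
  if (status === 429) {
    const existing = gateUntil.get(endpoint) ?? 0;
    gateUntil.set(endpoint, Math.max(existing, now + retryAfterMs));
  }
  // Any other status, including 5xx, deliberately leaves the gate untouched.
}

/** The gate needs no cleanup: it stops applying once `now` reaches the deadline. */
function isGated(endpoint: string, now: number): boolean {
  return now < (gateUntil.get(endpoint) ?? 0);
}

recordResponse("/channels/1/messages", 429, 3000, 0); // window declared until t = 3000
recordResponse("/channels/1/messages", 500, 0, 1000); // unrelated server error
console.log(isGated("/channels/1/messages", 1500)); // true: the window is still in force
console.log(isGated("/channels/1/messages", 3000)); // false: expired on its own
```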
Claude finished @simonkberg's task in 19s

Response to Feedback

Thank you for the thoughtful evaluation. Your reasoning on both points makes sense:

1. Gate cleanup on non-429 errors: You're correct that the gate represents Discord's declared rate limit window, and a 500 error doesn't invalidate that window. The natural expiration via
2. Inline comment: The existing comment adequately documents the purpose. The implementation is clear enough without additional explanation.

Both decisions are sound. The PR is well-designed and ready to merge as-is. ✅
Add _setRateLimitGate test helper to directly set a rate limit gate, since the gate timeout path is unreachable through natural 429 responses alone (the gate is bounded by RATE_LIMIT_TIMEOUT_MS).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
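One way to read this commit message is sketched below. It assumes that a 429 whose retry hint exceeds `RATE_LIMIT_TIMEOUT_MS` is rejected before the gate is updated, so a naturally set gate can never be further away than the timeout, and the "gate exceeds timeout" branch can only be exercised by setting the gate directly. Only `_setRateLimitGate` and `RATE_LIMIT_TIMEOUT_MS` are names from this PR; the 30000 ms value, `endpointGates`, `recordRateLimit`, `gateWaitOrThrow`, and the error messages are assumptions.

```typescript
const RATE_LIMIT_TIMEOUT_MS = 30000; // assumed value; the source does not state it
const endpointGates = new Map<string, number>();

/** Natural path: a 429 may only push the gate up to now + RATE_LIMIT_TIMEOUT_MS. */
function recordRateLimit(endpoint: string, retryAfterMs: number, now: number): void {
  if (retryAfterMs > RATE_LIMIT_TIMEOUT_MS) {
    throw new Error("retry_after exceeds timeout"); // rejected before the gate is touched
  }
  const existing = endpointGates.get(endpoint) ?? 0;
  endpointGates.set(endpoint, Math.max(existing, now + retryAfterMs));
}

/** Returns the wait in ms, or throws when the gate itself is further away than the timeout. */
function gateWaitOrThrow(endpoint: string, now: number): number {
  const wait = Math.max(0, (endpointGates.get(endpoint) ?? 0) - now);
  if (wait > RATE_LIMIT_TIMEOUT_MS) {
    throw new Error("rate limit gate exceeds timeout"); // the branch the helper exists to reach
  }
  return wait;
}

/** Test-only helper: set the gate directly, bypassing the bound above. */
function _setRateLimitGate(endpoint: string, until: number): void {
  endpointGates.set(endpoint, until);
}
```

In a test, calling `_setRateLimitGate(endpoint, now + RATE_LIMIT_TIMEOUT_MS + 1)` before the request is enough to drive `gateWaitOrThrow` into its throwing branch, which no sequence of `recordRateLimit` calls can do under these assumptions.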
Summary

- Add a shared per-endpoint rate limit gate (rateLimitUntil Map) so concurrent requests wait on the same timestamp instead of independently retrying and consuming each other's rate limit budget
- Cap retries at 5 (MAX_RETRIES) to prevent 30-second retry storms when multiple requests sustain the rate limit
- retries count in log messages for observability

Test plan

- pnpm lint passes (TypeScript, ESLint, Prettier)

🤖 Generated with Claude Code