Fix cascading 503 on cold-cache tile loading #210

Merged
stef-k merged 3 commits into main from fix/tile-retry-cascading-503
Mar 26, 2026

@stef-k stef-k commented Mar 26, 2026

Summary

  • Client-side: Add a concurrency pool (6 slots) to retryTileLayer.js. Tiles now queue and stream in progressively instead of firing ~35 simultaneous requests that overwhelm the server budgets
  • Server-side: Two-phase per-IP outbound budget, so only actual upstream fetches count against the per-IP limit. Previously, IsRateLimitExceeded incremented the counter on every request (including rejected ones), so retries snowballed past the limit and produced cascading 503s that left all tiles permanently gray
  • Add WouldExceedRateLimit (peek), RecordRateLimitHit (increment), and PeekCount to RateLimitHelper for the check-then-record pattern
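
For illustration, the client-side pool described in the first bullet could look roughly like the sketch below. This is not the code from retryTileLayer.js: `createPool`, `run`, `active`, and `pending` are hypothetical names, and only the slot count of 6 comes from this PR.

```javascript
// Hypothetical sketch of a fixed-size concurrency pool.
// Names are illustrative; they are not taken from retryTileLayer.js.
function createPool(maxConcurrent) {
  let active = 0;      // tasks currently holding a slot
  const queue = [];    // tasks waiting for a free slot

  // Start queued tasks until every slot is taken or the queue is empty.
  function next() {
    while (active < maxConcurrent && queue.length > 0) {
      const { task, resolve, reject } = queue.shift();
      active++;
      Promise.resolve()
        .then(task)                 // run the task (sync or async)
        .then(resolve, reject)      // forward its outcome to the caller
        .finally(() => {
          active--;                 // free the slot
          next();                   // and hand it to the next waiter
        });
    }
  }

  return {
    // Queue a task; the returned promise settles with the task's result.
    run(task) {
      return new Promise((resolve, reject) => {
        queue.push({ task, resolve, reject });
        next();
      });
    },
    get active() { return active; },
    get pending() { return queue.length; },
  };
}
```

With a pool built as `createPool(6)`, a tile loader would wrap each request as `pool.run(() => fetch(tileUrl))`, so at most six requests are in flight and the rest wait their turn.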

Closes #206

Test plan

  • dotnet build passes (0 errors)
  • dotnet test passes (1407 tests)
  • Cold-cache test: clear tile cache, load map at zoom 17-18 — tiles should progressively fill in instead of graying out
  • Verify server logs show per-IP budget hits only for requests that acquired a global budget token
  • Code review via code-reviewer subagent (passed)

stef-k added 3 commits March 26, 2026 19:43
Two-part fix for tiles graying out instead of progressively loading:

1. Client-side: Add concurrency pool (6 slots) to retryTileLayer.js.
   Prevents thundering herd where ~35 tiles blast the server simultaneously,
   overwhelming both per-IP and global outbound budgets.

2. Server-side: Two-phase per-IP outbound budget in SendTileRequestCoreAsync.
   Previously, IsRateLimitExceeded incremented the per-IP counter on every
   request, even those rejected by the global budget. This caused retries to
   find the counter already past the limit and fail immediately (cascading 503).
   Now uses WouldExceedRateLimit (peek, no increment) for fast-fail, then
   RecordRateLimitHit only after global budget is acquired.
Two-part fix for tiles graying out instead of progressively loading:

1. Client-side (retryTileLayer.js): Add concurrency pool (6 slots).
   Prevents thundering herd where ~35 tiles blast the server simultaneously,
   overwhelming both per-IP and global outbound budgets.

2. Server-side (TileCacheService/RateLimitHelper): Two-phase per-IP budget.
   Previously, IsRateLimitExceeded incremented the per-IP counter on every
   request, even those rejected by the global budget. This caused retries to
   find the counter already past the limit and fail immediately (cascading 503).
   Now uses WouldExceedRateLimit (peek without increment) for fast-fail, then
   RecordRateLimitHit only after global budget is acquired — so only actual
   upstream fetches count against the per-IP limit.
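
The check-then-record flow in item 2 can be sketched as follows. The real implementation is C# (TileCacheService/RateLimitHelper); this is a JavaScript analogue for illustration only. The method names mirror the ones in this PR, but their signatures, the fixed-window counter, the `globalBudget.tryAcquire()` stand-in, and `sendTileRequest` are all assumptions.

```javascript
// Hypothetical JavaScript analogue of the two-phase per-IP budget.
// The actual helper is C#; signatures and windowing here are assumed.
class RateLimitHelper {
  constructor(limit, windowMs, now = () => Date.now()) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;            // injectable clock, handy for tests
    this.windows = new Map();  // key -> { start, count }
  }

  // Read the current count without changing it.
  peekCount(key) {
    const w = this.windows.get(key);
    if (!w || this.now() - w.start >= this.windowMs) return 0;
    return w.count;
  }

  // Peek only: would one more hit go past the limit?
  wouldExceedRateLimit(key) {
    return this.peekCount(key) >= this.limit;
  }

  // Increment the counter, starting a new window if the old one expired.
  recordRateLimitHit(key) {
    const t = this.now();
    let w = this.windows.get(key);
    if (!w || t - w.start >= this.windowMs) {
      w = { start: t, count: 0 };
      this.windows.set(key, w);
    }
    w.count += 1;
    return w.count;
  }
}

// Phase 1: peek at the per-IP budget and fail fast without incrementing.
// Phase 2: record the hit only once the global budget has been acquired.
async function sendTileRequest(ip, limiter, globalBudget, fetchUpstream) {
  if (limiter.wouldExceedRateLimit(ip)) {
    return { status: 503, reason: "per-ip" };
  }
  if (!globalBudget.tryAcquire()) {
    return { status: 503, reason: "global" }; // per-IP counter untouched
  }
  limiter.recordRateLimitHit(ip);
  return { status: 200, body: await fetchUpstream() };
}
```

In this sketch, a request rejected by the global budget leaves the per-IP count unchanged, so a later retry from the same client is not penalized for the earlier rejection.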

Closes #206
@stef-k stef-k merged commit cd5dfc1 into main Mar 26, 2026
1 check passed
@stef-k stef-k deleted the fix/tile-retry-cascading-503 branch March 26, 2026 21:04
Development

Successfully merging this pull request may close these issues.

Cold-cache tile loading returns gray areas instead of progressively filling tiles