Skip to content

Security: harden tile proxy against third-party abuse (#204)#205

Merged
stef-k merged 18 commits intomainfrom
feature/harden-tile-proxy-204
Mar 22, 2026
Merged

Security: harden tile proxy against third-party abuse (#204)#205
stef-k merged 18 commits intomainfrom
feature/harden-tile-proxy-204

Conversation

@stef-k
Copy link
Copy Markdown
Owner

@stef-k stef-k commented Mar 22, 2026

No description provided.

stef-k added 18 commits March 22, 2026 20:03
…d budget, and auth rate limits (#204)

- Replace fixed-window rate limiter with sliding-window counter approximation
  to prevent boundary-batching attacks
- Add authenticated user rate limiting by userId (default 2000/min) — previously
  authenticated users bypassed rate limiting entirely
- Add outbound request budget (token-bucket at 2/sec, burst 4) to prevent
  cache-miss cascading from overwhelming upstream OSM
- Add X-Content-Type-Options: nosniff on tile responses
- Clean shutdown of outbound budget replenisher via IHostApplicationLifetime
…lity

- CRITICAL: Add TileRateLimitAuthenticatedPerMinute to admin settings UI,
  controller tracking, and property assignment; fix incorrect "never limited" text
- HIGH: Reduce outbound burst to 2 (OSM 2-connection policy), increase
  timeout to 10s for cold-cache resilience
- HIGH: Fix CTS disposal race in StopReplenisher — cancel without disposing,
  let GC handle abandoned CTS; mark fields volatile for cross-thread visibility
- HIGH: Bump default anonymous rate limit from 500 to 600 to compensate for
  stricter sliding-window algorithm
- MEDIUM: Fall back to IP-based rate limiting when authenticated user lacks
  NameIdentifier claim instead of pooling into shared "unknown" bucket
- MEDIUM: Update sliding-window jitter docs to reflect actual error bound
  (prevCount × ~0.5 during rotation, not "~1")
- MEDIUM: Separate Stop() for production shutdown from ResetForTesting() to
  avoid unnecessary drain/refill during app shutdown
- MEDIUM: Add test for outbound budget exhaustion returning false
- LOW: Fix eviction _currentCacheSize undercount — decrement after
  SaveChangesAsync succeeds, not before
- LOW: Move File.Exists check inside _cacheLock in eviction to fix TOCTOU
- LOW: Coalesce concurrent cleanup runs with Interlocked flag; reduce
  cleanup threshold from 100k to 10k entries
Add hidden field fallback for TileRateLimitEnabled and IsRegistrationOpen
checkboxes so unchecking them correctly posts "false" to the model binder.
Without the hidden field, unchecked checkboxes send no value and ASP.NET
falls back to the C# property default (true for TileRateLimitEnabled),
making it impossible to disable rate limiting from the admin UI.
…d sweep, concurrency

- CRITICAL: Reverse eviction order — commit DB deletions before file deletes
  to prevent orphaned DB records when SaveChangesAsync fails
- MEDIUM: Replace shared static _cleanupInProgress with per-cache flag so
  independent caches (anonymous, authenticated, image proxy) don't block cleanup
- MEDIUM: Add RateLimitCleanupJob (Quartz, every 5 min) to sweep expired entries
  from all rate limit caches, preventing unbounded memory growth below threshold
- LOW: Reorder OutboundBudget.StopReplenisher to cancel-then-replace, eliminating
  brief window where two replenishers overlap and double-release tokens
- LOW: Log warning when authenticated user lacks NameIdentifier claim and falls
  back to IP-based rate limiting (safe-side default, but diagnostically useful)
- LOW: Validate X-Forwarded-For with IPAddress.TryParse before using as rate
  limit key to prevent arbitrary strings from creating unique buckets
…s, scoping, retry

- CRITICAL: Remove global read-lock from tile cache — file reads no longer
  serialize through _cacheLock; catch IOException as cache miss instead.
  Writes and deletes retain the exclusive lock.
- CRITICAL: Increase outbound budget burst capacity from 2 to 10. OSM's
  2-connection policy now enforced at transport layer via
  SocketsHttpHandler.MaxConnectionsPerServer instead of semaphore capacity.
- HIGH: Add eviction coalescing with Interlocked.CompareExchange guard —
  concurrent CacheTileAsync calls can no longer trigger simultaneous
  eviction runs. Add DbUpdateConcurrencyException handling in eviction.
- HIGH: EvictDbTilesAsync now creates its own IServiceScope instead of
  using per-request _dbContext, preventing disposed-context failures.
- HIGH: CacheTileAsync breaks immediately on outbound budget exhaustion
  instead of retrying 3x (up to 30s thread block).
- MEDIUM: Add cross-field validation — authenticated rate limit must be
  >= anonymous rate limit in admin settings.
- MEDIUM: Document rate limit strategy (auth/anon mutually exclusive).
…est robustness

- MEDIUM: Split 304 path by zoom level — zoom >= 9 reads without lock
  (no sidecar write needed), consistent with other lock-free read paths.
  Zoom < 9 retains lock for atomic sidecar write + file read.
- MEDIUM: PurgeAllCacheAsync now uses DB-first ordering (consistent with
  EvictDbTilesAsync) — batch DB deletions commit before file deletes.
  If DB fails, no files are deleted and cache stays consistent.
- MEDIUM: Retry loop acquires outbound budget token only on first attempt.
  Retries on HTTP 5xx reuse the already-consumed token via skipBudget
  parameter, preventing upstream failures from exhausting burst budget.
- MEDIUM: TilesControllerTests clears static rate limit caches in
  constructor and uses deterministic IPs instead of Random().
- MEDIUM: SingleScopeFactory creates independent DbContext per scope
  sharing the same in-memory DB name, mirroring production behavior
  where scoped contexts have separate change trackers.
…, entity tracking

- HIGH: PurgeLRUCacheAsync restructured to DB-first ordering (consistent with
  EvictDbTilesAsync) — commit DB deletions before file deletes. Removed broken
  explicit transaction that caused retry failures on CommitAsync.
- HIGH: PurgeAllCacheAsync orphan query replaced File.Exists (untranslatable
  to SQL) with client-side HashSet filtering of existing disk files.
- MEDIUM: CacheTileAsync concurrency retry uses entry.ReloadAsync() instead of
  ToObject() to avoid creating an untracked entity that conflicts with the
  original tracked instance on the next Update() call.
- MEDIUM: EvictDbTilesAsync loads candidates with AsNoTracking + re-fetches by
  ID before delete, so a concurrent LastAccessed update doesn't cause stale
  RowVersion conflicts that permanently block eviction.
- MEDIUM: StopReplenisher captures CTS in local variable before assigning to
  Lazy lambda, preventing shared CTS between instances on rapid sequential calls.
- MEDIUM: PurgeBatchAsync only decrements _currentCacheSize for DB-tracked
  tiles (zoom >= 9, Meta != null). Previously decremented for all files
  including zoom 0-8, driving the counter negative after purge-all and
  permanently disabling eviction until app restart.
…rrency guards

- PurgeBatchAsync/PurgeLRUCacheAsync: clear change tracker + re-fetch by ID
  inside retry lambda to prevent entity tracking conflicts on retry
- PurgeAllCacheAsync orphan query: AsNoTracking + project only Id/TileFilePath
  to reduce memory footprint from full entity load
- RateLimitHelper: swap rotation order to atomic capture-then-assign,
  eliminating undercount gap where concurrent increments were lost
- OutboundBudget: add lock to StopReplenisher preventing concurrent
  CTS/Lazy interleaving that could orphan a replenisher
- Document _dbContext lifecycle safety and Referer check design rationale
…te limit hardening

- CRITICAL: Add unique composite index on TileCacheMetadata(Zoom, X, Y) —
  eliminates sequential scans on every tile request hot path
- CRITICAL: PurgeAllCacheAsync bulk-loads metadata in one query instead of
  O(N) individual DB queries per cached PNG file
- HIGH: Eviction _currentCacheSize now computed from re-fetched entity sizes
  instead of stale projected sizes, reducing drift
- HIGH: Outbound budget AcquireTimeout reduced from 10s to 3s to prevent
  thread pool starvation under cold-cache load; CancellationToken threaded
  through Send*/Cache*/Retrieve* chain to HttpContext.RequestAborted
- HIGH: Eviction and purge file deletion consolidated into single lock
  acquisition per batch, eliminating convoy effects from per-file locking
- MEDIUM: X-Forwarded-For IPs normalized (IPv4-mapped IPv6 → IPv4) to
  prevent aliasing from creating separate rate-limit buckets
- MEDIUM: Rate limit cache hard cap (50K entries) with oldest-entry eviction
  prevents unbounded memory growth from sustained low-rate attacks
- MEDIUM: Periodic _currentCacheSize reconciliation via RateLimitCleanupJob
  corrects accumulated drift from non-atomic size tracking
- MEDIUM: PurgeLRUCacheAsync chunked into 1000-ID batches to prevent
  PostgreSQL query plan explosion from large IN clauses
- MEDIUM: Sliding-window documentation corrected — worst-case jitter is
  up to full prevCount during rotation, not ~0.5
…drift, revalidation cancellation

- HIGH: Add per-IP outbound budget tracking (default 30 cache misses/min/IP)
  to prevent a single client from monopolizing global outbound tokens via
  uncached tile requests. Configurable in Admin Settings, 0 = disabled.
- HIGH: Normalize IPv4-mapped IPv6 on the direct-IP return path in
  GetClientIpAddress — previously only normalized on the X-Forwarded-For path,
  allowing rate limit bypass via dual-stack aliasing.
- MEDIUM: PurgeBatchAsync and PurgeLRUCacheAsync now capture actual sizes from
  re-fetched entities (not stale projected sizes) for _currentCacheSize
  decrement, consistent with EvictDbTilesAsync pattern.
- MEDIUM-LOW: Revalidation coalescing now uses CancellationToken.None for the
  outbound HTTP request so a disconnecting first caller doesn't cancel the
  request for all coalesced waiters.
… revalidation budget

- MEDIUM: PurgeAllCacheAsync replaces ToDictionaryAsync with foreach-based
  dictionary build to handle anomalous duplicate TileFilePath values gracefully
  (last-wins) instead of throwing ArgumentException at runtime.
- MEDIUM: PurgeBatchAsync and PurgeLRUCacheAsync file deletion now uses chunked
  lock acquisition (100 files per lock) instead of holding _cacheLock for the
  entire batch. Allows CacheTileAsync writes to interleave during large purges.
- MEDIUM: Add explicit AddHttpContextAccessor() in ConfigureServices() so
  TileCacheService's dependency is not fragile on implicit transitive registration.
- MEDIUM: Per-IP outbound budget now works for coalesced revalidation flights.
  Client IP is captured eagerly in RetrieveTileAsync (where HttpContext is
  available) and threaded through RevalidateTileAsync → SendConditionalTileRequestAsync
  → SendTileRequestCoreAsync. Previously, HttpContext was null for coalesced tasks,
  silently bypassing the per-IP budget check.
- HIGH: CacheTileAsync now catches DbUpdateException on concurrent tile insert.
  Two requests for the same uncached tile both saw null and called Add; one hit
  the unique index constraint. Now handled as a benign race at debug log level.
- MEDIUM: PurgeAllCacheAsync bulk-load projects only Id + TileFilePath with
  AsNoTracking instead of loading full entities (geometry, ETag, RowVersion).
  PurgeBatchAsync uses lightweight (int? MetaId, string, long) tuples.
- MEDIUM: RetryOperationAsync catches only DbUpdateException instead of all
  exceptions. Non-transient errors propagate immediately. Retry log level
  changed from Error to Warning.
ProxyImageRateLimitEnabled and ProxyImageRateLimitPerMinute were defined
in ApplicationSettings and consumed by TripViewerController/TripsController
but never exposed in the admin settings UI or wired in SettingsController.
Admins could not toggle or tune proxy image rate limiting.

- Add checkbox + hidden-field fallback for ProxyImageRateLimitEnabled
- Add numeric input for ProxyImageRateLimitPerMinute
- Add audit tracking and property assignment in SettingsController.Update
@stef-k stef-k merged commit bfb1d66 into main Mar 22, 2026
1 check passed
@stef-k stef-k deleted the feature/harden-tile-proxy-204 branch March 24, 2026 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant