Problem
The backend caches user profiles in memory for 5 minutes (_USER_PROFILE_CACHE_TTL = 300) in backend/src/apis/shared/auth/dependencies.py:46-47. When a user's profile changes (roles, name, etc.), the stale cached value is served until the TTL expires, which is confusing for both users and admins debugging permission issues.
Original symptom (likely already fixed)
On first login, the user's profile was being cached with incomplete data — specifically, roles weren't yet enriched from the IdP, so RBAC checks failed for ~5 minutes after a user's first sign-in.
This appears to be addressed by commit 9ea91e9 (April 7, 2026), which calls invalidate_user_profile_cache(user_id) immediately after /users/me/sync upserts the profile to DynamoDB (backend/src/apis/app_api/users/routes.py:89-94). But it was a symptom of the broader staleness problem, and other mutation paths (e.g. admin role edits) don't bust the cache.
Two paths to evaluate
Path A — Intelligent cache busting
Keep the cache; add invalidation hooks at every write site.
What's already wired:
- /users/me/sync calls invalidate_user_profile_cache() after upsert.
What's missing:
- Admin endpoints under /admin/roles/* mutate role mappings but don't invalidate the per-user profile cache.
- The separate AppRole cache in backend/src/apis/shared/rbac/cache.py has its own TTL/invalidation logic and is not coordinated with the profile cache.
- No invalidation on any other write path (e.g. profile edits if/when added).
- In a multi-instance deployment (Fargate, multiple tasks), in-memory invalidation only clears the local process; other tasks keep serving stale data until their own TTL expires. Cross-instance busting would need a pub/sub layer (SNS/EventBridge/Redis pub-sub).
Pros: Preserves the RCU savings; minimal disruption.
Cons: Every new write path becomes a place to forget the invalidation call. Doesn't solve cross-instance staleness without additional infrastructure.
Path B — Remove the cache
Hit DynamoDB on every authenticated request.
Performance trade-offs:
- Current: at most ~1 GetItem per user per 5 minutes per process, i.e. up to ~288 reads/user/day for a continuously active user.
- Without cache: 1 GetItem per authenticated request. For an active chat user, this is dozens to hundreds of GetItems per session.
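- DynamoDB on-demand pricing: taking $0.25 per million read request units (check current pricing for the region), a user issuing 500 requests/day costs ~$0.000125/day uncached vs at most ~$0.000072/day cached (288 reads). At 1,000 MAU averaging 200 requests/day, uncached reads total ~$1.50/month, which is an upper bound on the extra cost and is negligible.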
- Latency: GetItem on a single-PK lookup is sub-10ms in-region. Adds to every request's auth dependency. For SSE/streaming endpoints this is once per stream open, not per chunk, so impact is small.
- Throttling risk: Provisioned mode tables would need higher RCU; on-demand mode auto-scales.
Pros: Profile changes are reflected immediately. Simpler code path. No cross-instance coordination needed.
Cons: Slight per-request latency bump; small RCU cost increase. Need to confirm UserRepository.get_user_by_user_id() is the only hot-path read the cache is currently saving.
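The cost figures in the performance trade-offs can be reproduced with a few lines of arithmetic. This is a rough sketch under stated assumptions: the $0.25 per million rate is the one quoted in this issue, each GetItem is billed as one read request unit (a strongly consistent read of an item up to 4 KB; eventually consistent reads are billed at half that), and the cached case uses the 288 reads/day upper bound.

```python
PRICE_PER_MILLION_RRU = 0.25   # USD, the rate quoted in this issue (assumption)
TTL_SECONDS = 300
SECONDS_PER_DAY = 86_400


def read_cost(reads: float) -> float:
    """USD cost of `reads` GetItem calls at one read request unit each."""
    return reads * PRICE_PER_MILLION_RRU / 1_000_000


max_cached_reads_per_day = SECONDS_PER_DAY // TTL_SECONDS   # 288

uncached_user_day = read_cost(500)                          # ~0.000125 USD
cached_user_day_max = read_cost(max_cached_reads_per_day)   # ~0.000072 USD
fleet_month_uncached = read_cost(1_000 * 200 * 30)          # ~1.50 USD
```

Even if the cache avoided every one of those reads, the saving would be bounded by the roughly $1.50/month figure.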
Recommendation to discuss
Path B (remove the cache) looks attractive given the small cost delta, the simplification, and the fact that we keep finding edge-case staleness bugs. Path A keeps growing in surface area as we add write paths and would eventually need pub/sub to be correct in a multi-task deployment.
But Path A is lower risk if there's a high-traffic codepath I'm missing where the per-request DynamoDB hit would actually matter.
Decision needed
- Confirm Path A vs Path B (or hybrid: short TTL like 30s + invalidation hooks).
- If Path B, run a load test against a representative user session to confirm DynamoDB latency is acceptable.
- If Path A, audit all write paths and decide whether cross-instance invalidation is needed now or deferred.
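To help weigh the hybrid option, the following sketch compares an approximate upper bound on DynamoDB reads per user per day for different TTLs. It assumes a single process and roughly one read per TTL window (which matches the ~288 figure for 300 s); `max_reads_per_day` is a hypothetical helper, and real traffic is bursty, so actual reads under a cache will usually be lower than this bound.

```python
SECONDS_PER_DAY = 86_400


def max_reads_per_day(requests_per_day: int, ttl_seconds: int) -> int:
    """Approximate upper bound on reads per user per day for a given cache TTL.

    A TTL of 0 models Path B (no cache): every request reads.
    """
    if ttl_seconds <= 0:
        return requests_per_day
    refreshes = SECONDS_PER_DAY // ttl_seconds  # about one read per TTL window
    return min(requests_per_day, refreshes)


# Worst case for a user issuing 500 requests/day: current TTL, 30 s hybrid, no cache.
bounds = {ttl: max_reads_per_day(500, ttl) for ttl in (300, 30, 0)}
# bounds == {300: 288, 30: 500, 0: 500}
```

In the worst case (requests spaced more than 30 s apart) a 30 s TTL saves no reads over removing the cache, while still allowing up to 30 s of staleness on tasks that miss an invalidation.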
References
- 9ea91e9 (cache invalidation on profile sync)