Skip to content

feat: Redis-backed refresh tokens, JTI blacklist, and httpOnly cookies#149

Open
GitAddRemote wants to merge 9 commits intomainfrom
feature/ISSUE-109
Open

feat: Redis-backed refresh tokens, JTI blacklist, and httpOnly cookies#149
GitAddRemote wants to merge 9 commits intomainfrom
feature/ISSUE-109

Conversation

@GitAddRemote
Copy link
Copy Markdown
Owner

Summary

  • Redis refresh token storage: On login, a refresh:{jti} key is written to Redis (7-day TTL) instead of the refresh_tokens DB table. Rotation deletes the old key and creates a new one. No DB reads on every token refresh.
  • JTI blacklist: Every access token now carries a jti UUID claim. On logout, blacklist:{jti} is written to Redis with the remaining TTL. JwtStrategy.validate() checks this on every authenticated request — revoked tokens get an immediate 401.
  • httpOnly cookie delivery: Unchanged from the previous implementation — access and refresh tokens are delivered via httpOnly; Secure; SameSite=Strict cookies. Refresh token is never in a JSON response body.
  • Migration: Drops the refresh_tokens table (1777409770542-DropRefreshTokensTable). down() recreates it for rollback safety.
  • TokenCleanupService: Stripped of the RefreshToken repository dependency — Redis TTLs handle expiry automatically.

Architecture notes

  • Uses the existing global CacheModule (CACHE_MANAGER) — no new Redis client needed. cache-manager v7 supports per-key TTL via set(key, value, ttlMs).
  • RefreshTokenAuthGuard decodes the access token cookie (without verifying expiry) to extract the JTI. This ties the refresh token cookie to its Redis entry — replay attacks with a rotated refresh token fail immediately.
  • JwtStrategy now depends on AuthService for the blacklist check. Both are in the same AuthModule — no circular module dependency.

Test plan

  • pnpm typecheck passes
  • pnpm test — 266/266 unit tests pass
  • pnpm test:e2e — run auth-jti-blacklist.e2e-spec.ts with DB + Redis available:
    • login → logout → old access token → expect 401
    • login → refresh → old refresh token → expect 401
    • login response body contains no access_token or refresh_token fields
  • Run migration against staging: pnpm migration:run
  • Verify refresh_tokens table is dropped: \dt refresh_tokens → "no relations found"

Closes #109

Replaces the PostgreSQL refresh_tokens table with Redis-based token

storage. Adds JTI-based access token blacklisting on logout.

Changes:

- JwtPayload now includes a jti (UUID v4) claim on every access token

- Login stores refresh:{jti} → userId:rawToken in Redis (7-day TTL)

- Refresh reads and rotates the Redis entry (old key deleted)

- Logout deletes the refresh entry and writes blacklist:{jti} to Redis

  with the remaining access token TTL

- JwtStrategy checks Redis blacklist on every authenticated request;

  blacklisted tokens receive an immediate 401

- RefreshTokenAuthGuard decodes the access token cookie to extract the

  JTI that ties the token pair together in Redis

- Tokens delivered and consumed exclusively via httpOnly cookies;

  no token values appear in JSON response bodies

- TokenCleanupService stripped of RefreshToken repository dependency

- Migration 1777409770542 drops the refresh_tokens table

Tests:

- Unit tests rewritten for Redis storage, blacklist, and rotation

- E2E test: login → logout → old access token → 401

- E2E test: login → refresh → old refresh token → 401

- E2E test: refresh_token absent from login response body

Closes #109
Fixes four issues identified in code review of #149:

1. (High) Refresh/logout no longer requires the access token cookie.

   The JTI is now embedded in the refresh token value itself

   ({jti}.{randomhex}), so RefreshTokenAuthGuard can extract it

   from the refresh cookie alone. Users can refresh after the

   15-minute access token cookie expires.

2. (High) Atomic token rotation via GETDEL.

   consumeRefreshEntry() uses the underlying Redis client's GETDEL

   command so concurrent refresh requests can only redeem a token

   once. Falls back to get+del for the in-memory cache (test env,

   single-threaded — no real concurrency risk).

3. (Medium-High) Refresh tokens stored as SHA-256 hashes.

   Redis stores {userId}:{sha256(raw)} instead of the raw value.

   Validation hashes the presented token before comparing.

   A Redis dump no longer yields reusable tokens.

   Hash-mismatch on refresh restores the entry so the legitimate

   holder is not locked out.

4. (Medium) New migration registered in data-source.ts.

   DropRefreshTokensTable1777409770542 added to the explicit

   migration list; RefreshToken entity removed from entities array.
Finding 1 (High) — GETDEL path was unreachable in production:

- Remove fragile internal cache-manager path traversal (.store.client)

- Inject a standalone node-redis client via REDIS_CLIENT token in AuthModule

- AuthService uses injected client for atomic getDel when available

- @optional() ensures test modules (no REDIS_CLIENT provider) use get+del fallback

Finding 2 (Medium-High) — restore reset TTL to full 7 days on mismatch:

- Read pTTL before consuming the key so remaining TTL is known

- consumeRefreshEntry returns [value, remainingTtlMs] tuple

- refreshAccessToken restores entry with original remaining TTL on hash mismatch
… tokens

Finding 1 (High) — auth state must not silently degrade to per-process memory:

- REDIS_CLIENT factory now throws on connection failure instead of returning null;

  a missing Redis kills the process at startup rather than silently splitting

  auth state across instances

- All auth state writes (refresh, blacklist) route through authSet(), which uses

  the raw Redis client when available, falling back to cacheManager only in tests

  (USE_REDIS_CACHE=false); this eliminates the cacheManager in-memory fallback

  path for auth in production

- authSet() enforces a positive TTL at the call site — auth state must always

  expire

Finding 2 (Medium) — pTTL=0 must not produce a persistent refresh token:

- consumeRefreshEntry() now returns remainingTtlMs=0 for any non-positive pTTL

  result (key missing, no expiry, or race with natural expiry)

- refreshAccessToken() only restores the entry on hash mismatch when

  remainingTtlMs > 0; a zero TTL means the token was expiring and is not

  written back, preventing a nearly-expired token from becoming immortal
…d behavior

Finding 1 (Medium-High) — logout race: stolen token rotated just before logout

left the newly minted refresh token alive because only the old JTI was revoked.

Introduce a session ID (SID) threaded through all token rotations:

- login() creates session:{sid} in Redis (7-day TTL, same family lifetime)

- generateRefreshToken() stores {userId}:{hash}:{sid} instead of {userId}:{hash}

- refreshAccessToken() checks session:{sid} is still alive after consuming

  the refresh entry; if already revoked, the rotated token is also rejected

- revokeRefreshToken() deletes session:{sid} before refresh:{jti} so any

  concurrently rotated token in the same family is immediately invalidated

- logout() inherits the family kill via revokeRefreshToken, then blacklists

  the current access token JTI as before

Finding 2 (Medium) — fail-closed behavior was implicit and conflicted with

existing app-level Redis fallback documentation:

- REDIS_CLIENT factory restores try/catch and returns null on failure

- AuthService constructor detects redisClient===null with USE_REDIS_CACHE=true

  and throws with a clear message, making fail-closed intent explicit in code

- Test module now provides REDIS_CLIENT: null and ConfigService returns

  USE_REDIS_CACHE=false so the constructor guard does not fire in tests

- .env.example documents why USE_REDIS_CACHE=true means Redis is required

  for auth and is not safe to run without
Finding 1 (High) — logout race with concurrent token rotation:

- generateRefreshToken() now writes a non-consumed reverse-index key

  jti:{jti} → sid alongside every refresh entry. Unlike refresh:{jti}

  this key is never GETDEL'd, so it survives a concurrent rotation.

- logout() now reads jti:{jti} after revokeRefreshToken() and deletes

  session:{sid} directly. If revokeRefreshToken() already deleted the

  session (normal path), the second del is a no-op. If refresh:{jti}

  was already consumed by a concurrent refresh, the reverse-index still

  provides the SID and the session family is reliably terminated.

Finding 2 (Medium-High) — session TTL not extended on token rotation:

- refreshAccessToken() renews session:{sid} with a fresh 7-day TTL

  after verifying the session is alive. The session window now slides

  with the refresh token so actively-used sessions do not expire while

  the client holds a valid refresh token.
Finding 1 (High) — access tokens issued after a concurrent refresh-before-

logout were not invalidated by session revocation because JwtStrategy only

checked the per-token jti blacklist, not session:{sid}.

- Add sid to JwtPayload interface

- login() and refreshAccessToken() embed sid in the signed JWT payload

- Add AuthService.isSessionAlive(sid) — reads session:{sid} from Redis

- JwtStrategy.validate() calls isSessionAlive(payload.sid) after the

  jti blacklist check; if session:{sid} is absent the request is rejected

  with 401 regardless of whether the specific jti was blacklisted

This closes the race: logout deletes session:{sid}, so any access token

from that family — including ones issued by a refresh that beat logout —

fails validation on the very next request. The 15-minute expiry window

no longer represents a revocation gap for concurrent rotations.
Covers the four race-shaped scenarios that unit tests with in-memory

mocks cannot prove, per the reviewer's residual-risk note:

1. Parallel refresh race — two concurrent requests against the same

   refresh token; GETDEL atomicity ensures exactly one wins (200) and

   one loses (401), never two winners.

2. Logout-vs-refresh race — logout and refresh fired simultaneously;

   regardless of which arrives first, the session must be dead afterward

   and any newly issued access or refresh token must be rejected.

3. Session TTL sliding — two successive rotations both produce access

   tokens that pass /auth/me, proving session:{sid} is renewed on each

   refresh and does not expire at the original login window.

4. Cross-rotation logout — logout via post-rotation tokens must also

   invalidate access tokens issued before the rotation (same session family).

The suite self-skips when Redis is unavailable (USE_REDIS_CACHE != true

or Redis not reachable) so the standard CI pipeline is unaffected.

Run with: pnpm test:e2e:redis (requires docker-compose up -d)
…hable

When USE_REDIS_CACHE=true the suite now throws on connection failure
(probe.connect() with no try/catch) so a green run proves real Redis was
exercised. When USE_REDIS_CACHE=false the suite calls pending() and is
visible as skipped rather than silently passing. Per-test redisAvailable
guards removed — gate is now enforced once in beforeAll.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tech Story: Redis-backed refresh tokens, JTI blacklist, and httpOnly cookies

1 participant