Skip to content

bug(cache): Deadlock in heatmap endpoint - Nested lock acquisition on same key #110

@jonasyr

Description

@jonasyr

Summary

The /api/commits/heatmap endpoint experiences a 120-second lock timeout and deadlock due to nested acquisition of the same lock in repositoryCache.ts. This causes the endpoint to hang and eventually fail with a timeout error.

Severity

High - Causes complete endpoint failure and 2-minute hang time affecting user experience.

Root Cause Analysis

The Deadlock Sequence

  1. Outer Lock Acquisition (Line 1514 in repositoryCache.ts):

    return withKeyLock(`cache-aggregated:${repoUrl}`, async () => {

    Acquires lock: cache-aggregated:https://github.com/octocat/Hello-World.git

  2. Inner Nested Lock Attempt (Lines 1591-1593):

    const commits = await withOrderedLocks(
      [`cache-aggregated:${repoUrl}`, `cache-filtered:${repoUrl}`],
      () => this.getOrParseFilteredCommitsUnlocked(repoUrl, commitOptions)
    );

    Tries to acquire the same lock again: cache-aggregated:${repoUrl}

  3. Deadlock:

    • Lock is already held by outer withKeyLock
    • Inner withOrderedLocks waits indefinitely for the same lock
    • After 120 seconds (default timeout), throws: Lock timeout for cache-aggregated:https://github.com/octocat/Hello-World.git after 120017ms

Evidence from Logs

15:44:44 warn: Lock acquisition timeout
15:44:44 error: Lock operation failed
15:44:44 error [GET /heatmap]: Error generating heatmap via unified cache
Heatmap route error: Error: Lock timeout for cache-aggregated:https://github.com/octocat/Hello-World.git after 120017ms
    at LockManager.acquire (/home/jonas/Documents/Code/gitray/apps/backend/src/utils/lockManager.ts:232:17)
    at async <anonymous> (/home/jonas/Documents/Code/gitray/apps/backend/src/utils/lockManager.ts:308:18)
    at async LockManager.withKeyLock (/home/jonas/Documents/Code/gitray/apps/backend/src/utils/lockManager.ts:329:14)
    at async <anonymous> (/home/jonas/Documents/Code/gitray/apps/backend/src/routes/commitRoutes.ts:360:27)

Affected Code

File: apps/backend/src/services/repositoryCache.ts
Method: RepositoryCacheManager.getOrGenerateAggregatedData()
Lines: 1514, 1591-1593

// Line 1514 - OUTER LOCK
async getOrGenerateAggregatedData(
  repoUrl: string,
  filterOptions?: CommitFilterOptions
): Promise<CommitHeatmapData> {
  return withKeyLock(`cache-aggregated:${repoUrl}`, async () => {  // ← LOCK #1
    // ... code ...
    
    // Lines 1591-1593 - NESTED LOCK (DEADLOCK!)
    const commits = await withOrderedLocks(
      [`cache-aggregated:${repoUrl}`, `cache-filtered:${repoUrl}`],  // ← TRIES TO ACQUIRE LOCK #1 AGAIN!
      () => this.getOrParseFilteredCommitsUnlocked(repoUrl, commitOptions)
    );

Fix Required

Option 1: Remove Redundant Lock (Recommended)

Since cache-aggregated:${repoUrl} is already held by the outer withKeyLock, remove it from the inner withOrderedLocks call:

// BEFORE (Line 1591-1593):
const commits = await withOrderedLocks(
  [`cache-aggregated:${repoUrl}`, `cache-filtered:${repoUrl}`],
  () => this.getOrParseFilteredCommitsUnlocked(repoUrl, commitOptions)
);

// AFTER:
const commits = await withKeyLock(
  `cache-filtered:${repoUrl}`,
  () => this.getOrParseFilteredCommitsUnlocked(repoUrl, commitOptions)
);

Rationale: We only need to acquire the cache-filtered lock since cache-aggregated is already held.

Option 2: Use Reentrant Locks

Implement reentrant (recursive) locking in LockManager to allow the same lock to be acquired multiple times by the same execution context. However, this is more complex and not necessary for this case.

Testing

Reproduction Steps

  1. Start the backend server
  2. Execute:
    curl -X GET "http://localhost:3001/api/commits/heatmap?repoUrl=https://github.com/octocat/Hello-World.git&timePeriod=day"
  3. Wait 2 minutes - request will timeout with lock error

Expected Behavior After Fix

  • Heatmap endpoint should respond within seconds (not 120+ seconds)
  • No lock timeout errors in logs
  • Proper aggregated data returned

Test Cases to Add

describe('getOrGenerateAggregatedData', () => {
  test('should not deadlock when acquiring nested locks', async () => {
    const startTime = Date.now();
    
    await repositoryCache.getOrGenerateAggregatedData(
      'https://github.com/test/repo.git',
      { author: 'test' }
    );
    
    const duration = Date.now() - startTime;
    
    // Should complete in under 5 seconds, not 120 seconds
    expect(duration).toBeLessThan(5000);
  });
});

Impact

User Experience

  • ❌ Heatmap visualizations completely broken
  • ❌ 2-minute hang time before error
  • ❌ Poor user experience with loading states

System Health

  • ⚠️ Lock manager metrics showing timeouts
  • ⚠️ Unnecessary lock contention
  • ⚠️ Potential for cascading failures if multiple requests queue up

Additional Context

Related Files

  • apps/backend/src/services/repositoryCache.ts - Contains the bug
  • apps/backend/src/utils/lockManager.ts - Lock implementation
  • apps/backend/src/routes/commitRoutes.ts - Heatmap endpoint

Similar Issues to Check

Search for other instances of nested withKeyLock or withOrderedLocks that might have the same pattern:

grep -n "withKeyLock.*withOrderedLocks" apps/backend/src/services/repositoryCache.ts
grep -n "withOrderedLocks.*cache-" apps/backend/src/services/repositoryCache.ts

Fix Implementation Checklist

  • Remove redundant cache-aggregated:${repoUrl} from withOrderedLocks call (line 1592)
  • Update to use withKeyLock with only cache-filtered:${repoUrl}
  • Add unit test to prevent regression
  • Manual testing with actual heatmap requests
  • Verify no other similar nested lock patterns exist
  • Update lock-related comments if needed

Priority

High - This completely breaks the heatmap endpoint which is a core feature of the application.


Discovered during: Manual CSRF protection validation (Issue #98)
Environment: Development (likely affects production too)
Lock Manager Version: Current implementation in lockManager.ts

Metadata

Metadata

Assignees

Labels

No labels
No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions