-
Notifications
You must be signed in to change notification settings - Fork 0
Description
Summary
The /api/commits/heatmap endpoint experiences a 120-second lock timeout and deadlock due to nested acquisition of the same lock in repositoryCache.ts. This causes the endpoint to hang and eventually fail with a timeout error.
Severity
High - Causes complete endpoint failure and 2-minute hang time affecting user experience.
Root Cause Analysis
The Deadlock Sequence
-
Outer Lock Acquisition (Line 1514 in
repositoryCache.ts):return withKeyLock(`cache-aggregated:${repoUrl}`, async () => {
Acquires lock:
cache-aggregated:https://github.com/octocat/Hello-World.git -
Inner Nested Lock Attempt (Lines 1591-1593):
const commits = await withOrderedLocks( [`cache-aggregated:${repoUrl}`, `cache-filtered:${repoUrl}`], () => this.getOrParseFilteredCommitsUnlocked(repoUrl, commitOptions) );
Tries to acquire the same lock again:
cache-aggregated:${repoUrl} -
Deadlock:
- Lock is already held by outer
withKeyLock - Inner
withOrderedLockswaits indefinitely for the same lock - After 120 seconds (default timeout), throws:
Lock timeout for cache-aggregated:https://github.com/octocat/Hello-World.git after 120017ms
- Lock is already held by outer
Evidence from Logs
15:44:44 warn: Lock acquisition timeout
15:44:44 error: Lock operation failed
15:44:44 error [GET /heatmap]: Error generating heatmap via unified cache
Heatmap route error: Error: Lock timeout for cache-aggregated:https://github.com/octocat/Hello-World.git after 120017ms
at LockManager.acquire (/home/jonas/Documents/Code/gitray/apps/backend/src/utils/lockManager.ts:232:17)
at async <anonymous> (/home/jonas/Documents/Code/gitray/apps/backend/src/utils/lockManager.ts:308:18)
at async LockManager.withKeyLock (/home/jonas/Documents/Code/gitray/apps/backend/src/utils/lockManager.ts:329:14)
at async <anonymous> (/home/jonas/Documents/Code/gitray/apps/backend/src/routes/commitRoutes.ts:360:27)
Affected Code
File: apps/backend/src/services/repositoryCache.ts
Method: RepositoryCacheManager.getOrGenerateAggregatedData()
Lines: 1514, 1591-1593
// Line 1514 - OUTER LOCK
async getOrGenerateAggregatedData(
repoUrl: string,
filterOptions?: CommitFilterOptions
): Promise<CommitHeatmapData> {
return withKeyLock(`cache-aggregated:${repoUrl}`, async () => { // ← LOCK #1
// ... code ...
// Lines 1591-1593 - NESTED LOCK (DEADLOCK!)
const commits = await withOrderedLocks(
[`cache-aggregated:${repoUrl}`, `cache-filtered:${repoUrl}`], // ← TRIES TO ACQUIRE LOCK #1 AGAIN!
() => this.getOrParseFilteredCommitsUnlocked(repoUrl, commitOptions)
);Fix Required
Option 1: Remove Redundant Lock (Recommended)
Since cache-aggregated:${repoUrl} is already held by the outer withKeyLock, remove it from the inner withOrderedLocks call:
// BEFORE (Line 1591-1593):
const commits = await withOrderedLocks(
[`cache-aggregated:${repoUrl}`, `cache-filtered:${repoUrl}`],
() => this.getOrParseFilteredCommitsUnlocked(repoUrl, commitOptions)
);
// AFTER:
const commits = await withKeyLock(
`cache-filtered:${repoUrl}`,
() => this.getOrParseFilteredCommitsUnlocked(repoUrl, commitOptions)
);Rationale: We only need to acquire the cache-filtered lock since cache-aggregated is already held.
Option 2: Use Reentrant Locks
Implement reentrant (recursive) locking in LockManager to allow the same lock to be acquired multiple times by the same execution context. However, this is more complex and not necessary for this case.
Testing
Reproduction Steps
- Start the backend server
- Execute:
curl -X GET "http://localhost:3001/api/commits/heatmap?repoUrl=https://github.com/octocat/Hello-World.git&timePeriod=day" - Wait 2 minutes - request will timeout with lock error
Expected Behavior After Fix
- Heatmap endpoint should respond within seconds (not 120+ seconds)
- No lock timeout errors in logs
- Proper aggregated data returned
Test Cases to Add
describe('getOrGenerateAggregatedData', () => {
test('should not deadlock when acquiring nested locks', async () => {
const startTime = Date.now();
await repositoryCache.getOrGenerateAggregatedData(
'https://github.com/test/repo.git',
{ author: 'test' }
);
const duration = Date.now() - startTime;
// Should complete in under 5 seconds, not 120 seconds
expect(duration).toBeLessThan(5000);
});
});Impact
User Experience
- ❌ Heatmap visualizations completely broken
- ❌ 2-minute hang time before error
- ❌ Poor user experience with loading states
System Health
⚠️ Lock manager metrics showing timeouts⚠️ Unnecessary lock contention⚠️ Potential for cascading failures if multiple requests queue up
Additional Context
Related Files
apps/backend/src/services/repositoryCache.ts- Contains the bugapps/backend/src/utils/lockManager.ts- Lock implementationapps/backend/src/routes/commitRoutes.ts- Heatmap endpoint
Similar Issues to Check
Search for other instances of nested withKeyLock or withOrderedLocks that might have the same pattern:
grep -n "withKeyLock.*withOrderedLocks" apps/backend/src/services/repositoryCache.ts
grep -n "withOrderedLocks.*cache-" apps/backend/src/services/repositoryCache.tsFix Implementation Checklist
- Remove redundant
cache-aggregated:${repoUrl}fromwithOrderedLockscall (line 1592) - Update to use
withKeyLockwith onlycache-filtered:${repoUrl} - Add unit test to prevent regression
- Manual testing with actual heatmap requests
- Verify no other similar nested lock patterns exist
- Update lock-related comments if needed
Priority
High - This completely breaks the heatmap endpoint which is a core feature of the application.
Discovered during: Manual CSRF protection validation (Issue #98)
Environment: Development (likely affects production too)
Lock Manager Version: Current implementation in lockManager.ts