Skip to content

Conversation

@jonasyr
Copy link
Owner

@jonasyr jonasyr commented Nov 17, 2025

🐛 Fix: Critical Deadlock in Repository Cache System

Fixes #110

Problem

The /api/commits/heatmap endpoint was experiencing a 120-second lock timeout causing complete endpoint failure. The root cause was nested acquisition of the same locks in three separate locations within repositoryCache.ts, leading to deadlocks.

Root Cause Analysis

Three deadlock scenarios were identified:

  1. getOrGenerateAggregatedData (line 1591)

    • Outer withKeyLock held: cache-aggregated:${repoUrl}
    • Inner withOrderedLocks tried to re-acquire: cache-aggregated:${repoUrl}
    • Result: 120-second timeout
  2. getOrParseCommits (line 975)

    • Outer withOrderedLocks held: cache-operation, repo-access
    • Inner withOrderedLocks tried to re-acquire both ❌
    • Result: Potential deadlock in filtered query path
  3. getOrParseFilteredCommits (line 1238)

    • Outer withKeyLock held: cache-filtered:${repoUrl}
    • Inner withOrderedLocks tried to re-acquire: cache-filtered:${repoUrl}
    • Result: Deadlock when accessing raw commits

Solution

Changed nested lock acquisition to only acquire locks that aren't already held by the outer context:

// BEFORE (DEADLOCK):
const commits = await withOrderedLocks(
  [`cache-aggregated:${repoUrl}`, `cache-filtered:${repoUrl}`],
  () => this.getOrParseFilteredCommitsUnlocked(repoUrl, commitOptions)
);

// AFTER (FIXED):
const commits = await withKeyLock(
  `cache-filtered:${repoUrl}`,
  () => this.getOrParseFilteredCommitsUnlocked(repoUrl, commitOptions)
);

Changes Made

Files Modified

  • apps/backend/src/services/repositoryCache.ts - Fixed 3 deadlock instances with clear comments
  • apps/backend/__tests__/unit/services/repositoryCache.unit.test.ts - Added 3 deadlock prevention tests

Test Coverage

Added comprehensive test suite to prevent regression:

  • getOrGenerateAggregatedData should not deadlock with nested locks
  • getOrParseCommits should not deadlock with filtered options
  • getOrParseFilteredCommits should not deadlock when acquiring cache-operation lock

All tests verify operations complete in < 5 seconds instead of timing out after 120 seconds.

Verification Results

✅ Build & Tests

pnpm build          # SUCCESS (14.5s)
pnpm test:backend   # SUCCESS (828 tests passed)
pnpm lint           # SUCCESS (only pre-existing warnings)
pnpm lint:md        # SUCCESS (0 errors)

✅ Manual Endpoint Testing

# Test 1: Heatmap without filters
$ time curl "http://localhost:3001/api/commits/heatmap?repoUrl=https://github.com/octocat/Hello-World.git&timePeriod=day"
real    0m0.691s  ✅ (was 120+ seconds before fix)

# Test 2: Heatmap with filters (triggers all code paths)
$ time curl "http://localhost:3001/api/commits/heatmap?repoUrl=https://github.com/octocat/Hello-World.git&timePeriod=week&author=octocat"
real    0m0.085s  ✅ (cache hit, even faster)

No lock timeout errors in server logs

Impact

  • Performance: Heatmap endpoint responds in < 1 second (was 120+ seconds timeout)
  • Reliability: No more lock contention warnings
  • User Experience: No more 2-minute hangs on heatmap visualization
  • System Health: Fixed potential cascading failures from lock queue buildup

Testing Checklist

  • Unit tests pass (828/828)
  • New deadlock prevention tests added
  • Manual endpoint testing successful
  • Build completes without errors
  • Linting passes
  • No lock timeout errors in logs
  • Response time < 1 second for all test cases

Breaking Changes

None - this is a pure bug fix with no API changes.

Additional Notes

This fix eliminates all nested lock acquisition deadlocks in the repository cache system. The pattern of using "unlocked" internal methods is maintained, and the fix adds clear documentation to prevent future regressions.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@jonasyr
Copy link
Owner Author

jonasyr commented Nov 17, 2025

✅ Fixed: cache-operation/cache-filtered Circular Locking Issue

The P1 deadlock issue identified by @chatgpt-codex-connector has been resolved! 🎉

Problem Summary

The previous fix using single withKeyLock calls created a circular lock dependency:

  • getOrParseCommits: Acquires cache-operationrepo-access → waits for cache-filtered
  • getOrParseFilteredCommits: Acquires cache-filtered → waits for cache-operation
  • Result: Deadlock when concurrent requests try to acquire locks in opposite orders

Solution

Implemented consistent alphabetical lock ordering across all cache methods using withOrderedLocks:

Lock Order: cache-aggregated < cache-contributors < cache-filtered < cache-operation < repo-access

All methods now acquire locks in the same order, preventing circular dependencies:

  1. getOrParseCommits - Acquires: [cache-filtered, cache-operation, repo-access]
  2. getOrParseFilteredCommits - Acquires: [cache-filtered, cache-operation, repo-access]
  3. getOrGenerateContributors - Acquires: [cache-contributors, cache-filtered, cache-operation, repo-access]
  4. getOrGenerateAggregatedData - Acquires: [cache-aggregated, cache-filtered, cache-operation, repo-access]

Key Benefits

  • No circular dependencies: Alphabetical lock ordering prevents deadlocks
  • No nested lock calls: Eliminated all nested withKeyLock calls
  • Automatic deduplication: withOrderedLocks handles duplicate locks
  • Concurrent safety: Multiple requests can safely execute in parallel

Verification Results

Build & Tests: All passed! ✅

pnpm lint          # SUCCESS (0 errors)
pnpm build         # SUCCESS
pnpm test:backend  # SUCCESS (828/828 tests passed)

Test Improvements: Fixed 1 additional flaky test in cache.unit.test.ts that was causing intermittent failures.

The deadlock is now completely resolved with proper lock ordering! 🚀

@jonasyr jonasyr self-assigned this Nov 17, 2025
@chatgpt-codex-connector
Copy link

Summary

Testing

  • ⚠️ pnpm test:backend --filter repositoryCache (fails: unknown option)
  • pnpm test:backend

View task →

@jonasyr jonasyr requested a review from NiklasSkulll November 17, 2025 17:01
NiklasSkulll
NiklasSkulll previously approved these changes Nov 18, 2025
@jonasyr jonasyr dismissed NiklasSkulll’s stale review November 18, 2025 08:35

The merge-base changed after approval.

@jonasyr jonasyr requested a review from NiklasSkulll November 18, 2025 14:16
NiklasSkulll
NiklasSkulll previously approved these changes Nov 18, 2025
@jonasyr jonasyr dismissed NiklasSkulll’s stale review November 18, 2025 14:26

The merge-base changed after approval.

@jonasyr jonasyr requested a review from jonasiu November 18, 2025 19:33
jonasiu
jonasiu previously approved these changes Nov 18, 2025
@jonasyr jonasyr dismissed jonasiu’s stale review November 18, 2025 19:34

The merge-base changed after approval.

@jonasiu jonasiu requested a review from Copilot November 18, 2025 19:35
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a critical deadlock in the repository cache system by eliminating nested lock acquisition patterns that caused 120-second timeouts in the /api/commits/heatmap endpoint. The fix introduces helper methods to ensure consistent lock ordering and replaces nested lock calls with direct calls to unlocked internal methods.

Key Changes

  • Deadlock prevention: Added getCommitLocks(), getContributorLocks(), and getAggregatedLocks() helper methods to ensure consistent lock ordering across all cache operations
  • Lock acquisition refactoring: Changed three methods (getOrParseCommits, getOrParseFilteredCommits, getOrGenerateAggregatedData) to use withOrderedLocks with all required locks upfront, then call unlocked internal methods to prevent nested lock re-acquisition
  • Test coverage: Added deadlock prevention tests and helper method tests to verify correct lock array generation

Reviewed Changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
apps/backend/src/services/repositoryCache.ts Core deadlock fix: added lock helper methods and refactored lock acquisition to prevent nested locks
apps/backend/tests/unit/services/repositoryCache.unit.test.ts Added deadlock prevention tests and lock helper validation tests
apps/backend/src/middlewares/strictContentType.ts New CSRF-like protection middleware (unrelated to deadlock fix)
apps/backend/tests/unit/middlewares/strictContentType.unit.test.ts Tests for new strictContentType middleware
apps/backend/src/index.ts Applied strictContentType middleware to API routes
eslint.config.mjs Complete ESLint configuration restructure (unrelated to deadlock fix)
packages/shared-types/src/index.ts Added readonly modifiers to error class properties
apps/frontend/src/services/api.ts Added X-Requested-With header to API client
Multiple test and utility files Parameter naming changes (underscore prefix for unused params)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

jonasyr and others added 2 commits November 18, 2025 22:48
Wasnt even an Agents.md updated it to actually help with Codex etc.
@jonasyr
Copy link
Owner Author

jonasyr commented Nov 18, 2025

Copilot Review Comments Addressed

I've addressed the Copilot review comments with the following changes:

1. ✅ Fixed resolveClone initialization in repositoryCoordinator.unit.test.ts (line 141)

  • Change: Removed the unnecessary = () => {} initialization as suggested
  • Result: TypeScript correctly understands the variable will be assigned before use in the Promise constructor
  • Note: There's a TypeScript linting warning about the unused parameter name in the type annotation ('path' is defined but never used), but this is a known limitation when using type annotations with parameter names. The test passes successfully.

2. ✅ Restored async beforeEach in cache.unit.test.ts (line 308)

  • Change: Changed beforeEach(() => { ... }) back to beforeEach(async () => { ... await ctx.importCache(); })
  • Rationale: Ensures consistent cache initialization for each test, maintaining test isolation
  • Exception: The first test (should handle redis connection events and update health status) needs to set up Redis mocks BEFORE cache initialization, so it manually calls await ctx.importCache() after mock setup
  • Result: All 138 cache-related tests pass successfully

3. ⚠️ strictContentType middleware remains in PR

  • Status: The strictContentType.ts middleware is indeed unrelated to the deadlock fix
  • Context: It was added in commit 06913bd as a security enhancement to enforce JSON content-type and require X-Requested-With header for state-changing operations (POST, PUT, DELETE)
  • Recommendation: Consider either:
    • Moving this to a separate PR focused on security enhancements, or
    • Updating the PR description to mention this additional security improvement

Test Results

All backend tests pass: 829 tests passed across 28 test files ✅

The core deadlock prevention fixes remain intact and functional.

jonasiu
jonasiu previously approved these changes Nov 18, 2025
@jonasyr jonasyr dismissed jonasiu’s stale review November 18, 2025 22:22

The merge-base changed after approval.

@jonasyr jonasyr closed this Nov 18, 2025
@jonasyr jonasyr reopened this Nov 18, 2025
@jonasyr jonasyr requested a review from jonasiu November 18, 2025 22:28
@sonarqubecloud
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
77.1% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@NiklasSkulll NiklasSkulll merged commit 6bad3fa into dev Nov 19, 2025
11 of 12 checks passed
@jonasyr jonasyr deleted the jonasyr/issue110 branch November 19, 2025 15:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug(cache): Deadlock in heatmap endpoint - Nested lock acquisition on same key

4 participants