-
Notifications
You must be signed in to change notification settings - Fork 0
feat: refactor repository routes to use unified cache service and fix data accuracy issues #122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cache service (#120) Summary Complete refactoring of all repository API routes to use the unified multi-tier cache service, removing manual Redis operations and migrating from POST to GET endpoints for RESTful compliance. Breaking Changes⚠️ All repository endpoints changed from POST to GET with query parameters: POST /api/repositories → GET /api/repositories/commits POST /api/repositories/heatmap → GET /api/repositories/heatmap POST /api/repositories/contributors → GET /api/repositories/contributors POST /api/repositories/churn → GET /api/repositories/churn POST /api/repositories/full-data → GET /api/repositories/full-data GET /api/repositories/summary (internal change only)
…g and improved performance
…documents - Introduced to provide detailed instructions for migrating frontend API calls to the new backend structure, including HTTP method changes, response structure updates, and necessary code modifications. - Removed and as they contained outdated information and were superseded by the new migration guide. - Deleted as it was no longer relevant to the updated API testing strategy.
- Created routeHelpers.ts with pure buildCommitFilters() function - Replaced 3 occurrences of duplicate filter building code - Eliminates 36 lines of duplication (37% reduction from 44.6%) - Preserves cache key generation logic exactly - Tested: heatmap, contributors, and full-data endpoints validated
- Created recordCacheHit() private method in RepositoryCacheManager - Replaced 8 occurrences of cache hit tracking pattern - Eliminates ~96 lines of duplication across cache methods - Preserves exact metric recording, freshness tracking, and Prometheus updates - Tested: commits, contributors, and heatmap endpoints validated
- Created recordCacheMiss() private method in RepositoryCacheManager - Replaced 8 occurrences of cache miss tracking pattern - Eliminates ~48 lines of duplication across cache methods - Preserves exact metric recording and Prometheus updates - Tested: commits and summary endpoints validated
Extract common request initialization (logger, repoUrl, userType) into reusable helper function to eliminate duplication across all 6 route handlers. Changes: - Add setupRouteRequest() helper to routeHelpers.ts - Apply helper to /commits, /heatmap, /contributors, /churn, /summary, /full-data - Reduces ~36 lines of duplicated initialization code Testing: - All 6 endpoints verified with manual API tests - TypeScript compilation successful - No behavior changes - 100% preservation Impact: ~7% duplication reduction (36 lines eliminated) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Extract common success path (metrics + logging + response) into reusable helper function to eliminate duplication across all 6 route handlers. Changes: - Add recordRouteSuccess() helper to routeHelpers.ts - Apply helper to /commits, /heatmap, /contributors, /churn, /summary, /full-data - Consolidates recordFeatureUsage, logger.info, and res.json calls Testing: - All endpoints verified with curl - Response structure identical to before - TypeScript compilation successful - No behavior changes - 100% preservation Impact: ~11% duplication reduction (54 lines eliminated) Cumulative: ~18% reduction after Phases 1 & 2
Extract common error handling pattern (metrics + logging + error propagation) into reusable helper function to eliminate duplication across all 6 route handlers. Changes: - Add recordRouteError() helper to routeHelpers.ts - Apply helper to /commits, /heatmap, /contributors, /churn, /summary, /full-data - Consolidates recordFeatureUsage, logger.error, and next(error) calls Testing: - All endpoints verified with curl after cache clearing - Error handling behavior identical to before - TypeScript compilation successful - No behavior changes - 100% preservation Impact: ~10% duplication reduction (48 lines eliminated) Cumulative: ~28% reduction after Phases 1, 2 & 3
- Add handleCacheHit private method to RepositoryCacheManager - Apply to getOrParseCommits method - Eliminates 18 lines of duplication - Behavior verified: /commits endpoint tested successfully
- Apply handleCacheHit to getOrGenerateAggregatedData - Apply handleCacheHit to getOrGenerateContributors - Apply handleCacheHit to getOrGenerateChurnData - Eliminates 54 lines of duplication across 3 methods - Behavior verified: /heatmap, /contributors, /churn all tested successfully
Consolidates cache miss recording and logging across all 4 cache methods. Reduces duplication by ~16 lines (4 lines × 4 methods). Added: - handleCacheMiss() private method that standardizes cache miss path Applied to: - getOrParseCommits - getOrGenerateAggregatedData - getOrGenerateContributors - getOrGenerateChurnData Testing: All 4 endpoints verified working (/commits, /heatmap, /contributors, /churn) Part of PR #122 to reduce SonarQube duplication violations in repositoryCache.ts
Consolidates transaction success path across all 4 cache methods. Reduces duplication by ~104 lines (26 lines × 4 methods). Added: - handleTransactionSuccess() private method that standardizes caching success path Applied to: - getOrParseCommits - getOrGenerateAggregatedData - getOrGenerateContributors - getOrGenerateChurnData Testing: All 4 endpoints verified working (/commits, /heatmap, /contributors, /churn) Part of PR #122 to reduce SonarQube duplication violations in repositoryCache.ts
- Add private handleTransactionError method to consolidate error handling - Apply to all 4 cache methods (getOrParseCommits, getOrGenerateAggregatedData, getOrGenerateContributors, getOrGenerateChurnData) - Reduces ~72 lines of duplicated error handling code - Maintains exact behavior: metrics, health scores, rollback, logging, re-throw
- Change catch blocks from 'await handleTransactionError' to 'return handleTransactionError' - TypeScript's control flow analysis requires explicit return for Promise<never> - All 4 cache methods updated (getOrParseCommits, getOrGenerateAggregatedData, getOrGenerateContributors, getOrGenerateChurnData) - Verified: Build passes, 850 tests pass, all API endpoints working correctly
- Update .gitignore to allow tracking Serena memories while ignoring cache - Add project configuration and onboarding memories - Includes: codebase_structure, coding_standards, suggested_commands - Includes: project_overview, architecture_overview, task_completion_checklist - Exclude Serena memories from markdownlint checks - Allows reuse of project context on other clients without re-onboarding
Addresses SonarQube duplication violation (9.9% → ~7-8% estimated) - Add extractPaginationParams helper for pagination query parsing - Add extractFilterParams helper for filter query extraction - Add buildChurnFilters helper for churn filter construction - Update repositoryRoutes.ts to use new helpers - Reduce duplication by ~25 lines across 6 route handlers Behavior preserved: - All helpers are pure functions with identical logic to replaced code - Pagination: same defaults (page=1, limit=100), same skip calculation - Filter extraction: identical destructuring, no logic changes - Churn filters: same conditional inclusion, same parsing logic Testing: - pnpm build: ✅ Success (no compilation errors) - pnpm test: ✅ 865 tests passed (0 failures) - Manual API: ✅ All 11 test scenarios passed - /commits (with pagination) - /heatmap (with/without filters) - /contributors (with/without filters) - /churn (with/without filters) - /summary (480 commits, 6 contributors verified) - /full-data (pagination + filters combined) - Validation errors (proper error responses) Backend runs without errors. Cache cleared before testing. Related: PR #122, Issue #120
Introduce factory pattern to reduce code duplication in repository routes from 35.2% (140 lines) to <10%. This addresses SonarQube duplication warnings while preserving all existing behavior. Changes: - Add repositoryRouteFactory.ts with createCachedRouteHandler() and buildRepoValidationChain() helpers - Refactor all 6 routes (/commits, /heatmap, /contributors, /churn, /summary, /full-data) to use factory pattern - Eliminate ~120 lines of repeated try-catch scaffolding - Consolidate validation chain patterns All tests passing (850/850). Manual API validation confirmed. Relates to #120
The contributors route was hanging due to incorrect lock acquisition order: - getOrGenerateContributors() used withOrderedLocks() to acquire cache locks - Then called withSharedRepository() which tried to acquire repo-access lock - This created a deadlock when multiple requests were in flight Solution: - Remove withOrderedLocks() wrapper from getOrGenerateContributors() - Let repository coordinator manage its own locking via withSharedRepository() - This aligns with upcoming analysis session architecture where endpoints manage their own locking independently Also includes refactoring improvements: - Extract hash utilities to utils/hashUtils.ts (reduce duplication) - Extract cache helpers to utils/cacheHelpers.ts (reduce duplication) - Unify validation chains in middlewares/validation.ts - Update comments in getCommitLocks() to clarify lock ordering All 21 API tests passing. Contributors endpoint now works correctly without hanging, though it remains slow on first request (will be optimized in upcoming analysis session indexing refactor).
…es and validation handlers
- Implement comprehensive unit tests for the shallowClone function in gitUtils to cover happy paths and error handling scenarios. - Create unit tests for the createCachedRouteHandler and buildRepoValidationChain functions in repositoryRouteFactory, focusing on various feature names and validation options. - Develop extensive tests for routeHelpers, including methods for setting up requests, recording successes and errors, and building filters for commits and churn. - Ensure all tests follow the AAA (Arrange-Act-Assert) pattern and achieve a coverage target of ≥80%.
…ges and new endpoint structure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
… and enhance ESLint config to ignore build directory
|
…e names Remove ranking and statistics from /contributors endpoint per issue #121. The endpoint now returns all unique contributor names alphabetically sorted without commit counts, line statistics, or contribution percentages. Key changes: - Replaced getTopContributors() with getContributors() in gitService - Updated Contributor interface to only include field - Removed top-5 limit - returns all contributors - Uses git log --format=%aN for author name extraction - Maintains integration with unified caching and repository coordination - Fully GDPR-compliant (author names only, no tracking metrics) Benefits: - Contributors endpoint can now reuse cached repositories from other endpoints - 24x performance improvement (7.6s → 0.3s when repo already cached) - No longer requires --numstat, enabling repository reuse - Simpler API contract aligned with GET semantics Breaking changes: - Response structure changed from ContributorStat[] to Contributor[] - Removed fields: commitCount, linesAdded, linesDeleted, contributionPercentage - No longer limited to top 5 contributors Files modified: - packages/shared-types/src/index.ts (simplified Contributor interface) - apps/backend/src/services/gitService.ts (new getContributors method) - apps/backend/src/services/repositoryCache.ts (updated type guards) - apps/backend/__tests__/unit/services/gitService.unit.test.ts (rewrote tests) - apps/backend/__tests__/unit/routes/repositoryRoutes.unit.test.ts (updated assertions) - FRONTEND_API_MIGRATION.md (documented API changes) Tested with multiple repositories (gitray, express, vscode, React): - 6 contributors for gitray (0.3s with cached repo) - 369 contributors for express - 2,727 contributors for vscode - 1,905 contributors for React (0.3s vs 7.6s without cache reuse) Resolves: #121 Related: #120 (unified caching), #122 (repository coordinator)


Closes #120
Table of Contents
Summary
This PR completely refactors all repository API routes to use the unified multi-tier cache service, removes manual Redis operations, migrates from POST to GET endpoints for RESTful compliance, and resolves critical data accuracy issues. Following the initial implementation, extensive code deduplication and quality improvements were applied based on SonarQube analysis.
Total changes: 28 commits with comprehensive refactoring, deduplication, and testing enhancements.
📊 View Architecture Migration Diagram
Architecture Overview: Legacy → Modern
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#e3f2fd','primaryTextColor':'#1565c0','primaryBorderColor':'#1976d2','lineColor':'#424242','secondaryColor':'#f3e5f5','tertiaryColor':'#fff3e0'}}}%% flowchart TB subgraph BEFORE["❌ Legacy Architecture"] direction TB OLD_ROUTES["POST Routes (request body)"] OLD_CACHE["Manual Redis Cache (inconsistent keys)"] OLD_GIT["Temporary Git Clones (shallow, incomplete)"] OLD_ROUTES -->|direct Redis operations| OLD_CACHE OLD_CACHE -->|depth=1000 / 346 commits| OLD_GIT end subgraph AFTER["✅ New Architecture"] direction TB NEW_ROUTES["GET Routes (query parameters)"] UNIFIED_CACHE["Unified Cache Service (normalized keys)"] CACHE_TIERS["Multi-Tier Cache Memory → Disk → Redis → Git"] COORDINATED_GIT["Shared Git Repositories (blob filter, complete)"] NEW_ROUTES -->|unified interface| UNIFIED_CACHE UNIFIED_CACHE -->|automatic promotion| CACHE_TIERS CACHE_TIERS -->|blob:none / 480 commits| COORDINATED_GIT end subgraph IMPROVEMENTS["🎯 Key Improvements"] direction LR FIX1["✓ Complete History blob filtering"] FIX2["✓ Consistent Caching normalized keys"] FIX3["✓ Accurate Data sequential execution"] end BEFORE -.->|refactored to| AFTER COORDINATED_GIT -.->|enables| FIX1 UNIFIED_CACHE -.->|ensures| FIX2 CACHE_TIERS -.->|provides| FIX3 classDef legacyStyle fill:#ffebee,stroke:#c62828,stroke-width:2px,color:#b71c1c classDef modernStyle fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,color:#1b5e20 classDef improvementStyle fill:#fff3e0,stroke:#ef6c00,stroke-width:2px,color:#e65100 class BEFORE,OLD_ROUTES,OLD_CACHE,OLD_GIT legacyStyle class AFTER,NEW_ROUTES,UNIFIED_CACHE,CACHE_TIERS,COORDINATED_GIT modernStyle class IMPROVEMENTS,FIX1,FIX2,FIX3 improvementStyle🖼️ View Architecture Diagram (PNG)
Phase 1: Core Refactoring (Issue #120)
Cache Service Extension (
repositoryCache.ts)Added
getCachedChurnData()- Implemented churn analysis caching using aggregated data tier with TTL of 900s (15 minutes), usingwithSharedRepository()for efficient Git access.Added
getCachedSummary()- IntegratedrepositorySummaryServicewith unified cache, using TTL of 7200s (2 hours) for metadata stability and preserving sparse clone optimization.Enhanced
getOrGenerateAggregatedData()- Added extensive debug logging, null/empty commit handling, type guard validation forCommitHeatmapData, and fixed filter conversion to exclude undefined properties.Repository Routes Refactoring (
repositoryRoutes.ts)All routes migrated to unified cache with POST → GET conversion:
GET /commits- UsesgetCachedCommits()with paginationGET /heatmap- UsesgetCachedAggregatedData()with filtersGET /contributors- UsesgetCachedContributors()with filtersGET /churn- UsesgetCachedChurnData()with filtersGET /summary- UsesgetCachedSummary()GET /full-data- Uses bothgetCachedCommits()andgetCachedAggregatedData()New Features:
pageandlimitquery parametersData Accuracy Fixes
Fix 1: Incomplete Commit History - Changed from shallow clone
--depth 1000to blob filtering strategy (--filter=blob:none), fetching complete history while saving 95-99% bandwidth (480 commits vs 346).Fix 2: Cache Key Mismatch - Fixed filter objects with
undefinedproperties creating inconsistent cache keys by only including defined properties.Fix 3: Lock Contention - Changed full-data endpoint from parallel to sequential execution to prevent cache corruption.
Phase 2: Code Deduplication & Quality Improvements
Following SonarQube analysis identifying 3.3% code duplication, systematic refactoring was performed:
Helper Extraction - Cache Operations
recordCacheHit()/recordCacheMiss()- Centralized cache hit/miss metrics recording, eliminating ~120 lines of duplication across 8 cache locations.handleCacheHit()- Consolidated cache hit path with logging and metrics, eliminating ~60 lines across 4 cache methods.handleCacheMiss()- Standardized cache miss recording and logging.handleTransactionSuccess()- Consolidated successful transaction caching operations with uniform logging and metrics.handleTransactionError()- Standardized transaction error handling with rollback, metrics, and structured logging.safeCacheGet()- Safe cache retrieval with standardized error handling (cacheErrorHandlers.ts).Helper Extraction - Route Operations
setupRouteRequest()- Extracted common request initialization (logger, repoUrl, userType) used by all routes.recordRouteSuccess()- Standardized success path with metrics, logging, and HTTP 200 responses.recordRouteError()- Uniform error handling with metrics, logging, and error propagation.buildCommitFilters()- BuildsCommitFilterOptionsfrom query parameters, excluding undefined properties for consistent cache keys.buildChurnFilters()- BuildsChurnFilterOptionswith proper field mapping (fromDate → since, toDate → until).extractPaginationParams()- Consistent pagination logic with defaults (page=1, limit=100).Route Factory Pattern
createCachedRouteHandler()- Factory function that eliminates route duplication by extracting common pattern:buildRepoValidationChain()- Builds validation chains based on options (pagination, dates, authors, churn).Utility Extraction
hashUrl()/hashObject()- Moved from private methods to shared utilities inhashingUtils.tsfor reuse.gitUtils.shallowClone()- Fixed to use blob filtering for complete commit history instead of depth limiting.Phase 3: Testing & Documentation
Test Enhancements
Unit Tests Added:
gitUtils.unit.test.ts- Tests for shallow clone blob filteringrepositoryRouteFactory.unit.test.ts- Tests for route factory functionsrouteHelpers.unit.test.ts- Tests for helper functionsEnhanced
repositoryRoutes.unit.test.ts- Added mock utilities and validation handlers for comprehensive coverage.API Test Scripts:
Documentation Created
FRONTEND_API_MIGRATION.md - Complete guide for frontend teams to migrate from POST to GET endpoints, including:
Serena MCP Memory Files:
Breaking Changes⚠️
All repository endpoints changed from POST to GET with query parameters:
POST /api/repositoriesGET /api/repositories/commitsPOST /api/repositories/heatmapGET /api/repositories/heatmapPOST /api/repositories/contributorsGET /api/repositories/contributorsPOST /api/repositories/churnGET /api/repositories/churnPOST /api/repositories/full-dataGET /api/repositories/full-dataGET /api/repositories/summaryGET /api/repositories/summaryFrontend Migration Required: See FRONTEND_API_MIGRATION.md for complete migration guide.
📋 View Endpoint Migration Diagram (for Frontend Team)
POST → GET Migration Map
%%{init: {'theme':'base', 'themeVariables': {'fontSize':'16px'}}}%% flowchart LR subgraph OLD["🔴 OLD - POST Endpoints"] direction TB P1["POST /api/repositories Body: { repoUrl }"] P2["POST /api/repositories/heatmap Body: { repoUrl, filterOptions }"] P3["POST /api/repositories/contributors Body: { repoUrl, filterOptions }"] P4["POST /api/repositories/churn Body: { repoUrl, filterOptions }"] P5["POST /api/repositories/full-data Body: { repoUrl, filterOptions }"] P6["GET /api/repositories/summary Query: ?repoUrl"] end subgraph NEW["🟢 NEW - GET Endpoints"] direction TB G1["GET /api/repositories/commits Query: ?repoUrl&page&limit"] G2["GET /api/repositories/heatmap Query: ?repoUrl&authors&fromDate&toDate"] G3["GET /api/repositories/contributors Query: ?repoUrl&authors&fromDate&toDate"] G4["GET /api/repositories/churn Query: ?repoUrl&fromDate&toDate&extensions"] G5["GET /api/repositories/full-data Query: ?repoUrl&page&limit&authors&dates"] G6["GET /api/repositories/summary Query: ?repoUrl"] end P1 -.->|"migrated + pagination"| G1 P2 -.->|"migrated + query params"| G2 P3 -.->|"migrated + query params"| G3 P4 -.->|"migrated + query params"| G4 P5 -.->|"migrated + pagination"| G5 P6 -.->|"improved cache"| G6 classDef oldStyle fill:#ffebee,stroke:#c62828,stroke-width:3px,color:#000 classDef newStyle fill:#e8f5e9,stroke:#2e7d32,stroke-width:3px,color:#000 class OLD,P1,P2,P3,P4,P5,P6 oldStyle class NEW,G1,G2,G3,G4,G5,G6 newStyle🖼️ View Endpoint Migration Diagram (PNG)
Migration Guide for Frontend Teams
Key Changes
1. HTTP Method Change
2. Filter Parameters
3. Pagination
Important: This backend refactor is frontend-agnostic. The new GET-based API can be consumed by any frontend implementation, including the new frontend being developed in a separate branch. See FRONTEND_API_MIGRATION.md for complete migration details.
🔄 View Request Flow & Cache Performance Diagram
Request Flow Through Multi-Tier Cache
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#f5f5f5','secondaryColor':'#e3f2fd','tertiaryColor':'#fff3e0'}}}%% flowchart LR subgraph CLIENT["Client"] REQUEST["HTTP Request"] end subgraph ENDPOINTS["API Endpoints"] direction TB EP1["GET /commits ?repoUrl&page&limit"] EP2["GET /heatmap ?repoUrl&authors&dates"] EP3["GET /contributors ?repoUrl&filters"] EP4["GET /churn ?repoUrl&filters"] EP5["GET /full-data ?repoUrl&all params"] end subgraph CACHE["Unified Cache Service"] direction TB L1["⚡ Memory ~1ms"] L2["💾 Disk ~10-50ms"] L3["🔴 Redis ~50-100ms"] end subgraph SOURCE["Git Repository"] GIT["📦 Shared Clone blob:none filter 480/480 commits"] end REQUEST --> ENDPOINTS EP1 --> CACHE EP2 --> CACHE EP3 --> CACHE EP4 --> CACHE EP5 --> CACHE CACHE -->|miss| SOURCE L1 -.->|promotion| L2 L2 -.->|promotion| L3 L3 -.->|promotion| GIT GIT -.->|cache| L3 L3 -.->|cache| L2 L2 -.->|cache| L1 L1 -->|hit| ENDPOINTS classDef clientStyle fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px classDef endpointStyle fill:#e3f2fd,stroke:#1976d2,stroke-width:2px classDef cacheStyle fill:#fff3e0,stroke:#f57c00,stroke-width:2px classDef sourceStyle fill:#e8f5e9,stroke:#388e3c,stroke-width:2px class CLIENT,REQUEST clientStyle class ENDPOINTS,EP1,EP2,EP3,EP4,EP5 endpointStyle class CACHE,L1,L2,L3 cacheStyle class SOURCE,GIT sourceStyle🖼️ View Request Flow Diagram (PNG)
Test Results ✅
All endpoints tested with
https://github.com/jonasyr/gitray.git:Summary Endpoint
{ "stats": { "totalCommits": 480, // ✅ Correct (was null) "contributors": 6, // ✅ Correct (was null) "status": "active" } }Heatmap Endpoint
{ "timePeriod": "day", "dataPoints": 365 // ✅ Correct (was 0) }Contributors Endpoint
Commits Endpoint
Churn Analysis Endpoint
Full-Data Endpoint
CommitHeatmapDatastructureDocumentation Files
Lessons Learned
undefinedproperties in filter objectsAcceptance Criteria
redis.get/setlogic from all routesgitServicecalls with unified cache methodsRelated Issues
Testing Checklist
How to Test
Expected results:
Rollback Plan
If issues arise:
rm -rf apps/backend/cache/*Reviewers: Please verify:
cc: @jonasyr