@jonasyr jonasyr commented Dec 2, 2025

Description

Refactors the /api/repositories/contributors endpoint to return all unique contributor names without ranking or statistics, resolving issue #121. This change makes the endpoint fully GDPR-compliant and lets it reuse repositories already cloned by other endpoints, which cut the response time for the React repository from 7.6s to 0.3s in manual tests.

Issue Reference

Resolves #121
Related to #120 (unified caching), #122 (repository coordinator)

Changes Made

API Contract Changes (Breaking)

Before:

{
  "contributors": [
    {
      "login": "Alice",
      "commitCount": 280,
      "linesAdded": 15420,
      "linesDeleted": 3210,
      "contributionPercentage": 58.3
    }
  ]
}

After:

{
  "contributors": [
    { "login": "Alice" },
    { "login": "Bob" },
    { "login": "Charlie" }
  ]
}
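
The simplified response shape can be written down as a type plus a runtime guard. This is a minimal sketch rather than the code from index.ts or repositoryCache.ts; the names ContributorsResponse and isContributorsResponse are hypothetical and used only for illustration.

```typescript
// Sketch only: ContributorsResponse and isContributorsResponse are
// hypothetical names, not taken from this PR.
interface Contributor {
  login: string;
}

interface ContributorsResponse {
  contributors: Contributor[];
}

// Runtime guard: true when `value` is an object whose `contributors`
// property is an array of objects that each carry a string `login`.
function isContributorsResponse(value: unknown): value is ContributorsResponse {
  if (typeof value !== "object" || value === null) return false;
  const list = (value as { contributors?: unknown }).contributors;
  return (
    Array.isArray(list) &&
    list.every(
      (c) =>
        typeof c === "object" &&
        c !== null &&
        typeof (c as { login?: unknown }).login === "string"
    )
  );
}
```

The guard only checks that each entry has a string login, so it does not reject entries that still carry extra fields.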

Implementation Changes

  1. Type Definitions (index.ts)

    • Simplified Contributor interface to only include login: string
    • Removed ContributorStat fields: commitCount, linesAdded, linesDeleted, contributionPercentage
  2. Service Layer (gitService.ts)

    • Replaced getTopContributors() with new getContributors() method
    • Uses git log --format=%aN instead of git log --numstat
    • Returns all unique contributors (no top-5 limit)
    • Alphabetically sorted for consistency
  3. Cache Layer (repositoryCache.ts)

    • Updated getOrGenerateContributors() to call new service method
    • Updated type guards for Contributor[] instead of ContributorStat[]
    • Maintains transactional cache consistency
  4. Tests (100% coverage maintained)

    • Completely rewrote getContributors test suite (8 test cases)
    • Updated route handler tests to validate simplified response structure
    • All 956 tests passing
  5. Documentation (FRONTEND_API_MIGRATION.md)

    • Updated response examples
    • Added migration notes highlighting breaking changes
    • Documented GDPR compliance benefits
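
The service-layer change described above (author names from git log --format=%aN, deduplicated and sorted) might look roughly like the sketch below. It assumes the command's stdout is already available as a string; parseContributors is a hypothetical helper, not the actual getContributors() from gitService.ts, and the locale-aware comparison is an assumption since the PR only says "alphabetically sorted".

```typescript
// Sketch only: parseContributors is a hypothetical helper; the real
// getContributors() in gitService.ts is not shown in this PR description.
interface Contributor {
  login: string;
}

// `raw` is assumed to be the stdout of `git log --format=%aN`:
// one author name per line, repeated once per commit.
function parseContributors(raw: string): Contributor[] {
  const names = new Set<string>();
  for (const line of raw.split("\n")) {
    const name = line.trim(); // also drops a trailing "\r"
    if (name.length > 0) names.add(name); // the Set removes duplicates
  }
  // Assumption: locale-aware ordering. No top-N slice is applied.
  return Array.from(names)
    .sort((a, b) => a.localeCompare(b))
    .map((login) => ({ login }));
}
```

Because only the author name is read, no per-file diff statistics have to be computed, which is what --numstat required.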

Performance Improvements

The key benefit of this refactor is repository reuse. The old implementation could not reuse cached repositories because it required --numstat output, which was not part of the raw commit cache. The new implementation needs only author names, which can be read from a repository that another endpoint has already cloned.

Verified Performance Metrics (Manual API Tests)

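| Repository | Contributors | First Call | Cached Call | Speedup |
|------------|--------------|------------|-------------|---------|
| gitray     | 6            | 1.5s       | 0.3s        | 5x      |
| express    | 369          | -          | 0.2s        | -       |
| vscode     | 2,727        | -          | 0.2s        | -       |
| React      | 1,905        | 7.6s       | 0.3s        | 24x     |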

Repository Reuse Confirmed:

  • /api/commits clones repo → /api/repositories/contributors reuses it (0.3s vs 7.6s)
  • Backend logs show "Fetching contributors via shared repository"
  • Same temp directory used (/tmp/git-visualizer-*)
  • No duplicate clone operations
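
The reuse behaviour listed above can be illustrated with a small coordinator that hands every request for the same repository URL the same clone. This is a sketch under stated assumptions: SharedRepositories, acquire, and CloneFn are hypothetical names, and the actual repository coordinator from #122 is not shown in this PR.

```typescript
// Sketch only: SharedRepositories, acquire and CloneFn are hypothetical
// names; the real repository coordinator (#122) is not shown here.
type CloneFn = (repoUrl: string) => Promise<string>; // resolves to a local path

class SharedRepositories {
  private readonly clones = new Map<string, Promise<string>>();

  constructor(private readonly clone: CloneFn) {}

  // Returns the in-flight or finished clone for `repoUrl`, and starts a
  // new clone only when no earlier request asked for this repository.
  acquire(repoUrl: string): Promise<string> {
    let pending = this.clones.get(repoUrl);
    if (pending === undefined) {
      pending = this.clone(repoUrl);
      this.clones.set(repoUrl, pending);
    }
    return pending;
  }
}
```

Storing the promise rather than the resolved path means a second request that arrives while the clone is still running waits for it instead of starting its own. Eviction, temp-directory cleanup, and dropping failed clones from the map are left out of the sketch.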

GDPR Compliance

The new implementation is fully GDPR-compliant:

  • ✅ Returns only author names as recorded in Git history
  • ✅ No tracking of commit counts or contribution metrics
  • ✅ No email addresses exposed
  • ✅ Aligns with data minimization principles

Breaking Changes

⚠️ Frontend Migration Required

  1. Response Structure Changed:

    • Old: ContributorStat[] with login, commitCount, linesAdded, linesDeleted, contributionPercentage
    • New: Contributor[] with only login
  2. No Top-5 Limit:

    • Old: Maximum 5 contributors
    • New: All unique contributors returned
  3. Alphabetical Sorting:

    • Old: Sorted by commit count (descending)
    • New: Sorted alphabetically by name

Migration Example:

// OLD CODE
const contributors = response.contributors;
console.log(`${contributors[0].login}: ${contributors[0].commitCount} commits`);

// NEW CODE
const contributors = response.contributors;
console.log(`Contributors: ${contributors.map(c => c.login).join(', ')}`);

See FRONTEND_API_MIGRATION.md section 3 for complete migration guide.
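
Because the endpoint no longer caps the list at five entries, a frontend that used to print every contributor may want to truncate on its own side. The helper below is one possible sketch; summarizeContributors and its default limit are hypothetical and not part of the migration guide.

```typescript
// Sketch only: summarizeContributors is a hypothetical frontend helper.
interface Contributor {
  login: string;
}

// Joins up to `limit` names and reports how many were left out.
function summarizeContributors(contributors: Contributor[], limit = 5): string {
  const names = contributors.map((c) => c.login);
  if (names.length <= limit) return names.join(", ");
  const hidden = names.length - limit;
  return `${names.slice(0, limit).join(", ")} and ${hidden} more`;
}
```

Since the list arrives sorted by name rather than by commit count, the first entries shown this way are simply the alphabetically first ones, not the most active contributors.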

Testing

Unit Tests (All Passing)

  • ✅ 956 tests passing (100% retention)
  • getContributors test suite: 8 test cases covering:
    • Alphabetical sorting
    • Deduplication
    • Filter application (author, authors, dates)
    • Empty handling
    • Error cases
    • No top-5 limit enforcement

Integration Tests

  • ✅ Route handler tests updated to validate simplified response structure
  • ✅ No statistics fields in response validation

Manual API Testing

  • ✅ Tested with 4 different repositories (gitray, express, vscode, React)
  • ✅ Verified cache reuse with timing measurements
  • ✅ Confirmed repository coordination integration
  • ✅ Validated response structure matches spec

Checklist

  • Code follows project style guidelines
  • Self-review completed
  • Code comments added for complex logic
  • Documentation updated (FRONTEND_API_MIGRATION.md)
  • Tests added/updated (100% coverage maintained)
  • All tests passing (956/956)
  • No new TypeScript errors
  • Breaking changes documented
  • Manual testing completed with multiple repositories
  • Performance improvements verified

Additional Notes

This refactor is part of a larger initiative to optimize the GitRay backend for performance and compliance.

The removal of statistics from this endpoint is intentional - if detailed contributor analytics are needed in the future, they should be implemented in a separate, explicitly opt-in endpoint with proper consent mechanisms.

Screenshots/Logs

Backend Logs - Repository Reuse Confirmation

13:17:08 info [GET /]: Processing commits request with unified caching
13:17:08 info: Starting repository clone
13:17:15 info: Successfully cloned https://github.com/facebook/react.git
13:17:16 info: Successfully retrieved 21214 commits from /tmp/git-visualizer-FMMki3
13:17:16 info [GET /contributors]: Processing contributors request with unified caching
13:17:16 info: Fetching contributors via shared repository
13:17:16 info: Getting contributors from: /tmp/git-visualizer-FMMki3  ← REUSED!
13:17:16 info: Successfully retrieved 1905 unique contributors

API Response Examples

gitray repo (6 contributors):

{
  "contributors": [
    { "login": "Copilot" },
    { "login": "GitHub" },
    { "login": "Jonas Yao Rei" },
    { "login": "jonasyr" },
    { "login": "jonasyrdev" },
    { "login": "Jonas Yao Rei jonasyr" }
  ]
}

React repo (1,905 contributors):

{
  "contributors": [
    { "login": "Aaron Ackerman" },
    { "login": "Aaron Peckham" },
    ...
    { "login": "Зыкин Илья" }
  ]
}

…e names

Remove ranking and statistics from /contributors endpoint per issue #121.
The endpoint now returns all unique contributor names alphabetically sorted
without commit counts, line statistics, or contribution percentages.

Key changes:
- Replaced getTopContributors() with getContributors() in gitService
- Updated Contributor interface to only include the login field
- Removed top-5 limit - returns all contributors
- Uses git log --format=%aN for author name extraction
- Maintains integration with unified caching and repository coordination
- Fully GDPR-compliant (author names only, no tracking metrics)

Benefits:
- Contributors endpoint can now reuse cached repositories from other endpoints
- 24x performance improvement (7.6s → 0.3s when repo already cached)
- No longer requires --numstat, enabling repository reuse
- Simpler API contract aligned with GET semantics

Breaking changes:
- Response structure changed from ContributorStat[] to Contributor[]
- Removed fields: commitCount, linesAdded, linesDeleted, contributionPercentage
- No longer limited to top 5 contributors

Files modified:
- packages/shared-types/src/index.ts (simplified Contributor interface)
- apps/backend/src/services/gitService.ts (new getContributors method)
- apps/backend/src/services/repositoryCache.ts (updated type guards)
- apps/backend/__tests__/unit/services/gitService.unit.test.ts (rewrote tests)
- apps/backend/__tests__/unit/routes/repositoryRoutes.unit.test.ts (updated assertions)
- FRONTEND_API_MIGRATION.md (documented API changes)

Tested with multiple repositories (gitray, express, vscode, React):
- 6 contributors for gitray (0.3s with cached repo)
- 369 contributors for express
- 2,727 contributors for vscode
- 1,905 contributors for React (0.3s vs 7.6s without cache reuse)

Resolves: #121
Related: #120 (unified caching), #122 (repository coordinator)
@jonasyr jonasyr requested a review from NiklasSkulll December 2, 2025 12:23
@jonasyr jonasyr self-assigned this Dec 2, 2025
sonarqubecloud bot commented Dec 2, 2025

@NiklasSkulll NiklasSkulll merged commit 08f6d69 into dev Dec 2, 2025
8 checks passed
@jonasyr jonasyr deleted the 121-chorebackend-repurpose-contributors-endpoint-to-return-all-contributors-without-ranking branch December 2, 2025 13:09
Development

Successfully merging this pull request may close these issues.

chore(backend): Repurpose /contributors endpoint to return all contributors without ranking

3 participants