Copilot AI (Contributor) commented Sep 28, 2025

Overview

This PR implements a comprehensive Python script to analyze the GitHub fork ecosystem for the WLED repository, addressing the need to understand fork activity and health across the project's 1,200+ forks.

Problem Statement

The WLED repository has accumulated a very high number of forks, but most appear to be inactive. Project maintainers needed insights into:

  • Which forks have unique development (custom branches not in main repo)
  • Which forks are actively maintained vs significantly outdated
  • Which forks have contributed back to the main repository via PRs
  • Which forks show active development but haven't contributed
  • Statistical breakdown of how far behind forks are from upstream

Solution

New Files Added

tools/fork_stats.py - A production-ready Python script that uses the GitHub API to analyze repository forks with the following capabilities:

  • Branch Analysis: Identifies forks with branches that don't exist in the main repository
  • Recency Analysis: Categorizes forks by how recently they've been updated (1 month, 3 months, 6 months, 1 year, 2 years, 5+ years)
  • Contribution Tracking: Identifies which forks have been the source of pull requests
  • Activity Detection: Finds forks with recent development but no PR contributions
  • Owner Commit Analysis: Tracks commits made by fork owners to their own repositories
  • Statistical Reporting: Provides percentage breakdowns and distribution analysis
  • Incremental Saving: Automatically saves progress every 10 analyzed forks to prevent data loss
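
A minimal sketch of how the recency categories could be assigned, assuming each fork's last-update time is available as an ISO 8601 timestamp (the format the GitHub API uses for fields such as `pushed_at`). The function names, the day thresholds, and the catch-all `older` label are illustrative assumptions and are not taken from `tools/fork_stats.py`:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds in days; month and year lengths are approximations.
BUCKETS = [
    (30, "<= 1 month"),
    (90, "<= 3 months"),
    (180, "<= 6 months"),
    (365, "<= 1 year"),
    (730, "<= 2 years"),
]
OLDEST_LABEL = "older"  # assumed catch-all for anything past the last threshold


def parse_github_timestamp(value: str) -> datetime:
    """Parse a timestamp such as '2025-09-28T13:02:00Z' into an aware datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def recency_bucket(last_updated: datetime, now: datetime) -> str:
    """Return the label of the first bucket whose threshold covers the fork's age."""
    age_days = (now - last_updated).days
    for max_days, label in BUCKETS:
        if age_days <= max_days:
            return label
    return OLDEST_LABEL
```

Each fork lands in exactly one bucket, so the bucket counts add up to the number of analyzed forks.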

tools/README_fork_stats.md - Comprehensive documentation covering usage, examples, and troubleshooting.

Key Features

  • Flexible Authentication: Works with or without a GitHub token (5,000 requests/hour with a token vs 60 without)
  • Rate Limiting: Intelligent API rate limiting with automatic backoff
  • Multiple Output Formats: Human-readable summary and machine-readable JSON
  • Demo Mode: Test functionality without making API calls
  • Dry Run Mode: Preview analysis scope and API usage before execution
  • Error Handling: Robust handling of private repositories, API failures, and edge cases
  • Progress Persistence: Saves intermediate results to "tempresults.json" every 10 repositories
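
As a rough illustration of the rate-limiting idea, the sketch below computes how long to pause based on the `X-RateLimit-Remaining` and `X-RateLimit-Reset` headers that the GitHub REST API returns (the reset value is a Unix timestamp). The function name, the `min_remaining` parameter, and the one-second safety margin are assumptions made for this example; the script's actual backoff logic may differ:

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(
    headers: Mapping[str, str],
    now: Optional[float] = None,
    min_remaining: int = 1,
) -> float:
    """Return how many seconds to sleep before sending the next request.

    `headers` is a plain mapping here, so the keys must match exactly; HTTP
    client libraries usually expose a case-insensitive mapping instead.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    reset_at = float(headers.get("X-RateLimit-Reset", "0"))
    if remaining >= min_remaining:
        return 0.0  # quota left, no need to pause
    # Quota exhausted: wait until the reset time, plus a one-second margin.
    return max(0.0, reset_at - now) + 1.0
```

A caller would pass the headers of the most recent response and call `time.sleep()` with the returned value before continuing.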

Usage Examples

# Quick demo with sample data
python3 tools/fork_stats.py --demo

# Analyze first 50 forks with token
export GITHUB_TOKEN="your_token"
python3 tools/fork_stats.py --max-forks 50

# Full analysis with JSON output
python3 tools/fork_stats.py --output results.json

Sample Output

============================================================
FORK ANALYSIS SUMMARY FOR wled/WLED
============================================================

Repository Details:
  - Total Forks: 1,243
  - Analyzed: 100
  - Stars: 15,500

Fork Age Distribution:
  - Last updated ≤ 1 month:        8 (  8.0%)
  - Last updated ≤ 3 months:      12 ( 12.0%)
  - Last updated ≤ 6 months:      15 ( 15.0%)
  - Last updated ≤ 1 year:        23 ( 23.0%)
  - Last updated ≤ 2 years:       25 ( 25.0%)
  - Last updated > 5 years:       17 ( 17.0%)

Fork Activity Analysis:
  - Forks with unique branches:             34 (34.0%)
  - Forks with recent main branch:          42 (42.0%)
  - Forks that contributed PRs:             18 (18.0%)
  - Active forks (no PR contributions):     23 (23.0%)

Owner Commit Analysis:
  - Forks with owner commits:               67 (67.0%)
  - Total commits by fork owners:         2845
  - Average commits per fork:             28.5

Implementation Details

The script uses the GitHub REST API v3 and performs the following analyses:

  • Repository comparison algorithms to identify branch differences
  • Pull request attribution analysis to track contributions
  • Commit recency detection for activity measurement
  • Owner commit analysis to track development activity by fork maintainers
  • Comprehensive statistical calculations with percentage breakdowns
  • Incremental saving mechanism to preserve progress during long analyses
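
The branch comparison and pull request attribution steps could look roughly like the sketch below. It assumes the branch names have already been fetched as lists of strings and that the pull requests are dictionaries in the shape returned by the GitHub pulls API, where `head.repo` is null if the source fork has been deleted. The function names are hypothetical and not taken from the script:

```python
from typing import Iterable, List, Set


def unique_branches(fork_branches: Iterable[str], upstream_branches: Iterable[str]) -> List[str]:
    """Return branch names present in the fork but absent from the main repo."""
    return sorted(set(fork_branches) - set(upstream_branches))


def pr_source_repos(pull_requests: Iterable[dict]) -> Set[str]:
    """Collect the full names (owner/repo) of repositories that PRs came from."""
    sources: Set[str] = set()
    for pr in pull_requests:
        head_repo = (pr.get("head") or {}).get("repo")
        if head_repo:  # None when the source fork no longer exists
            sources.add(head_repo["full_name"])
    return sources
```

Comparing by name only is a simplification: a fork branch that shares a name with an upstream branch but has diverged would not be reported as unique.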

Testing

  • All existing WLED build and test systems continue to work unchanged
  • Script includes comprehensive error handling and validation
  • Demo and dry-run modes allow safe testing without API consumption
  • No new dependencies beyond what's already included in WLED requirements
  • Incremental saving tested to ensure data persistence during interruptions

Impact

This tool will enable WLED maintainers to:

  • Identify potentially interesting custom developments in fork branches
  • Find active contributors who haven't yet submitted PRs
  • Understand the overall health and activity level of the fork ecosystem
  • Track development activity by fork owners to identify engaged community members
  • Make data-driven decisions about community outreach and engagement
  • Safely analyze large numbers of forks without losing progress on interruption

The implementation is ready for immediate use and can scale from quick analyses to comprehensive repository-wide statistics depending on available GitHub API quota.

Fixes #4966

Original prompt

This section details the original issue you should resolve

<issue_title>Stats on forks</issue_title>
<issue_description>The WLED repo has a very high number of forks, but most appear to be inactive

We need a python script that is able to use the GitHub API to establish

  • which forks have branches that do not exist in the main repo
  • which forks actually have a recent version of main (vs. a fork that has not been updated in over 1 year)
  • which fork repos have actually been the source of PRs into this repo
  • which forks have active development but have not been the source of any PRs
  • what the statistics are for how far behind forks are, e.g. 1 month since last pull from upstream, 3 months, 6 months, 1 year, 2 years, 5+ years</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #4966


Co-authored-by: netmindz <442066+netmindz@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Stats on forks" to "Add GitHub fork statistics analysis tool" on Sep 28, 2025
@Copilot Copilot AI requested a review from netmindz September 28, 2025 13:02
Copilot finished work on behalf of netmindz September 28, 2025 13:02
DedeHai (Collaborator) commented Sep 29, 2025

@copilot

  • save the results to a json file after every 10 repos, name it "tempresults.json" to keep a log in case the program gets interrupted.
  • also add statistics about commits made by the owner of a fork repo to that repo

Co-authored-by: DedeHai <6280424+DedeHai@users.noreply.github.com>
Copilot AI (Contributor, Author) commented Sep 29, 2025

@copilot

  • save the results to a json file after every 10 repos, name it "tempresults.json" to keep a log in case the program gets interrupted.
  • also add statistics about commits made by the owner...

Both requested features have been implemented in commit 7417163:

Incremental saving: The script now saves results to "tempresults.json" after every 10 repositories analyzed, with automatic cleanup on successful completion.
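
For readers who want a picture of the checkpointing behavior, here is a minimal sketch, assuming results are accumulated in a list and written out in full at each checkpoint. The helper names and the write-then-rename step are illustrative choices and may not match what commit 7417163 does:

```python
import json
import os
from typing import Callable, Iterable, List


def save_json(data: list, path: str) -> None:
    """Write to a sibling file first, then rename, so a crash never leaves a half-written file."""
    partial_path = path + ".part"
    with open(partial_path, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2)
    os.replace(partial_path, path)


def analyze_with_checkpoints(
    forks: Iterable[str],
    analyze_fork: Callable[[str], dict],
    temp_path: str = "tempresults.json",
    every: int = 10,
) -> List[dict]:
    """Analyze forks one by one, saving all results so far after every `every` forks."""
    results: List[dict] = []
    for count, fork in enumerate(forks, start=1):
        results.append(analyze_fork(fork))
        if count % every == 0:
            save_json(results, temp_path)
    # Successful completion: the checkpoint file is no longer needed.
    if os.path.exists(temp_path):
        os.remove(temp_path)
    return results
```

If the run is interrupted, the checkpoint file holds every result up to the last multiple of `every`, so at most `every - 1` analyzed forks are lost.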

Owner commit statistics: Added analysis of commits made by fork owners to their own repositories, including:

  • Count of forks with owner commits
  • Total commits by all fork owners
  • Average commits per fork
  • Percentage of forks with owner activity

The new owner commit data is included in both JSON output and the summary display. Updated documentation reflects these enhancements.
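
The four owner commit figures listed above can be derived from a list of per-fork commit counts. The sketch below is one way to do it, assuming the average is taken over all analyzed forks rather than only those with owner commits, which is roughly consistent with the sample figures in the PR description (2845 commits across 100 analyzed forks). The function and key names are hypothetical:

```python
from typing import Dict, List


def owner_commit_summary(owner_commit_counts: List[int]) -> Dict[str, float]:
    """Summarize owner commit activity, given one commit count per analyzed fork."""
    analyzed = len(owner_commit_counts)
    with_commits = sum(1 for count in owner_commit_counts if count > 0)
    total = sum(owner_commit_counts)
    return {
        "forks_with_owner_commits": with_commits,
        "total_owner_commits": total,
        # Guard against division by zero when no forks were analyzed.
        "average_commits_per_fork": total / analyzed if analyzed else 0.0,
        "percent_with_owner_commits": 100.0 * with_commits / analyzed if analyzed else 0.0,
    }
```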

Copilot AI changed the title from "Add GitHub fork statistics analysis tool" to "Add GitHub fork statistics analysis tool with incremental saving and owner commit tracking" on Sep 29, 2025
@Copilot Copilot AI requested a review from DedeHai September 29, 2025 05:14
Copilot finished work on behalf of DedeHai September 29, 2025 05:14