Git Size History is an experimental fast CLI tool that analyzes how a git repository's size has grown over time by sampling commits at regular intervals and measuring packed object sizes.
- Fast: Efficient size measurement, multithreaded processing
- Visual: Generates PNG plots of size over time (and CSV data for further analysis)
- Safe: Read-only operations, never modifies your repository
- Cross-platform: Works on Linux, macOS, and Windows
- Use Bitmap Index: Leverages git's bitmap index for fast object counting and size estimation
Use `git repack -a -d --write-bitmap-index` to create a bitmap index for faster analysis on large repositories.
$ time target/release/git-size-history ~/tmp/linux -o linux-bm.csv --plot linux-bm.png
[00:00:37] Analysis complete
[00:00:04] [========================================] 22/22 Sampling complete
Writing CSV to linux-bm.csv
Generating plot: linux-bm.png
Plot saved to linux-bm.png
=== Summary ===
Repository: /home/gautier/tmp/linux
Total commits analyzed: 1426552
Time span: 2005-04-16 to 2026-02-21 (20.8 years)
Sample points: 22
Sampling method: yearly
Initial size (2005-04-16): 53.14 MB
Final size (2026-02-21): 6.20 GB
Total growth: 6.15 GB
Output written to linux-bm.csv
Plot saved to linux-bm.png
real 0m43,268s
Scanning the size history of 1.4M commits in 43 seconds!
(Without bitmap index, it takes 3 minutes)
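To see whether a repository already has a bitmap index before running the analysis, a check along these lines may help. This is only a sketch: the function name `has_bitmap_index` is made up for this example, and it assumes a non-bare repository with the standard `.git/objects/pack` layout (bare repositories and multi-pack bitmaps are not covered).

```shell
# Hypothetical helper: succeed if the repository at $1 has at least one
# pack bitmap file. Assumes a non-bare repo with the default .git layout.
has_bitmap_index() {
    for f in "$1"/.git/objects/pack/*.bitmap; do
        # If the glob matched nothing, $f is the literal pattern and -e fails.
        if [ -e "$f" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, `has_bitmap_index ~/tmp/linux || echo "consider git repack -a -d --write-bitmap-index"` prints the hint only when no bitmap file is found.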
git clone https://github.com/example/git-size-history.git
cd git-size-history
cargo build --release
The binary will be at `target/release/git-size-history`.
# Analyze current directory
git-size-history -o output.csv
# Analyze specific repository with plot
git-size-history /path/to/repo -o output.csv --plot size-over-time.png

| Option | Description |
|---|---|
| `<REPO_PATH>` | Path to git repository (default: `.`) |
| `-o, --output <FILE>` | Output CSV file path (required) |
| `--plot <FILE>` | Generate PNG plot of cumulative size |
| `--yearly` | Force yearly sampling |
| `--monthly` | Force monthly sampling (default for repos ≤6 years) |
| `-D, --debug` | Show debug output (object counts, sizes) |
| `-U, --uncompressed` | Calculate uncompressed blob sizes (slower) |
| `-h, --help` | Print help |
| `-V, --version` | Print version |
# Analyze a large repository with yearly sampling
git-size-history --yearly -o linux-size.csv --plot linux-size.png /path/to/linux
# Analyze current project with monthly sampling
git-size-history --monthly -o project-size.csv .
# Quick analysis with default settings
git-size-history -o output.csv /path/to/repo
# Show debug information during analysis
git-size-history -D -o output.csv /path/to/repo
# Include uncompressed sizes for compression ratio analysis
git-size-history -U -o output.csv /path/to/repo
The output CSV contains size measurements over time:
date,cumulative-size,uncompressed-size
2020-01-15,1048576,10485760
2021-01-15,2097152,20971520
2022-01-15,4194304,41943040

| Column | Description |
|---|---|
| `date` | Sampling date in YYYY-MM-DD format |
| `cumulative-size` | Packed repository size in bytes (after git gc) |
| `uncompressed-size` | Total uncompressed blob size (only with `-U` flag) |
Tip: The ratio between uncompressed and packed size shows git's compression efficiency (typically 5-10x).
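As a rough illustration of that ratio, it can be computed straight from the CSV with awk. This sketch assumes the file was produced with `-U` (so the third column is populated) and uses the header and column order shown above; the function name `compression_ratio` is made up for this example.

```shell
# Hypothetical helper: print "<date> <uncompressed/packed ratio>" per CSV row.
# Skips the header row, rows with a zero packed size, and rows with no
# uncompressed value.
compression_ratio() {
    LC_ALL=C awk -F, 'NR > 1 && $2 > 0 && $3 != "" {
        printf "%s %.1f\n", $1, $3 / $2
    }' "$1"
}
```

Running `compression_ratio output.csv` on the sample rows above prints `2020-01-15 10.0` for the first sample.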
The generated PNG plot displays:
- X-axis: Timeline with year-month labels
- Y-axis: Repository size with automatic unit scaling (B, KB, MB, GB)
- Line: Cumulative packed size over time
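The same kind of unit scaling can be applied to the raw byte counts in the CSV when reading them by hand. The helper below is a sketch, not the tool's plotting code: it assumes 1024-based units and two decimal places, and the name `human_size` is made up for this example.

```shell
# Hypothetical helper: format a byte count using B, KB, MB or GB.
# Assumption: units are powers of 1024; values above GB stay in GB.
human_size() {
    LC_ALL=C awk -v b="$1" 'BEGIN {
        split("B KB MB GB", unit, " ")
        i = 1
        while (b >= 1024 && i < 4) { b /= 1024; i++ }
        printf "%.2f %s\n", b, unit[i]
    }'
}
```

For instance, `human_size 1048576` prints `1.00 MB`.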
Git Size History uses an adaptive sampling approach:
| Repository Age | Sampling Interval |
|---|---|
| > 6 years | Yearly (365 days) |
| ≤ 6 years | Monthly (30 days) |
The latest commit is always included as the final sample point.
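The interval choice in the table above can be sketched as a small shell function. This is an illustration under stated assumptions rather than the tool's actual code: it takes the repository age in days as input, treats "6 years" as 6 × 365 days, and the name `sampling_interval_days` is made up for this example.

```shell
# Sketch: pick the sampling interval (in days) from the repository age in days.
# Assumption: "6 years" means 6 * 365 days; the tool may measure it differently.
sampling_interval_days() {
    age_days=$1
    if [ "$age_days" -gt $((6 * 365)) ]; then
        echo 365   # yearly sampling for repositories older than 6 years
    else
        echo 30    # monthly sampling otherwise
    fi
}
```

For a repository spanning 7616 days (roughly 20.8 years), `sampling_interval_days 7616` prints `365`.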
For each sample point:
- Find Nearest Commit: Binary search for commit closest to sample date
- Packed Size: `git rev-list --objects --disk-usage` measures actual disk usage
- Uncompressed Size (optional): `git cat-file --batch-check` sums all blob sizes
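The steps above can be approximated by hand with plain git commands. The sketch below is not the tool's implementation: instead of a binary search it uses `git rev-list -1 --before`, which picks the latest commit on `HEAD` at or before the date (a simplification of "closest"), and the function names `measure_at` and `uncompressed_at` are made up for this example. `--disk-usage` requires Git 2.31 or later.

```shell
# Sketch: print "<date>,<packed bytes>" for one sample date in the current repo.
measure_at() {
    sample_date=$1
    # Latest commit reachable from HEAD at or before the sample date.
    commit=$(git rev-list -1 --before="$sample_date" HEAD)
    [ -n "$commit" ] || return 1
    # On-disk (packed) size in bytes of all objects reachable from that commit.
    packed=$(git rev-list --objects --disk-usage "$commit")
    echo "$sample_date,$packed"
}

# Sketch: sum the uncompressed sizes of all blobs reachable from commit $1.
uncompressed_at() {
    git rev-list --objects "$1" |
        cut -d' ' -f1 |   # keep only the object ID, drop the path
        git cat-file --batch-check='%(objecttype) %(objectsize)' |
        awk '$1 == "blob" { total += $2 } END { printf "%.0f\n", total }'
}
```

Run from inside a repository, `measure_at 2021-01-15` prints one CSV-style line for that date, and `uncompressed_at HEAD` prints a single byte count.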
| Benefit | Description |
|---|---|
| Accurate | Measures actual disk usage after git compression |
| Fast | No cloning or temporary repositories needed |
| Safe | Read-only operations, never modifies the repository |
| Efficient | Uses git's batch mode for high-performance queries |
| Dependency | Version |
|---|---|
| Rust | 1.75 or later |
| Git | 2.31 or later (needed for `git rev-list --disk-usage`) |
Ensure the path points to a valid git repository:
cd /path/to/repo && git status
Check that git is installed and accessible:
git --version
git rev-list --objects --disk-usage HEAD
Ensure you have write permissions in the output directory:
ls -la /path/to/output/directory
Try these optimizations:
- Use the `--yearly` flag to reduce sample points
- Skip uncompressed calculation (don't use `-U`)
- Ensure repository is on fast storage (SSD recommended)
- Run `git gc` on the repository first
Analyzing large repositories (e.g., Linux kernel with 1.4M+ commits) can consume significant memory due to parallel processing. We use Rayon for parallelism, which by default uses all available CPU cores.
To limit memory usage:
- Reduce parallel threads using `RAYON_NUM_THREADS`:
# Limit to 2 threads (reduces memory pressure)
RAYON_NUM_THREADS=2 git-size-history -o output.csv /path/to/repo
git clone https://github.com/example/git-size-history.git
cd git-size-history
cargo build --release
cargo test
cargo fmt -- --check
cargo clippy -- -D warnings
This project is licensed under the MIT License - see LICENSE for details.
- Inspired by "How to Calculate Git Repository Growth Over Time" by Andrew Berry
- Built with:
