Merged
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -31,5 +31,5 @@ jobs:
run: ./gitcortex extract --repo .

- name: Quality gates
run: ./gitcortex ci --fail-on-churn-risk 2500 --format github-actions
run: ./gitcortex ci --fail-on-churn-risk 5500 --format github-actions
# Add --fail-on-busfactor 1 when team grows
7 changes: 4 additions & 3 deletions README.md
@@ -196,7 +196,7 @@ Available stats:
| `churn-risk` | Files ranked by recent churn, classified into `cold` / `active` / `active-core` / `silo` / `legacy-hotspot` |
| `working-patterns` | Commit heatmap by hour and day of week |
| `dev-network` | Developer collaboration graph based on shared file ownership |
| `profile` | Per-developer report: scope, contribution type, pace, collaboration, top files |
| `profile` | Per-developer report: scope, specialization index, contribution type, pace, collaboration, top files |
| `top-commits` | Largest commits ranked by lines changed (includes message if extracted with `--include-commit-messages`) |
| `pareto` | Concentration (80% threshold) across files, devs (two lenses: commits and churn), and directories |

@@ -206,7 +206,7 @@ See [`docs/METRICS.md`](docs/METRICS.md) for how each metric is calculated, incl

### Developer profile

Manager-facing report per developer showing scope, contribution type, pace, collaboration, and top files.
Manager-facing report per developer showing scope, specialization, contribution type, pace, collaboration, and top files.

```bash
# All developers, ranked by commits
@@ -221,9 +221,10 @@ gitcortex stats --input data.jsonl --stat profile --format json

Each profile includes:
- **Scope**: top directories where the dev works (by unique files, %)
- **Specialization**: Herfindahl concentration over the dev's full directory distribution; 1 = all files in one dir (narrow specialist), approaches 0 for broad generalists. Labelled `broad generalist` / `balanced` / `focused specialist` / `narrow specialist`. *Measures file distribution on disk, not domain expertise — a security engineer who refactored auth across four dirs looks like a generalist even though they are a domain specialist. See METRICS.md for the caveat in full.*
- **Contribution**: growth (add >> del), balanced, or refactor (del >> add)
- **Pace**: commits per active day
- **Collaboration**: top devs sharing the same files
- **Collaboration**: top devs sharing the same files (ranked by `shared_lines` = Σ min(linesA, linesB))
- **Weekend %**: off-hours work ratio
- **Top files**: most impacted files by churn
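
A minimal sketch of the `shared_lines` ranking key from the Collaboration bullet, assuming each developer's per-file line counts are available as a map. The `sharedLines` helper, the map layout, and the file paths are hypothetical illustrations, not gitcortex's actual code.

```go
package main

import "fmt"

// sharedLines is a hypothetical helper: it sums min(linesA, linesB)
// over every file that appears in both maps. Keys are file paths,
// values are lines changed by that developer.
func sharedLines(a, b map[string]int) int {
	total := 0
	for path, la := range a {
		lb, ok := b[path]
		if !ok {
			continue // file touched by only one developer
		}
		if lb < la {
			la = lb
		}
		total += la
	}
	return total
}

func main() {
	// Illustrative data only.
	alice := map[string]int{"auth/login.go": 400, "auth/token.go": 1, "README.md": 20}
	bob := map[string]int{"auth/login.go": 250, "auth/token.go": 300, "docs/setup.md": 5}
	fmt.Println(sharedLines(alice, bob)) // 250 + 1 = 251
}
```

The one-line touch on `auth/token.go` contributes 1 rather than 300, which is how taking the minimum discounts trivial overlap.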

14 changes: 14 additions & 0 deletions docs/METRICS.md
@@ -203,6 +203,7 @@ Per-developer report combining multiple metrics.
| Pace | commits / active_days (smooths bursts — a dev with 100 commits on 2 days and silence for 28 shows pace=50, which reads as a steady rate but isn't) |
| Weekend % | commits on Saturday+Sunday / total commits × 100 |
| Scope | Top 5 directories by unique file count, as % of total files touched |
| Specialization | Herfindahl index over the **full** per-directory file-count distribution: Σ pᵢ² where pᵢ is the share of the dev's files in directory i. 1 = all files in one directory (narrow specialist); 1/N for a uniform spread across N directories; approaches 0 as the distribution widens. Computed before the top-5 Scope truncation so it reflects actual breadth. Labels (see `specBroadGeneralistMax`, `specBalancedMax`, `specFocusedMax` constants): `< 0.15` broad generalist, `< 0.35` balanced, `< 0.7` focused specialist, `≥ 0.7` narrow specialist. Herfindahl, not Gini, because Gini would collapse "1 file in 1 dir" and "1 file in each of 5 dirs" to the same value (both have zero inequality among buckets), which misses the specialization distinction. **Measures file distribution, not domain expertise** — see caveat below. **Display vs raw:** CLI and HTML show the value rounded to 3 decimals (`%.3f`) for readability; JSON output preserves the full float64. Band classification runs against the raw float, so a value like 0.149 lands in `broad generalist` even though %.2f would have rounded it to `0.15`. JSON consumers that reproduce the banding must use the raw value, not a rounded version. |
| Contribution type | Based on del/add ratio: growth (<0.4), balanced (0.4-0.8), refactor (>0.8) |
| Collaborators | Top 5 devs sharing code with this dev. Ranked by `shared_lines` (Σ min(linesA, linesB) across shared files), tiebreak `shared_files`, then email. Same `shared_lines` semantics as the Developer Network metric — discounts trivial one-line touches so "collaborator" reflects real overlap. |
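
The Herfindahl-versus-Gini argument in the Specialization row can be checked with a small sketch. The `herfindahl` function follows the Σ pᵢ² definition given above; the `gini` function uses the standard mean-absolute-difference form and is an assumption added for comparison only, since the PR does not include a Gini implementation. Both assume non-negative counts.

```go
package main

import "fmt"

// herfindahl returns the sum of squared shares of each bucket.
func herfindahl(counts []int) float64 {
	total := 0
	for _, c := range counts {
		total += c
	}
	if total == 0 {
		return 0
	}
	h := 0.0
	for _, c := range counts {
		p := float64(c) / float64(total)
		h += p * p
	}
	return h
}

// gini returns sum over all ordered pairs of |x_i - x_j|, divided by
// 2 * n * sum(x). It is 0 whenever all buckets are equal.
func gini(counts []int) float64 {
	n := len(counts)
	total := 0
	for _, c := range counts {
		total += c
	}
	if n == 0 || total == 0 {
		return 0
	}
	diff := 0
	for _, a := range counts {
		for _, b := range counts {
			if a > b {
				diff += a - b
			} else {
				diff += b - a
			}
		}
	}
	return float64(diff) / float64(2*n*total)
}

func main() {
	oneDir := []int{1}              // 1 file in 1 dir
	fiveDirs := []int{1, 1, 1, 1, 1} // 1 file in each of 5 dirs
	fmt.Printf("one dir:   H=%.3f G=%.3f\n", herfindahl(oneDir), gini(oneDir))
	fmt.Printf("five dirs: H=%.3f G=%.3f\n", herfindahl(fiveDirs), gini(fiveDirs))
}
```

Gini is 0 for both inputs, while Herfindahl separates them (1 versus 1/5), which is the distinction the metric needs.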

@@ -270,6 +271,9 @@ Every classification boundary is a named constant in `internal/stats/stats.go`.
| `contribBalancedRatio` | `0.4` | `0.4 ≤ del/add < 0.8` → `balanced`; below 0.4 → `growth`. |
| `refactorMinFiles` | `10` | Minimum files for a commit to be a mechanical-refactor candidate (coupling filter). |
| `refactorMaxChurnPerFile` | `5.0` | Mean churn per file below this in a candidate commit → treated as refactor; its pairs are excluded from coupling. |
| `specBroadGeneralistMax` | `0.15` | Specialization Herfindahl `< 0.15` → `broad generalist` label in dev profile. |
| `specBalancedMax` | `0.35` | `0.15 ≤ H < 0.35` → `balanced`. |
| `specFocusedMax` | `0.7` | `0.35 ≤ H < 0.7` → `focused specialist`; `H ≥ 0.7` → `narrow specialist`. |
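
For JSON consumers that reproduce the banding, the sketch below shows why the raw value has to be used. The constants carry the values from the table above and `label` mirrors the `specLabel` function added in this PR; the sample value 0.1496 is taken from the code comments, and rounding to two decimals is shown only as the failure case.

```go
package main

import (
	"fmt"
	"math"
)

// Threshold values as listed in the constants table.
const (
	specBroadGeneralistMax = 0.15
	specBalancedMax        = 0.35
	specFocusedMax         = 0.7
)

// label mirrors specLabel: strict less-than at every boundary.
func label(h float64) string {
	switch {
	case h < specBroadGeneralistMax:
		return "broad generalist"
	case h < specBalancedMax:
		return "balanced"
	case h < specFocusedMax:
		return "focused specialist"
	default:
		return "narrow specialist"
	}
}

func main() {
	raw := 0.1496
	rounded := math.Round(raw*100) / 100 // 0.15 after two-decimal rounding

	fmt.Printf("raw     %.4f -> %s\n", raw, label(raw))
	fmt.Printf("rounded %.2f -> %s\n", rounded, label(rounded))
}
```

The raw value stays in `broad generalist`, while the rounded one crosses the 0.15 boundary and is labelled `balanced`.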

### Reproducibility

@@ -350,3 +354,13 @@ If you need the label to reflect true age, either extract without `--since` (the
- **Renames reverted (cycle A→B→A).** The resolver bails out of the cycle with the current path; it doesn't crash but the "canonical" is implementation-defined for cyclic inputs.
- **Repo with single file.** The median-based `cold` threshold degenerates (median is that file's churn); the single file is never classified `cold`.
- **All files with identical churn.** Median equals every value, `lowChurn = median × 0.5`, so nothing is `cold`. Everything falls into the bf/age/trend tree.

### Dev specialization measures distribution, not expertise

The `Specialization` number and its label (`broad generalist` … `narrow specialist`) describe **where the dev's files live on disk**, not their semantic area of expertise. The two diverge whenever the person's domain cuts across the directory structure rather than aligning with it:

- A security engineer who audited and refactored auth across `api/`, `web/`, `gateway/`, and `services/` touches at least four dirs. Herfindahl is low (0.25 for an even split over exactly four directories, lower once nested subdirectories add buckets), so the label reads "balanced" or "broad generalist", yet the person is a domain specialist whose domain happens to be cross-cutting.
- A release engineer who maintains CI/CD config scattered across `.github/`, `docker/`, `scripts/`, and `deploy/` lands the same way.
- Conversely, a generalist who happened to do a big one-off refactor of a single module in the recent window looks like a "narrow specialist" for the snapshot.

The label is a shortcut for reading the Herfindahl value. Use it when directory structure aligns with domains (one dir per module); cross-reference with `TopFiles`, `Scope`, and `Collaborators` to confirm when the repo is organized along another axis (e.g. monorepo with service boundaries cutting across dirs, or a library where concerns are horizontal). The raw Herfindahl value is objective; the interpretation of the label is not.
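
To make the bucketing concrete, the sketch below groups file paths by parent directory the way the `DevProfiles` change in this PR does: everything before the last `/`, with root-level files collapsed into `.`. The `bucketCounts` helper and all file paths are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// dirOf returns the parent directory of a slash-separated path, or "."
// for root-level files.
func dirOf(path string) string {
	if idx := strings.LastIndex(path, "/"); idx >= 0 {
		return path[:idx]
	}
	return "."
}

// bucketCounts is a hypothetical helper that counts files per directory.
func bucketCounts(paths []string) map[string]int {
	counts := make(map[string]int)
	for _, p := range paths {
		counts[dirOf(p)]++
	}
	return counts
}

func main() {
	// A cross-cutting auth specialist: one domain, four directory trees.
	security := []string{
		"api/auth/login.go",
		"api/auth/token.go",
		"web/auth/session.go",
		"gateway/auth/filter.go",
		"services/auth/verify.go",
	}
	// A dev who only touches root-level files.
	rootOnly := []string{"README.md", "Makefile", "go.mod"}

	fmt.Println(bucketCounts(security))
	fmt.Println(bucketCounts(rootOnly))
}
```

The auth work spreads over four buckets even though it is a single domain, while the three root-level files share one bucket. Note that the bucket is the full parent path, so nested subdirectories each count separately.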
4 changes: 2 additions & 2 deletions internal/report/profile_template.go
@@ -53,8 +53,8 @@ footer { margin-top: 40px; padding-top: 16px; border-top: 1px solid #d0d7de; col
</div>

<div style="margin-bottom:16px;">
<div style="font-size:13px; font-weight:600; margin-bottom:2px;">Scope</div>
<div class="hint" style="margin-bottom:6px;">Where this developer works, by share of files touched per directory. One dominant bar = specialist; evenly split = generalist or cross-team.</div>
<div style="font-size:13px; font-weight:600; margin-bottom:2px;">Scope <span style="font-size:11px; color:#656d76; font-style:italic; margin-left:4px;">Specialization {{printf "%.3f" .Profile.Specialization}} — {{if lt .Profile.Specialization 0.15}}broad generalist{{else if lt .Profile.Specialization 0.35}}balanced{{else if lt .Profile.Specialization 0.7}}focused specialist{{else}}narrow specialist{{end}}</span></div>
<div class="hint" style="margin-bottom:6px;">Where this developer works, by share of files touched per directory. The specialization number is the Herfindahl index over the full per-directory distribution: 1 = all files in a single directory, 1/N for a uniform spread across N directories (approaches 0 as N grows).</div>
<div style="display:flex; height:28px; border-radius:4px; overflow:hidden; gap:1px;">
{{range $i, $s := .Profile.Scope}}<div style="flex:{{printf "%.0f" $s.Pct}}; background:{{index (list "#0969da" "#2da44e" "#8250df" "#bf8700" "#cf222e") $i}}; display:flex; align-items:center; justify-content:center; color:#fff; font-size:10px; min-width:30px; overflow:hidden;" title="{{$s.Dir}} — {{$s.Files}} files ({{printf "%.0f" $s.Pct}}%)">{{if gt $s.Pct 8.0}}{{$s.Dir}} {{printf "%.0f" $s.Pct}}%{{end}}</div>{{end}}
</div>
3 changes: 3 additions & 0 deletions internal/report/template.go
@@ -294,6 +294,9 @@ footer { margin-top: 40px; padding-top: 16px; border-top: 1px solid #d0d7de; col
<span style="color:#656d76;">Scope</span>
<span>{{range $i, $s := .Scope}}{{if $i}}, {{end}}<b>{{$s.Dir}}</b> ({{printf "%.0f" $s.Pct}}%){{end}}</span>

<span style="color:#656d76;">Specialization</span>
<span>{{printf "%.3f" .Specialization}} <span style="color:#656d76;">({{if lt .Specialization 0.15}}broad generalist{{else if lt .Specialization 0.35}}balanced{{else if lt .Specialization 0.7}}focused specialist{{else}}narrow specialist{{end}})</span></span>

<span style="color:#656d76;">Contribution</span>
<span>{{if eq .ContribType "growth"}}<span style="color:#2da44e;">{{.ContribType}}</span>{{else if eq .ContribType "refactor"}}<span style="color:#cf222e;">{{.ContribType}}</span>{{else}}<span style="color:#bf8700;">{{.ContribType}}</span>{{end}} <span style="color:#656d76;">(ratio {{printf "%.2f" .ContribRatio}} · +{{.Additions}} −{{.Deletions}})</span></span>

22 changes: 22 additions & 0 deletions internal/stats/format.go
@@ -9,6 +9,22 @@ import (
"text/tabwriter"
)

// specLabel turns a DevProfile.Specialization (Herfindahl) value into a
// short human-readable classification. Thresholds live in stats.go as
// named constants; the HTML templates repeat the same values as literals
// (0.15 / 0.35 / 0.7), so the two must be kept in sync.
func specLabel(h float64) string {
switch {
case h < specBroadGeneralistMax:
return "broad generalist"
case h < specBalancedMax:
return "balanced"
case h < specFocusedMax:
return "focused specialist"
default:
return "narrow specialist"
}
}

func JoinDevs(devs []string) string {
if len(devs) <= 3 {
return strings.Join(devs, ", ")
@@ -397,6 +413,12 @@ func (f *Formatter) PrintProfiles(profiles []DevProfile) error {
fmt.Fprintf(f.w, "%s (%.0f%%)", s.Dir, s.Pct)
}
fmt.Fprintln(f.w)
// %.3f (not %.2f): labels are assigned at thresholds 0.15 / 0.35
// / 0.7 using the unrounded float. With %.2f a value like
// 0.149 displays as "0.15" and the "broad generalist" label
// reads as inconsistent with the shown number. %.3f keeps
// the boundary distinguishable (0.149 vs 0.150).
fmt.Fprintf(f.w, " Specialization: %.3f (%s)\n", p.Specialization, specLabel(p.Specialization))
fmt.Fprintf(f.w, " Contribution: %s (ratio %.2f — add: %d, del: %d)\n", p.ContribType, p.ContribRatio, p.Additions, p.Deletions)
fmt.Fprintf(f.w, " Pace: %.1f commits/active day\n", p.Pace)
fmt.Fprintf(f.w, " Collaboration: ")
79 changes: 76 additions & 3 deletions internal/stats/stats.go
@@ -34,6 +34,17 @@ const (
// additions. Strict < threshold: a commit with mean exactly 5.0 is
// NOT filtered.
refactorMaxChurnPerFile = 5.0

// Developer specialization labels, applied to DevProfile.Specialization
// (Herfindahl over per-directory file distribution). Tuned so that
// plausible repo shapes land in the expected band:
// uniform spread over 7+ dirs → broad generalist
// roughly even spread over 3-6 dirs → balanced (two dirs never score below 0.5)
// one dir clearly dominant (~60-85% of files) → focused specialist
// ≥ 85% of files in one dir → narrow specialist
specBroadGeneralistMax = 0.15
specBalancedMax = 0.35
specFocusedMax = 0.7
)

type ContributorStat struct {
@@ -120,6 +131,51 @@ type DevEdge struct {
Weight float64 // shared_files / max(files_A, files_B) * 100 (legacy)
}

// herfindahl returns the Herfindahl–Hirschman concentration index of a
// sample of non-negative values: Σ (pᵢ)² where pᵢ = valueᵢ / Σ value.
//
// Unlike Gini (which measures inequality between buckets and so returns 0
// for both "100% in 1 bucket" and "evenly across N buckets"), Herfindahl
// distinguishes these cases:
// 100% in 1 bucket → 1 (maximal concentration / specialization)
// evenly across N buckets → 1/N (approaches 0 as N grows)
// This matches the specialization semantics needed here: a developer
// working in a single directory is maximally specialized, a developer
// spread across many directories is a generalist.
//
// Returns 0 for empty input or zero-sum input; returns 1 for a single
// non-zero bucket. Returns full float64 precision; callers that need
// to display the value should round at format time (the CLI and HTML
// templates use %.3f). Rounding inside this function caused
// quantization-induced label misclassification at band boundaries: a
// true value of 0.1496 would round to 0.150 and flip from
// "broad generalist" to "balanced".
func herfindahl(values []int) float64 {
if len(values) == 0 {
return 0
}
var sum int64
for _, v := range values {
if v < 0 {
v = 0
}
sum += int64(v)
}
if sum == 0 {
return 0
}
total := float64(sum)
var h float64
for _, v := range values {
if v <= 0 {
continue
}
p := float64(v) / total
h += p * p
}
return h
}

type StatsFlags struct {
CouplingMinChanges int
NetworkMinFiles int
@@ -689,6 +745,7 @@ type DevProfile struct {
LastDate string
TopFiles []DevFileContrib
Scope []DirScope
Specialization float64 // Herfindahl over dir file-count distribution: 1 = single-dir specialist, approaches 0 for broad generalists
ContribRatio float64 // del/add — 0=growth, ~1=rewrite, >1=cleanup
ContribType string // "growth", "balanced", "refactor"
Pace float64 // commits per active day
@@ -886,11 +943,17 @@ func DevProfiles(ds *Dataset, filterEmail string) []DevProfile {
wpct = math.Round(float64(weekend)/float64(total)*1000) / 10
}

// Scope: top directories by file count
// Scope: top directories by file count. Root-level files (no "/"
// in path) collapse into "." so they form a single bucket instead
// of each filename becoming its own pseudo-directory. Matches the
// convention in DirectoryStats and keeps Specialization honest —
// otherwise a dev who only touches README, Makefile, go.mod, etc.
// appears as a broad generalist across N pseudo-dirs instead of
// a narrow specialist on the repo root.
dirCount := make(map[string]int)
if files, ok := devFiles[email]; ok {
for path := range files {
dir := path
dir := "."
if idx := strings.LastIndex(path, "/"); idx >= 0 {
dir = path[:idx]
}
@@ -912,6 +975,16 @@ func DevProfiles(ds *Dataset, filterEmail string) []DevProfile {
}
return scope[i].Dir < scope[j].Dir
})
// Specialization index: Herfindahl over the FULL per-directory
// file-count distribution (before truncation to top 5). 1.0 = all
// files in one directory (narrow specialist); ~0 = spread across
// many dirs (broad generalist). See herfindahl() for why this
// captures concentration rather than inequality.
specValues := make([]int, 0, len(dirCount))
for _, count := range dirCount {
specValues = append(specValues, count)
}
specialization := herfindahl(specValues)
if len(scope) > 5 {
scope = scope[:5]
}
@@ -962,7 +1035,7 @@ func DevProfiles(ds *Dataset, filterEmail string) []DevProfile {
Commits: cs.Commits, Additions: cs.Additions, Deletions: cs.Deletions,
LinesChanged: cs.Additions + cs.Deletions, FilesTouched: cs.FilesTouched,
ActiveDays: cs.ActiveDays, FirstDate: cs.FirstDate, LastDate: cs.LastDate,
TopFiles: topFiles, Scope: scope,
TopFiles: topFiles, Scope: scope, Specialization: specialization,
ContribRatio: contribRatio, ContribType: contribType,
Pace: pace, Collaborators: collabs,
MonthlyActivity: monthly, WorkGrid: grid, WeekendPct: wpct,