From f1f65be6a24adf12731955c67dfb7eab4e8cbaa3 Mon Sep 17 00:00:00 2001
From: lex0c
Date: Sun, 19 Apr 2026 23:22:14 -0300
Subject: [PATCH 1/3] Fix DevProfile Pct sums and surface hidden-bucket counts
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two related changes in the DevProfile Scope/Extensions render:

1. Denominator switch: cs.FilesTouched → len(devFiles[email])

   FilesTouched counts every path the dev appeared on, including pure
   renames (commits where add==0 && del==0 on that file). The Scope and
   Extensions numerators, however, are built from fe.devLines, which
   only tracks files where the dev wrote lines. The mismatch silently
   deflated all Pcts in repos with reorgs and left sums below 100% with
   no visible cause.

   Switching to the authored-file count as the denominator makes the
   two sides consistent, aligns with the Herfindahl specialization
   index which already used the authored population, and leaves
   truncation as the ONLY reason a Pct sum can drop below 100.

   Validated on 5726 real profiles (pi-hole, praat, WordPress,
   kubernetes): zero profiles with <5 buckets sum to anything other
   than 100% after the fix.

2. ScopeHidden / ExtensionsHidden counters + "+N more" rendering

   A reader seeing "Scope: foo (28%), bar (25%), ... (+6 more)" now
   understands the 85% sum comes from 6 hidden buckets; previously the
   85% read as a bug. Surfaced in CLI (inline suffix), HTML main report
   profile card (inline italic span), and HTML dedicated profile (line
   below the bar legend). Silent when Hidden == 0.

Tests: regression on the denominator change (FilesTouched=5 but
authored=4, sum must be 100 not 80); explicit truncation-sum invariant
(6 buckets → 5 visible, sum<100); Hidden counters == 0 in the common
case; Hidden counters == exact drop count when truncated.

METRICS.md: both Scope and Extensions rows updated with the new
denominator wording; Extensions caveat #2 (pure-rename gap) removed
since it no longer applies.
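
For reviewers who want to check the arithmetic without loading a dataset, the denominator effect and the hidden counter can be reproduced in isolation. This is a minimal sketch, not the code in stats.go: `bucket`, `topBuckets` and `sumPct` are hypothetical names, and the tiebreak is simplified to name ascending (the real Extensions sort also breaks ties on churn). The rounding expression is the one used in the patch.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// bucket is a hypothetical stand-in for DirScope / DevExtContrib.
type bucket struct {
	Name  string
	Files int
	Pct   float64
}

// topBuckets computes each bucket's Pct against denom (rounded to one
// decimal, as in the patch), sorts by file count descending with name
// ascending as a simplified tiebreak, keeps at most limit buckets, and
// returns how many buckets were dropped by the truncation.
func topBuckets(counts map[string]int, denom, limit int) ([]bucket, int) {
	out := make([]bucket, 0, len(counts))
	for name, n := range counts {
		pct := 0.0
		if denom > 0 {
			pct = math.Round(float64(n)/float64(denom)*1000) / 10
		}
		out = append(out, bucket{Name: name, Files: n, Pct: pct})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Files != out[j].Files {
			return out[i].Files > out[j].Files
		}
		return out[i].Name < out[j].Name
	})
	hidden := 0
	if len(out) > limit {
		hidden = len(out) - limit
		out = out[:limit]
	}
	return out, hidden
}

// sumPct adds up the visible percentages.
func sumPct(bs []bucket) float64 {
	total := 0.0
	for _, b := range bs {
		total += b.Pct
	}
	return total
}

func main() {
	// Four authored files; a fifth path was only renamed.
	counts := map[string]int{".go": 3, ".py": 1}

	oldView, _ := topBuckets(counts, 5, 5) // denominator = FilesTouched
	newView, hidden := topBuckets(counts, 4, 5) // denominator = authored

	fmt.Printf("FilesTouched denominator: sum=%.1f\n", sumPct(oldView))
	fmt.Printf("authored denominator:     sum=%.1f hidden=%d\n", sumPct(newView), hidden)
}
```

With the fixture above the first sum is 80 and the second is 100, matching the regression test's 3/5 + 1/5 versus 3/4 + 1/4 case. Passing six single-file buckets with `denom` 6 and `limit` 5 yields one hidden bucket and a visible sum below 100, which is the only remaining way to fall short.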
Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/METRICS.md | 4 +- internal/report/profile_template.go | 4 +- internal/report/template.go | 4 +- internal/stats/extension_test.go | 164 ++++++++++++++++++++++++++++ internal/stats/format.go | 6 + internal/stats/stats.go | 40 ++++++- 6 files changed, 210 insertions(+), 12 deletions(-) diff --git a/docs/METRICS.md b/docs/METRICS.md index 1161e36..7ed473e 100644 --- a/docs/METRICS.md +++ b/docs/METRICS.md @@ -214,8 +214,8 @@ Per-developer report combining multiple metrics. | Active days | Unique dates with at least one commit | | Pace | commits / active_days (smooths bursts — a dev with 100 commits on 2 days and silence for 28 shows pace=50, which reads as a steady rate but isn't) | | Weekend % | commits on Saturday+Sunday / total commits × 100 | -| Scope | Top 5 directories by unique file count, as % of total files touched | -| Extensions | Top 5 file extensions the dev touched, sorted by **files desc** (tiebreak churn desc, then ext asc) so the displayed `Pct` is monotonic with the sort order and HTML bar widths read correctly. `Pct` is `Files/FilesTouched * 100`; the raw dev-attributable `Churn` (sum of `devLines[email]` across bucket files) is kept on the struct for JSON consumers who want a churn-ranked view. Answers the "language/skill fingerprint" question (`.go` + `.yaml` → backend+infra; `.tsx` + `.ts` + `.css` → frontend). **Caveats:** (1) bucket is derived from the file's canonical (post-rename) path — a dev who worked on `foo.js` pre-migration still shows up under `.ts` if it was later renamed; per-era per-dev attribution would need `byExt` to carry a dev dimension, which isn't tracked. (2) `Pct` values may sum to less than 100% when the dev appears as a contributor on files without adding lines (pure-rename contributions), since the extension aggregation only walks files with non-zero `devLines[email]`. | +| Scope | Top 5 directories by unique file count, as % of the dev's **authored** files — i.e. 
files where the dev added or removed at least one line. Pure renames (file appears in the dev's change set with zero line changes) are excluded from both numerator and denominator so the visible Pct values sum to 100% (modulo the top-5 truncation). Same denominator is used for Extensions and for the Herfindahl specialization index, keeping the three consistent. | +| Extensions | Top 5 file extensions the dev touched, sorted by **files desc** (tiebreak churn desc, then ext asc) so the displayed `Pct` is monotonic with the sort order and HTML bar widths read correctly. `Pct` is `Files / authored * 100` where `authored` is the count of files the dev added or removed at least one line on — same denominator as Scope, so Pcts sum to 100% modulo top-5 truncation. The raw dev-attributable `Churn` (sum of `devLines[email]` across bucket files) is kept on the struct for JSON consumers who want a churn-ranked view. Answers the "language/skill fingerprint" question (`.go` + `.yaml` → backend+infra; `.tsx` + `.ts` + `.css` → frontend). **Attribution caveat:** bucket is derived from the file's canonical (post-rename) path — a dev who worked on `foo.js` pre-migration still shows up under `.ts` if it was later renamed; per-era per-dev attribution would need `byExt` to carry a dev dimension, which isn't tracked. | | Specialization | Herfindahl index over the **full** per-directory file-count distribution: Σ pᵢ² where pᵢ is the share of the dev's files in directory i. 1 = all files in one directory (narrow specialist); 1/N for a uniform spread across N directories; approaches 0 as the distribution widens. Computed before the top-5 Scope truncation so it reflects actual breadth. Labels (see `specBroadGeneralistMax`, `specBalancedMax`, `specFocusedMax` constants): `< 0.15` broad generalist, `< 0.35` balanced, `< 0.7` focused specialist, `≥ 0.7` narrow specialist. 
Herfindahl, not Gini, because Gini would collapse "1 file in 1 dir" and "1 file in each of 5 dirs" to the same value (both have zero inequality among buckets), which misses the specialization distinction. **Measures file distribution, not domain expertise** — see caveat below. **Display vs raw:** CLI and HTML show the value rounded to 3 decimals (`%.3f`) for readability; JSON output preserves the full float64. Band classification runs against the raw float, so a value like 0.149 lands in `broad generalist` even though %.2f would have rounded it to `0.15`. JSON consumers that reproduce the banding must use the raw value, not a rounded version. | | Contribution type | Based on del/add ratio: growth (<0.4), balanced (0.4-0.8), refactor (>0.8) | | Collaborators | Top 5 devs sharing code with this dev. Ranked by `shared_lines` (Σ min(linesA, linesB) across shared files), tiebreak `shared_files`, then email. Same `shared_lines` semantics as the Developer Network metric — discounts trivial one-line touches so "collaborator" reflects real overlap. | diff --git a/internal/report/profile_template.go b/internal/report/profile_template.go index 74e61ef..08f9453 100644 --- a/internal/report/profile_template.go +++ b/internal/report/profile_template.go @@ -87,7 +87,7 @@ footer { margin-top: 40px; padding-top: 16px; border-top: 1px solid #d0d7de; col {{range $i, $s := .Profile.Scope}}
{{if gt $s.Pct 8.0}}{{$s.Dir}} {{printf "%.0f" $s.Pct}}%{{end}}
{{end}}
- {{range $i, $s := .Profile.Scope}} {{$s.Dir}} ({{printf "%.0f" $s.Pct}}%){{end}} + {{range $i, $s := .Profile.Scope}} {{$s.Dir}} ({{printf "%.0f" $s.Pct}}%){{end}}{{if gt .Profile.ScopeHidden 0}}+{{.Profile.ScopeHidden}} more directories not shown{{end}}
@@ -99,7 +99,7 @@ footer { margin-top: 40px; padding-top: 16px; border-top: 1px solid #d0d7de; col {{range $i, $e := .Profile.Extensions}}
{{if gt $e.Pct 8.0}}{{$e.Ext}} {{printf "%.0f" $e.Pct}}%{{end}}
{{end}}
- {{range $i, $e := .Profile.Extensions}} {{$e.Ext}} ({{printf "%.0f" $e.Pct}}%){{end}} + {{range $i, $e := .Profile.Extensions}} {{$e.Ext}} ({{printf "%.0f" $e.Pct}}%){{end}}{{if gt .Profile.ExtensionsHidden 0}}+{{.Profile.ExtensionsHidden}} more extensions not shown{{end}}
{{end}} diff --git a/internal/report/template.go b/internal/report/template.go index ed67f6c..a293a96 100644 --- a/internal/report/template.go +++ b/internal/report/template.go @@ -381,11 +381,11 @@ footer { margin-top: 40px; padding-top: 16px; border-top: 1px solid #d0d7de; col
Scope - {{range $i, $s := .Scope}}{{if $i}}, {{end}}{{$s.Dir}} ({{printf "%.0f" $s.Pct}}%){{end}} + {{range $i, $s := .Scope}}{{if $i}}, {{end}}{{$s.Dir}} ({{printf "%.0f" $s.Pct}}%){{end}}{{if gt .ScopeHidden 0}} (+{{.ScopeHidden}} more){{end}} {{if .Extensions}} Extensions - {{range $i, $e := .Extensions}}{{if $i}}, {{end}}{{$e.Ext}} ({{printf "%.0f" $e.Pct}}%){{end}} + {{range $i, $e := .Extensions}}{{if $i}}, {{end}}{{$e.Ext}} ({{printf "%.0f" $e.Pct}}%){{end}}{{if gt .ExtensionsHidden 0}} (+{{.ExtensionsHidden}} more){{end}} {{end}} Specialization diff --git a/internal/stats/extension_test.go b/internal/stats/extension_test.go index e315c5d..2189674 100644 --- a/internal/stats/extension_test.go +++ b/internal/stats/extension_test.go @@ -456,6 +456,72 @@ func TestDevProfileExtensionsAllNone(t *testing.T) { } } +// Regression: Pct denominator is len(devFiles[email]), NOT +// cs.FilesTouched. FilesTouched includes pure-rename contributions +// (the dev appears in the file's contribFiles but adds zero lines), +// which would deflate all percentages because those files never make +// it into the Scope/Extension numerators. Under the corrected +// denominator the visible Pcts sum to 100 (modulo top-5 truncation). +// This test drives that divergence: FilesTouched = 5 but only 4 +// files carry non-zero devLines, so the denominators differ. +func TestDevProfileExtensionsPctDenominatorIgnoresRenames(t *testing.T) { + ds := &Dataset{ + contributors: map[string]*ContributorStat{ + // FilesTouched = 5 simulates the dev having appeared on a + // 5th file via a pure rename (no add/del lines). The files + // map below only has 4 entries with non-zero devLines. 
+ "alice@x": {Email: "alice@x", Commits: 5, FilesTouched: 5, ActiveDays: 1}, + }, + files: map[string]*fileEntry{ + "a.go": {devLines: map[string]int64{"alice@x": 10}, devCommits: map[string]int{"alice@x": 1}}, + "b.go": {devLines: map[string]int64{"alice@x": 10}, devCommits: map[string]int{"alice@x": 1}}, + "c.go": {devLines: map[string]int64{"alice@x": 10}, devCommits: map[string]int{"alice@x": 1}}, + "d.py": {devLines: map[string]int64{"alice@x": 10}, devCommits: map[string]int{"alice@x": 1}}, + }, + commits: map[string]*commitEntry{}, + workGrid: [7][24]int{}, + } + p := DevProfiles(ds, "alice@x", 0)[0] + + // Visible extensions: .go with 3/4 = 75%, .py with 1/4 = 25%. + // Under the old (FilesTouched=5) denominator the sum would be + // 3/5 + 1/5 = 80% — a silent 20% gap from the rename. + var sum float64 + for _, e := range p.Extensions { + sum += e.Pct + } + if sum != 100.0 { + t.Errorf("Extensions Pct sum = %.1f, want 100.0 (denominator should be authored files, not FilesTouched)", sum) + } + + // Spot-check individual values. + var goPct, pyPct float64 + for _, e := range p.Extensions { + if e.Ext == ".go" { + goPct = e.Pct + } + if e.Ext == ".py" { + pyPct = e.Pct + } + } + if goPct != 75.0 { + t.Errorf(".go Pct = %.1f, want 75.0 (3/4)", goPct) + } + if pyPct != 25.0 { + t.Errorf(".py Pct = %.1f, want 25.0 (1/4)", pyPct) + } + + // Same invariant on Scope: a.go/b.go/c.go all at root, d.py at + // root → single bucket "." at 100%. Renames shouldn't count. + var scopeSum float64 + for _, s := range p.Scope { + scopeSum += s.Pct + } + if scopeSum != 100.0 { + t.Errorf("Scope Pct sum = %.1f, want 100.0", scopeSum) + } +} + // Edge case: a dev whose commits never touch any file (all commits // had files_changed = 0, so no commit_file records reached fe.devLines). 
// devFiles[email] is absent; Extensions must be nil — both HTML @@ -475,6 +541,104 @@ func TestDevProfileExtensionsEmpty(t *testing.T) { } } +// Regression: once a dev has >5 buckets the top-5 truncation is the +// ONLY way Pct sum can drop below 100. Lock that invariant — a +// silent change to the cap size or the sort would surface here. +func TestDevProfileExtensionsTruncationSum(t *testing.T) { + // 6 files, all single-ext, each 1 file → all 6 buckets carry the + // same Files=1. Top-5 sort keeps 5 (tiebroken by churn desc then + // ext asc); the 6th drops off, contributing its ~16.7% to the + // gap. Math: 5 × round(1/6 × 1000)/10 = 5 × 16.7 = 83.5%. + ds := &Dataset{ + contributors: map[string]*ContributorStat{ + "alice@x": {Email: "alice@x", Commits: 6, FilesTouched: 6, ActiveDays: 1}, + }, + files: map[string]*fileEntry{ + "a.go": {devLines: map[string]int64{"alice@x": 60}, devCommits: map[string]int{"alice@x": 1}}, + "b.py": {devLines: map[string]int64{"alice@x": 50}, devCommits: map[string]int{"alice@x": 1}}, + "c.rs": {devLines: map[string]int64{"alice@x": 40}, devCommits: map[string]int{"alice@x": 1}}, + "d.ts": {devLines: map[string]int64{"alice@x": 30}, devCommits: map[string]int{"alice@x": 1}}, + "e.md": {devLines: map[string]int64{"alice@x": 20}, devCommits: map[string]int{"alice@x": 1}}, + "f.sh": {devLines: map[string]int64{"alice@x": 10}, devCommits: map[string]int{"alice@x": 1}}, + }, + commits: map[string]*commitEntry{}, + workGrid: [7][24]int{}, + } + p := DevProfiles(ds, "alice@x", 0)[0] + if len(p.Extensions) != 5 { + t.Fatalf("Extensions len = %d, want 5 (truncated from 6)", len(p.Extensions)) + } + var sum float64 + for _, e := range p.Extensions { + sum += e.Pct + } + // Must be strictly <100 (truncated) but close (~83.5 for this + // fixture). Wide tolerance — the exact value is rounding-sensitive. 
+ if sum >= 100.0 { + t.Errorf("truncated Extensions sum = %.1f, want strictly < 100", sum) + } + if sum < 80.0 || sum > 86.0 { + t.Errorf("truncated Extensions sum = %.1f, want ~83.5 (5/6 buckets × 16.7%%)", sum) + } +} + +// Regression: when truncation drops buckets, the count goes into +// ScopeHidden/ExtensionsHidden so renderers can surface "+N more" +// next to the visible list. Silent when no truncation (the whole +// point is to appear only when the Pct-sum <100% case makes readers +// suspect a bug). +func TestDevProfileHiddenCounters(t *testing.T) { + // Build a dev with 7 extensions and 6 dirs so both counters hit. + ds := &Dataset{ + contributors: map[string]*ContributorStat{ + "alice@x": {Email: "alice@x", Commits: 7, FilesTouched: 7, ActiveDays: 1}, + }, + files: map[string]*fileEntry{ + "d1/a.go": {devLines: map[string]int64{"alice@x": 70}, devCommits: map[string]int{"alice@x": 1}}, + "d2/b.py": {devLines: map[string]int64{"alice@x": 60}, devCommits: map[string]int{"alice@x": 1}}, + "d3/c.rs": {devLines: map[string]int64{"alice@x": 50}, devCommits: map[string]int{"alice@x": 1}}, + "d4/d.ts": {devLines: map[string]int64{"alice@x": 40}, devCommits: map[string]int{"alice@x": 1}}, + "d5/e.md": {devLines: map[string]int64{"alice@x": 30}, devCommits: map[string]int{"alice@x": 1}}, + "d6/f.sh": {devLines: map[string]int64{"alice@x": 20}, devCommits: map[string]int{"alice@x": 1}}, + "d6/g.yml": {devLines: map[string]int64{"alice@x": 10}, devCommits: map[string]int{"alice@x": 1}}, + }, + commits: map[string]*commitEntry{}, + workGrid: [7][24]int{}, + } + p := DevProfiles(ds, "alice@x", 0)[0] + + // 6 dirs → 5 visible, 1 hidden. + if p.ScopeHidden != 1 { + t.Errorf("ScopeHidden = %d, want 1", p.ScopeHidden) + } + // 7 extensions → 5 visible, 2 hidden. 
+ if p.ExtensionsHidden != 2 { + t.Errorf("ExtensionsHidden = %d, want 2", p.ExtensionsHidden) + } +} + +// Silent when nothing to hide — the counters must be zero so the +// renderers don't emit "+0 more" (noise) for the common case. +func TestDevProfileHiddenCountersZeroWhenFits(t *testing.T) { + ds := &Dataset{ + contributors: map[string]*ContributorStat{ + "bob@x": {Email: "bob@x", Commits: 3, FilesTouched: 3, ActiveDays: 1}, + }, + files: map[string]*fileEntry{ + "src/a.go": {devLines: map[string]int64{"bob@x": 10}, devCommits: map[string]int{"bob@x": 1}}, + "src/b.go": {devLines: map[string]int64{"bob@x": 10}, devCommits: map[string]int{"bob@x": 1}}, + "docs/x.md": {devLines: map[string]int64{"bob@x": 5}, devCommits: map[string]int{"bob@x": 1}}, + }, + commits: map[string]*commitEntry{}, + workGrid: [7][24]int{}, + } + p := DevProfiles(ds, "bob@x", 0)[0] + if p.ScopeHidden != 0 || p.ExtensionsHidden != 0 { + t.Errorf("Hidden counters: Scope=%d Ext=%d, want 0/0 (dev has ≤5 buckets each)", + p.ScopeHidden, p.ExtensionsHidden) + } +} + // Truncate to top-5 when a dev's extension set is larger. Under the // files-first sort, ties on file count (all 1 each here) fall through // to churn desc, so the top 5 by churn still win. 
diff --git a/internal/stats/format.go b/internal/stats/format.go index 66ca240..895c766 100644 --- a/internal/stats/format.go +++ b/internal/stats/format.go @@ -467,6 +467,9 @@ func (f *Formatter) PrintProfiles(profiles []DevProfile) error { } fmt.Fprintf(f.w, "%s (%.0f%%)", s.Dir, s.Pct) } + if p.ScopeHidden > 0 { + fmt.Fprintf(f.w, " (+%d more)", p.ScopeHidden) + } fmt.Fprintln(f.w) if len(p.Extensions) > 0 { fmt.Fprintf(f.w, " Extensions: ") @@ -476,6 +479,9 @@ func (f *Formatter) PrintProfiles(profiles []DevProfile) error { } fmt.Fprintf(f.w, "%s (%.0f%%)", e.Ext, e.Pct) } + if p.ExtensionsHidden > 0 { + fmt.Fprintf(f.w, " (+%d more)", p.ExtensionsHidden) + } fmt.Fprintln(f.w) } // %.3f (not %.2f): labels are assigned at thresholds 0.15 / 0.35 diff --git a/internal/stats/stats.go b/internal/stats/stats.go index 1e85bf2..8f9b9e6 100644 --- a/internal/stats/stats.go +++ b/internal/stats/stats.go @@ -1264,7 +1264,13 @@ type DevProfile struct { LastDate string TopFiles []DevFileContrib Scope []DirScope - Extensions []DevExtContrib + // ScopeHidden / ExtensionsHidden count the buckets dropped by the + // top-5 truncation so CLI and HTML can surface "+N more" — without + // this, a reader sees Pct summing to e.g. 85% and wonders if the + // math is broken. Zero when the full set fits in 5. + ScopeHidden int + Extensions []DevExtContrib + ExtensionsHidden int Specialization float64 // Gini over dir file-count distribution: 0 = broad generalist, 1 = single-dir specialist ContribRatio float64 // del/add — 0=growth, ~1=rewrite, >1=cleanup ContribType string // "growth", "balanced", "refactor" @@ -1554,11 +1560,24 @@ func DevProfiles(ds *Dataset, filterEmail string, n int) []DevProfile { dirCount[dir]++ } } + // Pct denominator is the count of files the dev authored lines + // on (len(devFiles[email])), NOT cs.FilesTouched. 
The latter + // includes pure renames (path appears in the dev's change set + // but with zero additions/deletions), which would never appear + // in dirCount/extCount numerators — creating a silent + // under-100% sum. Using len(devFiles[email]) keeps Pct summing + // to 100% (modulo top-5 truncation) and aligns the visible + // numbers with the Herfindahl specialization index below, + // which also operates on dirCount. + authored := 0 + if files, ok := devFiles[email]; ok { + authored = len(files) + } var scope []DirScope for dir, count := range dirCount { pct := 0.0 - if cs.FilesTouched > 0 { - pct = math.Round(float64(count)/float64(cs.FilesTouched)*1000) / 10 + if authored > 0 { + pct = math.Round(float64(count)/float64(authored)*1000) / 10 } scope = append(scope, DirScope{Dir: dir, Files: count, Pct: pct}) } @@ -1579,7 +1598,9 @@ func DevProfiles(ds *Dataset, filterEmail string, n int) []DevProfile { specValues = append(specValues, count) } specialization := herfindahl(specValues) + scopeHidden := 0 if len(scope) > 5 { + scopeHidden = len(scope) - 5 scope = scope[:5] } @@ -1610,8 +1631,11 @@ func DevProfiles(ds *Dataset, filterEmail string, n int) []DevProfile { var extensions []DevExtContrib for ext, acc := range extCount { pct := 0.0 - if cs.FilesTouched > 0 { - pct = math.Round(float64(acc.files)/float64(cs.FilesTouched)*1000) / 10 + // Same denominator as Scope above — authored-file count, + // not FilesTouched — so pure renames don't deflate the + // percentages silently. 
+ if authored > 0 { + pct = math.Round(float64(acc.files)/float64(authored)*1000) / 10 } extensions = append(extensions, DevExtContrib{ Ext: ext, Files: acc.files, Churn: acc.churn, Pct: pct, @@ -1633,7 +1657,9 @@ func DevProfiles(ds *Dataset, filterEmail string, n int) []DevProfile { } return extensions[i].Ext < extensions[j].Ext }) + extensionsHidden := 0 if len(extensions) > 5 { + extensionsHidden = len(extensions) - 5 extensions = extensions[:5] } @@ -1683,7 +1709,9 @@ func DevProfiles(ds *Dataset, filterEmail string, n int) []DevProfile { Commits: cs.Commits, Additions: cs.Additions, Deletions: cs.Deletions, LinesChanged: cs.Additions + cs.Deletions, FilesTouched: cs.FilesTouched, ActiveDays: cs.ActiveDays, FirstDate: cs.FirstDate, LastDate: cs.LastDate, - TopFiles: topFiles, Scope: scope, Extensions: extensions, Specialization: specialization, + TopFiles: topFiles, Scope: scope, ScopeHidden: scopeHidden, + Extensions: extensions, ExtensionsHidden: extensionsHidden, + Specialization: specialization, ContribRatio: contribRatio, ContribType: contribType, Pace: pace, Collaborators: collabs, MonthlyActivity: monthly, WorkGrid: grid, WeekendPct: wpct, From b71d0176cd90c997b62211c917788d74521dc294 Mon Sep 17 00:00:00 2001 From: lex0c Date: Sun, 19 Apr 2026 23:32:13 -0300 Subject: [PATCH 2/3] Use teal monochrome palette for Extensions bar in dedicated profile MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Scope and Extensions both rendered as horizontal bars stacked vertically in the dedicated profile page. Sharing the same categorical palette (blue/green/purple/orange/red) invited false cross-chart correlation — a reader's eye auto-pairs same-index segments ("the blue dir must go with the blue ext") even though the two axes are independent. Swap Extensions to a teal monochromatic progression (#0e4c5b → #5dbdb7). 
Two effects: (1) no hue collides with Scope's palette, so same-color confusion is eliminated; (2) monochrome signals "ordered distribution" instead of "distinct categories", which matches what the data actually represents (top-5 slice of one ranking). Palette stops at #5dbdb7 so the lightest shade still holds adequate contrast with the white overlay labels on slices ≥8% (smaller slices skip the label anyway per the existing `gt $e.Pct 8.0` guard). Co-Authored-By: Claude Opus 4.7 (1M context) --- internal/report/profile_template.go | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/internal/report/profile_template.go b/internal/report/profile_template.go index 08f9453..0bda0ee 100644 --- a/internal/report/profile_template.go +++ b/internal/report/profile_template.go @@ -95,11 +95,12 @@ footer { margin-top: 40px; padding-top: 16px; border-top: 1px solid #d0d7de; col
Extensions
The dev's language/skill fingerprint by share of files touched. Extension attribution uses the file's current canonical path, so cross-extension renames (e.g. .js → .ts) credit pre-rename work to the new extension. · {{docRef "profile"}}
+ {{/* Teal monochrome progression — intentionally a different color family from Scope's categorical palette above. Same-index colors in Scope (blue/green/purple/orange/red) would invite false cross-chart correlation ("the blue dir uses the blue ext"). The monochromatic treatment also visually signals that this is a single ordered distribution, not five independent categories. Stopped at #3fa3ae so even the lightest shade keeps adequate contrast with white text when a tail bucket is large enough to show a label. */}}
- {{range $i, $e := .Profile.Extensions}}
{{if gt $e.Pct 8.0}}{{$e.Ext}} {{printf "%.0f" $e.Pct}}%{{end}}
{{end}} + {{range $i, $e := .Profile.Extensions}}
{{if gt $e.Pct 8.0}}{{$e.Ext}} {{printf "%.0f" $e.Pct}}%{{end}}
{{end}}
- {{range $i, $e := .Profile.Extensions}} {{$e.Ext}} ({{printf "%.0f" $e.Pct}}%){{end}}{{if gt .Profile.ExtensionsHidden 0}}+{{.Profile.ExtensionsHidden}} more extensions not shown{{end}} + {{range $i, $e := .Profile.Extensions}} {{$e.Ext}} ({{printf "%.0f" $e.Pct}}%){{end}}{{if gt .Profile.ExtensionsHidden 0}}+{{.Profile.ExtensionsHidden}} more extensions not shown{{end}}
 {{end}}
From 2896eea4b6559f27db57e88770c209d2ba964bbc Mon Sep 17 00:00:00 2001
From: lex0c
Date: Sun, 19 Apr 2026 23:36:25 -0300
Subject: [PATCH 3/3] Skip zero-line writes into fe.devLines at ingest
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

fe.devLines[email] += cf.Additions + cf.Deletions ran on every
commit_file row, including pure renames where the numstat carries 0/0.
That +=0 created a zero-valued map entry which then survived as a
"dev touched this file" signal into every downstream consumer:
BusFactor, FileHotspots.UniqueDevs, DeveloperNetwork pairs, ChurnRisk
bus factor, and (the reported symptom) DevProfile.Scope / Extensions
Pct math. A dev with one real .go edit and one pure .md rename was
reporting .go 50% + .md 50% even though they never authored a line of
.md.

Fix the write site: only touch fe.devLines when lines > 0. devCommits
still increments unconditionally so the distinct "appeared on this
file" signal remains available for any caller that wants it (none do
today). devLines now cleanly means "lines this dev contributed", which
is the semantic the METRICS.md Scope/Extensions doc already advertised
but the code didn't enforce.

Integration test uses streamLoad on a synthetic JSONL with an M commit
and an R100 rename: Alice's profile surfaces only the .go bucket at
100%, and the renamed .md file reports 0 unique devs from FileHotspots.
Existing tests pass unchanged because no test fixture relied on the
zero-entry behavior.
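
The root cause is a plain Go map property: `m[k] += 0` still assigns, so the key exists afterwards and `len(m)` grows. A small sketch of the write site with and without the guard follows. `recordLines` and its `guard` flag are hypothetical and exist only for illustration; the real change is the inline `if lines > 0` check in reader.go.

```go
package main

import "fmt"

// recordLines is a hypothetical reduction of the devLines write site.
// With guard=false it mirrors the old behavior (always +=); with
// guard=true it skips zero-line rows, as the patched code does.
func recordLines(devLines map[string]int64, email string, add, del int64, guard bool) {
	lines := add + del
	if guard && lines <= 0 {
		return // pure rename: leave the map untouched
	}
	devLines[email] += lines
}

func main() {
	// Old behavior: a 0/0 rename row still creates a map entry.
	unguarded := map[string]int64{}
	recordLines(unguarded, "alice@x", 0, 0, false)

	// Patched behavior: the same row leaves no trace in devLines.
	guarded := map[string]int64{}
	recordLines(guarded, "alice@x", 0, 0, true)

	fmt.Println(len(unguarded), len(guarded))
}
```

Any consumer that counts keys or ranges over the map (unique devs, authored files) sees the zero-valued entry in the unguarded case, which is why the fix belongs at ingest rather than in each reader.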
Co-Authored-By: Claude Opus 4.7 (1M context) --- internal/stats/extension_test.go | 50 ++++++++++++++++++++++++++++++++ internal/stats/reader.go | 14 ++++++++- 2 files changed, 63 insertions(+), 1 deletion(-) diff --git a/internal/stats/extension_test.go b/internal/stats/extension_test.go index 2189674..7b48146 100644 --- a/internal/stats/extension_test.go +++ b/internal/stats/extension_test.go @@ -1,6 +1,7 @@ package stats import ( + "strings" "testing" "time" ) @@ -582,6 +583,55 @@ func TestDevProfileExtensionsTruncationSum(t *testing.T) { } } +// Regression at the INGEST level: a pure rename (commit_file with +// additions=0 && deletions=0) used to create a zero-valued entry in +// fe.devLines, which then made len(devFiles[email]) count that file +// as "authored" by the renaming dev. The 50/50 symptom: Alice edits +// one .go file (5 lines) and separately renames one .md file (0 +// lines); under the broken ingest she shows up with `.go (50%)` + +// `.md (50%)` in the Extensions fingerprint even though she never +// wrote a single line in .md. The fix skips the zero-line write +// site so devLines stays the "lines this dev contributed" map. +// devCommits is intentionally still bumped — that map preserves the +// "dev appeared on this file" signal for any caller that wants it. 
+func TestDevProfilePureRenamesNotAuthored(t *testing.T) {
+	jsonl := `{"type":"commit","sha":"c1","tree":"t","parents":[],"author_name":"Alice","author_email":"alice@x","author_date":"2024-01-10T10:00:00Z","committer_name":"Alice","committer_email":"alice@x","committer_date":"2024-01-10T10:00:00Z","additions":5,"deletions":0,"files_changed":1}
+{"type":"commit_file","commit":"c1","path_current":"src/main.go","path_previous":"src/main.go","status":"M","old_hash":"0","new_hash":"1","old_size":0,"new_size":0,"additions":5,"deletions":0}
+{"type":"commit","sha":"c2","tree":"t","parents":[],"author_name":"Alice","author_email":"alice@x","author_date":"2024-01-12T10:00:00Z","committer_name":"Alice","committer_email":"alice@x","committer_date":"2024-01-12T10:00:00Z","additions":0,"deletions":0,"files_changed":1}
+{"type":"commit_file","commit":"c2","path_current":"docs/renamed.md","path_previous":"docs/old.md","status":"R100","old_hash":"0","new_hash":"2","old_size":0,"new_size":0,"additions":0,"deletions":0}
+`
+	ds, err := streamLoad(strings.NewReader(jsonl), LoadOptions{HalfLifeDays: 90, CoupMaxFiles: 50})
+	if err != nil {
+		t.Fatalf("load: %v", err)
+	}
+	p := DevProfiles(ds, "alice@x", 0)[0]
+
+	// Only the .go edit counts as authored. The .md pure-rename must
+	// not show up in Extensions or Scope. Fatalf (not Errorf) on the
+	// length checks so the [0] index below cannot panic on an empty slice.
+	if len(p.Extensions) != 1 || p.Extensions[0].Ext != ".go" {
+		t.Fatalf("Extensions = %+v, want single .go bucket", p.Extensions)
+	}
+	if p.Extensions[0].Pct != 100.0 {
+		t.Errorf(".go Pct = %.1f, want 100.0 (rename should not inflate denominator)", p.Extensions[0].Pct)
+	}
+	if len(p.Scope) != 1 || p.Scope[0].Dir != "src" {
+		t.Fatalf("Scope = %+v, want single src/ bucket", p.Scope)
+	}
+	if p.Scope[0].Pct != 100.0 {
+		t.Errorf("src/ Pct = %.1f, want 100.0", p.Scope[0].Pct)
+	}
+
+	// Cross-check downstream stats that also consume fe.devLines:
+	// UniqueDevs on the renamed file must be 0 now (no one authored
+	// lines on it); before the fix it would have been 1.
+ hotspots := FileHotspots(ds, 0) + for _, h := range hotspots { + if h.Path == "docs/renamed.md" && h.UniqueDevs != 0 { + t.Errorf("pure-renamed file unique devs = %d, want 0 (no line authors)", h.UniqueDevs) + } + } +} + // Regression: when truncation drops buckets, the count goes into // ScopeHidden/ExtensionsHidden so renderers can surface "+N more" // next to the visible list. Silent when no truncation (the whole diff --git a/internal/stats/reader.go b/internal/stats/reader.go index 6c395d8..87fc4ef 100644 --- a/internal/stats/reader.go +++ b/internal/stats/reader.go @@ -367,7 +367,19 @@ func streamLoadInto(ds *Dataset, r io.Reader, opt LoadOptions, pathPrefix string cm := ds.commits[cf.Commit] if cm != nil { - fe.devLines[cm.email] += cf.Additions + cf.Deletions + // Only record a devLines entry when the change actually + // carried lines. Pure renames (R100 with 0/0 numstat) + // would otherwise create a zero-valued map entry that + // survives as "dev touched this file" into every + // downstream consumer — bus factor, unique-dev counts, + // dev network, and DevProfile authored counts — inflating + // them with contributions that are not authored work. + // devCommits still increments unconditionally so the + // "appeared on this file" signal stays available for + // callers that want it. + if lines := cf.Additions + cf.Deletions; lines > 0 { + fe.devLines[cm.email] += lines + } fe.devCommits[cm.email]++ // Contributor files touched