diff --git a/docs/METRICS.md b/docs/METRICS.md
index 1161e36..7ed473e 100644
--- a/docs/METRICS.md
+++ b/docs/METRICS.md
@@ -214,8 +214,8 @@ Per-developer report combining multiple metrics.
| Active days | Unique dates with at least one commit |
| Pace | commits / active_days (smooths bursts — a dev with 100 commits on 2 days and silence for 28 shows pace=50, which reads as a steady rate but isn't) |
| Weekend % | commits on Saturday+Sunday / total commits × 100 |
-| Scope | Top 5 directories by unique file count, as % of total files touched |
-| Extensions | Top 5 file extensions the dev touched, sorted by **files desc** (tiebreak churn desc, then ext asc) so the displayed `Pct` is monotonic with the sort order and HTML bar widths read correctly. `Pct` is `Files/FilesTouched * 100`; the raw dev-attributable `Churn` (sum of `devLines[email]` across bucket files) is kept on the struct for JSON consumers who want a churn-ranked view. Answers the "language/skill fingerprint" question (`.go` + `.yaml` → backend+infra; `.tsx` + `.ts` + `.css` → frontend). **Caveats:** (1) bucket is derived from the file's canonical (post-rename) path — a dev who worked on `foo.js` pre-migration still shows up under `.ts` if it was later renamed; per-era per-dev attribution would need `byExt` to carry a dev dimension, which isn't tracked. (2) `Pct` values may sum to less than 100% when the dev appears as a contributor on files without adding lines (pure-rename contributions), since the extension aggregation only walks files with non-zero `devLines[email]`. |
+| Scope | Top 5 directories by unique file count, as % of the dev's **authored** files, i.e. files where the dev added or removed at least one line. Pure renames (the file appears in the dev's change set with zero line changes) are excluded from both numerator and denominator, so the visible Pct values sum to 100% (modulo the top-5 truncation). The same denominator is used for Extensions and for the Herfindahl specialization index, keeping the three consistent. |
+| Extensions | Top 5 file extensions among the dev's authored files, sorted by **files desc** (tiebreak churn desc, then ext asc) so the displayed `Pct` is monotonic with the sort order and HTML bar widths read correctly. `Pct` is `Files / authored * 100`, where `authored` is the count of files on which the dev added or removed at least one line (the same denominator as Scope), so Pcts sum to 100% modulo top-5 truncation. The raw dev-attributable `Churn` (sum of `devLines[email]` across bucket files) is kept on the struct for JSON consumers who want a churn-ranked view. Answers the "language/skill fingerprint" question (`.go` + `.yaml` → backend+infra; `.tsx` + `.ts` + `.css` → frontend). **Attribution caveat:** the bucket is derived from the file's canonical (post-rename) path: a dev who worked on `foo.js` pre-migration shows up under `.ts` if the file was later renamed; per-era per-dev attribution would need `byExt` to carry a dev dimension, which isn't tracked. |
| Specialization | Herfindahl index over the **full** per-directory file-count distribution: Σ pᵢ² where pᵢ is the share of the dev's files in directory i. 1 = all files in one directory (narrow specialist); 1/N for a uniform spread across N directories; approaches 0 as the distribution widens. Computed before the top-5 Scope truncation so it reflects actual breadth. Labels (see `specBroadGeneralistMax`, `specBalancedMax`, `specFocusedMax` constants): `< 0.15` broad generalist, `< 0.35` balanced, `< 0.7` focused specialist, `≥ 0.7` narrow specialist. Herfindahl, not Gini, because Gini would collapse "1 file in 1 dir" and "1 file in each of 5 dirs" to the same value (both have zero inequality among buckets), which misses the specialization distinction. **Measures file distribution, not domain expertise** — see caveat below. **Display vs raw:** CLI and HTML show the value rounded to 3 decimals (`%.3f`) for readability; JSON output preserves the full float64. Band classification runs against the raw float, so a value like 0.149 lands in `broad generalist` even though %.2f would have rounded it to `0.15`. JSON consumers that reproduce the banding must use the raw value, not a rounded version. |
| Contribution type | Based on del/add ratio: growth (<0.4), balanced (0.4-0.8), refactor (>0.8) |
| Collaborators | Top 5 devs sharing code with this dev. Ranked by `shared_lines` (Σ min(linesA, linesB) across shared files), tiebreak `shared_files`, then email. Same `shared_lines` semantics as the Developer Network metric — discounts trivial one-line touches so "collaborator" reflects real overlap. |
diff --git a/internal/report/profile_template.go b/internal/report/profile_template.go
index 74e61ef..0bda0ee 100644
--- a/internal/report/profile_template.go
+++ b/internal/report/profile_template.go
@@ -87,7 +87,7 @@ footer { margin-top: 40px; padding-top: 16px; border-top: 1px solid #d0d7de; col
{{range $i, $s := .Profile.Scope}}
{{if gt $s.Pct 8.0}}{{$s.Dir}} {{printf "%.0f" $s.Pct}}%{{end}}
{{end}}
- {{range $i, $s := .Profile.Scope}} {{$s.Dir}} ({{printf "%.0f" $s.Pct}}%){{end}}
+ {{range $i, $s := .Profile.Scope}} {{$s.Dir}} ({{printf "%.0f" $s.Pct}}%){{end}}{{if gt .Profile.ScopeHidden 0}}+{{.Profile.ScopeHidden}} more directories not shown{{end}}
@@ -95,11 +95,12 @@ footer { margin-top: 40px; padding-top: 16px; border-top: 1px solid #d0d7de; col
Extensions
The dev's language/skill fingerprint by share of files touched. Extension attribution uses the file's current canonical path, so cross-extension renames (e.g. .js → .ts) credit pre-rename work to the new extension. · {{docRef "profile"}}
+ {{/* Teal monochrome progression — intentionally a different color family from Scope's categorical palette above. Same-index colors in Scope (blue/green/purple/orange/red) would invite false cross-chart correlation ("the blue dir uses the blue ext"). The monochromatic treatment also visually signals that this is a single ordered distribution, not five independent categories. Stopped at #3fa3ae so even the lightest shade keeps adequate contrast with white text when a tail bucket is large enough to show a label. */}}
- {{range $i, $e := .Profile.Extensions}}
{{if gt $e.Pct 8.0}}{{$e.Ext}} {{printf "%.0f" $e.Pct}}%{{end}}
{{end}}
+ {{range $i, $e := .Profile.Extensions}}
{{if gt $e.Pct 8.0}}{{$e.Ext}} {{printf "%.0f" $e.Pct}}%{{end}}
{{end}}
- {{range $i, $e := .Profile.Extensions}} {{$e.Ext}} ({{printf "%.0f" $e.Pct}}%){{end}}
+ {{range $i, $e := .Profile.Extensions}} {{$e.Ext}} ({{printf "%.0f" $e.Pct}}%){{end}}{{if gt .Profile.ExtensionsHidden 0}}+{{.Profile.ExtensionsHidden}} more extensions not shown{{end}}
{{end}}
diff --git a/internal/report/template.go b/internal/report/template.go
index ed67f6c..a293a96 100644
--- a/internal/report/template.go
+++ b/internal/report/template.go
@@ -381,11 +381,11 @@ footer { margin-top: 40px; padding-top: 16px; border-top: 1px solid #d0d7de; col
Scope
- {{range $i, $s := .Scope}}{{if $i}}, {{end}}{{$s.Dir}} ({{printf "%.0f" $s.Pct}}%){{end}}
+ {{range $i, $s := .Scope}}{{if $i}}, {{end}}{{$s.Dir}} ({{printf "%.0f" $s.Pct}}%){{end}}{{if gt .ScopeHidden 0}} (+{{.ScopeHidden}} more){{end}}
{{if .Extensions}}
Extensions
- {{range $i, $e := .Extensions}}{{if $i}}, {{end}}{{$e.Ext}} ({{printf "%.0f" $e.Pct}}%){{end}}
+ {{range $i, $e := .Extensions}}{{if $i}}, {{end}}{{$e.Ext}} ({{printf "%.0f" $e.Pct}}%){{end}}{{if gt .ExtensionsHidden 0}} (+{{.ExtensionsHidden}} more){{end}}
{{end}}
Specialization
diff --git a/internal/stats/extension_test.go b/internal/stats/extension_test.go
index e315c5d..7b48146 100644
--- a/internal/stats/extension_test.go
+++ b/internal/stats/extension_test.go
@@ -1,6 +1,7 @@
package stats
import (
+ "strings"
"testing"
"time"
)
@@ -456,6 +457,72 @@ func TestDevProfileExtensionsAllNone(t *testing.T) {
}
}
+// Regression: Pct denominator is len(devFiles[email]), NOT
+// cs.FilesTouched. FilesTouched includes pure-rename contributions
+// (the dev appears in the file's contribFiles but adds zero lines),
+// which would deflate all percentages because those files never make
+// it into the Scope/Extension numerators. Under the corrected
+// denominator the visible Pcts sum to 100 (modulo top-5 truncation).
+// This test drives that divergence: FilesTouched = 5 but only 4
+// files carry non-zero devLines, so the denominators differ.
+func TestDevProfileExtensionsPctDenominatorIgnoresRenames(t *testing.T) {
+ ds := &Dataset{
+ contributors: map[string]*ContributorStat{
+ // FilesTouched = 5 simulates the dev having appeared on a
+ // 5th file via a pure rename (no add/del lines). The files
+ // map below only has 4 entries with non-zero devLines.
+ "alice@x": {Email: "alice@x", Commits: 5, FilesTouched: 5, ActiveDays: 1},
+ },
+ files: map[string]*fileEntry{
+ "a.go": {devLines: map[string]int64{"alice@x": 10}, devCommits: map[string]int{"alice@x": 1}},
+ "b.go": {devLines: map[string]int64{"alice@x": 10}, devCommits: map[string]int{"alice@x": 1}},
+ "c.go": {devLines: map[string]int64{"alice@x": 10}, devCommits: map[string]int{"alice@x": 1}},
+ "d.py": {devLines: map[string]int64{"alice@x": 10}, devCommits: map[string]int{"alice@x": 1}},
+ },
+ commits: map[string]*commitEntry{},
+ workGrid: [7][24]int{},
+ }
+ p := DevProfiles(ds, "alice@x", 0)[0]
+
+ // Visible extensions: .go with 3/4 = 75%, .py with 1/4 = 25%.
+ // Under the old (FilesTouched=5) denominator the sum would be
+ // 3/5 + 1/5 = 80% — a silent 20% gap from the rename.
+ var sum float64
+ for _, e := range p.Extensions {
+ sum += e.Pct
+ }
+ if sum != 100.0 {
+ t.Errorf("Extensions Pct sum = %.1f, want 100.0 (denominator should be authored files, not FilesTouched)", sum)
+ }
+
+ // Spot-check individual values.
+ var goPct, pyPct float64
+ for _, e := range p.Extensions {
+ if e.Ext == ".go" {
+ goPct = e.Pct
+ }
+ if e.Ext == ".py" {
+ pyPct = e.Pct
+ }
+ }
+ if goPct != 75.0 {
+ t.Errorf(".go Pct = %.1f, want 75.0 (3/4)", goPct)
+ }
+ if pyPct != 25.0 {
+ t.Errorf(".py Pct = %.1f, want 25.0 (1/4)", pyPct)
+ }
+
+ // Same invariant on Scope: all four files (a.go, b.go, c.go, d.py)
+ // sit at the root → single bucket "." at 100%. Renames shouldn't count.
+ var scopeSum float64
+ for _, s := range p.Scope {
+ scopeSum += s.Pct
+ }
+ if scopeSum != 100.0 {
+ t.Errorf("Scope Pct sum = %.1f, want 100.0", scopeSum)
+ }
+}
+
// Edge case: a dev whose commits never touch any file (all commits
// had files_changed = 0, so no commit_file records reached fe.devLines).
// devFiles[email] is absent; Extensions must be nil — both HTML
@@ -475,6 +542,153 @@ func TestDevProfileExtensionsEmpty(t *testing.T) {
}
}
+// Regression: once a dev has >5 buckets the top-5 truncation is the
+// ONLY way Pct sum can drop below 100. Lock that invariant — a
+// silent change to the cap size or the sort would surface here.
+func TestDevProfileExtensionsTruncationSum(t *testing.T) {
+ // 6 files, one per extension → all 6 buckets carry the same
+ // Files=1. The top-5 sort keeps 5 (ties broken by churn desc, then
+ // ext asc); the 6th (.sh, lowest churn) drops off and its ~16.7%
+ // becomes the gap. Math: 5 × round(1/6 × 1000)/10 = 5 × 16.7 = 83.5%.
+ ds := &Dataset{
+ contributors: map[string]*ContributorStat{
+ "alice@x": {Email: "alice@x", Commits: 6, FilesTouched: 6, ActiveDays: 1},
+ },
+ files: map[string]*fileEntry{
+ "a.go": {devLines: map[string]int64{"alice@x": 60}, devCommits: map[string]int{"alice@x": 1}},
+ "b.py": {devLines: map[string]int64{"alice@x": 50}, devCommits: map[string]int{"alice@x": 1}},
+ "c.rs": {devLines: map[string]int64{"alice@x": 40}, devCommits: map[string]int{"alice@x": 1}},
+ "d.ts": {devLines: map[string]int64{"alice@x": 30}, devCommits: map[string]int{"alice@x": 1}},
+ "e.md": {devLines: map[string]int64{"alice@x": 20}, devCommits: map[string]int{"alice@x": 1}},
+ "f.sh": {devLines: map[string]int64{"alice@x": 10}, devCommits: map[string]int{"alice@x": 1}},
+ },
+ commits: map[string]*commitEntry{},
+ workGrid: [7][24]int{},
+ }
+ p := DevProfiles(ds, "alice@x", 0)[0]
+ if len(p.Extensions) != 5 {
+ t.Fatalf("Extensions len = %d, want 5 (truncated from 6)", len(p.Extensions))
+ }
+ var sum float64
+ for _, e := range p.Extensions {
+ sum += e.Pct
+ }
+ // Must be strictly <100 (truncated) but close (~83.5 for this
+ // fixture). Wide tolerance — the exact value is rounding-sensitive.
+ if sum >= 100.0 {
+ t.Errorf("truncated Extensions sum = %.1f, want strictly < 100", sum)
+ }
+ if sum < 80.0 || sum > 86.0 {
+ t.Errorf("truncated Extensions sum = %.1f, want ~83.5 (5 of 6 buckets × 16.7%% each)", sum)
+ }
+}
+
+// Regression at the INGEST level: a pure rename (commit_file with
+// additions=0 && deletions=0) used to create a zero-valued entry in
+// fe.devLines, which then made len(devFiles[email]) count that file
+// as "authored" by the renaming dev. The 50/50 symptom: Alice edits
+// one .go file (5 lines) and separately renames one .md file (0
+// lines); under the broken ingest she shows up with `.go (50%)` +
+// `.md (50%)` in the Extensions fingerprint even though she never
+// wrote a single line in .md. The fix skips the zero-line write
+// site so devLines stays the "lines this dev contributed" map.
+// devCommits is intentionally still bumped — that map preserves the
+// "dev appeared on this file" signal for any caller that wants it.
+func TestDevProfilePureRenamesNotAuthored(t *testing.T) {
+ jsonl := `{"type":"commit","sha":"c1","tree":"t","parents":[],"author_name":"Alice","author_email":"alice@x","author_date":"2024-01-10T10:00:00Z","committer_name":"Alice","committer_email":"alice@x","committer_date":"2024-01-10T10:00:00Z","additions":5,"deletions":0,"files_changed":1}
+{"type":"commit_file","commit":"c1","path_current":"src/main.go","path_previous":"src/main.go","status":"M","old_hash":"0","new_hash":"1","old_size":0,"new_size":0,"additions":5,"deletions":0}
+{"type":"commit","sha":"c2","tree":"t","parents":[],"author_name":"Alice","author_email":"alice@x","author_date":"2024-01-12T10:00:00Z","committer_name":"Alice","committer_email":"alice@x","committer_date":"2024-01-12T10:00:00Z","additions":0,"deletions":0,"files_changed":1}
+{"type":"commit_file","commit":"c2","path_current":"docs/renamed.md","path_previous":"docs/old.md","status":"R100","old_hash":"0","new_hash":"2","old_size":0,"new_size":0,"additions":0,"deletions":0}
+`
+ ds, err := streamLoad(strings.NewReader(jsonl), LoadOptions{HalfLifeDays: 90, CoupMaxFiles: 50})
+ if err != nil {
+ t.Fatalf("load: %v", err)
+ }
+ p := DevProfiles(ds, "alice@x", 0)[0]
+
+ // Only the .go edit counts as authored. The .md pure-rename must
+ // not show up in Extensions or Scope.
+ if len(p.Extensions) != 1 || p.Extensions[0].Ext != ".go" {
+ t.Fatalf("Extensions = %+v, want single .go bucket", p.Extensions)
+ }
+ if p.Extensions[0].Pct != 100.0 {
+ t.Errorf(".go Pct = %.1f, want 100.0 (rename should not inflate denominator)", p.Extensions[0].Pct)
+ }
+ if len(p.Scope) != 1 || p.Scope[0].Dir != "src" {
+ t.Fatalf("Scope = %+v, want single src/ bucket", p.Scope)
+ }
+ if p.Scope[0].Pct != 100.0 {
+ t.Errorf("src/ Pct = %.1f, want 100.0", p.Scope[0].Pct)
+ }
+
+ // Cross-check downstream stats that also consume fe.devLines:
+ // UniqueDevs on the renamed file must be 0 now (no one authored
+ // lines on it) — before the fix it would have been 1.
+ hotspots := FileHotspots(ds, 0)
+ for _, h := range hotspots {
+ if h.Path == "docs/renamed.md" && h.UniqueDevs != 0 {
+ t.Errorf("pure-renamed file unique devs = %d, want 0 (no line authors)", h.UniqueDevs)
+ }
+ }
+}
+
+// Regression: when truncation drops buckets, the dropped count goes
+// into ScopeHidden/ExtensionsHidden so renderers can surface "+N more"
+// next to the visible list. The counters stay zero without truncation:
+// the hint should appear only when a Pct sum below 100% would
+// otherwise make readers suspect a bug.
+func TestDevProfileHiddenCounters(t *testing.T) {
+ // Build a dev with 7 extensions and 6 dirs so both counters hit.
+ ds := &Dataset{
+ contributors: map[string]*ContributorStat{
+ "alice@x": {Email: "alice@x", Commits: 7, FilesTouched: 7, ActiveDays: 1},
+ },
+ files: map[string]*fileEntry{
+ "d1/a.go": {devLines: map[string]int64{"alice@x": 70}, devCommits: map[string]int{"alice@x": 1}},
+ "d2/b.py": {devLines: map[string]int64{"alice@x": 60}, devCommits: map[string]int{"alice@x": 1}},
+ "d3/c.rs": {devLines: map[string]int64{"alice@x": 50}, devCommits: map[string]int{"alice@x": 1}},
+ "d4/d.ts": {devLines: map[string]int64{"alice@x": 40}, devCommits: map[string]int{"alice@x": 1}},
+ "d5/e.md": {devLines: map[string]int64{"alice@x": 30}, devCommits: map[string]int{"alice@x": 1}},
+ "d6/f.sh": {devLines: map[string]int64{"alice@x": 20}, devCommits: map[string]int{"alice@x": 1}},
+ "d6/g.yml": {devLines: map[string]int64{"alice@x": 10}, devCommits: map[string]int{"alice@x": 1}},
+ },
+ commits: map[string]*commitEntry{},
+ workGrid: [7][24]int{},
+ }
+ p := DevProfiles(ds, "alice@x", 0)[0]
+
+ // 6 dirs → 5 visible, 1 hidden.
+ if p.ScopeHidden != 1 {
+ t.Errorf("ScopeHidden = %d, want 1", p.ScopeHidden)
+ }
+ // 7 extensions → 5 visible, 2 hidden.
+ if p.ExtensionsHidden != 2 {
+ t.Errorf("ExtensionsHidden = %d, want 2", p.ExtensionsHidden)
+ }
+}
+
+// Silent when nothing to hide — the counters must be zero so the
+// renderers don't emit "+0 more" (noise) for the common case.
+func TestDevProfileHiddenCountersZeroWhenFits(t *testing.T) {
+ ds := &Dataset{
+ contributors: map[string]*ContributorStat{
+ "bob@x": {Email: "bob@x", Commits: 3, FilesTouched: 3, ActiveDays: 1},
+ },
+ files: map[string]*fileEntry{
+ "src/a.go": {devLines: map[string]int64{"bob@x": 10}, devCommits: map[string]int{"bob@x": 1}},
+ "src/b.go": {devLines: map[string]int64{"bob@x": 10}, devCommits: map[string]int{"bob@x": 1}},
+ "docs/x.md": {devLines: map[string]int64{"bob@x": 5}, devCommits: map[string]int{"bob@x": 1}},
+ },
+ commits: map[string]*commitEntry{},
+ workGrid: [7][24]int{},
+ }
+ p := DevProfiles(ds, "bob@x", 0)[0]
+ if p.ScopeHidden != 0 || p.ExtensionsHidden != 0 {
+ t.Errorf("Hidden counters: Scope=%d Ext=%d, want 0/0 (dev has ≤5 buckets each)",
+ p.ScopeHidden, p.ExtensionsHidden)
+ }
+}
+
// Truncate to top-5 when a dev's extension set is larger. Under the
// files-first sort, ties on file count (all 1 each here) fall through
// to churn desc, so the top 5 by churn still win.
diff --git a/internal/stats/format.go b/internal/stats/format.go
index 66ca240..895c766 100644
--- a/internal/stats/format.go
+++ b/internal/stats/format.go
@@ -467,6 +467,9 @@ func (f *Formatter) PrintProfiles(profiles []DevProfile) error {
}
fmt.Fprintf(f.w, "%s (%.0f%%)", s.Dir, s.Pct)
}
+ if p.ScopeHidden > 0 {
+ fmt.Fprintf(f.w, " (+%d more)", p.ScopeHidden)
+ }
fmt.Fprintln(f.w)
if len(p.Extensions) > 0 {
fmt.Fprintf(f.w, " Extensions: ")
@@ -476,6 +479,9 @@ func (f *Formatter) PrintProfiles(profiles []DevProfile) error {
}
fmt.Fprintf(f.w, "%s (%.0f%%)", e.Ext, e.Pct)
}
+ if p.ExtensionsHidden > 0 {
+ fmt.Fprintf(f.w, " (+%d more)", p.ExtensionsHidden)
+ }
fmt.Fprintln(f.w)
}
// %.3f (not %.2f): labels are assigned at thresholds 0.15 / 0.35
diff --git a/internal/stats/reader.go b/internal/stats/reader.go
index 6c395d8..87fc4ef 100644
--- a/internal/stats/reader.go
+++ b/internal/stats/reader.go
@@ -367,7 +367,19 @@ func streamLoadInto(ds *Dataset, r io.Reader, opt LoadOptions, pathPrefix string
cm := ds.commits[cf.Commit]
if cm != nil {
- fe.devLines[cm.email] += cf.Additions + cf.Deletions
+ // Only record a devLines entry when the change actually
+ // carried lines. Pure renames (R100 with 0/0 numstat)
+ // would otherwise create a zero-valued map entry that
+ // survives as "dev touched this file" into every
+ // downstream consumer — bus factor, unique-dev counts,
+ // dev network, and DevProfile authored counts — inflating
+ // them with contributions that are not authored work.
+ // devCommits still increments unconditionally so the
+ // "appeared on this file" signal stays available for
+ // callers that want it.
+ if lines := cf.Additions + cf.Deletions; lines > 0 {
+ fe.devLines[cm.email] += lines
+ }
fe.devCommits[cm.email]++
// Contributor files touched
diff --git a/internal/stats/stats.go b/internal/stats/stats.go
index 1e85bf2..8f9b9e6 100644
--- a/internal/stats/stats.go
+++ b/internal/stats/stats.go
@@ -1264,7 +1264,13 @@ type DevProfile struct {
LastDate string
TopFiles []DevFileContrib
Scope []DirScope
- Extensions []DevExtContrib
+ // ScopeHidden / ExtensionsHidden count the buckets dropped by the
+ // top-5 truncation so CLI and HTML can surface "+N more" — without
+ // this, a reader sees Pct summing to e.g. 85% and wonders if the
+ // math is broken. Zero when the full set fits in 5.
+ ScopeHidden int
+ Extensions []DevExtContrib
+ ExtensionsHidden int
Specialization float64 // Herfindahl index over dir file-count distribution: approaches 0 = broad generalist, 1 = single-dir specialist
ContribRatio float64 // del/add — 0=growth, ~1=rewrite, >1=cleanup
ContribType string // "growth", "balanced", "refactor"
@@ -1554,11 +1560,24 @@ func DevProfiles(ds *Dataset, filterEmail string, n int) []DevProfile {
dirCount[dir]++
}
}
+ // Pct denominator is the count of files the dev authored lines
+ // on (len(devFiles[email])), NOT cs.FilesTouched. The latter
+ // includes pure renames (path appears in the dev's change set
+ // but with zero additions/deletions), which would never appear
+ // in dirCount/extCount numerators — creating a silent
+ // under-100% sum. Using len(devFiles[email]) keeps Pct summing
+ // to 100% (modulo top-5 truncation) and aligns the visible
+ // numbers with the Herfindahl specialization index below,
+ // which also operates on dirCount.
+ authored := 0
+ if files, ok := devFiles[email]; ok {
+ authored = len(files)
+ }
var scope []DirScope
for dir, count := range dirCount {
pct := 0.0
- if cs.FilesTouched > 0 {
- pct = math.Round(float64(count)/float64(cs.FilesTouched)*1000) / 10
+ if authored > 0 {
+ pct = math.Round(float64(count)/float64(authored)*1000) / 10
}
scope = append(scope, DirScope{Dir: dir, Files: count, Pct: pct})
}
@@ -1579,7 +1598,9 @@ func DevProfiles(ds *Dataset, filterEmail string, n int) []DevProfile {
specValues = append(specValues, count)
}
specialization := herfindahl(specValues)
+ scopeHidden := 0
if len(scope) > 5 {
+ scopeHidden = len(scope) - 5
scope = scope[:5]
}
@@ -1610,8 +1631,11 @@ func DevProfiles(ds *Dataset, filterEmail string, n int) []DevProfile {
var extensions []DevExtContrib
for ext, acc := range extCount {
pct := 0.0
- if cs.FilesTouched > 0 {
- pct = math.Round(float64(acc.files)/float64(cs.FilesTouched)*1000) / 10
+ // Same denominator as Scope above — authored-file count,
+ // not FilesTouched — so pure renames don't deflate the
+ // percentages silently.
+ if authored > 0 {
+ pct = math.Round(float64(acc.files)/float64(authored)*1000) / 10
}
extensions = append(extensions, DevExtContrib{
Ext: ext, Files: acc.files, Churn: acc.churn, Pct: pct,
@@ -1633,7 +1657,9 @@ func DevProfiles(ds *Dataset, filterEmail string, n int) []DevProfile {
}
return extensions[i].Ext < extensions[j].Ext
})
+ extensionsHidden := 0
if len(extensions) > 5 {
+ extensionsHidden = len(extensions) - 5
extensions = extensions[:5]
}
@@ -1683,7 +1709,9 @@ func DevProfiles(ds *Dataset, filterEmail string, n int) []DevProfile {
Commits: cs.Commits, Additions: cs.Additions, Deletions: cs.Deletions,
LinesChanged: cs.Additions + cs.Deletions, FilesTouched: cs.FilesTouched,
ActiveDays: cs.ActiveDays, FirstDate: cs.FirstDate, LastDate: cs.LastDate,
- TopFiles: topFiles, Scope: scope, Extensions: extensions, Specialization: specialization,
+ TopFiles: topFiles, Scope: scope, ScopeHidden: scopeHidden,
+ Extensions: extensions, ExtensionsHidden: extensionsHidden,
+ Specialization: specialization,
ContribRatio: contribRatio, ContribType: contribType,
Pace: pace, Collaborators: collabs,
MonthlyActivity: monthly, WorkGrid: grid, WeekendPct: wpct,