Fix attribution inflation from intermediate commits #812

Open
peyton-alt wants to merge 5 commits into main from fix/attribution-intermediate-commit-inflation

peyton-alt commented Mar 31, 2026

Fix attribution inflation from intermediate commits and pre-session worktree dirt

Fixes two related bugs that inflated human contribution numbers in the attribution calculation,
causing agent percentages to be incorrectly deflated.

Bug 1: Intermediate commit inflation

In multi-commit sessions (user commits multiple times during a session), non-agent file diffs were
computed as sessionBase → HEAD, which counted changes from all intermediate commits, not just the
current one. For example, if a user edited readme.md across 3 commits, the final attribution would count
all edits as if they happened in the last commit.

Fix: Prefer parentCommit → HEAD for non-agent file diffing when a parent commit exists. This scopes the
diff to only the current commit. For initial commits (no parent), the calculation falls back to the
original sessionBase → HEAD behavior.

  • parentCommitHash and parentTree are precomputed once in PostCommit and threaded through condenseOpts →
    attributionOpts → CalculateAttributionWithAccumulated
  • Non-agent file line counting in CalculateAttributionWithAccumulated now uses parentTree instead of
    baseTree when available

Bug 2: Pre-session worktree dirt counted as human work

PA1 (the first prompt attribution, CheckpointNumber <= 1) captures worktree state at session start,
including files already dirty before the agent began (CLI config files from `entire enable`, leftover
changes from previous sessions). These were being counted as human contributions, deflating the agent
percentage.

Fix: Split prompt attributions into baseline (PA1) and session (PA2+):

  • PA1 data is still used for agent line correction (subtracting pre-existing lines from agent totals —
    this was already correct)
  • PA1 contributions are now excluded from relevantAccumulatedUser and totalUserRemoved
  • Only PA2+ represents genuine human edits during the session

Additional fixes:

  • Include PendingPromptAttribution during condensation for agents that commit mid-turn without calling
    SaveStep (e.g., Codex)
  • Filter .entire/ paths from non-agent file enumeration (matches existing PA filter)

Note

Medium Risk
Touches attribution math and checkpoint metadata writes; incorrect handling could skew reported agent/human contribution metrics or add extra metadata-branch commits.

Overview
Fixes attribution inflation in multi-commit/manual-commit flows by scoping non-agent file diffs to only the current commit (using parent commit/tree when available) and by excluding pre-session worktree dirt captured in PA1 from human contribution totals.

Persists raw prompt_attributions JSON into per-session checkpoint metadata for diagnostics, and adds an aggregated combined_attribution field on the root CheckpointSummary that is recomputed post-commit across sessions and written back via a new GitStore.UpdateCheckpointSummary path.

Written by Cursor Bugbot for commit 7f17a37.

Copilot AI review requested due to automatic review settings March 31, 2026 01:48
Copilot AI left a comment

Pull request overview

This PR addresses attribution “inflation” caused by counting non-agent file changes from intermediate commits by switching non-agent file detection to prefer a per-commit diff base (first parent → HEAD) when available.

Changes:

  • Plumbs parentCommitHash from the post-commit hook into condensation/attribution calculation.
  • Updates attribution logic to prefer parentCommitHash→headCommitHash for enumerating non-agent changed files (falling back to attributionBaseCommit→headCommitHash).
  • Updates unit tests to match the new CalculateAttributionWithAccumulated signature.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| cmd/entire/cli/strategy/manual_commit_hooks.go | Captures HEAD’s first parent hash during post-commit handling and passes it into condensation options. |
| cmd/entire/cli/strategy/manual_commit_condensation.go | Adds parentCommitHash to condensation/attribution option structs and threads it through to attribution calculation. |
| cmd/entire/cli/strategy/manual_commit_attribution.go | Prefers parentCommitHash as the diff base for non-agent changed-file enumeration. |
| cmd/entire/cli/strategy/manual_commit_attribution_test.go | Updates calls to CalculateAttributionWithAccumulated for the new parameter list. |

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Non-agent file line counting inconsistent with file scoping
    • Threaded parentTree through condenseOpts/attributionOpts/CalculateAttributionWithAccumulated and used it instead of baseTree for non-agent file line counting, so diffs are now parent→head (consistent with file scoping) instead of session-base→head.
  • ✅ Fixed: Duplicated parent commit hash computation in handlers
    • Precomputed parentCommitHash alongside parentTree in PostCommit, stored it on the postCommitActionHandler struct, and replaced the duplicated 6-line blocks in HandleCondense and HandleCondenseIfFilesTouched with the precomputed field.

Push these changes by commenting:

@cursor push 6cd3ef87a2
Preview (6cd3ef87a2)
diff --git a/cmd/entire/cli/strategy/manual_commit_attribution.go b/cmd/entire/cli/strategy/manual_commit_attribution.go
--- a/cmd/entire/cli/strategy/manual_commit_attribution.go
+++ b/cmd/entire/cli/strategy/manual_commit_attribution.go
@@ -185,6 +185,11 @@
 // For initial commits (no parent), falls back to attributionBaseCommit→headCommitHash.
 // When hashes are empty, falls back to go-git tree walk.
 //
+// parentTree is the tree of the parent commit (nil for initial commits). When provided
+// alongside parentCommitHash, non-agent file line counting uses parentTree instead of
+// baseTree so that only THIS commit's changes are counted (consistent with the file
+// scoping from parentCommitHash→headCommitHash).
+//
 // Note: Binary files (detected by null bytes) are silently excluded from attribution
 // calculations since line-based diffing only applies to text files.
 //
@@ -200,6 +205,7 @@
 	parentCommitHash string,
 	attributionBaseCommit string,
 	headCommitHash string,
+	parentTree *object.Tree,
 ) *checkpoint.InitialAttribution {
 	if len(filesTouched) == 0 {
 		return nil
@@ -253,7 +259,13 @@
 	if diffBaseCommit == "" {
 		diffBaseCommit = attributionBaseCommit
 	}
-	allChangedFiles, err := getAllChangedFiles(ctx, baseTree, headTree, repoDir, diffBaseCommit, headCommitHash)
+	// Use parentTree for line counting when available (consistent with file scoping).
+	// For initial commits, fall back to session baseTree.
+	nonAgentDiffTree := parentTree
+	if nonAgentDiffTree == nil {
+		nonAgentDiffTree = baseTree
+	}
+	allChangedFiles, err := getAllChangedFiles(ctx, nonAgentDiffTree, headTree, repoDir, diffBaseCommit, headCommitHash)
 	if err != nil {
 		logging.Warn(logging.WithComponent(ctx, "attribution"),
 			"attribution: failed to enumerate changed files",
@@ -267,9 +279,9 @@
 			continue // Skip agent-touched files
 		}
 
-		baseContent := getFileContent(baseTree, filePath)
+		diffBaseContent := getFileContent(nonAgentDiffTree, filePath)
 		headContent := getFileContent(headTree, filePath)
-		_, userAdded, _ := diffLines(baseContent, headContent)
+		_, userAdded, _ := diffLines(diffBaseContent, headContent)
 		allUserEditsToNonAgentFiles += userAdded
 	}
 

diff --git a/cmd/entire/cli/strategy/manual_commit_attribution_test.go b/cmd/entire/cli/strategy/manual_commit_attribution_test.go
--- a/cmd/entire/cli/strategy/manual_commit_attribution_test.go
+++ b/cmd/entire/cli/strategy/manual_commit_attribution_test.go
@@ -281,7 +281,7 @@
 
 	result := CalculateAttributionWithAccumulated(
 		context.Background(),
-		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
 	)
 
 	require.NotNil(t, result, "expected non-nil result")
@@ -338,7 +338,7 @@
 
 	result := CalculateAttributionWithAccumulated(
 		context.Background(),
-		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
 	)
 
 	require.NotNil(t, result, "expected non-nil result")
@@ -395,7 +395,7 @@
 
 	result := CalculateAttributionWithAccumulated(
 		context.Background(),
-		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
 	)
 
 	require.NotNil(t, result, "expected non-nil result")
@@ -444,7 +444,7 @@
 
 	result := CalculateAttributionWithAccumulated(
 		context.Background(),
-		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
 	)
 
 	require.NotNil(t, result, "expected non-nil result")
@@ -496,7 +496,7 @@
 
 	result := CalculateAttributionWithAccumulated(
 		context.Background(),
-		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
 	)
 
 	require.NotNil(t, result, "expected non-nil result")
@@ -550,7 +550,7 @@
 
 	result := CalculateAttributionWithAccumulated(
 		context.Background(),
-		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
 	)
 
 	require.NotNil(t, result, "expected non-nil result")
@@ -619,7 +619,7 @@
 
 	result := CalculateAttributionWithAccumulated(
 		context.Background(),
-		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
 	)
 
 	require.NotNil(t, result, "expected non-nil result")
@@ -662,7 +662,7 @@
 
 	result := CalculateAttributionWithAccumulated(
 		context.Background(),
-		baseTree, shadowTree, headTree, []string{}, []PromptAttribution{}, "", "", "", "",
+		baseTree, shadowTree, headTree, []string{}, []PromptAttribution{}, "", "", "", "", nil,
 	)
 
 	if result != nil {
@@ -716,7 +716,7 @@
 
 	result := CalculateAttributionWithAccumulated(
 		context.Background(),
-		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
 	)
 
 	require.NotNil(t, result, "expected non-nil result")
@@ -1021,7 +1021,7 @@
 
 	result := CalculateAttributionWithAccumulated(
 		context.Background(),
-		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
 	)
 
 	require.NotNil(t, result, "expected non-nil result")
@@ -1092,7 +1092,7 @@
 
 	result := CalculateAttributionWithAccumulated(
 		context.Background(),
-		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
 	)
 
 	require.NotNil(t, result, "expected non-nil result")
@@ -1173,7 +1173,7 @@
 
 	result := CalculateAttributionWithAccumulated(
 		context.Background(),
-		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
 	)
 
 	require.NotNil(t, result, "expected non-nil result")

diff --git a/cmd/entire/cli/strategy/manual_commit_condensation.go b/cmd/entire/cli/strategy/manual_commit_condensation.go
--- a/cmd/entire/cli/strategy/manual_commit_condensation.go
+++ b/cmd/entire/cli/strategy/manual_commit_condensation.go
@@ -89,6 +89,7 @@
 type condenseOpts struct {
 	shadowRef        *plumbing.Reference // Pre-resolved shadow branch ref (nil = resolve from repo)
 	headTree         *object.Tree        // Pre-resolved HEAD tree (passed through to calculateSessionAttributions)
+	parentTree       *object.Tree        // Pre-resolved parent tree (nil for initial commits, passed through for attribution)
 	repoDir          string              // Repository worktree path for git CLI commands
 	parentCommitHash string              // HEAD's first parent hash for per-commit non-agent file detection
 	headCommitHash   string              // HEAD commit hash (passed through for attribution)
@@ -196,6 +197,7 @@
 
 	attribution := calculateSessionAttributions(ctx, repo, ref, sessionData, state, attributionOpts{
 		headTree:              o.headTree,
+		parentTree:            o.parentTree,
 		repoDir:               o.repoDir,
 		attributionBaseCommit: attrBase,
 		parentCommitHash:      o.parentCommitHash,
@@ -339,6 +341,7 @@
 type attributionOpts struct {
 	headTree              *object.Tree // HEAD commit tree (already resolved by PostCommit)
 	shadowTree            *object.Tree // Shadow branch tree (already resolved by PostCommit)
+	parentTree            *object.Tree // Parent commit tree (already resolved by PostCommit, nil for initial commits)
 	repoDir               string       // Repository worktree path for git CLI commands
 	attributionBaseCommit string       // Base commit hash for non-agent file detection (empty = fall back to go-git tree walk)
 	parentCommitHash      string       // HEAD's first parent hash (preferred diff base for non-agent files)
@@ -455,6 +458,7 @@
 		o.parentCommitHash,
 		o.attributionBaseCommit,
 		o.headCommitHash,
+		o.parentTree,
 	)
 
 	if attribution != nil {

diff --git a/cmd/entire/cli/strategy/manual_commit_hooks.go b/cmd/entire/cli/strategy/manual_commit_hooks.go
--- a/cmd/entire/cli/strategy/manual_commit_hooks.go
+++ b/cmd/entire/cli/strategy/manual_commit_hooks.go
@@ -617,10 +617,11 @@
 	// Cached git objects — resolved once per PostCommit invocation to avoid
 	// redundant reads across filesOverlapWithContent, filesWithRemainingAgentChanges,
 	// CondenseSession, and calculateSessionAttributions.
-	headTree   *object.Tree        // HEAD commit tree (shared across all sessions)
-	parentTree *object.Tree        // HEAD's first parent tree (shared, nil for initial commits)
-	shadowRef  *plumbing.Reference // Per-session shadow branch ref (nil if branch doesn't exist)
-	shadowTree *object.Tree        // Per-session shadow commit tree (nil if branch doesn't exist)
+	headTree         *object.Tree        // HEAD commit tree (shared across all sessions)
+	parentTree       *object.Tree        // HEAD's first parent tree (shared, nil for initial commits)
+	parentCommitHash string              // HEAD's first parent hash (empty for initial commits)
+	shadowRef        *plumbing.Reference // Per-session shadow branch ref (nil if branch doesn't exist)
+	shadowTree       *object.Tree        // Per-session shadow commit tree (nil if branch doesn't exist)
 
 	// Output: set by handler methods, read by caller after TransitionAndLog.
 	condensed bool
@@ -639,17 +640,12 @@
 	)
 
 	if shouldCondense {
-		parentCommitHash := ""
-		if h.commit.NumParents() > 0 {
-			if parent, err := h.commit.Parent(0); err == nil {
-				parentCommitHash = parent.Hash.String()
-			}
-		}
 		h.condensed = h.s.condenseAndUpdateState(h.ctx, h.repo, h.checkpointID, state, h.head, h.shadowBranchName, h.shadowBranchesToDelete, h.committedFileSet, condenseOpts{
 			shadowRef:        h.shadowRef,
 			headTree:         h.headTree,
+			parentTree:       h.parentTree,
 			repoDir:          h.repoDir,
-			parentCommitHash: parentCommitHash,
+			parentCommitHash: h.parentCommitHash,
 			headCommitHash:   h.newHead,
 		})
 	} else {
@@ -672,17 +668,12 @@
 	)
 
 	if shouldCondense {
-		parentCommitHash := ""
-		if h.commit.NumParents() > 0 {
-			if parent, err := h.commit.Parent(0); err == nil {
-				parentCommitHash = parent.Hash.String()
-			}
-		}
 		h.condensed = h.s.condenseAndUpdateState(h.ctx, h.repo, h.checkpointID, state, h.head, h.shadowBranchName, h.shadowBranchesToDelete, h.committedFileSet, condenseOpts{
 			shadowRef:        h.shadowRef,
 			headTree:         h.headTree,
+			parentTree:       h.parentTree,
 			repoDir:          h.repoDir,
-			parentCommitHash: parentCommitHash,
+			parentCommitHash: h.parentCommitHash,
 			headCommitHash:   h.newHead,
 		})
 	} else {
@@ -851,8 +842,10 @@
 		headTree = t
 	}
 	var parentTree *object.Tree
+	var parentCommitHash string
 	if commit.NumParents() > 0 {
 		if parent, err := commit.Parent(0); err == nil {
+			parentCommitHash = parent.Hash.String()
 			if t, err := parent.Tree(); err == nil {
 				parentTree = t
 			}
@@ -871,8 +864,8 @@
 		}
 		iterCtx, iterSpan := processSessionsLoop.Iteration(loopCtx)
 		s.postCommitProcessSession(iterCtx, repo, state, &transitionCtx, checkpointID,
-			head, commit, newHead, worktreePath, headTree, parentTree, committedFileSet,
-			shadowBranchesToDelete, uncondensedActiveOnBranch)
+			head, commit, newHead, worktreePath, headTree, parentTree, parentCommitHash,
+			committedFileSet, shadowBranchesToDelete, uncondensedActiveOnBranch)
 		iterSpan.End()
 	}
 	processSessionsLoop.End()
@@ -917,6 +910,7 @@
 	newHead string,
 	repoDir string,
 	headTree, parentTree *object.Tree,
+	parentCommitHash string,
 	committedFileSet map[string]struct{},
 	shadowBranchesToDelete map[string]struct{},
 	uncondensedActiveOnBranch map[string]bool,
@@ -1008,6 +1002,7 @@
 		filesTouchedBefore:     filesTouchedBefore,
 		headTree:               headTree,
 		parentTree:             parentTree,
+		parentCommitHash:       parentCommitHash,
 		shadowRef:              shadowRef,
 		shadowTree:             shadowTree,
 	}


peyton-alt and others added 4 commits March 31, 2026 16:25
Pre-session dirty files (CLI config files from `entire enable`, leftover
changes from previous sessions) were incorrectly counted as human
contributions, deflating agent percentage.

Root cause: PA1 (first prompt attribution) captures worktree state at
session start. This data was used to correct agent line counts (correct)
but also added to human contributions (wrong).

Fix:
- Split prompt attributions into baseline (PA1) and session (PA2+)
- PA1 data still subtracted from agent work (correct agent calc)
- PA1 contributions excluded from relevantAccumulatedUser
- PA1 removals excluded from totalUserRemoved
- Include PendingPromptAttribution during condensation for agents
  that skip SaveStep (e.g., Codex mid-turn commits)
- Add .entire/ filter to attribution calc (matches existing PA filter)
- Fix wrapcheck lint errors in updateCombinedAttributionForCheckpoint

Verified end-to-end: 100% agent with config files committed alongside.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b0cb4216f6bc
…ibution

Checkpoint package changes required by the attribution baseline fix:
- PromptAttributionsJSON field on WriteCommittedOptions and CommittedMetadata
- UpdateCheckpointSummary method on GitStore for multi-session aggregation
- CombinedAttribution field on CheckpointSummary
- Preserve existing CombinedAttribution during summary rewrites

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b8963737336c
…arentCommitHash

Fixes all 4 issues from Copilot and Cursor Bugbot review:

1. Precompute parentCommitHash on postCommitActionHandler struct
   using ParentHashes[0] (avoids extra object read, no silent error)
2. Remove duplicated 6-line parentCommitHash computation from
   HandleCondense and HandleCondenseIfFilesTouched
3. Thread parentTree through condenseOpts/attributionOpts and use it
   for non-agent file line counting — ensures diffLines uses parent→HEAD
   (consistent with parentCommitHash file scoping) instead of
   sessionBase→HEAD which over-counted intermediate commit changes
4. Add ParentTreeForNonAgentLines test proving the fix (TDD verified:
   HumanAdded=8 without fix → HumanAdded=3 with fix)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 12f5c4373467
@peyton-alt

@BugBot review

@cursor cursor bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

@peyton-alt peyton-alt marked this pull request as ready for review April 1, 2026 04:17
@peyton-alt peyton-alt requested a review from a team as a code owner April 1, 2026 04:17