Fix attribution inflation from intermediate commits #812
Open
peyton-alt wants to merge 5 commits into main
Entire-Checkpoint: 079c1c0e0eeb
Pull request overview
This PR addresses attribution “inflation” caused by counting non-agent file changes from intermediate commits by switching non-agent file detection to prefer a per-commit diff base (first parent → HEAD) when available.
Changes:
- Plumbs `parentCommitHash` from the post-commit hook into condensation/attribution calculation.
- Updates attribution logic to prefer `parentCommitHash` → `headCommitHash` for enumerating non-agent changed files (falling back to `attributionBaseCommit` → `headCommitHash`).
- Updates unit tests to match the new `CalculateAttributionWithAccumulated` signature.
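The preference order described in the list above can be sketched as a small helper. This is a minimal illustration, not the project's code: `chooseDiffBase` is a hypothetical name, and only the parameter names are taken from the PR text.

```go
package main

import "fmt"

// chooseDiffBase picks the commit to diff against HEAD when enumerating
// non-agent changed files. Hypothetical helper for illustration only.
func chooseDiffBase(parentCommitHash, attributionBaseCommit string) string {
	if parentCommitHash != "" {
		// A first parent exists: scope the diff to this commit only.
		return parentCommitHash
	}
	// Initial commit (no parent): fall back to the session base.
	// An empty result means neither hash is known.
	return attributionBaseCommit
}

func main() {
	fmt.Println(chooseDiffBase("abc123", "base000")) // abc123
	fmt.Println(chooseDiffBase("", "base000"))       // base000
}
```

Returning an empty string when both hashes are missing leaves room for the caller to fall back to a tree walk, as the doc comment in the diff describes.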
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `cmd/entire/cli/strategy/manual_commit_hooks.go` | Captures HEAD’s first parent hash during post-commit handling and passes it into condensation options. |
| `cmd/entire/cli/strategy/manual_commit_condensation.go` | Adds parentCommitHash to condensation/attribution option structs and threads it through to attribution calculation. |
| `cmd/entire/cli/strategy/manual_commit_attribution.go` | Prefers parentCommitHash as the diff base for non-agent changed-file enumeration. |
| `cmd/entire/cli/strategy/manual_commit_attribution_test.go` | Updates calls to CalculateAttributionWithAccumulated for the new parameter list. |
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Non-agent file line counting inconsistent with file scoping
  - Threaded parentTree through condenseOpts/attributionOpts/CalculateAttributionWithAccumulated and used it instead of baseTree for non-agent file line counting, so diffs are now parent→head (consistent with file scoping) instead of session-base→head.
- ✅ Fixed: Duplicated parent commit hash computation in handlers
  - Precomputed parentCommitHash alongside parentTree in PostCommit, stored it on the postCommitActionHandler struct, and replaced the duplicated 6-line blocks in HandleCondense and HandleCondenseIfFilesTouched with the precomputed field.
Or push these changes by commenting:
@cursor push 6cd3ef87a2
Preview (6cd3ef87a2)
```diff
diff --git a/cmd/entire/cli/strategy/manual_commit_attribution.go b/cmd/entire/cli/strategy/manual_commit_attribution.go
--- a/cmd/entire/cli/strategy/manual_commit_attribution.go
+++ b/cmd/entire/cli/strategy/manual_commit_attribution.go
@@ -185,6 +185,11 @@
 // For initial commits (no parent), falls back to attributionBaseCommit→headCommitHash.
 // When hashes are empty, falls back to go-git tree walk.
 //
+// parentTree is the tree of the parent commit (nil for initial commits). When provided
+// alongside parentCommitHash, non-agent file line counting uses parentTree instead of
+// baseTree so that only THIS commit's changes are counted (consistent with the file
+// scoping from parentCommitHash→headCommitHash).
+//
 // Note: Binary files (detected by null bytes) are silently excluded from attribution
 // calculations since line-based diffing only applies to text files.
 //
@@ -200,6 +205,7 @@
 	parentCommitHash string,
 	attributionBaseCommit string,
 	headCommitHash string,
+	parentTree *object.Tree,
 ) *checkpoint.InitialAttribution {
 	if len(filesTouched) == 0 {
 		return nil
@@ -253,7 +259,13 @@
 	if diffBaseCommit == "" {
 		diffBaseCommit = attributionBaseCommit
 	}
-	allChangedFiles, err := getAllChangedFiles(ctx, baseTree, headTree, repoDir, diffBaseCommit, headCommitHash)
+	// Use parentTree for line counting when available (consistent with file scoping).
+	// For initial commits, fall back to session baseTree.
+	nonAgentDiffTree := parentTree
+	if nonAgentDiffTree == nil {
+		nonAgentDiffTree = baseTree
+	}
+	allChangedFiles, err := getAllChangedFiles(ctx, nonAgentDiffTree, headTree, repoDir, diffBaseCommit, headCommitHash)
 	if err != nil {
 		logging.Warn(logging.WithComponent(ctx, "attribution"),
 			"attribution: failed to enumerate changed files",
@@ -267,9 +279,9 @@
 			continue // Skip agent-touched files
 		}
-		baseContent := getFileContent(baseTree, filePath)
+		diffBaseContent := getFileContent(nonAgentDiffTree, filePath)
 		headContent := getFileContent(headTree, filePath)
-		_, userAdded, _ := diffLines(baseContent, headContent)
+		_, userAdded, _ := diffLines(diffBaseContent, headContent)
 		allUserEditsToNonAgentFiles += userAdded
 	}
```
```diff
diff --git a/cmd/entire/cli/strategy/manual_commit_attribution_test.go b/cmd/entire/cli/strategy/manual_commit_attribution_test.go
--- a/cmd/entire/cli/strategy/manual_commit_attribution_test.go
+++ b/cmd/entire/cli/strategy/manual_commit_attribution_test.go
@@ -281,7 +281,7 @@
 	result := CalculateAttributionWithAccumulated(
 		context.Background(),
-		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
 	)
 	require.NotNil(t, result, "expected non-nil result")
@@ -338,7 +338,7 @@
 	result := CalculateAttributionWithAccumulated(
 		context.Background(),
-		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
 	)
 	require.NotNil(t, result, "expected non-nil result")
@@ -395,7 +395,7 @@
 	result := CalculateAttributionWithAccumulated(
 		context.Background(),
-		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
 	)
 	require.NotNil(t, result, "expected non-nil result")
@@ -444,7 +444,7 @@
 	result := CalculateAttributionWithAccumulated(
 		context.Background(),
-		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
 	)
 	require.NotNil(t, result, "expected non-nil result")
@@ -496,7 +496,7 @@
 	result := CalculateAttributionWithAccumulated(
 		context.Background(),
-		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
 	)
 	require.NotNil(t, result, "expected non-nil result")
@@ -550,7 +550,7 @@
 	result := CalculateAttributionWithAccumulated(
 		context.Background(),
-		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
 	)
 	require.NotNil(t, result, "expected non-nil result")
@@ -619,7 +619,7 @@
 	result := CalculateAttributionWithAccumulated(
 		context.Background(),
-		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
 	)
 	require.NotNil(t, result, "expected non-nil result")
@@ -662,7 +662,7 @@
 	result := CalculateAttributionWithAccumulated(
 		context.Background(),
-		baseTree, shadowTree, headTree, []string{}, []PromptAttribution{}, "", "", "", "",
+		baseTree, shadowTree, headTree, []string{}, []PromptAttribution{}, "", "", "", "", nil,
 	)
 	if result != nil {
@@ -716,7 +716,7 @@
 	result := CalculateAttributionWithAccumulated(
 		context.Background(),
-		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
 	)
 	require.NotNil(t, result, "expected non-nil result")
@@ -1021,7 +1021,7 @@
 	result := CalculateAttributionWithAccumulated(
 		context.Background(),
-		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
 	)
 	require.NotNil(t, result, "expected non-nil result")
@@ -1092,7 +1092,7 @@
 	result := CalculateAttributionWithAccumulated(
 		context.Background(),
-		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
 	)
 	require.NotNil(t, result, "expected non-nil result")
@@ -1173,7 +1173,7 @@
 	result := CalculateAttributionWithAccumulated(
 		context.Background(),
-		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+		baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
 	)
 	require.NotNil(t, result, "expected non-nil result")
```
```diff
diff --git a/cmd/entire/cli/strategy/manual_commit_condensation.go b/cmd/entire/cli/strategy/manual_commit_condensation.go
--- a/cmd/entire/cli/strategy/manual_commit_condensation.go
+++ b/cmd/entire/cli/strategy/manual_commit_condensation.go
@@ -89,6 +89,7 @@
 type condenseOpts struct {
 	shadowRef *plumbing.Reference // Pre-resolved shadow branch ref (nil = resolve from repo)
 	headTree *object.Tree // Pre-resolved HEAD tree (passed through to calculateSessionAttributions)
+	parentTree *object.Tree // Pre-resolved parent tree (nil for initial commits, passed through for attribution)
 	repoDir string // Repository worktree path for git CLI commands
 	parentCommitHash string // HEAD's first parent hash for per-commit non-agent file detection
 	headCommitHash string // HEAD commit hash (passed through for attribution)
@@ -196,6 +197,7 @@
 	attribution := calculateSessionAttributions(ctx, repo, ref, sessionData, state, attributionOpts{
 		headTree: o.headTree,
+		parentTree: o.parentTree,
 		repoDir: o.repoDir,
 		attributionBaseCommit: attrBase,
 		parentCommitHash: o.parentCommitHash,
@@ -339,6 +341,7 @@
 type attributionOpts struct {
 	headTree *object.Tree // HEAD commit tree (already resolved by PostCommit)
 	shadowTree *object.Tree // Shadow branch tree (already resolved by PostCommit)
+	parentTree *object.Tree // Parent commit tree (already resolved by PostCommit, nil for initial commits)
 	repoDir string // Repository worktree path for git CLI commands
 	attributionBaseCommit string // Base commit hash for non-agent file detection (empty = fall back to go-git tree walk)
 	parentCommitHash string // HEAD's first parent hash (preferred diff base for non-agent files)
@@ -455,6 +458,7 @@
 		o.parentCommitHash,
 		o.attributionBaseCommit,
 		o.headCommitHash,
+		o.parentTree,
 	)
 	if attribution != nil {
```
```diff
diff --git a/cmd/entire/cli/strategy/manual_commit_hooks.go b/cmd/entire/cli/strategy/manual_commit_hooks.go
--- a/cmd/entire/cli/strategy/manual_commit_hooks.go
+++ b/cmd/entire/cli/strategy/manual_commit_hooks.go
@@ -617,10 +617,11 @@
 	// Cached git objects — resolved once per PostCommit invocation to avoid
 	// redundant reads across filesOverlapWithContent, filesWithRemainingAgentChanges,
 	// CondenseSession, and calculateSessionAttributions.
-	headTree   *object.Tree        // HEAD commit tree (shared across all sessions)
-	parentTree *object.Tree        // HEAD's first parent tree (shared, nil for initial commits)
-	shadowRef  *plumbing.Reference // Per-session shadow branch ref (nil if branch doesn't exist)
-	shadowTree *object.Tree        // Per-session shadow commit tree (nil if branch doesn't exist)
+	headTree         *object.Tree        // HEAD commit tree (shared across all sessions)
+	parentTree       *object.Tree        // HEAD's first parent tree (shared, nil for initial commits)
+	parentCommitHash string              // HEAD's first parent hash (empty for initial commits)
+	shadowRef        *plumbing.Reference // Per-session shadow branch ref (nil if branch doesn't exist)
+	shadowTree       *object.Tree        // Per-session shadow commit tree (nil if branch doesn't exist)
 	// Output: set by handler methods, read by caller after TransitionAndLog.
 	condensed bool
@@ -639,17 +640,12 @@
 	)
 	if shouldCondense {
-		parentCommitHash := ""
-		if h.commit.NumParents() > 0 {
-			if parent, err := h.commit.Parent(0); err == nil {
-				parentCommitHash = parent.Hash.String()
-			}
-		}
 		h.condensed = h.s.condenseAndUpdateState(h.ctx, h.repo, h.checkpointID, state, h.head, h.shadowBranchName, h.shadowBranchesToDelete, h.committedFileSet, condenseOpts{
 			shadowRef: h.shadowRef,
 			headTree: h.headTree,
+			parentTree: h.parentTree,
 			repoDir: h.repoDir,
-			parentCommitHash: parentCommitHash,
+			parentCommitHash: h.parentCommitHash,
 			headCommitHash: h.newHead,
 		})
 	} else {
@@ -672,17 +668,12 @@
 	)
 	if shouldCondense {
-		parentCommitHash := ""
-		if h.commit.NumParents() > 0 {
-			if parent, err := h.commit.Parent(0); err == nil {
-				parentCommitHash = parent.Hash.String()
-			}
-		}
 		h.condensed = h.s.condenseAndUpdateState(h.ctx, h.repo, h.checkpointID, state, h.head, h.shadowBranchName, h.shadowBranchesToDelete, h.committedFileSet, condenseOpts{
 			shadowRef: h.shadowRef,
 			headTree: h.headTree,
+			parentTree: h.parentTree,
 			repoDir: h.repoDir,
-			parentCommitHash: parentCommitHash,
+			parentCommitHash: h.parentCommitHash,
 			headCommitHash: h.newHead,
 		})
 	} else {
@@ -851,8 +842,10 @@
 		headTree = t
 	}
 	var parentTree *object.Tree
+	var parentCommitHash string
 	if commit.NumParents() > 0 {
 		if parent, err := commit.Parent(0); err == nil {
+			parentCommitHash = parent.Hash.String()
 			if t, err := parent.Tree(); err == nil {
 				parentTree = t
 			}
@@ -871,8 +864,8 @@
 		}
 		iterCtx, iterSpan := processSessionsLoop.Iteration(loopCtx)
 		s.postCommitProcessSession(iterCtx, repo, state, &transitionCtx, checkpointID,
-			head, commit, newHead, worktreePath, headTree, parentTree, committedFileSet,
-			shadowBranchesToDelete, uncondensedActiveOnBranch)
+			head, commit, newHead, worktreePath, headTree, parentTree, parentCommitHash,
+			committedFileSet, shadowBranchesToDelete, uncondensedActiveOnBranch)
 		iterSpan.End()
 	}
 	processSessionsLoop.End()
@@ -917,6 +910,7 @@
 	newHead string,
 	repoDir string,
 	headTree, parentTree *object.Tree,
+	parentCommitHash string,
 	committedFileSet map[string]struct{},
 	shadowBranchesToDelete map[string]struct{},
 	uncondensedActiveOnBranch map[string]bool,
@@ -1008,6 +1002,7 @@
 		filesTouchedBefore: filesTouchedBefore,
 		headTree: headTree,
 		parentTree: parentTree,
+		parentCommitHash: parentCommitHash,
 		shadowRef: shadowRef,
 		shadowTree: shadowTree,
 	}
```

This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.
Pre-session dirty files (CLI config files from `entire enable`, leftover changes from previous sessions) were incorrectly counted as human contributions, deflating agent percentage.

Root cause: PA1 (first prompt attribution) captures worktree state at session start. This data was used to correct agent line counts (correct) but also added to human contributions (wrong).

Fix:
- Split prompt attributions into baseline (PA1) and session (PA2+)
- PA1 data still subtracted from agent work (correct agent calc)
- PA1 contributions excluded from relevantAccumulatedUser
- PA1 removals excluded from totalUserRemoved
- Include PendingPromptAttribution during condensation for agents that skip SaveStep (e.g., Codex mid-turn commits)
- Add .entire/ filter to attribution calc (matches existing PA filter)
- Fix wrapcheck lint errors in updateCombinedAttributionForCheckpoint

Verified end-to-end: 100% agent with config files committed alongside.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b0cb4216f6bc
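The baseline/session split in this commit message might look roughly like the sketch below. It assumes PA1 is identified by `CheckpointNumber <= 1`, as the PR description states; the struct is a simplified stand-in, and `UserLinesAdded`, `UserLinesRemoved`, `splitBaseline`, and `humanTotals` are hypothetical names, not the project's API.

```go
package main

import "fmt"

// PromptAttribution is a simplified, hypothetical stand-in for the project's
// type. Only CheckpointNumber is named in the PR text.
type PromptAttribution struct {
	CheckpointNumber int
	UserLinesAdded   int // hypothetical field
	UserLinesRemoved int // hypothetical field
}

// splitBaseline separates the baseline entry (PA1) from session entries (PA2+).
func splitBaseline(pas []PromptAttribution) (baseline, session []PromptAttribution) {
	for _, pa := range pas {
		if pa.CheckpointNumber <= 1 {
			baseline = append(baseline, pa)
		} else {
			session = append(session, pa)
		}
	}
	return baseline, session
}

// humanTotals sums user edits over the given entries.
func humanTotals(pas []PromptAttribution) (added, removed int) {
	for _, pa := range pas {
		added += pa.UserLinesAdded
		removed += pa.UserLinesRemoved
	}
	return added, removed
}

func main() {
	pas := []PromptAttribution{
		{CheckpointNumber: 1, UserLinesAdded: 40, UserLinesRemoved: 2}, // pre-session dirt
		{CheckpointNumber: 2, UserLinesAdded: 3, UserLinesRemoved: 1},
		{CheckpointNumber: 3, UserLinesAdded: 5, UserLinesRemoved: 0},
	}
	baseline, session := splitBaseline(pas)
	added, removed := humanTotals(session) // PA1 is deliberately left out
	fmt.Println(len(baseline), added, removed) // 1 8 1
}
```

Keeping the baseline slice around, rather than discarding it, matches the commit's note that PA1 data is still needed to correct agent line counts.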
…ibution

Checkpoint package changes required by the attribution baseline fix:
- PromptAttributionsJSON field on WriteCommittedOptions and CommittedMetadata
- UpdateCheckpointSummary method on GitStore for multi-session aggregation
- CombinedAttribution field on CheckpointSummary
- Preserve existing CombinedAttribution during summary rewrites

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b8963737336c
…arentCommitHash

Fixes all 4 issues from Copilot and Cursor Bugbot review:

1. Precompute parentCommitHash on postCommitActionHandler struct using ParentHashes[0] (avoids extra object read, no silent error)
2. Remove duplicated 6-line parentCommitHash computation from HandleCondense and HandleCondenseIfFilesTouched
3. Thread parentTree through condenseOpts/attributionOpts and use it for non-agent file line counting. This ensures diffLines uses parent→HEAD (consistent with parentCommitHash file scoping) instead of sessionBase→HEAD, which over-counted intermediate commit changes
4. Add ParentTreeForNonAgentLines test proving the fix (TDD verified: HumanAdded=8 without fix → HumanAdded=3 with fix)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 12f5c4373467
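Item 1 reads the first parent's hash from data the commit object already carries instead of loading the parent commit. A minimal sketch of that idea, using a stand-in struct so it runs with the standard library only (`commitInfo` and `firstParentHash` are hypothetical; in go-git the corresponding field is `ParentHashes` on `object.Commit`):

```go
package main

import "fmt"

// commitInfo is a hypothetical stand-in for a commit object that already
// carries its parent hashes in memory.
type commitInfo struct {
	Hash         string
	ParentHashes []string
}

// firstParentHash returns the first parent's hash, or "" for an initial
// commit. No parent object is loaded, so there is no error to swallow.
func firstParentHash(c commitInfo) string {
	if len(c.ParentHashes) == 0 {
		return ""
	}
	return c.ParentHashes[0]
}

func main() {
	merge := commitInfo{Hash: "c3", ParentHashes: []string{"c2", "feature1"}}
	root := commitInfo{Hash: "c0"}
	fmt.Printf("%q %q\n", firstParentHash(merge), firstParentHash(root)) // "c2" ""
}
```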
@BugBot review
Fix attribution inflation from intermediate commits and pre-session worktree dirt
Fixes two related bugs that inflated human contribution numbers in the attribution calculation,
causing agent percentages to be incorrectly deflated.
Bug 1: Intermediate commit inflation
In multi-commit sessions (user commits multiple times during a session), non-agent file diffs were
computed as sessionBase → HEAD, which counted changes from all intermediate commits, not just the
current one. For example, if a user edited readme.md across 3 commits, the final attribution would count
all edits as if they happened in the last commit.
Fix: Prefer parentCommit → HEAD for non-agent file diffing when a parent commit exists. This scopes the
diff to only the current commit. For initial commits (no parent), falls back to the original sessionBase
→ HEAD behavior.
- Thread parentCommitHash and parentTree through condenseOpts → attributionOpts → CalculateAttributionWithAccumulated
- Use parentTree instead of baseTree when available
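A small worked example may help show why the diff base matters. The sketch below is an illustration under stated assumptions: `countAdded` is a hypothetical multiset-based line counter, not the project's `diffLines`, and the file contents are invented.

```go
package main

import "fmt"

// countAdded counts lines present in head that cannot be matched to a line
// in base. A simplified stand-in for a real line diff.
func countAdded(base, head []string) int {
	remaining := make(map[string]int)
	for _, l := range base {
		remaining[l]++
	}
	added := 0
	for _, l := range head {
		if remaining[l] > 0 {
			remaining[l]-- // line already existed in base
		} else {
			added++
		}
	}
	return added
}

func main() {
	// readme.md as seen at the session base and after three user commits.
	sessionBase := []string{"# Title"}
	commit2 := []string{"# Title", "a", "b", "c", "d", "e"} // parent of HEAD
	head := []string{"# Title", "a", "b", "c", "d", "e", "f"}

	fmt.Println(countAdded(sessionBase, head)) // 6: counts all intermediate edits
	fmt.Println(countAdded(commit2, head))     // 1: only this commit's edit
}
```

With the session base as the diff base, every earlier edit is charged to the final commit; with the parent as the diff base, only the last line is.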
Bug 2: Pre-session worktree dirt counted as human work
PA1 (the first prompt attribution, CheckpointNumber <= 1) captures worktree state at session start —
including files already dirty before the agent began (CLI config files from entire enable, leftover
changes from previous sessions). These were being counted as human contributions, deflating the agent
percentage.
Fix: Split prompt attributions into baseline (PA1) and session (PA2+):
- PA1 data is still subtracted from agent work (this was already correct)
Additional fixes:
- Include PendingPromptAttribution during condensation for agents that skip SaveStep (e.g., Codex)
Note
Medium Risk
Touches attribution math and checkpoint metadata writes; incorrect handling could skew reported agent/human contribution metrics or add extra metadata-branch commits.
Overview
Fixes attribution inflation in multi-commit/manual-commit flows by scoping non-agent file diffs to only the current commit (using parent commit/tree when available) and by excluding pre-session worktree dirt captured in PA1 from human contribution totals.
Persists raw `prompt_attributions` JSON into per-session checkpoint metadata for diagnostics, and adds an aggregated `combined_attribution` field on the root `CheckpointSummary` that is recomputed post-commit across sessions and written back via a new `GitStore.UpdateCheckpointSummary` path.

Written by Cursor Bugbot for commit 7f17a37.
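As a rough illustration of what recomputing an aggregate across sessions could involve, the sketch below sums per-session counts and derives an agent percentage. Everything here is an assumption for illustration: `SessionAttribution`, its fields, `combine`, and the percentage formula are hypothetical and are not taken from the project's `CheckpointSummary` or `GitStore` code.

```go
package main

import "fmt"

// SessionAttribution is a hypothetical per-session result.
type SessionAttribution struct {
	AgentLines int
	HumanAdded int
}

// combine sums per-session counts and derives an agent percentage.
// The formula agent / (agent + human) is an assumption for illustration.
func combine(sessions []SessionAttribution) (total SessionAttribution, agentPct float64) {
	for _, s := range sessions {
		total.AgentLines += s.AgentLines
		total.HumanAdded += s.HumanAdded
	}
	if denom := total.AgentLines + total.HumanAdded; denom > 0 {
		agentPct = 100 * float64(total.AgentLines) / float64(denom)
	}
	return total, agentPct
}

func main() {
	total, pct := combine([]SessionAttribution{{30, 0}, {10, 10}})
	fmt.Printf("%d %d %.1f\n", total.AgentLines, total.HumanAdded, pct) // 40 10 80.0
}
```

Recomputing from the raw per-session counts, rather than averaging per-session percentages, keeps large and small sessions weighted by their actual line counts.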