fix(tests): make cache-lockfile-parity test resilient to leftover deploy artifacts and timestamp drift#1125
Merged
danielmeppiel merged 1 commit intomainfrom May 3, 2026
Merged
Conversation
…loy artifacts and timestamp drift The regression-trap test added in #1116 was failing on main, blocking the v0.12.1 release CI on macOS arm64 build-validate and all 3 OSes' integration tests jobs. Two fixture-level bugs caused false positives: 1. Stale deployed-files state across regimes _reset_install_state removed apm_modules/ + apm.lock.yaml between runs but left the prior run's integration outputs (.github/agents/, .agents/ skills/, ...) on disk. With the lockfile gone, the integrate phase had an empty managed_files set; the leftover files looked like user-authored collisions to BaseIntegrator.check_collision(); they were skipped and never appended to target_paths -- so the new lockfile recorded only the single freshly-written file, not the package's full deployed_files list. Cold-cache (run A, fresh disk) recorded 5 files; warm-cache (B) and no-cache (C) recorded 1. Fix: clone the project tree into a fresh subdir per regime via _clone_project(). Each run starts from pristine filesystem state, which is what 'same input, same output' actually means for this invariant. No hand-rolled cleanup list to keep in sync with new target deploy roots. 2. generated_at always drifts between independent runs The lockfile's generated_at is captured at write time; with no existing lockfile to dedup against (each cloned project starts empty), every run produces a fresh timestamp. The byte-identical assertion was unsatisfiable by construction. Fix: hash the lockfile excluding the generated_at line. The parity contract is about resolution outcome (resolved_commit, content_hash, deployed_files, package_type, ...), not the wall-clock write timestamp. Verified locally with GITHUB_TOKEN=$(gh auth token): uv run --extra dev pytest tests/integration/test_cache_lockfile_parity.py -> 1 passed in 12.73s Lint: uv run --extra dev ruff check src/ tests/ -> All checks passed uv run --extra dev ruff format --check src/ tests/ -> 668 files already formatted No product code changed -- the underlying cache layer was always deterministic; only the fixture was wrong. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR fixes a regression in the cache/lockfile parity integration test (tests/integration/test_cache_lockfile_parity.py) that was producing false positives due to leftover on-disk deploy artifacts between runs and unavoidable generated_at timestamp drift.
Changes:
- Update the parity invariant and hashing logic to ignore
generated_atwhen comparing lockfiles across regimes. - Run each cache regime against a freshly cloned copy of the project tree to prevent previously deployed integration outputs from affecting subsequent runs.
- Update module/test docstrings to reflect the revised invariant (content-identical modulo
generated_at).
Show a summary per file
| File | Description |
|---|---|
| tests/integration/test_cache_lockfile_parity.py | Makes the cache-regime parity test resilient by isolating filesystem state per run and excluding generated_at from the lockfile hash. |
Copilot's findings
- Files reviewed: 1/1 changed files
- Comments generated: 2
Comment on lines
85
to
+100
| def _lockfile_sha(project: Path) -> str: | ||
| """Hash the lockfile excluding the `generated_at` line. | ||
|
|
||
| `generated_at` is a wall-clock timestamp captured at write time, so it | ||
| necessarily differs between independent runs. The parity invariant is | ||
| about resolution outcome (resolved_commit, content_hash, deployed_files, | ||
| package_type, ...), not the write timestamp. | ||
| """ | ||
| lock = project / "apm.lock.yaml" | ||
| assert lock.is_file(), "apm.lock.yaml not produced by install" | ||
| return hashlib.sha256(lock.read_bytes()).hexdigest() | ||
| canonical = "\n".join( | ||
| line | ||
| for line in lock.read_text(encoding="utf-8").splitlines() | ||
| if not line.startswith("generated_at:") | ||
| ) | ||
| return hashlib.sha256(canonical.encode("utf-8")).hexdigest() |
| tmp_path: Path, | ||
| ) -> None: | ||
| """A, B, C must produce byte-identical apm.lock.yaml. | ||
| """A, B, C must produce content-identical apm.lock.yaml (modulo `generated_at`). |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
TL;DR
Fixes the
tests/integration/test_cache_lockfile_parity.pyregression-trap added in #1116, which has been failing onmainsince merge (run 25287779252 blocks v0.12.1 release CI on macOS arm64 build-validate and integration tests across ubuntu/windows/arm64).Test-only change. No product code touched -- the cache layer was always deterministic; the fixture was wrong.
Root cause
Two fixture-level bugs caused false positives:
1. Stale deployed-files state leaked between regimes
_reset_install_stateremovedapm_modules/+apm.lock.yamlbetween runs but left the prior run's integration outputs (.github/agents/,.agents/skills/, ...) on disk.With the lockfile gone, the integrate phase had an empty
managed_filesset; the leftover files looked like user-authored collisions toBaseIntegrator.check_collision(); they were skipped and never appended totarget_paths-- so the new lockfile recorded only the single freshly-written file, not the package's fulldeployed_fileslist.Cold-cache (run A, fresh disk) recorded 5 files. Warm-cache (B) and no-cache (C) recorded 1. Hence the drift.
2.
generated_atdrifts between independent runsThe lockfile's
generated_atfield is captured at write time. With no existing lockfile to dedupe against (each fresh project starts empty), every run produces a different timestamp. The byte-identical sha256 assertion was unsatisfiable by construction.Fix
Clone the project tree per regime via
_clone_project()instead of trying to surgically reset state. Each run starts from pristine filesystem state, which is what "same input, same output" actually means for the parity invariant. No hand-rolled cleanup list to keep in sync with future target deploy roots (.cursor/,.claude/,.github/hooks/,.agents/, etc).Hash the lockfile excluding
generated_atin_lockfile_sha. The parity contract is about resolution outcome (resolved_commit,content_hash,deployed_files,package_type, ...) -- not the wall-clock write timestamp. The assertion message and intent are unchanged.Updated docstring + test docstring to reflect "content-identical (modulo
generated_at)" rather than "byte-identical".Validation
Locally on macOS arm64:
Lint contract:
How to verify the root cause
Before this PR, running the test on a fresh checkout produces:
diffof the two lockfiles before the fix showed run A had 5 entries underdeployed_files(.agents/skills/style-checker+ 4.github/{agents,instructions,prompts}/*paths) while runs B and C had only.agents/skills/style-checker. After the project-isolation fix, the only remaining difference wasgenerated_at-- which the second part of the fix handles.Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>