feat(audit): default-on integration drift detection#1137
feat(audit): default-on integration drift detection#1137danielmeppiel wants to merge 5 commits intomainfrom
Conversation
- Add _ReadOnlyProjectGuard context manager (utils/guards.py): snapshots stat of protected paths, raises ProtectedPathMutationError on any mutation. Defense-in-depth above the scratch-root remap. - Add CATEGORY_DRIFT + drift() recording method to DiagnosticCollector. - Add drift_count property and _render_drift_group renderer that groups by kind (modified/unintegrated/orphaned) with stable section header for machine consumers. - Tests: 7 unit tests covering happy path, mutation, creation, deletion, missing-tolerated, exception-not-masked, single-file protected path. Refs #1071. Phase A of WIP/drift/06-final-plan.md. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implements the drift detection feature per WIP/drift/06-final-plan.md (closes #1071 scope alignment with #898). Engine (Phase B): - src/apm_cli/install/drift.py: ReplayConfig, DriftFinding, CheckLogger, CacheMissError, normalization helpers (build-id strip, line endings, BOM), run_replay() (cache-only), diff_scratch_against_project(), text/json/sarif renderers, atexit scratch cleanup. - src/apm_cli/install/services.py: scratch_root kwarg with ensure_path_within defense-in-depth guard for replay isolation. - src/apm_cli/policy/ci_checks.py: _check_drift() wrapper returning (CheckResult, list[DriftFinding]); graceful CacheMissError handling. CLI surface (Phase C): - src/apm_cli/commands/audit.py: --no-drift opt-out flag with mutex against --strip/--file via UsageError. Drift wired into both _audit_ci_gate (--ci) and _audit_content_scan (bare project audit) paths, default-on per ADR-02. JSON/SARIF/text renderers integrated; --no-drift warning gated to text mode (stdout cleanliness). Tests: - tests/unit/install/test_drift.py: 13 unit tests (normalization, diff cases, renderers). - Legacy --ci tests opt out of drift via batch --no-drift injection (fixture parity, not a behavior change). 7597 unit tests pass; lint clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implements the locked test matrix for issue #1071 drift detection. Floor of 43 tests across three new files closes the 'ULTRA HARDENING OF HELL' coverage requirement. New files: - tests/integration/test_drift_check.py (32 tests): * Section A: 9 drift cases (modified/unintegrated/orphaned + CRLF/ BOM/Build-ID false-positive guards) * Section B: 4 past-PR regressions (#1067, #882, #889, source-deleted) * Section C: 7 edges (no/corrupt lockfile, untracked governed, no-write contract, idempotency) * Section D: 3 multi-target (copilot/claude/cursor) * Section E: 9 default-on / --no-drift opt-out (mutex, stderr routing, JSON suppression) - tests/integration/test_drift_check_e2e.py (10 tests): full install->mutate->audit loop with mix_stderr=False, air-gap proof, JSON/SARIF stability, 30s smoke - tests/unit/install/test_drift_perf.py (1 test): 100 primitives replay+diff under 5s Engine fix surfaced by tests: - src/apm_cli/install/drift.py: run_replay now reads apm.yml's target field via parse_target_field and passes it to resolve_targets. Without this, multi-target projects (copilot+claude+cursor) replayed only the auto-detected primary target, falsely reporting secondary target deployments as orphaned. Helper _read_apm_yml_target() added. CI wiring: - scripts/test-integration.sh: two new blocks in run_e2e_tests() invoking the integration + e2e suites before the final success log. Both safe to run without GITHUB_APM_PAT (cache-only, mocked network). Verification: 56 drift-domain tests pass; full repo lint clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- CHANGELOG.md: Added [Unreleased] entry under Added describing the default-on drift detection in apm audit, the three failure modes it catches, false-positive guards, --no-drift opt-out + mutex semantics, and the JSON/SARIF integration shape. Closes #1071, supersedes #898. - docs/src/content/docs/guides/drift-detection.md (NEW, sidebar order 7): Full user-facing guide -- what drift means, how the cache-only replay works (with mermaid diagram), exit-code matrix, when to use --no-drift, output formats, and the CI single-line gate that replaces the legacy git status --porcelain script. - packages/apm-guide/.apm/skills/apm-usage/commands.md: Extended the audit row with --no-drift flag and added a paragraph documenting the drift-by-default behavior, three failure modes, false-positive normalization, and JSON/SARIF integration. Aligns the skill that ships in apm-guide with the new CLI surface (per apm-keep-docs-up-to-date.instructions.md rule 4). - .github/workflows/ci.yml: Annotated Gate B (legacy bash drift check) with a comment marking it redundant once apm-action ships a CLI with default-on drift detection (this PR's release). Kept as defense-in-depth fallback until then. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
APM Review Panel:
|
| Persona | B | R | N | Takeaway |
|---|---|---|---|---|
| Python Architect | 0 | 3 | 2 | Solid replay-engine design with frozen dataclasses and defense-in-depth guards; scratch_root kwarg is a pragmatic seam but warrants a follow-up to formalize as a deploy-target factory. |
| CLI Logging Expert | 0 | 1 | 3 | Drift output follows ASCII/stderr conventions well; missing actionable fix hint violates 'Include the fix' rule; verbose scratch-path disclosure absent. |
| DevX UX Expert | 0 | 2 | 2 | Missing recovery hint in drift output and missing cli-commands.md update; otherwise the surface design is sound. |
| Supply Chain Security Expert | 0 | 3 | 2 | Cache-integrity verification absent from replay path; scratch containment is sound; air-gap test blocks subprocesses but not socket-level egress; atexit cleanup is SIGTERM-fragile on shared CI runners. |
| OSS Growth Hacker | 0 | 3 | 2 | Strong launch-beat story; needs cross-links and a 60-second demo path to avoid the new guide being an island. |
| Doc Writer | 0 | 5 | 2 | Drift guide is well-written; ship after fixing two stale companion docs and one self-contradictory section. CHANGELOG bullet violates Keep-a-Changelog one-line rule. |
| Test Coverage Expert | 0 | 2 | 0 | 42 integration+e2e tests cover all three drift kinds, no-write contract, air-gap, SARIF/JSON stability, and --no-drift mutex. Two gaps: no test that normalization guards do NOT mask real drift, and CacheMissError path untested at unit tier. |
B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.
Top 5 follow-ups
- [DevX UX Expert] Add recovery hint to drift output: 'Run: apm install' after each drift group -- Two-persona convergence (cli-logging + devx-ux). Violates 'Include the fix' rule. One-line fix that converts a diagnostic into a product moment. Should land in-PR.
- [Doc Writer] Update cli-commands.md (--no-drift flag), ci-cd.md (remove obsolete bash workaround), and quick-start.md (stale cross-ref) -- Three stale doc surfaces contradict the new behavior. Shipping without sync means day-one user confusion. Cheap fixes, same PR.
- [Supply Chain Security Expert] Track follow-up issue: verify cache content matches lockfile resolved_commit before replay trusts it -- Shared CI caches and mounted volumes are a real poisoning vector. Current defense (prior-install trust) is pragmatic for v1 but the integrity primitive (commit-SHA pinning) is bypassed at replay time.
- [Test Coverage Expert] Add inverse-normalization unit test asserting real content drift is NOT suppressed by BOM/CRLF/Build-ID guards -- Evidence outcome=missing on a secure-by-default safety invariant. A normalization bug could silently mask real drift with no automated guard catching the regression.
- [OSS Growth Hacker] Add inbound cross-links from ci-policy-setup, governance-guide, and making-the-case to the new drift-detection guide -- Guide currently has zero internal-link equity -- it's discoverable only via sidebar. Cross-links from high-traffic pages drive adoption of the new capability.
Architecture
classDiagram
direction LR
class ReplayConfig {
<<ValueObject>>
+project_root Path
+lockfile_path Path
+targets frozenset~str~ | None
+cache_only bool
}
class DriftFinding {
<<ValueObject>>
+path str
+kind str
+package str
+inline_diff str
}
class CacheMissError {
<<Exception>>
}
class CheckLogger {
<<Base+Subclass>>
+replay_start()
+diff_start()
+replay_complete(n)
+clean()
+findings(n)
}
class CommandLogger {
<<Base>>
+verbose bool
+error(msg)
+warning(msg)
+success(msg)
}
class _ReadOnlyProjectGuard {
<<ContextManager>>
+project_root Path
+protected_roots list~Path~
+__enter__()
+__exit__()
}
class ProtectedPathMutationError {
<<Exception>>
}
class DiagnosticCollector {
<<Collect-then-render>>
+drift(msg, path)
+drift_count int
}
class integrate_package_primitives {
<<IOBoundary>>
+scratch_root Path | None
}
CheckLogger --|> CommandLogger : extends
_ReadOnlyProjectGuard ..> ProtectedPathMutationError : raises
class ReplayConfig:::touched
class DriftFinding:::touched
class CacheMissError:::touched
class CheckLogger:::touched
class _ReadOnlyProjectGuard:::touched
class ProtectedPathMutationError:::touched
class integrate_package_primitives:::touched
classDef touched fill:#fff3b0,stroke:#d47600
flowchart TD
A["apm audit (commands/audit.py)"] --> B{--ci flag?}
B -- yes --> C["_audit_ci_gate()"]
B -- no --> D["_audit_content_scan()"]
C --> E{"--no-drift?"}
D --> F{"--no-drift AND no --file/--strip/--package?"}
E -- no --> G["[NET] _check_drift(project_root, lockfile)"]
F -- no --> G
E -- yes --> H["skip drift, warn on stderr"]
F -- yes --> H
G --> I["run_replay(ReplayConfig, CheckLogger)"]
I --> J["[FS] _make_scratch_root() tempfile.mkdtemp + atexit"]
J --> K["_assert_scratch_bound(project, scratch)"]
K --> L{"for lock_dep in lockfile"}
L --> M["[FS] _materialize_install_path() resolve cache path"]
M --> N["_build_package_info() loads apm.yml if present"]
N --> O["[FS] integrate_package_primitives(scratch_root=scratch_root)"]
O --> P["[FS] ensure_path_within(project_root, scratch_root)"]
P --> L
L -- done --> Q["diff_scratch_against_project()"]
Q --> R["[FS] _walk_managed(scratch, governed_roots)"]
Q --> S["[FS] _walk_managed(project, governed_roots)"]
R --> T["_normalize() on each file pair (BOM + CRLF + Build-ID strip)"]
S --> T
T --> U["emit list~DriftFinding~"]
U --> V{"format?"}
V -- text --> W["render_drift_text()"]
V -- json --> X["render_drift_json()"]
V -- sarif --> Y["render_drift_sarif()"]
sequenceDiagram
participant User
participant AuditCmd as commands/audit.py
participant CIChecks as policy/ci_checks.py
participant Drift as install/drift.py
participant Services as install/services.py
participant FS as Filesystem
User->>AuditCmd: apm audit [--ci]
AuditCmd->>CIChecks: _check_drift(project_root, lockfile)
CIChecks->>Drift: run_replay(ReplayConfig, CheckLogger)
Drift->>FS: mkdtemp (scratch dir)
Drift->>Drift: _assert_scratch_bound(project, scratch)
loop each locked dependency
Drift->>FS: _materialize_install_path (cache lookup)
Drift->>Services: integrate_package_primitives(scratch_root=scratch)
Services->>Services: ensure_path_within(project_root, scratch_root)
Services->>FS: write primitives to scratch
end
CIChecks->>Drift: diff_scratch_against_project(scratch, project, lockfile)
Drift->>FS: _walk_managed(scratch) + _walk_managed(project)
Drift->>Drift: _normalize() each pair
Drift-->>CIChecks: list[DriftFinding]
CIChecks-->>AuditCmd: (CheckResult, findings)
AuditCmd->>User: render_drift(findings, format)
Recommendation
Land the recovery hint and doc-sync fixes in this PR (both are trivial, same-day work), then merge. Track cache-integrity verification and inverse-normalization test as follow-up issues. The feature is architecturally sound, the test suite is strong (43 new tests), and the positioning value is high. No reason to delay beyond the in-PR polish pass.
Full per-persona findings
Python Architect
- [recommended] scratch_root kwarg on integrate_package_primitives leaks test-double / audit concerns into the production API surface at
src/apm_cli/install/services.py:96
Adding scratch_root to the production install function's signature couples a drift-audit implementation detail into the core pipeline. Cleaner seam: DeployTarget protocol injected by callers. - [recommended] Normalization helpers are drift.py-private but will be needed by compile, install verify, and future audit modes at
src/apm_cli/install/drift.py:96
_strip_build_id, _normalize_line_endings, _strip_bom belong in utils/normalization.py with a public API. - [recommended] _read_apm_yml_target couples drift module to manifest format at
src/apm_cli/install/drift.py:334
Tight coupling to apm.yml schema; pragmatic today, but a 'load project config' helper is the right seam. - [nit] render_drift_json return type annotation could be more precise
TypedDict or dict[str, list[dict[str, str]]] gives downstream callers type-safe access. - [nit] atexit.register may leave temp dirs on SIGKILL
Use tempfile.TemporaryDirectory context manager pattern for both atexit AND explicit cleanup.
CLI Logging Expert
- [recommended] render_drift_text and _render_drift_group omit an actionable remediation hint telling the user how to fix detected drift at
src/apm_cli/install/drift.py:640
Message Writing Rule Add ARM64 Linux support to CI/CD pipeline #4 'Include the fix'. Drift findings show WHAT but never HOW. Append 'Run apm install to re-sync deployed files with the lockfile.' - [nit] CheckLogger emits stderr progress unconditionally -- no quiet-mode or non-TTY suppression path
- [nit] _inline_diff_for never produces an actual diff for sub-cap files; docstring is misleading
- [nit] Verbose mode discloses tracemalloc peak but not scratch tmpdir path -- agent debugging is harder
DevX UX Expert
- [recommended] render_drift_text never tells the user HOW to fix drift -- the single remediation command is missing at
src/apm_cli/install/drift.py
Failure mode is the product. npm audit prints 'run npm audit fix'; APM should print 'Run: apm install'. - [recommended] docs/src/content/docs/reference/cli-commands.md not updated -- PR is incomplete per doc-sync Rule 4 at
docs/src/content/docs/reference/cli-commands.md
Persona scope: 'If a CLI change is not reflected in cli-commands.md in the same PR, that change is incomplete by definition.' --no-drift not in flags table. - [nit] --no-drift mutex error could name the reason more concretely
- [nit] CHANGELOG entry is a single monolithic paragraph -- harder to scan than bullet points
Supply Chain Security Expert
- [recommended] Drift replay trusts apm_modules cache without verifying content matches lockfile resolved_commit at
src/apm_cli/install/drift.py:231
_materialize_install_path checks only EXISTS, never that content/git HEAD matches lockfile.resolved_commit. A poisoned cache (CI runners with shared caches, mounted volumes) would produce false-clean. Security model says commit-SHA pinning is the integrity primitive; replay bypasses it. - [recommended] Air-gap test blocks subprocess binaries (gh/curl/wget) but not socket-level Python network calls at
tests/integration/test_drift_check_e2e.py:84
Patches subprocess.run/Popen but not socket.socket. Future requests/urllib egress would not be caught. - [recommended] atexit-based scratch cleanup is not SIGTERM/SIGKILL safe; sensitive deployed content may persist on shared CI runners at
src/apm_cli/install/drift.py:142
Python atexit doesn't fire on SIGKILL; CI cancellation = SIGTERM+SIGKILL. Scratch /tmp persists with replayed primitives. - [nit] inline_diff field could leak secrets if expanded -- add SECURITY comment
- [nit] Positive finding: drift replay correctly avoids credential paths (zero token references; NotImplementedError on cache_only=False)
OSS Growth Hacker
- [recommended] Drift-detection guide is an island -- no inbound links from ci-policy-setup, governance-guide, or making-the-case
Guide links OUT but nothing links IN. Zero internal-link equity. - [recommended] CHANGELOG entry buries the hook in implementation detail -- rewrite lead sentence as the one-liner users can repost
- [recommended] No 60-second demo path for adopters -- the guide explains but never invites the reader to try
- [nit] README 'Content security' bullet mentions audit but not drift -- missed hook
- [nit] Guide does not name the comparable ecosystem tools (npm audit, terraform plan -detailed-exitcode)
Auth Expert -- inactive
PR does not touch any auth-relevant file; drift replay is cache-only with no downloader/clone/token surface -- verified via grep + drift.py read. Network-enabled replay is explicitly gated by NotImplementedError.
Doc Writer
- [recommended] integrations/ci-cd.md still documents the bash git-status workaround that this PR explicitly supersedes at
docs/src/content/docs/integrations/ci-cd.md:61
Lines 61-79 still recommend the legacy pattern under 'Verify Deployed Primitives'; 'We dogfood this' callout still cites the old ci.yml. Most important doc-sync gap on the PR. - [recommended] reference/cli-commands.md does not document --no-drift and still describes --ci as 7-check-only at
docs/src/content/docs/reference/cli-commands.md:462 - [recommended] quick-start.md sends new users to the now-obsolete drift section in ci-cd.md at
docs/src/content/docs/getting-started/quick-start.md:159 - [recommended] drift-detection.md case Integrate copilot runtime #2 in 'When to use --no-drift' contradicts the CLI behavior described two paragraphs later at
docs/src/content/docs/guides/drift-detection.md:67
Lines 67-69 list 'Strip-mode invocations' as a reason to use --no-drift, but audit.py auto-skips drift in strip/file mode AND raises UsageError if combined. Self-contradiction. - [recommended] CHANGELOG entry violates Keep-a-Changelog 'one line per PR' rule
- [nit] New page is not back-linked from ci-policy-setup.md or governance-guide.md
- [nit] sidebar order: 7 collides with dev-only-primitives.md and org-packages.md
Test Coverage Expert
- [recommended] No test asserts normalization guards (BOM/CRLF/Build-ID) do NOT mask real content drift at
tests/unit/install/test_drift.py
Tests prove cosmetic-only changes are suppressed. None asserts the inverse -- the safety invariant. A future normalization bug could silently mask real drift.
Proof (missing at):tests/unit/install/test_drift.py-- proves: Normalization guards suppress only cosmetic changes, never real drift [secure-by-default,governed-by-policy] - [recommended] CacheMissError raised by run_replay has no direct unit test exercising the raise paths at
tests/unit/install/test_drift.py
Defined and raised in 5 locations; only command-layer swallowing is tested. No direct pytest.raises assertion.
Proof (missing at):tests/unit/install/test_drift.py-- proves: Cache-miss paths in drift replay raise CacheMissError so the audit command can catch and handle gracefully [devx,governed-by-policy]
This panel is advisory. It does not block merge. Re-apply the
panel-review label after addressing feedback to re-run.
CEO panel recommended landing two in-PR follow-ups before merge: 1. Recovery hint in drift output (cli-logging + devx-ux convergence): render_drift_text now appends '[i] Run apm install to re-sync deployed files with the lockfile.' so users see WHAT and HOW in one message. Honors Message Writing Rule #4 'Include the fix'. 2. Doc-sync (doc-writer + devx-ux convergence): - reference/cli-commands.md: add --no-drift to audit options table; amend --ci description to mention drift contribution. - integrations/ci-cd.md: replace bash 'git status --porcelain' workaround under 'Verify Deployed Primitives' with 'apm audit --ci' one-liner; update 'We dogfood this' callout text. - getting-started/quick-start.md: retarget stale cross-ref from the now-superseded ci-cd anchor to the new drift-detection guide. - guides/drift-detection.md: drop the self-contradictory case #2 in 'When to use --no-drift' (strip-mode is auto-skipped, not opt-out). - CHANGELOG.md: compress verbose entry to one Keep-a-Changelog line pointing readers to the guide for detail. Tracked as follow-up issues (CEO call): - supply-chain: verify cache content matches lockfile resolved_commit before drift replay trusts it (commit-SHA pinning bypass on shared CI caches). - test-coverage: inverse-normalization unit test asserting BOM/CRLF/ Build-ID guards do NOT mask real content drift (safety invariant). Lint clean. 45 drift tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addressed in-PR (commit 7907865)Per the CEO panel recommendation, landed the two convergent in-PR follow-ups: 1. Recovery hint (cli-logging + devx-ux): 2. Doc-sync (doc-writer + devx-ux):
All 45 drift tests pass; lint clean. Tracked as follow-up issues (CEO call)
StatusPR is ready for review. CEO stance: |
There was a problem hiding this comment.
Pull request overview
This PR extends apm audit to perform integration drift detection by default by replaying the install/integration pipeline into a scratch directory (cache-only) and diffing the result against the working tree. It adds a new drift engine, wires drift into text/JSON/SARIF outputs, updates CI/docs, and introduces extensive unit/integration/e2e coverage.
Changes:
- Add a cache-only “install replay + diff” drift engine with renderers (text/JSON/SARIF) and normalization guards (Build ID / CRLF / BOM).
- Wire drift detection into
apm audit(default-on) with--no-driftopt-out, and propagate findings into CI JSON + SARIF payloads. - Add read-only defensive guard + substantial test coverage; update CI scripts and docs to reflect the new drift surface.
Show a summary per file
| File | Description |
|---|---|
| uv.lock | Bumps apm-cli version to 0.12.1. |
| src/apm_cli/install/drift.py | New drift replay + diff engine with normalization + renderers. |
| src/apm_cli/install/services.py | Adds scratch_root kwarg + safety guard for replayed integrations. |
| src/apm_cli/utils/guards.py | New read-only project-tree guard used as defense-in-depth for drift. |
| src/apm_cli/utils/diagnostics.py | Adds drift diagnostic category + rendering. |
| src/apm_cli/policy/ci_checks.py | Adds _check_drift() to run replay+diff and return findings. |
| src/apm_cli/commands/audit.py | Adds --no-drift, runs drift by default, integrates drift into JSON/SARIF/text output paths. |
| tests/unit/utils/test_guards.py | Unit coverage for the read-only guard behavior (create/modify/delete). |
| tests/unit/utils/init.py | Ensures tests.unit.utils is a package for imports/discovery. |
| tests/unit/test_audit_policy_command.py | Updates audit policy unit tests to disable drift (stabilize expectations). |
| tests/unit/test_audit_ci_command.py | Updates CI-mode audit unit tests to disable drift (stabilize expectations). |
| tests/unit/test_audit_ci_auto_discovery.py | Updates auto-discovery unit tests to disable drift. |
| tests/unit/install/test_drift.py | Unit tests for drift normalization, diff kinds, SARIF rule IDs, and stderr-only phase logging. |
| tests/unit/install/test_drift_perf.py | Performance smoke test for replay+diff under a 5s budget for 100 primitives. |
| tests/integration/test_drift_check.py | Integration tests covering drift kinds, edges, multi-target behavior, and --no-drift mutex. |
| tests/integration/test_drift_check_e2e.py | E2E tests for no-write contract, air-gap behavior, performance smoke, and JSON/SARIF shape stability. |
| scripts/test-integration.sh | Ensures new drift integration/e2e suites run in CI integration job. |
| packages/apm-guide/.apm/skills/apm-usage/commands.md | Updates apm-guide command reference for drift default + --no-drift. |
| docs/src/content/docs/reference/cli-commands.md | Documents --no-drift and drift behavior in apm audit --ci. |
| docs/src/content/docs/integrations/ci-cd.md | Replaces legacy git status drift gate with apm audit --ci. |
| docs/src/content/docs/guides/drift-detection.md | New/updated guide describing drift behavior, formats, and CI usage. |
| docs/src/content/docs/getting-started/quick-start.md | Updates link/reference to drift checking guidance. |
| CHANGELOG.md | Adds Unreleased entry for default-on drift detection. |
| .github/workflows/ci.yml | Notes legacy bash drift step as redundant once apm-action picks up this CLI version. |
Copilot's findings
Comments suppressed due to low confidence (2)
src/apm_cli/policy/ci_checks.py:470
_check_drift()resolves targets viaresolve_targets(project_root)(auto-detect). If a project declarestarget:inapm.ymlbut the corresponding deployment dirs don't exist yet, auto-detect can omit those targets and the diff will miss drift (e.g., unintegrated outputs under.claude/,.cursor/, etc.). Consider resolving targets with the same explicit target selection used byrun_replay()(read fromapm.yml) so governed roots are complete even when dirs are absent.
logger.diff_start()
resolved_targets = resolve_targets(project_root)
if targets:
resolved_targets = [t for t in resolved_targets if t.name in set(targets)]
findings = diff_scratch_against_project(scratch, project_root, lockfile, resolved_targets)
docs/src/content/docs/guides/drift-detection.md:94
- The JSON example shows
driftas a list, but the implementation/tests treat the top-leveldriftkey as an object containing adriftlist (i.e.,payload['drift']['drift']). Please update this example (or adjust the implementation) so the documented schema matches the actual output shape.
**JSON** -- the audit report gains a top-level `drift` key:
```json
{
"report_format_version": "1.0",
"checks": [...],
"drift": [
{
"path": ".github/instructions/foo.md",
"kind": "modified",
"package": "<local>",
"inline_diff": "..."
}
]
}
</details>
- **Files reviewed:** 22/24 changed files
- **Comments generated:** 8
| def _snapshot(paths: list[Path]) -> dict[Path, tuple[float, int] | None]: | ||
| """Capture (mtime_ns, size) for each path, or ``None`` if missing. | ||
|
|
||
| Symlinks are followed; missing paths record ``None`` so they may | ||
| legitimately remain absent without triggering the guard. | ||
| """ | ||
| snap: dict[Path, tuple[float, int] | None] = {} | ||
| for p in paths: | ||
| try: | ||
| st = p.stat() | ||
| snap[p] = (st.st_mtime_ns, st.st_size) | ||
| except FileNotFoundError: |
| passed=False, | ||
| message=( | ||
| f"drift replay aborted: {exc} -- " | ||
| "run 'apm install' first or use --no-cache (not yet supported)" |
| def _check_drift( | ||
| project_root: Path, | ||
| lockfile, | ||
| targets=None, | ||
| cache_only: bool = True, | ||
| verbose: bool = False, | ||
| ) -> tuple[CheckResult, list]: | ||
| """Replay the install in a scratch dir and diff against the project. |
| for stream_name, stream in (("stdout", result.stdout), ("stderr", result.stderr)): | ||
| text = stream or "" | ||
| for ch in text: | ||
| assert ord(ch) < 128 or ch in {"\u2500", "\u2501"} or ord(ch) > 0xFFFF or True, ( |
| # Drift severities: kinds of divergence from the lockfile-defined state. | ||
| DRIFT_MODIFIED = "modified" # tracked file content changed | ||
| DRIFT_UNINTEGRATED = "unintegrated" # tracked file missing from project | ||
| DRIFT_ORPHANED = "orphaned" # untracked file present in managed dir |
|
|
||
| ### Added | ||
|
|
||
| - **`apm audit` now detects integration drift by default.** Read-only cache-only install replay catches missed `apm install` runs, hand-edited deployed files, and orphaned files. Findings exposed in JSON (`drift` key) and SARIF (`apm/drift/<kind>` rules); in `--ci` mode they contribute to the exit code. Opt out with `--no-drift` (mutually exclusive with `--strip`/`--file`). See the [Drift Detection guide](docs/src/content/docs/guides/drift-detection.md) for details. (#1071, supersedes scope of #898) |
| | `apm audit` | Reported in stdout | 0 (advisory only) | | ||
| | `apm audit --ci` | Reported and counted as failure | 1 | | ||
| | `apm audit --no-drift` | Skipped entirely | governed only by other checks | | ||
|
|
||
| In `--ci` mode drift findings are pooled with the seven baseline lockfile | ||
| checks (`lockfile-exists`, `ref-consistency`, etc.) -- a single | ||
| non-zero exit covers all of them. |
| @@ -593,6 +670,10 @@ def _audit_content_scan( | |||
| all_findings = [f for ff in findings_by_file.values() for f in ff] | |||
| exit_code = 1 if ContentScanner.has_critical(all_findings) else 2 | |||
|
|
|||
| # Drift findings escalate exit code to 1 (critical). | |||
| if drift_failed and exit_code == 0: | |||
| exit_code = 1 | |||
|
|
|||
TL;DR
apm auditnow detects integration drift by default. It replaysapm installcache-only into a scratch tree and diffs against your working copy, catching the three cases that previously slipped past CI:.apm/source added without re-running install (#1067), hand-edited deployed files, and orphan files left after a dependency was removed. The scan is read-only — nothing in the project, lockfile, orapm_modules/is touched. Closes #1071; supersedes the lockfile-only scope of #898.Problem (WHY)
Today,
apm auditcovers lockfile consistency but is blind to the most common real-world drift mode: a developer adds a.apm/instructions/foo.mdand forgets to re-runapm install, so.github/instructions/foo.mdnever lands. CI only catches it because every consumer repo (including this one) maintains a hand-rolledgit status --porcelainscript after the install step. That workaround has three problems:apm installfirst, so localapm auditgives false confidence.unintegrated(forgot to install) frommodified(hand-edit) fromorphaned(stale file), so the failure message is unhelpful.apm audit, not as a side-channel script.Issue #898 originally asked for full lockfile-vs-deployed-content verification; the design loop captured in
WIP/drift/widened the scope to all three drift kinds and locked the surface as a single command (apm audit) with default-on behavior, per ADR-02 in that folder.Approach (WHAT)
apm audit(no new command)--no-driftmutually exclusive with--stripand--filedriftkey in JSON;apm/drift/<kind>rules in SARIF--cionlyapm auditis advisory; CI mode is the gateImplementation (HOW)
src/apm_cli/install/drift.py(NEW, ~410 lines)ReplayConfig,DriftFinding,CacheMissError,CheckLogger,run_replay(),diff_scratch_against_project(), three renderers (text/JSON/SARIF). Includes false-positive guards (Build-ID, CRLF, BOM normalization) and_read_apm_yml_target()so multi-target projects replay the same target set the install pipeline usedsrc/apm_cli/install/services.pyscratch_rootkwarg tointegrate_package_primitives()so replay can redirect deploy paths without monkey-patchingsrc/apm_cli/utils/guards.py(NEW)_assert_scratch_bounddefensive guard preventing replay from ever writing outside the scratch treesrc/apm_cli/utils/diagnostics.pyDRIFTdiagnostic category with friendly remediation textsrc/apm_cli/commands/audit.py--no-driftflag, mutex viaUsageError, drift wired into both_audit_ci_gateand_audit_content_scan. Warning gated to text mode so JSON/SARIF stdout stays parseable.github/workflows/ci.ymlapm-actionships with this CLI; kept as defense-in-depth fallbacktests/integration/test_drift_check.py(NEW)--no-drifttests/integration/test_drift_check_e2e.py(NEW)tests/unit/install/test_drift_perf.py(NEW)CHANGELOG.md,docs/.../drift-detection.md,packages/apm-guide/.apm/skills/apm-usage/commands.mdDiagrams
The drift pipeline at runtime — cache-only replay, then diff:
flowchart LR A["apm audit"] --> B["Read apm.lock.yaml<br/>+ persistent cache"] B --> C["run_replay():<br/>install pipeline into<br/>scratch tmpdir"] C --> D["diff_scratch_against_project():<br/>scratch vs project"] D --> E["DriftFinding[]"] E --> F["render_drift_text /<br/>render_drift_json /<br/>render_drift_sarif"]How drift composes with the existing audit checks (drift is a separate rail, not a fake check):
stateDiagram-v2 [*] --> AuditEntry AuditEntry --> CIGate: --ci AuditEntry --> ContentScan: bare or PKG CIGate --> Baseline7: 7 lockfile checks Baseline7 --> Drift: if --no-drift not set ContentScan --> Drift: if --no-drift not set Drift --> Render Baseline7 --> Render Render --> Exit1: any failure in --ci Render --> Exit0: bare audit (advisory)Trade-offs
--no-drift. Rejected: opt-in default — would replicate the fix(compile): emit and clean up copilot root instructions (#792) #1067 silent-failure mode this PR exists to fix.--strip/--fileis strict. A user could theoretically want strip-mode + drift, but the semantics ("drift of what?") are undefined. We refuse the combination at the CLI rather than silently picking one. Rejected: warn-and-continue — invites bug reports we cannot answer.apm auditmode. In--cimode the lockfile-exists check fires first and fails loudly; in bare mode there is no such gate, so a corrupt lockfile means drift no-ops. Pinned by tests; tracked as a follow-up to escalate to a clean error in bare mode.apm.yml'starget:field. This couples the engine to the manifest format, but the alternative (directory auto-detection only) was wrong: it missed targets whose deployment dirs were still empty, falsely reporting their files as orphaned. Caught during Phase D test development.Benefits
unintegrated,modified,orphaned) that previously needed three separate scripts.apm audit --cisubsumes both lockfile fidelity and integration drift; the bashgit status --porcelainscript is now annotated as redundant.HTTPS_PROXYset to a sentinel and asserts no network attempt.Validation
Scenario evidence
Every user-promise scenario this PR introduces is mapped to the test that proves it. Principle taxonomy per the scenario-evidence rubric.
.apm/source (#1067 root cause)test_drift_check.py::test_drift_unintegrated_local_source.github/test_drift_check.py::test_drift_modified_deployed_filetest_drift_check.py::test_drift_orphaned_filetest_drift_check.py::test_no_drift_build_id_only_difftest_drift_check.py::test_no_drift_crlf_only_difftest_drift_check.py::test_no_drift_bom_only_difftest_drift_check.py::test_regression_pr_1067_unintegrated_subdir_sourcetest_drift_check.py::test_regression_pr_882_target_autodetecttest_drift_check.py::test_regression_pr_889_orphan_cleanuptest_drift_check.py::test_regression_source_deleted_no_installtest_drift_check.py::test_drift_multitarget_*--no-driftopt-out skips drift entirelytest_drift_check.py::test_no_drift_flag_skips_drift_check--no-driftmutex with--strip/--filerejects via UsageErrortest_drift_check.py::test_no_drift_mutex_with_strip/test_no_drift_mutex_with_fileapm audit --ciexit code 1test_drift_check.py::test_drift_ci_exit_code_oneapm auditreports drift but exits 0 (advisory)test_drift_check.py::test_drift_bare_audit_exits_zerodriftkeytest_drift_check_e2e.py::test_json_output_drift_key_stableapm/drift/<kind>rulestest_drift_check_e2e.py::test_sarif_output_drift_rules_stableapm_modules/test_drift_check_e2e.py::test_no_write_contract_mtime_and_hash_unchangedtest_drift_check_e2e.py::test_drift_replay_air_gap_no_network_callstest_drift_check_e2e.py::test_full_install_mutate_audit_looptest_drift_check_e2e.py::test_30s_smoke_full_audit_under_thresholdtest_drift_perf.py::test_drift_replay_100_primitives_under_5sFull integration-test list (32 + 10 + 1 = 43 new tests)
The CI integration script (
scripts/test-integration.sh) was extended with two newpytestblocks inrun_e2e_tests()so both suites run in the e2e CI job.How to test
uv sync --all-extras --devuv run --extra dev pytest tests/unit/install/test_drift.py tests/unit/install/test_drift_perf.py tests/integration/test_drift_check.py tests/integration/test_drift_check_e2e.py -vapm.yml), edit any file under.github/instructions/and runapm audit— expect amodifiedfindingapm audit --no-drift— drift section absent, lockfile checks still runapm audit --ci -f json— JSON has top-leveldriftkey, parses cleanly, exit code is 1 if drift existsNote
The
.github/workflows/ci.ymllegacy bash drift check is intentionally kept for one release cycle as defense-in-depth untilapm-actionpicks up a CLI version that includes this PR.Closes #1071. Supersedes the lockfile-only scope of #898.
Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com