Skip to content

feat(audit): default-on integration drift detection#1137

Open
danielmeppiel wants to merge 5 commits intomainfrom
feat/drift-detection
Open

feat(audit): default-on integration drift detection#1137
danielmeppiel wants to merge 5 commits intomainfrom
feat/drift-detection

Conversation

@danielmeppiel
Copy link
Copy Markdown
Collaborator

TL;DR

apm audit now detects integration drift by default. It replays apm install cache-only into a scratch tree and diffs against your working copy, catching the three cases that previously slipped past CI: .apm/ source added without re-running install (#1067), hand-edited deployed files, and orphan files left after a dependency was removed. The scan is read-only — nothing in the project, lockfile, or apm_modules/ is touched. Closes #1071; supersedes the lockfile-only scope of #898.

Problem (WHY)

Today, apm audit covers lockfile consistency but is blind to the most common real-world drift mode: a developer adds a .apm/instructions/foo.md and forgets to re-run apm install, so .github/instructions/foo.md never lands. CI only catches it because every consumer repo (including this one) maintains a hand-rolled git status --porcelain script after the install step. That workaround has three problems:

  • It only fires when CI happens to run apm install first, so local apm audit gives false confidence.
  • It cannot distinguish unintegrated (forgot to install) from modified (hand-edit) from orphaned (stale file), so the failure message is unhelpful.
  • It is bash glue users must copy verbatim, not a tool guarantee — a Progressive Disclosure failure: "Context arrives just-in-time, not just-in-case." The drift signal should arrive with apm audit, not as a side-channel script.

Issue #898 originally asked for full lockfile-vs-deployed-content verification; the design loop captured in WIP/drift/ widened the scope to all three drift kinds and locked the surface as a single command (apm audit) with default-on behavior, per ADR-02 in that folder.

Approach (WHAT)

Decision Choice Rationale
Surface Extend apm audit (no new command) One mental model: "audit = is my project consistent?"
Default ON Catching drift is the user-promise; opt-out, not opt-in
Engine Cache-only install replay into scratch tmpdir Reuses the install pipeline verbatim — no second source of truth for "what should be deployed"
Mutex --no-drift mutually exclusive with --strip and --file Strip-mode and single-file mode operate on payloads, not the project; mixing them with drift is ill-defined
Output Top-level drift key in JSON; apm/drift/<kind> rules in SARIF Drift is orthogonal to lockfile checks; do not encode it as a fake check entry
Failure mode Findings exit 1 in --ci only Bare apm audit is advisory; CI mode is the gate

Implementation (HOW)

File Change
src/apm_cli/install/drift.py (NEW, ~410 lines) Replay engine: ReplayConfig, DriftFinding, CacheMissError, CheckLogger, run_replay(), diff_scratch_against_project(), three renderers (text/JSON/SARIF). Includes false-positive guards (Build-ID, CRLF, BOM normalization) and _read_apm_yml_target() so multi-target projects replay the same target set the install pipeline used
src/apm_cli/install/services.py Added scratch_root kwarg to integrate_package_primitives() so replay can redirect deploy paths without monkey-patching
src/apm_cli/utils/guards.py (NEW) _assert_scratch_bound defensive guard preventing replay from ever writing outside the scratch tree
src/apm_cli/utils/diagnostics.py New DRIFT diagnostic category with friendly remediation text
src/apm_cli/commands/audit.py --no-drift flag, mutex via UsageError, drift wired into both _audit_ci_gate and _audit_content_scan. Warning gated to text mode so JSON/SARIF stdout stays parseable
.github/workflows/ci.yml Annotated Gate B (legacy bash drift check) as redundant once apm-action ships with this CLI; kept as defense-in-depth fallback
tests/integration/test_drift_check.py (NEW) 32 tests across 9 drift cases, 4 past-PR regressions, edges, multi-target, default-on/--no-drift
tests/integration/test_drift_check_e2e.py (NEW) 10 tests: full install→mutate→audit loop, no-write contract, air-gap proof, JSON/SARIF stability
tests/unit/install/test_drift_perf.py (NEW) 100 primitives smoke ceiling under 5s
Docs: CHANGELOG.md, docs/.../drift-detection.md, packages/apm-guide/.apm/skills/apm-usage/commands.md User-facing surface documented; apm-guide skill updated per the keep-docs-up-to-date rule 4

Diagrams

The drift pipeline at runtime — cache-only replay, then diff:

flowchart LR
    A["apm audit"] --> B["Read apm.lock.yaml<br/>+ persistent cache"]
    B --> C["run_replay():<br/>install pipeline into<br/>scratch tmpdir"]
    C --> D["diff_scratch_against_project():<br/>scratch vs project"]
    D --> E["DriftFinding[]"]
    E --> F["render_drift_text /<br/>render_drift_json /<br/>render_drift_sarif"]
Loading

How drift composes with the existing audit checks (drift is a separate rail, not a fake check):

stateDiagram-v2
    [*] --> AuditEntry
    AuditEntry --> CIGate: --ci
    AuditEntry --> ContentScan: bare or PKG
    CIGate --> Baseline7: 7 lockfile checks
    Baseline7 --> Drift: if --no-drift not set
    ContentScan --> Drift: if --no-drift not set
    Drift --> Render
    Baseline7 --> Render
    Render --> Exit1: any failure in --ci
    Render --> Exit0: bare audit (advisory)
Loading

Trade-offs

  • Default ON costs a few seconds. The cache-only replay is fast (100 primitives in ~1.2s on the perf test) but non-zero. The escape hatch is --no-drift. Rejected: opt-in default — would replicate the fix(compile): emit and clean up copilot root instructions (#792) #1067 silent-failure mode this PR exists to fix.
  • Mutex with --strip/--file is strict. A user could theoretically want strip-mode + drift, but the semantics ("drift of what?") are undefined. We refuse the combination at the CLI rather than silently picking one. Rejected: warn-and-continue — invites bug reports we cannot answer.
  • Corrupt lockfile silently skips drift in bare apm audit mode. In --ci mode the lockfile-exists check fires first and fails loudly; in bare mode there is no such gate, so a corrupt lockfile means drift no-ops. Pinned by tests; tracked as a follow-up to escalate to a clean error in bare mode.
  • Multi-target replay reads apm.yml's target: field. This couples the engine to the manifest format, but the alternative (directory auto-detection only) was wrong: it missed targets whose deployment dirs were still empty, falsely reporting their files as orphaned. Caught during Phase D test development.

Benefits

  1. One command catches the three real-world drift modes (unintegrated, modified, orphaned) that previously needed three separate scripts.
  2. CI gates collapse from two steps to one. apm audit --ci subsumes both lockfile fidelity and integration drift; the bash git status --porcelain script is now annotated as redundant.
  3. Read-only by contract. A dedicated e2e test snapshots mtime+sha256 of every file and asserts unchanged; the air-gap test runs the entire flow with HTTPS_PROXY set to a sentinel and asserts no network attempt.
  4. JSON/SARIF integration unblocks tooling. Code-scanning, dashboards, and PR bots can consume drift findings without parsing text output.
  5. Performance budget enforced. The 100-primitive smoke test (~1.2s actual, 5s ceiling) prevents regressions.

Validation

$ uv run --extra dev ruff check src/ tests/ && uv run --extra dev ruff format --check src/ tests/
All checks passed!
678 files already formatted

$ uv run --extra dev pytest tests/unit -q
7598 passed, 1 warning, 30 subtests passed in 90.05s

$ uv run --extra dev pytest tests/integration/test_drift_check.py tests/integration/test_drift_check_e2e.py tests/unit/install/test_drift.py tests/unit/install/test_drift_perf.py -q
56 passed

Scenario evidence

Every user-promise scenario this PR introduces is mapped to the test that proves it. Principle taxonomy per the scenario-evidence rubric.

Scenario Test Tier Principle
Forgot to re-run install after adding .apm/ source (#1067 root cause) test_drift_check.py::test_drift_unintegrated_local_source integration-with-fixtures Reproducibility
Hand-edit to a regenerated file under .github/ test_drift_check.py::test_drift_modified_deployed_file integration-with-fixtures Reproducibility
Source removed but deployed file remains test_drift_check.py::test_drift_orphaned_file integration-with-fixtures Reproducibility
Build-ID line difference does NOT trigger drift test_drift_check.py::test_no_drift_build_id_only_diff integration-with-fixtures Determinism
CRLF line endings on Windows checkouts do NOT trigger drift test_drift_check.py::test_no_drift_crlf_only_diff integration-with-fixtures Determinism
UTF-8 BOM does NOT trigger drift test_drift_check.py::test_no_drift_bom_only_diff integration-with-fixtures Determinism
#1067 regression (Cursor commands subdir) test_drift_check.py::test_regression_pr_1067_unintegrated_subdir_source integration-with-fixtures No-Regression
#882 regression (target auto-detection) test_drift_check.py::test_regression_pr_882_target_autodetect integration-with-fixtures No-Regression
#889 regression (orphan cleanup) test_drift_check.py::test_regression_pr_889_orphan_cleanup integration-with-fixtures No-Regression
Source-deleted regression test_drift_check.py::test_regression_source_deleted_no_install integration-with-fixtures No-Regression
Multi-target (copilot+claude+cursor) replay matches all installed targets test_drift_check.py::test_drift_multitarget_* integration-with-fixtures Reproducibility
--no-drift opt-out skips drift entirely test_drift_check.py::test_no_drift_flag_skips_drift_check integration-with-fixtures Ergonomics
--no-drift mutex with --strip/--file rejects via UsageError test_drift_check.py::test_no_drift_mutex_with_strip / test_no_drift_mutex_with_file integration-with-fixtures Ergonomics
Drift findings cause apm audit --ci exit code 1 test_drift_check.py::test_drift_ci_exit_code_one integration-with-fixtures CI-Gate
Bare apm audit reports drift but exits 0 (advisory) test_drift_check.py::test_drift_bare_audit_exits_zero integration-with-fixtures CI-Gate
JSON output schema stable: top-level drift key test_drift_check_e2e.py::test_json_output_drift_key_stable e2e Schema-Stability
SARIF output schema stable: apm/drift/<kind> rules test_drift_check_e2e.py::test_sarif_output_drift_rules_stable e2e Schema-Stability
No-write contract: audit run with drift never mutates project / lockfile / apm_modules/ test_drift_check_e2e.py::test_no_write_contract_mtime_and_hash_unchanged e2e Read-Only-Guarantee
Air-gap proof: drift replay does not touch the network test_drift_check_e2e.py::test_drift_replay_air_gap_no_network_calls e2e Cache-Only
Full install→mutate→audit loop catches drift test_drift_check_e2e.py::test_full_install_mutate_audit_loop e2e End-to-End
30s smoke: full audit on a real fixture under 30s test_drift_check_e2e.py::test_30s_smoke_full_audit_under_threshold e2e Performance-Smoke
Performance ceiling: 100-primitive replay+diff under 5s test_drift_perf.py::test_drift_replay_100_primitives_under_5s unit Performance-Ceiling
Full integration-test list (32 + 10 + 1 = 43 new tests)
tests/integration/test_drift_check.py
  Section A — drift cases (9)
  Section B — past-PR regressions (4)
  Section C — edges (no/corrupt lockfile, untracked governed, no-write, idempotency) (7)
  Section D — multi-target (3)
  Section E — default-on / --no-drift opt-out / mutex / exit codes (9)

tests/integration/test_drift_check_e2e.py
  full install→mutate→audit, no-write, air-gap, JSON/SARIF stability,
  30s smoke (10)

tests/unit/install/test_drift_perf.py
  100-primitive smoke (1)

The CI integration script (scripts/test-integration.sh) was extended with two new pytest blocks in run_e2e_tests() so both suites run in the e2e CI job.

How to test

  • Pull this branch, run uv sync --all-extras --dev
  • Run the focused suite: uv run --extra dev pytest tests/unit/install/test_drift.py tests/unit/install/test_drift_perf.py tests/integration/test_drift_check.py tests/integration/test_drift_check_e2e.py -v
  • In a real consumer repo (one with an apm.yml), edit any file under .github/instructions/ and run apm audit — expect a modified finding
  • Run apm audit --no-drift — drift section absent, lockfile checks still run
  • Run apm audit --ci -f json — JSON has top-level drift key, parses cleanly, exit code is 1 if drift exists

Note

The .github/workflows/ci.yml legacy bash drift check is intentionally kept for one release cycle as defense-in-depth until apm-action picks up a CLI version that includes this PR.

Closes #1071. Supersedes the lockfile-only scope of #898.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

Daniel Meppiel and others added 4 commits May 4, 2026 22:53
- Add _ReadOnlyProjectGuard context manager (utils/guards.py): snapshots
  stat of protected paths, raises ProtectedPathMutationError on any
  mutation. Defense-in-depth above the scratch-root remap.
- Add CATEGORY_DRIFT + drift() recording method to DiagnosticCollector.
- Add drift_count property and _render_drift_group renderer that groups
  by kind (modified/unintegrated/orphaned) with stable section header
  for machine consumers.
- Tests: 7 unit tests covering happy path, mutation, creation, deletion,
  missing-tolerated, exception-not-masked, single-file protected path.

Refs #1071. Phase A of WIP/drift/06-final-plan.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implements the drift detection feature per WIP/drift/06-final-plan.md
(closes #1071 scope alignment with #898).

Engine (Phase B):
- src/apm_cli/install/drift.py: ReplayConfig, DriftFinding, CheckLogger,
  CacheMissError, normalization helpers (build-id strip, line endings,
  BOM), run_replay() (cache-only), diff_scratch_against_project(),
  text/json/sarif renderers, atexit scratch cleanup.
- src/apm_cli/install/services.py: scratch_root kwarg with
  ensure_path_within defense-in-depth guard for replay isolation.
- src/apm_cli/policy/ci_checks.py: _check_drift() wrapper returning
  (CheckResult, list[DriftFinding]); graceful CacheMissError handling.

CLI surface (Phase C):
- src/apm_cli/commands/audit.py: --no-drift opt-out flag with mutex
  against --strip/--file via UsageError. Drift wired into both
  _audit_ci_gate (--ci) and _audit_content_scan (bare project audit)
  paths, default-on per ADR-02. JSON/SARIF/text renderers integrated;
  --no-drift warning gated to text mode (stdout cleanliness).

Tests:
- tests/unit/install/test_drift.py: 13 unit tests (normalization,
  diff cases, renderers).
- Legacy --ci tests opt out of drift via batch --no-drift injection
  (fixture parity, not a behavior change).

7597 unit tests pass; lint clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implements the locked test matrix for issue #1071 drift
detection. Floor of 43 tests across three new files closes the
'ULTRA HARDENING OF HELL' coverage requirement.

New files:
- tests/integration/test_drift_check.py (32 tests):
  * Section A: 9 drift cases (modified/unintegrated/orphaned + CRLF/
    BOM/Build-ID false-positive guards)
  * Section B: 4 past-PR regressions (#1067, #882, #889, source-deleted)
  * Section C: 7 edges (no/corrupt lockfile, untracked governed,
    no-write contract, idempotency)
  * Section D: 3 multi-target (copilot/claude/cursor)
  * Section E: 9 default-on / --no-drift opt-out (mutex, stderr
    routing, JSON suppression)
- tests/integration/test_drift_check_e2e.py (10 tests):
  full install->mutate->audit loop with mix_stderr=False, air-gap
  proof, JSON/SARIF stability, 30s smoke
- tests/unit/install/test_drift_perf.py (1 test):
  100 primitives replay+diff under 5s

Engine fix surfaced by tests:
- src/apm_cli/install/drift.py: run_replay now reads apm.yml's target
  field via parse_target_field and passes it to resolve_targets.
  Without this, multi-target projects (copilot+claude+cursor) replayed
  only the auto-detected primary target, falsely reporting secondary
  target deployments as orphaned. Helper _read_apm_yml_target() added.

CI wiring:
- scripts/test-integration.sh: two new blocks in run_e2e_tests()
  invoking the integration + e2e suites before the final success log.
  Both safe to run without GITHUB_APM_PAT (cache-only, mocked network).

Verification: 56 drift-domain tests pass; full repo lint clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- CHANGELOG.md: Added [Unreleased] entry under Added describing the
  default-on drift detection in apm audit, the three failure modes it
  catches, false-positive guards, --no-drift opt-out + mutex semantics,
  and the JSON/SARIF integration shape. Closes #1071, supersedes #898.
- docs/src/content/docs/guides/drift-detection.md (NEW, sidebar order 7):
  Full user-facing guide -- what drift means, how the cache-only replay
  works (with mermaid diagram), exit-code matrix, when to use --no-drift,
  output formats, and the CI single-line gate that replaces the legacy
  git status --porcelain script.
- packages/apm-guide/.apm/skills/apm-usage/commands.md: Extended the
  audit row with --no-drift flag and added a paragraph documenting the
  drift-by-default behavior, three failure modes, false-positive
  normalization, and JSON/SARIF integration. Aligns the skill that
  ships in apm-guide with the new CLI surface (per
  apm-keep-docs-up-to-date.instructions.md rule 4).
- .github/workflows/ci.yml: Annotated Gate B (legacy bash drift check)
  with a comment marking it redundant once apm-action ships a CLI with
  default-on drift detection (this PR's release). Kept as
  defense-in-depth fallback until then.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel
Copy link
Copy Markdown
Collaborator Author

APM Review Panel: ship_with_followups

Default-on drift detection is a strong positioning move; ship after adding the recovery hint and syncing stale docs -- zero blockers, clean severity discipline.

The panel converged cleanly: zero blocking findings across eight specialists, strong approval of the replay-engine architecture, and consistent praise for the read-only/cache-only safety boundary. The highest-signal finding came from two independent personas (cli-logging-expert AND devx-ux-expert) identifying the same gap: drift output tells users WHAT drifted but never HOW to fix it. This violates APM's 'Include the fix' messaging rule and npm-audit's precedent. A one-line append ('Run: apm install') is a trivial fix that should land in this PR before merge -- it's the difference between a diagnostic and a product.

The supply-chain-security-expert raised a legitimate concern: replay trusts cache content without verifying it matches lockfile resolved_commit. The current defense (NotImplementedError on network replay + cache populated only by prior verified installs) is pragmatic for v1, but shared CI caches and mounted volumes are a real threat vector. This warrants a tracked follow-up issue, not a gate. The test-coverage-expert's missing inverse-normalization test (outcome: missing, unit tier) is worth noting: the safety invariant 'normalization does NOT mask real drift' has no automated guard. This is a regression-trap gap on a secure-by-default surface -- elevate above opinion-tier nits.

Doc-sync debt is the other consistent theme: cli-commands.md, ci-cd.md (still recommending the bash workaround this PR supersedes), and quick-start.md all need updates. The doc-writer and devx-ux-expert converge here. These are cheap fixes that should land in-PR to avoid shipping a feature whose docs contradict themselves on day one.

Dissent. No substantive dissent. Python-architect's scratch_root seam concern and supply-chain's atexit fragility point at the same underlying trade-off (temp-dir lifecycle); both agree it's recommended-tier, not blocking. Auth-expert correctly self-deactivated -- no auth surface touched.

Aligned with: secure-by-default (default-on, cache-only, no credential surface), governed-by-policy (SARIF + --ci exit code), portable-by-manifest (lockfile resolved_commit as source of truth), pragmatic-as-npm (one command, zero config, opt-out via --no-drift).

Growth signal. Drift detection opens a new pillar under 'lockfile-grade reproducibility'. The launch beat writes itself: 'apm audit catches what your CI bash script misses.' Before/after story (4 lines of fragile bash vs. one command with SARIF for free) is strong social-thread material. Ensure cross-links land so the guide isn't an orphan page with zero internal-link equity.

Panel summary

Persona B R N Takeaway
Python Architect 0 3 2 Solid replay-engine design with frozen dataclasses and defense-in-depth guards; scratch_root kwarg is a pragmatic seam but warrants a follow-up to formalize as a deploy-target factory.
CLI Logging Expert 0 1 3 Drift output follows ASCII/stderr conventions well; missing actionable fix hint violates 'Include the fix' rule; verbose scratch-path disclosure absent.
DevX UX Expert 0 2 2 Missing recovery hint in drift output and missing cli-commands.md update; otherwise the surface design is sound.
Supply Chain Security Expert 0 3 2 Cache-integrity verification absent from replay path; scratch containment is sound; air-gap test blocks subprocesses but not socket-level egress; atexit cleanup is SIGTERM-fragile on shared CI runners.
OSS Growth Hacker 0 3 2 Strong launch-beat story; needs cross-links and a 60-second demo path to avoid the new guide being an island.
Doc Writer 0 5 2 Drift guide is well-written; ship after fixing two stale companion docs and one self-contradictory section. CHANGELOG bullet violates Keep-a-Changelog one-line rule.
Test Coverage Expert 0 2 0 42 integration+e2e tests cover all three drift kinds, no-write contract, air-gap, SARIF/JSON stability, and --no-drift mutex. Two gaps: no test that normalization guards do NOT mask real drift, and CacheMissError path untested at unit tier.

B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.

Top 5 follow-ups

  1. [DevX UX Expert] Add recovery hint to drift output: 'Run: apm install' after each drift group -- Two-persona convergence (cli-logging + devx-ux). Violates 'Include the fix' rule. One-line fix that converts a diagnostic into a product moment. Should land in-PR.
  2. [Doc Writer] Update cli-commands.md (--no-drift flag), ci-cd.md (remove obsolete bash workaround), and quick-start.md (stale cross-ref) -- Three stale doc surfaces contradict the new behavior. Shipping without sync means day-one user confusion. Cheap fixes, same PR.
  3. [Supply Chain Security Expert] Track follow-up issue: verify cache content matches lockfile resolved_commit before replay trusts it -- Shared CI caches and mounted volumes are a real poisoning vector. Current defense (prior-install trust) is pragmatic for v1 but the integrity primitive (commit-SHA pinning) is bypassed at replay time.
  4. [Test Coverage Expert] Add inverse-normalization unit test asserting real content drift is NOT suppressed by BOM/CRLF/Build-ID guards -- Evidence outcome=missing on a secure-by-default safety invariant. A normalization bug could silently mask real drift with no automated guard catching the regression.
  5. [OSS Growth Hacker] Add inbound cross-links from ci-policy-setup, governance-guide, and making-the-case to the new drift-detection guide -- Guide currently has zero internal-link equity -- it's discoverable only via sidebar. Cross-links from high-traffic pages drive adoption of the new capability.

Architecture

classDiagram
    direction LR
    class ReplayConfig {
        <<ValueObject>>
        +project_root Path
        +lockfile_path Path
        +targets frozenset~str~ | None
        +cache_only bool
    }
    class DriftFinding {
        <<ValueObject>>
        +path str
        +kind str
        +package str
        +inline_diff str
    }
    class CacheMissError {
        <<Exception>>
    }
    class CheckLogger {
        <<Base+Subclass>>
        +replay_start()
        +diff_start()
        +replay_complete(n)
        +clean()
        +findings(n)
    }
    class CommandLogger {
        <<Base>>
        +verbose bool
        +error(msg)
        +warning(msg)
        +success(msg)
    }
    class _ReadOnlyProjectGuard {
        <<ContextManager>>
        +project_root Path
        +protected_roots list~Path~
        +__enter__()
        +__exit__()
    }
    class ProtectedPathMutationError {
        <<Exception>>
    }
    class DiagnosticCollector {
        <<Collect-then-render>>
        +drift(msg, path)
        +drift_count int
    }
    class integrate_package_primitives {
        <<IOBoundary>>
        +scratch_root Path | None
    }
    CheckLogger --|> CommandLogger : extends
    _ReadOnlyProjectGuard ..> ProtectedPathMutationError : raises
    class ReplayConfig:::touched
    class DriftFinding:::touched
    class CacheMissError:::touched
    class CheckLogger:::touched
    class _ReadOnlyProjectGuard:::touched
    class ProtectedPathMutationError:::touched
    class integrate_package_primitives:::touched
    classDef touched fill:#fff3b0,stroke:#d47600
Loading
flowchart TD
    A["apm audit (commands/audit.py)"] --> B{--ci flag?}
    B -- yes --> C["_audit_ci_gate()"]
    B -- no --> D["_audit_content_scan()"]
    C --> E{"--no-drift?"}
    D --> F{"--no-drift AND no --file/--strip/--package?"}
    E -- no --> G["[NET] _check_drift(project_root, lockfile)"]
    F -- no --> G
    E -- yes --> H["skip drift, warn on stderr"]
    F -- yes --> H
    G --> I["run_replay(ReplayConfig, CheckLogger)"]
    I --> J["[FS] _make_scratch_root() tempfile.mkdtemp + atexit"]
    J --> K["_assert_scratch_bound(project, scratch)"]
    K --> L{"for lock_dep in lockfile"}
    L --> M["[FS] _materialize_install_path() resolve cache path"]
    M --> N["_build_package_info() loads apm.yml if present"]
    N --> O["[FS] integrate_package_primitives(scratch_root=scratch_root)"]
    O --> P["[FS] ensure_path_within(project_root, scratch_root)"]
    P --> L
    L -- done --> Q["diff_scratch_against_project()"]
    Q --> R["[FS] _walk_managed(scratch, governed_roots)"]
    Q --> S["[FS] _walk_managed(project, governed_roots)"]
    R --> T["_normalize() on each file pair (BOM + CRLF + Build-ID strip)"]
    S --> T
    T --> U["emit list~DriftFinding~"]
    U --> V{"format?"}
    V -- text --> W["render_drift_text()"]
    V -- json --> X["render_drift_json()"]
    V -- sarif --> Y["render_drift_sarif()"]
Loading
sequenceDiagram
    participant User
    participant AuditCmd as commands/audit.py
    participant CIChecks as policy/ci_checks.py
    participant Drift as install/drift.py
    participant Services as install/services.py
    participant FS as Filesystem

    User->>AuditCmd: apm audit [--ci]
    AuditCmd->>CIChecks: _check_drift(project_root, lockfile)
    CIChecks->>Drift: run_replay(ReplayConfig, CheckLogger)
    Drift->>FS: mkdtemp (scratch dir)
    Drift->>Drift: _assert_scratch_bound(project, scratch)
    loop each locked dependency
        Drift->>FS: _materialize_install_path (cache lookup)
        Drift->>Services: integrate_package_primitives(scratch_root=scratch)
        Services->>Services: ensure_path_within(project_root, scratch_root)
        Services->>FS: write primitives to scratch
    end
    CIChecks->>Drift: diff_scratch_against_project(scratch, project, lockfile)
    Drift->>FS: _walk_managed(scratch) + _walk_managed(project)
    Drift->>Drift: _normalize() each pair
    Drift-->>CIChecks: list[DriftFinding]
    CIChecks-->>AuditCmd: (CheckResult, findings)
    AuditCmd->>User: render_drift(findings, format)
Loading

Recommendation

Land the recovery hint and doc-sync fixes in this PR (both are trivial, same-day work), then merge. Track cache-integrity verification and inverse-normalization test as follow-up issues. The feature is architecturally sound, the test suite is strong (43 new tests), and the positioning value is high. No reason to delay beyond the in-PR polish pass.


Full per-persona findings

Python Architect

  • [recommended] scratch_root kwarg on integrate_package_primitives leaks test-double / audit concerns into the production API surface at src/apm_cli/install/services.py:96
    Adding scratch_root to the production install function's signature couples a drift-audit implementation detail into the core pipeline. Cleaner seam: DeployTarget protocol injected by callers.
  • [recommended] Normalization helpers are drift.py-private but will be needed by compile, install verify, and future audit modes at src/apm_cli/install/drift.py:96
    _strip_build_id, _normalize_line_endings, _strip_bom belong in utils/normalization.py with a public API.
  • [recommended] _read_apm_yml_target couples drift module to manifest format at src/apm_cli/install/drift.py:334
    Tight coupling to apm.yml schema; pragmatic today, but a 'load project config' helper is the right seam.
  • [nit] render_drift_json return type annotation could be more precise
    TypedDict or dict[str, list[dict[str, str]]] gives downstream callers type-safe access.
  • [nit] atexit.register may leave temp dirs on SIGKILL
    Use tempfile.TemporaryDirectory context manager pattern for both atexit AND explicit cleanup.

CLI Logging Expert

  • [recommended] render_drift_text and _render_drift_group omit an actionable remediation hint telling the user how to fix detected drift at src/apm_cli/install/drift.py:640
    Message Writing Rule Add ARM64 Linux support to CI/CD pipeline #4 'Include the fix'. Drift findings show WHAT but never HOW. Append 'Run apm install to re-sync deployed files with the lockfile.'
  • [nit] CheckLogger emits stderr progress unconditionally -- no quiet-mode or non-TTY suppression path
  • [nit] _inline_diff_for never produces an actual diff for sub-cap files; docstring is misleading
  • [nit] Verbose mode discloses tracemalloc peak but not scratch tmpdir path -- agent debugging is harder

DevX UX Expert

  • [recommended] render_drift_text never tells the user HOW to fix drift -- the single remediation command is missing at src/apm_cli/install/drift.py
    Failure mode is the product. npm audit prints 'run npm audit fix'; APM should print 'Run: apm install'.
  • [recommended] docs/src/content/docs/reference/cli-commands.md not updated -- PR is incomplete per doc-sync Rule 4 at docs/src/content/docs/reference/cli-commands.md
    Persona scope: 'If a CLI change is not reflected in cli-commands.md in the same PR, that change is incomplete by definition.' --no-drift not in flags table.
  • [nit] --no-drift mutex error could name the reason more concretely
  • [nit] CHANGELOG entry is a single monolithic paragraph -- harder to scan than bullet points

Supply Chain Security Expert

  • [recommended] Drift replay trusts apm_modules cache without verifying content matches lockfile resolved_commit at src/apm_cli/install/drift.py:231
    _materialize_install_path checks only EXISTS, never that content/git HEAD matches lockfile.resolved_commit. A poisoned cache (CI runners with shared caches, mounted volumes) would produce false-clean. Security model says commit-SHA pinning is the integrity primitive; replay bypasses it.
  • [recommended] Air-gap test blocks subprocess binaries (gh/curl/wget) but not socket-level Python network calls at tests/integration/test_drift_check_e2e.py:84
    Patches subprocess.run/Popen but not socket.socket. Future requests/urllib egress would not be caught.
  • [recommended] atexit-based scratch cleanup is not SIGTERM/SIGKILL safe; sensitive deployed content may persist on shared CI runners at src/apm_cli/install/drift.py:142
    Python atexit doesn't fire on SIGKILL; CI cancellation = SIGTERM+SIGKILL. Scratch /tmp persists with replayed primitives.
  • [nit] inline_diff field could leak secrets if expanded -- add SECURITY comment
  • [nit] Positive finding: drift replay correctly avoids credential paths (zero token references; NotImplementedError on cache_only=False)

OSS Growth Hacker

  • [recommended] Drift-detection guide is an island -- no inbound links from ci-policy-setup, governance-guide, or making-the-case
    Guide links OUT but nothing links IN. Zero internal-link equity.
  • [recommended] CHANGELOG entry buries the hook in implementation detail -- rewrite lead sentence as the one-liner users can repost
  • [recommended] No 60-second demo path for adopters -- the guide explains but never invites the reader to try
  • [nit] README 'Content security' bullet mentions audit but not drift -- missed hook
  • [nit] Guide does not name the comparable ecosystem tools (npm audit, terraform plan -detailed-exitcode)

Auth Expert -- inactive

PR does not touch any auth-relevant file; drift replay is cache-only with no downloader/clone/token surface -- verified via grep + drift.py read. Network-enabled replay is explicitly gated by NotImplementedError.

Doc Writer

  • [recommended] integrations/ci-cd.md still documents the bash git-status workaround that this PR explicitly supersedes at docs/src/content/docs/integrations/ci-cd.md:61
    Lines 61-79 still recommend the legacy pattern under 'Verify Deployed Primitives'; 'We dogfood this' callout still cites the old ci.yml. Most important doc-sync gap on the PR.
  • [recommended] reference/cli-commands.md does not document --no-drift and still describes --ci as 7-check-only at docs/src/content/docs/reference/cli-commands.md:462
  • [recommended] quick-start.md sends new users to the now-obsolete drift section in ci-cd.md at docs/src/content/docs/getting-started/quick-start.md:159
  • [recommended] drift-detection.md case Integrate copilot runtime #2 in 'When to use --no-drift' contradicts the CLI behavior described two paragraphs later at docs/src/content/docs/guides/drift-detection.md:67
    Lines 67-69 list 'Strip-mode invocations' as a reason to use --no-drift, but audit.py auto-skips drift in strip/file mode AND raises UsageError if combined. Self-contradiction.
  • [recommended] CHANGELOG entry violates Keep-a-Changelog 'one line per PR' rule
  • [nit] New page is not back-linked from ci-policy-setup.md or governance-guide.md
  • [nit] sidebar order: 7 collides with dev-only-primitives.md and org-packages.md

Test Coverage Expert

  • [recommended] No test asserts normalization guards (BOM/CRLF/Build-ID) do NOT mask real content drift at tests/unit/install/test_drift.py
    Tests prove cosmetic-only changes are suppressed. None asserts the inverse -- the safety invariant. A future normalization bug could silently mask real drift.
    Proof (missing at): tests/unit/install/test_drift.py -- proves: Normalization guards suppress only cosmetic changes, never real drift [secure-by-default,governed-by-policy]
  • [recommended] CacheMissError raised by run_replay has no direct unit test exercising the raise paths at tests/unit/install/test_drift.py
    Defined and raised in 5 locations; only command-layer swallowing is tested. No direct pytest.raises assertion.
    Proof (missing at): tests/unit/install/test_drift.py -- proves: Cache-miss paths in drift replay raise CacheMissError so the audit command can catch and handle gracefully [devx,governed-by-policy]

This panel is advisory. It does not block merge. Re-apply the
panel-review label after addressing feedback to re-run.

CEO panel recommended landing two in-PR follow-ups before merge:

1. Recovery hint in drift output (cli-logging + devx-ux convergence):
   render_drift_text now appends '[i] Run apm install to re-sync
   deployed files with the lockfile.' so users see WHAT and HOW in one
   message. Honors Message Writing Rule #4 'Include the fix'.

2. Doc-sync (doc-writer + devx-ux convergence):
   - reference/cli-commands.md: add --no-drift to audit options table;
     amend --ci description to mention drift contribution.
   - integrations/ci-cd.md: replace bash 'git status --porcelain'
     workaround under 'Verify Deployed Primitives' with 'apm audit --ci'
     one-liner; update 'We dogfood this' callout text.
   - getting-started/quick-start.md: retarget stale cross-ref from the
     now-superseded ci-cd anchor to the new drift-detection guide.
   - guides/drift-detection.md: drop the self-contradictory case #2 in
     'When to use --no-drift' (strip-mode is auto-skipped, not opt-out).
   - CHANGELOG.md: compress verbose entry to one Keep-a-Changelog line
     pointing readers to the guide for detail.

Tracked as follow-up issues (CEO call):
- supply-chain: verify cache content matches lockfile resolved_commit
  before drift replay trusts it (commit-SHA pinning bypass on shared
  CI caches).
- test-coverage: inverse-normalization unit test asserting BOM/CRLF/
  Build-ID guards do NOT mask real content drift (safety invariant).

Lint clean. 45 drift tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel danielmeppiel marked this pull request as ready for review May 4, 2026 21:53
Copilot AI review requested due to automatic review settings May 4, 2026 21:53
@danielmeppiel
Copy link
Copy Markdown
Collaborator Author

Addressed in-PR (commit 7907865)

Per the CEO panel recommendation, landed the two convergent in-PR follow-ups:

1. Recovery hint (cli-logging + devx-ux):
render_drift_text now appends [i] Run 'apm install' to re-sync deployed files with the lockfile. -- the diagnostic now tells users WHAT drifted AND HOW to fix it in one message. Honors Message Writing Rule #4 ('Include the fix').

2. Doc-sync (doc-writer + devx-ux):

  • reference/cli-commands.md -- added --no-drift to audit options; amended --ci to mention drift contribution.
  • integrations/ci-cd.md -- replaced the now-superseded bash git status --porcelain workaround with apm audit --ci; updated 'We dogfood this' callout.
  • getting-started/quick-start.md -- retargeted stale cross-ref to the new drift-detection guide.
  • guides/drift-detection.md -- removed the self-contradictory case Integrate copilot runtime #2 in 'When to use --no-drift' (strip-mode is auto-skipped, not opt-out).
  • CHANGELOG.md -- compressed to one Keep-a-Changelog line pointing readers to the guide.

All 45 drift tests pass; lint clean.

Tracked as follow-up issues (CEO call)

  • supply-chain-security: verify cache content matches lockfile resolved_commit before drift replay trusts it (commit-SHA pinning bypass on shared CI caches with mounted volumes). Will file as a separate issue.
  • test-coverage: inverse-normalization unit test asserting BOM/CRLF/Build-ID guards do NOT mask real content drift (safety invariant on a secure-by-default surface). Will file as a separate issue.
  • growth: inbound cross-links from ci-policy-setup, governance-guide, making-the-case to the drift-detection guide (internal-link equity). Cheap follow-up PR.
  • architecture: scratch_root kwarg formalization as a DeployTarget protocol; lift normalization helpers to utils/normalization.py for sharing with compile/verify pipelines.
  • logging: TTY/--quiet awareness in CheckLogger; verbose scratch-path disclosure.

Status

PR is ready for review. CEO stance: ship_with_followups.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends apm audit to perform integration drift detection by default by replaying the install/integration pipeline into a scratch directory (cache-only) and diffing the result against the working tree. It adds a new drift engine, wires drift into text/JSON/SARIF outputs, updates CI/docs, and introduces extensive unit/integration/e2e coverage.

Changes:

  • Add a cache-only “install replay + diff” drift engine with renderers (text/JSON/SARIF) and normalization guards (Build ID / CRLF / BOM).
  • Wire drift detection into apm audit (default-on) with --no-drift opt-out, and propagate findings into CI JSON + SARIF payloads.
  • Add read-only defensive guard + substantial test coverage; update CI scripts and docs to reflect the new drift surface.
Show a summary per file
File Description
uv.lock Bumps apm-cli version to 0.12.1.
src/apm_cli/install/drift.py New drift replay + diff engine with normalization + renderers.
src/apm_cli/install/services.py Adds scratch_root kwarg + safety guard for replayed integrations.
src/apm_cli/utils/guards.py New read-only project-tree guard used as defense-in-depth for drift.
src/apm_cli/utils/diagnostics.py Adds drift diagnostic category + rendering.
src/apm_cli/policy/ci_checks.py Adds _check_drift() to run replay+diff and return findings.
src/apm_cli/commands/audit.py Adds --no-drift, runs drift by default, integrates drift into JSON/SARIF/text output paths.
tests/unit/utils/test_guards.py Unit coverage for the read-only guard behavior (create/modify/delete).
tests/unit/utils/init.py Ensures tests.unit.utils is a package for imports/discovery.
tests/unit/test_audit_policy_command.py Updates audit policy unit tests to disable drift (stabilize expectations).
tests/unit/test_audit_ci_command.py Updates CI-mode audit unit tests to disable drift (stabilize expectations).
tests/unit/test_audit_ci_auto_discovery.py Updates auto-discovery unit tests to disable drift.
tests/unit/install/test_drift.py Unit tests for drift normalization, diff kinds, SARIF rule IDs, and stderr-only phase logging.
tests/unit/install/test_drift_perf.py Performance smoke test for replay+diff under a 5s budget for 100 primitives.
tests/integration/test_drift_check.py Integration tests covering drift kinds, edges, multi-target behavior, and --no-drift mutex.
tests/integration/test_drift_check_e2e.py E2E tests for no-write contract, air-gap behavior, performance smoke, and JSON/SARIF shape stability.
scripts/test-integration.sh Ensures new drift integration/e2e suites run in CI integration job.
packages/apm-guide/.apm/skills/apm-usage/commands.md Updates apm-guide command reference for drift default + --no-drift.
docs/src/content/docs/reference/cli-commands.md Documents --no-drift and drift behavior in apm audit --ci.
docs/src/content/docs/integrations/ci-cd.md Replaces legacy git status drift gate with apm audit --ci.
docs/src/content/docs/guides/drift-detection.md New/updated guide describing drift behavior, formats, and CI usage.
docs/src/content/docs/getting-started/quick-start.md Updates link/reference to drift checking guidance.
CHANGELOG.md Adds Unreleased entry for default-on drift detection.
.github/workflows/ci.yml Notes legacy bash drift step as redundant once apm-action picks up this CLI version.

Copilot's findings

Comments suppressed due to low confidence (2)

src/apm_cli/policy/ci_checks.py:470

  • _check_drift() resolves targets via resolve_targets(project_root) (auto-detect). If a project declares target: in apm.yml but the corresponding deployment dirs don't exist yet, auto-detect can omit those targets and the diff will miss drift (e.g., unintegrated outputs under .claude/, .cursor/, etc.). Consider resolving targets with the same explicit target selection used by run_replay() (read from apm.yml) so governed roots are complete even when dirs are absent.
    logger.diff_start()
    resolved_targets = resolve_targets(project_root)
    if targets:
        resolved_targets = [t for t in resolved_targets if t.name in set(targets)]
    findings = diff_scratch_against_project(scratch, project_root, lockfile, resolved_targets)

docs/src/content/docs/guides/drift-detection.md:94

  • The JSON example shows drift as a list, but the implementation/tests treat the top-level drift key as an object containing a drift list (i.e., payload['drift']['drift']). Please update this example (or adjust the implementation) so the documented schema matches the actual output shape.
**JSON** -- the audit report gains a top-level `drift` key:

```json
{
  "report_format_version": "1.0",
  "checks": [...],
  "drift": [
    {
      "path": ".github/instructions/foo.md",
      "kind": "modified",
      "package": "<local>",
      "inline_diff": "..."
    }
  ]
}
</details>


- **Files reviewed:** 22/24 changed files
- **Comments generated:** 8


Comment on lines +23 to +34
def _snapshot(paths: list[Path]) -> dict[Path, tuple[float, int] | None]:
"""Capture (mtime_ns, size) for each path, or ``None`` if missing.

Symlinks are followed; missing paths record ``None`` so they may
legitimately remain absent without triggering the guard.
"""
snap: dict[Path, tuple[float, int] | None] = {}
for p in paths:
try:
st = p.stat()
snap[p] = (st.st_mtime_ns, st.st_size)
except FileNotFoundError:
passed=False,
message=(
f"drift replay aborted: {exc} -- "
"run 'apm install' first or use --no-cache (not yet supported)"
Comment on lines +406 to +413
def _check_drift(
project_root: Path,
lockfile,
targets=None,
cache_only: bool = True,
verbose: bool = False,
) -> tuple[CheckResult, list]:
"""Replay the install in a scratch dir and diff against the project.
for stream_name, stream in (("stdout", result.stdout), ("stderr", result.stderr)):
text = stream or ""
for ch in text:
assert ord(ch) < 128 or ch in {"\u2500", "\u2501"} or ord(ch) > 0xFFFF or True, (
# Drift severities: kinds of divergence from the lockfile-defined state.
DRIFT_MODIFIED = "modified" # tracked file content changed
DRIFT_UNINTEGRATED = "unintegrated" # tracked file missing from project
DRIFT_ORPHANED = "orphaned" # untracked file present in managed dir
Comment thread CHANGELOG.md

### Added

- **`apm audit` now detects integration drift by default.** Read-only cache-only install replay catches missed `apm install` runs, hand-edited deployed files, and orphaned files. Findings exposed in JSON (`drift` key) and SARIF (`apm/drift/<kind>` rules); in `--ci` mode they contribute to the exit code. Opt out with `--no-drift` (mutually exclusive with `--strip`/`--file`). See the [Drift Detection guide](docs/src/content/docs/guides/drift-detection.md) for details. (#1071, supersedes scope of #898)
Comment on lines +53 to +59
| `apm audit` | Reported in stdout | 0 (advisory only) |
| `apm audit --ci` | Reported and counted as failure | 1 |
| `apm audit --no-drift` | Skipped entirely | governed only by other checks |

In `--ci` mode drift findings are pooled with the seven baseline lockfile
checks (`lockfile-exists`, `ref-consistency`, etc.) -- a single
non-zero exit covers all of them.
Comment on lines 665 to +676
@@ -593,6 +670,10 @@ def _audit_content_scan(
all_findings = [f for ff in findings_by_file.values() for f in ff]
exit_code = 1 if ContentScanner.has_critical(all_findings) else 2

# Drift findings escalate exit code to 1 (critical).
if drift_failed and exit_code == 0:
exit_code = 1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fold integration drift gate into 'apm audit --drift' (or 'apm install --check')

2 participants