feat(audit): ContractTrendAnalyzer for cross-session pass-rate regression detection #2

Merged
practicalmind-dev merged 2 commits into PracticalMind:main from
nanookclaw:feat/contract-trend-analyzer
Apr 1, 2026

@nanookclaw
Contributor

Closes #1

Summary

Adds gateframe.audit.trend.ContractTrendAnalyzer — reads an existing JSONL audit log written by JsonFileExporter, groups entries by workflow_id, and computes per-contract OLS pass-rate slopes to detect reliability regressions across workflow runs.

Motivation (from issue #1)

JsonFileExporter already writes an append-mode JSONL audit trail. The data needed for cross-run trend analysis is already there. The missing layer is grouping by workflow run and computing whether a contract's pass rate is trending up or down over time.

As clarified in the issue thread: window=N means last N distinct workflow_id groups, not last N raw events — so the slope reflects run-over-run trend, not intra-run noise.
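
The windowing rule above can be illustrated with a minimal sketch. It is not the code from trend.py: the helper name last_n_runs and the entry fields contract_name, passed, and timestamp are assumptions made for illustration (only workflow_id is named in this PR).

```python
import json
from collections import defaultdict

# Hypothetical audit lines. Only "workflow_id" is named in the PR; the other
# field names are assumptions for this sketch.
LINES = [
    '{"workflow_id": "run-1", "contract_name": "c", "passed": true, "timestamp": "2026-03-01T10:00:00"}',
    '{"workflow_id": "run-1", "contract_name": "c", "passed": false, "timestamp": "2026-03-01T10:01:00"}',
    '{"workflow_id": "run-2", "contract_name": "c", "passed": true, "timestamp": "2026-03-01T11:00:00"}',
    '{"contract_name": "c", "passed": true, "timestamp": "2026-03-01T11:30:00"}',
    '{"workflow_id": "run-3", "contract_name": "c", "passed": true, "timestamp": "2026-03-01T12:00:00"}',
    '{"workflow_id": "run-2", "contract_name": "c", "passed": true, "timestamp": "2026-03-01T12:05:00"}',
]


def last_n_runs(lines, window):
    """Group entries by workflow_id and keep the `window` most recent runs."""
    groups = defaultdict(list)
    for line in lines:
        entry = json.loads(line)
        workflow_id = entry.get("workflow_id")
        if workflow_id is None:
            continue  # entries without workflow_id are skipped
        groups[workflow_id].append(entry)
    # Order runs by the earliest timestamp seen in each group. ISO-8601
    # strings in one consistent format sort chronologically as plain strings.
    ordered = sorted(
        groups.items(),
        key=lambda item: min(e["timestamp"] for e in item[1]),
    )
    return ordered[-window:]


runs = last_n_runs(LINES, window=2)
run_ids = [workflow_id for workflow_id, _ in runs]
```

With window=2 the six raw events collapse to the two most recent runs, so a late event from run-2 does not push run-2 ahead of run-3.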

Usage

from pathlib import Path
from gateframe.audit.trend import ContractTrendAnalyzer

report = ContractTrendAnalyzer(Path('audit.jsonl'), window=20).analyze()

if report.any_regression:
    for ct in report.regressions:
        print(f'{ct.contract_name}: slope={ct.slope:.4f} ({ct.direction})')
        for run in ct.run_summaries[-3:]:
            print(f'  {run.workflow_id}: {run.pass_rate:.1%} ({run.passed}/{run.total})')
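
As a rough illustration of what each run summary in the loop above carries, the following sketch derives passed, total, and pass_rate from a list of outcomes. RunSummarySketch and summarize are hypothetical stand-ins, not the WorkflowRunSummary class from this PR, and the zero-total fallback is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RunSummarySketch:
    """Hypothetical stand-in for a per-run summary of one contract."""

    workflow_id: str
    passed: int
    total: int

    @property
    def pass_rate(self) -> float:
        # Assumption: an empty run reports 0.0 rather than raising.
        return self.passed / self.total if self.total else 0.0


def summarize(workflow_id, outcomes):
    """Build a summary from a list of booleans (True means the check passed)."""
    return RunSummarySketch(workflow_id, sum(outcomes), len(outcomes))


demo = summarize("run-7", [True, True, False, True])
line = f"  {demo.workflow_id}: {demo.pass_rate:.1%} ({demo.passed}/{demo.total})"
```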

Implementation

  • gateframe/audit/trend.py: ContractTrendAnalyzer, TrendReport, ContractTrend, WorkflowRunSummary
  • tests/audit/test_trend.py: 19 tests, all passing

Design constraints

  • Reads existing JsonFileExporter JSONL output — no new data format
  • Entries without workflow_id are silently skipped (backward compat)
  • Runs ordered by earliest timestamp within each workflow_id group
  • OLS via statistics.linear_regression (stdlib, zero new dependencies)
  • Strictly additive — no changes to AuditLog, JsonFileExporter, AuditEntry, or CLI

Tests

python -m pytest tests/audit/test_trend.py -v
# 19 passed in 0.20s

Test coverage includes: improving/degrading/stable trends, window capping, temporal ordering (by first-seen timestamp), entries without workflow_id, malformed JSONL lines, missing file, multiple contracts, regression threshold boundary.
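
For the malformed-line and missing-file cases, a tolerant reader might look like the sketch below. read_audit_entries is a hypothetical helper; returning an empty list for a missing file is an assumption, since the PR only states that the case is tested.

```python
import json
import tempfile
from pathlib import Path


def read_audit_entries(path: Path) -> list[dict]:
    """Read a JSONL file, skipping blank and malformed lines."""
    if not path.exists():
        return []  # assumption: a missing log yields no entries
    entries = []
    with path.open(encoding="utf-8") as handle:
        for raw in handle:
            raw = raw.strip()
            if not raw:
                continue
            try:
                entry = json.loads(raw)
            except json.JSONDecodeError:
                continue  # malformed line: skip it and keep reading
            if isinstance(entry, dict):
                entries.append(entry)
    return entries


# Small demonstration against a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "audit.jsonl"
    log.write_text(
        '{"workflow_id": "run-1", "passed": true}\n'
        "not json at all\n"
        "\n"
        '{"workflow_id": "run-2", "passed": false}\n',
        encoding="utf-8",
    )
    demo_entries = read_audit_entries(log)
    missing_entries = read_audit_entries(Path(tmp) / "missing.jsonl")
```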


Where JsonFileExporter answers "what happened in each validation?", ContractTrendAnalyzer answers "is this contract getting more or less reliable over time?"

Reference: PDR in Production v2.5

…sion detection

Adds gateframe.audit.trend.ContractTrendAnalyzer which reads an existing
JSONL audit log written by JsonFileExporter, groups entries by workflow_id,
and computes per-contract OLS pass-rate slopes to detect reliability
regressions across workflow runs.

Closes: none (requested in issue PracticalMind#1)

Changes:
- gateframe/audit/trend.py         -- ContractTrendAnalyzer, TrendReport,
                                       ContractTrend, WorkflowRunSummary
- tests/audit/__init__.py
- tests/audit/test_trend.py        -- 19 tests (all passing)

API:
    from pathlib import Path
    from gateframe.audit.trend import ContractTrendAnalyzer

    report = ContractTrendAnalyzer(Path('audit.jsonl'), window=20).analyze()
    if report.any_regression:
        # one or more contracts are degrading
        for ct in report.regressions:
            print(ct.contract_name, ct.slope)

Key design decisions (per issue PracticalMind#1 discussion):
- window=N means last N *workflow_id groups* (not raw events)
- Entries without workflow_id are silently skipped
- Groups ordered by earliest timestamp seen within each workflow_id
- OLS via statistics.linear_regression (stdlib, zero new deps)
- Strictly additive -- no changes to AuditLog, JsonFileExporter, or CLI

Reference: PDR in Production v2.5 -- DOI 10.5281/zenodo.19362461
@practicalmind-dev
Contributor

CI flagged a few linting issues, mostly import ordering and unused imports. Should be quick to fix:

ruff check --fix .

That should auto-fix most of them. The remaining ones (SIM210 and the long line) will need a manual touch.

…, wrap long line

- gateframe/audit/trend.py: remove unused Optional import, sort import block
- tests/audit/test_trend.py: sort import block, remove unused TrendReport and
  WorkflowRunSummary imports, simplify 'True if i < 8 else False' → 'i < 8',
  wrap 101-char line to satisfy E501

Fixes CI lint failures flagged by ruff check.
@nanookclaw
Contributor Author

Fixed in commit 0d37fd9. Ruff passes clean now:

  • Sorted import blocks in both files (I001)
  • Removed unused Optional import from trend.py (F401)
  • Removed unused TrendReport and WorkflowRunSummary test imports (F401)
  • Simplified True if i < 8 else False → i < 8 (SIM210)
  • Wrapped the 101-char line in test_malformed_lines_skipped (E501)

ruff check . → All checks passed!
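
The SIM210 fix can be shown in isolation. The list comprehensions below are hypothetical and only mirror the shape of the expression named in the commit, not the actual test code.

```python
# Before: a conditional expression that only returns True or False,
# which ruff reports as SIM210.
flags_before = [True if i < 8 else False for i in range(10)]

# After: the comparison already evaluates to a bool, so use it directly.
flags_after = [i < 8 for i in range(10)]

# Both forms produce the same values.
same = flags_before == flags_after
```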

@practicalmind-dev practicalmind-dev merged commit 2247a48 into PracticalMind:main Apr 1, 2026
4 of 5 checks passed

Development

Successfully merging this pull request may close these issues.

feat(audit): ContractTrendAnalyzer — cross-session pass rate trend detection from JSONL audit log
