feat(audit): ContractTrendAnalyzer for cross-session pass-rate regression detection by nanookclaw · Pull Request #2 · PracticalMind/gateframe

nanookclaw · 2026-04-01T19:15:11Z

Closes #1

Summary

Adds gateframe.audit.trend.ContractTrendAnalyzer — reads an existing JSONL audit log written by JsonFileExporter, groups entries by workflow_id, and computes per-contract OLS pass-rate slopes to detect reliability regressions across workflow runs.

Motivation (from issue #1)

JsonFileExporter already writes an append-mode JSONL audit trail. The data needed for cross-run trend analysis is already there. The missing layer is grouping by workflow run and computing whether a contract's pass rate is trending up or down over time.

As clarified in the issue thread, window=N means the last N distinct workflow_id groups, not the last N raw events, so the slope reflects the run-over-run trend rather than intra-run noise.
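
Here is a minimal sketch of that windowing rule. It assumes each parsed audit entry is a dict with workflow_id and ISO-8601 timestamp keys. These field names are hypothetical, and the real AuditEntry schema may differ. last_n_runs is an illustrative helper, not part of gateframe. It groups entries by run, orders runs by the earliest timestamp seen in each, and keeps only the most recent N runs, so a single long run counts as one data point.

```python
from collections import defaultdict
from datetime import datetime
from typing import Any

Entry = dict[str, Any]


def last_n_runs(entries: list[Entry], window: int) -> list[tuple[str, list[Entry]]]:
    """Group entries by workflow_id and return the last `window` runs, oldest first.

    Assumed (hypothetical) schema: each entry carries "workflow_id" and an
    ISO-8601 "timestamp" string. Entries with no workflow_id are skipped.
    """
    groups: dict[str, list[Entry]] = defaultdict(list)
    for entry in entries:
        wf = entry.get("workflow_id")
        if wf:  # no (or empty) workflow_id: cannot be assigned to a run
            groups[wf].append(entry)

    def first_seen(item: tuple[str, list[Entry]]) -> datetime:
        # A run is placed in time by its earliest entry, not its latest one.
        return min(datetime.fromisoformat(e["timestamp"]) for e in item[1])

    ordered = sorted(groups.items(), key=first_seen)
    # Slicing with [-0:] would return everything, so handle window <= 0 explicitly.
    return ordered[-window:] if window > 0 else []


events = [
    {"workflow_id": "run-b", "timestamp": "2026-04-01T10:05:00"},
    {"workflow_id": "run-a", "timestamp": "2026-04-01T10:00:00"},
    {"workflow_id": "run-a", "timestamp": "2026-04-01T10:10:00"},
    {"timestamp": "2026-04-01T10:20:00"},  # legacy entry without workflow_id
    {"workflow_id": "run-c", "timestamp": "2026-04-01T10:15:00"},
]
recent = [wf for wf, _ in last_n_runs(events, window=2)]
print(recent)  # ['run-b', 'run-c']
```

run-a sorts first even though its last entry is later than run-b's only entry. This is because ordering uses each group's first-seen timestamp.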

Usage

from pathlib import Path
from gateframe.audit.trend import ContractTrendAnalyzer

report = ContractTrendAnalyzer(Path('audit.jsonl'), window=20).analyze()

if report.any_regression:
    for ct in report.regressions:
        print(f'{ct.contract_name}: slope={ct.slope:.4f} ({ct.direction})')
        for run in ct.run_summaries[-3:]:
            print(f'  {run.workflow_id}: {run.pass_rate:.1%} ({run.passed}/{run.total})')

Implementation

gateframe/audit/trend.py — ContractTrendAnalyzer, TrendReport, ContractTrend, WorkflowRunSummary
tests/audit/test_trend.py — 19 tests, all passing

Design constraints

Reads existing JsonFileExporter JSONL output — no new data format
Entries without workflow_id are silently skipped (backward compat)
Runs ordered by earliest timestamp within each workflow_id group
OLS via statistics.linear_regression (stdlib, zero new dependencies)
Strictly additive — no changes to AuditLog, JsonFileExporter, AuditEntry, or CLI

Tests

python -m pytest tests/audit/test_trend.py -v
# 19 passed in 0.20s

Test coverage includes: improving/degrading/stable trends, window capping, temporal ordering (by first-seen timestamp), entries without workflow_id, malformed JSONL lines, missing file, multiple contracts, regression threshold boundary.
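
Two of those edge cases, malformed JSONL lines and a missing file, come down to how the log is read. The sketch below shows one tolerant approach under stated assumptions. read_jsonl is a hypothetical helper, and returning an empty list for a missing file is a guess at the intended behavior. The sketch skips blank lines, lines that are not valid JSON, and JSON values that are not objects.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def read_jsonl(path: Path) -> list[dict[str, Any]]:
    """Parse an append-mode JSONL log, skipping lines that cannot be used."""
    if not path.exists():
        return []  # missing file: treat as an empty log
    entries: list[dict[str, Any]] = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # blank line
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                continue  # malformed line, e.g. a truncated trailing write
            if isinstance(obj, dict):  # skip valid JSON that is not an object
                entries.append(obj)
    return entries


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "audit.jsonl"
    log.write_text('{"workflow_id": "run-a"}\n{not json\n\n[1, 2]\n', encoding="utf-8")
    parsed = read_jsonl(log)
    missing = read_jsonl(Path(tmp) / "missing.jsonl")

print(parsed)   # [{'workflow_id': 'run-a'}]
print(missing)  # []
```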

Where JsonFileExporter answers "what happened in each validation?", ContractTrendAnalyzer answers "is this contract getting more or less reliable over time?"

Reference: PDR in Production v2.5

…sion detection

Adds gateframe.audit.trend.ContractTrendAnalyzer which reads an existing JSONL audit log written by JsonFileExporter, groups entries by workflow_id, and computes per-contract OLS pass-rate slopes to detect reliability regressions across workflow runs.

Closes: none (requested in issue PracticalMind#1)

Changes:
- gateframe/audit/trend.py -- ContractTrendAnalyzer, TrendReport, ContractTrend, WorkflowRunSummary
- tests/audit/__init__.py
- tests/audit/test_trend.py -- 19 tests (all passing)

API:

from pathlib import Path
from gateframe.audit.trend import ContractTrendAnalyzer

report = ContractTrendAnalyzer(Path('audit.jsonl'), window=20).analyze()
if report.any_regression:  # one or more contracts are degrading
    for ct in report.regressions:
        print(ct.contract_name, ct.slope)

Key design decisions (per issue PracticalMind#1 discussion):
- window=N means last N *workflow_id groups* (not raw events)
- Entries without workflow_id are silently skipped
- Groups ordered by earliest timestamp seen within each workflow_id
- OLS via statistics.linear_regression (stdlib, zero new deps)
- Strictly additive -- no changes to AuditLog, JsonFileExporter, or CLI

Reference: PDR in Production v2.5 -- DOI 10.5281/zenodo.19362461

practicalmind-dev · 2026-04-01T20:04:21Z

CI flagged a few linting issues, mostly import ordering and unused imports. Should be quick to fix:

ruff check --fix .

That should auto-fix most of them. The remaining ones (SIM210 and the long line) will need a manual touch.

…, wrap long line

- gateframe/audit/trend.py: remove unused Optional import, sort import block
- tests/audit/test_trend.py: sort import block, remove unused TrendReport and WorkflowRunSummary imports, simplify 'True if i < 8 else False' → 'i < 8', wrap 101-char line to satisfy E501

Fixes CI lint failures flagged by ruff check.

nanookclaw · 2026-04-01T22:14:15Z

Fixed in commit 0d37fd9. Ruff passes clean now:

Sorted import blocks in both files (I001)
Removed unused Optional import from trend.py (F401)
Removed unused TrendReport and WorkflowRunSummary test imports (F401)
Simplified True if i < 8 else False → i < 8 (SIM210)
Wrapped the 101-char line in test_malformed_lines_skipped (E501)

ruff check . → All checks passed!

practicalmind-dev merged commit 2247a48 into PracticalMind:main Apr 1, 2026
4 of 5 checks passed

practicalmind-dev merged 2 commits into PracticalMind:main from nanookclaw:feat/contract-trend-analyzer
