feat(audit): ContractTrendAnalyzer for cross-session pass-rate regression detection #2
Merged
practicalmind-dev merged 2 commits into PracticalMind:main on Apr 1, 2026
…sion detection

Adds gateframe.audit.trend.ContractTrendAnalyzer which reads an existing JSONL audit log written by JsonFileExporter, groups entries by workflow_id, and computes per-contract OLS pass-rate slopes to detect reliability regressions across workflow runs.

Closes: none (requested in issue PracticalMind#1)

Changes:
- gateframe/audit/trend.py -- ContractTrendAnalyzer, TrendReport, ContractTrend, WorkflowRunSummary
- tests/audit/__init__.py
- tests/audit/test_trend.py -- 19 tests (all passing)

API:

    from pathlib import Path
    from gateframe.audit.trend import ContractTrendAnalyzer

    report = ContractTrendAnalyzer(Path('audit.jsonl'), window=20).analyze()
    if report.any_regression:
        # one or more contracts are degrading
        for ct in report.regressions:
            print(ct.contract_name, ct.slope)

Key design decisions (per issue PracticalMind#1 discussion):
- window=N means last N *workflow_id groups* (not raw events)
- Entries without workflow_id are silently skipped
- Groups ordered by earliest timestamp seen within each workflow_id
- OLS via statistics.linear_regression (stdlib, zero new deps)
- Strictly additive -- no changes to AuditLog, JsonFileExporter, or CLI

Reference: PDR in Production v2.5 -- DOI 10.5281/zenodo.19362461
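The grouping and windowing decisions listed in the commit message can be illustrated with a small standalone sketch. This is not the code in gateframe/audit/trend.py: the function names `read_entries` and `last_n_runs` are hypothetical, the `timestamp` field name and its sortable type are assumptions about the JSONL schema, and skipping malformed lines or treating a missing file as empty is one plausible policy rather than confirmed behaviour.

```python
import json
from collections import defaultdict
from pathlib import Path


def read_entries(path):
    """Read a JSONL file into a list of dicts, skipping blank or malformed lines."""
    path = Path(path)
    if not path.exists():
        return []  # assumption: a missing log is treated as "no data"
    entries = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # assumption: malformed lines are skipped, not fatal
            if isinstance(record, dict):
                entries.append(record)
    return entries


def last_n_runs(entries, window):
    """Group entries by workflow_id and return the last `window` runs, oldest first."""
    if window <= 0:
        return []
    groups = defaultdict(list)
    for entry in entries:
        workflow_id = entry.get("workflow_id")
        if workflow_id is None:
            continue  # entries without workflow_id are skipped
        groups[workflow_id].append(entry)
    # Order runs by the earliest timestamp seen inside each group
    # (assumes every grouped entry carries a comparable "timestamp").
    ordered = sorted(groups.values(), key=lambda run: min(e["timestamp"] for e in run))
    return ordered[-window:]
```

Note that the window is applied after grouping, so a run that logs many events counts once, which is the difference between "last N workflow_id groups" and "last N raw events".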
CI flagged a few linting issues, mostly import ordering and unused imports. Should be quick to fix:

    ruff check --fix .

That should auto-fix most of them. The remaining ones (SIM210 and the long line) will need a manual touch.
…, wrap long line

- gateframe/audit/trend.py: remove unused Optional import, sort import block
- tests/audit/test_trend.py: sort import block, remove unused TrendReport and WorkflowRunSummary imports, simplify 'True if i < 8 else False' → 'i < 8', wrap 101-char line to satisfy E501

Fixes CI lint failures flagged by ruff check.
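For readers unfamiliar with SIM210, the simplification named in the commit message amounts to dropping a redundant conditional around an expression that is already a bool. The snippet below is only an illustration: the `i < 8` condition comes from the commit message, while the surrounding list comprehension and `range(10)` are hypothetical.

```python
# Before: the conditional expression wraps a comparison that is already a bool.
outcomes_verbose = [True if i < 8 else False for i in range(10)]

# After: the comparison is used directly.
outcomes_simple = [i < 8 for i in range(10)]

# Both forms produce the same list of booleans.
same = outcomes_verbose == outcomes_simple
```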
Fixed in commit 0d37fd9. Ruff passes clean now.
Closes #1
Summary
Adds gateframe.audit.trend.ContractTrendAnalyzer, which reads an existing JSONL audit log written by JsonFileExporter, groups entries by workflow_id, and computes per-contract OLS pass-rate slopes to detect reliability regressions across workflow runs.
Motivation (from issue #1)
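JsonFileExporter already writes an append-mode JSONL audit trail, so the data needed for cross-run trend analysis is already there. The missing layer is grouping by workflow run and computing whether a contract's pass rate is trending up or down over time.
As clarified in the issue thread: window=N means the last N distinct workflow_id groups, not the last N raw events, so the slope reflects run-over-run trend, not intra-run noise.
Usage
ContractTrendAnalyzer(Path('audit.jsonl'), window=20).analyze() returns a report; report.any_regression is true when one or more contracts are degrading, and report.regressions lists them, each with a contract_name and a slope.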
Implementation
- gateframe/audit/trend.py -- ContractTrendAnalyzer, TrendReport, ContractTrend, WorkflowRunSummary
- tests/audit/test_trend.py -- 19 tests, all passing
Design constraints
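- Reads existing JsonFileExporter JSONL output -- no new data format
- Entries without workflow_id are silently skipped (backward compat)
- Groups ordered by earliest timestamp seen within each workflow_id group
- OLS via statistics.linear_regression (stdlib, zero new dependencies)
- Strictly additive -- no changes to AuditLog, JsonFileExporter, AuditEntry, or CLI
Tests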
Test coverage includes: improving/degrading/stable trends, window capping, temporal ordering (by first-seen timestamp), entries without workflow_id, malformed JSONL lines, missing file, multiple contracts, regression threshold boundary.
Where JsonFileExporter answers "what happened in each validation", ContractTrendAnalyzer answers "is this contract getting more or less reliable over time?"
Reference: PDR in Production v2.5