Context
Discovered during implementation of arch-check suppression (issue #23 / PR from branch archon/task-feat-issue-23-arch-check-suppression).
Description
The current suppression implementation scans the violation sample (default 10 rows) in Python to determine suppressed_count, then subtracts that from the Cypher COUNT(*) total. This produces an incorrect violation_count whenever the suppression key matches violations outside the sample window.
Example: a policy has 50 violations and the sample shows 10. The user suppresses a key that matches all 50. All 10 sampled rows match, so we report violation_count = 50 - 10 = 40 unsuppressed, but the true answer is 0.
The safety clamp max(0, violation_count - suppressed_count) was added defensively, but the root problem remains: sampling-based counting is imprecise.
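The discrepancy can be reproduced without a database. The following is a minimal sketch, not the actual implementation: the function names, the dict shape of a violation row, and the key value are all hypothetical, and len() stands in for the Cypher COUNT(*) total. With every one of the 50 rows matching the suppressed key, all 10 sampled rows match, so the sample-based count lands on 40 while the exact count is 0.

```python
def sample_based_count(violations, suppressed_keys, sample_size=10):
    """Mimic the described approach: scan only the sample, subtract from the total."""
    total = len(violations)  # stands in for the Cypher COUNT(*) total
    sample = violations[:sample_size]
    suppressed_count = sum(1 for v in sample if v["key"] in suppressed_keys)
    # The defensive clamp prevents a negative result but not an inflated one.
    return max(0, total - suppressed_count)


def exact_count(violations, suppressed_keys):
    """Count unsuppressed violations over the full result set."""
    return sum(1 for v in violations if v["key"] not in suppressed_keys)


# Hypothetical data: 50 violations that all share one suppressible key.
violations = [{"key": "core->ui"} for _ in range(50)]
suppressed = {"core->ui"}

sampled = sample_based_count(violations, suppressed)
exact = exact_count(violations, suppressed)
print(sampled, exact)
```

The clamp never triggers here, because the sample can remove at most sample_size rows from the total.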
Suggested approach
Run a second Cypher query per policy when suppressions exist, applying the suppression filter directly in Cypher (WHERE NOT key IN $suppressed_keys) and using COUNT(*) for an exact unsuppressed total. This would replace the Python sample-scan approach and make violation_count authoritative.
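One possible shape for that second query is sketched below. It assumes each policy can supply its MATCH clause and an expression that yields the suppression key. The helper name, the Module label, the IMPORTS relationship, and the layer and name properties are illustrative assumptions, not taken from the codebase.

```python
def build_unsuppressed_count_query(match_clause: str, key_expr: str) -> str:
    """Wrap a policy's MATCH clause so suppression is applied inside Cypher.

    Both arguments are assumed to come from trusted policy definitions,
    not from user input, since they are interpolated into the query text.
    """
    return (
        f"{match_clause}\n"
        f"WITH {key_expr} AS key\n"
        "WHERE NOT key IN $suppressed_keys\n"
        "RETURN count(*) AS violation_count"
    )


# Hypothetical policy: modules in the core layer must not import ui modules.
query = build_unsuppressed_count_query(
    match_clause=(
        "MATCH (a:Module)-[:IMPORTS]->(b:Module)\n"
        "WHERE a.layer = 'core' AND b.layer = 'ui'"
    ),
    key_expr="a.name + '->' + b.name",
)
params = {"suppressed_keys": ["core.db->ui.widgets"]}
print(query)
```

The suppressed keys travel as a query parameter rather than being interpolated, so a single query text serves any suppression list. With the official neo4j Python driver this would presumably be executed as session.run(query, params). One caveat: rows whose key evaluates to null are dropped by the WHERE clause, so the key expression should be non-null for every violation.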
For cycle suppressions (which use substring/edge matching), a Cypher ANY(pair IN ...) predicate over the cycle list could replicate the Python edge-pair logic natively.
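The existing edge-pair logic is not described in detail here, so the sketch below makes several assumptions. It treats a cycle as a list of node names without the first node repeated at the end, writes an edge key as "src->dst", and treats a cycle as suppressed when any of its edges is in the suppressed set. The function names and the CYCLE_FILTER_CYPHER fragment are hypothetical. The Cypher fragment builds the same pairs with a list comprehension and filters them with any().

```python
def cycle_edge_keys(cycle):
    """Return "src->dst" keys for consecutive nodes, including the closing edge."""
    n = len(cycle)
    return [f"{cycle[i]}->{cycle[(i + 1) % n]}" for i in range(n)]


def is_cycle_suppressed(cycle, suppressed_edges):
    """A cycle counts as suppressed if any of its edges is suppressed."""
    return any(pair in suppressed_edges for pair in cycle_edge_keys(cycle))


# Assumed Cypher equivalent, where `cycle` is a list of node names bound
# earlier in the query and $suppressed_edges is a list of "src->dst" strings.
CYCLE_FILTER_CYPHER = """
WITH cycle,
     [i IN range(0, size(cycle) - 1) |
        cycle[i] + '->' + cycle[(i + 1) % size(cycle)]] AS pairs
WHERE NOT any(pair IN pairs WHERE pair IN $suppressed_edges)
RETURN count(*) AS violation_count
"""

edges = cycle_edge_keys(["a", "b", "c"])
print(edges)
print(is_cycle_suppressed(["a", "b", "c"], {"c->a"}))
```

Keeping the Python helper alongside the Cypher version would allow both to be checked against the same fixtures while the migration is in progress.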