Merged
Conversation
…uting gap recording The learning DB was only capturing quality check (ruff/mypy) data because three other capture paths were effectively dead: 1. Quality dedup: _normalize_error now strips file:line:col: prefixes before hashing so same error type on different lines produces one DB key, not many. 2. Error learner: PostToolUse only fires on success, so the is_error gate never triggered. Created a dedicated PostToolUseFailure hook that handles the correct event for failed tool calls. 3. Observation capture: removed "rtk " from skip prefixes (RTK proxies real commands), lowered min output from 50 to 20 chars, and unwraps "rtk proxy" commands to get the real command for filtering/tagging. 4. Routing gaps: wired up db.record_routing_gap() in router.classify() when all deterministic tiers miss and tier-3 (LLM) is needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes several gaps in the “learning DB capture” pipeline so that non-quality sources (tool observations, tool failures, and routing gaps) actually generate learning DB rows and deduplicate better.
Changes:
- Improved deduping of quality issues by normalizing
file:line:col:-style prefixes before hashing. - Split error learning into a new
PostToolUseFailurehook and moved error-output teeing there (sincePostToolUseruns only on success). - Broadened observation capture (lower output threshold, stop skipping
rtk ..., unwraprtk proxy ...) and wired routing-gap recording when deterministic routing misses.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
hooks/post_tool_use.py |
Updates observation capture behavior and error normalization; removes error-learner logic now handled on failures. |
hooks/post_tool_use_failure.py |
New hook to record failed tool-use error signatures and tee full error output. |
src/router.py |
Records routing gaps when all deterministic tiers miss. |
install.py |
Registers the new PostToolUseFailure hook event for installation. |
tests/test_post_tool_use.py |
Updates observation-capture tests for RTK/proxy behavior. |
tests/test_tee_output.py |
Updates tee-output tests to target the new failure hook and updated hint string. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Extract _normalize_error() to hooks/constants.py as shared normalize_error() — eliminates verbatim duplication between post_tool_use.py and post_tool_use_failure.py - Update post_tool_use.py docstring to reflect current responsibilities (quality checks + observations, not error learning) - Include tier 2.5 in routing gap tiers_attempted list - Guard _tee_output against empty cwd to prevent writing to process cwd Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ning - Move tee rotation glob after file write so new file is counted; fix off-by-one (>= to >) - Add usedforsecurity=False to all 4 MD5 call sites (non-security dedup/keying) - Add prune_routing_gaps(max_age_days=90) to LearningDB, wire into daily prune path Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tency - Switch all LearningDB usage to `with` context managers (post_tool_use, post_tool_use_failure, session_start, router) to prevent connection leaks - Fix tee symlink guard: verify resolved path after mkdir instead of check-then-act; isolate chmod in its own suppress so it cannot swallow the return hint - Add version = 3 to migration chain for V4 safety - Update stale comment (cmd_stripped vs original command) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
_normalize_error()now stripsfile:line:col:prefixes before hashing, so the same error type on different lines collapses to one DB key instead of many duplicatesPostToolUseFailurehook — thePostToolUsehook only fires on success, so the oldis_errorgate was dead codertkfrom skip prefixes, lowered min output threshold from 50 to 20 chars, and unwrapsrtk proxycommands to get the real command for filtering/taggingdb.record_routing_gap()inrouter.classify()when all deterministic tiers missContext
Analysis of the learning DB revealed all 32 rows came from a single source (
quality_check). Three other capture paths — observation, error learning, and routing gaps — had zero rows due to overly restrictive filters, incorrect event assumptions, and missing wiring.Test plan
_normalize_errorproduces identical hashes for same error type on different linesrtk proxy catstill skipped,rtk proxy git logcaptured with correct tags🤖 Generated with Claude Code