Fix learning DB capture pipeline by Jason-Adam · Pull Request #42 · Jason-Adam/autodidact

Jason-Adam · 2026-03-31T15:03:29Z

Summary

Quality dedup: _normalize_error() now strips file:line:col: prefixes before hashing, so the same error type on different lines collapses to one DB key instead of many duplicates
Error learner: Created a dedicated PostToolUseFailure hook. The PostToolUse hook fires only on success, so the old is_error gate was dead code
Observation capture: Removed rtk from skip prefixes, lowered the minimum output threshold from 50 to 20 chars, and now unwrap rtk proxy commands to get the real command for filtering and tagging
Routing gaps: Wired up db.record_routing_gap() in router.classify() when all deterministic tiers miss
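
The quality dedup item can be illustrated with a minimal sketch. It assumes the prefix looks like `path:line:` or `path:line:col:` at the start of the message; the regular expression and the `error_key` helper are hypothetical and are not taken from the repository.

```python
import hashlib
import re

# Hypothetical pattern: a path without spaces, then :line: or :line:col:
_LOCATION_PREFIX = re.compile(r"^\S+?:\d+(?::\d+)?:\s*")


def normalize_error(message: str) -> str:
    """Drop a leading file:line:col: prefix and collapse whitespace."""
    stripped = _LOCATION_PREFIX.sub("", message.strip())
    return " ".join(stripped.split())


def error_key(message: str) -> str:
    """Hash the normalized message into a dedup key (hypothetical helper).

    MD5 is used here only for keying, not for security.
    """
    normalized = normalize_error(message)
    return hashlib.md5(normalized.encode("utf-8"), usedforsecurity=False).hexdigest()
```

With this normalization, the same ruff or mypy message reported at two different locations hashes to one key, while different messages still produce different keys.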

Context

Analysis of the learning DB revealed all 32 rows came from a single source (quality_check). Three other capture paths — observation, error learning, and routing gaps — had zero rows due to overly restrictive filters, incorrect event assumptions, and missing wiring.

Test plan

All 462 tests pass
Verified _normalize_error produces identical hashes for same error type on different lines
RTK proxy unwrap tested: rtk proxy cat still skipped, rtk proxy git log captured with correct tags
Ruff/mypy/format all pass
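
The RTK proxy behavior checked in the test plan might look roughly like the sketch below. The skip list, the function names, and the first-token tagging rule are assumptions made for illustration; the PR only states that `rtk ` was removed from the skip prefixes and that the minimum output length is now 20 characters.

```python
# Hypothetical skip list; the real list in the hook is not shown in the PR.
SKIP_PREFIXES = ("cat ", "ls ", "echo ")
MIN_OUTPUT_CHARS = 20  # lowered from 50 according to the PR
RTK_PROXY_PREFIX = "rtk proxy "


def unwrap_rtk_proxy(command: str) -> str:
    """Return the real command behind an `rtk proxy ...` wrapper."""
    stripped = command.strip()
    if stripped.startswith(RTK_PROXY_PREFIX):
        return stripped[len(RTK_PROXY_PREFIX):].lstrip()
    return stripped


def should_capture(command: str, output: str) -> bool:
    """Decide whether a tool call is worth recording as an observation."""
    real = unwrap_rtk_proxy(command)
    if len(output.strip()) < MIN_OUTPUT_CHARS:
        return False
    # Filter on the unwrapped command, so `rtk proxy cat` is still skipped.
    return not any(real == p.strip() or real.startswith(p) for p in SKIP_PREFIXES)


def command_tag(command: str) -> str:
    """Tag an observation with the first token of the real command (assumed rule)."""
    parts = unwrap_rtk_proxy(command).split()
    return parts[0] if parts else ""
```

Filtering and tagging both operate on the unwrapped command, which is why `rtk proxy git log` would be tagged as `git` rather than `rtk`.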

🤖 Generated with Claude Code

…uting gap recording

The learning DB was only capturing quality check (ruff/mypy) data because three other capture paths were effectively dead:

1. Quality dedup: _normalize_error now strips file:line:col: prefixes before hashing so same error type on different lines produces one DB key, not many.
2. Error learner: PostToolUse only fires on success, so the is_error gate never triggered. Created a dedicated PostToolUseFailure hook that handles the correct event for failed tool calls.
3. Observation capture: removed "rtk " from skip prefixes (RTK proxies real commands), lowered min output from 50 to 20 chars, and unwraps "rtk proxy" commands to get the real command for filtering/tagging.
4. Routing gaps: wired up db.record_routing_gap() in router.classify() when all deterministic tiers miss and tier-3 (LLM) is needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR fixes several gaps in the “learning DB capture” pipeline so that non-quality sources (tool observations, tool failures, and routing gaps) actually generate learning DB rows and deduplicate better.

Changes:

Improved deduping of quality issues by normalizing file:line:col:-style prefixes before hashing.
Split error learning into a new PostToolUseFailure hook and moved error-output teeing there (since PostToolUse runs only on success).
Broadened observation capture (lower output threshold, stop skipping rtk ..., unwrap rtk proxy ...) and wired routing-gap recording when deterministic routing misses.
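
As a rough illustration of the routing-gap wiring, the sketch below runs a sequence of deterministic tiers and reports a gap only when every tier misses. The tier names, the matcher signature, and the `record_gap` callback are hypothetical stand-ins for `router.classify()` and `db.record_routing_gap()`, whose real signatures are not shown in the PR.

```python
from typing import Callable, Optional, Sequence

# A tier is a name plus a matcher that returns a route or None (assumed shape).
Tier = tuple[str, Callable[[str], Optional[str]]]


def classify(
    prompt: str,
    tiers: Sequence[Tier],
    record_gap: Callable[[str, list[str]], None],
) -> Optional[str]:
    """Try each deterministic tier in order; record a gap if all of them miss."""
    attempted: list[str] = []
    for name, matcher in tiers:
        attempted.append(name)
        route = matcher(prompt)
        if route is not None:
            return route
    # Every deterministic tier missed, so the caller would fall back to the LLM tier.
    record_gap(prompt, attempted)
    return None
```

Passing the recorder in as a callback keeps the sketch independent of any database, and the `attempted` list mirrors the idea of reporting which tiers were tried.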

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.


| File | Description |
| --- | --- |
| `hooks/post_tool_use.py` | Updates observation capture behavior and error normalization; removes error-learner logic now handled on failures. |
| `hooks/post_tool_use_failure.py` | New hook to record failed tool-use error signatures and tee full error output. |
| `src/router.py` | Records routing gaps when all deterministic tiers miss. |
| `install.py` | Registers the new `PostToolUseFailure` hook event for installation. |
| `tests/test_post_tool_use.py` | Updates observation-capture tests for RTK/proxy behavior. |
| `tests/test_tee_output.py` | Updates tee-output tests to target the new failure hook and updated hint string. |



- Extract _normalize_error() to hooks/constants.py as shared normalize_error(), which eliminates verbatim duplication between post_tool_use.py and post_tool_use_failure.py
- Update post_tool_use.py docstring to reflect current responsibilities (quality checks + observations, not error learning)
- Include tier 2.5 in routing gap tiers_attempted list
- Guard _tee_output against empty cwd to prevent writing to process cwd

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ning

- Move tee rotation glob after file write so new file is counted; fix off-by-one (>= to >)
- Add usedforsecurity=False to all 4 MD5 call sites (non-security dedup/keying)
- Add prune_routing_gaps(max_age_days=90) to LearningDB, wire into daily prune path

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
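
The tee rotation fix can be sketched as follows, assuming tee files are `*.log` files in one directory and that a fixed number of them is kept. The function name, the cap, and the file naming are hypothetical; only the ordering (glob after the write) and the strict `>` comparison come from the commit message.

```python
from pathlib import Path

MAX_TEE_FILES = 3  # hypothetical cap; the real limit is not stated in the PR


def tee_output(tee_dir: Path, name: str, text: str, max_files: int = MAX_TEE_FILES) -> Path:
    """Write a tee file, then prune the oldest files beyond max_files."""
    tee_dir.mkdir(parents=True, exist_ok=True)
    target = tee_dir / name
    target.write_text(text, encoding="utf-8")

    # Glob after the write so the file just written is included in the count.
    files = list(tee_dir.glob("*.log"))
    excess = len(files) - max_files
    if excess > 0:  # strict comparison: exactly max_files files are allowed to stay
        candidates = sorted(
            (p for p in files if p != target),
            key=lambda p: (p.stat().st_mtime_ns, p.name),
        )
        for old in candidates[:excess]:
            old.unlink(missing_ok=True)
    return target
```

Counting before the write would leave one file too many on disk, and a `>=` comparison would delete a file even when the directory holds exactly the allowed number.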

…tency

- Switch all LearningDB usage to `with` context managers (post_tool_use, post_tool_use_failure, session_start, router) to prevent connection leaks
- Fix tee symlink guard: verify resolved path after mkdir instead of check-then-act; isolate chmod in its own suppress so it cannot swallow the return hint
- Add version = 3 to migration chain for V4 safety
- Update stale comment (cmd_stripped vs original command)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
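
The switch to `with` context managers can be illustrated with a minimal stand-in class built on `sqlite3`. The schema and the single-argument `record_routing_gap` signature are assumptions for this sketch and do not reflect the repository's actual `LearningDB`.

```python
import sqlite3
from types import TracebackType
from typing import Optional


class LearningDB:
    """Minimal stand-in showing the context-manager pattern (schema is hypothetical)."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS routing_gaps ("
            "prompt TEXT NOT NULL, created_at TEXT DEFAULT CURRENT_TIMESTAMP)"
        )

    def __enter__(self) -> "LearningDB":
        return self

    def __exit__(
        self,
        exc_type: Optional[type[BaseException]],
        exc: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> None:
        # Always close, even if the body raised, so hooks do not leak connections.
        self.close()

    def record_routing_gap(self, prompt: str) -> None:
        self._conn.execute("INSERT INTO routing_gaps (prompt) VALUES (?)", (prompt,))
        self._conn.commit()

    def count_routing_gaps(self) -> int:
        return self._conn.execute("SELECT COUNT(*) FROM routing_gaps").fetchone()[0]

    def close(self) -> None:
        self._conn.close()
```

A hook would then use `with LearningDB(path) as db:` around its writes, which closes the connection on both normal exit and exceptions.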

Copilot AI review requested due to automatic review settings March 31, 2026 15:03

Copilot started reviewing on behalf of Jason-Adam March 31, 2026 15:04

Copilot AI reviewed Mar 31, 2026


Jason-Adam and others added 3 commits March 31, 2026 16:11

Jason-Adam self-assigned this Mar 31, 2026

Jason-Adam merged commit 3fd3498 into main Mar 31, 2026
3 checks passed

Jason-Adam deleted the fix/learning-db-capture-pipeline branch March 31, 2026 15:45

Jason-Adam merged 4 commits into main from fix/learning-db-capture-pipeline


2 participants
