feat(rule-tracker): per-repo origin tagging + cross-repo pollution guard by rladmsgh34 · Pull Request #35 · rladmsgh34/ai-dev-loop-analyzer

rladmsgh34 · 2026-04-26T13:35:32Z

Summary

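Found the cause that undermined the loop-efficacy measurement: rules-history.json is shared across repos, so vue snapshots accumulated on rules induced from gwangcheon, causing pollution. This PR only lays down the measurement infra and takes a living-document approach, waiting for the numbers to grow, which avoids baseline lock-in.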

Schema changes

| Field | Meaning |
| --- | --- |
| `rules[].origin_repo` | `'owner/repo'` \| `'self'` \| `'unknown'` |
| `rules[].origin_confidence` | `'explicit'` (fact recorded at induction time) \| `'inferred'` (heuristic) \| `'unknown'` |
| `snapshots[].repo` | Target repo of the analysis that produced this snapshot |
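As a rough illustration of the fields above, the shape could be written down as typed dictionaries. The type names, the `snapshots` key nested under a rule, and the `is_valid_origin` helper are assumptions made for this sketch; only the three field names and their allowed values come from the table.

```python
from typing import List, Literal, TypedDict

# Hypothetical type names; only the field names and allowed values
# are taken from the schema table.
OriginConfidence = Literal["explicit", "inferred", "unknown"]


class Snapshot(TypedDict, total=False):
    repo: str  # target repo of the analysis that produced this snapshot


class Rule(TypedDict, total=False):
    origin_repo: str  # 'owner/repo', 'self', or 'unknown'
    origin_confidence: OriginConfidence
    snapshots: List[Snapshot]  # assumed key for rule-attached snapshots


def is_valid_origin(rule: Rule) -> bool:
    """Return True if both origin fields hold one of the documented values."""
    repo = rule.get("origin_repo", "")
    confidence = rule.get("origin_confidence", "")
    if repo in ("self", "unknown"):
        repo_ok = True
    else:
        # 'owner/repo' must have exactly two non-empty parts.
        parts = repo.split("/")
        repo_ok = len(parts) == 2 and all(parts)
    return repo_ok and confidence in ("explicit", "inferred", "unknown")
```

A check like this might be useful in a test that loads rules-history.json and asserts every rule carries a well-formed origin.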

Pollution guard

record_snapshot: a snapshot from a repo other than the rule's origin_repo is not attached to that rule
legacy rules with unknown origin receive every snapshot (best-effort backward compat)
compute_effectiveness: repo / include_self / include_inferred flags
- self excluded by default: the analyzer analyzing itself is a recursive measurement with questionable reliability
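The attach rule described above can be sketched in a few lines. This is not the project's `record_snapshot`; the function names, signature, and the `snapshots` key on a rule are hypothetical, and only the matching behavior (same repo attaches, unknown origin accepts everything) follows the description.

```python
def should_attach(rule: dict, snapshot_repo: str) -> bool:
    """Decide whether a snapshot from snapshot_repo belongs on this rule."""
    origin = rule.get("origin_repo", "unknown")
    if origin == "unknown":
        # Legacy rules: best-effort backward compat, accept everything.
        return True
    return origin == snapshot_repo


def record_snapshot_sketch(rules: list, snapshot: dict, target_repo: str) -> int:
    """Attach a repo-tagged copy of snapshot to matching rules; return the count."""
    tagged = {**snapshot, "repo": target_repo}
    attached = 0
    for rule in rules:
        if should_attach(rule, target_repo):
            # 'snapshots' as the per-rule key is an assumption of this sketch.
            rule.setdefault("snapshots", []).append(dict(tagged))
            attached += 1
    return attached
```

With this shape, a vue snapshot never lands on a rule whose origin is gwangcheon-shop, which is the pollution case the PR targets.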

Migration (idempotent)

Result of scripts/migrate_rules_origin.py:

| origin | count | confidence |
| --- | --- | --- |
| `self` | 6 | explicit |
| `vuejs/core` | 7 | inferred |
| `unknown` | 17 | unknown |

→ The default efficacy calculation uses only the 7 vue rules. A second run skips all 30 rules (idempotence verified).
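The idempotence contract (first run tags, second run skips everything) can be illustrated with a minimal sketch. This is not the contents of scripts/migrate_rules_origin.py: `migrate_origins` and `toy_classify` are hypothetical, and the toy classifier is a placeholder rather than the heuristic the script actually uses.

```python
from typing import Callable, List, Tuple


def migrate_origins(
    rules: List[dict],
    classify: Callable[[dict], Tuple[str, str]],
) -> Tuple[int, int]:
    """Tag untagged rules in place; return (tagged, skipped)."""
    tagged = skipped = 0
    for rule in rules:
        if "origin_repo" in rule:
            # Already migrated: leave untouched so re-runs are no-ops.
            skipped += 1
            continue
        repo, confidence = classify(rule)
        rule["origin_repo"] = repo
        rule["origin_confidence"] = confidence
        tagged += 1
    return tagged, skipped


def toy_classify(rule: dict) -> Tuple[str, str]:
    """Placeholder heuristic for the sketch: look for 'vue' in the rule text."""
    text = rule.get("text", "").lower()
    if "vue" in text:
        return "vuejs/core", "inferred"
    return "unknown", "unknown"
```

Because the skip check keys on the presence of `origin_repo`, running the function twice over the same list yields `(n, 0)` and then `(0, n)`.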

Living document framing

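effectiveness_summary header: "데이터 누적과 함께 정확도 상승" ("accuracy improves as data accumulates")
min_days=7 not yet met → the first numbers appear one cron cycle from now
no baseline lock-in: an honest representation when n is small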

Sequence position

This PR is the measurement infra (#1). Next:

(perf: speed up --fetch-files (3N → N API calls + parallelization) #2) fetch_limit scale-up: intervention
(feat: include the diff in the LLM prompt in --fetch-files mode #3) stable_domains: can be added with the data available now
(deferred) threshold validation: to be decided after the next cron (TODO comment)

Test plan

pytest: 94/94 pass (87 prior + 7 new rule_tracker)
YAML syntax
migration idempotence verified (run 1: 30 tagged, run 2: 30 skipped)
3 CLI variants work (--repo, --strict, default)
First cron after merge: confirm new rules are tagged as explicit
Next cron: confirm the pollution guard keeps vue snapshots out of the self/shop rule timelines

Out of scope

External beta (hook extraction): on hold, no recipient yet
Threshold validation: after the fetch scale-up

🤖 Generated with Claude Code

The 30 existing rules in rules-history.json were the dataset for loop-efficacy measurement, but inspection revealed pollution: rule X induced from gwangcheon-shop carried snapshots from later vue/core runs, making "did this rule reduce regressions" structurally unanswerable.

Schema additions
- rules[].origin_repo: 'owner/repo' | 'self' | 'unknown'
- rules[].origin_confidence: 'explicit' (recorded at induction) | 'inferred' (heuristic from rule text) | 'unknown'
- snapshots[].repo: which target repo's analysis produced this snapshot

Pollution guard
- record_snapshot now skips attaching to a rule when the rule's origin_repo doesn't match the snapshot's repo. Legacy unknown-origin rules accept all snapshots (best-effort backward compat).
- compute_effectiveness gains repo / include_self / include_inferred flags. Self-rules are excluded by default: measuring them would require the analyzer analyzing itself (recursive, unstable).

Migration (idempotent)
- scripts/migrate_rules_origin.py classifies the 30 legacy rules: 6 explicit self, 7 inferred vuejs/core, 17 unknown. Re-runs skip already-tagged rules.

Workflow
- Step 3 passes TARGET_REPO when calling record_snapshot and record_new_rules. New rules always recorded as explicit fact (we know which repo we're analyzing at that moment).
- analyzer's own repo recorded as 'self' for the special case (matrix never targets analyzer itself, but local self-runs would).

Living document framing
- effectiveness_summary header marks output as accumulating; n=7 vue rules + min_days=7 means real numbers won't appear until the next cron cycle. CLI now supports --repo, --include-self, --strict.

Tests
- 7 new tests in tests/test_rule_tracker.py cover origin recording, pollution guard (the load-bearing one), repo filter, self exclusion, and strict mode. 94/94 pass.
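How the repo / include_self / include_inferred flags might narrow the rule set before any efficacy number is computed can be sketched as below. This is not the real `compute_effectiveness`: `select_rules` is a hypothetical helper, the default `include_inferred=True` and the exclusion of unknown-origin rules are inferred from the statement that the default calculation uses only the 7 vue rules, and treating `--strict` as `include_inferred=False` is an assumption the source does not spell out.

```python
from typing import List, Optional


def select_rules(
    rules: List[dict],
    repo: Optional[str] = None,
    include_self: bool = False,
    include_inferred: bool = True,
) -> List[dict]:
    """Return the rules eligible for efficacy measurement under the flags."""
    selected = []
    for rule in rules:
        origin = rule.get("origin_repo", "unknown")
        confidence = rule.get("origin_confidence", "unknown")
        if origin == "unknown":
            continue  # origin cannot be attributed, so never measured
        if origin == "self" and not include_self:
            continue  # recursive self-measurement is excluded by default
        if confidence == "inferred" and not include_inferred:
            continue  # assumed behavior of strict mode
        if repo is not None and origin != repo:
            continue  # optional single-repo filter
        selected.append(rule)
    return selected
```

Applied to the migrated legacy set (6 explicit self, 7 inferred vuejs/core, 17 unknown), the defaults leave the 7 vue rules, and adding `include_self=True` brings the total to 13.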

Dry-run on PR branch (run 24957960684) verified the new origin tagging works for fresh writes (15 new rules tagged with explicit owner/repo, new snapshots carry repo field), but exposed leftover pollution: the 30 legacy rules retained 75 rule-attached snapshots and 1 global snapshot from before per-repo tracking landed. Those have no repo field, so the pollution guard can't tell which target produced them and downstream efficacy calc would still mix data.

Migration now also purges any snapshot lacking a repo field. Same idempotent contract: already-clean histories stay no-op on re-run.

Verified
- Local migration purged 1 global + 75 rule-attached dirty snapshots
- Second run: 0/0 purge (idempotent)
- Live data file: zero repo-less snapshots remain
- New test covers both purge correctness and idempotence
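The purge step can be pictured with a small sketch. The layout assumed here (a top-level `snapshots` list for global snapshots plus a `snapshots` list on each entry of `rules`) and the name `purge_repoless` are illustrative assumptions, not the migration script's actual code.

```python
from typing import Tuple


def purge_repoless(history: dict) -> Tuple[int, int]:
    """Drop snapshots without a 'repo' field.

    Returns (purged_global, purged_rule_attached). A history that is
    already clean is left unchanged and reports (0, 0).
    """
    purged_global = 0
    if "snapshots" in history:
        kept = [s for s in history["snapshots"] if "repo" in s]
        purged_global = len(history["snapshots"]) - len(kept)
        history["snapshots"] = kept

    purged_attached = 0
    for rule in history.get("rules", []):
        if "snapshots" not in rule:
            continue
        kept = [s for s in rule["snapshots"] if "repo" in s]
        purged_attached += len(rule["snapshots"]) - len(kept)
        rule["snapshots"] = kept
    return purged_global, purged_attached
```

Returning the two counts separately mirrors the "1 global + 75 rule-attached" and "0/0" figures reported above, so a re-run can be asserted to be a no-op.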

rladmsgh34 and others added 8 commits April 26, 2026 13:35

data: daily snapshot 2026-04-26 [rladmsgh34/gwangcheon-shop]

56ae214

data: daily snapshot 2026-04-26 [vuejs/core]

c662a5a

data: daily snapshot 2026-04-26 [vercel/next.js]

93608bc

data: daily snapshot 2026-04-26 [rladmsgh34/gwangcheon-shop]

1f9a45c

data: daily snapshot 2026-04-26 [vuejs/core]

33613bd

data: daily snapshot 2026-04-26 [vercel/next.js]

9deb1d1

rladmsgh34 merged commit 71cc737 into main Apr 26, 2026

rladmsgh34 deleted the feat/rules-history-per-repo-origin branch April 26, 2026 13:45
