feat: repo-first evidence pipeline (v0.10.0) #52

Open
brianruggieri wants to merge 25 commits into main from feat/evidence-pipeline

brianruggieri (Owner) commented Mar 31, 2026

Summary

Implements the 6-decision repo-first evidence pipeline redesign from the 2026-03-30 grill session:

  • D1 — RepoProject schema: Replaces ProjectSummary with RepoProject as canonical project source in MergedEvidenceProfile.projects. Projects now come from scanned repos via merge_triad(), not sessions.
  • D2 — Commit evidence extraction: Two-stage pipeline — heuristic pre-filter (commit_filter.py) classifies commits into tiers and drops noise, then Claude extracts pithy highlight quotes with skill tags (commit_highlighter.py). Heuristic fallback when Claude is unavailable.
  • D3 — Repo link wiring: repo_url and commit_url flow through select_projects() and select_evidence_highlights() to Hugo front matter.
  • D4 — Three-tier skill resolution: Abstract skills (agentic-workflows, system-design, etc.) now resolve via mechanical match → commit tags (stub) → repo signal inference (static rules + Claude).
  • D5 — Quantitative fallback: format_skill_repo_fallback() generates hiring-manager-aligned text from repo stats (repo count, timeline, test coverage, frameworks, CI) when no commit highlights exist.
  • D6 — Session dormancy: export_fit_assessment() no longer reads candidate_profile.json. Session scanning infrastructure preserved but dormant.
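
As a rough illustration of the D5 fallback, the sketch below turns aggregate repo stats into one summary sentence. `RepoStats`, its fields, and `format_repo_fallback_sketch()` are hypothetical stand-ins; the real signature and wording of `format_skill_repo_fallback()` are not shown in this PR description.

```python
from dataclasses import dataclass, field


@dataclass
class RepoStats:
    """Hypothetical per-repo stats; the real schema in the PR may differ."""
    name: str
    first_commit_year: int
    last_commit_year: int
    has_tests: bool = False
    has_ci: bool = False
    frameworks: list = field(default_factory=list)


def format_repo_fallback_sketch(skill, repos):
    """Build a one-sentence summary from aggregate repo stats (illustrative only)."""
    if not repos:
        return ""
    start = min(r.first_commit_year for r in repos)
    end = max(r.last_commit_year for r in repos)
    timeline = str(start) if start == end else f"{start}-{end}"
    noun = "repo" if len(repos) == 1 else "repos"
    parts = [f"Used {skill} across {len(repos)} {noun} ({timeline})"]
    tested = sum(r.has_tests for r in repos)
    if tested:
        parts.append(f"{tested} with automated tests")
    with_ci = sum(r.has_ci for r in repos)
    if with_ci:
        parts.append(f"{with_ci} with CI")
    frameworks = sorted({f for r in repos for f in r.frameworks})
    if frameworks:
        parts.append("frameworks: " + ", ".join(frameworks))
    return "; ".join(parts) + "."
```

Returning an empty string when there are no repos lets a caller skip the fallback text entirely rather than render a sentence with zero counts.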

New modules

  • src/claude_candidate/commit_filter.py — heuristic pre-filter with tier classification
  • src/claude_candidate/commit_highlighter.py — Claude-powered highlight extraction with fallback
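
A minimal sketch of what a heuristic pre-filter with tier classification might look like. The noise patterns, the tier numbers, and the `classify()` / `prefilter()` names are assumptions for illustration; the actual rules in `commit_filter.py` are not shown in this conversation.

```python
import re
from typing import Optional

# Assumed noise patterns; the real commit_filter.py rules are not shown here.
NOISE = re.compile(r"^(merge (branch|pull request)|bump version|wip\b|fix typo)", re.IGNORECASE)
# Assumed tiers keyed on conventional-commit prefixes (lower number = stronger evidence).
TIER_BY_PREFIX = {"feat": 1, "fix": 2, "perf": 2, "refactor": 2, "test": 3, "docs": 3, "chore": 3}
PREFIX = re.compile(r"^(\w+)(\([^)]*\))?!?:")


def classify(subject: str) -> Optional[int]:
    """Return a tier for a commit subject, or None when it looks like noise."""
    subject = subject.strip()
    if not subject or NOISE.match(subject):
        return None
    match = PREFIX.match(subject)
    if match:
        return TIER_BY_PREFIX.get(match.group(1).lower(), 3)
    return 3  # no recognizable prefix: keep, but at the weakest tier


def prefilter(subjects, budget):
    """Drop noise, then keep at most `budget` subjects, best tier first."""
    tiered = [(classify(s), i, s) for i, s in enumerate(subjects)]
    kept = sorted((t, i, s) for t, i, s in tiered if t is not None)
    return [s for _, _, s in kept[:budget]]
```

Sorting on `(tier, original index)` keeps the selection stable, so commits of equal tier stay in their original order before the budget cut is applied.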

Stats

  • 22 files changed, +3118/−255 lines
  • 1544 tests pass (28 skipped slow), up from 1423

Test plan

  • Fast suite green: .venv/bin/python -m pytest (1544 passed, 28 skipped)
  • Phase 1 spec review: all 8 tasks verified spec compliant
  • Slow suite: .venv/bin/python -m pytest --run-slow
  • Benchmark: .venv/bin/python tests/golden_set/benchmark_accuracy.py

Review fixes (f316048)

  • fit_exporter.py: repo_url falls back to public_repo_url for legacy ProjectSummary shapes
  • commit_filter.py: removed unused field import
  • commit_highlighter.py: catch ClaudeCLIError specifically before broad Exception
  • test_commit_filter.py: removed unused pytest import
  • test_repo_project_schema.py: removed unused pytest import
  • fit_exporter.py L1098: evidence_highlights empty by design (D6 dormancy, D2 wiring in follow-up) — no change needed
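
The exception-ordering fix in `commit_highlighter.py` can be sketched roughly as follows. `ClaudeCLIError` is redefined locally as a stand-in, and `extract_highlights()`, `heuristic_highlights()`, and the `call_claude` parameter are hypothetical names; only the "specific handler before the broad one" pattern comes from the review fix.

```python
class ClaudeCLIError(Exception):
    """Stand-in for the project's error type; its real definition is not shown here."""


def heuristic_highlights(subjects):
    # Hypothetical fallback: reuse the first few commit subjects verbatim.
    return subjects[:3]


def extract_highlights(subjects, call_claude):
    """Try the Claude-backed extractor first; fall back to heuristics on failure."""
    try:
        return call_claude(subjects)
    except ClaudeCLIError:
        # Expected failure mode: fall back quietly.
        return heuristic_highlights(subjects)
    except Exception as exc:
        # Unexpected failure: still fall back, but surface what went wrong.
        print(f"unexpected highlighter error: {exc!r}")
        return heuristic_highlights(subjects)
```

Because `ClaudeCLIError` subclasses `Exception`, its handler has to come first; placed after `except Exception`, it would never run.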

Remove candidate_profile_path parameter from export_fit_assessment().
Evidence highlights return empty list until commit evidence (D2) is wired.
select_evidence_highlights() function preserved for future reuse.
- ABSTRACT_SKILLS constant and _ABSTRACT_SIGNAL_RULES mapping
- _apply_signal_rules() for repo signal-based inference
- _build_abstract_skill_prompt() and _claude_infer_abstract_skills()
- _build_commit_tagged_skills() stub for tier 2
- resolve_abstract_skills() three-tier orchestrator
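
To make the tier order concrete, here is a small sketch of a three-tier resolver. The rule table, the function name `resolve_abstract_skills_sketch()`, and its parameters are assumptions; the real `_ABSTRACT_SIGNAL_RULES` contents and `resolve_abstract_skills()` signature are not shown, and the Claude inference step that the PR includes in tier 3 is left out.

```python
# Assumed example rules; the PR's _ABSTRACT_SIGNAL_RULES contents are not shown.
SIGNAL_RULES = {
    "system-design": {"docker", "ci"},
    "agentic-workflows": {"claude", "mcp"},
}


def resolve_abstract_skills_sketch(wanted, known_skills, commit_tags, repo_signals):
    """Map each abstract skill to the tier that resolved it, or None."""
    resolved = {}
    for skill in wanted:
        if skill in known_skills:
            resolved[skill] = "mechanical"  # tier 1: direct match
        elif skill in commit_tags:
            resolved[skill] = "commit-tag"  # tier 2: commit tags (a stub in the PR)
        elif SIGNAL_RULES.get(skill, set()) & repo_signals:
            resolved[skill] = "repo-signal"  # tier 3: static rule matched a repo signal
        else:
            resolved[skill] = None  # unresolved; Claude inference is omitted in this sketch
    return resolved
```

Each skill stops at the first tier that resolves it, so cheaper mechanical evidence always takes precedence over inference.
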
Copilot AI review requested due to automatic review settings March 31, 2026 01:13

Copilot AI left a comment

Pull request overview

Implements the v0.10.0 “repo-first evidence pipeline” redesign by shifting MergedEvidenceProfile.projects to a repo-derived RepoProject, adding commit evidence extraction primitives (filter + Claude/heuristic highlighter), and updating export/merge paths and tests to reflect session dormancy.

Changes:

  • Introduce RepoProject (replacing session ProjectSummary in merged profiles) and update merge/scoring call sites.
  • Add commit evidence extraction pipeline: heuristic commit filtering + Claude-powered (with fallback) highlight extraction.
  • Update fit export + tests for repo-first project shape, abstract-skill resolution scaffolding, and v0.10.0 version bumps.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| `tests/test_repo_scanner.py` | Adds coverage for raw commit fetching and optional highlight extraction in local repo scans. |
| `tests/test_repo_project_schema.py` | Adds tests for RepoProject schema and from_repo_evidence() factory behavior. |
| `tests/test_repo_profile.py` | Adds tests for CommitHighlight schema round-tripping and RepoEvidence defaults. |
| `tests/test_merger.py` | Updates merger tests to assert projects come from repo profiles (not sessions). |
| `tests/test_merged_profile_projects.py` | Verifies merged profiles accept RepoProject instances and serialize/deserialize. |
| `tests/test_fit_exporter.py` | Updates exporter tests for RepoProject-shaped projects, dormancy, repo_url/commit_url wiring, and abstract-skill helpers. |
| `tests/test_commit_highlighter.py` | Adds unit + slow integration coverage for commit highlight extraction and fallback. |
| `tests/test_commit_filter.py` | Adds tests for commit noise filtering, tiering, scoring, and slot budgeting. |
| `src/claude_candidate/scoring/dimensions.py` | Updates project-name usage to RepoProject.name. |
| `src/claude_candidate/schemas/repo_profile.py` | Adds RepoProject + CommitHighlight, and adds commit_highlights to RepoEvidence. |
| `src/claude_candidate/schemas/merged_profile.py` | Switches merged projects field type from ProjectSummary to RepoProject. |
| `src/claude_candidate/schemas/__init__.py` | Re-exports RepoProject from the schemas package. |
| `src/claude_candidate/repo_scanner.py` | Adds remote URL detection, optional highlight extraction, and raw commit fetching/parser helpers. |
| `src/claude_candidate/merger.py` | Populates merged projects from repo_profile.repos via RepoProject.from_repo_evidence(). |
| `src/claude_candidate/fit_exporter.py` | Adds repo quantitative fallback formatter + abstract skill resolution helpers; updates project selection for RepoProject shape; removes candidate_profile dependency from export. |
| `src/claude_candidate/extractors/signal_merger.py` | Marks session-based project building as dormant with updated docstrings. |
| `src/claude_candidate/commit_highlighter.py` | New: Claude-powered commit highlight extraction with heuristic fallback. |
| `src/claude_candidate/commit_filter.py` | New: heuristic pre-filter with tiering/noise detection and scoring. |
| `src/claude_candidate/cli.py` | Updates export-fit command to no longer pass candidate_profile_path into exporter. |
| `src/claude_candidate/__init__.py` | Bumps package version to 0.10.0. |
| `pyproject.toml` | Bumps project version to 0.10.0. |
| `extension/manifest.json` | Bumps extension version to 0.10.0. |

Comments suppressed due to low confidence (1)

src/claude_candidate/cli.py:338

  • The export-fit command no longer checks that candidate_profile.json exists, but it still unconditionally reads it via CandidateProfile.from_json(candidate_path.read_text()). If the file is missing, this will raise a low-level exception/stack trace. Consider restoring an explicit existence check (or catching the error) to emit a clear CLI message and exit code.
	data_dir = Path.home() / ".claude-candidate"
	db_path = Path(db) if db else data_dir / "assessments.db"
	candidate_path = data_dir / "candidate_profile.json"

	# Validate paths
	if not db_path.exists():
		click.echo(f"Error: Database not found at {db_path}", err=True)
		raise SystemExit(1)

	# Load assessment from DB
	async def _load():
		store = AssessmentStore(db_path)
		await store.initialize()
		try:
			return await store.get_assessment(assessment_id)
		finally:
			await store.close()

	assessment = asyncio.run(_load())
	if not assessment:
		click.echo(f"Error: Assessment '{assessment_id}' not found.", err=True)
		raise SystemExit(1)

	# Build merged profile on the fly — no merged_profile.json needed
	cp = CandidateProfile.from_json(candidate_path.read_text())
	merged = _merge_profile(cp, quiet=True)
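
The explicit existence check suggested in this comment could look roughly like the sketch below. `read_candidate_profile()` is a hypothetical helper, and `print` to stderr stands in for `click.echo(..., err=True)` so the snippet runs without third-party dependencies.

```python
import sys
from pathlib import Path


def read_candidate_profile(candidate_path: Path) -> str:
    """Return the profile JSON text, or exit with a clear message if it is missing."""
    if not candidate_path.is_file():
        # Mirrors the db_path check in the excerpt above; print stands in for click.echo.
        print(f"Error: Candidate profile not found at {candidate_path}", file=sys.stderr)
        raise SystemExit(1)
    return candidate_path.read_text()
```

With a helper like this, the `CandidateProfile.from_json(...)` call would receive the returned text, and a missing file would produce the same style of error and exit code as the database check.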

Comment thread src/claude_candidate/fit_exporter.py Outdated
Comment thread src/claude_candidate/fit_exporter.py
Comment thread src/claude_candidate/commit_filter.py Outdated
Comment thread src/claude_candidate/commit_highlighter.py Outdated
Comment thread tests/test_commit_filter.py Outdated
Comment thread tests/test_repo_project_schema.py Outdated
- fit_exporter.py: repo_url falls back to public_repo_url for legacy shapes
- commit_filter.py: remove unused `field` import
- commit_highlighter.py: catch ClaudeCLIError specifically before broad Exception
- test_commit_filter.py: remove unused pytest import
- test_repo_project_schema.py: remove unused pytest import