feat: add issue fingerprinting for regression detection #45

dtsong · 2026-01-24T19:24:57Z

Closes #35

Summary

Generates consistent fingerprints for review issues using file path, line range bucket, category, severity, and normalized description
Same logical issue produces same fingerprint even with minor line shifts or wording variations
Fingerprints stored with metrics for regression tracking

Changes

New: src/pr_review_agent/review/fingerprint.py - hash algorithm with normalization
New: tests/test_fingerprint.py - 17 tests covering all edge cases
Modified: src/pr_review_agent/review/llm_reviewer.py - fingerprint field on ReviewIssue, auto-attached during review
Modified: src/pr_review_agent/metrics/supabase_logger.py - stores fingerprints with issue data

Fingerprint Algorithm

hash(file | line_bucket | category | severity | normalized_desc)[:16]

Line bucket: nearest 10 lines (so ±5 line shifts still match)
Normalized desc: lowercased, stop words removed, sorted words

Test Plan

All 315 tests pass (96.69% coverage)
Same issue → same fingerprint
Different issues → different fingerprints
Minor line shifts (±5) → same fingerprint
Description wording variations → same fingerprint
Null/missing fields handled gracefully

🤖 Generated with Claude Code

Generates consistent fingerprints for review issues using: - File path, line range bucket (±10 lines), category, severity - Normalized description (stop words removed, sorted) Enables detecting same issue across PRs, tracking resolution, and measuring fix effectiveness. - New: src/pr_review_agent/review/fingerprint.py - Modified: ReviewIssue now has fingerprint field - Modified: LLM reviewer attaches fingerprint to each issue - Modified: Supabase logger stores fingerprints Implements #35 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

github-actions · 2026-01-24T19:26:14Z

AI Code Review

Summary

The PR implements a solid fingerprinting system for review issues with good test coverage. The implementation is well-structured but has some potential security and usability concerns that should be addressed.

Confidence: 0.00 (low)
This PR may require human review

Strengths

Comprehensive test coverage with 17 test cases covering edge cases
Well-documented code with clear docstrings explaining the algorithm
Thoughtful normalization approach that handles description variations
Proper integration with existing ReviewIssue dataclass
Clean separation of concerns with dedicated fingerprint module

Issues Found

Major (2)

src/pr_review_agent/review/fingerprint.py:95 - security

Using only first 16 hex chars (64 bits) increases collision risk. With birthday paradox, collisions become likely with ~4 billion issues.
Suggestion: Use at least 128 bits (32 hex chars) to reduce collision probability to negligible levels for practical use cases.

src/pr_review_agent/review/fingerprint.py:61 - logic

Line bucketing logic has edge cases that could cause inconsistent fingerprints. When start_line=5 and end_line=15, midpoint is 10 (bucket 1), but line=14 would be bucket 1 while line=16 would be bucket 1 as well, missing the range semantics.
Suggestion: Consider using the start_line for bucketing when a range is provided, or implement range-aware bucketing that accounts for the span of the issue.

Minor (3)

src/pr_review_agent/review/fingerprint.py:23 - performance

STOP_WORDS frozenset lookup is efficient, but the word list could be more comprehensive for better normalization.
Suggestion: Consider using a more complete stop words list or a library like NLTK's stopwords for better description normalization across different domains.

src/pr_review_agent/review/fingerprint.py:50 - logic

Regex removes all punctuation but some punctuation might be significant for technical terms (e.g., 'OAuth2.0' becomes 'OAuth2 0').
Suggestion: Consider preserving dots and hyphens within words while still removing sentence punctuation, or use word boundary detection.

tests/test_fingerprint.py:127 - testing

Test only verifies 2-line shift produces same fingerprint, but doesn't test boundary conditions (e.g., line 49 vs 50 with bucket size 10).
Suggestion: Add test cases for bucket boundary conditions to ensure the bucketing logic works correctly at edges.

Concerns

64-bit fingerprints may not provide sufficient collision resistance for large-scale usage
Line bucketing logic could produce inconsistent results for issues spanning bucket boundaries
No mechanism to handle fingerprint format evolution if the algorithm needs to change

Questions

What's the expected scale of issues? Will you have millions of issues where collision risk becomes significant?
Have you considered how to handle fingerprint algorithm versioning if you need to change the algorithm later?
Should the fingerprint consider the suggestion field as well, or intentionally ignore it for matching purposes?

_{Model: claude-sonnet-4-20250514 | Cost: $0.0300 |

Tokens: 4856 in / 1031 out}

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

github-actions · 2026-01-24T19:43:05Z

AI Code Review

Summary

Well-structured fingerprinting implementation with comprehensive test coverage. The algorithm design is sound for regression tracking, but there are a few areas that need attention around edge cases and implementation details.

Confidence: 0.00 (low)
This PR may require human review

Strengths

Comprehensive test suite with 17 tests covering edge cases
Well-designed algorithm that balances stability with uniqueness
Good separation of concerns with dedicated fingerprint module
Proper integration with existing ReviewIssue dataclass
Smart line bucketing to handle minor code shifts
Description normalization handles wording variations well

Issues Found

Major (2)

src/pr_review_agent/review/fingerprint.py:61 - logic

The _bucket_line function has incorrect logic for handling None values and bucketing calculation
Suggestion: Fix the None handling and ensure proper bucketing calculation. The current logic will crash on None values and the bucketing should round to nearest bucket, not truncate.

src/pr_review_agent/review/fingerprint.py:94 - security

Using only 16 hex characters (64 bits) for fingerprints creates collision risk with scale
Suggestion: Consider using at least 128 bits (32 hex chars) to reduce collision probability. With thousands of issues, 64 bits may not provide sufficient collision resistance.

Minor (3)

src/pr_review_agent/review/fingerprint.py:47 - logic

Regex removes all non-word characters but may be too aggressive for technical terms
Suggestion: Consider preserving hyphens and underscores which are common in technical terms (e.g., 'cross-site', 'buffer_overflow'). The current regex would turn 'cross-site scripting' into 'cross site scripting' which might affect fingerprint stability.

src/pr_review_agent/review/fingerprint.py:23 - performance

STOP_WORDS as frozenset is good for performance, but the set is quite large
Suggestion: Consider if all these stop words are necessary for technical code review descriptions. A smaller, more targeted set might be more appropriate for code-related content.

tests/test_fingerprint.py:240 - testing

Missing test coverage for edge cases with all None values
Suggestion: Add a test case where all fields (file, line, category, severity, description) are None to ensure the fingerprinting doesn't crash and produces a valid fingerprint.

Suggestion (1)

src/pr_review_agent/review/fingerprint.py:1 - documentation

Missing documentation about fingerprint length and collision handling
Suggestion: Add documentation about the fingerprint format (16 hex chars), expected collision rate, and what happens if collisions occur in the system.

Concerns

64-bit fingerprints may not provide sufficient collision resistance at scale
Aggressive punctuation removal might affect technical term recognition
Missing validation for what happens when fingerprints collide in the database

Questions

Have you considered the collision probability with 64-bit fingerprints given your expected volume of issues?
How will the system handle fingerprint collisions when they occur in the database?
Is the 10-line bucket size appropriate for all types of files, or should it vary by file type/language?
Should the fingerprint generation be deterministic across different Python versions and platforms?

_{Model: claude-sonnet-4-20250514 | Cost: $0.0351 |

Tokens: 4856 in / 1368 out}

dtsong added evals Evaluation and testing infrastructure backlog Backlog items for future implementation labels Jan 24, 2026

chore: merge main into feat/35-issue-fingerprinting

732c89e

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

dtsong merged commit 119e45d into main Jan 24, 2026
1 check passed

dtsong deleted the feat/35-issue-fingerprinting branch January 24, 2026 19:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add issue fingerprinting for regression detection #45

feat: add issue fingerprinting for regression detection #45

Uh oh!

dtsong commented Jan 24, 2026

Uh oh!

github-actions bot commented Jan 24, 2026

Uh oh!

Uh oh!

github-actions bot commented Jan 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

feat: add issue fingerprinting for regression detection #45

feat: add issue fingerprinting for regression detection #45

Uh oh!

Conversation

dtsong commented Jan 24, 2026

Summary

Changes

Fingerprint Algorithm

Test Plan

Uh oh!

github-actions bot commented Jan 24, 2026

AI Code Review

Summary

Strengths

Issues Found

Major (2)

Minor (3)

Concerns

Questions

Uh oh!

Uh oh!

github-actions bot commented Jan 24, 2026

AI Code Review

Summary

Strengths

Issues Found

Major (2)

Minor (3)

Suggestion (1)

Concerns

Questions

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants