Skip to content

fix(diff): check modified count in identical-files guard#792

Open
gghez wants to merge 1 commit intortk-ai:developfrom
gghez:fix/diff-false-identical
Open

fix(diff): check modified count in identical-files guard#792
gghez wants to merge 1 commit intortk-ai:developfrom
gghez:fix/diff-false-identical

Conversation

@gghez
Copy link

@gghez gghez commented Mar 24, 2026

Summary

rtk diff incorrectly reports "Files are identical" when lines differ by only a few characters. Two root causes:

  1. The identical-files guard at line 24 only checked added == 0 && removed == 0, ignoring the modified counter. Lines classified as modifications (similarity > 0.5) incremented modified but not added/removed, so the guard passed and the diff silently swallowed actual differences.

  2. The similarity() function used Jaccard similarity on unique character sets, which over-estimated similarity for lines sharing common characters regardless of position. Replaced it with a positional prefix+suffix ratio that better reflects actual line differences.

Changes

  • Added && diff.modified == 0 to the identical-files guard in run()
  • Replaced similarity() implementation: Jaccard set-based -> positional prefix+suffix metric
  • Added 4 regression tests covering all reproduction cases from the issue
  • Updated existing test_similarity_partial_overlap for the new metric

Test plan

Fixes #781

The "Files are identical" check only tested added and removed counters,
ignoring modified lines. When two lines shared enough characters to be
classified as a modification (similarity > 0.5), they incremented
modified but not added/removed, so the guard passed and the diff
reported no changes.

Also replace the Jaccard set-based similarity function with a
positional prefix+suffix metric that better reflects actual line
differences — the old approach used unique character sets, which
over-estimated similarity for lines sharing common characters in
different positions.

Fixes rtk-ai#781

Signed-off-by: gghez <gghez@users.noreply.github.com>
@CLAassistant
Copy link

CLAassistant commented Mar 24, 2026

CLA assistant check
All committers have signed the CLA.

@pszymkowiak pszymkowiak added bug Something isn't working effort-small Quelques heures, 1 fichier labels Mar 24, 2026
@pszymkowiak
Copy link
Collaborator

[w] wshm · Automated triage by AI

📊 Automated PR Analysis

🐛 Type bug-fix
🟡 Risk medium

Summary

Fixes a bug where rtk diff falsely reported files as identical when lines differed by only a few characters. The identical-files guard now checks the modified counter in addition to added and removed, and the similarity function was replaced with a positional prefix+suffix metric to avoid over-estimating similarity.

Review Checklist

  • Tests present
  • Breaking change
  • Docs updated

Linked issues: #781


Analyzed automatically by wshm · This is an automated analysis, not a human review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working effort-small Quelques heures, 1 fichier

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants