@hawkingrei hawkingrei commented Dec 4, 2025

When a result set changes significantly, it is hard to spot the differences in the logs. I've changed the log format so that it prints a diff between the actual and expected results.

Before:

[image]

After the improvement:

[image]

Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>

@YangKeao YangKeao left a comment

LGTM


func (e *WrongResultError) Error() string {
	diff := diffmatchpatch.New()
	diffText := diff.DiffPrettyText(diff.DiffMain(e.actual, e.expected, false))

What is the effect of checklines = false?

Member Author

Actually, I gave it a try, but I couldn't see any difference.

Copilot:

In sergi/go-diff, the checklines parameter controls whether the diff algorithm first runs a fast line-level diff and then refines the changed regions with a more precise character-level diff.

Specifically:

  • DiffMain(text1, text2, checklines bool) passes checklines down to diffCompute:
    func (dmp *DiffMatchPatch) diffCompute(text1, text2 []rune, checklines bool, deadline time.Time) []Diff {
        …
        } else if checklines && len(text1) > 100 && len(text2) > 100 {
            return dmp.diffLineMode(text1, text2, deadline)
        }
        return dmp.diffBisect(text1, text2, deadline)
    }
  • When checklines == true and both texts are longer than 100 runes, it uses diffLineMode, which:
    • Does a quick diff at line granularity (treating each line as a token), then
    • “Rediffs” the changed line blocks at character level for better accuracy.
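
The "each line as a token" step above can be pictured with a small standard-library sketch. This is an illustration only, not the library's code: encodeLines is a hypothetical helper that maps every distinct line to one rune, so that any character-level diff run over the encoded strings is effectively comparing whole lines. sergi/go-diff has its own version of this encoding (see DiffLinesToChars), which differs in detail.

```go
package main

import (
	"fmt"
	"strings"
)

// encodeLines is a hypothetical helper (not part of sergi/go-diff's API).
// It replaces every distinct line in a and b with a single rune, so a
// character-level diff over the encoded strings compares whole lines.
// The sketch does not skip the invalid surrogate range, so it only suits
// inputs with a modest number of distinct lines.
func encodeLines(a, b string) (encA, encB string, table []string) {
	index := map[string]rune{}
	encode := func(text string) string {
		var sb strings.Builder
		for _, line := range strings.SplitAfter(text, "\n") {
			if line == "" {
				continue // SplitAfter yields a trailing "" when text ends in "\n"
			}
			r, ok := index[line]
			if !ok {
				r = rune(len(table)) // the rune value doubles as the index into table
				index[line] = r
				table = append(table, line)
			}
			sb.WriteRune(r)
		}
		return sb.String()
	}
	encA = encode(a)
	encB = encode(b)
	return encA, encB, table
}

func main() {
	actual := "a\nb\nc\n"
	expected := "a\nx\nc\n"
	encA, encB, table := encodeLines(actual, expected)
	// Shared lines get the same rune in both strings; changed lines do not.
	fmt.Printf("%q %q (%d distinct lines)\n", encA, encB, len(table))
}
```

After diffing the encoded strings, each rune is mapped back to table[rune] to recover the line-level result; only the blocks that differ then need the character-level "rediff".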

Effects in practice

  • checklines = false

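    • Never enters line mode. Inputs that none of the earlier shortcuts in diffCompute (elided as … above) can handle fall through to diffBisect (a Myers-style character-level diff).
    • Slower on large inputs, but tends to give minimal/optimal character-level diffs.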
  • checklines = true

    • For long texts: runs the faster line-mode pre-pass, then refines.
    • This usually gives a significant speedup on big inputs, at the cost of possibly non-minimal diffs (more or differently grouped hunks than the theoretical minimum).

The tests (TestDiffMainWithCheckLines) confirm that:

  • For most cases, results with and without checklines match exactly.
  • It’s explicitly documented that the speedup “can produce non-minimal diffs” (comment on diffLineMode), and there’s a TODO about a failing test case, highlighting that behavior can differ.


@bb7133 bb7133 left a comment

LGTM

@bb7133 bb7133 merged commit 12f3756 into pingcap:master Dec 5, 2025
2 checks passed

6 participants