forked from sergi/go-diff
feat: update repo and dependencies (#1) #6
Closed
upgrade gopkg.in/yaml.v2
Use a common cache of line contents between the two texts in `DiffLinesToChars` to get line diffs correctly.

In some cases, line diffs cannot be retrieved correctly in the standard way (https://github.com/google/diff-match-patch/wiki/Line-or-Word-Diffs#line-mode). In the case below, we failed to get line diffs correctly before this fix.

```go:main.go
package main

import (
	"fmt"

	"github.com/sergi/go-diff/diffmatchpatch"
)

const (
	text1 = `hoge:
step11:
- arrayitem1
- arrayitem2
step12:
step21: hoge
step22: -93
fuga: flatitem
`
	text2 = `hoge:
step11:
- arrayitem4
- arrayitem2
- arrayitem3
step12:
step21: hoge
step22: -92
fuga: flatitem
`
)

func main() {
	dmp := diffmatchpatch.New()
	a, b, c := dmp.DiffLinesToChars(text1, text2)
	diffs := dmp.DiffMain(a, b, false)
	diffs = dmp.DiffCharsToLines(diffs, c)
	// diffs = dmp.DiffCleanupSemantic(diffs)
	fmt.Println(diffs)
}
```

```text:output
[{Insert hoge:
step11:
hoge:
} {Equal hoge:
} {Insert hoge:
} {Equal step11:
} {Insert hoge:
} {Equal - arrayitem1
} {Insert hoge:
} {Equal - arrayitem2
} {Insert hoge:
} {Equal step12:
} {Insert hoge:
} {Equal step21: hoge
} {Insert hoge:
} {Equal step22: -93
} {Delete fuga: flatitem
}]
```

Note: This fix corresponds to the JavaScript implementation. (ref: https://github.com/google/diff-match-patch/blob/62f2e689f498f9c92dbc588c58750addec9b1654/javascript/diff_match_patch_uncompressed.js#L466)
Use a common lineHash to share indices between text1 and text2 for correct line diffs
The current implementation produces a wrong result because it calls `DiffMain` on the following two arguments:

* `1,2,3,4,5,6,7,8,9,10`
* `1,2,3,4,5,6,7,8,9,11`

These numbers represent indices into the lines array. The algorithm finds that the equal part of those strings is `1,2,3,4,5,6,7,8,9,1`, which is followed by `Delete 0` and `Insert 1`.
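To make the failure mode concrete, here is a minimal sketch that reproduces only the shared-prefix effect, without calling the library. The helpers `joinIndices` and `commonPrefix` are hypothetical names introduced for illustration; they are not part of go-diff.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// joinIndices renders line indices as a comma-separated string,
// mimicking the encoding described above (hypothetical helper).
func joinIndices(indices []int) string {
	parts := make([]string, len(indices))
	for i, n := range indices {
		parts[i] = strconv.Itoa(n)
	}
	return strings.Join(parts, ",")
}

// commonPrefix returns the longest leading substring shared by a and b,
// which is what a character-level diff treats as "equal".
func commonPrefix(a, b string) string {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return a[:n]
}

func main() {
	a := joinIndices([]int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10})
	b := joinIndices([]int{1, 2, 3, 4, 5, 6, 7, 8, 9, 11})
	// The shared prefix ends in the middle of the last index,
	// so the remaining "0" and "1" no longer name real lines.
	fmt.Println(commonPrefix(a, b))
}
```

The prefix cuts through the final index, which is why a character-level diff over this encoding reports a deletion of `0` and an insertion of `1` rather than a change from line 10 to line 11.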
[The suggested approach](https://github.com/google/diff-match-patch/wiki/Line-or-Word-Diffs#line-mode) for line-level diffing is the following set of steps:

1. `ti1, ti2, linesIdx = DiffLinesToChars(t1, t2)`
2. `diffs = DiffMain(ti1, ti2)`
3. `DiffCharsToLines(diffs, linesIdx)`

The original implementation in `google/diff-match-patch` stores the indices in `ti1` and `ti2` as Unicode code points joined by an empty string. The current implementation in this repo stores them as integers joined by a comma. While this makes `ti1` and `ti2` more readable, it introduces bugs when relying on it for line-level diffing with `DiffMain`.

The root cause of the issue is that an integer line index might span more than one character/rune, so `DiffMain` can treat two different lines whose indices share a prefix as a partial match. For example, indices 123 and 129 will have the partial match `12`. In that example, the diff will show lines 3 and 9, which is not correct. A simple failing test case demonstrating this issue is available at `TestDiffPartialLineIndex`.

In this PR I am adjusting the algorithm to use the same approach as in [diff-match-patch](https://github.com/google/diff-match-patch/blob/62f2e689f498f9c92dbc588c58750addec9b1654/javascript/diff_match_patch_uncompressed.js#L508-L510) by storing each line index as a rune. While a rune in Go is a type alias for `int32`, not every value of that type is a valid rune. During string to rune slice conversion, invalid runes are replaced with `utf8.RuneError`.

The integer to rune generation logic is based on the table in https://en.wikipedia.org/wiki/UTF-8#Encoding

The first 127 lines will work the fastest as they are represented as a single byte. Higher numbers are represented as 2-4 bytes. In addition to that, the range `U+D800 - U+DFFF` contains [invalid codepoints](https://en.wikipedia.org/wiki/UTF-8#Invalid_sequences_and_error_handling), and all codepoints higher than or equal to `0xD800` are incremented by `0xDFFF - 0xD800`.

The maximum representable integer using this approach is 1'112'060. This improves on the JavaScript implementation, which currently [bails out](https://github.com/google/diff-match-patch/blob/62f2e689f498f9c92dbc588c58750addec9b1654/javascript/diff_match_patch_uncompressed.js#L503-L505) when files have more than 65535 lines.
Fix line diff by using rune index without a separator
…red_text Fix colored display for diffs with newlines
Release new forked version with no security issue