fix(patch): guard against None match in hunk header extraction#2330
Open
gvago wants to merge 3 commits intoThe-PR-Agent:mainfrom
Open
fix(patch): guard against None match in hunk header extraction#2330gvago wants to merge 3 commits intoThe-PR-Agent:mainfrom
gvago wants to merge 3 commits intoThe-PR-Agent:mainfrom
Conversation
In `decouple_and_convert_to_hunks_with_lines_numbers` and `extract_hunk_lines_from_patch`, the call to `extract_hunk_headers(match)` was outside proper `if match:` guards. If a line starts with `@@` but doesn't match `RE_HUNK_HEADER`, `match` is None, causing an `AttributeError` crash on `match.groups()`. Move the `extract_hunk_headers` call inside the match guard in both functions, and skip malformed hunk header lines gracefully.
… @@ lines The decouple_and_convert_to_hunks_with_lines_numbers() function overwrote `match` with the new RE_HUNK_HEADER result before checking whether the previous hunk needed to be finalized. When a malformed @@ line produced match=None, the flush condition `if match and (new/old_content_lines)` was False, silently dropping the previous hunk's content. Fix: save `prev_match` before overwriting and use it for the flush decision. Also use `prev_header_line` instead of `match`/`header_line` in the post-loop finalization, so a trailing malformed @@ cannot suppress the last valid hunk. Adds 7 unit tests covering malformed @@ scenarios: crash safety, content preservation, trailing malformed headers, line-number accuracy, all-malformed patches, and deletion-only hunks.
Orphan lines between a malformed @@ (match=None) and the next valid @@ were leaking into the next hunk. The buffer reset was inside the `if prev_match:` flush block, so when prev_match was None (set by the malformed @@), the buffers were never cleared. Move the buffer reset outside the conditional so it runs on every @@ encounter, and add a test that verifies orphan lines are discarded.
Contributor
Review Summary by QodoFix crash and data loss from malformed hunk header parsing
WalkthroughsDescription• Guard against None match results from malformed @@ hunk headers • Prevent orphan lines between malformed and valid hunks leaking • Preserve content from valid hunks before/after malformed headers • Add comprehensive test coverage for malformed hunk scenarios Diagramflowchart LR
A["Malformed @@ line<br/>match=None"] -->|Before| B["Crash on<br/>match.groups()"]
A -->|After| C["Guard check<br/>skip gracefully"]
D["Orphan lines<br/>in buffer"] -->|Before| E["Leak into<br/>next hunk"]
D -->|After| F["Clear buffers<br/>unconditionally"]
G["Trailing malformed @@<br/>overwrites match"] -->|Before| H["Last valid hunk<br/>not finalized"]
G -->|After| I["Use prev_header_line<br/>for finalization"]
File Changes1. pr_agent/algo/git_patch_processing.py
|
Contributor
Code Review by Qodo
|
Comment on lines
+399
to
406
| # finishing last hunk — use prev_header_line (not match/header_line) because | ||
| # match may have been set to None by a trailing malformed @@ line, and | ||
| # header_line may point to that malformed line instead of the last valid hunk | ||
| if prev_header_line and new_content_lines: | ||
| patch_with_lines_str += f'\n{prev_header_line}\n' | ||
| is_plus_lines = is_minus_lines = False | ||
| if new_content_lines: | ||
| is_plus_lines = any([line.startswith('+') for line in new_content_lines]) |
Contributor
There was a problem hiding this comment.
1. Eof orphan lines leaked 🐞 Bug ≡ Correctness
decouple_and_convert_to_hunks_with_lines_numbers can append lines that occur after a trailing malformed "@@" header to the previous valid hunk at EOF, duplicating that hunk header and mis-numbering the appended lines. This happens because malformed headers are skipped but subsequent "+"/"-" lines are still buffered, and the EOF flush is triggered by prev_header_line rather than by an “active valid hunk” flag.
Agent Prompt
## Issue description
`decouple_and_convert_to_hunks_with_lines_numbers()` can still emit orphan lines if a malformed `@@` header occurs near the end of the patch and is followed by content lines (e.g., `+...`, `-...`, or context). Those lines get buffered and then flushed at EOF under the last valid `prev_header_line`, duplicating the previous hunk header and producing incorrect line-numbered output.
## Issue Context
You already skip malformed `@@` lines via `continue`, and you reset `new_content_lines`/`old_content_lines` on every `@@`. However, after a malformed `@@`, the function currently continues buffering subsequent lines even though it is no longer in a valid hunk, and the EOF flush uses `prev_header_line` + `new_content_lines` as the condition.
## Fix Focus Areas
- pr_agent/algo/git_patch_processing.py[355-416]
- tests/unittest/test_malformed_hunk_header.py[61-76]
## Suggested fix approach
- Introduce an explicit state like `in_valid_hunk` (or reuse `match is not None` carefully) that is set to `True` only after a valid hunk header parse, and set to `False` when encountering a malformed `@@`.
- Only append `+`/`-`/context lines to buffers when `in_valid_hunk` is `True`.
- Alternatively (minimum change): when `match` is falsy in the `@@` branch, also clear `prev_header_line` (and optionally reset `start1/start2/...`) so the EOF finalization cannot attach any later buffered lines to a previous hunk.
- Add a regression test: valid hunk -> malformed `@@` -> orphan lines -> EOF, asserting those orphan lines are not present and the previous hunk header isn’t duplicated.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
extract_hunk_headers(match)calls againstNonematch results in two functions withinpr_agent/algo/git_patch_processing.pydecouple_and_convert_to_hunks_with_lines_numbers(line ~379): movedextract_hunk_headersinside the existingif match:block and addedcontinuefor non-matching@@linesextract_hunk_lines_from_patch(line ~434): added an explicitif not match:guard that setsskip_hunk = Trueand continues, preventing the crash@@line (valid or malformed) to prevent orphan lines between a malformed@@and the next valid@@from leaking into the next hunktest_orphan_lines_after_malformed_not_joined_to_next_hunkto verify orphan lines are discardedBug
If a line starts with
@@but doesn't fully match theRE_HUNK_HEADERregex pattern,re.match()returnsNone. The code then callsextract_hunk_headers(None), which invokesNone.groups()and raisesAttributeError: 'NoneType' object has no attribute 'groups'.Additionally, orphan lines between a malformed
@@(wherematch=None) and the next valid@@were leaking into the next hunk because the buffer reset was inside theif prev_match:flush block — whenprev_matchwasNone, buffers were never cleared.Test plan
test_malformed_hunk_header.pypass@@are not joined to the next hunkReplaces #2322 (lost push access to org branch).