fix(patch): guard against None match in hunk header extraction#2330

Open
gvago wants to merge 3 commits into The-PR-Agent:main from gvago:fix/hunk-header-parse-crash

gvago commented Apr 16, 2026

Summary

  • Guard extract_hunk_headers(match) calls against None match results in two functions within pr_agent/algo/git_patch_processing.py
  • In decouple_and_convert_to_hunks_with_lines_numbers (line ~379): moved extract_hunk_headers inside the existing if match: block and added continue for non-matching @@ lines
  • In extract_hunk_lines_from_patch (line ~434): added an explicit if not match: guard that sets skip_hunk = True and continues, preventing the crash
  • NEW: Clear line buffers unconditionally on every @@ line (valid or malformed) to prevent orphan lines between a malformed @@ and the next valid @@ from leaking into the next hunk
  • Added test test_orphan_lines_after_malformed_not_joined_to_next_hunk to verify orphan lines are discarded

Bug

If a line starts with @@ but doesn't fully match the RE_HUNK_HEADER regex pattern, re.match() returns None. The code then calls extract_hunk_headers(None), which invokes None.groups() and raises AttributeError: 'NoneType' object has no attribute 'groups'.
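A minimal sketch of the failure and the guard, assuming a header pattern of the usual unified-diff shape. `HUNK_HEADER` and `parse_hunk_header` are hypothetical stand-ins for illustration, not the project's `RE_HUNK_HEADER` or `extract_hunk_headers`, which may differ in pattern and return shape:

```python
import re

# Assumed pattern for illustration; the project's RE_HUNK_HEADER may differ.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@[ ]?(.*)")


def parse_hunk_header(line):
    """Return (start1, size1, start2, size2, section), or None if malformed."""
    match = HUNK_HEADER.match(line)
    if match is None:
        # Without this guard, match.groups() raises
        # AttributeError: 'NoneType' object has no attribute 'groups'.
        return None
    start1, size1, start2, size2, section = match.groups()
    # In unified diff format an omitted size means a one-line range.
    return (int(start1), int(size1 or 1), int(start2), int(size2 or 1), section)


if __name__ == "__main__":
    print(parse_hunk_header("@@ -1,3 +1,4 @@ def foo():"))
    print(parse_hunk_header("@@ not a real header"))
```

Returning `None` here pushes the decision to the caller, which mirrors the `if match:` / `if not match:` guards described in the summary.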

Additionally, orphan lines between a malformed @@ (where match=None) and the next valid @@ were leaking into the next hunk because the buffer reset was inside the if prev_match: flush block — when prev_match was None, buffers were never cleared.
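The effect of the reset placement can be reproduced in a simplified model. This is only a sketch: `split_hunks`, its `reset_always` flag, and `HUNK_HEADER` are hypothetical names, and the real function also tracks line numbers and separate old/new buffers.

```python
import re

# Assumed pattern for illustration; the project's RE_HUNK_HEADER may differ.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def split_hunks(patch, reset_always=True):
    """Group patch lines under the valid hunk header that precedes them."""
    hunks = []
    match = None    # regex result for the most recent @@ line
    header = None   # text of the most recent valid header
    buffer = []
    for line in patch.splitlines():
        if line.startswith("@@"):
            prev_match = match  # remember the state of the hunk being closed
            match = HUNK_HEADER.match(line)
            if prev_match and buffer:
                hunks.append((header, buffer))
                if not reset_always:
                    buffer = []  # old placement: reset only after a flush
            if reset_always:
                buffer = []      # fixed placement: reset on every @@ line
            if match:
                header = line
            continue
        buffer.append(line)
    if match and buffer:
        hunks.append((header, buffer))
    return hunks


if __name__ == "__main__":
    patch = "@@ -1,2 +1,2 @@\n-a\n+b\n@@ broken\n+orphan\n@@ -10,1 +10,1 @@\n+c"
    for header, lines in split_hunks(patch):
        print(header, lines)
```

With `reset_always=False` the same input places `+orphan` at the start of the second hunk, which is the leak described above.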

Test plan

  • All 8 tests in test_malformed_hunk_header.py pass
  • New test verifies orphan lines after malformed @@ are not joined to the next hunk
  • Verify normal patch processing still works correctly with valid hunk headers

Replaces #2322 (lost push access to org branch).

gvago added 3 commits April 14, 2026 18:37
In `decouple_and_convert_to_hunks_with_lines_numbers` and
`extract_hunk_lines_from_patch`, the call to `extract_hunk_headers(match)`
was outside proper `if match:` guards. If a line starts with `@@` but
doesn't match `RE_HUNK_HEADER`, `match` is None, causing an
`AttributeError` crash on `match.groups()`.

Move the `extract_hunk_headers` call inside the match guard in both
functions, and skip malformed hunk header lines gracefully.
… @@ lines

The decouple_and_convert_to_hunks_with_lines_numbers() function overwrote
`match` with the new RE_HUNK_HEADER result before checking whether the
previous hunk needed to be finalized. When a malformed @@ line produced
match=None, the flush condition `if match and (new/old_content_lines)` was
False, silently dropping the previous hunk's content.

Fix: save `prev_match` before overwriting and use it for the flush decision.
Also use `prev_header_line` instead of `match`/`header_line` in the
post-loop finalization, so a trailing malformed @@ cannot suppress the
last valid hunk.

Adds 7 unit tests covering malformed @@ scenarios: crash safety,
content preservation, trailing malformed headers, line-number accuracy,
all-malformed patches, and deletion-only hunks.

Orphan lines between a malformed @@ (match=None) and the next valid @@
were leaking into the next hunk. The buffer reset was inside the
`if prev_match:` flush block, so when prev_match was None (set by
the malformed @@), the buffers were never cleared.

Move the buffer reset outside the conditional so it runs on every @@
encounter, and add a test that verifies orphan lines are discarded.
@qodo-free-for-open-source-projects
Contributor

Review Summary by Qodo

Fix crash and data loss from malformed hunk header parsing

🐞 Bug fix

Walkthroughs

Description
• Guard against None match results from malformed @@ hunk headers
• Prevent orphan lines between malformed and valid hunks leaking
• Preserve content from valid hunks before/after malformed headers
• Add comprehensive test coverage for malformed hunk scenarios
Diagram
flowchart LR
  A["Malformed @@ line<br/>match=None"] -->|Before| B["Crash on<br/>match.groups()"]
  A -->|After| C["Guard check<br/>skip gracefully"]
  D["Orphan lines<br/>in buffer"] -->|Before| E["Leak into<br/>next hunk"]
  D -->|After| F["Clear buffers<br/>unconditionally"]
  G["Trailing malformed @@<br/>overwrites match"] -->|Before| H["Last valid hunk<br/>not finalized"]
  G -->|After| I["Use prev_header_line<br/>for finalization"]

File Changes

1. pr_agent/algo/git_patch_processing.py 🐞 Bug fix +18/-8

Guard against None match in hunk header extraction

• Save prev_match before overwriting to properly flush previous hunk before processing new @@ line
• Move buffer reset outside conditional to clear on every @@ line, preventing orphan lines from leaking
• Guard extract_hunk_headers() call with if match: check and skip malformed headers with continue
• Use prev_header_line in final hunk finalization to handle trailing malformed @@ lines correctly
• Add explicit if not match: guard in extract_hunk_lines_from_patch() to prevent crash

pr_agent/algo/git_patch_processing.py


2. tests/unittest/test_malformed_hunk_header.py 🧪 Tests +151/-0

Comprehensive tests for malformed hunk header handling

• Add 8 unit tests covering malformed @@ hunk header scenarios
• Test crash safety, content preservation, and line number accuracy
• Verify orphan lines between malformed and valid hunks are discarded
• Cover edge cases: trailing malformed headers, deletion-only hunks, all-malformed patches

tests/unittest/test_malformed_hunk_header.py


qodo-free-for-open-source-projects bot commented Apr 16, 2026

Code Review by Qodo

🐞 Bugs (1)   📘 Rule violations (0)   📎 Requirement gaps (0)
🐞 ≡ Correctness (1)

Action required

1. EOF orphan lines leaked 🐞
Description
decouple_and_convert_to_hunks_with_lines_numbers can append lines that occur after a trailing
malformed "@@" header to the previous valid hunk at EOF, duplicating that hunk header and
mis-numbering the appended lines. This happens because malformed headers are skipped but subsequent
"+"/"-" lines are still buffered, and the EOF flush is triggered by prev_header_line rather than by
an “active valid hunk” flag.
Code

pr_agent/algo/git_patch_processing.py[R399-406]

+    # finishing last hunk — use prev_header_line (not match/header_line) because
+    # match may have been set to None by a trailing malformed @@ line, and
+    # header_line may point to that malformed line instead of the last valid hunk
+    if prev_header_line and new_content_lines:
+        patch_with_lines_str += f'\n{prev_header_line}\n'
        is_plus_lines = is_minus_lines = False
        if new_content_lines:
            is_plus_lines = any([line.startswith('+') for line in new_content_lines])
Evidence
On a malformed hunk header, the code clears the line buffers but then continues without clearing
prev_header_line, so the function is no longer “in a valid hunk” but still remembers the last
valid header. Subsequent patch lines (e.g., "+orphan") are appended to new_content_lines
regardless of whether a valid hunk header was parsed. At EOF, the function finalizes based on
`if prev_header_line and new_content_lines:`, so those orphan lines get emitted under the previous
valid header with a stale start2, duplicating the header and corrupting the output.

pr_agent/algo/git_patch_processing.py[355-384]
pr_agent/algo/git_patch_processing.py[386-389]
pr_agent/algo/git_patch_processing.py[399-416]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`decouple_and_convert_to_hunks_with_lines_numbers()` can still emit orphan lines if a malformed `@@` header occurs near the end of the patch and is followed by content lines (e.g., `+...`, `-...`, or context). Those lines get buffered and then flushed at EOF under the last valid `prev_header_line`, duplicating the previous hunk header and producing incorrect line-numbered output.

## Issue Context
You already skip malformed `@@` lines via `continue`, and you reset `new_content_lines`/`old_content_lines` on every `@@`. However, after a malformed `@@`, the function currently continues buffering subsequent lines even though it is no longer in a valid hunk, and the EOF flush uses `prev_header_line` + `new_content_lines` as the condition.

## Fix Focus Areas
- pr_agent/algo/git_patch_processing.py[355-416]
- tests/unittest/test_malformed_hunk_header.py[61-76]

## Suggested fix approach
- Introduce an explicit state like `in_valid_hunk` (or reuse `match is not None` carefully) that is set to `True` only after a valid hunk header parse, and set to `False` when encountering a malformed `@@`.
- Only append `+`/`-`/context lines to buffers when `in_valid_hunk` is `True`.
- Alternatively (minimum change): when `match` is falsy in the `@@` branch, also clear `prev_header_line` (and optionally reset `start1/start2/...`) so the EOF finalization cannot attach any later buffered lines to a previous hunk.
- Add a regression test: valid hunk -> malformed `@@` -> orphan lines -> EOF, asserting those orphan lines are not present and the previous hunk header isn’t duplicated.
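The first suggestion, together with the regression scenario in the last bullet, can be sketched roughly as follows. This is a simplified model rather than the code in `git_patch_processing.py`: `collect_valid_hunks` and `HUNK_HEADER` are hypothetical names, and line numbering is omitted.

```python
import re

# Assumed pattern for illustration; the project's RE_HUNK_HEADER may differ.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def collect_valid_hunks(patch):
    """Collect (header, lines) pairs, ignoring lines outside a valid hunk."""
    hunks = []
    header = None
    buffer = []
    in_valid_hunk = False
    for line in patch.splitlines():
        if line.startswith("@@"):
            # Close the hunk that was open, if it was valid and non-empty.
            if in_valid_hunk and buffer:
                hunks.append((header, buffer))
            buffer = []
            # Only a header that parses opens a new valid hunk.
            in_valid_hunk = HUNK_HEADER.match(line) is not None
            header = line if in_valid_hunk else None
            continue
        if in_valid_hunk:
            buffer.append(line)  # orphan lines after a malformed @@ are skipped
    # EOF flush is gated on the flag, not on a remembered earlier header.
    if in_valid_hunk and buffer:
        hunks.append((header, buffer))
    return hunks


if __name__ == "__main__":
    # valid hunk -> malformed @@ -> orphan lines -> EOF
    patch = "@@ -1,1 +1,1 @@\n-a\n+b\n@@ broken\n+orphan\n context"
    hunks = collect_valid_hunks(patch)
    assert [h for h, _ in hunks].count("@@ -1,1 +1,1 @@") == 1
    assert all("+orphan" not in lines for _, lines in hunks)
```

Gating both the buffering and the EOF flush on one flag keeps the "are we inside a valid hunk" decision in a single place, which is the property the review asks for.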
