Skip to content

fix(patch): guard against None match in hunk header extraction#2322

Closed
gvago wants to merge 2 commits intomainfrom
fix/hunk-header-parse-crash
Closed

fix(patch): guard against None match in hunk header extraction#2322
gvago wants to merge 2 commits intomainfrom
fix/hunk-header-parse-crash

Conversation

@gvago
Copy link
Copy Markdown

@gvago gvago commented Apr 14, 2026

Summary

  • Guard extract_hunk_headers(match) calls against None match results in two functions within pr_agent/algo/git_patch_processing.py
  • In decouple_and_convert_to_hunks_with_lines_numbers (line ~379): moved extract_hunk_headers inside the existing if match: block and added continue for non-matching @@ lines
  • In extract_hunk_lines_from_patch (line ~434): added an explicit if not match: guard that sets skip_hunk = True and continues, preventing the crash

Bug

If a line starts with @@ but doesn't fully match the RE_HUNK_HEADER regex pattern, re.match() returns None. The code then calls extract_hunk_headers(None), which invokes None.groups() and raises AttributeError: 'NoneType' object has no attribute 'groups'.

This can occur with malformed or unusual diff output (e.g. conflict markers, truncated patches, or non-standard @@ lines).

Test plan

  • Verify existing tests pass (pytest tests/ -v)
  • Manually test with a patch containing a malformed @@ line (e.g. @@ invalid header @@) to confirm it no longer crashes
  • Verify normal patch processing still works correctly with valid hunk headers

🤖 Generated with Claude Code

In `decouple_and_convert_to_hunks_with_lines_numbers` and
`extract_hunk_lines_from_patch`, the call to `extract_hunk_headers(match)`
was outside proper `if match:` guards. If a line starts with `@@` but
doesn't match `RE_HUNK_HEADER`, `match` is None, causing an
`AttributeError` crash on `match.groups()`.

Move the `extract_hunk_headers` call inside the match guard in both
functions, and skip malformed hunk header lines gracefully.
@qodo-free-for-open-source-projects
Copy link
Copy Markdown
Contributor

Review Summary by Qodo

Guard against None match in hunk header extraction

🐞 Bug fix

Grey Divider

Walkthroughs

Description
• Guard extract_hunk_headers() calls against None match results
• Move extraction logic inside match validation blocks
• Skip malformed hunk header lines gracefully instead of crashing
• Prevent AttributeError when @@ lines don't match regex pattern
Diagram
flowchart LR
  A["Line starts with @@"] --> B{"RE_HUNK_HEADER.match()"}
  B -->|Valid match| C["Extract hunk headers"]
  B -->|None/Invalid| D["Skip line gracefully"]
  C --> E["Process hunk content"]
  D --> E
Loading

Grey Divider

File Changes

1. pr_agent/algo/git_patch_processing.py 🐞 Bug fix +6/-2

Add None match guards in hunk header extraction

• Moved extract_hunk_headers(match) call inside if match: block in
 decouple_and_convert_to_hunks_with_lines_numbers() function
• Added else: continue clause to skip malformed hunk header lines
• Added explicit if not match: guard in extract_hunk_lines_from_patch() function that sets
 skip_hunk = True and continues
• Prevents AttributeError crash when @@ lines don't fully match the RE_HUNK_HEADER regex
 pattern

pr_agent/algo/git_patch_processing.py


Grey Divider

Qodo Logo

@qodo-free-for-open-source-projects
Copy link
Copy Markdown
Contributor

qodo-free-for-open-source-projects bot commented Apr 14, 2026

Code Review by Qodo

🐞 Bugs (2)   📘 Rule violations (1)   📎 Requirement gaps (0)
🐞\ ≡ Correctness (2) ⭐ New (2)
📘\ ☼ Reliability (1)

Grey Divider


Action required

1. Orphan lines join next hunk 🐞
Description
In decouple_and_convert_to_hunks_with_lines_numbers(), a malformed @@ line sets match=None and
continues without clearing new_content_lines/old_content_lines, so any subsequent patch lines
are buffered and then incorrectly emitted under the next valid hunk header (wrong hunk association
and wrong line numbers). This can corrupt the diff context passed downstream to PR processing and
suggestions generation.
Code

pr_agent/algo/git_patch_processing.py[R355-362]

        if line.startswith('@@'):
            header_line = line
+            prev_match = match  # save previous match before overwriting
            match = RE_HUNK_HEADER.match(line)
-            if match and (new_content_lines or old_content_lines):  # found a new hunk, split the previous lines
+            if prev_match and (new_content_lines or old_content_lines):  # flush the previous hunk
                if prev_header_line:
                    patch_with_lines_str += f'\n{prev_header_line}\n'
                is_plus_lines = is_minus_lines = False
Evidence
When a malformed header is encountered, the code takes the else: continue path, leaving
match=None and not resetting the content buffers; meanwhile, non-header lines always append into
new_content_lines/old_content_lines. When the next valid header arrives, prev_match is None
(because the malformed header overwrote match), so the flush block doesn’t run and the orphan
buffered lines remain and will be emitted as part of the next hunk with that hunk’s start2 line
numbering. This output is used to build the patch context given to PR processing.

pr_agent/algo/git_patch_processing.py[344-395]
pr_agent/algo/pr_processing.py[183-196]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`decouple_and_convert_to_hunks_with_lines_numbers()` can mis-attribute diff lines that appear after a malformed `@@` header to the next valid hunk because it `continue`s without clearing `new_content_lines` / `old_content_lines`.

### Issue Context
A malformed `@@` header sets `match=None`. Any subsequent `+`/`-`/context lines are still appended to the buffers, and the next valid header won’t flush them because `prev_match` is `None`.

### Fix Focus Areas
- pr_agent/algo/git_patch_processing.py[355-395]

### Implementation notes
- When encountering any `@@` line, after flushing the previous valid hunk (if any), reset `new_content_lines` and `old_content_lines` regardless of whether the previous header matched.
- Additionally, when `match` is `None` (malformed header), ensure buffers are cleared (and optionally reset `start1/start2`) so subsequent non-header lines are treated as orphan content and not attached to the next valid hunk.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. Malformed hunk handling untested 📘
Description
This PR changes patch parsing behavior to skip malformed @@ hunk headers, but no corresponding
pytest coverage is added/updated in this change set. Without tests, regressions could reintroduce
crashes or silently drop hunks when encountering unusual diff output.
Code

pr_agent/algo/git_patch_processing.py[R379-380]

+            else:
+                continue  # skip lines that start with @@ but don't match the hunk header pattern
Evidence
PR Compliance ID 14 requires adding/updating pytest coverage when behavior changes. The diff shows
new guards that alter control flow for malformed hunk headers (skipping/marking hunks as skipped),
but the PR diff contains no test additions exercising these new branches.

AGENTS.md
pr_agent/algo/git_patch_processing.py[376-380]
pr_agent/algo/git_patch_processing.py[433-436]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The new guard logic for malformed `@@` hunk headers changes runtime behavior (skip/ignore malformed hunks) but is not covered by tests.
## Issue Context
Previously, malformed `@@` lines could cause `extract_hunk_headers(None)` crashes. The fix now skips those headers; add unit tests to ensure (1) no exception is raised and (2) behavior is stable for both valid and malformed headers.
## Fix Focus Areas
- pr_agent/algo/git_patch_processing.py[376-380]
- pr_agent/algo/git_patch_processing.py[433-436]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Last hunk can drop🐞
Description
decouple_and_convert_to_hunks_with_lines_numbers() assigns match = RE_HUNK_HEADER.match(line)
before the new else: continue, so a malformed trailing @@ line leaves match=None while
new_content_lines/old_content_lines still contain the last valid hunk. The final flush condition
if match and new_content_lines: then fails, causing the function to return without emitting the
buffered hunk (or to merge lines between malformed and next valid header into the prior hunk).
Code

pr_agent/algo/git_patch_processing.py[R375-380]

              old_content_lines = []
          if match:
              prev_header_line = header_line
-
-            section_header, size1, size2, start1, start2 = extract_hunk_headers(match)
+                section_header, size1, size2, start1, start2 = extract_hunk_headers(match)
+            else:
+                continue  # skip lines that start with @@ but don't match the hunk header pattern
Evidence
Within the loop, match is overwritten on every @@ line (match = RE_HUNK_HEADER.match(line)),
and on a non-matching header the new else: continue skips flushing/resetting buffers. After the
loop, the function only emits the final buffered hunk when match is truthy (`if match and
new_content_lines:), so a trailing malformed @@` header (or a malformed header after the last
flush) can prevent any final emission despite non-empty buffers.

pr_agent/algo/git_patch_processing.py[351-381]
pr_agent/algo/git_patch_processing.py[395-411]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`decouple_and_convert_to_hunks_with_lines_numbers()` can drop the final buffered hunk when a malformed `@@` line occurs after a valid hunk. This happens because `match`/`header_line` are assigned before validating the hunk header, and the end-of-function flush is gated by `if match and new_content_lines:`.
### Issue Context
The PR adds `else: continue` for non-matching `@@` lines, which prevents the crash, but the function’s end-of-loop flush uses the *last assigned* `match` value. A malformed trailing `@@` sets `match=None`, leaving buffered lines unflushed.
### Fix Focus Areas
- pr_agent/algo/git_patch_processing.py[351-412]
### Suggested fix
- Don’t clobber the outer `match` (and `header_line`) unless the hunk header regex matched.
- e.g., use a local `header_match = RE_HUNK_HEADER.match(line)` and `if not header_match: continue` *before* assigning `match = header_match` and `header_line = line`.
- Ensure the “finishing last hunk” logic flushes based on the last **valid** header seen (or on `prev_header_line`/`start2 != -1`) rather than on a potentially overwritten `match` from a malformed header line.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

4. Trailing deletion hunk skipped 🐞
Description
The last-hunk finalization in decouple_and_convert_to_hunks_with_lines_numbers() only runs when
new_content_lines is non-empty, so a patch ending with a deletion-only hunk (only - lines) will
be dropped from the output. This loses diff context for deletions at EOF.
Code

pr_agent/algo/git_patch_processing.py[R396-403]

+    # finishing last hunk — use prev_header_line (not match/header_line) because
+    # match may have been set to None by a trailing malformed @@ line, and
+    # header_line may point to that malformed line instead of the last valid hunk
+    if prev_header_line and new_content_lines:
+        patch_with_lines_str += f'\n{prev_header_line}\n'
        is_plus_lines = is_minus_lines = False
        if new_content_lines:
            is_plus_lines = any([line.startswith('+') for line in new_content_lines])
Evidence
Deletions are collected into old_content_lines, but last-hunk finalization is gated by `if
prev_header_line and new_content_lines:`, so a deletion-only hunk at the end of the patch never gets
emitted even though old_content_lines contains changes.

pr_agent/algo/git_patch_processing.py[383-415]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The last-hunk finalization block skips trailing deletion-only hunks because it requires `new_content_lines` to be non-empty.

### Issue Context
Deletion-only hunks populate `old_content_lines` but may leave `new_content_lines` empty; the current gate prevents emitting `__old hunk__` for the final hunk.

### Fix Focus Areas
- pr_agent/algo/git_patch_processing.py[396-414]

### Implementation notes
- Change the finalization condition to include either side:
 - `if prev_header_line and (new_content_lines or old_content_lines):`
- Keep the existing `is_minus_lines` logic so deletion hunks still produce `__old hunk__` output.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

ⓘ The new review experience is currently in Beta. Learn more

Grey Divider

Grey Divider

Previous review results

Review updated until commit 89fbb48

Results up to commit N/A


🐞 Bugs (0)  
📘 Rule violations (1)  
📎 Requirement gaps (0)

Grey Divider
Action required
1. Malformed hunk handling untested 📘
Description
This PR changes patch parsing behavior to skip malformed @@ hunk headers, but no corresponding
pytest coverage is added/updated in this change set. Without tests, regressions could reintroduce
crashes or silently drop hunks when encountering unusual diff output.
Code

pr_agent/algo/git_patch_processing.py[R379-380]

+            else:
+                continue  # skip lines that start with @@ but don't match the hunk header pattern
Evidence
PR Compliance ID 14 requires adding/updating pytest coverage when behavior changes. The diff shows
new guards that alter control flow for malformed hunk headers (skipping/marking hunks as skipped),
but the PR diff contains no test additions exercising these new branches.

AGENTS.md
pr_agent/algo/git_patch_processing.py[376-380]
pr_agent/algo/git_patch_processing.py[433-436]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The new guard logic for malformed `@@` hunk headers changes runtime behavior (skip/ignore malformed hunks) but is not covered by tests.
## Issue Context
Previously, malformed `@@` lines could cause `extract_hunk_headers(None)` crashes. The fix now skips those headers; add unit tests to ensure (1) no exception is raised and (2) behavior is stable for both valid and malformed headers.
## Fix Focus Areas
- pr_agent/algo/git_patch_processing.py[376-380]
- pr_agent/algo/git_patch_processing.py[433-436]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. Last hunk can drop🐞
Description
decouple_and_convert_to_hunks_with_lines_numbers() assigns match = RE_HUNK_HEADER.match(line)
before the new else: continue, so a malformed trailing @@ line leaves match=None while
new_content_lines/old_content_lines still contain the last valid hunk. The final flush condition
if match and new_content_lines: then fails, causing the function to return without emitting the
buffered hunk (or to merge lines between malformed and next valid header into the prior hunk).
Code

pr_agent/algo/git_patch_processing.py[R375-380]

               old_content_lines = []
           if match:
               prev_header_line = header_line
-
-            section_header, size1, size2, start1, start2 = extract_hunk_headers(match)
+                section_header, size1, size2, start1, start2 = extract_hunk_headers(match)
+            else:
+                continue  # skip lines that start with @@ but don't match the hunk header pattern
Evidence
Within the loop, match is overwritten on every @@ line (match = RE_HUNK_HEADER.match(line)),
and on a non-matching header the new else: continue skips flushing/resetting buffers. After the
loop, the function only emits the final buffered hunk when match is truthy (`if match and
new_content_lines:), so a trailing malformed @@` header (or a malformed header after the last
flush) can prevent any final emission despite non-empty buffers.

pr_agent/algo/git_patch_processing.py[351-381]
pr_agent/algo/git_patch_processing.py[395-411]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`decouple_and_convert_to_hunks_with_lines_numbers()` can drop the final buffered hunk when a malformed `@@` line occurs after a valid hunk. This happens because `match`/`header_line` are assigned before validating the hunk header, and the end-of-function flush is gated by `if match and new_content_lines:`.
### Issue Context
The PR adds `else: continue` for non-matching `@@` lines, which prevents the crash, but the function’s end-of-loop flush uses the *last assigned* `match` value. A malformed trailing `@@` sets `match=None`, leaving buffered lines unflushed.
### Fix Focus Areas
- pr_agent/algo/git_patch_processing.py[351-412]
### Suggested fix
- Don’t clobber the outer `match` (and `header_line`) unless the hunk header regex matched.
- e.g., use a local `header_match = RE_HUNK_HEADER.match(line)` and `if not header_match: continue` *before* assigning `match = header_match` and `header_line = line`.
- Ensure the “finishing last hunk” logic flushes based on the last **valid** header seen (or on `prev_header_line`/`start2 != -1`) rather than on a potentially overwritten `match` from a malformed header line.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider Grey Divider

Qodo Logo

Comment on lines +379 to +380
else:
continue # skip lines that start with @@ but don't match the hunk header pattern
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. Malformed hunk handling untested 📘 Rule violation ☼ Reliability

This PR changes patch parsing behavior to skip malformed @@ hunk headers, but no corresponding
pytest coverage is added/updated in this change set. Without tests, regressions could reintroduce
crashes or silently drop hunks when encountering unusual diff output.
Agent Prompt
## Issue description
The new guard logic for malformed `@@` hunk headers changes runtime behavior (skip/ignore malformed hunks) but is not covered by tests.

## Issue Context
Previously, malformed `@@` lines could cause `extract_hunk_headers(None)` crashes. The fix now skips those headers; add unit tests to ensure (1) no exception is raised and (2) behavior is stable for both valid and malformed headers.

## Fix Focus Areas
- pr_agent/algo/git_patch_processing.py[376-380]
- pr_agent/algo/git_patch_processing.py[433-436]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Comment thread pr_agent/algo/git_patch_processing.py
… @@ lines

The decouple_and_convert_to_hunks_with_lines_numbers() function overwrote
`match` with the new RE_HUNK_HEADER result before checking whether the
previous hunk needed to be finalized. When a malformed @@ line produced
match=None, the flush condition `if match and (new/old_content_lines)` was
False, silently dropping the previous hunk's content.

Fix: save `prev_match` before overwriting and use it for the flush decision.
Also use `prev_header_line` instead of `match`/`header_line` in the
post-loop finalization, so a trailing malformed @@ cannot suppress the
last valid hunk.

Adds 7 unit tests covering malformed @@ scenarios: crash safety,
content preservation, trailing malformed headers, line-number accuracy,
all-malformed patches, and deletion-only hunks.
@qodo-free-for-open-source-projects
Copy link
Copy Markdown
Contributor

qodo-free-for-open-source-projects bot commented Apr 15, 2026

Persistent review updated to latest commit 89fbb48

Comment on lines 355 to 362
if line.startswith('@@'):
header_line = line
prev_match = match # save previous match before overwriting
match = RE_HUNK_HEADER.match(line)
if match and (new_content_lines or old_content_lines): # found a new hunk, split the previous lines
if prev_match and (new_content_lines or old_content_lines): # flush the previous hunk
if prev_header_line:
patch_with_lines_str += f'\n{prev_header_line}\n'
is_plus_lines = is_minus_lines = False
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. Orphan lines join next hunk 🐞 Bug ≡ Correctness

In decouple_and_convert_to_hunks_with_lines_numbers(), a malformed @@ line sets match=None and
continues without clearing new_content_lines/old_content_lines, so any subsequent patch lines
are buffered and then incorrectly emitted under the next valid hunk header (wrong hunk association
and wrong line numbers). This can corrupt the diff context passed downstream to PR processing and
suggestions generation.
Agent Prompt
### Issue description
`decouple_and_convert_to_hunks_with_lines_numbers()` can mis-attribute diff lines that appear after a malformed `@@` header to the next valid hunk because it `continue`s without clearing `new_content_lines` / `old_content_lines`.

### Issue Context
A malformed `@@` header sets `match=None`. Any subsequent `+`/`-`/context lines are still appended to the buffers, and the next valid header won’t flush them because `prev_match` is `None`.

### Fix Focus Areas
- pr_agent/algo/git_patch_processing.py[355-395]

### Implementation notes
- When encountering any `@@` line, after flushing the previous valid hunk (if any), reset `new_content_lines` and `old_content_lines` regardless of whether the previous header matched.
- Additionally, when `match` is `None` (malformed header), ensure buffers are cleared (and optionally reset `start1/start2`) so subsequent non-header lines are treated as orphan content and not attached to the next valid hunk.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

@gvago
Copy link
Copy Markdown
Author

gvago commented Apr 16, 2026

Closing to reopen from fork (lost push access to org branch).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant