
nh13 (Owner) commented Dec 22, 2025

Summary

Add field width specifiers to %s format strings to prevent buffer overflow.

Files fixed:

  • mut_txt.c: %1023s for name and mut fields
  • mut_vcf.c: %1023s/%1024s for name, id, ref, alt fields
  • mut_bed.c: %1023s for name, bases, type fields
  • regions_bed.c: %1023s for name field
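Each specifier follows one rule: the width is the buffer size minus one, which leaves room for the terminating '\0'. The sketch below applies that rule to a hypothetical 8-byte buffer standing in for the real 1024-byte ones. NAME_SIZE and read_name are illustrative names and do not come from the PR.

```c
#include <stdio.h>

/* Hypothetical buffer size standing in for the PR's 1024-byte buffers. */
#define NAME_SIZE 8

/* "%7s" stores at most 7 characters plus '\0', which fits in 8 bytes.
 * A bare "%s" would keep writing past name[7] on longer input. */
static int read_name(const char *line, char name[NAME_SIZE]) {
    return sscanf(line, "%7s", name);
}
```

Reading "abcdefghijkl" fills name with "abcdefg" and the call returns 1. The extra characters are never written into the buffer.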

Closes #100

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes

  • Bounded the string fields read by the BED, TXT, and VCF parsers so that overly long input fields can no longer overflow their fixed-size buffers.


Add field width specifiers to %s format strings to prevent buffer
overflow when parsing input files with long fields.

Files fixed:
- mut_txt.c: %1023s for name and mut fields
- mut_vcf.c: %1023s/%1024s for name, id, ref, alt fields
- mut_bed.c: %1023s for name, bases, type fields
- regions_bed.c: %1023s for name field

Closes #100

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
coderabbitai bot commented Dec 22, 2025

Walkthrough

Four source files now use field width specifiers in their input parsing calls. The fscanf and sscanf format strings give each string field a maximum length of 1023 or 1024 characters, so a conversion can no longer write past the end of its buffer. The PR introduces no changes to control flow, logic, or error handling. Input that exceeds a limit is cut at that limit, and its remaining characters are left unread for the next directive in the format string.
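A deliberately small width shows what "left unread" means in practice. This sketch uses a hypothetical 8-byte name buffer with the same "name\tstart\tend" layout as the BED parsers. parse_small is an illustrative name and does not come from the PR.

```c
#include <stdio.h>

/* Hypothetical parser with a tiny 7-character name width. Returns the
 * number of conversions sscanf completed (3 on a fully parsed line). */
static int parse_small(const char *line, char name[8],
                       unsigned *start, unsigned *end) {
    return sscanf(line, "%7s\t%u\t%u", name, start, end);
}
```

The input "chr1\t10\t20" parses all three fields. With "chrLongName\t10\t20", name becomes "chrLong", %u then meets 'N', and the call returns 1. A loop that only checks for a positive return value would therefore continue with stale coordinates. With "chr1234567890\t10\t20" the call returns 3, but the values are wrong: name is "chr1234", start is 567890, and end is 10, all taken from the wrong text.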

Changes

Cohort: Buffer overflow prevention in input parsing
File(s): src/mut_bed.c, src/mut_txt.c, src/mut_vcf.c, src/regions_bed.c
Summary: Added field width specifiers to fscanf/sscanf format strings to cap the length of each string field read. Most fields (name, mut, id, ref, bases, type) are capped at 1023 characters, and the VCF alt field at 1024. Prevents potential buffer overflow without altering existing control flow or error handling.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): The title clearly and concisely summarizes the main change: adding field width specifiers to prevent buffer overflow in string parsing functions.
  • Linked Issues check (✅ Passed): All code changes directly implement the requirement from issue #100: field width specifiers are added to fscanf/sscanf %s formats across four files to prevent buffer overflow.
  • Out of Scope Changes check (✅ Passed): All changes are strictly scoped to the buffer overflow prevention objective; no unrelated modifications are present in any of the four modified files.

coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/regions_bed.c (1)

54-54: Buffer overflow prevention is correctly implemented.

The field width specifier %1023s properly constrains the name field to prevent buffer overrun in the 1024-byte buffer.

Consider detecting and reporting truncation explicitly.

If the name exceeds 1023 characters, fscanf stores only the first 1023 and leaves the rest unread, so the following %u conversions may fail or consume the wrong text. The truncated name can also make the contig name match at line 57 fail with a confusing "contig not found" error. Consider checking for truncation and reporting a clearer error message.

Example approach to detect truncation

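One approach is to read the name by itself and then check whether the next character is non-whitespace. A non-whitespace character there means the name was truncated:

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

int c;
// Read the name on its own so the character after it can be inspected
while (1 == fscanf(fp, "%1023s", name)) {
    // Any non-whitespace character here means the name was longer than 1023 characters
    c = fgetc(fp);
    if (c != EOF && !isspace(c)) {
        fprintf(stderr, "Error: contig name exceeds maximum length of 1023 characters\n");
        exit(1);
    }
    if (c != EOF) ungetc(c, fp);
    // Require both coordinates instead of accepting a partial match
    if (2 != fscanf(fp, "%u%u", &start, &end)) {
        fprintf(stderr, "Error: expected start and end after contig %s\n", name);
        exit(1);
    }

    // ... rest of logic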

Note: This is illustrative; the actual implementation would need to account for the parsing logic flow.
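Parsers that read a whole line first and then call sscanf can perform the same check with %n, which records how many characters have been consumed so far. This is a minimal sketch. parse_region_line and its return codes are hypothetical names and are not code from this PR.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical return codes for this sketch. */
enum { PARSE_OK = 0, PARSE_MALFORMED = -1, PARSE_NAME_TOO_LONG = -2 };

/* Parses a "name start end" line; name must hold at least 1024 bytes.
 * %n stores how many characters were consumed once the name was read,
 * so the character at that offset shows whether the name stopped early. */
static int parse_region_line(const char *line, char *name,
                             unsigned *start, unsigned *end) {
    int after_name = 0;
    if (sscanf(line, "%1023s%n", name, &after_name) != 1) {
        return PARSE_MALFORMED; /* empty or all-whitespace line */
    }
    unsigned char next = (unsigned char)line[after_name];
    if (next != '\0' && !isspace(next)) {
        return PARSE_NAME_TOO_LONG; /* more name characters remain */
    }
    if (sscanf(line + after_name, "%u%u", start, end) != 2) {
        return PARSE_MALFORMED; /* missing or non-numeric coordinates */
    }
    return PARSE_OK;
}
```

A name of exactly 1023 characters still parses. Only a name with characters left over after the width limit is rejected.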

src/mut_vcf.c (1)

88-88: Buffer overflow prevention is correctly implemented.

The field width specifiers properly match buffer sizes:

  • %1023s for name[1024], id[1024], ref[1024]
  • %1024s for alt[1025]

All correctly leave room for the null terminator.
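You can keep each width in step with its buffer size by deriving both from a single macro and stringizing that macro into the format string. This sketch uses hypothetical macro and struct names (STR, XSTR, NAME_WIDTH, ALT_WIDTH, vcf_fields) that do not come from the PR.

```c
#include <stdio.h>

/* STR turns its argument into a string literal; XSTR expands the argument
 * first, so the number (not the macro name) lands in the format string. */
#define STR(x) #x
#define XSTR(x) STR(x)

/* Field widths from the review; each buffer is width + 1 bytes for '\0'. */
#define NAME_WIDTH 1023
#define ALT_WIDTH 1024

typedef struct {
    char name[NAME_WIDTH + 1];
    char alt[ALT_WIDTH + 1];
} vcf_fields;

/* The format expands to "%1023s %1024s", so widths cannot drift from sizes. */
static int read_name_alt(const char *line, vcf_fields *f) {
    return sscanf(line, "%" XSTR(NAME_WIDTH) "s %" XSTR(ALT_WIDTH) "s",
                  f->name, f->alt);
}
```

Changing NAME_WIDTH or ALT_WIDTH then updates the buffer size and the format string together.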

Silent truncation of ref/alt alleles could cause data integrity issues.

If reference or alternate alleles exceed the limits, they will be truncated without warning. Truncated alleles will have incorrect lengths and could pass through validation logic but produce incorrect mutation records. Consider adding explicit length validation or error reporting for oversized alleles, as these are critical to mutation accuracy.
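If rejecting oversized alleles outright is acceptable, a very small guard is to treat any allele that filled its whole field width as possibly truncated. This is a sketch under that assumption. The helper and macro names are hypothetical, and the widths come from the buffer sizes listed above.

```c
#include <string.h>

/* Widths from the review: ref[1024] is read with %1023s, alt[1025] with %1024s. */
#define REF_WIDTH 1023
#define ALT_WIDTH 1024

/* Hypothetical guard: returns 1 if either allele filled its entire field
 * width and so may have been cut short by fscanf, 0 otherwise. */
static int allele_possibly_truncated(const char *ref, const char *alt) {
    return strlen(ref) >= REF_WIDTH || strlen(alt) >= ALT_WIDTH;
}
```

The trade-off is a false positive for an allele of exactly 1023 (ref) or 1024 (alt) characters. Peeking at the next input character after the read avoids that false positive but requires a little more code.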

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 26e3aa9 and 4c1de8f.

📒 Files selected for processing (4)
  • src/mut_bed.c
  • src/mut_txt.c
  • src/mut_vcf.c
  • src/regions_bed.c
🔇 Additional comments (2)
src/mut_bed.c (1)

55-55: Buffer overflow prevention is correctly implemented.

The field width specifiers %1023s correctly prevent buffer overrun for all string fields (name, bases, type) given their 1024-byte buffer sizes.

Downstream validation provides some truncation detection.

The length validation at line 76 will catch truncated bases fields since strlen(bases) won't match (end - start). However, truncated contig names or type strings will produce less clear error messages ("contig not found" or "mutation type unrecognized" rather than "field too long").
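The downstream check described above works because a truncated bases field is shorter than the interval it claims to cover. This is a hypothetical sketch of such a check. The function name and signature are assumptions and are not the code at line 76.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 when bases is exactly as long as the
 * half-open BED interval [start, end), and 0 otherwise. A bases field
 * cut off at 1023 characters fails this test whenever the interval is
 * longer than 1023. */
static int bases_match_interval(const char *bases, unsigned start, unsigned end) {
    if (end < start) {
        return 0; /* malformed interval */
    }
    return strlen(bases) == (size_t)(end - start);
}
```

Because the comparison uses only the length, this check can say that a bases field was truncated but not that a name or type field was.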

src/mut_txt.c (1)

54-54: Buffer overflow prevention is correctly implemented.

The field width specifiers %1023s properly constrain the name and mut fields to prevent buffer overrun in their 1024-byte buffers.

nh13 merged commit ae9cd92 into main Dec 25, 2025
2 checks passed
nh13 deleted the fix/buffer-overrun-fscanf branch December 25, 2025 03:44

Development

Successfully merging this pull request may close these issues.

fix: buffer overrun in fscanf/sscanf string parsing

2 participants