-
Notifications
You must be signed in to change notification settings - Fork 37
fix: prevent buffer overrun in fscanf/sscanf string parsing #101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add field width specifiers to %s format strings to prevent buffer overflow when parsing input files with long fields. Files fixed: - mut_txt.c: %1023s for name and mut fields - mut_vcf.c: %1023s/%1024s for name, id, ref, alt fields - mut_bed.c: %1023s for name, bases, type fields - regions_bed.c: %1023s for name field Closes #100 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
WalkthroughFour source files receive updates to input parsing calls with field width specifiers. The format strings for Changes
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~5 minutes Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/regions_bed.c (1)
54-54: Buffer overflow prevention is correctly implemented.The field width specifier %1023s properly constrains the name field to prevent buffer overrun in the 1024-byte buffer.
Consider detecting and reporting truncation explicitly.
If input exceeds 1023 characters, it will be silently truncated, potentially causing the contig name match at line 57 to fail with a confusing "contig not found" error. Consider checking for truncation and reporting a clearer error message.
Example approach to detect truncation
One approach is to check if the next character after reading is non-whitespace, indicating truncation occurred:
int c; while(0 < fscanf(fp, "%1023s\t%u\t%u", name, &start, &end)) { // Check if name was truncated c = fgetc(fp); if (c != EOF && c != '\t' && c != '\n' && c != '\r' && !isspace(c)) { ungetc(c, fp); fprintf(stderr, "Error: contig name exceeds maximum length of 1023 characters\n"); exit(1); } if (c != EOF) ungetc(c, fp); // ... rest of logicNote: This is illustrative; the actual implementation would need to account for the parsing logic flow.
src/mut_vcf.c (1)
88-88: Buffer overflow prevention is correctly implemented.The field width specifiers properly match buffer sizes:
- %1023s for name[1024], id[1024], ref[1024]
- %1024s for alt[1025]
All correctly leave room for the null terminator.
Silent truncation of ref/alt alleles could cause data integrity issues.
If reference or alternate alleles exceed the limits, they will be truncated without warning. Truncated alleles will have incorrect lengths and could pass through validation logic but produce incorrect mutation records. Consider adding explicit length validation or error reporting for oversized alleles, as these are critical to mutation accuracy.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/mut_bed.csrc/mut_txt.csrc/mut_vcf.csrc/regions_bed.c
🔇 Additional comments (2)
src/mut_bed.c (1)
55-55: Buffer overflow prevention is correctly implemented.The field width specifiers %1023s correctly prevent buffer overrun for all string fields (name, bases, type) given their 1024-byte buffer sizes.
Downstream validation provides some truncation detection.
The length validation at line 76 will catch truncated bases fields since
strlen(bases)won't match(end - start). However, truncated contig names or type strings will produce less clear error messages ("contig not found" or "mutation type unrecognized" rather than "field too long").src/mut_txt.c (1)
54-54: Buffer overflow prevention is correctly implemented.The field width specifiers %1023s properly constrain the name and mut fields to prevent buffer overrun in their 1024-byte buffers.
Summary
Add field width specifiers to %s format strings to prevent buffer overflow.
Files fixed:
Closes #100
🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.