
Fix BED format parsing to accept valid end coordinates at chromosome boundaries#12

Merged
wdecoster merged 3 commits into main from copilot/fix-bed-parsing-issue
Jan 3, 2026

Contributor

Copilot AI commented Jan 2, 2026

BED format uses 0-based, half-open intervals in which the end coordinate is exclusive. The validation logic incorrectly rejected entries with end == chromosome_length. Such entries are valid: the last position they cover is chromosome_length - 1, the final base of the chromosome.
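To make the half-open convention concrete, here is a minimal sketch (not code from STRdust). The helper names `interval_len` and `last_covered` and the 1000 bp stand-in sequence are assumptions for illustration only. Rust slice ranges are half-open as well, so an end equal to the sequence length is a valid bound.

```rust
/// Number of bases covered by a 0-based, half-open interval [start, end).
/// Assumes start <= end (hypothetical helper, not part of STRdust).
fn interval_len(start: usize, end: usize) -> usize {
    end - start
}

/// Last position covered by [start, end), or None for an empty interval.
fn last_covered(start: usize, end: usize) -> Option<usize> {
    if end > start {
        Some(end - 1)
    } else {
        None
    }
}

fn main() {
    // Stand-in for a 1000 bp chromosome (made-up data).
    let chrom = vec![b'A'; 1000];
    let (start, end) = (900, 1000);

    // Slicing with end == chrom.len() is in bounds because the end is exclusive.
    let region = &chrom[start..end];
    assert_eq!(region.len(), interval_len(start, end));
    assert_eq!(last_covered(start, end), Some(999));

    println!(
        "{} bases, last covered position {:?}",
        region.len(),
        last_covered(start, end)
    );
}
```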

Changes

  • Fixed coordinate validation: Changed chromosome_length > end to chromosome_length >= end in new_interval() to correctly accept end coordinates equal to chromosome length
  • Separated error cases: Split ambiguous error message into distinct cases:
    • "Chromosome {chrom} is not in the fasta file"
    • "End coordinate {end} is out of bounds for chromosome {chrom} (length: {chrom_length})"
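The two changes above can be sketched together as follows. This is an illustrative approximation, not the actual `new_interval()` from STRdust: the `IntervalError` type, the `check_end` function, and the use of a `HashMap` of chromosome lengths are all assumptions, and the sketch returns a `Result` rather than panicking. Only the `>=` comparison and the two message texts come from the description above.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical error type separating the two failure cases.
#[derive(Debug, PartialEq)]
enum IntervalError {
    UnknownChromosome { chrom: String },
    EndOutOfBounds { chrom: String, end: u64, chrom_length: u64 },
}

impl fmt::Display for IntervalError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            IntervalError::UnknownChromosome { chrom } => {
                write!(f, "Chromosome {chrom} is not in the fasta file")
            }
            IntervalError::EndOutOfBounds { chrom, end, chrom_length } => write!(
                f,
                "End coordinate {end} is out of bounds for chromosome {chrom} (length: {chrom_length})"
            ),
        }
    }
}

/// Hypothetical check: accept any end up to and including the chromosome length.
fn check_end(
    chrom_lengths: &HashMap<String, u64>,
    chrom: &str,
    end: u64,
) -> Result<(), IntervalError> {
    // Case 1: the chromosome name is unknown.
    let chrom_length = *chrom_lengths
        .get(chrom)
        .ok_or_else(|| IntervalError::UnknownChromosome { chrom: chrom.to_string() })?;

    // Case 2: the end is exclusive, so end == chrom_length is still valid.
    if chrom_length >= end {
        Ok(())
    } else {
        Err(IntervalError::EndOutOfBounds { chrom: chrom.to_string(), end, chrom_length })
    }
}

fn main() {
    let mut lengths = HashMap::new();
    lengths.insert("chr1".to_string(), 1000u64);

    for (chrom, end) in [("chr1", 1000), ("chr1", 1001), ("chr2", 10)] {
        match check_end(&lengths, chrom, end) {
            Ok(()) => println!("{chrom} end {end}: accepted"),
            Err(e) => println!("{chrom} end {end}: {e}"),
        }
    }
}
```

Keeping the lookup and the bounds check as separate steps is what lets each failure produce its own message.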

Example

For a chromosome of length 1000:

// Before: This would panic incorrectly
// BED entry: chr1  900  1000  repeat1
// Represents positions 900-999 (end is exclusive)

// After: Correctly accepted
if chrom_length >= end {  // 1000 >= 1000 → true
    return Some(Self { chrom, start, end, created: None });
}

Added tests for boundary conditions: test_bed_end_coordinate_at_chromosome_boundary and test_bed_end_coordinate_beyond_chromosome_boundary.
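The shape of such boundary tests might look like the sketch below. It is not the test code from this PR: the real tests exercise BED parsing against FASTA test data, whereas this version checks only a hypothetical predicate, `end_within_chromosome`, that stands in for the comparison inside `new_interval()`. The test and module names are made up as well.

```rust
/// Hypothetical stand-in for the bounds comparison in new_interval().
fn end_within_chromosome(end: u64, chrom_length: u64) -> bool {
    chrom_length >= end
}

fn main() {
    // Quick manual check when run as a binary.
    println!("end == length accepted: {}", end_within_chromosome(1000, 1000));
    println!("end == length + 1 accepted: {}", end_within_chromosome(1001, 1000));
}

#[cfg(test)]
mod boundary_tests {
    use super::*;

    #[test]
    fn end_at_chromosome_boundary_is_accepted() {
        // chr1  900  1000 on a 1000 bp chromosome covers positions 900-999.
        assert!(end_within_chromosome(1000, 1000));
    }

    #[test]
    fn end_beyond_chromosome_boundary_is_rejected() {
        // An end of 1001 would require a position 1000, which does not exist.
        assert!(!end_within_chromosome(1001, 1000));
    }
}
```

Testing both sides of the boundary guards against the comparison being loosened too far as well as against the original off-by-one.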

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • ftp.1000genomes.ebi.ac.uk
    • Triggering command: /home/REDACTED/work/STRdust/STRdust/target/debug/deps/STRdust-3dd46870d1a71233 (dns block)


Original prompt

This section details the original issue you should resolve

<issue_title>BED parsing incorrect ?</issue_title>
<issue_description>Hi there,

Thanks for developing STRdust !

I am using it on phased R10 Nanopore GIAB data

My FASTA is GCA_000001405.15_GRCh38_no_alt_analysis_set and my --region-file is GRCh38 tandem repeats BED from Sniffles

But I got a Chromosome chrUn_KI270376v1 is not in the fasta file or the end coordinate is out of bounds error
=> I ran bedtools getfasta on faulty region and no error

So I am wondering if maybe your parsing of BED format is incorrect ?
=> Because end coordinate is supposed to be excluded in BED format ?

Thanks for your help !
Best,
Felix.
</issue_description>

<agent_instructions>investigate if there could be an issue with how bed files are parsed, and if the error message is accurate. Are the chromosomes from the fasta or the chromosomes from the BAM used to check if the chromosome was found? The error message is not too helpful in the sense that it still leaves two possible issues</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@wdecoster Hrm, my apologies for not following up earlier. I don't think I would recommend that tandem repeats bed file. That is to say, I don't know how it was generated, but its purpose is different: to help SV calling - not to delineate targets for STR genotyping. Rather, use the adotto catalog, trexplorer or simple repeats from the ucsc browser. Let me know if you can't find it!

But that also doesn't necessarily solve your error. I am not immediately worried about the bed parsing, but maybe the error message is not correct. Is the fasta file you use for the getfasta the same as the one used to align the GIAB data?

Wouter</comment_new>



Copilot AI and others added 2 commits January 2, 2026 22:00
- Fix BED end coordinate validation: change comparison from `>` to `>=` to properly handle BED format's exclusive end coordinates
- Improve error messages to clearly distinguish between "chromosome not found" and "coordinate out of bounds" errors
- Add comprehensive tests for BED boundary conditions
- Refactor code to eliminate compiler warnings

Co-authored-by: wdecoster <11660522+wdecoster@users.noreply.github.com>
- Update test data to use actual 1000bp sequences matching FAI declarations
- Ensures tests are robust and won't cause issues with faidx library
- Tests still validate the same boundary conditions

Co-authored-by: wdecoster <11660522+wdecoster@users.noreply.github.com>
@wdecoster
Owner

I cancelled the security audit, we only need the tests for now

Copilot AI changed the title from "[WIP] Fix incorrect BED parsing in STRdust" to "Fix BED format parsing to accept valid end coordinates at chromosome boundaries" Jan 2, 2026
Copilot AI requested a review from wdecoster January 2, 2026 22:09
@wdecoster wdecoster marked this pull request as ready for review January 3, 2026 20:43
@wdecoster wdecoster merged commit 789a74b into main Jan 3, 2026
3 of 4 checks passed
@wdecoster wdecoster deleted the copilot/fix-bed-parsing-issue branch January 3, 2026 20:44
