-
Notifications
You must be signed in to change notification settings - Fork 20
Description
We found this when working with long reads data, where coverage bins tend to be longer than for short reads data. We use d4tools to calculate bin-by-bin coverage. Here, frequently, certain mean coverage values became very large.
Example
$ ./d4tools stat --region 100bp_bins_more.bed example.d4
chr1 54000 54100 33.05
chr1 54100 54200 32.51
chr1 54200 54300 3083160.33
chr1 54300 54400 3083129.18
chr1 54400 54500 30.16
chr1 54500 54600 30
chr1 54600 54700 30
chr1 54700 54800 30
chr1 54800 54900 29.06
chr1 54900 55000 29.54
Underlying data
$ ./d4tools view --region-file 100bp_bins_more.bed example.d4
chr1 54000 54005 34
chr1 54005 54100 33
chr1 54100 54151 33
chr1 54151 54200 32
chr1 54200 54215 32
chr1 54215 54300 31
chr1 54300 54400 31
chr1 54400 54416 31
chr1 54416 54500 30
chr1 54500 54600 30
chr1 54600 54700 30
chr1 54700 54800 30
chr1 54800 54806 30
chr1 54806 54900 29
chr1 54900 54946 29
chr1 54946 55000 30
It looks like it comes from here:
d4-format/d4/src/d4file/track.rs
Lines 181 to 207 in 6f40313
| scan_partition_impl(handles, |part_left, part_right, active_handles| { | |
| if let Some(default_value) = default_primary_value { | |
| let iter = self.secondary.seek_iter(part_left); | |
| let mut last_right = part_left; | |
| for (mut left, mut right, value) in iter { | |
| left = left.max(part_left); | |
| right = right.min(part_right).max(left); | |
| for handle in active_handles.iter_mut() { | |
| if last_right < left { | |
| handle.feed_rows(last_right, left, &mut std::iter::once(default_value)); | |
| } | |
| handle.feed_rows(left, right, &mut std::iter::once(value)); | |
| } | |
| last_right = right; | |
| if right == part_right { | |
| break; | |
| } | |
| } | |
| if last_right < part_right { | |
| for handle in active_handles.iter_mut() { | |
| handle.feed_rows( | |
| last_right, | |
| part_right, | |
| &mut std::iter::once(default_value), | |
| ); | |
| } | |
| } |
Under certain conditions it loops indefinitely, exhausting the iterator of ranges rather than stopping when done with the requested range.
I have tried just breaking early when the left part of the coverage region is outside the part of interest, and it seems to work fine, i.e. by adding inside the loop:
if left >= part_right {
break;
}
Existing tests run fine with this added. I will see if I can trigger this error in a minimal test data case and come back with a PR.