Bug in pipeline with filtering and map_joins #12

@mrubio-chavarria

Hi @vsbuffalo,

Thank you very much for this great library.

I recently started coding in Rust and wrote a small binary with GRanges for a simple task. Essentially, the pipeline 1) filters an input bedGraph and 2) classifies the bedGraph entries into groups by joining them with a second GRanges. Here is the code:

let mut final_data: Vec<(String, u32, u32, String)> = count_ranges
    // Join with the groups and format the tuple
    .left_overlaps(group_ranges.as_granges_ref())
    .unwrap()
    .map_joins(|join_data| {
        let group_id = match join_data.right_data.len() {
            0 => String::from(""),
            _ => join_data.right_data[0].name.clone(),
        };
        (
            group_id,
            join_data.join.left.start(),
            join_data.join.left.end(),
            join_data.left_data.name.clone()
        )
    })
    .unwrap()
    // Remove ranges in mask
    .antifilter_overlaps(mask_ranges)
    .unwrap()
    // Keep only intervals that overlap a group
    .filter_overlaps(group_ranges.as_granges_ref())
    .unwrap()
    // Return only data
    .take_data()
    .unwrap();

Initially, I tried the code below instead, but the bedGraph data value (join_data.left_data.name.clone()) is wrong: the tuple contains some other value.

let mut final_data: Vec<(String, u32, u32, String)> = count_ranges
    // Remove ranges in mask
    .antifilter_overlaps(mask_ranges)
    .unwrap()
    // Filter only intervals in groups
    .filter_overlaps(group_ranges.as_granges_ref())
    .unwrap()
    // Join with the groups and format the tuple
    .left_overlaps(group_ranges.as_granges_ref())
    .unwrap()
    .map_joins(|join_data| {
        let group_id = match join_data.right_data.len() {
            0 => String::from(""),
            _ => join_data.right_data[0].name.clone(),
        };
        (
            group_id,
            join_data.join.left.start(),
            join_data.join.left.end(),
            join_data.left_data.name.clone()
        )
    })
    .unwrap()
    // Return only data
    .take_data()
    .unwrap();

Finally, I write the vector contents line by line to a file.
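For reference, a writing step of this kind can be sketched with the standard library alone. The helper name `write_rows` and the tab-separated column order are assumptions made for illustration; this is not the actual code from the binary.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical helper: write each (group_id, start, end, value) row
/// as one tab-separated line.
fn write_rows<W: Write>(out: W, rows: &[(String, u32, u32, String)]) -> io::Result<()> {
    // Buffer the writes so each line does not trigger its own system call.
    let mut writer = BufWriter::new(out);
    for (group_id, start, end, value) in rows {
        writeln!(writer, "{group_id}\t{start}\t{end}\t{value}")?;
    }
    // Flush explicitly so a write error is returned instead of being dropped.
    writer.flush()
}
```

Passing a `std::fs::File` as `out` writes to disk, while passing a `&mut Vec<u8>` keeps the output in memory, which is convenient when comparing the two pipelines.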

Why does the order matter here? Where do these values come from? I have tried to fix this but cannot find the cause. Could you let me know if you have any idea what might be causing the problem? Specifically, the fourth entry of the tuple differs between the two snippets.
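To narrow down which intervals are affected, the two output vectors can be compared directly. The sketch below uses only the standard library; the helper name `diff_values` is hypothetical. It assumes that `(start, end)` identifies an interval. The tuple carries no sequence name, so intervals with identical coordinates on different sequences would collide, and a later duplicate would overwrite an earlier one.

```rust
use std::collections::BTreeMap;

/// One output row: (group_id, start, end, bedGraph value).
type Row = (String, u32, u32, String);

/// Hypothetical helper: list intervals whose fourth entry differs between
/// two runs. Returns (start, end, value_in_a, value_in_b), sorted.
fn diff_values(a: &[Row], b: &[Row]) -> Vec<(u32, u32, String, String)> {
    // Index the first run by interval coordinates (assumed unique).
    let by_interval: BTreeMap<(u32, u32), &str> = a
        .iter()
        .map(|(_, start, end, value)| ((*start, *end), value.as_str()))
        .collect();

    let mut mismatches = Vec::new();
    for (_, start, end, value_b) in b {
        // Intervals present in only one run are ignored here.
        if let Some(value_a) = by_interval.get(&(*start, *end)) {
            if *value_a != value_b.as_str() {
                mismatches.push((*start, *end, (*value_a).to_string(), value_b.clone()));
            }
        }
    }
    mismatches.sort();
    mismatches
}
```

Calling `diff_values(&first_run, &second_run)` on the results of the two snippets would show, per interval, which value each ordering produced. That makes it easier to check whether the unexpected values belong to other intervals in the input.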
