Huge file size from borzoi_sad.py #34

@ratheraarif

Hi,

Thank you for the nice and truly open source work you are doing!

I tested the VEP code on a single example from https://github.com/calico/borzoi/blob/main/tutorials/legacy/score_variants/snps_expr.vcf, using the script https://github.com/calico/borzoi/blob/main/tutorials/legacy/score_variants/score_expr_sad.sh. However, the generated sad.h5 file was extremely large: approximately 450 MB for just that one example.

Additionally, when I ran the code on my own VCF file containing about 135 examples, the resulting file ballooned to 192 GB. Am I making an error in my workflow, or is this file size expected?

Actually, I want to retain only the ref and alt predictions and then compute the SAD score manually. Can this file size be decreased?
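
For concreteness, here is a minimal sketch of the manual computation I have in mind. It assumes the ref and alt predictions are available as arrays of shape (length, targets), and that the SAD score is the per-target difference between the alt and ref predictions after summing over the sequence length. The function name `sad_from_predictions` and the toy values are hypothetical and not taken from borzoi_sad.py.

```python
import numpy as np


def sad_from_predictions(ref_preds, alt_preds):
    """Per-target difference of length-summed predictions (alt minus ref).

    Assumption: both inputs have shape (length, targets). This is an
    illustrative helper, not a function from the borzoi code base.
    """
    ref = np.asarray(ref_preds, dtype=np.float64)
    alt = np.asarray(alt_preds, dtype=np.float64)
    if ref.shape != alt.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {alt.shape}")
    # Sum over the length axis, then compare alt against ref per target.
    return alt.sum(axis=0) - ref.sum(axis=0)


# Toy data: 2 positions x 2 targets (made-up numbers, for illustration only).
ref = np.array([[1.0, 2.0], [3.0, 4.0]])
alt = np.array([[1.5, 2.0], [3.0, 3.0]])

sad = sad_from_predictions(ref, alt)
print(sad)
```

If only one value per target is needed per variant, storing just this result instead of the full per-position predictions would be far smaller, which is what I am hoping to achieve.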
