reference data limit number of contigs to 5000 #95

@tcezard

Associated with issue EdinburghGenomics/Analysis-Driver#344
QC for genomes with many contigs can be very slow because GATK 3.4 does not work well with such genomes.
The reference data process should check the number of contigs in the FASTA file and offer to merge contigs in order to limit the QC time.
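
As a rough illustration of the check described above, here is a minimal sketch that counts the contigs in a FASTA file by counting header lines. The names `count_contigs`, `needs_merging` and `MAX_CONTIGS` are hypothetical and not taken from reference_data.py; the 5000 limit is the one given in the issue title.

```python
from typing import Iterable

# Limit taken from the issue title; the constant name is an assumption.
MAX_CONTIGS = 5000


def count_contigs(lines: Iterable[str]) -> int:
    """Count FASTA records: each record starts with a '>' header line."""
    return sum(1 for line in lines if line.startswith('>'))


def needs_merging(lines: Iterable[str], max_contigs: int = MAX_CONTIGS) -> bool:
    """Return True when the genome has more contigs than the allowed limit."""
    return count_contigs(lines) > max_contigs
```

Both functions accept any iterable of lines, so an open file handle can be passed directly, for example `with open(fasta_path) as f: needs_merging(f)` where `fasta_path` is the path to the genome.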

Add a new option in reference_data.py that will merge contig entries in chunks of at least 20 Mb.
The new genome version can only be used for QC.
A comment will be added to describe what the modification is.
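
One possible way to do the merging is sketched below, under several assumptions that the issue does not specify: contigs are grouped greedily in file order, joined with a spacer of `N` bases, and given new names of the form `merged_<n>`. The function names, the spacer and the naming scheme are all hypothetical. The list of original contig names returned with each chunk could be used to write the comment describing the modification.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            if name is not None:
                yield name, ''.join(parts)
            fields = line[1:].split()
            name, parts = (fields[0] if fields else ''), []
        else:
            parts.append(line)
    if name is not None:
        yield name, ''.join(parts)


def merge_contigs(
    contigs: Iterable[Tuple[str, str]],
    min_chunk_size: int = 20_000_000,  # 20 Mb minimum, as stated in the issue
    spacer: str = 'N' * 100,           # assumption: pad between merged contigs
) -> Iterator[Tuple[str, str, List[str]]]:
    """Greedily group consecutive contigs until a group reaches min_chunk_size.

    Yields (new_name, merged_sequence, original_names). The size counts
    contig bases only, not the spacer.
    """
    names, seqs, size, index = [], [], 0, 1
    for name, seq in contigs:
        names.append(name)
        seqs.append(seq)
        size += len(seq)
        if size >= min_chunk_size:
            yield 'merged_%d' % index, spacer.join(seqs), names
            index += 1
            names, seqs, size = [], [], 0
    if names:
        # The trailing group may be shorter than min_chunk_size.
        yield 'merged_%d' % index, spacer.join(seqs), names
```

With this greedy approach a contig that is already longer than the minimum ends up alone in its own chunk unless smaller contigs precede it, and only the last chunk can fall below the minimum size.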
