python wrapper is slower than straight awk #4

@cc2qe

time zcat Omni25_genotypes_2141_samples.b37.v2.vcf.gz | vawk --header '{ print $1,$2,$3,$4,$5,$6,$7,$8,$9,S$NA12878 }' | bgzip -c > NA12878.omni.vcf.gz
# real    19m43.893s
# user    21m27.138s
# sys     0m59.355s

# aside: the same awk run directly, outside the Python wrapper, takes about 22% less wall-clock time
time zcat Omni25_genotypes_2141_samples.b37.v2.vcf.gz | awk 'BEGIN {FS="\t"; OFS="\t"; } {if ($0~"^#") {if ($0!~"^##") { for (i=10;i<=NF;++i) SAMPLE[$i]=i; }; print} else {split($9,fmt,":"); SAMPLE_NA12878_ALL=$SAMPLE["NA12878"];  print $1,$2,$3,$4,$5,$6,$7,$8,$9,SAMPLE_NA12878_ALL }} END {}' - | bgzip -c > NA12878.omni2.vcf.gz
# real    15m28.737s
# user    17m1.151s
# sys     0m15.266s
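
For reference, the gap between the two `real` times above can be worked out with a small helper. This is only a sketch: `overhead_pct` is a hypothetical function name, not part of vawk, and the inputs are the two reported wall-clock times converted to seconds by hand.

```shell
# Hypothetical helper (not part of vawk): percent extra wall-clock time of
# the slower run relative to the faster one, both given in seconds.
overhead_pct() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", 100 * (slow - fast) / fast }'
}

# Reported "real" times converted to seconds:
#   vawk: 19m43.893s = 1183.893 s
#   awk:  15m28.737s =  928.737 s
overhead_pct 1183.893 928.737
```

On these numbers the vawk run takes roughly 27% longer than plain awk, which is the same gap as the plain awk run taking about 22% less time than vawk.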
