When I ran CADD on a VCF file with 200k variants, I found that the prescore matching step performed by extract_scored.py is quite time-consuming. I think this step could be sped up by matching each chromosome in parallel.
I suggest splitting the prescore file into 24 pieces, one per chromosome, and splitting the input VCF by chromosome as well. extract_scored.py would then be run once per chromosome, with the per-chromosome runs executing in parallel.
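To make the idea concrete, here is a rough sketch of the splitting and dispatch logic. It is only an illustration under my own assumptions, not code from the CADD repository: the function names `split_by_chrom`, `run_per_chrom`, and `count_records` are hypothetical, and `count_records` is a stand-in for whatever would actually invoke extract_scored.py on one chromosome.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_by_chrom(vcf_lines):
    """Group VCF records by the CHROM column (first tab-separated field)."""
    header, chunks = [], defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#"):
            header.append(line)  # keep header lines so every chunk can reuse them
        elif line.strip():
            chrom = line.split("\t", 1)[0]
            chunks[chrom].append(line)
    return header, dict(chunks)


def run_per_chrom(chunks, worker, max_workers=4):
    """Call worker(chrom, records) for every chromosome concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            chrom: pool.submit(worker, chrom, records)
            for chrom, records in chunks.items()
        }
        # Collect results once all chromosomes have been submitted.
        return {chrom: fut.result() for chrom, fut in futures.items()}


def count_records(chrom, records):
    # Hypothetical placeholder: a real worker would write the records to a
    # per-chromosome file and match them against that chromosome's prescores.
    return len(records)
```

A thread pool is used here on the assumption that each worker would mostly wait on an external process or on file I/O. If the matching were done in pure Python inside the worker, a process pool would probably be the better fit.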
If that is OK with you, I can submit a PR later. Thanks!