Hi @lrauschning,
I ran into an issue with the merged VCFs produced by msyd, caused by the way multiallelic deletions are encoded. Here is a simplified section of the VCF:
Chr4 19310313 DEL996661 CGAGA C . PASS . GT 0 0 0 0 0 0 0 0 0 0 0 1
Chr4 19310313 DEL964119 CGAGAGA C . PASS . GT 0 1 0 0 0 0 0 0 0 0 0 0
Chr4 19310313 DEL1041418 CGAGAGAGA C . PASS . GT 0 0 0 0 0 1 0 0 0 0 0 0
Chr4 19310313 DEL917455 CGAGAGAGAGA C . PASS . GT 1 0 0 0 0 0 0 0 0 0 0 0
Chr4 19310313 DEL963898 CGAGAGAGAGAGA C . PASS . GT 0 0 1 0 0 0 1 0 0 0 0 0
Chr4 19310313 DEL1044120 CGAGAGAGAGAGAGA C . PASS . GT 0 0 0 0 0 0 0 1 0 0 0 0
Chr4 19310313 DEL1208140 CGAGAGAGAGAGAGAGA C . PASS . GT 0 0 0 0 0 0 0 0 1 0 0 0
Chr4 19310313 DEL673752 CGAGAGAGAGAGAGAGAGA C . PASS . GT 0 0 0 0 0 0 0 0 0 1 0 0
Chr4 19310313 DEL1069688 CGAGAGAGAGAGAGAGAGAGA C . PASS . GT 0 0 0 1 0 0 0 0 0 0 0 0
Chr4 19310313 DEL1323846 CGAGAGAGAGAGAGAGAGAGAGAGA C . PASS . GT 0 0 0 0 1 0 0 0 0 0 1 0
Instead of outputting a single record with one REF and many ALTs, msyd produces one record per deletion, each with a different REF. Ideally, the longest deletion would be used as the REF and every shorter deletion would become an ALT allele of that record:
Chr4 19310313 DEL996661 CGAGAGAGAGAGAGAGAGAGAGAGA CGAGAGAGAGAGAGAGAGAGA,CGAGAGAGAGAGAGAGAGA,CGAGAGAGAGAGAGAGA,CGAGAGAGAGAGAGA,CGAGAGAGAGAGA,CGAGAGAGAGA,CGAGAGAGA,CGAGAGA,CGAGA,C . PASS . GT 4 2 5 9 10 3 5 6 7 8 10 1
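The merge described above can be sketched in a few lines of Python. This is a minimal illustration, not msyd's code: it assumes haploid 0/1 genotypes as in the simplified VCF, assumes every shorter REF at the position is a prefix of the longest one, and the helper names `parse_line` and `merge_deletions` are hypothetical. Each ALT is rewritten against the longest REF by appending the reference sequence its own REF did not cover, and each sample's GT is remapped to the new ALT index.

```python
from typing import List, Tuple

# Hypothetical minimal record for illustration: (POS, REF, ALT, haploid GTs),
# where each GT is 0 (REF) or 1 (ALT), as in the simplified VCF above.
Record = Tuple[int, str, str, List[int]]


def parse_line(line: str) -> Record:
    """Parse one whitespace-separated line of the simplified VCF above."""
    f = line.split()
    # Columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample2 ...
    return int(f[1]), f[3], f[4], [int(g) for g in f[9:]]


def merge_deletions(records: List[Record]) -> Tuple[int, str, List[str], List[int]]:
    """Merge biallelic records at one POS into a single multiallelic record."""
    pos = records[0][0]
    if any(r[0] != pos for r in records):
        raise ValueError("all records must share the same POS")
    # The longest REF (i.e. the longest deletion) becomes the merged REF.
    merged_ref = max((r[1] for r in records), key=len)

    # Re-express each ALT against the merged REF by appending the part of the
    # merged REF that the record's own, shorter REF did not cover.
    extended: List[Tuple[str, List[int]]] = []
    for _, ref, alt, gts in records:
        if not merged_ref.startswith(ref):
            raise ValueError(f"{ref} is not a prefix of {merged_ref}")
        extended.append((alt + merged_ref[len(ref):], gts))

    # Order ALTs longest first (shortest deletion first); ties broken alphabetically.
    alts = sorted({a for a, _ in extended}, key=lambda a: (-len(a), a))

    # Remap each sample's GT to the 1-based index of its allele in the merged ALTs.
    merged_gt = [0] * len(records[0][3])
    for alt, gts in extended:
        idx = alts.index(alt) + 1
        for i, gt in enumerate(gts):
            if gt == 1:
                merged_gt[i] = idx
    return pos, merged_ref, alts, merged_gt
```

Passing the ten records above through `parse_line` and `merge_deletions` yields the REF, ALT and GT columns of the desired record. A real implementation would additionally need to handle diploid or missing (`.`) genotypes and mixed variant types at the same position.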
I can achieve this with bcftools norm, so it's not a big deal, but it might be worth implementing, since multiallelic SNPs and insertions are already handled correctly.
Cheers
Matt