Hi Leon,
I'm having some more issues with msyd VCFs. I found that when merging VCFs, msyd keeps all SNPs and indels that overlap called core syntenic regions. However, in syri output, syntenic alignments can overlap with non-syntenic ones (e.g. when a region is syntenic between two genomes but also duplicated elsewhere). I think msyd doesn't filter these regions out of the syntenic alignments (perhaps it should, or at least have an option to?), and it also doesn't check the parent records of the SNPs/indels that overlap them during VCF filtering. The result is VCFs with conflicting variants like these:
Here minimap2 outputs three alignments between Col-0 and Ler. syri classifies one as syntenic, one as a duplication and one as a translocation. The syntenic and translocation alignments have corresponding but different SNPs and indels. Instead of keeping only the variants that originate from the syntenic alignment, msyd copies every variant that overlaps the syntenic region to the output VCF. As well as not making much biological sense, this produces a broken VCF, because a SNP adjacent to an indel upsets a lot of tools (e.g. bcftools consensus).
Something similar also happens when two syntenic alignments overlap, but this is more of a syri problem...
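
To illustrate the parent check I have in mind, here is a rough sketch in plain Python. It is not msyd code. It assumes each variant record carries a `Parent` key in its INFO column naming the alignment it came from, and that syntenic parents have IDs starting with `SYN`; both the key name and the prefix are parameters so they can be changed if that assumption is wrong. The function name `keep_syntenic` and the example records are made up for illustration.

```python
from typing import Dict, Iterable, Iterator


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into a dict; flag-only keys map to ''."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def keep_syntenic(
    vcf_lines: Iterable[str],
    parent_key: str = "Parent",  # assumed INFO key holding the parent ID
    syn_prefix: str = "SYN",     # assumed prefix of syntenic parent IDs
) -> Iterator[str]:
    """Yield header lines plus records whose parent looks syntenic."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep all header lines untouched
            continue
        cols = line.rstrip("\n").split("\t")
        info = parse_info(cols[7])  # INFO is the 8th VCF column
        if info.get(parent_key, "").startswith(syn_prefix):
            yield line


# Made-up records: a SNP from a syntenic alignment and an overlapping
# deletion from a translocation alignment.
example = [
    "##fileformat=VCFv4.3\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "Chr1\t100\tSNP1\tA\tT\t.\tPASS\tParent=SYNAL1\n",
    "Chr1\t101\tDEL1\tGTC\tG\t.\tPASS\tParent=TRANSAL2\n",
]
kept = list(keep_syntenic(example))
```

With a filter like this applied before merging, only the variant from the syntenic alignment would reach the output, so the adjacent SNP/indel conflict would not arise in this case. It would not help with the overlapping syntenic alignments mentioned above, since both parents would pass the check.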