Fix vcf parsing #25

Open
xfengnefx wants to merge 1 commit into treangenlab:main from xfengnefx:main

@xfengnefx

Hi,

Phased variants are read from the VCF file by searching each VCF record for a "1|0" or "0|1" substring. This search should be applied only to the last column of the record (the sample column, in single-sample VCF files), not to the whole record. The same applies when replacing the phasing.
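The change described above can be sketched roughly as follows. This is a minimal sketch, not the repository's code: it assumes tab-separated records with a single sample in the last column, and the names `sample_column`, `sample_has_phased_het` and `replace_phasing` are hypothetical.

```python
PHASED_HETS = ("1|0", "0|1")


def sample_column(record: str) -> str:
    """Return the last tab-separated column: the only sample in a single-sample VCF."""
    return record.rstrip("\n").split("\t")[-1]


def sample_has_phased_het(record: str) -> bool:
    """Search for "1|0"/"0|1" in the sample column only, so that '|'-separated
    INFO annotations (e.g. ANN) can no longer produce false matches."""
    sample = sample_column(record)
    return any(gt in sample for gt in PHASED_HETS)


def replace_phasing(record: str, old: str, new: str) -> str:
    """Replace `old` with `new` in the sample column only; all other columns,
    including INFO, are written back unchanged (a trailing newline is not re-added)."""
    fields = record.rstrip("\n").split("\t")
    fields[-1] = fields[-1].replace(old, new)
    return "\t".join(fields)


# INFO contains "0|1", but the genotype itself is unphased ("0/1").
rec = "chr1\t100\t.\tA\tG\t50\tPASS\tANN=x|0|1\tGT:PS\t0/1:."
print(sample_has_phased_het(rec))  # False
print("0|1" in rec)  # True: what a whole-record search would report
```

The same column restriction applies to `replace_phasing`: swapping `0|1` for `1|0` rewrites the genotype but leaves an INFO value such as `ANN=x|0|1` untouched.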

Example: the following line is from an epi2me-labs/wf-human-variation + HapCUT2 v1.3.1 run on this BAM. The variant is unphased, but the line contains "0|1" inside the ANN annotation (in "4308/4980|1436/1659"), and the run then crashes because int() is called on a non-numeric string:

chr6 145913508 . G A 25.38 PASS P;ANN=A|synonymous_variant|LOW|SHPRH|SHPRH|transcript|XM_017010691.2|protein_coding|24/30|c.4296C>T|p.Cys1432Cys|4457/5527|4296/5235|1432/1744||,A|synonymous_variant|LOW|SHPRH|SHPRH|transcript|XM_006715439.4|protein_coding|24/31|c.4296C>T|p.Cys1432Cys|4457/11423|4296/5124|1432/1707||,A|synonymous_variant|LOW|SHPRH|SHPRH|transcript|XM_006715443.4|protein_coding|24/26|c.4296C>T|p.Cys1432Cys|4457/4780|4296/4524|1432/1507||,A|synonymous_variant|LOW|SHPRH|SHPRH|transcript|XM_017010693.2|protein_coding|24/31|c.4296C>T|p.Cys1432Cys|4457/5304|4296/5073|1432/1690||,A|synonymous_variant|LOW|SHPRH|SHPRH|transcript|XM_017010696.2|protein_coding|25/31|c.2853C>T|p.Cys951Cys|4074/5145|2853/3792|951/1263||,A|synonymous_variant|LOW|SHPRH|SHPRH|transcript|XM_024446394.1|protein_coding|25/31|c.2853C>T|p.Cys951Cys|4374/5445|2853/3792|951/1263||,A|synonymous_variant|LOW|SHPRH|SHPRH|transcript|XM_017010692.1|protein_coding|24/30|c.4296C>T|p.Cys1432Cys|4695/5765|4296/5235|1432/1744||,A|synonymous_variant|LOW|SHPRH|SHPRH|transcript|XM_024446393.1|protein_coding|25/31|c.3354C>T|p.Cys1118Cys|4590/5660|3354/4293|1118/1430||,A|synonymous_variant|LOW|SHPRH|SHPRH|transcript|XM_011535719.3|protein_coding|24/30|c.4296C>T|p.Cys1432Cys|4457/7072|4296/5034|1432/1677||,A|synonymous_variant|LOW|SHPRH|SHPRH|transcript|NM_001042683.3|protein_coding|24/30|c.4296C>T|p.Cys1432Cys|4956/7596|4296/5052|1432/1683||,A|synonymous_variant|LOW|SHPRH|SHPRH|transcript|NM_001370327.1|protein_coding|24/30|c.4296C>T|p.Cys1432Cys|4530/7170|4296/5052|1432/1683||,A|synonymous_variant|LOW|SHPRH|SHPRH|transcript|NM_001370328.1|protein_coding|26/32|c.2853C>T|p.Cys951Cys|4114/6754|2853/3609|951/1202||,A|synonymous_variant|LOW|SHPRH|SHPRH|transcript|NM_173082.4|protein_coding|24/30|c.4308C>T|p.Cys1436Cys|4968/7261|4308/4980|1436/1659||,A|downstream_gene_variant|MODIFIER|SHPRH|SHPRH|transcript|XR_002956273.1|pseudogene||n.*4666C>T|||||4666|,A|non_coding_transcript_exon_variant|MODIFIER|SHPRH|SHPRH|transcript|XR_942391.3|pseudogene|24/29|n.4457C>T||||||,A|non_coding_transcript_exon_variant|MODIFIER|SHPRH|SHPRH|transcript|XR_942393.3|pseudogene|24/29|n.4457C>T||||||,A|non_coding_transcript_exon_variant|MODIFIER|SHPRH|SHPRH|transcript|XR_942392.3|pseudogene|24/29|n.4457C>T||||||,A|non_coding_transcript_exon_variant|MODIFIER|SHPRH|SHPRH|transcript|XR_942390.3|pseudogene|24/29|n.4457C>T|||||| GT:GQ:DP:AD:AF:PS 1/1:25:89:0,86:0.9663:.
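To make the failure concrete, the sketch below runs two hypothetical readers (`phase_set_whole_record` and `phase_set_last_column`, not the repository's functions) on a tab-separated copy of the record above, with the ANN field cut down to the one annotation that contains "0|1". It assumes, for illustration only, that the reader converts the `PS` value with `int()`; here `PS` is `.`, which would explain the crash, but the PR itself only says that `int()` is called on a string.

```python
from typing import Optional

# Abbreviated, tab-separated version of the record above: ANN is cut down to the
# single annotation whose "4308/4980|1436/1659" contains the substring "0|1".
RECORD = "\t".join([
    "chr6", "145913508", ".", "G", "A", "25.38", "PASS",
    "P;ANN=A|synonymous_variant|LOW|SHPRH|SHPRH|transcript|NM_173082.4|"
    "protein_coding|24/30|c.4308C>T|p.Cys1436Cys|4968/7261|4308/4980|1436/1659||",
    "GT:GQ:DP:AD:AF:PS",
    "1/1:25:89:0,86:0.9663:.",
])

PHASED_HETS = ("1|0", "0|1")


def _phase_set(fields: list) -> int:
    """Look up PS through the FORMAT keys and convert it (assumed reader behaviour)."""
    keys, values = fields[-2].split(":"), fields[-1].split(":")
    return int(values[keys.index("PS")])  # int(".") raises ValueError


def phase_set_whole_record(record: str) -> Optional[int]:
    """Hypothetical buggy reader: searches the whole line for a phased genotype."""
    fields = record.split("\t")
    if any(gt in record for gt in PHASED_HETS):
        return _phase_set(fields)
    return None


def phase_set_last_column(record: str) -> Optional[int]:
    """Hypothetical fixed reader: searches only the sample column."""
    fields = record.split("\t")
    if any(gt in fields[-1] for gt in PHASED_HETS):
        return _phase_set(fields)
    return None


print(phase_set_last_column(RECORD))  # None: the variant is treated as unphased
try:
    phase_set_whole_record(RECORD)
except ValueError as err:
    print("whole-record search crashes:", err)
```

With the search restricted to the sample column, the `1/1` genotype is never mistaken for a phased one, so the `.` in `PS` is never converted.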

Thanks!

Phased variants are read from the VCF file by searching
each VCF record for a "1|0" or "0|1" substring.
This should be done only on the last column.
This fix still assumes the input is a single-sample VCF.
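The single-sample assumption noted in the commit message could later be lifted by reading `GT` through the FORMAT keys of a chosen sample column and comparing it exactly, instead of searching for a substring. This is a hypothetical sketch, not part of this PR; the `genotype` and `is_phased_het` helpers and the `sample_index` parameter are illustrative names. It relies on the fixed VCF layout (CHROM through FORMAT in columns 0 to 8, samples from column 9) and on the VCF rule that trailing FORMAT fields may be dropped in a sample column.

```python
from typing import Optional

PHASED_HETS = {"0|1", "1|0"}


def genotype(record: str, sample_index: int = 0) -> Optional[str]:
    """Return the GT value of one sample, looked up through the FORMAT keys.

    Columns 0-8 are the fixed VCF columns (CHROM ... FORMAT); sample columns
    start at index 9, so `sample_index` selects among them.
    """
    fields = record.rstrip("\n").split("\t")
    keys = fields[8].split(":")
    values = fields[9 + sample_index].split(":")
    if "GT" not in keys:
        return None
    i = keys.index("GT")
    # Trailing FORMAT fields may be dropped in a sample column.
    return values[i] if i < len(values) else None


def is_phased_het(record: str, sample_index: int = 0) -> bool:
    """Exact comparison on the GT value instead of a substring search."""
    return genotype(record, sample_index) in PHASED_HETS


# Two samples: the first is unphased, the second is a phased heterozygote.
rec = "\t".join(["chr1", "100", ".", "A", "G", "50", "PASS", "ANN=x|0|1",
                 "GT:PS", "0/1:.", "1|0:100"])
print(is_phased_het(rec, 0), is_phased_het(rec, 1))  # False True
```

Comparing the whole `GT` value also rules out matches inside other FORMAT or INFO fields, whatever order the FORMAT keys appear in.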
@Fu-Yilei
Collaborator

Thank you for catching this bug. If you have tested the modified code, I can merge the PR.
