Reference list of lineage-defining mutations as derived from SNP-IT.
SNP-IT is a tool written by Sam Lipworth that infers the species, lineage and (if possible) sub-lineage of a Mycobacterial sample. It is published as:
Lipworth S, Jajou R, De Neeling A, Bradley P, Van Der Hoek W, Maphalala G, Bonnet M, Sanchez-Padilla E, Diel R, Niemann S, Iqbal Z, Smith G, Peto T, Crook D, Walker T, Van Soolingen D. 2019. SNP-IT tool for identifying subspecies and associated lineages of Mycobacterium tuberculosis complex. Emerg Infect Dis 25:482–488. doi:10.3201/eid2503.180894
The lib/ folder of SNP-IT contains a file, library.csv, that lists the lineages covered. The id is the internal identifier given to each lineage by SNP-IT.
id,species,lineage,sublineage
Indo_Oceanic,M. tuberculosis,Lineage 1,
beijing,M. tuberculosis,Lineage 2,
East_African_Indian,M. tuberculosis,Lineage 3,
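As a minimal sketch of how this file could be read, the snippet below uses Python's standard csv module on the rows shown above. The function name read_library and the inline sample string are illustrative assumptions and are not part of SNP-IT; in practice the handle would come from opening lib/library.csv.

```python
import csv
import io

# Sample rows copied from the excerpt above; the real data live in lib/library.csv.
LIBRARY_CSV = """id,species,lineage,sublineage
Indo_Oceanic,M. tuberculosis,Lineage 1,
beijing,M. tuberculosis,Lineage 2,
East_African_Indian,M. tuberculosis,Lineage 3,
"""


def read_library(handle):
    """Map each SNP-IT id to its species, lineage and sublineage (hypothetical helper)."""
    lineages = {}
    for row in csv.DictReader(handle):
        lineages[row["id"]] = {
            "species": row["species"],
            "lineage": row["lineage"],
            # An empty field means no sublineage is recorded for this id.
            "sublineage": row["sublineage"] or None,
        }
    return lineages


library = read_library(io.StringIO(LIBRARY_CSV))
print(library["beijing"]["lineage"])  # Lineage 2
```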
Within the same lib/ folder, each lineage has a single file named after its id, e.g. the file beijing contains
1011511 C
1022003 C
1028217 A
1034758 T
1071966 G
1076689 C
1076880 T
1097442 T
1102468 A
This is a tab-delimited file containing the 1-based genome indices of positions in version 3 of the H37Rv genome that can be used to identify this lineage. For example, a C at genome index 1011511 is consistent with the sample belonging to Lineage 2.
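The consistency check described above can be sketched as follows. This is an illustration only: the helper names, the representation of a sample as a dict of genome index to base, and the simple "fraction of matching positions" score are all assumptions, and this is not necessarily how SNP-IT itself scores a sample.

```python
import io

# First three rows of the beijing file shown above, written with explicit tabs.
BEIJING = "1011511\tC\n1022003\tC\n1028217\tA\n"


def read_lineage_file(handle):
    """Return {1-based genome index: expected base} for one lineage file."""
    positions = {}
    for line in handle:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        index, base = fields
        positions[int(index)] = base.upper()
    return positions


def fraction_matching(sample_bases, lineage_positions):
    """Fraction of lineage positions at which the sample carries the expected base."""
    if not lineage_positions:
        return 0.0
    matches = sum(
        1
        for index, base in lineage_positions.items()
        if sample_bases.get(index, "").upper() == base
    )
    return matches / len(lineage_positions)


beijing = read_lineage_file(io.StringIO(BEIJING))
# Hypothetical sample: agrees with Lineage 2 at two of the three positions.
sample = {1011511: "C", 1022003: "C", 1028217: "G"}
score = fraction_matching(sample, beijing)
print(f"{score:.2f}")  # 0.67
```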
We read these files in and then, using gumpy, apply all the single nucleotide changes identified for each lineage. Again using gumpy, we create a list of the VARIANTS and, more usefully, by considering only those SNVs that lie in genes and translating those genes into amino acids, a list of MUTATIONS. E.g. for VARIANTS
SNPIT_ID SPECIES LINEAGE SUBLINEAGE VARIANT
Dassie Dassie bacillus (ex Procavia capensis) NaN NaN 4087g>t
Dassie Dassie bacillus (ex Procavia capensis) NaN NaN 5073g>a
Dassie Dassie bacillus (ex Procavia capensis) NaN NaN 19052c>g
and for MUTATIONS
SNPIT_ID SPECIES LINEAGE SUBLINEAGE GENE MUTATION
Dassie Dassie bacillus (ex Procavia capensis) NaN NaN PE_PGRS11 T469N
Dassie Dassie bacillus (ex Procavia capensis) NaN NaN PE_PGRS11 R512L
Dassie Dassie bacillus (ex Procavia capensis) NaN NaN PE_PGRS11 P518L
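The VARIANT strings above follow the pattern index, reference base, ">", lineage base, all in lowercase. The sketch below reproduces that pattern in plain Python so the format is clear. It deliberately does not use gumpy; the toy reference sequence, the function name and the choice to skip positions where the lineage base equals the reference base are assumptions made for illustration. Deriving MUTATIONS additionally needs gene coordinates and codon translation, which are not shown here.

```python
def variants_from_lineage(reference, lineage_positions):
    """Build variant strings such as '4087g>t' from 1-based positions (hypothetical helper)."""
    variants = []
    for index in sorted(lineage_positions):
        ref_base = reference[index - 1].lower()  # convert the 1-based index to 0-based
        alt_base = lineage_positions[index].lower()
        if ref_base == alt_base:
            continue  # identical to the reference, so there is no variant to report
        variants.append(f"{index}{ref_base}>{alt_base}")
    return variants


# Toy 10-base reference standing in for H37Rv, purely for illustration.
toy_reference = "ACGTACGTAC"
toy_lineage = {2: "T", 4: "T", 7: "A"}
variants = variants_from_lineage(toy_reference, toy_lineage)
print(variants)  # ['2c>t', '7g>a']
```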
What is not yet clear is whether SNP-IT contains the complete set of lineage-defining mutations, or just enough to allow a sample to be identified. It likely does not include mutations that sit so deep in the phylogenetic tree that they cannot unambiguously define which lineage a sample belongs to, but merely narrow it down.
Philip W Fowler 6 Feb 2022