A Python package for analyzing variant calls from high-throughput single-cell genome sequencing experiments. It provides a convenient scanpy-style API for loading joint-calling VCF files into AnnData objects and for performing downstream processing and analysis tasks, including:
- Coverage analysis and filtering
- Annotating the ancestral trinucleotide sequence context of SNPs
- Computing trinucleotide mutation spectra
- Visualizing mutation spectra
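Trinucleotide mutation spectra are conventionally built by labelling each SNV with its 5' and 3' neighbouring bases and collapsing purine-reference changes onto the opposite strand, so that the mutated reference base is always C or T. This yields 96 channels. The pure-Python sketch below illustrates that convention only. The names `sbs96_channel` and `spectrum` are hypothetical and not part of the cellspec API, and the sketch assumes the reference trinucleotide context of each SNV is already known.

```python
# Complement map for DNA bases
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sbs96_channel(context: str, alt: str) -> str:
    """Return a channel label such as 'A[C>T]G' for one SNV.

    context: 3-base reference sequence centred on the mutated base (e.g. 'ACG')
    alt: alternate allele at the central position (e.g. 'T')
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1:
        raise ValueError("expected a trinucleotide context and a single-base alt")
    if set(context) - set("ACGT") or alt not in "ACGT":
        raise ValueError("only A, C, G and T are supported")
    ref = context[1]
    if ref == alt:
        raise ValueError("ref and alt must differ")
    # Collapse purine-reference SNVs onto the opposite strand so ref is C or T
    if ref in "AG":
        context = revcomp(context)
        alt = alt.translate(COMPLEMENT)
        ref = context[1]
    return f"{context[0]}[{ref}>{alt}]{context[2]}"


def spectrum(snvs):
    """Count (context, alt) pairs per channel; the result covers all 96 channels."""
    channels = [
        f"{five}[{ref}>{alt}]{three}"
        for ref in "CT"
        for alt in "ACGT"
        if alt != ref
        for five in "ACGT"
        for three in "ACGT"
    ]
    counts = dict.fromkeys(channels, 0)
    for context, alt in snvs:
        counts[sbs96_channel(context, alt)] += 1
    return counts
```

For example, `spectrum([("ACG", "T"), ("CGT", "A")])` places both SNVs in the `A[C>T]G` channel, because a G>A change in `CGT` is the same event as a C>T change in `ACG` read from the other strand.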
In development:
- Sequencing error and artifact correction
- Mutation signature fitting and de novo signature discovery
- Phylogenetic analysis
  - Distance-based
  - Maximum likelihood
  - Bayesian
- eQTL analysis (using genome-transcriptome co-assay data)
Install the latest development version:
git clone https://github.com/harrispopgen/cellspec.git
cd cellspec
pip install -e .
As an homage to the semipermeable capsule technology that spurred the need for this package, I encourage the following convention when importing cellspec:
import cellspec as spc
cellspec uses the {class}`~anndata.AnnData` class to store joint calling data.
From the scanpy docs:
> At the most basic level, an {class}`~anndata.AnnData` object `adata` stores a data matrix `adata.X`, annotation of observations `adata.obs` and variables `adata.var` as `pd.DataFrame` and unstructured annotation `adata.uns` as `dict`. Names of observations and variables can be accessed via `adata.obs_names` and `adata.var_names`, respectively. {class}`~anndata.AnnData` objects can be sliced like dataframes, for example, `adata_subset = adata[:, list_of_gene_names]`.
In cellspec, observations are cells (or samples), and variables are biallelic sites. Genotype calls from the VCF file are stored in `adata.X`, and read depth information is stored in `adata.layers`: total read depth at each site in each observation is stored in `adata.layers["DP"]`, and alternate allele read depth is stored in `adata.layers["AD"]`.
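Because `DP` and `AD` are both cells × sites matrices, per-cell alternate allele fractions and simple coverage filters reduce to element-wise array operations. The NumPy sketch below uses hypothetical toy matrices shaped like `adata.layers["DP"]` and `adata.layers["AD"]`, assumes dense arrays (real layers may be sparse), and uses illustrative helper names and thresholds (`alt_allele_fraction`, `covered_sites`, `min_depth`, `min_cells`) that are not part of the cellspec API.

```python
import numpy as np

# Hypothetical toy data: 3 cells (rows, observations) x 4 biallelic sites (columns).
# DP = total read depth, AD = alternate allele read depth,
# mirroring adata.layers["DP"] and adata.layers["AD"].
DP = np.array([[10, 0, 5, 20],
               [8, 2, 0, 15],
               [12, 1, 7, 0]])
AD = np.array([[5, 0, 0, 20],
               [4, 1, 0, 0],
               [0, 1, 3, 0]])


def alt_allele_fraction(ad, dp):
    """AD / DP for every cell and site; NaN where a site has no reads in a cell."""
    ad = np.asarray(ad, dtype=float)
    dp = np.asarray(dp, dtype=float)
    vaf = np.full(dp.shape, np.nan)
    # Only divide where there is coverage; uncovered entries stay NaN
    np.divide(ad, dp, out=vaf, where=dp > 0)
    return vaf


def covered_sites(dp, min_depth=5, min_cells=2):
    """Boolean mask of sites with depth >= min_depth in at least min_cells cells."""
    return (np.asarray(dp) >= min_depth).sum(axis=0) >= min_cells


vaf = alt_allele_fraction(AD, DP)
mask = covered_sites(DP)
```

Here `mask` keeps sites 0, 2 and 3. With a real AnnData object, the same boolean mask could select sites by column slicing, for example `adata[:, mask]`.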
To load a vcf into anndata:
adata = spc.pp.load_vcf(filename)
This initial step can take a while, especially for datasets with many alleles. It is therefore a good idea to save your data in `.h5ad` format for faster loading in the future:
adata.write_h5ad(filename)
Please refer to the documentation and tutorials for further instructions, and to the API documentation for details on specific functionality.
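If the same VCF is loaded in many sessions, the save-then-reload advice above can be wrapped in a small helper. The sketch below is hypothetical (`load_or_cache` is not part of cellspec). It takes the two loader functions as arguments and assumes the object returned by the VCF loader has an AnnData-style `write_h5ad` method.

```python
from pathlib import Path


def load_or_cache(vcf_path, h5ad_path, load_vcf, read_h5ad):
    """Return the dataset, parsing the VCF only when no cached .h5ad exists.

    load_vcf:  callable that parses a VCF path (e.g. spc.pp.load_vcf)
    read_h5ad: callable that reads a saved .h5ad path (e.g. anndata.read_h5ad)
    """
    h5ad_path = Path(h5ad_path)
    if h5ad_path.exists():
        # Fast path: reuse the previously saved .h5ad file
        return read_h5ad(h5ad_path)
    # Slow path: parse the VCF once, then save it for future sessions
    adata = load_vcf(vcf_path)
    adata.write_h5ad(h5ad_path)
    return adata
```

A possible call is `adata = load_or_cache("calls.vcf", "calls.h5ad", spc.pp.load_vcf, anndata.read_h5ad)`, where both file names are placeholders. Note that the cache is not invalidated when the VCF changes, so the `.h5ad` file should be deleted after re-running variant calling.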