cellspec

A Python package for analyzing variant calls from high-throughput single-cell genome sequencing experiments. It provides a convenient scanpy-style API for loading joint-calling VCF files into AnnData objects and for performing downstream processing and analysis tasks, including:

Coverage analysis and filtering
Annotating the ancestral trinucleotide sequence context of SNPs
Computing trinucleotide mutation spectra
Visualizing mutation spectra

In development:

Sequencing error / artifact correction
Mutation signature fitting and de novo signature discovery
Phylogenetic analysis
- distance-based
- maximum likelihood
- Bayesian
eQTL analysis (using genome-transcriptome coassay data)

Installation

Install the latest development version:

git clone https://github.com/harrispopgen/cellspec.git
cd cellspec
pip install -e .

Getting started

As an homage to the semipermeable capsule technology that spurred the need for this package, I encourage the following convention when importing cellspec:

import cellspec as spc

cellspec uses the anndata.AnnData class to store joint calling data.


From the scanpy docs:

At the most basic level, an anndata.AnnData object adata stores a data matrix adata.X, annotation of observations adata.obs and variables adata.var as pd.DataFrame and unstructured annotation adata.uns as dict. Names of observations and variables can be accessed via adata.obs_names and adata.var_names, respectively. anndata.AnnData objects can be sliced like dataframes, for example, adata_subset = adata[:, list_of_gene_names].

In cellspec, observations are cells (or samples), and variables are bi-allelic sites. Genotype calls from the VCF file are stored in adata.X, and depth information is stored in adata.layers: total read depth at each site in each observation is in adata.layers["DP"], and alternate allele read depth is in adata.layers["AD"].

To load a VCF file into an AnnData object:

adata = spc.pp.load_vcf(filename)

This initial step can take a long time, especially for datasets with many alleles. As such, it is a good idea to save your data in .h5ad format for faster loading in the future:

adata.write_h5ad(filename)

Please refer to the documentation and tutorials for further instructions, and to the API documentation for details on specific functionality.
