Data analysis and inference in the paper Quantifying the dynamics of memory B cells and plasmablasts in healthy individuals
arxiv: https://arxiv.org/pdf/2510.02812
- Phad data: contains all the scripts to download the dataset of https://www.nature.com/articles/s41590-022-01230-1, process the B-cell repertoire data of memory cells and plasmablasts, and infer the different models discussed in the paper.
- Mikelov data: the same type of analysis as for the Phad data, but for https://elifesciences.org/articles/79254.
- func_py: Python functions used in the analysis.
- func_build: C++ functions that are wrapped in Python scripts.
Within the two folders Phad data and Mikelov data there are numbered scripts that:
- Scripts 1.: download the FASTQ files
- Scripts 2. to 5.: process the FASTQ files and align the IG sequences
- Scripts 6.: cluster the sequences into clonal families
- Scripts 7.: run the noise inference and plot the marginals
- Scripts 8.: run the Geometric Brownian Motion inference for memory cells and plot the marginals
- Scripts 9.: run the inference of the coupled memory-plasmablast system and plot the marginals
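As background for the Geometric Brownian Motion model inferred by scripts 8, here is a minimal illustrative simulation of clone-size trajectories. This is not the repository's inference code, and all parameter values below are made up:

```python
import numpy as np

def simulate_gbm(x0, mu, sigma, dt, n_steps, n_clones, rng):
    """Simulate Geometric Brownian Motion in log-space:
    log x(t+dt) = log x(t) + (mu - sigma^2/2)*dt + sigma*sqrt(dt)*xi, xi ~ N(0,1)."""
    log_x = np.full(n_clones, np.log(x0))
    for _ in range(n_steps):
        xi = rng.standard_normal(n_clones)
        log_x += (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * xi
    return np.exp(log_x)

# Hypothetical parameters, chosen only for illustration
rng = np.random.default_rng(0)
sizes = simulate_gbm(x0=100.0, mu=0.1, sigma=0.5, dt=0.01,
                     n_steps=1000, n_clones=5000, rng=rng)
# Over total time T = 10, the mean log-size drifts by roughly (mu - sigma^2/2) * T
print(np.log(sizes).mean())
```

The log-space update is the standard exact discretization of GBM; averaging over many simulated clones recovers the drift term, which is the kind of population-level signal the inference scripts estimate from the repertoire snapshots.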
Note that the numbering is consistent across the two folders, but the specific type of analysis can differ between the two datasets.
The analysis splits into two parts: data processing (scripts 1 to 6) and model inference (scripts 7 to 9).
The repository does not contain intermediate files (for size reasons), and the scripts have to be run in the precise sequence to obtain the final tables. To run the scripts one needs the following software:
- Cell Ranger (version 7.1.0): for the single-cell data analysis of the Phad dataset.
- Change-O toolkit (https://changeo.readthedocs.io/en/stable/): for the Ig sequence alignment using IgBlast (version 1.22.0) for both datasets.
- Hilary (https://github.com/statbiophys/hilary): for the clonal family assignment of both datasets. The final tables of clone clusters (the output of scripts 6) are uploaded to the repository. This makes it possible to run the downstream inference analysis directly, without re-running the previous pipeline.
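To give an idea of how the uploaded clone-cluster tables can be consumed downstream, here is a hedged sketch using pandas. The column names (`sequence_id`, `clone_id`, `cell_type`) are hypothetical, and a stand-in table is built inline; the repository's actual tables from scripts 6 may use different names:

```python
import io
import pandas as pd

# Stand-in for a clone-cluster table; the real tables are the outputs of scripts 6
csv = io.StringIO(
    "sequence_id,clone_id,cell_type\n"
    "s1,c1,memory\n"
    "s2,c1,plasmablast\n"
    "s3,c2,memory\n"
)
clones = pd.read_csv(csv)

# Clone-family sizes: the basic count from which dynamics can be inferred
sizes = clones.groupby("clone_id").size()
print(sizes.to_dict())
```

Grouping by the clone identifier and counting rows yields the family-size distribution, which is the natural starting point for the inference notebooks.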
All the intermediate files generated by those scripts are saved in the repo, so each notebook can be run independently. Here we provide a Dockerfile that installs all the dependencies and directly launches the Python notebooks with JupyterLab.
To build and run the Dockerfile, clone the repository and make sure Docker (https://www.docker.com/) is installed. Then change directory into the repo and type the following commands:
```shell
sudo docker build -t infer-b-cell-nb .
sudo docker run -p 8888:8888 -v $(pwd):/workspace infer-b-cell-nb
```

The last command starts Jupyter, but you still need to open it in your browser: copy the second of the two URLs printed in the terminal below the sentence "Or copy and paste one of these URLs:", and paste it into your browser.
For each figure we point to the notebook (in both dataset folders) used to generate it.
- Fig. 2: 7.3_infer_noise_marginals.ipynb
- Fig. 3: 8.3_gbm_marginals.ipynb
- Fig. 4: 9.2_memplasm_marginals.ipynb
- Figs. S1, S2: 5_seq_info.ipynb
- Figs. S3, S4: 7.2_infer_noise_results.ipynb
- Fig. S5: 7.3_infer_noise_marginals.ipynb
- Fig. S6: 8.2_infer_gbm.ipynb (Phad data)
- Fig. S7: 8.2_infer_gbm.ipynb (Mikelov data)
- Fig. S8: 8.3_gbm_marginals.ipynb
- Fig. S9: 8.3_gbm_marginals.ipynb
- Fig. S10: 9.1_infer_memplasm.ipynb (Mikelov data)
- Fig. S11: 9.1_infer_memplasm.ipynb (Phad data)
This code can be used to reproduce all the analyses and figures of the paper. I made some effort to comment it and explain how to execute it; however, some parts may still be hard for external users to read or run. If you are interested in running it and are having a hard time, please send me an email at andrea.mazzolini.90@gmail.com