TCRtoolbox is a collection of tools related to TCR sequences used in the Schumacher lab. TCRtoolbox provides command line interface (CLI) tools for designing and preparing TCR assembly runs (including TCR reference .fa file generation), analysing Illumina bulk TCR and Ag-barcoded sequencing data, and reconstructing TCR amino acid or nucleotide sequences from V/J/CDR3 information. In addition, the tcr_toolbox package provides many helper functions for working with TCR sequences.
# Clone the repo, install environment, install TCRtoolbox:
git clone https://github.com/schumacherlab/TCRtoolbox.git
cd TCRtoolbox
conda env create -f tcr-toolbox_ARM_env.yml -n py312-tcr-toolbox
conda activate py312-tcr-toolbox
pip install -e .
# Download the datasets on which TCRtoolbox depends (see section `TCRtoolbox datasets`)
curl -L -o tcr_toolbox_data.zip "https://zenodo.org/record/17832522/files/tcr_toolbox_data.zip?download=1"
unzip -q tcr_toolbox_data.zip
# Set up TCRtoolbox/.env: replace [path_to] with the path where you downloaded these files on your local system
vi .env
tcr_toolbox_data_path='[path_to]/tcr_toolbox_data'
v_gene_barcode_tracing_path='[path_to]/tcr_toolbox_data/tcr_toolbox_datasets/tcr_assembly/barcode_tracing'
### Tutorials ####
# For detailed tutorials have a look at the markdown tutorials in: TCRtoolbox/tutorials
# Simulate a TCR assembly run:
tcr_toolbox run-tcr-assembly configs/tcr_assembly/simulation_run_config.json
# Analyse bulk TCR read sequencing data for a DNA sequencing library protocol:
tcr_toolbox count-reads-bulk configs/tcr_assembly/run_config_count_reads_bulk_150bp_custom-tcr_bwa.json
# Reconstruct TCRs from CDR3 and VDJ input:
tcr_toolbox reconstruct-tcrs-simple configs/tcr_reconstruction/tcr_reconstruction_simple.json
- configs/ - configuration template files for the command line tools.
- tutorials/ - tutorials for running command line tools using configuration file templates.
- tcr_toolbox/ - main Python package.
  - sequencing_analysis/ - Illumina bulk TCR beta or alpha chain and antigen sequencing read counter scripts.
  - tcr_assembly/ - TCR assembly run design and preparation code.
  - tcr_parsing/ - parsers for different TCR input formats.
  - tcr_reconstruction/ - full TCR amino acid and nucleotide sequence reconstruction from V/J/CDR3 information.
  - utils/ - utilities used across modules.
Clone the package from GitHub:
git clone https://github.com/schumacherlab/TCRtoolbox.git
This project is distributed with multiple Conda environment YAML files tailored to specific platforms. Use Conda to create and activate an environment from the YAML file that matches your system.
Example (create and activate):
cd TCRtoolbox
conda env create -f tcr-toolbox_ARM_env.yml -n py312-tcr-toolbox
conda activate py312-tcr-toolbox
Which YAML file to use:
- tcr-toolbox_x86_env.yml on x86 machines. Has a bwa dependency that does not (yet) work on ARM machines (e.g. Apple M-series processors).
- tcr-toolbox_ARM_env.yml on ARM machines. Replaces bwa with minimap2.
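If you are unsure which file applies to your machine, the choice can be derived from the CPU architecture that Python reports. The following is a minimal sketch: `pick_env_yaml` is a hypothetical helper that is not part of tcr_toolbox, and the architecture strings it recognises are an assumption covering common Linux, macOS and Windows values.

```python
import platform

# The two environment files named in this README.
X86_ENV = "tcr-toolbox_x86_env.yml"
ARM_ENV = "tcr-toolbox_ARM_env.yml"


def pick_env_yaml(machine=None):
    """Return the environment YAML for a CPU architecture string.

    Hypothetical helper for illustration; defaults to the architecture
    of the machine running this code.
    """
    machine = (machine or platform.machine()).lower()
    if machine in {"arm64", "aarch64"}:
        return ARM_ENV
    if machine in {"x86_64", "amd64"}:
        return X86_ENV
    raise ValueError(f"Unrecognised architecture: {machine!r}")


# Explicit architecture strings, so the result does not depend on the host.
print(pick_env_yaml("arm64"))
print(pick_env_yaml("x86_64"))
```

Calling `pick_env_yaml()` without an argument uses the architecture of the current machine; pass the returned file name to `conda env create -f`.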
Install the TCRtoolbox package with pip inside your active conda environment (run pip in the directory that contains pyproject.toml):
cd TCRtoolbox
pip install -e .
TCRtoolbox also depends on several data files (see TCRtoolbox datasets for a more detailed explanation). Download and unpack them:
curl -L -o tcr_toolbox_data.zip "https://zenodo.org/record/17832522/files/tcr_toolbox_data.zip?download=1"
unzip -q tcr_toolbox_data.zip
To use tcr_toolbox_data you need a .env file. You can use vi, or any text editor of your liking, to create or adapt the .env file. In the lines below, replace [path_to] so that each path points to where these files are located on your system.
vi .env
tcr_toolbox_data_path='[path_to]/tcr_toolbox_data'
v_gene_barcode_tracing_path='[path_to]/tcr_toolbox_data/tcr_toolbox_datasets/tcr_assembly/barcode_tracing'
The repository contains a small set of standalone Markdown tutorials that explain how to run our pipelines. These can be found in the tutorials directory. Quick summary:
- tutorials/assembly_tutorial.md : TCR assembly run design and preparation using a single command and a configuration file.
- tutorials/count_bulk_illumina_seq_reads_cli_tutorial.md : counting bulk Illumina reads for different DNA sequencing library protocols using a single command and the provided protocol-specific config templates. Also covers reference .fa file generation and monitoring of bulk read counting jobs. For example, the TCR reactivity screen beta chain bulk sequencing data from Moravec et al., Nat. Biotech. 2024 can be counted.
- tutorials/tcr_reconstruction_tutorial.md : minimal pipeline to reconstruct full-length TCR sequences (Leader+Va+CDR3a+Ja+constanta and Leader+Vb+CDR3b+Jb+constantb) from V, J and CDR3 annotations and the constant sequence. Leader and constant sequence are optional.
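To illustrate the segment order used in the reconstruction tutorial, the sketch below joins segments as Leader+V+CDR3+J+constant, with leader and constant optional. It is only an illustration of the ordering, not the package's implementation: `join_tcr_chain` is a hypothetical helper, the inputs are toy placeholder strings, and it assumes the V and J segments have already been trimmed so that they do not overlap the CDR3.

```python
def join_tcr_chain(v, cdr3, j, leader="", constant=""):
    """Concatenate pre-trimmed segments as Leader+V+CDR3+J+constant.

    Hypothetical helper: V, CDR3 and J are required, while leader and
    constant default to empty strings because they are optional.
    """
    for name, segment in (("v", v), ("cdr3", cdr3), ("j", j)):
        if not segment:
            raise ValueError(f"{name} segment is required")
    return leader + v + cdr3 + j + constant


# Toy placeholder strings, not real TCR sequences.
alpha = join_tcr_chain(v="VVVV", cdr3="CAAAF", j="JJJ", leader="LL", constant="KKK")
print(alpha)  # LLVVVVCAAAFJJJKKK

# Without leader and constant only V+CDR3+J is returned.
print(join_tcr_chain(v="VVVV", cdr3="CAAAF", j="JJJ"))  # VVVVCAAAFJJJ
```

The same function would be called once per chain (alpha and beta). For real data, use the pipeline described in the tutorial, which works from V, J and CDR3 annotations.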
Many TCRtoolbox functions and pipelines depend on several files provided outside of this code repository. The tcr_toolbox_data archive can be downloaded from https://doi.org/10.5281/zenodo.17832522,
or fetched from the command line using:
cd TCRtoolbox
curl -L -o tcr_toolbox_data.zip "https://zenodo.org/record/17832522/files/tcr_toolbox_data.zip?download=1"
unzip -q tcr_toolbox_data.zip
It contains the following files:
- tcr_toolbox_datasets/:
  - tcr_assembly/: contains files needed for the TCR assembly and TCR reconstruction pipelines. Some files are required for our robotics TCR assembly platform and might not be strictly required for your purpose.
  - tcr_reconstruction/: contains files needed for TCR reconstruction (such as IMGT reference sequences).
- tcr_toolbox_tcr_assembly_runs/: can be empty; when you use the TCR assembly pipeline, output files will be written here.
- test: minimal dataset and configs to test command line pipelines.
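After unpacking the archive and filling in .env, it can be worth checking that both configured paths exist before starting a pipeline. The sketch below parses the simple key='value' format shown in this README and reports entries that do not point to an existing directory. `parse_env` and `missing_paths` are hypothetical helpers, not part of tcr_toolbox, and this is not necessarily how the package itself reads .env.

```python
from pathlib import Path


def parse_env(text):
    """Parse simple key='value' lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding single or double quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_paths(env):
    """Return the keys whose configured path is not an existing directory."""
    return [key for key, path in env.items() if not Path(path).is_dir()]


# The two entries from this README, with [path_to] still unedited.
example = """
tcr_toolbox_data_path='[path_to]/tcr_toolbox_data'
v_gene_barcode_tracing_path='[path_to]/tcr_toolbox_data/tcr_toolbox_datasets/tcr_assembly/barcode_tracing'
"""
env = parse_env(example)
print(missing_paths(env))
```

To check a real file, pass `Path(".env").read_text()` to `parse_env`; an empty list means every configured path resolves to a directory.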
Besides the provided pipelines, tcr_toolbox can be used as a Python package.
- Import package utilities in Python (example):
from tcr_toolbox import tcr_assembly_pipeline
# see subpackages for specific functions and helpers
Contributions are welcome:
- Open an issue if you find a bug or want a new feature
- Send a pull request with tests that demonstrate fixes/improvements
This repository is developed and maintained by members of the Schumacher lab. For questions, please open an issue or contact the maintainers listed in the repository metadata.
TCRtoolbox is provided under the Apache 2.0 licence (see LICENCE.txt)