
ObenaufLab/CaTCHseq

CaTCHseq pipeline
================

Overview
--------
The CaTCHseq pipeline processes PCR-amplified CaTCH libraries from FASTQ files through mapping, barcode collapsing/deduplication, multiplet handling, and reporting. It is built with Nextflow (DSL2) and supports CellRanger or STARsolo mapping. Deduplication leverages umi_tools network-based clustering with tunable strategies and distances for CaTCH barcodes and UMIs.

Quick start (Docker-based defaults)
-----------------------------------
```bash
nextflow run nextflow/main.nf \
	--libraries <libraries.csv> \
	--outputDir ./CaTCHseq_OUTPUT \
	--reportsDir ./REPORTS
```

Key inputs
----------
- `--libraries` (CSV, required): columns SampleName, Condition, Replicate, LibraryType (GEX|CaTCHseq), R1, R2, CellNumber, Chemistry.
- Reference inputs vary by mapper:
	- CellRanger: `--index` (transcriptome), `--reference`, `--annotation` as needed for building indexes.
	- STARsolo: `--index`, `--reference`, `--annotation`, optional `--whitelist`.
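Before launching a run, it can help to check the `--libraries` CSV against the column list above. The following is a minimal sketch, not part of the pipeline: the function name `check_libraries`, the sample names, the file paths, and the `Chemistry` value `10Xv3` are all hypothetical placeholders, and the pipeline may apply additional checks of its own.

```python
import csv
import io

# Column names and LibraryType values as listed under "Key inputs".
REQUIRED_COLUMNS = [
    "SampleName", "Condition", "Replicate", "LibraryType",
    "R1", "R2", "CellNumber", "Chemistry",
]
LIBRARY_TYPES = {"GEX", "CaTCHseq"}


def check_libraries(handle):
    """Return a list of problems found in a libraries CSV (empty list means OK)."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if row["LibraryType"] not in LIBRARY_TYPES:
            problems.append(
                f"line {line_no}: LibraryType must be GEX or CaTCHseq, "
                f"got {row['LibraryType']!r}"
            )
    return problems


# Hypothetical example content; paths and Chemistry values are placeholders.
EXAMPLE_CSV = """SampleName,Condition,Replicate,LibraryType,R1,R2,CellNumber,Chemistry
sampleA,treated,1,GEX,/data/sampleA_gex_R1.fastq.gz,/data/sampleA_gex_R2.fastq.gz,5000,10Xv3
sampleA,treated,1,CaTCHseq,/data/sampleA_bc_R1.fastq.gz,/data/sampleA_bc_R2.fastq.gz,5000,10Xv3
"""

print(check_libraries(io.StringIO(EXAMPLE_CSV)))  # prints []
```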

Major parameters
----------------
- General
	- `--mapper` (CellRanger|STAR) and mapper-specific params `--cellranger_params`, `--star_params`, `--idx_params`.
	- `--withQC` to run FastQC/MultiQC; `--fastqc_params` for extra options.
	- `--chunkSize` sets the read chunk size used for barcode counting.
	- `--minReads` minimum reads per CaTCH barcode; `--filter` to use filtered counts.
	- `--min_detected_barcodes`, `--singlet_cutoff`, `--bc1_cutoff`, `--bc2_cutoff` for downstream classification.

- Barcode/UMI collapsing (umi_tools-backed)
	- Distance cutoffs:
		- `--maxDist` global fallback (default 1).
		- `--maxDistCaTCH` distance for CaTCH barcode collapsing (default 1; falls back to `--maxDist`).
		- `--maxDistUMIs` distance for UMI collapsing (default 1; falls back to `--maxDist`).
	- Network methods:
		- `--clusterMethodCaTCH` (directional/adjacency/cluster, default directional).
		- `--clusterMethodUMIs` (directional/adjacency/cluster, default directional).
	- Uniqueness toggle: `--uniqueCaTCH` (true/false). When true, barcodes are collapsed with umi_tools-based clustering; when false, collapsing uses Hamming distance.
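To illustrate what a distance cutoff and the `directional` method mean in practice, here is a simplified pure-Python sketch of the directional idea described by UMI-tools: a sequence absorbs a neighbour within the distance cutoff when its own count is at least twice the neighbour's count minus one. This is not the pipeline's code and not the umi_tools implementation; the function names and the example counts are made up for illustration.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def directional_clusters(counts, max_dist=1):
    """Group sequences following the directional rule.

    An edge parent -> child exists when hamming(parent, child) <= max_dist
    and counts[parent] >= 2 * counts[child] - 1. Clusters are grown from
    the most abundant unassigned sequence.
    """
    # Most abundant first; ties broken alphabetically for determinism.
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    assigned = set()
    clusters = []
    for seed in ordered:
        if seed in assigned:
            continue
        assigned.add(seed)
        cluster = [seed]
        stack = [seed]
        while stack:
            node = stack.pop()
            for other in ordered:
                if other in assigned:
                    continue
                close = hamming(node, other) <= max_dist
                dominant = counts[node] >= 2 * counts[other] - 1
                if close and dominant:
                    assigned.add(other)
                    cluster.append(other)
                    stack.append(other)
        clusters.append(cluster)
    return clusters


# Hypothetical barcode counts.
example_counts = {"ACGT": 100, "ACGA": 3, "TCGA": 1, "GGGG": 50}
print(directional_clusters(example_counts, max_dist=1))
# prints [['ACGT', 'ACGA', 'TCGA'], ['GGGG']]
```

In this example `TCGA` is two mismatches from `ACGT` but still joins its cluster through `ACGA`, which is why network-based methods can merge more than a plain pairwise distance cutoff would.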

Pipeline steps (high level)
---------------------------
1) **QC** (optional): FastQC → MultiQC summaries.
2) **Mapping**: CellRanger count or STARsolo (with chemistry-specific presets).
3) **Barcode counting**: Count CaTCH barcodes in chunks and merge.
4) **Collapse & filter**: Apply distance/method settings to deduplicate CaTCH barcodes and UMIs; remove background.
5) **Multiplet resolution**: Majority-vote-based merging of multiplets.
6) **Reports & tables**: Generate CaTCH barcode and cell summaries plus analytics plots.
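Step 3 (count in chunks, then merge) can be pictured with the sketch below. It only illustrates the general pattern of per-chunk counting followed by a merge and a minimum-read filter in the spirit of `--chunkSize` and `--minReads`; the function names and the input list are hypothetical, and the pipeline's own scripts may be organised differently.

```python
from collections import Counter
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def count_in_chunks(barcodes, chunk_size):
    """Count barcodes chunk by chunk and merge the partial counts."""
    total = Counter()
    for chunk in chunked(barcodes, chunk_size):
        total.update(Counter(chunk))  # merge this chunk's counts
    return total


def apply_min_reads(counts, min_reads):
    """Keep only barcodes supported by at least `min_reads` reads."""
    return {bc: n for bc, n in counts.items() if n >= min_reads}


# Hypothetical barcode reads.
reads = ["ACGT", "ACGT", "GGGG", "ACGT", "TTTT", "GGGG"]
merged = count_in_chunks(reads, chunk_size=4)
print(apply_min_reads(merged, min_reads=2))  # prints {'ACGT': 3, 'GGGG': 2}
```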

CLI mapping for collapse script
-------------------------------
- Nextflow params map to `collapseCaTCHbarcodes.py`:
	- `--clusterMethodCaTCH` → `--cluster-method-catch`
	- `--clusterMethodUMIs`  → `--cluster-method-umis`
	- `--maxDistCaTCH`       → `--maxdist-catch`
	- `--maxDistUMIs`        → `--maxdist-umis`
- All default to the original behaviour (directional, distance 1) if not set.
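The mapping and fallback rules above can be summarised as a small sketch. The flag names come from the list above; the helper `collapse_args` and the plain dictionary of parameters are hypothetical, and how the Nextflow process actually assembles the command line is not shown here.

```python
def collapse_args(params):
    """Translate Nextflow-style params into collapseCaTCHbarcodes.py flags.

    Unset cluster methods default to "directional"; unset per-type
    distances fall back to --maxDist, which itself defaults to 1.
    """
    max_dist = params.get("maxDist", 1)
    return [
        "--cluster-method-catch", str(params.get("clusterMethodCaTCH", "directional")),
        "--cluster-method-umis", str(params.get("clusterMethodUMIs", "directional")),
        "--maxdist-catch", str(params.get("maxDistCaTCH", max_dist)),
        "--maxdist-umis", str(params.get("maxDistUMIs", max_dist)),
    ]


# No overrides: everything resolves to directional, distance 1.
print(collapse_args({}))

# A global distance of 2 with a stricter UMI-specific distance.
print(collapse_args({"maxDist": 2, "maxDistUMIs": 1}))
```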

Outputs
-------
- `OUTPUT/Counts/` – intermediate and collapsed `.sclib` libraries and stats.
- `OUTPUT/Reports/` – tables (`*.CaTCHbarcodes`, `*.cells`) and plots.
- `OUTPUT/CellRanger/` or `OUTPUT/STAR/` – mapper-specific outputs.

Tips
----
- Provide chemistry-specific parameters for STARsolo if deviating from 10X presets.
- Ensure the `--libraries` CSV and the paths it lists are accessible where Nextflow executes (on the local filesystem, or mounted into the work directory when running in containers).
