nf_xpatial is a best-practices bioinformatics pipeline written in Nextflow that can be used to perform tertiary analysis on 10x Xenium data. It uses the output directories produced by the Xenium Onboard Analysis (XOA) instrument as input and performs quality control, filtering, normalization, and clustering, and generates configurable figures that can be reviewed individually or in the final summary report.
Notably, for cases where the raw data no longer matches the Xenium Onboard Analysis outputs (e.g., a sample was re-segmented with a third-party tool), nf_xpatial also accepts Seurat objects as input (one unprocessed Seurat object per sample).
- Create Seurat object(s) from Xenium output
- Generate QC images for raw data
  - Cell Area QC (Area Box Plot, Area Histogram Plot, Overlapping Histogram Plot)
  - Cell Shape QC (Cell Segmentation Proportion Plot, Cell Shape Proportion Plot)
  - General QC (Image Dim Plot, nFeature/nCount Violin Plot, nFeature/nCount Feature Scatter Plot, nFeature Dim Plot, nCount Dim Plot)
- Filter the Seurat object
- Generate QC images for post-filtered data
  - Cell Area QC (Area Box Plot, Area Histogram Plot, Overlapping Histogram Plot)
  - Cell Shape QC (Cell Segmentation Proportion Plot, Cell Shape Proportion Plot)
  - General QC (Image Dim Plot, nFeature/nCount Violin Plot, nFeature/nCount Feature Scatter Plot, nFeature Dim Plot, nCount Dim Plot)
- Normalize the data: choose between area normalization, log normalization, or execute both
  - Gene Pair QC (Barnyard Plot, Heatmap Plot)
- Merge normalized Seurat objects
- Integrate the data (with Harmony)
- Perform Seurat clustering for single-cell clustering
  - Scale data
  - Run PCA
  - Run Harmony
  - Run UMAP
  - Find Neighbors
  - Find Clusters
- Perform BANKSY clustering for single-cell and/or spatial-domain clustering (Note: this can be executed with BANKSY or with the BANKSY Seurat Wrapper)
  - Convert to Spatial Experiment
  - Stagger Spatial Coordinates
  - Compute BANKSY Matrix
  - Compute BANKSY PCA
  - Run Harmony BANKSY
  - Run BANKSY UMAP
- Merge BANKSY and Seurat clustered objects into a single object
- Generate Cluster QC images for all parameter combinations (UMAP Dim Plot, Split Cluster Plot, Marker Violin Plot, Marker Dot Plot)
- Generate summary report
First, prepare a CSV file containing metadata for the samples to be analyzed. You can create a separate metadata CSV for each sample or a single metadata CSV that contains information for all samples. The only required columns are SampleID and BiologicalGroup; any additional columns are stored in the Seurat object.
metadata.csv
SampleID,BiologicalGroup
XNM001,Control
XNM002,Treatment
XNM003,Control
XNM004,Treatment

If you have any tissue annotations, i.e., regions that you have drawn and labelled using Xenium Explorer, you can add these to the Seurat object. Once the annotations are exported from Xenium Explorer, they need to be reformatted so that all the annotations are in a tab-delimited file with the columns Cell_ID and Tissue_annotation. To assist with this step, we provide a script in this repository (bin/gather_xenium_explorer_annotations.sh) that processes the exports from Xenium Explorer into the format needed by this pipeline.
Additionally, this step can be used to remove parts of a slide by labelling the region you wish to remove as REMOVE in Xenium Explorer. The most common use cases are removing parts of a sample that have folded over on themselves and removing regions that come from a different sample (NOTE: The pipeline does not currently have a way to add these regions back to the sample they belong to).
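To make the expected annotation format concrete, here is a minimal Python sketch that reads tab-delimited text with the Cell_ID and Tissue_annotation columns and separates cells labelled REMOVE from the rest. The function name split_annotations and the inline sample text are illustrative assumptions; this is not the pipeline's own code, only a way to sanity-check a file before passing it in.

```python
import csv
import io


def split_annotations(tsv_text):
    """Split annotation text into (kept annotations, removed cell IDs).

    Hypothetical helper for checking a file by hand; not part of nf_xpatial.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    required = {"Cell_ID", "Tissue_annotation"}
    missing = required - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    annotations = {}  # Cell_ID -> region label
    removed = []      # Cell_IDs labelled REMOVE
    for row in reader:
        if row["Tissue_annotation"] == "REMOVE":
            removed.append(row["Cell_ID"])
        else:
            annotations[row["Cell_ID"]] = row["Tissue_annotation"]
    return annotations, removed


# Illustrative input only; real files come from the Xenium Explorer exports.
example = (
    "Cell_ID\tTissue_annotation\n"
    "efhphlac-1\ta\n"
    "fpldnmpm-1\tc\n"
    "mbohedjb-1\tREMOVE\n"
)
annotations, removed = split_annotations(example)
print(annotations)
print(removed)
```

A file that raises the ValueError here is missing one of the two required column names, which is the most likely formatting problem after a manual export.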
An example file format for the cases described above is presented below:
Case where specific cells map to slide regions named a, b, and c:
Cell_ID Tissue_annotation
efhphlac-1 a
fpldnmpm-1 c
bmhpfjfb-1 c
eonmgbhj-1 b
gmjbldbh-1 c
cjfbfjmn-1 b

Case where the cells below will be removed from all downstream processing:
Cell_ID Tissue_annotation
mbohedjb-1 REMOVE
mbohdkcj-1 REMOVE
bieojgni-1 REMOVE
mbpkcimm-1 REMOVE
mbpkhbip-1 REMOVE
mbohjhae-1 REMOVE

Finally, prepare a samplesheet with your input data that looks as follows. Note that the samplesheet must contain the column names sample, xenium, metadata, and manual_annotation:
samplesheet.csv:
sample,xenium,metadata,manual_annotation
XNM001,/path/to/XNM001_xenium_output,/path/to/xenium_metadata.csv,/path/to/XNM001_manual_annotation.csv
XNM002,/path/to/XNM002_xenium_output,
XNM003,/path/to/XNM003_xenium_output,
XNM004,/path/to/XNM004_xenium_output,/path/to/xenium_metadata.csv,/path/to/XNM004_manual_annotation.csv

Each row represents a directory produced by the Xenium Onboard Analysis.
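Before launching a run, it can help to check the samplesheet header by hand. The following Python sketch verifies that the four required column names are present and that every row has a sample name. The helper name check_samplesheet is hypothetical, and the duplicate-name check reflects an assumption that sample names should be unique; the pipeline performs its own validation, which this does not replace.

```python
import csv
import io

REQUIRED_COLUMNS = ["sample", "xenium", "metadata", "manual_annotation"]


def check_samplesheet(csv_text):
    """Return a list of problems found in the samplesheet text (empty if none)."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if not rows:
        return ["samplesheet is empty"]
    header, body = rows[0], rows[1:]

    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        return [f"missing columns: {missing}"]

    sample_idx = header.index("sample")
    problems = []
    seen = set()
    for row_no, row in enumerate(body, start=1):
        sample = row[sample_idx].strip() if len(row) > sample_idx else ""
        if not sample:
            problems.append(f"row {row_no}: empty sample name")
        elif sample in seen:
            # Assumption: sample names are expected to be unique.
            problems.append(f"row {row_no}: duplicate sample {sample}")
        seen.add(sample)
    return problems


example = """sample,xenium,metadata,manual_annotation
XNM001,/path/to/XNM001_xenium_output,/path/to/xenium_metadata.csv,/path/to/XNM001_manual_annotation.csv
XNM002,/path/to/XNM002_xenium_output,
"""
print(check_samplesheet(example))
```

An empty list means the header and sample names passed these basic checks; it says nothing about whether the paths exist.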
A marker list is recommended to support the evaluation of the single-cell and spatial clusters. This list should be a two-column CSV file and must contain the columns group and gene. An example is shown below; provide this file to the --marker_gene_list option of nf_xpatial:
group,gene
Celltype Marker,Gad1
Celltype Marker,Drd1
Celltype Marker,Drd2
Celltype Marker,Slc17a6
Celltype Marker,Slc17a7
Celltype Marker,Aqp4
Celltype Marker,Pdgfra

Although the workflow supports multiple categories in the group column, only the first category in the CSV is displayed in the summary report for simplicity (the example above uses a single category for all cell type markers). When multiple groups are given, all of them still appear in the generated figures: the dot plots and violin plots group genes by the group column and print each group on a separate page of their PDF output. By default, a group with more than 50 genes is spread over multiple pages. This limit can be changed in a custom config by modifying the --max_genes_per_group <INT> parameter for the MARKER_DOT_PLOT and MARKER_VLN_PLOT processes.
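As a rough illustration of the paging behaviour described above, the Python sketch below groups a marker list by its group column and cuts each group into pages of at most max_genes_per_group genes. The function paginate_markers is hypothetical, and splitting into consecutive chunks is an assumption about how the plots are laid out; the actual plotting processes may divide the genes differently.

```python
import csv
import io


def paginate_markers(csv_text, max_genes_per_group=50):
    """Return a list of (group, genes) pages, each holding at most
    max_genes_per_group genes. Illustrative only; not the pipeline's code."""
    groups = {}  # dicts keep insertion order, so the first group stays first
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups.setdefault(row["group"], []).append(row["gene"])

    pages = []
    for group, genes in groups.items():
        for start in range(0, len(genes), max_genes_per_group):
            pages.append((group, genes[start:start + max_genes_per_group]))
    return pages


markers = """group,gene
Celltype Marker,Gad1
Celltype Marker,Drd1
Celltype Marker,Drd2
Celltype Marker,Slc17a6
Celltype Marker,Slc17a7
Celltype Marker,Aqp4
Celltype Marker,Pdgfra
"""
# A small limit is used here only to show the splitting on a short list.
for group, genes in paginate_markers(markers, max_genes_per_group=3):
    print(group, genes)
```

With the default limit of 50, the seven-gene example list fits on a single page.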
Now you can run the pipeline. In the example below, both the area and log normalization methods are enabled, but you can choose only one of them:
nextflow run U-BDS/nf_xpatial \
-profile <docker/singularity/.../institute> \
--input samplesheet.csv \
--normalization_method "area,log" \
--dim_Seurat "25,30" \
--res_Seurat "0.4,0.5,0.6,0.7" \
--lambda_BANKSY "0.2,0.8" \
--k_geom_BANKSY "15,30" \
--nPCs_BANKSY "20,30" \
--res_BANKSY "0.4,0.6,0.8,1.0" \
--outdir <OUTDIR>

For more details on additional parameters and usage, please refer to the advanced usage documentation.
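The comma-separated values in the command above can add up quickly. The Python sketch below counts the parameter combinations implied by the example, assuming that every listed value is combined with every other value of the same clustering method (a full Cartesian product). That assumption is consistent with the note that cluster QC images are generated for all parameter combinations, but it is not confirmed here, and the helper names are illustrative.

```python
from itertools import product


def parse_list(value):
    """Split a comma-separated CLI value such as "25,30" into its items."""
    return [item.strip() for item in value.split(",")]


def combinations(params):
    """Return every combination of the listed values as a list of dicts."""
    names = list(params)
    value_lists = [parse_list(params[name]) for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Values copied from the example command.
seurat = {"dim_Seurat": "25,30", "res_Seurat": "0.4,0.5,0.6,0.7"}
banksy = {
    "lambda_BANKSY": "0.2,0.8",
    "k_geom_BANKSY": "15,30",
    "nPCs_BANKSY": "20,30",
    "res_BANKSY": "0.4,0.6,0.8,1.0",
}

seurat_combos = combinations(seurat)
banksy_combos = combinations(banksy)
print(len(seurat_combos), len(banksy_combos))  # 8 32
```

Under this assumption the example yields 8 Seurat and 32 BANKSY combinations per normalization method, which is worth keeping in mind when choosing how many values to list.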
Warning
Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided by the Nextflow -c option, can be used to provide any configuration except for parameters.
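For runs with many parameters, a params file can be easier to manage than a long command line. Nextflow's -params-file option accepts JSON (or YAML), so one option is to generate the file with a short script. The Python sketch below renders the parameters from the example command as JSON; the function name render_params and the outdir value "results" are placeholders, not pipeline defaults.

```python
import json

# Parameter names and values taken from the example command;
# "results" is a placeholder for your own output directory.
params = {
    "input": "samplesheet.csv",
    "normalization_method": "area,log",
    "dim_Seurat": "25,30",
    "res_Seurat": "0.4,0.5,0.6,0.7",
    "lambda_BANKSY": "0.2,0.8",
    "k_geom_BANKSY": "15,30",
    "nPCs_BANKSY": "20,30",
    "res_BANKSY": "0.4,0.6,0.8,1.0",
    "outdir": "results",
}


def render_params(values):
    """Return the parameters as JSON text suitable for -params-file."""
    return json.dumps(values, indent=2) + "\n"


print(render_params(params), end="")
```

Saving the printed text as params.json (a file name chosen here for illustration) would let the run be started with nextflow run U-BDS/nf_xpatial -profile <docker/singularity/.../institute> -params-file params.json.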
(Sample data from Jeremy Day, Jamie Peters and Jasper Heinsbroek)
nf_xpatial produces a number of files and figures that can be used to review the quality of the data and refine clustering. However, the main outputs of this pipeline are .rds objects, each of which collects all clustering results in a single object. One .rds object is created for each normalization method specified by the --normalization_method pipeline parameter.
Because these objects contain all clustering results and all dimension reductions, they can be quite large, so it is prudent to filter them down to a single parameter combination (or a selection of combinations). To do that, it is important to know how the data is stored in each object:
- Each normalization stores its result in a specific assay: log_norm stores its data in the Xenium assay, while area_norm stores its data in the AreaNorm assay.
- The clustering calls are stored in the object's metadata with the following format: clust_[method]_[clustering parameters]. method may be one of (1) SEU (Seurat clustering), (2) BSKY (BANKSY clustering), or (3) BSKYSEU (BANKSY's Seurat wrapper). clustering parameters may be d (PCA dimensions), r (resolution), l (lambda), or k (k_geom).
- Dimension reductions follow a similar format to clustering, specifically [method]_[reduction]_[clustering parameters]. method is the same as described above, reduction may be one of (1) pca, (2) harmony, or (3) umap, and clustering parameters match those described above, except that reductions are calculated prior to clustering, so there are no resolution (r) parameters in their names.
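The naming scheme above makes it possible to pick out the clustering columns for one method by prefix. The Python sketch below shows the idea on a plain list of column names, such as one exported from the object's metadata. The function clustering_columns and the example column names are hypothetical; in particular, the exact layout of the parameter part of each name is assumed for illustration and should be checked against a real object.

```python
KNOWN_METHODS = ("SEU", "BSKY", "BSKYSEU")


def clustering_columns(column_names, method):
    """Return the metadata column names holding clustering calls for one method."""
    if method not in KNOWN_METHODS:
        raise ValueError(f"unknown method: {method}")
    # The trailing underscore keeps BSKY from also matching BSKYSEU columns.
    prefix = f"clust_{method}_"
    return [name for name in column_names if name.startswith(prefix)]


# Hypothetical column names; the parameter suffixes are assumed, not documented.
columns = [
    "SampleID",
    "BiologicalGroup",
    "clust_SEU_d25_r0.4",
    "clust_SEU_d30_r0.4",
    "clust_BSKY_l0.2_k15_d20_r0.4",
    "clust_BSKYSEU_l0.2_k15_d20_r0.4",
]
print(clustering_columns(columns, "BSKY"))
```

Matching on the full prefix, including the trailing underscore, matters because BSKY is itself a prefix of BSKYSEU.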
- For in-depth descriptions and locations of the additional outputs within the results folder, refer to this document (located at docs/output.md).
- We provide a brief guide to navigating the compiled objects produced by nf_xpatial. This guide can be found here. To assist with filtering the object(s), we provide this script (located at assets/filter_xenium_obj.R), along with some examples of how to use it.
U-BDS/nf_xpatial was originally written by Luke Potter, Nilesh Kumar, Austyn Trull, Lara Ianov.
We would also like to thank the following people and groups for their support, including financial support:
- Elizabeth Worthey
- Jeremy Day
- Jamie Peters
- Jasper Heinsbroek
- Frances Lund
- Funding:
  - Health Services Foundation’s General Endowment Fund
  - University of Alabama at Birmingham Biological Data Science Core (U-BDS), RRID:SCR_021766, https://github.com/U-BDS
  - Civitan International Research Center
  - UAB Office of Research
  - 3P30CA013148-48S8
  - UM1TR004771
  - UAB MULTIPI8110 and Dr. Worthey's start-up funds
If you would like to contribute to this pipeline, please see the contributing guidelines.
If you use U-BDS/nf_xpatial for your analysis, please cite it using the following:
Luke Potter, Nilesh Kumar, Austyn Trull, Lara Ianov, U-BDS/nf_xpatial, 10.5281/zenodo.19861933
Please check back regularly, as the citation is expected to change once a preprint is available.
This pipeline uses code and infrastructure developed and maintained by the nf-core initiative, and reused here under the MIT license.
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
In addition, an extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

