Single-Islet Proteomics Maps Pseudo-Temporal Islet Immune Responses and Dysfunction in Stage 1 Type 1 Diabetes
Shane S. Kelly, Soumyadeep Sarkar, Sarai M. Williams, An D. Fu, Elizabeth A. Butterworth, Tyler J. Sagendorf, Lorenz A. Nierves, Yumi Kwon, Xiaolu Li, Vladislav A. Petyuk, James M. Fulcher, Jing Chen, Ernesto S. Nakayasu, Mark A. Atkinson, Rohit N. Kulkarni, Clayton E. Mathews, Ying Zhu, Martha Campbell-Thompson, and Wei-Jun Qian
This repository contains the complete analysis pipeline for single-islet proteomics in stage 1 Type 1 Diabetes (T1D), supporting our manuscript currently available as a preprint.
Progressive β-cell dysfunction precedes the onset of T1D, yet the molecular mechanisms driving early T1D development remain poorly understood. This study applies a single-islet proteomics workflow to profile intra-donor islet heterogeneity in three stage 1 T1D cases with matched non-diabetic controls, defining in situ protein signatures of pseudo-temporal islet dysfunction.
- ~100 individual islets per donor were analyzed using laser microdissection (LMD) and nanoPOTS-based proteomics, revealing consistent proteomic patterns reflecting pseudo-time progression.
- Islet Immune Response Signature (IIRS): A 40-protein signature capturing immune-mediated islet progression, including HLA class I/II machinery, interferon-stimulated genes, antigen processing components, and novel candidates (PARP10, GSDMD, LGALS3BP, OPTN).
- Beta Cell Profile (BCP): A 42-protein panel representing β-cell identity and function, revealing progressive loss of β-cell characteristics in stage 1 T1D.
- Extracellular matrix (ECM) dysregulation was identified in association with islet immune response, while mRNA processing and RNA-splicing pathways were prominently associated with loss of β-cell function. ECM remodeling was also negatively correlated with β-cell function.
- Evidence of pseudo-temporal trajectories within individual donors, with immune activation and β-cell dysfunction occurring largely independently (Pearson correlation r = 0.13).
- Stage 1 T1D Donors: 3 donors (6450, 6521, 6267) - multiple autoantibody positive (mAAb+)
- Non-diabetic Controls: 3 age-, sex-, and race-matched donors (6178, 6440, 6539)
- Methodology: Single-islet spatial proteomics (~5,800 proteins per donor), multiplex immunohistochemistry (mIHC), machine learning, WGCNA, and pathway enrichment analyses
Raw mass spectrometry data and processed MSstats files are available from MassIVE:
- Accession: MSV000099920
Required files from MassIVE:
- MSstats.csv output files (FragPipe processed quantification data)
- sample_metadata.xlsx (sample phenotype and experimental metadata)
- uniprotkb_Human_2023_10_25.tsv.gz (UniProt human proteome reference for protein annotation)
Before running the analysis pipeline, download the required files from MassIVE and place them in the appropriate locations:
# Create data directory structure
mkdir -p data/Results_7_combined/Results_7.1 data/Results_7_combined/Results_7.2 data/Results_7_combined/Results_7.3
# Download all files from MassIVE and place them in the data directories
# - MSstats.csv files → data/Results_7_combined/Results_7.1/, Results_7.2/, and Results_7.3/
# - sample_metadata.xlsx → data/
# - uniprotkb_Human_2023_10_25.tsv.gz → data/
Additional required reference files are included in this repository under the data folder.
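Before launching the pipeline, it can help to confirm that the downloads landed where the comments above say they should. The following is a minimal shell sketch, not part of the repository: the function name `missing_inputs` is hypothetical, and it assumes that each run folder contains a file named exactly `MSstats.csv`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): list required input files
# that are absent beneath a data directory.
# $1 = data directory (defaults to ./data)
# Prints one missing path per line; returns 1 if anything is missing, else 0.
missing_inputs() {
    local data_dir="${1:-data}"
    local status=0
    local path run

    # Files expected directly under the data directory.
    for path in \
        "$data_dir/sample_metadata.xlsx" \
        "$data_dir/uniprotkb_Human_2023_10_25.tsv.gz"; do
        if [ ! -f "$path" ]; then
            echo "$path"
            status=1
        fi
    done

    # Assumption: each run folder holds a file named exactly MSstats.csv.
    for run in Results_7.1 Results_7.2 Results_7.3; do
        path="$data_dir/Results_7_combined/$run/MSstats.csv"
        if [ ! -f "$path" ]; then
            echo "$path"
            status=1
        fi
    done

    return "$status"
}
```

Calling `missing_inputs data` from the repository root prints nothing and returns 0 when all five files are present; any path it prints still needs to be downloaded from MassIVE.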
- R (≥4.5)
- Snakemake (≥9.14)
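To compare installed versions against these minimums, a small version-comparison helper can be sketched as follows. The function names `version_ge` and `report_version` are hypothetical, and the sketch assumes a `sort` that supports the `-V` (version sort) option, as GNU coreutils does. The version strings in the example calls are placeholders, not detected values.

```shell
#!/usr/bin/env bash
# Return success (0) when version $1 is greater than or equal to version $2.
# Assumption: `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report whether a tool meets its minimum version.
# $1 = label, $2 = detected version, $3 = minimum version
report_version() {
    if version_ge "$2" "$3"; then
        echo "OK: $1 $2 (>= $3)"
    else
        echo "TOO OLD: $1 $2 (need >= $3)"
    fi
}

# Placeholder versions for illustration; substitute the values reported by
# `R --version` and `snakemake --version` on your machine.
report_version "R" "4.5.1" "4.5"
report_version "Snakemake" "9.13.0" "9.14"
```

With the placeholder strings shown, the first call reports OK and the second reports TOO OLD, because version sort treats 9.13.0 as older than 9.14.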
This project uses renv for R package management, ensuring reproducible package versions. The Snakefile automatically initializes the environment on first run.
# Clone the repository
git clone https://github.com/skelly001/T1D_Single_Islets.git
cd T1D_Single_Islets
# The renv environment will be automatically set up when you run Snakemake (see below)
The complete analysis pipeline is automated using Snakemake. After setting up the data (see Data Setup above), run:
# Run the complete pipeline with automatic environment setup
snakemake --cores 4
Note: The Snakefile automatically:
1. Initializes the renv environment with all required packages
2. Executes all analysis stages in the correct order
3. Manages dependencies between analysis steps
The pipeline consists of 10 major stages that recapitulate the analyses presented in the manuscript:
- MSnSet Creation - Import and format MSstats data into MSnSet objects
- Preprocessing - Normalization, filtering, batch correction (standard and iBAQ normalization)
- WGCNA - Weighted Gene Co-expression Network Analysis for each T1D donor to identify protein modules associated with CD3+ infiltration and insulin intensity
- Immune Signature Identification - Machine learning (random forest with nested cross-validation) to select the top 40-protein IIRS from 329 candidate proteins identified in WGCNA immune-related modules
- Immune Signature Analysis - UMAP trajectory visualization, snRNA-seq validation, differential expression analysis (LIMMA), pathway enrichment (CAMERA-PR), and identification of key pathways including:
  - Antigen processing and presentation
  - Interferon signaling (Type I and II)
  - ECM dysregulation (lower association in stage 1 T1D)
  - Hyaluronan metabolic and glycosaminoglycan catabolic processes
  - IL-10 production
- Beta Cell Profile - Identification of the 42-protein BCP by selecting proteins with highest correlation with INS and ENTPD3
- Beta Cell Profile Analysis - Differential expression (LIMMA), pathway enrichment (CAMERA-PR) revealing:
  - Mitochondrial translation and gene expression (positive correlation with BCP)
  - mRNA processing and RNA-splicing (negative correlation with BCP in stage 1 T1D)
  - ECM remodeling (negative correlation with BCP)
  - IIRS-BCP trajectory comparison (weak correlation, r = 0.13)
- Clustering - Fine-grained hierarchical clustering identifying 75 protein modules and their functional associations
- Cell Type QC - Islet cell type marker validation using scRNA-seq reference data (optional, requires Azimuth preprocessing)
- Miscellaneous QC - Final quality control plots (insulin/glucagon distributions, observed proteins) and donor information summary tables
If you wish to run the cell type quality control analysis (Stage 9), you must first process the Azimuth human pancreas reference dataset:
# Navigate to the Azimuth reference directory
cd azimuth-references/human_pancreas_snakemake
# Run the Azimuth pancreas Snakefile
snakemake --cores 4
# Return to main directory
cd ../..
# Then run scripts 9a_1, 9b_1, and 9c_1 manually
Rscript 9a_1-cell_type_markers_prep.R
Rscript 9b_1-cell_type_markers_selection.R
Rscript 9c_1-cell_type_QC_barplot.R
The Azimuth reference processing creates the scRNA-seq reference object needed for cell type marker validation.
Reference: Azimuth pancreas reference sourced from satijalab/azimuth-references
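The three Stage 9 scripts are listed in a fixed order, so a small wrapper that stops at the first failure can save a wasted run. This is a sketch only: `run_stage9` is a hypothetical name, not something the repository provides, and stopping on the first error is a design choice made here rather than documented pipeline behavior.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the repository): run the Stage 9 scripts
# in the order listed above and stop at the first one that fails.
# $1 = command used to execute each script (defaults to Rscript); exposing it
#      as a parameter makes the wrapper easy to try without R installed.
run_stage9() {
    local runner="${1:-Rscript}"
    local script
    for script in \
        9a_1-cell_type_markers_prep.R \
        9b_1-cell_type_markers_selection.R \
        9c_1-cell_type_QC_barplot.R; do
        echo "Running $script"
        if ! "$runner" "$script"; then
            echo "Stopped: $script failed" >&2
            return 1
        fi
    done
    echo "Stage 9 scripts finished"
}
```

From the repository root, after the Azimuth reference has been built, `run_stage9` runs all three scripts with `Rscript`. Passing another command as the first argument (for example `run_stage9 true`) exercises the loop without executing any R code.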
You can also run individual analysis scripts manually. If running scripts outside of Snakemake, you must first set up the renv environment:
# First time setup: Initialize renv environment
Rscript -e "renv::restore(prompt = FALSE)"
# Uncomment the source line in .Rprofile to activate renv for manual script execution
# Edit .Rprofile and uncomment: source("renv/activate.R")
# Then run individual scripts
# Example: Run immune signature selection
Rscript 4b_1-immune_signature_selection.R
Analysis outputs are organized in the output/ directory by analysis stage:
- output/RD1-raw_msnsets/ - Raw MSnSet objects
- output/RD2-preprocessing/ - Preprocessed data with batch correction
- output/RD3-WGCNA/ - WGCNA co-expression networks and cluster ORA
- output/RD4-islet_immune_response_signature/ - IIRS proteins and models
- output/RD5-islet_immune_response_signature_analysis/ - IIRS analysis results
- output/RD6-beta_cell_profile/ - Beta Cell Profile proteins
- output/RD7-beta_cell_profile_analysis/ - Beta Cell Profile analysis results
- output/RD8-clustering/ - Clustering results
- output/RD9-cell_type_marker_QC/ - Cell type QC
- output/RD10-misc/ - QC plots and summary tables
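After a run, a quick way to see which stages produced results is to count the files in each stage directory. The helper below is a hedged sketch: `summarize_outputs` is a hypothetical name, and it only checks the directory names listed above without making any claim about which files each stage should contain.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): report how many files each
# stage directory under output/ contains, flagging directories that are absent.
# $1 = output directory (defaults to ./output)
summarize_outputs() {
    local out_dir="${1:-output}"
    local stage count
    for stage in \
        RD1-raw_msnsets \
        RD2-preprocessing \
        RD3-WGCNA \
        RD4-islet_immune_response_signature \
        RD5-islet_immune_response_signature_analysis \
        RD6-beta_cell_profile \
        RD7-beta_cell_profile_analysis \
        RD8-clustering \
        RD9-cell_type_marker_QC \
        RD10-misc; do
        if [ -d "$out_dir/$stage" ]; then
            # Count regular files recursively within the stage directory.
            count="$(find "$out_dir/$stage" -type f | wc -l | tr -d ' ')"
            echo "$stage: $count file(s)"
        else
            echo "$stage: missing"
        fi
    done
}
```

Running `summarize_outputs output` from the repository root prints one line per stage. A "missing" line for RD9-cell_type_marker_QC is expected unless the optional cell type QC stage was run.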
Key output files include:
- Heatmaps of protein signatures
- Volcano plots of differential expression
- UMAP trajectory visualizations
- Gene set enrichment results (CAMERA-PR analysis)
- Quality control plots
Major R packages used in this analysis:
- MSnbase / MSnSet.utils - Proteomics data structures and manipulation
- limma - Linear modeling for differential expression analysis and CAMERA-PR competitive gene set testing
- WGCNA - Weighted Gene Co-expression Network Analysis
- ComplexHeatmap - Advanced heatmap visualization
- tidyverse - Data manipulation and visualization
- Seurat - scRNA-seq reference data processing (for cell type QC)
- mlr3verse - Machine learning ecosystem for IIRS feature selection
- umap - Dimensionality reduction and trajectory visualization
- clusterProfiler - Over-representation analysis (ORA)
All package versions are locked and managed by renv (specified in renv.lock) to ensure reproducibility.
If you use this code or data, please cite:
Kelly SS, Sarkar S, Williams SM, Fu AD, Butterworth EA, Sagendorf TJ, Nierves LA, Kwon Y, Li X, Petyuk VA, Fulcher JM, Chen J, Nakayasu ES, Atkinson MA, Kulkarni RN, Mathews CE, Zhu Y, Campbell-Thompson M, Qian W-J. Single-Islet Proteomics Maps Pseudo-Temporal Islet Immune Responses and Dysfunction in Stage 1 Type 1 Diabetes. bioRxiv 2025.11.10.687674; doi: https://doi.org/10.1101/2025.11.10.687674
This project is licensed under the MIT License. See the LICENSE file for details.
Corresponding Authors:
- Wei-Jun Qian (weijun.qian@pnnl.gov) - Pacific Northwest National Laboratory
- Martha Campbell-Thompson (thompmc@pathology.ufl.edu) - University of Florida
- Ying Zhu (yingzhupnnl@gmail.com) - Pacific Northwest National Laboratory
- Clayton E. Mathews (cxm@ufl.edu) - University of Florida
We thank the donors and families of the donors for their invaluable contribution to our research and their help to further understand and hopefully cure type 1 diabetes. This research was performed with the support of the Network for Pancreatic Organ donors with Diabetes (nPOD; RRID:SCR_014641), a collaborative type 1 diabetes research project supported by Breakthrough T1D and The Leona M. & Harry B. Helmsley Charitable Trust (Grant #3-SRA-2023-1417-S-B). The content and views expressed are the responsibility of this article’s authors and do not necessarily reflect the official view of nPOD. Organ Procurement Organizations (OPO) partnering with nPOD to provide research resources are listed at https://npod.org/for-partners/npod-partners/. This research was supported by NIH Grants R01DK122160, R01DK135081, R01DK131059, R01DK123329, P01AI42288, and U01DK137113. Proteomics was performed in the Environmental Molecular Sciences Laboratory, a national scientific user facility sponsored by the DOE and located at Pacific Northwest National Laboratory, which is operated by Battelle Memorial Institute for the DOE under Contract DE-AC05-76RL0 1830. This work utilized a LEICA 7000 laser microdissection microscope purchased with an NIH shared instrumentation grant S10OD016350 and operated by the University of Florida Molecular Pathology Core (RRID:SCR_016601).