ImmuneSubtypeClassifier

ImmuneSubtypeClassifier is an R package for robust immune subtype classification of tumor samples using gene expression data. It uses the "Robencla" (Robust Ensemble Classifier) framework (github.com/gibbsdavidl/Robencla), utilizing an ensemble of XGBoost models to assign samples to one of six immune subtypes (C1–C6).

⚠️ Critical Dependency Note

This package requires xgboost version < 2.0.0.

Due to major changes in the XGBoost model serialization format, models trained on version 1.x cannot be loaded by version 2.x or 3.x. The pre-trained models included in this package were built using XGBoost 1.7.x.

To ensure compatibility, this package enforces xgboost (< 2.0.0). Using this package with renv is encouraged, but if you are setting up your environment manually:

# Install a compatible version of xgboost
remotes::install_version("xgboost", version = "1.7.8.1")

Installation

You can install the development version of ImmuneSubtypeClassifier from GitHub:

# Install devtools if you haven't already
if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")

devtools::install_github("gibbsdavidl/robencla")
devtools::install_github("gibbsdavidl/ImmuneSubtypeClassifier")

Quick Start: Predicting Subtypes

The function getFeaturesPairList() returns a named list, C1-C6, where feature pairs are defined. Pairs are (1,2), (3,4), (5,6), etc. for each cluster label.

The function getFeaturesGeneTable() returns the gene ID table used with the TCGA PanCancer EB++ expression training data (hg19). Use this function to subset your data with columns Symbol, Entrez, and Ensembl.

The main function callSubtypes() handles gene matching, data transformation, and prediction.

library(ImmuneSubtypeClassifier)

# Get the list of feature-pair genes used for the model.
# ...there's model_genes_list$model_genes and model_genes_list$gene_map
model_genes_list <- modelGenes(model_path='../models/immune_optimized_99_pairs.rds')

# Confirm the gene_map contains all genes.
length(model_genes$model_genes) == nrow(model_genes$gene_map)

# Call the subtypes, can also pass in a data.frame / tibble etc.
result <- callSubtypes(X_or_path = '../data/gene_expression_rsem_tpm.csv.gz',
                     model = NULL,  # genes in columns and samples in rows.
                     model_path = '../models/immune_optimized_99_pairs.rds',
                     geneid =   "symbol",  # how are the gene IDs encoded
                     sampleid = 'Barcode', # column name with sample IDs
                     labelid=  'Label')    # column name with sample Labels (optional)

# get the trained robencla model.
model <- result$Model

# get the results table, $BestCall is the predicted subtype
pred  <- result$Pred

# confusion matrix
table(pred$BestCall, pred$Label)

# get class prediction metrics.
model$classification_metrics(labels = pred$Label, calls = pred$BestCall)

Output Format

The output is a data frame containing:

SampleIDs: The sample identifiers.
BestCall: The predicted immune subtype (1–6).
1–6: Probability scores for each subtype.

SampleIDs	BestCall	1	2	3	4	5	6
Sample_01	3	0.02	0.10	0.85	0.01	0.01	0.01
Sample_02	1	0.78	0.05	0.10	0.02	0.04	0.01

Advanced: Retraining the Model

If you have a labeled training set and wish to rebuild the model (e.g., after updating XGBoost or changing feature pairs), you can use build_robencla_classifier. I can also send you the PanCancer training set, already formatted.

Training Data Requirements

CSV file with samples as rows and genes as columns.
Must contain a Label column (e.g., "C1", "C2") and a Sample ID column.

library(ImmuneSubtypeClassifier)

# Conservative parameters, see robencla & xgboost for parameters 
conservative_params <- list(
  max_depth = 8,
  eta = 0.2,
  nrounds = 64,
  early_stopping_rounds = 4,
  gamma = 0.3,
  lambda = 1.9,
  alpha = 0.3,
  ensemble_size = 11,
  sample_prop = 0.8,
  feature_prop = 0.8,
  subsample = 0.8
)

# then load up a list of gene pairs...
pair_list <- readRDS('../models/pair_list_stratified.rds')
# if there's a gene you want to try removing
this_set <- editPairList(pair_list, 'IGJ')

# Now use the cleaned pair_list
result <- build_robencla_classifier(
  data_path='../data/training_expression_data.csv.gz',
  test_path='../data/testing_expression_data.csv.gz',
  output_path = '../models/immune_optimized_99_pairs.rds',
  pair_list = this_set,
  sig_list = NULL,
  param_list = conservative_params,
  data_mode = c("namedpairs"),
  train_fraction = NULL,  # used if test_path is NULL
  seed = 412,
  sample_id = "Barcode"  # Specify the sample ID column
)

Immune Subtypes

The classifier assigns one of six immune subtypes:

C1 (Wound Healing): High proliferation, angiogenic gene expression.
C2 (IFN-gamma Dominant): High M1/M2 macrophage polarization, strong CD8 signal.
C3 (Inflammatory): Elevated Th17 and Th1 genes, low tumor cell proliferation.
C4 (Lymphocyte Depleted): Macrophage sequestration, Th2 shift.
C5 (Immunologically Quiet): Low lymphocyte and macrophage responses.
C6 (TGF-beta Dominant): High TGF-beta signature, lymphocytic infiltration.

Troubleshooting

Error: `xgboost.Booster object is corrupted`

This means you are trying to load the model with a newer version of XGBoost (>= 2.0) than was used to train it. Fix: Downgrade XGBoost to 1.7.8.1 or re-train the model using build_robencla_classifier.

License

This project is licensed under the terms of the MIT license.

Name		Name	Last commit message	Last commit date
Latest commit History 176 Commits
.idea		.idea
R		R
data		data
inst		inst
man		man
model		model
test		test
.Rbuildignore		.Rbuildignore
.Rhistory		.Rhistory
.Rprofile		.Rprofile
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
ImmuneSubtypeClassifier.Rproj		ImmuneSubtypeClassifier.Rproj
NAMESPACE		NAMESPACE
README.md		README.md
classifier_training.rmd		classifier_training.rmd
license.txt		license.txt
renv.lock		renv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ImmuneSubtypeClassifier

⚠️ Critical Dependency Note

Installation

Quick Start: Predicting Subtypes

Output Format

Advanced: Retraining the Model

Training Data Requirements

Immune Subtypes

Troubleshooting

Error: `xgboost.Booster object is corrupted`

License

About

Uh oh!

Releases 3

Packages

Uh oh!

Contributors 2

Languages

License

CRI-iAtlas/ImmuneSubtypeClassifier

Folders and files

Latest commit

History

Repository files navigation

ImmuneSubtypeClassifier

⚠️ Critical Dependency Note

Installation

Quick Start: Predicting Subtypes

Output Format

Advanced: Retraining the Model

Training Data Requirements

Immune Subtypes

Troubleshooting

Error: xgboost.Booster object is corrupted

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 3

Packages 0

Uh oh!

Contributors 2

Languages

Error: `xgboost.Booster object is corrupted`

Packages