Cell Type Mapping Evaluation on Primate Basal Ganglia Taxonomies

The latest Allen Institute Basal Ganglia Taxonomy datasets released in Q2 2025 are used in this tutorial and are available publicly for download.

Overview

Allen Institute Taxonomies are defined by clusters of cells that meet criteria establishing a level of distinctness in the cells' transcriptomic signature. Application of a given taxonomy typically involves annotation of new cells against the cell type hierarchy, starting from clusters and moving up into broader cell categories. However, not all clusters are equal in how well they can be annotated onto new data for various reasons, such as cutting gradients or lacking distinct markers.

To provide transparency into taxonomy labeling and mapping algorithm biases and accuracies when mapping cell type labels using Allen Institute taxonomies as references, we have designed a cell type mapping evaluation procedure. The image below illustrates the idea of this tutorial, which is to evaluate cell type reproducibility by self-projecting the taxonomy. For each of the K-Fold smaller sets new marker genes are selected each time. The main question this tutorial answers is: "How reproducible and accurate are Allen Institute taxonomy labels when used for mapping new single-cell datasets, and how do biases in the mapping algorithm affect this process?"

Data

This tutorial evaluates the Marmoset Basal Ganglia (et al. 2025) using Correlation and Hierarchical Mapping algorithms. However, Human and Macaque taxonomies are also shared. Marmoset is the smallest and therefore used for this tutorial. [Link to Human, Macaque, and Marmoset taxonomies] (https://alleninstitute.github.io/HMBA_BasalGanglia_Consensus_Taxonomy/)

The marmoset taxonomy was initially built using the Basal Ganglia (BG) area single-nucleus dataset. In building the taxonomy, binary marker genes were selected based on their gene expression from the single-cell transcriptome. Subsequently, the dataset was mapped to itself, termed self-projection, for evaluating the ideal performance of correlation and hierarchical mapping algorithms.

Mapping Algorithms

This tutorial uses two mapping algorithms:

CORRELATION MAPPING: one-step nearest cluster centroid mapping based on correlation
HIERARCHICAL CORRELATION MAPPING: hierarchical nearest cluster centroid mapping based on correlation

Algorithm descriptions are available at: https://portal.brain-map.org/atlases-and-data/bkp/mapmycells/algorithms.

Tutorial

The code for the tutorial is in this repository named "tutorial.ipynb". The code takes about 15-20mins to run using the subsampled Marmoset Taxonomy. It will be faster or take longer depending on the dataset size.

Results

Below are the results of the analysis to evaluate the predictions of correlation and hierarchical mappings in determining cluster labels in a self-projection evaluation:

To analyze the results:

Most groups have very high F1-scores (>0.9), suggesting that your taxonomy mapping procedure works robustly across the majority of clusters.
Confidence values from the models are a mostly reliable at distinguishing between True positives and False positives.
A few groups dip below the bulk trend. For example, one oligodendrocyte cluster (ImOligo) looks like it has a noticeably lower F1-score compared to the others (closer to ~0.7–0.8). This suggests that these clusters are less reproducible in cross-validation, possibly because:

They overlap with neighboring subtypes (continuous gradients).
They lack distinct marker genes.
Sample size is smaller, which reduces stability.

There is some confusion in the MSNs, specifically STRd D1 and STRd D2 MSNs.
Lower F1-score groups should be flagged as “less stable” — they might require:

More robust marker selection,
Collapsing into broader categories,
Or additional orthogonal evidence (spatial localization, morphology).

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
markdown_images		markdown_images
results		results
.gitignore		.gitignore
README.md		README.md
apply_labels.py		apply_labels.py
build_plots.py		build_plots.py
centroid_subsample.py		centroid_subsample.py
generate_MapMyCells_files.sh		generate_MapMyCells_files.sh
group_order.csv		group_order.csv
split_kfolds.py		split_kfolds.py
tutorial.ipynb		tutorial.ipynb
tutorial.md		tutorial.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Cell Type Mapping Evaluation on Primate Basal Ganglia Taxonomies

Overview

Data

Mapping Algorithms

Tutorial

Results

To analyze the results:

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Cell Type Mapping Evaluation on Primate Basal Ganglia Taxonomies

Overview

Data

Mapping Algorithms

Tutorial

Results

To analyze the results:

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages