Step-Level Sparse Autoencoder for Reasoning Process Interpretation

This repository contains the official implementation of the paper "Step-Level Sparse Autoencoder for Reasoning Process Interpretation" (arXiv:2603.03031).

🔧 Installation

Prerequisites

Python 3.10
CUDA 11.8+

# Clone the repository
git clone https://github.com/Miaow-Lab/SSAE.git
cd SSAE

# Create conda environment
conda create -n ssae python=3.10
conda activate ssae

# Install dependencies
pip install -r requirements.txt

Dataset

The datasets are hosted on Hugging Face Datasets. There are six files to download: gsm8k_385K_train.json, gsm8k_385K_valid.json, numina_859K_train.json, numina_859K_valid.json, opencodeinstruct_36K_train.json, and opencodeinstruct_36K_valid.json.

Please create a data/ folder at the repository root and place all downloaded files there.

mkdir -p data
# put downloaded dataset files into ./data
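
Before starting a run, it can help to confirm that all six files listed above are actually in data/. The following is a minimal sketch and not part of the repository: the helper name missing_dataset_files is hypothetical, and only the file names come from this README.

```python
from pathlib import Path

# File names exactly as listed in the Dataset section of this README.
EXPECTED_FILES = [
    "gsm8k_385K_train.json",
    "gsm8k_385K_valid.json",
    "numina_859K_train.json",
    "numina_859K_valid.json",
    "opencodeinstruct_36K_train.json",
    "opencodeinstruct_36K_valid.json",
]


def missing_dataset_files(data_dir="data"):
    """Return the expected dataset files that are not present in data_dir.

    Hypothetical helper for illustration; the repository does not ship it.
    """
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files()
    if missing:
        print("Missing dataset files:", ", ".join(missing))
    else:
        print("All expected dataset files found in ./data")
```

Run it from the repository root so that the relative data/ path resolves correctly.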

Pretrained SSAE Checkpoints

We also provide pretrained SSAE checkpoints on HuggingFace.

⚙️ Configuration

The project uses modular YAML configuration files located in configs/. You can modify parameters directly in the YAML files or override them via command line arguments.

Global Config Structure

There are 3 main configuration files:

configs/train.yaml: Main configuration for training SSAE models.
configs/classifier.yaml: Configuration for the classifier pipeline (data generation, training, evaluation).
configs/experiment.yaml: Configuration for analysis and probing experiments.

All scripts support overriding config parameters using the --set KEY=VALUE argument without modifying the YAML file.
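
To illustrate what a --set KEY=VALUE override does conceptually, here is a minimal sketch of applying such overrides to a configuration dictionary loaded from YAML. It is not the repository's implementation (see config_utils.py for the actual behavior). The function names, the dotted-key convention for nested values, and the example keys batch_size and optimizer.lr are all assumptions made for illustration.

```python
import copy
import json


def parse_value(raw):
    """Interpret raw as JSON when possible (numbers, true/false, null, lists).

    Anything that is not valid JSON is kept as a plain string.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def apply_overrides(config, overrides):
    """Return a copy of config with each 'KEY=VALUE' override applied.

    Assumption of this sketch: dotted keys such as 'optimizer.lr'
    address nested dictionaries. The input config is left unmodified.
    """
    result = copy.deepcopy(config)
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep or not key:
            raise ValueError(f"Expected KEY=VALUE, got {item!r}")
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            # Create intermediate dictionaries when they do not exist yet.
            node = node.setdefault(part, {})
        node[leaf] = parse_value(raw)
    return result


if __name__ == "__main__":
    # Hypothetical keys; the real ones are defined in configs/*.yaml.
    base = {"batch_size": 8, "optimizer": {"lr": 0.001}}
    print(apply_overrides(base, ["batch_size=16", "optimizer.lr=1e-4"]))
```

Returning a modified copy rather than mutating the loaded config keeps the values read from the YAML file available for logging or comparison.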

🧰 Usage

1. Training SSAE

The main training entry point is train.py.

Single-GPU training:

bash scripts/train_single.sh configs/train.yaml

Multi-GPU training (DDP):

NPROC_PER_NODE=<num_gpus> bash scripts/train_ddp.sh configs/train.yaml

If you use the provided checkpoints, please place them in your configured model_dir.

2. Classifier

We train a series of classifiers to investigate the expressiveness of SSAE. The Classifier module covers data generation, training, and evaluation.

For full usage details, see:

classifier/README.md

3. N2G Pattern Mining

Run N2G pattern mining:

bash scripts/run_n2g.sh configs/experiment.yaml

N2G pattern analysis/labeling:

bash scripts/run_n2g_analysis.sh configs/experiment.yaml

4. Probing Guided Weighted Voting

Run probing experiment:

bash scripts/run_probing.sh configs/experiment.yaml

📧 Contact

For questions or feedback, please contact Xuan Yang.

🖊 Citation

If you find this work helpful, please cite our paper:

@misc{yang2026steplevelsparseautoencoderreasoning,
      title={Step-Level Sparse Autoencoder for Reasoning Process Interpretation}, 
      author={Xuan Yang and Jiayu Liu and Yuhang Lai and Hao Xu and Zhenya Huang and Ning Miao},
      year={2026},
      eprint={2603.03031},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2603.03031}, 
}
