This is an unofficial spin-off repository for the paper "ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models" [project website].
Here is a blog post that discusses this work in more detail.
The purpose of this repository is to extend the original codebase to support newer reasoning models and to integrate vLLM for faster steering experiments. The following models are currently supported:
- deepseek-qwen-1.5b
- deepseek-llama3-8b
- deepseek-qwen-14b
- qwen3-1.7b
To add additional models, simply add an identifier and a local or Hugging Face path in `utils.py`.
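For illustration, such an entry might look like the sketch below; the dictionary name and the new model shown here are hypothetical, so follow whatever structure `utils.py` already uses:

```python
# utils.py -- hypothetical sketch of registering a new model.
# The key is the identifier passed to --model; the value is a local path
# or a Hugging Face repo id. The dictionary name and the last entry are illustrative.
model_dict = {
    "deepseek-qwen-1.5b": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    "qwen3-1.7b": "Qwen/Qwen3-1.7B",
    "qwen3-4b": "Qwen/Qwen3-4B",  # hypothetical new model
}
```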
The following datasets are currently supported:
- openai/gsm8k
- furonghuang-lab/Easy2Hard-Bench gsm8k split
To add additional datasets, follow the examples found in `utils.py`: specify the dataset name, split, and the question/answer keys used.
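For illustration, a dataset entry could be registered along these lines; the field names below are hypothetical, so mirror the existing entries in `utils.py`:

```python
# utils.py -- hypothetical sketch of registering a new dataset.
# Field names are illustrative; copy the structure of the existing entries.
DATASET_MAP = {
    "gsm8k": {
        "name": "openai/gsm8k",
        "config": "main",
        "split": "test",
        "question_key": "question",
        "answer_key": "answer",
    },
    "math500": {  # hypothetical new dataset
        "name": "HuggingFaceH4/MATH-500",
        "split": "test",
        "question_key": "problem",
        "answer_key": "answer",
    },
}
```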
To run steering experiments, follow these steps:
Run `generate_responses.py` to generate model output for a given dataset. The script takes the following arguments:
- `--model`: the name of the model to use (e.g. `deepseek-qwen-1.5b`)
- `--dataset`: the name of the dataset to use (e.g. `gsm8k`)
- `--batch_size`: the batch size to use for generation (default: 1)
- `--tp`: the tensor parallel size to use for generation (default: 1)
Example:
`python3 generate_responses.py --model deepseek-qwen-1.5b --dataset gsm8k --batch_size 32 --tp 2`
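For context, `--tp` is passed to vLLM as the tensor parallel size and prompts are generated in batches. A minimal sketch of the underlying vLLM call (illustrative, not the actual `generate_responses.py`):

```python
# Minimal vLLM generation sketch (illustrative, not the actual script).
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # resolved from --model
    tensor_parallel_size=2,                             # --tp
)
params = SamplingParams(temperature=0.6, max_tokens=4096)

prompts = ["<dataset question formatted as a chat prompt>"]
for out in llm.generate(prompts, params):
    print(out.outputs[0].text)
```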
Run `extract_tl.py` to extract the steering vector for a given model and dataset. The script takes the following arguments:
- `--model`: the name of the model to use (e.g. `deepseek-qwen-1.5b`)
- `--control`: the type of control to use (`attn` or `mlp`)
Example:
`python3 extract_tl.py --model deepseek-qwen-1.5b --control attn`
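The extracted direction is, roughly, the difference between the mean hidden representations of long-reasoning and short-reasoning responses at each layer. A minimal sketch of that idea, assuming the per-layer hidden states have already been collected (not the actual `extract_tl.py`):

```python
# Sketch: per-layer "thinking length" direction as a mean difference
# between hidden states of long and short reasoning responses.
import torch

def extract_directions(hidden_long, hidden_short):
    """hidden_long / hidden_short: one (num_examples, hidden_dim) tensor per layer."""
    directions = []
    for h_long, h_short in zip(hidden_long, hidden_short):
        d = h_long.mean(dim=0) - h_short.mean(dim=0)  # mean difference
        directions.append(d / d.norm())               # per-layer normalization (a design choice here)
    return torch.stack(directions)                    # (num_layers, hidden_dim)
```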
Run `run_mlp_steering_experiments.sh` to run steering experiments that intervene after each MLP layer for a given model and dataset. The script takes the following positional arguments:
- `model`: the name of the model to use (e.g. `deepseek-qwen-1.5b`)
- `dataset`: the name of the dataset to use (e.g. `gsm8k`)
- `device`: the CUDA GPU number(s) to use for running the experiments
Example:
`bash run_mlp_steering_experiments.sh deepseek-qwen-1.5b gsm8k`

A companion attention steering script works in the same manner, but intervenes after every attention layer instead of every MLP layer.
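Conceptually, each steering experiment adds a scaled copy of the extracted direction to the hidden state after the chosen layer and measures how thinking length and accuracy change. A minimal sketch of that intervention with a plain PyTorch forward hook (illustrative; the scripts themselves sweep layers and strengths and use the vLLM integration):

```python
# Sketch: steer the output of one MLP block by adding a scaled direction.
import torch

def make_steering_hook(direction: torch.Tensor, strength: float):
    def hook(module, inputs, output):
        # Shift every token's hidden state along the steering direction.
        return output + strength * direction.to(device=output.device, dtype=output.dtype)
    return hook

# Hypothetical usage on a Hugging Face Qwen/Llama-style model `model`
# (the attribute path may differ by architecture):
# handle = model.model.layers[10].mlp.register_forward_hook(
#     make_steering_hook(directions[10], strength=4.0))
# ... generate as usual ...
# handle.remove()
```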
Run `plot_steering.py` to plot the steering results for a given directory of CSV files. The script takes the following argument:
- `dir_path`: the path to the directory containing the CSV files
Example:
`python3 plot_steering.py results/qwen3-1.7b_steering_results`

This will create two summary figures inside the specified directory:
- `combined_thinking_length_vs_steering_strength.png`
- `combined_accuracy_vs_steering_strength.png`
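For reference, the kind of aggregation the plotting step performs can be pictured as below; the CSV column names here are purely hypothetical (the actual schema is defined by the steering scripts and `plot_steering.py`):

```python
# Hypothetical sketch of summarizing steering CSVs; column names are illustrative.
import glob, os, sys
import pandas as pd
import matplotlib.pyplot as plt

dir_path = sys.argv[1]
df = pd.concat(
    (pd.read_csv(f) for f in glob.glob(os.path.join(dir_path, "*.csv"))),
    ignore_index=True,
)

# Mean thinking length per steering strength (hypothetical columns).
summary = df.groupby("steering_strength")["thinking_length"].mean()
summary.plot(marker="o")
plt.xlabel("steering strength")
plt.ylabel("mean thinking length")
plt.savefig(os.path.join(dir_path, "combined_thinking_length_vs_steering_strength.png"))
```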
Note: Make sure to update the `model_dict` and `DATASET_MAP` variables in `generate_responses.py` and `extract_tl.py` to include the models and datasets you want to support.
Chung-En Sun, Ge Yan, Tsui-Wei Weng, "ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models", arXiv preprint, 2025.
@article{ThinkEdit,
title={ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models},
author={Chung-En Sun and Ge Yan and Tsui-Wei Weng},
journal={arXiv},
year={2025}
}