State-space language models such as Mamba match Transformer quality while permitting linear-complexity inference, yet they still comprise billions of parameters, which hinders deployment. Existing one-shot pruning methods are tailored to attention blocks and fail to account for the time-shared and discretized state-transition matrix at the heart of the selective state-space module (SSM). In this paper, we introduce SparseSSM, the first training-free pruning framework that extends the classic optimal brain surgeon (OBS) approach to state-space architectures. Our layer-wise algorithm (i) derives an approximate second-order saliency score that aggregates Hessian-trace information across time steps, (ii) incorporates a component sensitivity analysis to guide feed-forward network (FFN) pruning, which also sheds light on where redundancy resides in the Mamba architecture, and (iii) extends readily to semi-structured and structured sparsity. Empirically, we prune 50% of SSM weights without fine-tuning and observe no zero-shot accuracy loss, setting the current state of the art for pruning Mamba-based LLMs.
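For intuition, the sketch below shows an OBS-style saliency score in the spirit of the method, with per-time-step second-order statistics aggregated before scoring the time-shared SSM weights. It is illustrative only, not the repository's implementation; the simple mean aggregation and helper names are assumptions.

```python
# Illustrative OBS-style saliency for a time-shared SSM weight (NOT the repo's code).
import torch

def obs_saliency(weight: torch.Tensor, hessian_diags: list) -> torch.Tensor:
    """weight: the time-shared parameter of an SSM layer (e.g. its log-A matrix).
    hessian_diags: per-time-step diagonal Hessian estimates with the same shape,
    e.g. accumulated from calibration activations."""
    # Aggregate second-order information over time steps (a simple mean here,
    # standing in for the paper's Hessian-trace aggregation).
    h = torch.stack(hessian_diags, dim=0).mean(dim=0).clamp_min(1e-8)
    # Classic OBS saliency with a diagonal Hessian approximation:
    # removing w_i costs roughly w_i^2 / (2 [H^-1]_ii) ~ 0.5 * w_i^2 * h_ii.
    return 0.5 * weight.pow(2) * h

def prune_by_saliency(weight: torch.Tensor, saliency: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the `sparsity` fraction of entries with the lowest saliency."""
    k = max(1, int(sparsity * weight.numel()))
    threshold = saliency.flatten().kthvalue(k).values
    return weight * (saliency > threshold).to(weight.dtype)
```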
```bash
git clone https://github.com/CFinTech/SparseSSM
cd SparseSSM
pip install -r requirements.txt
```

The data for calibration can be downloaded here.
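If you want to assemble calibration sequences yourself, the sketch below shows one way to draw them from WikiText-2 with the Hugging Face `datasets` package; the repository's own loader and preprocessing may differ, and the helper name and random-crop scheme are assumptions.

```python
# Minimal sketch of drawing calibration sequences from WikiText-2 (assumed workflow).
import random
from datasets import load_dataset

def get_calibration_samples(tokenizer, nsamples=64, seqlen=2048, seed=0):
    data = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
    text = "\n\n".join(data["text"])                      # concatenate the raw articles
    ids = tokenizer(text, return_tensors="pt").input_ids  # tokenize the full corpus once
    random.seed(seed)
    samples = []
    for _ in range(nsamples):
        start = random.randint(0, ids.shape[1] - seqlen - 1)
        samples.append(ids[:, start:start + seqlen])      # one (1, seqlen) chunk each
    return samples
```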
To prune the SSM module, you can run the following command:
```bash
CUDA_VISIBLE_DEVICES=${your_gpu_id} python main.py \
  path/to/your/model wikitext2 \
  --experiment_name your_experiment_name \
  --method "sparsessm_dev" \
  --save path/to/pruned_model \
  --sparsity 0.5 \
  --nsamples 64 \
  --minlayer 0 \
  --maxlayer 100 \
  --prune_A True \
  --do_prune \
  --eval_zero_shot \
  --log_wandb
```

*Figure: Illustration of SparseSSM. The first row depicts the evolution of the diagonal parameter matrix.*

*Figure: Performance analysis for one-shot unstructured pruning of SSM modules in Mamba models.*
- This source code is derived from the well-known PyTorch reimplementation of SparseGPT and from mamba-minimal.
- We use Mamba checkpoints to test our method.
- The README file is inspired by LLM-pruner.
If you find this work useful for your research, please consider citing our paper:
```bibtex
@article{tuo2025sparsessm,
  title={SparseSSM: Efficient Selective Structured State Space Models Can Be Pruned in One-Shot},
  author={Kaiwen Tuo and Huan Wang},
  journal={arXiv preprint arXiv:2506.09613},
  year={2025}
}
```


