This is the code implementation for "SIDReasoner - Reasoning over Semantic IDs Enhances Generative Recommendation".
SIDReasoner is a generative recommendation framework that equips generative recommenders with reasoning ability over semantic IDs (SIDs). This repository provides:
- A complete training pipeline, with each training stage integrated into an easy-to-run script.
- Full training data, including our synthesized enriched alignment corpus.
- Pretrained model checkpoints.
Our method demonstrates that, with improved SID–language alignment, effective recommendation reasoning can be achieved even under academic-scale training. SIDReasoner is able to associate SIDs with their underlying item semantics, produce coherent natural-language reasoning over interaction histories, and generate recommendations according to the reasoning process. By open-sourcing the pipeline, data, and checkpoints, we aim to facilitate further research on reasoning in generative recommendation.
A case study of how SIDReasoner generates interpretable reasoning over SIDs.
The reinforcement learning stage (Stage 3) in this project is built on top of VERL. We recommend following the official installation guide to set up the environment. To run the code correctly, the following additional packages are required:
- torch
- transformers
- datasets
- peft
- pandas
- numpy
- fire
- wandb
- tqdm
- accelerate
- bitsandbytes
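Assuming a standard pip-based setup (the repo itself may ship a requirements file; this one-liner is just a convenience), the packages above can be installed with:

```shell
# Install the additional dependencies on top of the VERL environment.
pip install torch transformers datasets peft pandas numpy fire wandb tqdm accelerate bitsandbytes
```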
The datasets can be accessed via this link. Please download them and place the dataset folder under ./data/Amazon.
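If the download arrives as an archive, one way to place it is sketched below (the archive name Amazon.zip is a placeholder for whatever file the link provides, not the real filename):

```shell
# Create the data directory and extract the downloaded archive into it,
# so the dataset folder ends up at ./data/Amazon.
mkdir -p ./data
unzip Amazon.zip -d ./data/
ls ./data/Amazon
```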
SIDReasoner follows a three-stage training pipeline.
| Stage | Script |
|---|---|
| Stage 1: Supervised Fine-Tuning | bash sft_Qwen3_enrich.sh |
| Stage 2: Reasoning Activation | bash sft_reasoning_activation.sh |
| Stage 3: RL Training | bash RL_training_script.sh |
# Stage 1
bash sft_Qwen3_enrich.sh
# Stage 2
bash sft_reasoning_activation.sh
# Stage 3
bash RL_training_script.sh

The training logs are written to ./logs.
To facilitate further research, we release our pretrained model checkpoints, which can be downloaded via this link.
We provide scripts to evaluate model performance in thinking and non-thinking modes:
# Non-thinking mode.
bash evaluate_Qwen3.sh
# Thinking mode.
bash evaluate_Qwen3_think.sh

The reasoning evaluation script expects a merged Hugging Face checkpoint named actor_merged. If RL training has only produced raw actor folders, merge them first:
python3 ./scripts/merge_fsdp_checkpoint.py \
--checkpoint ./checkpoints/RecRL_Reasoning/Office_Products_stage3_rl_Qwen3-1.7B/global_step_100/actor \
  --output-dir ./checkpoints/RecRL_Reasoning/Office_Products_stage3_rl_Qwen3-1.7B/global_step_100/actor_merged

If you find this work useful in your research, please consider citing:
@article{SIDReasoner,
title={Reasoning over Semantic IDs Enhances Generative Recommendation},
author={Yingzhi He and Yan Sun and Junfei Tan and Yuxin Chen and Xiaoyu Kong and Chunxu Shen and Xiang Wang and An Zhang and Tat-Seng Chua},
journal={arXiv preprint arXiv:2603.23183},
year={2026}
}

This repo is built upon MiniOneRec.
