HappyPointer/SIDReasoner

SIDReasoner

This is the code implementation for "SIDReasoner - Reasoning over Semantic IDs Enhances Generative Recommendation".

SIDReasoner is a generative recommendation framework that strengthens generative recommenders with reasoning ability over semantic IDs. This repository provides:

  • A complete training pipeline, with each training stage integrated into an easy-to-run script.
  • Full training data, including our synthesized enriched alignment corpus.
  • Pretrained model checkpoints.

Our method demonstrates that, with improved SID–language alignment, effective recommendation reasoning can be achieved even under academic-scale training. SIDReasoner is able to associate SIDs with their underlying item semantics, produce coherent natural-language reasoning over interaction histories, and generate recommendations according to the reasoning process. By open-sourcing the pipeline, data, and checkpoints, we aim to facilitate further research on reasoning in generative recommendation.

[Figure: Training]

[Figure: A case study of how SIDReasoner generates interpretable reasoning over SIDs.]

Environments

The reinforcement learning stage (Stage 3) in this project is built on top of VERL. We recommend following the official installation guide to set up the environment. The following additional packages are also required to run the code:

  • torch
  • transformers
  • datasets
  • peft
  • pandas
  • numpy
  • fire
  • wandb
  • tqdm
  • accelerate
  • bitsandbytes
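
Before launching training, it can help to confirm that the packages listed above are visible to the Python interpreter. The following is a minimal sketch, not part of the repository: the helper name find_missing is hypothetical, and the check only covers the listed packages (VERL itself is installed separately through its own guide).

```python
import importlib.util

# Package names copied from the list above; each one is also its import name.
REQUIRED_PACKAGES = [
    "torch", "transformers", "datasets", "peft", "pandas", "numpy",
    "fire", "wandb", "tqdm", "accelerate", "bitsandbytes",
]


def find_missing(packages):
    """Return the packages that cannot be located in the current environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

The check uses importlib.util.find_spec, so it locates each package without importing it, which keeps the script fast even when torch is installed.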

Dataset

The datasets can be accessed via this link. Download the dataset and place the dataset folder under the ./data/Amazon directory.
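
A quick way to verify the dataset location before training is sketched below. It only checks that ./data/Amazon exists and is non-empty; the helper name dataset_status is hypothetical, and the sketch makes no assumption about which files or subfolders the download contains.

```python
from pathlib import Path


def dataset_status(root="./data/Amazon"):
    """Return (ok, entries); ok is True when root is a non-empty directory."""
    path = Path(root)
    if not path.is_dir():
        return False, []
    entries = sorted(p.name for p in path.iterdir())
    return bool(entries), entries


if __name__ == "__main__":
    ok, entries = dataset_status()
    if ok:
        print("Found dataset entries:", entries)
    else:
        print("Nothing found under ./data/Amazon; download the dataset first.")
```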

Training

SIDReasoner follows a three-stage training pipeline.

Stage                               | Script
Stage 1: Supervised Fine-Tuning     | bash sft_Qwen3_enrich.sh
Stage 2: Reasoning Activation       | bash sft_reasoning_activation.sh
Stage 3: RL Training                | bash RL_training_script.sh

Run training

# Stage 1
bash sft_Qwen3_enrich.sh

# Stage 2
bash sft_reasoning_activation.sh

# Stage 3
bash RL_training_script.sh

The training logs are written to ./logs.
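
If you prefer to launch all three stages from one place, the sketch below runs the stage scripts in order and stops at the first failure, so a later stage never starts after an earlier one has failed. It is an illustrative wrapper, not part of the repository: the function run_stages and its runner parameter are hypothetical, and it assumes the scripts are invoked from the repository root exactly as in the commands above.

```python
import subprocess

# Script names taken from the three-stage pipeline described above.
STAGE_SCRIPTS = [
    "sft_Qwen3_enrich.sh",          # Stage 1: Supervised Fine-Tuning
    "sft_reasoning_activation.sh",  # Stage 2: Reasoning Activation
    "RL_training_script.sh",        # Stage 3: RL Training
]


def run_stages(scripts=STAGE_SCRIPTS, runner=subprocess.run):
    """Run each script with bash, in order, and stop at the first failure.

    runner is injectable so the control flow can be exercised without
    actually starting a training job.
    """
    completed = []
    for script in scripts:
        result = runner(["bash", script])
        if result.returncode != 0:
            raise RuntimeError(f"{script} exited with code {result.returncode}")
        completed.append(script)
    return completed
```

Calling run_stages() with no arguments launches the real scripts; passing a shorter list, for example run_stages(STAGE_SCRIPTS[1:]), resumes from Stage 2.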

Checkpoints

To facilitate further research, we release our pretrained model checkpoints, which can be downloaded via this link.

Evaluation

We provide scripts to evaluate the model in both thinking and non-thinking modes:

# Non-thinking mode.
bash evaluate_Qwen3.sh

# Thinking mode.
bash evaluate_Qwen3_think.sh

Stage 3 checkpoint merge

The reasoning evaluation script expects a merged Hugging Face checkpoint named actor_merged. If RL training has only produced raw actor folders, merge them first:

python3 ./scripts/merge_fsdp_checkpoint.py \
  --checkpoint ./checkpoints/RecRL_Reasoning/Office_Products_stage3_rl_Qwen3-1.7B/global_step_100/actor \
  --output-dir ./checkpoints/RecRL_Reasoning/Office_Products_stage3_rl_Qwen3-1.7B/global_step_100/actor_merged
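
The merge step can be made conditional so that it is skipped when actor_merged already exists. The sketch below only builds the command shown above for a given global_step directory; the helper name merge_command is hypothetical, and it uses only the --checkpoint and --output-dir flags that appear in the command above.

```python
from pathlib import Path


def merge_command(step_dir, script="./scripts/merge_fsdp_checkpoint.py"):
    """Return the merge command for step_dir, or None if actor_merged exists."""
    step = Path(step_dir)
    merged = step / "actor_merged"
    if merged.is_dir():
        return None  # Already merged; nothing to do.
    return [
        "python3", script,
        "--checkpoint", str(step / "actor"),
        "--output-dir", str(merged),
    ]
```

When the function returns a command, it can be executed with subprocess.run(cmd, check=True) from the repository root.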

Citation

If you find this work useful in your research, please consider citing:

@article{SIDReasoner,
  title={Reasoning over Semantic IDs Enhances Generative Recommendation},
  author={Yingzhi He and Yan Sun and Junfei Tan and Yuxin Chen and Xiaoyu Kong and Chunxu Shen and Xiang Wang and An Zhang and Tat-Seng Chua},
  journal={arXiv preprint arXiv:2603.23183},
  year={2026}
}

Acknowledgement

This repo is built upon MiniOneRec.
