By parallelizing the thought process,
training speeds up, inference speeds up,
and performance goes up.
This is the code base for Parallel Continuous Chain-of-Thought (PCCoT), which parallelizes the continuous chain-of-thought approach with Jacobi iteration. PCCoT improves both the training and inference efficiency of continuous chain-of-thought while achieving better performance on reasoning tasks. The paper "Parallel Continuous Chain-of-Thought with Jacobi Iteration" was accepted to the EMNLP 2025 main conference.
Replay our oral session at EMNLP 2025.
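As a rough intuition for the Jacobi view, here is a conceptual sketch only, not the paper's exact update rule or the repo's implementation: instead of producing each continuous thought vector one after another, all latent thought positions are refreshed in parallel from the previous iterate for a fixed number of iterations. The names jacobi_thoughts and step below are purely illustrative.

# Conceptual sketch only -- not the repo's implementation.
# `step` stands for one parallel update of all T latent thought vectors,
# e.g. a single transformer pass conditioned on the question and the
# previous iterate of the thought vectors.
import torch

def jacobi_thoughts(step, z_init: torch.Tensor, num_iterations: int = 3) -> torch.Tensor:
    """z_init: (T, d) initial latent thought vectors; returns the final iterate."""
    z = z_init
    for _ in range(num_iterations):
        z = step(z)  # every position is updated simultaneously from the previous iterate
    return z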
Please install PyTorch and flash-attn, then the packages in requirements.txt. Below is how we install these packages; it is not necessary to follow it exactly. Developed with Python 3.12.4.
conda install pytorch=2.5.1 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install mkl=2022.1.0
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.5cxx11abiFALSE-cp312-cp312-linux_x86_64.whl
pip install -r requirements.txt

We have released the finetuned GPT-2 model described in our paper on HuggingFace. To run this model, please first clone this repo and then execute the following command:
python example.py

It provides an example usage of the PCCoT model. See example.py for more details.
Our implementation is based on HuggingFace transformers. We register new models pccot-gpt2 and pccot-llama to facilitate the usage of PCCoT.
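For reference, below is a minimal, hedged sketch of loading the released checkpoint through the registered model type. It assumes that importing the repo's models package registers pccot-gpt2 with the transformers Auto classes and that the standard generate() interface applies; the HuggingFace repo id is a placeholder, and example.py remains the authoritative reference.

# Hedged sketch only -- see example.py for the actual usage.
# Assumption: importing the `models` package registers pccot-gpt2 / pccot-llama
# with transformers' Auto classes; "<hf-user>/pccot-gpt2" is a placeholder id.
from transformers import AutoModelForCausalLM, AutoTokenizer

import models  # noqa: F401  (side effect: registers the PCCoT model types)

tokenizer = AutoTokenizer.from_pretrained("<hf-user>/pccot-gpt2")
model = AutoModelForCausalLM.from_pretrained("<hf-user>/pccot-gpt2")

question = "Janet has 3 apples and buys 2 bags of 4 apples each. How many apples does she have?"
inputs = tokenizer(question, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)  # assuming standard generation applies
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))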
To train a PCCoT model, copy run_ccot.sh.template to run_ccot.sh and run it.
# for the first time execution
cp run_ccot.sh.template run_ccot.sh
# run the script
bash run_ccot.sh

It will finetune a GPT-2 and a Llama3.2-1B model on the GSM8K-AUG dataset. See the script for more details.
We use the same training script as the original transformers library, with slight modifications; you may refer to the official documentation for more details.
To train a standard CoT model, use run_cot.sh.template instead.
The training script can also be used for testing; its metric is the accuracy of the model predicting every answer token correctly (a hedged sketch of this metric is given after the commands below). If you want to test the model with real generation results, use the test_ccot.sh.template script.
# for the first time execution
cp test_ccot.sh.template test_ccot.sh
# run the script
bash test_ccot.sh

It will test the model on the GSM8K-AUG dataset. See the script for more details.
To test a standard CoT model, use test_cot.sh.template instead.
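For clarity, here is a hedged illustration of the all-answer-tokens accuracy mentioned above. It is not the repo's metric code; it assumes labels follow the usual transformers convention of -100 at non-answer positions and that logits and labels are already aligned for causal LM prediction.

# Hedged illustration only -- not the repo's metric implementation.
import torch

def all_answer_tokens_correct(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """logits: (batch, seq, vocab); labels: (batch, seq) with -100 at non-answer positions.
    Returns the fraction of samples whose answer tokens are all predicted correctly."""
    preds = logits.argmax(dim=-1)                 # (batch, seq)
    answer_mask = labels != -100                  # True only at answer positions
    correct = (preds == labels) | ~answer_mask    # non-answer positions always count
    return correct.all(dim=-1).float().mean().item()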
We provide some sample configuration files in the configs folder. The config settings are defined in models/configuration_gpt2.py. You may refer to this file for more details.
from models import PCCoTGPT2Config
# we have prepared a sample configuration file
config = PCCoTGPT2Config.from_pretrained("configs/pccot_gpt2_small.json")
# setup the loss weights described in CODI
config.loss_alpha = 1
config.loss_beta = 1
config.loss_gamma = 1
# setup the number of iterations
config.num_iterations = 3

Config values can also be overridden from the command line via --config_overrides:

accelerate launch run_ccot.py \
--config_name configs/pccot_gpt2_small.json \
--config_overrides loss_gamma=1,num_iterations=3 \
...

There are also some other configurations defined in models/pccot_arguments.py. You may refer to this file for more details.
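Alternatively, a config edited in Python can be saved and passed back to the training script through --config_name. Below is a hedged sketch, assuming PCCoTGPT2Config inherits the standard transformers save_pretrained; the output directory is a placeholder.

# Hedged sketch: persist a modified config so run_ccot.py can load it via --config_name.
from models import PCCoTGPT2Config

config = PCCoTGPT2Config.from_pretrained("configs/pccot_gpt2_small.json")
config.loss_gamma = 1
config.num_iterations = 3
config.save_pretrained("configs/my_pccot_gpt2")  # writes configs/my_pccot_gpt2/config.json
# then launch with: accelerate launch run_ccot.py --config_name configs/my_pccot_gpt2 ...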
To set up the arguments defined in models/pccot_arguments.py, write them in the shell script:
accelerate launch run_ccot.py \
--use_chat_template true \
--num_latent_tokens 24 \
--answer_prompt "The answer is:" \
--lora_r 128 \
...

If you are loading a trained PCCoT model, these arguments will by default be loaded from the checkpoint. If you want to override them, please remove the corresponding code in the Python script.