This project fine-tunes the Phi-4-reasoning-plus model on SOS chain-of-thought training data (from Bohan) using LoRA via Axolotl.
- RunPod instance with sufficient disk space
- SSH access to RunPod
- Python 3.11 or higher
- Access to training data (Google Drive folder)
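Before starting, a quick sanity check of the environment can save time. The commands below assume a standard RunPod GPU image with `nvidia-smi` available:

```bash
# Check the Python version (3.11+ required)
python --version

# Check free disk space on the workspace volume
df -h /workspace

# Confirm the GPU is visible
nvidia-smi
```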
SSH into your RunPod instance and navigate to the workspace:
```bash
# Connect to RunPod
ssh runpod-tcp

# Navigate to the workspace
cd /workspace

# Clone the repository and enter it
git clone https://github.com/Shivamshaiv/mathbeaver-finetune.git
cd mathbeaver-finetune

# Enter the "data" folder and install gdown
cd data
pip install gdown

# Download the training data (the first 50 folders of SOS training data
# from Bohan's Google Drive) into the "Data_SOS_Cot" folder
gdown --folder https://drive.google.com/drive/folders/1E1tHwS7YQOajZcjWsMXpTaPdRZm9jYcC --remaining-ok
```
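To confirm the download completed, you can inspect the data directory. The `Data_SOS_Cot` path below follows the layout described above; adjust it if gdown placed the files under a different folder name:

```bash
# Still inside data/: list a few of the downloaded folders
ls Data_SOS_Cot | head

# Total size of the downloaded data
du -sh .
```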
Create and activate a conda environment:

```bash
conda create -n phi-tuning python=3.11
conda init
# Restart your shell, then:
conda activate phi-tuning
```
Install Axolotl:

```bash
pip install axolotl
```

Then, from the repository root, preprocess the training data into ChatML format:

```bash
python preprocess_data_chatml.py
```
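A few optional checks before launching training. The preprocessed file name below is a guess, so check `preprocess_data_chatml.py` for the actual output path:

```bash
# Confirm Axolotl installed and that PyTorch sees the GPU
pip show axolotl
python -c "import torch; print(torch.cuda.is_available())"

# Peek at the first preprocessed record (hypothetical output path,
# check preprocess_data_chatml.py for the real one)
head -n 1 data/processed_chatml.jsonl | python -m json.tool
```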
Run the training script:

```bash
python run_training.py --config config_test.yaml
```

Output is saved to the `outputs` directory.
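While the run is going, it can be useful to watch GPU utilization, and afterwards to confirm the artifacts landed. These commands only assume the `outputs` directory mentioned above:

```bash
# Watch GPU utilization during training (Ctrl-C to exit)
watch -n 5 nvidia-smi

# After training completes, checkpoints and LoRA adapter files
# should appear under outputs/
ls -lh outputs/
```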