
MathBeaver Fine-tuning Setup Guide

Overview

This project fine-tunes the Phi-4-reasoning-plus model on SOS chain-of-thought training data (from Bohan) using LoRA via Axolotl.

Built with Axolotl

Colab notebook for QA testing with the trained model: Open In Colab

Prerequisites

  • RunPod instance with sufficient disk space
  • SSH access to RunPod
  • Python 3.11 or higher
  • Access to training data (Google Drive folder)

Setup Instructions

1. Initial Setup

SSH into your RunPod instance and navigate to the workspace:

# Connect to RunPod
ssh runpod-tcp

# Navigate to the workspace
cd /workspace

2. Clone Shivam's repo (Richard's branch)

# clone the repo and cd into it
git clone https://github.com/Shivamshaiv/mathbeaver-finetune.git
cd mathbeaver-finetune
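The heading refers to Richard's branch, but the branch name isn't recorded in these notes; the placeholder below stands in for it:

# check out Richard's branch (placeholder name; substitute the real branch)
git checkout <richards-branch>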

3. Download files from Bohan's drive

# install gdown and move into the "data" folder
pip install gdown
cd data

# download the training data into "data" in the "Data_SOS_Cot" folder;
# this pulls the first 50 folders of SOS training data from Bohan's Google Drive
gdown --folder https://drive.google.com/drive/folders/1E1tHwS7YQOajZcjWsMXpTaPdRZm9jYcC --remaining-ok
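A quick sanity check that the download landed (assuming the folder name Data_SOS_Cot from the comment above):

# list the first few downloaded items
ls Data_SOS_Cot | head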

4. Set up environment

conda create -n phi-tuning python=3.11
conda init
# restart the shell, then:
conda activate phi-tuning

# install Axolotl
pip install axolotl
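Before moving on, a minimal check that Axolotl installed into the new environment:

# verify the install
pip show axolotl
python -c "import axolotl"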

5. Preprocess data to ChatML format

python preprocess_data_chatml.py
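For reference, ChatML wraps each conversation turn in <|im_start|>/<|im_end|> markers. A converted example would look roughly like the sketch below; the exact roles, system prompt, and file layout depend on preprocess_data_chatml.py:

<|im_start|>user
Solve for x: 2x + 3 = 7<|im_end|>
<|im_start|>assistant
Subtract 3 from both sides: 2x = 4. Divide both sides by 2: x = 2.<|im_end|>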

6. Configure config file (with small number of examples)

Edit config_test.yaml to point at the preprocessed data (starting with a small number of examples to verify the pipeline end to end), then run the training script:

python run_training.py --config config_test.yaml
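For orientation, a minimal LoRA config for Axolotl might look like the sketch below. The field names are standard Axolotl options, but the dataset path and hyperparameter values are placeholders, not the contents of the project's actual config_test.yaml:

base_model: microsoft/Phi-4-reasoning-plus
load_in_4bit: true              # 4-bit loading to fit the base model in GPU memory

adapter: lora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true        # apply LoRA to all linear projection layers

datasets:
  - path: data/processed_chatml.jsonl   # placeholder path to the preprocessed data
    type: chat_template
chat_template: chatml

sequence_len: 2048
micro_batch_size: 1
gradient_accumulation_steps: 4
num_epochs: 1
learning_rate: 0.0002

output_dir: ./outputs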

7. Results

Training output is saved to the outputs directory.
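For a LoRA run, the adapter files (for PEFT-style adapters, typically adapter_config.json and adapter_model.safetensors) should appear there alongside the training logs:

# inspect the trained adapter and logs
ls outputs/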
