⭐ v2.0 UPDATE: Now following official Unsloth patterns from docs.unsloth.ai. See OFFICIAL_UNSLOTH_GUIDE.md for the recommended approach.
A comprehensive, modular framework for learning and benchmarking parameter-efficient fine-tuning methods for small language models. This repository provides production-ready implementations of modern fine-tuning techniques, including LoRA, QLoRA, DPO, and Unsloth, with extensive documentation, mobile deployment support, reproducible experiments, and Google Cloud Platform integration for scalable training.
Current LLM fine-tuning approaches evolve rapidly. This project provides:
- Modular Implementations: Clean, reusable code for each fine-tuning method
- Comprehensive Benchmarking: Compare methods on standardized metrics
- Production-Ready: Well-tested, documented code following best practices
- Educational Resources: Step-by-step guides and learning materials
- Reproducible Experiments: Fully documented configurations and results
- Cross-Platform: Verified support for Windows (CPU/GPU) and Linux
- Hyperparameter Tuning: Integrated Optuna support for optimizing model performance
- Mobile Deployment: Export models for Android and iOS devices with Unsloth
- Optimized Training: 2-5x faster training with Unsloth's optimized kernels
- ☁️ GCP Integration: Authenticate with Google accounts, provision GPU/TPU resources, and train models remotely on GCP infrastructure with cost estimation and monitoring
# Clone the repository
git clone https://github.com/ajay-sai/VSML_Fine_Tuning.git
cd VSML_Fine_Tuning
# Create virtual environment
python -m venv .venv
# Windows: .venv\Scripts\activate
# Linux/Mac: source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt

Launch the Streamlit-based interactive interface for a modern, visual fine-tuning experience:
# Quick launch (Unix/Linux/Mac)
./launch_ui.sh
# Or on Windows
launch_ui.bat
# Or directly with streamlit
streamlit run streamlit_app.py

Modern Streamlit UI Features:
- Multi-Page Interface: Clear workflow navigation through selection, configuration, training, and results
- Visual Configuration: Interactive sliders, dropdowns, and forms for all parameters
- Dataset Preview: View sample data and statistics before training
- Live Progress: Real-time training progress bars and status updates
- Mobile Export: Built-in Unsloth GGUF export for Android/iOS deployment
- Educational Content: Comprehensive tooltips, help panels, and documentation links
- Download Options: Export trained models, configs, and mobile packages
Quick Start:
- Open the UI (it launches in your browser at http://localhost:8501)
- Click a workflow card (try "CPU Quick Experiment" first!)
- Adjust settings using the interactive widgets
- Preview your dataset in the Dataset Preview tab
- Click "▶️ Start Training" and monitor live progress
- Download your trained model and optional mobile export
Note for GitHub Copilot Users: The repository includes Actions setup steps (.github/copilot/setup-steps.yml) that configure the environment to avoid firewall issues with external connections. This ensures the Streamlit app works properly in the Copilot coding agent environment.
New to the UI? Start with the Quick Start Guide
Full Documentation: Streamlit UI Guide
If you don't have a GPU, you can still learn the mechanics of fine-tuning using our CPU-optimized configuration with the SmolLM2-135M model.
# Run a complete fine-tuning loop on CPU (~1 minute)
python examples/train_lora.py --config configs/cpu_config.yaml

For standard GPU training with LoRA:

python examples/train_lora.py --config configs/lora_config.yaml

Unsloth provides optimized training with 80% less memory usage:
python examples/train_unsloth.py --config configs/unsloth_config.yaml

🌩️ Cloud Alternatives (Recommended for Resource-Constrained Systems):
If you encounter OOM (Out of Memory) errors locally, use our ready-to-run cloud notebooks:
- Google Colab Notebooks (Free GPU): Open in Colab
  - Free T4 GPU (15GB VRAM)
  - No installation required
  - Complete fine-tuning in 10-15 minutes
- Kaggle Notebooks (Free GPU): Open in Kaggle
  - Free T4/P100 GPU (up to 16GB VRAM)
  - 30 hours/week GPU time
  - Save models as Kaggle datasets
Both notebooks include:
- ✅ Complete environment setup
- ✅ Step-by-step instructions
- ✅ Model testing and download
- ✅ Mobile export (GGUF format)
For mobile deployment, see Mobile Deployment Guide.
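For reference, the `--config` files passed to the training scripts above are plain YAML. The snippet below is only an illustrative sketch of a CPU quick-start configuration; the field names and values are assumptions, not the repository's actual schema (see the files in `configs/` for that):

```yaml
# Illustrative only -- field names are assumptions, not the repo's schema.
model:
  name: HuggingFaceTB/SmolLM2-135M   # small model suitable for CPU runs
  max_seq_length: 512

lora:
  r: 8            # adapter rank
  alpha: 16       # scaling factor
  dropout: 0.05

training:
  learning_rate: 2.0e-4
  num_epochs: 1
  batch_size: 2
  output_dir: outputs/cpu_quickstart
```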
Automatically find the best learning rate and rank for your data:
python examples/tune_lora.py

Compare different LoRA configurations:
python examples/run_benchmark.py --methods lora_r4 lora_r16 --max-samples 100

VSML_Fine_Tuning/
├── src/
│   ├── methods/                      # Fine-tuning implementations
│   │   ├── lora.py                   # Standard LoRA
│   │   ├── qlora.py                  # Quantized LoRA
│   │   ├── dpo.py                    # Direct Preference Optimization
│   │   └── unsloth.py                # Unsloth optimized training
│   ├── data/                         # Data loading and preprocessing
│   │   └── loader.py
│   ├── evaluation/                   # Evaluation and benchmarking
│   │   ├── evaluator.py
│   │   └── benchmark.py
│   ├── gcp_integration/              # ☁️ GCP integration (NEW!)
│   │   ├── auth.py                   # GCP authentication
│   │   ├── resource_manager.py       # VM provisioning
│   │   └── training_orchestrator.py  # Remote training
│   ├── backend/                      # FastAPI backend for GCP
│   │   ├── app.py                    # API server
│   │   └── models.py                 # Request/response models
│   └── utils/                        # Utilities
│       ├── config.py
│       └── logging.py
├── configs/                          # Configuration files
│   └── gcp_config.yaml               # GCP configuration
├── examples/                         # Example training scripts
│   ├── launch_gcp_backend.py         # Start GCP API server
│   └── gcp_resource_example.py       # GCP resource management
├── docs/                             # Documentation
│   ├── GCP_INTEGRATION.md            # Full GCP guide
│   └── GCP_QUICKSTART.md             # Quick start guide
├── tests/                            # Unit tests
└── requirements.txt                  # Dependencies
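Stepping back to the hyperparameter tuning step: conceptually, `examples/tune_lora.py` searches for the learning rate and rank that minimize validation loss (the repo uses Optuna for this). The stdlib sketch below substitutes a mock objective for a real training run, so the search space and "optimum" are purely illustrative:

```python
import itertools

# Mock stand-in for "fine-tune with these hyperparameters and return
# the validation loss". In the real script this is a LoRA training run.
def mock_validation_loss(lr, rank):
    # Pretend the optimum sits at lr=2e-4, rank=8 (illustrative only).
    return abs(lr - 2e-4) * 1e4 + abs(rank - 8) * 0.1

search_space = {
    "lr": [1e-5, 1e-4, 2e-4, 5e-4],
    "rank": [4, 8, 16],
}

# Exhaustive grid search: try every (lr, rank) pair, keep the best.
best_lr, best_rank = min(
    itertools.product(search_space["lr"], search_space["rank"]),
    key=lambda cfg: mock_validation_loss(*cfg),
)
print(best_lr, best_rank)
```

Optuna improves on this exhaustive product with smarter sampling (TPE by default) and early pruning of unpromising trials, which matters once each trial is a real fine-tuning run.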
| Method | Description | Memory Efficiency | Speed | Best For |
|---|---|---|---|---|
| LoRA | Low-Rank Adaptation | High | Fast | General purpose, good balance |
| QLoRA | 4-bit Quantized LoRA | Very High | Moderate | Limited GPU memory |
| DPO | Direct Preference Optimization | Medium | Moderate | Alignment from human preferences |
| Unsloth | Optimized LoRA Training | Very High | Very Fast | Production, Mobile deployment |
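The "Memory Efficiency" column follows from simple arithmetic: LoRA freezes the base weights and trains only two low-rank factors per adapted matrix, so the trainable-parameter count can be checked by hand. The layer sizes below are illustrative assumptions, not a claim about any specific model:

```python
# A LoRA adapter on a d_out x d_in weight matrix adds B (d_out x r)
# and A (r x d_in), i.e. r * (d_in + d_out) trainable parameters.
def lora_params(d_in, d_out, rank):
    return rank * (d_in + d_out)

hidden = 2048                                     # illustrative hidden size
per_matrix = lora_params(hidden, hidden, rank=8)  # 32,768 params per matrix
total = per_matrix * 2 * 22                       # 2 matrices/layer, 22 layers
print(f"{total:,} trainable parameters")          # ~1.4M -- a tiny fraction
                                                  # of a billion-param model
```

Doubling the rank doubles this count, which is why rank is the main memory/quality knob in the configs.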
Unsloth provides significant improvements over standard implementations:
- 2-5x faster training with optimized kernels
- 80% less memory usage through efficient quantization
- Mobile export support for Android and iOS deployment
- Sequence packing for 3x faster training
- Compatible with all LoRA variants
For detailed comparisons, see docs/method_comparison.md
- ✅ Multiple Fine-Tuning Methods: LoRA, QLoRA, DPO, Unsloth with more coming
- ✅ Flexible Configuration: YAML-based configs for reproducibility
- ✅ Comprehensive Evaluation: Perplexity, generation quality, resource usage
- ✅ Automated Benchmarking: Compare methods side-by-side
- ✅ Production Ready: Modular, tested, well-documented code
- ✅ Mobile Deployment: Export models for Android and iOS devices
- ✅ Optimized Training: 2-5x faster with Unsloth's optimized kernels
- Performance: Accuracy, Perplexity, Loss
- Efficiency: Training time, Inference speed, Throughput
- Resources: Memory usage, Trainable parameters
- Quality: Generation samples, Task-specific metrics
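Of these metrics, perplexity is the least self-explanatory: it is simply the exponential of the mean token-level cross-entropy loss, so modest loss improvements translate into visibly lower perplexity. A minimal sketch:

```python
import math

def perplexity(mean_ce_loss: float) -> float:
    """Perplexity = exp(mean cross-entropy loss in nats per token)."""
    return math.exp(mean_ce_loss)

print(perplexity(0.0))    # 1.0: a model that predicts every token perfectly
print(perplexity(2.526))  # ~12.5
```

Lower is better; a perplexity of 12.5 loosely means the model is "as uncertain as" a uniform choice over 12.5 tokens at each step.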
Train your models on powerful GCP infrastructure with GPU/TPU support!
# 1. Set up GCP project and service account (see docs/GCP_QUICKSTART.md)
# 2. Configure GCP credentials
export GCP_SERVICE_ACCOUNT_KEY=~/gcp-key.json
export GCP_PROJECT_ID=your-project-id
# 3. Start the GCP backend API
python examples/launch_gcp_backend.py
# 4. Use the API or CLI to manage resources
python examples/gcp_resource_example.py \
--service-account-key ~/gcp-key.json \
--action create \
--vm-name training-vm \
--gpu-type nvidia-tesla-t4 \
    --gpu-count 1

- Authentication: Google OAuth2 or service account authentication
- Resource Provisioning: Create GPU/TPU VMs with custom configurations
- Remote Training: Submit and monitor training jobs on GCP infrastructure
- Cost Estimation: Get detailed cost estimates before provisioning
- Job Monitoring: Track training progress and retrieve results
- Resource Management: Start, stop, and delete VMs as needed
- Preemptible Instances: Save 60-80% on compute costs
| Configuration | Regular ($/hr) | Preemptible ($/hr) | Best For |
|---|---|---|---|
| n1-standard-4 (CPU) | $0.19 | $0.06 | Small models, testing |
| n1-standard-4 + T4 GPU | $0.54 | $0.16 | Most training tasks |
| n1-standard-8 + V100 GPU | $2.86 | $0.86 | Large models, fast training |
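Hourly rates only become meaningful once multiplied by run length. The snippet below applies the T4 rates from the table above to a hypothetical 3-hour training run:

```python
# Hourly rates taken from the cost table (n1-standard-4 + T4 GPU).
REGULAR_RATE = 0.54
PREEMPTIBLE_RATE = 0.16

hours = 3  # hypothetical run length
regular_cost = REGULAR_RATE * hours
preemptible_cost = PREEMPTIBLE_RATE * hours
savings = 1 - preemptible_cost / regular_cost

print(f"${regular_cost:.2f} vs ${preemptible_cost:.2f} "
      f"({savings:.0%} saved)")   # $1.62 vs $0.48 (70% saved)
```

Keep in mind that preemptible VMs can be reclaimed mid-run, so regular checkpointing is essential when using them for training.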
GCP Documentation:
- GCP Integration Guide - Complete setup and usage
- GCP Quick Start - Get started in 10 minutes
- API Documentation - Interactive API docs (when server is running)
- Getting Started Guide - Installation and first steps
- Method Comparison - Detailed comparison of techniques
- Configuration Guide - How to configure training
- Unsloth Quick Start - Fast training with Unsloth
- Mobile Deployment Guide - Deploy models to Android and iOS
- Streamlit UI Quick Start - Get started with the Streamlit interface
- Streamlit UI Complete Guide - Full feature documentation
- GitHub Environments & Secrets - Setup for CI/CD and deployments
- Reproducibility Checklist - Ensure reproducible results
- FAQ & Troubleshooting - Common issues and solutions
- Onboarding Playbook - Learning path and resources
- LoRA Paper (Hu et al., 2021)
- QLoRA Paper (Dettmers et al., 2023)
- TinyLlama Paper (Zhang et al., 2024)
- Hugging Face PEFT Documentation
Each method includes:
- Implementation notes with code walkthrough
- Observed trade-offs and best practices
- Troubleshooting tips for common issues
- Reproducibility checklist
Benchmark on TinyLlama-1.1B with Alpaca dataset (1000 samples):
| Method | Training Time | Memory Usage | Trainable Params | Perplexity |
|---|---|---|---|---|
| LoRA | ~15 min | ~8 GB | ~4.2M (0.4%) | ~12.5 |
| QLoRA | ~20 min | ~4 GB | ~4.2M (0.4%) | ~12.8 |
Results may vary based on hardware and configuration.
ImportError: cannot import name 'AttrsDescriptor' from 'triton.compiler.compiler'
This error occurs when training with Unsloth and stems from a compatibility issue between Torch 2.9.1 and Triton 3.5.1.
Solution: This is already fixed in the latest version. See docs/TRITON_TORCH_COMPATIBILITY.md for details and alternatives.
Out of Memory (OOM) Errors
Use our troubleshooting guide: docs/TROUBLESHOOTING_OOM.md
Other Issues
- Check docs/ folder for detailed documentation
- See FAQ for common questions
- Open an issue on GitHub
Contributions are welcome! Areas of interest:
- New fine-tuning methods (AdaLoRA, IA3, etc.)
- Additional evaluation metrics
- More datasets and tasks
- Documentation improvements
- Bug fixes and optimizations
This project is licensed under the MIT License - see the LICENSE file for details.
- Hugging Face for Transformers and PEFT libraries
- TinyLlama team for the excellent small LLM
- Research community for developing these methods
For questions or issues, please open an issue on GitHub.
Note: This framework is designed for educational and research purposes. For production deployments, additional testing and optimization may be required.