VSML Fine-Tuning: Master Modular Fine-Tuning for Small LLMs

โญ v2.0 UPDATE: Now following official Unsloth patterns from docs.unsloth.ai. See OFFICIAL_UNSLOTH_GUIDE.md for the recommended approach.

Python 3.8+ | License: MIT | GCP Integration

A comprehensive, modular framework for learning and benchmarking parameter-efficient fine-tuning methods for small language models. This repository provides production-ready implementations of modern fine-tuning techniques (LoRA, QLoRA, DPO, and Unsloth), with extensive documentation, mobile deployment support, reproducible experiments, and Google Cloud Platform integration for scalable training.

🎯 Overview

LLM fine-tuning methods evolve rapidly. This project provides:

  • Modular Implementations: Clean, reusable code for each fine-tuning method
  • Comprehensive Benchmarking: Compare methods on standardized metrics
  • Production-Ready: Well-tested, documented code following best practices
  • Educational Resources: Step-by-step guides and learning materials
  • Reproducible Experiments: Fully documented configurations and results
  • Cross-Platform: Verified support for Windows (CPU/GPU) and Linux
  • Hyperparameter Tuning: Integrated Optuna support for optimizing model performance
  • Mobile Deployment: Export models for Android and iOS devices with Unsloth
  • Optimized Training: 2-5x faster training with Unsloth's optimized kernels
  • โ˜๏ธ GCP Integration: Authenticate with Google accounts, provision GPU/TPU resources, and train models remotely on GCP infrastructure with cost estimation and monitoring

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/ajay-sai/VSML_Fine_Tuning.git
cd VSML_Fine_Tuning

# Create virtual environment
python -m venv .venv
# Windows: .venv\Scripts\activate
# Linux/Mac: source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt
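
After installation, it is worth confirming that the core libraries import cleanly and that your GPU (if any) is visible. A minimal sanity check, assuming torch, transformers, and peft are among the pinned dependencies in requirements.txt:

# sanity_check.py - confirm core dependencies and GPU visibility
import torch
import transformers
import peft

print(f"torch {torch.__version__}, transformers {transformers.__version__}, peft {peft.__version__}")
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))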

Interactive UI (Recommended for Beginners)

Launch the Streamlit-based interactive interface for a modern, visual fine-tuning experience:

# Quick launch (Unix/Linux/Mac)
./launch_ui.sh

# Or on Windows
launch_ui.bat

# Or directly with streamlit
streamlit run streamlit_app.py

Modern Streamlit UI Features:

  • 🎯 Multi-Page Interface: Clear workflow navigation through selection, configuration, training, and results
  • 📝 Visual Configuration: Interactive sliders, dropdowns, and forms for all parameters
  • 📊 Dataset Preview: View sample data and statistics before training
  • 🏋️ Live Progress: Real-time training progress bars and status updates
  • 📱 Mobile Export: Built-in Unsloth GGUF export for Android/iOS deployment
  • 🎓 Educational Content: Comprehensive tooltips, help panels, and documentation links
  • 💾 Download Options: Export trained models, configs, and mobile packages

Quick Start:

  1. Open the UI (it launches in your browser at http://localhost:8501)
  2. Click a workflow card (try "CPU Quick Experiment" first!)
  3. Adjust settings using the interactive widgets
  4. Preview your dataset in the Dataset Preview tab
  5. Click "▶️ Start Training" and monitor live progress
  6. Download your trained model and optional mobile export

Note for GitHub Copilot Users: The repository includes Actions setup steps (.github/copilot/setup-steps.yml) that configure the environment to avoid firewall issues with external connections. This ensures the Streamlit app works properly in the Copilot coding agent environment.

📖 New to the UI? Start with the Quick Start Guide
📚 Full Documentation: Streamlit UI Guide

Learning by Doing (CPU Friendly)

If you don't have a GPU, you can still learn the mechanics of fine-tuning using our CPU-optimized configuration with the SmolLM2-135M model.

# Run a complete fine-tuning loop on CPU (~1 minute)
python examples/train_lora.py --config configs/cpu_config.yaml

Advanced Usage

Train with LoRA (GPU)

python examples/train_lora.py --config configs/lora_config.yaml

Train with Unsloth (2-5x Faster!)

Unsloth provides optimized training with 80% less memory usage:

python examples/train_unsloth.py --config configs/unsloth_config.yaml
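
Conceptually, Unsloth training follows the pattern below. This is a minimal sketch based on the official Unsloth documentation, not the exact contents of examples/train_unsloth.py; the checkpoint name and LoRA hyperparameters are illustrative:

from unsloth import FastLanguageModel

# Load a 4-bit base model; the quantization is where most of the memory savings come from
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/tinyllama-bnb-4bit",  # illustrative checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters through Unsloth's patched PEFT integration
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)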

๐ŸŒฉ๏ธ Cloud Alternatives (Recommended for Resource-Constrained Systems):

If you encounter OOM (Out of Memory) errors locally, use our ready-to-run cloud notebooks:

  • Google Colab (Free GPU): Open In Colab

    • Free T4 GPU (15GB VRAM)
    • No installation required
    • Complete fine-tuning in 10-15 minutes
  • Kaggle Notebooks (Free GPU): Open in Kaggle

    • Free T4/P100 GPU (up to 16GB VRAM)
    • 30 hours/week GPU time
    • Save models as Kaggle datasets

Both notebooks include:

  • ✅ Complete environment setup
  • ✅ Step-by-step instructions
  • ✅ Model testing and download
  • ✅ Mobile export (GGUF format)

For mobile deployment, see Mobile Deployment Guide.
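
The export itself is a single call in Unsloth's documented API; the output directory and quantization method below are illustrative:

# Export a fine-tuned model to GGUF for on-device runtimes (e.g. llama.cpp-based apps)
model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method="q4_k_m")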

Hyperparameter Tuning

Automatically find the best learning rate and rank for your data:

python examples/tune_lora.py
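
The script drives Optuna, and conceptually it looks like the sketch below. This is illustrative rather than the exact contents of examples/tune_lora.py, and train_and_eval is a hypothetical helper standing in for a short training run that returns validation loss:

import optuna

def objective(trial):
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    lora_r = trial.suggest_categorical("lora_r", [4, 8, 16, 32])
    # train_and_eval is hypothetical: run a short LoRA fine-tune, return validation loss
    return train_and_eval(learning_rate=learning_rate, lora_r=lora_r)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
print("Best hyperparameters:", study.best_params)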

Run Benchmarks

Compare different LoRA configurations:

python examples/run_benchmark.py --methods lora_r4 lora_r16 --max-samples 100

๐Ÿ“ Project Structure

VSML_Fine_Tuning/
├── src/
│   ├── methods/          # Fine-tuning implementations
│   │   ├── lora.py      # Standard LoRA
│   │   ├── qlora.py     # Quantized LoRA
│   │   ├── dpo.py       # Direct Preference Optimization
│   │   └── unsloth.py   # Unsloth optimized training
│   ├── data/            # Data loading and preprocessing
│   │   └── loader.py
│   ├── evaluation/      # Evaluation and benchmarking
│   │   ├── evaluator.py
│   │   └── benchmark.py
│   ├── gcp_integration/ # ☁️ GCP integration (NEW!)
│   │   ├── auth.py      # GCP authentication
│   │   ├── resource_manager.py  # VM provisioning
│   │   └── training_orchestrator.py  # Remote training
│   ├── backend/         # FastAPI backend for GCP
│   │   ├── app.py       # API server
│   │   └── models.py    # Request/response models
│   └── utils/           # Utilities
│       ├── config.py
│       └── logging.py
├── configs/             # Configuration files
│   └── gcp_config.yaml  # GCP configuration
├── examples/            # Example training scripts
│   ├── launch_gcp_backend.py  # Start GCP API server
│   └── gcp_resource_example.py  # GCP resource management
├── docs/               # Documentation
│   ├── GCP_INTEGRATION.md  # Full GCP guide
│   └── GCP_QUICKSTART.md   # Quick start guide
├── tests/              # Unit tests
└── requirements.txt    # Dependencies

๐Ÿ› ๏ธ Supported Methods

Method    Description                      Memory Efficiency   Speed       Best For
LoRA      Low-Rank Adaptation              High                Fast        General purpose, good balance
QLoRA     4-bit Quantized LoRA             Very High           Moderate    Limited GPU memory
DPO       Direct Preference Optimization   Medium              Moderate    Alignment from human preferences
Unsloth   Optimized LoRA Training          Very High           Very Fast   Production, mobile deployment
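
In code, the practical difference between LoRA and QLoRA is mostly whether the frozen base model is loaded in 4-bit. A sketch using Hugging Face Transformers, PEFT, and bitsandbytes (model name and hyperparameters are illustrative, not this repo's defaults):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# QLoRA: quantize the frozen base weights to 4-bit (omit quantization_config for plain LoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # illustrative
    quantization_config=bnb_config,
)

# Both methods then attach the same trainable low-rank adapters
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # typically well under 1% of total parameters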

Unsloth Benefits

Unsloth provides significant improvements over standard implementations:

  • 2-5x faster training with optimized kernels
  • 80% less memory usage through efficient quantization
  • Mobile export support for Android and iOS deployment
  • Sequence packing for 3x faster training
  • Compatible with all LoRA variants

For detailed comparisons, see docs/method_comparison.md

📊 Features

Core Capabilities

  • ✅ Multiple Fine-Tuning Methods: LoRA, QLoRA, DPO, Unsloth with more coming
  • ✅ Flexible Configuration: YAML-based configs for reproducibility
  • ✅ Comprehensive Evaluation: Perplexity, generation quality, resource usage
  • ✅ Automated Benchmarking: Compare methods side-by-side
  • ✅ Production Ready: Modular, tested, well-documented code
  • ✅ Mobile Deployment: Export models for Android and iOS devices
  • ✅ Optimized Training: 2-5x faster with Unsloth's optimized kernels

Evaluation Metrics

  • Performance: Accuracy, Perplexity, Loss
  • Efficiency: Training time, Inference speed, Throughput
  • Resources: Memory usage, Trainable parameters
  • Quality: Generation samples, Task-specific metrics
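
Of these, perplexity is the easiest to misread: it is simply the exponential of the mean token-level cross-entropy loss, so lower is better. A minimal sketch, assuming a causal LM and tokenizer are already loaded:

import math
import torch

def perplexity(model, tokenizer, text):
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # For causal LMs, passing labels returns the mean cross-entropy over tokens
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return math.exp(loss.item())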

โ˜๏ธ GCP Integration (NEW!)

Train your models on powerful GCP infrastructure with GPU/TPU support!

Quick Start with GCP

# 1. Set up GCP project and service account (see docs/GCP_QUICKSTART.md)
# 2. Configure GCP credentials
export GCP_SERVICE_ACCOUNT_KEY=~/gcp-key.json
export GCP_PROJECT_ID=your-project-id

# 3. Start the GCP backend API
python examples/launch_gcp_backend.py

# 4. Use the API or CLI to manage resources
python examples/gcp_resource_example.py \
  --service-account-key ~/gcp-key.json \
  --action create \
  --vm-name training-vm \
  --gpu-type nvidia-tesla-t4 \
  --gpu-count 1

GCP Features

  • ๐Ÿ” Authentication: Google OAuth2 or service account authentication
  • ๐Ÿ’ป Resource Provisioning: Create GPU/TPU VMs with custom configurations
  • ๐ŸŽฏ Remote Training: Submit and monitor training jobs on GCP infrastructure
  • ๐Ÿ’ฐ Cost Estimation: Get detailed cost estimates before provisioning
  • ๐Ÿ“Š Job Monitoring: Track training progress and retrieve results
  • ๐Ÿ”„ Resource Management: Start, stop, delete VMs as needed
  • โšก Preemptible Instances: Save 60-80% on compute costs

Cost Estimates (per hour)

Configuration               Regular   Preemptible   Best For
n1-standard-4 (CPU)         $0.19     $0.06         Small models, testing
n1-standard-4 + T4 GPU      $0.54     $0.16         Most training tasks
n1-standard-8 + V100 GPU    $2.86     $0.86         Large models, fast training
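
As a back-of-the-envelope check against the table above (real GCP prices vary by region and change over time):

hours = 3
regular, preemptible = 0.54, 0.16  # $/hour for n1-standard-4 + T4, from the table above
print(f"regular: ${hours * regular:.2f}, preemptible: ${hours * preemptible:.2f}")
print(f"savings: {1 - preemptible / regular:.0%}")  # ~70%, in line with the 60-80% claim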

📚 GCP Documentation: docs/GCP_INTEGRATION.md (full guide) and docs/GCP_QUICKSTART.md (quick start)

📚 Documentation

🎓 Learning Resources

Recommended Reading

Tutorials

Each method includes:

  • Implementation notes with code walkthrough
  • Observed trade-offs and best practices
  • Troubleshooting tips for common issues
  • Reproducibility checklist

🧪 Example Results

Benchmark on TinyLlama-1.1B with Alpaca dataset (1000 samples):

Method   Training Time   Memory Usage   Trainable Params   Perplexity
LoRA     ~15 min         ~8 GB          ~4.2M (0.4%)       ~12.5
QLoRA    ~20 min         ~4 GB          ~4.2M (0.4%)       ~12.8

Results may vary based on hardware and configuration

🔧 Troubleshooting

Common Issues

ImportError: cannot import name 'AttrsDescriptor' from 'triton.compiler.compiler'

This occurs when training with Unsloth and is caused by a compatibility issue between Torch 2.9.1 and Triton 3.5.1.

Solution: This is already fixed in the latest version. See docs/TRITON_TORCH_COMPATIBILITY.md for details and alternatives.

Out of Memory (OOM) Errors

Use our troubleshooting guide: docs/TROUBLESHOOTING_OOM.md
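
Beyond the guide, the usual first-line mitigations are a smaller micro-batch with gradient accumulation, plus gradient checkpointing. A generic Transformers sketch (illustrative values, not this repo's config keys):

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,   # smallest micro-batch
    gradient_accumulation_steps=8,   # keeps the effective batch size at 8
    gradient_checkpointing=True,     # trade extra compute for activation memory
    bf16=True,                       # half precision where the hardware supports it
)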

Other Issues

For anything not covered above, check the guides in docs/ or open a GitHub issue.

๐Ÿค Contributing

Contributions are welcome! Areas of interest:

  • New fine-tuning methods (AdaLoRA, IA3, etc.)
  • Additional evaluation metrics
  • More datasets and tasks
  • Documentation improvements
  • Bug fixes and optimizations

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

  • Hugging Face for Transformers and PEFT libraries
  • TinyLlama team for the excellent small LLM
  • Research community for developing these methods

📮 Contact

For questions or issues, please open an issue on GitHub.


Note: This framework is designed for educational and research purposes. For production deployments, additional testing and optimization may be required.
