⭐ v2.0 UPDATE: Now following official Unsloth patterns from docs.unsloth.ai. See OFFICIAL_UNSLOTH_GUIDE.md for the recommended approach.
A comprehensive, modular framework for learning and benchmarking parameter-efficient fine-tuning methods for small language models. This repository provides production-ready implementations of modern fine-tuning techniques, including LoRA, QLoRA, DPO, and Unsloth, with extensive documentation, mobile deployment support, reproducible experiments, and Google Cloud Platform integration for scalable training.
Current LLM fine-tuning approaches evolve rapidly. This project provides:
- Modular Implementations: Clean, reusable code for each fine-tuning method
- Comprehensive Benchmarking: Compare methods on standardized metrics
- Production-Ready: Well-tested, documented code following best practices
- Educational Resources: Step-by-step guides and learning materials
- Reproducible Experiments: Fully documented configurations and results
- Cross-Platform: Verified support for Windows (CPU/GPU) and Linux
- Hyperparameter Tuning: Integrated Optuna support for optimizing model performance
- Mobile Deployment: Export models for Android and iOS devices with Unsloth
- Optimized Training: 2-5x faster training with Unsloth's optimized kernels
- ☁️ GCP Integration: Authenticate with Google accounts, provision GPU/TPU resources, and train models remotely on GCP infrastructure with cost estimation and monitoring
# Clone the repository
git clone https://github.com/ajay-sai/VSML_Fine_Tuning.git
cd VSML_Fine_Tuning
# Create virtual environment
python -m venv .venv
# Windows: .venv\Scripts\activate
# Linux/Mac: source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt

Launch the Streamlit-based interactive interface for a modern, visual fine-tuning experience:
# Quick launch (Unix/Linux/Mac)
./launch_ui.sh
# Or on Windows
launch_ui.bat
# Or directly with streamlit
streamlit run streamlit_app.py

Modern Streamlit UI Features:
- Multi-Page Interface: Clear workflow navigation through selection, configuration, training, and results
- Visual Configuration: Interactive sliders, dropdowns, and forms for all parameters
- Dataset Preview: View sample data and statistics before training
- Live Progress: Real-time training progress bars and status updates
- Mobile Export: Built-in Unsloth GGUF export for Android/iOS deployment
- Educational Content: Comprehensive tooltips, help panels, and documentation links
- Download Options: Export trained models, configs, and mobile packages
Quick Start:
- Open the UI (it launches in your browser at http://localhost:8501)
- Click a workflow card (try "CPU Quick Experiment" first!)
- Adjust settings using the interactive widgets
- Preview your dataset in the Dataset Preview tab
- Click "▶️ Start Training" and monitor live progress
- Download your trained model and optional mobile export
Note for GitHub Copilot Users: The repository includes Actions setup steps (.github/copilot/setup-steps.yml) that configure the environment to avoid firewall issues with external connections. This ensures the Streamlit app works properly in the Copilot coding agent environment.
New to the UI? Start with the Quick Start Guide
Full Documentation: Streamlit UI Guide
If you don't have a GPU, you can still learn the mechanics of fine-tuning using our CPU-optimized configuration with the SmolLM2-135M model.
# Run a complete fine-tuning loop on CPU (~1 minute)
python examples/train_lora.py --config configs/cpu_config.yaml

For standard GPU training with LoRA:

python examples/train_lora.py --config configs/lora_config.yaml

Unsloth provides optimized training with 80% less memory usage:
python examples/train_unsloth.py --config configs/unsloth_config.yaml

🌩️ Cloud Alternatives (Recommended for Resource-Constrained Systems):
If you encounter OOM (Out of Memory) errors locally, use our ready-to-run cloud notebooks:
- Google Colab Notebooks (Free GPU): Open in Colab
  - Free T4 GPU (15GB VRAM)
  - No installation required
  - Complete fine-tuning in 10-15 minutes
- Kaggle Notebooks (Free GPU): Open in Kaggle
  - Free T4/P100 GPU (up to 16GB VRAM)
  - 30 hours/week GPU time
  - Save models as Kaggle datasets
Both notebooks include:
- ✅ Complete environment setup
- ✅ Step-by-step instructions
- ✅ Model testing and download
- ✅ Mobile export (GGUF format)
For mobile deployment, see Mobile Deployment Guide.
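For reference, the `--config` files passed to the training scripts above are plain YAML. The snippet below is only an illustrative sketch of a CPU quick-start configuration; the field names and values are assumptions, not the repository's actual schema (see the files in `configs/` for that):

```yaml
# Illustrative only -- field names are assumptions, not the repo's schema.
model:
  name: HuggingFaceTB/SmolLM2-135M   # small model suitable for CPU runs
  max_seq_length: 512

lora:
  r: 8            # adapter rank
  alpha: 16       # scaling factor
  dropout: 0.05

training:
  learning_rate: 2.0e-4
  num_epochs: 1
  batch_size: 2
  output_dir: outputs/cpu_quickstart
```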
Automatically find the best learning rate and rank for your data:
python examples/tune_lora.py

Compare different LoRA configurations:
python examples/run_benchmark.py --methods lora_r4 lora_r16 --max-samples 100

VSML_Fine_Tuning/
├── src/
│   ├── methods/                      # Fine-tuning implementations
│   │   ├── lora.py                   # Standard LoRA
│   │   ├── qlora.py                  # Quantized LoRA
│   │   ├── dpo.py                    # Direct Preference Optimization
│   │   └── unsloth.py                # Unsloth optimized training
│   ├── data/                         # Data loading and preprocessing
│   │   └── loader.py
│   ├── evaluation/                   # Evaluation and benchmarking
│   │   ├── evaluator.py
│   │   └── benchmark.py
│   ├── gcp_integration/              # ☁️ GCP integration (NEW!)
│   │   ├── auth.py                   # GCP authentication
│   │   ├── resource_manager.py       # VM provisioning
│   │   └── training_orchestrator.py  # Remote training
│   ├── backend/                      # FastAPI backend for GCP
│   │   ├── app.py                    # API server
│   │   └── models.py                 # Request/response models
│   └── utils/                        # Utilities
│       ├── config.py
│       └── logging.py
├── configs/                          # Configuration files
│   └── gcp_config.yaml               # GCP configuration
├── examples/                         # Example training scripts
│   ├── launch_gcp_backend.py         # Start GCP API server
│   └── gcp_resource_example.py       # GCP resource management
├── docs/                             # Documentation
│   ├── GCP_INTEGRATION.md            # Full GCP guide
│   └── GCP_QUICKSTART.md             # Quick start guide
├── tests/                            # Unit tests
└── requirements.txt                  # Dependencies
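Stepping back to the hyperparameter tuning step: conceptually, `examples/tune_lora.py` searches for the learning rate and rank that minimize validation loss (the repo uses Optuna for this). The stdlib sketch below substitutes a mock objective for a real training run, so the search space and "optimum" are purely illustrative:

```python
import itertools

# Mock stand-in for "fine-tune with these hyperparameters and return
# the validation loss". In the real script this is a LoRA training run.
def mock_validation_loss(lr, rank):
    # Pretend the optimum sits at lr=2e-4, rank=8 (illustrative only).
    return abs(lr - 2e-4) * 1e4 + abs(rank - 8) * 0.1

search_space = {
    "lr": [1e-5, 1e-4, 2e-4, 5e-4],
    "rank": [4, 8, 16],
}

# Exhaustive grid search: try every (lr, rank) pair, keep the best.
best_lr, best_rank = min(
    itertools.product(search_space["lr"], search_space["rank"]),
    key=lambda cfg: mock_validation_loss(*cfg),
)
print(best_lr, best_rank)
```

Optuna improves on this exhaustive product with smarter sampling (TPE by default) and early pruning of unpromising trials, which matters once each trial is a real fine-tuning run.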
| Method | Description | Memory Efficiency | Speed | Best For |
|---|---|---|---|---|
| LoRA | Low-Rank Adaptation | High | Fast | General purpose, good balance |
| QLoRA | 4-bit Quantized LoRA | Very High | Moderate | Limited GPU memory |
| DPO | Direct Preference Optimization | Medium | Moderate | Alignment from human preferences |
| Unsloth | Optimized LoRA Training | Very High | Very Fast | Production, Mobile deployment |
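The "Memory Efficiency" column follows from simple arithmetic: LoRA freezes the base weights and trains only two low-rank factors per adapted matrix, so the trainable-parameter count can be checked by hand. The layer sizes below are illustrative assumptions, not a claim about any specific model:

```python
# A LoRA adapter on a d_out x d_in weight matrix adds B (d_out x r)
# and A (r x d_in), i.e. r * (d_in + d_out) trainable parameters.
def lora_params(d_in, d_out, rank):
    return rank * (d_in + d_out)

hidden = 2048                                     # illustrative hidden size
per_matrix = lora_params(hidden, hidden, rank=8)  # 32,768 params per matrix
total = per_matrix * 2 * 22                       # 2 matrices/layer, 22 layers
print(f"{total:,} trainable parameters")          # ~1.4M -- a tiny fraction
                                                  # of a billion-param model
```

Doubling the rank doubles this count, which is why rank is the main memory/quality knob in the configs.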
Unsloth provides significant improvements over standard implementations:
- 2-5x faster training with optimized kernels
- 80% less memory usage through efficient quantization
- Mobile export support for Android and iOS deployment
- Sequence packing for 3x faster training
- Compatible with all LoRA variants
For detailed comparisons, see docs/method_comparison.md
- ✅ Multiple Fine-Tuning Methods: LoRA, QLoRA, DPO, Unsloth with more coming
- ✅ Flexible Configuration: YAML-based configs for reproducibility
- ✅ Comprehensive Evaluation: Perplexity, generation quality, resource usage
- ✅ Automated Benchmarking: Compare methods side-by-side
- ✅ Production Ready: Modular, tested, well-documented code
- ✅ Mobile Deployment: Export models for Android and iOS devices
- ✅ Optimized Training: 2-5x faster with Unsloth's optimized kernels
- Performance: Accuracy, Perplexity, Loss
- Efficiency: Training time, Inference speed, Throughput
- Resources: Memory usage, Trainable parameters
- Quality: Generation samples, Task-specific metrics
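Of these metrics, perplexity is the least self-explanatory: it is simply the exponential of the mean token-level cross-entropy loss, so modest loss improvements translate into visibly lower perplexity. A minimal sketch:

```python
import math

def perplexity(mean_ce_loss: float) -> float:
    """Perplexity = exp(mean cross-entropy loss in nats per token)."""
    return math.exp(mean_ce_loss)

print(perplexity(0.0))    # 1.0: a model that predicts every token perfectly
print(perplexity(2.526))  # ~12.5
```

Lower is better; a perplexity of 12.5 loosely means the model is "as uncertain as" a uniform choice over 12.5 tokens at each step.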
Train your models on powerful GCP infrastructure with GPU/TPU support!
# 1. Set up GCP project and service account (see docs/GCP_QUICKSTART.md)
# 2. Configure GCP credentials
export GCP_SERVICE_ACCOUNT_KEY=~/gcp-key.json
export GCP_PROJECT_ID=your-project-id
# 3. Start the GCP backend API
python examples/launch_gcp_backend.py
# 4. Use the API or CLI to manage resources
python examples/gcp_resource_example.py \
--service-account-key ~/gcp-key.json \
--action create \
--vm-name training-vm \
--gpu-type nvidia-tesla-t4 \
    --gpu-count 1

- Authentication: Google OAuth2 or service account authentication
- Resource Provisioning: Create GPU/TPU VMs with custom configurations
- Remote Training: Submit and monitor training jobs on GCP infrastructure
- Cost Estimation: Get detailed cost estimates before provisioning
- Job Monitoring: Track training progress and retrieve results
- Resource Management: Start, stop, and delete VMs as needed
- Preemptible Instances: Save 60-80% on compute costs
| Configuration | Regular ($/hr) | Preemptible ($/hr) | Best For |
|---|---|---|---|
| n1-standard-4 (CPU) | $0.19 | $0.06 | Small models, testing |
| n1-standard-4 + T4 GPU | $0.54 | $0.16 | Most training tasks |
| n1-standard-8 + V100 GPU | $2.86 | $0.86 | Large models, fast training |
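Hourly rates only become meaningful once multiplied by run length. The snippet below applies the T4 rates from the table above to a hypothetical 3-hour training run:

```python
# Hourly rates taken from the cost table (n1-standard-4 + T4 GPU).
REGULAR_RATE = 0.54
PREEMPTIBLE_RATE = 0.16

hours = 3  # hypothetical run length
regular_cost = REGULAR_RATE * hours
preemptible_cost = PREEMPTIBLE_RATE * hours
savings = 1 - preemptible_cost / regular_cost

print(f"${regular_cost:.2f} vs ${preemptible_cost:.2f} "
      f"({savings:.0%} saved)")   # $1.62 vs $0.48 (70% saved)
```

Keep in mind that preemptible VMs can be reclaimed mid-run, so regular checkpointing is essential when using them for training.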
GCP Documentation:
- GCP Integration Guide - Complete setup and usage
- GCP Quick Start - Get started in 10 minutes
- API Documentation - Interactive API docs (when server is running)
- Getting Started Guide - Installation and first steps
- Method Comparison - Detailed comparison of techniques
- Configuration Guide - How to configure training
- Unsloth Quick Start - Fast training with Unsloth
- Mobile Deployment Guide - Deploy models to Android and iOS
- Streamlit UI Quick Start - Get started with the Streamlit interface
- Streamlit UI Complete Guide - Full feature documentation
- GitHub Environments & Secrets - Setup for CI/CD and deployments
- Reproducibility Checklist - Ensure reproducible results
- FAQ & Troubleshooting - Common issues and solutions
- Onboarding Playbook - Learning path and resources
- LoRA Paper (Hu et al., 2021)
- QLoRA Paper (Dettmers et al., 2023)
- TinyLlama Paper (Zhang et al., 2024)
- Hugging Face PEFT Documentation
Each method includes:
- Implementation notes with code walkthrough
- Observed trade-offs and best practices
- Troubleshooting tips for common issues
- Reproducibility checklist
Benchmark on TinyLlama-1.1B with Alpaca dataset (1000 samples):
| Method | Training Time | Memory Usage | Trainable Params | Perplexity |
|---|---|---|---|---|
| LoRA | ~15 min | ~8 GB | ~4.2M (0.4%) | ~12.5 |
| QLoRA | ~20 min | ~4 GB | ~4.2M (0.4%) | ~12.8 |
Results may vary based on hardware and configuration.
ImportError: cannot import name 'AttrsDescriptor' from 'triton.compiler.compiler'
This error occurs when training with Unsloth and stems from a compatibility issue between Torch 2.9.1 and Triton 3.5.1.
Solution: This is already fixed in the latest version. See docs/TRITON_TORCH_COMPATIBILITY.md for details and alternatives.
Out of Memory (OOM) Errors
Use our troubleshooting guide: docs/TROUBLESHOOTING_OOM.md
Other Issues
- Check docs/ folder for detailed documentation
- See FAQ for common questions
- Open an issue on GitHub
Contributions are welcome! Areas of interest:
- New fine-tuning methods (AdaLoRA, IA3, etc.)
- Additional evaluation metrics
- More datasets and tasks
- Documentation improvements
- Bug fixes and optimizations
This project is licensed under the MIT License - see the LICENSE file for details.
- Hugging Face for Transformers and PEFT libraries
- TinyLlama team for the excellent small LLM
- Research community for developing these methods
For questions or issues, please open an issue on GitHub.
Note: This framework is designed for educational and research purposes. For production deployments, additional testing and optimization may be required.