A comprehensive exploration of Knowledge Distillation techniques on CIFAR-100, comparing same-architecture and cross-architecture approaches across Vision Transformers (ViT) and CNNs (ResNet).
| Experiment | Teacher | Student | Baseline Acc. | Distilled Acc. | Gain | Compression |
|---|---|---|---|---|---|---|
| ViT → ViT | ViT-Base (85.8M) | ViT-Tiny (5.5M) | 76.91% | 83.38% | +6.47% | 15.5x |
| ViT → ResNet | ViT-Base (85.8M) | ResNet-18 (11.2M) | 78.44% | 80.86% | +2.42% | 7.6x |
| ViT → MobileViT | ViT-Base (85.8M) | MobileViT-S (5.0M) | 83.90% | 84.45% | +0.55% | 17.2x |
| ResNet → ViT | ResNet-152 (58.3M) | ViT-Tiny (5.5M) | 79.38% | 80.72% | +1.34% | 10.5x |
| ResNet → ResNet | ResNet-152 (58.3M) | ResNet-18 (11.2M) | 81.02% | 81.96% | +0.94% | 5.2x |
- Same-Architecture Distillation Works Best: ViT-Base → ViT-Tiny achieved the largest improvement (+6.47%)
- Cross-Architecture Distillation Works: knowledge transfers between CNNs and Transformers in both directions (see the sketch after this list)
- Strong Baselines Limit Gains: MobileViT-S and ResNet-18 start from strong pretrained weights, leaving less room for improvement
- Compression Champion: MobileViT-S achieves 17.2x compression while maintaining 84.45% accuracy
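For reference, here is a minimal sketch of how a cross-architecture teacher/student pair can be instantiated so their logits are directly comparable: both heads are sized to CIFAR-100's 100 classes and both consume the same 224×224 inputs. The timm model names below are assumptions based on the script names in this repo, not necessarily the exact calls the scripts use:

```python
import timm
import torch

# Teacher: ViT-Base (assumed timm name); the repo fine-tunes it on CIFAR-100 first.
teacher = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=100)
teacher.eval()  # the teacher stays frozen during distillation

# Student: ResNet-18 with a 100-class head. Because both models take the same
# 224x224 inputs and emit 100 logits, the KD loss can compare their outputs
# directly even though the architectures differ.
student = timm.create_model("resnet18", pretrained=True, num_classes=100)

with torch.no_grad():
    dummy = torch.randn(2, 3, 224, 224)
    assert teacher(dummy).shape == student(dummy).shape == (2, 100)
```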
```
VLM_Distillation/
├── VIT_distill/                          # ViT-Base as Teacher
│   ├── vit_base_patch16_224_cifar100.py  # Train ViT-Base teacher
│   ├── baseline_vit_tiny.py              # ViT-Tiny baseline
│   ├── distill_vit_tiny.py               # ViT-Base → ViT-Tiny
│   ├── baseline_resnet.py                # ResNet-18 baseline
│   ├── distillation_resnet.py            # ViT-Base → ResNet-18
│   ├── baseline_mobilevit.py             # MobileViT-S baseline
│   └── distill_mobilevit.py              # ViT-Base → MobileViT-S
│
└── Resnet_distill/                       # ResNet-152 as Teacher
    ├── resnet152_cifar100_teacher.py     # Train ResNet-152 teacher
    ├── baseline_vit_tiny.py              # ViT-Tiny baseline
    ├── distill_vit_tiny.py               # ResNet-152 → ViT-Tiny
    ├── baseline_resnet18.py              # ResNet-18 baseline
    └── distill_resnet18.py               # ResNet-152 → ResNet-18
```
```bash
# Clone and navigate
cd VLM_Distillation

# Train a teacher model
python VIT_distill/vit_base_patch16_224_cifar100.py

# Train baseline student
python VIT_distill/baseline_vit_tiny.py

# Distill teacher → student
python VIT_distill/distill_vit_tiny.py
```

| Parameter | Value |
|---|---|
| Dataset | CIFAR-100 (224×224) |
| Temperature (T) | 4.0 |
| Alpha (α) | 0.5 |
| Optimizer | AdamW |
| Learning Rate | 3e-4 |
| Epochs | 10 |
| Batch Size | 64-128 |
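To show how the Temperature and Alpha settings above combine, here is a hedged PyTorch sketch of the standard soft-target distillation loss (Hinton et al., 2015); the function name and signature are illustrative and not taken from this repo's scripts:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """alpha * soft-target KL (teacher vs. student) + (1 - alpha) * hard-label CE."""
    # Soften both distributions with temperature T; scale the KL term by T^2 so its
    # gradients stay on the same scale as the cross-entropy term.
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    log_p_teacher = F.log_softmax(teacher_logits / T, dim=-1)
    kd = F.kl_div(log_p_student, log_p_teacher,
                  reduction="batchmean", log_target=True) * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

In a training step matching the table, the student would be optimized with `torch.optim.AdamW(student.parameters(), lr=3e-4)` while the teacher's logits are computed under `torch.no_grad()` so only the student is updated.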