This project implements a Vision Transformer (ViT) architecture from the ground up in PyTorch, inspired by the research paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" by Dosovitskiy et al. (arXiv:2010.11929).
The model is trained and evaluated on the CIFAR-10 dataset — chosen for its compact size and suitability for experimentation.
This project was a hands-on exploration of the core ViT architecture, aimed at understanding transformer-based vision models without relying on pre-built modules or pre-trained weights.
## Features

- ViT architecture implemented entirely from scratch (a minimal sketch of the architecture follows this list)
- Patch embedding via `Conv2d` with flattening and projection
- Learnable positional encodings with a `[CLS]` token
- Transformer encoder blocks with:
  - Multi-head self-attention
  - MLP layers + GELU activation
- Classification head operating on the `[CLS]` token
- Custom training and evaluation loops
- Grid-based visualization of predictions for qualitative insight
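As a reference for how these pieces fit together, here is a minimal, self-contained sketch of the architecture described above. It is an illustration rather than the exact code in this repository; the class names, hyperparameters (patch size 4, embedding dim 192, depth 6, etc.), the pre-norm block layout, and the use of `nn.MultiheadAttention` are assumptions made for brevity.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split the image into patches with a strided Conv2d, then flatten and project."""
    def __init__(self, img_size=32, patch_size=4, in_chans=3, embed_dim=192):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # Conv2d with kernel == stride == patch_size acts as a per-patch linear projection
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                      # x: (B, 3, 32, 32)
        x = self.proj(x)                       # (B, embed_dim, H/P, W/P)
        return x.flatten(2).transpose(1, 2)    # (B, num_patches, embed_dim)

class EncoderBlock(nn.Module):
    """Pre-norm transformer encoder block: multi-head self-attention + MLP with GELU."""
    def __init__(self, embed_dim=192, num_heads=3, mlp_ratio=4.0, dropout=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(embed_dim)
        hidden = int(embed_dim * mlp_ratio)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, hidden), nn.GELU(),
            nn.Dropout(dropout), nn.Linear(hidden, embed_dim), nn.Dropout(dropout),
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]   # residual around attention
        return x + self.mlp(self.norm2(x))                   # residual around MLP

class ViT(nn.Module):
    def __init__(self, num_classes=10, embed_dim=192, depth=6, num_heads=3):
        super().__init__()
        self.patch_embed = PatchEmbedding(embed_dim=embed_dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        # Learnable positional encodings for the [CLS] token plus all patch tokens
        self.pos_embed = nn.Parameter(torch.zeros(1, 1 + self.patch_embed.num_patches, embed_dim))
        self.blocks = nn.Sequential(*[EncoderBlock(embed_dim, num_heads) for _ in range(depth)])
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        x = self.patch_embed(x)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.norm(self.blocks(x))
        return self.head(x[:, 0])              # classify from the [CLS] token

# Example: logits = ViT()(torch.randn(8, 3, 32, 32))  -> shape (8, 10)
```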
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/HrishikeshUchake/vit-from-scratch.git
  cd vit-from-scratch
  ```

- Install dependencies:

  ```bash
  pip install torch torchvision matplotlib
  ```

## Dataset

- Uses CIFAR-10 from `torchvision.datasets`
- Automatically downloads and normalizes the dataset (see the data-pipeline sketch below)
- 10 classes, 32×32 color images, ideal for quick transformer training experiments
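To make the data pipeline concrete, here is one way the CIFAR-10 loading and a bare-bones training/evaluation loop could look. The normalization statistics, batch sizes, optimizer settings, and the reuse of the `ViT` class from the sketch above are assumptions, not necessarily what this repository does.

```python
import torch
import torchvision
import torchvision.transforms as T

# Commonly used CIFAR-10 normalization stats (assumed; the repo's values may differ)
transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

train_set = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10("./data", train=False, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=256)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = ViT().to(device)        # ViT is the class from the architecture sketch above
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.05)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(10):
    # Training pass
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

    # Evaluation pass: top-1 accuracy on the test split
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            correct += (model(images).argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    print(f"epoch {epoch}: test accuracy {correct / total:.3f}")
```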
## Visualization

After training, the model produces color-coded grid plots of predictions vs. ground truth (sketched below), useful for:
- Identifying common failure modes
- Visual confirmation of model confidence
- Quick debugging and qualitative evaluation
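A rough sketch of how such a color-coded grid might be produced with matplotlib is shown below. The 4×4 grid size, the CIFAR-10 class names, and the approximate un-normalization step are illustrative assumptions rather than the repository's exact plotting code.

```python
import matplotlib.pyplot as plt
import torch

CLASSES = ("plane", "car", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck")

def plot_prediction_grid(model, loader, device, n=16):
    """Show an n-image grid; titles are green when the prediction matches the label, red otherwise."""
    model.eval()
    images, labels = next(iter(loader))
    images, labels = images[:n], labels[:n]
    with torch.no_grad():
        preds = model(images.to(device)).argmax(dim=1).cpu()

    fig, axes = plt.subplots(4, 4, figsize=(8, 8))
    for ax, img, label, pred in zip(axes.flat, images, labels, preds):
        # Roughly undo normalization for display (assumed stats), then clamp to [0, 1]
        img = (img * 0.25 + 0.47).clamp(0, 1)
        ax.imshow(img.permute(1, 2, 0))       # CHW -> HWC for matplotlib
        ax.set_title(f"{CLASSES[pred]} / {CLASSES[label]}",
                     color="green" if pred == label else "red", fontsize=8)
        ax.axis("off")
    plt.tight_layout()
    plt.show()
```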
## License

MIT License — feel free to fork, modify, and build upon the code for personal or academic use.

Developed by Hrishikesh Uchake
## Future Work

- Support for larger datasets (e.g. CIFAR-100, TinyImageNet)
- Accuracy/loss logging with `TensorBoard` (a possible starting point is sketched below)
- CLI training and evaluation wrapper
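For the logging item above, the standard `torch.utils.tensorboard.SummaryWriter` API could be wired into the existing training loop roughly as follows; the log directory, tag names, and call sites are assumptions since this is not yet implemented.

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/vit-cifar10")   # hypothetical log directory

def log_epoch(epoch, train_loss, test_accuracy):
    """Record per-epoch scalars so they appear in TensorBoard's scalar dashboard."""
    writer.add_scalar("train/loss", train_loss, epoch)
    writer.add_scalar("test/accuracy", test_accuracy, epoch)

# e.g. call log_epoch(epoch, loss.item(), correct / total) at the end of each epoch,
# then run `tensorboard --logdir runs` to view the curves.
```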