A modernized AlexNet implementation for CIFAR-10 classification with strong regularization, stable training, and real-world deployment via Hugging Face Spaces.
This project presents a research-oriented deep learning pipeline for image classification using a modified AlexNet architecture implemented in PyTorch. The model is trained and evaluated on the CIFAR-10 dataset.
Unlike the original AlexNet, which was designed for ImageNet-scale inputs, this implementation is adapted for small images using:
- Resize (70×70)
- Random Crop (64×64)
- Strong regularization techniques
- Modern training improvements
- Training Accuracy: ~99.6%
- Validation Accuracy: ~89.48%
- Test Accuracy: ~88.63%
- Early Stopping: Epoch 46/90
You can directly download and reuse the trained model weights from the Hugging Face Model Hub:
🔗 https://huggingface.co/Mudassir-08/alexnet-cifar10
This repository contains the trained PyTorch checkpoint (alexnet_cifar10.pth) which includes:
- model_state_dict
- optimizer_state_dict
- number of classes
✅ You can load it directly for inference without retraining.
You can test the model in real-time using the deployed Hugging Face Space:
🔗 https://huggingface.co/spaces/Mudassir-08/alexnet-cifar10-demo
This demo allows:
- Image upload
- Real-time classification
- Top-3 probability predictions
⚠️ Important Note: The input image must follow CIFAR-10 style preprocessing assumptions:
- RGB image
- Proper object-centered framing
- Similar distribution to CIFAR-10 dataset (airplane, ship, cat, etc.)
- Avoid unrelated or out-of-domain images for best performance
- CIFAR-10 dataset
- 60,000 RGB images (32×32)
- 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
- 50,000 training images
- 10,000 test images
Training:
Resize(70, 70)
RandomCrop(64, 64)
ToTensor()
Normalize(mean=0.5, std=0.5)

Evaluation:
Resize(70, 70)
CenterCrop(64, 64)
ToTensor()
Normalize(mean=0.5, std=0.5)
- Resize (70×70): Improves feature richness
- RandomCrop: Adds spatial invariance and augmentation
- CenterCrop: Ensures deterministic evaluation
- Normalization: Stabilizes gradient flow
Conv2D(3 → 64) + BatchNorm + ReLU
MaxPool
Conv2D(64 → 192) + BatchNorm + ReLU
MaxPool
Conv2D(192 → 384) + BatchNorm + ReLU
Conv2D(384 → 256) + BatchNorm + ReLU
Conv2D(256 → 256) + BatchNorm + ReLU
MaxPool
AdaptiveAvgPool2D → (4×4)
Flatten (4096)
Linear → 512 + ReLU + Dropout(0.4)
Linear → 256 + ReLU + Dropout(0.4)
Linear → 10 classes
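The stack above can be sketched as a PyTorch module. Channel counts, pooling, and the classifier widths follow the listing; kernel sizes, strides, and padding are assumptions, since the README does not specify them:

```python
import torch
import torch.nn as nn

class AlexNetCIFAR(nn.Module):
    """Sketch of the modified AlexNet (kernel/stride/padding are assumed)."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 192, kernel_size=3, padding=1), nn.BatchNorm2d(192), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.BatchNorm2d(384), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((4, 4))  # 256 channels * 4 * 4 = 4096
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(4096, 512), nn.ReLU(inplace=True), nn.Dropout(0.4),
            nn.Linear(512, 256), nn.ReLU(inplace=True), nn.Dropout(0.4),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.avgpool(self.features(x)))
```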
Framework: PyTorch
Optimizer: SGD
Momentum: 0.9
Learning Rate: 0.1
Scheduler: ReduceLROnPlateau
Batch Size: 256
Epochs: 90
Loss Function: CrossEntropy + Label Smoothing (0.1)
Device: CUDA (if available)
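The configuration above translates to roughly the following setup. The scheduler's `mode`, `factor`, and `patience` are assumptions (the README only names `ReduceLROnPlateau`), and the `nn.Linear` stands in for the real model:

```python
import torch
import torch.nn as nn

model = nn.Linear(4096, 10)  # placeholder; the real model is the modified AlexNet

# Cross-entropy with label smoothing, as stated in the training setup.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# SGD with momentum at the stated learning rate.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# mode="max" matches monitoring validation accuracy; factor/patience assumed.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.1, patience=5
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
```

At the end of each epoch, `scheduler.step(val_accuracy)` lowers the learning rate once validation accuracy plateaus.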
Prevents overconfidence and improves generalization.
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
Prevents gradient explosion.
Stops training when validation accuracy stops improving.
Automatically reduces LR on plateau.
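Putting the pieces together, one epoch of the loop might look like the sketch below, with the gradient-clipping call from above applied between backward and step. `train_one_epoch` is an illustrative helper, not a function from this repo:

```python
import torch

def train_one_epoch(model, loader, criterion, optimizer, device="cpu"):
    """One training epoch with gradient clipping at max-norm 1.0."""
    model.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        # Clip before the update to prevent gradient explosion.
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
```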
- Fast convergence in early epochs (1–10)
- Stable learning phase (10–30)
- Plateau at ~88–89% validation accuracy
- Early stopping at epoch 46
Strong generalization achieved due to:
- Dropout
- Label smoothing
- Stable optimization
Training Accuracy: ~99.6%
Validation Accuracy: ~89.48%
Test Accuracy: ~88.63%
Strong classes:
- airplane
- ship
- truck
Confusions:
- cat ↔ dog
- deer ↔ horse
This reflects visual similarity in CIFAR-10.
Input: 3 × 64 × 64 image
Feature Extractor: Conv → BN → ReLU → Pool × multiple layers
Adaptive Pool: → 4 × 4 feature map
Classifier: FC(4096 → 512 → 256 → 10)
saved_trained_model/alexnet_cifar10.pth
Image → Resize → Crop → Normalize → Model → Softmax → Prediction
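The softmax and top-3 steps of this pipeline can be sketched as follows. `predict_top3` is an illustrative helper assuming a preprocessed (3, 64, 64) tensor from the evaluation transforms:

```python
import torch

CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]

@torch.no_grad()
def predict_top3(model, image_tensor):
    """Normalized tensor -> model -> softmax -> top-3 (label, probability) pairs."""
    logits = model(image_tensor.unsqueeze(0))        # add batch dimension
    probs = torch.softmax(logits, dim=1).squeeze(0)  # class probabilities
    top_p, top_i = probs.topk(3)                     # sorted descending
    return [(CIFAR10_CLASSES[i], float(p)) for i, p in zip(top_i, top_p)]
```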
- PyTorch inference pipeline
- Gradio web interface (app.py)
- Hugging Face Spaces support
Features:
- Image upload
- Real-time prediction
- Top-3 probabilities
AlexNet/
├── src/
├── notebooks/
├── data/
├── saved_trained_model/
├── app.py
├── main.py
├── requirements.txt
└── README.md
- Modified AlexNet for CIFAR-10
- Stable training pipeline
- Full evaluation system
- Confusion matrix analysis
- Deployment-ready system
- CIFAR-10 images are resized from 32×32 → 64×64
- Results depend on seed and hardware
- This is a research implementation, not production-grade
Malik Muhammad Mudassir Iqbal
Deep Learning Research Engineer
Computer Vision • PyTorch • CNN Architectures