Learn machine learning by building it from scratch, then applying it to solve real-world problems
This repository provides comprehensive machine learning implementations built from first principles, combined with production-ready examples for real-world deployment.
Learn by Implementation: Every algorithm is built from scratch using minimal dependencies, helping you understand the mathematics and intuition behind ML/DL techniques.
Production-Ready Examples: Bridge the gap between academic understanding and real-world deployment with complete end-to-end pipelines.
Comprehensive Coverage: From classical ML to cutting-edge deep learning, covering supervised/unsupervised learning, NLP, computer vision, and reinforcement learning.
If you are studying ML from first principles, do not start by jumping between random folders. Use the First Principles Study Guide for a verified learning order, concrete commands, and notes on which parts of the repository are the best starting points.
After you finish the core learn/ path, use the Use Cases Guide to move into the cleaned-up applied examples in a sensible order.
# Clone the repository
git clone https://github.com/Asad-Ismail/Real_World_ML.git
cd Real_World_ML
# Install dependencies
pip install -r requirements.txt
# Try a quick example
python learn/Supervised/LogisticRegression/logisticregression.py
The foundational supervised examples now save their figures inside each algorithm folder under results/.
/learn/ - Algorithm Implementations
Supervised Learning /learn/Supervised/
- Decision Trees - From scratch tree building with various splitting criteria
- Ensemble Methods - Gradient boosting, random forests with custom implementations
- Linear Models - Linear/logistic regression with regularization
- Support Vector Machines - SVM implementation with different kernels
- Naive Bayes - Probabilistic classification
- k-NN - Instance-based learning
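
To give a feel for the from-scratch style, here is a minimal sketch of logistic regression with optional L2 regularization, trained by batch gradient descent. It is not the repository's implementation: the function names (`sigmoid`, `fit_logistic_regression`, `predict`) and the hyperparameter defaults are assumptions chosen for illustration, and it uses only the standard library rather than NumPy.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function: avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_regression(X, y, lr=0.1, epochs=500, l2=0.0):
    """Batch gradient descent on the mean log-loss plus (l2 / 2) * ||w||^2.

    X is a list of feature lists, y a list of 0/1 labels.
    All names and defaults here are hypothetical.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The bias is left unregularized, a common convention.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    # Classify as 1 when the predicted probability is at least 0.5.
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
        for xi in X
    ]


# Tiny one-feature dataset: negatives on the left, positives on the right.
X_demo = [[-2.0], [-1.0], [1.0], [2.0]]
y_demo = [0, 0, 1, 1]
w_demo, b_demo = fit_logistic_regression(X_demo, y_demo)
print("weights:", w_demo, "bias:", b_demo)
print("predictions:", predict(X_demo, w_demo, b_demo))
```

Setting `l2` above zero shrinks the weights toward zero at every step, which is one simple way to realize the "with regularization" part of the Linear Models entry.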
Deep Learning
- CNNs - Convolutional neural networks from scratch
- LLMs from Scratch - Transformer architecture, attention mechanisms, BPE tokenization
- Generative Models - GANs, VAEs, diffusion models, NeRF implementations
Unsupervised Learning /learn/Unsupervised/
- PCA - Principal component analysis
- t-SNE - Dimensionality reduction and visualization
- K-Means - Clustering algorithms
- Autoencoders - Neural network-based dimensionality reduction
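
As a rough idea of what a from-scratch clustering implementation involves, the sketch below shows Lloyd's algorithm for K-Means: alternate between assigning points to the nearest centroid and moving each centroid to the mean of its members. This is an illustrative sketch, not the code in this repository; `kmeans`, `squared_distance`, and the random-sample initialization are assumptions.

```python
import random


def squared_distance(a, b):
    # Squared Euclidean distance between two equal-length points.
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, assignments).

    Initial centroids are k distinct points drawn at random (an assumption;
    other schemes such as k-means++ exist).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


# Two well-separated groups on a line.
demo_points = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
demo_centroids, demo_assignments = kmeans(demo_points, k=2)
print("centroids:", demo_centroids)
print("assignments:", demo_assignments)
```

K-Means only finds a local optimum, so results can depend on initialization; running it with several seeds and keeping the lowest within-cluster distance is a common remedy.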
Natural Language Processing /learn/NLP/
- BERT Implementation - Transformer-based language model
- Word2Vec - Skip-gram and CBOW implementations
- Tokenizers - Text preprocessing and tokenization
Reinforcement Learning /learn/Reinforcement_Learning/
- Q-Learning & SARSA - Temporal difference methods
- Policy Gradient - REINFORCE, Actor-Critic, A2C, SAC
- Custom Environments - Grid world and other RL environments
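
For readers new to temporal difference methods, the following is a minimal sketch of tabular Q-learning on a hypothetical five-cell corridor. The environment (`step`), the constants, and the hyperparameters are all assumptions made for illustration and are not taken from the repository's grid world.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the terminal goal
ACTIONS = (-1, +1)  # action index 0 moves left, index 1 moves right


def step(state, action_index):
    # Deterministic move, clipped at the corridor walls.
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def greedy_action(q_row, rng):
    # Break ties at random so the untrained agent does not get stuck.
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def train_q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2,
                     max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Epsilon-greedy behaviour policy.
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = greedy_action(q[state], rng)
            next_state, reward, done = step(state, action)
            # Off-policy TD target: bootstrap from the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train_q_learning()
for s in range(N_STATES - 1):
    print("state", s, "left:", q_table[s][0], "right:", q_table[s][1])
```

SARSA differs only in the target: instead of `max(q[next_state])`, it bootstraps from the Q-value of the action the behaviour policy actually takes next, which makes it on-policy.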
Specialized Topics
- Graph Neural Networks - DGL-based fraud detection pipeline
- Active Learning - Smart data labeling strategies
- Explainable AI - GradCAM, saliency maps, interpretability tools
- Time Series - Forecasting and temporal data analysis
- Probability Theory - Bayesian methods, Kalman filtering, sensor fusion
/Use_Cases/ - Production Examples
Use the Use Cases Guide to see which examples run locally, which ones need Spark/Kafka, and which ones require external API keys.
- Complete Kafka + Spark streaming pipeline
- ML model inference on streaming data
- Scalable architecture for production deployment
- Complete fraud detection pipeline
- Model training, deployment, and monitoring
- Lambda functions for real-time inference
- Distributed image processing with PySpark
- Scalable computer vision workflows
- Comprehensive guide to data-efficient learning
- Transfer learning, semi-supervised, and active learning strategies
- Performance comparisons and best practices
For a more guided beginner sequence, including what to read in the code and what to ignore at first, use the First Principles Study Guide.
- Learning With Less Data → Spark Image Processing → Real-Time Processing
- Graph Neural Networks → Active Learning
- Reinforcement Learning → Explainable AI
Core Dependencies:
- Python 3.7+
- NumPy, Matplotlib, Scikit-learn
- PyTorch (for deep learning examples)
- Additional dependencies listed in requirements.txt
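
Before running the examples, a quick environment check can save time. The sketch below is a suggestion rather than a script shipped with the repository: `CORE_MODULES`, `missing_modules`, and `python_version_ok` are hypothetical names, and the import names (`sklearn` for Scikit-learn, `torch` for PyTorch) are the usual ones for those packages.

```python
import importlib.util
import sys

# Top-level import names assumed for the core dependencies listed above.
CORE_MODULES = ("numpy", "matplotlib", "sklearn", "torch")


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_version_ok(minimum=(3, 7)):
    # Compares the running interpreter against the minimum version.
    return sys.version_info >= minimum


print("Python version OK:", python_version_ok())
print("Missing modules:", missing_modules(CORE_MODULES) or "none")
```

Any module reported as missing can usually be installed with `pip install -r requirements.txt`.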
For Production Examples:
- Apache Kafka (Real-time processing)
- Apache Spark/PySpark (Distributed processing)
- AWS CLI (SageMaker examples)
- Docker (Containerized deployments)
- Educational Focus: Every implementation includes detailed comments explaining the mathematics
- From Scratch Implementation: Minimal external dependencies - understand every line of code
- Comprehensive Testing: Most implementations include test cases and validation examples
- Production Ready: Complete pipelines from data ingestion to model deployment
- Real-World Applications: Tackle fraud detection, image processing, NLP, and time series forecasting
We welcome contributions including:
- Bug fixes and performance improvements
- Enhanced documentation and examples
- New algorithm implementations
- Additional production use cases
Please feel free to open issues and pull requests.
- Detailed Explanations: Check the learning_with_less directory for comprehensive guides
- Research References: Most implementations include links to original papers and theoretical foundations
- Best Practices: Production examples demonstrate industry-standard practices and deployment patterns
For questions, suggestions, or discussions about machine learning concepts, please open an issue in this repository.
