Road to ML: From Zero to Hero


A comprehensive, step-by-step guide to learning Machine Learning from absolute basics to advanced topics

Perfect for beginners • 25 Learning Modules • 23 Real-World Projects • Production-Ready Skills

Get Started • Learning Path • Projects • Contribute


Why This Repository?

  • Zero to Hero Path: Complete learning journey from basics to advanced ML
  • Beginner-Friendly: No prior experience needed - we cover everything
  • Hands-On Learning: 23 practical projects to build your portfolio
  • Production Focus: Learn deployment, MLOps, and real-world skills
  • Well-Organized: Logical progression with clear learning objectives
  • Community-Driven: Open source, contributions welcome!

Perfect for: Students, career switchers, self-learners, and anyone wanting to master ML

Career Paths

Choose your path! This repository prepares you for multiple ML/AI careers. Select your target role to see a customized learning path:

| Role | Focus | Est. Time | Key Modules | Full Guide |
| --- | --- | --- | --- | --- |
| Data Analyst | Insights & Reports | 8-12 months | 00, 01, 19, 20, 21 | View Guide → |
| Data Scientist | Predictive Models | 13-20 months | 00-08, 15, 19-21 | View Guide → |
| ML Engineer | Production ML | 17-26 months | 00-10, 13-14, 19-21 | View Guide → |
| LLM Engineer | Language Models | 17-24 months | 00-01, 05, 09-10, 12-14, 19 | View Guide → |
| GenAI Solution Architect | Production GenAI | 15-21 months | 00-01, 02, 05, 09-10, 12-14, 19 | View Guide → |
| Computer Vision Engineer | Image Processing | 16-25 months | 00-01, 04-05, 09-11, 13-14, 19, 21 | View Guide → |
| AI Engineer | Generalist AI | 25-38 months | 00-15, 19-21, 22-24 | View Guide → |
| Data Engineer | Data Infrastructure | 14-20 months | 00-01, 13-14, 19-20 | View Guide → |
| MLOps Engineer | ML Operations | 16-24 months | 00-01, 05, 09-10, 13-14, 19 | View Guide → |
| Research Scientist | Research & Innovation | 24-34 months | 00-12, 15, 19, 21, 22-24 | View Guide → |
| BI Analyst | Business Intelligence | 10-15 months | 00-01, 19-21 | View Guide → |

Each role-specific guide includes:

  • Detailed module recommendations with priorities
  • Role-specific projects to build
  • Essential resources and tools
  • Skills checklist for each role
  • Time estimates and learning path

View Complete Career Roadmap Guide →

Overview

This repository provides a structured learning path for machine learning, organized in a logical progression from fundamentals to advanced topics. Each module includes:

  • Clear explanations of concepts
  • Hands-on exercises with solutions
  • Practical projects to reinforce learning
  • Additional resources for deeper understanding

Learning Path

Note on Module Numbering: Module numbers (00-24) don't strictly follow the learning path order. The phases below represent the recommended learning sequence. Some modules can be learned in parallel or in different orders based on your goals. See individual module READMEs for prerequisites and suggested learning order.

Time Estimates: Realistic completion time is 14-20 months (full-time, 30-40 hrs/week) or 28-35 months (part-time, 10-15 hrs/week) for comprehensive coverage including all 25 modules and 23 projects. See FAQ section for detailed breakdown.

Learning Path Overview

| Phase | Modules | Focus Area | Est. Time (Full-Time) | Est. Time (Part-Time) |
| --- | --- | --- | --- | --- |
| Phase 0 | 00 | Foundation (Python, Math) | 2-3 months | 4-6 months |
| Phase 1 | 01 | Data Fundamentals | 2-3 months | 4-6 months |
| Phase 2 | 02-05 | ML Basics | 2-3 months | 4-6 months |
| Phase 3 | 06-07 | Advanced ML | 1-2 months | 2-4 months |
| Phase 4 | 08 | Unsupervised Learning | 1 month | 2 months |
| Phase 5 | 09-10 | Deep Learning Fundamentals | 2 months | 4 months |
| Phase 6 | 11-12, 15 | Specialized Deep Learning | 3-4 months | 6-8 months |
| Phase 6.5 | 19-21 | Essential Skills (SQL, Imbalanced Data, Explainability) | 1-2 months | 2-4 months |
| Phase 7 | 13-14 | Production & MLOps | 2-3 months | 4-6 months |
| Phase 8 | 16-18 | Projects (23 total) | 4-6 months | 8-12 months |
| Phase 9 | 22-24 | Advanced Specialized Topics (RL, GNNs, Audio) | 2-3 months | 4-6 months |
| Total | 25 modules | Complete Path | 14-20 months | 28-35 months |
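
As a quick arithmetic check on the table above, the sketch below sums the per-phase full-time ranges as if every phase were studied strictly one after another. The name `PHASES_FULL_TIME` and the strictly sequential assumption are illustrative and not part of the repository. The sequential sum comes out longer than the stated 14-20 month total, which is consistent with the note that some modules can be learned in parallel.

```python
# Full-time estimates in months as (min, max), copied from the table above.
PHASES_FULL_TIME = {
    "Phase 0": (2, 3),
    "Phase 1": (2, 3),
    "Phase 2": (2, 3),
    "Phase 3": (1, 2),
    "Phase 4": (1, 1),
    "Phase 5": (2, 2),
    "Phase 6": (3, 4),
    "Phase 6.5": (1, 2),
    "Phase 7": (2, 3),
    "Phase 8": (4, 6),
    "Phase 9": (2, 3),
}


def sequential_total(phases):
    """Sum the (min, max) ranges, assuming no two phases overlap."""
    low = sum(lo for lo, _ in phases.values())
    high = sum(hi for _, hi in phases.values())
    return low, high


low, high = sequential_total(PHASES_FULL_TIME)
print(f"Strictly sequential: {low}-{high} months")  # 22-32 months
```

Treat the per-phase figures as planning ranges rather than a schedule: overlapping phases, or starting projects early, shortens the overall path.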

Phase 0: Foundation (Prerequisites)

Goal: Build the mathematical and programming foundation needed for ML

  • 00-prerequisites
    • Python Basics (Variables, Data Types, Control Flow, Functions, OOP)
    • Time Complexity & Algorithm Efficiency (Big O notation)
    • Iterators & Generators (Memory-efficient data processing)
    • GUI Development with tkinter
    • Essential Mathematics (Linear Algebra including Tensors, Statistics, Calculus basics)
    • Environment Setup (Python, Jupyter, Virtual Environments)

Phase 1: Data Fundamentals

Goal: Master data manipulation and visualization

  • 01-python-for-data-science
    • NumPy: Arrays, operations, broadcasting
    • Pandas: DataFrames, data manipulation, cleaning
    • Matplotlib & Seaborn: Data visualization
    • Plotly & Dash: Interactive visualizations and web applications
    • Streamlit: Building interactive dashboards and ML applications
    • Flask: Web applications and REST APIs
    • Tableau: Professional data visualization and dashboards
    • EDA, Data Sources, Regular Expressions, Advanced Data Wrangling
    • ETL with AWS RDS: Extract, Transform, Load pipelines
    • Advanced Selenium Web Scraping: Chromedriver, dynamic content, Smartprix example

Phase 2: Machine Learning Basics

Goal: Understand core ML concepts and algorithms

  • 02-introduction-to-ml

    • What is Machine Learning?
    • Types of ML (Supervised, Unsupervised, Reinforcement)
    • ML Workflow and Best Practices
  • 03-supervised-learning-regression

    • Linear Regression
    • Polynomial Regression
    • Regularization (Ridge, Lasso)
    • Evaluation Metrics (MSE, RMSE, MAE, R²)
    • Statistical Regression Analysis (statsmodels: TSS, RSS, ESS, F-statistic, p-values, confidence intervals)
  • 04-supervised-learning-classification

    • Logistic Regression
    • Decision Trees
    • Random Forests
    • Support Vector Machines (SVM)
    • K-Nearest Neighbors (KNN)
    • Naive Bayes (Gaussian, Multinomial, Bernoulli)
    • Multi-Class Classification Strategies (One-vs-Rest, One-vs-One)
    • Evaluation Metrics (Accuracy, Precision, Recall, F1, ROC-AUC)
  • 05-model-evaluation-optimization

    • Train/Validation/Test Split
    • Cross-Validation (K-Fold, Stratified, Leave-One-Out, Time Series Split)
    • Hyperparameter Tuning (Grid Search, Random Search, Bayesian Optimization with Optuna)
    • Bias-Variance Tradeoff
    • Learning Curves
    • Overfitting and Underfitting
    • Model Calibration (Probability Calibration, Platt Scaling, Isotonic Regression)
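
To make the K-Fold idea from module 05 concrete, here is a minimal pure-Python sketch of how sample indices can be split into folds. It is an illustration only, not the module's own code: the helper name `k_fold_indices` is hypothetical, folds are contiguous and unshuffled, and in practice a library implementation such as scikit-learn's `KFold` would normally be used.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds.

    Folds are contiguous and unshuffled. The first n_samples % k folds get
    one extra sample, so every index is used for validation exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


for train, validation in k_fold_indices(n_samples=6, k=3):
    print("train:", train, "validation:", validation)
```

Each sample lands in the validation set exactly once, which is what lets the averaged score use all of the data without ever scoring a model on rows it was trained on.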

Phase 3: Advanced Supervised Learning

Goal: Explore ensemble methods and advanced techniques

  • 06-ensemble-methods

    • Bagging (Bootstrap Aggregating)
    • Boosting (AdaBoost with detailed components, Gradient Boosting, XGBoost with advanced topics, LightGBM with GOSS & EFB, CatBoost)
    • Stacking (detailed steps and best practices)
    • Voting Classifiers
  • 07-feature-engineering

    • Feature Selection
    • Feature Transformation
    • Handling Categorical Variables (One-Hot, Label, Target, Frequency, WOE Encoding)
    • Feature Scaling and Normalization
    • Dimensionality Reduction (PCA)
    • Advanced Discretization (Decision Tree-based binning, custom strategies)
    • Decision Tree Visualization with dtreeviz
    • Comprehensive sklearn Pipeline and ColumnTransformer Guide

Phase 4: Unsupervised Learning

Goal: Learn to work with unlabeled data

  • 08-unsupervised-learning
    • Clustering (K-Means, Hierarchical, DBSCAN)
    • Dimensionality Reduction (PCA, t-SNE, SVD)
    • SVD (Singular Value Decomposition) and its connection to PCA
    • Anomaly Detection
    • Association Rules

Phase 5: Deep Learning Fundamentals

Goal: Introduction to neural networks and deep learning

  • 09-neural-networks-basics

    • Perceptron
    • Multi-Layer Perceptron (MLP)
    • Activation Functions
    • Backpropagation
    • Gradient Descent Variants
  • 10-deep-learning-frameworks

    • TensorFlow/Keras Basics
    • PyTorch Basics
    • Building and Training Neural Networks
    • Model Saving and Loading

Phase 6: Specialized Deep Learning

Goal: Master domain-specific deep learning applications

  • 11-computer-vision

    • Introduction to Computer Vision and Visual Cortex
    • Images and Pixels (RGB, Grayscale, Color Models)
    • Convolution and Edge Detection (Sobel, Canny, Prewitt)
    • Padding, Strides, and Spatial Arrangement
    • Convolutional Neural Networks (CNNs)
    • Pooling Mechanisms (Max, Average)
    • CNN Training Optimization (Batch Normalization, Dropout, Callbacks)
    • CNN Architectures: LeNet, AlexNet, VGGNet, ResNet
    • ImageNet and Large-Scale Recognition
    • Transfer Learning
    • Data Augmentation
    • Object Detection (R-CNN to YOLO)
    • Semantic and Instance Segmentation
    • GANs for Image Generation
    • Diffusion Models and Stable Diffusion (with Hugging Face integration)
    • Variational Autoencoders (VAEs)
    • Recent Breakthroughs (Vision Transformers, CLIP)
  • 12-natural-language-processing

    • Text Preprocessing
    • Word Embeddings (Word2Vec, GloVe)
    • Recurrent Neural Networks (RNNs)
    • Long Short-Term Memory (LSTM)
    • Transformers Basics
    • Fine-tuning Transformers (T5, BERT, GPT with Hugging Face)
    • RAG (Retrieval Augmented Generation)
    • Sentiment Analysis
  • 15-time-series-analysis

    • Time Series Fundamentals (Trend, Seasonality, Stationarity)
    • Statistical Methods (ARIMA, SARIMA, Exponential Smoothing)
    • Deep Learning for Time Series (LSTM, GRU, Transformers)
    • Feature Engineering for Time Series
    • Evaluation and Validation
    • Note: Module 15 is placed here logically but numbered after modules 13-14. It can be learned in parallel with modules 11-12 or after Phase 5.

Phase 9: Advanced Specialized Topics

Goal: Master advanced specialized ML domains

Note: These modules cover advanced topics that build on deep learning fundamentals. Learn these after completing Phase 5 (Deep Learning Fundamentals) and Phase 6 (Specialized Deep Learning).

  • 22-reinforcement-learning

    • Markov Decision Processes (MDPs)
    • Value-Based Methods (Q-Learning, DQN)
    • Policy-Based Methods (REINFORCE, Policy Gradients)
    • Actor-Critic Methods
    • Deep Reinforcement Learning
    • Multi-Agent RL, Hierarchical RL, Imitation Learning
    • Applications: Game Playing, Robotics, Recommendation Systems
  • 23-graph-neural-networks

    • Graph Fundamentals and Representations
    • Message Passing in GNNs
    • Graph Convolutional Networks (GCNs)
    • Graph Attention Networks (GATs)
    • GraphSAGE and Other Architectures
    • Applications: Social Networks, Recommendation Systems, Molecular Analysis
  • 24-audio-speech-processing

    • Audio Signal Fundamentals (Waveforms, Spectrograms, MFCCs)
    • Speech Recognition (ASR) - CTC, Attention-based, Whisper
    • Text-to-Speech (TTS) - Neural TTS, Voice Cloning
    • Audio Classification (Music, Events, Emotions)
    • Music Generation
    • Voice Processing (VAD, Speaker ID, Enhancement)

Phase 6.5: Essential Data Science Skills

Goal: Master critical skills for real-world ML applications

Note: These modules can be learned in parallel with other phases or integrated earlier in your learning journey:

  • SQL can be learned after Phase 1 (Data Fundamentals) for better data access skills

  • Imbalanced Data is most useful after Phase 2 (Classification) when you encounter real-world datasets

  • Model Explainability is valuable after Phase 3 (Advanced ML) when working with complex models

  • 19-sql-database-fundamentals

    • Database Fundamentals and SQL Basics
    • SQL DDL and DML Commands
    • SQL Joins, Subqueries, and Window Functions
    • Advanced SQL Topics (CTEs, Window Functions, Stored Procedures)
    • Data Cleaning with SQL
    • SQL with Python Integration
    • NoSQL Databases (MongoDB, Redis, Cassandra, Neo4j)
    • Suggested Timing: Can be learned after Phase 1 or in parallel with Phase 2
  • 20-handling-imbalanced-data

    • Understanding Imbalanced Data Problems
    • Resampling Techniques (SMOTE, ADASYN, Undersampling, Combined)
    • Algorithm-Level Solutions (Class Weights, Threshold Tuning, Cost-Sensitive Learning)
    • Appropriate Evaluation Metrics (PR-AUC, F1-Score)
    • Complete Workflow Examples
    • Best Practices and Common Pitfalls
    • Suggested Timing: Best learned after Phase 2 (Classification) or Phase 3 (Advanced ML)
  • 21-model-explainability

    • Feature Importance Methods (Tree-based, Permutation)
    • SHAP (SHapley Additive exPlanations) - Tree, Kernel, and Advanced
    • LIME (Local Interpretable Model-agnostic Explanations)
    • Partial Dependence Plots (PDP) and ICE Plots
    • Model Interpretation Best Practices
    • Regulatory Compliance and Ethical AI
    • Suggested Timing: Best learned after Phase 3 (Advanced ML) or Phase 5 (Deep Learning)
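
Module 20 lists PR-AUC and F1-Score as appropriate metrics for imbalanced data. The sketch below shows why accuracy alone can mislead in that setting. The labels are made up for illustration, and `accuracy` and `precision_recall_f1` are hypothetical helpers, not code from the module.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive class (0.0 when undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up data: 95 negatives, 5 positives, and a "model" that always predicts 0.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

print("accuracy:", accuracy(y_true, y_pred))                          # 0.95
print("precision, recall, F1:", precision_recall_f1(y_true, y_pred))  # (0.0, 0.0, 0.0)
```

The always-negative predictor scores 95% accuracy while finding none of the positive cases, which is the failure mode that the recall and F1 figures expose.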

Phase 7: Production & Deployment

Goal: Learn to deploy ML models in production

Note: Modules 13-14 are numbered before modules 15-18 but logically come after specialized deep learning topics. Learn these after you have built and trained models.

  • 13-model-deployment

    • Model Serialization
    • REST APIs with Flask/FastAPI
    • Docker for ML
    • Cloud Deployment (AWS, GCP, Azure)
    • Production Server Setup (NGINX, SSL/TLS, Domain Configuration)
    • Security Best Practices (Rate Limiting, Authentication, Input Validation)
    • AWS SageMaker Comprehensive Guide
    • A/B Testing (Statistical Significance, Multi-Armed Bandits, Sequential Testing)
    • Model Monitoring
  • 14-mlops-basics

    • Version Control for ML (DVC, MLflow)
    • CI/CD for ML
    • Experiment Tracking
    • Model Registry
    • Cookiecutter for Data Science
    • Apache Kafka for Data Streaming
    • Apache Spark for Big Data Processing

Phase 8: Projects

Goal: Apply knowledge through real-world projects

Note: Projects are numbered 16-18 but should be worked on throughout your learning journey. Start beginner projects after Phase 2, intermediate after Phase 3-4, and advanced after Phase 6-7.

Projects Summary

| Category | Count | Prerequisites | Est. Time | Status |
| --- | --- | --- | --- | --- |
| Beginner | 6 | Phases 0-2 | 2-3 weeks | All Available |
| Intermediate | 8 | Phases 0-4 | 4-6 weeks | All Available |
| Advanced | 9 | Phases 0-7 | 8-12 weeks | All Available |
| Total | 23 | - | 4-6 months | 100% Complete |

Beginner Projects (6 projects)

| # | Project Name | Skills | Time | Status |
| --- | --- | --- | --- | --- |
| 1 | House Price Prediction | Regression, Feature Engineering, EDA | 2-3 days | ✓ Available |
| 2 | Iris Flower Classification | Classification, EDA, Multiple Algorithms | 1 day | ✓ Available |
| 3 | Titanic Survival Prediction | Classification, Data Cleaning, Feature Engineering | 2-3 days | ✓ Available |
| 4 | Spam Email Detection | Text Classification, NLP Basics | 2-3 days | ✓ Available |
| 5 | Wine Quality Prediction | Regression, Feature Engineering | 2-3 days | ✓ Available |
| 6 | Customer Data Dashboard with Streamlit | Data Visualization, Streamlit | 3-5 days | ✓ Available |

Prerequisites: Complete Phases 0-2 before starting

Intermediate Projects (8 projects)

| # | Project Name | Skills | Time | Status |
| --- | --- | --- | --- | --- |
| 1 | Handwritten Digit Recognition (MNIST) | Neural Networks, Image Processing | 3-5 days | ✓ Available |
| 2 | Customer Churn Prediction | Classification, Imbalanced Data, Business Metrics | 4-5 days | ✓ Available |
| 3 | Movie Recommendation System | Collaborative Filtering, Content-Based | 5-7 days | ✓ Available |
| 4 | Credit Card Fraud Detection | Anomaly Detection, Imbalanced Data | 4-5 days | ✓ Available |
| 5 | Customer Segmentation | Clustering, Unsupervised Learning | 3-4 days | ✓ Available |
| 6 | Time Series Forecasting | Time Series Analysis, ARIMA, LSTM | 5-7 days | ✓ Available |
| 7 | Feature Engineering Mastery | Feature Engineering, Advanced Techniques | 4-5 days | ✓ Available |
| 8 | Ensemble Methods Comparison | Ensemble Methods, Model Comparison | 3-4 days | ✓ Available |

Prerequisites: Complete Phases 0-4 and some Phase 6.5 topics recommended

Advanced Projects (9 projects)

| # | Project Name | Skills | Time | Status |
| --- | --- | --- | --- | --- |
| 1 | Image Classification (CIFAR-10) | CNNs, Transfer Learning, Data Augmentation | 1-2 weeks | ✓ Available |
| 2 | Sentiment Analysis on Reviews | NLP, RNNs/LSTMs, Transformers | 1-2 weeks | ✓ Available |
| 3 | Time Series Forecasting (Advanced) | Advanced Time Series, Deep Learning | 1-2 weeks | ✓ Available |
| 4 | Chatbot Development | NLP, Sequence-to-Sequence, Transformers | 1-2 weeks | ✓ Available |
| 5 | Object Detection | Computer Vision, YOLO, R-CNN | 1-2 weeks | ✓ Available |
| 6 | End-to-End ML Pipeline | Full ML Pipeline, MLOps | 2-3 weeks | ✓ Available |
| 7 | Generative Model (GAN/VAE) | GANs, VAEs, Image Generation | 1-2 weeks | ✓ Available |
| 8 | Model Explainability & Interpretability | SHAP, LIME, Model Interpretation | 1-2 weeks | ✓ Available |
| 9 | Model Deployment & Serving | Model Deployment, APIs, Cloud | 1-2 weeks | ✓ Available |

Prerequisites: Complete Phases 0-7 recommended for full benefit

Prerequisites

Before starting, you should have:

  • Basic computer literacy
  • Willingness to learn and practice
  • A computer with internet connection

Note: No prior programming or math experience required! We'll cover everything you need.

Getting Started

1. Clone the Repository

git clone https://github.com/NabidAlam/road-to-machine-learning.git
cd road-to-machine-learning

2. Set Up Your Environment

Option A: Using Anaconda (Recommended for Beginners)

# Install Anaconda from https://www.anaconda.com/products/individual

# Create a new environment
conda create -n ml-env python=3.10
conda activate ml-env

# Install required packages
pip install -r requirements.txt

Option B: Using Python venv

# Create virtual environment
python -m venv ml-env

# Activate (Windows)
ml-env\Scripts\activate

# Activate (Mac/Linux)
source ml-env/bin/activate

# Install required packages
pip install -r requirements.txt

3. Install Jupyter Notebook

pip install jupyter
jupyter notebook
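
Before opening the first notebook, it can help to confirm that the environment is usable. The sketch below reports the Python version and whether a few packages can be found. The package list is an assumption based on the libraries named in Phase 1 (the contents of requirements.txt are not shown here), and `check_packages` is a hypothetical helper.

```python
import importlib.util
import sys

# Assumed package list; adjust it to match requirements.txt.
PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn"]


def check_packages(names):
    """Return {import name: True/False} depending on whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


print("Python", sys.version.split()[0])
for name, found in check_packages(PACKAGES).items():
    print(f"{name:12s} {'OK' if found else 'MISSING'}")
```

Run it with the `ml-env` environment activated. A `MISSING` line usually means the environment was not active when `pip install -r requirements.txt` was run.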

4. Start Learning!

Quick Start Option: Want to see ML in action immediately? Check out GETTING_STARTED.md for a 30-minute first project!

Full Learning Path: Follow the phases in the recommended order (module numbers do not always match that order):

  1. Start with 00-prerequisites
  2. Progress through the phases in the Learning Path sequence
  3. Complete exercises and projects
  4. Practice, practice, practice!

Repository Structure

Note: All learning modules now include comprehensive detailed guides with code examples, exercises, and solutions. Beginner projects are fully available with READMEs and code. Intermediate and advanced projects have detailed READMEs with instructions.

road-to-machine-learning/

 00-prerequisites/
    01-python-basics.md (includes time complexity, iterators/generators)
    02-linear-algebra.md
    03-statistics-probability.md
    04-calculus.md
    05-environment-setup.md
    README.md

 01-python-for-data-science/
    01-numpy.md
    02-pandas.md
    03-visualization.md (includes Plotly & Dash)
    04-exploratory-data-analysis.md
    05-data-sources-and-integration.md (includes ETL with AWS RDS, Advanced Selenium)
    06-regular-expressions-text-processing.md (includes advanced regex: lookahead, lookbehind, back references)
    07-advanced-data-wrangling.md
    08-working-with-dates-times.md
    09-streamlit-dashboards.md
    10-flask-web-development.md
    11-tableau-visualization.md
    README.md

 02-introduction-to-ml/
    introduction-to-ml.md
    ml-terminology.md
    problem-identification-algorithm-selection.md
    first-ml-project-tutorial.md
    common-pitfalls-best-practices.md
    README.md

 03-supervised-learning-regression/
    regression.md
    regression-advanced-topics.md (includes statsmodels)
    regression-project-tutorial.md
    regression-quick-reference.md
    README.md

 04-supervised-learning-classification/
    classification.md (includes dtreeviz visualization)
    classification-advanced-topics.md
    classification-project-tutorial.md
    classification-quick-reference.md
    README.md

 05-model-evaluation-optimization/
    evaluation-optimization.md
    evaluation-optimization-advanced-topics.md
    evaluation-optimization-project-tutorial.md
    evaluation-optimization-quick-reference.md
    README.md

 06-ensemble-methods/
    ensemble-methods.md
    ensemble-methods-advanced-topics.md
    ensemble-methods-project-tutorial.md
    ensemble-methods-quick-reference.md
    README.md

 07-feature-engineering/
    feature-engineering.md (includes WOE encoding, advanced discretization)
    feature-engineering-advanced-topics.md (includes sklearn Deep Dive: Estimators, Mixins, Composite Transformers, FeatureUnion)
    feature-engineering-project-tutorial.md
    feature-engineering-quick-reference.md
    README.md

 08-unsupervised-learning/
    unsupervised-learning.md
    unsupervised-learning-advanced-topics.md (includes SVD - Singular Value Decomposition with PCA connection)
    unsupervised-learning-project-tutorial.md
    unsupervised-learning-quick-reference.md
    README.md

 09-neural-networks-basics/
    neural-networks.md
    neural-networks-advanced-topics.md
    neural-networks-project-tutorial.md
    neural-networks-quick-reference.md
    README.md

 10-deep-learning-frameworks/
    deep-learning-frameworks.md
    deep-learning-frameworks-advanced-topics.md
    deep-learning-frameworks-project-tutorial.md
    deep-learning-frameworks-quick-reference.md
    README.md

 11-computer-vision/
    computer-vision.md
    computer-vision-advanced-topics.md
    computer-vision-project-tutorial.md
    computer-vision-quick-reference.md
    README.md

 12-natural-language-processing/
    nlp.md
    nlp-advanced-topics.md
    nlp-project-tutorial.md
    nlp-quick-reference.md
    README.md

 13-model-deployment/
    deployment.md (includes FastAPI advanced features: type checking, dependency injection, background tasks)
    deployment-advanced-topics.md (includes AWS SageMaker comprehensive guide)
    deployment-project-tutorial.md
    deployment-quick-reference.md
    README.md

 14-mlops-basics/
    mlops.md (includes Cookiecutter for Data Science)
    mlops-advanced-topics.md (includes Apache Kafka, Apache Spark, Feature Stores)
    mlops-project-tutorial.md
    mlops-quick-reference.md
    README.md

 15-time-series-analysis/
    time-series-analysis.md
    time-series-analysis-advanced-topics.md
    time-series-project-tutorial.md
    time-series-quick-reference.md
    README.md

 22-reinforcement-learning/
    reinforcement-learning.md
    reinforcement-learning-advanced-topics.md
    reinforcement-learning-project-tutorial.md
    reinforcement-learning-quick-reference.md
    README.md

 23-graph-neural-networks/
    graph-neural-networks.md
    graph-neural-networks-advanced-topics.md
    graph-neural-networks-project-tutorial.md
    graph-neural-networks-quick-reference.md
    README.md

 24-audio-speech-processing/
    audio-speech-processing.md
    audio-speech-processing-advanced-topics.md
    audio-speech-processing-project-tutorial.md
    audio-speech-processing-quick-reference.md
    README.md

 16-projects-beginner/
    [6 project directories with code and READMEs]
    README.md

 17-projects-intermediate/
    [8 project directories with READMEs]
    README.md

 18-projects-advanced/
    [9 project directories with READMEs]
    README.md

 19-sql-database-fundamentals/
    sql-database.md
    sql-database-advanced-topics.md (includes NoSQL: MongoDB, Redis, Cassandra, Neo4j)
    sql-database-project-tutorial.md
    sql-database-quick-reference.md
    README.md

 20-handling-imbalanced-data/
    imbalanced-data.md
    imbalanced-data-advanced-topics.md
    imbalanced-data-project-tutorial.md
    imbalanced-data-quick-reference.md
    README.md

 21-model-explainability/
    model-explainability.md
    model-explainability-advanced-topics.md
    model-explainability-project-tutorial.md
    model-explainability-quick-reference.md
    README.md

 resources/
    data_science_cheatsheet.md
    prerequisites_cheatsheet.md
    introduction_to_ml_cheatsheet.md
    mlops_cheatsheet.md
    model_deployment_cheatsheet.md
    imbalanced_data_cheatsheet.md
    model_explainability_cheatsheet.md
    git_guide.md
    math_formulas.md
    common_errors.md
    ethics_in_ml.md
    model_interpretability.md
    ml_glossary.md
    reinforcement_learning.md
    recommender_systems.md
    career_roadmap_guide.md
    career_portfolio.md
    interview_prep.md
    open_source_contribution.md
    kaggle_competitions.md
    docker_tutorial.md
    ml_model_testing.md
    stakeholder_communication.md
    automl_basics.md
    data_validation.md
    web_scraping_guide.md
    powerbi_guide.md
    excel_data_analysis_guide.md
    mlflow_comprehensive_guide.md
    transformer_fine_tuning_guide.md
    langchain_guide.md
    llamaindex_guide.md
    ai_agents_guide.md
    dsa_for_ml_guide.md
    causal_inference_guide.md
    books.md
    courses.md
    datasets.md
    tools.md
    youtube_channels.md
    blogs_websites.md
    practice_platforms.md

 requirements.txt
 LICENSE
 README.md
 CONTRIBUTING.md
 DISCLAIMER.md
 GETTING_STARTED.md
 LEARNING_ROADMAP.md
 QUICK_START.md

Resources

Books

  • Hands-On Machine Learning by Aurélien Géron
  • Pattern Recognition and Machine Learning by Christopher Bishop
  • Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
  • Introduction to Statistical Learning by James et al.

Online Courses

  • Machine Learning by Andrew Ng (Coursera)
  • Deep Learning Specialization by Andrew Ng (Coursera)
  • Fast.ai Practical Deep Learning
  • CS229: Machine Learning (Stanford)

Datasets

Communities

Cheatsheets & Guides

Core ML & Data Science Guides

| Guide | Description |
| --- | --- |
| Data Science & ML Cheatsheet | Quick reference for daily work (NumPy, Pandas, PyTorch, TensorFlow, OpenCV, FastAPI, and more) |
| Prerequisites Cheatsheet | Quick reference for Python, Mathematics, and Statistics fundamentals needed for ML |
| Introduction to ML Cheatsheet | Quick reference for ML fundamentals, types, workflow, and key concepts |
| Math Formulas Reference | Essential mathematical formulas for ML (Statistics, Linear Algebra, Calculus, ML metrics) |
| ML Glossary | Comprehensive glossary of ML terms and concepts for beginners |
| Common Errors & Debugging Guide | Troubleshooting guide for common ML errors and debugging strategies |
| DSA for ML Guide | Essential data structures and algorithms for machine learning |

Development & Tools

| Guide | Description |
| --- | --- |
| Complete Git & GitHub Guide | Comprehensive Git tutorial with commands, outputs, practice exercises, and solutions |
| Docker Complete Tutorial | Comprehensive Docker guide for ML: containerization, Dockerfile, docker-compose, best practices, and deployment |
| Web Scraping Guide | Complete web scraping guide from basics to advanced: Requests, Beautiful Soup, Selenium, Scrapy, CAPTCHA handling |

Advanced ML Topics

| Guide | Description |
| --- | --- |
| MLflow Comprehensive Guide | Complete MLflow guide: experiment tracking, model registry, hyperparameter tuning, MLflow UI, Docker deployment |
| MLOps Cheatsheet | Quick reference for MLOps tools, practices, and workflows (DVC, MLflow, CI/CD, monitoring) |
| Model Deployment Cheatsheet | Quick reference for deploying ML models (APIs, Docker, cloud platforms, A/B testing) |
| Imbalanced Data Cheatsheet | Quick reference for handling imbalanced datasets (resampling, class weights, metrics) |
| Model Explainability Cheatsheet | Quick reference for explaining ML models (SHAP, LIME, feature importance, PDP) |
| Transformer Fine-Tuning Guide | Comprehensive guide to fine-tuning transformers (T5, BERT, GPT) with Hugging Face |
| Model Interpretability Guide | Understanding and explaining ML model predictions (SHAP, LIME, feature importance) |
| Reinforcement Learning Basics | Introduction to RL, key concepts, algorithms, and applications |
| Recommender Systems Guide | Building recommendation systems (collaborative filtering, content-based, hybrid approaches) |
| AutoML Basics Guide | Introduction to Automated Machine Learning: when to use, popular tools, and integration strategies |
| Data Validation Guide | Comprehensive data validation: schema validation, quality checks, drift detection, and automated pipelines |
| Causal Inference Guide | Comprehensive guide to causal inference: potential outcomes, confounding, RCTs, observational methods (propensity scores, DiD, IV, RDD), causal ML, and tools (DoWhy, EconML) |

Generative AI & Modern Tools

| Guide | Description |
| --- | --- |
| RAG Comprehensive Guide | Complete guide to Retrieval Augmented Generation: architecture, components, vector databases, advanced techniques, evaluation, and production deployment |
| LangChain Guide | Complete LangChain guide: building GenAI projects, chains, agents, memory, RAG, document loaders, and vector stores |
| LlamaIndex Guide | Comprehensive LlamaIndex guide: data indexing, querying, retrieval, chat engines, and advanced generative AI projects |
| AI Agents Guide | Complete guide to AI agents: CrewAI, AutoGen, LangGraph, AutoGPT, MCP (Model Context Protocol), and A2A (Agent-to-Agent) communication |
| GenAI Production Deployment Guide | Comprehensive guide to deploying GenAI at scale: RAG architectures, multi-agent systems, hyperscaler deployment (AWS, GCP, Azure), scaling strategies, monitoring, and cost optimization |
| Generative AI Comprehensive Guide | Complete overview of Generative AI: LLMs, LangChain, RAG, AI Agents, Vector Databases, Multi-Agent Systems, and building production-ready GenAI applications |

System Design & Architecture

| Guide | Description |
| --- | --- |
| ML System Design Guide | Comprehensive guide to designing scalable ML systems: requests/responses, latency, throughput, load balancing, caching, vertical/horizontal scaling, databases, replication, sharding, message queues, stateless/stateful architecture, high availability, and monitoring |

Business & Communication

| Guide | Description |
| --- | --- |
| Stakeholder Communication Guide | Effective communication of ML concepts, results, and business value to non-technical stakeholders |
| ML Model Testing Guide | Comprehensive guide to testing ML models, pipelines, and APIs (unit tests, integration tests, best practices) |
| Ethics in ML Guide | Comprehensive guide to bias, fairness, responsible AI, and ethical ML practices |
| Agile Data Science Guide | Applying Agile methodologies (Scrum, Kanban) to data science projects: sprint planning, standups, retrospectives |
| Data Products Guide | Building production-ready data products: APIs, dashboards, recommendation systems, real-time analytics |
| Enterprise Data Tools Guide | Enterprise data platforms: Snowflake, Informatica, Talend, Cloudera, Stibo, Qlik, Tableau integration |
| Java for Data Science Guide | Using Java in data science: ML libraries (Weka, Deeplearning4j), Spark integration, enterprise systems |

Data Analysis Tools

| Guide | Description |
| --- | --- |
| Power BI Guide | Complete Power BI guide: visualizations, DAX, Power Query, data modeling, dashboards, and AI integration |
| Excel Data Analysis Guide | Comprehensive Excel guide: functions, pivot tables, charts, dashboards, Power Query, and advanced techniques |

Career & Interview Resources

  • Career Roadmap Guide - Role-specific learning paths: Data Analyst, Data Scientist, ML Engineer, LLM Engineer, Computer Vision Engineer, AI Engineer, Data Engineer, MLOps Engineer, Research Scientist, BI Analyst
  • Career & Portfolio Guide - Build your portfolio, prepare for interviews, and launch your ML career
  • Interview Preparation Guide - ML interview questions, coding challenges, system design, and preparation strategies
  • Open Source Contribution Guide - How to contribute to open source projects in data science and ML
  • Kaggle Competitions Guide - Complete guide to participating in Kaggle competitions, strategy, and collaboration

Additional Resource Files

  • Books - Recommended books for ML and data science
  • Courses - Online courses and learning platforms
  • Datasets - Curated list of datasets for practice
  • Tools - Essential tools and libraries for ML
  • YouTube Channels - Comprehensive list of free ML YouTube channels
  • Blogs & Websites - Recommended blogs, websites, and online resources
  • Practice Platforms - Platforms to practice ML, coding, and data science

Learning Tips

  1. Follow the Order: Modules are designed to build on each other. Don't skip ahead!
  2. Practice Regularly: Code along with examples and complete all exercises
  3. Build Projects: Apply what you learn by building projects
  4. Join Communities: Engage with others learning ML
  5. Be Patient: ML is complex - take your time to understand concepts
  6. Experiment: Don't just copy code - experiment and break things!

Common Questions & Learning Guide

Note: These are common questions that learners typically have when starting their ML journey. They're based on typical learning patterns and common concerns, not necessarily from actual user submissions.

Getting Started

Q: I'm a complete beginner. Where do I start?
A: Start with 00-prerequisites - specifically 01-python-basics.md. No prior experience needed! Follow the modules in order.

Q: How long will it take to complete this roadmap?
A: Realistic time estimates vary significantly based on your background and time commitment:

Time Estimates by Commitment Level

| Commitment Level | Hours/Week | Minimum | Standard | Comprehensive |
| --- | --- | --- | --- | --- |
| Full-Time | 30-40 hrs | 8-10 months | 12-15 months | 15-18 months |
| Part-Time | 10-15 hrs | 18-24 months | 24-30 months | 30-36 months |

Definitions:

  • Minimum: Core concepts, skips some advanced topics
  • Standard: Complete coverage with all modules and projects
  • Comprehensive: Deep understanding, all projects, portfolio building

Phase-by-Phase Breakdown

| Phase | Focus | Full-Time | Part-Time |
| --- | --- | --- | --- |
| Prerequisites | Python, Math | 2-3 months | 4-6 months |
| Data Fundamentals | NumPy, Pandas, Visualization | 2-3 months | 4-6 months |
| ML Basics | Regression, Classification, Evaluation | 2-3 months | 4-6 months |
| Advanced ML | Ensembles, Feature Engineering | 1-2 months | 2-4 months |
| Deep Learning Fundamentals | Neural Networks, Frameworks | 2 months | 4 months |
| Specialized Deep Learning | Computer Vision, NLP, Time Series | 3-4 months | 6-8 months |
| Essential Skills | SQL, Imbalanced Data, Explainability | 1-2 months | 2-4 months |
| Production & MLOps | Deployment, MLOps | 2-3 months | 4-6 months |
| Projects (23 total) | Hands-on Practice | 4-6 months | 8-12 months |
| Total | Complete Path | 12-18 months | 24-30 months |

Factors Affecting Timeline

| Factor | Impact on Timeline |
| --- | --- |
| Prior programming experience | -2 to -4 months |
| Prior math background | -1 to -2 months |
| Number of projects completed | +2 to +6 months |
| Practice vs. reading ratio | Practice takes longer but is essential |
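
As a rough illustration of how the adjustments above might combine with a baseline estimate, here is a minimal sketch. It assumes the adjustments are simply additive, which the table does not state. The function name `adjust_timeline` and the choice of the Full-Time Standard range (12-15 months) as the baseline are illustrative only.

```python
from typing import Iterable, Tuple

Range = Tuple[int, int]  # (low, high) in months


def adjust_timeline(baseline: Range, adjustments: Iterable[Range]) -> Range:
    """Shift a (low, high) month range by a series of two-sided adjustments.

    Assumption: adjustments add up independently. Negative values shorten
    the timeline, positive values lengthen it.
    """
    low, high = baseline
    for change_a, change_b in adjustments:
        # The most favourable change moves the optimistic bound,
        # the least favourable change moves the pessimistic bound.
        low += min(change_a, change_b)
        high += max(change_a, change_b)
    return max(low, 0), max(high, 0)


# Full-Time "Standard" estimate: 12-15 months.
baseline = (12, 15)
# Prior programming experience (-2 to -4) and prior math background (-1 to -2).
adjusted = adjust_timeline(baseline, [(-2, -4), (-1, -2)])
print(f"Adjusted estimate: {adjusted[0]}-{adjusted[1]} months")  # Adjusted estimate: 6-12 months
```

Treat the result as a planning aid rather than a prediction; the ranges in the tables are already approximate.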

Q: Do I need a powerful computer?
A: No! Most modules work fine on a regular laptop. Deep learning modules (11-12) benefit from GPUs but can be done on cloud platforms (Google Colab, Kaggle) for free.

Q: Should I learn X before Y?
A: Generally, follow the module order. However:

  • SQL (module 19) can be learned after Phase 1 (Data Fundamentals)
  • Imbalanced Data (module 20) is best after Classification (module 4)
  • Model Explainability (module 21) is best after Advanced ML (Phase 3)

Learning Path

Q: Can I skip modules?
A: We recommend following the order, but you can:

  • Skip the advanced-topics files if you're pressed for time, and come back to them later
  • Learn SQL earlier if you need it for data access
  • Jump to projects relevant to your goals

Q: I'm a software engineer. Can I skip Python basics?
A: Skim 00-prerequisites/01-python-basics.md rather than skipping it - it covers Python concepts that come up often in ML code (time complexity, iterators, generators) and that may be new to you.
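
To give a feel for why generators matter in ML code, here is a small sketch of a generator that yields a dataset in mini-batches instead of loading everything at once. It is an illustration only and is not taken from 01-python-basics.md; the function name `batches` is hypothetical.

```python
from typing import Iterable, Iterator, List


def batches(items: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive lists of at most batch_size items, one batch at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[int] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


print(list(batches(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```

Because the function is a generator, only one batch is held in memory at a time, which is the same idea data loaders in ML frameworks rely on.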

Q: I'm a statistician. Can I skip the math modules?
A: Review them quickly - they focus on how ML applies math concepts that you may have learned from a different angle.

Q: What if I get stuck on a concept?
A:

  1. Re-read the explanation
  2. Check the "Additional Resources" section in module READMEs
  3. Look for related topics in other modules
  4. Practice with code examples
  5. Join communities (see Resources section)

Tools and Libraries

Q: Which IDE should I use?
A: Any IDE works! Popular choices:

  • Jupyter Notebooks: Great for learning and experimentation
  • VS Code: Excellent for larger projects, good ML extensions
  • PyCharm: Full-featured Python IDE
  • Google Colab: Free cloud-based notebooks

Q: Do I need to install everything at once?
A: No! Install packages as you need them. Each module lists required packages. Start with requirements.txt for core packages.

Q: Python 3.8, 3.9, 3.10, or 3.11?
A: Python 3.9 or 3.10 is recommended. Most libraries support these versions well. Avoid 3.12+ initially, as some packages may not be fully compatible yet.
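
If you are unsure which interpreter you are running, a quick check like the following can help. This is a minimal sketch: the bounds simply encode the 3.9-3.10 recommendation from the answer, and the helper name `check_python_version` is hypothetical.

```python
import sys

# Recommended range from the answer: Python 3.9 or 3.10.
RECOMMENDED_MIN = (3, 9)
RECOMMENDED_MAX = (3, 10)


def check_python_version(version=None):
    """Describe how a (major, minor) version compares to the recommended range.

    Defaults to the version of the interpreter running this script.
    """
    if version is None:
        version = sys.version_info[:2]
    major_minor = tuple(version)
    if major_minor < RECOMMENDED_MIN:
        return "older than recommended"
    if major_minor > RECOMMENDED_MAX:
        return "newer than recommended"
    return "recommended"


print(sys.version_info[:2], "->", check_python_version())
```

A "newer than recommended" result does not mean things will break; it only signals that you should check package compatibility before installing.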

Projects

Q: Should I do all projects?
A: Do at least:

  • 2-3 beginner projects (after Phase 2)
  • 3-4 intermediate projects (after Phase 4)
  • 2-3 advanced projects (after Phase 7)

Focus on projects relevant to your career goals.

Q: Can I use my own datasets?
A: Absolutely! Using your own data makes projects more meaningful. Just ensure the dataset is appropriate for the project type.

Q: How long should each project take?
A: Recommended project completion by level:

| Level | Projects to Complete | Time per Project | Total Time |
| --- | --- | --- | --- |
| Beginner | 3-4 projects | 1-3 days | 1-2 weeks |
| Intermediate | 4-5 projects | 3-7 days | 3-5 weeks |
| Advanced | 2-3 projects | 1-2 weeks | 2-6 weeks |
| Total Minimum | 9-12 projects | - | 6-13 weeks |
| Recommended | 15-18 projects | - | 10-20 weeks |
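
The "Total Minimum" row is the sum of the three level rows. The short sketch below reproduces that arithmetic from the per-level ranges; the variable names are illustrative.

```python
# (level, (min_projects, max_projects), (min_weeks, max_weeks)) from the per-level rows.
levels = [
    ("Beginner", (3, 4), (1, 2)),
    ("Intermediate", (4, 5), (3, 5)),
    ("Advanced", (2, 3), (2, 6)),
]

# Sum the lower bounds and the upper bounds separately.
total_projects = (
    sum(projects[0] for _, projects, _ in levels),
    sum(projects[1] for _, projects, _ in levels),
)
total_weeks = (
    sum(weeks[0] for _, _, weeks in levels),
    sum(weeks[1] for _, _, weeks in levels),
)

print(f"Projects: {total_projects[0]}-{total_projects[1]}")  # Projects: 9-12
print(f"Weeks: {total_weeks[0]}-{total_weeks[1]}")  # Weeks: 6-13
```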

Career and Job Market

Q: What jobs can I get after completing this?
A: Different roles require different module focuses:

| Role | Key Modules | Focus Areas | Est. Time |
| --- | --- | --- | --- |
| Data Scientist | 0-8, 15, 19-21 | Data analysis, modeling, SQL, explainability | 10-14 months |
| ML Engineer | 0-14 | Full stack: modeling to deployment, MLOps | 12-18 months |
| Research Scientist | 0-12, advanced topics | Deep learning, research, publications | 15-20 months |
| Business Analyst | 0-7, 19 | Data analysis, SQL, business context | 8-12 months |
| Data Engineer | 0-1, 13-14, 19 | Data pipelines, infrastructure, SQL | 10-14 months |

Q: Do I need a degree?
A: Not necessarily! Many successful ML practitioners are self-taught. However, a degree can help with:

  • Getting past HR filters
  • Research positions
  • Certain companies' requirements

Technical Questions

Q: Should I learn TensorFlow or PyTorch?
A: Both! Start with TensorFlow/Keras (easier for beginners), then learn PyTorch. Many jobs use both. Module 10 covers both.

Q: Do I need to know deep learning for most ML jobs?
A: Not always! Many roles focus on traditional ML (modules 3-8). However, deep learning (modules 9-12) is increasingly important.

Q: How important is MLOps?
A: Very important for production ML! Module 14 covers MLOps basics. Essential for ML Engineer roles, valuable for Data Scientists too.

Q: Should I learn SQL?
A: Yes! Most data science roles require SQL. Module 19 covers it comprehensively. Learn it early if you need data access skills.

Common Concerns

Q: I'm overwhelmed. What should I do?
A:

  1. Take a break
  2. Focus on one module at a time
  3. Don't try to master everything immediately
  4. Practice regularly (even 30 min/day helps)
  5. Join study groups or communities

Q: I don't understand the math. Should I continue?
A:

  1. Review the math modules (00-prerequisites) with visual resources (3Blue1Brown videos)
  2. Focus on intuition over proofs initially
  3. Use code to understand concepts
  4. Math becomes clearer as you apply it

Q: My code doesn't work. What should I do?
A:

  1. Read error messages carefully
  2. Check resources/common_errors.md
  3. Verify you're using correct library versions
  4. Search Stack Overflow with the error message
  5. Check module "Common Issues" sections
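
For step 3, one way to see which library versions are installed is the standard-library `importlib.metadata` module. The sketch below is illustrative: the helper name `installed_versions` is hypothetical, and the package names are examples rather than the contents of this repository's requirements.txt.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Example package names; replace them with the ones your module lists.
for name, ver in installed_versions(["numpy", "pandas", "scikit-learn"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

Compare the printed versions with the ones the module or requirements.txt expects before digging further into the error.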

Q: How do I know if I'm ready for the next module?
A: You're ready when you can:

  • Explain the main concepts
  • Implement basic examples without copying
  • Complete the exercises (even with some help)
  • Understand most of the code examples

Contributing

Q: How can I contribute?
A: See CONTRIBUTING.md. We welcome:

  • Fixing typos and errors
  • Adding examples
  • Improving explanations
  • Adding projects
  • Translating content

Q: Can I use this content for my course/tutorial?
A: Yes! This is open source (MIT License). Please credit the repository and contributors.


Have more questions? Open an issue on GitHub or check the module-specific README files for detailed information!

Contributing

We welcome contributions! This repository is for the community, by the community. Here's how you can help:

Ways to Contribute

  • Add Projects: Share your ML projects with the community
  • Improve Documentation: Fix typos, clarify explanations, add examples
  • Create Exercises: Add practice problems and solutions
  • Report Issues: Found a bug? Let us know!
  • Suggest Features: Have an idea? Open an issue!

Quick Start for Contributors

  1. Fork the repository
  2. Clone your fork: git clone https://github.com/YOUR_USERNAME/road-to-machine-learning.git
  3. Create a branch: git checkout -b feature/amazing-feature
  4. Make your changes
  5. Commit: git commit -m 'Add amazing feature'
  6. Push: git push origin feature/amazing-feature
  7. Open a Pull Request

See CONTRIBUTING.md for detailed guidelines.

Your contributions make this resource better for everyone!

License

This project is licensed under the MIT License - see the LICENSE file for details.

Show Your Support

If you find this repository helpful, please consider:

  • Star this repo - It helps others discover this resource
  • Fork it - Create your own learning path
  • Share it - Help others on their ML journey
  • Contribute - Add projects, fix issues, improve content

Every star and fork helps the community grow!

What You'll Achieve

By completing this roadmap, you'll be able to:

  • Build and train ML models from scratch
  • Deploy models to production
  • Work with real-world datasets
  • Understand deep learning concepts
  • Create computer vision and NLP applications
  • Implement MLOps best practices
  • Build a portfolio of ML projects

Statistics

| Metric | Count | Details |
| --- | --- | --- |
| Learning Modules | 25 | Modules 00-24 covering all ML topics from basics to advanced |
| Projects | 23 | 6 beginner + 8 intermediate + 9 advanced with complete code and READMEs |
| Resource Guides | 50 | Cheatsheets, tutorials, and career guides |
| Markdown Files | 207+ | Comprehensive content, code examples, and exercises |
| Learning Time (Full-Time) | 14-20 months | 30-40 hours/week for comprehensive coverage |
| Learning Time (Part-Time) | 28-35 months | 10-15 hours/week for comprehensive coverage |
| Prerequisites | Zero | Start from scratch - no prior experience needed! |

Disclaimer

External Links Disclaimer:

This repository contains links to external websites, courses, documentation, and resources provided for educational purposes only. The maintainers:

  • Do not endorse any specific external service or content provider
  • Are not responsible for the availability, accuracy, or content of external links
  • Cannot guarantee that external links will remain accessible or unchanged
  • Do not assume liability for any issues arising from the use of external resources

GDPR Compliance:

This repository does not collect, store, or process any personal data. It is a static educational resource. Any data processing occurs through GitHub (the platform) or external websites, which have their own privacy policies. See DISCLAIMER.md for complete disclaimer and GDPR information.


Made with ❤️ for the ML community

Happy Learning!

Remember: The journey of a thousand miles begins with a single step. Start with module 00 and keep going!
