Road to ML: From Zero to Hero

A comprehensive, step-by-step guide to learning Machine Learning from absolute basics to advanced topics

Perfect for beginners • 25 Learning Modules • 23 Real-World Projects • Production-Ready Skills


Why This Repository?

Zero to Hero Path: Complete learning journey from basics to advanced ML
Beginner-Friendly: No prior programming or math experience needed; Phase 0 covers the prerequisites
Hands-On Learning: 23 practical projects to build your portfolio
Production Focus: Learn deployment, MLOps, and real-world skills
Well-Organized: Logical progression with clear learning objectives
Community-Driven: Open source, contributions welcome!

Perfect for: Students, career switchers, self-learners, and anyone wanting to master ML

Career Paths

Choose your path! This repository prepares you for multiple ML/AI careers. Select your target role to see a customized learning path:

Role	Focus	Est. Time	Key Modules	Full Guide
Data Analyst	Insights & Reports	8-12 months	00, 01, 19, 20, 21	View Guide →
Data Scientist	Predictive Models	13-20 months	00-08, 15, 19-21	View Guide →
ML Engineer	Production ML	17-26 months	00-10, 13-14, 19-21	View Guide →
LLM Engineer	Language Models	17-24 months	00-01, 05, 09-10, 12-14, 19	View Guide →
GenAI Solution Architect	Production GenAI	15-21 months	00-01, 02, 05, 09-10, 12-14, 19	View Guide →
Computer Vision Engineer	Image Processing	16-25 months	00-01, 04-05, 09-11, 13-14, 19, 21	View Guide →
AI Engineer	Generalist AI	25-38 months	00-15, 19-21, 22-24	View Guide →
Data Engineer	Data Infrastructure	14-20 months	00-01, 13-14, 19-20	View Guide →
MLOps Engineer	ML Operations	16-24 months	00-01, 05, 09-10, 13-14, 19	View Guide →
Research Scientist	Research & Innovation	24-34 months	00-12, 15, 19, 21, 22-24	View Guide →
BI Analyst	Business Intelligence	10-15 months	00-01, 19-21	View Guide →
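
The Key Modules column uses a compact range notation (for example, "00-08, 15, 19-21"). As a rough illustration of how to read it, the sketch below expands such a string into individual two-digit module numbers. The function name `expand_modules` is made up for this example and is not part of the repository.

```python
def expand_modules(spec: str) -> list[str]:
    """Expand a Key Modules string such as '00-08, 15, 19-21' into module numbers."""
    numbers = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range like '00-08' includes both endpoints
            start, end = part.split("-")
            numbers.extend(range(int(start), int(end) + 1))
        else:
            numbers.append(int(part))
    # Remove duplicates, sort, and format with two digits to match the directory prefixes
    return [f"{n:02d}" for n in sorted(set(numbers))]


# Data Scientist row from the table above
data_scientist = expand_modules("00-08, 15, 19-21")
print(len(data_scientist), "modules:", ", ".join(data_scientist))
```

Each resulting number corresponds to the prefix of a module directory, such as 00-prerequisites or 15-time-series-analysis.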

Each role-specific guide includes:

Detailed module recommendations with priorities
Role-specific projects to build
Essential resources and tools
Skills checklist for each role
Time estimates and learning path

View Complete Career Roadmap Guide →

Overview

This repository provides a structured learning path for machine learning, organized in a logical progression from fundamentals to advanced topics. Each module includes:

Clear explanations of concepts
Hands-on exercises with solutions
Practical projects to reinforce learning
Additional resources for deeper understanding

Learning Path

Note on Module Numbering: Module numbers (00-24) don't strictly follow the learning path order. The phases below represent the recommended learning sequence. Some modules can be learned in parallel or in different orders based on your goals. See individual module READMEs for prerequisites and suggested learning order.

Time Estimates: Realistic completion time is 14-20 months full-time (30-40 hrs/week) or 28-35 months part-time (10-15 hrs/week) for comprehensive coverage of all 25 modules and 23 projects. See the Common Questions & Learning Guide section for a detailed breakdown.

Learning Path Overview

Phase	Modules	Focus Area	Est. Time (Full-Time)	Est. Time (Part-Time)
Phase 0	00	Foundation (Python, Math)	2-3 months	4-6 months
Phase 1	01	Data Fundamentals	2-3 months	4-6 months
Phase 2	02-05	ML Basics	2-3 months	4-6 months
Phase 3	06-07	Advanced ML	1-2 months	2-4 months
Phase 4	08	Unsupervised Learning	1 month	2 months
Phase 5	09-10	Deep Learning Fundamentals	2 months	4 months
Phase 6	11-12, 15	Specialized Deep Learning	3-4 months	6-8 months
Phase 6.5	19-21	Essential Skills (SQL, Imbalanced Data, Explainability)	1-2 months	2-4 months
Phase 7	13-14	Production & MLOps	2-3 months	4-6 months
Phase 8	16-18	Projects (23 total)	4-6 months	8-12 months
Phase 9	22-24	Advanced Specialized Topics (RL, GNNs, Audio)	2-3 months	4-6 months
Total	25 modules	Complete Path	14-20 months	28-35 months
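
Adding the full-time column row by row gives a larger figure than the Total row. That is consistent with the notes in this README that Phase 6.5 and the projects are meant to run alongside other phases rather than after them. The sketch below is only arithmetic on the table values; the dictionary name is chosen for this example.

```python
# Full-time estimates in months as (low, high), copied from the table above
FULL_TIME_MONTHS = {
    "Phase 0": (2, 3),
    "Phase 1": (2, 3),
    "Phase 2": (2, 3),
    "Phase 3": (1, 2),
    "Phase 4": (1, 1),
    "Phase 5": (2, 2),
    "Phase 6": (3, 4),
    "Phase 6.5": (1, 2),
    "Phase 7": (2, 3),
    "Phase 8": (4, 6),
    "Phase 9": (2, 3),
}

# Sum the low and high ends as if every phase were done back to back
low = sum(lo for lo, _ in FULL_TIME_MONTHS.values())
high = sum(hi for _, hi in FULL_TIME_MONTHS.values())
print(f"Back-to-back: {low}-{high} months")  # Back-to-back: 22-32 months
```

If you plan to work through the phases strictly one after another, budget closer to the back-to-back figure than to the Total row.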

Phase 0: Foundation (Prerequisites)

Goal: Build the mathematical and programming foundation needed for ML

00-prerequisites
- Python Basics (Variables, Data Types, Control Flow, Functions, OOP)
- Time Complexity & Algorithm Efficiency (Big O notation)
- Iterators & Generators (Memory-efficient data processing)
- GUI Development with tkinter
- Essential Mathematics (Linear Algebra including Tensors, Statistics, Calculus basics)
- Environment Setup (Python, Jupyter, Virtual Environments)

Phase 1: Data Fundamentals

Goal: Master data manipulation and visualization

01-python-for-data-science
- NumPy: Arrays, operations, broadcasting
- Pandas: DataFrames, data manipulation, cleaning
- Matplotlib & Seaborn: Data visualization
- Plotly & Dash: Interactive visualizations and web applications
- Streamlit: Building interactive dashboards and ML applications
- Flask: Web applications and REST APIs
- Tableau: Professional data visualization and dashboards
- EDA, Data Sources, Regular Expressions, Advanced Data Wrangling
- ETL with AWS RDS: Extract, Transform, Load pipelines
- Advanced Selenium Web Scraping: Chromedriver, dynamic content, Smartprix example

Phase 2: Machine Learning Basics

Goal: Understand core ML concepts and algorithms

02-introduction-to-ml
- What is Machine Learning?
- Types of ML (Supervised, Unsupervised, Reinforcement)
- ML Workflow and Best Practices
03-supervised-learning-regression
- Linear Regression
- Polynomial Regression
- Regularization (Ridge, Lasso)
- Evaluation Metrics (MSE, RMSE, MAE, R²)
- Statistical Regression Analysis (statsmodels: TSS, RSS, ESS, F-statistic, p-values, confidence intervals)
04-supervised-learning-classification
- Logistic Regression
- Decision Trees
- Random Forests
- Support Vector Machines (SVM)
- K-Nearest Neighbors (KNN)
- Naive Bayes (Gaussian, Multinomial, Bernoulli)
- Multi-Class Classification Strategies (One-vs-Rest, One-vs-One)
- Evaluation Metrics (Accuracy, Precision, Recall, F1, ROC-AUC)
05-model-evaluation-optimization
- Train/Validation/Test Split
- Cross-Validation (K-Fold, Stratified, Leave-One-Out, Time Series Split)
- Hyperparameter Tuning (Grid Search, Random Search, Bayesian Optimization with Optuna)
- Bias-Variance Tradeoff
- Learning Curves
- Overfitting and Underfitting
- Model Calibration (Probability Calibration, Platt Scaling, Isotonic Regression)

Phase 3: Advanced Supervised Learning

Goal: Explore ensemble methods and advanced techniques

06-ensemble-methods
- Bagging (Bootstrap Aggregating)
- Boosting (AdaBoost with detailed components, Gradient Boosting, XGBoost with advanced topics, LightGBM with GOSS & EFB, CatBoost)
- Stacking (detailed steps and best practices)
- Voting Classifiers
07-feature-engineering
- Feature Selection
- Feature Transformation
- Handling Categorical Variables (One-Hot, Label, Target, Frequency, WOE Encoding)
- Feature Scaling and Normalization
- Dimensionality Reduction (PCA)
- Advanced Discretization (Decision Tree-based binning, custom strategies)
- Decision Tree Visualization with dtreeviz
- Comprehensive sklearn Pipeline and ColumnTransformer Guide

Phase 4: Unsupervised Learning

Goal: Learn to work with unlabeled data

08-unsupervised-learning
- Clustering (K-Means, Hierarchical, DBSCAN)
- Dimensionality Reduction (PCA, t-SNE, SVD)
- SVD (Singular Value Decomposition) and its connection to PCA
- Anomaly Detection
- Association Rules
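
The connection between SVD and PCA listed above can be stated compactly. This is the standard textbook relationship, given here as a sketch; the symbols (X for the mean-centered data matrix with n rows, C for the sample covariance matrix, and a thin SVD so that Σ is square) are notation chosen for this note, not taken from the module.

```latex
\begin{aligned}
X &= U \Sigma V^{\top} \\
C &= \frac{1}{n-1} X^{\top} X
   = \frac{1}{n-1} V \Sigma U^{\top} U \Sigma V^{\top}
   = V \, \frac{\Sigma^{2}}{n-1} \, V^{\top} \\
\lambda_i &= \frac{\sigma_i^{2}}{n-1}
\end{aligned}
```

The columns of V are therefore the principal directions, each eigenvalue of C is a squared singular value divided by n - 1, and the projected data XV = UΣ gives the principal component scores.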

Phase 5: Deep Learning Fundamentals

Goal: Introduction to neural networks and deep learning

09-neural-networks-basics
- Perceptron
- Multi-Layer Perceptron (MLP)
- Activation Functions
- Backpropagation
- Gradient Descent Variants
10-deep-learning-frameworks
- TensorFlow/Keras Basics
- PyTorch Basics
- Building and Training Neural Networks
- Model Saving and Loading

Phase 6: Specialized Deep Learning

Goal: Master domain-specific deep learning applications

11-computer-vision
- Introduction to Computer Vision and Visual Cortex
- Images and Pixels (RGB, Grayscale, Color Models)
- Convolution and Edge Detection (Sobel, Canny, Prewitt)
- Padding, Strides, and Spatial Arrangement
- Convolutional Neural Networks (CNNs)
- Pooling Mechanisms (Max, Average)
- CNN Training Optimization (Batch Normalization, Dropout, Callbacks)
- CNN Architectures: LeNet, AlexNet, VGGNet, ResNet
- ImageNet and Large-Scale Recognition
- Transfer Learning
- Data Augmentation
- Object Detection (R-CNN to YOLO)
- Semantic and Instance Segmentation
- GANs for Image Generation
- Diffusion Models and Stable Diffusion (with Hugging Face integration)
- Variational Autoencoders (VAEs)
- Recent Breakthroughs (Vision Transformers, CLIP)
12-natural-language-processing
- Text Preprocessing
- Word Embeddings (Word2Vec, GloVe)
- Recurrent Neural Networks (RNNs)
- Long Short-Term Memory (LSTM)
- Transformers Basics
- Fine-tuning Transformers (T5, BERT, GPT with Hugging Face)
- RAG (Retrieval Augmented Generation)
- Sentiment Analysis
15-time-series-analysis
- Time Series Fundamentals (Trend, Seasonality, Stationarity)
- Statistical Methods (ARIMA, SARIMA, Exponential Smoothing)
- Deep Learning for Time Series (LSTM, GRU, Transformers)
- Feature Engineering for Time Series
- Evaluation and Validation
- Note: Module 15 is placed here logically but numbered after modules 13-14. It can be learned in parallel with modules 11-12 or after Phase 5.

Phase 9: Advanced Specialized Topics

Goal: Master advanced specialized ML domains

Note: These modules cover advanced topics that build on deep learning fundamentals. Learn these after completing Phase 5 (Deep Learning Fundamentals) and Phase 6 (Specialized Deep Learning).

22-reinforcement-learning
- Markov Decision Processes (MDPs)
- Value-Based Methods (Q-Learning, DQN)
- Policy-Based Methods (REINFORCE, Policy Gradients)
- Actor-Critic Methods
- Deep Reinforcement Learning
- Multi-Agent RL, Hierarchical RL, Imitation Learning
- Applications: Game Playing, Robotics, Recommendation Systems
23-graph-neural-networks
- Graph Fundamentals and Representations
- Message Passing in GNNs
- Graph Convolutional Networks (GCNs)
- Graph Attention Networks (GATs)
- GraphSAGE and Other Architectures
- Applications: Social Networks, Recommendation Systems, Molecular Analysis
24-audio-speech-processing
- Audio Signal Fundamentals (Waveforms, Spectrograms, MFCCs)
- Speech Recognition (ASR) - CTC, Attention-based, Whisper
- Text-to-Speech (TTS) - Neural TTS, Voice Cloning
- Audio Classification (Music, Events, Emotions)
- Music Generation
- Voice Processing (VAD, Speaker ID, Enhancement)

Phase 6.5: Essential Data Science Skills

Goal: Master critical skills for real-world ML applications

Note: These modules can be learned in parallel with other phases or integrated earlier in your learning journey:

SQL can be learned after Phase 1 (Data Fundamentals) for better data access skills
Imbalanced Data is most useful after Phase 2 (Classification) when you encounter real-world datasets
Model Explainability is valuable after Phase 3 (Advanced ML) when working with complex models
19-sql-database-fundamentals
- Database Fundamentals and SQL Basics
- SQL DDL and DML Commands
- SQL Joins, Subqueries, and Window Functions
- Advanced SQL Topics (CTEs, Window Functions, Stored Procedures)
- Data Cleaning with SQL
- SQL with Python Integration
- NoSQL Databases (MongoDB, Redis, Cassandra, Neo4j)
- Suggested Timing: Can be learned after Phase 1 or in parallel with Phase 2
20-handling-imbalanced-data
- Understanding Imbalanced Data Problems
- Resampling Techniques (SMOTE, ADASYN, Undersampling, Combined)
- Algorithm-Level Solutions (Class Weights, Threshold Tuning, Cost-Sensitive Learning)
- Appropriate Evaluation Metrics (PR-AUC, F1-Score)
- Complete Workflow Examples
- Best Practices and Common Pitfalls
- Suggested Timing: Best learned after Phase 2 (Classification) or Phase 3 (Advanced ML)
21-model-explainability
- Feature Importance Methods (Tree-based, Permutation)
- SHAP (SHapley Additive exPlanations) - Tree, Kernel, and Advanced
- LIME (Local Interpretable Model-agnostic Explanations)
- Partial Dependence Plots (PDP) and ICE Plots
- Model Interpretation Best Practices
- Regulatory Compliance and Ethical AI
- Suggested Timing: Best learned after Phase 3 (Advanced ML) or Phase 5 (Deep Learning)

Phase 7: Production & Deployment

Goal: Learn to deploy ML models in production

Note: Modules 13-14 are numbered before modules 15-18 but logically come after specialized deep learning topics. Learn these after you have built and trained models.

13-model-deployment
- Model Serialization
- REST APIs with Flask/FastAPI
- Docker for ML
- Cloud Deployment (AWS, GCP, Azure)
- Production Server Setup (NGINX, SSL/TLS, Domain Configuration)
- Security Best Practices (Rate Limiting, Authentication, Input Validation)
- AWS SageMaker Comprehensive Guide
- A/B Testing (Statistical Significance, Multi-Armed Bandits, Sequential Testing)
- Model Monitoring
14-mlops-basics
- Version Control for ML (DVC, MLflow)
- CI/CD for ML
- Experiment Tracking
- Model Registry
- Cookiecutter for Data Science
- Apache Kafka for Data Streaming
- Apache Spark for Big Data Processing

Phase 8: Projects

Goal: Apply knowledge through real-world projects

Note: The project modules are numbered 16-18, but the projects are meant to be worked on throughout your learning journey. Start beginner projects after Phase 2, intermediate projects after Phases 3-4, and advanced projects after Phases 6-7.

Projects Summary

Category	Count	Prerequisites	Est. Time	Status
Beginner	6	Phases 0-2	2-3 weeks	All Available
Intermediate	8	Phases 0-4	4-6 weeks	All Available
Advanced	9	Phases 0-7	8-12 weeks	All Available
Total	23	-	4-6 months	100% Complete

Beginner Projects (6 projects)

#	Project Name	Skills	Time	Status
1	House Price Prediction	Regression, Feature Engineering, EDA	2-3 days	✓ Available
2	Iris Flower Classification	Classification, EDA, Multiple Algorithms	1 day	✓ Available
3	Titanic Survival Prediction	Classification, Data Cleaning, Feature Engineering	2-3 days	✓ Available
4	Spam Email Detection	Text Classification, NLP Basics	2-3 days	✓ Available
5	Wine Quality Prediction	Regression, Feature Engineering	2-3 days	✓ Available
6	Customer Data Dashboard with Streamlit	Data Visualization, Streamlit	3-5 days	✓ Available

Prerequisites: Complete Phases 0-2 before starting

Intermediate Projects (8 projects)

#	Project Name	Skills	Time	Status
1	Handwritten Digit Recognition (MNIST)	Neural Networks, Image Processing	3-5 days	✓ Available
2	Customer Churn Prediction	Classification, Imbalanced Data, Business Metrics	4-5 days	✓ Available
3	Movie Recommendation System	Collaborative Filtering, Content-Based	5-7 days	✓ Available
4	Credit Card Fraud Detection	Anomaly Detection, Imbalanced Data	4-5 days	✓ Available
5	Customer Segmentation	Clustering, Unsupervised Learning	3-4 days	✓ Available
6	Time Series Forecasting	Time Series Analysis, ARIMA, LSTM	5-7 days	✓ Available
7	Feature Engineering Mastery	Feature Engineering, Advanced Techniques	4-5 days	✓ Available
8	Ensemble Methods Comparison	Ensemble Methods, Model Comparison	3-4 days	✓ Available

Prerequisites: Complete Phases 0-4; some Phase 6.5 topics (for example, Imbalanced Data) are also recommended

Advanced Projects (9 projects)

#	Project Name	Skills	Time	Status
1	Image Classification (CIFAR-10)	CNNs, Transfer Learning, Data Augmentation	1-2 weeks	✓ Available
2	Sentiment Analysis on Reviews	NLP, RNNs/LSTMs, Transformers	1-2 weeks	✓ Available
3	Time Series Forecasting (Advanced)	Advanced Time Series, Deep Learning	1-2 weeks	✓ Available
4	Chatbot Development	NLP, Sequence-to-Sequence, Transformers	1-2 weeks	✓ Available
5	Object Detection	Computer Vision, YOLO, R-CNN	1-2 weeks	✓ Available
6	End-to-End ML Pipeline	Full ML Pipeline, MLOps	2-3 weeks	✓ Available
7	Generative Model (GAN/VAE)	GANs, VAEs, Image Generation	1-2 weeks	✓ Available
8	Model Explainability & Interpretability	SHAP, LIME, Model Interpretation	1-2 weeks	✓ Available
9	Model Deployment & Serving	Model Deployment, APIs, Cloud	1-2 weeks	✓ Available

Prerequisites: Completing Phases 0-7 is recommended for full benefit

Prerequisites

Before starting, you should have:

Basic computer literacy
Willingness to learn and practice
A computer with internet connection

Note: No prior programming or math experience required! We'll cover everything you need.

Getting Started

1. Clone the Repository

git clone https://github.com/NabidAlam/road-to-machine-learning.git
cd road-to-machine-learning

2. Set Up Your Environment

Option A: Using Anaconda (Recommended for Beginners)

# Install Anaconda from https://www.anaconda.com/products/individual

# Create a new environment
conda create -n ml-env python=3.10
conda activate ml-env

# Install required packages
pip install -r requirements.txt

Option B: Using Python venv

# Create virtual environment
python -m venv ml-env

# Activate (Windows)
ml-env\Scripts\activate

# Activate (Mac/Linux)
source ml-env/bin/activate

# Install required packages
pip install -r requirements.txt

3. Install Jupyter Notebook

# Install the Notebook package, then launch it
pip install notebook
jupyter notebook
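
Before opening the first module, it can help to confirm that the activated environment is the one you just set up. The following is a minimal sketch using only the standard library. The package list is hypothetical, since the contents of requirements.txt are not shown here; replace it with the import names of the packages you actually installed.

```python
import importlib.util
import sys

# Hypothetical list: replace with the import names of the packages in requirements.txt
EXPECTED_PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn", "notebook"]


def missing_packages(names):
    """Return the import names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Show which interpreter is running, to confirm the environment is activated
version = ".".join(str(part) for part in sys.version_info[:3])
print(f"Python {version} at {sys.executable}")

missing = missing_packages(EXPECTED_PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All expected packages are importable.")
```

If the interpreter path does not point inside ml-env, activate the environment again and rerun the check.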

4. Start Learning!

Quick Start Option: Want to see ML in action immediately? Check out GETTING_STARTED.md for a 30-minute first project!

Full Learning Path: Follow the phases in the recommended order (see Learning Path above):

Start with 00-prerequisites
Work through the phases in sequence, using each module's README for its prerequisites
Complete exercises and projects
Practice, practice, practice!

Repository Structure

Note: All learning modules include detailed guides with code examples, exercises, and solutions. Beginner projects include READMEs and code. Intermediate and advanced projects include detailed READMEs with instructions.

road-to-machine-learning/

 00-prerequisites/
    01-python-basics.md (includes time complexity, iterators/generators)
    02-linear-algebra.md
    03-statistics-probability.md
    04-calculus.md
    05-environment-setup.md
    README.md

 01-python-for-data-science/
    01-numpy.md
    02-pandas.md
    03-visualization.md (includes Plotly & Dash)
    04-exploratory-data-analysis.md
    05-data-sources-and-integration.md (includes ETL with AWS RDS, Advanced Selenium)
    06-regular-expressions-text-processing.md (includes advanced regex: lookahead, lookbehind, back references)
    07-advanced-data-wrangling.md
    08-working-with-dates-times.md
    09-streamlit-dashboards.md
    10-flask-web-development.md
    11-tableau-visualization.md
    README.md

 02-introduction-to-ml/
    introduction-to-ml.md
    ml-terminology.md
    problem-identification-algorithm-selection.md
    first-ml-project-tutorial.md
    common-pitfalls-best-practices.md
    README.md

 03-supervised-learning-regression/
    regression.md
    regression-advanced-topics.md (includes statsmodels)
    regression-project-tutorial.md
    regression-quick-reference.md
    README.md

 04-supervised-learning-classification/
    classification.md (includes dtreeviz visualization)
    classification-advanced-topics.md
    classification-project-tutorial.md
    classification-quick-reference.md
    README.md

 05-model-evaluation-optimization/
    evaluation-optimization.md
    evaluation-optimization-advanced-topics.md
    evaluation-optimization-project-tutorial.md
    evaluation-optimization-quick-reference.md
    README.md

 06-ensemble-methods/
    ensemble-methods.md
    ensemble-methods-advanced-topics.md
    ensemble-methods-project-tutorial.md
    ensemble-methods-quick-reference.md
    README.md

 07-feature-engineering/
    feature-engineering.md (includes WOE encoding, advanced discretization)
    feature-engineering-advanced-topics.md (includes sklearn Deep Dive: Estimators, Mixins, Composite Transformers, FeatureUnion)
    feature-engineering-project-tutorial.md
    feature-engineering-quick-reference.md
    README.md

 08-unsupervised-learning/
    unsupervised-learning.md
    unsupervised-learning-advanced-topics.md (includes SVD - Singular Value Decomposition with PCA connection)
    unsupervised-learning-project-tutorial.md
    unsupervised-learning-quick-reference.md
    README.md

 09-neural-networks-basics/
    neural-networks.md
    neural-networks-advanced-topics.md
    neural-networks-project-tutorial.md
    neural-networks-quick-reference.md
    README.md

 10-deep-learning-frameworks/
    deep-learning-frameworks.md
    deep-learning-frameworks-advanced-topics.md
    deep-learning-frameworks-project-tutorial.md
    deep-learning-frameworks-quick-reference.md
    README.md

 11-computer-vision/
    computer-vision.md
    computer-vision-advanced-topics.md
    computer-vision-project-tutorial.md
    computer-vision-quick-reference.md
    README.md

 12-natural-language-processing/
    nlp.md
    nlp-advanced-topics.md
    nlp-project-tutorial.md
    nlp-quick-reference.md
    README.md

 13-model-deployment/
    deployment.md (includes FastAPI advanced features: type checking, dependency injection, background tasks)
    deployment-advanced-topics.md (includes AWS SageMaker comprehensive guide)
    deployment-project-tutorial.md
    deployment-quick-reference.md
    README.md

 14-mlops-basics/
    mlops.md (includes Cookiecutter for Data Science)
    mlops-advanced-topics.md (includes Apache Kafka, Apache Spark, Feature Stores)
    mlops-project-tutorial.md
    mlops-quick-reference.md
    README.md

 15-time-series-analysis/
    time-series-analysis.md
    time-series-analysis-advanced-topics.md
    time-series-project-tutorial.md
    time-series-quick-reference.md
    README.md

 16-projects-beginner/
    [6 project directories with code and READMEs]
    README.md

 17-projects-intermediate/
    [8 project directories with READMEs]
    README.md

 18-projects-advanced/
    [9 project directories with READMEs]
    README.md

 19-sql-database-fundamentals/
    sql-database.md
    sql-database-advanced-topics.md (includes NoSQL: MongoDB, Redis, Cassandra, Neo4j)
    sql-database-project-tutorial.md
    sql-database-quick-reference.md
    README.md

 20-handling-imbalanced-data/
    imbalanced-data.md
    imbalanced-data-advanced-topics.md
    imbalanced-data-project-tutorial.md
    imbalanced-data-quick-reference.md
    README.md

 21-model-explainability/
    model-explainability.md
    model-explainability-advanced-topics.md
    model-explainability-project-tutorial.md
    model-explainability-quick-reference.md
    README.md

 22-reinforcement-learning/
    reinforcement-learning.md
    reinforcement-learning-advanced-topics.md
    reinforcement-learning-project-tutorial.md
    reinforcement-learning-quick-reference.md
    README.md

 23-graph-neural-networks/
    graph-neural-networks.md
    graph-neural-networks-advanced-topics.md
    graph-neural-networks-project-tutorial.md
    graph-neural-networks-quick-reference.md
    README.md

 24-audio-speech-processing/
    audio-speech-processing.md
    audio-speech-processing-advanced-topics.md
    audio-speech-processing-project-tutorial.md
    audio-speech-processing-quick-reference.md
    README.md

 resources/
    data_science_cheatsheet.md
    prerequisites_cheatsheet.md
    introduction_to_ml_cheatsheet.md
    mlops_cheatsheet.md
    model_deployment_cheatsheet.md
    imbalanced_data_cheatsheet.md
    model_explainability_cheatsheet.md
    git_guide.md
    math_formulas.md
    common_errors.md
    ethics_in_ml.md
    model_interpretability.md
    ml_glossary.md
    reinforcement_learning.md
    recommender_systems.md
    career_roadmap_guide.md
    career_portfolio.md
    interview_prep.md
    open_source_contribution.md
    kaggle_competitions.md
    docker_tutorial.md
    ml_model_testing.md
    stakeholder_communication.md
    automl_basics.md
    data_validation.md
    web_scraping_guide.md
    powerbi_guide.md
    excel_data_analysis_guide.md
    mlflow_comprehensive_guide.md
    transformer_fine_tuning_guide.md
    langchain_guide.md
    llamaindex_guide.md
    ai_agents_guide.md
    dsa_for_ml_guide.md
    causal_inference_guide.md
    books.md
    courses.md
    datasets.md
    tools.md
    youtube_channels.md
    blogs_websites.md
    practice_platforms.md

 requirements.txt
 LICENSE
 README.md
 CONTRIBUTING.md
 DISCLAIMER.md
 GETTING_STARTED.md
 LEARNING_ROADMAP.md
 QUICK_START.md

Resources

Books

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron
Pattern Recognition and Machine Learning by Christopher Bishop
Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
An Introduction to Statistical Learning by James et al.

Online Courses

Machine Learning by Andrew Ng (Coursera)
Deep Learning Specialization by Andrew Ng (Coursera)
Fast.ai Practical Deep Learning
CS229: Machine Learning (Stanford)

Cheatsheets & Guides

Core ML & Data Science Guides

Guide	Description
Data Science & ML Cheatsheet	Quick reference for daily work (NumPy, Pandas, PyTorch, TensorFlow, OpenCV, FastAPI, and more)
Prerequisites Cheatsheet	Quick reference for Python, Mathematics, and Statistics fundamentals needed for ML
Introduction to ML Cheatsheet	Quick reference for ML fundamentals, types, workflow, and key concepts
Math Formulas Reference	Essential mathematical formulas for ML (Statistics, Linear Algebra, Calculus, ML metrics)
ML Glossary	Comprehensive glossary of ML terms and concepts for beginners
Common Errors & Debugging Guide	Troubleshooting guide for common ML errors and debugging strategies
DSA for ML Guide	Essential data structures and algorithms for machine learning

Development & Tools

Guide	Description
Complete Git & GitHub Guide	Comprehensive Git tutorial with commands, outputs, practice exercises, and solutions
Docker Complete Tutorial	Comprehensive Docker guide for ML: containerization, Dockerfile, docker-compose, best practices, and deployment
Web Scraping Guide	Complete web scraping guide from basics to advanced: Requests, Beautiful Soup, Selenium, Scrapy, CAPTCHA handling

Advanced ML Topics

Guide	Description
MLflow Comprehensive Guide	Complete MLflow guide: experiment tracking, model registry, hyperparameter tuning, MLflow UI, Docker deployment
MLOps Cheatsheet	Quick reference for MLOps tools, practices, and workflows (DVC, MLflow, CI/CD, monitoring)
Model Deployment Cheatsheet	Quick reference for deploying ML models (APIs, Docker, cloud platforms, A/B testing)
Imbalanced Data Cheatsheet	Quick reference for handling imbalanced datasets (resampling, class weights, metrics)
Model Explainability Cheatsheet	Quick reference for explaining ML models (SHAP, LIME, feature importance, PDP)
Transformer Fine-Tuning Guide	Comprehensive guide to fine-tuning transformers (T5, BERT, GPT) with Hugging Face
Model Interpretability Guide	Understanding and explaining ML model predictions (SHAP, LIME, feature importance)
Reinforcement Learning Basics	Introduction to RL, key concepts, algorithms, and applications
Recommender Systems Guide	Building recommendation systems (collaborative filtering, content-based, hybrid approaches)
AutoML Basics Guide	Introduction to Automated Machine Learning: when to use, popular tools, and integration strategies
Data Validation Guide	Comprehensive data validation: schema validation, quality checks, drift detection, and automated pipelines
Causal Inference Guide	Comprehensive guide to causal inference: potential outcomes, confounding, RCTs, observational methods (propensity scores, DiD, IV, RDD), causal ML, and tools (DoWhy, EconML)

Generative AI & Modern Tools

Guide	Description
RAG Comprehensive Guide	Complete guide to Retrieval Augmented Generation: architecture, components, vector databases, advanced techniques, evaluation, and production deployment
LangChain Guide	Complete LangChain guide: building Gen AI projects, chains, agents, memory, RAG, document loaders, and vector stores
LlamaIndex Guide	Comprehensive LlamaIndex guide: data indexing, querying, retrieval, chat engines, and advanced generative AI projects
AI Agents Guide	Complete guide to AI agents: CrewAI, AutoGen, LangGraph, AutoGPT, MCP (Model Context Protocol), and A2A (Agent-to-Agent) communication
GenAI Production Deployment Guide	Comprehensive guide to deploying GenAI at scale: RAG architectures, multi-agent systems, hyperscaler deployment (AWS, GCP, Azure), scaling strategies, monitoring, and cost optimization
Generative AI Comprehensive Guide	Complete overview of Generative AI: LLMs, LangChain, RAG, AI Agents, Vector Databases, Multi-Agent Systems, and building production-ready GenAI applications

System Design & Architecture

Guide	Description
ML System Design Guide	Comprehensive guide to designing scalable ML systems: requests/responses, latency, throughput, load balancing, caching, vertical/horizontal scaling, databases, replication, sharding, message queues, stateless/stateful architecture, high availability, and monitoring

Business & Communication

Guide	Description
Stakeholder Communication Guide	Effective communication of ML concepts, results, and business value to non-technical stakeholders
ML Model Testing Guide	Comprehensive guide to testing ML models, pipelines, and APIs (unit tests, integration tests, best practices)
Ethics in ML Guide	Comprehensive guide to bias, fairness, responsible AI, and ethical ML practices
Agile Data Science Guide	Applying Agile methodologies (Scrum, Kanban) to data science projects: sprint planning, standups, retrospectives
Data Products Guide	Building production-ready data products: APIs, dashboards, recommendation systems, real-time analytics
Enterprise Data Tools Guide	Enterprise data platforms: Snowflake, Informatica, Talend, Cloudera, Stibo, Qlik, Tableau integration
Java for Data Science Guide	Using Java in data science: ML libraries (Weka, Deeplearning4j), Spark integration, enterprise systems

Data Analysis Tools

Guide	Description
Power BI Guide	Complete Power BI guide: visualizations, DAX, Power Query, data modeling, dashboards, and AI integration
Excel Data Analysis Guide	Comprehensive Excel guide: functions, pivot tables, charts, dashboards, Power Query, and advanced techniques

Career & Interview Resources

Career Roadmap Guide - Role-specific learning paths: Data Analyst, Data Scientist, ML Engineer, LLM Engineer, GenAI Solution Architect, Computer Vision Engineer, AI Engineer, Data Engineer, MLOps Engineer, Research Scientist, BI Analyst
Career & Portfolio Guide - Build your portfolio, prepare for interviews, and launch your ML career
Interview Preparation Guide - ML interview questions, coding challenges, system design, and preparation strategies
Open Source Contribution Guide - How to contribute to open source projects in data science and ML
Kaggle Competitions Guide - Complete guide to participating in Kaggle competitions, strategy, and collaboration

Additional Resource Files

Books - Recommended books for ML and data science
Courses - Online courses and learning platforms
Datasets - Curated list of datasets for practice
Tools - Essential tools and libraries for ML
YouTube Channels - Comprehensive list of free ML YouTube channels
Blogs & Websites - Recommended blogs, websites, and online resources
Practice Platforms - Platforms to practice ML, coding, and data science

Learning Tips

Follow the Order: Modules build on each other, so work through the phases in the recommended order and avoid skipping prerequisites
Practice Regularly: Code along with examples and complete all exercises
Build Projects: Apply what you learn by building projects
Join Communities: Engage with others learning ML
Be Patient: ML is complex - take your time to understand concepts
Experiment: Don't just copy code - experiment and break things!

Common Questions & Learning Guide

Note: These are questions learners commonly have when starting their ML journey. They reflect typical learning patterns and concerns and are not necessarily actual user submissions.

Getting Started

Q: I'm a complete beginner. Where do I start?
A: Start with 00-prerequisites, specifically 01-python-basics.md. No prior experience is needed. Then follow the phases in the recommended order.

Q: How long will it take to complete this roadmap?
A: Realistic time estimates vary significantly based on your background and time commitment:

Time Estimates by Commitment Level

Commitment Level	Hours/Week	Minimum	Standard	Comprehensive
Full-Time	30-40 hrs	8-10 months	12-15 months	15-18 months
Part-Time	10-15 hrs	18-24 months	24-30 months	30-36 months

Definitions:

Minimum: Core concepts, skips some advanced topics
Standard: Complete coverage with all modules and projects
Comprehensive: Deep understanding, all projects, portfolio building

Phase-by-Phase Breakdown

Phase	Focus	Full-Time	Part-Time
Prerequisites	Python, Math	2-3 months	4-6 months
Data Fundamentals	NumPy, Pandas, Visualization	2-3 months	4-6 months
ML Basics	Regression, Classification, Evaluation	2-3 months	4-6 months
Advanced ML	Ensembles, Feature Engineering	1-2 months	2-4 months
Deep Learning Fundamentals	Neural Networks, Frameworks	2 months	4 months
Specialized Deep Learning	Computer Vision, NLP, Time Series	3-4 months	6-8 months
Essential Skills	SQL, Imbalanced Data, Explainability	1-2 months	2-4 months
Production & MLOps	Deployment, MLOps	2-3 months	4-6 months
Projects (23 total)	Hands-on Practice	4-6 months	8-12 months
Total	Complete Path	12-18 months	24-30 months
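
As a rough cross-check of the table above, the sketch below sums the per-phase ranges as if every phase were completed strictly one after another. The dictionary only restates the table's numbers; the strictly sequential reading is an assumption made for illustration. The sums come out higher than the Total row, which suggests the totals assume some overlap, for example doing projects alongside the phases.

```python
# (low, high) months per phase, copied from the table above:
# first pair is full-time, second pair is part-time.
PHASES = {
    "Prerequisites": ((2, 3), (4, 6)),
    "Data Fundamentals": ((2, 3), (4, 6)),
    "ML Basics": ((2, 3), (4, 6)),
    "Advanced ML": ((1, 2), (2, 4)),
    "Deep Learning Fundamentals": ((2, 2), (4, 4)),
    "Specialized Deep Learning": ((3, 4), (6, 8)),
    "Essential Skills": ((1, 2), (2, 4)),
    "Production & MLOps": ((2, 3), (4, 6)),
    "Projects": ((4, 6), (8, 12)),
}


def sequential_total(column):
    """Sum the low and high ends of one column (0 = full-time, 1 = part-time)."""
    low = sum(ranges[column][0] for ranges in PHASES.values())
    high = sum(ranges[column][1] for ranges in PHASES.values())
    return low, high


full_time = sequential_total(0)
part_time = sequential_total(1)
print(f"Full-time, strictly sequential: {full_time[0]}-{full_time[1]} months")
print(f"Part-time, strictly sequential: {part_time[0]}-{part_time[1]} months")
```

Treat the result as an upper bound for planning rather than a replacement for the estimates in the table.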

Factors Affecting Timeline

Factor	Impact on Timeline
Prior programming experience	-2 to -4 months
Prior math background	-1 to -2 months
Number of projects completed	+2 to +6 months
Practice vs. reading ratio	Practice takes longer but is essential

Q: Do I need a powerful computer?
A: No! Most modules work fine on a regular laptop. Deep learning modules (11-12) benefit from GPUs but can be done on cloud platforms (Google Colab, Kaggle) for free.
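
To see whether your laptop or cloud notebook exposes a GPU, a minimal check along these lines may help. It assumes PyTorch as the framework (TensorFlow has its own equivalent), and the helper name `gpu_status` is made up for this sketch. If PyTorch is not installed, it simply reports that.

```python
import importlib.util


def gpu_status():
    """Report whether PyTorch can see a CUDA GPU (hypothetical helper)."""
    # Avoid an ImportError on machines where PyTorch is not installed yet.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch

    if torch.cuda.is_available():
        return "GPU available"
    return "No GPU found, running on CPU"


print(gpu_status())
```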

Q: Should I learn X before Y?
A: Generally, follow the module order. However:

SQL (module 19) can be learned after Phase 1 (Data Fundamentals)
Imbalanced Data (module 20) is best after Classification (module 4)
Model Explainability (module 21) is best after Advanced ML (Phase 3)

Learning Path

Q: Can I skip modules?
A: We recommend following the order, but you can:

Skip the advanced-topics files if you're pressed for time (come back to them later)
Learn SQL earlier if you need it for data access
Jump to projects relevant to your goals

Q: I'm a software engineer. Can I skip Python basics?
A: Skim 00-prerequisites/01-python-basics.md rather than skipping it. It covers Python concepts that matter for ML work (time complexity, iterators, generators) and that may be new to you.
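
As an illustration of why generators come up in ML work, here is a minimal sketch of a batch generator that yields a dataset in chunks instead of building every batch up front. The function name `batches` and the batch size are made up for this example; they are not taken from the module.

```python
def batches(items, batch_size):
    """Yield successive lists of up to batch_size items, one batch at a time."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


# range() is lazy too, so only one batch is held in memory at a time.
for batch in batches(range(10), batch_size=4):
    print(batch)
```

The same pattern applies when the items are rows read from a file that is too large to load at once.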

Q: I'm a statistician. Can I skip the math modules?
A: Review them quickly - they focus on ML applications of math concepts you may know from a different angle.

Q: What if I get stuck on a concept?
A:

Re-read the explanation
Check the "Additional Resources" section in module READMEs
Look for related topics in other modules
Practice with code examples
Join communities (see Resources section)

Tools and Libraries

Q: Which IDE should I use?
A: Any works! Popular choices:

Jupyter Notebooks: Great for learning and experimentation
VS Code: Excellent for larger projects, good ML extensions
PyCharm: Full-featured Python IDE
Google Colab: Free cloud-based notebooks

Q: Do I need to install everything at once?
A: No! Install packages as you need them. Each module lists required packages. Start with requirements.txt for core packages.

Q: Python 3.8, 3.9, 3.10, or 3.11?
A: Python 3.9 or 3.10 is recommended; most libraries support these versions well. Avoid 3.12+ at first, as some packages may not be fully compatible yet.
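
If you are unsure which interpreter your environment uses, a short check like the one below can tell you. It is only a sketch: the helper name `version_advice` and its messages are made up here, and the version set mirrors the recommendation above.

```python
import sys

RECOMMENDED = {(3, 9), (3, 10)}  # versions recommended in the answer above


def version_advice(version_info=sys.version_info):
    """Classify a Python version against the recommendation (hypothetical helper)."""
    major_minor = (version_info[0], version_info[1])
    if major_minor in RECOMMENDED:
        return "recommended"
    if major_minor >= (3, 12):
        return "newer than recommended; some packages may not be compatible yet"
    return "not one of the recommended versions"


print(f"Python {sys.version_info[0]}.{sys.version_info[1]}: {version_advice()}")
```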

Projects

Q: Should I do all projects?
A: Do at least:

2-3 beginner projects (after Phase 2)
3-4 intermediate projects (after Phase 4)
2-3 advanced projects (after Phase 7)

Focus on projects relevant to your career goals.

Q: Can I use my own datasets?
A: Absolutely! Using your own data makes projects more meaningful. Just ensure the dataset is appropriate for the project type.

Q: How long should each project take?
A: Recommended project completion by level:

Level	Projects to Complete	Time per Project	Total Time
Beginner	3-4 projects	1-3 days	1-2 weeks
Intermediate	4-5 projects	3-7 days	3-5 weeks
Advanced	2-3 projects	1-2 weeks	2-6 weeks
Total Minimum	9-12 projects	-	6-13 weeks
Recommended	15-18 projects	-	10-20 weeks

Career and Job Market

Q: What jobs can I get after completing this?
A: Different roles require different module focuses:

Role	Key Modules	Focus Areas	Est. Time
Data Scientist	0-8, 15, 19-21	Data analysis, modeling, SQL, explainability	10-14 months
ML Engineer	0-14	Full stack: modeling to deployment, MLOps	12-18 months
Research Scientist	0-12, advanced topics	Deep learning, research, publications	15-20 months
Business Analyst	0-7, 19	Data analysis, SQL, business context	8-12 months
Data Engineer	0-1, 13-14, 19	Data pipelines, infrastructure, SQL	10-14 months

Q: Do I need a degree?
A: Not necessarily! Many successful ML practitioners are self-taught. However, a degree can help with:

Getting past HR filters
Research positions
Certain companies' requirements

Technical Questions

Q: Should I learn TensorFlow or PyTorch?
A: Both! Start with TensorFlow/Keras (easier for beginners), then learn PyTorch. Many jobs use both. Module 10 covers both.

Q: Do I need to know deep learning for most ML jobs?
A: Not always! Many roles focus on traditional ML (modules 3-8). However, deep learning (modules 9-12) is increasingly important.

Q: How important is MLOps?
A: Very important for production ML! Module 14 covers MLOps basics. Essential for ML Engineer roles, valuable for Data Scientists too.

Q: Should I learn SQL?
A: Yes! Most data science roles require SQL. Module 19 covers it comprehensively. Learn it early if you need data access skills.

Common Concerns

Q: I'm overwhelmed. What should I do?
A:

Take a break
Focus on one module at a time
Don't try to master everything immediately
Practice regularly (even 30 min/day helps)
Join study groups or communities

Q: I don't understand the math. Should I continue?
A:

Review the math modules (00-prerequisites) with visual resources (3Blue1Brown videos)
Focus on intuition over proofs initially
Use code to understand concepts
Math becomes clearer as you apply it

Q: My code doesn't work. What should I do?
A:

Read error messages carefully
Check resources/common_errors.md
Verify you're using correct library versions
Search Stack Overflow with the error message
Check module "Common Issues" sections
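
For the library-version step above, one way to list what is actually installed is the standard-library `importlib.metadata` module (Python 3.8+). The package names below are examples only, and `installed_versions` is a helper name invented for this sketch; substitute the packages your module lists.

```python
from importlib.metadata import PackageNotFoundError, version

# Example package names only; use the ones listed in your module.
PACKAGES = ["numpy", "pandas", "scikit-learn", "matplotlib"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

Comparing this output with the versions a module expects is often quicker than guessing from the error message alone.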

Q: How do I know if I'm ready for the next module?
A: You're ready when you can:

Explain the main concepts
Implement basic examples without copying
Complete exercises (even if with some help)
Understand most of the code examples

Contributing

Q: How can I contribute?
A: See CONTRIBUTING.md. We welcome:

Fixing typos and errors
Adding examples
Improving explanations
Adding projects
Translating content

Q: Can I use this content for my course/tutorial?
A: Yes! This is open source (MIT License). Please credit the repository and contributors.

Have more questions? Open an issue on GitHub or check the module-specific README files for detailed information!

Contributing

We welcome contributions! This repository is for the community, by the community. Here's how you can help:

Ways to Contribute

Add Projects: Share your ML projects with the community
Improve Documentation: Fix typos, clarify explanations, add examples
Create Exercises: Add practice problems and solutions
Report Issues: Found a bug? Let us know!
Suggest Features: Have an idea? Open an issue!

Quick Start for Contributors

Fork the repository
Clone your fork: git clone https://github.com/YOUR_USERNAME/road-to-machine-learning.git
Create a branch: git checkout -b feature/amazing-feature
Make your changes
Commit: git commit -m 'Add amazing feature'
Push: git push origin feature/amazing-feature
Open a Pull Request

See CONTRIBUTING.md for detailed guidelines.

Your contributions make this resource better for everyone!

License

This project is licensed under the MIT License - see the LICENSE file for details.

Show Your Support

If you find this repository helpful, please consider:

Star this repo - It helps others discover this resource
Fork it - Create your own learning path
Share it - Help others on their ML journey
Contribute - Add projects, fix issues, improve content

Every star and fork helps the community grow!

What You'll Achieve

By completing this roadmap, you'll be able to:

Build and train ML models from scratch
Deploy models to production
Work with real-world datasets
Understand deep learning concepts
Create computer vision and NLP applications
Implement MLOps best practices
Build a portfolio of ML projects

Statistics

Metric	Count	Details
Learning Modules	25	Modules 00-24 covering all ML topics from basics to advanced
Projects	23	6 beginner + 8 intermediate + 9 advanced with complete code and READMEs
Resource Guides	50	Cheatsheets, tutorials, and career guides
Markdown Files	207+	Comprehensive content, code examples, and exercises
Learning Time (Full-Time)	14-20 months	30-40 hours/week for comprehensive coverage
Learning Time (Part-Time)	28-35 months	10-15 hours/week for comprehensive coverage
Prerequisites	Zero	Start from scratch - no prior experience needed!

Disclaimer

External Links Disclaimer:

This repository contains links to external websites, courses, documentation, and resources provided for educational purposes only. The maintainers:

Do not endorse any specific external service or content provider
Are not responsible for the availability, accuracy, or content of external links
Cannot guarantee that external links will remain accessible or unchanged
Do not assume liability for any issues arising from the use of external resources

GDPR Compliance:

This repository does not collect, store, or process any personal data. It is a static educational resource. Any data processing occurs through GitHub (the platform) or external websites, which have their own privacy policies. See DISCLAIMER.md for complete disclaimer and GDPR information.

Made with ❤️ for the ML community

Happy Learning!

Remember: The journey of a thousand miles begins with a single step. Start with module 00 and keep going!
