A comprehensive reference guide covering all essential mathematical formulas used in Data Science and Machine Learning
- Linear Algebra
- Calculus
- Probability
- Statistics
- Linear & Logistic Regression
- Neural Networks
- Optimization Algorithms
- Evaluation Metrics
- Clustering
- Deep Learning
- Dimensionality Reduction
- Information Theory
- Support Vector Machines
- Decision Trees & Ensembles
- Bias-Variance Tradeoff
| Formula | Expression |
|---|---|
| Dot Product | a · b = Σ(aᵢbᵢ) = a₁b₁ + a₂b₂ + ... + aₙbₙ |
| Matrix Multiplication | C = AB where Cᵢⱼ = Σₖ(AᵢₖBₖⱼ) |
| Transpose | (AB)ᵀ = BᵀAᵀ |
| Identity Matrix | AI = IA = A |
| Inverse Matrix | AA⁻¹ = A⁻¹A = I |
| Norm Type | Formula |
|---|---|
| L1 Norm (Manhattan) | ‖x‖₁ = Σ|xᵢ| |
| L2 Norm (Euclidean) | ‖x‖₂ = √(Σxᵢ²) |
| Frobenius Norm | ‖A‖F = √(ΣᵢΣⱼ aᵢⱼ²) |
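A quick sanity check of these norms in NumPy (a minimal sketch, assuming NumPy is installed; the vector and matrix values are arbitrary):

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])
A = np.array([[1.0, 2.0], [3.0, 4.0]])

l1 = np.abs(x).sum()           # L1 (Manhattan): Σ|xᵢ|
l2 = np.sqrt((x ** 2).sum())   # L2 (Euclidean): √(Σxᵢ²)
fro = np.sqrt((A ** 2).sum())  # Frobenius: √(ΣᵢΣⱼ aᵢⱼ²)

# They match NumPy's built-in norms
assert np.isclose(l1, np.linalg.norm(x, 1))
assert np.isclose(l2, np.linalg.norm(x, 2))
assert np.isclose(fro, np.linalg.norm(A, 'fro'))
```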
Eigenvalue Equation: Av = λv
Characteristic Equation: det(A - λI) = 0
Trace: tr(A) = Σλᵢ = Σaᵢᵢ
Determinant: det(A) = Πλᵢ
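A small NumPy sketch verifying the eigenvalue, trace, and determinant identities above on an arbitrary symmetric matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)

# Av = λv holds for each eigenpair
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)

# tr(A) = Σλᵢ and det(A) = Πλᵢ
assert np.isclose(np.trace(A), eigenvalues.sum())
assert np.isclose(np.linalg.det(A), eigenvalues.prod())
```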
| Rule | Formula |
|---|---|
| Power Rule | d/dx(xⁿ) = nxⁿ⁻¹ |
| Chain Rule | d/dx[f(g(x))] = f'(g(x)) · g'(x) |
| Product Rule | d/dx[f(x)g(x)] = f'(x)g(x) + f(x)g'(x) |
| Quotient Rule | d/dx[f(x)/g(x)] = [f'(x)g(x) - f(x)g'(x)]/g(x)² |
Gradient: ∇f = [∂f/∂x₁, ∂f/∂x₂, ..., ∂f/∂xₙ]
Hessian Matrix: Hᵢⱼ = ∂²f/(∂xᵢ∂xⱼ)
Jacobian Matrix: Jᵢⱼ = ∂fᵢ/∂xⱼ
| Function | Derivative |
|---|---|
| Sigmoid | σ'(x) = σ(x)(1 - σ(x)) |
| Tanh | tanh'(x) = 1 - tanh²(x) |
| ReLU | ReLU'(x) = {1 if x > 0, 0 otherwise} |
| Exponential | d/dx(eˣ) = eˣ |
| Logarithm | d/dx(ln x) = 1/x |
Probability (classical definition): P(A) = (Favorable outcomes)/(Total outcomes), assuming all outcomes are equally likely
Complement Rule: P(Aᶜ) = 1 - P(A)
Addition Rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Multiplication Rule: P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A)
Conditional Probability: P(A|B) = P(A ∩ B)/P(B)
Bayes' Rule: P(A|B) = [P(B|A)P(A)]/P(B)
Extended Form: P(A|B) = [P(B|A)P(A)]/[P(B|A)P(A) + P(B|Aᶜ)P(Aᶜ)]
💡 Key Application: Fundamental in ML for classification, spam detection, and probabilistic reasoning
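A minimal worked example of the extended form, using made-up spam-filter numbers (the prior and likelihoods below are purely illustrative):

```python
# Hypothetical numbers: P(spam) = 0.2, P(word|spam) = 0.9, P(word|not spam) = 0.1
p_spam = 0.2
p_word_given_spam = 0.9
p_word_given_ham = 0.1

# Denominator via the law of total probability
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' rule: P(spam | word appears)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(p_spam_given_word)  # ≈ 0.692
```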
| Concept | Formula |
|---|---|
| Expected Value | E[X] = Σ xᵢP(xᵢ) or ∫ xf(x)dx |
| Variance | Var(X) = E[(X - μ)²] = E[X²] - (E[X])² |
| Standard Deviation | σ = √Var(X) |
| Covariance | Cov(X,Y) = E[(X - μₓ)(Y - μᵧ)] |
| Correlation | ρ(X,Y) = Cov(X,Y)/(σₓσᵧ) |
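These definitions map directly onto sample statistics; a short sketch on synthetic data (assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(scale=0.5, size=1000)

mean_x = x.mean()                                   # E[X]
var_x = ((x - mean_x) ** 2).mean()                  # Var(X) = E[(X - μ)²]
cov_xy = ((x - x.mean()) * (y - y.mean())).mean()   # Cov(X, Y)
corr_xy = cov_xy / (x.std() * y.std())              # ρ(X, Y)

# Agrees with NumPy's correlation coefficient
assert np.isclose(corr_xy, np.corrcoef(x, y)[0, 1])
```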
| Measure | Formula |
|---|---|
| Mean | μ = (1/n)Σxᵢ |
| Sample Variance | s² = [1/(n-1)]Σ(xᵢ - x̄)² |
| Population Variance | σ² = (1/n)Σ(xᵢ - μ)² |
| Standard Error | SE = σ/√n |
Z-score: z = (x - μ)/σ
T-statistic: t = (x̄ - μ)/(s/√n)
Confidence Interval: CI = x̄ ± z(α/2) · SE
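A short sketch (assuming NumPy) computing the standard error and a 95% confidence interval for a sample mean, using z(α/2) ≈ 1.96:

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.normal(loc=10.0, scale=2.0, size=50)   # synthetic data

x_bar = sample.mean()
s = sample.std(ddof=1)           # sample standard deviation
se = s / np.sqrt(len(sample))    # standard error SE = s/√n

z_crit = 1.96                    # z(α/2) for a 95% interval
ci_low, ci_high = x_bar - z_crit * se, x_bar + z_crit * se
print(f"95% CI for the mean: [{ci_low:.2f}, {ci_high:.2f}]")
```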
Key Terms:
- Type I Error (α): Rejecting true null hypothesis
- Type II Error (β): Failing to reject false null hypothesis
- Power: 1 - β
| Component | Formula |
|---|---|
| Simple Model | y = β₀ + β₁x + ε |
| Matrix Form | y = Xβ + ε |
| Normal Equation | β = (XᵀX)⁻¹Xᵀy |
| Predicted Values | ŷ = Xβ |
Mean Squared Error (MSE): MSE = (1/n)Σ(yᵢ - ŷᵢ)²
Root Mean Squared Error: RMSE = √MSE
Mean Absolute Error (MAE): MAE = (1/n)Σ|yᵢ - ŷᵢ|
R-squared: R² = 1 - [Σ(yᵢ - ŷᵢ)²]/[Σ(yᵢ - ȳ)²]
Adjusted R²: R²adj = 1 - [(1 - R²)(n - 1)]/(n - p - 1)
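A compact sketch (assuming NumPy) that fits the normal equation on synthetic data and reports the error metrics above; in practice `np.linalg.lstsq` is preferred over an explicit inverse, but the inverse mirrors the formula:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=100)   # true β₀ = 2, β₁ = 3

X = np.column_stack([np.ones_like(x), x])             # design matrix with intercept
beta = np.linalg.inv(X.T @ X) @ X.T @ y               # normal equation β = (XᵀX)⁻¹Xᵀy
y_hat = X @ beta

mse = np.mean((y - y_hat) ** 2)
rmse = np.sqrt(mse)
mae = np.mean(np.abs(y - y_hat))
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(beta, mse, rmse, mae, r2)
```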
| Component | Formula |
|---|---|
| Sigmoid Function | σ(z) = 1/(1 + e⁻ᶻ) |
| Logit | z = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ |
| Probability | P(y=1|x) = σ(wᵀx + b) |
| Odds | Odds = P(y=1)/P(y=0) = eᶻ |
| Log-Odds | log(Odds) = z |
Log Loss (Binary Cross-Entropy):
L = -(1/n)Σ[yᵢlog(ŷᵢ) + (1-yᵢ)log(1-ŷᵢ)]
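A minimal sketch of the sigmoid and binary cross-entropy; the clipping is a numerical safeguard against log(0), not part of the formula:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y_true, y_prob, eps=1e-15):
    # Clip probabilities to avoid log(0)
    p = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 0, 1, 1])
y_prob = sigmoid(np.array([2.0, -1.0, 0.5, 3.0]))   # illustrative logits
print(log_loss(y_true, y_prob))
```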
| Type | Cost Function |
|---|---|
| Ridge (L2) | J(β) = Σ(yᵢ - ŷᵢ)² + λΣβⱼ² |
| Lasso (L1) | J(β) = Σ(yᵢ - ŷᵢ)² + λΣ|βⱼ| |
| Elastic Net | J(β) = Σ(yᵢ - ŷᵢ)² + λ₁Σ|βⱼ| + λ₂Σβⱼ² |
| Function | Formula |
|---|---|
| Sigmoid | σ(x) = 1/(1 + e⁻ˣ) |
| Tanh | tanh(x) = (eˣ - e⁻ˣ)/(eˣ + e⁻ˣ) |
| ReLU | ReLU(x) = max(0, x) |
| Leaky ReLU | f(x) = {x if x > 0, αx otherwise} |
| Softmax | softmax(xᵢ) = eˣⁱ/Σⱼeˣʲ |
Linear Combination: z⁽ˡ⁾ = W⁽ˡ⁾a⁽ˡ⁻¹⁾ + b⁽ˡ⁾
Activation: a⁽ˡ⁾ = g(z⁽ˡ⁾)
Output Layer Error: δ⁽ᴸ⁾ = (a⁽ᴸ⁾ - y) ⊙ g'(z⁽ᴸ⁾)
Hidden Layer Error: δ⁽ˡ⁾ = [(W⁽ˡ⁺¹⁾)ᵀδ⁽ˡ⁺¹⁾] ⊙ g'(z⁽ˡ⁾)
Weight Gradient: ∂L/∂W⁽ˡ⁾ = δ⁽ˡ⁾(a⁽ˡ⁻¹⁾)ᵀ
Bias Gradient: ∂L/∂b⁽ˡ⁾ = δ⁽ˡ⁾
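To make the layer equations concrete, here is a minimal one-hidden-layer forward and backward pass in NumPy (sigmoid activations and an MSE-style output error; shapes and values are illustrative only):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 1))          # input a⁽⁰⁾
y = np.array([[1.0]])                # target

W1, b1 = rng.normal(size=(4, 3)), np.zeros((4, 1))
W2, b2 = rng.normal(size=(1, 4)), np.zeros((1, 1))

# Forward pass: z⁽ˡ⁾ = W⁽ˡ⁾a⁽ˡ⁻¹⁾ + b⁽ˡ⁾, a⁽ˡ⁾ = g(z⁽ˡ⁾)
z1 = W1 @ x + b1;  a1 = sigmoid(z1)
z2 = W2 @ a1 + b2; a2 = sigmoid(z2)

# Backward pass (sigmoid derivative: a(1 - a))
delta2 = (a2 - y) * a2 * (1 - a2)            # δ⁽ᴸ⁾ = (a⁽ᴸ⁾ - y) ⊙ g'(z⁽ᴸ⁾)
delta1 = (W2.T @ delta2) * a1 * (1 - a1)     # δ⁽ˡ⁾ = (W⁽ˡ⁺¹⁾)ᵀδ⁽ˡ⁺¹⁾ ⊙ g'(z⁽ˡ⁾)

dW2, db2 = delta2 @ a1.T, delta2             # ∂L/∂W⁽ˡ⁾ = δ⁽ˡ⁾(a⁽ˡ⁻¹⁾)ᵀ, ∂L/∂b⁽ˡ⁾ = δ⁽ˡ⁾
dW1, db1 = delta1 @ x.T, delta1
```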
| Variant | Update Rule |
|---|---|
| Batch GD | θ := θ - α∇J(θ) |
| Stochastic GD | θ := θ - α∇J(θ; xⁱ, yⁱ) |
| Mini-batch GD | θ := θ - α∇J(θ; X_batch, y_batch), with the gradient computed on a small batch of examples |
Momentum:
v := βv + (1-β)∇J(θ)
θ := θ - αv
RMSprop:
s := βs + (1-β)(∇J)²
θ := θ - α∇J/√(s + ε)
Adam:
m := β₁m + (1-β₁)∇J
v := β₂v + (1-β₂)(∇J)²
m̂ = m/(1 - β₁ᵗ), v̂ = v/(1 - β₂ᵗ)  (bias correction)
θ := θ - α·m̂/(√v̂ + ε)
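A minimal sketch of the Adam update on a toy quadratic objective (the objective and hyperparameters below are illustrative defaults, not prescriptive):

```python
import numpy as np

def grad(theta):
    # Gradient of the toy objective J(θ) = ½‖θ - 3‖²
    return theta - 3.0

theta = np.zeros(2)
m, v = np.zeros_like(theta), np.zeros_like(theta)
alpha, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8

for t in range(1, 501):
    g = grad(theta)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)          # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)

print(theta)   # should approach [3, 3]
```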
| Metric | Formula |
|---|---|
| Accuracy | (TP + TN)/(TP + TN + FP + FN) |
| Precision | TP/(TP + FP) |
| Recall (Sensitivity) | TP/(TP + FN) |
| Specificity | TN/(TN + FP) |
| F1-Score | 2·(Precision·Recall)/(Precision + Recall) |
| F-beta Score | (1 + β²)·(Precision·Recall)/(β²·Precision + Recall) |
|  | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | TP | FN |
| Actual Negative | FP | TN |
Legend:
- TP = True Positive
- TN = True Negative
- FP = False Positive (Type I Error)
- FN = False Negative (Type II Error)
TPR (True Positive Rate): TP/(TP + FN)
FPR (False Positive Rate): FP/(FP + TN)
AUC (Area Under Curve): ∫₀¹ TPR(FPR⁻¹(x))dx
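A small helper (assuming NumPy arrays of 0/1 labels) that computes the confusion counts and the headline metrics; it skips division-by-zero handling for brevity:

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 1])
print(classification_metrics(y_true, y_pred))
```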
Objective Function: minimize Σᵏⱼ₌₁ Σₓ∈Cⱼ ||x - μⱼ||²
Centroid Update: μⱼ = (1/|Cⱼ|)Σₓ∈Cⱼ x
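A bare-bones k-means loop (assuming NumPy) showing the assignment step with Euclidean distance and the centroid update above; it omits convergence checks and empty-cluster handling:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: μⱼ = mean of the points assigned to cluster j
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return centroids, labels

# Two well-separated synthetic blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
centroids, labels = kmeans(X, k=2)
print(centroids)
```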
| Metric | Formula |
|---|---|
| Euclidean | d(x, y) = √(Σᵢ(xᵢ - yᵢ)²) |
| Manhattan | d(x, y) = Σᵢ|xᵢ - yᵢ| |
| Cosine Similarity | cos(θ) = (x·y)/(‖x‖·‖y‖) |
s(i) = [b(i) - a(i)]/max{a(i), b(i)}
Where:
a(i) = mean distance from point i to the other points in its own cluster
b(i) = mean distance from point i to the points in the nearest other cluster
Normalize: x̂ = (x - μ_B)/√(σ²_B + ε)
Scale and Shift: y = γx̂ + β
Convolution Operation: (f * g)(t) = Σₓ f(x)g(t - x)
Output Size: O = [(W - K + 2P)/S] + 1
Where:
W = input size
K = kernel size
P = padding
S = stride
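A one-line helper for the output-size formula (floor division, which is how most frameworks round it); the example numbers are illustrative:

```python
def conv_output_size(w, k, p=0, s=1):
    """Spatial output size: O = floor((W - K + 2P)/S) + 1."""
    return (w - k + 2 * p) // s + 1

print(conv_output_size(224, 7, p=3, s=2))  # 112, e.g. a ResNet-style first conv layer
```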
Hidden State: hₜ = tanh(Wₓₕxₜ + Wₕₕhₜ₋₁ + bₕ)
Output: yₜ = Wₕᵧhₜ + bᵧ
Forget Gate: fₜ = σ(Wf·[hₜ₋₁, xₜ] + bf)
Input Gate: iₜ = σ(Wᵢ·[hₜ₋₁, xₜ] + bᵢ)
Output Gate: oₜ = σ(Wo·[hₜ₋₁, xₜ] + bo)
Candidate Cell: C̃ₜ = tanh(Wc·[hₜ₋₁, xₜ] + bc)
Cell State: Cₜ = fₜ ⊙ Cₜ₋₁ + iₜ ⊙ C̃ₜ
Hidden State: hₜ = oₜ ⊙ tanh(Cₜ)
Training (inverted dropout): Output = mask ⊙ activation / (1 - p), where p is the drop probability
Testing: Use all neurons (no dropout)
Covariance Matrix: Σ = (1/n)XᵀX (with the columns of X mean-centered)
Principal Components: Eigenvectors of Σ
Explained Variance Ratio: λᵢ/Σⱼλⱼ
Projection: Z = XW (W = eigenvectors)
Decomposition: X = UΣVᵀ
Reduced Form: X ≈ UₖΣₖVₖᵀ
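A short sketch of PCA on centered data (assuming NumPy), showing that the covariance eigen-decomposition and the SVD give the same principal directions and explained variance:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.diag([2.0, 1.0, 0.2])   # synthetic, anisotropic data
Xc = X - X.mean(axis=0)                                    # center the data first

# Route 1: eigen-decomposition of the covariance matrix Σ = (1/n)XᵀX
cov = Xc.T @ Xc / len(Xc)
eigvals, eigvecs = np.linalg.eigh(cov)
order = eigvals.argsort()[::-1]                            # sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained_ratio = eigvals / eigvals.sum()                  # λᵢ / Σⱼλⱼ
Z = Xc @ eigvecs[:, :2]                                    # project onto top-2 components

# Route 2: SVD, X = UΣVᵀ; singular values relate to eigenvalues via λᵢ = σᵢ²/n
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
assert np.allclose(S ** 2 / len(Xc), eigvals)
```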
| Measure | Formula |
|---|---|
| Entropy | H(X) = -Σ P(xᵢ)log₂P(xᵢ) |
| Cross-Entropy | H(p,q) = -Σ p(x)log q(x) |
| KL Divergence | DKL(P‖Q) = Σ P(x)log[P(x)/Q(x)] |
| Mutual Information | I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) |
| Conditional Entropy | H(Y|X) = -ΣₓΣᵧ P(x,y)log P(y|x) |
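These quantities in a few lines of NumPy, using log base 2 throughout (the two distributions are purely illustrative):

```python
import numpy as np

p = np.array([0.5, 0.25, 0.25])
q = np.array([0.4, 0.4, 0.2])

entropy = -np.sum(p * np.log2(p))        # H(P)
cross_entropy = -np.sum(p * np.log2(q))  # H(P, Q)
kl = np.sum(p * np.log2(p / q))          # D_KL(P‖Q)

# Identity: H(P, Q) = H(P) + D_KL(P‖Q)
assert np.isclose(cross_entropy, entropy + kl)
print(entropy, cross_entropy, kl)
```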
Decision Function: f(x) = wᵀx + b
Margin: 2/||w||
Optimization: minimize ½||w||²
subject to yᵢ(wᵀxᵢ + b) ≥ 1
Objective: minimize ½||w||² + CΣξᵢ
Constraint: yᵢ(wᵀxᵢ + b) ≥ 1 - ξᵢ, ξᵢ ≥ 0
| Kernel | Formula |
|---|---|
| Linear | K(x, x') = xᵀx' |
| Polynomial | K(x, x') = (xᵀx' + c)ᵈ |
| RBF (Gaussian) | K(x, x') = exp(-γ‖x - x'‖²) |
| Sigmoid | K(x, x') = tanh(αxᵀx' + c) |
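As one concrete case, the RBF kernel from the table in a few lines (γ is a free hyperparameter here, and the input vectors are arbitrary):

```python
import numpy as np

def rbf_kernel(x, x_prime, gamma=0.5):
    """K(x, x') = exp(-γ‖x - x'‖²)."""
    return np.exp(-gamma * np.sum((x - x_prime) ** 2))

x = np.array([1.0, 2.0])
x_prime = np.array([2.0, 0.0])
print(rbf_kernel(x, x_prime))   # closer points give values nearer 1
```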
| Measure | Formula |
|---|---|
| Gini Impurity | Gini = 1 - Σᵢ pᵢ² |
| Entropy | H = -Σᵢ pᵢlog₂(pᵢ) |
| Classification Error | E = 1 - max(pᵢ) |
Information Gain: IG(D, A) = H(D) - Σᵥ[(|Dᵥ|/|D|)·H(Dᵥ)]
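A minimal sketch computing Gini impurity, entropy, and the information gain of a binary split (the labels and the split are toy, hypothetical values):

```python
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = np.array([1, 1, 1, 0, 0, 0, 1, 0])
left, right = parent[:4], parent[4:]   # a hypothetical split
print(gini(parent), entropy(parent), information_gain(parent, left, right))
```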
Bagging (Random Forest):
Prediction: ŷ = (1/B)Σᵇ fᵇ(x)
AdaBoost:
Sample Weight: wᵢ⁽ᵗ⁺¹⁾ = wᵢ⁽ᵗ⁾·exp[αₜ·I(yᵢ ≠ hₜ(xᵢ))]
Model Weight: αₜ = ½ln[(1 - εₜ)/εₜ]
Gradient Boosting:
Update: Fₘ(x) = Fₘ₋₁(x) + γₘhₘ(x)
Residual: rᵢₘ = -[∂L(yᵢ, F(xᵢ))/∂F(xᵢ)]
Total Error = Bias² + Variance + Irreducible Error
Bias: Bias = E[ŷ] - y
Variance: Variance = E[(ŷ - E[ŷ])²]
Key Insights:
- High Bias → Underfitting (model too simple)
- High Variance → Overfitting (model too complex)
- Goal: Find the optimal balance between bias and variance
| Loss | Formula | Use Case |
|---|---|---|
| MSE | L = (1/n)Σ(yᵢ - ŷᵢ)² | Standard regression |
| MAE | L = (1/n)Σ|yᵢ - ŷᵢ| | Robust to outliers |
| Huber | L = {½(y - ŷ)² if |y - ŷ| ≤ δ, δ|y - ŷ| - ½δ² otherwise} | Combines MSE & MAE |
| Loss | Formula | Use Case |
|---|---|---|
| Binary Cross-Entropy | L = -[ylog(ŷ) + (1-y)log(1-ŷ)] | Binary classification |
| Categorical Cross-Entropy | L = -Σᵢ yᵢlog(ŷᵢ) | Multi-class classification |
| Hinge Loss | L = max(0, 1 - y·ŷ) | SVM classification |
| Method | Formula | Range |
|---|---|---|
| Min-Max Scaling | x' = (x - min)/(max - min) | [0, 1] |
| Z-Score Normalization | x' = (x - μ)/σ | ~ [-3, 3] |
| Max Abs Scaling | x' = x/|max| | [-1, 1] |
Degree 2: [x₁, x₂] → [1, x₁, x₂, x₁², x₁x₂, x₂²]
Degree 3: [x₁, x₂] → [1, x₁, x₂, x₁², x₁x₂, x₂², x₁³, x₁²x₂, x₁x₂², x₂³]
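A tiny sketch of the degree-2 expansion written out by hand (scikit-learn's `PolynomialFeatures` performs the same expansion in general):

```python
import numpy as np

def poly2_features(x1, x2):
    """Degree-2 expansion of [x1, x2]: [1, x1, x2, x1², x1x2, x2²]."""
    return np.array([1.0, x1, x2, x1 ** 2, x1 * x2, x2 ** 2])

print(poly2_features(2.0, 3.0))   # [1, 2, 3, 4, 6, 9]
```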
Contributions are welcome! If you find any errors or want to add more formulas:
- Fork the repository
- Create a new branch (`git checkout -b feature/new-formulas`)
- Make your changes
- Commit your changes (`git commit -am 'Add new formulas'`)
- Push to the branch (`git push origin feature/new-formulas`)
- Create a Pull Request
- Mathematics for Machine Learning Book
- Deep Learning Book
- Pattern Recognition and Machine Learning
- Scikit-learn Documentation
- TensorFlow Documentation
- PyTorch Documentation
- Khan Academy - Linear Algebra - Comprehensive free course on vectors, matrices, and transformations
- Khan Academy - Multivariable Calculus - Free course covering derivatives, integrals in multiple dimensions
- Khan Academy - Statistics & Probability - Foundation in probability, random variables, and statistical inference
- 3Blue1Brown - Essence of Linear Algebra - Visual, intuitive video series on linear algebra concepts
- 3Blue1Brown - Essence of Calculus - Beautiful visual explanations of calculus fundamentals
- MIT OCW - Linear Algebra (18.06) - Prof. Gilbert Strang's legendary MIT course
- MIT OCW - Single Variable Calculus - Complete calculus course with problem sets
- MIT OCW - Multivariable Calculus - Extends calculus to multiple variables
- MIT OCW - Matrix Methods in Data Analysis - Linear algebra for ML applications
- StatQuest with Josh Starmer - Clear explanations of statistics and ML concepts
- Coursera - Mathematics for Machine Learning Specialization - Free to audit, covers linear algebra, calculus, and PCA
- Coursera - Data Science Math Skills - Free to audit, basics for data science
- NPTEL - Essential Mathematics for Machine Learning - Indian university course on ML math
- NPTEL - Probability and Statistics - Comprehensive probability course
- CMU Open Learning - Probability & Statistics - Self-paced interactive course
- Immersive Linear Algebra - Interactive linear algebra textbook with visualizations
- Linear Algebra Done Right - Free online version of the popular textbook
- Introduction to Probability - Harvard's Stat 110 course materials
- Think Stats - Free book on probability and statistics for programmers
- Seeing Theory - Visual introduction to probability and statistics
- Statistics Done Wrong - Learn common statistical mistakes to avoid
- Awesome Data Science - Comprehensive open-source Data Science repository to learn and apply towards solving real-world problems
- Awesome Machine Learning - Curated list of awesome ML frameworks, libraries and software (by language)
- Data Science For Beginners by Microsoft - 10-week, 20-lesson curriculum with quizzes and hands-on projects
- ML For Beginners by Microsoft - 12-week, 26-lesson curriculum for Machine Learning beginners
- Python Data Science Handbook - Full text in Jupyter Notebooks covering NumPy, Pandas, Matplotlib, Scikit-Learn
- Data Science Masters - Open-source curriculum for aspiring data scientists
- OSSU Data Science - Path to free self-taught education in Data Science
- Homemade Machine Learning - Python implementations of ML algorithms with math explanations
- ML YouTube Courses - Curated index of quality ML courses on YouTube from Stanford, MIT, Caltech
- Applied ML - Papers and tech blogs on real-world ML applications in production
- Awesome Production ML - Libraries to deploy, monitor, version, scale and secure ML models
- DS Cheatsheets - Collection of Data Science cheatsheets from Python to Big Data
- Free Data Science Books - List of free books for Data Science and Big Data concepts
- Data Science Learning Resources - Curated collection of data science and ML resources
This project is licensed under the MIT License - see the LICENSE file for details.
- Bookmark this page for quick reference during ML projects
- Print specific sections you use frequently
- Practice implementing these formulas in code
- Understand the intuition behind each formula, not just memorization
- Share with fellow ML practitioners and students
Created with ❤️ for the Machine Learning community. Special thanks to all contributors and the open-source ML community.
If you find this helpful, please consider giving it a star! It helps others discover this resource.
Creator & Maintainer
We welcome contributions! If you have:
- Formula corrections or additions
- Documentation improvements
- Feature suggestions
- Bug reports
Please feel free to:
- Fork the repository
- Make your changes
- Submit a pull request
- 💼 LinkedIn: Connect with me
- 🐦 Twitter: @iNSRawat
- 📧 GitHub: Open an issue
⭐ If this project helped you, please give it a star! ⭐
Made with ❤️ by @iNSRawat
Happy Learning! 🚀📊🤖
Below is a comprehensive collection of mathematical formulas and concepts essential for Data Science and Machine Learning, as curated from Tivadar Danka's Roadmap.
Predictive models are described using linear algebraic concepts. Mastering it is crucial for any ML engineer.
- Vectors and Vector Spaces: Points in space represented by tuples.
- Normed Spaces: Measuring distance (Euclidean, Manhattan, Supremum).
- Matrix Operations: Multiplication as composition of transformations.
- Eigenvalues & SVD: Essential for dimensionality reduction (PCA).
Used to fit models to data through differentiation and integration.
- Derivatives: Tangent slopes for optimization.
- Chain Rule: Core of backpropagation in neural networks.
- Integrals: Area under the curve, used in expected values and entropy.
The foundation of Gradient Descent.
- The Gradient: Vector pointing in the direction of steepest increase.
- Hessian Matrix: Higher-order derivatives for determining local minima/maxima.
The mathematical study of chance and uncertainty.
- Bayes' Theorem: Updating priors with new observations.
- Expected Value: Long-run averages, core to loss functions.
- Information Theory: Entropy and Cross-Entropy loss for classification.
For the full detailed roadmap, check out Maths-Roadmap-for-ML-Resources.md




