forked from microsoft/ML-For-Beginners
Intro
brewblue edited this page Feb 19, 2025 · 5 revisions
- AI: getting machines to perform tasks that normally require human-level intelligence
- ML: a subset of AI that uses specialized algorithms to learn from data and find patterns
- Deep Learning: a subset of ML that relies on neural networks to learn from data
This course covers classical machine learning, including:
- ML core concepts
- statistical techniques including regression, classification, clustering, and more
We do not want to amplify human bias
- Decide whether AI is the right approach for your problem:
- the problem can't be defined by precise rules, and
- you can get extensive data that contains the solution
- From preparation to production
- Data Collection and Preparation
- First, you gather relevant data for your problem
- Clean the data by handling missing values, outliers, and inconsistencies
- Split the data into two or three sets:
- Training set (typically 70-80% of data)
- Validation set (optional, 10-15%)
- Test set (20-30%, or about 10-15% when a validation set is also held out)
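The cleaning and splitting steps above can be sketched in plain Python. This is a minimal illustration, not the course's own code: the function name `split_dataset`, the 70/15/15 fractions, and the toy rows are assumptions made for the example, and cleaning is reduced to dropping missing values.

```python
import random

def split_dataset(rows, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle rows and split them into train, validation, and test lists."""
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave room for a test set")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains is the test set
    return train, val, test

# Hypothetical toy data: 100 numbered rows plus two missing values (None).
raw = list(range(100)) + [None, None]
clean = [r for r in raw if r is not None]  # simplest cleaning: drop missing rows
train, val, test = split_dataset(clean)
print(len(train), len(val), len(test))     # 70 15 15
```

Shuffling before splitting matters when the rows are stored in a meaningful order (for example by date or by class), because otherwise the test set may not resemble the training set.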
- Feature Engineering
- Select relevant features (variables) that will help predict your target
- Transform features through:
- Scaling (normalizing numbers to similar ranges)
- Encoding categorical variables
- Creating new features from existing ones
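As a rough sketch of the three transformations listed above, the following plain-Python example scales a numeric column, one-hot encodes a categorical column, and derives a new feature. The helper names and the housing values are invented for illustration only.

```python
def min_max_scale(values):
    """Rescale numbers linearly so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant column: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot_encode(labels):
    """Turn category labels into 0/1 indicator vectors, one slot per category."""
    categories = sorted(set(labels))
    encoded = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, encoded

# Hypothetical housing rows, invented for this example.
area_m2 = [50.0, 75.0, 100.0]
rooms = [2, 3, 5]
city = ["paris", "lyon", "paris"]

scaled_area = min_max_scale(area_m2)                      # scaling
categories, city_onehot = one_hot_encode(city)            # encoding
area_per_room = [a / r for a, r in zip(area_m2, rooms)]   # new derived feature

print(scaled_area)      # [0.0, 0.5, 1.0]
print(categories)       # ['lyon', 'paris']
print(city_onehot)      # [[0, 1], [1, 0], [0, 1]]
print(area_per_room)    # [25.0, 25.0, 20.0]
```

In a real pipeline the scaling bounds and the category list would be computed on the training set only and then reused on the validation and test sets, so that no information leaks from held-out data.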
- Model Selection
- Choose an algorithm based on your problem type:
- Classification (predicting categories)
- Regression (predicting continuous values)
- Clustering (grouping similar items)
- Training Process
- The model learns patterns from the training data
- It works by:
- Making predictions on training data
- Calculating error using a loss function
- Adjusting its parameters to minimize the error
- Repeating this process (iterations/epochs)
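The four steps above map directly onto a gradient descent loop. The sketch below fits a one-variable linear model with mean squared error as the loss; the choice of model, the learning rate, the epoch count, and the toy data are all assumptions picked to keep the example small.

```python
def train_linear_model(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b (approximately) by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    history = []                                          # loss recorded at every epoch
    for _ in range(epochs):                               # 4. repeat for each epoch
        preds = [w * x + b for x in xs]                   # 1. make predictions
        errors = [p - y for p, y in zip(preds, ys)]
        loss = sum(e * e for e in errors) / n             # 2. compute the loss (MSE)
        history.append(loss)
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w                       # 3. adjust parameters
        b -= learning_rate * grad_b                       #    against the gradient
    return w, b, history

# Toy data generated from y = 2x + 1, so the ideal parameters are known.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b, history = train_linear_model(xs, ys)
print(round(w, 3), round(b, 3))     # 2.0 1.0
print(history[0] > history[-1])     # True: the loss went down during training
```

The learning rate controls the size of each adjustment: too large and the loss can diverge, too small and training needs many more epochs.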
- Validation
- Test the model on validation data
- Tune hyperparameters (model settings)
- Check for overfitting (when the model performs well on training data but poorly on new data)
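One way to picture hyperparameter tuning and the overfitting check is a small k-nearest-neighbours classifier, where `k` is the hyperparameter. This is a sketch under assumptions: the algorithm, the candidate values of `k`, and the 1-D data (including one deliberately mislabeled training point) are all chosen for illustration.

```python
def knn_predict(train, x, k):
    """Predict a 0/1 label for x by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def accuracy(train, data, k):
    """Fraction of (x, label) pairs in data that the k-NN model labels correctly."""
    hits = sum(knn_predict(train, x, k) == label for x, label in data)
    return hits / len(data)

# Hypothetical 1-D data: the true rule is "label 1 when x >= 5";
# the training point at x = 3 carries a deliberately wrong (noisy) label.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6.5, 1), (7, 1), (8, 1), (9, 1)]
val = [(2.6, 0), (3.4, 0), (5.6, 1), (7.4, 1)]

results = {}
for k in (1, 3, 5):                        # candidate hyperparameter values
    results[k] = (accuracy(train, train, k), accuracy(train, val, k))
    print(k, results[k])                   # (training accuracy, validation accuracy)

# Pick the k with the best validation accuracy (the first one wins ties).
best_k = max(results, key=lambda k: results[k][1])
print("best k:", best_k)                   # best k: 3
```

With `k = 1` the model memorizes the training set, noisy point included, so it scores 1.0 on training data but only 0.5 on validation data. That gap is the overfitting signature; `k = 3` gives up a little training accuracy (0.875) and generalizes better (1.0 on validation).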
- Evaluation
- Test final model performance on the test set
- Use appropriate metrics (accuracy, precision, recall for classification; MSE, MAE for regression)
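The metrics named above are short enough to write out by hand, which makes their definitions concrete. The functions below are a plain-Python sketch and the label and prediction lists are made-up test-set values; libraries such as scikit-learn ship their own implementations.

```python
def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred, positive=1):
    """Of the items predicted positive, the share that really are positive."""
    predicted_pos = sum(p == positive for p in y_pred)
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    return true_pos / predicted_pos if predicted_pos else 0.0

def recall(y_true, y_pred, positive=1):
    """Of the items that really are positive, the share the model found."""
    actual_pos = sum(t == positive for t in y_true)
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    return true_pos / actual_pos if actual_pos else 0.0

def mse(y_true, y_pred):
    """Mean squared error: penalizes large errors more heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error: average size of the error, in the target's units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up classification results on a test set of 8 items.
labels      = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 0, 1, 1, 1]
print(accuracy(labels, predictions))    # 0.625
print(precision(labels, predictions))   # 0.6
print(recall(labels, predictions))      # 0.75

# Made-up regression results on a test set of 4 items.
targets   = [3.0, 5.0, 2.0, 7.0]
estimates = [2.5, 5.0, 4.0, 8.0]
print(mse(targets, estimates))          # 1.3125
print(mae(targets, estimates))          # 0.875
```

Accuracy alone can mislead when one class is rare, which is why precision and recall are usually reported alongside it.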