Machine Learning Pipeline

Attrition Rate Prediction

Name: Lau Sook Han Gayle
Date: 31/10/22
Email: laugayle@gmail.com

Data Overview

Independent Features

Attribute Description

Age - Age of the member when signing up as a member
Gender - Gender of the member
Monthly Income - Monthly declared income of the member in SGD
Travel Time - Estimated amount of time needed to travel to the club from home (mins)
Qualification - Education qualification level of the member (1-Diploma, 2-Bachelors, 3-Masters, 4-PhD)
Work Domain - Work domain of the member
Usage Rate - Average number of days per week that the member visits the country club
Branch - Location of the branch that the member visits
Membership - Membership tier (1-Normal, 2-Bronze, 3-Silver, 4-Gold)
Months - Number of months as a member of the country club
Birth Year - Year the member was born
Usage Time - Average number of hours spent in the country club per visit
Usage - Usage Time * Usage Rate (new)
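
The engineered Usage attribute multiplies hours per visit by visit days per week, so it can be read as approximate hours spent at the club per week (assuming one visit per visiting day). A minimal sketch in plain Python follows; the dict-based record layout and the `add_usage` helper are illustrative assumptions and are not taken from run.py.

```python
def add_usage(member):
    """Return a copy of the member record with the derived Usage field added."""
    enriched = dict(member)  # copy so the input record is left untouched
    # Usage = hours per visit * visit days per week
    enriched["Usage"] = member["Usage Time"] * member["Usage Rate"]
    return enriched


# Made-up record, for illustration only
sample = {"Usage Time": 2.5, "Usage Rate": 3}
usage_demo = add_usage(sample)
print(usage_demo["Usage"])
```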

Target Feature

Target Feature: Attrition (If the member left: 0 = No, 1 = Yes)

Unique Identifying Features

Member Unique ID: Unique member ID (removed)

Categorical Features

Categorical features: Qualification, Work Domain, Branch, Membership
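
As a rough illustration of how these features might be encoded: Qualification and Membership have an explicit ordering in the attribute descriptions, so an ordinal code is a natural fit, while Branch and Work Domain have no stated order and could be one-hot encoded. The sketch below uses only plain Python. Which encoding run.py applies to which column is an assumption here, as are the raw label strings, the branch names, and the `one_hot` and `encode` helpers.

```python
# Ordinal codes follow the tiers listed in the attribute descriptions.
# The exact label strings stored in the dataset are assumed.
QUALIFICATION_ORDER = {"Diploma": 1, "Bachelors": 2, "Masters": 3, "PhD": 4}
MEMBERSHIP_ORDER = {"Normal": 1, "Bronze": 2, "Silver": 3, "Gold": 4}


def one_hot(value, categories, prefix):
    """Return a dict with one 0/1 indicator column per known category."""
    return {f"{prefix}_{c}": int(value == c) for c in categories}


def encode(member, branches):
    """Encode one member record (hypothetical helper, not from run.py)."""
    encoded = {
        "Qualification": QUALIFICATION_ORDER[member["Qualification"]],
        "Membership": MEMBERSHIP_ORDER[member["Membership"]],
    }
    encoded.update(one_hot(member["Branch"], branches, "Branch"))
    return encoded


# Made-up branch names and record, for illustration only
branches = ["North", "South"]
encoded_demo = encode(
    {"Qualification": "Masters", "Membership": "Gold", "Branch": "South"},
    branches,
)
print(encoded_demo)
```

Work Domain could be passed through `one_hot` in the same way as Branch.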

Binary Features

Binary features: Gender, Attrition

Numerical Features

Numerical Features: Age, Travel Time, Monthly Income, Usage Rate, Months, Birth Year, Usage Time

Synopsis of the Problem

Classification: predict member attrition from the provided dataset so that a country club can formulate policies to reduce attrition. The brief requires evaluating at least 3 suitable models for predicting member attrition.

Overview of Submitted folder

.
├── eda.ipynb
├── data
│   └── score.db    # removed
├── results
│   └── testing accuracy.csv
├── requirements.txt
├── run.sh
└── src
    └── run.py

Executing the pipeline

run.py

Steps:

1. Imports the data from .db file

2. Data Cleaning (the 'Age', 'Monthly Income', 'Birth Year', 'Qualification' and 'Travel Time' columns are cleaned)

3. Feature Engineering (one-hot encoding/ordinal encoding for categorical data)

4. Data split into training and testing data (80/20)

5. Building of the Models

Logistic Regression
Decision Tree Model
Boosting Decision Tree
Bagging Decision Tree
Random Forest Model
Support Vector Machine
K-Nearest Neighbours (KNN)
Naive Bayes

6. Results

Accuracy and Recall
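
Steps 1, 4 and 6 above can be sketched with the Python standard library alone. This is a simplified illustration under stated assumptions, not the code in run.py: the table name `attrition`, the helper names, and the demo rows are all hypothetical, and the real pipeline presumably relies on the packages listed in requirements.txt.

```python
import random
import sqlite3


def load_rows(conn, table):
    """Read every row of `table` into a list of dicts."""
    conn.row_factory = sqlite3.Row
    # Table names cannot be bound as SQL parameters, so `table` must be trusted.
    return [dict(r) for r in conn.execute(f"SELECT * FROM {table}")]


def train_test_split_80_20(rows, seed=0):
    """Shuffle a copy of the rows and split them 80/20."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual leavers (label 1) that were predicted as leavers."""
    actual_pos = sum(t == 1 for t in y_true)
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    return true_pos / actual_pos if actual_pos else 0.0


# Demo on an in-memory database with made-up rows (hypothetical table name)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attrition (Months INTEGER, Attrition INTEGER)")
conn.executemany(
    "INSERT INTO attrition VALUES (?, ?)", [(m, m % 2) for m in range(10)]
)
rows = load_rows(conn, "attrition")
train, test = train_test_split_80_20(rows)
print(len(train), len(test))
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]), recall([1, 0, 1, 1], [1, 0, 0, 1]))
```

Recall is worth reporting next to accuracy because it measures the share of members who actually left that a model manages to flag, which accuracy alone can hide when leavers are a minority.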

Running of machine learning pipeline.

The machine learning pipeline is written in Python 3 and launched through a bash script.

Installing dependencies

Paste the following command into your bash terminal to install the dependencies:

pip install -r requirements.txt

Running the Machine Learning Pipeline

Paste the following command into your bash terminal to grant permission to execute the 'run.sh' file:

chmod +x run.sh

Paste the following command into the bash terminal to run the machine learning programme:

./run.sh
