
nithin-seenivasan/BigDataAnalytics


Big Data Analytics

MGT819 Elective at Yale

Course Description

Cheap storage and computing power have enabled the gathering and analysis of an unprecedented amount of data on everything from genetic health risk profiles to real-time Wall Street diaper consumption. To take advantage of these massive datasets, new statistical tools and ideas have been developed, and this body of knowledge is sometimes referred to as data science. The aim of this course is to provide a gentle tour of the business and industry applications of data science. Course concepts will be illustrated using the free, open-source statistical language R. R and the general-purpose programming language Python are the industry standards for data science.

Final Project

Dataset source

Employee Attrition

Description

Classification trees (decision trees) and linear regression were used to analyze the dataset. The dataset had 35 numerical and categorical features, so significant pre-processing was needed before it could be used. On this dataset, the classification tree model achieved much better accuracy (around 84%) than the linear regression model. Although the linear regression model could not be used for prediction here, an exhaustive search for the best 12-variable subset using the leaps package led to interesting insights into which factors contribute the most to employee attrition.
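The idea behind the exhaustive variable search can be sketched in a few lines. The example below is not the project's R code and does not use the leaps package; it is a minimal, standard-library Python illustration that fits an ordinary least squares model on every subset of a given size and keeps the subset with the lowest residual sum of squares. The data are synthetic, and the names `solve`, `rss`, and `best_subset` are hypothetical helpers introduced here for illustration.

```python
import random
from itertools import combinations


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(rows, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on the chosen columns."""
    X = [[1.0] + [row[c] for c in cols] for row in rows]
    p = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(X, y))


def best_subset(rows, y, size):
    """Fit every subset of `size` columns and return the one with the lowest RSS."""
    n_cols = len(rows[0])
    return min(combinations(range(n_cols), size),
               key=lambda cols: rss(rows, y, cols))


# Synthetic data (an assumption for illustration): 4 features, of which
# only columns 0 and 2 actually drive the response.
rng = random.Random(0)
rows = [[rng.random() for _ in range(4)] for _ in range(30)]
y = [3.0 * r[0] - 2.0 * r[2] + rng.gauss(0.0, 0.1) for r in rows]

best = best_subset(rows, y, 2)
print(best)
```

Fitting every subset like this is only practical for a handful of columns, because the number of subsets grows combinatorially with the number of features.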
