Instructor: Prof. Brian Wright - https://datasci.columbian.gwu.edu/program-adminstration
This class covers the basic ideas and techniques of data science, including its definition and its context within data-driven computation and practical applications.
- Explain the major trends in the fields of data science and big data.
- Explain the relative strengths and weaknesses of data science analysis techniques and systems.
- Describe a selection of analysis techniques and when they should be used.
- Gain experience with the use of Python and R to do analyses of simple data sets.
- Midterm Project - Tipping Data
Abstract: This study explores the variables that affect tipping behavior by applying the steps of a data science project. The scope of the work covers only the first two stages: stating the question and exploratory analysis. For this work we found a data set created by a food server in 1990, in which he recorded tips and several features of the business. Using Python, we analyzed the data and observed interesting operational patterns that influence tipping. At this stage of the project, "party size" and "gender" are among the most influential factors in tipping. As a next step, we recommend fitting a linear regression model to quantify these effects and obtain a predictive model.
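The linear regression recommended in the abstract above could start from something like the following minimal sketch. It is not the project's code: the function name `fit_simple_ols` is hypothetical, the numbers are made-up illustrative values rather than rows from the tipping data set, and only a single predictor (party size) is used to keep the example short.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x; zero means all x values are identical.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    # Sum of cross-deviations of x and y.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up illustrative values, NOT taken from the tipping data set.
party_size = [1, 2, 3, 4]
tip = [1.5, 2.5, 3.5, 4.5]

intercept, slope = fit_simple_ols(party_size, tip)
print(f"tip = {intercept:.2f} + {slope:.2f} * party_size")
```

The slope estimates how much the tip changes, on average, for each additional person in the party. A fuller analysis would likely add further predictors such as gender and check the model's residuals.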
- Final Project - Philadelphia Crime Data
Abstract: Our project explores a crime data set to identify the factor that has the most impact on predicting one specific crime: theft. It analyzes a real-world crime data set for Philadelphia, PA and provides an overall description of Philadelphia's crime situation through a statistical analysis supported by several graphs. It then describes how we constructed a logistic regression classification model for crime prediction. The proposed model is intended to assist law enforcement agencies in discovering crime patterns and predicting future trends, in order to better secure the city.
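To illustrate the kind of model the abstract above describes, here is a minimal sketch of logistic regression fitted by batch gradient descent in plain Python. It is not the project's implementation: the function names, the learning rate, the number of epochs, and the single numeric feature are all assumptions, and the toy data is made up rather than drawn from the Philadelphia data set.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """Probability of the positive class; w[0] is the bias term."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the mean log-loss."""
    n = len(X)
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # gradient of log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Made-up toy data, NOT from the Philadelphia data set: one hypothetical
# standardized feature per incident; label 1 = theft, 0 = any other crime.
X = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 1, 1]

w = fit_logistic(X, y)
print("P(theft | x = 2.0) =", round(predict_proba(w, [2.0]), 3))
```

When the features are standardized to a common scale, the magnitudes of the fitted coefficients give a rough indication of which factor has the most influence on the prediction, which is the question the project sets out to answer.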