mvp291/dsga1004

MS in Data Science – New York University

DSGA1004: Big Data

Instructors: Professor Juliana Freire, Dr. Erin Carson, Dr. Nick Knight

Text: Mining of Massive Datasets by Anand Rajaraman, Jure Leskovec and Jeff Ullman.

The objective of this course is to study the foundations of data storage and processing at scale.

Concepts and tools used in the course include:

  • Relational algebra
  • SQL
  • Distributed File Systems and MapReduce
  • Apache Hadoop and Apache Spark
  • Amazon Web Services
  • Algorithms for finding similar items and frequent itemsets

You can find an overview and details on the course website: https://vgc.poly.edu/~juliana/courses/BigData2016/

Coursework:

Programming Assignments (35% - Individual)

  • Assignment 1: Querying NYC Taxi data using SQL
  • Assignment 2: NYC Taxi data processing using Map/Reduce (Hadoop)
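As a rough illustration of the map/reduce pattern behind Assignment 2, the sketch below simulates the map, shuffle, and reduce phases in plain Python to count trips per pickup hour. It is not the assignment's code: the CSV schema (`pickup_datetime,passenger_count,trip_distance`), the function names, and the sample records are hypothetical. On Hadoop the mapper and reducer would typically run as separate scripts (for example via Hadoop Streaming), and the framework itself would perform the shuffle.

```python
from collections import defaultdict
from itertools import chain


def mapper(line):
    # Emit (pickup_hour, 1) for one CSV record.
    # Hypothetical schema: pickup_datetime,passenger_count,trip_distance
    pickup_datetime, _passengers, _distance = line.split(",")
    hour = pickup_datetime[11:13]  # "YYYY-MM-DD HH:MM:SS" -> "HH"
    yield hour, 1


def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    # Sum all counts emitted for one key.
    return key, sum(values)


def run_job(lines):
    mapped = chain.from_iterable(mapper(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    # Made-up records, for illustration only.
    sample = [
        "2013-01-01 08:15:00,1,0.6",
        "2013-01-01 08:40:00,2,3.1",
        "2013-01-01 17:05:00,1,1.2",
    ]
    print(run_job(sample))  # {'08': 2, '17': 1}
```

Keeping the mapper and reducer as pure functions of their inputs is what lets a framework run many copies of each in parallel across a cluster.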

Project (25% - Group)

Group: Maria Leonor Zamora Maass mzm239@nyu.edu, Luisa Eugenia Quispe Ortiz lqo202@nyu.edu

The objective of the term project was to analyze a massive dataset using the concepts learned in the course. We chose to analyze taxi data, focusing in particular on short trips (those that could have been made on foot or by bike).
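One possible way to turn "could have been made on foot or by bike" into a filter is a straight-line distance cutoff between pickup and dropoff. The sketch below assumes each trip carries pickup and dropoff coordinates and uses the haversine great-circle formula; the 1-mile threshold and the function names are hypothetical choices for illustration, not the criterion used in the report.

```python
import math

EARTH_RADIUS_MILES = 3958.8
# Hypothetical cutoff for a trip that could plausibly be walked or biked.
MAX_SHORT_TRIP_MILES = 1.0


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def is_short_trip(pickup, dropoff, max_miles=MAX_SHORT_TRIP_MILES):
    """Return True if the straight-line pickup-to-dropoff distance is within max_miles."""
    return haversine_miles(*pickup, *dropoff) <= max_miles


if __name__ == "__main__":
    # Two nearby points in Midtown Manhattan, roughly 0.57 miles apart.
    print(is_short_trip((40.7580, -73.9855), (40.7527, -73.9772)))  # True
```

Straight-line distance understates the actual street distance, so a real analysis might instead use the metered trip distance or apply a detour factor.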

The final report for this project can be found here.

About

Big Data Course – Coursework (New York University, Professor: Juliana Freire)
