bigdata-final-project

NYU CS-GY 6513 Big Data Group 20

Group members: Hao Dong hd2327 / Siwei Wang sw5050 / Yinchen Shi ys4653

The GitHub repository for this project is https://github.com/hd2327/bigdata-final-project. The files in it and a brief description of each are as follows:

data_reference : the reference data we committed to https://github.com/VIDA-NYU/reference-data-repository

reference_data_NYC zipcode.csv : NYC zipcodes
reference_data_collision factor.csv : factors of collisions
reference_data_vehicle.csv : types of vehicles

data_analysis.zpln : the data analysis notebook using Spark; open it with Zeppelin
data_cleaning_improve.ipynb : the improved data cleaning notebook; open it with Jupyter Notebook
data_visualization.ipynb : the data visualization notebook using Spark and Matplotlib; open it with Jupyter Notebook
report.pdf : the report

how to open

First, run data_cleaning_improve.ipynb in Jupyter Notebook to produce the cleaned data. The output is out.zip, which contains a file named out.csv.
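
If you prefer to unpack the archive from Python rather than by hand, the following is a minimal sketch. The helper name extract_member is hypothetical and not part of the project; it only assumes that out.zip contains a file whose base name is out.csv, as described above.

```python
import zipfile
from pathlib import Path


def extract_member(zip_path, member="out.csv", dest_dir="."):
    """Extract one file from a zip archive and return the path it was written to.

    Hypothetical helper for illustration; the notebooks do not define it.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Match on the base name so the file is found even inside a subfolder.
        matches = [n for n in archive.namelist() if Path(n).name == member]
        if not matches:
            raise FileNotFoundError(f"{member} not found in {zip_path}")
        # ZipFile.extract returns the path of the file it created.
        return Path(archive.extract(matches[0], path=dest_dir))
```

Calling extract_member("out.zip") in the folder that holds out.zip should leave out.csv next to it, ready to be uploaded.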

Next, put out.csv into HDFS on the Peel cluster with the command hfs -put out.csv. Then update the code in data_analysis.zpln and run it in the school's Zeppelin environment.

The line to modify is parking_v = sc.textFile("/user/hd2327/data_cleaning_output.csv"). Here hd2327 is my NetID, so change the path to point to the file you uploaded under your own HDFS directory. After that, data_analysis.zpln can run.
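
As a rough sketch of that change, the path can be built from your NetID instead of being hard-coded. The helper hdfs_input_path is hypothetical, and it assumes the file sits in your HDFS home directory /user/<netid>/, which is where a put without an explicit destination normally places it. The sc.textFile call is left as a comment because sc is the SparkContext that Zeppelin supplies and does not exist outside the notebook.

```python
def hdfs_input_path(netid, filename="out.csv"):
    """Return the HDFS path of a file in the given user's home directory.

    Hypothetical helper; assumes the HDFS home directory is /user/<netid>.
    """
    return f"/user/{netid}/{filename}"


# Inside data_analysis.zpln the modified line would then look like this
# ("your_netid" is a placeholder, not a real account):
# parking_v = sc.textFile(hdfs_input_path("your_netid"))
```

Pass the file name explicitly if you uploaded the cleaned data under a different name, for example hdfs_input_path("your_netid", "data_cleaning_output.csv").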

Finally, open data_visualization.ipynb and run it in Jupyter Notebook to view the results.
