Principles of Bigdata Management, Spring 2016 Project
Environment: PySpark (Apache Spark)
Visualization: D3.js
Language: Python
Datasets:
Dataset I: Tweets collected over a period of one week about different Payment Technologies such as Apple Pay, Samsung Pay, Android Pay, and PayPal.
Dataset II: Tweets collected about the movie “Batman v Superman: Dawn of Justice”
Dataset III: Public Dataset from the U.S. Department of Education on all Accredited Universities in the U.S.A.
Queries:
Query I: Number of Tweets for each Payment Technology over a week in Dataset I.
Query II: Tweet count on each day over a period of one week for all Payment Technologies in Dataset I.
Query III: Number of Twitter accounts created, grouped by month and year, in Dataset I.
Query IV: Top 10 Verified Accounts with Highest Follower Count in Dataset I.
Query V: List of all the Languages used in tweets, with the tweet count for each, in Dataset II.
Query VI: Percentage of Tweets with External Links in Tweet Status for Dataset II.
Query VII: Top 10 most-liked tweets, with like counts, for Dataset II.
Query VIII: Number of Colleges in each state for Dataset III.
Query IX: Ratio of Accreditation Types of all Colleges in Dataset III.
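Example: the logic behind two of the queries above (Query I and Query VI) can be sketched in plain Python. This is only an illustrative stand-in and not the project's PySpark code. The sample records, the `TECHNOLOGIES` list, and the helper names `count_by_technology` and `percent_with_links` are all hypothetical; the field names `text` and `entities.urls` are assumed to follow the Twitter API tweet JSON, and one toy list is reused for both queries even though the project runs them on different datasets.

```python
from collections import Counter

# Hypothetical sample records (made-up values). The "text" and
# "entities" -> "urls" fields are assumed to match Twitter API tweet JSON.
tweets = [
    {"text": "Trying Apple Pay at the store", "entities": {"urls": []}},
    {"text": "PayPal or Android Pay?",
     "entities": {"urls": [{"expanded_url": "http://example.com"}]}},
    {"text": "apple pay works great", "entities": {"urls": []}},
    {"text": "Nothing about payments here", "entities": {"urls": []}},
]

# Assumed keyword list, taken from the technologies named for Dataset I.
TECHNOLOGIES = ["Apple Pay", "Samsung Pay", "Android Pay", "PayPal"]


def count_by_technology(records, technologies=TECHNOLOGIES):
    """Query I sketch: count tweets whose text mentions each technology."""
    counts = Counter()
    for record in records:
        text = record.get("text", "").lower()
        for tech in technologies:
            # Case-insensitive substring match; a tweet that names two
            # technologies is counted once for each of them.
            if tech.lower() in text:
                counts[tech] += 1
    return counts


def percent_with_links(records):
    """Query VI sketch: percentage of tweets with at least one URL entity."""
    if not records:
        return 0.0
    with_links = sum(1 for r in records if r.get("entities", {}).get("urls"))
    return 100.0 * with_links / len(records)


tech_counts = count_by_technology(tweets)
link_pct = percent_with_links(tweets)
print(dict(tech_counts))
print(link_pct)
```

In PySpark the same counting would typically be expressed as a `flatMap` over the tweets followed by `reduceByKey`, but the exact pipeline used in this project is not shown here.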
Team Members:
Sri Chaitanya Patluri
Sai Venkatesh Gatiganti
Meghasai Reddy Bodimani