Li's thesis project "Classification of Clinical Tweets Using Apache Mahout"
Proposes a new tool, the Clinical Tweets Classifier (CTC), for scalable classification of clinical content on Twitter using Apache Mahout and Hadoop
Download sample tweets using the Twitter Streaming API
Parse and retrieve metadata from raw JSON tweets
Implement an original algorithm that scores Twitter User Influence on a scale of 0 to 100
Provide a web UI to compare and display Twitter data using tables, pie charts, and bar charts
Train and test models with Mahout
Classify clinical tweets using Naive Bayes on a Hadoop cluster
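
As an illustration of the metadata-parsing step listed above, the following is a minimal sketch in Python, not the CTC implementation. It assumes tweets arrive as newline-delimited JSON in the classic Twitter v1.1 tweet format; the function name `parse_tweet` and the choice of extracted fields are hypothetical.

```python
import json


def parse_tweet(line):
    """Return a flat metadata dict for one raw JSON tweet.

    Returns None for anything that is not a tweet: blank keep-alive
    lines, malformed JSON, or non-tweet messages such as delete notices.
    (Hypothetical helper; the field selection is an assumption.)
    """
    line = line.strip()
    if not line:
        return None
    try:
        raw = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(raw, dict) or "text" not in raw or "user" not in raw:
        return None

    user = raw["user"]
    entities = raw.get("entities", {})
    return {
        "id": raw.get("id_str"),
        "created_at": raw.get("created_at"),
        "text": raw["text"],
        "screen_name": user.get("screen_name"),
        "followers": user.get("followers_count", 0),
        "hashtags": [tag["text"] for tag in entities.get("hashtags", [])],
    }


# A made-up tweet used only to exercise the parser.
sample_line = json.dumps({
    "id_str": "1",
    "created_at": "Mon Jan 01 00:00:00 +0000 2013",
    "text": "New guidance on flu vaccination #health",
    "user": {"screen_name": "example_user", "followers_count": 250},
    "entities": {"hashtags": [{"text": "health"}]},
})

if __name__ == "__main__":
    print(parse_tweet(sample_line))
```

Flattening each tweet into a small dictionary keeps later steps, such as influence scoring or feature extraction for Mahout, independent of the raw JSON layout.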
This work was in part supported by the National Science Foundation under Grant No. 1115871.