Project Checkpoints
We are interested in analysing the text of Stack Exchange posts in relation to the "score" each post receives.
This project will use data, both textual and numerical, from questions and answers posted on Stack Exchange. The data can be obtained from freely available dumps hosted by sources such as Archive.org, as well as through Stack Exchange's own API and Data Explorer.
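As a rough illustration of working with a dump, the sketch below streams answer rows out of a posts file. It assumes the dump stores one `<row>` element per post with `Id`, `PostTypeId`, `Score`, and `Body` attributes, and that `PostTypeId="2"` marks an answer; the inline sample data and the `iter_answers` helper are hypothetical and only stand in for a real file.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample imitating the assumed row-per-post layout of a posts dump.
SAMPLE = b"""<posts>
  <row Id="1" PostTypeId="1" Score="5" Body="&lt;p&gt;How do I scale features?&lt;/p&gt;" />
  <row Id="2" PostTypeId="2" ParentId="1" Score="12" Body="&lt;p&gt;Use standardization.&lt;/p&gt;" />
  <row Id="3" PostTypeId="2" ParentId="1" Score="-1" Body="&lt;p&gt;Do not bother.&lt;/p&gt;" />
</posts>
"""


def iter_answers(source):
    """Yield (post_id, score, body) for every answer row in an XML stream.

    Assumption: answers are the rows whose PostTypeId attribute equals "2".
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row" and elem.get("PostTypeId") == "2":
            yield int(elem.get("Id")), int(elem.get("Score")), elem.get("Body")
        # Drop the element's contents once it has been handled, so that
        # large dumps do not accumulate attribute data in memory.
        elem.clear()


answers = list(iter_answers(io.BytesIO(SAMPLE)))
for post_id, score, body in answers:
    print(post_id, score, body)
```

For a real dump, the `io.BytesIO(SAMPLE)` argument would be replaced by an open file handle or a path to the posts file; streaming with `iterparse` avoids loading the whole file at once.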
With this data, we will use supervised learning techniques to examine the relationship between the text of a response and the score it receives. Because we have access to large amounts of raw data, we do not expect the size of the training or test sets to be a limiting factor. We anticipate that different Stack Exchange communities may have very different criteria for scoring responses, so the initial analysis will most likely be limited to a single community of interest (such as datascience.stackexchange.com). Ultimately, the goal is an algorithm that can analyse the text of an answer and predict the range into which its score will fall.
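Predicting a score range rather than an exact score turns the task into classification, which means each raw score first has to be mapped to a range label. A minimal sketch of that labelling step follows; the bin edges and label names are purely illustrative assumptions, and real ones would be chosen after inspecting the score distribution of the chosen community.

```python
from bisect import bisect_right

# Hypothetical bin edges and labels, not values taken from the project.
# A score s gets the label at index bisect_right(BIN_EDGES, s), so the
# ranges are: s < 0, s == 0, 1 <= s <= 4, 5 <= s <= 19, s >= 20.
BIN_EDGES = [0, 1, 5, 20]
BIN_LABELS = ["negative", "zero", "low", "medium", "high"]


def score_to_range(score):
    """Map an integer post score to its (assumed) range label."""
    return BIN_LABELS[bisect_right(BIN_EDGES, score)]


# Toy (text, score) pairs converted into (text, label) training examples.
raw_examples = [("Use standardization.", 12), ("Do not bother.", -1)]
labelled = [(text, score_to_range(score)) for text, score in raw_examples]
print(labelled)
```

The resulting `(text, label)` pairs are the form a supervised text classifier would be trained on.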