The LIAR dataset, used for fake news detection, contains 12,800 human-labelled short statements from a variety of political contexts. It was built for fact-checking research: each statement was evaluated by a PolitiFact.com editor for its truthfulness. Automatic fake news detection is a challenging problem in deception detection, with tremendous potential for real-world political and social impact.
The three main questions we are interested in pursuing with this dataset are:

- Do high positions of authority lead to misleading information?
- Which methods of communication or social media platforms are most prone to misinformation?
- Which states have the highest proportions of fake news, and on what subjects? Which subject has the most concentrated fake news?
Overall, through visualization, we want to identify the most reliable channels for receiving correct information, the platforms worth watching for controversial topics, and the suspicious figures to avoid when seeking accurate information.
Roles/Responsibilities
- Initial data cleaning and exploration through scatter plots, histograms, faceted box plots, bar plots, etc., mainly to get an idea of the distribution of each feature
- Exploring dependency relationships between the label (Y variable) and other interesting features
- Do high positions of authority lead to misleading information? - Ayush
- Which methods of communication or social media platforms are most prone to misinformation? - Angad
- Which states have the highest proportions of fake news, and on what subjects? Which subject has the most concentrated fake news? - Andrew
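As a minimal sketch of the initial exploration step, the snippet below computes the distribution of truthfulness labels and the per-state proportion of clearly fake statements. The column names follow the LIAR dataset's documented TSV layout, but the sample rows here are invented for illustration; in practice you would read the real `train.tsv` instead.

```python
import pandas as pd

# Toy rows mimicking a subset of the LIAR columns (invented for illustration).
rows = [
    ("half-true",  "Statement A", "economy", "speaker-1", "Governor", "texas"),
    ("false",      "Statement B", "health",  "speaker-2", "Senator",  "ohio"),
    ("true",       "Statement C", "economy", "speaker-1", "Governor", "texas"),
    ("pants-fire", "Statement D", "taxes",   "speaker-2", "Senator",  "ohio"),
]
df = pd.DataFrame(
    rows, columns=["label", "statement", "subject", "speaker", "job", "state"]
)

# Distribution of truthfulness labels across all statements.
label_counts = df["label"].value_counts()

# Proportion of clearly fake labels ("false", "pants-fire") per state.
fake = df["label"].isin(["false", "pants-fire"])
fake_by_state = fake.groupby(df["state"]).mean()

print(label_counts.to_dict())
print(fake_by_state.to_dict())
```

The same grouped-proportion pattern extends directly to speakers, jobs, or contexts, and the resulting tables feed naturally into the bar plots and faceted plots listed above.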
This repo was initially generated from a bookdown template available here: https://github.com/jtr13/EDAVtemplate