This project focuses on detecting and classifying cyberbullying in text, particularly against the LGBTQ community.
datasetOnePreprocessing.py - Preprocesses the Cyberbullying Data for Multi-Label Classification dataset.
datasetTwoPreprocessing.py - Preprocesses the Anti LGBTQ Cyberbullying Texts dataset.
jointDatasetPreprocessing.py - Concatenates the two preprocessed datasets.
visualise.py - Creates the visualisation
Run datasetOnePreprocessing.py and datasetTwoPreprocessing.py before running jointDatasetPreprocessing.py.
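The run order above can be sketched as a small helper script. This is only an illustration and not part of the repository: the names `PIPELINE`, `build_commands`, and `run_pipeline` are hypothetical, and it assumes the three scripts sit in the current working directory and take no command-line arguments.

```python
import subprocess
import sys

# Order taken from the note above: both per-dataset scripts run first,
# then the script that joins their outputs.
PIPELINE = [
    "datasetOnePreprocessing.py",
    "datasetTwoPreprocessing.py",
    "jointDatasetPreprocessing.py",
]


def build_commands(python=sys.executable):
    """Return one command per script, in the order they should run."""
    return [[python, script] for script in PIPELINE]


def run_pipeline():
    """Run each script in turn and stop at the first one that fails."""
    for command in build_commands():
        # check=True raises CalledProcessError on a non-zero exit code,
        # so the joint step never runs on missing or partial inputs.
        subprocess.run(command, check=True)
```

Calling `run_pipeline()` from the repository root would run the three steps in sequence.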
logisticRegression.py - The logistic regression model
knn.py - The k-nearest neighbours model
- Function for tokenising: Used in files such as datasetOnePreprocessing.py, datasetTwoPreprocessing.py, and logisticRegression.py.
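
For readers unfamiliar with tokenising, here is a minimal sketch of what such a function might look like. It is not the function used in this repository: the name `tokenise`, the lowercasing step, and the regular expression are all assumptions chosen for illustration.

```python
import re

# Assumed token definition: runs of lowercase letters, digits, and apostrophes.
TOKEN_PATTERN = re.compile(r"[a-z0-9']+")


def tokenise(text):
    """Lowercase the text and split it into word-like tokens."""
    return TOKEN_PATTERN.findall(text.lower())
```

For example, `tokenise("You're NOT welcome here!")` returns `["you're", "not", "welcome", "here"]`. Keeping the function in one place and importing it into each script would help the preprocessing and the model see text split in the same way.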