tedydevmac/S3T3CPPT

Cyberbullying Detection Project

This project focuses on detecting and classifying cyberbullying in text, particularly against the LGBTQ community.

Stage 1 - Gather Data

No code

Stage 2 - Data Pre-Processing

Dataset 1

datasetOnePreprocessing.py - Preprocesses the Cyberbullying Data for Multi-Label Classification dataset.

Dataset 2

datasetTwoPreprocessing.py - Preprocesses the Anti LGBTQ Cyberbullying Texts dataset.

After Concatenating Datasets

jointDatasetPreprocessing.py - Concatenates the two preprocessed datasets.

Data Visualisation

visualise.py - Creates the visualisation

Run datasetOnePreprocessing.py and datasetTwoPreprocessing.py before running jointDatasetPreprocessing.py.
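
The run order above can be sketched as a small driver script. This is only an illustration: `run_pipeline` and `PIPELINE` are hypothetical names, not files in this repository, and the sketch assumes each script takes no command-line arguments and is run from the repository root.

```python
import subprocess
import sys

# Order taken from the note above: both per-dataset scripts first,
# then the script that joins their outputs.
PIPELINE = [
    "datasetOnePreprocessing.py",
    "datasetTwoPreprocessing.py",
    "jointDatasetPreprocessing.py",
]


def run_pipeline(scripts=PIPELINE, run=subprocess.run):
    """Run each script in order, stopping at the first failure."""
    for script in scripts:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so the joint step never runs on incomplete inputs.
        run([sys.executable, script], check=True)
```

Calling `run_pipeline()` from the repository root would run the three scripts one after another with the current Python interpreter.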

Stages 3, 4 & 5 - Model Selection, Training, Evaluation, and Prediction

Logistic Regression

logisticRegression.py - The logistic regression model
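
For readers unfamiliar with the technique, here is a minimal sketch of logistic regression on text using only the standard library. It is not the code in logisticRegression.py: the whitespace tokenisation, the bag-of-words features, the function names, and the toy examples are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(examples, epochs=200, learning_rate=0.5):
    """Fit one weight per token with stochastic gradient descent.

    examples: list of (text, label) pairs, where label is 1 or 0.
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            tokens = text.lower().split()
            z = bias + sum(weights[t] for t in tokens)
            # Gradient of the log loss with respect to z.
            error = sigmoid(z) - label
            for t in tokens:
                weights[t] -= learning_rate * error
            bias -= learning_rate * error
    return weights, bias


def predict_probability(model, text):
    """Return the estimated probability that text belongs to class 1."""
    weights, bias = model
    z = bias + sum(weights.get(t, 0.0) for t in text.lower().split())
    return sigmoid(z)


# Hypothetical toy data: 1 = cyberbullying, 0 = not cyberbullying.
TOY_EXAMPLES = [
    ("you are stupid", 1),
    ("nobody likes you", 1),
    ("have a great day", 0),
    ("thanks for the help", 0),
]
```

A model trained with `train_logistic_regression(TOY_EXAMPLES)` can then score new text with `predict_probability(model, text)`, treating values above 0.5 as the positive class.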

K-Nearest Neighbours

knn.py - The k-nearest neighbours model
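
As a rough illustration of the idea behind k-nearest neighbours for text, the sketch below labels a message by majority vote among its most similar training messages. It is not the code in knn.py: the bag-of-words vectors, the cosine similarity measure, the function names, and the toy data are assumptions.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lower-cased, whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, text, k=3):
    """Return the majority label among the k most similar training texts."""
    query = bag_of_words(text)
    ranked = sorted(
        train,
        key=lambda pair: cosine_similarity(bag_of_words(pair[0]), query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data, not taken from either dataset.
TOY_TRAIN = [
    ("you are stupid and ugly", "bullying"),
    ("nobody likes you loser", "bullying"),
    ("you are so stupid", "bullying"),
    ("have a great day", "not_bullying"),
    ("thanks for the help", "not_bullying"),
    ("great game today", "not_bullying"),
]
```

Unlike logistic regression, this approach has no training step: every prediction compares the query against all stored examples, so its cost grows with the size of the dataset.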


Additional Files

  • Function for tokenising: Used in datasetOnePreprocessing.py, datasetTwoPreprocessing.py, and logisticRegression.py.
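
The repository's tokenising function is not reproduced here. As a minimal sketch of what such a shared helper might look like, the hypothetical `tokenise` below lower-cases the text and keeps runs of letters, digits, and apostrophes; the actual function may use different rules.

```python
import re

# Assumed token rule: runs of lowercase letters, digits, and apostrophes.
TOKEN_PATTERN = re.compile(r"[a-z0-9']+")


def tokenise(text):
    """Split text into lower-cased tokens, dropping punctuation and spaces."""
    return TOKEN_PATTERN.findall(text.lower())
```

Keeping the tokeniser in one place means the preprocessing scripts and the model see text split in exactly the same way.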

About

Secondary 3 Term 3 Computing+ coursework
