This is the official GitHub repository for "Coded Term Discovery for Online Hate Speech Detection", to be presented at the 11th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2024).
Please cite the paper as follows: Kikkisetti, D., Mustafa, R., Melillo, W., Corizzo, R., Boukouvalas, Z., Gill, J., Japkowicz, N., "Coded Term Discovery for Online Hate Speech Detection", to appear in the 11th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2024).
Antisemitism_term_definition.csv: Definitions of all the antisemitic seed words used in the paper.
Baseline Results.ipynb: Jupyter notebook implementing solutions 1-1 and 1-2 from Table 3 of the paper.
Finetune_bertmodel_pyrradataset.ipynb: Jupyter notebook for fine-tuning the BERT model on our custom dataset.
ReportingLayerData_Bertembeddings.ipynb: Jupyter notebook implementing solutions 2-1 and 2-2 from Table 3 of the paper.
SRI Coding Statement.pdf: The coding statement developed for the lexical study.
Solution 1-1.csv: Spreadsheet (CSV) of emerging-term predictions obtained with approach 1 in phase 1 and approach 1 in phase 2 (Section V of the paper).
Solution 1-2.csv: Spreadsheet (CSV) of emerging-term predictions obtained with approach 1 in phase 1 and approach 2 in phase 2 (Section V of the paper).
Solution 2-1.csv: Spreadsheet (CSV) of emerging-term predictions obtained with approach 2 in phase 1 and approach 1 in phase 2 (Section V of the paper).
Solution 2-2.csv: Spreadsheet (CSV) of emerging-term predictions obtained with approach 2 in phase 1 and approach 2 in phase 2 (Section V of the paper).
Unmasking Antisemitism SRI Data Set - Reporting Layer.csv: The dataset used in the paper.
Preprocess.py: Python module with preprocessing code shared by the other code files.
New Terms Labeling: Details of the procedure used to label outputs of the system as New Terms.
DSAA2024.pdf: PDF of the conference paper.