This project is a simple NLP toxic comment detector coupled with a user-friendly interface. The model uses transfer learning: DistilBERT, a well-known transformer language model, is fine-tuned as a regressor using Torch on a dataset of 20,000 comments. The model outputs a vector of seven scores, one for each of seven variations of toxicity, each squashed into the range [0, 1]. Weighted scores and threshold logic determine a subsequent action: flagging, warning, or banning. Users can also upload screenshots, from which text is extracted with Tesseract and fed to the model.
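The weighted-score and threshold logic described above can be sketched roughly as follows. The category names, weights, and cutoffs here are illustrative assumptions, not the project's actual values:

```python
# Sketch of weighted-score + threshold moderation logic.
# Categories, weights, and thresholds are illustrative assumptions.
CATEGORIES = ["toxic", "severe_toxic", "obscene", "threat",
              "insult", "identity_hate", "profanity"]
WEIGHTS = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0]

def decide_action(scores, warn_at=0.3, ban_at=0.7):
    """Map the seven per-category scores (each in [0, 1]) to an action."""
    weighted = sum(w * s for w, s in zip(WEIGHTS, scores)) / sum(WEIGHTS)
    if weighted >= ban_at:
        return "ban"
    if weighted >= warn_at:
        return "warn"
    return "flag"
```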
The project also explores simple CSS animations and was great practice for brushing up on UI skills.
Frontend: Created with React
Backend: Flask server
ML: Transfer learning with the DistilBERT transformer model, fine-tuned with Torch on a dataset of 20,000 comments. Trained on Google Colab.
Utilities: Text extraction from images using Tesseract.
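The fine-tuning setup can be sketched as a DistilBERT backbone with a small regression head. This is a minimal illustration assuming the Hugging Face `transformers` API, not the repo's exact training code:

```python
import torch
import torch.nn as nn
from transformers import DistilBertModel

class ToxicityRegressor(nn.Module):
    """DistilBERT backbone with a 7-output regression head (illustrative sketch)."""
    def __init__(self, n_labels=7):
        super().__init__()
        self.bert = DistilBertModel.from_pretrained("distilbert-base-uncased")
        # DistilBERT's hidden size is self.bert.config.dim (768 for the base model)
        self.head = nn.Linear(self.bert.config.dim, n_labels)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]     # [CLS] token embedding
        return torch.sigmoid(self.head(cls))  # squash scores into [0, 1]
```

Training would then minimize a regression loss (e.g. MSE or BCE) between these seven outputs and the labeled toxicity scores.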
Make sure you have Python 3 and Node.js installed.
If not already installed, install Tesseract and Git LFS.
macOS instructions:
brew install tesseract
brew install git-lfs
git lfs install
Clone the repository:
git clone https://github.com/addinar/toxicity-detector.git
Install dependencies and activate a virtual environment for the backend.
macOS instructions:
cd backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Open two terminals.
On one terminal:
cd backend
python app.py
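For reference, a minimal Flask inference endpoint could look like the sketch below. The route name, payload shape, and `predict()` helper are hypothetical assumptions, not the actual API in `app.py`:

```python
# Hypothetical sketch of a Flask scoring endpoint.
# The /predict route and predict() helper are illustrative assumptions.
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict(text):
    # Placeholder for model inference: seven toxicity scores in [0, 1].
    return [0.0] * 7

@app.route("/predict", methods=["POST"])
def score_comment():
    text = request.get_json(force=True).get("text", "")
    return jsonify({"scores": predict(text)})
```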
On the other terminal:
cd frontend
npm run dev
Distributed under the MIT License.