Hate speech on social media is a real problem, especially in Pakistan, where people write in Roman Urdu (Urdu written in English letters). Platforms like Facebook, YouTube, and Twitter are full of toxic comments, slurs, and harassment in Roman Urdu, yet there are almost no tools to detect it. Existing hate speech detection systems are built for English and fail completely on Roman Urdu, even though this is the language millions of Pakistani users actually type in every single day. On top of that, Roman Urdu has no fixed spelling rules (one toxic word can be written ten different ways), people mix Urdu, English, and Punjabi in a single sentence, and there is no ready-made dataset or pre-trained model. So we built one from scratch.
- No standard spelling — same word spelled multiple ways (khabees, khabis, khabeees)
- Code-mixing everywhere — Urdu, English, Punjabi, all in one comment
- Context matters — some words are toxic only in certain contexts
- Zero pre-built tools — no off-the-shelf models, tokenizers, or labeled datasets exist for Roman Urdu
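One cheap trick for the spelling problem above is collapsing repeated characters, so elongated variants like "khabeees" fold into a single canonical form. A minimal sketch (the function name is ours, not from the repo; variants with genuinely different letters, like "khabis", still have to be caught by the lexicon and char n-grams):

```python
import re

def normalize_repeats(word: str, keep: int = 2) -> str:
    # Collapse any run of 3+ identical characters down to `keep` occurrences,
    # so "khabeees" and "khabeeeees" both normalize to "khabees".
    return re.sub(r"(.)\1{2,}", r"\1" * keep, word)

print(normalize_repeats("khabeees"))  # -> khabees
print(normalize_repeats("khabis"))   # unchanged: different letters, not repeats
```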
We built a complete ML pipeline that takes a Roman Urdu comment and classifies it as Toxic (1) or Not Toxic (0) — everything from scratch.
- Scraped real comments from Pakistani subreddits
- AI-assisted batch labeling (Gemini 3.0, Claude Opus 4.6) of 20,622 comments, recording the toxic keyword behind each label
- Built a 1,609-word toxic lexicon from the labeled data
- Full NLP pipeline tuned for Roman Urdu
- Trained 5 ML models — best: SVM at 95.66% accuracy
1. Clone the repo

```bash
git clone https://github.com/your-username/hatespeech-detection.git
cd hatespeech-detection
```

2. Create & activate a virtual environment

```bash
python -m venv venv
# Windows:
venv\Scripts\activate
# Mac/Linux:
source venv/bin/activate
```

3. Install dependencies

```bash
pip install -r requirements.txt
```

4. Run the notebook

```bash
jupyter notebook Model.ipynb
```

Custom dataset — commentLabel.csv — scraped from Pakistani subreddits and labeled via AI in batches. 3 columns: text, label (0/1), keyword (the toxic word that triggered the label).
| Detail | Value |
|---|---|
| Total comments | 20,622 |
| After dedup | 16,501 |
| Not Toxic | 11,867 (71.92%) |
| Toxic | 4,634 (28.08%) |
| Unique toxic keywords | 1,609 |
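To make the schema concrete, here is a tiny inline sample with the same three columns (the example rows are made up; in the repo you would load commentLabel.csv itself and get the counts in the table above):

```python
import pandas as pd

# Hypothetical rows mimicking the commentLabel.csv schema: text, label, keyword.
df = pd.DataFrame({
    "text":    ["tum bohat ache ho", "kya khabees banda hai", "kya khabees banda hai"],
    "label":   [0, 1, 1],
    "keyword": [None, "khabees", "khabees"],
})

# Same dedup step that took the real dataset from 20,622 to 16,501 comments.
df = df.drop_duplicates(subset="text")
print(df["label"].value_counts())
```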
Everything runs in Model.ipynb:
- EDA — checked shape, nulls, duplicates (4,121 removed), label distribution
- Preprocessing — lowercasing, removed URLs/emojis/punctuation/numbers, normalized repeated chars, removed English + Roman Urdu stopwords
- Lexicon Feature — built a 1,609-word toxic lexicon, counted toxic keyword hits per comment
- TF-IDF Features — char n-grams (2–5, 30k) + word n-grams (1–2, 15k) + lexicon count = 45,001 features
- Split + SMOTE — 70/30 stratified split, SMOTE on train set → balanced to 8,306 per class
- Training — 5 models: XGBoost, Naive Bayes, Logistic Regression, Random Forest, SVM
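The feature setup in the steps above can be sketched with scikit-learn: a char 2–5-gram TF-IDF (30k cap) unioned with a word 1–2-gram TF-IDF (15k cap), feeding an SVM. This is our illustrative reconstruction, not the notebook's exact code; the lexicon-count column and SMOTE (from imbalanced-learn, applied to the training split only) are omitted here, and the toy comments are invented:

```python
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Char n-grams absorb spelling variation; word n-grams capture short phrases.
features = FeatureUnion([
    ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5), max_features=30000)),
    ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2), max_features=15000)),
])
clf = Pipeline([("tfidf", features), ("svm", LinearSVC())])

# Toy fit on a handful of made-up comments just to show the pieces connect.
texts  = ["tum bohat ache ho", "kya khabees banda hai",
          "bohat pyara video", "ghatiya insan ho tum"]
labels = [0, 1, 0, 1]
clf.fit(texts, labels)
print(clf.predict(["khabees insan"]))
```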
| Model | Accuracy | F1 (Toxic) | Precision | Recall |
|---|---|---|---|---|
| SVM | 95.66% | 0.92 | 0.93 | 0.91 |
| XGBoost | 95.11% | 0.91 | 0.90 | 0.93 |
| Logistic Regression | 95.07% | 0.91 | 0.89 | 0.93 |
| Multinomial NB | 92.39% | 0.86 | 0.88 | 0.84 |
| Random Forest | 90.93% | 0.81 | 0.96 | 0.70 |
Best: SVM — 95.66% accuracy, F1: 0.9219. Manual sanity test: 10/10 correct.
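The numbers in the table come from standard scikit-learn metrics on the held-out 30% split. A quick sketch with made-up labels showing how each column is computed (accuracy over both classes; F1, precision, and recall reported for the toxic class, label 1):

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy ground truth and predictions, purely for illustration.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0]

print(accuracy_score(y_true, y_pred))            # fraction correct overall
print(precision_score(y_true, y_pred))           # of predicted toxic, how many were
print(recall_score(y_true, y_pred))              # of actual toxic, how many caught
print(f1_score(y_true, y_pred))                  # harmonic mean of the two
```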