This project demonstrates how to build an email classification system that:
- Masks personally identifiable information (PII) such as names, email addresses, phone numbers, and card numbers in emails.
- Classifies emails into categories such as 'Request', 'Incident', 'Change', and 'Problem'.
- Language: Python 3
- Framework: FastAPI
- ML: Scikit-learn (Multinomial Naive Bayes)
- Text Preprocessing: Regex, spaCy (for NER)
- Model Saving: Joblib
- Deployment: served with Uvicorn
- PII masking powered by regex and spaCy
- Trained Multinomial Naive Bayes classifier (TF-IDF)
- REST API built with FastAPI
- Strict JSON format output
- Ready for deployment and production usage
- Accuracy: 71.5%
- Classifier: Multinomial Naive Bayes with TF-IDF vectorization
- Best-performing class: Request (F1 = 0.88)
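The regex half of the PII masking listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `utils.py`: the patterns, the placeholder labels, and the `mask_pii` name are all assumptions, and name masking (which the project handles with spaCy NER) is left out.

```python
import re

# Illustrative patterns only (assumed, not taken from the project's utils.py).
# Order matters: card numbers are masked before phone numbers so that a
# 16-digit card is not partially matched by the looser phone pattern.
PII_PATTERNS = [
    ("email", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("card_number", re.compile(r"\b\d(?:[ -]?\d){12,15}\b")),
    ("phone_number", re.compile(r"\+?\b\d[\d -]{8,13}\d\b")),
]


def mask_pii(text: str) -> str:
    """Replace each regex match with a bracketed placeholder such as [email]."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "Contact me at jane.doe@example.com or 9876543210. Card: 4111 1111 1111 1111."
masked = mask_pii(sample)
print(masked)
```

Real-world phone and card formats vary widely, so patterns like these usually need tuning against the actual data.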
email-classifier
|
|--data
|  |--combined_emails_with_natural_pii.csv
|--saved_model
|  |--model.pkl
|  |--tfidf.pkl
|--api.py
|--model.py
|--utils.py
|--README.md
|--requirements.txt
git clone <repo-link>
cd email_classifier_project
python -m venv email_classifier
source email_classifier/bin/activate # Mac/Linux
email_classifier\Scripts\activate # Windows
pip install -r requirements.txt
python -m spacy download en_core_web_sm
uvicorn api:app --reload
- Open your browser and go to: http://127.0.0.1:8000/docs
- Use the /classify-email endpoint to test!
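Besides the interactive docs page, the endpoint can also be called from a script. The sketch below uses only the standard library. The `/classify-email` path and local address come from this README, but the POST method and the `email_body` field name are assumptions; check the schema shown at `/docs` for the real request format.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/classify-email"


def build_classify_request(email_text: str) -> urllib.request.Request:
    """Build a JSON POST request; the "email_body" key is a hypothetical field name."""
    payload = json.dumps({"email_body": email_text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify_email(email_text: str) -> dict:
    """Send the request to a running server and return the parsed JSON response."""
    with urllib.request.urlopen(build_classify_request(email_text)) as response:
        return json.loads(response.read().decode("utf-8"))


# With the Uvicorn server running:
# result = classify_email("Hi, my name is Jane Doe and I cannot log in.")
```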
- Use n-gram features for better phrase detection
- Try SVM or Logistic Regression
- Experiment with contextual models like BERT
- Add logging and exception handling to the API
- Containerize the app using Docker
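As a starting point for the first two ideas, n-gram features and Logistic Regression can be combined in a scikit-learn pipeline. This is a minimal sketch on made-up toy data, not the project's training code; the example sentences and hyperparameters are assumptions chosen only for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data for illustration only; not taken from the project's dataset.
texts = [
    "I need a password reset for my account",
    "Please grant me access to the shared drive",
    "The server is down and nobody can log in",
    "Our website crashed after the outage",
    "Please change my billing plan to annual",
    "Update the delivery address on my order",
    "The app keeps freezing every day",
    "Recurring sync errors keep coming back",
]
labels = [
    "Request", "Request",
    "Incident", "Incident",
    "Change", "Change",
    "Problem", "Problem",
]

# ngram_range=(1, 2) adds two-word phrases such as "password reset"
# alongside single words in the TF-IDF vocabulary.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

prediction = model.predict(["Can I get a password reset?"])[0]
print(prediction)
```

Swapping `LogisticRegression` for `sklearn.svm.LinearSVC` in the same pipeline is one way to try the SVM option.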

