samarthyaveer/spam-ham-classifier

Spam/Ham Email Classifier

This project classifies emails as Spam or Ham using machine learning. It runs three models (Naive Bayes, Support Vector Machine (SVM), and Random Forest) on each input and displays their predictions interactively through a Gradio interface. The application is deployed on Hugging Face.

Features

- Multi-Model Predictions: Uses Naive Bayes, SVM, and Random Forest classifiers for predictions.
- Interactive Interface: Accepts a single email text or batch input through a CSV file.
- Preprocessing: Includes text preprocessing (stopword removal, lemmatization) and TF-IDF vectorization.
- Deployment: Deployed on Hugging Face with a user-friendly interface for classification tasks.

Getting Started

Follow the instructions below to set up the project locally or use the deployed version on Hugging Face.

Usage

Single Email Classification

- Enter the email content in the Text Input field.
- Click Submit to view the classification result from each model.

Batch Email Classification

- Upload a CSV file containing a column with email texts.
- The text column should have a name the app recognizes; otherwise the app renames it automatically.
- Click Submit to receive predictions for all emails, including a majority-vote result.

Deployed Version

Access the live demo on Hugging Face: https://hf.co/spaces/samarthyaveer/spam-ham-classifier
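The batch flow described under Batch Email Classification can be sketched roughly as follows. This is an illustration under assumptions, not the app's actual code: the candidate column names, the fallback to the first column, and the helper names `find_text_column` and `classify_batch` are all hypothetical, and the three models are replaced by keyword stubs so the example runs with the standard library alone.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; the names the real app recognizes are not documented here.
TEXT_COLUMN_CANDIDATES = ("text", "email", "message", "content")


def find_text_column(fieldnames):
    """Return the header holding the email text (case-insensitive match)."""
    lowered = {name.strip().lower(): name for name in fieldnames}
    for candidate in TEXT_COLUMN_CANDIDATES:
        if candidate in lowered:
            return lowered[candidate]
    # Assumed fallback: treat the first column as the email text.
    return fieldnames[0]


def classify_batch(csv_text, models):
    """Run every model on every row and add a majority-vote column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if not reader.fieldnames:
        raise ValueError("CSV input has no header row")
    column = find_text_column(reader.fieldnames)
    results = []
    for row in reader:
        votes = {name: predict(row[column]) for name, predict in models.items()}
        final = Counter(votes.values()).most_common(1)[0][0]
        results.append({"text": row[column], **votes, "majority": final})
    return results


# Keyword stubs standing in for the trained classifiers.
stub_models = {
    "naive_bayes": lambda t: "spam" if "free" in t.lower() else "ham",
    "svm": lambda t: "spam" if "prize" in t.lower() else "ham",
    "random_forest": lambda t: "ham",
}

sample_csv = "Email,id\nWin a free prize now,1\nMeeting at 10am tomorrow,2\n"
batch_rows = classify_batch(sample_csv, stub_models)
for row in batch_rows:
    print(row["text"], "->", row["majority"])
```

With three models and two classes, a strict majority always exists, so no tie-breaking rule is needed in this sketch.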

How It Works

Preprocessing Pipeline

- Convert text to lowercase.
- Remove special characters and digits.
- Tokenize and filter stopwords using NLTK.
- Lemmatize tokens to reduce words to their base form.
- Transform preprocessed text using TF-IDF vectorization.

Models

- Naive Bayes: Probabilistic model based on word frequency.
- Support Vector Machines (SVM): Linear classifier optimized for high-dimensional spaces.
- Random Forest: Ensemble model aggregating decisions from multiple decision trees.

Output

- Displays predictions from all three models.
- Provides a final classification based on majority voting across the models.

Acknowledgments

Libraries Used: NLTK, Scikit-learn, Pandas, NumPy, Gradio
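As a rough illustration of the text-cleaning steps and the majority vote described under How It Works, here is a minimal standard-library sketch. It makes several assumptions: the small `STOPWORDS` set stands in for NLTK's English stopword list, the `lemmatize` parameter defaults to a no-op in place of a real lemmatizer, and the function names `preprocess` and `majority_vote` are hypothetical. TF-IDF vectorization and the models themselves are left out.

```python
import re
from collections import Counter
from typing import Callable, Iterable

# Tiny illustrative set; the project itself uses NLTK's stopword list.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for", "you", "your"}


def preprocess(text: str, lemmatize: Callable[[str], str] = lambda tok: tok) -> str:
    """Lowercase, strip non-letters, drop stopwords, then lemmatize each token."""
    text = text.lower()
    # Replace digits and special characters with spaces.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Whitespace tokenization (a simplification of NLTK tokenization).
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(lemmatize(tok) for tok in tokens)


def majority_vote(labels: Iterable[str]) -> str:
    """Return the label predicted by most models."""
    return Counter(labels).most_common(1)[0][0]


cleaned = preprocess("WIN a FREE prize!!! Call 0800-123 now.")
print(cleaned)  # win free prize call now

final_label = majority_vote(["spam", "ham", "spam"])
print(final_label)  # spam
```

To get closer to the described pipeline, a real lemmatizer such as `nltk.stem.WordNetLemmatizer().lemmatize` could be passed as the `lemmatize` argument, which requires the WordNet data to be downloaded first.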

About

This email spam classification system uses three distinct machine learning models—Naive Bayes, Support Vector Machine (SVM), and Random Forest—to classify emails as either "spam" [0] or "ham" (non-spam) [1].
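If the bracketed numbers above are the integer class labels the models emit (an assumption based only on this description), raw predictions could be turned back into readable labels as in the sketch below. The names `LABEL_NAMES` and `decode_predictions` are hypothetical.

```python
# Assumed encoding, read from the description above: 0 -> spam, 1 -> ham.
LABEL_NAMES = {0: "spam", 1: "ham"}


def decode_predictions(predictions):
    """Map integer class labels to their string names."""
    return [LABEL_NAMES[int(p)] for p in predictions]


decoded = decode_predictions([0, 1, 1])
print(decoded)  # ['spam', 'ham', 'ham']
```

This mapping is the reverse of the more common spam = 1 convention, so it is worth confirming against the training data before relying on it.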
