GitHub - lisaguim/SentimentAnalysisIMDB: Machine learning model with >80% accuracy capable of classifying whether a user review has a positive or negative connotation.

Machine Learning algorithm for Sentiment Analysis of IMDB

📌 Project description

The goal of this project is to build a machine learning model with >80% accuracy capable of classifying whether a user review has a positive or negative connotation.

📜 Dataset

The dataset used is the IMDB (Internet Movie Database) Dataset of 50K Movie Reviews from Kaggle. It comprises two columns: one with the user's review and one with the review's sentiment, classified as "positive" or "negative".
Link: https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews

📚 Libraries

Python libraries used for this project are:

re
pickle
nltk
sklearn
numpy
pandas

🗑️ Data Cleaning

First, a data cleaning step converts the sentiment column into a numeric binary classification ("1" = "positive" and "0" = "negative") and removes HTML tags and special characters from the reviews. In addition, Natural Language Processing (NLP) techniques remove stopwords and apply stemming to the user reviews, standardizing the words for the data pre-processing step.
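
The cleaning steps described above could look roughly like the sketch below. It is only an illustration, not the notebook's actual code: the helper names `clean_review` and `encode_sentiment` are hypothetical, the regular expressions are one possible choice, and the tiny hardcoded `STOPWORDS` set is an assumption made so the example runs offline (the NLTK stopword corpus is the usual source, but it requires a one-time `nltk.download("stopwords")`).

```python
import re

from nltk.stem import PorterStemmer

# Hypothetical, deliberately tiny stopword set so the sketch runs offline.
# In practice nltk.corpus.stopwords.words("english") is the usual source.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "this", "was", "of", "to", "in", "i"}

# Mapping stated in the README: "positive" -> 1, "negative" -> 0.
LABELS = {"positive": 1, "negative": 0}

stemmer = PorterStemmer()


def clean_review(text: str) -> str:
    """Strip HTML tags and special characters, drop stopwords, stem the rest."""
    text = re.sub(r"<[^>]+>", " ", text)      # remove HTML tags such as <br />
    text = re.sub(r"[^a-zA-Z\s]", " ", text)  # keep only letters and whitespace
    tokens = text.lower().split()
    return " ".join(stemmer.stem(t) for t in tokens if t not in STOPWORDS)


def encode_sentiment(label: str) -> int:
    """Convert the sentiment string into its binary label."""
    return LABELS[label.strip().lower()]


sample = "This movie was AMAZING!<br /><br />I loved it."
print(clean_review(sample))
print(encode_sentiment("positive"))
```

With a pandas DataFrame, the same helpers could be applied column-wise with `Series.apply`, assuming the column names match those of the Kaggle CSV.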

📍 Data pre-processing

In this step, the dataset was divided into 80% training and 20% test sets. The text was then converted into a numerical representation by vectorizing it with the sklearn library and transforming the result into a NumPy array.
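
A minimal sketch of this step on toy data follows. The README does not say which vectorizer was used, so `CountVectorizer` is an assumption here, as are the `random_state` and `stratify` arguments and the ten made-up, already-stemmed reviews.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Toy stand-in for the cleaned reviews and their binary labels (illustrative data).
reviews = [
    "love great film", "amaz act wonder stori", "best movi ever",
    "enjoy everi minut", "brilliant funni script",
    "hate bore film", "terribl act weak stori", "worst movi ever",
    "wast everi minut", "dull bad script",
]
labels = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

# 80% training / 20% test split; stratify keeps both classes in each part.
train_text, test_text, y_train, y_test = train_test_split(
    reviews, labels, test_size=0.2, random_state=42, stratify=labels
)

# Learn the vocabulary on the training text only, then reuse it for the test text.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_text).toarray()  # sparse matrix -> NumPy array
X_test = vectorizer.transform(test_text).toarray()

print(X_train.shape, X_test.shape)
```

Fitting the vectorizer on the training split alone keeps test-set vocabulary from leaking into the model. Converting to a dense array is needed for `GaussianNB`, which does not accept sparse input, but it can use a lot of memory on the full 50k-review dataset.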

⚙️ Data model machine learning

To reach an accuracy >80%, three supervised classification models were created, each based on a different variant of the Naive Bayes algorithm:

GaussianNB
MultinomialNB
BernoulliNB
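
As a rough illustration of how the three variants are trained, the sketch below fits each one on a toy word-count matrix. The data and the column meanings are made up for the example, and default hyperparameters are assumed since the README does not list any.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Toy word-count matrix (illustrative): one row per review,
# columns = counts of the words ["good", "great", "bad", "boring"].
X_train = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 0],
])
y_train = np.array([1, 1, 1, 0, 0, 0])  # 1 = positive, 0 = negative

models = {
    "GaussianNB": GaussianNB(),        # treats each feature as normally distributed
    "MultinomialNB": MultinomialNB(),  # models word counts
    "BernoulliNB": BernoulliNB(),      # models word presence/absence (binarizes counts)
}
for model in models.values():
    model.fit(X_train, y_train)

# Two unseen reviews: one using positive words, one using negative words.
X_new = np.array([[1, 1, 0, 0], [0, 0, 1, 1]])
for name, model in models.items():
    print(name, model.predict(X_new))
```

The three classes share the same `fit`/`predict` interface and differ only in the distribution they assume for each feature, which is why the same vectorized data can be passed to all of them.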

📊 Results

Each model was evaluated with AUC (Area Under the Curve), a metric for evaluating the performance of classification models in machine learning. The results are:

AUC GaussianNB = 0.6668474355243407
AUC MultinomialNB = 0.8071783917411954
AUC BernoulliNB = 0.8188667244694989

BernoulliNB performed best among the three models.
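
For reference, an AUC comparison of this kind can be computed with `roc_auc_score` from sklearn, as in the sketch below. It runs on toy data and does not reproduce the figures above. The README does not state whether the scores came from hard predictions or from predicted probabilities; using `predict_proba` is an assumption here.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Toy word-count data (illustrative), columns = ["good", "great", "bad", "boring"].
X_train = np.array([
    [2, 1, 0, 0], [1, 2, 0, 0], [1, 0, 0, 0],
    [0, 0, 2, 1], [0, 0, 1, 2], [0, 0, 1, 0],
])
y_train = np.array([1, 1, 1, 0, 0, 0])
X_test = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
y_test = np.array([1, 1, 0, 0])

auc_scores = {}
for model in (GaussianNB(), MultinomialNB(), BernoulliNB()):
    model.fit(X_train, y_train)
    # Probability of the positive class (column 1) is the ranking score for the ROC curve.
    positive_proba = model.predict_proba(X_test)[:, 1]
    auc_scores[type(model).__name__] = roc_auc_score(y_test, positive_proba)

for name, auc in auc_scores.items():
    print(f"AUC {name} = {auc}")

best_model = max(auc_scores, key=auc_scores.get)
```

AUC ranges from 0 to 1, with 0.5 corresponding to random ranking. It is a different quantity from accuracy, so a comparison against the >80% target depends on which metric is meant.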
