This project uses the IMDB dataset of 50,000 movie reviews for natural language processing and text analytics. It is a binary sentiment classification dataset containing substantially more data than previous benchmark datasets: 25,000 highly polar movie reviews for training and 25,000 for testing. The task is to predict whether each review is positive or negative, using either classical classification or deep learning algorithms.
For more dataset information, please go through the following link: http://ai.stanford.edu/~amaas/data/sentiment/
Dataset link: https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
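As a rough illustration of how the reviews and their labels might be read, here is a minimal sketch using only the Python standard library. It assumes the downloaded CSV has a `review` column and a `sentiment` column holding the strings `positive` and `negative`; check those names against the actual file. The helper names `load_reviews` and `count_labels` are hypothetical and are not taken from Model.py.

```python
import csv
import io

# Assumed label strings and column names; verify them against the real file.
LABELS = {"negative": 0, "positive": 1}


def load_reviews(handle):
    """Read review texts and integer labels from an open CSV file handle."""
    texts, labels = [], []
    for row in csv.DictReader(handle):
        texts.append(row["review"])
        labels.append(LABELS[row["sentiment"].strip().lower()])
    return texts, labels


def count_labels(labels):
    """Return (number of negative reviews, number of positive reviews)."""
    positives = sum(labels)
    return len(labels) - positives, positives


# Tiny in-memory stand-in for the real file, so the sketch runs on its own.
sample = io.StringIO(
    "review,sentiment\n"
    '"A wonderful, moving film.",positive\n'
    '"Dull plot and wooden acting.",negative\n'
    '"I loved every minute.",positive\n'
)
texts, labels = load_reviews(sample)
print(count_labels(labels))
```

For the real dataset, pass a handle opened with `newline=""` and `encoding="utf-8"` on the downloaded CSV instead of the in-memory sample.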
- Python 3
- TensorFlow
- Natural Language Processing (NLP)
- Tokenization (converting text into sequences of integer IDs)
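To make the tokenization step concrete, the following is a minimal pure-Python sketch of turning text into fixed-length integer sequences. It is not the tokenizer used in Model.py: the function names, the regular expression, the reserved IDs (0 for padding, 1 for unknown words), and the choice to pad and truncate at the end are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_word_index(texts, num_words=None):
    """Map each word to an integer ID; more frequent words get smaller IDs."""
    counts = Counter(word for text in texts for word in tokenize(text))
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if num_words is not None:
        ranked = ranked[:num_words]
    # IDs start at 2: 0 is reserved for padding and 1 for unknown words.
    return {word: i for i, (word, _) in enumerate(ranked, start=2)}


def encode(text, word_index, max_len):
    """Convert text to exactly max_len IDs, truncating or zero-padding at the end."""
    ids = [word_index.get(word, 1) for word in tokenize(text)][:max_len]
    return ids + [0] * (max_len - len(ids))


corpus = ["The movie was great", "The movie was bad"]
word_index = build_word_index(corpus)
encoded = encode("The movie was awful", word_index, max_len=6)
print(encoded)
```

Here "awful" never appears in the corpus, so it maps to the unknown-word ID 1, and the two trailing zeros are padding. Library tokenizers may differ in details such as filtering punctuation or padding at the start rather than the end.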
The Trained_Model directory contains the model trained on the IMDB dataset, with a training accuracy of 97.93% and a validation accuracy of 90.88%.
Model.py: preprocesses the dataset and trains the model.
tokenizer.json: contains the tokenizer object serialized in JSON format.