- The project uses a GUI for better interactivity and ease of use.
- The project explores core NLP concepts and the preprocessing required before training a Logistic Regression model.
- Dataset: Kaggle Twitter Sentiment Analysis, which has 4 columns: Tweet ID, Entity, Text, and Sentiment
- Visualization: a word cloud of the tweets for the user-selected sentiment
- Preprocessing: keeping only alphanumeric characters, converting everything to lowercase, removing stopwords (e.g. 'the', 'is'), and lemmatizing (reducing words to their root form)
- Text-to-vector conversion: Bag of Words or TF-IDF, selected by the user
- Main.py - GUI components
- Train.py - Data loading, preprocessing, vectorization, and model training
- Predict.py - Prediction on the user's input
- vectorizer.py - Frequency-based vector representations (BoW, TF-IDF)
- model.py - Initialization of the different models

Next steps:
- Compare the metrics when Word2Vec is used; I have read that Word2Vec can better capture the sentiment expressed in text.
- Hyperparameter tuning
- Use learned dense embeddings (Word2Vec, GloVe, etc.) instead of frequency-based vectors; this may require changing a few concepts
- BERT and LSTM implementations
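The preprocessing steps listed above can be sketched in plain Python as follows. This is a minimal sketch under assumptions, not the code in Train.py: the stopword set is a small hypothetical stand-in, and lemmatization is passed in as a function (identity by default) so the example runs without any extra downloads.

```python
import re
from typing import Callable, Iterable

# Hypothetical, deliberately small stopword set for illustration only;
# a real run would use a full English stopword list.
STOPWORDS = {"the", "is", "a", "an", "and", "to", "of", "in", "it", "this", "i"}


def preprocess(
    text: str,
    stopwords: Iterable[str] = STOPWORDS,
    lemmatize: Callable[[str], str] = lambda word: word,
) -> str:
    """Clean one tweet: alphanumerics only, lowercase, no stopwords, lemmatized."""
    # 1. Replace every run of non-alphanumeric (ASCII) characters with a space.
    text = re.sub(r"[^A-Za-z0-9]+", " ", text)
    # 2. Switch everything to lowercase and split into tokens.
    tokens = text.lower().split()
    # 3. Drop stopwords (the set is expected to be lowercase).
    stop = set(stopwords)
    tokens = [token for token in tokens if token not in stop]
    # 4. Reduce each remaining word to its root form via the supplied function.
    return " ".join(lemmatize(token) for token in tokens)


print(preprocess("This game is AMAZING!!! Loved it :)"))
```

If NLTK is installed and its WordNet data has been downloaded (`nltk.download("wordnet")`), `WordNetLemmatizer().lemmatize` from `nltk.stem` can be passed as `lemmatize`; whether this project actually uses NLTK is an assumption here.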
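A word cloud sizes each word by how often it appears, so the core of the visualization step is a frequency count over the tweets of the chosen sentiment. The sketch below assumes the texts are already preprocessed and stored as hypothetical `(text, sentiment)` pairs; the function name `word_frequencies` is illustrative only.

```python
from collections import Counter
from typing import Dict, List, Tuple


def word_frequencies(rows: List[Tuple[str, str]], sentiment: str) -> Dict[str, int]:
    """Count words across the preprocessed texts labelled with `sentiment`."""
    counts = Counter()
    for text, label in rows:
        if label == sentiment:
            # Texts are assumed to be space-separated preprocessed tokens.
            counts.update(text.split())
    return dict(counts)


rows = [
    ("great game great fun", "Positive"),
    ("bad lag", "Negative"),
    ("great update", "Positive"),
]
print(word_frequencies(rows, "Positive"))
```

Assuming the third-party `wordcloud` package, these counts could then be rendered with `WordCloud().generate_from_frequencies(freqs).to_file("cloud.png")`; the file name is only an example.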
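The user-selected vectorizer (Bag of Words or TF-IDF) followed by a Logistic Regression classifier can be expressed as a single scikit-learn `Pipeline`. This is a sketch assuming scikit-learn, with a made-up four-tweet training set; the function name `build_model` and the `"bow"`/`"tfidf"` switch are hypothetical, not names taken from model.py or vectorizer.py.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def build_model(method: str = "tfidf") -> Pipeline:
    """Vectorizer chosen by the user, followed by Logistic Regression."""
    if method == "bow":
        vectorizer = CountVectorizer()  # Bag of Words: raw term counts
    elif method == "tfidf":
        vectorizer = TfidfVectorizer()  # counts re-weighted by inverse document frequency
    else:
        raise ValueError(f"unknown method: {method!r}")
    return Pipeline([
        ("vectorizer", vectorizer),
        ("classifier", LogisticRegression(max_iter=1000)),
    ])


# Tiny made-up, already-preprocessed training set (illustration only).
texts = [
    "love game great",
    "great update love",
    "hate lag bad",
    "bad server hate",
]
labels = ["Positive", "Positive", "Negative", "Negative"]

model = build_model("tfidf")
model.fit(texts, labels)
print(model.predict(["great love"]))
```

Bundling the vectorizer with the classifier means the vocabulary (and, for TF-IDF, the IDF weights) learned during training is reused unchanged at prediction time. User input should still pass through the same preprocessing before `model.predict` is called.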
⭐ Star this repo if you find it helpful!
Made with ❤️ by Vivek Padayattil
