MannheimWMProject

Web Mining Project IE671, University of Mannheim

Project Overview

This project is organized into the following sections:

1_EDA - Initial Exploration, Data Preprocessing, and Visualizations
2_Lexicon-Based - Application of lexicon-based models to explore the data
3_LogReg_RandForest - Application of the ML models Logistic Regression and Random Forest
4_BERT - Application of the pre-trained model BERT
5_LSTM - Application of an LSTM architecture
6_XGBoost - Application of the ML model XGBoost

Structure

Sections 1 & 2: Focus on data exploration and preprocessing.
Sections 3 to 6: Focus on applying various models for predictions.

Repository Contents

This repository contains several Python files, including:

preprocessing.py and lexicon_based.py: Helper functions used throughout the project.
zm_lstm_helper.py and zm_lstm_model.py: Functions used specifically in Section 5 (LSTM).

Data

The initial dataset, airlines_reviews.csv, was preprocessed and exported as processed_data.csv, which serves as the input for the notebooks in Sections 3 to 6. XGBoost_misclassified_samples.csv contains the neutral reviews that XGBoost misclassified.
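The actual preprocessing lives in preprocessing.py and 1_EDA.ipynb. As a rough illustration of the flow from airlines_reviews.csv to processed_data.csv, here is a minimal sketch using only the Python standard library. The column names (Reviews, Label) and the cleaning rules (lowercasing, replacing non-alphanumeric characters with spaces, dropping reviews that end up empty) are assumptions for illustration, not the project's real pipeline.

```python
import csv
import os
import re
import tempfile

# Hypothetical column name; the real text column in airlines_reviews.csv may differ.
TEXT_COLUMN = "Reviews"


def clean_text(text: str) -> str:
    """Lowercase, replace non-alphanumeric characters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return " ".join(text.split())


def preprocess_csv(src: str, dst: str, text_column: str = TEXT_COLUMN) -> int:
    """Clean text_column in every row of src and write the result to dst.

    Rows whose cleaned text is empty are dropped. Returns the number of rows written.
    """
    with open(src, newline="", encoding="utf-8") as fin:
        reader = csv.DictReader(fin)
        fieldnames = reader.fieldnames or []
        rows = []
        for row in reader:
            row[text_column] = clean_text(row.get(text_column) or "")
            if row[text_column]:
                rows.append(row)
    with open(dst, "w", newline="", encoding="utf-8") as fout:
        writer = csv.DictWriter(fout, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    # Self-contained demo on a temporary file instead of the real dataset.
    # "Label" is a hypothetical extra column that is passed through unchanged.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "airlines_reviews.csv")
        dst = os.path.join(tmp, "processed_data.csv")
        with open(src, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow([TEXT_COLUMN, "Label"])
            writer.writerow(["Great flight!! 10/10", "positive"])
            writer.writerow(["???", "negative"])
        print(preprocess_csv(src, dst))  # 1: the second review is empty after cleaning
```

Keeping the cleaning step in its own function mirrors the idea of a shared preprocessing module: the same clean_text can be reused by every modelling notebook so all models see identically normalized input.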

Collaboration

Tasks were divided among group members, but everyone was available to answer questions and resolve issues that arose during the project.
