Short-Text-Classification

Short-form text classification for Twitter data.

This repository contains the final project for the Master's-level Machine Learning course at The University of Chicago.

Project scope

Build a binary classifier that categorizes a given tweet based on whether or not it refers to a real-life disaster event.

Project motivation

The field of NLP has evolved rapidly in the past few years, leading to innovations across the analysis pipeline. Through this project, we aim to examine how much these newer techniques improve results when applied to a classic NLP dataset, disaster tweets.

Analysis framework

Data Source

Kaggle Disaster Tweet Classification

Data Exploration

We used LDA topic modeling to cluster and identify key features of the disaster vs. non-disaster tweets.
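As an illustration of what LDA topic modeling does, here is a minimal sketch of a collapsed Gibbs sampler in plain Python. It is not the code from the project notebooks: the function names `lda_gibbs` and `top_words`, the hyperparameter values, and the toy tweets are all assumptions made for this example.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents (illustrative)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word, doc_topic


def top_words(topic_word, n=3):
    """Most frequent words assigned to each topic."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


# Toy tokenized tweets, invented for this example.
docs = [
    "wildfire evacuation smoke wildfire".split(),
    "flood evacuation rescue flood".split(),
    "concert tickets tonight concert".split(),
    "new album tonight tickets".split(),
]
topic_word, doc_topic = lda_gibbs(docs)
topics = top_words(topic_word)
```

Inspecting `topics` gives the top words per topic, and each row of `doc_topic` shows how a tweet's tokens are split across topics. On a corpus this small the topics are not guaranteed to separate cleanly.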

Embedding

We explored the following embedding techniques:

1. TF-IDF
2. GloVe
3. BERT sentence-level embedding
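To make the first technique in the list above concrete, the sketch below computes TF-IDF weights from scratch using only the standard library. The `tfidf` function, the smoothed IDF variant, and the sample tweets are assumptions for illustration, not the project's actual vectorizer.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = {}
        for term, count in counts.items():
            tf = count / len(doc)                               # relative term frequency
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1   # smoothed IDF
            vec[term] = tf * idf
        vectors.append(vec)
    return vectors


# Toy tokenized tweets, invented for this example.
tweets = [
    ["forest", "fire", "near", "town"],
    ["fire", "emoji", "playlist"],
    ["flood", "warning", "near", "river"],
]
vectors = tfidf(tweets)
```

In the first tweet, "forest" appears in only one document while "fire" appears in two, so "forest" receives the larger weight even though both occur once.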

Modeling

We tested the following modeling options:

1. Naive Bayes (baseline)
2. SVM with linear/non-linear kernel (baseline)
3. CNN
4. Simple RNN
5. RNN with LSTM
6. BERT with max pooling layer
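For the first baseline in the list above, a multinomial Naive Bayes classifier with add-one smoothing can be sketched in a few lines. The `NaiveBayes` class, the label convention (1 for disaster, 0 otherwise), and the training examples are assumptions for illustration and do not reproduce the notebooks.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes over tokenized documents (illustrative)."""

    def fit(self, docs, labels, smoothing=1.0):
        self.classes = sorted(set(labels))
        self.vocab = {w for doc in docs for w in doc}
        self.smoothing = smoothing
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        class_counts = Counter(labels)
        self.log_prior = {
            c: math.log(class_counts[c] / len(labels)) for c in self.classes
        }
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        def score(c):
            # Log prior plus smoothed log likelihood of each known word.
            denom = self.totals[c] + self.smoothing * len(self.vocab)
            return self.log_prior[c] + sum(
                math.log((self.word_counts[c][w] + self.smoothing) / denom)
                for w in doc
                if w in self.vocab  # words never seen in training are skipped
            )

        return max(self.classes, key=score)


# Toy training data, invented for this example: 1 = disaster, 0 = not a disaster.
train_docs = [
    ["earthquake", "hits", "city"],
    ["flood", "destroys", "homes"],
    ["my", "mixtape", "is", "fire"],
    ["this", "movie", "was", "a", "disaster"],
]
train_labels = [1, 1, 0, 0]
model = NaiveBayes().fit(train_docs, train_labels)
pred_disaster = model.predict(["earthquake", "destroys", "city"])
pred_other = model.predict(["my", "movie", "mixtape"])
```

The last two training tweets show why the task is not trivial: words such as "fire" and "disaster" are often used figuratively, so word counts alone give only a baseline.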

Fine-Tuning

For the CNN and RNN architectures, we used the following search strategies to find optimized hyperparameters:

1. Random Search
2. Hyperband
3. Bayesian optimization
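The simplest of the strategies above, random search, can be sketched as follows. The search space, the `toy_objective` function (a stand-in for a validation score from training a model), and the `random_search` helper are all hypothetical and are not taken from the tuning notebooks.

```python
import math
import random


def random_search(objective, space, n_trials=20, seed=0):
    """Sample random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Draw one value per hyperparameter, independently and uniformly.
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space for a small neural network.
space = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "dropout": [0.1, 0.3, 0.5],
    "units": [32, 64, 128],
}


def toy_objective(params):
    # Stand-in for a validation score; a real objective would train and evaluate a model.
    return -abs(math.log10(params["learning_rate"]) + 3) - abs(params["dropout"] - 0.3)


best_params, best_score = random_search(toy_objective, space)
```

Hyperband and Bayesian optimization start from the same loop but spend the trial budget differently: Hyperband stops weak configurations early, while Bayesian optimization uses earlier scores to choose the next configuration.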

Conclusion

Overall, we found that the choice of embedding provided the largest performance boost at the lowest cost in efficiency.

Project members

Ada Jing: GitHub; LinkedIn

Dylan Zhang: GitHub; LinkedIn

Rohit Satishchandra: LinkedIn

Repository files

1.1_Data_Clean.ipynb
2.1_Topic_Modeling.ipynb
3.1_BERT.ipynb
3.2_RNN_GloVe.ipynb
3.3_CNN_GloVe.ipynb
4.1_CNN_Tuning.ipynb
4.2_RNN_Tuning.ipynb
LICENSE
README.md
