bengsoon/NYT_topic_modeling


Topic Modeling with The New York Times Headlines (Aug 2019 - Jul 2022)

This repository contains the work I prepared for a talk on Topic Modeling, titled "What Can Machine Learning Do with Your Unstructured Data?".

The model used was BERTopic. The work covers how semantically similar documents (in this case, NYT headlines) tend to lie closer together in a vector space. It also gives a general idea of Dynamic Topic Modeling, looking at how the frequencies of the topics / themes evolve over time.
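The two ideas above can be illustrated with a minimal, standard-library-only sketch. It does not use BERTopic or the code in this repo: the headlines, the 3-dimensional "embeddings", and the topic assignments below are all made up for illustration (real sentence embeddings have hundreds of dimensions), and the helper names `cosine_similarity` and `topic_counts_by_period` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; closer to 1 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented headlines with invented 3-d vectors (NOT real model output).
toy_embeddings = {
    "Fed raises interest rates again": [0.9, 0.1, 0.0],
    "Central bank signals further rate hikes": [0.8, 0.2, 0.1],
    "Team wins championship in overtime": [0.0, 0.1, 0.9],
}


def topic_counts_by_period(assignments):
    """Count how many documents fall into each topic per time period.

    `assignments` is an iterable of (period, topic) pairs, e.g. one pair
    per headline after a topic model has labelled it.
    """
    counts = defaultdict(Counter)
    for period, topic in assignments:
        counts[period][topic] += 1
    return {period: dict(c) for period, c in counts.items()}


# Invented (month, topic) labels, standing in for a topic model's output.
toy_assignments = [
    ("2020-03", "economy"),
    ("2020-03", "health"),
    ("2020-03", "health"),
    ("2020-04", "health"),
]

if __name__ == "__main__":
    fed, bank, sport = toy_embeddings.values()
    # The two finance headlines score higher than the finance/sport pair.
    print("finance vs finance:", cosine_similarity(fed, bank))
    print("finance vs sport:  ", cosine_similarity(fed, sport))
    # Per-period topic counts are the raw material for a topics-over-time plot.
    print(topic_counts_by_period(toy_assignments))
```

In the sketch, semantic closeness is just a higher cosine similarity between vectors, and the "dynamic" part is a per-period tally of topic labels; a library such as BERTopic handles the embedding, clustering, and binning steps at scale.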

Reproducibility

Because of GitHub's limits on large file storage, I have decided not to push the model artifacts to this repo. However, you can reproduce them by cloning the repo onto a local GPU-enabled machine or a GPU-enabled Google Colab instance:

    git clone https://github.com/bengsoon/NYT_topic_modeling/

Within the cloned folder, create the conda environment:

    conda env create -f environment.yml

Run the Streamlit app:

    cd app
    streamlit run app.py

Viewing Results in Web App

I have created a Streamlit app that presents the results of the topic modeling: https://nyt-topicmodel.streamlitapp.com/

About

BERT-based Topic Modeling on New York Times Headlines (160k rows)
