Dynamic Trend & Event Detector

Team 25 — Predictive2 | Abhishek Meena & Meenaksh Singhania

Detecting and tracking evolving topics over time using probabilistic topic models and neural embeddings on 10 years of HuffPost news data.


Problem Statement

Organizations monitor information streams to detect emerging narratives and societal shifts. This project builds a system to detect, track, and score trends across news categories from 2012–2022.

Dataset

  • Source: HuffPost News Category Dataset
  • Size: ~210,000 articles
  • Period: 2012–2022
  • Features: headline, short_description, category, date, authors, link
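As a minimal loading sketch, the snippet below assumes the Kaggle file is in JSON Lines form (one JSON object per line) with ISO-formatted `date` strings, and that records carry the fields listed above. The helper names `load_articles` and `articles_per_year` are hypothetical and are not taken from `src/`; the inline sample stands in for `data/raw/News_Category_Dataset_v3.json`.

```python
import io
import json
from collections import Counter
from datetime import date


def load_articles(lines):
    """Parse JSON Lines records, skipping rows without a headline or date."""
    articles = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        record = json.loads(raw)
        if not record.get("headline") or not record.get("date"):
            continue
        # Assumption: dates are ISO strings such as "2012-01-28".
        record["date"] = date.fromisoformat(record["date"])
        articles.append(record)
    return articles


def articles_per_year(articles):
    """Count articles per calendar year, a first step for temporal EDA."""
    return Counter(article["date"].year for article in articles)


# Tiny in-memory stand-in for the real file (illustrative values only).
sample = io.StringIO(
    '{"headline": "A", "short_description": "x", "category": "POLITICS", '
    '"date": "2012-01-28", "authors": "", "link": ""}\n'
    '{"headline": "B", "short_description": "y", "category": "WELLNESS", '
    '"date": "2022-09-23", "authors": "", "link": ""}\n'
    '{"headline": "", "short_description": "z", "category": "TRAVEL", '
    '"date": "2022-09-22", "authors": "", "link": ""}\n'
    '{"headline": "C", "short_description": "w", "category": "TRAVEL", '
    '"date": "2022-01-02", "authors": "", "link": ""}\n'
)
articles = load_articles(sample)
counts = articles_per_year(articles)
```

To run this against the real data, pass an open file handle instead of `sample`, for example `open("data/raw/News_Category_Dataset_v3.json", encoding="utf-8")`.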

Project Structure

Dynamic-Trend-Event-Detector/
├── data/
│   ├── raw/                  # Original dataset (not tracked by git)
│   └── processed/            # Cleaned features (not tracked by git)
├── notebooks/
│   ├── 01_eda.ipynb          # Exploratory data analysis
│   ├── 02_features.ipynb     # Feature engineering
│   └── 03_models.ipynb       # Baseline + LDA models
├── src/
│   ├── preprocess.py         # Text cleaning pipeline
│   ├── features.py           # Feature engineering functions
│   └── models.py             # Model training and evaluation
├── results/
│   ├── plots/                # All generated figures
│   └── metrics/              # Ablation table, scores
├── report/
│   ├── main.tex              # LaTeX conference report
│   └── refs.bib              # Bibliography
├── config.yaml               # All hyperparameters in one place
└── requirements.txt

Setup

# 1. Clone the repo
git clone https://github.com/meenaksh06/Dynamic-Trend-Event-Detector.git
cd Dynamic-Trend-Event-Detector

# 2. Install dependencies
pip install -r requirements.txt

# 3. Download dataset from Kaggle and place it at:
#    data/raw/News_Category_Dataset_v3.json

# 4. Run notebooks in order
jupyter notebook

Model Pipeline

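| Phase    | Model               | Type             | Metric                     |
|----------|---------------------|------------------|----------------------------|
| Baseline | TF-IDF Top-K        | Frequency-based  | Qualitative                |
| Phase 1  | LDA (k=tuned)       | Probabilistic    | Coherence c_v + Perplexity |
| Phase 2  | BERTopic            | Neural Embedding | TBD                        |
| Phase 3  | Hybrid (LDA + BERT) | Hybrid           | TBD                        |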
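To illustrate the idea behind the TF-IDF Top-K baseline, here is a dependency-free sketch that pools all text from one year into a single document and ranks terms by a smoothed TF-IDF score. This is an assumption about how such a baseline could work, not the code in `src/models.py`; the function names, the per-year pooling, the tokenizer, and the smoothing formula are all illustrative choices.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep alphabetic tokens only (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def top_k_terms_by_year(docs, k=3):
    """docs: iterable of (year, text) pairs. Returns {year: [top-k terms]}."""
    term_counts = defaultdict(Counter)
    for year, text in docs:
        term_counts[year].update(tokenize(text))

    n_years = len(term_counts)
    # Document frequency: in how many yearly slices each term appears.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    result = {}
    for year, counts in term_counts.items():
        total = sum(counts.values())
        scores = {
            # Smoothed idf: ln((1 + n) / (1 + df)) + 1, so no term scores zero.
            term: (count / total)
            * (math.log((1 + n_years) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        result[year] = ranked[:k]
    return result


# Illustrative toy input, not drawn from the dataset.
docs = [
    (2012, "election debate election"),
    (2012, "election results"),
    (2020, "pandemic lockdown pandemic"),
    (2020, "election pandemic"),
]
top_terms = top_k_terms_by_year(docs, k=2)
```

Because this baseline has no numeric quality metric, its output is inspected by eye, which matches the "Qualitative" entry in the table.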

Ablation Table (Phase 1)

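| Model            | Coherence (c_v) | Perplexity  | Notes             |
|------------------|-----------------|-------------|-------------------|
| TF-IDF Baseline  |                 |             | Qualitative top-K |
| LDA (Adv ML)     | updating...     | updating... | Tuned num_topics  |
| Hybrid (Phase 3) | TBD             | TBD         | Future work       |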
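
For reference when reading the Perplexity column, the standard held-out perplexity for a topic model is sketched below. The symbols are introduced here for illustration and do not come from the report: $M$ is the number of held-out documents, $\mathbf{w}_d$ is the word sequence of document $d$, and $N_d$ is its length. Lower values indicate a better fit to the held-out text. Note that some libraries report a per-word log-likelihood bound rather than this quantity directly.

```latex
\[
\operatorname{perplexity}(D_{\text{test}})
  = \exp\!\left( - \frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right)
\]
```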

Team

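| Member             | Phase 1 Responsibility                                                            |
|--------------------|-----------------------------------------------------------------------------------|
| Abhishek Meena     | Repo setup, temporal EDA, TF-IDF baseline, LaTeX methods/results                  |
| Meenaksh Singhania | Text preprocessing, LDA model + tuning, literature review, LaTeX related work     |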
