Review Analytics Engine

End-to-end NLP pipeline for classifying and summarizing product reviews. Compares classical ML baselines (Logistic Regression, Random Forest, MLP) against a fine-tuned BERT model, with MLflow experiment tracking and a Streamlit dashboard for exploring results.

Tech Stack

Models: PyTorch, Hugging Face Transformers (BERT), Scikit-learn
Tracking: MLflow
Data: Pandas, NumPy, Snowflake connector (CSV fallback for local dev)
Dashboard: Streamlit, Matplotlib, Seaborn
Validation: Pydantic

Setup

python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt

Usage

1. Fetch data

Downloads Amazon product reviews from Hugging Face Datasets:

python scripts/fetch_data.py --samples 50000

2. Train models

Trains the classical baselines and, optionally, fine-tunes BERT:

python scripts/train.py                # full pipeline (includes BERT)
python scripts/train.py --skip-bert    # classical models only (faster)
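
For readers curious how a switch like --skip-bert might be wired up, here is a minimal sketch using argparse from the standard library. It is not taken from scripts/train.py; the function names and the stage names "classical" and "bert" are hypothetical and only illustrate the idea of an opt-out flag.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing a --skip-bert switch like the one shown above."""
    parser = argparse.ArgumentParser(description="Train review classifiers.")
    parser.add_argument(
        "--skip-bert",
        action="store_true",  # flag is False unless passed on the command line
        help="train only the classical baselines and skip BERT fine-tuning",
    )
    return parser


def stages_to_run(argv: list[str]) -> list[str]:
    """Return the (hypothetical) stage names selected by the given arguments."""
    args = build_parser().parse_args(argv)
    stages = ["classical"]  # assumed to always run
    if not args.skip_bert:
        stages.append("bert")
    return stages
```

Calling stages_to_run with an empty argument list selects both stages, while passing ["--skip-bert"] leaves only the classical one.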

3. View results

python scripts/evaluate.py             # prints comparison table
streamlit run dashboard/app.py         # interactive dashboard
mlflow ui                              # experiment tracking UI

Architecture

scripts/fetch_data.py  -->  data/raw/reviews.csv
                                  |
                          src/data/loader.py  (Snowflake or CSV)
                                  |
                        src/data/preprocessing.py  (cleaning, TF-IDF)
                                  |
                    +-------------+-------------+
                    |                           |
          src/models/classical.py     src/models/bert_classifier.py
            (LogReg, RF, MLP)           (BERT fine-tuning)
                    |                           |
                    +-------------+-------------+
                                  |
                      src/evaluation/metrics.py
                                  |
                      src/tracking/experiment.py  (MLflow)
                                  |
                        dashboard/app.py  (Streamlit)
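
In the diagram above, both model branches converge on a shared evaluation step. The sketch below shows one way such a step could score every model against the same labels. It is an illustration only, not the contents of src/evaluation/metrics.py: the function names are hypothetical, and the metrics are written in plain Python to keep the example dependency-free (scikit-learn's accuracy_score and f1_score cover the same ground).

```python
from collections import Counter


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def compare(y_true: list[str], predictions_by_model: dict[str, list[str]]) -> dict:
    """Score each model's predictions against the same held-out labels."""
    return {
        name: {"accuracy": accuracy(y_true, preds), "macro_f1": macro_f1(y_true, preds)}
        for name, preds in predictions_by_model.items()
    }
```

Scoring all models through one function keeps the comparison fair: the classical baselines and BERT are judged on identical labels with identical metric definitions.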

Configuration

All hyperparameters and data source settings live in config/config.yaml. To read from Snowflake instead of the local CSV, set data.source to snowflake and supply credentials through environment variables.
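
As a rough illustration of the source switch described above, the sketch below picks between CSV and Snowflake from an already-parsed config dictionary and checks that credentials are present. Several details are assumptions: the environment variable names, the resolve_source function, and the choice to default to CSV when data.source is absent are all hypothetical, since the README does not specify them.

```python
import os
from typing import Mapping

# Hypothetical variable names; the README does not say which ones the loader reads.
REQUIRED_SNOWFLAKE_VARS = ("SNOWFLAKE_ACCOUNT", "SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD")


def resolve_source(config: Mapping, env: Mapping[str, str] = os.environ) -> str:
    """Return "snowflake" or "csv" based on data.source, checking credentials."""
    # Assumption: fall back to the local CSV when data.source is not set.
    source = config.get("data", {}).get("source", "csv")
    if source == "csv":
        return "csv"
    if source != "snowflake":
        raise ValueError(f"unknown data.source: {source!r}")
    # Fail early with a clear message instead of at connection time.
    missing = [name for name in REQUIRED_SNOWFLAKE_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing Snowflake credentials: " + ", ".join(missing))
    return "snowflake"
```

Reading credentials from the environment rather than from config/config.yaml keeps secrets out of version control, which fits the approach the README describes.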

Tests

pytest tests/ -v
