A full-stack real-time analytics project that simulates, processes, stores, and visualizes live user interaction data from an e-commerce platform. This system combines data engineering, stream processing, machine learning, and interactive dashboards to support live decision-making.
This project demonstrates how to build a real-time pipeline to analyze user behavior events like product clicks, cart additions, and purchases using:
- Apache Kafka for live event ingestion
- Apache Spark Structured Streaming for stream processing
- PostgreSQL for data storage
- FastAPI for model inference
- Streamlit for real-time dashboarding
- Docker Compose for orchestration
```
realtime-user-analytics/
├── kafka_producer/     # Simulates real-time user events
├── spark_streaming/    # Stream processing jobs
├── postgres_writer/    # PostgreSQL writer functions
├── ml_model/           # Churn model training + FastAPI serving
├── dashboard/          # Streamlit dashboard UI
├── config/             # Centralized configs
├── docker/             # Docker Compose setup
├── requirements.txt    # Python dependencies
└── run_all.sh          # Optional unified runner
```
| Step | Component | Description |
|---|---|---|
| 1 | Kafka Producer | Sends user events (click, add_to_cart, purchase) every second |
| 2 | Spark Streaming | Reads Kafka stream, applies windowed aggregations |
| 3 | PostgreSQL Writer | Writes KPIs to tables like funnel_summary and event_trends |
| 4 | Churn Model Training | Offline RandomForest model trained on session features |
| 5 | FastAPI Model Server | Serves churn predictions via REST API |
| 6 | Streamlit Dashboard | Displays real-time metrics + churn predictions |
- Funnel Summary: Tracks unique users per event type in 1-minute windows
- Event Trends: Count of events over time
- Churn Risk Prediction: Live scoring of user sessions (click + cart activity + session duration)
Sample `funnel_summary` output:

| window_start | window_end | event_type | unique_users | batch_time |
|---|---|---|---|---|
| 2025-07-08 05:35:00 | 2025-07-08 05:36:00 | add_to_cart | 10 | 2025-07-08 11:05:30 |

Sample `event_trends` output:

| window_start | window_end | event_type | count | batch_time |
|---|---|---|---|---|
| 2025-07-08 05:35:00 | 2025-07-08 05:36:00 | click | 35 | 2025-07-08 11:05:30 |
- Model: `RandomForestClassifier`
- Features:
  - Number of clicks (`num_views`)
  - Number of add-to-cart events (`num_cart_adds`)
  - Session duration (seconds)
- Training Script: `ml_model/churn_model_train.py`
- Serving Script: `ml_model/churn_predictor.py` (FastAPI)
```bash
cd docker/
docker-compose up
python kafka_producer/producer.py
spark-submit spark_streaming/aggregator.py
spark-submit spark_streaming/funnel_analysis.py
python ml_model/churn_model_train.py
uvicorn ml_model.churn_predictor:app --reload
streamlit run dashboard/app.py
```

- ✅ End-to-end pipeline with real-time ingestion, storage, ML inference, and visualization
- ✅ Seamless integration of Spark, Kafka, PostgreSQL, FastAPI, and Streamlit
- ✅ Modular and extensible codebase
- Online model training
- Session-based recommender
- Alerts for anomaly detection
- Kafka consumer for downstream services
Synthetic user events are generated in `producer.py`. You can customize the event frequency, user/product IDs, and event types.
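A sketch of that event generator follows; the topic name, ID ranges, and event-type weights are illustrative defaults, not the project's actual values:

```python
import json
import random
import time
from datetime import datetime, timezone

EVENT_TYPES = ["click", "add_to_cart", "purchase"]

def generate_event(num_users=100, num_products=50):
    # One synthetic interaction event, mirroring the fields the pipeline aggregates.
    # The weights skew toward clicks, as a funnel would; tune them as needed.
    return {
        "user_id": f"user_{random.randint(1, num_users)}",
        "product_id": f"product_{random.randint(1, num_products)}",
        "event_type": random.choices(EVENT_TYPES, weights=[0.7, 0.2, 0.1])[0],
        "event_time": datetime.now(timezone.utc).isoformat(),
    }

if __name__ == "__main__":
    from kafka import KafkaProducer  # kafka-python

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    while True:
        producer.send("user_events", generate_event())  # topic name assumed
        time.sleep(1)  # one event per second, matching the pipeline table
```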
Harshal Patil, MS in Data Science, University at Buffalo




