Reddit Stream Sentiment Analysis

This project performs real-time and batch sentiment analysis on Reddit posts using Apache Spark Structured Streaming, Kafka, PostgreSQL, and VADER Sentiment Analysis.

It ingests Reddit posts, filters them by keywords, analyzes their sentiment, aggregates results over time windows, and stores outputs in both Kafka and PostgreSQL.

🛠 Architecture Overview

Real-time Streaming

1. Filter Reddit posts from the Kafka topic `reddit-raw-posts` by a keyword and publish the matches to `reddit-keyword-filtered`.
2. Analyze the sentiment of the filtered posts from `reddit-keyword-filtered` using VADER.
3. Aggregate the counts of positive, negative, and neutral posts per 1-minute window.
4. Save the results to PostgreSQL and to Kafka (`sentiment_analyse` topic).
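
The first two streaming steps can be sketched in plain Python. This is a minimal illustration, not the project's code: the `title` field follows the topic description in this README, while the case-insensitive match and the ±0.05 cut-offs (the thresholds conventionally recommended for VADER's `compound` score) are assumptions about how the jobs behave.

```python
import json


def matches_keyword(raw_message: str, keyword: str) -> bool:
    """Return True if the post's title contains the keyword.

    Assumption: the raw Kafka message is a JSON object with a "title" field,
    and matching is case-insensitive.
    """
    post = json.loads(raw_message)
    title = post.get("title") or ""
    return keyword.lower() in title.lower()


def label_from_compound(compound: float) -> str:
    """Map a VADER compound score in [-1, 1] to a sentiment label.

    Assumption: the conventional VADER thresholds of +/-0.05 are used.
    """
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"
```

In the actual pipeline the compound score would come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` in the `vaderSentiment` package.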

Batch Processing

1. Load the saved sentiment data from PostgreSQL (`sentiment_results` table).
2. Aggregate the data in batch by ingestion timestamp (1-minute windows).
3. Write the aggregated results to the PostgreSQL table `sentiment_aggregated_batch`.
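
The 1-minute windowing used by both the streaming and the batch aggregation can be illustrated without Spark. The sketch below assumes each row is a hypothetical `(ingestion_timestamp, label)` pair and uses tumbling (non-overlapping) windows aligned to the minute. In PySpark the same grouping is typically expressed with `pyspark.sql.functions.window(col, "1 minute")`.

```python
from collections import Counter
from datetime import datetime


def window_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 1-minute tumbling window."""
    return ts.replace(second=0, microsecond=0)


def aggregate_by_minute(rows):
    """Count sentiment labels per 1-minute window.

    rows: iterable of (ingestion_timestamp, label) pairs (hypothetical shape).
    Returns a dict mapping window start -> Counter of labels.
    """
    counts = {}
    for ts, label in rows:
        counts.setdefault(window_start(ts), Counter())[label] += 1
    return counts


# Made-up sample rows, for illustration only.
rows = [
    (datetime(2024, 1, 1, 12, 0, 5), "positive"),
    (datetime(2024, 1, 1, 12, 0, 40), "negative"),
    (datetime(2024, 1, 1, 12, 1, 10), "positive"),
]
result = aggregate_by_minute(rows)
```

Here the first two rows fall into the 12:00 window and the third into the 12:01 window.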

📚 Technologies Used

Apache Spark Structured Streaming & Batch Processing
Apache Kafka
PostgreSQL
VADER Sentiment Analysis (Python)
Python (PySpark)

📦 Kafka Topics

| Topic Name | Description |
| --- | --- |
| `reddit-raw-posts` | Raw, unfiltered Reddit post JSON data. |
| `reddit-keyword-filtered` | Posts filtered by a keyword in the title. |
| `sentiment_analyse` | Posts with their sentiment analysis result. |
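
Since the topics carry JSON, each Kafka message value is a UTF-8 encoded JSON document. The helpers below sketch that encoding. The field names in `example` are hypothetical, because this README does not specify the message schema.

```python
import json


def serialize_post(post: dict) -> bytes:
    """Encode a post as UTF-8 JSON bytes, suitable as a Kafka message value."""
    return json.dumps(post).encode("utf-8")


def deserialize_post(value: bytes) -> dict:
    """Decode a Kafka message value back into a dict."""
    return json.loads(value.decode("utf-8"))


# Hypothetical message; the real schema may differ.
example = {"id": "abc123", "title": "Example title", "sentiment": "neutral"}
round_tripped = deserialize_post(serialize_post(example))
```

With `kafka-python`, a function like `serialize_post` can be passed as the `value_serializer` argument of `KafkaProducer`.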

🗄 PostgreSQL Tables

| Table Name | Description |
| --- | --- |
| `reddit_keyword_filtered` | Raw posts after keyword filtering. |
| `sentiment_results` | Posts with sentiment labels and ingestion timestamps. |
| `sentiment_aggregated` | Real-time aggregated sentiment counts over 1-minute windows. |
| `sentiment_aggregated_batch` | Batch-aggregated sentiment counts (offline analysis). |

🚀 How to Run Locally

1. Start Services
   - Start Zookeeper, Kafka, and PostgreSQL.
2. Create Database
   ```
   createdb reddit_stream_db
   ```
3. Install Required Python Packages
   ```
   pip install pyspark kafka-python vaderSentiment
   ```
4. Download PostgreSQL JDBC Driver
5. Run Streaming Jobs
   - Start the producer in the ingestion folder.
   - Run reddit_keyword_filter.py.
   - Run sentiment_analysis.py.
   - Run sentiment_window_aggregator.py.
6. Run Batch Job (Optional)
   - After some data has been collected, run sentiment_batch_aggregator.py.
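
The JDBC driver is what lets the Spark jobs read from and write to PostgreSQL. As a rough sketch, the connection options for the `reddit_stream_db` database created above might look like the following. The host, port, user, and password are assumptions for a default local setup, not values taken from the project.

```python
def jdbc_options(table: str,
                 host: str = "localhost",      # assumed local PostgreSQL
                 port: int = 5432,             # PostgreSQL default port
                 database: str = "reddit_stream_db",
                 user: str = "postgres",       # assumed user
                 password: str = "") -> dict:
    """Build Spark JDBC data source options for a PostgreSQL table."""
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }


opts = jdbc_options("sentiment_results")
```

These options can be passed to Spark with `spark.read.format("jdbc").options(**opts).load()`, provided the driver JAR is on the classpath (for example via `spark-submit --jars`).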
