mfriebel/tweet_bot
Tweets Collection and Analysis Pipeline

This project implements a data pipeline using Docker. Tweets about sustainability are streamed via a Tweepy listener and stored in a MongoDB database. An ETL job performs live sentiment analysis (using VADER) on the stored tweets and loads them, together with their scores, into a PostgreSQL database. Finally, the tweets with the most positive sentiment are posted to Slack via a webhook.
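As a sketch of the final step, assuming the ETL job has already attached a VADER compound score (between -1 and 1) to each tweet — the dict keys, threshold, and function names below are illustrative, not taken from the project:

```python
import json

def pick_top_tweets(tweets, n=3):
    """Return the n tweets with the highest VADER compound score.

    `tweets` is a list of dicts with (hypothetical) keys
    'text' and 'compound', as the ETL job might load them
    from PostgreSQL.
    """
    return sorted(tweets, key=lambda t: t["compound"], reverse=True)[:n]

def slack_payload(tweet):
    """Build the JSON body a Slack incoming webhook expects."""
    return json.dumps({"text": tweet["text"]})

# Toy data standing in for rows loaded from PostgreSQL.
tweets = [
    {"text": "Solar is amazing!", "compound": 0.82},
    {"text": "Traffic again...", "compound": -0.40},
    {"text": "Nice recycling initiative", "compound": 0.60},
]
top = pick_top_tweets(tweets, n=2)
```

Posting `slack_payload(top[0])` to the webhook URL (e.g. with `requests.post`) would then publish the most positive tweet.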

Pipeline

Docker Compose necessities
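A possible shape for the `docker-compose.yml` tying the services together — the service names, images, and credentials below are assumptions for illustration, not the project's actual file:

```yaml
# Illustrative docker-compose.yml sketch; adjust names and
# build paths to match the actual repository layout.
version: "3"
services:
  mongodb:
    image: mongo
    ports:
      - "27017:27017"
  postgresdb:
    image: postgres
    environment:
      - POSTGRES_PASSWORD=postgres
  tweet_collector:        # runs get_tweets.py (Tweepy listener)
    build: tweet_collector/
    depends_on:
      - mongodb
  etl_job:                # VADER sentiment analysis, Mongo -> Postgres
    build: etl_job/
    depends_on:
      - mongodb
      - postgresdb
  slack_bot:              # posts top tweets via the Slack webhook
    build: slack_bot/
    depends_on:
      - postgresdb
```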

Setting up local environment variables
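The containers read their credentials from environment variables. A minimal sketch of how a service script might pick them up — the variable names here are assumptions; substitute the ones your `.env` file actually defines:

```python
import os

def load_config():
    """Read pipeline credentials from the environment.

    The variable names below are illustrative. os.environ.get()
    returns the fallback value when a variable is unset, so the
    script fails gracefully instead of raising KeyError.
    """
    return {
        "twitter_api_key": os.environ.get("TWITTER_API_KEY", ""),
        "postgres_password": os.environ.get("POSTGRES_PASSWORD", ""),
        "slack_webhook_url": os.environ.get("SLACK_WEBHOOK_URL", ""),
    }

os.environ["TWITTER_API_KEY"] = "dummy-key"  # simulate an exported variable
config = load_config()
```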

Changing streaming filter

The file get_tweets.py contains the Tweepy tweet listener. The topic filter is found at the end of the file: stream.filter(track=['sustainable'], languages=['en'])
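To stream tweets about a different topic, edit the track list passed to the filter; for example (the keywords below are illustrative):

```python
# In get_tweets.py, the last line controls what is streamed.
# Swap the track keywords to follow a different topic, e.g.:
track = ['renewable', 'solar', 'wind power']  # illustrative keywords
languages = ['en']                            # ISO 639-1 language codes
# stream.filter(track=track, languages=languages)  # requires a live tweepy Stream
```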
