
pessini/data-pipeline-blog


🎲 Lottery Pipeline Project


This repository contains a production-inspired end-to-end data pipeline for official Brazilian lottery results. The project shows how to collect, store, transform, and visualize real-world data using modern Python tools and Docker.

Project Overview

The pipeline processes lottery results from games like Quina and Mega-Sena through these steps:

  1. Fetch Data from API – Pull historical lottery results as structured JSON.
  2. Store Raw Data in MinIO – Save each draw in an S3-compatible bucket.
  3. Transform and Query with DuckDB – Normalize fields and create queryable tables.
  4. Visualize with Streamlit – Interactive dashboard to explore results and statistics.
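The four steps above can be sketched in plain Python. In the project itself these steps run against MinIO and DuckDB; here an in-memory dict stands in for the bucket and `collections.Counter` stands in for a DuckDB aggregation, so the sketch runs with the standard library alone. This is a minimal sketch under assumptions: the payload field names (`numero`, `dataApuracao`, `listaDezenas`), the `raw/<game>/<contest>.json` key layout, and all helper names are hypothetical and not taken from the repository.

```python
import json
from collections import Counter
from datetime import datetime

# Hypothetical stand-in for the MinIO bucket: object key -> raw JSON bytes.
bucket: dict[str, bytes] = {}


def raw_key(game: str, contest: int) -> str:
    """Hypothetical key layout: one raw JSON object per draw."""
    return f"raw/{game}/{contest}.json"


def store_raw(game: str, payload: dict) -> str:
    """Step 2: save the untouched API payload under a deterministic key."""
    key = raw_key(game, payload["numero"])
    bucket[key] = json.dumps(payload).encode("utf-8")
    return key


def normalize(game: str, raw: bytes) -> dict:
    """Step 3: flatten one raw draw into a typed, query-friendly row."""
    payload = json.loads(raw)
    return {
        "game": game,
        "contest": int(payload["numero"]),
        # Assumes dates arrive as dd/mm/yyyy; store them as ISO 8601.
        "draw_date": datetime.strptime(payload["dataApuracao"], "%d/%m/%Y")
        .date()
        .isoformat(),
        # Assumes numbers arrive as zero-padded strings such as "07".
        "numbers": sorted(int(n) for n in payload["listaDezenas"]),
    }


def number_frequencies(rows: list[dict]) -> Counter:
    """Input for step 4: how often each number was drawn across all rows."""
    return Counter(n for row in rows for n in row["numbers"])


if __name__ == "__main__":
    # Made-up sample payload, for illustration only.
    sample = {"numero": 1234, "dataApuracao": "05/01/2024",
              "listaDezenas": ["58", "07", "19", "23", "41"]}
    key = store_raw("quina", sample)
    row = normalize("quina", bucket[key])
    print(key, row["draw_date"], row["numbers"])
    # prints: raw/quina/1234.json 2024-01-05 [7, 19, 23, 41, 58]
```

Keeping the raw payload untouched in step 2 means the transform in step 3 can be re-run at any time without calling the API again.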

Tech Stack

  • Prefect – Orchestrates workflows with retries, scheduling, and monitoring.
  • MinIO – Local S3-compatible data lake for storing raw JSON data.
  • DuckDB – Columnar database optimized for fast querying and transformations.
  • Streamlit – Interactive, user-friendly dashboard.
  • Docker – Ensures reproducibility with isolated, connected containers.
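In Prefect, retries are declared on a task (for example `@task(retries=3, retry_delay_seconds=10)`). To illustrate what that buys a pipeline that calls an external API, here is a minimal standard-library sketch of the same idea: retrying a flaky call with exponential backoff. It shows the concept only and is not Prefect's implementation; `with_retries` is a hypothetical helper.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying up to `retries` extra times with exponential backoff.

    Delays grow as base_delay * 2**attempt (1s, 2s, 4s, ... by default).
    `sleep` is injectable so the behaviour can be tested without waiting.
    """
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")  # the loop always returns or raises
```

With an orchestrator, this logic lives in configuration instead of in every fetch function, and each retry shows up in the monitoring UI.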

Quick Start

⚠️ Note: Please fork this repository before using it, so you can safely modify environment variables and experiment without affecting the original project.

All services are defined in the docker/ folder. To get the pipeline running:

  1. Copy the environment example:
     cp docker/.env.example docker/.env
  2. Edit docker/.env with your credentials if needed.
  3. Start everything with:
     docker compose -f docker/docker-compose.yaml up -d
  4. Access the services:
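Because `docker compose up -d` returns as soon as the containers are started, not when the services inside them are ready, a small readiness check can help before opening them. The following is a minimal standard-library sketch; `wait_for_port` is a hypothetical helper, not part of the repository.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Returns True once the port accepts connections, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful TCP handshake means something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused, unreachable, or timed out
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 8501)` waits for Streamlit's default port, assuming the compose file publishes it unchanged; check docker/docker-compose.yaml for the ports actually exposed.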

Series Articles

For a detailed walkthrough, check out the three-part series:

About

Source repository for the blog post series "Building End-to-End Data Pipelines: A Hands-On Guide for Data Scientists".
