This repository contains a production-inspired end-to-end data pipeline for official Brazilian lottery results. The project shows how to collect, store, transform, and visualize real-world data using modern Python tools and Docker.
The pipeline processes lottery results from games like Quina and Mega-Sena through these steps:
- Fetch Data from API – Pull historical lottery results as structured JSON.
- Store Raw Data in MinIO – Save each draw in an S3-compatible bucket.
- Transform and Query with DuckDB – Normalize fields and create queryable tables.
- Visualize with Streamlit – Interactive dashboard to explore results and statistics.
The pipeline is built with these tools:
- Prefect – Orchestrates workflows with retries, scheduling, and monitoring.
- MinIO – Local S3-compatible data lake for storing raw JSON data.
- DuckDB – Columnar database optimized for fast querying and transformations.
- Streamlit – Interactive, user-friendly dashboard.
- Docker – Ensures reproducibility with isolated, connected containers.
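To make the transform step described above more concrete, here is a minimal sketch in plain Python of what normalizing a raw draw could look like. It is not the project's code: the repository performs this step in DuckDB, and the field names (`game`, `draw_number`, `draw_date`, `numbers`), the helper names, and the two sample draws are all hypothetical, invented only for illustration.

```python
from collections import Counter

# Made-up sample data. The field names are assumptions for illustration,
# not the real schema returned by the lottery API.
RAW_DRAWS = [
    {"game": "Quina", "draw_number": "1", "draw_date": "2024-01-02",
     "numbers": ["05", "12", "33", "47", "80"]},
    {"game": "Quina", "draw_number": "2", "draw_date": "2024-01-03",
     "numbers": ["12", "19", "33", "51", "64"]},
]


def normalize_draw(raw):
    """Flatten one raw draw into one row per drawn number."""
    return [
        {
            "game": raw["game"].lower(),
            "draw_number": int(raw["draw_number"]),
            "draw_date": raw["draw_date"],
            "position": position,       # 1-based order within the draw
            "number": int(number),      # "05" -> 5
        }
        for position, number in enumerate(raw["numbers"], start=1)
    ]


def number_frequencies(rows):
    """Count how often each number appears across all normalized rows."""
    return Counter(row["number"] for row in rows)


rows = [row for raw in RAW_DRAWS for row in normalize_draw(raw)]
frequencies = number_frequencies(rows)
```

One row per drawn number keeps the table in a long format, which makes statistics such as number frequency a simple group-and-count, whether that is done in Python as here or in SQL.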
⚠️ Note: Please fork this repository before using it, so you can safely modify environment variables and experiment without affecting the original project.
All the services are set up in the /docker folder. To get the pipeline running:
- Copy the environment example:
cp docker/.env.example docker/.env
- Edit .env with your credentials if needed.
- Start everything with:
docker compose -f docker/docker-compose.yaml up -d
- Access services:
- MinIO API: http://localhost:17110
- MinIO Console: http://localhost:17111
- Prefect UI: http://localhost:17112
- Streamlit Dashboard: http://localhost:17113
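After starting the containers, it can be handy to confirm that each endpoint listed above responds. The following is a small sketch using only the Python standard library. The helper names `is_reachable` and `check_services` are hypothetical and not part of the repository, and the URLs assume the default ports shown above.

```python
import urllib.error
import urllib.request

# Default endpoints as listed above; adjust if you changed the ports.
SERVICES = {
    "MinIO API": "http://localhost:17110",
    "MinIO Console": "http://localhost:17111",
    "Prefect UI": "http://localhost:17112",
    "Streamlit Dashboard": "http://localhost:17113",
}


def is_reachable(url, timeout=2.0):
    """Return True if something answers HTTP at url, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so it is up.
        return True
    except (OSError, ValueError):
        # Connection refused, timeout, DNS failure, or malformed URL.
        return False


def check_services(services=SERVICES, probe=is_reachable):
    """Map each service name to whether its URL responded."""
    return {name: probe(url) for name, url in services.items()}
```

Calling `check_services()` returns a dictionary of service names to booleans. An HTTP error status is treated as reachable on purpose: an endpoint that rejects an unauthenticated request is still running.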
For a detailed walkthrough, check out the three-part series: