🏟️ Sports Data Warehouse – Football Analytics

📌 Overview

This project implements a Sports Data Warehouse for football data using a full Raw → Staging → Warehouse architecture. The goal is to ingest data from an external football API, model it using a star schema, and enable incremental loads and BI-style analytics queries.

The warehouse answers questions such as:

Who are the top scorers?
Which players are the most consistent?
How do teams and nationalities perform over time?

🏗️ Architecture

raw/                → JSON snapshots from API Football
staging/            → Cleaned & incremental staging tables (DuckDB)
warehouse/          → Star schema (dimensions + fact)
etl/                → SQL + Python ETL logic
analytics/          → BI queries & sanity checks

Data Flow

Raw: API responses stored as JSON snapshots
Staging:
- Base staging tables (flattened JSON)
- Incremental staging using snapshot_date
Warehouse:
- Dimension tables (dim_player, dim_team, dim_league)
- Fact table (fact_player_stats)
Incremental control via etl_control

⭐ Data Model (Star Schema)

Dimensions

dim_player: player attributes (name, age, nationality)
dim_team: team metadata
dim_league: league metadata

Fact

fact_player_stats

Grain:

One row per player – team – league – season

This grain is enforced and validated using a sanity check.

🔄 Incremental Loads

Incremental logic is driven by the etl_control table:

(source_name, last_snapshot)

Each ETL run:

Reads only snapshots newer than last_snapshot
Merges data into dimensions and fact tables
Updates etl_control

This avoids full reloads and supports historical corrections.

🧪 Data Quality & Sanity Checks

A grain validation query ensures the fact table has no duplicates:

SELECT
  COUNT(*) AS total_rows,
  COUNT(DISTINCT player_key, team_key, league_key, season) AS distinct_events
FROM fact_player_stats;

(Implemented via subquery due to DuckDB limitations.)

📊 Analytics

The analytics/ folder contains example BI queries, including:

Top scorers
Most appearances
Goals by team
Goals by nationality
All-round players (goals + assists)

These queries demonstrate how the warehouse can be used for football performance analysis.

🛠️ Tech Stack

DuckDB – analytical database
Python – orchestration
SQL – transformations & modeling
API Football – data source

🚀 Next Steps

Dockerize the project for reproducibility
Add automated ETL checks
Optional BI dashboard (Power BI / Superset)

Run with Docker

docker compose up


## 👤 Author

Built as a learning-focused end-to-end data engineering project.

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
.venv		.venv
analytics		analytics
etl		etl
raw/api_football/players/league=39/season=2023		raw/api_football/players/league=39/season=2023
staging		staging
warehouse		warehouse
Dockerfile		Dockerfile
README.md		README.md
docker-compose.yml		docker-compose.yml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🏟️ Sports Data Warehouse – Football Analytics

📌 Overview

🏗️ Architecture

Data Flow

⭐ Data Model (Star Schema)

Dimensions

Fact

🔄 Incremental Loads

🧪 Data Quality & Sanity Checks

📊 Analytics

🛠️ Tech Stack

🚀 Next Steps

Run with Docker

About

Uh oh!

Releases 1

Packages

Languages

ThiagoLedesma/Sports-Data-Warehouse

Folders and files

Latest commit

History

Repository files navigation

🏟️ Sports Data Warehouse – Football Analytics

📌 Overview

🏗️ Architecture

Data Flow

⭐ Data Model (Star Schema)

Dimensions

Fact

🔄 Incremental Loads

🧪 Data Quality & Sanity Checks

📊 Analytics

🛠️ Tech Stack

🚀 Next Steps

Run with Docker

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages