Building end-to-end decision systems for football: from data to actionable insights under real-world constraints. (v2.1)
End‑to‑end football recruitment modelling framework integrating performance analytics, risk modelling, financial constraints, scenario simulation, and mixed‑integer optimisation.
This project builds a club‑ready recruitment decision system, not just a ranking model.
It integrates:
- Multi‑source football data (FBref + Transfermarkt)
- Position‑aware talent modelling
- Player risk estimation
- Financial modelling via Total Cost of Ownership (TCO)
- Budget‑constrained squad optimisation (MILP)
- Scenario‑based uncertainty simulation
- CVaR robust optimisation
- Executive reporting outputs
- Real club recruitment case study
The goal is to simulate realistic recruitment decisions under financial and structural constraints, closer to how clubs actually allocate transfer budgets.
- End-to-end football recruitment decision system
- Risk-adjusted player evaluation
- Budget-constrained squad optimisation
- Scenario simulation under uncertainty
- Robust optimisation (CVaR)
- Real club recruitment case study
Technologies:
DuckDB · Python · MILP · Monte Carlo · CVaR optimisation
flowchart LR
A[FBref Data] --> D[DuckDB Data Warehouse]
B[Transfermarkt Data] --> D
D --> E[Feature Engineering]
E --> F[Talent Score]
E --> G[Risk Model]
E --> H[Financial Model]
F --> I[Deterministic Optimisation]
G --> I
H --> I
F --> J[Scenario Engine]
G --> J
J --> K[Robust CVaR Optimisation]
I --> L[Squad Selection]
K --> L
L --> M[Executive Reporting]
flowchart TD
A[Raw Football Data] --> B[Data Cleaning]
B --> C[Feature Engineering]
C --> D[Talent Score]
C --> E[Risk Metrics]
C --> F[Cost Proxy]
D --> G[Player Value]
E --> G
F --> H[TCO]
G --> I[Deterministic MILP]
G --> J[Scenario Simulation]
J --> K[CVaR Optimisation]
I --> L[Squad Selection]
K --> L
L --> M[Performance Analysis]
Robust optimisation significantly improves tail performance of the squad portfolio.
Deterministic optimisation maximises expected value, while CVaR optimisation reduces exposure to extreme negative scenarios.
Example result (Notebook 07):
| Budget (€M) | Mean | P10 | CVaR |
|---|---|---|---|
| 120 | 14.55 | -0.02 | -6.58 |
| 154 | 17.49 | 5.99 | 0.83 |
| 200 | 21.66 | 11.89 | 7.21 |
Increasing recruitment investment systematically improves downside performance.
- FBref + Transfermarkt ingestion
- Relational modelling in DuckDB
Fact tables:
fact_player_season_fbref_tm
fact_player_season_availability
fact_player_market_value
Key properties:
- strict season alignment
- deduplicated joins
- one row per
(player, season)
Position‑aware composite score:
Talent = w1·z(goals90) + w2·z(assists90) + w3·z(minutes)
Features:
- computed within positional groups
- Z‑score normalisation
- weighted aggregation
Produces a cross‑position comparable performance metric.
Risk = z(|age − 24|) + z(minutes volatility)
Captures:
- deviation from peak age
- playing time instability
Negative values indicate safer‑than‑average players.
Recruitment is treated as a capital allocation problem.
TCO = Transfer Fee + discounted wage stream
Assumptions:
- Wage ratio ≈ 15% of market value
- Contract length ≈ 4 years
- Discount rate ≈ 8%
Binary MILP formulation:
max Σ xᵢ (Talentᵢ − λ Riskᵢ)
Subject to:
- transfer budget
- squad size
- positional quotas
Solved using:
scipy.optimize.milp
HiGHS solver
Player value is simulated under uncertainty.
Two engines are implemented.
Models macro football environments:
- normal season
- moderate performance shock
- severe negative shock
Simulates player volatility with:
- common performance shocks
- idiosyncratic noise
To control downside risk, the optimisation problem is reformulated using Conditional Value at Risk (CVaR).
The model maximises expected squad value while protecting against worst‑case scenarios.
This produces downside‑aware recruitment strategies.
The final stage of the project demonstrates how the full decision system could be used by a professional football club.
Scenario:
- Mid-table Big 5 league club
- €200M recruitment budget
- 18-player squad construction
The notebook compares:
Deterministic optimisation - maximises expected squad value
Robust CVaR optimisation - protects against worst-case scenarios
Key analyses include:
- deterministic vs robust squad comparison
- player replacement analysis
- budget allocation across positions
- budget vs downside performance frontier
The case study illustrates how accounting for uncertainty materially changes recruitment decisions.
Across budgets:
- CVaR squads improve P10 and CVaR
- selected players exhibit lower uncertainty (σ)
Deterministic vs robust squads show large Hamming distances, indicating fundamentally different recruitment strategies.
Robust optimisation sacrifices some expected value but dramatically improves downside protection.
This mirrors classical portfolio optimisation trade‑offs.
risk-adjusted-scouting/
│
├── db/
│ └── scouting.duckdb
│
├── notebooks/
│ ├── 02_talent_score_v1.ipynb
│ ├── 03_risk_adjusted_value_and_sensitivity.ipynb
│ ├── 04_budget_constrained_optimisation.ipynb
│ ├── 05_reporting_and_executive_summary.ipynb
│ ├── 06_robust_optimisation_cvar.ipynb
│ └── 07_real_club_case_study.ipynb
│
├── assets/
│ ├── 18man_budget_sensitivity.png
│ └── age_distribution_squad18.png
│
└── README.md1️⃣ Ensure database exists
db/scouting.duckdb
2️⃣ Run notebooks sequentially
02 → Feature engineering
03 → Risk‑adjusted modelling
04 → Deterministic optimisation
05 → Reporting
06 → Robust CVaR optimisation
07 → Real club recruitment case study
Most football analytics projects stop at player ranking models.
This project implements a decision system that:
- integrates performance + financial modelling
- enforces squad constraints
- models uncertainty
- applies robust optimisation
- outputs executable squad configurations
It bridges the gap between analytics research and real recruitment decisions.
Potential improvements:
- correlated positional shocks
- injury probability modelling
- multi‑season squad planning
- transfer fee vs wage decomposition
- Bayesian uncertainty modelling
v2.1 --- Robust optimisation + recruitment case study implemented
Pipeline includes:
- Data ingestion
- Risk‑adjusted modelling
- Deterministic MILP optimisation
- Scenario simulation
- CVaR robust optimisation
- Real club recruitment case study
- Budget frontier analysis
Manuel Pérez Bañuls
Data Scientist | Football Analytics Enthusiast | Probabilistic Modeling
Specializing in:
- Sports analytics and forecasting
- Probabilistic simulation systems
- Machine learning for football prediction
- Production-ready data pipelines
Connect & Collaborate:
- 📧 Email: manuelpeba@gmail.com
- 💼 LinkedIn: manuel-perez-banuls
- 🐙 GitHub: manuelpeba
Interested in discussing sports analytics, forecasting systems, or data-driven decision-making? Feel free to reach out.
MIT License