Predicting the 2025–26 UEFA Champions League, from the league phase to the final, using machine learning (Random Forest classification).
- Match-level pre-game predictions (H/D/A)
- Round-by-round simulation (Monte Carlo from per-match probabilities)
- Incremental upgrades: Elo → rolling form → team stats (FBref)
- Optional inputs (future): news, injuries, sentiment
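
The round-by-round simulation idea can be illustrated with a minimal Monte Carlo sketch. It is not the project's actual simulator: the function names are hypothetical, a two-legged tie is scored by legs won rather than aggregate goals (the model only outputs H/D/A), and a level tie is assumed to be decided by a coin flip.

```python
import numpy as np

OUTCOMES = np.array(["H", "D", "A"])  # home win, draw, away win


def sample_outcomes(probs, n_sims, rng):
    """Draw n_sims outcomes (H/D/A) from one match's probability vector."""
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum()  # guard against rounding drift
    return rng.choice(OUTCOMES, size=n_sims, p=probs)


def advance_probability(p_leg1, p_leg2, n_sims=20_000, seed=0):
    """Estimate P(team A advances) from a two-legged tie.

    p_leg1: (H, D, A) probabilities for leg 1, hosted by team A.
    p_leg2: (H, D, A) probabilities for leg 2, hosted by team B.
    Assumption: the tie is decided by legs won; if level, by a coin flip.
    """
    rng = np.random.default_rng(seed)
    leg1 = sample_outcomes(p_leg1, n_sims, rng)
    leg2 = sample_outcomes(p_leg2, n_sims, rng)

    # Net legs won by team A: home win in leg 1 or away win in leg 2 counts +1.
    net = (
        (leg1 == "H").astype(int) - (leg1 == "A").astype(int)
        + (leg2 == "A").astype(int) - (leg2 == "H").astype(int)
    )
    coin = rng.random(n_sims) < 0.5  # tie-breaker stand-in
    a_through = (net > 0) | ((net == 0) & coin)
    return float(a_through.mean())


if __name__ == "__main__":
    # Made-up probabilities, for illustration only.
    print(advance_probability((0.50, 0.25, 0.25), (0.35, 0.30, 0.35)))
```

Chaining such estimates across rounds gives per-team advancement probabilities; a goal-level model would be needed to reproduce aggregate-score rules faithfully.
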
- Core: Python (pandas, NumPy), scikit-learn
- Data: FBref (CSV) + ClubElo-style ratings (CSV with from/to)
- Viz: Matplotlib (static reports)
- IO: CSV / Parquet (for fast cached datasets)
- App (TBD): Streamlit or Flask UI
- fbref_data//.csv # schedule, shooting, passing, ...
- data/elo_filtered/.csv # time-ranged Elo per team
- scripts/preview_clean_v1.py # schema check + cleaned schedule generation
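
One way to attach a time-ranged Elo rating to each match is an as-of join on the rating's start date, followed by a check against its end date. The sketch below assumes hypothetical column names (`team`, `elo`, `from`, `to` in the ratings; `date`, `home_team`, `away_team` in the schedule) and toy data; the real files may be laid out differently.

```python
import pandas as pd


def attach_elo(matches, elo, team_col, out_col):
    """Add the rating valid on each match date for the team in `team_col`.

    Rows with no rating interval covering the match date get NaN.
    """
    m = matches.sort_values("date").reset_index(drop=True)
    e = elo.rename(columns={"team": team_col, "elo": out_col}).sort_values("from")

    # Latest interval that started on or before the match date, per team.
    merged = pd.merge_asof(
        m, e, left_on="date", right_on="from", by=team_col, direction="backward"
    )
    # Discard ratings whose interval ended before the match.
    merged[out_col] = merged[out_col].where(merged["date"] <= merged["to"])
    return merged.drop(columns=["from", "to"])


# Toy data, for illustration only.
matches = pd.DataFrame({
    "date": pd.to_datetime(["2025-09-16", "2025-10-01"]),
    "home_team": ["Alpha", "Beta"],
    "away_team": ["Beta", "Alpha"],
})
elo = pd.DataFrame({
    "team": ["Alpha", "Alpha", "Beta"],
    "elo": [1800.0, 1812.0, 1700.0],
    "from": pd.to_datetime(["2025-09-01", "2025-09-17", "2025-09-01"]),
    "to": pd.to_datetime(["2025-09-16", "2025-10-05", "2025-09-20"]),
})

with_home = attach_elo(matches, elo, "home_team", "home_elo")
with_both = attach_elo(with_home, elo, "away_team", "away_elo")
print(with_both)
```

Missing ratings surface as NaN rather than being silently filled, which makes gaps from unnormalized team names easy to spot.
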
- Scope: 2025–26 UEFA Champions League (league phase → final)
- Current focus: data hygiene (schedule parsing), per-match Elo join, baseline RandomForestClassifier
- Risks/Next: team-name normalization, header flattening for FBref tables, leakage-proof rolling windows, calibration
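
For the leakage-proof rolling windows mentioned above, a common pattern is to shift each team's history by one match before rolling, so a match's feature never includes its own result. This is a sketch under assumed inputs: a hypothetical long-format table with one row per team per match and columns `team`, `date`, `points`.

```python
import pandas as pd


def rolling_form(long_df, window=5):
    """Mean points over each team's previous `window` matches.

    The current match is excluded via shift(1), so the feature is
    available before kick-off. A team's first match has NaN form.
    """
    df = long_df.sort_values(["team", "date"]).copy()
    df["form"] = df.groupby("team")["points"].transform(
        lambda s: s.shift(1).rolling(window, min_periods=1).mean()
    )
    return df


# Toy data, for illustration only.
history = pd.DataFrame({
    "team": ["X"] * 4,
    "date": pd.to_datetime(["2025-09-16", "2025-09-30", "2025-10-21", "2025-11-04"]),
    "points": [3, 0, 1, 3],
})
form = rolling_form(history, window=2)
print(form[["date", "points", "form"]])
```

Without the `shift(1)`, the window would include the match being predicted, and the model would see part of its own label.
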
- Current: 0.1.0-dev (data engineering/research/training)
- Data Scraping: https://github.com/probberechts/soccerdata