An attempt at finding expected goals in a soccer game
A statistical model that quantifies shot quality in football. It estimates the probability that a shot results in a goal from the shot's location, angle, and match context.
Expected goals (xG) assigns a probability between 0 and 1 to every shot attempt, based on how often similar shots were scored in historical data. A penalty kick typically has an xG of ~0.76, while a header from outside the box might be 0.02.
Example: If a player takes 10 shots with xG values summing to 2.65 and scores 4 goals, they overperformed their expected output by 1.35 goals.
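The arithmetic behind that example can be sketched in a few lines. The per-shot values below are hypothetical and chosen only so that they sum to 2.65; they are not taken from real match data.

```python
# Hypothetical per-shot xG values for one player (illustrative, not real data).
shot_xg = [0.76, 0.45, 0.31, 0.28, 0.22, 0.19, 0.15, 0.12, 0.10, 0.07]
goals_scored = 4

# Total xG is simply the sum of the per-shot probabilities.
expected_goals = sum(shot_xg)

# Positive values mean the player scored more than the model expected.
overperformance = goals_scored - expected_goals

print(f"xG: {expected_goals:.2f}, goals: {goals_scored}, difference: {overperformance:+.2f}")
```

A positive difference over a small sample like this says little on its own, since ten shots is too few to separate finishing skill from luck.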
| Factor | Impact | Why It Matters |
|---|---|---|
| Distance to Goal | High | Shots from closer range have significantly higher conversion rates |
| Angle to Goal | High | Central positions offer better target visibility than tight angles |
| Body Part | Medium | Foot shots convert better than headers on average |
| Shot Type | Medium | Open play vs. set piece vs. penalty affects probability |
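The two high-impact factors can be derived from a shot's pitch coordinates. The sketch below assumes StatsBomb-style coordinates (a 120 x 80 pitch with the attacked goal centred at `(120, 40)` and posts at `y = 36` and `y = 44`). The function names `shot_distance` and `shot_angle` are illustrative and not taken from this repository, and the results are in pitch units rather than meters.

```python
import math

# Assumed StatsBomb-style pitch: 120 x 80 units, goal mouth on the line x = 120.
GOAL_X = 120.0
GOAL_CENTER_Y = 40.0
LEFT_POST_Y = 36.0
RIGHT_POST_Y = 44.0


def shot_distance(x: float, y: float) -> float:
    """Straight-line distance from the shot location to the centre of the goal."""
    return math.hypot(GOAL_X - x, GOAL_CENTER_Y - y)


def shot_angle(x: float, y: float) -> float:
    """Angle in radians subtended by the goal mouth as seen from (x, y)."""
    dx = GOAL_X - x
    to_right_post = math.atan2(RIGHT_POST_Y - y, dx)
    to_left_post = math.atan2(LEFT_POST_Y - y, dx)
    return abs(to_right_post - to_left_post)


# A central shot from 12 units out versus a shot from a tight angle near the byline.
central = (shot_distance(108, 40), shot_angle(108, 40))
tight = (shot_distance(118, 10), shot_angle(118, 10))
print(f"central: distance={central[0]:.1f}, angle={central[1]:.3f} rad")
print(f"tight:   distance={tight[0]:.1f}, angle={tight[1]:.3f} rad")
```

Using the angle subtended by the goal mouth, rather than the angle to the goal centre, captures how much of the goal the shooter can actually see, which is why central positions score higher than tight ones.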
This implementation uses a logistic regression model trained on StatsBomb open data:
- Feature Engineering: Distance, angle, body part, shot type, previous action
- Training: Logistic regression with class balancing (goals are rare events)
- Evaluation: Log-loss, calibration plots, ROC-AUC
- Visualization: Shot maps with xG values overlaid
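The training and evaluation steps listed above might look roughly like the following. This is a minimal sketch on synthetic data, not the repository's actual training code: the coefficients used to simulate goals, the two-feature design, and the variable names are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic shots (assumption): goals become less likely with distance
# and more likely with a wider view of the goal.
rng = np.random.default_rng(0)
n_shots = 5000
distance = rng.uniform(1, 35, n_shots)
angle = rng.uniform(0.05, 1.5, n_shots)
true_logit = -1.0 - 0.12 * distance + 1.2 * angle
is_goal = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

X = np.column_stack([distance, angle])
X_train, X_test, y_train, y_test = train_test_split(
    X, is_goal, test_size=0.25, random_state=0, stratify=is_goal
)

# class_weight="balanced" upweights the rare positive class (goals).
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_train, y_train)

# Column 1 of predict_proba is the probability of the positive class, i.e. xG.
xg_test = clf.predict_proba(X_test)[:, 1]
test_log_loss = log_loss(y_test, xg_test)
test_auc = roc_auc_score(y_test, xg_test)
print(f"log-loss: {test_log_loss:.3f}, ROC-AUC: {test_auc:.3f}")
```

One design consideration: reweighting the classes tends to push predicted probabilities upward, so it is worth checking calibration after training rather than relying on ROC-AUC alone.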
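```python
# Load the trained model
import pickle

import pandas as pd

with open('xg_model.pkl', 'rb') as f:
    model = pickle.load(f)

# Predict xG for a shot
features = {
    'distance': 12.5,  # meters from goal
    'angle': 0.45,  # radians
    'body_part': 'right_foot',
    'shot_type': 'open_play'
}

# predict_proba expects 2D input, so wrap the dict in a one-row DataFrame.
# This assumes the pickled model is a pipeline that encodes the categorical columns itself.
xg = model.predict_proba(pd.DataFrame([features]))[0][1]
```

Uses StatsBomb Open Data: free football event data including shot locations, outcomes, and context.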
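As a rough illustration of how shot records might be pulled out of the event data, the sketch below filters a list of event dictionaries down to shots. The nested field names (`type.name`, `location`, `shot.outcome.name`, `shot.body_part.name`, `shot.type.name`) follow the StatsBomb open-data event format as understood here and should be checked against the data specification. The two sample events are made up, and `shots_to_rows` is a hypothetical helper.

```python
# Two made-up events in the assumed StatsBomb event layout.
events = [
    {
        "type": {"name": "Shot"},
        "location": [108.0, 40.0],
        "shot": {
            "outcome": {"name": "Goal"},
            "body_part": {"name": "Right Foot"},
            "type": {"name": "Open Play"},
        },
    },
    {"type": {"name": "Pass"}, "location": [60.0, 30.0]},
]


def shots_to_rows(events):
    """Keep only shot events and flatten the fields used as model inputs."""
    rows = []
    for event in events:
        if event.get("type", {}).get("name") != "Shot":
            continue
        shot = event["shot"]
        x, y = event["location"][:2]
        rows.append(
            {
                "x": x,
                "y": y,
                "body_part": shot["body_part"]["name"],
                "shot_type": shot["type"]["name"],
                "is_goal": int(shot["outcome"]["name"] == "Goal"),
            }
        )
    return rows


rows = shots_to_rows(events)
print(rows)
```

Flattening to plain dictionaries keeps this step independent of any particular loader, and the resulting list can be passed directly to `pandas.DataFrame`.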
- Python
- Jupyter Notebook
- Scikit-learn (Logistic Regression)
- Pandas / NumPy
- Matplotlib / mplsoccer (visualization)
The model achieves reasonable calibration on held-out test data, with shots predicted at 0.1 xG converting at approximately 10%.
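A calibration claim like this can be checked by binning predictions and comparing each bin's mean predicted xG with the observed goal rate. The sketch below uses scikit-learn's `calibration_curve` on synthetic predictions that are calibrated by construction. The data and variable names are assumptions for illustration and are not this model's results.

```python
import numpy as np
from sklearn.calibration import calibration_curve

# Synthetic, perfectly calibrated predictions (assumption): each outcome is
# drawn as a Bernoulli trial with the predicted probability.
rng = np.random.default_rng(42)
predicted_xg = rng.beta(1, 8, size=20000)  # skewed toward low xG, like real shots
scored = rng.binomial(1, predicted_xg)

# Quantile bins give each bin the same number of shots, which suits skewed xG values.
observed_rate, mean_predicted = calibration_curve(
    scored, predicted_xg, n_bins=10, strategy="quantile"
)

for pred, obs in zip(mean_predicted, observed_rate):
    print(f"mean predicted xG {pred:.3f} -> observed conversion {obs:.3f}")

max_gap = float(np.max(np.abs(observed_rate - mean_predicted)))
print(f"largest gap between predicted and observed: {max_gap:.3f}")
```

For a well-calibrated model the two columns track each other closely. Plotting `mean_predicted` against `observed_rate` with Matplotlib gives the usual calibration plot, where points near the diagonal indicate good calibration.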
- StatsBomb xG Methodology
- Understat - Public xG data
- Original xG research by Opta and others
MIT