An XGBoost expected goals model and interactive dashboard covering the top 5 European leagues, built on ~257,000 shots from Understat (2020–2025). Three situation-specific models handle open play, corners, and set pieces separately, with isotonic calibration. Data refreshes daily via GitHub Actions.
Each shot is routed to a situation-specific XGBoost classifier based on how it was created. All three are wrapped in `CalibratedClassifierCV(method="isotonic")` and tuned independently via `GridSearchCV` (3-fold CV, scored on Brier score). Penalties are fixed at 0.76 xG.
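As a rough illustration of the tune-then-calibrate step, the sketch below runs a 3-fold grid search scored on Brier score and then wraps the best estimator in an isotonic calibrator. The function name `fit_calibrated`, the order of the two steps, and the calibrator's own `cv=3` are assumptions made for illustration; the project's actual pipeline may be arranged differently.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import GridSearchCV


def fit_calibrated(base_estimator, param_grid, X, y):
    """Tune a classifier on Brier score, then calibrate it isotonically.

    `base_estimator` is any scikit-learn compatible classifier with
    predict_proba, for example xgboost.XGBClassifier (hypothetical usage).
    """
    # 3-fold grid search; "neg_brier_score" makes lower Brier scores win.
    search = GridSearchCV(
        base_estimator,
        param_grid,
        scoring="neg_brier_score",
        cv=3,
    )
    search.fit(X, y)

    # Wrap the tuned model in an isotonic calibrator.
    # cv=3 here is an assumption; the source only states the grid-search folds.
    calibrated = CalibratedClassifierCV(
        search.best_estimator_, method="isotonic", cv=3
    )
    calibrated.fit(X, y)
    return calibrated


# Hypothetical usage with XGBoost (grid values are placeholders):
#   from xgboost import XGBClassifier
#   model = fit_calibrated(
#       XGBClassifier(),
#       {"max_depth": [3, 5], "n_estimators": [200, 400]},
#       X_open_play, y_open_play,
#   )
#   xg = model.predict_proba(X_new)[:, 1]
```

Each of the three situation models would get its own call with its own feature matrix and parameter grid, which is what "tuned independently" amounts to in this sketch.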
| Model | Situations | Key features |
|---|---|---|
| OpenPlay | Open play, counter-attacks | Distance, angle, counter-attack proxy, throughball, rebound |
| FromCorner | Corner kicks | Header interactions, centrality, weak-angle header |
| SetPiece | Direct & indirect free kicks | Distance, angle, shot type |
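The routing in the table can be sketched as a simple lookup. The situation strings below are assumed to match Understat's `situation` field, and `route_shot`, `predict_xg`, and the `models` dictionary of callables are hypothetical names used only for this example.

```python
PENALTY_XG = 0.76  # fixed penalty value stated above

# Assumed raw situation labels mapped to the three model names.
SITUATION_TO_MODEL = {
    "OpenPlay": "OpenPlay",
    "FromCorner": "FromCorner",
    "SetPiece": "SetPiece",        # indirect free kicks
    "DirectFreekick": "SetPiece",  # direct free kicks share the model
}


def route_shot(situation: str) -> str:
    """Return the model name for a shot, or "Penalty" for penalties."""
    if situation == "Penalty":
        return "Penalty"
    try:
        return SITUATION_TO_MODEL[situation]
    except KeyError:
        raise ValueError(f"Unknown situation: {situation!r}") from None


def predict_xg(shot: dict, models: dict) -> float:
    """Score one shot; `models` maps model name -> callable(shot) -> float."""
    model_name = route_shot(shot["situation"])
    if model_name == "Penalty":
        return PENALTY_XG  # penalties bypass the classifiers entirely
    return float(models[model_name](shot))
```

Raising on an unknown label, rather than falling back to the open-play model, is a design choice in this sketch so that upstream schema changes surface quickly.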
The models draw on 24 features spanning geometry (distance, angle, coordinates), shot type flags (header, foot, penalty), interaction terms, zone context, and proxy variables. Because Understat labels counter-attacks as open play, a `fast_break` proxy is engineered from the preceding action type to capture counter-attack context without direct tagging.
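To make the geometry and proxy features concrete, here is a minimal sketch. It assumes Understat-style coordinates normalised to [0, 1] with the goal centred at (1.0, 0.5), a 105 m x 68 m pitch, and a 7.32 m goal; the function names are hypothetical, and the set of "fast" preceding actions is deliberately left to the caller because the source does not list them.

```python
import math

# Assumed dimensions; real pitches vary, so treat these as approximations.
PITCH_LENGTH_M = 105.0
PITCH_WIDTH_M = 68.0
GOAL_WIDTH_M = 7.32


def shot_geometry(x: float, y: float) -> tuple[float, float]:
    """Return (distance in metres, goal-mouth angle in radians)."""
    dx = (1.0 - x) * PITCH_LENGTH_M  # distance to the goal line
    dy = (y - 0.5) * PITCH_WIDTH_M   # lateral offset from goal centre
    distance = math.hypot(dx, dy)
    # Angle subtended by the two posts as seen from the shot location.
    half_goal = GOAL_WIDTH_M / 2
    angle = math.atan2(dy + half_goal, dx) - math.atan2(dy - half_goal, dx)
    return distance, angle


def fast_break_proxy(last_action: str, fast_actions: set[str]) -> int:
    """Flag a shot as a likely fast break based on the preceding action.

    `fast_actions` is supplied by the caller; which actions belong in it
    is an assumption, not something taken from the project.
    """
    return int(last_action in fast_actions)
```

For a shot from the penalty spot (11 m out, central), `shot_geometry` returns a distance of about 11 m and an angle of `2 * atan(3.66 / 11)` radians.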
| Metric | This model | Understat |
|---|---|---|
| ROC-AUC | 0.792 | 0.805 |
| Brier score | 0.074 | 0.072 |
Higher ROC-AUC and lower Brier score are better. The small gap to Understat is expected: commercial models incorporate freeze-frame data (exact defender positions at the moment of the shot), which Understat does not expose via its public API.
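For readers unfamiliar with the two metrics in the table, the pure-Python sketch below shows what each one measures. It is written for clarity rather than speed (the AUC loop is quadratic); scikit-learn's `brier_score_loss` and `roc_auc_score` compute the same quantities and would be the practical choice on ~257,000 shots. The function names here are illustrative only.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted xG and the 0/1 goal outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """Probability that a random goal is ranked above a random non-goal.

    Ties between a goal and a non-goal count as half a win.
    """
    goals = [p for y, p in zip(y_true, y_prob) if y == 1]
    misses = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((g > m) + 0.5 * (g == m) for g in goals for m in misses)
    return wins / (len(goals) * len(misses))


# Toy example: four shots, two of which were goals.
outcomes = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(outcomes, predicted))      # 0.75
print(brier_score(outcomes, predicted))  # approximately 0.158
```

ROC-AUC only looks at ranking, while the Brier score also penalises miscalibrated probabilities, which is why the two are reported together.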