Alex Trommer

Data Science · University of Michigan · alextrommer@gmail.com · GitHub

League Winner Predictor

Each season, only one team wins each league, and that team's forwards tend to score substantially more goals and register more shots on target than forwards elsewhere. This project trains a K-Nearest Neighbors (KNN) classifier to predict whether a given forward's team won the league that season, using two performance features (goals scored and shots on target) plus the season, one-hot encoded. Built from a dataset of 88,310 player-season records across seven European leagues and five seasons, the model achieves 95.45% accuracy, about 1.1 percentage points above the 94.35% baseline.

95.45% Final accuracy
94.35% Baseline accuracy
88,310 Player-season records
7 Leagues · 5 seasons
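The page does not define the baseline. If it is the majority-class rate, i.e. the accuracy of always predicting that a forward's team did not win (an assumption), it can be computed with scikit-learn's DummyClassifier. A minimal sketch; the helper name and toy labels are hypothetical:

```python
import numpy as np
from sklearn.dummy import DummyClassifier


def majority_baseline_accuracy(y: np.ndarray) -> float:
    """Accuracy of always predicting the most frequent label (0 = team did not win).

    Assumes the baseline is the majority-class rate; hypothetical helper.
    """
    X_dummy = np.zeros((len(y), 1))  # features are ignored by the "most_frequent" strategy
    clf = DummyClassifier(strategy="most_frequent").fit(X_dummy, y)
    return clf.score(X_dummy, y)


# Toy labels: 1 forward on a title-winning team out of 20.
y_toy = np.array([1] + [0] * 19)
print(majority_baseline_accuracy(y_toy))  # 0.95
```

Because winners are rare, this trivial rule already scores highly, which is why the model's accuracy is best read relative to the baseline rather than on its own.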

Dataset

Player-season records from the Premier League, La Liga, Serie A, Bundesliga, Ligue 1, Eredivisie, and Primeira Liga. The model is restricted to forwards (the position where the goal-scoring signal is strongest) with at least five 90-minute appearances. The 2019–20 season is excluded because the Eredivisie was suspended mid-season and no champion was declared.
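The filtering rules above can be sketched in pandas. This is a minimal sketch, assuming hypothetical column names ("position", "nineties", "season") and reading "five 90-minute appearances" as a 90s-played count (total minutes divided by 90); the project's actual schema is not shown:

```python
import pandas as pd


def select_forwards(players: pd.DataFrame, min_nineties: float = 5.0) -> pd.DataFrame:
    """Keep forwards with at least `min_nineties` 90s played, excluding 2019-20.

    Column names are hypothetical placeholders.
    """
    is_forward = players["position"].str.contains("FW", na=False)  # also matches hybrid roles like "FW,MF"
    enough_minutes = players["nineties"] >= min_nineties
    not_excluded = players["season"] != "2019-20"  # Eredivisie season suspended, no champion
    return players[is_forward & enough_minutes & not_excluded].copy()


toy = pd.DataFrame({
    "player": ["A", "B", "C", "D"],
    "position": ["FW", "MF", "FW,MF", "FW"],
    "nineties": [12.3, 20.0, 4.5, 30.1],
    "season": ["2018-19", "2018-19", "2020-21", "2019-20"],
})
print(select_forwards(toy)["player"].tolist())  # ['A']
```

Here B is dropped as a midfielder, C for playing fewer than five 90s, and D for belonging to the excluded season.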

Why this works

League-winning teams dominate possession and create more chances, so their forwards record more goals and shots on target than forwards on lower-finishing sides: on average, roughly twice as many goals and about 70% more shots on target. This gap is large enough for a simple distance-based classifier to exploit.

Group          Avg goals   Avg shots on target
Non-winners    6.60        16.96
Winners        13.59       29.37
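A summary like the table above can be produced with a pandas groupby. This is a minimal sketch, assuming a DataFrame of already-filtered forwards with hypothetical columns won_league (1 if the player's team won the league, else 0), goals, and shots_on_target; the toy rows are made up and are not project data:

```python
import pandas as pd


def group_averages(forwards: pd.DataFrame) -> pd.DataFrame:
    """Mean goals and shots on target for forwards on winning vs non-winning teams."""
    summary = forwards.groupby("won_league")[["goals", "shots_on_target"]].mean()
    return summary.rename(index={0: "Non-winners", 1: "Winners"}).round(2)


toy = pd.DataFrame({
    "won_league": [0, 0, 0, 1, 1],
    "goals": [4, 6, 8, 12, 16],
    "shots_on_target": [10, 15, 20, 25, 35],
})
print(group_averages(toy))
```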

Data cleaning

Model

KNN classifies each player-season by finding the 11 most similar player-seasons in the training data (by Euclidean distance on standardized goals and shots on target, plus the one-hot-encoded season) and taking a majority vote on whether those neighbors' teams won the league. Hyperparameters were tuned with GridSearchCV over 150 combinations, each scored with 5-fold cross-validation.
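To make the voting rule concrete, here is a from-scratch sketch of uniform-weight, Euclidean k-nearest-neighbors prediction in NumPy. It is illustrative only: the function knn_predict and the toy clusters are hypothetical, it uses just two already-scaled features (no season encoding), and it stands in for scikit-learn's KNeighborsClassifier, which a GridSearchCV-based setup would typically use:

```python
import numpy as np


def knn_predict(X_train, y_train, x_query, k=11):
    """Predict a 0/1 label for one query point by majority vote of its k nearest neighbors."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    # Euclidean distance from the query to every training point.
    distances = np.linalg.norm(X_train - np.asarray(x_query, dtype=float), axis=1)
    nearest = np.argsort(distances)[:k]
    # Uniform weights: each of the k neighbors casts one vote.
    votes = np.bincount(y_train[nearest], minlength=2)
    return int(np.argmax(votes))


# Toy data: two well-separated clusters of (scaled goals, scaled shots on target).
rng = np.random.default_rng(0)
X_train = np.vstack([
    rng.normal(0.0, 0.3, size=(30, 2)),  # non-winners
    rng.normal(3.0, 0.3, size=(30, 2)),  # winners
])
y_train = np.array([0] * 30 + [1] * 30)
print(knn_predict(X_train, y_train, [3.0, 3.0]))  # 1
print(knn_predict(X_train, y_train, [0.0, 0.0]))  # 0
```

With k = 11 and two classes, a vote can never tie, which is one practical reason to prefer an odd k for binary problems.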

Features: Shots on target, Goals, Season (one-hot encoded)
Scaler: StandardScaler on numerical features
Search: GridSearchCV, 5-fold CV, accuracy scoring
Best params: n_neighbors = 11, metric = euclidean, weights = uniform
CV score: 0.9545
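The configuration above maps onto a scikit-learn pipeline roughly as follows. This is a sketch under stated assumptions: the column names are hypothetical, the grid (25 values of n_neighbors × 3 metrics × 2 weightings = 150 combinations) is one grid that matches the stated count rather than the project's actual grid, and the data are synthetic stand-ins so the example runs end to end:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ["shots_on_target", "goals"]  # hypothetical column names
CATEGORICAL = ["season"]

# Scale the numerical features, one-hot encode the season; force a dense output.
preprocess = ColumnTransformer(
    [
        ("scale", StandardScaler(), NUMERIC),
        ("season", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ],
    sparse_threshold=0,
)
pipeline = Pipeline([("prep", preprocess), ("knn", KNeighborsClassifier())])

# Hypothetical grid: 25 x 3 x 2 = 150 combinations (the real grid is not stated).
param_grid = {
    "knn__n_neighbors": list(range(1, 26)),
    "knn__metric": ["euclidean", "manhattan", "chebyshev"],
    "knn__weights": ["uniform", "distance"],
}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")

# Synthetic stand-in data: about 10% of forwards play for a league winner.
rng = np.random.default_rng(42)
n = 300
won = (rng.random(n) < 0.1).astype(int)
df = pd.DataFrame({
    "goals": rng.poisson(6 + 7 * won),
    "shots_on_target": rng.poisson(17 + 12 * won),
    "season": rng.choice(["2017-18", "2018-19", "2020-21"], size=n),
})
search.fit(df[NUMERIC + CATEGORICAL], won)
print(search.best_params_, round(search.best_score_, 4))
```

Putting the scaler inside the pipeline means each CV fold fits StandardScaler on its own training split only, so no statistics leak from the validation fold into the reported CV score.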

Key findings