When I try to understand a complex system like soccer, I usually find myself going through three stages of analysis. First I try to come up with a theory of the system that makes intuitive sense and reflects its actual dynamics, to the extent I can learn about them. Then I try to build a model based on this theory using rigorous statistical methods. Finally, I test the results from the model for their robustness, to make sure the whole exercise hasn’t led me to spurious or unwarranted conclusions. To my eye, the evaluation of expected goals models based on shots could use a bit more theory, rigor, and robustness – and hopefully today’s post will help.

Let’s start by considering the true distribution of scoring chances, which is what the models are trying to capture. There are two major kinds of variation in the distribution: within shots and between shots. The former means that different players taking the same shot will not have the same chance of scoring. The latter reflects the fact that shots from different locations and situations will not have the same chance of scoring, either.

Expected goals models based on shots, in their simplest interpretation, attach chances of scoring to shots based on historical averages. Unless they adjust for the identity of the shooter – and, as I’ve written recently, it’s hard to do that robustly – they will capture the “between” variation but not the “within” variation.
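To make the “historical averages” idea concrete, here is a minimal sketch, assuming shots have already been grouped into buckets (the zone labels and the tiny shot history below are made up for illustration, not real data or any particular model’s features). Each bucket’s value is simply its past conversion rate, so every shot in a bucket gets the same number regardless of who takes it.

```python
from collections import defaultdict


def fit_bucket_xg(shots):
    """Return the historical scoring rate for each bucket.

    `shots` is an iterable of (bucket, scored) pairs. `bucket` is any hashable
    description of the shot (a hypothetical zone label here) and `scored` is
    True if the shot became a goal.
    """
    attempts = defaultdict(int)
    goals = defaultdict(int)
    for bucket, scored in shots:
        attempts[bucket] += 1
        goals[bucket] += int(scored)
    return {b: goals[b] / attempts[b] for b in attempts}


# Made-up shot history with two hypothetical zones.
history = [("six_yard_box", True), ("six_yard_box", False),
           ("outside_box", False), ("outside_box", False),
           ("outside_box", False), ("outside_box", True)]
xg = fit_bucket_xg(history)

# The shooter's identity never enters the calculation, so this captures
# "between" variation (zone to zone) but not "within" variation.
print(xg)  # {'six_yard_box': 0.5, 'outside_box': 0.25}
```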

Of course, it is possible that the premise behind the models is entirely wrong. Let’s say that players won’t shoot unless there’s some minimum chance of scoring. This minimum is implicitly understood by the players, their teammates, their coaches, and the fans. If players shoot when the chance of scoring is below the minimum, they’ll suffer the opprobrium of all those people. So they keep looking for opportunities at or above that minimum, and there are no shots with lower chances of scoring. Moreover, suppose the chance of scoring rises the longer players manage to hold onto the ball, but the risk of losing the ball is always high. Then there won’t be many shots with higher chances of scoring, either, because players will shoot as soon as they reach the minimum.

If this alternative theory of shooting is correct, then the true distribution of scoring chances will have very little variation. Expected goals models will estimate too much variation, since they won’t take into account many of the compensating factors that affect players’ decisions to shoot. For example, an expected goals model might call a shot a 20% chance when in fact, because of the sunlight in the shooter’s eyes, the true chance of scoring is just 10%. Or the model might say a shot has a 5% chance of scoring, but the shooter’s knowledge of the goalkeeper’s tendencies makes the true chance 10%.

The absolute version of this theory – all shots have the same chance of scoring – seems unlikely, since buckets of shots from different locations on the field do turn into goals at different rates. But even a hint of it could produce problems for expected goals models. These problems will sometimes manifest themselves when we try to predict the outcomes of matches.

So how would we make these predictions? Expected goals models based on shots estimate the chance of scoring for every shot taken by each team in each match. We can simulate these chances thousands of times – essentially replaying the game – by drawing a random number between 0 and 1 for each shot and counting it as a goal if the number falls below the shot’s estimated scoring chance. This creates a distribution of scores for the match. Doing this across a whole season, however, will often generate a simulated distribution of scores that fails to match the actual distribution.
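The replay procedure can be sketched in a few lines of Python, assuming each shot is an independent draw and that a match is described only by the two teams’ lists of shot-level expected goals. The function names and the xG values below are hypothetical, chosen for illustration.

```python
import random
from collections import Counter


def simulate_goals(shot_xg, rng):
    """Replay one team's shots once: a shot scores if a uniform draw
    in [0, 1) falls below its estimated scoring chance."""
    return sum(1 for p in shot_xg if rng.random() < p)


def simulate_match(home_xg, away_xg, n_sims=10_000, seed=0):
    """Replay a match n_sims times and count each (home, away) scoreline."""
    rng = random.Random(seed)  # seeded so results are reproducible
    scores = Counter()
    for _ in range(n_sims):
        scores[(simulate_goals(home_xg, rng), simulate_goals(away_xg, rng))] += 1
    return scores


# Hypothetical shot-level xG values for a single match.
home = [0.05, 0.08, 0.12, 0.35, 0.03]
away = [0.10, 0.04, 0.22]
dist = simulate_match(home, away)
for score, count in dist.most_common(3):
    print(score, count / sum(dist.values()))
```

Summing these per-match counts over every match in a season gives the simulated distribution of scores that can be set against the actual one.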

A common form of the mismatch is that the frequency of teams scoring 0 goals or 1 goal is overestimated, and the frequency of 2 or more goals is underestimated.* Is this because we estimate too little variation in the distribution of scoring chances, or too much? One way to answer this question is to replace all the expected goal values with the average – flattening the distribution completely – and then run the simulation again.
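A rough sketch of the flattening check follows, assuming a season is stored as one list of shot xG values per team per match, and that “the average” means the season-wide mean xG per shot (a per-team or per-league average would be other reasonable choices). Each team keeps its shot count; only the values are equalized. The season data is made up, and the observed shares of 0, 1, and 2+ goals that both results would be compared against are not included here.

```python
import random
from collections import Counter


def goal_bucket_freqs(team_matches, n_sims=2_000, seed=1):
    """Simulated share of team-matches with 0, 1, and 2+ goals.

    `team_matches` is a list of shot-xG lists, one per team per match.
    """
    rng = random.Random(seed)
    counts = Counter()
    for shots in team_matches:
        for _ in range(n_sims):
            goals = sum(1 for p in shots if rng.random() < p)
            counts[min(goals, 2)] += 1  # bucket 2 means "2 or more"
    total = sum(counts.values())
    return {k: counts[k] / total for k in (0, 1, 2)}


def flatten(team_matches):
    """Replace every shot's xG with the season-wide mean xG per shot,
    keeping each team's number of shots unchanged."""
    all_shots = [p for shots in team_matches for p in shots]
    mean_xg = sum(all_shots) / len(all_shots)
    return [[mean_xg] * len(shots) for shots in team_matches]


# Made-up season: three team-matches' worth of shots.
season = [[0.05, 0.30, 0.08], [0.02, 0.04], [0.45, 0.10, 0.07, 0.12]]
print("model:", goal_bucket_freqs(season))
print("flat: ", goal_bucket_freqs(flatten(season)))
```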

The result is that the frequency of 2 or more goals is underestimated even more. Most likely, we need to move in the opposite direction: capturing more variation, not less. The theory of shooting based on a minimum chance of scoring looks incorrect, at least when applied uniformly to all shots. Variation within shots may be the big missing ingredient.

Putting aside this fundamental issue, it’s worth noting here that the use of simulations requires at least two judgment calls. One is whether to look for the most likely match outcome – home win, away win, or draw – across all the iterations of our simulation, or to look for the most likely scores for the home and away teams and then impute the outcome from those two numbers. We also have to decide how to select the most likely scores; depending on the underlying data, choosing the mean, median, or mode will push our predictions in different directions.
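The two judgment calls can be illustrated on a fixed set of simulated scorelines. This is a sketch under stated assumptions: the counts below are invented, and comparing unrounded summaries (for example, mean goals of 0.78 against 0.55) is itself one choice among several.

```python
from collections import Counter
from statistics import mean, median, mode


def outcome(home_goals, away_goals):
    """'H' for a home win, 'A' for an away win, 'D' for a draw."""
    if home_goals > away_goals:
        return "H"
    if away_goals > home_goals:
        return "A"
    return "D"


def most_likely_outcome(sims):
    """Option 1: tally the outcome of every iteration and take the most common."""
    return Counter(outcome(h, a) for h, a in sims).most_common(1)[0][0]


def outcome_from_scores(sims, summary):
    """Option 2: summarise home and away goals separately, then compare them."""
    home = summary([h for h, _ in sims])
    away = summary([a for _, a in sims])
    return outcome(home, away)


# Made-up counts of simulated (home, away) scorelines over 100 iterations.
counts = {(0, 0): 20, (1, 0): 22, (1, 1): 18, (0, 1): 15,
          (2, 1): 10, (2, 0): 9, (0, 2): 6}
sims = [score for score, n in counts.items() for _ in range(n)]

print(most_likely_outcome(sims))  # H
for summary in (mean, median, mode):
    print(summary.__name__, outcome_from_scores(sims, summary))
# mean H
# median H
# mode D
```

In this example the same simulated matches yield a home win when we tally outcomes or use the mean or median, but a draw when we use the mode, because the most common score for each side is 0.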

These challenges don’t mean expected goals models based on shots are worthless. Their estimates of scoring chances are apparently more useful than the default – that is, the average chance of scoring across all shots. Moreover, they don’t do too badly in predicting match outcomes; even though my models are really geared towards player evaluation and predicting positions at the end of the season, they still beat the bookmakers’ odds for individual matches. But there’s always room for improvement.

More importantly, it should be clear by now that expected goals models based on shots only capture one aspect – albeit an important one – of the game. That’s why I use a variety of models to appraise clubs and players. Time is limited, and I firmly believe that refining a single model endlessly will yield less understanding than pursuing these other avenues as well.

__________

* To understand why this can happen, recall that the composition of a team’s total expected goals matters. Ten shots with a 10% chance of scoring will not yield the same simulated distribution of scores as another ten shots split evenly between 5% chances and 15% chances, even though both sets add up to 1.0 expected goals.
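The footnote’s example can be checked exactly rather than by simulation, assuming shots are independent. The number of goals then follows a Poisson binomial distribution, which can be built up one shot at a time. Both sets of shots have a mean of 1.0 goals, but their variances differ (0.9 for ten 10% shots versus 0.875 for the 5%/15% split, since each shot contributes p(1 − p)), so the distributions cannot be identical.

```python
def goal_distribution(shot_xg):
    """Exact probability of scoring 0, 1, ..., n goals from independent shots,
    built up by adding one shot at a time."""
    dist = [1.0]  # before any shots: 0 goals with certainty
    for p in shot_xg:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)  # this shot misses
            new[k + 1] += prob * p    # this shot scores
        dist = new
    return dist


uniform = goal_distribution([0.10] * 10)
mixed = goal_distribution([0.05] * 5 + [0.15] * 5)
for k in range(4):
    print(k, round(uniform[k], 4), round(mixed[k], 4))
```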