One of the biggest challenges in sports analytics is asking the right question. Many metrics are produced and published without any sense of how precise they are, what assumptions they rely on, or how they might be used in practice. As a result, it’s hard to phrase questions that the metrics can answer correctly, at least from a statistical point of view. Shooting and saving skill are cases in point.

Recently Michael Caley, one of the best soccer analysts doing public work, suggested an approach to measuring shooting skill. Like others before him, he chose to focus on the difference between a shooter’s actual goals and expected goals. He borrowed his method from Russell Carleton, a baseball analyst and social scientist. The idea was to find a metric that was “reliable”, a term Carleton used to imply a correlation of 0.7 across two sufficiently large samples of events covering the same group of players. For Michael, actual goals minus expected goals per shot would likely become a “reliable” metric after several hundred shots.
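To make the reliability idea concrete, here is a minimal sketch of a split-sample check on simulated data. Everything in it is an assumption for illustration: the spread of true skill (`skill_sd`), the range of shot difficulty, and the function names are invented, and it does not reproduce Michael's or Carleton's actual calculations. Each simulated player gets two independent samples of shots, and the metric (goals minus expected goals per shot) is correlated across the two samples.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def split_half_correlation(n_players, shots_per_half, skill_sd, seed=0):
    """Correlate (goals - xG) per shot across two samples of the same players.

    skill_sd is a hypothetical spread of true skill, expressed as added
    scoring probability per shot. It is an assumption, not an estimate.
    """
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_players):
        skill = rng.gauss(0.0, skill_sd)
        halves = []
        for _ in range(2):
            total = 0.0
            for _ in range(shots_per_half):
                xg = rng.uniform(0.03, 0.30)          # assumed shot difficulty
                p = min(max(xg + skill, 0.0), 1.0)    # true scoring probability
                goal = 1 if rng.random() < p else 0
                total += goal - xg
            halves.append(total / shots_per_half)
        first.append(halves[0])
        second.append(halves[1])
    return pearson(first, second)


for shots in (20, 100, 500):
    r = split_half_correlation(n_players=200, shots_per_half=shots, skill_sd=0.03)
    print(shots, round(r, 2))
```

Under these made-up settings the correlation tends to rise with the number of shots per sample. Where it crosses 0.7 depends entirely on the assumed spread of skill, so the sketch says nothing about the real threshold.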

Of course, most clubs don’t want to wait until a player has taken several hundred shots in top league action before deciding whether he’s a good shooter. There’s also a chance that innate shooting skill changes over time as players gain experience and move toward or beyond their physical peaks. Clubs might not agree with the 0.7 standard for reliability, either; some can afford to take risks, while others need close to a sure thing when they spend a lot of money on a player. So it’s hard to tell if this particular signal is even worth finding in all the noise that surrounds shooting.

Nevertheless, putting aside these issues, I tried to think of a question that a club might ask that would be answered using this metric. “Who’s a better shooter, X or Y?” might be such a question, but to me it’s a badly formed one. All metrics come with a degree of uncertainty because of the many aspects of the game that aren’t measured, aren’t well understood, or are completely random. So the question really ought to sound something like, “Given our data, what’s the probability we can reject the hypothesis that X is no better at shooting than Y?” or, in more accessible language, “What’s the chance that Y is just as good a shooter as X?” Unfortunately, this isn’t a question that can be answered just by using the fact that a metric is “reliable”.

What’s the alternative? A popular one is regression analysis. We could simply calculate the difference between actual goals (0 or 1) and expected goals (somewhere between 0 and 1) for every shot, and then try to estimate the fixed effects — here the added likelihood of scoring versus the expectation — for all the shooters:

$$G - \mathrm{ExpG} = \alpha + \beta_1\,\mathrm{shooter}_1 + \dots + \beta_n\,\mathrm{shooter}_n + \varepsilon$$
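Here is a minimal sketch of what this regression computes, using a few made-up shots. With nothing but shooter indicators on the right-hand side, the least-squares fitted value for each shooter is just that shooter's average of G − ExpG, so plain Python is enough. The function name, the tuple layout, and the data are hypothetical. The standard error shown is the naive per-shooter standard error of the mean, which treats ExpG as a known constant.

```python
from collections import defaultdict
from math import sqrt


def shooter_effects(shots):
    """shots: iterable of (shooter, goal, xg) tuples.

    Returns {shooter: (mean of goal - xg, naive standard error, shot count)}.
    """
    residuals = defaultdict(list)
    for shooter, goal, xg in shots:
        residuals[shooter].append(goal - xg)

    effects = {}
    for shooter, res in residuals.items():
        n = len(res)
        mean = sum(res) / n
        if n > 1:
            var = sum((r - mean) ** 2 for r in res) / (n - 1)
            se = sqrt(var / n)
        else:
            se = float("nan")  # one shot gives no spread to measure
        effects[shooter] = (mean, se, n)
    return effects


# Made-up shots: (shooter, goal scored 0/1, expected goals)
shots = [
    ("A", 1, 0.2), ("A", 0, 0.1), ("A", 0, 0.3),
    ("B", 1, 0.5), ("B", 1, 0.1),
]
for shooter, (mean, se, n) in shooter_effects(shots).items():
    print(shooter, round(mean, 3), round(se, 3), n)
```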

But already, we’d be making a mistake. Because expected goals are estimates, too, we can’t take them as constants for the purposes of this regression; the standard errors for our coefficients would be biased downward. Rather, we need to expand the regression to mimic the estimator that gave us the expected goals values in the first place:

$$G = \alpha + \beta_1\,\mathrm{shooter}_1 + \dots + \beta_n\,\mathrm{shooter}_n + \gamma_1\,\mathrm{factor}_1 + \dots + \gamma_m\,\mathrm{factor}_m + \varepsilon$$

Here the *factor* variables are all the independent variables from our original expected goals estimator. And because the outcome variable *G* is now binary, we probably wouldn’t want a linear specification; a logistic or probit regression would be the more natural choice.
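As a rough illustration of the expanded specification, the sketch below fits a logistic regression with shooter indicators and a single shot factor by Newton-Raphson, in plain Python. It is not the estimator behind any real expected goals model. The three shooters, their effects, the lone `dist` factor, and all function names are invented, and shooter A is dropped as the baseline so the indicators are not collinear with the intercept.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logit(rows, goals, n_iter=15):
    """Maximum-likelihood logistic regression via Newton-Raphson steps."""
    k = len(rows[0])
    w = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]  # X' W X, the negative Hessian
        for x, y in zip(rows, goals):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(k):
                grad[i] += (y - p) * x[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * x[i] * x[j]
        step = solve(hess, grad)
        w = [wi + si for wi, si in zip(w, step)]
    return w


def simulate(n_per_shooter=400, seed=1):
    """Made-up shots: columns are intercept, shooter B, shooter C, distance."""
    rng = random.Random(seed)
    true_effect = {"A": 0.0, "B": 1.0, "C": -0.5}  # assumed log-odds effects
    rows, goals = [], []
    for shooter, effect in true_effect.items():
        for _ in range(n_per_shooter):
            dist = rng.uniform(0.5, 3.0)  # hypothetical factor, tens of metres
            p = sigmoid(effect - 1.0 * dist)
            goals.append(1 if rng.random() < p else 0)
            rows.append([1.0, float(shooter == "B"), float(shooter == "C"), dist])
    return rows, goals


rows, goals = simulate()
alpha, beta_b, beta_c, gamma_dist = fit_logit(rows, goals)
print(round(alpha, 2), round(beta_b, 2), round(beta_c, 2), round(gamma_dist, 2))
```

Even in this toy version, each extra shooter adds a column and every Newton step solves a larger system, which hints at how quickly the real problem grows.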

At this point, things start to get messy… like, Lionel Messi level of messy. This new estimator could be missing a ton of interaction terms, such as “(player = Andy Carroll) * (shot type = header)”. Some players, like Carroll, have a comparative advantage in shooting with their head. Others are relatively better shooting from crosses than on the dead run. Name a factor in expected goals, and there’s an interaction term – or twelve – to include.

Even without the interaction terms, the estimator quickly starts to get unwieldy. There were roughly 10,000 non-penalty shots in the English Premier League last season, and more than 150 players took at least 20 of them. There’s also supposedly something called saving skill… so maybe the goalkeepers they faced should also be part of the equation. Pretty soon, we’re trying to estimate about 200 different coefficients in one equation, and we’re still going to hang our hat on the standard errors.

I’m not comfortable presenting that kind of work to a client, which is why I use different methods to estimate shooting and saving skill. This is not at all to denigrate Michael’s work, which was exploratory and raised important questions. But as the creative statisticians reading this will undoubtedly realize, there are other ways to answer the question, and other ways to ask it, too.