Almost two years ago, I wrote an article for Bloomberg Sports about the common traits of ideal soccer/football metrics. Given the recent proliferation of metrics for players and teams, I think it’s worth adding a few ideas about what makes some metrics valuable and others almost worthless.
Winning versus style. The biggest question about metrics is whether they measure something that leads to winning. If a metric isn’t correlated with results, then it’s probably answering a question about style. Style can be important – some teams are known for a certain style of play and want to maintain it – but winning is what ultimately leads to trophies and, in a properly managed club, profits. Some metrics that may seem useful, like a team’s share of possession, are only weakly related to winning. Aiming to dominate possession is therefore a stylistic choice as much as a step towards winning.
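One way to check whether a metric tracks winning is to correlate it with results. The sketch below is only an illustration: the six teams and their numbers are made up, and points per game is an assumed stand-in for "results". It computes a plain Pearson correlation by hand so that nothing beyond the standard library is needed.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical season figures for six invented teams (not real data).
possession_share = [0.62, 0.55, 0.51, 0.48, 0.45, 0.41]
points_per_game = [1.9, 1.2, 2.1, 1.4, 1.6, 0.9]

r = pearson(possession_share, points_per_game)
print(f"possession share vs. points per game: r = {r:.2f}")
```

A value of r near zero would suggest the metric is answering a question about style; a value near 1 or -1 would suggest it is tied to results, though a correlation on its own says nothing about cause.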
Agnostic versus mechanistic. Whether the goal is to win or to play a chosen style, the metrics that track progress towards an outcome fall into two categories. Agnostic metrics try to gauge a player’s contribution to an outcome without asking how he produces it. Mechanistic metrics pursue the same goal, but they calculate a player’s contribution from his actions.
Plus-minus, Shapley values, black boxes, and the like are agnostic metrics. They use algorithms to suggest which players are important to outcomes, but they can’t say why the players are important. They’re still useful because they can lead to interesting questions, such as “I thought that guy was useless, but the team always seems to play well when he’s on the field – why?” The answer would come from talking to teammates and watching video, rather than from the metric itself.
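To make the agnostic idea concrete, here is a minimal sketch of a raw plus-minus, assuming the match has already been split into stints with a fixed set of players on the pitch. The stint format, the player names, and the scores are all hypothetical, and real plus-minus models usually adjust for teammates and opponents, which this does not attempt.

```python
from collections import defaultdict


def raw_plus_minus(stints):
    """Sum the goal difference over every stint in which each player appeared.

    Each stint is a tuple: (set of players on the pitch, goals for, goals against).
    """
    totals = defaultdict(int)
    for on_pitch, goals_for, goals_against in stints:
        for player in on_pitch:
            totals[player] += goals_for - goals_against
    return dict(totals)


# Invented stints for one team (not real data).
stints = [
    ({"Player A", "Player B", "Player C"}, 2, 0),
    ({"Player A", "Player B", "Player D"}, 0, 1),
    ({"Player A", "Player C", "Player D"}, 1, 0),
]

for player, value in sorted(raw_plus_minus(stints).items()):
    print(f"{player}: {value:+d}")
```

Note that the function never looks at what a player did with the ball. It can show that the team does well when Player C is on the pitch, but not why.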
Breakdowns of expected goals, non-penalty goals plus assists per 90 minutes, and their ilk are mechanistic metrics. When using these, it’s essential to ensure not only that they lead to the desired outcome (winning or style), but also that the players being evaluated are truly responsible for the actions being measured. For instance, the players who form a chain of passes leading to a shot might all get credit for contributing to expected goals, but how much were they helped by the players off the ball? And what is the right way to divide up credit for the shot between them?
Numerators and denominators. Most metrics are ratios – actions per game, per minute, or per touch – so another central question is whether the numerator and denominator make sense together. A player who makes a lot of clearances per game may be on a team that can’t stop the opposition from attacking the penalty box; indeed, clearances per game are negatively correlated with winning. But clearances per opposing shot are positively correlated with winning, if only marginally.
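The effect of swapping the denominator can be seen in a small sketch. The two defenders and their season totals below are invented for illustration, and the field names are assumptions rather than any data provider’s schema.

```python
from dataclasses import dataclass


@dataclass
class DefenderSeason:
    name: str
    clearances: int
    games: int
    opposing_shots: int  # shots the opposition took while he was on the pitch


def per(numerator, denominator):
    """Return numerator / denominator, refusing a non-positive denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


# Hypothetical season totals (not real data).
defenders = [
    DefenderSeason("Defender A", clearances=240, games=30, opposing_shots=480),
    DefenderSeason("Defender B", clearances=150, games=30, opposing_shots=250),
]

for d in defenders:
    per_game = per(d.clearances, d.games)
    per_shot = per(d.clearances, d.opposing_shots)
    print(f"{d.name}: {per_game:.1f} per game, {per_shot:.2f} per opposing shot")
```

With these made-up numbers, Defender A leads on clearances per game (8.0 against 5.0), but Defender B leads on clearances per opposing shot (0.60 against 0.50). The same numerator ranks the two players differently once the denominator reflects how much defending each was asked to do.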
Specificity and sensitivity. A metric that minimizes false positives, by rarely flagging players and teams that fail to achieve the desired outcome, is specific. A metric that minimizes false negatives, by rarely failing to flag players and teams that achieve the desired outcome, is sensitive. Both of these qualities can be important. Specificity helps clubs to avoid bringing in players who turn out to be duds, while sensitivity helps clubs to avoid missing players who turn out to be stars.
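In the standard statistical sense, sensitivity is the share of true achievers that the metric flags, and specificity is the share of non-achievers that it leaves unflagged. The sketch below applies those definitions to eight invented players; the `flagged` and `achieved` lists are hypothetical, and what counts as "achieving the desired outcome" is left to the analyst.

```python
def sensitivity_and_specificity(flagged, achieved):
    """Return (sensitivity, specificity) for paired boolean sequences.

    flagged[i]  -- did the metric flag player i?
    achieved[i] -- did player i actually achieve the desired outcome?
    """
    if len(flagged) != len(achieved):
        raise ValueError("sequences must have the same length")
    pairs = list(zip(flagged, achieved))
    true_pos = sum(1 for f, a in pairs if f and a)
    false_pos = sum(1 for f, a in pairs if f and not a)
    false_neg = sum(1 for f, a in pairs if not f and a)
    true_neg = sum(1 for f, a in pairs if not f and not a)
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one achiever and one non-achiever")
    sensitivity = true_pos / (true_pos + false_neg)  # stars that were caught
    specificity = true_neg / (true_neg + false_pos)  # duds that were avoided
    return sensitivity, specificity


# Hypothetical results for eight invented players (not real data).
flagged = [True, True, False, False, True, False, False, False]
achieved = [True, False, True, False, True, False, False, False]

sens, spec = sensitivity_and_specificity(flagged, achieved)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

In this example the metric catches two of the three players who succeeded and leaves four of the five who did not unflagged, so sensitivity is 2/3 and specificity is 4/5.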
Testing for specificity and sensitivity doesn’t just mean comparing values of the same metric over time. An attacking metric whose top 50 players stay the same from year to year might look perfectly specific and sensitive. But what if Lionel Messi isn’t in the top 50? Unless the metric has uncovered something that no one else in the soccer world knows, it isn’t perfectly sensitive – it has missed a star. Similarly, what if Jobi McAnuff is in the top 50? Again, it would take a leap of faith to believe that the metric is perfectly specific. (No offense, Jobi.)
Context, context, context. Because the interactions between players are so complex, metrics mean little when devoid of context. A striker who shoots rarely may just be on a team incapable of providing service. A striker who shoots a lot may be robbing his teammates of even better opportunities to score. By the same token, a team that seems to take a lot of shots may have faced a string of opponents that didn’t press. Or the team might play in a league known for run-and-gun attacking and slack defending. Or the coach may simply prefer to create a high volume of low-quality chances rather than a low volume of high-quality chances. All of these examples suggest that the same metric might mean different things to different players and teams.
Data versus the eye test. Analysts make mistakes. An algorithm intended to mark losses of possession in event data might use the location where the ball goes out, not where it was kicked. An algorithm designed to spot cross-field runs with tracking data might also flag player substitutions. The way to avoid these errors is by checking whether what happens on the video screen matches the story told by the metrics.
These are all basic concepts, yet they’re often ignored in the creation and interpretation of new metrics. Savvy clubs will know better.