
A new take on soccer analytics

Soccer analytics has come a long way in the past several years. The big teams have hired dozens of number-crunching tyros to sift through data, and academics have started to take an interest as well; not one but two excellent books have charted their progress. The problem, to my eyes, is that lots of people are still doing soccer analytics wrong. Here are some reasons why:

1. Soccer is a team game. It sounds simple, but soccer is a game where every player on the field can affect the performance of every other player. This makes evaluating players far more complicated than in, say, baseball. A defender who almost never touches the ball may be the best player on a team, because opponents know better than to attack down his side of the field. A forward who scores buckets of goals may just be the beneficiary of excellent service from playmakers – or, worse, he may be hogging chances that other players could have put away at an even higher rate.

As a result, any index that rates players based on their individual statistics is missing a big part of the picture. Context matters in soccer; indeed, as Anderson and Sally note in The Numbers Game, context is most of what’s happening on the field at any one time.

2. Soccer is a complex game. These days we can measure shots, saves, duels, dribbles – almost anything we want, in fact – but if we gave those data to an intelligent person who had never seen a soccer game before, he still might not be able to identify the most important players on the field. That's because each player can affect the game in countless ways. Remember Marco Materazzi in the 2006 World Cup final? He changed the whole game just by talking to Zinedine Zidane, provoking the headbutt that got Zidane sent off. Not playing, talking.

To capture a player’s actual contribution to goal difference or points, we have to stop trying to squeeze his participation into a pre-existing model. As Materazzi showed, a player is much more than the sum of his passes, shots, tackles, and the like.

3. Soccer players rarely change. Once players arrive in the top leagues in Europe, they’ve already been part of thousands of matches and practice sessions. Their ways of playing are unlikely to change much over time, except in response to experience and slippage in their physical abilities. A few players manage to learn new tricks that give their careers new life, as Michael Jordan did in basketball with his fade-away jumpers. Some even change position, like Eiður Guðjohnsen. But most become established in fairly fixed roles, and they stick to those roles throughout their careers.

It’s even hard for managers to affect how players operate during a game. Once in a while, a substitution or sheer chance might bring a player within earshot of the manager’s instructions. Overall, however, the player is on his own during the game’s crucial moments. This situation is a far cry from baseball, where a manager can send a batter or pitcher instructions before every single pitch. Given this constraint, it’s probably easier – not more important, but easier – to figure out which players to buy, which to put on the field, and especially which work well together than to help them make different decisions while they’re on it.

4. Soccer teams rarely disappear. As Kuper and Szymanski point out in Soccernomics, most of the leading English clubs from a century ago are still around today. Soccer is not just a game played over 90 minutes and change; it’s a game played over decades. Every decision that players, managers, and executives make will have repercussions for years into the future. To the extent possible, all of those future consequences should go into every decision.

This means taking a much longer-term view of team management than clubs have usually taken so far. For example, when figuring out how much to pay for a player, executives should think in terms of his costs and benefits over his entire career at the club. To do this, they need to assess probabilities. How likely is he to stay for a second year, or a third, or more? How likely is he to avoid injury and suspension? Will his salary rise over time? But also, if he helps the club to finish higher in the table, where is the club likely to finish in the season after that?
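The first few of those questions can be folded into one expected-value calculation. Here is a minimal Python sketch, assuming the club can put a number on each season's benefit if the player plays; the Season fields, the 5 percent discount rate, and every figure in the example are hypothetical, not estimates for any real player.

```python
from dataclasses import dataclass


@dataclass
class Season:
    """Hypothetical estimates for one season of a player's stay; all are assumptions."""
    p_stay: float            # chance he is still at the club this season, given he was there last season
    p_available: float       # chance he avoids serious injury and suspension
    value_if_playing: float  # estimated benefit to the club (£m) if he plays
    salary: float            # wage cost (£m), paid whenever he is at the club


def expected_net_value(transfer_fee, seasons, discount_rate=0.05):
    """Fee paid today, then each season's expected benefit minus salary,
    weighted by the chance he is still at the club and discounted to today."""
    total = -transfer_fee
    p_still_here = 1.0
    for t, s in enumerate(seasons):
        p_still_here *= s.p_stay
        expected_benefit = s.p_available * s.value_if_playing
        total += p_still_here * (expected_benefit - s.salary) / (1 + discount_rate) ** t
    return total


# Hypothetical three-season outlook with a rising salary.
seasons = [
    Season(p_stay=1.0, p_available=0.9, value_if_playing=8.0, salary=3.0),
    Season(p_stay=0.8, p_available=0.85, value_if_playing=8.0, salary=3.3),
    Season(p_stay=0.7, p_available=0.8, value_if_playing=7.0, salary=3.6),
]
print(round(expected_net_value(10.0, seasons), 2))  # -2.12
```

With these made-up inputs, a £10m fee loses money in expectation even though every single season is profitable on its own. The sketch leaves out the last question, the knock-on effect of a higher finish on later seasons, which needs probabilities over league positions rather than a single number per season.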

To deal with these issues, we need a new approach and some new tools. We need new measures of a player’s performance that depend on as few of the usual metrics as possible. We need to measure what he does to help his team while being agnostic about how he does it. Even new methods based on neural networks fall down here; though they start out agnostic, they end up creating models that need to be recalibrated any time anything about the game changes.
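One long-established way to measure what a player does for his team while staying agnostic about how he does it is on/off plus-minus, used in hockey and basketball: compare the team's goal difference per 90 minutes while he is on the field with its goal difference while he is off it. This is not the author's method, just a minimal sketch; the Stint data format and the player names are hypothetical. A raw on/off number ignores who his teammates and opponents were (the team-game problem from point 1), so more careful versions, such as adjusted plus-minus, fit a regression over all players at once.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Stint:
    """A stretch of play with no lineup change (hypothetical data format)."""
    minutes: float
    goals_for: int
    goals_against: int
    on_pitch: frozenset  # the team's players on the field during the stint


def on_off_goal_difference(stints):
    """Goal difference per 90 minutes with each player on the field,
    minus goal difference per 90 with him off it."""
    on = defaultdict(lambda: [0.0, 0])  # player -> [minutes on, goal difference while on]
    total_minutes = sum(s.minutes for s in stints)
    total_gd = sum(s.goals_for - s.goals_against for s in stints)
    for s in stints:
        gd = s.goals_for - s.goals_against
        for player in s.on_pitch:
            on[player][0] += s.minutes
            on[player][1] += gd
    result = {}
    for player, (mins_on, gd_on) in on.items():
        mins_off = total_minutes - mins_on
        if mins_off == 0:
            continue  # never off the field, so there is nothing to compare against
        result[player] = 90 * gd_on / mins_on - 90 * (total_gd - gd_on) / mins_off
    return result


# Toy example with two-player "lineups" for brevity; real stints list all eleven.
example = [
    Stint(minutes=60, goals_for=2, goals_against=0, on_pitch=frozenset({"Ana", "Ben"})),
    Stint(minutes=30, goals_for=0, goals_against=1, on_pitch=frozenset({"Ana", "Cal"})),
    Stint(minutes=90, goals_for=1, goals_against=1, on_pitch=frozenset({"Ben", "Cal"})),
]
ratings = on_off_goal_difference(example)
print(ratings)
```

Nothing in the calculation asks how a player helps, only whether the team does better with him on the field; the price is heavy noise over small samples of minutes.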

We also need to estimate and use probabilities more often. Buying a player might increase the probability of finishing fourth by 10 percentage points, but it will also shift the probability of every other final position – in the current season, and in every season to come.
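To make this concrete, here is a minimal Python sketch under loudly hypothetical assumptions: the table is collapsed into four finishing bands, each band has a made-up money value, and a made-up transition table gives the chance of moving from one band this season to another next season (a simple Markov chain, not a model the author has described). A signing that lifts the chance of a top-four finish from 20 to 30 percent must take that probability from the other bands. For simplicity, the sketch credits the signing only with that first-season shift and lets the transition table carry it forward.

```python
# Coarse finishing bands; the bands and every number below are hypothetical.
BANDS = ["top 4", "5th-10th", "11th-17th", "relegated"]

# Hypothetical value (£m) to the club of finishing in each band.
VALUE = {"top 4": 100.0, "5th-10th": 60.0, "11th-17th": 40.0, "relegated": 10.0}

# Hypothetical chance of finishing in each band next season (columns),
# given this season's band (rows); every row sums to 1.
TRANSITION = {
    "top 4":     {"top 4": 0.60, "5th-10th": 0.30, "11th-17th": 0.10, "relegated": 0.00},
    "5th-10th":  {"top 4": 0.20, "5th-10th": 0.50, "11th-17th": 0.30, "relegated": 0.00},
    "11th-17th": {"top 4": 0.05, "5th-10th": 0.25, "11th-17th": 0.55, "relegated": 0.15},
    "relegated": {"top 4": 0.00, "5th-10th": 0.00, "11th-17th": 0.30, "relegated": 0.70},
}


def next_season(dist):
    """Push a probability distribution over bands one season forward."""
    return {b: sum(dist[a] * TRANSITION[a][b] for a in BANDS) for b in BANDS}


def expected_value(dist, seasons):
    """Expected value of the club's finishes, summed over this season and the next ones."""
    total = 0.0
    for _ in range(seasons):
        total += sum(dist[b] * VALUE[b] for b in BANDS)
        dist = next_season(dist)
    return total


without_signing = {"top 4": 0.20, "5th-10th": 0.50, "11th-17th": 0.25, "relegated": 0.05}
# The extra 10 points of top-4 probability must come out of the other bands.
with_signing = {"top 4": 0.30, "5th-10th": 0.45, "11th-17th": 0.21, "relegated": 0.04}

first_season = expected_value(with_signing, 1) - expected_value(without_signing, 1)
three_seasons = expected_value(with_signing, 3) - expected_value(without_signing, 3)
print(round(first_season, 1), round(three_seasons, 1))
```

With these invented numbers the signing is worth about 5.3 (£m) in its first season but about 10.5 over three seasons, because a better finish this year raises the odds of a good finish next year.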

Finally, we need to be clear about objectives. The question every soccer analyst should ask his or her boss is, “What are we maximizing?” Is it points, wins, profits, a combination, or something else? There is simply no way to design the right metrics, or to make the right decisions, without this knowledge. When I see the reels of statistics that appear on Twitter and other websites during a Premier League game, I wonder, “How many of these are relevant to what the team is maximizing?” My guess is that many of them are relevant, but unless analysts can draw a clear connection between them and the ultimate objective – without too many layers of estimates and errors in between – they are meaningless.
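The same two options can rank differently depending on the answer. A minimal sketch with invented numbers: the candidate names, their expected extra points and net costs, and the flat revenue per league point are all hypothetical assumptions. Ranking by points picks one signing; ranking by profit picks the other.

```python
# Two hypothetical signings: expected extra league points and net cost (£m).
CANDIDATES = {
    "Player A": {"extra_points": 6.0, "net_cost": 30.0},
    "Player B": {"extra_points": 4.0, "net_cost": 8.0},
}

REVENUE_PER_POINT = 2.5  # hypothetical: £m of extra revenue per league point, assumed flat


def points(c):
    """Objective 1: maximize league points."""
    return c["extra_points"]


def profit(c):
    """Objective 2: maximize profit, converting points into revenue."""
    return c["extra_points"] * REVENUE_PER_POINT - c["net_cost"]


def best_signing(objective):
    """Return the candidate that scores highest under the given objective."""
    return max(CANDIDATES, key=lambda name: objective(CANDIDATES[name]))


print(best_signing(points))  # Player A
print(best_signing(profit))  # Player B
```

Nothing about the players changes between the two calls; only the objective does.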

In the coming weeks, I’ll be sharing some of my new methods for soccer analytics on this site. I’ll start by offering a simple way to assess the contribution of David Beckham to his teams without imposing any model on the evaluation of his performance. Then I’ll use Bayesian statistics to show how goals scored and conceded in the Premier League turn into cash, not just in one season but over a series of seasons. Finally, I’ll demonstrate a non-parametric method for gauging how pivotal almost every member of Newcastle United’s squad was to the team’s goal difference over the entire 2012-13 season.

I’m focusing on team selection and player transfers because I think these are the areas where clubs can most easily improve. Some of my results may surprise you, and that’s a good thing. If anyone could spot good players simply by watching them, there’d be no need for analytics. The more counterintuitive the result, the more of an advantage it may bring to a team.