How good was David Beckham? | North Yard Analytics

Was David Beckham a great soccer player, or just a good one who became a cultural icon? He undoubtedly had talent, but teams that signed him faced questions about whether they did so more for his star power than for his sporting prowess. To assess him as a player, we could look at his shots, goals, passes, and tackles. But a new method – new to soccer analytics, as far as I can tell – can help to reveal Beckham’s true value to his teams without looking at any of his individual statistics.

Though a fairly reliable starter for Manchester United, Beckham missed many matches over his career because of injury and suspension, as well as commercial and international obligations. This variation in his availability to his teams created a series of “natural experiments” that now offer an opportunity to appraise his value as a player.

So what is a natural experiment? In a traditional laboratory experiment, scientists randomly split a group of people into two smaller groups with similar characteristics: the test group, which receives some sort of treatment, and the control group, which doesn’t. The concept of a natural experiment simply recognizes that a comparable split can arise outside the laboratory, when circumstances rather than researchers decide who receives the treatment.

When a soccer player is injured or suspended, his team unexpectedly has to play without him. Because injuries and suspensions occur somewhat randomly, his absence should not be correlated with other important aspects of the team’s potential. For this reason, we can divide a team’s games into two similar groups: one in which the player was available and one in which he was not.

For simplicity, I focused on league games. I considered Beckham available when he played, obviously, but also when he was an unused substitute or rested by the decision of his manager. My analysis begins with Beckham’s first season as a regular starter, 1995-96. The following table shows the matches for which Beckham was and was not available for the three teams that employed him the longest:

David Beckham — Match Availability

Club	Matches Available	Matches Unavailable	Dates
Manchester United	284	19	August 1995 – May 2003
Real Madrid	122	28	August 2003 – May 2007
Los Angeles Galaxy	119	74	August 2007 – December 2012
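The availability rule described above could be encoded along the following lines. This is only a sketch: the status labels and the function names `is_available` and `count_availability` are hypothetical, chosen to mirror the reasons for playing or missing a match mentioned in the text.

```python
from collections import Counter

# Hypothetical status labels; only the grouping rule comes from the article.
AVAILABLE = {"played", "unused_substitute", "rested"}
UNAVAILABLE = {"injured", "suspended", "commercial", "international"}


def is_available(status):
    """Return True if the player could have been selected for the match."""
    if status in AVAILABLE:
        return True
    if status in UNAVAILABLE:
        return False
    raise ValueError(f"unknown status: {status!r}")


def count_availability(statuses):
    """Return (matches available, matches unavailable) for a list of statuses."""
    counts = Counter(is_available(s) for s in statuses)
    return counts[True], counts[False]


# Invented five-match run, for illustration only.
sample = ["played", "injured", "rested", "suspended", "unused_substitute"]
available, unavailable = count_availability(sample)
```

The key design choice is that a match counts as "available" whenever the manager could have picked the player, whether or not he actually took the field.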

Comparing the average goal difference (GD) per game between these groups of games offers the simplest way to evaluate the ease of replacing Beckham:

David Beckham — Goal Differential

Club	GD per Game when Available	GD per Game when Unavailable	Difference
Manchester United	1.11	1.37	-0.26
Real Madrid	0.71	0.86	-0.14
Los Angeles Galaxy	0.22	0.35	-0.13

In every case, Beckham’s teams appear to have performed better when he was not available for selection. (Note: The difference for Real Madrid is indeed -0.14; the apparent error is a result of rounding in the other figures.) Of course, some of these recorded differences may be due to randomness; the true effect of Beckham’s availability is difficult to estimate with a small number of games. Also, other factors may have affected goal difference in these games. These factors shouldn’t be correlated with Beckham’s availability if indeed it was randomly determined, but controlling for some of them should improve the precision of the estimate.
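As a rough sketch of the raw comparison, the gap in average goal difference and a measure of its uncertainty can be computed as follows. The function name `availability_gap` and the match results are invented for illustration, and the unequal-variance (Welch) standard error is my assumption about a reasonable way to gauge the noise, not necessarily what was used for the figures above.

```python
from math import sqrt
from statistics import mean, variance


def availability_gap(matches):
    """Return (gap, standard_error) for a list of (goal_difference, available) pairs.

    gap is the mean GD with the player available minus the mean GD without him.
    """
    with_player = [gd for gd, available in matches if available]
    without_player = [gd for gd, available in matches if not available]
    gap = mean(with_player) - mean(without_player)
    # Unequal-variance (Welch) standard error of the difference in means.
    se = sqrt(variance(with_player) / len(with_player)
              + variance(without_player) / len(without_player))
    return gap, se


# Invented results for illustration only, not Beckham's actual matches.
toy_matches = [(2, True), (0, True), (1, True), (-1, True), (3, False), (1, False)]
gap, se = availability_gap(toy_matches)
print(f"gap = {gap:+.2f} goals per game, standard error = {se:.2f}")
```

With only a handful of missed games, the second term under the square root dominates, which is why the raw differences should be read cautiously.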

Three of the most important factors contributing to goal difference are home-field advantage, the strength of Beckham’s team in a given season, and the strength of the opposition in each match. By adjusting for these factors, regression analysis can provide more precise estimates of the effect associated with Beckham’s availability. Here are the new estimates, along with the levels of statistical confidence that the true effects are negative:

David Beckham — Estimated effect

Club	Estimated effect	Confidence that effect is negative
Manchester United	-0.47	74%
Real Madrid	-0.13	29%
Los Angeles Galaxy	-0.31	75%

Based on this evidence, it cannot be ruled out that Beckham was as good as his replacements at Real Madrid. However, there is a good chance that his replacements were better at Manchester United and the Los Angeles Galaxy. The inclusion of other controls could alter these results, though it seems unlikely that the signs attached to the estimated effects would turn positive.
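For readers who want to see the mechanics, a regression of this general kind can be sketched in a few lines. Everything here is an assumption made for illustration: the exact specification behind the estimates above is not given, the strength measures are placeholders, and the data are synthetic, generated without noise from known coefficients so that the fit can be checked. Real goal differences are noisy integers, so real estimates come with standard errors.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic matches: (available, home, team_strength, opponent_strength).
matches = [
    (1, 1, 0.6, 0.2), (1, 0, 0.6, -0.1), (0, 1, 0.6, 0.5), (1, 1, 0.3, -0.4),
    (0, 0, 0.3, 0.1), (1, 0, 0.3, 0.3), (0, 1, 0.9, -0.2), (1, 0, 0.9, 0.0),
]
# Known coefficients: intercept, availability, home, team, opponent.
TRUE_COEFS = [0.5, -0.3, 0.4, 1.0, -0.8]

rows = [[1.0, *m] for m in matches]  # prepend the intercept column
goal_diff = [sum(c * x for c, x in zip(TRUE_COEFS, row)) for row in rows]

coefs = ols(rows, goal_diff)
availability_effect = coefs[1]  # change in GD associated with availability
```

The coefficient on the availability indicator plays the role of the "estimated effect" in the table: it is the change in goal difference associated with the player being available, holding the other factors fixed.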

These are not the highest levels of statistical confidence – econometricians usually use 90% or 95% – but they are high enough to be worthy of notice. Still, we should take these results with an extra grain of salt, because other factors might have affected the matches Beckham missed.

For example, in Beckham’s absence, other players may have made an extra effort to pick up the slack. His replacements may also have tried especially hard to impress the coaches and fans, seizing their opportunities to shine. In both cases, the players involved might not have been able to maintain the same level of performance over an entire season. Given this caveat, the statistical results may actually underestimate Beckham’s value as a player.

Nevertheless, the point of this exercise is that a player’s value to his team can, in some cases, be estimated without looking at any of his individual statistics. Nowhere does this method specify how Beckham contributed to goal difference. The natural experiments measure his contribution in its entirety, no matter how it was achieved.