Betting seems like an obvious application for analytics, but that doesn’t mean the analytics is easy. New information that’s pertinent to a fixture can come in at any moment right up until kickoff, and we’ll never be able to specify a model that takes account of everything. For that matter, we’d never be able to calibrate such a huge model properly, either, since we’d be dealing with lots of situations that weren’t quite identical. So what can we do?
In theory, we’d like to create a huge model of the game that could suck up any kind of news: injuries, coaching changes, the pitch, the weather, the crowd, the referee, the angle of the sun… you name it. Obviously this is impossible in practice. And even if we could create the soccer equivalent of the McGraw-Hill DRI model of the American economy, it might not work that well.
For example, let’s say we had a variable in our model for a wet pitch. Well, who should decide when a pitch is “wet”? Is it always the same person, or at least a person using the same criteria? More objectively, how many inches of rain would have to fall? But what if Stoke’s pitch soaked up rain differently from Chelsea’s? Would we need a “wet in Stoke” variable and a “wet in Chelsea” variable? What if the effect of the wet pitch depended on who was playing? It might not make as much difference to a long-ball club, after all.
What we can do is take a more parsimonious model and use it as a benchmark. If we use fewer variables, but we stick to variables that really are pretty uniform – like certain kinds of actions on the pitch – then we can at least come up with some guidelines for results in a generic set of circumstances. The bookies may try to account for many more factors, but the guidelines from our model can help us to do a gut check. If, say, the bookies have a team as 55% to win, and our model says 45%, can we think of any factors that could possibly make such a big difference? Based on everything that’s publicly knowable, does the gap seem reasonable?
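The gut check above is just a bit of arithmetic: convert the bookies’ quoted odds into implied probabilities, strip out their margin (the “overround”), and compare against the model’s benchmark. Here’s a minimal sketch in Python; all of the odds and the model figure are hypothetical, chosen only to mirror the 55%-versus-45% scenario.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds to implied probabilities, removing the overround.

    Raw implied probabilities (1/odds) sum to more than 1 because the
    bookmaker builds in a margin; normalising them gives a fair-odds view.
    """
    raw = [1.0 / o for o in decimal_odds]   # raw implied probabilities
    overround = sum(raw)                    # > 1.0 due to the bookie's margin
    return [p / overround for p in raw]     # normalised to sum to exactly 1

# Hypothetical home/draw/away decimal odds quoted by a bookmaker
odds = [1.80, 3.60, 4.50]
fair = implied_probabilities(odds)

model_home_win = 0.45                       # our benchmark model's estimate
gap = fair[0] - model_home_win
print(f"bookies imply {fair[0]:.1%} home win; "
      f"model says {model_home_win:.1%}; gap {gap:+.1%}")
```

With these numbers the bookies imply roughly a 53% home win against the model’s 45%, and the question becomes whether any publicly knowable factor justifies a gap that size.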
It’s important to ask this question, since the bookies’ odds may not perfectly reflect their beliefs about the result of a match. They’re running a business, after all, and they have to mitigate their risks based on the demand for certain positions – and that demand could well be out of whack with the likely result. For instance, imagine that the betting fans of one club are always biased to believe that their club will win an annual derby match at home. The true probability might be 50%, but in their minds it’s 60%. So the bookies might receive an unusual volume of bets on the club to win, and they might decide to shorten the odds to protect their bottom lines.
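The derby example can be made concrete with the relationship between probability and decimal odds (odds = 1/probability, before any margin). This toy sketch uses the hypothetical 50%/60% figures from above to show why the quoted price can drift away from the true chance:

```python
def fair_decimal_odds(prob):
    """Decimal odds corresponding to a win probability, with no margin."""
    return 1.0 / prob

true_prob = 0.50   # what the result data actually suggests
fans_prob = 0.60   # what the biased home fans believe

print(fair_decimal_odds(true_prob))   # → 2.0 (the price matching the true chance)
print(fair_decimal_odds(fans_prob))   # ≈ 1.67 (the shorter price lopsided demand allows)
```

If the fans are happy to bet at anything longer than 1.67, the bookie can quote well short of the fair 2.0 and still take their money, which is exactly the kind of distortion an objective benchmark can flag.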
This isn’t always the case, of course. In general, bookies’ odds are a good predictor of results. Yet sometimes it’s useful to have an extra set of guidelines that are based entirely on objective data and an unchanging model, just in case the bookies are being swayed one way or the other. And that’s why NYA is making tip sheets available.
This week’s guideline probabilities for the Premier League and the EFL Championship are online here and here, and we’ll publish more from time to time. With data collected in the same format and models that work identically in every league, we can produce tip sheets for fixtures around the world. If you’re interested, just get in touch via the email address listed on each sheet. Happy punting!