When predictive analytics in football fall short (an example)
According to predictions made by Soccerbot 3000, the AI-powered prediction machine, Germany should have faced Brazil in the final of the World Cup in Russia. Then the unthinkable happened.
The short explanation for those not following the matches being played in Russia: the German team, the same one Goldman Sachs picked as a probable finalist, failed to get out of its group for the first time in 80 years. When it was vanquished by South Korea and its brilliant Son Heung-Min, the predictions were proved wrong. Not much later, the always ballyhooed Brazilian team was knocked just as far out of the tournament, which is to say, all the way. Indeed, Mr. Neymar and peers, following legends Cristiano Ronaldo and Leo Messi, were sent packing before the semi-finals.
The Financial Times pointed out that Soccerbot 3000 used “200,000 models” that generated “1,000,000 possible evolutions of the tournament”. According to its prediction, this year’s Cup should have gone to Brazil by a nose, or a toe as it were. The obvious conclusion would be that machine learning is still ineffective at prediction, but that conclusion would be severely biased.
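The Financial Times description, many models feeding a huge number of simulated tournaments, amounts to Monte Carlo simulation: play the tournament out at random many times and count the outcomes. The Python sketch below illustrates only that counting idea. Its strength ratings are invented, the eight-team knockout bracket is a simplification, and the Bradley-Terry-style win-probability rule is an assumption. None of it is taken from the Goldman Sachs model, and the function names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical strength ratings, invented for illustration only;
# they are NOT the inputs or outputs of the Goldman Sachs model.
RATINGS = {
    "Brazil": 2.0, "France": 1.7, "Germany": 1.7, "Spain": 1.6,
    "Belgium": 1.5, "Argentina": 1.4, "England": 1.3, "Portugal": 1.3,
}

def win_probability(a: str, b: str) -> float:
    """Chance that team a beats team b in a knockout tie.

    Assumes a Bradley-Terry-style rule: P(a wins) = r_a / (r_a + r_b).
    Draws are not modelled; every tie produces a winner.
    """
    ra, rb = RATINGS[a], RATINGS[b]
    return ra / (ra + rb)

def simulate_bracket(teams: list[str], rng: random.Random) -> str:
    """Play one random evolution of a knockout bracket and return the champion.

    Assumes the number of teams is a power of two.
    """
    remaining = list(teams)
    while len(remaining) > 1:
        # Pair teams in bracket order: (0 v 1), (2 v 3), ...
        remaining = [
            a if rng.random() < win_probability(a, b) else b
            for a, b in zip(remaining[::2], remaining[1::2])
        ]
    return remaining[0]

def champion_odds(teams: list[str], n_sims: int = 100_000, seed: int = 2018) -> dict[str, float]:
    """Estimate each team's title chance as its share of simulated tournaments."""
    rng = random.Random(seed)
    wins = Counter(simulate_bracket(teams, rng) for _ in range(n_sims))
    return {team: wins[team] / n_sims for team in teams}

if __name__ == "__main__":
    odds = champion_odds(list(RATINGS))
    for team, p in sorted(odds.items(), key=lambda kv: -kv[1]):
        print(f"{team:<10} {p:6.1%}")
```

A team's title chance here is simply the share of simulated tournaments it wins. Even the highest-rated side ends up champion in only a minority of runs, and that is what a forecast like "Brazil by a nose" means.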
It wasn’t only the model that was surprised
The detailed report on the current World Cup showed Brazil, France and Germany as the front-runners, with 18.5%, 11.3% and 10.7% chances of bringing home the cup, respectively. None of the other teams garnered more than 10%. Taken together, the three favourites had only a 40.5% chance of winning between them, so the model itself expected some other champion more often than not.
The models used historical data on team characteristics, individual players and recent team performance. They then learned how these metrics correlated with the teams’ results, based on World Cup data since 2005. Is that a massive amount of data? Indeed it is. But that’s hardly the whole issue.
The model was unable to predict the weather, player health or the atmosphere within each team. Football, like any other game, involves many more variables than researchers can foresee and feed into a model. Even so, almost 200,000 of the one million scenarios it produced had Germany failing to reach the round of 16. In other words, the model gave that outcome roughly a one-in-five chance.
That Germany went out anyway shocked the world, not only the people who trusted the AI to predict the outcome. After Germany beat England in Italy in 1990, Gary Lineker famously said that “Football is a simple game – twenty-two men chase a ball for 90 minutes and at the end, the Germans always win.” Even he updated his quote:
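Counting scenarios is what turns a simulation into a probability. The short Python sketch below makes the arithmetic explicit, rounding "almost 200,000" up to 200,000. It also shows why a one-in-five event is not rare when several favourites each face one. The eight favourites, their equal 80% chance of advancing and the independence between groups are all hypothetical assumptions, and the function names are illustrative.

```python
def scenario_share(hits: int, total: int) -> float:
    """Probability estimate: share of simulated scenarios in which an event occurs."""
    return hits / total

def chance_at_least_one_upset(p_advance: float, n_favourites: int) -> float:
    """Chance that at least one of n favourites fails to advance.

    Hypothetical simplification: every favourite has the same chance of
    advancing, and the groups are independent of each other.
    """
    return 1 - p_advance ** n_favourites

# "Almost 200,000" of 1,000,000 scenarios, rounded to 200,000 here.
p_germany_out = scenario_share(200_000, 1_000_000)
print(f"Germany out in the group stage: {p_germany_out:.0%}")  # 20%

# Hypothetical: eight group favourites, each given an 80% chance to advance.
print(f"At least one favourite out: {chance_at_least_one_upset(0.8, 8):.0%}")  # 83%
```

Under those assumed numbers, an early exit for at least one favourite is the likely outcome, not a freak one.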
Football is a simple game. Twenty-two men chase a ball for 90 minutes and at the end, the Germans no longer always win. The previous version is confined to history.
— Gary Lineker (@GaryLineker) June 27, 2018
Racing with probability
The key role of machine learning models is to reduce the randomness of decisions by processing data. The level of accuracy a model needs before it can be used in production depends heavily on the purpose it was designed for.
A fraud detection model that was only 80% accurate would never be deployed in a bank or any similar institution. Letting one in five fraudulent transactions slip through would be nothing short of a disaster.
On the other hand, a model that reached the same 80% accuracy when assessing investment opportunities could earn millions of dollars. Warren Buffett may have missed the chance to invest in Google and Amazon, but that doesn’t make him an unreliable investor.
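Whether 80% accuracy is good enough comes down to what each right and wrong call is worth. The Python sketch below compares the average payoff per decision in the two settings. All payoff amounts are hypothetical. As a further simplification, every fraud-model error is treated as a missed fraud.

```python
def expected_value_per_decision(accuracy: float, gain_if_right: float, loss_if_wrong: float) -> float:
    """Average payoff of acting on a model's call, given its accuracy.

    gain_if_right and loss_if_wrong are hypothetical, illustrative amounts.
    """
    return accuracy * gain_if_right - (1 - accuracy) * loss_if_wrong

# Hypothetical investment screen: a good pick earns 100, a bad one loses 100.
investing = expected_value_per_decision(0.8, gain_if_right=100, loss_if_wrong=100)

# Hypothetical fraud check: a correct call saves 10 in handling costs,
# while each error is treated (simplistically) as a missed fraud costing 1,000.
fraud = expected_value_per_decision(0.8, gain_if_right=10, loss_if_wrong=1_000)

print(f"Investing: {investing:+.0f} per decision")        # +60
print(f"Fraud detection: {fraud:+.0f} per decision")      # -192
```

With these assumed payoffs, the same 80% accuracy produces a steady profit in one setting and a heavy loss in the other.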
Considering its 81.25% accuracy in the group stage, the model would make a fairly reliable advisor, even though it could not read opinions, follow social media, use leaked information or simply check the news just before each match to correct its forecasts.
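For scale, the 2018 group stage had 48 matches: eight groups of four teams, with six matches per group. Two assumptions are needed here. The first is that the 81.25% figure was measured over all 48 matches. The second is that each match counts as a single right-or-wrong call. Under both, the figure corresponds to 39 correct predictions:

$$
\text{accuracy} = \frac{\text{correct calls}}{\text{matches predicted}} = \frac{39}{48} = 0.8125 = 81.25\%
$$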
When a company has access to more reliable data, or can even supply all the relevant data itself, accuracy rises. This can be seen in visual quality control or in recognizing diabetic retinopathy from retinal photos. Predicting the outcome of sporting events is a very different business.
Even the Goldman Sachs analysts behind the model cautioned against treating it as an oracle. In any case, however much analysis or data science gets done, the World Cup will remain exciting to watch.