Euro 2016 Predictions Using Team Rating Systems
The 2016 UEFA European Championship kicks off in France in a few hours, with 24 national teams vying for the title. In this post, we explain how to use several football team rating systems to make Euro 2016 predictions.
Rating systems for football teams
Have you ever wondered how to predict the outcome of a football match? One of the basic techniques for doing so is to use a rating system. A rating system usually assigns each team a single parameter, its rating, based on its performance in previous games. These ratings can then be used to generate predictions for future matches. There are many rating systems to choose from. In this post, we review several methods for rating football (a.k.a. soccer) teams; of course, these methods can also be applied to other sports. We then use these rating systems to generate our Euro 2016 predictions.
Elo rating system
Before getting started with football, however, we have to briefly discuss… chess. In the mid-20th century, Arpad Elo, a Hungarian-American physicist, proposed a rating system to assess chess players’ performance. Since then, the system has been widely adopted in other sports and in online gaming. It also serves as the foundation for other rating systems, such as Glicko and TrueSkill. The Elo model’s simple formulation, its elegance and, most importantly, its accuracy have contributed to its popularity.
Let’s briefly introduce the Elo model. The general idea is that the model updates its ratings based on the difference between the result it expected before the game and the actual outcome. Compiling team ratings involves two steps. First, given two team ratings $r_i$ and $r_j$, one can derive the expected outcome of their match by applying the so-called sigmoid (logistic) function to the difference in their ratings. This function takes values between 0 and 1 and can be interpreted directly as a probability estimate. The exact formula is
$$p_{ij} = \frac{1}{1 + e^{-(r_i + h - r_j)/a}},$$
where $a$ is a scaling factor and $h$ is a bonus for the home team, which has a slight advantage over the visiting team (in chess, a similar advantage goes to the player with the white pieces, who always moves first). Given the predicted outcome $p_{ij}$ and the actual outcome $o_{ij}$, equal to 1 if team $i$ wins, 0.5 for a draw and 0 if team $j$ wins, the ratings are updated as follows:
$$r_i' = r_i + k\,(o_{ij} - p_{ij}),$$
and accordingly for the second team:
$$r_j' = r_j + k\,(p_{ij} - o_{ij}).$$
Here, $k$ is the so-called K-factor, which governs the magnitude of rating changes. Note that, in its original formulation, the Elo system does not model draws separately: it predicts a binary outcome, and a draw simply counts as half a win (0.5). To generate the probability of a tie, we used a simple method suggested here.
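To make the two Elo steps concrete, here is a minimal Python sketch. It assumes the logistic form $p_{ij} = 1/(1 + e^{-(r_i + h - r_j)/a})$; chess Elo conventionally uses base 10 with a = 400, which is the same curve with a rescaled a. The function names and the example values of k and a are illustrative, not the settings used in this post, and the draw-probability method mentioned above is not included.

```python
import math


def expected_score(r_i: float, r_j: float, a: float, h: float = 0.0) -> float:
    """Expected outcome p_ij of team i against team j (h > 0 if i plays at home)."""
    return 1.0 / (1.0 + math.exp(-(r_i + h - r_j) / a))


def elo_update(r_i: float, r_j: float, o_ij: float, k: float, a: float,
               h: float = 0.0) -> tuple[float, float]:
    """Return new ratings after a match with outcome o_ij (1 win, 0.5 draw, 0 loss)."""
    p_ij = expected_score(r_i, r_j, a, h)
    delta = k * (o_ij - p_ij)
    # Team j loses exactly what team i gains, so rating points are conserved.
    return r_i + delta, r_j - delta


# Illustrative values: two equally rated teams on neutral ground, team i wins.
print(elo_update(1500.0, 1500.0, o_ij=1.0, k=20.0, a=200.0))  # (1510.0, 1490.0)
```

An upset moves the ratings more than an expected win, because the update is proportional to the surprise $o_{ij} - p_{ij}$.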
As far as football is concerned, an implementation of Elo ratings is maintained at the EloRatings.net website. The Elo system is also the basis of the FIFA Women’s World Ranking. Notably, both have been documented to predict match results better than FIFA’s men’s ranking. We employ both versions of the Elo model in their original formulation to generate the predictions below.
Ordinal logistic regression ratings
Another way of estimating team ratings is to use an ordinal regression model. This model extends basic logistic regression to ordered outcomes, in this case win, draw and loss. Somewhat analogously to the Elo system, the probabilities of these events, given the two teams’ ratings $r_i$ and $r_j$, are determined as:
$$P(i \text{ wins}) = \frac{1}{1 + e^{-(r_i + h - r_j - c)}}, \qquad P(j \text{ wins}) = \frac{1}{1 + e^{-(r_j - r_i - h - c)}}, \qquad P(\text{draw}) = 1 - P(i \text{ wins}) - P(j \text{ wins}),$$
where $c > 0$ is a parameter governing the draw margin and $h$ adjusts for home-team advantage. Here, unlike in the original Elo model, the probability of a draw is modeled explicitly (for $c = 0$, the draw probability vanishes and we recover Elo’s expected-outcome equation given previously, with $a = 1$). Using these equations and the method of maximum likelihood, one can estimate the team ratings $r_i$, the draw margin $c$ and the home advantage $h$.
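A minimal Python sketch of the ordinal model’s outcome probabilities and of the log-likelihood one would maximize. It assumes the form P(i wins) = σ(r_i + h - r_j - c) and P(j wins) = σ(r_j - r_i - h - c), with the draw taking the remaining probability; the match format and all names are hypothetical. In practice, one would pass the negated log-likelihood to a numerical optimizer such as `scipy.optimize.minimize`, fixing one rating (or adding a penalty), because the ratings are only identified up to an additive constant.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def ordinal_probs(r_i: float, r_j: float, c: float, h: float = 0.0):
    """Return (P(i wins), P(draw), P(j wins)); a larger c widens the draw band."""
    d = r_i + h - r_j
    p_win = sigmoid(d - c)
    p_loss = sigmoid(-d - c)
    return p_win, 1.0 - p_win - p_loss, p_loss


def log_likelihood(ratings: dict, c: float, h: float, matches) -> float:
    """Sum of log-probabilities of the observed results.

    matches: iterable of (home, away, result), result in {"H", "D", "A"}
    (home win, draw, away win); this input format is a hypothetical choice.
    """
    column = {"H": 0, "D": 1, "A": 2}
    return sum(math.log(ordinal_probs(ratings[i], ratings[j], c, h)[column[res]])
               for i, j, res in matches)
```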
Least squares method
The next rating system is based on the simple observation that the difference $s_i - s_j$ between the teams’ scores should correspond to the difference in their ratings:
$$s_i - s_j \approx r_i + h - r_j.$$
Again, $h$ is a correction for the advantage of home team $i$. The rating system’s name comes from its estimation method: one finds the ratings $r_i$ that minimize the sum of squared differences (over a set of games) between the two sides of the above equation. Kenneth Massey’s website, among others, compiles and maintains a version of this rating system for various sports.
For the least squares model, we still need to generate probabilities for particular outcomes. Once again, we do this by using the sigmoid function analogously to the Elo model.
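The least squares fit and the sigmoid mapping can be sketched with NumPy. This is a minimal illustration, assuming matches are given as hypothetical (home, away, home_goals, away_goals, neutral) tuples, with h applied only when the first team actually plays at home. An extra row forces the ratings to sum to zero, since adding a constant to every rating leaves all score differences unchanged. The sigmoid scale a would have to be tuned separately.

```python
import numpy as np


def least_squares_ratings(matches, teams):
    """Fit ratings r and home advantage h so that s_i - s_j ≈ r_i + h - r_j."""
    index = {team: n for n, team in enumerate(teams)}
    n = len(teams)
    rows, diffs = [], []
    for home, away, s_i, s_j, neutral in matches:
        row = np.zeros(n + 1)
        row[index[home]] = 1.0
        row[index[away]] = -1.0
        row[n] = 0.0 if neutral else 1.0  # last column carries h
        rows.append(row)
        diffs.append(s_i - s_j)
    # Ratings are defined only up to a constant: add the row sum(r) = 0.
    rows.append(np.concatenate([np.ones(n), [0.0]]))
    diffs.append(0.0)
    coef, *_ = np.linalg.lstsq(np.vstack(rows), np.array(diffs, dtype=float), rcond=None)
    return dict(zip(teams, coef[:n])), coef[n]


def win_probability(r_i, r_j, h=0.0, a=1.0):
    """Sigmoid of the rating difference, analogous to Elo's expected outcome."""
    return 1.0 / (1.0 + np.exp(-(r_i + h - r_j) / a))


# Made-up games: three on neutral ground, one with C at home.
games = [("A", "B", 2, 0, True), ("B", "C", 1, 0, True),
         ("A", "C", 3, 0, True), ("C", "A", 0, 2, False)]
ratings, h = least_squares_ratings(games, ["A", "B", "C"])
# ratings ≈ {"A": 1.667, "B": -0.333, "C": -1.333}, h ≈ 1.0
```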
Poisson model
The final rating system that we discuss is based on the assumption that the number of goals scored by a team can be modeled as a Poisson-distributed variable. This distribution applies to count data, e.g., the number of accidents, telephone calls or… goals scored :) The mean rate of this variable depends on the attacking capabilities of a team and the defensive skills of its opponent. This extends ratings to two parameters per team, offensive and defensive skill, as opposed to the single parameter used by the methods discussed above.
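Given the attacking and defensive skills of teams $i$ and $j$ ($a_i$, $a_j$ and $d_i$, $d_j$, respectively), the rates $\lambda$ and $\mu$ of the Poisson variables for home team $i$ and visiting team $j$ are modeled as:
$$\lambda = e^{\,h + a_i - d_j}, \qquad \mu = e^{\,a_j - d_i},$$
where $h$ again denotes home advantage.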
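Under this model, the probability of a final score of $x$ to $y$ is equal to:
$$P(X = x, Y = y) = \frac{\lambda^x e^{-\lambda}}{x!} \cdot \frac{\mu^y e^{-\mu}}{y!}.$$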
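To turn the score distribution into win, draw and loss probabilities, we only need to sum over the grid of possible scores. A minimal Python sketch, assuming the rates λ and μ have already been computed from the ratings; the cutoff `max_goals` is an arbitrary choice that drops only a negligible tail for realistic scoring rates.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    return rate ** k * math.exp(-rate) / math.factorial(k)


def outcome_probabilities(lam: float, mu: float, max_goals: int = 15):
    """(P(home win), P(draw), P(away win)) for independent Poisson goal counts."""
    win = draw = loss = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = poisson_pmf(x, lam) * poisson_pmf(y, mu)  # P(final score x:y)
            if x > y:
                win += p
            elif x == y:
                draw += p
            else:
                loss += p
    return win, draw, loss
```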
Given a dataset of matches, one can estimate the team rating parameters by maximum likelihood. Here, we employ the basic version of the model, which assumes that the Poisson variables modeling the goals scored by the two teams are independent given their rating parameters.
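Maximum likelihood estimation needs the log-likelihood of the observed scores. The sketch below assumes the log-linear rates λ = exp(h + a_i - d_j) and μ = exp(a_j - d_i) (a common parameterization; other variants exist) and a hypothetical (home, away, home_goals, away_goals) match format. Because the two goal counts are treated as independent, each match contributes the sum of two Poisson log-probabilities. An optimizer such as `scipy.optimize.minimize` could then maximize this quantity, with one constraint (for example, attack ratings summing to zero) to pin down the otherwise free additive constant.

```python
import math


def poisson_log_pmf(k: int, rate: float) -> float:
    # log(rate^k * e^(-rate) / k!), using lgamma(k + 1) = log(k!)
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def poisson_log_likelihood(attack: dict, defense: dict, h: float, matches) -> float:
    """Log-likelihood of observed scores under independent Poisson goals."""
    total = 0.0
    for home, away, x, y in matches:
        lam = math.exp(h + attack[home] - defense[away])  # home team's scoring rate
        mu = math.exp(attack[away] - defense[home])       # visitors' scoring rate
        total += poisson_log_pmf(x, lam) + poisson_log_pmf(y, mu)
    return total
```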
Tuning the predictive power
We used the rating systems presented here to estimate win, draw and loss probabilities for every possible matchup among the 24 teams participating in Euro 2016. Given these probabilities, we simulated the tournament many times and computed each team’s probability of winning it all. We used the database of international football match results provided at this website (thanks to Christian Muck for generously exporting the data).
First of all, the rating systems involve some adjustable parameters, e.g., weights reflecting match importance (a friendly vs. a World Cup final), a weighting function that emphasizes the most recent results, and regularization (to avoid overfitting the rating models to historical results). We tuned these parameters to maximize the models’ predictive accuracy: for a sample of games, we predicted the results and evaluated these predictions against the actual outcomes. For tuning, we chose matches from major international tournaments: the World Cup finals, the European Championships and the Copa America (the South American continental championship).
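As an illustration of such adjustable parameters, a per-match weight could combine match importance with an exponential decay in the match’s age. This is a hypothetical sketch: the importance categories, their values and the half-life are placeholders, not the values tuned for this post. Such weights would multiply each match’s term in the fitting objective (a squared residual or a log-likelihood term), and the free settings would be chosen by minimizing logloss on held-out tournament matches.

```python
from datetime import date

# Hypothetical importance weights; the post does not list the values it used.
IMPORTANCE = {"friendly": 1.0, "qualifier": 1.5, "tournament": 2.5}


def match_weight(kind: str, played: date, reference: date,
                 half_life_days: float = 730.0) -> float:
    """Importance weight times exponential decay that halves every half_life_days."""
    age_days = (reference - played).days
    return IMPORTANCE[kind] * 0.5 ** (age_days / half_life_days)


# A friendly played exactly one half-life (730 days) earlier gets weight 0.5.
print(match_weight("friendly", date(2014, 6, 10), date(2016, 6, 9)))  # 0.5
```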
The parameters of the rating systems were tuned on the World Cup finals held between 1994 and 2010 (5 tournaments), the UEFA European Championships 1996–2008 (4) and the Copa America finals 1999–2011 (5), a set of 562 matches in total. Prediction quality is evaluated using the logarithmic loss (logloss), an error metric often used to evaluate probabilistic predictions; lower values are better. Accuracy has perhaps a more direct interpretation: it is simply the percentage of matches whose outcome was correctly predicted by a given method. The table below presents the logloss of the predicted match outcome probabilities, as well as the accuracy of the predictions, for each method.
Method | Logloss | Accuracy |
EloRatings.net | 0.9818 | 52% |
FIFA Women’s World Ranking | 0.9934 | 52% |
Ordinal Logistic Regression | 0.9638 | 53% |
Least Squares | 0.9553 | 55% |
Poisson Ratings | 0.9646 | 55% |
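Both metrics are straightforward to compute from a list of predicted (home win, draw, away win) probabilities. A minimal sketch, assuming the natural logarithm (which reproduces the ln 3 ≈ 1.0986 figure for a uniform 1/3 guess) and assuming that a prediction counts as correct when the observed outcome was the most probable one; the "H"/"D"/"A" labels are hypothetical.

```python
import math

OUTCOMES = ("H", "D", "A")  # home win, draw, away win


def log_loss(predictions, results) -> float:
    """Mean of -ln(probability assigned to the outcome that actually happened)."""
    return -sum(math.log(probs[OUTCOMES.index(res)])
                for probs, res in zip(predictions, results)) / len(results)


def accuracy(predictions, results) -> float:
    """Share of matches whose most probable predicted outcome occurred."""
    hits = sum(OUTCOMES[max(range(3), key=probs.__getitem__)] == res
               for probs, res in zip(predictions, results))
    return hits / len(results)


uniform = [(1 / 3, 1 / 3, 1 / 3)] * 3
print(round(log_loss(uniform, ["H", "D", "A"]), 4))  # 1.0986
```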
These results might be overly optimistic, since the parameters were chosen to minimize the prediction error on this specific set of games. To validate the methods more thoroughly, we used 121 other matches from the three most recent tournaments: the 2014 World Cup finals, the 2012 European Championship and the 2015 Copa America finals. The results are presented below. To provide some context for the numbers, we also present two benchmarks: random guessing, and probabilities derived from an average of bookmakers’ odds. For a three-way outcome, a random guess (probability 1/3 for each outcome) yields a logarithmic loss of -ln(1/3) = ln 3 ≈ 1.0986 and an accuracy of 33%.
Method | Logloss | Accuracy |
EloRatings.net | 1.0074 | 55% |
FIFA Women’s World Ranking | 1.0032 | 54% |
Ordinal Logistic Regression | 0.9972 | 50% |
Least Squares | 0.9949 | 56% |
Poisson Ratings | 0.9981 | 55% |
Random guess | 1.0986 | 33% |
Bookmakers | 0.9726 | 52% |
Ensemble | 0.9919 | 55% |
In terms of logloss, the bookmakers do better than all of the individual rating methods. Of course, bookmakers can include additional information on player injuries, suspensions or a team’s form during the tournament, which gives them an advantage over the models. Including such external information would be the next step toward improving the accuracy of the presented models. In terms of accuracy, however, four of the five rating systems (54–56%) slightly outperform the bookmakers (52%). The bottom row of the table presents results for an ensemble method, which averages the predictions of the three best-performing methods: least squares, Poisson and ordinal regression ratings. Ensembling is a simple way to increase the predictive power of individual models. Here, the ensemble achieves a lower logloss than any individual rating method while maintaining accuracy.
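The ensemble is simply a match-by-match average of the probability triples. A minimal sketch with hypothetical names and made-up numbers, assuming equal weights as in the plain average described above. Since -ln is convex, the logloss of the averaged probabilities can never exceed the average logloss of the individual models (Jensen’s inequality), which is one reason such simple averaging tends to help.

```python
def ensemble(*model_predictions):
    """Average several models' (home, draw, away) probabilities, match by match.

    Each argument is one model's list of probability triples, in the same match order.
    """
    averaged = []
    for per_match in zip(*model_predictions):  # one triple per model for this match
        n_models = len(per_match)
        averaged.append(tuple(sum(p) / n_models for p in zip(*per_match)))
    return averaged


# Made-up predictions for a single match from the three models.
least_squares = [(0.6, 0.2, 0.2)]
poisson = [(0.4, 0.4, 0.2)]
ordinal = [(0.5, 0.3, 0.2)]
print(ensemble(least_squares, poisson, ordinal))
```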
The rating methods presented here have some limitations. Many factors influence match results, and we only covered simple predictive models based on historical data. Naturally, one could add external and more sophisticated information to a model, e.g., data on individual players and their skills. We encourage you to think about other factors that play a role in match outcomes and could be included in a model. This could substantially improve the models’ accuracy!
Euro 2016 predictions
Given the match outcome probabilities for each possible matchup, we simulated 1,000,000 Euro 2016 tournaments. We sampled only win, draw and loss results, not exact scores. If teams were still level in the group stage after considering head-to-head results, we resolved the tie randomly. The tournament’s official rules would next apply goal difference; however, this information is not available in our simulation. Notably, before the penalty shoot-out was “invented,” coin tosses were used to settle games that were still tied after extra time. For instance, on its way to winning Euro 1968, Italy “won” its semifinal against the USSR on a coin toss. Although we do not endorse this way of deciding sporting events, we do draw lots when teams are tied at the end of the group stage. If a knockout-stage game ends in a draw, we simply sample its result again.
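The sampling rules above can be sketched in a few Python functions. This is a minimal illustration, assuming a hypothetical `probs(i, j)` callable that returns team i’s (win, draw, loss) probabilities against team j (for example, from the ensemble). The full Euro 2016 bracket, including the rule that the four best third-placed teams also advance, is omitted. Re-sampling a drawn knockout game until someone wins is equivalent to awarding the win to team i with probability p_win / (p_win + p_loss).

```python
import random
from itertools import combinations

POINTS = {"W": 3, "D": 1, "L": 0}
FLIP = {"W": "L", "D": "D", "L": "W"}  # the same result seen from the other side


def play(p_win: float, p_draw: float, rng: random.Random) -> str:
    """Sample 'W', 'D' or 'L' for the first team."""
    u = rng.random()
    if u < p_win:
        return "W"
    return "D" if u < p_win + p_draw else "L"


def knockout(i, j, probs, rng):
    """Re-sample drawn games until one team wins (requires p_draw < 1)."""
    p_win, p_draw, _ = probs(i, j)
    while True:
        result = play(p_win, p_draw, rng)
        if result != "D":
            return i if result == "W" else j


def simulate_group(teams, probs, rng):
    """Rank a group by points, then head-to-head points, then by drawing lots."""
    results, points = {}, dict.fromkeys(teams, 0)
    for i, j in combinations(teams, 2):
        p_win, p_draw, _ = probs(i, j)
        results[(i, j)] = play(p_win, p_draw, rng)
        results[(j, i)] = FLIP[results[(i, j)]]
        points[i] += POINTS[results[(i, j)]]
        points[j] += POINTS[results[(j, i)]]

    def head_to_head(team):
        # Points earned only in games against teams level on points with `team`.
        return sum(POINTS[results[(team, other)]] for other in teams
                   if other != team and points[other] == points[team])

    lots = {team: rng.random() for team in teams}  # random tie-breaker of last resort
    return sorted(teams, key=lambda t: (points[t], head_to_head(t), lots[t]), reverse=True)
```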
And… here are the predictions generated using the ensemble of the three best-performing rating systems! Consecutive columns give the probability of advancing to a given stage of the competition. For example, the number next to Portugal in the first column indicates a 91.37% chance that Portugal advances past the group stage. Similarly, Spain has a 33.95% chance of reaching the Euro 2016 final. The last column gives each team’s chance of winning the whole tournament.
Team | Last 16 | Quarterfinals | Semifinals | Final | Champions |
France | 98.01% | 82.6% | 67.71% | 51.21% | 37.55% |
Spain | 92.6% | 72.24% | 51.11% | 33.95% | 19.08% |
Germany | 94.71% | 70.41% | 45.99% | 24.88% | 13.21% |
England | 93.52% | 67.5% | 40.87% | 22.25% | 10.4% |
Belgium | 84.38% | 48.2% | 26.1% | 11.51% | 4.55% |
Portugal | 91.37% | 54.7% | 26.31% | 12.09% | 4.42% |
Italy | 72.43% | 33.38% | 14.83% | 5.26% | 1.55% |
Ukraine | 76.81% | 37.05% | 15.5% | 5.53% | 1.52% |
Croatia | 66% | 31.92% | 14.65% | 5.27% | 1.5% |
Russia | 75.34% | 37.84% | 13.07% | 4.29% | 1.14% |
Turkey | 61.9% | 27.97% | 12.07% | 4% | 1.05% |
Switzerland | 69.98% | 30.49% | 11.8% | 3.97% | 0.88% |
Poland | 67.4% | 26.58% | 9.35% | 2.77% | 0.6% |
Sweden | 57.89% | 20.76% | 7.45% | 2.11% | 0.47% |
Romania | 62.64% | 23.82% | 8.07% | 2.35% | 0.45% |
Austria | 71.63% | 27.01% | 7.46% | 2.07% | 0.43% |
Slovakia | 63.66% | 25.57% | 6.96% | 1.79% | 0.37% |
Republic of Ireland | 54.68% | 18.64% | 6.38% | 1.72% | 0.35% |
Czech Republic | 46.28% | 16.19% | 5.6% | 1.44% | 0.29% |
Hungary | 56.86% | 16.08% | 3.37% | 0.69% | 0.11% |
Iceland | 47.81% | 11.32% | 2.02% | 0.36% | 0.05% |
Albania | 31.46% | 6.62% | 1.26% | 0.19% | 0.02% |
Wales | 34.29% | 7.98% | 1.19% | 0.16% | 0.02% |
Northern Ireland | 28.32% | 5.11% | 0.88% | 0.13% | 0.01% |
Some of you might find these predictions surprising, and our discussion thread is now open! As for our own thoughts, first of all, France tops the ranking. The 12th man is behind them: they are playing at home, and the methods we used give them an edge for that reason. On the other hand, the prediction for four-time World Cup winners Italy is somewhat discouraging. In recent years, Italy has had disappointing results, including draws with Armenia, Haiti and Luxembourg (not to mention its 2010 and 2014 World Cup records). However, what the rating systems cannot infer is that the Italian team usually rises to the occasion at major tournaments. Russia’s perhaps surprisingly high position might be partially attributed to its easier (according to the rating systems we used) group-stage opponents: Wales, Slovakia and England.
All in all, no team is condemned to lose before the tournament starts, and that is the very beauty of sports. We might well end up with a surprise like Greece’s Euro 2004 triumph… so, which team will upset the favorites this year?