## Predictions for World Cup 2018

Here we compare predictions for the outcome of the World Cup 2018 tournament. We only include predictions that give a detailed probability distribution in percent for *the round of the last 16, quarter-final, semifinal, final* and the *world champion*. For those predictions we compare both the quality of the predictions and the different methods in detail. First we present the sites that made predictions and briefly summarize their methods.

### Sites / Institutes

The following sites or institutes have published a prediction for the World Cup 2018:

**ECO =** @Futbolmetrix1 and D. Paserman: an economic model based on economic data (population, GDP per capita and FIFA confederation).

**EEE =** EEECON: scientific paper with probability predictions based on the *Bookmaker Consensus Model* over various providers, with a detailed description of the method and analysis.

**EFP =** EightyFivePoints: a simple Elo-based model for predicting match results; the World Cup was simulated 10,000 times to evaluate the likelihood of the outcomes.

**FBD =** Footballdope: predictions based on the Elo rating system, modified by the home advantage of Russia and some other adjustments that rescale overestimated Elo rankings.

**FMS =** Fussballmathe.de: version *Standard* from 13.06. A weighted (1:3:2) mixture of historical results, Elo ranking and recent market value is used to calculate a probability for each match and eventually the overall outcome.

**FNU =** @Fitbawnumbers: weighted prediction Elo/FIFA = 75/25, with code by @EightyFivePoints.

**FTE =** FiveThirtyEight: Soccer Power Index (SPI) ratings, which combine international match results (75%) with club match results and a *roster-based* SPI rating (25%).

**FUP =** Fussballprognose: a modified Elo ranking adjusted using historical results.

**GMS =** Goldman Sachs: football and machine learning, based on individual player and team characteristics.

**GOI =** @Goalimpact: the *Goalimpact method*, based on individual player strength, which is calculated from the impact on goals in every game, whether or not the player takes part.

**GOP =** Goalprojection: the tournament result is determined randomly and averaged over many outcomes, weighted by each team's average attack and defence strength from the *Goalprojection* goals rating system.

**ITM =** ITM-Predictive: machine-learning method based on historical data back to 1872, weighted by relevance, such as qualification for #WorldCup18.

**KIF =** KickForm: prediction algorithm based on the Elo system and a large simulation.

**NOB =** @nobilor: based on the Spielverlagerung.de assessment of all teams. This assessment, on a scale from 1 to 25 for every team, serves as the input for a simulation of the tournament.

**SUD =** Statistics Faculty, Technische Universität Dortmund: random forest methods with 100,000 simulations, based on rankings of different kinds (economics, betting, FIFA, home advantage, confederation, team structure, coach) prior to the last four World Cups.

**UBS =** UBS: predictions based on economic models with data from the last 5 tournaments, the home advantage of Russia and the Elo ranking.

### Predictions for R16

The table shows the teams that qualified for the round of the last 16, together with the predicted probabilities in percent (%). Bold letters mark the correct prediction of qualification. The last line of the table shows the number of correct predictions (**#CP**). The number of correct predictions varies very little between the sites and does not differentiate enough to really judge who made the best prediction. With Germany, only one big favorite failed to reach R16.

Team | ECO | EEE | FBD | EFP | FMS | FNU | FTE | FUP | GMS | GOI | GOP | ITM | KIF | NOB | SUD | UBS |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
BRA | 96.4 | 89.9 | 96.0 | 90.0 | 96.7 | 92.0 | 89.0 | 89.9 | 87.5 | 77.0 | 88.1 | 90.3 | 94.0 | 82.3 | 83.5 | 89.9 |
ESP | 87.5 | 85.9 | 89.0 | 81.0 | 91.9 | 82.0 | 87.0 | 75.2 | 79.7 | 85.5 | 82.0 | 83.4 | 84.0 | 72.3 | 88.4 | 87.1 |
FRA | 85.0 | 87.0 | 81.0 | 77.0 | 89.3 | 78.0 | 86.0 | 75.3 | 81.4 | 77.6 | 89.7 | 78.3 | 76.0 | 77.2 | 85.5 | 84.7 |
ARG | 88.8 | 78.7 | 88.0 | 80.0 | 87.9 | 84.0 | 77.0 | 90.7 | 73.1 | 69.2 | 80.1 | 87.3 | 83.0 | 73.9 | 81.6 | 81.1 |
BEL | 79.0 | 81.7 | 86.0 | 77.0 | 90.3 | 84.0 | 87.0 | 72.1 | 78.5 | 86.2 | 82.2 | 74.0 | 79.0 | 77.1 | 86.3 | 82.0 |
ENG | 93.2 | 75.6 | 87.0 | 80.0 | 95.5 | 76.0 | 85.0 | 77.1 | 72.3 | 72.8 | 80.2 | 81.7 | 82.0 | 70.9 | 79.8 | 87.3 |
URU | 44.2 | 68.1 | 89.0 | 77.0 | 92.2 | 83.0 | 76.0 | 70.8 | 74.4 | 49.9 | 68.0 | 74.8 | 87.0 | 64.7 | 86.6 | 76.8 |
COL | 82.5 | 64.6 | 84.0 | 75.0 | 78.7 | 75.0 | 69.0 | 84.6 | 74.9 | 41.4 | 72.3 | 77.2 | 79.0 | 52.6 | 79.2 | 56.1 |
POR | 64.4 | 66.3 | 78.0 | 69.0 | 78.6 | 79.0 | 68.0 | 66.0 | 75.2 | 54.6 | 58.2 | 61.3 | 72.0 | 62.3 | 67.5 | 63.4 |
RUS | 96.2 | 64.2 | 71.0 | 62.0 | 48.4 | 56.0 | 73.0 | 63.8 | 33.5 | 71.4 | 57.4 | 80.7 | 46.0 | 40.6 | 50.4 | 75.0 |
CRO | 34.1 | 58.7 | 57.0 | 57.0 | 68.2 | 56.0 | 64.0 | 57.9 | 42.8 | 70.4 | 74.3 | 69.5 | 57.0 | 62.8 | 65.9 | 39.9 |
SUI | 71.1 | 45.4 | 59.0 | 52.0 | 59.0 | 62.0 | 53.0 | 44.8 | 47.9 | 47.4 | 44.5 | 55.8 | 55.0 | 37.3 | 58.9 | 59.3 |
MEX | 51.5 | 45.2 | 55.0 | 56.0 | 49.8 | 58.0 | 46.0 | 55.4 | 47.8 | 40.6 | 50.4 | 45.4 | 52.0 | 58.2 | 41.5 | 53.8 |
DEN | 43.6 | 46.7 | 42.0 | 38.0 | 57.8 | 42.0 | 56.0 | 27.4 | 52.0 | 69.1 | 53.0 | 47.1 | 43.0 | 53.0 | 59.0 | 40.2 |
SWE | 52.7 | 44.5 | 31.0 | 33.0 | 42.2 | 34.0 | 50.0 | 25.7 | 45.9 | 36.8 | 27.8 | 47.4 | 35.0 | 30.1 | 54.0 | 32.5 |
JAP | 33.6 | 36.3 | 21.0 | 34.0 | 14.1 | 23.0 | 43.0 | 33.7 | 35.2 | 50.9 | 35.1 | 34.5 | 24.0 | 46.3 | 20.5 | 37.2 |
#CP | 11 | 14 | 13 | 13 | 14 | 14 | 14 | 13 | 12 | 13 | 14 | 13 | 13 | 12 | 14 | 12 |

The chart below shows the average predicted probability of reaching the round of the last 16, together with the individual predictions of the sites (colored points). The red error bars are the standard deviation of all predictions. The teams are ordered by descending arithmetic mean (black numbers). The colored bars and numbers are the corresponding averages for the qualification for the knockout stages.

The team with the largest scatter and the highest standard deviation is Russia. This may be due to the fact that some sites included a home advantage and others did not. One site (**ECO**) has a far too high prediction (96.2%), based on economic data. The same is true of the too low prediction for Croatia.
Of the 8 group head teams (RUS, **GER**, BRA, POR, ARG, BEL, **POL**, FRA) and the second-seeded teams (ESP, **PER**, SUI, ENG, COL, MEX, URU, CRO), only **three** teams (in bold) did not qualify for R16! Those teams are included in the chart and are marked with red bars only for comparison.
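The mean and scatter described above can be recomputed directly from the table. The following is a minimal sketch using three rows copied from the R16 table; the names `r16_predictions` and `summary` are illustrative, and the use of the population standard deviation is an assumption, since the text does not say which variant the chart uses.

```python
import statistics

# Predicted R16 probabilities in percent, copied from the table above
# (site order: ECO, EEE, FBD, EFP, FMS, FNU, FTE, FUP, GMS, GOI, GOP, ITM, KIF, NOB, SUD, UBS).
r16_predictions = {
    "BRA": [96.4, 89.9, 96.0, 90.0, 96.7, 92.0, 89.0, 89.9,
            87.5, 77.0, 88.1, 90.3, 94.0, 82.3, 83.5, 89.9],
    "RUS": [96.2, 64.2, 71.0, 62.0, 48.4, 56.0, 73.0, 63.8,
            33.5, 71.4, 57.4, 80.7, 46.0, 40.6, 50.4, 75.0],
    "CRO": [34.1, 58.7, 57.0, 57.0, 68.2, 56.0, 64.0, 57.9,
            42.8, 70.4, 74.3, 69.5, 57.0, 62.8, 65.9, 39.9],
}

summary = {}
for team, probs in r16_predictions.items():
    # Assumption: population standard deviation (pstdev); the chart may use the sample version.
    summary[team] = (statistics.mean(probs), statistics.pstdev(probs))
    print(f"{team}: mean = {summary[team][0]:.1f}, std = {summary[team][1]:.1f}")

# Team with the largest scatter among the rows included here.
widest = max(summary, key=lambda team: summary[team][1])
print("largest scatter:", widest)
```

Extending `r16_predictions` with the remaining rows of the table gives the full picture shown in the chart.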

### Comparison of predictions

**The question is: How should we compare the predictions for the whole tournament?**

All predictions are given in terms of probabilities, but there are different kinds of predictions. On the one hand, we have a group stage and a knockout stage. On the other hand, the predictions for the different groups also differ in kind, and comparing them is not as straightforward as one might think. Obviously, we have to define a single number that we compute from all the probability predictions. The easiest way is to use a mean. But which one? The commonly used arithmetic mean is not a suitable measure.

First, suppose we had unrealistically *predicted* that the 16 highest-seeded teams qualify with $p=100\%$ certainty. We would then have 13 correctly predicted (CP) teams and, with $p=0$ for the three unseeded qualifiers, an *arithmetic mean* of $P = 13\cdot 100/16 \approx 81$, which is much better than the *best* such value, $P=71.3$ for FMS. Second, we eventually want to compare the overall prediction, but the individual predictions are of a different nature for several reasons, which in itself rules out the *arithmetic mean* as a measure. One natural choice for quantities on different scales is the geometric mean:
$$
P := \left(\prod_{n=1}^N p_n\right)^{1/N}
$$
For the mentioned *non-prediction* this mean yields exactly $P=0$. The geometric mean gives more weight to small probabilities, which is reasonable: predicting that Brazil reaches R16 with $p=80$ should be worth relatively less than predicting it for Japan with $p=30$. The drawback is that a number $P$ calculated this way is no longer a *probability*; for this purpose we call it the **prediction score**, the bigger the better. Other means, e.g. the harmonic mean, are also possible, but they have other drawbacks, which we do not discuss here. In the following we stick to the prediction score $P$. Let us start with the group stage.
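As a small illustration of the difference between the two means, the sketch below evaluates both for the FMS column of the R16 table and for the hypothetical *non-prediction* (100 for the 13 seeded teams that qualified, 0 for the three unseeded qualifiers). The function names are illustrative and not taken from any of the sites.

```python
import math

def arithmetic_mean(probs):
    """Plain average of the predicted probabilities (in percent)."""
    return sum(probs) / len(probs)

def prediction_score(probs):
    """Geometric mean of the predicted probabilities (in percent)."""
    if any(p == 0 for p in probs):
        return 0.0  # a single zero forces the whole product to zero
    # Summing logarithms avoids overflow of the raw product.
    return math.exp(sum(math.log(p) for p in probs) / len(probs))

# FMS predictions for the 16 teams that reached R16, copied from the table above.
fms = [96.7, 91.9, 89.3, 87.9, 90.3, 95.5, 92.2, 78.7,
       78.6, 48.4, 68.2, 59.0, 49.8, 57.8, 42.2, 14.1]

# Hypothetical non-prediction: certainty for all seeded teams.
non_prediction = [100.0] * 13 + [0.0] * 3

for name, probs in [("FMS", fms), ("non-prediction", non_prediction)]:
    print(f"{name}: arithmetic = {arithmetic_mean(probs):.1f}, "
          f"score P = {prediction_score(probs):.1f}")
```

The arithmetic mean rewards the overconfident non-prediction, while the prediction score punishes every qualifier that was given no chance.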

#### Group stage

The graphic shows the prediction score $P$ after R16 for all sites and for comparison the *arithmetic mean* as colored points. Altogether we have $N=16$ predictions for each site.

The top of the ranking is dominated by sites that gave high probabilities to the favorite teams, which favors **FMS** and **ECO** in particular. The *best* prediction is given by **FTE**, with a noticeable gap to the *second best* prediction by **ITM**. The only surprise is the last place of **NOB**, which might be related to a wrong assessment of Russia, Switzerland and Sweden. Otherwise the group stage is rather boring because the favorite teams dominated, and therefore the gap between the *best* and *worst* prediction is rather small (17%). Let us come to the more interesting knockout stage.
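The group-stage ranking can be recomputed from the R16 table. The sketch below parses a plain-text copy of that table and ranks the sites by their prediction score; the variable names and the parsing approach are illustrative choices, not the author's code.

```python
import math

SITES = "ECO EEE FBD EFP FMS FNU FTE FUP GMS GOI GOP ITM KIF NOB SUD UBS".split()

# R16 probabilities in percent, copied from the table above (one row per qualified team).
R16_TABLE = """
BRA 96.4 89.9 96.0 90.0 96.7 92.0 89.0 89.9 87.5 77.0 88.1 90.3 94.0 82.3 83.5 89.9
ESP 87.5 85.9 89.0 81.0 91.9 82.0 87.0 75.2 79.7 85.5 82.0 83.4 84.0 72.3 88.4 87.1
FRA 85.0 87.0 81.0 77.0 89.3 78.0 86.0 75.3 81.4 77.6 89.7 78.3 76.0 77.2 85.5 84.7
ARG 88.8 78.7 88.0 80.0 87.9 84.0 77.0 90.7 73.1 69.2 80.1 87.3 83.0 73.9 81.6 81.1
BEL 79.0 81.7 86.0 77.0 90.3 84.0 87.0 72.1 78.5 86.2 82.2 74.0 79.0 77.1 86.3 82.0
ENG 93.2 75.6 87.0 80.0 95.5 76.0 85.0 77.1 72.3 72.8 80.2 81.7 82.0 70.9 79.8 87.3
URU 44.2 68.1 89.0 77.0 92.2 83.0 76.0 70.8 74.4 49.9 68.0 74.8 87.0 64.7 86.6 76.8
COL 82.5 64.6 84.0 75.0 78.7 75.0 69.0 84.6 74.9 41.4 72.3 77.2 79.0 52.6 79.2 56.1
POR 64.4 66.3 78.0 69.0 78.6 79.0 68.0 66.0 75.2 54.6 58.2 61.3 72.0 62.3 67.5 63.4
RUS 96.2 64.2 71.0 62.0 48.4 56.0 73.0 63.8 33.5 71.4 57.4 80.7 46.0 40.6 50.4 75.0
CRO 34.1 58.7 57.0 57.0 68.2 56.0 64.0 57.9 42.8 70.4 74.3 69.5 57.0 62.8 65.9 39.9
SUI 71.1 45.4 59.0 52.0 59.0 62.0 53.0 44.8 47.9 47.4 44.5 55.8 55.0 37.3 58.9 59.3
MEX 51.5 45.2 55.0 56.0 49.8 58.0 46.0 55.4 47.8 40.6 50.4 45.4 52.0 58.2 41.5 53.8
DEN 43.6 46.7 42.0 38.0 57.8 42.0 56.0 27.4 52.0 69.1 53.0 47.1 43.0 53.0 59.0 40.2
SWE 52.7 44.5 31.0 33.0 42.2 34.0 50.0 25.7 45.9 36.8 27.8 47.4 35.0 30.1 54.0 32.5
JAP 33.6 36.3 21.0 34.0 14.1 23.0 43.0 33.7 35.2 50.9 35.1 34.5 24.0 46.3 20.5 37.2
"""

def prediction_score(probs):
    """Geometric mean of percent probabilities, computed via logarithms."""
    return math.exp(sum(math.log(p) for p in probs) / len(probs))

# Each row is a team label followed by one value per site, in SITES order.
rows = [line.split() for line in R16_TABLE.strip().splitlines()]
columns = {site: [float(row[i + 1]) for row in rows] for i, site in enumerate(SITES)}

scores = {site: prediction_score(col) for site, col in columns.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for rank, site in enumerate(ranking, start=1):
    print(f"{rank:2d}. {site}  P = {scores[site]:.1f}")
```

The same `prediction_score` function can be applied to the 15 knockout-stage values of each site.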

#### Knockout stage

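The table below shows the teams that advanced in each round of the knockout stage, together with the predicted probabilities. Furthermore, we have counted the number of correctly predicted qualified teams for every round by ordering the probabilities. The final **#CP** row gives the total over all knockout rounds, including the world champion.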

Team | ECO | EEE | FBD | EFP | FMS | FNU | FTE | FUP | GMS | GOI | GOP | ITM | KIF | NOB | SUD | UBS |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
R16 | | | | | | | | | | | | | | | | |
FRA | 58.4 | 56.5 | 52.9 | 46.0 | 62.3 | 47.0 | 53.0 | 45.6 | 58.4 | 43.9 | 55.9 | 50.1 | 51.0 | 49.6 | 56.1 | 59.5 |
URU | 17.2 | 32.1 | 36.0 | 36.0 | 37.1 | 36.0 | 29.0 | 27.3 | 34.6 | 20.6 | 34.5 | 32.9 | 37.0 | 29.6 | 37.5 | 32.0 |
RUS | 74.7 | 28.9 | 18.2 | 23.0 | 9.0 | 16.0 | 26.0 | 21.5 | 15.4 | 37.4 | 26.1 | 40.7 | 8.0 | 13.8 | 10.5 | 30.5 |
CRO | 8.7 | 29.2 | 22.4 | 27.0 | 27.2 | 25.0 | 35.0 | 23.8 | 15.9 | 41.0 | 43.6 | 32.4 | 23.0 | 30.3 | 30.8 | 15.0 |
BRA | 67.8 | 61.2 | 74.9 | 61.0 | 75.8 | 62.0 | 66.0 | 58.3 | 60.8 | 43.1 | 62.7 | 64.7 | 74.0 | 50.3 | 51.6 | 60.5 |
BEL | 37.3 | 53.6 | 52.8 | 44.0 | 58.2 | 54.0 | 59.0 | 37.2 | 51.1 | 58.7 | 54.5 | 42.4 | 50.0 | 49.3 | 64.5 | 56.9 |
SWE | 19.4 | 16.1 | 6.9 | 12.0 | 9.4 | 11.0 | 17.0 | 8.6 | 19.4 | 16.1 | 8.7 | 18.1 | 8.0 | 11.7 | 21.7 | 9.9 |
ENG | 58.9 | 46.4 | 54.9 | 42.0 | 67.4 | 39.0 | 58.0 | 43.0 | 50.1 | 43.1 | 51.6 | 49.7 | 53.0 | 42.9 | 57.0 | 66.2 |
#CP | 4 | 4 | 4 | 3 | 4 | 3 | 4 | 4 | 4 | 5 | 5 | 4 | 4 | 4 | 4 | 4 |
QF | | | | | | | | | | | | | | | | |
FRA | 33.8 | 36.3 | 29.5 | 27.0 | 38.0 | 27.0 | 31.0 | 28.5 | 36.6 | 24.1 | 33.4 | 29.3 | 30.0 | 32.2 | 36.9 | 35.1 |
BEL | 15.2 | 27.5 | 19.1 | 19.0 | 24.1 | 25.0 | 27.0 | 14.0 | 27.7 | 32.9 | 29.6 | 19.1 | 19.0 | 27.8 | 35.7 | 23.8 |
ENG | 33.0 | 22.0 | 20.5 | 17.0 | 30.4 | 14.0 | 27.0 | 17.5 | 28.8 | 19.8 | 27.1 | 23.0 | 21.0 | 22.6 | 29.8 | 31.4 |
CRO | 2.3 | 14.2 | 8.2 | 13.0 | 10.1 | 11.0 | 16.0 | 11.2 | 6.1 | 21.9 | 23.0 | 14.0 | 9.0 | 16.0 | 15.6 | 4.4 |
#CP | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 2 | 2 | 1 |
SF | | | | | | | | | | | | | | | | |
FRA | 18.9 | 21.3 | 13.3 | 14.0 | 18.2 | 13.0 | 16.0 | 15.4 | 19.9 | 12.1 | 18.4 | 15.7 | 17.0 | 19.5 | 20.8 | 23.4 |
CRO | 0.5 | 6.3 | 2.5 | 6.0 | 3.2 | 4.0 | 7.0 | 4.5 | 2.0 | 11.0 | 12.1 | 5.6 | 3.0 | 7.8 | 6.0 | 1.1 |
#CP | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
F | | | | | | | | | | | | | | | | |
FRA | 9.7 | 12.3 | 6.1 | 7.0 | 9.1 | 7.0 | 8.0 | 7.8 | 11.3 | 5.9 | 10.0 | 8.4 | 7.0 | 11.7 | 11.2 | 7.3 |
#CP | 5 | 5 | 4 | 3 | 5 | 3 | 5 | 4 | 6 | 6 | 6 | 4 | 4 | 8 | 6 | 5 |

Only **NOB** predicted the correct World Champion, and they also have the highest number of correct predictions in the knockout stage! But let us turn to the more interesting prediction score $P$ for the knockout stage, which is shown in the chart below.

As in the group stage, the *best* prediction, by **GOP**, has a noticeable gap to the *second best* prediction, by **SUD**. The distance between best and worst is more than 54%. The largest discrepancy between group and knockout stage is that of **ECO**, which drops from 4th to 16th place. Only **FTE** and **ITM** are in the top five in both stages. Furthermore, all Elo/FIFA-based models are placed in the lower part of the ranking. Let us combine group and knockout stage and go to the overall ranking of all predictions.

#### Summary of prediction

### Conclusion

It is hard to judge which prediction model is the best. First of all, the amount of data is rather small: we have 16 predictions for the group stage and 15 predictions for the knockout stage, altogether $N=31$. Second, there is no single criterion for *What is best?* If we chose the correct prediction of the World Champion, then **NOB** would be the best. But this criterion is too coarse. A bit better might be the number of correct predictions of qualification (**CP**). In this case, the best predictions are those of **GOP**, **NOB** and **SUD** (CP=20). But keep in mind that a single match decided by a single penalty, like Spain against Russia, could have turned the whole ranking around. Therefore the number of correct predictions is not suitable either; it is just a side note. If we use the prediction score, every prediction is taken into account. But even in this case the decision might not be that clear. The group stage is significantly different from the knockout stage.

To give a rough idea of the significance of the result, let us assume that the statistical error of every single prediction is 1% (a rough estimate). With the usual error propagation one can derive an error estimate for $P$. This error is shown in the above chart by grey error bars. As can be seen, the first three predictions are equal within these error bars, and fourth and fifth place are not far behind.
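The exact error calculation is not spelled out here, so the following is only one plausible reading. Assuming independent errors of equal size $\sigma$ on every $p_n$ (taken as 1 percentage point, which is an assumption about what "1%" means), first-order error propagation for the geometric mean gives $\sigma_P = \frac{P}{N}\sqrt{\sum_n (\sigma/p_n)^2}$. The function name `score_error` is hypothetical.

```python
import math

def prediction_score(probs):
    """Geometric mean of percent probabilities."""
    return math.exp(sum(math.log(p) for p in probs) / len(probs))

def score_error(probs, sigma=1.0):
    """First-order error of the prediction score.

    Assumes independent errors of size `sigma` (percentage points) on every p_n.
    Since dP/dp_n = P / (N * p_n), the variances add up to
    (P / N)^2 * sum((sigma / p_n)^2).
    """
    n = len(probs)
    score = prediction_score(probs)
    return score / n * math.sqrt(sum((sigma / p) ** 2 for p in probs))

# FTE group-stage predictions, copied from the R16 table.
fte_r16 = [89.0, 87.0, 86.0, 77.0, 87.0, 85.0, 76.0, 69.0,
           68.0, 73.0, 64.0, 53.0, 46.0, 56.0, 50.0, 43.0]

print(f"P = {prediction_score(fte_r16):.1f} +/- {score_error(fte_r16):.2f}")
```

Under these assumptions the smallest probabilities contribute most to the error, because each term scales with $1/p_n$.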

Anyway, let us draw a conclusion while keeping the above reasoning in mind. The *best* prediction was made by **FTE**; **GOP** and **SUD** follow with a small gap as the *second best*.

## Discussion

Thank you for this work, I was always on the lookout for someone to keep predictions accountable ;)