wm18:pgwm18

Differences

This shows the differences between two versions of the page.

wm18:pgwm18 [2018/07/15 20:43] – [Conclusion] admin
wm18:pgwm18 [2018/07/15 20:56] (current) – [Conclusion] admin
Line 1: Line 1:
 ===== Predictions for World Cup 2018 =====
-Here we compare predictions for the outcome of the World Cup 2018 tournament. We only include those predictions with a detailed probability distribution in percent for //the round of the last 16, quarter-final, semifinal, final// and the //world champion//. For those predictions we make a detailed comparison of the quality of the predictions and a comparison of the different methods. First we present the sites which have made predictions and briefly summarize their method.
+Here we compare predictions for the outcome of the World Cup 2018 tournament. We only include those predictions with a detailed probability distribution in percent for //the round of the last 16, quarter-final, semifinal, final// and the //world champion//. For those predictions we make a detailed comparison of the quality of the predictions and a comparison of the different methods. First we present the sites that have made predictions and briefly summarize their methods.
    
 ==== Sites / Institutes ====
Line 20: Line 20:
   * **SUD =** [[https://arxiv.org/pdf/1806.03208.pdf|Statistics Faculty, Technische Universität Dortmund]] Random forest methods with 100000 simulations, based on rankings of different kinds (economics, betting, FIFA, home advantage, confederation, team structure, coach) prior to the last four World Cups.
   * **UBS =** [[https://www.ubs.com/de/de/wealth-management/house-view/special-edition.html?intCampID=INTERNAL-HPPROMOTEASER-de_ubs_house_view_world_cup-de|UBS]] Predictions based on economic models with data based on the last 5 tournaments, the home advantage of Russia and the ELO ranking.
- 
-/* 
-  * **OPT =** [[https://goo.gl/GTLFgK|Opta Sports]] Computing the win probability of every possible matchup and simulating the tournament. Unfortunately, they have used a //different// normalization, so their data cannot be compared to the others. Therefore, those data were not included in the comparison.
-*/ 
  
 ==== Predictions for R16 ====
Line 46: Line 42:
 ^JAP | 33.6 | 36.3 | 21.0 | 34.0 | 14.1 | 23.0 | 43.0 | 33.7 | 35.2 ^ 50.9 | 35.1 | 34.5 | 24.0 | 46.3 | 20.5 | 37.2|
 ^#CP ^  11  ^  14  ^  13  ^  13  ^  14  ^  14  ^  14  ^  13  ^  12  ^  13  ^  14  ^  13  ^  13 ^   12  ^  14  ^  12 |
- 
  
 The chart below shows the averaged prediction for reaching the round of the last 16, together with the individual predictions of the sites (colored points). The red error bars are the standard deviation of all predictions. The teams are sorted in descending order of the [[https://en.wikipedia.org/wiki/Arithmetic_mean|arithmetic mean]] (black numbers). The colored bars and numbers are the corresponding averages for qualification for the knockout stages.
Line 58: Line 53:
 **The question is: //How should we compare the predictions for the whole tournament?//**
  
-All predictions are given in terms of probabilities, but there are different kinds of predictions. On the one hand, we have a group stage and a knockout stage. On the other hand, the predictions for different groups also differ in kind, and comparing them is not as straightforward as one might think. Obviously, we have to define a single number that we derive from all the probability predictions. The easiest way is to use a [[https://en.wikipedia.org/wiki/Mean|mean]]. But which one? The commonly used [[https://en.wikipedia.org/wiki/Arithmetic_mean|arithmetic mean]] is not a suitable measure for several reasons.
+All predictions are given in terms of probabilities, but there are different kinds of predictions. On the one hand, we have a group stage and a knockout stage. On the other hand, the predictions for different groups also differ in kind, and comparing them is not as straightforward as one might think. Obviously, we have to define a single number that we derive from all the probability predictions. The easiest way is to use a [[https://en.wikipedia.org/wiki/Mean|mean]]. But which one? The commonly used [[https://en.wikipedia.org/wiki/Arithmetic_mean|arithmetic mean]] is not a suitable measure.
  
-First, suppose we had unrealistically //predicted// that the 16 highest-seeded teams would qualify with $p=100$ certainty; then we would have had 13 correctly predicted (CP) teams with an //arithmetic mean// value of $P=75$, which is much better than the //best// such prediction of FMS with $P=71.3$. Second, eventually we want to compare the overall prediction, but as mentioned above the predictions are for several reasons of a different nature, which by itself rules out the //arithmetic mean// as a measure. One natural choice for different kinds of scales is the [[https://en.wikipedia.org/wiki/Geometric_mean|geometric mean]]:
+First, suppose we had unrealistically //predicted// that the 16 highest-seeded teams would qualify with $p=100$ certainty; then we would have had 13 correctly predicted (CP) teams with an //arithmetic mean// value of $P=75$, which is much better than the //best// such prediction of FMS with $P=71.3$. Second, eventually we want to compare the overall prediction, but the predictions are for several reasons of a different nature, which by itself rules out the //arithmetic mean// as a measure. One natural choice for different kinds of scales is the [[https://en.wikipedia.org/wiki/Geometric_mean|geometric mean]]:
 $$
   P := \left(\prod_{n=1}^N p_n\right)^{1/N}
Line 73: Line 68:
 {{wm18:predscore_v.png?640}}
  
-The ranking is highly dominated by those sites with high scores for favorite teams, which in particular favors **FMS** and **ECO**. The //best// prediction is given by **FTE** with a noticeable gap to the //second best// prediction of **ITM**. The only surprise is the last place of **NOB**, which might be related to a wrong assessment of Russia, Switzerland and Sweden. Otherwise the group stage is kind of boring due to the domination of the favorite teams, and therefore the variation between the //best// and //worst// prediction is rather small (17%). Let us turn to the more interesting knockout stage.
+The top ranking is highly dominated by those sites with high scores for favorite teams, which in particular favors **FMS** and **ECO**. The //best// prediction is given by **FTE** with a noticeable gap to the //second best// prediction of **ITM**. The only surprise is the last place of **NOB**, which might be related to a wrong assessment of Russia, Switzerland and Sweden. Otherwise the group stage is kind of boring due to the domination of the favorite teams, and therefore the variation between the //best// and //worst// prediction is rather small (17%). Let us turn to the more interesting knockout stage.
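
To illustrate why the geometric mean behaves differently from the arithmetic mean as a prediction score, here is a minimal sketch. It assumes that each $p_n$ is the probability (in percent) a site assigned to an outcome that actually happened; the two lists of numbers are hypothetical and are not taken from any of the sites compared here.

```python
import math

def arithmetic_mean(probs):
    """Plain average of the predicted probabilities (in percent)."""
    return sum(probs) / len(probs)

def geometric_mean(probs):
    """Geometric mean P = (prod p_n)^(1/N), computed via logarithms."""
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if any(p == 0 for p in probs):
        # A single outcome predicted with probability 0 drives the product to 0.
        return 0.0
    return math.exp(sum(math.log(p) for p in probs) / len(probs))

# Hypothetical forecasts for four outcomes that actually occurred.
overconfident = [100.0, 100.0, 100.0, 0.0]  # all-or-nothing, one miss
hedged = [70.0, 70.0, 70.0, 40.0]           # moderate probabilities

for name, probs in [("overconfident", overconfident), ("hedged", hedged)]:
    print(name, round(arithmetic_mean(probs), 2), round(geometric_mean(probs), 2))
```

In this toy case the arithmetic mean prefers the all-or-nothing forecast, while the geometric mean ranks it last because one realized outcome was given zero probability.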
  
 \\
Line 106: Line 101:
 ^#CP ^    ^    ^    ^    ^    ^    ^    ^    ^    ^      6  ^    ^    ^    ^     ^ 5  ^
  
-Only **NOB** has predicted the correct World Champion and, furthermore, they have the highest number of correct prediction in the knockout stage! But let us turn to the more interesting prediction score $P$ for the knockout stage, which is shown in the chart below.
+Only **NOB** has predicted the correct World Champion and, furthermore, they have the highest number of correct predictions in the knockout stage! But let us turn to the more interesting prediction score $P$ for the knockout stage, which is shown in the chart below.
  
 {{wm18:predscore_ko.png?680}}
  
-As in the case of the group stage, the //best// prediction by **GOP** has a noticeable gap to the //second best// prediction of **SUD**. The largest discrepancy between group and knockout stage is given by **ECO**, which drops from 4th to 16th place. Only **FTE** and **ITM** are in the top five in both stages. Furthermore, all ELO/FIFA-based models are placed in the lower part of the ranking. Let us combine group and knockout stage and go to the overall ranking of all predictions.
+As in the case of the group stage, the //best// prediction by **GOP** has a noticeable gap to the //second best// prediction of **SUD**. The distance between best and worst is more than 54%. The largest discrepancy between group and knockout stage is given by **ECO**, which drops from 4th to 16th place. Only **FTE** and **ITM** are in the top five in both stages. Furthermore, all ELO/FIFA-based models are placed in the lower part of the ranking. Let us combine group and knockout stage and go to the overall ranking of all predictions.
  
 \\
Line 123: Line 118:
  
 ==== Conclusion ====
-It is hard to judge which is the best prediction model. First of all, the number of data points is rather small. We have 16 predictions for the group stage and 15 predictions for the knockout stage, altogether $N=31$. Second, there is not a single criterion for //What is best?// If we choose the correct prediction of the World Champion, then **NOB** is the best. But this criterion is too coarse. A bit better might be the number of correct predictions of qualification (**CP**). In this case, the best prediction is made by **GOP,NOB** and **SUD** (CP=20). But keep in mind, a single match that goes differently by a single penalty, like Spain against Russia, can turn the whole ranking. Therefore, the number of correct predictions is not suitable either; it is just a side note. If we use the prediction score, every prediction is taken into account. But even in this case, the decision might not be as clear as one might think. The group stage is significantly different from the knockout stage.
+It is hard to judge which is the best prediction model. First of all, the number of data points is rather small. We have 16 predictions for the group stage and 15 predictions for the knockout stage, altogether $N=31$. Second, there is not a single criterion for //What is best?// If we choose the correct prediction of the World Champion, then **NOB** is the best. But this criterion is too coarse. A bit better might be the number of correct predictions of qualification (**CP**). In this case, the best prediction is made by **GOP, NOB** and **SUD** (CP=20). But keep in mind, a single match that would have been different by a single penalty, like Spain against Russia, could turn the whole ranking. Therefore, the number of correct predictions is not suitable either; it is just a side note. If we use the prediction score, every prediction is taken into account. But even in this case, the decision might not be as clear. The group stage is significantly different from the knockout stage.
  
 To give a rough idea of the significance of the result, let us assume that for every single prediction the statistical error is 1% (rough estimate). With the usual error propagation one can deduce an error estimate for $P$. This error is shown in the above chart by grey error bars. As can be seen, the first three predictions are equal within these error bars, and fourth and fifth place are not that far away.
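
The error estimate for $P$ can be sketched as follows. This is only an illustration under stated assumptions, not necessarily the calculation behind the grey error bars: the 1% is read as an absolute error of one percentage point on each $p_n$, the errors are treated as independent, and first-order propagation through $\ln P = \frac{1}{N}\sum_n \ln p_n$ is used, which gives $\sigma_P = \frac{P}{N}\sqrt{\sum_n (\sigma_n/p_n)^2}$. The function names and the sample numbers are hypothetical.

```python
import math

def geometric_mean(probs):
    """Prediction score P = (prod p_n)^(1/N); all p_n must be positive."""
    return math.exp(sum(math.log(p) for p in probs) / len(probs))

def geometric_mean_error(probs, sigma=1.0):
    """First-order error of P, assuming each p_n carries the same
    independent absolute error `sigma` (in percentage points)."""
    n = len(probs)
    # Relative error of P: (1/N) * sqrt(sum over n of (sigma / p_n)^2)
    relative = math.sqrt(sum((sigma / p) ** 2 for p in probs)) / n
    return geometric_mean(probs) * relative

# Hypothetical example: two predictions of 50% each.
probs = [50.0, 50.0]
print(round(geometric_mean(probs), 2), round(geometric_mean_error(probs), 3))
```

Under these assumptions small probabilities dominate the error, because each term enters as $\sigma_n/p_n$.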
  • Last modified: 2018/07/15 20:56
  • by admin