First of all, the sample is rather small. We have 16 predictions for the group stage and 15 predictions for the knockout stage, altogether $N=31$. Second, there is no single criterion for //what is best//. If we take the correct prediction of the World Champion as the criterion, then **NOB** is the best. But this criterion is too coarse. A somewhat better one might be the number of correct predictions of qualification (**CP**). In this case, the best predictions are those of **GOP**, **NOB** and **SUD** (CP=20). But keep in mind that a single match that could have gone the other way because of a single penalty, like Spain against Russia, could overturn the whole ranking. Therefore, the number of correct predictions is not suitable either; it is just a side note. If we use the prediction score, every prediction is taken into account. But even in this case, the decision might not be clear-cut. The group stage is significantly different from the knockout stage. To give a rough idea of the significance of the result, let us assume that the statistical error of every single prediction is 1% (a rough estimate). With standard error propagation one can derive an error estimate for $P$. This error is shown in the chart above by grey error bars. As can be seen, the first three predictions are equal within these error bars, and the fourth and fifth places are not far behind.
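
The error propagation mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exact definition of $P$ is not repeated in this section, so the sketch assumes that $P$ is either the sum or the mean of the $N=31$ single-prediction scores, that the errors of the single predictions are independent, and that "1%" means an absolute error of 0.01 per prediction. The function names `propagated_error` and `equal_within_errors` are hypothetical helpers, not part of the original evaluation.

```python
import math


def propagated_error(n_predictions: int, sigma_single: float = 0.01,
                     aggregate: str = "mean") -> float:
    """Gaussian error propagation for N independent, equally uncertain terms.

    Assumption: the score P is either the sum or the mean of the
    single-prediction scores; the real definition of P may differ.
    """
    if n_predictions <= 0:
        raise ValueError("n_predictions must be positive")
    if aggregate == "sum":
        # Independent errors add in quadrature: sigma * sqrt(N)
        return sigma_single * math.sqrt(n_predictions)
    if aggregate == "mean":
        # Dividing the sum by N scales its error by 1/N: sigma / sqrt(N)
        return sigma_single / math.sqrt(n_predictions)
    raise ValueError("aggregate must be 'sum' or 'mean'")


def equal_within_errors(p_a: float, p_b: float, sigma_p: float) -> bool:
    """Treat two scores as equal if their error bars (+/- sigma_p) overlap."""
    return abs(p_a - p_b) <= 2.0 * sigma_p


if __name__ == "__main__":
    n = 31  # 16 group-stage + 15 knockout-stage predictions
    print("error of P as a mean:", propagated_error(n, aggregate="mean"))
    print("error of P as a sum: ", propagated_error(n, aggregate="sum"))
```

With $N=31$ and a single-prediction error of 0.01, this gives an error of roughly 0.0018 if $P$ is a mean and roughly 0.056 if $P$ is a sum. The overlap test in `equal_within_errors` is one simple way to express "equal within the error bars"; a stricter test would compare the difference of two scores against its own propagated error.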
