Andreas Heuer, Oliver Rubner.
Abstract
We present a systematic approach to prediction based on panel data, that is, on information about different interacting subjects at different times (here: two). The corresponding bivariate regression problem can be solved analytically for the final statistical estimation error. This expression simplifies in the special case that the subjects do not change their properties between the last measurement and the prediction period. The statistical framework is applied to the prediction of soccer matches, based on information from the previous and the present season. We determine how well the outcome of soccer matches can be predicted in theory and compare this optimum limit with the actual quality of the prediction, taking the German premier league as an example. A key step in the actual prediction process is to identify appropriate observables that reflect the strength of the individual teams as closely as possible. A criterion to distinguish between different observables is presented. Surprisingly, chances for goals turn out to be much better suited than the goals themselves to characterize the strength of a team. Routes towards further improvement of the prediction are indicated. Finally, two specific applications are discussed.
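The bivariate regression mentioned in the abstract can be illustrated with a minimal sketch. It assumes a linear model without intercept, `y = b * x_prev + c * x_pres`, fitted by ordinary least squares; the function names, the synthetic data, and the noise levels are hypothetical and are not taken from the paper.

```python
import random


def fit_two_predictors(x_prev, x_pres, y):
    """Least-squares fit of y = b * x_prev + c * x_pres (no intercept).

    Solves the 2x2 normal equations with Cramer's rule.
    """
    s11 = sum(a * a for a in x_prev)
    s22 = sum(a * a for a in x_pres)
    s12 = sum(a * b for a, b in zip(x_prev, x_pres))
    s1y = sum(a * t for a, t in zip(x_prev, y))
    s2y = sum(a * t for a, t in zip(x_pres, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are collinear")
    b = (s1y * s22 - s12 * s2y) / det
    c = (s11 * s2y - s12 * s1y) / det
    return b, c


def simulate_panel(n_teams=18, seed=0):
    """Hypothetical panel: one latent strength per team, observed twice with noise."""
    rng = random.Random(seed)
    strength = [rng.gauss(0.0, 1.0) for _ in range(n_teams)]
    x_prev = [s + rng.gauss(0.0, 0.5) for s in strength]  # previous season (assumed noise)
    x_pres = [s + rng.gauss(0.0, 0.8) for s in strength]  # present season so far (assumed noise)
    y = [s + rng.gauss(0.0, 0.8) for s in strength]       # outcome to be predicted
    return x_prev, x_pres, y


if __name__ == "__main__":
    b, c = fit_two_predictors(*simulate_panel())
    print(f"weight of previous season: {b:.2f}, weight of present season: {c:.2f}")
```

In this toy setup the two fitted weights express how much each of the two measurement times contributes to the prediction; the analytical error expression of the paper is not reproduced here.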
Year: 2014 PMID: 25198501 PMCID: PMC4157759 DOI: 10.1371/journal.pone.0104647
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Schematic representation of the general prediction setup.
Figure 2. The efficiency factors as a function of the differences of the chances for goals.
Figure 3. The variance of the distribution of scoring efficiencies as a function of the number of match days.
The different systematic and random contributions of the observables, relevant for this work.
| 2.32 | 14.1 |
| 2.66 | 14.2 |
| 2.95 |
Figure 4. The prediction quality of the team strength as a function of the number of match days.
Different choices of variables are shown. For the second and third cases, the information from the previous season is neglected. The solid lines are based on the explicit formulas for the prediction quality.
The two regression parameters as a function of the number of match days.
| Match days | First parameter | Second parameter |
| 0 | 3.71 | 0 |
| 4 | 3.20 | 0.82 |
| 8 | 2.60 | 1.70 |
| 12 | 2.23 | 2.30 |
| 17 | 1.86 | 2.77 |
The predictions of the goal difference in the second half of the 2007/08 Bundesliga season for each team, based on the differences of chances for goals of the previous season (third data column) or, additionally, on the differences of chances for goals of the first 17 matches of the present season (fourth data column).
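| Team | Actual, first half | Actual, second half | Predicted from previous season | Predicted from previous season and first 17 matches | Predicted, plus market value |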
| B. München | 23 | 24 | 10 | 21 | 23 |
| Bremen | 18 | 12 | 11 | 15 | 14 |
| Hamburg | 11 | 10 | 3 | 9 | 10 |
| Leverkusen | 16 | 1 | 2 | 9 | 8 |
| Schalke | 9 | 14 | 8 | 12 | 11 |
| Karlsruhe | −2 | −13 | −8 | −6 | −7 |
| Hannover | −1 | −1 | 3 | −1 | −2 |
| Stuttgart | −1 | 1 | 9 | 5 | 6 |
| Frankfurt | −4 | −3 | 2 | −3 | −4 |
| Dortmund | −4 | −8 | 0 | 0 | 2 |
| Wolfsburg | 0 | 12 | −4 | −5 | −2 |
| Hertha | −5 | 0 | −5 | −8 | −5 |
| Bochum | −2 | −4 | −1 | −4 | −7 |
| Bielefeld | −19 | −6 | −6 | −11 | −10 |
| Rostock | −10 | −12 | −8 | −11 | −13 |
| Nürnberg | −7 | −9 | 1 | 1 | 1 |
| Cottbus | −10 | −11 | −8 | −10 | −12 |
| Duisburg | −12 | −7 | −8 | −13 | −12 |
The estimate in the final column additionally uses information about the market value. The first two data columns contain the actual goal differences of the first and the second half of that season, respectively.
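As a worked check on the table above, the following sketch computes the root-mean-square deviation between each prediction column and the actual second-half goal difference. The choice of RMSE as the error measure is an assumption made for illustration and is not necessarily the measure used in the paper; the numbers are transcribed from the table rows.

```python
import math

# (team, actual first half, actual second half,
#  predicted from previous season,
#  predicted from previous season and first 17 matches,
#  predicted additionally using market value)
TABLE = [
    ("B. München", 23, 24, 10, 21, 23),
    ("Bremen", 18, 12, 11, 15, 14),
    ("Hamburg", 11, 10, 3, 9, 10),
    ("Leverkusen", 16, 1, 2, 9, 8),
    ("Schalke", 9, 14, 8, 12, 11),
    ("Karlsruhe", -2, -13, -8, -6, -7),
    ("Hannover", -1, -1, 3, -1, -2),
    ("Stuttgart", -1, 1, 9, 5, 6),
    ("Frankfurt", -4, -3, 2, -3, -4),
    ("Dortmund", -4, -8, 0, 0, 2),
    ("Wolfsburg", 0, 12, -4, -5, -2),
    ("Hertha", -5, 0, -5, -8, -5),
    ("Bochum", -2, -4, -1, -4, -7),
    ("Bielefeld", -19, -6, -6, -11, -10),
    ("Rostock", -10, -12, -8, -11, -13),
    ("Nürnberg", -7, -9, 1, 1, 1),
    ("Cottbus", -10, -11, -8, -10, -12),
    ("Duisburg", -12, -7, -8, -13, -12),
]


def rmse(predicted, actual):
    """Root-mean-square deviation between two equally long sequences."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


actual_second_half = [row[2] for row in TABLE]
errors = {
    "previous season": rmse([row[3] for row in TABLE], actual_second_half),
    "previous season + first 17 matches": rmse([row[4] for row in TABLE], actual_second_half),
    "plus market value": rmse([row[5] for row in TABLE], actual_second_half),
}

if __name__ == "__main__":
    for label, value in errors.items():
        print(f"{label}: {value:.2f} goals")
```

A single half-season with 18 teams is a small sample, so such a comparison is only indicative.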
The K-value for the regression model during the seasons 2002/03 and 2006/07 as well as for the Oddset-odds.
| | first 10 matches of season | all 34 matches |
| Only home advantage | 1.073 | 1.057 |
| + matches of present season | 1.054 | 1.013 |
| + matches of previous season | 1.027 | 1.004 |
| + market value | 1.019 | 1.000 |
| Oddset | 1.025 | 1.012 |
| Difference | 0.006 | 0.012 |
The impact of adding additional information to the model is listed.
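The arithmetic behind the table can be retraced with a short sketch. It assumes that the "Difference" row is the Oddset value minus the value of the full model (the "+ market value" row), which is consistent with the listed numbers; the variable and function names are hypothetical.

```python
# K-values transcribed from the table: (label, first 10 matches, all 34 matches).
K_VALUES = [
    ("Only home advantage", 1.073, 1.057),
    ("+ matches of present season", 1.054, 1.013),
    ("+ matches of previous season", 1.027, 1.004),
    ("+ market value", 1.019, 1.000),
]
ODDSET = (1.025, 1.012)


def stepwise_changes(rows):
    """Change in K per column when each additional piece of information is added."""
    return [
        (later[0], round(later[1] - earlier[1], 3), round(later[2] - earlier[2], 3))
        for earlier, later in zip(rows, rows[1:])
    ]


# Assumed definition of the "Difference" row: Oddset minus the full model.
full_model = K_VALUES[-1][1:]
difference = tuple(round(o - m, 3) for o, m in zip(ODDSET, full_model))

if __name__ == "__main__":
    for label, d10, d34 in stepwise_changes(K_VALUES):
        print(f"{label}: {d10:+.3f} (first 10 matches), {d34:+.3f} (all 34 matches)")
    print("Oddset minus full model:", difference)
```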
Figure 5. The uncertainty of the prediction of the goal difference of the second half when using the complete information of the first half.
Different choices of variables are shown. Furthermore, the limit of perfect predictability is indicated.