Wendi Yao, Yifan Wang, Mengyao Zhu, Yixin Cao, Dan Zeng.
Abstract
Because the soccer market is so large, soccer analysis has attracted considerable attention from industry and academia. In-game outcome prediction has great potential in applications such as game broadcasting, tactical decision making, and betting. In some sports, directly predicting in-game outcomes from the ongoing game state is already used as a statistical tool. Soccer, however, is a low-scoring sport with frequent draws, which makes in-game prediction challenging, and most existing studies focus on pre-game prediction instead. This paper proposes a two-stage method for soccer in-game outcome prediction, named in-game soccer outcome prediction (IGSOP). When the full length of a soccer game is divided into sufficiently small time frames, the goals scored by each team in each time frame can be modeled as a random variable following the Bernoulli distribution. In the first stage, IGSOP uses state-based machine learning to predict the probability of scoring a goal in each future time frame. In the second stage, IGSOP simulates the remainder of the game to estimate its outcome. This two-stage approach captures the dynamic situation after a goal and the uncertainty in the late phase of a game. Chinese Super League (CSL) data were used for training and evaluation, and the results show that IGSOP outperforms existing methods, especially in predicting draws and in predictions during the final moments of games. IGSOP offers a novel perspective on in-game outcome prediction in soccer, which may influence related research.
Keywords: Bernoulli distribution; in-game outcome prediction; machine learning; probability prediction; regression coefficients; soccer
Year: 2022 PMID: 35885194 PMCID: PMC9315984 DOI: 10.3390/e24070971
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.738
Figure 1. Distribution of the number of goals per time frame in the 2012–2019 CSL season data under different splits: (a) no division, (b) one game divided into 4 time frames, (c) one game divided into 40 time frames, and (d) one game divided into 200 time frames. The numbers on the horizontal axis represent the number of goals scored in a time frame.
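The splitting behind Figure 1 can be illustrated with a minimal sketch that counts goals per time frame for a single game. The function name, the 90-minute game length, and the goal times are hypothetical and are not taken from the paper or the CSL data.

```python
def goals_per_frame(goal_minutes, n_frames, game_length=90.0):
    """Count goals in each of n_frames equal-length time frames.

    goal_minutes: times (in minutes) at which one team scored.
    game_length: assumed regulation length; later goals fall in the last frame.
    """
    counts = [0] * n_frames
    frame_len = game_length / n_frames
    for minute in goal_minutes:
        idx = min(int(minute / frame_len), n_frames - 1)
        counts[idx] += 1
    return counts


# Hypothetical goal times for one team in one game.
goal_minutes = [12.5, 13.0, 47.2, 88.9]

for n in (1, 4, 40, 200):
    print(n, max(goals_per_frame(goal_minutes, n)))
# 1 4
# 4 2
# 40 2
# 200 1
```

In this toy game, only the finest split leaves at most one goal per frame, which is the condition under which a Bernoulli variable is a reasonable model for the per-frame goal count.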
Figure 2. Flowchart of in-game soccer outcome prediction (IGSOP). The upper part represents the training process of the team goal probability model, and the lower part describes the game outcome prediction process.
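The outcome prediction stage can be sketched as a Monte Carlo simulation over the remaining time frames. This is a simplified illustration, not the authors' implementation: it assumes the per-frame goal probabilities are given and fixed, whereas the abstract describes a state-based model whose predictions reflect the situation after a goal. The function and parameter names are hypothetical.

```python
import random


def simulate_outcome(p_home, p_away, score_home=0, score_away=0,
                     n_sims=10_000, seed=0):
    """Estimate (home win, draw, away win) probabilities by simulation.

    p_home, p_away: per-frame goal probabilities for the remaining frames
    (assumed fixed here; a state-based model would update them after a goal).
    score_home, score_away: current score when the simulation starts.
    """
    rng = random.Random(seed)
    counts = [0, 0, 0]  # home win, draw, away win
    for _ in range(n_sims):
        home, away = score_home, score_away
        for ph, pa in zip(p_home, p_away):
            # One Bernoulli trial per team per time frame.
            home += int(rng.random() < ph)
            away += int(rng.random() < pa)
        if home > away:
            counts[0] += 1
        elif home == away:
            counts[1] += 1
        else:
            counts[2] += 1
    return tuple(c / n_sims for c in counts)


# Example: 20 frames left, home team leads 1-0, small per-frame probabilities.
probs = simulate_outcome([0.008] * 20, [0.006] * 20, score_home=1, score_away=0)
print(probs)
```

As fewer frames remain, the simulated distribution concentrates on the current score, which matches the intuition that late-game predictions should be more certain.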
Pre-game features description.
| Home Features | Away Features | Differential Features |
|---|---|---|
| Home form | Away form | Form differential |
| Home streak | Away streak | Streak differential |
| Past 10 home shots | Past 10 away shots | Past 10 shots differential |
| Past 10 home goals | Past 10 away goals | Past 10 goals differential |
| Past 10 home corners | Past 10 away corners | Past 10 corners differential |
| Home attack rating | Away attack rating | Attack rating differential |
| Home defense rating | Away defense rating | Defense rating differential |
| Home streak rating | Away streak rating | Streak rating differential |
| Home goal difference | Away goal difference | Goal difference differential |
| Home weighted streak | Away weighted streak | Weighted streak differential |
Overview of 22 event types.
| Event Type | Description |
|---|---|
| Block | A player blocks a shot on target from an opposing player |
| Save the ball | A goalkeeper prevents the ball from entering the goal |
| Chance | A situation where a player should be expected to score |
| Clearance | A player kicks the ball away from his own goal |
| Cross | A ball played in from wide areas into the box |
| Dribble | A player attempts to beat an opponent when he is in possession |
| Drop of ball | A goalkeeper tries to catch the ball, but drops it from his grasp |
| Penalty | A foul resulting in a free-kick, a penalty, or a player being sent off |
| Hold of ball | A goalkeeper holds the ball in his hands |
| Own goal | A player kicks a ball into his own net |
| Pass | Any intentionally played ball from one player to another |
| Reception | A player receives the ball from another player |
| Corner | A kick is taken from the corner of the field |
| Shot not on target | A shot that misses the goal |
| Shot on target | A shot directed at the goal, whether or not it scores |
| Tackle | A player takes the ball away from the player in possession |
| Free-kick | Direct free-kick and indirect free-kick |
| Goal kick | The goalkeeper restarts the game and kicks the ball |
| Goal | A goal is scored |
| Offside | A player is in an offside position when the pass is made |
| Yellow card | A player is shown a yellow card |
| Red card | A player is shown a straight red card |
Comparison of the prediction results of the home team’s goal probability by Ridge Linear Regression, Bayesian Ridge Regression, RF and XGB.
| Model | R² | MAE | RMSE |
|---|---|---|---|
| Ridge Linear Regression | 3.695 × 10⁻² | 1.619 × 10⁻⁴ | 1.272 × 10⁻² |
| Bayesian Ridge Regression | 3.393 × 10⁻² | 1.623 × 10⁻⁴ | 1.274 × 10⁻² |
| RF | 2.443 × 10⁻² | 1.640 × 10⁻⁴ | 1.280 × 10⁻² |
| XGB | 3.033 × 10⁻² | 1.630 × 10⁻⁴ | 1.276 × 10⁻² |
Comparison of the prediction results of the away team’s goal probability by Ridge Linear Regression, Bayesian Ridge Regression, RF and XGB.
| Model | R² | MAE | RMSE |
|---|---|---|---|
| Ridge Linear Regression | 1.007 × 10⁻² | 1.508 × 10⁻⁴ | 1.228 × 10⁻² |
| Bayesian Ridge Regression | 0.995 × 10⁻² | 1.508 × 10⁻⁴ | 1.228 × 10⁻² |
| RF | 0.135 × 10⁻² | 1.521 × 10⁻⁴ | 1.233 × 10⁻² |
| XGB | 0.560 × 10⁻² | 1.515 × 10⁻⁴ | 1.231 × 10⁻² |
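Ranked probability score (RPS) of the multiclass classification method (MC), the Poisson distribution method (PD), and IGSOP. In addition to the overall RPS, we also calculated the average RPS for the first half, the second half, the last 25% of games, and the last 10% of games. Percentages in parentheses are changes relative to MC.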
| Method | First Half | Second Half | Final 25% | Final 10% | Overall |
|---|---|---|---|---|---|
| MC | 0.1811 | 0.1099 | 0.0904 | 0.0758 | 0.1455 |
| PD | 0.1778 (−1.82%) | 0.0913 (−16.9%) | 0.0609 (−32.6%) | 0.0318 (−58.0%) | 0.1346 (−7.49%) |
| IGSOP | 0.1755 (−3.09%) | 0.0892 (−18.8%) | 0.0570 (−36.9%) | 0.0270 (−64.4%) | 0.1323 (−9.07%) |
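For reference, the RPS of a single forecast can be computed as in the sketch below. It assumes the standard definition for ordered outcomes, here (home win, draw, away win), in which squared differences between cumulative predicted and cumulative observed probabilities are averaged; lower is better. The function name is illustrative.

```python
def ranked_probability_score(probs, outcome_index):
    """RPS for one forecast over ordered outcomes.

    probs: predicted probabilities, ordered (home win, draw, away win).
    outcome_index: index of the outcome that actually occurred.
    """
    r = len(probs)
    cum_pred = 0.0
    cum_obs = 0.0
    total = 0.0
    for i in range(r - 1):
        cum_pred += probs[i]
        cum_obs += 1.0 if i == outcome_index else 0.0
        total += (cum_pred - cum_obs) ** 2
    return total / (r - 1)


# A forecast that leans toward a home win, scored against a draw.
print(ranked_probability_score([0.5, 0.3, 0.2], outcome_index=1))
```

Because the score works on cumulative probabilities, a forecast that puts mass on a draw is penalized less than one that puts the same mass on an away win when the home team actually wins.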
Results of the 5 × 2 cv paired t-test performed on MC, PD and IGSOP.
| Comparison | t-statistic | p-value |
|---|---|---|
| IGSOP/MC | −4.512 | 0.006 < 0.05 |
| PD/MC | −3.270 | 0.022 < 0.05 |
| IGSOP/PD | −1.689 | 0.152 > 0.05 |
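The sketch below shows how such a statistic and its p-value could be obtained, assuming the test follows the usual 5 × 2 cv formulation: five replications of 2-fold cross-validation, with the statistic referred to a t distribution with 5 degrees of freedom. The function names are hypothetical, and the closed-form p-value applies only to 5 degrees of freedom.

```python
import math


def five_by_two_cv_t(diffs):
    """5 x 2 cv paired t statistic.

    diffs: five pairs (d1, d2); each value is the score difference between
    two models on one fold of one 2-fold cross-validation replication.
    """
    variances = []
    for d1, d2 in diffs:
        mean = (d1 + d2) / 2
        variances.append((d1 - mean) ** 2 + (d2 - mean) ** 2)
    return diffs[0][0] / math.sqrt(sum(variances) / 5)


def two_sided_p_df5(t):
    """Two-sided p-value for Student's t with 5 degrees of freedom."""
    theta = math.atan(abs(t) / math.sqrt(5))
    s, c = math.sin(theta), math.cos(theta)
    central_mass = (2 / math.pi) * (theta + s * (c + (2 / 3) * c ** 3))
    return 1 - central_mass


# The t statistics reported in the table give the listed p-values.
for t in (-4.512, -3.270, -1.689):
    print(t, round(two_sided_p_df5(t), 3))
# -4.512 0.006
# -3.27 0.022
# -1.689 0.152
```

Under these assumptions, the reported p-values are consistent with the reported t statistics.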
Figure 3. Probability calibration for the multiclass classification method, Poisson distribution method, and IGSOP. (a) Multiclass classification calibration curves and ECE, (b) Poisson distribution calibration curves and ECE, and (c) IGSOP calibration curves and ECE.
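The expected calibration error (ECE) reported alongside the calibration curves can be sketched as follows. This version assumes equal-width probability bins and a single outcome class treated as a binary event (for example, "home win" versus "not home win"); the binning actually used for Figure 3 is not stated here, so the bin count and function name are assumptions.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width bins for one outcome class.

    probs: predicted probabilities of the outcome, each in [0, 1].
    labels: 1 if the outcome occurred, 0 otherwise.
    """
    # Per bin: number of samples, sum of predictions, number of positives.
    bins = [[0, 0.0, 0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)
        bins[b][0] += 1
        bins[b][1] += p
        bins[b][2] += y
    n = len(probs)
    ece = 0.0
    for count, prob_sum, positives in bins:
        if count:
            observed = positives / count
            predicted = prob_sum / count
            ece += (count / n) * abs(observed - predicted)
    return ece


# Predictions of 0.8 that come true 4 times out of 5 are well calibrated.
print(expected_calibration_error([0.8] * 5, [1, 1, 1, 1, 0]))
```

A well-calibrated model has an ECE close to zero, meaning that the predicted probabilities in each bin match the observed frequencies.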
Figure 4. Regression coefficients of the top 30 features that have the greatest impact on the home team’s goal probability. Blue represents a positive effect on the probability of scoring, and red represents a negative effect.
Figure 5. Regression coefficients of the top 30 features that have the greatest impact on the away team’s goal probability. Blue represents a positive effect on the probability of scoring, and red represents a negative effect.