Wei-Jen Chen, Mao-Jhen Jhou, Tian-Shyug Lee, Chi-Jie Lu.
Abstract
The sports market has grown rapidly over the last several decades. Predicting sports outcomes is an attractive challenge in sports analytics because it provides useful information for operations in the sports market. In this study, a hybrid basketball game outcome prediction scheme is developed for predicting the final scores of National Basketball Association (NBA) games by integrating five data mining techniques: extreme learning machine, multivariate adaptive regression splines, k-nearest neighbors, eXtreme gradient boosting (XGBoost), and stochastic gradient boosting. Designed features are generated by merging information from different game-lags of fundamental basketball statistics and are used in the proposed scheme. This study collected data from all the games of the NBA 2018-2019 season. There are 30 teams in the NBA, and each team plays 82 games per season, so a total of 2460 NBA game data points were collected. Empirical results show that the proposed hybrid basketball game prediction scheme achieves high prediction performance and identifies suitable game-lag information and relevant game features (statistics). Our findings suggest that a two-stage XGBoost model using four pieces of game-lag information achieves the best prediction performance among all competing models. Six designed features from four game-lags (averaged defensive rebounds, averaged two-point field goal percentage, averaged free throw percentage, averaged offensive rebounds, averaged assists, and averaged three-point field goal attempts) have a greater effect on the prediction of the final scores of NBA games than features from other game-lags. The findings of this study provide relevant insights and guidance for research on outcome prediction in other team or individual sports.
Keywords: National Basketball Association; XGBoost; basketball game; data mining; game score prediction; sports outcomes prediction
Year: 2021 PMID: 33920720 PMCID: PMC8073849 DOI: 10.3390/e23040477
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. Flowchart of the proposed basketball game score prediction scheme.
Variables description.
| Variable | Description |
|---|---|
| 2PA | 2-Point Field Goal Attempts of a team in … |
| 2P% | 2-Point Field Goal Percentage of a team in … |
| 3PA | 3-Point Field Goal Attempts of a team in … |
| 3P% | 3-Point Field Goal Percentage of a team in … |
| FTA | Free Throw Attempts of a team in … |
| FT% | Free Throw Percentage of a team in … |
| ORB | Offensive Rebounds of a team in … |
| DRB | Defensive Rebounds of a team in … |
| AST | Assists of a team in … |
| STL | Steals of a team in … |
| BLK | Blocks of a team in … |
| TOV | Turnovers of a team in … |
| PF | Personal Fouls of a team in … |
| Score | Team Score of a team in … |
Figure 2. Example of the designed features for variable … in different game-lags.
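The abstract describes the designed features as averages of a basic statistic taken over a number of games (the game-lag). The following is a minimal sketch of that idea, assuming a designed feature for game-lag `lag` is the mean of a team's previous `lag` games for one statistic, excluding the current game. The function name `lagged_average` and the sample numbers are hypothetical, and the paper's exact definition may differ.

```python
from typing import List, Optional


def lagged_average(values: List[float], lag: int) -> List[Optional[float]]:
    """Mean of the previous `lag` games for each game; None until enough history exists."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    features: List[Optional[float]] = []
    for t in range(len(values)):
        if t < lag:
            # Not enough earlier games to fill the window yet.
            features.append(None)
        else:
            window = values[t - lag:t]  # games t-lag .. t-1, current game excluded
            features.append(sum(window) / lag)
    return features


# Hypothetical defensive-rebound counts for one team over six consecutive games.
drb = [30.0, 34.0, 32.0, 36.0, 38.0, 40.0]
drb_lag4 = lagged_average(drb, lag=4)
print(drb_lag4)  # [None, None, None, None, 33.0, 35.0]
```

Excluding the current game from the window keeps the feature usable for prediction, since only information available before tip-off enters the average.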
Feature rank by MARS, XGBoost and SGB methods under game-lag = 4.
| Designed Feature | MARS | XGBoost | SGB | Average Rank |
|---|---|---|---|---|
| Averaged 2PA | 6 | 8 | 10 | 8.00 |
| Averaged 2P% | 2 | 1 | 3 | 2.00 |
| Averaged 3PA | 5 | 6 | 8 | 6.33 |
| Averaged 3P% | 8 | 7 | 5 | 6.67 |
| Averaged FTA | 7 | 9 | 7 | 7.67 |
| Averaged FT% | 3 | 4 | 4 | 3.67 |
| Averaged ORB | 4 | 5 | 6 | 5.00 |
| Averaged DRB | 1 | 2 | 2 | 1.67 |
| Averaged AST | 13 | 3 | 1 | 5.67 |
| Averaged STL | 13 | 13 | 13 | 13.00 |
| Averaged BLK | 13 | 10 | 11 | 11.33 |
| Averaged TOV | 13 | 11 | 12 | 12.00 |
| Averaged PF | 9 | 12 | 9 | 10.00 |
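The Average Rank column can be reproduced as the arithmetic mean of the three per-method ranks. The sketch below recomputes it from the values in the table above and lists the six rows with the lowest average rank, assuming that a lower rank means a more important feature. Rows are identified by their 1-based position in the table, and the helper name `average_ranks` is hypothetical.

```python
# Ranks copied from the table above: one (MARS, XGBoost, SGB) tuple per row, in row order.
ranks = [
    (6, 8, 10), (2, 1, 3), (5, 6, 8), (8, 7, 5), (7, 9, 7),
    (3, 4, 4), (4, 5, 6), (1, 2, 2), (13, 3, 1), (13, 13, 13),
    (13, 10, 11), (13, 11, 12), (9, 12, 9),
]


def average_ranks(rows):
    """Mean rank per row, rounded to two decimals as in the table."""
    return [round(sum(r) / len(r), 2) for r in rows]


avg = average_ranks(ranks)

# 1-based row positions sorted from lowest (best) to highest average rank.
order = sorted(range(1, len(avg) + 1), key=lambda i: avg[i - 1])
top_six = order[:6]

print(avg)      # [8.0, 2.0, 6.33, 6.67, 7.67, 3.67, 5.0, 1.67, 5.67, 13.0, 11.33, 12.0, 10.0]
print(top_six)  # [8, 2, 6, 7, 9, 3]
```

Keeping six rows mirrors the six designed features that the abstract highlights for the four-game-lag setting.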
Performance of the five single models under six game-lags.
| Methods | Lag = 1 | Lag = 2 | Lag = 3 | Lag = 4 | Lag = 5 | Lag = 6 |
|---|---|---|---|---|---|---|
| S-ELM | 0.1020 | 0.0960 | 0.0915 | 0.0870 | 0.0931 | 0.0928 |
| S-MARS | 0.0910 | 0.0909 | 0.0897 | 0.0846 | 0.0917 | 0.0907 |
| S-XGBoost | 0.0919 | 0.0907 | 0.0911 | … | 0.0927 | 0.0920 |
| S-SGB | 0.0910 | 0.0925 | 0.0913 | 0.0845 | 0.0923 | 0.0908 |
| S-KNN | 0.0992 | 0.1011 | 0.0947 | 0.0873 | 0.0934 | 0.0941 |
Note: The bold indicates the best prediction performance.
Performance of the five two-stage models under six game-lags.
| Methods | Lag = 1 | Lag = 2 | Lag = 3 | Lag = 4 | Lag = 5 | Lag = 6 |
|---|---|---|---|---|---|---|
| T-ELM | 0.1206 | 0.0924 | 0.0951 | 0.0863 | 0.0972 | 0.0902 |
| T-MARS | 0.0917 | 0.0911 | 0.0912 | 0.0845 | 0.0928 | 0.0900 |
| T-XGBoost | 0.0918 | 0.0930 | 0.0916 | … | 0.0929 | 0.0920 |
| T-SGB | 0.0909 | 0.0918 | 0.0912 | 0.0829 | 0.0930 | 0.0908 |
| T-KNN | 0.0998 | 0.0984 | 0.0973 | 0.0872 | 0.0993 | 0.0970 |
Note: The bold indicates the best prediction performance.
Figure 3. Evaluation results of the selection of different numbers of important features for modeling the two-stage models: (a) T-ELM, (b) T-MARS, (c) T-XGBoost, (d) T-SGB, (e) T-KNN.
Comparison of prediction performance of T-XGBoost and the six competing models.
| Models (Lag = 4) | MAPE | RMSE | SSE |
|---|---|---|---|
| S-Linear | 0.0897 | 12.7324 | 75,868.89 |
| T-Linear | 0.0883 | 12.0904 | 68,410.85 |
| S-M5P | 0.0922 | 13.0613 | 79,839.95 |
| T-M5P | 0.0931 | 12.9102 | 78,003.64 |
| S-SVR | 0.0914 | 13.0213 | 79,351.58 |
| T-SVR | 0.0889 | 12.2547 | 70,283.06 |
| T-XGBoost | 0.0818 | 11.4753 | 61,627.37 |
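The three columns in the comparison table are common regression error measures. A minimal sketch follows, assuming the standard textbook definitions with MAPE expressed as a fraction rather than a percentage; the record does not state the formulas, and the sample scores are hypothetical.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (0.08 means 8%)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def sse(actual, predicted):
    """Sum of squared errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def rmse(actual, predicted):
    """Root mean squared error, i.e. sqrt(SSE / n)."""
    return math.sqrt(sse(actual, predicted) / len(actual))


# Hypothetical final scores and predictions for three games.
actual = [100.0, 110.0, 120.0]
predicted = [90.0, 121.0, 120.0]

print(mape(actual, predicted))
print(rmse(actual, predicted))
print(sse(actual, predicted))

# Under these definitions SSE = n * RMSE**2, so a table row implies its sample size.
# Using the S-Linear row (RMSE 12.7324, SSE 75,868.89):
implied_n = 75868.89 / 12.7324 ** 2
print(round(implied_n))
```

Under these assumed definitions, the S-Linear and T-XGBoost rows both imply about 468 test observations, which is a quick consistency check on the reported RMSE and SSE values.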
Wilcoxon signed-rank test between six pieces of game-lag information on the T-XGBoost model.
| T-XGBoost | Lag = 1 | Lag = 2 | Lag = 3 | Lag = 5 | Lag = 6 |
|---|---|---|---|---|---|
| Lag = 4 | −1.017 | −1.044 | −4.284 | −6.115 | −10.859 |
Note: The numbers in parentheses are the corresponding p-values; ** p < 0.05.
Wilcoxon signed-rank test between T-XGBoost, T-Linear, T-MARS, T-SVR, T-SGB, T-KNN, T-ELM and T-M5P models.
| Lag = 4 | T-Linear | T-MARS | T-SVR | T-SGB | T-KNN | T-ELM | T-M5P |
|---|---|---|---|---|---|---|---|
| T-XGBoost | −1.239 | −0.994 | −0.997 | −0.989 | −0.885 | −1.377 | −1.043 |
Note: The numbers in parentheses are the corresponding p-values; ** p < 0.05.
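The Wilcoxon signed-rank test compares two paired samples, here presumably the prediction errors of two models on the same games. The sketch below computes a z statistic with the usual normal approximation (tie-corrected variance, zero differences dropped) and a two-sided p-value. It assumes the table entries are z statistics of this kind, which the record does not confirm; the function name `wilcoxon_signed_rank_z` and the sample errors are hypothetical.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank_z(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (z, two-sided p) for paired samples using the normal approximation."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    ordered = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[ordered[j + 1]]) == abs(diffs[ordered[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[ordered[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t  # used for the tie correction of the variance
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (min(w_plus, w_minus) - mean) / math.sqrt(var)  # never positive by construction
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p


# Hypothetical absolute errors of two models on the same eight games.
errors_a = [9.1, 7.4, 12.0, 5.2, 8.8, 10.5, 6.3, 11.7]
errors_b = [10.0, 8.9, 12.4, 7.0, 9.1, 12.9, 6.1, 13.0]
z, p = wilcoxon_signed_rank_z(errors_a, errors_b)
print(z, p)
```

Taking the smaller of the two rank sums makes the statistic non-positive, which is consistent with the negative entries in the tables. For small samples an exact test is preferable to the normal approximation.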