Jack C Yue, Elizabeth P Chou, Ming-Hui Hsieh, Li-Chen Hsiao.
Abstract
Tennis is a popular sport, and professional tennis matches are probably the most watched games globally. Many studies consider statistical or machine learning models to predict the results of professional tennis matches. In this study, we propose a statistical approach for predicting the match outcomes of Grand Slam tournaments, in addition to applying exploratory data analysis (EDA) to explore variables related to match results. The proposed approach introduces new variables via the Glicko rating model, a Bayesian method commonly used in professional chess. We use EDA tools to determine important variables and apply classification models (e.g., logistic regression, support vector machine, neural network and light gradient boosting machine) to evaluate the classification results through cross-validation. The empirical study is based on men's and women's singles matches of Grand Slam tournaments (2000–2019). Our analysis results show that professional tennis ranking is the most important variable and that the accuracy of the proposed Glicko model is slightly higher than that of other models.
Year: 2022 PMID: 35395047 PMCID: PMC8992979 DOI: 10.1371/journal.pone.0266838
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Wimbledon championship bracket.
https://www.interbasket.net/wp-content/uploads/Wimbledon-bracket-1-of-2-print-768x590.jpg.
Fig 2. Winning probability of higher-ranked player vs. rank difference (men's singles).
Fig 3. Histogram of ranking differences (ATP, men's singles).
80th-percentile rank for each round (men's singles, 2000–2019).
| Round | 80th-percentile rank | log(80th-percentile rank) | Number of samples |
|---|---|---|---|
| Champion | 5 | 1.6094 | 80 |
| Final | 9 | 2.1972 | 160 |
| Semi-Final | 15 | 2.7081 | 320 |
| Quarter-Final | 25 | 3.2189 | 640 |
| 4th Round | 41 | 3.7136 | 1280 |
| 3rd Round | 64 | 4.1589 | 2560 |
| 2nd Round | 98 | 4.5850 | 5120 |
Fig 4. Logarithm of rank vs. rounds (80%).
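The log column of the table can be reproduced directly from the 80th-percentile ranks. The following is a minimal Python sketch, assuming the natural logarithm (which matches the tabulated values); the variable names are illustrative and do not come from the paper.

```python
import math

# 80th-percentile rank per round, copied from the table above.
rank_80 = {
    "Champion": 5,
    "Final": 9,
    "Semi-Final": 15,
    "Quarter-Final": 25,
    "4th Round": 41,
    "3rd Round": 64,
    "2nd Round": 98,
}

# Natural log of each rank, rounded to four decimals as in the table.
log_rank = {rnd: round(math.log(r), 4) for rnd, r in rank_80.items()}

# Differences between consecutive rounds. Roughly constant steps would
# mean the log-rank changes about linearly from one round to the next.
values = list(log_rank.values())
steps = [round(b - a, 4) for a, b in zip(values, values[1:])]

for rnd, value in log_rank.items():
    print(f"{rnd:14s} {value:.4f}")
print("steps:", steps)
```

The consecutive differences all fall in a fairly narrow band, which is consistent with the roughly linear relationship that the figure caption suggests.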
Fig 5. Age structure of players advancing to different rounds (men's singles).
Fig 6. Winning probability of higher-rated player vs. rating difference.
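For readers unfamiliar with how a rating difference maps to a winning probability, the sketch below uses the expected-outcome formula of the standard Glicko system (Glickman's original formulation). It is an illustration only: the paper's exact variant, including any court-specific adjustment, is not given in this record, and the function and parameter names are assumptions.

```python
import math

Q = math.log(10) / 400  # Glicko scaling constant


def g(rd: float) -> float:
    """Attenuation factor: shrinks the effect of a rating gap when RD is large."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def win_probability(r_a: float, rd_a: float, r_b: float, rd_b: float) -> float:
    """Expected probability that player A beats player B (standard Glicko).

    r_* are ratings and rd_* are rating deviations (uncertainty).
    """
    combined_rd = math.sqrt(rd_a**2 + rd_b**2)
    return 1.0 / (1.0 + 10 ** (-g(combined_rd) * (r_a - r_b) / 400))


# Hypothetical ratings, chosen only to show the behaviour.
print(win_probability(1700, 0, 1500, 0))      # certain ratings
print(win_probability(1700, 200, 1500, 200))  # uncertain ratings, closer to 0.5
```

With equal ratings the probability is 0.5, and larger rating deviations pull the prediction toward 0.5 for the same rating gap.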
Fig 7. Possible combinations of moving windows.
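One common way to build moving-window splits over yearly data is to train on a fixed number of consecutive years and test on the year that follows. The sketch below assumes that scheme; the window length and the helper name `moving_windows` are hypothetical, and the paper's actual window combinations are not specified in this record.

```python
def moving_windows(first_year: int, last_year: int, train_len: int):
    """Yield (train_years, test_year) pairs for a fixed training length.

    Assumed scheme: train on `train_len` consecutive years, test on the next one.
    """
    for start in range(first_year, last_year - train_len + 1):
        train_years = list(range(start, start + train_len))
        test_year = start + train_len
        yield train_years, test_year


# Example over the study period 2000-2019 with a hypothetical 5-year window.
splits = list(moving_windows(2000, 2019, 5))
for train_years, test_year in splits[:3]:
    print(train_years, "->", test_year)
```

Keeping the test year strictly after the training years avoids using future matches to predict past ones.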
Model comparison for ATP singles matches (moving window).
| Model | AUC | Accuracy |
|---|---|---|
| Baseline | – | 0.724 |
| Logistic | 0.691 | 0.727 |
| SVM | – | 0.719 |
| Neural Network | 0.657 | 0.724 |
| LightGBM | 0.688 | 0.723 |
| Glicko | 0.715 | 0.734 |
| Glicko (Courts) | 0.731 | 0.738 |
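
The two metrics in these comparison tables measure different things: accuracy needs only a hard prediction per match, while AUC needs a score (such as a predicted winning probability) for ranking matches. A minimal pure-Python sketch of both is shown below. The data are made up for illustration and the function names are assumptions, not the paper's code.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Share of matches where the thresholded prediction equals the outcome."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def auc(y_true, y_prob):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Counts how often a positive case gets a higher score than a negative
    case, with ties counted as half.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up outcomes (1 = higher-ranked player won) and predicted probabilities.
y_true = [1, 1, 0, 1, 0]
y_prob = [0.9, 0.6, 0.7, 0.8, 0.3]
print(accuracy(y_true, y_prob))  # 0.8
print(auc(y_true, y_prob))       # 5 of 6 positive/negative pairs ordered correctly
```

The pairwise form is quadratic in the number of matches, so for large samples a sort-based implementation or a library routine would be the more practical choice.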
Model comparison for WTA singles matches (moving window).
| Model | AUC | Accuracy |
|---|---|---|
| Baseline | – | 0.701 |
| Logistic | 0.656 | 0.699 |
| SVM | – | 0.700 |
| Neural Network | 0.648 | 0.701 |
| LightGBM | 0.667 | 0.700 |
| Glicko | 0.695 | 0.712 |
| Glicko (Courts) | 0.691 | 0.709 |
Model accuracy for singles matches (non-moving window).
| Model | ATP AUC | ATP Accuracy | WTA AUC | WTA Accuracy |
|---|---|---|---|---|
| Baseline | – | 0.717 | – | 0.704 |
| Logistic | 0.685 | 0.718 | 0.661 | 0.704 |
| SVM | – | 0.721 | – | 0.709 |
| Neural Network | 0.700 | 0.720 | 0.690 | 0.707 |
| LightGBM | 0.705 | 0.717 | 0.694 | 0.707 |
| Glicko | 0.704 | 0.723 | 0.696 | 0.713 |
| Glicko (Courts) | 0.728 | 0.732 | 0.697 | 0.714 |