Mikael Jamil, Ashwin Phatak, Saumya Mehta, Marco Beato, Daniel Memmert, Mark Connor.
Abstract
This study applied multiple machine learning algorithms (MLAs) to classify the performance levels of professional goalkeepers (GKs). The technical performances of GKs competing in the elite divisions of England, Spain, Germany, and France were analysed to determine which factors distinguish elite GKs from sub-elite GKs. A total of n = 14,671 player-match observations were analysed with three MLAs: Logistic Regression (LR), Gradient Boosting Classifier (GBC) and Random Forest Classifier (RFC). The results revealed that 15 features common to all three MLAs, pertaining to the actions of passing and distribution, distinguished goalkeepers performing at the elite level from those who do not. Specifically, short distribution, passing the ball successfully, receiving passes successfully, and keeping clean sheets were all revealed to be common traits of GKs performing at the elite level. Moderate to high accuracy was reported across all the MLAs for the training data (LR 0.70, RFC 0.82, GBC 0.71) and the testing data (LR 0.67, RFC 0.66, GBC 0.66). Ultimately, the results suggest that a GK's ability with their feet, and not necessarily their hands, is what distinguishes elite GKs from sub-elite GKs.
Year: 2021 PMID: 34811371 PMCID: PMC8609025 DOI: 10.1038/s41598-021-01187-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Machine learning pipeline for obtaining KPIs.
5-fold cross validation results for training data (mean ± standard deviation).
| Metric | Logistic regression | Random forest classifier | Gradient boosting classifier |
|---|---|---|---|
| F1 | 0.70 ± 0.012 | 0.82 ± 0.005 | 0.71 ± 0.011 |
| Accuracy | 0.70 ± 0.014 | 0.82 ± 0.005 | 0.71 ± 0.011 |
| ROC - AUC | 0.77 ± 0.054 | 0.91 ± 0.003 | 0.78 ± 0.011 |
5-fold cross validation results for testing data set (mean ± standard deviation).
| Metric | Logistic regression | Random forest classifier | Gradient boosting classifier |
|---|---|---|---|
| F1 | 0.664 ± 0.055 | 0.64 ± 0.0723 | 0.651 ± 0.049 |
| Accuracy | 0.671 ± 0.0445 | 0.66 ± 0.045 | 0.66 ± 0.043 |
| ROC - AUC | 0.729 ± 0.057 | 0.723 ± 0.051 | 0.724 ± 0.049 |
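The difference between the training and testing results is easiest to read side by side. The following minimal Python sketch subtracts the mean testing accuracy from the mean training accuracy reported in the two cross-validation tables. The variable names are illustrative and the standard deviations are ignored, so this is a rough reading aid rather than a formal comparison.

```python
# Mean accuracy values copied from the two cross-validation tables.
train_accuracy = {"LR": 0.70, "RFC": 0.82, "GBC": 0.71}
test_accuracy = {"LR": 0.671, "RFC": 0.66, "GBC": 0.66}

# Train-minus-test gap per model; a larger gap suggests more overfitting.
accuracy_gap = {
    model: round(train_accuracy[model] - test_accuracy[model], 3)
    for model in train_accuracy
}

# Model whose accuracy drops the most between training and testing.
largest_gap_model = max(accuracy_gap, key=accuracy_gap.get)

for model, gap in accuracy_gap.items():
    print(f"{model}: train {train_accuracy[model]:.3f}, "
          f"test {test_accuracy[model]:.3f}, gap {gap:.3f}")
print("Largest gap:", largest_gap_model)
```

On these figures the random forest classifier shows the largest drop (0.82 to 0.66), while all three models arrive at a similar testing accuracy.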
F test results.
| Measure | Compared algorithms | F-statistic | p-value |
|---|---|---|---|
| F1 | RF vs LR | 5.159 | 0.042 |
| F1 | LR vs GBC | 5.723 | 0.034 |
| F1 | RF vs GBC | 1.503 | 0.342 |
| Accuracy | RF vs LR | 3.713 | 0.080 |
| Accuracy | LR vs GBC | 5.906 | 0.032 |
| Accuracy | RF vs GBC | 0.877 | 0.600 |
| ROC - AUC | RF vs LR | 8.159 | 0.016 |
| ROC - AUC | LR vs GBC | 9.631 | 0.011 |
| ROC - AUC | RF vs GBC | 1.149 | 0.467 |
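This record does not state which F test produced these values. The reported p-values are, however, numerically consistent with an F distribution on (10, 5) degrees of freedom, which is the reference distribution of the 5x2cv combined F test. Treating those degrees of freedom as an assumption, the sketch below recomputes each p-value from its F-statistic using only the Python standard library. The function name `f_upper_tail` is hypothetical, and the closed form used here only works when the first degrees-of-freedom value is even.

```python
from math import comb, gamma


def f_upper_tail(f_stat, d1=10, d2=5):
    """Return P(F > f_stat) for an F(d1, d2) variable, with d1 even.

    The degrees of freedom (10, 5) are an assumption, not stated in the source.
    Uses P(F > f) = I_y(d2/2, d1/2) with y = d2 / (d1 * f + d2), where the
    incomplete beta integral is expanded as a finite binomial sum.
    """
    if d1 % 2 != 0:
        raise ValueError("this closed form requires an even d1")
    a = d1 // 2          # integer exponent parameter
    b = d2 / 2
    y = d2 / (d1 * f_stat + d2)
    # Integral of t^(b-1) * (1-t)^(a-1) from 0 to y, term by term.
    integral = sum(
        (-1) ** k * comb(a - 1, k) * y ** (b + k) / (b + k)
        for k in range(a)
    )
    beta = gamma(a) * gamma(b) / gamma(a + b)
    return integral / beta


# (measure, comparison, F-statistic, reported p-value) from the table.
reported = [
    ("F1", "RF vs LR", 5.159, 0.042),
    ("F1", "LR vs GBC", 5.723, 0.034),
    ("F1", "RF vs GBC", 1.503, 0.342),
    ("Accuracy", "RF vs LR", 3.713, 0.080),
    ("Accuracy", "LR vs GBC", 5.906, 0.032),
    ("Accuracy", "RF vs GBC", 0.877, 0.600),
    ("ROC - AUC", "RF vs LR", 8.159, 0.016),
    ("ROC - AUC", "LR vs GBC", 9.631, 0.011),
    ("ROC - AUC", "RF vs GBC", 1.149, 0.467),
]

for measure, pair, f_stat, p_reported in reported:
    p = f_upper_tail(f_stat)
    flag = "significant" if p < 0.05 else "not significant"
    print(f"{measure:9s} {pair:10s} F={f_stat:.3f} "
          f"p={p:.3f} (reported {p_reported:.3f}) {flag} at 0.05")
```

At the 0.05 level, every LR vs GBC comparison is significant, RF vs LR is significant for F1 and ROC - AUC but not for accuracy, and no RF vs GBC comparison is significant.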
Feature importance from multiple machine learning algorithms.
| Features | LR coefficients | RFC variable importance | GBC variable importance |
|---|---|---|---|
| Passes received | 3.3866 | 0.0389 | 0.0395 |
| % successful passes forwards | 1.1582 | 0.0341 | 0.0404 |
| GK short distribution | 0.8093 | 0.0249 | 0.0255 |
| Clean sheets | 0.3488 | 0.0283 | 0.0218 |
| Unsuccessful passes opposition half | −0.5439 | 0.0499 | 0.0703 |
| Successful passes opposition half | −0.5879 | 0.0669 | 0.0537 |
| Goals conceded | −0.8896 | 0.0431 | 0.0390 |
| GK long distribution | −0.9598 | 0.0330 | 0.0327 |
| Touches | −0.9882 | 0.0321 | 0.0368 |
| Total unsuccessful passes (excl. crosses and corners) | −1.0458 | 0.0457 | 0.0361 |
| Successful passes final third | −1.1860 | 0.0295 | 0.0290 |
| GK pick-up | −1.3739 | 0.0266 | 0.0253 |
| Shots on conceded | −1.4388 | 0.0302 | 0.0198 |
| Total successful passes (excl. crosses and corners) | −1.6280 | 0.0303 | 0.0354 |
| Successful long balls | −2.6940 | 0.0627 | 0.0626 |
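The three columns are on different scales: LR coefficients are signed, whereas the RFC and GBC importances are non-negative. One way to compare them is to rank the features within each model. The sketch below does this for the values in the table; the helper `top_features` is hypothetical, and ranking LR coefficients by absolute size assumes the features were on comparable scales, which this record does not confirm.

```python
# (feature, LR coefficient, RFC importance, GBC importance), copied from the table.
features = [
    ("Passes received", 3.3866, 0.0389, 0.0395),
    ("% successful passes forwards", 1.1582, 0.0341, 0.0404),
    ("GK short distribution", 0.8093, 0.0249, 0.0255),
    ("Clean sheets", 0.3488, 0.0283, 0.0218),
    ("Unsuccessful passes opposition half", -0.5439, 0.0499, 0.0703),
    ("Successful passes opposition half", -0.5879, 0.0669, 0.0537),
    ("Goals conceded", -0.8896, 0.0431, 0.0390),
    ("GK long distribution", -0.9598, 0.0330, 0.0327),
    ("Touches", -0.9882, 0.0321, 0.0368),
    ("Total unsuccessful passes (excl. crosses and corners)", -1.0458, 0.0457, 0.0361),
    ("Successful passes final third", -1.1860, 0.0295, 0.0290),
    ("GK pick-up", -1.3739, 0.0266, 0.0253),
    ("Shots on conceded", -1.4388, 0.0302, 0.0198),
    ("Total successful passes (excl. crosses and corners)", -1.6280, 0.0303, 0.0354),
    ("Successful long balls", -2.6940, 0.0627, 0.0626),
]


def top_features(rows, key, n=3):
    """Return the names of the n rows with the largest key value."""
    return [row[0] for row in sorted(rows, key=key, reverse=True)[:n]]


top_lr = top_features(features, key=lambda row: abs(row[1]))  # by |coefficient|
top_rfc = top_features(features, key=lambda row: row[2])
top_gbc = top_features(features, key=lambda row: row[3])

# Features whose LR coefficient is positive.
positive_lr = [row[0] for row in features if row[1] > 0]

# Features ranked in the top three by all three models.
shared_top = set(top_lr) & set(top_rfc) & set(top_gbc)

print("Top 3 by |LR coefficient|:", top_lr)
print("Top 3 by RFC importance:  ", top_rfc)
print("Top 3 by GBC importance:  ", top_gbc)
print("Positive LR coefficients: ", positive_lr)
print("In every top 3:           ", shared_top)
```

The four features with positive LR coefficients (passes received, % successful passes forwards, GK short distribution and clean sheets) correspond to the traits the abstract attributes to elite GKs, and successful long balls is the only feature that appears in the top three of all three rankings.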