Ashwin A Phatak, Saumya Mehta, Franz-Georg Wieland, Mikael Jamil, Mark Connor, Manuel Bassek, Daniel Memmert.
Abstract
Key Performance Indicators (KPIs) have been investigated, validated and applied in a multitude of sports for recruiting, coaching, opponent analysis and self-analysis. Although a wide variety of in-game performance indicators have been used as KPIs, they lack sport-specific context. With the introduction of artificial intelligence and machine learning (AI/ML) in sports, the need to build intrinsic context into the independent variables is even greater, as AI/ML models seem to perform better in terms of predictability but lack interpretability. The study proposes a domain-specific feature preprocessing method (normalization) that can be used across a wide range of sports, and demonstrates its value through a specific data transformation: using team possession as a normalizing factor when analyzing defensive performance in soccer. The study fitted two linear regressions and three gradient boosting machine (GBM) models to demonstrate the value of normalization when predicting defensive performance. The results show that the direction of correlation of the relevant variables changes after normalization when predicting the defensive performance of teams over the whole season. Both raw and normalized KPIs showed significant correlation with defensive performance (p < 0.001). Adding the normalized variables contributes to higher information gain, improved performance and increased interpretability of the models.
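The abstract names team possession as the normalizing factor but does not give the formula. The following is a minimal sketch of one plausible reading, assuming a defensive action count is divided by the share of time the opponent holds the ball, since a team can only block or tackle when it does not have possession. The function name, the formula and the input values are illustrative assumptions, not the authors' implementation.

```python
def normalize_by_opponent_possession(action_count: float,
                                     team_possession_pct: float) -> float:
    """Scale a defensive action count by the opponent's possession share.

    ASSUMPTION: normalized = count / (opponent possession as a fraction).
    The source only says that team possession is the normalizing factor.
    """
    opponent_share = (100.0 - team_possession_pct) / 100.0
    if not 0.0 < opponent_share <= 1.0:
        raise ValueError("team_possession_pct must be in [0, 100)")
    return action_count / opponent_share


# Made-up inputs: two teams with the same raw number of blocks.
high_possession_team = normalize_by_opponent_possession(10, 70.0)
low_possession_team = normalize_by_opponent_possession(10, 30.0)

print(round(high_possession_team, 2))  # 10 blocks in 30% opponent possession
print(round(low_possession_team, 2))   # 10 blocks in 70% opponent possession
```

Under this assumption, identical raw counts yield different normalized values: the team that rarely defends records its blocks in less defensive time, so its normalized figure is higher.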
Year: 2022 PMID: 35064172 PMCID: PMC8782855 DOI: 10.1038/s41598-022-05089-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
xGA vs blocking performance (R² = 51.1 ± 12.1%).
| Names | Coefficients | Standard error | Z value | P value | Standardized coefficients |
|---|---|---|---|---|---|
| Intercept | 7.392 | 2.226 | 3.320 | < 0.01–04 | 44.754 |
| Blocks | 0.109 | 0.006 | 19.237 | < 0.001 | 10.975 |
| NormBlocks | −0.019 | 0.003 | −7.025 | < 0.001 | −4.008 |
xGA vs tackling performance (R² = 39.7 ± 12.1%).
| Names | Coefficients | Standard error | Z value | P value | Standardized coefficients |
|---|---|---|---|---|---|
| Intercept | 11.265 | 2.825 | 3.987 | < 0.01 | 44.753 |
| Tkl + Int | 0.072 | 0.004 | 16.653 | < 0.001 | 10.092 |
| NormTkl + Int | −0.016 | 0.002 | −9.115 | < 0.001 | −5.524 |
GBM results for xGA vs raw KPIs.
| Variable | Relative importance | Scaled importance | Percentage (%) |
|---|---|---|---|
| Blocks | 95,117.367 | 1.000 | 58.8 |
| Tkl + Int | 66,722.398 | 0.702 | 41.2 |
GBM results for xGA vs normalized KPIs.
| Variable | Relative importance | Scaled importance | Percentage (%) |
|---|---|---|---|
| NormBlocks | 72,164.851 | 1.000 | 65.8 |
| NormTkl + Int | 37,500.191 | 0.520 | 34.2 |
GBM results for xGA vs combined KPIs.
| Variable | Relative importance | Scaled importance | Percentage (%) |
|---|---|---|---|
| Blocks | 90,860.570 | 1.000 | 54.4 |
| Tkl + Int | 36,498.390 | 0.402 | 21.8 |
| NormBlocks | 24,005.883 | 0.264 | 14.4 |
| NormTkl + Int | 15,775.142 | 0.174 | 9.4 |
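The three importance columns in the GBM tables appear to be related by simple arithmetic: scaled importance is relative importance divided by the largest value, and percentage is relative importance divided by the column total. The sketch below checks that reading against the combined-KPI table; the variable names are illustrative, and the relationship is inferred from the numbers rather than stated in the source.

```python
# Relative importance values copied from the combined-KPI GBM table.
relative_importance = {
    "Blocks": 90860.570,
    "Tkl + Int": 36498.390,
    "NormBlocks": 24005.883,
    "NormTkl + Int": 15775.142,
}

largest = max(relative_importance.values())
total = sum(relative_importance.values())

# ASSUMPTION: scaled = value / max, percentage = 100 * value / sum.
scaled = {name: round(value / largest, 3)
          for name, value in relative_importance.items()}
percentage = {name: round(100.0 * value / total, 1)
              for name, value in relative_importance.items()}

for name in relative_importance:
    print(f"{name}: scaled={scaled[name]}, percentage={percentage[name]}")
```

Read this way, the two raw KPIs account for roughly three quarters of the total importance in the combined model, and the two normalized KPIs for the remaining quarter.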
GBM results for xGA vs KPIs and possession.
| Variable | Relative importance | Scaled importance | Percentage (%) |
|---|---|---|---|
| Blocks | 108,474.078 | 1.000 | 66.25 |
| Tkl + Int | 30,484.479 | 0.281 | 18.61 |
| Possession | 24,762.047 | 0.228 | 15.12 |
Cross validation results for all models.
| Model | Train R² (%) | Test, cross-validated R² (% ± SD) |
|---|---|---|
| Blocking performance LR | 55.30 | 51.50 ± 12.07 |
| Tackling performance LR | 42.84 | 39.70 ± 12.16 |
| Raw KPIs GBM | 61.76 | 49.90 ± 7.25 |
| Normalized KPIs GBM | 45.53 | 37.16 ± 4.11 |
| Combined performance GBM | 70.11 | 57.83 ± 2.37 |
| KPIs and possession GBM | 66.72 | 56.96 ± 2.57 |
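One way to read the cross-validation table is to compare each model's training R² with its cross-validated R²; the difference indicates how much of the training fit fails to carry over to held-out data. The sketch below computes that gap from the table values. The gap is an illustrative reading aid, not a metric reported in the source.

```python
# (train R² %, cross-validated test R² %) copied from the table above.
results = {
    "Blocking performance LR": (55.30, 51.50),
    "Tackling performance LR": (42.84, 39.70),
    "Raw KPIs GBM": (61.76, 49.90),
    "Normalized KPIs GBM": (45.53, 37.16),
    "Combined performance GBM": (70.11, 57.83),
    "KPIs and possession GBM": (66.72, 56.96),
}

# Gap in percentage points between training and cross-validated R².
gap = {model: round(train - test, 2)
       for model, (train, test) in results.items()}

# Model with the highest cross-validated R².
best_model = max(results, key=lambda model: results[model][1])

for model, points in gap.items():
    print(f"{model}: train-test gap = {points} points")
print("Highest cross-validated R²:", best_model)
```

On these figures the combined GBM has the highest cross-validated R² and the smallest fold-to-fold spread in the table, while the linear regressions show the smallest train-test gaps.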