Jaret M Karnuta, Bryan C Luu, Heather S Haeberle, Paul M Saluan, Salvatore J Frangiamore, Kim L Stearns, Lutul D Farrow, Benedict U Nwachukwu, Nikhil N Verma, Eric C Makhni, Mark S Schickendantz, Prem N Ramkumar.
Abstract
BACKGROUND: Machine learning (ML) allows for the development of a predictive algorithm capable of ingesting historical data on a Major League Baseball (MLB) player to accurately project the player's future availability.
Keywords: injury prediction; injury prevention; machine learning
Year: 2020 PMID: 33241060 PMCID: PMC7672741 DOI: 10.1177/2325967120963046
Source DB: PubMed Journal: Orthop J Sports Med ISSN: 2325-9671
Figure 1. Schematic demonstrating machine learning algorithm development and testing.
Player Injury Characteristics
| | Player-Years, n (%) |
|---|---|
| Position players | |
| Total | 9316 (100.0) |
| With prior injuries | 4091 (44.0) |
| Without prior injuries | 5225 (56.0) |
| ≥1 placement on 10-day DL | 147 (1.6) |
| ≥1 placement on 15-day DL | 1859 (19.9) |
| ≥1 placement on 60-day DL | 496 (5.3) |
| ≥1 game missed because of day-to-day injuries | 3052 (32.7) |
| Pitchers | |
| Total | 4657 (100.0) |
| With prior injuries | 2030 (43.6) |
| Without prior injuries | 2627 (56.4) |
| ≥1 placement on 10-day DL | 88 (1.9) |
| ≥1 placement on 15-day DL | 1040 (22.3) |
| ≥1 placement on 60-day DL | 319 (6.9) |
| ≥1 game missed because of day-to-day injuries | 1004 (21.6) |
| Combined | |
| Knee injury | 955 [355] (6.8) |
| Back injury | 1201 [327] (8.6) |
| Hand injury | 1668 [581] (11.9) |
| Foot and ankle injury | 925 [324] (6.6) |
| Shoulder injury | 1129 [569] (8.1) |
| Elbow injury | 643 [364] (4.6) |
Values in brackets indicate those requiring DL placement. DL, disabled list.
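The percentages in this table are plain proportions of player-years. The sketch below recomputes a few of them from the counts shown. It assumes, from the arithmetic rather than from any statement in the table, that the Combined rows are divided by the pooled total of 9316 + 4657 = 13,973 player-years; the helper name `pct` is hypothetical.

```python
def pct(count: int, total: int) -> float:
    """Percentage of player-years, rounded to one decimal place as in the table."""
    return round(100 * count / total, 1)

POSITION_TOTAL = 9316
PITCHER_TOTAL = 4657
# Assumption: Combined rows use the pooled number of player-years as denominator.
COMBINED_TOTAL = POSITION_TOTAL + PITCHER_TOTAL

print(pct(1040, PITCHER_TOTAL))   # pitchers, >=1 placement on 15-day DL
print(pct(955, COMBINED_TOTAL))   # combined, knee injury
print(pct(1668, COMBINED_TOTAL))  # combined, hand injury
```

With the pooled denominator, the recomputed values match the Combined column (for example, 955 knee injuries give 6.8%).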
Models Predicting Future Injuries Among Position Players
| Model | Accuracy, % | AUC | F1 Score | Brier Score Loss |
|---|---|---|---|---|
| Logistic regression | 68.7 ± 1.9 | 0.74 ± 0.021 | 0.68 ± 0.027 | 0.20 ± 0.008 |
| Random forest | 69.0 ± 2.0 | 0.75 ± 0.020 | 0.70 ± 0.027 | 0.20 ± 0.008 |
| | 60.1 ± 1.9 | 0.64 ± 0.017 | 0.59 ± 0.027 | 0.29 ± 0.010 |
| Naïve Bayes | 62.7 ± 3.0 | 0.71 ± 0.027 | 0.59 ± 0.071 | 0.35 ± 0.035 |
| XGBoost | 69.0 ± 2.1 | 0.75 ± 0.021 | 0.70 ± 0.029 | 0.20 ± 0.008 |
| Top 3 ensemble | 70.0 ± 2.0 | 0.76 ± 0.020 | 0.70 ± 0.029 | 0.20 ± 0.008 |
Values are reported as mean ± SD across the 10 folds of 10-fold cross-validation. AUC, area under the receiver operating characteristic curve.
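For readers unfamiliar with the column metrics, the sketch below shows how accuracy, F1 score, and Brier score loss can be computed for one fold of binary injury labels, and how per-fold scores are summarized as mean ± SD. It is illustrative only: the toy labels and probabilities are made up, the 0.5 decision threshold is an assumption, and the use of the sample SD (rather than the population SD) is also an assumption, since the source does not say which was used.

```python
import statistics

def accuracy(y_true, y_pred):
    """Share of player-years whose 0/1 label was predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

def brier_score(y_true, y_prob):
    """Mean squared gap between predicted probability and 0/1 outcome (lower is better)."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

def mean_sd(fold_scores):
    """Mean and sample SD of one metric across folds (sample SD is an assumption)."""
    return statistics.mean(fold_scores), statistics.stdev(fold_scores)

# Toy fold, made up for illustration: 1 = injured in the following season
y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.7, 0.6]
y_pred = [int(p >= 0.5) for p in y_prob]  # assumed 0.5 threshold
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), brier_score(y_true, y_prob))

# Made-up per-fold accuracies, summarized the way the tables report them
print(mean_sd([0.70, 0.68, 0.72]))
```

Note that the Brier score uses the predicted probabilities directly, whereas accuracy and F1 depend on the chosen threshold.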
Figure 2. Position player receiver operating characteristic (ROC) curve for predicting future injuries based on prior-season performance and injuries, with sensitivity on the y-axis and 1 − specificity on the x-axis. Area under the ROC curve (AUC) values of <0.7 are poor, ≥0.7 are fair, ≥0.8 are good, and ≥0.9 are excellent.
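An ROC curve like the one in this figure is traced by sweeping a decision threshold over the predicted injury probabilities and plotting 1 − specificity (false-positive rate) against sensitivity (true-positive rate); the AUC is the area under that curve. The minimal sketch below builds the curve points, integrates them with the trapezoidal rule, and applies the caption's AUC bands. The function names are hypothetical and the example labels and scores are made up; only the 0.76 and 0.65 values come from the tables.

```python
def roc_points(y_true, y_score):
    """(1 - specificity, sensitivity) pairs, lowering the threshold step by step."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    pairs = sorted(zip(y_score, y_true), key=lambda x: -x[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Tied scores are passed together, giving a diagonal segment
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def auc_band(value):
    """Qualitative label using the thresholds stated in the figure caption."""
    if value >= 0.9:
        return "excellent"
    if value >= 0.8:
        return "good"
    if value >= 0.7:
        return "fair"
    return "poor"

pts = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(pts, auc(pts))
# Top 3 ensemble AUCs from the tables: position players 0.76, pitchers 0.65
print(auc_band(0.76), auc_band(0.65))
```

Under the caption's scheme, the position-player ensemble (0.76) falls in the "fair" band and the pitcher ensemble (0.65) in the "poor" band.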
Figure 3. Variables ranked by relative importance for predicting future injuries among position players. Previous injuries and weighted cutter runs per 100 pitches were the most important variables in predicting outcomes. The relative importance is expressed as a fraction based on the weight of each variable, with 1.0 being the most important and 0.0 having no contribution to the model. DL, disabled list.
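The caption describes importances scaled so that the top variable is 1.0 and a variable with no contribution is 0.0. One simple way to obtain that scale, assumed here because the source does not describe the exact computation, is to divide each non-negative raw weight by the largest one. The feature names and raw weights below are hypothetical placeholders, not values from the study.

```python
def relative_importance(raw):
    """Scale non-negative weights so the top variable is 1.0 and zero stays 0.0."""
    top = max(raw.values())
    if top == 0:
        return {name: 0.0 for name in raw}
    scaled = ((name, weight / top) for name, weight in raw.items())
    # Rank variables from most to least important, as in the figure
    return dict(sorted(scaled, key=lambda kv: kv[1], reverse=True))

# Hypothetical raw weights (e.g. tree-based importances), for illustration only
raw = {"feature_a": 0.15, "feature_b": 0.30, "feature_c": 0.0}
print(relative_importance(raw))
```

Dividing by the maximum (rather than min-max rescaling) keeps a zero-weight variable at exactly 0.0, which matches the caption's reading of 0.0 as "no contribution".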
Models Predicting Future Injuries Among Pitchers
| Model | Accuracy, % | AUC | F1 Score | Brier Score Loss |
|---|---|---|---|---|
| Logistic regression | 60.9 ± 3.0 | 0.64 ± 0.03 | 0.54 ± 0.04 | 0.24 ± 0.003 |
| Random forest | 62.2 ± 2.0 | 0.65 ± 0.02 | 0.54 ± 0.02 | 0.23 ± 0.005 |
| | 54.6 ± 3.3 | 0.54 ± 0.03 | 0.42 ± 0.02 | 0.33 ± 0.023 |
| Naïve Bayes | 58.9 ± 2.6 | 0.62 ± 0.03 | 0.38 ± 0.08 | 0.41 ± 0.024 |
| XGBoost | 60.3 ± 2.1 | 0.64 ± 0.01 | 0.54 ± 0.03 | 0.24 ± 0.004 |
| Top 3 ensemble | 63.7 ± 2.0 | 0.65 ± 0.02 | 0.55 ± 0.02 | 0.23 ± 0.003 |
Values are reported as mean ± SD across the 10 folds of 10-fold cross-validation. AUC, area under the receiver operating characteristic curve.
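Both tables include a "top 3 ensemble", but the source shown here does not say how its three members are chosen or combined. The sketch below illustrates one common approach, offered purely as an assumption: pick the three models with the highest mean AUC and average their predicted probabilities (unweighted soft voting). The AUC values are taken from the pitcher table (the model whose name is missing from the table is left out); the per-player probabilities are hypothetical.

```python
def top_k_models(auc_by_model, k=3):
    """Names of the k models with the highest mean AUC (ties keep input order)."""
    return sorted(auc_by_model, key=auc_by_model.get, reverse=True)[:k]

def soft_vote(prob_by_model, members):
    """Average the members' predicted injury probabilities for each player-year."""
    n = len(prob_by_model[members[0]])
    return [sum(prob_by_model[m][i] for m in members) / len(members) for i in range(n)]

# Mean AUCs from the pitcher table above
pitcher_auc = {"Logistic regression": 0.64, "Random forest": 0.65,
               "Naïve Bayes": 0.62, "XGBoost": 0.64}
members = top_k_models(pitcher_auc)
print(members)

# Hypothetical probabilities for two player-years, for illustration only
probs = {"Random forest": [0.2, 0.7], "Logistic regression": [0.4, 0.5],
         "XGBoost": [0.3, 0.6], "Naïve Bayes": [0.9, 0.1]}
print(soft_vote(probs, members))
```

Averaging probabilities (rather than majority-voting hard labels) keeps a continuous score, which is what ROC curves and Brier scores require.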
Figure 4. Pitcher receiver operating characteristic (ROC) curve for predicting future injuries based on prior-season performance and injuries, with sensitivity on the y-axis and 1 − specificity on the x-axis. Area under the ROC curve (AUC) values of <0.7 are poor, ≥0.7 are fair, ≥0.8 are good, and ≥0.9 are excellent.
Best-Performing Models Predicting Future Injuries Among Position Players, as Determined by the Highest AUC
| Outcome (Model) | Accuracy, % | AUC | F1 Score | Brier Score Loss |
|---|---|---|---|---|
| Future knee injury (top 3 ensemble) | 90.0 ± 1.3 | 0.68 ± 0.04 | 0.10 ± 0.07 | 0.10 ± 0.010 |
| Future back injury (top 3 ensemble) | 89.0 ± 1.4 | 0.73 ± 0.03 | 0.22 ± 0.06 | 0.11 ± 0.010 |
| Future hand injury (top 3 ensemble) | 84.2 ± 1.7 | 0.71 ± 0.04 | 0.23 ± 0.03 | 0.13 ± 0.010 |
| Future foot/ankle injury (top 3 ensemble) | 90.7 ± 0.9 | 0.67 ± 0.04 | 0.06 ± 0.04 | 0.11 ± 0.005 |
| Future shoulder injury (top 3 ensemble) | 93.2 ± 0.9 | 0.64 ± 0.05 | 0.06 ± 0.05 | 0.09 ± 0.004 |
| Future elbow injury (logistic regression) | 63.0 ± 3.6 | 0.61 ± 0.08 | 0.07 ± 0.02 | 0.23 ± 0.007 |
Values are reported as mean ± SD across the 10 folds of 10-fold cross-validation. AUC, area under the receiver operating characteristic curve.
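In this table, several outcomes pair high accuracy with a low F1 score (for example, knee injury at 90.0% accuracy but F1 of 0.10). This pattern is typical when the positive class is rare: a model that never predicts an injury scores an accuracy of 1 − prevalence while its F1 is 0. The worked example below uses a hypothetical cohort of 1000 player-years with a made-up 10% injury rate; the counts are illustrative and do not come from the study.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Accuracy and F1 from the four confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # F1 ignores true negatives, so a rare positive class dominates it
    f1 = 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)
    return accuracy, f1

# Hypothetical cohort: 1000 player-years, 100 with the injury of interest
print(accuracy_and_f1(tp=0, fp=0, fn=100, tn=900))   # never predicts injury
print(accuracy_and_f1(tp=30, fp=60, fn=70, tn=840))  # flags some injuries
```

The second model is less accurate (87% versus 90%) yet has a much higher F1 (about 0.32 versus 0), which is why accuracy alone can overstate performance on these body-part-specific outcomes.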
Best-Performing Models Predicting Future Injuries Among Pitchers, as Determined by the Highest AUC
| Outcome (Model) | Accuracy, % | AUC | F1 Score | Brier Score Loss |
|---|---|---|---|---|
| Future knee injury (top 3 ensemble) | 83.0 ± 1.1 | 0.58 ± 0.04 | 0.24 ± 0.07 | 0.13 ± 0.01 |
| Future back injury (random forest) | 94.2 ± 1.4 | 0.73 ± 0.04 | 0.54 ± 0.04 | 0.06 ± 0.01 |
| Future hand injury (top 3 ensemble) | 92.9 ± 1.3 | 0.70 ± 0.06 | 0.11 ± 0.07 | 0.06 ± 0.01 |
| Future foot/ankle injury (top 3 ensemble) | 87.0 ± 0.8 | 0.57 ± 0.04 | 0.33 ± 0.05 | 0.15 ± 0.01 |
| Future shoulder injury (top 3 ensemble) | 83.0 ± 1.9 | 0.63 ± 0.04 | 0.23 ± 0.04 | 0.14 ± 0.01 |
| Future elbow injury (top 3 ensemble) | 86.6 ± 1.9 | 0.61 ± 0.06 | 0.17 ± 0.05 | 0.12 ± 0.01 |
Values are reported as mean ± SD across the 10 folds of 10-fold cross-validation. AUC, area under the receiver operating characteristic curve.