Yuanqi Huang1,2, Shengqi Huang2, Yukun Wang2, Yurong Li3, Yuheng Gui4, Caihua Huang1.
Abstract
The application of machine learning algorithms to data-driven injury assessment has recently provided new research insight for sports injury prevention. However, the data used in these studies are primarily multi-source and multimodal (i.e., longitudinal repeated-measures data and cross-sectional data), so the models do not fully utilise the information in the data to reveal specific injury risk patterns. Therefore, this study proposed an injury risk prediction model based on a multimodal strategy and machine learning algorithms to better handle multi-source data and predict injury risk. This study retrospectively analysed the routine monitoring data of sixteen young female basketball players. These data included training load, perceived well-being status, physiological response, physical performance and lower extremity non-contact injury registration. The original dataset was partitioned according to the frequency of data collection. Extreme gradient boosting (XGBoost) was used to construct unimodal submodels to obtain decision scores for each category of indicators. Finally, the decision scores from each submodel were fused using a random forest (RF) to generate a decision-level lower extremity non-contact injury risk prediction model. The 10-fold cross-validation results showed that the fusion model was effective in classifying non-injured (mean Precision: 0.9932, mean Recall: 0.9976, mean F2-score: 0.9967), minimal lower extremity non-contact injury risk (mean Precision: 0.9317, mean Recall: 0.9167, mean F2-score: 0.9171), and mild lower extremity non-contact injury risk (mean Precision: 0.9000, mean Recall: 0.9000, mean F2-score: 0.9000). The fusion model performed significantly better than the submodels. Compared with a traditional data integration scheme, the proposed fusion model improved average Precision and Recall by 8.2% and 20.3%, respectively.
Decision curve analysis showed that the proposed fusion model provided a higher net benefit to athletes with potential lower extremity non-contact injury risk. The validity, feasibility and practicality of the proposed model have been confirmed. In addition, Shapley additive explanations (SHAP) and network visualisation revealed differences in lower extremity non-contact injury risk patterns across severity levels. The model proposed in this study provides a fresh perspective on injury prevention for future research.
Keywords: injury prevention; injury risk pattern; injury risk prediction; machine learning; multimodal fusion
Year: 2022 PMID: 36187785 PMCID: PMC9520324 DOI: 10.3389/fphys.2022.937546
Source DB: PubMed Journal: Front Physiol ISSN: 1664-042X Impact factor: 4.755
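The decision-level fusion described in the abstract (one submodel per modality, with the submodels' decision scores fed to a second-stage classifier) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the two-modality column split is hypothetical, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so the example needs only scikit-learn and NumPy. It is not the authors' pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in data: 300 samples, 8 features, 3 classes (not the study's data).
X, y = make_classification(n_samples=300, n_features=8, n_informative=6,
                           n_redundant=0, n_classes=3, random_state=0)

# Hypothetical modality split: each modality owns a subset of the columns.
modalities = {"modality_a": [0, 1, 2, 3], "modality_b": [4, 5, 6, 7]}

# Step 1: one gradient-boosting submodel per modality. Out-of-fold class
# probabilities serve as decision scores, so the fusion stage is not trained
# on scores that a submodel produced for its own training samples.
score_blocks = []
for name, cols in modalities.items():
    submodel = GradientBoostingClassifier(n_estimators=30, random_state=0)
    scores = cross_val_predict(submodel, X[:, cols], y, cv=5,
                               method="predict_proba")
    score_blocks.append(scores)

# Step 2: concatenate the decision scores and fuse them with a random forest.
fused_inputs = np.hstack(score_blocks)  # shape: (n_samples, n_modalities * n_classes)
fusion_model = RandomForestClassifier(n_estimators=100, random_state=0)
fused_pred = cross_val_predict(fusion_model, fused_inputs, y, cv=5)
fusion_accuracy = float((fused_pred == y).mean())
```

In a setting like the study's, each modality would correspond to one indicator category (for example perceived well-being or training load) rather than an arbitrary column split.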
Assignment of indices and units.
| Index | Assignment | Frequency | Unit |
|---|---|---|---|
| sRPE | Original value input | day | AU |
| Menses | No = 0, Yes = 1 | day | AU |
| Fatigue | Original value input | day | AU |
| Sleep Quality | Original value input | day | AU |
| Muscle Soreness | Original value input | day | AU |
| Stress Levels | Original value input | day | AU |
| Desire | Original value input | day | AU |
| Urine Protein | Negative = 1; Microscale = 2; 0.3 g/L = 3; 1 g/L = 4; 3 g/L = 5 | 1-week | AU |
| Urobilinogen | 3.2 mg/dl = 1; 16 mg/dl = 5; 33 mg/dl = 10 | 1-week | AU |
| Urine pH | Original value input | 1-week | AU |
| Urine Specific Gravity | ≤1.025 = 1; ≥1.030 = 2 | 1-week | AU |
| Urine Blood | Negative = 1; Microscale = 2; Ca25 Ery/µL = 3; Ca80 Ery/µL = 4; Ca200 Ery/µL = 6 | 1-week | AU |
| Urine Ketones | Negative = 1; Microscale = 2; 1.5 nmol/L = 3 | 1-week | AU |
| Squat 1RM | Original value input | 4-weeks | kg |
| 15 m × 17 Shuttle Run | Original value input | 4-weeks | s |
| 5.8 m × 6 Shuttle Run | Original value input | 4-weeks | s |
| Maximum Vertical Jump | Original value input | 4-weeks | cm |
| Injury Severity | Negative = 0; 0–3 = 1; 4–7 = 2; 8–28 = 3; ≥29 = 4 | day | AU |
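
The assignment rules in the table above amount to simple lookups. The sketch below shows two of them; the function and dictionary names are hypothetical, and it assumes the injury severity ranges are days of time loss (the table gives the ranges without a unit).

```python
# Category-to-code mapping for Urine Protein, copied from the table above.
URINE_PROTEIN_CODES = {
    "Negative": 1,
    "Microscale": 2,
    "0.3 g/L": 3,
    "1 g/L": 4,
    "3 g/L": 5,
}


def injury_severity(days_lost):
    """Map time loss to the severity code from the table.

    Assumption: `days_lost` is None when no injury was registered
    ("Negative" = 0) and a non-negative number of days otherwise.
    """
    if days_lost is None:
        return 0
    if days_lost < 0:
        raise ValueError("days_lost must be non-negative")
    if days_lost <= 3:
        return 1
    if days_lost <= 7:
        return 2
    if days_lost <= 28:
        return 3
    return 4


encoded_protein = URINE_PROTEIN_CODES["0.3 g/L"]
encoded_injuries = [injury_severity(d) for d in (None, 2, 5, 10, 29)]
```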
FIGURE 1. Schematic diagram of the aggregation and prediction sliding windows.
FIGURE 2. Schematic diagram of the multimodal model architecture.
FIGURE 3. Comparison of reference integration solutions.
Descriptive information on the incidence of non-contact injuries of all lower extremities.
| Location | 1–3 days (minimal) | 4–7 days (mild) | 8–28 days (moderate) | >29 days (severe) | Count (%) |
|---|---|---|---|---|---|
| Hip | 0 | 0 | 0 | 0 | 0 (0.0) |
| Knee | 2 | 2 | 0 | 0 | 4 (14.8) |
| Thigh | 5 | 0 | 0 | 0 | 5 (18.5) |
| Calf | 10 | 0 | 0 | 0 | 10 (37.0) |
| Ankle | 0 | 0 | 0 | 0 | 0 (0.0) |
| Foot | 5 | 2 | 1 | 0 | 8 (29.6) |
FIGURE 4. A week of LENCI occurrence.
Distribution of each variable in the primary dataset.
| Encoding | Feature | Mean ± SD | Minimum | Maximum | N |
|---|---|---|---|---|---|
| PW-1 | Menses | 0.160 ± 0.367 | 0 | 1 | 1813 |
| PW-2 | Fatigue (EWMA) | 3.019 ± 0.409 | 2.004 | 4.570 | 1813 |
| PW-3 | Sleep (EWMA) | 3.094 ± 0.448 | 2.049 | 4.500 | 1813 |
| PW-4 | MS (EWMA) | 3.252 ± 0.453 | 1.381 | 4.380 | 1813 |
| PW-5 | Stress (EWMA) | 3.044 ± 0.537 | 1.157 | 4.410 | 1813 |
| PW-6 | Desire (EWMA) | 2.988 ± 0.173 | 1.610 | 3.980 | 1813 |
| TL-1 | TM (sRPE) | 1.525 ± 0.468 | 0.267 | 2.690 | 1811 |
| TL-2 | sRPE (EWMA) | 1083.5 ± 265.0 | 13.885 | 1685.7 | 1813 |
| PR-1 | Urine Protein | 1.719 ± 1.011 | 1 | 4 | 232 |
| PR-2 | Urobilinogen | 2.056 ± 2.343 | 1 | 10 | 232 |
| PR-3 | Urine pH | 6.727 ± 0.624 | 5 | 8 | 232 |
| PR-4 | Urine Specific Gravity | 2.446 ± 0.498 | 2 | 3 | 232 |
| PR-5 | Urine Blood | 1.854 ± 1.305 | 1 | 5 | 232 |
| PR-6 | Urine Ketones | 1.330 ± 0.640 | 1 | 3 | 232 |
| PP-1 | Squat 1RM | 80.592 ± 17.535 | 60 | 110 | 59 |
| PP-2 | 5.8 m × 6 Shuttle Run | 9.747 ± 0.603 | 8.69 | 11 | 59 |
| PP-3 | 15 m × 17 Shuttle Run | 67.637 ± 1.946 | 63.78 | 74.16 | 59 |
| PP-4 | MVJ | 284.732 ± 6.738 | 267 | 295 | 59 |
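
Several features in the table above are exponentially weighted moving averages (EWMA) of daily values. The source does not give the formula or the time constant, so the sketch below assumes the common recursive form EWMA_t = λ·x_t + (1 − λ)·EWMA_(t−1) with λ = 2 / (span + 1), seeded with the first observation; the `span` value and the sample loads are hypothetical.

```python
def ewma(values, span):
    """Exponentially weighted moving average with decay lam = 2 / (span + 1).

    Assumption: the series is seeded with its first observation.
    """
    if span < 1:
        raise ValueError("span must be >= 1")
    lam = 2.0 / (span + 1)
    smoothed = []
    for x in values:
        if not smoothed:
            smoothed.append(float(x))
        else:
            smoothed.append(lam * x + (1.0 - lam) * smoothed[-1])
    return smoothed


# Hypothetical daily sRPE loads in arbitrary units (not the study's data).
daily_srpe = [300, 450, 0, 600]
smoothed_srpe = ewma(daily_srpe, span=3)  # lam = 0.5
```

A shorter span weights recent days more heavily, so the smoothed value follows sudden changes in load more quickly.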
Performance levels of submodels and fusion models in dataset B.
| Model | Dimension | Weighted-average precision | Weighted-average recall | Weighted-average F2-score |
|---|---|---|---|---|
| SubModel (wPW) | Perceived Well-being | 0.8657 ± 0.0305 | 0.8118 ± 0.1141 | 0.8172 ± 0.1011 |
| SubModel (wTL) | Training Load | 0.8776 ± 0.0572 | 0.7355 ± 0.0702 | 0.7589 ± 0.0646 |
| SubModel (wPR) | Physiological Response | 0.8605 ± 0.0468 | 0.7315 ± 0.1458 | 0.7507 ± 0.1306 |
| SubModel (wPP) | Physical Performance | 0.8601 ± 0.0325 | 0.8352 ± 0.0410 | 0.8399 ± 0.0378 |
| wFusionModel | | 0.9835 ± 0.0521 | 0.9731 ± 0.0851 | 0.9750 ± 0.0792 |
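
The F2-score reported above is the F-beta score with β = 2, which weights recall more heavily than precision. A minimal sketch of the standard formula and of a support-weighted average follows; the function names are illustrative, and the inputs in the usage lines are made-up values rather than per-fold results from the study.

```python
def fbeta(precision, recall, beta=2.0):
    """F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def weighted_average(scores, supports):
    """Average per-class scores, weighting each class by its sample count."""
    total = sum(supports)
    return sum(s * n for s, n in zip(scores, supports)) / total


f2_balanced = fbeta(0.9, 0.9)      # equal precision and recall give the same value
f2_high_recall = fbeta(0.5, 1.0)   # recall dominates: 2.5 / 3
avg = weighted_average([1.0, 0.5], [3, 1])
```

Note that a mean F2-score over cross-validation folds is generally not the F2-score of the mean precision and mean recall, so the tabulated values cannot be recomputed from one another exactly.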
FIGURE 5. Confusion matrix: (A) wFusionModel; (B) dFusionModel.
FIGURE 6. Decision curve analysis.
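Decision curve analysis compares models by their net benefit across a range of threshold probabilities. As a rough sketch of the standard binary definition, net benefit = TP/n − (FP/n) · p_t / (1 − p_t), the function below computes one point on such a curve. How the study reduced its multi-class injury outcome to this setting is not stated here, so the binary labels and predicted risks are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on everyone whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)
    return true_pos / n - (false_pos / n) * odds


# Hypothetical labels (1 = injury) and predicted risks.
labels = [1, 0, 1, 0]
risks = [0.9, 0.8, 0.3, 0.1]
nb_low = net_benefit(labels, risks, 0.2)   # 2 TP, 1 FP -> 0.5 - 0.25 * 0.25
nb_mid = net_benefit(labels, risks, 0.5)   # 1 TP, 1 FP -> 0.25 - 0.25 * 1.0
```

Evaluating the function over a grid of thresholds and plotting the results against "treat all" and "treat none" baselines gives a decision curve.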
Performance evaluation results of the fusion and integration schemes in dataset A.
| Model | Weighted-average Precision | Weighted-average Recall | Weighted-average F2-score |
|---|---|---|---|
| DC | 0.8670 ± 0.0143 | 0.3506 ± 0.0116 | 0.3857 ± 0.0089 |
| LR | 0.8906 ± 0.0223 | 0.5638 ± 0.1150 | 0.5916 ± 0.1048 |
| SVM | 0.9206 ± 0.0269 | 0.9045 ± 0.0351 | 0.9050 ± 0.0330 |
| KNN | 0.8961 ± 0.0175 | 0.8023 ± 0.0365 | 0.8140 ± 0.0322 |
| NB | 0.9007 ± 0.0260 | 0.6223 ± 0.1208 | 0.6422 ± 0.1163 |
| DT | 0.9026 ± 0.0260 | 0.8244 ± 0.1105 | 0.8324 ± 0.1003 |
| RF | 0.9169 ± 0.0355 | 0.9183 ± 0.0898 | 0.9152 ± 0.0833 |
| XGBoost | 0.9141 ± 0.0322 | 0.8813 ± 0.0600 | 0.8835 ± 0.0517 |
| dFusionModel | 0.9881 ± 0.0423 | 0.9912 ± 0.0312 | 0.9903 ± 0.0348 |
FIGURE 7. The feature importance of the dFusionModel.
FIGURE 8. The feature importance of each submodel.
FIGURE 9. Network analysis of the relationship between SHAP values of independent variables: (A) non-injured; (B) minimal LENCI risk versus non-injured; (C) mild LENCI risk versus non-injured.