Talko B Dijkhuis (1, 2), Frank J Blaauw (3).
Abstract
Although causal inference has shown great value in estimating effect sizes in, for instance, physics, medical studies, and economics, it is rarely used in sports science. Targeted Maximum Likelihood Estimation (TMLE) is a modern method for performing causal inference. TMLE is forgiving of misspecification of the causal model and improves the estimation of effect sizes by using machine-learning methods. We demonstrate the advantage of TMLE in sports science by comparing the effect size it estimates with that of a Generalized Linear Model (GLM). In this study, we introduce TMLE, provide a roadmap for performing causal inference, and apply the roadmap, together with both methods, in a simulation study and a case study investigating the influence of substitutions on the physical performance of the entire soccer team (i.e., the effect size of substitutions on the team's total physical performance). We construct a causal model, a misspecified causal model, a simulation dataset, and an observed tracking dataset of individual players from 302 elite soccer matches. The simulation results show that TMLE outperforms GLM in estimating the effect size of substitutions on total physical performance. Furthermore, TMLE is the most robust against model misspecification in both the simulation and the tracking dataset. Independent of the method used, the tracking dataset showed that substitutes increase the physical performance of the entire soccer team.
Keywords: TMLE; causal inference; machine learning; methods; statistics
Year: 2022 PMID: 36010724 PMCID: PMC9407135 DOI: 10.3390/e24081060
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.738
Figure 1. The causal model representation of the system being studied. Y = the total distance covered by a team per five-minute period; A = whether or not a substitution took place in the previous five-minute period; = the consecutive five-minute periods in the second half of the match (i.e., an index variable indicating the minute of the match); = the number of substitutes present; = the number of substitutes in the current period; U = possible unknown confounders influencing A, , and Y. The dashed lines indicate that this confounding effect is uncertain.
Figure 2. Number of substitutions in the second half per five-minute period.
Figure 3. Difference in the total distance depending on whether a substitution took place in the previous period (A).
Figure 4. Graphical depiction of TMLE [2].
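The abstract's central claim, that TMLE stays close to the true effect where a GLM with a misspecified outcome model does not, can be illustrated with a small sketch of TMLE for a binary exposure A (for example, a substitution in the previous period), a continuous outcome Y, and a single confounder W. This is not the authors' implementation: the data-generating process, its true ATE of 0.5, and the helper names `simulate`, `fit_logistic`, `naive_glm_ate`, and `tmle_ate` are hypothetical; the initial outcome and propensity models are simple parametric fits rather than the machine-learning ensembles TMLE is usually paired with; and the targeting step uses a linear fluctuation, whereas common implementations fluctuate on the logit scale after rescaling Y to [0, 1].

```python
import numpy as np


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; returns the coefficient vector."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta


def simulate(n, rng):
    """Hypothetical data: W confounds A and Y through W**2; the true ATE is 0.5."""
    w = rng.uniform(-1.0, 1.0, n)
    a = rng.binomial(1, expit(1.5 * w**2 - 0.5))
    y = 0.5 * a + 2.0 * w**2 + rng.normal(0.0, 1.0, n)
    return w, a, y


def naive_glm_ate(w, a, y):
    """Coefficient of A in the misspecified linear model Y ~ A + W (no W**2 term)."""
    X = np.column_stack([np.ones_like(w), a, w])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]


def tmle_ate(w, a, y):
    """Sketch of TMLE for the ATE: misspecified outcome model, correct propensity model."""
    ones = np.ones_like(w)
    # Step 1: initial outcome model Q(A, W), deliberately the same misspecified linear fit
    b = np.linalg.lstsq(np.column_stack([ones, a, w]), y, rcond=None)[0]
    q0 = b[0] + b[2] * w            # predicted Y if A = 0
    q1 = q0 + b[1]                  # predicted Y if A = 1
    qa = np.where(a == 1, q1, q0)   # predicted Y under the observed A
    # Step 2: propensity score g(W) = P(A = 1 | W), truncated away from 0 and 1
    gamma = fit_logistic(np.column_stack([ones, w**2]), a)
    g = np.clip(expit(gamma[0] + gamma[1] * w**2), 0.01, 0.99)
    # Step 3: clever covariate H(A, W) and a linear fluctuation (least-squares epsilon)
    h = a / g - (1 - a) / (1 - g)
    eps = np.sum(h * (y - qa)) / np.sum(h**2)
    qa_star = qa + eps * h
    q1_star = q1 + eps / g
    q0_star = q0 - eps / (1 - g)
    # Step 4: targeted plug-in ATE and an influence-curve-based 95% CI
    ate = np.mean(q1_star - q0_star)
    ic = h * (y - qa_star) + (q1_star - q0_star) - ate
    se = np.std(ic, ddof=1) / np.sqrt(len(y))
    return ate, (ate - 1.96 * se, ate + 1.96 * se)


rng = np.random.default_rng(2022)
w, a, y = simulate(5000, rng)
print("GLM ATE: ", round(float(naive_glm_ate(w, a, y)), 3))
ate, (lower, upper) = tmle_ate(w, a, y)
print("TMLE ATE:", round(float(ate), 3), "95% CI:", (round(float(lower), 3), round(float(upper), 3)))
```

Because the outcome regression omits the W² term that drives both A and Y, the GLM coefficient for A should come out well above 0.5, whereas the targeted estimate should land near 0.5: the fluctuation step uses the correctly specified propensity score g(W) to remove the bias of the initial fit, which is TMLE's double-robustness property. In practice, both initial models would typically be learned from the data with machine-learning methods instead of the fixed parametric forms used here.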
Simulation of the correct causal model (true ATE: 0.0646).
| Measure | GLM | TMLE | TMLEH |
|---|---|---|---|
| ATE | 0.1442 | 0.0647 | 0.0647 |
| 95% Confidence Interval | 0.1399–0.1485 | 0.0628–0.0665 | 0.0605–0.0688 |
| Bias | 0.0797 | 0.0001 | 0.0001 |
| Bias % | 123.50 | 0.22 | 0.17 |
GLM = Generalized Linear Model; TMLE = Targeted Maximum Likelihood Estimation; TMLEH = Targeted Maximum Likelihood Estimation using Handpicked algorithms; ATE = Average Treatment Effect (i.e., effect size) of a substitute in a previous period on the total distance of a soccer team.
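The table does not state how bias and bias % are computed. Assuming bias is the absolute difference between an estimate and the true ATE, and bias % expresses that difference relative to the true ATE, the calculation is a one-liner per method; the helper names below are illustrative. Recomputing from the rounded table entries gives values close to, but not identical with, the reported ones (for GLM, 0.0796 and about 123.22% versus 0.0797 and 123.50), presumably because the reported figures were derived from unrounded estimates.

```python
TRUE_ATE = 0.0646

# ATE estimates from the correct-causal-model simulation table (rounded to 4 decimals)
estimates = {"GLM": 0.1442, "TMLE": 0.0647, "TMLEH": 0.0647}


def bias(estimate, truth):
    """Absolute difference between an estimate and the true ATE (assumed definition)."""
    return abs(estimate - truth)


def bias_pct(estimate, truth):
    """Bias expressed as a percentage of the true ATE (assumed definition)."""
    return 100.0 * bias(estimate, truth) / abs(truth)


for name, est in estimates.items():
    print(f"{name}: bias={bias(est, TRUE_ATE):.4f}, bias%={bias_pct(est, TRUE_ATE):.2f}")
```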
Simulation of the misspecified causal model (true ATE: 0.0646).
| Measure | GLM | TMLE | TMLEH |
|---|---|---|---|
| ATE | 0.1491 | 0.0647 | 0.0646 |
| 95% Confidence Interval | 0.1399–0.1485 | 0.0628–0.0665 | 0.0613–0.0679 |
| Bias | 0.0846 | 0.0001 | 0.0000 |
| Bias % | 131.00 | 0.22 | 0.00 |
GLM = Generalized Linear Model; TMLE = Targeted Maximum Likelihood Estimation; TMLEH = Targeted Maximum Likelihood Estimation using Handpicked algorithms; ATE = Average Treatment Effect (i.e., effect size) of a substitute in a previous period on the total distance of a soccer team.
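A single simulation run cannot establish the coverage of an interval estimator, but one can still check whether each reported 95% confidence interval in the misspecified-model table contains the true ATE of 0.0646, and compare interval widths. A minimal sketch with the table's values hard-coded (the `covers` helper is illustrative): only the GLM interval excludes the true ATE, and the TMLE interval is the narrowest of the three.

```python
TRUE_ATE = 0.0646

# (ATE, lower, upper) from the misspecified-causal-model simulation table
results = {
    "GLM": (0.1491, 0.1399, 0.1485),
    "TMLE": (0.0647, 0.0628, 0.0665),
    "TMLEH": (0.0646, 0.0613, 0.0679),
}


def covers(lower, upper, value):
    """True if the closed interval [lower, upper] contains value."""
    return lower <= value <= upper


for name, (ate, lower, upper) in results.items():
    print(
        f"{name}: ATE={ate:.4f}, covers true ATE={covers(lower, upper, TRUE_ATE)}, "
        f"CI width={upper - lower:.4f}"
    )
```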
Figure 5. The Average Treatment Effect of the simulation of the causal model and the misspecified causal model. True ATE = True Average Treatment Effect (i.e., effect size) of a substitute in a previous period on the total distance of a soccer team; CI 95% = 95% Confidence Interval; GLM miss = Generalized Linear Model with misspecified causal model; GLM = Generalized Linear Model; TMLE miss = Targeted Maximum Likelihood Estimation with misspecified causal model; TMLE = Targeted Maximum Likelihood Estimation; TMLEH miss = Targeted Maximum Likelihood Estimation using Handpicked algorithms with misspecified causal model; TMLEH = Targeted Maximum Likelihood Estimation using Handpicked algorithms.
Observed dataset causal model.
| Measure | GLM | TMLE | TMLEH |
|---|---|---|---|
| Correct causal model | | | |
| ATE | 0.0105 | 0.0149 | 0.0142 |
| 95% Confidence Interval | −0.0007 to 0.0216 | 0.0007 to 0.0290 | −0.0021 to 0.0303 |
| Misspecified causal model | | | |
| ATE | 0.0193 | 0.0245 | 0.0247 |
| 95% Confidence Interval | −0.0007 to 0.0216 | 0.0115 to 0.0374 | 0.0210 to 0.0381 |
| Difference between correct and misspecified causal model | 0.0089 | 0.0096 | 0.0121 |
| Difference between correct and misspecified causal model (%) | 84.7 | 65.0 | 66.3 |
GLM = Generalized Linear Model; TMLE = Targeted Maximum Likelihood Estimation; TMLEH = Targeted Maximum Likelihood Estimation using Handpicked algorithms; ATE = Average Treatment Effect (i.e., effect size) of a substitute in a previous period on the total distance of a soccer team.