Finnian Lattimore, Daniel Steinberg, Diana Cárdenas, Katherine J Reynolds.
Abstract
Young people worldwide face new challenges as climate change and complex family structures disrupt societies. These challenges affect youth's subjective well-being, with evidence of decline across many countries. While the burden of negative well-being on productivity is widely examined among adults, its cost among youth remains understudied. The current research comprehensively investigates the relationship between youth subjective well-being and standardized academic test scores. We use highly controlled machine learning models on a moderately sized high-school student sample (N ~ 3400), with a composite subjective well-being index (composed of depression, anxiety and positive affect), to show that students with greater well-being are more likely to have higher academic scores 7-8 months later (on Numeracy: β* = .033, p = .020). This effect emerges while also accounting for previous test scores and other confounding factors. Further analyses of each well-being measure suggest that youth who experience greater depression have lower academic achievement (Numeracy: β* = −.045, p = .013; Reading: β* = −.033, p = .028). By quantifying the impact of youth well-being, and in particular of lowering depression, this research highlights its importance for the next generation's health and productivity.
Year: 2022 PMID: 35136114 PMCID: PMC8826920 DOI: 10.1038/s41598-022-05780-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Major results for the treatment-outcome effect models (extended results presented in the supplementary material, Table LIN-EXT-RES). β* represents the standardised effect size. Well-being significantly predicts Numeracy outcomes, and depression significantly predicts both Numeracy and Reading outcomes.
| Treatment | Target (grade 9) | Model | N | β* (95% interval) | s.e.(β*) or σβ*\|X, | p-value | RMSE |
|---|---|---|---|---|---|---|---|
| Well-being index | Numeracy | Bayesian ridge | 3368 | 0.0294 (0.0092, 0.0495) | 0.0103 | .0045 | 39.184 |
| Well-being index | Numeracy | Two-stage ridge | 3368 | 0.0332 (0.0052, 0.0612) | 0.0143 | .0202 | 38.989 |
| Well-being index | Numeracy | DML | 3368 | 0.0270 (0.0072, 0.0468) | 0.0101 | .007 | NA |
| Well-being index | Reading | Bayesian ridge | 3414 | 0.0139 (−0.0086, 0.0364) | 0.0115 | .2271 | 45.073 |
| Well-being index | Reading | Two-stage ridge | 3414 | 0.0201 (−0.0107, 0.0509) | 0.0157 | .2002 | 44.475 |
| Well-being index | Reading | DML | 3414 | 0.0189 (−0.0054, 0.0432) | 0.0124 | .124 | NA |
| Depression | Numeracy | Bayesian ridge | 3416 | −0.0438 (−0.0636, −0.0240) | 0.0101 | 1.5 × 10⁻⁵ | 39.412 |
| Depression | Numeracy | Two-stage ridge | 3416 | −0.0446 (−0.0716, −0.0176) | 0.0138 | .0012 | 39.080 |
| Depression | Numeracy | DML | 3416 | −0.0475 (−0.0675, −0.0275) | 0.0102 | 3 × 10⁻⁶ | NA |
| Depression | Reading | Bayesian ridge | 3463 | −0.0385 (−0.0605, −0.0165) | 0.0112 | .0006 | 45.088 |
| Depression | Reading | Two-stage ridge | 3463 | −0.0328 (−0.0622, −0.0034) | 0.0150 | .0281 | 44.556 |
| Depression | Reading | DML | 3463 | −0.0425 (−0.0676, −0.0174) | 0.0128 | .001 | NA |
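The interval and p-value columns can be related to β* and its standard error with a short calculation. The sketch below assumes a normal (Wald) approximation, which the source does not state; the Bayesian ridge rows in particular may instead report posterior quantities. The function name `normal_interval_and_p` is hypothetical, and the inputs are copied from the two-stage ridge row for the well-being index on Numeracy.

```python
import math


def normal_interval_and_p(beta, se, z_crit=1.959964):
    """Return a 95% Wald interval and a two-sided p-value.

    Assumes the estimate is approximately normal; this is an
    illustrative assumption, not the paper's stated procedure.
    """
    lower = beta - z_crit * se
    upper = beta + z_crit * se
    z = beta / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return (lower, upper), p_value


# Values copied from the table: two-stage ridge, well-being index -> Numeracy.
interval, p_value = normal_interval_and_p(beta=0.0332, se=0.0143)
print(f"95% interval: ({interval[0]:.4f}, {interval[1]:.4f}), p = {p_value:.4f}")
```

Under this assumption the computed interval agrees with the tabulated one to four decimal places and the p-value is close to the reported .0202, which suggests the two-stage ridge figures are consistent with a normal approximation.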
Figure 3. The causal relationships between factors assumed in this study. Some of these factors are at the level of the individual, and others are at the school level. The detailed graph on the left can be simplified into the smaller graph on the right, which was used to inform the modelling approach.
Figure 1. Partial dependence plots of well-being index on NAPLAN Numeracy and Reading grade 9 scores. These can be interpreted as the (conditional) average treatment effect of well-being index on NAPLAN scores (for year 9 students), i.e. they demonstrate the average effect on NAPLAN of changing a student's well-being. The histogram beneath each plot represents the density of the treatment variable. Three models have been used here: (a) a two-stage ridge regressor (linear treatment, kernelized controls), (b) an approximate kernelized Bayesian regression (using a Nyström Gram matrix approximation), and (c) a gradient boosted regression tree. The grey lines are bootstrap samples of the model predictions, and the dashed red line is the mean of the samples. Higher model uncertainty appears as less agreement among the grey prediction samples. Panels (b) and (c) show mostly linear treatment-outcome relationships even though they are completely nonlinear models. These figures were made using Matplotlib (https://matplotlib.org/; ver. 3.3.0).
Figure 2. Partial dependence plots of self-reported depression on NAPLAN Numeracy and Reading grade 9 scores. The description of these plots is the same as for Fig. 1, but with depression as the treatment variable instead of the well-being index. The nonlinear models (b) and (c) show mostly linear relationships. These figures were made using Matplotlib (https://matplotlib.org/; ver. 3.3.0).
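The construction behind the partial dependence plots in Figs. 1 and 2 (one curve per bootstrap sample, plus their mean) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: it substitutes a plain linear ridge model for the kernelized and tree-based models named in the captions, and all function names, the penalty `alpha`, and the simulated effect size of 0.3 are assumptions made for the example.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b


def partial_dependence(w, b, X, col, grid):
    """Average prediction over the sample with column `col` fixed at each grid value."""
    curve = []
    for value in grid:
        X_fixed = X.copy()
        X_fixed[:, col] = value
        curve.append(np.mean(X_fixed @ w + b))
    return np.array(curve)


def bootstrap_pd(X, y, col, grid, n_boot=50, seed=0):
    """Refit on bootstrap resamples and return one partial dependence curve per resample."""
    rng = np.random.default_rng(seed)
    n = len(y)
    curves = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        w, b = fit_ridge(X[idx], y[idx])
        curves.append(partial_dependence(w, b, X[idx], col, grid))
    return np.array(curves)


# Synthetic data (hypothetical): the treatment is confounded by the first control.
rng = np.random.default_rng(1)
n = 500
controls = rng.normal(size=(n, 3))
treatment = 0.5 * controls[:, 0] + rng.normal(size=n)
outcome = 0.3 * treatment + controls @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)

X = np.column_stack([treatment, controls])
grid = np.linspace(-2.0, 2.0, 9)
curves = bootstrap_pd(X, outcome, col=0, grid=grid)  # analogous to the grey lines
mean_curve = curves.mean(axis=0)                     # analogous to the dashed red line
```

The spread of the rows of `curves` at each grid point plays the role of the disagreement among the grey lines: the wider it is, the less certain the model is about the effect at that treatment value.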
A comparison of machine learning approaches to observational causal inference.
| Method | Susceptible to regularisation bias | Susceptible to model mis-specification bias | Readily available software |
|---|---|---|---|
| Direct response surface modelling (DRSM) | More | Less | Y |
| Two-stage ridge (TS) | Less | More | N (but this was easily implemented)ᵃ |
| Double machine learning (DML) | Less | More | Y |

ᵃ The code for this can be found at https://github.com/gradientinstitute/twostageridge.
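To make the DML row of the comparison more concrete, the sketch below shows the residual-on-residual ("partialling out") estimator with cross-fitting for a partially linear model. It is an illustration under stated assumptions, not the authors' implementation and not the API of any DML package: the nuisance models are simple ridge regressions, the data are synthetic with a true effect of 0.5, and the names `ridge_fit_predict` and `dml_partial_linear` are hypothetical.

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Fit a ridge regression on the training fold and predict on the held-out fold."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    w = np.linalg.solve(
        Xc.T @ Xc + alpha * np.eye(X_train.shape[1]), Xc.T @ (y_train - y_mean)
    )
    return (X_test - x_mean) @ w + y_mean


def dml_partial_linear(X, t, y, n_folds=5, seed=0):
    """Cross-fitted residual-on-residual estimate of a linear treatment effect."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    t_res = np.empty(n)
    y_res = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Nuisance predictions come from models that never saw the held-out fold.
        t_res[test_idx] = t[test_idx] - ridge_fit_predict(X[train], t[train], X[test_idx])
        y_res[test_idx] = y[test_idx] - ridge_fit_predict(X[train], y[train], X[test_idx])
    theta = (t_res @ y_res) / (t_res @ t_res)
    # Standard error from the estimator's influence function.
    eps = y_res - theta * t_res
    se = np.sqrt(np.mean(t_res**2 * eps**2) / n) / np.mean(t_res**2)
    return theta, se


# Synthetic, confounded data (hypothetical): true treatment effect is 0.5.
rng = np.random.default_rng(42)
n, d = 2000, 5
X = rng.normal(size=(n, d))
t = X @ np.array([0.8, -0.4, 0.2, 0.0, 0.1]) + rng.normal(size=n)
y = 0.5 * t + X @ np.array([1.0, 0.5, -0.3, 0.2, 0.0]) + rng.normal(size=n)

theta, se = dml_partial_linear(X, t, y)
print(f"estimated effect: {theta:.3f} (s.e. {se:.3f})")
```

Because the treatment coefficient is estimated from residuals rather than inside a penalised regression, it is not shrunk by the regulariser, which is one way to read the "Less" entry for regularisation bias in the table. The price is reliance on the assumed partially linear form, consistent with the "More" entry for mis-specification bias.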