Thiago Botter-Maio Rocha, Helen L Fisher, Arthur Caye, Luciana Anselmi, Louise Arseneault, Fernando C Barros, Avshalom Caspi, Andrea Danese, Helen Gonçalves, Hona Lee Harrington, Renate Houts, Ana M B Menezes, Terrie E Moffitt, Valeria Mondelli, Richie Poulton, Luis Augusto Rohde, Fernando Wehrmeister, Christian Kieling.
Abstract
OBJECTIVE: Prediction models have become frequent in the medical literature, but most published studies are conducted in a single setting. Heterogeneity between development and validation samples has been posited as a major obstacle for the generalization of models. We aimed to develop a multivariable prognostic model using sociodemographic variables easily obtainable from adolescents at age 15 to predict a depressive disorder diagnosis at age 18 and to evaluate its generalizability in 2 samples from diverse socioeconomic and cultural settings.
Keywords: adolescent; cohort studies; depression; prognosis; risk assessment
Year: 2020 PMID: 31953186 PMCID: PMC8215370 DOI: 10.1016/j.jaac.2019.12.004
Source DB: PubMed Journal: J Am Acad Child Adolesc Psychiatry ISSN: 0890-8567 Impact factor: 8.829
Figure 1. Flowcharts for Each Included Cohort Study
(a) Pelotas cohort. (b) E-Risk cohort. (c) Dunedin cohort.
a In the Pelotas dataset, 5 excluded participants had both Tanner < 2 and IQ < 70.
Sample Description for Each Cohort
| | Pelotas (Brazil) | E-Risk (United Kingdom) | Dunedin (New Zealand) |
|---|---|---|---|
| Included sample | 2,192 | 1,144 | 739 |
| Assessment age, years | 15 | 12 | 15 |
| Male sex | 977 (44.6) | 520 (45.5) | 375 (50.7) |
| White skin color | 1,478 (67.4) | 1,040 (90.9) | NA |
| Childhood maltreatment | | | |
| None | 1,539 (70.2) | 963 (84.2) | 489 (66.2) |
| Probable | 390 (17.8) | 139 (12.2) | 187 (25.3) |
| Severe | 263 (12.0) | 42 (3.7) | 63 (8.5) |
| School failure | 1,127 (51.4) | 212 (18.5) | 80 (10.8) |
| Social isolation | 231 (10.5) | 63 (5.5) | 70 (9.5) |
| Fights | 211 (9.6) | 130 (11.4) | 12 (1.6) |
| Ran away from home | 80 (3.6) | 9 (0.8) | 49 (6.6) |
| Any drug use | 1,367 (62.4) | 569 (49.7) | 592 (80.1) |
| Relationship with mother | | NA | |
| Great | 1,417 (64.6) | | |
| Very good | 430 (19.6) | | |
| Good | 264 (12.0) | | |
| Regular | 68 (3.1) | | |
| Bad | 13 (0.6) | | |
| Relationship with father | | NA | 22.0 ± 5.4 |
| Great | 1,019 (46.5) | | |
| Very good | 434 (19.8) | | |
| Good | 370 (16.9) | | |
| Regular | 237 (10.8) | | |
| Bad | 132 (6.0) | | |
| Relationship between parents | | NA | |
| Great | 886 (40.4) | | 345 (46.7) |
| Very good | 421 (19.2) | | 278 (37.6) |
| Good | 404 (18.4) | | 91 (12.3) |
| Regular | 301 (13.7) | | 23 (3.1) |
| Bad | 180 (8.2) | | 2 (0.3) |
| Depression prevalence | 69 (3.1) | 202 (17.7) | 124 (16.8) |
Note: Results are shown as number of participants (percentage) for categorical variables and as mean ± SD for continuous variables for participants included in the final analyses. NA = Data not available in the cohort.
See Table S1, available online, for assessment strategies applied to each cohort.
Superscript letters b, c, and d denote column differences among the samples: different letters show significant differences and the same letters indicate nonsignificant differences from each other, assessed by χ² test at .05 level. For variables with more than 2 categories, the superscript letters were placed in the first row of the variable and represent the assessment of the variable as a group, not per row.
Skin color was not assessed in the cohort. Less than 7% of the cohort had any nonwhite ancestry.
Parent Attachment Scale score (range, −6 to 28): adolescent assessment about the relationship with both parents.
Presence of symptoms reaching diagnostic criteria within a 2-week period before assessment.
Presence of symptoms reaching diagnostic criteria within a 12-month period before assessment.
Apparent Performance Parameters Obtained From the Models Derived From the Pelotas Dataset
| Model parameters | LR | PMLE | Ridge | .25 | .50 | .75 | LASSO |
|---|---|---|---|---|---|---|---|
| R² | 0.15 | 0.12 | 0.12 | 0.10 | 0.10 | 0.10 | 0.10 |
| LR χ² | 81.90 | 66.17 | 63.30 | 54.40 | 54.32 | 54.71 | 54.10 |
| Brier score (×10²) | 2.88 | 2.93 | 2.93 | 2.95 | 2.95 | 2.95 | 2.95 |
| C-statistic | 0.79 | 0.78 | 0.78 | 0.76 | 0.76 | 0.76 | 0.76 |
| Calibration slope | 1.00 | 1.26 | 1.35 | 1.47 | 1.42 | 1.38 | 1.39 |
Note: Higher results for R², LR χ², and C-statistic; lower results for Brier score; and results closer to 1 for calibration slope indicate better model performance. .25 = Elastic-Net with α = .25; .50 = Elastic-Net with α = .50; .75 = Elastic-Net with α = .75; Brier score = quadratic scoring rule that combines calibration and discrimination; C-statistic = concordance statistic, or area under the curve of the receiver operating characteristic; Calibration slope = measure of agreement between observed and predicted risk of the event (outcome) across the whole range of predicted values; LASSO = least absolute shrinkage and selection operator; LR = logistic regression; LR χ² = likelihood ratio χ²; PMLE = penalized maximum likelihood estimation; R² = Nagelkerke’s R²; Ridge = Ridge regression.
The penalty factor used in the PMLE was empirically obtained from our data.
For the Elastic-Net approach, we defined a priori a grid of values for the hyperparameter α, ranging from 0 (full Ridge) to 1 (full LASSO) in increments of 0.25. For each α value, 10-fold cross-validation was used to select the penalty coefficient (λ) that minimized the mean squared prediction error; this λ was then used for shrinkage of coefficients and/or variable selection. See Table S4, available online, for the models’ coefficients.
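The role of α and λ can be made concrete with the penalized objective below. This is a sketch under an assumption: it uses the parameterization common in elastic-net software, which matches the stated endpoints (α = 0 is Ridge, α = 1 is LASSO), but the source does not state the exact form that was fitted. The symbols n, β₀, β, and ℓ (the logistic log-likelihood) are introduced here for illustration only.

```latex
\[
(\hat\beta_0, \hat\beta) \;=\; \arg\min_{\beta_0,\,\beta}
\left\{ -\frac{1}{n}\,\ell(\beta_0, \beta)
\;+\; \lambda \left[ \frac{1-\alpha}{2}\,\lVert \beta \rVert_2^2
\;+\; \alpha\,\lVert \beta \rVert_1 \right] \right\}
\]
% alpha = 0  : only the squared (Ridge) term remains, so coefficients shrink but stay nonzero.
% alpha = 1  : only the absolute-value (LASSO) term remains, so some coefficients can be set exactly to 0.
% lambda >= 0: overall penalty strength, chosen here by 10-fold cross-validation for each alpha.
```

Under this form, the intermediate values α = .25, .50, and .75 mix both penalties, which is why they can shrink coefficients and also drop variables.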
All LR χ² p values < .001.
Multiplied by 10².
The C-statistic ranges from 0.5 for noninformative models to 1.0 for perfect models.
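Two of the measures defined in the note above can be computed directly from observed outcomes and predicted probabilities. The following is a minimal sketch on made-up toy data, not the study's code or data; the function names `brier_score` and `c_statistic` and the vectors `y` and `p` are hypothetical.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def c_statistic(y_true, p_pred):
    """Share of (case, non-case) pairs in which the case has the higher
    predicted risk; tied predictions count as half a concordant pair."""
    cases = [p for y, p in zip(y_true, p_pred) if y == 1]
    non_cases = [p for y, p in zip(y_true, p_pred) if y == 0]
    concordant = 0.0
    for p_case in cases:
        for p_non in non_cases:
            if p_case > p_non:
                concordant += 1.0
            elif p_case == p_non:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Toy data (hypothetical): 1 = outcome present, 0 = outcome absent.
y = [0, 0, 1, 0, 1, 0]
p = [0.05, 0.10, 0.40, 0.20, 0.15, 0.02]

brier = brier_score(y, p)
brier_x100 = 100 * brier          # same scaling as "multiplied by 10²"
auc = c_statistic(y, p)           # 7 of 8 case/non-case pairs are concordant
print(f"Brier = {brier:.4f} (x100 = {brier_x100:.2f}), C-statistic = {auc:.3f}")
```

A model that assigns every participant the same predicted risk yields a C-statistic of 0.5 under this definition, which is the noninformative lower bound mentioned above.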
Comparative Results for Each Step of Model Performance in the 3 Cohorts
| Performance parameter | Description | Pelotas: apparent validation | Pelotas: internal validation | E-Risk: external validation | E-Risk: case mix–corrected model | E-Risk: refitted model | Dunedin: external validation | Dunedin: case mix–corrected model | Dunedin: refitted model |
|---|---|---|---|---|---|---|---|---|---|
| C-statistic | Concordance statistic, equal to area under the curve of receiver operating characteristic in binary endpoints | 0.78 | 0.71 | 0.59 | 0.66 | 0.62 | 0.63 | 0.68 | 0.67 |
| Calibration-in-the-large | Overall measure of calibration, compares mean observed with mean predicted in validation dataset | 0.00 | 0.02 | 2.37 | 0.02 | 0.00 | 2.26 | −0.06 | 0.00 |
| Calibration slope | Measure of agreement between observed and predicted risk of event (outcome) across whole range of predicted values | 1.26 | 1.00 | 0.58 | 0.99 | 1.20 | 0.77 | 0.98 | 1.24 |
| R² | Measure of overall goodness-of-fit of model | 0.12 | 0.06 | 0.03 | 0.04 | 0.05 | 0.05 | 0.05 | 0.09 |
| Brier score | Quadratic scoring rule that combines calibration and discrimination | 0.03 | 0.03 | 0.17 | 0.02 | 0.14 | 0.16 | 0.02 | 0.13 |
| Emax | Maximum absolute error in predicted probabilities | 0.19 | 0.03 | 0.29 | 0.01 | 0.09 | 0.38 | 0.01 | 0.11 |
| | | 100% | | 86.9% | | | 93.1% | | |
Note: Higher results for C-statistic and R², lower results for Brier score and Emax, results closer to 0 for calibration-in-the-large, and results closer to 1 for calibration slope indicate better model performance.
Reference values indicating the model’s performance under the assumption that Pelotas model’s coefficients are fully correct for the validation setting, simulating similar case mix between samples.
Reference values indicating the model’s performance after refitting predictors’ coefficients that would be optimal for the validation sample. (See Supplement 2, available online, for further details.)
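The two calibration measures in the table are described only verbally. A commonly used formalization, given here as an assumption and not as the authors' stated procedure, regresses the observed outcome on the model's linear predictor in the validation sample. The symbols LP, a, and b are introduced for illustration.

```latex
\[
\mathrm{LP} = \hat\beta_0 + x^{\top}\hat\beta
\qquad \text{(linear predictor from the development model)}
\]
\[
\text{Calibration slope:}\quad
\operatorname{logit}\Pr(Y = 1) = a + b\,\mathrm{LP},
\qquad \text{report } \hat b
\]
\[
\text{Calibration-in-the-large:}\quad
\operatorname{logit}\Pr(Y = 1) = a + \mathrm{LP}
\quad (b \text{ fixed at } 1),
\qquad \text{report } \hat a
\]
% Ideal values: a = 0 (mean predicted risk matches mean observed risk)
% and b = 1 (predictions are neither too extreme nor too moderate).
```

Under this reading, a slope below 1 indicates predictions that are too extreme for the validation sample, and a positive calibration-in-the-large value indicates that observed risk is higher on average than predicted risk.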
Figure 3. Prognostic Contribution of Each Included Variable to the Aggregated Sample Prediction Model of Adolescent Depression
Comparison of the prognostic contribution of each included variable in each cohort to the aggregated sample prediction model of adolescent depression, stratified by sex for Brazil, United Kingdom, and New Zealand cohorts. Predictors’ β coefficients from penalized logistic regression are shown as bars along the x-axis. Positive values represent greater risk and negative values represent lower risk of the outcome. The results shown are derived from values presented in Table S5, available online. Some of the variables previously included in the Pelotas model were excluded for comparability among datasets.