Alex Luna1,2, Joel Bernanke1,2, Kakyeong Kim3, Natalie Aw1,2, Jordan D Dworkin1,2, Jiook Cha1,3,4,5, Jonathan Posner1,2.
Abstract
Brain predicted age difference, or BrainPAD, compares chronological age to an age estimate derived by applying machine learning (ML) to MRI brain data. BrainPAD studies in youth have been relatively limited, often using only a single MRI modality or a single ML algorithm. Here, we use multimodal MRI with a stacked ensemble ML approach that iteratively applies several ML algorithms (AutoML). Eligible participants in the Healthy Brain Network (N = 489) were split into training and test sets. Morphometry estimates, white matter connectomes, or both were entered into AutoML to develop BrainPAD models. The best model was then applied to a held-out evaluation dataset, and associations with psychometrics were estimated. Models using morphometry and connectomes together had a mean absolute error of 1.18 years, outperforming models using a single MRI modality. Lower BrainPAD values were associated with more symptoms on the CBCL (pcorr = .012) and lower functioning on the Children's Global Assessment Scale (pcorr = .012). Higher BrainPAD values were associated with better performance on the Flanker task (pcorr = .008). Brain age prediction was more accurate using ComBat-harmonized brain data (MAE = 0.26). Associations with psychometric measures remained consistent after ComBat harmonization, though only the association with CGAS reached statistical significance in the reduced sample. Our findings suggest that BrainPAD scores derived from unharmonized multimodal MRI data using an ensemble ML approach may offer a clinically relevant indicator of psychiatric and cognitive functioning in youth.
Keywords: biomarkers; brain age; connectome; diffusion tensor imaging; machine learning
Year: 2021 PMID: 34240783 PMCID: PMC8410534 DOI: 10.1002/hbm.25565
Source DB: PubMed Journal: Hum Brain Mapp ISSN: 1065-9471 Impact factor: 5.399
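The BrainPAD quantity described in the abstract can be illustrated with a few lines of Python. This is only a sketch: the sign convention (predicted age minus chronological age) is an assumption, the helper name `brain_pad` is hypothetical, and the ages are made-up values rather than study data.

```python
def brain_pad(predicted_age, chronological_age):
    """Brain predicted age difference in years.

    Assumed convention: predicted minus chronological age, so a
    positive value means the brain "looks older" than the participant.
    """
    return predicted_age - chronological_age


# Made-up ages (years) for four hypothetical participants.
chronological = [8.0, 10.5, 12.0, 15.0]
predicted = [9.0, 10.0, 13.5, 14.0]

pads = [brain_pad(p, c) for p, c in zip(predicted, chronological)]
print(pads)  # [1.0, -0.5, 1.5, -1.0]
```

Under this convention, the second and fourth hypothetical participants have negative BrainPAD values, the direction the abstract associates with more CBCL symptoms and lower CGAS functioning.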
FIGURE 1. Participant selection workflow. In brief, of 498 participants with CBCL‐Parent report data available, 263 with a CBCL score <60 were used for modeling with H2O's AutoML function using an 80%/20% train/test split. For statistical analysis, test set participants and participants with CBCL ≥60 with race/ethnicity data available were combined for a total of 249 participants.
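The selection logic in the Figure 1 caption (filter on CBCL score, then split the low-CBCL group into training and test sets) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the participant records are synthetic, `split_participants` is a hypothetical helper, and the exact shuffle-and-cut split used here is a stand-in for whatever splitting routine was actually used.

```python
import random


def split_participants(ids, train_fraction=0.8, seed=0):
    """Shuffle IDs reproducibly and cut them into train/test lists."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Synthetic records: (participant id, CBCL score). Not study data.
rng = random.Random(1)
participants = [(f"sub-{i:03d}", rng.randint(30, 80)) for i in range(100)]

# Participants below the CBCL cutoff are used for age modeling;
# the remainder are set aside for the later association analysis.
modeling_ids = [pid for pid, cbcl in participants if cbcl < 60]
high_cbcl_ids = [pid for pid, cbcl in participants if cbcl >= 60]

train_ids, test_ids = split_participants(modeling_ids)
print(len(train_ids), len(test_ids), len(high_cbcl_ids))
```

An exact 80%/20% cut of 263 participants would give 210 and 53, whereas the demographics table reports 215 and 48, so the split in the study was evidently approximate rather than an exact cut like the one sketched here.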
Healthy Brain Network participant demographics
| | Training set (n = 215) | Test set (n = 48) | Evaluation set (n = 249) | p value |
|---|---|---|---|---|
| Age | 10.79 (±2.98) | 10.65 (±3.42) | 10.97 (±3.29) | .74 |
| Sex | .90 | |||
| Male | 133 (61.86%) | 29 (60.42%) | 149 (59.84%) | |
| Female | 82 (38.14%) | 19 (39.58%) | 100 (40.16%) | |
| Race | .86 | |||
| White/Caucasian | 93 (43.26%) | 21 (43.75%) | 127 (51.00%) | |
| Black/African‐American | 34 (15.81%) | 5 (10.42%) | 42 (16.87%) | |
| Hispanic | 19 (8.84%) | 5 (10.42%) | 23 (9.24%) | |
| Asian | 6 (2.79%) | 0 (0.00%) | 8 (3.21%) | |
| Indian | 6 (2.79%) | 1 (2.08%) | 1 (0.40%) | |
| Native‐American Indian | 0 (0.00%) | 0 (0.00%) | 1 (0.40%) | |
| Two or more races | 32 (14.88%) | 8 (16.67%) | 38 (15.26%) | |
| Other race | 4 (1.86%) | 0 (0.00%) | 6 (2.41%) | |
| Unknown | 2 (0.93%) | 0 (0.00%) | 3 (1.20%) | |
| Missing | 19 (8.84%) | 8 (16.67%) | 0 (0.00%) | |
| Site | .77 | |||
| CBIC | 21 (9.77%) | 3 (6.25%) | 27 (10.84%) | |
| RU | 81 (37.67%) | 22 (45.83%) | 101 (40.56%) | |
| SI | 113 (52.56%) | 23 (47.92%) | 121 (48.59%) | |
| Ethnicity | .94 | |||
| White/Caucasian | 127 (59.07%) | 25 (52.08%) | 167 (67.07%) | |
| Black/African‐American | 48 (22.33%) | 12 (25.00%) | 61 (24.50%) | |
| Hispanic | 13 (6.05%) | 4 (8.33%) | 15 (6.02%) | |
| Asian | 4 (1.86%) | 1 (2.08%) | 6 (2.41%) | |
| Missing | 23 (10.70%) | 6 (12.50%) | 0 (0.00%) | |
| CBCL total raw score | 20.22 (±10.09) | 19.08 (±10.91) | 57.68 (±25.81) | <.0001 |
| CGAS total score | <.0001 | |||
| Mean (±SD) | 67.91 (±11.18) | 64.74 (±9.80) | 61.08 (±10.19) | |
| Missing | 116 (53.95%) | 25 (52.08%) | 124 (49.80%) | |
| Flanker uncorrected standard score | .46 | |||
| Mean (±SD) | 87.39 (±13.91) | 83.94 (±17.89) | 86.04 (±15.71) | |
| Missing | 67 (31.16%) | 16 (33.33%) | 96 (38.55%) | |
| SDQ difficulties total score | <.0001 | |||
| Mean (±SD) | 8.81 (±4.69) | 8.12 (±4.57) | 16.38 (±6.32) | |
| Missing | 7 (3.26%) | 0 (0.00%) | 6 (2.41%) | |
Note: Between‐group differences among the training, test, and evaluation sets were assessed using ANOVA. No significant differences were found for age, sex, race, ethnicity, or site (scanner).
Brain age prediction accuracy across all models
| Model | MAE (years) | MRD | Optimal algorithm |
|---|---|---|---|
| Morphometry + WM Connectomes | 1.1801 | 1.962 | SEML—Family |
| WM alone | 1.3494 | 2.634 | SEML—Family |
| Morphometry alone | 1.578 | 4.301 | Deep learning (MLP) |
| Age‐harmonized—Morphometry + WM | 0.261 | 0.128 | SEML—Family |
| Age‐harmonized—WM alone | 0.332 | 0.185 | SEML—Family |
| Age‐harmonized—Morphometry alone | 1.438 | 3.964 | SEML—Family |
| Outcome‐harmonized—Morphometry + WM | 0.880 | 1.270 | SEML—Family |
| Outcome‐harmonized—WM alone | 0.776 | 1.246 | SEML—Family |
| Outcome‐harmonized—Morphometry alone | 1.801 | 4.152 | XGBoost |
| Global FA + eTIV | 2.582 | — | — |
| Global FA alone | 2.858 | — | — |
| eTIV alone | 2.591 | — | — |
Note: SEML models using both morphometry and white matter connectomes performed best with the unharmonized brain data (top block) and the age‐harmonized brain data (second block). The SEML model using only the white matter connectomes performed best with the outcome‐harmonized brain data (third block). All ML models outperformed linear models using FA and/or eTIV (bottom block).
Abbreviations: eTIV, estimated total intracranial volume; FA, fractional anisotropy; MAE, mean absolute error; MLP, multi‐layer perceptron; MRD, mean residual deviance; SEML, stacked ensemble machine learning; WM, white matter.
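The two accuracy columns in the table above can be illustrated with a short sketch. It assumes a Gaussian response, for which the residual deviance of each observation is its squared error, so the mean residual deviance coincides with the mean squared error; whether the reported MRD values were computed under exactly this assumption is not stated here. The function names and ages are hypothetical.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted ages."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mean_residual_deviance_gaussian(observed, predicted):
    """Mean residual deviance under an assumed Gaussian response.

    For a Gaussian model the unit deviance is the squared residual,
    so this is numerically the mean squared error.
    """
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Made-up chronological and predicted ages (years), not study data.
chronological = [8.0, 10.0, 12.0, 14.0]
predicted = [9.0, 9.0, 14.0, 14.0]

mae = mean_absolute_error(chronological, predicted)
mrd = mean_residual_deviance_gaussian(chronological, predicted)
print(mae, mrd)  # 1.0 1.5
```

Because squaring weights large residuals more heavily than small ones, two models with similar MAE can differ noticeably in MRD when one of them makes a few large errors.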
FIGURE 2. Scatterplot of predicted age versus chronological age. Scatterplot depicting the relationship between predicted and chronological age for the participants in the held‐out test sets for the best age models built using the unharmonized (N = 48), age‐harmonized (N = 48), and age‐outcome‐harmonized (N = 23) brain data.
FIGURE 3. Scatterplot of BrainPAD versus outcome measures. Scatterplots depicting the relationship between BrainPAD and (a) CBCL, (b) CGAS, (c) Flanker, and (d) SDQ scores among participants in the evaluation dataset (N = 249). BrainPAD scores are adjusted for age, sex, race, ethnicity, and site (scanner).
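The Figure 3 caption reports BrainPAD scores adjusted for covariates but does not say how the adjustment was carried out. One common approach is to regress the score on the covariates and keep the residuals. The sketch below does this for a single covariate (age) with ordinary least squares; the helper `residualize` is hypothetical, the values are made up, and a full adjustment for sex, race, ethnicity, and site would need a multiple-regression routine instead.

```python
def residualize(values, covariate):
    """Remove the linear effect of one covariate via simple OLS.

    Returns the residuals of regressing `values` on `covariate`.
    """
    n = len(values)
    mean_x = sum(covariate) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(covariate, values)]


# Made-up ages (years) and raw BrainPAD scores, not study data.
ages = [6.0, 8.0, 10.0, 12.0, 14.0]
raw_pad = [1.0, 0.5, 0.2, -0.4, -0.8]

adjusted_pad = residualize(raw_pad, ages)
print([round(v, 3) for v in adjusted_pad])
```

By construction the residuals average to zero and are uncorrelated with age, so any remaining association between the adjusted score and an outcome cannot be explained by a linear age effect.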