Lea Baecker, Jessica Dafflon, Pedro F da Costa, Rafael Garcia-Dias, Sandra Vieira, Cristina Scarpazza, Vince D Calhoun, João R Sato, Andrea Mechelli, Walter H L Pinaya.
Abstract
Brain morphology varies across the ageing trajectory, and the prediction of a person's age using brain features can aid the detection of abnormalities in the ageing process. Existing studies on such "brain age prediction" vary widely in terms of their methods and type of data, so at present the most accurate and generalisable methodological approach is unclear. Therefore, we used the UK Biobank data set (N = 10,824, age range 47-73) to compare the performance of three machine learning models, support vector regression (SVR), relevance vector regression (RVR) and Gaussian process regression (GPR), on whole-brain region-based or voxel-based structural magnetic resonance imaging data with or without dimensionality reduction through principal component analysis (PCA). Performance was assessed in the validation set through cross-validation (CV) as well as in an independent test set. The models achieved mean absolute errors (MAE) between 3.7 and 4.7 years, with those trained on voxel-level data with PCA performing best. Overall, we observed little difference in performance between models trained on the same data type, indicating that the type of input data had greater impact on performance than model choice. All code is provided online in the hope that this will aid future research.
Keywords: biological ageing; healthy ageing; machine learning; regression analysis; support vector machine
Year: 2021 PMID: 33738883 PMCID: PMC8090783 DOI: 10.1002/hbm.25368
Source DB: PubMed Journal: Hum Brain Mapp ISSN: 1065-9471 Impact factor: 5.038
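The comparison described in the abstract (a regressor trained with or without PCA, scored by repeated k-fold CV) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' code: the feature matrix, the number of components, the SVR settings and the use of scikit-learn are all assumptions, and only two repeats are run to keep it fast (the study used 10-times 10-fold CV). RVR is omitted because scikit-learn does not ship one.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in for imaging features (hypothetical sizes):
# 300 "subjects" x 200 "voxels", weakly related to age plus noise.
n_subjects, n_features = 300, 200
age = rng.uniform(47, 73, size=n_subjects)
weights = rng.normal(size=n_features)
X = 0.05 * np.outer(age - age.mean(), weights) + rng.normal(size=(n_subjects, n_features))


def build_model(use_pca: bool):
    """Standardise, optionally reduce with PCA, then fit a linear SVR."""
    steps = [StandardScaler()]
    if use_pca:
        steps.append(PCA(n_components=20))  # assumed number of components
    steps.append(SVR(kernel="linear", C=1.0))  # assumed hyperparameters
    return make_pipeline(*steps)


# Repeated 10-fold CV; n_repeats=2 here purely for speed.
cv = RepeatedKFold(n_splits=10, n_repeats=2, random_state=0)

mae_by_model = {}
for label, use_pca in [("no PCA", False), ("PCA", True)]:
    scores = cross_val_score(
        build_model(use_pca), X, age, cv=cv, scoring="neg_mean_absolute_error"
    )
    mae_by_model[label] = -scores  # scikit-learn returns negated MAE
    print(f"{label}: MAE = {mae_by_model[label].mean():.2f} "
          f"(SD {mae_by_model[label].std():.2f})")
```

Placing the scaler and PCA inside the pipeline means they are refit on each training fold, so no information from the held-out fold leaks into the dimensionality reduction.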
Demographic information on the UK Biobank data set from Sites 1 and 2
| UK Biobank (N = 10,824) | Site 1 (n = 10,480) | Site 2 (n = 344) |
|---|---|---|
| Age, years | | |
| Mean ± SD | 61.3 ± 6.9 | 62.4 ± 6.7 |
| Range | [47, 73] | [47, 73] |
| Sex, n (%) | | |
| Men | 4,734 (45%) | 149 (43%) |
| Women | 5,746 (55%) | 195 (57%) |
Performance metrics for region‐ or voxel‐based SVR, RVR and GPR models in 10‐times 10‐fold CV (UK Biobank Site 1) with or without dimensionality reduction through PCA
| Data type | Method | MAE | Weighted MAE | RMSE | Pearson's r | Prediction R² | Age‐BrainAGE correlation |
|---|---|---|---|---|---|---|---|
| Region | SVR | 4.43 (0.09) | 0.17 | 5.48 (0.12) | 0.62 (0.00) | 0.37 (0.03) | −0.73 (0.00) |
| Region | RVR | 4.43 (0.09) | 0.17 | 5.44 (0.11) | 0.62 (0.00) | 0.38 (0.02) | −0.78 (0.00) |
| Region | GPR | 4.42 (0.09) | 0.17 | 5.44 (0.11) | 0.62 (0.00) | 0.38 (0.02) | −0.77 (0.00) |
| Voxel (no PCA) | SVR | 4.33 (0.10) | 0.17 | 5.43 (0.12) | 0.73 (0.00) | 0.39 (0.03) | −0.23 (0.00) |
| Voxel (no PCA) | RVR | 3.69 (0.45) | 0.14 | 4.60 (0.50) | 0.75 (0.02) | 0.55 (0.11) | −0.62 (0.02) |
| Voxel (PCA) | SVR | 3.89 (0.08) | 0.15 | 4.86 (0.11) | 0.71 (0.00) | 0.51 (0.02) | −0.68 (0.00) |
| Voxel (PCA) | RVR | 3.90 (0.08) | 0.15 | 4.85 (0.10) | 0.71 (0.00) | 0.51 (0.02) | −0.72 (0.00) |
| Voxel (PCA) | GPR | 3.90 (0.08) | 0.15 | 4.85 (0.10) | 0.71 (0.00) | 0.51 (0.02) | −0.71 (0.00) |
Note: In each column, the data are presented as mean value (SD) across all model iterations. GPR performance on voxel‐level data without PCA was not assessed.
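For readers who want to reproduce the metric columns above, the following is a minimal sketch of how they could be computed from chronological and predicted ages. The definitions are assumptions rather than the authors' stated formulas: BrainAGE is taken as predicted minus chronological age, the "Prediction" column is taken as the coefficient of determination, and weighted MAE is taken as MAE divided by the age range (the tabulated values are consistent with this, e.g. 4.43 / 26 ≈ 0.17). The function name `brain_age_metrics` is hypothetical.

```python
import numpy as np


def brain_age_metrics(age, predicted, age_range=None):
    """Compute the table's metrics under the assumed definitions."""
    age = np.asarray(age, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    gap = predicted - age  # assumed BrainAGE: predicted minus chronological age
    mae = np.mean(np.abs(gap))
    if age_range is None:
        age_range = age.max() - age.min()
    ss_res = np.sum(gap ** 2)
    ss_tot = np.sum((age - age.mean()) ** 2)
    return {
        "mae": mae,
        "weighted_mae": mae / age_range,  # assumed: MAE scaled by age range
        "rmse": np.sqrt(np.mean(gap ** 2)),
        "pearson_r": np.corrcoef(age, predicted)[0, 1],
        "r2": 1.0 - ss_res / ss_tot,  # coefficient of determination
        "age_brainage_corr": np.corrcoef(age, gap)[0, 1],
    }


# Toy example: predictions shrunk halfway towards the mean age.
ages = np.arange(47, 74, dtype=float)  # 47, 48, ..., 73
shrunk = ages.mean() + 0.5 * (ages - ages.mean())
metrics = brain_age_metrics(ages, shrunk)
print({name: round(float(value), 3) for name, value in metrics.items()})
```

In the toy example the predictions correlate perfectly with age, yet the age-BrainAGE correlation is −1: shrinkage towards the mean overestimates younger and underestimates older participants, which is the usual reading of the negative values in the last column.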
Statistical assessment of differences in model performance in terms of MAE of the region‐ or voxel‐based SVR, RVR and GPR models in 10‐times 10‐fold CV
| | SVR (region) | RVR (region) | GPR (region) | SVR (voxel, no PCA) | RVR (voxel, no PCA) | SVR (voxel, PCA) | RVR (voxel, PCA) | GPR (voxel, PCA) |
|---|---|---|---|---|---|---|---|---|
| SVR (region) | – | 0.98 (−0.02) | 0.80 (0.26) | 0.50 (0.68) | 0.14 (1.50) | | | |
| RVR (region) | | – | 0.48 (0.70) | 0.48 (0.71) | 0.13 (1.51) | | | |
| GPR (region) | | | – | 0.51 (0.67) | 0.14 (1.50) | | | |
| SVR (voxel, no PCA) | | | | – | 0.20 (1.30) | | | |
| RVR (voxel, no PCA) | | | | | – | 0.69 (−0.40) | 0.66 (−0.44) | 0.67 (−0.43) |
| SVR (voxel, PCA) | | | | | | – | 0.44 (0.78) | 0.31 (1.02) |
| RVR (voxel, PCA) | | | | | | | – | 0.58 (−0.55) |
| GPR (voxel, PCA) | | | | | | | | – |
Note: The table presents the p‐values (t‐statistic). Statistical significance was assessed using a version of the paired Student's t test corrected for the violation of the independence assumption in CV. The significance level was corrected for multiple comparisons using Bonferroni's method (α = .05/28 ≈ .0018). Statistically significant differences between model performances are shown in bold.
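The note does not name the specific correction. One widely used option for repeated CV is the corrected resampled t test of Nadeau and Bengio, which inflates the variance term by the test-to-train size ratio; the sketch below assumes that variant and may differ from what the authors ran. The fold-wise MAE values are simulated, the fold sizes are illustrative, and the function name is hypothetical.

```python
import numpy as np
from scipy import stats


def corrected_paired_ttest(scores_a, scores_b, n_train, n_test):
    """Nadeau-Bengio corrected resampled t test (assumed variant).

    scores_a, scores_b: per-fold scores of two models on the same folds.
    Returns (t statistic, two-sided p-value) with k - 1 degrees of freedom.
    """
    diff = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    k = diff.size
    variance = np.var(diff, ddof=1)
    # The extra n_test / n_train term accounts for overlapping training sets.
    t_stat = diff.mean() / np.sqrt((1.0 / k + n_test / n_train) * variance)
    p_value = 2.0 * stats.t.sf(abs(t_stat), df=k - 1)
    return t_stat, p_value


# Simulated fold-wise MAEs for two models over 10 x 10 = 100 folds.
rng = np.random.default_rng(0)
mae_a = 4.43 + rng.normal(0.0, 0.09, size=100)
mae_b = 4.33 + rng.normal(0.0, 0.10, size=100)

# Illustrative fold sizes for 10-fold CV on 10,480 participants.
t_corr, p_corr = corrected_paired_ttest(mae_a, mae_b, n_train=9432, n_test=1048)
t_naive = stats.ttest_rel(mae_a, mae_b).statistic

# Bonferroni threshold for all pairwise comparisons among 8 models.
n_models = 8
n_comparisons = n_models * (n_models - 1) // 2
alpha_corrected = 0.05 / n_comparisons

print(f"corrected t = {t_corr:.2f}, p = {p_corr:.4f}; naive t = {t_naive:.2f}")
print(f"{n_comparisons} comparisons, alpha = {alpha_corrected:.4f}")
print("significant" if p_corr < alpha_corrected else "not significant")
```

The corrected statistic is always smaller in magnitude than the naive paired t statistic, which is why differences that look large fold by fold can still fail the Bonferroni threshold. As a plausibility check on the degrees of freedom, a t statistic of 1.50 with 99 degrees of freedom gives a two-sided p-value of roughly 0.14, in line with the tabulated pairs.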
Performance metrics of region‐ or voxel‐based SVR, RVR and GPR models with or without PCA in independent test set (UK Biobank Site 2)
| Data type | Method | MAE | Weighted MAE | RMSE | Pearson's r | Prediction R² | Age‐BrainAGE correlation |
|---|---|---|---|---|---|---|---|
| Region | SVR | 4.06 (0.02) | 0.16 | 5.07 (0.02) | 0.65 (0.00) | 0.42 (0.00) | −0.72 (0.00) |
| Region | RVR | 4.10 (0.02) | 0.16 | 5.06 (0.02) | 0.66 (0.00) | 0.42 (0.00) | −0.77 (0.00) |
| Region | GPR | 4.08 (0.01) | 0.16 | 5.05 (0.01) | 0.66 (0.00) | 0.42 (0.00) | −0.77 (0.00) |
| Voxel (no PCA) | SVR | 4.69 (0.09) | 0.18 | 5.92 (0.11) | 0.71 (0.01) | 0.21 (0.03) | −0.16 (0.01) |
| Voxel (no PCA) | RVR | 3.66 (0.57) | 0.14 | 4.51 (0.61) | n/a | 0.53 (0.15) | −0.82 (0.16) |
| Voxel (PCA) | SVR | 3.77 (0.04) | 0.15 | 4.65 (0.04) | 0.74 (0.00) | 0.51 (0.01) | −0.60 (0.01) |
| Voxel (PCA) | RVR | 3.82 (0.03) | 0.15 | 4.65 (0.04) | 0.74 (0.00) | 0.51 (0.01) | −0.64 (0.01) |
| Voxel (PCA) | GPR | 3.81 (0.03) | 0.15 | 4.64 (0.04) | 0.74 (0.00) | 0.51 (0.01) | −0.63 (0.01) |
Note: In each column, the data are presented as mean value (SD) of the predictions from the 100 model iterations. GPR performance on voxel‐level data without PCA was not assessed.
Pearson's r for voxel‐based RVR without PCA could not be calculated, since the model underfitted to the training set and predicted the sample mean age in 41 out of the 100 iterations; therefore, their predictions in the independent test set had no variance.
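The reason the correlation is undefined can be seen directly from its formula: Pearson's r divides the covariance by the product of the two standard deviations, and a model that predicts the same age for everyone has a standard deviation of zero. The short sketch below illustrates this with made-up numbers; `safe_pearson_r` is a hypothetical helper, not part of the authors' code.

```python
import math

import numpy as np


def safe_pearson_r(x, y):
    """Return Pearson's r, or NaN when either input is constant."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # A constant vector has zero variance, so r would be 0 / 0.
    if x.max() == x.min() or y.max() == y.min():
        return float("nan")
    return float(np.corrcoef(x, y)[0, 1])


test_ages = np.array([48.0, 55.0, 61.0, 66.0, 72.0])  # made-up test ages
constant_pred = np.full_like(test_ages, 61.3)  # model predicts one value only
varying_pred = 30.0 + 0.5 * test_ages  # any non-constant prediction

r_constant = safe_pearson_r(test_ages, constant_pred)
r_varying = safe_pearson_r(test_ages, varying_pred)
print(r_constant, r_varying)
```

Metrics based only on the prediction errors, such as MAE and RMSE, remain well defined in this situation, which is why the other columns of the row are populated.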
Statistical assessment of differences in model performance in terms of MAE of the region‐ or voxel‐based SVR, RVR and GPR models in an independent test set (UK Biobank Site 2)
| | SVR (region) | RVR (region) | GPR (region) | SVR (voxel, no PCA) | RVR (voxel, no PCA) | SVR (voxel, PCA) | RVR (voxel, PCA) | GPR (voxel, PCA) |
|---|---|---|---|---|---|---|---|---|
| SVR (region) | – | 0.04 (−2.07) | 0.15 (−1.46) | | 0.51 (0.66) | | | |
| RVR (region) | | – | 0.23 (1.21) | | 0.47 (0.73) | | | |
| GPR (region) | | | – | | 0.49 (0.70) | | | |
| SVR (voxel, no PCA) | | | | – | 0.10 (1.66) | | | |
| RVR (voxel, no PCA) | | | | | – | 0.86 (−0.18) | 0.80 (−0.25) | 0.81 (−0.25) |
| SVR (voxel, PCA) | | | | | | – | 0.10 (1.67) | 0.08 (−1.77) |
| RVR (voxel, PCA) | | | | | | | – | 0.66 (0.44) |
| GPR (voxel, PCA) | | | | | | | | – |
Note: The table presents the p‐values (t‐statistic). Statistical significance was assessed using a version of the paired Student's t test corrected for the violation of the independence assumption in CV. The significance level was corrected for multiple comparisons using Bonferroni's method (α = .05/28 ≈ .0018). Statistically significant differences between model performances are shown in bold.
FIGURE 1 MAE of region‐ and voxel‐based SVR, RVR, and GPR models with or without PCA as a function of training set size, compared to chance level (7.5 years; black dotted line). MAE is shown for the performance within the training (red line) and test set (green line) of the CV (Site 1) and in the independent test set (Site 2; blue line). The confidence intervals (shaded areas) for the different data set sizes were calculated using bootstrap analysis. Bootstrap training samples of increasing size were selected to be age‐ and sex‐homogeneous, with a minimum of one man and one woman per age and a maximum of 20 men and 20 women per age. For the voxel‐based models with PCA, data sets with <150 subjects could not be assessed, because the PCA algorithm requires more samples than principal components. Furthermore, training set sizes above 500 were not calculated due to limited time and computational resources.
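As a rough illustration of how a bootstrap confidence interval for MAE can be obtained, the sketch below applies a percentile bootstrap to a fixed set of predictions. It is deliberately simpler than the procedure in the caption, which drew age‐ and sex‐balanced training samples of increasing size and retrained the models; here only the prediction errors are resampled. The data are simulated and `bootstrap_mae_ci` is a hypothetical name.

```python
import numpy as np


def bootstrap_mae_ci(age, predicted, n_boot=1000, ci=95.0, seed=0):
    """Percentile-bootstrap confidence interval for MAE over subjects."""
    rng = np.random.default_rng(seed)
    abs_err = np.abs(np.asarray(predicted, dtype=float) - np.asarray(age, dtype=float))
    n = abs_err.size
    # Each row of idx is one bootstrap resample of the subjects (with replacement).
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_mae = abs_err[idx].mean(axis=1)
    tail = (100.0 - ci) / 2.0
    lower, upper = np.percentile(boot_mae, [tail, 100.0 - tail])
    return float(abs_err.mean()), float(lower), float(upper)


# Simulated test set: 344 subjects aged 47-73 with noisy predictions.
rng = np.random.default_rng(1)
sim_age = rng.uniform(47, 73, size=344)
sim_pred = sim_age + rng.normal(0.0, 5.0, size=344)

mae, ci_low, ci_high = bootstrap_mae_ci(sim_age, sim_pred)
print(f"MAE = {mae:.2f} years, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

Because the models are not refit, this interval reflects sampling variability in the evaluated subjects only, not the variability due to the choice of training sample that the shaded areas in the figure capture.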
FIGURE 2 A decision tree for researchers choosing the most suitable brain age prediction model for their project. The ranking is inferred from our experience developing the models as well as the results of our investigation. These recommendations are thus built on the UK Biobank data set and our specific computational resources, so any application to other projects should be done with caution. The models in this study were developed using a high‐end consumer‐grade desktop computer with a 16‐core (32‐processes) CPU @ 3.40 GHz and 128 GB RAM. The voxel‐based models with PCA took 1–2 weeks to train, while the voxel‐based models without PCA took <1 day. The region‐based models took <1 hr to train.