Pedro F Da Costa, Jessica Dafflon, Walter H L Pinaya.
Abstract
As we age, our brain structure changes and our cognitive capabilities decline. Although brain aging is universal, rates of brain aging differ markedly, and this variability can be associated with pathological mechanisms of psychiatric and neurological diseases. Predictive models have been applied to neuroimaging data to learn patterns associated with this variability and to develop a neuroimaging biomarker of brain condition. Aiming to stimulate the development of more accurate brain-age predictors, the Predictive Analytics Competition (PAC) 2019 provided a challenge that included a dataset of 2,640 participants. Here, we present our approach, which placed among the top 10 entries of the challenge. We developed an ensemble of shallow machine learning methods (e.g., Support Vector Regression and decision-tree-based regressors) that combined voxel-based and surface-based morphometric data. We used normalized brain volume maps (i.e., gray matter, white matter, or both) and features of cortical regions and anatomical structures, such as cortical thickness, volume, and mean curvature. To fine-tune the hyperparameters of the machine learning methods, we combined genetic algorithms and grid search. Our ensemble achieved a mean absolute error of 3.7597 years in the competition, showing the potential that shallow methods still have in predicting brain-age.
Keywords: brain-age; genetic algorithm; linear models; shallow machine learning; support vector machine
Year: 2020 PMID: 33343431 PMCID: PMC7738323 DOI: 10.3389/fpsyt.2020.604478
Source DB: PubMed Journal: Front Psychiatry ISSN: 1664-0640 Impact factor: 4.157
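The abstract mentions grid search as one half of the hyperparameter tuning strategy (the genetic-algorithm half is handled by TPOT, shown further below). A minimal sketch of the grid-search side using scikit-learn, with placeholder data and an illustrative parameter grid rather than the authors' actual search space:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

# Placeholder morphometric features and ages; the real inputs are
# flattened volume maps or regional FreeSurfer features.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 40))
age = rng.uniform(18, 90, size=80)

# Exhaustive grid search over SVR hyperparameters, scored by MAE.
# The grid values here are assumptions for illustration only.
grid = GridSearchCV(
    SVR(),
    param_grid={"C": [0.1, 1, 10], "epsilon": [0.01, 0.1, 1]},
    scoring="neg_mean_absolute_error",
    cv=3,
)
grid.fit(X, age)
best_mae = -grid.best_score_  # MAE of the best hyperparameter combination
```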
Figure 1. Overview of the different methods used in our analysis. In addition to the gray matter (GM) and white matter (WM) volume maps provided by the PAC competition, we also pre-processed the data to obtain regional volume, thickness, and mean curvature information of the brain using FreeSurfer. We then used different strategies that involved creating a Gram matrix, dimensionality reduction algorithms (e.g., PCA), and TPOT (an automated machine learning framework) to train different models. In addition to using different pre-processing, we also trained different models for the different sites where the data was recorded. All models that had a mean absolute error (MAE) lower than 7 years were used to build a weighted ensemble.
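The final step in Figure 1 (keeping only models with MAE below 7 years and combining them into a weighted ensemble) can be sketched as follows. The inverse-MAE weighting is an assumption for illustration; the paper does not specify the exact weighting scheme:

```python
import numpy as np

def weighted_ensemble(predictions, maes, threshold=7.0):
    """Combine model predictions into a weighted average.

    Models with a cross-validated MAE at or above `threshold` years are
    discarded; the survivors are weighted by inverse MAE (assumed scheme).
    `predictions` has shape (n_models, n_subjects).
    """
    predictions = np.asarray(predictions, dtype=float)
    maes = np.asarray(maes, dtype=float)
    keep = maes < threshold            # drop models with MAE >= 7 years
    weights = 1.0 / maes[keep]         # better models get larger weights
    weights /= weights.sum()           # normalize to sum to 1
    return weights @ predictions[keep]
```

For example, two models with equal MAE contribute equally, while a model with MAE above the threshold is ignored entirely.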
Performance of each machine learning model when using the whole dataset.
| Model | Input features | MAE (years) |
| --- | --- | --- |
| SVR | WM data | 5.589 |
| SVR | GM data | 5.004 |
| SVR | GM+WM data | 4.571 |
| SVR | vol | 7.187 |
| LR | PC from GM data | 13.609 |
| LR | PC from WM data | 13.613 |
| GPR | curv | 7.200 |
| GPR | thk+vol | 6.385 |
| GPR | thk+vol+curv | 6.132 |
The results are presented as the mean MAE for a 5-fold cross-validation. WM, white matter volumetric map; GM, gray matter volumetric map; vol, regional volume; curv, regional mean curvature; thk, regional thickness; PC, principal components.
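The best whole-dataset row above (SVR on GM+WM, MAE 4.571) corresponds to fitting an SVR on the concatenation of the two volumetric maps and scoring it with 5-fold cross-validation. A minimal sketch with synthetic stand-in data (real maps have far more voxels than the placeholder feature counts here):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# Synthetic stand-ins for flattened GM and WM volume maps.
rng = np.random.default_rng(0)
n_subjects = 100
gm = rng.normal(size=(n_subjects, 50))
wm = rng.normal(size=(n_subjects, 50))
age = rng.uniform(18, 90, size=n_subjects)

# "GM+WM data" = simple feature concatenation of the two maps.
X = np.hstack([gm, wm])

# Standardization before the SVR is an assumption, not stated in the table.
model = make_pipeline(StandardScaler(), SVR())
scores = cross_val_score(model, X, age, cv=5,
                         scoring="neg_mean_absolute_error")
mean_mae = -scores.mean()  # mean MAE across the 5 folds
```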
Performance of the SVR model when using white matter + gray matter volumetric data from each specific site.
| Site | MAE (years) |
| --- | --- |
| 0 | 5.087 |
| 1 | 4.473 |
| 2 | 4.887 |
| 3 | 3.620 |
| 4 | 1.662 |
| 5 | 4.527 |
| 6 | 3.091 |
| 7 | 9.777 |
| 8 | 3.850 |
| 9 | 5.678 |
| 10 | 6.266 |
| 11 | 5.188 |
| 12 | 4.846 |
| 13 | 7.084 |
| 14 | 7.070 |
| 15 | 1.159 |
| 16 | 2.447 |
The results are presented as the mean MAE for a 3-fold cross-validation.
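The per-site results above come from training a separate model on each acquisition site's subjects rather than pooling all sites. A minimal sketch of that split-by-site loop, using synthetic data and three toy sites in place of the competition's seventeen:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

# Synthetic stand-ins: features, ages, and a site label per subject.
rng = np.random.default_rng(1)
n = 90
X = rng.normal(size=(n, 30))
age = rng.uniform(18, 90, size=n)
site = rng.integers(0, 3, size=n)  # toy stand-in for 17 acquisition sites

# Fit and score one SVR per site with 3-fold CV, as in the table above.
site_mae = {}
for s in np.unique(site):
    mask = site == s
    scores = cross_val_score(SVR(), X[mask], age[mask], cv=3,
                             scoring="neg_mean_absolute_error")
    site_mae[int(s)] = -scores.mean()
```

Splitting by site sidesteps scanner- and protocol-related differences between sites, at the cost of smaller per-model training sets.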
Performance of the resulting TPOT pipelines when using thickness, volume, and mean curvature information from each specific site separately.
| Site | TPOT pipeline | MAE (years) |
| --- | --- | --- |
| 0 | 3 Lasso + RVR + Ridge + RF | 5.557 |
| 1 | Lasso + KNR | 4.101 |
| 2 | ElasticNet + Extra Trees + Ridge | 4.721 |
| 3 | Linear SVR + RF | 4.027 |
| 4 | 2 Extra Trees + Ridge | 2.05 |
| 5 | RF | 6.667 |
| 6 | 2 GPR | 5.940 |
| 7 | 2 ElasticNet | 5.638 |
| 8 | ElasticNet + RF | 3.938 |
| 9 | Lasso + RF + Extra Trees | 6.685 |
| 10 | KNR + DT + Ridge | 9.210 |
| 11 | RVR | 4.213 |
| 12 | DT + Ridge | 4.375 |
| 13 | 2 RF + DT + Ridge | 10.155 |
| 14 | Extra Trees + 2 DT + LR + Ridge | 10.849 |
| 15 | LR | 1.861 |
| 16 | RF + ElasticNet + DT | 2.220 |
The results are presented as the mean MAE for a 5-fold cross-validation. Lasso, lasso model fit with least angle regression; RVR, relevance vector regressor; Ridge, linear least squares with l2 regularization; RF, random forest; KNR, K-neighbors regressor; DT, decision tree; GPR, Gaussian process regressor; LR, linear regression.
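TPOT evolves pipelines that chain several estimators, as in the "ElasticNet + RF" row above. The exported pipelines themselves are not reproduced here; the sketch below is an illustrative reconstruction of such a combination using scikit-learn's stacking, which is only one of the ways TPOT composes estimators:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import ElasticNet, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for regional thickness/volume/curvature features
# and a synthetic "age" target with a linear component.
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 20))
y = X[:, 0] * 5 + 50 + rng.normal(scale=2, size=120)

# An "ElasticNet + RF"-style combination; hyperparameters are illustrative.
stack = StackingRegressor(
    estimators=[("enet", ElasticNet(alpha=0.1)),
                ("rf", RandomForestRegressor(n_estimators=50, random_state=0))],
    final_estimator=Ridge(),
)
scores = cross_val_score(stack, X, y, cv=5,
                         scoring="neg_mean_absolute_error")
mean_mae = -scores.mean()  # mean MAE across the 5 folds
```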