Tim Hahn1, Jan Ernsting1,2, Nils R Winter1, Vincent Holstein1, Ramona Leenings1,2, Marie Beisemann3, Lukas Fisch1, Kelvin Sarink1, Daniel Emden1, Nils Opel1,4, Ronny Redlich1,5, Jonathan Repple1, Dominik Grotegerd1, Susanne Meinert1, Jochen G Hirsch6, Thoralf Niendorf7, Beate Endemann7, Fabian Bamberg8, Thomas Kröncke9, Robin Bülow10, Henry Völzke11, Oyunbileg von Stackelberg12,13, Ramona Felizitas Sowade12,13, Lale Umutlu14, Börge Schmidt14, Svenja Caspers15,16, Harald Kugel17, Tilo Kircher18, Benjamin Risse2, Christian Gaser19, James H Cole20,21, Udo Dannlowski1, Klaus Berger22.
Abstract
The deviation between chronological age and age predicted from neuroimaging data has been identified as a sensitive risk marker of cross-disorder brain changes, growing into a cornerstone of biological age research. However, machine learning models underlying the field do not consider uncertainty, thereby confounding results with training data density and variability. Also, existing models are commonly based on homogeneous training sets, often not independently validated, and cannot be shared because of data protection issues. Here, we introduce an uncertainty-aware, shareable, and transparent Monte Carlo dropout composite quantile regression (MCCQR) Neural Network trained on N = 10,691 datasets from the German National Cohort. The MCCQR model provides robust, distribution-free uncertainty quantification in high-dimensional neuroimaging data, achieving lower error rates compared with existing models. In two examples, we demonstrate that it prevents spurious associations and increases power to detect deviant brain aging. We make the pretrained model and code publicly available.
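The abstract combines Monte Carlo (MC) dropout with composite quantile regression: each stochastic forward pass predicts a set of quantiles, so variability across passes reflects model (epistemic) uncertainty while the quantile spread within a pass reflects data (aleatoric) uncertainty. The following toy sketch illustrates that decomposition with simulated arrays standing in for a trained network's forward passes; all names and the specific decomposition are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

T, Q = 100, 9                           # MC dropout passes, quantiles per pass
levels = np.linspace(0.1, 0.9, Q)       # composite quantile levels (10%..90%)

# Stand-in for a trained network: the "true" brain-age quantiles for one
# subject, estimated from a simulated age distribution ...
base = np.quantile(rng.normal(55.0, 4.0, 100_000), levels)
# ... perturbed independently on each pass to mimic dropout-induced noise.
preds = base + rng.normal(0.0, 0.5, (T, Q))

point = preds[:, Q // 2].mean()         # median quantile, averaged over passes
epistemic = preds[:, Q // 2].std()      # spread across passes -> model uncertainty
aleatoric = preds.mean(axis=0)[-1] - preds.mean(axis=0)[0]  # 10-90% interval width
```

Because the quantiles are estimated directly rather than derived from a Gaussian assumption, the resulting predictive interval is distribution-free, matching the property claimed in the abstract.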
Year: 2022 PMID: 34985964 PMCID: PMC8730629 DOI: 10.1126/sciadv.abg9471
Source DB: PubMed Journal: Sci Adv ISSN: 2375-2548 Impact factor: 14.136
Fig. 1. Example data illustrating the effects of adjusting the BAG for individual uncertainty.
Left: A regression model (solid line) with an uncertainty estimate (e.g., 95% predictive interval; dotted lines), trained on toy data with varying density and variability (light gray), was applied to three test samples (dark gray). BAG is defined as a test sample's distance from the regression line. Right: Uncertainty adjustment increases BAG in areas of low uncertainty (left-most test sample) and decreases it in areas of high uncertainty (right-most test sample). σa, aleatory uncertainty; σe, epistemic uncertainty.
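The caption describes scaling the raw brain-age gap (BAG) by each subject's predictive uncertainty, so that the same raw deviation counts for more where the model is confident and for less where it is not. A minimal sketch of that idea, assuming a simple division by the per-subject uncertainty (a z-score-like quantity; the paper's exact adjustment formula is not reproduced here):

```python
import numpy as np

def adjusted_bag(predicted_age, chronological_age, sigma):
    """Uncertainty-adjusted brain-age gap.

    Raw BAG (predicted minus chronological age) is divided by the
    subject-specific predictive uncertainty `sigma`. This scaling is an
    illustrative assumption, not the paper's exact formula.
    """
    bag = np.asarray(predicted_age, dtype=float) - np.asarray(chronological_age, dtype=float)
    return bag / np.asarray(sigma, dtype=float)

# Three toy subjects with identical raw BAG (+5 years) but increasing
# uncertainty: the adjusted gap shrinks as uncertainty grows.
gaps = adjusted_bag([60, 60, 60], [55, 55, 55], [1.0, 2.5, 5.0])
```

This reproduces the figure's qualitative behavior: low uncertainty amplifies the adjusted BAG, high uncertainty attenuates it.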
MAE for all models, cross-validation schemes, and independent validation samples.
CV, cross-validation. For cross-validation, SD across folds is given in parentheses.
| Model | CV | Leave-site-out CV | BiDirect | MACS | IXI |
|---|---|---|---|---|---|
| RVR | 3.37 (0.16) | 3.32 (0.13) | 3.60 | 5.07 | 4.91 |
| GPR | 3.05 (0.22) | 3.09 (0.11) | 3.74 | 4.15 | 5.03 |
| SVM | 3.05 (0.22) | 3.09 (0.11) | 3.74 | 4.15 | 5.03 |
| SVM-rbf | 4.19 (0.27) | 4.16 (0.16) | 4.79 | 9.92 | 8.10 |
| LASSO | 4.25 (0.30) | 4.19 (0.12) | 4.44 | 8.35 | 6.94 |
| ANN | 3.10 (0.14) | 3.02 (0.15) | 3.56 | 3.76 | 4.48 |
| MCCQR | 2.94 (0.22) | 2.95 (0.16) | 3.45 | 3.91 | 4.57 |
Fig. 2. PICP (prediction interval coverage probability) for leave-site-out GNC and independent validation samples (BiDirect, MACS, and IXI) for the RVR, the GPR, and our MCCQR Neural Network.
Underestimation (overestimation) of uncertainty occurs when empirical PICPs fall below (above) the optimal PICP, indicated by the solid line.
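The empirical PICP plotted in Fig. 2 is simply the fraction of held-out observations that fall inside their nominal predictive intervals. A small sketch of that computation, using simulated data and a fixed 95% Gaussian interval as a stand-in for model-derived bounds:

```python
import numpy as np

def picp(y_true, lower, upper):
    """Empirical prediction interval coverage probability:
    the fraction of observations inside [lower, upper]."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    return float(np.mean((y_true >= lower) & (y_true <= upper)))

# Toy check: a nominal 95% interval should cover roughly 95% of draws.
rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, 10_000)
coverage = picp(y, np.full_like(y, -1.96), np.full_like(y, 1.96))
```

A well-calibrated model yields empirical coverage close to the nominal level; values below it indicate underestimated uncertainty, values above it overestimated uncertainty, as the caption states.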