Yiwang Zhou, Lu Zhao, Nina Zhou, Yi Zhao, Simeone Marino, Tuo Wang, Hanbo Sun, Arthur W Toga, Ivo D Dinov.
Abstract
The UK Biobank is a rich national health resource that provides enormous opportunities for international researchers to examine, model, and analyze census-like multisource healthcare data. The archive presents several challenges related to aggregation and harmonization of complex data elements, feature heterogeneity and salience, and health analytics. Using 7,614 imaging, clinical, and phenotypic features of 9,914 subjects we performed deep computed phenotyping using unsupervised clustering and derived two distinct sub-cohorts. Using parametric and nonparametric tests, we determined the top 20 most salient features contributing to the cluster separation. Our approach generated decision rules to predict the presence and progression of depression or other mental illnesses by jointly representing and modeling the significant clinical and demographic variables along with the derived salient neuroimaging features. We reported consistency and reliability measures of the derived computed phenotypes and the top salient imaging biomarkers that contributed to the unsupervised clustering. This clinical decision support system identified and utilized holistically the most critical biomarkers for predicting mental health, e.g., depression. External validation of this technique on different populations may lead to reducing healthcare expenses and improving the processes of diagnosis, forecasting, and tracking of normal and pathological aging.
Year: 2019 PMID: 30979917 PMCID: PMC6461626 DOI: 10.1038/s41598-019-41634-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Clustering optimization based on the average Silhouette value for (a) k-means clustering and (b) hierarchical clustering. The optimal number of clusters is two, which maximizes the average Silhouette value for both k-means and hierarchical clustering.
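The selection criterion in Figure 1 can be illustrated with a minimal sketch of the average Silhouette value. It assumes Euclidean distance and plain Python tuples as points; the function name `silhouette_mean` and the toy coordinates are hypothetical and are not taken from the study. For each point, `a` is the mean distance to the other members of its own cluster and `b` is the smallest mean distance to any other cluster, so the value is `(b - a) / max(a, b)`.

```python
import math


def silhouette_mean(points, labels):
    """Average Silhouette value for points (tuples of floats) and cluster labels.

    Assumes at least two distinct labels. Points in singleton clusters get 0.
    """
    n = len(points)
    clusters = sorted(set(labels))
    scores = []
    for i in range(n):
        # Mean distance from point i to the members of each cluster (excluding itself).
        mean_dist = {}
        for c in clusters:
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                total = sum(math.dist(points[i], points[j]) for j in members)
                mean_dist[c] = total / len(members)
        own = labels[i]
        if own not in mean_dist:
            scores.append(0.0)  # singleton cluster
            continue
        a = mean_dist[own]
        b = min(d for c, d in mean_dist.items() if c != own)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n


# Toy data (not from UK Biobank): two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
two_clusters = [0, 0, 0, 1, 1, 1]
three_clusters = [0, 0, 2, 1, 1, 1]  # one group split artificially

candidates = {2: two_clusters, 3: three_clusters}
best_k = max(candidates, key=lambda k: silhouette_mean(points, candidates[k]))
print("number of clusters with the highest average Silhouette:", best_k)
```

In practice the candidate labelings would come from running the clustering algorithm for each number of clusters; here they are written by hand to keep the sketch short.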
Figure 2. Panel a: Multidimensional scaling (MDS) for neuroimaging biomarkers with clustering labels generated by k-means clustering. MCR is the misclassification rate based on the 1,000 k-means clustering experiments. Panels b and c: Two-dimensional plots of (b) PCA and (c) t-SNE for the brain neuroimaging biomarkers with the clustering label generated by k-means clustering.
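The caption does not spell out how the MCR is computed. One plausible reading, offered here only as an assumption, is that each repeated k-means run is compared against a reference labeling and the fraction of disagreeing subjects is averaged over runs. Because cluster labels are arbitrary, the comparison has to allow for the two labels being swapped. The sketch below covers the two-cluster case only; `misclassification_rate` and the toy labelings are hypothetical.

```python
def misclassification_rate(reference, run):
    """Fraction of items labelled differently, allowing the two labels to be swapped.

    Both sequences must use the same two labels (for example 0 and 1).
    """
    if len(reference) != len(run):
        raise ValueError("labelings must have the same length")
    disagree = sum(r != s for r, s in zip(reference, run)) / len(reference)
    # If more than half disagree, the run simply used the opposite label names.
    return min(disagree, 1.0 - disagree)


# Toy labelings (not from UK Biobank).
reference = [0, 0, 0, 1, 1, 1]
runs = [
    [0, 0, 0, 1, 1, 1],  # identical
    [1, 1, 1, 0, 0, 0],  # same partition, labels swapped
    [0, 0, 1, 1, 1, 1],  # one subject assigned differently
]
rates = [misclassification_rate(reference, run) for run in runs]
mean_mcr = sum(rates) / len(rates)
print("mean MCR over runs:", mean_mcr)
```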
Figure 3. Three-dimensional plots of (a) PCA and (b) t-SNE for the brain neuroimaging biomarkers with the red (cluster 1) and blue (cluster 2) clustering labels generated by k-means clustering.
Figure 4. Density plots of the scaled top twenty brain neuroimaging biomarkers with the clustering label generated by k-means clustering. Details about the specific FreeSurfer[8] derivation and interpretation of the neuroimaging biomarkers listed in Table 1 and shown in Fig. 4 are available online at https://surfer.nmr.mgh.harvard.edu/fswiki/FsTutorial/AnatomicalROI and https://surfer.nmr.mgh.harvard.edu/fswiki/FsTutorial/AnatomicalROI/FreeSurferColorLUT.
Table 1. Summary statistics of the unscaled values for the top twenty brain neuroimaging biomarkers separating clusters 1 and 2 (computed phenotypes 1 and 2).
| Name | Cluster 1 Mean | Cluster 1 Median | Cluster 1 SD | Cluster 2 Mean | Cluster 2 Median | Cluster 2 SD | Significance |
|---|---|---|---|---|---|---|---|
| rh_BA_exvivo_area__rh_WhiteSurfArea_area | 83,840 | 84,354 | 4,687 | 96,861 | 95,814 | 5,283 | *** |
| lh_BA_exvivo_area__lh_WhiteSurfArea_area | 83,571 | 84,190 | 4,685 | 96,467 | 95,423 | 5,260 | *** |
| rh_aparc_area__rh_WhiteSurfArea_area | 78,686 | 79,209 | 4,556 | 91,279 | 90,232 | 5,126 | *** |
| rh_aparc.a2009s_area__rh_WhiteSurfArea_area | 78,705 | 79,231 | 4,556 | 91,299 | 90,238 | 5,125 | *** |
| lh_aparc_area__lh_WhiteSurfArea_area | 78,418 | 79,022 | 4,545 | 90,870 | 89,864 | 5,103 | *** |
| lh_aparc.a2009s_area__lh_WhiteSurfArea_area | 78,437 | 79,037 | 4,545 | 90,891 | 89,890 | 5,103 | *** |
| aseg__SupraTentorialVol | 944,195 | 948,603 | 62,451 | 1,102,286 | 1,093,828 | 72,318 | *** |
| aseg__SupraTentorialVolNotVent | 921,038 | 925,358 | 61,368 | 1,072,067 | 1,063,706 | 70,466 | *** |
| aseg__SupraTentorialVolNotVentVox | 918,633 | 922,839 | 61,275 | 1,069,254 | 1,060,989 | 70,261 | *** |
| aseg__BrainSegVol | 1,077,598 | 1,082,266 | 69,154 | 1,247,680 | 1,238,569 | 79,105 | *** |
| aseg__BrainSegVolNotVent | 1,050,639 | 1,055,186 | 67,942 | 1,213,003 | 1,204,173 | 77,138 | *** |
| aseg__BrainSegVolNotVentSurf | 1,050,038 | 1,054,468 | 67,912 | 1,212,341 | 1,203,650 | 77,144 | *** |
| aseg__CortexVol | 431,767 | 433,836 | 28,159 | 496,015 | 492,066 | 31,309 | *** |
| aseg__rhCortexVol | 216,033 | 217,094 | 14,119 | 248,268 | 246,293 | 15,757 | *** |
| aseg__MaskVol | 1,479,307 | 1,482,621 | 97,626 | 1,700,818 | 1,691,706 | 107,502 | *** |
| aseg__lhCortexVol | 215,734 | 216,893 | 14,237 | 247,747 | 245,918 | 15,738 | *** |
| aseg__TotalGrayVol | 590,111 | 592,534 | 36,298 | 669,907 | 665,861 | 40,069 | *** |
| rh_aparc.DKTatlas_area__rh_superiortemporal_area | 4,411 | 4,414 | 329 | 5,038 | 5,005 | 385 | *** |
| rh_aparc.DKTatlas_area__rh_superiorfrontal_area | 8,055 | 8,034 | 751 | 9,475 | 9,382 | 887 | *** |
| lh_aparc.DKTatlas_area__lh_superiortemporal_area | 4,723 | 4,716 | 387 | 5,459 | 5,411 | 472 | *** |
Significance code: *** p-value < 1 × 10⁻⁸. The p-values were calculated using Mann-Whitney-Wilcoxon tests.
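As a rough illustration of the kind of test behind the significance column, the sketch below computes a two-sided rank-sum (Mann-Whitney-Wilcoxon) test with the normal approximation. It is not the authors' code: the function name `rank_sum_test` is hypothetical, no continuity correction is applied, and the sample values are invented. Ties receive average ranks and the variance is tie-corrected.

```python
import math


def rank_sum_test(x, y):
    """Two-sided rank-sum test via the normal approximation.

    Returns (U statistic for x, p-value). Fails if every value is identical,
    because the variance of U is then zero.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        t = j - i                     # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return u, p


# Invented values standing in for one biomarker in two clusters.
cluster_1 = [1.0, 2.0, 3.0]
cluster_2 = [4.0, 5.0, 6.0]
u_stat, p_value = rank_sum_test(cluster_1, cluster_2)
print("U =", u_stat, "p =", p_value)
```

With samples this small an exact test would normally be preferred; the normal approximation is used only to keep the sketch free of external dependencies.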
Figure 5. Mosaic plots for some of the significantly different categorical features detected by Chi-square test and Fisher’s exact test. The six parts of the figure include (a) Sex; (b) Sensitivity/hurt feelings; (c) Worrier/anxious feelings; (d) Risk taking; (e) Ever depressed for a whole week; and (f) Sleeplessness/insomnia. The standard residuals, reported in the right margins, indicate the significance of the differences.
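To make the residuals in the mosaic plots concrete, here is a minimal sketch of a Chi-square test on a contingency table of cluster by category. It assumes Pearson residuals, `(observed - expected) / sqrt(expected)`, which is one common choice for shading mosaic plots; the source does not state which residual definition was used. The function name `chi_square_table` and the counts are hypothetical.

```python
import math


def chi_square_table(observed):
    """Pearson chi-square statistic, degrees of freedom, and Pearson residuals.

    `observed` is a list of rows of non-negative counts with no empty row or column.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    residuals = []
    for i, row in enumerate(observed):
        res_row = []
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            res = (obs - expected) / math.sqrt(expected)
            res_row.append(res)
            statistic += res * res
        residuals.append(res_row)
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return statistic, df, residuals


# Invented counts: rows are clusters, columns are the two levels of a feature.
table = [[30, 10],
         [10, 30]]
stat, df, res = chi_square_table(table)

# For one degree of freedom the p-value has a closed form via erfc.
p_value = math.erfc(math.sqrt(stat / 2.0)) if df == 1 else None
print("chi-square =", stat, "df =", df, "p =", p_value)
```

Cells with large positive or negative residuals are the ones that drive the overall statistic, which is what the shading in a mosaic plot conveys.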
Figure 6. Variable importance plots for four different outcome predictions: (a) sensitivity/hurt feelings; (b) ever depressed for a whole week; (c) worrier/anxious feelings; and (d) miserableness, based on the mean decrease in Gini values from random forest.
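The importance measure in Figure 6 builds on the decrease in Gini impurity produced by a single split. The sketch below shows that building block only, under the usual definition of Gini impurity; the function names and toy labels are hypothetical. In a random forest, the decreases from all splits on a given feature are accumulated within each tree and averaged over trees to give the mean decrease in Gini.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(labels, goes_left):
    """Impurity decrease when `labels` is split by the boolean mask `goes_left`."""
    left = [y for y, flag in zip(labels, goes_left) if flag]
    right = [y for y, flag in zip(labels, goes_left) if not flag]
    n = len(labels)
    weighted_child = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted_child


# Toy node with two classes (not study data).
labels = [0, 0, 1, 1]
informative = gini_decrease(labels, [True, True, False, False])    # separates the classes
uninformative = gini_decrease(labels, [True, False, True, False])  # leaves both children mixed
print("informative split:", informative, "uninformative split:", uninformative)
```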
Cross-validated random forest prediction results for “sensitivity/hurt feelings,” “ever depressed for a whole week,” “worrier/anxious feelings,” and “miserableness.”
| Outcome | Accuracy | 95% CI of Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|
| Sensitivity/hurt feelings | 0.720 | (0.686, 0.753) | 0.684 | 0.754 |
| Ever depressed for a whole week | 0.778 | (0.746, 0.807) | 0.912 | 0.640 |
| Worrier/anxious feelings | 0.739 | (0.706, 0.771) | 0.723 | 0.755 |
| Miserableness | 0.743 | (0.710, 0.775) | 0.867 | 0.550 |
Random forest prediction results for “sensitivity/hurt feelings,” “ever depressed for a whole week,” “worrier/anxious feelings,” and “miserableness” in the testing dataset.
| Outcome | Accuracy | 95% CI of Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|
| Sensitivity/hurt feelings | 0.708 | (0.677, 0.737) | 0.690 | 0.724 |
| Ever depressed for a whole week | 0.773 | (0.745, 0.800) | 0.908 | 0.624 |
| Worrier/anxious feelings | 0.725 | (0.695, 0.754) | 0.735 | 0.716 |
| Miserableness | 0.747 | (0.718, 0.775) | 0.880 | 0.521 |
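The quantities reported in the two prediction tables can all be derived from a 2 × 2 confusion matrix. The sketch below shows one way to do so. The source does not say how the 95% confidence interval for accuracy was obtained, so a Wilson score interval is used here as a stand-in; the function name `classification_summary`, the choice of positive class, and the counts are hypothetical.

```python
import math


def classification_summary(tp, fn, fp, tn, z=1.96):
    """Accuracy with a Wilson score interval, plus sensitivity and specificity.

    tp/fn count the positive-class cases predicted correctly/incorrectly;
    tn/fp do the same for the negative class. z=1.96 gives a ~95% interval.
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Wilson score interval for a binomial proportion (assumed method).
    denom = 1.0 + z * z / n
    centre = (accuracy + z * z / (2 * n)) / denom
    half = z * math.sqrt(accuracy * (1 - accuracy) / n + z * z / (4 * n * n)) / denom
    return {
        "accuracy": accuracy,
        "ci": (centre - half, centre + half),
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Invented confusion-matrix counts for illustration only.
summary = classification_summary(tp=40, fn=10, fp=20, tn=30)
print(summary)
```

A pattern like the one in the tables, with high sensitivity and lower specificity for "ever depressed for a whole week" and "miserableness," corresponds to a confusion matrix in which false positives are more common than false negatives.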