Sandra Vieira1, Qi-Yong Gong2,3, Walter H L Pinaya1,4, Cristina Scarpazza1,5, Stefania Tognin1, Benedicto Crespo-Facorro6,7, Diana Tordesillas-Gutierrez6,8, Victor Ortiz-García6,7, Esther Setien-Suero6,7, Floortje E Scheepers9, Neeltje E M Van Haren10, Tiago R Marques1, Robin M Murray1, Anthony David1, Paola Dazzan1, Philip McGuire1, Andrea Mechelli1.
Abstract
Despite the high level of interest in the use of machine learning (ML) and neuroimaging to detect psychosis at the individual level, the reliability of the findings is unclear due to potential methodological issues that may have inflated the existing literature. This study aimed to elucidate the extent to which the application of ML to neuroanatomical data allows detection of first episode psychosis (FEP), while putting in place methodological precautions to avoid overoptimistic results. We tested both traditional ML and an emerging approach known as deep learning (DL) using 3 feature sets of interest: (1) surface-based regional volumes and cortical thickness, (2) voxel-based gray matter volume (GMV) and (3) voxel-based cortical thickness (VBCT). To assess the reliability of the findings, we repeated all analyses in 5 independent datasets, totaling 956 participants (514 FEP and 444 within-site matched controls). The performance was assessed via nested cross-validation (CV) and cross-site CV. Accuracies ranged from 50% to 70% for surface-based features; from 50% to 63% for GMV; and from 51% to 68% for VBCT. The best accuracies (70%) were achieved when DL was applied to surface-based features; however, these models generalized poorly to other sites. Findings from this study suggest that, when methodological precautions are adopted to avoid overoptimistic results, detection of individuals in the early stages of psychosis is more challenging than originally thought. In light of this, we argue that the current evidence for the diagnostic value of ML and structural neuroimaging should be reconsidered toward a more cautious interpretation.
Keywords: psychosis; multivariate pattern recognition/classification; neuroimaging/multi-site
Year: 2020 PMID: 30809667 PMCID: PMC6942152 DOI: 10.1093/schbul/sby189
Source DB: PubMed Journal: Schizophr Bull ISSN: 0586-7614 Impact factor: 7.348
Demographic and Clinical Characteristics for FEP and HC for Each Site
| | Chengdu, China (N = 222) | | London, England (N = 142) | | Santander A, Spain (N = 220) | | Santander B, Spain (N = 210) | | Utrecht, The Netherlands (N = 162) | |
|---|---|---|---|---|---|---|---|---|---|---|
| | HC | FEP | HC | FEP | HC | FEP | HC | FEP | HC | FEP |
| N | 111 | 111 | 71 | 71 | 110 | 110 | 70 | 140 | 81 | 81 |
| Gender (%): M | 51 (46) | 51 (46) | 36 (51) | 36 (51) | 68 (62) | 68 (62) | 45 (64) | 90 (64) | 64 (79) | 64 (79) |
| Gender (%): F | 61 (54) | 61 (54) | 35 (49) | 35 (49) | 42 (38) | 42 (38) | 25 (36) | 50 (36) | 17 (21) | 17 (21) |
| Gender, HC vs FEP | χ2 = ns | | χ2 = ns | | χ2 = ns | | χ2 = ns | | χ2 = ns | |
| Age M (SD) | 27.2 (7.3) | 25.7 (8.1) | 26.8 (7.1) | 26.4 (6.2) | 29.7 (7.8) | 28.5 (8.6) | 27.3 (7.5) | 28.3 (7.6) | 26.9 (8.0) | 25.2 (5.9) |
| TIV (L) M (SD) | 1.5 (0.1) | 1.5 (0.2) | 1.5 (0.2) | 1.5 (0.2) | 1.5 (0.1) | 1.4 (0.2) | 1.5 (0.1) | 1.5 (0.1) | 1.6 (0.1) | 1.5 (0.2) |
| Positive symptoms M (SD) | — | 24.6 (6.6)a | — | 13.9 (5.5)a | — | 14.7 (4.6)b | — | 14.4 (4.1)b | — | 15.9 (6.3)a |
| Negative symptoms M (SD) | — | 18.2 (7.7)a | — | 16.0 (6.0)a | — | 6.3 (4.6)c | — | 6.1 (5.0)c | — | 16.2 (6.9)a |
| Duration of illness (years) Med (IQR) | — | 0.3 (1.1) | — | 1.1 (0.3) | — | 0.3 (0.7) | — | 0.3 (0.9) | — | 0.6 (1.0) |
Note: TIV, total intracranial volume; L, liters; M, male (in the Gender rows) or mean (in "M (SD)"); F, female; FEP, first episode psychosis; HC, healthy controls; SD, standard deviation; Med, median; IQR, interquartile range.
a PANSS: Positive and Negative Syndrome Scale.
b SAPS: Scale for the Assessment of Positive Symptoms.
c SANS: Scale for the Assessment of Negative Symptoms.
ns: P > .05
Fig. 1. Three feature sets were extracted from each image: GMV, VBCT, and FreeSurfer surface-based regional volumes and cortical thickness. The dimensionality of GMV and VBCT was reduced through PCA. The resulting features were analyzed with four classifiers: (a) SVM, (b) LR, (c) KNN and (d) DNN. GMV, gray matter volume; VBCT, voxel-based cortical thickness; PCA, principal component analysis; SVM, support vector machine; LR, logistic regression; KNN, k-nearest neighbors; DNN, deep neural network.
Fig. 2. Schematic representation of nested CV. Nested CV involves a secondary inner CV loop that uses only the training data from the primary outer CV split and in which different sets of hyperparameters are tested (eg, different values for the C parameter for SVM). The hyperparameters that perform best across the 10 inner folds are then used to train a model on the whole training set defined by the outer loop. This model is then tested on the test set of the outer loop. The final performance is estimated by averaging the test-set accuracies across all 10 outer folds. CV, cross-validation; SVM, support vector machine.
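The nested CV procedure in this caption can be sketched in a few lines of standard-library Python. This is an illustrative sketch only, not the authors' pipeline: it assumes a toy two-feature synthetic dataset, and it swaps in a k-nearest-neighbors classifier whose number of neighbors `k` plays the role of the tuned hyperparameter (in place of the SVM C parameter mentioned above). All function names (`stratified_folds`, `knn_predict`, `nested_cv`, `make_toy_data`) are hypothetical.

```python
import random
from collections import Counter


def stratified_folds(labels, n_folds, rng):
    """Split sample indices into n_folds folds with similar class balance."""
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training samples (squared Euclidean)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(tx, x)), ty)
        for tx, ty in zip(train_X, train_y)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(X, y, train_idx, test_idx, k):
    """Fit on train_idx (k-NN just stores the data) and score on test_idx."""
    train_X = [X[i] for i in train_idx]
    train_y = [y[i] for i in train_idx]
    hits = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
    return hits / len(test_idx)


def nested_cv(X, y, k_grid=(1, 3, 5, 7), n_outer=10, n_inner=10, seed=0):
    """Mean outer-fold accuracy, with k chosen inside each outer training set."""
    rng = random.Random(seed)
    outer_folds = stratified_folds(y, n_outer, rng)
    outer_scores = []
    for o, test_idx in enumerate(outer_folds):
        train_idx = [i for f, fold in enumerate(outer_folds) if f != o for i in fold]
        # Inner folds hold positions into train_idx, so the outer test set
        # is never seen while hyperparameters are being compared.
        inner_folds = stratified_folds([y[i] for i in train_idx], n_inner, rng)

        def inner_score(k):
            scores = []
            for v, val_pos in enumerate(inner_folds):
                val = [train_idx[p] for p in val_pos]
                fit = [train_idx[p]
                       for f, fold in enumerate(inner_folds) if f != v
                       for p in fold]
                scores.append(accuracy(X, y, fit, val, k))
            return sum(scores) / len(scores)

        best_k = max(k_grid, key=inner_score)
        # Retrain on the whole outer training set with the selected k.
        outer_scores.append(accuracy(X, y, train_idx, test_idx, best_k))
    return sum(outer_scores) / len(outer_scores)


def make_toy_data(n_per_class=40, seed=1):
    """Two overlapping Gaussian classes; purely synthetic, for illustration."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, 0.0), (1, 1.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            y.append(label)
    return X, y


X, y = make_toy_data()
mean_accuracy = nested_cv(X, y)
print(f"nested CV accuracy: {mean_accuracy:.3f}")
```

The point of the design is that the hyperparameter is selected separately inside each outer training set, so the outer test folds contribute nothing to model selection and the averaged accuracy is not inflated by tuning.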
Accuracies (Sensitivity/Specificity) for Each Feature Set and Algorithm Across All Sites Using Nested 10-fold Stratified Cross-Validation. The Classifier Yielding the Best Balanced Accuracy Is Highlighted in Bold for Each Site
| Site | Algorithm | Regional volumes and cortical thickness | GMV | VBCT |
|---|---|---|---|---|
| Site 1 | KNN | 60.7** (74.3/47.1) | | 62.1** (72.1/52.1) |
| | LR | 61.9** (64.9/58.9) | 60.1** (62.9/58.6) | |
| | SVM | 61.3** (66.4/56.2) | | 52.7* (24.6/97.3) |
| | DNN | | 57.7** (59.5/56.0) | 66.4** (63.9/68.3) |
| Site 2 | KNN | 56.7 (50.9/62.5) | 43.9 (33.6/54.3) | 53.5 (38.4/68.6) |
| | LR | 51.6 (45.0/58.2) | 51.9 (53.8/50.0) | |
| | SVM | 45.9 (49.3/42.5) | | 51.0 (96.3/5.7) |
| | DNN | | 40.8 (47.4/34.3) | 53.4 (52.4/55.3) |
| Site 3 | KNN | 59.6** (45.5/73.6) | 50.5 (31.8/69.1) | 58.0* (50.0/66.4) |
| | LR | 58.6* (58.2/59.1) | 63.2** (63.6/62.7) | 59.1* (58.2/60.0) |
| | SVM | 60.5** (61.8/59.1) | | 51.8* (90.9/12.7) |
| | DNN | | 50.2 (52.7/63.6) | |
| Site 4 | KNN | 56.6* (91.8/21.4) | 58.9** (70.7/47.1) | 59.5* (67.7/51.1) |
| | LR | 54.8 (73.9/35.7) | 59.6** (57.8/61.4) | |
| | SVM | 56.0 (65.0/47.1) | 57.4* (71.9/42.9) | 58.4* (71.9/52.9) |
| | DNN | | | 58.8** (62.4/53.1) |
| Site 5 | KNN | 52.7 (53.6/51.8) | 54.5 (33.8/75.3) | 52.2 (36.5/67.9) |
| | LR | 58.5* (61.7/55.4) | 61.3** (56.8/65.7) | |
| | SVM | | | 56.3 (51.2/61.4) |
| | DNN | 54.9 (59.2/51.8) | 58.0** (58.1/57.9) | 60.1** (56.1/64.2) |
Note: SVM, support vector machine; LR, logistic regression; KNN, k-nearest neighbors; DNN, deep neural network; GMV, voxel-based gray matter volume; VBCT, voxel-based cortical thickness.
*P < .05; **P < .01.
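As a reading aid for the "accuracy (sensitivity/specificity)" entries, the sketch below assumes that the reported accuracy is balanced accuracy, that is, the mean of sensitivity and specificity. This matches the table title and entries such as the Site 1 KNN value 60.7 (74.3/47.1), but it is an assumption rather than something stated for every cell. The function names and the confusion-matrix counts in the second example are hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity and specificity (in %) from confusion-matrix counts.

    tp/fn count patients classified correctly/incorrectly;
    tn/fp count controls classified correctly/incorrectly.
    """
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    return sensitivity, specificity


def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity, on the same scale as the inputs."""
    return (sensitivity + specificity) / 2.0


# Site 1, KNN entry from the table: 60.7 (74.3/47.1).
site1_knn = balanced_accuracy(74.3, 47.1)
print(round(site1_knn, 1))

# Hypothetical counts: 30 of 40 patients and 20 of 40 controls correct.
sens, spec = sensitivity_specificity(tp=30, fn=10, tn=20, fp=20)
print(sens, spec, balanced_accuracy(sens, spec))
```

Balanced accuracy is informative here because some entries combine very high sensitivity with very low specificity, for example 51.0 (96.3/5.7): a classifier that labels nearly everyone as a patient still scores close to 50%.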
Fig. 3. (A) Accuracy of diagnostic sMRI ML studies over time and sample size (circle size increases with sample size). From the first study until 2015, the vast majority of studies reported accuracies between 70% and 100%; from 2016, however, performance has dropped overall, with accuracies ranging between chance level and 85%. (B) Funnel plot for sMRI studies in schizophrenia and FEP showing the distribution of individual studies according to their sample size (1/√ESS) and effect size (log diagnostic odds ratio). The plot revealed a statistically significant asymmetric distribution around the main effect of sMRI studies (P = .013), indicating a bias favoring higher effect sizes. sMRI, structural MRI; ML, machine learning; FEP, first episode psychosis; ESS, effective sample size.
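The two funnel-plot axes in panel B can be computed per study as sketched below. The caption does not give the formulas, so this sketch assumes the definitions commonly used in Deeks' funnel plot asymmetry test: the diagnostic odds ratio is (TP × TN) / (FP × FN), and the effective sample size is 4·n1·n2 / (n1 + n2) for n1 patients and n2 controls. The function names, the 0.5 continuity correction, and the example counts are hypothetical and are not taken from the studies in the figure.

```python
import math


def log_diagnostic_odds_ratio(tp, fn, tn, fp, correction=0.5):
    """Natural log of the diagnostic odds ratio (TP*TN)/(FP*FN).

    If any cell is zero, a continuity correction is added to all four
    cells so that the ratio stays finite (an assumed convention).
    """
    if 0 in (tp, fn, tn, fp):
        tp, fn, tn, fp = (c + correction for c in (tp, fn, tn, fp))
    return math.log((tp * tn) / (fp * fn))


def inverse_root_ess(n_patients, n_controls):
    """1/sqrt(ESS), with ESS = 4*n1*n2/(n1 + n2) (assumed definition)."""
    ess = 4.0 * n_patients * n_controls / (n_patients + n_controls)
    return 1.0 / math.sqrt(ess)


# Hypothetical study: 50 patients (35 detected), 50 controls (30 correct).
log_dor = log_diagnostic_odds_ratio(tp=35, fn=15, tn=30, fp=20)
precision_axis = inverse_root_ess(n_patients=50, n_controls=50)
print(log_dor, precision_axis)
```

Under these definitions, smaller studies sit at larger values of 1/√ESS, so an asymmetric funnel indicates that the smaller studies tend to report the larger effect sizes.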