Abraham Nunes, Hugo G Schnack, Christopher R K Ching, Ingrid Agartz, Theophilus N Akudjedu, Martin Alda, Dag Alnæs, Silvia Alonso-Lana, Jochen Bauer, Bernhard T Baune, Erlend Bøen, Caterina Del Mar Bonnin, Geraldo F Busatto, Erick J Canales-Rodríguez, Dara M Cannon, Xavier Caseras, Tiffany M Chaim-Avancini, Udo Dannlowski, Ana M Díaz-Zuluaga, Bruno Dietsche, Nhat Trung Doan, Edouard Duchesnay, Torbjørn Elvsåshagen, Daniel Emden, Lisa T Eyler, Mar Fatjó-Vilas, Pauline Favre, Sonya F Foley, Janice M Fullerton, David C Glahn, Jose M Goikolea, Dominik Grotegerd, Tim Hahn, Chantal Henry, Derrek P Hibar, Josselin Houenou, Fleur M Howells, Neda Jahanshad, Tobias Kaufmann, Joanne Kenney, Tilo T J Kircher, Axel Krug, Trine V Lagerberg, Rhoshel K Lenroot, Carlos López-Jaramillo, Rodrigo Machado-Vieira, Ulrik F Malt, Colm McDonald, Philip B Mitchell, Benson Mwangi, Leila Nabulsi, Nils Opel, Bronwyn J Overs, Julian A Pineda-Zapata, Edith Pomarol-Clotet, Ronny Redlich, Gloria Roberts, Pedro G Rosa, Raymond Salvador, Theodore D Satterthwaite, Jair C Soares, Dan J Stein, Henk S Temmingh, Thomas Trappenberg, Anne Uhlmann, Neeltje E M van Haren, Eduard Vieta, Lars T Westlye, Daniel H Wolf, Dilara Yüksel, Marcus V Zanetti, Ole A Andreassen, Paul M Thompson, Tomas Hajek.
Abstract
Bipolar disorders (BDs) are among the leading causes of morbidity and disability. Objective biological markers, such as those based on brain imaging, could aid in clinical management of BD. Machine learning (ML) brings neuroimaging analyses to individual subject level and may potentially allow for their diagnostic use. However, fair and optimal application of ML requires large, multi-site datasets. We applied ML (support vector machines) to MRI data (regional cortical thickness, surface area, subcortical volumes) from 853 BD and 2167 control participants from 13 cohorts in the ENIGMA consortium. We attempted to differentiate BD from control participants, investigated different data handling strategies and studied the neuroimaging/clinical features most important for classification. Individual site accuracies ranged from 45.23% to 81.07%. Aggregate subject-level analyses yielded the highest accuracy (65.23%, 95% CI = 63.47-67.00, ROC-AUC = 71.49%, 95% CI = 69.39-73.59), followed by leave-one-site-out cross-validation (accuracy = 58.67%, 95% CI = 56.70-60.63). Meta-analysis of individual site accuracies did not provide above chance results. There was substantial agreement between the regions that contributed to identification of BD participants in the best performing site and in the aggregate dataset (Cohen's Kappa = 0.83, 95% CI = 0.829-0.831). Treatment with anticonvulsants and age were associated with greater odds of correct classification. Although short of the 80% clinically relevant accuracy threshold, the results are promising and provide a fair and realistic estimate of classification performance, which can be achieved in a large, ecologically valid, multi-site sample of BD participants based on regional neurostructural measures. Furthermore, the significant classification in different samples was based on plausible and similar neuroanatomical features. Future multi-site studies should move towards sharing of raw/voxelwise neuroimaging data.
Year: 2018 PMID: 30171211 PMCID: PMC7473838 DOI: 10.1038/s41380-018-0228-9
Source DB: PubMed Journal: Mol Psychiatry ISSN: 1359-4184 Impact factor: 15.992
Descriptive statistics of the whole sample
| | Controls | Cases | p-Value |
|---|---|---|---|
| n | 2167 | 853 | |
| Age, mean (SD) | 34.89 (12.41) | 37.43 (11.64) | < 0.001 |
| Sex, n (%) | 1201 (55.4) | 516 (60.5) | 0.013 |
| Diagnosis, n (%) | | | |
| BD-I | - | 582 (68.63) | |
| BD-II | - | 234 (27.59) | |
| BD-NOS | - | 13 (1.53) | |
| SZA | - | 19 (2.24) | |
| Treatment at the time of scanning, n (%) | | | |
| Li | - | 265 (33.5) | |
| AED | - | 339 (43.1) | |
| FGA | - | 32 (4.1) | |
| SGA | - | 313 (39.9) | |
| AD | - | 281 (35.5) | |
| Mood state, n (%) | | | |
| Euthymic | - | 475 (75.5) | |
| Depressed | - | 131 (20.8) | |
| Manic | - | 11 (1.7) | |
| Hypomanic | - | 9 (1.4) | |
| Mixed | - | 3 (0.5) | |
| Age of onset, mean (SD) | - | 22.36 (9.08) | |
| Duration of illness, mean (SD) | - | 14.64 (10.45) | |
| History of psychosis, n (%) | - | 372 (61.1) | |
AD antidepressants, AED antiepileptics, BD-I bipolar I disorder, BD-II bipolar II disorder, BD-NOS bipolar disorder not otherwise specified, FGA first-generation antipsychotics, Li lithium, SD standard deviation, SGA second-generation antipsychotics, SZA schizoaffective disorder
Fig. 1 (a) Performance of SVM classifiers independently trained on each sample (mean with 95% confidence interval). Each row denotes a site in the data set, whereas each column denotes a specific performance metric. (b) Meta-analytic (summary) receiver operating characteristic (SROC) curves. Site-level sensitivity (Sn) and specificity (Sp) are empty circles of radius proportional to sample size. The red point is the median estimate of Sn and Sp. The solid black line is the SROC curve. The dashed diagonal represents chance performance. The red ellipse is the 95% posterior credible region, and the blue dashed line is the 95% posterior predictive region. (c) Receiver operating characteristic (ROC) curves for the aggregate subject-level analysis. Faint gray lines are the ROC curves for individual validation folds, and blue lines represent the mean ROC curve.
Summary of classification results from meta-analysis of sample-level classifiers, leave-one-site-out and aggregate subject-level analyses
| Statistic | Meta-analysis | Leave-one-site-out | Aggregate subject-level |
|---|---|---|---|
| Accuracy (%) | - | 58.67 (56.70–60.63) | 65.23 (63.47–67.00) |
| ROC-AUC (%) | - | 60.92 (58.18–63.67) | 71.49 (69.39–73.59) |
| Sensitivity (%) | 42.60 (13.40–71.57) | 51.99 (48.20–55.78) | 66.02 (62.71–69.33) |
| Specificity (%) | 59.14 (30.59–87.94) | 64.85 (61.91–67.79) | 64.90 (62.86–66.93) |
| PPV (%) | - | 47.25 (37.67–56.84) | 44.45 (42.04–46.86) |
| NPV (%) | - | 67.67 (60.36–74.98) | 83.73 (82.21–85.26) |
Note that meta-analytic results of the HSROC package include only sensitivity and specificity of the overall meta-analytic classification. Results for the meta-analytic summary are the posterior predictive value of the performance metric, reported as mean (95% credible interval; the Bayesian analog of the 95% confidence interval). Results for the aggregate subject-level and leave-one-site-out analyses are reported as mean and 95% confidence interval.
NPV negative predictive value, PPV positive predictive value, ROC-AUC area under receiver operating characteristic curve
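The predictive values in the table depend on the proportion of cases in the sample as well as on sensitivity and specificity. As a rough consistency check, the sketch below applies Bayes' rule to the aggregate subject-level sensitivity and specificity, assuming prevalence equals the sample proportion of cases (853 of 3020). The function name `predictive_values` is illustrative and not from the source. The tabulated figures are reported as means, presumably across validation folds, so the recomputed values (about 42.5% PPV, 82.9% NPV, 65.2% accuracy) approximate the table rather than reproduce it.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (ppv, npv, accuracy) from sensitivity, specificity and prevalence.

    All arguments and results are proportions in [0, 1].
    """
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    accuracy = true_pos + true_neg
    return ppv, npv, accuracy


# Assumption: prevalence is the sample proportion of cases (853 BD, 2167 controls).
prevalence = 853 / (853 + 2167)

# Aggregate subject-level sensitivity and specificity from the table.
ppv, npv, accuracy = predictive_values(0.6602, 0.6490, prevalence)
print(f"PPV={ppv:.3f} NPV={npv:.3f} accuracy={accuracy:.3f}")
```

The gap between PPV and NPV follows from the class imbalance: with roughly 28% cases, a negative prediction is correct far more often than a positive one even when sensitivity and specificity are similar.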
Fig. 2 Violin plot of feature importance across cross-validation (CV) folds for the aggregate subject-level analysis (left) and for the site that yielded the highest ROC-AUC (right). At each CV iteration, we extracted linear support vector machine (SVM) coefficients. The set of all coefficients from our SVM models is centered about 0. Deviation of a coefficient from zero indicates the relative importance of the corresponding feature. Features with positive and negative coefficients have positive and negative associations, respectively, with probability of classification as a case. The y axis lists variables for which SVM coefficients were strictly non-zero throughout all cross-validation iterations.
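The selection rule in the caption can be illustrated with a small sketch. It reads "strictly non-zero throughout all cross-validation iterations" literally, keeping a feature only if its coefficient is non-zero in every fold, which may differ from the criterion the authors actually applied. The feature names and coefficient values are hypothetical placeholders, not data from the study.

```python
def consistently_nonzero(coefs_by_fold, feature_names):
    """Map each feature that is non-zero in every fold to its mean coefficient."""
    kept = {}
    for j, name in enumerate(feature_names):
        values = [fold[j] for fold in coefs_by_fold]
        if all(v != 0.0 for v in values):
            kept[name] = sum(values) / len(values)
    return kept


# Hypothetical feature names and per-fold linear SVM coefficients.
feature_names = ["region_1", "region_2", "region_3"]
coefs_by_fold = [
    [0.50, 0.000, -0.25],
    [0.25, 0.125, -0.50],
    [0.75, -0.125, -0.75],
]

kept = consistently_nonzero(coefs_by_fold, feature_names)
for name, mean_coef in kept.items():
    direction = "case" if mean_coef > 0 else "control"
    print(f"{name}: mean coefficient {mean_coef:+.2f} (towards {direction})")
```

Here `region_2` is dropped because its coefficient is zero in the first fold; the sign of the mean coefficient for the remaining features gives the direction of association described in the caption.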
Fig. 3 Bar plot of the area under the receiver operating characteristic curve (ROC-AUC) for the leave-one-site-out (LOSO) analyses. The sites listed along the x axis are those that were held out at each fold.
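A minimal sketch of how leave-one-site-out folds can be formed is given below: each fold holds out every participant from one site for testing and trains on all remaining sites. The source does not state which software produced the splits; the function name and the site labels are hypothetical. In scikit-learn, `sklearn.model_selection.LeaveOneGroupOut` provides the same kind of split.

```python
from collections import defaultdict


def leave_one_site_out(sites):
    """Yield (held_out_site, train_indices, test_indices), one fold per site."""
    by_site = defaultdict(list)
    for idx, site in enumerate(sites):
        by_site[site].append(idx)

    for held_out in sorted(by_site):
        test_idx = by_site[held_out]
        # Training set: every participant whose site is not the held-out one.
        train_idx = sorted(
            i for site, idxs in by_site.items() if site != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


# Hypothetical site label for each participant, in participant order.
sites = ["A", "A", "B", "C", "B", "C", "C"]

folds = list(leave_one_site_out(sites))
for held_out, train_idx, test_idx in folds:
    print(f"held out {held_out}: train={train_idx} test={test_idx}")
```

Because no participant from the held-out site is seen during training, performance in each fold reflects generalization to an unseen site, which is consistent with the lower accuracy reported for this analysis than for the aggregate subject-level analysis.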