Bjørn H Ebdrup1,2, Martin C Axelsen1,3, Nikolaj Bak1, Birgitte Fagerlund1,4, Bob Oranje1,2,5, Jayachandra M Raghava1,6, Mette Ø Nielsen1,2, Egill Rostrup1, Lars K Hansen3, Birte Y Glenthøj1,2.
Abstract
BACKGROUND: A wealth of clinical studies have identified objective biomarkers that separate schizophrenia patients from healthy controls at the group level, but current diagnostic systems rely solely on clinical symptoms. In this study, we investigate whether machine learning algorithms applied to multimodal data can serve as a framework for clinical translation.
Keywords: Antipsychotic-naïve first-episode schizophrenia; cognition; diffusion tensor imaging; electrophysiology; machine learning; structural magnetic resonance imaging
Year: 2018 PMID: 30560750 PMCID: PMC6877469 DOI: 10.1017/S0033291718003781
Source DB: PubMed Journal: Psychol Med ISSN: 0033-2917 Impact factor: 7.723
Demographic and clinical data. Lifetime use of tobacco, alcohol, cannabis, stimulants, hallucinogens, and opioids was categorized on an ordinal five-item scale (0 = never tried, 1 = tried a few times, 2 = regular use, 3 = harmful use, 4 = dependency).
| Variable | N | Schizophrenia, mean (SD) | N | Healthy controls, mean (SD) | p value |
|---|---|---|---|---|---|
| Age, years | 46 | 25.0 (5.6) | 58 | 24.79 (5.68) | 0.79 |
| Gender (m/f) | 46 | 28/18 | 58 | 36/22 | 0.901 |
| Parental SES (a/b/c) | 44 | 8/30/6 | 56 | 16/30/10 | <0.001 |
| Years of education | 45 | 12.5 (2.6) | 56 | 14.58 (2.60) | <0.001 |
| Danish Adult Reading Test (DART) | 41 | 21.4 (9.7) | 56 | 23.2 (6.3) | 0.27 |
| Total IQ (WAIS III) | 41 | −0.8 (1.5) | 54 | 0.0 (1.0) | 0.002 |
| Tobacco (0/1/2/3/4) | 45 | 7/11/22/1/4 | 59 | 13/29/11/2/1 | 0.003 |
| Alcohol (0/1/2/3/4) | 46 | 2/6/33/4/1 | 57 | 3/1/53/0/0 | 0.005 |
| Cannabis (0/1/2/3/4) | 46 | 8/23/9/6/0 | 57 | 23/28/6/0/0 | 0.003 |
| Opioids (0/1/2/3/4) | 46 | 38/8/0/0/0 | 56 | 52/4/0/0/0 | 0.132 |
| Stimulants (0/1/2/3/4) | 46 | 27/14/5/0/0 | 56 | 47/9/0/0/0 | 0.003 |
| Hallucinogens (0/1/2/3/4) | 45 | 38/7/0/0/0 | 57 | 53/3/0/0/0 | 0.105 |
| Other drugs (0/1/2/3/4) | 43 | 40/3/0/0/0 | 56 | 54/2/0/0/0 | 0.65 |
| Benzodiazepines (0/1/2/3/4) | 42 | 31/11/0/6/0 | 55 | 55/0/0/0/0 | <0.001 |
| DUI, weeks | 45 | 65.8 (70.5) | – | – | – |
| CGI, severity | 44 | 4.2 (0.7) | – | – | – |
| GAF, symptom | 44 | 40.9 (9.9) | – | – | – |
| GAF, function | 43 | 42.6 (11.1) | – | – | – |
| PANSS, positive | 46 | 20.1 (4.2) | – | – | – |
| PANSS, negative | 46 | 21.3 (7.9) | – | – | – |
| PANSS, general | 46 | 42.2 (9.4) | – | – | – |
| PANSS, total | 46 | 83.5 (17.2) | – | – | – |
| Amisulpride, mg/day | 32 | 248.4 (140.6) | – | – | – |
| Remission (yes/no) | 34 | 11/23 | – | – | – |
SES, parental socioeconomic status; DUI, duration of untreated illness; CGI, Clinical Global Impression Scale; GAF, Global Assessment of Functioning; PANSS, Positive And Negative Syndrome Scale.
Mann–Whitney U test.
χ² test.
Danish Adult Reading Test (DART) (Nelson and O'Connell, 1978).
Two-sample t test with pooled variance estimates.
A combined score based on four subtests from WAIS III: Wechsler Adult Intelligence Scale (Wechsler Adult Intelligence Scale® – Third Edition n.d.), presented as z-scores standardized from the mean and standard deviation of the healthy control sample.
Fisher's exact test.
Symptom remission after 6 weeks according to Andreasen criteria (Andreasen et al., 2005).
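As an illustration of the two-sample t test with pooled variance named in the footnotes, the following minimal sketch computes the t statistic from summary statistics. It uses the years-of-education row of the table as input; since the footnote markers in the table were lost, applying this test to that particular row is an assumption. The helper name `pooled_t` is hypothetical, and no p value is derived because the Python standard library does not provide the t distribution.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with a pooled variance estimate (summary-statistic form)."""
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference between the two group means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

# Years of education as reported in the table: patients vs. healthy controls.
# Assumption: this row was tested with the pooled-variance t test.
t_stat, df = pooled_t(12.5, 2.6, 45, 14.58, 2.60, 56)
print(f"t = {t_stat:.2f}, df = {df}")
```

A t statistic of this magnitude at 99 degrees of freedom is in line with the reported p < 0.001 for that row.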
Fig. 1. Diagram of the multivariate analysis pipeline. Forty-six patients and 58 healthy controls were included in the baseline analyses. ‘Data’ refer to input variables from cognition, electrophysiology, structural magnetic resonance imaging, and diffusion tensor imaging. For each of the 100 splits, 2/3 of subjects were used for training and 1/3 for testing. Training data were scaled (zero mean, unit variance), and the test sets were scaled using the training parameters. Missing data were imputed using K-nearest neighbor imputation with K = 3 (Bak and Hansen, 2016); only subjects with complete data were included in the test sets. Finally, nine different configurations of machine learning algorithms were applied to predict diagnosis. CV = cross-validation. See text for details.
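The preprocessing steps in the Fig. 1 caption (random 2/3 versus 1/3 split, scaling with training parameters, K-nearest neighbor imputation with K = 3, complete subjects only in the test set) can be sketched for a single split as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the toy data, and the specific imputation rule (mean over the three nearest donors, with distance computed on jointly observed features) are assumptions, and the exact method of Bak and Hansen (2016) may differ.

```python
import math
import random

def split_subjects(rows, rng, test_fraction=1 / 3):
    """Random split; only subjects without missing values may enter the test set."""
    complete = [i for i, r in enumerate(rows) if None not in r]
    n_test = min(len(complete), round(len(rows) * test_fraction))
    test = set(rng.sample(complete, n_test))
    train = [i for i in range(len(rows)) if i not in test]
    return train, sorted(test)

def fit_scaler(rows):
    """Per-feature mean and SD, estimated from observed training values only."""
    params = []
    for col in zip(*rows):
        vals = [v for v in col if v is not None]
        mean = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)) or 1.0
        params.append((mean, sd))
    return params

def scale(rows, params):
    """Apply the stored (mean, SD) pairs; missing values stay missing."""
    return [[None if v is None else (v - m) / s for v, (m, s) in zip(r, params)]
            for r in rows]

def knn_impute(rows, k=3):
    """Fill each missing value with the mean of that feature over the k nearest donors."""
    def dist(a, b):
        shared = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
        return math.sqrt(sum(shared) / len(shared)) if shared else math.inf
    out = [list(r) for r in rows]
    for i, r in enumerate(rows):
        for j, v in enumerate(r):
            if v is None:
                donors = sorted((dist(r, d), d[j]) for n, d in enumerate(rows)
                                if n != i and d[j] is not None)
                nearest = [val for _, val in donors[:k]]
                out[i][j] = sum(nearest) / len(nearest)
    return out

rng = random.Random(0)
# Toy data: 12 hypothetical subjects, 3 features, two values missing.
data = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(12)]
data[0][1] = None
data[5][2] = None

train_idx, test_idx = split_subjects(data, rng)
train_rows = [data[i] for i in train_idx]
params = fit_scaler(train_rows)                      # learned on training data only
train_ready = knn_impute(scale(train_rows, params))  # scale, then impute
test_ready = scale([data[i] for i in test_idx], params)
```

In the study this procedure was repeated for 100 random splits, with a classifier fitted on the training part of each split. Reusing the training-set scaling parameters on the test set keeps information from the test subjects out of the preprocessing.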
Fig. 2. Unimodal diagnostic accuracies for cognition (Cog), electrophysiology (EEG), structural magnetic resonance imaging (sMRI), and diffusion tensor imaging (DTI) for each of the nine different configurations of machine learning algorithms. X-axes show the accuracies (acc), and y-axes show the sum of correct classifications for each of the 100 random subsamples (see Fig. 1). The dotted vertical black line indicates chance accuracy (56%). With cognitive data, all nine configurations of algorithms significantly classified ‘patient v. control’ (p values = 0.001–0.009). No algorithm using EEG, sMRI, or DTI data achieved an accuracy exceeding chance. The nine different configurations of machine learning algorithms were: nB, naïve Bayes; LR, logistic regression without regularization; LR_r, logistic regression with regularization; SVM_l, support vector machine with linear kernel; SVM_h, SVM with heuristic parameters; SVM_o, SVM optimized through cross-validation; DT, decision tree; RF, random forest; AS, auto-sklearn. See text for details.
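The 56% chance level in the Fig. 2 caption is consistent with the proportion of the larger group in the full sample (58 controls out of 104 subjects). The caption does not state how the value was derived, so the majority-class interpretation in this short check is an assumption.

```python
# Group sizes as reported in the Fig. 1 caption.
n_patients, n_controls = 46, 58

# Assumption: "chance" means always predicting the larger group.
chance = max(n_patients, n_controls) / (n_patients + n_controls)
print(f"{chance:.1%}")  # prints 55.8%
```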
Fig. 3. (a) Manhattan plot with univariate t tests of all variables along the x-axis [cognition (Cog), electrophysiology (EEG), structural magnetic resonance imaging (sMRI), and diffusion tensor imaging (DTI)] and log-transformed p values along the y-axis. The lower dashed horizontal line indicates the significance level of p = 0.05. The upper dashed lines indicate the Bonferroni-corrected p value for each modality. (b) Colored horizontal lines show the fraction of data splits (see Fig. 1) in which individual variables were included in the final machine learning model that determined the diagnostic accuracy (presented in Fig. 2). Specification of variables is provided in the online Supplementary Material. Only the six machine learning configurations that included feature selection are shown. nB, naïve Bayes; LR, logistic regression without regularization; LR_r, logistic regression with regularization; SVM_l, support vector machine with linear kernel; DT, decision tree; RF, random forest.
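The per-modality Bonferroni lines in Fig. 3(a) follow from dividing the significance level by the number of variables tested within each modality. The sketch below shows that arithmetic under two assumptions: the variable counts are hypothetical placeholders (the real counts are in the Supplementary Material, not in this record), and "log-transformed" is taken to mean the −log10 scale conventionally used in Manhattan plots.

```python
import math

def bonferroni_line(n_tests, alpha=0.05):
    """Return the corrected threshold and its height on a -log10(p) axis."""
    threshold = alpha / n_tests
    return threshold, -math.log10(threshold)

# Hypothetical numbers of variables per modality (not taken from the study).
counts = {"Cog": 20, "EEG": 10, "sMRI": 50, "DTI": 100}
lines = {modality: bonferroni_line(n) for modality, n in counts.items()}

for modality, (threshold, height) in lines.items():
    print(f"{modality}: p < {threshold:.5f}, line at {height:.2f}")
```

Modalities with more variables get a stricter threshold, which is why each modality in the plot has its own upper dashed line.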