Karen S Ambrosen1, Martin W Skjerbæk2, Jonathan Foldager3, Martin C Axelsen2,3, Nikolaj Bak4, Lars Arvastson4, Søren R Christensen4, Louise B Johansen2,5, Jayachandra M Raghava2,6, Bob Oranje2,7, Egill Rostrup2, Mette Ø Nielsen2,8, Merete Osler9,10, Birgitte Fagerlund2,11, Christos Pantelis2,12, Bruce J Kinon13, Birte Y Glenthøj2,8, Lars K Hansen3, Bjørn H Ebdrup2,8.
Abstract
The reproducibility of machine-learning analyses in computational psychiatry is a growing concern. In a multimodal neuropsychiatric dataset of antipsychotic-naïve, first-episode schizophrenia patients, we discuss a workflow designed to reduce bias and overfitting by using simulated data to guide the design and analysis of two independent machine-learning approaches: one based on a single algorithm and one based on an ensemble of algorithms. We aimed to (1) classify patients versus controls to establish the framework, (2) predict short- and long-term treatment response, and (3) validate the methodological framework. We included 138 antipsychotic-naïve, first-episode schizophrenia patients with data on psychopathology, cognition, electrophysiology, and structural magnetic resonance imaging (MRI). Perinatal data and long-term outcome measures were obtained from Danish registers. Short-term treatment response was defined as the change in Positive and Negative Syndrome Scale (PANSS) score after the initial antipsychotic treatment period. Baseline diagnostic classification algorithms also included data from 151 matched controls. Both approaches classified patients versus healthy controls significantly above chance, with balanced accuracies of 63.8% and 64.2%, respectively. Post-hoc analyses showed that the classification was driven primarily by the cognitive data. Neither approach predicted short- or long-term treatment response. Validation of the framework showed that the choice of algorithm and parameter settings for the real data was successfully guided by results from the simulated data. In conclusion, this novel approach holds promise as an important step toward minimizing bias and obtaining reliable results with modest sample sizes when independent replication samples are not available.
Year: 2020 PMID: 32778656 PMCID: PMC7417553 DOI: 10.1038/s41398-020-00962-8
Source DB: PubMed Journal: Transl Psychiatry ISSN: 2158-3188 Impact factor: 6.222
Demographic and clinical characteristics of patients with schizophrenia and healthy control subjects.
| Variable | Patients, N | Patients, distribution | Controls, N | Controls, distribution | p-value |
|---|---|---|---|---|---|
| Subjects, cohorts A/B/Cᵃ | 138 | 31/46/61 | 151 | 27/53/71 | 0.623 |
| Age, years, Mean (SD)ᵇ | 135 | 25.36 (5.88) | 146 | 25.48 (5.61) | 0.638 |
| Gender, Male/Femaleᵃ | 138 | 94/44 | 151 | 99/52 | 0.645 |
| P-SES, High/Moderate/Lowᵃ | 134 | 39/73/22 | 146 | 61/70/15 | 0.057 |
| Years of education, Mean (SD)ᵇ | 103 | 11.47 (2.61) | 71 | 13.95 (3.86) | < |
| Handedness (EHI score), Right/Ambidextrous/Leftᶜ | 134 | 115/3/16 | 138 | 124/1/13 | 0.459 |
| Estimated premorbid intelligence (Danish Adult Reading Test (DART)), Mean (SD) [Mean Z-score] | 122 | 22.11 (8.51) [−0.59] | 139 | 26.65 (7.63) [0.0] | |
| Estimated intelligence based on WAIS, Mean | 69 | −1.26 | 79 | 0.0 | – |
| Estimated intelligence based on WAIS-III, Mean | 52 | 0.73 | 59 | 0.0 | – |
| PANSS, positive, Mean (SD) | 134 | 20.12 (4.36) | – | – | – |
| PANSS, negative, Mean (SD) | 134 | 21.00 (6.69) | – | – | – |
| PANSS, general, Mean (SD) | 134 | 39.20 (9.57) | – | – | – |
| PANSS, total, Mean (SD) | 134 | 80.32 (16.45) | – | – | – |
| DUI, weeks, Mean (SD)ʰ | 96 | 113.51 (163.64) | – | – | – |
Analyses were performed on subjects with available data. Some variables were not available for all cohorts, hence the varying N. Significant p-values (p < 0.05) are in bold. Handedness was determined with the Edinburgh Handedness Inventory (EHI)[58].
Duration of untreated illness (DUI) was recorded and defined as the time elapsed since the initial decline in functioning judged to be a consequence of nonspecific symptoms related to psychosis[59].
P-SES parental socioeconomic status, EHI Edinburgh Handedness Inventory score, PANSS Positive and Negative Syndrome Scale, DUI duration of untreated illness.
ᵃ Pearson χ² test.
ᵇ Mann–Whitney U test.
ᶜ Fisher's exact test.
ᵈ Two-sample t test with pooled variance estimates.
ᵉ A combined score based on the Similarities and Vocabulary subtests of the WAIS/WAIS-III (Wechsler Adult Intelligence Scale®), presented as Z-scores standardized to the mean and standard deviation of the healthy control sample.
ᶠ Only data from cohorts A and B.
ᵍ Only data from cohort C.
ʰ Only data from cohorts B and C.
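Several p-values in the demographic table can be rechecked from the reported counts. The sketch below uses only the Python standard library to implement a Pearson χ² test of independence. Assuming no continuity correction was applied (our assumption, not stated in the table), it returns p-values that agree with the table. It also expresses the DART patient mean as a Z-score against the control mean and SD, in the same way footnote e describes for the WAIS scores. This gives about −0.595, consistent with the bracketed −0.59 once rounding of the published means and SD is taken into account. The function names are illustrative only.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square(df) variable for integer df (closed form)."""
    h = x / 2
    if df % 2 == 0:
        # Even df: Q = exp(-h) * sum_{i=0}^{df/2 - 1} h^i / i!
        return math.exp(-h) * sum(h ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: Q = erfc(sqrt(h)) + exp(-h) * sum_{i=1}^{(df-1)/2} h^(i-1/2) / Gamma(i+1/2)
    return math.erfc(math.sqrt(h)) + math.exp(-h) * sum(
        h ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))


def pearson_chi2(table):
    """Pearson chi-square test of independence on a 2D count table (no continuity correction)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df, chi2_sf(stat, df)


def control_z(value, control_mean, control_sd):
    """Z-score of a value relative to the healthy-control mean and SD."""
    return (value - control_mean) / control_sd


if __name__ == "__main__":
    # Counts taken from the table; first row = patients, second row = controls.
    for name, counts in {
        "cohort A/B/C": [[31, 46, 61], [27, 53, 71]],
        "gender M/F": [[94, 44], [99, 52]],
        "P-SES high/moderate/low": [[39, 73, 22], [61, 70, 15]],
    }.items():
        stat, df, p = pearson_chi2(counts)
        print(f"{name}: chi2({df}) = {stat:.2f}, p = {p:.3f}")
    print(f"DART, patients vs controls: z = {control_z(22.11, 26.65, 7.63):.3f}")
```

Running it prints p = 0.623, 0.645 and 0.057 for cohort, gender and P-SES. Applying Yates' correction to the 2×2 gender table would give a noticeably larger p-value, so the match supports the uncorrected test. The Mann–Whitney and t-test rows cannot be rechecked this way because they need subject-level data.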
Fig. 1. Radial dendrogram depicting our data model. Modalities were divided into submodalities, each with a set of features.
The nodes closest to the center (depicted as a brain) represent the modalities. Distal to these are the submodalities, and along the circumference are the leaves representing the features (i.e., the variables). MRI magnetic resonance imaging, LH left hemisphere, RH right hemisphere, MMN mismatch negativity, PPI prepulse inhibition, SA selective attention, ISI inter-stimulus interval, APGAR Appearance Pulse Grimace Activity Respiration, BACS Brief Assessment of Cognition in Schizophrenia, Buschke Buschke Selective Reminding Test, DART Danish version of the National Adult Reading Test, IED Intra-Extra Dimensional Set Shifting, RTI reaction time, RVP Rapid Visual Information Processing, SDTM Symbol Digit Modalities Test, SOC Stockings of Cambridge, SSP Spatial Span, SWM Spatial Working Memory, SCOLP Speed and Capacity of Language Processing, TMT Trail Making Test, WAIS Wechsler Adult Intelligence Scale, WCST Wisconsin Card Sorting Test. For a description of electrophysiology features, see Supplementary Table S1.
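The framework in Fig. 2 makes one prediction per submodality and then combines them. This means the modality → submodality → feature tree must be turned into groups of feature columns. Below is a minimal Python sketch of that step. The tree is a hypothetical excerpt: the modality and submodality labels echo abbreviations from the caption, but their placement and all feature names are invented for illustration. `submodality_columns` is a hypothetical helper.

```python
# Hypothetical excerpt of the modality -> submodality -> feature tree.
# Labels echo the Fig. 1 caption; placement and feature names are invented.
DATA_MODEL = {
    "cognition": {
        "BACS": ["bacs_feature_1", "bacs_feature_2"],
        "SWM": ["swm_feature_1", "swm_feature_2", "swm_feature_3"],
    },
    "electrophysiology": {
        "MMN": ["mmn_feature_1"],
        "PPI": ["ppi_feature_1", "ppi_feature_2"],
    },
    "MRI": {
        "LH": ["lh_feature_1"],
        "RH": ["rh_feature_1"],
    },
}


def submodality_columns(model):
    """Flatten the tree into one ordered feature list plus column indices per submodality."""
    features, groups = [], {}
    for modality, submodalities in model.items():
        for sub, feats in submodalities.items():
            start = len(features)
            features.extend(feats)
            groups[(modality, sub)] = list(range(start, len(features)))
    return features, groups


features, groups = submodality_columns(DATA_MODEL)
for (modality, sub), cols in groups.items():
    print(f"{modality}/{sub}: columns {cols}")
```

Because Python dictionaries preserve insertion order, the column indices follow the order of the tree. Each group of indices can then be used to select the features for one submodality model.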
Fig. 2. Overall machine-learning framework using both simulated and real data.
In the outer CV loop, the data were randomly split, leaving out 25% of the subjects for testing; the split was replicated 10 times for the simulated data and 100 times for the real data. The subjects were stratified with respect to cohort (short-term treatment response) or outcome (long-term treatment response and diagnostic classification). Values missing at random were imputed using estimates derived from the training set. The training and test sets were standardized with the mean and standard deviation derived from the training set. Both training and test data were split into submodalities. We used the following conventional covariates: sex, age, cohort, and handedness. In the inner CV loop, the training data were further split into training and test sets using threefold CV, chosen as a trade-off between the limited sample size and computation time. Algorithm parameters and ensembles were optimized in the inner CV loop with two different approaches (see text). The best-performing model was applied to the outer-loop test set, and the predictions from the individual submodalities were combined in a late-integration scheme to produce the final prediction. The analysis of the real data followed the same framework as that of the simulated data, except that only the best-, median-, and poorest-performing algorithms, parameter settings, and methods learned from the simulated data were applied to the real data. CV cross-validation.
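The nested loop described in the caption can be sketched in standard-library Python. This is a minimal sketch under stated assumptions, not the authors' code. k-nearest neighbours stands in for the unspecified algorithms, and the number of neighbours k is the only tuned parameter. Covariates are omitted, and stratification is on a binary outcome, as in the diagnostic case. Late integration is assumed to be an unweighted average of per-submodality scores. All function names are hypothetical. Following the caption, imputation and standardization are fitted on the outer training set only, and the inner threefold CV reuses that standardized training data.

```python
import math
import random
from statistics import mean, pstdev


def stratified_split(y, test_frac, rng):
    """Outer split: hold out test_frac of each class (stratified on the outcome)."""
    train, test = [], []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        n_test = round(test_frac * len(idx))
        test += idx[:n_test]
        train += idx[n_test:]
    return train, test


def fit_scaler(rows):
    """Per-column mean and SD from training rows only; None marks a missing value."""
    stats = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        stats.append((mean(observed), pstdev(observed) or 1.0))
    return stats


def transform(rows, stats):
    """Impute missing values with the training mean, then standardize with training stats."""
    return [[((mu if v is None else v) - mu) / sd for v, (mu, sd) in zip(row, stats)]
            for row in rows]


def knn_score(X_tr, y_tr, x, k):
    """Fraction of the k nearest training subjects labelled 1 (stand-in classifier)."""
    order = sorted(range(len(X_tr)), key=lambda i: math.dist(X_tr[i], x))
    return sum(y_tr[i] for i in order[:k]) / k


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, v in enumerate(y_true) if v == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return mean(recalls)


def choose_k(X, y, ks, rng, folds=3):
    """Inner loop: pick k by threefold CV with folds stratified by class."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    idx.sort(key=lambda i: y[i])  # stable sort keeps the shuffled order within each class
    best_k, best_score = ks[0], -1.0
    for k in ks:
        fold_scores = []
        for f in range(folds):
            val = idx[f::folds]
            held_out = set(val)
            tr = [i for i in idx if i not in held_out]
            X_tr, y_tr = [X[i] for i in tr], [y[i] for i in tr]
            preds = [int(knn_score(X_tr, y_tr, X[i], k) > 0.5) for i in val]
            fold_scores.append(balanced_accuracy([y[i] for i in val], preds))
        score = mean(fold_scores)
        if score > best_score:
            best_k, best_score = k, score
    return best_k


def nested_cv(X, y, submodalities, n_reps=10, test_frac=0.25, ks=(1, 3, 5), seed=0):
    """Outer loop with late integration: one model per submodality, scores averaged."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_reps):
        tr, te = stratified_split(y, test_frac, rng)
        y_tr = [y[i] for i in tr]
        combined = [0.0] * len(te)
        for cols in submodalities:
            sub_tr = [[X[i][c] for c in cols] for i in tr]
            sub_te = [[X[i][c] for c in cols] for i in te]
            stats = fit_scaler(sub_tr)  # fitted on the outer training set only
            Z_tr, Z_te = transform(sub_tr, stats), transform(sub_te, stats)
            k = choose_k(Z_tr, y_tr, ks, rng)
            for j, z in enumerate(Z_te):
                combined[j] += knn_score(Z_tr, y_tr, z, k) / len(submodalities)
        y_pred = [int(s > 0.5) for s in combined]
        results.append(balanced_accuracy([y[i] for i in te], y_pred))
    return mean(results)


if __name__ == "__main__":
    rng = random.Random(1)
    y = [i % 2 for i in range(40)]
    X = [[5 * v + rng.gauss(0, 1), 5 * v + rng.gauss(0, 1)] for v in y]
    X[3][1] = None  # one value missing at random
    print(nested_cv(X, y, submodalities=[[0], [1]], n_reps=10))
```

Keeping `fit_scaler` inside the outer loop is the design choice that matters here. If the means and SDs were computed on all subjects before splitting, information from the test subjects would leak into training.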
Fig. 3. Flow diagram of subject inclusion into the diagnostic classification and the prediction of short- and long-term treatment response.
mean relative change in Positive and Negative Syndrome Scale score, s.d. standard deviation, LTR long-term treatment response.
Performance of the selected algorithms, with 95% confidence intervals, for each of the three prediction problems.
| Single algorithm approach | | |
|---|---|---|
| Diagnostic classification | BACC (%) | 95% confidence interval |
| Best performance | | |
| Medium performance | | |
| Worst performance | 50.4 | [44.0, 56.8] |
| Long-term treatment response (classification) | BACC (%) | 95% confidence interval |
| Best performance | 50.3 | [39.4, 61.2] |
| Medium performance | 49.7 | [44.7, 54.6] |
| Worst performance | 50.0 | [50.0, 50.0] |
| Short-term treatment response (regression) | NMSE | 95% confidence interval |
| Best performance | 0.96 | [0.43, 1.49] |
| Medium performance | 0.96 | [0.42, 1.51] |
| Worst performance | 14.86 | [0, 35.09] |
Balanced accuracy and NMSE are averaged across 100 cross-validation splits. Values in bold are significant at the 95% confidence level. BACC balanced accuracy, NMSE normalized mean squared error, SVM support vector machine.
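The table summarizes each setting by a mean over 100 cross-validation splits and a 95% interval. Below is a minimal standard-library Python sketch of the two metrics and of one plausible interval construction. Two parts of it are assumptions, not details taken from the source. First, NMSE is taken as the mean squared error divided by the variance of the targets. Second, the interval is taken as mean ± 1.96 × SD across splits, with the lower bound optionally clipped at a floor. This second choice is consistent with the symmetric intervals in the table and with the lower bound of 0 in the worst NMSE row. Function names are hypothetical.

```python
from statistics import mean, stdev


def balanced_accuracy(y_true, y_pred):
    """BACC: mean per-class recall (sensitivity and specificity for binary labels)."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, v in enumerate(y_true) if v == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return mean(recalls)


def nmse(y_true, y_pred):
    """Mean squared error divided by the variance of the targets (assumed normalization)."""
    mu = mean(y_true)
    mse = mean((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return mse / mean((t - mu) ** 2 for t in y_true)


def mean_interval(split_scores, z=1.96, floor=None):
    """Mean over CV splits with mean +/- z*SD bounds; optionally clip the lower bound."""
    m = mean(split_scores)
    half = z * stdev(split_scores)
    lower = m - half if floor is None else max(m - half, floor)
    return m, lower, m + half


if __name__ == "__main__":
    # A classifier that assigns every test subject to the same class scores 50% BACC
    # in every split, so the interval collapses to a single point.
    constant = [balanced_accuracy([0, 0, 1, 1], [1, 1, 1, 1]) for _ in range(100)]
    print(mean_interval(constant))
    # Predicting the mean of the targets gives NMSE = 1 under this normalization.
    print(nmse([10.0, 20.0, 30.0], [20.0, 20.0, 20.0]))
```

Under these assumptions, the long-term "Worst performance" row (50.0 [50.0, 50.0]) is what a model that always predicts one class would produce. Likewise, an NMSE of 0.96 would mean predictions barely better than the target mean.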