Eugene Lin, Chieh-Hsin Lin, Chung-Chieh Hung, Hsien-Yuan Lane.
Abstract
In the wake of recent advances in artificial intelligence research, precision psychiatry using machine learning techniques represents a new paradigm. The D-amino acid oxidase (DAO) protein and its interaction partner, the D-amino acid oxidase activator (DAOA, also known as G72) protein, have been implicated as two key proteins in the N-methyl-D-aspartate receptor (NMDAR) pathway for schizophrenia. Another potential biomarker for the etiology of schizophrenia is melatonin in the tryptophan catabolic pathway. To develop an ensemble boosting framework with random undersampling for determining schizophrenia disease status, we established a prediction approach based on the analysis of genomic and demographic variables (DAO levels, G72 levels, melatonin levels, age, and gender) of 355 schizophrenia patients and 86 unrelated healthy individuals in the Taiwanese population. We compared our ensemble boosting framework with other state-of-the-art algorithms: support vector machine, multilayer feedforward neural networks, logistic regression, random forests, naive Bayes, and C4.5 decision tree. The analysis revealed that the ensemble boosting model with random undersampling [area under the receiver operating characteristic curve (AUC) = 0.9242 ± 0.0652; sensitivity = 0.8580 ± 0.0770; specificity = 0.8594 ± 0.0760] performed best among the predictive models at inferring the complicated relationship between schizophrenia disease status and biomarkers. In addition, we identified a causal link between DAO and G72 protein levels in influencing schizophrenia disease status. The study indicates that the ensemble boosting framework with random undersampling may provide a suitable method to establish a tool for distinguishing schizophrenia patients from healthy controls using molecules in the NMDAR and tryptophan catabolic pathways.
Keywords: N-methyl-D-aspartate receptor; ensemble boosting; multilayer feedforward neural networks; precision psychiatry; schizophrenia
Year: 2020 PMID: 32582679 PMCID: PMC7287032 DOI: 10.3389/fbioe.2020.00569
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
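The abstract pairs boosting with random undersampling because the cohort is imbalanced (355 patients versus 86 controls). The following is a minimal Python sketch of random undersampling, not the authors' implementation: the function name `random_undersample`, the seed, and the choice to reduce every class to the size of the smallest one are assumptions made for illustration. The record does not state whether undersampling was applied once or within each boosting round.

```python
import random
from collections import defaultdict

def random_undersample(labels, seed=0):
    """Return sorted indices of a class-balanced subset.

    Hypothetical helper: every class is randomly reduced to the size
    of the smallest class, without replacement.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    target = min(len(idxs) for idxs in by_class.values())
    kept = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, target))
    return sorted(kept)

# Class sizes taken from the abstract: 355 patients (1), 86 controls (0).
labels = [1] * 355 + [0] * 86
subset = random_undersample(labels, seed=42)
n_patients = sum(labels[i] for i in subset)
n_controls = len(subset) - n_patients
print(n_patients, n_controls)  # 86 86
```

Training on the balanced subset keeps a classifier from favoring the majority class, at the cost of discarding some patient samples in that round.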
FIGURE 1. The schematic illustration of the ensemble boosting method. The idea of the ensemble boosting approach is to train weak/base classifiers sequentially so that each classifier tries to correct its predecessor. A higher weight is assigned to samples that were incorrectly classified in earlier rounds. That is, weak/base classifiers are produced in sequence based on a weighted version of the data during the training phase. The final classification prediction is then produced by a weighted majority vote.
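The procedure in the caption (sequential weak learners, reweighting of misclassified samples, weighted majority vote) can be sketched with the classic discrete AdaBoost scheme. This is an illustrative assumption: the record does not name the exact boosting variant or base learner, so one-feature threshold rules ("stumps") are used here, labels are coded as +1/-1, and the toy data and all function names are hypothetical.

```python
import math

def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = 0.0
                for row, label, weight in zip(X, y, w):
                    pred = sign if row[j] <= thr else -sign
                    if pred != label:
                        err += weight
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] <= thr else -sign

def adaboost_fit(X, y, rounds):
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)     # vote weight of this stump
        ensemble.append((alpha, stump))
        # Misclassified samples (y * pred == -1) are scaled up, others down.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Weighted majority vote over all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy 1-D data that no single threshold rule can separate.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost_fit(X, y, rounds=3)
preds = [adaboost_predict(model, row) for row in X]
print(preds)  # [1, 1, -1, -1, 1, 1]
```

On this toy set the best single stump still misclassifies two of six points, while three boosted stumps combined by weighted vote classify all six correctly.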
Demographic characteristics of schizophrenia patients and healthy individuals.
| Characteristic | Schizophrenia patients | Healthy individuals | P value |
| No. of subjects (n) | 355 | 86 | |
| Gender (% male) | 61.9% | 52.3% | 0.101 |
| Age (year) | 39.6 ± 10.0 | 37.8 ± 12.2 | 0.136 |
| DAO level (ng/mL) | 37.64 ± 14.18 | 28.03 ± 9.84 | 5.55 × 10⁻⁹ |
| G72 level (ng/μL) | 3.24 ± 1.80 | 1.68 ± 0.81 | 4.71 × 10⁻¹⁴ |
| Melatonin level (pg/mL) | 89.89 ± 46.07 | 60.04 ± 42.72 | 9.75 × 10⁻⁷ |
The results of repeated 10-fold cross-validation experiments for differentiating schizophrenia patients from healthy individuals using ensemble boosting with random undersampling, ensemble boosting, support vector machine (SVM), multilayer feedforward neural networks (MFNNs), logistic regression, random forests, naive Bayes, and C4.5 decision tree, with DAO protein levels, G72 protein levels, melatonin levels, age, and gender as biomarkers.
| Algorithm | AUC | Sensitivity | Specificity | Number of biomarkers |
| Ensemble boosting with random undersampling | 0.9242 ± 0.0652 | 0.8580 ± 0.0770 | 0.8594 ± 0.0760 | 5 |
| Ensemble boosting | 0.9010 ± 0.0464 | 0.8442 ± 0.0447 | 0.5803 ± 0.1446 | 5 |
| SVM | 0.6720 ± 0.0837 | 0.8461 ± 0.0393 | 0.4979 ± 0.1364 | 5 |
| MFNN with 1 hidden layer | 0.8920 ± 0.0463 | 0.8343 ± 0.0457 | 0.5816 ± 0.1340 | 5 |
| MFNN with 2 hidden layers | 0.8949 ± 0.0455 | 0.8391 ± 0.0515 | 0.6121 ± 0.1383 | 5 |
| MFNN with 3 hidden layers | 0.8884 ± 0.0507 | 0.8359 ± 0.0463 | 0.6312 ± 0.1454 | 5 |
| Logistic Regression | 0.8677 ± 0.0566 | 0.8497 ± 0.0566 | 0.5660 ± 0.1295 | 5 |
| Random Forests | 0.8543 ± 0.0627 | 0.8229 ± 0.0379 | 0.4197 ± 0.1213 | 5 |
| naive Bayes | 0.8546 ± 0.0628 | 0.8320 ± 0.0473 | 0.6611 ± 0.1411 | 5 |
| C4.5 decision tree | 0.7701 ± 0.0721 | 0.8306 ± 0.0469 | 0.4526 ± 0.1272 | 5 |
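The three metrics in the table can be computed from per-sample labels and classifier scores. The sketch below is a minimal Python illustration on made-up toy values, not study data; the function names, the 0.5 decision threshold, and the pairwise (Mann-Whitney) formulation of AUC are assumptions chosen for clarity. In a repeated 10-fold cross-validation these values would be computed on each held-out fold and then summarized across folds.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Toy example: 1 = patient, 0 = healthy control (illustrative values only).
y_true = [1, 1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc_value = auc(y_true, scores)
print(sens, spec, auc_value)  # 0.75 0.5 0.875
```

Reading the table with these definitions, most models reach a sensitivity above 0.82 but a specificity near or below 0.66, meaning many healthy controls are labeled as patients; only the model with random undersampling keeps both above 0.85.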