Eugene Lin, Chieh-Hsin Lin, Hsien-Yuan Lane.
Abstract
Genetic variants such as single nucleotide polymorphisms (SNPs) have been suggested as potential molecular biomarkers for predicting the functional outcomes of psychiatric disorders. To assess functional outcomes of schizophrenia, as measured by the Quality of Life Scale (QLS) and the Global Assessment of Functioning (GAF), we applied a bagging ensemble machine learning method with a feature selection algorithm to 11 SNPs (AKT1 rs1130233, COMT rs4680, DISC1 rs821616, DRD3 rs6280, G72 rs1421292, G72 rs2391191, 5-HT2A rs6311, MET rs2237717, MET rs41735, MET rs42336, and TPH2 rs4570625) from 302 schizophrenia patients in the Taiwanese population. We compared our bagging ensemble machine learning algorithm with other state-of-the-art models such as linear regression, support vector machine, multilayer feedforward neural networks, and random forests. The analysis showed that the bagging ensemble algorithm with feature selection outperformed the other predictive algorithms in forecasting the QLS functional outcome of schizophrenia, using the G72 rs2391191 and MET rs2237717 SNPs. Likewise, the bagging ensemble algorithm with feature selection outperformed the other predictive algorithms in forecasting the GAF functional outcome of schizophrenia, using the AKT1 rs1130233 SNP. The study suggests that the bagging ensemble machine learning algorithm with feature selection may offer a practical basis for software tools that forecast the functional outcomes of schizophrenia from molecular biomarkers.
Year: 2021 PMID: 33986383 PMCID: PMC8119477 DOI: 10.1038/s41598-021-89540-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Genotype frequencies of 11 genetic polymorphisms in 302 schizophrenia patients.
| Genetic polymorphism | Genotype frequency | |
|---|---|---|
| | AA/AG/GG: 0.31/0.49/0.20 | 0.899 |
| | GG/GA/AA: 0.56/0.35/0.09 | 0.066 |
| | TT/TA/AA: 0.79/0.20/0.01 | 0.676 |
| | AA/AG/GG: 0.48/0.45/0.07 | 0.170 |
| | TT/TA/AA: 0.41/0.42/0.18 | 0.040 |
| | AA/AG/GG: 0.37/0.49/0.14 | 0.597 |
| | AA/AG/GG: 0.36/0.51/0.13 | 0.133 |
| | CC/CT/TT: 0.30/0.46/0.24 | 0.183 |
| | GG/GA/AA: 0.31/0.48/0.21 | 0.593 |
| | AA/GA/GG: 0.30/0.48/0.22 | 0.578 |
| | TT/GT/GG: 0.25/0.51/0.24 | 0.814 |
The results of repeated tenfold cross-validation experiments for predicting the QLS and GAF functional outcome of schizophrenia with genetic variants using machine learning predictors such as the bagging ensemble model with feature selection, the bagging ensemble model, MFNNs, SVM, linear regression, and random forests.
| Algorithm | QLS RMSE | QLS feature set | QLS number of features | GAF RMSE | GAF feature set | GAF number of features |
|---|---|---|---|---|---|---|
| Bagging ensemble with feature selection | | Feature-B | 2 | | Feature-C | 1 |
| Bagging ensemble | 8.7102 ± 1.0716 | Feature-A | 11 | 9.7777 ± 1.3301 | Feature-A | 11 |
| SVM | 8.8799 ± 1.0893 | Feature-A | 11 | 10.0754 ± 1.4486 | Feature-A | 11 |
| MFNNs | 8.8675 ± 1.1103 | Feature-A | 11 | 10.0625 ± 1.3753 | Feature-A | 11 |
| Linear regression | 8.7839 ± 1.0538 | Feature-A | 11 | 9.7011 ± 1.3341 | Feature-A | 11 |
| Random forests | 9.4253 ± 1.1750 | Feature-A | 11 | 10.4998 ± 1.3586 | Feature-A | 11 |
The best QLS or GAF score is shown in bold.
Feature-A: 11 features (related to 11 SNPs) including AKT1 rs1130233, COMT rs4680, DISC1 rs821616, DRD3 rs6280, G72 rs1421292, G72 rs2391191, 5-HT2A rs6311, MET rs2237717, MET rs41735, MET rs42336, and TPH2 rs4570625.
Feature-B: 2 features (related to 2 SNPs) including G72 rs2391191 and MET rs2237717.
Feature-C: 1 feature (related to 1 SNP) including AKT1 rs1130233.
GAF Global assessment of functioning, MFNNs Multilayer feedforward neural networks, QLS Quality of life scale, RMSE Root mean square error, SNPs Single nucleotide polymorphisms, SVM Support vector machine.
Data are presented as mean ± standard deviation.
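As a rough illustration of how an RMSE reported as mean ± standard deviation can arise from repeated tenfold cross-validation, the following minimal Python sketch may help. It is not the authors' code. The number of repeats (10), the use of the sample standard deviation, the 0/1/2 genotype coding, the synthetic data, and the placeholder mean predictor are all assumptions made only for illustration.

```python
import math
import random
import statistics


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def kfold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def repeated_kfold_rmse(X, y, fit, predict, k=10, repeats=10, seed=0):
    """Return one held-out RMSE per (repeat, fold) pair."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        for test_idx in kfold_indices(len(y), k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            preds = predict(model, [X[i] for i in test_idx])
            scores.append(rmse([y[i] for i in test_idx], preds))
    return scores


# Placeholder model (assumption): always predicts the training-set mean.
def fit_mean(X_train, y_train):
    return statistics.mean(y_train)


def predict_mean(model, X_test):
    return [model for _ in X_test]


# Synthetic stand-in data (assumption): one genotype coded 0/1/2, noisy score.
data_rng = random.Random(42)
X = [[data_rng.choice([0, 1, 2])] for _ in range(302)]
y = [50.0 + 2.0 * row[0] + data_rng.gauss(0.0, 9.0) for row in X]

scores = repeated_kfold_rmse(X, y, fit_mean, predict_mean)
print(f"RMSE: {statistics.mean(scores):.4f} ± {statistics.stdev(scores):.4f}")
```

Any predictor with the same `fit`/`predict` shape can be swapped in for the placeholder. The printed numbers depend on the synthetic data and are not comparable to the table.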
Figure 1. Schematic illustration of the bagging ensemble machine learning method with feature selection. First, the M5 Prime feature selection algorithm is applied to find a small subset of biomarkers, which serves as the input to the bagging ensemble machine learning method. The bagging ensemble method creates multiple versions of a base model from bootstrap replicates of the training data. The final prediction is then generated by averaging the predictions of the multiple versions. Linear regression was chosen as the base model in this study.
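The bagging step described in the figure caption can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the feature subset has already been selected (the M5 Prime step is not reproduced here), codes genotypes as 0/1/2, uses synthetic data, and picks a hypothetical ensemble size of 10. A very small ridge term is added to the normal equations purely for numerical stability, which is a departure from plain least squares.

```python
import random


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_linear_regression(X, y, ridge=1e-8):
    """Fit intercept + weights from (X'X + ridge*I) w = X'y (ridge is an assumption)."""
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve_linear_system(XtX, Xty)


def predict_linear(w, X):
    """Predict with weights w, where w[0] is the intercept."""
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in X]


def fit_bagging(X, y, n_models=10, seed=0):
    """Fit one linear model per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(y)
    models = []
    for _ in range(n_models):
        sample = [rng.randrange(n) for _ in range(n)]
        models.append(fit_linear_regression([X[i] for i in sample],
                                            [y[i] for i in sample]))
    return models


def predict_bagging(models, X):
    """Average the predictions of all base models for each input row."""
    per_model = [predict_linear(w, X) for w in models]
    return [sum(col) / len(models) for col in zip(*per_model)]


# Synthetic stand-in data (assumption): two selected SNPs coded 0/1/2.
data_rng = random.Random(7)
X = [[data_rng.choice([0, 1, 2]), data_rng.choice([0, 1, 2])] for _ in range(302)]
y = [50.0 + 3.0 * a - 2.0 * b + data_rng.gauss(0.0, 5.0) for a, b in X]

ensemble = fit_bagging(X, y)
print(predict_bagging(ensemble, [[0, 0], [2, 1]]))
```

Averaging over bootstrap-trained models mainly reduces the variance of the base learner, so with a stable base model such as linear regression the ensemble typically stays close to a single fit.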