Hang Zhang1, Ziyang Xie1, Yuwen Yang1, Yizhen Zhao1, Bao Zhang2, Jing Fang3.
Abstract
Microarray analysis of gene expression is often used to diagnose different types of disease, and many studies report remarkable achievements in nervous system disease. Clinical diagnosis of schizophrenia (SCZ), however, still depends on doctors' experience, which is unreliable; diagnosis needs to be more objective and quantified. To address this problem, we collected whole blood gene expression data from four studies, comprising 152 individuals with SCZ and 138 normal controls from different regions. We applied the correlation-based feature selection (CFS) algorithm, a machine learning method, and selected 103 genes that were significantly differentially expressed between patients and controls, called "feature genes"; we then built a model for SCZ diagnosis. The samples were subdivided into 10 groups, and cross-validation showed that the model achieved nearly 100% classification accuracy. Mathematical evaluation of the datasets before and after data processing proved the effectiveness of our algorithm. Feature genes were enriched in the Parkinson's disease, oxidative phosphorylation, and TGF-beta signaling pathways, which were previously reported to be associated with SCZ. These results suggest that analysis of gene expression in whole blood by our model could be a useful tool for diagnosing SCZ.
Year: 2017 PMID: 28280741 PMCID: PMC5322573 DOI: 10.1155/2017/7860506
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
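The abstract names correlation-based feature selection (CFS) without giving its scoring rule. As a rough illustration, the sketch below uses the commonly cited CFS merit, k·r_cf / sqrt(k + k(k−1)·r_ff), with a greedy forward search. It is not the authors' implementation: the use of absolute Pearson correlation, the forward search strategy, the function names, and the toy data are all assumptions made for this example.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def cfs_merit(features, labels, subset):
    """Merit of a feature subset: rewards class correlation, penalizes redundancy."""
    k = len(subset)
    # Mean absolute feature-class correlation.
    r_cf = sum(abs(pearson(features[j], labels)) for j in subset) / k
    if k == 1:
        return r_cf
    # Mean absolute feature-feature correlation over all pairs in the subset.
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward_select(features, labels):
    """Greedy forward search: add the best feature until the merit stops improving."""
    selected, best = [], 0.0
    remaining = list(range(len(features)))
    while remaining:
        merit, j = max((cfs_merit(features, labels, selected + [j]), j) for j in remaining)
        if merit <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = merit
    return selected, best

# Toy data (hypothetical): each inner list is one gene's expression across 6 samples.
labels = [0, 0, 0, 1, 1, 1]
features = [
    [1.0, 1.0, 1.0, 3.0, 3.0, 3.0],  # separates the classes cleanly
    [1.0, 2.0, 1.5, 2.5, 3.0, 2.0],  # informative but noisier and redundant with gene 0
    [5.0, 4.0, 6.0, 5.0, 4.0, 6.0],  # uncorrelated with the class
]
selected, best = cfs_forward_select(features, labels)
print(selected, best)
```

On this toy input the search keeps only the first gene, because adding the noisier, correlated gene lowers the merit. Real expression matrices would need the same logic applied to thousands of probes, where a vectorized implementation is more practical.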
Figure 1. The flowchart of the diagnostic classification of schizophrenia.
Detailed information about the datasets collected.
| Dataset | Case | Control | Tissue | Platform |
|---|---|---|---|---|
| | 18 | 12 | Blood | GPL15314 Arraystar Human LncRNA microarray V2.0 |
| | 15 | 22 | Blood | GPL6883 Illumina HumanRef-8 v3.0 expression beadchip |
| | 106 | 96 | Blood | GPL6947 Illumina HumanHT-12 V3.0 expression beadchip |
| | 13 | 8 | Blood | GPL5175 [HuEx-1_0-st] Affymetrix Human Exon 1.0 ST Array |
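The abstract states that the samples were subdivided into 10 groups for cross-validation. A minimal sketch of that procedure follows, assuming stratified folds (each fold keeps roughly the cohort's case/control ratio) and a nearest-centroid classifier as a stand-in. Neither choice is confirmed by the source, and the feature values are synthetic; only the cohort sizes (152 cases, 138 controls) come from the record.

```python
import random

def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds

def nearest_centroid_fit(X, y):
    """Mean vector per class (stand-in classifier, not one used in the paper)."""
    centroids = {}
    for cls in set(y):
        rows = [x for x, label in zip(X, y) if label == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def nearest_centroid_predict(centroids, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda cls: sqdist(centroids[cls]))

def cross_validated_accuracy(X, y, k=10, seed=0):
    """Hold out each fold once; return percent of samples classified correctly."""
    correct = 0
    for held_out in stratified_folds(y, k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(nearest_centroid_predict(model, X[i]) == y[i] for i in held_out)
    return 100.0 * correct / len(y)

# Synthetic stand-in data with the cohort sizes from the table: 152 cases, 138 controls.
y = [1] * 152 + [0] * 138
rng = random.Random(1)
X = [[label * 4.0 + rng.uniform(-1, 1), rng.uniform(-1, 1)] for label in y]

folds = stratified_folds(y)
accuracy = cross_validated_accuracy(X, y)
print([len(f) for f in folds], accuracy)
```

Because the synthetic classes are well separated, this run classifies every held-out sample correctly; that says nothing about the accuracy reported for the real data.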
The evaluation results of data processing.
| | LWL | | BayesNet | | SMO | | KNN | | NaiveBayes | | J48 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Before | After | Before | After | Before | After | Before | After | Before | After | Before | After |
| CCI | 73.1 | | 79.3 | | 72.4 | | 69.0 | | 51.7 | | 58.6 | |
| RRSE | 85.8 | | 91.3 | | 105.4 | | 81.8 | | 139.4 | | 128.0 | |
| CC | 99.7 | | 79.3 | | 72.4 | | 100.0 | | 51.7 | | 62.1 | |
| | 0.73 | | 0.80 | | 0.71 | | 0.67 | | 0.37 | | 0.57 | |
| ROC | 0.81 | | 0.83 | | 0.70 | | 0.84 | | 0.48 | | 0.51 | |
| PRC | 0.80 | | 0.79 | | 0.66 | | 0.84 | | 0.49 | | 0.53 | |
CCI, correctly classified instances; RRSE, root relative squared error; CC, coverage of cases (0.95 level); ROC, receiver operating characteristic; PRC, precision and recall curve.
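To make two of the abbreviations above concrete, the following sketch computes CCI and RRSE using their textbook definitions. It assumes RRSE is measured against a baseline that always predicts the mean of the actual values; specific toolkits may compute the baseline differently, and the scores below are invented for illustration.

```python
import math

def correctly_classified_pct(actual, predicted):
    """CCI: percentage of samples whose predicted label matches the actual label."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * hits / len(actual)

def rrse_pct(actual, predicted_scores):
    """RRSE: model squared error relative to a predict-the-mean baseline, in percent."""
    mean_actual = sum(actual) / len(actual)
    model_error = sum((p - a) ** 2 for a, p in zip(actual, predicted_scores))
    baseline_error = sum((a - mean_actual) ** 2 for a in actual)
    return 100.0 * math.sqrt(model_error / baseline_error)

# Hypothetical example: 1 = patient, 0 = control, scores are predicted probabilities.
actual = [1, 1, 0, 0]
scores = [0.9, 0.6, 0.4, 0.1]
predicted = [1 if s >= 0.5 else 0 for s in scores]

cci = correctly_classified_pct(actual, predicted)
rrse_model = rrse_pct(actual, scores)
rrse_baseline = rrse_pct(actual, [0.5] * 4)       # always predicts the mean
rrse_inverted = rrse_pct(actual, [0, 0, 1, 1])    # systematically wrong
print(cci, rrse_model, rrse_baseline, rrse_inverted)
```

Under this definition an RRSE of 100 matches the trivial baseline, so values above 100 in the table (for example 105.4 or 139.4) would indicate a classifier doing worse than that baseline on this measure.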
Figure 2. The comparisons of evaluation indicators before and after data processing. (a) Comparison of classifier training (NaiveBayes) values. (b) Comparison of F-measure values.
The pathway results of significant differences between patient and control groups.
| Feature | ID | Input number | Background number | P value |
|---|---|---|---|---|
| Parkinson's disease | hsa05012 | 6 | 108 | 7.96 |
| Oxidative phosphorylation | hsa00190 | 4 | 98 | 0.003793377 |
| TGF-beta signaling pathway | hsa04350 | 3 | 66 | 0.009126485 |
| Metabolic pathways | hsa01100 | 14 | 1118 | 0.010961336 |
| Alzheimer's disease | hsa05010 | 4 | 139 | 0.012275212 |
| Primary immunodeficiency | hsa05340 | 2 | 34 | 0.021039942 |
| Huntington's disease | hsa05016 | 4 | 165 | 0.021326367 |
| Vascular smooth muscle contraction | hsa04270 | 3 | 109 | 0.032774232 |
| ABC transporters | hsa02010 | 2 | 44 | 0.033204998 |
| Nonalcoholic fatty liver disease | hsa04932 | 3 | 129 | 0.049322146 |
Note: the KEGG pathway analysis shows that genes in these pathways are significantly differentially expressed (P < 0.05). In addition, each pathway contains no fewer than 2 genes.
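Pathway enrichment of this kind is commonly scored with a one-sided hypergeometric test on the input and background counts. The sketch below shows that calculation under stated assumptions: the source does not say which test or multiple-testing correction was used, and the totals in the usage example (103 input genes, 20000 background genes) are hypothetical, so the result is not expected to reproduce the table's P values.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the pathway.

    k: input genes found in the pathway ("Input number")
    K: pathway genes in the background ("Background number")
    n: total input genes; N: total background genes (both assumed here)
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Counts from the Parkinson's disease row; n and N are hypothetical placeholders.
p_example = hypergeom_enrichment_p(k=6, n=103, K=108, N=20000)
print(p_example)
```

A small hand-checkable case: with N = 10, K = 4, n = 3, the probability of seeing at least 2 pathway genes is (6·6 + 4) / 120 = 1/3.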
Figure 3. Functional category by pathways.