Xia Guo, Xue Jiang, Jing Xu, Xiongwen Quan, Min Wu, Han Zhang.
Abstract
Because the pathological mechanisms of neurodegenerative diseases are complex, traditional differentially expressed gene selection methods cannot detect disease-associated genes accurately. Recent studies have shown that consensus-guided unsupervised feature selection (CGUFS) performs well at identifying disease-associated genes. However, the random initialization of the feature selection matrix in CGUFS makes the final disease-associated gene set unstable. In this study we therefore propose an ensemble method based on CGUFS, namely ensemble consensus-guided unsupervised feature selection (ECGUFS), to further improve the accuracy of disease-associated gene prediction and the stability of the feature gene sets. We also propose a bagging integration strategy to integrate the results of CGUFS. We conducted experiments on Huntington's disease RNA sequencing (RNA-Seq) data and obtained the final feature gene set, in which we detected 287 disease-associated genes. Enrichment analysis of these genes shows that the postsynaptic density, postsynaptic membrane, synapse, and cell junction are all affected during the disease's progression. ECGUFS improved both the accuracy of disease-associated gene prediction and the stability of the disease-associated gene set. Finally, we classified the labeled samples using a linear support vector machine with 10-fold cross-validation; the average accuracy of 0.9 suggests that the feature gene set is effective.
Keywords: Huntington’s disease; RNA-Seq data; disease-associated genes; ensemble consensus guided unsupervised feature selection
Year: 2018 PMID: 30002337 PMCID: PMC6071299 DOI: 10.3390/genes9070350
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
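The classification check mentioned in the abstract (a linear support vector machine evaluated with 10-fold cross-validation) can be sketched as follows. This is a minimal illustration that assumes scikit-learn is available; the synthetic matrix produced by `make_classification` is only a stand-in for the expression values of the selected feature genes, and the sample count, `C`, and random seeds are placeholder assumptions rather than settings taken from the study.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: 60 samples x 287 features (synthetic, NOT the RNA-Seq data).
X, y = make_classification(
    n_samples=60, n_features=287, n_informative=20, random_state=0
)

# Scaling lives inside the pipeline so that each fold is standardized
# using statistics from its own training split only.
model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))

# Stratified folds keep the class balance similar across the 10 splits.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

mean_accuracy = scores.mean()
print(f"mean 10-fold accuracy: {mean_accuracy:.2f}")
```

Keeping the scaler inside the pipeline avoids leaking information from the held-out fold into the training step, which matters when the number of samples is small relative to the number of genes.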
Figure 1. Flow chart of the ensemble consensus-guided unsupervised feature selection (ECGUFS) algorithm. CGUFS: consensus-guided unsupervised feature selection. The chart also denotes a vector of weights for all genes and the area under the receiver operating characteristic (ROC) curve of the gene-ranked list.
Ensemble consensus-guided unsupervised feature selection.
Note: Initialize the elements of between 0 and 1, . Initialize through consensus clustering.
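The general idea of integrating several randomly initialized runs can be sketched as below. This is only an illustration under stated assumptions: `toy_selector` is a hypothetical stand-in for a single CGUFS run (it simply adds Gaussian noise to a known signal), and the score-weighted average in `aggregate_runs` is a generic aggregation rule, not necessarily the integration rule used by ECGUFS.

```python
import random


def toy_selector(true_signal, seed):
    """Hypothetical stand-in for one CGUFS run: noisy per-gene weights.

    The added noise mimics run-to-run variation from random initialization.
    """
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, 1.0) for s in true_signal]


def aggregate_runs(weight_vectors, run_scores=None):
    """Combine per-gene weight vectors by a (score-weighted) average."""
    if run_scores is None:
        run_scores = [1.0] * len(weight_vectors)  # plain average
    total = sum(run_scores)
    n_genes = len(weight_vectors[0])
    return [
        sum(score * w[g] for score, w in zip(run_scores, weight_vectors)) / total
        for g in range(n_genes)
    ]


def rank_genes(weights):
    """Return gene indices sorted from highest to lowest weight."""
    return sorted(range(len(weights)), key=lambda g: weights[g], reverse=True)


# Toy setting: 200 genes, of which the first 20 carry a real signal.
true_signal = [2.0] * 20 + [0.0] * 180
runs = [toy_selector(true_signal, seed) for seed in range(25)]

single_ranking = rank_genes(runs[0])
ensemble_ranking = rank_genes(aggregate_runs(runs))

# How many of the 20 signal genes land in the top 20 of each ranking?
top20_single = len(set(single_ranking[:20]) & set(range(20)))
top20_ensemble = len(set(ensemble_ranking[:20]) & set(range(20)))
print(top20_single, top20_ensemble)
```

Averaging over runs shrinks the noise contributed by any single initialization, which is why the aggregated ranking is more stable than an individual one in this toy setting.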
Performance (mean ± standard deviation) of FNMF, jNMFMA, CGUFS, and ECGUFS.
| | FNMF | jNMFMA | CGUFS | ECGUFS |
|---|---|---|---|---|
| AUC | 56.0 ± 1.9 | 56.7 ± 1.6 | 54.3 ± 1.5 | 59.2 ± 0.8 |
| AUPR | 20.4 ± 1.9 | 20.7 ± 1.6 | 22.5 ± 1.8 | 29.4 ± 1.9 |
FNMF: Flexible non-negative matrix factorization method; jNMFMA: Joint non-negative matrix factorization meta-analysis method; AUC: Area under the ROC curve; AUPR: Area under the precision-recall (PR) curve.
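For readers unfamiliar with the two metrics, the sketch below computes AUC and a step-wise AUPR (average precision) for a scored gene list in plain Python. The labels and scores are made-up toy values, and the exact interpolation used for the AUPR figures in the table is not stated here, so this is an assumption-laden illustration rather than a reproduction of the reported numbers. Both functions assume at least one positive and one negative label.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the PR curve: mean precision at each true positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / hits


# Toy example: 1 = known disease-associated gene, 0 = other gene.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

auc = roc_auc(labels, scores)
aupr = average_precision(labels, scores)
print(f"AUC = {auc:.3f}, AUPR = {aupr:.3f}")
```

AUPR is the more informative of the two when known disease-associated genes are a small fraction of all genes, because it is not inflated by the large number of true negatives.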
Figure 2. ROC curves of the t-test, fold change (FC), multi-label propagation clustering algorithm (LP), FNMF, jNMFMA, CGUFS, limma, edgeR, and ECGUFS prediction results.
Figure 3. PR curves of the t-test, FC, LP, FNMF, jNMFMA, CGUFS, limma, edgeR, and ECGUFS prediction results.
Number of overlapping genes between the top 1000 genes of any two ranked lists obtained by ECGUFS.
| | E2 | E3 | E4 | E5 | E6 | E7 | E8 | E9 | E10 |
|---|---|---|---|---|---|---|---|---|---|
| E1 | 710 | 705 | 686 | 677 | 695 | 679 | 663 | 691 | 666 |
| E2 | | 697 | 686 | 657 | 686 | 721 | 676 | 737 | 682 |
| E3 | | | 689 | 677 | 691 | 683 | 655 | 678 | 665 |
| E4 | | | | 684 | 704 | 696 | 681 | 715 | 668 |
| E5 | | | | | 659 | 657 | 674 | 665 | 664 |
| E6 | | | | | | 670 | 670 | 691 | 690 |
| E7 | | | | | | | 666 | 707 | 669 |
| E8 | | | | | | | | 678 | 649 |
| E9 | | | | | | | | | 682 |
Note: E1 represents experiment 1 using ECGUFS.
Number of overlapping genes between the top 2000 genes of any two ranked lists obtained by ECGUFS.
| | E2 | E3 | E4 | E5 | E6 | E7 | E8 | E9 | E10 |
|---|---|---|---|---|---|---|---|---|---|
| E1 | 1593 | 1598 | 1565 | 1570 | 1603 | 1569 | 1547 | 1589 | 1564 |
| E2 | | 1623 | 1589 | 1550 | 1621 | 1618 | 1534 | 1621 | 1595 |
| E3 | | | 1582 | 1589 | 1610 | 1603 | 1559 | 1599 | 1590 |
| E4 | | | | 1573 | 1605 | 1563 | 1570 | 1592 | 1567 |
| E5 | | | | | 1569 | 1545 | 1575 | 1572 | 1567 |
| E6 | | | | | | 1597 | 1570 | 1607 | 1619 |
| E7 | | | | | | | 1557 | 1615 | 1584 |
| E8 | | | | | | | | 1561 | 1550 |
| E9 | | | | | | | | | 1596 |
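The pairwise counts in the two overlap tables can be computed with a few lines of set arithmetic. The sketch below is illustrative only: the gene identifiers and the three short ranked lists are invented, and `top_k_overlap` is a hypothetical helper name, not a function from the study's code.

```python
from itertools import combinations


def top_k_overlap(ranked_lists, k):
    """Count shared genes among the top-k entries for every pair of ranked lists."""
    tops = {name: set(genes[:k]) for name, genes in ranked_lists.items()}
    return {(a, b): len(tops[a] & tops[b]) for a, b in combinations(tops, 2)}


# Toy ranked lists from three hypothetical experiments (best gene first).
ranked_lists = {
    "E1": ["g1", "g2", "g3", "g4", "g5"],
    "E2": ["g2", "g1", "g5", "g3", "g4"],
    "E3": ["g3", "g4", "g1", "g2", "g5"],
}

overlaps = top_k_overlap(ranked_lists, k=3)
for (a, b), shared in overlaps.items():
    print(f"{a} vs {b}: {shared} shared genes in the top 3")
```

Read as fractions, the tabulated overlaps range from 649 to 737 of 1000 genes (about 65% to 74%) and from 1534 to 1623 of 2000 genes (about 77% to 81%).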
Functional annotation clustering of the 287 overlapping genes.
| Annotation Cluster | Category | Annotation | Count | p-value | Benjamini |
|---|---|---|---|---|---|
| 1 | GOTERM_CC_DIRECT | Postsynaptic density | 16 | 4.2 × 10⁻⁷ | 1.3 × 10⁻⁴ |
| | GOTERM_CC_DIRECT | Postsynaptic membrane | 10 | 2.2 × 10⁻³ | 9.1 × 10⁻² |
| | GOTERM_CC_DIRECT | Synapse | 14 | 1.3 × 10⁻² | 2.4 × 10⁻¹ |
| | GOTERM_CC_DIRECT | Cell junction | 15 | 7.4 × 10⁻² | 4.9 × 10⁻¹ |
| 2 | GOTERM_BP_DIRECT | Fatty acid metabolic process | 9 | 9.3 × 10⁻⁴ | 4.8 × 10⁻¹ |
| | GOTERM_BP_DIRECT | Fatty acid biosynthetic process | 5 | 1.5 × 10⁻² | 8.4 × 10⁻¹ |
| | KEGG_PATHWAY | Fatty acid metabolism | 4 | 3.6 × 10⁻² | 8.7 × 10⁻¹ |
| | GOTERM_MF_DIRECT | Transferase activity, transferring acyl groups other than amino-acyl groups | 3 | 3.7 × 10⁻² | 7.6 × 10⁻¹ |
| 3 | GOTERM_MF_DIRECT | Transferase activity | 37 | 2.4 × 10⁻⁴ | 1.1 × 10⁻¹ |
| | GOTERM_BP_DIRECT | Phosphorylation | 17 | 5.7 × 10⁻³ | 6.4 × 10⁻¹ |
| | GOTERM_MF_DIRECT | Kinase activity | 18 | 8.5 × 10⁻³ | 5.7 × 10⁻¹ |
| | GOTERM_MF_DIRECT | Nucleotide binding | 38 | 1.4 × 10⁻² | 6.3 × 10⁻¹ |
| | GOTERM_MF_DIRECT | ATP binding | 30 | 2.6 × 10⁻² | 7.3 × 10⁻¹ |
| | GOTERM_BP_DIRECT | Protein phosphorylation | 14 | 3.5 × 10⁻² | 9.6 × 10⁻¹ |
| | GOTERM_MF_DIRECT | Protein kinase activity | 12 | 9.6 × 10⁻² | 8.6 × 10⁻¹ |
| | GOTERM_MF_DIRECT | Protein serine/threonine kinase activity | 9 | 2.1 × 10⁻¹ | 9.4 × 10⁻¹ |
| 4 | GOTERM_BP_DIRECT | Learning or memory | 5 | 4.1 × 10⁻³ | 6.9 × 10⁻¹ |
| | GOTERM_BP_DIRECT | Regulation of synaptic plasticity | 4 | 1.7 × 10⁻² | 8.4 × 10⁻¹ |
| | GOTERM_BP_DIRECT | Embryo development | 3 | 2.7 × 10⁻¹ | 9.9 × 10⁻¹ |
| 5 | GOTERM_CC_DIRECT | Cell–cell adherens junction | 10 | 2.0 × 10⁻² | 2.9 × 10⁻¹ |
| | GOTERM_MF_DIRECT | Cadherin binding involved in cell–cell adhesion | 9 | 3.3 × 10⁻² | 7.7 × 10⁻¹ |
| | GOTERM_BP_DIRECT | Cell–cell adhesion | 6 | 9.7 × 10⁻² | 9.9 × 10⁻¹ |
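The Benjamini column reports significance after multiple-testing correction. As a reminder of how such a correction behaves, the sketch below implements the standard Benjamini-Hochberg adjustment in plain Python. It is an illustration only: the input p-values are invented, and the tabulated values cannot be reproduced from the listed rows, because the adjustment depends on every term that was tested, not just on the terms shown (and the annotation tool's exact variant is assumed, not confirmed here).

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented raw p-values for five hypothetical annotation terms.
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"raw p = {p:.3f} -> adjusted = {q:.5f}")
```

Because each raw value is scaled by the number of tests divided by its rank, a term with a small raw p-value can still end up with a large adjusted value when many terms are tested, which is consistent with the gap between the two rightmost columns for most rows of the table.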