Rong Zhu, Jin-Xing Liu, Yuan-Ke Zhang, Ying Guo.
Abstract
Detecting genes with similar expression patterns using clustering techniques plays an important role in gene expression data analysis. Non-negative matrix factorization (NMF) is an effective method for the clustering analysis of gene expression data. However, NMF-based methods operate in Euclidean space and are usually inappropriate for revealing the intrinsic geometric structure of the data space. To overcome this shortcoming, Cai et al. proposed graph regularized non-negative matrix factorization (GNMF). Motivated by the topological structure of GNMF, we propose an improved graph regularized NMF method that better reveals the geometric structure of the data space. This method, robust manifold non-negative matrix factorization (RM-GNMF), is designed for cancer gene clustering and improves the robustness of the GNMF-based algorithm. We combine the ℓ2,1-norm NMF with spectral clustering and conduct extensive experiments on three well-known datasets. The clustering results indicate that the proposed method outperforms previous methods, demonstrating the applicability of RM-GNMF to cancer gene clustering.
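To make the NMF/GNMF contrast concrete, here is a minimal NumPy sketch of the multiplicative update rules published by Cai et al. for GNMF. It factors X ≈ UVᵀ (genes × samples) with the penalty λ·Tr(VᵀLV), where L = D − W is the Laplacian of a 0/1 p-nearest-neighbour graph over samples. Setting `lam=0` recovers the standard NMF updates. This is not the authors' RM-GNMF. The names `knn_graph`, `gnmf` and `l21_norm`, the parameters `lam`, `p` and `n_iter`, and the argmax labelling are illustrative assumptions. The `l21_norm` helper takes the ℓ2,1-norm per sample (column), which is one common convention. Under that convention, each sample's residual enters un-squared, so outlying samples weigh less than under the squared Frobenius loss.

```python
import numpy as np


def knn_graph(X, p=5):
    """Symmetric 0/1 p-nearest-neighbour weight matrix over the columns (samples) of X."""
    sq = np.sum(X ** 2, axis=0)
    # squared Euclidean distances between every pair of samples
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    n = X.shape[1]
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(dist[i])[:p]] = 1.0
    return np.maximum(W, W.T)  # symmetrize


def gnmf(X, k, lam=1.0, p=5, n_iter=200, seed=0, eps=1e-10):
    """Multiplicative updates for min ||X - U V^T||_F^2 + lam * Tr(V^T L V)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = knn_graph(X, p)
    D = np.diag(W.sum(axis=1))  # degree matrix; the graph Laplacian is L = D - W
    U = rng.random((m, k))
    V = rng.random((n, k))
    for _ in range(n_iter):
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return U, V


def l21_norm(E):
    """l2,1-norm taken per sample: sum of the Euclidean norms of the columns of E."""
    return float(np.sum(np.linalg.norm(E, axis=0)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # hypothetical toy data: 20 "genes" x 30 samples
    X = rng.random((20, 30))
    U, V = gnmf(X, k=2, lam=1.0, p=5)
    labels = V.argmax(axis=1)  # simple labelling; k-means on V is another option
    R = X - U @ V.T
    print("cluster sizes:", np.bincount(labels, minlength=2))
    print("l2,1 residual:", l21_norm(R), "Frobenius residual:", np.linalg.norm(R))
```

The argmax labelling keeps the sketch short. A separate clustering step on the rows of V, such as k-means, is the more usual choice in practice.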
Keywords: gene clustering; manifold; matrix factorization; robust
Year: 2017 PMID: 29207477 PMCID: PMC6149772 DOI: 10.3390/molecules22122131
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Statistics of three gene expression datasets.
| Data Sets | Instances | Features | Classes |
|---|---|---|---|
| Colon | 62 | 2000 | 2 |
| GLI_85 | 85 | 22,283 | 2 |
| Leukemia | 72 | 7070 | 2 |
Figure 1. Influence of parameters on clustering accuracy.
Clustering results on different datasets. NMF: non-negative matrix factorization; GNMF: graph regularized non-negative matrix factorization; RM-GNMF: robust manifold non-negative matrix factorization; ACC: clustering accuracy; NMI: normalized mutual information.
| Methods | Colon ACC | Colon NMI | GLI_85 ACC | GLI_85 NMI | Leukemia ACC | Leukemia NMI |
|---|---|---|---|---|---|---|
| NMF | 0.6290 | 0.0110 | 0.6088 | 0.1906 | 0.6389 | 0.0193 |
|  | 0.5323 | 0.0048 | 0.6088 | 0.1916 | 0.6328 | 0.0258 |
| LNMF | 0.6129 | 0.0181 | 0.5294 | 0.0011 | 0.6250 | 0.0306 |
| GNMF | 0.6290 | 0.0110 | 0.6000 | 0.1584 | 0.6389 | 0.0193 |
| RM-GNMF | 0.6613 | 0.0220 | 0.7529 | 0.1925 | 0.6528 | 0.0369 |
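The table scores each method by ACC and NMI. Below is a minimal pure-Python sketch of both metrics under stated assumptions. ACC uses the best one-to-one relabelling of clusters to classes, found by brute force, which is adequate for the two-class datasets here. NMI is normalized by sqrt(H(true)·H(pred)), one of several normalizations in use. This record does not say which normalization the paper used. The function names are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import log, sqrt


def clustering_accuracy(true_labels, pred_labels):
    """ACC: fraction of samples correct under the best one-to-one cluster-to-class mapping."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(pred_labels))
    # pad with None so surplus clusters can map to "no class" (never counted as correct)
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    # brute force over all mappings; cheap for the 2-class datasets above
    for perm in permutations(targets, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[p] == t for p, t in zip(pred_labels, true_labels))
        best = max(best, hits)
    return best / len(true_labels)


def nmi(true_labels, pred_labels):
    """Mutual information normalized by sqrt(H(true) * H(pred)) (one common convention)."""
    n = len(true_labels)
    count_t, count_p = Counter(true_labels), Counter(pred_labels)
    joint = Counter(zip(true_labels, pred_labels))
    mi = sum(c / n * log(c * n / (count_t[t] * count_p[p])) for (t, p), c in joint.items())
    h_t = -sum(c / n * log(c / n) for c in count_t.values())
    h_p = -sum(c / n * log(c / n) for c in count_p.values())
    if h_t == 0.0 or h_p == 0.0:
        # degenerate single-cluster case: define NMI as 1 only if both partitions are trivial
        return 1.0 if h_t == h_p else 0.0
    return mi / sqrt(h_t * h_p)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]  # hypothetical labels
    pred = [1, 1, 0, 0, 0, 0]
    print(clustering_accuracy(truth, pred), nmi(truth, pred))
```

ACC is insensitive to how clusters are numbered, which is why the mapping search is needed. NMI is label-invariant by construction.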
Figure 2. Clustering results on different datasets.
Figure 3. Influence of noise on clustering accuracy.
Friedman test (significance level of 0.05).
| Statistic | p-value |
|---|---|
| 7.00000 | 0.01003 |
Friedman ranking of the algorithms (average rank; a higher rank indicates better performance).
| Rank | Algorithm |
|---|---|
| 1.33333 | LNMF |
| 2.33333 | NMF |
| 2.66667 | |
| 3.66667 | GNMF |
| 5.00000 | RM-GNMF |
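The Friedman test result can be checked against these average ranks. With k = 5 algorithms and N = 3 datasets, the Friedman χ² computed from the ranks is 28/3 ≈ 9.33. The Iman-Davenport correction F_F = (N−1)χ² / (N(k−1) − χ²) then gives exactly 7.0. This suggests that the reported statistic (7.00000) is this F_F with 4 and 8 degrees of freedom, and that 0.01003 is its p-value. The sketch below reproduces both numbers. The function names are hypothetical. The closed-form tail probability holds only when both degrees of freedom are even, as they are here.

```python
from math import comb


def friedman_from_avg_ranks(avg_ranks, n_datasets):
    """Friedman chi-square and Iman-Davenport F (with its dfs) from per-algorithm average ranks."""
    k, n = len(avg_ranks), n_datasets
    chi2 = 12.0 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4.0)
    f_f = (n - 1) * chi2 / (n * (k - 1) - chi2)
    return chi2, f_f, (k - 1, (k - 1) * (n - 1))


def f_upper_tail_even_df(f, d1, d2):
    """P(F > f) for F(d1, d2); exact only when d1 and d2 are both even."""
    a, b = d1 // 2, d2 // 2
    x = d1 * f / (d1 * f + d2)
    m = a + b - 1
    # CDF = I_x(a, b) = P(Binomial(m, x) >= a), so the upper tail is P(Binomial(m, x) < a)
    return sum(comb(m, j) * x ** j * (1 - x) ** (m - j) for j in range(a))


# average ranks from the Ranking table: LNMF, NMF, (name missing in source), GNMF, RM-GNMF
ranks = [1.33333, 2.33333, 2.66667, 3.66667, 5.00000]
chi2, f_f, dfs = friedman_from_avg_ranks(ranks, n_datasets=3)
p_value = f_upper_tail_even_df(f_f, *dfs)
print(f"chi2 = {chi2:.4f}, F_F = {f_f:.4f}, df = {dfs}, p = {p_value:.5f}")
```

For arbitrary degrees of freedom, a library routine for the F distribution's survival function would replace the even-df closed form.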