Junmin Zhao, Yuanyuan Ma, Lifang Liu.
Abstract
Networks are an efficient tool for organizing complicated data. The Laplacian graph has attracted increasing attention for its good properties and has been applied to many tasks, including clustering and feature selection. Recent studies indicate that although the Laplacian graph can capture the global information of data, it lacks the power to capture the fine-grained structure inherent in a network. In contrast, a Vicus matrix can make full use of local topological information from the data. Given this consideration, in this paper we simultaneously introduce Laplacian and Vicus graphs into a symmetric non-negative matrix factorization framework (LVSNMF) to seek and exploit the global and local structure patterns inherent in the original data. Extensive experiments are conducted on three real datasets (cancer, cell populations, and microbiome data). The experimental results show that the proposed LVSNMF algorithm significantly outperforms the competing algorithms, suggesting its potential in biological data analysis.
Keywords: Laplacian regularization; Vicus graph; local structure; matrix factorization; microbiome
Year: 2021 PMID: 33987200 PMCID: PMC8111298 DOI: 10.3389/fmolb.2021.643014
Source DB: PubMed Journal: Front Mol Biosci ISSN: 2296-889X
FIGURE 1. An illustrative example of the proposed LVSNMF algorithm. (A) The original data matrix (gene expression matrix, microbiome abundance profile matrix, and so on). (B) A Laplacian graph, used to maintain the manifold consistency assumption. (C) A Vicus graph, which explores the local geometrical structure in the data. The graphs in (B) and (C) are then introduced into the proposed LVSNMF model, which integrates the global (Laplacian) and local (Vicus) geometrical structure of the original data. (D) The clustering result given by LVSNMF.
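This record does not give the LVSNMF objective or its update rules, so the following is only a generic sketch of graph-regularized symmetric NMF, not the authors' algorithm. It assumes an objective of the form ||S - HH^T||_F^2 + alpha * Tr(H^T R H), minimized by projected gradient descent with backtracking. The similarity construction, the function names, and the parameters `k` and `alpha` are all illustrative assumptions. In particular, `local_placeholder` merely stands in for a Vicus matrix, whose actual construction is not described here.

```python
import numpy as np

def knn_similarity(X, k):
    """Symmetric k-nearest-neighbour Gaussian similarity (an assumed construction)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    S = np.exp(-d2 / np.median(d2[d2 > 0]))
    np.fill_diagonal(S, 0.0)
    # Keep the k strongest neighbours of every sample, then symmetrize the mask.
    idx = np.argsort(-S, axis=1)[:, :k]
    mask = np.zeros(S.shape, dtype=bool)
    mask[np.arange(S.shape[0])[:, None], idx] = True
    return np.where(mask | mask.T, S, 0.0)

def graph_laplacian(S):
    """Unnormalized graph Laplacian D - S."""
    return np.diag(S.sum(axis=1)) - S

def objective(S, H, R, alpha):
    """Assumed objective: reconstruction error plus graph regularization."""
    return np.linalg.norm(S - H @ H.T, "fro") ** 2 + alpha * np.trace(H.T @ R @ H)

def regularized_snmf(S, R, n_clusters, alpha=0.1, n_iter=200, seed=0):
    """Projected gradient descent with backtracking; H stays non-negative."""
    rng = np.random.default_rng(seed)
    # Scale the random start so that H @ H.T roughly matches the mean of S.
    H = rng.random((S.shape[0], n_clusters)) * 2 * np.sqrt(S.mean() / n_clusters)
    history = [objective(S, H, R, alpha)]
    step = 1e-2
    for _ in range(n_iter):
        grad = 4 * (H @ H.T - S) @ H + 2 * alpha * R @ H
        while step > 1e-12:
            H_new = np.maximum(H - step * grad, 0.0)  # project onto H >= 0
            f_new = objective(S, H_new, R, alpha)
            if f_new <= history[-1]:
                break
            step *= 0.5
        else:
            break  # no improving step was found
        H, step = H_new, step * 2
        history.append(f_new)
    return H, history

# Toy data: two Gaussian blobs (synthetic, not one of the paper's datasets).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 5)), rng.normal(4, 1, (30, 5))])
S = knn_similarity(X, k=10)
laplacian = graph_laplacian(S)
# Placeholder for the Vicus matrix: a Laplacian of a tighter neighbourhood graph.
local_placeholder = graph_laplacian(knn_similarity(X, k=3))
H, history = regularized_snmf(S, laplacian + local_placeholder, n_clusters=2)
labels = H.argmax(axis=1)  # assign each sample to its largest factor
```

The backtracking step accepts an update only when the objective does not increase, so `history` is non-increasing by construction. That makes the sketch easy to sanity-check, though it says nothing about how the published method converges.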
Statistics of the three datasets.
| Dataset | Number of samples | Number of features | Number of clusters |
| Lung cancer | 203 | 3,312 | 5 |
| Pollen | 249 | 14,805 | 11 |
| HMP | 637 | 710 | 7 |
The best performance on the three real datasets, measured by accuracy and normalized mutual information (NMI).
| Method | Accuracy (%), Lung | Accuracy (%), Pollen | Accuracy (%), HMP | NMI (%), Lung | NMI (%), Pollen | NMI (%), HMP |
| SNMF | 83.74 | 84.66 | 87.84 | 67.51 | 86.49 | 84.52 |
| SNMF + Laplacian | 90.64 | 85.94 | 88.27 | 70.03 | 87.33 | 84.46 |
| SNMF + Vicus | 90.15 | 85.60 | 88.49 | 71.26 | 87.12 | 84.95 |
| LVSNMF | 91.13 | 89.16 | 90.58 | 72.96 | 89.50 | 85.56 |
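This record does not state how accuracy and NMI were computed. A common convention for clustering, assumed here, is to match predicted clusters to true classes with the Hungarian algorithm before counting correct assignments, and to normalize mutual information by the arithmetic mean of the two entropies. The function names below are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def contingency(y_true, y_pred):
    """Count matrix: rows are true classes, columns are predicted clusters."""
    _, t = np.unique(y_true, return_inverse=True)
    _, p = np.unique(y_pred, return_inverse=True)
    C = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(C, (t, p), 1)
    return C

def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one matching of clusters to classes."""
    C = contingency(y_true, y_pred)
    rows, cols = linear_sum_assignment(C, maximize=True)
    return C[rows, cols].sum() / C.sum()

def normalized_mutual_info(y_true, y_pred):
    """NMI normalized by the arithmetic mean of the entropies (an assumption)."""
    P = contingency(y_true, y_pred).astype(float)
    P /= P.sum()
    pt, pp = P.sum(axis=1), P.sum(axis=0)
    nz = P > 0
    mi = np.sum(P[nz] * np.log(P[nz] / np.outer(pt, pp)[nz]))
    h_t = -np.sum(pt[pt > 0] * np.log(pt[pt > 0]))
    h_p = -np.sum(pp[pp > 0] * np.log(pp[pp > 0]))
    denom = 0.5 * (h_t + h_p)
    return mi / denom if denom > 0 else 1.0

# Cluster ids are arbitrary, so a relabelled but identical partition scores 1.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]
acc = clustering_accuracy(y_true, y_pred)
nmi = normalized_mutual_info(y_true, y_pred)
```

Other normalizations of mutual information (geometric mean, maximum, or minimum of the entropies) give different values, so figures computed this way are comparable to the table only if the same convention was used.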
FIGURE 2. The best performance of LVSNMF as α increases.
FIGURE 3. The performance of LVSNMF varies as K increases.