Yuanyuan Ma, Junmin Zhao, Yingjun Ma.
Abstract
BACKGROUND: With the rapid development of high-throughput techniques, vast amounts of heterogeneous omics data (e.g., genomics, proteomics and metabolomics data) have accumulated. Integrating information from multiple sources or views to gain deeper insight into the complicated relations among micro-organisms, nutrients and the host environment remains challenging. In this paper we propose a multi-view Hessian regularization based symmetric nonnegative matrix factorization algorithm (MHSNMF) for clustering heterogeneous microbiome data. Compared with many existing approaches, MHSNMF has the following advantages: (1) it combines multiple Hessian regularization terms to leverage the high-order information from the same cohort of instances with multiple representations; (2) it utilizes the advantages of SNMF and naturally handles the complex relationships among microbiome samples; (3) using the consensus matrix obtained by MHSNMF, we also design a novel approach to predict the classification of new microbiome samples.
Keywords: Hessian regularization; Human microbiome; Multi-view clustering; Symmetric nonnegative matrix factorization
Year: 2020 PMID: 33203357 PMCID: PMC7672850 DOI: 10.1186/s12859-020-03555-w
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Illustration of the MHSNMF framework on human microbiome data. a Example representation of the phylogenetic profile and metabolic profile for the same cohort of samples. b Sample-sample similarity matrices obtained from each view. c Using MHSNMF, each similarity matrix is factorized into a low-rank matrix and its transpose. The matrix fusion process iteratively updates each clustering with information from the other view. d The iterative fusion converges to the final consensus matrix H∗. e Given a new sample x from the i-th view, its subspace representation h is obtained from H∗ and the proposed mapping approach. Here, S denotes the similarity between x and the training samples from the i-th view, and α is the regularization parameter. f Once h is obtained, downstream applications such as classification and prediction follow naturally
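Panels b and c can be illustrated with a minimal Python sketch, assuming NumPy is available. It builds a cosine similarity matrix for one toy view and factorizes it as S ≈ HHᵀ using a damped multiplicative update that is commonly used for plain symmetric NMF. This is not the MHSNMF update: it omits the Hessian regularization and the multi-view fusion, and the function names, toy data, damping value and iteration count are all illustrative assumptions.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Sample-by-sample cosine similarity for an (n_samples, n_features) array."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)  # guard against all-zero rows
    return Xn @ Xn.T


def snmf(S, k, n_iter=500, damping=0.5, seed=0, eps=1e-12):
    """Approximate a symmetric nonnegative S by H @ H.T with H >= 0.

    Plain single-view SNMF with the damped multiplicative update
        H <- H * (1 - damping + damping * (S H) / (H H^T H)).
    Assumption: this stands in for, and is simpler than, the MHSNMF update.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((S.shape[0], k))  # strictly nonnegative random start
    for _ in range(n_iter):
        numer = S @ H
        denom = H @ (H.T @ H) + eps  # eps avoids division by zero
        H *= 1.0 - damping + damping * numer / denom
    return H


# Toy view (hypothetical data): two groups of samples with disjoint features.
X = np.array([
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.8],
    [0.0, 0.0, 0.8, 1.0],
    [0.0, 0.0, 1.0, 1.0],
])
S = cosine_similarity_matrix(X)
H = snmf(S, k=2)
labels = H.argmax(axis=1)  # cluster = index of the largest entry in each row
```

Reading the cluster of each sample from the largest entry in its row of H is one common convention for SNMF-based clustering; other post-processing, such as running k-means on the rows of H, is equally plausible.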
The pseudocode of MHSNMF
| MHSNMF algorithm | |
|---|---|
| Input: | |
| Output: | |
| 1. Transforming each | |
| 2. Solving the Hessian matrix | |
| 3. Initializing | |
| 4. Iteration beginning | |
| For | |
| Fixing | |
| Fixing | |
| Learning | |
| Until all views have been updated | |
| 5. Repeating |
Statistics of the Three-source dataset
| Topics | # Samples |
|---|---|
| Business | 56 |
| Entertainment | 21 |
| Health | 11 |
| Politics | 18 |
| Sport | 51 |
| Technology | 12 |
Statistics of the HMP dataset
| Body sites | # Samples |
|---|---|
| Stool | 134 |
| Posterior_fornix | 49 |
| Anterior_nares | 86 |
| Buccal_mucosa | 106 |
| Plaque | 122 |
| Retroauricular_crease | 17 |
| Tongue_dorsum | 123 |
The best clustering performance on two datasets
| Method | Accuracy (%): Three-source | Accuracy (%): HMP | NMI (%): Three-source | NMI (%): HMP |
|---|---|---|---|---|
| BSSV | 79.88 | 88.54 | 69.66 | 84.64 |
| WSSV | 65.68 | 81.16 | 58.26 | 80.71 |
| Multi-NMF | 66.86 | 77.55 | 55.04 | 72.87 |
| Co-training SC | 61.54 | 63.58 | 58.03 | 63.68 |
| SNF | 65.68 | 91.21 | 56.34 | 89.20 |
| LJ-NMF | 69.82 | 73.16 | 60.08 | 67.77 |
| CSMF | 65.18 | 74.01 | 63.23 | 65.43 |
| NetNMF | 70.18 | 82.50 | 61.24 | 81.76 |
| MHSNMF | 82.84 | 95.28 | 71.43 | 91.76 |
For Multi-NMF, the clustering results on the Three-source and HMP data were obtained with γ = 0.01 and γ = 0.05, respectively. For the Three-source dataset, the cosine function was used to construct the similarity matrix. For BSSV, WSSV and LJ-NMF, the number of neighbors on the HMP data was set to 12. For other values, MHSNMF still outperforms the other algorithms in most cases.
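The two metrics reported in the table can be sketched in plain Python as follows. This is a hedged illustration, not the authors' evaluation code: accuracy is computed under the best one-to-one mapping between cluster ids and class ids (found here by brute force over permutations, which is only practical for a handful of clusters), and NMI is normalized by the geometric mean of the two entropies, which is one of several conventions and an assumption here. The function names and the toy labels are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import log, sqrt


def clustering_accuracy(y_true, y_pred):
    """Fraction of samples matched under the best cluster-to-class mapping."""
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    # Pad with None so every cluster can be assigned even if clusters > classes.
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in permutations(targets, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[p] == t for t, p in zip(y_true, y_pred))
        best = max(best, hits)
    return best / len(y_true)


def normalized_mutual_info(y_true, y_pred):
    """Mutual information divided by sqrt(H(true) * H(pred))."""
    n = len(y_true)
    ct, cp = Counter(y_true), Counter(y_pred)
    joint = Counter(zip(y_true, y_pred))
    mi = sum(c / n * log(c * n / (ct[t] * cp[p])) for (t, p), c in joint.items())
    h_t = -sum(c / n * log(c / n) for c in ct.values())
    h_p = -sum(c / n * log(c / n) for c in cp.values())
    if h_t == 0 or h_p == 0:
        # Degenerate case: at least one labelling has a single group.
        return 1.0 if h_t == h_p else 0.0
    return mi / sqrt(h_t * h_p)


# Toy labels (hypothetical): cluster ids are arbitrary, so a relabelled
# perfect clustering should still score 1.0 on both metrics.
y_true = [0, 0, 0, 1, 1, 1]
perfect = [1, 1, 1, 0, 0, 0]
one_error = [1, 1, 0, 0, 0, 0]

acc_perfect = clustering_accuracy(y_true, perfect)
nmi_perfect = normalized_mutual_info(y_true, perfect)
acc_one_error = clustering_accuracy(y_true, one_error)
nmi_one_error = normalized_mutual_info(y_true, one_error)
```

For datasets with many clusters, the permutation search would normally be replaced by the Hungarian algorithm, which solves the same matching problem in polynomial time.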
Fig. 2 The performance of MHSNMF with respect to parameters γ and β on the Three-source and HMP datasets, respectively
Fig. 3 Convergence and corresponding AC curve of MHSNMF on the Three-source and HMP datasets
Fig. 4 Performance of MHSNMF versus p and knn on HMP data
Fig. 5 Scatter plot of HMP data in three-dimensional space. The result is obtained when γ equals 0.05 and is set to be 1e-4. Seven colors indicate the true labels of microbiome samples from different body sites
The prediction accuracy on HMP data
| Method | Phylogenetic profile (%) | Metabolic profile (%) | Average (%) |
|---|---|---|---|
| SNMF | 91.08% | 90.71% | 90.90% |
| CCA | 64.58% | 21.88% | 43.23% |
| PLSR | 91.67% | 69.27% | 80.47% |
| MHSNMF |