Na Yu, Ying-Lian Gao, Jin-Xing Liu, Juan Wang, Junliang Shang.
Abstract
BACKGROUND: As one of the most popular data representation methods, non-negative matrix decomposition (NMF) has received wide attention in clustering and feature selection tasks. However, most previously proposed NMF-based methods do not adequately explore the hidden geometrical structure in the data. At the same time, noise and outliers are inevitably present in the data.
Keywords: Clustering; Common abnormal gene selection; Hypergraph Laplacian; L2,1-norm; Multi-view gene expression data; Non-negative matrix decomposition
Year: 2019 PMID: 31639067 PMCID: PMC6805321 DOI: 10.1186/s40246-019-0222-6
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
Fig. 1 Illustration of the hypergraph. (a) An example of a hypergraph. (b) Its corresponding incidence matrix
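The relationship between a hypergraph and its incidence matrix, as pictured in Fig. 1, can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the six-vertex hypergraph, the function names `incidence_matrix` and `hypergraph_laplacian`, and the unit hyperedge weights are all assumptions made for the example. The Laplacian shown is the commonly used unnormalized form L = D_v - H W D_e^{-1} H^T, which may differ from the exact variant used in RHNMF.

```python
import numpy as np

def incidence_matrix(n_vertices, hyperedges):
    """Build H with H[v, e] = 1 if vertex v belongs to hyperedge e, else 0."""
    H = np.zeros((n_vertices, len(hyperedges)))
    for j, edge in enumerate(hyperedges):
        H[list(edge), j] = 1.0
    return H

def hypergraph_laplacian(H, weights=None):
    """Unnormalized hypergraph Laplacian L = D_v - H W D_e^{-1} H^T.

    Assumes every hyperedge contains at least one vertex.
    """
    n_edges = H.shape[1]
    w = np.ones(n_edges) if weights is None else np.asarray(weights, dtype=float)
    d_v = H @ w            # vertex degree: total weight of incident hyperedges
    d_e = H.sum(axis=0)    # hyperedge degree: number of vertices in the hyperedge
    S = H @ np.diag(w / d_e) @ H.T
    return np.diag(d_v) - S

# Hypothetical hypergraph: 6 vertices, 3 hyperedges (not taken from the paper).
hyperedges = [{0, 1, 2}, {2, 3}, {3, 4, 5}]
H = incidence_matrix(6, hyperedges)
L = hypergraph_laplacian(H)
print(H)
print(np.round(L, 3))
```

Unlike an ordinary graph edge, a hyperedge can join more than two vertices, so each column of `H` may contain more than two ones. With this definition every row of `L` sums to zero and `L` is symmetric.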
Fig. 2 The whole framework of RHNMF
Summary of four multi-view datasets
| Datasets | Samples | Genes | Classes | Views | Pv |
|---|---|---|---|---|---|
| PAAD_HNSC_CHOL_GE | 610 | 20502 | 3 | 3 | 20502, 20502, 20502 |
| PAAD_ESCA_CHOL_GE | 395 | 20502 | 3 | 3 | 20502, 20502, 20502 |
| PAAD_HNSC_ESCA_GE | 757 | 20502 | 3 | 3 | 20502, 20502, 20502 |
| HNSC_ESCA_CHOL_GE | 617 | 20502 | 3 | 3 | 20502, 20502, 20502 |
Note: Each dataset is a different multi-view dataset. Classes is the number of data categories (the type of cancer), Views is the number of data views (the type of cancer), and Pv is the dimension of each view
Fig. 3 Performance of the RHNMF set with different values of α
Comparison of clustering performance in multi-view datasets
| Methods | PAAD_HNSC_CHOL_GE AC (%) | PAAD_HNSC_CHOL_GE NMI (%) | PAAD_ESCA_CHOL_GE AC (%) | PAAD_ESCA_CHOL_GE NMI (%) | PAAD_HNSC_ESCA_GE AC (%) | PAAD_HNSC_ESCA_GE NMI (%) | HNSC_ESCA_CHOL_GE AC (%) | HNSC_ESCA_CHOL_GE NMI (%) |
|---|---|---|---|---|---|---|---|---|
| K-means | 57.19 ± 0.21 | 20.71 ± 0.74 | 52.24 ± 0.33 | 6.67 ± 0.48 | 46.79 ± 0.07 | 14.35 ± 0.30 | 54.62 ± 0.09 | 15.93 ± 0.10 |
| PCA | 57.71 ± 0.02 | 18.38 ± 0.32 | 47.02 ± 0.12 | 1.00 ± 0.01 | 46.98 ± 0.08 | 13.63 ± 0.32 | 48.95 ± 0.04 | 10.70 ± 0.06 |
| NMF | 48.28 ± 0.28 | 15.95 ± 0.08 | 52.56 ± 0.17 | 6.05 ± 0.15 | 46.41 ± 0.00 | 13.27 ± 0.02 | 48.87 ± 0.14 | 9.74 ± 0.09 |
| GNMF | 53.46 ± 0.24 | 17.23 ± 0.37 | 47.68 ± 0.01 | 1.52 ± 0.01 | 44.82 ± 0.10 | 14.18 ± 0.28 | 52.95 ± 0.09 | 15.29 ± 0.10 |
| NMFL2,1 | 58.69 ± 0.00 | 26.19 ± 0.00 | 57.17 ± 0.09 | 21.58 ± 0.03 | 50.21 ± 0.14 | 22.38 ± 0.26 | 51.70 ± 0.18 | 15.62 ± 0.09 |
| HNMF | 65.70 ± 0.02 | 32.18 ± 0.19 | 51.36 ± 0.07 | 25.64 ± 0.02 | 64.63 ± 0.08 | 26.90 ± 0.15 | 58.63 ± 0.09 | 19.32 ± 0.05 |
| SHNMF | 66.40 ± 0.03 | 35.62 ± 0.31 | 52.10 ± 0.07 | 26.01 ± 0.01 | 63.85 ± 0.04 | 36.93 ± 0.01 | 58.96 ± 0.06 | 19.07 ± 0.04 |
| RGNMF | 79.33 ± 0.83 | 60.42 ± 0.19 | 75.44 ± 0.76 | 60.52 ± 0.69 | 79.98 ± 0.81 | 53.74 ± 1.25 | 72.49 ± 1.35 | 38.36 ± 1.17 |
| RHNMF | | | | | | | | |
Note: The best experimental results are highlighted in italics
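For readers unfamiliar with the two metrics in the table, the sketch below shows one common way to compute clustering accuracy (AC) and normalized mutual information (NMI). It is an illustration under stated assumptions, not the evaluation code used in the paper: the function names and toy labels are hypothetical, AC is found by brute-force search over label permutations (practical only for a handful of clusters, such as the three classes here), and NMI is normalized by the larger of the two entropies, which is one of several conventions in use.

```python
import numpy as np
from itertools import permutations

def clustering_accuracy(y_true, y_pred):
    """Fraction of samples correctly labeled under the best cluster-to-class mapping."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    true_ids, pred_ids = np.unique(y_true), np.unique(y_pred)
    if len(pred_ids) > len(true_ids):
        raise ValueError("this sketch assumes no more clusters than classes")
    best = 0
    # Try every one-to-one assignment of predicted clusters to true classes.
    for perm in permutations(true_ids, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        mapped = np.array([mapping[p] for p in y_pred])
        best = max(best, int(np.sum(mapped == y_true)))
    return best / len(y_true)

def nmi(y_true, y_pred):
    """Mutual information divided by max(H(true), H(pred)); an assumed convention."""
    _, t = np.unique(np.asarray(y_true), return_inverse=True)
    _, p = np.unique(np.asarray(y_pred), return_inverse=True)
    joint = np.zeros((t.max() + 1, p.max() + 1))
    np.add.at(joint, (t, p), 1)          # contingency counts
    joint /= len(t)                      # joint distribution
    pt, pp = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mi = np.sum(joint[nz] * np.log(joint[nz] / np.outer(pt, pp)[nz]))
    h_t = -np.sum(pt * np.log(pt))
    h_p = -np.sum(pp * np.log(pp))
    denom = max(h_t, h_p)
    return mi / denom if denom > 0 else 1.0

# Toy labels (hypothetical): three classes, one sample placed in the wrong cluster.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 2, 2, 2]
print(clustering_accuracy(y_true, y_pred), nmi(y_true, y_pred))
```

Both functions return values in [0, 1]; multiplying by 100 gives percentages in the form reported in the table. Because cluster labels are arbitrary, AC must be computed after matching clusters to classes, which is why the permutation search is needed.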
Clustering performance of the nine methods on the single-cell dataset
| Methods | K-means | PCA | NMF | GNMF | NMFL2,1 | HNMF | SHNMF | RGNMF | RHNMF |
|---|---|---|---|---|---|---|---|---|---|
| AC (%) | 76.16 ± 0.18 | 76.89 ± 0.64 | 77.19 ± 0.64 | 78.57 ± 0.47 | 78.15 ± 0.32 | 79.19 ± 0.26 | 78.36 ± 0.45 | 79.76 ± 0.13 | |
| NMI (%) | 38.29 ± 0.22 | 36.34 ± 0.77 | 38.27 ± 0.73 | 39.63 ± 0.53 | 41.05 ± 0.10 | 40.39 ± 0.26 | 39.12 ± 0.57 | 40.78 ± 0.04 | |
Note: The best experimental results are highlighted in italics
Performance comparison of com-abnormal gene selection in multi-view datasets
| Methods | N | Com-abnormal genes |
|---|---|---|
| PCA | 25 | |
| NMF | 15 | |
| GNMF | 24 | |
| NMFL2,1 | 31 | CEACAM5, |
| HNMF | 32 | CEACAM5, |
| SHNMF | 31 | CEACAM5, |
| RGNMF | 33 | EGFR, CCND1, |
| RHNMF | 34 | CEACAM5, |
Note: Genes in bold are selected simultaneously by all eight methods. Underlined genes are those that can be selected by RHNMF. N is the number of com-abnormal genes selected by each method
Detailed analysis of the com-abnormal genes selected only by the RHNMF method
| Gene ID | Gene symbol | Related GO annotations | Related diseases | Relevance score |
|---|---|---|---|---|
| 3875 | KRT18 | Poly(A) RNA binding and scaffold protein binding | Cirrhosis, cryptogenic, and nonalcoholic steatohepatitis | 11.99, 11.76, 2.61 |
| 3309 | HSPA5 | Calcium ion binding and ubiquitin protein ligase binding | Borna disease and Wolfram syndrome | 11.46, 9.13, 0.88 |
Summary of the same com-abnormal genes discovered by eight methods
| Gene ID | Gene symbol | Related GO annotations | Related diseases | Relevance score |
|---|---|---|---|---|
| 3880 | KRT19 | Structural molecule activity and structural constituent of cytoskeleton | Lung cancer and thyroid cancer | 31.72, 24.50, 20.66 |
| 1508 | CTSB | Peptidase activity and cysteine-type peptidase activity | Keratolytic winter erythema and occlusion of gallbladder | 24.10, 10.61, 1.22 |
| 2778 | GNAS | GTP binding and signal transducer activity | McCune-Albright syndrome, somatic, mosaic, and pseudohypoparathyroidism Ia | 28.52, 9.69, 1.78 |
| 3852 | KRT5 | Structural molecule activity and scaffold protein binding | Epidermolysis bullosa simplex, Dowling-Meara type and epidermolysis bullosa simplex, Weber-Cockayne type | 24.03, 13.77, 0.17 |
| 6277 | S100A6 | Calcium ion binding and calcium-dependent protein binding | Endometrial cancer and pancreatic cancer | 19.09, 6.61, 1.44 |
| 1277 | COL1A1 | Identical protein binding and platelet-derived growth factor binding | Caffey disease and osteogenesis imperfecta, type I | 8.85, 19.64, 1.22 |
| 9168 | TMSB10 | Actin binding and actin monomer binding | Actin binding and actin monomer binding | 3.31, 1.29, 1.22 |