Na Yu1, Ying-Lian Gao2, Jin-Xing Liu3, Junliang Shang4, Rong Zhu5, Ling-Yun Dai6.
Abstract
Cancer genomic data contain views from different sources that provide complementary information about genetic activity, which opens a new avenue for cancer research. Feature selection and multi-view clustering are active topics in bioinformatics, and both can exploit this complementary information to improve performance. In this paper, a novel integrated model called Multi-view Non-negative Matrix Factorization (MvNMF) is proposed for the selection of common differential genes (co-differential genes) and for multi-view clustering. To encode the geometric information in multi-view genomic data, graph regularized MvNMF (GMvNMF) is further proposed by adding a graph regularization constraint to the objective function. GMvNMF not only obtains the potential shared feature structure and shared cluster group structure, but also captures the manifold structure of multi-view data. The validity of the proposed GMvNMF method was tested on four multi-view genomic datasets. Experimental results showed that GMvNMF performs better than other representative methods.
Keywords: common differential gene selection; graph regularization; integrated model; multi-view clustering; non-negative matrix factorization
Year: 2018 PMID: 30487464 PMCID: PMC6315625 DOI: 10.3390/genes9120586
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
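The abstract describes factorizing several non-negative views jointly while a graph regularization term preserves the manifold structure of the data. The record does not give the objective function, so the following is only a minimal sketch of a generic graph-regularized joint NMF, not the authors' GMvNMF model. It assumes each view `X_v` (features × samples) is approximated by `W_v @ H` with a coefficient matrix `H` shared across views, and that a term `lam * trace(H L H^T)` is added, where `L = D - A` is the Laplacian of a k-nearest-neighbour graph over samples. The function names, the binary k-NN graph, and all parameter defaults are illustrative assumptions.

```python
import numpy as np

def knn_graph(X, n_neighbors=5):
    """Symmetric binary k-NN adjacency over the columns (samples) of X."""
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                      # a sample is not its own neighbour
    idx = np.argsort(dist, axis=1)[:, :n_neighbors]
    A = np.zeros((X.shape[1], X.shape[1]))
    rows = np.repeat(np.arange(X.shape[1]), n_neighbors)
    A[rows, idx.ravel()] = 1.0
    return np.maximum(A, A.T)                           # symmetrize

def graph_multiview_nmf(views, k, lam=1.0, n_iter=200, seed=0, eps=1e-10):
    """Generic graph-regularized joint NMF with a shared H (assumed model).

    Minimizes sum_v ||X_v - W_v H||_F^2 + lam * trace(H (D - A) H^T)
    using standard multiplicative updates.
    """
    rng = np.random.default_rng(seed)
    n = views[0].shape[1]
    A = knn_graph(np.vstack(views))      # graph built on the stacked views (assumption)
    D = np.diag(A.sum(axis=1))
    Ws = [rng.random((X.shape[0], k)) for X in views]
    H = rng.random((k, n))
    history = []
    for _ in range(n_iter):
        # Update each view-specific basis matrix.
        for v, X in enumerate(views):
            Ws[v] *= (X @ H.T) / (Ws[v] @ (H @ H.T) + eps)
        # Update the shared coefficient matrix, including the graph term.
        num = sum(W.T @ X for W, X in zip(Ws, views)) + lam * (H @ A)
        den = sum(W.T @ W for W in Ws) @ H + lam * (H @ D) + eps
        H *= num / den
        loss = sum(np.linalg.norm(X - W @ H) ** 2 for W, X in zip(Ws, views))
        loss += lam * np.trace(H @ (D - A) @ H.T)
        history.append(loss)
    return Ws, H, history
```

Under this assumed model, a cluster label for each sample could be read off as `H.argmax(axis=0)`, and features could be ranked by the row norms of each `W_v`. How the paper actually derives clusters and co-differential genes from its factors is not stated in this record.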
Description of four multi-view datasets.
| Datasets | Data Types | Normal Samples | Tumor Samples | Genes |
|---|---|---|---|---|
| PAAD | GE, CNV, ME | 176 | 4 | 19,877 |
| ESCA | GE, CNV, ME | 183 | 9 | 19,877 |
| HNSC | GE, CNV, ME | 398 | 20 | 19,877 |
| COAD | GE, CNV, ME | 262 | 19 | 16,977 |
Note: Datasets represent different multi-view data. PAAD: pancreatic adenocarcinoma; ESCA: esophageal carcinoma; HNSC: head and neck squamous cell carcinoma; COAD: colon adenocarcinoma; GE: gene expression; CNV: copy number variation; ME: methylation.
Figure 1. Performance of Multi-view Non-negative Matrix Factorization (MvNMF) under different parameter values: (a) clustering performance of MvNMF on PAAD and HNSC; (b) clustering performance of MvNMF on ESCA and COAD; (c) clustering performance of MvNMF on PAAD, ESCA and COAD.
Figure 2. Performance of graph regularized MvNMF (GMvNMF) under different parameter values: (a), (b) and (c) each show the clustering performance of GMvNMF on PAAD, HNSC, ESCA and COAD for one of the parameters.
Figure 3. Convergence curves of joint Non-negative Matrix Factorization (jNMF), integrated NMF (iNMF), integrative orthogonality-regularized NMF (iONMF), MvNMF, and GMvNMF.
Computational time on ESCA.
| Methods | Time (s) |
|---|---|
| jNMF | 2.8808 ± 1.7 × 10⁻⁴ |
| iNMF | 3.4647 ± 1.3 × 10⁻³ |
| iONMF | 5.7375 ± 2.8 × 10⁻³ |
| MvNMF | 1.3495 ± 7.0 × 10⁻⁵ |
| GMvNMF | 1.0767 ± 1.4 × 10⁻⁴ |
The clustering performance on PAAD, ESCA, COAD and HNSC.
| Datasets | Metrics | jNMF | iNMF | iONMF | MvNMF | GMvNMF |
|---|---|---|---|---|---|---|
| PAAD | AC (%) | 70.39 ± 3.71 | 70.30 ± 3.71 | 65.01 ± 2.73 | 63.86 ± 0.78 | |
| PAAD | Recall (%) | 61.78 ± 7.34 | 56.49 ± 8.30 | 53.17 ± 5.48 | 56.30 ± 2.77 | |
| PAAD | Precision (%) | 97.93 ± 0.03 | | 97.89 ± 0.00 | 97.88 ± 0.03 | 95.99 ± 1.92 |
| PAAD | F-measure (%) | 71.99 ± 5.26 | 66.92 ± 7.06 | 65.65 ± 4.65 | 69.89 ± 1.92 | |
| ESCA | AC (%) | 65.32 ± 3.70 | 66.42 ± 3.49 | 57.64 ± 0.21 | 68.04 ± 0.70 | |
| ESCA | Recall (%) | 51.48 ± 6.67 | 54.39 ± 6.55 | 51.90 ± 0.67 | 51.10 ± 3.75 | |
| ESCA | Precision (%) | 88.16 ± 5.84 | 88.29 ± 6.21 | 94.70 ± 0.20 | 93.51 ± 0.51 | |
| ESCA | F-measure (%) | 62.81 ± 6.60 | 65.61 ± 6.25 | 67.16 ± 0.55 | 64.47 ± 3.39 | |
| COAD | AC (%) | 73.91 ± 1.84 | 71.00 ± 1.33 | 66.99 ± 0.68 | 65.13 ± 0.03 | |
| COAD | Recall (%) | 57.15 ± 6.54 | 51.28 ± 5.16 | 50.24 ± 2.95 | 47.15 ± 1.58 | |
| COAD | Precision (%) | 90.02 ± 3.29 | 87.60 ± 4.52 | 90.18 ± 1.88 | 89.94 ± 0.64 | |
| COAD | F-measure (%) | 68.25 ± 5.34 | 63.53 ± 5.11 | 63.79 ± 2.8 | 61.45 ± 1.58 | |
| HNSC | AC (%) | 66.75 ± 0.00 | 66.16 ± 0.01 | 66.39 ± 0.00 | 67.70 ± 0.03 | |
| HNSC | Recall (%) | 53.62 ± 2.19 | 51.18 ± 2.21 | 50.68 ± 2.20 | 55.23 ± 2.44 | |
| HNSC | Precision (%) | 95.22 ± 0.38 | 94.30 ± 0.39 | 94.03 ± 0.39 | | 94.93 ± 0.05 |
| HNSC | F-measure (%) | 67.85 ± 2.01 | 65.61 ± 1.96 | 65.09 ± 2.03 | 69.10 ± 2.20 | |
Note: The best experimental results are highlighted in bold.
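The four metrics in the table (AC, recall, precision, F-measure) can be illustrated for a two-cluster result against tumor/normal labels. This record does not state how the paper matches cluster labels to classes or which class is treated as positive, so the sketch below makes two assumptions: the cluster-to-class mapping is the one that maximizes accuracy, and class `1` is the positive class. The function name is hypothetical.

```python
import numpy as np

def binary_clustering_scores(y_true, y_pred):
    """AC, precision, recall and F-measure for a two-cluster result.

    Cluster ids are arbitrary, so both possible mappings of the two
    clusters onto the classes {0, 1} are tried and the more accurate
    one is kept (assumption). Class 1 is treated as positive (assumption).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    candidates = [y_pred, 1 - y_pred]
    accs = [np.mean(c == y_true) for c in candidates]
    mapped = candidates[int(np.argmax(accs))]

    tp = int(np.sum((mapped == 1) & (y_true == 1)))
    fp = int(np.sum((mapped == 1) & (y_true == 0)))
    fn = int(np.sum((mapped == 0) & (y_true == 1)))

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "AC": float(max(accs)),
        "precision": precision,
        "recall": recall,
        "F-measure": f_measure,
    }

# Toy example: four positive and two negative samples.
scores = binary_clustering_scores([1, 1, 1, 1, 0, 0], [0, 0, 0, 1, 1, 1])
```

In the toy example the flipped mapping is kept, giving three true positives, no false positives and one false negative.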
Co-differential gene selection results on four multi-view datasets.
| Methods | PAAD N | PAAD HRS | PAAD ARS | ESCA N | ESCA HRS | ESCA ARS | COAD N | COAD HRS | COAD ARS | HNSC N | HNSC HRS | HNSC ARS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| jNMF | 374 | 84.93 | 4.89 | 168 | | 5.19 | 142 | 103.7 | 7.02 | 175 | | 17.75 |
| iNMF | 375 | 84.93 | 4.84 | 171 | | 5.31 | 144 | 103.7 | 7.71 | 175 | 102.98 | 16.65 |
| iONMF | 375 | | 5.19 | 170 | | 5.36 | 141 | 165.65 | 8.64 | 175 | | 17.52 |
| MvNMF | 365 | | 5.23 | 170 | | 5.52 | 145 | 165.65 | | | | |
| GMvNMF | | | | | | | | | 8.37 | | | 17.60 |
Note: N is obtained by matching the co-differential genes selected by each method to the virulence gene pool of PAAD, ESCA, COAD, and HNSC. HRS represents the highest relevant score, and ARS represents the average relevant score. The best experimental results are highlighted in bold.
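The three quantities defined in the note can be sketched as a simple lookup. This assumes the gene pool is available as a mapping from gene identifier to relevance score, and that ARS is averaged over the matched genes only; neither detail is stated in this record. The function name and the toy gene identifiers are hypothetical.

```python
def score_selected_genes(selected, pool):
    """Return (N, HRS, ARS) for a list of selected genes.

    selected: iterable of gene identifiers chosen by a method.
    pool: dict mapping gene identifier -> relevance score (assumed format).
    N is the number of selected genes found in the pool, HRS the highest
    relevance score among them, and ARS their mean relevance score
    (averaging over matched genes only is an assumption).
    """
    # Drop duplicates while keeping order, then keep genes present in the pool.
    matched = [g for g in dict.fromkeys(selected) if g in pool]
    scores = [pool[g] for g in matched]
    n = len(matched)
    hrs = max(scores) if scores else 0.0
    ars = sum(scores) / n if n else 0.0
    return n, hrs, ars

# Toy pool with made-up identifiers and scores, for illustration only.
toy_pool = {"GENE_A": 10.0, "GENE_B": 4.0, "GENE_C": 7.0}
n, hrs, ars = score_selected_genes(["GENE_A", "GENE_B", "GENE_X"], toy_pool)
```

Here `GENE_X` is not in the pool, so only two genes are matched.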
Summary of the co-differential genes selected by the GMvNMF method.
| Gene ID | Gene ED | Related GO Annotations | Related Diseases | Relevance Score |
|---|---|---|---|---|
| 672 | BRCA1 | RNA binding and ligase activity | Breast-Ovarian Cancer, Familial 1 and Pancreatic Cancer 4 | 173.12 |
| 675 | BRCA2 | protease binding and histone acetyltransferase activity | Fanconi Anemia, Complementation Group D1 and Breast Cancer | 135.87 |
| 1956 | EGFR | identical protein binding and protein kinase activity | Inflammatory Skin and Bowel Disease, Neonatal, 2 and Lung Cancer | 104.16 |
| 3569 | IL6 | signaling receptor binding and growth factor activity | Kaposi Sarcoma and Rheumatoid Arthritis, Systemic Juvenile | 58.74 |
| 4318 | MMP9 | identical protein binding and metalloendopeptidase activity | Metaphyseal Anadysplasia 2 and Metaphyseal Anadysplasia | 45.57 |
| 1495 | CTNNA1 | actin filament binding | Macular Dystrophy, Patterned, 2 and Butterfly-Shaped Pigment Dystrophy | 41.99 |
| 1950 | EGF | calcium ion binding and epidermal growth factor receptor binding | Hypomagnesemia 4, Renal and Familial Primary Hypomagnesemia with Normocalciuria and Normocalcemia | 40.84 |
| 5594 | MAPK1 | transferase activity, transferring phosphorus-containing groups and protein tyrosine kinase activity | Chromosome 22Q11.2 Deletion Syndrome, Distal and Pertussis | 39.23 |
| 2475 | MTOR | transferase activity, transferring phosphorus-containing groups and protein serine/threonine kinase activity | Focal Cortical Dysplasia, Type II and Smith-Kingsmore Syndrome | 34.07 |
| 887 | CCKBR | G-protein coupled receptor activity and 1-phosphatidylinositol-3-kinase regulator activity | Panic Disorder and Anxiety | 23.43 |
Note: Gene ID represents the number of the gene. Gene ED represents the gene name.
Summary of the co-differential genes selected on PAAD, ESCA and HNSC.
| Gene ID | Gene ED | Related GO Annotations | Related Diseases | Paralog Gene |
|---|---|---|---|---|
| 999 | CDH1 | calcium ion binding and protein phosphatase binding | Gastric Cancer, Hereditary Diffuse and Blepharocheilodontic Syndrome 1 | |
| 1499 | CTNNB1 | DNA binding transcription factor activity and binding | Mental Retardation, Autosomal Dominant 19 and Pilomatrixoma | |
| 1956 | EGFR | identical protein binding and protein kinase activity | Inflammatory Skin and Bowel Disease, Neonatal, 2 and Lung Cancer | |
Note: A paralog gene is a gene produced via gene duplication within a genome.