Qi Liu.
Abstract
BACKGROUND: Clustering and feature selection play major roles in many communities. As a matrix factorization method, Low-Rank Representation (LRR) has attracted considerable attention in clustering and feature selection, but its performance can degrade when the data samples are insufficient or contain a lot of noise.
Keywords: Clustering; Gene selection; Graph-Laplacian; Low-rank representation; Truncated nuclear norm
Year: 2022 PMID: 35057728 PMCID: PMC8772046 DOI: 10.1186/s12859-021-04333-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Description of the seven integrative gene expression datasets
| Datasets | Genes | Samples (per class) | Sample classes |
|---|---|---|---|
| PAAD-COAD | 20502 | 176-262 | 2 |
| HNSC-ESCA | 20502 | 398-183 | 2 |
| CHOL-HNSC-ESCA | 20502 | 36-398-183 | 3 |
| COAD-PAAD-ESCA | 20502 | 262-176-183 | 3 |
| PAAD-ESCA-HNSC | 20502 | 180-192-418 | 3 |
| HNSC-PAAD-CHOL-ESCA | 20502 | 398-176-36-183 | 4 |
| ESCA-COAD-CHOL-PAAD | 20502 | 183-262-36-176 | 4 |
Fig. 1 The clustering performance of the TGLRR model versus its two parameters: (a) results on the PAAD-COAD dataset, (b) results on the COAD-PAAD-ESCA dataset, (c) results on the HNSC-PAAD-CHOL-ESCA dataset
Fig. 2 The singular values of six distinct matrices
Fig. 3 Convergence curves of TGLRR on gene expression data
The clustering results on PAAD-COAD and HNSC-ESCA integrative data
| Method | PAAD-COAD ACC (%) | PAAD-COAD NMI (%) | PAAD-COAD F-measure (%) | HNSC-ESCA ACC (%) | HNSC-ESCA NMI (%) | HNSC-ESCA F-measure (%) |
|---|---|---|---|---|---|---|
| K-means | 91.57 ± 0.89 | 68.77 ± 4.24 | 91.62 ± 1.01 | 99.36 ± 0.05 | 98.00 ± 0.50 | 98.81 ± 0.18 |
| LLRR | 93.95 ± 0.29 | 71.59 ± 1.29 | 93.83 ± 0.26 | 99.83 ± 0.00 | 98.07 ± 0.00 | 99.80 ± 0.00 |
| LRR | 93.63 ± 0.57 | 70.70 ± 2.52 | 93.64 ± 0.51 | 99.83 ± 0.00 | 98.07 ± 0.00 | 99.80 ± 0.00 |
| RPCA | 93.81 ± 0.46 | 71.09 ± 2.27 | 93.81 ± 0.42 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 |
| DGLRR | 94.14 ± 0.40 | 71.98 ± 1.80 | 94.13 ± 0.47 | 99.83 ± 0.00 | 98.07 ± 0.00 | 99.80 ± 0.00 |
| LatLRR | 93.76 ± 0.33 | 71.46 ± 1.50 | 93.77 ± 0.29 | 99.83 ± 0.00 | 98.07 ± 0.00 | 99.80 ± 0.00 |
| TGLRR | 95.15 ± 0.00 | 74.44 ± 0.00 | 95.10 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 |
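
The source does not spell out how ACC is computed. A minimal sketch, assuming ACC is the usual best-match clustering accuracy (cluster labels are mapped one-to-one onto class labels so as to maximize agreement), might look as follows. The function name `clustering_accuracy` and the toy labels are illustrative assumptions, not taken from the paper, and the brute-force search over permutations is only practical for the small class counts (two to four) used in these datasets.

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_labels):
    """Fraction of samples labeled correctly under the best one-to-one
    mapping from cluster labels to class labels (assumed ACC definition)."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label lists must have the same length")
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    if len(clusters) > len(classes):
        raise ValueError("this sketch does not handle more clusters than classes")

    best_hits = 0
    # Try every assignment of distinct classes to the observed clusters.
    for assigned in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, assigned))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_labels))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


# Toy labels for illustration only; these are not data from the paper.
truth = ["PAAD", "PAAD", "PAAD", "COAD", "COAD", "COAD"]
found = [1, 1, 0, 0, 0, 0]
acc = clustering_accuracy(truth, found)
print(f"ACC = {100 * acc:.2f}%")  # ACC = 83.33%
```

For larger numbers of classes, the permutation search would normally be replaced by an assignment solver such as the Hungarian algorithm.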
The clustering results on CHOL-HNSC-ESCA and COAD-PAAD-ESCA data
| Method | CHOL-HNSC-ESCA ACC (%) | CHOL-HNSC-ESCA NMI (%) | CHOL-HNSC-ESCA F-measure (%) | COAD-PAAD-ESCA ACC (%) | COAD-PAAD-ESCA NMI (%) | COAD-PAAD-ESCA F-measure (%) |
|---|---|---|---|---|---|---|
| K-means | 83.49 ± 1.77 | 76.23 ± 3.47 | 77.05 ± 3.42 | 83.61 ± 3.25 | 76.11 ± 2.85 | 81.95 ± 4.28 |
| LLRR | 96.73 ± 0.80 | 94.80 ± 1.26 | 94.69 ± 2.09 | 87.07 ± 2.23 | 79.50 ± 1.46 | 86.19 ± 2.94 |
| LRR | 97.13 ± 0.42 | 95.32 ± 0.61 | 96.16 ± 0.82 | 88.16 ± 1.98 | 80.27 ± 1.48 | 87.47 ± 2.53 |
| RPCA | 85.40 ± 2.64 | 81.43 ± 4.16 | 81.26 ± 4.59 | 85.59 ± 2.72 | 78.85 ± 2.39 | 83.98 ± 3.49 |
| DGLRR | 94.70 ± 1.03 | 92.33 ± 1.78 | 91.63 ± 2.62 | 86.14 ± 2.38 | 78.67 ± 1.82 | 84.93 ± 3.11 |
| LatLRR | 93.94 ± 1.57 | 91.37 ± 2.46 | 91.57 ± 3.32 | 87.16 ± 2.52 | 79.33 ± 1.93 | 86.16 ± 3.30 |
| TGLRR | 98.37 ± 0.00 | 90.58 ± 0.03 | 96.09 ± 0.01 | 92.82 ± 0.77 | 79.51 ± 0.93 | 92.62 ± 0.91 |
The clustering results on HNSC-PAAD-CHOL-ESCA and ESCA-COAD-CHOL-PAAD data
| Method | HNSC-PAAD-CHOL-ESCA ACC (%) | HNSC-PAAD-CHOL-ESCA NMI (%) | HNSC-PAAD-CHOL-ESCA F-measure (%) | ESCA-COAD-CHOL-PAAD ACC (%) | ESCA-COAD-CHOL-PAAD NMI (%) | ESCA-COAD-CHOL-PAAD F-measure (%) |
|---|---|---|---|---|---|---|
| K-means | 78.42 ± 0.94 | 71.34 ± 1.03 | 72.19 ± 1.92 | 82.49 ± 2.15 | 77.01 ± 1.76 | 75.71 ± 3.30 |
| LLRR | 87.66 ± 0.94 | 75.56 ± 0.40 | 86.90 ± 2.04 | 84.41 ± 2.07 | 80.24 ± 1.22 | 82.60 ± 2.73 |
| LRR | 88.63 ± 0.39 | 75.89 ± 0.21 | 89.16 ± 0.81 | 87.40 ± 2.05 | 82.52 ± 1.20 | 87.62 ± 2.17 |
| RPCA | 84.85 ± 1.54 | 80.29 ± 1.50 | 81.72 ± 2.57 | 83.39 ± 1.73 | 79.28 ± 1.31 | 76.86 ± 3.10 |
| DGLRR | 86.68 ± 0.85 | 75.22 ± 0.41 | 84.99 ± 1.94 | 85.99 ± 2.26 | 81.52 ± 1.39 | 84.01 ± 3.13 |
| LatLRR | 85.14 ± 1.02 | 73.96 ± 0.43 | 84.26 ± 2.21 | 86.04 ± 1.85 | 81.49 ± 1.14 | 82.37 ± 1.94 |
| TGLRR | 93.46 ± 0.93 | 82.83 ± 0.75 | 90.90 ± 1.20 | 90.62 ± 1.53 | 79.87 ± 1.49 | 90.34 ± 1.64 |
The top 10 genes selected via TGLRR on PAAD-ESCA-HNSC
| Gene ID | Relevance score (PAAD, ESCA, HNSC, mean) | Related diseases | Coded proteins |
|---|---|---|---|
| CDH1 | 101.03, 96.95, 124.3, 107.43 | Gastric, breast, colorectal, thyroid and ovarian cancer | Cadherin superfamily |
| TGFB1 | 73.21, 44.14, 76.66, 64.67 | Camurati-Engelmann disease, Encephalopathy, Inflammatory Bowel Disease and Immunodeficiency | Transforming Growth Factor-Beta Superfamily of Proteins |
| RELA | 27.63, 11.33, 41.36, 26.77 | Mucocutaneous Ulceration, Chronic and Ependymoma | Transcription Factor |
| ANXA5 | 26.80, 10.31, 42.30, 26.47 | Pregnancy Loss, Recurrent 3 and Antiphospholipid Syndrome | Calcium-Dependent Phospholipid Binding Proteins |
| RHOA | 27.48, 11.81, 31.46, 23.58 | Adenocarcinoma and Peripheral T-Cell Lymphoma | Rho Family of Small GTPases |
| PTPN11 | 13.04, 13.56, 43.23, 23.28 | Noonan Syndrome 1 and Juvenile Myelomonocytic Leukemia | Protein Tyrosine Phosphatase |
| CTNNA1 | 20.94, 19.40, 24.80, 21.71 | Macular Dystrophy, Patterned, 2 and Butterfly-Shaped Pigment Dystrophy | Cell Adhesion Process Protein |
| IGF2R | 13.40, 19.07, 25.26, 19.24 | Hepatocellular Carcinoma and Inclusion-Cell Disease | Receptor for Both Insulin-Like Growth Factor 2 and Mannose 6-Phosphate |
| RUNX1 | 10.85, 12.97, 25.61, 16.48 | Platelet Disorder, Familial, with Associated Myeloid Malignancy, leukemia and Isolated Delta-Storage Pool Disease | Transcription Factor |
| EWSR1 | 12.55, 9.19, 27.33, 16.36 | Ewing Sarcoma and Desmoplastic Small Round Cell Tumor | Multifunctional Protein |
Taking the second column of the CDH1 row as an example, the first, second and third numbers are the relevance scores of the CDH1 gene to PAAD, ESCA and HNSC, respectively, and the fourth is their mean.
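
As a quick check of how the fourth number relates to the first three, the sketch below recomputes the mean for the first three rows of the table. The helper name `mean_relevance` is an illustrative assumption, and rounding to two decimals is assumed from the precision shown in the table.

```python
from statistics import mean

# (PAAD, ESCA, HNSC, reported mean), copied from the first three table rows.
rows = {
    "CDH1": (101.03, 96.95, 124.3, 107.43),
    "TGFB1": (73.21, 44.14, 76.66, 64.67),
    "RELA": (27.63, 11.33, 41.36, 26.77),
}


def mean_relevance(paad, esca, hnsc):
    """Average the three per-cancer relevance scores, rounded to 2 decimals."""
    return round(mean([paad, esca, hnsc]), 2)


for gene, (paad, esca, hnsc, reported) in rows.items():
    computed = mean_relevance(paad, esca, hnsc)
    print(gene, computed, reported)
```

For CDH1 this gives (101.03 + 96.95 + 124.3) / 3 ≈ 107.43, which matches the fourth value reported in the table.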