Limin Jiang1, Yongkang Xiao2, Yijie Ding3, Jijun Tang1,4, Fei Guo1.
Abstract
Discovering cancer subtypes is useful for guiding the clinical treatment of many cancers. Advances in tissue profiling technologies have accumulated diverse types of data. Based on these expression data, various computational methods have been proposed to predict cancer subtypes, and how best to integrate multiple data profiles remains a crucial question. In this paper, we collect multiple profiles of data for five cancers from The Cancer Genome Atlas (TCGA). We then construct three similarity kernels over all patients with the same cancer, using gene expression, miRNA expression, and isoform expression data. We also propose a novel unsupervised multiple kernel fusion method, Similarity Kernel Fusion (SKF), to integrate the three similarity kernels into one combined kernel. Finally, we apply spectral clustering to the integrated kernel to predict cancer subtypes. In the experiments, P-values from the Cox regression model and survival curve analysis are used to evaluate the predicted subtypes on three datasets. Our kernel fusion method, SKF, shows outstanding performance compared with single kernels and other multiple kernel fusion strategies, demonstrating that it can identify more accurate subtypes across various kinds of cancer. Our cancer subtype prediction method can identify essential genes and biomarkers for disease diagnosis and prognosis, and we also discuss the possible side effects of therapies and treatment.
Keywords: The Cancer Genome Atlas; cancer subtypes prediction; similarity kernel fusion; sparse matrix; spectral clustering
Year: 2019 PMID: 30804977 PMCID: PMC6370730 DOI: 10.3389/fgene.2019.00020
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Flowchart of our method for discovering cancer subtypes.
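The pipeline described in the abstract (one similarity kernel per data profile, fusion into a single kernel, then spectral clustering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the SKF update rule is not given in this record, so a plain average of the kernels stands in for it, and the function names (`rbf_kernel`, `fuse_by_average`, `spectral_clusters`), the Gaussian kernel, its default bandwidth, and the synthetic data are all hypothetical.

```python
import numpy as np

def rbf_kernel(X, gamma=None):
    """Gaussian similarity between rows (patients) of X. Assumed kernel choice."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if gamma is None:
        gamma = 1.0 / X.shape[1]  # assumed default bandwidth
    return np.exp(-gamma * d2)

def fuse_by_average(kernels):
    """Placeholder fusion: plain mean of the kernels. This is NOT the SKF rule."""
    return sum(kernels) / len(kernels)

def spectral_clusters(K, n_clusters, n_iter=50):
    """Normalized spectral clustering on a precomputed similarity matrix K."""
    d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
    S = K * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the top eigenvectors.
    _, vecs = np.linalg.eigh(S)
    U = vecs[:, -n_clusters:]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # Deterministic farthest-point initialisation, then Lloyd's k-means.
    centers = [U[0]]
    for _ in range(1, n_clusters):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = U[labels == k].mean(axis=0)
    return labels

# Synthetic stand-in for three expression profiles of the same 40 patients.
rng = np.random.default_rng(0)
group = np.repeat([0, 1], 20)  # two hidden subtypes
profiles = [rng.normal(size=(40, p)) + 3.0 * group[:, None] for p in (50, 30, 20)]

kernels = [rbf_kernel(X) for X in profiles]
fused = fuse_by_average(kernels)
labels = spectral_clusters(fused, n_clusters=2)
```

On real data, each profile matrix would hold one row per patient and one column per feature, and the number of clusters would be chosen per cancer, as the C values in the comparison table indicate.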
Description of three datasets from TCGA.
| Dataset No.1 | Breast | 1071 | 60483 | 183 | 1881 |
| Dataset No.1 | Colon | 426 | 60483 | 186 | 1881 |
| Dataset No.1 | Kidney | 868 | 60483 | 176 | 1881 |
| Dataset No.1 | Lung | 981 | 60483 | 174 | 1881 |
| Dataset No.1 | Stomach | 377 | 60483 | 211 | 1881 |
| Dataset No.2 | Breast | 105 | 17814 | 23094 | 354 |
| Dataset No.2 | Colon | 92 | 17814 | 23088 | 312 |
| Dataset No.2 | Kidney | 122 | 17899 | 24960 | 329 |
| Dataset No.2 | Lung | 106 | 12042 | 23074 | 352 |
| Dataset No.2 | GBM | 215 | 12042 | 1305 | 534 |
| Dataset No.3 | Breast | 1071 | 18222 | 183 | 1881 |
| Dataset No.3 | Colon | 426 | 18222 | 186 | 1881 |
| Dataset No.3 | Kidney | 868 | 18222 | 176 | 1881 |
| Dataset No.3 | Lung | 981 | 18222 | 174 | 1881 |
| Dataset No.3 | Stomach | 377 | 18222 | 211 | 1881 |
Figure 2. Results of SKF with α on three datasets.
Comparison results between SKF and single kernel on three datasets.
| Dataset No.1 | Stomach (C = 12) | 0.703 | 0.027 | 0.548 | 8.86 × 10⁻¹⁴ |
| Dataset No.1 | Lung (C = 8) | 0.621 | 0.137 | 0.829 | 3.81 × 10⁻⁴ |
| Dataset No.1 | Kidney (C = 3) | 0.228 | 0.642 | 0.358 | 0.120 |
| Dataset No.1 | Breast (C = 5) | 0.516 | 0.281 | 0.281 | 9.79 × 10⁻⁶ |
| Dataset No.1 | Colon (C = 6) | 0.045 | 0.726 | 0.133 | 0.025 |
| Dataset No.2 | GBM (C = 5) | 0.159 | 0.001 | 0.436 | 0.037 |
| Dataset No.2 | Lung (C = 3) | 8.25 × 10⁻⁴ | 0.009 | 0.289 | 6.66 × 10⁻⁵ |
| Dataset No.2 | Kidney (C = 5) | 0.0177 | 0.467 | 0.368 | 0.0372 |
| Dataset No.2 | Breast (C = 5) | 0.009 | 0.00164 | 1.38 × 10⁻⁴ | 2.7 × 10⁻⁷ |
| Dataset No.2 | Colon (C = 5) | 0.587 | 0.084 | 0.702 | 1.81 × 10⁻³ |
| Dataset No.3 | Stomach (C = 9) | 0.0538 | 0.438 | 0.621 | 0.003 |
| Dataset No.3 | Lung (C = 3) | 0.352 | 0.171 | 0.398 | 0.005 |
| Dataset No.3 | Kidney (C = 8) | 0.048 | 0.0018 | 0.779 | 0.101 |
| Dataset No.3 | Breast (C = 7) | 0.597 | 0.0343 | 0.864 × 10⁻⁸ | 1.06 × 10⁻³⁴ |
| Dataset No.3 | Colon (C = 7) | 0.0465 | 0.626 | 0.134 | 3.66 × 10⁻⁴ |
Figure 3. P-values of SKF, SNF, and UMKL with different numbers of clusters. (A) Results of Dataset No.1. (B) Results of Dataset No.2. (C) Results of Dataset No.3.
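To make concrete how a survival P-value can score a candidate grouping of patients, the sketch below computes a two-group log-rank test in plain Python. It is a stand-in chosen for brevity: the paper reports P-values from a Cox regression model, which is a different test, and its subtypes number more than two. The function name `logrank_p_value` and the toy follow-up times are hypothetical.

```python
import math

def logrank_p_value(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. events: 1 = death observed, 0 = censored."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in pooled if u >= t)                # at risk, both groups
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)   # at risk, group A
        d = sum(1 for u, e, _ in pooled if u == t and e == 1)     # deaths at t
        d_a = sum(1 for u, e, g in pooled if u == t and e == 1 and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group-A death count at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0.0:
        return 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))

# Toy data: group A dies early, group B dies late, no censoring.
p_separated = logrank_p_value([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
# Identical groups give no evidence of a difference.
p_same = logrank_p_value([1, 2, 3], [1, 1, 1], [1, 2, 3], [1, 1, 1])
```

A smaller P-value means the grouping separates patients with more clearly different survival, which is the sense in which the P-values of competing fusion methods are compared across cluster numbers.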
Figure 4. Survival curves of subtypes for four cancers.
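Survival curves of this kind are commonly drawn with the Kaplan-Meier estimator. This record does not name the estimator used, so the following is a sketch under that assumption; the function name `kaplan_meier` and the four-patient example are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time. events: 1 = death, 0 = censored."""
    curve = []
    survival = 1.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Four patients; the second is censored at time 2 and so never counts as a death.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(curve)  # [(1, 0.75), (3, 0.375), (4, 0.0)]
```

Computing one such curve per predicted subtype and plotting them together gives a per-cancer panel like those the caption describes.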