Nora K Speicher, Nico Pfeifer.
Abstract
MOTIVATION: Despite ongoing cancer research, available therapies are still limited in quantity and effectiveness, and making treatment decisions for individual patients remains a hard problem. Established subtypes, which help guide these decisions, are mainly based on individual data types. However, the analysis of multidimensional patient data involving measurements of various molecular features could reveal intrinsic characteristics of the tumor. Large-scale projects accumulate this kind of data for various cancer types, but we still lack computational methods that reliably integrate this information in a meaningful manner. Therefore, we apply and extend current approaches to multiple kernel learning for dimensionality reduction. On the one hand, we add a regularization term to avoid overfitting during the optimization procedure; on the other hand, we show that one can use several kernels per data type, which relieves the user of having to choose the best kernel functions and kernel parameters for each data type beforehand.
Year: 2015 PMID: 26072491 PMCID: PMC4765854 DOI: 10.1093/bioinformatics/btv244
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Survival analysis of clustering results of similarity network fusion (SNF) and rMKL-LPP with one and five kernels per data type
| Cancer type | SNF | rMKL-LPP (3K) | rMKL-LPP (15K) |
|---|---|---|---|
| GBM | 2.0E-4 (3) | 4.5E-2 (5) | 6.5E-6 (6) |
| BIC | 1.1E-3 (5) | 3.0E-4 (6) | 3.4E-3 (7) |
| KRCCC | 2.9E-2 (3) | 0.23 (6) | 4.0E-5 (14) |
| LSCC | 2.0E-2 (4) | 2.2E-3 (2) | 2.4E-4 (6) |
| COAD | 8.8E-4 (3) | 2.8E-2 (2) | 2.8E-3 (6) |
The numbers in brackets denote the number of clusters. For SNF, these are determined using the eigenrotation method (Wang ), and for rMKL-LPP, by the silhouette value.
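Choosing the number of clusters by silhouette value can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Euclidean distances between already-projected points, and the names `mean_silhouette`, `pick_by_silhouette` and the toy data are hypothetical. The candidate labelings would come from any clustering routine run with different numbers of clusters.

```python
import math

def mean_silhouette(points, labels):
    """Average silhouette value over all points (singleton clusters score 0)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # common convention: s(i) = 0 for a singleton cluster
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)

def pick_by_silhouette(points, candidate_labelings):
    """Return the key (number of clusters) whose labeling scores highest."""
    return max(
        candidate_labelings,
        key=lambda k: mean_silhouette(points, candidate_labelings[k]),
    )

# Toy data (assumption): two well-separated groups on a line.
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
}
best_k = pick_by_silhouette(points, candidates)
print(best_k)
```

On this toy input the two-cluster labeling wins, because splitting the first group lowers the silhouette of its members.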
Fig. 1. Contribution of the different kernel matrices to each entry in the unified ensemble kernel matrix. The three colors represent gene expression (blue), DNA methylation (yellow) and miRNA expression (red). The intensities represent the kernel parameter γ, starting from (high intensity) to (low intensity)
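To make the idea of per-kernel contributions concrete, the sketch below builds an ensemble kernel as a weighted sum of Gaussian kernel matrices and reports each kernel's share of a single entry. It is an illustration under stated assumptions: the weights `beta`, the γ values and the toy samples are hypothetical, and the weighted-sum form is the usual multiple kernel learning construction rather than a transcription of the paper's optimization.

```python
import math

def rbf_kernel_matrix(samples, gamma):
    """Gaussian kernel: K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(samples)
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(samples[i], samples[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]

def ensemble_kernel(kernels, beta):
    """Weighted sum of kernel matrices: K = sum_m beta[m] * K_m."""
    if len(kernels) != len(beta):
        raise ValueError("need exactly one weight per kernel matrix")
    n = len(kernels[0])
    return [
        [sum(b * K[i][j] for b, K in zip(beta, kernels)) for j in range(n)]
        for i in range(n)
    ]

def contributions(kernels, beta, i, j):
    """Fraction of ensemble entry (i, j) contributed by each kernel."""
    parts = [b * K[i][j] for b, K in zip(beta, kernels)]
    total = sum(parts)
    return [p / total for p in parts]

# Toy input (assumption): three patients, two data types, two gammas each.
expression = [(0.1, 0.4), (0.2, 0.5), (0.9, 0.1)]
methylation = [(0.3,), (0.35,), (0.8,)]
kernels = [
    rbf_kernel_matrix(expression, 0.5),
    rbf_kernel_matrix(expression, 2.0),
    rbf_kernel_matrix(methylation, 0.5),
    rbf_kernel_matrix(methylation, 2.0),
]
beta = [0.4, 0.1, 0.3, 0.2]  # hypothetical learned kernel weights
K = ensemble_kernel(kernels, beta)
shares = contributions(kernels, beta, 0, 2)
print(shares)
```

Because every Gaussian kernel has ones on its diagonal, the diagonal of the ensemble equals the sum of the weights; off-diagonal shares differ from entry to entry, which is what a per-entry contribution plot visualizes.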
Fig. 2. Robustness of clustering for leave-one-out datasets measured using Rand index. Each patient is left out once in the dimensionality reduction and clustering procedure and afterwards added to the cluster with the closest mean based on the learned projection for this data point, which is given by = A. The resulting cluster assignment is then compared with the clustering of the whole dataset. The error bars represent one standard deviation
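The assignment step of this leave-one-out procedure, adding the held-out patient to the cluster with the closest mean, might look like the sketch below. The projection itself is taken as given: `projected` and `left_out` are hypothetical, already-projected coordinates, and Euclidean distance is an assumption.

```python
import math

def cluster_means(projected, labels):
    """Mean vector of each cluster in the projected (low-dimensional) space."""
    sums, counts = {}, {}
    for point, lab in zip(projected, labels):
        acc = sums.setdefault(lab, [0.0] * len(point))
        for d, value in enumerate(point):
            acc[d] += value
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def assign_to_closest_mean(point, means):
    """Label of the cluster whose mean is nearest (Euclidean) to the point."""
    return min(means, key=lambda lab: math.dist(point, means[lab]))

# Toy input (assumption): four projected patients in two clusters.
projected = [(0.0, 0.0), (0.2, 0.1), (4.0, 4.1), (3.9, 4.0)]
labels = ["a", "a", "b", "b"]
left_out = (3.5, 3.8)  # hypothetical projection of the held-out patient

means = cluster_means(projected, labels)
assigned = assign_to_closest_mean(left_out, means)
print(assigned)
```

Repeating this for every patient yields one complete labeling per left-out patient, which can then be compared with the clustering of the full dataset.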
Fig. 3. Robustness of clustering for leave-one-out cross-validation applied to reduced-size datasets, measured using Rand index. For each cancer type, we sampled half of the patients 20 times and applied leave-one-out cross-validation as described in Section 3.4. The error bars represent one standard deviation
Fig. 4. Comparison of the robustness of the clustering generated with and without regularization, averaged over all cancer types, for datasets of different sizes. The percentage on the x-axis denotes how many patients were used to generate the smaller dataset on which leave-one-out cross-validation was performed. For each cancer type and each fraction of patients, we repeated the process 20 times. The error bars represent one standard deviation
Comparison of clusters identified by rMKL-LPP to gene expression and DNA methylation subtypes of GBM (Rand indices of 0.75 and 0.64, respectively)
| rMKL-LPP cluster | Gene expression: Classical | Gene expression: Mesenchymal | Gene expression: Neural | Gene expression: Proneural | DNA methylation: G-CIMP+ | DNA methylation: #2 | DNA methylation: #3 |
|---|---|---|---|---|---|---|---|
| #1 | 0 | 36 | 5 | 1 | 0 | 7 | 37 |
| #2 | 31 | 7 | 13 | 2 | 0 | 46 | 6 |
| #3 | 1 | 0 | 1 | 15 | 16 | 1 | 1 |
| #4 | 1 | 1 | 5 | 22 | 0 | 13 | 27 |
| #5 | 9 | 8 | 2 | 3 | 0 | 19 | 18 |
| #6 | 6 | 1 | 2 | 9 | 3 | 7 | 9 |
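The Rand indices quoted in the caption can be recomputed directly from the contingency counts in the table. The sketch below uses the standard pair-counting form of the Rand index; the function name `rand_index` is an illustrative choice, while the counts are copied from the table rows.

```python
from math import comb

def rand_index(table):
    """Rand index of two partitions given their contingency table.

    table[i][j] = number of patients in cluster i of the first partition
    and in group j of the second partition.
    """
    n = sum(sum(row) for row in table)
    total_pairs = comb(n, 2)
    # Pairs placed together by both partitions.
    same_both = sum(comb(c, 2) for row in table for c in row)
    # Pairs placed together by the first (rows) or second (columns) partition.
    same_rows = sum(comb(sum(row), 2) for row in table)
    same_cols = sum(comb(sum(col), 2) for col in zip(*table))
    # Agreements = pairs together in both + pairs separated in both.
    agreements = total_pairs + 2 * same_both - same_rows - same_cols
    return agreements / total_pairs

# Rows: rMKL-LPP clusters #1..#6.
# Columns: Classical, Mesenchymal, Neural, Proneural.
gene_expression = [
    [0, 36, 5, 1],
    [31, 7, 13, 2],
    [1, 0, 1, 15],
    [1, 1, 5, 22],
    [9, 8, 2, 3],
    [6, 1, 2, 9],
]
# Columns: G-CIMP+, #2, #3.
dna_methylation = [
    [0, 7, 37],
    [0, 46, 6],
    [16, 1, 1],
    [0, 13, 27],
    [0, 19, 18],
    [3, 7, 9],
]

ri_expression = rand_index(gene_expression)
ri_methylation = rand_index(dna_methylation)
print(round(ri_expression, 2), round(ri_methylation, 2))  # 0.75 0.64
```

The two blocks of the table cover different numbers of patients (181 with a gene expression subtype, 210 with a DNA methylation subtype), so each index is computed over its own set of patient pairs.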
Fig. 5. Survival analysis of GBM patients for treatment with Temozolomide in the different clusterings. The numbers in brackets denote the number of patients in the respective group; the specified P values are corrected for multiple testing using the Bonferroni method
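For reference, the Bonferroni method multiplies each raw P value by the number of tests performed and caps the result at 1. A minimal sketch, with purely hypothetical P values that are not taken from the study:

```python
def bonferroni(p_values):
    """Bonferroni-adjusted P values: min(1, p * number_of_tests)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical raw P values for three tests (illustration only).
raw = [0.01, 0.04, 0.5]
adjusted = bonferroni(raw)
print(adjusted)
```

The adjusted value for a test is significant at level α exactly when the raw value is below α divided by the number of tests.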
Top 15 enriched GO terms (FDR q value ≪ 0.001) from the category biological process of differentially expressed genes of Cluster 3
| GO enrichment of overexpressed genes | GO enrichment of underexpressed genes |
|---|---|
| Nucleic acid metabolic process | Immune system process |
| RNA biosynthetic process | Defense response |
| Transcription, DNA templated | Response to external stimulus |
| Nucleic acid templated transcription | Response to stress |
| RNA metabolic process | Extracellular matrix organization |
| Regulation of cellular macromolecule biosynthetic process | Extracellular structure organization |
| Cellular macromolecule biosynthetic process | Regulation of immune system process |
| Nucleobase-containing compound metabolic process | Positive regulation of immune system process |
| Nucleobase-containing compound biosynthetic process | Inflammatory response |
| Regulation of RNA metabolic process | Positive regulation of response to stimulus |
| Regulation of transcription, DNA-templated | Response to external biotic stimulus |
| Regulation of nucleic acid-templated transcription | Regulation of response to stimulus |
| Regulation of macromolecule biosynthetic process | Response to biotic stimulus |
| Macromolecule biosynthetic process | Cell activation |
| Regulation of RNA biosynthetic process | Leukocyte migration |