Li Zeng1, Zhaolong Yu2, Hongyu Zhao3,4.
Abstract
The analysis of cancer genomic data has long suffered from "the curse of dimensionality." Sample sizes for most cancer genomic studies are a few hundred at most, while tens of thousands of genomic features are studied. Various methods have been proposed to leverage prior biological knowledge, such as pathways, to analyze cancer genomic data more effectively. Most of these methods focus on testing the marginal significance of associations between pathways and clinical phenotypes. They can identify informative pathways but do not involve predictive modeling. In this article, we propose a Pathway-based Kernel Boosting (PKB) method that integrates gene pathway information for sample classification: kernel functions calculated from each pathway serve as base learners, and their weights are learned through iterative optimization of the classification loss function. We apply PKB and several competing methods to three cancer studies with pathological and clinical information, including tumor grade, stage, tumor site, and metastasis status. Our results show that PKB outperforms the other methods and identifies pathways relevant to the outcome variables.
Keywords: boosting; classification; gene set enrichment analysis; kernel method
Year: 2019 PMID: 31480483 PMCID: PMC6770716 DOI: 10.3390/genes10090670
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
An overview of the Pathway-based Kernel Boosting (PKB) algorithm.
| 1. Initialize target function as an optimal constant: |
| |
| For t from 0 to T-1 (maximum number of iterations) do: |
| 2. calculate the first and second derivatives: |
| |
| 3. optimize the regularized loss function in the base learner space: |
| |
| 4. find the step length with the steepest descent: |
| |
| 5. update the target function: |
| |
| End For |
| return |
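The loop structure listed above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes labels in {-1, +1}, a log loss, an RBF kernel per pathway, a ridge penalty on the kernel coefficients, a grid search for the step length, and a shrinkage factor `nu`. The names `rbf_kernel`, `log_loss`, `kernel_boost`, `lam`, `nu`, and the `weights` tally are hypothetical and are not taken from the source.

```python
import numpy as np


def rbf_kernel(Z):
    """RBF kernel matrix computed from the genes (columns) of one pathway."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    return np.exp(-d2 / Z.shape[1])  # bandwidth = pathway size (assumption)


def log_loss(y, F):
    """Mean log loss log(1 + exp(-y * F)) for labels y in {-1, +1}."""
    return np.mean(np.logaddexp(0.0, -y * F))


def kernel_boost(X, y, pathways, T=30, lam=1.0, nu=0.1):
    """In-sample sketch: returns fitted scores F and a per-pathway tally."""
    n = len(y)
    kernels = [rbf_kernel(X[:, idx]) for idx in pathways]
    # Step 1: the optimal constant under log loss is the log odds of class +1.
    p_pos = np.clip(np.mean(y == 1), 1e-6, 1 - 1e-6)
    F = np.full(n, np.log(p_pos / (1.0 - p_pos)))
    weights = np.zeros(len(pathways))  # hypothetical tally of accepted steps
    grid = np.linspace(0.0, 2.0, 41)   # candidate step lengths, includes 0
    for _ in range(T):
        # Step 2: first and second derivatives of the loss with respect to F.
        p = 1.0 / (1.0 + np.exp(y * F))
        g, h = -y * p, p * (1.0 - p)
        # Step 3: per pathway, minimize the ridge-penalized second-order
        # approximation over f = K @ beta, and keep the best pathway.
        best_val, best_m, best_f = np.inf, None, None
        for m, K in enumerate(kernels):
            A = K @ (h[:, None] * K) + 2.0 * lam * np.eye(n)
            beta = -np.linalg.solve(A, K @ g)
            f = K @ beta
            val = g @ f + 0.5 * h @ f ** 2 + lam * beta @ beta
            if val < best_val:
                best_val, best_m, best_f = val, m, f
        # Step 4: choose the step length by a grid search on the true loss.
        d = grid[np.argmin([log_loss(y, F + s * best_f) for s in grid])]
        # Step 5: shrunken update of the target function.
        F = F + nu * d * best_f
        weights[best_m] += nu * d
    return F, weights


# Toy data (assumption): 3 pathways of 10 genes; only the first is relevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 30))
pathways = [np.arange(0, 10), np.arange(10, 20), np.arange(20, 30)]
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
F, weights = kernel_boost(X, y, pathways)
train_error = np.mean(np.sign(F) != y)
```

The sketch only produces in-sample scores; predicting on new samples would additionally require storing each iteration's coefficients and evaluating the kernels between new and training samples. The `weights` tally is one simple way to summarize how much each pathway was used and may differ from the pathway weights reported for PKB.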
Figure 1. Estimated pathway weights by PKB in simulation studies. The X-axis represents pathways and the Y-axis represents estimated weights. Based on the simulation settings, the first three pathways are relevant in Models 1 and 2 and the first ten pathways are relevant in Model 3. M represents the number of simulated pathways.
Classification error rate from PKB and competing methods in simulation studies. The numbers below each model represent the number of pathways simulated in the data sets.
| Method | Model 1 | | Model 2 | | Model 3 | |
|---|---|---|---|---|---|---|
| | 50 | 150 | 50 | 150 | 50 | 150 |
| PKB- | | 0.196 | | 0.189 | 0.179 | 0.21 |
| PKB- | 0.158 | | 0.201 | | | |
| Random Forest | 0.305 | 0.331 | 0.290 | 0.328 | 0.341 | 0.400 |
| SVM | 0.353 | 0.431 | 0.412 | 0.476 | 0.431 | 0.492 |
| NPR | 0.271 | 0.321 | 0.299 | 0.317 | 0.479 | 0.440 |
| EasyMKL | 0.253 | 0.284 | 0.268 | 0.330 | 0.212 | 0.300 |
Classification error rates on real data. The name in parentheses after each data set is the variable used as the classification outcome. The best error rate in each column is highlighted in bold.
| Method | Metabric (Grade) | Glioma (Grade) | Glioma (Site) | Melanoma (Stage) | Melanoma (Met) |
|---|---|---|---|---|---|
| PKB- | | | 0.168 | 0.304 | |
| PKB- | 0.304 | | | 0.307 | 0.083 |
| Random Forest | 0.306 | 0.302 | 0.306 | 0.320 | 0.136 |
| SVM | 0.285 | 0.292 | 0.185 | 0.314 | 0.083 |
| NPR | 0.306 | 0.298 | 0.197 | | 0.110 |
| EasyMKL | 0.297 | 0.302 | 0.291 | 0.314 | 0.100 |
Top fifteen pathways with the largest weights fitted by PKB. In each column, pathways are sorted by weight in descending order from top to bottom. Pathways in the first two columns come from GO Biological Process, and those in the third column come from Biocarta.
| Rank | Metabric (Grade) | Glioma (Grade) | Melanoma (Met) |
|---|---|---|---|
| 1 | Cell aggregation | Homophilic cell adhesion via plasma membrane adhesion molecules | Lectin induced complement pathway |
| 2 | Sequestering of metal ion | Neuropeptide signaling pathway | Classical complement pathway |
| 3 | Glutathione derivative metabolic process | Multicellular organismal macromolecule metabolic process | Phospholipase C delta in phospholipid associated cell signaling |
| 4 | Antigen processing and presentation of exogenous peptide antigen via MHC class I | Peripheral nervous system neuron differentiation | Fc epsilon receptor I signaling in mast cells |
| 5 | Sterol biosynthetic process | Positive regulation of hair cycle | Inhibition of matrix metalloproteinases |
| 6 | Pyrimidine containing compound salvage | Peptide hormone processing | Regulation of MAP kinase pathways through dual specificity phosphatases |
| 7 | Protein dephosphorylation | Hyaluronan metabolic process | Estrogen responsive protein Efp controls cell cycle and breast tumors growth |
| 8 | Homophilic cell adhesion via plasma membrane adhesion molecules | Positive regulation of synapse maturation | Chaperones modulate interferon signaling pathway |
| 9 | Cyclooxygenase pathway | Stabilization of membrane potential | IL-10 anti-inflammatory signaling pathway |
| 10 | Establishment of protein localization to endoplasmic reticulum | Lymphocyte chemotaxis | Reversal of insulin resistance by leptin |
| 11 | Negative regulation of dephosphorylation | Insulin secretion | Bone remodeling |
| 12 | Xenophagy | Positive regulation of osteoblast proliferation | Cycling of Ran in nucleocytoplasmic transport |
| 13 | Attachment of spindle microtubules to kinetochore | Negative regulation of dephosphorylation | Alternative complement pathway |
| 14 | Fatty acyl CoA metabolic process | Trophoblast giant cell differentiation | Cell cycle: g |
| 15 | Apical junction assembly | Synaptonemal complex organization | Hop pathway in cardiac development |