Lin Zhang, Jeffrey S Morris, Jiexin Zhang, Robert Z Orlowski, Veerabhadran Baladandayuthapani.
Abstract
It is well established that the development of a disease, especially cancer, is a complex process resulting from the joint effects of multiple genes involved in various molecular signaling pathways. In this article, we propose methods to discover genes and molecular pathways significantly associated with clinical outcomes in cancer samples. We exploit the natural hierarchical structure of the data, in which the genes related to a given pathway form a group of interacting genes, to select both pathways and genes. We pose the problem in a hierarchical structured variable selection (HSVS) framework to analyze the corresponding gene expression data. HSVS methods conduct simultaneous variable selection at the pathway (group) level and the gene (within-group) level. To adapt to the overlapping group structure present in the pathway-gene hierarchy of the data, we developed an overlap-HSVS method that introduces latent partial effect variables, which partition the marginal effect of the covariates, and corresponding weights for a proportional shrinkage of the partial effects. Combining gene expression data with prior pathway information from the KEGG databases, we identified several gene-pathway combinations that are significantly associated with clinical outcomes of multiple myeloma. Biological discoveries support this relationship for the pathways and the corresponding genes we identified.
Keywords: Bayesian variable selection; hierarchical variable selection; multiple myeloma; overlapping group
Year: 2014 PMID: 25520554 PMCID: PMC4260770 DOI: 10.4137/CIN.S13787
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Figure 1. Schematic plot showing the overlapping group structures present in the gene expression data. Each gene in the left column can belong to one or multiple pathways, the activities of which are associated with the clinical outcome.
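The overlapping group structure, and the idea of splitting a gene's marginal effect into one latent partial effect per pathway it belongs to, can be illustrated with a small sketch. This is only a bookkeeping illustration under stated assumptions: the pathway names, gene labels, and effect values are hypothetical, and the priors and shrinkage weights of the overlap-HSVS method are not modeled here.

```python
from collections import defaultdict

# Hypothetical pathway -> gene membership (toy labels, not taken from KEGG).
pathways = {
    "pathway_1": ["gene_a", "gene_b"],
    "pathway_2": ["gene_b", "gene_c"],
}

# Invert the map so each gene lists every pathway it belongs to.
gene_to_pathways = defaultdict(list)
for pathway, genes in pathways.items():
    for gene in genes:
        gene_to_pathways[gene].append(pathway)

# One latent partial effect per (pathway, gene) pair; the values are made up.
partial_effects = {
    ("pathway_1", "gene_a"): 0.7,
    ("pathway_1", "gene_b"): 0.5,
    ("pathway_2", "gene_b"): -0.2,
    ("pathway_2", "gene_c"): 0.0,
}


def marginal_effect(gene):
    """Sum the partial effects of a gene over all pathways that contain it."""
    return sum(partial_effects[(p, gene)] for p in gene_to_pathways[gene])


# gene_b sits in both pathways, so its marginal effect combines two partial effects.
overlapping_genes = [g for g, ps in gene_to_pathways.items() if len(ps) > 1]
```

Here `gene_b` is the only overlapping gene, and its marginal effect is the sum of its two partial effects, which mirrors the partition described in the abstract.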
Figure 2. Posterior probability of including each pathway in the model for the MM data.
Significant KEGG pathways selected for the MM data.
| No. | KEGG ID | KEGG PATHWAY | # OF GENES SELECTED | TOTAL # OF GENES |
|---|---|---|---|---|
| 2 | hsa04010 | MAPK signaling pathway | 0 | 265 |
| 4 | hsa04115 | p53 signaling pathway | 0 | 68 |
| 5 | hsa04210 | Apoptosis | 0 | 87 |
| 7 | hsa03018 | RNA degradation | 0 | 57 |
| 8 | hsa03030 | DNA replication | 0 | 36 |
| 9 | hsa03040 | Spliceosome | 0 | 116 |
| 10 | hsa03420 | Nucleotide excision repair | 0 | 44 |
| 11 | hsa04512 | ECM-receptor interaction | 0 | 84 |
| 12 | hsa04620 | Toll-like receptor signaling | 0 | 102 |
| 13 | hsa04621 | NOD-like receptor signaling | 0 | 62 |
| 14 | hsa04622 | RIG-I-like receptor signaling | 0 | 71 |
| 15 | hsa05120 | Epithelial cell signaling in | 0 | 68 |
| 16 | hsa00310 | Lysine degradation | 0 | 44 |
| 17 | hsa00330 | Arginine and proline metabolism | 0 | 54 |
| 18 | hsa03010 | Ribosome | 0 | 87 |
| 19 | hsa04910 | Insulin signaling pathway | 0 | 135 |
| 20 | hsa05211 | Renal cell carcinoma | 0 | 70 |
| 21 | hsa04540 | Gap junction | 0 | 87 |
Figure 3. The 95% posterior credible intervals of the coefficients for the gene variables in the selected pathways: (A) galactose metabolism pathway; (B) cell cycle pathway; (C) Wnt signaling pathway.
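A 95% posterior credible interval of the kind plotted in Figure 3 is typically computed from posterior draws of a coefficient. The sketch below assumes an equal-tailed interval (the 2.5th and 97.5th percentiles); whether the figure uses equal-tailed or highest-posterior-density intervals is not stated here. The draws are synthetic stand-ins for MCMC output, not results from the MM data.

```python
import random
import statistics


def credible_interval_95(draws):
    """Equal-tailed 95% interval: the 2.5th and 97.5th percentiles of the draws."""
    # n=40 yields 39 cut points at multiples of 2.5%; the first and last
    # cut points are the 2.5% and 97.5% quantiles.
    cuts = statistics.quantiles(draws, n=40)
    return cuts[0], cuts[-1]


rng = random.Random(0)
# Synthetic posterior draws for one hypothetical gene coefficient.
draws = [rng.gauss(0.8, 0.2) for _ in range(4000)]

lower, upper = credible_interval_95(draws)
# An interval that excludes zero suggests a nonzero effect for this coefficient.
excludes_zero = lower > 0 or upper < 0
```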
Significant genes selected for the MM data.
| No. | GENE SYMBOL | GENE NAME |
|---|---|---|
| 1 | ANAPC7 | anaphase promoting complex subunit 7 |
| 2 | CAMK2G | calcium/calmodulin-dependent protein kinase II gamma |
| 3 | CCNA1 | cyclin A1 |
| 4 | CCND2 | cyclin D2 |
| 5 | CCND3 | cyclin D3 |
| 6 | CCNE2 | cyclin E2 |
| 7 | CDC14B | cell division cycle 14B |
| 8 | CDKN1C | cyclin-dependent kinase inhibitor 1C (p57, Kip2) |
| 9 | CSNK2B | casein kinase 2, beta polypeptide |
| 10 | CTNNB1 | catenin (cadherin-associated protein), beta 1, 88kDa |
| 11 | CUL1 | cullin 1 |
| 12 | DBF4 | DBF4 homolog |
| 13 | FZD7 | frizzled family receptor 7 |
| 14 | FZD8 | frizzled family receptor 8 |
| 15 | MAPK9 | mitogen-activated protein kinase 9 |
| 16 | MCM7 | minichromosome maintenance complex component 7 |
| 17 | NKD2 | naked cuticle homolog 2 (Drosophila) |
| 18 | PCNA | proliferating cell nuclear antigen |
| 19 | PFKM | phosphofructokinase, muscle |
| 20 | PPP2R1B | protein phosphatase 2, regulatory subunit A, beta |
| 21 | PRKCA | protein kinase C, alpha |
| 22 | RAC1 | ras-related C3 botulinum toxin substrate 1 (rho family, small GTP binding protein Rac1) |
| 23 | ROCK1 | Rho-associated, coiled-coil containing protein kinase 1 |
| 24 | SFN | stratifin |
| 25 | SFRP4 | secreted frizzled-related protein 4 |
| 26 | WIF1 | WNT inhibitory factor 1 |
| 27 | WNT11 | wingless-type MMTV integration site family, member 11 |
| 28 | WNT5B | wingless-type MMTV integration site family, member 5B |
Figure 4. The 30 genes with the highest posterior probabilities of regression coefficients being greater than ϕ = 0.5 in absolute value. The line patterns correspond to the pathways to which the genes belong: dotted lines for genes in the galactose metabolism pathway; solid lines for genes in the cell cycle pathway; dashed lines for genes in the Wnt signaling pathway; dot-dash lines for genes in both the cell cycle and Wnt signaling pathways.
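The ranking criterion in Figure 4, the posterior probability that a coefficient exceeds ϕ = 0.5 in absolute value, can be estimated as the fraction of posterior draws satisfying |β| > ϕ. The following is a minimal sketch of that calculation; the gene labels and draws are hypothetical, synthetic stand-ins for MCMC output and do not reproduce the reported results.

```python
import random

PHI = 0.5  # threshold from the Figure 4 caption


def prob_exceeds(draws, phi=PHI):
    """Monte Carlo estimate of P(|beta| > phi | data) from posterior draws."""
    return sum(abs(b) > phi for b in draws) / len(draws)


rng = random.Random(0)
# Synthetic posterior draws for three hypothetical genes.
posterior_draws = {
    "gene_a": [rng.gauss(1.0, 0.1) for _ in range(2000)],  # clearly above phi
    "gene_b": [rng.gauss(0.5, 0.2) for _ in range(2000)],  # centered on phi
    "gene_c": [rng.gauss(0.0, 0.1) for _ in range(2000)],  # clearly below phi
}

probs = {gene: prob_exceeds(d) for gene, d in posterior_draws.items()}

# Rank genes from highest to lowest posterior probability; a "top 30" list
# would be the first 30 entries of such a ranking.
ranking = sorted(probs, key=probs.get, reverse=True)
```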
Figure 5. Three regulatory networks that involve the 28 flagged genes as identified by IPA. The nodes with filled color correspond to the flagged significant genes.