Andrew E Dellinger, Andrew B Nixon, Herbert Pang.
Abstract
Recent method development has included algorithms for multi-dimensional genomic data because such methods predict clinical phenotypes related to disease more accurately. This study is the first to conduct an integrative genomic pathway-based analysis with a graph-based learning algorithm. The methodology, graph-based semi-supervised learning, detects pathways that improve prediction of a dichotomous variable, which in this study is cancer stage. The analysis integrates genome-level gene expression, methylation, and single nucleotide polymorphism (SNP) data in ovarian serous cystadenocarcinoma (OV) and colon adenocarcinoma (COAD). The top 10 ranked predictive pathways in COAD and OV were biologically relevant to their respective cancer stages and significantly improved prediction accuracy and area under the ROC curve (AUC) compared with single data-type analyses. This method is an effective way to simultaneously predict binary clinical phenotypes and discover their biological mechanisms.
Keywords: clinical outcome prediction; colon adenocarcinoma; integrative analysis; multi-dimensional genomic data; serous cystadenocarcinoma
Year: 2014 PMID: 25125969 PMCID: PMC4125381 DOI: 10.4137/CIN.S13634
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
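The graph-based semi-supervised learning named in the abstract can be pictured with a small sketch. The code below is not the study's implementation: it uses a generic label-propagation scheme (in the style of local-and-global-consistency propagation) as a stand-in, and the function names `combine_graphs` and `propagate_labels`, the `damping` parameter, and the toy matrices are all assumptions made for illustration. It assumes each data type has already been turned into a symmetric, non-negative patient-by-patient similarity matrix.

```python
from math import sqrt


def combine_graphs(graphs, alphas):
    """Weighted sum of per-data-type similarity matrices (lists of lists)."""
    n = len(graphs[0])
    return [
        [sum(a * g[i][j] for a, g in zip(alphas, graphs)) for j in range(n)]
        for i in range(n)
    ]


def propagate_labels(weights, labels, damping=0.9, n_iter=200):
    """Spread known labels over a patient graph.

    weights: symmetric, non-negative n x n similarity matrix, zero diagonal.
    labels:  +1 / -1 for patients with a known class, 0 for unlabeled ones.
    Returns one score per patient; the sign is the predicted class.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    # Symmetric normalization: S = D^(-1/2) W D^(-1/2)
    norm = [
        [
            weights[i][j] / sqrt(degree[i] * degree[j])
            if degree[i] > 0 and degree[j] > 0
            else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    scores = [float(y) for y in labels]
    for _ in range(n_iter):
        # Each patient mixes its neighbours' scores with its own known label.
        scores = [
            damping * sum(norm[i][j] * scores[j] for j in range(n))
            + (1 - damping) * labels[i]
            for i in range(n)
        ]
    return scores


# Toy data (made up): patients 0-1 and 2-3 form two tight groups.
expression_graph = [
    [0.0, 1.0, 0.1, 0.1],
    [1.0, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 1.0],
    [0.1, 0.1, 1.0, 0.0],
]
methylation_graph = [
    [0.0, 0.8, 0.2, 0.0],
    [0.8, 0.0, 0.0, 0.2],
    [0.2, 0.0, 0.0, 0.8],
    [0.0, 0.2, 0.8, 0.0],
]
combined = combine_graphs([expression_graph, methylation_graph], [0.5, 0.5])
# Patient 0 is labeled advanced (+1), patient 2 early (-1); 1 and 3 are unknown.
scores = propagate_labels(combined, [1, 0, -1, 0])
predicted = [1 if s > 0 else -1 for s in scores]
```

In this toy setting the unlabeled patients inherit the label of the group they are most strongly connected to, which is the intuition behind using an integrated patient graph for stage prediction.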
Top COAD accuracy improvements by combining three data types.
| PATHWAY | MEAN GENE | MEAN METHYLATION | MEAN SNP | MEAN 3 TYPES (97.5% LCI) |
|---|---|---|---|---|
| Cell-to-Cell | 54.0 | 53.9 | 53.0 | 63.3 (61.5) |
| Biopeptides | 52.8 | 53.0 | 51.8 | 59.5 (57.5) |
| Thrombopoietin | 54.1 | 51.8 | 54.6 | 58.6 (57.1) |
| Cholesterol biosynthesis | 57.7 | 58.0 | 57.1 | 62.3 (57.5) |
| Statin | 59.1 | 56.7 | 60.5 | 64.7 (60.8) |
| Nucleotide metabolism | 55.1 | 57.9 | 57.5 | 61.5 (58.5) |
| Fc epsilon receptor I-mediated signaling | 57.0 | 55.2 | 50.2 | 61.1 (57.1) |
| Growth hormone signaling | 58.3 | 53.0 | 53.1 | 62.0 (58.5) |
| Toll-like receptor signaling | 58.0 | 57.3 | 49.0 | 61.4 (58.2) |
| P38 MAPK signaling | 54.1 | 57.0 | 43.5 | 60.2 (57.2) |
Notes: The top 10 ranked pathways using the accuracy measure in the COAD dataset. “Mean” denotes the mean accuracy of the pathway’s classification of early versus advanced stage over 50 iterations. LCI is calculated as defined in the Materials and Methods section.
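The tables report a mean over 50 iterations together with a 97.5% lower confidence limit (LCI). The study's own definition of the LCI is in its Materials and Methods section and is not reproduced here, so the sketch below only shows one common construction, a one-sided normal-approximation limit on the mean. The function name `lower_confidence_limit` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def lower_confidence_limit(values, level=0.975):
    """One-sided lower confidence limit for the mean of `values`.

    Assumption: normal approximation, mean - z * s / sqrt(n).
    The paper may define its LCI differently.
    """
    z = NormalDist().inv_cdf(level)
    return mean(values) - z * stdev(values) / sqrt(len(values))


# Made-up per-iteration accuracies (%) for a single pathway.
accuracies = [60.0, 62.0, 64.0, 66.0]
summary = (mean(accuracies), lower_confidence_limit(accuracies))
```

Read this way, a pathway counts as an improvement when the lower limit of the three-data-type result still exceeds the single data-type means.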
Top COAD AUC improvements by combining three data types.
| PATHWAY | MEAN GENE | MEAN METHYLATION | MEAN SNP | MEAN 3 TYPES (97.5% LCI) |
|---|---|---|---|---|
| Differentiation in PC12 cells | 0.52 | 0.53 | 0.55 | 0.66 (0.64) |
| Leukocyte transendothelial migration | 0.60 | 0.60 | 0.53 | 0.68 (0.65) |
| Cell adhesion molecules | 0.60 | 0.61 | 0.59 | 0.68 (0.66) |
| BCR | 0.53 | 0.48 | 0.53 | 0.61 (0.59) |
| Apoptosis | 0.62 | 0.56 | 0.53 | 0.69 (0.67) |
| Biopeptides | 0.57 | 0.51 | 0.57 | 0.64 (0.62) |
| Integrin mediated cell adhesion | 0.55 | 0.55 | 0.50 | 0.62 (0.59) |
| Angiotensin II mediated activation of JNK | 0.55 | 0.51 | 0.57 | 0.63 (0.61) |
| Death | 0.68 | 0.60 | 0.63 | 0.73 (0.71) |
| Fc epsilon receptor I-mediated signaling | 0.55 | 0.49 | 0.57 | 0.62 (0.60) |
Notes: The top 10 ranked pathways using the AUC measure in the COAD dataset. “Mean” denotes the mean AUC of the pathway’s classification of early versus advanced stage over 50 iterations. LCI is calculated as defined in the Materials and Methods section.
Top OV accuracy improvements by combining three data types.
| PATHWAY | MEAN GENE | MEAN METHYLATION | MEAN SNP | MEAN 3 TYPES (97.5% LCI) |
|---|---|---|---|---|
| Caspase | 53.2 | 53.5 | 54.7 | 60.4 (58.8) |
| Alzheimer’s disease | 56.3 | 52.9 | 56.9 | 62.0 (60.4) |
| Glycolysis and gluconeogenesis | 61.4 | 54.9 | 55.0 | 65.6 (64.0) |
| ACE2 | 57.8 | 54.9 | 52.5 | 61.8 (60.4) |
| Maturity onset diabetes | 56.6 | 54.1 | 58.0 | 62.1 (60.5) |
| Anaplastic lymphoma kinase | 55.3 | 58.1 | 55.5 | 62.1 (60.5) |
| G alpha 12 | 54.6 | 46.8 | 51.5 | 58.2 (56.8) |
| T-cell receptor | 52.6 | 54.2 | 54.0 | 58.5 (56.3) |
| Glycosphingolipid biosynthesis | 58.5 | 51.5 | 58.4 | 62.2 (60.6) |
| G Alpha S | 59.7 | 50.4 | 58.1 | 63.4 (61.7) |
Notes: The top 10 ranked pathways using the accuracy measure in the OV dataset. “Mean” denotes the mean accuracy of the pathway’s classification of early versus advanced stage over 50 iterations. LCI is calculated as defined in the Materials and Methods section.
Top OV AUC improvements by combining three data types.
| PATHWAY | MEAN GENE | MEAN METHYLATION | MEAN SNP | MEAN 3 TYPES (97.5% LCI) |
|---|---|---|---|---|
| Maturity onset diabetes of the young | 0.57 | 0.57 | 0.59 | 0.69 (0.67) |
| Stem Cell | 0.63 | 0.56 | 0.56 | 0.72 (0.70) |
| Cytokine-cytokine receptor interaction | 0.65 | 0.60 | 0.57 | 0.73 (0.71) |
| Caspase | 0.56 | 0.53 | 0.55 | 0.63 (0.62) |
| Alanine and aspartate metabolism | 0.64 | 0.58 | 0.63 | 0.72 (0.70) |
| Neurodegenerative diseases | 0.60 | 0.68 | 0.62 | 0.75 (0.73) |
| Histidine metabolism | 0.61 | 0.60 | 0.50 | 0.69 (0.67) |
| Leukocyte transendothelial migration | 0.64 | 0.63 | 0.63 | 0.71 (0.69) |
| Colorectal cancer | 0.62 | 0.58 | 0.61 | 0.68 (0.67) |
| Calcium signaling | 0.64 | 0.62 | 0.57 | 0.70 (0.68) |
Notes: The top 10 ranked pathways using the AUC measure in the OV dataset. “Mean” denotes the mean AUC of the pathway’s classification of early versus advanced stage over 50 iterations. LCI is calculated as defined in the Materials and Methods section.
Figure 1. Caspase pathway validation network in OV. This figure represents the network of patients discovered in testing the caspase pathway in ovarian cancer. Nodes represent patients. The top 200 weighted edges are shown. Weights were determined using α and Pearson correlation coefficients of the integrated data types. Light gray nodes are incorrect integrative method predictions. Medium gray nodes are correct predictions by all data types. Dark gray nodes are correct integrative method predictions and at least one incorrect single data-type prediction.
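The edge weighting described for Figure 1 can be sketched as follows. This is an illustration under assumptions, not the authors' code: it assumes α is a vector with one weight per data type, that each data type is stored as a patients-by-features table, and that an edge weight is the α-weighted sum of the per-data-type Pearson correlations between two patients. The helper names `pearson` and `top_weighted_edges` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: correlation is undefined, treat as 0
    return cov / sqrt(var_x * var_y)


def top_weighted_edges(data_types, alphas, k=200):
    """Return the k heaviest patient-patient edges as (i, j, weight).

    data_types: list of matrices, each a list of per-patient feature lists
                for the genes of one pathway.
    alphas:     one weight per data type (assumed form of alpha).
    """
    n_patients = len(data_types[0])
    edges = []
    for i, j in combinations(range(n_patients), 2):
        weight = sum(
            a * pearson(matrix[i], matrix[j])
            for a, matrix in zip(alphas, data_types)
        )
        edges.append((weight, i, j))
    edges.sort(reverse=True)  # heaviest edges first
    return [(i, j, w) for w, i, j in edges[:k]]


# Made-up example: 3 patients, 2 data types, 3 features each.
expression = [[1, 2, 3], [2, 4, 6], [3, 2, 1]]
methylation = [[0.1, 0.2, 0.4], [0.2, 0.4, 0.8], [0.4, 0.2, 0.1]]
strongest = top_weighted_edges([expression, methylation], [0.5, 0.5], k=1)
```

With `k=200` this would keep the same number of edges as the figure; the node shading in the figure comes from comparing predictions and is not part of this sketch.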
Figure 2. ROC curves for OV pathway Maturity Onset Diabetes of the Young.
Notes: ROC curves for single data type analyses SNP (dashed), gene expression (dotted), and methylation level (dot-dash), and for the three data type analysis (solid).
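For readers who want to see how curves like those in Figure 2 and the AUC values in the tables are obtained from classifier scores, here is a minimal sketch. The function names `roc_points` and `auc`, and the scores and labels, are illustrative assumptions; the study's own evaluation code is not shown in this record.

```python
def roc_points(scores, labels):
    """ROC curve as (false positive rate, true positive rate) points.

    scores: higher means "more likely advanced stage".
    labels: 1 for the positive class, 0 for the negative class.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up scores for four patients (two advanced, two early).
curve = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
area = auc(curve)
```

Computing one such curve per data type and one for the integrated analysis gives the four curves that the figure overlays.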