Hieu T Nim, Milena B Furtado, Mirana Ramialison, Sarah E Boyd.
Abstract
BACKGROUND: Quantitative high-throughput data deposited in consortia such as the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) present opportunities and challenges for computational analyses.
Keywords: The Cancer Genome Atlas; data mining; prognosis; prostate cancer; retinoic acid; systems biology
Year: 2017 PMID: 28361034 PMCID: PMC5350134 DOI: 10.3389/fonc.2017.00030
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 6.244
Data used in this study.
| Database | Dataset | PMID | Platform |
|---|---|---|---|
| BIOGRID 3.4 | BIOGRID-ALL-3.4.138 | 25428363 | Two-hybrid, affinity capture MS, and genetics |
| STRING 10 | protein.link.detailed.v10 | 25352553 | Protein–protein interaction network and text mining |
| TCGA | PRAD | 26544944 | RNA-Seq, DNA copy number, and clinical profile |
| EGA/ICGC | EGAS00001000682 | 25066126 | DNA methylation |
| Ingenuity® Pathway Analysis | Ingenuity Knowledge Base | 24336805 | Causal network and interaction network |
| NCBI GEO | GSE35988 | 22722839 | Gene expression |
| DAVID 6.7 | DAVID Knowledgebase | 19131956 | Gene ontology annotation |
Figure 1. Pipeline of the combinatorial ranking procedures, developed to systematically explore and evaluate gene sets based on clinical relevance. A core gene set (Gcore) is derived in a two-phase procedure: (1) network expansion using Ingenuity® Pathway Analysis and (2) network contraction by verifying the individual network links in the BIOGRID 3.4 and STRING 10 databases. Power set generation populates all combinatorial gene sets based on Gcore. Finally, disease-free survival analysis ranks all candidate gene sets by prognostic value.
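The power set generation step can be sketched as follows. This is a minimal illustration, not the authors' code: the gene labels are placeholders rather than the actual members of Gcore, and excluding the empty set is an assumption.

```python
from itertools import chain, combinations


def candidate_gene_sets(core_genes):
    """Return every non-empty subset of core_genes as a tuple."""
    genes = list(core_genes)
    return list(chain.from_iterable(
        combinations(genes, size) for size in range(1, len(genes) + 1)
    ))


# Placeholder labels (hypothetical), standing in for an 11-gene core set.
g_core = [f"gene_{i}" for i in range(1, 12)]
gene_sets = candidate_gene_sets(g_core)

# An n-gene core set yields 2**n - 1 non-empty candidate sets.
print(len(gene_sets))
```

Because the number of candidates doubles with every added gene, exhaustive ranking of this kind is only practical when the contraction phase leaves a small core set.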
Figure 2. Gene/protein interaction network of ALDH1A2. (A) Network expansion phase: the Ingenuity Pathway Analysis tool gives 279 interaction partners of ALDH1A2. (B) Network contraction phase: the BIOGRID 3.4 and STRING 10 databases reduce the 279-node ALDH1A2 network to 11 nodes (genes/proteins), each link supported by a minimum of two lines of evidence (indicated with colored lines).
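The contraction rule in panel B (keep a link only if it has at least two lines of evidence) can be sketched as a simple filter. The partner names and evidence labels below are hypothetical toy data, and counting each distinct label as one line of evidence is an assumption about how the rule was applied.

```python
def contract_network(edge_evidence, min_lines=2):
    """Keep edges supported by at least min_lines distinct evidence labels."""
    kept = {
        edge: sources
        for edge, sources in edge_evidence.items()
        if len(set(sources)) >= min_lines
    }
    # Nodes that survive are the endpoints of the surviving edges.
    nodes = {gene for edge in kept for gene in edge}
    return kept, nodes


# Toy edges (hypothetical partners and evidence labels, not data from the study).
edge_evidence = {
    ("ALDH1A2", "partner_A"): ["BIOGRID:two-hybrid", "STRING:text-mining"],
    ("ALDH1A2", "partner_B"): ["STRING:text-mining"],
    ("ALDH1A2", "partner_C"): [
        "BIOGRID:affinity-capture-MS",
        "BIOGRID:genetics",
        "STRING:text-mining",
    ],
}

kept_edges, kept_nodes = contract_network(edge_evidence)
print(sorted(kept_nodes))
```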
Differential gene expression analysis between cancer and normal samples based on the Grasso et al. dataset.
| Gene symbol | Probe ID | Adjusted p-value | Log fold-change |
|---|---|---|---|
| | A_24_P73577 | 2.24E−15 | −3.811791 |
| | A_23_P138655 | 3.35E−03 | 2.7341369 |
| CYP26B1 | A_23_P210100 | 6.86E−02 | −1.3187345 |
| | A_32_P25050 | 1.57E−01 | −0.7989623 |
| ADH5 | A_24_P260346 | 2.68E−13 | −2.1221184 |
| DHRS3 | A_23_P33759 | 1.16E−01 | −0.4794662 |
| ADH4 | A_23_P30098 | 3.91E−01 | 0.592659 |
| ADH1B | A_24_P940469 | 4.15E−03 | 2.0103106 |
| ADH1A | A_24_P291658 | 1.58E−02 | 1.669169 |
Differential methylation analysis between tumor and normal samples based on the Brocks et al. dataset.
| Gene symbol | ENSEMBL ID | Adjusted p-value | Log fold-change |
|---|---|---|---|
| | ENSG00000128918 | 0.800930711 | −0.033209504 |
| CYP26C1 | ENSG00000187553 | 0.800930711 | −0.074378124 |
| | ENSG00000095596 | 0.800930711 | −0.035321392 |
| CYP26B1 | ENSG00000003137 | 0.800930711 | −0.048280355 |
| | ENSG00000121039 | 0.800930711 | 0.015297498 |
| ADH5 | ENSG00000197894 | 0.800930711 | 0.02746407 |
| DHRS3 | ENSG00000162496 | 0.800930711 | 0.06188264 |
| ADH7 | ENSG00000196344 | 0.887042456 | 0.004002081 |
| ADH1B | ENSG00000196616 | 0.800930711 | 0.085607147 |
| ADH1A | ENSG00000187758 | 0.800930711 | 0.116789434 |
Heteroscedastic unpaired t-test.
| Clinical parameters | No relapse | Relapse | p-value |
|---|---|---|---|
| Age | 60.877 (6.999) | 61.554 (5.944) | 0.343 |
| Number of lymph nodes | 11.538 (9.129) | 13.095 (11.892) | 0.265 |
| Most recent PSA results | 0.822 (3.605) | 1.865 (5.301) | 0.085 |
Means and SDs are shown.
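For reference, a heteroscedastic unpaired t-test is commonly known as Welch's t-test. Assuming that is the test used here, the statistic and its approximate degrees of freedom are computed from the group means $\bar{x}_i$, SDs $s_i$, and group sizes $n_i$ (the group sizes are not recoverable from this record, so the p-values above cannot be recomputed from the table alone):

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}
$$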
Figure 3. Systematic analysis of all candidate gene sets, generated from the power set of 11 genes. (A) The disease-free survival (DFS) Kaplan–Meier (KM) log-rank p-value landscape from the ALDH1A2-derived candidate gene sets. The optimal gene set (OGS) according to KM log-rank p-values is indicated (red dashed box). (B) KM survival curves, compared by log-rank test, of n = 491 patients in the TCGA cohort with respect to the presence or absence of aberrant expression (based on z-statistics) of genes in the OGS.
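The ranking step behind this figure can be sketched in two parts: splitting patients by aberrant expression of a gene set, then comparing the two groups with a log-rank test. The sketch below is an illustration under stated assumptions, not the authors' implementation: the |z| > 2 cutoff, the "any gene in the set" rule, the function names, and the toy data are all hypothetical.

```python
import math


def is_altered(z_scores, gene_set, threshold=2.0):
    """Flag a patient when any gene in gene_set has |z| above threshold (assumed rule)."""
    return any(abs(z_scores[gene]) > threshold for gene in gene_set)


def logrank_p(times, events, groups):
    """Two-group log-rank p-value; events: 1 = relapse, 0 = censored; groups: 0 or 1."""
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("groups never overlap in the risk set")
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


# Toy cohort (hypothetical): eight patients, all relapsing, at months 1 to 8.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1] * 8
p_separated = logrank_p(times, events, [1, 1, 1, 1, 0, 0, 0, 0])
p_mixed = logrank_p(times, events, [1, 0, 1, 0, 1, 0, 1, 0])
print(p_separated < p_mixed)
```

Repeating `logrank_p` over every candidate gene set and sorting by p-value gives a landscape of the kind shown in panel A. With thousands of candidate sets tested, the smallest p-value is optimistically biased, so the selected set would normally need independent validation.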
Univariate Cox regression analysis of Gleason score and optimal gene set (OGS), with respect to disease-free survival.
| Clinical parameters | Univariate HR (95% CI) | p-value |
|---|---|---|
| Gleason 1 (OldGleason ≤ 6) | Reference | Reference |
| Gleason 2 (OldGleason = 3 + 4) | 3.638 (0.473, 27.98) | 0.21472 |
| Gleason 3 (OldGleason = 4 + 3) | 5.223 (0.6788, 40.19) | 0.11233 |
| Gleason 4 (OldGleason = 8) | 9.173 (1.1997, 70.13) | 0.03273 |
| Gleason 5 (OldGleason ≥ 9) | 20.826 (2.8802, 150.58) | 0.00263 |
| OGS | 2.123 (1.1, 4.098) | 0.0248 |
*p-Value < 0.05.
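The hazard ratios, confidence intervals, and p-values in this table are mutually consistent if they are read as Wald estimates, which is an assumption about how the model output was reported. Under that reading, the coefficient is log(HR), its standard error follows from the width of the log-scale interval, and the p-value comes from a two-sided normal test. A minimal check using only the table values:

```python
import math


def wald_from_hr(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover the Cox coefficient, its SE, and the two-sided Wald p-value."""
    beta = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return beta, se, p


# OGS row: HR 2.123 (1.1, 4.098), reported p = 0.0248.
_, _, p_ogs = wald_from_hr(2.123, 1.1, 4.098)
# Gleason 5 row: HR 20.826 (2.8802, 150.58), reported p = 0.00263.
_, _, p_gleason5 = wald_from_hr(20.826, 2.8802, 150.58)
print(p_ogs, p_gleason5)
```

Both recomputed values land close to the reported ones; small differences come from rounding in the published HR and CI.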
ANOVA analysis for multivariable Cox regression models of Gleason score and/or optimal gene set (OGS).
| Clinical parameters | Log (likelihood) | Chi-square | Degrees of freedom | p-value |
|---|---|---|---|---|
| Gleason | −494.69 | Reference | Reference | Reference |
| Gleason + OGS | −466.78 | 55.816 | 4 | 2.191 × 10⁻¹¹ |
*p-Value < 0.05.
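The chi-square statistic in this table matches a likelihood ratio comparison of the two nested models: twice the difference in log-likelihood. The sketch below reproduces the statistic and the p-value from the tabulated numbers, taking the 4 degrees of freedom as reported. The helper name is hypothetical, and the closed-form tail probability it uses holds only for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


loglik_gleason = -494.69
loglik_gleason_ogs = -466.78

# Likelihood ratio statistic from the rounded log-likelihoods in the table.
lr_stat = 2 * (loglik_gleason_ogs - loglik_gleason)
# p-value from the chi-square value and degrees of freedom as reported.
p_value = chi2_sf_even_df(55.816, df=4)
print(lr_stat, p_value)
```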