Qingyang Zhang, Joanna E Burdette, Ji-Ping Wang.
Abstract
BACKGROUND: Over the past years, tremendous efforts have been made to elucidate the molecular basis of the initiation and progression of ovarian cancer. However, most existing studies have focused on individual genes or a single type of data, which may lack the power to detect the complex mechanisms of cancer formation because they overlook the interactions of different genetic and epigenetic factors.
Year: 2014 PMID: 25551281 PMCID: PMC4331442 DOI: 10.1186/s12918-014-0136-9
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Summary of TCGA ovarian cancer data
| Data type | Platform | Number of cases |
|---|---|---|
| Gene expression | Agilent 244K | 574 (8 organ-specific controls) |
| Somatic mutation | Agilent 415K | 579 (8 organ-specific controls) |
| DNA methylation | Illumina 27K | 584 (8 organ-specific controls) |
| Copy number variation | Agilent 1M | 579 (8 organ-specific controls) |
| Clinical information | N/A | 583 |
Summary of TCGA ovarian cancer data including the data types we incorporated in the analysis, platforms and the number of available cases.
Figure 1. Removal of batch effect and age effect. (a) Boxplots of BRCA1 expression before (left) and after (right) removal of batch effect, where the x-axis is the batch number and the y-axis is the expression level; (b) Boxplots of BRCA1 expression before (left) and after (right) removal of age effect, where the x-axis is the age group (<50 yrs old, 50-70 yrs old and >70 yrs old) and the y-axis is the expression level. In the preprocessing step, we removed batch and age effects from the expression level and methylation level of every gene.
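The caption does not say which adjustment method was used. As a minimal sketch of one simple possibility (an assumption, not necessarily the authors' procedure), an additive batch or age-group shift can be removed by centering each group on the overall mean. The function name `center_by_group` and the toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def center_by_group(values, groups):
    """Remove an additive group effect (e.g. batch or age group).

    Each value is shifted by (grand mean - its group mean), so every group
    ends up with the same mean. This is plain mean-centering, a swapped-in
    illustration and not the method reported in the paper.
    """
    grand_mean = mean(values)
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    group_mean = {g: mean(vs) for g, vs in by_group.items()}
    return [v - group_mean[g] + grand_mean for v, g in zip(values, groups)]


# Toy expression levels for one gene measured in two batches (made-up data).
expression = [1.0, 3.0, 5.0, 7.0]
batch = ["b1", "b1", "b2", "b2"]
adjusted = center_by_group(expression, batch)
```

Applying the same function with age-group labels in place of batch labels would handle the age effect in the same way, one gene at a time.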
Figure 2. Workflow of our integrative framework.
48 tumor suppressors and oncogenes from TCGA data
| Category | Symbol | Name |
|---|---|---|
| Suppressor | CDKN2A | Cyclin-Dependent Kinase Inhibitor 2A |
| Suppressor | MAP2K4 | Mitogen-Activated Protein Kinase Kinase 4 |
| Suppressor | MAGEC1 | Melanoma Antigen Family C, 1 |
| Suppressor | RIMBP2 | RIMS Binding Protein 2 |
| Suppressor | DIRAS3 | DIRAS Family, GTP-Binding RAS-Like 3 |
| Suppressor | PEG3 | Paternally Expressed 3 |
| Suppressor | DAB2 | Disabled Homolog 2, Mitogen-Responsive Phosphoprotein |
| Suppressor | NF1 | Neurofibromin 1 |
| Suppressor | ARID1A | AT Rich Interactive Domain 1A |
| Suppressor | OPCML | Opioid Binding Protein/Cell Adhesion |
| Suppressor | PLAGL1 | Pleiomorphic Adenoma Gene-Like 1 |
| Suppressor | CASP9 | Caspase 9, Apoptosis-Related Cysteine Peptidase |
| Suppressor | WWOX | WW Domain Containing Oxidoreductase |
| Suppressor | RPS6KA2 | Ribosomal Protein S6 Kinase, 90kDa, Polypeptide 2 |
| Suppressor | SPARC | Secreted Protein, Acidic, Cysteine-Rich |
| Suppressor | DLEC1 | Deleted In Lung And Esophageal Cancer 1 |
| Oncogene | THY1 | Thy-1 Cell Surface Antigen |
| Oncogene | ALG3 | Alpha-1, 3-Mannosyltransferase |
| Oncogene | ATP5E | ATP Synthase, H+ Transporting, Mitochondrial F1 Complex, Epsilon Subunit |
| Oncogene | ATP6V1C1 | ATPase, H+ Transporting, Lysosomal 42kDa, V1 Subunit C1 |
| Oncogene | C19orf53 | Chromosome 19 Open Reading Frame 53 |
| Oncogene | CSNK2A1 | Casein Kinase 2, Alpha 1 Polypeptide |
| Oncogene | CTSF1 | Cathepsin F |
| Oncogene | DERL1 | Derlin 1 |
| Oncogene | HSF1 | Heat Shock Transcription Factor 1 |
| Oncogene | ITPA | Inosine Triphosphatase |
| Oncogene | MRPL34 | Mitochondrial Ribosomal Protein L34 |
| Oncogene | NCBP2 | Nuclear Cap Binding Protein Subunit 2 |
| Oncogene | NDUFA13 | NADH Dehydrogenase (Ubiquinone) 1 Alpha Subcomplex, 13 |
| Oncogene | NDUFB7 | NADH Dehydrogenase (Ubiquinone) 1 Beta Subcomplex, 7 |
| Oncogene | NDUFB9 | NADH Dehydrogenase (Ubiquinone) 1 Beta Subcomplex, 9 |
| Oncogene | OSBPL2 | Oxysterol Binding Protein-Like 2 |
| Oncogene | POLR2H | Polymerase (RNA) II (DNA Directed) Polypeptide H |
| Oncogene | PIK3R1 | Phosphoinositide-3-Kinase, Regulatory Subunit 1 |
| Oncogene | AKT2 | V-Akt Murine Thymoma Viral Oncogene Homolog 2 |
| Oncogene | ERG | V-Ets Erythroblastosis Virus E26 Oncogene Homolog |
| Oncogene | PTK2 | Protein Tyrosine Kinase 2 |
| Oncogene | RAE1 | RAE1 RNA Export 1 Homolog |
| Oncogene | RIOK1 | RIO Kinase 1 |
| Oncogene | SNRPB2 | Small Nuclear Ribonucleoprotein Polypeptide B |
| Oncogene | SNX5 | Sorting Nexin 5 |
| Oncogene | SRXN1 | Sulfiredoxin 1 |
| Oncogene | STX10 | Syntaxin 10 |
| Oncogene | TRMT1 | TRNA Methyltransferase 1 Homolog |
| Oncogene | TRMT6 | TRNA Methyltransferase 6 Homolog |
| Oncogene | WDR53 | WD Repeat Domain 53 |
| Oncogene | YWHAZ | Tyrosine 3-Monooxygenase/Tryptophan 5-Monooxygenase Activation Protein, Zeta Polypeptide |
| Oncogene | RAB25 | RAB25, Member RAS Oncogene Family |
Presented in the table are the symbol and name of 48 tumor suppressors and oncogenes identified from TCGA data.
36 tumor suppressors and oncogenes from literature
| Category | Symbol | Name |
|---|---|---|
| Suppressor | RB1 | Retinoblastoma 1 |
| Suppressor | PTEN | Phosphatase And Tensin Homolog |
| Suppressor | DAB2 | Disabled Homolog 2, Mitogen-Responsive Phosphoprotein |
| Suppressor | DLEC1 | Deleted In Lung And Esophageal Cancer 1 |
| Suppressor | TP53 | Tumor Protein P53 |
| Suppressor | NF1 | Neurofibromin 1 |
| Suppressor | SPARC | Secreted Protein, Acidic, Cysteine-Rich |
| Suppressor | TMPRSS2 | Transmembrane Protease, Serine 2 |
| Suppressor | CASP9 | Caspase 9, Apoptosis-Related Cysteine Peptidase |
| Suppressor | PLAGL1 | Pleiomorphic Adenoma Gene-Like 1 |
| Suppressor | WWOX | WW Domain Containing Oxidoreductase |
| Suppressor | RPS6KA2 | Ribosomal Protein S6 Kinase, 90kDa, Polypeptide 2 |
| Suppressor | BRCA1 | Breast Cancer 1, Early Onset |
| Suppressor | BRCA2 | Breast Cancer 2, Early Onset |
| Suppressor | DIRAS3 | DIRAS Family, GTP-Binding RAS-Like 3 |
| Suppressor | PEG3 | Paternally Expressed 3 |
| Suppressor | ARID1A | AT Rich Interactive Domain 1A |
| Suppressor | OPCML | Opioid Binding Protein/Cell Adhesion |
| Oncogene | MYC | V-Myc Myelocytomatosis Viral Oncogene Homolog |
| Oncogene | CDC25A | Cell Division Cycle 25A |
| Oncogene | PIK3CA | Phosphatidylinositol-4, 5-Bisphosphate 3-Kinase |
| Oncogene | NOTCH3 | Notch 3 |
| Oncogene | EIF5A2 | Eukaryotic Translation Initiation Factor 5A2 |
| Oncogene | STAT3 | Signal Transducer And Activator Of Transcription 3 |
| Oncogene | ETV6 | Ets Variant 6 |
| Oncogene | EGFR | Epidermal Growth Factor Receptor |
| Oncogene | FGF1 | Fibroblast Growth Factor 1 |
| Oncogene | AKT2 | V-Akt Murine Thymoma Viral Oncogene Homolog 2 |
| Oncogene | KRAS | V-Ki-Ras2 Kirsten Rat Sarcoma Viral Oncogene Homolog |
| Oncogene | RAB25 | RAB25, Member RAS Oncogene Family |
| Oncogene | AURKA | Aurora Kinase A |
| Oncogene | PIK3R1 | Phosphoinositide-3-Kinase, Regulatory Subunit 1 |
| Oncogene | ERG | V-Ets Erythroblastosis Virus E26 Oncogene Homolog |
| Oncogene | ATAD2 | ATPase Family, AAA Domain Containing 2 |
| Oncogene | PDGFRA | Platelet-Derived Growth Factor Receptor, Alpha Polypeptide |
| Oncogene | ERBB2 | V-Erb-B2 Erythroblastic Leukemia Viral Oncogene Homolog 2 |
Presented in the table are the symbol and name of 36 tumor suppressors and oncogenes reported in the literature [2].
Figure 3. Predicted graph by the Bayesian network model with logit link function and blockwise coordinate descent algorithm, with 339 nodes including expression level of 245 genes (yellow), copy number at 82 sites (blue), methylation at 11 sites (green) and 1 somatic mutation at gene , connected by 698 directed edges. The direction of an edge indicates that the downstream feature is regulated by the upstream one. Red edges represent activation and black edges represent suppression. Details are listed in Additional file 3: Table S3.
Figure 4. Histogram of outdegree (number of edges going out from the node) for each gene in the predicted network. The mean outdegree and standard deviation are 2.15 and 2.31, respectively. Genes with outdegree greater than 7 (mean+2SD) are identified as hub genes. The 13 hub genes are listed in Table 4.
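The hub-gene rule in this caption can be sketched as follows. With the reported mean of 2.15 and standard deviation of 2.31, mean + 2 SD is 6.77. The function names, the toy graph, and the use of the sample standard deviation are assumptions for illustration; the caption does not state which SD estimator was used.

```python
from collections import Counter
from statistics import mean, stdev


def outdegrees(nodes, edges):
    """Count outgoing edges per node; nodes without outgoing edges get 0."""
    counts = Counter(source for source, _target in edges)
    return {node: counts.get(node, 0) for node in nodes}


def hub_nodes(nodes, edges, n_sd=2.0):
    """Return nodes whose outdegree exceeds mean + n_sd * SD, plus the threshold.

    Assumption: sample standard deviation (statistics.stdev).
    """
    degree = outdegrees(nodes, edges)
    values = list(degree.values())
    threshold = mean(values) + n_sd * stdev(values)
    hubs = sorted(node for node, d in degree.items() if d > threshold)
    return hubs, threshold


# Toy directed graph (made-up): node A regulates six others, B regulates one.
toy_nodes = list("ABCDEFGHIJ")
toy_edges = [("A", t) for t in "BCDEFG"] + [("B", "C")]
toy_hubs, toy_threshold = hub_nodes(toy_nodes, toy_edges)
```

In the toy graph only `A` exceeds the threshold, which mirrors how a small number of high-outdegree genes stand out from the bulk of the histogram.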
13 local drivers (hub genes) in the predicted network
| Symbol | Name |
|---|---|
| ARID1A | AT Rich Interactive Domain 1A |
| C19orf53 | Chromosome 19 Open Reading Frame 53 |
| CSNK2A1 | Casein Kinase 2, Alpha 1 Polypeptide |
| DERL1 | Derlin 1 |
| TRMT6 | TRNA Methyltransferase 6 Homolog |
| COL5A2 | Collagen, Type V, Alpha 2 |
| TCF21 | Transcription Factor 21 |
| LUM | Lumican |
| TPX2 | Microtubule-Associated, Homolog |
| UBE2C | Ubiquitin-Conjugating Enzyme E2C |
| DPM1 | Dolichyl-Phosphate Mannosyltransferase Polypeptide 1, Catalytic Subunit |
| NDUFB7 | NADH Dehydrogenase (Ubiquinone) 1 Beta Subcomplex, 7 |
| NDUFB9 | NADH Dehydrogenase (Ubiquinone) 1 Beta Subcomplex, 9 |
Presented in the table are the symbol and name of 13 hub genes identified from the predicted Bayesian network.
Figure 5. Multidimensional Scaling (MDS) plots for sample classification. (a) MDS plot based on the 13 hub genes only, where the distance between samples is measured by the Euclidean distance of the gene expression levels; (b) MDS plot based on all 245 genes in the predicted network, where each dot represents one sample. The 580 samples in total include 8 normal samples (cancer-free, red), 15 early-stage samples (cancer at stage I, green) and 557 high-grade samples (cancer at stage II or higher, black).
Figure 6. Identification of gene clusters. (a) MDS plot based on the correlation dissimilarity metric among 245 genes (each circle represents one gene), where the 13 hub genes are indicated by red dots; (b) The proportion of variance explained by clustering (y-axis) against the number of clusters (x-axis) for different values of k (k = 1, 2, …, 7) using the k-means clustering method. From this plot, the most likely number of clusters is four.
Figure 7. Four major clusters with distinct cellular functions. (a) Multidimensional scaling (MDS) plot based on the correlation dissimilarity metric between 245 genes (each circle represents one gene). Genes falling into the four clusters (by the k-means clustering method with k = 4) are indicated by different colors; (b) Correlation plot of the four clusters, where a connection between a pair of genes represents a significant correlation.
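The captions for Figures 6 and 7 refer to a correlation dissimilarity between genes without defining it. A common choice, assumed here, is one minus the Pearson correlation of the two expression profiles (another convention uses the absolute correlation). The gene names and values below are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_dissimilarity(profiles):
    """Map each ordered gene pair to 1 - r (assumed definition).

    `profiles` maps a gene name to its expression values across samples.
    """
    genes = list(profiles)
    return {
        (g, h): 1.0 - pearson(profiles[g], profiles[h])
        for g in genes
        for h in genes
    }


# Hypothetical expression profiles over four samples.
toy_profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with gene_a
    "gene_c": [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with gene_a
}
toy_dissimilarity = correlation_dissimilarity(toy_profiles)
```

The resulting pairwise matrix is the kind of input that an MDS routine or a clustering method would take; under this definition, co-expressed genes sit close together and anti-correlated genes sit far apart.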
Number of causal edges within/between four clusters in the TCGA ovarian cancer data
| | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 |
|---|---|---|---|---|
| Cluster 1 (23) | 46 | 0 | 2 | 35 |
| Cluster 2 (18) | | 28 | 0 | 40 |
| Cluster 3 (20) | | | 40 | 29 |
| Cluster 4 (184) | | | | 384 |
Presented in the table are the number of predicted edges within and between clusters. The number of genes in each cluster is listed in the parentheses.
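A tally like the one in this table could be produced as sketched below. The function name and toy data are hypothetical, and two assumptions are made: edges are counted per unordered cluster pair (matching the upper-triangular layout), and edges touching nodes without a cluster label, such as copy number or methylation nodes, are skipped.

```python
from collections import Counter


def count_cluster_edges(edges, cluster_of):
    """Count directed edges within and between clusters.

    `edges` is an iterable of (source, target) pairs and `cluster_of` maps
    a gene to its cluster label. Keys of the result are sorted label pairs,
    so (1, 2) covers edges in either direction between clusters 1 and 2.
    """
    counts = Counter()
    for source, target in edges:
        if source in cluster_of and target in cluster_of:
            pair = tuple(sorted((cluster_of[source], cluster_of[target])))
            counts[pair] += 1
    return counts


# Toy example: four genes in two clusters plus one unlabeled node.
toy_clusters = {"g1": 1, "g2": 1, "g3": 2, "g4": 2}
toy_cluster_edges = [
    ("g1", "g2"),    # within cluster 1
    ("g1", "g3"),    # between clusters 1 and 2
    ("g3", "g4"),    # within cluster 2
    ("g4", "g2"),    # between clusters 2 and 1
    ("cnv1", "g1"),  # skipped: cnv1 has no cluster label
]
toy_counts = count_cluster_edges(toy_cluster_edges, toy_clusters)
```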
Figure 8. Subnetwork extracted from Figure 3 corresponding to the cell division process (mitosis, spindle formation, etc.), containing 18 nodes and 28 directed edges. The direction of an edge indicates that the downstream feature is regulated by the upstream one. Red edges represent activation and black edges represent suppression. TPX2 and UBE2C are two hub genes that may drive this subnetwork.
66 survival-related genes
| AK7 | C2orf39 | CCDC19 | LOC136288 | WDR38 | C1orf192 | CCDC37 |
| FLJ23049 | RSHL3 | TEKT1 | CXorf41 | ZMYND10 | RNASE3 | LILRA1 |
| MS4A4A | RNASE2 | SIGLEC7 | ABI3 | LAT2 | OLR1 | SIGLEC9 |
| CD53 | LAIR1 | SAMSN1 | SRGN | RGS18 | LAPTM5 | PSG11 |
| C10orf96 | HOTAIR | HTR5A | TDH | CCDC83 | GYPA | USP9Y |
| UTY | HOXC10 | LCT | NPY5R | SLC2A2 | MBL2 | PEX5L |
| PSG8 | SLC17A2 | CCDC63 | HOXA11 | GALNT10 | GJB2 | ITGA5 |
| MMP2 | RUNX1 | FAP | INHBA | THBS2 | VCAN | ADAMTS2 |
| ALPK2 | ECM1 | SPHK1 | AEBP1 | COL5A1 | LUM | ANTXR1 |
| C21orf96 | COL8A1 | HOM-TES-103 |
Figure 9. Survival-induced Bayesian network, including expression level of 66 genes (yellow) and promoter methylation at 2 sites (green). Two genes, PSG11 and GALNT10, have a direct effect on the survival time. The direction of an edge indicates that the downstream feature is regulated by the upstream one. Red edges represent activation and black edges represent suppression. Details are listed in Additional file 3: Table S3.
Figure 10. Survival probability against time for groups defined by the expression level of PSG11 (a) and GALNT10 (b). The x-axis represents the survival time (in days) and the y-axis represents the survival rate. By log-rank test, the p-values are 4.9 × 10^-5 and 2.2 × 10^-9 for PSG11 and GALNT10, respectively. The black solid line is based on all subjects (235 patients) and the 95% confidence limits are represented by the black dashed lines. The red and blue lines are based on the overexpressed and underexpressed groups, respectively.
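Survival curves of this kind are commonly drawn with the Kaplan-Meier estimator. The caption does not name the estimator, so the following is a minimal sketch under that assumption. The function name and the toy follow-up data are hypothetical, and the log-rank test itself is not reproduced here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    `times` are follow-up times (e.g. days); `events` are 1 for an observed
    death and 0 for a censored subject. Returns a list of (time, survival)
    pairs, one per distinct time at which at least one death occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == current_time:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((current_time, survival))
        at_risk -= leaving
    return curve


# Made-up follow-up data for five subjects.
toy_times = [5, 8, 8, 12, 20]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

Running the estimator separately on the overexpressed and underexpressed subjects would give one curve per group, analogous to the red and blue lines described in the caption.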
Simulation I results
| p | \|E\| | β | N | P | TPR, directed (skeleton) | FDR, directed (skeleton) |
|---|---|---|---|---|---|---|
| 50 | 100 | 0.5 | 500 | 46.0 | 0.298 (0.410) | 0.468 (0.152) |
| 50 | 100 | 0.5 | 1000 | 63.2 | 0.420 (0.627) | 0.333 (0.063) |
| 50 | 100 | 0.5 | 2000 | 78.4 | 0.600 (0.783) | 0.273 (0.032) |
| 50 | 100 | 1 | 500 | 78.2 | 0.550 (0.740) | 0.294 (0.051) |
| 50 | 100 | 1 | 1000 | 92.8 | 0.676 (0.910) | 0.265 (0.019) |
| 50 | 100 | 1 | 2000 | 98.4 | 0.781 (0.960) | 0.236 (0.017) |
| 100 | 200 | 0.5 | 500 | 110.6 | 0.260 (0.400) | 0.528 (0.272) |
| 100 | 200 | 0.5 | 1000 | 124.2 | 0.328 (0.557) | 0.484 (0.104) |
| 100 | 200 | 0.5 | 2000 | 168.4 | 0.590 (0.825) | 0.291 (0.019) |
| 100 | 200 | 1 | 500 | 163.2 | 0.539 (0.735) | 0.349 (0.098) |
| 100 | 200 | 1 | 1000 | 167.8 | 0.614 (0.892) | 0.347 (0.018) |
| 100 | 200 | 1 | 2000 | 194.4 | 0.768 (0.959) | 0.216 (0.010) |
| 200 | 400 | 0.5 | 500 | 252.6 | 0.225 (0.358) | 0.647 (0.444) |
| 200 | 400 | 0.5 | 1000 | 272.8 | 0.383 (0.597) | 0.438 (0.132) |
| 200 | 400 | 0.5 | 2000 | 326.4 | 0.546 (0.791) | 0.337 (0.031) |
| 200 | 400 | 1 | 500 | 347.2 | 0.535 (0.825) | 0.377 (0.073) |
| 200 | 400 | 1 | 1000 | 364.8 | 0.583 (0.872) | 0.359 (0.044) |
| 200 | 400 | 1 | 2000 | 396.4 | 0.698 (0.963) | 0.294 (0.028) |
Presented in the table are the average number of predicted edges (P), true positive rate (TPR) and false discovery rate (FDR) for both directed and undirected (skeleton) edges over 10 replicated samples in each setting (p, |E|, β, N). The number of edges |E| is set to be 2p.
Figure 11. Comparison of three different Bayesian network models. (a) The known signaling pathway (Bayesian network) containing 11 proteins (nodes) and 20 causal relations (directed edges); (b) Predicted network by the logistic BN model; (c) Predicted network by the Gaussian BN model; (d) Predicted network by the multinomial BN model.
Comparison of three different BN models
| Model | P | TPR, directed (skeleton) | FDR, directed (skeleton) |
|---|---|---|---|
| Gaussian BN | 26 | 0.55 (0.70) | 0.58 (0.46) |
| Multinomial BN | 20 | 0.40 (0.60) | 0.60 (0.40) |
| Logistic BN | 22 | 0.55 (0.80) | 0.50 (0.18) |
Presented in the table are the number of predicted edges (P), true positive rate (TPR) and false discovery rate (FDR) for both directed and undirected (skeleton) edges using three different BN models. The true network is known and it contains 11 nodes and 20 edges.
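The TPR and FDR columns can be illustrated with the usual definitions, assumed here: TPR is the number of correctly predicted edges divided by the number of true edges, and FDR is the number of incorrectly predicted edges divided by the number of predicted edges. These definitions are consistent with the logistic BN row, where 22 predicted edges, a TPR of 0.55 against 20 true edges, and an FDR of 0.50 all correspond to 11 correct directed edges. For the skeleton, edge direction is ignored. The function name and toy graphs are hypothetical.

```python
def edge_metrics(true_edges, predicted_edges):
    """Compute (TPR, FDR) for directed edges and for the undirected skeleton.

    Edges are (source, target) pairs. In the skeleton comparison an edge
    counts as correct if the two nodes are connected in either direction.
    """
    true_directed = set(true_edges)
    predicted_directed = set(predicted_edges)
    true_skeleton = {frozenset(edge) for edge in true_directed}
    predicted_skeleton = {frozenset(edge) for edge in predicted_directed}

    def rates(truth, predicted):
        true_positives = len(truth & predicted)
        tpr = true_positives / len(truth) if truth else 0.0
        fdr = (len(predicted) - true_positives) / len(predicted) if predicted else 0.0
        return tpr, fdr

    return {
        "directed": rates(true_directed, predicted_directed),
        "skeleton": rates(true_skeleton, predicted_skeleton),
    }


# Toy networks (made-up): one predicted edge is reversed, one is spurious.
toy_true = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
toy_predicted = [("A", "B"), ("C", "B"), ("C", "D"), ("D", "E")]
toy_metrics = edge_metrics(toy_true, toy_predicted)
```

The reversed edge `C -> B` counts against the directed scores but is credited in the skeleton scores, which is why the skeleton TPR in the tables is never lower than the directed TPR.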
Figure 12. Simulation II: Comparison between Pearson's Chi-square test and the SCBS procedure in feature selection. The four curves presented in the plot are based on the true positive rate (TPR) of the two methods under two different causal effects, β = 1 and β = 2. Sample sizes are set to be 500, 1000, 2000 and 5000. The total number of positives is restricted to be 49 for both methods and the TPR is calculated as the number of true positives divided by 49.
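The Pearson Chi-square baseline in this comparison can be sketched as ranking features by their Chi-square statistic against the outcome and keeping a fixed number of them (49 in the simulation). The helper names and the contingency tables below are hypothetical, and the SCBS procedure itself is not reproduced.

```python
def chi_square_statistic(table):
    """Pearson Chi-square statistic for a contingency table (list of rows).

    Assumes every row total and column total is positive, so that all
    expected counts are non-zero.
    """
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def select_top_features(tables, k):
    """Return the k feature names with the largest Chi-square statistic."""
    ranked = sorted(
        tables, key=lambda name: chi_square_statistic(tables[name]), reverse=True
    )
    return ranked[:k]


# Made-up 2x2 tables: rows are feature states, columns are outcome states.
toy_tables = {
    "x1": [[30, 10], [10, 30]],  # strong association
    "x2": [[20, 20], [20, 20]],  # no association
    "x3": [[25, 15], [15, 25]],  # weak association
}
toy_selected = select_top_features(toy_tables, 2)
```

Given a set of truly causal features, the TPR in the figure would then be the number of selected features that are truly causal divided by the number selected.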
Figure 13. True positive rate (TPR) of the selected features against the choice of k, based on networks with average degree 4 (100 edges, blue curve), 8 (200 edges, black curve) and 12 (300 edges, red curve), respectively. The plot shows that increasing network density leads to increasing TPR, and that a k of 4, 5 or 6 performs better in all situations.