Min Zhao, Yining Liu, Hong Qu.
Abstract
Epithelial-mesenchymal transition (EMT) is a cellular process through which epithelial cells transform into mesenchymal cells. EMT-implicated genes initiate and promote cancer metastasis because mesenchymal cells have greater invasive and migratory capacities than epithelial cells. In this pan-cancer analysis, we explored the relationship between gene expression changes and copy number variations (CNVs) for EMT-implicated genes. Starting from 377 EMT-implicated genes curated from the literature, we used data from The Cancer Genome Atlas (TCGA) to identify 212 EMT-implicated genes with more frequent copy number gains (CNGs) than copy number losses (CNLs). By correlating these CNV data with TCGA gene expression data, we then identified 71 EMT-implicated genes with concordant CNGs and gene up-regulation in 20 or more tumor samples. Of those, 14 exhibited such concordance in over 110 tumor samples. These 14 genes were predominantly apoptosis regulators, which may imply that apoptosis is critical during EMT. Moreover, the 71 genes with concordant CNG and up-regulation were largely involved in cellular functions such as phosphorylation cascade signaling. This is the first observation of concordance between CNG and up-regulation of specific genes in hundreds of samples, which may indicate that somatic CNGs activate gene expression by increasing the gene dosage.
Keywords: cancer genomics; copy number variation; epithelial-mesenchymal transition (EMT); gene expression; pan-cancer
Year: 2016 PMID: 27029057 PMCID: PMC5029734 DOI: 10.18632/oncotarget.8371
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Figure 1. Gene ontology analysis of 212 human EMT-implicated genes with frequent CNGs
The scatterplot shows the gene ontology (GO) clusters for the 212 EMT-implicated genes in a two-dimensional space derived through application of multidimensional scaling to a matrix of the semantic similarities of the GO terms. Bubble colors indicate the frequency of a GO term in the GOA database (bubbles with more general terms are red), while bubble sizes indicate the log of the corrected P-value (bubbles with smaller corrected P-values are larger).
The top 10 pathways enriched for the 212 EMT-implicated genes with frequent CNGs
| Pathway | Q-value | #G | EMT-implicated genes |
|---|---|---|---|
| Proteoglycans in cancer | 5.78E-25 | 39 | MYC,HGF,ERBB2,ITGA5,ITGB1,ITGB3,PTK2,STAT3,DDX5,TWIST1,MET,BRAF,PIK3CA,WNT3A,PLAUR,FGFR1,SDC1,RAC1,VEGFA,ROCK2,KRAS,RAF1,VTN,HBEGF,WNT1,CAV1,TGFB1,TGFB2,MAPK14,IGF1R,EGFR,CD44,PAK1, MIR21,PRKCA,ROCK1,TNF,MAPK1,MAPK3 |
| Pathways in cancer | 4.93E-21 | 41 | MYC,CDKN1B,HGF,ERBB2,ITGA6,ITGB1,PTGS2,PTK2,STAT3,STAT5A,AXIN1,STAT5B,AXIN2,MET,BMP2,BMP4,BRAF,PIK3CA,WNT3A,FGFR1,FGFR2,KIT,RAC1,VEGFA, KRAS,RAF1,WNT1,SHH,LAMA5,BIRC2,PPARG,GSK3B,TGFA,TGFB1,TGFB2,IGF1R,EGFR,PRKCA,MAPK1,MAPK3,EPAS1 |
| Integrated pancreatic cancer pathway | 4.93E-21 | 29 | MYC,CDKN1B,SP1,ERBB2,EZH2,PTGS2,STAT5A,PIK3CA, FGFR1,RAC1,VEGFA,KRAS,RAF1,ANXA1,WT1,SHH,GSK3A,TGFB1,LEFTY1,MAPK14,IGFBP3,EGFR,EGR1,PAK1, PRKCA,TNF,MAPK1,MAPK3,MAPK7 |
| MicroRNAs in cancer | 4.56E-16 | 34 | MYC,CDKN1B,ERBB2,ITGA5,EZH2,ITGB3,PTGS2,STAT3,BMI1,MET,PIK3CA,WNT3A,NOTCH2,VEGFA,KRAS,RAF1,VIM,ZEB1,TGFB2,MIR137,MIR15B,EGFR,TP63,MIR194-1,CD44,MIR21,PRKCA,PRKCE,ROCK1,MIR200C,MAPK1,MAPK7,FSCN1,MIR23A |
| Focal adhesion | 5.94E-16 | 29 | HGF,ERBB2,ITGA6,ITGA5,ITGB1,ITGB3,ITGB4,ZYX,PTK2,MET,BRAF,PIK3CA,RAC1,VEGFA,ROCK2,RAF1,FLT1,VTN,LAMA5,BIRC2,CAV1,GSK3B,IGF1R,EGFR,PAK1,PRKCA,ROCK1,MAPK1,MAPK3 |
| MicroRNAs in cardiomyocyte hypertrophy | 3.98E-14 | 21 | STAT3,PIK3CA,WNT3A,FGFR2,RAC1,ROCK2,RAF1,EDN1,GSK3B,TGFB1,MAPK14,IGF1R,MIR15B,LRP6,MIR21,ROCK1,TNF,MAPK1,MAPK3,MAPK7,MIR23A |
| Prolactin signaling pathway | 1.58E-12 | 17 | MYC,ERBB2,ITGB1,PTK2,STAT3,GAB2,STAT5A,STAT5B, PIK3CA,RAC1,RAF1,YWHAZ,GSK3B,MAPK14,PAK1,MAPK1,MAPK3 |
| ErbB signaling pathway | 1.58E-12 | 18 | MYC,CDKN1B,ERBB2,PTK2,STAT5A,STAT5B,BRAF,PIK3CA,KRAS,RAF1,HBEGF,GSK3B,TGFA,EGFR,PAK1,PRKCA,MAPK1,MAPK3 |
| IL-3 signaling Pathway | 2.26E-12 | 19 | PTK2,STAT3,GAB2,STAT5A,STAT5B,PIK3CA,HSPB1,RAC1,KRAS,RAF1,YWHAZ,GSK3A,GSK3B,MAPK14,PAK1,PRKCA,MAPK1,MAPK3,MAPK7 |
Q-value: the raw P-value of the hypergeometric test, corrected for multiple testing with the Benjamini-Hochberg procedure.
#G: the number of EMT-implicated genes associated with the pathway.
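To illustrate how Q-values of this kind can be obtained, the following is a minimal Python sketch of a hypergeometric enrichment test followed by Benjamini-Hochberg correction. It is not the authors' implementation: the function names, the background size, and the pathway sizes in the usage lines are hypothetical values chosen for illustration only.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """Upper-tail probability P(X >= k) of the hypergeometric distribution.

    M: size of the background gene universe (assumed, not given in the source)
    n: number of background genes annotated to the pathway
    N: size of the query gene list (e.g. 212)
    k: number of query genes annotated to the pathway (#G)
    """
    upper = min(n, N)
    total = comb(M, N)
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favourable / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that the adjusted
    # values never increase as the raw P-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Hypothetical usage: three pathways tested against a 212-gene list,
# with an assumed background of 20,000 genes.
raw_p = [hypergeom_sf(k, 20000, n, 212) for k, n in [(39, 200), (5, 150), (2, 300)]]
q = benjamini_hochberg(raw_p)
```

The correction is applied across all pathways tested at once, so the Q-value of a pathway depends on how many pathways were tested as well as on its own raw P-value.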
Figure 2. Collection of 71 EMT-implicated genes with increased gene expression induced by CNGs
(A) The computational pipeline for identifying 71 EMT-implicated genes with concordance between CNG and up-regulation; (B) the global CNV patterns across multiple cancers for 71 EMT-implicated genes with increased gene expression induced by CNGs.
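The concordance step of such a pipeline can be sketched in Python as follows. This is only an illustration under stated assumptions: the input layout (nested dictionaries keyed by gene and sample), the definition of a CNG as a discrete copy-number level of at least 1, and the z-score cutoff of 2.0 for up-regulation are all hypothetical choices, and the function names are invented. Only the threshold of 20 or more tumor samples comes from the text.

```python
from collections import Counter


def count_concordant_samples(cnv, expr, gain_level=1, z_cutoff=2.0):
    """Count, per gene, the samples showing both a CNG and up-regulation.

    cnv:  {gene: {sample: discrete copy-number level}}  (assumed layout)
    expr: {gene: {sample: expression z-score}}          (assumed layout)
    gain_level and z_cutoff are illustrative thresholds, not the authors'.
    """
    counts = Counter()
    for gene, per_sample in cnv.items():
        gene_expr = expr.get(gene, {})
        for sample, level in per_sample.items():
            z = gene_expr.get(sample)
            # Only samples with both data types ("matched" samples) are counted.
            if z is not None and level >= gain_level and z >= z_cutoff:
                counts[gene] += 1
    return counts


def select_genes(counts, min_samples=20):
    """Keep genes that are concordant in at least min_samples samples."""
    return sorted(gene for gene, n in counts.items() if n >= min_samples)


# Toy data with placeholder gene and sample names.
toy_cnv = {"geneA": {"s1": 2, "s2": 1, "s3": 0}, "geneB": {"s1": -1, "s2": 2}}
toy_expr = {"geneA": {"s1": 2.5, "s2": 1.0, "s3": 3.0}, "geneB": {"s1": 2.2, "s2": 2.1}}
toy_counts = count_concordant_samples(toy_cnv, toy_expr)
```

With real data, `select_genes(counts)` would apply the 20-sample threshold described in the text; the toy data above is far too small for that and serves only to show the input shape.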
Figure 3. The number of genes with concordance between CNG and up-regulation, and the global CNV mutational pattern
(A) The number of EMT-implicated genes with concordance between CNGs and up-regulation in different tumor samples. (B-H) The CNV landscape in multiple cancer datasets in the cBio portal for: (B) YWHAZ, (C) PIK3CA, (D) MTDH, (E) EGFR, (F) ECT2, (G) ERBB2, and (H) ESRP1.
Figure 4. Reconstructed interaction map for EMT-implicated genes with CNGs and increased gene expression in matched tumor samples
(A) The 49 genes in orange are those among the 71 EMT-implicated genes with increased expression induced by CNGs in 20 or more matched tumour samples. The other 19 genes in blue are linker genes that connect the 49 genes. The node size indicates the connection strength: the larger the node, the greater the degree of connectivity. (B) The plot of degrees for all nodes in the network. The X axis represents the degree of the nodes, and the Y axis represents the total number of nodes that correspond to the values on the X axis. (C) The plot of lengths for short paths in the network.
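The degree plot in panel (B) counts how many nodes have each degree. A minimal Python sketch of that computation is shown below, assuming the network is available as a list of undirected edges. The function names are invented, and the node names and edges are placeholders rather than interactions from the reconstructed map.

```python
from collections import Counter


def node_degrees(edges):
    """Degree of each node in an undirected graph given as (u, v) pairs.

    Duplicate edges (in either orientation) and self-loops are ignored.
    """
    unique_edges = {frozenset((u, v)) for u, v in edges if u != v}
    degrees = Counter()
    for edge in unique_edges:
        for node in edge:
            degrees[node] += 1
    return degrees


def degree_distribution(degrees):
    """Map each degree (X axis) to the number of nodes with it (Y axis)."""
    return dict(sorted(Counter(degrees.values()).items()))


# Placeholder edge list; "g2"-"g1" repeats "g1"-"g2" and is counted once.
toy_edges = [("g1", "g2"), ("g1", "g3"), ("g2", "g1"), ("g4", "g1")]
toy_degrees = node_degrees(toy_edges)
toy_distribution = degree_distribution(toy_degrees)
```

In this toy graph, `g1` is the only hub, so the distribution has many low-degree nodes and a single high-degree node.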
Figure 5. The correlation of CNVs and gene expression in the TCGA ovarian cancer cohort
(A) The oncoprint for the four genes with the most frequent CNGs in the TCGA ovarian cancer cohort. The red cells represent a CNG in a tumour sample; the blue cells represent a CNL in the corresponding tumour sample. (B–E) The box plots for the four genes with the most frequent CNGs in the TCGA ovarian cancer cohort: (B) MYC, (C) NDRG1, (D) SCRIB, and (E) EIF5A2. These CNV levels are derived from the copy-number analysis algorithm GISTIC. For each gene, a deep loss is a copy-number level of “–2”, indicating a possible homozygous deletion; a shallow loss is a copy-number level of “–1”, indicating a possible heterozygous deletion; the normal gene copy number is noted as “diploid”; “gain” indicates a low level of CNG; and “amplification” indicates a high level of CNG.
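The grouping behind box plots of this kind can be sketched in Python as follows. The caption states the levels −2 and −1 explicitly; the numeric codes 0, 1, and 2 for "diploid", "gain", and "amplification" are an assumption based on the usual discretized GISTIC convention. The function names, sample identifiers, and expression values are hypothetical.

```python
from statistics import median

# Labels for discretized copy-number levels. The codes -2 and -1 are stated
# in the caption; 0, 1 and 2 are assumed.
GISTIC_LABELS = {
    -2: "deep loss",
    -1: "shallow loss",
    0: "diploid",
    1: "gain",
    2: "amplification",
}


def group_expression_by_cnv(levels, expression):
    """Group one gene's per-sample expression values by copy-number label.

    levels:     {sample: discrete copy-number level}
    expression: {sample: expression value}
    Samples missing from either input, or with an unknown level, are skipped.
    """
    groups = {label: [] for label in GISTIC_LABELS.values()}
    for sample, level in levels.items():
        if sample in expression and level in GISTIC_LABELS:
            groups[GISTIC_LABELS[level]].append(expression[sample])
    return groups


def group_medians(groups):
    """Median expression per non-empty copy-number group."""
    return {label: median(values) for label, values in groups.items() if values}


# Toy values for a single gene; "s5" has no expression value and is skipped.
toy_levels = {"s1": 2, "s2": 1, "s3": 0, "s4": 2, "s5": -1}
toy_expression = {"s1": 3.0, "s2": 1.5, "s3": 0.1, "s4": 5.0}
toy_groups = group_expression_by_cnv(toy_levels, toy_expression)
toy_medians = group_medians(toy_groups)
```

If expression rises with gene dosage, the medians would be expected to increase from the loss groups through "diploid" to "amplification", which is the pattern a box plot per group makes visible.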