
Meta-analysis of gene expression signatures defining the epithelial to mesenchymal transition during cancer progression.

Christian J Gröger, Markus Grubinger, Thomas Waldhör, Klemens Vierlinger, Wolfgang Mikulits.

Abstract

The epithelial to mesenchymal transition (EMT) represents a crucial event during cancer progression and dissemination. EMT is the conversion of carcinoma cells from an epithelial to a mesenchymal phenotype that associates with a higher cell motility as well as enhanced chemoresistance and cancer stemness. Notably, EMT has been increasingly recognized as an early event of metastasis. Numerous gene expression studies (GES) have been conducted to obtain transcriptome signatures and marker genes to understand the regulatory mechanisms underlying EMT. Yet, no meta-analysis considering the multitude of GES of EMT has been performed to comprehensively elaborate the core genes in this process. Here we report the meta-analysis of 18 independent and published GES of EMT which focused on different cell types and treatment modalities. Computational analysis revealed clustering of GES according to the type of treatment rather than to cell type. GES of EMT induced via transforming growth factor-β and tumor necrosis factor-α treatment yielded uniformly defined clusters while GES of models with alternative EMT induction clustered in a more complex fashion. In addition, we identified those up- and downregulated genes which were shared between the multitude of GES. This core gene list includes well known EMT markers as well as novel genes so far not described in this process. Furthermore, several genes of the EMT-core gene list significantly correlated with impaired pathological complete response in breast cancer patients. In conclusion, this meta-analysis provides a comprehensive survey of available EMT expression signatures and shows fundamental insights into the mechanisms that are governing carcinoma progression.

Year:  2012        PMID: 23251436      PMCID: PMC3519484          DOI: 10.1371/journal.pone.0051136

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

The epithelial to mesenchymal transition (EMT) has been originally described as an essential process of metazoan embryogenesis [1]. In the past decade, EMT has been realized as a critical event in carcinoma progression as epithelial tumor cells acquire a mesenchymal phenotype that allows them to detach from the primary tumor and to invade into the local tissue [2]. In general, polarized epithelial cells are organized by cell-cell junctions and cell-anchoring complexes to form apical and basolateral surfaces. In contrast, mesenchymal cells form irregularly shaped structures in the absence of tight adhesions to the neighboring cells and reduced cell contact to the substratum. Mesenchymal cells have an elongated shape compared to epithelia and display an anterior-posterior polarity that enables enhanced migration through reduced adhesion forces. While epithelial cells invade collectively in clusters, mesenchymal cells show individual cell movement that allows them to disseminate from bulk cells [3]. In addition, a partial EMT displaying different levels of E-cadherin expression has been observed that might still lead to collective cell invasion [4]. EMT has been classified into three subtypes [5]. Type 1 EMT is required for embryogenesis to provide gastrulation and formation of neural crest cells that differentiate into various cell types without systemic spreading. Type 2 EMT is involved in tissue regeneration and fibrosis of different organs such as the kidney, liver, lung and intestine leading to the accumulation of connective tissue. Type 3 EMT associates with a gain in malignancy of carcinoma cells. Neoplastic epithelial cells induced to undergo EMT are frequently localized at the invasive front of the primary tumor and initiate the cascade of tumor cell dissemination by local cell invasion which is followed by the entry into the vasculature. 
Notably, EMT represents a transient and reversible process that can lead to a mesenchymal to epithelial transition (MET) upon metastatic colonization [5]. Cycles of EMT and MET are assumed to be involved in metastasis formation at distal sites [3]. Yet, the molecular basis for the changes in epithelial plasticity by EMT and MET is still an open issue and its role in cancer patients is a matter of debate. Signaling molecules and inducers of type 3 EMT confer resistance of cancer cells to apoptosis and oncogene-induced senescence as well as chemoresistance [6]. Recent findings indicate that EMT provides mesenchymal cells with stem cell features that enable carcinoma cells to generate metastases at secondary sites [3]. These cancer stem cells, also termed cancer initiating cells, share phenotypic and functional characteristics with migratory embryonic cells displaying a mesenchymal phenotype [6]. Profiling of the transcriptome using microarrays has been widely used to elucidate the expression patterns during EMT under different conditions, which revealed novel biomarkers and molecular mechanisms from single studies. A meta-analysis usually describes the combination of a large number of studies from different samples and tissues or the comparison of own data with published data [7], [8]. Recent progress in the establishment of gene expression datasets makes it possible to identify new markers and relevant mechanisms which were underestimated in single studies but emerge from a meta-analysis. By now, a plethora of gene expression studies (GES) covering a wide variety of cell types undergoing EMT together with various modes of induction are available. Yet to our knowledge, no meta-analysis dealing with these EMT studies has been performed so far. Changes in a biological system require a concerted alteration of gene expression sets. Bioinformatic enrichment analysis tools investigate gene expression sets for such changes. 
These tools examine the overrepresentation of gene sets in comparison to the whole genome, map an input list of genes to biological categories in online databases and statistically assess the overrepresentation of genes for each biological category or annotation such as Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and gene ontology (GO) terms [9]. The use of several single enrichment tools for the same input list and the consideration of only consistently enriched categories have been reported to be a very promising strategy [10], [11]. We gathered data from 18 published and independent GES of EMT and extracted gene lists of significantly up- and downregulated genes for cluster analysis. This approach revealed gene clusters according to treatment modalities rather than to cell type. We subsequently extracted an EMT-core list consisting of 130 genes with official gene symbols and names which was further investigated by enrichment analysis with several single enrichment tools. Notably, selected genes from the EMT-core list significantly correlated with impaired pathological complete response (pCR) in breast cancer patients. This analysis proposes that the EMT-core gene list is relevant for the recognition of the molecular mechanisms of EMT. In addition, the cluster analysis shows novel insights into the relationships of EMT processes across different cell types and induction modes.

Results

Data collection of gene expression studies (GES)

To assess the similarities between published GES and define a core gene list of human EMT, we analyzed 18 independent GES of EMT. These 18 independent and published GES consisted of 24 datasets in total (Table 1). Several authors reported EMT kinetics of different cell types or dose-dependent effects of EMT inducers within single studies. Nevertheless, only the particular testing point showing the strongest effect or EMT phenotype, as reported by the authors, was selected. Takahashi et al. published two related GES, of which one consisted of two datasets, resulting in three datasets of one independent study [12]. Taube et al. reported 5 datasets published within one GES with similar expression patterns and different modes of EMT induction [13]. Processed data (normalized and generally logarithmized data) were downloaded from the Gene Expression Omnibus (GEO) and ArrayExpress (AE) databases and annotated with BioConductor and NetAffx. Numerous GES available on GEO and AE were excluded because they did not provide processed data, did not contain replicates or had not been published. Due to the variety of microarray formats as well as different normalization and filtering methods used in the literature, we used processed instead of raw data in order to maintain the quality criteria applied by the authors during the data preprocessing. A two-tailed Student's t-test was used to compute p-values. Significantly up- and downregulated genes were selected to meet a fold change greater than 2 or lower than 0.5 and a p-value below 0.05.
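The selection rule stated above (fold change greater than 2 or lower than 0.5, two-tailed Student's t-test p-value below 0.05) can be illustrated with a minimal sketch. It is not the authors' workflow, which used MS-Excel; the function names, the pooled-variance form of the t-test and the numerical integration of the t density are assumptions made for illustration, and the inputs are assumed to be log2 intensities with at least two replicates per class.

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson integration of the density on [0, |t|]."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)

def classify(control, treated, fc_cut=2.0, p_cut=0.05):
    """Return 1 (up), -1 (down) or 0 for one gene; inputs are log2 intensities."""
    n1, n2 = len(control), len(treated)
    df = n1 + n2 - 2
    # Pooled variance of the classic equal-variance Student's t-test (assumption).
    pooled = ((n1 - 1) * variance(control) + (n2 - 1) * variance(treated)) / df
    diff = mean(treated) - mean(control)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    if se == 0:
        return 0  # no variance, so no test is possible
    p = two_sided_p(diff / se, df)
    fold_change = 2 ** diff  # back-transform the log2 difference
    if p < p_cut and fold_change > fc_cut:
        return 1
    if p < p_cut and fold_change < 1 / fc_cut:
        return -1
    return 0
```

With only two to six samples per class (Table 1), such a test has little power, which is one reason the fold-change threshold is applied in addition to the p-value.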
Table 1

Gene expression studies of EMT used for meta-analysis.

First author | Acc. | Ref. | Cell type | Cell origin | Treatment modality | Platform | Samples*
Ke | E-TABM-949 | [28] | EP156T/EPT2 | Prostate | high cell density (a) | Agil WHG 4×44K G4112F | 2
Andarawewa | GSE8240 | [61] | MCF10A | Breast | TGF-β+irradiation (a) | Affy HTU133A | 3
Takahashi (c) | GSE12548/GSE15205 | [12] | ARPE19 | Retinal pigment | TGF-β+TNF-α (a) / TGF-β or TNF-α (a) | Affy U133Plus2 | 3
Tay | GSE13759 | [62] | HCT116/E1 | Colon | serial transplantation (b) | Affy U133A | 3
Drake | GSE14405 | [63] | PC-3/TEM4-18 | Prostate | transendothelial migration (a) | Affy U133Plus2 | 2
Hwang | GSE14773 | [19] | CRC | Colon | spheroid formation (a) | Affy U133Plus2 | 2
Sartor | GSE17708 | [64] | A549 | Lung | TGF-β (a) | Affy U133Plus2 | 3
Papageorgis | GSE18070 | [65] | MCF10CA1h | Breast | H-Ras+carcinoma (b) | Affy U133Plus2 | 3
Hills | GSE20247 | [66] | HK2 | Kidney | TGF-β+Cpep (a) | Illum HWG-6 v3.0 | 3
Leshem | GSE22010 | [67] | PrEC-hTERT | Prostate | AR+T/ERG (a) | Affy HG 1.0 ST | 4
Micalizzi | GSE23655 | [20] | MCF7 | Breast | Six1 vector (a) | Affy HTU133A | 6
Maupin | GSE23952 | [68] | Panc-1 | Pancreas | TGF-β (a) | Affy U133Plus2 | 3
Taube (d) | GSE24202 | [13] | HMLE | Breast | TGF-β1; Snail1, Twist, Gsc vectors; siRNA against E-Cadherin (a) | Affy HTU133A | 3
Baniwal | GSE24261 | [21] | PCa C4-2B/Rx2dox | Prostate | Runx2 vector (a) | Illum HR-8 v3.0 | 4
van Zijl | GSE26391 | [26] | 3p/3sp | Liver | tumor cell recovery (b) | Affy HG 1.0 ST | 2
Ohashi | GSE27424 | [29] | EPC2-hTERT | Esophagus | Notch3 knock-down (shRNA) (a) | Affy U133Plus2 | 3
Hesling | GSE28448 | [69] | HMEC-TR | Breast | TGF-β+siRNA against TIFγ (a) | Affy U133Plus2 | 2
Wang | GSE28799 | [70] | OVCAR-3 | Ovary | spheroid formation (a) | Affy U133Plus2 | 3

*, lowest number of samples per class (control or test subject); a, in vitro; b, in vivo; c, consists of two studies with three datasets in total; d, consists of five datasets.

Abbreviations: Affy, Affymetrix; Agil, Agilent; AR, androgen receptor; Illum, Illumina; sh, small hairpin; si, small interfering; T/ERG, TMPRSS2/ERG; TGF, transforming growth factor; TNF, tumor necrosis factor.

GES cluster analysis

We generated a matrix of the unique gene symbols reported across the analyzed GES (n = 14,113). Significantly up- and downregulated genes of each GES were transferred into the matrix according to their type of regulation. Upregulated genes were labeled with 1, downregulated genes with −1 and not differentially regulated genes with 0 (Table S1). This data distribution consisted of 88.22% not differentially regulated genes and 11.78% up- or downregulated genes and is significantly different from a binomial distribution with those parameters (p<0.0001). In order to determine a cutoff for the number of GES sharing a particular gene used for cluster analysis, the binomial distribution function provided by R as well as the preliminary hierarchical clustering results of each cutoff option were analyzed (data not shown). From this we decided to investigate the clustering of genes shared between at least 10 datasets (n = 365; p<0.0001; Figure 1). In addition, this analysis showed clusters of GES according to the mode of EMT stimulus rather than to cell type (Figure 2A). Interestingly, a more stringent clustering of genes shared between at least 14 of the analyzed GES datasets provided similar clusters, despite the fact that this list contains only 41 genes (Figure 2B and Figure S1).
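The matrix construction and the shared-gene cutoff described above can be sketched as follows. This is a minimal illustration, not the authors' R code: the function names and the input layout are hypothetical, and the binomial tail is only one plausible reading of how the binomial distribution function was used, since the exact test is not given in the text.

```python
from math import comb

def build_matrix(calls_by_dataset):
    """calls_by_dataset: {dataset: {gene: 1 or -1}}.

    Returns the sorted dataset names and {gene: [call per dataset]},
    with 0 for genes that are not differentially regulated in a dataset.
    """
    datasets = sorted(calls_by_dataset)
    genes = sorted({g for calls in calls_by_dataset.values() for g in calls})
    matrix = {g: [calls_by_dataset[d].get(g, 0) for d in datasets] for g in genes}
    return datasets, matrix

def shared_genes(matrix, min_datasets):
    """Genes regulated (in either direction) in at least min_datasets datasets."""
    return [g for g, row in matrix.items()
            if sum(1 for v in row if v != 0) >= min_datasets]

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

For example, `binomial_tail(10, 24, 0.1178)` gives the probability that a gene would be called regulated in at least 10 of 24 datasets if every call were an independent event with the overall rate of 11.78%. Whether this matches the p-values reported above depends on details the text does not state.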
Figure 1

Cluster analysis of genes shared between at least 10 GES datasets shows distinguishable and significant clusters.

Genes shared between at least 10 out of 24 datasets were used for Manhattan hierarchical clustering. The type of regulation within a particular study was visualized via heatmap. Columns: genes shared between at least 10 datasets (n = 365); rows: analyzed GES (24 datasets in total); green: downregulated genes; red: upregulated genes; black: genes not regulated. GSE: Gene expression omnibus (GEO) series record; E.TABM: ArrayExpress (AE) series record; TGF, transforming growth factor; TNF, tumor necrosis factor.

Figure 2

Gene expression studies cluster according to the mode of EMT initiation rather than to cell type.

The cell type and treatment modality of EMT was annotated and revealed clustering according to the mode of EMT induction. The clustering persisted when genes shared between at least 14 GES datasets were used for the analysis. (A) Hierarchical clustering of 365 genes shared between at least 10 datasets. (B) Hierarchical clustering of 41 genes shared between at least 14 datasets. The legend indicates cell type and treatment modality (right panel). *, Transcription factor vectors: Runx2, Six1, Snail, Twist and Goosecoid. GSE: Gene expression omnibus (GEO) series record; E.TABM: ArrayExpress (AE) series record; TGF, transforming growth factor; TNF, tumor necrosis factor.
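The Manhattan hierarchical clustering named in the Figure 1 legend can be sketched in plain Python. The linkage rule is not stated in the text, so average linkage is an assumption here, as are the function names and the dictionary input; a real analysis would more likely rely on an established clustering library.

```python
def manhattan(a, b):
    """Manhattan (city-block) distance between two equally long call vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def hierarchical_clusters(profiles, n_clusters):
    """Agglomerative clustering with Manhattan distance and average linkage.

    profiles: {dataset name: list of -1/0/1 calls}; returns a list of name sets.
    """
    clusters = [{name} for name in sorted(profiles)]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(manhattan(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters
```

On the −1/0/1 coding, the Manhattan distance counts a gene regulated in opposite directions as twice as far apart as a gene regulated in only one of the two datasets.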

Generation of the EMT-core gene list

Based on the cluster analysis of the GES, we aimed to define a meaningful EMT-core gene list which describes the majority of the involved genes across the analyzed GES. The cluster analysis of the genes shared between at least 10 datasets contained 365 genes (Table S2). However, it does not show whether a gene is up- or downregulated across different GES. Therefore, the list was filtered to keep only genes which were either up- or downregulated in at least 10 of the GES datasets. The resulting list contained 130 genes of which 67 are up- and 63 are downregulated (Table 2 and Table S3). This selection of genes could be further classified into five categories ((i) cell adhesion and migration, (ii) development, cell differentiation and proliferation, (iii) angiogenesis and wound healing, (iv) metabolism, (v) others or unclassified) according to single enrichment analysis as described below. Several genes were also present in more than one of those categories (Table S3). In conclusion, this resulting EMT-core gene list contains 130 genes which were derived from a multitude of cell types and EMT initiation methods.
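The filtering step described above, which keeps only genes regulated in the same direction in at least 10 datasets, can be sketched as follows. The function name and the matrix layout (one list of 1, −1 or 0 calls per gene) are assumptions for illustration.

```python
def core_genes(matrix, min_datasets=10):
    """Split genes into consistently up- and downregulated lists.

    matrix: {gene: list of 1 / -1 / 0 calls, one per dataset}.
    """
    up = sorted(g for g, row in matrix.items() if row.count(1) >= min_datasets)
    down = sorted(g for g, row in matrix.items() if row.count(-1) >= min_datasets)
    return up, down
```

A gene that is regulated in 10 datasets but in mixed directions passes the shared-gene cutoff and fails this one, which is how a list of 365 shared genes can shrink to 130 core genes.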
Table 2

EMT-core list of 130 genes shared between at least 10 GES datasets.

Cell adhesion and migration
Upregulated: ADAM12, CDH11, CDH2, COL1A1, COL3A1, COL5A1, COL6A1, COL6A3, CTGF, CYP1B1, DLC1, FBLN1, FBLN5, FGF2, FGFR1, FN1, HAS2, LUM, MMP2, MYL9, NID2, NR2F1, NRP1, PLAT, PPAP2B, PRKCA, RECK, SERPINE1, SERPINE2, SPOCK1, TGM2, TNFAIP6, TPM1, VCAN, WNT5A
Downregulated: CD24, CDH1, CXADR, CXCL16, DSG3, ELF3, EPCAM, EPHA, JUP, MPZL2, OVOL2, PLXNB1, S100P, SLC7A5, SYK

Development, cell differentiation and proliferation
Upregulated: CDKN2C, EMP3, FBN1, IGFBP3, IL1R1, LTBP1, MME, PMP22, PTGER2, PTX3, SRGN, SULF1, SYNE1, TAGLN, TUBA1A, VIM, ZEB1
Downregulated: ABLIM1, ADRB2, ALDH1A3, ANK3, BIK, CA2, CTSL2, FGFR2, FGFR3, FST, GJB3, IFI30, IL18, KLK7, KRT15, KRT17, LSR, MAP7, MBP, OCLN, PKP2, PPL, PRSS8, RAPGEF5, SPINT1

Angiogenesis and wound healing
Upregulated: DCN, LOX, TFPI
Downregulated: no gene with a major classification*

Metabolism
Upregulated: ABCA1, GALNT10, SLC22A4
Downregulated: GPX3, SLC27A2, SMPDL3B, SORL1, ST6GALNAC2

Others or unclassified
Upregulated: C5orf13, CDK14, EML1, FSTL1, LTBP2, MAP1B, RGS4, SYT11, TMEM158
Downregulated: AGR2, C10orf10, CDS1, FAM169A, FXYD3, KLK10, LAD1, MTUS1, PLS1, PRRG4, RHOD, SERPINB1, SLPI, TMEM30B, TPD52L1, TSPAN1, ZHX2, ZNF165

Categories have been chosen according to the GO classifications of the enrichment tools. Genes may be present in more than one category.

*, see Table S3 for more information.

Consistently enriched KEGG pathway and GO term analysis of the EMT-core gene list

To further analyze the EMT-core list consisting of 130 genes, a rigorous single enrichment analysis combined with stringent selection criteria was performed. First, an enriched KEGG pathway or GO term had to contain at least 5 genes from the input list and a p-value below 0.05 to be considered significant. An enumeration of significantly enriched terms and pathways is shown in Table 3. Second, a significantly enriched KEGG pathway or GO term had to be observed in at least 4 out of 5 used bioinformatic tools. Third, a consistently enriched KEGG pathway or GO term had to be identified in both the EMT-core gene list and the 365 gene list. Using these criteria, we obtained 6 KEGG pathways, 20 GO biological processes and 15 GO molecular functions consistently enriched in both lists (Table 4). The KEGG pathways consisted of the MAPK signaling pathway, axon guidance, focal adhesion, ECM-receptor interaction, regulation of actin cytoskeleton and pathways in cancer. The GO biological processes could be grouped into processes involved in tissue development, wound healing, cell migration or cell proliferation. The GO molecular functions consisted of ECM and cytoskeleton constituents, peptidase inhibitors and the binding of collagen, growth factors, heparin and integrin. As expected, the list with 365 genes comprised all significantly enriched pathways and GO terms from the 130 genes EMT-core list except for 2 GO biological processes (ECM organization and lung development). Several more KEGG pathways, GO biological processes and molecular functions could be identified in the list with 365 genes (Table 3 and 4). All these pathways, biological processes and molecular functions are well known to be involved in EMT [5], [14]–[16], and thus confirm the integrity of our EMT-core gene list. 
In addition, both the EMT-core list and the list with 365 genes display comparable enrichment ratios of KEGG pathways and GO biological processes (Figure 3) as well as GO molecular functions (Figure S2). Therefore, the list containing 365 genes may be considered as an amelioration of the EMT-core list by containing additional genes that might have an ambiguous role in EMT. In summary, our EMT-core list of 130 genes and its amelioration containing 365 genes show strong enrichment of EMT-relevant processes.
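The three selection criteria described above (at least 5 input genes and p below 0.05 per tool, at least 4 of 5 tools, enrichment in both gene lists) can be sketched as a simple filter. The data layout and function names are hypothetical; each enrichment tool reports results in its own format, so a real pipeline would first need to normalize term identifiers across tools.

```python
def significant(term_hits, min_genes=5, alpha=0.05):
    """term_hits: {term: (n_genes_from_input, p_value)} as reported by one tool."""
    return {t for t, (n, p) in term_hits.items() if n >= min_genes and p < alpha}

def consistently_enriched(results_by_list, min_tools=4):
    """results_by_list: {gene list name: {tool name: term_hits}}.

    Returns the terms that are significant in at least min_tools tools
    for every gene list supplied.
    """
    per_list = []
    for tools in results_by_list.values():
        counts = {}
        for term_hits in tools.values():
            for term in significant(term_hits):
                counts[term] = counts.get(term, 0) + 1
        per_list.append({t for t, c in counts.items() if c >= min_tools})
    return set.intersection(*per_list) if per_list else set()
```

Requiring agreement across tools trades sensitivity for robustness: a term that only one tool reports is dropped, regardless of how small its p-value is.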
Table 3

Number of enriched terms and pathways in all lists detected by the enrichment tools.

Tool | 130 gene list (BP, MF, KEGG) | 365 gene list (BP, MF, KEGG) | GSE13195 core list (BP, MF, KEGG) | GSE24202 core list (BP, MF, KEGG)
ConsensusPathDB 305319558613162106247348
FatiGO 17828945272360021722810
GeneCodis 34168155454659174240487
ToppFun 241211610455001127140
WebGestalt 4028640403754440308

The numbers of enriched terms and pathways found by the particular enrichment tools are displayed. BP, GO biological process; MF, GO molecular function; KEGG, KEGG pathway. GSE13195 core list of Choi et al., GSE24202 core list of Taube et al. [13], [39].

Table 4

Consistently enriched GO terms and KEGG pathways and their occurrence in the analyzed gene lists.

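Term ID | Category | Term size* | 130 gene list: Tools | Genes | 365 gene list: Tools | Genes | GSE13915 core list: Tools | Genes | GSE24202 core list: Tools | Genes

GO biological process
GO:0048646 | anatomical structure formation involved in morphogenesis | 390 | 4 | 24 | 4 | 62 | 0 | - | 4 | 22
GO:0001525 | angiogenesis | 189 | 4 | 16 | 4 | 38 | 0 | - | 4 | 14
GO:0007596 | blood coagulation | 182 | 4 | 13 | 4 | 29 | 0 | - | 3 | 13
GO:0001568 | blood vessel development | 288 | 4 | 25 | 5 | 54 | 0 | - | 5 | 20
GO:0007155 | cell adhesion | 953 | 5 | 36 | 5 | 76 | 2 | 19 | 5 | 41
GO:0016049 | cell growth | 226 | 4 | 13 | 4 | 34 | 0 | - | 4 | 14
GO:0016477 | cell migration | 405 | 5 | 32 | 5 | 67 | 1 | 13 | 5 | 35
GO:0048870 | cell motility | 484 | 4 | 33 | 4 | 69 | 1 | 13 | 5 | 35
GO:0006928 | cellular component movement | 666 | 4 | 36 | 5 | 73 | 1 | 16 | 5 | 41
GO:0009790 | embryo development | 619 | 4 | 18 | 4 | 46 | 0 | - | 3 | 20
GO:0008544 | epidermis development | 218 | 5 | 16 | 4 | 32 | 2 | 6 | 5 | 26
GO:0007507 | heart development | 230 | 5 | 15 | 4 | 28 | 1 | 6 | 3 | 10
GO:0009887 | organ morphogenesis | 800 | 5 | 21 | 5 | 54 | 0 | - | 5 | 34
GO:0042127 | regulation of cell proliferation | 823 | 4 | 28 | 5 | 81 | 1 | 14 | 5 | 37
GO:0050793 | regulation of developmental process | 1005 | 4 | 34 | 4 | 88 | 0 | - | 4 | 32
GO:0009611 | response to wounding | 776 | 5 | 31 | 5 | 85 | 0 | - | 4 | 34
GO:0001501 | skeletal system development | 394 | 4 | 14 | 4 | 35 | 0 | - | 5 | 20
GO:0009888 | tissue development | 808 | 4 | 38 | 4 | 93 | 1 | 12 | 5 | 52
GO:0001944 | vasculature development | 294 | 4 | 25 | 4 | 56 | 0 | - | 5 | 20
GO:0042060 | wound healing | 270 | 4 | 20 | 5 | 50 | 0 | - | 3 | 19

GO molecular function
GO:0005509 | calcium ion binding | 1033 | 4 | 22 | 4 | 55 | 0 | - | 4 | 34
GO:0030246 | carbohydrate binding | 380 | 4 | 15 | 4 | 29 | 1 | 7 | 4 | 14
GO:0005518 | collagen binding | 40 | 4 | 5 | 5 | 12 | 0 | - | 0 | -
GO:0004866 | endopeptidase inhibitor activity | 179 | 4 | 9 | 4 | 19 | 0 | - | 4 | 9
GO:0004857 | enzyme inhibitor activity | 327 | 4 | 10 | 4 | 26 | 2 | 8 | 4 | 13
GO:0005201 | ECM constituent | 105 | 5 | 7 | 5 | 12 | 0 | - | 4 | 7
GO:0005539 | glycosaminoglycan binding | 146 | 4 | 13 | 5 | 24 | 0 | - | 5 | 10
GO:0019838 | growth factor binding | 127 | 4 | 13 | 4 | 26 | 0 | - | 5 | 14
GO:0008201 | heparin binding | 108 | 4 | 9 | 4 | 17 | 0 | - | 3 | 7
GO:0005178 | integrin binding | 57 | 4 | 6 | 5 | 9 | 0 | - | 4 | 7
GO:0030414 | peptidase inhibitor activity | 192 | 5 | 9 | 5 | 20 | 0 | - | 4 | 9
GO:0030247 | polysaccharide binding | 165 | 4 | 14 | 4 | 27 | 0 | - | 5 | 13
GO:0032403 | protein complex binding | 199 | 4 | 11 | 4 | 20 | 0 | - | 2 | 8
GO:0004867 | serine-type endopeptidase inhibitor activity | 118 | 4 | 9 | 4 | 14 | 0 | - | 3 | 7
GO:0005200 | structural constituent of cytoskeleton | 92 | 5 | 8 | 5 | 10 | 0 | - | 5 | 13

KEGG pathway
map04360 | axon guidance | 126 | 4 | 6 | 4 | 11 | 0 | - | 4 | 6
map04512 | ECM-receptor interaction | 92 | 5 | 7 | 5 | 18 | 0 | - | 1 | 5
map04510 | focal adhesion | 207 | 4 | 9 | 5 | 23 | 0 | - | 3 | 8
map04010 | MAPK signaling pathway | 289 | 3 | 7 | 4 | 15 | 0 | - | 0 | -
map05200 | pathways in cancer | 329 | 4 | 11 | 5 | 28 | 1 | 5 | 2 | 8
map04810 | regulation of actin cytoskeleton | 209 | 4 | 7 | 4 | 16 | 0 | - | 2 | 6

*, according to FatiGO category size in genome.

The maximum number of genes from the category present in the input list is displayed. ID, identity; GO, gene ontology; KEGG, Kyoto encyclopedia of genes and genomes. GSE13195 core list of Choi et al., GSE24202 core list of Taube et al. [13], [39].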

Figure 3

The 130 genes EMT-core list and the 365 genes list exhibit comparable enrichment ratios of GO biological processes and KEGG pathways.

The enrichment ratio is the number of observed genes divided by the number of expected genes for a given term or pathway. Enrichment ratios were obtained from WebGestalt or calculated with data from FatiGO. GO, gene ontology; BP, biological process; KEGG, Kyoto encyclopedia of genes and genomes.
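The enrichment ratio defined in the legend can be written out as a short function. How the tools derive the expected count is not spelled out in the text; the sketch below assumes the usual proportional expectation (list size times term size divided by genome size), and the function and parameter names are hypothetical.

```python
def enrichment_ratio(observed, list_size, term_size, genome_size):
    """Observed divided by expected number of list genes annotated to a term.

    expected = list_size * term_size / genome_size (assumed proportional model).
    """
    expected = list_size * term_size / genome_size
    return observed / expected
```

Because the ratio is normalized by list size, it allows the 130 gene list and the 365 gene list to be compared directly even though their raw gene counts per term differ.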

Clinical relevance of the EMT-core gene list

The EMT-core gene list contains several genes with as yet unidentified roles in cancer progression and/or EMT. We aimed to investigate the clinical relevance of this selection of genes. Therefore, we correlated their expression with overall survival of patients suffering from squamous cell lung carcinomas (SCC) [17] and pathological complete response (pCR) of breast cancer patients [18]. Among the downregulated genes of the EMT-core gene list, low FXYD3 expression showed a trend toward poor overall survival of SCC patients (p = 0.17), and low expression of LAD1 (p = 0.00074), SLC7A5 (p = 0.0093) and SLPI (p = 0.043) significantly correlated with worse pCR of breast cancer patients. Among the upregulated genes of the EMT-core gene list, high PTX3 expression showed a trend toward poor overall survival of SCC patients (p = 0.16), and high expression of NID2 (p = 0.0091), SPOCK1 (p = 0.038) and SULF1 (p = 0.00029) significantly correlated with impaired pCR of breast cancer patients. These correlations indicate that the comparison of different data sets is a powerful tool to identify novel relevant target genes that do not emerge from single studies.

Discussion

Over the past decade a considerable number of GES that deal with EMT have accumulated in the literature. These cover a variety of cell types which display EMT and include different modes of EMT induction. So far, these resources have only been partially used to compare single findings with those in the literature [8], [19], [20]. To our knowledge, no attempt has been made to investigate the majority of the independent GES of EMT for their relations to each other. Although we are aware that gene expression data of EMT are not complete, we analyzed the currently available GES to generate an EMT-core list of genes altered most frequently during the EMT process, as depicted in the flow chart (Figure S3). Cluster analysis of genes shared between at least 10 GES datasets revealed clusters of GES with the same or a similar treatment type. The GES in which EMT was induced by TNF-α either alone or in combination with TGF-β, by TGF-β alone or by different transcription factors consistently grouped together. These clusters persisted when genes shared between at least 14 datasets were used for cluster analysis. A clear clustering of different types of EMT induction, however, would only have been possible if an adequate number of GES on each of these EMT initiation methods existed. Since several treatment modalities are only represented once in the literature, such GES cluster to their most related treatment type.
One cluster predominantly emerged from GES of TGF-β-induced EMT which consisted of 13 datasets. Interestingly, the cluster includes the exogenous expression of Six1 (Micalizzi et al.; GSE23655; [20]), which has been shown to enhance tumor-promoting TGF-β signaling, and Runx2 (Baniwal et al.; GSE24261; [21]), which acts downstream of TGF-β signaling [22]–[25]. Hence, this supports the clustering of these studies together with others using TGF-β as EMT initiator. The study by van Zijl et al. (GSE26391; [26]) described the analysis of epithelial and mesenchymal hepatocellular carcinoma cells derived from the same tumor patient. The clustering of this study along with other studies with TGF-β-induced EMT suggests an involvement of TGF-β signaling during the establishment of the mesenchymal cell line.
The cluster of GES with TNF-α as EMT inducer contained the study by Takahashi et al., which analyzed the ARPE19 cell line treated with either TNF-α alone (GSE15205_TNFa), TNF-α together with TGF-β (GSE12548) or TGF-β alone (GSE15205_TGFb) in order to induce EMT [12]. The two datasets with TNF-α treatment formed a consistent cluster. However, the third dataset, which was obtained from the exclusive treatment with TGF-β, clustered with other GES describing EMT initiation by TGF-β. Hence, these data suggest a stronger impact of the EMT stimulus than of the cell type on the clustering.
One cluster consisted mainly of the datasets from Taube et al. (GSE24202; [13]), who reported the induction of EMT in HMLE cells using overexpression of Twist, Snail, Goosecoid and TGF-β as well as the knockdown of E-cadherin. Consistent with the data reported by Taube et al., the datasets from Snail- and Twist-induced EMT were the most similar within this cluster. This finding is concordant with the fact that Twist is a direct target of Snail [27]. The high number of datasets in this study might lead to an overrepresentation within the cluster analysis. Furthermore, the use of the same cell line as well as transcription factors with similar targets such as Twist and Snail might lead to a high level of similarity within the datasets of this particular study. The cluster comprising Ke et al. (E-TABM-949; [28]), who utilized high cell density culturing of EPT2 cells, and Ohashi et al. (GSE27424; [29]), who described a NOTCH3 knock-down in EPC2 cells, displays a low level of relation to other clusters due to the unique types of EMT induction. On the one hand, it appears likely that these GES form a cluster due to the lack of relationship to the other clusters. On the other hand, it might also suggest a relation between their types of EMT initiation.
We found a variety of well-known markers of EMT upregulated in our EMT-core gene list such as CDH2, CDH11, COL1A1, COL3A1, FBLN5, FN1, HAS2, LOX, MMP2, PLAT, SERPINE1, VIM, WNT5A and ZEB1 [15], [30], [31]. Furthermore, we detected downregulated genes reported to be reduced in EMT such as ANK3, CDH1, CXADR, PRSS8 and SYK [15], [32]–[34], several downregulated epithelial cell markers such as EPCAM, JUP, KRT15, KRT17, OCLN, PKP2 and PPL [5], [15] and a number of downregulated tumor suppressors such as KLK10, MTUS1, OAS1 and SERPINB1 [35]–[38]. Together, these genes provide a solid verification of our EMT-core gene list. Besides those genes confirming the integrity of our gene list, however, genes with unknown functions as well as an unknown or unclear relation to cancer and/or EMT emerged which are novel candidates for further investigation. Upregulated genes include MAP1B, NID2, PTX3, SPOCK1, SULF1, TAGLN and TMEM158 while downregulated genes comprise ABLIM1, LAD1, FAM169A, FXYD3, SLC7A5, SLPI, TMEM30B and TPD52L1.
Two meta-analyses of EMT in breast cancer considering different cell lines or types of EMT induction have been reported. These have identified EMT-core gene lists with 200 and 251 genes [13], [39], which, however, overlap by only approximately 10%. Our EMT-core list containing 130 genes shows a poor overlap of 7% with the list of Choi et al. [39] but an overlap of 55% with Taube et al. [13]. Both the lists by Choi et al. and Taube et al. contain unmapped identifiers (IDs) such as array IDs, expressed sequence tags and locus IDs. We used consistently enriched pathway analysis to further investigate these gene lists. Notably, our EMT-core list displayed more enriched KEGG pathways and GO terms than the gene lists of Choi et al. and Taube et al. (Table 3 and 4). Upon reducing the stringency of analysis to two genes within an enriched category, the enrichment for the list of Choi et al. did not improve whereas nearly all KEGG pathways and GO terms enriched in our EMT-core list could be observed in the list of Taube et al. (data not shown, Table 4).
The EMT-core list contains several genes with unknown functions and relations to cancer and/or EMT. We were able to show that FXYD3 and PTX3 expression showed a trend toward association with poor overall survival in SCC patients and that LAD1, SLC7A5, SLPI, NID2, SPOCK1 and SULF1 correlated significantly with impaired pCR in breast cancer patients. FXYD3 has been shown to be involved in tumor cell proliferation and to be downregulated by TGF-β signaling [40], [41]. PTX3 has been reported to be a lung cancer biomarker [42]. NID2 has been shown to be elevated during phorbol 12-myristate 13-acetate-induced invasion of several human tumor cell lines and has been described as a potential tumor biomarker [43], [44]. SPOCK1 has been reported to be involved in neuronal attachment and matrix metalloproteinase activation [45], [46]. SULF1 has been shown to be a potential biomarker for gastric cancer which can be induced by TGF-β1 [47], [48]. LAD1 is an adaptor protein involved in ERK5 and JNK pathways [49]. SLPI has been reported to act in an anti-tumorigenic manner in certain tumors and to promote migration and invasion in others [50]–[52]. Hence, these genes seem to be promising candidates for further investigation. Taken together, we propose that the EMT-core list of 130 genes is highly relevant for EMT and that the cluster analysis represents a useful overview of the relationships of currently available GES of EMT.

Materials and Methods

Data collection and annotation

Processed microarray data were downloaded from the websites of GEO (available: http://www.ncbi.nlm.nih.gov/geo/) and AE (available: http://www.ebi.ac.uk/arrayexpress/) using “EMT” as the keyword for GES published until February 2012. The downloaded GES were annotated to retrieve official gene symbols, EntrezID and gene names using BioConductor 2.9 (available: http://www.bioconductor.org/; accessed: 2012 Jan 02) [53] and the online tool NetAffx (available: http://www.affymetrix.com/analysis/index.affx; accessed: 2012 June 25). BioConductor was used within the R environment [54]. Annotated data were imported into MS-Excel 2010 and log2-transformed. Subsequently, fold changes and p-values (two-sided Student's t-test) were calculated. Genes were considered significantly regulated when showing a fold change greater than 2 (upregulated) or below 0.5 (downregulated) together with a p-value below 0.05, and up- and downregulated genes were kept in separate lists. Upregulated genes were ordered from highest to lowest fold change. Conversely, downregulated genes were arranged from lowest to highest fold change. Duplicates were removed afterwards. Gene symbols were used for further analysis and are referred to as genes.
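The selection step described above can be illustrated with a minimal Python sketch. It is not the authors' workflow (which used MS-Excel); it assumes that linear-scale fold changes and t-test p-values have already been computed per probe, and the names `Probe` and `select_regulated` as well as the example values are hypothetical.

```python
from typing import NamedTuple


class Probe(NamedTuple):
    gene: str           # official gene symbol
    fold_change: float  # linear-scale ratio, treated vs. control (assumed precomputed)
    p_value: float      # two-sided Student's t-test p-value (assumed precomputed)


def _unique_genes(ordered):
    """Keep the first occurrence of each gene symbol, preserving order."""
    seen, genes = set(), []
    for probe in ordered:
        if probe.gene not in seen:
            seen.add(probe.gene)
            genes.append(probe.gene)
    return genes


def select_regulated(probes, fc_cutoff=2.0, p_cutoff=0.05):
    """Return (upregulated, downregulated) gene lists.

    Up: fold change > fc_cutoff, ordered from highest to lowest fold change.
    Down: fold change < 1 / fc_cutoff, ordered from lowest to highest.
    Both require p < p_cutoff; duplicates are removed after ordering.
    """
    significant = [p for p in probes if p.p_value < p_cutoff]
    up = sorted((p for p in significant if p.fold_change > fc_cutoff),
                key=lambda p: p.fold_change, reverse=True)
    down = sorted((p for p in significant if p.fold_change < 1.0 / fc_cutoff),
                  key=lambda p: p.fold_change)
    return _unique_genes(up), _unique_genes(down)


# Toy input (invented values, for illustration only).
example = [
    Probe("VIM", 8.0, 0.001), Probe("FN1", 4.5, 0.01), Probe("VIM", 3.0, 0.02),
    Probe("CDH1", 0.1, 0.001), Probe("EPCAM", 0.3, 0.03),
    Probe("ACTB", 1.1, 0.6), Probe("ZEB1", 5.0, 0.2),
]
up_genes, down_genes = select_regulated(example)
print(up_genes, down_genes)
```

In this toy input, the second VIM probe is dropped as a duplicate and ZEB1 is excluded because its p-value does not pass the cutoff, despite the high fold change.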

Cluster analysis

The up- and downregulated genes from each study were pooled and ordered, and duplicates were removed to obtain a list of all uniquely reported genes across all studies. Upregulated genes were labeled with 1 and downregulated genes were labeled with −1. Genes that were not significantly deregulated within a GES and genes which were found to be both up- and downregulated within a study were labeled with 0. The distribution of the observed number of up- and downregulated genes was tested against a binomial distribution with parameter p = 11.78% by means of a chi-squared test. We calculated the probability of reaching each cutoff option for cluster analysis (>1, >2, >3, and so forth) by chance with the binomial distribution function provided by R (probability = 11.78%). These chance probabilities were compared to preliminary cluster analyses of each cutoff option in order to determine a suitable cutoff. The clustering was performed in BioConductor 2.9 embedded in R 2.14.1 (64 bit) with the packages gdata [55], gplots [56] and heatmap.plus [57] using hierarchical heatmap clustering with the Manhattan distance function.
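As a rough illustration of the labeling scheme, the distance measure and the chance calculation described above, the following Python sketch builds the 1/0/−1 matrix, computes Manhattan distances between datasets and evaluates a binomial tail probability. The original analysis was carried out in R; the function names, the toy studies and the dataset count `n = 20` are assumptions made for this example only, and the hierarchical clustering itself is not reproduced.

```python
from itertools import combinations
from math import comb


def label_matrix(studies):
    """studies: {dataset: (set of up genes, set of down genes)}.

    Returns the sorted gene list and, per dataset, a vector of labels:
    1 = up, -1 = down, 0 = not regulated or reported as both up and down.
    """
    genes = sorted(set().union(*(up | down for up, down in studies.values())))
    matrix = {}
    for name, (up, down) in studies.items():
        column = []
        for gene in genes:
            if gene in up and gene in down:
                column.append(0)      # contradictory within one study
            elif gene in up:
                column.append(1)
            elif gene in down:
                column.append(-1)
            else:
                column.append(0)      # not significantly deregulated
        matrix[name] = column
    return genes, matrix


def manhattan(a, b):
    """Manhattan (city-block) distance between two label vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def prob_more_than(k, n, p):
    """P(X > k) for X ~ Binomial(n, p): the chance that a gene is
    regulated in more than k of n datasets if regulation were random."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1, n + 1))


# Toy studies (invented for illustration).
toy = {
    "A_TGFb": ({"VIM", "FN1"}, {"CDH1"}),
    "B_TGFb": ({"VIM", "FN1", "ZEB1"}, {"CDH1"}),
    "C_TNFa": ({"VIM", "CDH1"}, {"CDH1", "EPCAM"}),
}
genes, matrix = label_matrix(toy)
for first, second in combinations(matrix, 2):
    print(first, second, manhattan(matrix[first], matrix[second]))

# Chance probabilities for cutoffs >1, >2, >3 with p = 11.78%
# (n = 20 is a placeholder, not the number of datasets in the study).
for cutoff in (1, 2, 3):
    print(cutoff, prob_more_than(cutoff, n=20, p=0.1178))
```

The pairwise distances could then be passed to any agglomerative clustering routine; the smaller the distance, the earlier two datasets would be merged.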

Consistent enrichment of KEGG pathways and GO terms

The gene lists were analyzed using five different bioinformatic enrichment tools. A comprehensive overview of the tools used and their characteristics is shown in Table S4. The tools FatiGO and GeneCodis were used on the Babelomics 4 platform [58], which provided access to both programs at once. The selection criteria for significantly enriched pathways were a p-value or FDR below 0.05 and a minimum of 5 genes of the input list within an enriched category. Furthermore, consistently enriched GO terms and KEGG pathways were required to be identified by at least 4 of the 5 programs in both the EMT-core gene list and the 365 gene list. Enrichment ratios (number of observed genes divided by the number of expected genes for a GO or KEGG category) were obtained from WebGestalt or, alternatively, were calculated as described by Zhang et al. with the data from FatiGO [59].
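The two criteria above, the enrichment ratio and the "at least 4 of 5 programs" rule, can be sketched in Python as follows. The text defines the ratio only as observed divided by expected genes; the formula used here for the expected count (category size × input list size / reference size) is an assumption, and the tool names and result values are placeholders rather than output of the actual programs.

```python
def enrichment_ratio(observed, list_size, category_size, reference_size):
    """Observed genes in a category divided by the number expected by chance.

    Assumption: expected = category_size * list_size / reference_size.
    """
    expected = category_size * list_size / reference_size
    return observed / expected


def consistently_enriched(results, min_tools=4, alpha=0.05, min_genes=5):
    """results: {tool: {term: (p_or_fdr, genes_in_category)}}.

    A term counts for a tool if p (or FDR) < alpha and at least min_genes
    input genes fall into the category; it is reported if this holds in
    at least min_tools tools.
    """
    counts = {}
    for terms in results.values():
        for term, (p_value, n_genes) in terms.items():
            if p_value < alpha and n_genes >= min_genes:
                counts[term] = counts.get(term, 0) + 1
    return sorted(term for term, count in counts.items() if count >= min_tools)


# Placeholder results for five hypothetical tools (invented values).
toy_results = {
    "tool_A": {"focal adhesion": (0.01, 8), "cell adhesion": (0.001, 12),
               "ECM-receptor interaction": (0.02, 6)},
    "tool_B": {"focal adhesion": (0.02, 8), "cell adhesion": (0.002, 11),
               "ECM-receptor interaction": (0.03, 6)},
    "tool_C": {"focal adhesion": (0.03, 7), "cell adhesion": (0.004, 12),
               "ECM-receptor interaction": (0.01, 5)},
    "tool_D": {"focal adhesion": (0.04, 8), "cell adhesion": (0.010, 10),
               "ECM-receptor interaction": (0.01, 3)},
    "tool_E": {"focal adhesion": (0.20, 8), "cell adhesion": (0.020, 12),
               "ECM-receptor interaction": (0.02, 3)},
}
print(consistently_enriched(toy_results))
print(enrichment_ratio(observed=10, list_size=130,
                       category_size=200, reference_size=20000))
```

In the toy data, "ECM-receptor interaction" is significant in all five tools but reaches the 5-gene minimum in only three, so it does not count as consistently enriched.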

Correlation of the EMT-core list with clinical data

Microarray and clinical data for patients with squamous cell lung carcinomas (n = 130) reported by Raponi et al. [17] with the accession GDS2373 were downloaded from GEO. Microarray and clinical data for breast cancer patients (n = 133) reported by Hess et al. [18] were downloaded from the MD Anderson Cancer Center website (available: http://bioinformatics.mdanderson.org/pubdata.html; accessed: 2012 Sep 07). Patients were divided into high and low expressing groups for selected genes within the EMT-core list. The p-values were computed using two-sided Student's t-test. Survival analysis for the data by Raponi et al. was performed with the chi-squared test of equality using the survival package in R [60]. P-values below 0.05 were considered significant.

Supporting Information

Cluster analysis of genes shared between at least 14 GES datasets shows persistent and distinct clusters. (PDF)

The 130-gene EMT-core list and the 365-gene list exhibit comparable enrichment ratios of GO molecular functions. (PDF)

Flow chart depicting the generation of the EMT-core gene list. (PDF)

Matrix containing significantly up- and downregulated genes across the analyzed GES datasets. (XLS)

List of 365 genes significantly regulated in at least 10 GES datasets. (DOC)

EMT-core gene list of 130 up- or downregulated genes shared between at least 10 GES datasets. (DOC)

Enrichment tools used in this study and their properties. (DOC)
  65 in total

1.  Identification of a new tumor suppressor gene located at chromosome 8p21.3-22.

Authors:  Stefan Seibold; Claudia Rudroff; Manfred Weber; Jan Galle; Christoph Wanner; Martin Marx
Journal:  FASEB J       Date:  2003-04-08       Impact factor: 5.191

2.  Suppression of membrane-type 1 matrix metalloproteinase (MMP)-mediated MMP-2 activation and tumor invasion by testican 3 and its splicing variant gene product, N-Tes.

Authors:  M Nakada; A Yamada; T Takino; H Miyamori; T Takahashi; J Yamashita; H Sato
Journal:  Cancer Res       Date:  2001-12-15       Impact factor: 12.701

3.  Expression analysis and clinical evaluation of kallikrein-related peptidase 10 (KLK10) in colorectal cancer.

Authors:  Maroulio Talieri; Dimitra K Alexopoulou; Andreas Scorilas; Dimitris Kypraios; Niki Arnogiannaki; Marina Devetzi; Matina Patsavela; Dimitris Xynopoulos
Journal:  Tumour Biol       Date:  2011-04-12

4.  SNAIL regulates interleukin-8 expression, stem cell-like activity, and tumorigenicity of human colorectal carcinoma cells.

Authors:  Wei-Lun Hwang; Muh-Hwa Yang; Ming-Long Tsai; Hsin-Yi Lan; Shu-Han Su; Shih-Ching Chang; Hao-Wei Teng; Shung-Haur Yang; Yuan-Tzu Lan; Shih-Hwa Chiou; Hsei-Wei Wang
Journal:  Gastroenterology       Date:  2011-04-16       Impact factor: 22.682

5.  Secretory leukocyte protease inhibitor is associated with MMP-2 and MMP-9 to promote migration and invasion in SNU638 gastric cancer cells.

Authors:  Baik-Dong Choi; Soon-Jeong Jeong; Guanlin Wang; Jin-Ju Park; Do-Seon Lim; Byung-Hoon Kim; Yong-Ick Cho; Chang-Seok Kim; Moon-Jin Jeong
Journal:  Int J Mol Med       Date:  2011-06-17       Impact factor: 4.101

6.  Both the Smad and p38 MAPK pathways play a crucial role in Runx2 expression following induction by transforming growth factor-beta and bone morphogenetic protein.

Authors:  Kyeong-Sook Lee; Seung-Hyun Hong; Suk-Chul Bae
Journal:  Oncogene       Date:  2002-10-17       Impact factor: 9.867

7.  Antagonistic regulation of EMT by TIF1γ and Smad4 in mammary epithelial cells.

Authors:  Cédric Hesling; Laurent Fattet; Guillaume Teyre; Delphine Jury; Philippe Gonzalo; Jonathan Lopez; Christophe Vanbelle; Anne-Pierre Morel; Germain Gillet; Ivan Mikaelian; Ruth Rimokh
Journal:  EMBO Rep       Date:  2011-07-01       Impact factor: 8.807

8.  MEK kinase 2 and the adaptor protein Lad regulate extracellular signal-regulated kinase 5 activation by epidermal growth factor via Src.

Authors:  Weiyong Sun; Xudong Wei; Kamala Kesavan; Timothy P Garrington; Ruihua Fan; Junjie Mei; Steven M Anderson; Erwin W Gelfand; Gary L Johnson
Journal:  Mol Cell Biol       Date:  2003-04       Impact factor: 4.272

9.  TMPRSS2/ERG promotes epithelial to mesenchymal transition through the ZEB1/ZEB2 axis in a prostate cancer model.

Authors:  Orit Leshem; Shalom Madar; Ira Kogan-Sakin; Iris Kamer; Ido Goldstein; Ran Brosh; Yehudit Cohen; Jasmine Jacob-Hirsch; Marcelo Ehrlich; Shmuel Ben-Sasson; Naomi Goldfinger; Ron Loewenthal; Ephraim Gazit; Varda Rotter; Raanan Berger
Journal:  PLoS One       Date:  2011-07-01       Impact factor: 3.240

Review 10.  Initial steps of metastasis: cell invasion and endothelial transmigration.

Authors:  Franziska van Zijl; Georg Krupitza; Wolfgang Mikulits
Journal:  Mutat Res       Date:  2011-05-12       Impact factor: 2.433
