In silico assessment of EpCAM transcriptional expression and determination of the prognostic biomarker for human lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC).

Abu Tayab Moin1, Bishajit Sarkar2, Md Asad Ullah2, Yusha Araf3, Nafisa Ahmed4, Bashudev Rudra1.   

Abstract

Epithelial cell adhesion molecule (EpCAM) is a transmembrane glycoprotein involved in cell signaling, proliferation, maturation, and movement, all of which are crucial for the proper development of cells and tissues. Cleavage of the EpCAM protein leads to the up-regulation of c-myc, e-fabp, and cyclins A and E, which promote tumorigenesis. EpCAM can act as a potential diagnostic and prognostic biomarker for different types of cancers, as it is expressed in epithelia and epithelial-derived neoplasms. Hence, we aimed to analyze EpCAM gene expression and any associated features in patients with the two major types of lung cancer (LC), i.e., lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC), using publicly available online databases. In this study, server-based gene expression analysis showed up-regulation of EpCAM in both LUAD and LUSC subtypes compared with the corresponding normal tissues. In addition, histological sections revealed over-expression of EpCAM protein in cancerous tissues, which showed strong staining signals. Furthermore, mutation analysis suggested missense as the predominant type of mutation in the EpCAM gene in both LUAD and LUSC. A significant correlation (P-value < 0.05) between higher EpCAM expression and lower patient survival was also found. Finally, the co-expressed genes were identified along with their ontological features and the signaling pathways associated with LC development. Overall, the study suggests that EpCAM is a significant biomarker for human LC prognosis.
© 2021 The Authors.

Keywords:  Biomarker; EpCAM gene; Gene expression; Gene mutation; Lung adenocarcinoma; Lung squamous cell carcinoma; mRNA expression and regulation

Year:  2021        PMID: 34345719      PMCID: PMC8319582          DOI: 10.1016/j.bbrep.2021.101074

Source DB:  PubMed          Journal:  Biochem Biophys Rep        ISSN: 2405-5808


Introduction

Cancer is the disease that ensues when cells adopt aberrant behaviors, i.e., disregarding anti-growth signals, being indifferent to apoptotic signals, evading immune surveillance, and sustaining autonomous replicative potential and angiogenesis [1]. Such atypical cell growth is invasive, and cancer ranks just behind heart disease as a leading cause of death worldwide. The International Agency for Research on Cancer (IARC) reported a rise to 19.3 million cases and 10 million deaths due to cancer in 2020 [2]. Lung cancer (LC) is one of the most prominent cancers, comprising 11.6% of total cancer cases around the world and affecting men and women alike [3]. According to the Global Burden of Disease Study, the five-year survival rate of LC was 17.8% in 2020, which is significantly lower than that of other types of cancer [4]. In addition, LC causes 18.4% of total cancer deaths globally [5]. Non-small cell lung cancer (NSCLC) is the most common pathological type of LC and accounts for about 85% of all LCs. Lung squamous cell carcinoma (LUSC) and lung adenocarcinoma (LUAD) are two important histopathological subtypes of NSCLC [6]. LUSC is one of the most common forms of NSCLC, with approximately 400,000 new cases occurring annually, and accounts for 20–30% of NSCLCs. LUAD, on the other hand, occurs more often in non-smoking females, and the survival rate of LUAD patients is poor because of drug resistance, failure to diagnose at an early stage, and lack of efficient treatments [[7], [8], [9]]. Early diagnosis, especially in the case of LUAD, can reduce the mortality rate associated with LC. Previous studies suggest that cell adhesion molecules such as the EpCAM protein have a role in the intercellular and cell-extracellular matrix interactions of cancer. The differential expression, i.e., upregulation or downregulation, of several cell adhesion genes reveals the association between cell adhesion proteins and differentially expressed genes (DEGs) [10].
For instance, CDH1 (Cadherin-1), ITGB6 (Integrin beta 6) and DSC3 (Desmocollin-3) are found to be upregulated in breast cancer [11]. Also, CD44 and CD29 (Integrin beta-1) are downregulated in ovarian cancer, whereas ICAM-1 (Intercellular Adhesion Molecule 1) and CDH3 (Cadherin-3) are upregulated [12]. Further, colorectal cancer upregulates CDH1 and CDH3 but downregulates CDH19 (Cadherin 19) and PTPRF [13]. LC has been shown to upregulate the EpCAM, ICAM-1, ITGA3 (Integrin alpha-3/beta-1), ITGB4 (Integrin alpha-6/beta-4), DSP (Desmoplakin), and DSC3 genes [[14], [15], [16]]. Thus, DEGs could be used as possible biomarkers for detecting LCs such as LUAD and LUSC [17]. The subset of cell adhesion proteins that are located on the cell surface and participate in binding to other cells or to the extracellular matrix (ECM) are called 'cell adhesion molecules (CAMs)' [18]. If a cell binds to another cell of the same type through these adhesion molecules, the binding is called homotypic cell-cell adhesion [19,20]. Epithelial Cell Adhesion Molecule (EpCAM) is a transmembrane glycoprotein of about 40 kDa that consists of 314 amino acids. It is a Ca2+-independent homophilic cell-cell adhesion molecule [21] and contains an extracellular domain (EpEX), a transmembrane domain, and an intracellular domain (EpICD) [22]. According to both in vivo and in vitro studies, the EpCAM protein has pivotal roles in cell signaling, proliferation, formation, differentiation, and maintenance of organ morphology [23]. EpCAM acts as a cell surface marker on progenitor cells and various stem cells, and is also expressed in multiple types of epithelial tissues [24,25]. The EpCAM gene is expressed during the early stages of development, such as in the morula, although it remains tissue-specific [26]. The murine homologue of the EpCAM gene is expressed on thymocytes, antigen-presenting cells, T-cells, and epithelia.
The interaction between intra-epithelial lymphocytes and epithelial cells is assisted by the selective expression of CAMs on them [27]. Therefore, the expression of murine EpCAM on non-epithelial cells may facilitate homotypic adhesive interactions between dendritic cells or epithelial cells and thymocytes. Most CAMs show inconsistent expression, such as up-regulation, down-regulation, or de novo expression, during and after malignant transformation [26]. EpCAM is overexpressed in different types of cancers including breast cancer, ovarian cancer, and head and neck squamous cell cancer [28]. Because EpCAM protein is overexpressed in cancerous cells, it appears to be important in tumorigenesis and metastasis of carcinomas, so it can act as a potential prognostic marker as well as a possible target for immunotherapeutic strategies [29]. Previous studies revealed that EpCAM acts as a prognostic marker in gallbladder carcinoma, breast cancer, and colorectal cancer. Moreover, EpCAM is also referred to as a universal molecular marker for the detection of circulating tumor cells (CTCs) [30]. The overexpression of EpCAM has been shown to be an indicator of lower survival rates in breast cancer and ovarian cancer patients [31,32]. This overexpression can be utilized as a biomarker for the detection of LUAD and LUSC, as EpCAM is distributed differently in squamous cell carcinomas and adenocarcinomas, which adds diagnostic value for the precise recognition of a given type of carcinoma [33,34]. LUAD and LUSC are the most prevalent NSCLCs, and disease diagnosis in the early stages is unsatisfactory. Prognostic factors are meant to determine the features of patients and the stage of the cancer before treatment is started. Some traditional prognostic factors for survival in patients with LUAD and LUSC are performance status, stage-tumor dimension, nodal status, and weight loss [35].
Moreover, in about 70% of patients diagnosed with LC, symptoms are seen only in advanced stages (stage III or stage IV) [36]. Because symptoms are lacking in the initial stage, cancer progression is confirmed by traditional approaches such as determining metastasis by chest radiograph or CT scan, or determining the presence of Napsin-A or TTF-1 proteins. However, these methods are not satisfactory for early prognosis and cannot lead to targeted treatment, which is why the survival rate is quite low [37]. Presently, there are almost no biomarkers for the detection of LC that can be applied in clinical use. This is due either to the lack of robust sensitivity and specificity of the biomarkers or to their lack of functional relevance to lung carcinogenesis [36]. Thus, a biomarker is crucially required to detect LC at earlier stages, which could significantly increase the survival rate of patients. Therefore, in this study, we hypothesize that studying the gene expression of EpCAM may unveil its immunotherapeutic as well as prognostic value for diagnosing and treating LUAD and LUSC.

Materials and methods

Expression analysis of EpCAM in different types of cancerous and normal tissues

The expression pattern of the EpCAM mRNA transcript in numerous cancerous tissues and their respective normal tissues was analyzed using three different databases, i.e., Oncomine (https://www.oncomine.org) [38], GEPIA2 (http://gepia2.cancer-pku.cn) [39], and GENT2 (http://gent2.appex.kr) [40]. The Oncomine database is mainly used for translational bioinformatics analysis; it aids in discovering highly ranked over-expressed genes from its 715 independent datasets, which contain 86,733 clinical cancer samples and 12,764 normal tissue samples [38]. During the analysis, the p-value threshold was kept at 1E-4 and the thresholds for fold change and gene rank were set at 2.0 and Top 10%, respectively. The Gene Expression Profiling Interactive Analysis (GEPIA2) web server integrates the mRNA expression data of 9736 tumors and 8587 normal samples from The Cancer Genome Atlas (TCGA) program and the Genotype-Tissue Expression (GTEx) project. This server is widely recognized as a tool for differential expression analysis, patient survival analysis, detection of similar gene(s), correlation analysis, etc. [39]. During the analysis with the GEPIA2 server, all parameters were kept at their default values. The Gene Expression database of Normal and Tumor tissues-2 (GENT2) server generates the expression analysis result for a particular gene or mRNA by analyzing more than 68,000 samples stored in its database. This server has a user-friendly interface built on Apache Lucene indexing and the Google Web Toolkit (GWT) framework [40]. As with GEPIA2, all parameters were kept at their default values in the GENT2 web server.
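
The Oncomine thresholds stated above can be read as a simple filter over cancer-versus-normal comparisons. The following is a minimal Python sketch of that reading, not the Oncomine implementation: the `Comparison` record, its field names, and the example values are hypothetical, and treating the thresholds as inclusive bounds is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """One cancer-vs-normal comparison (hypothetical record layout)."""
    dataset: str
    p_value: float
    fold_change: float
    rank_percentile: float  # gene rank expressed as a percentage of all genes


def passes_thresholds(c, p_max=1e-4, fc_min=2.0, rank_max=10.0):
    """Apply the three thresholds named in the text (inclusive bounds assumed)."""
    return (
        c.p_value <= p_max
        and c.fold_change >= fc_min
        and c.rank_percentile <= rank_max
    )


# Illustrative values only; these are not results from the study.
comparisons = [
    Comparison("example A", p_value=1e-9, fold_change=2.4, rank_percentile=3.0),
    Comparison("example B", p_value=1e-3, fold_change=2.4, rank_percentile=3.0),
    Comparison("example C", p_value=1e-9, fold_change=1.5, rank_percentile=3.0),
]
significant = [c for c in comparisons if passes_thresholds(c)]
print([c.dataset for c in significant])
```

Only the first illustrative comparison satisfies all three criteria at once; the other two fail on p-value and fold change, respectively.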

Expression analysis of EpCAM in cancerous and normal lung tissues

Four different web servers were used to analyze the expression of EpCAM in normal and cancerous human lung tissues, i.e., Oncomine [38], GEPIA2 [39], UALCAN (http://ualcan.path.uab.edu/) [41], and HPA (https://www.proteinatlas.org/) [42]. UALCAN is a user-friendly, interactive web server that allows users to identify novel biomarkers and perform various in silico analyses of potential genes of interest by providing free access to publicly available cancer OMICS data, such as the TCGA database [41]. This server was used to determine the relative expression pattern of EpCAM in both the target LUAD and LUSC samples from the TCGA database. The Human Protein Atlas (HPA) is a publicly available, Swedish-based project that was initiated in 2003. The aim of this project is to map all human proteins in cells, tissues, and organs by integrating different omics technologies, for example, mass spectrometry-based proteomics, transcriptomics, and antibody-based imaging [42]. The HPA database is the official web server of this project. In the HPA database, a visual comparison was made between a cancerous lung tissue (with lung-specific antibody HPA026761) and a normal lung tissue (without any lung-specific antibody CAB030012). In all these servers, default parameters were used to generate the results, and a p-value less than 0.05 was considered statistically significant.

Analysis of the association of EpCAM expression with different clinical features and promoter methylation

To analyze the relationship of EpCAM gene expression with different clinical features, the UALCAN server [41] was used again. Both LUAD and LUSC cancerous tissues were compared with normal tissues, and numerous features, i.e., individual cancer stages, patient's race, gender, age, smoking habit, tumor histology, nodal metastasis status, and TP53 mutation status, were noted. Thereafter, to analyze the DNA methylation pattern, the UCSC Xena Functional Genomic Explorer (https://xenabrowser.net/) was utilized, which allows users to analyze functional genomic data to determine correlations between numerous genomic and phenotypic variables [43]. The server generated the methylation pattern of the EpCAM gene by analyzing 877 LUAD and 765 LUSC samples from the GDC TCGA database. From the server, the Illumina Human Methylation 27 and Illumina Human Methylation 450 patterns were analyzed, keeping all parameters at their default values.

Analysis of mutations and copy number alterations in the EpCAM gene

The cBioPortal server (https://www.cbioportal.org/) was used to determine the pattern of mutations and copy number alterations of the EpCAM gene. It is a freely available online server designed to visualize and interpret multidimensional cancer genomics datasets [44]. In our analysis, a total of 2983 LUAD samples and 1176 LUSC samples from the different databases available in the server were utilized to find anomalies or mutations and copy number alterations of the EpCAM gene using the GISTIC algorithm (Genomic Identification of Significant Targets in Cancer), keeping all parameters at their default values. The server predicts the probable mutations in different regions along the EpCAM gene that might be responsible for cancer development.

Determining the relationship between EpCAM gene expression and the survival of LC patients

The relationship between the expression of the EpCAM gene and the survival of patients with LC was determined with the PrognoScan server (http://dna00.bio.kyutech.ac.jp/PrognoScan/). PrognoScan is a database for meta-analysis of the prognostic value of genes. It searches publicly available microarray datasets to establish a relationship between the expression of a target gene and disease prognosis endpoints such as overall survival (OS) and disease-free survival (DFS) [45]. The server plots survival curves for patient groups defined by the expression level of a gene using the Kaplan-Meier method. During the analysis, default parameters were used and a p-value < 0.05 was considered statistically significant.
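
As an illustration of the Kaplan-Meier method mentioned above, the product-limit estimate of the survival function can be computed as follows. This is a minimal sketch, not PrognoScan's code; the function name `kaplan_meier` and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time of each patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        n_at_risk -= leaving
    return curve


# Toy data: four patients, the second one censored at time 2.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(toy_curve)
```

In practice one curve would be estimated per expression group (high versus low EpCAM), and the curves would then be compared.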

Determination of the genes co-expressed with EpCAM in LC tissues

The profiling of genes co-expressed with EpCAM was performed with three different servers, i.e., Oncomine [38], GEPIA2 [39], and the UCSC Xena web browser [43]. First, the Oncomine server was used to identify the co-expressed genes. The server generates correlation scores, and co-expressed genes are expected to have higher correlation scores. Thereafter, the GEPIA2 server was used to analyze the correlation between the EpCAM gene and the gene found to be most highly co-expressed with EpCAM in the Oncomine server. The GEPIA2 server generated p-values for the relationship between the two selected genes, and a p-value less than 0.05 was considered statistically significant. Finally, the UCSC Xena web browser was used to examine the gene expression pattern of the two selected genes in LC patients. The co-expression analysis was conducted for both LUAD and LUSC.
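
The pairwise correlation step described above can be sketched as follows. The text does not state which correlation coefficient was used, so the Pearson coefficient is an assumption here; the function name `pearson_r` and the expression values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log-scale expression of EpCAM and a candidate partner gene
# across five tumor samples (illustrative numbers only).
epcam = [5.1, 6.3, 7.0, 8.2, 9.4]
partner = [2.0, 2.9, 3.1, 4.0, 4.8]
print(round(pearson_r(epcam, partner), 3))
```

A coefficient close to 1 would indicate that the two genes rise and fall together across samples, which is the pattern a co-expression analysis looks for.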

Determination of gene ontology and signaling pathways of EpCAM and its related genes in LC development

The gene ontology (GO) terms and cell signaling pathways of the EpCAM gene were retrieved from the Enrichr server (http://amp.pharm.mssm.edu/Enrichr/). This server performs enrichment analysis, inferring knowledge about a target gene set by comparing it to multiple genomics datasets that represent prior biological knowledge [46,47]. In this analysis, we used the EpCAM gene and all the co-expressed genes from the previous step to find the GO terms, i.e., GO biological process, GO molecular function, and GO cellular component. The signaling pathways were also determined from the BioPlanet 2019, KEGG 2019 Human, and Reactome 2016 databases via the Enrichr web server.
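
The idea behind gene set enrichment of this kind can be illustrated with a one-sided hypergeometric (Fisher-type) over-representation test. This is a minimal sketch of the general technique under stated assumptions, not Enrichr's scoring code; the function name and all counts below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N : number of genes in the background
    K : number of background genes annotated with the term
    n : size of the query gene list
    k : number of query genes annotated with the term
    """
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


# Hypothetical example: 20,000 background genes, 100 annotated with a term,
# and a 20-gene query list of which 4 carry that annotation.
p = hypergeom_enrichment_p(k=4, n=20, K=100, N=20000)
print(f"{p:.3e}")
```

A small p-value means that the overlap between the query list and the term is larger than expected by chance, so the term would be reported as enriched.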

Results

Analysis of the mRNA expression of EpCAM in different types of cancerous and normal tissues

The differential expression of EpCAM transcripts (mRNAs) in different cancer tissues and their respective healthy counterparts was determined using three web-based servers, i.e., Oncomine, GEPIA2, and GENT2. In the Oncomine server, statistically significant results ranked in the top 10% were found in only 66 studies among the 437 unique datasets. The up-regulation of EpCAM was evident in numerous malignancies including lung, breast, bladder, cervical, esophageal, gastric, head and neck, liver, ovarian, prostate, and other cancers. Likewise, EpCAM was found to be highly expressed in both LUAD and LUSC patients, while it was downregulated in patients with brain and CNS cancer, kidney cancer, leukemia, melanoma, and sarcoma (Fig. 1A). Fig. 1B illustrates the comparative expression of EpCAM in 33 different human cancers and their corresponding normal tissues. Furthermore, the overexpression of EpCAM in various cancers was revealed in another analysis using the Affymetrix HG-U133pLUS2 (Fig. 1Ci) and HG-U133A (Fig. 1Cii) platforms of the GENT2 database. Together, these outcomes provide strong evidence of higher expression of the EpCAM gene in LC tissues than in normal, non-cancerous tissues (Fig. 1).
Fig. 1

Pattern of the tissue-wide expression of EpCAM in different human cancers, (A) comparison between cancer versus normal tissues in which high and low expression of mRNA has been indicated by red and blue colors, respectively, (B) the dot plot depicts the gene expression profile of the EpCAM gene in 33 different types of human cancers including tumors and normal tissue samples together. Herein, red and green dashed lines indicate the average expression value in all tumor and normal tissues, respectively, and (C) the box-plot showing the EpCAM mRNA expression in tumors and respective normal tissues using the Affymetrix HG-U133pLUS2 (Ci) and HG-U133A platforms (Cii) of the GENT2 database where the boxes indicate the median, the dots represent outliers, the red-boxes refer to the tumor tissues, whereas the blue-boxes represent the expression of normal tissues. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

Expression of EpCAM transcript in human LC tissues

The Oncomine webserver was used to analyze the EpCAM gene expression for each subtype of LC compared to the normal tissues. The results revealed the overexpression of EpCAM in two different LCs i.e., LUAD and LUSC (Fig. 2Ai-vi and Table 1). Further assessment of TCGA datasets with UALCAN and GEPIA2 servers also showed considerable up-regulation of EpCAM in both LUAD and LUSC (Fig. 2Bi, 2Bii and 2C). In addition to that, the relative immunohistochemistry analysis was performed between normal and cancer tissues using the HPA database. While comparing the staining signals between normal and cancerous lung tissues, the signal was found to be moderate to weak in normal alveolar cells, whereas both LUAD and LUSC tissue samples showed strong staining signals as depicted in Fig. 2D, E and 2F, respectively. The results indicate that the EpCAM expression is relatively higher in cancerous tissues than the normal tissues (Fig. 2 and Table 1).
Table 1

The expression of EpCAM in the two subtypes of LC from the Oncomine database.

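| Dataset | Parameters | Samples | P-value | Gene Rank | Fold Change |
|---|---|---|---|---|---|
| Talbot Lung (n = 93) | Normal Lung | 2 | | | |
| | Squamous Cell Lung Carcinoma | 34 | 1.94E-9 | 302 | 4.253 |
| Su Lung (n = 66) | Normal | 30 | | | |
| | Lung Adenocarcinoma | 27 | 1.29E-6 | 292 | 2.157 |
| Stearman Lung (n = 39) | Normal | 19 | | | |
| | Lung Adenocarcinoma | 20 | 1.27E-8 | 72 | 2.337 |
| Selamat Lung (n = 116) | Normal | 58 | | | |
| | Lung Adenocarcinoma | 58 | 2.18E-20 | 124 | 2.413 |
| Landi Lung (n = 107) | Normal | 49 | | | |
| | Lung Adenocarcinoma | 58 | 3.09E-19 | 100 | 2.197 |
| Beer Lung (n = 96) | Normal | 10 | | | |
| | Lung Adenocarcinoma | 86 | 1.46E-10 | 27 | 2.569 |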
Fig. 2

Analysis of the EpCAM expression in LUAD and LUSC in which (A) box-plots showing comparative expression between normal (left) and cancer tissue (right) - for LUAD and LUSC (Ai-Avi), (B–C) box-plots showing the expression of EpCAM mRNA based on sample types in tumor tissue and normal tissues for LUAD (Bi) and LUSC (Bii), using the UALCAN and GEPIA2 (C) servers, respectively, and (D–F) the immunohistochemistry images of the expression of EpCAM in LUAD (D) and LUSC (E) tissues as well as normal tissue (F) retrieved from the HPA database. The number within the parenthesis in Fig. 2 (Ai-Avi) represents the number of the samples deposited in the database.

Association of EpCAM expression with clinical characteristics of LC patients

The relationship between EpCAM gene expression and the clinical presentation of LUAD and LUSC patients, relative to healthy control individuals, was examined using the TCGA dataset via the UALCAN database. For LUAD, EpCAM was up-regulated across different parameters including individual cancer stages, patient's race, gender, age, smoking habit, tumor histology, nodal metastasis, and TP53 mutation status, as depicted in Fig. 3 and listed in Table 2a. Across cancer stages, the increase in EpCAM expression fluctuated for LUAD (Fig. 3Ai) and rose incrementally for LUSC (Fig. 3Aii), and was highest in the middle stages of LC. The expression of EpCAM was found to be highest among Asian patients for LUAD (Fig. 3Bi) and among Caucasian patients for LUSC (Fig. 3Bii). The increased expression level was almost the same in male and female patients for both LUAD and LUSC (Fig. 3Ci, ii). Moreover, Fig. 3Di and ii show higher expression in the middle age groups (41–80 years old) for LUAD, whereas the expression was higher in the 41–60 years age group for LUSC. Up-regulation of EpCAM expression can also be seen for other clinicopathological parameters, including smoking habit (Fig. 3Ei, ii), tumor histology (Fig. 3Fi, ii), nodal metastasis (Fig. 3Gi, ii), and TP53 mutation status (Fig. 3Hi, ii) (Table 2a, b). These results indicate that EpCAM expression is significantly higher in LC patients across these clinical characteristics than in healthy, normal individuals.
Fig. 3

The analysis of EpCAM expression with clinical characteristics of LUAD and LUSC patients. The EpCAM mRNA expression in LUAD (Ai-Hi) and LUSC (Aii-Hii) showing individual cancer stage, patient's race, gender, age group, smoking habit, histological subtypes, nodal metastasis, and TP53 status, respectively.

Table 2

(a) The relationship between EpCAM and numerous clinicopathological features of LUAD.

| EpCAM expression based on | Features | Expression of mRNA | Number of samples | Statistical significance (p-value) |
|---|---|---|---|---|
| Individual cancer stages | Normal | Underexpression | 59 | |
| | Stage-1 | Overexpression | 277 | <1E-12 |
| | Stage-2 | Overexpression | 125 | 1.624E-12 |
| | Stage-3 | Overexpression | 85 | <1E-12 |
| | Stage-4 | Overexpression | 28 | 2.237E-06 |
| Patient's race | Normal | Underexpression | 59 | |
| | Caucasian | Overexpression | 387 | <1E-12 |
| | African American | Overexpression | 51 | 1.728E-12 |
| | Asian | Overexpression | 8 | 2.996E-04 |
| Patient's gender | Normal | Underexpression | 59 | |
| | Male | Overexpression | 238 | 1.624E-12 |
| | Female | Overexpression | 276 | <1E-12 |
| Patient's age | Normal | Underexpression | 59 | |
| | 21–40 Yrs | Overexpression | 12 | 5.362E-02 |
| | 41–60 | Overexpression | 90 | 1.624E-12 |
| | 61–80 | Overexpression | 149 | <1E-12 |
| | 81–100 | Overexpression | 32 | 2.480E-10 |
| Patient's smoking habit | Normal | Underexpression | 59 | |
| | Non smoker | Overexpression | 75 | <1E-12 |
| | Smoker | Overexpression | 118 | <1E-12 |
| | Reformed Smoker 1 | Overexpression | 135 | <1E-12 |
| | Reformed Smoker 2 | Overexpression | 168 | 1.624E-12 |
| Histological subtypes | Normal | Underexpression | 59 | |
| | NOS | Overexpression | 820 | 1.624E-12 |
| | Mixed | Overexpression | 107 | 1.624E-12 |
| | Clear Cell | Overexpression | 2 | <1E-12 |
| | LBC-Non Mucinous | Overexpression | 19 | 2.341E-06 |
| | Solid Pattern Predominant | Overexpression | 5 | 1.921E-03 |
| | Acinar | Overexpression | 18 | 5.900E-05 |
| | LBC-Mucinous | Overexpression | 5 | 4.740E-02 |
| | Mucinous carcinoma | Overexpression | 10 | 7.959E-03 |
| | Papillary | Overexpression | 23 | 8.074E-08 |
| | Mucinous | Overexpression | 2 | 3.905E-01 |
| | Micropapillary | Overexpression | 3 | 5.009E-02 |
| Nodal metastasis status | Normal | Underexpression | 59 | |
| | N0 | Overexpression | 331 | 1.624E-12 |
| | N1 | Overexpression | 96 | 1.624E-12 |
| | N2 | Overexpression | 74 | <1E-12 |
| | N3 | Overexpression | 2 | 3.781E-08 |
| TP53 mutation status | Normal | Underexpression | 59 | |
| | TP53-Mutant | Overexpression | 233 | 1.624E-12 |
| | TP53-NonMutant | Overexpression | 279 | <1E-12 |

Analysis of promoter methylation of LC from TCGA dataset

DNA methylation is important to consider because, when it occurs in a gene promoter, it acts to inhibit or lower transcription. Thus, promoter methylation is an important factor in studies of cellular development, gene silencing, mRNA expression, tissue differentiation, genetic imprinting, etc. [48]. In addition, hypermethylation of high-density CpG regions and genome-wide hypomethylation have been shown to be correlated with numerous types of cancer. Hence, the correlation between EpCAM expression and DNA methylation in both LUAD and LUSC was analyzed using the two methylation assays available in the server, i.e., Illumina Human Methylation 27 and Illumina Human Methylation 450. The heat maps revealed a similarly negative relationship between EpCAM expression and methylation at some CpG islands in both LUAD and LUSC, which is consistent with the overexpression of EpCAM; in both cases, the methylation pattern was considered a possible contributor to mutation and, possibly, cancer formation (Fig. 4).
Fig. 4

Promoter methylation of the EpCAM gene in LC tissues. Heat map of EpCAM expression and DNA methylation status for LUAD (A) and LUSC (B).

Analysis of mutations, copy number alterations, and expression of mutant EpCAM transcript

Alterations of the EpCAM gene in both types of LC, covering a total of 2983 samples from 8 LUAD studies and 1176 samples from 3 LUSC studies, were determined using the cBioPortal database (Table 3). The EpCAM gene was found to be altered in 28 (<1%) of the queried LUAD samples, with a somatic mutation frequency of 0.5%. A total of 14 mutations, comprising 13 missense mutations and 1 splice mutation, were identified within the 1–314 residue peptide sequence of the Thyroglobulin_1 domain. All these mutations took place between base pairs 47600710 and 47607038 of chromosome 2. Interestingly, the most frequent mutation type in both LUAD and LUSC was missense, occurring in a total of 1765 (Fig. 5Ai) and 977 samples (Fig. 5Aii), respectively. Furthermore, in the alteration frequency analysis, LUAD TCGA pub had the highest frequency, 2.17% of 230 cases, among the six categories of studies (Fig. 5Bi). Finally, profiling of mutated EpCAM mRNA expression identified eleven cases of mutation in LUAD, including ten missense and one nonsense (Fig. 5Ci). Hence, the results suggested that the alterations found in EpCAM in the queried LUAD tissues are probably not responsible for the overexpression of EpCAM. Similarly, the EpCAM gene was altered in 42 (4%) of the queried LUSC samples, with a somatic mutation frequency of 0.3%. A total of 4 missense mutations were detected in patients with multiple samples of LUSC. In the alteration frequency analysis, the highest frequency (3.37% of 178 cases) was reported for LUSC TCGA pub among the three categories of studies (Fig. 5Bii). Thereafter, the mutated EpCAM mRNA expression was profiled for LUSC, and the result revealed three cases, all of which were missense mutations (Fig. 5Cii). Therefore, these findings suggested that, as in the LUAD samples, the overexpression of EpCAM in LUSC might not be correlated with the mutations or copy number alterations found in EpCAM (Fig. 5). This also indicated that mutations or alterations in the EpCAM gene might not be pivotal in causing LC, although this possibility needs further study.
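
The percentages quoted in this section can be checked with simple arithmetic. The sketch below assumes that each frequency is a plain ratio of a count to the number of queried samples, which the text does not state explicitly; the helper names `percent` and `implied_cases` are hypothetical.

```python
def percent(count, total):
    """Share of `count` in `total`, in percent."""
    return 100.0 * count / total


def implied_cases(pct, total):
    """Whole number of cases implied by a reported percentage of `total`."""
    return round(pct * total / 100.0)


LUAD_SAMPLES = 2983  # queried LUAD samples (from the text)
LUSC_SAMPLES = 1176  # queried LUSC samples (from the text)

print(round(percent(28, LUAD_SAMPLES), 2))  # altered LUAD samples, reported as <1%
print(round(percent(14, LUAD_SAMPLES), 2))  # 14 LUAD mutations, reported as 0.5%
print(round(percent(42, LUSC_SAMPLES), 2))  # altered LUSC samples, reported as 4%
print(round(percent(4, LUSC_SAMPLES), 2))   # 4 LUSC mutations, reported as 0.3%

print(implied_cases(2.17, 230))  # LUAD TCGA pub: cases behind 2.17% of 230
print(implied_cases(3.37, 178))  # LUSC TCGA pub: cases behind 3.37% of 178
```

Under this assumption the computed values (0.94%, 0.47%, 3.57%, and 0.34%) agree with the rounded figures reported above, and the two study-level frequencies correspond to 5 and 6 altered cases, respectively.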
Table 3

A list of EpCAM mutational positions and types in LUAD and LUSC from the TCGA dataset.

| Sample ID | Cancer Type | Protein Change | Mutation Type | Chromosome No | Start Position | End Position | Number of Samples |
|---|---|---|---|---|---|---|---|
| TCGA-78-8662-01 | Lung Adenocarcinoma | X62_splice | Splice | 2 | 47600710 | 47600710 | 1014 |
| LUAD-B00416 | Lung Adenocarcinoma | G222V | Missense | 2 | 47606915 | 47606915 | 803 |
| LUAD-S01315 | Lung Adenocarcinoma | N91S | Missense | 2 | 47601034 | 47601034 | 1765 |
| BGI-RS55 | Lung Adenocarcinoma | Q204E | Missense | 2 | 47606146 | 47606146 | 107 |
| TCGA-44-4112-01 | Lung Adenocarcinoma | G263V | Missense | 2 | 47607038 | 47607038 | 371 |
| TCGA-44-6779-01 | Lung Adenocarcinoma | A210T | Missense | 2 | 47606164 | 47606164 | 89 |
| TCGA-78-7163-01 | Lung Adenocarcinoma | D98E | Missense | 2 | 47601056 | 47601056 | 24 |
| TCGA-44-4112-01 | Lung Adenocarcinoma | G263V | Missense | 2 | 47607038 | 47607038 | 434 |
| TCGA-44-6779-01 | Lung Adenocarcinoma | A210T | Missense | 2 | 47606164 | 47606164 | 91 |
| TCGA-78-7163-01 | Lung Adenocarcinoma | D98E | Missense | 2 | 47601056 | 47601056 | 49 |
| TCGA-86-8279-01 | Lung Adenocarcinoma | E137A | Missense | 2 | 47601172 | 47601172 | 458 |
| TCGA-44-4112-01 | Lung Adenocarcinoma | G263V | Missense | 2 | 47607038 | 47607038 | 389 |
| TCGA-44-6779-01 | Lung Adenocarcinoma | A210T | Missense | 2 | 47606164 | 47606164 | 93 |
| TCGA-78-7163-01 | Lung Adenocarcinoma | D98E | Missense | 2 | 47601056 | 47601056 | 27 |
| TCGA-37-3789-01 | Lung Squamous Cell Carcinoma | M294I | Missense | 2 | 47612328 | 47612328 | 379 |
| TCGA-98-8021-01 | Lung Squamous Cell Carcinoma | E147Q | Missense | 2 | 47602386 | 47602386 | 930 |
| TCGA-22-4604-01 | Lung Squamous Cell Carcinoma | P3S | Missense | 2 | 47596651 | 47596651 | 225 |
| TCGA-37-3789-01 | Lung Squamous Cell Carcinoma | M294I | Missense | 2 | 47612328 | 47612328 | 977 |
Fig. 5

Genetic alteration and mutations of EpCAM in LUAD and LUSC tissues. Herein (Ai) lollipop plot shows the type of alteration in fourteen mutation spots within the peptide sequence (1–314 residues) of Thyroglobulin_1 domain in EpCAM from LUAD tissues, (Aii) depicts the alterations in only four mutation spots in EpCAM from LUSC tissues, (Bi and Bii) bar diagrams show the mutation frequencies and genome alteration in the EpCAM gene for LUAD and LUSC, respectively, and (Ci and Cii) indicate the correlation between the expression and copy number alteration of EpCAM for LUAD and LUSC in the TCGA dataset.
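
The alteration percentages quoted for the cBioPortal query follow directly from the reported counts. The short Python sketch below is an illustration only and is not part of the original analysis: the helper name `percent` is hypothetical, and the per-study case counts are approximations obtained by inverting the reported frequencies rather than values stated in the text.

```python
def percent(altered: int, total: int) -> float:
    """Return altered/total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * altered / total


# Counts reported in the text for the queried samples.
luad_pct = percent(28, 2983)  # LUAD: 28 altered of 2983 samples
lusc_pct = percent(42, 1176)  # LUSC: 42 altered of 1176 samples

# Assumption: invert the reported per-study frequencies to estimate
# how many altered cases they correspond to.
luad_tcga_pub_cases = round(0.0217 * 230)  # 2.17% of 230 cases
lusc_tcga_pub_cases = round(0.0337 * 178)  # 3.37% of 178 cases

print(f"LUAD altered: {luad_pct:.2f}%")
print(f"LUSC altered: {lusc_pct:.2f}%")
print(f"LUAD TCGA pub, estimated altered cases: {luad_tcga_pub_cases}")
print(f"LUSC TCGA pub, estimated altered cases: {lusc_tcga_pub_cases}")
```

The computed percentages are consistent with the rounded figures given above, i.e., below 1% for LUAD and about 4% for LUSC.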


EpCAM expression and clinical prognosis of LC patients

The relationship between the level of EpCAM expression and patient survival in LC was analyzed using the PrognoScan database (the significance threshold was set at P-value < 0.05 and HR > 1). The analysis showed a negative correlation between EpCAM expression and patient survival: high expression of EpCAM might lead to a lower survival rate, whereas low or normal expression of the protein was associated with a higher survival rate. For dataset GSE31210 (number of samples = 204), patients with low EpCAM expression (n = 105 and 163) had a significantly higher survival probability, whereas higher expression of EpCAM (n = 99 and 41) was associated with a lower survival rate, as illustrated in Fig. 6A and B. The results revealed that enhanced expression of EpCAM could be responsible for a relatively poor prognosis in LC patients.
Fig. 6

The Kaplan-Meier plots represent the relationship between EpCAM gene expression and survival of LC patients. The survival curves show patient survival with high (red) and low (blue) expression of EpCAM, where (A) and (B) show overall survival and relapse-free survival, respectively. The analysis was focused on EpCAM expression in LC patients. Here, HR, hazard ratio; CI, confidence interval. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
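
For readers unfamiliar with how such survival curves are built, the following is a minimal Python sketch of the Kaplan-Meier product-limit estimator. It is not the PrognoScan implementation; the function name `kaplan_meier` and the follow-up times and event flags are hypothetical toy values chosen purely for illustration. Only the group sizes checked at the end are taken from the text.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up time of each patient
    events -- 1 if the event (death or relapse) was observed, 0 if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # each patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # remove events and censored patients alike
    return curve


# Hypothetical toy cohorts (NOT data from GSE31210).
high_curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
low_curve = kaplan_meier([4, 6, 9, 10, 12], [0, 1, 0, 0, 0])

# Sanity check on the group sizes reported for GSE31210 in the text.
assert 105 + 99 == 204 and 163 + 41 == 204

for label, curve in (("high EpCAM", high_curve), ("low EpCAM", low_curve)):
    print(label, [(t, round(s, 3)) for t, s in curve])
```

In this toy example the high-expression curve drops faster than the low-expression curve, which is the pattern Fig. 6 reports for the real data; a log-rank test or Cox model would still be needed to attach a P-value or HR.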


Analysis of gene signatures linked to EpCAM and human LC

The co-expression profile of EpCAM was analyzed with 19 genes using a total of 16 samples from LUAD and LUSC patients through the Oncomine database (Fig. 7A). HOXB7 was found to be the most strongly co-expressed gene (R = 0.998) among the 19 genes. Using the GEPIA2 server, EpCAM and HOXB7 were found to be positively correlated by the Spearman coefficient in both LUAD (R = 0.14) (Fig. 7Bi) and LUSC (R = 0.26) (Fig. 7Bii). Furthermore, Pearson and Spearman correlation analyses were performed on TCGA data through the UCSC Xena server to confirm the positive correlation between EpCAM and HOXB7 in LC patients (Fig. 7C–D). For the LUAD (Fig. 7Ci-Di) and LUSC (Fig. 7Cii-Dii) patients, the Pearson correlation values were 0.1243 and 0.2117, respectively, and the Spearman correlation values were 0.1513 and 0.2894, respectively. Therefore, these results suggest that these two genes might be interlinked in different signaling pathways in LC progression.
Fig. 7

Co-expression profile of the EpCAM and co-expressed genes in human LC. The figure shows (A) co-expression profile of EpCAM derived from the Oncomine database, (B) correlation analysis between EpCAM and HOXB7 obtained by GEPIA2 server, (C) heatmap of mRNA expression for EpCAM and HOXB7 genes across LC in the TCGA database, and (D) co-expression analysis between EpCAM and HOXB7 genes in LC using UCSC Xena server.
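
Because both Pearson and Spearman coefficients are reported for EpCAM and HOXB7, it may help to see how the two differ. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, the expression values are invented toy numbers (not TCGA data), and the servers used in the study may handle ties or normalization differently.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression values: monotonic but non-linear relationship.
epcam = [1.0, 2.0, 3.0, 4.0, 5.0]
hoxb7 = [1.0, 4.0, 9.0, 16.0, 100.0]

r_pearson = pearson(epcam, hoxb7)
r_spearman = spearman(epcam, hoxb7)
print(f"Pearson r = {r_pearson:.3f}, Spearman rho = {r_spearman:.3f}")
```

Spearman works on ranks, so it captures any monotonic relationship, whereas Pearson measures linear association only. This is one reason the two coefficients reported above for the same gene pair need not coincide.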


Determination of gene ontologies and signaling pathways linked to EpCAM and LC progression

Based on EpCAM and its correlated genes, the signaling pathways and gene ontological features that lead to the progression of LC in humans were identified. For pathway determination, the results from three databases, depicted in Fig. 8A–C, were considered. In the analysis of the KEGG human 2019 database, several significant pathways were found, including mucin type O-glycan biosynthesis and branched chain amino acid (BCAA) metabolism, i.e., degradation of valine, leucine and isoleucine (Fig. 8A). Furthermore, the Reactome 2016 database showed pathways related to amino acid catabolism, protein metabolism, activation of gene expression by RORA, BMAL1:CLOCK, NPAS2, and SREBF (SREBP), YAP1- and WWTR1 (TAZ)-stimulated gene expression, mitochondrial biogenesis, regulation of transcription by the NOTCH1 intracellular domain, etc. (Fig. 8B). Finally, the analysis of the BioPlanet 2019 database revealed the p38 alpha/beta MAPK downstream pathway as well as other significant pathways related to aflatoxin B1 metabolism, benzo(a)pyrene metabolism, amino acid catabolism, Rho-mediated activation of SRF, eicosanoid metabolism, YAP1- and WWTR1 (TAZ)-stimulated gene expression, O-glycan biosynthesis, and alpha-synuclein signaling (Fig. 8C). These pathways are therefore anticipated to be involved in the progression of LC. The GO terms for the corresponding genes were then determined. The suggested GO features mainly include myelination, membrane raft polarization and distribution, regulation of cell-cell adhesion, protein autoprocessing, regulation of ion transport, and protein and fat metabolic processes (Fig. 8D); hydrolase, dehydrogenase, peptidase and transferase activities (Fig. 8E); and mitotic spindle activity (Fig. 8F). The cell signaling pathway in which the EpCAM gene is involved is depicted in Fig. 8G. These results suggested a possible involvement of EpCAM and its correlated genes in the development and progression of LC.
Fig. 8

Analysis of the pathways and gene ontologies related to EpCAM expression and LC. Pathways and ontologies were obtained from (A) KEGG human 2019, (B) Reactome 2016, (C) BioPlanet 2019, (D) GO biological process 2018, (E) GO molecular function 2018, and (F) GO cellular component 2018. The length and the color gradient of each bar represent the level of significance; a brighter color indicates a more significant term and vice versa. (G) The cell signaling network in which EpCAM is involved. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
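
Significance rankings of pathway and GO terms such as those in Fig. 8 are commonly derived from an over-representation test. The Python sketch below shows one standard form, a one-sided hypergeometric (Fisher-type) tail probability. It is an assumption that a test of this kind underlies the bars in the figure, since the text does not state the exact statistic; the function name `enrichment_p_value` and all gene counts are hypothetical.

```python
from math import comb


def enrichment_p_value(overlap, query_size, term_size, background_size):
    """P(X >= overlap) for a hypergeometric draw.

    overlap         -- query genes annotated with the term
    query_size      -- number of genes in the query list
    term_size       -- number of background genes annotated with the term
    background_size -- total number of genes in the background
    """
    max_overlap = min(query_size, term_size)
    total = comb(background_size, query_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: a 20-gene query list, a term annotating 40 genes,
# a background of 20000 genes, and 3 query genes falling in the term.
p_value = enrichment_p_value(3, 20, 40, 20000)
print(f"enrichment p-value = {p_value:.3g}")
```

A small p-value means the overlap is larger than expected by chance given the list and term sizes; in practice such values are also corrected for the many terms tested.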


Discussion

LC has the highest cancer-related mortality rate worldwide and is one of the most frequently diagnosed malignant tumors [49,50]. NSCLC, consisting of two subtypes, i.e., LUAD and LUSC, alone accounts for an estimated 85% of all LC cases [50]. Hence, early detection and correct prognosis of LC are pivotal for understanding a patient's condition and planning appropriate treatment, which is why accurate prediction is crucial to patients, physicians, and researchers alike [51,52]. According to previous reports, LC in the majority of patients (approx. 75%) has already reached an advanced stage (stage III/IV) at diagnosis, which further underscores the importance of early diagnosis [53]. For instance, surgical resection of NSCLC offers a favorable prognosis for small, localized tumors (stage I), with 5-year survival rates of 70–90% [54,55,56]. This suggests a strong correlation between early diagnosis of LC and enhanced survival rates. Therefore, early diagnosis of LC remains a major deciding factor for optimal outcomes, even with contemporary advancements and breakthroughs [57]. In this study, the significance of EpCAM as a prognostic marker in the early determination of two NSCLC subtypes, i.e., LUAD and LUSC, was evaluated using bioinformatics [33]. Based on multiple databases, it can be concluded that the expression level of EpCAM is positively correlated with the progression of LC. The analysis of EpCAM expression in LUAD and LUSC displayed a negative correlation with overall survival, disease-free survival, and relapse-free survival, with an overall HR > 1. It was also observed that patients with high EpCAM levels had lower survival rates than those with lower EpCAM levels.
A previous study that reported poor prognosis in LC patients with increased EpCAM expression further supports this observation [33]. The expression patterns of EpCAM in the two cancer tissues were also found to be significantly associated with different clinical characteristics of LC patients, including tumor histology, race, gender, age, smoking habit, nodal metastasis status, etc. These features require further investigation, since a high expression level of EpCAM might indicate a risk of cancer transformation and progression. Computer-aided systems and digital imaging, combined with the methodological aspects of modern immunohistochemistry, provide excellent insights into immunohistochemical scoring [33,58,59]. The immunohistochemical data exhibited strong nuclear immunoreactivity of EpCAM in all target LC cells. Robust and intense staining of cancer cells, which distinguished them from normal alveolar cells, helped to recognize the higher levels of EpCAM expression in LUAD and LUSC tissues. Alterations of four categories, i.e., somatically acquired genetic, epigenetic, transcriptomic, and proteomic alterations, constitute a series of histopathological processes that contribute to cancer progression [58]. Any of these alterations in a genomic region (either loss or gain) can result in either suppressive or oncogenic effects [60]. The cBioPortal webserver was utilized to explore the copy number alterations, mutations, and mutant mRNA expression of EpCAM. The EpCAM gene was found to be altered in 28 (<1%) of the queried LUAD samples with a 0.5% somatic mutation frequency, whereas alteration was reported in 42 (4%) of the LUSC samples with a 0.3% somatic mutation frequency. The most frequent mutation type in both LUAD and LUSC was missense, occurring in a total of 1765 and 977 samples, respectively.
This study also investigated the potential correlation between DNA methylation and EpCAM expression, which revealed a negative correlation between them. Transcriptional silencing of genes usually occurs as an outcome of DNA methylation, which is found mainly at the CpG islands of gene promoters. In line with this, several studies on promoter methylation in multiple cancer types, including LC, have implied that the expression of EpCAM is inversely correlated with DNA methylation in tissues from cancer patients [61]. Through further co-expression and correlation analysis, 19 genes were found to be positively correlated with the EpCAM gene, among which homeobox B7 (HOXB7) showed the strongest correlation with EpCAM expression (R = 0.998). Analyses conducted with other servers also confirmed the relationship between EpCAM and HOXB7. Dysregulated HOXB7 has been suggested to act as a critical player in regulating tumorigenesis and metastasis in some cancers. A relevant previous study found increased expression of HOXB7 to be associated with poor clinical outcomes in LUAD patients and significantly correlated with short survival time [62]. Finally, the correlated genes were used to analyze the possible EpCAM-related pathways responsible for the development of LC. In the KEGG pathway analysis, the correlated genes were mostly related to O-glycan biosynthesis and BCAA metabolism [63]. This observation seems reasonable, as oncogenesis sometimes depends on cellular energy and metabolites provided by amino acids that are degraded by enzymes overexpressed in many cancer types [64,65]. In the GO analysis, the most prominent ontology term was myelination. Cognitive impairment is a common consequence of chemotherapy, in which myelination has been reported as an underlying factor according to previous studies [66].
Furthermore, membrane raft polarization is linked to another significant correlated function, which includes cell adhesion and migration. Cell adhesion disorders and aggressive phenotypes of migration and invasion constitute the malignant phenotype of cancer, which is modulated by the dynamic features of the cancer cell surface. Lipid rafts have also recently been found to contribute to cancer cell adhesion and migration [67]. The pathway and GO enrichment analyses portrayed the significance of EpCAM and its correlated genes in different oncogenic processes leading to LC development. Overall, this study supports the idea that the expression of the EpCAM gene could be used as a prognostic biomarker for the early detection of human LUAD and LUSC cases. However, more in vivo and in vitro studies are needed to confirm the outcome of this study.

Conclusion

In this study, we targeted the molecular signatures that play key roles in the development and progression of LUAD and LUSC. In cancer, prognostic factors are important for early diagnosis as well as efficient treatment, and thus help patients avoid the risk of overtreatment. To determine the potential of EpCAM as a prognostic marker in LC development, the mRNA expression, DNA methylation, mutations and CNAs, correlated genes and prognostic features were analyzed in this study. The analyses showed a sharp overexpression of EpCAM as well as a potential correlation of EpCAM with LC development. Furthermore, they indicated the probable signaling pathways and gene ontological features related to EpCAM and its expression in LC progression. These pathways could be significant targets for interfering with the development of cancer. For this reason, EpCAM is suggested to be an effective biomarker as well as a potential therapeutic target in efforts to prevent LC in humans.

Data availability statement

The authors have made all data generated during the experiments and analyses available within the manuscript.

Funding statement

The authors received no specific funding from any external sources.

Author contribution

BS and ATM conceived the study. BS designed the analysis. ATM, MAU and BS conducted the experiments. ATM, MAU and YA wrote the paper. NA, BS and BR edited the paper. BR supervised the study. BS was the co-supervisor of the study.

Declaration of competing interest

The authors declare that they have no conflict of interest.