Hao Wang1, Zhihong Zhang1, Ke Xu1, Song Wei1, Lailing Li1, Lijun Wang2. 1. Department of Respiratory Oncology, Anhui Provincial Cancer Hospital (The First Affiliated Hospital of USTC West District), Hefei, Anhui 230031, P.R. China. 2. Department of Respiratory Disease, Tongling People's Hospital, Tongling, Anhui 244000, P.R. China.
Lung cancer remains the leading cause of cancer-associated mortality worldwide. Small cell lung cancer and non-small cell lung cancer (NSCLC) are the two main types of lung cancer. NSCLC includes adenocarcinoma, squamous cell cancer and large cell lung cancer (1). Chemotherapy, radiotherapy, targeted therapy and immunotherapy are the main treatment strategies for NSCLC. However, NSCLC is often diagnosed at a late stage; consequently, the five-year survival rate is low (2). Therefore, investigating the etiology and prognostic factors of NSCLC is important. Squamous cell lung cancer is strongly associated with smoking. While lung adenocarcinoma is associated with smoking, this type of cancer also occurs in non-smokers (3). Non-smoking lung adenocarcinoma is strongly associated with the female sex (4). Specific molecules (CXCR2 and PPBP) and pathways (cell adhesion molecules; CAMs) play important roles in the pathogenesis of lung adenocarcinoma in non-smoking female patients (5).
The prevalence of lung adenocarcinoma in non-smoking females is higher than that in non-smoking males, suggesting that sex hormones may be involved in tumorigenesis (3). In vitro studies have revealed that estrogen promotes the proliferation of NSCLC cells through estrogen receptor-mediated signaling pathways, whereas anti-estrogens exhibit the opposite effect (6,7). Downregulation of estrogen receptor β (ERβ) inhibits cell growth in lung adenocarcinoma (8). 17β-estradiol upregulates the expression of interleukin-16 through the ERβ signaling pathway and promotes the progression of lung adenocarcinoma (9). Previous studies have demonstrated that epidermal growth factor receptor (EGFR) and human epidermal growth factor receptor 2 (HER2) mutations, and anaplastic lymphoma kinase rearrangements, are more commonly observed in lung cancer in non-smokers compared with that in smokers (10,11).
Tumor protein P53 and breast cancer types 1 and 2 susceptibility protein variants are likely to contribute to the development of lung adenocarcinoma in non-smoking females (12). Osteopontin (OPN), hypoxia inducible factor-1 and several energy metabolism-associated proteins have been associated with estrogen receptor function (13). However, the pathogenesis and prognostic factors of non-smoking female patients with lung adenocarcinoma remain unclear.
In the present study, bioinformatics analysis was used to explore estrogen receptor-associated genes that are related to prognosis in non-smoking female patients with lung adenocarcinoma. The results may improve the understanding of the pathogenic and prognostic factors associated with lung adenocarcinoma in non-smoking females.
Materials and methods
Analysis of microarray data and RNA-sequencing data
Microarray data and the corresponding clinical data for non-smoking female patients with lung adenocarcinoma were downloaded from the Gene Expression Omnibus (GEO) (ncbi.nlm.nih.gov/geo/) as the GSE32863 (14) and GSE75037 (15) datasets, each comprising 24 non-smoking female patients; both are based on the GPL6884 Illumina Human WG-6 v3.0 expression beadchip platform (Illumina, Inc.). RNA-sequencing data for 160 non-smoking female patients with lung adenocarcinoma were downloaded from The Cancer Genome Atlas (TCGA) database (portal.gdc.cancer.gov; last updated in July 2017). The analysis therefore covered 48 patients profiled by microarray (GSE32863 and GSE75037) and 160 patients profiled by RNA-sequencing (TCGA). The SVA package (version 3.32.1; www.bioconductor.org/help/search/index.html?q=sva/) in Bioconductor (version 3.9; www.bioconductor) was used to normalize the gene expression profile data.
Identification of differentially expressed genes (DEGs)
The Linear Models for Microarray Data (LIMMA) package (version 3.1; www.bioconductor.org/help/search/index.html?q=limma) in Bioconductor was used to identify DEGs between samples from non-smoking female patients with lung adenocarcinoma and samples of adjacent non-cancerous lung tissue. Adjusted P-values and fold-change (FC) values were calculated. The DEG screening criteria were an adjusted P<0.05 and an absolute log2FC >2. The DEGs at the intersection of the datasets (DEGs in both GEO and TCGA) were selected for subsequent investigation. The pheatmap package (version 1.0.12; http://cran.r-project.org/web/packages/pheatmap/index.html) was used to draw the heat map.
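The screening step can be illustrated with a minimal Python sketch. It is not the LIMMA workflow used in the study: it assumes that raw P-values and log2FC values are already available per gene, and it assumes Benjamini-Hochberg adjustment, which the text does not specify. The function names (`bh_adjust`, `select_degs`) and all gene names and numbers are hypothetical and invented for illustration only.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order.

    Assumption: the source only says "adjusted P"; BH is used here as an example.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(results, adj_p_cutoff=0.05, log2fc_cutoff=2.0):
    """results: list of (gene, log2fc, raw_p). Returns {gene: 'up' | 'down'}."""
    adj = bh_adjust([p for _, _, p in results])
    degs = {}
    for (gene, log2fc, _), q in zip(results, adj):
        # Criteria from the text: adjusted P < 0.05 and |log2FC| > 2.
        if q < adj_p_cutoff and abs(log2fc) > log2fc_cutoff:
            degs[gene] = "up" if log2fc > 0 else "down"
    return degs


# Toy values, invented for illustration only (not study data).
geo = [("GENE_A", 3.1, 0.0001), ("GENE_B", -2.6, 0.0004),
       ("GENE_C", 1.2, 0.0002), ("GENE_D", -2.4, 0.2)]
tcga = [("GENE_A", 2.8, 0.0003), ("GENE_B", -3.0, 0.001),
        ("GENE_C", 2.5, 0.0005), ("GENE_E", 4.0, 0.6)]

geo_degs = select_degs(geo)
tcga_degs = select_degs(tcga)

# DEGs at the intersection of the two datasets.
common = sorted(set(geo_degs) & set(tcga_degs))
print(common)
```

In this sketch the intersection is taken on gene names alone; whether the direction of change must also agree between datasets is not stated in the text and is left out here.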
Enrichment analyses of DEGs
The Database for Annotation, Visualization and Integrated Discovery (version 6.8; david.ncifcrf.gov) was used to perform functional enrichment analysis of the DEGs in non-smoking female patients with lung adenocarcinoma. Gene Ontology (GO; version 6.8; www.geneontology.org) and Kyoto Encyclopedia of Genes and Genomes (KEGG; version 6.8; www.genome.ad.jp/kegg/) pathway analyses were conducted using the WEB-based GEne SeT AnaLysis Toolkit (www.webgestalt.org). P<0.05 was considered to indicate strong enrichment in the annotation categories.
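The idea behind an over-representation P-value can be sketched as follows. This is a plain one-sided hypergeometric test, offered only as an illustration; DAVID reports a modified Fisher exact statistic (the EASE score), so its values will differ. The function name `hypergeom_enrichment_p` and the small numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_term, n_degs, n_overlap):
    """P(X >= n_overlap) when n_degs genes are drawn without replacement
    from n_background genes, of which n_term are annotated to the term."""
    upper = min(n_term, n_degs)
    total = comb(n_background, n_degs)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Small hand-checkable case: 10 background genes, 4 annotated to the term,
# 3 DEGs, 2 of which carry the annotation.
# (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = 40/120
p_small = hypergeom_enrichment_p(10, 4, 3, 2)
print(p_small)
```

A term would be called enriched under the criterion in the text when this P-value falls below 0.05; any multiple-testing correction across terms would be an additional step not shown here.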
Analysis of protein-protein interaction (PPI) networks
The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) (version 3.6.0; string-db.org/cgi/input.pl) provides experimental and predicted interactions among proteins. STRING analyses were performed to form a PPI network, with the criterion of a combined score of >0.4. The DEGs, ESR1 (estrogen receptor 1), ESR2 (estrogen receptor 2) and GPER (G protein-coupled estrogen receptor) were used as queries in the STRING database and the resultant PPI network was subsequently visualized using Cytoscape software (version 3.6.0; cytoscape.org). CytoHubba (a plug-in in Cytoscape; version 1.6; http://apps.cytoscape.org/apps/cytohubba) was used to identify the estrogen receptor-related hub genes (the top 20 genes with the most connections in the PPI network).
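Ranking genes by their number of connections can be sketched in a few lines of Python. This is an assumption-laden illustration, not the CytoHubba implementation: it uses simple node degree because the text describes hub genes as those "with the most connections", and the edge list and scores below are invented, with gene symbols borrowed from the text purely as labels. The function name `hub_genes_by_degree` is hypothetical.

```python
from collections import Counter


def hub_genes_by_degree(edges, min_score=0.4, top_n=20):
    """edges: iterable of (protein_a, protein_b, combined_score).

    Keeps edges with combined score > min_score, counts each undirected
    edge once, and returns the top_n (gene, degree) pairs.
    """
    degree = Counter()
    seen = set()
    for a, b, score in edges:
        pair = frozenset((a, b))
        if a == b or score <= min_score or pair in seen:
            continue
        seen.add(pair)
        degree[a] += 1
        degree[b] += 1
    # Sort by degree (descending), then by name so ties are deterministic.
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Invented edges and scores, for illustration only (not STRING output).
toy_edges = [
    ("ESR1", "CAV1", 0.9),
    ("ESR1", "MMP9", 0.7),
    ("ESR1", "SPP1", 0.5),
    ("CAV1", "MMP9", 0.6),
    ("SPP1", "COL1A1", 0.8),
    ("MMP9", "COL1A1", 0.3),  # dropped: score not above 0.4
    ("CAV1", "ESR1", 0.9),    # duplicate of the first edge, counted once
]

top_hubs = hub_genes_by_degree(toy_edges, top_n=3)
print(top_hubs)
```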
DEGs prognosis analysis
The Kaplan-Meier plotter (kmplot.com) (16) is an online database including clinical and expression data. The median expression level of each gene was used to divide patients into high and low groups. The Kaplan-Meier plotter was used to identify the hub genes with significant effects on prognosis. P<0.05 was considered to indicate a statistically significant difference.
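The median split and the group comparison can be sketched as follows. This is not the code behind the Kaplan-Meier plotter; it is a minimal two-group log-rank test written from the standard formula, with the assumption that patients whose expression equals the median are placed in the low group. All names (`split_by_median`, `logrank_test`) and values are hypothetical.

```python
from math import erfc, sqrt
from statistics import median


def split_by_median(expression):
    """Return a list of booleans: True where expression is above the median."""
    cut = median(expression)
    return [value > cut for value in expression]


def logrank_test(times, events, high):
    """Two-group log-rank test.

    times: follow-up times; events: 1 = death, 0 = censored;
    high: True for the high-expression group.
    Returns (chi_square, p_value) with 1 degree of freedom.
    """
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [(h, tt, e) for tt, e, h in zip(times, events, high) if tt >= t]
        n = len(at_risk)
        n_high = sum(1 for h, _, _ in at_risk if h)
        d = sum(1 for _, tt, e in at_risk if tt == t and e)
        d_high = sum(1 for h, tt, e in at_risk if h and tt == t and e)
        # Observed minus expected deaths in the high group at time t.
        observed_minus_expected += d_high - d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi_square / 2))
    return chi_square, p_value


# Invented follow-up data, for illustration only.
times = [5, 8, 12, 20, 25, 30]
events = [1, 1, 1, 0, 1, 0]
expression = [9.1, 8.7, 7.9, 3.2, 2.5, 1.1]

high = split_by_median(expression)
chi_square, p_value = logrank_test(times, events, high)
print(high, chi_square, p_value)
```

With real data, a P-value below 0.05 from such a test would correspond to the significance criterion stated above.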
Statistical analysis
Statistical analysis was performed using R (version 3.4.3; www.r-project.org). For the microarray and RNA-sequencing data analysis, the LIMMA package in Bioconductor was used to identify the DEGs. The thresholds for identifying DEGs were an adjusted P<0.05 and an absolute log2FC >2. The SVA package was used for batch normalization of GSE32863 and GSE75037. The log-rank test was used to compare survival between groups. P<0.05 was considered to indicate a statistically significant difference.
Results
Identification of DEGs
According to the screening criteria, 248 DEGs (57 upregulated and 191 downregulated) were identified in the GEO data, and 2,362 DEGs (1,773 upregulated and 589 downregulated) were identified in the TCGA data. The DEGs at the intersection of the two databases were selected for further investigation, revealing 170 DEGs between lung adenocarcinoma and normal lung tissues from non-smoking female patients. All 248 DEGs in the GEO database and the 2,362 DEGs in the TCGA database were visualized using a heat map, and a Venn diagram was used to present the DEGs at the intersection of the two databases (Fig. 1). The top 10 (by fold change) upregulated and downregulated DEGs in the GEO and TCGA databases are presented in Table I.
Figure 1.
DEGs in non-smoking female patients with lung adenocarcinoma. (A) The DEGs in the GSE32863 and GSE75037 datasets, which were downloaded from the GEO database. There were 248 DEGs, comprising 57 upregulated and 191 downregulated genes. (B) The DEGs from the TCGA database. There were 2,362 DEGs, comprising 1,773 upregulated and 589 downregulated genes. Red indicates genes that were upregulated and green indicates downregulated genes. Each column represents a tissue sample; each row represents a gene. (C) Venn diagram presenting the 170 common DEGs between the GEO and TCGA datasets. DEGs, differentially expressed genes; GEO, Gene Expression Omnibus; TCGA, The Cancer Genome Atlas.
Table I.
The top 10 (by fold change) upregulated and downregulated DEGs in the GEO and the TCGA databases.
DEGs, differentially expressed genes; GEO, Gene Expression Omnibus; TCGA, The Cancer Genome Atlas.
DEGs enrichment analysis
The 170 DEGs were grouped into BP (biological process; 56 BP terms were significantly enriched), MF (molecular function; 17 MF terms were significantly enriched) and CC (cellular component; 23 CC terms were significantly enriched) categories. The most enriched GO terms in the BP category were ‘cell adhesion’, ‘receptor-mediated endocytosis’ and ‘angiogenesis’. The most enriched GO terms in the MF category were ‘heparin binding’ and ‘carbohydrate binding’. The most enriched GO terms in the CC category were ‘extracellular region’, ‘proteinaceous extracellular matrix’ and ‘extracellular space’ (Fig. 2).
Figure 2.
Enrichment analyses of differentially expressed genes. The 170 differentially expressed genes were grouped into (A) BP, (B) MF, (C) CC and (D) KEGG categories. BP, biological process; MF, molecular function; CC, cellular component; KEGG, Kyoto Encyclopedia of Genes and Genomes.
KEGG pathway enrichment analysis (8 KEGG terms were significantly enriched) revealed that the majority of the DEGs were enriched in pathways including ‘malaria’ and ‘ECM-receptor interaction’ (Fig. 2).
PPI network analysis
A PPI network was constructed to understand the biological significance of the DEGs. The PPI network consisted of 124 nodes and 266 interactions (Fig. 3). The network between the estrogen receptors (ESR1, ESR2 and GPER) and the DEGs was also constructed. In addition, a network of hub genes associated with the estrogen receptors was constructed. The PPI network analysis identified 27 DEGs that were considered as hub genes in the network. Four hub genes were closely associated with the estrogen receptors: caveolin 1 (CAV1), matrix metalloproteinase 9 (MMP9), secreted phosphoprotein 1 (SPP1) and collagen type I α 1 chain (COL1A1), as presented in Fig. 4.
Figure 3.
PPI network showing experimentally verified and predicted interaction information among the proteins encoded by the differentially expressed genes. There were 124 nodes and 266 interactions in the PPI network. The key below the network shows which lines indicate experimentally verified interactions and which indicate predicted interactions. PPI, protein-protein interaction.
Figure 4.
The network between DEGs and genes encoding estrogen receptors. Estrogen receptors comprise three subtypes: ESR1, ESR2, and GPER (presented in red). The hub genes (presented in green) make up the small circle. The big circle together with the small circle represents all the DEGs. DEGs, differentially expressed genes; ESR1, estrogen receptor 1; ESR2, estrogen receptor 2; GPER, G protein-coupled estrogen receptor.
Kaplan-Meier survival analysis
Kaplan-Meier curves were used to assess the effect of estrogen receptor-associated hub genes on the overall survival (OS) of 121 non-smoking females with lung adenocarcinoma. Low expression of SPP1 and high expression of CAV1 were associated with improved OS. However, there was no significant association between MMP9 or COL1A1 expression status and OS (Fig. 5).
Figure 5.
Effect on OS of four hub genes associated with the estrogen receptor in the network. The prognostic effects of (A) CAV1, (B) SPP1, (C) COL1A1 and (D) MMP9 were plotted using the Kaplan-Meier plotter. Red lines indicate patients with high expression of the gene. Black lines indicate patients with low expression of the gene. Low expression of SPP1 and high expression of CAV1 were associated with increased OS, whereas COL1A1 and MMP9 expression were not significantly associated with survival. The 95% confidence interval of the HR is also presented. OS, overall survival; SPP1, secreted phosphoprotein 1; CAV1, caveolin 1; HR, hazard ratio.
Discussion
The pathogenesis and prognostic factors for lung adenocarcinoma in non-smoking females remain controversial (3). Previous studies have suggested that estrogen and its receptor (ER) may serve important roles. In vitro studies have revealed that the ER promotes NSCLC vasculogenic mimicry and cell invasion (17). The ER is also activated in EGFR-tyrosine kinase inhibitor-mediated secondary resistance (18). In addition, a high expression level of ER is a significant prognostic factor for survival in advanced NSCLC (19). There are three types of estrogen receptors: ERα, ERβ and GPER. ERα and ERβ are important nuclear transcription factors located in the cell nucleus. GPER is a G-protein coupled receptor containing seven transmembrane domains located in the cell membrane (19). Few studies have investigated the association between ERs and lung adenocarcinoma in non-smoking female patients (20,21).
In the present study, gene expression profiles from the GSE32863 and GSE75037 datasets and data from the TCGA database were analyzed using bioinformatic methods. A total of 170 DEGs between lung adenocarcinoma and normal lung tissue samples from non-smoking women were common to both databases. Additionally, the GO terms and KEGG pathways associated with these DEGs, which might significantly affect lung adenocarcinoma in non-smoking females, were identified.
The PPI network analysis identified 27 DEGs as hub genes in the network. A network consisting of the ERs, DEGs and hub genes was constructed, and it revealed that the hub genes CAV1, SPP1, MMP9 and COL1A1 are closely associated with ER function.
CAV1 is the main structural component of the caveolae, which form flask-shaped invaginations that are involved in cell signaling and transport (22).
Low expression levels of CAV1 induce a hyper-proliferative state, promoting cell proliferation, angiogenesis and tumor progression in certain tumors, suggesting that loss of CAV1 regulation is an important step in the acquisition of a transformed phenotype (23). Ramírez et al (24) demonstrated that ERα is present in caveolae and is stabilized by CAV1. Interactions between ERα and CAV1 were demonstrated using epitope proximity ligation assays (25). In vitro, the association between ERα and caveolin-1 increased in tumors that regressed in response to estradiol (26). In addition, CAV1 is associated with prognosis in cancer, such as in breast, colon and ovarian carcinoma (22). In the present study, non-smoking female patients with lung adenocarcinoma with high CAV1 expression had an improved prognosis compared with patients with low CAV1 expression. The interaction between CAV1 and ERα may therefore serve an important role in the pathogenesis and prognosis of lung adenocarcinoma in non-smoking female patients.
SPP1 encodes secreted phosphoprotein 1, also known as OPN. OPN is a highly phosphorylated glycophosphoprotein rich in aspartic acid, which facilitates cell-matrix interactions (27). Previous studies investigating lung cancer have reported that tumor development, progression and metastasis are promoted by increasing the release of OPN (28–30). In addition, OPN expression levels were significantly associated with lung cancer differentiation and the efficacy of platinum-based treatment (31). OPN levels may have a significant predictive potential in estimating survival in NSCLC, and high OPN expression levels were significantly associated with poor prognosis in NSCLC compared with low expression (32). The survival analysis of non-smoking female patients with lung adenocarcinoma in the current study supported the aforementioned study in NSCLC, as a low expression level of OPN was associated with improved prognosis compared with high expression.
SPP1 may be a candidate molecular marker associated with the pathogenesis and prognosis of lung adenocarcinoma in non-smoking female patients.
MMP9 regulates various cellular behaviors associated with cancer cell differentiation, migration, invasion and immune system surveillance (33). Suppression of ESR2 in breast cancer cells may affect the expression of MMP9 through microRNA-145 (34). In lung adenocarcinoma, downregulation of ESR2 inhibits cell growth through decreased expression of MMP9 (8). MMP9 may therefore be implicated in lung adenocarcinoma in non-smoking women.
COL1A1 is dysregulated in a variety of tumors, including breast and gastric cancer (35). COL1A1 gene expression is inhibited by halofuginone, resulting in inhibition of the proliferation of bladder carcinoma cells (36). Thus, COL1A1 may serve as a potential therapeutic target for lung adenocarcinoma in non-smoking female patients.
The present study had a number of limitations. The expression levels of the DEGs, their functions, the hub genes and the association between the ERs and DEGs require experimental validation. The lack of tissue samples and clinical data from newly diagnosed non-smoking female patients with lung adenocarcinoma was also a limitation, which should be addressed in future studies.
In conclusion, the present study used bioinformatics analysis to explore the pathogenesis of lung adenocarcinoma in non-smoking female patients and to identify prognostic biomarkers for this disease. Additionally, the genetic and molecular effects of estrogen in non-smoking female patients with lung adenocarcinoma were investigated. The results obtained in the present study provide novel insights into the molecular mechanisms of lung adenocarcinoma in non-smoking female patients.