Identification of novel gene expression signature in lung adenocarcinoma by using next-generation sequencing data and bioinformatics analysis.

Ya-Ling Hsu1, Jen-Yu Hung2,3, Yen-Lung Lee4, Feng-Wei Chen5, Kuo-Feng Chang6, Wei-An Chang3,5, Ying-Ming Tsai1,3, Inn-Wen Chong3,7, Po-Lin Kuo5,8,9.   

Abstract

Lung adenocarcinoma is one of the leading causes of cancer-related death worldwide. We profiled the transcriptomes of three pairs of tumors and adjacent non-tumor lung tissues using next-generation sequencing (NGS) to screen protein-coding RNAs and microRNAs. Combined with meta-analysis of the Oncomine and Gene Expression Omnibus (GEO) databases, we identified a representative gene expression signature in lung adenocarcinoma, comprising 9 upregulated and 8 downregulated genes. Analysis of the effect of each gene's expression on survival outcome indicated that 6 genes (AGR2, SPDEF, CDKN2A, CLDN3, SFN, and PHLDA2) play oncogenic roles, and 7 genes (PDK4, FMO2, CPED1, GNG11, IL33, BTNL9, and FABP4) act as tumor suppressors in lung adenocarcinoma. We also identified putative genetic interactions involving 5 upregulated microRNAs and their specific targets: hsa-miR-183-5p–BTNL9, hsa-miR-33b-5p–CPED1, hsa-miR-429–CPED1, hsa-miR-182-5p–FMO2, and hsa-miR-130b-5p–IL33. These 5 microRNAs have been shown to be associated with tumorigenesis in lung cancer. Our findings suggest that these genetic interactions play important roles in the progression of lung adenocarcinoma. We propose that this pattern of altered gene expression may represent a novel signature in lung adenocarcinoma, which may be developed for diagnostic and therapeutic strategies in the future.

Keywords:  bioinformatics; lung adenocarcinoma; messenger RNA; microRNA; next-generation sequencing

Year:  2017        PMID: 29285217      PMCID: PMC5739604          DOI: 10.18632/oncotarget.21022

Source DB:  PubMed          Journal:  Oncotarget        ISSN: 1949-2553


INTRODUCTION

Lung cancer is one of the leading causes of cancer-related death worldwide [1]. The development and progression of lung cancer have been widely studied. Briefly, genetic alterations or mutations occur in a single cell, leading to cellular transformation and subsequent expansion into a malignant tumor [2]. Non-small cell lung carcinoma (NSCLC) accounts for about 80–85% of all lung cancers [3, 4], and lung adenocarcinoma (40%) is the most common subtype of NSCLC [5]. Surgical resection, with or without radiotherapy/chemotherapy, is the standard approach for early-stage lung cancer. However, recurrence with distant metastasis [6, 7] or resistance to therapy [8, 9] often occurs, and these phenomena are associated with critical genetic alterations involved in various biological mechanisms. The genetic alterations related to cellular transformation are involved in various biological processes, including transcription [10], DNA repair [11], cell cycle progression [12], apoptosis [13], migration ability [14], and metabolism [15, 16]. In lung cancer, many genes have been identified as oncogenes or tumor suppressor genes, which modulate a variety of molecular functions involved in tumor development and progression [17, 18]. Recently, small RNAs have been found to play important roles in lung cancer. MicroRNAs are a group of small non-coding RNAs of 20–26 nucleotides that regulate gene expression by binding to the 3′ untranslated region (3′UTR) of specific messenger RNAs (mRNAs). This interaction can cause mRNA degradation or translational inhibition [19]. The signaling pathways associated with microRNAs targeting oncogenes or tumor suppressor genes have been reported to be involved in lung cancer progression [20]. Alterations of many microRNAs located in chromosomal regions associated with various cancers have also been reported [21].
Next-generation sequencing (NGS) is a powerful method to screen the entire transcriptomic profile, including messenger RNAs and small RNAs [22]. In this study, we sought to identify a novel gene expression signature and microRNA–target gene interactions in lung adenocarcinoma by combining NGS with systematic analysis using bioinformatics tools, including Oncomine [23], Gene Expression Omnibus (GEO) [24], PrognoScan [25], Kaplan–Meier plotter [26], SurvExpress [27], and miRmap [28]. We hope that the approach and findings of this study will provide new perspectives on the development of diagnostic and therapeutic strategies for lung adenocarcinoma.

RESULTS

Identification of differentially expressed genes as a molecular signature in lung adenocarcinoma

To investigate gene expression changes in lung adenocarcinoma, we analyzed the transcriptomic profiles of 3 pairs of human lung adenocarcinoma specimens and their adjacent normal lung tissue using next-generation sequencing (Figure 1). Focusing on protein-coding RNAs, Venn diagram analysis showed that 9 genes were upregulated (Figure 1A) and 8 genes were downregulated (Figure 1B) in lung adenocarcinoma tissue compared to adjacent normal lung tissue. The analysis criteria were fold change > 2 and fragments per kilobase of transcript per million mapped reads (FPKM) > 0.3. Hierarchical clustering showed the expression pattern of each gene as z-scores (log2) in these 3 pairs of specimens (Figure 1C). The 17 differentially expressed genes and their FPKM values are listed in Table 1. To investigate whether this expression pattern could represent a molecular signature in lung adenocarcinoma, we examined these genes in the Oncomine database, which contains multiple datasets of lung adenocarcinoma and normal lung tissue specimens. We selected 7 datasets from the Oncomine database for comparison of gene expression, including Hou (normal = 65 and lung adenocarcinoma = 45), Landi (normal = 49 and lung adenocarcinoma = 58), Selamat (normal = 58 and lung adenocarcinoma = 58), Okayama (normal = 20 and lung adenocarcinoma = 226), Su (normal = 30 and lung adenocarcinoma = 27), Wei (normal = 25 and lung adenocarcinoma = 25), and Stearman (normal = 19 and lung adenocarcinoma = 20). The heatmap analysis indicated that the expression patterns of the 17 genes were similar among these datasets (Figure 2), which suggests that this molecular change is consistent in lung adenocarcinoma and may represent a novel genetic signature.
Figure 1

Identification of differentially expressed genes in lung adenocarcinoma compared to adjacent normal tissue using next-generation sequencing

Venn diagram analysis showed 9 upregulated genes (A) and 8 downregulated genes (B) in lung adenocarcinoma, compared to adjacent non-adenocarcinoma tissue from 3 pairs of clinical specimens. The criteria were fold change > 2 (tumor/normal) and fragments per kilobase of transcript per million mapped reads (FPKM) > 0.3. (C) The heatmap diagram showed the differentially expressed genes with z-score (log2) values, using color clustering in the GENE-E web tool. Green represents downregulation (minimum = −2.5), and red represents upregulation (maximum = 2.5).
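The z-score (log2) scaling used for the heatmap can be illustrated with a minimal sketch. It assumes that each gene's FPKM values are log2-transformed and then standardized across the six samples with the sample standard deviation; the exact settings used in GENE-E are not stated in the text, so the function name `row_zscores` and these choices are illustrative assumptions. The input values are the TOX3 row of Table 1.

```python
import math
import statistics

# FPKM values in the order N1, T1, N2, T2, N3, T3 (TOX3 row of Table 1).
tox3_fpkm = [1.65, 11.99, 0.70, 11.57, 1.08, 31.11]

def row_zscores(fpkm_values):
    """Z-scores of log2(FPKM) across the samples of one gene."""
    logged = [math.log2(v) for v in fpkm_values]
    mean = statistics.mean(logged)
    sd = statistics.stdev(logged)  # sample standard deviation (an assumption)
    return [(x - mean) / sd for x in logged]

z = row_zscores(tox3_fpkm)
for label, score in zip(["N1", "T1", "N2", "T2", "N3", "T3"], z):
    print(f"{label}: {score:+.2f}")
```

For an upregulated gene such as TOX3, the normal samples fall below zero and the tumor samples above zero, which is what the green and red cells of the heatmap encode.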

Table 1

Differentially expressed genes identified from next-generation sequencing data

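| Gene | Description | N1 | T1 | N2 | T2 | N3 | T3 | T/N |
|---|---|---|---|---|---|---|---|---|
| TOX3 | TOX High Mobility Group Box Family Member 3 | 1.65 | 11.99 | 0.70 | 11.57 | 1.08 | 31.11 | Up |
| AGR2 | Anterior gradient 2, Protein Disulphide Isomerase Family Member | 340.96 | 1530.74 | 126.15 | 864.49 | 148.26 | 955.51 | Up |
| SPDEF | SAM Pointed Domain Containing ETS Transcription Factor | 5.11 | 37.10 | 0.48 | 3.91 | 0.88 | 5.90 | Up |
| CDKN2A | Cyclin-Dependent Kinase Inhibitor 2A | 7.04 | 298.07 | 10.05 | 129.69 | 4.94 | 80.09 | Up |
| AQP5 | Aquaporin 5 | 41.76 | 333.28 | 12.70 | 59.02 | 14.62 | 107.83 | Up |
| CLDN3 | Claudin 3 | 24.83 | 104.71 | 6.02 | 38.53 | 7.31 | 39.11 | Up |
| SFN | Stratifin | 27.16 | 75.72 | 7.48 | 59.63 | 9.59 | 65.88 | Up |
| PHLDA2 | Pleckstrin Homology-Like Domain, Family A, Member 2 | 9.07 | 96.14 | 2.60 | 19.93 | 2.36 | 18.58 | Up |
| ZDHHC9 | Zinc finger, DHHC-type containing 9 | 17.89 | 53.14 | 12.38 | 49.49 | 11.08 | 78.66 | Up |
| PDK4 | Pyruvate Dehydrogenase Kinase 4 | 30.91 | 10.82 | 72.24 | 3.18 | 47.76 | 7.63 | Down |
| FMO2 | Flavin Containing Monooxygenase 2 | 50.62 | 13.47 | 138.82 | 11.57 | 185.08 | 15.11 | Down |
| NDRG4 | NDRG Family Member 4 | 13.43 | 3.88 | 40.80 | 5.08 | 26.46 | 6.07 | Down |
| CPED1 | Cadherin-like and PC-esterase Domain Containing 1 | 16.73 | 5.70 | 19.42 | 5.68 | 23.24 | 5.39 | Down |
| GNG11 | G Protein Subunit Gamma 11 | 32.96 | 11.15 | 57.85 | 5.37 | 42.70 | 12.85 | Down |
| IL33 | Interleukin 33 | 117.77 | 33.34 | 133.98 | 16.49 | 153.31 | 15.49 | Down |
| BTNL9 | Butyrophilin-like 9 | 19.57 | 6.43 | 32.15 | 0.92 | 26.15 | 5.25 | Down |
| FABP4 | Fatty Acid Binding Protein 4 | 252.56 | 74.90 | 486.96 | 5.08 | 459.21 | 49.26 | Down |

Values in columns N1–T3 are FPKM (fragments per kilobase of transcript per million mapped reads). N, adjacent normal tissue; T, tumor.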
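The selection criteria behind Table 1 (fold change > 2 and FPKM > 0.3) can be sketched as a per-pair filter followed by an intersection across the 3 pairs, which is one reading of the Venn diagram analysis. The sketch assumes that both samples of a pair must exceed the FPKM threshold and that a gene must change in the same direction in all 3 pairs; neither detail is spelled out in the text. The function names and the row `GENE_X` are hypothetical; the other rows are taken from Table 1.

```python
# FPKM values in the order N1, T1, N2, T2, N3, T3.
fpkm = {
    "TOX3": (1.65, 11.99, 0.70, 11.57, 1.08, 31.11),   # Table 1
    "SFN": (27.16, 75.72, 7.48, 59.63, 9.59, 65.88),   # Table 1
    "PDK4": (30.91, 10.82, 72.24, 3.18, 47.76, 7.63),  # Table 1
    "GENE_X": (10.0, 25.0, 12.0, 13.0, 9.0, 30.0),     # hypothetical, fails in pair 2
}

def call_pair(normal, tumor, min_fc=2.0, min_fpkm=0.3):
    """Return 'up', 'down', or None for one tumor/normal pair."""
    if min(normal, tumor) <= min_fpkm:  # assumed: both samples must be expressed
        return None
    if tumor / normal > min_fc:
        return "up"
    if normal / tumor > min_fc:
        return "down"
    return None

def classify(values):
    """Keep a gene only if all 3 pairs agree on the same direction."""
    calls = {call_pair(values[i], values[i + 1]) for i in (0, 2, 4)}
    return calls.pop() if len(calls) == 1 else None

results = {gene: classify(values) for gene, values in fpkm.items()}
print(results)
```

Under these assumptions TOX3 and SFN are called up, PDK4 is called down, and the hypothetical `GENE_X` is dropped because its second pair shows a fold change below 2.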
Figure 2

Comparison of differentially expressed genes in clinical lung adenocarcinoma and normal lung tissue by Oncomine database analysis

Seven microarray datasets from the Oncomine database were used to analyze gene expression patterns (lung adenocarcinoma vs. normal), including (A) Hou, (B) Landi, (C) Selamat, (D) Okayama, (E) Su, (F) Wei, and (G) Stearman. Seventeen differentially expressed genes (9 up and 8 down) identified from 3 pairs of clinical lung adenocarcinoma specimens were selected. Raw data were extracted and re-plotted with the GENE-E web tool, and the relative color scheme was used for clustering analysis. Yellow represents high expression (maximum = 1) and blue represents low expression (minimum = 0). The gene symbols and corresponding specific probes are displayed on the right side of each diagram.

We classified the 17 differentially expressed genes into 6 groups by biological and molecular function, based on literature searches (Table 2; Supplementary Table 1): (1) transcription regulation – TOX3 and SPDEF, (2) metabolism – PDK4, FABP4, and FMO2, (3) cell cycle regulation – CDKN2A, PHLDA2, SFN, and NDRG4, (4) cellular migration – AGR2, AQP5, and CLDN3, (5) inflammation – IL33, and (6) undefined group – ZDHHC9, BTNL9, GNG11, and CPED1 (also known as C7orf58). To further elucidate the role of these expression changes in cancer progression, we performed survival curve analysis using the PrognoScan, Kaplan–Meier plotter, and SurvExpress databases.
Table 2

Functional classification of differentially expressed genes

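| Functions | Genes | Fold change (Cancer/Normal) | References# |
|---|---|---|---|
| Transcriptional regulation | TOX3 | UP | 1 |
| | SPDEF | UP | 2 |
| Metabolism | PDK4 | DOWN | 3 |
| | FABP4 | DOWN | 4 |
| | FMO2 | DOWN | 5, 6 |
| Cell cycle regulation | CDKN2A | UP | 7 |
| | PHLDA2 | UP | 8, 9 |
| | NDRG4 | DOWN | 10 |
| | SFN | UP | 11 |
| Cellular migration | AGR2 | UP | 12 |
| | AQP5 | UP | 13 |
| | CLDN3 | UP | 14 |
| Inflammation | IL33 | DOWN | 15 |
| Others | ZDHHC9 | UP | 16 |
| | BTNL9 | DOWN | 17 |
| | GNG11 | DOWN | 18 |
| | CPED1 | DOWN | 19 |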

# References are listed in Supplementary Table 1.

Analysis of TOX3 and SPDEF in lung adenocarcinoma

The mRNA expression of TOX3 and SPDEF in lung adenocarcinoma and normal lung tissue, derived from the Oncomine database, is listed in Table 3. The analysis criteria were fold change > 2, p-value < 1E-04, and gene ranking in the top 10%. The results showed that both TOX3 and SPDEF are significantly upregulated in lung adenocarcinoma compared to normal tissue. In addition, we selected the microarray dataset with accession number GSE10072 from the Gene Expression Omnibus (GEO) database for gene expression analysis. This array contains 31 pairs of clinical lung adenocarcinomas and adjacent normal tissue. The results showed that the mRNA expression of both TOX3 (Figure 3A) and SPDEF (Figure 3B) is upregulated in lung adenocarcinoma. To further investigate the role of TOX3 and SPDEF expression in cancer progression, we performed survival curve analysis to evaluate the effects of gene expression in patients with lung adenocarcinoma. The results indicated that patients with higher TOX3 expression have better survival (Figure 3C–3G), whereas those with higher SPDEF expression have poorer survival (Figure 3H, 3I). The prognostic values of TOX3 and SPDEF expression in lung adenocarcinoma are shown as forest plots (Figure 3J), which were derived from the PrognoScan database with a Cox p-value < 0.05 and the Kaplan–Meier plotter database with a log-rank p-value < 0.05.
Table 3

Analysis of TOX3 and SPDEF mRNA expression in lung adenocarcinoma compared to normal tissue from Oncomine database

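| Gene | Fold change (Cancer/Normal) | P-value | Gene ranking (top %) | Samples (Normal : Tumor) | Dataset | Probe |
|---|---|---|---|---|---|---|
| TOX3 | 8.685 | 1.18E-22 | 1 | 20 : 226 | Okayama | 215108_x_at |
| | 19.973 | 1.86E-24 | 1 | 25 : 25 | Wei | 214774_x_at |
| | 4.968 | 2.80E-18 | 2 | 49 : 58 | Landi | 216623_x_at |
| | 8.497 | 1.79E-9 | 1 | 30 : 27 | Su | 216623_x_at |
| | 3.15 | 3.09E-4 | 2 | 17 : 132 | Bhattacharjee | 37426_at |
| | 12.617 | 2.17E-7 | 3 | 19 : 20 | Stearman | 37426_at |
| | 3.764 | 4.72E-7 | 10 | 65 : 45 | Hou | 216623_x_at |
| SPDEF | 3.719 | 1.54E-11 | 5 | 20 : 226 | Okayama | 213441_x_at |
| | 3.451 | 2.64E-12 | 1 | 25 : 25 | Wei | 220192_x_at |
| | 5.44 | 2.86E-9 | 1 | 30 : 27 | Su | 220192_x_at |
| | 2.12 | 1.94E-7 | 9 | 65 : 45 | Hou | 220192_x_at |
| | 3.844 | 9.6E-17 | 2 | 58 : 58 | Selamat | ILMN_2161330 |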
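The three Oncomine criteria stated in the text (fold change > 2, p-value < 1E-04, gene ranking in the top 10%) can be expressed as a simple row filter. The sketch below is illustrative only: the class `OncomineRow`, the function `passes`, and the rows `GENE_X` and `GENE_Y` are hypothetical, and the use of the absolute fold change is an assumption made so that the same filter also works for downregulated genes reported with negative fold changes.

```python
from dataclasses import dataclass

@dataclass
class OncomineRow:
    gene: str
    dataset: str
    fold_change: float      # signed; negative values denote lower expression in cancer
    p_value: float
    rank_top_percent: int   # gene ranking, expressed as "top N%"

def passes(row, min_fc=2.0, max_p=1e-4, max_rank=10):
    """Apply the three thresholds stated in the text."""
    return (
        abs(row.fold_change) > min_fc
        and row.p_value < max_p
        and row.rank_top_percent <= max_rank
    )

rows = [
    OncomineRow("TOX3", "Okayama", 8.685, 1.18e-22, 1),    # from Table 3
    OncomineRow("SPDEF", "Hou", 2.12, 1.94e-7, 9),         # from Table 3
    OncomineRow("GENE_X", "hypothetical", 1.6, 3e-9, 4),   # fold change too small
    OncomineRow("GENE_Y", "hypothetical", 4.2, 2e-3, 12),  # p-value and rank fail
]

kept = [r.gene for r in rows if passes(r)]
print(kept)
```

Only the two rows taken from Table 3 pass all three thresholds in this example.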
Figure 3

Analysis of TOX3 and SPDEF in clinical lung adenocarcinoma patients using bioinformatics databases

The gene expression of TOX3 (A) and SPDEF (B), comparing 31 pairs of clinical lung adenocarcinoma (red) and adjacent normal tissue (blue), was performed on GSE10072 microarray from the GEO database. (3 probes for TOX3 and SPDEF respectively in GSE10072) The p-value of gene expression was calculated by t-test with Wilcoxon matched-pairs signed rank test. *** represents p < 0.001, ** represents p < 0.01. The survival curves comparing 2 populations with high (red) and low (black) gene expression in lung adenocarcinoma patients were performed on the PrognoScan database - TOX3 (C–F) and SPDEF (H, I), and the Kaplan–Meier plotter database - TOX3 (G). The analysis criteria of the PrognoScan and Kaplan–Meier plotter databases were Cox p-value < 0.05 and log-rank p-value < 0.05 respectively. The raw data of GEO and PrognoScan databases were extracted and re-plotted by GraphPad Prism 5 software. (J) The forest plots showed hazard ratios (95% CI, confidence interval) identified from PrognoScan and Kaplan–Meier plotter databases.
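The paired comparison named in the caption, the Wilcoxon matched-pairs signed rank test, can be sketched in plain Python for a small number of pairs. This is not the GraphPad Prism procedure used for the figure; it is a minimal exact two-sided version that enumerates all sign assignments, and the eight expression pairs are invented for illustration. The function names are hypothetical.

```python
from itertools import product

def signed_ranks(normal, tumor):
    """Rank |tumor - normal| (average ranks for ties) and attach each difference's sign."""
    diffs = [t - n for n, t in zip(normal, tumor) if t != n]  # zero differences dropped
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        i = j + 1
    return [r if d > 0 else -r for r, d in zip(ranks, diffs)]

def wilcoxon_exact(normal, tumor):
    """Return (W, exact two-sided p) where W = min(W+, W-). Suitable for small n only."""
    magnitudes = [abs(r) for r in signed_ranks(normal, tumor)]
    total = sum(magnitudes)
    w_plus = sum(r for r in signed_ranks(normal, tumor) if r > 0)
    w_obs = min(w_plus, total - w_plus)
    hits = 0
    for signs in product((0, 1), repeat=len(magnitudes)):
        w = sum(m for m, s in zip(magnitudes, signs) if s)
        if min(w, total - w) <= w_obs:
            hits += 1
    return w_obs, hits / 2 ** len(magnitudes)

# Hypothetical probe intensities for 8 matched pairs.
normal = [210, 190, 250, 205, 198, 230, 220, 185]
tumor = [420, 395, 330, 240, 410, 520, 215, 450]
w, p = wilcoxon_exact(normal, tumor)
print(w, p)  # 1 0.015625
```

With 31 pairs, as in GSE10072, exhaustive enumeration is impractical and a normal approximation or a library routine would be used instead.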

Analysis of PDK4, FMO2, and FABP4 in lung adenocarcinoma

The mRNA expression of PDK4, FMO2, and FABP4 in lung adenocarcinoma and normal lung tissue was analyzed using the Oncomine database and is listed in Table 4. The expression of PDK4, FMO2, and FABP4 is significantly downregulated in lung adenocarcinoma compared to normal lung tissue in several datasets, and this was also observed in the GSE10072 array (Figure 4A–4C). In patients with lung adenocarcinoma, survival curve analysis indicated that higher expression of PDK4 (Figure 4D–4F), FMO2 (Figure 4G–4I), or FABP4 (Figure 4J) is correlated with better survival. The prognostic values of PDK4, FMO2, and FABP4 are shown in Figure 4K.
Table 4

Analysis of PDK4, FMO2 and FABP4 mRNA expression in lung adenocarcinoma compared to normal tissue from Oncomine database

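| Gene | Fold change (Cancer/Normal) | P-value | Gene ranking (top %) | Samples (Normal : Tumor) | Dataset | Probe |
|---|---|---|---|---|---|---|
| PDK4 | −5.162 | 1.27E-24 | 1 | 20 : 226 | Okayama | 225207_at |
| | −9.834 | 2.27E-13 | 2 | 25 : 25 | Wei | 225207_at |
| | −2.415 | 1.92E-16 | 4 | 49 : 58 | Landi | 205960_at |
| | −4.074 | 2.98E-7 | 5 | 30 : 27 | Su | 205960_at |
| | −4.33 | 1.56E-4 | 8 | 17 : 132 | Bhattacharjee | 36739_at |
| | −2.618 | 1.77E-4 | 9 | 19 : 20 | Stearman | 36739_at |
| | −4.593 | 6.29E-27 | 2 | 58 : 58 | Selamat | ILMN_1684982 |
| | −3.095 | 9.03E-7 | 7 | 10 : 86 | Beer | U54617_at |
| | −10.368 | 2.77E-6 | 2 | 5 : 40 | Garber | IMAGE:78946 |
| FMO2 | −3.77 | 4.98E-18 | 1 | 20 : 226 | Okayama | 211726_s_at |
| | −7.811 | 4.49E-15 | 2 | 25 : 25 | Wei | 228268_at |
| | −5.528 | 6.60E-28 | 1 | 49 : 58 | Landi | 211726_s_at |
| | −4.329 | 2.47E-10 | 2 | 30 : 27 | Su | 211726_s_at |
| | −8.705 | 1.62E-53 | 1 | 58 : 58 | Selamat | ILMN_1732158 |
| | −9.102 | 1.02E-24 | 1 | 65 : 65 | Hou | 228268_at |
| | −7.56 | 7.38E-6 | 2 | 5 : 39 | Garber | IMAGE:80507 |
| | −11.062 | 9.47E-19 | 1 | 10 : 86 | Beer | Y09267_at |
| FABP4 | −14.421 | 9.74E-28 | 1 | 20 : 226 | Okayama | 203980_at |
| | −20.112 | 1.42E-17 | 1 | 25 : 25 | Wei | 203980_at |
| | −19.625 | 1.89E-17 | 1 | 30 : 27 | Su | 203980_at |
| | −14.293 | 1.90E-37 | 1 | 49 : 58 | Landi | 203980_at |
| | −68.043 | 3.67E-13 | 1 | 17 : 132 | Bhattacharjee | 38430_at |
| | −13.918 | 2.07E-7 | 4 | 19 : 20 | Stearman | 38430_at |
| | −26.532 | 1.26E-12 | 2 | 10 : 86 | Beer | J02874_at |
| | −9.214 | 1.34E-5 | 3 | 5 : 40 | Garber | IMAGE:2308848 |
| | −12.842 | 5.80E-44 | 1 | 58 : 58 | Selamat | ILMN_1773006 |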
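The negative fold changes in Table 4 can be put on the same scale as the positive ones by converting them to log2 ratios. The sketch below assumes the common convention that a signed fold change of −x means the gene is x-fold lower in cancer (a cancer/normal ratio of 1/x); the text does not state the convention explicitly, and the function name `signed_fold_to_log2` is hypothetical.

```python
import math

def signed_fold_to_log2(fold):
    """Convert a signed fold change (|fold| >= 1) to a log2 cancer/normal ratio.

    Assumed convention: +x means x-fold higher in cancer, -x means x-fold lower.
    """
    if abs(fold) < 1:
        raise ValueError("signed fold changes are expected to satisfy |fold| >= 1")
    return math.log2(fold) if fold > 0 else -math.log2(-fold)

# Values from Table 4 (Okayama dataset) and Table 3 (TOX3, Okayama).
for gene, fold in [("PDK4", -5.162), ("FMO2", -3.77), ("FABP4", -14.421), ("TOX3", 8.685)]:
    print(f"{gene}: fold {fold:+.3f} -> log2 ratio {signed_fold_to_log2(fold):+.2f}")
```

On the log2 scale, up- and downregulation of the same magnitude are symmetric around zero, which makes values from different genes easier to compare.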
Figure 4

Analysis of PDK4, FMO2 and FABP4 in clinical lung adenocarcinoma patients using bioinformatics databases

The gene expression of PDK4 (A), FMO2 (B), and FABP4 (C), comparing 31 pairs of clinical lung adenocarcinoma (red) and adjacent normal tissue (blue), was performed on GSE10072 microarray from the GEO database. The p-value of gene expression was calculated by t-test with Wilcoxon matched-pairs signed rank test. *** represents p < 0.001. The survival curves comparing 2 populations with high (red) and low (black) gene expression in lung adenocarcinoma patients were performed on the PrognoScan database - PDK4 (D, E), FMO2 (G), and FABP4 (J), and the Kaplan–Meier plotter database – PDK4 (F) and FMO2 (H, I). The analysis criteria of the PrognoScan and Kaplan–Meier plotter databases were Cox p-value < 0.05 and log-rank p-value < 0.05 respectively. Raw data of the GEO and the PrognoScan databases were extracted and re-plotted by GraphPad Prism 5 software. (K) The forest plots showed hazard ratios (95% CI, confidence interval) identified from the PrognoScan and Kaplan–Meier plotter databases.
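The survival curves for high- and low-expression groups rest on the Kaplan–Meier estimator. The sketch below splits hypothetical patients at the median expression value and computes the estimate for each group. The median split is an assumption made for illustration (the databases may apply other cutoffs), the patient data are invented, and the function name `kaplan_meier` is hypothetical.

```python
import statistics

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with at least one event."""
    survival, curve = 1.0, []
    for t in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve

# Hypothetical patients: (expression, follow-up in months, event: 1 = death, 0 = censored)
patients = [
    (2.1, 6, 1), (2.4, 10, 1), (2.9, 14, 0), (3.0, 18, 1),
    (5.2, 20, 1), (5.8, 30, 0), (6.1, 36, 1), (6.5, 48, 0),
]

cutoff = statistics.median(p[0] for p in patients)  # assumed median split
high = [p for p in patients if p[0] > cutoff]
low = [p for p in patients if p[0] <= cutoff]

km_high = kaplan_meier([p[1] for p in high], [p[2] for p in high])
km_low = kaplan_meier([p[1] for p in low], [p[2] for p in low])
print("high expression:", km_high)
print("low expression:", km_low)
```

In this invented cohort the high-expression group retains a higher survival probability throughout follow-up, which is the pattern the text reports for PDK4, FMO2, and FABP4.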

Analysis of CDKN2A, NDRG4, SFN, and PHLDA2 in lung adenocarcinoma

We observed that the expression of CDKN2A, PHLDA2, and SFN is upregulated and the expression of NDRG4 is downregulated in lung adenocarcinoma compared to normal lung tissue, which was confirmed in several datasets from the Oncomine database (Table 5). In the GSE10072 array, expression levels of CDKN2A, PHLDA2, and SFN are significantly upregulated in lung adenocarcinoma compared to normal lung tissue (Figure 5A–5C), and NDRG4 is downregulated (Figure 5D). The survival curve analysis showed that high CDKN2A expression in lung adenocarcinoma patients is associated with poorer survival (Figure 5E). The expression level of NDRG4, however, has no significant effect on survival outcomes for patients with lung adenocarcinoma (Figure 5F). Higher expression of PHLDA2 (Figure 5G, 5H) or SFN (Figure 5I–5K) is also associated with poorer survival. The forest plots showed the prognostic values of CDKN2A, PHLDA2, SFN, and NDRG4 (Figure 5L).
Table 5

Analysis of CDKN2A, PHLDA2, SFN, and NDRG4 mRNA expression in lung adenocarcinoma compared to normal tissue from Oncomine database

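| Gene | Fold change (Cancer/Normal) | P-value | Gene ranking (top %) | Samples (Normal : Tumor) | Dataset | Probe |
|---|---|---|---|---|---|---|
| CDKN2A | 2.793 | 1.08E-8 | 1 | 20 : 226 | Okayama | 225207_at |
| | 3.203 | 2.43E-6 | 2 | 25 : 25 | Wei | 225207_at |
| | 2.030 | 7.82E-11 | 4 | 49 : 58 | Landi | 205960_at |
| | 3.161 | 4.32E-9 | 5 | 65 : 45 | Hou | 205960_at |
| | 2.206 | 0.01 | 8 | 19 : 20 | Stearman | 36739_at |
| | 3.506 | 7.68E-4 | 9 | 10 : 86 | Beer | 36739_at |
| PHLDA2 | 4.207 | 4.25E-9 | 1 | 25 : 25 | Wei | 211726_s_at |
| | 5.73 | 1.80E-4 | 2 | 17 : 132 | Bhattacharjee | 228268_at |
| | 4.027 | 1.99E-19 | 1 | 49 : 58 | Landi | 211726_s_at |
| | 3.634 | 1.09E-7 | 2 | 30 : 27 | Su | 211726_s_at |
| | 2.387 | 1.42E-17 | 1 | 58 : 58 | Selamat | ILMN_1732158 |
| | 2.212 | 1.35E-5 | 1 | 65 : 45 | Hou | 228268_at |
| | 4.015 | 9.11E-8 | 2 | 19 : 20 | Stearman | IMAGE:80507 |
| | 2.799 | 2.05E-11 | 1 | 10 : 86 | Beer | Y09267_at |
| SFN | 4.721 | 1.98E-11 | 2 | 25 : 25 | Wei | 33323_r_at |
| | 2.049 | 1.60E-12 | 5 | 49 : 58 | Landi | 33322_i_at |
| | 6.459 | 3.04E-8 | 2 | 30 : 27 | Su | 209260_at |
| | 2.410 | 4.06E-6 | 5 | 19 : 20 | Stearman | 33322_i_at |
| | 4.487 | 1.67E-24 | 1 | 58 : 58 | Selamat | ILMN_1806607 |
| | 2.302 | 2.74E-5 | 6 | 10 : 86 | Beer | X57348_s_at |
| NDRG4 | −3.519 | 1.78E-12 | 1 | 20 : 226 | Okayama | 203980_at |
| | −5.009 | 1.03E-13 | 1 | 25 : 25 | Wei | 203980_at |
| | −3.923 | 5.21E-9 | 1 | 30 : 27 | Su | 203980_at |
| | −4.402 | 2.32E-7 | 4 | 19 : 20 | Stearman | 38430_at |
| | −2.448 | 1.60E-14 | 2 | 65 : 45 | Hou | J02874_at |
| | −2.152 | 4.63E-4 | 3 | 5 : 40 | Garber | IMAGE:2308848 |
| | −3.007 | 1.04E-18 | 1 | 58 : 58 | Selamat | ILMN_1773006 |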
Figure 5

Analysis of CDKN2A, PHLDA2, SFN, and NDRG4 in clinical lung adenocarcinoma patients using bioinformatics databases

The gene expression of CDKN2A (A), PHLDA2 (B), SFN (C), and NDRG4 (D), comparing 31 pairs of clinical lung adenocarcinoma (red) and adjacent normal tissue (blue), was performed on a GSE10072 microarray from the GEO database. The p-value of gene expression was calculated by t-test with Wilcoxon matched-pairs signed rank test. *** represents p < 0.001, ** represents p < 0.01. The survival curves comparing 2 populations with high (red) and low (black) gene expression in lung adenocarcinoma patients were performed on the Kaplan–Meier plotter database – CDKN2A (E) and NDRG4 (F), and PrognoScan database – PHLDA2 (G, H) and SFN (I–K). The analysis criteria of PrognoScan and Kaplan–Meier plotter databases were Cox p-value < 0.05 and log-rank p-value < 0.05 respectively. Raw data of the GEO and PrognoScan databases were extracted and re-plotted by GraphPad Prism 5 software. (L) The forest plots showed hazard ratios (95% CI, confidence interval) identified from the PrognoScan and Kaplan–Meier plotter databases.
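The hazard ratios and 95% confidence intervals shown in the forest plots follow from a Cox regression coefficient and its standard error: the hazard ratio is exp(β) and the interval is exp(β ± 1.96·SE). The sketch below uses a hypothetical coefficient and standard error, since the underlying estimates are not given in the text; the function name `hazard_ratio_ci` is also hypothetical.

```python
import math

def hazard_ratio_ci(beta, se, z=1.96):
    """Hazard ratio and approximate 95% CI from a Cox coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical Cox estimate for a gene whose high expression worsens survival.
hr, lower, upper = hazard_ratio_ci(beta=0.47, se=0.18)
print(f"HR = {hr:.2f} (95% CI {lower:.2f} to {upper:.2f})")

# An interval that excludes 1 corresponds to p < 0.05 for the Wald test.
print("excludes 1:", lower > 1 or upper < 1)
```

A hazard ratio above 1 with an interval that excludes 1 corresponds to poorer survival with higher expression, as reported for CDKN2A, PHLDA2, and SFN; a ratio below 1 corresponds to better survival.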

Analysis of AGR2, CLDN3, and AQP5 in lung adenocarcinoma

Analysis of mRNA expression from the Oncomine database revealed that AGR2, AQP5, and CLDN3 are upregulated in lung adenocarcinoma compared to normal lung tissue (Table 6). In the GSE10072 array, expression levels of AGR2 and CLDN3 are also significantly upregulated in lung adenocarcinoma compared to normal lung tissue (Figure 6A, 6B), whereas the expression of AQP5 shows no significant change (Figure 6C). The survival curve analysis showed that high expression of CLDN3 is correlated with poorer survival in lung adenocarcinoma patients (Figure 6D). In contrast, patients with higher expression of AQP5 have better survival outcomes (Figure 6E–6G). Higher expression of AGR2 is also associated with poorer survival (Figure 6H). The prognostic values of AGR2, CLDN3, and AQP5 are shown as forest plots (Figure 6I).
Table 6

Analysis of AGR2, AQP5, and CLDN3 mRNA expression in lung adenocarcinoma compared to normal tissue from Oncomine database

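| Gene | Fold change (Cancer/Normal) | P-value | Gene ranking (top %) | Samples (Normal : Tumor) | Dataset | Probe |
|---|---|---|---|---|---|---|
| AGR2 | 2.965 | 9.38E-11 | 6 | 20 : 226 | Okayama | 228969_at |
| | 5.586 | 1.83E-15 | 1 | 25 : 25 | Wei | 209173_at |
| | 2.779 | 9.67E-11 | 7 | 49 : 58 | Landi | 209173_at |
| | 3.434 | 1.49E-8 | 1 | 30 : 27 | Su | 209173_at |
| | 2.902 | 0.003 | 3 | 17 : 132 | Bhattacharjee | 38827_at |
| | 2.677 | 1.80E-5 | 6 | 19 : 20 | Stearman | 38827_at |
| | 2.393 | 5.73E-11 | 7 | 58 : 58 | Selamat | ILMN_1814151 |
| AQP5 | 2.841 | 2.80E-6 | 10 | 25 : 25 | Wei | 213611_at |
| CLDN3 | 3.287 | 5.33E-12 | 4 | 20 : 226 | Okayama | 203954_x_at |
| | 3.594 | 1.50E-11 | 1 | 25 : 25 | Wei | 203954_x_at |
| | 3.282 | 1.71E-7 | 2 | 30 : 27 | Su | 203954_x_at |
| | 2.193 | 4.34E-16 | 2 | 49 : 58 | Landi | 203954_x_at |
| | 3.707 | 1.61E-8 | 1 | 19 : 20 | Stearman | 33904_at |
| | 5.152 | 3.60E-5 | 1 | 17 : 132 | Bhattacharjee | 33904_at |
| | 3.154 | 3.74E-14 | 4 | 58 : 58 | Selamat | ILMN_1723042 |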
Figure 6

Analysis of AGR2, CLDN3, and AQP5 in clinical lung adenocarcinoma patients using bioinformatics databases

The gene expression of AGR2 (A), CLDN3 (B), and AQP5 (C), comparing 31 pairs of clinical lung adenocarcinoma (red) and adjacent normal tissue (blue), was performed on a GSE10072 microarray from the GEO database. The p-value of gene expression was calculated by t-test with Wilcoxon matched-pairs signed rank test. *** represents p < 0.001 and n.s. represents no significance. Survival curves comparing 2 populations with high (red) and low (black) gene expression in lung adenocarcinoma patients were performed on the Kaplan–Meier plotter – CLDN3 (D) and AQP5 (E), and PrognoScan databases – AQP5 (F, G) and AGR2 (H). The analysis criteria of the PrognoScan and Kaplan–Meier plotter databases were Cox p-value < 0.05 and log-rank p-value < 0.05 respectively. The raw data of the GEO and PrognoScan databases were extracted and re-plotted by GraphPad Prism 5 software. (I) The forest plots showed hazard ratios (95% CI, confidence interval) identified from the PrognoScan and Kaplan–Meier plotter databases.

Analysis of IL33 in lung adenocarcinoma

The mRNA expression of IL33 is significantly downregulated in lung adenocarcinoma compared to normal tissue (Table 7). IL33 expression is also decreased in lung adenocarcinoma in the GSE10072 array (Figure 7A). Survival curve analysis using the SurvExpress database showed that the high-risk population, which has lower expression of IL33, has poorer survival outcomes among lung adenocarcinoma patients (Figure 7B, 7C). This phenomenon was also observed in the PrognoScan (Figure 7D–7F) and Kaplan–Meier plotter databases (Figure 7G). The prognostic value of IL33 in lung adenocarcinoma is illustrated as a forest plot (Figure 7H).
Table 7

Analysis of IL33 mRNA expression in lung adenocarcinoma compared to normal tissue from Oncomine database

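| Gene | Fold change (Cancer/Normal) | P-value | Gene ranking (top %) | Samples (Normal : Tumor) | Dataset | Probe |
|---|---|---|---|---|---|---|
| IL33 | −3.809 | 6.11E-20 | 1 | 20 : 226 | Okayama | 209821_at |
| | −4.325 | 1.04E-9 | 5 | 25 : 25 | Wei | 209821_at |
| | −3.276 | 1.23E-21 | 2 | 49 : 58 | Landi | 209821_at |
| | −7.088 | 1.82E-9 | 3 | 30 : 27 | Su | 209821_at |
| | −3.582 | 2.02E-11 | 7 | 65 : 45 | Hou | 209821_at |
| | −2.163 | 8.36E-5 | 7 | 17 : 132 | Bhattacharjee | 35333_r_at |
| | −2.351 | 5.08E-5 | 8 | 19 : 20 | Stearman | 35333_r_at |
| | −4.258 | 5.82E-30 | 1 | 58 : 58 | Selamat | ILMN_1809099 |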
Figure 7

Analysis of IL33 in clinical lung adenocarcinoma patients using bioinformatics databases

The gene expression of IL33 (A), comparing 31 pairs of clinical lung adenocarcinoma (red) and adjacent normal tissue (blue), was performed on a GSE10072 microarray from the GEO database. The p-value of gene expression was calculated by t-test with Wilcoxon matched-pairs signed rank test. *** represents p < 0.001. (B) The survival curve was captured from the SurvExpress database, which divided lung adenocarcinoma patients into 2 populations of high (red) and low (green) risk, and (C) the box plots showed that the high risk (red) population has lower levels of IL33 expression, while the low risk (green) population has higher levels of IL33 expression. (D–F) The survival curves comparing 2 populations with high (red) and low (black) gene expression in lung adenocarcinoma patients were performed on the PrognoScan and (G) Kaplan–Meier plotter databases. The analysis criteria of the PrognoScan and Kaplan–Meier plotter databases were Cox p-value < 0.05 and log-rank p-value < 0.05 respectively. Raw data of the GEO and the PrognoScan databases were extracted and re-plotted by GraphPad Prism 5 software. (H) The forest plots showed hazard ratios (95% CI, confidence interval) identified from the PrognoScan and Kaplan–Meier plotter databases.

Analysis of ZDHHC9, BTNL9, GNG11, and CPED1 in lung adenocarcinoma

The mRNA expression of BTNL9, GNG11, and CPED1 is downregulated in lung adenocarcinoma when compared to normal tissue, whereas ZDHHC9 is upregulated (Table 8). In the GSE10072 array, GNG11 (Figure 8A) and CPED1 (Figure 8B) expression is downregulated in lung adenocarcinoma. There is no specific probe for ZDHHC9 or BTNL9 in the GSE10072 array. Survival curve analysis revealed that high expression of ZDHHC9 (Figure 8C), BTNL9 (Figure 8D–8F), GNG11 (Figure 8G, 8H), or CPED1 (Figure 8I–8K) correlates with better survival outcomes in lung adenocarcinoma patients. The forest plot shows the prognostic values of ZDHHC9, BTNL9, GNG11, and CPED1 (Figure 8L).
Table 8

Analysis of ZDHHC9, BTNL9, GNG11 and CPED1 mRNA expression in lung adenocarcinoma compared to normal tissue from Oncomine database

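Gene | Fold change (Cancer/Normal) | P-value | Gene ranking (top %) | Samples (Normal : Tumor) | Dataset | Probe
ZDHHC9 | 2.295 | 8.84E-12 | 5 | 20 : 226 | Okayama | 222451_s_at
ZDHHC9 | 2.621 | 9.16E-10 | 3 | 25 : 25 | Wei | 222451_s_at
ZDHHC9 | 2.377 | 3.27E-19 | 1 | 58 : 58 | Selamat | ILMN_1803824
BTNL9 | −12.000 | 3.77E-14 | 3 | 20 : 226 | Okayama | 228434_at
BTNL9 | −10.916 | 6.99E-26 | 1 | 25 : 25 | Wei | 228434_at
BTNL9 | −9.102 | 1.02E-24 | 1 | 65 : 45 | Hou | 228434_at
GNG11 | −3.630 | 9.97E-21 | 1 | 20 : 226 | Okayama | 204115_at
GNG11 | −4.337 | 6.55E-11 | 4 | 25 : 25 | Wei | 204115_at
GNG11 | −3.463 | 1.49E-12 | 1 | 30 : 27 | Su | 204115_at
GNG11 | −3.277 | 7.32E-23 | 2 | 49 : 58 | Landi | 204115_at
GNG11 | −3.485 | 7.75E-11 | 1 | 19 : 20 | Stearman | 37908_at
GNG11 | −20.213 | 2.51E-11 | 1 | 17 : 132 | Bhattacharjee | 37908_at
GNG11 | −6.202 | 6.90E-25 | 2 | 58 : 58 | Selamat | ILMN_1782419
GNG11 | −2.830 | 4.15E-11 | 3 | 10 : 86 | Beer | U31384_at
GNG11 | −4.296 | 3.94E-4 | 4 | 5 : 5 | Wachi | 204115_at
CPED1 | −2.181 | 7.18E-14 | 2 | 25 : 25 | Wei | 220032_at
CPED1 | −3.108 | 1.13E-17 | 2 | 65 : 45 | Hou | 228728_at
CPED1 | −3.325 | 1.03E-34 | 1 | 58 : 58 | Selamat | ILMN_1677038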
Figure 8

Analysis of ZDHHC9, BTNL9, GNG11 and CPED1 in clinical lung adenocarcinoma patients using bioinformatics databases

The gene expression of GNG11 (A) and CPED1 (B), comparing 31 pairs of clinical lung adenocarcinoma (red) and adjacent normal tissue (blue), was performed on a GSE10072 microarray from the GEO database. The p-value of gene expression was calculated by t-test with Wilcoxon matched-pairs signed rank test. *** represents p < 0.001. The survival curves comparing 2 populations with high (red) and low (black) gene expression in lung adenocarcinoma patients were performed on the Kaplan–Meier plotter database – ZDHHC9 (C), BTNL9 (D) and CPED1 (I), and the PrognoScan database – BTNL9 (E, F), GNG11 (G, H), and CPED1 (J, K). The analysis criteria of the PrognoScan and Kaplan–Meier plotter databases were Cox p-value < 0.05 and log-rank p-value < 0.05 respectively. Raw data of the GEO and PrognoScan databases were extracted and re-plotted by GraphPad Prism 5 software. (L) The forest plots showed hazard ratios (95% CI, confidence interval) identified from the PrognoScan and Kaplan–Meier plotter databases.

Identification of genetic regulation in lung adenocarcinoma using next-generation sequencing

We simultaneously performed small RNA-seq on these 3 pairs of specimens using next-generation sequencing (Figure 9). We focused on microRNAs and found 22 upregulated microRNAs in lung adenocarcinoma using Venn diagram analysis (Figure 9A), which are listed in Table 9. The analysis criteria were fold change > 2 and reads per million (RPM) > 1. No downregulated microRNAs were found in our analysis (Figure 9B). Heatmap color clustering showed the expression patterns of each upregulated microRNA in these 3 pairs of specimens (Figure 9C). To further elucidate the genetic interactions in lung adenocarcinoma, we used the miRmap database for target prediction. Among the 22 upregulated microRNAs, there were 13 putative targets, shown in the “Targets” Venn diagram (Figure 9D). The prediction threshold was miRmap score ≥ 90.0. The Venn diagram analysis between the 13 targets of microRNAs and the 8 downregulated genes shown in Figure 1B indicates that there were 10 microRNA-mRNA genetic interactions in lung adenocarcinoma (Figure 9D), which are listed in Table 10. Only 6 genes were involved in these 10 genetic interactions, because 3 of the genes (CPED1, FMO2, and IL33) were each targeted by more than one microRNA.
Figure 9

Identification of differentially expressed microRNAs in lung adenocarcinoma compared to adjacent normal tissue using next-generation sequencing

Venn diagram analysis showed 22 upregulated microRNAs (A) and 0 downregulated microRNAs (B) in lung adenocarcinoma, compared to adjacent normal tissue from 3 pairs of clinical specimens. The criteria were fold change > 2 (tumor/normal) and reads per million (RPM) > 1. (C) The heatmap diagram showed the differentially expressed genes with z-score (log2) values by using color clustering on the GENE-E web-tool. Green represents downregulation (minimum = −3.0), and red represents upregulation (maximum = 3.0). (D) The “Targets” Venn diagram shows the predicted genes of microRNAs from the “microRNAs” Venn diagram using the miRmap web-site database. The selection threshold was miRmap score ≥ 90.0. The intersection Venn diagram between “mRNAs” and “Targets” showed a total of 6 genes with potential microRNA-mRNA interactions.

Table 9

Differentially expressed microRNAs identified from next-generation sequencing data

miRNA | Precursor | RPM (reads per million) in N1, T1, N2, T2, N3, T3 | T/N
hsa-miR-1307-5phsa-mir-1307118.34396.09109.76504.1328.3466.54Up
hsa-miR-130b-5phsa-mir-130b2.3112.191.715.581.293.25Up
hsa-miR-130b-3phsa-mir-130b15.6132.8314.0964.8513.729.88Up
hsa-miR-182-5phsa-mir-182926.75350.27731.465334.411779.5715146.54Up
hsa-miR-183-5phsa-mir-183179.581197.91111.57917.474712525.39Up
hsa-miR-190a-5phsa-mir-190a6.730.481126.462.766.5Up
hsa-miR-200a-5phsa-mir-200a7.428.837.0587.6710.0836.2Up
hsa-miR-200b-3phsa-mir-200b459.711338.61275.252046.74245.19494.41Up
hsa-miR-21-3phsa-mir-21472.882282.29448.17108.65312.3862.23Up
hsa-miR-224-5phsa-mir-22415.9585.3417.72150.9715.1668.89Up
hsa-miR-301b-3phsa-mir-301b4.74173.226.4122.72.768.49Up
hsa-miR-31-5phsa-mir-3191.18452.095.9865.765.1755.44Up
hsa-miR-33b-5phsa-mir-33b5.7820.575.4551.232.510.56Up
hsa-miR-345-5phsa-mir-34534.4497.5328.29178.8554.97139.04Up
hsa-miR-424-3phsa-mir-4242.4330.224.3847.342.510.38Up
hsa-miR-424-5phsa-mir-42435.82240.934.27164.0717.8354.17Up
hsa-miR-429hsa-mir-429143.41407.01153.53673.9119.92359.34Up
hsa-miR-450a-5phsa-mir-450a-14.3925.535.8728.42.938.67Up
hsa-miR-450a-5phsa-mir-450a-24.3925.45.8728.272.938.67Up
hsa-miR-452-5phsa-mir-45211.7936.9523.8172.520.1645.05Up
hsa-miR-542-3phsa-mir-54211.3354.2313.9960.057.4134.67Up
hsa-miR-7705hsa-mir-7705314.64.9128.43.277.4Up
hsa-miR-96-5phsa-mir-9625.316413.7755.2519.82204.5Up
Table 10

Genes selected between differentially expressed genes and putative targets of microRNA

miRNA | Predicted target | miRmap score
hsa-miR-183-5p | BTNL9 | 92.2457
hsa-miR-200b-3p | CPED1 | 97.3127
hsa-miR-33b-5p | CPED1 | 91.5414
hsa-miR-429 | CPED1 | 97.7498
hsa-miR-182-5p | FMO2 | 98.0884
hsa-miR-345-5p | FMO2 | 96.6797
hsa-miR-130b-5p | IL33 | 98.1906
hsa-miR-542-3p | IL33 | 96.593
hsa-miR-21-3p | NDRG4 | 95.067
hsa-miR-424-5p | PDK4 | 99.1986

DISCUSSION

Lung cancer is one of the leading causes of cancer-related death worldwide [29], and much about it still requires further study. In this project, we sought to identify a novel gene expression signature or gene-microRNA interactions in lung adenocarcinoma by using next-generation sequencing combined with systematic bioinformatics analysis. We found 17 differentially expressed genes in lung adenocarcinoma compared to adjacent normal lung tissue, which were classified into 6 functional groups based on a search of the literature. These results indicate that tumor progression involves alterations of various biological functions. We then summarized the potential oncogenic and tumor suppressor roles of these genes in lung adenocarcinoma (Supplementary Table 2). TOX3 contains an HMG-box (high mobility group box) domain. The function of TOX3 remains unclear, but it may be involved in various DNA-dependent processes [30-32]. TOX3 polymorphisms and epigenetic regulation have been demonstrated in breast cancer [33] and lung cancer [34], respectively. In our study, TOX3 was significantly upregulated in lung adenocarcinoma, and higher expression of TOX3 correlated with better survival outcome. Because additional factors may be involved in TOX3-related mechanisms of tumor progression, more studies are needed to clarify the relationship between TOX3 expression and tumor progression. SPDEF, which contains an ETS domain, has been reported to be overexpressed in many cancers [35-37]. Our study suggests that SPDEF may play an oncogenic role in lung adenocarcinoma, although some reports have shown that SPDEF can suppress cancer metastasis [38]. These contrary effects of SPDEF on tumorigenesis require further research. PDK4 is a mitochondrial protein that regulates glucose metabolism through inhibition of the pyruvate dehydrogenase complex. Aberrant metabolism is one of the characteristics of cancer cells.
In liver cancer, PDK4 has been identified as a potential tumor suppressor [39]. In contrast, PDK4 exerts oncogenic effects in colon cancer [40]. In our study, PDK4 played a potential tumor suppressor role in lung adenocarcinoma. FMO2 is an NADPH-dependent enzyme that catalyzes the oxygenation of substrates [41], but the effect of FMO2 on tumorigenesis is unclear. Although genetic polymorphisms of FMO genes may influence drug metabolism [42], we found that FMO2 might have tumor suppressor effects in lung adenocarcinoma. FABP4 is involved in fatty acid trafficking and metabolism. Fatty acids serve as both an energy source and signaling molecules that can regulate various cellular functions [43]. The dysfunction of FABP proteins has been found to be associated with some metabolic diseases [44], and elevated FABP4 has been observed in many types of cancer [45-47]. Our data showed that FABP4 may have tumor suppressor effects in lung adenocarcinoma. CDKN2A encodes two spliced transcripts, p16INK4a and p14ARF, which regulate cell cycle progression through inhibition of CDK4 kinase and regulation of p53, respectively [48]. CDKN2A has been shown to act as a tumor suppressor in cancer progression [49], and its alterations, including epigenetic modifications, deletions, and mutations, frequently occur in cancers [50]. In our study, however, CDKN2A may have potential oncogenic effects in lung adenocarcinoma. PHLDA2, located in an imprinted region on chromosome 11p15.5, has primarily been studied for its regulation of placental growth [51]. Although the role of PHLDA2 in cancer is unclear, our data showed that PHLDA2 may potentially exert oncogenic effects in lung adenocarcinoma. SFN is involved in protein synthesis and epithelial cell growth. Numerous reports have demonstrated the molecular functions of SFN in keratinocytes and fibroblasts [52].
Furthermore, elevated expression of SFN has been reported in lung adenocarcinoma [53], and our study also found that SFN may exert oncogenic effects in lung adenocarcinoma. NDRG4 is involved in the regulation of cell cycle progression [54] and has been identified as a novel tumor suppressor in colon cancer [55]. We found that NDRG4 levels were significantly decreased in lung adenocarcinoma, but in the survival analysis, the expression of NDRG4 showed no significant influence on survival rates of lung cancer patients. AGR2 is an endoplasmic reticulum (ER) protein that can catalyze protein folding. Its oncogenic role and increased expression have been reported in different types of cancer [56-58]. In our study, we found that AGR2 may serve as a potential prognostic biomarker of lung adenocarcinoma. AQP5, aquaporin 5, is a water channel protein involved in pulmonary secretions, and elevated expression of AQP5 is associated with poor survival outcome in many types of cancer [59-61]. In our study, the expression of AQP5 was upregulated in lung adenocarcinoma, and its high expression correlated with better survival outcome. CLDN3 regulates tight junctions of cell-cell adhesion in epithelial or endothelial cells and is overexpressed in ovarian [62] and colon cancer [63]. Loss of claudin 3 expression increases the metastatic ability of esophageal cancer [64], whereas claudin 3 is upregulated in lung adenocarcinoma [65]. Our study showed that claudin 3 may have oncogenic effects in lung adenocarcinoma. IL33 is a cytokine involved in a spectrum of biological processes, and chronic activation of inflammatory signaling is known to be involved in cancer progression. The expression of IL33 in tumor tissues is depressed, but tumor stroma and serum have increased levels of IL33, suggesting that IL33 functions differently in cancer cells and in the microenvironment [66]. IL-33 has been shown to promote tumorigenesis and induce stemness in breast cancer [67].
In ApcMin/+ mice, epithelial-derived IL-33 can promote intestinal tumorigenesis [68]. These reports indicate that the function of IL-33 in tumorigenesis is controversial. In our analysis, we found that IL33 may have tumor suppressor functions in lung adenocarcinoma. ZDHHC9 is a palmitoyltransferase that can regulate palmitoylation of HRAS and NRAS. The function of ZDHHC9 in cancers is unclear, although inactivation of ZDHHC9 can reduce leukemogenic effects through repression of oncogenic NRAS [69]. According to our data, increased ZDHHC9 is observed in lung adenocarcinoma, but its high expression is correlated with better rates of survival. BTNL9 belongs to the butyrophilin family within the immunoglobulin superfamily, and butyrophilins modulate immune homeostasis [70]. Although the function of BTNL9 in tumorigenesis remains unclear, our results suggest that BTNL9 may serve as a tumor suppressor. GNG11 is a lipid-anchored protein, which has been reported to inhibit cell growth [71] and regulate cellular senescence in lymphoma [72]. We found that GNG11 may exert suppressor functions in the tumorigenesis of lung adenocarcinoma. CPED1, also known as C7orf58, contains a cadherin-like beta sandwich domain [73], but the molecular function of CPED1 is unclear. Our data showed that CPED1 may play a role as a potential tumor suppressor in lung adenocarcinoma. The summary of differentially expressed genes in the Oncomine database is shown in Supplementary Table 3. Seven of a total of 11 datasets showed similar patterns of genetic expression, suggesting that this molecular change is consistent between lung adenocarcinoma and normal lung tissue. Thus, the genes found in this study may represent a novel gene expression signature in lung adenocarcinoma (Figure 10). The increased expression of AGR2 and decreased expression of IL33 have also been identified in other reports. We also analyzed the expression of microRNAs in lung adenocarcinoma (Table 9).
Twenty-two upregulated microRNAs were identified in lung adenocarcinoma. We focused on microRNAs with predicted putative targets: BTNL9, FMO2, IL33, CPED1, and PDK4. Among these microRNAs, elevated expression of hsa-miR-183-5p [74], hsa-miR-33b-5p [75], hsa-miR-429 [76], hsa-miR-182-5p [77], and hsa-miR-130b-5p [78] has been associated with tumorigenesis in lung cancer. The function of hsa-miR-542-3p is unclear. The genetic interactions of hsa-miR-183-5p-BTNL9, hsa-miR-33b-5p-CPED1, hsa-miR-429-CPED1, hsa-miR-182-5p-FMO2, hsa-miR-130b-5p-IL33, and hsa-miR-542-3p-IL33 have not been previously identified, and these altered genetic regulations may play important roles in the progression of lung adenocarcinoma.
Figure 10

The proposed novel molecular signatures of gene regulations involved in lung adenocarcinoma

MATERIALS AND METHODS

Clinical lung adenocarcinoma specimens

Three pairs of tumors and adjacent non-tumor lung tissues were collected from the Division of Thoracic Surgery and the Division of Pulmonary and Critical Care Medicine, Kaohsiung Medical University Hospital (KMUH), Kaohsiung, Taiwan. Approval for these studies was obtained from the Institutional Review Board (IRB) of KMUH, and informed consent was obtained from all patients in accordance with the Declaration of Helsinki.

Next-generation sequencing (NGS)

The expression profiles of mRNA and microRNA were determined using NGS [22]. Three pairs of lung adenocarcinomas and adjacent normal specimens were used in this project. Total RNA was extracted by using Trizol® Reagent (Invitrogen, USA), according to the manufacturer's instructions. The cell lysates were sent to Welgene Biotechnology Company (Welgene, Taipei, Taiwan) for RNA-seq and small RNA-seq analysis. The criteria for differentially expressed mRNA analysis were fold change > 2 and fragments per kilobase of transcript per million mapped reads (FPKM) > 0.3. The criteria for selecting differentially expressed microRNAs were fold change > 2 and reads per million (RPM) > 1.
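The microRNA selection criteria can be sketched as a simple filter. This is a minimal illustration, not the pipeline used in the study: it assumes that the fold change (tumor/normal) must exceed 2 in every pair and that the RPM threshold is applied to the tumor sample. The function name `is_upregulated` is hypothetical. The first two value sets are read from Table 9; the third is invented to show a rejected case.

```python
from typing import Sequence, Tuple

FOLD_CHANGE_MIN = 2.0  # fold change > 2 (tumor/normal), as stated in the text
RPM_MIN = 1.0          # reads per million > 1, as stated in the text


def is_upregulated(pairs: Sequence[Tuple[float, float]]) -> bool:
    """Return True if every (normal, tumor) RPM pair passes both thresholds.

    Assumptions: the criteria must hold in all pairs, and the RPM
    threshold is checked on the tumor sample.
    """
    for normal, tumor in pairs:
        if tumor <= RPM_MIN:
            return False
        # Comparing against FOLD_CHANGE_MIN * normal avoids dividing by zero.
        if tumor <= FOLD_CHANGE_MIN * normal:
            return False
    return True


# (normal, tumor) RPM values for pairs 1-3, read from Table 9.
mir_130b_5p = [(2.31, 12.19), (1.71, 5.58), (1.29, 3.25)]
mir_33b_5p = [(5.78, 20.57), (5.45, 51.23), (2.5, 10.56)]
# Hypothetical values, not from the study: pair 2 has a fold change of 1.5.
hypothetical_mirna = [(10.0, 25.0), (8.0, 12.0), (9.0, 30.0)]

print(is_upregulated(mir_130b_5p))
print(is_upregulated(mir_33b_5p))
print(is_upregulated(hypothetical_mirna))
```

Requiring the criteria in all three pairs mirrors the Venn-diagram intersection described for Figure 9A; a looser rule (for example, two of three pairs) would need a different loop.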

Oncomine database analysis

The Oncomine database contains over 18,000 microarray experiments and 35 major cancer types [23]. The raw data of mRNA expression in clinical lung adenocarcinoma and normal specimens (cancer vs. normal) were extracted from the Oncomine database (http://www.oncomine.org, Compendia biosciences, Ann Arbor, MI, USA). The criteria in the analysis were P-value < 1E-4, fold change > 2, and gene rank in top 10%. P-values were calculated by the Oncomine database using a two-sided Student's t-test. For the comparison of genes in each dataset, raw data were extracted and re-plotted using the GENE-E web-tool (https://software.broadinstitute.org/GENE-E/), and the relative color scheme was used for clustering, with minimum = 0 (blue) and maximum = 1 (yellow). Eleven datasets were selected for our analysis, including Hou (normal = 65 and lung adenocarcinoma = 45) [79], Landi (normal = 49 and lung adenocarcinoma = 58) [80], Selamat (normal = 58 and lung adenocarcinoma = 58) [81], Okayama (normal = 20 and lung adenocarcinoma = 226) [82], Su (normal = 30 and lung adenocarcinoma = 27) [83], Wei (normal = 25 and lung adenocarcinoma = 25) [84], Stearman (normal = 19 and lung adenocarcinoma = 20) [85], Bhattacharjee (normal = 17 and lung adenocarcinoma = 132) [86], Beer (normal = 10 and lung adenocarcinoma = 86) [87], Garber (normal = 5 and lung adenocarcinoma = 40) [88], and Wachi (normal = 5 and lung adenocarcinoma = 5) [89].

SurvExpress database analysis

SurvExpress integrates the TCGA database (https://tcga-data.nci.nih.gov), which provides microarray information, including cancer type, survival, and gene expression. The correlation between IL33 mRNA expression and survival rate was analyzed on the SurvExpress web database (http://bioinformatica.mty.itesm.mx/SurvExpress). The lung adenocarcinoma TCGA dataset (N = 255) was used in our analysis. Samples of each dataset were split into 2 risk groups (high and low risk) of the same size, determined by the ordered Prognostic Index (PI, high value for high risk) [27]. The PI is the linear component of the Cox model, computed as the sum of the gene expression values multiplied by the coefficients estimated from the Cox fitting [90].
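The risk-group split described above can be sketched as follows. This is an illustration only, not SurvExpress code: the coefficient, the expression values, and the helper names `prognostic_index` and `split_by_risk` are all hypothetical. It computes the PI as the sum of coefficient × expression and splits the ordered samples at the midpoint.

```python
from typing import Dict, List, Tuple


def prognostic_index(expression: Dict[str, float], beta: Dict[str, float]) -> float:
    """Linear component of a Cox model: sum of coefficient * expression."""
    return sum(beta[gene] * expression[gene] for gene in beta)


def split_by_risk(samples: Dict[str, Dict[str, float]],
                  beta: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Order samples by PI and split them into low- and high-risk halves."""
    ranked = sorted(samples, key=lambda s: prognostic_index(samples[s], beta))
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]


# Hypothetical Cox coefficient: a negative value means that higher
# expression lowers the PI, so lower expression implies higher risk.
beta = {"IL33": -0.4}
# Hypothetical log2 expression values for four samples.
samples = {
    "S1": {"IL33": 9.1},
    "S2": {"IL33": 6.3},
    "S3": {"IL33": 8.2},
    "S4": {"IL33": 5.9},
}

low_risk, high_risk = split_by_risk(samples, beta)
print("low risk:", low_risk)
print("high risk:", high_risk)
```

With a negative coefficient, the samples with the lowest expression end up in the high-risk half, which matches the pattern reported for IL33 in Figure 7C.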

PrognoScan database analysis

PrognoScan collects information from the GEO (Gene Expression Omnibus) microarray database, including cancer type, survival rates, and gene expression. The correlation between gene expression and overall survival rates was analyzed on the PrognoScan web database (http://www.abren.net/PrognoScan/) [25]. The raw data were extracted and re-plotted by GraphPad Prism 5 software (GraphPad Software, La Jolla, CA, USA). The threshold was determined as Cox p-value < 0.05. Samples of each dataset were divided into 2 expression groups (high and low) at the potential cutpoint. The cutpoint (from < 0.1 or > 0.9 quantile) was estimated by the minimum P-value approach [91], and the P-value correction was calculated by the formula [92]. The hazard ratios (95% confidence intervals) of each dataset were calculated using the Cox proportional hazards model, and are listed in the related Tables. HR = 1 represents no difference between the 2 groups, HR < 1 represents better survival rates in the population with higher levels of expression, and HR > 1 represents better survival rates in the population with lower levels of expression. The specific probe of each dataset is listed in its related Figure.
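The minimum P-value approach can be illustrated with a short sketch. This is not PrognoScan's implementation: it assumes a two-group log-rank test as the scoring statistic, scans every cutpoint that leaves at least `min_group_size` samples in each group (a hypothetical parameter), and omits the P-value correction mentioned above. The patient data are invented.

```python
import math
from typing import List, Optional, Sequence, Tuple


def logrank_p(times: Sequence[float], events: Sequence[int],
              in_high: Sequence[bool]) -> float:
    """Two-group log-rank test p-value (chi-square, 1 degree of freedom)."""
    observed = expected = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)                        # at risk
        n_high = sum(1 for x, h in zip(times, in_high) if x >= t and h)
        d = sum(1 for x, e in zip(times, events) if x == t and e)  # events
        d_high = sum(1 for x, e, h in zip(times, events, in_high)
                     if x == t and e and h)
        observed += d_high
        expected += d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = (observed - expected) ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2))  # survival function of chi2, 1 df


def minimum_p_cutpoint(expression: Sequence[float], times: Sequence[float],
                       events: Sequence[int],
                       min_group_size: int = 2) -> Tuple[Optional[float], float]:
    """Return the (cutpoint, p) pair with the smallest uncorrected p-value."""
    ordered = sorted(expression)
    best: Tuple[Optional[float], float] = (None, 1.0)
    for i in range(min_group_size, len(ordered) - min_group_size + 1):
        cut = ordered[i]  # samples with expression >= cut form the high group
        in_high: List[bool] = [x >= cut for x in expression]
        p = logrank_p(times, events, in_high)
        if p < best[1]:
            best = (cut, p)
    return best


# Hypothetical data: expression, follow-up in months, event flag (1 = death).
expression = [2.1, 2.5, 3.0, 3.2, 4.8, 5.1, 5.5, 6.0, 6.4, 7.0]
times = [5, 8, 12, 10, 30, 34, 40, 38, 45, 50]
events = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]

cut, p = minimum_p_cutpoint(expression, times, events)
print("cutpoint:", cut, "uncorrected p:", p)
```

Because the cutpoint is chosen to minimize the p-value, the reported value is optimistically biased, which is why a correction step is applied in practice.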

Kaplan–Meier plotter database analysis

The Kaplan–Meier plotter is a web database providing information on the expression of 54675 genes and survival rates in 10461 cancer samples, including 5143 breast, 1816 ovarian, 2437 lung, and 1065 gastric cancer patients [26]. The correlation between gene expression and overall survival rates in lung cancer was determined, and lung adenocarcinoma (N = 720) was selected for our analysis. Patients were split into 2 populations at the best cut-off, which was computed with median survival. The hazard ratios (95% confidence intervals) were calculated using the Cox proportional hazards model, and are listed in the related Tables. The specific probe of each dataset is listed in its related Figure.

Gene expression omnibus (GEO) database analysis

GEO is a web database providing submitted high-throughput gene expression data from microarrays, chips, or NGS (https://www.ncbi.nlm.nih.gov/geo/) [24]. We selected the microarray dataset with accession number GSE10072, published in 2008 [80], for this project. This dataset provides gene expression information for 180 clinical lung adenocarcinoma and non-tumor samples. Here, we selected 31 pairs of lung adenocarcinoma and adjacent normal tissue for gene expression analysis. The raw data were analyzed and extracted with GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r/), and re-plotted by GraphPad Prism 5 software (GraphPad Software, La Jolla, CA, USA). The p-value was calculated by using t-test with Wilcoxon matched-pairs signed rank test.

miRmap database analysis

miRmap is a web-tool database for microRNA target prediction (http://mirmap.ezlab.org/) [28]. It identifies putative target genes by calculating the complementarity of microRNA-mRNA interactions. The strength of mRNA repression is estimated for ranking potential candidate targets by employing various features, including thermodynamic, evolutionary, probabilistic, and sequence-based features. The prediction results show a list of putative target genes with miRmap scores, which serve as predictive reference values. Putative targets with miRmap scores ≥ 90.0 were selected for this project.
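The selection step can be sketched as a threshold filter followed by an intersection with the downregulated genes. This is only an illustration: the microRNA-target rows are those of Table 10, the list of 8 downregulated genes is inferred from the Abstract and Discussion rather than copied from Figure 1B, and the last prediction row is hypothetical, added to show a rejected score.

```python
MIRMAP_MIN_SCORE = 90.0  # selection threshold stated in the text

# (microRNA, predicted target, miRmap score); rows from Table 10.
predictions = [
    ("hsa-miR-183-5p", "BTNL9", 92.2457),
    ("hsa-miR-200b-3p", "CPED1", 97.3127),
    ("hsa-miR-33b-5p", "CPED1", 91.5414),
    ("hsa-miR-429", "CPED1", 97.7498),
    ("hsa-miR-182-5p", "FMO2", 98.0884),
    ("hsa-miR-345-5p", "FMO2", 96.6797),
    ("hsa-miR-130b-5p", "IL33", 98.1906),
    ("hsa-miR-542-3p", "IL33", 96.593),
    ("hsa-miR-21-3p", "NDRG4", 95.067),
    ("hsa-miR-424-5p", "PDK4", 99.1986),
    # Hypothetical row, not from the study: score below the threshold.
    ("hypothetical-miR", "GNG11", 75.0),
]

# Assumed list of the 8 downregulated genes (inferred from the text).
downregulated = {"PDK4", "FMO2", "FABP4", "NDRG4", "IL33", "BTNL9", "GNG11", "CPED1"}

# Keep predictions that pass the score threshold and hit a downregulated gene.
interactions = [
    (mirna, gene, score)
    for mirna, gene, score in predictions
    if score >= MIRMAP_MIN_SCORE and gene in downregulated
]
genes = sorted({gene for _, gene, _ in interactions})

print(len(interactions), "interactions involving", len(genes), "genes:", genes)
```

Counting interactions and distinct genes separately shows why 10 interactions map to only 6 genes: CPED1, FMO2, and IL33 each appear with more than one microRNA.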

Statistical analysis

The raw data extracted from GEO database were statistically analyzed using t-test with Wilcoxon matched-pairs signed rank test by GraphPad Prism 5 software (GraphPad Software, La Jolla, CA, USA).
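For readers who want to reproduce the paired comparison outside GraphPad Prism, the Wilcoxon matched-pairs signed rank test can be sketched in plain Python. This is a simplified illustration under stated assumptions: it uses the normal approximation without tie or continuity correction, so its p-values may differ slightly from those of statistical packages, especially for small samples. The function name `signed_rank_test` and the expression values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def signed_rank_test(normal: Sequence[float],
                     tumor: Sequence[float]) -> Tuple[float, float, float]:
    """Wilcoxon matched-pairs signed rank test (normal approximation).

    Returns (W_plus, W_minus, two-sided p-value). Zero differences are
    dropped and tied absolute differences receive average ranks.
    """
    diffs = [t - n for n, t in zip(normal, tumor) if t != n]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return w_plus, w_minus, p


# Hypothetical log2 expression values for 8 matched pairs.
normal = [8.9, 9.4, 8.7, 9.1, 9.8, 8.5, 9.0, 9.3]
tumor = [7.1, 7.9, 8.0, 6.8, 8.2, 7.7, 9.2, 7.2]

w_plus, w_minus, p = signed_rank_test(normal, tumor)
print("W+ =", w_plus, "W- =", w_minus, "p =", p)
```

A small W+ relative to W- indicates that tumor values are mostly lower than their matched normal values, which is the direction reported for the downregulated genes.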
  87 in total

Review 1.  HMG1 and 2: architectural DNA-binding proteins.

Authors:  J O Thomas
Journal:  Biochem Soc Trans       Date:  2001-08       Impact factor: 5.407

2.  DNA damage responsive miR-33b-3p promoted lung cancer cells survival and cisplatin resistance by targeting p21WAF1/CIP1.

Authors:  Shun Xu; Haijiao Huang; Yu-Ning Chen; Yun-Ting Deng; Bing Zhang; Xing-Dong Xiong; Yuan Yuan; Yanmei Zhu; Haiyong Huang; Luoyijun Xie; Xinguang Liu
Journal:  Cell Cycle       Date:  2016-08-25       Impact factor: 4.534

Review 3.  Prostate-derived Ets factor, an oncogenic driver in breast cancer.

Authors:  Ashwani K Sood; Joseph Geradts; Jessica Young
Journal:  Tumour Biol       Date:  2017-05

4.  Lung cancer prognosis before and after recurrence in a population-based setting.

Authors:  Dario Consonni; Mariaelena Pierobon; Mitchell H Gail; Maurizia Rubagotti; Melissa Rotunno; Alisa Goldstein; Lynn Goldin; Jay Lubin; Sholom Wacholder; Neil E Caporaso; Pier Alberto Bertazzi; Margaret A Tucker; Angela C Pesatori; Maria Teresa Landi
Journal:  J Natl Cancer Inst       Date:  2015-03-23       Impact factor: 13.506

5.  Drug resistance and its significance for treatment decisions in non-small-cell lung cancer.

Authors:  E Tsvetkova; G D Goss
Journal:  Curr Oncol       Date:  2012-06       Impact factor: 3.677

Review 6.  Polycyclic aromatic hydrocarbons: from metabolism to lung cancer.

Authors:  Bhagavatula Moorthy; Chun Chu; Danielle J Carlin
Journal:  Toxicol Sci       Date:  2015-05       Impact factor: 4.849

Review 7.  MicroRNAs and lung cancer: new oncogenes and tumor suppressors, new prognostic factors and potential therapeutic targets.

Authors:  Cécile Ortholan; Marie-Pierre Puissegur; Marius Ilie; Pascal Barbry; Bernard Mari; Paul Hofman
Journal:  Curr Med Chem       Date:  2009       Impact factor: 4.530

8.  Differential epigenetic regulation of TOX subfamily high mobility group box genes in lung and breast cancers.

Authors:  Mathewos Tessema; Christin M Yingling; Marcie J Grimes; Cynthia L Thomas; Yushi Liu; Shuguang Leng; Nancy Joste; Steven A Belinsky
Journal:  PLoS One       Date:  2012-04-04       Impact factor: 3.240

Review 9.  Implications of Genetic and Epigenetic Alterations of CDKN2A (p16(INK4a)) in Cancer.

Authors:  Ran Zhao; Bu Young Choi; Mee-Hyun Lee; Ann M Bode; Zigang Dong
Journal:  EBioMedicine       Date:  2016-05-03       Impact factor: 8.143

10.  Stratifin, a keratinocyte specific 14-3-3 protein, harbors a pleckstrin homology (PH) domain and enhances protein kinase C activity.

Authors:  E Dellambra; M Patrone; B Sparatore; A Negri; F Ceciliani; S Bondanza; F Molina; F D Cancedda; M De Luca
Journal:  J Cell Sci       Date:  1995-11       Impact factor: 5.285
