Gene set enrichment analysis and meta-analysis identified 12 key genes regulating and controlling the prognosis of lung adenocarcinoma.

Wenwu He¹, Liangmin Fu², Qunlun Yan², Qiuxi Zhou³, Kun Yuan⁴, Linxin Chen², Yongtao Han¹.

Abstract

The aim of the present study was to analyze lung adenocarcinoma-associated microarray data and identify potentially crucial genes. Gene expression profiles were downloaded from the Gene Expression Omnibus database; of 6 eligible datasets, 2 were discarded and 4 were retained and preprocessed using packages in the R computing language. Subsequently, Gene Set Enrichment Analysis (GSEA) and meta-analysis were used to screen the common pathways and differentially expressed genes at the transcriptional level. The genes detected by GSEA were then examined against The Cancer Genome Atlas database, and the crucial genes were identified on the basis of survival data. Pathways of the crucial genes were obtained using the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway function of the online Database for Annotation, Visualization and Integrated Discovery (DAVID) tool, and the pathways of crucial genes that were upregulated or downregulated were matched using the Venn method to identify the common crucial pathways. Furthermore, on the basis of the common crucial pathways, key genes that are closely associated with the development and progression of lung adenocarcinoma were identified with the KEGG pathway function of DAVID. Additional information was obtained through Gene Ontology annotation. A total of two key pathways, cell cycle and DNA replication, as well as 12 key genes [DNA polymerase δ subunit 2, DNA replication licensing factor MCM4, MCM6, mitotic checkpoint serine/threonine-protein kinase BUB1, BUB1β, mitotic spindle assembly checkpoint protein MAD2A, dual specificity protein kinase TTK, M-phase inducer phosphatase 1, cell division control protein 45 homolog, cyclin-dependent kinase inhibitor 1C, pituitary tumor-transforming gene 1 protein and polo-like kinase 1] were identified. These key pathways and genes may be investigated in future studies involving gene transfection/knockdown, which may provide insights into the prognosis of lung adenocarcinoma. Additional studies are required to confirm their biological function.

Keywords: gene set enrichment analysis; key genes; key pathways; lung adenocarcinoma; meta-analysis

Year: 2019 PMID: 31186783 PMCID: PMC6507356 DOI: 10.3892/ol.2019.10236

Source DB: PubMed Journal: Oncol Lett ISSN: 1792-1074 Impact factor: 2.967

Introduction

Lung cancer is the leading cause of cancer-associated mortality among men and the second leading cause among women worldwide (1). The vast majority of lung cancer cases are non-small cell lung cancer (NSCLC), comprising 80–85% of cases (2), among which adenocarcinoma is the most common histological type (~50% of all NSCLCs) (3). However, despite continuous clinical research from 1975 onwards, the overall 5-year survival rate of patients with NSCLC has only improved from 14 to 18% (4). Therefore, although previous studies have focused on genes associated with lung adenocarcinoma, the genetic molecular mechanism underlying the development of this type of cancer remains to be elucidated. Studies investigating lung adenocarcinoma-associated genes may improve the prognosis, diagnosis and treatment of lung adenocarcinoma. With the developments in the field of biotechnology, the expression levels of thousands of genes can be detected simultaneously by microarray, providing a record of the RNA transcriptional levels in the tissues being studied, further facilitating the study of lung adenocarcinoma (5). All microarray datasets used in the present study are available from the Gene Expression Omnibus (GEO) public database at the National Center for Biotechnology Information (6). However, the large volume of data must be preprocessed and converted into a smaller set of genes, which exhibit meaningful biological differences between the control and test systems. Analyzing such a huge amount of information from microarray datasets to identify molecular pathways and key genes deregulated in lung adenocarcinoma is extremely challenging. Subramanian et al (7) addressed this problem by describing a method, referred to as Gene Set Enrichment Analysis (GSEA), to reveal significant differences in expression between normal and patient samples. GSEA is a test for groups of genes rather than a single gene. 
However, sample size, platform differences and normalization methods may affect the statistical results obtained from a single dataset. Meta-analysis of microarray data may therefore be an improved method of dealing with poor reproducibility and reliability (8,9). In the present study, these two methods were used to select significant genes for Gene Ontology (GO) annotation and to identify the genes involved in the molecular mechanism underlying lung adenocarcinoma development. These observations highlight the importance of improving our understanding of the etiology of lung adenocarcinoma, as well as the molecular changes underlying this disease.

Materials and methods

Data collection

All research datasets were selected from GEO (www.ncbi.nlm.nih.gov/geo/), using ‘lung neoplasms’ as the Medical Subject Headings search term, setting the study type to ‘expression profiling by array’ and limiting the species to ‘human’. A total of 168 sets of genome-wide expression microarray data associated with lung neoplasms were identified. The studies that met all the following criteria are listed in Table I: i) Data on the expression of genome-wide RNA; ii) valid complete microarray raw data or standardized data; iii) data providing a comparison between lung adenocarcinoma patients and normal controls; iv) data containing ≥6 samples; v) raw data expressed as CEL files; and vi) the studied organism was Homo sapiens. A total of 6 gene expression datasets met all the selection criteria; however, two of the datasets, GSE43458 and GSE19188, presented problems with exporting the data or lacked a correspondence between normal and pathological tissues, respectively, and were therefore discarded. Thus, four datasets were retained, containing data on 132 lung adenocarcinomas and 132 normal tissues.

Table I.

Characteristics of datasets selected in the studies.

GEO Accession	Author, year	(Refs.)	Country	Chip	Experimental design	Probes	Disease, n	Normal, n
GSE18842	Sanchez-Palencia et al (2010)	(34)	Spain	HG-U133_Plus_2	Paired, tissues	54675	12	12
GSE33356	Lu et al (2012)	(35)	Taiwan	GPL570 (HG-U133_Plus_2) GPL6801 (GenomeWideSNP_6)	Paired, tissues	54675	60	60
GSE10072	Landi et al (2008)	(36)	USA	GPL96 (HG-U133A)	Paired, tissues	22283	33	33
GSE7670	Su et al (2007)	(37)	Taiwan	HG-U133A	Paired, tissues	22283	27	27

GEO, Gene Expression Omnibus.

GSEA

GSEA is primarily used to analyze microarray data, drawing on genomic and genetic sequencing information to detect significant biological differences in microarray datasets (10). In the present study, differentially expressed genes and common crucial pathways between lung adenocarcinoma patients and normal controls were identified from microarray data by GSEA. Computing and general statistical analysis were performed in the R computing language (http://www.R-project.org/) (11). The datasets were normalized and the log10 intensity of each probe set was calculated using the Robust Multichip Average algorithm in Bioconductor (12). The selected differentially expressed genes were required to have been mapped to an explicit Kyoto Encyclopedia of Genes and Genomes (KEGG; www.genome.jp/kegg/) pathway of the Database for Annotation, Visualization and Integrated Discovery (DAVID; david.abcc.ncifcrf.gov/) for further analysis using the Venn and meta-analysis methods (13). Pathway analysis of each dataset was performed independently. Variability was measured as the interquartile range (IQR), and a cut-off was set to exclude genes with IQR values <0.5. If one gene was targeted by multiple probe sets, the probe set with the greatest variability was retained. In addition, the genes in each pathway were analyzed using Statistical Analysis System (SAS) software, and the P-value of each pathway was obtained from a permutation test with 1,000 permutations. P<0.05 was considered to indicate a statistically significant difference.
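The probe-level filtering described above can be illustrated with a minimal Python sketch. It is not the R/Bioconductor code used in the study: the function names, the toy intensities and the quartile convention (`method="inclusive"`) are assumptions made for illustration only. The sketch drops probe sets with IQR <0.5 and, when several probe sets map to one gene, keeps the most variable one.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range (Q3 - Q1) of a list of expression values."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def filter_probes(expr, probe_to_gene, min_iqr=0.5):
    """Return {gene: probe}, keeping the most variable probe set per gene.

    Probe sets whose IQR is below min_iqr, or that map to no gene,
    are discarded before the per-gene comparison.
    """
    best = {}  # gene -> (iqr, probe)
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue
        spread = iqr(values)
        if spread < min_iqr:
            continue
        if gene not in best or spread > best[gene][0]:
            best[gene] = (spread, probe)
    return {gene: probe for gene, (_, probe) in best.items()}


# Hypothetical log-scale intensities for three probe sets across five samples.
expr = {
    "p1": [5.0, 5.1, 5.2, 5.1, 5.0],  # nearly flat, removed by the IQR cut-off
    "p2": [4.0, 6.0, 8.0, 5.0, 7.0],
    "p3": [4.0, 4.5, 5.5, 6.0, 5.0],
}
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_B", "p3": "GENE_B"}
selected = filter_probes(expr, probe_to_gene)
print(selected)
```

In this toy input, GENE_A is lost entirely because its only probe set is nearly constant, whereas GENE_B is represented by the more variable of its two probe sets.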

Meta-analysis

A meta-analysis was performed to obtain the significantly differentially expressed genes from the genes included in each of the datasets mentioned above. The meta-analysis was conducted in SAS 9.4 (SAS Institute, Inc., Cary, NC, USA). The χ2 value of each gene was then calculated according to the formula of Brown (14), where k is the number of datasets. A cut-off was set to exclude χ2 values <0.05 for all the remaining genes, which were then used to obtain KEGG pathways from DAVID Bioinformatics Resources 6.7.
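As an illustration of how per-dataset P-values for one gene can be combined into a single χ2 value, the following Python sketch assumes the standard Fisher statistic, χ2 = −2 Σ ln(P_i) summed over the k datasets, referred to a χ2 distribution with 2k degrees of freedom. Whether the study used exactly this form is an assumption, as are the independence of the datasets, the function name and the example P-values.

```python
import math


def fisher_combined(p_values):
    """Return (chi2, combined_p) for independent P-values (Fisher's method)."""
    k = len(p_values)
    if k == 0 or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("need at least one P-value in (0, 1]")
    chi2 = -2.0 * sum(math.log(p) for p in p_values)
    # With an even number of degrees of freedom (2k), the chi-square upper
    # tail has the closed form exp(-x/2) * sum_{j<k} (x/2)^j / j!.
    half = chi2 / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return chi2, math.exp(-half) * total


# Hypothetical P-values for one gene in k = 4 datasets.
chi2, combined_p = fisher_combined([0.01, 0.04, 0.03, 0.20])
print(round(chi2, 2), combined_p)
```

A gene that is only moderately significant in each dataset can still yield a small combined P-value, which is why this type of statistic is less conservative than requiring significance in every dataset separately.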

The Cancer Genome Atlas (TCGA) database

TCGA is a coordinated and comprehensive method for promoting our understanding of the molecular mechanisms underlying cancer development. Additional information on lung adenocarcinoma-associated genes identified through clinical data in GSEA may be obtained. The P′-value (P-value in TCGA) was adjusted to <0.05. A total of 2,494 significantly differentially expressed genes were obtained. Subsequently, 610 differentially expressed genes from the meta-analysis were matched with the 2,494 genes from TCGA by the Venn method, which allowed crucial genes to be filtered out according to the survival data.
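The Venn matching step amounts to a set intersection of two gene lists. A minimal Python sketch is given below; the function name and the short gene lists are hypothetical stand-ins for the 610 meta-analysis genes and the 2,494 TCGA genes, which are not reproduced here.

```python
def venn_overlap(meta_genes, survival_genes):
    """Split two gene lists into the shared genes and the genes unique to each."""
    meta, survival = set(meta_genes), set(survival_genes)
    return {
        "common": sorted(meta & survival),
        "meta_only": sorted(meta - survival),
        "survival_only": sorted(survival - meta),
    }


# Toy lists; GENE_X and GENE_Y are placeholders, not genes from the study.
meta_genes = ["MCM4", "PLK1", "GENE_X", "BUB1"]
survival_genes = ["PLK1", "GENE_Y", "MCM4"]
regions = venn_overlap(meta_genes, survival_genes)
print(regions["common"])
```

Only the genes in the `common` region would be carried forward as crucial genes; the two remaining regions correspond to the non-overlapping parts of the Venn diagram.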

Gene annotation of DAVID

Crucial genes were entered into DAVID, selecting the official gene symbol as ‘select identifier’ and gene list as ‘list type’ in the upload. A species limit of humans was set in the list and background. Selecting the functional annotation tool and entering the option of pathways, crucial common pathways of crucial genes were obtained by the KEGG pathway of DAVID and their numbers in the KEGG database.

Identification of significant common pathways and key genes

As the significant common pathways serve an important role in the pathogenesis of lung adenocarcinoma, their identification was also attempted. Crucial common pathways were matched with upregulated and downregulated pathways by the Venn method to identify significant common pathways. Key genes serving important roles in significant common pathways were obtained. Furthermore, in order to gain an improved insight into the key genes, the Blast2GO software (version 1.9; /david.ncifcrf.gov/) was used to annotate all 12 key genes. A preliminary understanding of the association between key genes was also provided by the String website (http://string-db.org). The term ‘lung adenocarcinoma’ and the organism ‘Homo sapiens’ were used to search for and obtain clinical data of 221 patients from TCGA. Furthermore, the data were analyzed by univariate Cox regression analysis, setting the minimum time as 0.1 and the maximum time as 10, with the year as the time unit. From the results of the Cox regression analysis, Kaplan-Meier curves were plotted and patients were organized into either a high- or a low-risk group.
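The survival comparison can be sketched in Python with a hand-written Kaplan-Meier estimator. The sketch is illustrative only: the study does not state how the high- and low-risk groups were defined, so splitting patients at the median expression of a gene is an assumption, and the function names and patient data are hypothetical.

```python
from statistics import median


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with >= 1 death.

    events[i] is 1 if patient i died at times[i] and 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def split_by_median(expression):
    """Label each patient 'high' or 'low' relative to the median expression."""
    cut = median(expression)
    return ["high" if x > cut else "low" for x in expression]


# Hypothetical follow-up (years), death indicators and expression values.
years = [0.5, 1.2, 2.0, 3.5, 4.0, 6.0]
died = [1, 1, 0, 1, 0, 1]
expression = [9.1, 8.7, 5.2, 8.9, 4.8, 5.0]

groups = split_by_median(expression)
curves = {}
for label in ("high", "low"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curves[label] = kaplan_meier([years[i] for i in idx], [died[i] for i in idx])
print(curves)
```

Plotting the two step functions against time gives curves of the kind shown for each key gene; a formal comparison would additionally require a log-rank test or the Cox model itself, neither of which is implemented here.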

Results

GSEA analysis

Based on the criteria mentioned above, six datasets were obtained of which four were retained containing 132 lung adenocarcinomas and 132 normal tissues. The GSEA method was performed independently on the four datasets, and common pathways and differentially expressed genes were screened out from the four datasets. Detailed information on the analysis results is presented in Table I. A volcano plot (Fig. 1) was used to initially screen the genes in a crude manner. Genes present outside of the two vertical lines were considered to be the differentially expressed genes of each database. The distance a gene was from the vertical line indicated the degree of difference in expression of that gene.
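The crude screening performed with a volcano plot can be expressed as a simple rule on each gene's coordinates. In the Python sketch below, the fold-change cut-off of 1 on the log2 scale and the additional P-value cut-off of 0.05 are assumed values chosen for illustration, not thresholds reported in the study, and the function name is hypothetical.

```python
import math


def volcano_point(mean_tumor, mean_normal, p_value, fc_cutoff=1.0, alpha=0.05):
    """Return (log2_fc, neg_log10_p, label) for one gene.

    mean_tumor and mean_normal are linear-scale mean intensities. The label
    is 'up' or 'down' when the gene lies outside the vertical lines at
    +/- fc_cutoff and its P-value is below alpha, otherwise 'ns'.
    """
    log2_fc = math.log2(mean_tumor / mean_normal)
    neg_log10_p = -math.log10(p_value)
    if p_value < alpha and log2_fc >= fc_cutoff:
        label = "up"
    elif p_value < alpha and log2_fc <= -fc_cutoff:
        label = "down"
    else:
        label = "ns"
    return log2_fc, neg_log10_p, label


# Hypothetical genes: fourfold higher, fourfold lower, and 1.5-fold higher.
points = [
    volcano_point(8.0, 2.0, 0.001),
    volcano_point(2.0, 8.0, 0.001),
    volcano_point(3.0, 2.0, 0.001),
]
print([label for _, _, label in points])
```

The absolute value of the first coordinate is the horizontal distance from the center of the plot, so genes further from the vertical lines show larger differences in expression.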

Figure 1.

Volcano plots of the four datasets used to determine the significantly differentially expressed genes. Genes lying outside the two vertical lines were considered to be significantly differentially expressed. The further a gene is from the vertical line, the larger the difference in expression. FC, fold-change.

Meta-analysis can help to obtain significantly differentially expressed genes from the results of GSEA (15). SAS was used to calculate the P-value for each gene. In addition, the gene probe platform annotation was downloaded from the GEO database so that probe identifiers could be translated into gene names, and the gene names were entered into SAS version 9.42 software for the overall analysis. A total of 610 significantly differentially expressed genes were obtained (data not shown). The common pathways, including 78 upregulated and 20 downregulated pathways, were also identified. The names of the common pathways are listed in Table II.

Table II.

Details of the upregulated (n=78) and downregulated (n=20) common crucial pathways.

Regulation	Pathway
Downregulated	‘N-Glycan biosynthesis’, ‘mismatch repair’, ‘cellular tumor antigen p53 signaling pathway’, ‘amino sugar and nucleotide sugar metabolism’, ‘aminoacyl-transferRNA biosynthesis’, ‘pyrimidine metabolism’, ‘drug metabolism-other enzymes’, ‘ribosome biogenesis in eukaryotes’, ‘RNA transport’, ‘glycosphingolipid biosynthesis-lacto and neolacto series’, ‘base excision repair’, ‘cell cycle’, ‘protein export’, ‘alanine, aspartate and glutamate metabolism’, ‘proteasome’, ‘fructose and mannose metabolism’, ‘pentose phosphate pathway’, ‘DNA replication’, ‘Parkinson's disease’, ‘homologous recombination’
Upregulated	‘Type I diabetes mellitus’, ‘vascular smooth muscle contraction’, ‘gap junction’, ‘leukocyte transendothelial migration’, ‘janus kinase-signal transducer and activator of transcription signaling pathway’, ‘osteoclast differentiation’, ‘ATP-binding cassette transporters’, ‘mitogen-activated protein kinase signaling pathway’, ‘basal cell carcinoma’, ‘viral myocarditis’, ‘metabolism of xenobiotics by cytochrome P450’, ‘tryptophan metabolism’, ‘B cell receptor signaling pathway’, ‘hypertrophic cardiomyopathy’, ‘drug metabolism-cytochrome P450’, ‘fatty acid degradation’, ‘neuroactive ligand-receptor interaction’, ‘regulation of actin cytoskeleton’, ‘dorso-ventral axis formation’, ‘neurotrophin signaling pathway’, ‘salivary secretion’, ‘hematopoietic cell lineage’, ‘prion diseases’, ‘cell adhesion molecules’, ‘inositol phosphate metabolism’, ‘peroxisome proliferator-activated receptor signaling pathway’, ‘intestinal immune network for IgA production’, ‘carbohydrate digestion and absorption’, ‘phagosome’, ‘chronic myeloid leukemia’, ‘long-term potentiation’, ‘natural killer cell mediated cytotoxicity’, ‘aldosterone-regulated sodium reabsorption’, ‘tight junction’, ‘phosphatidylinositol signaling system’, ‘acute myeloid leukemia’, ‘African trypanosomiasis’, ‘bile secretion’, ‘calcium signaling pathway’, ‘adipocytokine signaling pathway’, ‘allograft rejection’, ‘type II diabetes mellitus’, ‘progonadoliberin-1 signaling pathway’, ‘vascular endothelial growth factor signaling pathway’, ‘complement and coagulation cascades’, ‘graft-vs.-host disease’, ‘melanogenesis’, ‘rheumatoid arthritis’, ‘malaria’, ‘T cell receptor signaling pathway’, ‘Fcε RI signaling pathway’, ‘autoimmune thyroid disease’, ‘gastric acid secretion’, ‘arachidonic acid metabolism’, ‘cytokine-cytokine receptor interaction’, ‘soluble vesicle-fusing ATPase attachment protein receptor interactions in vesicular transport’, ‘insulin signaling pathway’, ‘proximal tubule bicarbonate reclamation’, ‘vasopressin-regulated water reabsorption’, ‘long-term depression’, ‘toxoplasmosis’, ‘asthma’, ‘transforming growth factor-β signaling pathway’, ‘Fcγ R-mediated phagocytosis’, ‘dilated cardiomyopathy’, ‘histidine metabolism’, ‘epithelial cell signaling in Helicobacter pylori infection’, ‘pancreatic secretion’, ‘endocytosis’, ‘nucleotide-binding oligomerization domain-like receptor signaling pathway’, ‘cytosolic DNA-sensing pathway’, ‘chemokine signaling pathway’, ‘wingless/integrated signaling pathway’, ‘hedgehog signaling pathway’, ‘chagas disease (American trypanosomiasis)’, ‘apoptosis’, ‘leishmaniasis’, ‘Staphylococcus aureus infection’

TCGA database

The clinical data and expression profiles of lung adenocarcinoma were downloaded from TCGA database. Cox regression analysis was performed, and the P′-value (P-value in TCGA) was adjusted to <0.05. A total of 2,494 significantly differentially expressed genes were obtained. Subsequently, the 610 differentially expressed genes from the meta-analysis were matched with the 2,494 genes from TCGA by the Venn method (Fig. 2); 100 common genes exhibited statistically significant differences in expression and were considered to affect survival prognosis. The names, P′-values and P-values of the 100 common genes are presented in Table III.

Figure 2.

Venn diagram of common crucial genes differentially expressed in the meta-analysis and in TCGA database. TCGA, The Cancer Genome Atlas; Meta R, meta-analysis; Survival R, genes associated with survival in the TCGA database.

Table III.

Common crucial genes significantly differentially expressed in the meta-analysis and in The Cancer Genome Atlas database.

Gene name	P-value	P′-value
ARRB2	2.58×10⁻⁶	1.51×10⁻³
IL6R	4.99×10⁻⁴	7.62×10⁻³
HPGDS	3.90×10⁻⁴	3.54×10⁻²
NR3C2	1.09×10⁻⁴	4.38×10⁻²
ALG8	5.36×10⁻¹³	4.62×10⁻²
ACSL4	8.73×10⁻³	1.85×10⁻²
BDNF	1.69×10⁻¹²	1.12×10⁻⁴
ADRB2	<1.00×10⁻¹⁶	4.52×10⁻²
FGF2	1.22×10⁻¹⁵	7.90×10⁻⁴
MCM6	7.92×10⁻⁸	3.87×10⁻²
NCF4	6.27×10⁻³	3.67×10⁻²
AURKA	3.05×10⁻¹²	1.28×10⁻²
IL20RA	6.67×10⁻⁴	2.64×10⁻²
TACC3	9.2×10⁻⁸	1.12×10⁻²
COL4A6	1.06×10⁻³	4.05×10⁻³
KAT2B	5.24×10⁻¹²	4.19×10⁻²
SEMA3A	2.88×10⁻²	2.11×10⁻³
SGCG	<1.00×10⁻¹⁶	2.94×10⁻²
ELOVL6	3.63×10⁻²	1.60×10⁻³
ABLIM3	6.65×10⁻¹⁴	1.04×10⁻³
GALNT3	1.78×10⁻⁵	1.24×10⁻³
HK3	6.49×10⁻¹⁰	3.88×10⁻²
PSMD12	1.48×10⁻³	1.64×10⁻²
FMO3	1.87×10⁻⁶	6.75×10⁻³
LCP2	7.39×10⁻⁴	1.88×10⁻²
HYAL1	1.44×10⁻¹³	2.49×10⁻³
PPARG	1.56×10⁻¹⁰	2.01×10⁻²
BUB1	1.86×10⁻¹¹	4.65×10⁻²
BUB1B	1.55×10⁻¹³	2.55×10⁻²
F12	1.47×10⁻⁸	2.13×10⁻²
COL4A5	9.88×10⁻⁵	3.38×10⁻³
MAD2L1	1.14×10⁻¹⁰	1.02×10⁻²
TYMS	1.45×10⁻¹⁴	7.92×10⁻⁴
CSGALNACT1	8.60×10⁻⁵	6.00×10⁻⁴
IL10RA	1.27×10⁻⁴	4.15×10⁻²
CDC25A	4.83×10⁻⁶	5.68×10⁻³
CKS1B	8.27×10⁻¹⁰	3.26×10⁻²
P2RY13	5.11×10⁻⁷	1.14×10⁻³
CDKN1C	5.84×10⁻¹²	3.24×10⁻²
YKT6	1.80×10⁻⁷	3.08×10⁻²
FGR	<1.00×10⁻¹⁶	4.18×10⁻²
BTK	5.23×10⁻⁶	2.36×10⁻³
GTSE1	4.07×10⁻⁹	8.18×10⁻³
TLR7	1.76×10⁻²	9.61×10⁻⁴
PRKCH	1.32×10⁻¹⁴	1.56×10⁻²
CHPT1	3.34×10⁻⁷	3.64×10⁻²
LEF1	1.43×10⁻³	3.32×10⁻²
P4HA2	2.48×10⁻²	2.71×10⁻²
PPAT	1.12×10⁻⁸	2.57×10⁻²
VIPR1	<1.00×10⁻¹⁶	1.61×10⁻²
SLK	8.67×10⁻¹²	1.86×10⁻²
HCK	1.40×10⁻⁹	1.48×10⁻²
GPD1L	5.28×10⁻⁴	6.20×10⁻⁴
ARHGEF4	2.27×10⁻⁷	4.02×10⁻³
GSTM5	4.37×10⁻¹³	1.74×10⁻²
CD4	9.53×10⁻³	2.11×10⁻²
AOC3	<1.00×10⁻¹⁶	2.02×10⁻²
FUT1	2.48×10⁻⁹	4.87×10⁻²
VCL	2.84×10⁻³	3.22×10⁻²
TTK	3.15×10⁻¹¹	3.84×10⁻²
BIRC5	2.20×10⁻¹⁴	1.78×10⁻²
ASAP2	2.56×10⁻²	1.16×10⁻³
VPS37B	4.54×10⁻⁴	2.07×10⁻²
CDC45	7.5×10⁻¹⁰	2.05×10⁻²
CX3CR1	1.58×10⁻⁷	6.33×10⁻³
DOCK2	8.47×10⁻⁶	2.69×10⁻²
OAS3	1.31×10⁻²	1.06×10⁻²
UBE2S	4.02×10⁻⁴	2.89×10⁻³
ALG3	2.76×10⁻¹¹	3.51×10⁻²
ADCY9	4.62×10⁻⁶	7.66×10⁻³
F2RL1	1.48×10⁻⁹	1.82×10⁻³
POLD2	3.77×10⁻⁸	4.31×10⁻²
PTTG1	1.2×10⁻¹¹	6.39×10⁻³
STIP1	1.68×10⁻³	2.46×10⁻²
FZD4	<1.00×10⁻¹⁶	1.01×10⁻²
DPYSL2	3.77×10⁻¹⁵	1.52×10⁻²
BLM	1.16×10⁻³	1.54×10⁻²
ATP6V1B2	1.85×10⁻³	1.03×10⁻²
ARHGEF6	9.99×10⁻¹⁵	5.74×10⁻³
CSF2RB	6.37×10⁻⁷	3.03×10⁻²
NUP37	1.58×10⁻³	2.57×10⁻²
MTHFD1	4.28×10⁻⁵	6.66×10⁻³
P2RY14	2.22×10⁻¹⁶	1.78×10⁻²
MCM4	8.75×10⁻¹²	7.57×10⁻³
WDR3	1.19×10⁻⁵	9.23×10⁻³
CD33	3.15×10⁻³	7.13×10⁻³
VEGFC	1.35×10⁻³	1.0×10⁻²
ATP1A2	1.86×10⁻¹⁰	3.05×10⁻²
HMMR	2.15×10⁻¹³	1.03×10⁻³
C6	1.97×10⁻²	4.86×10⁻²
PPP2R5A	6.32×10⁻⁶	2.85×10⁻²
GRIA1	<1.00×10⁻¹⁶	1.89×10⁻²
HACD1	2.03×10⁻⁸	6.72×10⁻³
PTPN6	3.57×10⁻⁴	8.81×10⁻³
HGF	1.02×10⁻⁵	1.49×10⁻²
PLK1	6.12×10⁻⁷	2.47×10⁻⁵
DAPK2	5.99×10⁻¹³	2.27×10⁻²
TUBB6	1.03×10⁻⁸	3.66×10⁻⁴
ADIPOR2	2.87×10⁻¹¹	5.48×10⁻⁴
HCLS1	5.48×10⁻⁴	3.01×10⁻²

Results of significant common pathways and key genes

The official gene symbols of 100 crucial genes were imported into the functional annotation tool of DAVID and five crucial pathways were obtained by KEGG, which is a distinct pathway analysis tool. A total of 78 upregulated and 20 downregulated pathways were screened out by the Venn method among common pathways obtained from GSEA (Table II). A total of five pathways were matched with 78 upregulated and 20 downregulated pathways by the Venn method, and two significant pathways were identified: Cell cycle (Fig. 3) and DNA replication (Fig. 4). In addition, the genes from the KEGG database were also identified to serve crucial roles in two significant common pathways, presented in Figs. 3 and 4. According to the two significant pathways, 12 key genes were obtained [DNA polymerase δ subunit 2 (POLD2), DNA replication licensing factor MCM4, MCM6, mitotic checkpoint serine/threonine-protein kinase BUB1 (BUB1), BUB1β, mitotic spindle assembly checkpoint protein MAD2A (MAD2L1), dual specificity protein kinase TTK, M-phase inducer phosphatase 1 (CDC25A), cell division control protein 45 homolog (CDC45), cyclin-dependent kinase inhibitor 1C (CDKN1C), pituitary tumor-transforming gene 1 protein (PTTG1) and polo-like kinase 1 (PLK1)] from KEGG of DAVID. Subsequently, 12 key genes were mapped in the String database to explore associations among them (Fig. 5), and MCM4 was identified to serve an important role in their interactions. GO annotation was applied to detect common pathways (Fig. 6) of biological process, cellular components and molecular function. Furthermore, the Kaplan-Meier curves (Fig. 7) of 12 key genes were obtained and demonstrated that patients in the high-risk group had poorer survival when compared with patients in the low-risk group.

Figure 3.

Cell cycle pathways from the Kyoto Encyclopedia of Genes and Genomes database. Genes with red stars are considered to be differentially expressed. Corresponding P-values are presented in Table III.

Figure 4.

DNA replication pathway from the Kyoto Encyclopedia of Genes and Genomes database. Genes with red stars are considered to be differentially expressed. Corresponding P-values are presented in Table III. GAP, GTPase-activating protein; SSB, ribosome-associated molecular chaperone SSB; RFC, replication factor C subunit; MCM, DNA replication licensing factor MCM; RPA, replication protein A; FEN, Flap endonuclease 1.

Figure 5.

A gene network of the 12 key genes identified from the String database. POLD2, DNA polymerase δ subunit 2; MCM, DNA replication licensing factor MCM; BUB1, mitotic checkpoint serine/threonine-protein kinase BUB1; MAD2L1, mitotic spindle assembly checkpoint protein MAD2A; TTK, dual specificity protein kinase TTK; CDC25A, M-phase inducer phosphatase 1; CDC45, cell division control protein 45 homolog; CDKN1C, cyclin-dependent kinase inhibitor 1C; PTTG1, pituitary tumor-transforming gene 1 protein; PLK1, polo-like kinase 1. Network nodes represent proteins; splice isoforms or post-translational modifications are collapsed, i.e., each node represents all the proteins produced by a single protein-coding gene locus. Edges represent protein-protein associations, which are meant to be specific and meaningful, i.e., proteins jointly contribute to a shared function, although this does not necessarily mean that they physically bind to each other.

Figure 6.

Gene Ontology annotation of the 100 common crucially differentially expressed genes.

Figure 7.

Kaplan-Meier survival curves of the 12 key genes identified from the String database. (A) An increase in BUB1 expression was associated with a significant decrease in overall survival. (B) An increase in BUB1B expression was associated with a significant decrease in overall survival. (C) An increase in CDC25A expression was associated with a notable decrease in overall survival. (D) An increase in CDC45 expression was associated with a significant decrease in overall survival. (E) An increase in CDKN1C expression was associated with a decrease in overall survival. (F) An increase in MAD2L1 expression was associated with a significant decrease in overall survival. (G) An increase in MCM4 expression was associated with a notable decrease in overall survival. (H) An increase in MCM6 expression was associated with a decrease in overall survival. (I) An increase in PLK1 expression was associated with a significant decrease in overall survival. (J) An increase in POLD2 expression was associated with a significant decrease in overall survival. (K) An increase in PTTG1 expression was associated with a decrease in overall survival. (L) An increase in TTK expression was associated with a decrease in overall survival. POLD2, DNA polymerase δ subunit 2; MCM, DNA replication licensing factor MCM; BUB1, mitotic checkpoint serine/threonine-protein kinase BUB1; MAD2L1, mitotic spindle assembly checkpoint protein MAD2A; TTK, dual specificity protein kinase TTK; CDC25A, M-phase inducer phosphatase 1; CDC45, cell division control protein 45 homolog; CDKN1C, cyclin-dependent kinase inhibitor 1C; PTTG1, pituitary tumor-transforming gene 1 protein; PLK1, polo-like kinase 1.

Discussion

Although lung adenocarcinoma is the most common primary lung neoplasm (16), its causes and underlying molecular mechanisms have not been fully elucidated (17). Previous studies primarily focused on a single factor that may lead to the development of lung adenocarcinoma (18,19); however, a single theory cannot provide a detailed explanation for all the different cases of lung adenocarcinoma. Global analysis, which includes metabolome, transcriptome, proteome and genome, collectively referred to as ‘omics’ after the completion of the Human Genome Project (20), enabled the description of the genome-wide molecular mechanisms of lung adenocarcinoma and revealed disease-specific molecular markers and biomarkers for its diagnosis, classification and prognosis (21). Furthermore, microarray technology serves an important role in numerous studies based on genomics and post-genomics (22). In addition, microarray technology provides the basis for obtaining significantly differentially expressed genes and crucial common pathways. A large number of genes are considered to be associated with lung adenocarcinoma (23); however, it is difficult to determine which genes are the most relevant. Previous studies have generally investigated one gene or conducted only a single research method (24–26). However, these studies may overlook the key genes and crucial common pathways. In addition, there are certain limitations regarding studies of a single gene chip analysis. For example, it may not take into consideration differences in expression levels among different samples, which may cause various significant genes and key genes to go undetected (27). Therefore, in the present study, four groups of datasets containing samples of normal and cancerous biological states were selected based on the GSEA method, in order to avoid the deviation from the number of samples. 
Analysis of these datasets is expected to more accurately identify the significantly differentially expressed genes and common pathways. GSEA and meta-analysis were used simultaneously to analyze four datasets in order to obtain the crucial genes and significant common pathways in lung adenocarcinoma. The main function of GSEA was to indicate differentially expressed genes extracted from samples (number of samples ≥6). In addition, 610 significantly differentially expressed genes were obtained using the R software and meta-analysis, and 78 upregulated and 20 downregulated pathways were identified by the Venn method. The reasoning for selecting meta-analysis to identify the significantly differentially expressed genes rather than overlap of samples were as follows: Since the sample size was small, genes that were not common to the four gene sets may have been overlooked, and a simple comparison was additionally performed where a strict cut-off for significance was not used, possibly introducing a statistical bias. Therefore, meta-analysis was deemed to be an improved approach to decrease deviations. A total of 610 significantly differentially expressed genes were matched with lung adenocarcinoma-associated genes in the TCGA database and 100 crucial genes were obtained. To identify genes closely associated with lung adenocarcinoma, the common pathways of 100 genes were overlapped with 78 upregulated and 20 downregulated pathways by the Venn method, and two crucial pathways were filtered out: Cell cycle and DNA replication. In addition, 12 key genes (POLD2, MCM4, MCM6, BUB1B, BUB1, MAD2L1, TTK, CDC25A, CDC45, CDKN1C, PTTG1 and PLK1) were identified through the KEGG pathway database when their roles in cell cycle and DNA replication were examined. Blast2GO separated all key genes into three groups: i) biological process; ii) cellular components; and iii) molecular function. 
These genes may be closely associated with tumor development, and a proportion of them may confer an increased susceptibility to lung adenocarcinoma. Future experiments are required to verify specific associations between these findings and lung adenocarcinoma. A number of studies have demonstrated the functions of the key genes identified in the present study and their impact on the pathology of other diseases: Mutation of MCM4 may contribute to skin cancer development by disturbing DNA replication (28), POLD2 is associated with the outcome of ovarian carcinomas (29), BUB1B may be a therapeutic target for glioblastoma (30), and the DNA-binding properties of human CDC45 indicate that it functions as a molecular wedge for DNA unwinding (31). In studies on lung adenocarcinoma, MCM4 has been considered to affect tumorigenesis (32), CDC45 was reported to be associated with diagnosis (33) and TTK serves a role in development and survival (34). However, the number of studies on the effects of these key genes on the pathology of lung adenocarcinoma is limited. Furthermore, the results of the present study suggest that MCM4 and MCM6 affect cell cycle and DNA replication, which in turn serve important roles in the pathogenesis of lung adenocarcinoma. Therefore, MCM4 and MCM6 may serve a crucial role in the diagnosis and treatment of lung adenocarcinoma. Further studies of the role of the implicated genes in the diagnosis and treatment of this type of cancer are required.
In conclusion, the pathogenesis of lung adenocarcinoma is complicated. The aim of the present study was to provide insight into the underlying mechanisms by focusing on gene sets or common pathways rather than on a single gene. In addition, a number of consistent biological mechanisms involved in lung adenocarcinoma were identified by GSEA and meta-analysis. 
The cell cycle and DNA replication pathways and 12 key genes (POLD2, MCM4, MCM6, BUB1B, BUB1, MAD2L1, TTK, CDC25A, CDC45, CDKN1C, PTTG1 and PLK1) were identified as relevant. Follow-up experiments are required to explore specific links between these findings and the prognosis of lung adenocarcinoma. In addition, new computational and bioinformatics tools may prove to be of value for the diagnosis and prognosis of lung adenocarcinoma.

37 in total

1. Analysis of mRNA expression and protein abundance data: an approach for the comparison of the enrichment of features in the cellular population of proteins and transcripts.

Authors: Dov Greenbaum; Ronald Jansen; Mark Gerstein
Journal: Bioinformatics Date: 2002-04 Impact factor: 6.937

2. Gene expression profiling reveals novel biomarkers in nonsmall cell lung cancer.

Authors: Abel Sanchez-Palencia; Mercedes Gomez-Morales; Jose Antonio Gomez-Capilla; Vicente Pedraza; Laura Boyero; Rafael Rosell; M Esther Fárez-Vidal
Journal: Int J Cancer Date: 2010-11-28 Impact factor: 7.396

3. The Human Genome Project. Revealing the shared inheritance of all humankind.

Authors: F S Collins; M K Mansoura
Journal: Cancer Date: 2001-01-01 Impact factor: 6.860

4. Addition of thiazolidinedione or exenatide to oral agents in type 2 diabetes: a meta-analysis.

Authors: Nicole R Pinelli; Raymond Cha; Morton B Brown; Linda A Jaber
Journal: Ann Pharmacother Date: 2008-10-28 Impact factor: 3.154

5. Bioconductor: open software development for computational biology and bioinformatics.

Authors: Robert C Gentleman; Vincent J Carey; Douglas M Bates; Ben Bolstad; Marcel Dettling; Sandrine Dudoit; Byron Ellis; Laurent Gautier; Yongchao Ge; Jeff Gentry; Kurt Hornik; Torsten Hothorn; Wolfgang Huber; Stefano Iacus; Rafael Irizarry; Friedrich Leisch; Cheng Li; Martin Maechler; Anthony J Rossini; Gunther Sawitzki; Colin Smith; Gordon Smyth; Luke Tierney; Jean Y H Yang; Jianhua Zhang
Journal: Genome Biol Date: 2004-09-15 Impact factor: 13.583

6. GSEA-P: a desktop application for Gene Set Enrichment Analysis.

Authors: Aravind Subramanian; Heidi Kuehn; Joshua Gould; Pablo Tamayo; Jill P Mesirov
Journal: Bioinformatics Date: 2007-07-20 Impact factor: 6.937

Review 7. Histologic subtype in NSCLC: does it matter?

Authors: Giovanni Selvaggi; Giorgio V Scagliotti
Journal: Oncology (Williston Park) Date: 2009-11-30 Impact factor: 2.990

8. NCBI GEO: mining tens of millions of expression profiles--database and tools update.

Authors: Tanya Barrett; Dennis B Troup; Stephen E Wilhite; Pierre Ledoux; Dmitry Rudnev; Carlos Evangelista; Irene F Kim; Alexandra Soboleva; Maxim Tomashevsky; Ron Edgar
Journal: Nucleic Acids Res Date: 2006-11-11 Impact factor: 16.971

9. Selection of DDX5 as a novel internal control for Q-RT-PCR from microarray data using a block bootstrap re-sampling scheme.

Authors: Li-Jen Su; Ching-Wei Chang; Yu-Chung Wu; Kuang-Chi Chen; Chien-Ju Lin; Shu-Ching Liang; Chi-Hung Lin; Jacqueline Whang-Peng; Shih-Lan Hsu; Chen-Hsin Chen; Chi-Ying F Huang
Journal: BMC Genomics Date: 2007-06-01 Impact factor: 3.969

10. Gene expression signature of cigarette smoking and its role in lung adenocarcinoma development and survival.

Authors: Maria Teresa Landi; Tatiana Dracheva; Melissa Rotunno; Jonine D Figueroa; Huaitian Liu; Abhijit Dasgupta; Felecia E Mann; Junya Fukuoka; Megan Hames; Andrew W Bergen; Sharon E Murphy; Ping Yang; Angela C Pesatori; Dario Consonni; Pier Alberto Bertazzi; Sholom Wacholder; Joanna H Shih; Neil E Caporaso; Jin Jen
Journal: PLoS One Date: 2008-02-20 Impact factor: 3.240
