
Creation of a Prognostic Risk Prediction Model for Lung Adenocarcinoma Based on Gene Expression, Methylation, and Clinical Characteristics.

Honggang Ke1, Yunyu Wu2, Runjie Wang3, Xiaohong Wu4.   

Abstract

BACKGROUND This study aimed to identify important marker genes in lung adenocarcinoma (LACC) and establish a prognostic risk model to predict the risk of LACC in patients. MATERIAL AND METHODS Gene expression and methylation profiles for LACC and clinical information about cases were downloaded from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases, respectively. Differentially expressed genes (DEGs) and differentially methylated genes (DMGs) between cancer and control groups were selected through meta-analysis. The genes shared by the DEGs and DMGs were subjected to Pearson correlation analysis, and a functional analysis was performed on the correlated genes. Marker genes and clinical factors significantly related to prognosis were identified using univariate and multivariate Cox regression analyses. Risk prediction models were then created based on the marker genes and clinical factors. RESULTS In total, 1975 DEGs and 2095 DMGs were identified. After comparison, 16 prognosis-related genes (EFNB2, TSPAN7, INPP5A, VAMP2, CALML5, SNAI2, RHOBTB1, CKB, ATF7IP2, RIMS2, RCBTB2, YBX1, RAB27B, NFATC1, TCEAL4, and SLC16A3) were selected from 265 overlapping genes. Four clinical factors (pathologic N [node], pathologic T [tumor], pathologic stage, and new tumor) were associated with prognosis. The prognostic risk prediction models were constructed and validated with other independent datasets. CONCLUSIONS An integrated model that combines clinical factors and gene markers is useful for predicting risk of LACC in patients. The 16 genes that were identified, including EFNB2, TSPAN7, INPP5A, VAMP2, and CALML5, may serve as novel biomarkers for diagnosis of LACC and prediction of disease prognosis.

Year:  2020        PMID: 33021972      PMCID: PMC7549534          DOI: 10.12659/MSM.925833

Source DB:  PubMed          Journal:  Med Sci Monit        ISSN: 1234-1010


Background

Lung cancer is one of the most common cancers and a severe threat to human health. The number of lung cancer-related deaths is growing, with an estimated one-quarter of cancer-related deaths due to the disease [1]. There are 2 main types of lung cancer: small cell lung cancer and non-small cell lung cancer (NSCLC). Lung adenocarcinoma (LACC) is the most frequent histological subtype of NSCLC, accounting for approximately 75% of all cases of lung cancer. Over the past few decades, incidence of LACC in China has rapidly increased [2]. Despite recent advances in multimodality therapy, the overall 5-year survival rate for patients with LACC is only 15% [3], because two-thirds of lung cancers are discovered at advanced stages. Furthermore, 30% to 55% or more of patients who undergo resection for lung cancer experience relapse of disease within 5 years and die of metastatic recurrence [4]. Currently, it is impossible to accurately identify specific patients at high risk of recurrence to provide individualized therapy. In recent years, molecular characterization of NSCLC has reached an unprecedented level of detail [5,6]. Vascular invasion, poor differentiation, tumor size, and high tumor proliferation index have been found to have prognostic significance. In addition, advances in human genomics have revolutionized methods of identifying new prognostic factors for human cancer [7,8]. For instance, Jiang et al. [9] identified 16 survival marker genes on the basis of datasets from previous studies. Beer et al. [10] evaluated a group of survival marker genes for use in identification of high-risk patients with LACC. Moreover, global gene expression profiling based on microarray technology has identified novel gene signatures and potential biomarkers to better predict patient prognosis in lung cancer [11-15], such as KRAS [16], p53 [17], SLC1A6, MGB1, REG1A, and AKAP12 [18]. 
Despite this progress, however, it remains challenging to accurately predict prognosis in patients with LACC. In this study, we integrated gene expression profiling, methylation profiling and clinical characteristics to identify important marker genes that could predict survival and prognosis in a cohort of patients with LACC. A comprehensive prognostic risk model was constructed based on tumor marker genes and clinical factors. Reasonable use of reliable tumor markers may be helpful in early diagnosis of LACC and prediction of prognosis in patients with the disease.

Material and Methods

Data collection for meta-analysis

The datasets for LACC, including gene expression and methylation profiles obtained from the same patient population, were downloaded from the National Center for Biotechnology Information Gene Expression Omnibus (GEO) database and the European Bioinformatics Institute database on September 5, 2017. The datasets were further screened according to the following inclusion criteria: (1) presence of both LACC and normal control samples; (2) availability of more than 50 samples; and (3) more than 20 000 total probes detected in the dataset. Finally, a total of 7 gene expression profile datasets (GSE75037, GSE33532, GSE43458, GSE30219, GSE32863, GSE10072, and GSE62949) and 4 methylation profile datasets (GSE32861, GSE49996, GSE63384, and GSE62948) were selected. Detailed information about them is shown in Table 1. Furthermore, GSE62949 and GSE62948 were both included in dataset GSE62950.
Table 1

The gene expression profiling and methylation profiling datasets in this study.

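| Profile type | GEO accession | Platform | Total probe number | Total sample | Normal sample | Cancer sample |
|---|---|---|---|---|---|---|
| Gene expression | GSE75037 | GPL6884 Illumina | 48803 | 166 | 83 | 83 |
| Gene expression | GSE33532 | GPL570 Affymetrix | 54675 | 60 | 40 | 20 |
| Gene expression | GSE43458 | GPL6244 Affymetrix | 33297 | 110 | 80 | 30 |
| Gene expression | GSE30219 | GPL570 Affymetrix | 54675 | 98 | 84 | 14 |
| Gene expression | GSE32863 | GPL6884 Illumina | 48803 | 116 | 58 | 58 |
| Gene expression | GSE10072 | GPL96 Affymetrix | 22283 | 107 | 58 | 49 |
| Gene methylation | GSE32861 | GPL8490 Illumina | 27578 | 118 | 59 | 59 |
| Gene methylation | GSE49996 | GPL8490 Illumina | 27578 | 88 | 44 | 44 |
| Gene methylation | GSE63384 | GPL8490 Illumina | 27578 | 70 | 35 | 35 |
| Gene methylation | GSE62948 | GPL8490 Illumina | 27578 | 56 | 28 | 28 |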

GEO – Gene Expression Omnibus.

Data collection for construction of the prognostic risk prediction model

Data on gene expression and methylation profiles for LACC used to construct the prognostic risk prediction model were downloaded from The Cancer Genome Atlas (TCGA) database. After matching the methylation and gene expression profiles, 473 matched tumor samples were obtained. After the samples without survival prognosis information were removed, 335 tumor samples remained. These data were used as the training dataset. At the same time, the expression profile for LACC tissue, GSE37745 [19] (platform: GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array), was downloaded from the GEO database. This dataset, which contains 106 LACC tissue samples, was used as an independent validation dataset for the prognostic risk prediction model. Clinical information about the 2 datasets is shown in Table 2.
Table 2

Clinical information from The Cancer Genome Atlas (TCGA) and GSE37745 datasets.

| Clinical characteristics | TCGA (N=335) | GSE37745 (N=106) |
|---|---|---|
| Age (years, mean±SD) | 65.19±10.25 | 62.94±9.22 |
| Sex (Male/Female) | 155/180 | 46/60 |
| Pathologic M (M0/M1/–) | 226/13/96 | |
| Pathologic N (N0/N1/N2/–) | 214/60/55/6 | |
| Pathologic T (T1/T2/T3/T4/–) | 111/180/29/14/1 | |
| Pathologic stage (I/II/III/IV) | 180/81/61/13 | 70/19/13/4 |
| Radiation therapy (yes/no/–) | 41/254/40 | |
| Targeted molecular therapy (yes/no/–) | 99/194/42 | |
| Tobacco smoking history (current/reformed/never/–) | 70/206/45/14 | |
| Recurrence (yes/no/–) | 104/176/55 | 26/27 |
| Death (dead/alive) | 120/215 | 77/29 |
| Recurrence-free survival time (months, mean±SD) | 22.27±27.77 | 54.11±53.48 |
| Overall survival time (months, mean±SD) | 27.54±29.74 | 61.74±49.96 |

‘–’ – Represents information unavailable.

Preprocessing, quality control, and differential expression analysis of data used in the meta-analysis

We used the oligo package [20] in R 3.4.1 for CEL data conversion, imputation of missing values (median method), background correction (MAS method), and data normalization (quantile method) of the GSE33532, GSE43458, GSE30219, and GSE10072 datasets, which were generated on Affymetrix platforms and downloaded from the GEO database. For the GSE75037 and GSE32863 datasets (TXT format), which were generated on the Illumina platform, gene annotation, log2 transformation, and data normalization (quantile method) were performed with the limma package [21] in R 3.4.1. For methylation profiling of GSE32861, GSE49996, GSE63384, and GSE62948, we identified the chromosomal sites and methylation beta values using the GenomeStudio Methylation Module [22]. The main purpose of the meta-analysis was to integrate the results of multiple experimental datasets, improve statistical power, and screen for more reliable genes. Because these datasets were collected from different samples and experiments, they may be subject to bias. Therefore, the MetaQC package [23] in R 3.4.1 was used to perform quality control on the datasets. Next, the differentially expressed genes (DEGs) and differentially methylated genes (DMGs) were screened using MetaDE.ES in the MetaDE package [24]. The homogeneity test parameters were tau2=0 and Qpval >0.05; a false discovery rate (FDR) <0.05 was set as the significance threshold.
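The homogeneity criteria (tau2=0 and Qpval >0.05) can be illustrated with a minimal Python sketch. It assumes that tau2 is the DerSimonian-Laird between-study variance and that Qpval is the P value of Cochran's Q test; the text does not describe how MetaDE computes them internally, so this is a generic illustration rather than the package's implementation. The function names and the per-dataset effect sizes are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    z = x / 2.0
    # Start from Q(1, z) for even df or Q(1/2, z) for odd df, then apply
    # the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def heterogeneity(effects, variances):
    """Return (Q, tau2, Q P value) for one gene across >= 2 datasets."""
    weights = [1.0 / v for v in variances]          # inverse-variance weights
    sw = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sw
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sw - sum(w * w for w in weights) / sw
    tau2 = max(0.0, (q - df) / c)                   # DerSimonian-Laird estimate
    return q, tau2, chi2_sf(q, df)

def is_homogeneous(effects, variances, alpha=0.05):
    """Apply the two criteria from the text: tau2 == 0 and Q P value > alpha."""
    _, tau2, q_pval = heterogeneity(effects, variances)
    return tau2 == 0.0 and q_pval > alpha

# Hypothetical effect sizes of one gene in three datasets.
print(is_homogeneous([0.5, 0.5, 0.5], [0.1, 0.2, 0.1]))
# Hypothetical gene whose effect differs strongly between two datasets.
print(is_homogeneous([0.0, 2.0], [0.1, 0.1]))
```

A gene that passes this filter shows a consistent effect across datasets, which is the property the meta-analysis relies on before the FDR threshold is applied.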

Analysis of correlation between gene expression and methylation levels

From the DEGs and DMGs obtained above, we selected the intersecting genes, which then served as candidate tumor marker genes. The Pearson correlation coefficient between gene expression level and methylation level was calculated using the cor function in R 3.4.1. Then the DAVID Bioinformatics Database, v6.8 [25,26], was used to perform enrichment analyses of the candidate tumor marker genes based on the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases.
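The two steps in this subsection, intersecting the gene lists and correlating expression with methylation, can be sketched in Python as follows. The study used the cor function in R; the pearson function below is an illustrative equivalent, and the gene lists and numeric values are hypothetical placeholders rather than study data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical, shortened gene lists standing in for the DEGs and DMGs.
degs = {"EFNB2", "CKB", "TP53"}
dmgs = {"CKB", "TP53", "YBX1"}
candidates = sorted(degs & dmgs)   # genes present in both lists

# Hypothetical per-gene expression and methylation levels.
expression = [1.0, 2.0, 3.0, 4.0]
methylation = [8.0, 6.0, 4.0, 2.0]
r = pearson(expression, methylation)
print(candidates, r)
```

A negative coefficient, as in this toy input, corresponds to the pattern in which higher methylation accompanies lower expression.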

Screening of tumor marker genes and clinical factors related to prognosis

From the full set of tumor marker genes and the corresponding clinical factors for the tumor samples, we then identified the tumor marker genes and clinical factors significantly related to prognosis, using univariate and multivariate Cox regression analyses in the survival package [27] in R 3.4.1. A log-rank test P<0.05 was used as the threshold for significance.

Construction and validation of the risk prediction model

Using the prognosis-associated tumor marker genes identified by the Cox regression analysis, we constructed a risk prediction model and calculated a prognostic index (PI) for each sample. The samples in the training set were divided into high- and low-risk groups according to the median PI. The correlation between the risk prediction model and prognosis was then assessed with Kaplan-Meier survival curves [28] in the survival package of R 3.4.1 and validated using the validation dataset. In addition, following the same method, a risk prediction model was constructed using the clinical factors that the Cox regression analysis identified as prognosis-associated. Similarly, the samples in the training set were divided into high- and low-risk groups, and the correlation between the risk prediction model and the prognosis was assessed through a Kaplan-Meier survival curve. Finally, a risk prediction model that combined clinical factors and tumor marker genes was constructed based on the prognostic coefficients obtained from the 2 models previously described. The PI of each sample was recalculated, and its median was used to divide the samples in the training set into high- and low-risk groups. The correlation between the risk prediction model and prognosis was evaluated via a Kaplan-Meier survival curve and validated with the validation dataset.
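The PI calculation and the median split can be sketched as follows. The text does not give the PI formula, so the sketch assumes the form commonly used with Cox models: the sum of each gene's Cox coefficient multiplied by its expression value. The three coefficients are taken from Table 5; the sample names, the expression values, and the rule that samples exactly at the median fall into the low-risk group are illustrative assumptions.

```python
import statistics

# Cox coefficients for three of the 16 genes, as listed in Table 5.
COEFFICIENTS = {"EFNB2": 0.7121, "TSPAN7": -0.5824, "INPP5A": -1.4730}

def prognostic_index(expression, coefficients):
    """Assumed PI: sum of coefficient * expression over the model genes."""
    return sum(coefficients[g] * expression[g] for g in coefficients)

def split_by_median(pi_by_sample):
    """Label samples 'high' if PI > median, otherwise 'low' (assumed tie rule)."""
    median = statistics.median(pi_by_sample.values())
    return {s: ("high" if pi > median else "low")
            for s, pi in pi_by_sample.items()}

# Hypothetical expression values for four hypothetical samples.
samples = {
    "s1": {"EFNB2": 2.0, "TSPAN7": 1.0, "INPP5A": 0.5},
    "s2": {"EFNB2": 1.0, "TSPAN7": 2.0, "INPP5A": 1.0},
    "s3": {"EFNB2": 3.0, "TSPAN7": 0.5, "INPP5A": 0.2},
    "s4": {"EFNB2": 0.5, "TSPAN7": 1.5, "INPP5A": 1.5},
}
pi_values = {s: prognostic_index(e, COEFFICIENTS) for s, e in samples.items()}
groups = split_by_median(pi_values)
print(pi_values)
print(groups)
```

Under this assumed form, genes with positive coefficients raise the PI as their expression increases, and genes with negative coefficients lower it.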

Results

Quality control and differential expression analysis of data used in the meta-analysis

After normalization, quality control was performed on the datasets with MetaQC. Five parameter scores were calculated, including internal quality control (IQC), external quality control (EQC), accuracy quality control (AQCg), consistency quality control (CQCg), and standardized mean rank score (SMR), as shown in Table 3. In addition, results of principal component analysis of these datasets are shown in Figure 1A and 1B. Taken together, Table 3 and Figure 1 indicate that the 6 expression profiling datasets and 4 methylation profiling datasets were balanced in distribution and that all indexes met the data quality standard, so all 10 datasets were included in the subsequent analysis. Finally, 1975 significant DEGs and 2095 DMGs were identified using MetaDE. A heatmap of these DEGs and DMGs showed that the DEGs and DMGs screened from different datasets were consistent in the degree and direction of their differences (Figure 2).
Table 3

MetaQC quality control of 6 expression profiling datasets and 4 methylation profiling datasets.

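| Profiling type | Dataset | IQC | EQC | CQCg | CQCp | AQCg | AQCp | SMR |
|---|---|---|---|---|---|---|---|---|
| Gene expression | GSE75037 | 5.27 | 3.23 | 106.65 | 158.86 | 32.71 | 90.88 | 1.62 |
| Gene expression | GSE32863 | 4.38 | 3.16 | 64.14 | 146.51 | 26.46 | 96.74 | 2.42 |
| Gene expression | GSE33532 | 4.81 | 3.23 | 59.25 | 171.49 | 25.50 | 84.37 | 2.86 |
| Gene expression | GSE43458 | 6.09 | 1.10 | 101.10 | 114.30 | 19.53 | 29.46 | 3.92 |
| Gene expression | GSE30219 | 6.64 | 3.71 | 83.97 | 107.69 | 47.87 | 63.89 | 4.33 |
| Gene expression | GSE10072 | 8.06 | 9.19 | 12.24 | 8.92 | 9.78 | 14.52 | 7.76 |
| Methylation | GSE32861 | 9.80 | 5.00 | 19.24 | 41.01 | 6.17 | 24.77 | 3.28 |
| Methylation | GSE49996 | 6.22 | 4.96 | 46.70 | 42.02 | 8.67 | 33.56 | 3.14 |
| Methylation | GSE63384 | 7.56 | 3.05 | 24.57 | 33.79 | 3.45 | 17.84 | 5.67 |
| Methylation | GSE62948 | 5.11 | 3.63 | 59.25 | 60.27 | 29.49 | 84.37 | 3.25 |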

IQC – internal quality control; EQC – external quality control; CQC – consistency quality control; AQC – accuracy quality control; SMR – standardized mean rank score.

Figure 1

MetaQC quality control charts of (A) 5 gene expression profiles and (B) 2 gene methylation profiles. The horizontal and vertical axes represent the first and second principal components in principal component analysis. The numbers represent the corresponding datasets.

Figure 2

Heatmaps of (A) significant differentially expressed genes and (B) differentially methylated genes obtained based on MetaDE screening.

Correlation analysis between gene expression level and methylation level

By comparing the 1975 DEGs and 2095 DMGs, 265 intersecting genes (candidate genes) were identified. The correlation between the expression and methylation levels of the 265 candidate genes was then analyzed, based on the methylation and expression profiles from matched samples in the TCGA and GSE62950 datasets. As shown in Figure 3, the expression values and methylation levels of the 265 genes were negatively correlated in the TCGA and GSE62950 datasets, with correlation coefficients of −0.5108 (P=0.0114) and −0.4216 (P=0.0003), respectively. Functional enrichment analysis of the 265 candidate genes identified 15 significant GO biological processes and 9 KEGG pathways, as shown in Table 4.
Figure 3

Correlation analysis of expression levels and methylation levels of 265 genes in (A) TCGA and (B) the GSE62950 dataset. The horizontal axis represents the gene expression level, the vertical axis represents the gene methylation level, the oblique line represents the trend line synthesized by points, and the red font represents the correlation coefficient (CC) and the significant P value.

Table 4

Functional enrichment analysis results for 265 candidate genes.

| Category | Term | Count | P value | Genes |
|---|---|---|---|---|
| Biologic process | GO:0032409 ~ regulation of transporter activity | 6 | 0.0002 | PLCG2, NDFIP1, PKD2, FKBP1B, NKX2-5, SYNGR3 |
| Biologic process | GO:0009611 ~ response to wounding | 21 | 0.0005 | PPARA, A2M, ACHE, BMP2, UCN, FOXA2, EFEMP2, ATRN, CHST2, HOXB13, SERPING1, CD40, TNFRSF1B, THBD, PLSCR4, CTGF, PLA2G7, LTA4H, CFD, PLAU, ACVR1 |
| Biologic process | GO:0050777 ~ negative regulation of immune response | 5 | 0.0007 | A2M, IL27RA, NDFIP1, CTLA4, SERPING1 |
| Biologic process | GO:0048585 ~ negative regulation of response to stimulus | 8 | 0.0013 | PPARA, A2M, TNFRSF1B, IL27RA, NDFIP1, CTLA4, SERPING1, NT5E |
| Biologic process | GO:0015718 ~ monocarboxylic acid transport | 6 | 0.0013 | SLC16A3, SLC25A20, PPARA, SLC16A1, PLA2G1B, SLCO2A1 |
| Biologic process | GO:0055082 ~ cellular chemical homeostasis | 16 | 0.0016 | FXYD1, TRPM8, IL6ST, NDFIP1, TP53, FZD2, FKBP1B, CKB, GCKR, PLCG2, CLDN1, PKD2, RGN, SV2A, KCNH2, KCNQ1 |
| Biologic process | GO:0050878 ~ regulation of body fluid levels | 9 | 0.0023 | SCT, UCN, THBD, PLSCR4, FOXA2, EFEMP2, SERPING1, CD40, PLAU |
| Biologic process | GO:0006869 ~ lipid transport | 9 | 0.0028 | SLC25A20, PPARA, OSBPL3, SORL1, LIPG, PLA2G1B, VPS4B, VLDLR, SLCO2A1 |
| Biologic process | GO:0031348 ~ negative regulation of defense response | 5 | 0.0028 | A2M, TNFRSF1B, NDFIP1, SERPING1, NT5E |
| Biologic process | GO:0050801 ~ ion homeostasis | 16 | 0.0033 | FXYD1, TRPM8, IL6ST, NDFIP1, TP53, FZD2, CPS1, FKBP1B, CKB, PLCG2, CLDN1, PKD2, RGN, SV2A, KCNH2, KCNQ1 |
| Biologic process | GO:0035295 ~ tube development | 11 | 0.0036 | BMP2, FOXA2, CTGF, CRISPLD2, TGFBR1, HOXB13, PCSK5, NKX2-5, HECA, MYCN, ACVR1 |
| Biologic process | GO:0006873 ~ cellular ion homeostasis | 15 | 0.0037 | FXYD1, TRPM8, IL6ST, NDFIP1, TP53, FZD2, FKBP1B, CKB, PLCG2, CLDN1, PKD2, RGN, SV2A, KCNH2, KCNQ1 |
| Biologic process | GO:0010876 ~ lipid localization | 9 | 0.0045 | SLC25A20, PPARA, OSBPL3, SORL1, LIPG, PLA2G1B, VPS4B, VLDLR, SLCO2A1 |
| Biologic process | GO:0019725 ~ cellular homeostasis | 17 | 0.0046 | FXYD1, TRPM8, PDIA2, IL6ST, NDFIP1, TP53, FZD2, FKBP1B, CKB, GCKR, PLCG2, CLDN1, PKD2, RGN, SV2A, KCNH2, KCNQ1 |
| Biologic process | GO:0048878 ~ chemical homeostasis | 18 | 0.0049 | FXYD1, TRPM8, IL6ST, NDFIP1, TP53, FZD2, CPS1, FKBP1B, CKB, GCKR, PLCG2, LIPG, CLDN1, PKD2, RGN, SV2A, KCNH2, |
| KEGG pathway | hsa00562: Inositol phosphate metabolism | 5 | 0.0017 | ISYNA1, PLCG2, SYNJ2, ITPKB, INPP5A |
| KEGG pathway | hsa04610: Complement and coagulation cascades | 5 | 0.0037 | A2M, THBD, SERPING1, CFD, PLAU |
| KEGG pathway | hsa04070: Phosphatidylinositol signaling system | 5 | 0.0046 | PLCG2, SYNJ2, ITPKB, CALML5, INPP5A |
| KEGG pathway | hsa00532: Chondroitin sulfate biosynthesis | 3 | 0.0060 | B3GAT1, XYLT1, CHSY1 |
| KEGG pathway | hsa05217: Basal cell carcinoma | 4 | 0.0393 | BMP2, TP53, WNT11, FZD2 |
| KEGG pathway | hsa00534: Heparan sulfate biosynthesis | 3 | 0.0081 | B3GAT1, XYLT1, HS3ST1 |
| KEGG pathway | hsa00590: Arachidonic acid metabolism | 4 | 0.0082 | AKR1C3, CYP2C18, PLA2G1B, LTA4H |
| KEGG pathway | hsa04514: Cell adhesion molecules (CAMs) | 6 | 0.0093 | NRCAM, CDH15, CLDN1, CTLA4, CD40, SDC2 |
| KEGG pathway | hsa00340: Histidine metabolism | 3 | 0.0098 | HDC, LCMT2, MAOB |

KEGG – Kyoto Encyclopedia of Genes and Genomes.

Screening of prognosis-related tumor marker genes and clinical factors

From the initial pool of 265 candidate genes and the clinical factors recorded for the samples, 16 prognosis-related genes (EFNB2, TSPAN7, INPP5A, VAMP2, CALML5, SNAI2, RHOBTB1, CKB, ATF7IP2, RIMS2, RCBTB2, YBX1, RAB27B, NFATC1, TCEAL4, and SLC16A3) (Table 5) were screened using univariate and multivariate Cox regression analyses. The correlation between the expression and methylation levels of the 16 prognostic genes was then analyzed in the TCGA and GSE62950 datasets (Supplementary Figure 1). Five clinical factors were significant in the univariate analysis: pathologic N (nodes), pathologic T (tumor), pathologic stage, new tumor, and radiation therapy. As shown in Table 6, 4 of them (pathologic N, pathologic T, pathologic stage, and new tumor) remained significantly correlated with prognosis in the multivariate analysis. The Kaplan-Meier curves for the correlations between the 4 clinical factors and overall survival (OS) are shown in Supplementary Figure 2. A cluster analysis of the expression and methylation levels of the 16 prognosis-related genes and the 4 clinical factors revealed that the samples could be divided into 2 clusters. There were 160 and 175 samples in clusters 1 and 2, respectively (Figure 4). In addition, a chi-square test of sample clinical information in the 2 clusters revealed that the distribution of pathologic N differed significantly between the clusters (P=0.0467) (Supplementary Table 1).
Table 5

Tumor marker genes significantly associated with prognosis.

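| Gene | Coefficient | Hazard ratio | Lower.95 | Upper.95 | P value |
|---|---|---|---|---|---|
| EFNB2 | 0.7121 | 2.0384 | 1.5210 | 2.7317 | <0.0001 |
| TSPAN7 | −0.5824 | 0.5586 | 0.4380 | 0.7123 | <0.0001 |
| INPP5A | −1.4730 | 0.2292 | 0.1103 | 0.4762 | <0.0001 |
| VAMP2 | 1.4277 | 4.1690 | 2.0004 | 8.6885 | 0.0001 |
| CALML5 | 0.2006 | 1.2221 | 1.0996 | 1.3582 | 0.0002 |
| SNAI2 | 0.5449 | 1.7245 | 1.2434 | 2.3916 | 0.0011 |
| RHOBTB1 | 0.6348 | 1.8867 | 1.2467 | 2.8552 | 0.0027 |
| CKB | −0.3511 | 0.7039 | 0.5578 | 0.8884 | 0.0031 |
| ATF7IP2 | −0.4666 | 0.6272 | 0.4299 | 0.9149 | 0.0155 |
| RIMS2 | 0.1523 | 1.1645 | 1.0227 | 1.3259 | 0.0215 |
| RCBTB2 | −0.6106 | 0.5430 | 0.3189 | 0.9247 | 0.0246 |
| YBX1 | 0.7766 | 2.1740 | 1.0909 | 4.3325 | 0.0273 |
| RAB27B | 0.2554 | 1.2909 | 1.0276 | 1.6218 | 0.0283 |
| NFATC1 | −0.5289 | 0.5892 | 0.3660 | 0.9487 | 0.0295 |
| TCEAL4 | −0.6401 | 0.5272 | 0.2933 | 0.9476 | 0.0324 |
| SLC16A3 | −0.4125 | 0.6620 | 0.4520 | 0.9696 | 0.0341 |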
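In a Cox model the hazard ratio is the exponential of the regression coefficient, which is how the Coefficient and Hazard ratio columns of Table 5 relate to each other. The short Python check below illustrates this for four genes, using only values copied from the table; the function and variable names are illustrative.

```python
import math

# (coefficient, reported hazard ratio) pairs copied from Table 5.
TABLE5 = {
    "EFNB2": (0.7121, 2.0384),
    "TSPAN7": (-0.5824, 0.5586),
    "INPP5A": (-1.4730, 0.2292),
    "VAMP2": (1.4277, 4.1690),
}

def hazard_ratio(coefficient):
    """Hazard ratio implied by a Cox regression coefficient."""
    return math.exp(coefficient)

for gene, (coef, reported) in TABLE5.items():
    computed = hazard_ratio(coef)
    # Small differences come from the rounding of the tabulated coefficients.
    print(f"{gene}: exp({coef}) = {computed:.4f} (reported {reported})")
```

A positive coefficient therefore gives a hazard ratio above 1 (higher expression, higher risk), and a negative coefficient gives a hazard ratio below 1.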
Table 6

Univariate and multivariate Cox regression analyses of clinical factors.

| Clinical characteristics | Univariate P value | Univariate HR (95% CI) | Multivariate P value | Multivariate HR (95% CI) |
|---|---|---|---|---|
| Age (above/below median, 65 years) | 0.4370 | 1.155 (0.804~1.659) | | |
| Sex (Male/Female) | 0.7450 | 1.062 (0.741~1.52) | | |
| Pathologic M (M0/M1) | 0.1310 | 1.692 (0.848~3.378) | | |
| Targeted molecular therapy (yes/no) | 0.1601 | 1.366 (0.883~2.114) | | |
| Tobacco smoking history (current/reformed/never) | 0.9900 | 1.002 (0.737~1.362) | | |
| Radiation therapy (yes/no) | 0.0035 | 2.033 (1.25~3.307) | 0.5924 | 1.163 (0.669~2.019) |
| Pathologic N (N0/N1/N2) | <0.0001 | 1.85 (1.494~2.29) | 0.0471 | 1.439 (1.005~2.060) |
| Pathologic T (T1/T2/T3/T4) | 0.0002 | 1.537 (1.223~1.932) | 0.0169 | 1.236 (0.914~1.672) |
| Pathologic stage (I/II/III/IV) | <0.0001 | 1.671 (1.413~1.976) | 0.0103 | 1.279 (0.952~1.718) |
| New tumor (yes/no) | <0.0001 | 2.362 (1.535~3.634) | 0.0001 | 2.395 (1.533~3.742) |

HR – hazard ratio.

Figure 4

Bidirectional hierarchical cluster heatmaps based on 16 gene expression and methylation levels. The first line under the cluster tree represents pathologic N information, and the change from light orange to deep orange represents N0 to N2. The second line represents the pathologic T information, and the change from light blue to dark blue represents T1 to T4. The third line represents pathologic stage information, and the change from light green to dark green represents stages I to IV. The fourth line represents new tumor information, and the blue and gold represent the samples without and with new tumor, respectively.

A risk prediction model was constructed using the 16 prognosis-associated tumor marker genes identified by the Cox regression analysis. A Kaplan-Meier survival curve was used on the TCGA training set to assess the correlation between the risk groups and the prognosis for OS and recurrence. In OS prognosis, low-risk patients (167 samples) had a longer OS time compared with high-risk patients (168 samples) (Table 7). The P value for the correlation between the risk groups and OS prognosis was 3.961e-08. The Kaplan-Meier curve is shown on the left side of Figure 5A.
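The Kaplan-Meier estimate behind such survival curves can be sketched in a few lines of Python. The study used the survival package in R; the function below is an illustrative product-limit estimator, and the follow-up times and event indicators are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up time of each patient
    events -- 1 if the event (e.g., death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed   # events and censored patients leave the risk set
    return curve

# Hypothetical follow-up (months) for four patients; the second is censored.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(curve)
```

Applying the estimator separately to the low-risk and high-risk groups yields the two curves that are compared in each panel of a Kaplan-Meier plot.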
Table 7

Prognostic time for different risk classification models of the TCGA and GSE37745 datasets.

| Dataset | Model | Overall survival time, low-risk (months, mean±SD) | Overall survival time, high-risk (months, mean±SD) | Recurrence-free survival time, low-risk (months, mean±SD) | Recurrence-free survival time, high-risk (months, mean±SD) |
|---|---|---|---|---|---|
| TCGA | Gene expression model | 33.64±37.17 | 21.41±17.71 | 28.18±35.04 | 15.79±13.97 |
| TCGA | Clinical factor model | 29.03±35.46 | 27.01±27.76 | 25.74±32.68 | 19.32±22.79 |
| TCGA | Combined model | 33.55±40.09 | 22.33±17.78 | 28.33±35.87 | 16.31±14.57 |
| GSE37745 | Gene expression model | 68.66±47.08 | 53.69±52.46 | 70.53±57.47 | 37.06±43.82 |
| GSE37745 | Clinical factor model | 84.51±62.38 | 54.51±50.67 | 62.53±55.77 | 30.66±39.29 |
| GSE37745 | Combined model | 84.50±62.38 | 54.51±50.69 | 62.53±55.77 | 30.66±39.29 |

TCGA – The Cancer Genome Atlas.

Figure 5

(A) The Kaplan-Meier curves for the risk prediction model based on tumor marker genes and OS prognosis (left) and recurrence prognosis (right) in TCGA training set. (B) The Kaplan-Meier curves for the risk prediction model based on tumor marker genes and OS prognosis (left) and recurrence prognosis (right) in the GSE37745 validation set. (C) AUROC curves for the prognosis prediction model and OS prognosis and recurrence prognosis in TCGA training set and the GSE37745 verification set. (D) The Kaplan-Meier curves for the risk prediction model based on clinical factors and OS prognosis (left) and recurrence prognosis (right) in TCGA training set. (E) The Kaplan-Meier curves for the risk prediction model based on clinical factors and OS prognosis (left) and recurrence prognosis (right) in the GSE37745 validation set. (F) The AUROC curves for the prognosis prediction model and OS prognosis and recurrence prognosis in TCGA training set and the GSE37745 verification set. (G) The Kaplan-Meier curves for the risk prediction model based on tumor marker genes combined with clinical factors and OS prognosis (left) and recurrence prognosis (right) in TCGA training set. (H) The Kaplan-Meier curves for the risk prediction model based on tumor marker genes combined with clinical factors and OS prognosis (left) and recurrence prognosis (right) in the GSE37745 validation set. (I) The AUROC curves for the prognosis prediction model and OS prognosis and recurrence prognosis in TCGA training set and the GSE37745 verification set. The green and blue curves in (C, F, I) represent the AUROC curves for OS prognosis and recurrence prognosis in TCGA and the black and red curves represent the AUROC curves of OS prognosis and recurrence prognosis in the GSE37745 verification set.

Based on the PI values, a receiver operating characteristic (ROC) curve was constructed. The area under the ROC curve (AUROC) for OS prognosis was 0.997 (Figure 5C; green curve). In the analysis of recurrence prognosis (260 samples), low-risk patients (130 samples) also had a longer time to relapse than high-risk patients (130 samples) (Table 7). The P value for the correlation between the risk groups and prognosis for recurrence-free survival (RFS) was 3.961e-08 (Figure 5A; right) and the AUROC was 0.985 (Figure 5C; blue curve). The risk prediction model was validated in GSE37745 and the results were consistent with those in the training set. As shown in Figure 5B, the P value for the correlation between the risk groups and the prognosis for OS was 0.0091 (left) and between the risk groups and prognosis for recurrence was 0.0260 (right). The AUROCs for OS and relapse prognoses were 0.979 (black curve) and 0.953 (red curve), respectively (Figure 5C). Using the same methods, a risk prediction model was constructed using the clinical factors (Figure 5D–5F) and both the tumor marker genes and the clinical factors (Figure 5G–5I). The OS and RFS for high- and low-risk groups are shown in Table 7.
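As an illustration of what an AUROC measures, the following Python sketch computes it as the probability that a randomly chosen patient with the event has a higher PI than a randomly chosen patient without it. The text does not state how the ROC curves were derived from the survival data, so treating the outcome as a binary label is an assumption made here for simplicity; the scores and labels are hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores -- risk score (e.g., PI) of each patient
    labels -- 1 if the patient had the event, 0 otherwise
    Ties between a positive and a negative score count as half.
    """
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical PI values and outcomes for four patients.
perfect = auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
partial = auroc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
print(perfect, partial)
```

An AUROC of 1.0 means that every patient with the event is ranked above every patient without it, and 0.5 corresponds to a ranking no better than chance.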

Discussion

The present study integrated multiple LACC gene expression and methylation profile datasets and used meta-analysis to preliminarily screen out 265 genes whose expression levels were significantly influenced by methylation. Then 16 prognosis-related genes (EFNB2, TSPAN7, INPP5A, VAMP2, CALML5, SNAI2, RHOBTB1, CKB, ATF7IP2, RIMS2, RCBTB2, YBX1, RAB27B, NFATC1, TCEAL4, and SLC16A3) were selected using Cox regression analysis and were then used successfully to construct a prognostic risk prediction model. In addition, we constructed a risk prediction model based on 4 clinical factors: pathologic N, pathologic T, pathologic stage, and new tumor. Finally, a comprehensive prognostic risk model that combined tumor marker genes and clinical factors was constructed and validated. Of the 16 tumor marker genes, both calmodulin like 5 (CALML5) and inositol polyphosphate-5-phosphatase A (INPP5A) were involved in the hsa04070: phosphatidylinositol signaling system. Signaling by phosphorylated species of phosphatidylinositol regulates various cellular processes, such as cytoskeletal reorganization, membrane trafficking, and sex-dependent synaptic patterning [29,30]. Phosphatidylinositol 3-kinase (PI3K) signaling is the most common phosphatidylinositol signaling in cancers, including those of the lung [31,32]. Specifically, INPP5A recently has been reported to be a prognostic marker for cutaneous squamous cell carcinoma [33]. In addition to CALML5 and INPP5A, creatine kinase B (CKB) and solute carrier family 16 member 3 (SLC16A3) were also identified in the functional enrichment analysis. CKB was enriched in ion homeostasis-associated functions and SLC16A3 was enriched in functions associated with monocarboxylic acid transport. CKB is an enzyme involved in energy transduction pathways, and levels of it are low in colorectal cancer [34]. A recent study revealed that quantification of DNA methylation of specific CpG sites in the SLC16A3 promoter had clinical potential for diagnosing and predicting prognosis of clear cell renal cell carcinoma [35]. Those findings, taken together with our results, lead us to speculate that CALML5, INPP5A, CKB, and SLC16A3 may be involved in the progression of LACC through different pathways, and they may serve as important markers of diagnosis and prognosis in LACC.
Among the 16 marker genes, ephrin B2 (EFNB2), tetraspanin 7 (TSPAN7), INPP5A, vesicle-associated membrane protein 2 (VAMP2), and CALML5 had the lowest P values. EFNB2 is a member of the ephrin family. The ephrin system is implicated in many cellular processes, such as cell proliferation, differentiation, and migration, as well as physiological or pathological angiogenesis [36]. It is also implicated in human cancers through autocrine or juxtacrine activation [37]. Coexpression of EFNB2 and its receptor, Ephrin type-B receptor 4, has been reported in papillary thyroid carcinoma, glioblastoma, and uterine cervical and ovarian cancers [38-41]. Recently, Oweida et al. [42] suggested that overexpression of EFNB2 can serve as a biomarker for patient prognosis. TSPAN7, a member of the transmembrane 4 superfamily, has been implicated in the development and progression of several cancers. It was first found to be strongly expressed in T-cell acute lymphoblastic leukemia [43]. Subsequent microarray analyses have demonstrated overexpression of TSPAN7 in several solid tumors [44,45]. Research on the role of TSPAN7 in LACC is rare. VAMP2 is a member of the vesicle-associated membrane protein family. The VAMP2-NRG1 fusion gene promotes anchorage-independent colony formation of LACC cells, serving as a novel oncogenic driver of LACC [46]. Recently, Wang et al. demonstrated that miR-493-5p overexpression promotes cell apoptosis and inhibits the proliferation and migration of liver cancer cells by negatively regulating the expression of VAMP2 [47], which indirectly indicates the important role that VAMP2 plays in cancer. Taken together, all of these studies suggest that EFNB2, TSPAN7, and VAMP2 may serve as prognostic markers in LACC.
Most of the other tumor marker genes we identified have been reported to be implicated in lung cancer or other human cancers. For instance, Snail family transcriptional repressor 2 (SNAI2) encodes a zinc-finger protein from the SNAI family of transcription factors [48]. SNAI2 is amplified or interacts with specific oncogenes in many human cancers, including lung cancer [49,50]. SNAI2 expression by cancer-associated fibroblasts is correlated with worse OS in NSCLC [51]. Rho-related BTB domain containing 1 (RHOBTB1), which belongs to the RhoBTB subfamily, has been proposed as a tumor suppressor [52]. Y-box binding protein-1 (YBX1) is upregulated in various cancers, including lung cancer, and serves as a new marker of lung cancer progression [53]. Hendrix et al. [54] found that RAB27B, a member of the RAS oncogene family, regulates invasive tumor growth and metastasis of several breast cancer cell lines. Nuclear factor of activated T cells 1 (NFATC1) regulates many cancer-related functions, such as cell proliferation, migration, and angiogenesis. It also acts as an oncogene involved in some functions in cancer and induces a tumorigenic microenvironment [55]. Transcription elongation factor A (SII)-like 4 (TCEAL4) is downregulated in anaplastic thyroid cancer [56]. Therefore, these genes may have roles as key biomarkers in LACC.

Conclusions

In the present study, we identified 16 tumor marker genes for LACC, based on analysis of multiple gene expression and methylation profiling datasets, and constructed an integrated risk prediction model that combined those tumor markers with clinical factors. The 16 genes we identified, such as EFNB2, TSPAN7, INPP5A, VAMP2, and CALML5, may serve as novel biomarkers in early diagnosis and prediction of prognosis of LACC.
Supplementary Figure 1

Analysis of the correlation between expression and methylation levels for 16 prognostic genes in TCGA and the GSE62950 dataset.

Supplementary Figure 2

The Kaplan-Meier curves for the correlations between the 4 clinical factors (pathologic N, pathologic T, pathologic stage, and new tumor) and overall survival.

Supplementary Table 1

Clinical and chi-square test information for samples in clusters 1 and 2.

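| Clinical characteristics | Cluster 1 | Cluster 2 | X-squared | P value |
| --- | --- | --- | --- | --- |
| Pathologic N (N0/N1/N2) | 93/36/28 | 121/24/27 | 5.4091 | 0.0467 |
| Pathologic T (T1/T2/T3/T4) | 48/93/14/5 | 63/87/15/9 | 2.8225 | 0.4198 |
| Pathologic stage (I/II/III/IV) | 82/42/31/5 | 98/39/30/8 | 1.5735 | 0.6654 |
| New tumor (yes/no) | 47/84 | 57/92 | 0.0823 | 0.7742 |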
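
As an illustration of how the X-squared column can be obtained, the following is a minimal Python sketch of the Pearson chi-square statistic for a cluster-by-category contingency table. It assumes the pathologic N counts split as 93/36/28 (cluster 1) and 121/24/27 (cluster 2); the function name `chi_square_statistic` is hypothetical, and the software used for this test is not stated in this section. The sketch applies no continuity correction, so it is intended for the rows with more than two categories.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c contingency table.

    No continuity correction is applied (assumption of this sketch).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of cluster and category
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Pathologic N counts (N0, N1, N2), assumed split of the table row
pathologic_n = [
    [93, 36, 28],   # cluster 1
    [121, 24, 27],  # cluster 2
]

x_squared = chi_square_statistic(pathologic_n)
print(round(x_squared, 4))
```

With these counts, the printed statistic should land close to the X-squared value listed for pathologic N.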
  53 in total

1.  K-ras oncogene activation as a prognostic marker in adenocarcinoma of the lung.

Authors:  R J Slebos; R E Kibbelaar; O Dalesio; A Kooistra; J Stam; C J Meijer; S S Wagenaar; R G Vanderschueren; N van Zandwijk; W J Mooi
Journal:  N Engl J Med       Date:  1990-08-30       Impact factor: 91.245

2.  Activation of NFAT signaling establishes a tumorigenic microenvironment through cell autonomous and non-cell autonomous mechanisms.

Authors:  P Tripathi; Y Wang; M Coussens; K R Manda; A M Casey; C Lin; E Poyo; J D Pfeifer; N Basappa; C M Bates; L Ma; H Zhang; M Pan; L Ding; F Chen
Journal:  Oncogene       Date:  2013-04-29       Impact factor: 9.867

3.  Overexpression of EphB4, EphrinB2, and epidermal growth factor receptor in papillary thyroid carcinoma: A pilot study.

Authors:  Giriraj K Sharma; Vaninder K Dhillon; Rizwan Masood; Dennis R Maceri
Journal:  Head Neck       Date:  2014-06-27       Impact factor: 3.147

4.  Gene expression in lung adenocarcinomas of smokers and nonsmokers.

Authors:  Charles A Powell; Avrum Spira; Adnan Derti; Charles DeLisi; Gang Liu; Alain Borczuk; Steve Busch; Sudhir Sahasrabudhe; Yangde Chen; David Sugarbaker; Raphael Bueno; William G Richards; Jerome S Brody
Journal:  Am J Respir Cell Mol Biol       Date:  2003-02-21       Impact factor: 6.914

5.  limma powers differential expression analyses for RNA-sequencing and microarray studies.

Authors:  Matthew E Ritchie; Belinda Phipson; Di Wu; Yifang Hu; Charity W Law; Wei Shi; Gordon K Smyth
Journal:  Nucleic Acids Res       Date:  2015-01-20       Impact factor: 16.971

6.  Proteome analysis of human colon cancer by two-dimensional difference gel electrophoresis and mass spectrometry.

Authors:  David B Friedman; Salisha Hill; Jeffrey W Keller; Nipun B Merchant; Shawn E Levy; Robert J Coffey; Richard M Caprioli
Journal:  Proteomics       Date:  2004-03       Impact factor: 3.984

7.  Gene-expression profiles predict survival of patients with lung adenocarcinoma.

Authors:  David G Beer; Sharon L R Kardia; Chiang-Ching Huang; Thomas J Giordano; Albert M Levin; David E Misek; Lin Lin; Guoan Chen; Tarek G Gharib; Dafydd G Thomas; Michelle L Lizyness; Rork Kuick; Satoru Hayasaka; Jeremy M G Taylor; Mark D Iannettoni; Mark B Orringer; Samir Hanash
Journal:  Nat Med       Date:  2002-07-15       Impact factor: 53.440

8.  SNAI2/SLUG and estrogen receptor mRNA expression are inversely correlated and prognostic of patient outcome in metastatic non-small cell lung cancer.

Authors:  Akin Atmaca; Ralph W Wirtz; Dominique Werner; Kristina Steinmetz; Silke Claas; Wolfgang M Brueckl; Elke Jäger; Salah-Eddin Al-Batran
Journal:  BMC Cancer       Date:  2015-04-17       Impact factor: 4.430

9.  Meta-analysis methods for combining multiple expression profiles: comparisons, statistical characterization and an application guideline.

Authors:  Lun-Ching Chang; Hui-Min Lin; Etienne Sibille; George C Tseng
Journal:  BMC Bioinformatics       Date:  2013-12-21       Impact factor: 3.169

10.  The coexpression of EphB4 and EphrinB2 is associated with poor prognosis in HER2-positive breast cancer.

Authors:  Xuelu Li; Chen Song; Gena Huang; Siwen Sun; Jingjing Qiao; Jinbo Zhao; Zuowei Zhao; Man Li
Journal:  Onco Targets Ther       Date:  2017-03-21       Impact factor: 4.147

