Literature DB >> 32377703

A risk score system based on DNA methylation levels and a nomogram survival model for lung squamous cell carcinoma.

Ming Zhang1, Libing Sun1, Yi Ru1, Shasha Zhang1, Junjun Miao1, Pengda Guo1, Jinghuan Lv1, Feng Guo1, Biao Liu1.   

Abstract

Lung squamous cell carcinoma (LSCC) is one of the primary types of non‑small cell lung carcinoma, and patients with recurrent LSCC usually have a poor prognosis. The present study was conducted to build a risk score (RS) system for LSCC. Methylation data on LSCC (training set) and on head and neck squamous cell carcinoma (validation set 2) were obtained from The Cancer Genome Atlas database, and GSE39279 (validation set 1) was retrieved from the Gene Expression Omnibus database. Differentially methylated protein‑coding genes (DMGs)/long non‑coding RNAs (DM‑lncRNAs) between recurrence‑associated samples and nonrecurrence samples were screened out using the limma package, and their correlation analysis was conducted using the cor.test() function. Following identification of the optimal combinations of DMGs or DM‑lncRNAs using the penalized package in R, RS systems were built, and the system with optimal performance was selected. Using the rms package, a nomogram survival model was then constructed. For the differentially expressed genes (DEGs) between the high‑ and low‑risk groups, pathway enrichment analysis was performed by Gene Set Enrichment Analysis. There were 335 DMGs and DM‑lncRNAs in total. Following screening out of the top 10 genes (aldehyde dehydrogenase 7 family member A1, chromosome 8 open reading frame 48, cytokine‑like 1, heat shock protein 90 alpha family class A member 1, isovaleryl‑CoA dehydrogenase, phosphodiesterase 3A, PNMA family member 2, SAM domain, SH3 domain and nuclear localization signals 1, thyroid hormone receptor interactor 13 and zinc finger protein 878) and 6 top lncRNAs, RS systems were constructed. According to Kaplan‑Meier analysis, the DNA methylation level‑based RS system exhibited the best performance. In combination with independent clinical prognostic factors, a nomogram survival model was built and successfully predicted patient survival. Furthermore, 820 DEGs between the high‑ and low‑risk groups were identified, and 3 pathways were identified to be enriched in this gene set. The 10‑DMG methylation level‑based RS system and the nomogram survival model may be applied for predicting the outcomes of patients with LSCC.

Entities:  

Mesh:

Substances:

Year:  2020        PMID: 32377703      PMCID: PMC7255475          DOI: 10.3892/ijmm.2020.4590

Source DB:  PubMed          Journal:  Int J Mol Med        ISSN: 1107-3756            Impact factor:   4.101


Introduction

Lung cancer is a common cancer worldwide, with the second highest mortality rate among females and the highest mortality rate among males (1,2). Non-small cell lung cancer (NSCLC) is the primary lung cancer type that accounts for ~85% of lung cancer cases and includes the following common pathological subtypes: Lung squamous cell carcinoma (LSCC), lung adenocarcinoma (LUAD), and lung large cell carcinoma (3,4). LSCC is more common among males, potentially due to smoking habits (5). LSCC has distinct clinical characteristics, and the prognosis of patients with recurrent LSCC is poor (6). Therefore, it is urgent to identify new molecular factors predictive of LSCC treatment outcomes. Methylation changes in certain tumor-associated genes have been identified in previous studies examining tumorigenesis, suggesting that they are critical risk factors of tumorigenesis and molecular markers for early diagnosis (7). Promoter methylation of PR/SET domain 5 is significantly associated with lymph node metastasis and tumor differentiation status of LSCC; therefore, this gene is a candidate target for the diagnosis, prognosis and treatment of this cancer (8). The drought-repressed 4 gene has been proposed for methylation status analysis of LSCC cells, and its low expression correlates with a poor patient prognosis (9). Tripartite motif-containing 58/cg26157385 methylation is associated with expression of 8 prognosis-associated genes in LSCC, indicating its potential regulatory role in the progression of LSCC (10). Long non-coding RNAs (lncRNAs) are a class of RNAs measuring >200 nucleotides and have pivotal functions in the progression of numerous cancer types (11). In addition, increasing numbers of lncRNAs with a prognostic value have been described, for example, urothelial cancer associated and long intergenic non-coding RNA-p21 (12-14). The lncRNAs cancer susceptibility 2 and surfactant associated 1, pseudogene participate in the regulation of tumor suppressor genes and oncogenes during the formation of LSCC and may be used for the diagnosis, prognosis and targeted treatment of this disease (15). Several lncRNA combination signatures have been identified to predict LSCC prognosis (16,17). In LUAD, DNA methylation has regulatory effects on the role of lncRNA (18); however, to the best of our knowledge, this type of regulation in LSCC has rarely been reported. Despite these valuable data, the predictive performance of methylated DNA sequences or lncRNAs is rarely compared, and additional potential biomarkers should be investigated to improve cancer prognosis. In the present study, the methylation levels of DNA sequences and lncRNAs in patients with LSCC were compared, and the levels identified to be significantly correlated with LSCC prognosis were screened out. Furthermore, different risk score (RS) systems were built, and the one with the best predictive performance was selected.

Materials and methods

Data sources

Methylation data from patients with LSCC (downloaded on March 25, 2019; platform: Illumina Infinium Human Methylation 450 BeadChip) including 372 LSCC samples and 43 healthy tissue samples, were retrieved from The Cancer Genome Atlas (TCGA; https://cancergenome.nih. gov/) database. Following collation with clinical information, 293 LSCC samples with data recurrence were selected as the training set. With 'lung cancer' and 'Homo sapiens' as key words, a search for methylation datasets was conducted in the Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih. gov/geo/) database (19) using the following criteria: i) Available histological data that could differentiate adenocarcinoma from squamous cell carcinoma; ii) total sample size was not <150; and iii) the clinical information associated with the LSCC samples contained data on actual future recurrence. Following this search, an eligible methylation dataset (accession no., GSE39279 (20); platform: Illumina HumanMethylation450 BeadChip; validation set 1) was selected and downloaded. There were 444 samples in the dataset, including 43 LSCC samples with data concerning future recurrence. In addition, the methylation data on head and neck squamous cell carcinoma (HNSC), including 530 HNSC samples and 50 healthy tissue samples, were also retrieved from TCGA to investigate whether the predictive model for LSCC was applicable to other squamous carcinoma types. Following collation with the clinical information downloaded from TCGA simultaneously, 382 HNSC samples with information on future recurrence were selected as validation set 2.

Differential methylation analysis

According to the data corresponding to probe locations and IDs provided in the downloaded annotation files that contained information about protein-coding genes and non-coding RNAs, like gene and lncRNA function, DNA location site and pathway information, corresponding lncRNAs and genes in the methylation data were annotated using the HUGO Gene Nomenclature Committee (http://www.genenames.org/) database (21), which contains records on 4,112 lncRNAs and 19,201 genes. On the basis of the recurrence data, the LSCC samples in the training set were classified into recurrence-associated (the tumor recurred) and nonrecurrence (the tumor did not recur) groups. Using the limma package (22) (v.3.34.7, https://bioconductor.org/packages/release/bioc/html/limma.html) in the R software (23), differentially methylated genes (DMGs) between the recurrence-associated and nonrecurrence samples were identified. The thresholds for significance were defined as |log2 fold change (FC)|>0.263 and false discovery rate (FDR) <0.05. Following the addition of data on methylation levels of the DMGs in the training set, bidirectional hierarchical clustering was performed on the DMGs via the pheatmap package (v.1.0.8, https://cran.r-project.org/web/packages/pheatmap/index.html) in R (24).

Analysis of correlation between the methylation levels and expression levels of the DMGs

Data on the methylation levels and the matched expression levels of the DMGs were extracted, and their Pearson correlation coefficients (PCCs) were calculated using the cor.test() function (https://stat.ethz.ch/R-manual/R-devel/library/stats/html/cor.test.html) in R (25). P<0.05 was set as the significance threshold. DMGs with negative PCCs were selected for further analyses.

Construction of RS systems

Using univariate and multivariate Cox regression analyses in R package survival (v3.1-8; https://www.rdocumentation.org/packages/survival/versions/3.1-8), differentially methylated lncRNAs (DM-lncRNAs) and DMGs significantly associated with overall survival and independent prognosis factors were next screened out. A log-rank P<0.05 was selected as the cutoff criterion. Based on the DMGs correlating with independent prognosis, an optimal gene combination and optimal lncRNA combination were identified using the LASSO Cox regression model (26) in penalized package in R (27) (v.0.9.50; https://cran.r-project.org/web/packages/penalized/index.html). Optimal parameter 'lambda' in the model was calculated by the 1,000-fold cross-validation likelihood (cvl) method. In combination with prognostic coefficients of the optimal DM-lncRNAs and DMGs, methylation status-based or methylation level-based RS systems were built. For the optimal combinations, cutoff values of methylation levels of the optimal lncRNAs and genes were computed via the X-Tile Bio-Informatics Tool (https://medicine.yale.edu/lab/rimm/research/software.aspx) (28). The cutoff values were determined based on the following criterion: Monte Carlo P<0.05. The status of each sample at an lncRNA or DNA methylation level was defined in accordance with the cutoff value of each lncRNA or gene. When the lncRNA or DNA methylation level was increased compared with the cutoff value, its status was set to 1.0. Otherwise, the status was set to 0. In combination with the regression coefficients of each optimal lncRNA or gene and their methylation status, a RS system was created, and the RS for each sample was calculated via the following formula: where β and Status represent the regression coefficient of an lncRNA or gene and the methylation status variable, respectively. The status RS systems, based on the methylation status of the optimal genes and optimal lncRNAs, were constructed separately as two separate status RS systems. RS systems based on the methylation levels of optimal lncRNAs or optimal genes were constructed. The formula for calculating the RS of each sample was as follows: where β and MethylationRNAn represent the regression coefficient and methylation level of an lncRNA or gene, respectively. A methylation RS system based on the methylation levels of optimal genes and a methylation RS system based on the methylation levels of optimal lncRNAs were built separately. Samples in the training set were subdivided into high- and low-risk groups based on the median of the RSs of the two status RS systems and the two methylation RS systems. Using the Kaplan-Meier (KM) survival curve constructed by means of R package survival (29), associations between the prognosis and RS systems were analyzed. Concomitantly, the two validation sets were used to validate the results obtained. Finally, from the four RS systems generated, the system with optimum performance was selected through comparison of the parameters in the training set and the two validation sets.

Construction of a nomogram survival model

In the training set, independent prognostic factors were selected by univariate and multivariate Cox regression analyses via the survival package (29). The cutoff criterion was log-rank P<0.05. According to the risk data determined by the RS system, the associations between the independent clinical prognostic factors and prognosis were then analyzed. Using the rms package (v.5.1-2, https://cran.r-project.org/web/packages/rms/index.html) (30) in R, a 5-year nomogram survival model involving the independent clinical prognostic factors was built.

Pathway enrichment analysis

RSs of the best RS system were used to classify the samples in the training set into high- and low-risk groups. Using the limma package (23), differential expression analysis of the high- and low-risk groups was performed to identify differentially expressed genes (DEGs) between the two risk groups. The DEGs were screened according to FDR <0.05 and |log2 FC|>0.263 cut-off values. Using Gene Set Enrichment Analysis (http://software.broadinstitute.org/gsea/index.jsp) (31), pathway enrichment analysis of the DEG set was conducted. The pathways with P<0.05 were identified as significant.

Results

Differential-methylation analysis

On the basis of the annotation files, 1,028 lncRNAs and 15,544 genes were selected from the methylation dataset. According to the recurrence data, samples in the training set were subdivided into the recurrence-associated (78 LSCC samples from patients who later experienced cancer recurrence) and nonrecurrence (215 LSCC samples) groups. There were 335 DMGs and DM-lncRNAs between the recurrence-associated and nonrecurrence groups, including 27 DM-lncRNAs, 4 hypo-methylated and 23 hypermethylated lncRNAs, and 308 DMGs, 139 hypomethylated and 169 hypermethylated genes (Fig. 1A). The bidirectional hierarchical clustering heatmap for the DMGs and GM-lncRNAs is presented in Fig. 1B.
Figure 1

Identification of DMGs/DM-lncRNAs between recurrence-associated and nonrecurrence samples of LSCC tumors. (A) The volcano plot for the DMGs/DM-lncRNAs. The horizontal dashed line and the two vertical dashed lines represent FDR <0.05 and |log2 FC|>0.263, respectively; blue and orange dots separately represent hypomethylated genes/lncRNAs and hypermethylated genes/lncRNAs. (B) The bidirectional hierarchical clustering heatmap for the DMGs/DM-lncRNAs. The purple and green strips represent recurrence-associated and nonrecurrence samples, respectively. DMGs, differentially methylated genes; DM-lncRNAs, differentially methylated lncRNAs; FDR, false discovery rate; FC, fold change; Rec, recurring; non-rec, non-recurring; lncRNAs, long non-coding RNAs.

Analysis of the correlation between the methylation levels and expression levels of the DMGs or DM-lncRNAs

As aforementioned, the matched methylation samples from the LSCC mRNA-seq samples were first selected, and the methylation and expression data on the DMGs and DM-lncRNAs were then extracted. The overall correlation between the methylation levels and expression levels of the DMGs/DM-lncRNAs were analyzed among the matched samples, and a significant negative correlation (PCC=-0.3857; P=0.574×10−13) was identified (Fig. 2). A total of 181 DMGs and DM-lncRNAs with negative PCCs, including 25 DM-lncRNAs and 156 DMGs, were identified for subsequent analyses.
Figure 2

Scatter plot of the overall correlation between the methylation levels and expression levels of the differentially methylated genes/differentially methylated lncRNAs. The red line represents the trend line of the point distribution. lncRNAs, long non-coding RNAs.

Construction of the RS system

Among the 181 DMGs and DM-lncRNAs, 105 were identified to be associated with prognosis, including 91 DMGs and 14 DM-lncRNAs, by univariate Cox regression analysis. Then, 16 DMGs and 6 DM-lncRNAs independently associated with prognosis were identified from the 105 DMGs/DM-lncRNAs using multivariate Cox regression analysis. According to the LASSO Cox regression model, 10 optimal DMGs, including aldehyde dehydrogenase 7 family member A1 (ALDH7A1), chromosome 8 open reading frame 48 (C8orf48), cytokine-like 1 (CYTL1), heat shock protein 90 alpha family class A member 1 (HSP90AA1), isovaleryl-CoA dehydrogenase (IVD), phosphodiesterase 3A (PDE3A), PNMA family member 2 (PNMA2), SAM domain, SH3 domain and nuclear localization signals 1 (SAMSN1), thyroid hormone receptor interactor 13 (TRIP13) and zinc finger protein 878 (ZNF878), and 6 optimal DM-lncRNAs, including ATXN8 opposite-strand lncRNA (ATXN8OS), brain cytoplasmic RNA 1 (BCYRN1), family with sequence similarity 138 member D (FAM138D), HLA complex group 11 (HCG11), MIR155 host gene (MIR155HG) and testis-specific transcript, Y-linked 13 (TTTY13) were finally identified in the training dataset (Fig. 3; Table I).
Figure 3

Graphs for selection of the 'lambda' parameter, and coefficient distribution diagrams of the optimal genes or lncRNAs. (A) The parameter selection diagram for lncRNAs. In the graph on the left, the horizontal and vertical axes represent 'lambda' and cross-validation likelihood (cvl) values, respectively; the intersection of the black dashed lines represents the 'lambda' value at the largest cvl value. The graph on the right demonstrates the coefficient distribution diagram of the optimal lncRNA. (B) The parameter selection diagram for genes. In the graph on the left, the horizontal and vertical axes respectively represent 'lambda' and cvl values; the intersection of the black dashed lines represents the 'lambda' value at the largest cvl value. Right: The graph on the right demonstrates the coefficient distribution diagram of the optimal gene. lncRNAs, long non-coding RNAs.

Table I

Genes and lncRNAs involved in the optimal combinations.

TypeGeneLocusCoefficientHazard ratio95% CIP-valueCutoff
geneALDH7A1cg26327732−1.1070.4150.250-0.8842.350×10−50.34
C8orf48cg24727311−0.5720.1910.180-0.4574.610×10−40.34
CYTL1cg175630340.4961.5091.104-2.3233.051×10−30.16
HSP90AA1cg239042471.0124.7692.041-6.1111.080×10−40.53
IVDcg275299300.5701.9251.244-3.1913.500×10−40.11
PDE3Acg265718140.8991.9771.745-3.9799.310×10−50.34
PNMA2cg262682770.0111.1231.043-2.7057.295×10−30.48
SAMSN1cg139516640.1801.2081.079-2.5461.490×10−40.25
TRIP13cg175103850.1611.331.038-2.6919.838×10−30.65
ZNF878cg26626525−0.1690.5970.336-0.9061.630×10−30.06
lncRNAATXN8OScg25514273−1.2270.3120.088-0.7047.080×10−30.24
BCYRN1ch.X.1084981F13.1985.3991.028-10.381.229×10−20.04
FAM138Dcg265231962.6252.8541.681-4.4259.330×10−30.73
HCG11cg274903872.9533.1641.550-6.7342.010×10−30.20
MIR155HGcg234338891.2581.4311.281-3.1863.341×10−20.04
TTTY13cg259188491.8311.5951.492-3.4136.200×10−30.24

lncRNA, long non-coding RNA; CI, confidence interval.

The cutoff values of the methylation levels of the lncRNAs and genes in the aforementioned optimal combinations were calculated (Table I). In combination with the regression coefficients of the optimal genes or lncRNAs, RS systems based on the methylation status of optimal genes or lncRNAs were constructed. The formulas were as follows: Gene status RS=−1.1070 × StatusALDH7A1-0.5721 × StatusC8orf48 + 0.4962 × StatusCYTL1 + 1.0118 × StatusHSP90AA1 + 0.5696 × StatusIVD + 0.8992 × StatusPDE3A + 0.0109 × StatusPNMA2 + 0.1803 × StatusSAMSN1 + 0.1612 × StatusTRIP13-0.1688 × StatusZNF878; and lncRNA status RS=−1.2274 × StatusATXN8OS + 13.1984 × StatusBCYRN1 + 2.6252 × StatusFAM138D + 2.9526 × StatusHCG11 + 1.2577 × StatusMIR155HG + 1.8308 × StatusTTTY13. In addition, RS systems based on the methylation levels of optimal genes or lncRNAs were built. The formulas for calculating the RSs were: Gene methylation (methy) RS = −1.10 70 × MethyALDH7A1-0.5721 × MethyC8orf48 + 0.4962 × MethyCYTL1 + 1.0118 × MethyHSP90AA1 + 0.5696 × MethyIVD + 0.8992 × MethyPDE3A + 0.0109 × MethyPNMA2 + 0.1803 × MethySAMSN1 + 0.1612 × MethyTRIP13-0.1688 × MethyZNF878; and lncRNA methy RS=−1.2274 × MethyATXN8OS + 13.1984 × MethyBCYRN1 + 2.6252 × MethyFAM138D + 2.9526 × MethyHCG11 + 1.2577 × MethyMIR155HG + 1.8308 × MethyTTTY13. According to the median RS values, samples in the training set, validation set 1 and validation set 2 were separately subdivided into high- and low-risk groups. The associations between prognosis and the results of RS systems were analyzed. The KM curves for the methylation status-based RS systems and those for the methylation level-based RS systems are presented in Figs. 4 and 5, respectively. These RS systems demonstrated high levels of specificity, with the area under the receiver operating characteristic curve >0.6 (Figs. 4 and 5), indicating accurate levels of risk prediction. Of all the systems generated, the RS system based on the methylation level of the 10 optimal DEGs exhibited the best predictive performance (Table II).
Figure 4

KM curves and ROC curves for the methylation status-based RS systems. (A) The KM curves (black and red curves respectively represent low- and high-risk groups) and the ROC curve (black, red and green lines represent the training set, validation set 1 and validation set 2, respectively) for the lncRNA methylation status-based RS system. (B) The KM curves (blue and purple curves respectively represent low- and high-risk groups) and the ROC curve (black, red and green lines represent the training set, validation set 1 and validation set 2, respectively) for the DNA methylation status-based RS system. KM, Kaplan-Meier; ROC, receiver operating characteristic; RS, risk score; TCGA, The Cancer Genome Atlas; HNSC, head and neck squamous cell carcinoma; AUC, area under the receiver operating characteristic curve; lncRNA, long non-coding RNA; HR, hazard ratio.

Figure 5

KM curves and ROC curves for the methylation level-based RS systems. (A) The KM curves (black and red curves represent low- and high-risk groups, respectively) and the ROC curve (black, red and green lines represent the training set, validation set 1 and validation set 2, respectively) for the long noncoding RNA (lncRNA) methylation level-based RS system. (B) The KM curves (blue and purple curves represent low- and high-risk groups, respectively) and the ROC curve (black, green and red lines represent the training set, validation set 1 and validation set 2, respectively) for the DNA methylation level-based RS system. KM, Kaplan-Meier; ROC, receiver operating characteristic; RS, risk score; TCGA, The Cancer Genome Atlas; HNSC, head and neck squamous cell carcinoma; AUC, area under the receiver operating characteristic curve; lncRNA, long non-coding RNA; HR, hazard ratio.

Table II

Survival analysis of the methylation status-based RS system.

A, KM survival analysis for the methylation status-based RS system
Gene product typeStatus model (log-rank P-value)
TrainingHR (95% CI)Validation 1 (GSE39279)HR (95% CI)Validation 2 (TCGA-HNSC)HR (95% CI)
IncRNA2.628×10−42.370 (1.471-3.818)1.446×10−11.734 (0.821-3.662)1.385×10−11.397 (0.896-2.178)
DNA2.004×10−126.112 (3.452-10.82)3.217×10−22.213 (1.050-4.664)2.742×10−21.638 (1.052-2.550)

B, KM survival analysis for the methylation level-based RS system
Methylation model (log-rank P-value)
Gene product typeTrainingHR (95% CI)Validation 1 (GSE39279)HR (95% CI)Validation 2 (TCGA-HNSC)HR (95% CI)

IncRNA6.884×10−52.601 (1.596-4.241)7.923×10−21.913 (0.916-3.995)1.366×10−11.391 (0.899-2.150)
DNA5.928×10−125.870 (3.325-10.37)2.209×10−22.355(1.108-5.006)4.833×10−21.551 (1.098-2.402)

C, AUC of the four RS systems
Status model (ROC)
Methylation model (ROC)
Gene product typeTrainingValidation 1 (GSE39279)Validation 2 (TCGA-HNSC)TrainingValidation 1 (GSE39279)Validation 2 (TCGA-HNSC)

IncRNA0.9630.8530.9080.9970.8240.967
DNA0.9950.7420.9610.9590.8460.912

KM, Kaplan-Meier; RS, risk score; ROC, receiver operating characteristic; AUC, area under the ROC curve; IncRNA, long non-coding RNA; HR, Hazard Ratio; CI, confidence interval; TCGA, The Cancer Genome Atlas; HNSC, head and neck squamous cell carcinoma.

Construction of the nomogram survival model

Independent clinical factors associating with prognosis in the training set were identified by means of univariate and multivariate Cox regression analyses (Table III). For both the training and validation sets, pathological T stage, radiotherapy and DNA methylation level-based model RS status were selected as independent prognostic factors. The KM curves demonstrating the correlations between pathological T/radiotherapy and prognosis are presented in Fig. 6. The patients with low pathological T staging and those who underwent radiotherapy exhibited improved prognoses, which is consistent with the outcomes observed in clinical practice.
Table III

Screening of the independent prognostic factors using

Clinical characteristicsTCGA (n=293)Univariate
Multivariate
HR (95% CI)P-valueHR (95% CI)P-value
Age, years'67.72±8.630.988 (0.962-1.014)3.505×10−1--
Sex. male/female211/821.329 (0.783-2257)2.903×10−1--
Pathological M stage. M0/M1/-227/2/640.303(0.151-1536)6.074×10−1--
Pathological N stage, N0/N1/N2/-193/76/20/41.562(1.142-2.138)4.719×l0−30.365 (0.803-2 582)2.210×10−1
Pathological T stage, T1/T2/T3/T479/159/48/71.639 (1209-2220)1.426×10−31.560(1.096-2.444)4.520×l0−2
Pathological stage. LTIHLIW-139/110/39/2/31.768(1.326-2.356)7.566×10−51.191 (0.653-2.171)5.681×10−1
Radiation therapy, Yes/No/-31/251/112.054(1.145-3.686)1.366×10−22.244(1.191-4227)1.230×10−2
Targeted molecular therapy. Yes/No/-94/188/111.238(0.774-1.981)3.719×10−1--
mRNA methylation model RS status, High/Low146/1475.870(3.325-10.37)5.928×l0−128.348 (4.408-15.812)7.420×10−11
Recurrence, Yes/No78/215----
Recurrence free survival time, months'28.28±27.45----

Data are expressed as mean ± standard deviation. HR, Hazard Ratio; CI, confidence interval; TCGA, The Cancer Genome Atlas.

Figure 6

KM curves. (A) The KM curve revealing the correlation between pathological T and prognosis (black, red, blue and purple lines represent pathological T1, pathological T2, pathological T3 and pathological T4 stage, respectively); (B) The KM curve demonstrating the correlation between radiotherapy and prognosis (black and red lines represent patients without radiotherapy and those with radiotherapy, respectively). KM, Kaplan-Meier.

The 5-year nomogram survival model was built to further analyze the correlation between the 3 independent prognostic factors and prognosis; this model integrated various clinical indicators by means of the 'points' axis in the first row to predict the survival of patients corresponding to the samples (Fig. 7A). In addition, the predicted 5-year survival probability was compared to the actual 5-year survival probability, and a high level of consistency was observed (Fig. 7B).
Figure 7

Nomogram survival model. (A) The nomogram survival model based on the independent prognostic factors. (B) The line graph indicating the consistency between the predicted 5-year survival probability and actual 5-year survival probability. The horizontal and vertical axes represent the predicted probability of RFS and the actual probability of RFS, respectively. The black line represents the trend line connecting the start time point and the 5-year time point. RFS, recurrence-free survival.

As aforementioned, following the addition of the RSs of the DNA methylation level-based RS system to the analysis, the samples in the training set were classified into either high- or low-risk groups. A total of 820 DEGs between the two groups were identified, including 412 upregulated genes and 408 downregulated genes. Following pathway enrichment analysis, 3 pathways were identified to be enriched in this set of DEGs: The calcium signaling pathway (P=0.040944); complement and coagulation cascades (P=0.0034123); and vascular smooth muscle contraction (P=0.0427049; Table IV).
Table IV

Pathways enriched for the differentially expressed genes.

TermSizeESNESP-valueGenes
hsa04020: Calcium signaling pathway10−0.396503−1.1142274.094×10−2EDNRB, HRH1, CALML3, PLN, LTB4R2, RYR1, BDKRB2, CALML5, ITPR2, F2R
hsa04610: Complement and coagulation cascades100.58634982.01256923.412×10−3C3AR1, F12, FGG, C5, F8, SERPINA1, C4BPA, C2, BDKRB2, F2R
hsa04270: Vascular smooth muscle contraction7−0.537324−1.3411764.270×10−2PTGIR, CALML3, CALML5, PRKG1, PLA2G3, PPP1R14A, ITPR2

ES, enrichment score; NES, normalized enrichment score.

Discussion

In the present study, a total of 335 DMGs and DM-lncRNAs were identified between the recurrence-associated samples and nonrecurrence samples of tumor tissues from patients with LSCC. Among them, 16 DMGs and 6 DM-lncRNAs were significantly associated with independent prognosis, and 10 optimal DMGs (ALDH7A1, C8orf48, CYTL1, HSP90AA1, IVD, PDE3A, PNMA2, SAMSN1, TRIP13 and ZNF878) and 6 optimal DM-lncRNAs (ATXN8OS, BCYRN1, FAM138D, HCG11, MIR155HG and TTTY13) were used to construct the methylation status-based or methylation level-based RS systems. In the KM analysis, the 10-DMG methylation level-based RS system exhibited the best performance. The set of DEGs between high- and low-risk groups according to the RS system was identified to be enriched in 3 pathways: The calcium signaling pathway; complement and coagulation cascades; and vascular smooth muscle contraction. As the RS system based on the 10 DMGs exhibited an improved predictive performance compared with that based on the 6 lncRNAs, the present study focused on the functions of these 10 DMGs in LSCC. Elevated expression of ALDH7A1 in prostate cancer has been previously described, and was suggested to predict disease progression and metastasis (32), suggesting that alteration of the expression of this gene may control tumor progression. However, to the best of our knowledge, studies investigating the involvement of this gene in lung cancer are scarce. Low expression of ALDH7A1 in tumors of surgically treated patients with NSCLC is associated with a decreased incidence of tumor recurrence, indicating that decreased expression of this gene may predict a good prognosis for these patients (33). Patients with LSCC with idiopathic pulmonary fibrosis indicate a significantly decreased methylation level of ALDH7A1 compared with those without fibrosis (34). This observation suggests that methylation status of ALDH7A1 may affect other clinical factors. In the present study, ALDH7A1 was a component of the 10-DMG methylation level-based RS system that exhibited optimal performance; therefore, the methylation level of this gene may predict LSCC recurrence. Nevertheless, as the data concerning this gene in lung cancer are limited, the specific mechanisms of action of the ALDH7A1 protein and ALDH7A1 DNA methylation in LSCC should be studied further. CYTL1 has important roles in certain types of cancer and is regulated by DNA methylation in LSCC (35). The methylation patterns of CYTL1 are evidently different between early and late stages of LSCC, and hypermethylation is more common in the advanced stages (36). Although CYTL1 hypermethylation does not affect the repression activity mediated by histone deacetylases (36), this repression activity may be subject to the regulation between CYTL1 hypermethylation and histone deacetylases, as the DNA methylation level is considered to affect the binding of a histone deacetylase to a promoter region (36,37). CYTL1 is suggested to be a risk indicator of smoking-associated impairment of metabolic health, as it is hypomethylated and upregulated in non-lung tissues of smokers (38). According to the present study, the CYTL1 gene was often methylated in recurring LSCC tumors, suggesting that it may be a predictive factor of LSCC recurrence. High expression levels of HSP90AA1, heat shock protein 90 alpha family class B member 1 and heat shock protein 90 β family member 1 are associated with adverse outcomes among patients with NSCLC, and therefore may serve as promising prognostic markers and therapeutic targets in NSCLC (39). HSP90AA1 is differentially expressed between LUAD and LSCC, and therefore its expression profile may be used to distinguish the two subtypes (40). Nevertheless, data concerning HSP90AA1 methylation or the effect of this gene on tumor recurrence are scarce. PDE3A expression is low in chemoresistant NSCLC cells due to DNA hypermethylation, and high PDE3A expression is associated with improved survival in patients with LUAD (41). According to the results of the present study, methylated PDE3A may be associated with the prognosis of LSCC. PNMA2 is aberrantly expressed in various types of tumor in patients with paraneoplastic syndromes; therefore, PNMA2 may be implicated in tumorigenesis (42). PNMA2 has been demonstrated to be a tissue marker of small intestine neuroendocrine tumors, and Ma2 autoantibodies in the blood are a valuable biomarker for the diagnosis and prediction of tumor recurrence (43). Conversely, to the best of our knowledge, no studies on the methylation of this gene have been published, and this parameter may be a new prognostic indicator in LSCC, based on the results of the present study. Downregulation of SAMSN1 is detectable in lung cancer cell lines and may be involved in the development of this disease (44). Elevated TRIP13 expression contributes to the progression of LUAD and may be a candidate biomarker or therapeutic target in LUAD (45). In lung cancer in Xuanwei, TRIP13, cAMP-responsive element-binding protein 3-like 4 and cyclin E2 exhibit concordant upregulation and frequent copy number gains, and have been proposed as potential oncogenes in the pathogenesis of lung cancer in this region (46). In addition, silencing of TRIP13 can suppress cell growth and metastasis of hepatocellular carcinoma by activating TGF-β1/SMAD3 signaling (47). TRIP13 is 1 of the 7 hypomethylated genes in kidney renal cell carcinoma that have been suggested as a prognostic factor (48). Downregulation of ZNF878 has been suggested to be correlated with a poor prognosis of patients with LUAD (49). Although published data on SAMSN1, TRIP13, and ZNF878 methylation levels or their effects on LSCC are limited, when taken into consideration with the results of the present study, we hypothesize that these 3 genes may perform important functions in LSCC progression, and their methylation levels may help to determine an accurate prognosis of the disease. Information on the participation of genes C8orf48 and IVD in LSCC is scare. According to the results of the present study, their methylation may be a prognostic indicator of LSCC recurrence. Despite the identification of the 10-DMG signature that may predict the recurrence of LSCC, the present study contained certain limitations, such as the small sample size of the data retrieved from the database, and the lack of expression validation experiments. In addition, the clinical information of the LSCC samples in the training set was not compared to the validation sets, due to the difference in platforms. Nevertheless, this did preclude the predictive performance evaluation on risks of the identified clinical factors, as they could be distinguished from the AUC values. Further experiments are required to confirm the results. In conclusion, 335 DMGs and DM-lncRNAs, including 27 DM-lncRNAs and 308 DMGs, were identified between recurrence-associated and nonrecurrence samples. The RS system based on the methylation levels of the 10 optimal DMGs (ALDH7A1, C8orf48, CYTL1, HSP90AA1, IVD, PDE3A, PNMA2, SAMSN1, TRIP13 and ZNF878) may help to predict the outcomes of patients with LSCC.
  47 in total

1.  Effect of DR4 promoter methylation on the TRAIL-induced apoptosis in lung squamous carcinoma cell.

Authors:  Wenwu Wang; Xiaoyan Qi; Minghua Wu
Journal:  Oncol Rep       Date:  2015-08-04       Impact factor: 3.906

Review 2.  Update on small cell carcinoma and its differentiation from squamous cell carcinoma and other non-small cell carcinomas.

Authors:  William D Travis
Journal:  Mod Pathol       Date:  2012-01       Impact factor: 7.842

Review 3.  Long Noncoding RNA and Cancer: A New Paradigm.

Authors:  Arunoday Bhan; Milad Soleimani; Subhrangsu S Mandal
Journal:  Cancer Res       Date:  2017-07-12       Impact factor: 12.701

4.  Paraneoplastic antigen Ma2 autoantibodies as specific blood biomarkers for detection of early recurrence of small intestine neuroendocrine tumors.

Authors:  Tao Cui; Monica Hurtig; Graciela Elgue; Su-Chen Li; Giulia Veronesi; Ahmed Essaghir; Jean-Baptiste Demoulin; Giuseppe Pelosi; Mohammad Alimohammadi; Kjell Öberg; Valeria Giandomenico
Journal:  PLoS One       Date:  2010-12-30       Impact factor: 3.240

5.  Integrated Analysis of Genome-Wide Copy Number Alterations and Gene Expression Profiling of Lung Cancer in Xuanwei, China.

Authors:  Yanliang Zhang; Qiuyue Xue; Guoqing Pan; Qing H Meng; Xiaoyu Tuo; Xuemei Cai; Zhenghui Chen; Ya Li; Tao Huang; Xincen Duan; Yong Duan
Journal:  PLoS One       Date:  2017-01-05       Impact factor: 3.240

Review 6.  Potential clinical application of lncRNAs in non-small cell lung cancer.

Authors:  Tong Lu; Yuanyong Wang; Di Chen; Jia Liu; Wenjie Jiao
Journal:  Onco Targets Ther       Date:  2018-11-13       Impact factor: 4.147

Review 7.  The prognostic value and mechanisms of lncRNA UCA1 in human cancer.

Authors:  Fei Yao; Qiang Wang; Qingming Wu
Journal:  Cancer Manag Res       Date:  2019-08-14       Impact factor: 3.989

Review 8.  SNHG15: a promising cancer-related long noncoding RNA.

Authors:  Jinfei Tong; Xudong Ma; Hailan Yu; Jianhua Yang
Journal:  Cancer Manag Res       Date:  2019-07-01       Impact factor: 3.989

9.  RNA-seq analyses of multiple meristems of soybean: novel and alternative transcripts, evolutionary and functional implications.

Authors:  Lei Wang; Chenlong Cao; Qibin Ma; Qiaoying Zeng; Haifeng Wang; Zhihao Cheng; Genfeng Zhu; Ji Qi; Hong Ma; Hai Nian; Yingxiang Wang
Journal:  BMC Plant Biol       Date:  2014-06-17       Impact factor: 4.215

10.  TRIM58/cg26157385 methylation is associated with eight prognostic genes in lung squamous cell carcinoma.

Authors:  Weimin Zhang; Qichen Cui; Weifeng Qu; Xiaoyun Ding; Donglin Jiang; Hongcheng Liu
Journal:  Oncol Rep       Date:  2018-05-08       Impact factor: 3.906

View more
  8 in total

Review 1.  The regulation mechanisms and the Lamarckian inheritance property of DNA methylation in animals.

Authors:  Yulong Li; Yujing Xu; Tongxu Liu; Hengyi Chang; Xiaojun Yang
Journal:  Mamm Genome       Date:  2021-04-15       Impact factor: 2.957

2.  An Immune Gene-Related Five-lncRNA Signature for to Predict Glioma Prognosis.

Authors:  Xinzhuang Wang; Ming Gao; Junyi Ye; Qiuyi Jiang; Quan Yang; Cheng Zhang; Shengtao Wang; Jian Zhang; Ligang Wang; Jianing Wu; Hua Zhan; Xu Hou; Dayong Han; Shiguang Zhao
Journal:  Front Genet       Date:  2020-12-16       Impact factor: 4.599

3.  Specific Lung Squamous Cell Carcinoma Prognosis-Subtype Distinctions Based on DNA Methylation Patterns.

Authors:  Guichuan Huang; Jing Zhang; Ling Gong; Daishun Liu; Xin Wang; Yi Chen; Shuliang Guo
Journal:  Med Sci Monit       Date:  2021-03-04

4.  A Robust Panel Based on Mitochondrial Localized Proteins for Prognostic Prediction of Lung Adenocarcinoma.

Authors:  Weifeng Chen; Jingyao Wang; Qiumei Zhao; Dandan Liu; Donglin Sun; Ningxia Xie; Haohao Zhang; Deji Ye; Chaoyang Li; Yongzhong Liu; Xiaoren Zhang
Journal:  Oxid Med Cell Longev       Date:  2021-09-09       Impact factor: 6.543

5.  A nomogram based on A-to-I RNA editing predicting overall survival of patients with lung squamous carcinoma.

Authors:  Li Liu; Jun Liu; Xiaoliang Deng; Li Tu; Zhuxiang Zhao; Chenli Xie; Lei Yang
Journal:  BMC Cancer       Date:  2022-06-29       Impact factor: 4.638

6.  A Novel Prognosis Signature Based on Ferroptosis-Related Gene DNA Methylation Data for Lung Squamous Cell Carcinoma.

Authors:  Xuequan Wang; Yinnan Meng; Changxin Liu; Haihua Yang; Suna Zhou
Journal:  J Oncol       Date:  2022-09-12       Impact factor: 4.501

7.  A novel gene signature derived from the CXC subfamily of chemokine receptors predicts the prognosis and immune infiltration of patients with lung adenocarcinoma.

Authors:  Kun Deng; Shenghua Lin; Zhanyu Xu; Junqi Qin; Liqiang Yuan; Yu Sun; Jiangbo Wei; Tiaozhan Zheng; Zhiwen Zheng; Fanglu Qin; Shikang Li
Journal:  Medicine (Baltimore)       Date:  2022-10-14       Impact factor: 1.817

8.  Identification and validation of a novel redox-related lncRNA prognostic signature in lung adenocarcinoma.

Authors:  Jie Ren; Aman Wang; Jiwei Liu; Qihang Yuan
Journal:  Bioengineered       Date:  2021-12       Impact factor: 3.269

  8 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.