Qiang Zhu1, Gang Tian1, Jianyong Gao1. 1. Department of Stomatolgy, Changhai Hospital, Second Military Medical University, Shanghai 200433, P.R. China.
Abstract
This study aimed to identify DNA methylation markers in oral squamous cell carcinoma (OSCC) and to construct a prognostic prediction model of OSCC. For this purpose, the methylation data of patients with OSCC downloaded from The Cancer Genome Atlas were considered as a training dataset. The methylation profiles of GSE37745 for OSCC samples were downloaded from Gene Expression Omnibus and considered as validation dataset. Differentially methylated genes (DMGs) were screened from the TCGA training dataset, followed by co‑methylation analysis using weighted correlation network analysis (WGCNA). Subsequently, the methylation and gene expression levels of DMGs involved in key modules were extracted for correlation analysis. Prognosis‑related methylated genes were screened using the univariate Cox regression analysis. Finally, the risk prediction model was constructed and validated through GSE52793. The results revealed that a total of 948 DMGs with CpGs were screened out. Co‑methylation gene analysis obtained 2 (brown and turquoise) modules involving 380 DMGs. Correlation analysis revealed that the methylation levels of 132 genes negatively correlated with the gene expression levels. By combining with the clinical survival prognosis of samples, 5 optimized prognostic genes [centromere protein V (CENPV), Tubby bipartite transcription factor (TUB), synaptotagmin like 2 (SYTL2), occludin (OCLN) and CAS1 domain containing 1 (CASD1)] were selected for constructing a risk prediction model. It was consistent in the training dataset and GSE52793 that low‑risk samples had a better survival prognosis. On the whole, this study indicates that the constructed risk prediction model based on CENPV, SYTL2, OCLN, CASD1, and TUB may have the potential to be used for predicting the survival prognosis of patients with OSCC.
This study aimed to identify DNA methylation markers in oral squamous cell carcinoma (OSCC) and to construct a prognostic prediction model of OSCC. For this purpose, the methylation data of patients with OSCC downloaded from The Cancer Genome Atlas were considered as a training dataset. The methylation profiles of GSE37745 for OSCC samples were downloaded from Gene Expression Omnibus and considered as validation dataset. Differentially methylated genes (DMGs) were screened from the TCGA training dataset, followed by co‑methylation analysis using weighted correlation network analysis (WGCNA). Subsequently, the methylation and gene expression levels of DMGs involved in key modules were extracted for correlation analysis. Prognosis‑related methylated genes were screened using the univariate Cox regression analysis. Finally, the risk prediction model was constructed and validated through GSE52793. The results revealed that a total of 948 DMGs with CpGs were screened out. Co‑methylation gene analysis obtained 2 (brown and turquoise) modules involving 380 DMGs. Correlation analysis revealed that the methylation levels of 132 genes negatively correlated with the gene expression levels. By combining with the clinical survival prognosis of samples, 5 optimized prognostic genes [centromere protein V (CENPV), Tubby bipartite transcription factor (TUB), synaptotagmin like 2 (SYTL2), occludin (OCLN) and CAS1 domain containing 1 (CASD1)] were selected for constructing a risk prediction model. It was consistent in the training dataset and GSE52793 that low‑risk samples had a better survival prognosis. On the whole, this study indicates that the constructed risk prediction model based on CENPV, SYTL2, OCLN, CASD1, and TUB may have the potential to be used for predicting the survival prognosis of patients with OSCC.
Oral cancer is a type of head and neck cancer arising in the oral cavity, such as the lips, the anterior two-thirds of the tongue, sublingual region and the hard palate (1,2). There are several histological types of oral cancer, and >90% of oral cancers are squamous cell carcinomas. Oral squamous cell carcinoma (OSCC) is the eighth most common type of cancer worldwide, leading to 144,000 deaths annually (3). Despite many advances being made in the surgical treatment, chemotherapy and radiotherapy used in OSCC, the survival rate of patients with OSCC has not markedly improved (4). Currently, the 5-year survival rate of patients with OSCC is approximately 50% due to the tendency for metastasis (5,6). Therefore, exploring biomarkers with high accuracy, specificity and sensitivity based on the current genome-based approaches underlying OSCC may be helpful for improving its diagnosis, treatment and prognosis.DNA methylation, mainly referring to the covalent addition of a CH3 on the 5′ position of cytosine, is one of the most important epigenetic modifications involved in the physiological control of genome expression (7). Studies have demonstrated that abnormal methylation mechanisms can result in abnormal gene expression and decreased genomic stability, thereby exerting potent effects on cancer genes (8,9). For OSCC, the silencing of death associated protein kinase 1 (DAPK1), p16INK4a, cadherin 1 (CDH1), O-6-methylguanine-DNA methyltransferase (MGMT) and ABO, alpha 1-3-N-acetylgalactosaminyltransferase and alpha 1-3-galactosyltransferase (ABO) has been reported to be linked with increased DNA methylation levels of the respective promoters (10-12). Usually, DNA methylation is an earlier event in the process of cell canceration compared with DNA mutation (13). Therefore, it is helpful for diagnostic prediction and therapeutic treatments based on characteristics of DNA methylation patterns across cancers (14). Screening high-risk populations through methylation detection may improve the accuracy of early diagnosis so as to gain more treatment time for cancerpatients.In this study, we aimed to screen some important DNA methylation markers based on the OSCC methylation data in The Cancer Genome Atlas (TCGA). Additionally, a prognostic prediction model was constructed using the screened DNA methylation markers and further validated in another dataset, GSE52793.
Data and methods
Data collection
The methylation data of patients with OSCC, based on the Illumina Infinium Human Methylation 450 BeadChip platform, were downloaded from the TCGA database (https://gdc-portal.nci.nih.gov/). The Cancer Genome Atlas is a public funded project that aims to catalogue and discover major cancer-causing genomic alterations to create a comprehensive 'atlas' of cancer genomic profiles through large-scale genome sequencing (15). According to the definition of oral anatomy, 378 oral samples were retained, including 18 gums, 30 tongue roots, 22 buccal mucosae, 66 mouth floor, 8 dura, 87 oral cavity, and 156 tongue samples. Among these 378 oral samples, 312 samples with survival prognosis information were used as the training dataset. The average age was 61.67±12.89 years, the ratio of males to females was 213:99, the average overall survival (OS) time was 27.61±27.70 months, and the dead to survival ratio was 131:181.Simultaneously, the GSE52793 (platform: Illumina Infinium Human Methylation 450 BeadChip) was downloaded from the National Center of Biotechnology Information (NCBI) Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/). This dataset contains DNA methylation profiles across approximately 450,000 CpGs in oral rinse samples from a cohort of 82 OSCC patients who had survival prognosis information. This dataset was used as the validation dataset. The average OS time was 41.82±17.45 months, and the dead to survival ratio was 23:59.
Screening of differentially methylated genes (DMGs)
The samples in the TCGA training dataset were divided into the bad prognostic group (survival time of <12 months and died) and the good prognostic group (survival time of >36 months and still survived) according to the OS time recorded in the clinical information. Subsequently, the detected methylation sites were annotated according to the platform annotation information and the meth-ylation sites located in the CpGs regions of the genes were selected for differential methylation analysis using the limma package (16) (version 3.34.7, https://bioconductor.org/packages/release/bioc/html/limma.html) in R3.4.1. Limma package is a R-based open-source software development project in statistical genomics that provides enhanced possibilities for biological interpretation of both differential expression and differential splicing analyses of RNA sequencing data (16). False discovery rate (FDR) <0.05 and |log fold change (FC)|>0.2 were used as the thresholds.
Co-methylation gene analysis
In order to evaluate the internal correlations between CpG islands, we performed co-methylation analysis of all CpG island methylation data using weighted correlation network analysis (WGCNA) package (17) (version 1.63, https://cran.r-project.org/web/packages/WGCNA/index.html) in R3.4.1. The WGCNA package is a comprehensive collection of R functions for performing various aspects of weighted correlation network analysis that includes functions for network construction, module detection, gene selection, calculations of topological properties, data simulation, visualization, and interfacing with external software (17). In this method, the module was defined as a group of genes that had similar methylation levels. Based on this algorithm, we screened the modules that were significantly related to methylation levels. The CpGs in the same module highly correlated with the meth-ylation level.Subsequently, the screened CpGs were mapped to each WGCNA module to calculate the fold enrichment ratio and the P-value of target CpGs in each module significantly correlated with the methylation level using the hypergeometric algorithm. The formula is f (k, N, M, N) = C (k, M) x C (n-k, n-m)/C (N, N) (17), where 'N' indicates all methylation sites involved in the WGCNA algorithm analysis; 'M' indicates the number of genes in each module obtained from the WGCNA algorithm; and 'n' indicates the number of significant DMGs. The thresholds for module screening were P<0.05 and fold enrichment >1. Finally, the genes in the screened modules were subjected to biology process enrichment analysis using the Database for Annotation, Visualization and Integrated Discovery (DAVID, version 6.8, https://david.ncifcrf.gov/) (18,19). DAVID is an online available bioinformatics resources that consists of an integrated biological knowledgebase and analytic tools aimed at systematically extracting biological meaning from large gene/protein lists (18,19).
Correlation analysis between the methylation level and expression level
From the TCGA training dataset, the OSCC mRNA-seq samples matching methylation profiles were selected. The methylation levels and gene expression levels of DMGs involved in key modules were extracted as well. The overall Pearsons' correlation coefficients (PCC) (20) between the methylation levels and gene expression levels were calculated using cor.test (https://stat.ethz.ch/R-manual/R-devel/library/stats/html/cor.test.html) in R3.4.1. The cor.test is a program of R functions that can be used for exploring the correlations between variables by Pearsons' correlation coefficient (20). The correlation between the methylation level and gene expression level of each gene was then calculated and the DMGs whose methylation levels negatively correlated with the expression levels were selected for further analysis (P<0.05).
Screening of prognosis-related methylated genes
Based on the expression levels of the screened DMGs in TCGA samples, as well as the clinical survival prognosis information, the prognosis-related methylated genes were screened using the univariate Cox regression analysis implemented in survival package (21) (version 2.41-1, http://bioconductor.org/packages/survivalr/) of R3.4.1. The log-rank tested P-value <0.05 was considered as the significance threshold.
Establishment of prognostic prediction model based on methylated genes
According to the methylation levels of the prognosis-related methylated genes, the optimized prognostic genes were screened by the Cox-Proportional Hazards (Cox-PH) model (22) based on L1 penalized regularized regression algorithm (23) (version 0.9-50, http://bioconductor.org/packages/penalized/) of R3.4.1. The Cox-PH model is a predictive model of overall and disease-free survival, based on demographic and clinical covariates (24).Subsequently, the risk prediction model was constructed based on the prognosis coefficients of optimized prognostic genes. The risk score of each sample was calculated as follows: Risk score=∑coefgene × Methylationgene, where coefgene represents the regression coefficient obtained in the previous step, and Methylationgene represents the methylation level of the corresponding gene.Finally, the samples in TCGA training dataset were divided into the high- and low-risk groups according to the median of the risk score. The association between risk model and prognosis was assessed by the Kaplan-Meier (KM) survival curve (25) in survival package (version 2.41-1). The KM survival curve is the probability of surviving in a given length of time while considering time in many small intervals (25). Moreover, the discriminant efficacy of the prediction model on survival prognosis was evaluated through the validation dataset GSE52793.
Results
Screening of significant DMGs
A total of 15,654 gene methylation sites containing CpGs were obtained through the annotation information on the Illumina 450 K methylation platform. Subsequently, in the TCGA training dataset, 54 samples were divided into the bad prognostic group and 46 were divided into the good prognostic group. Finally, a total of 948 DMGs with CpGs were screened out and the volcano plot is presented in Fig. 1A. The log2 Kernel density curve revealed that 64.56% (612/948) of the DMGs were significantly hypomethylated in the good prognostic group, and 35.44% (336/948) were significantly hypermethylated (Fig. 1B). The bidirectional hierarchical cluster heatmap revealed that the samples were clearly divided into 2 groups based on the screened DMGs (Fig. 1C).
Figure 1
The significant differentially methylated genes. (A) Volcano plot of significant differentially methylated genes. The red dots represent differentially methylated genes; the black dots represent non-differentially methylated genes; the green horizontal dotted line represents the false discovery rate (FDR) <0.05; the two green vertical dotted lines represent the |log2 fold change (FC)|>0.2. A total of 948 DMGs with CpGs were screened out. (B) Log2 Kernel density curve based on differentially methylated genes. The proportion of hypomethylated genes in the good prognostic group was 64.56% (612/948), and 35.44% (336/948) were significantly hypermethylated. (C) The hierarchical clustering heatmaps of significant differentially methylated genes. The red and green bars represent the samples in good and bad prognostic groups, respectively. The samples were clearly divided into 2 groups based on the screened differentially methylated genes.
Among the CpGs of the 948 DMGs, 15 were in the transcription start site (TSS) area, 432 in the gene body area, 104 in the 5′UTR region, 17 in the 3′UTR region, and 380 in the promoter region. The top 20 DMGs with CpGs sorted according to FDR are presented in Table I.
Table I
The top 20 significant differentially methylated genes (DMGs) with CpGs between bad prognostic group and good prognostic group sorted according to false discovery rate (FDR).
Methylation loci
Chr.
Position
Genes
Location
β-bad
β-good
Effect
Pnominal
FDR
cg03904042
chr20
31719152
NECAB3
Body
0.3413
0.2296
−0.5719
8.05×10−7
2.21×10−5
cg04488521
chr5
178420322
ZNF354C
Promoter
0.3382
0.2272
−0.5737
3.42×10−6
9.37×10−5
cg09595479
chr12
47975479
PRPH
1stExon
0.4229
0.3151
−0.4244
5.31×10−6
1.45×10−4
cg00054702
chr5
135556442
LOC389332
Body
0.1967
0.1149
−0.7755
8.34×10−6
2.29×10−4
cg07040405
chr2
174537121
SP3
Promoter
0.0306
0.0383
0.3253
9.43×10−6
2.59×10−4
cg13146839
chr3
192063116
LOC647309
1stExon
0.4032
0.5159
0.3556
1.12×10−5
3.08×10−4
cg00741900
chr14
101097550
DIO3
5′UTR
0.3288
0.2510
−0.3892
1.15×10−5
3.14×10−4
cg27229100
chr20
47330667
C20orf199
Body
0.4259
0.5299
0.3153
1.44×10−5
3.94×10−4
cg13316171
chr2
187059280
ZC3H15
1stExon
0.0212
0.0273
0.3649
1.56×10−5
4.28×10−4
cg01494348
chr8
144731538
NAPRT1
Promoter
0.1567
0.0991
−0.6612
1.64×10−5
4.50×10−4
cg24960763
chr12
132217389
ZNF10
Promoter
0.0867
0.0578
−0.5848
1.90×10−5
5.22×10−4
cg18113994
chr19
62817679
ZNF134
1stExon
0.0784
0.1796
1.1953
2.04×10−5
5.60×10−4
cg06815419
chr15
81526866
BTBD1
Promoter
0.0215
0.0265
0.3018
2.05×10−5
5.63×10−4
cg25019777
chr4
53220160
USP46
Promoter
0.0432
0.0641
0.5716
2.08×10−5
5.70×10−4
cg00004421
chr17
7415803
SENP3
TSS1500
0.4066
0.4947
0.2828
2.35×10−5
6.44×10−4
cg01331992
chr9
19369118
RPS6
Body
0.2316
0.3035
0.3899
2.95×10−5
8.09×10−4
cg00767496
chr15
43458571
GATM
Body
0.5105
0.4140
−0.3020
3.00×10−5
8.22×10−4
cg21858255
chr12
103133739
TXNRD1
Promoter
0.1570
0.0696
−1.1737
3.15×10−5
8.64×10−4
cg02605461
chr20
32047137
RALY
5′UTR
0.3931
0.4871
0.3095
3.37×10−5
9.25×10−4
cg26658728
chr16
65195711
CMTM3
5′UTR
0.1783
0.1032
−0.7891
3.38×10−5
9.26×10−4
The first column indicates the cg number of the methylation site; the second column indicates the chromosome number of detected methylation site; the third column indicates the chromosome position of the detected methylation site; the fourth column indicates the gene name of methylation area; the fifth column represents the CpGs location in methylation regional; the sixth and seventh columns represent the average methylation level β of CpGs in bad and good prognostic groups; the eighth column represents log2FC (good/bad); the ninth column represents the P-value; and the tenth column represents the corrected FDR value.
In the WGCNA algorithm, we selected the power value (power=6) when the square value of the correlation coefficient reached up to 0.9 for the first time. The topological overlap matrix was then calculated to evaluate the correlation between the methylation levels of two genes, subsequently constructing the system clustering tree. With parameters of minSize=100 and cutHeight=0.99, 15 modules were obtained (Fig. 2A and Table II). The CpGs gene expression levels in the modules were significantly associated with the methylation levels, apart from the grey module (P<0.05). The average of significant correlation coefficients was 0.547, ranging from 0.138 to 0.777. The number of DMGs in each module and the fold enrichment ratio of each module are shown in Fig. 2B and C, respectively. According to the hyper-geometric enrichment algorithm, 2 (brown and turquoise) modules were obtained. There were 208 and 172 DMGs in the brown and turquoise modules, respectively. Functional enrichment analysis revealed these 380 DMGs were significantly related to 24 biological processes, such as vasculogenesis and response to hypoxia (Fig. 3).
Figure 2
Screening of modules related to gene CpG methylation by weighted correlation network analysis (WGCNA). (A) The modules related to gene CpGs methylation. A total of 15 modules were identified, and they were presented by 15 different colors (black, blue, brown, cyan, green, green-yellow, grey, magenta, pink, purple, red, salmon, tan, turquoise and yellow). (B) The number of differentially methylated genes in the brown, blue, black, yellow, turquoise, tan, red, purple, pink, magenta, green-yellow, green, cyan models. (C) Fold enrichment column graph of the brown, blue, black, yellow, turquoise, tan, red, purple, pink, magenta, green-yellow, green, cyan models. The green horizontal dotted line represents a fold enrichment ratio of 1.
Table II
Information of the 15 modules identified by weighted correlation network analysis.
Color
#CpGs
Correlation
Pcorr
DE CpGs
Enrichment fold (95% CI)
Phyper
Black
290
0.7553
5.3×10−29
10
0.2477
1.439×10−7
Blue
468
0.6377
3.33×10−3
29
0.4452
5.483×10−6
Brown
430
0.7665
8.58×10−8
208
3.4742
2.200×10−16
Cyan
110
0.7766
3.93×10−9
2
0.1306
1.104×10−4
Green
334
0.5926
7.60×10−15
16
0.3442
2.647×10−6
Green-yellow
157
0.6143
4.24×10−20
12
0.5491
4.249×10−2
Grey
2,667
0.2203
7.42×10−2
433
1.1662
1.408×10−2
Magenta
261
0.6383
1.38×10−25
9
0.2477
7.327×10−7
Pink
276
0.6787
1.27×10−13
15
0.3904
9.668×10−5
Purple
222
0.6008
4.42×10−3
17
0.5501
1.520×10−2
Red
333
0.6209
6.26×10−15
2
0.0432
5.343×10−16
Salmon
120
0.2454
4.37×10−11
-
-
-
Tan
147
0.1381
5.75×10−10
21
1.0262
9.052×10−1
Turquoise
585
0.5115
1.02×10−15
172
2.112
3.342×10−14
Yellow
410
0.4044
4.85×10−10
2
0.03503
2.200×10−16
The first column represents the module color; the second column represents the number of CpGs genes in module; the third column represents the correlation coefficient between CpGs gene and methylation level; the fourth column represents the P-value; the fifth column represents the number of differentially methylated CpGs gene mapping to each module; the sixth and seventh columns represent Enrichment fold and P-values obtained based on hypergeometric enrichment algorithm. CI, confidence interval.
Figure 3
The significantly enriched biological processes for the differentially methylated genes in brown and turquoise modules. The horizontal axis represents the number of genes involved in the biological process; the vertical axis represents the name of the biological process; the height of the column represents the number of genes involved in biological process; and the color of the column represents P-value. These 380 differentially methylated genes were significantly related to 24 biological processes.
Correlation analysis between the methylation level and expression level of key DMGs
The overall correlation analysis between the methylation levels and expression levels of the 380 DMGs mentioned above revealed that there was a significant negative correlation between the gene methylation and expression level (Cor.=−0.3379, P=2.978e-07; Fig. 4). After searching the published studies on line about the correlation analysis between the methylation levels and expression levels, some correlation coefficients of revealed significant correlations were below 0.5. For example, in the study by Győrffy et al, the significant correlations between the methylation of gene regions and expression levels among prognostic genes in the internal cohort of breast cancerpatients revealed that most of the absolute correlation coefficients were <0.5 (26). Therefore, the results may be not lower than expected. The analysis of the correlation between the methylation and expression levels for each gene revealed that the methylation levels of 132 genes negatively correlated with the expression levels (data not shown).
Figure 4
The overall correlation analysis between methylation levels and expression levels of the 380 differentially methylated genes in brown and turquoise modules. The red line is the trend line of point distribution; Cor. represents the Pearson's correlation coefficient between the methylation levels and expression levels; 'P' represents the significance of the correlation. There was a significant negative correlation between the gene methylation and expression level.
Based on the methylation levels of CpGs in these 132 genes in the TCGA samples and the clinical survival prognosis of samples, we identified 27 methylated genes that were significantly associated with prognosis. Subsequently, 5 optimized prognostic genes [centromere protein V (CENPV), Tubby bipartite transcription factor (TUB), synaptotagmin like 2 (SYTL2), occludin (OCLN) and CAS1 domain containing 1 (CASD1)] were selected through the Cox-PH model (Fig. 5A). The prognosis coefficients of these 5 genes are shown in Fig. 5B. Subsequently, according to the median of the methylated signal value, the samples were divided into the hypomethylation and hypermethylation groups. The KM curves revealed that the samples with hypomethylation levels had a better OS (P<0.05; Fig. 6). Additionally, according to the median of the gene expression level, the samples were divided into the high expression and low expression groups. The KM curves revealed that the samples with high expression levels had a better OS (P<0.05; Fig. 6).
Figure 5
Identification of optimized prognostic genes through the Cox-Proportional Hazards (Cox-PH) model. (A) The lambda parameter curve selected by cross-validation likelihood. The horizontal and vertical axes respectively represent different values of lambda and cross-validation likelihood. (B) The prognosis coefficients of the 5 optimized prognostic genes (CENPV, TUB, SYTL2, OCLN and CASD1) selected through the Cox-PH model. CENPV, centromere protein V; TUB, Tubby bipartite transcription factor; SYTL2, synaptotagmin like 2; OCLN, occluding; CASD1, CAS1 domain containing 1.
Figure 6
The Kaplan-Meier curves for patients with different methylation or expression levels of CASD1, OCLN, SYTL2, TUB and CENPV. According to the median of the methylated signal value, the samples were divided into the hypomethylation and hypermethylation groups (upper panels). The KM curves also revealed that the samples with high expression levels had a better overall survival prognosis (lower panels). CENPV, centromere protein V; TUB, Tubby bipartite transcription factor; SYTL2, synaptotagmin like 2; OCLN, occluding; CASD1, CAS1 domain containing 1.
Prognosis prediction model based on optimized methylated genes
Based on the Cox-PH prognostic coefficients of the 5 genes, the gene-based risk prediction model was constructed as follows: Risk score=(0.2666) × Methylationcg08214689 + (0.2937) × Methylationcg09498146 + (1.5954) × Methylationcg02716252 + (0.6491) × Methylationcg00562641 + (0.3989) × Methylationcg11341528.The risk score of each sample was calculated using the above-mentioned formula. In order to evaluate the discriminant efficacy of the prediction model on the survival prognosis, we divided the samples in the TCGA training dataset into the high- and low-risk groups based on the median of the risk score. The KM curves of the TCGA dataset revealed that the low-risk samples had a better survival prognosis (P=4.698e-03; Fig. 7A). Additionally, the discriminant efficacy of the prediction model was also evaluated in the validation dataset GSE52793, and the results were consistent with those of the TCGA training dataset (P=1.466e-02; Fig. 7B).
Figure 7
The Kaplan-Meier curves for patients based on the risk score prediction model. (A) Kaplan-Meier curves of patients in TCGA training dataset revealed that the low-risk samples had a better survival prognosis. (B) Kaplan-Meier curves of patients in the validation dataset GSE52793 were consistent with those of the TCGA training dataset.
Discussion
In the present study, we analyzed the methylation data of patients with OSCC downloaded from the TCGA database, and selected 5 optimized genes (CENPV, TUB, SYTL2, OCLN and CASD1) associated with methylation through a series of bioinformatics methods. Based on these 5 genes, a prognostic prediction model was constructed and its discriminant efficacy on survival prognosis was confirmed by the validation dataset GSE52793.Among the 5 genes, SYTL2, OCLN and CASD1 have been reported to be associated with tumorigenesis. SYTL2 possesses a C-terminal tandem C2 domain, N-terminal Slp homology domain and Rab-binding region (27), which is primarily characterized as an effector of Rab27, a Ras-related small GTPase (28). SYTL2 can regulate tubulogenesis and cell signaling by promoting the trafficking of the signaling molecule, podocalyxin, to the apical surface through a Rab27-dependent manner (29). Recently, SYTL2 has been reported to be implicated in cancer pathogenesis. Ho et al (30) analyzed 223 Rab effector genes in bladder cancer and found that the Rab27 gene cluster was deregulated in the carcinoma in situ pathway and Ta pathway. Specifically, SYTL2 was downregulated in the two above-mentioned pathways and was associated with differentiation. More recently, SYTL2 was demonstrated to serve as an epigenetically regulated pro-metastatic factor and to be associated with a poorer survival in patients with ovarian cancer (31). These findings suggest that SYTL2 plays critical roles in humancancers, although its role in OSCC has not been reported previously, at least to the best of our knowledge.OCLN encodes an integral membrane protein which plays an important role in cytokine-induced regulation of the tight junction (32). The disruption of the structure of tight junction or occludin is related to cancer invasion and metastasis (33). A previous study reported that OCLN downregulation is a common feature of epithelial-mesenchymal-transition in tumors derived from simple epithelial cells (34). To date, the downregulation of OCLN has been reported in breast cancer, liver tumors, endometrial carcinoma and lung cancer (33,35-37). Rachow et al (32) observed the loss of OCLN in cutaneous squamous cell carcinoma and suggested that OCLN loss was related to cell adhesion, apoptosis and proliferation (32). Importantly, OCLN has been demonstrated to be methylated in the promoter and endogenous region of a breast cancer cell line (38). Therefore, we speculated that OCLN might serve as an important prognostic maker in OSCC.A recent study demonstrated that CASD1 is a sialate O-acetyltransferase (SOAT), a key enzyme in the biosyn-thesis of 9-O-acetylated sialoglycans (39). Evidence has also indicated an enhanced SOAT activity in the lymphoblasts of children with acute lymphoblastic leukemia and SOAT is thus considered a possible biomarker for monitoring acute lymphoblastic leukemia (40). Currently, its role in other solid cancers, particularly OSCC has not been reported, at least to the best of our knowledge. According to the results of this study, we hypothesized that the methylation of CASD1 may be a key predictor for the prognosis of patients with OSCC. Taken together, in consideration of the reports associated with the functions of SYTL2, OCLN and CASD1 in humancancers, it can be concluded that the constructed risk prediction model may have the potential to be used in predicting the survival of patients with OSCC.CENPV encodes a centrosome-associated protein. Presently, there are very few specific studies available on the role of CENPV in humancancers (41). Notably, there are a number of studies available on the other centromere proteins in cancers. It has been reported that the enhanced expression of centromere protein A is implicated in the poor survival of patients with humancancers (42,43). Centromere protein A has been considered as a prognostic and predictive biomarker in humancancers (44). Additionally, the elevated expression of centromere protein F can predict clinical progression and prognosis in patients with prostate cancer (45). Importantly, human centromere protein H has been suggested to be closely linked to the increased or abnormal cell proliferation in human OSCC (46). These findings mentioned above suggest the critical roles of centromere protein in predicting humancancers. CENPV may thus serve as a potential prognostic factor in human OSCC.For the marker gene TUB, it encodes a member of the Tubby family of bipartite transcription factors. The Tubby proteins are a unique protein family sharing a highly conserved C-terminal domain, playing a role in obesity and sensorineural degradation (47). However, the function of TUB in cancers has not yet been reported, at least to the best of our best knowledge. In view of the reliability of the risk prediction models in the present study, we considered that methylated TUB may be a prognostic factor in OSCC.In spite of these findings mentioned above, there were some limitations to our study. Although RNA sequencing is able to identify and quantify transcripts, isoforms, novel transcripts, gene fusions, and non-coding RNAs among a wide range of samples rapidly and accurately, the methylation level and expression level of the 5 predicted genes should be further detected using laboratory experiments, such as PCR. Additionally, the constructed risk prediction model should also be applied to other cohorts of patients with OSCC to further confirm the discriminant efficacy of the prediction model on survival prognosis, even though this prediction model has been verified in a validation dataset GSE52793.In conclusion, the present study identified 5 key methylated genes (CENPV, SYTL2, OCLN, CASD1 and TUB) for OSCC based on the methylation data in the TCGA database. Moreover, the constructed risk prediction model was validated in GSE52793 suggesting that this prediction model may be helpful for predicting the survival prognosis of patients OSCC.
Authors: S L Rosas; W Koch; M G da Costa Carvalho; L Wu; J Califano; W Westra; J Jen; D Sidransky Journal: Cancer Res Date: 2001-02-01 Impact factor: 12.701
Authors: Davide B Gissi; Viscardo P Fabbri; Andrea Gabusi; Jacopo Lenzi; Luca Morandi; Sofia Melotti; Sofia Asioli; Achille Tarsitano; Tiziana Balbi; Claudio Marchetti; Lucio Montebugnoli Journal: Int J Mol Sci Date: 2020-09-14 Impact factor: 5.923