Jun Han1, Meijun Chen2, Qingxiao Fang3, Yanqing Zhang4, Yihan Wang5, Jamaspishvili Esma2, Hong Qiao6. 1. Department of Endocrinology and Metabolism, The Second Affiliated Hospital, Harbin Medical University, The Fourth Affiliated Hospital, Harbin Medical University, Harbin 150001, China. 2. Department of Endocrinology and Metabolism, The Second Affiliated Hospital, Harbin Medical University, Harbin 150001, China. 3. Surgical Oncology, The Second Affiliated Hospital, Harbin Medical University, Harbin 150001, China. 4. Hematological Department, The Second Affiliated Hospital, Harbin Medical University, Harbin 150001, China. 5. College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150001, China. 6. Department of Endocrinology and Metabolism, The Second Affiliated Hospital, Harbin Medical University, Harbin 150001, China. Electronic address: qiaoh0823@sina.com.
Abstract
Papillary thyroid carcinoma (PTC) is the most common malignant tumor of endocrine systems. Chromosomal instability (CIN) is crucial to the clinical prognoses of tumor patients. DNA methylation plays an important role in the regulation of gene expression and CIN. Based on PTC samples from The Cancer Genome Atlas database, we used multiple regression analyses to identify methylation patterns of CpG sites with the strongest correlation with gene expression. A total of 4,997 genes were obtained through combining the CpG sites, which were represented as featured DNA methylation patterns. In order to identify CIN-related epigenetic markers of PTC survival, we developed a method to characterize CIN based on DNA methylation patterns of genes using the Student's t statistics. We found that 1,239 genes were highly associated with CIN. With the use of the log-rank test, univariate Cox regression analyses, and the Kaplan-Meier method, DNA methylation patterns of UBAC2 and ELOVL2, highly correlated with CIN, provided potential prognostic values for PTC. The higher these two genes, risk scores were correlated with worse PTC patient prognoses. Moreover, the ELOVL2 risk score was significantly different in the four stages of PTC, suggesting that it was related to the progress of PTC. The DNA methylation pattern associated with CIN may therefore be a good predictor of PTC survival.
Papillary thyroid carcinoma (PTC) is the most common malignant tumor of endocrine systems. Chromosomal instability (CIN) is crucial to the clinical prognoses of tumorpatients. DNA methylation plays an important role in the regulation of gene expression and CIN. Based on PTC samples from The Cancer Genome Atlas database, we used multiple regression analyses to identify methylation patterns of CpG sites with the strongest correlation with gene expression. A total of 4,997 genes were obtained through combining the CpG sites, which were represented as featured DNA methylation patterns. In order to identify CIN-related epigenetic markers of PTC survival, we developed a method to characterize CIN based on DNA methylation patterns of genes using the Student's t statistics. We found that 1,239 genes were highly associated with CIN. With the use of the log-rank test, univariate Cox regression analyses, and the Kaplan-Meier method, DNA methylation patterns of UBAC2 and ELOVL2, highly correlated with CIN, provided potential prognostic values for PTC. The higher these two genes, risk scores were correlated with worse PTCpatient prognoses. Moreover, the ELOVL2 risk score was significantly different in the four stages of PTC, suggesting that it was related to the progress of PTC. The DNA methylation pattern associated with CIN may therefore be a good predictor of PTC survival.
Papillary thyroid carcinoma (PTC) is derived from the thyroid follicular epithelium. PTC is the most common type of endocrine cancer, and its incidence has increased rapidly over the past several decades. It accounts for 85% of thyroid cancer, 60% of adult thyroid cancer, and 100% of childthyroid cancer. The vast majority of patients are diagnosed with differentiated thyroid carcinoma, especially with PTC. This causes difficulty in planning the therapy, because some patients are overtreated, whereas in other patients, the same therapy does not result in the eradication of the neoplastic foci and inhibition of the natural course of the disease. PTCs are usually curable with a 5-year survival of over 95%; however, occasionally, they dedifferentiate into more aggressive and lethal thyroid cancers. For this reason, it is important to identify effective prognostic markers to evaluate the prognoses of PTCpatients.Commonly used prognostic markers presently include proteins, microRNAs (miRNAs), mRNAs, and DNA methylations. Ma and Yu suggested that TBL1XR1 overexpression was an unfavorable prognostic factor for epithelial ovarian cancer, and Wang et al. suggested that DHX32 overexpression was an unfavorable prognostic biomarker for breast cancer. The signature of chromosomal instability (CIN), inferred from gene-expression levels, can predict clinical outcomes in multiple humancancers. CIN describes a dynamic state in which cells continuously gain or lose whole chromosomes or parts of chromosomes at an elevated rate and is therefore a principal mediator of aneuploidy and intra-tumor heterogeneity.10, 11, 12, 13 Because aneuploidy is a consequence of CIN, genes with expression levels are consistently associated with aneuploidy, so gene-expression signatures provide a means to estimate levels of CIN. Carter et al. developed a computational method to characterize CIN based on gene-expression levels using the Student’s t statistics. They mapped the genes to chromosomal sub-bands, with CIN describing the net deviation in expression of genes contained in each chromosomal region relative to the remainder of the sampled transcriptome. Patients with a higher CIN score had worse clinical prognoses. They suggested that gene-expression signatures that had high correlations with CIN could therefore predict the clinical prognoses of tumorpatients.Genomic DNA hypomethylation is another important factor associated with CIN.14, 15, 16, 17 Methylation of the carbon-5 position of cytosine, mostly in the context of CpG dinucleotides, is the main epigenetic modification of DNA and is essential for a properly functioning genome, including maintenance of chromosome stability and transcriptional repression.18, 19, 20, 21 Recently, DNA methylation biomarkers for the diagnoses, molecular typing, and prognoses of cancers were identified.22, 23, 24 Lu et al. suggested that hypermethylation of hMLH1 in PTC was significantly correlated with age, size, and the number of primary lesions, local invasion, T stage, and lymph node metastases. Shou et al. reported that aberrant methylation of the RASSF1A promoter was more frequently detected in thyroid cancer than in noncancerous controls. Wang et al. reported that hypermethylation of RUNX3 significantly increased the risk of PTC recurrence by using appropriate site-specific cut-off values.Genomic DNA hypomethylation has been associated with increased CIN, which plays a central role in tumorigenesis.14, 15, 16, 17 Kawano et al. suggested that whole genome hypomethylation initiated carcinogenesis of esophageal squamous cells through CIN. Nishida et al. concluded that DNA hypomethylation is an important cause of CIN in the earliest phase of humanhepatocellular carcinoma, especially in the background of noncirrhotic livers. Rodriguez et al. reported that CIN was correlated with genome-wide DNA demethylation in human primary colorectal cancers, and Suzuki et al. reported that global DNA demethylation in gastrointestinal cancer was correlated with increased genomic damage. However, few reports have shown that hypermethylation is associated with CIN.Gene-expression levels can be affected by a number of factors, including the environment, gene mutations, and DNA methylations.25, 26, 27, 28 Hypermethylated promoters lead to an “off” state of expression, whereas less methylation may lead to an “on” state. Methylation is an acquired epigenetic phenomenon but can be faithfully reproduced in the progeny of affected cells, and the methylation will then be propagated during clonal selection during the development of tumors. DNA methylations are therefore more stable than gene expressions. Although several methylation biomarkers have been identified to predict cancer survival, they are usually limited to average methylation levels of several genes based on experimental data. However, there is a weak correlation between the average DNA methylation levels of gene promoters and the levels of gene expression. This report prompted us to speculate that methylated CpGs might not have equivalent regulatory effects on gene expression, which results in the maximum regulatory effect of DNA methylation on gene expression. We then identified the DNA methylation patterns that had high correlations with CIN as prognostic markers of PTC.In the following study, based on The Cancer Genome Atlas (TCGA) database, we identified differentially methylated CpG sites between PTC and normal samples. Multiple regression analyses were then used to obtain the methylation patterns of CpGs with the highest correlations with gene expression. We obtained specified genes by combining CpG sites, which were represented as specific DNA methylation patterns. In order to identify CIN-related epigenetic markers of PTC survival, a method was developed to characterize CIN based on the DNA methylation patterns of genes using the Student’s t statistics. Pearson’s correlation coefficient (PCC) was used to evaluate the correlations between DNA methylation patterns and the CIN of each gene. With the use of PCC and a permutation test, we verified that the featured genes were highly associated with CIN. With the use of the log-rank test, univariate Cox regression analyses, and the Kaplan-Meier method, we conducted prognostic analyses. The DNA methylation patterns of UBAC2 and ELOVL2 that had high correlations with CIN provided good prognostic values for PTC. Moreover, UBAC2 and ELOVL2 were hypomethylation phenotypes. The DNA methylation patterns associated with CIN may therefore be a good predictor of PTC survival.
Results
Identifying Differentially Methylated CpGs Associated with Gene Expression
With the use of the Illumina Infinium HumanMethylation450 BeadChip assay (Illumina, San Diego, CA), raw data (level 3 data), raw UNC RNAseqV2 level 3 expression data, and the clinical prognostic information for PTC were collected from TCGA. The DNA methylation data and the gene-expression data both contained 562 samples, comprised of 49 matched normal samples, 494 PTC samples, 11 metastatic thyroid carcinoma samples, and eight samples of other types of thyroid cancers. We eliminated batch effects between these 562 samples. In total, 49 PTC samples and 49 matched normal samples comprised the training set, with the remaining 445 PTC samples used as the testing set. Three samples were excluded because they did not contain survival information. The final study included 442 PTCpatients in the testing set (Table 1). Eleven metastatic thyroid carcinoma samples and eight samples of other types of thyroid cancers were excluded.
Table 1
Clinical Information on PTC Patients
Characteristics
Training Set (n = 49)
Testing Set (n = 442)
State
Living
45
431
Dead
4
11
Survival (years)
Mean ± SD
4.22 ± 3.52
2.46 ± 2.37
Gender
Male
14
117
Female
35
325
Age (years)
Mean ± SD
45.29 ± 17.22
47.57 ± 15.72
Stage
I
30
246
II
5
47
III
11
99
IV
3
50
Histological Type
Thyroid papillary carcinoma
Classical/usual
42
313
Tall cell (≥50% tall cell features)
3
33
Follicular (≥99% follicular patterned)
4
96
Clinical Information on PTCPatientsAll of the CpG sites were from the Illumina Infinium HumanMethylation450 BeadChip assay. Raw data (level 3 data) contained all CpG sites in the gene sequence and all CpG sites in the promoter of the analyzed gene. For a specific CpG site, we calculated the correlations with expression of the nearest gene. In total, 203,015 differentially methylated CpG sites were identified from the training set, and 7,541 differentially methylated CpG sites were significantly related to gene expression (false discovery rate [FDR]-corrected p value < 0.05) and included 4,997 genes. There were 3,673 hypermethylated sites and 3,868 hypomethylated sites. More than 50% of the hypomethylated sites were negatively related to gene expression (Figure S1A), and more than 50% of the hypermethylated sites were negatively related to gene expression (Figure S1B). A total of 2,035 genes with the proportion of the hypomethylated sites associated with gene expression were greater than 9.60%, accounting for more than 50%, and the expression of seven genes was influenced by all of their hypomethylated sites (Figure S1C). A total of 2,082 genes with the proportion of the hypermethylated sites associated with gene expression were greater than 9.22%, accounting for more than 50%, and the expression of three genes was influenced by all of their hypermethylated sites (Figure S1D). The hypomethylated sites showed significant regulatory effects on gene expression (Figure S1E), and the hypermethylated sites also showed significant regulatory effects on gene expression (Figure S1F).
Identification of the Genes Related to CIN
The methylation pattern score (score value) was used to describe the maximal regulatory effect of DNA methylation on gene expression. The distribution of the score value was consistent with the normal distribution, which was the same as the gene expression (Figure S2). We therefore characterized the CIN based on the score value using Student’s t statistics. The f was the net deviation in the score value contained in each chromosomal region relative to the remainder of the sampled score value. The results of clustering analyses showed that most of the bands had CIN (Figure 1A). As a measure of overall CIN, the MFA of a sample was defined as the sum of the magnitudes of its f features. In total, the DNA methylation patterns of 1,239 genes from preselected 4,997 genes were significantly related to the MFA (Table S1), and 53% of them showed a positive correlation, indicating the higher the gene score value, the higher the CIN (Figure 1B). A total of 572 of them were hypermethylation phenotypes (Table S2), and 667 of them were hypomethylation phenotypes (Table S3).
Figure 1
Analyses of Papillary Thyroid Carcinoma Data
(A) Two-way hierarchical clustering of the f value in papillary thyroid carcinoma tissue samples. The blue area was >0, and the red area was <0. (B) In total, DNA methylation patterns of 1,239 genes were highly related to chromosomal instability (CIN), and 653 (53% of the total) were positively correlated genes. In total, 586 (47% of the total) were negatively correlated genes. (C) Gene ontology (GO) functional enrichment analyses for DNA methylation patterns of 1,239 genes were highly related to CIN using DAVID. (D) The Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses for DNA methylation patterns of 1,239 genes were highly related to CIN using DAVID. (E) Two-way hierarchical clustering of DNA methylation patterns of 1,239 genes were highly related to CIN in papillary thyroid carcinoma tissue samples and normal tissue samples. The blue area was >0, and the red area was <0.
Analyses of Papillary Thyroid Carcinoma Data(A) Two-way hierarchical clustering of the f value in papillary thyroid carcinoma tissue samples. The blue area was >0, and the red area was <0. (B) In total, DNA methylation patterns of 1,239 genes were highly related to chromosomal instability (CIN), and 653 (53% of the total) were positively correlated genes. In total, 586 (47% of the total) were negatively correlated genes. (C) Gene ontology (GO) functional enrichment analyses for DNA methylation patterns of 1,239 genes were highly related to CIN using DAVID. (D) The Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses for DNA methylation patterns of 1,239 genes were highly related to CIN using DAVID. (E) Two-way hierarchical clustering of DNA methylation patterns of 1,239 genes were highly related to CIN in papillary thyroid carcinoma tissue samples and normal tissue samples. The blue area was >0, and the red area was <0.Gene ontology (GO) functional enrichment analyses (Figure 1C) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses (Figure 1D) showed that the 1,239 genes were significantly related to organ development and were enriched in pathways that were related to cancer. The DNA methylation patterns of these genes were significantly related to the MFA and significantly differentiated between normal samples and cancer samples (Figure 1E).
Differentially Methylated Markers Associated with PTC Clinical Prognoses
We used the log-rank test to analyze the DNA methylation patterns of 1,239 genes that had high correlations with CIN, with 28 genes being left. The DNA methylation patterns of thirteen genes that had high correlations with CIN were significant (p < 0.05) using univariate Cox regression analyses. The hazard ratios (HRs) and 95% confidence intervals (CIs) of the clinical parameters for mortality were calculated using univariate Cox proportional hazard model analyses. The Kaplan-Meier method was used to estimate the overall survival times of patients. The DNA methylation patterns of four genes, TMEM18, UBAC2, ELOVL2, and ALMS1P, which had high correlations with CIN, were finally identified as prognosis markers of PTC. We developed four distinct risk scores, each of them based on the methylation pattern of one of four genes. The risk score formula for each patient was calculated as follows: risk score = (−20.05 × TMEM18), (−41.68 × UBAC2), (−6.20807 × ELOVL2), and (21.25 × ALMS1P). If the regression coefficient estimated by the univariate Cox proportional hazards model > 0, then the hyper-methylation of the risk gene was bad for the survival time. We subdivided the PTCpatients into high-risk and low-risk groups by using the median of the risk scores. The HRs (95% CI) of TMEM18, UBAC2, ELOVL2, and ALMS1P were 1.951 × 10−9 (2.351e−17-0.1619), 7.909 × 10−19 (3.658e−33-0.000171), 0.002013 (5.646e−06-0.7177), and 1.7 × 109 (1.075-2.687e+18), respectively (Table 2).
Table 2
Prognosis Markers of PTC Patients
Variable
HR (95% CI)
Regression Coefficient
p Value
TMEM18
1.951 × 10-9(2.351e−17-0.1619)
−20.05
0.0311
UBAC2
7.909 × 10-19(3.658e−33-0.000171)
−41.68
0.0133
ELOVL2
0.002013(5.646e−06-0.7177)
−6.208070
0.0384
ALMS1P
1.7 × 109(1.075-2.687e+18)
21.25
0.0492
Prognosis Markers of PTCPatientsThe DNA methylation pattern of UBAC2 that had a high correlation with CIN significantly predicted the survival of PTCpatients in the training set (Figure 2A). The 5-year survival percentage of the high-risk score patients was 68.2% ± 13.6% and was less than that of the low-risk score patients (100%, p = 0.038). From the low-risk group to the high-risk group, the methylation levels of the UBAC2cg16941122 site showed a significant upward trend and had a strong linear relationship with the risk index, although the expression levels of UBAC2 and the average methylation levels had no obvious trend (Figure 2B). Furthermore, a higher UBAC2 risk score was associated with a worse PTCpatient prognosis.
Figure 2
The UBAC2 Risk Score Model Predicts Overall Survival of Papillary Thyroid Carcinoma Patients in the Training Dataset
(A) Kaplan-Meier analyses for overall survival of patients with high-risk or low-risk scores. The p value was calculated using the two-sided log-rank test. (B) DNA methylation pattern, expression levels, and average methylation levels of the prognostic UBAC2 that correlated with patients’ survival status and increased risk scores.
The UBAC2 Risk Score Model Predicts Overall Survival of Papillary Thyroid CarcinomaPatients in the Training Dataset(A) Kaplan-Meier analyses for overall survival of patients with high-risk or low-risk scores. The p value was calculated using the two-sided log-rank test. (B) DNA methylation pattern, expression levels, and average methylation levels of the prognostic UBAC2 that correlated with patients’ survival status and increased risk scores.The DNA methylation pattern of TMEM18 that had a high correlation with the CIN significantly predicted the survival of PTCpatients in the training set. The 5-year survival percentage of the high-risk score patients was 59.8% ± 16.3% and was significantly lower than that of the low-risk score patients (100%, p = 0.01) (Figure S3). The DNA methylation pattern of ALMS1P that had a high correlation with CIN significantly predicted the survival of PTCpatients in the training set. The 5-year survival percentage of the high-risk score patients was 67.6% ± 14.2% and was significantly lower than that of the low-risk score patients (100%, p = 0.041) (Figure S4). Therefore, a higher gene risk score predicted a worse PTCpatient prognosis.The DNA methylation pattern of ELOVL2 that had a high correlation with CIN significantly predicted the survival of PTCpatients in the training set (Figure 3A). The 5-year survival percentage of the high-risk score patients was 65.8% ± 14.6% and was significantly lower than that of the low-risk score group (100%, p = 0.029). From the low-risk group to the high-risk group, the methylation levels of the ELOVL2cg24724428 site showed a significant upward trend and had a strong linear relationship with the risk index, although the expression levels of ELOVL2 and the average methylation levels had no obvious trend (Figure 3B). Moreover, the ELOVL2 risk score was significantly different in the four stages of PTC (Kruskal-Wallis test, p = 0.001527), suggesting that the DNA methylation pattern of ELOVL2 that had a high correlation with CIN was related to the progress of PTC (Figure 3C). A higher ELOVL2 risk score correlated with a worse PTCpatient prognosis, further suggesting that the ELOVL2 score value that had a high correlation with CIN significantly influenced the patient’s clinical condition, progression of the disease, and survival time.
Figure 3
The ELOVL2 Risk Score Model Predicts Overall Survival of Papillary Thyroid Carcinoma Patients in the Training Dataset
(A) Kaplan-Meier analyses for overall survival of patients with high-risk or low-risk scores. The p value was calculated using the two-sided log-rank test. (B) DNA methylation pattern, expression levels, and average methylation levels of the prognostic ELOVL2 that correlated with patients’ survival status and increased risk scores. (C) Box plot of the ELOVL2 risk scores in the four stages of papillary thyroid carcinoma.
The ELOVL2 Risk Score Model Predicts Overall Survival of Papillary Thyroid CarcinomaPatients in the Training Dataset(A) Kaplan-Meier analyses for overall survival of patients with high-risk or low-risk scores. The p value was calculated using the two-sided log-rank test. (B) DNA methylation pattern, expression levels, and average methylation levels of the prognostic ELOVL2 that correlated with patients’ survival status and increased risk scores. (C) Box plot of the ELOVL2 risk scores in the four stages of papillary thyroid carcinoma.We performed a time-dependent receiver-operating characteristic (ROC) curve analysis to compare the sensitivity and specificity for survival predictions among the DNA methylation patterns of these four genes. The area under the ROC curve (AUC) value was obtained from the ROC analyses and was compared among the DNA methylation patterns of these four genes. The AUC values of TMEM18, UBAC2, ELOVL2, and ALMS1P were 0.95, 0.886, 0.764, and 0.854, respectively (Figure 4).
Figure 4
Receiver-Operated Characteristic (ROC) Analyses of the Sensitivity and Specificity for Survival Prediction among the DNA Methylation Patterns of the Four Genes
The time-dependent ROC curve was used to evaluate the prognostic performance for survival predictions. The performance comparison was assessed among the four genes by calculating the area under the ROC curves (AUC) in the training dataset.
Receiver-Operated Characteristic (ROC) Analyses of the Sensitivity and Specificity for Survival Prediction among the DNA Methylation Patterns of the Four GenesThe time-dependent ROC curve was used to evaluate the prognostic performance for survival predictions. The performance comparison was assessed among the four genes by calculating the area under the ROC curves (AUC) in the training dataset.
Verification of the Testing Set
The testing set was used to evaluate the reproducibility and availability of these four genes in a prognostic model. The DNA methylation patterns of ELOVL2 and UBAC2 were obtained using independent cancer samples that not only associated with CIN but could also be used to predict the prognosis of PTC. Moreover, UBAC2 and ELOVL2 were hypomethylation phenotypes. The DNA methylation pattern of UBAC2 that had a high correlation with CIN predicted the survival of PTCpatients in the testing set (Figure 5A). The 5-year survival percentage of the high-risk score patients was 83.6% ± 6.2%, which was significantly lower than that of the low-risk score group (97.1% ± 2%, p = 0.024). From the low-risk group to the high-risk group, the methylation levels of the UBAC2cg16941122 site showed a significant upward trend and had a strong linear relationship with the risk index, although the expression levels of UBAC2 and the average methylation levels had no obvious trend (Figure 5B). The samples from the testing set were subgrouped based on the tumor stage, and the survival times of patients from the high-risk score group were significantly different from that of the low-risk score group in stage III of PTC (p = 0.018) (Figure 5C). The survival times of patients from the high-risk score group were the same as those from the low-risk score group in stages I, II, and IV of PTC (p = 0.39, 0.355, 0.137, respectively). We subgrouped the testing samples based on histological type, and the survival times of the patients in the high-risk score group were significantly different from those of the low-risk score group in the thyroid papillary carcinoma classical/usual group (p = 0.009) (Figure 5D). The thyroid papillary carcinoma tall cell (≥50% tall cell features) histological group contained 33 samples, and there were no samples from patients who had died. Therefore, we could not perform survival analyses. The survival times of the patients in the high-risk score group were the same as those from the low-risk score group in the thyroid papillary carcinoma-follicular (≥99% follicular patterned) histological group (p = 0.366). A higher UBAC2 risk score predicted a worse PTCpatient prognosis.
Figure 5
The UBAC2 Risk Score Model Predicts Overall Survival of Papillary Thyroid Carcinoma Patients in the Testing Dataset
(A) Kaplan-Meier analyses for overall survival of patients with high-risk or low-risk scores. The p value was calculated using the two-sided log-rank test. (B) DNA methylation pattern, expression level, and average methylation level of the prognostic UBAC2 that correlated with patients’ survival status and increased risk scores. (C) Kaplan-Meier analyses for overall survival of patients in the third stage of papillary thyroid carcinoma. The p value was calculated using the two-sided log-rank test. (D) Kaplan-Meier analyses for overall survival of patients with thyroid papillary carcinoma-classical/usual histological types. The p value was calculated using the two-sided log-rank test.
The UBAC2 Risk Score Model Predicts Overall Survival of Papillary Thyroid CarcinomaPatients in the Testing Dataset(A) Kaplan-Meier analyses for overall survival of patients with high-risk or low-risk scores. The p value was calculated using the two-sided log-rank test. (B) DNA methylation pattern, expression level, and average methylation level of the prognostic UBAC2 that correlated with patients’ survival status and increased risk scores. (C) Kaplan-Meier analyses for overall survival of patients in the third stage of papillary thyroid carcinoma. The p value was calculated using the two-sided log-rank test. (D) Kaplan-Meier analyses for overall survival of patients with thyroid papillary carcinoma-classical/usual histological types. The p value was calculated using the two-sided log-rank test.The DNA methylation pattern of ELOVL2 that had a high correlation with the CIN significantly predicted the survival of PTCpatients in the testing set (Figure 6A). The 5-year survival percentage of the high-risk score patients was 83.4% ± 6.1% and was significantly lower than that of the low-risk score group (97.1% ± 2.4%; p = 0.026). From the low-risk group to the high-risk group, the methylation levels of the ELOVL2cg24724428 site showed a significant upward trend and had a strong linear relationship with the risk index, although the expression levels of ELOVL2 and the average methylation levels had no obvious trend (Figure 6B). The samples from the testing set were subgrouped based on the tumor stage, and the survival times of patients from the high-risk score group were the same as those of the low-risk score group in stages I, II, III, and IV of PTC (p = 0.324, 0.206, 0.19, and 0.05, respectively). We subgrouped the testing samples based on histological type, and the survival times of the patients in the high-risk score group were the same as those from the low-risk score group in the thyroid papillary carcinoma classical/usual group (p = 0.078). The survival times of the patients in the high-risk score group were the same as those of the low-risk score group in the thyroid papillary carcinoma-follicular (≥99% follicular pattern) histological group (p = 0.317). Moreover, the ELOVL2 risk score was significantly different in the four stages of PTC (Kruskal-Wallis test; p = 9.915 × 10−13) (Figure 6C), suggesting that the DNA methylation pattern of ELOVL2 that had a high correlation with CIN was related to the progress of PTC. A higher ELOVL2 risk score was correlated with a worse PTCpatient prognosis, indicating that the ELOVL2 score value that had a high correlation with CIN significantly influenced the patient’s clinical condition, progression of disease, and survival time. The AUC values of ELOVL2 and UBAC2 were 0.849 and 0.556, respectively (Figure 7).
Figure 6
The ELOVL2 Risk Score Model Predicts Overall Survival of Papillary Thyroid Carcinoma Patients in the Testing Dataset
(A) Kaplan-Meier analyses for overall survival of patients with high-risk or low-risk scores. The P value was calculated using the two-sided log-rank test. (B) DNA methylation pattern, expression level, and average methylation level of the prognostic ELOVL2 that correlated with the patients’ survival status and increased risk scores. (C) Box plot of the ELOVL2 risk score in the four stages of papillary thyroid carcinoma.
Figure 7
Receiver-Operated Characteristic (ROC) Analyses of the Sensitivity and Specificity for Survival Prediction between the DNA Methylation Patterns of the Two Genes
The time-dependent ROC curve was used to evaluate the prognostic performance for survival predictions. The performance comparison was assessed between the two genes by calculating the area under the ROC curves (AUC) in the testing dataset.
The ELOVL2 Risk Score Model Predicts Overall Survival of Papillary Thyroid CarcinomaPatients in the Testing Dataset(A) Kaplan-Meier analyses for overall survival of patients with high-risk or low-risk scores. The P value was calculated using the two-sided log-rank test. (B) DNA methylation pattern, expression level, and average methylation level of the prognostic ELOVL2 that correlated with the patients’ survival status and increased risk scores. (C) Box plot of the ELOVL2 risk score in the four stages of papillary thyroid carcinoma.Receiver-Operated Characteristic (ROC) Analyses of the Sensitivity and Specificity for Survival Prediction between the DNA Methylation Patterns of the Two GenesThe time-dependent ROC curve was used to evaluate the prognostic performance for survival predictions. The performance comparison was assessed between the two genes by calculating the area under the ROC curves (AUC) in the testing dataset.
Discussion
To predict PTCpatient clinical prognoses, tumor node metastasis (TNM) staging; patient age, histologic grade of the tumor, tumor extent (extrathyroidal invasion or distant metastases), and size of the primary tumor (AGES) scoring; patient age, presence of distant metastases, extent and size of the primary tumor (AMES) scoring; and metastasis, patient age, completeness of resection, local invasion, and tumor size (MACIS) scoring have been used.32, 33, 34, 35 However, patients with similar clinical phenotypes do not have identical prognoses, suggesting that the present PTC prognostic evaluation system does not provide an accurate clinical prognosis for every patient.36, 37, 38, 39 The PTC prognostic evaluation system therefore needs improvement. The accuracy of PTC prognoses could be significantly improved by the use of molecular markers. Cancerpatients with a higher CIN have a worse clinical prognosis, so CIN could be used to evaluate the clinical prognoses of tumorpatients.40, 41, 42, 43 To improve the existing PTC prognostic evaluation system, it is important for PTCpatient treatment to identify reliable CIN-related prognostic markers. Although CIN-related prognostic markers have been previously reported, the results differed. Therefore, more valid CIN-related prognostic markers are needed to improve the accuracy and credibility of the prognoses.DNA methylation plays an important role in the regulation of gene expression and CIN. Based on PTC samples from TCGA database, we used the R Significance Analysis of Microarrays (SAM) package to identify 203,015 differentially methylated CpG sites between PTC and normal samples. Then, we used multiple regression analyses to obtain 7,541 methylation patterns of CpG sites with the strongest correlation with gene expressions. A total of 4,997 genes were obtained by combining the CpG sites, which were represented as featured DNA methylation patterns. The results showed that the distribution of DNA methylation patterns was consistent with the normal distribution, which was the same as gene expression. We subsequently developed a method to characterize CIN based on DNA methylation patterns of genes using the Student’s t statistics. PCC was used to evaluate the correlation between the DNA methylation patterns and the CIN of each gene. We found that 1,239 genes were highly associated with CIN. With the use of the log-rank test, univariate Cox regression analyses, and the Kaplan-Meier method, DNA methylation patterns of four genes, including TMEM18, UBAC2, ELOVL2, and ALMS1P, which had high correlations with CIN, provided good prognostic values for PTC. An independent test set was used to test the validity of the methylation risk score of the four genes. Finally, ELOVL2 and UBAC2 remained. In addition, the DNA methylation pattern of ELOVL2 was involved in different stages of PTC, indicating that the DNA methylation pattern of ELOVL2 with a high correlation with CIN significantly influenced the patient’s clinical condition, progression of disease, and survival time. The DNA methylation pattern associated with CIN may therefore be a good predictor of PTC survival.Previous studies of ELOVL2 and UBAC2 emphasized their relationships with lipid metabolism and obesity.45, 46, 47, 48 González-Bengtsson et al. suggested that ELOVL2 played an important role in docosahexaenoic acid (DHA) synthesis. Kobayashi et al. reported that cells overexpressing ELOVL2 showed enhanced triacylglycerol synthesis and subsequent accumulation of lipid droplets. Pauter et al. suggested that hepatic DHA synthesis of ELOVL2, in addition to controlling de novo lipogenesis, also regulated lipid storage and fat mass expansion in an SREBP-1c-independent fashion. Tikhonenko et al. reported that a decrease in long-chain polyunsaturated fatty acids was associated with a decrease in the fatty acid elongases, ELOVL2 and ELOVL4, in diabetes, and additional studies showed that obesity increased the risk of thyroid cancer.49, 50, 51, 52 Han et al. reported that the morbidity of thyroid cancer in female patients was related to a high BMI. Hwang et al. suggested that weight gain and annual increases in obesity indicators in middle-aged adults increased the risk of developing PTC. Kim et al. reported that a higher BMI was associated with more aggressive tumor features, such as lymph node metastasis, lymphatic invasion, and tumor multiplicity in PTCpatients. Oberman et al. reported that obesity was significantly associated with thyroid cancer, with BMI, in particular, a strong predictor of thyroid cancer. ELOVL2 was also significantly related to biosynthesis of unsaturated fatty acids, and UBAC2 was significantly related to protein localization to the endoplasmic reticulum. Therefore, DNA methylation patterns of ELOVL2 and UBAC2 were not only associated with CIN, but might also participate in the initiation and development of PTC.Because of differences among individual patients, presently used prognostic indicators cannot accurately predict the prognosis of each patient. It is therefore difficult to evaluate the clinical prognoses of patients with similar clinical features. Our results showed that the DNA methylation patterns of ELOVL2 and UBAC2, which had high correlations with CIN, could be used to predict the prognosis of PTC. Aberrant DNA methylation was related to the risk of PTC, and the epigenetic markers associated with CIN may be used as predictors of PTC survival. The inclusion of these prognosis markers into the present PTC prognostic evaluation system could therefore assist the clinician in determining the prognoses of patients with similar clinical features and could provide a more appropriate therapeutic schedule for high-risk patients to enhance the efficacies of PTC treatments.
Conclusions
UBAC2 and ELOVL2, which had high correlations with CIN, provided good prognostic values for PTC. The DNA methylation pattern associated with CIN may therefore be a good predictor of PTC survival.
Materials and Methods
Acquisition of Gene Expression and DNA Methylation Data
With the use of the Illumina Infinium HumanMethylation450 BeadChip assay, raw data (level 3 data), raw UNC RNAseqV2 level 3 expression data, and clinical prognostic information for PTC were collected from TCGA (https://www.cancer.gov/tcga/).
Eliminating Batch Effects
In order to ensure the accuracy of experiments, we used the R Surrogate Variable Analysis (Bioconductor) package to eliminate batch effects of all of the DNA methylation data and the gene-expression data from all samples.
Identifying Differentially Methylated CpGs
To compare the differences of DNA methylation between cancer and normal samples in the training set, we used the R SAM package to identify differentially methylated CpG sites. To control for a FDR of the results, we used the Benjamini-Hochberg method to correct the p value obtained from the statistical test. The threshold for defining the differentially methylated CpG site involved a value of p < 0.05, and the differential level (delta beta value) between the cancer and normal samples was >0.1.For the differentially methylated CpG sites of the PTC samples from the training set, we used multiple regression coefficients to evaluate the correlation between DNA methylation and gene expression. The dependent variable was gene expression of a single gene. The independent variables were all CpG sites mapped to the gene, containing all CpG sites under the gene sequence and all CpG sites in the promoter of the analyzed gene. For a particular CpG site, we calculated correlations with the expression of the nearest gene. A value of p < 0.05 was identified as a significant methylation level highly related to gene expression.
Quantitation of the Regulatory Effect of DNA Methylation (Score Value)
With the consideration of the multiple CpGs mapped to the gene and the variability of the DNA methylation levels of multiple CpGs located in the same gene, the average methylation level may not reflect the real ability of DNA methylation to regulate gene expression. Multiple regression analyses were therefore used to quantify the regulatory competence of differential CpG sites and then to quantify the maximal regulatory effect of DNA methylation on gene expression. The methylation pattern score (score value) was defined as follows:The j CpGs sites represented the significant methylation sites in multiple regression analyses (p < 0.05), αˆj (j = 1,2,…,j) represented the multiple regression coefficient of the jth CpG sites of the gene, and cgkj represented the methylation level of the jth CpG sites of the gene in the kth sample.
Calculating the CIN Score (MFA)
This study downloaded the cytoband coordinate file (GRCh37/hg19) from the UCSC Genome Bioinformatics and then mapped the genes to chromosomal sub-bands. If fewer than five genes were present in a given cytoband, we considered the statistical measure unreliable, and that cytoband was eliminated from further analysis. The f described the net deviation in score value contained in each chromosomal region relative to the remainder of the sampled score value:The μBi represented the average score value of all of the genes under sample i in the band, represented the variance of the score value of all the genes under sample i in the band, μGi represented the average score value of the rest of the genes under sample i, represented the variance of the score value of the rest of the genes under sample i, N represented the number of genes in the band, and N represented the number of the remaining genes.As a measure of overall CIN, we defined the MFA of a sample as the sum of the magnitudes of its f features:
Identification of Genes Related to MFA
PCC was used to evaluate the correlation between the score value and the MFA of each gene. In order to control for a FDR of the PCC, we adopted a permutation test to correct the p value of the statistical test. For each gene, the MFA was permutated 1,000 times to calculate its PCC value, and if the value of p was <0.05, this gene was identified as highly related to MFA as follows:where c was the number of PCC square values that were no less than the actual PCC square value of the gene from the 1,000 permutations.
Enrichment Analyses for the GO and KEGG Pathways
To analyze further the biological significance of the genes related to CIN, we used Database for Annotation, Visualization and Integrated Discovery (DAVID) software to perform GO function analyses for the genes related to CIN. The Fisher exact test with multiple test corrections (FDR < 0.05) was used to obtain significant GO terms associated with PTC. We acquired the KEGG pathway terms using the same method.
Prognosis Analyses
The log-rank test was used to obtain p values and to identify a subset of genes for which a score value that had high correlations with CIN showed significant differences between the high and low groups. The high and low groups were groups with high and low score values, grouped by their median values. The survival times were compared between these two groups. Genes with p < 0.05 were used in the study. The p values were uncorrected p values. Univariate Cox regression analyses were performed to assess the survival prognosis capabilities of the selected gene set using the overall survival time as a dependent variable. The HRs and 95% CIs of the clinical parameters for mortality were calculated using the univariate Cox proportional hazard model. The risk score formula for each patient was calculated as follows:where k was the kth sample, i denoted the feature genes filtered by the univariate Cox proportional hazards models, and c was the regression coefficient estimated by the univariate Cox proportional hazards model. The 5-year overall survival for each score value scoring group (high versus low) was calculated using the Kaplan-Meier method, and the statistical significance was assessed using the log-rank test. The significance level of all statistical tests was p < 0.05. We performed time-dependent ROC curve analyses to compare the sensitivities and specificities for survival predictions between the predicted genes. The ROC AUC values were obtained from ROC analyses and were compared between the selected genes.In order to verify the reproducibility and accuracy of the gene prognostic model, as predicted in the training set, we used the testing set. The regression coefficients and the thresholds of risk scores derived from the training set were directly applied to the testing set, and then the patients in the testing set were divided into high-risk and low-risk groups. The evaluation of survival times and the comparison of differences between the two groups were the same as that of the training set.
Author Contributions
H.Q. conceived and designed the experiments. J.H., M.C., and Q.F. acquired the experiment data. J.H., M.C., and Y.Z. performed the study. J.H., M.C., and Y.W. carried out the data analysis. J.H. and J.E. wrote the manuscript. All authors read and approved the final manuscript.
Authors: Shasha Li; Jiali Yu; Amanda Huber; Ilona Kryczek; Zhuwen Wang; Long Jiang; Xiong Li; Wan Du; Gaopeng Li; Shuang Wei; Linda Vatan; Wojciech Szeliga; Arul M Chinnaiyan; Michael D Green; Marcin Cieslik; Weiping Zou Journal: Cell Rep Date: 2022-04-05 Impact factor: 9.995