Literature DB >> 35153506

Identification of a Four-Gene Signature for Determining the Prognosis of Papillary Thyroid Carcinoma by Integrated Bioinformatics Analysis.

Yuting Luo1, Rong Chen1, Zhikun Ning1, Nantao Fu1, Minghao Xie1.   

Abstract

PURPOSE: Although well-differentiated papillary thyroid carcinoma (PTC) has an indolent nature and usually an excellent prognosis, some patients experience disease recurrence or death. The aim of this study was to identify prognostic markers to stratify PTC patients. PATIENTS AND METHODS: Eight gene-expression profiles (GSE3467, GSE3678, GSE5364, GSE27155, GSE33630, GSE53157, GSE60542, and GSE104005) were obtained from the Gene Expression Omnibus and used to analyze differentially expressed genes (DEGs) between PTC tissues and non-tumor tissues. Univariable Cox regression survival analysis and Lasso-penalized Cox regression analysis were performed to identify prognostic genes and establish a risk-score model based on the integrated DEGs. Kaplan-Meier (KM) and receiver operating characteristic (ROC) curves were used to validate the prognostic performance of the risk score. A nomogram was constructed based on The Cancer Genome Atlas dataset and Multivariable Cox regression analysis.
RESULTS: A total of 165 upregulated and 207 downregulated DEGs were screened. A four-gene signature including PAPSS2, PCOLCE2, PTX3, and TGFBR3 was identified. The risk-score model showed a strong diagnosis performance for identifying patients with a poor prognosis. KM analysis showed that patients with low risk scores had a significantly more favorable overall survival (OS) than those with high risk scores (p = 0.0002). ROC curves based on the four-gene signature showed better performances in predicting 1-, 3-, and 5-year survival than did the American Joint Committee on Cancer staging system (area under the curve: 0.86 vs 0.84, 0.80 vs 0.63, and 0.79 vs 0.73, respectively). Furthermore, when combined with age and tumor status from the nomogram, the four-gene signature achieved a good performance in guiding postoperative follow-up surveillance of patients with PTC.
CONCLUSION: The four-gene signature was found to be a novel and reliable biomarker with great potential for clinical application in risk stratification and OS prediction in patients with PTC.
© 2022 Luo et al.

Entities:  

Keywords:  gene signature; overall survival; papillary thyroid cancer; prognosis

Year:  2022        PMID: 35153506      PMCID: PMC8824688          DOI: 10.2147/IJGM.S346058

Source DB:  PubMed          Journal:  Int J Gen Med        ISSN: 1178-7074


Introduction

Thyroid carcinoma (THCA) is the most common type of endocrine malignancy and its incidence is increasing.1 Based on its histopathological characteristics, thyroid carcinoma can be classified into multiple subtypes, such as papillary thyroid carcinoma (PTC), follicular thyroid carcinoma, and anaplastic thyroid carcinoma.2 PTC is the most common subtype of THCA, comprising approximately 80% of all thyroid malignancies.3 The prognosis for patients with PTC is highly favorable, although more than 10% of patients eventually experience local or distant disease recurrence.4 Those patients with potentially poor outcomes must be monitored and administered timely and effective treatments to prolong their survival and improve their quality of life. Therefore, it is necessary to identify effective prognostic biomarkers of PTC to accurately assess their prognosis and to stratify patients. At present, treatment options for PTC and disease prognosis depend mainly on the histopathological subtype. Some variants of PTC may behave more aggressively than classic PTC. These so-called aggressive variants include the tall-cell variant, the columnar variant, the solid variant, and the recently described hobnail variant. The tall-cell variant is the most common aggressive variant of PTC. The tumor cells have typical nuclear features of PTC, composed of prominent papillary structures lined by cells 2 or 3 times as tall as they are wide, and have abundant eosinophilic (oncocytic-like) cytoplasm. The columnar variant can also be aggressive, particularly in older patients, with larger tumors, presenting symptomatically, and showing a diffusely infiltrative growth, and extrathyroidal extension. The solid variant not only has the same growth pattern with poorly differentiated THCA but also has the nuclear features of PTC. The histologic features of hobnail variant include papillary and micropapillary structures closely lined by cells containing eosinophilic cytoplasm and apically located nuclei with prominent nucleoli. The tumors cells have increased nuclear to cytoplasmic ratios and apically placed nuclei that produce a hobnail like surface bulge.5 However, aggressive PTC variants remain largely understudied.6 Some reports have shown that the diffuse-sclerosing variant is an indolent variant of PTC,7,8 but other data have suggested that the variant is a risk factor for a poor patient prognosis.9,10 Moreover, in the absence of explicit standards, histopathology definitions vary widely across institutions. There has been considerable disagreement among different investigators concerning the threshold percentage of tall cells (30–70%) that constitute a true tall-cell PTC variant.11–13 Therefore, the use of histopathological subtyping of PTC in routine clinical practice is limited. In addition to the histopathological differences, the molecular features of PTC may also vary. Programmed cell death protein 1 (PD-1) is an apoptosis-associated gene that involved in the regulation of the immune response and is one of the most important inhibitory checkpoints.14 Girolami et al15 reported that the status of BRAF mutation was significant association with programmed death-ligand 1 (PD-L1) expression in PTC patients. Immunotherapy may be considered in a subset of BRAF mutant PTC. However, the reported rates of BRAF mutation vary significantly between studies. BRAF mutation has been observed in 80–100% tall-cell PTC variant patients. Whereas the BRAF mutation rate is similar between the columnar variant and classic PTC patients. With developments in gene chips and high-throughput sequencing, gene signatures based on mRNA-expression levels have shown great potential for determining PTC prognosis. However, the data from previous studies have often been incomplete or inconsistent with one another. A comprehensive analysis of integrated gene-expression data using bioinformatics methods can overcome the problems caused by heterogeneities in expression studies. In this study, eight microarray datasets of differentially expressed genes (DEGs) from the Gene Expression Omnibus (GEO) were integrated and used for bioinformatics analyses. Univariable and Lasso–Cox regression analysis were performed to identify overall survival (OS)-related DEGs and propose a prognostic risk-score model to stratify PTC patients. Multivariable Cox regression analysis was used to identify independent prognostic factors for PTC. We identified a four-gene signature as a robust marker with great potential in risk stratification and OS prediction. Furthermore, based on the GeneMANIA database, we analyzed the potential biological functions of this four-gene signature by performing interaction-network and pathway-enrichment analyses. Finally, a prognostic nomogram was constructed based on The Cancer Genome Atlas (TCGA) dataset to guide postoperative follow-up surveillance of patients with PTC.

Materials and Methods

Microarray Preparation and Data Processing

Eight microarray datasets (GSE3467, GSE3678, GSE5364, GSE27155, GSE33630, GSE53157, GSE60542, and GSE104005) were downloaded from the GEO database () for DEG analysis, using “thyroid carcinoma” and “Homo sapiens” as search terms. The raw data from these datasets were processed using R language statistical software (version 3.6.1). Background correction was performed by processing the raw data by quartile normalization followed by log2 transformation, in order to obtain normally distributed expression values. The “Limma” package was used to identify the DEGs between PTC tissues and non-tumor tissues.16 The threshold for statistical significance was set at a log2 fold change (log2FC) of >1 and an adjusted P value of <0.05. Mean expression values were applied for genes whose expression values were studied using multiple probes. The robust rank aggregation (RRA) method was used to identify overlapping DEGs (P < 0.05) from the eight GEO datasets.

Constructing a Prognostic Gene Signature

Univariable Cox regression survival analysis was performed based on the overlapping DEGs (where P < 0.01 was considered statistically significant). Thereafter, Lasso-penalized Cox regression analysis was performed to construct the prognostic gene signature.17 The optimal values of the penalty parameter, alpha, were determined using the “glmnet” package of R software.18 A four-gene prognostic signature with corresponding coefficients was selected. Risk scores were calculated using TCGA data for each patients with thyroid carcinoma, using the “survminer” package of R software, and the patients were divided into two groups based on the median risk score.

TCGA Database Validation and Survival Analysis

Patients with PTC and follow-up times of >30 days in TCGA database were included in the survival analysis. The Kaplan–Meier (KM) curves were analyzed using the “survival” package of R software. Time-dependent receiver operating characteristic (ROC) curve analysis was conducted using the “pROC” and “survivalROC” packages of R.19 Patients with complete clinical information (n = 426) were included in the univariable and multivariable Cox regression analyses. Risk scores and other clinical characteristics (including the age, sex, tumor–node–metastasis classification, tumor location, residual tumor status, American Joint Committee on Cancer [AJCC] pathologic tumor stage, and tumor status at the last follow-up) were analyzed by univariable Cox regression analysis. Variants with a P value of <0.1 in univariable analysis were included in the subsequent multivariable Cox regression model to assess the predictive performance.

Constructing a Predictive Nomogram

The “rms” package of R software was used to construct a predictive nomogram based on independent prognostic parameters used to predict the probabilities of 3-, 5- and 8-year OS. The time-dependent area under the ROC curve (AUC) was calculated to determine the discriminatory ability of the nomogram. A calibration curve was used to visualize the performance of the nomogram. The patients were stratified into two groups according to the median nomogram score. The KM survival curves for the different groups were then plotted.

Functional-Enrichment Analysis

Pathway-enrichment analysis was carried out using the Kyoto Encyclopedia of Genes and Genomes (KEGG, ) and Enrichr () databases. Gene ontology (GO, ) was used to explore the biological functions of the four-gene signature. GO functional annotation included biological process (BP), cellular component (CC), and molecular function (MF) terms.

Statistical Analysis

The data generated in this study are expressed as the mean ± standard deviation. R software (version 3.6.1) was used for all statistical analyses. Categorical variables were analyzed by the χ2 test or Fisher’s exact test. Continuous variables were analyzed using Student’s t-test for paired samples. Two groups of boxplots were analyzed using the Wilcoxon test. A heatmap was generated to visualize the DEGs using the “pheatmap” package of R software. KM survival curves and hazard ratios (HRs) with 95% confidence intervals (CIs) were constructed using log–rank tests. Correlations among the individual genes in the signature were assessed with Pearson correlation coefficients. P < 0.05 was considered to reflect a statistically significant difference.

Ethics Approval

The study was reviewed and approved by the Institutional Review Board and the Ethics Committee of the First Affiliated Hospital of Nanchang University, Nanchang, China.

Results

Identification of DEGs

In this study, multistep analysis was performed to explore key DEGs in patients with PTC, and their biological functions and potentials for clinical application were studied using integrated bioinformatics methods. The overall study workflow is shown in Figure 1. The eight microarray datasets studied were normalized to increase the reliability of the data (Figure 2A and ). Hierarchical clustering analysis revealed different DEG profiles in the tumor and matched non-tumor tissues (Figure 2B and ). The volcano plots shown in Figure 2C and represent the distributions of DEGs identified from each of the eight datasets. Integrated analysis of the eight datasets using the RRA method identified 372 overlapping DEGs, including 165 upregulated and 207 downregulated genes. The top 20 overlapping upregulated and downregulated DEGs in the eight datasets are shown in Figure 2D.
Figure 1

Flowchart showing the workflow used for the identification, bioinformatic analysis, and validation of the prognostic gene signature of PTC.

Figure 2

Identification of DEGs in PTC between tumor and normal tissues. (A) Normalization of the GSE33630 dataset. (B) A representative heatmap of dataset GSE33630 showing that DEGs can effectively differentiate tumors from normal tissues (red represents higher expression and green represents lower expression). (C) Volcano plot of the GSE33630 dataset (green: downregulated genes; black: not differentially expressed; red: upregulated genes). (D) Heat map of the top 20 upregulated (red) and downregulated (green) DEGs of each expression microarray. The number in each column represents the LogFC value.

Flowchart showing the workflow used for the identification, bioinformatic analysis, and validation of the prognostic gene signature of PTC. Identification of DEGs in PTC between tumor and normal tissues. (A) Normalization of the GSE33630 dataset. (B) A representative heatmap of dataset GSE33630 showing that DEGs can effectively differentiate tumors from normal tissues (red represents higher expression and green represents lower expression). (C) Volcano plot of the GSE33630 dataset (green: downregulated genes; black: not differentially expressed; red: upregulated genes). (D) Heat map of the top 20 upregulated (red) and downregulated (green) DEGs of each expression microarray. The number in each column represents the LogFC value.

Screening and Expression Validation of Key Genes

Univariable Cox regression analysis revealed that 36 of the 372 DEGs identified correlated significantly with OS (Figure 3A). Lasso-penalized Cox regression analysis was then performed to identify a four-gene prognostic signature, consisting of 3′-phosphoadenosine 5′-phosphosulfate synthase 2 (PAPSS2), procollagen C-endopeptidase enhancer 2 (PCOLCE2), pentraxin 3 (PTX3), and transforming growth factor beta receptor 3 (TGFBR3) (Figure 3B). The four genes were significantly downregulated in THCA tissues, compared to their expression levels in normal tissues (Figure 3C). To validate expression differences in the four genes between tumor and non-tumor tissues, 512 THCA tissues and 59 normal tissues were compared using Gene Expression Profiling Interactive Analysis (). As shown in Figure 3D, the mRNA expression levels of the four genes were decreased in THCA tissues.
Figure 3

Identification of genes significantly correlated with PTC prognosis. (A and B) The four-gene prognostic signature was identified by univariate and Lasso–Cox regression analysis. (C) Heat map showing expression differences for the four genes between THCA and normal tissues. (D) Expression differences in the four genes were validated in THCA and normal tissues with GEPIA. (E) ROC analysis for the four genes in patients with PTC. (F) KM survival curves for the four genes in patients with PTC.*P < 0.05.

Identification of genes significantly correlated with PTC prognosis. (A and B) The four-gene prognostic signature was identified by univariate and Lasso–Cox regression analysis. (C) Heat map showing expression differences for the four genes between THCA and normal tissues. (D) Expression differences in the four genes were validated in THCA and normal tissues with GEPIA. (E) ROC analysis for the four genes in patients with PTC. (F) KM survival curves for the four genes in patients with PTC.*P < 0.05. In addition, 503 PTC patients with a follow-up period of >30 days were included in subsequent analysis (two TCGA-THCA patients were excluded because of a diagnosis of follicular carcinoma or poorly differentiated oncocytic carcinoma). The baseline characteristics of the patients are presented in Table 1. All four genes showed good correlations with PTC prognosis, as shown by ROC analysis (TGFBR3 AUC = 0.92, PTX3 AUC = 0.91, PCOLCE2 AUC = 0.89, and PAPSS2 AUC = 0.96) (Figure 3E). Furthermore, survival analyses were performed to verify correlations between gene-expression levels and prognosis. The patient cohorts were stratified into low- and high-risk groups according to the median expression of each gene. KM survival curves showed that patients in the high-risk group (with low expression of PAPSS2, PCOLCE2, PTX3, or TGFBR3) had a higher mortality rate than those in the low-risk group (Figure 3F).
Table 1

Baseline Characteristics of Patients with PTC, Using TCGA Data

Clinical FeatureNumber (%)Clinical FeatureNumber (%)
Follow-up time (months)40.0 ± 32.5AJCC stage
Survival status Stage I283 (56.3%)
 Alive487 (96.8%) Stage II51 (10.1%)
 Dead16 (3.2%) Stage III112 (22.3%)
Age (year)47.3 ± 15.8 Stage IV55 (10.9%)
Sex Not available2 (0.4%)
 Male135 (26.8%)T classification
 Female368 (73.2%) T1143 (28.4%)
Pharmaceutical treatment T2165 (32.8%)
 No218 (43.3%) T3170 (33.8%)
 Yes17 (3.4%) T423 (4.6%)
 Not available268 (53.3%) TX2 (0.4%)
Radiation treatmentN classification
 No78 (15.5%) N0229 (45.5%)
 Yes156 (31.0%) N1224 (44.6%)
 Not available269 (53.5%) NX50 (9.9%)
Tumor statusM classification
 Tumor free410 (81.5%) M0281 (55.9%)
 With tumor50 (9.9%) M19 (1.8%)
 Not available43 (8.6%) MX212 (42.1%)
Laterality Not available1 (0.2%)
 Left lobe176 (35.0%)Residual tumor
 Right lobe213 (42.3%) RO385 (76.5%)
 Isthmus22 (4.4%) R152 (10.3%)
 Bilateral86 (17.1%) R24 (0.8%)
 Not available6 (1.2%) RX30 (6.0%)
Extrathyroidal extension Not available32 (6.4%)
 None332 (66.0%)Relapse
 Minimal (T3)134 (26.6%) Disease-free443 (88.1%)
 Moderate/advanced (T4a)18 (3.6%) Recurred46 (9.1%)
 Very advanced (T4b)1 (0.2%) Not available14 (2.8%)
 Not available18 (3.6%)

Abbreviation: AJCC stage, American Joint Committee on Cancer pathologic tumor stage.

Baseline Characteristics of Patients with PTC, Using TCGA Data Abbreviation: AJCC stage, American Joint Committee on Cancer pathologic tumor stage.

Correlations Between Key Genes and Clinical Characteristics

The expression levels of the four hub genes were analyzed according to various clinical characteristics, including the age, tumor (T) classification, node (N) classification, metastasis (M) classification, and tumor status at the time of last follow-up. Primarily, the expression levels of the four key genes were significantly lower in patients with PTC, when compared with those in healthy subjects (Figure 4). However, their expression levels did not differ according to the presence of absence of extrathyroidal invasion (Figure 4A). TGFBR3-expression levels were significantly higher in patients with PTC and lymph node metastasis, when compared with those in patients without lymph node metastasis (Figure 4B). The expression levels of the four genes were significantly lower in patients with distant metastasis compared with those in patients without distant metastasis (Figure 4C). PCOLCE2 levels were significantly lower in younger patients (< 55 years) compared with those in older patients (≥ 55 years), as shown in Figure 4D. PTX3-expressions levels were significantly lower in the patients surviving with tumors than in tumor-free patients (Figure 4E).
Figure 4

Expression of four key genes in subgroups of patients with PTC, stratified according to their clinical characteristics. The violin plots present quantitative analysis of the relative expression levels of four key genes in normal individuals and patients with PTC. (A and C) Patients with PTC were divided into two groups according to presence or absence of extrathyroidal invasion (A), lymph node metastasis (B), or distant metastasis (C). (D and E) An age of ≥55 years and survival with tumors were analyzed as group variables for patients with PTC. ns: P > 0.05; *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001.

Expression of four key genes in subgroups of patients with PTC, stratified according to their clinical characteristics. The violin plots present quantitative analysis of the relative expression levels of four key genes in normal individuals and patients with PTC. (A and C) Patients with PTC were divided into two groups according to presence or absence of extrathyroidal invasion (A), lymph node metastasis (B), or distant metastasis (C). (D and E) An age of ≥55 years and survival with tumors were analyzed as group variables for patients with PTC. ns: P > 0.05; *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001.

Development of the Four-Gene Prognostic Signature

The coefficient values and expression levels of the four genes were determined to calculate the risk core for each patient using the following formula: (0.17363 × PAPSS2 level) + (0.00064 × PCOLCE2 level) + (0.07881 × PTX3 level) + (0.14243 × TGFBR3 level). Subsequently, the included 503 TCGA-PTC patients were divided into high-risk (n = 251) and low-risk (n = 252) groups, according to the median risk score of 2.759 (Figure 5A). KM survival-curve analysis showed that the high-risk group had a worse OS than the low-risk group (p = 0.0002; Figure 5B). The calibration plots for 1-, 3-, and 5-year survival predictions showed that the four-gene signature had satisfactory predictive performances for patients with PTC (Figure 5C). Time-dependent ROC-curve analysis was performed to evaluate the sensitivity and specificity of the four-gene risk scores. The AUCs of the four-gene signature were significantly greater than those of the AJCC stage at 1 year (0.86 vs 0.84), 3 years (0.80 vs 0.63), and 5 years (0.79 vs 0.73) (Figure 5D).
Figure 5

Establishment and evaluation of the four-gene signature in TCGA-PTC patients. (A) Distributions of the risk scores and survival data. Patients that were alive or dead at the study endpoint are represented with blue and yellow dots, respectively. (B) KM survival curves of the risk-score model for 503 TCGA-PTC patients. (C) Calibration plot of the four-gene signature for predicting the survival probability at 1, 3, or 5 years after the initial treatment. (D) Time-dependent ROC curves of the risk-score model for predicting 1-, 3-, and 5-year OS rates, compared with the results obtained using the AJCC staging system.

Establishment and evaluation of the four-gene signature in TCGA-PTC patients. (A) Distributions of the risk scores and survival data. Patients that were alive or dead at the study endpoint are represented with blue and yellow dots, respectively. (B) KM survival curves of the risk-score model for 503 TCGA-PTC patients. (C) Calibration plot of the four-gene signature for predicting the survival probability at 1, 3, or 5 years after the initial treatment. (D) Time-dependent ROC curves of the risk-score model for predicting 1-, 3-, and 5-year OS rates, compared with the results obtained using the AJCC staging system.

Four-Gene Signature Interaction Network, Functional Annotation, and Pathway-Enrichment Analysis

To further explore the biological functions of these four genes, the Enrichr tool was utilized to analyze GO functions, KEGG pathways, and WikiPathways enrichment. The top three enriched GO terms and pathways are shown in Figure 6. GO BP analysis showed that the four genes were significantly enriched for “glycoprotein metabolism,” “cardiac muscle cell proliferation,” and “striated muscle cell proliferation.” CC analysis showed that “tertiary granule lumen,” “specific granule lumen,” and “specific granule” were among the most highly enriched subcategories. Enrichment analysis of MF terms revealed that the four key genes were mainly enriched for “type II transforming growth factor beta (TGF-β) receptor binding,” “TGF-β-activated receptor activity,” and “TGF-β receptor activity.” KEGG pathway-enrichment analysis revealed that the four key genes were significantly enriched in terms of “sulfur metabolism,” “selenocompound metabolism,” and “purine metabolism.” WikiPathways analysis revealed an enrichment for the four key genes in terms of the “sulfation biotransformation reaction,” “proteoglycan biosynthesis,” and “hypothesized pathways in pathogenesis of cardiovascular disease” categories.
Figure 6

Enrichment analysis of the four key genes. The bar chart visualizes the top three enriched terms and their P values. The longer the bar, the smaller the P-value.

Enrichment analysis of the four key genes. The bar chart visualizes the top three enriched terms and their P values. The longer the bar, the smaller the P-value.

Construction and Verification of a Predictive Nomogram

Data from 425 TCGA-PTC patients with complete clinical information were used to build a prognostic nomogram. Univariate Cox regression analysis showed that the risk score, age, presence of residual tumors, AJCC stage, and tumor status were significantly associated with patient survival. However, multivariate Cox regression analyses showed that only the risk score, age, and tumor status were independent prognostic factors for patients with PTC (Table 2).
Table 2

Univariate and Multivariate Cox Regression Analysis of TCGA Data from Patients with PTC (n = 425)

CharacteristicsUnivariate CoxMultivariate Cox
HR (95% CI)PHR (95% CI)P
Risk score
 Low11
 High12.814 (1.573–104.38)0.01712.103 (1.226–119.469)0.033
Age (years)
 <5511
 ≥5530.020 (3.671–245.503)0.00229.126 (2.653–319.759)0.006
Sex
 Female1
 Male3.174 (0.789–12.773)0.104
Extrathyroid extension
 None1
 Minimal–advanced1.101 (0.262–4.622)0.895
Laterality
 Unilateral1
 Isthmus0.038 (0–43,447.345)0.646
 Bilateral0.036 (0–838.309)0.518
Residual tumor
 R011
 Non-R05.491 (1.372–21.967)0.0162.941 (0.563–15.358)0.201
AJCC stage
 I + II11
 III + IV4.501 (1.071–18.921)0.0400.122 (0.100–1.556)0.105
Tumor status
 Tumor-free11
 With tumor18.057 (4.305–75.740)<0.00118.822 (1.845–191.987)0.013

Note: Significant p-values (< 0.05) are bolded.

Abbreviations: HR, hazard ratios; CI, confidence intervals; AJCC stage, American Joint Committee on Cancer pathologic tumor stage.

Univariate and Multivariate Cox Regression Analysis of TCGA Data from Patients with PTC (n = 425) Note: Significant p-values (< 0.05) are bolded. Abbreviations: HR, hazard ratios; CI, confidence intervals; AJCC stage, American Joint Committee on Cancer pathologic tumor stage. To develop a quantitative method for determining the prognosis of patients with PTC in clinical settings, we established a nomogram by integrating the age, tumor status, and four-gene signature risk score (Figure 7A). Subsequently, the patients with PTC were divided into groups with a high risk score (n = 212) or a low risk score (n = 213), based on median of the nomogram (score = 68) (Figure 7B). Calibration plots showed that the nomogram performed well in predicting OS in patients with PTC (Figure 7C). The KM curves showed a significant difference in the OS between the two groups. Those with higher scores had a significantly lower OS period (p = 0.00042; Figure 7D). The AUCs of the 3-, 5-, and 8-year OS predictions for the nomogram were 1, 0.97, and 0.94, respectively (Figure 7E).
Figure 7

Construction of a nomogram for assessing survival and associations between the four-gene risk score and clinical characteristics. (A) The nomogram was constructed by combining the observed clinical characteristics with the four-gene risk score. (B) Distribution of the nomogram scores and survival data. Patients that were alive or dead at the study endpoint are represented with blue or yellow dots, respectively. (C) Calibration plot of the nomogram for predicting survival probabilities at 3, 5, or 8 years after treatment. (D) KM survival curves of the nomogram in 415 TCGA-PTC patients. (E) Time-dependent ROC curve of the nomogram for 3, 5, and 8-year OS predictions.

Construction of a nomogram for assessing survival and associations between the four-gene risk score and clinical characteristics. (A) The nomogram was constructed by combining the observed clinical characteristics with the four-gene risk score. (B) Distribution of the nomogram scores and survival data. Patients that were alive or dead at the study endpoint are represented with blue or yellow dots, respectively. (C) Calibration plot of the nomogram for predicting survival probabilities at 3, 5, or 8 years after treatment. (D) KM survival curves of the nomogram in 415 TCGA-PTC patients. (E) Time-dependent ROC curve of the nomogram for 3, 5, and 8-year OS predictions.

Discussion

In this study, 372 overlapping DEGs between THCA tissues and non-tumor tissues were identified from eight GEO datasets after integrating the datasets using the RRA method. Thirty-six prognosis-related genes were sifted out from TCGA datasets using univariable Cox regression, which were then subjected to Lasso regression with 10-fold cross-validation. Finally, we screened four key genes, including PAPSS2, PCOLCE2, PTX3, and TGFBR3, which are known to be involved in regulating cellular metabolism and participate in PTC development. In each case, the four-gene signature risk score was calculated based on the model described above, which was used to stratify the patients into high- or low-risk groups. The four-gene signature was identified as an independent prognostic factor of PTC, according to KM-survival curve analysis and ROC analysis. Furthermore, the four-gene signature was superior to the AJCC staging system in predicting 1-, 3-, and 5-year OS. In addition, the four-gene signature was independent of other clinical factors and performed better in predicting OS when combined with the age and tumor status in a nomogram. Collectively, these findings demonstrate that the four-gene signature can be valuable to patients with PTC and thyroid surgeons because it can help evaluate the risk for tumor-related death after surgical treatment and guide clinical treatment decisions. Bioinformatics analysis has enabled the discovery of new tumor biomarkers, and extensive research has been conducted using microarrays and RNA-sequencing. The results of clinical practice based on gene-expression profiles have shown that high-throughput methods for identifying gene signatures might show promise in helping determine effective therapies.20,21 Several studies have explored the prognostic roles of gene signatures in predicting THCA outcomes.22–25 However, only a few studies focused specifically on the prognosis of PTC.26–29 Ren et al26 identified six genes (CTGF, CYR61, CHRDL1, OGN, FGF13, and CDH3) as a prognostic signature for PTC. Wang et al27 identified eight candidate genes (FN1, CCND1, CDH2, CXCL12, MET, IRS1, DCN, and FMOD) as potential prognostic indicators for PTC. In our study, we identified a four genes (PAPSS2, PCOLCE2, PTX3, and TGFBR3) signature as a robust marker with great potential in risk stratification and OS prediction. The causes leading to the inconsistent results may be due to the inclusion of more gene-expression profiles than that of the previous study and the exclusion of the patients without PTC. Furthermore, another difference between these previous studies and our current study is that we analyzed correlations between the key genes and the clinical characteristics of PTC. Our four-gene signature showed higher 1-, 3-, and 5-year AUCs than the above gene signatures, indicating that our results are more robust and powerful. The four genes identified in this study were downregulated in PTC tissues. The PAPSS2 gene encodes 3′-phosphoadenosine 5′-phosphosulfate synthase 2, which is involved in several biological processes,30 including the sulfation conjugation of xenobiotic compounds.31 PAPSS2 has been reported to be expressed at low levels in prostate cancer32 and colon cancer33 tissues. PCOLCE2 promotes the enzymatic cleavage of type-I procollagen to yield mature, structured fibrils. Previous data showed that PCOLCE2 was mainly expressed in the adult heart and developing cartilage tissues.34 However, Wu et al35 reported lower PCOLCE2 expression in patients with nasopharyngeal carcinoma than in non-tumor populations. PTX3 has been reported as a key homeostatic component that functions at the crossroads between innate immunity, inflammation, tissue repair, and cancer.36 PTX3 was identified as a component of the humoral arm of innate immunity and an extrinsic tumor-suppressor gene that tames tumor-promoting inflammation.37 As a tumor-suppressor gene, TGFBR3 has been related to the development of various cancers. The loss of TGFBR3 expression promoted the progression of pancreatic cancer,38 hepatocellular cancer,39 and ovarian cancer.40 Similar to the findings of previous studies, the decreased expression of these four genes were confirmed in our study by integrated bioinformatics analysis in tumor tissues from patients with PTC. Our results indicated that low expression of these four genes may be related to PTC development. To understand the potential mechanisms whereby the four-gene signature affected the prognosis of patients with PTC, we analyzed the correlations between their expression levels and the associated clinical characteristics. No differences were found in their expression levels between patients with or without extrathyroid extension. However, in patients with distant metastasis, the expression levels of the four genes were significantly lower, compared with those in patients with no evidence of distant metastasis. These results indicate that these four genes might relate to the molecular mechanism of distant metastasis of PTC cells, but not local invasion. Furthermore, our results show that lower PCOLCE2 expression was associated with an earlier age of PTC onset, suggesting that the loss of PCOLCE2 gene expression might accelerate tumorigenesis. In addition, PTX3 expression were significantly lower in patients surviving with tumors than in tumor-free patients, revealing a potential correlation between PTX3 and tumor recurrence. GO analysis showed that the four genes were significantly enriched for the “glycoprotein metabolism,” “tertiary granule lumen,” and “TGF-β receptor binding” terms. WikiPathways and KEGG-pathway analyses revealed that the four genes were significantly enriched for “sulfur metabolism” and “sulfation biotransformation reaction.” These enriched signaling pathways serve vital catalytic roles in the development and progression of malignant disease. Changes in protein glycosylation are increasingly being recognized as important modifications associated with cancer etiology. Mammadova–Bach et al41 reported that the genetic deficiency of platelet glycoprotein VI in mice correlated with decreased experimental and spontaneous metastasis of colon and breast cancer cells. TGF-β signaling has been associated with the progression of many cancers, such as colorectal cancer42 and breast cancer.43 The findings of earlier studies suggest that sulfur metabolism constituted the cellular antioxidant system, mediated intercellular and intracellular signaling, and facilitated the epigenetic regulation of gene expression, all of which can contribute to tumorigenesis.44 These enriched pathways of the four genes in patients with PTC might provide insight into the molecular mechanisms underlying poor prognosis in the high-risk group (with low expression of the four genes). In this study, we not only demonstrated the validity and applicability of the four-gene signature for determining the prognosis of patients with PTC through multiple bioinformatics analysis, but we also analyzed relationships between the signature and clinicopathological features. However, this study also had several limitations. First, this study was a retrospective analysis of data deposited in public databases; thus, the potentials for selection bias and confounding bias were inevitable. Second, additional in vitro and in vivo functional experiments need to be performed to further understand the biological roles of the four-gene signature in PTC. Therefore, additional work is required to further explore the potential relationship between this four-gene signature and PTC.

Conclusion

In conclusion, we identified a prognostic four-gene signature via comprehensive bioinformatics analysis with GEO and TCGA-THCA cohorts. The four-gene signature could be used to effectively stratify patients with PTC into high- and low-risk groups and independently predict their OS. Our findings indicate that the four-gene signature may help facilitate personalized cancer management in clinical settings. We therefore recommend using this classifier as a molecular diagnostic test to evaluate the prognostic risk in patients with PTC.
  44 in total

1.  Cancer Statistics, 2021.

Authors:  Rebecca L Siegel; Kimberly D Miller; Hannah E Fuchs; Ahmedin Jemal
Journal:  CA Cancer J Clin       Date:  2021-01-12       Impact factor: 508.702

2.  Concomitant RAS, RET/PTC, or BRAF mutations in advanced stage of papillary thyroid carcinoma.

Authors:  Minjing Zou; Essa Y Baitei; Ali S Alzahrani; Faisal S BinHumaid; Dania Alkhafaji; Roua A Al-Rijjal; Brian F Meyer; Yufei Shi
Journal:  Thyroid       Date:  2014-06-10       Impact factor: 6.568

Review 3.  Papillary thyroid cancer: strategies for optimal individualized surgical management.

Authors:  Clive S Grant
Journal:  Clin Ther       Date:  2014-05-01       Impact factor: 3.393

Review 4.  Sulfur metabolism and its contribution to malignancy.

Authors:  Nathan P Ward; Gina M DeNicola
Journal:  Int Rev Cell Mol Biol       Date:  2019-06-17       Impact factor: 6.813

5.  Regularization Paths for Generalized Linear Models via Coordinate Descent.

Authors:  Jerome Friedman; Trevor Hastie; Rob Tibshirani
Journal:  J Stat Softw       Date:  2010       Impact factor: 6.440

6.  limma powers differential expression analyses for RNA-sequencing and microarray studies.

Authors:  Matthew E Ritchie; Belinda Phipson; Di Wu; Yifang Hu; Charity W Law; Wei Shi; Gordon K Smyth
Journal:  Nucleic Acids Res       Date:  2015-01-20       Impact factor: 16.971

7.  Decreased expression of the type III TGF-β receptor enhances metastasis and invasion in hepatocellullar carcinoma progression.

Authors:  Sen Zhang; Wu-Yi Sun; Jing-Jing Wu; Yuan-Jing Gu; Wei Wei
Journal:  Oncol Rep       Date:  2016-02-08       Impact factor: 3.906

8.  Tall Cell Variant versus Conventional Papillary Thyroid Carcinoma: A Retrospective Analysis in 351 Consecutive Patients.

Authors:  Alessandro Longheu; Gian Luigi Canu; Federico Cappellacci; Enrico Erdas; Fabio Medas; Pietro Giorgio Calò
Journal:  J Clin Med       Date:  2020-12-28       Impact factor: 4.241

9.  FUT8 promotes breast cancer cell invasiveness by remodeling TGF-β receptor core fucosylation.

Authors:  Cheng-Fen Tu; Meng-Ying Wu; Yuh-Charn Lin; Reiji Kannagi; Ruey-Bing Yang
Journal:  Breast Cancer Res       Date:  2017-10-05       Impact factor: 6.466

10.  Identification of Potential Biomarkers for Thyroid Cancer Using Bioinformatics Strategy: A Study Based on GEO Datasets.

Authors:  Yujie Shen; Shikun Dong; Jinhui Liu; Liqing Zhang; Jiacheng Zhang; Han Zhou; Weida Dong
Journal:  Biomed Res Int       Date:  2020-04-01       Impact factor: 3.411

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.