Qunguang Jiang1, Wenqian Feng2, Chengfeng Xiong3, Yunxia Lv3. 1. Department of Gastrointestinal Surgery, First Affiliated Hospital of Nanchang University, Nanchang, Jiangxi 330006, P.R. China. 2. Department of Operating Room, Nanchang University Second Affiliated Hospital, Nanchang, Jiangxi 330006, P.R. China. 3. Department of Thyroid Surgery, Nanchang University Second Affiliated Hospital, Nanchang, Jiangxi 330006, P.R. China.
Papillary thyroid carcinoma (PTC) is one of the most prevalent thyroid malignancies; the morbidity and mortality ranks ninth and sixth, respectively, with 567,000 new cases diagnosed and 41,000 cancer-related deaths in 2018, worldwide (1). In recent years, the widespread application of thyroid ultrasonography and ultrasound-guided fine needle aspiration has improved the diagnosis of PTC in an asymptomatic population (2). Most patients have PTC that is curable, with a 5-year survival rate >95%. However, a few cases of PTC might occur with invasive features such as tumor recurrence, cervical lymph node involvement and/or distant metastasis, which are associated with poor prognosis (3,4). Therefore, biomarkers that can predict poor outcomes are urgently needed as these would assist clinicians in taking early and appropriate measures for optimal therapeutic benefit.Current studies have indicated the involvement of various molecular events in the deterioration of PTC (5,6), including BRAF and RAS mutations, rearrangements of RET protooncogene and the excessive activation of the mitogen-activated protein kinase signal transduction pathway (7–10). To date, multiple microRNAs (miRs) and long non-coding RNAs (lncRNAs) have been identified as potential molecular biomarkers, including miR-146b-5p and lncRNA HOTTIP, which have been confirmed to participate in the invasion and metastasis of PTC (11–14). Nevertheless, the majority of these findings are derived from preclinical studies, and large-scale studies of prognostic biomarkers of PTC are lacking.The development of bioinformatic analysis and the emergence of public database sorting have facilitated the identification of prognostic biomarkers of multiple types of cancer based on large cohorts, including kinesin family member 4A for colorectal cancer (15), miR-106a in breast cancer (16) and kirre-like nephrin family adhesion molecule 1 in melanoma (17). In the present study, the Gene Expression Omnibus (GEO) database (18) was used to mine differentially expressed genes (DEGs) between PTC tissues and adjacent normal tissues of PTC. Subsequently, multiple bioinformatics methods, including Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genome (KEGG) enrichment analysis, construction of a protein-protein interaction (PPI) network and network topological analysis were employed to identify hub genes in DEGs which potentially related to the initiation and progression of PTC. In addition, association of the expression of hub genes with prognosis as well as clinical stage of patients with PTC was validated using The Cancer Genome Atlas (TCGA) database and survival analysis.
Materials and methods
Microarray data
In total, 4 gene expression datasets of PTC [GSE3678 and GSE3467 (19), GSE60542 (20), and GSE97001 (21)] were downloaded from the GEO database. The GSE3678 dataset contained 7 PTC samples and 7 paired adjacent normal thyroid tissues. The GSE3467 dataset was comprised of 9 PTC samples and 9 paired adjacent normal thyroid tissues. The GSE60542 dataset included 33 PTC samples and 29 paired adjacent normal thyroid tissues. GSE97001 was composed of 4 PTC samples and 4 paired adjacent normal thyroid tissues.
DEG analysis
The R language limma package (version 3.4.2) (22) (http://www.bioconductor.org/packages/release/bioc/html/limma.html) was used to analyze the DEGs among samples. The adjusted P<0.05 and |log2fold change|>1 were taken as the cut-off values. Subsequently, the 4 expression datasets were integrated using the Robust Rank Aggregation (RRA) method (https://cran.r-project.org/web/packages/RankAggreg/RankAggreg.pdf).
GO and KEGG enrichment analyses of DEGs
To assess the potential molecular mechanisms of DEGs in PTC, the Database for Annotation, Visualization and Integrated Discovery (DAVID) online tool (23) was utilized for GO analysis (24) to annotate genes and to define the functions of DEGs into 3 domains: Cellular component, biological process and molecular function. Subsequently, KEGG enrichment analysis was conducted to understand the signaling pathways in which DEGs genes were involved (25).
PPI network construction and topological analysis
The DEG-related PPI network was established using the STRING database (26), followed using visualization by Cytoscape software (version 3.6.1; http://github.com/cytoscape/cytoscape/releases/3.6.1/). Significant genes in the network were subsequently filtered using the Cytoscape plugin cytoHubba (27), which sorts the nodes in the network in accordance with the topological parameters of each node. The hub genes were selected from the intersection of the top 55 DEGs from 12 algorithms by cytoHubba (28).
Validation of DEGs and association with clinicopathological characteristics using a TGCA dataset
Clinical data and RNA-sequencing data on patients with thyroid cancer were downloaded from TCGA database (29). There were 510 thyroid cancer (THCA) samples and 58 normal samples in the TCGA-THCA cohort. Pathological classifications of non-papillary carcinoma specimens were first excluded, and samples with pathological parameters and follow-up data remained, leaving 493 patients with PTC and 58 normal thyroid samples. Additionally, the expression value of the RNA sequencing data, downloaded as transcripts per million (TPM), was analyzed. The coexpression network of hub genes was identified base on STRING network and CytoScape software was in cBioportal (http://www.cbioportal.org/).
Statistical analysis
R software version 3.4.2 (R Foundation) or GraphPad Prism version 7.0 software (GraphPad, Inc.) were used for statistical analysis. The differential expression of hub genes between normal and PTC tissue were analysed using the Wilcoxon test, and the Kaplan-Meier curves and log-rank test were used for survival analysis of hub genes. P<0.05 was considered to indicate a statistically significant difference. The ‘Survminer’ package (https://cran.r-project.org/web/packages/survminer/index.html) for R was applied to determine the optimal cut-off value to divide patients into 2 groups based on either high or low gene expression, based on receiver operating characteristic. Pearson's χ2 test or Fisher's exact test were employed to determine the association between DEGs and clinicopathological characteristics of patients with PTC. Moreover, overall survival (OS) and disease-free survival (DFS) were calculated using the R package survival (https://cran.r-project.org/web/packages/survival/index.html). The cut-off value of significant GO terms and KEGG pathway was P<0.05. Univariate Cox proportional hazard models were utilized to explore the prognostic value of the significant DEGs. Subsequently, multivariate Cox regression was used for the identification of independent predictors of PTC.
Results
Identification of DEGs between PTC and paracancerous tissue A total of 4 datasets were identified in the GEO database: GSE3678, GSE3467, GSE60542 and GSE97001
The characteristics of these studies are presented in Table I. A total of 134 upregulated genes and 106 downregulated genes were identified using the RRA method. The top 50 most significantly upregulated and downregulated genes are clustered in a heatmap (Fig. 1).
Table I.
Characteristics of the four GEO datasets.
GEO ID
Platform
Normal samples, n
Tumor samples, n
GSE3678
GPL570
7
7
GSE3467
GPL570
9
9
GSE60542
GPL570
29
33
GSE97001
GPL10332
4
4
GEO, Gene Expression Omnibus.
Figure 1.
Identification of DEGs in the papillary thyroid carcinoma mRNA expression profiling datasets GSE3678, GSE3467, GSE60542 and GSE97001. The top 50 most significantly upregulated and downregulated DEGs were clustered in a heatmap. Red represents higher expression and green represents lower expression. The values represented are logFC values. DEGs, differentially expressed genes; FC, fold change.
Pathway enrichment analysis using GO and KEGG
GO is considered as the international standard classification system for gene function. A total of 240 DEGs were identified. As shown in Fig. 2A, ‘Cellular oxidant detoxification’, ‘extracellular exosome’, ‘cell adhesion’, ‘extracellular matrix’, ‘collagen binding’ and ‘protease binding’ was enrichment based on GO analysis. KEGG analysis was undertaken using the DAVID tool, which revealed the most significantly enriched pathways included thyroid hormone synthesis, pathways in cancer, focal adhesion, metabolic pathways, apoptosis and the PPAR and PI3K/AKT signaling pathways in KEGG (Fig. 2B).
Figure 2.
GO terms and KEGG pathways enriched by DEGs. The y-axes correspond to the (A) GO terms or (B) KEGG pathways, and the x-axes show the enrichment factor. The color of the dot represents the P-value, and the size of the dot represents the number of DEGs mapped to the reference GO terms or pathways. DEGs, differentially expressed genes; GO, gene ontology; KEGG, Kyoto Encyclopedia of Genes and Genome.
PPI network analysis
A PPI network of DEGs was first constructed using the STRING database, followed by data visualization by Cytoscape (Fig. 3A). To further mine the significant nodes in the network, topological analysis of each node was performed using the Cytoscape plugin cytoHubba. The 6 most significant genes identified in the PPI network were apolipoprotein E (APOE), hemoglobin subunit α1 (HBA1), angiotensin II receptor type 1 (AGTR1), collagen type I α1 (COL1A1), galectin 3 (LGALS3) and TIMP metallopeptidase inhibitor 1 (TIMP1). The co-expression network of these 6 hub genes based on the TCGA-THCA cohort dataset was constructed using the cBioportal tool (http://www.cbioportal.org/) (Fig. 3B).
Figure 3.
PPI network construction and topological analysis. (A) PPI network of differentially expressed genes. Red stands for upregulated genes, while green stands for downregulated genes. (B) Topological analysis of each node showing the genes in the context of biological interactions derived from public pathway databases. Dots with black borders represent significant genes in the PPI network. The arrows represent direction in signaling pathways. PPI, protein-protein interaction.
Verification of the association of 6 hub genes with prognosis for patients with PTC
A total of 493 patients with PTC and 58 normal thyroid samples were identified in the TCGA-THCA dataset. To validate the reliability of the 6 significant genes identified in the GEO database, the expression level of these genes was assessed in the TCGA database by including 493 PTC samples and 58 normal thyroid tissues. As shown in Fig. 4, the expression trends of these genes were in line with previous GEO analysis (Fig. 1). To identify the association between the 6 hub genes and prognosis for patients with PTC, samples from the TCGA database were further categorized into high and low expression groups. As shown in Fig. 5, the expression levels of 4 genes, APOE, COL1A1, LGALS3 and TIMP1, were significantly associated with OS. As shown in Fig. 6, all 6 significant genes were associated with DFS in patients with PTC. Among these genes, APOE had the most significant association with OS (P=0.00067) and DFS (P=0.00220), suggesting that APOE may play a critical role in PTC progression and is significantly associated with prognosis.
Figure 4.
Relative mRNA expression level of hub genes in the TCGA cohort. To compare the expression of (A) APOE, (B) HBA1, (C) AGTR1, (D) COL1A1, (E) LGALS3 and (F) TIMP1 genes in papillary thyroid carcinoma (red) and normal tissues (blue). Statistical differences were analyzed using the Wilcoxon test. ****P<0.0001. APOE, apolipoprotein E; HBA1, hemoglobin subunit α1; AGTR1, angiotensin II receptor 1; COL1A1, collagen type I α1; LGALS3, galectin 3; TIMP1, TIMP metallopeptidase inhibitor 1; TPM, transcripts per kilobase million.
Figure 5.
Kaplan-Meier survival analysis of the association between hub gene expression level and OS in patients with PTC. Differences in OS in PTC patients with high and low expression of (A) APOE, (B) HBA1, (C) AGTR1, (D) COL1A1, (E) LGALS3 and (F) TIMP1. PTC, papillary thyroid carcinoma; OS, overall survival; APOE, apolipoprotein E; HBA1, hemoglobin subunit α1; AGTR1, angiotensin II receptor 1; COL1A1, collagen type I subunit α1; LGALS3, galectin 3; TIMP1, TIMP metallopeptidase inhibitor 1.
Figure 6.
Kaplan-Meier survival analysis of the association between significant gene expression and DFS in patients with PTC. Differences in DFS in PTC patients with high and low expression of (A) APOE, (B) HBA1, (C) AGTR1, (D) COL1A1, (E) LGALS3 and (F) TIMP1. PTC, papillary thyroid carcinoma; DFS, disease-free survival; APOE, apolipoprotein E; HBA1, hemoglobin subunit α1; AGTR1, angiotensin II receptor 1; COL1A1, collagen type I α1; LGALS3, galectin 3; TIMP1, TIMP metallopeptidase inhibitor 1.
Association between APOE expression and clinicopathological features of patients with PTC
To further demonstrate the importance of APOE in PTC and patient prognosis, the association of APOE expression with clinicopathological features of patients with PTC was analyzed using the TCGA dataset. As presented in Table II, low expression of APOE was significantly associated with older age (P<0.001) and more advanced TNM stage (P<0.001). The results revealed that gender, lymph node involvement, distant metastasis and cancer multifocality were not significantly associated with APOE expression. Finally, univariate and multivariate Cox proportional hazard analyses were conducted. As shown in Table III, univariate analysis revealed that the mRNA expression of APOE (P=0.002) was closely associated with TNM stage (P=0.001), while multivariate analysis suggested that APOE could be an independent prognostic factor for patients with PTC.
Table II.
Association between the expression of APOE and the clinicopathological characteristics of patients with papillary thyroid carcinoma.
PTC is the most prevalent malignancy in the endocrine system, with a rapid increase in morbidity rate in past two decades (1). Despite the relatively good 5-year survival rate of patients with PTC, tumor recurrence and lymph node metastasis are common, which might impact the OS and DFS of patients with PTC (30,31).In the present study, DEGs between neoplastic and adjacent tissues of PTC were analyzed using the GEO database. Consequently, 134 upregulated and 106 downregulated DEGs were identified. GO and KEGG enrichment analysis, establishment of PPI network and network topological analysis were utilized to identify genes potentially related to the initiation and development of PTC. It can be hypothesized that most of the DEGs identified in the present study were enriched in multiple cancer initiation and development-related pathological processes and signaling pathways based on GO and KEGG enrichment analysis from previous research (32–37). Recent evidence from network biology suggests that genes and proteins do not function in isolation. Instead, they function in interconnected pathways and molecular networks on multiple levels (38). In a previous study the properties and behavior of biological molecules were also identified using the PPI network, suggesting this is an important tool (39).In total, 6 significant hub genes were identified among the DEGs: APOE, HBA1, AGTR1, COL1A1, LGALS3 and TIMP1. Of these 6 genes, HBA1, AGTR1, COL1A1 and TIMP1 have previously been reported to participate in the pathogenesis, development, malignant transformation and pathological process of PTC, and they have been significantly associated with patient survival and prognosis (40–43). These previous studies have confirmed the reliability and accuracy of the bioinformatic mining results in the present study.APOE is a well-known apolipoprotein that functions in lipoprotein-mediated lipid transport between organs via the plasma and interstitial fluids (44). As a secreted protein, APOE has been implicated in lipoprotein metabolism and the pathogenesis of Alzheimer's disease and atherosclerosis (45–47). However, its association with tumors has rarely been reported. In recent years, an additional role of APOE has been identified in the pathogenesis of tumor metastasis (48,49). APOE can inhibit the invasiveness of melanoma cells and the recruitment of endothelial cells, potentially functioning as a vital barrier to metastatic colonization (50,51). APOE has also been demonstrated to be involved in innate immune modulation, which could be a target to enhance the efficacy of cancer immunotherapy in patients (52). Moreover, the abnormal expression of APOE in lung, colorectal and gastric cancer has been reported (53–56); however, the association between APOE and PTC has not been investigated and should be the basis of future research.At present, the association of the expression of numerous genes with the survival of patients with PTC is increasingly being mined. The majority of present studies are validated in animal tumor models, in vitro cell models or small-scale clinical studies (57,58). However, large-scale studies with more comprehensive analysis and larger cohorts are required due to the complexity of PTC. Fortunately, the rapid development of genome-wide sequencing has led to the development of publicly-available high-throughput tumor databases such as GEO and TCGA, which have made it possible to conduct bioinformatic analysis of large datasets of patients with PTC.
Authors: M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock Journal: Nat Genet Date: 2000-05 Impact factor: 38.330
Authors: Huiling He; Krystian Jazdzewski; Wei Li; Sandya Liyanarachchi; Rebecca Nagy; Stefano Volinia; George A Calin; Chang-Gong Liu; Kaarle Franssila; Saul Suster; Richard T Kloos; Carlo M Croce; Albert de la Chapelle Journal: Proc Natl Acad Sci U S A Date: 2005-12-19 Impact factor: 11.205