Literature DB >> 31966069

Integrated bioinformatics analysis to screen hub genes in the lymph node metastasis of thyroid cancer.

Si Lu1, Rongjie Zhao2, Jie Shen3, Yu Zhang4, Jingjing Shi4, Chenke Xu5, Jiali Chen2, Renbin Lin6, Weidong Han2, Dingcun Luo4.   

Abstract

Thyroid cancer (TC) is one of the most common types of malignancy of the endocrine-system. At present, there is a lack of effective methods to predict neck lymph node metastasis (LNM) in TC. The present study compared the expression profiles from The Cancer Genome Atlas between N1M0 and N0M0 subgroups in each T1-4 stages TC in order to identify the four groups of TC LNM-associated differentially expressed genes (DEGs). Subsequently, DEGs were combined to obtain a total of 493 integrated DEGs by using the method of Robust Rank Aggregation. Furthermore, the underlying mechanisms of LNM were investigated. The results from Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses demonstrated that the identified DEGs may promote LNM via numerous pathways, including extracellular matrix-receptor interaction, PI3K-AKT signaling pathway and focal adhesion. Following construction of a protein-protein interaction network, the significance score for each gene was calculated and seven hub genes were screened, including interleukin 6, actinin α2, collagen type I α 1 chain, actin α1, calbindin 2, thrombospondin 1 and parathyroid hormone. These genes were predicted to serve crucial roles in TC with LNM. The results from the present study could therefore improve the understanding of LNM in TC. In addition, the seven DEGs identified may be considered as potential novel targets for the development of biomarkers that could be used in the diagnosis and therapy of TC. Copyright: © Lu et al.

Entities:  

Keywords:  Robust Rank Aggregation; bioinformatics analysis; hub gene; lymph node metastasis; thyroid cancer

Year:  2019        PMID: 31966069      PMCID: PMC6956406          DOI: 10.3892/ol.2019.11188

Source DB:  PubMed          Journal:  Oncol Lett        ISSN: 1792-1074            Impact factor:   2.967


Introduction

Thyroid cancer (TC) is a common endocrine malignancy for all age groups worldwide and affects three times as many women as men (1). In the past decade, the incidence of TC has rapidly increased in numerous countries, particularly the USA, Croatia, Korea and China (2). In 2018, an estimated 562,000 patients with TC were newly diagnosed in 185 countries (3). The most common pathological type of TC is papillary thyroid carcinoma (PTC), which accounts for >90% of all cases (4,5). Although the 10-year survival rate of TC is 90% (6,7), lymph node metastasis (LNM) occurs in 24–63% of patients with TC and is considered as an independent risk factor for recurrence and distant metastases (6–11). However, to the best of our knowledge, the fundamental pathophysiology of LNM in TC has not been completely elucidated, which is a major obstacle for clinical diagnosis and treatment. It is therefore crucial to identify the key genes associated with LNM in TC. In the last decade, microarray technology has been widely applied to identify TC-associated genes, including those involved in LNM. For example, high expression levels of Cbp/P300 interacting transactivator with Glu/Asp rich carboxy-terminal domain 1 and γ-aminobutyric acid type A receptor β 2 have been identified as promoters of PTC migration (12,13). However, to obtain a more accurate result, the present study performed a subgroup analysis according to T stage, including T1N0M0 vs. T1N1M0, T2N0M0 vs. T2N1M0, T3N0M0 vs. T3N1M0 and T4N0M0 vs. T4N1M0 comparisons, by using the novel stable analysis method Robust Rank Aggregation. The present study used the expression profiling data from The Cancer Genome Atlas (TCGA) database (14) to identify the DEGs between N1M0 and N0M0 in each individual T1-4 stage of TC. Combined with the results of Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and protein-protein interaction (PPI) network analyses, the hub genes and pathways involved in TC with LNM were identified.

Materials and methods

Identification of DEGs

The present study analyzed the RNA-sequencing (RNA-seq) data from TC and normal tissues that were downloaded from the publicly available TCGA database (https://cancergenome.nih.gov). The RNA-seq data were from 568 samples, including 510 TC samples and 58 normal samples. A total of 276 TC samples with T1-4N0-1M0 information were selected. N0 and M0 represent patients with TC without regional nodes and distant metastasis, respectively. Similarly, N1 and M1 represent patients with TC with regional nodes and distant metastasis, respectively (14). According to different T and N status, the samples were divided into 8 subgroups, including T1N0M0 vs. T1N1M0, T2N0M0 vs. T2N1M0, T3N0M0 vs. T3N1M0 and T4N0M0 vs. T4N1M0 (Fig. 1), and made differentially expressed analyses using ‘edgeR’ package (15). Genes that met the conditions of log2 fold change [log fold change (FC)] ≥0.4 and P<0.05 were considered as DEGs. Subsequently, the variable DEGs lists of the different groups were integrated using probabilistic models and a significance score for each gene was calculated using RobustRankAggreg (RRA) package (16) in order to identify the integrated DEGs (iDEGs). Due to the limited size of gene symbol and figure, only the top 40 genes were presented. In addition to the RRA method, Venn diagram was used to determine an intersection among genes in Bioinformatics and Evolutionary Genomics (http://bioinformatics.psb.ugent.be/webtools/Venn/).
Figure 1.

Study flowchart. DEG, differentially expressed gene; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; PPI, protein-protein interaction.

GO and KEGG enrichment analysis of the DEGs

GO (http://geneontology.org; release 2019-10-07) terms were used to describe the biological process (BP) of genes. The iDEGs list was uploaded to the Database for Annotation, Visualization and Integrated Discovery (https://david.ncifcrf.gov/) (17) for GO enrichment analysis. The results were visualized in the GOplot package (18). KEGG (https://www.genome.jp/kegg; release 92.0), which is a comprehensive knowledge database, can be used for functional interpretation and application of genomic information. In the present study, KEGG pathway analysis was performed using the clusterProfiler package (19). P<0.05 was considered to indicate a statistically significant difference.

Construction and assessment of a PPI network

The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database (https://string-db.org/) is an online server used for evaluating the interactions among proteins, including 9,643,763 proteins from 2,031 organisms (20). To assess the interactions among DEGs, DEGs were mapped into STRING in order to construct a PPI network. Only the interactions with a combined score >0.4 were considered as significant. The PPI network was visualized using Cytoscape software (version 3.5.1) (21). The weight of each gene was ranked using the cytoHubba plug-in (22). The top 30 genes identified in the 12 topological algorithms were used for interaction analysis. Only genes that overlapped in at least nine algorithms were identified as hub genes. In addition, overall survival analysis for all the seven hub genes (IL6, ACTN2, COL1A1, ACTA1, CALB2, THBS1 and PTH) was performed in 276 patients with TC using Kaplan Meier method and the statistical significance was evaluated using the log-rank test. Median expression level was defined as the cut-off value, above this value was defined as high expression, and below this value was defined as low expression.

Analysis of hub genes

To verify the value of the seven aforementioned hub genes, multivariable logistic regression analysis was used to construct a predictive model of LNM in TC and the different T stages. The response variable of this model was the status of lymph node metastasis, with 1 representing LNM and 0 representing LNM, whereas predictive variables were the expression levels of seven hub genes. The prediction efficiency of this model was assessed by determining the area under curve (AUC) of receiver operating characteristic (ROC) curves using the pROC package (23).

Results

Overview of the expression levels of the DEGs

In order o identify gene signatures present during the development of TC, the present study compared the expression profiles between TC, LNM and non-metastasis samples in T1-4 disease stages. Based on the criteria of P<0.05 and logFC ≥0.4, the present study identified 2,369 DEGs (1,057 upregulated and 1,312 downregulated) in T1N0M0 vs. T1N1M0, 2,581 DEGs (968 upregulated and 1,613 downregulated) in T2N0M0 vs. T2N1M0, 1,394 DEGs (872 upregulated and 522 downregulated) in T3N0M0 vs. T3N1M0 and 1,281 DEGs (889 upregulated and 392 downregulated) in T4N0M0 vs. T4N1M0 (Fig. 2). According to the significance score for each gene, a total of 493 iDEGs were eventually identified, including 246 upregulated genes and 247 downregulated genes. Among these iDEGs, details of the top 20 upregulated (Table I) and downregulated genes (Table II) are presented (Fig. 3). In addition, by using the Venn diagram method, six upregulated genes and 24 downregulated genes were identified (Fig. S1A and B). These genes have all been contained in RRA result (Fig. S1C and D). These results indicated that LNM process in TC may involve the interaction of multiple genes.
Figure 2.

Volcano plots of DEGs obtained from the T1-T4 subgroups. (A) DEGs between T1N0M0 and T1N1M0. (B) DEGs between T2N0M0 and T2N1M0. (C) DEGs between T3N0M0 and T3N1M0. (D) DEGs between T4N0M0 and T4N1M0. Red indicates upregulation in LNM and green indicates downregulation in LNM. DEGs, differentially expressed genes; FC, fold change; LNM, lymph node metastasis.

Table I.

Top 20 upregulated integrated differentially expressed genes identified using the robust rank aggregation method.

Gene symbolLogFCP-value
MYH24.96499808.87×107
MYL14.21576003.55×106
IL36G2.17863124.56×106
XIRP23.86840341.42×105
NTRK13.23820821.50×105
SLN3.06145002.43×105
DSG33.15333123.24×105
HP2.24610894.03×105
PLEKHS12.50551464.38×105
TCN13.10835328.73×105
KTR6B3.84723059.31×105
SFTPA13.30750950.000105
CTAG22.80399480.000127
MYH13.03192290.000127
MYL22.98436670.000173
CKM2.85260710.000199
ACTA12.66476530.000226
SERPINB32.35972340.000229
LUZP21.88857310.000236
CSRP33.20793340.000255

FC, fold change.

Table II.

Top 20 downregulated integrated differentially expressed genes identified using the robust rank aggregation method.

Gene symbolLogFCP-value
CALCA−5.19639563.58×10−10
MT1G−3.09754834.64×10−7
SOX14−3.56522971.42×10−5
MT1H−2.85512292.43×10−5
ARSF−3.18732802.43×10−5
AC009014.1−2.50093192.53×10−5
LMNTD1−1.91693324.03×10−5
MTRNR2L1−2.50389584.03×10−5
GATA5−1.93205124.38×10−5
PAX1−2.13914955.13×10−5
CHGA−2.39579415.66×10−5
LGALS13−1.75032458.85×10−5
TRPV6−1.65449890.000106
ZNF536−2.59531150.000137
OGDHL−1.36206720.000214
MT31.25919340.000220
NR2E1−2.57903650.000255
KIF1A−2.25693540.000286
KCNK16−1.60637560.000293
HHATL−1.68166660.000299

FC, fold change.

Figure 3.

Top 20 upregulated iDEGs and top 20 downregulated iDEGs. Red indicates upregulation and green indicates downregulation in LNM. Dimensions and extrathyroidal extension of tumor determine the classification of T status. iDEGs, integrated differentially expressed genes; T, tumor (14).

Enrichment analysis of iDEGs

In order to further investigate the potential mechanisms of LNM in TC, GO functional annotation and KEGG pathway enrichment analyses were performed. As presented in Fig. 4A, numerous biological processes were identified, including ‘cell adhesion’, ‘positive regulation of cell migration’, ‘cellular response to tumor necrosis factor’ and ‘positive regulation of MAPK cascade’. The results from KEGG pathway enrichment analyses demonstrated that the DEGs were predominantly involved in the pathways ‘PI3K-Akt signaling pathway’, ‘protein digestion and absorption’, ‘focal adhesion’ and ‘ECM-receptor interaction’ (Fig. 4B). These results indicated that the iDEGs were associated with LNM and require further investigation.
Figure 4.

Results of GO enrichment analysis and KEGG pathway analysis. (A) GO enrichment analysis. Different colors on the right side of the image represent different GO terms, whereas the left side of the image represents the expression levels of the genes. Lines represent the associations between genes and corresponding biological processes. (B) KEGG pathway analysis. Size of the points represents the count of genes enriched in each pathway. Colors of the point represent the P-value. FC, fold change; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes.

PPI network and hub genes identification

A PPI network was constructed using the STRING database. The network consisted of 988 edges and 483 nodes (Fig. 5A). According to the significance score of each node, seven hub genes were identified using the cytoHubba plug-in. These genes included the upregulated genes interleukin 6 (IL6), actinin α2 (ACTN2), collagen type I α 1 chain (COL1A1), actin α1 (ACTA1), calbindin 2 (CALB2) and thrombospondin 1 (THBS1), and the downregulated gene parathyroid hormone (PTH; Fig. 5B).
Figure 5.

PPI network for integrated differentially expressed genes and hub genes analyses. (A) PPI network. Circles represent proteins and lines represent interactions between proteins. The thickness of the line represents the significance of the interaction. (B) List of the hub genes. ACTA1, actin α1; ACTN2, actinin α2; CALB2, calbindin 2; COL1A1, collagen type I α 1 chain; FC, fold change; IL-6, interleukin 6; PPI, protein-protein interaction; PTH, parathyroid hormone; THBS1, thrombospondin 1.

To validate the predictive power of the seven identified hub genes, ROC curve analysis for each T-stage subgroup and total ROC curve analysis for all samples were performed (due to the small sample size of the T4 subgroup, the ROC curve could not be presented). The AUC values for each subgroup were 0.651 (T1), 0.694 (T2), 0.676 (T3) and 0.652 (all samples; Fig. 6). These results indicated that the seven hub genes may serve crucial role in the prediction of LNM in TC. In addition, among these hub genes, high expression of ACTN2 indicated a worse prognosis for patients with TC (Fig. S2). To the best of our knowledge, the present study was the first to investigate the prognosis value of hub genes in TC.
Figure 6.

ROC curves for the seven hub genes to predict LNM in TC obtained from the T1-T3 subgroups and all-samples group. (A) ROC curve for predicting LNM of TC had an AUC of 0.651 in the T1 group. (B) ROC curve for predicting LNM of TC had an AUC of 0.694 in the T2 group. (C) ROC curve for predicting LNM of TC had an AUC of 0.676 in the T3 group. (D) ROC curve for predicting LNM of TC had an AUC of 0.652 in the all-samples group. AUC, area under the curve; LNM, lymph node metastasis; ROC, receiver operating characteristic; TC, thyroid cancer.

Discussion

In the present study, 493 iDEGs were identified from four subgroups of TC using the RRA method. Compared with the results from the Venn diagram, all genes were included in RRA result. These iDEGs were predominantly enriched in the pathways ‘PI3K-Akt signaling pathway’, ‘focal adhesion’, ‘ECM-receptor interaction’ and the ‘calcium signaling pathway’ (24–26). A total of 7 hub genes were identified from the PPI network, which was composed of 493 iDEGs. ROC curve analysis revealed that the combination of these 7 hub genes was a good predictive model for LNM in TC. Among the identified hub genes, IL6, COL1A1 and THBS1 have been reported to participate in the process of LNM. For example, CXCR7 may induce COL1A1 upregulation and subsequently accelerate the LNM of PTC via the PI3K/AKT pathway (27). Furthermore, COL1A1 expression level is higher in poorly differentiated thyroid cancer (PDTC) and anaplastic thyroid cancer (ATC) compared with PTC (10). Both PDTC and ATC are known as invasive pathological types of TC (10). Similarly, the present study demonstrated that COL1A1 expression was increased in TC tissues from the LNM group. In addition, the results from enrichment analysis suggested that COL1A1 may promote LNM in TC through ‘ECM-receptor interaction’, ‘focal adhesion’, ‘PI3K-Akt signaling pathway’ and ‘positive regulation of cell migration’. The role of THBS1 in tumor metastasis remains unclear. THBS1 is a multifunctional molecule that can affect biological activities both positively and negatively. Duquette et al (28) reported that THBS1 high expression could promote ATC cell metastasis; however, THBS1 knockdown can decrease the levels of thrombospondin 1, integrin (ITG)α3, ITGα6 and ITGβ1 in the tumor microenvironment, thus inhibiting ATC cell migration. In other types of cancer, including breast cancer, gastric cancer and melanoma, it was demonstrated that the occurrence of LNM is significantly higher in THBS1-positive samples compared with THBS1-negative samples, which was meditated by the PI3K/Akt/mTOR signaling pathway and epithelial-to-mesenchymal transition (EMT) (29–32). In the present study, THBS1 was also highly expressed and involved in cell adhesion and positive regulation of cell migration, according to the GO analysis. However, it has been reported that importin-11 overexpression could promote the invasion, migration and progression of bladder cancer via the upregulation of THBS1 (33–35). Heterogeneity among the types of cancer may explain these differences. In addition, IL-6 overexpression can accelerate LNM in cervical cancer, colorectal cancer and lung cancer by stimulating cell proliferation, migration or EMT (36–38). This is consistent with the results from the present study, which demonstrated that IL-6 was upregulated and enriched in the PI3K-Akt signaling pathway, indicating that IL-6 may promote LNM progression in TC. In addition, ACTN2 downregulation was demonstrated to cause pancreatic endocrine neoplasms metastasis in IGF-signaling cascade, leading to cell separation and migration (39,40). The other hub genes identified in the present study, including ACTA1, CALB2 and PTH, may serve important role in TC with LNM; however, this has not yet been reported. The results from the present bioinformatics analyses require further experimental confirmation. In summary, the hub genes identified in this study may exhibit crucial functions in LNM of TC via numerous mechanisms, including ECM-receptor interaction, focal adhesion and PI3K-Akt signaling pathway. The present study provided novel insights on the prediction of LNM in TC, and may serve at determining the appropriate extent of surgical resection needed to decrease recurrence rate.
  39 in total

1.  Cytoscape: a software environment for integrated models of biomolecular interaction networks.

Authors:  Paul Shannon; Andrew Markiel; Owen Ozier; Nitin S Baliga; Jonathan T Wang; Daniel Ramage; Nada Amin; Benno Schwikowski; Trey Ideker
Journal:  Genome Res       Date:  2003-11       Impact factor: 9.043

2.  Risk factors for neck nodal metastasis in papillary thyroid microcarcinoma: a study of 1066 patients.

Authors:  Ling Zhang; Wen-Jun Wei; Qing-Hai Ji; Yong-Xue Zhu; Zhuo-Ying Wang; Yu Wang; Cai-Ping Huang; Qiang Shen; Duan-Shu Li; Yi Wu
Journal:  J Clin Endocrinol Metab       Date:  2012-02-08       Impact factor: 5.958

3.  Thrombospondin-1 is part of a Slug-independent motility and metastatic program in cutaneous melanoma, in association with VEGFR-1 and FGF-2.

Authors:  Patrizia Borsotti; Carmen Ghilardi; Paola Ostano; Antonietta Silini; Romina Dossi; Denise Pinessi; Chiara Foglieni; Maria Scatolini; Pedro M Lacal; Raffaele Ferrari; Davide Moscatelli; Fabio Sangalli; Stefania D'Atri; Raffaella Giavazzi; Maria Rosa Bani; Giovanna Chiorino; Giulia Taraboletti
Journal:  Pigment Cell Melanoma Res       Date:  2014-10-13       Impact factor: 4.693

4.  Trends in Thyroid Cancer Incidence and Mortality in the United States, 1974-2013.

Authors:  Hyeyeun Lim; Susan S Devesa; Julie A Sosa; David Check; Cari M Kitahara
Journal:  JAMA       Date:  2017-04-04       Impact factor: 56.272

5.  Interleukin 6 receptor (IL-6R) was an independent prognostic factor in cervical cancer.

Authors:  Shaohong Luan; Zhijie An; Shuna Bi; Long Chen; Jun Fan
Journal:  Histol Histopathol       Date:  2017-07-25       Impact factor: 2.303

6.  Current thyroid cancer trends in the United States.

Authors:  Louise Davies; H Gilbert Welch
Journal:  JAMA Otolaryngol Head Neck Surg       Date:  2014-04       Impact factor: 6.223

7.  Interleukin-6 stimulates aerobic glycolysis by regulating PFKFB3 at early stage of colorectal cancer.

Authors:  Jun Han; Qingyang Meng; Qiulei Xi; Yongxian Zhang; Qiulin Zhuang; Yusong Han; Yi Jiang; Qiurong Ding; Guohao Wu
Journal:  Int J Oncol       Date:  2015-11-02       Impact factor: 5.650

8.  Robust rank aggregation for gene list integration and meta-analysis.

Authors:  Raivo Kolde; Sven Laur; Priit Adler; Jaak Vilo
Journal:  Bioinformatics       Date:  2012-01-12       Impact factor: 6.937

9.  Synergistic activity of everolimus and 5-aza-2'-deoxycytidine in medullary thyroid carcinoma cell lines.

Authors:  Giovanni Vitale; Alessandra Dicitore; Daniele Pepe; Davide Gentilini; Elisa S Grassi; Maria O Borghi; Giulia Gelmini; Maria C Cantone; Germano Gaudenzi; Gabriella Misso; Anna M Di Blasio; Leo J Hofland; Michele Caraglia; Luca Persani
Journal:  Mol Oncol       Date:  2017-06-21       Impact factor: 6.603

10.  Thrombospondin-1 Silencing Down-Regulates Integrin Expression Levels in Human Anaplastic Thyroid Cancer Cells with BRAF(V600E): New Insights in the Host Tissue Adaptation and Homeostasis of Tumor Microenvironment.

Authors:  Mark Duquette; Peter M Sadow; Jack Lawler; Carmelo Nucera
Journal:  Front Endocrinol (Lausanne)       Date:  2013-12-02       Impact factor: 5.555

View more
  1 in total

Review 1.  Molecular and Metabolic Reprogramming: Pulling the Strings Toward Tumor Metastasis.

Authors:  Ana Hipólito; Filipa Martins; Cindy Mendes; Filipa Lopes-Coelho; Jacinta Serpa
Journal:  Front Oncol       Date:  2021-06-03       Impact factor: 6.244

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.