Literature DB >> 34908870

Potential Novel Modules and Hub Genes as Prognostic Candidates of Thyroid Cancer by Weighted Gene Co-Expression Network Analysis.

Zhiqiang Shi1, Xinghui Li2, Long Zhang1, Yilang Luo1, Bikal Shrestha3, Xuegang Hu1.   

Abstract

BACKGROUND: Although thyroid cancer (THCA) is one of the most common type of endocrine malignancy, its highly complex molecular mechanisms of carcinogenesis are not completely known.
MATERIALS AND METHODS: In this study, weighted gene co-expression network analysis (WGCNA) was utilized to construct gene co-expression networks and evaluate the relations between modules and clinical traits to identify potential prognostic biomarkers for THCA patients. RNA-seq data and clinical data were downloaded from The Cancer Genome Atlas (TCGA). Other independent datasets from the Gene Expression Omnibus (GEO) database and the Human Protein Atlas database were performed to validate findings.
RESULTS: Finally, 11 co-expression modules were constructed and four hub genes, CCDC146, SLC4A4, TDRD9 and MUM1L1, were identified and validated statistically, which were considerably interrelated to worse survival of THCA patients.
CONCLUSION: This research study revealed four hub genes may be considered candidate prognostic biomarkers and potential therapeutic targets for THCA patients in the future.
© 2021 Shi et al.

Entities:  

Keywords:  THCA; WGCNA; hub gene; prognosis; thyroid cancer; weighted gene co-expression network analysis

Year:  2021        PMID: 34908870      PMCID: PMC8665846          DOI: 10.2147/IJGM.S329128

Source DB:  PubMed          Journal:  Int J Gen Med        ISSN: 1178-7074


Introduction

Thyroid cancer (THCA) is one of the most common malignant tumors in human endocrine system and head and neck.1,2 The most common pathological type of thyroid carcinoma is papillary thyroid carcinoma (PTC), accounting for about 80% of the total number of THCA.3–5 Most thyroid cancers have good prognosis, the 5-year survival rate is more than 95%.6,7 Although the incidence rate of THCA is increasing year by year, the molecular biological mechanism of thyroid carcinogenesis and development is not clear.8,9 At present, the gold standard of preoperative diagnosis of THCA is fine needle aspiration biopsy (FNAB), but 20–30% of FNAB results are uncertain or suspicious, and these patients need diagnostic surgery to identify the characteristics of tumors.10,11 The application of molecular markers is expected to help FNAB improve the ability of preoperative diagnosis of THCA. Weighted gene co expression network analysis (WGCNA) is considered to be an effective network-based method, which can highlight the co-expressed gene modules and study the relationship between gene modules and phenotypes more effectively.12–14 WGCNA has been successfully applied to explore the functional co expression modules and central genes of different diseases, such as pancreatic cancer,15 breast cancer16 and oral squamous cell carcinoma.17 In this study, we used WGCNA and other analytical methods to explore RNA data and clinical information of patients with thyroid tumor. Finally, four hub genes (CCDC146, SLC4A4, TDRD9 and MUM1L1) related to prognosis and transcription level were identified and verified, which showed good diagnostic potential and clinical relevance, and could be used as molecular markers in clinical diagnosis, treatment and prognosis of THCA.

Materials and Methods

Data Acquisition and Data Preprocessing

The RNA‐seq expression data and related clinical traits of THCA were obtained from TCGA database () and GEO database (). A total of 568 patient samples were obtained from TCGA data, including 568 samples, 510 THCA samples and 58 normal samples. Data preprocessing were used to process the raw data for perform background correction and quantile normalization, including robust multi-array average (RMA) background correction and the “affy” R package. The false discovery rate (FDR) <0.05 and log2FC≥ 2 were used as the cut-off value to screen differentially expressed genes, which laid the foundation for further construction of co-expression network.

Weighted Gene Co-Expression Network Construction

Co-expression modules are gene sets with high topological overlap similarity. The “WGCNA” package of R software was used to construct gene co-expression network of differentially expressed genes.11,12 This analysis procedure can identify highly related genes of differentially expressed genes, and genes with the same pathway or function can be clustered in similar gene modules. The cut-off value of co-expression module was set as P < 0.05. In order to further explore the dissimilarity of gene modules and visualize them, we select a cutting line for the module dendrogram and merge a few modules.

Construction of Module-Trait Relationships

In order to explore the gene modules related to clinical features of THCA, the correlation between phenotype and module eigengenes was calculated, and the significance of each gene module was evaluated.18 The gene modules significantly related to clinical features (P < 0.05) were selected for further study.

Enrichment Analysis of Key Co-Expression Modules

To further explore potential function of the key modules, Gene Ontology (GO) term analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis were conducted to be described and visualized (DAVID, ).19–21 The significance level was set as p-value <0.01 and FDR <0.05.

Identification and Validation of Hub Genes

Hub genes usually have important biological functions and are highly associated with other nodes of the module. The module membership of each gene is calculated, and the module membership value of the hub gene is higher. To verify the reliability of the hub gene, GSE33630, GSE29265, GSE6004 and the Human Protein Atlas database ()22 were used for further validation. Kaplan-Meier plotter was used for survival analysis.

Results

Co-Expression Module Construction and Identification of Key Modules

Based on the differential analysis, a total of 3712 genes from TCGA for co-expression analysis were calculated. We also excluded cases with incomplete clinical information. The Pearson’s correlation coefficient was used to cluster the sample. We draw a sample clustering tree after removing outliers (Figure 1A). Moreover, we selected the power of β = 18 as the soft‐thresholding (Figures 1B and C). Finally, 11 modules were screened out based on average hierarchical clustering and dynamic tree clipping (Figure 2). As shown in Figure 3A, the interaction between the 11 co-expression modules indicates that each gene module is independently verified in the network. Cyan module and grey module were highly correlated with sample type by Pearson’s correlation analysis (Figure 3B). Therefore, these two modules were selected as clinically important modules for further analysis. In addition, the eigengene dendrogram and heatmap plotted were drawn to explore groups of related eigengenes and the dendrogram of all modules (Figures 3C and D).
Figure 1

Clustering of samples and determination of soft-thresholding power. (A)The clustering was based on the expression data of TCGA, which contained 568 samples, 510 THCA and 58 normal samples. The color intensity was proportional to sample type (normal and THCA), sex, age and disease status. (B) analysis of the scale-free fit index for various soft-thresholding powers (β). (C) Analysis of the mean connectivity for various soft-thresholding powers.

Figure 2

Construction of co-expression modules by WGCNA package in R. (A) The cluster dendrogram of module eigengenes. (B) The cluster dendrogram of genes in TCGA. Each branch in the figure represents one gene, and every color below represents one co-expression module.

Figure 3

Identification of Key Modules. (A) Interaction relationship analysis of co-expression genes. Different colors of horizontal axis and vertical axis represent different modules. (B) Heatmap of the correlation between module eigengenes and the sample type of THCA. (C) Hierarchical clustering of module hub genes that summarize the modules yielded in the clustering analysis. (D) Heatmap plot of the adjacencies in the hub gene network.

Clustering of samples and determination of soft-thresholding power. (A)The clustering was based on the expression data of TCGA, which contained 568 samples, 510 THCA and 58 normal samples. The color intensity was proportional to sample type (normal and THCA), sex, age and disease status. (B) analysis of the scale-free fit index for various soft-thresholding powers (β). (C) Analysis of the mean connectivity for various soft-thresholding powers. Construction of co-expression modules by WGCNA package in R. (A) The cluster dendrogram of module eigengenes. (B) The cluster dendrogram of genes in TCGA. Each branch in the figure represents one gene, and every color below represents one co-expression module. Identification of Key Modules. (A) Interaction relationship analysis of co-expression genes. Different colors of horizontal axis and vertical axis represent different modules. (B) Heatmap of the correlation between module eigengenes and the sample type of THCA. (C) Hierarchical clustering of module hub genes that summarize the modules yielded in the clustering analysis. (D) Heatmap plot of the adjacencies in the hub gene network.

Functional Enrichment Analysis of Key Modules

Moreover, GO and KEGG analysis were conducted for the two key co-expression modules. The results show that the MEcyan module was involved in the important biological functions and signaling pathways related to tumorigenesis and development, such as protein binding, thyroid hormone synthesis, autophagy-animal, Insulin resistance, cell proliferation (Figures 4A and B). In the MEgrey module, functions are mainly enriched in transcriptional misregulation in cancer, cell adhesion molecules (CAMs), complexity and coagulation cascades, and ECM-receptor interaction and signal transduction (Figures 4C and D).
Figure 4

Plot of the enriched GO and KEGG terms in two key co-expression modules. (A) GO enrichment analysis of MEcyan module. (B) KEGG pathway enrichment analysis of MEcyan module. (C) GO enrichment analysis of MEgrey module. (D) KEGG pathway enrichment analysis of MEgrey module.

Plot of the enriched GO and KEGG terms in two key co-expression modules. (A) GO enrichment analysis of MEcyan module. (B) KEGG pathway enrichment analysis of MEcyan module. (C) GO enrichment analysis of MEgrey module. (D) KEGG pathway enrichment analysis of MEgrey module. We screened the Mecyan module and MEgrey module as candidate prognosis genes. Through the Log rank test (p < 0.05) for further overall survival analysis, 4 hub genes (CCDC146, SLC4A4, TDRD9 and MUM1L1) were identified (Figure 5). Kaplan Meier survival curve of overall survival analysis showed that THCA patients with low expression levels of 4 hub genes had poor prognosis. Then, GSE33630, GSE29265, GSE6004 and the Human Protein Atlas database were used to validate the expression status of the 4 hub genes. As shown in Figures 6A and B, the volcano map and expression heat map of the differential RNAs in GSE33630 (45 normal samples and 60 THCA). The common genes between differential RNAs and the MEcyan and MEgrey module were identified by overlapping them in GSE33630 were presented in Figures 6C and D. The results showed that 4 hub genes in two key modules were also differential RNAs in GEO33630. Moreover, the transcriptional level of hub genes were verified in GSE29265 (Figure 7) and GSE6004 (Figure 8). In addition, the translational level of 4 hub genes also were verified by the human protein atlas database (IHC) (Figure 9).
Figure 5

Survival analysis of 4 hub genes based on the Kaplan-Meier plotter. The patients were stratified into high- and low- expression groups according to the median expression. (A) CCDC146. (B) SLC4A4. (C) TDRD9. (D) MUM1L1.

Figure 6

Validation of hub genes in GSE33630. (A) Volcano plot visualizing DEGs in GSE33630 (45 normal samples and 60 THCA). Fold Change=2, adj P=0.05. (B) heatmap hierarchical clustering reveals DEGs in cancer groups compared with those in control groups. (C) Identification of common genes between DEGs and the MEcyan module by overlapping them. The two hub genes in the MEcyan module were also DEGs in GSE33630 (D) Identification of common genes between DEGs and the MEgrey module by overlapping them. The two hub genes in the MEgrey module were also DEGs in GSE33630.

Figure 7

Validation of 4 hub genes in the transcriptional level. (A–D) Validation of hub genes in GSE29265.(*P < 0.05, **P < 0.01, ***P < 0.001).

Figure 8

Validation of 4 hub genes in the transcriptional level. (A–D) Validation of hub genes in GSE6004 (****P < 0.0001).

Figure 9

Validation of 4 hub genes in the translational level. (A–D) Validation of 4 hub genes by The Human Protein Atlas database (IHC).

Survival analysis of 4 hub genes based on the Kaplan-Meier plotter. The patients were stratified into high- and low- expression groups according to the median expression. (A) CCDC146. (B) SLC4A4. (C) TDRD9. (D) MUM1L1. Validation of hub genes in GSE33630. (A) Volcano plot visualizing DEGs in GSE33630 (45 normal samples and 60 THCA). Fold Change=2, adj P=0.05. (B) heatmap hierarchical clustering reveals DEGs in cancer groups compared with those in control groups. (C) Identification of common genes between DEGs and the MEcyan module by overlapping them. The two hub genes in the MEcyan module were also DEGs in GSE33630 (D) Identification of common genes between DEGs and the MEgrey module by overlapping them. The two hub genes in the MEgrey module were also DEGs in GSE33630. Validation of 4 hub genes in the transcriptional level. (A–D) Validation of hub genes in GSE29265.(*P < 0.05, **P < 0.01, ***P < 0.001). Validation of 4 hub genes in the transcriptional level. (A–D) Validation of hub genes in GSE6004 (****P < 0.0001). Validation of 4 hub genes in the translational level. (A–D) Validation of 4 hub genes by The Human Protein Atlas database (IHC).

Discussion

THCA is a rare malignant tumor, accounting for less than 1% of human malignant tumors.1–3 However, it is the most common cancer in the endocrine system and the cause of death for most endocrine cancers.4,5 The occurrence and development of THCA is a multifactorial disease process, involving a variety of molecular mechanisms.2 At present, many published studies mainly focus on the molecular mechanism of single gene in THCA, but ignore the interaction between genes due to its limitations.23–26 Due to the development of big data, gene network is used to analyze the origin and development of various cancers. Therefore, in order to further explore novel and accurate molecular biomarkers for prognosis, we use RNA sequencing data and clinical information from TCGA and GEO databases to explore and verify potential key modules and hub genes through bioinformatics analysis of WGCNA. In this study, we screened 2 key modules (MEcyan module and MEgrey module) from TCGA dataset by WGCNA analysis. 4 hub genes (CCDC146, SLC4A4, TDRD9 and MUM1L1) were then further screened and verified using the GEO database and survival analysis. Considering the functional and pathway enrichment analysis, the two key co-expression modules were significantly enriched in thyroid hormone synthesis, autophagy-animal, cell proliferation, transcriptional misregulation in cancer, cell adhesion molecules (CAMs), and ECM-receptor interaction and signal transduction. At the same time, we also found that these significantly expressed functional annotations and signaling pathways have been reported in THCA and many other cancers.27–30 At present, studies on these five hub genes have been reported, and a large number of studies have shown that their expression plays an important role in the occurrence, development and prognosis of many tumors. It has been found that SLC4A4 contributes to the occurrence and development of tumors, and its involvement in tumor biological processes is specific.31 It is reported that mir-223-3p promotes tumor cell proliferation and metastasis by reducing the expression of SLC4A4 in renal clear cells.32 Gerber et al. Confirmed that the low expression of SLC4A4 in thyroid carcinoma and its diagnostic value.31 SLC4A4 has been reported to be associated with poor prognosis in patients with colon adenocarcinoma. The low expression of SLC4A4 is associated with lymph node invasion and distant metastasis of colon adenocarcinoma. At the same time, SLC4A4 expression is associated with the invasion of immune cells in colon adenocarcinoma. It may be a biomarker and therapeutic target for the diagnosis and prognosis of colon adenocarcinoma.33 Another study have reported that downregulation of expression in TDRD9-positive cell lines causes a decrease in cell proliferation, S-phase cell cycle arrest, and apoptosis, which can be used as a marker for prognosis and as a potential therapeutic target in a subset of lung carcinomas.34 More importantly, one study by Wang et al suggested that TDRD9 was significantly related to the prognosis of THCA. CCDC146 was a potential therapeutic strategy for lymph node metastasis of breast cancer.35 MUM1L1 has not been previously reported to be associated with cancer. The results show that the occurrence and development of tumor may be regulated by multiple genes, which may provide more research strategies for the diagnosis and treatment of THCA. There are two deficiencies in this study. First, the results were not verified in clinical samples. Considering the high reliability of high-throughput sequencing expression data and the sufficient number of samples included in the study, this deficiency can be made up to a certain extent, but it can not completely replace the significance of clinical sample verification. Second, the selected molecules have no functional validation. Although the signal pathway of differential genes was analyzed by KEGG pathway in this study, the newly discovered molecules that have not been reported should be functional verified.

Conclusion

In summary, based on the TCGA database, we analyzed the gene expression profile of THCA and successfully identified four hub genes associated with THCA prognosis, which showed good diagnostic potential and clinical relevance as molecular markers for clinical diagnosis, treatment and prognosis of THCA.
  35 in total

1.  DAVID: Database for Annotation, Visualization, and Integrated Discovery.

Authors:  Glynn Dennis; Brad T Sherman; Douglas A Hosack; Jun Yang; Wei Gao; H Clifford Lane; Richard A Lempicki
Journal:  Genome Biol       Date:  2003-04-03       Impact factor: 13.583

2.  A general framework for weighted gene co-expression network analysis.

Authors:  Bin Zhang; Steve Horvath
Journal:  Stat Appl Genet Mol Biol       Date:  2005-08-12

3.  Weighted gene correlation network analysis identifies RSAD2, HERC5, and CCL8 as prognostic candidates for breast cancer.

Authors:  Jianing Tang; Qian Yang; Qiuxia Cui; Dan Zhang; Deguang Kong; Xing Liao; Jiangbo Ren; Yan Gong; Gaosong Wu
Journal:  J Cell Physiol       Date:  2019-06-21       Impact factor: 6.384

4.  Weighted gene co-expression network analysis reveals key genes involved in pancreatic ductal adenocarcinoma development.

Authors:  Matteo Giulietti; Giulia Occhipinti; Giovanni Principato; Francesco Piva
Journal:  Cell Oncol (Dordr)       Date:  2016-05-30       Impact factor: 6.730

5.  Thyroid cancer in the USA: current trends and outstanding questions.

Authors:  Louise Davies; Jenny K Hoang
Journal:  Lancet Diabetes Endocrinol       Date:  2020-11-19       Impact factor: 32.069

6.  Current thyroid cancer trends in the United States.

Authors:  Louise Davies; H Gilbert Welch
Journal:  JAMA Otolaryngol Head Neck Surg       Date:  2014-04       Impact factor: 6.223

7.  Development and validation of a novel survival model for head and neck squamous cell carcinoma based on autophagy-related genes.

Authors:  Ziying Ren; Long Zhang; Wei Ding; Yilang Luo; Zhiqiang Shi; Bikal Shrestha; Xuan Kan; Zhuhua Zhang; Jing Ding; Haojie He; Xuegang Hu
Journal:  Genomics       Date:  2020-11-20       Impact factor: 5.736

8.  H2A.Z acetylation by lincZNF337-AS1 via KAT5 implicated in the transcriptional misregulation in cancer signaling pathway in hepatocellular carcinoma.

Authors:  Yin Yuan; Wen Cao; Hongbing Zhou; Haixin Qian; Honggang Wang
Journal:  Cell Death Dis       Date:  2021-06-12       Impact factor: 8.469

9.  WGCNA: an R package for weighted correlation network analysis.

Authors:  Peter Langfelder; Steve Horvath
Journal:  BMC Bioinformatics       Date:  2008-12-29       Impact factor: 3.169

10.  Ten hub genes associated with progression and prognosis of pancreatic carcinoma identified by co-expression analysis.

Authors:  Zhou Zhou; Yian Cheng; Yinan Jiang; Shi Liu; Meng Zhang; Jing Liu; Qiu Zhao
Journal:  Int J Biol Sci       Date:  2018-01-12       Impact factor: 6.580

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.