Hao Bo1, Ke Cao2, Ruiling Tang1, Han Zhang1, Zhaojian Gong3, Zhizhong Liu4, Jianye Liu5, Jingjing Li6, Liqing Fan1,7. 1. Institute of Reproductive and Stem Cell Engineering, School of Basic Medical Science, Central South University, Changsha, China. 2. Department of Oncology, Third Xiangya Hospital, Central South University, Changsha, Hunan, China. 3. Department of Oral and Maxillofacial Surgery, The Second Xiangya Hospital, Central South University, Changsha, Hunan, China. 4. Hunan Cancer Hospital and The Affliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, Hunan, China. 5. Department of Urology, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China. 6. Department of Plastic Surgery of Third Xiangya Hospital, Central South University, Changsha, China. 7. Reproductive & Genetic Hospital of CITIC-Xiangya, Changsha, China.
Abstract
Background: Testicular germ cell tumors (TGCT) is the most common testicular malignancy threaten young male reproductive health. This study aimed to identify aberrantly methylated-differentially expressed genes and pathways in TGCT by comprehensive bioinformatics analysis. Methods: Data of gene expression microarrays (GSE3218, GSE18155) and gene methylation microarrays (GSE72444) were collected from GEO database. Integrated analysis acquired aberrantly methylated-genes. Functional and pathway enrichment analysis were performed using DAVID database. Protein-protein interaction (PPI) network was constructed by STRING and App Mcode was used for module analysis. GEPIA platform and DiseaseMeth database were used for confirming the expression and methylation levels of hub genes. Finally, Human Protein Atlas database was performed to evaluate the prognostic significance. Results: Totally 604 hypomethylation-high expression and 147 hypermethylation-low genes were identified. The high expressed genes were enriched in biological processes of cell proliferation and migration. The top 8 hub genes of PPI network were GAPDH, VEGFA, PTPRC, RIPK4, MMP9, CSF1R, KRAS and FN1. After validation in GEPIA platform, all hub genes were elevated in TGCT tissues. Only MMP9, CSF1R and PTPRC showed hypomethylation-high expression status, which predicted the poor outcome of TGCT patients. Conclusion: Our study indicated possible aberrantly methylated-differentially expressed genes and pathways in TGCT by bioinformatics analysis, which may provide novel insights for unraveling pathogenesis of TGCT.
Background: Testicular germ cell tumors (TGCT) is the most common testicular malignancy threaten young male reproductive health. This study aimed to identify aberrantly methylated-differentially expressed genes and pathways in TGCT by comprehensive bioinformatics analysis. Methods: Data of gene expression microarrays (GSE3218, GSE18155) and gene methylation microarrays (GSE72444) were collected from GEO database. Integrated analysis acquired aberrantly methylated-genes. Functional and pathway enrichment analysis were performed using DAVID database. Protein-protein interaction (PPI) network was constructed by STRING and App Mcode was used for module analysis. GEPIA platform and DiseaseMeth database were used for confirming the expression and methylation levels of hub genes. Finally, Human Protein Atlas database was performed to evaluate the prognostic significance. Results: Totally 604 hypomethylation-high expression and 147 hypermethylation-low genes were identified. The high expressed genes were enriched in biological processes of cell proliferation and migration. The top 8 hub genes of PPI network were GAPDH, VEGFA, PTPRC, RIPK4, MMP9, CSF1R, KRAS and FN1. After validation in GEPIA platform, all hub genes were elevated in TGCT tissues. Only MMP9, CSF1R and PTPRC showed hypomethylation-high expression status, which predicted the poor outcome of TGCT patients. Conclusion: Our study indicated possible aberrantly methylated-differentially expressed genes and pathways in TGCT by bioinformatics analysis, which may provide novel insights for unraveling pathogenesis of TGCT.
Testicular germ cell tumors (TGCT) is the most common malignancy among young men between 17 and 45 years of age, which comprises around 98% of all testicular neoplasms 1. The incidence rate of TGCT has been increasing over the last decades in large parts of the world 2. TGCT are divided into seminomas (SE), non-seminomas (non-SE) and mixed tumors according histologic features. Currently, platinum chemotherapy and surgery are the main treatment for TGCT, but about 15% of the patients still result in poor prognosis 3.And the embarrassing is that little has been known about the etiology of TGCT up to now. According to the statistics, approximately 25% of genetic factors is believed to contribute to TGCT susceptibility 4. A gain of chromosome arm 12p is identified as highest effect and the most frequent event detected in TGCT 5. But beyond that, only Kit and Ras gene family have been implicated repeatedly in different TGCT 6. Therefore, identification of key differently expressed genes may provide a further understanding of the genetic etiology and molecular mechanism in TGCT occurrence and development.DNA methylation is reported to lead to gene silencing via inhibiting gene transcription, which has become the most studied epigenetic mechanism 7. Accumulated studies have showed that aberrant methylation of gene promoter regions is associated with occurrence, progression, and metastasis of several tumors, including ovarian carcinoma 8, gastric cancer 9 and so on. The profile of gene promoter methylation in TGCT was reported by several studies 10-12. In this study, we focused on identifying the DNA methylation in TGCT from GEO online database. Integrated analysis and protein-protein interaction network were performed to explore the functions of abnormal expressed genes. Then the levels of mRNA and DNA methylation of top 8 hub genes were confirmed by the data from Cancer Genome Atlas (TCGA) or DiseaseMeth database, respectively. These identified genes may be used as potential biomarkers or therapeutic targets for TGCT.
Materials and methods
Microarray data information and gene identification
The gene express profile of GSE3218 (6 normal cases and 101 tumor cases) 13 and GSE18155 (7 normal cases and 26 tumor cases) 14, and gene methylation datasets of GSE72444 (9 normal cases and 3 tumor cases) were obtained from NCBI GEO database (GEO, http://www.ncbi.nlm.nih.gov/geo/). Limma package (http://www.bioconductor.org/packages/release/bioc/html/limma.html) in R (https://www.r-project.org/) was used to identify differentially expressed genes (DEGs) between normal testis cases and TGCT cases. Aberrantly methylated-DEGs were identified by GEO2R online software. The screening criteria of DEGs was defined with |t| > 2, P < 0.01, while which of differentially DNA methylation was defined with |t| > 8, P < 0.01. For integrated analysis, the gene microarray and DNA methylation data were processed in Bioinformatics & Evolutionary Genomics (Available online: http://bioinformatics.psb.ugent.be/webtools/Venn/), which were presented by Venn diagrams.
GO and KEGG pathway enrichment analysis
The annotation, visualization and integrated discovery (DAVID) database is a web-accessible program (https://david.ncifcrf.gov/), which provides an integrated analysis to reveal biological meaning behind given gene list. Candidate genes' functions and pathways enrichment were analyzed by using DAVID. The enrichment graphs were drawn by E Chart online software (http://www.ehbio.com/ImageGP/index.php/Home/Index/index.html). And the top 10 of the results were presented. The size of each circle indicates the counting number on each part, while the color represents the P-value of the enrichment analysis.
Integration of protein-protein interaction (PPI) network construction and modular analysis
The online database Retrieval of Interacting Genes (STRING, http://string-db.org) was used to evaluate proteins and protein-protein interaction information, the detail methods were used as previous report 15. Interaction combined score ≥ 0.4 was regarded as significant. The results of significant interactions were imported into Cytoscape plugin to create network visualizations. After finishing PPI network, App Mcode was performed to make module analysis with the default parameters. (Degree cut off ≥ 8, Node score cut off ≥ 30 and Max depth = 100). Hub genes were defined as the genes with the top 8 degree.
Analysis of hub genes expression in TGCT tissue samples
The expression levels of top 8 hub genes in TGCT tissues and normal samples were validated by GEPIA (Gene Expression Profiling Interactive Analysis) platform, which is a free online database (http://gepia.cancer-pku.cn/). Search strategy was as previous report 16. In brief, the difference in expression levels of each candidate gene between TGCT and normal testicular tissues from Genotype-Tissue Expression (GTEx) project were assessed by the linear model and the empirical Bayes method implemented by the R package limma, with adjusted P-value (Benjamini and Hochberg FDR). |log2FC| > 0.5 and P < 0.01 were considered to indicate a statistically significant difference.
Methylation and prognostic analysis of hub genes in TGCT tissue samples
The data of DNA methylation level of these top 8 hub genes were obtained from DiseaseMeth database (http://bio-bigdata.hrbmu.edu.cn/diseasemeth/), and analyzed as described previously 17. The difference in DNA methylation between TGCT tissues and normal tissues were compared by Student's t-test. The survival curves of each hub gene were online drawn using the Human Protein Atlas database (http://www.proteinatlas.org/) 18.
Results
Identification of abnormal DNA methylated‑differentially expressed genes in TGCT
In order to screen the different expression of genes related to the development of TGCT, a total of two gene mRNA expression datasets (GSE3218 and GSE18155) and one DNA methylation dataset (GSE72444) were downloaded for integrated analysis from GEO database. As shown in Fig. 1A and B, 901 overlapping high-regulated genes and 1098 overlapping low-regulated genes were identified in the GSE3218, GSE18155 and GSE72444 datasets, respectively. For integrated analysis, totally 604 hypomethylation-up expression genes and 147 hypermethylation-down expression genes were screened from these three datasets.
Fig 1
The integrated analysis of abnormal DNA methylation genes in TGCT. The gene counting numbers were summarized in gene expression datasets (GSE3218 blue and GSE18155 red) and gene methylation dataset (GSE72444 green), respectively. Left panel (A) was represented the hypomethylation and up-regulated genes, while right panel (B) represented the hypermethylation and down-regulated genes. These two graphs were drawn by Bioinformatics & Evolutionary Genomics.
Gene function and pathways enrichment analysis
604 hypomethylation and up-regulated genes and 147 hypermethylation and down-regulated genes were respectively imported into online biological classification tool DAVID for functional analysis. GO term enrichment analysis was composed of three aspects, including biological process, cellular component, and molecular function. For biological process (Fig. 2A), the hypomethylation-upregulated genes were enriched in cell migration, osteoblast differentiation, cell adhesion, cell proliferation, ERK1 and ERK2 cascade, cell shape, innate immune response and apoptosis. The hypermethylation-downregulated genes were enriched in spermatid development, autophagy, cell cycle, starvation, fatty acid catabolic process, protein ubiquitination, membrane depolarization, phosphatidylethanolamine biosynthetic process and proteasome-mediated ubiquitin-dependent protein catabolic process. GO cell component analysis showed that the upregulated genes were significantly enriched in the extracellular exosome, focal adhesion, endoplasmic reticulum, cell surface, plasma membrane, while downregulated genes were enriched in nucleus, centriole, lysosome (Fig. 2B). Besides, the results showed that hypomethylation-upregulated genes were associated with the molecular function, including calcium ion binding, GTP binding, extracellular matrix structural constituent, GTPase activator activity (Fig. 2C). The hypermethylation-downregulated genes were mostly enriched in molecular function of Zinc ion binding and ubiquitin-protein transferase activity.
Fig 2
GO and KEGG pathway enrichment analysis. GO analysis covered three groups, including biological process (A), cellular component (B) and molecular function (C). (D) The most significantly enriched molecular pathways of the hypomethylation-upregulated and hypermethylation-downregulated genes were figured out by KEGG analysis. The size of each circle indicates the counting number on each part, while the color represents the P-value of the enrichment analysis.
To further understand the functions of such identified genes, these overlapping genes were mapped into DAVID for KEGG pathway enrichment analysis (Fig. 2D). The most significantly molecular pathways of the hypomethylation-upregulated genes were enriched in lysosome, focal adhesion, ECM-receptor interaction, microRNA in cancer, proteoglycans in cancer, PI3K-Akt signaling pathway, chemokine signaling pathway, Rap 1 signaling pathway and regulation of actin cytoskeleton. In addition, the hypermethylation-downregulated genes were enriched in five KEGG pathways, which including autophagy, glycerophospholipid metabolism, mTOR signaling pathway, metabolic pathway, and aldosterone synthesis and secretion.
PPI networks and module analysis
The overlapping abnormal DNA methylated‑differentially expressed genes indicated a distinct set of interactions and networks, these identified genes were imported into STRING database to construct PPI networks. The analysis results were then put into Cytoscape software for creating network visualizations. The PPI network graph was shown in Fig. 3A, which included 638 nodes and 3382 edges. The most significant 8 node with higher degrees were identified as hub genes, including glyceraldehyde-3-phosphate dehydrogenase (GAPDH), vascular endothelial growth factor A (VEGFA), protein tyrosine phosphatase, receptor type C (PTPRC), receptor interacting serine/threonine kinase 4 (RIPK4), matrix metallopeptidase 9 (MMP9), colony stimulating factor 1 receptor (CSF1R), KRAS proto-oncogene, GTPase (KRAS), fibronectin 1 (FN1).
Fig 3
Protein-protein interaction (PPI) network complex and top three modules of the hypomethylation-upregulated and hypermethylation-downregulated genes. (A) The PPI networks. The size of node indicated the degree value. Yellow nodes represented hub genes. The thickness of edges represented correlation degree. (B-D) Module 1 consists of 30 nodes, module 2 consists of 35 nodes and module 3 consists of 58 nodes.
According the degree of importance, 3 significant modules (Fig. 3B-D) from PPI network were further analyzed by App Mcode. The KEGG pathway enrichment analysis of the top 3 module genes using DAVID showed enrichment in the cell cycle, cell proliferation, pathways in cancer, all of which have a close relationship with tumor biology (supplemental table).
The mRNA expression levels of 8 hub genes in TGCT
To confirm the expression levels of identified 8 hub genes, we collected the related published data from TCGA datasets and analyzed by GEPIA platform. As expected, the results revealed that all hub genes had higher levels in tumor tissues than that in normal tissues (Fig. 4), suggesting these 8 genes might be candidate genes affected by aberrant methylation during TGCT development.
Fig 4
The mRNA expression levels of 8 hub genes in TGCT. The published online data of gene mRNA expression level were analyzed by GEPIA platform. These 8 hub genes were all up-regulated in the TGCT tissues comparing with the normal tissues, including GAPDH, VEGFA, PTPRC, RIPK4, MMP9, CSF1R, KRAS, FN1. *P<0.05.
Methylation analysis of 8 hub genes in TGCT
To further verify the methylation levels of GAPDH, VEGFA, PTPRC, RIPK4, MMP9, CSF1R, KRAS, and FN1, the data of DNA methylation status in TGCT was obtained from DiseaseMeth database. We found that DNA hypomethylation in the promoters of MMP9, CSF1R, and PTPRC, which might be one of the main causes of aberrant high expression in TGCT. Unexpectedly, for KRAS and RIPK4, there was no significant difference in DNA methylation value between TGCT and normal tissues. Inversely, GAPDH, VEGFA and FN1 even showed aberrant promoters hypermethylation in TGCT. The results were summarized in Fig. 5.
Fig 5
Methylation analysis of 8 hub genes in TGCT. The promoters of MMP9, CSF1R and PTPRC were hypomethylated. No obvious change was observed in methylation levels of KRAS and RIPK4, whereas GAPDH, VEGFA and FN1 showed gene hypermethylation in TGCT.
The prognostic significance of hub genes in TGCT
Previous studies have confirmed that MMP9, CSF1R, and PTPRC were up-regulated in TGCT with hypomethylation, suggesting these three genes may act as tumor promoters and involve in TGCT procession. To estimate the prognostic significance of abnormal expressed MMP9, CSF1R, and PTPRC, the survival time and gene expression levels were acquired from the Human Protein Atlas database (https://www.proteinatlas.org/). Kaplan-Meier method was used to evaluate the survival time data of each identified gene. The analysis results showed that higher levels of MMP9, CSF1R, and PTPRC were related to shorter survival time (Fig. 6), these three key genes may provide potential makers for TGCT diagnosis and prognosis evaluation.
Fig 6
The prognostic significance of hub genes in TGCT. Higher levels of MMP9, CSF1R and PTPRC were associated with poorer overall survival time, respectively.
Discussion
Testicular germ cell tumor (TGCT) is one of the most common malignancies among young men in large parts of the world 19, 20. According to the statistics, white Caucasian populations in industrialized countries usually have the highest incidence of TGCT 21. However, the reality is that the genetic etiology and pathogenicity related genes for TGCT are a lot of unknowns. Therefore, revealing the underlying molecular mechanism of TGCT is a key approach to greatly benefit the diagnosis and treatment. Fortunately, the emerging microarray and next-generation sequencing technology help us gain deeper insight into the thousands or millions of human genome sequences concurrently 22-30. As a consequence, these related technologies have been adopted extensively to screen the potential diagnostic indicator and treatment target points for multiple diseases, such as cancer, and metabolic disease 31-37.In present study, we analyzed two gene mRNA expression profiles (GSE3218 and GSE18155) and one DNA methylation dataset (GSE72444) of human TGCT tissues from the available GEO database and identified 604 hypomethylation-high expression genes and 147 hypermethylation-low expression genes. These identified abnormal DNA methylated-differentially expressed genes were imported into online biological classification tool DAVID for further functional analysis. The hypomethylation-upregulated genes were enriched in biological processes of cell migration and proliferation. GO cell component analysis showed that the upregulated genes were significantly enriched in the extracellular exosome, focal adhesion, endoplasmic reticulum, cell surface 38-42. Besides, for molecular function, the hypomethylation-upregulated genes were significantly enriched in calcium ion binding, GTP binding, extracellular matrix structural constituent, GTPase activator activity 43. It is reasonable because high capacity of cell migration and proliferation are two apparent hallmarks of tumor cells including TGCT 21, which promote tumor development. GO cell component analysis suggested that these genes are mainly located in membranaceous structures of cell and organelle. Endoplasmic reticulum stress has been known to create a profound effect on cancer cell proliferation and survival, which also increase injury of spermatogonia and decrease of spermatocytes 44. Notably, KEGG pathway enrichment analysis suggested significant enrichment in pathways including ECM-receptor interaction, microRNA in cancer 45-48 and PI3K-Akt signaling pathway. The previous studies reported that PI3K-Akt signaling activation is associated with tumor cell growth, proliferation, migration and angiogenesis of TGCT 49-52. It is well known that increased cell proliferation and migration are the leading causes for malignant tumor initiation and progression 53-55, thus our biological analysis results were consistent with previous researches.PPI network of hypomethylation-high expression genes illustrated the protein-protein interactome of the top 8 hub genes, named GAPDH, VEGFA, PTPRC, RIPK4, MMP9, CSF1R, KRAS, and FN1, which may provide a new sight for the therapeutic strategy in TGCT by constructing the PPI network. Module analysis of the PPI network revealed that the development of TGCT was associated with cell cycle, cell proliferation and pathways in cancer, which was consistent with our pathway analysis. Seven of the eight hub genes are reported to directly participate in the identified pathways, except for RIPK4. Additionally, our previous study found that upregulation of RIPK4 increases the expression of VEGFA and contributes to the progression of bladder urothelial carcinoma 56.To further confirm the top 8 hub genes expression levels in TGCT tissues, the related published data from TCGA datasets were collected and analyzed by GEPIA platform. As expected, all hub genes had higher levels in tumor tissues than that in normal tissues. However, whether DNA hypomethylation contribute to the abnormal high expression status was still unclear. Then, we analyzed the methylation data of TGCT tissues acquired from DiseaseMeth database. Only MMP9, CSF1R, and PTPRC were confirmed with hypomethylation. No obvious change was observed in methylation level of KRAS, RIPK4, whereas VEGFA, GAPDH and FN1 showed gene hypermethylation in TGCT. These differences reflect that some other reasons may affect the expression level of these five genes. Except DNA methylation, histone modification is also an important way of regulating of gene expression and associated with transcriptional activity 57, 58. Enrichment for histone H3 Lys 27 acetylation (H3K27Ac) has been found in the DNA regions of VEGFA, GAPDH and FN1 as confirmed by UCSC Genome Browser (https://genome.ucsc.edu/), which may cause a significant increase of mRNA levels of VEGFA, GAPDH and FN1 when their promoters are hypermethylated. In our future work, we will focus on this issue to deeper insight into the gene regulatory mechanism.MMP9, CSF1R, and PTPRC, are hypomethylation-high expressed genes, suggesting they may involve in TGCT procession. The Kaplan-Meier analysis results showed that higher levels of MMP9, CSF1R, and PTPRC were related to shorter survival time of TGCT patients. MMP9 is a class of enzymes involved in the degradation of the extracellular matrix, which was up-regulated in tumor tissue samples of multi cancer 59-61, such as TGCT 62, breast cancer 63, lung cancer 64. Moreover, MMP9 overexpression was significantly associated with poor survival of nasopharyngeal carcinoma 65 and glioblastoma 66. CSF1R is a cell-surface protein, while PTPRC is a member of the protein tyrosine phosphatase (PTP) family 67, 68. High expression levels of CSF1R were associated with higher lung cancer-specific mortality 69. CSF1R and PTPRC are known to regulate a variety of cellular processes including cell growth, differentiation, and oncogenic transformation 70. The present study is the first to identify the prognostic significance of these three genes, which may serve as valuable prognostic indicators of TGCT.In conclusion, with the microarray gene expression profiling and gene methylation datasets, this study provides a network-based approach to identify abnormal DNA methylated-differentially expressed genes, which may play important roles in initiation and progression of TGCT. Hub genes including PTPRC, MMP9 and CSF1R might serve as potential biomarkers for diagnosis, treatment and prognosis evaluation of TGCT in the future.
Authors: Roger D Palmer; Matthew J Murray; Harpreet K Saini; Stijn van Dongen; Cei Abreu-Goodger; Balaji Muralidhar; Mark R Pett; Claire M Thornton; James C Nicholson; Anton J Enright; Nicholas Coleman Journal: Cancer Res Date: 2010-03-23 Impact factor: 12.701
Authors: Victoria M Chia; Sabah M Quraishi; Susan S Devesa; Mark P Purdue; Michael B Cook; Katherine A McGlynn Journal: Cancer Epidemiol Biomarkers Prev Date: 2010-05 Impact factor: 4.254
Authors: Anja Lorch; Jörg Beyer; Caroline Bascoul-Mollevi; Andrew Kramar; Lawrence H Einhorn; Andrea Necchi; Christophe Massard; Ugo De Giorgi; Aude Fléchon; Kim A Margolin; Jean-Pierre Lotz; Jose Ramon Germa Lluch; Thomas Powles; Christian K Kollmannsberger Journal: J Clin Oncol Date: 2010-10-18 Impact factor: 44.544
Authors: Georges J Netto; Yasutomo Nakai; Masashi Nakayama; Sana Jadallah; Antoun Toubaji; Norio Nonomura; Roula Albadine; Jessica L Hicks; Jonathan I Epstein; Srinivasan Yegnasubramanian; William G Nelson; Angelo M De Marzo Journal: Mod Pathol Date: 2008-07-11 Impact factor: 7.842
Authors: Peter A Kanetsky; Nandita Mitra; Saran Vardhanabhuti; Mingyao Li; David J Vaughn; Richard Letrero; Stephanie L Ciosek; David R Doody; Lauren M Smith; Joellen Weaver; Anthony Albano; Chu Chen; Jacqueline R Starr; Daniel J Rader; Andrew K Godwin; Muredach P Reilly; Hakon Hakonarson; Stephen M Schwartz; Katherine L Nathanson Journal: Nat Genet Date: 2009-05-31 Impact factor: 38.330
Authors: M Brait; L Maldonado; S Begum; M Loyo; D Wehle; F F Tavora; L H J Looijenga; J Kowalski; Z Zhang; E Rosenbaum; S Halachmi; G J Netto; M O Hoque Journal: Br J Cancer Date: 2011-11-08 Impact factor: 7.640