Zeng-Hong Wu1,2, Tao Zhou1, Hai-Ying Sun1,3. 1. Department of Otorhinolaryngology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China. 2. Department of Infectious Diseases, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China. 3. Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, Stanford, CA.
Abstract
Nasopharyngeal carcinoma (NPC) is a common malignant tumor with a remarkable racial and geographical distribution, occurring mainly among people in southern China, South East Asia, and the Middle East/North Africa. DNA methylation, an important form of epigenetic modification, has been studied for several decades; by regulating the expression of cancer-related genes, abnormal DNA methylation can influence a variety of human malignant tumors. Until now, no analysis has focused on differentially methylated, differentially expressed genes (MDEGs) in NPC, so we performed a joint analysis of gene methylation profiling microarray and gene expression profiling microarray data. Two gene expression datasets (GSE64634 and GSE12452) and one gene methylation profiling dataset (GSE62336) were downloaded from GEO and analyzed using the online tool GEO2R to identify MDEGs. Gene ontology (GO) functional analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of the differentially methylated genes were performed. The STRING database was used to evaluate the interactions of MDEGs, and a protein-protein interaction (PPI) network was constructed using Cytoscape software. Hub genes were validated with the cBioPortal database. The overlap among the 3 datasets contained 135 hypermethylated genes and 541 hypomethylated genes between NPC and non-NPC samples. A total of 4 genes (TROAP, PCOLCE2, HOXA4, and C1QB) among the hypermethylated, lowly expressed genes (Hyper-LGs) and 14 genes (DYNC1H1, LNX1, RAB37, ALDH3A1, SLC24A4, CP, CEP250, ANK2, DNAI2, MUC13, ACACB, GABRP, STX7, and TTC9) among the hypomethylated, highly expressed genes (Hypo-HGs) were identified as hub genes. The joint study of DNA methylation and gene expression provides new, comprehensive information on MDEGs that supports the elucidation of the complex pathogenesis of NPC. However, further studies are needed to elucidate the biological function of these genes in NPC.
Nasopharyngeal carcinoma (NPC) is a common malignant tumor with a remarkable racial and geographical distribution, occurring mainly among people in southern China, South East Asia, and the Middle East/North Africa. The main clinical manifestations are nasal congestion, blood stasis, ear blockage, hearing loss, visual disturbance, and headaches, among others. It is primarily a malignant tumor derived from the nasopharyngeal epithelium, located in the upper part and on the side wall of the nasopharyngeal cavity, with a strong tendency to metastasize. Diagnosing the disease at an early stage requires a high index of clinical suspicion, and confirmation depends on histology. The potential risk factors for NPC include Epstein–Barr virus (EBV) infection, alcohol consumption, exposure to dust and formaldehyde, genetic factors, and cigarette smoking. Studies have reported that ingestion of salted fish or other preserved foods is one of the most important causes of NPC. Other studies found EBV infection in 90% to 100% of NPC cases in endemic regions. EBV is associated with multiple types of human cancer, such as Burkitt lymphoma and Hodgkin disease, whereas in Asia it is closely associated with NPC.
DNA methylation, an important form of epigenetic modification, has been studied for several decades; by regulating the expression of cancer-related genes, abnormal DNA methylation can influence a variety of human malignant tumors. Recent epigenetic studies indicate that methylation can be used as a diagnostic biomarker and a potential target for treatment. Hypomethylation activates transcription of genes, whereas hypermethylation usually inhibits it. CpG islands lie in or near promoter regions of the genome, so aberrantly methylated genes are often hypermethylated in CpG islands, which may affect chromatin structure and upregulate or downregulate gene expression. These changes and the malignant transformation of cells eventually lead to the formation of tumors. About one-quarter of methylation alterations are significantly related to changes in the expression of tumor genes. Zouridis et al reported that 78% of the identified methylation-expression correlations were negative, consistent with the role of DNA methylation in silencing local transcription. Therefore, the identification of differentially methylated, differentially expressed genes (MDEGs) will be of great significance in clarifying the pathogenesis of NPC and in screening biomarkers for diagnosis.
Many gene expression profiling analyses have been conducted to identify differentially expressed genes (DEGs), but the analysis of DEGs in isolation is limited. Until now, no analysis has focused on differentially methylated genes (DMGs) together with DEGs in NPC, so we performed a joint analysis of gene methylation profiling microarray and gene expression profiling microarray data. In this study, we used online bioinformatics resources to explore NPC-specific MDEGs. We identified NPC-related MDEGs, including hypomethylated, highly expressed genes (Hypo-HGs) and hypermethylated, lowly expressed genes (Hyper-LGs). Gene ontology and KEGG pathway analyses of the MDEGs help us understand their function, and we also explored the main hub nodes in protein–protein interaction (PPI) networks. Our results showed that the Hyper-LGs hub genes are TROAP, PCOLCE2, HOXA4, and C1QB, and the Hypo-HGs hub genes are DYNC1H1, LNX1, RAB37, ALDH3A1, SLC24A4, CP, CEP250, ANK2, DNAI2, MUC13, ACACB, GABRP, STX7, and TTC9. The analyses provide comprehensive biological information on MDEGs, which may help to promote the understanding of the development and progression of NPC.
Materials and methods
Data resources
GEO (http://www.ncbi.nlm.nih.gov/geo) is a public functional genomics data repository that includes high-throughput gene expression data, chips, and microarrays. Two gene expression datasets (GSE64634 and GSE12452) and one gene methylation profiling dataset (GSE62336) were downloaded from GEO (platforms: GPL13534, Illumina HumanMethylation450 BeadChip; and GPL570, Affymetrix Human Genome U133 Plus 2.0 Array). The GSE64634 dataset included 14 NPC samples and 4 normal samples; GSE12452 contained 31 NPC samples and 10 noncancerous samples; and GSE62336 contained 25 NPC tissue samples and 25 noncancerous samples. Ethical approval was not necessary for this study because it is a bioinformatic analysis of public data.
Identification of differentially expressed genes
DEGs and DMGs between NPC and noncancerous samples were identified using GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r) with the criteria of P < .05 and |t| > 2. GEO2R is an online tool that allows users to compare groups of samples in a GEO series to identify DEGs across experimental conditions. To limit false-positives, we used the Benjamini and Hochberg false discovery rate method. Finally, we obtained hypomethylation-high expression genes (Hypo-HGs) by overlapping the upregulated genes with the hypomethylated genes, and hypermethylation-low expression genes (Hyper-LGs) by overlapping the downregulated genes with the hypermethylated genes.
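The selection and overlap steps above can be sketched as follows. This is a minimal illustration, not the GEO2R implementation: the function names and the row layout (gene, t statistic, P value) are hypothetical, the thresholds are assumed to apply to the Benjamini-Hochberg adjusted P value, and a positive t is assumed to mean a higher value in NPC than in noncancerous samples.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_genes(rows, p_cut=0.05, t_cut=2.0):
    """rows: list of (gene, t_statistic, p_value).

    Returns (positive, negative): genes passing both cutoffs, split by
    the sign of t (assumed here to be NPC minus noncancerous).
    """
    adj = benjamini_hochberg([row[2] for row in rows])
    positive, negative = set(), set()
    for (gene, t, _), q in zip(rows, adj):
        if q < p_cut and abs(t) > t_cut:
            (positive if t > 0 else negative).add(gene)
    return positive, negative


def overlap_mdegs(expression_datasets, methylation_rows):
    """Intersect expression and methylation calls.

    expression_datasets: list of row lists, one per expression dataset.
    Returns (hypo_hgs, hyper_lgs).
    """
    calls = [select_genes(rows) for rows in expression_datasets]
    up = set.intersection(*(c[0] for c in calls))
    down = set.intersection(*(c[1] for c in calls))
    hyper, hypo = select_genes(methylation_rows)
    return up & hypo, down & hyper
```

Requiring a gene to pass in every expression dataset before intersecting with the methylation calls mirrors the three-way overlap described in the text; other combination rules would give different gene lists.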
KEGG and GO enrichment analyses of DMGs
Enrichr (https://amp.pharm.mssm.edu/Enrichr/) (version 6.8) is an online platform that integrates biological data and provides a comprehensive set of functional annotations of genes and proteins, allowing users to analyze functions or signaling pathways. The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a database resource for understanding high-level gene functions and linking genomic information from large-scale molecular datasets. Gene ontology (GO) function analysis (biological processes [BPs], cellular components [CCs], and molecular functions [MFs]) is a powerful bioinformatics tool for annotating genes and analyzing their biological roles. To analyze the function of the identified DMGs, GO enrichment and KEGG pathway analyses were performed via the Enrichr online database. P < .05 was considered statistically significant.
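To make the P < .05 enrichment cutoff concrete, the sketch below computes a one-sided over-representation P value from the hypergeometric distribution, which is a common basis for this kind of analysis. The source does not describe the statistics Enrichr applies, so this is an illustrative assumption; the function name and all four arguments are hypothetical.

```python
from math import comb


def enrichment_pvalue(n_overlap, n_query, n_term, n_background):
    """One-sided hypergeometric P value for over-representation.

    n_overlap:    query genes annotated to the term
    n_query:      size of the query gene list (e.g. the DMGs)
    n_term:       genes annotated to the term in the background
    n_background: total genes in the background
    Returns P(X >= n_overlap) when n_query genes are drawn at random.
    """
    upper = min(n_query, n_term)
    total = comb(n_background, n_query)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

A term would then be reported as enriched when this value falls below the chosen cutoff. When many terms are tested, a multiple-testing correction is usually applied as well.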
PPI network construction and module analysis
The Search Tool for the Retrieval of Interacting Genes (STRING; http://string-db.org) online database was used to predict the PPI network information. Analyzing the interactions and functions between DMGs may provide information about the mechanisms of generation and development of disease; interactions with a PPI score >0.4 were retained. Cytoscape (version 3.7.1) is a bioinformatics platform for constructing and visualizing molecular interaction networks. The Cytoscape plug-in Molecular Complex Detection (MCODE) was applied to detect densely connected regions in PPI networks. The PPI networks were constructed using Cytoscape, and the most significant module in each network was selected using MCODE. The criteria for selection were set as follows: max depth = 100, degree cut-off = 2, node score cut-off = 0.2, MCODE score >5, and K-core = 2.
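The edge filtering and hub selection can be sketched as below. This is a minimal illustration rather than the STRING or Cytoscape workflow: the function names and the edge tuple layout are hypothetical, scores are assumed to be on a 0 to 1 scale, and the default degree threshold of 10 follows the hub criterion reported in the Results. MCODE module detection is not reproduced here.

```python
from collections import defaultdict


def node_degrees(edges, min_score=0.4):
    """Count distinct interaction partners per gene.

    edges: iterable of (gene_a, gene_b, score), score assumed in [0, 1].
    Only edges with score > min_score are kept; self-loops are ignored.
    """
    neighbors = defaultdict(set)
    for gene_a, gene_b, score in edges:
        if score > min_score and gene_a != gene_b:
            neighbors[gene_a].add(gene_b)
            neighbors[gene_b].add(gene_a)
    return {gene: len(partners) for gene, partners in neighbors.items()}


def hub_genes(edges, min_score=0.4, min_degree=10):
    """Return genes whose degree in the filtered network is >= min_degree."""
    degrees = node_degrees(edges, min_score)
    return sorted(gene for gene, deg in degrees.items() if deg >= min_degree)
```

Storing partners in a set means a pair listed twice in the input is counted once, which keeps the degree equal to the number of distinct neighbors.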
Hub genes selection and analysis
A network of the hub genes and their co-expressed genes in NPC was analyzed using cBioPortal for Cancer Genomics (http://www.cbioportal.org/), an open-access resource for exploring and analyzing genetic alterations in samples from multidimensional studies. Genomic alterations in the selected TCGA datasets were analyzed in cBioPortal according to the online instructions.
Results
Identification of DMGs in NPC
After standardization of the microarray results, DEGs and DMGs were identified. The overlap among the 3 datasets contained 135 hypermethylated genes and 541 hypomethylated genes between NPC and non-NPC samples, as shown in the Venn diagram (Fig. 1).
Figure 1
Aberrantly methylated, differentially expressed genes identified in the datasets GSE64634 and GSE12452 (gene expression) and GSE62336 (gene methylation). (A) Highly expressed genes with low methylation. (B) Lowly expressed genes with high methylation.
GO enrichment and KEGG analyses of DMGs
To further investigate the biological functions and mechanisms of the DMGs, functional and pathway enrichment analyses were performed using the Enrichr tool. For the Hyper-LGs, GO analysis showed that the DMGs were significantly enriched in BPs such as sensory organ morphogenesis, polyol biosynthetic process, and oxygen homeostasis. For MF, they were mainly enriched in nucleoside kinase activity and DNA N-glycosylase activity, among others. For CC, they were mainly enriched in the axonal growth cone, microtubule plus-end, and ribonucleoprotein granule, among others. KEGG pathway analysis revealed that the DMGs were mainly enriched in protein digestion and absorption, ECM-receptor interaction, and the PI3K-Akt signaling pathway, among others (Fig. 2).
Figure 2
Gene ontology (GO) functional analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of the Hyper-LGs. (A) Biological processes (BP). (B) Molecular functions (MF). (C) Cellular components (CC). (D) KEGG pathway.
For the Hypo-HGs, GO analysis showed that the DMGs were significantly enriched in BPs such as cilium movement, flagellated sperm motility, and axonemal dynein complex assembly. For MF, they were mainly enriched in ATP-dependent microtubule motor activity, minus-end-directed, and carnitine transmembrane transporter activity, among others. For CC, they were mainly enriched in the dendrite, axonemal dynein complex, and outer dynein arm, among others. KEGG pathway analysis revealed that the DMGs were mainly enriched in drug metabolism-cytochrome P450, tyrosine metabolism, and phenylalanine metabolism, among others (Fig. 3).
Figure 3
Gene ontology (GO) functional analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of the Hypo-HGs. (A) Biological processes (BP). (B) Molecular functions (MF). (C) Cellular components (CC). (D) KEGG pathway.
To further explore the connection between Hyper-LGs and Hypo-HGs at the protein level, PPI networks were constructed based on the interactions of DMGs (Fig. 4), and the most significant module was obtained using Cytoscape (Fig. 5). A total of 157 interactions and 95 nodes in Hyper-LGs and a total of 964 interactions and 434 nodes in Hypo-HGs were screened to establish the PPI networks, and the biological functions of the genes in the most significant module were analyzed using the Enrichr tool. The KEGG and GO analysis results for the most significant module are shown in Tables 1 and 2.
Figure 4
The protein–protein interaction (PPI) network of differentially methylated genes was constructed using Cytoscape. (A) hypermethylated, lowly expressed genes (Hyper-LGs). (B) hypomethylated, highly expressed genes (Hypo-HGs).
Figure 5
The protein–protein interaction (PPI) network of the most significant module was obtained using Cytoscape. (A) hypermethylated, lowly expressed genes (Hyper-LGs). (B) hypomethylated, highly expressed genes (Hypo-HGs).
Table 1
GO and KEGG pathway enrichment analysis of Hyper-LGs in the most significant module.
Table 2
GO and KEGG pathway enrichment analysis of Hypo-HGs in the most significant module.
Hub gene selection and analysis
A total of 4 genes (TROAP, PCOLCE2, HOXA4, and C1QB) in Hyper-LGs and 14 genes (DYNC1H1, LNX1, RAB37, ALDH3A1, SLC24A4, CP, CEP250, ANK2, DNAI2, MUC13, ACACB, GABRP, STX7, and TTC9) in Hypo-HGs were identified as hub genes with degrees ≥10. The full names, abbreviations, aliases, and functions of these hub genes are shown in Tables 3 and 4. A network of the hub genes and their co-expressed genes was constructed via the cBioPortal online platform (Fig. 6).
Table 3
Functional roles of 4 Hyper-LGs hub genes with degree ≥10.
Table 4
Functional roles of 14 Hypo-HGs hub genes with degree ≥10.
Figure 6
Interaction network and biological process analysis of the hub genes. Hub genes and their co-expression genes were analyzed using cBioPortal. Nodes with bold black outline represent hub genes. Nodes with thin black outline represent the co-expression genes. (A) hypermethylated, lowly expressed genes (Hyper-LGs). (B) hypomethylated, highly expressed genes (Hypo-HGs).
Discussion
NPC, a leading head and neck cancer, poses a great threat to human health, but there has been limited research on its pathogenesis. Tumors are a product of epigenetic, cumulative genetic, somatic, and endocrine aberrations. DNA methylation can alter the expression of genes and provides a novel way to understand the pathogenesis of malignant tumors. DNA methylation aberrations are more common than genomic aberrations in the cancer genome. The rapid development of high-throughput sequencing and microarray technologies has allowed the methylation levels of thousands of genes in the human genome to be observed simultaneously, so the key genes affected by methylation can now be investigated. In our study, we identified 135 Hyper-LGs and 541 Hypo-HGs that may be involved in molecular regulation during the development of NPC.
The GO analysis showed that the main MFs of the Hyper-LGs were nucleoside kinase activity and DNA N-glycosylase activity, indicating that hypermethylation mainly affected enzyme activity. In addition, the BPs of the Hypo-HGs were significantly enriched in cilium movement, flagellated sperm motility, and axonemal dynein complex assembly, suggesting that nasal mucociliary dysfunction may be an important factor in NPC development. KEGG pathway analysis of the Hyper-LGs revealed enrichment mainly in protein digestion and absorption, ECM-receptor interaction, and the PI3K-Akt signaling pathway, suggesting that dysregulation of cellular functions such as cell proliferation, differentiation, apoptosis, and glucose transport could promote tumor evolution. In contrast, KEGG pathway analysis of the Hypo-HGs revealed enrichment mainly in drug metabolism-cytochrome P450, tyrosine metabolism, and phenylalanine metabolism, suggesting that hypomethylation is more involved in the regulation of metabolic pathways.
The PPI network showed the functional connectivity of the hypermethylation-low expression genes, among which the hub genes were TROAP, PCOLCE2, HOXA4, and C1QB. TROAP acts together with trophinin and bystin in cell adhesion, and the trophinin–cell-adhesion molecule complex mediates the initial attachment of the blastocyst to uterine epithelial cells. Some studies reported that TROAP expression not only enhances malignancy but also promotes tumor development in colorectal cancer, ovarian adenocarcinoma, breast cancer, bladder urothelial carcinoma, hepatocellular carcinoma, and other cancers. Thus, TROAP may be a potential biomarker for predicting cancer prognosis and sensitivity to cancer treatment. Procollagen C-proteinase enhancer 2 (PCOLCE2) has been confirmed as a differentially expressed epithelial transcript. PCOLCE2 is expressed at high levels in the adult heart and shows strong expression primarily in nonossified cartilage in developing tissues. HOXA4 is part of the A cluster on chromosome 7 and encodes a DNA-binding transcription factor, which may regulate gene expression, morphogenesis, and differentiation. Accumulated evidence indicates that abnormal expression of HOXA4 contributes to carcinogenesis. HOXA4 is reportedly overexpressed in epithelial ovarian cancer and colorectal cancer. A further study reported that HOXA4 suppresses migration via β1 integrin in ovarian cancer cell lines. C1QB deficiency is associated with glomerulonephritis and lupus erythematosus.
The PPI network also showed the functional connectivity of the selected hypomethylation-high expression genes, among which the top 5 hub genes were DYNC1H1, LNX1, RAB37, ALDH3A1, and SLC24A4. DYNC1H1 encodes a heavy chain of cytoplasmic dynein for axonal transport and is involved in the development of motor neuron axon degenerative diseases. Further studies substantiated the association between the CpG island methylation level in DYNC1H1 and the severity of spinal muscular atrophy. LNX1 encodes a membrane-bound protein that is involved in protein interactions and signal transduction and may play an indispensable role in tumorigenesis. LNX1 not only interacts with and ubiquitinates c-Src kinase but also facilitates the endocytosis of junction adhesion molecule. RAB37 is a member of the RAS oncogene family, whose members are critical regulators of the exocytosis of secreted glycoproteins. Preserved RAB37 protein expression in patients was related to suppressed cancer metastasis and better prognosis. ALDH3A1, a member of the cytoprotective aldehyde dehydrogenase family, is most abundant in stem cells and protects cells from injury. SLC24A4 encodes a member of the potassium-dependent sodium/calcium exchanger protein family. SLC24A4 was shown to be closely related to the risk of late-onset Alzheimer disease, and enamel maturation has been shown to depend on SLC24A4 function.
The joint study of DNA methylation and gene expression provides new, comprehensive information on MDEGs that supports the elucidation of the complex pathogenesis of NPC. Because abnormally methylated regions often contribute to changes in gene expression, this study indirectly reflects the indispensable role of abnormal methylation in the pathogenesis of NPC. A total of 4 genes (TROAP, PCOLCE2, HOXA4, and C1QB) in Hyper-LGs and 14 genes (DYNC1H1, LNX1, RAB37, ALDH3A1, SLC24A4, CP, CEP250, ANK2, DNAI2, MUC13, ACACB, GABRP, STX7, and TTC9) in Hypo-HGs were selected as potential biomarkers through a series of data analyses. Moreover, a set of hub genes in network modules related to MDEGs was identified. These genes may have potential value as methylation-based biomarkers for the diagnosis and treatment of NPC. However, further studies are needed to elucidate the biological function of these genes in this type of cancer.
Author contributions
W.Z.H. and Z.T. designed and analyzed the research study; W.Z.H. and S.H.Y. wrote and revised the manuscript.
Formal analysis: Zeng-hong Wu.
Investigation: Zeng-hong Wu.
Methodology: Zeng-hong Wu.
Writing – original draft: Zeng-hong Wu.
Writing – review & editing: Zeng-hong Wu.