Ji-Zhou Zhang1,2, Zeng-Hong Wu3, Qing Cheng3. 1. Graduate School, Nanjing University of Chinese Medicine, Nanjing. 2. Oncology Department, Wenzhou Traditional Chinese Medicine affiliated to Zhejiang Chinese Medicine University, Wenzhou. 3. Department of Otorhinolaryngology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China.
Abstract
Because simple and effective methods for diagnosing nasopharyngeal carcinoma (NPC) at an early stage are lacking, the mortality rate of NPC remains high. It is therefore meaningful to explore the precise molecular mechanisms involved in the proliferation, carcinogenesis, and recurrence of NPC in order to find effective diagnostic approaches and develop better therapeutic strategies. Three gene expression data sets (GSE64634, GSE53819, and GSE12452) were downloaded from Gene Expression Omnibus (GEO) and analyzed using the online tool GEO2R to identify differentially expressed genes (DEGs). Gene ontology functional analysis and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis of the DEGs were performed in the Database for Annotation, Visualization and Integrated Discovery. The Search Tool for the Retrieval of Interacting Genes database was used to evaluate the interactions of the DEGs, and a protein-protein interaction network was constructed using Cytoscape software. Hub genes were validated with the cBioPortal database. The overlap among the 3 data sets contained 306 genes that were differentially expressed between NPC and non-NPC samples. A total of 13 genes (DNAAF1, PARPBP, TTC18, GSTA3, RCN1, MUC5AC, POU2AF1, FAM83B, SLC22A16, SPEF2, ERICH3, CCDC81, and IL33) were identified as hub genes with degrees ≥10. The present study used comprehensive bioinformatics analyses to identify and functionally analyze the DEGs that may be involved in the carcinogenesis or progression of NPC, and it revealed a series of hub genes and pathways. The 306 DEGs and 13 hub genes identified may be regarded as candidate diagnostic biomarkers for NPC. However, more experimental studies are needed to elucidate the biologic functions of these genes in NPC.
The incidence of nasopharyngeal carcinoma (NPC) has remained high in endemic regions, and NPC is one of the most common malignant tumors in southern China and Southeast Asia. It is primarily a malignant tumor derived from the nasopharyngeal epithelium, located in the upper part and on the side wall of the nasopharyngeal cavity, with a strong tendency to metastasize. Among head and neck tumors its incidence is high, at approximately 0.2 to 0.5 cases per 100,000 people. The main clinical manifestations are nasal congestion, blood-stained nasal discharge, ear blockage, hearing loss, visual disturbances such as double vision, headaches, and other symptoms. Diagnosing the disease at an early stage requires a high index of clinical suspicion, and confirmation depends solely on histology. The potential high-risk factors for NPC include Epstein–Barr virus (EBV) infection, alcohol consumption, exposure to dust and formaldehyde, genetic factors, and cigarette smoking. EBV infection is found in 90% to 100% of NPC cases in endemic regions. EBV is associated with multiple types of human cancer, such as Burkitt lymphoma and Hodgkin disease, while in Asia it is closely associated with NPC. Accumulating evidence has demonstrated that abnormal expression and mutation of genes are involved in the carcinogenesis and progression of NPC, including glutathione S-transferase A1 (GSTA1), NGX6, and COX-2, as well as mutations of tumor-suppressor genes. A recent genomic study of NPC found dysregulated nuclear factor kappa B (NF-κB) signaling as well as multiple somatic mutations in the upstream negative regulators of NF-κB signaling. Ye et al reported that RASSF1A promoter methylation may be used for the clinical diagnosis of NPC. Chen et al used immunohistochemistry to detect the expression of p53R2 in 201 patients with NPC and found that p53R2 was positively expressed in 92.5% (186/201) of NPC tissues, with a high expression rate of 38.3% (77/201).
Multivariate Cox regression analysis showed that IGFBP6 is an independent prognostic biomarker for recurrence and distant metastasis. Peng et al reported that chronic stimulation of COX-2 plays a key role in the neoplastic conversion and development of NPC. However, because simple and effective methods for early diagnosis are lacking, the mortality rate of NPC remains high. It is therefore meaningful to explore the precise molecular mechanisms involved in the proliferation, carcinogenesis, and recurrence of NPC in order to find effective diagnostic approaches and develop better therapeutic strategies.
In the era of big data, microarray technology and bioinformatics analysis have been widely used for the high-throughput, simultaneous detection of thousands of genes at the genome level. Microarray gene expression profiling is integrated, automated, and miniaturized, and these features have helped us identify the differentially expressed genes (DEGs) and functional pathways involved in the carcinogenesis and progression of NPC. A single microarray analysis may not yield reliable results. We therefore downloaded 3 mRNA microarray data sets from Gene Expression Omnibus (GEO) and analyzed them to obtain the DEGs between NPC tissues and noncancerous tissues. Subsequently, gene ontology (GO) analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, and protein–protein interaction (PPI) network analysis were performed to clarify the molecular mechanisms underlying carcinogenesis and progression. In total, 306 DEGs and 13 hub genes were identified. The 13 hub genes were DNAAF1, PARPBP, TTC18, GSTA3, RCN1, MUC5AC, POU2AF1, FAM83B, SLC22A16, SPEF2, ERICH3, CCDC81, and IL33.
Materials and methods
Data resources
The GEO (http://www.ncbi.nlm.nih.gov/geo) is a public functional genomics data repository that includes high-throughput gene expression data, chips, and microarrays. Three gene expression data sets (GSE64634, GSE53819, and GSE12452) were downloaded from GEO. The platforms were GPL6480 (Agilent-014850 Whole Human Genome Microarray 4x44K G4112F) and GPL570 (Affymetrix Human Genome U133 Plus 2.0 Array). The GSE64634 data set included 14 NPC samples and 4 normal samples, GSE53819 contained 18 NPC tissue samples and 18 noncancerous samples, and GSE12452 contained 31 NPC samples and 10 noncancerous samples. Ethical approval was not necessary because this study is a bioinformatic analysis of public data.
Identification of DEGs
DEGs between NPC and noncancerous samples were identified using GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r). GEO2R is an online tool that allows users to compare different groups of samples in a GEO series in order to identify DEGs across experimental conditions. To balance the discovery of statistically significant genes against the risk of false positives, we used adjusted P-values calculated with the Benjamini and Hochberg false discovery rate method. A |log2FC| > 2 and a P-value < .01 were considered statistically significant.
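The filtering step described above can be sketched in a few lines of Python. This is a minimal illustration, not the GEO2R implementation: the function names, the toy gene labels, and the choice to apply the .01 cutoff to the adjusted P-value are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that adjusted
    # values never increase as the raw P-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(
    table: Dict[str, Tuple[float, float]],
    lfc_cutoff: float = 2.0,
    p_cutoff: float = 0.01,
) -> List[str]:
    """table maps gene -> (log2 fold change, raw P-value).

    Assumption: the P-value cutoff is applied to the adjusted P-value.
    """
    genes = list(table)
    adjusted = benjamini_hochberg([table[g][1] for g in genes])
    return [
        g
        for g, q in zip(genes, adjusted)
        if abs(table[g][0]) > lfc_cutoff and q < p_cutoff
    ]


# Hypothetical toy input; these are not values from the study.
toy = {
    "GENE_A": (-3.1, 0.0001),
    "GENE_B": (2.5, 0.002),
    "GENE_C": (1.2, 0.0001),
    "GENE_D": (-2.8, 0.2),
}
selected = select_degs(toy)
```

With the toy input, only the genes that pass both the fold-change and the adjusted P-value thresholds are kept; the cutoffs are parameters so that other criteria can be tried.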
KEGG and GO enrichment analyses of DEGs
The Database for Annotation, Visualization and Integrated Discovery (DAVID; http://david.ncifcrf.gov; version 6.8) is an online platform that integrates biologic data and provides a comprehensive set of functional annotation information for genes and proteins, allowing users to analyze their functions and signaling pathways. The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a database resource for understanding high-level gene functions and linking genomic information from large-scale molecular data sets. GO function analysis (biologic processes [BPs], cellular components [CCs], and molecular functions [MFs]) is a powerful bioinformatics tool for annotating genes and analyzing their biologic roles. To analyze the functions of the identified DEGs, GO enrichment and KEGG pathway analyses were performed using the DAVID online database. P < .05 was set as the cutoff criterion for statistical significance.
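To make the idea of term enrichment concrete, the sketch below computes a plain one-sided hypergeometric P-value for one annotation term. It is an illustration only: DAVID reports a modified Fisher exact P-value (the EASE score), which is more conservative than this calculation, and the function name and the numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for a hypergeometric draw.

    N: genes in the background
    K: background genes annotated with the term
    n: genes in the DEG list
    k: DEGs annotated with the term
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probability of observing k, k+1, ..., upper annotated DEGs.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical toy numbers: 10 background genes, 3 annotated with the term,
# 4 DEGs of which 2 carry the annotation.
p_value = hypergeom_enrichment_p(k=2, n=4, K=3, N=10)
is_enriched = p_value < 0.05  # cutoff used in the text
```

A term would be called enriched when its P-value falls below the .05 cutoff; in practice a multiple-testing correction across terms is usually also considered.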
PPI network construction and module analysis
The Search Tool for the Retrieval of Interacting Genes (STRING; http://string-db.org) online database was used to predict the PPI network information. Analyzing the interactions and functions of the DEGs may provide information about the mechanisms underlying the generation and development of disease. Interactions with a PPI score > 0.4 were retained. Cytoscape (version 3.7.1) is a bioinformatics platform for constructing and visualizing molecular interaction networks. The PPI networks were constructed using Cytoscape, and its plug-in Molecular Complex Detection (MCODE), which detects densely connected regions in PPI networks, was used to select the most significant module. The selection criteria were as follows: MCODE score > 5, degree cut-off = 2, node score cut-off = 0.2, max depth = 100, and k-core = 2.
Hub genes selection and analysis
A network integrating the hub genes, their coexpressed genes, and clinical characteristics in NPC was analyzed using cBioPortal for Cancer Genomics (http://www.cbioportal.org/), an open-access resource for exploring and analyzing genetic alterations in samples from multidimensional cancer genomics studies. Genomic mutations in the selected TCGA data sets were analyzed in cBioPortal according to the online instructions.
Results
Identification of DEGs in NPC
After standardization of the microarray results, DEGs (1099 in GSE64634, 2377 in GSE53819, and 1344 in GSE12452) were identified. The overlap among the 3 data sets contained 306 genes that were differentially expressed between NPC and non-NPC samples at the threshold of P < .01 and a minimal 2-fold change in expression, as shown in the Venn diagram (Fig. 1A). These consisted of 280 downregulated genes and 26 upregulated genes in NPC tissues relative to noncancerous tissues. The volcano plots and the heatmap of the top 50 DEGs are shown in Figures 2 and 3.
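The overlap step amounts to intersecting the per-data-set DEG lists. The sketch below shows this with Python sets; the function name and the toy gene labels are hypothetical, and real input would be the gene symbols exported for each data set.

```python
from typing import Iterable, List


def overlap_degs(*deg_lists: Iterable[str]) -> List[str]:
    """Return the genes present in every DEG list, sorted by symbol."""
    common = set.intersection(*(set(genes) for genes in deg_lists))
    return sorted(common)


# Hypothetical toy lists standing in for the three data sets.
degs_set1 = ["GENE_A", "GENE_B", "GENE_C"]
degs_set2 = ["GENE_B", "GENE_C", "GENE_D"]
degs_set3 = ["GENE_C", "GENE_B", "GENE_E"]

shared = overlap_degs(degs_set1, degs_set2, degs_set3)
```

Because gene symbols can differ between platforms, harmonizing identifiers before intersecting is usually necessary for the result to be meaningful.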
Figure 1
Venn diagram and the most significant module of differentially expressed genes (DEGs). (A) DEGs were selected with a fold change >2 and P-value < .01 among the mRNA expression profiling sets GSE64634, GSE53819, and GSE12452. The 3 data sets showed an overlap of 306 genes. (B) The most significant module was obtained from protein–protein interaction network with 21 nodes and 153 edges.
Figure 2
The volcano plots of differentially expressed genes in GSE64634, GSE53819, and GSE12452.
Figure 3
The heatmap of the top 50 differentially expressed genes in GSE64634, GSE53819, and GSE12452.
GO enrichment and KEGG analyses of DEGs
To further investigate the biologic functions and mechanisms of the DEGs, functional and pathway enrichment analyses were performed using DAVID. GO analysis showed that the BP terms of the DEGs were significantly enriched in axoneme assembly, cilium organization, microtubule bundle formation, cilium assembly, and cilium movement (Table 1). The MF terms were mainly enriched in microtubule motor activity, motor activity, alcohol dehydrogenase activity (zinc-dependent), alcohol dehydrogenase (NAD) activity, and epidermal growth factor receptor binding (Table 1). The CC terms were mainly enriched in the cilium, ciliary part, ciliary plasm, axoneme, and axoneme part (Table 1). KEGG pathway analysis revealed that the DEGs were mainly enriched in drug metabolism, metabolism of xenobiotics by cytochrome P450, chemical carcinogenesis, tyrosine metabolism, Huntington disease, fatty acid degradation, and adenine ribonucleotide biosynthesis.
Table 1
GO and KEGG pathway enrichment analysis of differentially expressed genes in NPC samples.
To further explore the connections among the DEGs at the protein level, the PPI network was constructed based on the interactions of the DEGs (Fig. 4), and the most significant module was obtained using Cytoscape (Fig. 1B). A total of 225 nodes and 678 interactions were screened to establish the PPI network, and the genes in the most significant module were functionally analyzed using DAVID. KEGG results showed that genes in this module were mainly enriched in Huntington disease, and the GO analysis results are summarized in Table 2.
Figure 4
The protein–protein interaction network of differentially expressed genes was constructed using Cytoscape. Upregulated genes are marked in light red; downregulated genes are marked in light blue.
Table 2
GO and KEGG pathway enrichment analysis of differentially expressed genes in the most significant module.
Hub gene selection and analysis
A total of 13 genes (DNAAF1, PARPBP, TTC18, GSTA3, RCN1, MUC5AC, POU2AF1, FAM83B, SLC22A16, SPEF2, ERICH3, CCDC81, and IL33) were identified as hub genes with degrees ≥10. The full names, abbreviations, aliases, and functions of these 13 hub genes are shown in Table 3. A network of the hub genes and their coexpressed genes was constructed using the cBioPortal online platform (Fig. 5).
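Selecting hub genes by degree can be sketched as counting, for each gene, the retained interactions it takes part in. The example below is a minimal illustration under stated assumptions: the edge list format (gene, gene, score), the function name, and the toy network are hypothetical, and the thresholds mirror the score > 0.4 and degree ≥10 criteria given in the text.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def hub_genes(
    edges: Iterable[Tuple[str, str, float]],
    min_score: float = 0.4,
    min_degree: int = 10,
) -> Dict[str, int]:
    """Return genes whose degree in the filtered network is >= min_degree."""
    degree: Counter = Counter()
    seen = set()
    for gene_a, gene_b, score in edges:
        # Skip self-loops and interactions at or below the score threshold.
        if gene_a == gene_b or score <= min_score:
            continue
        pair = frozenset((gene_a, gene_b))
        if pair in seen:  # count each undirected edge once
            continue
        seen.add(pair)
        degree[gene_a] += 1
        degree[gene_b] += 1
    return {gene: d for gene, d in degree.items() if d >= min_degree}


# Hypothetical toy network: one central gene with 10 confident partners
# and one low-confidence partner that is filtered out.
toy_edges = [("HUB", f"PARTNER_{i}", 0.9) for i in range(10)]
toy_edges.append(("HUB", "WEAK_PARTNER", 0.3))
hubs = hub_genes(toy_edges)
```

In this toy network only the central gene reaches the degree threshold; the partners each have degree 1 and are excluded.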
Table 3
Functional roles of 13 hub genes with degree ≥10.
Figure 5
Interaction network and biologic process analysis of the hub genes. Hub genes and their coexpression genes were analyzed using cBioPortal. Nodes with bold black outline represent hub genes. Nodes with thin black outline represent the coexpression genes.
Discussion
More than 95% of nasopharyngeal carcinomas are poorly differentiated or undifferentiated cancers. They are highly malignant, grow quickly, and are prone to lymph node or hematogenous metastasis. There are currently no clinically specific molecular markers for NPC. It is therefore necessary to elucidate the detailed molecular mechanisms that are independently associated with tumor prognosis and invasiveness. With the development of high-throughput sequencing and microarray technology, profiling of the DEGs closely linked to NPC has become more common worldwide, providing a novel and effective way to predict promising diagnostic and therapeutic targets for preventing and treating NPC. In this study, 3 gene expression data sets were extracted from GEO, and a total of 306 DEGs between NPC and normal tissues were screened. GO enrichment and KEGG analyses showed that these DEGs were related to various biologic functions, most of which are closely associated with the genesis and progression of cancer. Moreover, a coexpression analysis was performed using the cBioPortal platform and a PPI network of the DEGs was constructed, and 13 hub genes (DNAAF1, PARPBP, TTC18, GSTA3, RCN1, MUC5AC, POU2AF1, FAM83B, SLC22A16, SPEF2, ERICH3, CCDC81, and IL33) were identified as the key genes in NPC.
The protein encoded by the DNAAF1 gene is cilium specific and is involved in the regulation of microtubule-based cilia and actin-based brush border microvilli. DNAAF1 mutations can cause primary ciliary dyskinesia. DNAAF1 was initially identified as a dynein assembly factor, with mutations causing reduced ciliary beat frequency and a block in outer dynein arm assembly. Mutations in this gene may indicate a disorder of nasal ciliary motor function in patients with NPC. Thus, DNAAF1 may be a potential biomarker that could be used to predict cancer prognosis and sensitivity to cancer treatment.
PARPBP is highly expressed in non-small cell lung cancer. PARPBP may encode DNA repair enzymes that protect DNA against damage from anticancer drugs. PARP-1 inhibitors have therefore become a promising approach for developing chemosensitizers in cancers. The GSTA3 gene encodes a glutathione S-transferase (GST) belonging to the alpha class, whose genes are located in a cluster mapped to chromosome 6. Genes of the alpha class are highly related and encode enzymes with glutathione peroxidase activity. The enzyme encoded by GSTA3 catalyzes the double bond isomerization of precursors for progesterone and testosterone during the biosynthesis of steroid hormones. The GSTs are phase II detoxification enzymes that may have evolved in response to changes in environmental substrates. Thus, GSTA3 may play an important role in the occurrence and development of NPC. RCN1 is a calcium-binding protein located in the lumen of the endoplasmic reticulum (ER). This protein localizes to the plasma membrane in human prostate cancer cell lines. It is speculated that RCN1 functions in Ca2+-dependent cell adhesion, and dysregulation of the RCN1 protein has been linked to various diseases, including cancer. Reports indicate that RCN1 is found in multiple tumor types, including kidney cancer, breast cancer, liver cancer, and colorectal cancer. These findings suggest that overexpression of RCN1 might contribute to tumorigenesis and tumor progression.
MUC5AC is expressed in normal human airways and is significantly upregulated in the sinus mucosa of patients with chronic rhinosinusitis. Ye et al demonstrated that autophagy is essential for activation of the JNK-AP-1 signaling pathway, which subsequently promotes MUC5AC production. POU2AF1 is a known B-cell transcriptional coactivator and is regulated by CD40-L; continuous stimulation may result in overexpression of this gene in B cells. FAM83B has been reported to be significantly upregulated in breast cancer, gastric cancer, pancreatic ductal adenocarcinoma, and lung squamous cell carcinoma. SLC22A16 belongs to a transporter protein family and transports carnitine; successful treatment has been related to the level of activity of this transporter in tumor cells. SPEF2 is essential for motile cilia, and lack of SPEF2 function causes primary ciliary dyskinesia. ERICH3 is related to plasma serotonin concentrations, and its dysfunction may affect the response to selective serotonin reuptake inhibitors. CCDC81 may act as a potential cargo-binding protein in conjunction with Dynein-VII. IL33 is involved in the maturation of Th2 cells and the activation of basophils, mast cells, eosinophils, and natural killer cells. TTC18, also known as CFAP70, is a novel cilia-related gene and may play an important role in NPC.
Although similar studies have been reported before, we used more stringent screening criteria and, to obtain more reliable results, downloaded 3 mRNA microarray data sets from the GEO database. Our study has some limitations. For example, because of insufficient data, we could not evaluate the correlations between the hub genes and survival prognosis. Nevertheless, a network of the hub genes and their coexpressed genes was constructed using the cBioPortal platform.
Conclusion
The present study used comprehensive bioinformatics analyses to identify and functionally analyze the DEGs that may be involved in the carcinogenesis or progression of NPC, and it revealed a series of hub genes and pathways. The 306 DEGs and 13 hub genes identified may be regarded as candidate diagnostic biomarkers for NPC. However, more experimental studies are needed to elucidate the biologic functions of these genes in NPC.