Functional analysis of the nasopharyngeal carcinoma primary tumor‑associated gene interaction network.

Fengwei An¹, Zhiqiang Zhang², Ming Xia³.

Abstract

The aim of the present study was to investigate the molecular mechanism of nasopharyngeal carcinoma (NPC) primary tumor development through the identification of key genes using bioinformatics approaches. Using the GSE53819 microarray dataset, acquired from the Gene Expression Omnibus database, differentially expressed genes (DEGs) were screened between NPC primary tumor and control samples, followed by hierarchical clustering analysis. The Search Tool for the Retrieval of Interacting Genes database was utilized to build a protein‑protein interaction network to identify key node proteins. In total, 1,067 DEGs, including 326 upregulated genes and 741 downregulated genes, were identified between the NPC and control samples. The results of the hierarchical clustering analysis demonstrated that 95% of the DEGs were sample‑specific. Furthermore, PDZ binding kinase (PBK), centromere protein F (CENPF), actin‑binding protein anillin (ANLN), exonuclease 1 (EXO1) and chromosome 15 open reading frame 42 (C15ORF42) were included in the obtained network module, which was closely associated with the cell cycle and nucleic acid metabolic process Gene Ontology (GO) terms. The results of the present study suggest that EXO1, CENPF, ANLN, PBK and C15ORF42 may be involved in the mechanism of NPC via modulating the cell cycle and nucleic acid metabolic processes, and may serve as molecular biomarkers for the diagnosis of this disease.

Year: 2015 PMID: 26238040 PMCID: PMC4581807 DOI: 10.3892/mmr.2015.4090

Source DB: PubMed Journal: Mol Med Rep ISSN: 1791-2997 Impact factor: 2.952

Introduction

The primary tumor of nasopharyngeal carcinoma (NPC) is a complex malignancy originating from the epithelial cells of the nasopharynx. The incidence of NPC is markedly higher in East Asia and Africa than in other regions of the world (1). The disease has multiple causes. One of the key risk factors identified is Epstein-Barr (EB) virus infection (2,3), and environmental factors and hereditary susceptibility also contribute (4). The poor outcome of NPC treatment is attributed to the lack of effective therapeutic approaches and drugs, the complex anatomy of the nasopharynx, nonspecific clinical features, the difficulty of early diagnosis and variation in tumor histological type and differentiation (5,6). Therefore, there is an urgent need to identify specific molecular biomarkers for the early diagnosis of NPC. In Central and Southern China, a polymorphism in the miRNA-146a gene has been reported to be associated with the incidence of NPC (7). Additionally, an EB virus-encoded microRNA has been reported to have an active role in NPC via modulating E-cadherin (8). It has been established that biological activities are carried out through numerous interactions among proteins, DNA, RNA and small molecules (9), and that biological functions are achieved by complex interaction networks composed of multiple functional units (10). Bioinformatics approaches have therefore been widely used to investigate the associations among biological molecules and thus elucidate the complex mechanisms of disease (11). In addition, an increasing number of studies have revealed that the topological roles of node proteins in biological networks are closely associated with their importance in cellular function, and that networks with distinct topological features exhibit varying degrees of robustness in response to external environmental effects and internal perturbations (12,13).
Consequently, topology-based investigations of biological networks aim to identify critical nodes and their associations, thereby assisting in the understanding of the interaction topology and complex functions of cells. This provides valuable information for the diagnosis and treatment of disease and for the design of novel drugs (14). The present study aimed to investigate the molecular mechanism underlying NPC by screening for differentially expressed genes (DEGs) between NPC primary tumor and control samples, followed by hierarchical clustering analysis. A protein-protein interaction (PPI) network was then constructed to identify hub proteins and perform network module analysis. The present study contributes to an enhanced understanding of the molecular mechanism of NPC and provides a basis for treating the disease.

Materials and methods

Microarray data preprocessing and DEG screening

The GSE53819 microarray dataset was downloaded from the Gene Expression Omnibus database (http://www.ncbi.nlm.nih.gov/geo/), the largest open repository of gene expression data (15). The dataset consisted of 18 samples of NPC primary tumor tissue and 18 control samples of normal nasopharyngeal tissue, profiled on the GPL6480 Agilent-014850 Whole Human Genome Microarray 4×44K G4112F platform (Agilent Technologies, Inc., Santa Clara, CA, USA). Using the platform annotation, all probe IDs in the microarray data were mapped to their corresponding gene symbols. For genes corresponding to several probes, the average expression value of these probes was taken as the expression value of the gene. The skewed data were then log2-transformed to approximate a normal distribution, followed by normalization using the median method (16). The Linear Models for Microarray Data (limma) package (http://www.bioconductor.org/packages/release/bioc/html/limma.html) (17) in R was used to screen for DEGs between the NPC and control tissue samples. Multiple testing correction (18) was performed using the Benjamini-Hochberg method (19). |log2 fold change| > 1 and false discovery rate (FDR) < 0.05 were set as the cutoffs for DEG identification.
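The preprocessing and DEG-calling steps above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the "median method" means subtracting each sample's median after the log2 transform, and it takes per-gene P-values as input (in the study these came from limma's moderated statistics, which are not reimplemented here). All function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, median


def collapse_probes(probe_values, probe_to_gene):
    """Average all probes that map to the same gene (one sample at a time)."""
    by_gene = defaultdict(list)
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene:  # assumption: probes without a gene annotation are dropped
            by_gene[gene].append(value)
    return {gene: mean(values) for gene, values in by_gene.items()}


def log2_median_normalize(gene_values):
    """log2-transform one sample, then subtract that sample's median.

    Assumes strictly positive raw intensities; 'median centering' is our
    reading of the 'median method', not a confirmed detail of the study.
    """
    logged = {gene: math.log2(v) for gene, v in gene_values.items()}
    centre = median(logged.values())
    return {gene: v - centre for gene, v in logged.items()}


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values (FDR), returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(n, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(genes, log2fc, pvalues, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """Split genes into up/down DEGs using |log2FC| > 1 and FDR < 0.05."""
    fdr = benjamini_hochberg(pvalues)
    up, down = [], []
    for gene, lfc, q in zip(genes, log2fc, fdr):
        if q < fdr_cutoff and abs(lfc) > lfc_cutoff:
            (up if lfc > 0 else down).append(gene)
    return up, down
```

For example, `call_degs(["A", "B", "C", "D"], [2.0, -1.5, 0.5, 3.0], [0.001, 0.002, 0.001, 0.5])` returns `(["A"], ["B"])`: gene C passes the FDR cutoff but not the fold-change cutoff, and gene D fails the FDR cutoff.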

Hierarchical clustering analysis

Two-way hierarchical clustering analysis of the identified DEGs was performed using the pheatmap package in R (http://cran.fhcrc.org/web/packages/pheatmap/index.html) (20). Clustering grouped genes with similar expression patterns and was used to evaluate whether these DEGs were sample-specific, that is, whether their expression profiles separated the NPC samples from the control samples. The result was displayed as a heatmap.
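The grouping step can be illustrated with a naive agglomerative clustering. This sketch uses Euclidean distance with complete linkage, which are pheatmap's default settings; the study does not state which settings were used, so this is an assumption. `complete_linkage` is a hypothetical helper, and two-way clustering simply applies it to both the genes (rows) and the samples (columns).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(rows, n_clusters):
    """Naive agglomerative clustering with complete linkage.

    Repeatedly merges the two clusters whose farthest members are closest,
    until n_clusters remain. Returns lists of row indices.
    """
    clusters = [[i] for i in range(len(rows))]

    def cluster_distance(c1, c2):
        # complete linkage: distance between the farthest pair of members
        return max(euclidean(rows[i], rows[j]) for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]
```

To cluster samples instead of genes, transpose the matrix first, e.g. `complete_linkage([list(col) for col in zip(*rows)], 2)`; sample-specific DEGs would be expected to split the samples into a tumor group and a control group.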

PPI network construction and hub protein analysis

It has been established that the majority of biological networks are scale-free networks, in which only a minority of nodes possess a large number of links, while the majority of nodes have few links (21). The nodes with the most connections are defined as hub proteins and are key to the network. To identify the hub proteins in the present study, the Search Tool for the Retrieval of Interacting Genes (STRING) (22) online database (http://string-db.org/) was used to construct a PPI network from the proteins encoded by the DEGs. The path lengths between nodes in the network were calculated to assess whether the constructed network was scale-free. Subsequently, the degree of each node protein (the number of links connecting it to other nodes) was calculated, and the proteins with the highest degrees were selected as hub proteins.
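Degree-based hub selection can be sketched from a plain list of interaction pairs, such as an edge list exported from STRING. This is a minimal illustration; the helper names are hypothetical, and duplicate or reversed pairs are counted once, which is an assumption about how repeated interactions should be treated.

```python
from collections import defaultdict


def node_degrees(edges):
    """Degree of each node in an undirected PPI network given as (a, b) pairs."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbours[a].add(b)  # sets collapse duplicate/reversed pairs
            neighbours[b].add(a)
    return {node: len(nbrs) for node, nbrs in neighbours.items()}


def top_hubs(edges, k=10):
    """The k highest-degree nodes, ties broken alphabetically."""
    degrees = node_degrees(edges)
    return sorted(degrees.items(), key=lambda item: (-item[1], item[0]))[:k]
```

With the toy edges `[("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("C", "A")]`, `top_hubs(edges, k=2)` returns `[("A", 3), ("B", 2)]`, because the reversed pair `("C", "A")` is not counted twice.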

Network modules analysis

Single proteins usually function through interactions with other proteins, rather than acting alone (23), and proteins in the same module are likely to perform similar functions. Therefore, network modules with degree ≥2 and K-core ≥2 were extracted using the MCODE plugin (24) for Cytoscape (www.cytoscape.org/) (25), a software platform for network visualization and analysis. Gene Ontology (GO) (26) functional enrichment analysis was performed for the obtained modules using the BiNGO plugin (27) for Cytoscape, with an adjusted P-value <0.05 as the threshold.
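The K-core criterion used for module extraction can be illustrated by the standard peeling procedure: nodes with fewer than k neighbours are removed repeatedly until every remaining node has at least k neighbours. This is only a sketch of the K-core filter; it does not reproduce MCODE's vertex weighting or seed-based module growth, and `k_core` is a hypothetical helper.

```python
def k_core(edges, k=2):
    """Nodes of the k-core of an undirected graph given as (a, b) pairs."""
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # self-interactions do not count toward the core
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    changed = True
    while changed:  # keep peeling until no node falls below k
        changed = False
        for node in list(neighbours):
            if node in neighbours and len(neighbours[node]) < k:
                for nbr in neighbours.pop(node):
                    neighbours[nbr].discard(node)  # update remaining degrees
                changed = True
    return set(neighbours)
```

For a triangle A-B-C with a tail C-D-E, the 2-core is `{"A", "B", "C"}`: E is removed first, which drops D to a single neighbour, so D is removed in the next pass.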

Results

DEG screening and hierarchical clustering analysis

A total of 1,067 DEGs were identified between the NPC and control samples, including 326 upregulated and 741 downregulated genes. The heatmap showed that 95% of the DEGs were sample-specific (Fig. 1).

Figure 1

Heatmap from hierarchical clustering analysis. Colors ranging from blue to orange indicate the expression values of the differentially expressed genes, from downregulated to upregulated. X-axis, sample name; Y-axis, fold change of the expression values of differentially expressed genes.

Analysis of hub proteins in the PPI network

In the PPI network (Fig. 2), 239 interaction pairs involving 168 DEGs were identified. As shown in Fig. 3, the path lengths between nodes in the network ranged from one to nine, with the highest frequency at two, indicating that the network was scale-free. The degrees of the nodes are shown in Fig. 4. The top node genes, sorted by degree in descending order, are listed in Table I. Among these genes, membrane-spanning 4-domains, subfamily A, member 1 (MS4A1) had the highest degree (13).
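The path-length distribution summarized in Fig. 3 can be computed with breadth-first search from every node. The sketch below also returns each node's average shortest path length, on the assumption that the non-integer "Path length" values in Table I are per-node averages of this kind (the study does not define the column). Node pairs in different connected components are skipped, and `path_length_stats` is a hypothetical helper.

```python
from collections import Counter, deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def path_length_stats(edges):
    """Histogram of shortest-path lengths and per-node average path length.

    The histogram counts ordered pairs, so each unordered pair appears twice;
    unreachable pairs (different components) are not counted.
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    histogram = Counter()
    average = {}
    for node in adj:
        others = [d for n, d in bfs_distances(adj, node).items() if n != node]
        histogram.update(others)
        average[node] = sum(others) / len(others) if others else 0.0
    return histogram, average
```

For the chain A-B-C, the histogram is `{1: 4, 2: 2}` and the averages are 1.5 for A and C and 1.0 for the central node B; the most frequent value of the histogram corresponds to the peak reported in Fig. 3.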

Figure 2

Protein-protein interaction network. Green nodes represent downregulated DEGs; pink nodes represent upregulated DEGs. Blue lines indicate the interaction between two proteins. DEGs, differentially expressed genes.

Figure 3

Analysis of the path lengths of the nodes in the protein-protein interaction network.

Figure 4

Analysis of the degrees of the nodes in the protein-protein interaction network.

Table I

Top 10 node genes sorted in descending order of degree.

Gene	Average path length	Degree
MS4A1	3.16	13
PBK	2.05	10
CENPF	2.05	10
ANLN	2.05	10
DTL	2.10	9
EXO1	2.10	9
CD79B	3.82	8
C15ORF42	2.65	8
IGF2BP3	1.80	8

MS4A1, membrane-spanning 4-domains, subfamily A, member 1; PBK, PDZ binding kinase; CENPF, centromere protein F; ANLN, anillin; DTL, denticleless protein homolog; EXO1, exonuclease 1; C15ORF42, chromosome 15 open reading frame 42; IGF2BP3, insulin-like growth factor 2 mRNA-binding protein 3.

Network module analysis

As shown in Fig. 5, a network module comprising six high-degree genes was obtained: PDZ binding kinase (PBK), centromere protein F (CENPF), anillin (ANLN), denticleless protein homolog (DTL), exonuclease 1 (EXO1) and chromosome 15 open reading frame 42 (C15ORF42). GO functional enrichment analysis revealed that the module was closely associated with the cell cycle and nucleic acid metabolic process (Table II); these terms were enriched in five of the high-degree genes: EXO1, CENPF, ANLN, PBK and C15ORF42. Of these five genes, PBK, CENPF and ANLN shared the highest degree (10; Table I).

Figure 5

Network module obtained from the protein-protein interaction network. Green nodes represent downregulated DEGs; red nodes represent upregulated DEGs. Blue lines indicate the interaction between two proteins. DEGs, differentially expressed genes.

Table II

GO functional enrichment analysis of network modules.

GO ID	P-value	Adjusted P-value	Description
GO:0007049	7.30×10⁻⁷	2.38×10⁻⁴	Cell cycle
GO:0051726	1.49×10⁻⁶	2.43×10⁻⁴	Regulation of cell cycle
GO:0090304	4.73×10⁻⁴	6.75×10⁻³	Nucleic acid metabolic process
GO:0006139	1.22×10⁻³	1.17×10⁻²	Nucleobase, nucleoside, nucleotide and nucleic acid metabolic process
GO:0034641	2.46×10⁻³	1.82×10⁻²	Cellular nitrogen compound metabolic process
GO:0006807	3.16×10⁻³	2.15×10⁻²	Nitrogen compound metabolic process
GO:0044260	3.79×10⁻³	2.32×10⁻²	Cellular macromolecule metabolic process
GO:0016043	6.37×10⁻³	2.91×10⁻²	Cellular component organization
GO:0043170	7.92×10⁻³	3.31×10⁻²	Macromolecule metabolic process

GO, gene ontology.
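Enrichment P-values of the kind listed in Table II are commonly obtained from a one-sided hypergeometric test, which is one of the statistical tests offered by BiNGO; the study does not state which test or correction was selected, so the following is a sketch under that assumption. It computes the probability of observing at least k genes annotated with a GO term in a module of n genes, when K of the N background genes carry that term. The adjusted P-values would then come from a multiple-testing correction such as Benjamini-Hochberg. `hypergeom_enrichment_p` is a hypothetical helper.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided hypergeometric P-value, P(X >= k).

    k: module genes annotated with the GO term
    n: module size
    K: background genes annotated with the GO term
    N: background size
    """
    total = comb(N, n)
    upper = min(n, K)  # X cannot exceed the module size or the annotated count
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total  # exact integer arithmetic until this final division
```

For a toy background of 10 genes, 5 of which carry a term, a 2-gene module with both genes annotated gives `hypergeom_enrichment_p(2, 2, 5, 10)` = 10/45 ≈ 0.222.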

Discussion

NPC is an endemic malignant tumor in Southern China. The present study identified 1,067 DEGs between NPC and control samples, which hierarchical clustering analysis revealed to be sample-specific. The constructed PPI network was confirmed to be scale-free, and its hub proteins were analyzed. Network module analysis demonstrated that the obtained module was associated with the cell cycle and nucleic acid metabolic process, in which EXO1, CENPF, ANLN, PBK and C15ORF42 were enriched high-degree DEGs. All of these DEGs except C15ORF42 were downregulated. In agreement with the present results, an increasing number of studies have reported that cell cycle function is closely associated with the initiation and progression of NPC (28,29).

EXO1, one of the five DEGs identified, encodes an exonuclease involved in DNA repair and homologous recombination (30). Genes associated with DNA repair have been reported to be involved in the molecular mechanism underlying NPC (31). Similarly, the present study found that EXO1, a critical node protein in the PPI network, was associated with the nucleic acid metabolic process, suggesting that EXO1 may be critically involved in NPC via regulation of this process.

CENPF, a member of the centromere protein family, is involved in formation of the nuclear matrix during the G2 phase of the cell cycle and in mitosis (32). Significant upregulation of CENPF has previously been reported in NPC cells relative to normal nasopharyngeal cells, suggesting that CENPF may be a molecular biomarker for NPC progression (33). Centromere protein H has also been proposed as a prognostic marker for NPC progression (34). In agreement with these reports, the present study demonstrated that CENPF was a critical node protein with a high degree in the network module, indicating its importance in the mechanism of NPC.
Anillin, encoded by the ANLN gene, is a scaffolding actin-binding protein that is involved in cytokinesis by linking RhoA, actin and myosin (35). Anillin has been demonstrated to be upregulated in lung carcinogenesis and may serve as a prognostic indicator for that disease (36). By contrast, anillin was downregulated in NPC in the present study, so its role in this disease remains to be determined. In addition, lymphokine-activated killer T-cell-originated protein kinase (TOPK), which is encoded by the PBK gene, is a serine/threonine kinase related to mitogen-activated protein kinase kinase (37). TOPK has previously been shown to promote the proliferation of breast tumor cells via p38 mitogen-activated protein kinase activity (38), and high expression levels of TOPK have been observed in melanoma cells (39). However, PBK was downregulated in the present study. These conflicting results may be due to differences in the experimental models and samples.

The function of the C15ORF42 gene remains to be fully elucidated. Another member of the same family, C16ORF13, has been reported to be overexpressed in gastric cancer tissues. In the present study, C15ORF42 was identified as an upregulated node protein with a high degree in NPC, indicating, for the first time to the best of our knowledge, a potentially critical role of C15ORF42 in NPC tumorigenesis.

In conclusion, the present study identified critical node proteins that interact closely with other proteins in the network module, including EXO1, CENPF, ANLN, PBK and C15ORF42. These proteins may be involved in the tumorigenesis of NPC via modulation of the cell cycle and nucleic acid metabolic process, and may be used as molecular biomarkers for the early diagnosis of NPC.
The results of the present study assist in further understanding of the tumorigenesis of NPC, and provide potential targets for developing effective therapeutic treatment strategies for this disease.

References

1. Discovering regulatory and signalling circuits in molecular interaction networks.

Authors: Trey Ideker; Owen Ozier; Benno Schwikowski; Andrew F Siegel
Journal: Bioinformatics Date: 2002 Impact factor: 6.937

2. The efficiency of multi-target drugs: the network approach might help drug design.

Authors: Péter Csermely; Vilmos Agoston; Sándor Pongor
Journal: Trends Pharmacol Sci Date: 2005-04 Impact factor: 14.819

3. Centromere protein H is a novel prognostic marker for nasopharyngeal carcinoma progression and overall patient survival.

Authors: Wen-Ting Liao; Li-Bing Song; Hui-Zhong Zhang; Xing Zhang; Ling Zhang; Wan-Li Liu; Yan Feng; Bao-Hong Guo; Hai-Qiang Mai; Su-Mei Cao; Man-Zhi Li; Hai-De Qin; Yi-Xin Zeng; Mu-Sheng Zeng
Journal: Clin Cancer Res Date: 2007-01-15 Impact factor: 12.531

4. Anillin is a scaffold protein that links RhoA, actin, and myosin during cytokinesis.

Authors: Alisa J Piekny; Michael Glotzer
Journal: Curr Biol Date: 2007-12-27 Impact factor: 10.834

5. ANLN plays a critical role in human lung carcinogenesis through the activation of RHOA and by involvement in the phosphoinositide 3-kinase/AKT pathway.

Authors: Chie Suzuki; Yataro Daigo; Nobuhisa Ishikawa; Tatsuya Kato; Satoshi Hayama; Tomoo Ito; Eiju Tsuchiya; Yusuke Nakamura
Journal: Cancer Res Date: 2005-12-15 Impact factor: 12.701

6. How does multiple testing correction work?

Authors: William S Noble
Journal: Nat Biotechnol Date: 2009-12 Impact factor: 54.908

7. Prognostic significance and therapeutic implications of centromere protein F expression in human nasopharyngeal carcinoma.

Authors: Jing-Yan Cao; Li Liu; Shu-Peng Chen; Xing Zhang; Yan-Jun Mi; Zhi-Gang Liu; Man-Zhi Li; Hua Zhang; Chao-Nan Qian; Jian-Yong Shao; Li-Wu Fu; Yun-Fei Xia; Mu-Sheng Zeng
Journal: Mol Cancer Date: 2010-09-09 Impact factor: 27.401

8. Cytoscape 2.8: new features for data integration and network visualization.

Authors: Michael E Smoot; Keiichiro Ono; Johannes Ruscheinski; Peng-Liang Wang; Trey Ideker
Journal: Bioinformatics Date: 2010-12-12 Impact factor: 6.937

9. Evaluating different methods of microarray data normalization.

Authors: André Fujita; João Ricardo Sato; Leonardo de Oliveira Rodrigues; Carlos Eduardo Ferreira; Mari Cleide Sogayar
Journal: BMC Bioinformatics Date: 2006-10-23 Impact factor: 3.169

10. The Epstein-Barr virus-encoded microRNA MiR-BART9 promotes tumor metastasis by targeting E-cadherin in nasopharyngeal carcinoma.

Authors: Chung-Yuan Hsu; Yung-Hsiang Yi; Kai-Ping Chang; Yu-Sun Chang; Shu-Jen Chen; Hua-Chien Chen
Journal: PLoS Pathog Date: 2014-02-27 Impact factor: 6.823
