BACKGROUND: Gene alterations are crucial to the molecular pathogenesis of pancreatic cancer. The present study was designed to identify the potential candidate genes in the pancreatic carcinogenesis. METHODS: Gene Expression Omnibus database (GEO) datasets of pancreatic cancer tissue were retrieval and the differentially expressed genes (DEGs) from individual microarray data were merged. Gene Ontology (GO) enrichment analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, protein-protein interaction (PPI) networks, and gene coexpression analysis were performed. RESULTS: Three GEO datasets, including 74 pancreatic cancer samples and 55 controls samples were selected. A total of 2325 DEGs were identified, including 1383 upregulated and 942 downregulated genes. The GO terms for molecular functions, biological processes, and cellular component were protein binding, small molecule metabolic process, and integral to membrane, respectively. The most significant pathway in KEGG analysis was metabolic pathways. PPI network analysis indicated that the significant hub genes including cytochrome P450, family 2, subfamily E, polypeptide 1 (CYP2E1), mitogen-activated protein kinase 3 (MAPK3), and phospholipase C, gamma 1 (PLCG1). Gene coexpression network analysis identified 4 major modules, and the potassium channel tetramerization domain containing 10 (KCTD10), kin of IRRE like (KIRREL), dipeptidyl-peptidase 10 (DPP10), and unc-80 homolog (UNC80) were the hub gene of each modules, respectively. CONCLUSION: Our integrative analysis provides a comprehensive view of gene expression patterns associated with the pancreatic carcinogenesis.
BACKGROUND: Gene alterations are crucial to the molecular pathogenesis of pancreatic cancer. The present study was designed to identify the potential candidate genes in the pancreatic carcinogenesis. METHODS: Gene Expression Omnibus database (GEO) datasets of pancreatic cancer tissue were retrieval and the differentially expressed genes (DEGs) from individual microarray data were merged. Gene Ontology (GO) enrichment analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, protein-protein interaction (PPI) networks, and gene coexpression analysis were performed. RESULTS: Three GEO datasets, including 74 pancreatic cancer samples and 55 controls samples were selected. A total of 2325 DEGs were identified, including 1383 upregulated and 942 downregulated genes. The GO terms for molecular functions, biological processes, and cellular component were protein binding, small molecule metabolic process, and integral to membrane, respectively. The most significant pathway in KEGG analysis was metabolic pathways. PPI network analysis indicated that the significant hub genes including cytochrome P450, family 2, subfamily E, polypeptide 1 (CYP2E1), mitogen-activated protein kinase 3 (MAPK3), and phospholipase C, gamma 1 (PLCG1). Gene coexpression network analysis identified 4 major modules, and the potassium channel tetramerization domain containing 10 (KCTD10), kin of IRRE like (KIRREL), dipeptidyl-peptidase 10 (DPP10), and unc-80 homolog (UNC80) were the hub gene of each modules, respectively. CONCLUSION: Our integrative analysis provides a comprehensive view of gene expression patterns associated with the pancreatic carcinogenesis.
Pancreatic cancer is the 4th leading cause of cancer death in the world, with a median survival time less than 6 months and a dismal 5 years survival rates of 3% to 5%.[ Despite considerable progress has been achieved, the diagnosis and management of pancreatic cancer remains challenging.[ To date, the exact mechanisms of pancreatic cancer have not been fully elucidated, but the genes alteration during the local and systemic tumor development of many cancers has been well known. Many studies have reported that abnormal alteration of some genes were crucial to the pancreatic carcinogenesis.[During the last decade, gene expression profiling with microarrays has developed greatly and become an important technology for identifying the genes and biological pathways that associated with various diseases. This approach is appropriated for identifying potentially useful diagnostic and prognostic biomarkers. Recently, there are studies using microarrays to identify potentially useful genes or gene signatures that are associated with pancreatic cancer.[Although these studies have successfully identified gene expression signatures that associated with pancreatic carcinogenesis, the expression signatures identified in these studies were not consistent with each other.[ Therefore, in order to unravel the gene expression signatures associated with the pancreatic carcinogenesis, we used the microarray datasets of humanpancreatic tissue available in the databases, and performed integrated analysis on their differentially expressed gene (DEGs), our results will identify the possible biologically active molecules and the potential therapeutic targets for pancreatic cancer.
Materials and methods
Microarray data selection
The Gene Expression Omnibus database (GEO, http://www.ncbi.nlm.nih.gov/geo) of NCBI was searched to identify the relevant microarray datasets. The microarray datasets are selected according to the following rules: the samples must be humanpancreatic cancer tissue; the sample size must large than 5 samples; the patients did not receive special treatment, including radiotherapy and chemotherapy; and the study type of dataset is expression profiles studies. We excluded nonhuman studies, pancreatic cancer cells studies, and integrated analysis of expression profiles studies. Data were extracted from the original studies by 2 independent reviewers. The following information was extracted from each identified study: GEO accession number, sample type, platform, number of cases and controls, references, and gene expression data. Any discrepancies between reviewers were resolved by consensus or a 3rd reviewer. The study was approved by the Review Boards of the First Affiliated Hospital of Guangxi Medical University.
Differentially expressed genes (DEGs) analysis
The raw data in CEL files were normalized using Robust Multiarray Analysis algorithm in R Affy package. Then the datasets were assigned into 2 groups: pancreatic cancer group and control group. The DEGs between pancreatic cancer tissue and normal tissue were estimated by t test using limma package in R statistical software.[ Genes exhibiting at least 2-fold changes corresponding to a false discovery rate less than 0.05 were selected as the significantly DEGs.
Identification of the overlap DEGs from 3 microarray datasets
The DEGs from individual microarray data were merged and the overlap DEGs of the 3 microarray datasets were identified using the robustrankaggreg package[ in R statistical software. Only the overlap DEGs were used for the integrated analysis. According to a nonparametric permutation test of the robustrankaggreg algorithm, a list of upregulated or downregulated genes were identified based on P value (where threshold < 0.05) and fold change (FC) level in a given number of replicates multiplied across different microarray datasets. The cutoff P value was adjusted by the Benjamin–Hochberg false discovery rate.
Functional and pathway enrichment analyses of DEGs
In order to screen the biological processes involved in the pathogenesis of pancreatic cancer, the online software Database for Annotation, Visualization and Integrated Discovery was used to perform Gene Ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis for the DEGs. These analyses were using Hypergeometric Distribution test, and the P value < .05 was set as the cut-off criteria. We also constructed pathway relation network for the DEGs and identified the relationship among the pathways.
Analysis of protein–protein interaction (PPI) network
To determine the function of the proteins that they encoded, DEGs were imported into the PPI network constructed by using the Biological General Repository for Interaction Datasets (BioGRID) (http://thebiogrid.org/) in Cytoscape software (http://www.cytoscape.org/). The PPI network identified for the DEGs was screened at a genome-wide scale, with both end nodes having DEGs. The network construction using methods based on genomic context and structure information.[
Gene coexpression network analysis
To further identify possible genes that crucial to the pancreatic cancer, we selected DEGs the both significant expression in GO annotation and KEGG pathway using GCBI online program (https://www.gcbi.com.cn/gclib/html/index). Next, DEGs from the intersection of GO annotation and KEGG pathway analysis were used to construct a gene coexpression network, we mapped the DEGs to the immense database of already-known networks and screened significant gene–gene interactions using GCBI online program. The correlation between the genes in the network was determined by GO term (biological process)-based weighting. To construct a coexpression network, a correlation matrix was built by calculating pairwise Spearman Rank correlations for all pairs of expression vectors, and the analysis for modularity using the Louvain method, with a P value < .05 as significant.[
Results
Selection of microarray datasets
From microarray datasets retrieved in GEO of NCBI, we selected 3 microarray datasets, including GSE28735,[ GSE32676,[ and GSE43288[ that meet our criteria for DEGs analysis. These 3 microarray datasets provided the gene expression profiles on the humanpancreatic cancer tissue. The dataset of GSE43288 included pancreatic cancer tissue, precursor lesions, and normal control tissue, we only chose data from pancreatic cancer tissue and normal control tissue. Therefore, a total of 74 pancreatic cancer and 55 normal control sample were included our study. With regard to the used GEO platforms, all these 3 datasets used Affymetrix Human Gene Array platforms. The details of the datasets are shown in Table 1.
Table 1
Characteristic of included microarray data.
Characteristic of included microarray data.
Identification of DEGs for the 3 microarray datasets
The raw data of the 3 microarray datasets were log2-transformed and normalized in order that their mean and unit variance was zero. According to robustrankaggreg algorithm, we identify a total of 2325 DEGs from the microarray datasets depending on their P-value and FC level. There are 1383 upregulated and 942 downregulated DEGs.
Functional annotation and KEGG pathway for DEGs
To gain insights into the biological roles of the DEGs, we performed a GO categories enrichment analysis. The GO function and KEGG pathway enrichment of the total DEGs are listed in Table 2. We found that the top enriched GO terms for biological processes were: small molecule metabolic process (P = 5.38e-47), signal transduction (P = 7.33e-34), and transmembrane transport (P = 2.65e-27); for molecular function were: protein binding (P = 4.16e-97), ATP binding (P = 8.34e-52), and metal ion binding (P = 4.93e-37); for cellular component were: integral to membrane (P = 3.45e-99), plasma membrane (P = 1.07e-93), and cytoplasm (P = 1.11e-82). In the KEGG pathway enrichment analysis for the DEGs, we found that the most significant pathways in the KEGG analysis were metabolic pathways (P = 5.83e-34). Furthermore, pancreatic secretion (P = 1.98e-22) and phagosome (P = 2.64e-15) pathways were also highly enriched.
Table 2
Top 5 GO function and KEGG pathway enrichment of the total DEGs.
Top 5 GO function and KEGG pathway enrichment of the total DEGs.To further assess the relation of the pathways, we constructed a pathway relation network for the DEGs. This pathway relation network composed of 30 nodes and 118 edges, and we found that the MAPK signaling pathway (degree = 41), apoptosis (degree = 31), and pathways in cancer (degree = 27) ranked the top 3 largest of degree. We also found that the genes expression was upregulated only in apoptosis, cell cycle, and Jak-STAT signaling pathway, while the rest pathway included up- and downregulated genes (Fig. 1).
Figure 1
Pathway relation network for the total differentially expressed genes (DEGs). Expression levels are represented by red (upregulated), green (downregulated), and yellow (up- and downregulated).
Pathway relation network for the total differentially expressed genes (DEGs). Expression levels are represented by red (upregulated), green (downregulated), and yellow (up- and downregulated).
PPI network analysis of the DEGs
The PPI network for the total of DEGs with significant interaction relation composed of 453 nodes and 1039 edges (Fig. 2A) and a subnetwork for the top 30 DEGs is shown in Fig. 2B. As shown in Fig. 2B, the top 5 list of DEGs were determined in the order of the interacting edges, and the upregulated DEGs were: mitogen-activated protein kinase 3 (MAPK3) (degree = 31), phospholipase C, gamma 1 (PLCG1) (degree = 27), and phospholipase A2, group IVA (PLA2G4A), the downregulated DEGs were: cytochrome P450, family 2, subfamily E, polypeptide 1 (CYP2E1), insulin-like growth factor 1 receptor (IGF1R). The top 20 list of DEGs in the PPI network are shown in Table 3.
Figure 2
PPI network for the DEGs. (A) PPI network for the total DEGs; (B) PI network for the top 30 DEGs. The red nodes represented upregulated of genes, the blue nodes represented downregulated of genes. DEG = differentially expressed gene, PPI = protein–protein interaction.
Table 3
Top 20 DEGs from the PPI network.
PPI network for the DEGs. (A) PPI network for the total DEGs; (B) PI network for the top 30 DEGs. The red nodes represented upregulated of genes, the blue nodes represented downregulated of genes. DEG = differentially expressed gene, PPI = protein–protein interaction.Top 20 DEGs from the PPI network.Gene coexpression network was constructed using the intersection DEGs from GO analysis and pathway analysis. The gene coexpression network composed of 80 nodes and 294 edges (Fig. 3A). The top 20 DEGs of intersection are listed in Table 4. In the coexpression network, we found that the network can be divided into 4 major modules, the potassium channel tetramerization domain containing 10 (KCTD10) (degree = 31) and kin of IRRE like (KIRREL) (degree = 26) were the hub upregulated gene of the modules, and the dipeptidyl-peptidase 10 (DPP10) (degree = 33) and unc-80 homolog (UNC80) (degree = 27) were the hub downregulated gene of the modules (Fig. 3B).
Figure 3
Gene coexpression network for the differentially expressed genes (DEGs). (A) Gene coexpression network for the total DEGs; (B) gene coexpression network for the top 20 DEGs. The red nodes represented upregulated of genes, the blue nodes represented downregulated of genes.
Table 4
Top 20 differentially expressed genes (DEGs) for gene coexpression network.
Gene coexpression network for the differentially expressed genes (DEGs). (A) Gene coexpression network for the total DEGs; (B) gene coexpression network for the top 20 DEGs. The red nodes represented upregulated of genes, the blue nodes represented downregulated of genes.Top 20 differentially expressed genes (DEGs) for gene coexpression network.
Discussion
The development of pancreatic cancer is a complicated process and involving numerous alterations of genes. A comprehensive analysis of the molecular mechanism underlying pancreatic carcinogenesis is crucial for the management strategy of pancreatic cancer. To date, several studies reported the results of gene expression signatures in pancreatic cancer using microarray profiling, but some of them with less samples and the results were inconsistent. Therefore, combining data from multiple existing studies can increase the reliability and generalizability of results. Integrative analysis is an approach of gaining new insights on microarray data. In this study, we performed an integrative analysis to understand the mechanism using 3 microarray datasets of pancreatic cancer tissue.In the DEGs expression analysis, we found that there were 1383 upregulated and 942 downregulated DEGs after combing the data from the 3 datasets. Among these DEGs, MAPK3, CYP2E1, KCTD10, KIRREL, DPP10, and UNC80 were found to be the hub genes in the PPI analysis and coexpression. Other top DEGs, such as PLCG1, PLA2G4A, and IGF1R, were also implicated in the pathogenesis of several cancers. PLCG1 is a downstream effector signaling molecule for fibroblast growth factor receptor 1, and a study has shown that sonic hedgehog signaling promotes gastric cancer proliferation through induction of PLCG1-extracelluar regulated protein kinases 1/2.[ PLA2G4A is an enzyme that implicated in cancer cells proliferation. Targeting of PLA2G4A impedes cell cycle re-entry of quiescent prostate cancer cells.[ Agents targeting the IGF1R have shown antitumor activity. Cyclin dependent kinases 4/6 and IGF1 inhibitors synergized to suppress the growth of p16INK4A-deficient pancreatic cancers.[In the GO analysis, the significant GO terms were related to small molecule metabolic process, protein binding, and integral to membrane of the host cell. Like other cancers, it has been known that the pancreatic carcinogenesis involving many small molecules changed, the genes alteration subsequently leads to the change of the corresponding proteins. Abnormal expression of proteins interacting selectively with other proteins or form protein complex, which is a vital mechanism of cancer development.[ Besides, the abnormal expression protein integrals to membrane of the host cells, then results in the change of morphology and function of host cells, finally leads to the degeneration of normal cells.[With regard to the signaling pathway, we found that metabolic pathways, pancreatic secretion, and phagosome pathways were highly enriched, indicating that the DEGs of the microarray data were largely involving in these signaling pathways. Besides, in the pathway relation analysis, we found that the downregulated pathways: MAPK signaling pathway, apoptosis, and pathways in cancer; and the upregulated pathways: apoptosis, cell cycle, and Jak-STAT signaling pathway showed the most relationship with other pathways. In addition, as shown in Fig. 1, we also found that most of the pathways were downregulated during the pancreatic carcinogenesis. The role of MAPK and Jak-STAT signaling pathways have been well defined in the pancreatic cancer,[ and the cell apoptosis and cell cycle are the important process of normal cells develop into cancer cells.[ These results provided more evidences of the important signaling pathways interaction during the process of pancreatic carcinogenesis.In PPI network analysis, the results indicated that the significant hub gene (MAPK3 and CYP2E1) and their related pathways played key roles in the pancreatic carcinogenesis. MAPK directly interacts with scaffolding proteins, activators, and effectors, and these interactions also govern signaling specificity. The MAPK/ERK1/2 signaling pathway, which has been correlated with malignant carcinoma carcinogenesis, plays crucial roles in cell growth control, differentiation, proliferation, and apoptosis.[ It has been suggested that a novel compound capable of inhibiting MAPK/ERK1/2 activity would be a good potential candidate for pancreatic cancer.[ CYP2E1 is a member of the cytochrome P450 superfamily and is involved in the metabolic activation of many carcinogens, such as gastric cancer[ and lung cancer.[ These PPI network results demonstrated the crucial functional linkages between the DEGs in the pancreatic carcinogenesis.Gene–gene interactions is an important process in the regulation of disease pathogenesis. In a bioinformatics study, the coexpression network analysis would detect several gene–gene interactions modules from the overall network, while these coexpression genes in the modules could not be easily detected and clustered by the differential expression analysis.[ In general, the coexpressed genes tend to involve in the similar biological process, such as pathways, complexes, and those genes in a strong coexpression module (calculated by the degree value) present higher similarity of biological function than those from random gene pairs analysis.[ In this study, we identified 4 major gene modules, and KCTD10, KIRREL, DPP10, and UNC80 were the hub gene of each modules, indicating that these genes are key to the pancreatic carcinogenesis.KCTD10 belongs to the polymerase delta-interacting protein 1 gene family and is implicated in some tumors. Wang et al[ observed that the expression of proliferating cell nuclear antigen was decreased by knockdown of KCTD10 in lung adenocarcinama cells, and the cell proliferation was also inhibited. KCTD10 can be regulated by some transcription factors, including transcription factor specificity protein 1 and activating protein 2alpha, which regulated KCTD10 mRNA expression positively and negatively, respectively.[ Recently, a study observed that KCTD10 was regulated by E twenty-six variant 1, and silencing of KCTD10 could increase gastrointestinal stromal tumor cell proliferation and invasion, suggesting that KCTD10 was vital to the gastrointestinal stromal tumor pathogenesis.[KIRREL (also known as NEPH1) is a member of the nephrin-like protein family. KIRREL induces actin polymerization by transduction outside-in signals and recruitment of Grb2.[ KIRREL and nephrin can form a functional receptor complex during podocyte development and following podocyte injury,[ indicating that KIRREL is associated with these physiological or pathologic process.DPP10 is an inactive peptidase that modulates the electrophysiological properties, cell-surface expression, and subcellular localization of voltage-gated potassium channels. DPP10 is restricted expression, including the brain, adrenal gland, and pancreas, may serve as a marker in certain malignant states such as colorectal cancer and could have a prognostic significance.[ DPP10 has been linked to asthma susceptibility by several genome-wide association studies[ and recently impaired expressions of DPP10 gene in malignant mesothelioma[ has been reported.UNC80 is a component of the NALCN Na+ leak cation channel, which is a voltage-independent ion-channel complex. In heterologous expression systems, UNC80 and NALCN have been shown to interact.[ UNC79 and UNC80 also associate with each other, and UNC79 requires the presence of UNC80 to associate with NALCN. NALCN associates with UNC79 via UNC80 in the brain, and UNC79 influences UNC80 protein levels. Mutations in Nalcn, Unc79, or Unc80 lead to severe phenotypes that include neonatal lethality and disruption in rhythmic behaviors.[In summary, the present integrated analysis provided a comprehensive perspective to understand the molecular mechanism underlying pancreatic carcinogenesis, and identified some hub genes and pathways. The hub genes and pathways may be potential targets of treatment for pancreatic cancer. However, further investigations are remained necessary for unraveling the mechanism in the pathogenesis of pancreatic cancer.