
Mining the bladder cancer-associated genes by an integrated strategy for the construction and analysis of differential co-expression networks.

Abstract

BACKGROUND: Bladder cancer is the most common malignant tumor of the urinary system and is a heterogeneous disease with both superficial and invasive growth. However, its aetiology is still unclear, so it is essential to identify the key genes or modules that cause bladder cancer. Constructing differential co-expression networks (DCNs) from gene expression microarray datasets is an important method for investigating diseases, and several good tools are available, such as the R packages 'WGCNA' and 'DCGL'.
RESULTS: Employing an integrated strategy, we selected 36 up-regulated differentially expressed genes (DEGs) and 356 down-regulated DEGs. The main functions of those DEGs are cellular physiological process (24 up-regulated DEGs; 167 down-regulated DEGs) and cellular metabolism (19 up-regulated DEGs; 104 down-regulated DEGs). The up-regulated DEGs are mainly involved in pathways related to "metabolism". By comparing the two DCNs for the normal and cancer states, we found large changes in hub genes and topological structure, which suggests that the modules of the two DCNs differ considerably. In particular, we screened the hub genes of a differential subnetwork between the normal and cancer states and then performed bioinformatics analysis on them.
CONCLUSIONS: Through constructing and analyzing two differential co-expression networks at different states using the screened DEGs, we found some hub genes associated with the bladder cancer. The results of the bioinformatics analysis for those hub genes will support the biological experiments and the further treatment of the bladder cancer.


Year: 2015 PMID: 25707808 PMCID: PMC4331807 DOI: 10.1186/1471-2164-16-S3-S4

Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969

Background

Bladder cancer has the highest morbidity among cancers of the urinary system. Bladder cancer cells can spread by breaking away from the original tumor and travelling through the blood vessels to the liver, lungs and bones. However, its causes are not yet clear. Bladder cancer is a heterogeneous disease that shows both superficial and invasive growth [1,2]. Superficial tumors frequently recur and may progress to invasive growth. Apart from warranting better treatment regimes, bladder cancer is also a good model system for studying tumor initiation and progression. To gain insights into the molecular biology of these processes, we performed gene expression analyses to obtain information about bladder cancer-associated genes.
Systems biology is an emerging approach to biomedical and biological research. It is a biology-based interdisciplinary field of study that focuses on complex interactions within biological systems, using a holistic approach [3-5]. Network biology, a branch of systems biology, is a way of representing and analyzing biological information processing that treats life as a network. The differential co-expression network (DCN) is one kind of biological network. Gene co-expression networks have emerged as a holistic approach to microarray analysis [6-9]. Stuart et al. [6] and Bergmann et al. [7] separately constructed gene co-expression networks that connected genes whose expression profiles were similar across different organisms. Lee et al. [8] analyzed a human network with functional grouping and cluster analysis. van Noort et al. [9] demonstrated the small-world and scale-free architecture of the yeast co-expression network. These studies showed that functionally related genes are frequently co-expressed across organisms, constituting conserved transcription modules.
We wanted to explore transcriptional changes in terms of gene interactions rather than at the level of individual genes. To this end, we constructed two gene co-expression networks and looked for cancer-induced changes in the network. Identifying co-expressed pairs in tumor and normal tissues led to two distinct networks that represent the tumor and normal states, respectively. We expected that biological changes would be reflected in transcriptional changes, which could be identified by comparing the two co-expression networks. In transcriptome analysis, differential co-expression analysis (DCEA) is emerging as a unique complement to traditional differential expression analysis. DCEA investigates differences in gene interconnection by calculating the expression correlation changes of gene pairs between two conditions. The rationale is that changes in gene co-expression patterns between two contrasting phenotypes (e.g., healthy and disease) provide hints regarding the disrupted regulatory relationships or affected regulatory subnetworks specific to the phenotype of interest (in this case, the disease phenotype). Accordingly, one of the growing directions of DCEA is the so-called "differential regulation analysis" (DRA), which integrates transcription factor (TF)-to-target information to probe upstream regulatory events that account for the observed co-expression changes. Recently, researchers have integrated the differential co-expression and differential expression concepts to propose a Regulatory Impact Factor (RIF) that can be used to prioritize disease-causative TFs [10,11]. In addition, differential co-expression analyses of microRNAs have begun to appear [12,13]. Several tools have been developed for microarray-based expression analysis, such as the R packages "LIMMA" [14], "SAMR" [15] and "WGCNA" [16].
In our study, we collected microarray datasets of bladder cancer from GEO (http://www.ncbi.nlm.nih.gov/geo/) and analyzed them with an integrated strategy that combines functions of SAMR [15], WGCNA [16], Cytoscape [17] and other packages. We selected DEGs between the two states (normal and cancer) and constructed two DCNs. By comparing the DCNs of the two states, we found some hub genes associated with bladder cancer.

Results

The Affymetrix gene expression microarray datasets of bladder cancer [GEO: GSE3167] were downloaded from the GEO database of NCBI [18]. The dataset has 18 samples: 9 [GSM71019-GSM71027] are from normal tissues and the other 9 are from cancer tissues. The datasets were processed with an integrated strategy: DEGs were selected and used to construct two DCNs. Finally, the two DCNs were analyzed and some hub genes were found by comparing the DCNs of the two states.

Normalization of the microarray datasets

To obtain high-quality, strongly expressed genes for the subsequent data processing, we normalized the microarray dataset using medians (Additional file 1 compares the data before and after normalization). After normalization, the expression values are more comparable across samples. We also examined the distribution of the expression datasets (Figure 1).
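The text does not specify the exact median-based procedure, so the following is only a minimal sketch of one common variant: scaling each sample so that its median matches the median of all sample medians. The function name `median_normalize`, the dictionary layout and the toy values are assumptions made for illustration and are not taken from GSE3167.

```python
from statistics import median

def median_normalize(samples):
    """Scale each sample so that its median equals the median of all sample medians.

    `samples` maps a sample name to a list of expression values
    (hypothetical layout; assumes every sample median is non-zero).
    """
    medians = {name: median(values) for name, values in samples.items()}
    target = median(medians.values())
    return {
        name: [v * target / medians[name] for v in values]
        for name, values in samples.items()
    }

# Toy data for illustration only.
raw = {
    "normal_1": [2.0, 4.0, 6.0],
    "cancer_1": [4.0, 8.0, 12.0],
}
normalized = median_normalize(raw)
```

After scaling, both toy samples share the same median, which is the property a box plot or Q-Q plot of the normalized data would be expected to reflect.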

Figure 1

The Q-Q plot of the expression datasets.

Selection and bioinformatical analysis of DEGs

After preprocessing the microarray datasets, including the normalization above, DEGs were selected using the R package "SAMR" (nperms, the number of permutations used to estimate false discovery rates, = 100; del, the value of delta that defines the cutoff rule, = 2.5). In total, 36 up-regulated genes and 356 down-regulated genes were picked out (see Additional file 2). Next, we performed bioinformatics analysis of those DEGs, including GO function enrichment with the online tool AmiGO [19]. Part of the GO enrichment results is shown in Figure 2. The main functions of the up-regulated DEGs are nitrogen compound metabolic process (GO:0006807) (20 genes), heterocycle metabolic process (GO:0046483) (20 genes), cellular aromatic compound metabolic process (GO:0006725) (20 genes) and regulation of metabolic process (GO:0019222). The down-regulated DEGs are mostly involved in ion binding (GO:0043167) (158 genes), multicellular organismal process (GO:0032501) (127 genes), single-multicellular organism process (GO:0044707) (121 genes) and response to stimulus (GO:0050896) (158 genes).

Figure 2

Part of the GO function enrichment results of DEGs associated with bladder cancer.

We used the online tool GATHER [20] for pathway enrichment (Figure 3). The up-regulated DEGs are mostly involved in pathways related to "metabolism". The down-regulated DEGs, however, are involved in the toll-like receptor signaling pathway and gamma-hexachlorocyclohexane degradation in addition to two metabolism-associated pathways.

Figure 3

The KEGG enrichment results of DEGs associated with bladder cancer.

We also investigated the clustering of the DEGs associated with bladder cancer. The heatmap of the DEGs is shown in Figure 4. The heatmap shows that the clustering correctly divides the samples into two classes.

Figure 4

The heatmap of DEGs associated with the bladder cancer.

Construction and analysis of two DCNs

We first calculated an adjacency matrix of the DEGs for the normal state and for the cancer state, using the Pearson correlation of the gene expression values. If the adjacency value of a gene pair is greater than 0.8, the two genes are connected by an edge in the DCN. Using the package "WGCNA", we constructed a normal DCN and a cancer DCN (Figure 5). In the cancer state, only a few up-regulated DEGs take part in co-expression relations. Comparing the two DCNs, we found that the shortest path length distribution (Additional file 3) changes little from the normal to the cancer state. However, the average clustering coefficient distribution (Figure 6) and the topological coefficients (Additional file 4) under the normal condition differ from those under the cancer condition, which indicates that the modules in the two DCNs differ considerably.
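The edge rule described above can be sketched in a few lines. This is not the WGCNA implementation; it is a plain-Python illustration that assumes the adjacency is the raw (signed) Pearson correlation and that an edge is kept when it exceeds 0.8. The names `pearson` and `coexpression_edges` and the toy expression values are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def coexpression_edges(expr, threshold=0.8):
    """Return (gene1, gene2, r) for every gene pair with correlation above `threshold`."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if r > threshold:
            edges.append((g1, g2, r))
    return edges

# Toy expression profiles (one value per sample), for illustration only.
expr_normal = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
edges = coexpression_edges(expr_normal)
```

Running the same function once on the normal samples and once on the cancer samples would give the two edge lists that are compared in the rest of this section. If the absolute correlation is intended instead, `r` would be replaced by `abs(r)` in the comparison.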

Figure 5

The two DCNs at the normal and cancer states. The blue ellipses represent down-regulated DEGs and the red ellipses represent up-regulated DEGs. Compared with the normal DCN, the cancer DCN contains fewer up-regulated DEGs. In addition, the structures of the two DCNs differ considerably.

Figure 6

The average clustering coefficient distribution of the two DCNs. For the normal DCN, the average clustering coefficient distribution is larger, implying that the cancer DCN has more modules than the normal DCN.

Next, we looked for differential links (edges) between the two DCNs. We first set two thresholds, T1 and T2. If the Pearson correlation of a gene pair is less than T1 (set to 0.3) in the normal state but greater than T2 (set to 0.8) in the cancer state, the link between the gene pair is defined as a differential link. We computed the significance of the differential links using a permutation test (p-value < 2.2e-16). All the selected differential links compose a differential co-expression subnetwork (Figure 7). In Figure 7, the blue ellipses represent down-regulated DEGs and the red ellipses represent up-regulated DEGs. Node size increases with node degree. The 6 biggest nodes represent the hub genes: "GDF9", "CYP1A2", "ATF7", "TRPM3", "CER1", "PTPRJ", "KCNIP1", and "LRRC15". These hub genes are mainly involved in the following biological processes: cellular response to stimulus (GO:0051716) (5 genes), regulation of biological process (GO:0050789) (7 genes), response to stimulus (GO:0050896) (6 genes), multi-organism process (GO:0051704) (4 genes), and regulation of localization (GO:0032879) (4 genes). They mostly take part in five pathways: gamma-Hexachlorocyclohexane degradation (path:hsa00361), Fatty acid metabolism (path:hsa00071), Adherens junction (path:hsa04520), Tryptophan metabolism (path:hsa00380), and Wnt signaling pathway (path:hsa04310).
Among them, "CYP1A2" and "PTPRJ" have been reported to be associated with bladder cancer [21-24].
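The differential-link rule with thresholds T1 = 0.3 and T2 = 0.8, followed by ranking nodes by degree to find hub candidates, can be illustrated as follows. This is a minimal sketch on toy data, not the authors' code: the function names `differential_links` and `node_degrees` are hypothetical, and the permutation test for significance is omitted because the text does not describe its details.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def differential_links(expr_normal, expr_cancer, t1=0.3, t2=0.8):
    """Gene pairs with correlation below t1 in normal but above t2 in cancer."""
    links = []
    for g1, g2 in combinations(sorted(expr_normal), 2):
        r_normal = pearson(expr_normal[g1], expr_normal[g2])
        r_cancer = pearson(expr_cancer[g1], expr_cancer[g2])
        if r_normal < t1 and r_cancer > t2:
            links.append((g1, g2))
    return links

def node_degrees(links):
    """Count how many differential links touch each gene."""
    degree = Counter()
    for g1, g2 in links:
        degree[g1] += 1
        degree[g2] += 1
    return degree

# Toy profiles for illustration only (four samples per state).
expr_normal = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [4.0, 1.0, 3.0, 2.0],
    "geneC": [2.0, 4.0, 1.0, 3.0],
    "geneD": [8.0, 2.0, 6.0, 4.0],
}
expr_cancer = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [3.0, 1.0, 4.0, 2.0],
    "geneD": [1.1, 1.9, 3.2, 3.8],
}
links = differential_links(expr_normal, expr_cancer)
degrees = node_degrees(links)
hub, hub_degree = degrees.most_common(1)[0]
```

In this toy example geneB and geneD are already strongly correlated in the normal state, so their pair is not counted as a differential link even though it is also correlated in the cancer state; geneA ends up with the highest degree.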

Figure 7

Differential co-expression subnetwork of the two DCNs. The blue ellipses represent down-regulated DEGs and the red ellipses represent up-regulated DEGs. Node size increases with node degree. The 6 biggest nodes represent the hub genes in the subnetwork.

We then detected the modules of each DCN. Several methods based on clustering algorithms exist [25-32]. Here, we adopted another approach [33]. To detect the modules of the two DCNs under different conditions, we began by calculating the topological overlap matrix (TOM) [33] of the expression datasets. The topological overlap of two nodes reflects their similarity in terms of the commonality of the nodes they connect to. To test the inference that the modules differ between the two states, we clustered the TOM similarities from the two conditions (Additional file 5). Additional file 5 shows that the modules at the two states differ greatly. To observe the module changes, we plotted the module heatmaps of the DEGs at the two states; parts of them are shown in Figure 8.

Figure 8

Part of the modules heatmap of DEGs at two different states. The colors of grids get deeper with the correlation value and so the genes corresponding to the grids with close colors can be considered to be in the same module.

Discussion

The development of molecular markers for tumor classification and of expression signatures that predict outcome will greatly improve the diagnosis and treatment of bladder cancer. We employed the R package "SAMR" to select 392 DEGs, of which 36 are up-regulated and 356 are down-regulated. The GO function enrichment results (Additional file 6, Additional file 7) show that the main functions of the DEGs are cellular physiological process (24 up-regulated DEGs; 167 down-regulated DEGs) and cellular metabolism (19 up-regulated DEGs; 104 down-regulated DEGs), which is reasonable for the bladder. However, it is not clear why the numbers of up-regulated and down-regulated DEGs differ so much. We used the R package "WGCNA" to construct two DCNs under different conditions and the tool "Cytoscape" to visualize and analyze them. Some hub genes were found and analyzed bioinformatically. The hub genes of the normal DCN are mainly involved in the neuroactive ligand-receptor interaction pathway, and their GO functions are mostly response to biotic stimulus, response to stimulus and response to external stimulus. The hub genes of the cancer DCN are involved in three KEGG pathways: gamma-Hexachlorocyclohexane degradation, Fatty acid metabolism and Tryptophan metabolism. Their main GO functions are cellular physiological process, surface receptor linked signal transport and signal transduction. In addition, the clustering plots and heatmaps revealed differences between the modules of the two DCNs. The functions of the differing modules are still unknown to us and will be the subject of our future work. We found several hub genes in the selected differential co-expression subnetwork of the two DCNs. Two of them have been reported to be associated with bladder cancer. Whether the other hub genes are also associated with bladder cancer needs to be validated through biological experiments.

Conclusions

In this work, we adopted an integrated strategy to analyze bladder cancer-associated genes by combining several R packages, Gene Ontology and KEGG. The experimental results suggest that bladder cancer results from abnormal signaling pathways involving many genes. By mining gene expression microarray data, we found a differential co-expression subnetwork and its hub genes. The main GO functions and pathways of the hub genes help us better understand the development of bladder cancer, which will support wet-lab biological experiments and may further promote the prevention, diagnosis, treatment and cure of bladder cancer in the future.

Methods

Selecting differentially expressed genes

We adopted the method called "Significance analysis of microarrays (SAM)" [15] to pick out the DEGs. The selection approach is based on analysis of random fluctuations in the data. To account for gene-specific fluctuations, the authors of SAM defined a statistic based on the ratio of the change in gene expression to the standard deviation in the data for that gene. The "relative difference" $d(i)$ in gene expression is:

$$d(i) = \frac{\bar{x}_I(i) - \bar{x}_U(i)}{s(i) + s_0}$$

where $\bar{x}_I(i)$ and $\bar{x}_U(i)$ are defined as the average levels of expression for gene $i$ in states I and U, respectively, and $s_0$ is the small positive constant that SAM adds to the denominator to stabilize $d(i)$ for genes with small $s(i)$. The "gene-specific scatter" $s(i)$ is the standard deviation of repeated expression measurements:

$$s(i) = \sqrt{a \left\{ \sum_m \left[ x_m(i) - \bar{x}_I(i) \right]^2 + \sum_n \left[ x_n(i) - \bar{x}_U(i) \right]^2 \right\}}$$

where $\sum_m$ and $\sum_n$ are summations of the expression measurements in states I and U, respectively, $a = (1/n_1 + 1/n_2)/(n_1 + n_2 - 2)$, and $n_1$ and $n_2$ are the numbers of measurements in states I and U. To find significant changes in gene expression, genes were ranked by the magnitude of their $d(i)$ values, so that $d(i)$ was the $i$th largest relative difference. For each of the $N$ balanced permutations, relative differences $d_p(i)$ were also calculated, and the genes were again ranked such that $d_p(i)$ was the $i$th largest relative difference for permutation $p$. The expected relative difference, $d_E(i)$, was defined as the average over the $N$ balanced permutations:

$$d_E(i) = \frac{1}{N} \sum_p d_p(i)$$

To identify potentially significant changes in expression, they used a scatter plot of the observed relative difference $d(i)$ vs. the expected relative difference $d_E(i)$. For the vast majority of genes, $d(i) \cong d_E(i)$, but some genes are represented by points displaced from the $d(i) = d_E(i)$ line by a distance greater than a threshold $\Delta$, and these genes were called "significant genes". The method for setting thresholds provides asymmetric cutoffs for induced and repressed genes. The alternative is the standard t test, which imposes a symmetric horizontal cutoff, with $d(i) > c$ for induced genes and $d(i) < -c$ for repressed genes.
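As a worked illustration of the relative difference, the sketch below computes d(i) for a single gene from its measurements in the two states. It follows the definitions of the gene-specific scatter s(i) and the factor a given above. The function name `sam_relative_difference` is hypothetical, and the parameter `s0` stands for the small positive constant that SAM adds to the denominator; its default of 0.0 here is an assumption made only to keep the arithmetic easy to check, whereas SAM itself chooses this constant from the data.

```python
from math import sqrt

def sam_relative_difference(state_i, state_u, s0=0.0):
    """Relative difference d(i) of one gene between states I and U.

    `state_i` and `state_u` are lists of repeated expression measurements.
    `s0` is a placeholder for SAM's stabilizing constant (assumed value).
    """
    n1, n2 = len(state_i), len(state_u)
    mean_i = sum(state_i) / n1
    mean_u = sum(state_u) / n2
    a = (1 / n1 + 1 / n2) / (n1 + n2 - 2)
    squared_deviations = (
        sum((x - mean_i) ** 2 for x in state_i)
        + sum((x - mean_u) ** 2 for x in state_u)
    )
    scatter = sqrt(a * squared_deviations)  # gene-specific scatter s(i)
    return (mean_i - mean_u) / (scatter + s0)

# Toy measurements for one gene, for illustration only.
d = sam_relative_difference([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
```

For these toy values the means are 6 and 2, a = 1/6 and the summed squared deviations equal 4, so d = 4 / sqrt(4/6), which is sqrt(24). Repeating the calculation after permuting the sample labels and averaging the ranked values would give the expected relative difference used in the SAM scatter plot.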

Detecting modules in differential co-expression networks

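We first needed to construct topological overlap matrices (TOM) [33,34], which measure pair-wise similarity between nodes. The starting point is a network encoded by its adjacency matrix $A = [a_{ij}]$, which is symmetric with binary entries. By convention, the diagonal elements are assumed to be zero. The topological overlap of two nodes reflects their similarity in terms of the commonality of the nodes they connect to. Ravasz et al. [35] define the topological overlap matrix $T = [t_{ij}]$ as follows:

$$t_{ij} = \begin{cases} \dfrac{l_{ij} + a_{ij}}{\min\{k_i, k_j\} + 1 - a_{ij}} & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases} \qquad (1)$$

where $l_{ij} = \sum_u a_{iu} a_{uj}$, $k_i = \sum_u a_{iu}$, and the index $u$ runs across all nodes of the network. Yip and Horvath [33] generalized the TOM of Ravasz et al. [35] by observing that formula (1) can be rewritten as follows:

$$t_{ij} = \begin{cases} \dfrac{|N_1(i) \cap N_1(j)| + a_{ij}}{\min\{|N_1(i)|, |N_1(j)|\} + 1 - a_{ij}} & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases} \qquad (2)$$

where $N_1(i)$ denotes the set of neighbors of $i$ excluding $i$ itself and $|\cdot|$ denotes the number of elements (cardinality) of its argument. The quantity $|N_1(i) \cap N_1(j)|$ measures the number of common neighbors that nodes $i$ and $j$ share, whereas $|N_1(i)|$ gives the number of neighbors of $i$. Let $N_m(i)$ (with $m > 0$) denote the set of nodes (excluding $i$ itself) that are reachable from $i$ within a path of length $m$, i.e.,

$$N_m(i) = \{ j \neq i \mid \mathrm{dist}(i, j) \leq m \}$$

where $\mathrm{dist}(i, j)$ is the geodesic distance between $i$ and $j$. A very natural generalization of the TOM then reads as follows:

$$t_{ij}^{[m]} = \begin{cases} \dfrac{|N_m(i) \cap N_m(j)| + a_{ij}}{\min\{|N_m(i)|, |N_m(j)|\} + 1 - a_{ij}} & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases} \qquad (3)$$

The matrix $T^{[m]} = [t_{ij}^{[m]}]$ is called the $m$-th order generalized topological overlap matrix (GTOMm). This quantity simply measures the agreement between the nodes that are reachable from $i$ and from $j$ within $m$ steps.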
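The first-order topological overlap can be computed directly from a binary adjacency matrix. The sketch below is a plain-Python illustration of the Ravasz-style definition (shared neighbors plus the direct link, divided by the smaller degree plus one minus the direct link); it is not the WGCNA implementation, and the function name `topological_overlap` and the toy network are assumptions.

```python
def topological_overlap(adj):
    """First-order topological overlap matrix.

    `adj` is a symmetric 0/1 adjacency matrix (list of lists) with a zero diagonal.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    tom = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                tom[i][j] = 1.0  # a node fully overlaps with itself by definition
                continue
            # Number of neighbors shared by nodes i and j.
            shared = sum(adj[i][u] * adj[u][j] for u in range(n))
            tom[i][j] = (shared + adj[i][j]) / (
                min(degree[i], degree[j]) + 1 - adj[i][j]
            )
    return tom

# Toy network with edges 0-1, 0-2, 1-2 and 2-3, for illustration only.
adjacency = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
tom = topological_overlap(adjacency)
```

In the toy network, nodes 0 and 1 are linked and share node 2 as a neighbor, so their overlap is 1.0, while nodes 0 and 3 are not linked and share only node 2, giving 0.5. Using 1 minus the overlap as a distance and clustering it hierarchically is the usual way such a matrix is turned into modules.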

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

Su-Ping Deng designed the experiments, implemented the design and wrote the manuscript. Lin Zhu collected and preprocessed the experimental datasets. De-Shuang Huang supervised and directed the work. Zhu-Hong You revised the manuscript and programmed part of the R code.

27 in total

1. Hierarchical organization of modularity in metabolic networks.

Authors: E Ravasz; A L Somera; D A Mongru; Z N Oltvai; A L Barabási
Journal: Science Date: 2002-08-30 Impact factor: 47.728

2. Robust classification method of tumor subtype by using correlation filters.

Authors: Shu-Lin Wang; Yi-Hai Zhu; Wei Jia; De-Shuang Huang
Journal: IEEE/ACM Trans Comput Biol Bioinform Date: 2011-10-17 Impact factor: 3.710

3. Regulatory impact factors: unraveling the transcriptional regulation of complex traits from expression data.

Authors: Antonio Reverter; Nicholas J Hudson; Shivashankar H Nagaraj; Miguel Pérez-Enciso; Brian P Dalrymple
Journal: Bioinformatics Date: 2010-02-09 Impact factor: 6.937

4. Prediction of protein-protein interactions based on protein-protein correlation using least squares regression.

Authors: De-Shuang Huang; Lei Zhang; Kyungsook Han; Suping Deng; Kai Yang; Hongbo Zhang
Journal: Curr Protein Pept Sci Date: 2014 Impact factor: 3.272

5. Expression of S100A4 in renal epithelial neoplasms.

Authors: Li J Wang; Andres Matoso; Katherine T Sciandra; Evgeny Yakirevich; Edmond Sabo; Ying Zhang; Patricia A Meitner; Rosemarie Tavares; Lelia Noble; Gyan Pareek; Ronald A DeLellis; Murray B Resnick
Journal: Appl Immunohistochem Mol Morphol Date: 2012-01

6. CYP1A2 polymorphisms, occupational and environmental exposures and risk of bladder cancer.

Authors: Sofia Pavanello; Giuseppe Mastrangelo; Donatella Placidi; Marcello Campagna; Alessandra Pulliero; Angela Carta; Cecilia Arici; Stefano Porru
Journal: Eur J Epidemiol Date: 2010-06-18 Impact factor: 8.082

7. APIS: accurate prediction of hot spots in protein interfaces by combining protrusion index with solvent accessibility.

Authors: Jun-Feng Xia; Xing-Ming Zhao; Jiangning Song; De-Shuang Huang
Journal: BMC Bioinformatics Date: 2010-04-08 Impact factor: 3.169

8. Studying the differential co-expression of microRNAs reveals significant role of white matter in early Alzheimer's progression.

Authors: Malay Bhattacharyya; Sanghamitra Bandyopadhyay
Journal: Mol Biosyst Date: 2013-01-23

9. A differential wiring analysis of expression data correctly identifies the gene containing the causal mutation.

Authors: Nicholas J Hudson; Antonio Reverter; Brian P Dalrymple
Journal: PLoS Comput Biol Date: 2009-05-01 Impact factor: 4.475

10. Whole miRNome-wide differential co-expression of microRNAs.

Authors: Cord F Stäehler; Andreas Keller; Petra Leidinger; Christina Backes; Anoop Chandran; Jöerg Wischhusen; Benjamin Meder; Eckart Meese
Journal: Genomics Proteomics Bioinformatics Date: 2012-08-23 Impact factor: 7.691
