Literature DB >> 24550935

The gene regulatory network for breast cancer: integrated regulatory landscape of cancer hallmarks.

Frank Emmert-Streib1, Ricardo de Matos Simoes1, Paul Mullan2, Benjamin Haibe-Kains3, Matthias Dehmer4.   

Abstract

In this study, we infer the breast cancer gene regulatory network from gene expression data. This network is obtained from the application of the BC3Net inference algorithm to a large-scale gene expression data set consisting of 351 patient samples. In order to elucidate the functional relevance of the inferred network, we are performing a Gene Ontology (GO) analysis for its structural components. Our analysis reveals that most significant GO-terms we find for the breast cancer network represent functional modules of biological processes that are described by known cancer hallmarks, including translation, immune response, cell cycle, organelle fission, mitosis, cell adhesion, RNA processing, RNA splicing and response to wounding. Furthermore, by using a curated list of census cancer genes, we find an enrichment in these functional modules. Finally, we study cooperative effects of chromosomes based on information of interacting genes in the beast cancer network. We find that chromosome 21 is most coactive with other chromosomes. To our knowledge this is the first study investigating the genome-scale breast cancer network.

Entities:  

Keywords:  BC3Net; GPEA; breast cancer; computational genomics; gene regulatory network; statistical inference

Year:  2014        PMID: 24550935      PMCID: PMC3909882          DOI: 10.3389/fgene.2014.00015

Source DB:  PubMed          Journal:  Front Genet        ISSN: 1664-8021            Impact factor:   4.599


1. Introduction

Breast cancer is among the most common cancers with worldwide more than 1,300,000 cases each year (Cancer and Atlas, 2012). Among these cases, ductal carcinoma, a particular subtype of breast cancer, represents up to 25% of all newly diagnosed patients in the United States (Wickerham and Julian, 2013). In general, breast cancer is derived from epithelial cells that develop neoplasia in breast tissue. In the malignant case these cells are invasive and can become metastatic. It is known that the major cancer hallmarks, common to all mammalian cancer forms, are related to cell differentiation, proliferation and cell apoptosis processes that are associated to the deregulation of the cell cycle and impairment of DNA repair processes (Hanahan and Weinberg, 2000, 2011a). This makes cancer a disease of pathways (Hanahan and Weinberg, 2000). Unfortunately, the underlying molecular interactions of these processes are to-date not well understood and the corresponding network of the mechanistic interplay and physical interactions between individual genes, their products, proteins and metabolites is underexplored. A reason for this lack of knowledge is due to the fact that most pathways do not have a chain-like structure, but are complex connected to regulate particular cellular processes and responses. This makes cancer a complex disease, difficult to study as it can not be traced back to an individual component, e.g., a protein in the network. For this reason, cancer genes need to be understood as being part of a complex network and the malfunction of a process may be caused by inadequate interactions (Kitano, 2004). Although major efforts have been made to identify important interaction partners in various cancer types (Basso et al., 2005; Madhamshettiwar et al., 2012; de Matos Simoes et al., 2013a), the actual interaction structure of genome-scale networks is far from being known. In general, genes involved in the development and progression of cancer represent a broad class of proteins such as transcription factors, chromatin remodelers, growth factors (e.g., EGFR), growth factor receptors (e.g., HER2/neu), signal transducers, regulators of apoptosis and DNA repair genes (Croce, 2008). The individual key players of cancer progression are classified as (A) oncogenes, (B) tumor suppressor genes and (C) genomic stability genes (Vogelstein and Kinzler, 2004). These genes are playing a key role in the regulation of the cell-cycle, proliferation and cell differentiation, and in the regulation of apoptosis (Croce, 2008). Specifically, oncogenes accumulate particular mutations that lead to a constitutive structural active form of the protein. In contrast, specific mutations in tumor suppressor genes (e.g., APC) lead to an inactivation or decreased activity of the protein. Stability genes include proteins involved in DNA repair, mitotic recombination and chromosome segregation (e.g., BRCA1). It is important to note that cancer genes are mostly identified by their genetic alterations such as germline or somatic mutations, which are sequenced from tumor tissues (Sjoblom et al., 2006). Specifically, mutations in cancer genes can be somatic or germlime genomic single nucleotide substitutions, deletions, insertions (Futreal et al., 2004) or mutations of larger DNA segments that are amplified, translocated, deleted or inserted (Mitelman, 2000). Due to complex nature of breast cancer, we are pursuing in this paper a systems approach (Alon, 2006; Hernandez-Lemus, 2013) based on gene regulatory networks. Specifically, a gene regulatory network (GRN) is a description of the complex molecular interactions between individual genes and their products (Hecker et al., 2009; Emmert-Streib et al., 2012a,b). Statistically, gene regulatory networks are inferred from large-scale gene expression data from a variety of cancer tissue samples and many contemporary inference methods are based on estimates of mutual information (Li, 1990; Steuer et al., 2002; Meyer et al., 2008; Emmert-Streib et al., 2012a). Generally, biological networks have been analyzed structurally by using entropy-based network measures and other quantities which characterize the underlying graph topology uniquely (Mueller et al., 2011; Dehmer et al., 2012). The major goal of this paper is to use the BC3NET method (de Matos Simoes and Emmert-Streib, 2012) for inferring a breast cancer gene regulatory network. Specifically, we are using 351 breast cancer samples from the Expression Project for Oncology (expO) (http://www.intgen.org/expo/) microarray database maintained by the International Genomics Consortium (IGC). For this breast cancer gene regulatory network, we perform a systems analysis with respect to its functional and structural features. Furthermore, we study the role of known general cancer genes and specific breast cancer genes in the breast cancer network. Finally, we investigate cooperative effects between chromosomes by relating interactions back to their chromosomal positions. This paper is organized as follows: In the next section, we describe all methods and data we are using for our analysis. In the results section, we present our findings and in the section “Discussion” we interpret our results. The paper finishes with the section “Conclusions” with a summary.

1.1. Author summary

What types of biological networks have been inferred in the paper? We use gene expression data from breast cancer samples and infer a gene regulatory network (GRN). How was the quality/utility of the inferred networks assessed? We assess the biological validity of the inferred GRN by using the Gene Ontology database. How were these networks validated? The GRN is analyzed computationally and statistical hypotheses testing is employed to test various hypotheses about the network structure and the biological function of the investigated GRN of breast cancer.

2. Methods

2.1. Breast cancer gene expression data

For our study, we use 351 breast cancer tissue samples from the Expression Project for Oncology (expO) (http://www.intgen.org/expo/) microarray database maintained by the International Genomics Consortium (IGC). ExpO provides breast cancer samples from histologically determined tumors of various types, see Table 1 for an overview, whereas the majority of samples (over 80%) is from ductual carcinoma across various grades (1, 2, and 3) and stages (1, 2A, 2B, 3A, 3B, 3C, and 4). The majority of patients are in the age group between 40 and 70 (238/351 patients). Most of the patient samples have a caucasian ethnic background (314/351 samples). A total of 136/351 patients were without family history of cancer and 213/351 were with a family history of cancer (two samples have unknown family history of cancer). 346 samples are from female and 5 from male gender. The 351 breast cancer Affymetrix hgu133plus2 microarray samples in CEL format were obtained from the GEO NCBI repository (accession number GSE2109) (Edgar et al., 2002).
Table 1

Overview of histological annotations of the 351 samples of the expO breast cancer gene expression compendium.

Histology descriptionSamples
Ductal carcinoma270
Lobular carcinoma39
Invasive ductal carcinoma7
Mucinous carcinoma6
Ductal and lobular carcinoma4
Metaplastic carcinoma3
Metaplastic squamous carcinoma3
Cribiform carcinoma2
Inflammatory carcinoma2
Intraductal carcinoma2
Intraductal papillary carcinoma2
Medullary carcinoma, NOS2
Adenoid cystic carcinoma1
Colloid adenocarcinoma1
Ductal carcinoma in situ (DCIS) with focal comedo carcinoma1
Intracystic carcinoma1
Invasive ductal carcinoma, mucinous type1
Lobular carcinoma in situ1
Papillary carcinoma1
Papillary carcinoma (predominantly micropapillary pattern)1
Unknown1
Overview of histological annotations of the 351 samples of the expO breast cancer gene expression compendium.

2.1.1. Preprocessing and normalization

We normalize the microarray samples for the selected tissue types using RMA and quantile normalization (Irizarry et al., 2003) using log2 expression intensities for each probe set. Because a gene can be represented by more than one probe sets, we use the median expression value as summary statistic for different probe sets. Entrez gene ID to Affymetrix probe set annotation is obtained from the “hgu133plus2.db” R package. If a probe sets is unmapped, we exclude it from our analysis. After these preprocessing steps, we have 19,738 genes and 351 samples we use for our analysis.

2.2. BC3NET

In order to infer the gene regulatory network for the gene expression data from breast cancer, we use the BC3NET algorithm (de Matos Simoes and Emmert-Streib, 2012) to infer a mutual information based gene regulatory network. In the following, we denote this network briefly as GBC3Net. The gene regulatory network GBC3Net is inferred from a bootstrap ensemble generated from a single gene expression dataset D. In the first step of the procedure mutual information values among all gene pairs are estimated using the Pearson estimator. In the second step, for each gene at most one gene is selected for each of the p genes in a given dataset to contribute at most one edge to the inferred network. In overall p different null hypotheses for mutual independency are tested. In the third step we control the type one error by applying a Bonferroni multiple testing procedure. This results in a network G that is inferred for each k of 100 Bootstrap datasets. For each generated dataset in the ensemble, D, a network, G, is inferred using C3NET (Altay and Emmert-Streib, 2010). From {G} an aggregated network G is obtained whose edges are used as test statistics to obtain the final network G. The test statistic for each edge is used for a binomial test to test for significance of the connection between gene pairs BC3NET. If a connection between a gene pair is statistically significant (α ≤ 0.05) they are connected by and edge, otherwise there is no connection.

2.3. Cancer gene census

We use the complete list of the Cancer Gene Census (CGC) (Futreal et al., 2004) (Version 2011−03−22, Table_1_full_2011−03−22) (http://www.sanger.ac.uk/genetics/CGP/Census/). The CGC list comprises a total of 457 cancer genes. From these 457 genes, 435 are present in the expO breast cancer gene expression data set. The manually curated CGC list contains genes reported in the literature having cancer associated somatic or germline non-synonymous substitutions, insertions and deletions in coding regions or splice sites and genes affected by chromosomal rearrangements or copy number variations (Futreal et al., 2004).

2.4. Degree distribution

In order to assess the global connectivity of the inferred breast cancer network we estimate its degree distribution for a power-law distribution (Barabási and Albert, 1999; Newman, 2005). The degree distribution of a power-law follows whereas α is the exponent of the power-law distribution.

2.5. GPEA: gene pair enrichment analysis

In order to test the enrichment of Gene Ontology (GO)-terms in the inferred breast cancer network, we are applying a hypergeometric test for edges (gene pairs), instead of genes. For this reason, this analysis is called gene pair enrichment analysis (GPEA) (de Matos Simoes and Emmert-Streib, 2012; de Matos Simoes et al., 2013b). For p genes there is a total of N = p(p − 1)/2 different gene pairs. If there are pGO genes for a particular GO-term then the total number of gene pairs for this GO-term is mGO = pGO(pGO − 1)/2. Suppose the inferred breast cancer network contains n edges, of which k are edges are among genes from the given GO-term. Then a p-value for the enrichment of this GO-term can be calculated from a hypergeometric distribution by Here the p-value gives an estimate for the probability to observe k or more edges between genes from the given GO-term. We access the GO annotation for entrez identifiers from the Bioconductor (Gentleman et al., 2004) annotation packages org.Hs.eg.db (v2.9.0) and GO.db (v2.9.0).

3. Results

3.1. Breast cancer gene regulatory network

Using the expO data set and the BC3NET algorithm, we infer a breast cancer gene regulatory network (GRN). In the following, we denote this network briefly as GBC3Net. This regulatory network consists of 19,738 genes and contains 180,171 interactions (edges) among these genes. With the exception of 15 genes the overall network is connected, i.e., we can always find a path that connects a pair of genes with each other. Technically, this means that the giant connected component (GCC) (Dorogovtesev and Mendes, 2003) of our breast cancer GRN has a size of 19,723 genes. For this network, we find an average shortest path length of 4.11 and its edge density is ϵ = 9.2 · 10−4. Here we measure a shortest path by means of the Dijkstra distance (Dijkstra, 1959). Furthermore, we find the degree distribution of the network to follow a power law distribution with an exponent of α = 3.48. This indicates that the resulting network is a scale-free network (Barabási and Albert, 1999) as found for many different types of biological networks (Bornholdt and Schuster, 2003; van Noort et al., 2004; Albert, 2005; Basso et al., 2005).

3.2. Functional analysis of biological processes using GPEA

In order to evaluate the inferred breast cancer GRN biologically, we use the GO database (Ashburner et al., 2000). Specifically, we evaluate our network based on functional knowledge about genes that are involved in similar biological processes. We are interested to identify which functional modules are most prominently represented in our inferred breast cancer network under the assumption that functionally related genes are likely to interact with each other. Furthermore, we want to identify which known cancer genes are represented (enriched) in those functional modules. This will shed light on the role and importance of cancer genes in the breast cancer network. We conduct this functional analysis of the breast cancer network by using the GPEA (gene pair enrichment analysis; see “Methods” section) method. The results of this analysis are shown in Table 2. Briefly, a GPEA analysis identifies GO-terms with an enriched number of interactions among genes from the same GO category. We correct for multiple testing using a Bonferroni correction for a significance level of α = 0.05. In order to assess the role of census genes for the individual GO-terms we counted the number of census genes present in each GO-term. For the analysis, we consider a total of 7989 GO-terms from the category Biological Process, with a term size larger than 2 and less than 1000 genes. In total, we find 632 enriched GO-terms (12.64%). The 50 most significant terms of the GPEA analysis are shown in Table 2. As one can see, the significant terms describe a variety of biological processes such as mitotic cell cycle (1031 edges), cell cycle phase (1142 edges), mRNA translation such as translational elongation (218 edges), termination (191 edges) and initiation (226 edges), protein targeting to ER (196 edges), viral transcription (193 edges), protein complex disassembly (197 edges), regulation of immune system process (827 edges), innate immune response (368 edges), cell adhesion (867 edges) and type I interferon-mediated signaling pathway (71 edges).
Table 2

GPEA analysis of the breast cancer gene regulatory network for GO biological processes.

GOIDGO termSizeEdgesp-valueCG
GO:0000278Mitotic cell cycle77610312.3e-26054/+
GO:0006414Translational elongation1082186.7e-2591
GO:0022403Cell cycle phase85311421.1e-25760/+
GO:0006415Translational termination911913.5e-2441
GO:0006614SRP-dependent cotranslational protein targeting to membrane1051959.1e-2272
GO:0045047Protein targeting to ER1071963.6e-2252
GO:0072599Establishment of protein localization to endoplasmic reticulum1081961.4e-2232
GO:0006613Cotranslational protein targeting to membrane1071951.4e-2232
GO:0000184Nuclear-transcribed mRNA catabolic process, non-sense-mediated decay1181981.1e-2112
GO:0070972Protein localization to endoplasmic reticulum1211976.2e-2062
GO:0006413Translational initiation1532261.0e-2044
GO:0000087M phase of mitotic cell cycle3744152.3e-18220/+
GO:0000280Nuclear division3633911.0e-17120/+
GO:0007067Mitosis3633911.0e-17120/+
GO:0000279M phase5375731.7e-17133/+
GO:0006612Protein targeting to membrane1542001.1e-1694
GO:0000956Nuclear-transcribed mRNA catabolic process1712121.0e-1667
GO:0048285Organelle fission3884091.6e-16620/+
GO:0019080Viral genome expression1521936.4e-16310/+
GO:0019083Viral transcription1521936.4e-16310/+
GO:0043624Cellular protein complex disassembly1571961.8e-1612
GO:0043241Protein complex disassembly1621971.3e-1572
GO:0006402mRNA catabolic process1832134.3e-1567
GO:0032984Macromolecular complex disassembly1832011.8e-1427
GO:0006401RNA catabolic process2102181.1e-1377
GO:0072594Establishment of protein localization to organelle2122136.1e-1314
GO:0019058Viral infectious cycle2282181.4e-12314/+
GO:0051301Cell division4804191.3e-11235/+
GO:0016071mRNA metabolic process6145441.3e-10721/+
GO:0006412Translation4693762.3e-9316/+
GO:0002682Regulation of immune system process8938277.1e-9183/+
GO:0022411Cellular component disassembly2952298.5e-9012/+
GO:0001775Cell activation7636636.4e-8874/+
GO:0046649Lymphocyte activation4713672.8e-8761/+
GO:0045321Leukocyte activation5564392.3e-8463/+
GO:0050776Regulation of immune response5644358.7e-7943/+
GO:0022415Viral reproductive process5474163.0e-7744/+
GO:0007155Cell adhesion9638671.4e-7441/+
GO:0022610Biological adhesion9658692.1e-7441/+
GO:0060337Type I interferon-mediated signaling pathway73718.5e-735/+
GO:0071357Cellular response to type I interferon73718.5e-735/+
GO:0034340Response to type I interferon74715.7e-725/+
GO:0016032Viral reproduction7015461.6e-6846/+
GO:0044764Multi-organism cellular process7035474.0e-6846/+
GO:0006396RNA processing6564881.4e-6318
GO:0010564Regulation of cell cycle process4402952.9e-6245/+
GO:0002684Positive regulation of immune system process5583871.9e-5941/+
GO:0006259DNA metabolic process8807071.2e-5675/+
GO:0051249Regulation of lymphocyte activation2961831.5e-5634/+
GO:0045087Innate immune response5443681.8e-5625/+

Shown are the 50 most significant terms (p-values are Bonferroni adjusted). CG is the number of cancer genes present in each term and the “+” indicates the significant enrichment of a GO-term with cancer genes. The total number of census genes in these 50 GO-terms is 256.

GPEA analysis of the breast cancer gene regulatory network for GO biological processes. Shown are the 50 most significant terms (p-values are Bonferroni adjusted). CG is the number of cancer genes present in each term and the “+” indicates the significant enrichment of a GO-term with cancer genes. The total number of census genes in these 50 GO-terms is 256. Interestingly, the significant biological processes shown in Table 2 are known to be most affected in breast cancer and many are recognized as hallmarks of cancer. For example, increased translational initiation through elevated expression of key genes such as pS6, p4E-BP1, eEF2K, and decreased pdcd4 are associated with poor prognosis in hormone receptor-positive breast cancer, highlighting the role of translational control in breast cancer pathogenesis (Meric-Bernstam et al., 2012). Also, inflammation has been cited as one of the major hallmarks of cancer and immune infiltration of tumors (principally by the innate immune system) has been shown to be a key component in both the initiation, progression, survival rates and chemotherapy responses of multiple cancer types including breast cancer (DeNardo et al., 2011; Hanahan and Weinberg, 2011b). An emerging hallmark of cancer is its ability to evade the host immune system by paralyzing immune cells such as CTLs and NK cells through secretion of TGF? or other immunosuppressive mediators (Shields et al., 2010). Aneuploidy and Chromsomal Instability (CIN) are also well known hallmarks of cancer which highlight the dysregulation of mitotic control and chromosome segregation in many cancer types. Many mitotic regulators are known to be overexpressed principally by transcriptional upregulation (Aurora kinases, PLKs) or less frequently mutated (Bub1, Bub1R, Mps1) resulting in impaired mitotic checkpoints (reviewed in Kops et al., 2005).

3.2.1. Cancer census genes and cell cycle

To study the relationship of the identified functional modules and known cancer genes, we utilize the manually curated cancer gene census (CGC) list (Futreal et al., 2004) consisting in total of 435 cancer genes. For each GO-term, we count the number of present cancer genes (last column in Table 2) and perform a hypergeometric test to determine the enrichment of cancer genes. From the 50 tests for each GO-term in Table 2, we find 32 to be enriched with cancer genes; after a Bonferroni correction for a significance level of α = 0.05. These GO-terms are highlighted by a “+” in the last column in Table 2. Furthermore, the 50 most significant GO-terms comprise in total 4743 genes, of which 238 are cancer genes (54.71% = 256/435). Also this gene set comprising all 50 GO-terms is significantly enriched with cancer genes. In Figure 1, we show a subnetwork of our breast cancer GRN that includes only genes belonging to the biological process term cell cycle (GO:0007049). This network component contains a total of 345 interactions among 728 genes of which 51 genes are known cancer genes (Futreal et al., 2004). Among these 51 cancer genes, we find BRCA1 and BRCA2 that are multifunctional proteins playing a major role in DNA repair processes.
Figure 1

Subnetwork of the breast cancer GRN for the biological process cell cycle (GO:0007049). This subnetwork includes 51 census cancer genes, e.g., BRCA1 and BRCA2. The census cancer genes are highlighted in red. Overall, the shown network consists of 728 genes and 1671 interactions.

Subnetwork of the breast cancer GRN for the biological process cell cycle (GO:0007049). This subnetwork includes 51 census cancer genes, e.g., BRCA1 and BRCA2. The census cancer genes are highlighted in red. Overall, the shown network consists of 728 genes and 1671 interactions. Interestingly, our breast cancer GRN shows proximities of some well characterized interactions. For example, Figure 1 shows a close association of BRCA2, MSH2, and MSH6. These proteins are known to interact in the BRCA1 Associated Surveillance Complex (BASC), a large multicomponent DNA damage sensing complex containing proteins with roles in recognition of abnormal DNA structures or damaged DNA, suggesting that BASC may serve as a sensor for DNA damage (Wang et al., 2000). Additionally Figure 1 shows a close association between FANCA and BLM. Both proteins again have been shown to interact in a multiprotein complex and participate in genomic maintenance (Meetei et al., 2003). Within the 1st level neighbors shown in Figure 2 there also appear to be interesting associations. For example, p53 is proposed to closely associate with C1QBP a protein modulated EGF-induced cancer cell chemotaxis and metastasis in Severe Combined Immunodeficiency (SCID) mouse models, suggesting that p53 loss of function could result in C1QBP-mediated metastatic events (Zhang et al., 2013). Conversely, p53 also showed close association with PFN1 a protein shown to have antiproliferative function in MDA-MB-231 cells (Zou et al., 2010).
Figure 2

Shown is the 1st degree neighbors subnetwork, G. Note that we also included RAD50, despite the fact that RAD50 is not present in cancer census gene list.

Shown is the 1st degree neighbors subnetwork, G. Note that we also included RAD50, despite the fact that RAD50 is not present in cancer census gene list.

3.3. Local landscape of breast cancer genes

Next, we investigate 10 well-known genes that are frequently observed in inherited breast cancer. Specifically, germline mutations in BRCA1, BRCA2, TP53, PTEN, CHEK2, ATM, NBN (also denoted by NBS1), RAD50, BRIP1, and PALB2 are known to be associated with a high risk for breast cancer (Walsh and King, 2007). Interestingly, all of these genes, except RAD50, are also in the cancer census gene list (Futreal et al., 2004) and it is known that these genes are playing an important role in genomic integrity such as DNA repair pathways. In order to study the local interaction landscape of these 10 breast cancer genes, we extract their 1st degree neighbors from our breast cancer network GBC3Net. Figure 2 shows the resulting subnetwork, which we denote by G1st. As one can see, we obtain one large network component in G1st that includes seven cancer genes (BRCA1, BRCA2, CHEK2, ATM, NBN (NBS1), BRIP1, and PALB2) and three smaller components that contain only a single cancer gene (TP53, PTEN, and RAD50). Overall, G1st consists of 116 genes and 209 interactions. We would like to emphasize that despite the fact that in G1st (Figure 2), e.g., TP53 is not connected to BRCA1, there is a path in our breast cancer network GBC3Net. The reason for this is that G1st contains only the 1st degree neighbors of the 10 breast cancer genes in GBC3Net in order to obtain a (small) subnetwork that can be visualized sensibly. Extracting the subnetwork from GBC3Net that would connect all 10 cancer genes with each other along shortest paths consists of 107 genes and has an average shortest path length of 4.31. In Table 3 we show the length of the shortest paths that connect these 10 breast cancer genes.
Table 3

Length of the shortest path between pairs of breast cancer genes (Bgene1 and Bgene2).

Bgene1Bgene2Shortest path length
CHEK2BRIP12
BRCA1BRIP13
BRCA2CHEK23
BRCA2NBN3
BRCA2BRIP13
ATMNBN3
NBNPALB23
BRCA1BRCA24
BRCA1CHEK24
BRCA1PALB24
BRCA2PTEN4
BRCA2RAD504
BRCA2PALB24
TP53PALB24
PTENCHEK24
PTENNBN4
PTENRAD504
CHEK2ATM4
CHEK2NBN4
CHEK2RAD504
CHEK2PALB24
NBNBRIP14
RAD50BRIP14
RAD50PALB24
BRIP1PALB24
BRCA1PTEN5
BRCA1ATM5
BRCA1NBN5
BRCA1RAD505
BRCA2TP535
BRCA2ATM5
TP53PTEN5
TP53CHEK25
TP53ATM5
TP53NBN5
TP53RAD505
TP53BRIP15
PTENATM5
PTENBRIP15
PTENPALB25
ATMRAD505
ATMBRIP15
NBNRAD505
BRCA1TP536
ATMPALB26
Length of the shortest path between pairs of breast cancer genes (Bgene1 and Bgene2). Biologically, the interconnectivity between the genes in G1st shown in Figure 2 reflects the combined roles of these individual genes in DNA damage signalling and repair, which is a major feature of cancer predisposition in breast cancer. For example, both BRCA1 and BRCA2 play key roles in co-ordination of homologous recombination events following double strand break repair (DSBR) (for review see Powell et al., 2011). Whilst BRCA1 and BRCA2 are considered “high risk” the other five members of this network component are considered “moderate risk” with a two- to fourfold increased breast cancer risk relative to the general population (10%) (Hollestelle et al., 2010). BRIP1 (also known as FANCJ or BACH1) and PALB2 physically interact with BRCA1 and BRCA2 to orchestrate helicase unwinding of DNA and to promote RAD51-mediated strand invasion functions in DSBR. A common DNA damage network between the BRCA and Fanconi anaemia (FA) pathways has been proposed and three FA genes, FANCD1, FANCN, and FANCJ, are identical to the BRCA genes BRCA2, PALB2, and BRIP1 (reviewed in Wang, 2007). ATM acts as a sentinel kinase detecting and signalling DSBs through phosphorylation of a plethora of other proteins including NBS1 (to initiate end processing of DSB ends as part of the MRN complex) and CHEK2 to initiate the enforcement of cell cycle arrest. Whilst RAD50 and p53 constitute separate modes in this regulatory network, they are nevertheless integrated into DSBR signalling, notably the MRN end processing of DSBs and transcriptional regulation of cell cycle arrest, respectively. PTEN, a substrate of ATM with functions related in DNA damage repair signalling (Bassi et al., 2013), also has distinct tumor suppressor roles in modulation of PI3K activity and phosphatidyl inositol signalling.

3.4. Closeness and gene neighbor degree of census cancer genes

In this section we study the closeness between cancer genes in the breast cancer GRN. For the set of 435 cancer census genes (Futreal et al., 2004), present in the breast cancer GRN, we define census gene pairs that have a significant shorter shortest path length compared to all shortest path length of the entire network. The null distribution is obtained by the distribution of all shortest paths between all (197232 − 19723)/2 = 194,488,503 gene pairs. For each of the (4352 − 435)/2 = 94,395 census gene pairs a p-value is estimated by the fraction of shortest path length from all gene pairs that are shorter as the observed shortest path of the census gene pair. We consider a multiple testing correction using the Benjamini and Hochberg procedure for α = 0.05. As a result, we find 148 significant census gene pairs (0.15%), involving 188 (43.21%) cancer census genes, that have significant shorter shortest paths. Aside from this, we find a total of 51 network components of directly connected census genes. The largest network component of directly connected census genes is 21 and 11 genes with the remaining components with ≤ 8 genes.

3.5. Cooperation between chromosomes

Finally, we study the relation between the genetic context and the structural connectivity of our breast cancer network GBC3Net. We study this relation in the following way. First, we investigate the overall frequency of a gene pair being located on the same chromosome or located on different chromosomes. We find that, in average, 20.43% of the interactions in the breast cancer network connect genes that are located on the same chromosome. Hence, 79.57% of the interactions connect genes on different chromosomes. Interactions between genes on separate or the same chromosome can be seen as trans-interactions and cis-interactions, in analogy to the trans- and cis-regulation of genes (Cheung et al., 2010). However, we would like to emphasize that there is a crucial difference between both types of connections. For a “regulation”, the transcription of a gene is regulated by a cis- or trans-acting transcription factor, whereas an “interaction” means any type of biochemical binding, not limited to transcription regulation, but also including protein-protein interaction, phosphorylation, ubiquitination or others. In our next analysis, we test if there are chromosome pairs that contain a statistically significant number of interactions. That means we calculate the number of interactions, e.g., between chromosome 2 and 21, denoted as s2,21, and apply a statistical hypothesis test to see if this number is larger than expected by chance, i.e., srand|2,21. In order to obtain the a sampling distribution for the general null hypothesis we randomize the gene labels in the breast cancer network E times. We would like to note that the indices i and j in srand| indicate that the sampling distribution is different for each chromosome pair because it takes the varying size of the chromosomes into account. For each randomization, e ∈ E, we calculate the number of interactions s between each chromosome pair (i, j ∈ {1, 2, …, 22, X, Y}. From this, we estimate p-values by Here, I(), is the indicator function which gives a value of “1” if its argument is true and “0” otherwise. We would like to emphasize, first, that the way we estimate the p-values for each chromosome pair (i,j) uses its own, individual sampling distribution given by the values of {s}. Second, utilizing the connection structure of the inferred breast cancer network in combination with a gene label resampling conserves not only the total number of interactions among genes, but also the structural properties of the network. Furthermore, the uneven number of genes on the 24 chromosomes is considered by this procedure as well. In total, we perform 300 = (242 − 24)/2 + 24 tests for chromosome pairs for the 24 chromosomes. In order to adjust for multiple testing, we apply a Benjamini and Hochberg (1995) correction controlling the FDR for a significance level of α = 0.05. This guarantees a false discovery rate of FDR ≤ α (Dudoit and van der Laan, 2007). For our non-parametric estimation of the p-values, we used E = 100,000. From our analysis, we find seven chromosome pairs that are statistically significant, shown in Figure 3B. Interestingly, 6 of the 7 significant pairs involve chromosome 4 and 21 and the remaining significant link represents a self-interaction on the Y chromosome. The results of our analysis shed light on the cooperation of genes as measured by the prevalence of significant interactions between chromosome pairs. From this perspective, visualized in Figure 3A, one sees that only a rather limited number of chromosomes contribute to this cooperation on the chromosome level.
Figure 3

(A) Statistically significant cooperations between chromosome pairs are highlighted by a link. The Benjamini and Hochberg (BH) adjusted p-values for these links are shown in the Table (B). The significance level for this analysis was α = 0.05, which guarantees that FDR ≤ α holds for the set of all significant links.

(A) Statistically significant cooperations between chromosome pairs are highlighted by a link. The Benjamini and Hochberg (BH) adjusted p-values for these links are shown in the Table (B). The significance level for this analysis was α = 0.05, which guarantees that FDR ≤ α holds for the set of all significant links. In terms of intra-chromosomal gene co-regulation many low-penetrance breast cancer susceptibility loci are found to be located in non-protein-coding regions, suggesting their involvement in gene expression regulation. For example, Smits et al., have shown how the human MCS5A polymorphisms associated with breast cancer risk are located at both sides of a looped structure and functionally interact to downregulate transcriptional activity, a phenomenon which is conserved with rat Mcs5a [14]. In addition Akulenko and Helms have showed that out of 300 pairs of genes which showed co-methylation (but not nescessarily also co-repressed), 187 pairs were located on the same chromosome and were shown to be related to similar functional processes in the same pathways [15]. In fact they concluded that co-methylation “anti-correlated” with genomic distance [15]. Like most cancers, breast cancers are comprised not only of cancerous epithelia but also of numerous other contributory cell types which are involved in various stages of tumor initiation, progression and metastasis. There are instances where Loss of Heterozygosity (LOH) from genomic regions on the same chromosomes has been reported in breast cancer epithelial but furthermore LOH from adjacent regions could be detected in both breast cancer epithelia and breast cancer stromal cells from within the same tumors [16].

4. Discussion

In this study, we investigated a breast cancer gene regulatory network with respect to its structural and functional features. The network itself has been inferred from the application of BC3Net to a large-scale gene expression data set of breast cancer patients provided by the International Genomics Consortium (IGC) (expO data set). From conducting a GPEA for GBC3Net, we found significant enrichment of GO-terms that represent biological processes for translation, immune response, cell cycle, organelle fission, mitosis, cell adhesion, RNA processing, RNA splicing and response to wounding. These biological processes are well described by cancer hallmark pathways (Hanahan and Weinberg, 2000, 2011b; Vogelstein and Kinzler, 2004; Berretta and Moscato, 2010) and known for playing central roles in differentiation, proliferation and immune system functional processes. In order to relate these processes to known cancer genes, we used the cancer census gene list (Futreal et al., 2004). Interestingly, our analysis revealed that 32 of the 50 most significant GO-terms are also enrichment with known cancer genes (see Table 2). From studying a subnetwork of GBC3Net limited to cell cycle genes, we found 51 known cancer genes. Among these 51 cancer genes the most prominent breast cancer genes BRCA1, BRCA2, and Chk2 were present. BRCA1 is multi-functional protein involved in DNA damage repair, cell cycle checkpoint activation, ubiquitination and in the regulation of gene expression (Zhu et al., 2011). Chk2 is a tumor suppressor which functions as a protein kinase involved in DNA damage and cell-cycle arrest (Matsuoka et al., 1998). Although our GPEA analysis is restricted to the underlying biological knowledge gathered in the GO database, it provides a good overview of the most prominently represented biological processes present in the breast cancer network. Currently, the analysis of genes involved in breast cancer is a very active field and resources have been established that provide a collective overview of such genes (Mosca et al., 2010). In our study, the manually curated census gene list (Futreal et al., 2004) has been used to gain functional insights of the involved biological processes. In previous studies, these cancer genes have been analyzed for a variety of cellular networks, e.g., protein interaction networks (Jonsson and Bates, 2006; Rambaldi et al., 2008) and manually curated signaling networks (Awan et al., 2007). For instance, a structural network analysis of the degree, betweenness and closeness of differentially expressed cancer genes, mapped on a protein interaction network, was performed by Hernandez et al. (2007). However, the main problem with such an approach is that the interactions in a protein network, e.g., measured by yeast-two-hybrid (Y2H), are outside of a disease context of the corresponding physiological state. On the other hand, one of the main advantages of gene regulatory networks is that they are measured in the physiological context under investigation. This should enable a more relevant analysis. A gene regulatory network also provides novel candidates for experimental investigations such as the hub genes that are highly enriched e.g., by membrane receptors (Table 4).
Table 4

Census cancer genes ranked by degree in all-census-pair shortest path network.

Entrez IDDeg. nodesSymbolName
81137725OR7E104POlfactory receptor, family 7, subfamily E, member 104 pseudogene
100132767524
2623462GATA1GATA binding protein 1 (globin transcription factor 1)
387601432SLC22A25Solute carrier family 22, member 25
353135415LCE1ELate cornified envelope 1E
284187384
256646334C15orf55Chromosome 15 open reading frame 55
219409334GSX1GS homeobox 1
9362326CPNE6Copine VI (neuronal)
283933319ZNF843Zinc finger protein 843
9127311P2RX6Purinergic receptor P2X, ligand-gated ion channel, 6
285877303POM121L12POM121 transmembrane nucleoporin-like 12
9758292FRMPD4FERM and PDZ domain containing 4
3166292HMX1H6 family homeobox 1
60506290NYXNyctalopin
348808288NPHP3-AS1NPHP3 antisense RNA 1
4584287MUC3AMucin 3A, cell surface associated
284805283C20orf203Chromosome 20 open reading frame 203
163778262SPRR4Small proline-rich protein 4
155256ADRB3Adrenoceptor beta 3
286023252FLJ40288Uncharacterized FLJ40288
64405251CDH22Cadherin 22, type 2
1813248DRD2Dopamine receptor D2
149647245FAM71AFamily with sequence similarity 71, member A
253868238C20orf166-AS1C20orf166 antisense RNA 1
Census cancer genes ranked by degree in all-census-pair shortest path network. The impact of cancer progression driving gene mutations is explained by causing the malfunction of a protein, binding site alteration that causes a loss or constitutive deregulated function. Another major aspect of cancer progression is genome instability that causes larger mutation events, e.g., copy number variations, duplications, gene loss and translocation events of large genomic regions. Such processes can lead to the deregulation of genes and their related functions causing an amplification, downregulation or a complete shut-down of gene expression leading to a functional gain or loss. In order to relate the inferred interactions in our breast cancer network back to their genetic origin, we studied chromosomal effects. From this analysis, we found that in average 20.43% of the interactions in the breast network connect genes that are co-located on the same chromosome. Furthermore, we found only seven chromosome pairs, involving eight different chromosomes, that are more densely connected than expected by chance. Specifically, we observed the chromosomes 4, 13, and 21 to connect to two or more chromosomes and the chromosomes 2, 7, 15, 20, and Y to connect to exactly one chromosome. A putative explanation for this observation may be the amplification of the genes in a chromosome that would lead to a higher probability for interactions occurring within and between chromosomes. On a more general note, one may wonder if the inferred gene regulatory network is “complete” or if parts of it, e.g., important driver gene(s) and their interactions, may be missing. Here it is crucial to realize that for our analysis, we used only observational data from cancer tissues. That means, the data were not generated in a controlled manner ensuring a sufficiently strong signal for all relevant biological components of the system. For this reason, it is possible that parts of the true breast cancer network were missed. Also, our network inference method BC3Net aims at inferring the strongest signal (network) within a given data set. However, from comparing information about our inferred gene regulatory network and independently conducted experiments one could get valuable information about such “missing parts” in order to come up with an experimental design that might fill-in these gaps. Hence, even information that is missing in our breast cancer network could be utilized for a future experimental design of follow-up studies.

5. Conclusions

Complex disorders like breast cancer require a systems level analysis. For this reason, network-based approaches provide a practical means to elucidate the biological function of processes from large-scale genomic data (Emmert-Streib and Dehmer, 2011). This also opens a venue for translation bioinformatics and personalized medicine, which depend crucially on the availability of robust, genome-scale analysis methods (Gonzalez-Angulo et al., 2010; Chan and Ginsburg, 2011; Chin et al., 2011).

5.1. Data sharing

We provide the gene expression data and the inferred breast cancer GRN from our analysis in the R-package BreastCancerGRN, available from CRAN.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
  57 in total

1.  Emergence of scaling in random networks

Authors: 
Journal:  Science       Date:  1999-10-15       Impact factor: 47.728

Review 2.  Cancer as a robust system: implications for anticancer therapy.

Authors:  Hiroaki Kitano
Journal:  Nat Rev Cancer       Date:  2004-03       Impact factor: 60.716

3.  Ten genes for inherited breast cancer.

Authors:  Tom Walsh; Mary-Claire King
Journal:  Cancer Cell       Date:  2007-02       Impact factor: 31.743

Review 4.  Low duplicability and network fragility of cancer genes.

Authors:  Davide Rambaldi; Federico M Giorgi; Fabrizio Capuani; Andrea Ciliberto; Francesca D Ciccarelli
Journal:  Trends Genet       Date:  2008-07-31       Impact factor: 11.639

Review 5.  Gene regulatory network inference: data integration in dynamic models-a review.

Authors:  Michael Hecker; Sandro Lambeck; Susanne Toepfer; Eugene van Someren; Reinhard Guthke
Journal:  Biosystems       Date:  2008-12-27       Impact factor: 1.973

6.  Ductal carcinoma in situ: a rose by any other name.

Authors:  D Lawrence Wickerham; Thomas B Julian
Journal:  J Natl Cancer Inst       Date:  2013-09-25       Impact factor: 13.506

Review 7.  Future of personalized medicine in oncology: a systems biology approach.

Authors:  Ana Maria Gonzalez-Angulo; Bryan T J Hennessy; Gordon B Mills
Journal:  J Clin Oncol       Date:  2010-04-20       Impact factor: 44.544

Review 8.  Recurrent chromosome aberrations in cancer.

Authors:  F Mitelman
Journal:  Mutat Res       Date:  2000-04       Impact factor: 2.433

Review 9.  Harnessing the complexity of gene expression data from cancer: from single gene to structural pathway methods.

Authors:  Frank Emmert-Streib; Shailesh Tripathi; Ricardo de Matos Simoes
Journal:  Biol Direct       Date:  2012-12-10       Impact factor: 4.540

10.  Further steps toward functional systems biology of cancer.

Authors:  Enrique Hernández-Lemus
Journal:  Front Physiol       Date:  2013-09-18       Impact factor: 4.566

View more
  26 in total

Review 1.  Network inference in systems biology: recent developments, challenges, and applications.

Authors:  Michael M Saint-Antoine; Abhyudai Singh
Journal:  Curr Opin Biotechnol       Date:  2020-01-09       Impact factor: 9.740

2.  Deep Learning-Based Multi-Omics Integration Robustly Predicts Relapse in Prostate Cancer.

Authors:  Ziwei Wei; Dunsheng Han; Cong Zhang; Shiyu Wang; Jinke Liu; Fan Chao; Zhenyu Song; Gang Chen
Journal:  Front Oncol       Date:  2022-06-23       Impact factor: 5.738

3.  Urothelial cancer gene regulatory networks inferred from large-scale RNAseq, Bead and Oligo gene expression data.

Authors:  Ricardo de Matos Simoes; Sabine Dalleau; Kate E Williamson; Frank Emmert-Streib
Journal:  BMC Syst Biol       Date:  2015-05-14

4.  Reverse Engineering of Genome-wide Gene Regulatory Networks from Gene Expression Data.

Authors:  Zhi-Ping Liu
Journal:  Curr Genomics       Date:  2015-02       Impact factor: 2.236

Review 5.  Gene regulatory networks and their applications: understanding biological and medical problems in terms of networks.

Authors:  Frank Emmert-Streib; Matthias Dehmer; Benjamin Haibe-Kains
Journal:  Front Cell Dev Biol       Date:  2014-08-19

Review 6.  Time-Delayed Models of Gene Regulatory Networks.

Authors:  K Parmar; K B Blyuss; Y N Kyrychko; S J Hogan
Journal:  Comput Math Methods Med       Date:  2015-10-20       Impact factor: 2.238

7.  DYVIPAC: an integrated analysis and visualisation framework to probe multi-dimensional biological networks.

Authors:  Lan K Nguyen; Andrea Degasperi; Philip Cotter; Boris N Kholodenko
Journal:  Sci Rep       Date:  2015-07-29       Impact factor: 4.379

8.  Functional and genetic analysis of the colon cancer network.

Authors:  Frank Emmert-Streib; Ricardo de Matos Simoes; Galina Glazko; Simon McDade; Benjamin Haibe-Kains; Andreas Holzinger; Matthias Dehmer; Frederick Campbell
Journal:  BMC Bioinformatics       Date:  2014-05-16       Impact factor: 3.169

9.  GANP protein encoded on human chromosome 21/mouse chromosome 10 is associated with resistance to mammary tumor development.

Authors:  Kazuhiko Kuwahara; Mutsuko Yamamoto-Ibusuki; Zhenhuan Zhang; Suchada Phimsen; Naomi Gondo; Hiroko Yamashita; Toru Takeo; Naomi Nakagata; Daisuke Yamashita; Yoshimi Fukushima; Yutaka Yamamoto; Hiroji Iwata; Hideyuki Saya; Eisaku Kondo; Keitaro Matsuo; Motohiro Takeya; Hirotaka Iwase; Nobuo Sakaguchi
Journal:  Cancer Sci       Date:  2016-02-09       Impact factor: 6.716

10.  Biological networks: the microscope of the twenty-first century?

Authors:  Frank Emmert-Streib; Matthias Dehmer
Journal:  Front Genet       Date:  2015-10-13       Impact factor: 4.599

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.