
Pattern discovery in breast cancer specific protein interaction network.

Xiaogang Wu, Scott H Harrison, Jake Yue Chen.

Abstract

Interest in identifying novel biomarkers for early-stage breast cancer (BRCA) detection has grown significantly in recent years. From the viewpoint of network biology, one emerging theme is to re-characterize a protein's biological functions in the context of its molecular network. Although many methods have been presented, including network-based gene ranking for molecular biomarker discovery and graph clustering for functional module discovery, it is still hard to find systems-level properties hidden in disease-specific molecular networks. We reconstructed a BRCA-related protein interaction network by using BRCA-associated genes/proteins as seeds and expanding them in an integrated protein interaction database. We further developed a computational framework based on Ant Colony Optimization to rank network nodes. The task of ranking nodes is represented as the problem of finding optimal density distributions of "ant colonies" on all nodes of the network. Our results revealed some interesting systems-level patterns in the BRCA-related protein interaction network.


Year: 2009 PMID: 21347162 PMCID: PMC3041566

Source DB: PubMed Journal: Summit Transl Bioinform ISSN: 2153-6430

Introduction

Interest in identifying novel biomarkers for early-stage breast cancer (BRCA) detection has grown significantly in recent years [1–3]. Known BRCA susceptibility genes, e.g. P53, BRCA1, BRCA2, ERBB2 and PTEN, account for only 15–20% of the familial risk [4]. Identification of these genes [5], while extremely valuable, is only a first step toward understanding BRCA progression. From the viewpoint of network biology [6], these genes never function in isolation [7]. One study re-characterized them in a molecular interaction network for BRCA and identified HMMR as a new susceptibility locus [3]. Another study integrated a protein interaction network with gene expression data to improve the prediction of BRCA metastasis [8]. These works suggest that protein interaction networks, although noisy and incomplete, can serve as a molecule-level conceptual roadmap to guide future studies of network biomarkers [9].

On the other hand, both biological shapes [10, 11] and physiological signals [12, 13] have been found to have chaotic and/or fractal characteristics [14], which indicates that many biological systems and networks could be analyzed effectively with nonlinear dynamical approaches involving chaos, fractals, bifurcation, pattern formation and complex systems [15]. In this context, the concept of dynamical biomarkers was first introduced in a talk by A.L. Goldberger in 2006 [16], which can be seen as the start of using nonlinear dynamical properties as biomarkers, although the concept has not yet been extended to molecular networks. Based on the relationship between features of complex networks (e.g. scale-free) and nonlinear dynamical properties (e.g. fractals) [17], systems-level biomarkers (sys-biomarkers), an innovative concept shown in Figure 1, derive from the marriage of network biomarkers and dynamical biomarkers. Although many methods have been presented in network biology, including network-based gene ranking for molecular biomarker discovery [18] and graph clustering for functional module discovery [19], it is still hard to find sys-biomarkers hidden in disease-specific molecular networks.

Figure 1.

Evolution of concepts in diagnostic biomarkers.

Starting with the initial motivation of systems biology [20], we reconstructed a BRCA-related protein interaction network by taking BRCA-associated genes/proteins as seeds and expanding them in an integrated protein interaction database using the nearest-neighbor expansion method [21]. Our method allows BRCA experts to merge their prior knowledge of BRCA-associated genes/proteins into a manually curated list (protein seeds), which could be obtained from the OMIM™ database (Online Mendelian Inheritance in Man™). Here, we use the latest high-quality subsets of protein interaction data integrated into the Human Annotated and Predicted Protein Interaction (HAPPI, http://bio.informatics.iupui.edu/HAPPI) database. In this database, all protein interactions are weighted with a confidence score (CS) that encodes prior knowledge of the experimental and literature evidence supporting each protein interaction. We further developed a computational framework based on Ant Colony Optimization (ACO) [22] to rank network nodes. The task of ranking nodes is represented as the problem of finding optimal density distributions of “ant colonies” on all nodes of the network. Our results revealed some interesting systems-level patterns in the BRCA-related protein interaction network.
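The seed-expansion step can be illustrated with a minimal sketch. It assumes one common reading of nearest-neighbor expansion: keep every interaction that touches at least one seed protein and whose confidence score exceeds a chosen threshold. The function names, the placeholder protein names (S1, N1, ...) and the scores are hypothetical and are not taken from HAPPI or OMIM.

```python
from typing import Iterable, List, Set, Tuple

# (protein_a, protein_b, confidence score CS); layout is an assumption
Edge = Tuple[str, str, float]


def expand_seeds(edges: Iterable[Edge], seeds: Set[str], min_cs: float) -> List[Edge]:
    """Keep interactions with CS > min_cs that involve at least one seed."""
    return [
        (a, b, cs)
        for a, b, cs in edges
        if cs > min_cs and (a in seeds or b in seeds)
    ]


def network_nodes(edges: Iterable[Edge]) -> Set[str]:
    """Collect every protein that appears in the given edge list."""
    nodes: Set[str] = set()
    for a, b, _ in edges:
        nodes.update((a, b))
    return nodes


# Hypothetical toy data: S* are seeds, N* are other proteins.
toy_edges: List[Edge] = [
    ("S1", "N1", 0.995),  # seed edge, high confidence: kept
    ("S1", "N3", 0.40),   # seed edge, low confidence: dropped
    ("S2", "N2", 0.97),   # seed edge, high confidence: kept
    ("N4", "N5", 0.999),  # no seed involved: dropped
]
toy_seeds = {"S1", "S2"}

sub_network = expand_seeds(toy_edges, toy_seeds, min_cs=0.90)
print(sorted(network_nodes(sub_network)))
```

Whether interactions among the newly added neighbors should also be retained is a design choice the text does not spell out; this sketch leaves them out.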

Results

In our experiments, we first constructed a BRCA-related protein interaction network as described above. The results of applying the ACO ranking algorithm to the weighted BRCA-related protein interaction network are shown in Figure 2, which displays the adjacency matrix with nodes ranked according to the final density distribution. The top 20 proteins from the ranking result in Figure 2(f) are highlighted in Figure 3, which shows the high-quality BRCA-related protein interaction network obtained with interaction confidence scores CS > 0.99. The node degree distributions plotted in Figure 4 for the BRCA-related protein interaction networks obtained at different CS thresholds are all very close to a power-law distribution, which implies scale-free features.
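A rough way to inspect a degree distribution at a given CS threshold is sketched below. It counts node degrees after thresholding and fits a straight line to the log-log histogram by ordinary least squares. This is only a quick visual-style check under that assumption, not a rigorous power-law test, and the text does not state how closeness to a power law was assessed. The function names and the toy star-shaped network are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str, float]  # (protein_a, protein_b, CS); assumed layout


def degree_histogram(edges: Iterable[Edge], min_cs: float) -> Dict[int, int]:
    """Map each degree k to the number of nodes with that degree (CS > min_cs)."""
    degree: Counter = Counter()
    for a, b, cs in edges:
        if cs > min_cs:
            degree[a] += 1
            degree[b] += 1
    return dict(Counter(degree.values()))


def loglog_slope(hist: Dict[int, int]) -> float:
    """Least-squares slope of log(count) against log(degree)."""
    if len(hist) < 2:
        raise ValueError("need at least two distinct degrees to fit a slope")
    xs = [math.log(k) for k in hist]
    ys = [math.log(c) for c in hist.values()]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Hypothetical toy network: one hub with four neighbors, plus a weak edge.
toy_edges = [
    ("hub", "a", 0.99),
    ("hub", "b", 0.95),
    ("hub", "c", 0.91),
    ("hub", "d", 0.90),
    ("a", "b", 0.20),  # removed by the threshold used below
]

hist = degree_histogram(toy_edges, min_cs=0.50)
print(hist, loglog_slope(hist))
```

A negative slope on a real network would be consistent with a heavy-tailed degree distribution, but a histogram fit of this kind can be misleading for small networks.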

Figure 2.

Node ranking of the weighted BRCA-related protein interaction network. (a) CS > 0.50; (b) CS > 0.60; (c) CS > 0.70; (d) CS > 0.80; (e) CS > 0.90; (f) CS > 0.99.

Figure 3.

A visual layout of the BRCA protein interaction network (CS > 0.99). Top 20 proteins from the ranking result shown in Figure 2(f) are highlighted.

Discussion

ACO is a dynamic process that is effective in solving optimization problems such as those of phylogenetic analyses in biology [15]. Here, we represent the task of finding relevant network nodes as an ant colony optimization problem, in which simulated ants (s-ants) roam all possible network paths iteratively. By designing various strategies for each step that s-ants take when walking in a network, the iteration process can be manipulated to obtain the density distribution of s-ants crowding on each node. According to this density distribution, the adjacency matrix of the network, with its nodes ranked, is displayed as a map in order to reveal the system-level features of the network. Experiments on a BRCA-relevant protein interaction network demonstrated that this method finds the key nodes in the network and also reveals a fractal feature of the scale-free network through a quick-populating strategy of colonization. Analyses of both unweighted and weighted protein interaction networks based on this framework are given to demonstrate the feasibility and flexibility of our method. Comparisons with previous works on BRCA-related protein interaction networks show the reliability of ACO.
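The step of displaying the adjacency matrix with ranked nodes can be sketched as follows, assuming that nodes are simply sorted from highest to lowest density and that rows and columns are permuted in the same order. The function names and the three-node example are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def rank_order(density: Sequence[float]) -> List[int]:
    """Node indices sorted from highest to lowest density (ties keep input order)."""
    return sorted(range(len(density)), key=lambda i: -density[i])


def reorder_adjacency(adj: Matrix, order: Sequence[int]) -> Matrix:
    """Apply the same permutation to the rows and columns of the adjacency matrix."""
    return [[adj[i][j] for j in order] for i in order]


# Hypothetical path network 0 - 1 - 2 with made-up densities.
toy_adj: Matrix = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
toy_density = [0.2, 0.5, 0.3]

order = rank_order(toy_density)
ranked_adj = reorder_adjacency(toy_adj, order)
for row in ranked_adj:
    print(row)
```

Because rows and columns share one permutation, the reordered matrix describes the same network; only the visual arrangement changes, which is what allows block or self-similar patterns to become visible in a plot.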

Conclusion

Proteins ranked from a BRCA network using our method not only show system-level fractal characteristics but are also useful for subsequent translational biomedical discoveries of gene/protein-disease associations. The highly ranked proteins from the BRCA case study could be prioritized as “drug target candidates” and, with additional validation, as “disease biomarker candidates”, where proteins may be differentially and specifically expressed in tissues/biofluids depending on an associated condition of health or disease. We found the ACO-adapted framework to be robust in identifying fractal-like organization with or without confidence weightings of network connections. Our results revealed fractal features not previously reported in disease-specific molecular interaction networks. Our results are comparable to, but seem more sensitive than, those of a previous study [11], suggesting convergence of different algorithmic approaches in revealing the same network characteristics of BRCA-related proteins. Proteins in this disease-specific network could have dramatically different characteristics than in the global network. For example, as labeled in Figure 2, CDK5 is a major BRCA-related protein and a “mini-hub” in the BRCA protein interaction network, but it is not a major hub in the global network, having a node degree of only 22 in the HAPPI database [12]. If we accept that fractal features reflect a high level of “orderness” eventually interpretable in biology, the results of our study and methodology could point to a new direction for systematically finding and ranking proteins and genes for all human diseases, using public data available to bioinformatics researchers today.

Methods

In this framework, node ranking is seen as an optimization problem, which is why the concept of an “ant colony” can be utilized. ACO resembles a multi-agent system, but each s-ant (which can also be seen as an agent in the system) marks its path, much as a real ant leaves pheromone on its track. The pheromone on the ground stimulates other ants to work together, and the whole ant colony becomes more cooperative, in a phenomenon of self-organizing communication called stigmergy [13]. This self-organization leads to the emergence of a complex system, and we propose to apply it to the analysis of complex biological networks by using it as a basis for complex systems modeling.

In our methodology, s-ants roam all possible network paths iteratively, and the marks left by the s-ants act to accelerate the optimization process. By designing various strategies F for each step that s-ants take when walking in a network, the iteration process can be manipulated to obtain the density distribution s of s-ants crowding on each node, as shown in Eq. (1). According to this density distribution, the ranked adjacency matrix of the network is displayed as a map to reveal the system-level features of the network. Here M is determined by both the network features under analysis (including topology and weight information) and the marks left by s-ants. The initial column vector can be set to s0 = (1/n, 1/n, …, 1/n)^T, where n is the number of nodes, so that every node in the network starts out equivalent. The final density distribution s determines the rank of each node. Moreover, marks supplied from outside can easily switch this scheme from an unsupervised mode to a supervised one.

In a simple case of the proposed scheme, s-ants never leave a mark on the network, and M is determined only by the network, which means that it is invariable. In this case Eq. (1) reduces to Eq. (2). For further simplification, s-ants can be constrained to maintain a constant walking strategy, and Eq. (2) reduces to Eq. (3), in which M becomes the state transition probability matrix of the network. Eq. (3) describes a typical Markov chain. Let P denote the adjacency matrix of the network (regardless of whether it is directed or undirected, unweighted or weighted). In the case where s-ants do not populate, M can be obtained by Eq. (4). We proved that the final density distribution s then has a convergent limit, as described by Eq. (5). If s-ants populate quickly, M can be simply evaluated as M = P. In this situation, however, convergence of the algorithm cannot be assured for all kinds of networks. In our experiments, convergence seems to be related to the scale-free feature of the network.
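The simplest, Markov-chain case can be illustrated with a short sketch. The equations themselves are not reproduced in this text, so the sketch rests on two assumptions: that Eq. (3) has the form s(k+1) = M s(k), and that Eq. (4) builds M by dividing each column of P by its column sum so that every column sums to 1. The function names, tolerance, iteration cap and toy network are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def column_normalize(p: Matrix) -> Matrix:
    """Assumed form of Eq. (4): scale each column of P to sum to 1."""
    n = len(p)
    col_sums = [sum(p[i][j] for i in range(n)) for j in range(n)]
    return [
        [p[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def iterate_density(m: Matrix, tol: float = 1e-12, max_iter: int = 10000) -> List[float]:
    """Assumed form of Eq. (3): repeat s <- M s from the uniform vector s0."""
    n = len(m)
    s = [1.0 / n] * n  # s0 = (1/n, ..., 1/n)^T
    for _ in range(max_iter):
        nxt = [sum(m[i][j] * s[j] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, s)) < tol:
            return nxt
        s = nxt
    return s  # may not have converged within max_iter


# Hypothetical toy network: triangle 0-1-2 with node 3 attached to node 0.
toy_p: Matrix = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

density = iterate_density(column_normalize(toy_p))
ranking = sorted(range(len(density)), key=lambda i: -density[i])
print(density, ranking)
```

For an undirected, connected, non-bipartite network under these assumptions, the limit is proportional to node degree, so this simplest case amounts to a degree-based ranking. The M = P variant described in the text is not normalized in this way, which is consistent with the remark that its convergence cannot be assured in general.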

References

1. Fractal dynamics in physiology: alterations with disease and aging.

Authors: Ary L Goldberger; Luis A N Amaral; Jeffrey M Hausdorff; Plamen Ch Ivanov; C-K Peng; H Eugene Stanley
Journal: Proc Natl Acad Sci U S A Date: 2002-02-19

2. Network biology: understanding the cell's functional organization.

Authors: Albert-László Barabási; Zoltán N Oltvai
Journal: Nat Rev Genet Date: 2004-02

3. Emergence of complex dynamics in a simple model of signaling networks.

Authors: Luís A N Amaral; Albert Díaz-Guilera; Andre A Moreira; Ary L Goldberger; Lewis A Lipsitz
Journal: Proc Natl Acad Sci U S A Date: 2004-10-25

4. Broken asymmetry of the human heartbeat: loss of time irreversibility in aging and disease.

Authors: Madalena Costa; Ary L Goldberger; C-K Peng
Journal: Phys Rev Lett Date: 2005-11-04

5. Mining Alzheimer disease relevant proteins from integrated protein interactome data.

Authors: Jake Yue Chen; Changyu Shen; Andrey Y Sivachenko
Journal: Pac Symp Biocomput Date: 2006

6. The genetics and genomics of cancer.

Authors: Allan Balmain; Joe Gray; Bruce Ponder
Journal: Nat Genet Date: 2003-03

7. Network modeling links breast cancer susceptibility and centrosome dysfunction.

Authors: Miguel Angel Pujana; Jing-Dong J Han; Lea M Starita; Kristen N Stevens; Muneesh Tewari; Jin Sook Ahn; Gad Rennert; Víctor Moreno; Tomas Kirchhoff; Bert Gold; Volker Assmann; Wael M Elshamy; Jean-François Rual; Douglas Levine; Laura S Rozek; Rebecca S Gelman; Kristin C Gunsalus; Roger A Greenberg; Bijan Sobhian; Nicolas Bertin; Kavitha Venkatesan; Nono Ayivi-Guedehoussou; Xavier Solé; Pilar Hernández; Conxi Lázaro; Katherine L Nathanson; Barbara L Weber; Michael E Cusick; David E Hill; Kenneth Offit; David M Livingston; Stephen B Gruber; Jeffrey D Parvin; Marc Vidal
Journal: Nat Genet Date: 2007-10-07

8. GeneRank: using search engine technology for the analysis of microarray experiments.

Authors: Julie L Morrison; Rainer Breitling; Desmond J Higham; David R Gilbert
Journal: BMC Bioinformatics Date: 2005-09-21

9. Genome-wide association study identifies novel breast cancer susceptibility loci.

Authors: Douglas F Easton; Karen A Pooley; Alison M Dunning; Paul D P Pharoah; Deborah Thompson; Dennis G Ballinger; Jeffery P Struewing; Jonathan Morrison; Helen Field; Robert Luben; Nicholas Wareham; Shahana Ahmed; Catherine S Healey; Richard Bowman; Kerstin B Meyer; Christopher A Haiman; Laurence K Kolonel; Brian E Henderson; Loic Le Marchand; Paul Brennan; Suleeporn Sangrajrang; Valerie Gaborieau; Fabrice Odefrey; Chen-Yang Shen; Pei-Ei Wu; Hui-Chun Wang; Diana Eccles; D Gareth Evans; Julian Peto; Olivia Fletcher; Nichola Johnson; Sheila Seal; Michael R Stratton; Nazneen Rahman; Georgia Chenevix-Trench; Stig E Bojesen; Børge G Nordestgaard; Christen K Axelsson; Montserrat Garcia-Closas; Louise Brinton; Stephen Chanock; Jolanta Lissowska; Beata Peplonska; Heli Nevanlinna; Rainer Fagerholm; Hannaleena Eerola; Daehee Kang; Keun-Young Yoo; Dong-Young Noh; Sei-Hyun Ahn; David J Hunter; Susan E Hankinson; David G Cox; Per Hall; Sara Wedren; Jianjun Liu; Yen-Ling Low; Natalia Bogdanova; Peter Schürmann; Thilo Dörk; Rob A E M Tollenaar; Catharina E Jacobi; Peter Devilee; Jan G M Klijn; Alice J Sigurdson; Michele M Doody; Bruce H Alexander; Jinghui Zhang; Angela Cox; Ian W Brock; Gordon MacPherson; Malcolm W R Reed; Fergus J Couch; Ellen L Goode; Janet E Olson; Hanne Meijers-Heijboer; Ans van den Ouweland; André Uitterlinden; Fernando Rivadeneira; Roger L Milne; Gloria Ribas; Anna Gonzalez-Neira; Javier Benitez; John L Hopper; Margaret McCredie; Melissa Southey; Graham G Giles; Chris Schroen; Christina Justenhoven; Hiltrud Brauch; Ute Hamann; Yon-Dschun Ko; Amanda B Spurdle; Jonathan Beesley; Xiaoqing Chen; Arto Mannermaa; Veli-Matti Kosma; Vesa Kataja; Jaana Hartikainen; Nicholas E Day; David R Cox; Bruce A J Ponder
Journal: Nature Date: 2007-06-28