JMassBalance: mass-balanced randomization and analysis of metabolic networks.

Abstract

SUMMARY: Analysis of biological networks requires assessing the statistical significance of network-based predictions by using a realistic null model. However, the existing network null model, switch randomization, is unsuitable for metabolic networks, as it does not include physical constraints and generates unrealistic reactions. We present JMassBalance, a tool for mass-balanced randomization and analysis of metabolic networks. The tool allows efficient generation of large sets of randomized networks under the physical constraint of mass balance. In addition, various structural properties of the original and randomized networks can be calculated, facilitating the identification of the salient properties of metabolic networks with a biologically meaningful null model.
AVAILABILITY AND IMPLEMENTATION: JMassBalance is implemented in Java and freely available on the web at http://mathbiol.mpimp-golm.mpg.de/massbalance/.
CONTACT: basler@mpimp-golm.mpg.de.

Year: 2011 PMID: 21803803 PMCID: PMC3179655 DOI: 10.1093/bioinformatics/btr448

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

Network-based studies of biological systems attempt to relate topological properties to biological function. The first step in drawing this connection involves determining the network properties that do not arise by chance. To this end, a network null model can be used to assess the statistical significance of network properties. The common approach for determining the statistical significance of a given property is to determine a P-value based on the following procedure: (i) determine the chosen property from an investigated biological network; (ii) sample a large number of random networks under biologically meaningful constraints; and (iii) estimate the mean and variance of the property from the simulated networks to calculate a z-score (with the corresponding P-value) under the assumption of a normal distribution. Clearly, the significance of a network property strongly depends on the null model. The commonly used method, switch randomization (Guimerà et al., 2007; Milo et al., 2002; Sales-Pardo et al., 2007), does not account for physical constraints, and thus generates unrealistic biochemical reactions (see Basler et al., 2011, for an example). Thus, it is questionable whether the significance determined by this generic randomization scheme helps to elucidate the relation between network properties and biological functions. Motivated by the lack of a biologically meaningful null model for metabolic networks, we developed a method for randomizing metabolic networks under the constraint of mass balance, and analyzed its computational complexity and uniformity of sampling (Basler et al., 2011). Here, we present a tool which can be run via a graphical user interface (GUI) or from the command line, and implements mass-balanced randomization of metabolic networks provided in one of three standard data formats: (i) BioCyc (http://www.biocyc.org); (ii) Systems Biology Markup Language (SBML, http://sbml.org); or (iii) a customizable text file format.
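
Step (iii) of the procedure above can be illustrated with a minimal Python sketch. It is not part of JMassBalance: the function name `z_score_p_value`, the choice of a two-sided P-value and the sample numbers are assumptions made for illustration only.

```python
import math
import statistics


def z_score_p_value(observed, null_samples):
    """Return (z, P) for an observed property against null-model samples.

    Assumes the property is normally distributed under the null model.
    The two-sided P-value is an illustrative choice, not taken from the paper.
    """
    mean = statistics.mean(null_samples)
    sd = statistics.stdev(null_samples)  # sample standard deviation
    if sd == 0:
        raise ValueError("null samples have zero variance; z-score undefined")
    z = (observed - mean) / sd
    # Two-sided tail probability of a standard normal: P = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical values of one property in five randomized networks
null = [2.0, 2.5, 3.0, 3.5, 4.0]
# Hypothetical value of the same property in the original network
z, p = z_score_p_value(5.0, null)
print(f"z = {z:.3f}, P = {p:.4f}")
```

In practice the null sample would contain the property values of a large set of randomized networks rather than five numbers.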

2 METHOD

A metabolic network is represented as a weighted directed bipartite graph G = (V_C ∪ V_R, E), where V_C is the set of compound nodes, V_R the set of reaction nodes, and E ⊆ (V_C × V_R) ∪ (V_R × V_C) is the set of weighted, directed edges denoting stoichiometric substrate–reaction and product–reaction relationships. For example, an edge (c, r) specifies that compound c is a substrate of reaction r, while the stoichiometric coefficient s_{c,r} of c in r is represented as the weight of (c, r). A compound node is uniquely represented by a name, a compartment and a mass vector, m_c ∈ ℕ^n, i.e. the vector representation of the compound c over n chemical elements. For instance, when considering the six most abundant elements in biological systems, carbon (C), hydrogen (H), nitrogen (N), oxygen (O), phosphorus (P) and sulfur (S), the mass vector of water is m_H2O = (0, 2, 0, 1, 0, 0)·(C, H, N, O, P, S). The set of considered chemical elements can be specified in a configuration file (see Reference Manual, available online at http://mathbiol.mpimp-golm.mpg.de/massbalance/). For a reaction r, r_S denotes the set of substrates, and r_P the set of products. A reaction node is uniquely represented by a name and its direction: reversible reactions are represented by one reaction node for each direction, r+ and r−, where (r+)_S = (r−)_P and (r+)_P = (r−)_S. A reaction is mass balanced, i.e. chemically feasible with respect to the conservation of mass, if the sum of its substrate atoms equals the sum of its product atoms:

\[ \sum_{c \in r_S} s_{c,r}\, m_c = \sum_{c \in r_P} s_{c,r}\, m_c \qquad (1) \]

The randomization procedure consists of a pre-calculation step, which classifies the compounds from the network according to their chemical sum formula (see Basler et al., 2011), followed by the actual randomization. The pre-calculation is executed only once for all subsequent randomizations of the same network, and renders the method applicable to large networks.
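
The mass-balance condition can be checked directly from the mass vectors and stoichiometric coefficients. The following Python sketch is illustrative only and does not reproduce JMassBalance code: the names `ELEMENTS`, `MASS`, `total_mass` and `is_mass_balanced` are hypothetical, and the example reaction (complete oxidation of glucose) was chosen because its sum formulas are well known.

```python
# Element order of the mass vectors, as in the water example above
ELEMENTS = ("C", "H", "N", "O", "P", "S")

# Mass vectors: number of atoms of each element per molecule
MASS = {
    "glucose": (6, 12, 0, 6, 0, 0),  # C6H12O6
    "O2": (0, 0, 0, 2, 0, 0),
    "CO2": (1, 0, 0, 2, 0, 0),
    "H2O": (0, 2, 0, 1, 0, 0),
}


def total_mass(side, mass=MASS):
    """Sum coefficient * mass vector over one side of a reaction.

    `side` maps compound names to stoichiometric coefficients.
    """
    total = [0] * len(ELEMENTS)
    for compound, coeff in side.items():
        for i, count in enumerate(mass[compound]):
            total[i] += coeff * count
    return tuple(total)


def is_mass_balanced(substrates, products, mass=MASS):
    """True if substrate atoms equal product atoms for every element."""
    return total_mass(substrates, mass) == total_mass(products, mass)


# glucose + 6 O2 -> 6 CO2 + 6 H2O is balanced
balanced = is_mass_balanced({"glucose": 1, "O2": 6}, {"CO2": 6, "H2O": 6})
# Dropping one water molecule breaks the balance
unbalanced = is_mass_balanced({"glucose": 1, "O2": 6}, {"CO2": 6, "H2O": 5})
print(balanced, unbalanced)
```
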
A network is randomized by replacing the substrates and products of randomly chosen reactions by compounds from within the same network, and choosing their stoichiometric coefficients, such that Equation (1) is satisfied (Fig. 1). The polynomial-time algorithm generates randomized networks uniformly at random and clearly outperforms switch randomization (see Basler et al., 2011, Supplementary Table S1).
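
To make the idea of a mass-balanced substitution concrete, here is a deliberately simplified Python sketch. It is not the published algorithm: it only replaces one substrate by a single other compound whose mass vector, multiplied by an integer coefficient, equals the mass contributed by the original substrate, and it makes no claim about uniform sampling. All names (`integer_multiple`, `substitute_substrate`, the `MASS` table) are hypothetical.

```python
import random

# Mass vectors over (C, H, N, O, P, S); illustrative compounds only
MASS = {
    "glucose": (6, 12, 0, 6, 0, 0),
    "fructose": (6, 12, 0, 6, 0, 0),
    "lactate": (3, 6, 0, 3, 0, 0),  # lactic acid, C3H6O3
    "O2": (0, 0, 0, 2, 0, 0),
    "CO2": (1, 0, 0, 2, 0, 0),
    "H2O": (0, 2, 0, 1, 0, 0),
}


def total_mass(side, mass):
    """Sum coefficient * mass vector over one side of a reaction."""
    return tuple(
        sum(coeff * mass[c][i] for c, coeff in side.items())
        for i in range(len(next(iter(mass.values()))))
    )


def integer_multiple(target, vector):
    """Return k >= 1 with k * vector == target, or None if none exists."""
    k = None
    for t, v in zip(target, vector):
        if v == 0:
            if t != 0:
                return None
            continue
        if t % v != 0:
            return None
        q = t // v
        if q < 1 or (k is not None and q != k):
            return None
        k = q
    return k


def substitute_substrate(substrates, products, old, mass, rng=random):
    """Replace substrate `old` by one compound that preserves mass balance.

    Returns the new substrate dict, or None if no single compound fits.
    """
    target = tuple(substrates[old] * x for x in mass[old])
    used = set(substrates) | set(products)
    candidates = []
    for name in sorted(mass):
        if name in used:
            continue
        k = integer_multiple(target, mass[name])
        if k is not None:
            candidates.append((name, k))
    if not candidates:
        return None
    new, coeff = rng.choice(candidates)
    randomized = {c: s for c, s in substrates.items() if c != old}
    randomized[new] = coeff
    return randomized


substrates = {"glucose": 1, "O2": 6}
products = {"CO2": 6, "H2O": 6}
new_substrates = substitute_substrate(
    substrates, products, "glucose", MASS, random.Random(0)
)
print(new_substrates)
```

With this table, glucose can be replaced by one fructose or by two lactate molecules; in either case the total mass of the substrate side is unchanged, so the reaction remains balanced.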

Fig. 1. Mass-balanced substitution of a substrate. A large number of substitutions is applied in order to obtain fully randomized networks.

3 APPLICATION

JMassBalance is written in Java and comes with all required libraries. Hence, no installation is required, and it can be used on any operating system with Java installed (http://www.oracle.com). The randomization procedure accepts network files in BioCyc, SBML, or a customizable text format. Additional optional parameters specify whether unbalanced reactions in the original network should be fixed, whether compartments should be considered, the randomization depth and probability, and the number of randomized networks to generate. All calculations can easily be parallelized by executing the program multiple times with different network indices (see online Reference Manual). Switch randomization is also implemented, and can be applied to compare the results of the two null models. The randomized networks may be printed as stoichiometric matrices or as text files, thus enabling subsequent investigations, such as constraint-based analysis (Feist et al., 2009).

In addition to randomization, the following structural properties can be calculated for the original and randomized networks, which allows their statistical significance to be determined in a biologically meaningful context:

Average path length: the average number of reactions on the shortest path between two compounds.
Clustering coefficient: average fraction of mutually connected neighbors of a node in the corresponding (unipartite) metabolite–metabolite network.
Assortativity: correlation coefficient of the in-/out-degree of a node and the average in-/out-degree of its predecessors/successors in the corresponding (unipartite) metabolite–metabolite network.
n-cycles: the number of directed cycles of length n in the corresponding (unipartite) metabolite–metabolite network.
Path: test whether the given compounds constitute a path.
Connectedness: test whether the given compounds are connected via paths.
Transition degree: the number of possible mass-balanced substitutions.
Local essentiality: the ratio of successor reactions affected by the knockout of a reaction.
Reaction centrality: the ratio of reactions globally affected by the knockout of a reaction.
Knockout set: the set of reactions globally affected by the knockout of a given reaction.
Degree distribution: the compound degree distribution.
Weight distribution: the distribution of edge weights.
Scope size distribution (Handorf et al., 2005): the distribution of the number of compounds producible from a random set of seed compounds of the given size.
Distribution of ΔG⁰ (Mavrovouniotis, 1991): the distribution of the standard Gibbs free energy change of reactions.
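
As an illustration of one of the structural properties listed above, the following Python sketch computes an average path length on a small metabolite–metabolite network using breadth-first search. It is a sketch under stated assumptions, not the JMassBalance implementation: the function names are hypothetical, each reaction is given as a (substrates, products) pair, and unreachable compound pairs are simply excluded from the average, which may differ from how the tool handles them.

```python
from collections import deque


def metabolite_graph(reactions):
    """Build a directed metabolite graph from (substrates, products) pairs.

    An edge c -> d exists if some reaction consumes c and produces d.
    """
    graph = {}
    for substrates, products in reactions:
        for c in substrates:
            graph.setdefault(c, set()).update(products)
        for d in products:
            graph.setdefault(d, set())
    return graph


def average_path_length(graph):
    """Mean number of reaction steps on shortest paths between compounds.

    Only ordered pairs (source, target) with a directed path are counted.
    """
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else float("nan")


# Hypothetical linear pathway A -> B -> C
reactions = [(("A",), ("B",)), (("B",), ("C",))]
graph = metabolite_graph(reactions)
apl = average_path_length(graph)
print(apl)
```

Applying the same function to the original network and to each randomized network yields the observed value and the null sample needed for a significance estimate.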

4 CONCLUSION

JMassBalance is a flexible and efficient tool for assessing the significance of metabolic network properties through a biologically meaningful null model. It can be used to determine the salient structural properties of metabolic networks and to identify new properties, which are statistically significant and independent of basic physical constraints. Thus, we believe the tool is useful for the initial analysis of reconstructed metabolic networks, as well as subsequent network-based research.

Funding: German Federal Ministry of Education and Research (grant number 0313924).

Conflict of Interest: none declared.

REFERENCES

1. Network motifs: simple building blocks of complex networks.

Authors: R Milo; S Shen-Orr; S Itzkovitz; N Kashtan; D Chklovskii; U Alon
Journal: Science Date: 2002-10-25 Impact factor: 47.728

2. Expanding metabolic networks: scopes of compounds, robustness, and evolution.

Authors: Thomas Handorf; Oliver Ebenhöh; Reinhart Heinrich
Journal: J Mol Evol Date: 2005-09-12 Impact factor: 2.395

3. Extracting the hierarchical organization of complex systems.

Authors: Marta Sales-Pardo; Roger Guimerà; André A Moreira; Luís A Nunes Amaral
Journal: Proc Natl Acad Sci U S A Date: 2007-09-19 Impact factor: 11.205

4. Classes of complex networks defined by role-to-role connectivity profiles.

Authors: Roger Guimerà; Marta Sales-Pardo; Luís A N Amaral
Journal: Nat Phys Date: 2007 Impact factor: 20.034

5. Estimation of standard Gibbs energy changes of biotransformations.

Authors: M L Mavrovouniotis
Journal: J Biol Chem Date: 1991-08-05 Impact factor: 5.157

6. Model-driven evaluation of the production potential for growth-coupled products of Escherichia coli.

Authors: Adam M Feist; Daniel C Zielinski; Jeffrey D Orth; Jan Schellenberger; Markus J Herrgard; Bernhard Ø Palsson
Journal: Metab Eng Date: 2009-10-17 Impact factor: 9.783

7. Mass-balanced randomization of metabolic networks.

Authors: Georg Basler; Oliver Ebenhöh; Joachim Selbig; Zoran Nikoloski
Journal: Bioinformatics Date: 2011-03-23 Impact factor: 6.937
