Literature DB >> 35759507

The topology of genome-scale metabolic reconstructions unravels independent modules and high network flexibility.

Verónica S Martínez^1,2, Pedro A Saa^3,4, Jason Jooste¹, Kanupriya Tiwari¹, Lake-Ee Quek^1,5, Lars K Nielsen^1,2,6,7.

Abstract

The topology of metabolic networks is recognisably modular with modules weakly connected apart from sharing a pool of currency metabolites. Here, we defined modules as sets of reversible reactions isolated from the rest of metabolism by irreversible reactions except for the exchange of currency metabolites. Our approach identifies topologically independent modules under specific conditions associated with different metabolic functions. As case studies, the E.coli iJO1366 and Human Recon 2.2 genome-scale metabolic models were split in 103 and 321 modules respectively, displaying significant correlation patterns in expression data. Finally, we addressed a fundamental question about the metabolic flexibility conferred by reversible reactions: "Of all Directed Topologies (DTs) defined by fixing directions to all reversible reactions, how many are capable of carrying flux through all reactions?". Enumeration of the DTs for iJO1366 model was performed using an efficient depth-first search algorithm, rejecting infeasible DTs based on mass-imbalanced and loopy flux patterns. We found the direction of 79% of reversible reactions must be defined before all directions in the network can be fixed, granting a high degree of flexibility.

Entities: Chemical

Mesh：

Year: 2022 PMID： 35759507 PMCID： PMC9269948 DOI： 10.1371/journal.pcbi.1010203

Source DB: PubMed Journal: PLoS Comput Biol ISSN： 1553-734X Impact factor: 4.779

Introduction

A genome-scale metabolic model (GeM) is a comprehensive mathematical representation of an organism’s metabolism [1, 2]. To date, GeMs for more than 6,000 organisms, including all model organisms, have been reconstructed [3-8]. This network representation is widely employed to study the metabolic phenotype of cells with applications ranging from strain development, modelling interactions among multiple cells or organisms, understanding human diseases to the study of evolutionary processes [8-13]. GeMs describe all metabolic capabilities of an organism, i.e., all biochemical reactions that can carry flux under any condition. These detailed models contain thousands of reactions, which can confound more detailed studies of network properties and functions. A common strategy to overcome this limitation is to focus the analysis on one or a few model subsystems. Subsystems have been defined by conventional biochemical pathways in online databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) [14] and BioCyc [15]. Subsystems have been used to map omics data [16] and for model reduction [17], yet their definition is arbitrary and identical for all organisms. Recognising the diversity and uniqueness of the metabolism in individual organisms, a more satisfying alternative would be to generate model subsystems in an unsupervised manner relying exclusively on the specific topology of the studied metabolic network. The topology of metabolic networks has been widely studied by graph theory methods. Early work by Barabasi and colleagues concluded that metabolic networks are scale-free, hierarchical networks with highly connected modules overlapping known metabolic functions [18, 19]. However, these analyses did not consider the nature of the edges and it soon became apparent that the extremely short average pass length observed was realized through cofactors (e.g., ATP, NADH, NADPH), whereas the flow of carbon from a substrate to a product often is quite long. Following a more biologically meaningful interpretation of the network topology, by excluding currency metabolites (cofactors and moieties) and accounting for directionality of irreversible reactions, Ma [20, 21] observed that metabolic networks can be broken into a modest number of strong networks (i.e., networks where each metabolite can be reached from every other metabolite). The network arranged as a directed bow-tie structure with a substrate subset connected to a product subset through a giant strong component corresponding to central carbon metabolism [20, 22]. Another approach for inferring and studying metabolic modules/pathways is based on structural (stoichiometric) analysis [23-26]. For this task, classical Elementary Flux Modes (EFMs) has been adapted for enumerating flux patterns in metabolic subnetworks (i.e., modules) under biomass-optimal growth [23, 25], incorporating even loopless criteria [27] avoiding thermodynamically infeasible flux cycles [24]. While these approches have yielded deep insights about the flexibility and functioning of metabolic newtorks, their applicability still remains limited to small- to medium-sized models. This work presents a novel approach to generate topologically independent metabolic modules exploiting the network topology and directionality constraints. The E.coli iJO1366 [4] and Recon 2.2 [7] GeMs were subdivided in topologically independent modules and evaluated for their biological relevance under specific growth conditions. The clustering approach provides fundamental insights into the role and flexibility conferred to metabolic networks by reversible reactions. We quantitatively estimated the network flexibility by counting in each module the number of feasible Directed Topologies (DTs), which represent consistent flux solutions [28] where all reactions carry flux, and hence, the directions are fixed. Notably, these DTs are maximal pathways known as Flux Topes (FTs) [23], which have been recently applied for exploring the flexibility of optimal network states, and correcting thermodynamically infeasible cycles [29]. Under the assumption of ‘thermodynamic’ isolation, the (Cartesian) product of the DTs of the different modules provides an unprecedented upper bound estimate of the ‘topological’ degree of freedom of the network.

Results

Model reduction and compression

GeMs are large models with thousands of reactions, some of which are isolated and unable to carry flux, while others are part of linear pathways that can be compressed. As an initial step, all blocked reactions were removed and the model compressed to generate a more manageable model for clustering.

E. coli iJO1366

The E.coli iJO1366 [4] contains 2,583 reactions (941 reversible reactions) and 2,135 metabolites (330 boundary metabolites). Under aerobic growth in medium with glucose as sole carbon source, the initial 941 reversible reactions were reduced by 74% to only 248 “structurally” reversible reactions after model reduction and compression (Table A in S1 Table). Using the network topology and original directions, flux variability analysis (FVA) [30] was performed to identify and remove blocked reactions resulting from singleton metabolites and, where possible, constrain the flux direction of active reactions. The result was the identification of 242 blocked reactions and 534 new irreversible reactions, causing a 57% reduction in reversible reactions as well as a 60% reduction in metabolites involved in those reactions. Next, the reduced model was compressed by lumping together reactions in linear pathways as they carry the same flux and are fully coupled [31]. Overall, the model reduction process led to a total compression of the model from 2,583 to 1,419 reactions (45% reduction), and from 2,135 to 971 metabolites (55% reduction) (Table A in S1 Table). More importantly, the original 941 reversible reactions were reduced to 248 (74% reduction). We denote these remaining reversible reactions as "structural" reversible reactions. Notably, there is a considerable reduction in the number of metabolites participating in reversible reactions from the initial 2,135 to just 382 (28%); if we know the concentration of these metabolites and the ΔrG’0 of the structural reversible reactions, we can determine the directions of all metabolic reactions in the model [32, 33].

Human model recon 2.2

The reduction of the human model Recon 2.2. [7] was performed following the same methodology used for the E. coli model. Recon 2.2 has 7,785 reactions (3,782 reversible reactions) and 6,047 metabolites (723 boundary metabolites). After the initial model reduction by FVA, 1,878 reactions were found to be blocked and 1,007 reactions became irreversible. The total number of metabolites in the model was reduced by 36% and the reversible reactions by 52%. The compression of the model further reduced the number of reversible reactions to 1,582 structurally reversible reactions, and the number of metabolites participating in reversible reactions to 1,523, only 42% of the initial number of reversible reactions (Table B in S1 Table).

Modular topology and clustering

Metabolism is organised into semi-autonomous modules sharing a common pool of currency metabolites (co-factors and moieties) [21], but otherwise weakly connected. Here, we considered irreversible reactions as the natural boundary between modules. Irreversible reactions are thermodynamic insulators preventing downstream products from affecting upstream reactions. Moreover, they are in many cases the “committed pathway step” under allosteric regulation, which greatly reduces the control exerted by the upstream substrate. Using irreversible reactions to define the boundaries of modules, the metabolic models were subdivided by clustering the reversible reactions by common metabolites connecting them, disregarding connections associated to currency metabolites (co-factors and moieties). As a result, we identified functional modules in the E. coli and human metabolism. The iJO1366 model split into 103 isolated modules, with each module comprising nearby reversible reactions that can be assigned to a specific metabolic function or pathway (Fig 1, full list of modules is found in Table C in S1 Table). Seventy-three modules only contained a single structural reversible reaction, i.e., reversible reactions that are not constrained by surrounding irreversible reactions. In other words, around 70% of the modules are made by a single linear pathway of reversible reactions. Each of the six largest modules contained eight or more reversible reaction, representing the following metabolic pathways: (I) glycolysis, pentose phosphate pathway, ribonucleotides and sugars metabolism, (II) fatty acids metabolism, (III) pyruvate metabolism, (IV) nucleotides metabolism, (V) folate, serine and glycine metabolism, and (VI) TCA cycle (Table 1 and Fig 1).

Fig 1

Modules of structural reversible reactions of iJO1366.

The network was subdivided into modules of reversible reactions that share metabolites. Each node represents a reversible reaction and each edge a metabolite connecting two reversible reactions. The topology of the network is given by the invisible presence of irreversible reactions that connect the reversible reactions. The six largest modules that contain more than seven reversible reactions: TCA cycle (green), Fatty acids metabolism (yellow), Glycolysis, PPP, ribonucleotides and sugars metabolism (red), Pyruvate metabolism (purple), Folate, serine and glycine metabolism (pink), and Nucleotides metabolism (orange). Insert left corner shows the reaction adjacency matrix of the structural reversible reactions, which also represents the reversible reaction modules.

Table 1

Topologically independent modules of iJO1366.

The 294 structural reversible reactions were grouped in 103 modules (see Table C in S1 Table for the full list of modules). Here features of the six largest modules are presented.

		Number of Reactions
Module	Pathway	Reversible	Irreversible Internal	Irreversible External	Total
7	TCA cycle	8	5	26	39
11	Fatty acids metabolism	20	6	35	61
15	Glycolysis, PPP, ribonucleotides and sugars metabolism	42	25	147	214
18	Pyruvate metabolism	16	16	53	85
63	Folate, serine and glycine metabolism	9	11	32	52
86	Nucleotides metabolism	16	33	62	111

Modules of structural reversible reactions of iJO1366.

Topologically independent modules of iJO1366.

The 294 structural reversible reactions were grouped in 103 modules (see Table C in S1 Table for the full list of modules). Here features of the six largest modules are presented. In order to validate the biological relevance of the generated modules, we computed the correlation in gene expression between structurally reversible reactions of the iJO1366 model belonging to the same module (intra module) and to different modules (extra module). Even without a modular structure, one would expect that proximity would enhance correlation. Hence, we calculated all correlations with distance two, i.e. those separated by exactly one reaction (Supporting text B in S1 Text and Table D in S1 Table). A distance of two was chosen because this is the shortest distance between two reversible reactions that belong to different modules. Correlation was computed using the PRECISE database [34], which contains 278 RNA-seq expression profiles for E. coli K-12 MG1655 and BW25113. Constitutively expressed genes (38 out of 164 genes presented in the GPR of reversible reactions), those with standard deviation of the normalized gene expression data less or equal to 0.6, as well as four genes that are part of the gene protein relationship (GPR) of 99 of the structural reversible reactions, were removed from the dataset. The latter four genes were removed in order to avoid artificially high correlations. Overall, 74% of the genes belonging to the GPR of the structural reversible reactions of iJO1366 were mapped onto the transcriptomic dataset. If there is no modular organization underlying gene expression, we would expect no difference in the distance-2 correlation of structural reversible reactions between intra and extra module reactions. In contrast, we observed a significant difference in their distributions. The correlation between distance-2 reactions belonging to different modules was generally low with only 0.4% displaying absolute correlation values greater than 0.8 and 25% showing less than 0.1 absolute correlation value (Fig B in S1 Fig). In contrast, more than 37% of intra-module distance-2 reactions displayed an absolute correlation value greater than 0.8 and only 12% displayed an absolute correlation value less than 0.1. While there is substantial overlap in distributions, the two distributions are not only significantly different (p-value = 8.2x10-59, Kolmogorov–Smirnov test for equal distributions), but the median of the intra module distribution is significantly higher than the extra module reactions (p-value = 1x10-39, Wilcoxon sum rank test). We conclude that the module structure is biological relevant as reflected in differential gene expression.

Human model Recon 2.2

Analogous to iJO1366 model, Recon 2.2 clustered into 251 isolated modules. However, 53% of the 1,582 structural reversible reactions in the reduced model were allocated to a single large cluster, describing a major share of the central carbon metabolism. Inspection of the topological features of this cluster revealed that the high connectivity was caused by a large number of antiporter reactions coupling metabolism of otherwise distinct metabolites. For example, Recon 2.2 contains 102 antiporter reactions catalysed by the L-type neutral amino acid transporter (LAT1) generated by pairwise combination of the possible substrates. These antiporters play a critical homeostatic role as “harmonizers”, maintaining a balanced cytosolic pool of all amino acids [35]. In order to explore the intracellular metabolism, however, the majority of antiporter reactions (414 reactions) were removed from the model, keeping the uniporter transport reactions and the co-transport reactions that were needed to enable the transport of all metabolites in the model. Notably, this reduction of reversible antiporter reactions does not affect the overall model capabilities in terms of metabolites that the model is able to consume and produce and maximum specific growth rate (details of simulations in Supporting text A in S1 Text and Table E in S1 Table). Clustering of the reduced model produced 321 modules (list of modules in Table F in S1 Table), with the larger module containing 111 reversible reactions. Almost half of the modules (45%) contained only one structural reversible reaction (Fig A in S1 Fig), while the six largest modules contained more than 30 reversible reactions each (Table 2 and Fig 3).

Table 2

Topologically independent modules of Recon 2.2.

		Number of Reactions
Module	Pathway	Reversible	Irreversible Internal	Irreversible External	Total
4	Fatty acids metabolism	46	47	86	179
6	Glutamate and glutathione metabolism and TCA cycle	81	221	78	380
7	Nucleotides metabolism	111	119	63	293
9	Pyruvate, lactate, alanine and cysteine metabolism	38	99	59	196
19	Glycolysis, PPP and galactose metabolism	32	13	54	99
27	Glycine, serine and taurine metabolism	47	109	48	204

Fig 3

Modules of structural reversible reactions of Recon 2.2 model.

The network was subdivided into modules of reversible reactions that share metabolites. Each node represents a reversible reaction and each edge a metabolite connecting two reversible reactions. The topology of the network is given by the invisible presence of irreversible reactions that connect the reversible reactions. The six largest modules that contain more than 30 reversible reactions: Glutamate metabolism, glutathione metabolism and TCA cycle (green), Fatty acids metabolism (yellow), Glycolysis, PPP, and galactose metabolism (red), Pyruvate, lactate, alanine and cysteine metabolism (purple), Glycine, serine and taurine metabolism (pink), and Nucleotides metabolism (orange). On the right the reaction adjacency matrix of the structural reversible reactions is presented, which also represents the reversible reaction modules.

Modules of structural reversible reactions of Recon 2.2 model.

The network was subdivided into modules of reversible reactions that share metabolites. Each node represents a reversible reaction and each edge a metabolite connecting two reversible reactions. The topology of the network is given by the invisible presence of irreversible reactions that connect the reversible reactions. The six largest modules that contain more than 30 reversible reactions: Glutamate metabolism, glutathione metabolism and TCA cycle (green), Fatty acids metabolism (yellow), Glycolysis, PPP, and galactose metabolism (red), Pyruvate, lactate, alanine and cysteine metabolism (purple), Glycine, serine and taurine metabolism (pink), and Nucleotides metabolism (orange). On the right the reaction adjacency matrix of the structural reversible reactions is presented, which also represents the reversible reaction modules.

Topologically independent modules of Recon 2.2.

The remaining 1,168 structural reversible reactions, after removal of antiporter reactions, were grouped in 321 modules (see Table F in S1 Table for the full list of modules). Here features of the six largest modules are presented. As with the iJO1336 model, the biological relevance of the Recon 2.2 model modules was validated by computing the Pearson correlation in gene expression data between the genes of structurally reversible reactions with distance 2 (Supporting text B in S1 Text and Table G in S1 Table). The human expression data from the GTEx (The Genotype-Tissue Expression) project [36] was used to compute gene correlations. This dataset contains RNA-seq data for 17,382 samples representing 54 different human tissues from 948 donors. Gene TPMs were obtained from the GTEx portal on 13/04/21 (dbGaP Accession phs000424.v8.p2). The data was initially filtered by removing genes with less than 10 samples with at least 1 TPM and normalized by log-transformation log2(TPM+1). Out of the 397 genes present in the reversible reactions of the Recon 2.2 model, only the gene expressing phosphoglycerate mutase 2 (HGNC:8889) was not found in the GTEx database. Fig 2B shows the distribution of correlations between structurally reversible reactions of distance 2 from the Recon 2.2 model. The distribution of the correlation for extra module reactions resembled a normal distribution, centred at zero, with 53% of correlations presenting absolute values of less than 0.1 and only 10% displaying absolute correlation values higher than 0.5. In contrast, for the intra module correlations, the maximum absolute correlation frequency was between 0.55 and 0.6. Less than 30% of correlations were between -0.1 and 0.1 and more than 30% showed absolute correlation values over 0.5. Unlike the E. coli data, the correlation between human genes rarely exceeds 0.8, possibly due to the inherent expression variation between individuals compared to isogenic E. coli. While seemingly less profound, the difference between the distributions of correlation of intra and extra module genes is highly significant (p-value < 5x10-324, Kolmogorov–Smirnov test for equal distributions) with a tendency of the genes belonging to the same module to have higher correlations (p-value < 5x10-324, Wilcoxon sum rank test).

Fig 2

Distribution of correlation between structural reversible reactions of distance 2.

A. Results for the iJO1366 model. B. Results for the Recon 2.2 model. The distribution of the (absolute) correlations between the structural reversible reactions with distance 2 that belong to the same module (Intra Module) and to different modules (Extra Module), are shown in red and blue, respectively.

Distribution of correlation between structural reversible reactions of distance 2.

Topological flexibility in iJO1366

Irreversible reactions direct flux from substrates towards biomass components and products. Reversible reactions grant metabolic networks the flexibility to have alternative flow directions, e.g. glycolysis versus gluconeogenesis, in order to adapt to a changing environment [12]. We can characterize the level of flexibility by the possible directed topologies (DTs), defined as the set of distinct network configurations where all reversible reactions are unidirectional and all network reactions carry flux simultaneously at steady state. The reduced and compressed E. coli model iJO1366 has 248 structural reversible reactions (not directionally constrained by surrounding irreversible reactions) (Table A in S1 Table), thus it can be described by a maximum of 2248 different DTs if these reactions were completely independent. In reality, reactions are coupled [31], and the number of distinct feasible DTs arising from the various combination of directions for each reversible reaction should be substantially less. We defined the "topological" degree of freedom (DoF) (Log2N) of the metabolic network (N being the total number of feasible DTs), as the minimum number of reversible reactions that must be directionally fixed in order to fix metabolism to a distinct feasible DT. From a practical point of view, it may be possible to identify the metabolic state of the cell under certain growth conditions by determining the direction of key reactions by the use of metabolomics data and thermodynamic principles [32, 33, 37–39], if the “topological” DoF is a relatively small number. In order to quantify the flexibility of the iJO1366 model, the “topological” DoF was estimated. It is possible to estimate the feasibility of each DT by FVA, however, this task is computationally prohibitive for 2248 DTs. By taking advantage of the modular topology of metabolic models, where each module is semi-autonomous and consists of highly connected components, we can estimate an upper bound of the model “topological” DoF as the Cartesian product of the individual modules “topological” DoFs, greatly reducing the computation challenge. The largest module contained n = 42 reversible reactions, therefore, 3.7x1014 (2 x n x 2) optimization problems would have to be solved to determine the “topological” DoF of this module. As this is simply impractical, we constructed a set of rules to identify flux patterns leading to infeasible DTs, i.e., unable to carry flux at steady state (see Methods). The rules were constructed using patterns that either violate single metabolite mass balances (i.e., steady state) or the second law of thermodynamics by generating infeasible closed loops (Fig 4A and 4B). Finally, the “topological” DoF was estimated by performing a depth-first search in the directionality space of each module using the previously defined rules to remove infeasible DTs. Importantly, by using this topological search approach and in contrast to previous methods [23], no optimizations runs were required for the “topological” DoF estimation.

Fig 4

Example of rules to identify infeasible flux patterns.

Example of rules to identify infeasible flux patterns.

A. Loopless rules for the fatty acid metabolism module. In order to break the loop, the acetaldehyde oxidoreductase reaction (acetaldehyde (acald[c]) to acetyl-coA (accoa[c])) cannot point in the direction of acetaldehyde synthesis (red arrow). B. Local mass balance rules for the fatty acid metabolism module. Around each internal metabolite in the network, there must be at least one reaction in (synthesis) and one out (consumption). In this example, there is one irreversible reaction around acetoacetyl-coA in the synthesis direction, thus at least one of the reversible reactions should go in the consumption direction. C. Two metabolites that are fully mixed for the Arginine metabolism module. As Agmatine (agm[p]) and Arginine (arg_L[p]) are fully mixed, both can actually be seen as only one metabolite, thus a mass balance rule around both metabolites was added. Which means that around fully mixed metabolites there should be at least one reaction in (synthesis) and one out (consumption). In purple are highlighted the 3 reversible reactions around these metabolites. In red are presented the infeasible rules for A and B. Black arrows represent irreversible reactions and green arrows represent reversible reactions. A comparison of this enumeration strategy against FVA for all but the largest module revealed the above two principles for rule generation captured almost all the infeasible DTs patterns. The majority of failures were found in metabolite pairs that behaved as one metabolite due to the reactions connecting them. This issue was found, for example, in the Arginine metabolism module, where Agmatine (agm[p]) and Arginine (arg_L[p]) are fully mixed by reactions connecting them (Fig 4C). This issue was overcome by introducing a local mass balance around the two fully mixed metabolites to generate the missing infeasible patterns (see Materials and Methods). A few failures remained in module 86 (nucleotides metabolism) even after the inclusion of the local mass balance rule; the search strategy identified 446 feasible DTs compared to 440 determined using brute force FVA. A heuristic approach was implemented to find the missing infeasible flux patterns needed to find the correct number of DTs. The heuristic approach generates potential new infeasible patterns based on the already identified infeasible patterns with 3 or more fixed directionalities, by moving the constraint from one reaction to another that was initially unconstrained (see Supporting text C in S1 Text for more details). The inclusion of the heuristic approach, as well as identifying the six missing infeasible DTs on module 86, reduced the number of calculated feasible DTs on the largest module (module 15) by 43% (from 1.14x1011 to 4.89x1010), and the computational time by 88%. Due to the large number of structural reversible reactions in module 15 (46 structural reversible reactions after model compression, see Methods), it is infeasible to run the brute force search to fully validate our search algorithm. Instead, sampling was used to validate the enumeration approach for this module. Out of a random sample of 1x107 DTs analysed using FVA, only 0.055% (5,519) were feasible. This percentage is similar to the 0.069% DTs found feasible for module 15 using our search algorithm. Furthermore, all of the infeasible DTs seen in the random sample could be explained with the existing infeasible flux patterns. We complemented random sampling with a targeted sampling to ensure that for every combination of 5 structural reversible reactions, the full combinatorial within those reactions was covered. Out of the 43,864,128 (, for n = 46 structural reversible reactions) simulations, 31,118 (0.071%) were found feasible. Again, the proportion of feasible DTs was very similar to the proportion found by our algorithm (0.069%), and the infeasible DTs were fully explained by the computed infeasible flux patterns. These results suggest that the presented topological enumeration algorithm is most likely capable of finding all the feasible DTs in unprecedentedly large search spaces. For the largest module (module 15), there were 29 infeasible flux patterns identified: 11 due to loopless rules, 15 due to local mass balance rules, 2 due to two metabolites mixed mass balance rules and 2 from the heuristic algorithm (Fig B-H in S1 Fig). The infeasible patterns covered 38 of the 46 structural reversible reactions, leaving eight completely free reactions: G3PD, RPI, DURIPP, PUNP2, EX_ade_e, EX_dha_e, EX_hxan_e and EX_ura_e. Based on the number of feasible DTs, the “topological” DoF of module 15 is 35.5 out of the theoretical maximum 46 (Fig 5).

Fig 5

Theoretical maximum and actual “topological” DoF of the seven larger modules of the E.coli iJO1366 reconstruction.

In order to gain a better understanding of the flexibility of structural reversible reactions within module 15, the directionality correlation between the reactions was studied (refer to Supporting text B in S1 Text for more details). Overall, the correlation between the studied reactions was poor, suggesting a weak coupling (Fig I in S1 Fig). We further investigated the presence of clusters of highly coupled reversible reactions within this module by enumerating the largest sets of fully connected reactions (maximal cliques) [40]. Here, connected reactions were defined as those with an absolute correlation higher than a predefined cut-off. When a cut-off of 0.85 was defined, only 5 cliques with 2 reactions each were found, each clique connecting an exchange and an internal reaction consuming the transported metabolite. The small number of cliques found and the extremely small size of them is in agreement with the previously found “topological” DoF of this module, confirming that the module structural reversible reactions are highly independent from each other. The model “topological” DoF was estimated from the individual modules “topological” DoFs assuming that the modules are independent of each other. The assumption that reversible reactions within a module can operate independently of the state of reversible reactions in other modules largely appear to be valid because the structural reversible reactions in the model were organized in nearly isolated modules upon the removal of currency metabolites. Thus, changes inside a module are unlikely to have a high impact in a different module of the network [41]. The model “topological” DoF was estimated to be 200, i.e., 79% of structurally reversible reactions are independent of each other, which suggest a high level of flexibility. The simulation results of the seven largest modules on “topological” DoF, after model pre-processing, are presented in Fig 5.

Discussion

Metabolic networks are inherently modular [19, 20, 22]. This modular nature provides a means for simplifying structural and functional analysis of large-scale metabolic networks. Early work described the network topology using an undirected graph with no consideration of the nature of the edges, hereby yielding artificial short path lengths and an ambiguous structure. By excluding currency metabolites and accounting for directionality of reactions, metabolic networks have been previously described having a bow-tie structure with a substrate subset connected to a product subset through a giant strong component corresponding to central carbon metabolism [20, 21]. The deliberate omission of energy/redox co-factors was critical for the identification of thermodynamically isolated modules. Clearly, these modules are coupled through energy and redox to other reactions, however, the coupling is to the tightly maintained global pool rather than between any two individual reactions. Arguably energy/redox homeostasis–maintaining energy charge and the ratios of various redox partners–is a more global regulatory principle (see for example [42]) than the modules identified. Conversely, assuming that coupling through energy/redox links individual reactions would speak against modularity, e.g., suggest that all gene regulation is globally coordinated with no modularity, which is clearly not accurate (operons in bacteria is a clear example of modularity). The thermodynamic isolation hereby employed focus on is the isolation achieved by irreversible reactions–commonly subject to allosteric regulation–that ensures the products have no impact on substrate concentrations or the reactions upstream of the substrates. Importantly, once modules have been indentified, currency metabolites (redox and co-factors) may be reincorporated to the respective reactions if desired. For example, a kinetic model of a module would typically include cofactors as fixed concentration external metabolites [43, 44]. It is for the sole purpose of identifying modules that currency metabolites are reversibly removed. Altogether, the identified modules unveiled the modular organization of the reversible reactions of the E. coli iJO1366 and the Recon 2.2 GeMs into metabolic modules connected by irreversible reactions. The resulting organization resembles conventional metabolic pathways and subsystems known for these organisms, but in this case, they emerged from the topological features of each network leveraged by a novel clustering approach. The approach unravelled hundred and three nearly isolated modules for the E. coli network growing aerobically in media with glucose as sole carbon source. The majority of the modules contained only one structural reversible reaction, whereas thirty contained more than one. For comparison, Ma and Zeng [20] found 29 strong components that include no less than three metabolites. When applying the clustering approach to a much larger reconstruction, Recon 2.2, a large number of antiporter transport reactions (e.g., the amino acid “harmonizers” such as LAT1) were removed, which artificially connect different parts of metabolism [35]. After the removal of antiporters, the model was subdivided into 321 thermodynamically isolated modules. The six largest modules in the human model displayed known metabolic functions similar to those found in E coli, namely: TCA cycle, glycolysis, pentose phosphate pathway, fatty acids, nucleotides, sugars and amino acids metabolism. The biological relevance of the identified modules was demonstrated through gene expression analysis, which showed that the correlation between of reversible reactions of distance 2 was significantly higher between reactions within a cluster and low between reactions in separate clusters. The majority of correlated enzymes catalysing reactions within the same module are highly correlated (more than 0.8 absolute value correlation across 278 transcription datasets in the E.coli model and more than 0.5 absolute value correlation across 17,382 RNA-seq samples in the Human model). In contrast, the majority of absolute correlations between reactions in different modules concentrated around zero supporting the presence of the inferred underlying modular structure. The E. coli iJO1366 metabolic network flexibility was studied using the identified modules. An efficient depth-first search algorithm using simple infeasible mass balance and loopless rules was developed to explore the topological flexibility of the modules by enumerating all feasible DTs. We note that this amounts to enumerating all the flux topes in these (currency-free) subnetworks [23]. The analysis revealed only a weak coupling between structural reversible reactions in the largest module, which points to an overall high topological flexibility providing a high degree of robustness [45]. Strong coupling was only found between some boundary (exchange) and internal reactions consuming a common metabolite, which is known to be the case as exchange reactions can exert massive coupling and blocking of reactions at the boundary of metabolic networks [31].This observation is true across the modules. Assuming that modules operate independent of each other, the topological degree of freedom of the E. coli iJO1366 model was determined to be 200 (79%) out of a theoretical maximum of 248. This number represents an upper bound on the number of directionalities that must be determined to fix the topological state of the metabolic network. A more exact estimate would be obtained by enumerating all the feasible DTs in the entire network as whole, which is unfortunately impractical at this scale [23]. Still, we can conclude that except for linear pathways, reversible reactions operate practically independent of each other, granting both flexibility and robustness against internal and external perturbations [22, 34, 46, 47].

Methods

Prior to model reduction and compression, the models were modified for mass balance and thermodynamic calculation consistency [32, 33]. In short, reactions catalyzed by different enzymes in either direction were lumped together, the species HCO3, CO3, CO2, and H2CO3 were aggregated as CO2tot and H2O was added to the other side of the reaction. Additionally, the oxidized and reduced FAD of mitochondria were replaced by the oxidized and reduced FADenz to represent the enzyme-bound FAD cofactor. Finally, the models’ original directionality constraints were used, which describe aerobic growth with glucose as main carbon source (Tables H and I in S1 Table). Two steps of model reduction were performed: Flux variability analysis (FVA), the minimum and maximum flux of all network reactions were estimated by linear programming and blocked reactions were identified. Blocked reactions are reactions that carry null flux as a result of being linked to singleton/dead-end metabolites, or caused by contradicting reversibility specifications. Additionally, reversible reactions that operate irreversible under the specified directionality constrains were identified. The model was reduced by removing blocked reactions and constraining flux direction of the identified irreversible reactions. Model compression, linear pathways were lumped into single reactions. Reactions organized into a linear pathway must carry the same flux value, therefore can be lumped together [48]. If one of the reactions in the pathway is irreversible the lumped reaction will be irreversible. A novel approach to divide the metabolic network into functional modules (metabolic pathways) was developed. First, the currency metabolites (co-factors and moieties), defined as highly connected metabolites with a carrier function (e.g., electrons and chemical groups), were removed from the network reactions. The removal of currency metabolites was done manually based on on their role in each reaction. For instance, if a currency metabolite was participating as carrier in the reaction, it was removed, otherwise it was kept (refer to Table J in S1 Table for the complete list of removed metabolites). In addition, if it was not clear which metabolites were the co-factors and main substrates/products of the reaction (e.g., K+-Cl- cotransport: cl[e]+k[e] = cl[c]+k[c]), all metabolites were kept. In order to preserve the metabolic functions of the network, currency metabolites were kept in all reactions that synthetise or degrade them, as well as when a metabolite was being used as a building block (e.g. Acetoacetyl-CoA:acetate CoA-transferase: acac+coa = aacoa) (Tables K and L in S1 Table). Then reversible reactions were clustered by common metabolites connecting them. The irreversible reactions were kept in the model to retain the network structure, but these reactions did not take part of any module. The reaction adjacency matrix was used to reveal the generated modules.

Model flexibility and enumeration of directed topologies

In order to study the degree of flexibility conferred to the metabolic model due to reversible reactions, two terms were introduced: Directed Topologies (DTs), defined by fixing directions to all reversible reactions, i.e., all reactions in the network are irreversible and carry flux in a DT. Topological degree of freedom, defined as log2N where N the number of DTs that are capable of carrying flux in all reactions. The DT enumeration problem was divided into sub-problems each contained in a pre-defined module. Inside each module, duplicated reactions were lumped as one reaction and irreversible reactions in opposite direction were lumped into new reversible reactions. Then, the connections between reversible reactions inside the modules were studied using a set of rules that define infeasible flux patterns rendering the candidate DT infeasible. These rules are based on the mass balance around one or two metabolites and the loopless flux condition. Next, a depth-first search was performed through the feasible space of reversible reactions directions using the previously defined rules as a table of infeasible flux patterns (see next section for details on the rules and search algorithm). In order to reduce the computational time, the search algorithm was implemented in C, resulting in a 500 times faster search. To verify that the feasible space was properly explored by the search algorithm, an FVA brute force algorithm was implemented and their results were compared. It was found that for some of the largest modules, few infeasible patterns were missing, therefore, a heuristic method for finding the missing infeasible flux patterns was added. For the largest module, a flux sampling strategy was employed for validation as the brute force method was impractical to run. For more information about the missing rules and the adjustment of the patterns check Supporting text A in S1 Text. Finally, simulations were executed in MATLAB (The Math-Works, Natick, USA) using the MEX file interface for C code. Gurobi Optimization solver was employed for the linear optimizations. Calculations were performed on a desktop computer.

Depth-first search for DT enumeration

Enumeration of all DTs in a given module of a determined model is likely NP-hard. Even when directed topologies are considered and composed of configurations where all reactions carry flux (i.e. a maximal flux mode)–common simplification applied to keep the problem tractable [29]–, this enumeration amounts to counting all flux modes of size K (where K represents the size of the subnetwork), which can be assumed to be NP-complete from previous theoretical results [49]. Thus, implementation of a brute-force optimization algorithm where all the DTs are checked for feasibility is simply naïve and does simply not scale well. In order to overcome this obstacle, we developed and implemented a novel depth-first search algorithm that traverses a directionality tree, where the current flux directionality pattern is compared against a list of previously calculated infeasible flux patterns for early rejection. A similar graph-based approach has been used to efficiently enumerate elementary flux modes using a reaction tree for rejecting on-the-fly pathways with two-futile cycles [50]. Particularly in our case, we constructed two types of infeasible flux patterns before each search in each module: 1) mass-imbalanced flux patterns, and 2) loopy flux patterns. The first infeasible pattern type is derived directly from inspection of the stoichiometric matrix, S. Each row j of S, denoted by S, represents the mass balance for a metabolite j in the module, and more importantly, it provides necessary relations for reaching a mass balanced/imbalanced flux topology based on a simple criterion: an imbalanced flux pattern has either all fluxes coming or leaving from metabolite j. Such topologies will never reach steady state, and hence, they can be discarded without the need of simulation. Moreover, since S is typically sparse due to the removal of currency metabolites, exhaustive computation of all mass-imbalanced flux pattern combinations for all the metabolites in the module can be efficiently performed. The second infeasible pattern type follows the same logic but addresses thermodynamically infeasible loops. The presence of internal loops or closed cycles in flux solutions violate the second law of thermodynamics, as they can potentially reach infinite flux without additional energy input [27]. In order to identify and correct such situations, the `loopless´ flux optimization formulation has been proposed and successfully applied to large metabolic models [44, 51–53]. Briefly, thermodynamically infeasible flux solutions can be identified by checking if they contain any active (non-zero) closed cycles [54]. These closed cycles are encoded in the ´loop-law´ matrix, Nint, which describes a null space basis of the stoichiometric matrix of internal reactions Sint of the network [53]. We have recently developed an efficient algorithm called Fast-SNP [44] for finding a sparse representation of such basis in large-scale metabolic models [55, 56], substantially easing the detection of closed cycles in flux patterns. Once Nint is computed for the particular submodule, then complete enumeration of all ‘loopy’ flux patterns in the module can be efficiently performed. It is important to highlight that Nint is constructed using the original reaction stoichiometries, thereby avoiding distortion of the generated loop laws. A: Checking model capabilities after removing antiporter reactions (in S1 Text). B: Calculation of correlation in gene expression between structurally reversible reactions of distance 2 (in S1 Text).: Missing Infeasible Patterns (in S1 Text). D: Study of directionality correlation between structural reversible reactions (in S1 Text). (DOCX) Click here for additional data file. A: Characterization of Recon 2.2 modules dimensions. Almost half of the generated modules contain one reversible reaction, while 6% of the modules contain 10 or more reversible reactions (in S1 Fig). B: Infeasible flux patters of module 15 due to loopless rules. 5 of the 11 infeasible flux patterns are presented, each with a different colour. The arrows represent the reactions in the module (both reversible and irreversible) and the nodes the metabolites (in S1 Fig). C: Infeasible flux patters of module 15 due to loopless rules. 4 of the 11 infeasible flux patterns are presented, each with a different colour. The arrows represent the reactions in the module (both reversible and irreversible) and the nodes the metabolites (in S1 Fig). D: Infeasible flux patters of module 15 due to loopless rules. 2 of the 11 infeasible flux patterns are presented, each with a different colour. The arrows represent the reactions in the module (both reversible and irreversible) and the nodes the metabolites (in S1 Fig). E: Infeasible flux patters of module 15 due to mass balance rules. 14 of the 15 infeasible flux patterns are presented, each with a different colour. The arrows represent the reactions in the module (both reversible and irreversible) and the nodes the metabolites (in S1 Fig). F: Infeasible flux patters of module 15 due to mass balance rules. 1 of the 15 infeasible flux patterns are presented, each with a different colour. The arrows represent the reactions in the module (both reversible and irreversible) and the nodes the metabolites (in S1 Fig). G: Infeasible flux pattern of module 15 due to two metabolites mixed mass balance rules. The infeasible pattern is highlighted using red arrows. The arrows represent the reactions in the module (both reversible and irreversible) and the nodes the metabolites (in S1 Fig). H: Infeasible flux pattern of module 15 found using the heuristic algorithm. The two infeasible patterns are highlighted using red and blue arrows. The arrows represent the reactions in the module (both reversible and irreversible) and the nodes the metabolites (in S1 Fig). I: Correlation analysis of the directionalities of the structural reversible reactions in module 15. (A) Absolute correlation distribution of the directionalities in the DTs sample. (B) Cliques distribution and (C) Cliques size distribution for different absolute correlation thresholds. The higher the absolute correlation cut off, the least and smaller the size of the cliques found (in S1 Fig). (DOCX) Click here for additional data file. A: Summary of reduction and compression of E.coli iJO1366 model. First a singleton reduction was realized where blocked reactions and singleton metabolites were removed, and found new irreversibilities were added; then the model was compressed by lumping linear pathways (in S1 Table). B: Summary of reduction and compression of Recon 2.2 model. First a singleton reduction was realized where blocked reactions and singleton metabolites were removed, and found new irreversibilities were added; then the model was compressed by lumping linear pathways (in S1 Table). C: Topologically independent modules of iJO1366. The 294 structural reversible reactions were grouped in 103 modules (in S1 Table). D: Correlation in gene expression between structurally reversible reactions of distance 2 of the iJO1366 model (in S1 Table) E: Effect of removing antiporter reactions. Flux variability analysis shows that range fluxes do not differ between full [vminF,vmaxF] and reduced [vminR,vmaxR] model (in S1 Table). F: Topologically independent modules of Recon 2.2. The remaining 1,168 structural reversible reactions, after removal of antiporter reactions, were grouped in 321 modules (in S1 Table). G: correlation in gene expression between structurally reversible reactions of distance 2 of the Recon 2.2 model (in S1 Table) H: Directionality constraints used for (in S1 Table) I: Directionality constraints used for (in S1 Table) J: Currency metabolites that were removed from reactions (in S1 Table). Table K: Reversible reactions with and without currency metabolites of (in S1 Table). Table L: Reversible reactions with and without currency metabolites of Recon 2.2 model (in S1 Table) Table M: Cliques distribution for different correlation thresholds in the largest metabolic modules of the iJO1866 model (in S1 Table). (XLSX) Click here for additional data file.

Code to calculate correlations and visualise the data.

(ZIP) Click here for additional data file.

Correlations computed for Human Recon 2.2.

Code to process large TPM files from the GTEx (The Genotype-Tissue Expression) project and produce correlation tables. (ZIP) Click here for additional data file. 24 Sep 2021 Dear Dr. Nielsen, Thank you very much for submitting your manuscript "The topology of genome scale reconstructions unravel independent metabolic modules and high model flexibility" (PCOMPBIOL-D-21-01428) for consideration at PLOS Computational Biology. As with all papers peer reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent peer reviewers. Based on the reports, we regret to inform you that we will not be pursuing this manuscript for publication at PLOS Computational Biology. In line with the points raised by the reviewers (please see below), the omission of energy/redox co-factors is a fundamental shortcoming towards identifying modules that are thermodynamically isolated. The reviews are attached below this email, and we hope you will find them helpful if you decide to revise the manuscript for submission elsewhere. We are sorry that we cannot be more positive on this occasion. We very much appreciate your wish to present your work in one of PLOS's Open Access publications. Thank you for your support, and we hope that you will consider PLOS Computational Biology for other submissions in the future. Sincerely, Tunahan Cakir Guest Editor PLOS Computational Biology Kiran Patil Deputy Editor PLOS Computational Biology ************************************** Reviewer's Responses to Questions Comments to the Authors: Please note here if the review is uploaded as an attachment. Reviewer #1: We have read “The topology of genome scale reconstructions unravel independent metabolic modules and high model flexibility” by Martinez et al. The authors explored the topology of metabolic network reconstructions by defining sets of thermodynamically isolated modules connected by irreversible reactions. The modules corresponded with correlations in RNA transcription data. A major focus was on Directed Topologies which were defined by fixing the direction of reversible reactions and then determining how many could carry flux. The authors explored these topologies for metabolic networks of E. coli and Human metabolism. Overall, the manuscript describes a nice approach for defining metabolic subsystems in an unbiased way. However, we have several questions regarding the methods and assumptions used by the authors. Unfortunately, the methods section lacking leading to several questions related to the robustness of the method. Major Comments: • Our major concern relates to some ambiguity in the methods section. The authors describe two steps of model reduction. The first uses FVA to estimate the flux of all network reactions. However, the authors do not specify the constraints applied to their models prior to FVA. For the E. coli model, were the FVA calculations run using default constraints (e.g. aerobic M9 minimal media + glucose?) or was this a fully open model? What about for the human reconstruction? Changing constraints, especially exchange reactions will change FVA results. Thus, we must know what constraints were used. • Do the authors anticipate dramatically different results from altering these constrains? Would the identified modules change dramatically? Is a sensitivity analysis needed? • The authors casually discuss and dismiss several concerns without describing the basis for such statements or being more specific in what they mean. For example, on line 234 the authors state that “reduction of reversible anitiporter reactions don’t affect overall model capabilities”. How are the authors defining model capabilities? How was this determined? We understand that they likely mean capabilities in terms of catabolic and anabolic capabilities, but what about gene essentiality or other model predictive capabilities? The authors should be careful with their language and statements like these, and may need to demonstrate that no other model capabilities are changed based on removing such reactions. • We commend the authors use of RNA transcription data to bolster their results. However, its difficult for us to validate these results without deposition of data (both RNA transcription data and the model simulation results). We understand that the transcription data is from a separate published study but the outcome of the filtering steps (e.g. describe on line 199) are important to evaluate. Minor Comments: • Line 59: add “of metabolism” – reconstructions can be constructed for several other systems • Line 61 – “not” • Line 270 “visually” may not be the correct word • We encourage another round of grammar checks to address issues such as those above Reviewer #2: The work by Martinez et al aims to identify well defined metabolic modules in genome-scale metabolic models. To this end the authors focus on metabolically connected reversible reactions that are separated by irreversible reactions form each other. Each of these reversible cluster defines one model. The question of identifying modules in metabolism is very interesting and — in my understanding — has not been satisfactorily been answered so far. The approach by the authors is interesting. I do have, however, some concerns that relate to the strategy taken by the authors and the stability of the identified modules. * The authors remove currency metabolites from the analysis. This is a key step as it is removes many links. Thus it needs to be clearly defined what abundance makes a metabolite a currency metabolite. If the threshold that separates currency metabolites from non-currency metabolites is changed, do the cluster remain the same? * Removing links (currency metabolites and anti porters) is the key for the success of the presented method. In fact previous attempts to identify modules in (genome-scale) metabolic networks that did not remove links only succeed in identifying modules for the optimal flux cone, but not for the general, see https://doi.org/10.1007/s00285-013-0731-1 and https://doi.org/10.1038/srep00580 Thus I believe that removing links needs to be more thoroughly justified. In particular the removal of anti-porters appears to me as an adhoc assumption to make the method work. Also I encourage the authors to discuss similarities and differences between the modules identified by the authors to the optimal modules identified in the mentioned articles. * A nice feature of the work by Kelk et al (see above) is that the modules identified by Kelk et al. can nicely explain the combinatorial explosion of elementary flux modes (EFMs) in large metabolic networks, as they originate from the combination of the EFMs in the modules. Can such an argument be translated to the modules identified here as well? * With respect to the feasible reversible flux space identified by the authors, what is its relation to the reversible flux cone constraint by the set of minimal metabolic behaviors (MMB, see https://doi.org/10.1016/j.dam.2008.06.039). I think, they are the same. If so, what is the novelty in the proposed concept? * I do not understand if the authors assume a particular medium or if they take the models with all exchange reactions unconstrained. In the later case I again wonder if similar modules arise as for a specific medium I would expect that many reversible reactions turn irreversible resulting in a much smaller topological flexibility within a fixed environment. This also touches on the validation with the expression data of 54 different human tissues. Why do the authors not use tissue specific genome-scale reconstructions for this analysis? * The authors argue that their method identifies thermodynamically isolated modules and that reversible reactions are highly independent of each other. I like to disagree that these modules are isolated and that reversible reactions are highly independent of each as thermodynamic coupling is still possible via currency metabolites. For instance, the reversible Glutamate dehydrogenase in E. Coli turns into a thermodynamic bottleneck if grown on glucose, see https://doi.org/10.1128/jb.176.15.4664-4668.1994 * The question of how many directed topologies exist within a given environment has been theoretically addressed in terms of so called flux topes, see https://doi.org/10.1093/bioinformatics/bty550 I think it would be interesting to compare the authors heuristics with the exhaustive method—at least on a theoretical level. * The authors develop a heuristics to compute directed topologies an suggest that with their algorithm it is likely to find all feasible directed topologies. I think this statement needs to be explicitly shown. Either in a smaller network, like coli_core, http://www.asmscience.org/content/journal/ecosalplus/10.1128/ecosalplus.10.2.1#backarticlefulltext, or ECC2, https://doi.org/10.1038/srep39647, or by restricting the analysis to the optimal flux cone in a large network. Reviewer #3: The work by Martínez et al focuses on the well-known problem (re)interpretation of metabolic networks, finding modules. This work focuses on thermodynamic perspective, reversibility of reactions and topological properties of some (sub) networks. The novelty of the proposed method is questionable, if at all unclear. The idea seems interesting at first yet there are several points to be addressed in this paper, in particular the novelty is questionable and the criteria is rather poorly described and is unclear what is the difference between earlier and proposed concepts. The following points need to be addressed before any decision. P16, L279. The authors quantify the flexibility by (number of?) possible directed topologies. The authors need to elaborate this criterium, as this already has been tackled by the elementary modes concept. How this metric is different than earlier defined concepts? When the currency metabolites are removed from the system, what happens to the reactions that consumes/produces these? The reason to focus on this is as follows: the presence of internal loops or closed cycles can be drawn from the analysis of the stoichiometric matrix (e.g. analysis of null space as already mentioned in the manuscript) which is somewhat “distorted” by the removal of these currency metabolites. Focusing on finding the properties of the sub-network, how the proposed method is different from elementary flux modes (or extreme pathways) problem is not clear, and judging from enumeration approach accompanied by the classical analysis of stoichiometric matrix, it seems that this is study a minor contribution to the field. -------------------- Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: No: Supplemental data appears to be missing and should be included in a revision Reviewer #2: No: * I may have overlooked it, but I did not see a link to the actual scripts and models used in the main text. Reviewer #3: None -------------------- PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No Reviewer #3: No 27 Feb 2022 Submitted filename: Response letter 07_02_22.docx Click here for additional data file. 8 Apr 2022 Dear Dr. Nielsen, Thank you very much for submitting your manuscript "The topology of genome-scale metabolic reconstructions unravels independent modules and high network flexibility" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations. In line with the points raised by the two reviewers, please (i) include the criteria you used in classifying a metabolite as currency metabolite to remove it from the network, and (ii) explain the logic behind using log2 in determining topological degrees of freedom. Please submit a revised version that address those points. Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. When you are ready to resubmit, please upload the following: [1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out [2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file). Important additional instructions are given below your reviewer comments. Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments. Sincerely, Tunahan Cakir Guest Editor PLOS Computational Biology Kiran Patil Deputy Editor PLOS Computational Biology *********************** A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately: [LINK] In line with the points raised by the two reviewers, please (i) include the criteria you used in classifying a metabolite as currency metabolite to remove it from the network, and (ii) explain the logic behind using log2 in determining topological degrees of freedom. Please submit a revised version that address those points. Reviewer's Responses to Questions Comments to the Authors: Please note here if the review is uploaded as an attachment. Reviewer #1: The authors have satisfactorily addressed all of our comments Reviewer #2: Thank you, authors, for your detailed response and the modifications to your manuscript. My concerns have been sufficiently addressed. I still wonder about the definition of cofactors and the impact of this definition on your results. Especially, I regret that you decided not to test the sensitivity of your results on a slightly different definition of cofactors (e.g. by increasing our decreasing the number of required links that make a metabolite a cofactor). At the least I'd ask you to make your current statement more precise "defined as highly connected metabolites were removed from the network reactions. The removal of currency metabolites was done manually based on the nature of the metabolites in each reaction. In order to preserve the metabolic functions of the network, currency metabolites were kept in reactions that synthesize or degrade consume them (Supporting Tables S10 and S11)." Please could you include the minimum number of links that makes a metabolite removable, (and any additional guiding principles that where used in your manual curation) For instance, in case there are 2 alternative synthesis pathways, did you keep both in or just one? etc. As for your ability on "isolated thermodynamic modules" I remain skeptical. Particularly I disagree with your response that "There is no thermodynamic coupling between the two assimilation systems" with respect to GS/GOGAT. Previously, we found that there is a coupling transmitted via cofactors, see https://doi.org/10.1038/srep08930 I'd love to discuss these aspects with you in future. However, this disagreement does not hinder publication. Reviewer #3: The revised work by Martínez et al focuses on determination of directed topologies, finding modules from thermodynamics perspective. There are several major changes in the article and the new version is clearly improved version. My earlier comments were centered on three points: - metrics in enumerating different modules. - difference with earlier work - fate of currency metabolites These points (fate of currency metabolites to a much lesser extent) have been addressed by the authors. Finally, the use of log2 in determining the degree of freedom needs to be motivated. ********** Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ********** PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: Yes: Jürgen Zanghellini Reviewer #3: No Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5. Reproducibility: To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols References: Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice. 10 May 2022 Submitted filename: Response letter 4.5.22.docx Click here for additional data file. 14 May 2022 Dear Dr. Nielsen, We are pleased to inform you that your manuscript 'The topology of genome-scale metabolic reconstructions unravels independent modules and high network flexibility' has been provisionally accepted for publication in PLOS Computational Biology. Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated. IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript. Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS. Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. Best regards, Tunahan Cakir Guest Editor PLOS Computational Biology Kiran Patil Deputy Editor PLOS Computational Biology *********************************************************** 9 Jun 2022 PCOMPBIOL-D-21-01428R2 The topology of genome-scale metabolic reconstructions unravels independent modules and high network flexibility Dear Dr Nielsen, I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course. The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers. Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work! With kind regards, Zsofia Freund PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

55 in total

1. Energy balance for analysis of complex metabolic networks.

Authors: Daniel A Beard; Shou-dan Liang; Hong Qian
Journal: Biophys J Date: 2002-07 Impact factor: 4.033

2. The connectivity structure, giant strong component and centrality of metabolic networks.

Authors: Hong-Wu Ma; An-Ping Zeng
Journal: Bioinformatics Date: 2003-07-22 Impact factor: 6.937

3. Flux modules in metabolic networks.

Authors: Arne C Müller; Alexander Bockmayr
Journal: J Math Biol Date: 2013-10-19 Impact factor: 2.259

4. A note on the complexity of finding and enumerating elementary modes.

Authors: Vicente Acuña; Alberto Marchetti-Spaccamela; Marie-France Sagot; Leen Stougie
Journal: Biosystems Date: 2009-12-03 Impact factor: 1.973

5. Accelerating flux balance calculations in genome-scale metabolic models by localizing the application of loopless constraints.

Authors: Siu H J Chan; Lin Wang; Satyakam Dash; Costas D Maranas
Journal: Bioinformatics Date: 2018-12-15 Impact factor: 6.937