Mikkel Herholdt Jensen1, Eliza J Morris1, Hai Tran2, Michael A Nash3,4, Cheemeng Tan5. 1. Department of Physics and Astronomy, California State University, Sacramento, California, United States of America. 2. Department of Chemistry, California State University, Sacramento, California, United States of America. 3. Department of Chemistry, University of Basel, Basel, Switzerland. 4. Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland. 5. Department of Biomedical Engineering, University of California, Davis, California, United States of America.
Abstract
Top-down proteomics has enabled the elucidation of heterogeneous protein complexes with different cofactors, post-translational modifications, and protein membership. This heterogeneity is believed to play a previously unknown role in cellular processes. The different molecular forms of a protein complex have come to be called "complex isoform" or "complexoform". Despite the elucidation of the complexoform, it remains unclear how and whether cellular circuits control the distribution of a complexoform. To help address this issue, we first simulate a generic three-protein complexoform to reveal the control of its distribution by the timing of gene transcription, mRNA translation, and protein transport. Overall, we ran 265 computational experiments: each averaged over 1,000 stochastic simulations. Based on the experiments, we show that genes arranged in a single operon, a cascade, or as two operons all give rise to the different protein composition of complexoform because of timing differences in protein-synthesis order. We also show that changes in the kinetics of expression, protein transport, or protein binding dramatically alter the distribution of the complexoform. Furthermore, both stochastic and transient kinetics control the assembly of the complexoform when the expression and assembly occur concurrently. We test our model against the biological cellulosome system. With biologically relevant rates, we find that the genetic circuitry controls the average final complexoform assembly and the variation in the assembly structure. Our results highlight the importance of both the genetic circuit architecture and kinetics in determining the distribution of a complexoform. Our work has a broad impact on our understanding of non-equilibrium processes in both living and synthetic biological systems.
Top-down proteomics has enabled the elucidation of heterogeneous protein complexes with different cofactors, post-translational modifications, and protein membership. This heterogeneity is believed to play a previously unknown role in cellular processes. The different molecular forms of a protein complex have come to be called "complex isoform" or "complexoform". Despite the elucidation of the complexoform, it remains unclear how and whether cellular circuits control the distribution of a complexoform. To help address this issue, we first simulate a generic three-protein complexoform to reveal the control of its distribution by the timing of gene transcription, mRNA translation, and protein transport. Overall, we ran 265 computational experiments: each averaged over 1,000 stochastic simulations. Based on the experiments, we show that genes arranged in a single operon, a cascade, or as two operons all give rise to the different protein composition of complexoform because of timing differences in protein-synthesis order. We also show that changes in the kinetics of expression, protein transport, or protein binding dramatically alter the distribution of the complexoform. Furthermore, both stochastic and transient kinetics control the assembly of the complexoform when the expression and assembly occur concurrently. We test our model against the biological cellulosome system. With biologically relevant rates, we find that the genetic circuitry controls the average final complexoform assembly and the variation in the assembly structure. Our results highlight the importance of both the genetic circuit architecture and kinetics in determining the distribution of a complexoform. Our work has a broad impact on our understanding of non-equilibrium processes in both living and synthetic biological systems.
Proteins are synthesized in specific orders to assemble large protein complexes, such as microtubule, proteasome, ribosomes, and cellulosome. These protein complexes are assembled both inside and outside cells through the coordination of gene expression, protein transport, and binding processes. Prior work has been assuming that protein complexes have a homogeneous composition of protein members. Yet, recent top-down proteomics shows that protein complexes can compose of different cofactors, post-translational modifications, and protein membership [1-3]. The different molecular forms of a protein complex have come to be called the complex isoforms or complexoforms [1]. For example, recent work shows that the yeast homotetrameric FBP1 complex can co-exist with 0 to 4 phosphorylated amino acids [4]. Bacteria cellulosomes are also found to exist in heterogeneous compositions [5-11]. Furthermore, a recent computational study [12] has investigated the formation of protein complexes using existing data on protein-protein interaction networks. This prior work shows that the composition of a protein complex can drift over time even when the simulation starts from the same initial condition. The work suggests that other cellular mechanisms must exist to prevent the compositional drift of some protein complexes.While the elucidation of complexoforms is now possible, finding the mechanisms that control the distribution of complexoforms is challenging. There are two competing paradigms for understanding the formation of complexoforms. A classical but unproven paradigm assumes that temporal ordering of protein synthesis is essential for accurate assembly of protein complexes. Yet, most biochemical studies mix proteins in a test tube to study protein complex assembly, neglecting the underlying cellular networks that control synthesis and transport of the proteins. This approach suggests a second, contradicting paradigm: the underlying cellular networks are not essential for the assembly of quaternary protein complexes.One way to resolve the contradiction is to examine genome sequences for the spatial arrangement of genes in the genome: if genes are arranged in specific patterns that correlate with their synthesis order or the assembly sequence of protein complexes, then the underlying gene networks may be essential for the protein assembly. Indeed, a study of the human genome shows that housekeeping genes expressed in most tissues show strong clustering [13]. The data suggest that the ordering of protein synthesis may be conserved for critical protein complexes [14]. For non-critical protein complexes, either each cell-type evolves its protein synthesis order to achieve unique complexoforms, or the protein synthesis order is not an important factor.A central question currently challenging the field is, therefore, what underlying rules govern protein-synthesis or assembly order, and the limit of the rules in controlling the distribution of a complexoform. Here, we use stochastic computational simulations to model protein expression and assembly from the bottom-up. We first simulate a three-protein model system to investigate how complexoform composition is affected by the timing of gene transcription, mRNA translation, and protein transport, as well as the gene circuit architecture. Due to the small number of chemical agents and proteins often involved in protein assembly processes, ordinary equilibrium statistical mechanics approaches are not expected to adequately describe the eventual protein assembly outcome. We show that genetic circuits, physical transport, and binding kinetic rates all modulate the distribution of complexoform. Comparison of the stochastic simulations to deterministic solutions reveal two distinct kinetic regimes: a slow equilibrium regime in which the average structure is well predicted by equilibrium statistical mechanics, and a fast nonequilibrium regime in which small-number statistics leads to structures far from equilibrium predictions. This work points to new and previously unappreciated mechanisms of modulating the distribution of a complexoform. Our work also highlights the importance of both gene circuit architecture and nonequilibrium stochasticity in regulating the formation of complexoforms. Our work provides a possible resolution of the gene-circuit vs. protein-only paradigms: gene-circuit-based regulation of complexoform distribution may be important only under the select kinetic regime and cellular context.
Results
The type of genetic circuit modulates complexoform assembly
Protein expression and assembly typically involve a relatively small number of each type of molecule (<1000 per type of protein [15]). In such cases, equilibrium statistical mechanics may not accurately describe the process of protein expression and assembly, since the ensemble of reactants is insufficiently small. Thus, the process becomes inherently stochastic in nature. Here, we use a stochastic model to examine the heterogeneity in protein expression and assembly.Our model system consists of a genetic circuit expressing two types of proteins, denoted X and Y. The proteins are exported to a sub-cellular location and then bind to scaffold proteins, each consisting of two docking sites. This model system allows us to investigate the role of timing and genetic circuit architecture in modulating the protein assembly, irrespective of the exact number of types of proteins and binding sites. In this model, the binding between the proteins creates a protein complex (Fig 1). The proteins compete for 10 scaffold proteins. The final complexoforms can thus consist of XX, XY, or YY. When a small number of molecules are involved in each process, the exact ratio of each is stochastically determined. To establish the average distribution of complexoform, simulations are run 1,000 times for each genetic circuit and each set of parameters. This number of simulations is found to be sufficient to reach a reproducible average to within an uncertainty of 1 percentage point of XX, XY, and YY.
Fig 1
Illustration of the complexoform assembly model system.
Simulated processes and rates are indicated by arrows and italicized text. In the model, genetic circuitry coding for either protein X or protein Y is initiated by a promoter and transcribed to mRNA. Translational machinery then produces proteins X and Y, which are exported and compete for a limited number of scaffold proteins, each with two docking sites. The simulation also incorporates a rate of loss of mRNA and external protein. Generic simulations are carried out with 10 external scaffold proteins (i.e., 20 docking sites).
Illustration of the complexoform assembly model system.
Simulated processes and rates are indicated by arrows and italicized text. In the model, genetic circuitry coding for either protein X or protein Y is initiated by a promoter and transcribed to mRNA. Translational machinery then produces proteins X and Y, which are exported and compete for a limited number of scaffold proteins, each with two docking sites. The simulation also incorporates a rate of loss of mRNA and external protein. Generic simulations are carried out with 10 external scaffold proteins (i.e., 20 docking sites).We set up four common types of genetic circuits expressing two proteins, X and Y. The series uncoupled circuit expresses the two proteins from the same operon, but the translation occurs at two separate ribosome binding sites. The series coupled circuit expresses the two proteins from the same operon and the translation occurs through a single ribosome binding site. The cascade circuit expresses the two proteins from two separate promoters linked in a cascade. Finally, the parallel circuit expresses the two proteins independently from two identical promoters.We first investigate the role of the specific genetic circuit architecture in determining the protein assembly by assuming that the rate constants are identical. This assumption helps to isolate the effect of gene circuit architecture on the formation of complexoform. It will be relaxed in subsequent simulations. Each genetic circuit modulates the distribution of the complexoforms differently (Fig 2). For the cascade, series uncoupled, and series coupled circuits, the final complexoforms are predominantly XX. This trend is most evident for the series coupled circuit (84% XX, 15% XY, 1% YY) and the cascade circuit (86% XX, 10% XY, 4% YY). The results arise because these circuits produce the mRNA for X or protein X first. For the series uncoupled circuit, protein X is less prevalent in the final complexoform, but still exceeds protein Y (45% XX, 38% XY, 17% YY). The parallel circuit stochastically produces similar amounts of X and Y in the final complexoform (45% XX, 12% XY, 43% YY). These results indicate that even with identical rate constants and initial concentrations of all the reactants, the genetic circuit architecture can significantly modulate the final distribution of the complexoform.
Four different genetic circuits expressing proteins X and Y (parallel, cascade, series uncoupled, and series coupled) are simulated with identical first-order rates of transcription (1 t-1), translation (0.1 [C]-1t-1), export (1 t-1), and binding (1 [C]-1t-1) to 10 scaffold proteins each with 2 docking sites. The pie charts indicate the average outcomes from 1,000 stochastic simulations, showing the percentage of XX (black), YY (white), and XY/YX complexoforms (grey). The simulation results demonstrate that the genetic circuit architecture controls the final protein assembly structure.
Four different genetic circuits expressing proteins X and Y (parallel, cascade, series uncoupled, and series coupled) are simulated with identical first-order rates of transcription (1 t-1), translation (0.1 [C]-1t-1), export (1 t-1), and binding (1 [C]-1t-1) to 10 scaffold proteins each with 2 docking sites. The pie charts indicate the average outcomes from 1,000 stochastic simulations, showing the percentage of XX (black), YY (white), and XY/YX complexoforms (grey). The simulation results demonstrate that the genetic circuit architecture controls the final protein assembly structure.
Reaction rates of both protein expression and physical processes modulate complexoform assembly
Next, we determine the impact of the reaction constants on protein assembly. We simulate each genetic circuit while individually varying the translation rate, the export rate, and the binding rate. For comparison, we also solve each condition deterministically. Fig 3 shows the mean distribution of the complexoform from stochastic simulation (circles) and deterministic solutions (lines). In all cases, the kinetic rates modulate the final distribution of the complexoform. Slow translation, export, or binding all result in a distribution of complexoform of roughly 25% XX, 50% XY, and 25% YY. Our results show that varying the kinetic rates in the stochastic model alters the distribution of the complexoform, demonstrating that the genetic and physical rate constants, as well as the exact genetic circuit architecture, can modulate the eventual protein assembly. We note that the rate of mRNA degradation also alters the distribution of the complexoform, but that the effect is captured by altering the rate of translation (i.e., varying the ratio of the translation rate to the mRNA degradation rate), as noted in S3 Appendix. In contrast, the promoter binding rate, while changing the deterministic solutions, does not affect the stochastically predicted distribution of the complexoform (see S4 Appendix).
Fig 3
Computational simulations of complexoform assembly using a generic model system.
Stochastic solutions (XX: black circles; YY: white circles; XY: grey circles) and deterministic solutions (XX: solid black line; YY: dashed line; XY: solid grey line) for parallel (first column), cascade (second column), series uncoupled (third column), and series coupled (fourth column). For the parallel circuit, the deterministic solutions for XX and YY overlap perfectly, so only one line is visible on the graph. For each circuit, variations in translation rate, protein export rate, and binding rate reveal an “equilibrium” regime, in which the stochastic and deterministic solutions are in relatively good agreement, and a “non-equilibrium” regime, in which the stochastic and deterministic solutions disagree. More stringent genetic circuits (e.g., the series coupled circuit) exhibits a lower discrepancy between the stochastic outcome and the deterministic solution.
Computational simulations of complexoform assembly using a generic model system.
Stochastic solutions (XX: black circles; YY: white circles; XY: grey circles) and deterministic solutions (XX: solid black line; YY: dashed line; XY: solid grey line) for parallel (first column), cascade (second column), series uncoupled (third column), and series coupled (fourth column). For the parallel circuit, the deterministic solutions for XX and YY overlap perfectly, so only one line is visible on the graph. For each circuit, variations in translation rate, protein export rate, and binding rate reveal an “equilibrium” regime, in which the stochastic and deterministic solutions are in relatively good agreement, and a “non-equilibrium” regime, in which the stochastic and deterministic solutions disagree. More stringent genetic circuits (e.g., the series coupled circuit) exhibits a lower discrepancy between the stochastic outcome and the deterministic solution.As any one of the rates is increased, the stochastic distribution of the complexoform begins to deviate from the equilibrium outcome of equal amounts of X and Y in the assemblies. For the parallel circuit, any single stochastic simulation produces either predominantly XX or YY assemblies, each with a 50/50 chance, and has a very low chance of producing any heterogeneous XY complexoforms. Since the structure of the cascade, series uncoupled, and series coupled genetic circuits all favor protein X expression before protein Y, higher kinetic rates tend to produce XX complexoform in these circuits. In contrast, if a sufficiently large number of both types of proteins reach the scaffold protein before the docking sites are occupied, any binding site has an equal chance of binding either protein X or Y, and the four different complexoforms (X+X, Y+Y, X+Y, and Y+X) have an equal chance of occurring. A homogeneous XX or YY should, therefore, each occur 25% of the time, while a heterogeneous XY or YX assembly should account for 50% of the final scaffold proteins. This 25/50/25 distribution in the final protein complex is seen in almost all cases when the rates are sufficiently slow (Fig 3). For example, a sufficiently slow rate of protein export results in the combinatorically predicted the 25/50/25 distribution for all four types of genetic circuits (Fig 3, second row). In this case, the protein complex is independent of the underlying genetic circuit architecture: even though the cascade, series uncoupled, and series coupled circuits all synthesize protein X before protein Y, a bottleneck in a subsequent physical process ensures that the amounts of X and Y have time to reach similar levels before accessing the docking sites.
Kinetic rates drive either equilibrium or non-equilibrium assembly
Our simulation results suggest that the protein assembly process can be divided into two fundamental limiting time scales: One slow regime, in which the slow kinetic rates allow for equal amounts of protein X and Y to access the docking sites, and a fast regime, in which one protein undergoes rapid expression, transport, and binding, thus dominating the complexoform. If the assembly process is sufficiently slowed down by one or more rate-limiting steps, the protein assembly process can be treated as a problem of equilibrium statistical mechanics; however, this is not the case if the protein expression and assembly process are sufficiently rapid.To better quantify these two limiting time scales, which we term the “equilibrium” and “non-equilibrium” regimes, we numerically solve the set of coupled differential equations deterministically and compare our solutions to the stochastic simulation results. In the deterministic approach, the assembly is not limited by a small number of reactants in the system, and the deterministic numerical solutions, therefore, represent the complexoform in the limit of large-number statistical mechanics. If the stochastically driven protein assembly reaches an equilibrium state, the stochastic simulations and numerical solutions should agree. In contrast, a discrepancy between the stochastic simulations and deterministic numerical solutions would suggest that the protein assembly is not well described by predictions from large ensemble statistical mechanics, but instead is governed by non-equilibrium, small-number statistics.The predictions from the deterministic solutions are shown together with the stochastic results (Fig 3). The discrepancy between the stochastic simulation average and the deterministic solution is calculated as the sum of the differences for each of the three types of complexoform assemblies (XX, YY, or XY/YX) (Fig 4) together with the variation in the assembly structure from the stochastic simulations. The stochastic simulations agree with the deterministic solutions in all cases and for all genetic circuits when any one of the kinetic rates is sufficiently slow, corresponding to the equilibrium regime. The equilibrium regime does not imply that all complexoform assemblies would be identical. Indeed, in the equilibrium regime, the complexoform is still stochastically determined and varies from simulation to simulation, as is evident from the non-zero stochastic variation at slow rates (Fig 4, dashed lines). However, in the equilibrium regime, the average stochastic outcome is well predicted by treating the problem as a large-ensemble deterministic system. In contrast, when any one of the rates is increased, the stochastic average deviates from the deterministic solution, and the deterministic equilibrium solution no longer accurately predicts the average complexoform from the stochastic simulation, corresponding to the non-equilibrium regime. The transition from the equilibrium to the non-equilibrium regime is seen as a rise in the discrepancy between the stochastic simulation average and the deterministic solution (Fig 4, solid lines). In the non-equilibrium regime, the protein assembly arrests into a non-equilibrium structure as vacant docking sites are depleted before reaching the equilibrium structure predicted by the deterministic solution. The system transitions between the equilibrium and non-equilibrium regimes when either the genetic expression rates or the physical rates of protein export and binding are altered, and generally coincides with a peak in the stochastic variation in the protein assembly structure. A notable exception is observed for the simulations of the series coupled genetic circuit. Since this circuit imposes the most stringent structure on both the transcription and translation of the proteins, the average stochastic structure agrees well with the deterministic solution, even as the protein export or binding rate is increased.
Fig 4
The discrepancy between the deterministic and stochastic solutions, and stochastic variation in protein assembly structure for four stochastically simulated genetic circuits.
The discrepancy between the stochastic simulations and deterministic solutions (solid line) tends to be greater when rates are high and assembly proceeds quickly, suggesting that the protein assembly in this limit is not accurately represented by the corresponding deterministic solution. However, when assembly is slow, the predicted assembly matches the equilibrium prediction from mixing large amounts of the constituents X and Y, and all circuits yield similar protein assembly products. Circuits with a more controlled sequence of expression (e.g., the series coupled circuit) exhibit an overall better agreement with the deterministic solution. The stochastic variation in assembly (dashed line) also varies between the different circuits, suggesting that the type of genetic circuit, as well as the kinetics of assembly, both play into determining the variability of the assembled structure.
The discrepancy between the deterministic and stochastic solutions, and stochastic variation in protein assembly structure for four stochastically simulated genetic circuits.
The discrepancy between the stochastic simulations and deterministic solutions (solid line) tends to be greater when rates are high and assembly proceeds quickly, suggesting that the protein assembly in this limit is not accurately represented by the corresponding deterministic solution. However, when assembly is slow, the predicted assembly matches the equilibrium prediction from mixing large amounts of the constituents X and Y, and all circuits yield similar protein assembly products. Circuits with a more controlled sequence of expression (e.g., the series coupled circuit) exhibit an overall better agreement with the deterministic solution. The stochastic variation in assembly (dashed line) also varies between the different circuits, suggesting that the type of genetic circuit, as well as the kinetics of assembly, both play into determining the variability of the assembled structure.Our results also show that altering any of the rates of the system, in turn, changes the average total time to fully occupy the available docking sites, which we term time to assembly (Tasb). The average fraction of XX, XY, and YY complexoform in the stochastic simulation is shown as a function of the average Tasb in Fig 5. Contrary to our expectation, the fraction of each complexoform falls onto a single curve, regardless of which of the rates is being changed. Although creating an earlier bottleneck by slowing down the earlier processes, such as translation, results in the longest Tasb, the total Tasb is highly correlated with the complexoform composition, and the average assembly outcome from the stochastic simulation can be correlated to a single parameter, the Tasb. Here again, however, the series coupled genetic circuit is an exception; when the translation rate in this genetic circuit is lowered, the sequential architecture of the circuit results predominantly in XX complexoform. This result again suggests that the gene circuit architecture can exert a high level of control over the protein assembly structure, even when the assembly is slow, and a large number of proteins are involved in stochastically forming the structure.
Fig 5
The genetic circuits generate similar assembly times, regardless of which rate is varied.
Shown is the stochastically simulated average fraction of XX (black), YY (white), and XY complexoform (grey) in the protein assemblies when varying translation rate (circles), protein export rate (squares), and protein binding rates (diamonds). Each circuit exhibits a transition from the fast non-equilibrium regime to the slow equilibrium regime, although the exact transition time is dependent on the type of genetic circuit. A notable exception is the series coupled circuit, indicating that the highly sequential architecture of this circuit modulates the protein assembly structure in both the fast and slow regimes.
The genetic circuits generate similar assembly times, regardless of which rate is varied.
Shown is the stochastically simulated average fraction of XX (black), YY (white), and XY complexoform (grey) in the protein assemblies when varying translation rate (circles), protein export rate (squares), and protein binding rates (diamonds). Each circuit exhibits a transition from the fast non-equilibrium regime to the slow equilibrium regime, although the exact transition time is dependent on the type of genetic circuit. A notable exception is the series coupled circuit, indicating that the highly sequential architecture of this circuit modulates the protein assembly structure in both the fast and slow regimes.
Comparisons to other biological model systems
To compare our modeling results to typical rates and concentrations in biological systems, and to determine whether protein assembly in biological systems generally fall in the equilibrium or non-equilibrium regime, we run our computational models with typical biochemical and physical rates of protein expression, export, and binding for bacterial cellulosome. This system is modeled as a scaffold protein with 10 binding sites to which proteins X and Y can bind. Rates representative of the bacterial cellulosome complexoform, as well as relevant references, are summarized in Table 1. Since it is difficult to estimate the exact external per-molecule binding rate and loss rate of individual proteins, we simulate cellulosome assembly with a range of possible protein binding rates while keeping the loss rate fixed, thus varying the ratio of binding to loss rate for the external protein. Each condition is simulated 1,000 times to generate a histogram of the outcomes, as quantified by the number of binding sites on the scaffold protein occupied by protein X (the rest being occupied by protein Y; see Fig 6). From these simulations, we quantify both the variation in stochastic assembly, as well as the discrepancy between stochastic simulations and deterministic solutions. The simulation results for each of the four genetic circuits (parallel, cascade, series uncoupled, and series coupled) are summarized in Fig 7, with each of the translation rate, protein export rate, and protein binding rate being varied about the estimated physiological value from literature and previous experimentally reported values.
Table 1
Rates and concentrations used in simulating cellulosome assembly.
Value
References
Rates
kpro (RNAP promoter binding)
5.6∙107 M-1s-1 = 0.09 s-1
[16]
kpro- (RNAP promoter unbinding)
0.20 s-1
kgene (RNAP synthesis rate constant)
0.36 s-1
[17]
km (transcription)
0.03 s-1
[17–20]
kmrna-loss (mRNA degradation)
0.002 s-1
[21]
kp (translation)
0.03 s-1
[19,20,22]
kout (diffusive protein export)
0.5 s-1
[23–24]
kbind (protein association rate)
1∙106 M-1s-1 = 0.002 s-1
[25–26]
kbind- (protein dissociation rate)
0
kout-loss (protein loss)
0.0002 s-1
[27]
Reactants
P (promoter DNA)
1
RNAP (RNA polymerase)
100
[28]
Rib (ribosome)
5
[29–30]
Docking sites per scaffold protein
10
[5]
Note: Two proteins are expressed and bind to an external scaffold with 10 binding sites. Intracellular molar concentrations are converted to molecules per cell assuming that 1 M is equal to 6∙108 molecules per cell. Note that the protein assembly process is assumed to be irreversible for the purposes of this simulation, whereas actual measurements in the literature does show that the complex does disassemble over time.
Fig 6
Stochastic simulations and deterministic solutions for a cellulosome model system, in which two proteins X and Y expressed in a parallel gene circuit bind to 10 scaffold binding sites to form a complexoform.
The outcome of 1,000 stochastic simulations is indicated as grey bars, while the deterministic solution is indicated by a dashed line. The model parameters are summarized in Table 1, with the translation rate constant in this figure being varied across several orders of magnitude: 3.10-5 s-1 (A), 3.10-2 s-1 (B), and 3.101 s-1 (C). For each simulation condition, the stochastic variation in cellulosome assembly is calculated as the standard deviation of the distribution of stochastic simulations. The discrepancy between the stochastic simulations and deterministic solution is calculated as the average difference between each simulation and the deterministic solution.
Fig 7
Computational simulations of cellulosome assembly consisting of a scaffold with 10 available binding sites for proteins X and Y for four stochastically simulated genetic circuits.
The model parameters are summarized in Table 1. The vertical axis indicates the number of scaffold binding sites occupied by protein X. The average of 1,000 stochastic simulations is indicated by solid circles for each condition, with error bars indicating the standard deviation from the simulation. Deterministic solution results are indicated by dashed lines. The discrepancy between stochastic and deterministic solutions (calculated as the average difference between each simulation and the deterministic solution) is indicated by solid lines. Grey dots indicate the estimate of physiological rates (Table 1), about which each rate is varied.
Stochastic simulations and deterministic solutions for a cellulosome model system, in which two proteins X and Y expressed in a parallel gene circuit bind to 10 scaffold binding sites to form a complexoform.
The outcome of 1,000 stochastic simulations is indicated as grey bars, while the deterministic solution is indicated by a dashed line. The model parameters are summarized in Table 1, with the translation rate constant in this figure being varied across several orders of magnitude: 3.10-5 s-1 (A), 3.10-2 s-1 (B), and 3.101 s-1 (C). For each simulation condition, the stochastic variation in cellulosome assembly is calculated as the standard deviation of the distribution of stochastic simulations. The discrepancy between the stochastic simulations and deterministic solution is calculated as the average difference between each simulation and the deterministic solution.
Computational simulations of cellulosome assembly consisting of a scaffold with 10 available binding sites for proteins X and Y for four stochastically simulated genetic circuits.
The model parameters are summarized in Table 1. The vertical axis indicates the number of scaffold binding sites occupied by protein X. The average of 1,000 stochastic simulations is indicated by solid circles for each condition, with error bars indicating the standard deviation from the simulation. Deterministic solution results are indicated by dashed lines. The discrepancy between stochastic and deterministic solutions (calculated as the average difference between each simulation and the deterministic solution) is indicated by solid lines. Grey dots indicate the estimate of physiological rates (Table 1), about which each rate is varied.Note: Two proteins are expressed and bind to an external scaffold with 10 binding sites. Intracellular molar concentrations are converted to molecules per cell assuming that 1 M is equal to 6∙108 molecules per cell. Note that the protein assembly process is assumed to be irreversible for the purposes of this simulation, whereas actual measurements in the literature does show that the complex does disassemble over time.Our simulations show that, as observed for the generic model system, the type of genetic circuit exerts a great deal of control over the eventual protein assembly. For example, a parallel circuit will result in an average assembly of 50% X and 50% Y (Fig 7, column 1), whereas the cascade circuit and the series circuits are weighted towards X to varying degrees (Fig 7, columns 2, 3, and 4). As was also observed for the generic model, the parallel genetic circuit exhibits the highest degree of variation in the assembly outcome, whereas the more structured genetic circuits have lower variability in the final assembly composition. One major difference between the generic model (with arbitrary rates) and the cellulosome model is that whereas the stochastic simulations of the generic model would often deviate significantly from the deterministic solutions, the simulations with the rates for the cellulosome system exhibits virtually no such deviations between the stochastic average and the deterministic solution. One exception to this is the parallel circuit, where a very fast rate of expression could have resulted in an assembly consisting of either X or Y (see also Fig 6C). Overall, however, the model using biological cellulosome rates predicts good prediction between the deterministic and stochastic simulations, although the stochastic simulations still predict significant variability in the final protein assembly. Interestingly, the simulations also reveal that the cellulosome kinetic rates place the system at the transition between the fast and slow limits seen in the generic model. For example, in the cascade genetic circuit, the biological rates predict an average assembly with about 90% protein X; however, if the binding rate had been 10-fold slower (or the loss rate 10-fold higher), the fraction of protein X in the assembly drops to 77%, and a 1000-fold change predicts a fraction of 56% X, i.e., almost equal amounts of X and Y. Both protein binding and protein loss could depend on the exact extracellular conditions (e.g., the degree of crowding), thus impacting the assembly purely by altering the kinetic rates involved. Thus, our data demonstrate that both the kinetic rates and the genetic circuit can moderate the eventual assembly outcome in the case of the bacterial cellulosome.
Discussion
In summary, our work shows that the underlying genetic circuit architecture does modulate the protein assembly. However, it is the interplay between the circuit architecture and the genetic and physical rate kinetics that together determine the protein assembly structure. We demonstrate two distinct behaviors of kinetic assembly: a slow equilibrium regime, in which the average assembly is well described by equilibrium statistical mechanics, and a fast non-equilibrium regime, in which the average assembly arrests before the system reaches equilibrium. Regardless of the equilibrium or non-equilibrium regime, the cumulative protein concentrations (i.e., the total amount of protein available to bind over time) determine the eventual complexoform distribution (S5 Appendix). Furthermore, we demonstrate that the two regimes can be regulated by tuning any of the kinetic rates involved in the protein expression and assembly process, whether biochemical or physical. The arresting of the assembly into a non-equilibrium structure has previously been observed on much larger length scales, such as in the dynamic arrest occurring in macroscopic protein assemblies such as biopolymer networks, in which the kinetics of assembly can highly affect the non-equilibrium assembly structure [31-34]. Our work shows that similar dynamic arrest can occur on a much smaller scale as well, in the assembly of protein complexes involving just a handful of individual proteins. The results highlight new mechanisms, in addition to restrictive or preferential binding, through which systems can control stochastic processes such as protein assembly. The results also underscore the importance of both kinetics and stochastic non-equilibrium behavior in addition to the genetic circuit architecture as modulators of protein assembly processes.There are several future considerations for our results. First, our work did not consider the details of how genetic context affects the transcription or translation rate of genes. For instance, a prior work [35] shows that the arrangement of two genes can affect the level, dynamic range, and sensitivity of their expression. The spatial arrangement of the two genes will affect the expression of our parallel genetic module. Specifically, if the parallel genetic module is inducible by certain chemicals, its sensitivity and dynamic range of induction would also be affected by the arrangement of the two genes, as suggested by the prior work. In our work, we have not considered the induction dynamics of the genetic module, which may serve as another regulatory force of complexoforms.Second, one challenge in our work is to obtain the analytical solution of the stochastic equations. For instance, previous work by Laurenzi, Renyi, Mcquarrie, and Ishida [36-39] have solved the analytical solutions for A+B↔C. To obtain the solution, the authors exploit the conservation of mass between A, B, and C. Finding similar analytical solutions to our model involves two challenges. First, there are multiple bimolecular reactions in our model. Second, it is challenging to incorporate the conservation of mass for intermediate reactions of our model. There is, however, a qualitative agreement between our simulations and prior analytical solution. For instance, Laurenzi shows that stochastic solution deviates from a deterministic solution when the number of molecules is low. We observe this trend generally for all our results because the molecular number in our system is low. Furthermore, Laurenzi shows that the variance of molecular concentrations peaks at intermediate reaction time when the reaction becomes irreversible. Likewise, we observe that when the reaction rate becomes slower, our system takes a long time to approach the steady-state solution. The longer transient kinetics increase the duration when the system exhibits a high variance of molecular concentration. This enhanced variance likely leads to the significant deviation between the stochastic and deterministic solutions when reaction rate constants are high.Third, the arrangement of two genes in the series coupled module may affect the expression level of the genes. Typically, the second gene is expressed at a lower level than the first gene. A prior work [40] has quantified the effect of GC content, size, and folding energy of the first gene on the expression of the second gene. Incorporating these details will improve the prediction power of our model on the distribution of complexoforms.Our study points to the needs of quantifying heterogeneity of complexoform inside cells and studying the regulatory mechanisms of the heterogeneity. New mass spectrometry technology and modeling tools may be used to reveal new insights into complexoforms that were not possible before. For instance, the proteasomes of Rhodococcus consisted of both alpha (α1, α2) and beta (β1, β2) subunits [41]. The subunits can form four variants of complexoform (α1β1, α1β2, α1β2, α2β1) with similar proteasome functions. Does natural Rhodococcus control the heterogeneity of the proteasome at the genetic or protein level, or both? The Trp synthase of E. Coli also consists of two α and two β subunits. The genes are arranged in the same order as the assembly order of the subunits [42]. Does the assembly process of the synthase fall in the equilibrium or non-equilibrium regimes? Answer to the question requires detailed measurements of the kinetic parameters that are not available yet. Finally, the Pili of E. Coli consists of four different protein subunits, and the gene order is arranged opposite of the assembly order [43]. Does this contradiction of gene and assembly order indicate that the assembly is solely determined by protein-protein binding and that there is less heterogeneity in the final protein complex? To this end, when the kinetic parameters of these processes become available, our modeling framework may be used to predict the heterogeneity of the complexoform and the equilibrium vs. non-equilibrium control of the systems. Furthermore, a meta-analysis of genome sequences and protein-protein interaction maps may be used to reveal the relative abundance of each circuit architectures for assembling known complexoforms.
Methods
Modeling protein expression, export, and binding
Our computational model consists of a set of coupled biochemical and physical processes, starting from transcription, and ending with the binding of protein products to 10 scaffold proteins, each with two docking sites to form a final complexoform. In the first process, an RNA polymerase binds to a promoter to synthesize mRNA. This transcription step is followed by the translation of two different protein products, denoted X and Y. The proteins are then transported for binding to a scaffold protein to form a three-protein complexoform. The model, which is summarized schematically in Fig 1, also incorporates the degradation of mRNA and diffusive loss of proteins after transport.The transcription and translation of the two proteins occur through one of four genetic circuits: (1) a parallel expression circuit, in which transcription and translation of the two proteins happen concurrently and independently; (2) a cascade expression circuit, in which the transcription and translation of protein X create the needed transcription activator for the subsequent transcription of protein Y; (3) a series uncoupled expression circuit, in which the transcription of the two genes occurs sequentially but with uncoupled translation; and (4) a series coupled expression circuit, in which both the transcription and translation are sequential. Within each of these four different genetic circuit architectures, the rates of genetic expression and physical export and protein binding processes are varied in the model. In all cases, the evolution of the system is arrested as the docking sites are exhausted, and no further changes to the protein assembly structure can occur once they are all occupied.The set of biochemical and physical processes defining each genetic circuit is listed in S1 Appendix. All processes are modeled as linear reactions, in which the reaction is linearly dependent on a rate constant and on the concentration of the reactants. Thus, for a set of reactions of the form
the rate of change of any reactant is described by a first-order differential equation:
Here, [∙] indicates the concentration of each reactant. Any additional reactions involving the reactant Q is incorporated as additional, linear terms, and the rate of change of each reactant is modeled by an additional differential equation. In this model, any arbitrary number of reactions involving any number of reactants can thus be written as a set of coupled, first-order differential equations. For large ensembles (i.e., large concentrations of all reactants involved), these coupled differential equations are deterministic, and the outcome for each reactant can be predicted exactly by solving the equations numerically. However, for a sufficiently small number of molecules, the outcome becomes stochastic and must be approached using stochastic algorithms. We, therefore, solve the coupled differential equations both deterministically to obtain exact numerical solutions, and stochastically using a stochastic algorithm.
Computational model implementation
All computer simulations are done using Matlab (Mathworks, Natick MA). The built-in ode45 differential equation solver is used for all deterministic numerical solutions. The initial conditions of the deterministic differential equation solutions are identical to those of the stochastic model, except that the amount of each reactant is treated as a continuous, rather than a discrete variable, and are allowed non-integer values.Stochastic simulations use a custom-written Gillespie algorithm [44] in Matlab, in which each reaction outlined in S1 Appendix is treated stochastically. In this algorithm, each possible reaction is numerically simulated using a Monte Carlo technique by generating a tentative reaction time for each reaction based on the concentrations and reaction rates at that instant in time, and in each case, executing the reaction with the shortest tentative reaction time. This process is repeated until all binding sites are occupied, after which no further evolution of the protein assembly can occur. Unlike in the deterministic solution, the stochastic simulation only allows for discrete changes in the amount of each reactant, and only integer values are allowed for each reactant. To account for the stochastic variation between simulations, each set of conditions is simulated 1,000 times and averaged. For each set of initial conditions, we calculate the average assembly structure for the stochastic simulation by counting the docking site occupancy (i.e., the final average number of scaffold proteins with XX, XY, or YY count for each run of the simulation). The standard deviation is calculated for each of the three types of assemblies. We quantify the stochastic variation in the simulation as the sum of these three standard deviations.To compare the deterministic and stochastic solutions, we calculate the difference between the amounts of each type of protein complex predicted from the deterministic solution with the average amount observed from the stochastic simulation. The sum of these three differences is used as a measure of the overall discrepancy between the deterministic and stochastic computational solutions.For each genetic circuit, we vary the translation rate, the protein export rate, and the protein binding rate to investigate the impact of kinetics on the modulation of the final protein assembly. The initial conditions and rates are all listed in S2 Appendix. In order to investigate the impact of the relative, rather than the absolute, kinetic rates on the final assembly structure in the model system, simulations are done using arbitrary but dimensional units: time is indicated in units of t, concentrations are indicated in units of [C], and rates are indicated in units of [C]-1t-1 or t-1.
Biological rates of the cellulosome complex
In order to compare our simulation results in arbitrary units to biological systems, we run both deterministic and stochastic simulations using typical concentrations and biochemical and physical rates of protein expression, export, and binding for the bacterial cellulose-degrading multi-enzyme complexes, known as cellulosomes. We modeled the assembly of cellulosomes, which comprise scaffold proteins containing repeated copies of Cohesin (Coh) domains and catalytic enzymes containing Dockerin (Doc) domains. Each Coh within the scaffold serves as a binding site for the corresponding Doc domain located at the C-terminus of each enzyme. The enzymes are secreted from the bacteria and assemble onto the extracellular scaffold proteins through Coh-Doc binding. The Coh-Doc interactions, albeit high-affinity, are promiscuous, and dozens of different enzymes containing homologous and closely related terminal Doc sequences are able to bind to any of the Coh domains within the scaffold. Although the exact cellulosome complex can contain many types of catalytic modules and scaffoldin binding sites [5], we adopt a model in which the external scaffold has 10 available sites to which two different proteins can directly bind, as this model captures the competing role of both timing and gene circuit architecture in modulating the complexoform. While some rates relevant to complexoform enzyme expression and assembly have been previously measured and reported in the literature, such as the promoter binding and unbinding rates [16] and the rates of the protein-protein binding interaction [25-26], we infer or estimate other rates to an order of magnitude, as described below.Previous reports have shown that mRNA degradation occurs with half-lives ranging from a few minutes in E. Coli [21] and ~20 minutes in S. Cerevisiaeyeast [45] to several hours in H. Sapiens cells [46]. Therefore, we assume a typical bacterial mRNA half-life of 5 minutes, corresponding to a decay rate of 0.002 s−1.The rate of protein export is assumed for on the size of 50–100 kDa [23], suggesting an effective diffusion coefficient of about 1 μm2s-1 through a cytoplasmic environment [24]. We, therefore, estimate that diffusive transport in a bacterium with a size of ~2 μm would take on the order of 2 seconds, corresponding to a rate of export on the order of 0.5 s-1 after completed expression.We assume that RNA polymerase creates RNA at a rate of about 45 nucleotides per second, and that ribosomal translation occurs at a rate of about 15 amino acids per second [17-20,22]. Since 3 base pairs code for one amino acid, the rate of transcription and translation are roughly equal. Assuming a typical protein size of about 50 kDa and an average amino acid size of 110 Da, one full protein would take 30 seconds of transcription and translation, corresponding to a rate of 0.03 s-1 for both processes.In order to consider reaction rates and concentrations in terms of the number of molecules of each reactant inside a bacterium, we convert molar concentrations to molecules per bacteria volume with E. Coli as a reference. A concentration of 1 M corresponds to 6∙108 molecules/bacterium, assuming a bacterial volume of 1 fL. We assume one copy of each gene (typically the case for bacteria with one chromosome), with about 100 RNA polymerases per gene [28] (estimated from ~23 RNA polymerase per lacZ gene). Although there are about 20,000 ribosomes in one E. Coli bacterium [29], these are employed in the expression of roughly 4,000 different genes [30]; we, therefore, assume 5 available ribosomes.It is difficult to estimate the exact external per-molecule binding rate and loss rate of individual proteins. The cohesin-dockerin association rate has been measured to be about 1∙106 M-1s-1 [25-26], and assuming that the outside reaction volume is similar to the volume of a single bacterium, the external binding rate is 0.002 s-1 per molecule. Protein lifetimes are typically on the order of tens of minutes or several hours [27]. Assuming a lifetime of about an hour, this implies a protein loss rate of 0.0002 s-1 per molecule.All the values used for the cellulosome assembly simulations are summarized in Table 1. Since most values are estimated to an order of magnitude, we vary the translation, export, and binding rates across many orders of magnitude to elucidate whether the cellulosome complexoform assembly process occurs in the equilibrium or non-equilibrium regime. In the simulation, we vary the ratio of the protein association rate and diffusive loss rate. Although data shows a slow dissociation of the dockerin/cohesin complex, we assume that the binding is irreversible to allow us to study the structure of the completed assembly right after protein expression, transport, and assembly. We note that with a slowly reversible binding, the complexoform distribution should eventually equilibrate with the external solution concentrations of protein in solution over long time scales.All data generated for and presented in this study are available in full in the S1 Data.(PDF)Click here for additional data file.(PDF)Click here for additional data file.(PDF)Click here for additional data file.(PDF)Click here for additional data file.(PDF)Click here for additional data file.(XLSX)Click here for additional data file.21 Feb 2020Dear Dr. Tan,Thank you very much for submitting your manuscript "Stochastic ordering of complexoform protein assembly by genetic circuits" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.In addition to the reviewer comments below, please also consider my own comment that this work seems to be related to earlier work linked to here -- https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0032032 -- which also used a simulation-based approach to determine distribution of complexoforms and their kinetics. If so, it may be worth citing and discussing.Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.When you are ready to resubmit, please upload the following:[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).Important additional instructions are given below your reviewer comments.Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.Sincerely,James R. FaederAssociate EditorPLOS Computational BiologyWilliam NobleDeputy EditorPLOS Computational Biology***********************A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:[LINK]Reviewer's Responses to QuestionsComments to the Authors:Please note here if the review is uploaded as an attachment.Reviewer #1: This is a nice study elucidating the role of biological circuitry in control and assembly of complexoform proteins. With a few modifications, it should be accepted for publication. Please take the following suggestions into account:• The authors should take into account the effect of different promoter strengths and different ribosome binding site strengths for each separate gene. The authors should analyze the effect of changing the promoter and RBS strengths for an individual gene, and compare this to the effect of changing the genetic circuit architecture. If circuit architecture effects tend to dominate over promoter and RBS binding site effects, this should be explained.• The authors should include some discussion of the relative abundance of each type of circuit architecture found in natural genomes, for assembling known complexoforms. Is there a particular circuit layout which tends to appear more often than the others?• The authors should discuss their work in relation to the following literature:o Yeung et al. Biophysical Constraints Arising from Compositional Context in Synthetic Gene Networks. Cell Systems. 2017.o In particular, the authors should include a discussion of how the orientation of genetic circuits in a parallel configuration affects transcription, and how this might impact their model.• The term “complexoform” is of limited use in the literature (a Google Scholar search returns four relevant results). Accordingly, in order to assure clarity, the authors should introduce the term as “complex isoforms” as is done in Fonslow et al.Reviewer #2: This is an interesting and novel study, where computational simulations are used to analyze a recently appreciated phenomenon -- complexoforms. In particular, the authors thoroughly analyzed how gene circuit architecture and rate constants can affect distribution of complexoforms. They have also thoroughly compared deterministic and stochastic simulations to examine the potential contribution of noise in this dynamics.The study is well designed and the paper is well written and easy to follow. The model formulation and subsequent analysis are fairly standard. But the key innovation of the study is the application context (to a novel problem, as far as I can tell). The work defines a useful blueprint for the dynamic analysis of complexoforms. The modeling framework (in particular) and the results can be useful for guiding future experimental analysis.I am in support of the publication of the work with some minor clarifications and revisions, as detailed below.Specific points:1. In essence, the different circuit architectures dictate the temporal dynamics of X and Y, which in turn bind the the different scaffold sites. As such, it is not entirely surprising that different architectures would modulate the find distribution of complexoforms differently. I wonder if it is possible to deduce some more general rules that govern the final distribution based on the temporal profiles of X and Y (outside the cell).2. Related to the point above, the binding kinetics of X and Y to scaffold site should be important as well. If I understand it correctly, the authors assume the binding to be irreversible (Table 1). I wonder what happens to the distributions if the binding is reversible, which appears to be more plausible.3. Does the number of binding sites matter? At it stands, the choice of the number seems ad hoc.4. There was a lot of focus on comparing deterministic modeling and stochastic modeling. The resulting insights are not well articulated and need to be clarified in a revision. Is there experimental data that the modeling results can be compared to?Reviewer #3: In the manuscript titled " Stochastic ordering of complexoform proteinassembly by genetic circuits", Jensen et al. used combination of stochasticsimulation and deterministic differential equation model to investigate howparameters such as timing of transcription, translation, and transport rateimpact formation preferences of complexoform. They found that in two differentextreme scenarios, complexoform variations are also vastly different. At theend, the authors used a real biological example to verify their framework. Ithink the authors did a good job explaining this relatively new topic and madethe text fairly easy to follow. Discussing a relatively less studied new area,this work used standard simulation methods to shed some new light on possiblemechanisms controlling complexoform preferences, attributing some of thecontrol to gene circuit layout. This finding hopefully would inspire morefuture work in this area to investigate this interesting and important problem.I found the manuscript to be informative, theapproaches used appropriate, and would recommend for its publication given myfollowing concerns/comments to be addressed:1. One major concern of mine is the lack of any analytical results. Numericalsimulations are definitely preferred in many scenarios, including this work.But careful analysis often is also needed. It would complement numericalsimulations well and often offer a fuller picture. I understand analyticalresults are not possible for complex systems, but in this case, there is nofeedback and an analysis of the simplified scenarios should be within reach.2. I am curious why the authors did not consider mRNA degradation as one keyparameters. mRNA degradation has been shown to be a key regulatory mechanismin biology. In addition, recent work (Wu et al, Cell Systems 6, 206-215, 2018)showed that in the series coupled scenarios, the order of the gene X and Ywould affect their stability and hence could cause a change in thecomplexoform formation biases.**********Have all data underlying the figures and results presented in the manuscript been provided?Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology
data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.Reviewer #1: YesReviewer #2: YesReviewer #3: Yes**********PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.If you choose “no”, your identity will remain anonymous but your review may still be made public.Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.Reviewer #1: NoReviewer #2: NoReviewer #3: NoFigure Files:While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at .Data Requirements:Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.Reproducibility:To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see24 Apr 2020Submitted filename: response _ plos comp bio _ 0416.pdfClick here for additional data file.28 May 2020Dear Dr. Tan,We are pleased to inform you that your manuscript 'Stochastic ordering of complexoform protein assembly by genetic circuits' has been provisionally accepted for publication in PLOS Computational Biology.Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.Best regards,James R. FaederAssociate EditorPLOS Computational BiologyWilliam NobleDeputy EditorPLOS Computational Biology***********************************************************Reviewer's Responses to QuestionsComments to the Authors:Please note here if the review is uploaded as an attachment.Reviewer #1: The authors have adequately addressed my points.Reviewer #2: The authors have fully addressed my raised points and I support the publication of the paper.**********Have all data underlying the figures and results presented in the manuscript been provided?Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology
data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.Reviewer #1: YesReviewer #2: Yes**********PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.If you choose “no”, your identity will remain anonymous but your review may still be made public.Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.Reviewer #1: NoReviewer #2: No19 Jun 2020PCOMPBIOL-D-20-00101R1Stochastic ordering of complexoform protein assembly by genetic circuitsDear Dr Tan,I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!With kind regards,Sarah HammondPLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol
Authors: Michiel van de Waterbeemd; Kyle L Fort; Dmitriy Boll; Maria Reinhardt-Szyba; Andrew Routh; Alexander Makarov; Albert J R Heck Journal: Nat Methods Date: 2017-01-23 Impact factor: 28.547
Authors: Jonathan A Bernstein; Arkady B Khodursky; Pei-Hsun Lin; Sue Lin-Chao; Stanley N Cohen Journal: Proc Natl Acad Sci U S A Date: 2002-07-15 Impact factor: 11.205
Authors: Gary M Skinner; Christoph G Baumann; Diana M Quinn; Justin E Molloy; James G Hoggett Journal: J Biol Chem Date: 2003-11-03 Impact factor: 5.157
Authors: Owen S Skinner; Pierre C Havugimana; Nicole A Haverland; Luca Fornelli; Bryan P Early; Joseph B Greer; Ryan T Fellers; Kenneth R Durbin; Luis H F Do Vale; Rafael D Melani; Henrique S Seckler; Micah T Nelp; Mikhail E Belov; Stevan R Horning; Alexander A Makarov; Richard D LeDuc; Vahe Bandarian; Philip D Compton; Neil L Kelleher Journal: Nat Methods Date: 2016-01-18 Impact factor: 28.547