Eric Charles Dykeman1. 1. Department of Mathematics, University of York, York, United Kingdom.
Abstract
Computational modelling of in vivo protein synthesis is highly complicated, as it requires the simulation of ribosomal movement over the entire transcriptome, as well as consideration of the concentration effects from 40+ different types of tRNAs and numerous other protein factors. Here I report on the development of a stochastic model for protein translation that is capable of simulating the dynamical process of in vivo protein synthesis in a prokaryotic cell containing several thousand unique mRNA sequences, with explicit nucleotide information for each, and report on a number of biological predictions which are beyond the scope of existing models. In particular, I show that, when the complex network of concentration dependent interactions between elongation factors, tRNAs, ribosomes, and other factors required for protein synthesis are included in full detail, several biological phenomena, such as the increasing peptide elongation rate with bacterial growth rate, are predicted as emergent properties of the model. The stochastic model presented here demonstrates the importance of considering the translational process at this level of detail, and provides a platform to interrogate various aspects of translation that are difficult to study in more coarse-grained models.
Computational modelling of in vivo protein synthesis is highly complicated, as it requires the simulation of ribosomal movement over the entire transcriptome, as well as consideration of the concentration effects from 40+ different types of tRNAs and numerous other protein factors. Here I report on the development of a stochastic model for protein translation that is capable of simulating the dynamical process of in vivo protein synthesis in a prokaryotic cell containing several thousand unique mRNA sequences, with explicit nucleotide information for each, and report on a number of biological predictions which are beyond the scope of existing models. In particular, I show that, when the complex network of concentration dependent interactions between elongation factors, tRNAs, ribosomes, and other factors required for protein synthesis are included in full detail, several biological phenomena, such as the increasing peptide elongation rate with bacterial growth rate, are predicted as emergent properties of the model. The stochastic model presented here demonstrates the importance of considering the translational process at this level of detail, and provides a platform to interrogate various aspects of translation that are difficult to study in more coarse-grained models.
In cells, ribosomes are responsible for translation of protein from messenger RNA (mRNA), which has been transcribed from the genomic DNA. At any given time, multiple mRNAs are being transcribed by the polymerase machinery in response to cellular stresses and feedback mechanisms. The full set of transcribed mRNAs in the cell represents its transcriptome, and translation by the ribosomal machinery gives rise to its proteome. The dynamics of the transcriptome, its response to cellular stresses, and its influences on protein synthesis, as well as the overall proteome, are important for understanding how disease processes arise at the cellular level. Although there are many processes that can influence the dynamics of the proteome, such as control of mRNA expression at the transcriptional level, it has been recognised that the overall kinetics of translation by the ribosomal machinery can exert additional control on protein levels in the cell. Because translation by ribosomes involves many individual kinetic steps and is influenced by the concentration of translational factors such as elongation and initiation factors, which must be created during ribosomal translation of the mRNAs encoding them, the dynamics of the translational process has complex interdependencies and feedback loops making the accurate modeling of the process challenging.In essence, translation is governed by the physics of molecular interactions and diffusion processes, thus making it amenable to study using standard chemical kinetics and the mathematics of the chemical master equation. Ideally, when constructing a chemical master equation model of translation, one would like to account for as many of the individual biochemical reactions in the translational process as possible that affect the overall efficiency of protein production. Zur et al. [1] recently highlighted a list of seven features which they contend are required to be included in a translational model, at a minimum, to provide a comprehensive picture of translation in vivo. To date, development of kinetic models of the dynamics of ribosomes have mainly focused on either (1) stochastic based methods [2-4], or (2) statistical approaches which, roughly speaking, treat mRNAs as lattices with the ribosomes moving along the mRNA lattice points according to certain hopping probabilities [5-7]. The ribosomes can be modeled as occupying single lattice sites or as extended objects occupying multiple lattice sites [8].Stochastic based methods have considered ribosomes moving along the mRNA in a stochastic manner, with different levels of detail taken into account. For example, Chu and von der Haar [3] have constructed a tool in Java which takes into account the movement of 200k ribosomes on 60k mRNAs with sequence information for each, with positional information on the mRNA encoded for each ribosome. Their model of translation in yeast [2] considers the complex positioning of all the ribosomes on the mRNAs along with relative tRNA abundances and the effects of codon bias and competition between the tRNAs for the A-aite of the ribosome. However, they do not consider concentration effects from elongation factors such as eEF1 eEF2 (Ef-Tu and Ef-Tu/Ef-Ts in bacterial systems), exchange of GDP for GTP on these of elongation factors, nor the concentration and competition effects between charged tRNAs for Ef-Tu and formation of TC, nor pre-mature termination on sense codons.In the case of statistical based methods, these models are referred to as “totally asymmetric simple exclusion processes” or TASEP models. These types of models are well established in the literature and have been used to model aspects of the translational process since the 60’s and 70’s. A comprehensive review of TASEP models (or its cousin the ribosome flow model—RFM [9] which is a mean field approximation of the TASEP) and their current capabilites is not feasible here, but can be found, along with an account of various other computational models of translation, in [1, 10] with a discussion of the benefits and limitations of each. However, while RFM and TASEP models have been useful in considering some of the concentration dependencies on the translational process, such as from the inclusion of tRNAs or Ef-G [11, 12]; or have included some steps of the initiation process [13], or considered TASEP models with a mixture of a few mRNAs [14], it has been recently stated that there are “currently no models that consider all the fundamental mRNA translational aspects at a cellular level” [1, 9].In this work, I have followed what could be termed a “bottom-up” approach to the construction of a translational kinetics model to be used for simulation of translation in a cell, with the goal of fulfilling the seven features noted by Zur et al. [1]. This approach is based on two underlying principles: (1) to encode into the model the current biochemical knowledge on all of the ribosome reactions that occur at every stage (i.e. initiation, elongation, and termination), and (2) to minimially adjust experimentally measured kinetic parameters, when required, such that the results of the model recapitulate features of the translational process that have been either observed directly or deduced from experimental measurements in vivo. A similar approach has been recently attempted by Matsuura et al. [15], where they created an ordinary differential equation (ODE) model, solved in MATLAB, which simulates the synthesis of a tri-peptide from a single mRNA species containing three codons. Their ODE model contains a reported 968 reactions and 242 components, a large number considering that only synthesis of a tri-peptide from a single mRNA type was considered. This highlights the complexity of the number of states and reactions that need to be modelled when scaling up to an entire cell with thousands of different mRNAs and tens of thousands of ribosomes, and why models including all the biochemical reactions that can occur have been difficult to develop.Although a “bottom-up” approach has been attempted by Matsuura et al., there are a number of large differences between their model and the one I report here. First, the Matsuura model focuses on simulating protein synthesis in an in vitro reconstituted protein synthesis system, whereas I focus on modeling the translational process in vivo. Second, their approach only simulates the synthesis on a single mRNA type with a single ribosome occupying the mRNA at any one time. In contrast, I consider protein synthesis on thousands of mRNAs, each with an explicit and potentially unique nucleotide sequence that are allowed to be in a polysome state. Finally, although Matsuura et al. use experimentally measured kinetic rates, they have not, to my knowledge, checked for robustness of the rates against experimental observations of in vitro reconstituted protein translation systems. The last point is particularly important to note since many experimental groups have had conflicting accounts over the years of both the range of kinetic rates for certain reactions (most notably for GTP/GDP binding to Ef-G [16, 17]) and have also proposed different models for the kinetic events and their order of occurrence on the ribosome. Moreover, Indrisiunaite et al. [18] has noted that many of these rates can be sensitive to buffer conditions, making verification of the model by comparing with experimental observations critical. By testing the kinetic rates and reaction models that have been proposed by experimental groups for the various stages of translation, I have been able to confirm which of these result in a quasi-steady state of translation in vivo (please see supporting information for full discussion). Although the model I report here is not definitive and likely requires additional adjustments to the kinetic rates to better fit to experimental observations, particularly premature termination and stop codon read through rates (see supporting information), it does demonstrate that a whole cell model of translation, taking into account all known protein factors and biochemical steps, is computable in a reasonable time. Finally, by taking into account (1) the explicit nucleotide content of the full transcriptome, and (2) all of the translational factors (e.g. Efs, RFs and tRNAs) and how their concentrations influence the translational process, the model reveals how several translational properties emerge naturally from the model due to the complex network of the interacting components that are obscured in more coarse-grained models, or models which neglected, e.g., exchange of GDP for GTP on elongation factors [4].
Results
Computational model of ribosome kinetics in vivo
Fig 1 illustrates the various kinetic steps of ribosome kinetics in vivo that has been implemented in the stochastic model reported here. Highlighting the main features of the stochastic model, it contains ≈ 50 kinetic parameters for the initiation, elongation, and termination stages of the ribosome, along with ≈ 20 kinetic parameters which govern, for example, GTP/GDP binding to GTPases such as Ef-G and Ef-Tu. It is capable of simulating translation on thousands of unique mRNA sequences in a polysome arrangement, whose sequences are explicitly taken into account, and is set-up to predict initiation rates based on mRNA secondary structure and sequence at both cognate and non-cognate start codons (see supporting information). It also depends on the concentrations of: ≈ 10 initiation, elongation, and recycling factors; nucleotide tri and di-phosphates; 20 amino acids; and over 40 tRNAs. Premature termination is accounted for in the model, along with mis-incorporation of non-cognate peptides, and the occurrence of stop codon read-through events. Although a framework has been set up for these advanced features in the code and the kinetic model for initiation and premature termination is based on the current best understanding of the biochemical steps involved, more experimental/theoretical work on kinetic rates would be required to implement these features in full. Moreover, as specific kinetic information for the aminoacylation of tRNAs have only been examined for two of the 20 aminoacyl-tRNA synthases, the complex individual kinetic steps are implemented in the model as a single reaction (dashed box in Fig 1A). It should also be noted that the model currently assumes that the concentrations of amino acids and nucleotide phosphates are constant, that mRNAs are not degraded or produced due to transcription, and that the total number of ribosomes, initiation factors, elongation factors etc. are constant. Thus, the model assumes a quasi-steady state for ribosome, mRNA, and elongation factor concentrations. This is a reasonable assumption for cells undergoing exponential growth, as a constant supply of amino acids and GTP/ATP would be expected in such a scenario. However, with minor modifications to the code, a more detailed model can be implemented where such features were dynamic. A full description of the model, along with justification for the kinetic rates based on experimental evidence, is given in the supporting information.
Fig 1
Diagram of the ribosome kinetic model.
(A) Illustration of the reaction network that is simulated in the model. Aminoacylation of tRNAs (dashed boxes) indicate an area where there is limited knowledge of the biochemical kinetics and additional experimental work is needed. (B) Overview of the implementation steps. Specific mRNA sequences, and the concentration of ribosomes, are used as input parameters. The program determines the quantities of elongation factors, etc. that are expected from the number of ribosomes based on experimental data [19]. Examples of output data are shown in the final column.
Diagram of the ribosome kinetic model.
(A) Illustration of the reaction network that is simulated in the model. Aminoacylation of tRNAs (dashed boxes) indicate an area where there is limited knowledge of the biochemical kinetics and additional experimental work is needed. (B) Overview of the implementation steps. Specific mRNA sequences, and the concentration of ribosomes, are used as input parameters. The program determines the quantities of elongation factors, etc. that are expected from the number of ribosomes based on experimental data [19]. Examples of output data are shown in the final column.To simulate the kinetic reactions listed in the supporting information, I implement an exact Gillespie stochastic model [20] where individual reactions are randomly “fired”, one at a time, according to the reaction propensity function
Here, i labels one of the N mRNAs, j labels one of the M possible reactions for this mRNA, and k are the individual kinetic rates for each reaction, with ϕ denoting the sum of reactions involving mRNA i only. Since an exact Gillespie model can take considerable time to simulate, I am using a binary tree structure, with nodes representing the partial sums ϕ of the reactions involving mRNA i, to compute Φ and identify reactions to fire at each step. The tree is searched from the top node, choosing the branch which is greater than rΦ, with random r ∈ [0, 1], until a specific mRNA number η which satisfies is identified. After firing of a specific reaction in mRNA η, the reactions in this mRNA are updated and the same path through the tree is re-traced, updating the sums accordingly. In this way, the total propensity Φ is updated in log2(N) time following the approach used in a Gillepsie model of RNA kinetics involving single base-paring reactions [21]. This technique amounts to a re-ordering of the reaction propensities and, as noted by Cao et al. [22], results in a trajectory which will be statistically equivalent to one that was formed by the standard Gillepsie algorithm. Full details of the Log(N) binary tree method can be found in the supporting information of reference [21]. The simulations reported in this work for the μ = 1.0 doublings/hour case take half a day to reach 103 seconds of simulation time on a single processor. Simulations at the higher growth rate μ = 2.5 take 3-4 days due to the larger number of ribosomes present and increased number of reactions that need to occur to reach 103 seconds of simulation time. However, individual reactions are implemented for both cases in log2(N) time. Thus, if a cell is doubling at a rate of once per hour (μ = 1.0), then the kinetic model reported in this work can simulate the entire protein production that takes place in a single cell during one complete cell cycle in roughly 30 hours of computational time.
Validation of the model with experimental observations
To verify that the model recapitulates experimental observations of translation in Prokaryotes, I have performed a simulation of translational kinetics in exponentially multiplying E. coli at three different growth rates; μ = 0.7, μ = 1.0, and μ = 2.5 doublings per hour. Before simulating these, a representative transcriptome must be constructed that mimics what could potentially be observed in vivo. Each E. coli cell will have a temporaly dynamic transcriptome due to expression of different genes in response to various environmental cues and/or being at different stages of the cell cycle. Thus, an exact representation of the transcriptome of a single cell is difficult to measure, since measurments will often be averaged over many cells. Moreover, [23] have shown in a transcriptional time series measurement of the Lac operon using sm-FISH that the numbers of mRNAs for the LacZ gene not only vary in response to intracellular lactose concentration, but vary between cells following a geometric/Poisson distribution at low/high gene expression levels.Using these observations, I have constructed transcriptomes by selecting genes from the E. coli proteome at random and assigning a number of mRNA copies for each gene sampled according to the two types of probability distributions (geometric/Poisson). It should be noted that the nucleotide sequence of the mRNAs do not match authentic E. coli mRNAs, but instead codons are chosen so that the codon biases match with the tRNA abundances in Table K in S1 Text (see Methods). This resulted in transcriptomes T07, T10, and T25 for use in the μ = 0.7, 1.0, and 2.5 growth rate simulations, respectively. Although the T07,T10 and T25 transcriptomes are artificial, I will show in the next section (Biological Predictions of the model) that without the adjustment to the codon bias of the transcriptome to match tRNA abundances (or vis versa), translation efficiency is extremely poor.The T07 transcriptome contains 650 mRNAs comprising N = 0.56M nucleotides, while the T10 and T25 transcriptomes contain a total of 1265 mRNAs with 1.2M nucleotides and 5151 mRNAs with 5.0M nucleotides (c.f. Table 1). The total nucleotides in each transcriptome have been chosen to be roughly in line with experimental values from [19], which essentially fixes, to within approximately 10%, the total number of mRNAs that have been used in the simulations. In more concrete terms, the number of mRNAs that are used for the T07, T10, and T25 transcritpomes are within 10% of the value determined experimentally. Simulations were performed using the total number of elongation factors, tRNAs, etc. listed in Tables K-N in S1 Text. Each of these factors are assumed to increase linearly with the number of ribosomes in the cell following Bremer and Dennis [19]. Thus, users only need to specify the number of ribosomes and the individual mRNA sequences for the simulation (see Fig 1B). Kinetic rates were identical for all growth rates and are given in the supporting information. The volume of an E. coli ranges from v = 0.83 to v = 1.12 μm3 for rates of μ = 1.3 to μ = 2.1 [19]. As this change in volume has negligible effect on the overall kinetic rates in a Gillepsie model, I have used an average volume size of v = 1.0 μm3 for all simulations. Finally, although my model can theoretically compute the ribosome binding site (RBS) and the initiation rate for an mRNA based on detailed knowledge of its 5’ UTR sequence and secondary structure, this feature requires additional experimental validation before it can be fully implemented. Thus, I have assumed for now that ribosomes initiate on all mRNAs at the maximal rate, i.e. the situation where the 5’ UTR is unstructured and a Shine-Dalgarno sequence is present. This has an added benefit for the testing/validation of the termination and recycling phases, which need to be fast enough to process mRNAs which are maximally translated. This requires that the kinetics of the termination and recycling steps be such that substantial stalling or a full blown traffic jam does not result on a maximally translating mRNA. Fixing mRNAs to the case of maximal initiation has allowed for verification that my model does not result in in-efficient translation on maximally translated mRNAs and has revealed that one particular experimental model of termination was more consistent with this requirement than an alternative (see supporting information).
Table 1
Efficiency of translation for different transcriptomes.
Trans.
N
Nnt
Cp
% I
% E
% T
% S
% 50S
T25
5151
5.00
21.0
5.1
77.4
2.0
2.3
15.6
T10
1265
1.20
18.1
4.8
79.0
2.0
3.0
14.2
T07
650
0.56
15.0
4.9
78.4
2.1
4.3
14.6
T10a
1265
1.20
8.1
3.8
95.0
1.0
13.6
1.5
T10b
2040
1.24
17.7
7.9
81.7
3.4
5.0
7.9
T10c
2040
2.00
17.9
6.6
90.0
2.1
1.0
2.3
Values for each transcriptome (T25,T10,etc.) are obtained from an average over three separate simulations. Variables denote the following quantities: N—number of mRNAs in the transcriptome, N—total number of nucleotides in the transcriptome in millions, C—average protein chain elongation rate (aa/sec), (%I,%E,%T, and %S)—Percentage of ribosomes (I)nitiating, (E)longating, (T)erminating, (S)talled, and %50S—Percentage of 50S ribosomal subunits that are free.
Values for each transcriptome (T25,T10,etc.) are obtained from an average over three separate simulations. Variables denote the following quantities: N—number of mRNAs in the transcriptome, N—total number of nucleotides in the transcriptome in millions, C—average protein chain elongation rate (aa/sec), (%I,%E,%T, and %S)—Percentage of ribosomes (I)nitiating, (E)longating, (T)erminating, (S)talled, and %50S—Percentage of 50S ribosomal subunits that are free.Fig 2 shows the percentage of ribosomes at different stages of the translation process, along with the percentage of free 30S pre-initiation complex (30S:PIC) and free 50S, at the three growth rates. The percentages are computed relative to the total 50S (or equivalently 30S) subunits available. For all growth rates, roughly 15% of 50S ribosome subunits are free, while the remaining β = 85% are in one of the stages of translation. Specific values for intatition, elongation, and terminaition are listed in Table 1. The percentage of elongating ribosomes, which are stalled, is given in the %S column. Although the β value is consistent with Bremer and Dennis [19], I have found that it has a complex dependence on at least four factors: (1) the total number of initiation sites, (2) the total number of nucleotides across the transcriptome, (3) the different lengths of the open reading frames and their relative abundance in the transcriptome, and (4) the initiation rates of the ribosome on all mRNAs. Finally, Fig 3 shows the computed peptide chain elongation rates (C) for all growth rates. As can be seen in Fig 3, the average C value is roughly 15 codons per second at the growth rate μ = 0.7, 18 codons per second at 1.0, while it is 21 codons per second at μ = 2.5. These values compare extremely well with the expected rates of 15,18 and 22 codons per second, respectively, reported in Bremer and Dennis [19, 24].
Fig 2
Translational kinetics of ribosomes at different E. coli growth rates.
Time courses for the percentage of ribosomes (compared with total ribosomal mass) that are initiating (black), elongating (green), terminating (red), stalled (purple), or free 50S (blue) and free 30S:PIC (brown) are shown for growth rates of (A) μ = 0.7, (B) μ = 1.0, and (C) μ = 2.5 doublings per hour. The experimentally expected ratio of ribosome in elongating complexes (70S:EC—green line) to total ribosome is 0.80–0.85 for all growth rates [19], which matches with the model prediction. Time courses represent an average of three separate simulations.
Fig 3
Peptide chain elongation rates at different E. coli growth rates.
The peptide chain elongation rates (C) in amino acids per second are shown for the simulations at growth rates μ = 0.7,1.0, and 2.5 doublings per hour. Running averages of C over 1000 seperate protein syntheisis events are given by the red curves and reveal an average peptide chain elongation of C = 15,18, and 21 amino acids per second for μ = 0.7,1.0, and 2.5 doublings per hour, respectively. All model predictions of the peptide elongation rates match experimental estimates of [24].
Translational kinetics of ribosomes at different E. coli growth rates.
Time courses for the percentage of ribosomes (compared with total ribosomal mass) that are initiating (black), elongating (green), terminating (red), stalled (purple), or free 50S (blue) and free 30S:PIC (brown) are shown for growth rates of (A) μ = 0.7, (B) μ = 1.0, and (C) μ = 2.5 doublings per hour. The experimentally expected ratio of ribosome in elongating complexes (70S:EC—green line) to total ribosome is 0.80–0.85 for all growth rates [19], which matches with the model prediction. Time courses represent an average of three separate simulations.
Peptide chain elongation rates at different E. coli growth rates.
The peptide chain elongation rates (C) in amino acids per second are shown for the simulations at growth rates μ = 0.7,1.0, and 2.5 doublings per hour. Running averages of C over 1000 seperate protein syntheisis events are given by the red curves and reveal an average peptide chain elongation of C = 15,18, and 21 amino acids per second for μ = 0.7,1.0, and 2.5 doublings per hour, respectively. All model predictions of the peptide elongation rates match experimental estimates of [24].
Biological predictions of the model
Elongation rate increases with cellular growth rate
As can be seen in Fig 3, C decreases as the growth rate decreases, even though all simulations use exactly the same kinetic rates for the elongation reactions. Previous models such as in Rudorf [25] reported on the use of different kinetic rates for the elongation phase of the ribosome in order to enforce the observed C values, while the stochastic framework used in Vieira et al. [4] found that there was no increase in C with increasing growth rates. Here, the difference in peptide chain elongation rates arises from the complex interdependence of the various translational factors, i.e it is an emergent property of the model. One might hypothesize that the decrease in C that follows a reduction in the growth rate arises solely from the drop in concentration of the tRNAs and elongation factors, as these are directly associated with the elongation kinetics of the ribosome. This is incorrect, however, as the C value, as well as the overall translational kinetics of the ribosome, are robust to attempts of increasing the protein chain elongation rate by optimizing the concentrations of a few translational factors. To demonstrate this, I have performed simulations at the growth rate of μ = 0.7 doublings per hour with the same number of ribosomes and same transcriptome as the T07 simulations shown in Fig 2A, but have doubled the number of certain components as follows. When only tRNAs and the elongation factors Ef-Tu, Ef-Ts, and Ef-G are doubled, C = 15.1, but with the additional consequence that there were twice as many 70S elongating ribosomes stalled on mRNAs (due to insufficient recycling factors). Doubling only the number of Ef-Tu, Ef-Ts, or recycling factors resulted in values of C = 16.0, 15.5, and 15.6, respectively. Finally, choosing different random transcriptomes (by sampling different genes) also had no effect on C. Only when all tRNAs, EFs and RFs doubled in unison did C increase demonstrating how the peptide chain elongation rate is an emergent property of the model.
Elongation rate depends on tRNA abundance and codon bias of the transcriptome
Although the translational kinetics of the ribosome are robust to alterations of the translational factors, it is very sensitive to the abundance of tRNAs relative to the codon bias in the transcriptome. Dong et al. have previously measured tRNA molar abundances from E. coli at different growth rates using two-dimensional electrophoresis (c.f. Table 5 in [26]). Additonally, they estimated codon usage in the transcriptome at different growth rates using coding sequences from Genbank and previous measurements from [27] on the abundance of proteins at different growth rates (c.f. Table 3 in [26]). Using the Genbank and Pedersen data, Dong gives a codon usage frequency for AAA as 4.9% at μ = 1.0. However, the tRNA cognate to AAA represents only 1.5% of the total molar mass of tRNA that they measured. Hence the codon usage reported by Dong, based on the Pederson and Genbank data, does not exactly match the tRNA profile that they measured. To illustrate the dramatic effects that this mismatching between tRNA abundance and codon usage in the transcriptome can have on translation efficiency, I have constructed transcriptome T10a, which was adapted from T10 so as to have identical numbers of mRNAs with identical reading frame lengths for each. However, the codons in T10a have been altered so as to have the same codon biases as the Pederson and Genbank data reported in Table 3 of [26] i.e. T10a has a codon bias reflecting the wild-type E. coli transcriptome. However, while the codon bias in transcriptome T10a reflects what was observed in the experiments of Pedersen et al. [27] for wild-type E. coli, the demand on tRNAs that this codon usage implies will not match the relative tRNA abundances from Dong et al. used in the simulation.Simulation of translation using T10a following the same procedure as T10 reveals that elongation is substantially slowed, with an average elongation rate of C = 8.1 codons per second when averaged over the entire transcriptome and a substantial 13% of ribosomes stalled (c.f. Table 1). As can be seen in Fig 4, the resulting ratios of tRNAs in free ternary complex (TC) for the T10a simulation varies across tRNA species much more then the T10 simulation, with some tRNA species only having 10% of their total number in free TC. This mismatch between the tRNA abundance and the codon bias of the transcriptome results in several tRNA species having very few of their total numbers available in free TC, resulting in increased decoding times at codons dependent on these tRNAs. However, in contrast with transcriptomes T07, T10, and T25, which have codon biases which match tRNA abundances, the percentage of tRNAs in free TC is roughly uniform across all tRNA species illustrating how the quantities of tRNAs match their rate of usage by translating ribosomes in these cases.
Fig 4
Percentage of each tRNA in free ternary complex.
For each tRNA listed in Table K in S1 Text, the percentage of the tRNA that is in free ternary complex (TC) is computed by taking the ratio of the amount of the tRNA in free TC to the total amount of the tRNA. The average ratio for each tRNA (over the last 100 seconds of the translational simulation) is shown for for the T07 (black), T10 (red), and T25 (blue and T10a (green) transcriptomes. The tRNAs are ordered from lowest percentage to highest for each simulation separately.
Percentage of each tRNA in free ternary complex.
For each tRNA listed in Table K in S1 Text, the percentage of the tRNA that is in free ternary complex (TC) is computed by taking the ratio of the amount of the tRNA in free TC to the total amount of the tRNA. The average ratio for each tRNA (over the last 100 seconds of the translational simulation) is shown for for the T07 (black), T10 (red), and T25 (blue and T10a (green) transcriptomes. The tRNAs are ordered from lowest percentage to highest for each simulation separately.The results for the T10a transcriptomes are similar to a previous Markov model study by Rudorf et al. [28] which showed that tRNA abundances in free TC become skewed, relative to the total tRNA concentration, as a result of mismatches between the tRNAs and the frequency of codon usage in the mRNAs. Table 2 compares the ratio of tRNAs in free TC to the total amount of that tRNA for the transcriptome T10a (green data in Fig 3) with that of Rudorf et al. [28]. As can be seen from the table, there are similar effects on the tRNA abundances between T10a and those reported in Rudorf et al. [28]—e.g. the largest tRNA abundances are in tRNAs-Arg2,Arg3,Arg4, and tRNA-Arg5 while the least is in tRNA-Lys. However, there are some differences, and these are likely due to two factors. First in Rudorf and Lipowsky’s model, Ef-Tu does not have a reaction for GTP/GDP mediated exchange by Ef-Ts, where as in my model this is present (c.f. Fig 1A). This is probably the largest contribution to the difference as roughly 21% of Ef-Tu is in complex with GDP or Ef-Ts (at μ = 1.0) in my model, and therefore unable to bind charged tRNAs to form free TC. Second, the effects of stalling are accounted for in this work, which can slow the time to release tRNAs which decode rare codons, thereby further decreasing their abundance in free TC. This effect can be illustrated in Fig 5 which shows my models predicted average decode times for each codon (in aa/sec), along with stalling frequencies, for each of the 61 sense codons. As can be seen, small effects of codon biases can dramatically alter the codon decoding times producing a feedback on the abundance of tRNAs in free TC, effecting the overall translation properties of the transcriptome.
Table 2
Concentration of tRNA in free ternary complex.
x
T10
T10A
Rudorf
Ala1B
0.3812
0.2862
0.5611
Ala2
0.3776
0.1945
0.4935
Arg2
0.3769
0.5187
0.6801
Arg3
0.3713
0.8276
0.8551
Arg4
0.3826
0.8439
0.8863
Arg5
0.3832
0.8791
0.9006
Asn
0.3737
0.1043
0.4229
Asp1
0.3789
0.3186
0.5356
Cys
0.3825
0.7182
0.8068
Gln1
0.3726
0.6055
0.7520
Gln2
0.3858
0.1511
0.4409
Glu2
0.3765
0.4833
0.6505
Gly1
0.4385
0.8315
0.8806
Gly2
0.3135
0.7791
0.8512
Gly3
0.3722
0.4287
0.6423
His
0.3844
0.1006
0.4258
Ile1
0.3771
0.1003
0.6288
Ile2
0.3799
0.8720
0.8253
Leu1
0.3901
0.5593
0.7177
Leu2
0.3818
0.6342
0.7524
Leu3
0.2783
0.5201
0.6870
Leu4
0.4099
0.8267
0.8680
Leu5
0.2920
0.6998
0.8058
Lys
0.3778
0.0631
0.4149
Metm
0.3885
0.0903
0.4742
Phe
0.3655
0.0735
0.4731
Pro1
0.4454
0.1629
0.5897
Pro2
0.3807
0.6345
0.7566
Pro3
0.2633
0.1034
0.4504
Ser1
0.3327
0.5772
0.7239
Ser2
0.5335
0.7890
0.8632
Ser3
0.3801
0.5844
0.7240
Ser5
0.4295
0.3518
0.5749
Thr1
0.3818
0.1736
0.4444
Thr2
0.5028
0.7457
0.8199
Thr3
0.3818
0.1736
0.4470
Thr4
0.2964
0.4357
0.6307
Trp
0.3829
0.6220
0.7641
Tyr1
0.3794
0.5025
0.6666
Tyr2
0.3794
0.5025
0.6666
Val1
0.3597
0.4052
0.5835
Val2
0.4151
0.4695
0.6494
The ratio of the concentration of tRNA in free ternary complex relative to the total concentration of tRNA at growth rate μ = 1.0 is shown. Data for Rudorf computed from the 2-1-2 model at growth rate μ = 1.07 in supplementary tables S4 and S5 [28].
Fig 5
Predicted codon decoding times and stalling frequency.
The average codon decoding times a stalling frequencies for each sense codon is calculated from the 215M individual decoding events that occur over the entire transcriptome in 1000s of simulation time. Blue and red bars indicate the T10 and T10a transcriptomes, respectively. Stalling events are computed as the frequency of ribosome stalling that occurs at the given codon.
The ratio of the concentration of tRNA in free ternary complex relative to the total concentration of tRNA at growth rate μ = 1.0 is shown. Data for Rudorf computed from the 2-1-2 model at growth rate μ = 1.07 in supplementary tables S4 and S5 [28].
Predicted codon decoding times and stalling frequency.
The average codon decoding times a stalling frequencies for each sense codon is calculated from the 215M individual decoding events that occur over the entire transcriptome in 1000s of simulation time. Blue and red bars indicate the T10 and T10a transcriptomes, respectively. Stalling events are computed as the frequency of ribosome stalling that occurs at the given codon.The source of the slow average translation of the T10a transcriptome is due to the additional reactions governing GTP/GDP exchange on Ef-Tu and the binding of aminocyl-tRNAs to Ef-Tu:GTP to form free TC. These reactions create a situation where aminoacyl-tRNAs must compete for available Ef-Tu:GTP to form free TC. Underused aminoacyl-tRNAs will have an advantage in formation of TC over that of overused ones, resulting in a surplus of underused tRNAs in free TC. Thus, situations where the relative abundances of tRNAs do not match their demand implied by the codon bias of the transcriptome will result in shifts in the amount of tRNA available in free TC (as can be seen in Fig 4 for T10a). To further demonstrate the dramatic difference in translational behaviour that can be observed when all reactions are accounted for, I have performed a simulation where reactions governing the binding of Ef-Tu to newly charged tRNAs are turned off and tRNAs are imediately returned to free TC following ejection from an elongating ribosome. This is the same assumption used by Vieria et al. [4] and assumption 3 in Zouridis et al. [29]. Neglecting these reactions results in average translation rates of C = 18.4 aa/sec for T10 and C = 17.7 aa/sec for T10a, in contrast to the less efficient C = 8.1 aa/sec average translation that results for transcriptome T10a when these reactions are accounted for. Similarly, increasing Ef-Tu (and Ef-Ts) concentrations by 25%, 50% and 75% results in C values for T10a of 12.1, 14.2 and 15.1, respectively, reducing the effects of competition amongst the charged tRNAs for Ef-Tu:GTP as it becomes increasingly abundant. Both of these results are in close agreement with recent models of Rudorf et al. [28, 30] which have also taken into account the effects of competition amongst newly charged tRNAs for Ef-Tu:GTP and the effects of codon bias on the overall translational process.
Ribosomal density on mRNAs is transcriptome dependent
While the tRNA abundance relative to codon usage in the mRNAs appears to have a strong effect on translational efficiency, the length distribution of the mRNAs and their overall quantity control the average ribosomal density over the whole transcriptome (assuming the case of a fixed amount of ribosomal subunits). To illustrate this phenomenon, I have constructed two additional transcriptomes, T10b and T10c. The former, T10b, has essentially the same nucleotide content as T10 (1.24M nt in total), but is biased to short mRNA lengths between approximately 600 and 900 nucleotides. This results in transcriptome T10b, which has roughly the same total number of nucleotides as T10, but a higher number of overall mRNAs at 2040. The latter, T10c, has the same number of mRNAs as T10b, but has no bias in the length selection, and thus has a similar distribution of lengths as T10. Hence, T10c has both higher total nucleotide content (2.0M in this case) and more mRNAs overall compared with T10. To calculate the ribosomal density, I first calculate the average nucleotide distance between ribosomes following [19]
where N is the total number of nucleotides in the transcriptome, β is the ribosome activity, and N50 is the total number of 50S ribosome subunits available. A density value is computed from ρ = 1/d, which is equal to the number of ribosomes per nucleotide. Note that d has a minimal value, corresponding to the ribosomal footprint, indicating the maximal density of ribosomes that is physically possible. It should be noted that estimates for the ribosomal footprint vary from 10 to 14 codons [1, 31]. My model allocates a defined extended footprint (8 codons 5’ of the P site and 6 codons 3’ of it) for the ribosome in contact in the mRNA, following RNAase protection assays discussed in Berkhout et al. [31].Table 1 shows the results for T10b and T10c using the same settings for rates and numbers of tRNAs, etc., as T10. Because initiation occurs at the maximal rate on all mRNAs and a consistent spacing between the first and second ribosomes is achieved on average in the simulations, one might assume that the average number of nucleotides per ribosome should only depend on the total nucleotide content, as assumed in [19]. However, as Table 1 shows, this is not the case as the overall center-center ribosome distance for T10b is d = 86.5, while for T10 it is d = 92.9, indicating a higher density of ribosomes in the T10b transcriptome. The increase in density results in more ribosomes being in a stalled state, with 5% of ribosomes now stalled compared with 3% in T10. Very interestingly, this increased stalling has only a tiny impact on the overall average elongation rate, with C = 17.7 for T10b, compared with C = 18.1 for T10. Thus, although there are more ribosomes stalled at any one time on the mRNAs, these must occur randomly and transiently so that any stalled ribosomes quickly begin moving again before causing additional ribosomes to stall behind them, building into a full traffic jam. Finally, simulating the transcriptome T10c, which has an increase in both the number of mRNAs and total number of nucleotides when compared with T10, it can be observed that the average distance between ribosomes is approximately d = 135.9, indicating a smaller ribosome density in T10c even though the ribosome activity is near 98%.
Simulations using E. coli mRNAs
Although the transcriptomes T07,T10, and T25 all reproduce expected features of the translational processes in vivo that have been highligetd in Bremer and Dennis [19], they are artificial transcritpomes which have codon biases that are not what would be observed in E. Coli. Thus, the question naturally arises as to the ability of the model to reproduce translational dynamics on real mRNA/transcriptome information. To probe this question, I have created transcriptomes T07wt, T10wt and T25wt which have near identical nucleotide and mRNA numbers as the artificial T07,T10, and T25 transcriptomes, but the mRNAs in these transcriptomes have been taken from the predicted open reading frames (ORFs) of E. Coli K12 strainMG1655 (accession code U00096). Table 3 reports on the average translational elongation rates using the tRNA abundances measured by Dong et al. [26], and compares with simulations using the alternative tRNA abundances shown in Table 4. The tRNA abundances in Table 4 represent one possible theoretical solution to the tRNA/codon usage matching problem, i.e. the identification of the relative tRNA abundances such that the percentage of each tRNA out of the total matches the usage frequency of the codon(s) that it decodes for. It is important to note that, due to cross recognition of codons by multiple tRNAs, there are multiple solutions to this problem, and therefore a more optimal solution may exist.
Table 3
Efficiency of translation for wild-type E. Coli transcriptomes.
Trans.
tRNA
N
Nnt
Cp
% I
% E
% T
% S
% 50S
T25wt
Dong
5154
5.01
6.0
2.1
97.2
0.6
33.3
0.6
T10wt
Dong
1259
1.20
4.8
2.3
96.9
0.6
33.2
1.2
T07wt
Dong
646
0.56
0.2
0.1
66.9
0.1
15.9
31.6
T25wt
Tab.4
5154
5.01
17.4
4.8
90.0
1.6
17.3
3.3
T10wt
Tab.4
1259
1.20
14.9
4.5
88.6
1.8
17.1
5.8
T07wt
Tab.4
646
0.56
0.2
0.1
56.6
0.1
16.6
43.5
Values for each transcriptome (T25wt,T10wt,T07wt) are obtained from an average over three separate simulations. The tRNA column denotes the tRNA bias used in the simulation, either the tRNA abundances measured from Dong et al., or the inferred tRNA abundance based on the codon bias from the E. Coli ORFs (Table 4). Variables denote the following quantities: N—number of mRNAs in the transcriptome, N—total number of nucleotides in the transcriptome in millions, C—average protein chain elongation rate (aa/sec), (%I,%E,%T, and %S)—Percentage of ribosomes (I)nitiating, (E)longating, (T)erminating, (S)talled, and %50S—Percentage of 50S ribosomal subunits that are free.
Table 4
A theoretical estimate for the number of tRNA in E. Coli. K12.
x
μ = 0.70
μ = 1.07
μ = 2.50
Ala1
4718
9437
37748
Ala2
1745
3491
13964
Arg2
3177
6354
25416
Arg3
366
733
2932
Arg4
138
276
1104
Arg5
76
153
612
Asn
2666
5333
21332
Asp1
3488
6977
27908
Cys
792
1584
6336
Gln1
1047
2094
8376
Gln2
1972
3945
15780
Glu2
3917
7834
31336
Gly1
474
949
3796
Gly2
815
1631
6524
Gly3
3709
7418
29672
His
1541
3083
12332
Ile1
3794
7588
30352
Ile2
289
579
2316
Leu1
1870
3740
14960
Leu2
1509
3019
12076
Leu3
2010
4020
16080
Leu4
621
1243
4972
Leu5
1251
2502
10008
Lys
2992
5985
23940
Metm
1701
3403
13612
Phe
2645
5290
21160
Pro1
943
1887
7548
Pro2
503
1007
4028
Pro3
1567
3135
12540
Ser1
1091
2182
8728
Ser2
398
796
3184
Ser3
1687
3374
13496
Ser5
760
1520
6080
Thr1
767
1535
6140
Thr2
686
1373
5492
Thr3
767
1535
6140
Thr4
1440
2880
11520
Trp
1040
2080
8320
Tyr1
964
1929
7716
Tyr2
964
1929
7716
Val1
2988
5977
23908
Val2
1816
3632
14528
One possible solution for the number of tRNAs that would be observed in E. Coli K12 substrain MG1655 (accession code U00096) if it is assumed that tRNA abundances match the codon bias in the predicted ORFs.
Values for each transcriptome (T25wt,T10wt,T07wt) are obtained from an average over three separate simulations. The tRNA column denotes the tRNA bias used in the simulation, either the tRNA abundances measured from Dong et al., or the inferred tRNA abundance based on the codon bias from the E. Coli ORFs (Table 4). Variables denote the following quantities: N—number of mRNAs in the transcriptome, N—total number of nucleotides in the transcriptome in millions, C—average protein chain elongation rate (aa/sec), (%I,%E,%T, and %S)—Percentage of ribosomes (I)nitiating, (E)longating, (T)erminating, (S)talled, and %50S—Percentage of 50S ribosomal subunits that are free.One possible solution for the number of tRNAs that would be observed in E. Coli K12 substrain MG1655 (accession code U00096) if it is assumed that tRNA abundances match the codon bias in the predicted ORFs.As can be seen in Table 3, using the experimentally measured tRNA abundances from Dong et al. [26] results in extremely poor translation (C ≤ 6.0 aa/sec) for all of the growth rates. However, by using a tRNA bias which matches the codon frequency in the ORFs of E. Coli, we can see that translation efficiency is dramatically improved for the T25wt and T10wt transcriptomes, while the T07wt transcriptome still suffers from poor transcription. Upon examination of the quasi-steady state distribution of tRNAs in free TC for the T07wt simulation, the tRNAs Arg4 and Arg5 which decode the ultra-rare arginine codons AGA and AGG have essentially zero tRNAs in free TC. This suggests that a minimal threshold of tRNAs may be important for ensuring efficient cycling through the recharging process, and additional constraints on tRNA abundances may exist.Although using the tRNA abundance in Table 4 which matches the codon bias of the mRNAs improves translation in the wt transcriptomes, the translation dynamics do not match the experimental observations of Bremer and Dennis as well as the artificial transcriptome with tRNA abundances from Dong et al. [26]. There are many possible explanations for this, along with potential ways to tune the rates/concentrations to achieve a better fit to experiment for the wild-type transcriptomes. First, it may be that an alternative tRNA distribution which also matches the codon bias, is more efficient, or that the tRNAs need to slightly deviate from the codon bias for other unknown reasons. Second, it could be that the kinetic rates for the elongation process, which were optimized for the Dong tRNA abundances by Rudorf et al. [25] need re-calibrated, or other kinetic rates, such as those for aminoacyl-tRNA recharging, need altering. Third, there may be some known/unknown reactions for elongation or tRNA recharging that need to be accounted for in the model. Fourth, the concentrations of elongation factors such as Ef-Tu may be too low as increasing Ef-Tu increased the translational rates for T10a. Finally, there may be a combined tRNA and kinetic rate optimization which must be done in tandum, along with potentially other factors, to get a good fit of the wild-type transcriptomes to the experimental translational data of Bremer and Dennis [19].
Discussion
In this work, I have developed a stochastic model of ribosome kinetics based on the [20] simulation algorithm which is capable of simulating translation on thousands of unique mRNAs with an explicit nucleotide sequence in an in vivo context. The model incorporates a number of important biophysical properties of the translational process simultaneously and accounts for different codon translation rates, which will in turn depend on tRNA abundance and usage by other mRNAs. Moreover, it accounts for the effects of individual ribosomes competing for the translation factors and a polysomal state of multiple ribosomes to be bound simultaneously on the mRNAs, while also taking into account the excluded volume of the ribosomes and the formation and resolution of traffic jams. The computational code is set up for the prediction of initiation rates on mRNAs based on sequence and secondary structure around the start codon, presenting an opportunity in the near future for experiment and theory to elucidate the impact of sequence on translation initiation.As demonstrated above, my translational models predictions for ribosomal activity β for a given growth rate and mRNA nucletide content are consistent with experimental observations by [19] for E. coli. In addition, the increasing peptide chain elongation rates with increasing growth rates are consistent with experiment over at least three different growth rates [24]. However, unlike in [25], only a single set of kinetic parameters are needed for the elongation reactions in order to reproduce this phenomenon. Indeed, the translational kinetics model reported here displays classic features of a complex dynamical system, where properties emerge from the complex interactions of all the components. The peptide chain elongation rate is a clear example of this phenomenon, as increasing C values emerge naturally as a result of the changes to the complex dynamics of all of the translational components that occur at increasing concentrations. Attempts to increase C values by increasing the concentration of only one or a few components of the system had no effect, further illustrating the complexity of the translational process. This is in direct contrast to Vieira et al. [4], who have not seen this effect using a stochastic framework model of translation. This is likely due to the lack of binding reactions governing the formation of free TC as well as missing termination steps involving the recycling factors. This shows the importance for including these reactions in translational models.However, one of the factors which seems to have a strong influence on the translational dynamics was the composition of the transcriptome, both in its codon bias relative to tRNA concentrations and its length and total nucleotide content. The ultra sensitivity of the translational process on the tRNA abundance relative to codon bias has also been noted by Rudorf et al. when Ef-Tu and TC formation reactions have been accounted for in their translational models [28, 30]. However, preliminary work on reproducing the observed translational dynamics with wild-type E. Coli transcriptomes using mRNAs from predicted ORFs showed that, while matching of the tRNA abundances to codon bias enhanced translational efficiency to values closer to what is expected experimentally, additional optimization of the parameters or the overall model is needed.The implication of the results for tRNA abundances and their effects on translational efficency, as also discussed in Rudorf et al. [28], are that changes to the transcriptomes codon bias from over expression of particular mRNAs (such as lacz) due to cellular responses to the environment could potentially result in alterations to the elongation rates of other mRNAs due to redistribution of tRNAs in free TC. In cell free protein syntehsis applications, the results show a potential need for researchers to carefully tune tRNA concentrations to the demand implied by the transcriptomes codon usage to maximize protein synthesis. These observations points to a need to consider translational kinetics from a more holistic view, where the effects of translation due to the presence of a heterogeneous population mRNAs need to be considered. Moreover, the simulations with T10, T10a, T10b, and T10c, illustrate how ribosome density is likely controlled by a complex set of factors which include the overall number of ribosomes, mRNAs, as well as the lengths of the mRNAs and their relative abundances and ribosomal initiation rates. As Bremer and Dennis have reported, observed ribosomal activity β is roughly constant across the different growth rates of E. coli and the corresponding mRNA nucleotide content reported is approximately 1.0M, 2.0M, and 4.6M nucleotides for growth rates of μ = 0.7, 1.0, and 2.5 doublings per hour, respectively [19]. Although this is slightly different from the values of 0.56M, 1.2M and 5.0M nucleotides used here, the discrepancy is likely due to mRNAs having varying initiation rates and expression levels in reality, while here all mRNAs have been arbitrarily fixed to have maximal initiation rate. Taking into account different initiation rates for mRNAs will likely also result in the average distance between ribosomes being more in line with experimental measurements, which Bremer and Dennis compute to be roughly d = 160 for μ = 1.0, compared with the value of d = 93 I have computed here. However, adjustments to the transcriptome, and to some of the kinetic parameters for the initiation stage of translation, would require knowledge on the initiation rates for all mRNAs in the transcriptome, something beyond the scope of this initial work. Despite this, the translational model used here has elucidated a number of unexpected effects due to the composition of the transcriptome.The model I developed opens up numerous possibilities for linking experimental measurements with theoretical predictions and for testing the physical feasibility of models that have been postulated based on experimental observations. Indeed, through testing and tuning of this model to recapitulate experimental observations, I found a number of experimentally hypothesised models were unable to result in a steady state of protein synthesis (see Supporting Information for discussion). Finally, it can also help in the development of simplified models through the choice of better model parameters. For instance, the observations from this more complex model can help guide TASEP models to consider additional features, such as dynamic initiation times, which may be critical for modeling specific features of the translational process.
Methods
Model construction and parametrization
Explicit details on the construction of the model, along with justification of the kinetic rates and concentrations of the translational factors, are based on a large background of experimental literature and full details can be found in the supporting information. Software and transcriptome data used in the simulations can be downloaded from http://www-users.york.ac.uk/∼ecd502/ or requested from the author via email.
Construction of the transcriptome
The transcriptomes used in the simulations were constructed using the known E. coli proteome, which was obtained from the uniprot database https://www.uniprot.org/proteomes/UP000000625. This database lists the number of amino acids in each of the 4353 identified proteins in the E. coli K12 strain, along with an amino acid sequence for the protein. In order to achieve a maximal translation rate of 22 codons per second at the highest growth rates, codon usage frequency in the mRNAs must match the abundances of the tRNAs which can decode them. Hence, the individual amino acid sequences were ignored and codons for the mRNAs were chosen according to the relative abundance of tRNAs in Table K in S1 Text, while preserving the reading frame length for each gene. For tRNAs such as tRNA-Lys, which is cognate to codons AAA and AAG, it is assumed that the number of tRNAs (4360 at μ = 1.0) implies an equal frequency usage of 2180 for both codons. Summing these contributions to the codons over all tRNAs and normalizing gives the frequency of codon usage, which is then sampled to construct representative mRNAs for each gene in the uniprot database.To construct a transcriptome, individual mRNAs corresponding to a single gene are selected at random, and the number of copies for the mRNA is assigned into either a high (probability 5%), intermediate (probability 35%), or low (probability 60%) expression category. Copy numbers (k) for each mRNA are then determined by sampling from either a Poisson distribution, p(k) = e−λλ/k!, for the high expression (λ = 6.8), or a geometric distribution, p(k) = (1 − λ)λ, for the intermediate (λ = 0.58) and low (λ = 0.93) expression categories. Genes and their corresponding mRNA sequences are selected, one at a time, until a total nucleotide content and/or total number of mRNAs is achieved.
Supplementary information.
(PDF)Click here for additional data file.29 Oct 2019Dear Dr Dykeman,Thank you very much for submitting your manuscript 'A stochastic model for simulating ribosome kinetics in vivo' for review by PLOS Computational Biology. Your manuscript has been fully evaluated by the PLOS Computational Biology editorial team and in this case also by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the manuscript as it currently stands. While your manuscript cannot be accepted in its present form, we are willing to consider a revised version in which the issues raised by the reviewers have been adequately addressed. We cannot, of course, promise publication at that time.Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.Your revisions should address the specific points made by each reviewer. Please return the revised version within the next 60 days. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at ploscompbiol@plos.org. Revised manuscripts received beyond 60 days may require evaluation and peer review similar to that applied to newly submitted manuscripts.In addition, when you are ready to resubmit, please be prepared to provide the following:(1) A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors.(2) A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, if possible to show clearly where changes have been made to their manuscript e.g. by highlighting text.(3) A striking still image to accompany your article (optional). If the image is judged to be suitable by the editors, it may be featured on our website and might be chosen as the issue image for that month. These square, high-quality images should be accompanied by a short caption. Please note as well that there should be no copyright restrictions on the use of the image, so that it can be published under the Open-Access license and be subject only to appropriate attribution.Before you resubmit your manuscript, please consult our Submission Checklist to ensure your manuscript is formatted correctly for PLOS Computational Biology: http://www.ploscompbiol.org/static/checklist.action. Some key points to remember are:- Figures uploaded separately as TIFF or EPS files (if you wish, your figures may remain in your main manuscript file in addition).- Supporting Information uploaded as separate files, titled Dataset, Figure, Table, Text, Protocol, Audio, or Video.- Funding information in the 'Financial Disclosure' box in the online system.While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see here.We are sorry that we cannot be more positive about your manuscript at this stage, but if you have any concerns or questions, please do not hesitate to contact us.Sincerely,Attila Csikász-NagyAssociate EditorPLOS Computational BiologyWilliam NobleDeputy EditorPLOS Computational BiologyA link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:[LINK]Reviewer's Responses to QuestionsComments to the Authors:Please note here if the review is uploaded as an attachment.Reviewer #1: The manuscript develops a stochastic model of protein synthesis, which is used to simulate translation in E. coli cells. I have problems fully evaluating this manuscript, due in part to the fact that the important aspects of the work seem incompletely described, and that novelty is claimed in a way that I do not completely agree with.1) The introduction gives the impression that previous work on modelling protein synthesis has relied almost exclusively on mathematical modelling via the TASEP, that very little work was done on stochastic modelling, and that none of the latter attempted genome-wide models of translation. This is factually untrue – a detailed Gillespie-based model of translation was published eg in 2007 (PMID 17897886) and this was used for modelling genome-wide translation in yeast (22965119, 23791185). The author even cites a more recent model as using a similar approach – simulating translation in the cell or in a cell-free extract into fundamentally different apart from the parameterisation. The principal findings in the publications cited above on translation in vivo do not seem fundamentally different from those described here.2) Although I see strong overlap with previous work, there is some novelty in this manuscript but this is not clearly described. First, the detailed model for translation initiation is novel as far as I know, although this is qualified in the text as “somewhat hypothetical”. If there is uncertainty over the structure or parameterisation of this part of the model, this should have been discussed in more detail. Second, the technical approach used for the modelling described on page 4 seems interesting, but is described without any validation that this produces the same results as the original approach described by Gillespie, or as the faster derivatisations of this approach like tau-leap etc. In my view a more efficient algorithm for stochastic modelling of chemical systems would be very useful, but this would need to be validated by comparing results with more established approaches. I am assuming that the approach used here is more efficient than existing stochastic modelling approaches, if not, it should be explained why a new approach was necessary. It should also be explained how spatial information (ribosome positioning on the mRNAs) was encoded, as this is the biggest problem in standard ODE-based approaches and the reason why papers such as the one cited in the text require such high numbers of equations even for short mRNAs.3) The approach for constructing the transcriptome appears so nonsensical that I wonder whether I misunderstand something fundamental. The author appears to say that they constructed the main transcriptome used in the simulations from proteome data, in such a way that the codon usage of the transcriptome fits the assumed tRNA pool better than experimentally reported transcriptomes. If I understood this correctly, they would model an entirely unphysiological transcriptome?Reviewer #2: In his manuscript, Dr. Dykeman presents to my knowledge the first computational model of in-vivo protein synthesis that is built on a close-to complete set of reactions and captures the stochastic nature of this highly complex process. He uses this model to study fundamental aspects of translation, such as the effect of the system’s tRNA and codon compositions on the availability of free ternary complexes.I find this work highly interesting and believe that it significantly advances the field. In particular, it is the first computational model of protein synthesis for a broad range of bacterial growth rates that uses only a single set of kinetic parameters. A convincing validation of the approach is given by comparing the numerical results with experimental observations.The manuscript is very well organized and the text is very well written, clear and easy to follow. An exceptional comprehensive, thorough and transparent discussion about the limits of the model, which covers many challenges that are common for this type of modeling (e.g. a consistent parameterization), is included.There is only one minor issue I kindly ask the author to comment on:In the abstract you mention that “several biological phenomena are predicted as emergent properties of this complex translational dynamics”. Could you please be a bit more specific so that the readers know what to expect from your paper?Reviewer #3: The paper of Dykeman deals with the simulation of in vivo translation kinetics in E. coli outlining that the cellular context is issential to understand experimental findings.While the approach is basically highly attractive and deserves publication, the paper has some shortcomings that need to be eliminated before publication:1) The author outlines the in vivo character of the simulation. However, the reader cannot understand how (and why) the arbitrary transcriptomes have been designed (lines 154 -156). There are numerous E. coli publications covering genome-scale transcript data. Why should the transcript artifact of the author be superior to experimental measurements?2) The adaptation of nucleotide sequence - apparently a crucial step for sequenced based modelling - is not properly explained (lines 158 - 159). Actually, the need to do so does not get clear at all.3) Apparently, initiation modelling has been simplified ignoring the formation of S30-IF1 and S30-IF2 - why?4) In general: Considering the 'artificial' transcriptome for simulating translation is certainly important and highly attractive. Nevertheless, some findings were already published by other others who focused on single genes alone. Examples are: Niess et al. already discussed impacts on tRNA shortage and stochasticity on translation efficiency, Vieira et al implemented stochastic translational elements, Arnold et al. already published a very complex ODE model in 2005. The author is invited to check further details in the following list:Arnold, S., Siemann-Herzberg, M., Schmid, J., and Reuss, M. (2005). Model-based Inference of Gene Expression Dynamics from Sequence Information. In Biotechnology for the Future, (Springer, Berlin, Heidelberg), pp. 89–179Mehra, A., and Hatzimanikatis, V. (2006). An Algorithmic Framework for Genome-Wide Modeling and Analysis of Translation Networks. Biophysical Journal 90, 1136–1146.Nieß, A., Failmezger, J., Kuschel, M., Siemann-Herzberg, M., and Takors, R. (2017). Experimentally Validated Model Enables Debottlenecking of in Vitro Protein Synthesis and Identifies a Control Shift under in Vivo Conditions. ACS Synth. Biol. 6, 1913–1921.Nieß,A., Siemann-Herzberg,M. and Takors,R. (2019) Protein production in Escherichia coli is guided by the trade‑off between intracellular substrate availability and energy cost. Microbial Cell Factories, 18Shaw, L.B., Zia, R.K.P., and Lee, K.H. (2003). Totally asymmetric exclusion process with extended objects: A model for protein synthesis. Phys. Rev. E 68, 021910.Vieira, J.P., Racle, J., and Hatzimanikatis, V. (2016). Analysis of Translation Elongation Dynamics in the Context of an Escherichia coli Cell. Biophys. J. 110, 2120–2131.Zouridis, H., and Hatzimanikatis, V. (2007). A Model for Protein Translation: Polysome Self-Organization Leads to Maximum Protein Synthesis Rates. Biophysical Journal 92, 717–730.Zouridis, H., and Hatzimanikatis, V. (2008). Effects of Codon Distributions and tRNA Competition on Protein Translation. Biophysical Journal 95, 1018–1033.**********Have all data underlying the figures and results presented in the manuscript been provided?Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology
data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.Reviewer #1: No: I could not locate data underlying figures.Reviewer #2: YesReviewer #3: Yes**********PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.If you choose “no”, your identity will remain anonymous but your review may still be made public.Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.Reviewer #1: Yes: Tobias von der HaarReviewer #2: NoReviewer #3: No8 Nov 2019Submitted filename: response.pdfClick here for additional data file.24 Nov 2019Dear Dr Dykeman,Thank you very much for submitting your manuscript 'A stochastic model for simulating ribosome kinetics in vivo' for review by PLOS Computational Biology. Your manuscript has been fully evaluated by the PLOS Computational Biology editorial team and in this case also by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the manuscript as it currently stands. While your manuscript cannot be accepted in its present form, we are willing to consider a revised version in which the issues raised by the reviewers have been adequately addressed. We cannot, of course, promise publication at that time.Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.Your revisions should address the specific points made by each reviewer. Please return the revised version within the next 60 days. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at ploscompbiol@plos.org. Revised manuscripts received beyond 60 days may require evaluation and peer review similar to that applied to newly submitted manuscripts.In addition, when you are ready to resubmit, please be prepared to provide the following:(1) A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors.(2) A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, if possible to show clearly where changes have been made to their manuscript e.g. by highlighting text.(3) A striking still image to accompany your article (optional). If the image is judged to be suitable by the editors, it may be featured on our website and might be chosen as the issue image for that month. These square, high-quality images should be accompanied by a short caption. Please note as well that there should be no copyright restrictions on the use of the image, so that it can be published under the Open-Access license and be subject only to appropriate attribution.Before you resubmit your manuscript, please consult our Submission Checklist to ensure your manuscript is formatted correctly for PLOS Computational Biology: http://www.ploscompbiol.org/static/checklist.action. Some key points to remember are:- Figures uploaded separately as TIFF or EPS files (if you wish, your figures may remain in your main manuscript file in addition).- Supporting Information uploaded as separate files, titled Dataset, Figure, Table, Text, Protocol, Audio, or Video.- Funding information in the 'Financial Disclosure' box in the online system.While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see here.We are sorry that we cannot be more positive about your manuscript at this stage, but if you have any concerns or questions, please do not hesitate to contact us.Sincerely,Attila Csikász-NagyAssociate EditorPLOS Computational BiologyWilliam NobleDeputy EditorPLOS Computational BiologyA link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:[LINK]Reviewer's Responses to QuestionsComments to the Authors:Please note here if the review is uploaded as an attachment.Reviewer #1: I thank Dr Dykeman for the detailed response to my comments.I am generally satisfied with the response to the first two comments. With respect to the “novelty” aspect, my evaluation was too focused on the translation process itself and I had not considered all the additional reactions included in the model, so upon reflection I agree with Dr Dykeman’s argument in this respect. The details of the modified Gillespie algorithm had escaped me but are now clearer and easier to understand for non-experts such as myself.I do however have continuing concerns about the decision to base the majority of the study on a non-physiological transcriptome. Expressed in a slightly snide way to drive home the point, the message is essentially that “because my model does not fit well with experimental data, I make up a dataset that fits the model better and base my study on that”. I thought about this for quite a while because I would like to be fair to the study and its author, but in the end this is not an approach I wish to condone for the following reasons:1) The assumption that it is the experimental data that are are inaccurate, rather than the model, seems arbitrary in the first instance.2) Even if the model is correct and some of the experimental data inaccurate, the transcriptome replacement strategy is underpinned by the very specific but arbitrary notion that it must be the tRNA data that are wrong. Why can it not be the target amino acids per second data that are wrong, or the recharging data (nb “recharging” in the manuscript should actually be “nucleotide exchange” – recharging traditionally refers to tRNA synthetase reactions). It is clearly acknowledged that altering elongation factor re-charging can produce high elongation rate with the actual E. coli transcriptome...3) One of the motivations for replacing the transcriptome is the assumption that tRNAs and codon usage should match perfectly (most directly expressed in lines 283-287). This seems to take that literature observation that there is some correlation between codon usage and tRNA content and interpret it very strongly.4) Overall, I think this study could be re-framed as one that highlights apparent mismatches in experimental data revealed by the modelling, and carefully discusses the potential implications. However, I do not think the current framing where an (in my view) uncertain work-around to the mismatch is provided and this is then used to make strong statements about biological implications works very well.Reviewer #2: Review of PCOMPBIOL-D-19-01692R1My original comment was fully addressed and I still believe that this manuscript deserves publication.However, I now disagree with some parts of the revised manuscript and these issues should be addressed prior to publication.In my opinion, the great advancement of the presented work is the following:by integration of many different biomolecular processes that govern protein biosynthesis into a “holistic” model, many important features of translation (growth rate dependence of translation rates, sensitivity to codon/tRNA composition, usage of the machinery) are shown to be inherent to the system and can be explained without any further assumptions. Neglecting some parts of the system, such as ternary complex formation, leads to a loss of these characteristic features.I agree that it is important to stress this point but while doing so the author must be careful to not make the impression as if some of the _individual features_ have not been studied before. In particular, in the new text the author writes- “While previous stocahstic models have examined competiton between tRNAs in TC for the A-site in elongating ribosomes [2-4], they have not considered the recharging reactions, specifially GTP/GDP exchange on Ef-Tu and formation of TC” (l 324ff)- “…the model I report on here takes into account effects from the additional competition that occurs between aminoacyl-tRNAs for Ef-Tu:GTP during formation of free TC after recharging of tRNA by aminoacyl-TRNA synthase” (l 422ff)- “… the effects on the translational rate from the competition between aminoacyl-tRNAs for formation of free TC is much more dramatic” (l 426f)It is not true that the great influence of ternary complex formation on the translation rate has not been described before. In contrast, that the competition for EF-Tu during TC formation has a tremendous effect on the overall rate of protein synthesis was recently studied and described in great detail (PMID 31369559). Note that neglecting the GDP/GTP exchange does not cause a fundamental difference between the published and your model, because including the GDP/GTP exchange basically reduces the effective concentration of EF-Tu available for ternary complex formation. The mentioned paper demonstrates that such a reduction (independent of its origin) strongly affects protein synthesis.- “Thus, situations where the relative abundances of tRNAs do not match their demand implied by the codon bias of the transcriptome will result in shifts in the amount of tRNA available in free TC” (l 330ff)- “This observation implies that tRNA abundances must closely match the levels of demand placed on them as a result of the codon bias in the transcriptome for efficient translation to occur” (l 428ff)In the paper mentioned above (PMID 31369559) it was shown in detail that a disbalance between codon usage and tRNA abundance is the driving force behind the sensitive dependence of translation on TC formation (i.e., EF-Tu availability). That this causes “shifts in the amount of tRNA available in free TC” was also already demonstrated previously (PMID 26270805).- “… removing the reactions involved in Ef-Ts mediated GDP/GTP exchange on Ef-Tu, a feature that is often neglected in translational models, resulted in similar average translational speeds for T10a and T10 transcriptomes, despite the difference between the demand for tRNA species implied by the codon usage in T10a and the tRNA abundances present”(l 430ff)Again: That (effectively) increasing the amount of EF-Tu available for TC formation (which is exactly what happens when GDP/GTP exchange is ignored) relieves the sensitivity of the speed of translation for disbalances in the tRNA/codon composition was demonstrated before (PMID 31369559). The observation made by the author is thus in full agreement with the previously published model and the statement is important to make because in most translation models the strong impact of the availability of EF-Tu for TC formation is indeed ignored. But again, the author should mention that he is not the first/only to make this observation and must put his results in the context of previous publications.Side note: This implies that the strong dependency of the results on the chosen transcriptome is “caused” by the chosen value for the EF-Tu concentration. You could check if the differences between T10 and T10a still remain when you increase the EF-Tu by e.g. 10, 20 or 50 % (but keeping the GDP/GTP exchange reaction). At some point the differences should vanish.In this context one could also discuss that EF-Tu has many functions in bacteria in addition to its main role as elongation factor. That means that other cell processes sequester EF-Tu, thus reduce the amount of EF-Tu available for TC formation and increase the need for a well-balanced tRNA/codon composition.Reviewer #3: The author has covered most questions of the reviewer very well. Some optimization potential still exists with respect to the cited literature. But this is a very minor remark.**********Have all data underlying the figures and results presented in the manuscript been provided?Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology
data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.Reviewer #1: YesReviewer #2: YesReviewer #3: Yes**********PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.If you choose “no”, your identity will remain anonymous but your review may still be made public.Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.Reviewer #1: Yes: Tobias von der HaarReviewer #2: NoReviewer #3: No9 Dec 2019Submitted filename: response2.pdfClick here for additional data file.19 Dec 2019Dear Dr Dykeman,We are pleased to inform you that your manuscript 'A stochastic model for simulating ribosome kinetics in vivo' has been provisionally accepted for publication in PLOS Computational Biology.Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Once you have received these formatting requests, please note that your manuscript will not be scheduled for publication until you have made the required changes.In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pcompbiol/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process.One of the goals of PLOS is to make science accessible to educators and the public. PLOS staff issue occasional press releases and make early versions of PLOS Computational Biology articles available to science writers and journalists. PLOS staff also collaborate with Communication and Public Information Offices and would be happy to work with the relevant people at your institution or funding agency. If your institution or funding agency is interested in promoting your findings, please ask them to coordinate their releases with PLOS (contact ploscompbiol@plos.org).Thank you again for supporting Open Access publishing. We look forward to publishing your paper in PLOS Computational Biology.Sincerely,Attila Csikász-NagyAssociate EditorPLOS Computational BiologyWilliam NobleDeputy EditorPLOS Computational BiologyReviewer's Responses to QuestionsComments to the Authors:Please note here if the review is uploaded as an attachment.Reviewer #1: Once again I thank Dr Dykeman for the thorough and engaged response. The arguments put forward in his response make sense, and I note that the rebuttal is essentially framed as the discussion I had suggested of whether the alternative transcriptome is useful (and what for). I would have preferred if more of this discussion had found its way into the actual manuscript. However, given that the other two reviewers did not share my concerns over the use of the alternative transcriptome, and that I think the manuscript as it stands sufficiently alerts readers to the implications of using this transcriptome, I'm happy to not insist further on this issue.Reviewer #2: All of my concerns have been resolved and I have no further comments.**********Have all data underlying the figures and results presented in the manuscript been provided?Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology
data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.Reviewer #1: YesReviewer #2: Yes**********PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.If you choose “no”, your identity will remain anonymous but your review may still be made public.Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.Reviewer #1: Yes: Tobias von der HaarReviewer #2: No22 Jan 2020PCOMPBIOL-D-19-01692R2A stochastic model for simulating ribosome kinetics in vivoDear Dr Dykeman,I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!With kind regards,Laura MallardPLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol
Authors: Gina Cannarozzi; Gina Cannarrozzi; Nicol N Schraudolph; Mahamadou Faty; Peter von Rohr; Markus T Friberg; Alexander C Roth; Pedro Gonnet; Gaston Gonnet; Yves Barral Journal: Cell Date: 2010-04-16 Impact factor: 41.582