Inbal Mizrahi1, Robijn Bruinsma1,2, Joseph Rudnick1. 1. Department of Physics and Astronomy, University of California, Los Angeles, California, United States of America. 2. Department of Chemistry and Biochemistry, University of California, Los Angeles, California, United States of America.
Abstract
The paper presents a statistical-mechanics model for the kinetic selection of viral RNA molecules by packaging signals during the nucleation stage of the assembly of small RNA viruses. The effects of the RNA secondary structure and folding geometry of the packaging signals on the assembly activation energy barrier are encoded by a pair of characteristics: the wrapping number and the maximum ladder distance. Kinetic selection is found to be optimal when assembly takes place under conditions of supersaturation and also when the concentration ratio of capsid protein and viral RNA concentrations equals the stoichiometric ratio of assembled viral particles. As a function of the height of the activation energy barrier, there is a form of order-disorder transition such that for sufficiently low activation energy barriers, kinetic selectivity is erased by entropic effects associated with the number of assembly pathways.
The paper presents a statistical-mechanics model for the kinetic selection of viral RNA molecules by packaging signals during the nucleation stage of the assembly of small RNA viruses. The effects of the RNA secondary structure and folding geometry of the packaging signals on the assembly activation energy barrier are encoded by a pair of characteristics: the wrapping number and the maximum ladder distance. Kinetic selection is found to be optimal when assembly takes place under conditions of supersaturation and also when the concentration ratio of capsid protein and viral RNA concentrations equals the stoichiometric ratio of assembled viral particles. As a function of the height of the activation energy barrier, there is a form of order-disorder transition such that for sufficiently low activation energy barriers, kinetic selectivity is erased by entropic effects associated with the number of assembly pathways.
When the molecular components of a single-stranded (ss) RNA virus assemble and form virions in the cytoplasm of an infected cell, genomic viral RNA molecules (gRNA) compete for packaging with a large pool of—quite similar—host messenger RNA (mRNA) molecules for packaging by the viral capsid proteins [1]. For example, for the case of influenza the number of gRNA molecules inside an infected cell is less than 104 [2] while the total number of host mRNA molecules is in the range of 3.6 × 105.Viral RNA selection relies on so-called packaging signals [3-9]. These are short RNA stem loop motifs that are a part of the secondary structure of gRNA molecules. Importantly, any ssRNA molecule with the molecular weight of a gRNA molecule has numerous hairpins, some of which may be similar or identical to one of the packaging signals of a virus. In order to avoid the packaging of mRNA material individual packaging signals should not trigger virion assembly. Viral gRNA molecules typically have a coordinated pattern of packaging signals that collectively direct the assembly, sometimes called a “ψ sequence”. These virus-specific interactions between packaging signals and capsid proteins operate together with a generic, non-specific electrostatic affinity between the negatively charged RNA nucleotides and positively charged residues of the capsid proteins [10-13].It has been demonstrated that the spontaneous self-assembly of empty icosahedral capsids is initiated by the formation of a nucleation complex composed of a limited number of capsid proteins [14-16]. This nucleation complex can be compared to the critical nucleus of the kinetic theory of nucleation and growth [17, 18]. The energetically uphill formation of the nucleation complex is followed by the energetically down-hill growth (or “elongation”) process that ends in the formation of a closed capsid. The self-assembly of empty capsids does not (and should not) take place under physiological conditions. Under those conditions electrostatic repulsion between the capsid proteins is just able to overcome the hydrophobic affinity between capsid proteins. The physical aspects of RNA packaging have been extensively studied experimentally and theoretically [10, 11, 19–35] as well as by numerical modeling [13, 36–38]. For reviews see refs. [18, 39, 40]. Theoretical models have generally focused on the minimization of the free energy of a fully assembled viral particle. This produced global measures for the “packaging fitness” of an RNA molecule in terms of its length, the degree of branching and compactness, and the effects of electrostatics and osmotic pressure.The pioneering work by Aaron Klug on TMV [41] showed that the action of gRNA on assembly is two-fold. On the one hand, negative charges of the RNA molecules neutralize—on a non-specific basis—positive capsid protein charges. This shifts the overall equilibrium free-energy balance from a dispersed state towards aggregation. On the other hand, the specific packaging signals on gRNA act as catalysts that lower the activation energy barrier of the nucleation complex. In the view of Klug, the packaging signals affect the assembly kinetics while in a thermodynamic view the role of the packaging signals would be to further tilt the free energy balance in favor of packaging. The most well-studied case of RNA selectivity is probably that of the HIV-1 retrovirus (see ref. [42] and references therein). RNA selectivity depends on the cooperative action of a cluster of packaging signals located at the 5’ end of the gRNA molecule, the ψ sequence. It is about a hundred nucleotides long, which is very small compared to the total length of the HIV-1 genome (about 104 nucleotides). gRNA selection appears to take place very early, during the nucleation stage of the assembly process when the ψ sequence interacts with only a small group of capsid proteins. Changing the RNA sequence of other sections of the genome molecules does not appear to affect the selectivity. The gRNA molecules appear to have no thermodynamic advantage over non-viral RNA molecules of the same length [43]. This indicates that the origin of the very efficient gRNA selection mechanism of HIV-1 must be sought in the kinetics of the assembly process. Finally, recent progress in the asymmetric image reconstruction of certain small RNA viruses [44] indicate that also in those cases the RNA selection process takes place early in the assembly process. Asymmetric reconstruction of the MS2 phage virus [45] shows that RNA packaging signals associate reproducibly with a specific section of the interior of the capsid. The authors proposed a model for viral assembly in which a spatial distribution of packaging signals functions as a virus-specific “map” for the initial nucleation stage of the assembly while the subsequent elongation step of the assembly is driven more by non-specific interactions. This scenario appears similar to that of the TMV and HIV-1 viruses. There certainly are also counter examples where free energy minimization accounts for selection. For example, the asymmetric reconstruction of the CCMV and BMV plant viruses produced only a very small amount of reproducible RNA-protein association [46]. Interestingly, this group of viruses is much less selective than MS2. CCMV capsid proteins appear to select BMV genome molecules over CCMV genome molecules while they can package a wide variety of non-viral ss RNA molecules and even non-RNA polyelectrolytes [47, 48]. Apparently, the amount of CCMV gRNA molecules produced inside an infected cell is sufficiently large so there is no need for very precise assembly selectivity.In this paper we propose a simple statistical-physics model to study the physics of selective nucleation by a group of packaging signals that encode the assembly of a small ssRNA virus. By construction, the model focuses exclusively on kinetic selection. In the conclusion we will return to the relation between the thermodynamic and kinetic modes of selection.
Model and methods
Spanning tree model
The starting state of the system is assumed to be a solution containing a certain concentration of condensed, folded viral RNA molecules and of pentameric capsid proteins. The folded molecules have the same interior structure and dimensions and differ only in terms of the ψ section of contiguous packaging signals distributed over the surface of the condensed RNA molecule. The capsid of the virus is assumed to be composed of twelve protein pentamers assembled into a dodecahedron such that double-stranded (ds) RNA sequences line the edges of the pentamers. This is inspired by the family of the Nodaviridae in which part of the ssRNA genome are ds sequences forming a dodecahedral cage [49]. The secondary structure of the ψ section is represented as a tree of nineteen links and twenty nodes connecting the vertices of the dodecahedron. The pentameric capsid itself is the well-studied Zlotnick model system for empty capsids [50-53]). The geometry of the ψ section is assumed to be adapted to the dodecahedral capsid so that the twenty nodes of the secondary structure match up with the twenty vertices of the dodecahedron. Despite these constraints there still are tens of thousands of secondary structures that satisfy these constraints. In the mathematical literature, these structures are known as the spanning tree graphs of a dodecahedron [54]. The number of nodes of a spanning tree is twenty because a spanning tree must visit all the vertices, of which there are twenty. The number of links is nineteen because in any connected tree graph the number of links is one less than the number of nodes. A spanning tree leaves eleven edges of the dodecahedron uncovered. We will assume that these remaining edges have only a generic affinity for the capsid proteins. Fig 1 shows an example of a spanning tree of the dodecahedron.
Fig 1
Left: Branched spanning tree connecting the vertices of a dodecahedron. Six pentamers can be placed on the dodecahedron such that their edges make the maximum of four contacts with links of the spanning tree. Right: Planar representation of the spanning tree.
Left: Branched spanning tree connecting the vertices of a dodecahedron. Six pentamers can be placed on the dodecahedron such that their edges make the maximum of four contacts with links of the spanning tree. Right: Planar representation of the spanning tree.Initially, all pentamers are in solution at a certain total concentration c0. The pentamers are assumed to have a generic affinity (electrostatic in actuality) for all edges of the dodecahedron plus a specific affinity for those edges that are covered by links of the spanning tree (acting as packaging signals). The specific affinity is maximized by placing the pentamer on a location such that four of its edges can associate with a link of the chain. For the tree molecule shown in Fig 1, a total of six pentamers can be placed on such maximum wrapping locations. We will say that the wrapping number of this tree structure is N = 6. The wrapping number is a characteristic of the folding geometry of the RNA molecule.The very simplest spanning tree is a linear chain composed of nineteen links. Fig 2 shows how a linear chain can be distributed over a dodcahedron, while visiting all vertices. As shown in Fig 2, only two pentamers can be placed on locations such that four of its edges associate with a link of the chain. There are 1620 different configurations for a linear chain of twenty nodes to be distributed over of a dodecahedron such that the nodes coincide with the vertices known in the mathematical literature as Hamiltonian paths. Hamiltonian paths have been used before to classify viral RNA configurations [8, 55, 56]). Hamiltonian paths of the dodecahedron can have wrapping numbers two, three, or four. There are important differences between the linear and the branched cases. In the linear case, pentamers have embeddings with two, three or four edges occupied while for the branched case, there are embeddings with zero, one, two, three or four edges occupied.
Fig 2
Example of a spanning tree in the form of a Hamiltonian path of a dodecahedron.
In blue are shown two pentamers that can be placed on the dodecahedron such that their edges make the maximum of four contacts with links of the spanning tree.
Example of a spanning tree in the form of a Hamiltonian path of a dodecahedron.
In blue are shown two pentamers that can be placed on the dodecahedron such that their edges make the maximum of four contacts with links of the spanning tree.Next, the edges of a pentamer will be assumed to have affinity for the edges of other pentamers. It is this affinity that drives the assembly of empty capsids. The wrapping number does not measure how many attractive contacts a newly added pentamer can make with pentamers that were placed earlier on the dodecahedron. The more compact a spanning tree, the larger the probability that two maximally wrapped pentamers also are able to share an edge. Fig 3 illustrates the important role of compactness of the spanning tree. The figure shows the first three steps of the minimum energy assembly pathways of two spanning trees, one with a wrapping numbers N = 5 and the other with N = 6. It is assumed that the edge-edge affinity exceeds the edge-link affinity. In both cases, the first two pentamers can be placed on adjacent sites with maximum wrapping so they have one shared edge. The assembly energy is the same at this point. A difference appears when the third pentamer is placed on the assembly. For the more compact N = 5 molecule, the third pentamer can be placed on a maximum wrapping site where it has two shared edges with the two pentamers already present but this is not possible for the less compact N = 6 tree. It follows that the minimum assembly energy of a three pentamer cluster for the N = 5 molecule is lower than that of the N = 6 molecule. The wrapping number is thus, by itself, insufficient as an index that can predict which spanning trees favor assembly nucleation.
Fig 3
The first three steps of the minimum energy assembly pathways of two molecules, with wrapping numbers N = 6 (top) and N = 5 (bottom) respectively, for the case that the affinity of pentamer edges for each other exceeds the specific affinity for the spanning tree.
The more compact N = 5 tree allows three pentamers on sites with four links with each pentamer making two edge-to-edge contacts. For the less compact N = 6 tree, the third pentamer only makes three contacts with a link.
The first three steps of the minimum energy assembly pathways of two molecules, with wrapping numbers N = 6 (top) and N = 5 (bottom) respectively, for the case that the affinity of pentamer edges for each other exceeds the specific affinity for the spanning tree.
The more compact N = 5 tree allows three pentamers on sites with four links with each pentamer making two edge-to-edge contacts. For the less compact N = 6 tree, the third pentamer only makes three contacts with a link.The Maximum Ladder Distance (MLD) has been used to characterize the degree of compactness of the secondary structure of complete gRNA molecules and as a measure of the size of an RNA molecule in solution [57, 58]. In the mathematical literature, the ladder distance (or LD) between two nodes of a tree graph is defined as the number of links of the graph along a minimum length path separating the two nodes. The MLD of a tree graph is the largest LD of the graph. In the language of graph theory, the LD between two nodes of a tree graph is known as the “distance” between the two nodes and the MLD as the “diameter” of a tree graph [59]. Below we apply the MLD concept only to the ψ section of the complete molecules, not to the RNA molecule as a whole. The MLD of the branched spanning tree that was just discussed is nine while it is nineteen for the linear tree. Unlike the wrapping number, the MLD is a characteristic that is determined entirely by the topology of the planar graph of the secondary structure of the ψ sequence. Unlike the wrapping number, it remains the same for different folding geometries. We will see that the combination of the wrapping and MLD numbers together forms a satisfactory index for the effectiveness of a spanning tree as a nucleation catalyst.
Minimum-energy assemblies
A spanning tree/pentamer structure with n pentamers is assigned an assembly energy
with respect to an RNA molecule without pentamers. Here, n1 is the number of links of the spanning tree that lie along a pentamer edge that is not shared with another pentamer, n2 is the number of spanning tree links that lie along a pentamer edge that is shared with another pentamer while n3 is the number of edges shared between two pentamers that are not associated with a link of the spanning tree. The energy scale E0 is the binding energy between two pentamer edges in the absence of specific RNA-pentamer affinity. Energies will be expressed in units of E0. The physical meaning of the dimensionless parameter −ϵ1 is that of the ratio of the affinity of a spanning tree link for a pentamer edge over E0. Interactions between edges and spanning tree links will be assumed to be additive. In that case the dimensionless ϵ2 parameter equals ϵ2 = −1 + 2ϵ1 since the RNA link interacts with two pentamer edges. Finally, μ0 is the chemical potential of pentamers in solution at a certain reference concentration. The reference chemical potential includes the non-specific affinity of a pentamer for the RNA condensate. The reference chemical potential will be chosen so that ΔE(12) is close to zero, so when the chemical potential in solution at the reference concentration is the same as that of a pentamer that is part of an assembled capsid. This is the case if ΔE(12) = 19ϵ2 − 11 − 12μ0 is zero. Note that ΔE(12) is the same for all spanning trees so different spanning trees have the same assembly energy (we will see later that the assembly free energy is not the same for all trees).Two examples of minimum-energy assembly profiles near assembly equilibrium with c0 = 1 are shown in Fig 4 The left figure shows the case of an N = 8, MLD = 9 spanning tree. Recall that such a spanning tree is maximally adjusted for pentamer binding. The right figure shows the case of an N = 2, MLD = 19 spanning tree, which has minimal adjustment for pentamers. The activation energy is about two E0 higher in the second case. The assembly energy profiles of spanning trees with the same N and MLD are nearly always the same.
Fig 4
Minimum-energy assembly energy profiles for N = 8, MLD = 9 spanning trees (left) and for an N = 2, MLD = 19 spanning tree (right).
Energy parameters are ϵ1 = −0.5 and μ0 = −4.0 (close to assembly equilibrium where μ0 ≃ − 4.083. Energies are expressed in units of the overall scale E0. The assembly energy profiles of spanning trees with the same N and MLD are nearly always the same. The pathways for the small number of exceptions are shown in the figure.
Minimum-energy assembly energy profiles for N = 8, MLD = 9 spanning trees (left) and for an N = 2, MLD = 19 spanning tree (right).
Energy parameters are ϵ1 = −0.5 and μ0 = −4.0 (close to assembly equilibrium where μ0 ≃ − 4.083. Energies are expressed in units of the overall scale E0. The assembly energy profiles of spanning trees with the same N and MLD are nearly always the same. The pathways for the small number of exceptions are shown in the figure.These assembly energy profiles are consistent with a nucleation-and-growth scenario close to the equilibrium assembly threshold. As expected, the activation energy barrier of the N = 8, MLD = 9 spanning tree (about 3.0E0) is lower than that of the N = 2, MLD = 19 spanning tree (about 5.0E0). The long straight section of the N = 8, MLD = 9 energy profile can be understood by noting that when a pentamer is added to one of the eight maximum wrapping sites then that lowers the energy by 4ϵ1 in units of E0. If additional pentamers always make two new contacts with pentamers that are already present—as is indeed the case here—then each added pentamer lowers the energy further by an amount of 2 E0 minus the chemical potential μ0. In this particular case, these two terms cancel so the assembly energy barrier has a wide and flat top. Starting from a linear chain with MLD = 19 and N = 2 and then stepwise decreasing the MLD and increasing the N one finds that the assembly energy activation barrier nearly always systematically decreases. Assembly on a spanning tree with the minimum MLD and the maximal wrapping number N has in general the lowest possible assembly activation energy barrier.An illustration of the assembly sequence of an N = 8, MLD = 9 spanning tree spanning tree is shown in Fig 5 for the case that −ϵ1 is less than 0.5. The first eight pentamers all can be placed on sites that maximize the number of spanning tree link contacts (four) as well as the number of attractive pentamer-pentamer contacts. The second pentamer creates one new pentamer-pentamer contact while the next three pentamers create two new pentamer-pentamer contacts. All pentamer assembly intermediates are compact structures. In summary, the combination of the wrapping number and the MLD appears to be a good, though not perfect, code for the height of the assembly energy activation barrier.
Fig 5
A spanning tree with a maximum wrapping number of eight and the minimum MLD of nine.
The first five pentamers can be placed on sites that maximize both the number of available pentamer-pentamer contacts and pentamer-spanning tree link contacts (four). The sixth pentamer, shown separately with a different perspective, makes three pentamer-pentamer contacts but can only make two spanning tree link contacts.
A spanning tree with a maximum wrapping number of eight and the minimum MLD of nine.
The first five pentamers can be placed on sites that maximize both the number of available pentamer-pentamer contacts and pentamer-spanning tree link contacts (four). The sixth pentamer, shown separately with a different perspective, makes three pentamer-pentamer contacts but can only make two spanning tree link contacts.Next, we explored the configuration space of assembly intermediates. Fig 6 shows graphs of all distinct assembly intermediates for molecule (1) and (2) as well as the minimum energy assembly pathways that link them. Each node of the network stands here for a physically distinct assembly intermediate (assemblies that are related by a symmetry operation of the dodecahedron are treated here as the same). Nodes are assigned “coordinates” (n, i) with n = 0, 1, …., 12 the number of pentamers of the intermediate and with i = 1, 2, ……, m an index ranging over the distinct n-pentamer states where m is the multiplicity of the n-pentamer state (e.g., m5 = 4 for the N = 8, MLD = 9 spanning tree). A black line linking two dots indicates that the two states can be interconverted by addition or removal of a pentamer. Assembly of viral particles can be viewed as a net “current” flow from the n = 0 source state to the n = 12 final state along all possible paths across the network linking the initial state to the final state. Under conditions of thermodynamic equilibrium, the current across each individual link should be zero according to the principle of detailed balance. Note that the compact, branched spanning tree molecule (1) (left) has far fewer assembly intermediates and assembly pathways than the linear structure molecule (2) (right).
Fig 6
Examples of minimum energy assembly paths of an N = 8, MLD = 9 spanning tree (left) and an N = 2, MLD = 19 spanning tree (right).
A node (indicated by a dot) indicates a physically distinct intermediate structure with, from left to right, n = 0, 1, …., 12 pentamers. The number m of vertical dots for given n is the multiplicity, i.e., the number of distinct n-pentamer intermediates. A link connecting two nodes indicates that the two states are related by addition or removal of a pentamer. Every possible path from n = 0 to n = 12, including back steps, represents a possible minimum energy assembly pathway. During assembly there is a net current from the n = 0 initial state to the n = 12 final state while in thermal equilibrium the net current across every link is zero.
Examples of minimum energy assembly paths of an N = 8, MLD = 9 spanning tree (left) and an N = 2, MLD = 19 spanning tree (right).
A node (indicated by a dot) indicates a physically distinct intermediate structure with, from left to right, n = 0, 1, …., 12 pentamers. The number m of vertical dots for given n is the multiplicity, i.e., the number of distinct n-pentamer intermediates. A link connecting two nodes indicates that the two states are related by addition or removal of a pentamer. Every possible path from n = 0 to n = 12, including back steps, represents a possible minimum energy assembly pathway. During assembly there is a net current from the n = 0 initial state to the n = 12 final state while in thermal equilibrium the net current across every link is zero.An important simplification ensues when we ignore the very small number of assembly energy profiles that do not conform to the quasi-universal profile for given N and MLD discussed above. If we do that then different nodes of the network with the same n all have have the same assembly energy ΔE(n). This simplification allows us assign to each node a Boltzmann equilibrium probability:
which depends on the concentration c of pentamers free in solution (expressed in units of the reference concentration). The proportionality factor in the expression is determined by the normalization condition . We will make this simplification in the following sections.
Master equation
In this section we define the Master Equation that governs the kinetics. We will use the coordinate system for the network graphs defined below Fig 6. The network geometry of a particular spanning tree is specified in the form of an adjacency matrix
that equals one if a link connects node n, i to node n + 1, j while it equals zero if there is no link. Each node n, i of the graphs has a time-dependent occupation probability P(t). The kinetics is expressed as a set of coupled rate equations for the and is assumed to be a Markov process with probabilities evolving in time according to the Master Equation [60]:
Here, W is the on-rate for the transition of an assembly of n pentamers to one with size n+ 1 by the addition of a pentamer while W is the off-rate at which a pentamer is removed from an assembly of size n. We assume a simplified diffusion-limited chemical kinetics (see [61] and Supplementary Information S1 Text) in which the addition or removal of a pentamer to an assembly of size n is treated as a bimolecular reaction. The resulting on-rate has the form of a kinetic Monte-Carlo algorithm:
with c again the concentration of free pentamers, ΔΔE = ΔE(n + 1) − ΔE(n) the energy cost of adding a pentamer, and λ a base rate that depends on molecular quantities such as diffusion coefficients and reaction radii but not on the concentrations. The inverse of λ is the fundamental time-scale of the kinetics. In the following, time will be expressed in units of 1/λ. If ΔΔE is negative then the on-rate is equal to this base rate. If ΔΔE is positive then the base rate is reduced by the Arrhenius factor . The off-rate entries W are determined by the on-rates through the condition of detailed balance:Below, we will limit ourselves to the case of solutions containing only two species of spanning trees, namely molecules (1) and (2), having the same solution concentrations. During assembly, the two species compete for the same concentration c of pentamers. The ratio of c over the total pentamer concentration c0 is determined by mass conservation:
where superscripts denote the kind of spanning tree. Next, D ≡ 12r/c0, with r the total RNA concentration, is the RNA to protein mixing ratio, an important thermodynamic control parameter for the assembly process. If D = 1 then there are exactly enough pentamers to encapsidate both types spinning trees, which corresponds to the stoichiometric ratio. Since the occupation probabilities that we seek to obtain enter in this relation, the rate equations form a coupled, non-linear set of two times twelve differential equations (for the case of just a single species in solution, there are twelve differential equations where one must replace D/24 by D/12).
Numerical construction and implementation of coupled master equations with competition
Coupled master equations for packaging competition between pairs of spanning trees were integrated using a Mathematica program (available on request). The program was organized as follows.
First step: Construction of spanning trees
First, we populate in all possible ways nineteen of the thirty one edges of the dodecahedron with links. Next we identify graphs that have the properties that (i) the graph is connected and that (ii) the graph has twenty vertices. All graphs having these two properties are spanning trees because (i) any connected graph with n edges and n + 1 vertices is a tree and (ii) they are spanning trees because a dodecahedron has twenty vertices. This method generates many duplicates so the next step is winnowing the trees down to a collection of trees that cannot be mapped into each other by any of the 120 rotations and reflections that map a dodecahedron into itself. This is a straightforward (though laborious) process that involves choosing a tree and then eliminating all other trees that can be mapped into it by a reflection or a rotation. One can speed the process up by separating trees into subsets having the same MLD, as the MLD is a topological property that is preserved under rotation and reflection.
Second step: Choice of a pair of spanning trees
Two trees are chosen from the library of all unique spanning trees of the dodecahedron. The trees are indexed by their MLD and N numbers.
Third step: Specification of the assembly energies
Values are assigned to the energy parameters E0, ϵ1 and μ0, as defined above.
Fourth: Determination of the assembly network
All physically distinct minimum energy assemblies are determined forming the nodes of the assembly graph. For every node, a list is made of nodes with one more or or one less pentamer from which the adjacency matrix is determined.
Fifth step: Construction of the master equation
The assembly/disassembly process is assumed to occur by the addition of a pentamer to, or removal of a pentamer from, a structure. The steps of addition or removal are controlled by two considerations. The first is the increase or decrease in the energy of the system as determined by the energy of the partially or completely assembled capsid and an assigned chemical potential. The second is the availability of pentamers, quantified by the concentration in solution of available pentamers. The free energy increment determines the relative likelihoods of addition or removal of a pentamer, via the kinetic Monte-Carlo rates that were defined above. The concentration of available pentamers controls the rate of the assembly/disassembly process, in that disassembly takes place at a fixed rate and assembly at a rate proportional to the number of available pentamers.
Time-scales
As a preliminary, Figs 7 and 8 show the packaging kinetics of a single species of RNA molecules. The parameters are E0 = 4kT, ϵ1 = −0.5, c0 = 1 and D = 0.5 and μ0 = −4. Fig 7 shows the case of the class of MLD = 9 and N = 8 spanning trees and Fig 8 that of the class of MLD = 19, N = 2 spanning trees. In the late time limit, the two classes approach thermal equilibrium with roughly the same fraction of RNA molecules being packaged (about sixty six percent). This reflects the fact that all classes of molecules have—by construction—the same assembly energy (the remaining small difference is due to the fact the entropy in the assembly free energy is not the same for the two classes). The reason that in both cases a significant fraction of RNA molecules has not been packaged, despite the fact that there are twice as many pentamers as needed to package the RNA molecules, is that the chemical potential is close to assembly equilibrium so a significant number of RNA molecules remain free of pentamers. There is a large difference between the thermalization times of two classes, roughly 103 time units for MLD = 9 and N = 8 spanning trees and 105 time units for MLD = 19, N = 2 spanning trees, consistent with the fact that the assembly activation barrier is about two times E0 larger (i.e., about 8kT) for the MLD = 19, N = 2 spanning trees. These thermalization times must be compared with the assembly delay time
t, which is defined as the time lag between the establishment of solution assembly conditions and the first appearance of assembled viral particles. Measured delay times for the assembly of empty capsids are in the range of seconds to minutes [14-16]. We obtain t from the intersection of the tangent to P12(t) at the point of maximum slope with the horizontal axis (see Fig 9).
Fig 7
Packaging kinetics of MLD = 9 and N = 8 spanning trees.
Parameter values are E0 = 4kT for the energy scale, ϵ1 = −0.5, c0 = 1, D = 0.5 and μ0 = −4. Bottom: Packaging kinetics of MLD = 19, N = 2 spanning trees with the same parameters.
Fig 8
Packaging kinetics of MLD = 19, N = 2 spanning trees with the same parameters as those of the previous figure.
Fig 9
Definition of the delay time as the intersection of the maximum slope tangent to the assembly curve P12(t) with the time axis, as indicated by the arrow.
MLD = 9, N = 8, ϵ = 0.5, c0 = 1, D = 0.5 and μ0 = −4.
Packaging kinetics of MLD = 9 and N = 8 spanning trees.
Parameter values are E0 = 4kT for the energy scale, ϵ1 = −0.5, c0 = 1, D = 0.5 and μ0 = −4. Bottom: Packaging kinetics of MLD = 19, N = 2 spanning trees with the same parameters.
Definition of the delay time as the intersection of the maximum slope tangent to the assembly curve P12(t) with the time axis, as indicated by the arrow.
MLD = 9, N = 8, ϵ = 0.5, c0 = 1, D = 0.5 and μ0 = −4.For the case of the MLD = 9 and N = 8 class of spanning trees, this gives about 8.5 time units as shown. Other classes have comparable delay times. It follows that our time unit is roughly in the range of five seconds. The thermalization time under conditions of assembly equilibrium would then be in the range of a prohibitively long two hundred hours for MLD = 9, N = 8 spanning trees and two orders of magnitude longer for the MLD = 9, N = 8 spanning trees.
Packaging competition
In Figs 10, 11 and 12 we show the result of a calculation with the same total amount of RNA molecules and pentamers as before but now with half of the RNA molecules MLD = 9 and N = 8 spanning trees and the other half MLD = 19, N = 2 spanning trees.
Fig 10
Packaging competition between MLD = 9, N = 8 spanning trees and MLD = 19, N = 2 spanning trees with the same parameters as the previous figure on a time scale of 108 units.
Fig 11
Same as the previous figure but on a time scale of 107 units.
Fig 12
Same as the previous figure but on a time scale of 104 units.
The MLD = 9, N = 8 spanning trees dominate packaging on time scales less than about 107 time units while the characteristic assembly time is in the range of 104 time units. About eighty percent of these spanning trees are packaged. This packaged fraction then slowly decreases on time scales of the order of 107−108 time unit, which means that packaged MLD = 9, N = 8 spanning trees are gradually disassembling as a precursor of thermalization. The fraction of packaged MLD = 19, N = 2 spanning trees increases correspondingly. Apparently, pentamers that are being freed up by disassembly of MLD = 9, N = 8 spanning trees feed assembly of the MLD = 19, N = 2 spanning trees. When the system approaches thermal equilibrium, the packaging fractions of the two classes in the long-time limit are nearly the same and close to the separate equilibrium values found earlier in the absence of competition. We conclude that in a packaging competition experiment, the disassembly of particles is a crucial step for reaching thermodynamic equilibrium.
Order-disorder transitions
We now can explore how this kinetic form of RNA selection is influenced by changes in the control parameter. One reason that is necessary for us to do that is that the assembly time-scale of the MLD = 9, N = 8 spanning trees was in the range of 104 time units. This is much too long, of the order of five days if one uses the earlier estimate that a time unit is about five seconds. Since assembly time kinetics depends exponentially on E0, the assembly time can be reduced by reducing the energy scale E0 but what happens with the selectivity if one reduces E0? We will quantify the kinetic selectivity of a packaging competition between class (1) spanning trees that have a small MLD and a large wrapping number and class (2) spanning trees with large MLD and small wrapping number asHere, t is the time at which class (1) spanning trees achieve their maximum packaging yield. If class (2) molecules outcompete class (1) then we set S(E0) = 0. The dependence of the kinetic selectivity on the energy scale on E0 is shown in Fig 13. The parameter values are the same as those of Fig 10. For E0 less than about 1.2kT there is no kinetic selectivity left while S(E0) approaches one for E0 ≃ 4.0 or larger. The disappearance of selectivity for small E0 is due to the fact that both the number of assembly pathways and the number of nodes for assembly intermediates of the MLD = 19, N = 2 trees is two orders of magnitude larger than that of MLD = 9, N = 8 spanning trees (see Fig 6). This means that for MLD = 19, N = 2 trees the entropic component of the free energy activation barrier causes that barrier to be reduced by a factor of about four to five kT. Since the activation energy barrier is about 2E0 higher for the MLD = 19, N = 2 trees, we indeed should expect kinetic selectivity to become negligible for E0 less than roughly 2kT, in agreement with Fig 13. The importance of entropy for smaller E0 has another aspect. For E0 less than about 1.5kT there is a significant fraction of incomplete assemblies since that also increases the entropy of the system. In the language of statistical physics, the kinetic selectivity resembles the order parameter of an order-disorder phase-transition with the energy scale E0 acting as the inverse of the effective temperature.
Fig 13
Kinetic selectivity S(E0) as a function of the energy scale E0 in units of kT for packaging competition between MLD = 9, N = 8 spanning trees and MLD = 19, N = 2 spanning trees.
The other parameter values are the same as those of Fig 10.
Kinetic selectivity S(E0) as a function of the energy scale E0 in units of kT for packaging competition between MLD = 9, N = 8 spanning trees and MLD = 19, N = 2 spanning trees.
The other parameter values are the same as those of Fig 10.One encounters a somewhat similar transition for larger E0 when one increases the affinity ratio −ϵ1 between specific pentamer/RNA affinity and pentamer/pentamer affinity. One might expect that that should improve selectivity but, in actuality, for −ϵ1 = −1.1 a variety of incomplete particles appear packaging MLD = 9, N = 8 spanning trees. These incomplete particles are in coexistence with fully packaged particles containing MLD = 19, N = 2 trees. The reason is shown in Fig 14. The minimum energy state at assembly equilibrium of MLD = 9, N = 8 spanning trees are incomplete particles with ten pentamers while for MLD = 19, N = 2 trees fully assembled particles are the minimum energy state. For −ϵ1 smaller than 1.0, the minimum energy state of MLD = 9, N = 8 spanning trees are fully packaged particles. However metastable states with n less than twelve start to appear roughly above −ϵ1 = 0.6. Metastable intermediate states are quite familiar from experimental studies of viral assembly [4, 62, 63] and from numerical simulations [13, 36–38]. Known as “kinetic traps”, they retard assembly. In summary, It is clear that increasing −ϵ1 beyond about 0.5 also does not improve selectivity.
Fig 14
Minimum-energy assembly profiles for N = 8, MLD = 9 spanning trees (left) and for N = 2, MLD = 19 spanning trees (right) for −ϵ1 = −1.1.
Supersaturation
A different approach to reduce the assembly time scale is to increase the level of supersaturation. Recall in this context that assembly experiments under in-vitro conditions take place at relatively high levels of supersaturation. The supersaturation can be increased by increasing the total pentamer concentration c0, which increases the chemical potential by kT ln c0. In Fig 15 the parameters are the same as in Fig 10 except that the pentamer concentration has been increased from c0 = 1 to c0 = 4. The results look encouraging. First, nearly all MLD = 9, N = 8 spanning trees are packaged. Second, the assembly time scale is reduced to about 103 time units, or about one hour. Increasing the supersaturation further reduces assembly time scales. However, the kinetic selectivity also has been reduced: the concentration of N = 2, MLD = 19 assemblies also rises much more rapidly than for c0 = 1 under assembly equilibrium conditions. The two classes of molecules have comparable packaging probabilities already after about 104 time units.
Fig 15
Packaging competition for c0 = 4.0.
The other parameters are the same as for Fig 10.
Packaging competition for c0 = 4.0.
The other parameters are the same as for Fig 10.Is it possible to maintain a high selectivity under conditions of supersaturation that persists over very long time scales? So far we kept the mixing ratio at D = 0.5, meaning that the number of pentamers is double of what necessary to package all MLD = 9, N = 8 and all N = 2, MLD = 19 spanning trees. What would happen if, under conditions of supersaturation, the mixing ratio would be increased to D = 2? In that case, the early assembly of MLD = 9, N = 8 spanning trees might “drain” the solution of pentamers, which would delay the assembly of N = 2, MLD = 19 spanning trees. Fig 16 shows what happens. The kinetic selectivity is about 99 percent and it is maintained over more than 2 × 105 time units! It comes at a price however: the fraction of packaged target molecules is reduced to about 80 percent while the assembly time-scale has mildly increased. In general, by varying the overall pentamer concentration c0 and the mixing ratio D a wide range of requirements can be satisfied in terms of persistent kinetic selectivity, overall yield, and assembly time-scale.
Fig 16
Packaging competition for c0 = 4.0 and D = 2.
The other parameters are the same as for Fig 10.
Packaging competition for c0 = 4.0 and D = 2.
The other parameters are the same as for Fig 10.
Parameter values
Are the energy and concentration parameters settings used in this article reasonable for in-vitro or in-vivo viral assembly experiments? The overall energy scale E0 was defined as the binding affinity between two pentamers that share an edge for the case that the RNA molecule has only generic affinity (i.e. ϵ1 = 0). The assembly energy per capsid protein of empty capsids has been been measured under conditions of thermodynamic equilibrium [64]. Comparing such data to the model with ϵ1 = 0 gives E0 ≃ 4.73 in units of kT [50], close to the value E0 = 4 used in the paper. The other important energy scale is the dimensionless parameter −ϵ1, which is the ratio between the protein/RNA and the protein-protein affinity. It can be estimated by comparing the capsid protein concentrations at assembly onset for empty capsids and for virions. MS2 capsid proteins aggregate in RNA-free physiological solutions for concentrations above 2.0 mg/ml while in the presence of viral RNA (but not generic RNA) viral particles form already at 0.05 mg/ml [65]. The assembly free energy of a viral particle E0(−30 + 38ϵ1 − 12μ0) at the reference concentration was set to zero by definition, in which case the empty capsid assembly energy is −30 − 12μ0 is positive. An increase in the total pentamer concentration by a factor of 2.0/0.05 = 40 must be able to raise the chemical potential sufficiently so the assembly energy for an empty capsid becomes zero. An increase of the chemical potential equals ln40 in units of kT or about 0.92 in units of E0 = 4kT. This condition is satisfied if −38ϵ1 = 12 × 0.92 or −ϵ1 ≃ 0.29. For convenience we used ϵ1 = −0.5 in the calculations.
Conclusion
In conclusion, we have presented a statistical-mechanics model for the selection of viral ssRNA molecules triggered by packaging signals during the assembly of small RNA viruses. According to the model, there are two important time scales: the characteristic time scale for assembly and the characteristic time scale for thermal equilibration. In the model, RNA selectivity is a non-equilibrium, kinetic effect that disappears when the system approaches thermodynamic equilibrium. Particle disassembly under the action of thermal fluctuations is an essential step for thermal equilibration during packaging competition. Kinetic selection “works” as long as the time scale for spontaneous disassembly is prohibitively long compared to the measurement time. Kinetic selection is the result of the dependence of the height of the activation energy barrier on the RNA folding geometry, as encoded by the wrapping number N and the maximum ladder distance MLD. There is an order-disorder type transition as a function of the strength of the affinity between the capsid proteins where fully packaged particles are replaced by a polydisperse solution of incomplete particles. Near the transition, RNA molecules that optimally encode secondary structure and folding geometry—by having minimal MLD and maximal N—begin to outcompete other RNA molecules that have a larger number of assembly pathways. A similar order-disorder transition is encountered when the strength of the protein-RNA interaction is increased with respect to that of the protein-protein interactions.The most striking result is the fact that kinetic selection performs better under increasing levels of supersaturation. There are in fact numerous examples in molecular biology where the fidelity of the read-out of a code is enhanced under non-equilibrium conditions, known as kinetic proofreading [66]. DNA replication is an important example [67, 68]. There are similarities between the kinetic selection under supersaturation discussed in this paper and Hopfield kinetic proof reading. Kinetic selection works when the formation of the encoded system (MLD = 9, N = 8 spanning tree particles) is quasi-irreversible while attempts of forming particles with molecules that are not encoded (MLD = 19, N = 2 spanning tree particles) lead to disassembly. This is the case because of the higher assembly activation energy. An unusual feature is that by tuning the mixing ratio to be close to the stoichiometric ratio for the early assembling encoded particles can “monopolize” the pentamers and thereby greatly retard the formation of improper particles.An important question about the model concerns the relation between the kinetic selectivity discussed in this paper and selection in terms of the thermodynamic assembly free energy as discussed in the literature cited in the introduction. The relation between the two forms of selection lies in the distinction between the nucleation and elongation stages. An ssRNA molecule may well have packaging signals that produce an unusually low assembly energy barrier causing it to be selected during the nucleation stage. If, however, the molecular weight of the molecule is too high and/or if the solution radius of gyration is too large then the elongation process simply would not be able to complete. It is this elongation part of the assembly that was captured by the earlier studies of the thermodynamic assembly free energy and that was not included in the present study. Following ref. [45], we are assuming here that the selective effects of the packaging signals operate nearly exclusively during the nucleation stage and are weak during the energetically downhill elongation stage which is dominated by the non-specific interactions. The model presented in this paper can be generalized to investigate the competition between selection by nucleation and selection by elongation. The simplest step would be by letting the competing molecules have different reference chemical potentials μ0. This allows for the possibility that the two RNA condensates have a different molecular mass and/or a different overall MLDs (recall that the MLD in this paper only refers to the ψ sequence). An additional refinement would be to allow μ0 to depend on n. As an assembly intermediate grows, an RNA molecule that is only partially condensed will be progressively confined by the developing capsid. This reduces the RNA conformational entropy, which reduces the free energy gain obtained when a pentamer is taken from the solution and added to an incomplete assembly. Finally, the loss of conformational entropy of individual RNA nucleotides as they are getting packaged can be included in the model by associating a fixed amount of entropy with every RNA segment that has not yet associated with a pentamer. This amounts to a renormalization of the ϵ1 and ϵ2 parameters.A second concern with the model is that it is generally assumed that virions are in a state of full thermodynamic equilibrium. If that were truly the case then equilibrium thermodynamics rather than kinetics would always be the appropriate description mode. We would argue against the assumption of full thermodynamic equilibrium for virions. Reaching complete thermodynamic equilibrium in a packaging competition experiment necessarily involves disassembly. The time-dependent assembly yield of early-assembly kinetically encoded molecules needs to decrease in order for thermal equilibrium to be established. Experimentally, the time scale for spontaneous disassembly of virions under the action of thermal fluctuations must be extremely large compared to observation times. We know this because under in-vivo conditions virions typically are released after assembly into environments with few or no capsid proteins. Under those conditions, the chemical potential of the capsid proteins of a virion would be large compared to that of capsid proteins free in solution, which would trigger disassembly. In actuality, spontaneous dissolution of virions in solutions without capsid proteins is not observed under physiological conditions. The time scale for spontaneous disassembly of virions must be large compared to both laboratory time scales and the characteristic time scales of the life cycle of a virus. Maturation processes [63, 69–77] probably play an important role in suppressing spontaneous disassembly process. Maturation stabilization of complete assemblies would further justify a focus on selection during nucleation, as opposed to selection by late-time disassembly/assembly competition necessary for thermalization.A third concern is that the model presented in this paper is not a realistic representation of any particular virus. It was constructed, for mathematical convenience, by borrowing features of the dodecahedral Zlotnick model for empty capsids, the dodecahedral gRNA spatial distribution of the Nodaviridae, and the asymmetric reconstruction of MS2. The wrapping number concept as a geometric characteristic of the geometry of the RNA outer surface really is appropriate only for the special case of Nodaviridae. Different geometrical characteristics will have to be developed for other viruses. The MS2 virus is an interesting target since a detailed asymmetric reconstruction of the MS2 virion is available [44]. Since the packaging signals of MS2 gRNA associate directly with capsid proteins—rather than with capsomer edges—the spanning tree would have to have icosahedral rather than dodecahedral symmetry. Moreover, MS2 has (approximate) T = 3 capsid symmetry with 180 capsid proteins rather than the T = 1 structure with 60 proteins grouped in 12 pentamers that we assumed so that would have to be included as well. We hope to carry out such a study in the future.A final concern about the model is the study by Tresset and collaborators of the assembly of the CCMV plant virus [78, 79]. They find that the so-called en-masse assembly scenario provides a good description [80]. In that scenario, a virion does not assemble on a protein-by-protein basis, as was assumed in the present paper. Instead, assembly starts with the formation of a disordered RNA-protein condensate that shrinks and then transits into an ordered virion. It is tempting to identify CCMV-type assembly (and the en-masse assembly scenario) with the entropy-dominated low selectivity assembly scenario that we encountered for lower E0 and for larger values of the RNA/pentamer affinity. This also will have to be explored in the future.How could the model (or one of its generalizations) be tested experimentally? It has been shown that large ssRNA molecules in solution with identical primary sequences adopt a range of secondary and tertiary structures [81]. In the presence of a sufficient concentration of positive polyvalent counter ions, large ssRNA molecules have been shown to condense [82]. For a solution of gRNA molecules, that should produce a variety of pre-condensed molecules with roughly the same size but with different surface structures. Asymmetric reconstruction of the packaged particles could then reveal which RNA structures were selected for.
Smoluchowski theory of bimolecular reactions.
(PDF)Click here for additional data file.6 Sep 2021Dear Dr. Bruinsma,Thank you very much for submitting your manuscript "Packaging Contests between Viral RNA Molecules." for consideration at PLOS Computational Biology.As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.Though the work is promising, I cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.When you are ready to resubmit, please upload the following:[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).Important additional instructions are given below your reviewer comments.Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.Sincerely,Samuel Coulbourn Flores, Ph.D.Guest EditorPLOS Computational BiologyArne ElofssonDeputy EditorPLOS Computational Biology***********************Reviewer's Responses to QuestionsComments to the Authors:Please note here if the review is uploaded as an attachment.Reviewer #1: The authors develop a kinetic model for the assembly of small RNA viruses based on the nucleation theory. As inside a host cell, there is a significant amount of non-viral RNA and other anionic polyelectrolytes, it is not clear how the capsid proteins select their native RNA. The main focus of the paper is on understanding how genomic RNAs outcompete other non-viral RNAs. The authors introduce the concept of the wrapping number, a geometrical measure of the ability of genome to bind capsid proteins. The authors discuss that the outcome of the competition can be explained in terms of the combined effect of the wrapping number and the maximum ladder distance (MLD), a topological measure of the degree of branching of the RNA secondary structure.The paper is relatively well-written. I think that the subject of the paper is timely considering the current pandemic and the results are exciting. Thus I recommend the paper for the publication. However, there are several items that the authors need to clarify before publishing the paper.1. The authors explain the assembly of capsid proteins around ssRNA based on the heterogenous nucleation. However, they discuss that “the spatial distribution of packaging signal is likely to play an important role.” If the process is based on the nucleation and growth, then why is the distribution of packaging signals important? After the capsid is nucleated in one of the packaging signals, does it still matter that there are other packaging signals if the process is based on the nucleation and growth? One might think about the process of multi-nucleation on different segments of the RNA, but it seems that the authors do not agree with the en mass assembly or multi nucleation growth. For the en mass mechanism and the multi-nucleation process see ACS Nano 14, 3170-3180 (2020).2. The authors ignore the importance of the length of genome, which is often longer than mRNA. Recently, it was shown that the capsids assembled around shorter RNAs are unstable, see Small 2020, 16, 2004475. See also Physics Reports 847, 1-102 (2020) and many references therein.3. If there are a few long branches in an RNA, it is not obvious if MLD is still a good “predictor” of RNA branchiness. Have you seen the work of van der Schoot and collaborators? For example, they show how the free energy or osmotic pressure due to the presence of RNA changes with the degree of RNA branching, see for example Phys. Rev. E 94, 022408, (2016). Since the base-pairing increases the stiffness of RNA segments, sometimes the linear polymers win over the branched ones to attract proteins (J. Phys.: Condens. Matter 30, 044002 (2018)). Also see The Journal of Physical Chemistry B, 119, 13991-14002 (2015) how branchedness and length contribute to the competition experiments.4. Can the authors explain why they believe that the RNA selection mechanism is purely kinetic and disappears after the system has reached full thermodynamic equilibrium. Several papers show that the encapsidation of branched polymers is energetically more advantageous than a linear one, see Elife 2, e00632 (2013) in addition to the paper mentioned in item 3.5. The authors find that the en-masse scenario is not a viable route for selective virion assembly. Have they seen the experiments of Tresset and collaborators, which show that for CCMV, en mass mechanism is a viable one (Nat. Commun. 9, 3071 (2018)), also see the simulations corresponding to his experiments in ACS Nano 14, 3170-3180 (2020).6. The authors state, “Such “specific” interactions operate against a background of a generic electrostatic affinity between the negatively charged RNA nucleotides and positively charged residues of the capsid proteins.” Why does it work against? Doesn’t specific interaction facilitate the packaging, thus helping the electrostatic interactions?7. There are a few redundant sentenceds and typos in the paper. I am sure that the authors will proofread it again.Reviewer #2: Mizrahi et al. examine the selective packaging of RNAs into viral capsids. Their model assumes capsids are built from 12 pentameric subunits and the RNA interacts along edges of the pentamers. They then examine packaging competitions between different shaped RNAs which can be characterized by their maximum ladder distance.The work in this manuscript is similar to that done by Dykeman and Twarock on the assembly competition between viral genomic RNAs and host cellular mRNAs that would occur in vivo, but contains a couple of key differences which make it new.1) The authors model treats the interaction between RNA and capsid as occuring between pentameric edges. This type of model would be applicable to insect viruses in the Nodaviridae family which construct a dodecahedral cage of ds-RNA. An assembly model looking at this type of interaction (along with a combinatorics enumeration of the RNA configurations) has not been constructed yet.2) The model attempts to account for some of the effects of the geometry of the RNA secondary structure and its effects on RNA assembly efficiency which is lacking in previous models (e.g. Dykeman et al.).Some comments on the science.MAJOR1) Enumeration/Combinatorics of the spanning trees. There is limited discussion on the overall numbers of spanning trees and the computational method how they were counted. The manuscript also leaves out an important discussion on the number of ways to embed a spanning tree. For example. The linear spanning tree case corresponds to the previous treatment of RNA paths in Dykeman et al. For this one case, there are ~ 64 ways to embed the linear spanning tree to obtain a Hamiltonian path. Similarly, there will be an equivalent treatment for the other spanning tree shapes which will each have a potentially different number of ways to embed the spanning tree into the dodecahedron protein shell resulting in a Hamiltonian path. The Authors use the minimum energy embedding, where the pentamers are brought onto the RNA in an order which minimizes the energy at each step. This seems reasonable, but could have some issues if there is more then one path which does this. This would give an entropic term that needs to be accounted for where spanning trees with more path options that have the same energetic pathway are favored over others with less options. For example, in Fig 1 my guess (unless you tell me I am mistaken) is that there are two orientations of the path which should have the same kinetic profile in your model. The one shown, and the other where the order of the four edges on one of the two blue pentamers is clockwise instead of anti-clockwise.Paths are fairly easy to construct on the dodecoahedron, so I don't think it should be to hard to enumerate the embedding options (only 64 for the linear case with less options I would guess for more complicated trees). But this is dependent on the Authors spanning tree enumeration procedure which they should discuss.2) When constructing minimum energy paths, the manuscript states that (line 168) for more then one choice of lowest energy configuration, an arbitrary choice is made. However, this choice could impact on the choices further down the path, resulting in lower drops in free energy further down the path which may be kinetically less favorable. It is probably not too much of an issue with the doedec since it is small, has short assembly with limited options, but some discussion about this issue would be nice. Potential fixes would be a breadth first search of the assembly paths, keeping the n lowest energy at each pentamer addition. This is also a simple check to see if there is more then one embedding which has the same energy at each step (point 1 above).3) The choice of energy parameters needs more justification. In Zlotnick (1994) the protein-protein interaction energy was chosen such that fully assembled empty particles were obtained at thermodynamic equilibrium with almost no free capsid. The critical threshold value for this was around -2.8 kCal/Mol (which depends on concentration of total protein - your c0). I would have thought that it would be ideal to be somewhere around this value since empty particles for most viruses form readily at appropriate concentration in vitro, but it is not clear where your parameters lie. (see point 4 in MINOR) Some more clarification here would be appreciated.MINOR1) The model for RNA interactions with the capsid has a Hamiltonian path on a dodecahedron and a dodecahedron for the protein shell. In Dykeman et al. the capsid was a dodecahedron and the Hamiltonian path was on an icosahedron. This was due to RNA-CP interactions being modeled as occurring at the centers of the capsids. Here both the protein cage and Hamiltonain path cage are dodehaedron, with RNA interactions on the edges. There is nothing wrong with this per se, but I would clarify that this models the type of situation in Nodaviridae. Some discussion of how the model could be used in viruses such as MS2 would also be useful.2) A brief discussion on why (mathematically) every spanning tree for the dodecahedron must have 20 nodes and 19 links would be useful.3) Similarly, the wrapping number and its connection with the number of pentamer edges occupied by RNA should also be discussed more. There is important nuisances between the linear case and the branched case, i.e. in the linear case, pentamers must have either 2/3/4 edges occupied by RNA, for any embedding (as the Hamiltonian path requirement enforces this when there are only 2 end nodes as in the linear case) where as for the branched case, there could be embeddings which have pentamers with 0/1/2/3/4 edges occupied (due to > 2 end nodes). An end node is a vertex on the spanning tree with only one link emanating from it.4) The values for the energy parameters (in particular e1,e2,e3,e4) are somewhat convoluted into dependencies on D,c0 etc.. I would suggest also computing the values for e1-e4 for your examples and state concentrations of co in equivalent values of uM, as this will allow for comparisons with expected concentration in cells and in vitro experiments.5) For the model, is there an intermediate step (where RNA-CP contact is formed followed by CP-CP contact) or do both RNA-cp and CP-CP contacts occur in one step?6) Some more discussion on the nucleation energies and a comparison between the different spanning trees would be interesting.7) There was no discussion (as promised) about ways to incorporate RNA conformation entropy into the model in the conclusions. Some discussion on how RNA secondary structure can be mapped onto your spanning trees (e.g. where are single stranded and double stranded areas on the tree, where are LD interactions and local hairpins). I think your model is currently applicable to Nodaviridae, but I would be interested in a brief discussion of how it could be applied to MS2.**********Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.Reviewer #1: No: The computational code is not available.Reviewer #2: Yes**********PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.If you choose “no”, your identity will remain anonymous but your review may still be made public.Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.Reviewer #1: NoReviewer #2: Yes: Eric C. DykemanFigure Files:While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at .Data Requirements:Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.Reproducibility:To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols18 Jan 2022Submitted filename: Response_PLOS(2)_Jan.18.pdfClick here for additional data file.9 Feb 2022Dear Dr. Bruinsma,We are pleased to inform you that your manuscript 'Packaging Contests between Viral RNA Molecules and Kinetic Selectivity.' has been provisionally accepted for publication in PLOS Computational Biology.I think the writing is very didactic and the idea is original and interesting. Best of luck with the rest of the process.Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.Best regards,Samuel Coulbourn Flores, Ph.D.Guest EditorPLOS Computational BiologyArne ElofssonDeputy EditorPLOS Computational Biology***********************************************************Dear Authors,I like the clear and didactic explanation, though it is a bit outside my usual field of work.The referees appear to be largely positive to the work, I will ask the editorial staff to send your revision 1 to them to see if they have any further comments.For my part I have only a very minor pedantic comment -- on line 554 please fix "adopt will adopt".With kind regards,SamReviewer's Responses to QuestionsComments to the Authors:Please note here if the review is uploaded as an attachment.Reviewer #1: The authors have modified the manuscript based on my recommendation and addressed all my concerns. I thus recommend for publication now.Reviewer #2: The manuscript has substantially improved and the Authors have thoughtfully replied to my concerns in the revision. I am happy with the revisions and I believe that it can now be accepted.**********Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.Reviewer #1: YesReviewer #2: Yes**********PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.If you choose “no”, your identity will remain anonymous but your review may still be made public.Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.Reviewer #1: NoReviewer #2: Yes: E C Dykeman28 Mar 2022PCOMPBIOL-D-21-01496R1Packaging Contests between Viral RNA Molecules and Kinetic Selectivity.Dear Dr Bruinsma,I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!With kind regards,Agnes PapPLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol
Authors: David H J Bunka; Stephen W Lane; Claire L Lane; Eric C Dykeman; Robert J Ford; Amy M Barker; Reidun Twarock; Simon E V Phillips; Peter G Stockley Journal: J Mol Biol Date: 2011-08-03 Impact factor: 5.469
Authors: Aron M Yoffe; Peter Prinsen; Ajaykumar Gopal; Charles M Knobler; William M Gelbart; Avinoam Ben-Shaul Journal: Proc Natl Acad Sci U S A Date: 2008-10-09 Impact factor: 11.205
Authors: Peter G Stockley; Reidun Twarock; Saskia E Bakker; Amy M Barker; Alexander Borodavka; Eric Dykeman; Robert J Ford; Arwen R Pearson; Simon E V Phillips; Neil A Ranson; Roman Tuma Journal: J Biol Phys Date: 2013-04-12 Impact factor: 1.365