Kingshuk Ghosh¹, Adam M R de Graff², Lucas Sawle¹, Ken A Dill²
1. Department of Physics and Astronomy, University of Denver, Denver, Colorado 80209, United States.
2. Laufer Center for Physical and Quantitative Biology and Departments of Chemistry and Physics and Astronomy, Stony Brook University, Stony Brook, New York 11794, United States.
Abstract
We review how major cell behaviors, such as bacterial growth laws, are derived from the physical chemistry of the cell's proteins. On one hand, cell actions depend on the individual biological functionalities of their many genes and proteins. On the other hand, the common physics among proteins can be as important as the unique biology that distinguishes them. For example, bacterial growth rates depend strongly on temperature. This dependence can be explained by the folding stabilities across a cell's proteome. Such modeling explains how thermophilic and mesophilic organisms differ, and how oxidative damage of highly charged proteins can lead to unfolding and aggregation in aging cells. Cells also have characteristic time scales. For example, E. coli can divide as often as two to three times per hour. These time scales can be explained by protein dynamics (the rates of synthesis and degradation, folding, and diffusional transport). The same reasoning rationalizes how added salt slows bacterial growth. In the same way that the behaviors of inanimate materials can be expressed in terms of the statistical distributions of atoms and molecules, some cell behaviors can be expressed in terms of distributions of protein properties, giving insights into the microscopic basis of growth laws in simple cells.
Cellular Growth Laws Are Related to Cellular Fitness
Consider the simplest cells, such as bacteria or yeast. Cells grow
at different rates, depending on their environment. A cell’s
growth rate depends on how much food is present, on the temperature
and salt concentration of the external medium, and on its internal
biochemical health. Because a cell’s duplication speed is often
the single most important determinant of its ability to propagate
its progeny, growth rate could have evolved to be a complicated function
of many biochemical details of a cell. However, we review here recent
efforts toward a different view. Modeling shows how the growth laws
of simple cells are encoded within the physical properties of a cell’s proteome (i.e., its full complement of proteins). That is,
some cell behaviors are attributable to large fractions of the proteome,
not just a single protein or gene or pathway. And some behaviors
are physical (due to protein folding, aggregation, or diffusion, applicable
in some universal or general way across different proteins), rather
than biological (due to the protein’s particular biological
action). Of course, at best, simple models of the physical proteome
are only a first approximation. But in the spirit of other physical
chemistry, they may provide useful conceptual insights and can make
testable predictions.

First, we make a general point: growth
laws are related to, and
manifestations of, evolutionary fitness landscapes. Define a cellular
growth rate, λ, as the number of new cells produced per unit
time from each existing parent. If c(t) is the cell population at time t, then under appropriate
conditions, populations grow as

$$\frac{dc}{dt} = \lambda c, \qquad c(t) = c(0)\,e^{\lambda t} \tag{1}$$

The growth rate λ can depend, often
strongly, on various quantities; these are called growth laws. Perhaps the best known growth law,[1] λ
= λ(sugar), indicates that cells grow faster with increasing
concentrations of food, such as sugar, up to a point at which the
growth rate saturates. Bacterial growth rates also depend strongly
on temperature and external salt concentrations. For practical bacteriology,
these are important. To kill bacteria or stop their growth, you remove a food source, or
you heat the cells to high temperatures (as when you cook food), or
you introduce high external salt concentrations (in pickling fish
or in making jerky or salting meats, for example). In general, such
growth laws can be expressed as λ = λ(e),
where e indicates a vector of environmental variables, such as sugar, temperature, or salt. These functions
can express cellular growth laws.

A growth law is a function
that describes how “today’s
cell” can respond to variations in today’s conditions.
But cells can change those functions through evolutionary modifications
over longer time scales. This can be expressed in terms of their genotype, a vector of genes, g. We use the
term genotype here in a very general way: It can describe either a
set of discrete options, such as the presence or absence of genes
or amino acids in proteins, or a continuum of options. It can express
some property of a gene directly or it can be a surrogate for that,
representing some rate coefficients or equilibrium constants in the
biochemical workings of the cell. In general, we can express the growth
rate of a cell as

$$\lambda = \lambda(\mathbf{e}, \mathbf{g}) \tag{2}$$

Equation 2 captures
both today’s growth law λ = λ(e),
for fixed evolutionary properties g, and the modulation of growth
rates by evolution, λ = λ(g), for fixed conditions e. The latter property,
λ = λ(g), is the fitness landscape for cells for which duplication speed is their primary measure of
fitness. Hence, eq 2 relates,
albeit in only a general abstract way, the evolutionary fitness landscape
to the growth laws of cells. For cells that have been under a fixed
selection pressure for a long time, and have evolved to maximize their
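fitness, we can study their peak-fitness points by finding

$$\left(\frac{\partial \lambda(\mathbf{e}, \mathbf{g})}{\partial \mathbf{g}}\right)_{\mathbf{e}} = 0 \tag{3}$$

Note that, in general, cellular fitness f is not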
always equal to just λ, the growth rate. Many types of cells
live in multicellular organisms. They contribute to the fitness of
the whole organism. Their own particular fitness objectives are rarely
known. Here, we describe some models of fitness f(e, g) in simple cells as a function of
properties of the cell’s proteins.

We focus on proteins
because more than half of a cell’s
biomass is its proteins. Hence, where physical behaviors matter, proteins
are likely to be predominant players. We distinguish between a protein’s
generic physicochemical properties and its specialized sequence-structure
actions. By “general physical” properties, we mean the
following. First, we are referring to a protein’s health (also
called proteostasis[2]):
the balance between folded and unfolded states, the balance between
folding and degradation, and the states of protein oxidation. Second,
we are also referring to biophysical properties that can matter to
the cell, such as protein movement, transport, crowding, sticking,
and localization. Thanks to enzymatic assays, genome sequencing, and
tens of thousands of atomically detailed protein structures in the
Protein Data Bank, the special functions of many proteins are now known.
Less is known about the generic, physical, and health behaviors of
proteomes. While the biological actions are often distinct from one
protein to the next, the physical behaviors can involve commonalities among proteins, often arising more from statistical properties than
from the singular native states. These properties include a proteome’s
distribution of stabilities, folding rates, and sensitivity to perturbations
(such as side-chain charge modification), as shown in Figure 1. The physical properties of
proteins are important because the cell commits major resources in
energy and biomass toward managing them, in its struggle against stresses,
disease, and death. Just like the specialized jobs of proteins, the
generic actions can be changed through evolutionary processes such
as natural selection.
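To make the growth-law vocabulary concrete, the sketch below integrates the exponential growth law dc/dt = λc and evaluates a saturating food-dependent growth law λ(sugar). The saturating (Monod) form λ = λmax·S/(K + S) is our assumption for illustration, and the function names and parameter values are hypothetical; the only number taken from the text is that E. coli can divide about three times per hour.

```python
import math

def growth_law_monod(sugar, lam_max, k_half):
    """Hypothetical saturating growth law lambda(sugar) = lam_max * S / (K + S).

    The Monod form is an assumption used for illustration; lam_max and k_half
    are made-up parameters, not fitted values.
    """
    return lam_max * sugar / (k_half + sugar)

def population(c0, lam, t):
    """Exponential growth: c(t) = c(0) * exp(lambda * t)."""
    return c0 * math.exp(lam * t)

def doubling_time(lam):
    """Time for the population to double under exponential growth: ln(2) / lambda."""
    return math.log(2.0) / lam

# E. coli dividing about 3 times per hour corresponds to lambda = 3 ln 2 per hour.
lam_fast = 3.0 * math.log(2.0)
print(f"lambda = {lam_fast:.2f} per hour, doubling time = {60 * doubling_time(lam_fast):.1f} min")

# The growth rate rises with food concentration and then saturates at lam_max.
for s in (0.1, 1.0, 10.0, 100.0):
    print(s, round(growth_law_monod(s, lam_max=lam_fast, k_half=1.0), 3))
```

Any other saturating function would illustrate the same qualitative growth law; Monod's form is used only because it is the most familiar.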
Figure 1
(a) Folding stability (ΔG) varies across a proteome, with longer proteins tending
to have higher stability. (b) Mean folding rate decreases with increasing
protein size (N). (c) Stability loss from a single
side-chain charge modification (for example from oxidative damage)
scales linearly with the net charge (Q) of the protein
and affects small proteins more strongly than large proteins. While
two-thirds of the human proteome lies within one standard deviation
of neutrality (left of dotted boundary), and is relatively robust
to charge modification, the high-charge outliers are at risk of large
stability loss.
Here, we describe how
simple physicochemical models, combined with
data from in vitro experiments, can predict some
cell behaviors, rationalize observed growth laws, and generate hypotheses
about diseases, aging, and evolutionary tendencies. The concepts being
sought here, and the models being developed, are coarse-grained, not
atomically detailed. Yet, despite their simplicity, they are often
sufficient to generate testable hypotheses. The first example below
shows how a coarse-grained model of protein folding stability can
explain the high sensitivities of cells to temperature, rationalize
thermal growth laws, predict proteome stability distribution functions,
and give insight into how thermophilic organisms may have evolved
to deal with higher environmental temperatures.
Thermal Properties of Cells Arise from the Folding Stabilities of Their Proteomes
Cells are highly sensitive to temperature.
It is not uncommon that
the temperatures at which cells die are only a few degrees higher
than the temperatures at which their growth is optimal.[3,4] Small shifts of environmental temperature can drive biological migrations,
extinctions, genetic divergence, and speciation.[5−7] By what mechanism
are cells so sensitive to temperature? Here, we review a polymer folding
model (polymer-collapse theory) that indicates that the thermal sensitivities
of cells arise because proteomes have evolved to have denaturation
temperatures that are only marginally higher than the cell’s
growth temperature.[8−11] Despite its simplicity, this mechanism gives an approximate quantitative
description of bacterial growth rates versus temperature.
Cells Are Sensitive to Temperature Because Their Proteomes Are Poised Near Their Denaturation Temperatures
This protein–denaturation–catastrophe
mechanism[8,12] has been made quantitative by a combination
of thermodynamic measurements of 59 mesophilic proteins in
vitro with polymer-collapse theory. Such theory reckons that
reversible protein folding is driven by the small average tendency
of amino acids to prefer sticking to other amino acids inside a compact
native structure, rather than to be exposed and solvated in an expanded
unfolded state in water. This mechanism reckons that the principal
force opposing folding is the chain entropy, which favors the unfolded
state. A version of that simple idea also accounts for electrostatic
interactions and the effects of temperature, salts, and denaturants,
giving the folding free-energy, ΔG_unfold = G_unfolded − G_folded, as a function of chain length, temperature, salt, and denaturant (eq 4).[10,13,14] In eq 4, g0 represents
the free-energy when amino acids desolvate and come into contact, z is the average conformational freedom loss per backbone
bond, and Δcp is the change in heat
capacity per amino acid upon folding. Qd and Qn are the total net charge on the
denatured and native structures, respectively, and Rd and Rn are the radii of the
denatured and native protein. N denotes the number
of amino acids (or chain length) in the protein, c is the denaturant concentration, κ is the inverse Debye length, lb is the Bjerrum length, k is Boltzmann’s
constant, T is the temperature, Th = 373.5 K, Ts = 385 K,[13,14] and T0 = 300 K; for details, see refs (10) and (14).

Equation 4 gives the stability for a
single average protein of length N. Thus, the probability
distribution p(ΔG) of stabilities of all the proteins in a proteome (Figure 1a) can be computed from P(N), the distribution of chain lengths
of proteins in a cell.[8] P(N) is available for different cell types from proteomic
or genomic data.

We conclude that proteomes tend to be marginally
stable at their
physiological temperatures; see Figure 2. This marginal stability arises not because the average
stability is low, but because the stabilities are broadly distributed. The average protein in E. coli is
estimated to be reasonably stable, ΔG_unfold = 6.8 kcal/mol at 37 °C. However,
many proteins populate the “unstable”
side of the distribution: approximately 550 of the 4300 proteins in the E. coli proteome are less stable than 3 kcal/mol.
Although data are limited, estimates of how stability is affected
by protein domain structure[15] indicate
that proteins may be even less stable than the estimates above.[8] Furthermore, while these estimates are based
on stabilities measured in vitro, experiments and
simulations show that proteins in vivo or in reconstituted cytosol are as stable as, or slightly
less stable than, they are in vitro.[16−20] The polymer folding model predicts that this marginally
stable subset of the proteome is responsible for the high thermal
sensitivity of the cell, as seen in Figure 2 for a small shift in temperature from 37
to 41 °C.
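The stabilities quoted above can be put on the kT scale used in Figure 2, and converted into folded populations, with a few lines of arithmetic. This sketch assumes simple two-state folding, so that the folded probability is 1/(1 + exp(−ΔG_unfold/kT)); the helper names are hypothetical, and count_marginal only illustrates how a threshold count such as "550 proteins below 3 kcal/mol" is taken from a list of stabilities.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K); RT per mole plays the role of kT per molecule

def kcal_to_kT(dg_kcal, temp_c):
    """Express a free energy given in kcal/mol in units of kT at temp_c (degrees C)."""
    return dg_kcal / (R_KCAL * (temp_c + 273.15))

def fraction_folded(dg_unfold_kcal, temp_c):
    """Two-state probability of the folded state, 1 / (1 + exp(-dG_unfold / kT))."""
    return 1.0 / (1.0 + math.exp(-kcal_to_kT(dg_unfold_kcal, temp_c)))

def count_marginal(stabilities_kcal, threshold_kcal=3.0):
    """Number of proteins whose dG_unfold lies below a threshold (default 3 kcal/mol)."""
    return sum(1 for dg in stabilities_kcal if dg < threshold_kcal)

# The average E. coli protein (6.8 kcal/mol) versus the 3 kcal/mol marginal threshold at 37 C.
for dg in (6.8, 3.0):
    print(f"{dg} kcal/mol = {kcal_to_kT(dg, 37.0):.1f} kT, folded fraction {fraction_folded(dg, 37.0):.4f}")

# Fraction of the proteome below 3 kcal/mol, using the estimate quoted in the text.
print(round(550 / 4300, 3))
```

Roughly 11 kT for the average protein versus about 5 kT at the threshold shows why the low-stability tail, not the mean, controls how many proteins begin to unfold when the temperature rises by a few degrees.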
Figure 2
Distribution of unfolding free-energy (ΔG_unfold = G_unfolded − G_folded) of all the proteins
present in the E. coli proteome at 37 °C (in
blue) and at 41 °C (in red). The bin width for the free-energy
is 1 kT. The total area under the curve equals the
number (4300) of proteins present in the E. coli proteome.
Adapted with permission from ref (8). Copyright 2010 Elsevier.
A similar stability distribution is predicted by an evolutionary
kinetics model.[9] In that treatment, random
mutations that accumulate over evolution alter the folding stabilities
of proteins. Evolutionary changes occur by a random walk with a drift
on the folding free-energy landscape.[9,21] That work
envisions two limiting stabilities. Proteins have a maximum stability, ΔG_max, because it becomes
increasingly hard for evolution to find sequences having arbitrarily
high stabilities. Proteins also have a minimum stability, ΔG_min, because less stable proteins
aggregate or fail to fold. Within these two limits, it is assumed
that the fitness landscape is flat. The protein stability distribution
that evolves through this evolutionary model is the same as the stability
distribution of the polymer folding model.[8]

Both the polymer folding model and the evolutionary kinetics
model
give a basis for rationalizing the functional form of cellular thermal
growth laws.[8,10−12] We suppose
that the cell’s growth rate, r(T), is a product of two terms: (i) a factor that describes Arrhenius-activation
of one or more activated metabolic process(es) that govern how the
cell’s growth rate increases with temperature at low temperatures,[8,12,22,23] and (ii) a factor that accounts for the fraction of the proteome
that is folded at any temperature (capturing the denaturation catastrophe
of the proteome at high temperatures[8,11,12]):

$$r(T) = r_0\, e^{-\Delta H^{\ddagger}/kT} \prod_{i=1}^{\Gamma} \frac{1}{1 + e^{-\Delta G_{\text{unfold}}(N_i, T)/kT}} \tag{5}$$

Here, r0 is some
reference growth rate, ΔH‡ is the activation barrier of some critical growth-limiting
metabolic process, and Γ is the number of essential proteins that
are needed for growth. The product runs over the
probabilities that each essential protein i (with Ni amino acids) is in the folded
state, each written in terms of ΔG_unfold (eq 4; typical temperature dependence shown in Figure 3a). The expression above is
simplified by assuming that the essential proteins, whose unfolding is lethal, are drawn from the same stability distribution
as the whole proteome,[8,12] thus enabling the calculation
over all the proteins in the proteome, with Γ being a fit parameter.
The details of the calculation can be found in previous work.[8,10] Similar arguments[22,24] have been made but using only
a single effective value for ΔG_unfold. The model described here, based on the whole-proteome
stability distribution, fits the experimentally measured growth
rates of mesophilic organisms well (Figure 3b). The corresponding best-fit value of the cell’s
activation barrier for growth, ΔH‡, for E. coli is found to be
16.3 kcal/mol. This happens to be approximately equal to the barrier
for peptide bond formation by the ribosome,[25] and is consistent with estimates from other studies.[12,22−24] Moreover, this activation energy is in the same range
as typical values for various enzymatic reactions, including the barrier
(13 kcal/mol) that is associated with the elongation of RNA by transcription.[26] This model also fits the growth rates of thermophilic
organisms well (Figure 3c) when using thermodynamic parameters for thermophilic proteins
obtained from analyzing in vitro data sets.[10] A detailed systems-level model has been applied
to understand how mutations in metabolic networks change thermal growth
rates.[27,28] These models also indicate that the thermostabilities
of metabolic enzymes are rate-limiting at superoptimal temperatures.[28] These models and arguments suggest that fundamental
physicochemical properties of proteomes help to define a cell’s
evolutionary fitness landscape (Figure 3d).
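The evolutionary kinetics picture described above, a random walk with drift confined between a minimum and a maximum stability on a flat fitness landscape, can be illustrated with a toy simulation. This is not the kinetic model of ref 9: the Gaussian step size, the negative drift (the assumption that most mutations are destabilizing), the 1 and 15 kcal/mol limits, and the rule that mutations leaving the window are rejected are all hypothetical choices made for this sketch.

```python
import random
import statistics

def evolve_stabilities(n_proteins, n_steps, dg_min, dg_max, drift, step_sd, seed=0):
    """Toy random walk with drift for dG_unfold (kcal/mol) between two limits.

    Assumptions (hypothetical, for illustration): each mutation shifts dG by a
    Gaussian step with mean `drift` (negative: most mutations destabilize) and
    standard deviation `step_sd`; fitness is flat inside [dg_min, dg_max], and a
    mutation that would leave that window is rejected.
    """
    rng = random.Random(seed)
    start = 0.5 * (dg_min + dg_max)
    final = []
    for _ in range(n_proteins):
        dg = start
        for _ in range(n_steps):
            proposal = dg + rng.gauss(drift, step_sd)
            if dg_min <= proposal <= dg_max:  # neutral inside the window, rejected outside
                dg = proposal
        final.append(dg)
    return final

stabs = evolve_stabilities(n_proteins=300, n_steps=1000, dg_min=1.0, dg_max=15.0,
                           drift=-0.2, step_sd=1.0)
print(f"mean {statistics.mean(stabs):.2f}, median {statistics.median(stabs):.2f} kcal/mol")
```

With a destabilizing drift, the simulated stabilities pile up toward the lower limit rather than sitting at the midpoint, which is the qualitative reason such a model yields a proteome with many marginally stable members.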
Figure 3
(a) Protein folding stability across temperatures (ΔG_unfold) for an ideal
mesophilic (blue)
and thermophilic (red) protein based on thermodynamic data.[10] (b) The growth rate model (blue) captures the
experimental growth rate of mesophiles like E. coli (●) and (c) thermophiles (red).[10] (d) Temperature–growth curves in parts b and c can be seen
as slices through a high-dimensional fitness landscape. Some dimensions
can be traversed rapidly (like temperature), while others (ξ)
change over evolutionary time scales. Reprinted in part with permission
from ref (10). Copyright
2011 Elsevier.
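The growth-rate model above multiplies an Arrhenius factor by the probability that each of Γ essential proteins is folded. The sketch below evaluates that product numerically. Because the full folding free-energy expression is not reproduced here, each protein's stability is replaced by the standard two-state Gibbs–Helmholtz curve ΔG(T) = ΔHm(1 − T/Tm) + ΔCp[(T − Tm) − T ln(T/Tm)]; the melting temperatures, ΔHm, ΔCp, and Γ = 50 are hypothetical values chosen for illustration. Only the 16.3 kcal/mol barrier is taken from the text.

```python
import math

R = 1.987204e-3  # gas constant, kcal/(mol K)

def stability_curve(temp_k, tm_k, dh_m=100.0, dcp=1.5):
    """Two-state Gibbs-Helmholtz dG_unfold(T) in kcal/mol (hypothetical stand-in for eq 4)."""
    return dh_m * (1.0 - temp_k / tm_k) + dcp * (temp_k - tm_k - temp_k * math.log(temp_k / tm_k))

def folded_probability(dg, rt):
    """Numerically stable 1 / (1 + exp(-dg / rt))."""
    x = dg / rt
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def growth_rate(temp_k, tms, r0=1.0, dh_act=16.3):
    """Arrhenius factor times the product of folded probabilities of the essential proteins."""
    rt = R * temp_k
    rate = r0 * math.exp(-dh_act / rt)
    for tm in tms:
        rate *= folded_probability(stability_curve(temp_k, tm), rt)
    return rate

# Hypothetical melting temperatures for 50 essential proteins, spread over 45-60 C.
tms = [318.0 + 15.0 * i / 49 for i in range(50)]
temps = [280.0 + 0.5 * i for i in range(121)]  # 280 K to 340 K
rates = [growth_rate(t, tms) for t in temps]
t_opt = temps[rates.index(max(rates))]
print(f"optimal growth near {t_opt - 273.15:.1f} C")
```

The resulting curve rises slowly (Arrhenius) at low temperature and collapses a few degrees above the optimum, the asymmetric shape seen in Figure 3b; the exact optimum depends entirely on the assumed melting temperatures.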
Proteomes of Thermophilic Organisms Are More Stable Than Those of Mesophilic Organisms
The polymer-collapse model also gives
insight into how mesophilic cells differ from thermophiles. Mesophilic organisms mostly live at moderate temperatures
(25–40 °C) while thermophilic organisms
grow at higher temperatures. How do their proteomes differ? A global
analysis of 57 thermophilic proteins and 59 mesophilic proteins shows
an average systematic difference:[10] thermophilic
proteins denature at higher temperatures than mesophilic proteins,
as they are more stable, on average, at all temperatures[10] (see Figure 3a). The analysis also indicates that denatured states of thermophilic
proteins may have less chain entropy than those of mesophilic proteins.[10] This implies that the denatured states are,
on average, more compact in thermophiles;[29−31] see Figure 4. In principle, the
difference in stabilities between thermophiles and mesophiles could
arise from any of several driving forces, including electrostatics,
hydrophobic interactions, proline substitution, disulfide bonds,[32−57] the presence of amino acids having different flexibilities,[58−61] or loop deletions.[62]
Figure 4
Denatured states are
more compact in thermophilic proteins than
in their mesophilic counterparts. Among other things, this can result
from less net charge on thermophilic proteins or from more subtle
differences in charge patterning (see ref (57) for details).
However, it seems likely that electrostatics may be a key
contributor
to these differences.[33−36,39−48,57,63,64] Electrostatic stability of folded proteins
can depend both on a protein’s net charge and on its charge
patterning. For example, Sawle and Ghosh have shown that a good predictor
of the relative compactness of the denatured structures between thermophilic
and mesophilic sequences is the sequence–charge–decoration
(SCD) metric:[57]

$$\mathrm{SCD} = \frac{1}{N}\sum_{m=2}^{N}\sum_{n=1}^{m-1} q_m q_n\,|m-n|^{1/2} \tag{6}$$

Here, qm and qn are the charges
(+1 for basic, −1 for acidic, and 0 otherwise) on two amino
acids m and n, |m − n| is their sequence separation, and N is the chain length. SCD
expresses the degree of charge mixing;[57] a similar metric has been given by Das and Pappu.[65]

Figure 5 gives the SCD values for two charge sequences. A more compact
denatured state is predicted by a more negative value of SCD. In this
case, a “blockier” sequence of charges gives the more
compact denatured state. Sawle and Ghosh have applied this metric
to a set of 540 orthologous pairs of thermophilic and mesophilic proteins,
and found that thermophiles, in general, have a more compact denatured
state than mesophiles.[57] While this comparison
was made without corresponding 3D protein structures, a comparison
has also been made of a smaller set of 55 well-aligned mesophile–thermophile
pairs, for which structures are known.[66] This too shows that thermophilic domains are, on average and with
high statistical significance, more compact than their mesophilic
counterparts. Charge patterning and segregation also contribute to
the sizes of intrinsically disordered proteins[65] and to the degree of ribosome–protein complexation.[67]
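The SCD metric is straightforward to compute from sequence alone. The sketch below implements the double sum over residue pairs, weighting each charge product by the square root of the sequence separation and dividing by the chain length. It assumes that only K and R carry +1 and only D and E carry −1 (histidine and the chain termini are treated as neutral); the two ten-residue test sequences are hypothetical, chosen to mimic the blocky and mixed patterns of Figure 5.

```python
import math

def residue_charges(seq):
    """+1 for K/R, -1 for D/E, 0 otherwise (assumption: His and termini treated as neutral)."""
    table = {"K": 1, "R": 1, "D": -1, "E": -1}
    return [table.get(aa, 0) for aa in seq.upper()]

def scd(seq):
    """Sequence-charge decoration: (1/N) * sum over pairs n < m of q_m * q_n * sqrt(m - n)."""
    q = residue_charges(seq)
    n_res = len(q)
    if n_res == 0:
        return 0.0
    total = 0.0
    for m in range(1, n_res):   # 0-based indices; equivalent to m = 2..N in the formula
        for n in range(m):      # all earlier residues n < m
            total += q[m] * q[n] * math.sqrt(m - n)
    return total / n_res

blocky = "EEEEEKKKKK"        # segregated charges
alternating = "EKEKEKEKEK"   # well-mixed charges
print(round(scd(blocky), 3), round(scd(alternating), 3))  # -2.718 -0.447
```

The blocky sequence has the more negative SCD, matching the rule stated above that charge segregation predicts a more compact denatured state. The double loop is O(N²), which is adequate for single proteins.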
Figure 5
Sequence–charge–decoration (SCD) is a measure
of charge patterning and a predictor of the compactness
of a denatured state. The blockier sequence has the more negative
SCD, predicting the more compact denatured state. A key distinction
between mesophilic and thermophilic proteins appears to be the net
charge and charge patterning of the protein sequences (see ref (57) for details).
Highly Charged Proteins Are in Greater Danger of Unfolding from Random Oxidative Damage, Such as in Aging
Here is another
way that protein folding stability appears to manifest as a phenotype
of the cell. Cells sustain increasing oxidative damage with age.[68−71] Protein damage increases with age along a fairly universal curve across organisms (Figure 6). We describe here a hypothesis about how oxidative damage can lower
the folding stability of some of the proteome’s proteins.[72] A few things are clear. First, proteins are
key targets of oxidative damage.[73−75] As many as half of the
proteins in an average 80-year-old person are estimated to have oxidative
damage.[68,74] Second, amino acid side-chains are the principal
site of damage:[75−78] side-chain damage is estimated to be at least 10 times more common than other types of
damage.[75] Third, oxidative damage is a
random “loose cannon” event in the cell, hitting proteins
across the spectrum of the whole proteome. So, random side-chain damage
may be an important consequence of oxidation. But one additional
fact poses a challenge for modeling: the level of oxidative damage
in old cells amounts to only about one amino acid alteration per protein,[68,74] a relatively small effect. How might single charge changes in some
proteins be sufficient to contribute to the aging phenotype?
Figure 6
A diverse range
of organisms shares a common age-dependent increase
of oxidative damage. The amount of protein damage with age is shown
for worms[69] (purple ◆), flies[70] (green ▲), rats[68] (cyan ■), and humans[71] (blue ▼).
The black curve is the fit to the data, while the blue shaded region
is the range of curves obtained if the fit parameters are changed
by 15%. The pink stripes show the damage levels reached at the end
of life in people with the premature aging diseases progeria and Werner
syndrome.[71] Reprinted with permission from
ref (72). Copyright
2016 Elsevier.
Here, we review the following
mechanism:[72] (i) oxidation damages amino
acid sites on random proteins across
the proteome; (ii) some damage events will alter the charges on some
side-chains;[77] (iii) for a small subset
of the proteome, a small change in net charge (as small as +1 or −1
charges) can denature or destabilize its folded state. How can changing
a protein’s charge by only +1 or −1 units unfold a protein? Equation 4 contains an expression
for the electrostatic contribution to the free-energy of folding in terms
of Qn² and Qd², the squares of the net charge on the native and
denatured protein, respectively.[10,11] These terms
capture the principle that it is unfavorable to bring a protein’s
net charge from the larger volume of the unfolded state into the smaller
confines of the native state[79,80] (see Figure 7a). This model has been shown
to predict (i) the experimentally measured pH–salt
phase diagrams for the unfolding of myoglobin, lysozyme, and RNase
A,[14] and (ii) the experimental dependence
of the folding free-energy on the square of the net charge.[79−82]

Equation 4 shows that
changing a protein’s charge from Q to Q ± 1, for example by a single oxidative damage event,
will change an average protein’s folding stability by ΔΔG(Q) = ΔG(Q ± 1) − ΔG(Q) (eq 7).
Figure 7
(a) For highly
charged proteins, folding leads to the confinement
of many charges into a small space. So, high net charge tends to destabilize
the native fold. (b) This figure shows two points. First, the black
line shows how one standard deviation of charge increases as a function
of chain length in the human proteome. The color shading indicates
the stability change predicted from a single destabilizing charge
modification. The fact that the one standard deviation line coincides
with the boundary between the blue and red regions indicates that
most proteins in the human proteome are relatively long, neutral,
and low-risk, yet there exists a significant number of outliers that
are short, highly charged, and high-risk. Second, the points on this
figure indicate 20 proteins that are important to aging and aging-related
diseases and predicted to be in greater danger of large stability
loss from a single oxidative charge modification. Some are among the
most highly charged proteins in the proteome. Adapted with permission
from ref (72). Copyright
2016 Elsevier.
Equation 7 is in quantitative
agreement with charge-perturbation experiments.[81,82] It can be computed using only a protein’s sequence. It predicts
a proteome-wide distribution of stability changes that is similar
to that observed experimentally in point mutations of charged residues,
which are reasonable proxies for oxidation.[83]

A key conclusion from eq 7 is that the change in folding free-energy, ΔΔG, from a damage event will be proportional to
the net charge already on the native protein before the damage
event. So, any proteins in the proteome that are highly charged
and/or relatively unstable to begin with are in greater danger of
being destabilized by a single oxidative damage event; see Figure 7b.

Figure 7b shows
an interesting implication of the model.[72] First, the black curve shows the one standard deviation line for
the human proteome. It shows that most human proteins are sufficiently
neutral to be safe from unfolding by single charge-modification events.
Only a few of the proteins in the proteome have a sufficiently high
net charge (of either sign) for the destabilization of their native
state to be comparable to the stability of some entire proteins (roughly
2–4 kT; see Figure ).Now, notice the data points on Figure b. These are 20 human
proteins known from
the literature to be relevant to aging.[84] These 20 proteins all lie in the high-risk region, and thus, the
model predicts that these proteins can be unfolded by a single oxidative
charge-modification event. So, changing a single side-chain charge
by a random oxidation event could contribute to how aging cells lose
protein stability and function.[85]Figure compares a typical
charge distribution found on the majority of proteins, which are nearly
neutral (Figure c)
and not at risk of unfolding from random oxidation events, with those
of highly charged proteins (Figure a,b) at high risk of unfolding from single oxidation
events.
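Because the model needs only a protein’s sequence, its key input, the net charge, is easy to compute. The Python sketch below is a minimal illustration, not the published model: it assigns +1 to Lys/Arg and −1 to Asp/Glu (ignoring His, chain termini, and pKa shifts), and it stands in for the proportionality described above with a hypothetical linear rule, |ΔΔG| ≈ c·|Q|, where the coefficient c (`kt_per_unit_charge`) and the stability threshold are illustrative assumptions.

```python
# Minimal sketch: sequence-based net charge plus a *hypothetical* linear risk rule.
# Assumptions (not the source model): K/R = +1, D/E = -1, His and termini ignored,
# and |ddG| ~ kt_per_unit_charge * |Q| for a single charge modification.

POSITIVE = frozenset("KR")
NEGATIVE = frozenset("DE")


def net_charge(sequence: str) -> int:
    """Crude net charge near neutral pH from a one-letter amino acid sequence."""
    seq = sequence.upper()
    positives = sum(aa in POSITIVE for aa in seq)
    negatives = sum(aa in NEGATIVE for aa in seq)
    return positives - negatives


def ddg_magnitude_kt(q: int, kt_per_unit_charge: float) -> float:
    """Hypothetical magnitude (in kT) of the stability change from one charge change."""
    return kt_per_unit_charge * abs(q)


def is_high_risk(sequence: str, stability_kt: float, kt_per_unit_charge: float) -> bool:
    """Flag a protein whose single-event destabilization could rival its folding stability."""
    q = net_charge(sequence)
    return ddg_magnitude_kt(q, kt_per_unit_charge) >= stability_kt


# Human ubiquitin (76 residues), the near-neutral example in Figure 8c.
UBIQUITIN = (
    "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQ"
    "QRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"
)

if __name__ == "__main__":
    print(len(UBIQUITIN), net_charge(UBIQUITIN))  # 76 0
    print(is_high_risk(UBIQUITIN, stability_kt=3.0, kt_per_unit_charge=0.05))  # False
```

With these counting rules, ubiquitin comes out exactly neutral, consistent with the zero net charge listed for it in Figure 8. A real risk estimate would need the actual charge-perturbation equation and measured or predicted protein stabilities.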
Figure 8
Electrostatic surface potentials of (a) telomerase reverse transcriptase
(1132 residues and +98 net charge in Figure b; PDB: 3KYL) and (b) nucleosome-remodeling factor
subunit RbAp48 (425 residues and −29 net charge; PDB: 2XU7) are substantially
different from the smaller, more speckled potential at the surface
of (c) ubiquitin (76 residues and zero net charge; PDB: 1UBQ).[72] Reprinted with permission from ref (72). Copyright 2016 Elsevier.

Additional observations support
this mechanism: high net charge
is known to predict disorder-prone, unstable proteins;[86] disorder and low stability increase the chance
of becoming oxidatively damaged;[87] protein
aggregates of old organisms are enriched in damaged proteins;[88] and in budding yeast[89] and worms,[90,91] aggregates are known to be enriched
in highly charged proteins such as ribosomal and DNA-binding proteins.[72] Interestingly, low net charge is also a signature
of thermophilic proteins,[57] which face
greater stability challenges, as discussed earlier.
Dynamical Properties of Cells Arise from the Folding, Synthesis, Degradation, and Transport Rates of Proteins
Below, we review
some of the time scales and dynamical processes
of proteomes that are important to rapidly duplicating cells.
Protein Folding Happens Fast Enough To Escape the “Grim Reaper” of Proteome Degradation
First, consider the
distribution of protein folding times. Experiments show that single-domain
proteins fold in vitro over time scales that span
about 8 orders of magnitude.[11,92−96] Thirumalai developed an early model,[97] predicting that folding rates would scale with chain length N as $k = k_0 \exp(-N^{1/2})$. It
was remarkably prescient, given the almost complete absence of data
at that time, and it successfully describes the folding rates of proteins[98] and RNA molecules.[99] More recently, a microscopic folding mechanism called
the Foldon Funnel Model has been proposed; see Figure . The model asserts a simple folding mechanism, namely,
that local structures form first and rapidly, followed by larger nonlocal
structures that assemble more slowly because they have to wait for
smaller pieces to form first.[95] The model
gives good predictions of folding rates for 93 single-domain proteins
from sensible values of helix–coil and hydrophobic interaction
parameters[95] (Figure a). The model predicts a median folding time (not weighted by protein abundance)
of 5 s for the E. coli proteome.[95]
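To get a feel for the chain-length scaling attributed above to Thirumalai’s early model, the sketch below evaluates $k/k_0 = \exp(-N^{1/2})$ for a few chain lengths. It assumes, for illustration only, a unit coefficient in the exponent and reports rates relative to the unspecified prefactor $k_0$; it is not a fit to any of the folding data cited in the text.

```python
import math


def relative_folding_rate(n_residues: int) -> float:
    """k/k0 = exp(-sqrt(N)); the unit coefficient in the exponent is an illustrative assumption."""
    return math.exp(-math.sqrt(n_residues))


if __name__ == "__main__":
    for n in (25, 100, 400):
        k_rel = relative_folding_rate(n)
        # Print the relative rate and its base-10 logarithm.
        print(f"N={n:4d}  k/k0={k_rel:.3e}  log10(k/k0)={math.log10(k_rel):.2f}")
```

Under this assumption, log10(k/k0) falls from about −2.17 at N = 25 to about −8.69 at N = 400, so a 16-fold increase in chain length slows folding by roughly 6.5 orders of magnitude.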
Figure 9
(a) Foldon Funnel Model predictions for protein folding
rates vs
number of secondary structure units (Ns), compared to data on 93 small single-domain proteins. The inset
shows the funnel landscape for this model. (b) Mechanism for how local
structures form first and then assemble toward the native state.[95] Reprinted with permission from ref (95). Copyright 2014 American
Chemical Society.

Another model of folding
rates is the Topology Polymer Model.[94] It
treats the chain conformations more explicitly
than the Foldon Funnel Model, fully accounting for entropic costs
of chain topological restrictions (see polymer diagrams in ref (94) for details). The Topology
Polymer Model also differs by (i) using structure-based domain assignments
to predict folding rates and (ii) weighting the folding rates by protein
abundance when predicting the proteome folding rate distribution.[96] The Topology Polymer Model gives good predictions
for the dependence of folding speed on native topology[94,100] and unifies different models of folding kinetics. It predicts an
average abundance-weighted folding time of 100 ms for the E. coli proteome and 170 ms for
the yeast proteome.[96] The role of topological
constraints in nucleic acids, proteins, and folding kinetics has also
been revisited recently using simple folding models.[101,102] A question remains for the future: What are the folding rates of
large single-domain or multidomain proteins? There are not yet many
experiments on those types of proteins.[15,103]

Figure a
compares
the protein folding times for the yeast proteome (from the Topology
Polymer Model) with other key rates in the cell.[96] The rate distribution is broad. The most remarkable prediction
is that folding speeds seem nearly optimal for outrunning the “grim
reaper” of protein degradation,[96] with the slowest-folding proteins just barely out-pacing the fastest
protein degradation. This case is made by the black curve in Figure a, which is the
result of an evolutionary diffusion-drift model of folding rates,[96] resembling the diffusion-drift model of protein
stabilities[9] described earlier. The model
is based on asserting two physical principles of evolution, namely,
that (i) no protein can fold faster than known ultrafast folders,
due to conformational speed limits,[105] and
(ii) no protein should fold more slowly than the fastest degradation
time. Within this interval, the only selection pressure on folding
kinetics is simply to “beat the clock” against degradation.[96] When fitted with only one parameter against
the folding time distribution derived from the Topology Polymer Model,
the model predicts the slowest folding time to be around 10 s. This
provides a cushion of an order of magnitude in time separation relative
to the fastest degradation times (a few minutes). So, even a protein
that degrades at the fastest rate, if not folded off the ribosome
by cotranslational folding, has at least a 90% chance of folding before
being degraded.[96] For yeast, almost 99%
of the proteome’s proteins fold faster than the degradation
time (see Figure b and ref (96) for
details). Among the four outliers, the only protein that folds significantly
more slowly has 18 chaperone interaction partners,[96] indicating the important role of chaperones in helping
slow folders.[106]
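The “at least 90%” figure above can be read as the outcome of a race between folding and degradation. A minimal sketch, assuming (our illustrative assumption) that both are first-order processes with exponential waiting times and mean times τ_fold and τ_deg, so that the probability of folding first is k_fold/(k_fold + k_deg) = τ_deg/(τ_deg + τ_fold). The inputs simply encode the order-of-magnitude separation between the slowest folding time (about 10 s) and the fastest degradation times described in the text.

```python
def p_fold_before_degradation(tau_fold_s: float, tau_deg_s: float) -> float:
    """Probability that folding wins a race against degradation.

    Assumes both processes are first order (exponential waiting times) with
    mean times tau_fold_s and tau_deg_s, so P = k_fold / (k_fold + k_deg).
    """
    if tau_fold_s <= 0 or tau_deg_s <= 0:
        raise ValueError("time constants must be positive")
    k_fold = 1.0 / tau_fold_s
    k_deg = 1.0 / tau_deg_s
    return k_fold / (k_fold + k_deg)


if __name__ == "__main__":
    # Slowest folding (~10 s) vs degradation one order of magnitude slower.
    print(round(p_fold_before_degradation(10.0, 100.0), 3))  # 0.909
    # A degradation time of ~3 min gives a somewhat larger margin.
    print(round(p_fold_before_degradation(10.0, 180.0), 3))  # 0.947
```

If degradation is reported as a half-life rather than a mean lifetime, divide it by ln 2 before passing it in as τ_deg.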
Figure 10
(a) Abundance-weighted
folding time (t in seconds)
distribution across the yeast proteome (blue) using the topology polymer
model,[94] which is in good agreement with the
diffusion-drift model (black) with a flat fitness landscape.[96] The experimentally measured half-life distribution
of the yeast proteome (green)[104] shows that
folding is faster than protein degradation.[96] The median synthesis time is shown in red. (b) The distribution
of the ratio of protein half-life to protein folding time.[96] Adapted with permission from ref (96). Copyright 2014 Zou et
al.
Speed of Cell Duplication Is Limited by the Rate of Protein Translation
What is the speed limit for cell duplication?
In rapidly growing E. coli bacteria, DNA replication
takes 1–2 ms/base,[107] RNA polymerase
10–40 ms/base,[108,109] and the ribosome 50 ms/amino
acid.[110] The ribosome’s slower rate
of elongation, combined with its enormous size (since the ribosome
itself needs to get copied) and the 10-fold greater cellular abundance
of polymerized amino acids relative to nucleotides, makes protein
translation the largest bottleneck to cellular growth. In fast-growing E. coli, about a third of the cell’s dry weight is
ribosome (including rRNA).[111,112]

What is the
maximum rate of protein synthesis? First, cell duplication requires
that each ribosome make a copy of its own proteins. The fastest
that a ribosome can copy itself is about 6 min, assuming a ribosome’s
7336 amino acids[113] are translated at a
rate of 20 per second.[114] Second, each
ribosome must also duplicate a corresponding complement of other proteins.
At fast growth rates, an E. coli ribosome must
make roughly three times its own mass of nonribosomal proteins.[111,115] Together, these nearly 30 000 amino acids (the ribosome’s own plus about three times as many for nonribosomal proteins) must be synthesized in series,
one amino acid at a time, by each ribosome, predicting a minimum doubling
time of 24 min, which approximately matches the shortest doubling time observed
in E. coli.[111]

Interestingly,
this 1:3 ratio of ribosomal to nonribosomal proteins
also appears to hold in budding yeast, a fast-growing eukaryote.[116] So, the minimum cell division time t can be estimated as

$$t \approx \frac{4L}{r}$$

where r is the rate at which
one ribosome adds one amino acid to a growing protein chain, and L is the number of amino acids in a ribosome. A ribosome
of budding yeast contains 1.6 times as many amino acids as E. coli’s[113,117] and elongates proteins
at half the latter’s speed.[110,116] So, if protein
translation is indeed the limiting factor in the rate of cell duplication,
it implies a minimum doubling time of about 2 × 1.6 × 24 min ≈
77 min. This is close to experimental values.[118]
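The back-of-the-envelope estimate t ≈ 4L/r can be reproduced in a few lines. This sketch uses only the inputs quoted in the text (7336 amino acids per E. coli ribosome, 20 amino acids per second, a 1:3 ratio of ribosomal to nonribosomal protein, and yeast values of 1.6 L and r/2); the function name and its `nonribosomal_ratio` parameter are illustrative.

```python
def min_doubling_time_min(ribosome_aa: float, aa_per_s: float,
                          nonribosomal_ratio: float = 3.0) -> float:
    """Minimum doubling time (minutes) if each ribosome must translate its own
    amino acids plus nonribosomal_ratio times as many, one at a time."""
    total_aa = (1.0 + nonribosomal_ratio) * ribosome_aa  # t = (1 + 3) L / r
    return total_aa / aa_per_s / 60.0


ECOLI_RIBOSOME_AA = 7336  # amino acids in an E. coli ribosome (from the text)
ECOLI_RATE = 20.0         # amino acids added per second per ribosome (from the text)

if __name__ == "__main__":
    self_copy = min_doubling_time_min(ECOLI_RIBOSOME_AA, ECOLI_RATE, nonribosomal_ratio=0.0)
    ecoli = min_doubling_time_min(ECOLI_RIBOSOME_AA, ECOLI_RATE)
    yeast = min_doubling_time_min(1.6 * ECOLI_RIBOSOME_AA, ECOLI_RATE / 2)
    print(f"ribosome self-copy: {self_copy:.1f} min")    # 6.1 min
    print(f"E. coli minimum doubling: {ecoli:.1f} min")  # 24.5 min
    print(f"yeast minimum doubling: {yeast:.1f} min")    # 78.3 min
```

The text’s 24 min and 77 min come from rounding the E. coli value down before scaling by 2 × 1.6; the unrounded scaling gives about 78 min, which does not change the conclusion.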
Protein Translation Speeds Are Limited by Diffusion and Binding
So, why can an amino acid not be added to a growing peptide chain
in less than 50 ms in E. coli? Translation is known
to require several actions:[119,120] (i) tRNA needs to
diffuse to the ribosomal binding site; (ii) the tRNA must settle and
bind in the appropriate orientation at this site, with proofreading
to verify that it is the correct tRNA;[119] and (iii) the peptide is chemically elongated. It is thought that the
peptide elongation reaction (iii) is faster than the accommodation
step (ii), but this is still debated.[119] The
rate of tRNA accommodation (ii) has been found experimentally to be
on the same time scale as the overall elongation cycle and thus could account for
a non-negligible fraction of the total 50 ms. The diffusion step
(i) depends on tRNA concentration. Evidence for its role as a diffusion
bottleneck is that cellular tRNA concentrations are roughly the same
as those needed to saturate ribosomal kinetics.[121] Furthermore, E. coli devotes a significant
fraction of its dry weight to tRNA (up to 2%[121]) that could have been spent on more ribosomes, suggesting that tRNA plays
an important role in protein synthesis speed. Consistent with this,
a tRNA diffusion model correctly accounts for how tRNA abundance varies
with growth rate.[121] In short, it appears
that the physical processes of tRNA diffusion (i) and of binding
and proofreading (ii) limit the speed of ribosomal translation.
Cellular Actions May Be Broadly Rate-Limited by Protein Motions
Of course, there are very many metabolic rates in the cell. Figure a summarizes a
broad range of enzyme actions, indicating a predominant time scale
around 10–1000 ms.[122] What limits
their rates? Typical enzyme reactions are often parsed into the following
steps:
Figure 11
(a) Distribution of protein and ribosomal catalytic rates in prokaryotes
and eukaryotes.[122] Ribosomal catalytic
rates are remarkably similar to the proteome-wide averages. (b) Catalytic
rates often closely follow those of the functional low-frequency motions
of proteins. Mesophilic adenylate kinase (●),[123] thermophilic adenylate kinase (○),[123] T4 lysozyme (■),[133] triosephosphate
isomerase (◀),[134] ribonuclease binase
(▶),[135] RNase A (▼),[136] and cyclophilin A (◆).[137] (c) Enzyme catalysis slows down with increasing solvent
viscosity in different concentrations of trehalose (○).[131] Part a adapted with permission from ref (122). Copyright 2011 American
Chemical Society. Part c reprinted with permission from ref (131). Copyright 2004 Springer.

Among these steps, the chemical
reaction step itself is often fast.
The rate of collision between proteins and small diffusing ligands
is on the order of $10^{8}\,\mathrm{M^{-1}\,s^{-1}}$, implying a time scale of 0.1 ms for typical ligand concentrations
of 0.1 mM.[122] Hence, the rate-limiting
steps for enzyme actions appear to be the other steps in eq, namely, the opening and closing,
binding, and product release steps.[123−127] These steps can be limited by protein dynamics.
Evidence for this view comes from the close correspondence between
catalytic rates and the rates of functional motions observed across
many proteins, as shown in Figure b. However, enzymatic efficiency can be enhanced by
other, subtler mechanisms as well. For example, binding of allosteric
effectors can induce fluctuations[128] and
alter the conformational landscape, either by facilitating conformational
transitions or by altering the width of the free-energy basin[129] and site-specific local flexibility.[130] Partitioning of flux between different pathways
can also enhance turnover rates.[128] In
spite of these subtleties, the overall role of protein dynamics in
enzymatic turnover is clear (Figure b). Furthermore, enzyme actions often slow down with
increased solvent viscosity[131] (Figure c). This is consistent
with the observed effect of solvent viscosity on loop closure, which
is rate-limiting for catalysis in some enzymes.[132]

So, if cell duplication speeds are ultimately limited
by protein
motions, why can proteins not wiggle any faster than they do? First,
protein conformational energy landscapes are naturally rugged, even
along directions of large-amplitude motions.[138] Second, large motions require moving against friction (“wet”
friction of the solvent and “dry” friction from internal
motions[139−142]). Third, some motions require local unfolding of secondary structures,[138] and that depends on protein folding stability,
which is usually marginal.[127,138] Fourth, the protein
conformation that binds the substrate is often sparsely populated, so binding
requires waiting for the right fluctuation. Lastly, there are trade-offs
between high affinity for the substrate and stabilization of the transition
state conformation.[127] In summary, the
evidence compiled here indicates that cell duplication speeds are
limited by ribosomal and enzyme actions, which are in turn limited
typically by the diffusion of substrate and the motions of protein
molecules as they slosh and contort in the solvent.
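The collision estimate quoted earlier in this section (a bimolecular rate constant of about $10^{8}\,\mathrm{M^{-1}\,s^{-1}}$ at a ligand concentration of 0.1 mM) is a pseudo-first-order calculation: the encounter rate is $k_{on}[L]$, and its inverse is the mean waiting time between encounters. A minimal check, with an illustrative function name:

```python
def mean_encounter_time_s(k_on_per_molar_s: float, ligand_molar: float) -> float:
    """Mean time between encounters for a pseudo-first-order process: 1 / (k_on * [L])."""
    rate_per_s = k_on_per_molar_s * ligand_molar
    return 1.0 / rate_per_s


if __name__ == "__main__":
    t = mean_encounter_time_s(1e8, 1e-4)  # 1e8 M^-1 s^-1 at 0.1 mM
    print(f"{t * 1e3:.2f} ms")             # 0.10 ms
```

This is two to four orders of magnitude shorter than the predominant 10–1000 ms enzyme time scale quoted above, which is consistent with the argument that steps other than diffusional encounter set enzyme rates.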
Salts Can Slow Down Cell Growth by Slowing the Rates of Movement of Proteins inside Cells
High salt concentrations can slow
down the growth of bacteria; this is why salts are used to pickle foods and to
preserve meats. Here,
we describe a mechanism for bacterial salt growth laws: adding external
salt contributes an osmotic pressure that draws water out of the cell,
causing the density of proteins inside the cell to increase, leading
to more sluggish transport of the proteins throughout the cell’s
cytoplasm, and reducing the cell’s growth rate. Experimental
data show a correlation between cellular growth rate and specific
reactions such as translation speed[110,143] and other
key metabolic reactions.[144] To obtain the
salt growth law, we suppose that growth rates of cells are proportional
to the protein–protein collision rates ($r_d$) inside the cell that result from protein diffusional transport.

We hypothesize that biomolecular crowding has two opposing effects
on reactions: (i) it increases the concentration of interacting species,
but (ii) it hinders and slows the diffusion of the reactants.
The combination of these two effects predicts a protein diffusional
rate $r_d$ that is proportional to $\phi D(\phi)$, where $\phi$ is the protein volume fraction
and $D(\phi)$ is the diffusion constant, which depends
on the crowding fraction. The reduction of diffusion due to volume-excluding
monodisperse hard-sphere crowders can be approximated by a simple
formula, $D(\phi) \approx D_0 (1 - \phi/\phi_c)^2$, where $D_0$ is the diffusion constant in the limit of no crowding
and $\phi_c$ denotes the volume fraction at which diffusion
critically slows down, estimated to be $\phi_c \approx 0.58$.[11,145,146] The protein–protein
collision rate is therefore

$$r_d \propto \phi D(\phi) \approx \phi D_0 \left(1 - \frac{\phi}{\phi_c}\right)^2$$

Maximizing $r_d$ with
respect to $\phi$ yields the optimal volume fraction $\phi_{\mathrm{opt}} \approx \phi_c/3 \approx 0.19$, close to
the typical protein volume fraction (around 0.2) inside a cell.[11] We can compare this model’s predictions
to experiments on bacterial growth rate as a function of salt and
crowding volume fraction.[147]

To account
for heterogeneous protein sizes, two ingredients are
needed. First, we have used the hard-particle theory of Minton,[148] with its parameters, to estimate how D(ϕ) varies with protein size. This model correctly
captures the observed decrease in diffusion with increasing particle
size.[148] Second, we need to know which
particular protein or proteins are responsible for the diffusion limit
to cell growth.

Figure a shows
two different assumptions regarding which proteins are rate-limiting.
First, the red curve supposes that all the proteins in the proteome
participate in growth; it is obtained by averaging the reaction flux over the
molecular weight distribution of the whole E. coli proteome. Second, an argument has been made[143] that one particular type of biomolecule may have an outsized
influence on cell dynamics, namely, the tRNA-EF-Tu complexes, the
70 kDa particles that bring the tRNA molecules to the ribosome
to elongate the growing peptide chain. As we have argued
in the previous section, protein translation, which depends on the
rates of amino acid incorporation, may be rate-limiting for cell growth.
The basic translation speed of incorporating one amino acid at a time
can be further slowed in the presence of crowding due to compromised
diffusion. Might the diffusion of the tRNA-EF-Tu complex be growth-limiting?
Because this complex is large, it is expected to diffuse slowly to the ribosome in
the crowded cell environment. This diffusion-bottleneck hypothesis
is supported by a recent study showing that ribosomes and tRNA are
maintained close to the ratios predicted from diffusion arguments
to optimize cell-wide translation rates.[143] The black curve in Figure shows the model prediction when the diffusion of tRNA-EF-Tu
complexes is considered to be rate-limiting.
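The optimum volume fraction quoted earlier follows from setting the derivative of $\phi(1 - \phi/\phi_c)^2$ to zero, which gives $(1 - \phi/\phi_c)(1 - 3\phi/\phi_c) = 0$ and hence $\phi_{\mathrm{opt}} = \phi_c/3$. The sketch below confirms this numerically for the single-size hard-sphere formula with $\phi_c = 0.58$. It is only a check of that simple formula; it does not include the size-dependent Minton treatment or the proteome averaging behind the curves discussed above, and the function names are illustrative.

```python
PHI_C = 0.58  # volume fraction at which diffusion critically slows (from the text)


def relative_collision_rate(phi: float, phi_c: float = PHI_C) -> float:
    """r_d / D0 ~ phi * (1 - phi/phi_c)**2 for monodisperse hard spheres (0 <= phi <= phi_c)."""
    return phi * (1.0 - phi / phi_c) ** 2


def optimal_volume_fraction(phi_c: float = PHI_C, steps: int = 100_000) -> float:
    """Grid search for the volume fraction that maximizes the relative collision rate."""
    grid = (phi_c * i / steps for i in range(steps + 1))
    return max(grid, key=lambda phi: relative_collision_rate(phi, phi_c))


if __name__ == "__main__":
    print(f"numerical optimum: {optimal_volume_fraction():.4f}")  # 0.1933
    print(f"analytic phi_c/3:  {PHI_C / 3:.4f}")                  # 0.1933
```

A grid search is used instead of an optimizer so the check stays dependency-free; the rate vanishes at both ends of the interval, so the single interior maximum is easy to locate.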
Figure 12
(a) Growth rate as a
function of crowding volume fraction is well-captured
by the hard-particle model of Minton.[148] (b) Cell crowding has similar consequences on the rate of gene expression.[149] (c) A high-dimensional fitness landscape (as
a function of volume fraction (ϕ) and arbitrary reaction coordinate
ξ) on which part a represents a single slice. Part b is reprinted
with permission from ref (149). Copyright 2013 Macmillan Publishers Ltd.

Of course, other factors will matter too in the
balance of salt
and volumes of the cell, including ion fluxes, their regulation, and
the balance of ATP.[150] The model described
above aims only to give a simple estimate of the protein diffusional
factor. Cellular crowding is known to affect many physiological processes.[151] Crowding can also affect gene expression levels
(Figure b), which reach
a maximum before decreasing at higher densities.[149,152] Recent work has also shown that the cytoplasm can exhibit glassy properties.[153,154] The nature of the cytoplasmic environment depends on the size of
the cellular objects; for example, small objects experience the cytoplasm
as a liquid background, while large macromolecules experience a solid-like
environment.[154] Interestingly, metabolism
can also tune the fluidity of the cytoplasm, allowing transport of
large cellular components whose mobility would otherwise be severely
constrained. Thus, switching between different metabolic states
under varying environmental conditions can alter dynamics, cell physiology,
and ultimately cellular fitness.[154]

Figure c shows how such
relationships represent single slices through a high-dimensional fitness
landscape that we are only beginning to understand.
Summary
While many behaviors of cells emerge from their unique biology,
they are fundamentally constrained by the common physics that unites
them. Here, we review simple arguments about how these fundamental
limits are encoded within the collective physical properties of proteins
and proteomes. We describe the role of proteome physics in cell growth
laws, providing mechanisms for how cell growth speeds up with temperature
and how high salt concentrations slow it down. Electrostatics models
give mechanistic insight into the stability gain in thermophiles and
the oxidative stability loss in aging and disease. Furthermore, kinetic
models of protein folding applied on a global scale show how folding
times may be limited by the rate of degradation. We also note that
cell growth appears to be rate-limited by the ribosomal action of
adding amino acids to growing protein chains and by the protein motions
responsible for enzyme actions. In short, physics can give qualitative
and quantitative insights into the growth properties of cells through
the use of simple physical models. We believe such global-scale models,
guided by physicochemical principles, will be increasingly sought
after to understand cellular phenotypes and evolution.