Estimating computational limits on theoretical descriptions of biological cells.

Abstract

There has been much success recently in theoretically simulating parts of complex biological systems on the molecular level, with the goal of first-principles modeling of whole cells. However, there is the question of whether such simulations can be performed because of the enormous complexity of cells. We establish approximate equations to estimate computation times required to simulate highly simplified models of cells by either molecular dynamics calculations or by solving molecular kinetic equations. Our equations place limits on the complexity of cells that can be theoretically understood with these two methods and provide a first step in developing what can be considered biological uncertainty relations for molecular models of cells. While a molecular kinetics description of the genetically simplest bacterial cell may indeed soon be possible, neither theoretical description for a multicellular system, such as the human brain, will be possible for many decades and may never be possible even with quantum computing.

Keywords: biological complexity; biological uncertainty principle; computational limits

Year: 2021 PMID: 33495364 PMCID: PMC8017709 DOI: 10.1073/pnas.2022753118

Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205

In 1966 Crick asserted: “The ultimate aim of the modern movement in biology is to explain all biology in terms of physics and chemistry (1).” Our interpretation of Crick’s comment is that explanation at a deep level will come from physics and chemistry theory. With the advent of computers and their use as a major tool in scientific research, theorists have come to rely more and more on simulations, rather than on analytical models, to understand physical, chemical, and biological systems. Simulations are no longer used just to check the range of validity of theoretical equations, but also to understand experimental results for systems that are too complex to be treated with analytical models. The challenge for biologically oriented theorists, of course, is not simply to run simulations, such as molecular dynamics, but to gain an improved understanding of how a particular biological system works from an insightful analysis of the trajectories. By concentrating on a set of global variables that are assumed or known to be important for the functioning of a cell, such as mean concentrations of metabolites, RNA, and proteins, coarse-grained descriptions of cell dynamics are now becoming available from developments in the field of systems biology and compare very favorably with experimental data (2). The computational burden of such coarse-grained simulations is readily manageable by modern computers, but the selection of the global variables, which is crucial, and the neglect of spatial concentration variations may influence the results obtained. In contrast, simulations that resolve molecular and spatial detail do not rely on choosing relevant coarse-grained degrees of freedom beforehand, but pose fundamental problems in terms of computational effort.
Motivated in part by the recent success of simulating subcellular biological organelles and the goal of simulating whole cells with molecular resolution (3, 4), the question arises of how complex a system can be before it can no longer be simulated on a computer. To make a quantitative assessment of the level of complexity that can be simulated at the molecular level, we consider here the two main molecular methods currently being employed (3, 4), namely the physics and chemistry methods of molecular dynamics simulations and the probabilistic methods of solving molecular kinetics equations. We consider the simplest cell capable of reproducing itself, the bacterium Mycoplasma genitalium, and the most important and most interesting multicellular system, the human brain.

Results

A force-field–based molecular dynamics simulation consists of solving Newton’s equations of motion for every atom of every molecule using simplified, empirical interatomic energy functions to determine the position and velocity of every atom in the system as a function of time. These molecular mechanics energy functions do not consider the electronic degrees of freedom, which are essential for describing the making and breaking of covalent bonds or electron transfer, as occur in the many chemical reactions in a biological cell. Chemical reactions require the use of quantum mechanics, which is much more time-consuming because the interatomic forces must be obtained by solving Schrödinger’s equation rather than from empirical functions. Fortunately, the quantum-mechanical calculations are only necessary for the atoms of the active sites of the enzymes and their bound substrates. The remainder of the protein, the unbound substrates, and the aqueous solvent can be simulated with acceptable accuracy using force fields. This combination of quantum mechanics and molecular mechanics (QM/MM) for proteins originated with the work of Warshel and Levitt (5) and is now extensively used (6). The approximate computational time (T) for a QM/MM simulation of a system of N atoms, which contains a total of p regions with n atoms each that are treated quantum mechanically, can be written as
$$T = \frac{T_{\mathrm{sim}}}{\Delta}\,\frac{1}{\nu}\left(\alpha N \ln N + \beta\, p\, n^{3}\right) \qquad [1]$$
where $T_{\mathrm{sim}}$ is the time over which the process is modeled, Δ is the time discretization step, and ν is the computer speed in floating-point operations per second (flops). The classical contribution (first term) is dominated by calculating the electrostatic interactions between the partial charges on all the atoms of the protein, which scales as α N ln N, where the numerical prefactor that counts the number of floating-point operations, α, is on the order of 10 (7). 
The quantum-mechanical part (second term) is on the density-functional level and is dominated by the effort required to diagonalize the Hamiltonian, which scales as the cube of the number of atoms n in the quantum regions of the molecules. The numerical prefactor, β, is on the order of $\sim 10^{4}$. Importantly, both terms in Eq. 1 scale essentially linearly in system size, i.e., linearly in N or p.
An M. genitalium cell contains a total of $\sim 3 \times 10^{9}$ atoms [spherical cell volume of radius 0.2 μm (12) times the atom number density of water of $10^{11}$ atoms per $\mu\mathrm{m}^{3}$]. Of the ∼77,000 protein molecules in the cell (12), ∼26,000 (= p) (12) are enzymes with active sites. Assuming n = 100 atoms, Δ = $10^{-15}$ s, and ν = $10^{17}$ flops (the speed of the currently fastest supercomputer: the “Summit” at Oak Ridge), T for the 2-h doubling time (12) of the bacterium is $\sim 10^{9}$ y, where >99% of the computation time is for the QM part. Although such a QM/MM calculation cannot be performed now, will it be possible in the future? Over the past 25 y, the speed of supercomputers has increased roughly 10-fold every 5 y, as predicted by Moore’s law (13). However, it is not clear whether computer speed will continue to increase exponentially at this rate for the next ∼50 y, which would be needed to shrink the computational time down to a month.
The most important and most interesting multicellular system is, of course, the human brain, where an ultimate goal of science is to understand thinking, memory, and behavior. Given a particular stimulus, for example, an accurate simulation may be able to explain or predict a response. This multicellular system contains $\sim 10^{11}$ neurons (14), $\sim 10^{11}$ proteins per neuron (15), and an estimated $\sim 10^{26}$ atoms for the average human brain of 1,200 $\mathrm{cm}^{3}$, calculated as above ($1.2 \times 10^{15}$ $\mu\mathrm{m}^{3}$ × $10^{11}$ atoms per $\mu\mathrm{m}^{3}$), so the situation is quite different. 
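The bacterial estimate can be checked with a few lines of arithmetic. The sketch below assumes the computation time is the number of time steps multiplied by the per-step cost described in the text (a classical term α N ln N plus a quantum term β p n³), divided by the computer speed ν. The function names and the value used for the length of a year are illustrative choices, not part of the original work.

```python
import math

SECONDS_PER_YEAR = 3.156e7  # assumed approximate length of one year


def qmmm_flops_per_step(n_atoms, n_qm_regions, atoms_per_region,
                        alpha=10.0, beta=1.0e4):
    """Return (classical, quantum) floating-point operations per time step."""
    classical = alpha * n_atoms * math.log(n_atoms)        # alpha * N * ln N
    quantum = beta * n_qm_regions * atoms_per_region ** 3  # beta * p * n^3
    return classical, quantum


def qmmm_compute_seconds(modeled_seconds, time_step, flops,
                         n_atoms, n_qm_regions, atoms_per_region):
    """Wall-clock seconds needed to simulate `modeled_seconds` of real time."""
    steps = modeled_seconds / time_step
    classical, quantum = qmmm_flops_per_step(n_atoms, n_qm_regions,
                                             atoms_per_region)
    return steps * (classical + quantum) / flops


# Parameters quoted in the text for M. genitalium.
classical, quantum = qmmm_flops_per_step(3e9, 26_000, 100)
qm_fraction = quantum / (classical + quantum)

seconds = qmmm_compute_seconds(
    modeled_seconds=2 * 3600,  # 2-h doubling time
    time_step=1e-15,           # 1 fs
    flops=1e17,                # speed quoted for "Summit"
    n_atoms=3e9, n_qm_regions=26_000, atoms_per_region=100,
)
years = seconds / SECONDS_PER_YEAR

print(f"computation time: {years:.1e} y")
print(f"QM share of per-step cost: {qm_fraction:.2%}")
```

With these inputs the result is several times 10^8 y, consistent with the order-of-magnitude figure of ∼10^9 y, and the quantum term accounts for more than 99% of the cost.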
Using a conservative guess that the active-site complexes of only $10^{9}$ of the $10^{11}$ proteins in the average neuron must be treated quantum mechanically, a 1-h simulation would take $\sim 10^{24}$ y for the quantum-mechanical part and $\sim 10^{23}$ y for the Newtonian part. It seems unlikely that computer speed will continue to increase at the same rate for the next 125 y (after which a brain QM/MM simulation could be done in a month). We are, therefore, forced to conclude that, while an atomistic molecular dynamics simulation including quantum effects of a single bacterial cell may be possible in this century, such simulations of a human brain for even 1 h will not be possible until much later and may never be possible. Even if quantum computing could be adapted for molecular dynamics calculations, an enormous speed-up would be needed in order for such simulations to be performed in a reasonable time (16).
An alternative, albeit much more approximate, approach to the problem is to describe cells at a probabilistic rather than an explicit particle level. In such a description, only the spatially coarse-grained probability distribution of each type of molecule as a function of time is considered (3). Because molecules can diffuse from one part of a cell to another to chemically react or simply bind to another molecule, it is necessary to solve a set of partial differential equations, called the reaction–diffusion Master equation. The solution to this equation yields the probabilities of finding the number of each molecular type (protein, bound complex, lipid, nucleic acid, metabolite, ion, etc.) at a given position in a cell as a function of time. 
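The waiting times quoted for the molecular dynamics estimates (about 50 y for the bacterium, about 125 y for the brain) follow from the stated growth rate of a 10-fold speed increase every 5 y. A minimal sketch of that extrapolation, assuming the target is a computation that finishes in one month and that the growth rate stays constant; the function name and default arguments are hypothetical:

```python
import math


def years_until_feasible(compute_years_today, target_years=1.0 / 12.0,
                         speedup_per_period=10.0, period_years=5.0):
    """Years of exponential hardware growth needed to reach the target time.

    Assumes computer speed grows by `speedup_per_period` every
    `period_years`, as in the Moore's-law rate quoted in the text.
    """
    required_speedup = compute_years_today / target_years
    periods = math.log10(required_speedup) / math.log10(speedup_per_period)
    return period_years * periods


bacterium_wait = years_until_feasible(1e9)   # QM/MM, one bacterial cell
brain_wait = years_until_feasible(1e24)      # QM part, human brain, 1 h

print(f"bacterium: about {bacterium_wait:.0f} y")
print(f"brain:     about {brain_wait:.0f} y")
```

Because the wait grows only logarithmically with the required speed-up, fifteen additional orders of magnitude in computation time add 75 y rather than an unbounded delay, provided the exponential trend actually persists.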
A rough estimate of the time (T) required to solve the reaction–diffusion Master equation by simulating it as molecules jumping between subvolumes (voxels) on a lattice mesh and reacting within the voxels reads
$$T = \gamma\,\frac{T_{\mathrm{sim}}}{\Delta}\,\frac{1}{\nu}\,M\,m\,K\left(\frac{L}{l}\right)^{3} \qquad [2]$$
where $T_{\mathrm{sim}}$ is again the time over which the process is modeled, M is the number of different reactive molecular species in a cell treated as a bag of molecules, m is the typical number of specific and nonspecific possible reactions per molecular species, L is the linear cell size, l is the spatial discretization size needed to accurately describe the concentration profile of each different species, K is the maximum copy number of each species per discretization volume element (voxel) treated in the Master equation, Δ is the time step in simulating the Master equation, and ν is the computer speed in flops. The numerical prefactor γ is on the order of $10^{2}$ and accounts for the computational expense of one iteration step.
The number of different molecular species in a Mycoplasma genitalium bacterial cell (M) is at least ∼500 (12), which is the number of different proteins and does not include small molecules, posttranslationally modified proteins, or complexes that would have to be treated as separate species in the reaction–diffusion Master equation. The mean number of reactions per molecular species can be estimated as m = 10. To obtain the concentration profile for this cell with L = 400 nm, a discretization of l = 10 nm can be used with K = 1,000 copy numbers in each voxel as a safe upper bound. To account for the fastest unimolecular and bimolecular reactions, a time step of Δ = 1 μs may be sufficient. With our highly oversimplified model that considers a bacterial cell as a bag of ∼500 different molecular species, the time (T) required from Eq. 2 for the Oak Ridge computer to simulate a single bacterial cell for its 2-h doubling time with the above parameters is roughly 1 mo. 
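The 1-mo figure can be reproduced from the quoted parameters. The sketch below assumes the per-step cost is the product γ · M · m · K · (L/l)³, which is a reading of the definitions given in the text, and that the total is the number of time steps times that cost divided by ν. The function and argument names are illustrative.

```python
def rdme_compute_seconds(modeled_seconds, time_step, flops,
                         n_species, reactions_per_species,
                         cell_size, voxel_size, max_copies, gamma=100.0):
    """Wall-clock seconds for a lattice reaction-diffusion simulation.

    Assumed cost model: gamma * M * m * K * (L / l)**3 floating-point
    operations per time step.
    """
    steps = modeled_seconds / time_step
    voxels = (cell_size / voxel_size) ** 3
    flops_per_step = (gamma * n_species * reactions_per_species
                      * max_copies * voxels)
    return steps * flops_per_step / flops


seconds = rdme_compute_seconds(
    modeled_seconds=2 * 3600,   # 2-h doubling time
    time_step=1e-6,             # 1 microsecond
    flops=1e17,                 # speed quoted for the Oak Ridge computer
    n_species=500,              # M
    reactions_per_species=10,   # m
    cell_size=400e-9,           # L = 400 nm
    voxel_size=10e-9,           # l = 10 nm
    max_copies=1000,            # K
)
days = seconds / 86_400

print(f"computation time: about {days:.0f} days")
```

The result is a little under four weeks, in line with the rough figure of 1 mo. The cubic dependence on L/l means that halving the voxel size alone would multiply the cost by eight.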
Therefore, the limiting factor is not the computational time, but the time required to experimentally or theoretically determine accurate forward and reverse rate coefficients for all relevant chemical reactions and intermolecular interactions in the cell.
Simulating a human brain with $\sim 10^{11}$ neurons (14) is again a wholly different matter. Using the bacterial values for parameters other than M = 4,000 (17) and L = 10 μm (15), the computation time from Eq. 2 with current computing power is increased by a factor of ∼10 for the larger number of different proteins, by a factor of $10^{11}$ for the number of neurons compared to a single bacterial cell, and by an additional factor of $(25)^{3} \sim 10^{4}$ for the difference in L, to give T ∼ $10^{15}$ y for a 1-h simulation. Consequently, a 1-mo calculation for an enormously oversimplified treatment of the brain for 1 h of real time as a collection of bags of molecules would not be practical for about 80 y (again using Moore’s law, which will not necessarily hold for the next 80 y). So, as with the molecular dynamics calculations, we conclude that simulating the brain at the molecular level with a reaction–diffusion Master equation and a realistic model may not happen for a very long time, and may never be possible, because of both limits on computational capability and the need to determine rates for all processes in a realistic model of the brain.
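The brain estimate is a rescaling of the bacterial one, so it can be checked by multiplying the individual factors. The sketch below uses the unrounded ratios and also accounts for the shorter modeled time (1 h instead of 2 h), which is an added assumption here; the text quotes rounded factors. Variable names are illustrative.

```python
import math

bacterium_months = 1.0                     # bacterial estimate from the text

species_factor = 4000 / 500                # M: 4,000 versus ~500 species
neuron_factor = 1e11                       # number of neurons
volume_factor = (10e-6 / 400e-9) ** 3      # (L_neuron / L_bacterium)^3
duration_factor = 1 / 2                    # 1 h modeled instead of 2 h

brain_months = (bacterium_months * species_factor * neuron_factor
                * volume_factor * duration_factor)
brain_years = brain_months / 12

# Years of 10-fold-per-5-y growth needed to bring this down to one month.
wait_years = 5 * math.log10(brain_months / 1.0)

print(f"brain computation time: {brain_years:.1e} y")
print(f"wait until a 1-mo run is possible: about {wait_years:.0f} y")
```

The unrounded product comes out somewhat below 10^15 y, which agrees with the quoted value to within the order-of-magnitude precision of the estimate, and the extrapolated wait is close to the 80 y stated above.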

Discussion

Our estimates of the computational time required to simulate highly simplified models of cells indicate that the simplest bacterial cell may be theoretically described in the not too distant future by solving molecular kinetics equations. Simulation of this cell by molecular dynamics calculations will take much longer, and would be feasible in ∼50 y if Moore’s law continues to hold. However, our estimates suggest that simulation of a multicellular system such as the human brain may never be possible, presumably even if quantum computing could be adapted for such calculations. We must emphasize that simulating parts of biological cells has yielded (3, 4), and will continue to yield, extremely valuable information. Moreover, enormously important insights can be obtained on how any cell or multicellular system functions, including the human brain, by coarse graining or alternative descriptions that do not include molecular detail. In fact, many researchers share the view that the most important advances in theoretical understanding of complex biological systems will not come from the detailed molecular simulations we have discussed in this work, but from the discovery of collective organizing principles that may be independent of such microscopic details (18, 19). New theoretical approaches arising from the growing field of systems biology may also provide answers to important questions that are unanswerable by the methods we have discussed (20). In physics, the Heisenberg uncertainty principle places limits on the precision in determining pairs of values for a single particle, such as position and momentum. The notion of uncertainty in biology is quite different (21) and there is, as yet, no biological uncertainty principle, although the well-known inability to determine the exact nucleotide sequence of a cell’s genome due to experimental errors has been proposed as one (22). 
In biology, of much greater importance is understanding how cells function, which immediately raises the question we have addressed here of what level of complexity can possibly be simulated on a computer. Eqs. 1 and 2 place limits on the complexity of cells that can be understood with current theoretical molecular methods. They therefore represent a first step in developing equations for realistic molecular models that could be considered biological uncertainty relations. We hope that the work presented here will stimulate thinking on the larger issue of formulating a comprehensive biological uncertainty principle.

17 in total

1. The middle way.

Authors: R B Laughlin; D Pines; J Schmalian; B P Stojkovic; P Wolynes
Journal: Proc Natl Acad Sci U S A Date: 2000-01-04 Impact factor: 11.205

2. How fast-folding proteins fold.

Authors: Kresten Lindorff-Larsen; Stefano Piana; Ron O Dror; David E Shaw
Journal: Science Date: 2011-10-28 Impact factor: 47.728

3. Direct energy functional minimization under orthogonality constraints.

Authors: Valéry Weber; Joost VandeVondele; Jürg Hutter; Anders M N Niklasson
Journal: J Chem Phys Date: 2008-02-28 Impact factor: 3.488

4. Theoretical studies of enzymic reactions: dielectric, electrostatic and steric stabilization of the carbonium ion in the reaction of lysozyme.

Authors: A Warshel; M Levitt
Journal: J Mol Biol Date: 1976-05-15 Impact factor: 5.469

5. Theoretical aspects of Systems Biology. (Review)

Authors: Mariano Bizzarri; Alessandro Palombo; Alessandra Cucina
Journal: Prog Biophys Mol Biol Date: 2013-04-03 Impact factor: 3.667

6. A whole-cell computational model predicts phenotype from genotype.

Authors: Jonathan R Karr; Jayodita C Sanghvi; Derek N Macklin; Miriam V Gutschow; Jared M Jacobs; Benjamin Bolival; Nacyra Assad-Garcia; John I Glass; Markus W Covert
Journal: Cell Date: 2012-07-20 Impact factor: 41.582

7. Toward Hamiltonian Adaptive QM/MM: Accurate Solvent Structures Using Many-Body Potentials.

Authors: Jelle M Boereboom; Raffaello Potestio; Davide Donadio; Rosa E Bulo
Journal: J Chem Theory Comput Date: 2016-07-11 Impact factor: 6.006

8. Simulating biological processes: stochastic physics from whole cells to colonies.

Authors: Tyler M Earnest; John A Cole; Zaida Luthey-Schulten
Journal: Rep Prog Phys Date: 2018-02-09

9. Essential metabolism for a minimal cell.

Authors: Marian Breuer; Tyler M Earnest; Chuck Merryman; Kim S Wise; Lijie Sun; Michaela R Lynott; Clyde A Hutchison; Hamilton O Smith; John D Lapek; David J Gonzalez; Valérie de Crécy-Lagard; Drago Haas; Andrew D Hanson; Piyush Labhsetwar; John I Glass; Zaida Luthey-Schulten
Journal: Elife Date: 2019-01-18 Impact factor: 8.140
