Jane R Allison.
Abstract
Proteins are dynamic molecules that can transition between a potentially wide range of structures comprising their conformational ensemble. The nature of these conformations and their relative probabilities are described by a high-dimensional free energy landscape. While computer simulation techniques such as molecular dynamics simulations allow the metastable conformational states and the transitions between them, and thus free energy landscapes, to be characterised, the barriers between states can be high, precluding efficient sampling without substantial computational resources. Over the past decades, a dizzying array of methods has emerged for enhancing conformational sampling, and for projecting the free energy landscape onto a reduced set of dimensions, known as collective variables (CVs), that allow conformational states to be distinguished and along which sampling may be directed. Here, a brief description of what biomolecular simulation entails is followed by a more detailed exposition of the nature of CVs and methods for determining these, and, lastly, an overview of the myriad different approaches for enhancing conformational sampling, most of which rely upon CVs, including new advances in both CV determination and conformational sampling due to machine learning.
Keywords: collective variables; conformational ensemble; enhanced sampling; machine learning; molecular dynamics; proteins
Year: 2020 PMID: 32756904 PMCID: PMC7458412 DOI: 10.1042/BST20200193
Source DB: PubMed Journal: Biochem Soc Trans ISSN: 0300-5127 Impact factor: 5.407
Figure 1. Illustration of projection of a free energy landscape onto commonly used CVs.
(a) Ramachandran maps project the conformational free energy landscape onto the backbone φ and ψ dihedral angle values. The example shown here is for a 100 ns MD simulation of hen egg white lysozyme (PDB ID: 1aki). (b,c) Projections of the conformational free energy landscape onto a single CV: (b) ψ and (c) φ. All angle values are in degrees. Projection of the free energy landscape onto the combination of both backbone dihedral angles is useful because it clearly separates the two major regions of secondary structure, namely (right-handed) α-helices and β-strands, although it is less effective at providing a more detailed degree of separation, such as between parallel and antiparallel β-strands — for this, additional CVs are required. ψ alone (b) could be a useful CV, as it preserves this separation, whereas projection onto φ (c) conflates α-helical and β-strand structure.
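
The projection described in this caption can be sketched numerically. The example below is a minimal illustration, not the procedure used to produce the figure: it assumes the standard relation F(s) = -k_B T ln P(s) between the free energy and the probability distribution along a single CV, and estimates P(s) with a simple histogram. The function name, bin count and temperature are illustrative choices.

```python
import math

# Molar gas constant in kJ/(mol K), so free energies come out in kJ/mol.
K_B = 0.0083144626


def free_energy_profile(samples, n_bins=36, lo=-180.0, hi=180.0, temperature=300.0):
    """Estimate F(s) = -kT ln P(s) along one CV from sampled CV values.

    Returns (bin_centres, free_energies). The lowest free energy is shifted
    to zero, and bins that were never visited are assigned math.inf.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in samples:
        idx = int((s - lo) / width)
        idx = max(0, min(idx, n_bins - 1))  # clamp edge values into range
        counts[idx] += 1
    total = sum(counts)
    kT = K_B * temperature
    free = [-kT * math.log(c / total) if c > 0 else math.inf for c in counts]
    f_min = min(free)
    centres = [lo + (i + 0.5) * width for i in range(n_bins)]
    return centres, [f - f_min for f in free]
```

Applied to a list of ψ values from a trajectory, the returned profile would play the role of panel (b); a two-dimensional histogram over φ and ψ would be the analogue of panel (a).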
Category 1 (no/general CV): Sampling enhanced by scaling the temperature
| Name | Description | Citations |
|---|---|---|
| Simulated annealing | System is heated and then gradually cooled. May involve multiple iterations to sample different minima on the free energy landscape. One of the oldest techniques, but recently shown to increase sampling by at least an order of magnitude. Does not sample from a Boltzmann distribution. | [ |
| Simulated tempering | Like simulated annealing, but samples from a Boltzmann distribution. | [ |
| T-REMD: | Multiple independent replicas are run in parallel at different temperatures, with exchanges of coordinates attempted at regular intervals. Sensitive to the choice of control parameters; there is a substantial literature on their optimisation. | [ |
| R-REMD: | T-REMD with the highest temperature replica replaced with a pre-generated reservoir of structures. Dependent on reservoir adequately covering conformational space. | [ |
| M-REMD: | T-REMD with several independent simulations at each temperature. Exchanges can occur between these and between temperatures. Takes advantage of highly parallel computing. | [ |
| TAMD: | Explores free-energy landscape of a large set of CVs at the physical temperature using an artificially high fictitious temperature. | [ |
| REST and REST2: | Only the temperature of the solute differs between replicas. Increases the probability of exchanges by reducing the effective system size compared at each exchange attempt. | [ |
| SGLD: | SGLD increases the temperature of low-frequency motions only, with the SGLD temperatures scaled across replicas. The implementation of Wu et al. uses the SGLD partition function to remove the problems caused by the | [ |
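
The exchange step shared by the temperature replica exchange methods in this table can be sketched as follows. This is a minimal illustration assuming the standard Metropolis criterion for swapping two replicas, with acceptance probability min(1, exp[(β_i - β_j)(E_i - E_j)]), where E is the potential energy of the configuration currently held at each temperature. The geometric temperature ladder is one common choice of control parameter and is an assumption here, not a recommendation from the source; all function names are hypothetical.

```python
import math
import random

K_B = 0.0083144626  # molar gas constant in kJ/(mol K)


def exchange_probability(e_i, e_j, t_i, t_j):
    """Metropolis acceptance probability for swapping replicas i and j.

    e_i and e_j are the potential energies (kJ/mol) of the configurations
    currently simulated at temperatures t_i and t_j (K).
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_exchange(e_i, e_j, t_i, t_j, rng=random):
    """Return True if the swap is accepted for this attempt."""
    return rng.random() < exchange_probability(e_i, e_j, t_i, t_j)


def geometric_ladder(t_min, t_max, n_replicas):
    """Temperatures spaced by a constant ratio (an assumed, common choice)."""
    if n_replicas < 2:
        raise ValueError("need at least two replicas")
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** k for k in range(n_replicas)]
```

A swap is always accepted when the colder replica currently holds the higher-energy configuration; otherwise it is accepted with a probability that falls as the temperature gap and energy gap grow, which is why replica spacing affects exchange rates.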
Figure 2. Schematic illustration of three key enhanced sampling methods.
In all cases, the black line represents a free energy landscape projected onto a single CV, for simplicity. (a) Replica exchange MD, in which multiple independent replicas are run under different conditions, such as at increasingly high temperatures (red to yellow lines), which smooth the free energy landscape; (b) umbrella sampling, where the blue harmonic potentials represent the ‘umbrellas’ that restrain conformational sampling along the CV; (c) metadynamics, where the potential energy surface is smoothed along one or more CVs by adding Gaussian functions (blue) to regions of the conformational space that have already been visited until ultimately (cyan) the entire surface is filled; (d) well-tempered metadynamics, where the rate and size of the Gaussian functions (blue) are reduced as sampling progresses, resulting in a smooth free energy surface (or a pre-specified distribution function, cyan) and avoiding over-filling.
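
The difference between panels (c) and (d) can be made concrete with a small sketch of a history-dependent bias on a single CV. It assumes the usual formulation in which the bias is a sum of Gaussians, V(s) = Σ_k w_k exp(-(s - s_k)² / 2σ²), and in which well-tempered metadynamics scales each new height as w_k = w_0 exp(-V(s_k) / k_B ΔT). The class name and the parameter `k_b_delta_t` are hypothetical, and the sketch only tracks the bias; it does not run any dynamics.

```python
import math


class MetadynamicsBias:
    """History-dependent Gaussian bias along a single CV.

    With k_b_delta_t=None every hill has the same height (plain metadynamics).
    With a finite k_b_delta_t (energy units), each new hill is scaled by
    exp(-V(s) / k_b_delta_t), as in well-tempered metadynamics.
    """

    def __init__(self, height, sigma, k_b_delta_t=None):
        self.height = height
        self.sigma = sigma
        self.k_b_delta_t = k_b_delta_t
        self.hills = []  # list of (centre, height) pairs

    def value(self, s):
        """Total bias at CV value s."""
        two_sigma_sq = 2.0 * self.sigma ** 2
        return sum(h * math.exp(-((s - c) ** 2) / two_sigma_sq)
                   for c, h in self.hills)

    def deposit(self, s):
        """Add a hill centred on the current CV value; return its height."""
        h = self.height
        if self.k_b_delta_t is not None:
            h *= math.exp(-self.value(s) / self.k_b_delta_t)
        self.hills.append((s, h))
        return h
```

Depositing repeatedly at the same CV value shows the behaviour the caption describes: the plain bias keeps growing by a fixed amount, whereas the well-tempered hills shrink as the accumulated bias rises, which limits over-filling.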
Category 2: Sampling enhanced along one or more CVs
| Name | Description | Citations |
|---|---|---|
| SMD: | An external force is applied to induce rare transitions along a CV to occur at a faster rate. Computational analogue of atomic force microscopy. The added force may induce physically unrealistic conformational transitions and, in general, does not sample from a Boltzmann distribution. | [ |
| US: | Uses a harmonic biasing potential to restrain the simulation to a series of windows along a pre-defined CV. If reweighted, can be used to determine the free energy surface and thus the change in free energy along the CV. | [ |
| H-REMD: | Like T-REMD, but each replica is simulated under a different Hamiltonian. Classic versions involve scaling the protein backbone and side chain dihedral angle potentials or the non-bonded interactions. | [ |
| Resolution H-REMD | Each replica is simulated at a different level of resolution, e.g. atomic-level to coarse-grained. | [ |
| Partial- and local-H-REMD | Only terms of the Hamiltonian involving the part of the system for which sampling is slow are exchanged. | [ |
| 2D-REMD | Two-dimensional H-REMD with scaling of temperature and inter-molecular interactions. Also used coarse-grained representation to calculate | [ |
| REAMD: | Combination of aMD with REMD; each replica has a different level of acceleration. Avoids the statistical reweighting problem of aMD. | [ |
| ENM-H-REMD: | Each replica is simulated with a different degree of a distance-dependent biasing potential that drives the structure away from its initial conformation in directions compatible with an ENM. Primarily enhances sampling around the initial structure. | [ |
| HS-H-REMD: | Exchanges take place between three replicas; two with either an attractive or repulsive hydrogen bonding potential added to the Hamiltonian. Similar performance to T-REMD with fewer replicas. | [ |
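
The umbrella sampling entry in this table can be illustrated with a short sketch. It assumes a harmonic restraint V_i(s) = ½ k (s - s_i)² centred on each window and evenly spaced window centres; the per-frame factor exp(+V_i / k_B T) removes the bias within a single window. Combining several windows into one free energy surface needs an additional estimator that is not shown here. All names and parameter values are illustrative.

```python
import math

K_B = 0.0083144626  # molar gas constant in kJ/(mol K)


def umbrella_windows(cv_min, cv_max, n_windows):
    """Evenly spaced window centres along a pre-defined CV."""
    if n_windows < 2:
        raise ValueError("need at least two windows")
    step = (cv_max - cv_min) / (n_windows - 1)
    return [cv_min + i * step for i in range(n_windows)]


def harmonic_bias(s, centre, force_constant):
    """Harmonic umbrella potential 0.5 * k * (s - centre)**2."""
    return 0.5 * force_constant * (s - centre) ** 2


def unbiasing_weight(s, centre, force_constant, temperature=300.0):
    """Relative weight exp(+V/kT) that undoes the bias for one frame."""
    return math.exp(harmonic_bias(s, centre, force_constant)
                    / (K_B * temperature))
```

The force constant and window spacing are usually chosen together so that the CV distributions of neighbouring windows overlap.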
Category 3: Sampling adaptively enhanced along one or more CVs
| Name | Description | Citations |
|---|---|---|
| aMD: | ‘Boost’ potential applied when potential energy drops below a user-specified cut-off to increase rate of escape from minima. Reweighting of the resulting conformational ensemble to account for the applied bias is not always straightforward. | [ |
| aUS: | Iterates between sampling along a CV according to an umbrella potential and updating the umbrella potential according to an estimate of the probability distribution along the CV to improve sampling of under-sampled regions. | [ |
| SH-US: | Automatically updates the umbrella potential on-the-fly until the umbrella potentials cancel out the free energy profile. | [ |
| Multidimensional aUS | Like aUS, but with the umbrella potentials applied across more than one CV. | [ |
| Local elevation | Generates a history-dependent bias potential by adding Gaussians centred on the currently occupied value of one or more system properties to persuade the system to visit new areas of conformational space. | [ |
| Conformational flooding | Like local elevation but formulated more generally to act on coarse-grained conformational coordinates. | [ |
| LEUS: | A short LE build-up phase is used to construct an optimized biasing potential along conformationally relevant degrees of freedom that is then used in a (comparatively longer) US sampling phase. | [ |
| Metadynamics | Like local elevation, but the biases are added to the free energy rather than potential energy surface, and the bias potential is generalised to act upon any CV or multidimensional set of CVs. | [ |
| Multiple walkers (altruistic) metadynamics | Many metadynamics runs are performed in parallel, all of which contribute to filling in the free energy landscape. | [ |
| WTE metadynamics: | The energy is used as collective variable to sample the well-tempered ensemble. Note that this is different to well-tempered metadynamics. | [ |
| Bias-exchange metadynamics | A number of independent metadynamics simulations are run in parallel, each biasing a different CV, with exchange of coordinates between biases. The REMD and metadynamics act synergistically to overcome barriers. | [ |
| Parallel-bias metadynamics | Single-replica variant of bias-exchange metadynamics in which the CV that is biased is switched during the simulation according to the Metropolis criterion, avoiding the need to have as many replicas as CVs. | [ |
| T-REMD (parallel tempering) metadynamics | Multiple metadynamics simulations are performed in parallel at different temperatures, all of which contribute to filling in the free energy landscape. Improves the exploration of low probability regions and sampling of degrees of freedom not included in the CV, but requires a large number of replicas for all but very small systems. | [ |
| REST metadynamics | Like T-REMD metadynamics, but only the solute experiences different temperatures. | [ |
| WTE-metadynamics REMD | Combines WTE-metadynamics with T-REMD by running WTE-metadynamics at each temperature. Overlap and thus exchange between replicas is increased, and canonical averages of properties of interest can be obtained with reweighting. | [ |
| WT-metadynamics: | The height of the Gaussian functions and the rate at which they are deposited decrease during the simulation, inversely to the time spent at a given value of the CV(s), to prevent over-filling. | [ |
| TT metadynamics: | Like WT-metadynamics, but decreases the height of the Gaussians according to the number of round trips between basins in the free energy landscape. Useful for calculating the free energy surface along a few well-chosen collective variables (CVs) at a time, but requires a priori estimation of the basin positions. | [ |
| µ-tempered metadynamics | Like WT-metadynamics, but allows use of wide Gaussians and a high filling rate without slowing convergence. | [ |
| WT-metadynamics-REMD | Multiple WT-metadynamics simulations are run in parallel, each biasing multiple CVs simultaneously. The degree of bias increases across the ladder of replicas. | [ |
| Metabasin metadynamics | The energy level to which the metadynamics can fill the free energy landscape is restricted, to either a pre-defined level or relative to unknown barrier energies, with both these and the Gaussian shape estimated on-the-fly. Reduces need to carefully choose CVs to avoid sampling irrelevant high-energy regions. | [ |
| OPES: | A recent reconsideration of metadynamics that begins with a coarse-grained estimate of the free energy landscape and converges towards a more detailed representation using a weighted kernel density estimation and on-the-fly compression algorithm. | [ |
| VES: | Uses an artificial neural network to determine a smoothly differentiable bias potential as a function of a pre-selected small number of CVs that drives the system towards a user-defined target probability distribution in which free energy barriers are lowered. | [ |
| TALOS: | Uses a generative adversarial network competing game between a sampling engine and a virtual discriminator to construct the bias potential. | [ |
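
The ‘boost’ potential in the aMD entry of this table can be written out explicitly. The sketch below assumes the commonly used form ΔV = (E - V)² / (α + E - V) applied when the potential energy V lies below the cut-off E, and zero otherwise; α controls how strongly the minima are raised. The function names are hypothetical, and the exponential reweighting factor is shown only to indicate why reweighting can be difficult: large boosts give weights that vary over many orders of magnitude.

```python
import math

K_B = 0.0083144626  # molar gas constant in kJ/(mol K)


def amd_boost(potential, threshold, alpha):
    """Boost energy added when the potential energy is below the cut-off."""
    if potential >= threshold:
        return 0.0
    gap = threshold - potential
    return gap * gap / (alpha + gap)


def amd_reweight_factor(boost, temperature=300.0):
    """Weight exp(boost / kT) that recovers unbiased averages per frame."""
    return math.exp(boost / (K_B * temperature))
```

For example, with a cut-off of -90 kJ/mol and α = 10 kJ/mol, a frame at -100 kJ/mol receives a boost of 5 kJ/mol, while frames above the cut-off are left unchanged.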
Category 4: Sampling enhanced along one or more CVs learnt on-the-fly
| Name | Description | Citations |
|---|---|---|
| On-the-fly HTMD: on-the-fly | Iterates between multiple short MD simulations (HTMD) and use of an MSM to learn a simplified model of the system to decide from where to respawn the next batch of simulations. | [ |
| Extended DM-d-MD: extended | Uses diffusion maps, a non-linear manifold machine learning technique for dimensionality reduction to select regions of conformational space from an initial unbiased MD simulation from which to launch new rounds of MD simulations. Unbiased simulations are used because CVs based on diffusion maps do not explicitly map to atomic coordinates, and so cannot be used in US or metadynamics, which require calculation of the gradient of the CV with respect to the atomic coordinates [ | [ |
| VAC-metadynamics | Uses tICA to analyse an initial WT-metadynamics simulation to obtain more effective CVs that are used in a second WT-metadynamics simulation. Not strictly iterative. | [ |
| RAVE: | Iterates between enhanced sampling simulations and deep learning using variational autoencoders to learn an optimum but still physically interpretable reaction coordinate, as well as the probability distribution along this coordinate, which are then used to bias the enhanced sampling simulations. | [ |
| REAP: | Uses reinforcement learning to estimate the importance of CVs on-the-fly while exploring the conformational landscape. Requires an initial unbiased MD simulation from which to generate a dictionary of CVs and their trial weights. | [ |
| MESA: | Iterates between umbrella sampling along trial CVs and using an auto-associative artificial neural network with a nonlinear encoder and decoder to learn CVs. | [ |
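
Several methods in this table share an iterate-and-respawn loop: run short simulations, build a simplified model of what has been sampled, and restart from regions judged most worth exploring. The sketch below illustrates only the respawn decision, and it deliberately swaps in a much simpler rule than the MSM-based or machine-learnt criteria described above: frames are assumed to have already been assigned to discrete states, and new simulations are seeded from the least-visited states. The function name and the state labels are hypothetical.

```python
from collections import Counter


def select_respawn_states(state_labels, n_seeds):
    """Pick the n_seeds least-visited discrete states as restart points.

    state_labels holds one state index per sampled frame. Ties are broken by
    state index so that the selection is deterministic.
    """
    counts = Counter(state_labels)
    ranked = sorted(counts, key=lambda state: (counts[state], state))
    return ranked[:n_seeds]
```

For the labels `[0, 0, 0, 1, 1, 2, 3, 3, 3, 3]` and two seeds, the function returns `[2, 1]`, the two states visited least often. A rule of this kind can only restart from states that have been visited at least once, which is one reason the methods in the table rely on a learnt model to decide where to sample next.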