Literature DB >> 31497702

Enhancing Biomolecular Sampling with Reinforcement Learning: A Tree Search Molecular Dynamics Simulation Method.

Kento Shin¹, Duy Phuoc Tran¹, Kazuhiro Takemura², Akio Kitao², Kei Terayama^3,4,5, Koji Tsuda^1,3,6.

Abstract

This paper proposes a novel molecular simulation method, called tree search molecular dynamics (TS-MD), to accelerate the sampling of conformational transition pathways, which require considerable computation. In TS-MD, a tree search algorithm, called upper confidence bounds for trees, which is a type of reinforcement learning algorithm, is applied to sample the transition pathway. By learning from the results of the previous simulations, TS-MD efficiently searches conformational space and avoids being trapped in local stable structures. TS-MD exhibits better performance than parallel cascade selection molecular dynamics, which is one of the state-of-the-art methods, for the folding of miniproteins, Chignolin and Trp-cage, in explicit water.

Entities: Chemical Disease Gene Species

Year: 2019 PMID： 31497702 PMCID： PMC6714528 DOI： 10.1021/acsomega.9b01480

Source DB: PubMed Journal: ACS Omega ISSN： 2470-1343

Introduction

In general, biomolecular systems are complex and have many degrees of freedom.[1] Their biological time scales exceed the current capability of atomistic simulations, which is usually more than a microsecond. Among the various biomolecular simulation methods, molecular dynamics (MD) simulation is a powerful approach for characterizing structural information as a time series of atomic-level trajectories with femto-second resolution.[2] To solve the time-scale problem, many enhanced sampling methods have been proposed to overcome the time-scale limitation. These methods can be divided into biasing and nonbiasing methods, which are listed here: conformational flooding,[3] hyperdynamic,[4] metadynamics,[5,6] multicanonical MD (McMD),[7] replica-exchange MD,[8] string method,[9] mile-stoning,[10] adaptive biased force (ABF),[11] accelerated MD,[12] weighted ensemble,[13] WExplore,[14] parallel cascade selection MD (PaCS-MD)[15] and its variants,[16−19] outlier flooding method,[20] and more. Among the unbiased, enhanced sampling methods, PaCS-MD, which was previously developed by some of the authors of this study, tends to sample the conformational transition pathway between an initial structure and a given target structure. PaCS-MD comprises multiple cycles of parallel simulations (replicas) with selections of MD snapshots from the predecessor cycle to be the initial configuration of the successor cycle.[15,21] It has been found that the number of replicas considerably affects the sampling efficiency[22] (i.e., the larger the number of replicas is, the fewer the number of traps in the local minima in the conformational space is). In addition, resampling from the top 1.0 to 2.0% of the highly ranked conformations is assumed to provide better sampling performance.[23] Therefore, a smarter selection scheme of initial configurations of replicas is needed to improve the performance of not only PaCS-MD, but also of other unbiased, enhanced sampling methods. In this study, we introduce a new method, called tree search MD (TS-MD), by applying a tree search algorithm, upper confidence bounds for trees (UCT),[24] to biomolecular sampling. UCT is a type of reinforcement learning algorithm[25] and has been used to solve search problems in the field of artificial intelligence for games, including AlphaGo Zero.[26−29] In addition, UCT-based approaches have proven to be effective for solving real world problems such as de novo molecular design and retrosynthesis.[30−32] In these search problems, the exploration-exploitation dilemma,[33,34] (i.e., the trade-off between obtaining new knowledge: exploration, and using the obtained knowledge: exploitation) becomes an issue that needs to be solved. UCT is one of the frameworks that solve this dilemma. In this paper, we show that sampling the conformation transition pathway from the initial structure to the targeted one can be regarded as a tree search problem by considering a structure (snapshot) as a node in the tree and a short simulation from the structure as an edge between nodes. Furthermore, we demonstrate that MD-based pathway sampling can be performed by applying the UCT algorithm to the tree search problem using the folding of mini-proteins, Chignolin and Trp-cage, in explicit water as examples. We show that effective sampling can be realized from their simulation results by using the UCT approach, thereby quickly escaping from the local traps. Our implementation is available on GitHub at https://github.com/tsudalab/TSMD/.

Materials and Methods

TS-MD Procedure

TS-MD comprises multiple hierarchical short MD simulations in predefined orders and metrics, which connect the given initial structure (reactant) to a target structure (product). Figure a presents an illustration of one TS-MD cycle that includes three steps: selection, expansion, and back-propagation. In the TS-MD tree, the conformational microstates (snapshots) of a protein and simulated trajectories between them are represented by nodes (circles) and edges, respectively. We employ the UCT approach[24] to realize an effective transition pathway search by growing the tree by repeating the TS-MD cycle. The advantage of the UCT search is that it realizes a pathway search by growing the search tree, maintaining the balance between exploration and exploitation. Even if a search is trapped in a local stable structure, TS-MD is expected to eventually find another pathway by growing other branches by maintaining the balance.

Figure 1

Illustration of (a) TS-MD and (b) PaCS-MD. Conformational microstates (snapshots) of a protein and simulated trajectories between them are represented by nodes (circles) and edges, respectively.

Illustration of (a) TS-MD and (b) PaCS-MD. Conformational microstates (snapshots) of a protein and simulated trajectories between them are represented by nodes (circles) and edges, respectively. We explain the three steps of a TS-MD cycle. The details of the process of them are illustrated as a pseudocode in Algorithm 1. Each node has its score X that is obtained from MD simulation results and called the “reward,” and the number of visited times n. The reward and the visited number are updated during the TS-MD cycles. During the selection step of TS-MD, the tree is traversed from the root to a leaf by recursively choosing the maximum upper confidence bound (UCB) child at each branch. Note that UCB is different from UCT. The details of UCB, which is calculated from the rewards, are explained in the next paragraph. If multiple child nodes obtain the same maximal value, one of them is selected by randomization. During the expansion step, a short MD simulation with re-randomized velocities in the system is performed from the selected node and the snapshot with a minimum root-mean-square deviation (RMSD) is added as a child if the child RMSD is smaller than the parent RMSD. In the field of games, the possible moves (actions) are predefined in most cases. However, the next moves from a node (structure) are not trivial in conformational sampling. Thus, we prepare some sample pathways using MD simulations with different initial velocities and treat each pathway (edge) as a move in the field of games (reinforcement learning). Finally, during the back-propagation step, the visit counts n of all the selected branches are incremented and the reward X is updated to the smallest achieved RMSD. The TS-MD cycle is terminated when the minimum RMSD obtained during the search becomes equal to or less than the reference value (RMSDmin). The UCB of the i-th child node can be calculated via the following equation.where X is the reward of the i-th child calculated from MD simulations, N is the number of times the parent node has been visited, n is the number of times the i-th child has been visited, and C is the constant used to balance the first and second terms. In this study, for the reward X, we take advantage of the backbone atom RMSD after least-squares-fitting of the backbone atoms to evaluate the structural similarity to the product. The negative of the minimum RMSD of the downstream nodes is used as the reward X for minimizing the similarity between the simulated and target conformations. Note that we use the negative of the minimum RMSD because of the maximum node with UCB during the selection step. Whenever a node is visited by the simulation (n increases), the second term of eq decreases, while the UCBs of other child nodes increase upon visiting their parent node (N increases). To maintain the balance between exploitation and exploration, the parameter C needs to be tuned according to the complexity of the target system and the simulation length. The key point in rare event sampling is to explore new microstates, which should be low in appearance frequency.[35] Within this scope, we add a penalty term in the algorithm to promote sampling in the rare region of the conformational space, to prevent traps from the local minima of a free energy landscape, or to sample in the same conformational space between branches. The penalized reward of the i-th node, X′, is calculated as follows.where α is the penalty parameter and Nsim is the number of similar structures. To calculate Nsim, all the structures of the nodes in the search tree are considered. Snapshots of similar structures and dissimilar structures are distinguished by their pair-wise RMSDs being lesser and greater than 1.0 Å, respectively. From the viewpoint of reinforcement learning, the penalized reward can be regarded as a way to solve the problem of UCT-based search in continuous space (i.e., excessive search of similar states or structures). In general, if the search space is discrete in the UCT search, such a penalty term is not required. However, if the search space is continuous, such a penalty is effective in the field of UCT search[36−38] because the search space is too large. Similarly, we introduce the penalized reward, giving a larger penalty in relation to the number of similar structures that exist. To compare the performance of TS-MD with that of an existing method, PaCS-MD[15] is also demonstrated in this study. Figure b illustrates the PaCS-MD procedure. PaCS-MD repeats a cycle of multiple independent MD (MIMD) simulations to obtain a transition pathway between the two structures. In each cycle, short MIMDs with randomized initial velocities of the system are performed and the top snapshots are selected from the trajectories as the initial structures of the next cycle.

Computational Details

To demonstrate the performance of TS-MD, we apply both PaCS-MD and TS-MD methods to the folding of the 10-residue protein Chignolin (PDB id 1UAO(39)) and the 20-residue protein Trp-cage (PDB id 1L2Y(40)) in explicit water (Figure ). During the folding process, an NMR structure is used as a product, whereas the artificially extended structure (see the Supporting Information) with the same amino acids using the same protonation states as the former is used as a reactant. The extended structures were prepared by using the molecular operating environment (MOE, Chemical Computing Group, Montreal, Canada). The AMBER ff99SB force field[41] was employed for both the systems. The proteins were solvated with the TIP3P water model[42] in rectangular simulation boxes with counter ions to maintain the system being neutral in charge. The box sizes were set to ensure that the distances between any atoms of the protein and the box edges were at least 10 Å in each dimension. Cubic boxes with initial edge lengths of 53.1 and 83.7 Å were employed for Chignolin and Trp-cage, respectively. We used these boxes for both PaCS-MD and TS-MD. As a result, the Chignolin system contained 4818 water molecules and two sodium ions, whereas the Trp-cage system contained 18 980 water molecules and one chloride ion. The energy of all the systems was minimized using the steepest descent algorithm to avoid high energy interactions in modeling with the imposing of positional restraints of the backbone atoms of the proteins (force constant: 1000 kJ/mol nm2). Then, the systems were then thermalized at 300 and 320 K for Chignolin and Trp-cage, respectively, using a canonical ensemble with a velocity-rescaling thermostat[43] in 100 ps. After that, the isothermal-isobaric ensemble was employed to relax the system to 1 bar pressure using an isotropic barostat with the Parrinello–Rahman algorithm[44] and to keep the corresponding temperatures in the next 100 ps. In each cycle of PaCS-MD and TS-MD, each short (100 ps) MD simulation used the same thermostat and barostat as used in the equilibrating stage, except that no positional restraint was applied to the system. The electrostatic interactions were treated using the particle mesh Ewald method.[45] The simulation time step was 2 fs with constraints on all bonds via the LINCS algorithm[46] and snapshots in each trajectory were recorded every 1 ps. Leapfrog integration was used. All the MD simulations were performed using the GPU version of the GROMACS 2016.5 package.[47]

Figure 2

(a) Native structure of Chignolin. Chignolin can have three characteristic hydrogen bonds: HB1, HB2, and HB3. (b) Native structure of Trp-cage. The blue area represents the α helix region and the red area represents the other region. We set the number of parallel cascades in PaCS-MD to five. Therefore, in each PaCS-MD cycle, five short MDs are performed from the five highest ranking snapshots from the previous cycle. The maximum number of child nodes (CHILDRENmax) in TS-MD was set to three.

Results

Chignolin Folding

To find a reactive folding pathway of Chignolin in explicit water, PaCS-MD and TS-MD were performed up to 2000 MD iterations. Note that a cycle of PaCS-MD corresponds to five MD iterations because it consists of five short MD simulations. A cycle of TS-MD corresponds to one MD iteration. Figure a presents various profiles of 10 trials of PaCS-MD and TS-MD as a function of the number of MD iterations, which include the RMSD from the product, radius of gyration, the fraction of native contact,[48] and F1 score of contact prediction.[49] We used the backbone atom RMSD after least-square-fitting of the backbone atoms to the product. In TS-MD, the parameters C and α were set to 0.05 and 1.05, respectively. As seen in the profiles, the protein structure becomes closer to the product as the MD iteration progresses and sufficiently converges in 2000 MD iterations. Among the 10 trials, nine PaCS-MD trials and all TS-MD trials reached the RMSD of less than 1.0 Å and four PaCS-MD trials, and eight TS-MD trials reached the RMSD of less than 0.5 Å.

Figure 3

Profiles of RMSD, radius of gyration, the fraction of native contact, and F1 score of PaCS-MD and TS-MD in (a) Chignolin folding and (b) Trp-cage folding as a function of the number of MD iterations. The simulation time of one short MD was 100 ps for both PaCS-MD and TS-MD. The total simulation times of one trial for Chignolin and Trp-cage foldings were 200 and 400 ns, respectively. Ten trials were performed for each method, and different colors indicate different trials. To demonstrate how the penalty to the high frequency states and the value of parameter C affect the search behavior, TS-MD with different settings of α (1.01 and 1.05) with C = 0.05 and TS-MD without penalty (α = 1) with different settings of C (0.01, 0.03, 0.05, 0.1, and 0.2) were performed. Figure a shows a boxplot of RMSD at the 2000th iteration to summarize the performance of each method. In TS-MD without penalty, the value of C affected the search performance, which indicates that it involves the exploration and exploitation trade-off. This effect can also be observed in the projection of sampled structures (Figure S1). In this case, because Chignolin folding can be considered as a relatively simple mechanism, a larger C could converge to the target in 3000 iterations and resulted in better performance compared to that of smaller Cs. However, the performance of TS-MD with penalty with C = 0.05 was better than those of PaCS-MD and TS-MD without penalty with any setting of C, with respect to the number of trials needed to reach the product, the median of the minimum RMSD, and the RMSD in the worst trial. The contribution of the penalty to the search efficiency was confirmed by comparing the performances between TS-MD with penalty and without penalty, with C = 0.05.

Figure 4

Boxplots of the smallest RMSDs for 10 trials of (a) Chignolin folding (3000 iterations) and (b) Trp-cage folding (4000 iterations) for each method: PaCS-MD, TS-MD with α = 1.01, 1.05 and C = 0.05, and TS-MD without penalty with C = 0.01, 0.03, 0.05, 0.1, 0.2. To examine, in detail, the search process, the structures were projected onto a low-dimensional space. Previous studies[39,50] showed that the length of hydrogen bonds, HB1(Asp3:N-Thr8:O), HB2(Asp3:N-Gly7:O), and HB3(Asp3:O-Gly7:N) characterize the process of folding and misfolding (Figure a). It is reported that HB3 works as a folding trigger. The formation of HB1 occurs under the condition that HB3 completes the folding, but the conformation with HB2 leads to a misfolded structure. Figure shows all the structures in the tree projected onto a subspace spanned by HB1 and HB3. In this study, reactive trajectories[15] which are defined as joint multiple trajectories connecting the reactant to the structure with the minimum RMSD, are also projected. Because the space searched by PaCS-MD is quite narrow, some trials stopped at unfolded states, in which HB3 is not formed sufficiently and RMSDs are stopped around 1.0 Å. On the other hand, TS-MD searched a much broader conformational space and almost all trials successfully reached the product. As demonstrated in Figure S1b, some TS-MD trials were initially trapped into misfolded states where HB2 is formed, but eventually avoided searching those states as the MD iterations go on. In addition, there were some reactive trajectories that reached the product by passing through misfolded states. When we judge the success of folding using the distance between the H-bonds (HB1 and HB3 ≤ 3.5 Å), eight trials except for the ocher and light blue ones in Figure , were successfully folded by PaCS-MD and all 10 trials were folded by TS-MD. Furthermore, we calculated the principal components (PCs) of the 18 models in PDB id 1UAO, and projected the final snapshots of PaCS-MD and TS-MD to the first and second PCs as shown in Figure S2. The result shows that the best structures obtained by PaCS-MD and TS-MD are correctly folded and approach the NMR structures. Notice that we used model 1 as the targeted structure.

Figure 5

Two-dimensional plot of Chignolin structures (points) and the reactive trajectory (a polygonal line) for each trial into the subspace spanned by the distances between key hydrogen bonds, HB1 and HB3, of (a) PaCS-MD and (b) TS-MD. The red star is the native structure (product). The minimum RMSD for each trial is at the top-left corner. The area surrounded by the red dotted circle indicates a misfolded state corresponding to that shown in Figure .

Figure 6

Search tree examples of (a) PaCS-MD and (b) TS-MD in the Chignolin folding (light blue and blue trail in Figure ). The color of each node indicates the order in which it is added to the tree. Intermediate structures at some nodes are also illustrated with the values of RMSDs from the product. The structures in the red dotted square have the minimum RMSD in each trial, and the one in the red dotted circle indicate a misfolded structure. In (b) TS-MD, at the beginning, the pathways to a misfolded structure were sampled preferentially. However, the search process escaped from the local minima owing to the UCT algorithm, and finally, reached the product.

Figure shows examples of the search trees of PaCS-MD and TS-MD with the reactant, product, and intermediate structures. They correspond to the trials shown in light blue (PaCS-MD) and blue (TS-MD) in Figure , respectively. The former shows a typical example of a failed case in PaCS-MD, and the latter shows how TS-MD avoided being trapped by misfolded states. The tree colors indicate the order in which each node was added to the tree. With PaCS-MD, the tree went deep and created only one long branch. The search was trapped in a local structure for a long time. However, TS-MD created many branches on one tree and backtracked from some branches to explore other branches when the search was trapped at a misfolded structure. The structure in the red dotted circle is a misfolded structure formed with HB2, and its projection onto the reaction coordinates is shown in Figures and S1. After discovering the best structure in the red dotted rectangle, TS-MD tried to search for different pathways to the target structure. As a result, relatively small RMSD structures were obtained in the later stage (blue or purple nodes), but not less than 0.185 Å. Furthermore, to make it easy to understand the folding process, we represented the progresses of MD iterations using a color gradation (see Figure S3). Search tree examples of (a) PaCS-MD and (b) TS-MD in the Chignolin folding (light blue and blue trail in Figure ). The color of each node indicates the order in which it is added to the tree. Intermediate structures at some nodes are also illustrated with the values of RMSDs from the product. The structures in the red dotted square have the minimum RMSD in each trial, and the one in the red dotted circle indicate a misfolded structure. In (b) TS-MD, at the beginning, the pathways to a misfolded structure were sampled preferentially. However, the search process escaped from the local minima owing to the UCT algorithm, and finally, reached the product.

Trp-Cage Folding

To find a reactive folding pathway of Trp-cage in explicit water, PaCS-MD and TS-MD were also performed up to 4000 MD iterations. Figure b shows the profiles of 10 trials of PaCS-MD and TS-MD. The parameters C and α for TS-MD were 0.05 and 1.01, respectively. Only 1 trial successfully reached the product in PaCS-MD and 3 trials reached in TS-MD with the criterion of RMSD < 1.0 Å. When we judge the success of folding with the criterion of RMSD ≤ 1.5 Å, two and four trials were successfully folded by PaCS-MD and TS-MD, respectively. Some trials were stopped at unfolded structures whose RMSDs were in between 2.0 and 3.0 Å. The convergence of RMSD in Trp-cage folding is slower than that in Chignolin folding because Trp-cage is larger and more complicated. Similar to the Chignolin demonstration, TS-MD with different settings of α (1.01 and 1.05) with C = 0.05 and TS-MD without penalty (α = 1) with different settings of C (0.01, 0.03, 0.05, 0.1, and 0.2) were performed. Figure b shows a boxplot of RMSDs at the 4000th iteration for each method. Both TS-MD with penalty and without penalty exhibited better performance than PaCS-MD. To analyze their properties, sampled structures and reactive trajectories were projected onto 2D reaction coordinates (Figure ). We have also shown the progresses of MD iterations using a color gradation in Figure S4. Following the folding analysis by Meuzelaar et al.,[51] we used RMSDs with the product in two partial segments, SegA and SegB, as the reaction coordinates. SegA contains an α helix region (residue 1–8) and SegB contains the other region (residue 8–20) (Figure b). In the folding process of Trp-cage, there is a free-energy barrier while the α helix structure is formed.[51] As shown in the plots, there are unfolded states at 2–3 Å without the formation of the α helix structure in both PaCS-MD and TS-MD. However, TS-MD is more likely to find the correct transition pathway.

Figure 7

Two-dimensional plot of Trp-cage structures (points) and the reactive trajectory (a polygonal line) in each trial in the subspace spanned by the RMSDs of SegA and SegB of (a) PaCS-MD and (b) TS-MD. The area surrounded by the red dotted circle indicates the unfolded state corresponding to that shown in the Figure .

Figure 8

Search tree examples of (a) PaCS-MD and (b) TS-MD in the Trp-cage folding (orange and green trials in Figure ). The color of each node indicates the order in which it is added to the tree. The intermediate structures at some nodes are also illustrated with the RMSD values from the product. The structures in the red dotted square are those having the minimum RMSD in each trial, and the structure in the red dotted circle is a misfolded structure.

Figure shows examples of search trees of PaCS-MD and TS-MD with the reactant, product, and intermediate structures. They correspond to the orange trial in PaCS-MD and green trial in TS-MD shown in Figure , respectively. It is observed that PaCS-MD chooses a wrong pathway and is trapped in a local structure in which the α helix is formed wrongly. On the other hand, TS-MD searches multiple pathways, including a wrong one and escapes from local stable structures. The structure in the red dotted circle is a misfolded structure and its projection onto the reaction coordinates is shown in Figure . We confirmed that all folding pathways obtained by TS-MD and PaCS-MD first formed the tertiary contacts before helix formation. This pathway is identical to the major pathway of Trp-cage folding reported in a previous study.[52] On the other hand, the minor pathway reported in the paper was not observed in this study. In addition, we have thoroughly performed well-tempered metadynamics[6] folding of Trp-cage with the collective variable radius of gyration (Rg) and RMSD to the extended state of Trp-cage (RMSDextended_state). We mapped the TS-MD trajectories to the free energy landscape as shown in Figure S5. In the best cases, the plots demonstrated a thorough sampling of the whole free energy landscape of Trp-cage. Search tree examples of (a) PaCS-MD and (b) TS-MD in the Trp-cage folding (orange and green trials in Figure ). The color of each node indicates the order in which it is added to the tree. The intermediate structures at some nodes are also illustrated with the RMSD values from the product. The structures in the red dotted square are those having the minimum RMSD in each trial, and the structure in the red dotted circle is a misfolded structure.

Discussion

One of the biggest benefits of using the tree search method is that it can memorize the previous results as a tree and utilize them for further simulations. Because MD simulation is strongly dependent on a trial, it is suggested that multiple trials are be needed to observe desired events. In PaCS-MD, those trials are independent and cannot share the knowledge among each other. In contrast, because TS-MD does many trials in one tree, it can share all simulation results. This property allows TS-MD to reduce the number of simulations needed to get rare events in a complicated system. In addition, TS-MD has several advantages and disadvantages compared to other sampling methods. Specifically, TS-MD (PaCS-MD) can generate a pathway without adding bias force, changing temperature, and modifying potential, which are often done by the previous works such as conformational flooding, hyperdynamics, metadynamics, replica exchange, multicanonical MD (McMD), accelerated MD, ABF, and others. Furthermore, implementation of TS-MD is relatively easy because there is no need to change the force field. The idea of TS-MD does not depend on the MD implementation, and our implementation is also available. As a disadvantage of TS-MD, compared to other well-established methods such as metadynamics,[5] because sampling by TS-MD is not based on any physical principle, but only on the similarity to a product such as RMSD, TS-MD cannot calculate the free energy landscape without being combined with other methods such as the weighted histogram analysis method[53] or the Markov state model.[54−56] The results of a TS-MD search in the conformational space strongly depend on parameter C in UCT. For example, a larger value of C enables broader sampling in the conformational space, as shown in Figures S6 and S7. When computational resources are limited, it is beneficial to reduce the cost of unnecessary searching using a relatively small C. Although we treated parameter C as a constant in this study, automatic tuning methods of C have already been proposed[57] and their benefits have been shown in solving real problems.[58] In addition, the maximum number of child nodes (CHILDRENmax) affects the conformation sampling efficiency, although we fixed it to three in this paper. Furthermore, we introduced a penalized reward with the parameter α for the acceleration of sampling. Although we showed that the proposed method has some effectiveness for sampling transition pathways, it is not necessarily the most effective one. The development of an automatic adjustment method of optimal values for these parameters would improve the usefulness of TS-MD. For the parameters C and α, it is possible to adjust experimentally using the following strategy to some extent. As mentioned above, the larger the value of C, the broader the search tree will grow. If the value of C is too large, the search does not proceed deeply. We show a typical example of a large value of C in Figure S8a. Our implementation of TS-MD outputs such diagrams by default. However, if the value of C is too small, the search tree becomes too deep with few branches. In addition, we show a typical example of such a search tree in Figure S8b. We can adjust C while observing the growth of such a search tree. If the search tree is too wide, then we should use a smaller value for C. It is difficult to determine the optimal C, but it is possible to narrow down its range roughly. Furthermore, the same adjustment is possible for α. If α is too large, the penalty increases and the tree spreads widely. On the other hand, if α is too small, its search is close to a search without a penalty. Therefore, α can also be roughly adjusted by observing the growth of the search trees. The abovementioned procedures are not optimal, and there may be room for improvement to tune the α and C parameters.

Conclusions

In this study, we proposed TS-MD, which uses reinforcement learning to enhance protein conformational sampling. TS-MD can effectively sample various states and escape from local stable structures, based on the combination of the UCT algorithm and short MD simulations. We demonstrated that the folding processes of Chignolin and Trp-cage are sampled with higher efficiency by TS-MD, compared to PaCS-MD. Although we demonstrated TS-MD-based samplings only for the folding processes of small proteins, this method can be applied to any protein conformational transition such as domain motion and protein–ligand binding. TS-MD requires a target structure in the present. However, it is also possible to use TS-MD when the target state is unknown if the reward is changed properly as the nontargeted PaCS-MD.[59] There is room for improvement in the adjustment of the search parameters such as C, α, and the number of child nodes in TS-MD. In future work, we will consider how to determine the optimal parameter values for a given system. TS-MD would be applicable to larger systems such as T4 lysozyme and Villin. By the several previous studies,[15,60] it is shown that the open-close transition of T4 lysozyme has multiple pathways. TS-MD can be applicable to the open-close transition of such larger systems with multiple pathways by multiple search trials from different initial velocities. However, TS-MD’s performance for such larger systems needs to be validated in the future. Furthermore, because TS-MD uses a relatively simple algorithm, we will enhance it by applying many acceleration techniques that use heuristic knowledge,[61] pruning of suboptimal moves,[62] and massive parallelization.[63,64]

37 in total

1. Designing a 20-residue protein.

Authors: Jonathan W Neidigh; R Matthew Fesinmeyer; Niels H Andersen
Journal: Nat Struct Biol Date: 2002-06

2. Escaping free-energy minima.

Authors: Alessandro Laio; Michele Parrinello
Journal: Proc Natl Acad Sci U S A Date: 2002-09-23 Impact factor: 11.205

3. Accelerated molecular dynamics: a promising and efficient simulation method for biomolecules.

Authors: Donald Hamelberg; John Mongan; J Andrew McCammon
Journal: J Chem Phys Date: 2004-06-22 Impact factor: 3.488

4. Folding free-energy landscape of a 10-residue mini-protein, chignolin.

Authors: Daisuke Satoh; Kentaro Shimizu; Shugo Nakamura; Tohru Terada
Journal: FEBS Lett Date: 2006-05-11 Impact factor: 4.124

5. Canonical sampling through velocity rescaling.

Authors: Giovanni Bussi; Davide Donadio; Michele Parrinello
Journal: J Chem Phys Date: 2007-01-07 Impact factor: 3.488

6. Comparison of multiple Amber force fields and development of improved protein backbone parameters.

Authors: Viktor Hornak; Robert Abel; Asim Okur; Bentley Strockbine; Adrian Roitberg; Carlos Simmerling
Journal: Proteins Date: 2006-11-15

7. Automatic discovery of metastable states for the construction of Markov models of macromolecular conformational dynamics.

Authors: John D Chodera; Nina Singhal; Vijay S Pande; Ken A Dill; William C Swope
Journal: J Chem Phys Date: 2007-04-21 Impact factor: 3.488