Literature DB >> 16689704

A quasi-physical algorithm for the structure optimization in an off-lattice protein model.

Abstract

In this paper, we study an off-lattice protein AB model with two species of monomers, hydrophobic and hydrophilic, and present a heuristic quasi-physical algorithm. First, by elaborately simulating the movement of the smooth solids in the physical world, we find low-energy conformations for a given monomer chain. A subsequent off-trap strategy is then proposed to trigger a jump for a stuck situation in order to get out of the local minima. The algorithm has been tested in the three-dimensional AB model for all sequences with lengths of 13-55 monomers. In several cases, we renew the putative ground state energy values. The numerical results show that the proposed methods are very promising for finding the ground states of proteins.

Entities: Chemical Disease Gene

Mesh：

Substances：
Proteins

Year: 2006 PMID： 16689704 PMCID： PMC5054034 DOI： 10.1016/S1672-0229(06)60018-1

Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN： 1672-0229 Impact factor: 7.691

Introduction

Predicting the native structure of a protein from its amino acid sequence is one of the most challenging problems in biophysics and bioinformatics. The difficulty of the problem comes from two aspects. One is the determination of the potential energy function. The effective energy function can generally distinguish the native states from non-native states of protein molecules. The other is that the potential energy landscape of the system can be characterized by a multitude of local minima separated by high-energy barriers. In order to simplify and clarify these two aspects of protein folding phenomena, in resent years the theoretical community has introduced and examined several highly simplified, but still nontrivial models, including a large family of hydrophobic-polar (HP) lattice 1., 2., 3., 4., 5. and off-lattice models 6., 7.. Even though these models are highly simplified, to solve the corresponding folding problem remains to be NP-hard. In recent years, a wide variety of approximate algorithms have been employed to analyze these models, including the sequential importance sampling with pilotexploration (SISPER; ref. ) based on the important sampling, the multi-self-overlap ensemble (MSOE) approach ( based on the Monte Carlo scheme, and the pruned-enriched Rosenbluth method (PERM; ref. 10., 11., 12., 13.). All these approaches have been applied to the simplified protein folding problems. However, their efficiency still needs to be improved. The present paper attempts to find a highly efficient heuristic algorithm that can obtain the low-energy states for a given monomer chain. We use a so-called quasi-physical method 14., 15., 16., whose working path is to find a natural phenomenon equivalent to the protein folding problem in the physical world. Then we observe the evolution of the motion of matter in it so as to be inspired to obtain a formalistic algorithm for solving the problem. To see whether the quasi-physical method can be efficient for energy minimization in the protein folding problem, in this paper we introduce a so-called AB model by Stillinger et al. 17., 18., where the hydrophobic monomers are labeled by A and the hydrophilic or polar ones are labeled by B. This model has been studied in several papers 17., 18., 19., 20., 21., 22., 23., 24., 25., 26.. The methods used to find low-energy states of the AB model include neural networks (, conventional Metropolis type Monte Carlo procedures (, the annealing contour Monte Carlo method (, the simulated tempering (, biologically motivated methods 21., 22., multicanonial methods 23., 24., 26., and the new PERM with importance sampling (nPER-Mis; ref. ). For its two-dimensional (2D) version, the putative ground states for various AB sequences with various chain lengths have been given in previous studies 17., 18., 19., 21., 25., and for its three-dimensional (3D) version, the putative ground states for four Fibonacci sequences with lengths of 13–55 monomers have also been obtained 25., 26.. The present paper studies the 3D version of this model. The numerical results show that the proposed methods are very promising for finding the ground states of proteins.

Algorithm

The AB model

For an n-monomer chain, the distances between the consecutive monomers along the chain are fixed to the unit length, while the non-consecutive monomers interact through a modified Lennard-Jones potential. In addition, there is an energy contribution from each angle θ(−π ≤ θ < π) between consecutive bonds. More precisely, the total energy function 17., 18., 19., 21., 25., 26. for an n-monomer chain can be written as Here r is the distance between monomers i and j (i < j). Each ζ is either A or B. The first term is the bending energy, favoring the alignment of the three successive monomers i − 1, i, and i + 1. The second term is the Lennard-Jones potential with a species-dependent coefficient C(ζ, ζ), which is taken to be 1 for an AA pair, 1/2 for a BB pair, and −1/2 for an AB pair, giving strong attraction, weak attraction, and weak repulsion, respectively. The protein folding problem for the AB model can be formally defined as follows: given a monomer chain s = ζ1 ζ2 ζ3 … ζn, we try to find an energy-minimizing conformation of s, that is, to find X* ϵ C(s) so that E(X*) = min {E(X)ǀX ϵ C(s)}, where C(s) is the set of all the valid conformations of s.

The quasi-physical method

Imagine that all n monomers involved in the model are smooth solids with the radius of each being 1/2, which are marked by 1, 2, …, n and are cast randomly in the 3D Euclidean space, then the potential energy E is a known function of the coordinates of all n monomers with the constraint: To remove Constraint (2) and convert the constrained optimization problem into the unconstrained optimization problem, we conceive that there exists a spring with the original length 1 between the centers of the i and (i+1)th monomers (i = 1, 2, …, n−1). According to the Hooks’ law, the spring’s elastic potential energy between two adjacent monomers is proportional to the square of the length of the spring transformation. So we can give the elastic potential energy as follows:where k1 is a physical coefficient characterizing the rigidities of all the springs. The sum of the potential energy of the whole system is U = E + Eʹ. Obviously, the potential energy U is a known function of the coordinates X of all the monomers, and Eʹ is the “penalty” term in the potential energy U. When the physical coefficient k1 is great enough, the optimal solution of the unconstrained optimization problem of the known function U(X) is also the optimal solution of the constrained optimization problems (1)–(2). Thus, the protein folding problem is converted into an unconstrained optimization problem of U(X). For this optimization problem, we can employ a ready-made algorithm, the gradient method, or the steepest descent method. By integrating the quasiphysical idea into the gradient method, we gain a quasi-physical algorithm. Assume that X(0) is the initial conformation for iterations. If the conformation X( (t ≥0) is not a local minimum, a new conformation X( = X( − λ∇U( is obtained in the anti-gradient direction of the energy function U(X) at X(, where λ is the iterative step length and –∇U( is the iterative search direction. From the undated conformation X( we repeat this course of iterations until a global energy minimum conformation X is found, or a trap of local minimum X (|∇U(| < 10−6) occurs. In the latter case, a completely new round of quasi-physical calculation should be initiated from a new initial conformation. In our simulations of computation, we let λ0 = 0.5 × 10−6, k1ϵ [1, 000, 2, 000]. In order to speed up the ground state search, when calculating the gradient ∇U in iterations, we modify the Lennard-Jones potential through multiplying it by a physical coefficient k2= 10,000, meaning the Van Del Waals interaction coefficient. Another trick is to modify the physical coefficient k1 by multiplying the factor 1.3 and k2 by 0.7 until k2 < 1 per 50,000 iterative steps in the course of optimization. In the later stage of calculation, or when |∇U(| < 10−2, we modify k1 by multiplying the factor 1.1 and decrease the step length λ by multiplying a step shrinking factor 0.9 per 50,000 iterative steps. Therefore, in the beginning of calculation, the physical coefficient of all the springs is small so that all monomers can move freely and attain easily low-energy states. Thereafter, along with the execution of calculation, the physical coefficient k1 increases gradually so as to increase the penalty and make Constraint (2) satisfied gradually, and at last the interactions of springs disappear. Obviously, when the physical coefficient k1 rises to a big number, for example ≥1010, that is, when the springs turn rigid, Constraint (2) is satisfied naturally, and a global energy minimum conformation is found, or a trap of local minimum occurs.

The off-trap strategy

The calculating experience tells us when all the A-monomers fold into a hydrophobic core, the potential energy of the whole system will turn low. For jumping out of local minima, we can pick out all B-monomers squeezed among A-monomers, and place them in certain spots in 3D space to speed up the lowest-energy state search. The concrete description of the off-trap strategy is as follows: (1) Calculate the center of all A-monomers; (2) Compute the distance from the center to every A-monomer, signing the greatest distance as d; (3) Calculate respectively the distance from every B-monomer to the center. For every B-monomer, as long as the distance < d, it is our strategy to pick out the corresponding B-monomer and place it somewhere three times of the distance away from the center, in the vector direction from the center to the B-monomer picked. Keeping all A-monomers and the rest of B-monomers at their current positions, we can obtain a new conformation, where in addition to the changes of bending energy and elastic potential energy, long-range Lennard-Jones interactions of the monomers, with their relative position to each other changed, have to be computed anew after the update. By integrating the off-trap strategy into the quasi-physical algorithm, we gain a heuristic quasi-physical (HQP) algorithm. The calculation is executed by using the quasi-physical algorithm until a certain minimum conformation is reached. If now the energy E is lower than the target value and the system is satisfied with Constraint (2), the energy will be scored and the computation will terminate, otherwise the calculation point will jump to a new position through the off-trap strategy and the computation will proceed with the quasi-physical algorithm. If the off-trap strategy is repeated up to ten times and the simulation does not find states with lower energy E than the target value, it is our practice to randomly choose another initial conformation for a new round of HQP calculation.

Results and Discussion

We implemented the HQP algorithm in the C++ language on a Pentium IV, 2.0 GHz computer. For the sake of examining the calculation, in this paper we restricted ourselves to the AB model with the Fibonacci sequences in previously studies 17., 18., 19., 21., 25., 26.. The Fibonacci sequences are defined recursively by S0=A, S1=B, S=S ⁎ S Here “⁎” is the concatenation operator, for example, the first few sequences are S2=AB, S3=BAB, S4=ABBAB, and so on. They have the lengths given by n=n+n, that is, given by the Fibonacci numbers. Following the previous studies 25., 26., in this paper we considered the sequences with lengths n=13, 21, 34, and 55, which are listed in Table 1.

Table 1

The Four Fibonacci Sequences in the AB Model

n	Sequence (“B₂” for BB)
13	AB₂AB₂ABAB₂AB
21	BABAB₂ABAB₂AB₂ABAB₂AB
34	AB₂AB₂ABAB₂AB₂ABAB₂ABAB₂AB₂ABAB₂AB
55	BABAB₂ABAB₂AB₂ABAB₂ABAB₂AB₂ABAB₂AB₂ABAB₂ABAB₂AB₂ABAB₂AB

In our simulations, the initial conformations were chosen according to the following strategy. Given two concentric spheres with the origin as their centers and r1=2n, r2=5n as their radii respectively, where n is the length of the chain considered. Cast randomly all A-monomers in small sphere and B-monomers in the region between small and big spheres. After producing the initial conformation, we can execute the HQP algorithm to compute the position of every ball hereafter at every time. The calculating results showed that along with the increment in the calculation steps, at last all monomers would tend to be stable. In the four conformations that were the solution of the problem, Constraint (2) was satisfied approximately. The error margin was smaller than 10−6, that is: |r − 1| < 10−6 (i = 1, 2, …, n−1). We employed the HQP algorithm to compute these sequences and produced the final results that are shown in Table 2 in comparison with those of other studies 25., 26.. The conformations of putative ground states are shown in Figure 1.

Table 2

Comparison of HQP and Other Algarithms on the Estimation of the Global Energy Minima for the Four Fibonacci Sequences in the AB Model*

n	PERM	PERM+	MUCA	ELP	HQP (CPU time#)
13	−3.9730	−4.9616	−4.967	−4.967	−4.9729 (5,674 s)
21	−7.6857	−11.5238	−12.296	−12.316	−12.2554 (8,924 s)
34	−12.8601	−21.5678	−25.321	−25.476	−24.8083 (24,265 s)
55	−20.1070	−32.8843	−41.502	−42.428	−42.5199 (39,124 s)

The values are compared with the results quoted in Hsu et al. ( employing the PERM and the subsequent conjugate gradient (PERM+) minimization, and with the lowest energies listed in Bachmann et al. ( obtained with the multicanonical (MUCA) Monte Carlo method and the energy landscape paving (ELP) minimization

CPU time means the time needed in a certain running to get the listed energies on the Pentium IV, 2.0 GHz computer.

Fig. 1

Stereographic views of putative ground states of the four Fibonacci sequences listed in Table 1. Full dots and empty circles indicate hydrophobic and hydrophilic monomers, respectively.

The experiments showed that the HQP algorithm considerably outperformed the PERM algorithm and the subsequent conjugate gradient (PERM+) algorithm in Hsu et al. ( for the Fibonacci sequences of Table 1 in the AB model. This was particularly pronounced for the longest chain considered. In addition, for the monomer chains with n = 21, 34, and 55, we found putative ground states different from those given in Hsu et al. (, which stated that the chains with n= 21 and 34 folded into conformations with single hydrophobic cores (except for a single A-monomer that kept out in both cases), and the chain with n= 55 formed two clearly disjointed main hydrophobic groups. From the conformations (Figure 1) produced by the HQP simulation, we easily see that each of the four Fibonacci sequences has a single hydrophobic core. Indeed, with this fact we are able to refute the claims for putative ground states in Hsu et al. (, and agree well with what comes out in Bachmann et al. (. We also compared our results with the minimum energies listed in Bachmann et al. (, where the so-called multicanonical (MUCA) Monte Carlo method and the energy landscape paving (ELP) minimization were applied to these Fibonacci sequences. Table 2 shows that HQP runs lower energies for the sequences with 13 and 55 monomers, while the results for those with 21 and 34 monomers are also comparable. Moreover, the CPU time used by the HQP algorithm should be less than or comparable to those used by the PERM+ and MUCA methods. Hsu et al. ( did not give the exact CPU time of their runs. They just mentioned that their results were obtained on their Linux or Unix workstation with up to 2 days of CPU time, while Bachmann et al. ( even did not mention the CPU time of their runs. Furthermore, we will apply the methods proposed in this paper to all-atom models with realistic potential by combining it with the simulated annealing or the genetic methods, and design various kinds of higher performance algorithms for the protein folding problem.

13 in total

A quasi-physical algorithm for the structure optimization in an off-lattice protein model.

Introduction

Algorithm

The AB model

The quasi-physical method

The off-trap strategy

Results and Discussion

1. Global minimization of an off-lattice potential energy function using a chaperone-based refolding method.

2. Structure optimization in an off-lattice protein model.

3. Annealing contour Monte Carlo algorithm for structure optimization in an off-lattice protein model.

4. Proteins with selected sequences fold into unique native conformation.

5. Multicanonical study of coarse-grained off-lattice models for folding heteropolymers.

6. Toy model for protein folding.

7. Collective aspects of protein folding illustrated by a toy model.

8. Theory for the folding and stability of globular proteins.

Review 9. Principles of protein folding--a perspective from simple exact models.

10. Kinematics and thermodynamics of a folding heteropolymer.