Byung Ho Lee1, Sangjae Seo2, Min Hyeok Kim3, Youngjin Kim1, Soojin Jo1, Moon-Ki Choi1, Hoomin Lee1, Jae Boong Choi1,4, Moon Ki Kim1,4. 1. School of Mechanical Engineering, Sungkyunkwan University, Suwon, Republic of Korea. 2. Department of Materials Chemistry, Nagoya University, Nagoya, Japan. 3. School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Republic of Korea. 4. SKKU Advanced Institute of Nanotechnology (SAINT), Sungkyunkwan University, Suwon, Republic of Korea.
Abstract
The biological function of proteins is closely related to its structural motion. For instance, structurally misfolded proteins do not function properly. Although we are able to experimentally obtain structural information on proteins, it is still challenging to capture their dynamics, such as transition processes. Therefore, we need a simulation method to predict the transition pathways of a protein in order to understand and study large functional deformations. Here, we present a new simulation method called normal mode-guided elastic network interpolation (NGENI) that performs normal modes analysis iteratively to predict transition pathways of proteins. To be more specific, NGENI obtains displacement vectors that determine intermediate structures by interpolating the distance between two end-point conformations, similar to a morphing method called elastic network interpolation. However, the displacement vector is regarded as a linear combination of the normal mode vectors of each intermediate structure, in order to enhance the physical sense of the proposed pathways. As a result, we can generate more reasonable transition pathways geometrically and thermodynamically. By using not only all normal modes, but also in part using only the lowest normal modes, NGENI can still generate reasonable pathways for large deformations in proteins. This study shows that global protein transitions are dominated by collective motion, which means that a few lowest normal modes play an important role in this process. NGENI has considerable merit in terms of computational cost because it is possible to generate transition pathways by partial degrees of freedom, while conventional methods are not capable of this.
The biological function of proteins is closely related to its structural motion. For instance, structurally misfolded proteins do not function properly. Although we are able to experimentally obtain structural information on proteins, it is still challenging to capture their dynamics, such as transition processes. Therefore, we need a simulation method to predict the transition pathways of a protein in order to understand and study large functional deformations. Here, we present a new simulation method called normal mode-guided elastic network interpolation (NGENI) that performs normal modes analysis iteratively to predict transition pathways of proteins. To be more specific, NGENI obtains displacement vectors that determine intermediate structures by interpolating the distance between two end-point conformations, similar to a morphing method called elastic network interpolation. However, the displacement vector is regarded as a linear combination of the normal mode vectors of each intermediate structure, in order to enhance the physical sense of the proposed pathways. As a result, we can generate more reasonable transition pathways geometrically and thermodynamically. By using not only all normal modes, but also in part using only the lowest normal modes, NGENI can still generate reasonable pathways for large deformations in proteins. This study shows that global protein transitions are dominated by collective motion, which means that a few lowest normal modes play an important role in this process. NGENI has considerable merit in terms of computational cost because it is possible to generate transition pathways by partial degrees of freedom, while conventional methods are not capable of this.
Proteins are essential components of living cells. Each protein has its own biological function, which is accompanied by conformational change of the protein. Therefore, studying this conformational change is necessary to understand the underlying mechanism of its biological functions. Various experimental methods have been developed as part of this effort. Specifically, NMR [1,2], Raman spectroscopy [3,4], electron cryo-microscopy [5,6], atomic force microscopy [7,8], and terahertz time-domain spectroscopy [9,10], as well as time-resolved methods such as x-ray scattering [11,12], the transient grating method [13-15] and light scattering [16] have expanded our understanding of the relationship between structure and function in biomolecules. Although the local vibrational movement of a protein in meta-stable states can be relatively easily captured by those experimental methods, it is still challenging to observe its global dynamics experimentally. This is due to the existence of a high-energy barrier between two conformational states, and also because capturing a very dynamic transition state is difficult with current experimental techniques. Thus, various computational attempts have sought to find intermediate structures from the known stable structures.Molecular dynamics (MD) simulation [17,18] is a representative computational method that can be used to analyze the dynamics of proteins in atomic detail. However, the conventional MD simulation is inappropriate for predicting large-scale transitions due to its computational cost, despite recent efforts to overcome time limitations [19-21]. Instead, the prediction of transition pathways based on a simplified potential function, called the elastic network model (ENM), has flourished in recent years. Unlike MD simulation, which integrates an empirical potential function, ENM exploits a Hookean potential function and can significantly reduce the computational cost. The ENI first attempted to find intermediate structures by interpolating the distance between two end-point conformations of a target protein [22-24]. This transition pathway generation technique was already implemented on an online morph server called KOSMOS [25]. The mixed elastic network model (MENM) was also developed to study large-scale conformational transitions [26]. In the MENM method, the Boltzmann-weighted Go potentials for the end-point structures are combined into a smooth potential function, which interpolates conformation. The interpolated ENM (iENM) was proposed by constructing a double-well potential function from the ENMs of two end-point structures [27]. To improve conventional coarse-grained ENM-based methods without the loss of physical reality, the hybrid ENI considers the rigidity information of conformation when generating transition pathways [28]. In addition, the ANMPathway used two end-point ENMs and constructed two-state potential resulting in a cusp hypersurface [29]. The minimum energy conformation on this cusp hypersurface is regarded as the transition conformation.Meanwhile, normal mode analysis (NMA) based on coarse-grained modeling has addressed predicting collective motions at low frequency and succeeded in explaining biological functions in terms of the collective motion [30-34]. Since the transition process mostly shows the large collective motion, there have been efforts to employ low-frequency mode shapes predicted by NMA to the transition pathway. Firstly, collective MD iteratively obtains the transition pathways by combining NMA based on ENM, which guides the collective dynamics with MD evaluating local dynamics and atomic interactions [35]. The optimized torsion-angle normal modes in internal coordinates generated more accurate transition pathways than the Cartesian modes [36]. Next, in the anisotropic network model-Monte Carlo (ANM-MC) method, normal modes are also used for describing intermediate structures on transition. Then, MC algorithm is applied to minimize their conformational energy in order to predict the transition pathway successfully [37]. Such NMA-based transition pathway prediction tools were also addressed through online web servers. NMSim reproduces the feasible transition pathways using rigid-cluster NMA in Cartesian coordinates [38], while iMODS performs the pathway generation based on NMA in internal coordinates [39].In this work, we present a new simulation method based on ENM, which is called the normal mode guided elastic network interpolation (NGENI). This method can generate pathways on a large-scale transition process by iteratively performing NMA. The conventional ENI method generates conformational pathways between the initial and the final conformations by obtaining displacement vectors iteratively toward a direction which linearly reduces root-mean-square deviation (RMSD) between the two given structures [22-24]. On the other hand, the displacement vectors in NGENI are composed of a linear combination of normal mode vectors. Because of this methodological difference, NGENI generates theoretically more reliable pathway than ENI of which the pathway doesn’t take into account any physical aspects such as the intrinsic thermal vibration, but only complies with the given topological constraints in Cartesian space. As the existing morphing methods such as MENM, iENM and ANMPathway also utilize either topological constraints or potential energy landscape for prediction of large-scale transition pathways, our method is expected to be a new solution considering both aspects in a balanced way. In addition, NGENI is able to adjust degrees of freedom in the simulation by determining the number of normal modes to be used. This enables us to reduce the computational cost. The validity of NGENI has been verified with extensive testing by comparing NGENI to ENI.
Materials and methods
A set of proteins
In this work, we choose a set of 8 proteins for which two end-point structures are available at the Protein Data Bank (PDB). The RMSD between the two end-point structures was more than 3 Å for all tested proteins. Such a topological difference is enough to test whether a protein undergoes conformational changes that cause its own biological functions. The information about these proteins is listed in Table 1.
Table 1
Fundamental information about a set of 8 large-scale transition proteins.
Protein
PDB IDAa
PDB IDBb
No. of residues (n)
RMSD (Å)
No. of iterationsc (s)
Resolutiond (Å)
T4 lysozyme
178L
256L
162
3.4
34
1.80
Maltodextrin binding protein
1OMP
1ANF
370
3.8
38
1.67
D-allose binding protein
1GUD
1RPJ
288
4.5
45
1.70
LAO binding protein
2LAO
1LST
238
4.7
47
1.80
5’-nucleotidase
1OID
1OI8
525
5.5
55
2.10
Ribose-binding protein
1BA2
2DRI
271
6.2
62
1.60
Adenylate kinase
4AKE
1AKE
214
7.1
71
2.00
Ribonuclease III
1YZ9
1YYO
438
7.3
73
2.10
a Initial conformation of each protein.
b Final conformation of each protein.
c The number of iterations (no. of iterations) is simply determined by multiplying the RMSD value by 10 since NGENI generates the intermediate conformations, which have an RMSD of 0.1 Å to the previous step at each iteration.
d The smaller resolution value between the two conformations of each protein is selected as a representative here.
a Initial conformation of each protein.b Final conformation of each protein.c The number of iterations (no. of iterations) is simply determined by multiplying the RMSD value by 10 since NGENI generates the intermediate conformations, which have an RMSD of 0.1 Å to the previous step at each iteration.d The smaller resolution value between the two conformations of each protein is selected as a representative here.
Outline of NGENI
The purpose of the NGENI method is to construct a pathway between two end-point structures based on low-frequency modes, which are most relevant to biological function. The pathway comprises consecutive displacement vectors between an intermediate structure and the next one. To obtain these vectors, we established an objective function in which potential energy linearly decreases with respect to the RMSD value by interpolating the distance between spatially close residues (more detailed description on RMSD calculation is given in S1 Text). Here, the coordinates are collected from the position of the alpha carbon atoms (Cαs), and spatially close residues are connected by linear springs based on the ENM concept [40,41]. The key idea of the NGENI is to generate transition pathways based on normal mode vectors. A linear combination of normal mode vectors yields the corresponding intermediate structure. By adjusting the contribution weighting of each mode vector, we can define the vector set that satisfies the desired value of potential energy. Iteratively, a series of these displacement vectors are obtained from the initial to the final conformation to create a possible transition pathway. Here, the number of iterations is preset by multiplying the RMSD value between the two end-point structures by 10 in order to get a smooth transition pathway with the RMSD increment of 0.1 Å every iteration step. The overall scheme of NGENI is shown in Fig 1 and more details about the proposed objective function are also described in the following chapter.
Fig 1
Schematic of the proposed NGENI method.
(A) A flow chart of NGENI. If the total number of iterations is s, then k increases iteratively from 1 to s-1. (B) A schematic description of a large-scale transition pathway. Specifically, the left figure labeled ‘Start’ refers to an initial structure, the right one labeled ‘End’ refers to a final structure, and the center one labeled ‘kth’ indicates an intermediate structure at the kth step.
Schematic of the proposed NGENI method.
(A) A flow chart of NGENI. If the total number of iterations is s, then k increases iteratively from 1 to s-1. (B) A schematic description of a large-scale transition pathway. Specifically, the left figure labeled ‘Start’ refers to an initial structure, the right one labeled ‘End’ refers to a final structure, and the center one labeled ‘kth’ indicates an intermediate structure at the kth step.
Cost function
For a protein composed of n residues (e.g., Cαs based on coarse-grained ENM), the coordinates for the two end-point structures are denoted by {x} and {y}, respectively. When we use the m lowest normal modes in the simulation, we can define the displacement vector for the ith residue as follows
where v is the mth normal mode vector of the ith residue and c is a weighting constant of the mth normal mode. Thus, we can define
andHere, we introduce a cost function as follows
where k is an element of the linking matrix which contains binary information about virtual spring connection in ENM. The value of l is the desired distance between the ith residue and jth residue at a certain intermediate state. The value of l can be determined as
where α is a scale factor that interpolates between the initial distance ‖x − x‖ and the final one ‖y − y‖. It ranges from 0 (i.e., initial) to 1 (i.e., final) with an increment of 1/s. Again, s is the total number of iterations here.We simplify the cost function in Eq (4) in order to obtain the optimum solution of C. First, we define C as a part of the cost functionThis equation can be simplified into a quadratic equation in terms of C by using a Taylor series approximation for small values of VC and VC.
where
and E3 is the 3 by 3 identity matrix.Then, we write Eq (6) as
where
andIn in Eq (10), we define asThen, we can rewriteConsidering all the spring connections, we can obtain the following equation
where Λ(1) ∈ R is defined asNext, for in Eq (11), taking asThen,We can also simplify the term such that
where Λ(2) ∈ R1× isLastly, let Λ(3) be a constant such thatConsequently, we can derive a quadratic form of the cost function by substitution of Eqs (16), (20), and (21) into (9).Our goal is to determine the value of C, which minimizes Eq (22). To do that, the C has to satisfy the following equationThrough this computation, we can obtain the optimum weighting constant C and determine the optimum displacement vectors from an intermediate state to the next one. Iteratively, this process can generate a conformational transition pathway between the given two end-point structures. From a computational point of view, the whole process can be divided into two parts: NMA and the main computational part from Eqs (9) to (23). First, the time complexity of NMA is O(nm2) when using the “eigs” function in MATLAB. This function is well designed to find the largest/smallest magnitude eigenvalues of sparse matrix efficiently using Krylov subspace methods including Lanczos and Arnoldi algorithms [42,43]. Next, in the main computational part, the most computational effort is required to construct the cost function in Eq (16) with the time complexity of O(nm2). Consequently, the overall time complexity of optimum-NGENI can be expressed as O(n) when m is a constant. On the other hand, in ENI, the computation time is mainly consumed by multiplication/inversion of large and sparse matrix with the time complexity of O(n2) (see S2 Text for further details).
Results and discussion
Verification of NGENI by using the full degrees of freedom
We first evaluated the performance of NGENI with all normal modes (full-NGENI) by comparing it with the conventional ENI pathways [22-24] in terms of average RMSD values. For 8 large-scale transition proteins, we obtained average RMSD which is the average of RMSD values between two corresponding intermediate conformations generated by full-NGENI and conventional ENI for all iteration steps. As a result, the negligibly small RMSD values indicate that the full-NGENI generated similar pathways to those of conventional ENI for all cases (Table 2). This is because the full normal mode vectors can take the complete set of degrees of freedom (DOF) into account. Mathematically speaking, this is nothing more than a different representation of the bases that constitute the given topological space. Therefore, we confirmed that the full-NGENI could generate reliable pathways for a large-scale transition process based on the fact that the conventional ENI pathway has already been verified elsewhere [22-24].
Table 2
Comparison between the full-NGENI and the conventional ENI pathways.
Protein
Resolution (Å)
Average RMSDa (Å)
T4 lysozyme
1.80
0.0879
Maltodextrin binding protein
1.67
0.0296
D-allose binding protein
1.70
0.0012
LAO binding protein
1.80
0.0002
5’-nucleotidase
2.10
0.0021
Ribose-binding protein
1.60
0.0031
Adenylate kinase
2.00
0.0029
Ribonuclease III
2.10
0.1620
a It indicates the averaged RMSD value between the full-NGENI and the conventional ENI pathways. Here, all the ENI pathways are automatically generated through KOSMOS online server [25].
a It indicates the averaged RMSD value between the full-NGENI and the conventional ENI pathways. Here, all the ENI pathways are automatically generated through KOSMOS online server [25].
Potential of NGENI with partial degrees of freedom
As low frequency modes dominantly show collective motion, one can significantly reduce computational cost without loss of generality of the pathway by using only those modes. Here, we define another NGENI with partial DOF, called optimum-NGENI, and test whether the optimum-NGENI is still able to generate reasonable transition pathways. Fig 2 illustrates this concept, in which each curved line represents a transition pathway inside a cylindrical space spanned by corresponding normal modes. The smaller search space of the optimum-NGENI enables us to dramatically reduce the computational cost. The size of the searching space can be easily adjusted by the number of normal modes taken in the optimum-NGENI, but it has not been determined whether this optimum number satisfies all general cases.
Fig 2
A schematic of different NGENI pathways in terms of degrees of freedom.
A cylindrical tube represents the searching space for a transition pathway depicted by a curved line. The optimum-NGENI is shown in blue, while the full-NGENI is shown in red. Both pathways are generated from the same initial and target structures.
A schematic of different NGENI pathways in terms of degrees of freedom.
A cylindrical tube represents the searching space for a transition pathway depicted by a curved line. The optimum-NGENI is shown in blue, while the full-NGENI is shown in red. Both pathways are generated from the same initial and target structures.To determine the optimum number of lowest normal modes used in the optimum-NGENI, the quality of the optimum-NGENI pathway was evaluated using RMSD between intermediate conformations and the final given structure for every iteration step. If these RMSD values are less than its experimental resolution, then the proposed optimum-NGENI with a particular number of lowest normal modes is considered to satisfy the convergence condition. Although the optimal number of normal modes may be different in each case, 30 lowest normal modes seem to be sufficient to generate reliable pathways, as shown in Table 3.
Table 3
Convergence of the optimum-NGENI.
Protein
Resolution (Å)
Convergence condition, ma
5
10
20
30
40
Full
T4 lysozyme
1.80
O
O
O
O
O
O
Maltodextrin binding protein
1.67
X
O
O
O
O
O
D-allose binding protein
1.70
O
O
O
O
O
O
LAO binding protein
1.80
O
O
O
O
O
O
5’-Nucleotidase
2.10
X
X
X
O
O
O
Ribose-binding protein
1.60
O
O
O
O
O
O
Adenylate Kinase
2.00
O
O
O
O
O
O
Ribonuclease III
2.10
X
O
O
O
O
O
If the convergence condition is satisfied, it is marked ‘O’. If not, it is marked ‘X’.
a The number of lowest normal modes used in the optimum-NGENI test. The full-NGENI case is also listed in the last column as a reference.
If the convergence condition is satisfied, it is marked ‘O’. If not, it is marked ‘X’.a The number of lowest normal modes used in the optimum-NGENI test. The full-NGENI case is also listed in the last column as a reference.For further assessment of this convergence condition, Fig 3 presents the transition pathways of two proteins: adenylate kinase and D-allose binding protein. Adenylate kinase catalyzes the transfer of a phosphoryl group from ATP to AMP [44] and undergoes rigid body motions of the NMPbind and LID domains with two pairs of hinges connecting each domain to CORE domain [45]. Fig 3A (upper) includes an actual simulation result showing that the optimum-NGENI successfully generates the rigid body movements of adenylate kinase. In addition, both full-NGENI and optimum-NGENI pathways are compared to each other in Fig 3A (lower). Unlike full-NGENI, the error of the optimum-NGENI pathway is accumulated at the end and obviously caused by the missing DOF. However, this error is acceptable compared to the experimental resolution of the adenylate kinase structure. This result suggests that only a small portion of the lowest normal modes is sufficient to predict transition pathways without loss of generality because this portion has enough information to comprehend biologically important collective protein motion. Fig 3B also confirms similar results for D-allose binding protein, which has three hinge points between two domains [46]. The upper figure in Fig 3B shows that the generated pathway demonstrates the hinge movement without difficulty, and the lower one verifies that the pathway of optimum-NGENI meets the convergence condition. Furthermore, their reverse transition pathways (from closed to open form) were also generated by optimum-NGENI. They not only satisfy the convergence condition but also preserve realistic geometry during the reverse transition (see S3 Text).
Fig 3
RMSD comparison between the optimum-NGENI and the conventional ENI pathways for adenylate kinase and D-allose binding protein.
(A) Adenylate kinase, (B) D-allose binding protein. The upper figures show rough transition pathways of proteins using representative intermediate structures. The lower graphs show changes in RMSD between the final structure and intermediate conformations generated by two different methods: full-NGENI (dashed red) and optimum-NGENI (solid blue). The black dotted line represents experimental resolution of each protein.
RMSD comparison between the optimum-NGENI and the conventional ENI pathways for adenylate kinase and D-allose binding protein.
(A) Adenylate kinase, (B) D-allose binding protein. The upper figures show rough transition pathways of proteins using representative intermediate structures. The lower graphs show changes in RMSD between the final structure and intermediate conformations generated by two different methods: full-NGENI (dashed red) and optimum-NGENI (solid blue). The black dotted line represents experimental resolution of each protein.
Availability of optimum-NGENI for large proteins
Thus far, we have defined optimum-NGENI as the NGENI using only the first 30 lowest normal modes through the evaluation of convergence condition of generated transition pathways. Now, we test if optimum-NGENI is still able to generate transition pathways for relatively large proteins such as group II chaperonin. The detailed structural information of group II chaperonin is provided in Table 4.
Table 4
Structural information of group II chaperonin.
Protein
PDB IDA
PDB IDB
No. of residues (n)
RMSD (Å)
No. of iterations (s)
Resolution (Å)
Group II chaperonin
3IYF
3J03
3928
15.4
154
4.80
As shown in Fig 4A, the optimum-NGENI pathway successfully describes the hinge-bending motions of the intermediate domains which play a key role in the folding mechanism of the group II chaperonin. Moreover, RMSD values of optimum-NGENI and ENI are compared to each other in Fig 4B. Although the optimum-NGENI pathway shows higher RMSD error at the end stage, it can still converge below the experimental resolution. Lastly, the bond lengths and bond angles are measured for evaluating physical reality of the proposed pathway (Fig 4C). We have also confirmed that difference with the two end-point structures is negligible for bond length (less than 0.03Å).
Fig 4
The optimum-NGENI pathway of group II chaperonin.
(A) Transition pathway from 3IYF (open) to 3J03 (closed). (B) RMSD comparison between optimum-NGENI (solid blue) and ENI (dashed red). The black dotted line represents experimental resolution. (C) Variation of bond length (solid black) during the conformational change.
The optimum-NGENI pathway of group II chaperonin.
(A) Transition pathway from 3IYF (open) to 3J03 (closed). (B) RMSD comparison between optimum-NGENI (solid blue) and ENI (dashed red). The black dotted line represents experimental resolution. (C) Variation of bond length (solid black) during the conformational change.Ideally speaking, there is no size limitation to optimum-NGENI. For large system, we can expect much higher computational efficiency by using a finite number of meaningful normal modes as the driving force of pathway generation. Although there is still an argument on how to predetermine the number of normal modes required to capture the system dynamics without any loss of generality, empirically speaking (also supported by our case study results), the first 30 lowest normal modes are enough to generate the transition pathway successfully. Also, it is noted that this number is not determined by the size of protein structure but the complexity of conformational transition.
Quality of optimum-NGENI pathway
To address whether the optimum-NGENI method achieves goals as good as the conventional ENI, the performance of both methods is compared using the average RMSD values for our protein set including the group II chaperonin (Table 5). Here, these values are obtained from averaging RMSD values between two corresponding intermediate conformations generated by optimum-NGENI and ENI for every iteration step. As the average value is smaller than the corresponding resolution value for all cases, topological difference between two pathways is negligible. To evaluate quality of the optimum-NGENI pathways for adenylate kinase and D-allose binding protein, we compared their weighting constants with those of full-NGENI (Fig 5). For both cases, the first 30 weighting constants of the full-NGENI were very similar to those of the optimum-NGENI, in the sense that several lowest modes dominantly influence the transition pathway in proteins by generating large-scale and collective motion. Quantitatively speaking, the correlation coefficients between the two methods are 0.988 and 0.992 for adenylate kinase and D-allose binding protein, respectively. This result indeed validates that the proposed optimum-NGENI method can generate transition pathways as good as the conventional ENI does with only the fixed number of lowest normal modes (i.e., 30 in this context). The weighting constants for all the other protein pathways are also provided in S1 Fig.
Table 5
Comparison between the optimum-NGENI and the conventional ENI pathways.
Protein
Resolution (Å)
Average RMSDa (Å)
T4 lysozyme
1.80
0.30
Maltodextrin binding protein
1.67
0.58
D-allose binding protein
1.70
0.40
LAO binding protein
1.80
0.36
5’-nucleotidase
2.10
1.20
Ribose-binding protein
1.60
0.34
Adenylate kinase
2.00
0.55
Ribonuclease III
2.10
0.42
Group II chaperonin
4.80
2.09
a The average RMSD indicates the average of RMSD values between intermediate conformations generated by optimum-NGENI and conventional ENI for all iteration steps.
Fig 5
Comparison of weighting constants between the full-NGENI and the optimum-NGENI.
(A) Adenylate kinase, (B) D-allose binding protein. Weighting constants of normal modes for the full-NGENI (red dashed line) and the optimum-NGENI (blue solid line) are compared to each other. They represent the average values of weighting constants for all iteration steps (optimum-NGENI: modes 1 to 30, full-NGENI: all modes).
a The average RMSD indicates the average of RMSD values between intermediate conformations generated by optimum-NGENI and conventional ENI for all iteration steps.
Comparison of weighting constants between the full-NGENI and the optimum-NGENI.
(A) Adenylate kinase, (B) D-allose binding protein. Weighting constants of normal modes for the full-NGENI (red dashed line) and the optimum-NGENI (blue solid line) are compared to each other. They represent the average values of weighting constants for all iteration steps (optimum-NGENI: modes 1 to 30, full-NGENI: all modes).
Computational complexity of optimum-NGENI
The main advantage of NGENI is that one can incorporate large collective motions effectively when predicting transition pathways in proteins, because NGENI generates transition pathways considering geometric constraints and physical mechanics, despite the simple interpolation method. Of course, the computational cost of NGENI is higher than that of ENI because NGENI has to perform NMA at every iteration step to update intermediate conformations. Using a finite number of normal modes, however, the optimum-NGENI can overcome this drawback because it drastically reduces computational burden in the main computation in which the next intermediate conformation is determined by displacement vectors obtained from NMA. A rough mathematical calculation with big O notation yields that the optimum-NGENI follows O(nm), whereas the conventional ENI follows O(n) where n is the number of residues and m is a constant number of normal modes utilized in optimum-NGENI (see Materials and methods and S2 Text for more details). As the size of protein increases, the conventional ENI method requires much more computational time than optimum-NGENI.This relationship is verified by comparison of the actual computation time for both methods. For appropriate comparison, we take into account the average computing time to obtain each intermediate conformation. Fig 6 shows that computation time of the ENI (denoted by red circles) grows quadratically with respect to protein size, while the corresponding computation time of optimum-NGENI (denoted by blue quadrangles) increases linearly. Therefore, the optimum-NGENI method can be a reliable alternative to the conventional ENI by balancing physical realism and computational cost, regardless of protein size.
Fig 6
Computational cost comparison between optimum-NGENI and ENI.
For 9 example proteins, the average computation time for each iteration step is measured and individually marked with respect to protein size. Optimum-NGENI (the conventional ENI) is denoted by blue quadrangles (red circles) and the curve fitting lines are also added for both methods.
Computational cost comparison between optimum-NGENI and ENI.
For 9 example proteins, the average computation time for each iteration step is measured and individually marked with respect to protein size. Optimum-NGENI (the conventional ENI) is denoted by blue quadrangles (red circles) and the curve fitting lines are also added for both methods.
Conclusions
There are significant challenges in using experimental techniques to capture temporally lengthy, large-scale protein dynamics at the atomic level, so simulation methods play an important role in filling this gap by generating transition pathways between different conformational states, which are strongly related to biological functions. However, there is still concern regarding simulation reliability and computational cost. To compensate for this weakness, this work proposes a new morphing method called NGENI that interpolates the distance between spatially close residues based on a linear combination of normal mode vectors. This key idea helps us generate topologically allowable and physically reliable pathways.Furthermore, the optimum-NGENI successfully provides in-depth study on transition pathway generation. First, it can elucidate how well a minimum number of collective modes generate protein transition pathways. Second, the concept of the optimum weighting constant can be also interpreted as a quantitative measure of the contribution of each mode to the transition pathway. Third, it compromises computational cost with the physical realism of the generated transition pathway by taking only a fixed number of lowest normal modes as a basis for searching space.Consequently, it is expected that the optimum-NGENI not only assesses degrees of collectivity in protein dynamics, but also captures its functional transition pathway through a linear combination of several intrinsic vibration modes.
Details on the root-mean-square deviation (RMSD).
(DOCX)Click here for additional data file.
Details on the computational cost of the NGENI described by big O notation.
(DOCX)Click here for additional data file.
A review of reverse transition pathways from closed to open form generated by optimum-NGENI.
(DOCX)Click here for additional data file.
Weighting constants of optimum-NGENI.
(A) T4 lysozyme, (B) Maltodextrin binding protein, (C) D-allose binding protein, (D) LAO binding protein, (E) 5’-nucleotidase, (F) Ribose-binding protein, (G) Adenylate kinase, (H) Ribonuclease III, (I) Group II chaperonin. The values described in the graphs represent average weighting constants for all iteration steps.(TIF)Click here for additional data file.
Density map of linking matrix.
(A) D-allose binding protein (B) Group II chaperonin.(TIF)Click here for additional data file.
RMSD comparison between original and reverse pathways for adenylate kinase and D-allose binding protein.
(A,B,C) Adenylate kinase, (D,E,F) D-allose binding protein. The graphs show changes in RMSD between the target structure and an intermediate conformation for the two pathways: original pathway (dashed red) and reverse pathway (solid blue). Three different methods are used to generate transition pathways: (A,D) optimum-NGENI, (B,E) 100-NGENI using the 100 lowest normal modes, and (C,F) full-NGENI. The black dotted line represents experimental resolution of each protein.(TIF)Click here for additional data file.
Density of linking matrix for the set of proteins.
(DOCX)Click here for additional data file.
Variation of bond length over the reverse pathways of adenylate kinase and D-allose binding protein.
Authors: Moritz Pfreundschuh; David Alsteens; Manuel Hilbert; Michel O Steinmetz; Daniel J Müller Journal: Nano Lett Date: 2014-05-05 Impact factor: 11.189