Literature DB >> 25136273

Conformational Transition Pathways of Epidermal Growth Factor Receptor Kinase Domain from Multiple Molecular Dynamics Simulations and Bayesian Clustering.

Yan Li¹, Xiang Li², Weiya Ma¹, Zigang Dong¹.

Abstract

The epidermal growth factor receptor (EGFR) is aberrantly activated in various cancer cells and an important target for cancer treatment. Deep understanding of EGFR conformational changes between the active and inactive states is of pharmaceutical interest. Here we present a strategy combining multiply targeted molecular dynamics simulations, unbiased molecular dynamics simulations, and Bayesian clustering to investigate transition pathways during the activation/inactivation process of EGFR kinase domain. Two distinct pathways between the active and inactive forms are designed, explored, and compared. Based on Bayesian clustering and rough two-dimensional free energy surfaces, the energy-favorable pathway is recognized, though DFG-flip happens in both pathways. In addition, another pathway with different intermediate states appears in our simulations. Comparison of distinct pathways also indicates that disruption of the Lys745-Glu762 interaction is critically important in DFG-flip while movement of the A-loop significantly facilitates the conformational change. Our simulations yield new insights into EGFR conformational transitions. Moreover, our results verify that this approach is valid and efficient in sampling of protein conformational changes and comparison of distinct pathways.

Entities: Chemical Disease Gene Mutation Species

Year: 2014 PMID： 25136273 PMCID： PMC4132868 DOI： 10.1021/ct500162b

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.006

Introduction

Protein is intrinsically dynamical, and conformational changes are essential to its functions. Large-scale conformational rearrangements are involved in signaling transduction, enzyme catalysis, and protein folding.[1] Molecular dynamics (MD) simulations are a powerful tool to model such motions and deepen our understanding of the relationship between protein structure and function.[2] Studies of protein conformational changes and the intermediates that are formed along the transition pathway provide valuable insights into the process by which the protein is activated or inactivated. However, characterization of the transition pathway and intermediates as well as establishing their role in conformational rearrangements remains a major challenge. To tackle this problem, a plethora of methods have been developed to enhance sampling of the conformational change, which is a rare event. A straightforward approach is to run ultralong unbiased MD simulations.[3] It is, however, still difficult to obtain adequate sampling of conformational changes which usually occur on microsecond to millisecond time scales or beyond, especially for a large biomolecular system. To overcome this limitation, a variety of biased methods have been proposed. One approach aims to speed up conformational transitions by use of elevated temperatures or a biasing potential such as parallel tempering or replica exchange MD (REMD),[4,5] umbrella sampling,[6] metadynamics,[7,8] accelerated MD,[9,10] temperature-accelerated MD,[11] adaptive biasing force,[12] and so on. REMD and its variants are popular in sampling rare events while its efficiency has been discussed in detail.[13−17] For a bias potential, a finite number of collective variables (order parameters) needs to be defined for description of transitions. Selection of collective variables is not trivial and has been discussed in depth.[18] The other class of rare events method, path-based sampling, is of greatest relevance to the present work. These path-building techniques have been developed to construct a path between two end points, including milestoning,[19] the string method,[20] path metadynamics,[21] transition path sampling,[22,23] weighted ensemble,[24,25] fast marching method,[26] nudged elastic band,[27,28] targeted MD (TMD),[29] steered MD,[30,31] and so on. These approaches have been applied for exploration of large-scale conformational changes of biomolecules. One or a few transition pathways can be reconstructed with any method. However, fundamental questions still remain to be answered. How can these pathways be compared? Which one is the most likely transition pathway? Here we present a strategy combining multiply targeted MD (MTMD), unbiased MD simulations, and Bayesian clustering to address these questions. This approach is helpful not only for exploration of transition pathways and metastable states along them but also for comparison of distinct pathways. In our method, the phase space is first partitioned by a series of seeding structures, which are generated by MTMD. Then, a long unbiased MD simulation (200 ns) is initiated on each seeding structure to achieve local equilibration and explore the subspace. The ensemble of MD trajectories is analyzed by a clustering method based on the Bayesian model to determine metastable states and transitions between them. The strategy of running multiple MD trajectories for sampling rare events has been investigated previously.[19,25,32,33] The improvements in our approach are that, (i) with MTMD, we can define transition pathways from an initial state to a final state via different intermediate states; (ii) the seeding structures are not necessary to be in equilibrium; (iii) long unbiased MD simulations are performed to explore likely transitions between neighboring seeding structures; (iv) specification of collective variables is not needed; and (v) Bayesian clustering is carried out to reconstruct conformational transitions from the ensemble of MD trajectories. With these features, it is likely to design different transition pathways between two end points and compare them. The initial structure, final structure, and intermediate structures can be obtained from previous crystallographic studies or MD simulations. To test this approach, we apply it to the activation/inactivation process of the kinase domain of the epidermal growth factor receptor (EGFR). EGFR, a receptor tyrosine kinase, has emerged as a target of choice for treatment of various cancers.[34] Crystallographic studies have revealed different conformational states of EGFR kinase domain (Supporting Information (SI) Figure S1): the DFG-in active state (A), the Src-like inactive state (T), and the DFG-out inactive state (I). For simplicity, we will use A, T, and I in the following to represent these three forms, respectively. The major structural changes among them are movement of the activation loop (A-loop), DFG-flip (conversion between DFG-in and DFG-out), and displacement of αC helix. In form A, it adopts DFG-in conformation; the A-loop is in the extended state; Glu762 located in αC helix forms a salt bridge with Lys745, which is highly conserved and important for kinase activity.[35] Form I adopts DFG-out conformation, and the A-loop is also extended while Lys745 and Glu762 (the ion pair) are separated because of outward rotation of the αC helix. Form T adopts the DFG-in conformation, and the ion pair are apart from each other while the A-loop is in the compact mode. The mechanism that controls the transition from inactive forms of kinases to active ones as well as intermediate states has attracted much interest.[33,36−43] Based on analysis of Abl crystal structures and molecular simulations, it was proposed that the Src-like inactive form (T) functioned as an intermediate in the activation pathway to facilitate DFG-flip.[44] This assumption was supported by computer simulations of EGFR and other kinases.[37,41,42] However, it is still not clear if DFG-flip can happen without passing form T. If it is likely, is any intermediate state involved which has not been determined experimentally? In addition, the roles of A-loop and αC helix in this process need to be elucidated. Deep understanding of the activation/inactivation pathways and intermediate states of kinases will be helpful for development of high specific anticancer drugs. In this work, we design and compare two distinct transition pathways of EGFR kinase domain to investigate under what condition DFG-flip can happen. One is the conformational change between A and I via T, which will be called ATI pathway in the following. The other one is direct transition between A and I without any intermediate, which will be called AI pathway. First, the two pathways are reconstructed using the strategy mentioned above. Then, in comparison with crystal structures, several intermediate states are recognized based on clustering results. Finally, rough two-dimensional (2D) free energy surfaces are plotted to illustrate transitions between different conformations. Comparison of two predefined pathways yields new insights into the conformational transition of EGFR kinase domain. Moreover, our results verify that this approach is valid and efficient in sampling of large-scale conformational changes and determination of the energy-favorable pathway from available ones.

Methods

Preparation of EGFR Sructures

Three EGFR structures were taken from the Protein Data Bank (PDB). The DFG-in active conformation was obtained from PDB entry 2ITW.[45] The Src-like inactive conformation was obtained from PDB entry 3W32.[46] The DFG-out inactive conformation was obtained from PDB entry 4I20.[47] Two mutations in the structure of 4I20 (L858R and V948R) were changed back to the wild type within Maestro v9.3. Other atoms except the protein were removed. Some residues at the terminus were deleted so that the three structures had the same 284 residues.

TMD and MTMD

The targeted molecular dynamics (TMD) and multiply targeted molecular dynamics (MTMD) implemented in the Amber 11 program suite[48] were employed to construct the transition pathways between EGFR active and inactive states. TMD adds an additional term to the energy function. It has the following form: TGTMDFRC is the force constant for targeted MD. NATTGTRMS is the number of atoms specified for root mean square deviation (RMSD) calculation. RMSD is the current RSMD for the selected atoms. TGTRMSD is the value of the target RMSD. In contrast, the force constant in MTMD is changing with step number by a single factor R:INCREMENTS is the number of times the target value changes. The protein was modeled using Amber ff03 force field.[49] All three structures were subjected to energy minimization first using the steepest descent algorithm in 1000 steps and then the conjugate gradient algorithm in 10000 steps. For TMD simulations, the starting structure is 2ITW (A) and the target structure is 3W32 (T). Three TMD simulations ran for 0.5 ns with different force constants (1, 0.1, and 0.01 kcal/(mol·Å2)). For MTMD, each simulation ran for 1 ns. For the I-T-A transition, form I (4I20) was used as the starting structure; form T (3W32) was first set as the target structure. The force constant was applied to all heavy atoms and varied from 0.0001 to 2.0 kcal/(mol·Å2) in 0.5 ns as the simulation steps increased to drive the conformational changes to form T. Then, form A (2ITW) was set as the target structure. The force constant was reset to 0.0001 kcal/(mol·Å2) and increased to 2.0 kcal/(mol·Å2) in continuous 0.5 ns to drive the further transition to form A (2ITW). The same setting was also applied in the simulations of A-T-I and I-A-I. The TMD and MTMD simulations were performed in the implicit solvent model of generalized Born[50] using the SANDER module. Snapshot structures were extracted from produced trajectories at a time interval of 1 ps.

K-Means Clustering

To generate an ensemble of representative structures from each MTMD trajectory, clustering was performed on backbone RMSDs of protein and the activation loop. Form A (2ITW) was used as the reference structure. Snapshot structures from each MTMD trajectory were superposed onto the reference structure using backbone atoms to remove overall rotation and translation. Then RMSD values were calculated with respect to the reference structure as the distance metrics. Three MTMD trajectories were divided into six transition trajectories each lasting for 0.5 ns. For each transition, 500 snapshot structures were extracted and classified with the K-means method implemented in the statistical software R v3.0. For ATI pathway, four transition trajectories were included and each of them was clustered into 11 classes. For AI pathway, two transition trajectories were included and each of them was clustered into 9 classes. From each class, the structure closest to the centroid was chosen as the representative structure for following MD simulations.

Molecular Dynamics

All MD simulations were carried out with Amber 11.[48] The equations of motion were solved with the leapfrog integration algorithm with a time step of 2 fs. The lengths of all bonds involving hydrogen atoms were kept constrained with the SHAKE algorithm.[51] The particle mesh Ewald (PME) method was applied for treating long-range electrostatic interactions.[52] Periodic boundary condition was used in all simulations. A random seed was generated based on the current date and time for every run to assign initial velocities. The protein was modeled using the Amber ff03 force field.[49] For each seeding structure, it was explicitly solvated in a rectangular box of TIP3P[53] water molecules with a minimal distance of 10 Å from the protein to the box edges. Two chloride ions were added to neutralize uncompensated charges. After the whole system was set up, a series of energy minimizations and equilibrations were performed. First, the water molecules, hydrogen atoms, and ions were subjected to 3000 steps of steepest descent minimization followed by 12000 steps of conjugate gradient minimization while other heavy atoms were constrained with the harmonic force of 2 kcal·mol–1·Å–2. Next, the whole system was energy minimized with 10000 steps of L-BFGS algorithm without any harmonic restraint. Then, coupled to a Langevin thermostat, the system was heated from 10 K up to 300 K by increments of 100 K in 20 ps and continued to run for 40 ps at 300 K at constant volume. Finally, the system was equilibrated for 200 ps in NPT ensemble with the Langevin thermostat and isotropic position scaling, at 300 K and 1 bar. The production run for each seeding structure was carried out for 200 ns in NVT ensemble with the Langevin thermostat at 300 K using the parallel CUDA version of PMEMD on 2 GPUs.[54,55] The trajectories were sampled at a time interval of 10 ps. Distribution of simulation temperatures is shown in the Supporting Information. All simulations were performed on our Linux cluster with 2 GPUs and 12 CPUs on each node. The analysis of trajectories was performed using PTRAJ.[56]

Clustering Based on Bayesian Model

Bayesian clustering is an unsupervised method that seeks a maximum posterior probability classification by an expectation-maximization (EM) algorithm. The advantage is that there is no need to specify the cutoff radius or the number of clusters in advance. The optimal solution is determined by calculating the maximum posterior parameter values and the most probable classification.[57] It has been compared with other clustering algorithms.[58] In the present work Bayesian clustering was performed with AutoClass C v3.3.6[59] in the dihedral angle space. The structure of the EGFR kinase domain was described by 282 pairs of backbone dihedral angles. The structures for Bayesian clustering were sampled from unbiased MD trajectories. For ATI pathway, snapshot structures were extracted at a time interval of 0.4 ns from each trajectory and a total of 23500 structures was obtained. For AI pathway, snapshot structures were taken at an interval of 0.2 ns from each trajectory and a total of 20000 structures was obtained. The clustering process started with a serial of seed numbers which gave an initial guess of the number of clusters. Then a random classification was generated and refined until a local maximum was found. This process was repeated 1000 times, and the top two classifications with the highest probability were output. The 2D free energy surfaces based on these sampled structures were plotted (Supporting Information).

Free Energy Surface

The calculation of free energy is given bywhere W0 is the depth of the free energy surfaces (FES), P is the probability distribution, kB is the Boltzmann constant, and T is the temperature. All sampled structures from unbiased MD simulations are employed for 2D FES calculations and 1D convergence analysis. For 2D FESs, the 2D distribution function P(r) is computed by the histogram analysis method HIST2D implemented in the package of GPLOTS with a bin area of 0.3 Å × 0.3 Å. One-dimensional free energy surfaces are presented in the Supporting Information, which are calculated over the three metrics (A-loop RMSD, DFG RMSD, and ion-pair distance) at an interval of 40 ns of simulation time. The 1D distribution function is calculated by the histogram analysis method implemented in R v3.0 with a bin width of 0.1 Å.

Results

Generation of Seeding Structures

Compared to other approaches,[60] TMD is a valid tool for construction of a transition pathway, which does not require the specification of collective variables. In the standard TMD method there are two inefficient features. One is that a weak constraint may hold back the trajectory from reaching the target. The other one is the so-called “large-scale-first”,[61] which means that large-scale motions occur before small-scale ones. Thus, the order of events in conformational transition is associated with the direction in which the simulation is performed. To avoid these questions, we employ MTMD, a TMD variant, to generate the transition pathway for sampling of seeding structures. Compared with TMD, MTMD allows not only multiple reference structures being used but also a constraint evolving in time. The MTMD simulations were performed with the force constant varying from very weak (0.0001 kcal/(mol·Å2)) to weak (2.0 kcal/(mol·Å2)) to guarantee that the final structure would be within heavy-atom RMSD 1.0 Å of the target structure and large-scale motions would not happen at the beginning. We also ran MTMD simulations in both activation and inactivation directions to neutralize the possible bias in one-direction simulations. Three MTMD trajectories are presented in Figure 1. Two of them are for ATI pathway: one is from state I to A via T and the other one is from A to I via T. The third trajectory is for AI pathway: from state I to A and then back to I. In the MTMD simulations the starting structure is gradually changed into the target structure. And DFG-flip occurs after 0.4 ns in each transition (SI Figure S2). In contrast, three TMD trajectories with different constraints are also shown in Figure 1. In TMD simulations major conformational changes are done in a short time. This simple comparison indicates that MTMD produces a trajectory of conformational changes more smooth and consecutive than TMD does. Detailed comparison of MTMD and TMD is beyond the scope of this work and will be left for future studies.

Figure 1

Time evolution of heavy-atom RMSD in MTMD and TMD simulations. Three MTMD simulations are the conformational transition from state I to A via T (I-T-A), the transition from state A to I via T (A-T-I), and the transition between states I and A (I-A-I), respectively. Conformational transition from state A to T is also simulated by TMD with different force constants (1.0, 0.1, and 0.01 kcal/(mol·Å2)). Three crystal structures, 2ITW, 3W32, and 4I20 are the DFG-in active form (A), Src-like inactive form (T), and DFG-out inactive form (I), respectively. The RMSD values are calculated with respect to the target structure (2ITW, 3W32, or 4I20). The sampled structures from MTMD simulations were clustered using K-means method based on backbone RMSDs of both EGFR kinase domain and A-loop, considering that A-loop displays the highest flexibility in crystal structures. The structure closest to the centroid in one cluster was selected as the representative structure. For ATI pathway, 44 snapshot structures were chosen and 47 seeding structures (plus 3 crystal structures) were prepared for MD simulations. For AI pathway, 18 snapshot structures were selected and 20 seeding structures (plus 2 crystal structures) were prepared for MD simulations.

MD Simulations

The structural fluctuation from the ensemble of MD trajectories in two distinct pathways is characterized by four metrics: backbone RMSD of EGFR kinase domain (backbone RMSD), backbone RMSD of the A-loop (residues 854–878, A-loop RMSD), heavy-atom RMSD of the DFG motif (residues 855–857, DFG RMSD), and distance between the center of mass of Lys745 and Glu762 (the ion-pair distance). Backbone RMSD describes the overall conformational changes of the EGFR kinase domain; A-loop RMSD evaluates the movement of the A-loop between the extended state and the compact mode; DFG RMSD indicates the conformation of the DFG motif (DFG-in or DFG-out); the ion-pair distance (Lys745-Glu762) reflects displacement of αC helix. As a reference, the ion-pair distance and RMSD values of three crystal structures with respect to the crystal structure of 2ITW (form A) are listed in Table 1. Distribution of the measurements sampled from MD trajectories is displayed in Figure 2, in which the four metrics have different features of distribution in two pathways, suggesting that sampling is affected by predefined pathways. As expected, the A-loop RMSD varies more widely in ATI pathway than in AI pathway, though the overall structure (backbone RMSD) does not change much in both pathways. The DFG RMSD and ion-pair distance fluctuate in the same range with distinct characterization of distribution in two pathways, suggesting the influence of designed pathways on their movements.

Table 1

Comparison of Three EGFR Crystal Structuresa

	RMSD (Å)
crystal structure	backbone	A-loop	DFG	distance of ion pair (Å)
2ITW (A)	0	0	0	8.7
3W32 (T)	4.4	10.5	3.7	14.4
4I20 (I)	3.3	2.9	6.7	12.4

The crystal structure of 2ITW (A) is used as the reference structure for RMSD calculations.

Figure 2

Distribution of RMSDs and ion-pair distance: (A) Data from 47 MD trajectories in ATI pathway; (B) data from 20 MD trajectories in AI pathway. The DFG-in active conformation (2ITW) is used as the reference structure. A snapshot structure is first superposed onto the reference structure using backbone atoms. Then RMSD values of backbone, A-loop, and DFG are calculated. The density is computed by the histogram analysis implemented in R v3.0 with a bin width of 0.1 Å.

Identification of Intermediate States

Bayes theorem was already used in biophysics studies.[62−64] Clustering based on the Bayesian model was previously employed for analysis of MD trajectories.[65] An analysis of clustering simulation data indicated that the clustering results to some extent depended on the number of available sampled structures.[66] Considering their results and computation time, we extracted 23500 structures for ATI pathway (500 structures from each MD trajectory) and 20000 structures for AI pathway (1000 structures from each MD trajectory) for Bayesian clustering. The clustering was performed in the dihedral angle space. The EGFR structure was described by 282 pairs of backbone dihedral angles. For ATI pathway, 23500 structures were grouped into 94 classes (class 0–93) in the optimal classification. For AI pathway, 85 classes (class 0–84) were obtained in the optimal resolution. The four metrics aforementioned are employed for comparison of sampled structures to the crystal structures (for the full data of the top 20 populated classes, see SI Tables S1 and S2). By them, we not only quantify the overall agreement of structures in classes with crystal structures but also identify intermediate structures classified by Bayesian clustering, which has not been reported yet. For ATI pathway, structures in classes 0, 2, and 11 have very good agreement with all structural properties of forms T, I, and A, respectively, with average RMSD values below 3.0 Å and similar ion-pair distances (for selected data, see Table 2; for full data, see SI Table S1). Except the known structures, conformations in some other classes are unique in that they do not correspond to any crystal structure. For example, structures in class 7 exhibit backbone and A-loop RMSDs as well as an ion-pair distance close to form T, but adopt the DFG-out conformation. Similar properties are also found in classes 9 and 16. For AI pathway (for selected data, see Table 2; for full data, see SI Table S2), structures in class 0 have good agreement with the structural properties of form I. Structures in class 3 exhibit properties similar to form A, typically with a good agreement to the DFG-in conformation, backbone RMSD, and ion-pair distance as well as a better correspondence to the A-loop RMSD. Similarly, some conformations distinguished from the crystal structures are available. Structures in class 1 have similar backbone RMSD and A-loop RMSD to form I while DFG RMSD and the distance of the ion pair are close to form T. Structures in class 9 adopt the DFG-out conformation, and RMSDs of the backbone and A-loop are closer to form I than to form A while the ion pair is likely to be in contact, which makes them look like the Imatinib-induced Abl structure.[67] Although this structure has been solved for some kinases, it has not been observed in EGFR structures experimentally determined. The representative structures selected from mentioned classes are shown in SI Figure S3.

Table 2

Structural Properties of Representative Classes from Bayesian Clustering and Their Comparisons to the Reference Crystal Structure

			av RMSD (Å)
	class no.	ref structure	backbone	A-loop	DFG	av distance of ion pair (Å)
ATI pathway	0	3W32 (T)	2.1	2.7	1.6	15.5
	2	4I20 (I)	2.6	2.1	1.5	15.0
	7	2ITW (A)	5.4	11.2	6.6	12.9
	11	2ITW (A)	2.4	2.8	1.9	8.5
	14	2ITW (A)	4.4	4.0	3.9	13.6
	31	2ITW (A)	4.2	5.6	6.0	13.9
AI pathway	0	4I20 (I)	2.6	2.2	1.5	16.0
	1	2ITW (A)	3.4	3.2	4.4	15.9
	3	2ITW (A)	2.3	3.3	1.8	8.4
	7	2ITW (A)	3.9	3.9	6.9	11.9
	9	2ITW (A)	4.0	4.1	6.2	9.7

Further examination of fluctuation of the structural properties shows that EGFR undergoes conformational transitions in some classes, though it is pretty stable in most of the classes (for the top 20 populated classes, see SI Figure S4). For example, in class 1 of ATI pathway the conformational change between DFG-out and DFG-in is clearly observable (Figure 3A). It is coupled with movement of the A-loop and alteration of the ion-pair distance. Similarly, structures in class 13 of AI pathway also exhibit conformational change of the DFG motif while the A-loop does not move far away from the extended state (Figure 3B). These kinds of changes in adopted structural properties occur more frequently in less-populated classes than in populated ones, suggesting transitions between intermediate states.

Figure 3

Evolution of structural properties in the class from Bayesian clustering: (A) Class 1 in ATI pathway; (B) class 13 in AI pathway. Four metrics are displayed: backbone RMSD, A-loop RMSD, DFG RMSD, and distance of the ion pair (Lys-Glu). The RMSD values are calculated with respect to the crystal structure of 2ITW (form A).

Two-Dimensional Free Energy Surfaces

To visualize the transition pathways and intermediate states, we tried to project the dihedral angles space into a two-dimensional FES. The principle component analysis was carried out, but the first two eigenvalues could only account for 13% of the variability of the data, which made them not a good choice. Then we turned to the metrics mentioned previously. Three metrics (DFG RMSD, A-loop RMSD, and distance of the ion pair) were employed for plotting of rough 2D FESs to investigate the relationship between DFG-flip and movement of the A-loop as well as displacement of the αC helix (Figure 4).

Figure 4

2D free energy surfaces for activation/inactivation of EGFR kinase domain. FESs for ATI pathway are shown in A and B; FESs for AI pathway are shown in C and D. Three metrics are employed for FES plotting: DFG RMSD, A-loop RMSD, and distance of the ion pair. The RMSD values are calculated with the crystal structure of 2ITW (form A) as the reference. Energy basins corresponding to forms A, T, and I are denoted. Intermediate states are labeled with prefix IS. Energy wells on FESs are compared to the classes from Bayesian clustering. For ATI pathway, three populated ensembles labeled as T, I, and A (Figure 4A,B) correspond to classes 0, 2, and 11 (Table 2 and SI Figure S3), displaying similar features with the crystal structures. IS1, IS2, and IS3 (Figure 4A) denote three intermediate states, corresponding to classes 7, 31, and 14 (Table 2), respectively. States A, T, and IS3 adopt the DFG-in conformation while I, IS1, and IS2 adopt the DFG-out conformation. For AI pathway, energy basins labeled as I and A (Figure 4C,D) correspond to classes 0 and 3, having properties similar to those of the crystal structures. IS4, IS5, and IS6 denote three different forms (Figure 4D), corresponding to classes 1, 9, and 7, respectively (Table 2). States I, IS5, and IS6 adopt the DFG-out conformation while A and IS4 adopt the DFG-in conformation. From FESs, the transition between EGFR active and inactive forms is apparently visible. In ATI pathway, DFG-flip occurs with little energy cost (Figure 4A,B). In contrast, DFG-flip happens less frequently in AI pathway, which is clearer in Figure 4D than in Figure 4C. Our results suggest that ATI pathway is more energetically favorable than AI pathway, though DFG-flip can happen in both pathways. Previous studies suggest that displacement of the αC helix[38,44] or A-loop movement[41] is the major energy barrier mediating the transition. Here, we further examine the role of A-loop and αC helix in DFG-flip. In both B and D of Figure 4, DFG-flip occurs when the ion pair is broken apart. When the ion pair is in contact, no transition of the DFG motif is observed, possibly because of sterical hindrance. This suggests that DFG-flip is coupled with disruption of the ion pair. On the other hand, in Figure 4C the A-loop fluctuates around the extended state, while, in Figure 4A, the A-loop floats away from the extended state and reaches the compact mode. DFG-flip happens more frequently in Figure 4A than in Figure 4C, suggesting that the transition is strongly influenced by movement of the A-loop. Thus, our results yield a view that breaking the ion pair is critically important for DFG-flip and movement of A-loop significantly facilitates the conformational change. This may explain why ATI pathway is more energetically preferred than AI pathway. Interestingly, except the predefined pathway (A-T-I), another transition pathway appears between form A and I (A-IS3-IS2-I in Figure 4A). In this pathway, the A-loop just moves slightly away from the extended state. We deduce that disruption of the ion pair plus a small movement of A-loop provides enough space for DFG-flip. If this is the case, considering that the A-loop displays high flexibility in kinase crystal structures and distinct A-loop conformations of EGFR have been recognized,[42] it is very likely that there exist multiple transition pathways between the active and inactive forms as well as various intermediate states. In addition, we notice that state IS5 appears in AI pathway (Figure 4D) but is not sampled in ATI pathway (Figure 4B). Examination of clustering results also shows that none of the classes in ATI pathway has structural properties similar to those of IS5. This may suggest that some conformation is not preferred for unbound EGFR. In ATI pathway, the inactive forms (T and I) and the conversion between them are energetically favorable (Figure 4B), and sampling is trapped in the energy wells. This agrees with a previous report that the isolated EGFR kinase domain in monomeric form has low catalytic activity.[68] In AI pathway, however, transition of the DFG motif is impeded because of constraint of the A-loop (Figure 4D), which may make state IS5 energetically favorable. Our results are consistent with the observation that the Imatinib-induced Abl structure (corresponding to IS5 in Figure 4D and class 9 in SI Figure S3B) is rarely found in unbound kinases but appears in kinase–ligand complexes. Possibly because this form is not energetically preferred and becomes stable under certain conditions such as ligand binding or constraint of the A-loop. Previously, conformational transitions of Src and Abl have been investigated with biased MD simulations and the free energy may be overestimated.[40] Compared to their data, the free energy calculated in this work may be underestimated. Interestingly, our results are close to the free energy computed for conformational changes of wide type and mutant EGFRs recently reported.[43] The difference in estimated free energy may result from distinct kinases, simulation methods or choice of order parameters. To improve performance of our approach, more seeding structures may be needed. But it is still a challenge to select appropriate order parameters.

Cracking of αC Helix

Studies have demonstrated that local protein unfolding (cracking) is common in conformational changes, and αC helix is disordered in EGFR conformations.[69−72] Based on previous reports, displacement of αC helix may be caused by outward rotation or local unfolding. Outward rotation of αC helix among kinase crystal structures has been discussed in detail.[35,44] Here we further examine variability of secondary structure of αC helix by calculating the time when residues of αC helix are in the helix status (SI Figure S5). In all MD trajectories three residues (residue nos. 764–766) are highly stable and keep in the helix status. In contrast, other residues to varying extent undergo the order–disorder transitions. As indicated in the previous section, disruption of the ion pair induced by displacement of the αC helix is necessary for DFG-flip. Our results suggest that partial unfolding of he αC helix may accompany the transition between EGFR active and inactive forms and facilitate DFG-flip.

Conclusion

We present a strategy combining MD simulations and Bayesian clustering to investigate conformational transitions between EGFR active and inactive forms. Compared to previous reports, our results verify that this approach is valid and efficient in sampling rare events. Moreover, with this approach, distinct transition pathways can be compared and the energy-favorable one can be distinguished. It is usually difficult to observe the conformational changes using constant-temperature simulations even on microsecond time scales. Running multiple MD simulations initiated on different conformational states may effectively cross the energy barrier and overcome the conformational sampling problem. With the help of a high parallel computing platform, this strategy is easily implemented. Bayesian clustering is then employed to recognize intermediate states and transitions along the pathway from the ensemble of MD trajectories. Although it is still a challenge to project the transition pathway from a high-dimensional space to a two-dimensional space, selection of appropriate order parameters is helpful for investigating the effect of local fluctuations on global conformational changes. Since the catalytic domain is highly conserved in all kinases and the three forms of EGFR (A, T, and I) have also been found in other distantly related kinases, the conformational changes between them may be a general mechanism of kinases activation and inactivation.[35] On the other hand, it is believed that each inactive conformation of kinase has some special features which distinguish it from other states. And kinase inactive forms have been the targets for development of high specific anticancer drugs in the pharmaceutical industry. Therefore, determination of unreported inactive conformations and transitions between them provides new targets for cancer treatment.

61 in total

Review 1. Steered molecular dynamics and mechanical functions of proteins.

Authors: B Isralewitz; M Gao; K Schulten
Journal: Curr Opin Struct Biol Date: 2001-04 Impact factor: 6.809

2. Nonlinear elasticity, proteinquakes, and the energy landscapes of functional transitions in proteins.

Authors: O Miyashita; J N Onuchic; P G Wolynes
Journal: Proc Natl Acad Sci U S A Date: 2003-10-17 Impact factor: 11.205

3. A point-charge force field for molecular mechanics simulations of proteins based on condensed-phase quantum mechanical calculations.

Authors: Yong Duan; Chun Wu; Shibasish Chowdhury; Mathew C Lee; Guoming Xiong; Wei Zhang; Rong Yang; Piotr Cieplak; Ray Luo; Taisung Lee; James Caldwell; Junmei Wang; Peter Kollman
Journal: J Comput Chem Date: 2003-12 Impact factor: 3.376

4. Escaping free-energy minima.

Authors: Alessandro Laio; Michele Parrinello
Journal: Proc Natl Acad Sci U S A Date: 2002-09-23 Impact factor: 11.205

5. Accelerated molecular dynamics: a promising and efficient simulation method for biomolecules.

Authors: Donald Hamelberg; John Mongan; J Andrew McCammon
Journal: J Chem Phys Date: 2004-06-22 Impact factor: 3.488

6. Computing time scales from reaction coordinates by milestoning.

Authors: Anton K Faradjian; Ron Elber
Journal: J Chem Phys Date: 2004-06-15 Impact factor: 3.488

7. Reaction coordinates and rates from transition paths.

Authors: Robert B Best; Gerhard Hummer
Journal: Proc Natl Acad Sci U S A Date: 2005-04-06 Impact factor: 11.205

8. The Amber biomolecular simulation programs.

Authors: David A Case; Thomas E Cheatham; Tom Darden; Holger Gohlke; Ray Luo; Kenneth M Merz; Alexey Onufriev; Carlos Simmerling; Bing Wang; Robert J Woods
Journal: J Comput Chem Date: 2005-12 Impact factor: 3.376

9. Structural mechanism for STI-571 inhibition of abelson tyrosine kinase.

Authors: T Schindler; W Bornmann; P Pellicena; W T Miller; B Clarkson; J Kuriyan
Journal: Science Date: 2000-09-15 Impact factor: 47.728

10. Exploring protein native states and large-scale conformational changes with a modified generalized born model.

Authors: Alexey Onufriev; Donald Bashford; David A Case
Journal: Proteins Date: 2004-05-01

13 in total

1. Statistical analysis of EGFR structures' performance in virtual screening.

Authors: Yan Li; Xiang Li; Zigang Dong
Journal: J Comput Aided Mol Des Date: 2015-10-17 Impact factor: 3.686

2. Cancer-associated arginine-to-histidine mutations confer a gain in pH sensing to mutant proteins.

Authors: Katharine A White; Diego Garrido Ruiz; Zachary A Szpiech; Nicolas B Strauli; Ryan D Hernandez; Matthew P Jacobson; Diane L Barber
Journal: Sci Signal Date: 2017-09-05 Impact factor: 8.192

3. Structural Characterization of the Aurora Kinase B "DFG-flip" Using Metadynamics.

Authors: Naga Rajiv Lakkaniga; Meenakshisundaram Balasubramaniam; Shuxing Zhang; Brendan Frett; Hong-Yu Li
Journal: AAPS J Date: 2019-12-18 Impact factor: 4.009

4. In silico studies of interactions of peptide-conjugated cholesterol metabolites and betulinic acid with EGFR, LDR, and N-terminal fragment of CCKA receptors.

Authors: Madeline M Bashant; Saige M Mitchell; Lucy R Hart; Charlotta G Lebedenko; Ipsita A Banerjee
Journal: J Mol Model Date: 2021-12-28 Impact factor: 1.810

5. Redefining the Protein Kinase Conformational Space with Machine Learning.

Authors: Peter Man-Un Ung; Rayees Rahman; Avner Schlessinger
Journal: Cell Chem Biol Date: 2018-05-31 Impact factor: 8.116

6. Altered conformational landscape and dimerization dependency underpins the activation of EGFR by αC-β4 loop insertion mutations.

Authors: Zheng Ruan; Natarajan Kannan
Journal: Proc Natl Acad Sci U S A Date: 2018-08-13 Impact factor: 11.205

7. Exploration of gated ligand binding recognizes an allosteric site for blocking FABP4-protein interaction.

Authors: Yan Li; Xiang Li; Zigang Dong
Journal: Phys Chem Chem Phys Date: 2015-12-28 Impact factor: 3.676

8. Dynamic, structural and thermodynamic basis of insulin-like growth factor 1 kinase allostery mediated by activation loop phosphorylation.

Authors: Yaozong Li; Kwangho Nam
Journal: Chem Sci Date: 2017-03-20 Impact factor: 9.825

9. Structural Features and Ligand Selectivity for 10 Intermediates in the Activation Process of β₂-Adrenergic Receptor.

Authors: Tao Liang; Yuan Yuan; Ran Wang; Yanzhi Guo; Menglong Li; Xuemei Pu; Chuan Li
Journal: ACS Omega Date: 2017-12-01

10. Conformational landscape of the epidermal growth factor receptor kinase reveals a mutant specific allosteric pocket.

Authors: Srinivasaraghavan Kannan; Gireedhar Venkatachalam; Hong Hwa Lim; Uttam Surana; Chandra Verma
Journal: Chem Sci Date: 2018-05-16 Impact factor: 9.825