Literature DB >> 25420233

DFGmodel: predicting protein kinase structures in inactive states for structure-based discovery of type-II inhibitors.

Abstract

Protein kinases exist in equilibrium of active and inactive states, in which the aspartate-phenylalanine-glycine motif in the catalytic domain undergoes conformational changes that are required for function. Drugs targeting protein kinases typically bind the primary ATP-binding site of an active state (type-I inhibitors) or utilize an allosteric pocket adjacent to the ATP-binding site in the inactive state (type-II inhibitors). Limited crystallographic data of protein kinases in the inactive state hampers the application of rational drug discovery methods for developing type-II inhibitors. Here, we present a computational approach to generate structural models of protein kinases in the inactive conformation. We first perform a comprehensive analysis of all protein kinase structures deposited in the Protein Data Bank. We then develop DFGmodel, a method that takes either a known structure of a kinase in the active conformation or a sequence of a kinase without a structure, to generate kinase models in the inactive conformation. Evaluation of DFGmodel's performance using various measures indicates that the inactive kinase models are accurate, exhibiting RMSD of 1.5 Å or lower. The kinase models also accurately distinguish type-II kinase inhibitors from likely nonbinders (AUC > 0.70), suggesting that they are useful for virtual screening. Finally, we demonstrate the applicability of our approach with three case studies. For example, the models are able to capture inhibitors with unintended off-target activity. Our computational approach provides a structural framework for chemical biologists to characterize kinases in the inactive state and to explore new chemical spaces with structure-based drug design.

Entities: Chemical Disease Gene Mutation Species

Mesh：

Substances：

Year: 2014 PMID： 25420233 PMCID： PMC4301084 DOI： 10.1021/cb500696t

Source DB: PubMed Journal: ACS Chem Biol ISSN： 1554-8929 Impact factor: 5.100

The protein kinase family is one of the largest protein families in human, comprising 518 different kinases that function as on/off switches in cellular signaling pathways and modulate almost all basic cellular activities.[1,2] Malfunctions in protein kinases are associated with various diseases,[3] such as cancer,[4] and autoimmune disorders.[5] Therefore, protein kinases are one of the most pursued targets for drug development.[6,7] In fact, 25 kinase drugs have already been approved by the FDA, and many other potential kinase drugs are currently in clinical trials.[8,9] All protein kinases share similar structural fold.[10] This includes a catalytic domain comprising a smaller N-terminal subdomain (N-lobe) and a large C-terminal subdomain (C-lobe), in which the cleft between the two lobes forms the ATP-binding site. The N-lobe is composed of a β-sheet and a long α-helix (αC-helix), whereas the C-lobe is predominantly α-helical. The two lobes are connected by the hinge region, in which the N-lobe can adopt a wide range of positions relative to the C-lobe.[11] Situated in the C-lobe, the Asp-Phe-Gly (DFG) motif is a highly conserved motif that forms part of the ATP-binding site and coordinates magnesium binding. Immediately following the DFG-motif is the activation loop (A-loop), a stretch of 20–30 residues, which serves as the regulator of kinase activities.[12,13] Protein kinases are highly dynamic. The N-lobe, αC-helix, hinge region, and A-loop can undergo a wide range of movement and adopt multiple conformations, such as DFG-flip and rotation of αC-helix, that define catalytic activity.[14−18] Particularly, the DFG-motif of protein kinases adopts two major conformations, including the DFG-in and DFG-out conformations, which are thought to determine active or inactive states, respectively, as well as various intermediate conformations. In the active state or DFG-in conformation, the DFG-Phe is packed into a hydrophobic pocket, the DFG-pocket, between the N- and C-lobes, and stabilizes this active conformation through interactions with hydrophobic residues in that region.[19] In this conformation, the ATP-binding site is well-defined; the DFG-Asp faces outward to coordinate a magnesium ion for ATP binding, whereas the A-loop moves away from the ATP-binding site and forms a β-hairpin for substrate binding. Currently, there are 16 FDA-approved drugs that target the ATP-binding site in this conformation to competitively inhibit ATP binding (type-I kinase inhibitors, e.g., vandetanib).[9] Conversely, in the inactive state or DFG-out conformation, the DFG-Asp flips, and the DFG-Phe moves out of the DFG-pocket (Figure 1A), which can adopt a range of conformations (Figure S1). This DFG-flip induces movement in the A-loop, thereby deforming and obstructing part of the ATP-binding site. The unoccupied DFG-pocket joins with the deformed ATP-binding site and becomes accessible for ligand binding. Drugs that target the DFG-out conformation (type-II kinase inhibitors), such as the cancer drug imitanib (Gleevec) and sorafenib (Nexavar), have been recently developed, and current effort has focused on the design of this class of inhibitors.[9,20] Notably, structural data of kinases with the DFG-out conformation remains scarce, making it difficult to discover unique type-II inhibitor scaffolds for key drug targets such as mTOR and RET with structure-based drug design methods. Thus, several approaches have been developed to model kinase domains in the DFG-out conformation. For example, Kufareva and Abagyan deleted the DFG-motif in a DFG-in kinase structure and replaced it with an attractive potential density.[21] Xu et al. modeled the DFG-out conformation by building the A-loop ab initio and rotating the N-lobe by a fixed amount of degree to account for the conformational changes.[22] However, the extent of the N-lobe rotation can be highly variable and is still unknown for some kinases (e.g., mTOR). Furthermore, the modeling of the roughly 25-residue A-loop is time-exhaustive and subject to large errors.[23]

Figure 1

Here, we introduce a method to generate structural models of the kinase domain in DFG-out conformations using homology modeling. We first perform a comprehensive analysis of all experimentally determined kinase structures in complex with small molecule ligands. We then describe the development of DFGmodel, a method that constructs DFG-out kinase models based on multiple structures of related kinases, and evaluate the method’s performance with a variety of measures (e.g., statistical potentials). Finally, we illustrate the utility of the approach by describing three case studies of relevant targets.

Results and Discussion

The DFG-Motif Is Classified into Three Conformational States

The DFG-motif is highly conserved in both sequence and structure among the majority of protein kinases, including various atypical kinases (e.g., mTOR, RNaseR), and its conformation determines the shape of the ATP-binding site (Figure 1). We first grouped all 1950 crystal structures of the human protein kinases from the Protein Data Bank[24] (PDB) into three conformational states, DFG-in, DFG-out, and intermediate, by calculating the directional vectors of the residues in the DFG-motif and comparing them to a reference DFG-in kinase structure (Figure S2; Methods). In brief, DFG-in and DFG-out conformations have both directional vectors in roughly opposite directions, and they usually represent ligand-bound conformations; intermediate structures are those with directional vectors that do not match the criteria for DFG-in and DFG-out conformations and are usually apo or have the ligand bound with an irregular pose that distorts the DFG-motif. For example, the DFG-motif of P38α co-crystallized with dibenzoxepinone (PDG: 4L8M) adopts an intermediate conformation in which the DFG-Asp side chain directly interacts with the ligand.[17] Structures in the DFG-in, DFG-out, and intermediate conformations account for 75.6, 7.3, and 17.1% of all kinase structures, respectively. Moreover, the fraction of crystal structures in the DFG-out conformation varies between serine/threonine (S/T) kinases and tyrosine (Y) kinases (Figure 2A). This highlights the limited structural information on DFG-out structures that is needed for structure-based drug design of type-II inhibitors.

Figure 2

Structural features of the human kinome. (A) Fractions of human kinase with known structures in the DFG-in, DFG-out, and intermediate conformations. (B) Kinases with DFG-out conformation in the human kinome. Kinases with an experimentally determined structure in the DFG-out conformation are marked with a red circle. Compositions of key amino acids at the (C) gatekeeper and (D) (D – 1) positions. The majority of these amino acids are equally abundant in kinases with a DFG-out crystal structure and those with a DFG-in crystal structure only.

DFG-motif flips between two conformations. (A) Type-II inhibitor sorafenib (cyan sticks) is not compatible with the DFG-in conformation because it overlaps with the phenylalanine residue of the DFG-motif (red stick). (B) The DFG-motif and the residues preceding it are structurally conserved across most kinases, with the Phe side chain (red line) pointing away from the protein core. Type-II inhibitors (gray spheres) occupy the ATP-binding site and the DFG-pocket. The αC-helix of the N-lobe (colored ribbons) can adopt a wide range of conformations, thereby shifting the position of the N-lobe (white ribbons) relative to the C-lobe. Structural features of the human kinome. (A) Fractions of human kinase with known structures in the DFG-in, DFG-out, and intermediate conformations. (B) Kinases with DFG-out conformation in the human kinome. Kinases with an experimentally determined structure in the DFG-out conformation are marked with a red circle. Compositions of key amino acids at the (C) gatekeeper and (D) (D – 1) positions. The majority of these amino acids are equally abundant in kinases with a DFG-out crystal structure and those with a DFG-in crystal structure only.

DFG-Out Conformation Is Highly Conserved

We constructed a multiple structure alignment of all protein kinase structures (Figure 1B). The DFG-out structures are generally less compact than the DFG-in structures. For example, the N-lobe of DFG-out structure adopts a relatively wide range of conformations, as shown by the range of αC-helix positions (Figure 1B). The alignment indicates that the β-hairpin preceding the DFG-motif is structurally conserved, even in atypical kinases, and the majority of the DFG-out structures have similar DFG-motif conformation. Specifically, the Asp and Phe of the DFG-motif, which we defined as the (D) and (D + 1) residues (Methods), respectively, are flipped by approximately 180° in relation to DFG-in structures, whereas the side chain of the (D + 1) residue points away from the hydrophobic DFG-pocket (Figure 1B). The conformation of the DFG-motif and the overall position of the A-loop may be correlated, although it may not be the determinant factor of A-loop secondary structure (Figure S3). Because the A-loop does not take part in forming the DFG-out binding site, the conformation of the A-loop may not be critical for type-II ligand binding. Furthermore, the DFG-flip does not change the overall structure of the C-lobe. This analysis provides a structural framework to generate DFG-out models for kinases with no known DFG-out structures. Particularly, to account for the conformation changes in the N-lobe upon the DFG-flip, we hypothesize that a diverse set of kinase structures in DFG-out conformations with distinct N-lobe conformations, especially the αC-helix and glycine-rich loop, could be used as templates for modeling.

DFG-Out Conformation Is Not Restricted to Only a Few Protein Kinases or Families

Overall, we identified 189 experimentally determined DFG-out structures, which cover the majority of human kinome (Figure 2A,B). This suggests that the DFG-out conformation is not restricted to only a few protein kinases or families, in agreement with previous analysis.[9] However, these structures represent only 43 unique proteins, less than 9% of the kinome. In particular, P38α (44) and BRAF (12) structures constitute the majority of S/T-kinase structures, whereas KDR (25) is overrepresented among the Y-kinase structures (Figure S4). This nonuniform distribution is directly related to the current research focus that has mostly centered on a very limited number of targets for therapeutic purposes, such as P38α and KDR, and may not necessarily be because a protein kinase is incapable of adopting the DFG-out conformation that can be targeted by type-II inhibitors. Next, we analyzed the amino acid composition at two key positions: the gatekeeper residue, which resides in the kinase hinge region of the binding site, and the residue preceding the DFG-motif, i.e., the (D – 1) position (Figure S5).[25] These two residues form the bottleneck of the channel that connects the ATP-binding site to the DFG-pocket in the DFG-out conformation. The width of the channel can be approximated by the sum of the size of these residues. We analyzed previously published kinase inhibition profile data[26] and observed that type-II inhibitors inhibit kinases with varying channel widths, including those with narrow channels that have large residues at the gatekeeper/(D – 1) positions (Table S1). For example, P38α has a narrow bottleneck to the DFG-pocket. Interestingly, 7 of 14 previously tested type-II inhibitors significantly inhibited P38α activity (Table S1). This suggests that type-II inhibitors can be designed for kinases with a narrow or wide channel, in agreement with previous analysis.[9] For the gatekeeper residue, it was previously proposed that this residue limits the accessibility of the DFG-pocket, where a small residue (e.g., Thr) is preferred by type-II inhibitors and larger residues (e.g., Met, and Phe) may block access to the DFG-pocket.[21] In contrast, we find that larger amino acids, such as Met and Phe, are as common in kinases with DFG-out structures as in those kinases that have not yet been determined to be in the DFG-out conformation (Figure 2C), in agreement with Zhao et al.[9] For the (D – 1) residue, the small amino acids Gly, Ala, Ser, Thr, and Cys account for 82% of the amino acids found at this position in the human kinome (Figure 2D).[27] The differences for Gly, Leu, and Ile at this position are statistically insignificant due to the low number of available DFG-out crystal structures (Table S2). Similar to the gatekeeper residue, we do not observe a significant difference in the amino acid composition between kinases with and without known DFG-out structure. These data suggest that the identity of the gatekeeper and (D – 1) residues may not be the only factor that determine a kinase’s susceptibility to type-II kinase inhibitors. Importantly, given that these residues have similar composition in kinases with and without a known DFG-out conformation, it may imply that most protein kinases have the intrinsic ability to access the DFG-out conformation. Other factors may contribute to a kinase’s susceptibility to type-II kinase inhibitors, including the level of phosphorylation in the A-loop,[12] the presence of a binding partner,[28] or the state of the domain that modulates the conformational state of the kinase,[15] which may influence the conformation of the DFG-motif.

DFG-Out Conformations Can Be Modeled Accurately through Homology Modeling

In homology modeling, a target protein is modeled based on one or more related experimentally determined protein structures (i.e., templates). Due to the limited structural data of noncatalytic domains of protein kinases (Figure S6), here we focus on the catalytic domain of the kinase and do not model the noncatalytic domains. Furthermore, the models are based on ligand-bound kinase structures and are therefore expected to capture biologically relevant kinase conformations, similar to the crystal structures. In DFG-out structures, the relative position of the N-lobe to the C-lobe determines the conformation of the binding site, which is typically composed of the DFG-motif, the αC-helix, and the glycine-rich loop (Figure 1B). Modeling DFG-out structures using a single template may not optimally represent the ensemble of conformations of the N/C-lobes.[29] Therefore, to capture an “average” DFG-out conformation, we developed an approach, DFGmodel, to generate DFG-out models for the target kinase based on homology modeling of multiple template structures (Figure 3; Methods). We selected 18 representative structures with large variation in the relative position of the N/C-lobes, while avoiding overrepresentation of any kinase (Figure S4), as templates for modeling (Table S3).

Figure 3

Workflow of DFGmodel.

Workflow of DFGmodel. We evaluated the quality of the multi-template-based models generated by DFGmodel (Figure 4) as well as single-template-based models from MODBASE,[30] a database of automatically generated homology models. Particularly, we used three different measures: normalized discrete optimized protein energy (Z-DOPE),[31] template modeling score (TM-Score),[32] and root-mean-square distance (RMSD). Z-DOPE is a normalized atomic distance-dependent statistical potential based on known protein structure and is used to assess the overall quality of the homology models. A Z-DOPE score below −1.0 indicates high structural overlap to the native structure.[33] As expected, crystal structures obtain the best Z-DOPE scores (average score of −1.6 or lower). DFGmodel models perform better than MODBASE models, exhibiting Z-DOPE scores of −1.33 and −1.08, respectively (Tables 1 and S4).

Figure 4

Table 1

DFG-Out Model Assessment

	Z-DOPEa
	mean	median	SD
DFG-in crystal structures	–1.66	–1.63	0.17
DFG-out crystal structures	–1.61	–1.64	0.18
DFGmodel models	–1.33	–1.31	0.24
MODBASE models	–1.08	–1.17	0.45

Z-DOPE is the score based on a normalized atomic distance-dependent statistical potential based on experimentally determined structures.

TM-Score is the template modeling score, which assesses the topological similarity of two protein structures.

RMSD marks the root-mean-square deviation between two structures.

DFGmodel models of P38α. Fifty models (gray ribbon) are shown superposed onto the C-lobe of a (A) DFG-in crystal structure (PDB: 2LGC) and a (B) DFG-out crystal structure (PDB: 4A9Y). (C, D) DFG-flip moves F169 (blue arrow) in the models by 11 Å. Centroid of the models’ N-lobe is shifted, displacing the conserved K53 (orange arrow) by 3.0 Å and E71 (violet arrow) on αC-helix by 1.0 Å. The DFGmodel models are structurally more similar to known DFG-out structures than to DFG-in structures. Z-DOPE is the score based on a normalized atomic distance-dependent statistical potential based on experimentally determined structures. TM-Score is the template modeling score, which assesses the topological similarity of two protein structures. RMSD marks the root-mean-square deviation between two structures. In addition, we used TM-Score and RMSD to directly evaluate the accuracy of the models. TM-Score and RMSD are complementary measures, where TM-Score is more sensitive to the global topology than local variations. In particular, differences among distinct crystal structures of the same protein (crystal-to-crystal comparison) are set as an upper boundary for structure prediction (Tables 1, S5, and S6). Crystal-to-crystal comparisons of the N-lobe and C-lobe, separately, indicate that the lobes are structurally similar, with RMSD values generally lower than 0.9 Å. The full catalytic domain has slightly higher RMSD (1.11 Å) despite the identical sequence, suggesting that the N-lobe can adopt a range of positions relative to the C-lobe in DFG-out crystal structures, supporting our approach to use multiple template structures to represent the rigid movement of the N/C-lobes. Applying the same structural analysis to our DFG-out models, both DFGmodel and MODBASE models have an average TM-Score above 0.85, which indicates a high probability of the same topology and fold,[34] with models generated by DFGmodel score being slightly better than MODBASE models (Table 1). The DFGmodel models also have lower full domain RMSD (1.49 Å) than MODBASE models (1.79 Å) (Tables 1, S7, and S8). Notably, the RMSDs of the model-to-crystal comparison are only marginally higher than those of the crystal-to-crystal comparison. At the subdomain level, the DFGmodel models have RMSDs slightly higher than the corresponding RMSDs in crystal-to-crystal comparison. For example, the RMSD of the N-lobe is usually slightly higher (1.42 Å), probably due to the intrinsically flexible loops that frequently found in this subdomain. In summary, DFGmodel models are more accurate than MODBASE models, likely because they capture the rigid body movement of the lobes observed in the DFG-out structures.

DFG-Out Models Distinguish Type-II Inhibitors from Other Molecules

Because Z-DOPE, TM-Score, and RMSD assess the overall quality of the structure, they are not optimal for selecting the best models for virtual screening.[35] Therefore, to evaluate whether the DFG-out models are suitable for virtual screening, we examined how well the models distinguish between type-II kinase inhibitors and non-type-II inhibitors (e.g., type-I inhibitors). Our test set included S/T- and Y-kinases representing distinct branches of the kinome tree (Table 2). To examine the potential utility of the DFGmodel-generated models for structure-based studies such as structure-based drug design[36] and virtual screening,[37] models with larger binding site volume were selected for enrichment evaluation with a data set of type-II ligands (Methods). Specifically, we used the enrichment plot to derive the area under the curve (AUC) and the logarithmic scale of the enrichment plot (logAUC) values that evaluate the enrichment of type-II inhibitors.[35,38] Three case studies are presented: first with kinases with DFG-out structures to demonstrate that our models are comparable to known DFG-out crystal structures, followed by two case studies with kinases that do not have a DFG-out crystal structure to illustrate the potential real-world applicability of DFGmodel.

Table 2

Docking Performance

		Case 1: Kinases with DFG-Out structure
		DFG-out crystal structure			DFGmodel models
	ligandc	PDB	AUCd	logAUCe	PDB	AUC	logAUC
KDRb	34	2RL5	76.4	38.9	3CJG	77.6	33.6
P38αa	24	2BAJ	79.5	37.6	1OUY	82.0	41.4
ABL1b	20	3CS9	73.8	25.2	2F4J	75.7	31.9
BRAFa	19	4FC0	90.6	44.6	2FB8	87.7	39.1
LCKb	18	2OG8	84.7	45.1	3LCK	85.0	44.2
SRCb	16	2OIQ	62.4	19.4	2BDJ	76.8	39.5
KITb	14	1T46	71.3	26.9	1PKG	82.4	39.0
EPHA3b	6	3DZQ	48.1	21.9	4G2F	94.3	45.4
CDK8a	5	4F7J	84.2	55.1	4G6L	95.0	47.7
JNK2a	4	3NPC	79.7	30.8	3E7O	89.5	36.0
median			78.0	34.2		83.7	39.3
SD			12.3	11.6		7.0	5.1

		Case 2: Kinases with No DFG-Out Structure
RETb	14				2X2L	82.8	36.2
S6K1a	6				4L3J	92.3	39.0

		Case 3: Kinase with No Crystal Structure
PDFGRαb	14				n/a	82.3	33.4

Serine/Threonine kinase.

Tyrosine kinase.

Ligand marks the number of ligands annotated as type-II kinase inhibitor (Methods).

AUC is the area under the curve of the enrichment plot (Methods).

logAUC marks the logarithmic scale of the enrichment plot (Methods).

Serine/Threonine kinase. Tyrosine kinase. Ligand marks the number of ligands annotated as type-II kinase inhibitor (Methods). AUC is the area under the curve of the enrichment plot (Methods). logAUC marks the logarithmic scale of the enrichment plot (Methods).

Case Study 1: Kinases with DFG-Out Structures

We evaluated 10 protein kinases with crystal structures in DFG-in and in DFG-out conformations (Table 2). For each protein, DFGmodel was used to generate DFG-out models. We combined the docking results of these models into a consensus prediction and compared the enrichment of the consensus prediction to that of the known DFG-out crystal structure. Consensus of DFGmodel models enrich better than the experimentally determined DFG-out structures, for which the DFGmodel models obtain a slightly higher AUC median value than that of the DFG-out crystal structure (83.7 vs 78.6) (Table 2 and Figure 5). In addition, consensus DFG-out models perform favorably at retrieving hits early in the screening. For instance, the logAUC value,[35] a measurement for early enrichments, indicates that the consensus DFGmodel result matches the performance of DFG-out crystal structures, with median values of 39.3 to 34.2, respectively. These results suggest that the consensus DFGmodel models generally perform as well as DFG-out crystal structure and, in some cases, outperform crystal structure in identifying correct type-II kinase inhibitors. The reason for this performance difference may be due to the fact that crystal structure, which is co-crystallized with a unique inhibitor, may have subtle differences in the binding site that is optimized to enrich a different set of type-II inhibitors.[21] The use of consensus docking results from different models that were generated based on diverse DFG-out template structures overcomes the issue of a preoptimized binding pocket. Compared to using multiple DFG-out crystal structures for docking enrichment, which may be limited by the few crystal structures available, e.g., EPHA3 has only one DFG-out crystal structure, DFGmodel inherently generates an ensemble of models and thus we are not limited by the availability of a single DFG-out conformation; we can perform consensus docking against multiple conformations that typically tend to improve enrichment.[35]

Figure 5

Enrichment of the consensus DFGmodel models: (A) BRAF, (B) P38α, and (C) S6K1 are S/T-kinases; (D) EPHA3, (E) KIT, and (F) RET are Y-kinases. Blue dotted line represents random selection ligand from a database of ligands and nonbinders. The inserted enrichment plot is the performance of a corresponding DFG-out crystal structure. RET and S6K1 do not have a DFG-out structure for comparison. DFGmodel models perform as well as (e.g., BRAF) or better than the DFG-out crystal structures. Notably, the docking poses from the consensus DFG-out models are similar to poses of the ligands in DFG-out crystal structures (Figure 6). For example, the docked pose of the type-II inhibitors are comparable to the crystallographic poses seen in the corresponding DFG-out crystal structures (Figures 6A–C). Interestingly, in the majority of our test set of kinases without a known structure in complex with any type-II inhibitor, the predicted docked pose of the ligand is similar to the observed pose in its corresponding crystal structure (Figures 6D–F). For example, ligand 19B was originally designed for P38α (top 0.6%; ranked 9 among 1463 ligands) and was found to inhibit BRAF in a panel of kinase assay.[39] The DFG-out model identified 19B as one of the top hits (top 2%; ranked 27), with a docked pose similar to the crystallographic pose in P38α (Figure 6E). Another example is ligand B1E, a BRAF(V600E) inhibitor by design (top 0.5%; ranked 6) that was found to inhibit SRC (top 0.8%; ranked 10) (Figure 6D).[40]

Figure 6

Docked and crystal poses of type-II kinase inhibitors: (A) 1N8, (B) PD5, (C) B96, (D) B1E, (E) 19B, (F) AQB, (G) PD3, (H) PD5, and (I) BAX. Docked ligand poses of the DFGmodel models are compared to the poses derived from the corresponding crystal structures. Kinase models are shown as gray cartoons; crystal poses and docked poses are shown as sticks in green and various colors, respectively. DFG-Phe and the conserved Glu in αC-helix are depicted as gray sticks. The PDB name of the ligand is depicted next to the name of the kinase.

Case Study 2: Kinase with No DFG-Out Structure

To demonstrate the applicability of DFGmodel, we modeled two kinases that have not been co-crystallized with type-II kinase inhibitors in the DFG-out conformation before, including S6K1, an S/T-kinase, and RET, a Y-kinase. We chose these two kinases because they have been reported to be targets for a series of type-II inhibitors with well-studied polypharmacology profiles.[41] The DFG-out models of S6K1 and RET have average Z-DOPE scores of −0.93 and −1.55, respectively, suggesting that these models are accurate. Furthermore, enrichment calculations of these models suggest that they can accurately distinguish novel type-II inhibitors from type-I inhibitors. For example, DFG-out models of S6K1 and RET have AUC values of 92.3 and 82.8, respectively (Table 2 and Figure 5C,F). Importantly, both models have good early detection performance, with logAUC values of 39.0 and 36.2, respectively. Although crystallographic data of S6K1 and RET in the DFG-out conformation is unavailable, the docking results suggest that our consensus DFGmodel models capture binding site properties that are important for virtual screening. Finally, we compared the predicted structures of S6K1 and RET in complex with their inhibitors PD3 and PD5[41] to the crystal structures of a different kinase, SRC, bound to these compounds (Figure 6G,H).[42] Ligands PD3 and PD5 are broad-spectrum type-II kinase inhibitors that show activity against both S6K1 and RET. Both of these compounds are identified as top hits (top 5%) in our consensus DFG-out models for S6K1 and RET, and their docked poses are highly similar to the crystallographic poses, further increasing our confidence in the models and approach.[42]

Case Study 3: Kinase with No Structure

Platelet-derived growth factor receptors (PDGFRs) are membrane proteins responsible for regulation of cell growth and division. Of the two human PDGFR subtypes, α and β, abnormality in PDGFRα is associated with glioblastoma multiforme (GBM),[43] suggesting that PDGFRα is a drug target for GBM treatment. However, currently, there is no known structure of PDGFRα. Thus, we used DFGmodel to model PDGFRα in the DFG-out conformation. The PDGFRα DFG-out models scores highly using various measures, including the statistical potential Z-DOPE (−1.21) as well as the enrichment values AUC (0.82) and logAUC (33.4), which suggest that the models are sufficiently accurate for productive virtual screening (Table 2). Although we do not have crystallographic data to support the binding modes of the known PDGFRα inhibitors, the docked pose of these type-II ligands are similar to the crystallographic binding pose identified in other known crystal structures. For example, sorafenib is known to inhibit PDGFRα. It was identified as one of the top hits to our consensus DFGmodel of PDGFRα (top 0.5%) and has a docked pose almost identical to the crystallographic pose of sorafenib observed in KDR, a related kinase (Figure 6I).

Conclusions

The limited number of structures of protein kinases in their inactive conformation often hampers the design of novel ligands against key kinase targets. We performed an analysis of the structures and sequences of the human protein kinome. Three key results emerge from this study. First, protein kinases adopt similar inactive DFG-out conformation and use similar amino acid types to regulate interactions with type-II inhibitors. Second, we developed DFGmodel, a method based on homology modeling that utilizes a diverse set of DFG-out template structures to generate kinase models in the inactive, DFG-out conformation (Figure 3). The models generated by DFGmodel are accurate, using various assessment measures, such as RMSD and TM-Score (Table 1). This suggests that our approach provides a framework for modeling kinase structures in conformations relevant for drug discovery. Third, the performance of the models in distinguishing known type-II inhibitors from other small molecules is comparable to or exceeds that of the DFG-out crystal structures (Table 2). Thus, DFGmodel is useful for virtual screening to identify novel kinase inhibitors and allows us to rationalize off-target effects of some type-II kinase inhibitors. The results presented in this study provide a structural basis for using homology modeling to characterize kinases in the DFG-out conformation for ligand discovery.

Methods

Alignment of Kinase Sequences and Structures

We used keyword search to obtain structures that are associated with human protein kinases from the PDB (total of 5484 chains).[24] Because a typical kinase catalytic domain’s length is 220–300 residues, protein sequences with fewer than 200 residues (e.g., cyclin A, kinase fragments) were removed. The remaining sequences were aligned using T-Coffee/Expresso[44,45] and visualized in Jalview[46] v2.8.1. Sequences without the highly conserved glycine-rich loop and DFG-motif were discarded. For crystal structures that include multiple chains of the same kinase, the first chain in the structure was used. We identified a total of 3247 kinase chains that correspond to 2551 crystal structures (1924 are S/T-kinases and 627 are Y-kinases). Protein kinase structures were processed with BioPython.[47,48] All S/T- and Y-kinases were structurally aligned to the C-lobe, excluding the A-loop, of the template protein kinase PKA (1ATP) and SRC (2BDF), respectively, with PyMOL.[49] Kinases that are divergent in structures (RMSD > 5.0 Å), including atypical kinases such as pyruvate dehydrogenase kinases, RNaseR, mTOR, and PI3Ks, were not included in the analysis.

DFG-Motif Conformation Classification

Residues in or adjacent to the DFG-motif are named based on their relative positions to the Asp (D) of the DFG-motif. For example, Gly of the DFG-motif is at the (D + 2) position. DFG-motif conformation (i.e., DFG-in, DFG-out, or intermediate) is partially dictated by the dihedral angles of residues at the (D – 1) and (D) positions, which influence the directions of the side chains of the residues at the (D) and (D + 1) positions. A major difference between DFG-in and DFG-out conformations is the directional flip of the residues at the (D) and (D + 1) positions (Figure 1A). This directionality change can be quantified and compared using a vector-based methodwhere r1, r2, r3, and r4 are the atomic coordinates of (D):Cγ, (D):Cα, (D + 1):Cα, and (D + 1):Cγ, respectively, whereas the cross products D and D+1 define the direction of the residues at (D) and (D + 1) (Figure S2). The vectors D and D+1 are compared to the corresponding vectors found in the reference DFG-in structure, 1ATP, and various structures with well-defined DFG conformations, to derive the following conditions. The model has a DFG-in conformation if D·Dref > −0.005 and D+1·D+1ref > 0.0. If D·Dref < −0.125 and D+1·D+1ref < −0.05, then it has a DFG-out conformation. Those that do not fall into any of the categories have an intermediate conformation. The kinome tree diagram is generated with Kinome Render.[50]

DFGmodel

Models of the target kinases in DFG-out conformation were generated using MODELLER[51] v9.12 based on 18 DFG-out kinases structures (Table S3). The entire N-lobe and the DFG-motif (from the (D – 2) to (D + 2) position) were modeled on the basis of the selected template structures, whereas the rest of the C-lobe would remain the same as the input structure. A-loop residues beyond (D + 2) were excluded from modeling because they are not in close proximity to the type-II inhibitor-binding site and are often disordered, adopting an ensemble of conformations (Figure S3). For each target, 50 initial models were generated and subsequently refined with two cycles of optimization, which undergoes 300 iterations of conjugate gradients using a variable-target function method and molecular dynamics with simulated annealing. Because, in some models, the binding site was blocked by rotamers of a residue in close proximity to the binding site, we excluded models with a particularly small volume by selecting 10 models with the largest calculated binding site volume as the docking receptors. The volume of a DFG-out model’s binding site was calculated using POVME[52] v2.0. A sphere of inclusive volume and a sphere of exclusive volume defined the binding pocket for grid-point calculation. A grid space of 0.75 Å and a receptor–atom distance cutoff of 1.50 Å were used. Volume outside of the receptor’s convex hull was excluded. Small isolated volumes were considered contiguous with the primary pocket if they share more than eight neighboring points in common.

Model Assessment

The multiple-template DFG-out models were compared to both the crystal structures and the best single-template model from MODBASE.[30] Three metrics were used to assess the quality of the multitemplate DFG-out models: Z-DOPE,[31] TM-Score,[32,34] and RMSD. For TM-Score and RMSD, the models were compared to the N-lobe, C-lobe, and full domain of both the DFG-in and DFG-out crystal structures, if a structure is available.

Docking Assessment

The selected DFG-out model’s ability to enrich type-II inhibitors in a docking screen was evaluated with a set of 1463 known type-I and type-II kinase inhibitors (Figure S7). Since many type-II kinase inhibitors are known to have a wide spectrum of anti-kinase activity, we searched through the corresponding literature to extract additional kinase assay data for virtual screening performance analysis. Molecular docking of small molecules against the models was done with FRED,[53] a component of OpenEye’s OEDocking modeling suit, whereas the docked ligands were processed by RDKit. We used OMEGA[53,54] to generate the ligand library, in which a maximum of 300 conformers was allowed. The best scoring pose of each ligand from the docked models was selected as the representative of the consensus docked ligand. Performance of the docking results was measured by the area under the curve (AUC) of the enrichment plot. Early detection performance of the model was quantified by the logarithmic scale of enrichment plot (logAUC) value.[35]

52 in total

1. The Protein Data Bank.

Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. T-Coffee: A novel method for fast and accurate multiple sequence alignment.

Authors: C Notredame; D G Higgins; J Heringa
Journal: J Mol Biol Date: 2000-09-08 Impact factor: 5.469

Review 3. Protein kinases--the major drug targets of the twenty-first century?

Authors: Philip Cohen
Journal: Nat Rev Drug Discov Date: 2002-04 Impact factor: 84.694

Review 4. Structure-based strategies for drug design and discovery.

Authors: I D Kuntz
Journal: Science Date: 1992-08-21 Impact factor: 47.728

5. Scoring function for automated assessment of protein structure template quality.

Authors: Yang Zhang; Jeffrey Skolnick
Journal: Proteins Date: 2004-12-01

6. How well can the accuracy of comparative protein structure models be predicted?

Authors: David Eramian; Narayanan Eswar; Min-Yi Shen; Andrej Sali
Journal: Protein Sci Date: 2008-10-01 Impact factor: 6.725

7. Kinase array design, back to front: biaryl amides.

Authors: Ian Baldwin; Paul Bamborough; Claudine G Haslam; Suchete S Hunjan; Tim Longstaff; Christopher J Mooney; Shila Patel; Jo Quinn; Don O Somers
Journal: Bioorg Med Chem Lett Date: 2008-08-22 Impact factor: 2.823

Review 8. Targeting cancer with small-molecular-weight kinase inhibitors.

Authors: Doriano Fabbro; Sandra W Cowan-Jacob; Henrik Möbitz; Georg Martiny-Baron
Journal: Methods Mol Biol Date: 2012

9. Progress in super long loop prediction.

Authors: Suwen Zhao; Kai Zhu; Jianing Li; Richard A Friesner
Journal: Proteins Date: 2011-08-23

10. Jalview Version 2--a multiple sequence alignment editor and analysis workbench.

Authors: Andrew M Waterhouse; James B Procter; David M A Martin; Michèle Clamp; Geoffrey J Barton
Journal: Bioinformatics Date: 2009-01-16 Impact factor: 6.937

13 in total

1. Identification of RIPK3 Type II Inhibitors Using High-Throughput Mechanistic Studies in Hit Triage.

Authors: Amy C Hart; Lynn Abell; Junqing Guo; Michael E Mertzman; Ramesh Padmanabha; John E Macor; Charu Chaudhry; Hao Lu; Kevin O'Malley; Patrick J Shaw; Carolyn Weigelt; Matthew Pokross; Kevin Kish; Kyoung S Kim; Lyndon Cornelius; Andrew E Douglas; Deepa Calambur; Ping Zhang; Brian Carpenter; William J Pitts
Journal: ACS Med Chem Lett Date: 2019-05-06 Impact factor: 4.345

2. Type II Binders Targeting the "GLR-Out" Conformation of the Pseudokinase STRADα.

Authors: Ryan H B Smith; Zaigham M Khan; Peter Man-Un Ung; Alex P Scopton; Lisa Silber; Seshat M Mack; Alexander M Real; Avner Schlessinger; Arvin C Dar
Journal: Biochemistry Date: 2021-01-13 Impact factor: 3.162

3. Compound heterozygous NEK1 variants in two siblings with oral-facial-digital syndrome type II (Mohr syndrome).

Authors: Glen R Monroe; Isabelle Fpm Kappen; Marijn F Stokman; Paulien A Terhal; Marie-José H van den Boogaard; Sanne Mc Savelberg; Lars T van der Veken; Robert Jj van Es; Susanne M Lens; Rutger C Hengeveld; Marijn A Creton; Nard G Janssen; Aebele B Mink van der Molen; Michelle B Ebbeling; Rachel H Giles; Nine V Knoers; Gijs van Haaften
Journal: Eur J Hum Genet Date: 2016-08-17 Impact factor: 4.246

4. Novel selective thiadiazine DYRK1A inhibitor lead scaffold with human pancreatic β-cell proliferation activity.

Authors: Kunal Kumar; Peter Man-Un Ung; Peng Wang; Hui Wang; Hailing Li; Mary K Andrews; Andrew F Stewart; Avner Schlessinger; Robert J DeVita
Journal: Eur J Med Chem Date: 2018-08-22 Impact factor: 6.514

5. Conformational Control of UDP-Galactopyranose Mutase Inhibition.

Authors: Kittikhun Wangkanont; Valerie J Winton; Katrina T Forest; Laura L Kiessling
Journal: Biochemistry Date: 2017-07-20 Impact factor: 3.162

6. Redefining the Protein Kinase Conformational Space with Machine Learning.

Authors: Peter Man-Un Ung; Rayees Rahman; Avner Schlessinger
Journal: Cell Chem Biol Date: 2018-05-31 Impact factor: 8.116

Review 7. A Special View of What Was Almost Forgotten: p38δ MAPK.

Authors: Débora Bublitz Anton; Rodrigo Gay Ducati; Luís Fernando Saraiva Macedo Timmers; Stefan Laufer; Márcia Inês Goettert
Journal: Cancers (Basel) Date: 2021-04-25 Impact factor: 6.639

8. A whole-animal platform to advance a clinical kinase inhibitor into new disease space.

Authors: Masahiro Sonoshita; Alex P Scopton; Peter M U Ung; Matthew A Murray; Lisa Silber; Andres Y Maldonado; Alexander Real; Avner Schlessinger; Ross L Cagan; Arvin C Dar
Journal: Nat Chem Biol Date: 2018-01-22 Impact factor: 15.040

9. Integrated computational and Drosophila cancer model platform captures previously unappreciated chemicals perturbing a kinase network.

Authors: Peter M U Ung; Masahiro Sonoshita; Alex P Scopton; Arvin C Dar; Ross L Cagan; Avner Schlessinger
Journal: PLoS Comput Biol Date: 2019-04-26 Impact factor: 4.475

10. Ligand Discovery for the Alanine-Serine-Cysteine Transporter (ASCT2, SLC1A5) from Homology Modeling and Virtual Screening.

Authors: Claire Colas; Christof Grewer; Nicholas James Otte; Armanda Gameiro; Thomas Albers; Kurnvir Singh; Helen Shere; Massimiliano Bonomi; Jeff Holst; Avner Schlessinger
Journal: PLoS Comput Biol Date: 2015-10-07 Impact factor: 4.475