Literature DB >> 22280316

Binding affinity prediction for ligands and receptors forming tautomers and ionization species: inhibition of mitogen-activated protein kinase-activated protein kinase 2 (MK2).

Senthil Natesan¹, Rajesh Subramaniam, Charles Bergeron, Stefan Balaz.

Abstract

Treatment of ionization and tautomerism of ligands and receptors is one of the unresolved issues in structure-based prediction of binding affinities. Our solution utilizes the thermodynamic master equation, expressing the experimentally observed association constant as the sum of products, each valid for a specific ligand-receptor species pair, consisting of the association microconstant and the fractions of the involved ligand and receptor species. The microconstants are characterized by structure-based simulations, which are run for individual species pairs. Here we incorporated the multispecies approach into the QM/MM linear response method and used it for structural correlation of published inhibition data on mitogen-activated protein kinase (MAPK)-activated protein kinase (MK2) by 66 benzothiophene and pyrrolopyridine analogues, forming up to five tautomers and seven ionization species under experimental conditions. Extensive cross-validation showed that the resulting models were stable and predictive. Inclusion of all tautomers and ionization ligand species was essential: the explained variance increased to 90% from 66% for the single-species model.

Entities: CellLine Chemical Disease Gene Mutation Species

Mesh：

Substances：

Year: 2012 PMID： 22280316 PMCID： PMC3315360 DOI： 10.1021/jm201217q

Source DB: PubMed Journal: J Med Chem ISSN： 0022-2623 Impact factor: 7.446

Introduction

Quantitative prediction of binding affinities of ligands interacting with target macromolecules is one of the most important tasks for lead optimization and other procedures in computational medicinal chemistry. The majority of approved drugs and drug candidates contain tautomerism-prone heteroaromatic ring systems and heteroatom-rich substructures[1] as well as one or more ionizing groups.[2,3] Components of the receptor binding sites, e.g., several amino acid residues,[4,5] cofactors (porphyrin,[6,7] NAD+, biotin,[3] and others), and nucleobases,[8−10] are also prone to ionization and tautomerism under physiological conditions. Structural differences of tautomer and ionization species lead to varying interactions with the binding site and cause the dependence of overall affinity on several factors. In addition to pH and temperature, the influence of medium polarity on tautomer and ionization equilibria plays a role because the interactions with the receptors may happen in an aqueous medium (blood/plasma, extra- and intracellular fluids) or in a nonpolar medium such as the bilayer core of the cell membrane. The time scale of establishing the tautomeric equilibria depends on the nature of broken and created bonds. Tautomers that interchange by CH bond cleavage and formation can often be isolated because their half-lives (t1/2) are measured in hours thanks to high (>30 kcal/mol) interconversion energy barriers. The CH-to-NH, −OH, and −SH tautomer conversions have much lower energy barriers (∼20 kcal/mol), and their t1/2 values are in the range of a second.[11] For the NH-to-OH and OH-to-OH tautomerism, the rates are much faster: some keto–enol tautomers convert on the picosecond time scale.[12] The fast tautomer interconversions of the last two categories facilitate the description of the ligand binding to macromolecules because the tautomer fractions remain constant during the binding process and are fully characterized by the equilibrium constants, eliminating the need to consider the kinetics of the tautomer interconversion process. Treatment of tautomers presents a challenge to methods utilizing classical force fields because of the increased occurrence of less common structures for which the parameters may not be readily available. Along with this nuisance, the static charge distribution and other approximations of the classical force fields contribute to the increasing use of combined quantum mechanics/molecular mechanics (QM/MM) methods in structure-based binding affinity predictions.[13,14] The application of hybrid QM/MM methods alleviates several issues in the classical description of biomolecular interactions and enables their more accurate description at a reasonable computational expense.[15] We previously reported an extension of the linear response method[16−22] by using the QM/MM energies of the time-averaged structures after MD simulation.[23,24] The QM/MM-LR approach was successfully applied to the affinity prediction of inhibitor binding to Zn-dependent matrix metalloproteinases (MMPs).[12] The model was capable of distinguishing subtle differences in the binding sites of two related MMPs and provide intricate selectivity clues[25] by including the phenomena that are out of reach of classical force fields, such as the coordination bonds made by the studied hydroxamates with the binding-site zinc and the proton transfer from the hydroxamate OH group to carboxyl of a neighboring Glu side chain in the binding site. Treatment of multiple species resulting from ionization and tautomerism is a challenge for current force-field based simulations in spite of the progress in the area.[26] Keeping in mind the advantages the QM/MM techniques offer for the treatment of tautomers and polarizing interactions, we wanted to examine the ability of the QM/MM-LR approach to describe binding of ligands involving tautomer and ionization equilibria. Suitable chemotypes, forming several tautomers and ionization species in aqueous media under physiological conditions, were found among the inhibitors of mitogen-activated protein kinase (MAPK)-activated protein kinase 2 (MK2) for the complexes of which a number of X-ray structures were available.[27−31] MK2 has two isoforms, α and β, which are produced by alternative splicing with 400 and 370 residues, respectively.[31] The enzyme consists of N-terminal proline-rich region (residues 10–44), catalytic region with DFG motif (residues 51–325), and C-terminal autoinhibitory region (residues 328–364). The C-terminal region includes a nuclear export signal (NES) of hydrophobic residues 356–368 and a nuclear localization signal (NLS) containing basic residues 373–389. The isoform β lacks the NLS region. During the presence of the inactive complex of p38α and MK2 in the nucleus, the C-terminal NLS is functional and the NES is masked. Upon stress stimulation, upstream kinases like MAPK kinase-6 activate p38, which in turn phosphorylates MK2, unmasking the NES, and this active complex translocates to cytoplasm. This cascade of events leads to coexport of activated p38 from the nucleus to cytoplasm. MK2 is a serine/threonine protein kinase regulating, by a post-transcriptional mechanism, biosynthesis of tumor necrosis factor α (TNFα) that is overproduced in inflammatory diseases such as rheumatoid arthritis and inflammatory bowel disease. MK2 knockout mice showed reduced expression of TNFα when stimulated with lipopolysaccharide (LPS) and were resistant to developing disease in arthritis models.[32] Mice that lack MK2 show increased stress resistance and survive LPS-induced endotoxic shock,[33] thanks to significant reduction in the production of TNFα. Unlike p38 MAP kinase inhibition leading to several serious side effects,[34] MK2 knockout mice are healthy and have a normal phenotype. The role of MK2 and the benefits of MK2 inhibition in other inflammatory diseases such as Alzheimer’s disease, atherosclerosis, and cancer are being actively investigated.[35] This study focuses on characterization of structural and energetic determinants of MK2 inhibitor binding involving multispecies equilibria using the newly developed MS-QM/MM-LR approach. The approach is applicable to binding predictions of speciated small molecules to speciated macromolecules, which are of interest in several areas of pure and applied chemistry.

Results and Discussion

The structures, tautomer and species, and inhibitory activities of the studied 66 MK2 inhibitors,[27−29] analogues of benzothiophene (benzothienodiazepinones, except 6 and 7) and pyrrolopyridine, are summarized in Tables 1 and 2, respectively. Under physiological conditions, the compounds exhibit ionization as well as prototropic and annular tautomerism: they form up to five different skeleton tautomers, with some tautomers creating two to four ionization species (Schemes 1 and 2), occasionally thanks to the ionization centers in the side moieties R (Tables 1 and 2). All potential tautomers within the two series are formed by the shifts of hydrogen between C and N, C and O, and N and O (Schemes 1 and 2), so the tautomer equilibria in an aqueous solution are expected to be established practically instantaneously.

Table 1

Benzothiophene Analogues: Structures, Considered Tautomers and Species, and MK2 Inhibition IC50 Values (M).[28,29]

						considered		log(1/IC₅₀)
ligand no.	X	R1	R2	R3	R4	tautomers	species	exp	calcd
1	NH	H	H	OCH₃	H	T1,T2	S1,S5,S7	6.745	6.739
2	NH	CH₃	H	OCH₃	H	T1,T2	S1,S5,S7	7.398	7.635
3	NH	H	CH₃	OCH₃	H	T1,T2	S1,S5,S7	6.523	6.713
4	NH	(CH₂)₂CH₃	H	OCH₃	H	T1,T2	S1,S5,S7	5.818	5.486
5	NH	CH₂OH	H	OCH₃	H	T1,T2,T5	S1,S5,S7,S11	7.854	7.446
6	S	H	H	OCH₃	H	T1,T3	S1,S9	5.963	5.547
7	CH₂	H	H	OCH₃	H	T1,T3	S1,S9	6.301	6.281
8	NH	CH₂NH₂	H	OCH₃	H	T1,T2	S1,S2,S5,S6	8.301	8.436
9	NH	CH₂NHCH₂CH₃	H	OCH₃	H	T1,T2	S1,S2,S5,S6	6.244	6.021
10	NH	CH₂NH(CH₂)₂CH₃	H	OCH₃	H	T1,T2	S1,S2,S5,S6	6.347	6.621
11	NH	CH₂NHCH₂Pha	H	OCH₃	H	T1,T2,T5	S1,S2,S5,S6,S12	6.244	6.418
12	NH	CH₂N(CH₃)CH₂Ph	H	OCH₃	H	T1,T2,T5	S1,S2,S5–S7,S12	5.719	6.074
13	NH	CH₃	H	OH	H	T1,T2	S1,S2,S5,S6–S8	6.721	6.944
14	NH	CH₃	H	OCH₂(3-OCH₃-Ph)	H	T1,T2	S1,S5,S7	6.959	7.022
15	NH	CH₃	H	OCH₂-c-C₃H₅	H	T1,T2	S1,S5,S7	5.879	5.874
16	NH	CH₃	H	O(CH₂)₂OCH₃	H	T1,T2	S1,S5,S7	6.569	6.769
17	NH	CH₃	H	OCH₂CH(CH₃)₂	H	T1,T2	S1,S5,S7	6.398	6.238
18	NH	CH₃	H	OCH₂COOCH₃	H	T1,T2	S1,S5,S7	7.523	7.212
19	NH	CH₃	H	OCH₂-c-C₆H₁₁	H	T1,T2	S1,S5,S7	5.463	5.593
20	NH	CH₃	H	O–CH=CHb		T1,T2	S1,S5,S7	7.796	7.775
21	NH	CH₃	H	O–CH₂–CH₂		T1,T2	S1,S5,S7	7.553	7.593
22	NH	CH₃	H	N=CH–CH=CH		T1,T2,T4	S1,S5,S7,S10	9.000	8.549
23	NH	CH₃	H	OC(Ph)=CH		T1,T2	S1,S5,S7	6.824	6.801
24	NH	CH₃	H	N=C(Cl)-CH=CH		T1,T2,T4	S1,S5,S7,S10	8.699	8.474
25	NH	CH₃	H	N=C(Ph)-CH=CH		T1,T2,T4	S1,S5,S7,S10	8.046	7.758
26	NH	CH₃	H	N=C(4-Py)-CH=CHc		T1,T2,T4	S1,S5–S7,S10	7.854	7.967
27	NH	CH₃	H	N=C(3-Py)-CH=CH		T1,T2,T4	S1,S5–S7,S10	8.301	7.705
28	NH	CH₃	H	N=C(5-Pm)-CH=CHd		T1,T2,T4	S1,S5,S7,S10	7.638	7.356
29	NH	CH₃	H	N=C(2-OCH₃-Ph)-CH=CH		T1,T2	S1,S5,S7	7.699	7.517
30	NH	CH₃	H	N=C(3-OCH₃-Ph)-CH=CH		T1,T2,T4	S1,S5,S7,S10	7.387	7.199
31	NH	CH₃	H	N=C(2-F-Ph)-CH=CH		T1,T2,T4	S1,S5,S7,S10	7.854	7.705
32	NH	CH₃	H	N=C(2-CH₃-Ph)-CH=CH		T1,T2,T4	S1,S5,S7,S10	7.523	7.771
33	NH	CH₃	H	N=C(4-CH3-3-Py)-CH=CH		T1,T2,T4	S1,S2,S5–S7,S10	8.301	8.265
34	NH	CH₂N-Tpe	H	OCH₃	H	T1,T2	S1,S2,S5,S6	<4.699	4.879
35	NH	CH₂N-Mof	H	OCH₃	H	T1,T2,T5	S1,S2,S5,S7,S11	<4.699	4.812

Ph: phenyl.

R3 and R4 join the phenyl ring to form cyclic derivatives in ligands 20–33.

Py: pyridyl.

Pm: pyrimidinyl.

Tp: tetrahydropyrrolyl.

Mo: morpholinyl.

Table 2

Pyrrolopyridine Analogues: Structures, Considered Species,a and MK2 Enzyme Inhibition IC50 Values (M)[27]

			log(1/IC₅₀)
ligand no.	R	considered species	exp	calcd
36	H	S1,S3,S5	6.767	6.561
37	Phb	S1,S3,S5	7.180	7.520
38	4-Pyc	S1,S2,S3,S5	7.252	7.339
39	3-Py	S1,S3,S5	7.319	7.486
40	3-Thd	S1,S3,S5	7.119	7.550
41	2-Npe	S1,S3,S5	7.284	6.934
42	2-Btf	S1,S3,S5	7.523	7.347
43	3-Qg	S1,S3,S5	8.071	7.994
44	2-OH-Ph	S1,S2,S3–S5	6.387	6.920
45	3-OH-Ph	S1,S2,S3–S5	7.602	7.829
46	4-OH-Ph	S1,S2,S3–S5	7.678	7.669
47	2-Cl-Ph	S1,S3,S5	6.218	6.430
48	3-Cl-Ph	S1,S3,S5	7.432	7.747
49	4-Cl-Ph	S1,S3,S5	7.495	7.568
50	2-F-Ph	S1,S3,S5	6.900	7.165
51	3-F-Ph	S1,S3,S5	7.523	7.659
52	4-F-Ph	S1,S3,S5	7.301	7.422
53	4-CF₃-Ph	S1,S3,S5	7.149	7.181
54	4-COCH₃-Ph	S1,S3,S5	7.292	7.228
55	4-OCH₃-Ph	S1,S3,S5	7.260	7.205
56	4-NH₂-Ph	S1,S3,S5	7.387	7.340
57	4-[CONH-c-C₅H₉]-Ph	S1,S3,S5	8.097	8.153
58	4-[CONH-c-C₆H₁₁]-Ph	S1,S3,S5	7.770	7.662
59	4-[CONHCH₂Ph]-Ph	S1,S3,S5	8.097	7.786
60	4-[CONH(CH₂)₂Ph]-Ph	S1,S3,S5	7.337	7.508
61	4-[CONH(CH₃)CH₂Ph]-Ph	S1,S3,S5	7.252	7.309
62	Cl	S1,S3,S5	6.216	6.270
63	5-Pmh	S1,S3,S5	7.081	7.115
64	4-CN-Ph	S1,S3,S5	7.208	7.218
65	4-COOH-Ph	S2,S3,S5	7.658	7.683
66	4-[CONH-c-C₃H₅]-Ph	S1,S3,S5	7.824	7.816

All compounds present as tautomers T1 and T2.

Phenyl.

Pyridinyl.

Thienyl.

Naphthyl (marked as 3-Np in the original publication).

Benzothienyl.

Quinolinyl.

Pyrimidinyl.

Scheme 1

Tautomer and Ionization Equilibria of Studied Benzothiophene Analogues (Table 1)

Species, present in at least 0.01% fraction under experimental conditions, are labeled S1–S12 and their parent tautomers are labeled as T1–T5. The neutral forms of the T3, T4, and T5 neutral-zwitterion pairs, marked with a prime, are present in negligible fractions and are listed only as intermediates in the formation of more abundant zwitterionic counterparts. The protonated form of T2 is shown as T2+. The charges of substituent groups are given in brackets. The R3–R4 substituent group for S2/T1 and S6/T2 refers to the cyclic derivatives (20–33, Table 1). Species S3 and S4 are not present in this series.

Scheme 2

Tautomer and Ionization Equilibria of the Studied Pyrrolopyridine Analogues (Table 2), Showing the Species Present in at Least 0.01% Fraction under Experimental Conditions

Ph: phenyl. R3 and R4 join the phenyl ring to form cyclic derivatives in ligands 20–33. Py: pyridyl. Pm: pyrimidinyl. Tp: tetrahydropyrrolyl. Mo: morpholinyl. All compounds present as tautomers T1 and T2. Phenyl. Pyridinyl. Thienyl. Naphthyl (marked as 3-Np in the original publication). Benzothienyl. Quinolinyl. Pyrimidinyl.

Ligand Fractions of Tautomers and Ionization Species

Ligand fractions of tautomers and ionization species in aqueous solution were estimated for water under the conditions of experiments (pH 7.5 and 30 °C), using the SPARC[36] online calculator. The fractions of individual tautomers were calculated from the relative abundances of all possible tautomers at equilibrium (Schemes 1 and 2). Ionization estimates were only performed for tautomers with at least 0.01% fraction to determine the fraction of each species. The resulting 233 tautomers (T1–T5) and species (S1–S12) for the studied 66 compounds were grouped and numbered as shown in Schemes 1 and 2. The species classification is based on the ionization state as well as on the parent tautomer. As a result, the species with the same number may have different overall charges as indicated in Schemes 1 and 2.

Tautomer and Ionization Equilibria of Studied Benzothiophene Analogues (Table 1)

Tautomer and Ionization Equilibria of the Studied Pyrrolopyridine Analogues (Table 2), Showing the Species Present in at Least 0.01% Fraction under Experimental Conditions

Tautomers are labeled as T1–T2 and species are labeled S1–S5. Protonated form of T1 is shown as T1+. The charges of substituent groups are given in brackets. Species S1, S2, and S5 are similar to those in Scheme 1 in the approximate position of charged atoms with regard to the CO-NH in T1. Species S6–S12 are not present in this series. The neutral-zwitterion tautomer pairs T3, T4, and T5 are mainly present as zwitterions, with the proton transferred from the hydroxyl in position 1 to N in position 2 or 5 (atom X in the Table 1 structure). Charged skeletons are produced by external protonation, as seen in benzothiophene tautomer T2+ (Scheme 1) and pyrrolopyridine tautomer T1+, with the protonated N in position 5 or in the pyridine ring, respectively. Further ionizable groups are present in the side chains R1, R3, R4, or cyclic substructures R3–R4. Combinations of the skeleton tautomers and the side chains ionized to varying degrees give rise to altogether 12 species. While both series are dominated mainly by tautomers T1 and/or T2, the presence of other tautomers cannot be neglected. The formation of T2 and T5 results in the loss of planarity of thiophene ring forming a new chiral center, which exhibited stronger interactions with the MK2 enzyme when set in S configuration in all MD simulations (described below). An overview of speciation of individual compounds is given in Figure 1, with all details listed in Table S4 in Supporting Information.

Figure 1

Species fractions of the studied benzothiophenes (1–35, Table 1) and pyrrolopyridines (36–66, Table 2) in water under experimental conditions: species S1/T1, S2/T1, S3/T1, S4/T1, S5/T2, and S6/T2 (Schemes 1 and 2) are shown in black, red, blue, yellow, gray, and green, respectively. Only major species (>10% in at least one compound) are shown. The majority of the benzothiophene analogues (Table 1 and Figure 1) do not ionize in water under experimental conditions. For 26 of 35 compounds, species 5/tautomer 2 (S5/T2) is the predominant species. Compounds 6 and 7 are mainly present as S1/T1. Only compounds 8–12 and 34 are mostly available as ionized S6/T2 and S2/T1. All pyrrolopyridine analogues (Table 2 and Figure 1) are present mainly as T1 in water. While neutral species S1 dominates (>70%) for compounds 42, 56, and 62 and ionized species S3 for compounds 36, 37, 40, 41, 46, 55, 64, and 65, most compounds share preferences for both species 1 and 3. Compound 65 with carboxyl group substituent is always ionized and present as both species S2 and S3. Compound 44 also exhibits preference for neutral species S4 in addition to species S1 and S3.

Protonation States of Ionizable Protein Residues

Protonation of ionizable residues of MK2 was determined by pKa calculations using the PROPKA 2.0 web server[37] at the experimental conditions (pH 7.5 and 30 °C). This empirical structure-based approach includes desolvation effects and intramolecular interactions such as H-bonds and charge–charge interactions. The pKa values for ionizable residues in the binding site (Figure 2 below), Glu139 (pKa = 1.88), Asp142 (2.51), Asp207 (1.24), and Lys93 (12.83), differ from the pH 7.5 of experiment by 5 or more logarithmic units, indicating that the binding site has just one ionization state in fraction exceeding 0.01%, with the side chain carboxyl groups of both Glu and Asp residues deprotonated and the side chain ε-amino group of Lys93 protonated. No tautomerization is expected for the binding site residues.

Figure 2

Time-averaged structure of MK2 with bound compound 8 (Table 1) used in step 4. The protein is shown as Cα-trace with the binding site indicated as transparent surface of the binding site residues Leu70, Gly71, Leu72, Gly73, Ile74, Ala91, Leu92, Lys93, Met138, Glu139, Cys140, Leu141, Asp142, Asn191, Leu192, Leu193, Thr206, and Asp207. The key residues and the ligand, shown in atom-type-colored ball and stick representation and the other binding site residues represented as red sticks are included in QM and optimized MM regions of the QM/MM geometry optimization, respectively (step 2). The ligand and all binding site residues were included in the flexible region in FlexiDock docking (step 1) and the QM-region in the single-point QM/MM energy calculation (step 4).

Multispecies Binding Equilibria

Individual ligand (L) and receptor (R) species present in the solution differ in structures of some fragments and, consequently, the binding event results in different structures of the complexes, which have different association microconstants. For the receptor, only the ionization or tautomer species of the fragments, which either directly participate in the binding or are in a sufficient proximity of the site to affect the binding, will differ in the association microconstant and need to be considered. The mutually exclusive 1:1 binding of multiple ligand species (total number l) to a macromolecule receptor site species (total number s) is illustrated in eq 1 for the ith ligand species. The microconstants K characterizing affinities of the l × s ligand–receptor complex species are defined, for the ith ligand species bound to the jth receptor species, as The association microconstants are the relevant quantities for the correlation with structure. The measured equilibrium constant K of the ligand, however, contains the total concentration of the ligand–receptor complexes without distinguishing between complexes differing in interacting species. To express K as a function of Ks, a series of rearrangements needs to be made, as shown in eq 3: (1) the total bound concentration, [LR], is given as the sum of concentration of the l × s complex species, to get the third term; (2) in the numerator of the third term, each summand is formally multiplied by [L][R]/[L][R] and, in this way, the fractions of each ligand species, f = [L]/[L], and each receptor species, f = [R]/[R], are introduced, as shown in the fourth term; (3) finally, the microconstants K are incorporated using their definition in eq 2.The fractions f and f depend on the medium and remain constant as long as the medium is not changing. The test media and intra/extracellular body fluids are buffered, so the key property, the pH value, remains invariant and the fractions f and f can usually be calculated before optimization. For ligand ionization, the expressions for the fractions f of the ith species can be derived from the definition of the ionization constants. In the case of two or more ionization groups, attention needs to be paid to the actual macroscopic pKa values. If the pKa values are closer than 3–4 units, the ionization tendencies of the two ionizable groups are not clearly separated, and mixed species may occur. Consequently, such pKa values are no longer equal to the ionization microconstants, and multiple species with the same charge (e.g., neutral molecules and zwitterions as two species with the net zero charge, such as the neutral-zwitterion tautomer pairs T3–T5 in Scheme 1) can be present for a given pH value of the medium. For the receptors, the equilibria can be affected by through-space proximity to other groups. Specialized techniques have been developed to estimate the ionization constants of individual ionizable groups[37−47] and the equilibrium constants for tautomers.[5] The prevalence of the ith bound species for a ligand can be calculated using the sum of all its complexes with the receptor species:The numerator and denominator come from eqs 2 and 3, respectively. The third quasi-equality in eq 4 ensures that the sum of prevalences for all l species of a ligand equals unity. The prevalence of the receptor species for the given ligand is calculated in a similar way; just the summations in the numerators of eq 4 would run through all l ligand species instead of the s receptor species.

Multispecies Inhibitor Studies

In the single-species QM/MM-LR approach, a linear combination of the QM/MM energy term and the solvent-accessible surface area (SASA) term is correlated with enzyme inhibition potency, which may be given as log(1/IC50) values, where IC50 is the concentration of an inhibitor that decreases the rate of an enzyme-catalyzed reaction by 50%. For reversible and competitive enzyme inhibition, the direct proportionality between IC50 and ligand–receptor association constant is given by the Cheng–Prusoff equation.[48] For the multispecies correlation, which uses the total association constant K as the dependent variable, each of the species microconstants K in eq 3 needs to be expressed as an exponential with the exponent equal to the linear combination of the ligand–protein QM/MM interaction energy term and the SASA term and eq 3 needs to be logarithmized. Assuming that the regression coefficients α, γ, and κ maintain the same values for each pair of ligand and receptor species, the inclusion of multiple species does not increase the number of the regression coefficients. The MS-QM/MM-LR correlation equation is thenThe term ⟨ΔEQM/MM⟩ denotes the binding energy defined as the difference between the QM/MM energies of the complex and those of the unbound interaction partners, all calculated for the time-averaged structures from MD simulations. The change in SASA upon binding is defined analogously. The QM/MM term in eq 5 replaced the van der Waals and electrostatic energy terms in the classical LR approach, which were scaled by coefficients α and β, respectively.[16−22] The QM/MM and SASA terms of 233 tautomers and species of 66 ligands complexed with MK2 enzyme were calculated using a slight modification of our previously published four-tier approach.[23,24] The protocol consists of (1) flexible docking with the poses selected based on the docking score, (2) QM/MM geometry optimization of the docked complexes, (3) MD simulation of the geometry optimized structures, solvated with explicit TIP3P water molecules,[55] and (4) for the time-averaged structures from step 3, calculation of the single-point QM/MM interaction energy term and the SASA term. The two terms were correlated with the enzyme inhibition potencies using eq 5.

Computational Protocol

In step 1 (docking), the studied inhibitors were docked into the recently published[30] X-ray structure of MK2 in active conformation, bound to a 3-aminopyrazole derivative (PDB code 3KGA(30)) with 2.55 Å resolution, taken from the Protein Data Bank.[49] MK2 features the β-sheet dominated N-terminal lobe and a purely α-helical C-terminal lobe, enclosing the ATP-binding site for which the inhibitors compete. The binding site (Figure 2) is lined by conserved glycine-rich motif GXGXXG of residues 71–76 and the hinge region, extending from the gate-keeper residue Met138 and including residues 139–142. The interaction of conserved Lys93 with the conserved Glu104 from helix C is considered a hallmark of the active conformation of protein kinases.[50] Docking was performed using the FlexiDock module from SYBYL-X. Two available X-ray structures, one for each series (for ligands 33(29) and 50(27)), were used to guide prepositioning of studied compounds. In addition to ligands, rotatable bonds of side chains of binding site residues were allowed to move during the docking procedure. This step was critical in selecting the optimal binding mode with the highest FlexiDock score, especially for more flexible compounds. The deviations between the optimal binding modes (poses) of individual tautomers/species of the same compounds were significant. The respective heavy-atom rmsd values varied between 0.55 and 10.95 Å and are summarized in Table S1 in Supporting Information. To speed up the QM/MM convergence, docked complexes were briefly minimized using Generalized Born implicit solvent method in Amber 10. In step 2 (QM/MM optimization), geometries of minimized protein–ligand complexes were further optimized by the ONIOM method,[51,52] a hybrid QM/MM method available in Gaussian 09.[53] Ligands and key binding site components (backbone atoms of Cys140, Leu141, and Asp142; whole residues of Thr206 and Asp207) were included in the QM region, and the rest of the system was defined as the MM region. The ligand and binding site residues were allowed to move during geometry optimization, and the rest of the system was frozen (Figure 2). No tautomer/species conversion was observed during the geometry optimization procedure. In step 3 (MD simulations), the entire hydrated and neutralized QM/MM-optimized complexes were heated and subjected to 1 ns MD simulation in Amber 10. Analysis of the trajectories for energy terms (potential, kinetic, and total energies), density, volume, temperature, pressure, and rmsd values of Cα atoms revealed that that the secondary and tertiary structure of the ligand–receptor complex remained stable throughout the simulation and all studied ligand–receptor complexes attained equilibrium within the 1 ns simulation period. For several compounds, 20 ns MD simulations were run, showing no major differences to the 1 ns trajectories. The MD trajectories were also analyzed to track critical intermolecular interactions, including water bridges that were involved in ligand binding (see below). For all complexes, the conserved Lys93 was found to be interacting with a conserved Glu104 from helix C, indicating the active conformation of protein kinases.[50] The time-averaged structures from the final 900 ps of MD simulations were calculated using the ptraj program in AmberTools,[54] keeping only 100 water molecules closest to the ligands and stripping off the remaining water molecules and counterions used for neutralizing the system. The resulting structures were briefly minimized to remove unnatural features. Step 4 (single-point QM/MM energy and SASA calculations) was performed for time-averaged structures resulting from step 3. The QM region consisted of ligands, binding site residues, and water molecule for complexes, in the cases when water-mediated interactions were observed. The heavy-atom rmsd values between the time-averaged structures of individual tautomers/species varied between 1.30 and 11.73 Å and are summarized in Table S2 in Supporting Information. For steps 1–4, the computations for each ligand–receptor species took on average 15–20 min, 40–60 h, 200–240 h, and 12–24 h of single-processor time, respectively. For the 233 complexes, plus free interaction partners, this study used about ∼130000 h of CPU time.

Multispecies 3D-QSAR Correlation

The MK2 enzyme inhibition potencies, given as logarithmized 1/IC50 values, were correlated with the QM/MM interaction terms and the SASA terms, both calculated in step 4. The nonlinear regression model (eq 5), with s = 1 as only one ionization/tautomer state was predicted for the binding site, was optimized using the Solver[55] software. Contributions of individual steps of the calculation protocol to the correlation were examined, and the results are summarized in Table 3. The use of the FlexiDock scores in place of the QM/MM energies in eq 5 resulted in no correlation (r2 = 0.002). The QM/MM energies and SASA, calculated from the geometry optimization step 2, gave r2 = 0.202. The use of van der Waals and electrostatic energy terms from the MD simulation in the MS-LR fashion in eq 5, as summarized in step 3, slightly improved the correlation (r2 = 0.353), but the signs of the coefficients α and β for the van der Waals and electrostatic energy terms, respectively, were incorrect. The full MS-QM/MM-LR treatment (eq 5) was necessary to achieve an agreement between experimental and calculated inhibitory activities, with the correct signs of the optimized coefficients α and γ (r2 = 0.906).

Table 3

Linear Response Correlations Using eq 5 for Individual Steps: Optimized Coefficients and Descriptive Statistical Indicesa

step	α × 10^–3(mol/kcal)b	β × 10^–4(mol/kcal)c	γ × 10^–3(1/Å²)d	κ	r²	RMSE
1. docking	0.0878 ± 0.115			7.292 ± 0.238	0.002	0.741
2. QM/MM optimization	–0.3868 ± 21.25		–7.354 ± 46.51	4.107 ± 27.44	0.202	0.663
3. MD simulation	10.47 ± 54.29	5.709 ± 62.32	–11.61 ± 86.56	5.417 ± 38.22	0.353	0.597
4. single-point QM/MM	–1.326 ± 0.591		–7.761 ± 2.261	1.160 ± 0.358	0.906	0.228

The squared correlation coefficient (r2) and the root-mean-square error (RMSE).

Scales the QM/MM energy term (eq 5) in rows 2 and 4, and the vdW energy term in row 3.

Scales the electrostatic energy term.

Scales the SASA term.

The squared correlation coefficient (r2) and the root-mean-square error (RMSE). Scales the QM/MM energy term (eq 5) in rows 2 and 4, and the vdW energy term in row 3. Scales the electrostatic energy term. Scales the SASA term. An analysis of the importance of tautomeric and ionization species for the correlation is summarized in Table 4. The simplest model (row 1) uses only the QM/MM and SASA terms for tautomer 1 (no ionization species are included), in eq 5, which thus became a standard, one-mode QM/MM-LR equation and exhibits a comparatively weak correlation (r2 = 0.662). The addition of ionization species for tautomer 1 (row 2) lead to a better performance (r2 = 0.734). The consideration of all tautomers and no ionization species (row 3) brought further improvement (r2 = 0.839). The model including all tautomers and all ionization species (row 4) provided the best results (r2 = 0.906). The comparison of experimental and calculated inhibitory potencies for different species compositions (Table 4), shown in Figure 3, also confirms the need for inclusion of all tautomers and ionization species.

Table 4

The MS-QM/MM-LR Correlations (eq 5) for Different Tautomer and Species Composition: Optimized Coefficients and Statistical Indices

							LOOc		MC-LGOd
tautomer	species	α × 10^–3(mol/kcal)a	γ × 10^–3(1/Å²)b	κ	r²	RMSE	q²	RMSE	q²	RMSE
single	single	–1.212 ± 1.289	–8.985 ± 8.451	1.151 ± 0.896	0.662	0.431	0.617	0.458	0.616	0.459
single	multi	–1.291 ± 1.026	–8.664 ± 7.211	1.000 ± 0.758	0.734	0.382	0.690	0.412	0.691	0.411
multi	single	–1.338 ± 0.801	–8.334 ± 5.002	0.960 ± 0.507	0.839	0.297	0.819	0.315	0.819	0.315
multi	multi	–1.326 ± 0.591	–7.761 ± 2.261	1.160 ± 0.358	0.906	0.228	0.897	0.237	0.894	0.241

Scales the QM/MM energy term.

Scales the SASA term.

Leave-one-out cross-validation: the results reported as RMSE and the squared predictive correlation coefficient (q2).

Monte Carlo leave-group-out cross-validation: the compounds were randomly divided into 7 groups, and each group of ∼9 compounds was used once as a test set; the procedure was repeated 10 times.

Figure 3

Correlations between experimental and calculated MK2 inhibition potencies for studied compounds are shown for all models: single-tautomer, single-species (green squares), single-tautomer, multispecies (red triangles), multitautomer, single-species (blue diamonds), and multitautomer, multispecies (black circles).

Scales the QM/MM energy term. Scales the SASA term. Leave-one-out cross-validation: the results reported as RMSE and the squared predictive correlation coefficient (q2). Monte Carlo leave-group-out cross-validation: the compounds were randomly divided into 7 groups, and each group of ∼9 compounds was used once as a test set; the procedure was repeated 10 times. Correlations between experimental and calculated MK2 inhibition potencies for studied compounds are shown for all models: single-tautomer, single-species (green squares), single-tautomer, multispecies (red triangles), multitautomer, single-species (blue diamonds), and multitautomer, multispecies (black circles). The agreement between the model and experiment is very satisfactory, especially for the complete setup (row 4). The LOO-cross-validation and, more importantly, the rigorous MC-LGO cross-validation confirmed that all models are stable, exhibit no overfitting, and have adequate predictive power because the values of the predictive indices RMSE and q2 are similar to those of the descriptive indices RMSE and r2. Interestingly, the optimized values of the coefficients α and γ did not vary significantly with the addition of species, which is also a sign of extraordinary stability of the models. As an additional model validation attempt, we performed the correlations with swapped values of the fractions f of individual tautomers/species in water. The numbers of tautomers/species differ but are equal to three for most compounds (Tables 1 and 2). Therefore, we swapped the f values in the correlation eq 5 for the first three species (the fi values for compounds 6 and 7, having only two species, were always exchanged). The descriptive abilities of the correlations using eq 5 were characterized by r2 = 0.906 for the correct (1–2–3) order of the f values, r2 = 0.831 for the order 3–2–1 (the f values for species 1 and 3 were exchanged), r2 = 0.807 for the 2–1–3 order, r2 = 0.629 for the 1–3–2 order, r2 = 0.702 for the 2–3–1 order, and 0.673 for the 3–2–1 order. The swapping of some f values, although providing only limited randomization, clearly leads to deterioration of the correlation. For compounds 34 and 35 with semiquantitative enzyme inhibitory potencies (Table 1), which were excluded from the model fitting process, predictions were made using all models. Activities of both compounds were predicted to be very close to the observed limits using the complete model (Table 4, line 4) although not below them (Figure 3). Contribution of the QM/MM energy to the correlation with inhibitory potency is greater than that of the SASA term. Coefficient γ associated with SASA term is approximately 7 times larger than that of the QM/MM term; however, when the overall contributions, i.e., the products of coefficients and the variables, are considered, the values of the QM/MM contributions are ∼10 times larger than those of the SASA term. There is no cross-correlation between the QM/MM energy and SASA terms (r2 = −0.334). Standard deviations of inhibitory activities (IC50) were only published for the series 2 compounds (36–66, Table 2) and are listed in Table S3 in Supporting Information. No information was published about the error distribution, so a detailed comparison of model performance with respect to experimental errors cannot be made. However, we can use the standard deviations as the estimates of experimental variance. On the logarithmic scale, which was used in the correlation, the average error interval for log(1/IC50) is 0.220 (Supporting Information Table S3), with the average log(1/IC50) located asymmetrically in this interval. The average error interval, which includes both positive and negative deviations, is smaller than RMSE = 0.228 for the best prediction (step 4 in Table 3), indicating that the complexity of the model is adequate: the model is not overly detailed and does not describe experimental errors.

Critical MK2–Inhibitor Interactions

Important interactions between MK2 and inhibitors were tracked by the analysis of MD trajectories of 233 complexes from the production phase.

H-Bond Interactions

H-bond interactions, both at the catalytic end (Asp207 and Lys93) and in the hinge region (Leu141), shown in Figure 2, are typical for all compounds exhibiting higher inhibitory potencies. The amino and carbonyl groups in the lactam ring make H-bond interactions with catalytic Asp/Lys pairs and methoxy oxygen (Table 1) or pyridinyl nitrogen (Table 2) groups interact with the backbone NH group of Leu141 in hinge region (Figure 4). The magnitudes of the average bond lengths (<2 Å) and angles (>150°, i.e., not deviating much from the ideal 180° magnitude) indicate strong H-bond interactions that seem to be critical for enzyme inhibition.

Figure 4

Interaction map of compound 8 (Table 1) as species 5/tautomer 2 (Scheme 1) in the binding site of MK2. Nitrogen, oxygen, and sulfur atoms are printed in blue, red, and yellow colors, respectively. H-Bond interactions are drawn in red color, and the bond lengths and bond angles (in italics) for the time-averaged structure are shown.

Water-Mediated Interactions

Water-mediated interactions were observed in many studied ligands bound to MK2. The protonated N atom in position X (Table 1) of the lactam ring in tautomers T2 and/or T4 of compounds 23 and 33 makes the water-mediated H-bond interaction with the backbone carbonyl oxygen of Glu190. Additionally, in compound 33, a water molecule creates an H-bonded bridge between the protonated N atom of pyridine ring and the carboxyl group of the Asp142 side chain in the hinge region. Water-mediated interactions were also observed for tautomer T2 of compounds 5, 8, 9, 11, 17, and 19, as well as for tautomers T2 and T4 of compounds 25, 26, and 29 (Table 1), in which the protonated N atom in the X position of the lactam ring forms H-bond interaction with the backbone carbonyl oxygen of Leu70 in the glycine-rich loop through a water molecule. Pyrrolopyridines (Table 2) binding as S3/T1 possess protonated pyridine ring, which interacts either directly with Glu139 backbone carbonyl oxygen (compounds 40, 41, 43–46, 49, 52, and 57) or with Leu141 backbone NH group through a water-mediated interaction (compounds 36, 38, 39, 56, 58, 59, and 60).

Hydrophobic Interactions

Hydrophobic interactions with the hinge region (Figure 2) are formed by several substructures in the part of the molecules that is opposite to the lactam ring. In benzothiophene series (Table 1), the rings attached in positions R3–R4 make compounds 22–33 relatively potent inhibitors. On the contrary, the absence of the aryl ring attached to the pyridyl ring decreases the potency of pyrrolopyridines 36 and 62 (Table 2). However, the structural dependence is more complex: substituents in the o-position of the aryl ring attached to the pyridyl ring negatively affect potency for pyrrolopyridines 44, 47, and 49 (Table 2). The side chain atoms of residue Leu70 (glycine-rich loop, Figure 2) and Leu193 make hydrophobic interactions from both sides of the binding site with many compounds having aryl ring substituents positioned near the hinge region of the MK2 enzyme.

Prevalences of Bound Tautomers and Species

The prevalences, as predicted by eq 4 with the K values calculated from eq 5 with optimized coefficients (last lines of Tables 3or 4), are summarized in Tables S4 and S5 in Supporting Information. The majority (28 of 35) of benzothiophenes (1–5, 12–33, and 35; Table 1) preferably bind as neutral species 5/tautomer 2 (Scheme 1). Several benzothiophenes (1–4, 12–14, 17, 18, 24, 29, 30, and 35) also exhibit some fraction of bound S1/T1 species. Only compounds 6 and 7 bind exclusively as S1/T1 (Scheme 1). Compounds 8–10 and 34 prefer to bind as protonated S6/T2 thanks to the presence of amino substituent in the lactam ring. Compound 11 shows almost equal binding preferences for S2/T1, S5/T2, and S6/T2. Compounds 9 and 11 bind (∼10%) in S2/T1 form as well. Pyrrolopyridine analogues (Table 2, Scheme 2) mostly bind in neutral form as S1/T1 (42, 50, 60, and 62) or protonized as S3/T1 (36, 41, 43, 45, 46, 49, 54–57, 64, and 65) or both (37–40, 43, 47, 48, 51–53, 58, 59, 61, 63, and 66). Compound 44 binds as both S3/T1 and S4/T1 species, the latter having the phenyl ring hydroxyl group in the deprotonated form. The predominantly bound tautomer 1 has the planar lactam ring attached to aromatic pyrrole ring. All compounds from this series make favorable interactions with the Asp/Lys catalytic pair, which may explain their comparatively narrow potency range. The fractions of free vs bound tautomers and species for all studied inhibitors are plotted in Figure 5. For the majority of the compounds, the prevalent species in solution are the bound species as well. Nevertheless, the correlation is rather loose and there are several noticeable differences between binding prevalences and the fractions in solution.

Figure 5

A comparison of predicted bound species prevalences and species fractions in the aqueous solution for studied ligands (the numbers correspond to the compound numbers in Tables 1 and 2). Species 1–6 (Schemes 1 and 2) are referred to by black, cyan, red, magenta, blue, and green colors, respectively. Only major species (at least one compound with >10%) are shown here. For example, compounds 11 and 12 (Table 1, Scheme 1) are mainly present as protonated species S6 in solution (>60%) but they are bound as nonionized species S5. Compound 13 binds to a significant extent as species S1, S5, and S6, although only species S5 is prevalent in solution. Compound 50 (Table 2, Scheme 2) is present as both species S1 and S3 in solution but shows a higher preference for species S1 while binding.The situation is opposite for compounds 43, 56, and 57, which prefer to bind as species S3, even though they have about equal concentrations of species S1 and S3 in solution.

Conclusions

The QM/MM linear response method was extended for multiple ligand and receptor species, resulting from tautomerism and ionization, using rigorous thermodynamic principles, to fill a gap in the armory of the computational methods for structure-based, ensemble-utilizing, and empirically calibrated prediction of binding affinities. Ionization and tautomerism, as widespread phenomena among the drug-like molecules, need to be included in the methods for prediction of binding affinities to bring the models closer to reality. Our approach is based on performing the simulations individually for each species pair and combining the results of these simulations in a correlation equation that is calibrated using the experimental data. Interestingly, inclusion of multiple species does not increase the number of optimized coefficients, at least in the studied case: the same regression coefficients could be used for all species without a detrimental impact on the quality of the fit. The developed MS-QM/MM LR method was applied to a set of 66 MK2 inhibitors forming up to five tautomers and seven ionization species under the experimental conditions. The treatment of multiple ligand species significantly improved the correlation between experimental inhibition potencies and the QM/MM interaction energy and SASA terms for the time-averaged structures from MD simulations. The structural and energetic information obtained from the time-averaged structures highlights the critical interactions necessary for optimal enzyme inhibition. Strength of the H-bond interactions of ligands with the backbone amide group of Leu141 in the hinge region, and carboxyl and ε-amino groups in the side chains of catalytic Asp207 and Lys93, respectively, significantly affect the inhibition potency. Additional interactions with backbone carbonyl group of Glu190 and glycine-rich loop made by some potent inhibitors represent hints for further improvements in potency. If future studies, preferably dealing with the experimentally observed prevalences of bound species, confirm the utility of the MS-QM/MM-LR approach, it may become a welcome addition to the methods for lead optimization in drug design.

Experimental Section

Data Set

The MK2 inhibitory activities for 66 benzothiophene and pyrrolopyridine derivatives were reported by Anderson et al.[27−29] The X-ray structures of the MK2 complexes with a benzothiophene analogue 33 (Table 1), a pyrrolopyridine analogue 50 (Table 2), and an untested 3-aminopyrazole derivative with the PDB[49] codes 3FYJ,[29]2P3G,[27] and 3KGA,[30] respectively, facilitated the use of structure-based techniques. The structure 3FYJ was preferred to 3FYK(29) because its ligand is larger and provides more clues for docking larger benzothiophene analogues. The inhibitors compete with ATP for binding to MK2. The isoeffective concentrations (IC50) were determined in the assay, monitoring the amount of HSP-peptide (KKKALSRQLSVAA) that was phosphorylated by MK2 after 30 min incubation at pH 7.5 and 30 °C.[27−29] Experimental errors were only given for the pyrrolopyridine series[27] and are listed in Table S3 in Supporting Information.

Tautomerism and Ionization of Ligands

The enzyme inhibitory IC50 values of the studied ligands were determined at pH 7.5 and 30 °C. The relative abundances of all possible tautomers at equilibrium were estimated under the conditions of the experiment. Subsequently, ionization estimates were performed only for tautomer having the fraction of at least 0.01% to determine the fraction of individual species as a function of pH. In both cases, the SPARC web server[36] was used.

Ionization and Tautomerism of Binding Site Residues

The protonation states of ionizable protein residues were estimated by pKa calculations using the PROPKA 2.0 web server[37] at the experimental conditions (pH 7.5 and 30 °C). There was one dominant ionization state of the binding site (Figure 2) predicted, with Glu and Asp side chain carboxyl groups deprotonated and Lys side chain ε-amino groups protonated. No tautomerism was expected for the binding site residues (Figure 2).

Ligand Preparation

Ligand structures were built in Sybyl-X 1.1.[56] Geometry optimization and Mulliken charge[57] calculations were done in Jaguar[58] (Schrodinger) using the DFT/B3LYP method[59] with basis set 6-31G**.[60,61] Charge and spin multiplicity were entered manually for different protonation states.

Protein Preparation

The recently published X-ray structure of MK2 in active conformation bound to a 3-aminopyrazole derivative (PDB[49] code 3KGA(30)) with 2.55 Å resolution is of better quality than other available structures (3FYJ,[29]3FYK,[29]2P3G[27]), as documented by resolution (2.55 Å vs >3.5 Å) and distribution of φ–ψ dihedrals (90% vs <70%) in the most favored regions. The 3KGA structure was modified with Biopolymer structure preparation tool in Sybyl-X:[56] the bound ligand and water molecules were removed, heavy atoms missing from residues were added, and the N-terminal and the C-terminal residues were capped with N-acetyl and N-methyl amide groups. Protonation types were set for His (ε-N protonated), Glu (negatively charged), and Lys (positively charged) residues. Hydrogen atoms were added, and charges were loaded using the AMBER 7 FF02[62] force field. The resulting structure was briefly (100 steps) minimized only for H-atoms by the Powell conjugate gradient method in Sybyl-X.[56]

Flexible Protein–Ligand Docking

Docking of ligands into MK2 binding site was performed using the FlexiDock program in Sybyl-X.[56] FlexiDock deploys a genetic alghorithm-based optimization in torsional space for both the protein binding site (rotatable bonds in side chains) and the ligand. The fitness function uses a subset of the Tripos force field: the van der Waals, electrostatic, torsional, and constraint energy terms. The number of rotatable bonds in the binding site residues (see Figure 2) of the protein was up to 57 and that of ligands ranged from 1 to 8. The number of generations used varied from 35000 to 60000 depending upon the total number of rotatable bonds, satisfying the rule-of-thumb recommendation of 500–1000 generations per each rotatable bond in protein and ligand plus six. The site-point scoring feature was activated in FlexiDock by defining the H-bond interaction partners in the protein and ligands: the backbone NH of Leu141 in the hinge region of MK2 as H-bond donor, and, as acceptor, the first atom, if it was O or N, in substituent R3 of benzothiophenes (Table 1) or the N atom in the pyridyl ring of pyrrolopyridines (Table 2). The available X-ray structures for the data set series, with the PDB[49] codes 3FYJ (compound 33 In Table 1 below)[29] and 2P3G (compound 50 in Table 2 below),[27] were used to aid the prepositioning of benzothiophene and pyrrolopyridine analogues, respectively, in the binding site prior to docking. The protein parts of these structures were aligned with that of the 3KGA structure by homology in Biopolymer module in Sybyl-X. The remaining ligands from both the series were superimposed using the fit atoms method. The best structures were chosen based on the FlexiDock scores and visual inspection. As tautomers with nonplanar lactam ring for most benzothiophenes (Table 1) did not always make catalytic end H-bond interactions, the H-bond distances to catalytic residues Asp207 and Lys93 were not considered mandatory for the pose selection. The difference in the docked poses (conformations) of tautomers/species of individual compounds were reported as the heavy-atom rmsd values (Table S1 in Supporting Information), calculated using Chimera.[63] For all compounds, species 1 served as reference structure except for compound 65, where species 2 was used for this purpose. The same approach was used to calculate the heavy-atom rmsd values for the time-averaged ligand geometries of tautomers/species after MD simulation (steps 3/4), which are summarized in Table S2 in Supporting Information.

QM/MM Geometry Optimization

Docked protein–ligand complexes were briefly (1000 steps) optimized to remove any bad contacts by modified Generalized Born implicit solvent model in Amber 10. Optimization started with steepest descent method to quickly remove the largest strains and was switched after 100 cycles to the conjugate gradient method with root-mean-square of 1 cal/(mol × Å) as the convergence criterion for fine optimization close to the energy minimum. The nonbonded cutoff was set to 16.0 Å. Input files for ONIOM calculations[51,52] in Gaussian 09[53] were prepared in GaussView 5.0.[64] Ligands and selected binding site components (entire residues of Thr206 and Asp207, backbone atoms of Cys140, Leu141, and Asp142) were included in the QM region, and the rest of the system was defined as the MM region. The partitioning of the system in ONIOM was based on published recommendations:[65] the QM/MM cuts were always made between the C–C bonds and at least three bonds away from any potential interacting atoms. In the MM region, the remaining binding site residues (Figure 2) were allowed to move during geometry optimization and the rest of the system was frozen. Charges were manually entered for both QM (along with spin multiplicity) and MM regions. The defined QM and MM regions were treated with DFT/B3LYP functional[59] using 6-31G(d,p) basis set[60,61] and Amber ff99SB force field,[66] respectively. Electronic embedding option was activated during calculations.

MD Simulation

Molecular dynamics simulations were performed using Amber 10 package[67] under isothermal/isobaric (NPT) conditions with Amber ff99SB[66] and general Amber (GAFF)[68] force fields for protein and ligand molecules, respectively.

Structure Preparation

The Leap program from Antechamber tools (AmberTools 1.4)[54] was used to prepare the parameter/topology (prmtop) and input coordinate (inpcrd) files using the Mulliken charges[57] from the QM/MM optimization, which were kept for protein–ligand complexes during MD simulations. For each system, protein–ligand complexes and unbound ligands were simulated separately. The unbound protein was simulated only once. The net charge of the protein–ligand complexes varied from +3 to +6 depending upon the overall charge of the ligand and was neutralized by adding Cl– ions at positions of high positive electron potential around the complexes. The system was immersed in a truncated octahedral box of pre-equilibrated TIP3P[69] water molecules[55] in the way that no atoms in the protein–ligand complexes were closer than 8 Å to any of the sides of the water box. The counterions and solvent molecules were briefly minimized for 1000 steps to remove any bad contacts with the complexes, whereby the protein and ligands were position-restrained using force constant of 500 kcal/(mol × Å2).

Heating, Equilibration, and Production Phases

To allow the readjustment of solvent molecules to the potential field of the ligand–receptor complex, the solvent equilibration step was performed in three stages. During the first heating phase, MD simulation was carried out for 10 ps, with constant volume periodic boundaries with an initial temperature of 0 K, allowing to heat up to 100 K. The second heating phase was performed for 20 ps under constant pressure periodic boundaries with an average pressure of 1 atm. Isotropic position scaling was used to maintain the pressure, and a relaxation time of 2 ps was used. The system was allowed to heat up from the initial temperature 100 K to the final temperature 300 K. The final solvent equilibration step was performed for 50 ps under constant pressure periodic boundary conditions as mentioned above. The Langevin dynamics was used in all stages to control the temperature using a collision frequency of 1.0 ps–1. During these three stages, the protein–ligand complexes were position-restrained with a force constant of 10 kcal/(mol × Å2). The final production phase with the entire system was carried out under isothermal/isobaric conditions for 1 ns including 100 ps equilibration phase. The SHAKE algorithm[70] was used to constrain bonds involving hydrogen, allowing time step of 2 fs, for a total of 500000 steps. The trajectory file was written for every 100 steps, resulting in 5000 frames. The cutoff was set to 10 Å in all steps.

Tracking H-Bond and Water-Mediated Interactions.

MD trajectory analysis for H-bond and water-mediated interactions for 233 complexes from the production phase was performed using the ptraj program in AmberTools 1.4.[54] The default cutoff values for distance (3.5 Å) and angle (no value) were chosen. The following pairwise interactions were monitored: (1) the backbone NH of Leu141 in the hinge region of MK2 as H-bond donor, and, as H-bond acceptor, the first atom, if it was O or N, in substituent R3 of benzothiophenes (Table 1) or the N atom in the pyridyl ring of pyrrolopyridines (Table 2), (2) Asp207 carboxyl oxygen atoms and the NH group of the six- or seven-membered lactam ring of ligands, and (3) the O atom of the lactam ring and the amino group of the Lys93 side chain. To keep track of important water-mediated interactions, “solvent” keywords solventneighbor (value = 6), solventdonor (WAT O), and solventacceptor (WAT O H1 and WAT O H2) were specified. The solventneighbor value specifies the maximum number of possible interactions per a given donor or acceptor.

Time-Averaged Structures

Time-averaged structures were calculated from 4000 trajectory frames (final 900 ps) of the production phase of the MD simulation using the ptraj program in AmberTools 1.4.[54] The hundred water molecules closest to the ligands were kept for all complexes, and the remaining waters and ions were stripped. The structures were briefly minimized to relieve conflicts, resulting in structures with close-to-standard bond length and angles and the dihedrals representing the ensemble.

Single Point QM/MM Energy Calculations

The time-averaged structures from MD simulations were used for single point QM/MM energy calculation in ONIOM[51,52] using DFT/B3LYP functional[59] method with 6-31G(d,p) basis set[60,61] for QM region and Amber ff99SB[66] force field for MM region, respectively. Electronic embedding option was activated during calculations. The QM region consisted of ligands, binding site residues (Figure 2), and a water molecule for complexes, where water-mediated interactions were observed. The binding energy term ⟨ΔEQM/MM⟩ term was calculated as the difference between the QM/MM energies of the complex and those of the unbound interaction partners.

SASA Term Calculation

The solvent accessible surface area (SASA) terms for time-averaged structures of protein–ligand complexes and unbound interaction partners were calculated using the program Naccess,[71] which implements the Lee and Richards algorithm.[72] The default values (probe radius 1.4 Å; z-slices 0.05 Å; van der Waals radii) were used in the calculations. The burial of solvent accessible surface area upon protein–ligand binding was calculated as the difference between the SASAs of the complex and those of unbound interaction partners.

Regression and Cross-Validation

The coefficient values in the model (eq 5) were optimized by nonlinear regression analysis in the Premium Solver Platform.[55] The stability and predictive power of the model was tested by the Monte Carlo leave-group-out (MC-LGO) cross-validation. In this test, compounds were randomly divided into seven groups of about equal size and the model calibration process involved only compounds from six groups in any given cycle leaving one of the groups outside as the test set. The resulting model was used to predict the activities for the omitted test set compounds. This process was repeated 10 times, reshuffling the compounds in each group, resulting in 70 models in total. The leave-one-out (LOO) cross-validation was also performed, omitting and predicting one compound at a time.

50 in total

1. The Protein Data Bank.

Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. Development and testing of a general amber force field.

Authors: Junmei Wang; Romain M Wolf; James W Caldwell; Peter A Kollman; David A Case
Journal: J Comput Chem Date: 2004-07-15 Impact factor: 3.376

Review 3. Mechanisms and free energies of enzymatic reactions.

Authors: Jiali Gao; Shuhua Ma; Dan T Major; Kwangho Nam; Jingzhi Pu; Donald G Truhlar
Journal: Chem Rev Date: 2006-08 Impact factor: 60.622

Review 4. MAPKAP kinases - MKs - two's company, three's a crowd.

Authors: Matthias Gaestel
Journal: Nat Rev Mol Cell Biol Date: 2006-02 Impact factor: 94.444

5. Comparison of multiple Amber force fields and development of improved protein backbone parameters.

Authors: Viktor Hornak; Robert Abel; Asim Okur; Bentley Strockbine; Adrian Roitberg; Carlos Simmerling
Journal: Proteins Date: 2006-11-15

Review 6. MK2: a novel molecular target for anti-inflammatory therapy.

Authors: Senthil Duraisamy; Malini Bajpai; Usha Bughani; Sunanda G Dastidar; Abhijit Ray; Puneet Chopra
Journal: Expert Opin Ther Targets Date: 2008-08 Impact factor: 6.902

7. Estimation of binding free energies for HIV proteinase inhibitors by molecular dynamics simulations.

Authors: T Hansson; J Aqvist
Journal: Protein Eng Date: 1995-11

8. Binding affinities for sulfonamide inhibitors with human thrombin using Monte Carlo simulations with a linear response method.

Authors: D K Jones-Hertzog; W L Jorgensen
Journal: J Med Chem Date: 1997-05-09 Impact factor: 7.446

9. A new method for predicting binding affinity in computer-aided drug design.

Authors: J Aqvist; C Medina; J E Samuelsson
Journal: Protein Eng Date: 1994-03

10. Novel 3-aminopyrazole inhibitors of MK-2 discovered by scaffold hopping strategy.

Authors: Juraj Velcicky; Roland Feifel; Stuart Hawtin; Richard Heng; Christine Huppertz; Guido Koch; Markus Kroemer; Henrik Moebitz; Laszlo Revesz; Clemens Scheufler; Achim Schlapbach
Journal: Bioorg Med Chem Lett Date: 2009-11-03 Impact factor: 2.823

4 in total

Review 1. Rigorous incorporation of tautomers, ionization species, and different binding modes into ligand-based and receptor-based 3D-QSAR methods.

Authors: Senthil Natesan; Stefan Balaz
Journal: Curr Pharm Des Date: 2013 Impact factor: 3.116

Review 2. Targeting Mitogen-Activated Protein Kinase-Activated Protein Kinase 2 (MAPKAPK2, MK2): Medicinal Chemistry Efforts To Lead Small Molecule Inhibitors to Clinical Trials.

Authors: Mario Fiore; Stefano Forli; Fabrizio Manetti
Journal: J Med Chem Date: 2015-11-09 Impact factor: 7.446

3. Thermochemical and Quantum Descriptor Calculations for Gaining Insight into Ricin Toxin A (RTA) Inhibitors.

Authors: Acassio Rocha-Santos; Elton José Ferreira Chaves; Igor Barden Grillo; Amanara Souza de Freitas; Demétrius Antônio Machado Araújo; Gerd Bruno Rocha
Journal: ACS Omega Date: 2021-03-23

Review 4. Pharmacophore selection and redesign of non-nucleotide inhibitors of anthrax edema factor.

Authors: Catherine H Schein; Deliang Chen; Lili Ma; John J Kanalas; Jian Gao; Maria Estrella Jimenez; Laurie E Sower; Mary A Walter; Scott R Gilbertson; Johnny W Peterson
Journal: Toxins (Basel) Date: 2012-11-08 Impact factor: 4.546

4 in total