Literature DB >> 22468611

Cellular quantitative structure-activity relationship (Cell-QSAR): conceptual dissection of receptor binding and intracellular disposition in antifilarial activities of Selwood antimycins.

Senthil Natesan¹, Tiansheng Wang, Viera Lukacova, Vladimir Bartus, Akash Khandelwal, Rajesh Subramaniam, Stefan Balaz.

Abstract

We present the cellular quantitative structure-activity relationship (cell-QSAR) concept that adapts ligand-based and receptor-based 3D-QSAR methods for use with cell-level activities. The unknown intracellular drug disposition is accounted for by the disposition function (DF), a model-based, nonlinear function of a drug's lipophilicity, acidity, and other properties. We conceptually combined the DF with our multispecies, multimode version of the frequently used ligand-based comparative molecular field analysis (CoMFA) method, forming a single correlation function for fitting the cell-level activities. The resulting cell-QSAR model was applied to the Selwood data on filaricidal activities of antimycin analogues. Their molecules are flexible, ionize under physiologic conditions, form different intramolecular H-bonds for neutral and ionized species, and cross several membranes to reach unknown receptors. The calibrated cell-QSAR model is significantly more predictive than other models lacking the disposition part and provides valuable structure optimization clues by factorizing the cell-level activity of each compound into the contributions of the receptor binding and disposition.

Entities: CellLine Chemical Disease Gene Species

Mesh：

Substances：

Year: 2012 PMID： 22468611 PMCID： PMC3338160 DOI： 10.1021/jm201371y

Source DB: PubMed Journal: J Med Chem ISSN： 0022-2623 Impact factor: 7.446

Introduction

Conceptual methods for the correlation of binding affinities with drug structure (3D-QSARs) are vital to the drug development process. The approaches include receptor-based methods,[1] e.g., free energy perturbation,[2] the linear response method,[3−5] and the mining minima approach,[6−9] and ligand-based methods for unknown receptors, e.g., comparative molecular field analysis (CoMFA).[10] The methods have been developed and perform satisfactorily for the binding data measured with isolated macromolecules. Unfortunately, binding affinities for isolated macromolecules are often unavailable because the receptors have not yet been identified, cannot be isolated without denaturation or dissociation, or do not work in aqueous solutions. In such situations, more complex assays are used, deploying receptors reconstituted in lipid vesicles as the simplest system or utilizing intact cells, tissues, organs, or organisms. The effective concentrations of the drugs in the receptor surroundings vary among the studied compounds because of interactions with nonreceptor assay constituents. Drug disposition has often been neglected in modeling the bioactivities measured in complex systems. At best, the 1-octanol/water partition coefficients and their squares,[11] and sometimes the dissociation constants,[12] as properties affecting the disposition, have been included in simplified linear forms. In many cell-level studies, 3D-QSAR methods have been applied without any correction for ligand disposition or empirical models with various descriptors have been deployed. Here, we propose a straightforward solution to the problem of QSAR modeling of cell-level data: extend the proven 3D-QSAR methods by accounting for the varying ligand disposition in the receptor surroundings using the disposition function (DF). The conceptual cell-QSAR approach is based on a new correlation equation combining the 3D-QSAR expression and the DF, whereby the coefficients in both parts can be optimized either simultaneously or separately if the uptake data are available. The exact form of the correlation equation depends on the kinetics of the processes underlying the drug effects and on the complexity of studied compounds (common skeleton, similarity of substituents, ability to ionize and form tautomers), as discussed in the Methods. The DF describes the kinetics of drug disposition and relates the dose or the initial concentration to the concentration in the receptor surroundings (see eq 2 in the Methods). The first DF forms were peak-shaped dependencies, on logarithmic scales, of the drug concentration inside the receptor compartment, which was separated from the dosing compartment by a few bilayers, on lipophilicity (the reference partition coefficient).[13−16] These dependencies represented the basis of the first QSAR techniques, such as the Hansch approach[13] and related methods. Subsequently, other parameters, including the ionization constant,[17] the exposure time, the metabolic rate parameters,[18] the ratio of the partition coefficients in alkane/water and 1-octanol/water systems,[19−22] and the polar surface area[23,24] as descriptors of H-bonding ability, the partition coefficient between phosphatidylcholine headgroups and hexadecane as a predictor of bilayer localization and partitioning,[25] the membrane-interaction QSAR parameters,[26,27]the polarizability,[28]the cross-sectional area of the molecule,[29] and numerous others, were added. The DF functional forms developed from the parabolic,[13] bilinear,[15] nonequilibrium,[30] equilibrium,[17] and mixed models to the pseudoequilibrium DF.[31,32] Some of these DFs predicted reliably outside the used property ranges, in contrast to empirical models.[33] The DF can also be calibrated independently using the data on cellular uptake and disposition, which are obtained by various experimental approaches. High-content screening techniques provide more detailed information than classical uptake experiments,[34] as demonstrated for the disposition of a small library of lipophilic, fluorescent compounds in living HeLa cells.[35] Structure-dependent phenomena, such as active influx or efflux and metabolism, could be captured in this process. The additional effort to obtain and fit the disposition data can be spared in situations when the disposition is not complicated by structure-dependent phenomena. Then the DF depends on drug properties characterizing the bilayer transport and accumulation, nonspecific protein binding, hydrolysis, and other nonenzymatic reactions. An appropriate DF can be selected from the armoire of available models and can be calibrated simultaneously with the 3D-QSAR model using the cell-level data. This approach is demonstrated here using the Selwood data[36] on filaricidal activities of analogues of antimycin A, which inhibits mitochondrial electron transport and oxidative phosphorylation[37] and induces apoptosis.[38] These data are regarded as difficult to model and are frequently used to validate QSAR methods on the basis of variable selection. The descriptive ability of the generated models did not exceed 60–80% of the explained variance,[36,39−47] so there is space for improvement. To the best of our knowledge, this cell-QSAR study represents the first attempt at the QSAR modeling of drug effects in complex systems, where both drug–receptor interactions and drug disposition significantly vary among studied compounds and both are conceptually treated using a common correlation equation.

Results and Discussion

Studied Antimycin A Analogues

The studied compounds, listed in Table 1, have varying substituents, R1, on the phenol ring and large lipophilic substructures, R2, which replace a complex dilactone moiety of the parent compound. The compounds can be formally divided into three groups: two-ring analogues (23 and 31), three-ring analogues (2–16 and 26–30), and alkyl-chain analogues (1, 17–22, 24, and 25). The salicylamide moiety, present in all compounds except 31, exhibits a peculiar intramolecular H-bonding.[48,49] Two types of intramolecular H-bonds are formed, depending upon ionization of the hydroxyl (Figure 1).

Table 1

Structures and Properties of Antimycin Analogues

compd no.	R₁	R₂	log Pa	pK_ab	log Kc	log DFc	log(1/EC₅₀)d
1	3-NHCHO	NHC₁₄H₂₉	7.491	7.88	1.181	4.150	5.155
2e	3-NHCHO	NHC₆H₃-3-Cl-4-(OC₆H₄-4-Cl)	5.955	7.21	2.307	3.438	5.620
3	5-NO₂	NHC₆H₃-3-Cl-4-(OC₆H₄-4-Cl)	6.944	5.15	1.710	5.476	7.398
4	5-SCH₃	NHC₆H₃-3-Cl-4-(OC₆H₄-4-Cl)	7.402	8.00	2.290	4.086	6.319
5	5-SOCH₃	NHC₆H₃-3-Cl-4-(OC₆H₄-4-Cl)	5.722	6.80	1.843	3.535	5.125
6	3-NO₂	NHC₆H₃-3-Cl-4-(OC₆H₄-4-Cl)	6.944	4.60	1.668	5.548	6.824
7	5-CN	NHC₆H₃-3-Cl-4-(OC₆H₄-4-Cl)	6.698	5.66	2.534	5.225	7.839
8	5-NO₂	NHC₆H₄-4-(OC₆H₄-4-CF₃)	6.541	5.15	1.483	5.397	7.022
9	3-SCH₃	NHC₆H₃-3-Cl-4-(OC₆H₄-4-Cl)	7.402	7.47	1.931	4.292	6.420
10	5-SO₂CH₃	NHC₆H₃-3-Cl-4-(OC₆H₄-4-Cl)	5.731	6.11	1.681	4.166	6.000
11e	5-NO₂	NHC₆H₄-4-(OC₆H₅)	5.658	5.31	1.450	4.818	6.097
12	5-NO₂	NHC₆H₃-3-Cl-4-(COC₆H₄-4-Cl)	6.264	5.37	1.803	5.184	7.131
13	5-NO₂	NHC₆H₄-4-(OC₆H₃-2-Cl-4-NO₂)	5.884	5.20	1.936	5.067	6.921
14	5-NO₂	NHC₆H₃-3-Cl-4-(OC₆H₄-4-OCH₃)	6.150	5.11	1.839	5.271	6.770
15	3-SO₂CH₃	NHC₆H₃-3-Cl-4-(OC₆H₄-4-Cl)	5.731	5.10	1.433	5.034	6.301
16	5-NO₂	NHC₆H₃-3-Cl-4-(SC₆H₄-4-Cl)	7.476	5.25	1.554	5.504	7.357
17e	3-NHCHO	NHC₆H₁₃	3.259	7.88	4.018	0.430	<5.000
18	3-NHCHO	NHC₈H₁₇	4.317	7.88	4.091	1.488	5.585
19f	3-NHCOCH₃	NHC₁₄H₂₉	7.468	7.88	1.181	4.143	5.097
20e	5-NO₂	NHC₁₄H₂₉	8.648	5.83	1.016	5.406	6.893
21	3-NO₂	NHC₁₄H₂₉	8.648	5.28	1.293	5.522	6.818
22	3-NO₂-5-Cl	NHC₁₄H₂₉	9.419	4.68	1.684	5.564	7.362
23	5-NO₂	NHC₆H₄-4-C(CH₃)₃	5.386	5.55	1.659	4.369	6.229
24e	5-NO₂	NHC₁₂H₂₅	7.590	5.83	1.647	5.361	7.409
25	3-NO₂	NHC₁₆H₃₃	9.706	5.28	0.555	5.524	5.959
26	5-NO₂	NHC₆H₃-3-Cl-4-(NH-C₆H₄-4-Cl)	6.900	5.12	0.968	5.477	6.432
27	5-NO₂	NHC₆H₄-4-(OC₆H₄-3-CF₃)	6.541	5.16	1.675	5.394	7.027
28	5-NO₂	NHC₆H₃-3-Cl-4-(NHC₆H₄-4-SCF₃)	7.899	5.04	1.711	5.541	7.553
29	5-NO₂	NH-3-Cl-4-(3-CF₃C₆H₄O)C₆H₃	7.114	5.06	1.872	5.510	7.071
30e	5-NO₂	NHC₆H₄-4-(CH(OH)C₆H₅)	3.870	5.42	1.197	3.015	<5.000
31f	5-NO₂	C₆H₄-4-Cl	4.329	6.73	0.777	2.221	6.481

ClogP-predicted 1-octanol/water partition coefficient P.

SPARC-predicted acidity of the phenolic hydroxyl.

Affinities and disposition function values predicted from the DF-MSMM CoMFA model.

Measured antifilarial activities (molar EC50).[36,52]

Test set compounds.

Compounds excluded before analysis because of singularity.

Figure 1

Conformational switching of an intramolecular H-bond in the salicylamide ring of antimycin derivatives upon ionization.

ClogP-predicted 1-octanol/water partition coefficient P. SPARC-predicted acidity of the phenolic hydroxyl. Affinities and disposition function values predicted from the DF-MSMM CoMFA model. Measured antifilarial activities (molar EC50).[36,52] Test set compounds. Compounds excluded before analysis because of singularity. Conformational switching of an intramolecular H-bond in the salicylamide ring of antimycin derivatives upon ionization. The hydroxyl pKa values (Table 1) of the studied antimycins were estimated by the SPARC method,[50] which seems to take the intramolecular H-bond into account. The magnitude of the pKa values (range 4.6–8.0) is determined mainly by the electron-withdrawing ability of the R1 substituents on the phenol ring, with the R2 substituents exhibiting a slighter influence. Under neutral conditions, the compounds are present in the aqueous solution as a mixture of ionized and nonionized species, with the balance shifted toward the ionized species for most compounds, except compounds 1, 2, 4, 9, and 17–19 having pKa > 7. Formation of tautomers and carbonyl hydrates was negligible, according to the SPARC estimates; otherwise, they would be included in the model in the same way as ionization. Table 1 also contains the activities spanning almost 3 orders of magnitude and the 1-octanol/water partition coefficients P (ClogP estimates[51] for neutral molecules). Antimycin A is a known inhibitor of oxidative phosphorylation.[37] Therefore, we examined whether the Selwood data could be explained by our two-decade-old, conceptual QSAR model for uncouplers.[34] The model describes isoeffective concentrations eliciting a given degree of uncoupling as a nonlinear function containing lipophilicity (P) and acidity (pKa). Although the surface generated by the fit to the data was reminiscent of the expected shape,[34] the deviations of individual compounds were too large to consider the fit satisfactory. We hypothesized that the deviations were caused by structure-specific interactions with an unknown receptor; therefore, we decided to deploy a ligand-based 3D-QSAR method.

Setup of the Cell-QSAR Model

The cell-QSAR correlation equation for the simplest drug effect scenario consists of the product of the drug–receptor association constant K and the DF (eq 3 in the Methods). The receptor is unknown in this case, so K was expressed using one of the popular ligand-based methods, CoMFA.[10] The antimycin analogues ionize to different degrees and are flexible; therefore, we applied our multispecies (MS), multimode (MM) modification of CoMFA.[53] The pseudoequilibrium form of the DF (eq 10 in the Methods),[32] treating absorption as an instantaneous process, was chosen, because no lag phase was observed in the time course of toxicity.[36,52] An additional reason for selecting this DF form was the ability of the underlying model[32] to handle speciation of the ligands.

Ligand Superposition

Pharmacophores obtained for all species using the Galahad procedure,[54,55] which performs flexible superposition based on the active analogue approach,[56] indicated that the salicylamide moieties are closely superimposed in all cases. This outcome is consistent with the apparent importance of this moiety for inhibition: an analogue capable of H-bond formation (Figure 1) inhibited the cytochrome bc1 complex activity in submitochondrial particles much stronger than a similar analogue lacking this ability.[57] A rigid compound that preferably exhibits a high activity would be very useful for guiding the superposition of the ligands. Unfortunately, there is no truly rigid analogue available in the studied series. The three-ring compounds contain more rigid aromatic rings than other compounds, so we selected the most active compound in this group and in the entire series (7) as the template. Because the pharmacophore-generating procedure did not provide clues about the conformation of the lipophilic fragments connected to the salicylamide ring, we used the low-energy conformations of compound 7 as the assumed bound conformations. This step is associated with the risk that the overall spatial organization of the binding site model may not completely correspond to that of the actual receptor. While the binding points will be positioned appropriately around individual rigid rings of the template, the relative positions of the individual groups of points surrounding the rigid fragments may differ from those of the actual receptor because of possible fragment rotations. This drawback would not affect the predictive ability of the model for compounds which can adopt the template conformation without significant conformational strain. The conformational space of compound 7 is characterized by four torsions (Figure 2). Conformational analysis identified the minimum-energy conformations for the H-bonded forms of both neutral and ionized molecules (Figure 1). The common minimum conformations of ionized and neutral species without the intramolecular H-bonds were also looked up, for the possibility that the hydroxyl and the amide groups would interact with the receptor instead of forming the intramolecular bond. Two skeleton conformations appear for each of the two species: one closed via the intramolecular H-bond and one open without the H-bond. The conformations are summarized in Table 2.

Figure 2

Table 2

Superposition Templates Based on the Most Active Compound 7 (Table 1)

				torsionc (deg)
template no.	speciesa	conformationb	intramolecular H-bond	Φ₁	Φ₂	Φ₃	Φ₄
1	neutral	closed	OH···O=C	4.1	2.8	93.8	178.5
2	neutral	open	none	137.7	–15.7	208.3	225.4
3	ionized	closed	NH···O^–	180.6	0.8	98.4	174.7
4	ionized	open	none	137.7	–15.7	208.3	225.4

See Figure 1.

“Closed” and “open” refer to the intramolecular H-bond being present and absent, respectively.

See Figure 2.

Four torsions of compound 7 (Table 1) characterizing the conformations of three-ring analogues displayed in red color. They are Φ1 (C3–C4–C11–O12), Φ2 (C11–N13–C15–C20), Φ3 (C17–C18–O22–C23), and Φ4 (C18–O22–C23–C28). See Figure 1. “Closed” and “open” refer to the intramolecular H-bond being present and absent, respectively. See Figure 2. It has long been recognized that flipping of symmetrical rings with asymmetrically placed substituents results in similar but distinct bound geometries. If the receptor structure is unknown, preferred geometries cannot be selected a priori and all geometries should be considered in model building.[58−64] A total of 16 of 31 studied compounds (Table 1) have substituents in meta-positions on either the middle ring (2–7, 9, 10, 12, 14–16, 26, and 28) or the third ring (13 and 27). For these compounds, rotation of the asymmetrically substituted ring by 180° led to doubling of the existing two modes. Compound 29 with two asymmetrically substituted rings is represented by eight modes. The process of creating individual superpositions is illustrated in Figure 3. The four basic templates of compound 7 were superimposed by the atom fit method in Sybyl (Figure 3b) to ensure the best overlap of the salicylamide moieties, as suggested by the generated pharmacophore. All other molecules were superimposed on the templates using the FlexS procedure[65] (Figure 3c,d).

Figure 3

Superposition procedure: (a) four basic templates of compound 7, generated by conformational analysis and numbered as in Table 2; (b) superposition of the four basic templates; (c) superpositions of all compounds (Table 1) to the four basic templates in respective species and conformations, numbered as in row a; (d) final superpositions, enriched by flipping the benzene rings by 180° for the MSMM situation (I; 200 molecules = 31 compounds × 2 species × 2 skeleton conformations × 1, 2, or 4 ring-flipping conformations), the MM situation (II; 100 molecules created as in situation I but with 1 species), and the standard one-mode situation (III; 31 molecules).

DF-MSMM CoMFA Analysis

The analysis is based on the correlation equation obtained by taking the logarithm of eq 3 (X = 0.5 because the EC50 values are modeled), with the association constant K expressed by eq 9, and the DF given by eq 10 (the exposure time t = constant) combined with eqs 11 and 12. For both the receptor-binding and disposition parts, ionization is included and the coefficients are optimized simultaneously. Optimization examines two forms of the DF, for the aqueous phases (shown in eq 10) and membranes. If the receptors are localized in a membrane and the compounds bind to them from the hydrocarbon core of the bilayer, eq 10 needs to be multiplied by Pβ (P is the reference partition coefficient and β is the Collander coefficient, which is one of the optimized regression coefficients). The water and membrane variants of eq 10 are initially used in fully expanded form: each of the coefficients A–D is replaced by eqs 11 and 12. The optimization proceeds with gradual forward selection of energies in relevant grid points in the CoMFA part, which is done separately for the two forms of the DF. The better DF is selected on the basis of the statistical criteria and further optimized by omitting the parameters with higher standard deviations, if the elimination does not cause a significant decrease in statistical criteria. One, or sometimes both, of the coefficients C and D, describing lipophilicity-dependent and lipophilicity-independent elimination, can frequently be omitted. In each pair of coefficients A–D (eq 11), the term for one species may or may not be significant. The combinatorial search for the best DF-MSMM CoMFA model was extensive for three reasons: many possible combinations of relevant grid points need to be examined, the DF can have multiple functional forms, and many sets of the initial estimates of coefficients need to be tested to avoid trapping in local minima. The best model was selected on the basis of the r2 and q2 (for the leave-one-out cross-validation in the training set, used to eliminate unstable models) values and the number of optimized coefficients and and their standard deviations. For illustration, there were 65 021 models with r2 ≥ 0.6, 11 142 models with r2 ≥ 0.8 and q2 ≥ 0.6, and 3752 models with r2 ≥ 0.9 and q2 ≥ 0.7, of which 25 models had less than 11 coefficients, each with smaller than 60% standard deviation. The best DF-MSMM CoMFA model contained the DF for the compounds interacting with the receptors in a membrane, with metabolism that was either insignificant or invariant for the studied compounds (C = 0 and D = 0), and five required CoMFA grid points. The final, pruned correlation equation, resulting from eq 3 (X = 0.5), with the association constant K expressed by eq 9 (the subscript i was omitted for fractions f), and the DF given by eqs 10–12, wasThe first term is an MSMM CoMFA expression for the receptor binding (log K), and the remaining terms describe disposition (log DF). The fractions of ionized and nonionized molecules, f1 and f2, respectively, include the pKa values and the optimized pH value as shown in eq 12 and are identical in both receptor-binding and disposition terms. The microscopic association constant K for each of the m considered modes is represented by a k-summand. Each of the k-summands contains five identical adjustable coefficients, C1–C5, which, for the selected grid points (coordinates in Table 3), multiply the energies X for the given modes. The addition of multiple modes does not increase the number of optimized coefficients; it only changes the form of the correlation equation by increasing the number of k-summands.

Table 3

Characterization of the Receptor-Binding Part of the DF-MSMM CoMFA Model (Eq 1)

		coordinates (Å)b
coefficient	valuea	x	y	z	energyc
C₁	0.538 ± 0.247	12.66	6.83	4.780	S
C₂	–1.931 ± 0.711	6.668	10.836	–5.218	S
C₃	–0.245 ± 0.108	2.668	6.836	–3.218	E
C₄	–6.657 ± 2.873	4.668	16.863	–5.218	S
C₅	0.116 ± 0.043	–3.330	14.836	–7.2184	E

Regression coefficients.

Grid point coordinates.

Energy types: S, steric, E, electrostatic.

Regression coefficients. Grid point coordinates. Energy types: S, steric, E, electrostatic. Nonlinear regression analysis resulting in eq 1 provided the following outcome for the best DF-MSMM model. For the receptor-binding expression, the results are summarized in Table 3. The optimal values of the disposition-describing coefficients for the DF were A1 = (7.142 ± 2.213) × 10–5, A2 = (2.636 ± 1.254) × 10–6, B1 = (9.989 ± 4.233) × 102, and pH 7.56 ± 0.32. The optimization of the Collander coefficient β and the coefficient C0 did not lead to a sufficient improvement, so their values were fixed as β = 1 and C0 = 0. The description statistics, calculated for the training set, consists of the sum of squares of errors (SSE), where the error is the difference between calculated and experimental values, and the squared correlation coefficient (the coefficient of determination, equal to the percentage of explained variance) r2 = 1 – SSE/SYY, where SYY is the sum of squares of deviations of the experimental log K values from their average. The prediction statistics contains similar indices calculated for the test set that was not used in the development of the model: predictive sum of squares of deviations between predicted and experimental values (PRESS) and the squared correlation coefficient q2 = 1 – PRESS/SYY. Table 4 summarizes the descriptive (SSE and r2 for the training set) and predictive (q2 and PRESS for the test set) statistical criteria for the DF-MSMM CoMFA model (1) and for the reduced models (2–13). For comparison, the random scrambling of bioactivities (50 runs) produced no significant correlations, with the average values of SSE = 11.12 and r2 = 0.250. Optimized eq 1 also provides the affinity (log K) and disposition (log DF) components of bioactivity (Table 1). A test set (Table 1, compounds 2, 11, 17, 20, 24, and 30) covering the entire range of biological activities was selected before optimization. The use of multiple test sets, which would be a preferred way to examine the predictive abilities of the model, was precluded by the extent of the optimization procedure. For compounds 17 and 30 with semiquantitative activities, the q2 and PRESS values (Table 4) were calculated using the upper limits, i.e., log(1/EC50) = 5. Because only the DF-MSMS CoMFA model correctly predicted log(1/EC50) < 5 (Figure 4b), the predictive abilities are probably underestimated for the DF-MSMS CoMFA model and overestimated for all other models.

Table 4

Statistical Indices Characterizing CoMFA Models of Varying Complexitya

				statistical indices
CoMFA model				descriptive		predictive
no.	species	mode	no. of coefficients	SSE	r²	PRESS	q²
1	DF-MS	MM	5 + 5	0.902	0.923	1.356	0.815
2	MS	MM	11	2.336	0.801	3.161	0.569
3	SS: majorb	MM	11	2.476	0.789	2.995	0.592
4	SS: neutral	SM: closed	5 (3568)c	0.154	0.987	6.410	0.126
5	SS: neutral	SM: open	5 (3563)	0.361	0.969	6.070	0.173
6	SS: neutral	MM: averaged	5 (564)	0.400	0.966	6.498	0.114
7	SS: ionized	SM: closed	5 (3569)	0.155	0.987	6.686	0.089
8	SS: ionized	SM: open	5 (3584)	0.196	0.983	5.411	0.262
9	SS: ionized	MM: averaged	5 (572)	0.580	0.951	6.228	0.151
10	SS: majorb	SM: closed	5 (3570)	0.154	0.987	6.785	0.075
11	SS: majorb	SM: open	5 (3583)	0.325	0.972	4.818	0.343
12	SS: majorb	MM: averaged	5 (534)	0.064	0.995	5.897	0.139
13	SS: bounde	SM: bounde	5 (3581)	0.403	0.966	4.455	0.393

The models differ in the use of disposition function (DF), multiple or single species (MS or SS), and multiple or single modes (MM or SM).

Prevailing species in aqueous solution used for each compound.

Number of latent variables shown, with the number of all energies given in parentheses.

The energies of all considered modes are averaged and analyzed by standard, one-mode CoMFA.

Prevalent bound species and mode as determined by the DF-MSMM CoMFA model used for each compound.

Figure 4

Calculated and predicted activities plotted against experimental data for the training set (a) and test set (b), generated with the DF-MSMM (black), MSMM (blue), SSMM (red), and standard (green) CoMFA models (models 1–3 and 11 in Table 4, respectively). In plot b, the two experimental values of log(1/EC50) = 5 are actually <5 (Table 1, compounds 17 and 30), as indicated by the arrows.

The models differ in the use of disposition function (DF), multiple or single species (MS or SS), and multiple or single modes (MM or SM). Prevailing species in aqueous solution used for each compound. Number of latent variables shown, with the number of all energies given in parentheses. The energies of all considered modes are averaged and analyzed by standard, one-mode CoMFA. Prevalent bound species and mode as determined by the DF-MSMM CoMFA model used for each compound. Calculated and predicted activities plotted against experimental data for the training set (a) and test set (b), generated with the DF-MSMM (black), MSMM (blue), SSMM (red), and standard (green) CoMFA models (models 1–3 and 11 in Table 4, respectively). In plot b, the two experimental values of log(1/EC50) = 5 are actually <5 (Table 1, compounds 17 and 30), as indicated by the arrows. Table 4 demonstrates that all features used in the DF-MSMM CoMFA model are necessary for a good prediction. Omission of the disposition leads to a decline in predictive abilities (PRESS and q2), even if multiple species and/or modes are considered (models 2 and 3). The decline is more severe for standard, one-mode CoMFA models (4–13), including those with the averaged multiple modes (models 6, 9, and 12) and even for the prevalent bound species and modes (model 13), which were selected by DF-MSMM CoMFA model 1. This fact indicates that the one-mode QSAR solution may not exist. Although only this indirect, model-based indication is available for the studied case, multiple modes were experimentally proven to occur in many ligand–macromolecule complexes (see the references in ref (57)). Interestingly, the descriptive abilities (SSE and r2) were great in all cases. This may, in part, explain the use of 3D-QSAR methods without any description of disposition for cell-level activities. It is worth mentioning that an exhaustive examination of modes and species for each compound in a standard CoMFA analysis, seeking a one-mode solution, is technically not feasible. The 23 compounds of the training set, each present in two species and two conformations (for simplicity, the ring-flipping conformations are omitted), would require 4[23] analyses or about 2.2 million CPU years, if a single one-mode analysis only lasts 1 s. Therefore, the MSMM approach, performing an extensive, albeit not exhaustive search lasting several weeks on a few processors, compares favorably. The calculated bioactivities for the training set and the predictions for the test set compounds are compared with experimental values in parts a and b, respectively, of Figure 4. The use of multiple species and modes and, especially, of the DF clearly improves predictivity (PRESS and q2). Descriptive statistical indices for calibration display a similar trend, except for the excellent SSE and r2 values for the standard, one-mode CoMFA model, utilizing the full set of energies. The standard model, however, did not confirm its calibration quality in the prediction exercise (Figure 4b).

3D-QSAR Contour Map

The steric and electrostatic contours for the DF-MSMM CoMFA model with various compounds embedded in the background of overall MSMM superposition (Figure 3d, situation I) are displayed in Figure 5, along with representative ligand conformations. The contour map shows that the binding site model is defined by a sterically favorable region near the salicylamide ring, two sterically unfavorable regions in the midsection of the binding site, an electronegative-favoring region near the second ring of two- or three-ring compounds, and an electropositive-favoring region at the lipophilic end of the molecules. The absence of electrostatic regions around the salicylamide ring may result from small differences among the studied compounds in this substructure.

Figure 5

Contour maps of the DF-MSMM CoMFA model with embedded superposition of all 200 molecules to indicate the applicability space. Green and yellow contours indicate regions that are sterically favorable and unfavorable, respectively. Blue regions favor electropositive groups, and red regions favor electronegative groups. The following ligands are presented in ball and stick models: (a) compound 2 (Table 1) binding in ionized and open conformation, (b) compound 2 binding in neutral and closed conformation, showing the clash of the 3-Cl in the second ring with one of the sterically unfavorable regions, (c) compound 18 with a shorter alkyl chain, one of the strongest binders, (d) compound 19 with the 3-acetamide group in the salicylamide ring occupying a sterically favorable region, showing clashes of the long chain with one of the sterically unfavorable regions. The binding conformation in which the bulky chlorine atom in the second ring is closer to the salicylamide OH group and far from a sterically unfavorable region (Figure 5a) is preferred in 11 out of 15 compounds. Consequenty, the ring-flipped conformation (Figure 5b) is not populated. Compounds with lipophilic alkyl groups tend to have a wide range of activities depending on the length of the alkyl chains. Compounds 17 and 18 (Figure 5c) with six and eight methylene groups, respectively, bind in an extended conformation and show the highest predicted binding affinities (Table 1, log K > 4), indicating the optimal chain length for binding. On the other hand, compounds with 12–14 methylene groups (Table 1, 1, 19–22) bind in a packed conformation (Figure 5d) that results in low affinities (Table 1, log K < 1.7), probably because the folded chain protrudes into the sterically unfavorable regions. For the same reason, the affinity of compound 25 with 16 methylene groups exhibits a further drop (log K ≈ 0.5). This reasoning could also apply to compound 23 showing only average affinity (log K ≈ 1.7), although its 4-tert-butylphenyl ring is similar in length to the optimal n-hexyl chain in the extended conformation (compound 17) but is much bulkier. The red electronegative-favoring region in the midsection of the binding site improves the affinity (log K) of 15 of 31 compounds with an electronegative chlorine atom in the second ring, even though their overall activities differ significantly because of varying disposition. This interaction also causes the compounds with the electronegative oxygen (2–11, 13–15, 27, 29) or carbonyl (12) linkers between the second and third rings to have a higher binding affinity than compounds with other linkers (16, 26, 30). Two compounds, 19 and 31, were excluded from optimization before its start because of singularities caused by molecular parts protruding into subspaces which were not reached by other compounds in the series. Singularity of compound 19 is caused by the 3-acetamide group in the salicylamide ring, which reaches a spot in the vicinity of the sterically attractive region (Figure 5d). The singular region may be an open, water-filled space with no effect on affinity because the prediction for compound 19 by the DF-MSMM model (log K + log DF = 5.324, Table 1) is close to the experimental log(1/EC50) = 5.097. Compound 31 lacks an amine linker between its rings, and therefore, its molecule extends into uncharted space. Compound 31 has a much lower predicted activity (log K + log DF = 2.998) than the actual measured log(1/EC50) = 6.461. This difference indicates that there may be an attractive spot in the midsection of the binding site which the model does not capture because of the lack of molecules occupying this region.

Prevalences of Bound Species and Modes

The prevalences, given by the Boltzmann distribution shown in eq 8 in the Methods, cannot be guessed a priori because they are one of the outcomes of optimization. The prevalences for the DF-MSMM CoMFA model (eq 8), with individual K values calculated as the decadic terms in eq 1 with optimized values of the coefficients C, are shown in detail in the Supporting Information and summarized here. A total of 16 of 31 compounds (1, 2, 8, 10, 11, 13, 15–18, 23, 25, 27, 28, 30, 31) bind predominantly in the closed conformation, maintaining the intramolecular H-bond (Figure 1). The remaining compounds bind in the open conformation, exposing the salicylamide’s polar groups, with at least 50% prevalences. Compounds 15, 17, 18, and 25 bind 100% in the closed conformation, while compound 20 binds 100% in the open conformation. Most compounds prefer to bind as ionized species. The exceptions are compounds 2, 4, 9, 17, and 18, for which the neutral species bind with 74%, 53%, 58%, 100%, and 100% prevalences, respectively. Figure 6 indicates that the species distributions of bound and free molecules are quite similar for most compounds. For neutral species, most compounds have both the fraction in water and binding prevalence less than 10%. For ionized species, most compounds have the fraction in water and binding prevalence as high as 90%. About a quarter of the compounds (1, 2, 4, 5, 18, 19, 31), however, exhibit significant differences between the fractions in water and binding prevalences.

Figure 6

Speciation of free and bound molecules according to the DF-MSMM CoMFA model. Neutral and ionized species are shown in blue and red, respectively, as the compound number (Table 1).

Speciation of free and bound molecules according to the DF-MSMM CoMFA model. Neutral and ionized species are shown in blue and red, respectively, as the compound number (Table 1). The predominant binding species and binding conformations are to a large extent affected by the substituents in the salicylamide ring. Strong electron-withdrawing groups promote ionization and also contribute to higher activity: only 3 (4, 9, 31) out of 24 compounds with log(1/EC50) > 6 have pKa > 6.5 (Table 1). However, lipophilic moieties including alkyl chains and two or three ring systems also affect the binding as well as disposition, so the structure–activity relationships are complex.

Binding of Alkyl-Chain Analogues

The strongest binders, compounds 17 and 18 (Table 1, log K > 4) with a 3-formylamino group in the salicylamide ring predominantly bind as neutral species in the closed conformation. The shorter alkyl chains may allow sufficient conformational freedom for these two compounds to make the best possible interactions in the binding site, leading to high binding affinities. Long-chain analogues (compounds 19–22 and 24, Table 1) predominantly bind in ionized open conformations or sometimes in ionized closed conformations, as seen for compounds 1 and 25. All long-chain analogues exhibit low to average binding affinity, possibly because of steric clashes of long chains in packed conformations (Figure 5d) with the sterically unfavorable regions in the midsection of the binding site.

Binding of Two- and Three-Ring Analogues

Compounds with a 2-NO2 or 5-NO2 group in the salicylamide ring bind predominantly as ionized species, either in open or closed conformations, depending upon the presence of a 3-Cl substituent in the second ring (the absence of which seems to be compensated by the presence of a 2-Cl or 4-CF3 substituent in the third ring). Compounds 3, 6, 12, 14, 26, 28, and 29 with a 3-Cl substituent in the second ring mainly bind in the open conformation, although compounds 12 and 29 bind about equally in the closed conformation as well, and compound 28 shows slightly higher preference for the closed conformation. This variation is possibly affected by the presence of a carbonyl linker group between the second and third rings in compound 12 and 4-SCF3 and 3-CF3 groups in the third ring of compounds 28 and 29, respectively. Compounds without a 3-Cl substituent in the second ring (8, 11, 13, 23, 27, and 30) bind predominantly as ionized species and show higher preference for the closed conformation. The absence of steric hindrance by a chlorine substituent in the second ring allows these compounds to assume the preferred closed conformation with an intramolecular H-bond between the amide NH and phenoxide ion. Compound 16 (with a 5-NO2 group in the first ring and a 3-Cl substituent in the second ring) prefers to bind in the closed conformation, possibly because of the presence of a thioether linkage between the second and third rings, resulting in a longer linker. The most active in this series, compound 7 with a 5-CN substituent in the salicylamide ring, binds predominantly as an ionized species in the open conformation.

Influence of Disposition

The disposition of antimycins in female worms of Dipetalonema viteae after 120 h of exposure[52] can be visualized for individual compounds as the difference between the experimental log(1/EC50) values and the log K values (Table 1) calculated from the affinity part of the DF-MSMM CoMFA model (eq 1). These values are plotted as a function of the drugs’ lipophilicity (log P) and acidity (pKa) in Figure 7, along with the surface given by the disposition part of the DF-MSMM CoMFA model (eq 1) with optimized coefficients. The shape of log DF(P,Ka) is remarkably nonlinear: it is essentially composed of interconnected planes positioned at varying angles. No further curvatures are expected, according to the model, as the planes extend in all four directions to the limit property values, so this conceptual function safely predicts outside of the property ranges.[33] Disposition significantly modulates the bioactivity of antimycins by ∼5 log(1/EC50) units. Using the correct form of the DF for the correlations is essential, because its peculiar shape (Figure 7) is difficult to emulate by polynomials, which are too flexible to approximate the planar areas.[33]

Figure 7

Dependence of the pseudoequilibrium, membrane-based DF on lipophilicity and acidity, as given by the disposition part of eq 1 with the optimized coefficient values listed in the text. The data for best binders with poor disposition (compounds 17 and 18, Table 1) are shown as open points. The resulting fit (eq 1, Table 3 and accompanying text) can be interpreted as follows. The variation in disposition is caused by pseudoequilibrium partitioning of ionizable compounds. The hypothetical receptors are localized in a membrane, with the drugs accessing the binding site (the CoMFA model in Figure 5) from within the bilayer. The optimized coefficients are associated with the processes that determine drug disposition: A, with bilayer accumulation and nonspecific protein binding; B, with accumulation in aqueous phases, with subscripts 1 and 2 indicating the relatedness to the nonionized and ionized species, respectively. The coefficient B2 was not necessary for a good fit, indicating that accumulation of ionized molecules in the aqueous compartments did not cause significant variation in overall disposition. The exponent, coming from eq 10 and containing coefficients C and D describing metabolism, need not be used in eq 1, meaning that C1 = C2 = D1 = D2 = 0. The model indicates that the pseudoequilibrium disposition of the studied compounds was not affected by metabolism, which could be either negligible or similar for all compounds. The internal pH value in terms f1 and f2 (eq 12) was optimized as pH 7.56 ± 0.32. This high value can normally be encountered only in one part of the cell: in the matrix of mitochondria. Therefore, the fit indicates that the hypothetical receptors are localized in the bilayer that is in contact with the mitochondrial matrix. This indication agrees with the current knowledge about antimycin A’s mechanism of action, which focuses on inhibition of electron transport.[57,66−69] Antimycin A binds at the ubiquinone binding site Qi of the complex that is located on the matrix side of the cristae membrane.[70] Notably, the disposition function provides clear guidelines for the optimization of the drug structure for the best disposition. The top plateau in Figure 7 indicates the range of log P and pKa values that lead to the maximum drug disposition in the receptor surroundings. The region of the maximum disposition can be defined as pKa ≤ log P for log P ≤ 6 and pKa ≤ 6 for log P > 6. The knowledge of suitable combinations of log P and pKa values provides unique clues for optimization of drug structure, also because the lower log P and pKa values will lead to increased aqueous solubility.

Factoring Bioactivity into Affinity and Disposition Components

Equation 1 defines two contributions to the affinity: log K (the first term) and disposition (log DF; the remaining terms). The two contributions were enumerated from the calibrated DF-MSMM CoMFA model (eq 1 and the accompanying text, Table 3), summarized in Table 1, and plotted against the experimental antifilarial activities (Table 1) in Figure 8.

Figure 8

Dissection of affinity (log K; red symbols) and disposition (log DF; blue symbols) contributions to overall bioactivity by the calibrated DF-MSMM CoMFA model. Stars mark the best binders 17 and 18 with poor disposition. All data are given in Table 1. Figure 8 and Table 1 show that two weakly active compounds (17 and 18) are actually the best binders, which are held back by poor disposition. Their disposition can be, in principle, increased by 4–5 logarithmic units, leading to overall bioactivities approaching log(1/EC50) ≈ 9.5, which is almost 2 orders of magnitude better than that of the most active compound 7. Disposition is easier to optimize than binding because the former depends on lipophilicity and acidity, which are well understood in terms of drug structure. The DF-MSMM CoMFA model, and cell-QSAR models in general, provides unique clues for optimization of the compounds.

Conclusions

The cell-QSAR concept adapts ligand-based and receptor-based 3D-QSAR approaches for use with the bioactivities measured in cells or more complex systems. The adaptation consists of the extension of the correlation equation specific for the given 3D-QSAR method by the DF. The DF describes changes in the concentration of compounds in the receptor surroundings originating from membrane accumulation, binding to nonreceptor proteins, hydrolysis, and possibly other processes. The cell-QSAR concept was used here with the popular ligand-based CoMFA method. A conceptual combination of the DF with our MSMM extension of the CoMFA model was applied to antifilarial activities of antimycin A analogues, representing a part of the difficult-to-model Selwood data set that is often used as a test case for assessment of new variable-selection algorithms. The fitting results are very satisfactory and show that each component of the model (DF, MS, MM) improves the predictive ability. The optimized form of the disposition function indicates that the compounds bind to the hypothetical, membrane-bound receptor in the cristae bilayer of mitochondria and the differences in their metabolic rates are negligible. The perceived complexity of the Selwood data set may be explained by the nonlinearity of the model in lipophilicity and acidity in combination with the lack of proper descriptors for acidity in the original set, the peculiar intramolecular H-bonds in the salicylamide moiety that differ for neutral and ionized species, and the structure-specific receptor binding. The results also illustrate the utility of the cell-QSAR approach in medicinal chemistry. Specifically, the cell-QSAR approach delineates the affinity and disposition contributions to bioactivity, which provide for a straightforward, separate structure optimization for affinity and disposition. Although further testing is necessary, the cell-QSAR concept seems to be a promising direction in the structure-based or ligand-based predictions of bioactivities in cellular or more complex systems and drug structure optimization. For larger and more diverse data sets, better superposition techniques and methods for prediction of species fractions will be necessary. The model-based nature opens the cell-QSAR concept for incorporation of any future improvements in 3D-QSAR techniques and the disposition function, as well as new pharmacology and pharmacodynamic scenarios.

Methods

Pharmacokinetics (PK), Pharmacology (PL), and Pharmacodynamics (PD) Phases of the Drug Effect in QSAR

The PK phase is described by the DF. The simplest PL/PD scenario consists of a situation when the PL phase is represented by the 1:1, fast, and reversible ligand–receptor interaction and the PD phase describes the biological effect, which is a direct and immediate consequence of the drug–receptor interaction and is proportional to the fraction of occupied receptors. The ligand–receptor association constant K is defined using the concentrations (denoted by square brackets) of the ligand (L), receptor (R), and complex (LR) as K = [LR]/([L][R]). In a cellular system, the free ligand concentration [L] varies because of the disposition, and consequently, all three concentrations are time-dependent. Usually, the concentrations are sufficiently low to approximate activities. If the concentration of the free receptors [R] is expressed as the difference between the total receptor concentration [R]0 and [LR], the fraction of occupied receptors is given as [LR]/[R]0 = K[L]/([L] + 1) = E/Emax. The latter equality comes from the simplest PD phase, describing the primary effect E as being proportional to the fraction of occupied receptors, [LR]/[R]0. Secondary effects lacking this simple tie to the receptor modification require more complex PD models.[71] Nevertheless, the above scenario applies to many drug effects caused by competitive enzyme inhibition or receptor binding. Since the receptor-bound drug represents only a small drug fraction in the body, the disposition is not significantly affected by the receptor binding, and the ligand concentration in the receptor surroundings, [L], can be expressed using the DF with properties p and time t as variables and the initial ligand concentration c0:[72] To obtain a PK/PL/PD expression, [L] in the above expression for E/Emax is replaced by the product DF(p,t) c(t), where c(t) are the isoeffective drug concentrations eliciting the effect, which is equal to the fraction X of the maximum effect. With E/Emax = X, the bioactivity, characterized by c(t), is described as[72]Equivalent PK/PL/PD dependencies were derived for the irreversible drug–receptor interactions, either single or preceded by a fast, reversible step.[72] Equation 3 and equivalent PK/PL/PD relationships provide the blueprint for a conceptual combination of the disposition function with any ligand-based or receptor-based 3D-QSAR method for prediction of binding affinities. The K term in eq 3 needs to be replaced by a suitable conceptual 3D-QSAR correlation equation, and the disposition function can be selected from the available repertoire.[30,32] This cell-QSAR concept represents the recipe for the formulation of conceptual QSARs for cell-based assays and other complex systems.

Multispecies Multimode Equilibria

Studied antimycin analogues ionize under the conditions of the experiments, and each species forms different intramolecular H-bonds. The molecules are flexible, and each species can bind to the receptor R in different orientations or conformations (modes). The generalized multispecies multimode binding equilibria are outlined in the following equation:Each binding equilibrium is characterized by the microscopic association constant K (eq 5) and represents the receptor binding of the ith ligand in the jth species and the kth binding mode. In principle, the number of species s and the number of binding modes m can differ for individual ligands. In the studied case, there are only two species (s = 2, nonionized and ionized, with charges 0 and −1, respectively) for each ligand (Table 1). The number of modes is unknown, since there is no experimental information available on either the receptor or the complex, and can differ for individual ligands. In the general expressions, s and m represent the maximum numbers of species and binding modes for the tested ligands. The missing species or modes of some ligands will have zero fractions f or zero association constants K (see below). The microscopic association constant for the ith molecule’s jth species in the kth binding mode isUsually, only the total association constant of the ith ligand (K) is determined, although the microscopic association constants K are the relevant quantities to be correlated with the structure. To obtain a conceptual correlation equation for K, it must be expressed as a function of K. The experimentally measured K reflects the total concentration of the ligand/receptor complexes in any species and any binding mode. A series of rearrangements to the definition of K provides the desired function of K, shown in eq 6: (1) the total concentration of the complex [LR] is expressed as the sum of bound concentrations of individual species, (2) each summand is formally multiplied by [L]/[L] to introduce the fraction of each species, f = [L]/[L], and (3) the bound concentration of each species [LR] is broken down to individual binding modes. The relationship between the microscopic and total association constants is obtained by expressing the m-summands in eq 6 using eq 5:Here f is the fraction of the jth molecular species in the ith compound in solution. For compounds present in the solution as one species, the right-hand side of eq 7 is equal to the k-summation. In this case, eq 7 is in accordance with published analyses of formally analogous situations: the statistical thermodynamic[73] and equilibrium[58,59] treatment of multimode binding in ligand/protein interactions and kinetic analyses of a reversible unimolecular reaction leading to different products[74] or isomers.[75] The expressions for the fractions f of the jth species can be derived from the definition of the ionization constants. Only one molecule of a compound, in any species and any mode, can be bound in each complex. The distribution of species and modes in the bound state is characterized by their prevalences. For the kth mode of the jth bound species for the ith compound, the prevalence isThe numerator and denominator come from eqs 5 and 6, respectively. The third quasi-equality ensures that the sum of all s × m prevalences equals unity. The prevalence of the jth bound species in all modes and the prevalence of the kth bound mode in all species can easily be calculated using eq 8. For the application in CoMFA, eq 7 needs to be logarithmized and the log K values must be replaced by the correlation equation for CoMFA:Here C are the regression coefficients characterizing the weights of the ligand–probe interaction energies X in individual cross-sections of a grid surrounding the superimposed ligands. The subscripts indicate the relatedness to the ith compound, jth species, kth mode, and lth grid point, and g is the total number of grid points. Other contributions to the binding free energy (internal ligand energy, binding entropy, desolvation) can be added to the summation in the exponents. In this case, we only used the CoMFA energies because the size of the used data set did not allow a rigorous optimization of more than a dozen coefficients.

Intracellular Disposition

The pseudoequilibrium DF expresses the time course of the concentration of nonionized ligand molecules in the aqueous phases of the studied biosystem during the elimination period:Here P is the reference partition coefficient, β is the empirical Collander coefficient, and t is the exposure time. The terms A, B, C, and D characterize the key fate-determining processes the ligand molecules undergo in the biological system: A, accumulation in membranes/protein binding; B, distribution in aqueous phases; C, lipophilicity-dependent elimination; D, lipophilicity-independent elimination. For the receptors localized in the bilayer, eq 10 needs to be multiplied by Pβ. If compounds are present in the aqueous phase in the receptor surroundings in s species created by ionization, tautomerism, or carbonyl hydration, the model suggests that each of the parameters A, B, C, and D (represented by Y) from eq 10 be expanded as[76]The fractions f of individual species are the same quantities as in eq 9, except the subscripts i, indicating the compound, are omitted. The subscripts j denote, as in eq 9, the quantities associated with the jth species. The fractions of nonionized, neutral molecules (subscript j = 1) and ionized molecules with an overall charge of −1 (subscript j = 2) are given by the definition of the dissociation constant Ka for the given pH of the medium and can be written, respectively, as Generally, on the basis of the probable intracellular localization of the receptors, a reasonable intracellular pH range is 5–8, the lower limit being observed in lysozomes and the upper limit in the mitochondrial matrix. The pH value of the cultivation medium could affect these values, but the exact pH of the assay was not published.[52] For this reason and because the receptor localization is unknown, the pH value in eq 12 was treated as one of the adjustable coefficients.

Data Set

The Selwood data set includes in vitro filaricidal activities against D. viteae of 31 antimycin analogues (Table 1), which were determined by monitoring the leakage of incorporated radiolabeled adenine from intact female D. viteae after 120 h of exposure.[36,52] The EC50 values for adenine leakage were calculated after 120 h of exposure to a range of drug concentrations at 37 °C under neutral conditions.[52] The 1-octanol/water partition coefficients P were calculated by the ClogP program in Bio-Loom[51] software for neutral molecules and are summarized in Table 1. The values differ from the original data[77] because of the developments in the ClogP software. The differences are within ±0.25 log unit, except for compounds 6 (Table 1, log Pold – log Pnew = 1.526), 10 (−0.770), 12 (−0.638), 14 (−1.654), and 26 (0.325). The log P values were used with the belief that they properly reflect the relative lipophilicity of the studied compounds, although the values of log P > 7 were marked in the ClogP output as unrealistically high. Ionization decreases the log P values by several units, and most compounds were present mainly as ionized species under physiologic conditions. Our multispecies approach can take into account tautomers and hydrates of aldehydes and ketones; hence, formation of these species was examined using the SPARC Web server[50] and was found negligible for all compounds on the basis of the following reasoning. As shown in eq 7, the binding contribution of each species is given by the product of the association constant K (represented by the sum of the association constants K for individual modes[61]) and the species fraction f. The fraction limit for a species to be included in the correlation was set to 10–4 on the basis of the following considerations. The interaction of a rare species with the receptor can release about 2–3 kcal/mol more free energy than that of the prevalent species because both species bind by weak interactions and differ in a single, well-defined structural feature. For this situation, the product fK for the rare species would be less than 1% of the same quantity for the prevalent species.

Superposition

Pharmacophore models were built using the Galahad procedure[54,55] in Sybyl-X. The ligands are first aligned flexibly to each other in torsional space, and then the conformations produced are rigidly aligned in Cartesian space. Pharmacophore and steric bitmaps (TUPLET[78]) representing H-bonding and steric terms, respectively, along with the total ligand energy, serve as input to a multiobjective fitness function. Produced pharmacophores indicated that the salicylamide moieties of all compounds in all species are superimposed in all cases. In the absence of structural and other information on the receptor, the most active and, at the same time, one of most rigid analogues, compound 7 (Figure 2, Table 1), was selected as the template for superposition. Conformational analysis was performed using the Tripos force field in Sybyl[79] with an increment of 10° for all considered torsions. When the molecule forms an intramolecular H-bond (Figure 1), either as a neutral species (OH···O=C) or as an ionized species (NH···O–), the first torsion Φ1 is fixed and only Φ2, Φ3, and Φ4 are allowed to rotate in conformational analysis. In the opposite case, when analogues have no intramolecular H-bond, all four torsions in Figure 2 are considered rotatable. The approximate minimum conformations were further optimized and Mulliken charges were calculated in Jaguar[80] using the DFT/B3LYP method with a 6-31G** basis set. Charge and spin multiplicity were entered manually for different protonation states. In this way, the conformations of the template for the superposition were obtained. Open conformation, the mode without the intramolecular H-bond has the same optimal geometries for both neutral and ionized species. In the MM approach, the user is allowed to enter several possible bound conformations of the template in one optimization run. The MSMM approach extends this option to all species which are present in the solution surrounding the receptor. The four basic template molecules of compound 7 were superimposed by the atom fit method in Sybyl (Figure 3a). All studied compounds do not share a common skeleton; therefore, for each compound, both species in all modes were superimposed on the templates using the FlexS procedure[65] as incorporated in Sybyl (Figure 3c). FlexS combines conformational search with alignment in two steps. First, a base fragment of the ligand is selected and placed on the template. Second, as the rest of the ligand parts are incrementally built, torsion angles are perturbed to generate new conformations, and each conformation is scored for superposition quality. The obtained conformations of all compounds were minimized by Jaguar using the DFT/B3LYP method[81] with a 6-31G** basis set[82,83] with all torsions fixed. The Mulliken charges[84] were calculated for the optimized geometries. The superposition was repeated, and the resulting conformations were used in the CoMFA analyses.

Interaction Energies

Steric and electrostatic interaction energies X with an sp3 carbon probe with a +1 charge were calculated for each compound (subscript i) in each species (subscript j) and each binding mode (skeleton conformations × ring-flipping conformations, subscript k). The probe was placed at each lattice intersection (subscript l) of a 3D grid (2 Å in each coordinate direction). The grid has the following coordinates: −7 to +18 Å along the x-axis, 1–21 Å along the y-axis, and −15 to +11 Å along the z-axis. An energy cutoff of 30 kcal/mol was used to cap the maximum used energies.

Regression Analysis

The correlation equation is obtained by taking the logarithm of eq 3, in which the association constant log K is replaced by eq 9 and the disposition function DF(p,t) is given by eq 10, as combined with eq 11, in which the fractions are expressed by eq 12. The receptor-binding and disposition components have, from the optimization viewpoint, quite different properties: eq 9 has only one form, and the descriptive variables need to be selected from a large pool, while eqs 10–12 contain only two variables, which are organized in one of two possible functional forms, depending upon the location of the receptors either in the aqueous phase or in membranes. For the receptor-binding part (eq 9), the coefficients C are associated with the grid points and are independent of the species and modes. Therefore, the final set of optimized coefficients is independent of the numbering of individual species and modes. Notably, the inclusion of multiple species and multiple modes does not increase the number of optimized regression coefficients C. The correlation equation is nonlinear in the coefficients C in eq 9 and the coefficients A–D, pH and β in eqs 10–12. All statistical procedures[61] were written in C programming language and integrated in Sybyl through the Sybyl Programming Language (SPL) scripts. The choice of one of the two implemented optimization methods depends on the total number (N) of coefficients C in the calibrated model and on the number of compounds (n). Nonlinear regression analysis is applied when N ≤ n/2. Otherwise, the developed software automatically switches to the PLS analysis with a linearized form[61] of eq 9. The second option was not used for optimization of the best models. Both optimization methods are sensitive to good initial estimates. Therefore, a forward-selection procedure, conceptually similar to that used before for multimode analysis,[61] was a rational choice. Before analysis, the X variables (energy columns) were sorted using the following criteria: (1) high and sustained variability, (2) the minimal number of colinear columns with similar information, and (3) even distribution of selected grid points around the ligands. Sustained variability means that the energy values cover the whole interval more or less evenly and are not clustered. An extreme case is the situation when all energies in a given column are equal (usually zero) and only one mode has a different energy. These singularities originate when only one molecule protrudes into a subspace and other molecules do not affect the probe interaction energies in that subspace. Singular molecules need to be eliminated from the analysis because they do not provide statistically significant information about the poorly covered subspace. The model calibration can be steered by user-defined inputs, for which the default values are shown in parentheses. The process starts with a quick scan through the models consisting of a few (four) variables which are randomly selected from the top variables (10%). The initial values of the coefficients (±0.1) are systematically evaluated by a grid search. In this step, no optimization is applied and the correlation equation is evaluated for the initial coefficients. For the best models (top 10% r2), the coefficients are optimized. At each of the following steps, a reduction of the number of coefficients is attempted: the coefficients, which do not lead to a decrease in the r2 value, are eliminated. The best models (top 5% r2) are developed by a gradual addition of groups of a few (three) variables until all variables from the selected set (top 10%) are used. The procedure is finalized by fine-tuning of the best models (top 5% r2) by addition of single variables. The best model is selected on the basis of the following criteria: N, r2, q2 for the leave-one-out cross-validation (applied to the training set only to eliminate unstable models), and the standard errors of the parameters.

50 in total

1. QSAR of benzene derivatives: comparison of classical descriptors, quantum theoretic parameters and flip regression, exemplified by phenylalkylamine hallucinogens.

Authors: Brian W Clare
Journal: J Comput Aided Mol Des Date: 2002 Aug-Sep Impact factor: 3.686

2. Subcellular pharmacokinetics and its potential for library focusing.

Authors: Stefan Balaz; Viera Lukacova
Journal: J Mol Graph Model Date: 2002-06 Impact factor: 2.518

3. Genetic Algorithm guided Selection: variable selection and subset selection.

Authors: Sung Jin Cho; Mark A Hermsmeier
Journal: J Chem Inf Comput Sci Date: 2002 Jul-Aug

4. Free energy, entropy, and induced fit in host-guest recognition: calculations with the second-generation mining minima algorithm.

Authors: Chia-En Chang; Michael K Gilson
Journal: J Am Chem Soc Date: 2004-10-13 Impact factor: 15.419

5. Inhibition of electron transfer from ferrocytochrome b to ubiquinone, cytochrome c1 and duroquinone by antimycin.

Authors: G VON Jagow; C Bohrer
Journal: Biochim Biophys Acta Date: 1975-06-17

6. Estimation of binding free energies for HIV proteinase inhibitors by molecular dynamics simulations.

Authors: T Hansson; J Aqvist
Journal: Protein Eng Date: 1995-11

7. Theoretical model-based equations for the linear free energy relationships of the biological activity of ionizable substances. 1. Equilibrium-controlled potency.

Authors: Y C Martin; J J Hackbarth
Journal: J Med Chem Date: 1976-08 Impact factor: 7.446

8. A new method for predicting binding affinity in computer-aided drug design.

Authors: J Aqvist; C Medina; J E Samuelsson
Journal: Protein Eng Date: 1994-03

9. Inhibition of electron transport of rat-liver mitochondria by synthesized antimycin A analogs.

Authors: N Tokutake; H Miyoshi; H Nakazato; H Iwamura
Journal: Biochim Biophys Acta Date: 1993-05-06

10. Partitioning of organic compounds in phases imitating the headgroup and core regions of phospholipid bilayers.

Authors: Viera Lukacova; Ming Peng; Roman Tandlich; Anne Hinderliter; Stefan Balaz
Journal: Langmuir Date: 2006-02-14 Impact factor: 3.882

4 in total

1. Biomacromolecular quantitative structure-activity relationship (BioQSAR): a proof-of-concept study on the modeling, prediction and interpretation of protein-protein binding affinity.

Authors: Peng Zhou; Congcong Wang; Feifei Tian; Yanrong Ren; Chao Yang; Jian Huang
Journal: J Comput Aided Mol Des Date: 2013-01-10 Impact factor: 3.686

Review 2. Does transbilayer diffusion have a role in membrane transport of drugs?

Authors: Stefan Balaz
Journal: Drug Discov Today Date: 2012-06-15 Impact factor: 7.851

Review 3. Rigorous incorporation of tautomers, ionization species, and different binding modes into ligand-based and receptor-based 3D-QSAR methods.

Authors: Senthil Natesan; Stefan Balaz
Journal: Curr Pharm Des Date: 2013 Impact factor: 3.116

4. Structure-based prediction of drug distribution across the headgroup and core strata of a phospholipid bilayer using surrogate phases.

Authors: Senthil Natesan; Viera Lukacova; Ming Peng; Rajesh Subramaniam; Sandra Lynch; Zhanbin Wang; Roman Tandlich; Stefan Balaz
Journal: Mol Pharm Date: 2014-09-18 Impact factor: 4.939

4 in total