We present the cellular quantitative structure-activity relationship (cell-QSAR) concept that adapts ligand-based and receptor-based 3D-QSAR methods for use with cell-level activities. The unknown intracellular drug disposition is accounted for by the disposition function (DF), a model-based, nonlinear function of a drug's lipophilicity, acidity, and other properties. We conceptually combined the DF with our multispecies, multimode version of the frequently used ligand-based comparative molecular field analysis (CoMFA) method, forming a single correlation function for fitting the cell-level activities. The resulting cell-QSAR model was applied to the Selwood data on filaricidal activities of antimycin analogues. Their molecules are flexible, ionize under physiologic conditions, form different intramolecular H-bonds for neutral and ionized species, and cross several membranes to reach unknown receptors. The calibrated cell-QSAR model is significantly more predictive than other models lacking the disposition part and provides valuable structure optimization clues by factorizing the cell-level activity of each compound into the contributions of the receptor binding and disposition.
We present the cellular quantitative structure-activity relationship (cell-QSAR) concept that adapts ligand-based and receptor-based 3D-QSAR methods for use with cell-level activities. The unknown intracellular drug disposition is accounted for by the disposition function (DF), a model-based, nonlinear function of a drug's lipophilicity, acidity, and other properties. We conceptually combined the DF with our multispecies, multimode version of the frequently used ligand-based comparative molecular field analysis (CoMFA) method, forming a single correlation function for fitting the cell-level activities. The resulting cell-QSAR model was applied to the Selwood data on filaricidal activities of antimycin analogues. Their molecules are flexible, ionize under physiologic conditions, form different intramolecular H-bonds for neutral and ionized species, and cross several membranes to reach unknown receptors. The calibrated cell-QSAR model is significantly more predictive than other models lacking the disposition part and provides valuable structure optimization clues by factorizing the cell-level activity of each compound into the contributions of the receptor binding and disposition.
Conceptual methods for the correlation
of binding affinities with
drug structure (3D-QSARs) are vital to the drug development process.
The approaches include receptor-based methods,[1] e.g., free energy perturbation,[2] the
linear response method,[3−5] and the mining minima approach,[6−9] and ligand-based methods for unknown
receptors, e.g., comparative molecular field analysis (CoMFA).[10] The methods have been developed and perform
satisfactorily for the binding data measured with isolated macromolecules.
Unfortunately, binding affinities for isolated macromolecules are
often unavailable because the receptors have not yet been identified,
cannot be isolated without denaturation or dissociation, or do not
work in aqueous solutions. In such situations, more complex assays
are used, deploying receptors reconstituted in lipid vesicles as the
simplest system or utilizing intact cells, tissues, organs, or organisms.The effective concentrations of the drugs in the receptor surroundings
vary among the studied compounds because of interactions with nonreceptor
assay constituents. Drug disposition has often been neglected in modeling
the bioactivities measured in complex systems. At best, the 1-octanol/water
partition coefficients and their squares,[11] and sometimes the dissociation constants,[12] as properties affecting the disposition, have been included in simplified
linear forms. In many cell-level studies, 3D-QSAR methods have been
applied without any correction for ligand disposition or empirical
models with various descriptors have been deployed. Here, we propose
a straightforward solution to the problem of QSAR modeling of cell-level
data: extend the proven 3D-QSAR methods by accounting for the varying
ligand disposition in the receptor surroundings using the disposition function (DF). The conceptual cell-QSAR approach is based on a new correlation equation combining the 3D-QSAR
expression and the DF, whereby the coefficients in both parts can
be optimized either simultaneously or separately if the uptake data
are available. The exact form of the correlation equation depends
on the kinetics of the processes underlying the drug effects and on
the complexity of studied compounds (common skeleton, similarity of
substituents, ability to ionize and form tautomers), as discussed
in the Methods.The DF describes the
kinetics of drug disposition and relates the
dose or the initial concentration to the concentration in the receptor
surroundings (see eq 2 in the Methods). The first DF forms were peak-shaped dependencies,
on logarithmic scales, of the drug concentration inside the receptor
compartment, which was separated from the dosing compartment by a
few bilayers, on lipophilicity (the reference partition coefficient).[13−16] These dependencies represented the basis of the first QSAR techniques,
such as the Hansch approach[13] and related
methods. Subsequently, other parameters, including the ionization
constant,[17] the exposure time, the metabolic
rate parameters,[18] the ratio of the partition
coefficients in alkane/water and 1-octanol/water systems,[19−22] and the polar surface area[23,24] as descriptors of H-bonding
ability, the partition coefficient between phosphatidylcholine headgroups
and hexadecane as a predictor of bilayer localization and partitioning,[25] the membrane-interaction QSAR parameters,[26,27]the polarizability,[28]the cross-sectional
area of the molecule,[29] and numerous others,
were added. The DF functional forms developed from the parabolic,[13] bilinear,[15] nonequilibrium,[30] equilibrium,[17] and
mixed models to the pseudoequilibrium DF.[31,32] Some of these DFs predicted reliably outside the used property ranges,
in contrast to empirical models.[33]The DF can also be calibrated independently using the data on cellular
uptake and disposition, which are obtained by various experimental
approaches. High-content screening techniques provide more detailed
information than classical uptake experiments,[34] as demonstrated for the disposition of a small library
of lipophilic, fluorescent compounds in living HeLa cells.[35] Structure-dependent phenomena, such as active
influx or efflux and metabolism, could be captured in this process.The additional effort to obtain and fit the disposition data can
be spared in situations when the disposition is not complicated by
structure-dependent phenomena. Then the DF depends on drug properties
characterizing the bilayer transport and accumulation, nonspecific
protein binding, hydrolysis, and other nonenzymatic reactions. An
appropriate DF can be selected from the armoire of available models
and can be calibrated simultaneously with the 3D-QSAR model using
the cell-level data. This approach is demonstrated here using the
Selwood data[36] on filaricidal activities
of analogues of antimycin A, which inhibits mitochondrial electron
transport and oxidative phosphorylation[37] and induces apoptosis.[38] These data are
regarded as difficult to model and are frequently used to validate
QSAR methods on the basis of variable selection. The descriptive ability
of the generated models did not exceed 60–80% of the explained
variance,[36,39−47] so there is space for improvement.To the best of our knowledge,
this cell-QSAR study represents the
first attempt at the QSAR modeling of drug effects in complex systems,
where both drug–receptor interactions and drug disposition
significantly vary among studied compounds and both are conceptually
treated using a common correlation equation.
Results and Discussion
Studied Antimycin A Analogues
The studied compounds,
listed in Table 1, have varying substituents,
R1, on the phenol ring and large lipophilic substructures,
R2, which replace a complex dilactone moiety of the parent
compound. The compounds can be formally divided into three groups:
two-ring analogues (23 and 31), three-ring
analogues (2–16 and 26–30), and alkyl-chain analogues (1, 17–22, 24, and 25). The salicylamide moiety, present in all compounds except 31, exhibits a peculiar intramolecular H-bonding.[48,49] Two types of intramolecular H-bonds are formed, depending upon ionization
of the hydroxyl (Figure 1).
Table 1
Structures and Properties of Antimycin
Analogues
compd no.
R1
R2
log Pa
pKab
log Kc
log DFc
log(1/EC50)d
1
3-NHCHO
NHC14H29
7.491
7.88
1.181
4.150
5.155
2e
3-NHCHO
NHC6H3-3-Cl-4-(OC6H4-4-Cl)
5.955
7.21
2.307
3.438
5.620
3
5-NO2
NHC6H3-3-Cl-4-(OC6H4-4-Cl)
6.944
5.15
1.710
5.476
7.398
4
5-SCH3
NHC6H3-3-Cl-4-(OC6H4-4-Cl)
7.402
8.00
2.290
4.086
6.319
5
5-SOCH3
NHC6H3-3-Cl-4-(OC6H4-4-Cl)
5.722
6.80
1.843
3.535
5.125
6
3-NO2
NHC6H3-3-Cl-4-(OC6H4-4-Cl)
6.944
4.60
1.668
5.548
6.824
7
5-CN
NHC6H3-3-Cl-4-(OC6H4-4-Cl)
6.698
5.66
2.534
5.225
7.839
8
5-NO2
NHC6H4-4-(OC6H4-4-CF3)
6.541
5.15
1.483
5.397
7.022
9
3-SCH3
NHC6H3-3-Cl-4-(OC6H4-4-Cl)
7.402
7.47
1.931
4.292
6.420
10
5-SO2CH3
NHC6H3-3-Cl-4-(OC6H4-4-Cl)
5.731
6.11
1.681
4.166
6.000
11e
5-NO2
NHC6H4-4-(OC6H5)
5.658
5.31
1.450
4.818
6.097
12
5-NO2
NHC6H3-3-Cl-4-(COC6H4-4-Cl)
6.264
5.37
1.803
5.184
7.131
13
5-NO2
NHC6H4-4-(OC6H3-2-Cl-4-NO2)
5.884
5.20
1.936
5.067
6.921
14
5-NO2
NHC6H3-3-Cl-4-(OC6H4-4-OCH3)
6.150
5.11
1.839
5.271
6.770
15
3-SO2CH3
NHC6H3-3-Cl-4-(OC6H4-4-Cl)
5.731
5.10
1.433
5.034
6.301
16
5-NO2
NHC6H3-3-Cl-4-(SC6H4-4-Cl)
7.476
5.25
1.554
5.504
7.357
17e
3-NHCHO
NHC6H13
3.259
7.88
4.018
0.430
<5.000
18
3-NHCHO
NHC8H17
4.317
7.88
4.091
1.488
5.585
19f
3-NHCOCH3
NHC14H29
7.468
7.88
1.181
4.143
5.097
20e
5-NO2
NHC14H29
8.648
5.83
1.016
5.406
6.893
21
3-NO2
NHC14H29
8.648
5.28
1.293
5.522
6.818
22
3-NO2-5-Cl
NHC14H29
9.419
4.68
1.684
5.564
7.362
23
5-NO2
NHC6H4-4-C(CH3)3
5.386
5.55
1.659
4.369
6.229
24e
5-NO2
NHC12H25
7.590
5.83
1.647
5.361
7.409
25
3-NO2
NHC16H33
9.706
5.28
0.555
5.524
5.959
26
5-NO2
NHC6H3-3-Cl-4-(NH-C6H4-4-Cl)
6.900
5.12
0.968
5.477
6.432
27
5-NO2
NHC6H4-4-(OC6H4-3-CF3)
6.541
5.16
1.675
5.394
7.027
28
5-NO2
NHC6H3-3-Cl-4-(NHC6H4-4-SCF3)
7.899
5.04
1.711
5.541
7.553
29
5-NO2
NH-3-Cl-4-(3-CF3C6H4O)C6H3
7.114
5.06
1.872
5.510
7.071
30e
5-NO2
NHC6H4-4-(CH(OH)C6H5)
3.870
5.42
1.197
3.015
<5.000
31f
5-NO2
C6H4-4-Cl
4.329
6.73
0.777
2.221
6.481
ClogP-predicted 1-octanol/water
partition coefficient P.
SPARC-predicted acidity of the phenolic
hydroxyl.
Affinities and
disposition function
values predicted from the DF-MSMM CoMFA model.
Compounds excluded before analysis
because of singularity.
Figure 1
Conformational switching of an intramolecular H-bond in the salicylamide
ring of antimycin derivatives upon ionization.
ClogP-predicted 1-octanol/water
partition coefficient P.SPARC-predicted acidity of the phenolic
hydroxyl.Affinities and
disposition function
values predicted from the DF-MSMM CoMFA model.Measured antifilarial activities
(molar EC50).[36,52]Test set compounds.Compounds excluded before analysis
because of singularity.Conformational switching of an intramolecular H-bond in the salicylamide
ring of antimycin derivatives upon ionization.The hydroxyl pKa values
(Table 1) of the studied antimycins were estimated
by the
SPARC method,[50] which seems to take the
intramolecular H-bond into account. The magnitude of the pKa values (range 4.6–8.0) is determined
mainly by the electron-withdrawing ability of the R1 substituents
on the phenol ring, with the R2 substituents exhibiting
a slighter influence. Under neutral conditions, the compounds are
present in the aqueous solution as a mixture of ionized and nonionized
species, with the balance shifted toward the ionized species for most
compounds, except compounds 1, 2, 4, 9, and 17–19 having pKa > 7. Formation of tautomers
and carbonyl hydrates was negligible, according to the SPARC estimates;
otherwise, they would be included in the model in the same way as
ionization. Table 1 also contains the activities
spanning almost 3 orders of magnitude and the 1-octanol/water partition
coefficients P (ClogP estimates[51] for neutral molecules).Antimycin A is a known inhibitor
of oxidative phosphorylation.[37] Therefore,
we examined whether the Selwood data
could be explained by our two-decade-old, conceptual QSAR model for
uncouplers.[34] The model describes isoeffective
concentrations eliciting a given degree of uncoupling as a nonlinear
function containing lipophilicity (P) and acidity
(pKa). Although the surface generated
by the fit to the data was reminiscent of the expected shape,[34] the deviations of individual compounds were
too large to consider the fit satisfactory. We hypothesized that the
deviations were caused by structure-specific interactions with an
unknown receptor; therefore, we decided to deploy a ligand-based 3D-QSAR
method.
Setup of the Cell-QSAR Model
The cell-QSAR correlation
equation for the simplest drug effect scenario consists of the product
of the drug–receptor association constant K and the DF (eq 3 in the Methods). The receptor is unknown in this case, so K was
expressed using one of the popular ligand-based methods, CoMFA.[10] The antimycin analogues ionize to different
degrees and are flexible; therefore, we applied our multispecies (MS),
multimode (MM) modification of CoMFA.[53] The pseudoequilibrium form of the DF (eq 10 in the Methods),[32] treating absorption as an instantaneous process, was chosen, because
no lag phase was observed in the time course of toxicity.[36,52] An additional reason for selecting this DF form was the ability
of the underlying model[32] to handle speciation
of the ligands.
Ligand Superposition
Pharmacophores obtained for all
species using the Galahad procedure,[54,55] which performs
flexible superposition based on the active analogue approach,[56] indicated that the salicylamide moieties are
closely superimposed in all cases. This outcome is consistent with
the apparent importance of this moiety for inhibition: an analogue
capable of H-bond formation (Figure 1) inhibited
the cytochrome bc1 complex activity in
submitochondrial particles much stronger than a similar analogue lacking
this ability.[57]A rigid compound
that preferably exhibits a high activity would be very useful for
guiding the superposition of the ligands. Unfortunately, there is
no truly rigid analogue available in the studied series. The three-ring
compounds contain more rigid aromatic rings than other compounds,
so we selected the most active compound in this group and in the entire
series (7) as the template. Because the pharmacophore-generating
procedure did not provide clues about the conformation of the lipophilic
fragments connected to the salicylamide ring, we used the low-energy
conformations of compound 7 as the assumed bound conformations.
This step is associated with the risk that the overall spatial organization
of the binding site model may not completely correspond to that of
the actual receptor. While the binding points will be positioned appropriately
around individual rigid rings of the template, the relative positions
of the individual groups of points surrounding the rigid fragments
may differ from those of the actual receptor because of possible fragment
rotations. This drawback would not affect the predictive ability of
the model for compounds which can adopt the template conformation
without significant conformational strain.The conformational
space of compound 7 is characterized
by four torsions (Figure 2). Conformational
analysis identified the minimum-energy conformations for the H-bonded
forms of both neutral and ionized molecules (Figure 1). The common minimum conformations of ionized and neutral
species without the intramolecular H-bonds were also looked up, for
the possibility that the hydroxyl and the amide groups would interact
with the receptor instead of forming the intramolecular bond. Two
skeleton conformations appear for each of the two species: one closed via the intramolecular H-bond and one open without the H-bond. The conformations are summarized in Table 2.
Figure 2
Four torsions of compound 7 (Table 1) characterizing the conformations of three-ring
analogues
displayed in red color. They are Φ1 (C3–C4–C11–O12),
Φ2 (C11–N13–C15–C20), Φ3 (C17–C18–O22–C23), and Φ4 (C18–O22–C23–C28).
Table 2
Superposition Templates Based on the
Most Active Compound 7 (Table 1)
torsionc (deg)
template
no.
speciesa
conformationb
intramolecular
H-bond
Φ1
Φ2
Φ3
Φ4
1
neutral
closed
OH···O=C
4.1
2.8
93.8
178.5
2
neutral
open
none
137.7
–15.7
208.3
225.4
3
ionized
closed
NH···O–
180.6
0.8
98.4
174.7
4
ionized
open
none
137.7
–15.7
208.3
225.4
See Figure 1.
“Closed”
and “open”
refer to the intramolecular H-bond being present and absent, respectively.
See Figure 2.
Four torsions of compound 7 (Table 1) characterizing the conformations of three-ring
analogues
displayed in red color. They are Φ1 (C3–C4–C11–O12),
Φ2 (C11–N13–C15–C20), Φ3 (C17–C18–O22–C23), and Φ4 (C18–O22–C23–C28).See Figure 1.“Closed”
and “open”
refer to the intramolecular H-bond being present and absent, respectively.See Figure 2.It has long been recognized that flipping of symmetrical
rings
with asymmetrically placed substituents results in similar but distinct
bound geometries. If the receptor structure is unknown, preferred
geometries cannot be selected a priori and all geometries should be
considered in model building.[58−64] A total of 16 of 31 studied compounds (Table 1) have substituents in meta-positions on either
the middle ring (2–7, 9, 10, 12, 14–16, 26, and 28) or the third ring
(13 and 27). For these compounds, rotation
of the asymmetrically substituted ring by 180° led to doubling
of the existing two modes. Compound 29 with two asymmetrically
substituted rings is represented by eight modes.The process
of creating individual superpositions is illustrated
in Figure 3. The four basic templates of compound 7 were superimposed by the atom fit method in Sybyl (Figure 3b) to ensure the best overlap of the salicylamide
moieties, as suggested by the generated pharmacophore. All other molecules
were superimposed on the templates using the FlexS procedure[65] (Figure 3c,d).
Figure 3
Superposition
procedure: (a) four basic templates of compound 7, generated
by conformational analysis and numbered as in
Table 2; (b) superposition of the four basic
templates; (c) superpositions of all compounds (Table 1) to the four basic templates in respective species and conformations,
numbered as in row a; (d) final superpositions, enriched by flipping
the benzene rings by 180° for the MSMM situation (I; 200 molecules
= 31 compounds × 2 species × 2 skeleton conformations ×
1, 2, or 4 ring-flipping conformations), the MM situation (II; 100
molecules created as in situation I but with 1 species), and the standard
one-mode situation (III; 31 molecules).
Superposition
procedure: (a) four basic templates of compound 7, generated
by conformational analysis and numbered as in
Table 2; (b) superposition of the four basic
templates; (c) superpositions of all compounds (Table 1) to the four basic templates in respective species and conformations,
numbered as in row a; (d) final superpositions, enriched by flipping
the benzene rings by 180° for the MSMM situation (I; 200 molecules
= 31 compounds × 2 species × 2 skeleton conformations ×
1, 2, or 4 ring-flipping conformations), the MM situation (II; 100
molecules created as in situation I but with 1 species), and the standard
one-mode situation (III; 31 molecules).
DF-MSMM CoMFA Analysis
The analysis is based on the
correlation equation obtained by taking the logarithm of eq 3 (X = 0.5 because the EC50 values are modeled), with the association constant K expressed by eq 9, and the DF given by eq 10 (the exposure time t = constant)
combined with eqs 11 and 12. For both the receptor-binding and disposition parts, ionization
is included and the coefficients are optimized simultaneously.Optimization examines two forms of the DF, for the aqueous phases
(shown in eq 10) and membranes. If the receptors
are localized in a membrane and the compounds bind to them from the
hydrocarbon core of the bilayer, eq 10 needs
to be multiplied by Pβ (P is the reference partition coefficient and β is
the Collander coefficient, which is one of the optimized regression
coefficients). The water and membrane variants of eq 10 are initially used in fully expanded form: each of the coefficients A–D is replaced by eqs 11 and 12. The optimization proceeds with
gradual forward selection of energies in relevant grid points in the
CoMFA part, which is done separately for the two forms of the DF.
The better DF is selected on the basis of the statistical criteria
and further optimized by omitting the parameters with higher standard
deviations, if the elimination does not cause a significant decrease
in statistical criteria. One, or sometimes both, of the coefficients C and D, describing lipophilicity-dependent
and lipophilicity-independent elimination, can frequently be omitted.
In each pair of coefficients A–D (eq 11), the term for one species may or may
not be significant.The combinatorial search for the best DF-MSMM
CoMFA model was extensive
for three reasons: many possible combinations of relevant grid points
need to be examined, the DF can have multiple functional forms, and
many sets of the initial estimates of coefficients need to be tested
to avoid trapping in local minima. The best model was selected on
the basis of the r2 and q2 (for the leave-one-out cross-validation in the training
set, used to eliminate unstable models) values and the number of optimized
coefficients and and their standard deviations. For illustration,
there were 65 021 models with r2 ≥ 0.6, 11 142 models with r2 ≥ 0.8 and q2 ≥ 0.6, and
3752 models with r2 ≥ 0.9 and q2 ≥ 0.7, of which 25 models had less
than 11 coefficients, each with smaller than 60% standard deviation.The best DF-MSMM CoMFA model contained the DF for the compounds
interacting with the receptors in a membrane, with metabolism that
was either insignificant or invariant for the studied compounds (C = 0 and D = 0), and five required CoMFA
grid points. The final, pruned correlation equation, resulting from
eq 3 (X = 0.5), with the association
constant K expressed by eq 9 (the subscript i was omitted for fractions f), and the DF given by eqs 10–12, wasThe first term is an MSMM CoMFA expression
for the receptor binding (log K), and the remaining
terms describe disposition (log DF). The fractions of ionized and
nonionized molecules, f1 and f2, respectively, include the pKa values and the optimized pH value as shown in eq 12 and are identical in both receptor-binding and disposition
terms. The microscopic association constant K for each of the m considered
modes is represented by a k-summand. Each of the k-summands contains five identical adjustable coefficients, C1–C5, which,
for the selected grid points (coordinates in Table 3), multiply the energies X for the given modes. The addition of multiple modes does not
increase the number of optimized coefficients; it only changes the
form of the correlation equation by increasing the number of k-summands.
Table 3
Characterization of the Receptor-Binding
Part of the DF-MSMM CoMFA Model (Eq 1)
coordinates
(Å)b
coefficient
valuea
x
y
z
energyc
C1
0.538 ± 0.247
12.66
6.83
4.780
S
C2
–1.931 ± 0.711
6.668
10.836
–5.218
S
C3
–0.245 ± 0.108
2.668
6.836
–3.218
E
C4
–6.657 ± 2.873
4.668
16.863
–5.218
S
C5
0.116 ± 0.043
–3.330
14.836
–7.2184
E
Regression coefficients.
Grid point coordinates.
Energy types: S, steric, E, electrostatic.
Regression coefficients.Grid point coordinates.Energy types: S, steric, E, electrostatic.Nonlinear regression analysis resulting in eq 1 provided the following outcome for the best DF-MSMM
model. For the
receptor-binding expression, the results are summarized in Table 3. The optimal values of the disposition-describing
coefficients for the DF were A1 = (7.142
± 2.213) × 10–5, A2 = (2.636 ± 1.254) × 10–6, B1 = (9.989 ± 4.233) × 102, and pH 7.56 ± 0.32. The optimization of the Collander coefficient
β and the coefficient C0 did not
lead to a sufficient improvement, so their values were fixed as β
= 1 and C0 = 0. The description statistics,
calculated for the training set, consists of the sum of squares of
errors (SSE), where the error is the difference between calculated
and experimental values, and the squared correlation coefficient (the
coefficient of determination, equal to the percentage of explained
variance) r2 = 1 – SSE/SYY, where
SYY is the sum of squares of deviations of the experimental log K values from their average. The prediction statistics contains
similar indices calculated for the test set that was not used in the
development of the model: predictive sum of squares of deviations
between predicted and experimental values (PRESS) and the squared
correlation coefficient q2 = 1 –
PRESS/SYY. Table 4 summarizes the descriptive
(SSE and r2 for the training set) and
predictive (q2 and PRESS for the test
set) statistical criteria for the DF-MSMM CoMFA model (1) and for
the reduced models (2–13). For comparison, the random scrambling
of bioactivities (50 runs) produced no significant correlations, with
the average values of SSE = 11.12 and r2 = 0.250. Optimized eq 1 also provides the
affinity (log K) and disposition (log DF) components
of bioactivity (Table 1). A test set (Table 1, compounds 2, 11, 17, 20, 24, and 30)
covering the entire range of biological activities was selected before
optimization. The use of multiple test sets, which would be a preferred
way to examine the predictive abilities of the model, was precluded
by the extent of the optimization procedure. For compounds 17 and 30 with semiquantitative activities, the q2 and PRESS values (Table 4) were calculated using the upper limits, i.e., log(1/EC50) = 5. Because only the DF-MSMS CoMFA model correctly predicted log(1/EC50) < 5 (Figure 4b), the predictive
abilities are probably underestimated for the DF-MSMS CoMFA model
and overestimated for all other models.
Table 4
Statistical Indices Characterizing
CoMFA Models of Varying Complexitya
statistical indices
CoMFA
model
descriptive
predictive
no.
species
mode
no. of coefficients
SSE
r2
PRESS
q2
1
DF-MS
MM
5 + 5
0.902
0.923
1.356
0.815
2
MS
MM
11
2.336
0.801
3.161
0.569
3
SS: majorb
MM
11
2.476
0.789
2.995
0.592
4
SS: neutral
SM: closed
5 (3568)c
0.154
0.987
6.410
0.126
5
SS: neutral
SM: open
5 (3563)
0.361
0.969
6.070
0.173
6
SS: neutral
MM: averaged
5 (564)
0.400
0.966
6.498
0.114
7
SS: ionized
SM: closed
5 (3569)
0.155
0.987
6.686
0.089
8
SS: ionized
SM: open
5 (3584)
0.196
0.983
5.411
0.262
9
SS: ionized
MM: averaged
5 (572)
0.580
0.951
6.228
0.151
10
SS: majorb
SM: closed
5 (3570)
0.154
0.987
6.785
0.075
11
SS: majorb
SM: open
5 (3583)
0.325
0.972
4.818
0.343
12
SS: majorb
MM: averaged
5 (534)
0.064
0.995
5.897
0.139
13
SS: bounde
SM: bounde
5 (3581)
0.403
0.966
4.455
0.393
The models differ in the use of
disposition function (DF), multiple or single species (MS or SS),
and multiple or single modes (MM or SM).
Prevailing species in aqueous solution
used for each compound.
Number of latent variables shown,
with the number of all energies given in parentheses.
The energies of all considered modes
are averaged and analyzed by standard, one-mode CoMFA.
Prevalent bound species and mode
as determined by the DF-MSMM CoMFA model used for each compound.
Figure 4
Calculated and predicted activities plotted against experimental
data for the training set (a) and test set (b), generated with the
DF-MSMM (black), MSMM (blue), SSMM (red), and standard (green) CoMFA
models (models 1–3 and 11 in Table 4, respectively). In plot b, the two experimental values of log(1/EC50) = 5 are actually <5 (Table 1,
compounds 17 and 30), as indicated by the
arrows.
The models differ in the use of
disposition function (DF), multiple or single species (MS or SS),
and multiple or single modes (MM or SM).Prevailing species in aqueous solution
used for each compound.Number of latent variables shown,
with the number of all energies given in parentheses.The energies of all considered modes
are averaged and analyzed by standard, one-mode CoMFA.Prevalent bound species and mode
as determined by the DF-MSMM CoMFA model used for each compound.Calculated and predicted activities plotted against experimental
data for the training set (a) and test set (b), generated with the
DF-MSMM (black), MSMM (blue), SSMM (red), and standard (green) CoMFA
models (models 1–3 and 11 in Table 4, respectively). In plot b, the two experimental values of log(1/EC50) = 5 are actually <5 (Table 1,
compounds 17 and 30), as indicated by the
arrows.Table 4 demonstrates that
all features used
in the DF-MSMM CoMFA model are necessary for a good prediction. Omission
of the disposition leads to a decline in predictive abilities (PRESS
and q2), even if multiple species and/or
modes are considered (models 2 and 3). The decline is more severe
for standard, one-mode CoMFA models (4–13), including those
with the averaged multiple modes (models 6, 9, and 12) and even for
the prevalent bound species and modes (model 13), which were selected
by DF-MSMM CoMFA model 1. This fact indicates that the one-mode QSAR
solution may not exist. Although only this indirect, model-based indication
is available for the studied case, multiple modes were experimentally
proven to occur in many ligand–macromolecule complexes (see
the references in ref (57)). Interestingly, the descriptive abilities (SSE and r2) were great in all cases. This may, in part, explain
the use of 3D-QSAR methods without any description of disposition
for cell-level activities.It is worth mentioning that an exhaustive
examination of modes
and species for each compound in a standard CoMFA analysis, seeking
a one-mode solution, is technically not feasible. The 23 compounds
of the training set, each present in two species and two conformations
(for simplicity, the ring-flipping conformations are omitted), would
require 4[23] analyses or about 2.2 million
CPU years, if a single one-mode analysis only lasts 1 s. Therefore,
the MSMM approach, performing an extensive, albeit not exhaustive
search lasting several weeks on a few processors, compares favorably.The calculated bioactivities for the training set and the predictions
for the test set compounds are compared with experimental values in
parts a and b, respectively, of Figure 4. The
use of multiple species and modes and, especially, of the DF clearly
improves predictivity (PRESS and q2).
Descriptive statistical indices for calibration display a similar
trend, except for the excellent SSE and r2 values for the standard, one-mode CoMFA model, utilizing the full
set of energies. The standard model, however, did not confirm its
calibration quality in the prediction exercise (Figure 4b).
3D-QSAR Contour Map
The steric and electrostatic contours
for the DF-MSMM CoMFA model with various compounds embedded in the
background of overall MSMM superposition (Figure 3d, situation I) are displayed in Figure 5, along with representative ligand conformations. The contour map
shows that the binding site model is defined by a sterically favorable
region near the salicylamide ring, two sterically unfavorable regions
in the midsection of the binding site, an electronegative-favoring
region near the second ring of two- or three-ring compounds, and an
electropositive-favoring region at the lipophilic end of the molecules.
The absence of electrostatic regions around the salicylamide ring
may result from small differences among the studied compounds in this
substructure.
Figure 5
Contour maps of the DF-MSMM CoMFA model with embedded
superposition
of all 200 molecules to indicate the applicability space. Green and
yellow contours indicate regions that are sterically favorable and
unfavorable, respectively. Blue regions favor electropositive groups,
and red regions favor electronegative groups. The following ligands
are presented in ball and stick models: (a) compound 2 (Table 1) binding in ionized and open conformation,
(b) compound 2 binding in neutral and closed conformation,
showing the clash of the 3-Cl in the second ring with one of the sterically
unfavorable regions, (c) compound 18 with a shorter alkyl
chain, one of the strongest binders, (d) compound 19 with
the 3-acetamide group in the salicylamide ring occupying a sterically
favorable region, showing clashes of the long chain with one of the
sterically unfavorable regions.
Contour maps of the DF-MSMM CoMFA model with embedded
superposition
of all 200 molecules to indicate the applicability space. Green and
yellow contours indicate regions that are sterically favorable and
unfavorable, respectively. Blue regions favor electropositive groups,
and red regions favor electronegative groups. The following ligands
are presented in ball and stick models: (a) compound 2 (Table 1) binding in ionized and open conformation,
(b) compound 2 binding in neutral and closed conformation,
showing the clash of the 3-Cl in the second ring with one of the sterically
unfavorable regions, (c) compound 18 with a shorter alkyl
chain, one of the strongest binders, (d) compound 19 with
the 3-acetamide group in the salicylamide ring occupying a sterically
favorable region, showing clashes of the long chain with one of the
sterically unfavorable regions.The binding conformation in which the bulky chlorine
atom in the
second ring is closer to the salicylamide OH group and far from a
sterically unfavorable region (Figure 5a) is
preferred in 11 out of 15 compounds. Consequenty, the ring-flipped
conformation (Figure 5b) is not populated.
Compounds with lipophilic alkyl groups tend to have a wide range of
activities depending on the length of the alkyl chains. Compounds 17 and 18 (Figure 5c)
with six and eight methylene groups, respectively, bind in an extended
conformation and show the highest predicted binding affinities (Table 1, log K > 4), indicating the
optimal
chain length for binding. On the other hand, compounds with 12–14
methylene groups (Table 1, 1, 19–22) bind in a packed conformation (Figure 5d) that results in low affinities (Table 1, log K < 1.7), probably because
the folded chain protrudes into the sterically unfavorable regions.
For the same reason, the affinity of compound 25 with
16 methylene groups exhibits a further drop (log K ≈ 0.5). This reasoning could also apply to compound 23 showing only average affinity (log K ≈ 1.7), although its 4-tert-butylphenyl ring is similar
in length to the optimal n-hexyl chain in the extended
conformation (compound 17) but is much bulkier. The red
electronegative-favoring region in the midsection of the binding site
improves the affinity (log K) of 15 of 31 compounds
with an electronegative chlorine atom in the second ring, even though
their overall activities differ significantly because of varying disposition.
This interaction also causes the compounds with the electronegative
oxygen (2–11, 13–15, 27, 29) or carbonyl (12) linkers between the second and third rings to have a higher binding
affinity than compounds with other linkers (16, 26, 30).Two compounds, 19 and 31, were excluded
from optimization before its start because of singularities caused
by molecular parts protruding into subspaces which were not reached
by other compounds in the series. Singularity of compound 19 is caused by the 3-acetamide group in the salicylamide ring, which
reaches a spot in the vicinity of the sterically attractive region
(Figure 5d). The singular region may be an
open, water-filled space with no effect on affinity because the prediction
for compound 19 by the DF-MSMM model (log K + log DF = 5.324, Table 1) is close to the
experimental log(1/EC50) = 5.097. Compound 31 lacks an amine linker between its rings, and therefore, its molecule
extends into uncharted space. Compound 31 has a much
lower predicted activity (log K + log DF = 2.998)
than the actual measured log(1/EC50) = 6.461. This difference
indicates that there may be an attractive spot in the midsection of
the binding site which the model does not capture because of the lack
of molecules occupying this region.
Prevalences of Bound Species and Modes
The prevalences,
given by the Boltzmann distribution shown in eq 8 in the Methods, cannot be guessed a priori
because they are one of the outcomes of optimization. The prevalences
for the DF-MSMM CoMFA model (eq 8), with individual K values calculated as the
decadic terms in eq 1 with optimized values
of the coefficients C, are shown in detail in the Supporting Information and summarized here. A total of 16 of 31 compounds (1, 2, 8, 10, 11, 13, 15–18, 23, 25, 27, 28, 30, 31) bind predominantly in the closed conformation,
maintaining the intramolecular H-bond (Figure 1). The remaining compounds bind in the open conformation, exposing
the salicylamide’s polar groups, with at least 50% prevalences.
Compounds 15, 17, 18, and 25 bind 100% in the closed conformation, while compound 20 binds 100% in the open conformation.Most compounds
prefer to bind as ionized species. The exceptions are compounds 2, 4, 9, 17, and 18, for which the neutral species bind with 74%, 53%, 58%,
100%, and 100% prevalences, respectively. Figure 6 indicates that the species distributions of bound and free
molecules are quite similar for most compounds. For neutral species,
most compounds have both the fraction in water and binding prevalence
less than 10%. For ionized species, most compounds have the fraction
in water and binding prevalence as high as 90%. About a quarter of
the compounds (1, 2, 4, 5, 18, 19, 31), however,
exhibit significant differences between the fractions in water and
binding prevalences.
Figure 6
Speciation of free and bound molecules according to the
DF-MSMM
CoMFA model. Neutral and ionized species are shown in blue and red,
respectively, as the compound number (Table 1).
Speciation of free and bound molecules according to the
DF-MSMM
CoMFA model. Neutral and ionized species are shown in blue and red,
respectively, as the compound number (Table 1).The predominant binding species and binding conformations
are to
a large extent affected by the substituents in the salicylamide ring.
Strong electron-withdrawing groups promote ionization and also contribute
to higher activity: only 3 (4, 9, 31) out of 24 compounds with log(1/EC50) > 6
have
pKa > 6.5 (Table 1). However, lipophilic moieties including alkyl chains and two or
three ring systems also affect the binding as well as disposition,
so the structure–activity relationships are complex.
Binding of Alkyl-Chain Analogues
The strongest binders,
compounds 17 and 18 (Table 1, log K > 4) with a 3-formylamino group
in
the salicylamide ring predominantly bind as neutral species in the
closed conformation. The shorter alkyl chains may allow sufficient
conformational freedom for these two compounds to make the best possible
interactions in the binding site, leading to high binding affinities.
Long-chain analogues (compounds 19–22 and 24, Table 1) predominantly
bind in ionized open conformations or sometimes in ionized closed
conformations, as seen for compounds 1 and 25. All long-chain analogues exhibit low to average binding affinity,
possibly because of steric clashes of long chains in packed conformations
(Figure 5d) with the sterically unfavorable
regions in the midsection of the binding site.
Binding of Two- and Three-Ring Analogues
Compounds
with a 2-NO2 or 5-NO2 group in the salicylamide
ring bind predominantly as ionized species, either in open or closed
conformations, depending upon the presence of a 3-Cl substituent in
the second ring (the absence of which seems to be compensated by the
presence of a 2-Cl or 4-CF3 substituent in the third ring).
Compounds 3, 6, 12, 14, 26, 28, and 29 with a 3-Cl substituent
in the second ring mainly bind in the open conformation, although
compounds 12 and 29 bind about equally in
the closed conformation as well, and compound 28 shows
slightly higher preference for the closed conformation. This variation
is possibly affected by the presence of a carbonyl linker group between
the second and third rings in compound 12 and 4-SCF3 and 3-CF3 groups in the third ring of compounds 28 and 29, respectively. Compounds without a
3-Cl substituent in the second ring (8, 11, 13, 23, 27, and 30) bind predominantly as ionized species and show higher preference
for the closed conformation. The absence of steric hindrance by a
chlorine substituent in the second ring allows these compounds to
assume the preferred closed conformation with an intramolecular H-bond
between the amideNH and phenoxide ion. Compound 16 (with
a 5-NO2 group in the first ring and a 3-Cl substituent
in the second ring) prefers to bind in the closed conformation, possibly
because of the presence of a thioether linkage between the second
and third rings, resulting in a longer linker. The most active in
this series, compound 7 with a 5-CN substituent in the
salicylamide ring, binds predominantly as an ionized species in the
open conformation.
Influence of Disposition
The disposition of antimycins
in female worms of Dipetalonema viteae after 120 h of exposure[52] can be visualized
for individual compounds as the difference between the experimental
log(1/EC50) values and the log K values
(Table 1) calculated from the affinity part
of the DF-MSMM CoMFA model (eq 1). These values
are plotted as a function of the drugs’ lipophilicity (log P) and acidity (pKa) in Figure 7, along with the surface given by the disposition
part of the DF-MSMM CoMFA model (eq 1) with
optimized coefficients. The shape of log DF(P,Ka) is remarkably nonlinear: it is essentially
composed of interconnected planes positioned at varying angles. No
further curvatures are expected, according to the model, as the planes
extend in all four directions to the limit property values, so this
conceptual function safely predicts outside of the property ranges.[33] Disposition significantly modulates the bioactivity
of antimycins by ∼5 log(1/EC50) units. Using the
correct form of the DF for the correlations is essential, because
its peculiar shape (Figure 7) is difficult
to emulate by polynomials, which are too flexible to approximate the
planar areas.[33]
Figure 7
Dependence of the pseudoequilibrium,
membrane-based DF on lipophilicity
and acidity, as given by the disposition part of eq 1 with the optimized coefficient values listed in the text.
The data for best binders with poor disposition (compounds 17 and 18, Table 1) are shown as
open points.
Dependence of the pseudoequilibrium,
membrane-based DF on lipophilicity
and acidity, as given by the disposition part of eq 1 with the optimized coefficient values listed in the text.
The data for best binders with poor disposition (compounds 17 and 18, Table 1) are shown as
open points.The resulting fit (eq 1,
Table 3 and accompanying text) can be interpreted
as follows. The
variation in disposition is caused by pseudoequilibrium partitioning
of ionizable compounds. The hypothetical receptors are localized in
a membrane, with the drugs accessing the binding site (the CoMFA model
in Figure 5) from within the bilayer. The optimized
coefficients are associated with the processes that determine drug
disposition: A, with bilayer accumulation and nonspecific
protein binding; B, with accumulation in aqueous
phases, with subscripts 1 and 2 indicating the relatedness to the
nonionized and ionized species, respectively. The coefficient B2 was not necessary for a good fit, indicating
that accumulation of ionized molecules in the aqueous compartments
did not cause significant variation in overall disposition. The exponent,
coming from eq 10 and containing coefficients C and D describing metabolism, need not
be used in eq 1, meaning that C1 = C2 = D1 = D2 = 0. The model indicates
that the pseudoequilibrium disposition of the studied compounds was
not affected by metabolism, which could be either negligible or similar
for all compounds.The internal pH value in terms f1 and f2 (eq 12) was optimized
as pH 7.56 ± 0.32. This high value can normally be encountered
only in one part of the cell: in the matrix of mitochondria. Therefore,
the fit indicates that the hypothetical receptors are localized in
the bilayer that is in contact with the mitochondrial matrix. This
indication agrees with the current knowledge about antimycin A’s
mechanism of action, which focuses on inhibition of electron transport.[57,66−69] Antimycin A binds at the ubiquinone binding site Qi of
the complex that is located on the matrix side of the cristae membrane.[70]Notably, the disposition function provides
clear guidelines for
the optimization of the drug structure for the best disposition. The
top plateau in Figure 7 indicates the range
of log P and pKa values
that lead to the maximum drug disposition in the receptor surroundings.
The region of the maximum disposition can be defined as pKa ≤ log P for log P ≤ 6 and pKa ≤ 6 for log P > 6. The knowledge of suitable combinations of log P and pKa values provides unique
clues for optimization of drug structure, also because the lower log P and pKa values will lead to
increased aqueous solubility.
Factoring Bioactivity into Affinity and Disposition Components
Equation 1 defines two contributions to the
affinity: log K (the first term) and disposition
(log DF; the remaining terms). The two contributions were enumerated
from the calibrated DF-MSMM CoMFA model (eq 1 and the accompanying text, Table 3), summarized
in Table 1, and plotted against the experimental
antifilarial activities (Table 1) in Figure 8.
Figure 8
Dissection of affinity (log K; red symbols)
and
disposition (log DF; blue symbols) contributions to overall bioactivity
by the calibrated DF-MSMM CoMFA model. Stars mark the best binders 17 and 18 with poor disposition. All data are
given in Table 1.
Dissection of affinity (log K; red symbols)
and
disposition (log DF; blue symbols) contributions to overall bioactivity
by the calibrated DF-MSMM CoMFA model. Stars mark the best binders 17 and 18 with poor disposition. All data are
given in Table 1.Figure 8 and Table 1 show that two weakly active compounds (17 and 18) are actually the best binders, which are held
back by
poor disposition. Their disposition can be, in principle, increased
by 4–5 logarithmic units, leading to overall bioactivities
approaching log(1/EC50) ≈ 9.5, which is almost 2
orders of magnitude better than that of the most active compound 7. Disposition is easier to optimize than binding because
the former depends on lipophilicity and acidity, which are well understood
in terms of drug structure. The DF-MSMM CoMFA model, and cell-QSAR
models in general, provides unique clues for optimization of the compounds.
Conclusions
The cell-QSAR concept adapts ligand-based
and receptor-based 3D-QSAR
approaches for use with the bioactivities measured in cells or more
complex systems. The adaptation consists of the extension of the correlation
equation specific for the given 3D-QSAR method by the DF. The DF describes
changes in the concentration of compounds in the receptor surroundings
originating from membrane accumulation, binding to nonreceptor proteins,
hydrolysis, and possibly other processes. The cell-QSAR concept was
used here with the popular ligand-based CoMFA method. A conceptual
combination of the DF with our MSMM extension of the CoMFA model was
applied to antifilarial activities of antimycin A analogues, representing
a part of the difficult-to-model Selwood data set that is often used
as a test case for assessment of new variable-selection algorithms.
The fitting results are very satisfactory and show that each component
of the model (DF, MS, MM) improves the predictive ability. The optimized
form of the disposition function indicates that the compounds bind
to the hypothetical, membrane-bound receptor in the cristae bilayer
of mitochondria and the differences in their metabolic rates are negligible.
The perceived complexity of the Selwood data set may be explained
by the nonlinearity of the model in lipophilicity and acidity in combination
with the lack of proper descriptors for acidity in the original set,
the peculiar intramolecular H-bonds in the salicylamide moiety that
differ for neutral and ionized species, and the structure-specific
receptor binding. The results also illustrate the utility of the cell-QSAR
approach in medicinal chemistry. Specifically, the cell-QSAR approach
delineates the affinity and disposition contributions to bioactivity,
which provide for a straightforward, separate structure optimization
for affinity and disposition. Although further testing is necessary,
the cell-QSAR concept seems to be a promising direction in the structure-based
or ligand-based predictions of bioactivities in cellular or more complex
systems and drug structure optimization. For larger and more diverse
data sets, better superposition techniques and methods for prediction
of species fractions will be necessary. The model-based nature opens
the cell-QSAR concept for incorporation of any future improvements
in 3D-QSAR techniques and the disposition function, as well as new
pharmacology and pharmacodynamic scenarios.
Methods
Pharmacokinetics (PK), Pharmacology (PL), and Pharmacodynamics
(PD) Phases of the Drug Effect in QSAR
The PK phase is described
by the DF. The simplest PL/PD scenario consists of a situation when
the PL phase is represented by the 1:1, fast, and reversible ligand–receptor
interaction and the PD phase describes the biological effect, which
is a direct and immediate consequence of the drug–receptor
interaction and is proportional to the fraction of occupied receptors.
The ligand–receptor association constant K is defined using the concentrations (denoted by square brackets)
of the ligand (L), receptor (R), and complex (LR) as K = [LR]/([L][R]). In a cellular system, the free ligand concentration
[L] varies because of the disposition, and consequently, all three
concentrations are time-dependent. Usually, the concentrations are
sufficiently low to approximate activities. If the concentration of
the free receptors [R] is expressed as the difference between the
total receptor concentration [R]0 and [LR], the fraction
of occupied receptors is given as [LR]/[R]0 = K[L]/([L] + 1) = E/Emax. The latter equality comes from the simplest PD phase, describing
the primary effect E as being proportional to the
fraction of occupied receptors, [LR]/[R]0. Secondary effects
lacking this simple tie to the receptor modification require more
complex PD models.[71] Nevertheless, the
above scenario applies to many drug effects caused by competitive
enzyme inhibition or receptor binding. Since the receptor-bound drug
represents only a small drug fraction in the body, the disposition
is not significantly affected by the receptor binding, and the ligand
concentration in the receptor surroundings, [L], can be expressed
using the DF with properties p and time t as variables and the initial ligand concentration c0:[72]To obtain a PK/PL/PD expression, [L]
in the above expression for E/Emax is replaced by the product DF(p,t) c(t), where c(t) are the
isoeffective drug concentrations eliciting the effect, which is equal
to the fraction X of the maximum effect. With E/Emax = X,
the bioactivity, characterized by c(t), is described as[72]Equivalent PK/PL/PD dependencies were derived
for the irreversible drug–receptor interactions, either single
or preceded by a fast, reversible step.[72]Equation 3 and equivalent PK/PL/PD relationships
provide the blueprint for a conceptual combination of the disposition
function with any ligand-based or receptor-based 3D-QSAR method for
prediction of binding affinities. The K term in eq 3 needs to be replaced by a suitable conceptual 3D-QSAR
correlation equation, and the disposition function can be selected
from the available repertoire.[30,32] This cell-QSAR concept
represents the recipe for the formulation of conceptual QSARs for
cell-based assays and other complex systems.
Multispecies Multimode Equilibria
Studied antimycin
analogues ionize under the conditions of the experiments, and each
species forms different intramolecular H-bonds. The molecules are
flexible, and each species can bind to the receptor R in different
orientations or conformations (modes). The generalized multispecies
multimode binding equilibria are outlined in the following equation:Each binding equilibrium is characterized
by the microscopic association constant K (eq 5) and represents the
receptor binding of the ith ligand in the jth species and the kth binding mode. In
principle, the number of species s and the number
of binding modes m can differ for individual ligands.
In the studied case, there are only two species (s = 2, nonionized and ionized, with charges 0 and −1, respectively)
for each ligand (Table 1). The number of modes
is unknown, since there is no experimental information available on
either the receptor or the complex, and can differ for individual
ligands. In the general expressions, s and m represent the maximum numbers of species and binding modes
for the tested ligands. The missing species or modes of some ligands
will have zero fractions f or zero association constants K (see below).The microscopic association
constant for the ith molecule’s jth species in the kth binding mode isUsually, only the total association constant
of the ith ligand (K) is determined, although the microscopic association
constants K are the
relevant quantities to be correlated with the structure. To obtain
a conceptual correlation equation for K, it must be expressed as a function of K.The experimentally measured K reflects the total concentration
of the ligand/receptor complexes
in any species and any binding mode. A series of rearrangements to
the definition of K provides
the desired function of K, shown in eq 6: (1) the total concentration of the complex
[LR] is expressed as the sum of bound
concentrations of individual species, (2) each summand is formally
multiplied by [L]/[L] to introduce the fraction of each species, f = [L]/[L], and (3) the bound concentration
of each species [LR] is broken down
to individual binding modes. The relationship between the microscopic
and total association constants is obtained by expressing the m-summands in eq 6 using eq 5:Here f is the fraction of the jth molecular species
in the ith compound in solution. For compounds present
in the solution as one species, the right-hand side of eq 7 is equal to the k-summation. In this case,
eq 7 is in accordance with published analyses
of formally analogous situations: the statistical thermodynamic[73] and equilibrium[58,59] treatment
of multimode binding in ligand/protein interactions and kinetic analyses
of a reversible unimolecular reaction leading to different products[74] or isomers.[75] The
expressions for the fractions f of the jth species can be derived from the
definition of the ionization constants.Only one molecule of
a compound, in any species and any mode, can
be bound in each complex. The distribution of species and modes in
the bound state is characterized by their prevalences. For the kth mode of the jth bound species for the ith compound, the prevalence isThe numerator and denominator
come from eqs 5 and 6,
respectively. The third quasi-equality ensures that the sum of all s × m prevalences equals unity. The
prevalence of the jth bound species in all modes
and the prevalence of the kth bound mode in all species
can easily be calculated using eq 8.For
the application in CoMFA, eq 7 needs
to be logarithmized and the log K values
must be replaced by the correlation equation for CoMFA:Here C are the
regression coefficients characterizing the weights of the ligand–probe
interaction energies X in individual cross-sections
of a grid surrounding the superimposed ligands. The subscripts indicate
the relatedness to the ith compound, jth species, kth mode, and lth grid
point, and g is the total number of grid points.
Other contributions to the binding free energy (internal ligand energy,
binding entropy, desolvation) can be added to the summation in the
exponents. In this case, we only used the CoMFA energies because the
size of the used data set did not allow a rigorous optimization of
more than a dozen coefficients.
Intracellular Disposition
The pseudoequilibrium DF
expresses the time course of the concentration of nonionized ligand
molecules in the aqueous phases of the studied biosystem during the
elimination period:Here P is the reference partition
coefficient, β is the empirical Collander coefficient, and t is the exposure time. The terms A, B, C, and D characterize
the key fate-determining processes the ligand molecules undergo in
the biological system: A, accumulation in membranes/protein
binding; B, distribution in aqueous phases; C, lipophilicity-dependent elimination; D, lipophilicity-independent elimination. For the receptors localized
in the bilayer, eq 10 needs to be multiplied
by Pβ.If compounds are present
in the aqueous phase in the receptor surroundings in s species created by ionization, tautomerism, or carbonyl hydration,
the model suggests that each of the parameters A, B, C, and D (represented
by Y) from eq 10 be expanded
as[76]The fractions f of individual
species are the same quantities as in eq 9,
except the subscripts i, indicating the compound,
are omitted. The subscripts j denote, as in eq 9, the quantities associated with the jth species. The fractions of nonionized, neutral molecules (subscript j = 1) and ionized molecules with an overall charge of −1
(subscript j = 2) are given by the definition of
the dissociation constant Ka for the given
pH of the medium and can be written, respectively, asGenerally, on the basis of the probable
intracellular localization
of the receptors, a reasonable intracellular pH range is 5–8,
the lower limit being observed in lysozomes and the upper limit in
the mitochondrial matrix. The pH value of the cultivation medium could
affect these values, but the exact pH of the assay was not published.[52] For this reason and because the receptor localization
is unknown, the pH value in eq 12 was treated
as one of the adjustable coefficients.
Data Set
The Selwood data set includes in vitro filaricidal
activities against D. viteae of 31
antimycin analogues (Table 1), which were determined
by monitoring the leakage of incorporated radiolabeled adenine from
intact female D. viteae after 120 h
of exposure.[36,52] The EC50 values for
adenine leakage were calculated after 120 h of exposure to a range
of drug concentrations at 37 °C under neutral conditions.[52]The 1-octanol/water partition coefficients P were calculated by the ClogP program in Bio-Loom[51] software for neutral molecules and are summarized
in Table 1. The values differ from the original
data[77] because of the developments in the
ClogP software. The differences are within ±0.25 log unit, except
for compounds 6 (Table 1, log Pold – log Pnew = 1.526), 10 (−0.770), 12 (−0.638), 14 (−1.654), and 26 (0.325). The log P values were used with the belief that they properly reflect
the relative lipophilicity of the studied compounds, although the
values of log P > 7 were marked in the ClogP output
as unrealistically high. Ionization decreases the log P values by several units, and most compounds were present mainly
as ionized species under physiologic conditions.Our multispecies
approach can take into account tautomers and hydrates
of aldehydes and ketones; hence, formation of these species was examined
using the SPARC Web server[50] and was found
negligible for all compounds on the basis of the following reasoning.As shown in eq 7, the binding contribution
of each species is given by the product of the association constant K (represented by the sum
of the association constants K for individual modes[61]) and the
species fraction f.
The fraction limit for a species to be included in the correlation
was set to 10–4 on the basis of the following considerations.
The interaction of a rare species with the receptor can release about
2–3 kcal/mol more free energy than that of the prevalent species
because both species bind by weak interactions and differ in a single,
well-defined structural feature. For this situation, the product fK for the rare species would be less than 1% of the
same quantity for the prevalent species.
Superposition
Pharmacophore models were built using
the Galahad procedure[54,55] in Sybyl-X. The ligands are first
aligned flexibly to each other in torsional space, and then the conformations
produced are rigidly aligned in Cartesian space. Pharmacophore and
steric bitmaps (TUPLET[78]) representing
H-bonding and steric terms, respectively, along with the total ligand
energy, serve as input to a multiobjective fitness function. Produced
pharmacophores indicated that the salicylamide moieties of all compounds
in all species are superimposed in all cases. In the absence of structural
and other information on the receptor, the most active and, at the
same time, one of most rigid analogues, compound 7 (Figure 2, Table 1), was selected
as the template for superposition.Conformational analysis was
performed using the Tripos force field in Sybyl[79] with an increment of 10° for all considered torsions.
When the molecule forms an intramolecular H-bond (Figure 1), either as a neutral species (OH···O=C)
or as an ionized species (NH···O–), the first torsion Φ1 is fixed and only Φ2, Φ3, and Φ4 are allowed
to rotate in conformational analysis. In the opposite case, when analogues
have no intramolecular H-bond, all four torsions in Figure 2 are considered rotatable. The approximate minimum
conformations were further optimized and Mulliken charges were calculated
in Jaguar[80] using the DFT/B3LYP method
with a 6-31G** basis set. Charge and spin multiplicity were entered
manually for different protonation states. In this way, the conformations
of the template for the superposition were obtained. Open conformation,
the mode without the intramolecular H-bond has the same optimal geometries
for both neutral and ionized species.In the MM approach, the
user is allowed to enter several possible
bound conformations of the template in one optimization run. The MSMM
approach extends this option to all species which are present in the
solution surrounding the receptor. The four basic template molecules
of compound 7 were superimposed by the atom fit method
in Sybyl (Figure 3a). All studied compounds
do not share a common skeleton; therefore, for each compound, both
species in all modes were superimposed on the templates using the
FlexS procedure[65] as incorporated in Sybyl
(Figure 3c). FlexS combines conformational
search with alignment in two steps. First, a base fragment of the
ligand is selected and placed on the template. Second, as the rest
of the ligand parts are incrementally built, torsion angles are perturbed
to generate new conformations, and each conformation is scored for
superposition quality. The obtained conformations of all compounds
were minimized by Jaguar using the DFT/B3LYP method[81] with a 6-31G** basis set[82,83] with all torsions
fixed. The Mulliken charges[84] were calculated
for the optimized geometries. The superposition was repeated, and
the resulting conformations were used in the CoMFA analyses.
Interaction Energies
Steric and electrostatic interaction
energies X with an
sp3carbon probe with a +1 charge were calculated for each
compound (subscript i) in each species (subscript j) and each binding mode (skeleton conformations ×
ring-flipping conformations, subscript k). The probe
was placed at each lattice intersection (subscript l) of a 3D grid (2 Å in each coordinate direction). The grid
has the following coordinates: −7 to +18 Å along the x-axis, 1–21 Å along the y-axis,
and −15 to +11 Å along the z-axis. An
energy cutoff of 30 kcal/mol was used to cap the maximum used energies.
Regression Analysis
The correlation equation is obtained
by taking the logarithm of eq 3, in which the
association constant log K is replaced by eq 9 and the disposition function DF(p,t) is given by eq 10, as
combined with eq 11, in which the fractions
are expressed by eq 12. The receptor-binding
and disposition components have, from the optimization viewpoint,
quite different properties: eq 9 has only one
form, and the descriptive variables need to be selected from a large
pool, while eqs 10–12 contain only two variables, which are organized in one of two possible
functional forms, depending upon the location of the receptors either
in the aqueous phase or in membranes. For the receptor-binding part
(eq 9), the coefficients C are associated with the grid points and
are independent of the species and modes. Therefore, the final set
of optimized coefficients is independent of the numbering of individual
species and modes. Notably, the inclusion of multiple species and
multiple modes does not increase the number of optimized regression
coefficients C.The correlation equation is nonlinear in the coefficients C in eq 9 and the coefficients A–D, pH and β in eqs 10–12. All statistical procedures[61] were
written in C programming language and integrated in Sybyl through
the Sybyl Programming Language (SPL) scripts. The choice of one of
the two implemented optimization methods depends on the total number
(N) of coefficients C in the calibrated
model and on the number of compounds (n). Nonlinear
regression analysis is applied when N ≤ n/2. Otherwise, the developed software automatically switches
to the PLS analysis with a linearized form[61] of eq 9. The second option was not used for
optimization of the best models. Both optimization methods are sensitive
to good initial estimates. Therefore, a forward-selection procedure,
conceptually similar to that used before for multimode analysis,[61] was a rational choice.Before analysis,
the X variables (energy
columns) were sorted using the following
criteria: (1) high and sustained variability, (2) the minimal number
of colinear columns with similar information, and (3) even distribution
of selected grid points around the ligands. Sustained variability
means that the energy values cover the whole interval more or less
evenly and are not clustered. An extreme case is the situation when
all energies in a given column are equal (usually zero) and only one
mode has a different energy. These singularities originate when only
one molecule protrudes into a subspace and other molecules do not
affect the probe interaction energies in that subspace. Singular molecules
need to be eliminated from the analysis because they do not provide
statistically significant information about the poorly covered subspace.The model calibration can be steered by user-defined inputs, for
which the default values are shown in parentheses. The process starts
with a quick scan through the models consisting of a few (four) variables
which are randomly selected from the top variables (10%). The initial
values of the coefficients (±0.1) are systematically evaluated
by a grid search. In this step, no optimization is applied and the
correlation equation is evaluated for the initial coefficients. For
the best models (top 10% r2), the coefficients
are optimized. At each of the following steps, a reduction of the
number of coefficients is attempted: the coefficients, which do not
lead to a decrease in the r2 value, are
eliminated. The best models (top 5% r2) are developed by a gradual addition of groups of a few (three)
variables until all variables from the selected set (top 10%) are
used. The procedure is finalized by fine-tuning of the best models
(top 5% r2) by addition of single variables.
The best model is selected on the basis of the following criteria: N, r2, q2 for the leave-one-out cross-validation (applied to the training
set only to eliminate unstable models), and the standard errors of
the parameters.