Unraveling the relation between the chemical structure of small druglike compounds and their rate of passive permeation across lipid membranes is of fundamental importance for pharmaceutical applications. The elucidation of a comprehensive structure-permeability relationship expressed in terms of a few molecular descriptors is unfortunately hampered by the overwhelming number of possible compounds. In this work, we reduce a priori the size and diversity of chemical space to solve an analogous-but smoothed out-structure-property relationship problem. This is achieved by relying on a physics-based coarse-grained model that reduces the size of chemical space, enabling a comprehensive exploration of this space with greatly reduced computational cost. We perform high-throughput coarse-grained (HTCG) simulations to derive a permeability surface in terms of two simple molecular descriptors-bulk partitioning free energy and pK a. The surface is constructed by exhaustively simulating all coarse-grained compounds that are representative of small organic molecules (ranging from 30 to 160 Da) in a high-throughput scheme. We provide results for acidic, basic, and zwitterionic compounds. Connecting back to the atomic resolution, the HTCG predictions for more than 500 000 compounds allow us to establish a clear connection between specific chemical groups and the resulting permeability coefficient, enabling for the first time an inverse design procedure. Our results have profound implications for drug synthesis: the predominance of commonly employed chemical moieties narrows down the range of permeabilities.
Unraveling the relation between the chemical structure of small druglike compounds and their rate of passive permeation across lipid membranes is of fundamental importance for pharmaceutical applications. The elucidation of a comprehensive structure-permeability relationship expressed in terms of a few molecular descriptors is unfortunately hampered by the overwhelming number of possible compounds. In this work, we reduce a priori the size and diversity of chemical space to solve an analogous-but smoothed out-structure-property relationship problem. This is achieved by relying on a physics-based coarse-grained model that reduces the size of chemical space, enabling a comprehensive exploration of this space with greatly reduced computational cost. We perform high-throughput coarse-grained (HTCG) simulations to derive a permeability surface in terms of two simple molecular descriptors-bulk partitioning free energy and pK a. The surface is constructed by exhaustively simulating all coarse-grained compounds that are representative of small organic molecules (ranging from 30 to 160 Da) in a high-throughput scheme. We provide results for acidic, basic, and zwitterionic compounds. Connecting back to the atomic resolution, the HTCG predictions for more than 500 000 compounds allow us to establish a clear connection between specific chemical groups and the resulting permeability coefficient, enabling for the first time an inverse design procedure. Our results have profound implications for drug synthesis: the predominance of commonly employed chemical moieties narrows down the range of permeabilities.
The passive permeation
of small molecules across lipid membranes
offers not only physicochemical insight but also crucial pharmaceutical
information about drug–membrane thermodynamics.[1] It probes the time scale of translocation due to a concentration
gradient of the drug, without active cellular mechanisms (Figure ). A detailed understanding
of the underlying structure–property relationships between
drug chemistry and passive-permeation thermodynamics, though of great
interest for drug development, is still lacking.
Figure 1
From left to right: coarse-graining
reduces the size of chemical
space, such that many small molecules of similar size and hydrophobicity
are mapped to the same representation.[5] For each molecule, we model its passive translocation across a lipid
bilayer (water not shown for clarity). The thermodynamics of the system
is characterized by the potential of mean force (PMF), evaluating
both the neutral and charged species, shifted according to the compound’s
pKa. The major dependence of the PMF on
the water/octanol partitioning and the pKa motivates these as molecular descriptors to construct a permeability
surface (eq ). These
two molecular descriptors, also highlighted in red, are experimental
quantities directly fed into the physics-based simulations to yield
a parameter-free estimation of the permeability coefficient.
From left to right: coarse-graining
reduces the size of chemical
space, such that many small molecules of similar size and hydrophobicity
are mapped to the same representation.[5] For each molecule, we model its passive translocation across a lipid
bilayer (water not shown for clarity). The thermodynamics of the system
is characterized by the potential of mean force (PMF), evaluating
both the neutral and charged species, shifted according to the compound’s
pKa. The major dependence of the PMF on
the water/octanol partitioning and the pKa motivates these as molecular descriptors to construct a permeability
surface (eq ). These
two molecular descriptors, also highlighted in red, are experimental
quantities directly fed into the physics-based simulations to yield
a parameter-free estimation of the permeability coefficient.Structure–property relationships
are often tackled by means
of high-throughput screening experiments: (i) a large number of compounds
are probed with respect to the property of interest by individual
measurements or calculations; and (ii) the relationship between structure
and property is empirically learned by means of statistical algorithms.[2−4] While structure–property relationships thus formally rely
on both the breadth and quality of the data, as well as the accuracy
of the statistical model, the common bottleneck in the pharmaceutical
sciences often arises from the former.Even though in vivo techniques
probe drug–membrane interactions
in all the intricacies of the cellular environment, the experimental
cost and complexity make them poorly suited for high-throughput screening.[6] It is instead the development of in vitro techniques
that have helped in expanding passive-permeation databases.[7,8] Unfortunately, limited aggregate data has been made publicly available
thus far. The resulting statistical models—such as quantitative
structure–property relationships (QSPRs) or machine learning—typically
rely on 102–104 data points only.[9−11] The question follows: how representative can these samples be, when
the size of drug chemical space is estimated at 1060?[12] The tendency of these statistical models to
depend significantly on individual outliers strongly suggests overfitting—these
models lack transferability across chemical space.[10] The small-data-set problem is typically aggravated by the
compounds’ poor diversity.[13]As a complementary approach to experimental measurements, physics-based
modeling provides a robust strategy to predict passive permeation
in silico.[10,14] The inhomogeneous solubility–diffusion
model[15,16] considers the concentration gradient of
a solute molecule across an interface to yield a permeability coefficient, P.[17] This results in a spatial
integral normal to the interface, z, of the potential
of mean force (PMF), G(z), and local
diffusivity, D(z):with β = 1/kBT. Equation highlights
the two key parameters that contribute to the
rate of passive permeation of a compound: its hydrophobicity, quantified
by the PMF, together with the local diffusivity. Practically, G(z) and D(z) are commonly extracted from enhanced-sampling classical molecular
dynamics simulations. Grounding the problem within the statistical
mechanics of a concentration flux diffusing through an interface combined
with conformational sampling from physically motivated force fields
can offer unprecedented insight. We stress that current experimental
techniques have yet to resolve G(z)—computer simulations thus remain the gold standard to estimate eq . Unfortunately, adequate
conformational sampling remains computationally daunting at the atomistic
level, even for a small rigid molecule crossing a single-component
lipid membrane: roughly 105 CPU-hours per compound limit
this strategy to up to ∼10 different molecules per study.[18−21] When combined with the overwhelming size of chemical space, these
figures hinder short-term prospects of running atomistic simulations
at high throughput, thereby hampering the elucidation of the underlying
structure–property relationships.Structure–property
relationships effectively project down
chemical complexity on a few molecular descriptors that map to the
property of interest.[22,23] Inferring these maps typically
relies on a statistical analysis over many measurements, identifying
a smooth (i.e., low-dimensional) connection between
structure and property. In this work we propose an alternative strategy:
rather than smoothing this connection a posteriori, we enforce it
a priori. We still rely on physics-based models but reduce their resolution
to efficiently interpolate across chemistry, while ensuring accurate
thermodynamics by construction. This enables a high-throughput approach
for two reasons: (i) the reduced representation significantly speeds
up every simulation, and (ii) the interpolation across chemistry effectively
reduces the size of chemical space. Solving the structure–property
relationship problem for the reduced model proves significantly more
tractable, and allows the identification of a permeability surface
as a function of simple molecular descriptors. We further connect
back to the original problem by means of a large-scale analysis of
our predictions.The above-mentioned reduced models, better
known as coarse-grained
(CG) models, lump together several atoms into a bead.[24,25] While defined in terms of fewer degrees of freedom, coarse-graining
remains physics-based and can be combined with rigorous free-energy
calculations. Here, we rely on the CG Martini model, which has shown
useful to simulating a wide variety of biomolecular systems.[26−28] In Martini, a small set of bead types encodes how small organic
fragments partition between solvents of different polarity, thus ensuring
robust thermodynamics at complex interfaces,[28] while cutting down the computational costs by 3 orders of magnitude.[29]The parametrization of a molecule at the
CG level thus consists
of a collection of Martini beads, each representing a specific chemical
group. Constructing a CG molecule can be streamlined into a systematic
procedure,[30] so as to emulate the molecule’s
overall shape and hydrophobicity. The small set of bead types leads
to a degeneracy in the representation: many molecules of similar shapes
and hydrophobicity map to the same CG parametrization (Figure ). Such a many-to-one mapping
generates a significant reduction in the size of chemical space—further
lowering the computational investment by an additional 103–104. The few bead types involved lead to a dramatic
reduction in the combinatorial explosion of chemistry, easing the
construction of all CG small molecules up to a certain
size (Figure ).[5] This addresses the poor-diversity issues that
synthetic databases typically face, facilitating a representative
coverage of subsets of chemical space projected primarily along size
and hydrophobicity. Recently, these properties allowed us to predict
the PMF of drug–membrane partitioning for an unprecedented
511 427 small molecules—several orders of magnitude
beyond what was previously available.[5] In
terms of accuracy we showed that the CG model achieves a mean absolute
error of 0.8 kcal/mol to predict bulk water/octanol partitioning free
energies,[30] translating to a 1.4 kcal/mol
error along a PMF.[5] We further stress that
these errors are evaluated across a significant subset of the chemistry
of small organic molecules, while atomistic results are too scarce
to make such estimates. For the present work, our error estimates
roughly translate to an accuracy of 1 log10 unit in the
permeability coefficient, validated across an extensive set of structurally
distinct compounds against both atomistic simulations and experimental
measurements (see the Supporting Information (SI)).In this work, we use high-throughput coarse-grained
(HTCG) simulations
to cover a subset of chemical space both efficiently and broadly.
Unlike conventional high-throughput screening protocols that require
an arbitrary selection of compounds, we consider all coarse-grained representations up to a threshold size, mapping to
most small organic molecules ranging from 30 to 160 Da. This comprehensive
exploration allows us to systematically investigate the effect of
hydrophobicity and pKa on the permeation
rate (Figure ), unlike
previous studies limited to a handful of compounds.[18−21] Our methodology offers a unique
approach to construct a two-dimensional surface describing the permeability
of a small molecule across a lipid membrane (Figure ). The molecular descriptors are here motivated
by the physics of the permeation process, i.e., the interplay between
diffusivity and solubility (eq ). Because the diffusivity was shown to be rather insensitive
to chemical detail,[18] we focus on the potential
of mean force, G(z). We have recently
shown that the key features of G(z) can be reconstructed simply from the bulk partitioning free energy.[5] By further accounting for the contribution of
different protonation states, we also express the permeability surface
in terms of its acid dissociation constant in water, pKa. In the following we focus on acidic and basic compounds,
while the SI further discusses zwitterions.
These surfaces allow for a rapid, simulation-free prediction of drug permeability starting from key molecular properties.
The accuracy is roughly on par with explicit CG simulations because
of compensating errors between the two methods.Extracting permeability
surfaces from the CG simulations allows
us to connect back to the original structure–property relationship
problem. Our analysis of over 500 000 small molecules mapping
to the investigated CG representations unveils the role played by
representative functional groups in the permeability coefficient,
enabling inverse molecular design. The link drawn here has profound
implications for drug synthesis: favoring the incorporation of certain
chemical groups (e.g., carboxylic groups) will reduce the range of
accessible permeabilities of the final compound.
Results and Discussion
While drug permeation is known to depend on lipid composition,[21] in this work we only consider a single-component
bilayer made of 1,2-dioleoyl-sn-glycero-3-phosphocholine
(DOPC). The permeability coefficient, P, is readily
estimated from the PMF and diffusivity profile (eq ). The PMFs are extracted from HTCG simulations
of all CG representations made of one and two beads,
mapping to a representative subset of small organic molecules in the
range 30–160 Da.[5] For compounds
capable of (de)protonating, we also model the corresponding charged
species. For convenience, we distinguish the pKa of a chemical group as being either acidic (apKa) or basic (bpKa), which
quantifies the propensity of a neutral compound to
deprotonate or protonate, respectively. The effective permeability
coefficient is constructed by a combination of the two PMFs (Figure ), shifted according
to the compound’s pKa in water
(see the Methods section).[31,32] The diffusivity profile is estimated from reference atomistic simulations.[18]
Permeability Surfaces
Figure displays the computed drug–membrane
permeability as a function of two drug parameters: its pKa in water and water/membrane partitioning free energy, ΔW→M. The latter
corresponds to the free-energy difference between insertion in bulk
water and the membrane-bilayer midplane. Though we have shown this
quantity to correlate extremely well with the experimentally accessible
water/octanol partitioning free energy, ΔW→Ol, ΔW→M displays enhanced transferability
across CG molecular sizes.[5] Indeed, HTCG
simulations of single-bead or two-bead CG compounds lead to identical
permeability surfaces, except for the range of ΔW→M covered (compare Figure S4 with Figure ).
Figure 2
Permeability surfaces (log10 scale)
calculated from
HTCG simulations as a function of two small-molecule descriptors:
the (a) basic or (b) acidic pKa in water
and the water/membrane partitioning free energy, ΔW→M. Cooler (warmer) colors
correspond to faster (slower) permeating molecules. The intersection
between the two surfaces corresponds to compounds that effectively
always remain neutral. Green ●, yellow ★, and orange
■ correspond to deviations from atomistic simulations within
0.5, 1.3, and 2.2 log units, respectively (SI).
Permeability surfaces (log10 scale)
calculated from
HTCG simulations as a function of two small-molecule descriptors:
the (a) basic or (b) acidic pKa in water
and the water/membrane partitioning free energy, ΔW→M. Cooler (warmer) colors
correspond to faster (slower) permeating molecules. The intersection
between the two surfaces corresponds to compounds that effectively
always remain neutral. Green ●, yellow ★, and orange
■ correspond to deviations from atomistic simulations within
0.5, 1.3, and 2.2 log units, respectively (SI).Figure displays
smooth permeability surfaces as a function of the drug’s acidic
and basic pKa value in water. The log10 scale of the permeability surfaces indicates the wide time
scale variations these molecular parameters exert on the thermodynamic
process. For both panels, the horizontal behavior indicates that larger
permeabilities are obtained toward the left—more hydrophobic
compounds—while polar molecules experience more difficulties
crossing the lipid bilayer, leading to a drastic reduction in P. The effect is compounded by (de)protonation: Figure b across the vertical
axis describes the effect of the compound’s apKa in water onto P. Extremely strongly
acidic molecules (apKa ≲ 2) effectively
remain charged across the membrane interface, leading to prohibitively
large free energies along the PMF, such that their rate of permeation
is strongly suppressed. Increasing apKa shows a significant increase in P, up to apKa ≈ 7, beyond which P plateaus. This stabilization is due to the competition between neutral
and charged PMFs, where the charged PMF is shifted to increasingly
larger values, and therefore never contributes significantly compared
to the more attractive neutral PMF. Of particular interest are the
strong acids (2 ≲ apKa ≲
7), which neutralize upon entering the membrane, effectively enhancing
the permeability coefficient as compared to a compound that remains
charged across the interface. An approximately symmetric behavior
can be observed when switching from acidic to basic compounds (Figure a). The impact of
both apKa and bpKa on the permeability coefficient becomes even more pronounced
in the case of zwitterions (Figure S5),
where high permeation rates are only obtained for compounds containing
both weak acidic and basic chemical groups.The permeability surface also displays a comparison against atomistic
simulations[18,19,31] for several compounds (symbols in Figure ). These points provide a validation of our
methodology—we report a mean absolute error of 1.0 log10 unit across the two molecular descriptors—with additional
information included in the SI (also against
experimental data). Most importantly, the few data points highlight
the extremely limited exploration of chemical space using in silico
simulations at an atomistic resolution.
Functional-Group Localization
on the Permeability Surfaces
To better elucidate how the
chemical structure impacts the permeability
coefficient, we consider a large database of small organic molecules
from combinatorial chemistry: the generated database (GDB).[33,34] It consists of a large set of stable molecules up to 10 heavy atoms
made of the chemical elements C, O, N, and F, saturated with H. We
pointed out how transferable coarse-grained models effectively reduce
the size of chemical space by lumping many molecules into one coarse-grained
representation.[5] This allows us to associate
the above-mentioned one- and two-bead CG permeability results to 5
× 105 molecules. The distinction made between compounds
that reduce to CG molecules made of a single bead (“unimers”)
from those made of two beads (“dimers”) effectively
amounts to a segregation between molecular weights.[5] We populate the permeability surfaces with these compounds—projecting
them onto the two molecular descriptors: pKa and water/octanol partitioning free energy ΔW→Ol. By coarse-graining every
single compound, we establish a map between chemical structure and
its CG thermodynamic property.Figure displays the chemical-space coverage of
GDB compounds onto the molecular descriptors. For all panels, we have
colored the points in terms of the permeability calculated using HTCG
simulations. Top and bottom panels distinguish between bpKa and apKa, while left and
right denote unimers and dimers, respectively. We first note that
the cloud of points is not uniformly distributed, but is instead centered
around zero in ΔW→Ol. An increase in the molecular weight of the compound (left to right
in Figure ) opens
up new regions of chemical space, as we observe a significant broadening
of the distribution along the water/octanol axis. This naturally arises
due to the extensivity of the water/octanol partitioning, the more
complex combinatorics of atoms involved, and the additional presence
of five-membered rings.
Figure 3
Chemical-space coverage of GDB projected onto
pKa and water/octanol partitioning free
energies, ΔW→Ol. Basic and
acidic pKa are shown in panels a and b,
and c and d, respectively. Panels (a,c) and (b,d) describe the coverage
corresponding to coarse-grained unimers and dimers, respectively.
Regions highlighted in light blue display several representative chemical
groups. Substitutions denoted by “?” correspond to either
H or a substitution starting with an alkyl or aryl carbon, while “?*”
only corresponds to substitutions that begin with an alkyl carbon.
(e) Our analysis clusters molecules containing both a predominant
functional group (blue), and one or several substitutions (black),
of which only a few possibilities are shown.
Chemical-space coverage of GDB projected onto
pKa and water/octanol partitioning free
energies, ΔW→Ol. Basic and
acidic pKa are shown in panels a and b,
and c and d, respectively. Panels (a,c) and (b,d) describe the coverage
corresponding to coarse-grained unimers and dimers, respectively.
Regions highlighted in light blue display several representative chemical
groups. Substitutions denoted by “?” correspond to either
H or a substitution starting with an alkyl or aryl carbon, while “?*”
only corresponds to substitutions that begin with an alkyl carbon.
(e) Our analysis clusters molecules containing both a predominant
functional group (blue), and one or several substitutions (black),
of which only a few possibilities are shown.Unlike bulk partitioning, the pKa of
a compound is not significantly impacted by aggregate behavior, but
is instead dominated by one or a few specific chemical groups capable
of (de)protonating. As such, we investigated the presence of chemical
groups representative of a subset of chemical space. The regions in
blue highlight a chemical group that is predominant, appearing in
at least 50% of the molecules in that subset. Detailed statistics
pertaining to the frequency of specific functional groups in each
of the blue regions are provided in the SI. The localization of chemical groups remains largely similar from
unimers to dimers (e.g., carboxylic group). Our high-throughput analysis
offers an intuitive visualization of the link between chemistry and
permeabilities via the pKa. Figure reflects that oxygen-containing
functional groups are generally more likely to be proton donors, whereas
nitrogen-containing functional groups can serve as either proton donors
or acceptors.[35] At low apKa values, we mainly see carboxylic groups transitioning
to nitrogen-containing functional groups (e.g., oxime derivatives)
as we increase the apKa. Contrastingly,
the bpKa chemical coverage displays no
predominant oxygen-containing functional groups. Notable exceptions
are the zwitterionic amino-acid-like compounds and certain aromatic
heterocyclic compounds shown in Figure , which have both a low apKa and a high bpKa. These functional groups
largely contribute to the chemical coverage of zwitterions (Figure S5b).
Linking Functional Groups
and the Permeability Surface Enables
Molecular Design
Figure enables a robust ad hoc method for both direct and
inverse molecular design. The direct route amounts to estimating the
permeability coefficient given a chemical structure. Figure simply requires an estimate
for the two molecular descriptors, pKa and ΔW→Ol, from either experiments or prediction algorithms.[32,36] More interestingly, our results allow us to focus on specific regions
of chemical space compatible with a desired permeability coefficient.
We effectively reduce the high dimensionality of chemical space by
projecting down onto our molecular descriptors and identifying key
scaffolds.Figure offers a simple route at an inverse design procedure. For example,
if designing a small molecule of 3–5 heavy atoms (i.e., mapping
to a CG unimer) that requires a log10P of −1.0, Figure c suggests molecules containing either a terminal hydroxyl
group or an oxime group. Indeed, small alcohols such as propanol and
butanol match this target (Figure S8),
although we are not aware of relevant experimental studies containing
small oxime derivatives. Interestingly, we can also predict how small
chemical changes will affect permeability: a change that impacts hydrophobicity
(e.g., through heteroatom substitutions) will smoothly shift the compound
horizontally on the surface. On the other hand, the introduction of
new (de)protonatable groups might lead to large jumps on the surface,
dictated by the strongest acid or base present in the molecule. The
different behavior across the horizontal and vertical axes is due
to the extensive and intensive characters of the descriptors, respectively.Critically, Figure shows remarkable transferability outside the range
of compounds used in the screening. For example, while salicylate
is made up of 10 heavy atoms, its aromatic ring leads to a four-bead
representation. CG simulations using this parametrization result in
log10P = −4.21 (Figure S6 and Table S1), deviating only one log10 unit from the atomistic results (highlighted as one of the symbols
in Figure ).[18] Alternatively, we can easily read off the permeability
from the surface: the carboxylic group is the main contributor for
its descriptors apKa = 2.8 and ΔW→Ol = −2.7
kcal/mol (Figure ).
This results in a simulation-free prediction for
log10P of −3.72, less than two
log units away from the atomistic results. The discrepancy between
the four-bead representation and the dimer surface we rely on is the
main source of errors: we have observed a systematic shift between ΔW→Ol and ΔW→M as a function
of the number of CG beads.[5] An even more
challenging test case involved ibuprofen (206 Da, significantly outside
our range of molecular weights), for which both CG simulations and
the surface prediction yield an accuracy within 1 log10 unit within the atomistic results (symbol in Figure , and Figure S6 and Table S1).We verified this consistent accuracy between explicit
CG simulations
and simulation-free surface predictions across two dozen small molecules—both
in and out of the range of molecular weights considered (Figure S7 and Table S1). Although one would expect
higher accuracy from explicit simulations, we observe compensating
errors between the discretization of partitioning free energies and
the smoothing of the surface. The transferability beyond the initial
molecular weight considered speaks to the robustness of our physics-based
approach. This feature contrasts radically with statistical methods
that fit experimental data, such as QSPRs: the transferability of
a QSPR model hinges upon potential biases in the training data set.
Given the small data set sizes available from experiments and the
wider range of molecular weights, QSPR models tend to be limited to
chemistries very close to those used in training.[37,38] On the other hand, the HTCG method systematically spans a wide region
of chemical compound space without resorting to parameter tuning,
offering accurate predictions even beyond the range of molecular weight
considered.
Impact of Functional-Group Localization on
Bioavailability
The projection of the GDB database onto the
two molecular descriptors
provides a low-dimensional representation of chemical-space coverage.
Interestingly, this helps compare its breadth and variety with other
databases. In particular, we focus on ChEMBL: a database of synthesized
compounds.[39] We prune ChEMBL to only retain
compounds roughly compatible in size with the compounds in GDB (up
to 10 heavy atoms), as well as H, C, O, N, and F elements only. Figure displays the coverage
of both GDB and ChEMBL onto the molecular descriptors. Here again, Figure a,b distinguishes
acidic and basic ionizing groups. We first note that ChEMBL displays
a much smaller number of data points, illustrating the minuscule ratio
of stable compounds that have been synthesized.[12] Overall the two databases cover remarkably similar regions
of this chemical surface. However, a projection of the distributions
along the individual axes indicates a statistically significant difference
for apKa: synthesized compounds strikingly
over-represent compounds with low apKa values (from 2 to 4). We find a significant over-representation
of carboxylic groups in ChEMBL: 90% of the compounds in the range
0 < apKa < 6 contain such a group.
This well-known bias in drug design[40] can
readily be rationalized: Synthesizing compounds that include carboxylic
groups will offer relatively strong acidity as well as an improved
ability to hydrogen bond—a dominant interaction in most biomolecular
processes. Our results introduce further implications: the over-representation
of carboxylic groups will effectively narrow down the range of permeability
coefficients. This limitation will be further compounded by the necessity
of a drug candidate to show high aqueous solubilities, and the delicate
interplay existing between these two properties,[41] overall affecting the compounds’ bioavailability.
Figure 4
Comparison
of the chemical-space coverage of the combinatorial
GDB and synthetic ChEMBL databases, projected onto (a) basic or (b)
acidic pKa and water/octanol partitioning
free energy, ΔW→Ol. The coverages are further projected down along a single variable
on the sides. Note the significant differences between the GDB and
ChEMBL distributions along the apKa in
panel b.
Comparison
of the chemical-space coverage of the combinatorial
GDB and synthetic ChEMBL databases, projected onto (a) basic or (b)
acidic pKa and water/octanol partitioning
free energy, ΔW→Ol. The coverages are further projected down along a single variable
on the sides. Note the significant differences between the GDB and
ChEMBL distributions along the apKa in
panel b.
Conclusions
We
present the prediction of membrane-permeability coefficients
for an unprecedented number and chemical range of small organic molecules
across a single-component DOPClipid bilayer. Rather than tackling
the original structure–property relationship problem head-on,
we work with physics-based reduced models that smoothly interpolate
across chemistry, thereby reducing the size of chemical space. Critically,
we do not arbitrarily select compounds to be screened, but instead
systematically consider all coarse-grained representations
that map to small organic molecules ranging from 30 to 160 Da. Coarse-grained
permeability predictions were extensively validated against both atomistic
simulations and experimental measurements for structurally diverse
compounds. The high-throughput coarse-grained (HTCG) simulation approach
used here compounds more efficient conformational sampling and reduction
in chemical space, offering an overall speedup of ∼106 compared to atomistic simulations. This enables a systematic exploration
of the link between chemical structure and permeability coefficient.
To this end we construct a smooth surface as a function of two molecular
descriptors: the pKa and water/membrane
partitioning free energy, ΔW→M. The many orders of magnitude covered by the
surface indicate the significant impact of the small molecule’s
chemistry onto the thermodynamic process. The surfaces illustrate
how strong acids and bases limit the loss of permeability for charged
compounds. Having solved the reduced structure–property mapping
allows us to connect back to the original, higher-dimensional problem.
We identify dominant functional groups representative of chemical
regions in the permeability surface. The identification of functional
groups linking to the permeability coefficient effectively provides
robust structure–property relationships for drug–membrane
permeation, and the means to perform inverse molecular design. Finally,
we show how the apparent bias of synthetic databases toward carboxylic
groups can have deleterious effects on the accessible range of permeability
coefficients, and thus on bioavailability. All in all, our HTCG approach
offers a complementary approach to in vitro high-throughput screening,
providing much larger numbers of compounds (510 000 in this
study) than currently available in public databases. The much larger
data set size will help statistical models (e.g., QSPRs) reach improved
transferability. In analogy to rapidly growing interests in generating
in silico databases of electronic properties,[42,43] we expect HTCG to have a broad impact in efficiently mapping the
relevant low-dimensional surfaces that link chemical structure to
thermodynamic properties.
Methods
Molecular Dynamics Simulations
Molecular dynamics simulations
in this work were performed in GROMACS 4.6.6[44] and with the Martini force field,[26−28] relying on the standard
simulation parameters.[45] The integration
time step was δt = 0.02 τ, where τ
is the model’s natural unit of time dictated by the units of
energy , mass , and length , . Sampling from
the NPT ensemble at P = 1 bar and T =
300 K was obtained by means of a Parrinello–Rahman
barostat[46] and a stochastic velocity rescaling
thermostat,[47] with coupling constants τ = 12 τ and τ = τ, respectively. We relied on the INSANE building
tool[48] to generate a membrane of ≈36
nm2 containing N = 128 DOPC lipids (64
per layer), N′ = 1890 water molecules, N″ = 190 antifreeze particles,[27] and enough counterions to neutralize the box. The system
was subsequently minimized, heated up, and equilibrated.The
potential of mean force G(z) of
each compound was determined by means of umbrella sampling.[49] We employed 24 simulation windows with harmonic
biasing potentials (k = 240 kcal/(mol nm2)) centered every 0.1 nm along the normal to the bilayer midplane.
In each of them, two solute molecules were placed in the membrane
to increase sampling and alleviate leaflet-area asymmetry.[31,50] The total production time for each umbrella simulation was 1.2 ×
105 τ. We then estimated the free-energy profiles
by means of the weighted histogram analysis method.[51−53]
Permeability
Coefficients
The permeability coefficient
is obtained from the potential of mean force G(z) and local diffusivity D(z) in the resistivity R(z) = exp[βG(z)]/D(z) (see eq ). For compounds with multiple protonation states, both neutral and
charged species contribute to the total flux, leading to the total
resistivity RT given by[18]RT(z)−1 = RN(z)−1 + RC(z)−1, where RN and RC are the resistivities of the neutral and charged
species, respectively. In calculating these quantities in the case
of a single (de)protonation reaction, one has to offset the corresponding
PMFs GN(z) and GC(z) by the free-energy difference
for the acid/base reaction in bulk water,[31] as shown inFigure , where we systematically consider neutral
pH = 7.4. Beyond
the distinction between acid and base, we consider both neutral and
charged species (Figure ): (i) a neutral acid deprotonates into a charged conjugate base
(acidic pKa or apKa), and (ii) a neutral base protonates into a charged conjugate
acid (basic pKa or bpKa). The extension to zwitterions, in which two consecutive
protonation and deprotonation reactions occur in different chemical
groups leaving the molecule globally neutral, is discussed in the SI.Estimation of the local diffusivity, D(z), using the CG simulations is a priori
problematic given the tendency of these models to inconsistently accelerate
the dynamics.[54] On the other hand, atomistic
simulations showed that the diffusivity across a DOPC bilayer was
virtually independent of the chemistry of the solute.[18] We used this profile in the present calculations. We stress
that the local diffusivity only provides a logarithmic correction
to log10P (see eq ), and therefore has limited impact—a
variation well within 1 log10 unit depending on the diffusivity
profile. More details can be found in sections S2 and S6 of the SI.We obtained
the permeability
surfaces presented in Figure and Figure S4 by first determining
the PMF G(z) for all possible neutral combinations of one and two CG beads, 119 in total.
For each of them we then determined G(z) for its charged counterparts, amounting to a total of 232 additional
compounds. All PMF calculations required less than 105 CPU
hours, on par with the typical computational time needed to run a single compound at an atomistic resolution.[10] At the CG level, protonating (deprotonating) a neutral
chemical group amounts to replacing the bead type with a positive
(negative) charge. We assume that the (de)protonation reaction always
occurs in the chemical fragment represented by the more polar bead,
and select the bead accordingly. In section S3 of the SI, we justify this approach by analyzing the
pKa distribution for various CG bead types.
By combining neutral and charged PMFs, we calculated the permeability
coefficient of every compound as a function of the bpKa (apKa) every 0.2 pKa unit, and projected the results on the (ΔW→M, pKa) plane. The data consisted of a discrete set
of permeabilities densely covering the partitioning free-energy axis
located at the ΔW→M of each CG compound, and were finally interpolated on a grid with
Gaussian weights resulting in the surfaces shown in Figure and Figure S4.
Chemical-Space Coverage
Prediction
of the water/octanol
partitioning on both chemical databases considered in this work, GDB[33,34] and ChEMBL,[39] was performed by means
of the neural network ALOGPS.[36] apKa and bpKa predictions
of neutral compounds were provided by the calculator plug-in of CHEMAXON
MARVIN.[32] The mean absolute errors associated
with the two prediction algorithms are 0.36 kcal/mol[36] and 0.86 units,[55] respectively.
The aggregate predictions of water/octanol partitioning and pKa on both databases required roughly 102 CPU hours. Functional groups were identified using the CHECKMOL
package.[56] Using the AUTO-MARTINI scheme,
511 427 molecules were coarse-grained.[30] AUTO-MARTINI automatically determines the coarse-grained force field
in two steps: (i) the CG mapping is optimized according to Martini-based
heuristic rules, and (ii) interactions are set by determining a type
for each bead, selected from chemical properties of the encapsulated
atoms, especially water/octanol partitioning, net charge, and hydrogen-bonding.
Authors: Paulo C T Souza; Sebastian Thallmair; Paolo Conflitti; Carlos Ramírez-Palacios; Riccardo Alessandri; Stefano Raniolo; Vittorio Limongelli; Siewert J Marrink Journal: Nat Commun Date: 2020-07-24 Impact factor: 14.919