Literature DB >> 36148968

Chemical Space Exploration with Active Learning and Alchemical Free Energies.

Yuriy Khalak¹, Gary Tresadern², David F Hahn², Bert L de Groot¹, Vytautas Gapsys¹.

Abstract

Drug discovery can be thought of as a search for a needle in a haystack: searching through a large chemical space for the most active compounds. Computational techniques can narrow the search space for experimental follow up, but even they become unaffordable when evaluating large numbers of molecules. Therefore, machine learning (ML) strategies are being developed as computationally cheaper complementary techniques for navigating and triaging large chemical libraries. Here, we explore how an active learning protocol can be combined with first-principles based alchemical free energy calculations to identify high affinity phosphodiesterase 2 (PDE2) inhibitors. We first calibrate the procedure using a set of experimentally characterized PDE2 binders. The optimized protocol is then used prospectively on a large chemical library to navigate toward potent inhibitors. In the active learning cycle, at every iteration a small fraction of compounds is probed by alchemical calculations and the obtained affinities are used to train ML models. With successive rounds, high affinity binders are identified by explicitly evaluating only a small subset of compounds in a large chemical library, thus providing an efficient protocol that robustly identifies a large fraction of true positives.

Entities: Chemical

Mesh：

Substances：

Year: 2022 PMID： 36148968 PMCID： PMC9558370 DOI： 10.1021/acs.jctc.2c00752

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.578

Introduction

The endeavor of drug discovery can be viewed as chemical space exploration with an aim to simultaneously optimize multiple properties, e.g., ligand binding affinity to the target, synthetic accessibility, and toxicity. As this search space is vast, estimates go up to 1060 drug-like compounds,[1]in vitro and in vivo library screens are able to cover only a minor fraction of the possible solutions. To this end, computational chemoinformatic and physics-based approaches have been employed to increase the reach of the chemical space explorations. Over the recent years, with the advent of artificial intelligence (AI) methodology, machine learning (ML) approaches saw a rapid adoption in drug discovery. A lot of research has been devoted to constructing artificial neural networks capable of exploring chemical space to suggest novel drug-like candidate molecules for further screens.[2−4] Deep learning methods have also been successfully applied to predict ligand molecular properties and establishing QSAR models.[5] Establishing structure activity relationships requires accurate prediction of the ligand binding affinity to the target protein and remains a key challenge for computational chemists. Active learning (AL) approaches present a promising pathway to this goal. The AL methodology comprises an iterative approach where the machine learning models suggest new compounds for an oracle (experimental measurement or a computational predictor) to evaluate. These compounds and their scores are then incorporated back into the training set for further improvement of the models. For example, machine learned models have been used to predict results of free energy calculations[6−8] or molecular docking.[9] Subsequently, a fraction of the compounds was selected for calculations in the next iteration. Feeding the results of the calculations back into the ML training and iterating the process in a loop allowed to efficiently screen through a large chemical library. Alchemical free energy calculations[10−12] based on first principle statistical mechanics may serve as an optimal input for such AL applications. While computationally demanding, nowadays these calculations are readily accessible even at large scale: the predictions for hundreds to thousands of ligands can be obtained in a matter of days.[13,14] Also, the accuracy of alchemical predictions draws close to the experimental measurements.[15−18] Therefore, using these calculations as an oracle to construct ML models could allow describing binding affinities of large chemical libraries with high accuracy, while only a small fraction of the library needs to be evaluated with the computationally expensive alchemical method. In the current work, we apply AL approaches to the lead optimization step of drug discovery. In the first part of the study we retrospectively analyzed a large set of phosphodiesterase 2 (PDE2) inhibitors for which experimentally measured binding affinities were readily available. We explored the optimization of the learning process with respect to the ligand selection procedure for the free energy calculations, the molecule encoding for ML, and the hyper-parameter tuning of the ML models. Having established the optimal set of parameters to efficiently navigate in this chemical subspace, we proceeded with a prospective search for potent PDE2 inhibitors. We generated an in silico compound library and navigated using an active learning protocol based on the alchemical free energy calculation oracle. Lead optimization performed this way recovered multiple ligands with strong computed binding affinities, with only a small fraction of compounds screened by computationally costly alchemical calculations.

Methods

Generating Ligand Binding Poses

For the retrospective ligand library, which spans multiple different scaffolds, multiple aligned crystal structures with bound inhibitors were considered for use as reference structures for starting pose generation: 4D08,[19] 4D09,[19] 4HTX,[20] 6CYD,[21] 6EZF,[22] as well as 13 unpublished structures shared with us by Janssen Research & Development. For each ligand in the retrospectively analyzed library (Part I) the inhibitor with the highest Dice similarity[23] based on the RDKit topological fingerprint[24] was used as the reference. For the prospective investigation in Part II, the generated ligand library shared a core with the inhibitor from the 4D09 crystal structure;[19] thus, 4D09 coordinates were used as a reference for the generation of binding poses for each ligand in the library. Afterward, coordinates of the largest substructure matches between each ligand and its reference were constrained to the same coordinates as in the crystal structure. The remaining atoms initial guesses were assigned via constrained embedding following the ETKDG algorithm[25] as implemented in RDKit.[24] This approach was not always able to respect the constrained positions of the common substructures and would return a different outcome depending on the initial random seed. One hundred of such structures were constructed for each ligand, and the one with the smallest RMSD to the reference was selected. Ligand binding poses were then refined by molecular dynamics simulations in a vacuum. Here, the 6EZF[22] structure was used for the retrospectively analyzed ligand library (Part I), and the 4D09 crystal structure[19] was used for the prospective library (Part II). First, a hybrid topology between the reference inhibitor and each ligand was constructed with pmx,[26] and the coordinates of the largest common substructure were restrained with a force constant of 9000 kJ/(mol nm2). Next, the energy of the protein and reference inhibitor system was minimized. Finally, the reference inhibitor was morphed into the ligand following the hybrid topology while simultaneously lowering the temperature from 298 to 0 K in a 10 ps simulation. Ligand coordinates from the final frame were treated as the binding pose and used to construct both the ligand representations for machine learning and as starting ligand coordinates for the relative binding free energy calculations.

Ligand Representations and Feature Engineering

Machine learning of ligand properties requires a consistent, fixed-size vector representation for each ligand. These are typically composed of molecular fingerprints and/or constitutional, topological, geometric, thermodynamic, and electronic features of the molecule, an overview of which can be found elsewhere.[27] Here, we explored several representations to encode the ligand library. The first and most complex representation was built from all the features we could compute directly with RDKit[24] from ligand topologies and 3D coordinates. Hence, we refer to this representation as 2D_3D. These features include constitutional, electrotopological, and molecular surface area descriptors as well as multiple well-established molecular fingerprints. A more detailed breakdown is shown in Table S3. Another representation was based on MedusaNet[28] and allows for encoding the three-dimensional shape and orientation of a ligand in the active site. For this representation we split the binding site into a grid of cubic voxels with 2 Å edge length and counted the number of ligand atoms of each chemical element in each voxel resulting in a sparse 4-dimensional tensor. Unlike in the original MedusaNet paper,[28] which dealt with convolutional neural networks, we used a one-dimensional representation of the tensor, as we work with linear layers instead. We refer to this representation as atom-hot, as it is similar to one-hot encoding used to label training data for classifiers in machine learning, except for multiple atoms being able to occupy the same voxel. Additionally a modified version, called atom-hot-surf, was probed. This representation only considered voxels on the van der Waals surface of the binding pocket. The rest of the representations encoded protein ligand interactions. The PLEC fingerprints[29] were constructed by means of the Open Drug Discovery Toolkit v0.7[30] to represent the number and type of contacts between the ligand and each protein residue from the 4D09 crystal structure. Additionally, we also used a pair of representations composed of both electrostatic and van der Waals interaction energies between the ligand and each protein residue with at least one atom within 1.5 nm of any ligand in the library. Both were computed with Gromacs 2021.1,[31] the Amber99SB*-ILDN force field[32−34] for the protein, and the GAFF 1.9 force field[35] for ligands. The energies were evaluated at two different cutoff values: 1.1 nm for the MDenerg representation and 5.1 nm for MDenerg-LR representation. Finally, in the first three iterations for the prospectively analyzed data set (Part II), R-group-only versions of all of the above representations were also used in addition to the complete ligand ones described above. In these representations, features that were impossible to calculate for all ligands in the library given the much smaller structures, like parts of the GETAWAY fingerprint, were dropped from the respective representations.

Ligand Selection Strategies

The character of chemical space exploration can be altered by modifying the selection strategy of ligands to be presented for an evaluation by the oracle. We have probed the following strategies to select a batch of 100 ligands at every iteration: random selection of ligands; greedy selects only the top predicted binders at every iteration step; narrowing strategy combines broad selection in the first 3 iterations with the subsequent switch to greedy approach. For the first iterations, several models are trained, each using different sets of the previously described ligand descriptors and the 5 models with the lowest cross-validation RMSE are identified. From each of those models, the 20 best predicted binders are then selected; uncertain strategy selects the ligands for which the prediction uncertainty is the largest; mixed strategy first identifies the 300 ligands with the strongest predicted binding affinity (three times more than with greedy selection), and then selects the 100 ligands with the most uncertain predictions among them. In all the cases, initialization of the models (iteration 0) was based on the weighted random selection. Namely, the ligands were selected with the probability inversely proportional to the number of similar ligands in the data set. Ligands were considered similar if after a t-SNE embedding[36] they fell within the same bin of a 2D histogram (the square bins of the 2D histogram had a side length of one unit in the t-SNE space). The embedding was constructed from the ligands’ 2D features (constitutional and graph descriptors as well as MACCS[37] and BCUT2D[38] fingerprints) using the full ligands for the retrospective library and only the R-groups for the prospective library.

The Oracle: Alchemical Free Energy Calculations

Free energy calculations were used to generate training targets in the prospective data set (Part II). These calculations were based on the molecular dynamics simulations relying on the nonequilibrium free energy calculation protocol[10,17] based on Crooks’ Fluctuation Theorem.[39] The perturbation maps were constructed using a star shaped map topology,[40] where a single ligand with the experimentally measured binding affinity and common scaffold with the rest of the compounds was used as a reference for all perturbations. All the ligands were considered in their neutral form using a single tautomer, as generated by RDKit. First, the ligands were parametrized with GAFF 1.81 using ACPYPE[41] and AnteChamber[42] with AM1-BCC charges[43] and off-site charges for halogen atoms.[44] A hybrid topology was then built for each evaluated ligand against the reference ligand with pmx.[26] Solutes for the two legs of the thermodynamic cycle were assembled. One leg of the cycle contained the protein (parametrized by the Amber99sb*ILDN[32−34] force field) from the 4d09 crystal structure,[19] the crystallographic waters, and the hybrid ligands positioned according to their previously determined binding poses. The other branch contained only the hybrid ligands. The structures were then solvated with TIP3P[45] water and 0.15 M sodium and chloride ions parametrized by Joung and Cheatham[46] in a dodecahedral simulation box with 1.5 Å padding. All subsequent simulations were carried out with Gromacs 2021.6[31] with a 2 fs integration time step. Prior to production runs, energy minimization and, for the protein and ligand leg of the cycle, a short 50 ps NVT simulation were performed. During these runs the solute heavy atoms were position restrained with a force constant of 1000 kJ/(mol nm2). Subsequently, 6 ns equilibrium simulations were performed in an NPT ensemble. The temperature was kept at 298 K with the velocity rescaling thermostat[47] with a time constant of 2 ps. One bar pressure was retained with the Parrinello–Rahman barostat[48] with a time constant of 5 ps. Electrostatic interactions were handled via Smooth Particle Mesh Ewald[49,50] with 1.1 nm real space cutoff. Van der Waals interactions were smoothly switched off from 1.0 to 1.1 nm. Isotropic corrections for both energy and pressure due to long-range dispersion[51] were applied. All bond lengths were constrained via the LINCS algorithm.[52] From the generated trajectories the first 2.25 ns were discarded, and the remaining simulation frames were used to initialize alchemical nonequilibrium transitions between the two end states: 80 transitions in each of the two directions. 50 ps long nonequilibrium alchemical transitions were started from each frame, and the work needed to perform the transition was recorded. Relative binding free energies were calculated from the bidirectional work distributions using a maximum-likelihood estimator[53] implemented in pmx.[26] The whole equilibration-transitions-analysis protocol was repeated five times for each evaluated ligand and the mean and standard error of the five repeats were taken as the relative binding free energy and associated uncertainty. To obtain the absolute binding free energy for use in the training set, the relative free energy was combined with the experimentally known absolute binding free energy of the reference ligand.

Model Architecture

Regression models for free energy prediction were ensemble models[54] of multilayer perceptrons with ReLU[55] activation functions. Each individual perceptron was trained on a 5-fold split of the training data, each leaving out one fold for cross-validation, and was initialized with different weights and biases. Each produced different predictions for ligands in regions of chemical space where insufficient training data was available. Averaging over the predictions of independent models allowed us to recover not only more precise values but also more accurate ones.[56] Final predictions in most of this work came from means of 5 models with standard errors used for uncertainties. However, iterations four and five of active learning on the prospective library further expand the ensembles to average the final prediction over five repeats of the above cross-validation training procedure, leading to averaging over 25 individual models in total for these iterations. Varying network depths and layer widths were probed. Preliminary hyper-parameter optimization of these values was carried out on the retrospectively analyzed data set in Part I. The resulting values were used for the first three iterations of active learning on the prospective data set in Part II. In iteration 4, hyper-parameters were reoptimized and feature selection was performed by selecting the best combination of previously described ligand encodings to use for the available training data. The combination of 2D_3D descriptors and PLEC fingerprints performed the best. In addition, for this iteration feature selection was performed by discarding the features whose mean importance determined by Integrated Gradients[57] was under 0.02. Subsequent iterations used the full 2D_3D ligand representation without further feature selection. The details of the meta-parameters used with the prospective data set are in Table S1. Active learning on the retrospective data set reused many of the hyper-parameter values from the corresponding iterations of the prospective case (Table S2). Distributions of the input feature values were normalized to zero mean and unit variance for each feature independently. Similar normalization of the training free energies was also attempted. However, better model accuracies were observed with manually optimized scaling and bias values (Table S1).

Model Training

Training of models was done for 2000 or 20 000 epochs with L1 loss function (absolute error between the prediction and training data). The stochastic gradient descent optimizer[58] with a momentum of 0.9 and batches of up to 500 training ligands were used. An exponentially decaying learning rate of 0.005 × 0.1epoch/10000 was employed. Inverse frequency weighting was used to weigh the loss from individual training examples based on a Gaussian kernel distribution of the training free energies to remove bias due to overrepresentation of medium and high affinity ligands (Figure C). Early stopping based on cross-validation loss was used to limit overtraining.

Figure 3

Characterization of the retrospective data set and progression of one repeat of the active learning protocol using the narrowing selection rule to pick 100 new ligands per iteration. (A) T-SNE embedding into 2D space based on Tanimoto similarity coefficients shows three clusters of strongly binding ligands. (B) Distribution of the experimental binding free energies shows the overall number of such strong binders to be small. As the protocol progresses, the number of identified high affinity ligands increases (C), as at each iteration 100 new ligands are selected (D) to be added to the training set. The inset in panel (D) shows distribution means (with 95% confidence intervals) and strongest binders found at each iteration. After the first iteration, the distribution of binding free energies predicted using models based on the 2D_3D representation (E) remains stable. Ligands selected at each iteration in one repeat of the protocol (F, left) and the neural network predictions for the binding free energies of all the ligands in the same iteration (F, right).

Ligand Library Construction

The ligand library for the prospective PDE2 inhibitor study in the Part II of the manuscript was constructed around a modified core from the4D09PDB entry.[19] A manual examination of the data set explored in Part I revealed that chlorination of the cyclohexene ring at different positions and addition of a methyl or a difluoromethyl to the tricycle led to better binding affinities, and a single combination of these features was chosen as the core of the current library; chemical space exploration was restricted to the remaining R-group (Figure A). The various R-groups were built up from fragments present in the data set from Part I to increase the likelihood of synthetic accessibility of the ligands.

Figure 4

Ligand library construction and validation of alchemical free energy calculations. (A) For the library construction, one ligand scaffold was selected from the data set investigated in the Part I. The scaffold was combined with a set of linkers (Figure S6) and termini, using the chemical groups marked in red to form the covalent bonds. (B) Validation of binding free energies obtained with nonequilibrium free energy calculations against experimental results.

Such fragments were obtained by removing the common cores from each ligand series in the data set and decomposing the remainders into chemical groups with the BRICS algorithm[59] as implemented in the RDKit version 2021.03.3[24] while keeping track of the atoms bonding to the cores and to other fragments. This resulted in two groups of fragments: linkers (Figure S6), which directly bond to the cores, and termini (Figure A), which bond to the linkers. The library R-groups were assembled by attaching each linker to the core by the same atom it would attach to the cores in the original data set from Part I. Different numbers and combinations of termini were then added to the designated linker’s atoms.

Results

Active Learning Cycle

Throughout the work we employ an active learning cycle, as depicted in Figure , to explore chemical space of PDE2 inhibitors. In the AL cycle, the process is started by assembling a chemical library of interest and initializing the procedure by a weighted random selection of a batch of compounds for the first iteration to ensure ligand diversity. The binding affinities of the selected ligands are evaluated in an alchemical free energy calculation procedure. These ligands together with the obtained affinity estimates form a training set for machine learning (ML) models, which, in turn, predict binding affinities for all the ligands in the chemical library. In the next iteration, another set of compounds is selected, and the same steps of the cycle are repeated. This way, the training set keeps increasing, thus improving the accuracy of the ML predictions. Most of the compounds with the highest binding affinity are identified in a small number of iterations of the cycle. In the process only a small fraction of the chemical library is evaluated explicitly with the computationally expensive physics-based approach, while affinities for the rest of the ligands are predicted by the ML model.

Figure 1

Active learning scheme. Models are trained to reproduce free energies obtained experimentally or computed by MD. At each iteration a batch of ligands is selected to be added to the training set based on their predicted free energies according to the previous iteration’s models. Iterative training of the models with an increasing training data set improves prediction accuracy: most of the top binders are identified by probing only a small part of the whole chemical library. To optimize the parameters of the active learning protocol, we start with applying this scheme on a large collection of PDE2 inhibitors for which the binding affinities have been measured experimentally (Part I of the study). This allows us to replace the computationally expensive alchemical free energy calculation step with a simple lookup table of measured affinities. In doing so, in Part I of the study we are able to explore various approaches for ligand encoding, their selection procedures, and the effects on the ML predictions. In Part II of the study, we apply the active learning cycle prospectively, now using alchemical free energy calculations to guide the model training.

Part I: Protocol Evaluation on a Retrospective Data Set

In the first part of the investigation, we explored the efficiency and convergence of the active learning protocols on a data set of PDE2 inhibitors with experimentally measured binding affinities. The collection of 2351 ligands interacting with PDE2 has been assembled in Janssen Pharmaceutica from the corresponding drug discovery project. This ligand set presents a convenient case for probing different versions of model building protocols, directly based on experimentally measured ΔG values, rather than relying on computational methods. This way, the oracle in Figure is represented by the experimentally obtained affinities. We started by generating binding poses for the ligands relying on the 6EZF crystal structure of PDE2, followed by encoding ligand representations for machine learning. As this collection contains molecules with a variety of chemical scaffolds, ligands could not be uniquely described solely by R-groups attached to a single scaffold. Hence, only representations involving features of complete ligands were used.

Ligand Representation

Both ligand representation and their selection protocol are essential components for the efficiency and accuracy of the active learning protocol in Figure . First, we evaluated the effectiveness of different ligand representations by encoding the ligand library with diverse 2D and 3D ligand descriptors (2D_3D), ligand–protein interaction fingerprints (PLEC), interaction energies from molecular mechanics force fields (MDenerg, MDenerg_LR), and grid-based ligand representations (atom_hot and atom_hot_surf) (Figure , top row). These ligand representations are described in more detail in the Methods section.

Figure 2

Accuracy comparisons of different ligand representations (top) and ligand selection strategies (bottom). Representations are evaluated with the greedy selection scheme, while the selection strategies with the 2D_3D representation composed of all the features supported by RDKit. Relying on a simple greedy selection rule, we performed the cycles of active learning protocol choosing 100 ligands at a time and using the experimentally measured ΔG values for reference. The 2D_3D representation composed of all the chemoinformatic features supported by RDKit[24] consistently outperformed the physics-based representations (MDenerg, MDenerg_LR, and PLEC) both in model accuracy and in the rate at which strongly binding ligands were identified. The PLEC fingerprint representation was finding strong binders much slower than the others. Meanwhile, the 2D_3D representation yielded the same top ligands more consistently than other representations (Figure S1).

Ligand Selection Strategy

Having identified the 2D_3D descriptor representation as the most robust molecular encoding, we further investigated the performance of ligand selection strategies. In the above analysis we used the greedy strategy of selecting the strongest binders predicted at every iteration. While this leads to rapid improvement of binding affinities, it runs the risk of getting trapped in the first local minimum that is found. To mitigate this risk, we developed the narrowing strategy, where in the first three iterations we instead focus on broadening the scope of exploration in the chemical library and in the later iterations switch to the greedy selection mode. For these first three iterations, we train separate models for all the ligand representations discussed above as well as binned variants of MDenerg and MDenerg_LR, select the five that have the best internal cross-validation RMSE, and use the top 20 ligands predicted by each of them. If multiple representations yield the same ligand in their top 20s, we use the next best ligand from one of the representations, making sure that we select 100 unique ligands for the next iteration. After the third iteration we switch to the greedy approach with the 2D_3D representation to descend deeper into the best local minimum found so far. All in all, this strategy performs similarly to the greedy approach and finds the most potent binders in the library at a comparable rate (Figure bottom row). In addition to the greedy and narrowing protocols, we probed three more strategies: uncertain, mixed, and random, all shown in the lower panel of Figure . One might expect that random selection of ligands to construct the ML models would yield suboptimal predictions, yet this depends on the particular objective that is set for exploring chemical space. The random approach describes well the general features of the chemical library (low RMSE, high correlation). This comes as a consequence of arbitrarily selecting ligands of diverse chemistry for model training. However, such a seemingly good performance of the random selection comes with a shortcoming: few potent compounds are being selected (low percentage of Top 50 ligands). Thus, random sampling of the chemical space could be used to obtain a general description of the library, but for ligand affinity optimization a different strategy might be preferred. The uncertain strategy prioritizes selection of the predictions for which the model showed the largest uncertainty. Here, we model this uncertainty as standard deviation in predictions of 5 ensemble models trained on the same data with different random starting weights. Similar to random selection, this selection strategy places no priority on finding strong binders, yet both the rate of their discovery and the accuracy of their predicted free energies are worse than in the random selection of ligands. We also explored a mixed strategy first proposed by Yang et al.[9] This strategy selects the most uncertain ligands among a larger number of the strongest predicted binders. While the mixed strategy outperforms random ligand selection, in contrast to the finding of Yang et al., it identifies desirable ligands slower than the narrowing and greedy approaches. Here, we used a 3:1 ratio of the number of selected predicted strongest binding ligands to the number of selected most uncertain ones among them. In the original work by Yang et al. the ratio was 50:1, possibly explaining the observed difference in performance. However, a ratio this large was impractical in our case given the limited size of our data set of 2351 ligands only. In practice, the performance of this selection approach can be tuned by changing the above ratio, becoming equivalent to the greedy strategy with a 1:1 ratio, i.e., selecting only the strongest binders, and equivalent to the uncertain strategy when the identified strongest binders cover the whole data set. However, such tuning is impractical in a real prospective study, as it requires rerunning the AL protocol multiple times and finding reference free energies for all the discovered ligands in each repeat to identify the optimal ratio. As the protocol progresses and more ligands are added to the training sets, all selection strategies result in better agreement with the predictions from the final iteration, but the uncertain strategy improves correlations the fastest (Figure S2). Overall, the greedy, narrowing, and mixed approaches are able to quickly locate the best binding ligands. The greedy and narrowing approaches do this faster and more consistently identify the same high affinity binders (Figure S3). However, the Kendall’s rank correlation between the predicted and experimental binding free energies is low for all the selection methods besides the uncertainty-driven one (Figure S4). This results in large numbers of ligands with lower experimental affinities being selected for evaluation and inclusion into the training data set at each iteration. To select more active ligands in each iteration, one can simply evaluate more ligands per iteration. While this does improve the Kendall-tau correlation and the rate of discovery of strong binders in the early iterations, increasing the number of randomly selected ligands evaluated to build the very first model significantly decreases the number of identified active compounds per number of evaluated ligands in the starting iteration (Figure S5). Overall, the uncertain and random ligand selection sampling covers broadly the chemical library and provides a better overall description of the chemical space. However, to efficiently identify most potent binders, other strategies, e.g., greedy or narrowing, are preferred. As we are not primarily interested in an accurate description of medium and low affinity compounds, we can choose to sacrifice the accuracy of the general data set quantification and proceed with those strategies that are capable of best uncovering the most potent compounds.

Active Learning

As the ligand encoding and selection strategies have been explored, we further illustrate the overall active learning based chemical space exploration cycle using the same experimentally characterized set of PDE2 inhibitors (Figure ). Here, we relied on the narrowing protocol of ligand selection and performed 6 iterations of model training and the subsequent binding affinity prediction. As the ligand library is analyzed retrospectively, we readily have access to the experimentally determined affinities (Figure A,B). Characterization of the retrospective data set and progression of one repeat of the active learning protocol using the narrowing selection rule to pick 100 new ligands per iteration. (A) T-SNE embedding into 2D space based on Tanimoto similarity coefficients shows three clusters of strongly binding ligands. (B) Distribution of the experimental binding free energies shows the overall number of such strong binders to be small. As the protocol progresses, the number of identified high affinity ligands increases (C), as at each iteration 100 new ligands are selected (D) to be added to the training set. The inset in panel (D) shows distribution means (with 95% confidence intervals) and strongest binders found at each iteration. After the first iteration, the distribution of binding free energies predicted using models based on the 2D_3D representation (E) remains stable. Ligands selected at each iteration in one repeat of the protocol (F, left) and the neural network predictions for the binding free energies of all the ligands in the same iteration (F, right). Analyzing the chemical space reveals three clusters of high affinity binders, yet the overall number of such ligands is low. The aim of the active learning procedure is to identify these potent molecules. We start with a weighted (to reduce the chances of very similar ligands being chosen) random selection of 100 ligands from our library and retrieve their binding free energies. Following the narrowing selection approach, the active learning protocol broadly explores the chemical landscape due to disagreements between models based on different ligand representations in the first two iterations. By iteration 2, the top models agree that the best ligands reside in the same three clusters we identified (Figure A). From the third iteration the narrowing selection rule behaves exactly as the greedy protocol. The procedure switches to training an ensemble of five models only on the 2D_3D representation. The next 100 ligands are selected based on the mean prediction of these five models, allowing for some cancellation of errors in regions of insufficient training data as a result of this ensemble approach. As active learning continues, it focuses on the three high affinity regions and selects ligands with lower experimental binding free energies than initially (Figure D). The training sets also become progressively biased toward strongly binding ligands with each iteration (Figure C). Despite this, the distribution of predicted binding free energies does not change much following the initial iteration (Figure E). Eventually, though, after the majority of the strongly binding ligands have already been identified and added to the training set, their pool is exhausted. With the decrease in number of strong yet unidentified binders, the models begin to backfill the training set with weaker binding ligands: the strongest identified binders at iterations 5 and 6 do not outcompete the best binders identified earlier (inset in panel D of Figure ). As the probability of finding even higher affinity ligands now drops with each iteration while the computational cost remains the same, about 5 iterations appears to be optimal for halting the active learning process.

Figure 5

Progress of the active learning algorithm used with the prospective ligand library. Validation ligands highlighted inside the (A) t-SNE embedding[36] for the prospective library and (B) the distribution of their experimental binding free energies. (C) Distribution of the calculated free energies of the training ligands, and (D) those selected for evaluation of binding free energies at each iteration. The inset in panel (D) shows distribution means (with 95% confidence intervals) and strongest binders found at each iteration. (E) Distribution of predicted free energies over the whole prospective ligand library at each iteration. Calculated binding free energies of the ligands selected to be added to the training set at each iteration and free energies predicted by the models at those iterations all displayed on a t-SNE embedding of the whole prospective ligand library. Arrows indicate progress of the active learning protocol.

Part II: Prospective Ligand Optimization

Having verified and identified the limits of adaptive learning protocols in a retrospective analysis (Part I), we have applied this approach prospectively to identify novel high affinity PDE2 inhibitors. In Part II, the alchemical free energy calculations were used as an oracle in the active learning cycle (Figure ).

Library Generation and Alchemical Oracle

For the search of potent inhibitors, we constructed a custom library of 34 114 compounds. For that, we selected one scaffold from the data set analyzed in Part I as a core and attached varying R-groups at a common position. Such library design based on a common scaffold ensures that relative binding free energies can be calculated accurately, thus providing reliable decisions by the oracle in the AL cycle. Each R-group was composed of functional groups present in the data set analyzed in the first part of the investigation. The R-groups comprised a linker attached to the core (Figure S6) decorated with up to three terminal groups (Figure A). Ligand library construction and validation of alchemical free energy calculations. (A) For the library construction, one ligand scaffold was selected from the data set investigated in the Part I. The scaffold was combined with a set of linkers (Figure S6) and termini, using the chemical groups marked in red to form the covalent bonds. (B) Validation of binding free energies obtained with nonequilibrium free energy calculations against experimental results. Since in the prospective analysis experimental affinity measurements were not available, we computed binding free energies of the ligands selected by the protocol of MD-based alchemical simulations and trained the neural networks on the calculated affinities. The alchemical approach yields accurate predictions of free energies, which we benchmarked on a subset of ligands from the experimental library from Part I that share the same scaffold as the prospective library (55 molecules). The root mean squared error (RMSE) of the computed values compared to the experimental measurement was 1.1 ± 0.3 kcal/mol (Figure B). In addition to the computational error, experimental measurements also have an associated uncertainty. For a similar set of PDE2 inhibitors a standard deviation for measurements of a bioactivity assay is reported to be 0.3 kcal/mol.[60] This, however, likely represents a lower bound of the true experimental error, as repeated pIC50 measurements for the same compound and protein show standard deviations of ∼0.9 kcal/mol.[61] All in all, propagating uncertainties from computation and experiment, we estimate the difference between the computed and measured ΔG to have an associated error of 1.1–1.4 kcal/mol. The benchmarked compounds together with the reference ligand (55 molecules total) also appear in the currently investigated prospective library and, so, are further used for validation of model predictions (Figure A,B). Progress of the active learning algorithm used with the prospective ligand library. Validation ligands highlighted inside the (A) t-SNE embedding[36] for the prospective library and (B) the distribution of their experimental binding free energies. (C) Distribution of the calculated free energies of the training ligands, and (D) those selected for evaluation of binding free energies at each iteration. The inset in panel (D) shows distribution means (with 95% confidence intervals) and strongest binders found at each iteration. (E) Distribution of predicted free energies over the whole prospective ligand library at each iteration. Calculated binding free energies of the ligands selected to be added to the training set at each iteration and free energies predicted by the models at those iterations all displayed on a t-SNE embedding of the whole prospective ligand library. Arrows indicate progress of the active learning protocol.

Active Learning of the Prospective Data Set

The behavior of the active learning protocol trained on MD based free energies was similar to the models trained on experimental free energies in the retrospective analysis in Part I (Figure ). The protocol initially explored the chemical space to find promising high affinity regions (Figure F). After the switch to using the best predicted ligands, greedy selection, from models based only on the 2D_3D ligand representation in the third iteration, the protocol predominantly focused on one region of chemical space. In this subspace, a substitution of a benzene ring directly bound to the scaffold was preferred, and probing molecules with this chemistry returned higher affinity hits (Figure C,D). Finally, at the sixth iteration, the number of found high affinity ligands dropped, indicating their pool was likely exhausted. Therefore, the protocol was stopped at this point. The RMSEs for both the experimental validation ligands (green curve) and those yet to be evaluated via simulations (purple curve) show low RMSE values, by the end of the learning cycle reaching 1.00 ± 0.09 and 1.19 ± 0.08 kcal/mol, respectively (Figure A). These error magnitudes are already on par with those expected from the reference MD simulations. Furthermore, as the iterations progress and more ligands are added to the training set, model accuracy increases. This can be seen from the decrease in RMSE for ligands that will be evaluated for subsequent iterations (Figure A). The prediction accuracy for validation ligands (molecules with the experimentally measured affinities) does not change much with every iteration, yet a low RMSE value of ∼1 kcal/mol is retained. As the ligands from the validation set all have fairly low binding affinities, few chemically similar molecules are added to the training set, leading to little opportunity for improvement in those regions of chemical space.

Figure 6

Metrics of model accuracy (A, B) in active learning on the prospective ligand library and a comparison of the predicted and reference binding free energies used in its sixth iteration (C). Predictions for validation ligands are compared to experimental binding affinities, while for all other ligands free energies from simulations are used as the reference. Vertical dotted lines in (A) and (B) represent the switch from using models from multiple ligand representations to a single one. For regression models, a true positive rate can only be determined relative to a threshold. Here we use a threshold of −13.29 kcal/mol, equal to the experimental binding free energy of the best binding validation ligand and depicted as dotted lines in panel (C). The gray region indicates absolute unsigned errors of up to 1 kcal/mol. Error bars represent standard errors of the mean and for RMSE and TPR are determined via bootstrapping. Both ligands from the validation set and those selected at each iteration have narrow dynamic ranges of predicted binding free energies (Figure C). Combined with the remaining model errors, this leads to low correlations between the predicted and reference free energies, when evaluated within those ligand sets. Additionally, the predicted free energies of the strongest binders selected at each iteration are often underestimated (Figure C). Nevertheless, the models are still able to distinguish strongly and weakly binding ligands, reaching near perfect true positive rates for the yet unmeasured ligands in later iterations (Figure B). Furthermore, every iteration provides further enrichment of high affinity ligands in the training set.

Strongest Binders

We terminate the active learning procedure for the prospective data set after six iterations and further explore the identified highest affinity binders. The chemical space exploration yields several ligands with computed binding free energies below −17 kcal/mol that have been found, while the lowest binding free energy of the experimentally known ligands with this scaffold was above −14 kcal/mol (Figure B). The best binding ligands found in the prospective library are depicted in Figure . All the high affinity ligands show a similar pattern of substitution at the same functional group. A benzene ring serves as a linker to the scaffold, and two groups decorate the ring: a single halogen atom at the 4 or 5 ring position and a longer flexible hydrophobic group that sometimes also contains an electronegative atom.

Figure 7

Strongest binding ligands found, their calculated binding free energies, and binding poses sampled with molecular dynamics.

Strongest binding ligands found, their calculated binding free energies, and binding poses sampled with molecular dynamics. For the strongest binders, the increase in affinity is provided by the interactions with hydrophobic residues in the vicinity. We analyzed the most frequent contacts in the simulations for the protein ligand complexes in Figure and observed that the optimized functional group mostly interacted with leucine, methionine, and histidine side chains. The halogen atoms and ether group also allow for favorable contacts with serine and glutamate.

Discussion

Ligand Selection

The nature of the chemical space exploration by means of active learning (Figure ) can be strongly influenced by the strategy of ligand selection for ML model training. The random and uncertain strategies provided a decent general description of the compound library, yet for a task of lead optimization, one might prefer to obtain a better description of the most potent binders. The greedy and narrowing ligand selection approaches rapidly find strongly binding ligands, at the cost of better describing high affinity ligands than low affinity ones. For example, the last iteration of the active learning protocol on the prospective library predicts binding free energies for the ligands with the strongest predicted binding at an RMSE of 1.06 ± 0.14 kcal/mol, while for the weakest predicted binders the RMSE is 1.69 ± 0.16 kcal/mol (Figure C). This comes about due to the much smaller number of training ligands in the low affinity regions (Figures C and 5C). Still, free energy calculations show only two of the 94 worst predicted binders that were simulated to have stronger affinities than the validation ligands do in experiments, indicating a low false negative rate for such predictions. Although in the prospective investigation of this work (Part II) we used the narrowing selection strategy, in practical terms computing many descriptors required for the first iterations may be computationally expensive. Retrospectively, we have also probed a simple variant of such a narrowing approach, where for the first three iterations ligands are selected randomly and afterward the procedure switches into the greedy mode. Such a random2greedy strategy finds the strongest binders slower; however, the final results in the end offer a good balance between the identified high affinity ligands and the overall accurate description of the chemical library (Figure S8). While the selection strategy and ligand encoding have a strong effect on the accuracy of the models, the AL procedure appears to be robust with regard to the initial conditions. When starting with disparate ligand selections from localized chemically similar molecule clusters, the final models converge to comparable final predictions (Figure S9). This is encouraging, as in practical applications it may be convenient to initialize the AL cycle from the chemically similar congeneric series of compounds with experimentally readily measured affinities. The recall of the AL approach appears to be robust and independent of the exact number of actives in the set (Figure S10), thus making it an appealing candidate for large scale prospective studies.

Predicted Weak Binders

While we have already inspected the chemistry of the predicted high affinity compounds (Figure ), it is also interesting to understand what molecules were identified by the models as particularly weak binders. It appears that, in the prospective study (Part II), a large fraction of molecules predicted as low affinity binders contained sulfonyl groups (Figure C). Although alchemical calculations did not find any of the sulfonyl containing ligands to be strong binders, some did reach medium binding affinities of −15 kcal/mol. Thus, sulfonyl should not necessarily be a disqualifying factor for ligand affinity to PDE2. Yet, why does this chemical group dominate the predicted low affinity ligands?

Molecular Composition Bias

We identified this tendency to be largely due to use of inverse-frequency weighting,[62] which scales the impact of the ligands on the model based on the inverse probability of their reference free energy in the training set (SI Figure S7). While this technique was intended to compensate for bias due to increasing number of active compounds in the greedy and narrowing selections, it also has a side-effect causing ligands from poorly sampled regions of the ΔG spectrum to have a larger impact on the loss function. Inverse-frequency weighting does not significantly change the free energy distribution of sulfonyl containing ligands. Instead it makes other ligands less likely to be classified as weak binders. Therefore, while compensating for bias due to the free energy distribution of the training set, inverse-frequency weighting also exposes bias due to ligand composition. One approach to control the molecular composition bias in ligand selection for model training is to use ligand representations that do not rely directly on molecular composition but instead on physical interactions between the ligand and the protein. Examples of such representations are interaction energies computed using molecular mechanics force fields (MDenerg) or protein–ligand interaction fingerprints (PLEC). In the current study, however, none of these representations were able to outperform simple 2D_3D ligand based descriptors. More generally, training models on ligand-only information leads to memorization of ligand features,[63] even across different host proteins.[64] A wishful thought is for the model to learn the underlying physics by providing protein–ligand interaction descriptors as input. In practice, though, doing so hardly improves the accuracies of the models in question,[64] at least not without extensive sampling of the model applicability space in the training set.[65] Furthermore, many lead optimization data sets such as that used here exhibit 2D bias as the molecules were often designed and synthesized iteratively based on underlying 2D chemical similarity principles.

Further Improvements for the AL in Chemical Space Exploration

Drawing ligand selections from a variety of models built on different representations was expected to help active learning more uniformly sample the chemical space in the early iterations. However, since only the best predicted binders were selected from every model, the narrowing scheme performs similarly to the simple greedy protocol, which relies on a single ligand representation (Figure ). Both approaches are efficient at quickly exploring a narrow branch of the chemical space to identify potent binders. Interestingly, the uncertainty-driven protocol performs well in an overall description of the chemical library. It sacrifices the ability to quickly identify high affinity binders, but includes a broad range of ligands in the training set, thus providing a more accurate global description of the ligand set. For future investigations, it might be a promising avenue to combine uncertain and greedy protocols either into a narrowing-style scheme, where the first few iterations are handled with the uncertain selection rule and later iterations by the greedy one, or a modified mixed scheme, where the ratio of the most uncertain to most strongly binding predictions is changed at each iteration to smoothly transition from the uncertain to greedy selection during active learning. Reliance on MD calculations for the ground truth to train the active learning models allows one to perform ligand optimization completely in silico. While docking would be a faster alternative, MD calculations provide a much more accurate measure of the binding free energy, one that explicitly takes entropic contributions into account. Nevertheless, MD still suffers biases due to force field errors and uncertainties from incomplete sampling of phase space. While this approach can eliminate the need for effort intensive ligand synthesis and experimental affinity measurements during the course of the ligand optimization process, affinities of the final ligand selections still need to be experimentally validated.

Conclusion

In the current work we have developed an approach for lead optimization combining active learning and alchemical free energy calculations. In the first part of the investigation, we calibrated the machine learning procedure on a large data set of PDE2 inhibitors using experimentally measured affinities. Subsequently, in the second part of the work we have used the approach in a prospective manner relying on the calculated binding affinities as an oracle in the active learning cycle. All in all, we demonstrate that the active learning approach can be combined with alchemical free energy calculations for an efficient chemical space exploration, navigating toward high affinity binders. The iterative training of machine learning models on an increasing amount of data allows the number of compounds to be evaluated with the more computationally expensive methods to be significantly reduced. An active learning procedure can be tuned to capture different characteristics of the chemical library: in the current work we demonstrate how to quickly identify the most potent binders, while sacrificing the quality of the overall description of the chemical library. This objective, however, can be altered by the particular choices within the active learning loop.

48 in total

1. Reoptimization of MDL keys for use in drug discovery.

Authors: Joseph L Durant; Burton A Leland; Douglas R Henry; James G Nourse
Journal: J Chem Inf Comput Sci Date: 2002 Nov-Dec

2. Development and testing of a general amber force field.

Authors: Junmei Wang; Romain M Wolf; James W Caldwell; Peter A Kollman; David A Case
Journal: J Comput Chem Date: 2004-07-15 Impact factor: 3.376

3. P-LINCS: A Parallel Linear Constraint Solver for Molecular Simulation.

Authors: Berk Hess
Journal: J Chem Theory Comput Date: 2008-01 Impact factor: 6.006

4. Guiding Conventional Protein-Ligand Docking Software with Convolutional Neural Networks.

Authors: Huaipan Jiang; Mengran Fan; Jian Wang; Anup Sarma; Shruti Mohanty; Nikolay V Dokholyan; Mehrdad Mahdavi; Mahmut T Kandemir
Journal: J Chem Inf Model Date: 2020-10-14 Impact factor: 4.956

5. Large-Scale Assessment of Binding Free Energy Calculations in Active Drug Discovery Projects.

Authors: Christina E M Schindler; Hannah Baumann; Andreas Blum; Dietrich Böse; Hans-Peter Buchstaller; Lars Burgdorf; Daniel Cappel; Eugene Chekler; Paul Czodrowski; Dieter Dorsch; Merveille K I Eguida; Bruce Follows; Thomas Fuchß; Ulrich Grädler; Jakub Gunera; Theresa Johnson; Catherine Jorand Lebrun; Srinivasa Karra; Markus Klein; Tim Knehans; Lisa Koetzner; Mireille Krier; Matthias Leiendecker; Birgitta Leuthner; Liwei Li; Igor Mochalkin; Djordje Musil; Constantin Neagu; Friedrich Rippmann; Kai Schiemann; Robert Schulz; Thomas Steinbrecher; Eva-Maria Tanzer; Andrea Unzue Lopez; Ariele Viacava Follis; Ansgar Wegener; Daniel Kuhn
Journal: J Chem Inf Model Date: 2020-09-03 Impact factor: 4.956