Literature DB >> 35235748

AtomNet PoseRanker: Enriching Ligand Pose Quality for Dynamic Proteins in Virtual High-Throughput Screens.

Kate A Stafford¹, Brandon M Anderson¹, Jon Sorenson¹, Henry van den Bedem^1,2.

Abstract

Structure-based, virtual High-Throughput Screening (vHTS) methods for predicting ligand activity in drug discovery are important when there are no or relatively few known compounds that interact with a therapeutic target of interest. State-of-the-art computational vHTS necessarily relies on effective methods for pose sampling and docking and generating an accurate affinity score from the docked poses. However, proteins are dynamic; in vivo ligands bind to a conformational ensemble. In silico docking to the single conformation represented by a crystal structure can adversely affect the pose quality. Here, we introduce AtomNet PoseRanker (ANPR), a graph convolutional network trained to identify and rerank crystal-like ligand poses from a sampled ensemble of protein conformations and ligand poses. In contrast to conventional vHTS methods that incorporate receptor flexibility, a deep learning approach can internalize valid cognate and noncognate binding modes corresponding to distinct receptor conformations, thereby learning to infer and account for receptor flexibility even on single conformations. ANPR significantly enriched pose quality in docking to cognate and noncognate receptors of the PDBbind v2019 data set. Improved pose rankings that better represent experimentally observed ligand binding modes improve hit rates in vHTS campaigns and thereby advance computational drug discovery, especially for novel therapeutic targets or novel binding sites.

Entities: Chemical

Mesh：

Substances：
Ligands
Proteins

Year: 2022 PMID： 35235748 PMCID： PMC8924924 DOI： 10.1021/acs.jcim.1c01250

Source DB: PubMed Journal: J Chem Inf Model ISSN： 1549-9596 Impact factor: 4.956

Introduction

Successful drug discovery campaigns rely on identifying biologically active lead molecules that are chemically distinct from known compounds for the disease target. This is especially challenging when there is little or no nearby ligand data available, as is the case with novel targets, or when novel scaffolds are distant in chemical space. Structure-based, virtual High-Throughput Screening methods are designed to overcome this challenge, by identifying novel compounds with predicted activity from vast chemical libraries, for example, MCULE[1] or ENAMINE.[2] vHTS is routinely applied as a first step in the drug discovery process, with hit rates surpassing those of experimental screens.[3−5] Conventional, structure-based vHTS approaches use an empirical- or force-field-based scoring function to dock distinct ligand poses to a mostly rigid receptor and predict affinity. Underlying structure-based vHTS approaches is the assumption that receptor–ligand binding poses correlate with experimentally observed affinities. While conventional approaches have led to several exciting results, including a potent inhibitor for AmpV β-lactamase,[6] they have several important drawbacks. First, identifying a scoring function that simultaneously gives high docking power (distinguishing correct docking poses from decoy poses) and high scoring power (generating an affinity score) has historically been challenging.[7] For example, the widely used molecular docking scoring function AutoDock-Vina[8] excels at pose reproduction but is less competent at correlating poses to affinity, as assessed in the CASF benchmark.[9] Second, conventional methods bear substantial computational cost. Consequently, their ability to predict affinity is limited to small- or medium-sized chemical libraries of tens to several hundred million compounds. Third, the dynamic nature of proteins is exploited in ligand binding.[10] Different receptor conformations can bind different ligand chemotypes, perhaps best exemplified by the “DFG-in” and “DFG-out” states occupied in different ratios by many protein kinases.[11] Docking a ligand to a rigid receptor conformational substate that deviates from its native bound state (e.g., an apo state) can result in inaccurate predictions of the bound complex that are not useful for further drug design applications. Unfortunately, incorporating receptor flexibility and representing binding-competent receptor conformations remain challenging in conventional methods[12,13] and substantially increase computational cost. Machine learning (ML) and deep learning (DL) approaches can mitigate these limitations. ML can help determine features of the receptor–ligand complex that correlate with affinity to augment scoring functions and improve docking and screening power. For example, the ΔvinaRF20 scoring function combines 20 ligand, protein, and pharmacophore features selected among a larger set of candidate features with random forest regression.[14] Postscoring with ΔvinaRF20 improved docking and screening (ranking) power compared to the baseline AutoDock Vina scoring function.[8] A major advantage of learning approaches compared to conventional methods is that they can capitalize on the rapidly increasing availability of data to improve accuracy.[15] In contrast to ML-based methods, DL-based methods avoid the requirement to specify features and instead learn relevant features directly from structural representations of the protein–ligand complex. In recent years, structure-based DL architectures[16−21] have enabled vHTS and contributed to the discovery of numerous new leads for drugs, often for challenging protein targets and diseases.[22−24] However, early structure-based DL approaches appeared to learn molecular features from a pose-free structure-based descriptor only, neglecting ligand binding modes and protein ligand interactions.[25] This is especially detrimental to predicting affinity for ligands docked to noncognate receptors, which is a typical use-case in structure-based drug discovery (SBDD). To enforce sensitivity to protein–ligand interactions, more recent DL vHTS approaches include ligand binding mode information, either as a feature[26] or as a training label in a multitask architecture.[27,28] Simultaneous learning on ligand binding modes substantially improved activity screening and led to better generalization beyond the training set. Importantly, predicting correct ligand binding modes has merit in its own right.[29] Precise protein–ligand interactions are vital for developing structure–activity relationships in hit-to-lead and lead-optimization applications downstream from vHTS.[30] The efficacy of structure-based virtual screening campaigns relies on an adequate representation of the protein conformational ensemble. Proteins and their ligands undergo conformational exchange under physiological conditions,[31−33] and in many cases, ligands bind through induced fit (Figure A)[34,35] or bind short-lived intermediate states through conformational selection (Figure B).[36] In those situations, the binding site of a crystal structure may be partially or even fully occluded in the absence of a ligand, hindering the discovery of potent binders. In cases where the protein’s cognate ligand has small molecular weight, a holo crystal structure would limit opportunities for docking larger compounds even though those could be accommodated by the protein’s full conformational ensemble (Figure C). Similarly, structurally uncharacterized disease mutations distal to the ligand binding site can shift the protein conformational ensemble, dramatically reducing or even depleting populations favorable for ligand binding in wild-type protein in vivo (Figure D,E).[33] Such a situation would manifest itself by unfavorable binding kinetics in experimental assays despite highly ranked compounds in the virtual screen. Screening against multiple receptor conformations can fail to enrich experimentally validated active compounds if the conformations are higher-energy and unlikely to be accessible in solution. Careful selection of receptor states[38,39] can mitigate these effects of protein dynamics and increase virtual screening performance.[40]

Figure 1

Binding site plasticity challenges vHTS in structure-based drug design. A) Induced fit. B) Conformational selection. C) The GluN2A NMDA receptor ligand-binding domain in complex with its cognate activating ligand l-glutamate (l-GLU, pdb ID 4nf8 (slate)) and antagonist 1-(phenanthrene-2-carbonyl)piperazine-2,3-dicarboxylic acid (PPDA, pdb ID 4nf6 (gray)). The l-Glu occupied site requires dramatic structural changes to accommodate the larger antagonist. D) Mutations distal to the binding site (yellow) can shift the conformational equilibrium, thereby stabilizing nonbinding substates or abrogating substates that are required for conformational selection. E) The compound imatinib in complex with Abl kinase (pdb ID 2hyy, slate). Imatinib binds Abl kinase in the inactive I2 state, which occupies a population of ∼6% in WT Abl (pdb ID 6xrg, gray). A distal H415P mutation in Abl kinase, at 18 Å from the active site, reduces affinity for the inhibitor imatinib 5-fold. H415P destabilizes the I2 state, reducing its population to below detectable levels (pdb ID 2f4j, salmon). Note the A-loop in the open conformation. How to optimally represent protein conformational states computationally and how to generalize docking scores to (unseen) conformational substates remain important open problems. Some, but not all, proteins with ligands in the Protein Data Bank (PDB) are structurally resolved in multiple conformations. For individual systems, sophisticated and resource intensive protocols including molecular dynamics simulations and Markov state models can access biologically relevant conformations,[41,42] but these methods scale poorly to large databases of thousands of receptors. Time-independent sampling is a less resource-intensive alternative to access conformational substates. A common limitation of screening with multiple conformations, of any origin, is the challenge in comparing docking scores for molecules docked to different conformations. This has limited the adoption of earlier ensemble approaches, but machine learning techniques can mitigate this problem.[43] Importantly, conventional docking and scoring protocols cannot internalize receptor conformational variability encoded in the thousands of structurally resolved receptor–ligand complexes in, for example, the PDBbind data set.[44,45] While DL approaches are relatively underexplored in ensemble docking, they can, in principle, learn and generalize how receptor flexibility accommodates distinct binding modes, both of cognate and noncognate ligands. Here, we introduce a DL-based method for binding mode prediction that exploits protein conformational ensembles instead of single structures in vHTS applications. Starting from a protein crystal structure or homology model, we use Rosetta’s comparative modeling protocol (RosettaCM)[46] to sample low-energy conformations near the crystal structure to generate a six-member conformational ensemble, which includes the starting structure. We applied this protocol to each receptor-compound pair in PDBbind v2019 to generate FlexPDBbind v2019, a curated data set of conformational ensembles of drug targets. To enrich the quality of compound poses docked to these ensembles, we adapted AtomNet GRAPHite, a new directional message passing graph convolutional network, for Pose Ranking applications, which we describe as AtomNet PoseRanker (ANPR). To maximize learning on distinct receptor conformations, we additionally trained ANPR on noncognate poses, obtained from cross-docking ligands to distinct crystal structures of the same protein (Uniprot identifier) in the PDBBbind v2019 data set. We first demonstrate that ANPR rankings enrich pose quality for compounds docked to single conformations of their cognate receptors as well as noncognate receptors in the cross-docked PDBbind v2019 data set compared to conventional approaches. Our results indicate that training on a cross-docked data set teaches ANPR to recognize distinct poses as valid in different receptor conformations. ANPR trained on FlexPDBbind v2019 achieved nearly the same enrichment, suggesting that computationally sampled conformational ensembles can augment experimental ensembles when the latter are not available. Finally, we examined how conformationally diverse ensembles affect enrichment of active compounds in a virtual screen of Abl kinase. Ensemble-based procedures generally outperformed those using a single conformation. Strikingly, however, we found that using a single conformation with a cross-docked trained ANPR achieved nearly identical enrichment as that using an ensemble, suggesting that DL approaches can learn and infer receptor flexibility from the training set.

Methods

Preparing Train and Test Data Sets

We downloaded the PDBbind data set v2019 (http://www.pdbbind.org.cn) and removed entries that contained cofactors, incomplete ligands, more than one ligand, incorrect valences, entries annotated as “NMR”, or entries for which the ligand was annotated as a peptide. This resulted in a data set containing 4,593 crystal structures from the “refined” set and 10,011 structures from the “general” set. We split the data set into train (10,919) and test (5,206) sets by requiring that receptors share less than 70% or 50% sequence similarity between the sets (“seqsim70” and “seqsim50” splits). For comparison, we also generated train and test sets identical in size to the seqsim70 and seqsim50 sets but split by enforcing that these sets do not share identical Uniprot identifiers (“Uniprot split”).

Preparing Ligand Structures

We generated ligand structures for docking based on the ligands provided by PDBbind. To avoid biasing poses toward the crystal structures, and in contrast to previously reported studies, we discarded the native compound conformation and generated a UFF energy-minimized starting conformation from the ligand SMILES supplied by the PDB using RDKit.[47] This approach better reflects commercial vHTS campaigns, where a crystal pose or experimental structure of the ligand is typically lacking. We used a single low-energy ligand conformation for docking.

Preparing Receptor Structures

We prepared the receptors for docking by a three-step process. For consistency with our generated ensembles, we first read the crystal structure into Rosetta and removed any crystallographic buffer or water molecules but retained metal ions of the following types: Na, Fe, Mg, K, Mn, Zn, and Ca. Using Rosetta, we filled in missing atoms for incomplete protein residues. We then define a bounding box surrounding each initial ligand used as the search space for subsequent docking. For this study, we accepted the protonation states of titratable residues found in PDBbind (Figure ).

Figure 2

Flowchart illustrating data processing pipeline. Beginning from the PDBbind v2019 protein–ligand data set, we clustered ligands for all structures of the same Uniprot into distinct binding sites to define cross-docking groups. We used Rosetta (weekly release r231, 2019) to prepare the structures for docking, including filling in missing side-chain atoms and adding protons. We prepared 3D conformations for each ligand using RDKit and used these conformations to dock to the crystal structures within each binding site cluster. We additionally prepared conformational ensembles using the Rosetta hybridization protocol and docked ligands within each binding site cluster.

Cross-Docking

We superimposed receptors of the same Uniprot ID using the chain identifiers present in the “pocket” pdb file from PDBbind, keeping receptors that superimposed to within 5 Å. We identified different binding sites within the same Uniprot ID by clustering the center of mass of the superimposed ligands with DBSCAN[48] (parameters eps = 5., min_samples = 1). A small number of Uniprot IDs (“targets”) are represented by hundreds of receptor-compound pairs in PDBbind. To avoid those from dominating the cross-docked data set, we randomly selected five representatives from among the receptor-compound pairs to be included in the cross-docking procedure. Cross-docking was performed on all members of each binding site cluster. The final training set consisted of targets that sampled at least one low-RMSD pose. This procedure gave a total of 27,166 target-compound pairs.

Ensemble Docking

We generated five diverse receptor conformations using the Rosetta hybridize protocol (Supporting Information) with the score3 and score4_smooth_cart scoring functions in stages 1 and 2 of the centroid stage and the ref2015_cart scoring function with metalbinding_constraint in the full atom stage. Ligand parametrization relied on Rosetta’s templates whenever available or was generated with Rosetta’s molfile_to_params.py script using default settings. We used default weights for intraligand hetatm_cst_weight = 1.0 and receptor–ligand hetatm_to_protein_cst_weight = 1.0 constraints. Fragment insertions were disabled in stage 2: fragprob_stage2 = 0.0. We adjusted the stage 2 Monte Carlo temperature to 0.5: stage2_temperature = 0.5. Next, we subjected each hybridized receptor conformation and the crystal structure to six rounds of energy relaxation with Rosetta’s FastRelax protocol, retaining metal coordination with the SetupMetalsMover protocol combined with a metalbinding_constraint = 1.0. We observed that protein side-chain metal coordination was not preserved consistently throughout the hybridize protocol. In some cases, that led to final models with unsatisfied coordinate-covalent bonds. We created a final ensemble of six receptor conformations by selecting the conformation with the lowest Rosetta energy from each of the six relaxed conformations starting from the hybridized models or crystal structure.

Docking

We used a slightly modified version of the smina docking software[49] with the vina scoring function to generate binding poses for the receptor-compound pairs with command line parameters --exhaustiveness 384 --energy_range 99999 --num_modes 64 --mc_steps 3 --minimize_iters 40 --accurate_line --approximation linear --autobox_add 2.0 --seed 42. Our modification introduces the additional parameter mc_steps which is used to tune the number of steps in the Monte Carlo search. For cross-docking, we determined bounding boxes for the docking site using the largest ligand in the target class. We then added 2 Å to define a small buffer region surrounding this expanded search space (using the --autobox_add parameter). The remaining smina input parameters were optimized primarily for computational efficiency in docking ultralarge data sets, for higher rates of successful pose generation even at lower ranks, and to ensure a diverse set of negative examples. For each receptor-compound pair, we generated up to 64 poses with the default 1 Å minimum difference between poses. We labeled poses within 2.5 Å of the native pose a “hit”, and those greater than 4 Å a “miss”. Although pose classification work often uses a 2.0 Å threshold to define success,[9] we selected the slightly more generous 2.5 Å threshold to accommodate slight differences in alignment within our sampled ensembles, a common approach when representing receptor conformational flexibility.[50] Poses in between were discarded from the training set but retained in the testing sets. Note that this process results in a highly imbalanced data set; for cross-docking, the data set consists of 5.0% positive examples, 15.5% intermediate examples, and 79.5% negative examples. We calculated the RMSD between corresponding heavy atoms of docked poses and native crystal structures of the corresponding compounds by first matching substructures using RDKit and accounting for symmetric substructures by using the minimum RMSD in the case of multiple matches. For the final data set, we required that each compound adopted at least one pose within 2.5 Å of the native pose, evaluated after superposition of protein structures in the case of cross-docking.

Descriptor Generation

From each docked complex, we generated descriptors for use in training and testing our models. We describe each heavy atom with its corresponding SYBYL atom type indicating its chemical environment, generated by converting to Mol2 format using OpenBabel.[51] We do not represent protons explicitly in our final descriptor sets, and we treat all metals as a single type. We treat the protein and ligand as distinct entities and encode their atom types separately.

Markov State Models of Abl

We obtained structures for 16 Abl kinase macrostates from the manuscript by Roux and co-workers (PDB ID 2HYY).[52] We prepared a matching 16-member conformational ensemble using our Rosetta-based ensemble-docking protocol. For each of those ensembles, we prepared the receptors for docking following our standard protocol described above. We prepared the single receptor conformation for this analysis based on the crystal structure 2HYY. We selected 8,946 compounds from our internal databases to dock to Abl kinase, including molecules with known activity (3,205), known nonbinders (4,459), and random molecules (1,282). We ranked compounds using the ANPR model trained on a data set that excluded Abl kinase or receptors with more than 70% sequence similarity to Abl kinase. We calculated enrichment factors as ef(ν) = (ap)–1a(ν)/n(ν), where ap is the total fraction of actives in the data set, a(ν) is the number of actives in the top ν-percentile, and n(ν) is the total number of compounds in the top ν-percentile.

AtomNet PoseRanker

Underlying ANPR is our GRAPHite network architecture, a directional Graph Convolutional Network (GCN)[53,54] in which nodes encode receptor or ligand atoms (Figure A). We do not impose a covalent structure on the graph. Instead, pairs of atoms at layer within a radial threshold of each other can pass messages along a directed edge. To enable a focus on the ligand–receptor interface, we use a flexible framework that allows us to configure distinct sets of “source” () and “target” () atoms for messages in each layer (Figure B). At each layer, we can select a new set of target nodes from ligand ( = ), receptor (), or complex ( = = ) nodes. The target nodes of the previous layer will be selected as the source nodes .

Figure 3

Architecture of AtomNet GRAPHite, a directional Message Passing Neural Network. A) Ligand and receptor atoms are represented as nodes in AtomNet Graphite, omitting their covalent structure. B) Overview of AtomNet GRAPHite’s graph convolutional network architecture. After an initial embedding, any number of interaction blocks can be stacked. Each interaction block features a learnable pointwise convolution and can be configured to allow interactions between ligand and/or receptor atoms with a customizable layerwise cutoff. A final readout layer extracts relevant features from ligand and receptor embeddings independently and passes them through a final multilayer perceptron. We define a feature vector of features for source atoms , with f0 initialized using a one-hot encoding of SYBYL atom types. The combination of the radial threshold and layer-wise choice of target nodes gives a layer-wise neighborhood , where d = ||r – r|| is the pairwise distance between atoms i and j. Our interaction block (Figure B) is based upon a continuous filter convolution[54]where is a bottleneck source feature constructed by concatenating all layers of matching source atoms . This bottleneck allows us to change the number of filters on a layer and avoid mismatched filter sizes. We use convolutional kernels constructed from linear combinations of zeroth-order spherical Bessel functions with a layer-dependent normalization j0(x) = √(2/R) sin x/x. Here, z0 is the nth zero of j0(x), and are learnable weights, and the number of basis functions is chosen as , where . This parametrization ensures that the convolutional kernels vanish at the edge of an internal neighborhood. Note that since we are not calculating forces, we do not need a smooth cutoff at the boundary. After the graph convolution step, we add a skip connection based upon the target-bottleneck and apply a Linear – LeakyReLU – Linear – LayerNorm multilayer perceptron (MLP). This bottleneck layer is used instead of the conventional residual connection at the output of a convolution block. We found empirically that this performed better than standard skip connections from the previous layer of the same set of source atoms. To construct our final embedding, we concatenate readout operations to the ligand and receptor features independently at each layer: and , where and are respectively the ligand and receptor-only pooled features at layer (Figure B). Note that if no features are included at a layer, we do not include them in the embeddings. Finally, we concatenate all these embeddings together, zread = z ⊕ z, and apply a Linear – LeakyReLU – Linear MLP to obtain the ANPR ranking logit. We listed all hyperparameters in Table S1.

Results

Ligand:Receptor Interfaces Contribute to Pose Prediction Accuracy

Several studies suggest that structure-based vHTS CNN-based methods minimally rely on features of the receptor[55,56] but instead distinguish active from inactive compounds primarily by ligand-based features. Carefully designed data-augmentation can draw in receptor-based features in these models, thereby promoting generalizability.[57] By contrast, receptor features in graph-based models can affect binding mode prediction.[58] These reports of “memorization” in bioactivity prediction, which can also affect other AtomNet applications,[59] motivated us to design AtomNet’s configurable convolutional layers to deconvolve the role of the receptor and the ligand in predictive performance and to probe and optimize the role of ligand:receptor interactions. Note that in the application to pose quality studied here, ligand 1D features are identical in positive and negative examples, although 3D conformation and receptor interactions vary. A hyperparameter controls the number and configurations of AtomNet’s message passing layers. If ‘l’ denotes the set of ligand atoms, ‘r’ denotes the set of receptor atoms, and ‘lr’ denotes the combined set of atoms; each layer can pass messages from ‘l’ to ‘r’, ‘r’ to ‘l’, ‘l’ to ‘l’, and ‘r’ to ‘r’, and each ‘l’ or ‘r’ in these can be replaced by ‘lr’. To test the role of the ligand:receptor interface in classifying poses, we evaluated the effect of distinct layer configurations on distinguishing correctly docked poses from incorrectly docked poses for the Uniprot split PDBbind data set. We tested layer configurations featuring transitions between ligand and receptor layers to focus on interface features. Table reveals that, while the network demonstrates some capacity to identify correct poses based on ligand features alone, the test AUC (PDBbind self-docking, Uniprot split) increased dramatically when we included the ligand:receptor interface in the layer configurations (test AUC > 0.9), compared to a layer configuration between ligand atoms only (test AUC = 0.67). When we probed the interface with additional convolutional layers, we observed a peak in test AUC at 0.92. Adding more layers did not result in further improvements. These findings suggest that our graph convolutional architecture is sensitive to mechanisms of molecular recognition in structure-based drug design. To minimize the potential of ligand- or receptor-based pattern memorization, for the remainder of the study we adopted the “l→l→r→l→r→l” layer architecture that includes edges only between ligand–receptor atom pairs.

Table 1

Distinct Configurations of Convolutional Layers Respond Differentially to Mechanisms of Molecular Recognitiona

layer configuration	test ROC AUC
l→l→l→l→l→l	0.67
lr→lr→lr→lr→lr→lr	0.91
l→l→r→l→r→l	0.91
l→l→r→l→r→l→r→l	0.92

Test AUC for distinct convolutional layer configurations in ANPR. Test AUCs were computed from classifying poses obtained from self-docking on the PDBbind data set Uniprot split.

AtomNet PoseRanker Improves (sm)vina and ML-Based Rankings

Next, we compared ANPR’s ability to classify docked poses on the PDBbind data set to similar classifiers. On the Uniprot split, ANPR’s test AUC was 0.93. Other methods similarly reported AUCs in the 0.86–0.94 range[27] (Table ). By training simultaneously on poses and activity, Lim et al.[28] achieved an AUC = 0.94 on their data set. Direct comparison of AUCs with these methods is difficult owing to differences in the way the data is split which can lead to memorization effects[27] and slight random initializations of the methods.

Table 2

Pose Classification Compared to Other Methods Across Different Train/Test Splitsa

	ROC AUC (PR AUC)
method	Uniprot	seqsim70	seqsim50
ANPR	0.93 (0.47)	0.90 (0.45)	0.89 (0.39)
smina docking	0.82	0.82	0.81
Cornell et al.[58]	0.86
Ragoza et al.[60]	0.815c
Lim et al.[28]	0.94b

Baseline random performance is 0.5 for ROC AUC and 0.05 for PR AUC, reflecting the presence of 5% true positive good poses in the test set.

In contrast to other methods in this table, the model of Lim et al. was trained simultaneously on poses and activity.

Three-fold cross-validation on 90% sequence identity.

Baseline random performance is 0.5 for ROC AUC and 0.05 for PR AUC, reflecting the presence of 5% true positive good poses in the test set. In contrast to other methods in this table, the model of Lim et al. was trained simultaneously on poses and activity. Three-fold cross-validation on 90% sequence identity. We then tested ANPR’s ability to generalize out of the training data by evaluating its performance on the seqsim70 and seqsim50 data splits, which challenge memorization effects of the training data. We observed a small decrease in performance from AUC = 0.93 to 0.89. In practical applications, pose ranking within a particular target class is more important than the pose quality across all targets as reported by AUC. ANPR significantly enriched the fraction of poses within 2.5 Å of the native pose among the top-n ranked poses compared to smina docking (Figure ). When “gap” poses between 2.5 and 4 Å were excluded in training, 45% of the top poses in ANPR were within 2.5 Å of a crystal pose by target class (Figure A black line), compared to 39% of the top smina poses on the (most challenging) seqsim50 test set including “gap” poses (Figure A, blue line).

Figure 4

Comparison of pose ranking performances. Cumulative fraction of targets with docked poses within 2.5 Å RMSD of the crystal pose across all receptors within a target class by pose rank for the seqsim50 split according to vina (blue), ANPR trained on self-docked crystal structures (black), ANPR trained on cross-docked crystal structures (pink), and ANPR trained on sampled conformational ensembles (green). Each figure shows the top 16 ranked poses for each docking attempt in A) the full data set and B) targets that include at least six PDB structures, for a direct comparison of experimental and sampled ensembles. C) Models applied to a data set of single receptor conformations for all targets for which only one crystal structure is available. For reference, the vina ranking for the same targets across a six-member sampled ensemble is shown in orange.

AtomNet PoseRanker Enriches Intratarget Compound Rankings in Cross-Docking

While often the redocking performance of a deep learning model is reported to evaluate its performance, correctly predicting the binding mode of a new ligand, possibly to a new target, is more important in SBDD. The new target may even lack a crystal structure, so vHTS will need to rely on a closely related structural model or a homology model. We therefore trained and evaluated ANPR on a cross-docked data set (Methods). Table reports the AUC for ANPR compared to smina docking.

Table 3

Cross-Docking Pose Ranking/Classification Compared to smina Dockinga

	ROC AUC (PR AUC)
method	Uniprot	seqsim70	seqsim50
ANPR	0.91 (0.46)	0.90 (0.45)	0.86 (0.36)
smina docking	0.78	0.79	0.78

Baseline random performance is 0.5 for ROC AUC and 0.05 for PR AUC, reflecting the presence of 5% true positive good poses in the test set.

Baseline random performance is 0.5 for ROC AUC and 0.05 for PR AUC, reflecting the presence of 5% true positive good poses in the test set. Unsurprisingly, distinguishing favorable docking poses from unfavorable poses is more challenging in cross-docking compared to self-docking, reflected by reduced AUCs for ANPR and smina docking. There are numerous challenges in cross-docking: 1) the binding site often has a different size and/or shape owing to different (equilibrium) backbone and side-chain positions; 2) amino acid insertions, deletions, or substitutions can alter the steric and electrostatics characteristics of the site; 3) changes in solvation characteristics; and 4) experimental conditions like crystal packing, etc. Accordingly, test AUC were reduced slightly compared to self-docking across all data splits (Table ). Despite its reduced ROC-AUC, we observed marked intratarget early enrichment among the top-ranked poses of the model trained on cross-docked crystal structures compared to the model trained on cognate receptors (Figure A, top-1 54% (pink) vs 45% (black)). Our highly imbalanced data set could mask true model performance of the cognate model reported by ROC-AUC (0.89 vs 0.86 for the cross-docked model). However, we observed a concomitantly slightly elevated PR-AUC for the cognate model (0.39 vs 0.36) suggesting both models detect positives across poses at similar rates. Nonetheless, the cross-docked model appears more confident about good poses, likely because similar good poses occurred across multiple receptor conformations, whereas invalid poses were reproduced less across receptors. We note that our baseline smina performance (blue) is lower than similar applications using PDBBind,[61] likely due to differences in docking setup and parameters; for our intended application in reranking, we optimized for sampling depth rather than top1 performance.

Ensemble-Based Docking

Next, to further examine how learning receptor conformational diversity can help ANPR recognize valid binding modes, we developed a protocol that creates an ensemble of computationally sampled receptor conformations for docking based on Rosetta’s comparative modeling protocol[46] (Methods). Aside from providing conformational variability when multiple experimental structures are lacking, computationally sampled ensembles offer advantages over those constructed from experimental structures for a pose training task, including removing variations due to mutations or differences in construct, normalizing the numbers and sources of different conformations available for each protein in the data set, and permitting the trained model to be used in practical applications to docked poses to homology models produced through Rosetta-based pipelines.

FlexPDBbind: Ensemble Models for 13,000 Biomolecular Complexes

For each biomolecular complex in the PDBbind “refined” and “general” sets, we generated a six-member conformational ensemble. We applied Rosetta’s FastRelax protocol to the crystal structures to generate the first member and then repeatedly applied the Hybridize and FastRelax protocols to generate up to five additional conformations (Methods). This resulted in 13,063 biomolecular ensembles, for a total of 71,825 conformations. The mean Root Mean Square Deviation (RMSD) of the ensembles calculated over the CA atoms is 1.57 Å, and the Root Mean Square Fluctuation is 2.10 Å (Figure A,B). This compares to a mean RMSD of 1.24 Å for the structurally aligned crystal structures. To evaluate the structural quality of the generated models, we calculated clash scores and overall Molprobity scores with Molprobity[62] (Figure D,E). We discarded ensembles with outlying clash scores (scores over 10). We then docked ligands to each conformation of their cognate receptor using the protocol detailed in Methods. The RMSDs of docked ligand poses calculated to their starting pose in the source receptor conformation reveal a distribution sharply peaked around 3.5 Å with a long tail, illustrating enrichment of low-RMSD poses in sampled ensembles (Figure C). The distribution broadens when the RMSD is calculated with respect to the pose from the crystal structure, owing to random offsets in rigid body transformations of the generated structures (Figure C, red). Figure F shows two examples of conformations in FlexPDBbind.

Figure 5

Descriptive statistics of FlexPDBbind ensembles. A) RMSD and B) RMSF distributions for six-member sampled ensembles. C) Distribution of RMSDs of ligand poses docked to sampled ensembles (blue) and crystal structures (red). D) Molprobity clash scores and E) overall scores for sampled ensembles. F) Examples of a sampled conformational ensemble for ABL1 in complex with an inhibitor (left, PDB ID5HU9) and dethiobiotin synthetase in complex with CDP (right, PDB ID6CVF) with the crystal structure in purple and ensemble members in green. The ligand corresponding to the crystal structure is shown in yellow.

Ensemble-Docking Mitigates Limitations of Induced-Fit Docking

ANPR trained on the FlexPDBbind ensemble of cognate receptors slightly improved top-1 enrichment compared to the model trained on single conformations (Figure A; top poses: 48% (green) vs 45% (black) for single conformations). However, we did not observe improved enrichment beyond the top poses. Unsurprisingly, the model trained on cross-docked crystal structures outperformed the FlexPDBbind-trained model (Figure A pink; top poses: 54%), likely owing to receptor conformations that are closer to physiological substates and to a more diverse representation of binding modes for targets with a large number of receptor-compound pairs in PDBbind. Approximately 80% of targets in PDBbind have fewer than six structural representations. For a more direct comparison between the cross-docked set and FlexPDBbind, we repeated the analysis limited to targets that have at least six structural representations in the cross-docked set (Figure B). While the top-1 enrichment on the FlexPDBbind subset was similar to that of the full FlexPDBbind set (Figure A), somewhat surprisingly the enrichment gain between the cross-docked and FlexPDBbind was substantially reduced on these subsets. Strikingly, when we applied an ANPR model trained on sampled ensembles to predict binding modes using a data set of single receptor structures, we found that top poses were enriched over the model trained on single structures (Figure C, green vs black line). Notably, ANPR enrichment on single conformations exceeded that based on vina rank across six-member sampled ensembles (Figure C, green vs orange line). While, again, the improved ANPR performance could in part perhaps be attributable to the larger training set for the sampled ensembles, these results do suggest that the model retains information about flexible binding sites, illustrating the value of performing additional conformational sampling where experimental data is limited.

Ensemble-Trained AtomNet PoseRanker Enriched Abl Kinase Actives on a Conformational Ensemble and a Single Conformation

Abl Kinase

As a practical application, we examined the effects of receptor conformational ensembles and compound pose reranking in identifying active compounds of Abl kinase (Abelson tyrosine kinase). Abl is of clinical significance due to its causal role in chronic myelogenous leukemia (CML); in about 90% of cases of CML, a chromosomal translocation forms a “Philadelphia chromosome”, which creates a fusion between the Abl and break-point cluster (Bcr) genes to produce a protein with constitutive kinase activity.[63,64] Inhibitors of Abl kinase activity are in clinical use as treatments for CML and other cancers since the US approval of the targeted inhibitor imatinib in 2001 and have significantly improved clinical outcomes.[64] Kinase inhibitors are often divided into multiple classes or types depending on their binding mode.[65] Type I inhibitors are considered to bind to the “DFG-in” protein conformation, while type II inhibitors bind the “DFG-out” conformation. The DFG-in conformation is sterically incompatible with the most common binding mode of type II inhibitors. Among known mutations conferring resistance to imatinib and related compounds, many exert their effect by changing the protein’s conformational distribution and reducing occupancy of the favored binding state (Figure E). As a result, identifying both type I and type II inhibitors in a structure-based virtual screen of the ATP binding site is important for identification of new molecules of potential clinical utility and requires the use of multiple protein conformations to adequately sample native-like ligand binding modes. We evaluated the power of our ensemble-docking protocol to enrich known Abl inhibitors over an in-house curated library of negative examples including known nonbinders and randomly selected molecules. We compared a conformational ensemble (n = 16, RMSF = 2.7 Å) generated using our protocol to a conformational ensemble (n = 16, RMSF = 2.4 Å) derived from a Markov state model (MSM) based on extensive molecular dynamics simulations.[42,52] We also calculated the enrichment factor from the smina rank of docked poses to the (relaxed) crystal structure. Docking compounds to an ensemble of MSM receptor conformations dramatically improved early enrichment of active compounds compared to docking to a single receptor conformation when we ranked compounds by smina score (Figure A, purple vs blue). We observed similar enrichment factors when we ranked compounds docked to an ensemble of Rosetta-generated receptor conformations using ANPR (Figure A, red). While these enrichment factors are near-indistinguishable, our Rosetta ensembles were generated at a fraction of the computational cost compared to MSM ensembles. Strikingly, early enrichment remained similarly elevated when we ranked compounds docked to a single receptor conformation using ANPR (Figure A, magenta). Note that the ANPR model was trained on a cross-docked data set, which led the model to recognize distinct but valid poses of compounds corresponding to diverse receptor conformations. ANPR in combination with the MSM ensemble did not achieve the same enrichment factors (Figure A, yellow), likely because the MSM receptor conformations are too distinct from the Rosetta-generated conformations in the training set. Thus, docking against an ensemble of conformations can lead to early enrichment of active compounds, which can be retained using even a single receptor conformation with the ANPR model.

Figure 6

Ensemble docking enriched known Abl inhibitors compared to single receptor conformations. A) Enrichment factors for established Abl kinase inhibitors and confirmed nonbinders and random compounds. Colors represent different combinations of ensembles and scoring algorithms. Compounds are ranked from best to worst according to the score for each method, i.e., the value 0.5 on the x-axis represents the top-scoring 50% of compounds for each of the methods. B) Clustering by the ECFP4 fingerprint of active compounds selected by docking to a single conformation (blue) or an ensemble (orange). C) The structure of ponatinib, a known Abl1 inhibitor. Compounds in the lower right cluster of panel B are ponatinib analogs. D) The distribution of Rosetta receptor conformations for ANPR hits. Conformation 1 corresponds to the crystal structure. We then compared the chemical diversity of active top compounds docked to single receptor conformations to those using the Rosetta ensemble. We selected the 1% top-ranked compounds for each method. These 90 compounds would approximately fill a 96-well plate used for experimental validation in a vHTS screen. We then selected all active compounds among the top 1% for the single receptor conformation/smina ranking (31 compounds) and the Rosetta-ensemble/ANPR (59 compounds) method. We excluded two compounds that ranked within the top 90 in both methods, since we are primarily interested in examining their differences. A t-SNE[66] projection of a principal components analysis computed from their ECFP4 fingerprints suggests that the Rosetta ensemble is slightly more diverse compared to the single conformation compounds (Figure B). Toward the lower left corner of the t-SNE projection, smina uncovers exceedingly fewer compounds. In the lower right corner, the first t-SNE coordinate separates a cluster of compounds at “tsne 1” > 2 that were enriched nearly exclusively in the ensemble. These compounds are all analogs of Ponatinib (Figure C), notably sharing the ethynyl linker to a imidazol-1,2-pyridazine-like hinge region of the compound. ANPR identified the most hits in this cluster against receptor conformations 1 and 16 (Figure D). Altogether, two out of 20 hits in this cluster were identified from docking to a single receptor (using smina, blue circles). Among all top 1% of compounds in both docking methods, only two Ponatinib analogs were identified using smina ranking.

Conclusion

In this study, we describe ANPR, a graph-based convolutional neural network trained to identify high-quality crystal-like poses from docking. Importantly, and in contrast to previous work, we use ligand conformations prepared without knowledge of the ligand conformation in the PDB structure, which is representative of use cases encountered in virtual screening applications. Using the curated data set PDBbind 2019, we demonstrate that our method is capable of improving upon enrichment and ranking metrics for pose tasks compared to the baseline established by the physics-based scoring function vina, as implemented in the open-source docking software Smina. An important aspect of pose prediction in practical use cases for drug discovery campaigns is identifying a biologically relevant and binding-competent receptor conformation. Where there is limited crystallographic information available, such as only a single crystal structure or only a homology model for a target of interest, this can be particularly challenging. We introduce a simple Rosetta-based protocol for focused conformational sampling and demonstrate that docking to small receptor ensembles can improve pose ranking metrics. The protocol is competitive in identifying active compounds for a given target compared to ensembles generated with more computationally intensive sampling techniques, such as Markov state models derived from molecular dynamics simulations. We have integrated our sampling and ANPR reranking approach with vHTS models to improve hit identification in drug discovery.[59]

Data and Software Availability

We provide input scripts and parameters for our Rosetta protocol in the Supporting Information. Our FlexPDBBind v2019 data set of sampled ensembles for PDBBind structures and our analyses scripts are freely accessible and available for download from http://atomwise.com/flexpdbbind2019. We detailed minor modifications to the Smina software (https://sourceforge.net/projects/smina/, commit ca9dcb) in the Supporting Information. AtomNet PoseRanker is proprietary software built on open-source tools Python and PyTorch. The section “AtomNet PoseRanker” provides details for reimplementation of the MPNN underlying AtomNet PoseRanker. Note that our major findings, improved pose rankings from training on a dynamic ensemble and internalization of receptor conformational flexibility, should be similarly detected by high performance MPNNs like ANPR.

53 in total

Review 1. Docking Screens for Novel Ligands Conferring New Biology.

Authors: John J Irwin; Brian K Shoichet
Journal: J Med Chem Date: 2016-03-15 Impact factor: 7.446

2. Reliable and Accurate Solution to the Induced Fit Docking Problem for Protein-Ligand Binding.

Authors: Edward B Miller; Robert B Murphy; Daniel Sindhikara; Kenneth W Borrelli; Matthew J Grisewood; Fabio Ranalli; Steven L Dixon; Steven Jerome; Nicholas A Boyles; Tyler Day; Phani Ghanakota; Sayan Mondal; Salma B Rafi; Dawn M Troast; Robert Abel; Richard A Friesner
Journal: J Chem Theory Comput Date: 2021-03-29 Impact factor: 6.006

3. PDB-wide collection of binding data: current status of the PDBbind database.

Authors: Zhihai Liu; Yan Li; Li Han; Jie Li; Jie Liu; Zhixiong Zhao; Wei Nie; Yuchen Liu; Renxiao Wang
Journal: Bioinformatics Date: 2014-10-09 Impact factor: 6.937

4. Structure- and Ligand-Based Virtual Screening on DUD-E⁺: Performance Dependence on Approximations to the Binding Pocket.

Authors: Ann E Cleves; Ajay N Jain
Journal: J Chem Inf Model Date: 2020-04-21 Impact factor: 4.956

5. Recipes for the selection of experimental protein conformations for virtual screening.

Authors: Manuel Rueda; Giovanni Bottegoni; Ruben Abagyan
Journal: J Chem Inf Model Date: 2010-01 Impact factor: 4.956

6. Open Babel: An open chemical toolbox.

Authors: Noel M O'Boyle; Michael Banck; Craig A James; Chris Morley; Tim Vandermeersch; Geoffrey R Hutchison
Journal: J Cheminform Date: 2011-10-07 Impact factor: 5.514

7. Conformational analysis of the DFG-out kinase motif and biochemical profiling of structurally validated type II inhibitors.

Authors: R S K Vijayan; Peng He; Vivek Modi; Krisna C Duong-Ly; Haiching Ma; Jeffrey R Peterson; Roland L Dunbrack; Ronald M Levy
Journal: J Med Chem Date: 2014-12-12 Impact factor: 7.446

8. Small molecules block the interaction between porcine reproductive and respiratory syndrome virus and CD163 receptor and the infection of pig cells.

Authors: Chang Huang; Denzil Bernard; Jiaqi Zhu; Radha Charan Dash; Alexander Chu; Alec Knupp; Anna Hakey; M Kyle Hadden; Antonio Garmendia; Young Tang
Journal: Virol J Date: 2020-07-30 Impact factor: 4.099