Sarah Hall-Swan1, Didier Devaurs2, Mauricio M Rigo1, Dinler A Antunes3, Lydia E Kavraki4, Geancarlo Zanatta5. 1. Department of Computer Science, Rice University, Houston, 77005, Texas, United States. 2. MRC Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, EH4 2XU, United Kingdom. 3. Department of Computer Science, Rice University, Houston, 77005, Texas, United States; Department of Biology and Biochemistry, University of Houston, Houston, 77005, Texas, United States. Electronic address: dinler@uh.edu. 4. Department of Computer Science, Rice University, Houston, 77005, Texas, United States. Electronic address: kavraki@rice.edu. 5. Department of Physics, Federal University of Ceará, Fortaleza, CE, Brazil. Electronic address: geancarlo.zanatta@ufc.br.
Abstract
An unprecedented research effort has been undertaken in response to the ongoing COVID-19 pandemic. This has included the determination of hundreds of crystallographic structures of SARS-CoV-2 proteins, and numerous virtual screening projects searching large compound libraries for potential drug inhibitors. Unfortunately, these initiatives have had very limited success in producing effective inhibitors against SARS-CoV-2 proteins. A reason might be an often overlooked factor in these computational efforts: receptor flexibility. To address this issue we have implemented a computational tool for ensemble docking with SARS-CoV-2 proteins. We have extracted representative ensembles of protein conformations from the Protein Data Bank and from in silico molecular dynamics simulations. Twelve pre-computed ensembles of SARS-CoV-2 protein conformations have now been made available for ensemble docking via a user-friendly webserver called DINC-COVID (dinc-covid.kavrakilab.org). We have validated DINC-COVID using data on tested inhibitors of two SARS-CoV-2 proteins, obtaining good correlations between docking-derived binding energies and experimentally-determined binding affinities. Some of the best results have been obtained on a dataset of large ligands resolved via room temperature crystallography, and therefore capturing alternative receptor conformations. In addition, we have shown that the ensembles available in DINC-COVID capture different ranges of receptor flexibility, and that this diversity is useful in finding alternative binding modes of ligands. Overall, our work highlights the importance of accounting for receptor flexibility in docking studies, and provides a platform for the identification of new inhibitors against SARS-CoV-2 proteins.
An unprecedented research effort has been undertaken in response to the ongoing COVID-19 pandemic. This has included the determination of hundreds of crystallographic structures of SARS-CoV-2 proteins, and numerous virtual screening projects searching large compound libraries for potential drug inhibitors. Unfortunately, these initiatives have had very limited success in producing effective inhibitors against SARS-CoV-2 proteins. A reason might be an often overlooked factor in these computational efforts: receptor flexibility. To address this issue we have implemented a computational tool for ensemble docking with SARS-CoV-2 proteins. We have extracted representative ensembles of protein conformations from the Protein Data Bank and from in silico molecular dynamics simulations. Twelve pre-computed ensembles of SARS-CoV-2 protein conformations have now been made available for ensemble docking via a user-friendly webserver called DINC-COVID (dinc-covid.kavrakilab.org). We have validated DINC-COVID using data on tested inhibitors of two SARS-CoV-2 proteins, obtaining good correlations between docking-derived binding energies and experimentally-determined binding affinities. Some of the best results have been obtained on a dataset of large ligands resolved via room temperature crystallography, and therefore capturing alternative receptor conformations. In addition, we have shown that the ensembles available in DINC-COVID capture different ranges of receptor flexibility, and that this diversity is useful in finding alternative binding modes of ligands. Overall, our work highlights the importance of accounting for receptor flexibility in docking studies, and provides a platform for the identification of new inhibitors against SARS-CoV-2 proteins.
The respiratory disease COVID-19, caused by the novel coronavirus SARS-CoV-2, went from an outbreak to a pandemic in just a few months [1]. In response, there have been unprecedented global efforts to develop vaccines and effective treatments. Several effective vaccines have been validated and used in massive vaccination campaigns [2], particularly in developed countries. However, the need for pharmacological treatments for infected patients persists due to unequal vaccination coverage across the globe [3,4] and to the rise of more virulent variants that can cause symptoms even in fully-vaccinated individuals [5,6].Among potential pharmacological targets, proteins involved in the viral replication have been used in several computational studies focused on drug design, drug repurposing and virtual screening [7,8]. An impressive number of SARS-CoV-2 protein structures have been solved in a short period of time, but computational studies targeting these proteins have been mostly limited to the use of a single experimental structure [[9], [10], [11], [12], [13], [14], [15], [16]]. Although this approach is expected to correctly reproduce similar experimental structural data, it tends to fail in exploring the binding of ligands with diverse chemical features (e.g., large ligands or ligands with alternative binding modes), as it does not account for the inherent flexibility of proteins in solution [[17], [18], [19]].One example of the importance of protein malleability on SARS-CoV-2 drug screening involves the Main protease (Mpro). Although Mpro has been, so far, one of the most explored SARS-CoV-2 targets in computational studies, there remain numerous open questions for the design of effective inhibitors. Indeed, the malleability of the Mpro active-site cavity remains the greatest challenge in the development of effective inhibitors. By collecting X-ray data from ligands bound to Mpro at room temperature, Kneller et al. have demonstrated experimentally that Mpro has the ability to substantially distort its shape in response to binding [18]. Such malleability allows Mpro to accommodate a larger diversity of physico-chemical features (e.g., chemical groups, size, charge distribution) in ligands.Accounting for protein flexibility adds complexity and increases the computational cost of molecular docking studies. Therefore, explicitly sampling full receptor flexibility while also sampling alternative ligand conformations is generally unfeasible. Various strategies have been proposed to make this problem computationally tractable [20]. A popular solution involves explicitly sampling only selected parts of the receptor, as in selective docking with flexible binding-site residues. Another one, known as ensemble docking, only implicitly accounts for full receptor flexibility [21]: a separate sampling method is used to explore protein flexibility, and a set of representative conformations is extracted to create an ensemble for docking [22]; ligand sampling is then conducted as in a regular molecular docking job. By docking the ligand against all conformations in the ensemble, rather than a single conformation, ensemble docking implicitly accounts for receptor flexibility [23,24].Note that each of these strategies suffers from its own limitations and that successful predictions usually require significant knowledge about the system of interest and the required methodologies. On one hand, selective docking requires knowledge of which residues should be prioritized for explicit sampling during docking. In addition, execution time grows exponentially with the number of flexible bonds, which might prevent the exploration of complex binding sites or large ligands. On the other hand, ensemble docking requires expertise on tools that can sample protein conformations (e.g., molecular dynamics, Langevin dynamics, Monte Carlo simulations, or coarse-grained simulations [25,26]). These methods can produce a huge number of conformations, in turn requiring user expertise on methods that can select representative conformations (e.g., dimensionality reduction, clustering, or free energy estimation) [22,27]. All these tasks are very time-consuming, taking days to weeks to implement and execute, depending on available computational resources [22,23].Despite these challenges, recent studies have highlighted the importance of considering multiple receptor conformations in molecular docking studies [22,24,28], and even presented preliminary results on the use of ensemble docking to screen for drug inhibitors against SARS-CoV-2 proteins [23,29]. However, the resources and expertise required for conducting an ensemble docking study still prevents its use by most researchers interested in testing new drugs or natural products against SARS-CoV-2 proteins.To bridge this gap, we present DINC-COVID, a user-friendly webserver for ensemble docking of small molecules and peptides to SARS-CoV-2 proteins. The sampling of binding modes is performed with DINC, a parallelized meta-docking approach that has been shown to outperform conventional docking tools on several challenging datasets, especially for large or flexible ligands (e.g., peptides or peptidomimetics) [30]. The proteins currently available for docking through DINC-COVID are the Main protease (Mpro) [[9], [10], [11], [12], [13], [14], [15], [16]], the Papain-like protease (PLpro) [31], and the RNA-dependent RNA polymerase (RdRp) [[32], [33], [34]]. For Mpro, in addition to the catalytic site, we allow for predictions targeting an allosteric site [35]. Note that the ensembles we have pre-computed cover different ranges of protein flexibility, for each targeted binding site. This initial selection of SARS-CoV-2 proteins and binding sites was performed based on the availability of crystallographic structures in the Protein Data Bank (PDB) and the potential importance for drug discovery. Nonetheless, our team is continuously creating additional ensembles, covering other targets of interest, which will be later added to the DINC-COVID webserver.We have carried out a proof-of-concept validation of DINC-COVID using datasets of experimentally-determined inhibitors for SARS-CoV-2 proteins. As such data are still scarce, we have focused our validation on a small dataset of PLpro inhibitors [31] and three small datasets of Mpro inhibitors [18,36,37]. For all datasets, we have obtained good correlations between predicted binding energies and experimentally-determined IC
50 values. We have also compared DINC-COVID against two other online servers targeting SARS-CoV-2 proteins, namely the COVID-19 Docking Server [38], and the DockThor-VS webserver [39]. Note that DINC-COVID is the only webserver that performs the simultaneous docking of a ligand against multiple receptor conformations and automatically aggregates the results (as suggested by the term ensemble docking), while the two other webservers provide more docking targets. We provide a detailed discussion on how our results are affected by choices related to method settings, dataset composition and statistical analysis. Finally, we provide further insight on the range of receptor conformational flexibility captured by DINC-COVID predictions, depending on the data/methodology used to generate the input ensemble and on the scoring function used to rank the output binding modes.
Methods
Overview of DINC-COVID ensemble docking strategy
To enable fast predictions of protein-ligand binding modes while accounting for receptor flexibility, we have decoupled the steps required for ensemble generation from those directly related to the ensemble docking procedure. For each binding target of interest, three ensembles have been pre-computed and stored on our webserver (see Fig. 1
). Therefore, all ensembles mentioned in this manuscript are available for docking. After accessing the DINC-COVID webserver, users can choose a binding target, select one of the available ensembles, and upload a ligand of interest (e.g., a drug-like ligand or a peptide). These files are then used as input for ensemble docking, which involves the parallelized meta-docking approach DINC [30]. All generated binding modes are rescored and ranked using three scoring functions: AutoDock Vina [40] (a.k.a. Vina), AutoDock 4 [41] (a.k.a., AD4) and Vinardo [42]. Finally, the top-scoring binding modes (for each scoring function) are returned to the user, reflecting the flexibility of both the ligand and the receptor.
Fig. 1
DINC-COVID overview. The top left figure is a schematic representation of a SARS-CoV-2 protein, in this case the Main protease (Mpro) dimeric structure. For a given SARS-CoV-2 protein, three distinct ensembles have been pre-computed: different shades represent different conformations within each ensemble. After uploading a ligand of interest and selecting one of the ensembles, the user can submit a job that will execute the live steps of ensemble docking, scoring and ranking, eventually returning the best results to the user. BS, binding site.
DINC-COVID overview. The top left figure is a schematic representation of a SARS-CoV-2 protein, in this case the Main protease (Mpro) dimeric structure. For a given SARS-CoV-2 protein, three distinct ensembles have been pre-computed: different shades represent different conformations within each ensemble. After uploading a ligand of interest and selecting one of the ensembles, the user can submit a job that will execute the live steps of ensemble docking, scoring and ranking, eventually returning the best results to the user. BS, binding site.
Pre-computed receptor ensembles
Four target binding sites of SARS-CoV-2 proteins are currently available through the DINC-COVID webserver: catalytic site and allosteric site of Main protease (Mpro), catalytic site of Papain-like protease (PLpro), and catalytic site of RNA-dependent RNA polymerase (RdRp) (see Fig. 2
). For each target binding site, we first compiled three ensembles of conformations (see Appendix A for more information): one ensemble containing crystal structures, and two ensembles containing conformations extracted from in silico molecular dynamics (MD) simulations.
Fig. 2
Protein binding pockets. Surface representation of binding pockets (colored yellow) for (A) Mpro (catalytic site on the left, and allosteric site on the right), (B) PLpro (catalytic site), and (C) RdRp (catalytic site). (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
Protein binding pockets. Surface representation of binding pockets (colored yellow) for (A) Mpro (catalytic site on the left, and allosteric site on the right), (B) PLpro (catalytic site), and (C) RdRp (catalytic site). (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)We performed the MD simulations with the GROMACS 2019 package [43], using two different force fields: CHARMM36 [44] and GROMOS53a6 [45]. With each force field we ran five independent MD simulations of 200 ns, for a total of 1 μs. All simulations were performed with full-atom representation and explicit water molecules. Note that CHARMM36 simulations used TIP3 water models, while GROMOS53a6 used SPC water models. In addition, ions Na+ and Cl- were added at concentration of 0.15 M to neutralize the net charge of the system. Each system was minimized through the steepest descent algorithm, followed by two equilibration steps of 1 ns each, using NVT and NPT ensembles, respectively. Positions of protein atoms were restrained during equilibration. During the simulations, long-range electrostatics were modelled with the Particle Mesh Ewald (PME) method [46], temperature coupling was set at 310.15 K using the V-rescale thermostat [47] and the Parrinello-Rahman barostat [48] with a reference pressure of 1 bar; a compressibility of 4.5 × 10−5 bar−1 was applied for pressure control. Covalent bonds were constrained to their equilibrium length by the LINCS algorithm [49]. The integration steps of all simulations were set to 2 fs. The stability of each structure during the simulation time was assessed through root-mean-square deviation (RMSD) calculations using the “gmx rms” algorithm. Eventually, we extracted a set of roughly 100,000 conformations from each MD simulation with MDtraj [50]. These come in addition to the 179 crystallographic structures (156 for Mpro, 12 for PLpro, and 11 for RdRp) we extracted from the Protein Data Bank (PDB).To create the final ensembles we implemented a data-driven protocol that could extract representative conformations using algorithms from the scikit-learn package [51]. Briefly, minimal distances between all pairs of amino acids in the targeted binding site (i.e., distances between the closest pair of heavy atoms belonging to two residues) were used as features for a Principal Component Analysis (PCA). The first principal components accounting for around 80% of variance in the data were used to project each set of conformations into a lower-dimensional space. In this space, the Elbow method of scikit-learn was used to determine the ideal number of clusters (K). Representative members of each cluster were identified with K-means, and extracted to build the final ensembles. Although other strategies could have been adopted, this is a reasonable protocol to build a representative ensemble of conformations, which has produced good results in our validation experiments (see Results section). Finally, all selected structures were protonated at pH 7.0 using the PROPKA algorithm [52] at the PDB2PQR server [53]. In summary, for each target binding site, we produced three ensembles: one based on crystal structures (hereafter called Crystal ensemble) and two based on MD simulations with different force fields (hereafter called Charmm ensemble and Gromos ensemble). Note that for each receptor, these three ensembles capture different ranges of receptor flexibility and are therefore all useful for ensemble docking (see Section 3.3).
Live webserver docking procedure
DINC-COVID uses parallelization to speed up both the sampling and the scoring of protein-ligand binding modes (see Fig. 3
). Once a ligand structure and an ensemble of receptor conformations have been selected as input, the algorithm triggers multiple parallel docking jobs with the fast docking method Vina. Each independent job starts with a distinct, randomized conformation of the ligand and one of the receptor conformations from the selected ensemble. This first batch of docking jobs produces a diverse set of binding modes for the ligand against a single receptor conformation. This process is then repeated for each of the receptor conformations available in the ensemble. Once all docking jobs are completed, the rescoring phase starts. In this phase, all binding modes predicted for each receptor conformation are rescored using three popular scoring functions: Vina, Vinardo and AD4. At the end of the whole procedure, a user-specified number of top scoring conformations is provided as output, for each scoring function. These results include both alternative conformations of the ligand and alternative conformations of the receptor.
Fig. 3
DINC-COVID ensemble docking algorithm. The DINC algorithm triggers multiple parallel docking jobs with the fast docking method Vina. This process is repeated for each receptor conformation available in the ensemble. Once this docking phase is completed, the rescoring phase starts using Vina, Vinardo and AD4. A user-specified number of top scoring conformations (for each scoring function) is returned as output at the end of the procedure.
DINC-COVID ensemble docking algorithm. The DINC algorithm triggers multiple parallel docking jobs with the fast docking method Vina. This process is repeated for each receptor conformation available in the ensemble. Once this docking phase is completed, the rescoring phase starts using Vina, Vinardo and AD4. A user-specified number of top scoring conformations (for each scoring function) is returned as output at the end of the procedure.
DINC-COVID webserver interface and infrastructure
The DINC-COVID ensemble docking procedure is made available through an intuitive webserver interface (see Fig. 4
). It allows users to upload a ligand of interest and select the targeted receptor ensemble. Optional ligand preparation (e.g., adding hydrogens and charges) is also available. Note that we plan to allow for batch submission in future work. After submission and execution, users can visualize the selected binding modes through the viewer embedded in the “Results” page. If they want to perform offline analyses, users can download all the results, which include top-scoring binding modes in pdb format, as well as a text file listing the selected conformations and their binding energies with respect to all scoring functions. More details are available at dinc-covid.kavrakilab.org/method/.
Fig. 4
Running ensemble docking with DINC-COVID. (A) The home page of DINC-COVID allows users to (1) upload a ligand (i.e., a pdb or mol2 file), (2) set the parameters (e.g., request preparation of the ligand, or select the receptor ensemble), (3) provide an e-mail address and (4) submit the job. (B) A link is sent to users by e-mail, to check the progress of the ongoing job. (C) The same link allows checking the results on the DINC-COVID webpage. Users can browse through different binding modes, visualize conformations, or download all the results for further inspection.
Running ensemble docking with DINC-COVID. (A) The home page of DINC-COVID allows users to (1) upload a ligand (i.e., a pdb or mol2 file), (2) set the parameters (e.g., request preparation of the ligand, or select the receptor ensemble), (3) provide an e-mail address and (4) submit the job. (B) A link is sent to users by e-mail, to check the progress of the ongoing job. (C) The same link allows checking the results on the DINC-COVID webpage. Users can browse through different binding modes, visualize conformations, or download all the results for further inspection.The DINC-COVID webserver is implemented using Docker [54], allowing it to run with its own dependencies on any machine with Docker installed. The main backend is implemented with Django [55]. Submitted jobs are managed by Celery [56], a distributed task queue. The webserver is currently hosted on a 16-cores virtual machine in the Owl Research Infrastructure Open Nebula (ORION) VM Pool on Rice University Campus. DINC-COVID currently uses 16 parallel threads, and our virtual infrastructure allows for future expansion.
Proof-of-concept validation
As in any virtual screening study, identifying better inhibitors for SARS-CoV-2 proteins requires ranking sets of ligands with respect to their estimated binding affinity with a given receptor. To validate the ability of our method to accurately rank ligands, we computed correlations between docking-derived binding energies and the natural logarithm of experimentally-determined half-maximal inhibitory concentration, ln(IC
50); these correlations were evaluated using Pearson's r and Spearman's ρ. As experimental data, we used ln(IC
50) instead of IC
50 because it is expected to better correlate with predicted binding energies [[57], [58], [59], [60]]. As DINC-COVID-predicted data, for each ligand and each scoring function, we used the binding energy predicted for the top-scoring binding mode of this ligand according to this scoring function. Note that our goal was to assess the overall ranking accuracy of our ensemble docking results, as opposed to assessing the accuracy of individual binding energy predictions.To compare DINC-COVID with other methods, we performed the same experiments with the COVID-19 Docking Server (hereafter referred to as COVID-19-DS) [38] and the DockThor-VS webserver [39,61]. For COVID-19-DS we selected Mpro or PLpro as the “nCoV Protein Target”, with the recommended exhaustiveness of 8. Although the top 10 models are returned with a score value (in kcal/mol), we used only the score value of the top 1 model to perform our correlation analysis.For DockThor-VS, we selected either the Nsp5-Main protease (wild type) or the Nsp3-PLpro (wild type) as the target protein. Since two crystal structures of the Mpro receptor (with PDB codes 6LU7 and 6W63) are made available, we used both of them as input receptors for independent docking runs. Similarly, for PLpro, the structures with PDB codes 6W9C and 6WX4 were used as input receptors for independent docking runs. As docking parameters, we manually defined the catalytic binding site of both proteins and used the standard algorithm precision (i.e., 1,000,000 evaluations, population size of 750, and 24 runs). DockThor-VS provides a single output binding mode with a corresponding binding energy, which we used to compute correlations with experimental data.We used four datasets of ligands with experimentally-determined binding affinities in our validation experiments: one for the PLpro receptor and three for the Mpro receptor. More specifically, to create the first dataset, we selected seven PLpro inhibitors with known IC
50 values published by Osipiuk et al. [31]:Compound 1: 5-amino-2-methyl-N-[(1R)-1-naphthalen-1-ylethyl]benzamide (GRL0617);•Compound 2: 5-carbamylurea-2-methyl-N-[(1R)-1-naphthalen-1-ylethyl]benzamide;•Compound 3: 5-acrylamide-2-methyl-N-[(1R)-1-naphthalen-1-ylethyl]benzamide;•Compound 4: 3-amino-N-(naphthalene-1-yl)-5-trifluoromethyl)benzamide;•Compound 5: 5-(butylcarbamoylamino)-2-methyl-N-[(1R)-1-naphthalen-1-ylethyl]benzamide;•Compound 6: 5-(((4-nitrophenoxy)carbonyl)amino)-2-methyl-N-[(1R)-1-naphthalen-1-ylethyl]benzamide;•Compound 7: 5-pentanoylamino-2-methyl-N-[(1R)-1-naphthalen-1-ylethyl]benzamide.The first Mpro dataset comprises 14 compounds identified by Li et al. through free energy perturbation-based virtual screening and validated experimentally through an enzymatic assay [36]. In their study, IC
50 values for all compounds were measured through a fluorescence assay using an inhibitory curve with 500 nM of enzyme, 20 μM of substrate and six different concentrations of each compound. The second Mpro dataset was initially gathered by Ngo et al. to validate a free energy perturbation protocol to assess Mpro inhibitors during virtual screening [37]. It comprises 11 Mpro inhibitors whose experimental IC
50 values were retrieved from the literature and therefore obtained in different experiments. The third Mpro dataset was obtained from a study demonstrating the capacity of leupeptin and three hepatitis C clinical protease inhibitors to bind and inhibit SARS-CoV-2 Mpro [18]. Kneller et al. characterized these four ligands by X-ray crystallography at near-physiological room temperature, thus capturing Mpro motions that would not be observable at lower temperature.
Results
The accuracy of molecular docking tools is usually assessed through self-docking experiments, where protein-ligand complexes previously determined by experimental methods are used as references to be reproduced. In a self-docking experiment, the ligand conformation is randomized before docking, and the quality of predicted binding modes is quantified by computing their deviation from the reference experimental structure. These experiments are useful to validate the geometries of predicted binding modes, which is usually summarized in terms of RMSD values. However, it is important to highlight that reproducing the bound geometries of available crystal structures is not the intended application of our webserver. DINC-COVID's goal is to efficiently account for receptor flexibility during the docking of a ligand. This allows for the sampling of alternative low-energy binding modes that are not captured by rigid crystal structures or by conventional molecular docking techniques. Therefore, by design, the geometries of output complex conformations can significantly differ from those observed in available crystal structures, especially when using an ensemble of receptor conformations derived from molecular dynamics. Our underlying hypothesis is that these alternative lower-energy binding modes can better approximate the population of receptor-bound ligand conformations that exist in solution (in vitro/in vivo).
DINC-COVID predictions for PLpro inhibitors show high correlation with experimental data
To evaluate DINC-COVID predictions, we used a recently published dataset of seven PLpro inhibitors (see Table 1
) [31]. This dataset is mostly composed of small ligands (i.e., up to seven flexible bonds) that are all strong binders (i.e., IC
50 values ranging from 2.3 μM to 43.2 μM). It is therefore quite challenging with respect to the task of reproducing a ranking of binding affinities.
Table 1
Correlations between docking-derived binding energies and experimentally-determined binding affinities of PLpro drug-like inhibitors [31]. Predicted binding energies are reported for DINC-COVID (using three ensembles and three scoring functions), COVID-19-DS and DockThor-VS (using two receptor conformations). Correlation coefficients between binding energies and ln(IC50) values are reported below; in each row, the two highest coefficients are highlighted.
Ligand
IC50
DINC-COVID
COVID-19-DS
DockThor-VS
Crystal Ensemble
Charmm Ensemble
Gromos Ensemble
Vinaa
Vinardoa
AD4a
Vinaa
Vinardoa
AD4a
Vinaa
Vinardoa
AD4a
6W9Cb
6WX4b
Compound 1
2.3
−10.53
−10.74
−9.66
−8.45
−8.27
−7.93
−8.73
−8.85
−8.22
−8.50
−7.38
−7.64
Compound 2
5.1
−10.82
−11.17
−11.24
−9.16
−8.73
−8.82
−9.31
−9.85
−10.34
−10.20
−7.72
−6.87
Compound 3
6.4
−10.41
−11.17
−11.62
−8.62
−9.61
−8.50
−8.75
−9.67
−9.08
−8.50
−7.17
−7.96
Compound 6
7.0
−10.48
−11.56
−12.67
−9.86
−10.72
−10.93
−9.83
−10.35
−11.87
−10.20
−8.44
−7.51
Compound 7
12.7
−9.67
−10.70
−10.65
−7.87
−9.03
−8.70
−8.99
−9.50
−9.53
−9.00
−7.91
−6.85
Compound 5
16.8
−9.84
−11.07
−10.78
−7.97
−9.95
−8.56
−8.55
−9.92
−8.87
−8.90
−7.70
−7.93
Compound 4
43.2
−9.16
−9.55
−9.07
−7.99
−8.21
−7.12
−8.61
−9.31
−8.61
−8.90
−7.63
−8.56
Pearson'sr
0.90
0.62
0.33
0.49
0.02
0.32
0.32
−0.14
0.14
0.09
−0.17
−0.49
Spearman'sρ
0.89
0.45
0.29
0.54
−0.07
0.18
0.43
−0.14
0.07
−0.13
−0.21
−0.36
IC50: half maximal inhibitory concentration (μM).
Binding energies in kcal/mol for the top scoring conformation according to the indicated scoring function.
Binding energies in kcal/mol for the top scoring conformation when bound to the indicated receptor.
Correlations between docking-derived binding energies and experimentally-determined binding affinities of PLpro drug-like inhibitors [31]. Predicted binding energies are reported for DINC-COVID (using three ensembles and three scoring functions), COVID-19-DS and DockThor-VS (using two receptor conformations). Correlation coefficients between binding energies and ln(IC50) values are reported below; in each row, the two highest coefficients are highlighted.IC50: half maximal inhibitory concentration (μM).Binding energies in kcal/mol for the top scoring conformation according to the indicated scoring function.Binding energies in kcal/mol for the top scoring conformation when bound to the indicated receptor.In spite of that, we achieved high correlations between ln(IC
50) values and binding energies predicted by DINC-COVID using the Crystal ensemble (i.e., Pearson's r = 0.9 and Spearman's ρ = 0.89), far above correlations achieved by binding energies predicted by other ensembles and by other servers (see Table 1). In addition, when comparing predicted binding modes with crystal structures available for three of these compounds, it appears that DINC-COVID produced binding modes that are very similar to the corresponding crystal structures (see Fig. 5
).
Fig. 5
Structural analysis of binding modes. Crystal structures of compounds 1, 2, and 3 are compared to the binding modes produced by DINC-COVID (using Vina and the Crystal ensemble), COVID-19-DS, and DockThor-VS (using receptor 6W9C). Crystal structures are shown in green, and binding modes are shown in yellow. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
Structural analysis of binding modes. Crystal structures of compounds 1, 2, and 3 are compared to the binding modes produced by DINC-COVID (using Vina and the Crystal ensemble), COVID-19-DS, and DockThor-VS (using receptor 6W9C). Crystal structures are shown in green, and binding modes are shown in yellow. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)The best results were obtained with the Vina scoring function and the Crystal ensemble, which might be due to the nature of this dataset (i.e., small ligands and limited receptor flexibility). However, good correlations were also observed in other settings, for example using Vinardo or the Charmm ensemble (see Table 1). Note that the highest correlation with ln(IC
50) values is not a spurious one (see Appendix B, Fig. 11). Although the small size of this dataset is likely to have contributed to this high correlation, the structural agreement between predicted binding modes and experimental structures corroborates the validity of this correlation. In addition, the structural examination of binding modes provides a rationale for the lower correlations observed for other servers in this experiment.
DINC-COVID rankings of Mpro inhibitors show good agreement with affinity-based rankings
To further validate DINC-COVID predictions, we used three datasets of ligands that were tested in inhibitory assays with Mpro, and for which IC
50 values are available.The first dataset contains 14 drug-like ligands that are experimentally characterized SARS-CoV-2 Mpro inhibitors (see Table 2
) [36]. Performing a docking experiment with this dataset produced rather low correlations with experimental ln(IC
50) values for all tested settings. According to Pearson's r, the best correlation is observed for DockThor-VS using the 6LU7 receptor (r = 0.36), but according to Spearman's ρ, the best correlation is observed for DINC-COVID using Vinardo and the Charmm ensemble (ρ = 0.55). This experiment highlights the challenges of dealing with outliers (see Appendix B, Fig. 12). Indeed, two ligands have a particularly strong impact on the obtained correlations (see Table 2). Removing Indinavir from the correlation analysis produces better correlations for all settings, the best results being achieved by DockThor-VS with the 6LU7 receptor (Pearson's r = 0.65) and DINC-COVID with Vinardo and the Charmm ensemble (Spearman's ρ = 0.75). Although removing Dipyridamole alone from the correlation analysis does not produce substantial changes, removing both Dipyridamole and Indinavir substantially improves Pearson's correlations, with the best results being achieved by DockThor-VS with the 6LU7 receptor (r = 0.74) and DINC-COVID with Vinardo and the Charmm ensemble (r = 0.65). These results show that, when considering only the 12 remaining ligands, good agreement is reached between predicted binding energies and experimental binding affinities.
Table 2
Correlations between docking-derived binding energies and experimentally-determined binding affinities of Mpro drug-like inhibitors [36]. Predicted binding energies are reported for DINC-COVID (using three ensembles and three scoring functions), COVID-19-DS and DockThor-VS (using two receptor conformations). Correlation coefficients between binding energies and ln(IC50) values are reported below; in each row, the two highest coefficients are highlighted. Results for correlation analyses without outliers (i.e., Dipyridamole and Indinavir) are also reported.
Binding energies in kcal/mol for the top scoring conformation according to the indicated scoring function.
Binding energies in kcal/mol for the top scoring conformation when bound to the indicated receptor.
Correlations between docking-derived binding energies and experimentally-determined binding affinities of Mpro drug-like inhibitors [36]. Predicted binding energies are reported for DINC-COVID (using three ensembles and three scoring functions), COVID-19-DS and DockThor-VS (using two receptor conformations). Correlation coefficients between binding energies and ln(IC50) values are reported below; in each row, the two highest coefficients are highlighted. Results for correlation analyses without outliers (i.e., Dipyridamole and Indinavir) are also reported.IC50: half maximal inhibitory concentration (μM); Roxatidine ace hydro: Roxatidine acetate hydrochloride; Valganciclovir hydro: Valganciclovir hydrochloride.Binding energies in kcal/mol for the top scoring conformation according to the indicated scoring function.Binding energies in kcal/mol for the top scoring conformation when bound to the indicated receptor.The second one is a mixed dataset composed of 11 ligands of various types that are experimentally characterized SARS-CoV-2 Mpro inhibitors (see Table 3
) [37]. This dataset is quite challenging for the task of reproducing a ranking of binding affinities because all ligands are very strong binders, with IC
50 values ranging from 0.04 μM to 21.39 μM. After performing a docking experiment with this dataset, reasonably good correlations between predicted binding energies and ln(IC
50) values were obtained across most settings (see Table 3). The best correlations were achieved by COVID-19-DS (with Pearson's r = 0.74 and Spearman's ρ = 0.73). The second best correlations were achieved by DINC-COVID using the Crystal ensemble, with Vinardo according to Pearson's r (= 0.7), but with Vina according to Spearman's ρ (= 0.68). Results obtained on this dataset are also strongly influenced by two outliers: 13a and Shikonin (see Appendix B, Fig. 13). Removing these outliers from the correlation analysis leads to improved correlations for all methods. The best correlations are then achieved by DINC-COVID using Vina and the Gromos ensemble (Pearson's r = 0.88 and Spearman's ρ = 0.95). This represents a very good agreement between predicted binding energies and binding affinities.
Table 3
Correlations between binding energies and experimental binding affinities of Mpro inhibitors [37]. Binding energies are reported for DINC-COVID (using three ensembles and three scoring functions), COVID-19-DS and DockThor-VS (using two receptor conformations). Correlation coefficients between binding energies and ln(IC50) values are reported below; in each row, the two highest coefficients are highlighted. Results for correlation analyses without outliers (i.e., 13a and Shikonin) are also reported.
Ligand
IC50
DINC-COVID
COVID-19-DS
DockThor-VS
Crystal Ensemble
Charmm Ensemble
Gromos Ensemble
Vina a
Vinardoa
AD4a
Vinaa
Vinardoa
AD4a
Vinaa
Vinardoa
AD4a
6LU7b
6W63b
11b
0.04
−9.08
−9.98
−12.03
−8.29
−8.25
−10.46
−8.09
−8.50
−9.89
−8.10
−8.25
−8.63
11a
0.05
−8.86
−9.66
−12.62
−8.21
−8.58
−10.27
−8.08
−8.18
−10.66
−8.10
−7.98
−8.77
11r
0.18
−8.92
−10.80
−15.29
−8.44
−8.82
−13.94
−7.95
−8.88
−13.09
−7.40
−9.16
−9.47
13b
0.67
−8.51
−9.82
−14.20
−8.11
−9.14
−12.46
−7.92
−8.12
−12.65
−7.40
−8.70
−8.79
Ebselen
0.67
−6.82
−6.02
−6.38
−6.41
−5.75
−6.02
−6.14
−4.96
−5.31
−6.60
−8.27
−7.96
Tideglusib
1.55
−8.43
−8.28
−9.38
−8.68
−7.12
−7.43
−7.46
−6.31
−6.87
−8.00
−7.94
−8.29
Carmofur
1.82
−6.68
−7.18
−6.08
−6.48
−6.28
−5.49
−6.10
−5.83
−4.71
−6.20
−7.31
−7.09
13a
2.39
−9.02
−9.94
−13.44
−8.11
−8.43
−12.87
−8.49
−9.30
−12.22
−7.60
−9.24
−8.76
Disulfiram
9.35
−4.52
−5.02
−4.78
−4.18
−4.29
−3.95
−4.12
−4.20
−4.04
−4.40
−7.25
−7.66
Shikonin
15.75
−8.12
−7.67
−7.61
−8.05
−7.02
−7.44
−7.19
−7.15
−6.90
−6.70
−7.89
−7.56
PX-12
21.39
−4.72
−5.12
−4.70
−4.47
−4.88
−4.42
−4.42
−5.04
−4.86
−4.20
−7.47
−7.60
Pearson'sr
0.67
0.70
0.68
0.59
0.68
0.60
0.65
0.58
0.59
0.74
0.43
0.68
w/o 13a
0.73
0.78
0.76
0.63
0.74
0.71
0.74
0.71
0.69
0.79
0.59
0.73
w/o Shikonin
0.78
0.74
0.68
0.74
0.73
0.62
0.75
0.65
0.60
0.81
0.42
0.64
w/o both
0.88
0.84
0.78
0.80
0.81
0.76
0.88
0.83
0.72
0.87
0.62
0.70
Spearman'sρ
0.68
0.66
0.63
0.65
0.63
0.55
0.62
0.48
0.54
0.73
0.51
0.68
w/o 13a
0.85
0.77
0.77
0.63
0.69
0.72
0.86
0.69
0.67
0.82
0.72
0.77
w/o Shikonin
0.70
0.67
0.62
0.66
0.63
0.58
0.63
0.49
0.57
0.76
0.43
0.63
w/o both
0.94
0.82
0.80
0.64
0.72
0.80
0.95
0.79
0.75
0.88
0.69
0.75
IC50: half maximal inhibitory concentration (μM).
Binding energies in kcal/mol for the top scoring conformation according to the indicated scoring function.
Binding energies in kcal/mol for the top scoring conformation when bound to the indicated receptor.
Correlations between binding energies and experimental binding affinities of Mpro inhibitors [37]. Binding energies are reported for DINC-COVID (using three ensembles and three scoring functions), COVID-19-DS and DockThor-VS (using two receptor conformations). Correlation coefficients between binding energies and ln(IC50) values are reported below; in each row, the two highest coefficients are highlighted. Results for correlation analyses without outliers (i.e., 13a and Shikonin) are also reported.IC50: half maximal inhibitory concentration (μM).Binding energies in kcal/mol for the top scoring conformation according to the indicated scoring function.Binding energies in kcal/mol for the top scoring conformation when bound to the indicated receptor.The third dataset contains four peptidomimetics that are known antiviral compounds: Leupeptin, Telaprevir, Narlaprevir and Boceprevir (see Table 4
) [18]. At the time the Mpro Crystal ensemble for DINC-COVID was built, no experimental structure of Mpro bound to these compounds was available. Indeed, it was only recently that the crystallographic structures of these complexes were solved at room temperature (and deposited under PDB codes 6XCH, 6XQS, 6XQT, and 6XQU). Working at room temperature allowed for the induced-fit process to occur without the constraints of crystallographic packing. Even without access to these additional structures, DINC-COVID was able to reproduce the experimentally-determined binding affinities with good correlation (i.e., Pearson's r = 0.86), using Vina and the Charmm ensemble. Unfortunately, due to the very small number of ligands, this correlation is not of high quality (see Appendix B, Fig. 14). However, other correlations (of higher quality) achieved by DINC-COVID are superior to those achieved by DockThor-VS and COVID-19-DS (see Table 4). Interestingly, results produced by the Vinardo scoring present the worst correlations for all three DINC-COVID ensembles in this experiment. This is in contrast with good results produced by Vinardo in previous experiments, and might indicate a limitation of this function when ranking peptidomimetic ligands.
Table 4
Correlations between docking-derived binding energies and experimentally-determined binding affinities of Mpro peptidomimetic inhibitors [18]. Predicted binding energies are reported for DINC-COVID (using three ensembles and three scoring functions), COVID-19-DS and DockThor-VS (using two receptor conformations). Pearson's correlation coefficient between binding energies and ln(IC50) values is reported below; the two highest values are highlighted.
Ligand
IC50
DINC-COVID
COVID-19-DS
DockThor-VS
Crystal Ensemble
Charmm Ensemble
Gromos Ensemble
Vinaa
Vinardoa
AD4a
Vinaa
Vinardoa
AD4a
Vinaa
Vinardoa
AD4a
6LU7b
6W63b
Boceprevir
3.1
−8.33
−8.12
−12.33
−7.95
−7.45
−10.55
−12.33
−7.09
−10.41
−7.20
−7.80
−8.77
Narlaprevir
5.1
−8.25
−9.35
−16.81
−7.91
−7.27
−15.03
−16.81
−7.77
−12.42
−6.80
−9.16
−7.87
Telaprevir
18
−9.08
−9.64
−14.60
−7.98
−8.72
−13.10
−14.60
−8.80
−12.68
−8.40
−9.00
−8.90
Leupeptin
92
−6.94
−7.96
−9.79
−6.57
−7.37
−8.77
−9.79
−6.72
−8.60
−6.10
−7.15
−8.46
Pearson'sr
0.60
0.24
0.61
0.86
−0.14
0.54
0.61
0.18
0.53
0.32
0.48
−0.12
IC50: half maximal inhibitory concentration (μM).
Binding energies in kcal/mol for the top scoring conformation according to the indicated scoring function.
Binding energies in kcal/mol for the top scoring conformation when bound to the indicated receptor.
Correlations between docking-derived binding energies and experimentally-determined binding affinities of Mpro peptidomimetic inhibitors [18]. Predicted binding energies are reported for DINC-COVID (using three ensembles and three scoring functions), COVID-19-DS and DockThor-VS (using two receptor conformations). Pearson's correlation coefficient between binding energies and ln(IC50) values is reported below; the two highest values are highlighted.IC50: half maximal inhibitory concentration (μM).Binding energies in kcal/mol for the top scoring conformation according to the indicated scoring function.Binding energies in kcal/mol for the top scoring conformation when bound to the indicated receptor.These three datasets contain Mpro inhibitors with experimentally determined IC
50 values. However, these values were produced by different groups using different protocols, and might not be fully comparable. That is why we initially decided to use these datasets separately. On the other hand, working with small datasets introduces other issues regarding the reliability of the obtained ligand rankings. Therefore, we performed an additional correlation analysis in which the three Mpro datasets were combined into a larger one. Although the observed correlations are not very high, they are positive for all methods (see Table 5
). The best correlations were achieved by DINC-COVID using Vinardo and the Crystal ensemble (with Pearson's r = 0.41 and Spearman's ρ = 0.42). Again, Indinavir was clearly an outlier (see Appendix B, Fig. 15), and removing it produced better correlations. Without it, the best results are achieved by DINC-COVID using Vinardo and the Crystal ensemble according to Pearson's r (= 0.49), or by DockThor-VS using the 6LU7 receptor according to Spearman's ρ (= 0.52). Finally, using a larger dataset allowed us to compute another statistic for all methods: the area under the receiver operating characteristic (ROC) curve (see Appendix B, Fig. 16). This analysis confirmed the good performance of DINC-COVID when using Vinardo and the Crystal ensemble, but also indicated a competitive performance of DockThor-VS when using the 6LU7 receptor.
Table 5
Correlations between docking-derived binding energies and experimentally-determined binding affinities of all 29 Mpro inhibitors. Correlations coefficients are reported for DINC-COVID (using three ensembles and three scoring functions), COVID-19-DS and DockThor-VS (using two receptor conformations); in each row, the two highest coefficients are highlighted. Results for correlation analyses without Indinavir are also reported.
DINC-COVID
COVID-19-DS
DockThor-VS
Crystal Ensemble
Charmm Ensemble
Gromos Ensemble
Vina
Vinardo
AD4
Vina
Vinardo
AD4
Vina
Vinardo
AD4
6LU7
6W63
Pearson'sr
0.34
0.41
0.32
0.27
0.32
0.27
0.02
0.29
0.29
0.31
0.27
0.35
w/o Indinavir
0.41
0.49
0.40
0.34
0.41
0.34
0.03
0.36
0.36
0.39
0.41
0.42
Spearman'sρ
0.32
0.42
0.34
0.26
0.32
0.24
0.12
0.28
0.25
0.32
0.41
0.32
w/o Indinavir
0.42
0.51
0.43
0.35
0.43
0.32
0.18
0.34
0.32
0.42
0.52
0.40
Correlations between docking-derived binding energies and experimentally-determined binding affinities of all 29 Mpro inhibitors. Correlations coefficients are reported for DINC-COVID (using three ensembles and three scoring functions), COVID-19-DS and DockThor-VS (using two receptor conformations); in each row, the two highest coefficients are highlighted. Results for correlation analyses without Indinavir are also reported.Taken together, these results constitute a proof-of-concept validation of the predictive capabilities of DINC-COVID, as it consistently produced predictions in good agreement with experimental data across all studied datasets. The results also provide insights on the differences between docking outputs produced by different ensembles, as well as a comparison with webservers that do not perform ensemble docking. For instance, when making predictions for ligands that do not require significant adjustments in the receptor conformation, good results can be obtained with an ensemble of crystal structures (i.e., with only limited receptor flexibility) and even without an ensemble docking protocol, as illustrated by results obtained with the other webservers (see Table 2, Table 3). On the other hand, MD-derived conformations have greater chances of successfully accommodating ligands that require greater receptor flexibility for binding (see Table 4).
DINC-COVID predictions capture different ranges of receptor flexibility
Overall, our results highlight the benefits of using ensemble docking as an efficient way to account for receptor flexibility in molecular docking targeting SARS-CoV-2 proteins. In DINC-COVID, receptor flexibility is not adjusted on-the-fly during docking, as it is accounted for implicitly through the use of pre-computed ensembles [20]. However, we tried to compensate for this limitation by generating different ensembles for each target binding site. Having several pre-computed ensembles at their disposal allows users to explore different ranges of receptor flexibility, according to their needs.The differences in range between the provided ensembles are real and can be quantified (see Fig. 6
). For instance, the Crystal ensemble created for Mpro captures mostly side-chain rearrangements in the receptor, and only subtle variations of the backbone: the all-heavy-atom RMSD (for binding site residues) between members of the Crystal ensemble is lower than 0.5 Å. On the other hand, the Gromos ensemble created for Mpro captures much greater conformational changes, with up to 5 Å all-heavy-atom RMSD in relation to structures in the Crystal ensemble. Interestingly, the Charmm ensemble created for Mpro includes conformations in an intermediate range of flexibility.
Fig. 6
RMSD analysis (for binding site residues) of Mpro ensembles. (A) All-against-all root-mean-square deviation (RMSD) between conformations included in the Mpro Crystal ensemble (i.e., experimental data). (B) RMSD of all conformations of the Mpro Charmm ensemble (x-axis) against all conformations of the Mpro Crystal ensemble (y-axis). (C) RMSD of all conformations of the Mpro Gromos ensemble (x-axis) against all conformations of the Mpro Crystal ensemble (y-axis). All RMSD values were computed using all the heavy atoms of residues in the catalytic binding site. The range bar on the right, with RMSD values in Angtroms, indicates the color code used in all three plots. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
RMSD analysis (for binding site residues) of Mpro ensembles. (A) All-against-all root-mean-square deviation (RMSD) between conformations included in the Mpro Crystal ensemble (i.e., experimental data). (B) RMSD of all conformations of the Mpro Charmm ensemble (x-axis) against all conformations of the Mpro Crystal ensemble (y-axis). (C) RMSD of all conformations of the Mpro Gromos ensemble (x-axis) against all conformations of the Mpro Crystal ensemble (y-axis). All RMSD values were computed using all the heavy atoms of residues in the catalytic binding site. The range bar on the right, with RMSD values in Angtroms, indicates the color code used in all three plots. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)Most importantly, these differences between ensembles are useful, since the best results are not always provided by the same ensemble. As an extension, the diversity within ensembles is also useful, since the best ligand binding modes are not always obtained with the same receptor conformation from a given ensemble (see Fig. 7
). In summary, our analysis shows that both levels of diversity – i.e., within an ensemble, and between ensembles – contribute to the good quality of DINC-COVID predictions.
Fig. 7
Alternative receptor conformations included in the top DINC-COVID outputs. The heatmaps show the number of times each receptor conformation is included in the top 12 best ranked binding modes produced by the ensemble docking of Mpro inhibitors. Individual receptor conformations are indicated on the left-side y-axis by their PDB code or conformation number (#) in the MD-based ensembles. Scoring functions used during the rescoring phase are listed on the x-axis. The white color indicates that the receptor conformation was never observed in the top ranked binding modes. These heatmaps are derived from experiments involving the (A) Crystal ensemble, (B) Charmm ensemble, and (C) Gromos ensemble. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
Alternative receptor conformations included in the top DINC-COVID outputs. The heatmaps show the number of times each receptor conformation is included in the top 12 best ranked binding modes produced by the ensemble docking of Mpro inhibitors. Individual receptor conformations are indicated on the left-side y-axis by their PDB code or conformation number (#) in the MD-based ensembles. Scoring functions used during the rescoring phase are listed on the x-axis. The white color indicates that the receptor conformation was never observed in the top ranked binding modes. These heatmaps are derived from experiments involving the (A) Crystal ensemble, (B) Charmm ensemble, and (C) Gromos ensemble. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)A third level of diversity in DINC-COVID output is introduced by the scoring function used to rank all sampled conformations. Our results show different frequencies for the selection of receptor conformations, depending on the combination of scoring function and conformation ensemble (see Fig. 7). Some of the starkest differences are observed when using the Charmm ensemble: the greatest diversity in receptor conformations included in the top conformations is observed with the AD4 scoring function, while Vina and Vinardo produce lower levels of diversity.It is worth emphasizing that in its current implementation DINC-COVID is not automatically selecting one of the available scoring functions, and is not providing a consensus ranking across all three functions. Instead, each output produced by the server includes the three alternative rankings of all sampled conformations, according to the three available scoring functions. This approach provides users with the flexibility to either follow one of the rankings or to analyze the results further, considering expert knowledge about the scoring functions or the ligand/receptor of interest.
Discussion
DINC-COVID offers a ready-to-use solution for researchers to account for protein flexibility while testing compounds against SARS-CoV-2 proteins. It allows users to run ensemble docking experiments without the additional burden of time-consuming simulations required for ensemble generation and docking preparation. Since sampling conformational flexibility for large proteins and multimeric complexes is a challenging and computationally intensive task, we decoupled it from the docking procedure itself. By making the pre-computed ensembles readily available in our webserver, we aim to facilitate and speed up the use of ensemble docking by a broader audience of users searching for new SARS-CoV-2 inhibitors.Users of our webserver can choose between four target binding sites: the catalytic site of Main protease (Mpro), Papain-like protease (PLpro) and RNA-dependent RNA polymerase (RdRp), as well as the allosteric binding site of Mpro. For each target binding site, three distinct ensembles are provided, including experimental data (i.e., structures from X-ray crystallography) or simulated data (i.e., structures from molecular dynamics with different force fields). The choice of the force field has significant impacts on the results of MD simulations, since it encompasses a series of empirically-determined parameters [62,63]. By relying on two force fields that use distinct representations for hydrogen atoms and different water models, we aim to limit the impact of force field bias on conformational sampling.It is important to recognize that the conformational ensembles available on the DINC-COVID webserver reflect different scales of protein flexibility, from subtle side-chain rearrangements (e.g., in the Crystal ensemble) to larger backbone motions (e.g., in the Gromos ensemble). Note that we have carefully prepared the structures provided in the ensembles, which includes verifying the protonation state of His, Glu and Asp residues. We have also provided users with the possibility of automatically preprocessing the ligand before docking. Altogether, these features make our server of potential interest even for advanced users, adding robustness to docking analyses with SARS-CoV-2 proteins.The main contribution of our webserver is to account for receptor flexibility while sampling ligand binding modes. This is made possible through a parallelized ensemble docking protocol using DINC. DINC has been extensively validated in previous work, through both self-docking and cross-docking experiments [30,64,65]. On the other hand, the DINC-COVID webserver cannot be properly assessed through conventional self-docking experiments, since the goal of ensemble docking is precisely to find alternative low-energy binding modes. For instance, when using an MD-derived ensemble, the best ranked receptor conformations will usually not be the ones with the lowest RMSD to a reference crystal structure. Our approach relies on the fact that a set of diverse receptor conformations can accommodate a much broader range of ligand conformations and sizes than a single receptor conformation [22]. To sum up, the goal of ensemble docking is not to reproduce binding geometries observed in crystallographic structures, but to produce conformations that better reflect the possible binding modes of a protein-ligand complex in solution.In this context, we searched for experimental binding affinities of inhibitors for SARS-CoV-2 proteins, to evaluate whether DINC-COVID could properly rank these ligands. Good correlations between predicted binding energies and experimental binding affinities were obtained with both PLpro inhibithors (e.g., Pearson's r = 0.9, see Table 1) and Mpro inhibithors. Interestingly, we obtained good correlations between predicted binding energies and experimental binding affinities with a recently published set of antiviral peptidomimetic inhibitors for Mpro (e.g., Pearson's r = 0.86, see Table 4). As noted, the binding of these peptidomimetics requires malleability of Mpro catalytic site [18], and as such could not be predicted by docking methods using a single receptor conformation (i.e., COVID-19-DS and DockThor-VS). These results highlight the potential of DINC-COVID to identify completely novel binding modes. Such binding modes could involve conformational changes of the receptor's binding site, and could potentially be predicted regardless of the availability of experimental data on such receptor's conformation.As a note of caution, we should reiterate that the analyzed datasets were small, which can introduce biases regarding their composition (e.g., impact of outliers) and the choice of statistical analysis. We tried to address these limitations by discussing the impact of different scenarios in the analysis of these datasets. In the case of the Mpro datasets, we also performed an alternative analysis in which we aggregated all results into a single dataset. Despite all the observed variability across different scenarios, the ensemble docking strategy of DINC-COVID produced results that were consistently in agreement with experimental data, outperforming other webservers in many of these experiments.It is important to acknowledge that accurate scoring remains an open challenge in the field of molecular docking [66], especially when considering large and flexible ligands [30]. In the context of ensemble docking, accurately scoring and ranking sampled binding modes becomes even more challenging. To try and compensate for the limitations of relying on a single scoring function, our meta-docking approach provides scoring of the results with three alternative functions. Users can then decide if they want to rely on a specific scoring function, or build a consensus across the three functions. In fact, our results show that the scoring function can affect the diversity of binding modes produced by DINC-COVID, which is a factor that users might want to explore in different ways depending on their specific application. Most importantly, despite the challenges and known limitations of available scoring functions, we were able to obtain good correlation with experimental data in our validation experiments.There are countless scoring functions proposed in previous studies and implemented in other tools. DINC-COVID predictions could certainly be improved by using scoring functions that might be tailored to a particular type of ligand. In order to explore this possibility, users can request to download a larger number of output conformations, and rescore them locally with the most appropriate scoring function. Future work on DINC-COVID will include the implementation of additional scoring functions during the sampling phase, and consensus ranking during the rescoring phase.The quality of results produced by ensemble docking is directly influenced by the quality of the conformations included in the ensemble [22]. For instance, it was shown that including too many conformations in an ensemble can deteriorate docking results [22,67], since it exacerbates existing limitations of both sampling and scoring. As briefly discussed in the Methods section, we used a reasonable and efficient strategy to extract conformations from MD simulations. By combining standard procedures for dimensionality reduction and clustering, we tried to maximize coverage of the conformational landscape, while minimizing the number of conformations included in each ensemble. In fact, we used the same strategy to generate the Crystal ensembles, as opposed to simply using all available crystal structures. Again, using a different strategy for selecting conformations could potentially improve DINC-COVID results [29,68]. However, demonstrating the optimality of our selection strategy goes beyond the scope of this study. It is worth noting that our proof-of-concept validation indicates the success of decisions made for conformation generation and conformation selection, providing us with a useful protocol that can be further improved upon.Finally, the choice of making DINC-COVID available through a webserver also comes with trade-offs. On one hand, it takes away most of the burden associated with software installation and the preparation of input files. We hope that this format will enable users who are not familiar with molecular docking to go ahead and run their own tests, potentially evaluating new compounds that are not yet available in public databases. On the other hand, it constrains the use of our ensemble docking strategy to the specific algorithms and parameters we considered. Fortunately, our implementation offers the possibility to make this pipeline available for local customization and execution. In particular, we are already working on leveraging the use of Docker containerization and widely-used python packages, to develop a future version of DINC-COVID that could be deployed on local computational resources. This will enable a full customization of DINC-COVID, and foster its use for large-scale virtual screening of viral variants.
Funding
This work was funded in part by the IIBR:Informatics:RAPID program (2033262), by the (Brazil, 437373/2018-5), and by funds. SHS is supported by a Training Program fellowship (T15LM007093-29). DD is a cross-disciplinary post-doctoral fellow supported by funding from the and (MC_UU_00009/2). DAA and MMR are supported by a Computational Cancer Biology Training Program fellowship ( Grant No. RP170593). LEK is supported in part by U01CA258512.
CRediT authorship contribution statement
Sarah Hall-Swan: Methodology, Software, Writing – original draft, preparation. Didier Devaurs: Methodology, Software, Data curation, Validation, Writing – review & editing. Mauricio M. Rigo: Data curation, Validation, Writing – review & editing. Dinler A. Antunes: Conceptualization, Supervision, Methodology, Writing – review & editing. Lydia E. Kavraki: Conceptualization, Supervision, Project administration, Funding acquisition. Geancarlo Zanatta: Conceptualization, Methodology, Data curation, Validation, Supervision, Project administration.
Declaration of competing interest
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results. Neither the manuscript nor any parts of its content are currently under consideration or published in another journal.
Table 6
Structures composing the Crystal ensemble for Mpro catalytic site. This set of 25 conformations was selected from a total of 156 crystal structures including both apo (i.e., unbound) and holo (i.e., ligand-bound) Mpro conformations. Note that the crystallographic file 6M2N has two full dimers, with protomers showing alternative protonation states; two of these states were selected and included in the final ensemble.
PDB ID
State
PubChem CID
Resolution (Å)
5R84
holo
1072430
1.83
5RE9
holo
880785
1.72
5RER
holo
404914157
1.88
5RF5
holo
265635
1.74
5RFA
holo
1224835
1.52
5RFH
holo
60645778
1.58
5RFJ
holo
3511405
1.8
5RFK
holo
20754800
1.75
5RGU
holo
146037571
2.11
5RH2
holo
89468951
1.83
5RH6
holo
405243691
1.6
5RH9
holo
405243697
1.91
6M0K
holo
405243775
1.5
6M2N
holo
5281605
2.2
6M2N
holo
5281605
2.2
6WNP
holo
10324367
1.44
6XB0
apo
–
1.8
6XBG
holo
–
1.45
6XCH
holo
137348943
2.2
6YVF
holo
44137675
1.6
7BQY
holo
146025593
1.7
7BRO
apo
–
2.2
7BRP
holo
10324368
1.8
7C8R
holo
405559748
2.3
7C8T
holo
11844232
2.05
Table 7
Structures forming the Crystal ensemble for Mpro allosteric site. This set of 24 conformations was selected from a total of 156 crystal structures including both apo and holo Mpro conformations. Note that all selected structures have the allosteric binding site in the apo state and all ligands listed are located in the catalytic binding site.