
Improved method of structure-based virtual screening based on ensemble learning.

Jin Li, WeiChao Liu, Yongping Song, JiYi Xia

Abstract

Virtual screening has become a successful alternative and complementary technique to experimental high-throughput screening technologies for drug design. Since the scoring functions of docking software cannot predict binding affinity accurately, improving the hit rate remains a common issue in structure-based virtual screening. This paper proposes a target-specific virtual screening method based on ensemble learning, named ENS-VS. In this method, protein-ligand interaction energy terms and structure vectors of the ligands are used as a combination descriptor. Support vector machine, decision tree and Fisher linear discriminant classifiers are integrated into ENS-VS to predict the activity of the compounds. The results show that the enrichment factor (EF) 1% of ENS-VS was 6 times higher than that of Autodock vina. Compared with the recent virtual screening method SIEVE-Score, the mean EF 1% and AUC of ENS-VS (mean EF 1% = 52.77, AUC = 0.982) were statistically significantly higher than those of SIEVE-Score (mean EF 1% = 42.64, AUC = 0.912) on the DUD-E datasets; the mean EF 1% and AUC of ENS-VS (mean EF 1% = 29.73, AUC = 0.793) were also higher than those of SIEVE-Score (mean EF 1% = 25.56, AUC = 0.765) on eight DEKOIS datasets. ENS-VS also showed significant improvements compared with other similar research. The source code is available at https://github.com/eddyblue/ENS-VS. This journal is © The Royal Society of Chemistry.


Year:  2020        PMID: 35492172      PMCID: PMC9049841          DOI: 10.1039/c9ra09211k

Source DB:  PubMed          Journal:  RSC Adv        ISSN: 2046-2069            Impact factor:   4.036


Introduction

Virtual screening (VS) is a computational approach used to identify active compounds by predicting their activity. In recent years it has become a successful alternative and complementary technique to experimental high-throughput screening technologies for drug design, because it greatly decreases the cost and increases the hit rate of screening.[1-4] Technically, virtual screening can be categorized into two types: ligand-based virtual screening (LBVS) and structure-based virtual screening (SBVS). In LBVS, the similarity principle is used to identify potentially active compounds based on their similarity to known reference ligands; this can be done by a variety of methods, including similarity and substructure searching,[5] pharmacophore matching[6] and 3D shape matching.[7] SBVS predicts active compounds with higher docking quality by explicitly docking each ligand into the binding site of the target. Many docking tools are used in SBVS, such as Glide,[8] GOLD,[9] Autodock[10] and Autodock vina (referred to as Vina).[11] Because SBVS is based on the physical interactions between the protein target and the ligands, whereas LBVS is based on similarity to known active compounds, SBVS is more likely to find novel compounds than LBVS. Another advantage of SBVS is the ability to perform interaction analysis on the docked structures to understand the affinity and selectivity of the compounds. However, the classical scoring functions implemented in docking software usually adopt simple functional forms fitted by linear regression, so the binding affinity between the target and the compound is not predicted accurately. Therefore, increasing the hit rate is one of the most challenging tasks in SBVS.
In recent years, researchers have applied machine learning methods[12] such as support vector machines (SVM), decision trees, neural networks and deep learning to improve the performance of VS, with good results.[13-16] Unlike the classical scoring functions with an assumed mathematical functional form, machine learning-based scoring functions implicitly learn the relationships within protein-ligand complexes by non-linear regression.[17] However, it is hard to achieve high accuracy with a single learner, and ensemble learning methods such as bagging,[18] boosting[18,19] and random forest[20-22] can attain better accuracy. Moreover, it is widely accepted that target-specific scoring functions may achieve better performance than universal scoring functions in actual drug research and development.[23,24] Therefore, we set out to build a target-specific VS model based on ensemble learning, treating the ligand activity labelling task as a classification problem. Feature selection is one of the most important factors affecting the performance of machine learning methods. In the past, two types of descriptors (features for classifying active and non-active compounds) were usually used. One is protein-ligand interaction energy terms,[16,25] which lack sufficient predictive power because they are relatively simple. The other is molecular fingerprints,[26-28] which are prone to overfitting because of the large number of descriptors. Therefore, we pose our first scientific question: how do we choose descriptors that can effectively distinguish active compounds from non-active ones? Virtual screening aims to distinguish active compounds from a large number of non-active compounds.
However, the severely imbalanced numbers of active and non-active compounds in commonly used training data[29,30] result in high recall and low precision.[12] Previous studies[13,31,32] usually applied random under-sampling to this problem, but under-sampling easily loses important information about the non-active compounds. Therefore, we pose our second scientific question: how do we effectively utilize the information in imbalanced data? On the other hand, most previous studies[13,16,33,34] used only one machine learning algorithm for classification, such as SVM[35] or a neural network,[36] and existing ensemble learning methods use only one type of base learner;[18,19] one type of learner may not work well for most targets. For this reason, we pose our third scientific question: can we integrate more machine learning algorithms and build a stable model that is suitable for most targets? Following these scientific questions, we present a target-specific virtual screening method based on ensemble learning, named ENS-VS, with three innovations. Firstly, we select a moderate number of descriptors to classify active and non-active compounds by considering both the protein-ligand interaction energy terms and the structural character of the ligand. Secondly, we develop a method for the data imbalance problem building on previously well-developed sampling-ensemble methods.[37,38] Finally, an ensemble learning approach is developed by integrating the SVM,[35] decision tree[39] and Fisher linear discriminant (referred to as Fisher)[40] algorithms to improve the predictive accuracy.

Materials and methods

Materials

The Directory of Useful Decoys, Enhanced (DUD-E)[29] database was used to evaluate the performance of ENS-VS. DUD-E contains 102 targets. Every target has two types of ligands, actives (active compounds) and decoys (non-active compounds), which can be labelled 1 and −1 for classification model training. Since the decoys are similar in physico-chemical properties to the actives but different in their chemical structures, the datasets are reliable for testing virtual screening methods. The number of decoys is much larger than that of actives. If the number of actives available for model training is too small, they cannot sufficiently represent the distribution of the positive data; and if the number of samples is smaller than the number of features in a machine-learning model, the risk of overfitting is high. In our method the number of features is more than one hundred. For this reason, we selected the 37 targets with more than 200 actives to build 37 target-specific models with ENS-VS. Twelve of the 37 targets, covering a wide range of popular drug target families (3 proteases, 2 nuclear receptors, 3 kinases, 2 GPCRs and 2 other target families), were selected to show detailed information. The initial numbers of actives and decoys and the protein targets used for model training are listed in Table 1.

Table 1  Protein targets for benchmarking collected from DUD-E

Family^a    Protein   PDB    Actives^b   Decoys^c
Protease    try1      2ayw   449         25 980
Protease    thrb      1ype   461         27 004
Protease    bace1     3l5d   283         18 100
Nuclear     esr1      1sj0   383         20 685
Nuclear     ppara     2p54   373         19 339
Kinase      src       3el8   524         34 500
Kinase      egfr      2rgp   542         25 050
Kinase      vgfr2     2p2i   409         24 950
GPCR        aa2ar     3eml   482         31 550
GPCR        adrb1     2vt4   247         15 850
Others      hivrt     3lan   338         18 891
Others      pgh2      3ln1   435         23 150

a Protein family classification of the selected protein targets.
b Number of actives collected from DUD-E.
c Number of decoys collected from DUD-E.
The DEKOIS 2.0 database[41] was used as an independent test set. The active compounds of DEKOIS were collected from the ChEMBL database, and the decoys were generated from the ZINC database so as to ensure high physicochemical similarity between actives and decoys while avoiding potentially active compounds. In this evaluation, the ligands in the DEKOIS datasets were used to test the models trained on the DUD-E datasets. Eight DEKOIS 2.0 targets with more than 200 actives in DUD-E were selected for the test: aa2ar (a2a), aces (ache), adrb2 (adrb2), akt1 (akt1), fa10 (fxa), egfr (egfr), hivrt (hiv1rt) and ppara (ppara); the former names are used in DUD-E and the names in brackets in DEKOIS. Structurally similar compounds (similarity ≥ 0.8) between the training data and the test data were excluded from the training set.
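The similarity filter above can be sketched as follows. This is a minimal illustration, not the authors' code: fingerprints are modelled as plain sets of "on" bits and the values are hypothetical; a real pipeline would compute, for example, Morgan fingerprints with RDKit and apply the same ≥ 0.8 Tanimoto cutoff.

```python
# Minimal sketch of the train/test leakage filter (not the authors' code).
# Fingerprints are modelled as sets of "on" bits; all values are hypothetical.

def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two bit sets: |A ∩ B| / |A ∪ B|."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def filter_training_set(train_fps, test_fps, cutoff=0.8):
    """Drop training ligands that are >= cutoff similar to any test ligand."""
    return [fp for fp in train_fps
            if all(tanimoto(fp, t) < cutoff for t in test_fps)]

train = [frozenset({1, 2, 3, 4, 5}), frozenset({1, 2, 3, 4}), frozenset({10, 11})]
test = [frozenset({1, 2, 3, 4, 6})]
kept = filter_training_set(train, test)
print(len(kept))  # 2: the second ligand (Tanimoto 4/5 = 0.8) is removed
```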

Workflow

The workflow of ENS-VS development is shown in Fig. 1 and includes the following steps: (i) dock all the actives and decoys into the binding pocket of the target and select the best pose of each ligand as ranked by Autodock vina (step 1 of Fig. 1); (ii) calculate the five protein-ligand interaction energy terms and the structure vector representation of the ligands, then create the feature matrix (step 2 of Fig. 1); (iii) train the ensemble classifier on the training set and tune the hyperparameters on the validation set (step 3 of Fig. 1); (iv) evaluate the model on the test set and compute the performance metrics (step 4 of Fig. 1).
Fig. 1

The workflow of ENS-VS development.

Molecular docking

The generic docking procedure includes the following steps: (i) prepare the proteins and ligands by adding hydrogens, merging non-polar hydrogens and removing water molecules; (ii) convert the PDB files of the protein and the mol2 files of the ligands into PDBQT format with the Python scripts prepare_receptor4.py and prepare_ligand4.py in MGLTools;[42] (iii) dock the actives and decoys to their target with Autodock vina.[43] The grid box is set to 20 × 20 × 20 and centred on the crystal ligand, and num_modes is set to 1. The num_modes parameter sets the maximum number of binding modes generated by Vina; the binding modes are sorted by Vina's scoring function, and here we keep only the top-scoring one. The remaining parameters are left at their default values. (iv) The top-scoring conformation is taken as the optimal binding mode of the ligand (step 1 of Fig. 1).
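Step (iii) can be sketched as a Vina command line. The receptor and ligand file names and the box centre below are placeholders, and the command is only echoed here rather than executed; on a machine with vina on the PATH, replacing echo with eval runs the actual docking.

```shell
# Sketch of the Vina call in step (iii); paths and box centre are placeholders.
RECEPTOR=receptor.pdbqt
LIGAND=ligand.pdbqt
CX=10.0; CY=12.5; CZ=-3.2   # centre of the crystal ligand (placeholder values)

CMD="vina --receptor $RECEPTOR --ligand $LIGAND \
--center_x $CX --center_y $CY --center_z $CZ \
--size_x 20 --size_y 20 --size_z 20 \
--num_modes 1 --out docked.pdbqt"

# Echo instead of executing so the sketch is self-contained;
# replace 'echo' with 'eval' to run the actual docking.
echo "$CMD"
```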

Descriptors selection

We used a combination descriptor including interaction energy terms and ligand features. Fergus et al.[44] combined 1D or 2D fingerprints as ligand features to improve machine learning scoring functions that use protein-ligand interactions as features, and their method achieved good results. However, those ligand features were conformation independent. Therefore, we intended to integrate ligand features that can describe the 3D structures of the ligands. The descriptors are selected from two aspects: the protein-ligand interaction and the structural characteristics of the ligand. First of all, five widely used energy terms are used to describe protein-ligand interactions: van der Waals interactions, directional H-bond interactions, electrostatic interactions, desolvation potential energy and conformational entropy loss, calculated by the AMBER energy terms (eqn (1)-(5) in Table 2; the key terms are defined in Table 3) in Autodock[45] (left panel of step 2 of Fig. 1).

Table 2  Protein-ligand interaction energy terms

Energy term                        Formula
van der Waals interactions         E_vdW = Σ_{i,j} (A_pq/r_ij^12 − B_pq/r_ij^6)                       (1)
Directional H-bond interactions    E_hbond = Σ_{i,j} E(θ_ij)(C_pq/r_ij^12 − D_pq/r_ij^10 + ΔG_p,water) (2)
Electrostatic interactions         E_elec = Σ_{i,j} q_i q_j/(ε(r_ij) r_ij)                             (3)
Desolvation potential energy       E_desolv = Σ_{i,j} (S_p V_q + S_q V_p) exp(−r_ij^2/2σ^2)            (4)
Conformational entropy loss        ΔG_tor = W_tor N_tor                                                (5)

Table 3  Definitions of the key terms in eqn (1)-(5) of Table 2

Term           Explanation
p, q           Atom types of atoms i and j, respectively
A_pq, B_pq     Lennard-Jones 12-6 coefficients for non-bonded interactions between atom types p and q
r_ij           Distance between atoms i and j
C_pq, D_pq     Lennard-Jones 12-10 coefficients for hydrogen bonding between atom types p and q
E(θ_ij)        Weight dependent on the angle between i and j, with coulombic electrostatic shielding
ΔG_p,water     Free-energy change of hydrogen bonding between atom type p and water
q_i, q_j       Charges of atoms i and j
S_p            Solvation parameter for atom type p, defined as the volume change of solvating atom type p
V_q            Atomic volume of atom type q
N_tor          Number of rotatable bonds
W_tor          Weight of the conformational entropy term
Secondly, the structure vectors of the ligands are generated by Mol2vec.[46] Mol2vec is an unsupervised machine learning approach that learns vector representations of molecular substructures; a compound is encoded as a vector by summing the vectors of its individual substructures. The Mol2vec model is pretrained once, yields dense vector representations, and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. Therefore, we used the ligand structure vectors generated by Mol2vec as ligand features. The structure vectors then undergo dimension reduction by principal component analysis (PCA)[47] (right panel of step 2 of Fig. 1). Lastly, the protein-ligand interaction energy terms are combined with the dimension-reduced structure vectors of the ligands to form a combination descriptor.
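As an illustration of this step, the sketch below concatenates five energy terms with a ligand vector projected onto its first principal component. It is a pure-Python stand-in, not the published implementation: PCA is approximated by power iteration, the Mol2vec-like vectors are toy values, and a real pipeline would use Mol2vec plus e.g. sklearn.decomposition.PCA with more components.

```python
# Sketch (not the authors' code): build a combination descriptor by
# concatenating interaction energy terms with a PCA-reduced ligand vector.
import random

def pca_first_component(X, iters=200):
    """First principal axis of the rows of X via pure-Python power iteration."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    C = [[row[j] - mean[j] for j in range(d)] for row in X]   # centred data
    cov = [[sum(C[i][a] * C[i][b] for i in range(n)) / n
            for b in range(d)] for a in range(d)]             # covariance matrix
    rng = random.Random(0)
    v = [rng.random() for _ in range(d)]
    for _ in range(iters):                                    # power iteration
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
    return mean, v

def combine_descriptor(energy_terms, ligand_vec, mean, axis):
    """Energy terms + projection of the ligand vector onto the first PC."""
    centred = [ligand_vec[j] - mean[j] for j in range(len(axis))]
    score = sum(c * a for c, a in zip(centred, axis))
    return list(energy_terms) + [score]

# Toy Mol2vec-like ligand vectors and energy terms (hypothetical values).
ligands = [[0.2, 1.1, -0.3], [0.1, 0.9, -0.2], [0.4, 1.4, -0.5]]
mean, axis = pca_first_component(ligands)
feat = combine_descriptor([-6.2, -1.1, -0.4, 0.3, 0.9], ligands[0], mean, axis)
print(len(feat))  # 5 energy terms + 1 reduced ligand feature = 6
```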

Ensemble classifier construction

The ENS-VS construction process (step 3 of Fig. 1) includes the following steps (Fig. 2).
Fig. 2

The workflow of the ensemble learning in ENS-VS.

Firstly, the data set of each target is divided into a training set, a validation set and a test set in the proportion 8 : 1 : 1. The training set is used to train the model, the validation set to adjust the hyperparameters, and the test set to evaluate the performance of the model. Secondly, a number of decoy subsets, each with the same size as the set of actives, are sampled from the original decoys. Each decoy subset together with all of the actives composes a subset for training one sub-classifier, so that each subset contains part of the decoy information and all of the active information. We train the sub-classifiers on these subsets separately and combine them by bagging. Undersampling is an efficient strategy for dealing with class imbalance, but its drawback[48] is that it discards much potentially useful data. Our algorithm makes better use of the majority class than plain undersampling, because multiple subsets contain more information than a single one. To select independent, identically distributed samples, stratified sampling is used to draw the decoy subsets. Decoys are clustered by the k-means algorithm, and the number of samples drawn from each cluster is determined by the variance of that cluster (eqn (8)). When the variance of a cluster is high, the data in the cluster are sparse, so more samples must be drawn from the cluster to preserve the structural feature information of the original dataset; conversely, when the variance is low, the data in the cluster are relatively close together, so fewer samples are needed. Let μ (eqn (6)) and σ^2 (eqn (7)) denote the mean value and the variance of cluster C, respectively:
μ = (1/n) Σ_{x∈C} x (6)
σ^2 = (1/n) Σ_{x∈C} (x − μ)^2 (7)
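The stratified subsampling step can be sketched as follows. This is a simplified stand-in, not the published Matlab code: decoys are 1-D toy features, k-means is a minimal implementation, and the per-cluster quota only approximates eqn (8) by letting each cluster's share grow with its variance and size.

```python
# Sketch of the variance-weighted decoy subsampling (assumptions noted above).
import random

def kmeans_1d(points, k, iters=50, seed=0):
    """Minimal 1-D k-means; returns the non-empty clusters."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: (p - centres[c]) ** 2)
            clusters[i].append(p)
        centres = [sum(c) / len(c) if c else centres[j]
                   for j, c in enumerate(clusters)]
    return [c for c in clusters if c]

def cluster_variance(c):
    m = sum(c) / len(c)
    return sum((x - m) ** 2 for x in c) / len(c)

def sample_decoy_subset(decoys, n_actives, k=3, seed=0):
    """Draw one balanced decoy subset, more samples from high-variance clusters."""
    rng = random.Random(seed)
    clusters = kmeans_1d(decoys, k, seed=seed)
    weights = [cluster_variance(c) * len(c) for c in clusters]
    total = sum(weights) or 1.0
    subset = []
    for c, w in zip(clusters, weights):
        quota = max(1, round(n_actives * w / total))   # stand-in for eqn (8)
        subset.extend(rng.sample(c, min(quota, len(c))))
    return subset[:n_actives]

rng = random.Random(1)
decoys = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(5, 2) for _ in range(100)]
subset = sample_decoy_subset(decoys, n_actives=50)
print(len(subset))  # at most 50 decoys, matching the number of actives
```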
The number of samples to be drawn from a cluster is calculated by eqn (8), where x denotes a sample in cluster C, k denotes the number of clusters, n denotes the number of samples in cluster C, |P| denotes the total number of actives, |N| denotes the total number of decoys, and w denotes the proportion of n to |N|, namely w = n/|N|. Thirdly, three types of classifiers, SVM,[49] decision tree[39] and Fisher,[50] are trained on each training subset. The Fscore&Diff method is designed to select good and mutually different single classifiers among all the sub-classifiers. Fscore is calculated by eqn (9) and Diff by eqn (12):
Fscore = 2 × precision × recall/(precision + recall) (9)
precision = TP/(TP + FP) (10)
recall = TP/P (11)
Diff_i = (1/(|Θ| − 1)) Σ_{j∈Θ, j≠i} r_ij (12)
where TP is the number of predicted true positives, FP is the number of predicted false positives, P is the number of positives, r_ij is the Pearson correlation coefficient between the results predicted by classifier i and classifier j, and Θ denotes the set of all classifiers. The Fscore&Diff method selects the sub-classifiers whose Fscore is greater than, and whose Diff is less than, the corresponding average over all the sub-classifiers. Finally, all selected classifiers are combined by the weighted average method for the final decision. The weight of each classifier is computed from the error rate ε_i of the i-th sub-classifier, so that sub-classifiers with lower error receive higher weights. The parameters of the sub-classifiers are set as follows: the SVM classifier uses a linear kernel, and the decision tree and Fisher classifiers use default parameters. The hyperparameter to be tuned is the number of subsets. We implemented this method in Matlab 2014a.[25] The core algorithm of ENS-VS is listed in Table 4.
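The Fscore&Diff selection and the final weighted vote can be sketched in a few lines. This is an illustrative re-implementation under assumptions, not the authors' Matlab code: sub-classifier outputs are given directly as ±1 prediction lists, selection falls back to all sub-classifiers if none passes both thresholds, and the weight is a simple 1 − error stand-in for the paper's error-rate-based weight.

```python
# Illustrative Fscore&Diff selection + weighted vote (assumptions noted above).

def fscore(pred, truth):
    """Eqn (9): harmonic mean of precision (eqn (10)) and recall (eqn (11))."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == -1)
    pos = sum(1 for t in truth if t == 1)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / pos if pos else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb) if va and vb else 0.0

def select_and_vote(preds, truth):
    """Keep sub-classifiers with above-average Fscore and below-average Diff,
    then combine the survivors by error-weighted voting."""
    fs = [fscore(p, truth) for p in preds]
    diffs = [sum(pearson(p, q) for q in preds if q is not p)
             / max(1, len(preds) - 1) for p in preds]          # eqn (12)
    mf, md = sum(fs) / len(fs), sum(diffs) / len(diffs)
    kept = [p for p, f, d in zip(preds, fs, diffs)
            if f >= mf and d <= md] or preds
    weights = [1.0 - sum(1 for x, t in zip(p, truth) if x != t) / len(truth)
               for p in kept]                                  # 1 - error rate
    return [1 if sum(w * p[i] for w, p in zip(weights, kept)) >= 0 else -1
            for i in range(len(truth))]

truth = [1, 1, -1, -1, 1, -1]
preds = [[1, 1, -1, -1, -1, -1],     # accurate, errs on sample 4
         [1, -1, -1, -1, 1, -1],     # accurate, errs on sample 1
         [1, 1, 1, -1, 1, 1]]        # weaker, over-predicts actives
print(select_and_vote(preds, truth))  # → [1, 1, -1, -1, 1, -1]
```

Here the two accurate but different sub-classifiers survive selection, and their weighted vote corrects both of their individual errors.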

Core algorithm of ENS-VS

Evaluation metrics

The Receiver Operating Characteristic curve (ROC), Area Under the Curve (AUC), Matthews correlation coefficient (MCC), and the enrichment factor (EF) 1% and EF 10% values were used to evaluate the performance of this method. The ROC curve is used to visualize the performance of a classifier. AUC represents the probability that a randomly chosen positive sample is ranked higher than a randomly chosen negative sample. MCC is used in machine learning as a measure of the quality of binary (two-class) classifications. It takes into account true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN), and is generally regarded as a balanced measure that can be used even when the classes are of very different sizes. This value is calculated by eqn (17):
MCC = (TP × TN − FP × FN)/√((TP + FP)(TP + FN)(TN + FP)(TN + FN)) (17)
EF values are commonly used as accuracy metrics in machine learning studies. The EF x% value is defined as the ratio of the predicted hit rate to the random hit rate when the top x% of ranked compounds are selected as actives. This value is calculated by eqn (18):
EF x% = (actives in top x%/compounds in top x%)/(total actives/total compounds) (18)
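The two headline metrics can be written directly from the definitions above. A minimal sketch (labels: 1 = active, 0 = decoy; higher score = predicted more active), not tied to the authors' implementation:

```python
# MCC (eqn (17)) and EF x% (eqn (18)) from their textbook definitions.

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from the confusion-matrix counts."""
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return (tp * tn - fp * fn) / denom if denom else 0.0

def enrichment_factor(scores, labels, x=0.01):
    """Ratio of the hit rate in the top x% of the ranking to the random hit rate."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_top = max(1, int(round(len(ranked) * x)))
    hit_rate_top = sum(lbl for _, lbl in ranked[:n_top]) / n_top
    random_rate = sum(labels) / len(labels)
    return hit_rate_top / random_rate if random_rate else 0.0

# 100 compounds, 10 actives; a perfect ranking puts an active at the very top,
# so EF 1% reaches its maximum of 1.0 / 0.1 = 10.
labels = [1] * 10 + [0] * 90
scores = [float(100 - i) for i in range(100)]
print(enrichment_factor(scores, labels, 0.01))  # → 10.0
```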

Results and discussion

To identify the factors that contribute to the performance of ENS-VS, we designed three comparison tests based on the 12 datasets in Table 1. MCC and AUC were used as the evaluation metrics, and the Mann-Whitney U test was used to test significance. First, we used the protein-ligand interaction descriptor alone instead of the combination descriptor; this comparison model is denoted ComModel1. The MCC and AUC results for the 12 targets are presented in Fig. 3. The MCC and AUC of ComModel1 were lower than those of ENS-VS for all 12 targets. The mean MCC and mean AUC of ComModel1 (MCC = 0.121, AUC = 0.836) were both statistically significantly lower than those of ENS-VS (MCC = 0.82, AUC = 0.989), with p < 0.05 (Fig. S1†). This indicates that the combination descriptor selected by ENS-VS is effective in improving the performance of the VS model.
Fig. 3

The MCC and AUC of ComModel1 and ENS-VS for 12 targets.

Second, ENS-VS was modified to undersample the decoys only once; this comparison model is denoted ComModel2. The results are presented in Fig. 4. The MCC and AUC of ComModel2 were lower than those of ENS-VS for every target. The mean MCC and AUC of ENS-VS (MCC = 0.82, AUC = 0.989) were statistically significantly better than those of ComModel2 (MCC = 0.44, AUC = 0.973), with p < 0.05 (Fig. S2†). This shows that the treatment of data imbalance in this study is effective in improving the predictive performance of the VS model.
Fig. 4

The MCC and AUC of ComModel2 and ENS-VS for 12 targets.

Third, the three types of classifiers in ENS-VS were replaced by a single type of classifier, SVM, decision tree or Fisher, denoted ComModel3_SVM, ComModel3_Dtree and ComModel3_Fisher, respectively. The mean MCC of ENS-VS was statistically significantly higher than those of ComModel3_SVM, ComModel3_Dtree and ComModel3_Fisher (MCC: ENS-VS = 0.82, ComModel3_SVM = 0.75, ComModel3_Dtree = 0.60, ComModel3_Fisher = 0.60), with p < 0.05 (Fig. 5). The AUC of ENS-VS was statistically significantly higher than those of ComModel3_SVM and ComModel3_Dtree, and showed no significant difference from ComModel3_Fisher (AUC: ENS-VS = 0.989, ComModel3_SVM = 0.984, ComModel3_Dtree = 0.978, ComModel3_Fisher = 0.99). These results show that integrating three types of classifiers effectively improves the predictive performance of the VS model.
Fig. 5

The MCC and AUC of ComModel3_SVM, ComModel3_Dtree, ComModel3_Fisher and ENS-VS for 12 targets.

Next, we compared ENS-VS with Autodock vina,[11] because we used Vina to generate the poses of the ligands in ENS-VS. The EF and AUC results for the diverse subsets of DUD-E are shown in Table 5. The ROC curves are shown in Fig. 6. The EF 1% and EF 10% results for ENS-VS were both improved significantly for all twelve targets. On average, the EF 1% for ENS-VS was 6 times higher than that for Vina, which indicated that 6 times more active compounds were found by ENS-VS than by Vina on average when the top 1% ranked compounds were biologically assayed for these target proteins. The ROC curve of ENS-VS was very close to the upper left corner for each target, which means that the classifier is effective.

Table 5  Comparison of EF 1%, EF 10% and AUC results between ENS-VS and Autodock vina for 12 targets^a

          EF 1%              EF 10%            AUC
Target    Vina     ENS-VS    Vina    ENS-VS    Vina     ENS-VS
try1      12.71    58        4.59    9.78      0.786    0.974
thrb      3.9      53.7      3.75    9.99      0.798    0.998
bace1     4.94     59.5      2.97    9.3       0.713    0.975
esr1      18.23    53        4.49    9.73      0.801    0.986
ppara     6.7      51        5.6     9.99      0.871    0.999
src       3.8      55.44     2.02    9.8       0.647    0.988
egfr      3.53     65        2.04    9.81      0.634    0.998
vgfr2     9.06     61        3.42    10        0.714    0.998
aa2ar     2.08     62.97     1.68    9.58      0.616    0.977
adrb1     3.23     64        2.47    10        0.717    0.999
hivrt     4.46     56        2.23    10.02     0.654    0.999
pgh2      24.44    46.09     5.1     9.32      0.75     0.974
Average   8.09     57.14     3.36    9.78      0.725    0.989

a The bold (in the original) marks the better value between the two methods.

Fig. 6

ROC curve comparing the performance of Autodock vina (blue line) and that of the ENS-VS (red line) at discriminating actives from decoys for 12 targets. Random performance is indicated by the black line.

We also compared ENS-VS with RF-Score-VS_v3_vina[22] and SIEVE-Score.[51] RF-Score-VS[20-22] is a state-of-the-art machine learning-based scoring function, and RF-Score-VS_v3_vina is its latest version with docking pose generation by Vina. SIEVE-Score is a recent virtual screening method that has been shown to outperform three versions of RF-Score-VS. Fig. 7 shows boxplots for ENS-VS, Autodock vina, RF-Score-VS_v3_vina and SIEVE-Score on the 37 targets; the results for RF-Score-VS and SIEVE-Score are taken from the original SIEVE-Score paper.[51] Each boxplot shows the EF 1% results on the DUD-E datasets. The EF 1% of RF-Score-VS_v3_vina was higher than that of Vina and lower than that of SIEVE-Score, but ENS-VS achieved the best EF 1% among the four methods. Fig. 8 presents a scatter plot of the EF 1% results for ENS-VS vs. SIEVE-Score, where each point represents a target. ENS-VS achieved better predictions for 30 of the 37 DUD-E targets and was tied with SIEVE-Score for the remaining seven. The overall EF 1% of ENS-VS for all 37 targets was significantly higher than that of SIEVE-Score (mean EF 1%: ENS-VS = 52.77, SIEVE-Score = 42.64), with p < 0.05. Similarly, the overall EF 10% (mean EF 10%: ENS-VS = 9.72, SIEVE-Score = 7.66) and AUC (mean AUC: ENS-VS = 0.982, SIEVE-Score = 0.912) were also significantly higher (Fig. S3†).
Fig. 7

Comparison among the results of ENS-VS, RF-Score-VS_v3_vina, SIEVE-Score and Autodock vina. Each boxplot shows the EF 1% values for the 37 target proteins in DUD-E as obtained with the given method.

Fig. 8

Scatter plot of the EF 1% results of ENS-VS and SIEVE-Score. Each point corresponds to the results for one target protein in the DUD-E dataset. The dotted line represents identical results.

We further compared our method with recent similar research. The selected methods are as follows. Refmodel1: Yan et al.[13] developed a classification model (PLEIC-SVM) with protein-ligand empirical interaction components as descriptors. Refmodel2: Ragoza et al.[16] proposed a neural network for protein-ligand scoring consisting of three convolutional layers; they scored all docked poses with a single, universal model and took the maximum as the final score. Refmodel3: Fergus et al.[34] coupled a densely connected CNN with a transfer learning approach to produce an ensemble of protein family-specific models. Refmodel4: Janaina et al.[33] proposed a deep learning approach to improve docking-based virtual screening, which outperformed 25 other docking methods in both ROC AUC and enrichment factor on the DUD datasets. Excluding the try1 dataset, the AUC of ENS-VS is the highest of the five methods for the other eleven targets, and the standard deviation of ENS-VS is the lowest (Table 6), which suggests that ENS-VS performs better than the other four methods and is strongly robust.

Table 6  AUC of four reference methods and ENS-VS^a

Target    Refmodel1   Refmodel2   Refmodel3   Refmodel4   ENS-VS
try1      0.95        0.953       0.996       —           0.974
thrb      0.95        0.924       0.978       —           0.998
bace1     0.91        0.808       0.930       —           0.975
esr1      0.97        0.930       0.951       —           0.986
ppara     0.92        0.874       0.988       0.90        0.999
src       0.93        0.950       0.986       0.85        0.988
egfr      0.93        0.966       0.985       0.86        0.998
vgfr2     0.95        0.967       0.993       0.90        0.998
aa2ar     0.95        0.941       0.908       0.77        0.977
adrb1     0.95        0.876       0.947       —           0.999
hivrt     0.89        0.734       0.768       0.88        0.999
pgh2      0.90        0.840       0.877       —           0.974
Average   0.933       0.897       0.942       0.737       0.989
SD        0.024       0.073       0.066       0.049       0.011

a The bold (in the original) marks the best value among the five models; "—" indicates no reported value.

We also used the DEKOIS 2.0 database as independent test sets and ran the test with Vina, Glide, SIEVE-Score, RF-Score-VS_v3_vina and ENS-VS; the methodology is described in more detail in the Methods section. The EF 1%, EF 10% and AUC of the five methods are shown in Table 7. Except for adrb2 and the EF 10% of fa10, ENS-VS outperformed the other four methods on all metrics, and its mean EF 1%, EF 10% and AUC are the best among the five methods. Therefore, ENS-VS performs better than Vina, Glide, SIEVE-Score and RF-Score-VS_v3_vina on the DEKOIS test sets. The mean EF 1%, EF 10% and AUC of ENS-VS on the DEKOIS test sets are all lower than on the DUD-E test sets; this may be partly because the ligand structural similarity between the training and test sets of DUD-E is higher than that between the DEKOIS test sets and the DUD-E training sets.

Table 7  EF 1%, EF 10% and AUC results of Vina, Glide, SIEVE-Score, RF-Score-VS_v3_vina and ENS-VS for eight protein targets of the DEKOIS 2.0 dataset^a

EF 1%
Target    Vina    Glide   SIEVE-Score   RF-Score-VS_v3_vina   ENS-VS
aa2ar     0       7.8     34.7          16.5                  42.9
aces      8.6     16.9    30.1          24.6                  33.4
adrb2     4.8     6.7     32.6          17.9                  28.7
akt1      7.5     13.6    27.4          22.5                  30.4
fa10      5.3     16.5    28.8          16.8                  35.7
egfr      0       11.2    12.6          10.8                  18.8
hivrt     0       7.5     17.5          11.3                  22.3
ppara     0       5.7     20.8          9.7                   25.6
Average   3.28    10.7    25.56         16.27                 29.73

EF 10%
Target    Vina    Glide   SIEVE-Score   RF-Score-VS_v3_vina   ENS-VS
aa2ar     1.2     1.4     8.7           5.6                   9.0
aces      3.4     7.0     6.5           5.8                   8.5
adrb2     2.7     2.8     9.8           4.7                   8.0
akt1      3.8     1.5     5.0           4.8                   6.7
fa10      2.2     5.8     6.4           3.5                   6.0
egfr      1.6     4.0     3.5           2.1                   5.8
hivrt     1.8     1.5     6.8           2.8                   8.3
ppara     2.0     3.2     7.3           3.9                   8.0
Average   2.34    3.4     6.75          4.15                  7.54

AUC
Target    Vina     Glide    SIEVE-Score   RF-Score-VS_v3_vina   ENS-VS
aa2ar     0.744    0.758    0.824         0.805                 0.895
aces      0.721    0.81     0.805         0.758                 0.827
adrb2     0.698    0.715    0.819         0.724                 0.798
akt1      0.675    0.644    0.753         0.712                 0.802
fa10      0.758    0.792    0.842         0.776                 0.855
egfr      0.642    0.704    0.696         0.677                 0.724
hivrt     0.607    0.592    0.652         0.628                 0.695
ppara     0.690    0.698    0.727         0.701                 0.746
Average   0.692    0.714    0.765         0.723                 0.793

a The bold (in the original) marks the best value among the five methods.

ENS-VS succeeds in improving virtual screening accuracy for several reasons. First, the combination descriptor effectively describes both the protein-ligand interactions and the structural characteristics of the ligands. After PCA dimension reduction, the number of descriptors is moderate, so the combination descriptor not only improves the performance of the model but also prevents overfitting. Second, to address the severe imbalance of the dataset, which was often ignored in previous studies, we designed an ensemble-learning mechanism for sampling the decoys: several subsets of decoys, each with the same size as the actives, are drawn from the original decoys by stratified sampling; each decoy subset together with all of the actives forms a training subset for one sub-classifier, and the final result is decided by all the sub-classifiers. In this way the decoys are under-sampled within each sub-classifier, yet the important information in the decoys is not lost overall. Third, to address the problem that a single machine learning method is not suitable for most targets, ENS-VS integrates several classifiers (SVM, decision tree and Fisher) to increase diversity and adaptively selects suitable classifiers for different targets with the Fscore&Diff method; combining the advantages of the three types of classifiers improves performance and enhances the robustness of the model across targets. From this analysis we conclude that the performance improvement of ENS-VS stems from the selection of descriptors, the imbalanced-data processing measure and the ensemble learning method. Autodock vina is a generic scoring function, which has the advantage of being applicable to any target without retraining; this is not the case for the better-performing target-specific scoring functions.
The hit rate is low when Vina alone is used for virtual screening,[52,53] but applying ENS-VS after pose generation by Vina improves the accuracy of virtual screening significantly. Another advantage of ENS-VS is that it can be combined with docking software other than Autodock vina to improve their virtual screening performance. However, because the method is based on ensemble learning, it increases the running time; in future work we will therefore investigate a parallel implementation of ENS-VS to improve its execution speed.

Conclusion

In this study, we developed a target-specific virtual screening method called ENS-VS to improve the accuracy of structure-based virtual screening. A combination descriptor of protein-ligand interaction energy terms and ligand structure vectors is used; a processing measure for the data imbalance problem is designed; and SVM, decision tree and Fisher classifiers are integrated in ENS-VS. We performed comprehensive comparisons of this method with several state-of-the-art methods, namely Autodock vina, Glide, RF-Score-VS and SIEVE-Score. ENS-VS achieved a significant improvement in screening accuracy across different target proteins in the DUD-E and DEKOIS 2.0 benchmark databases in terms of EF 1%, EF 10% and the AUC of the ROC curves. Moreover, ENS-VS can be used in combination with any docking software to improve its virtual screening performance.

Conflicts of interest

There are no conflicts to declare.