Ben Hu, Zheng-Kun Kuang, Shi-Yu Feng, Dong Wang, Song-Bing He, De-Xin Kong.
Abstract
The crystallized ligands in the Protein Data Bank (PDB) can be treated as the inverse shapes of the active sites of the corresponding proteins. Therefore, the shape similarity between a molecule and PDB ligands indicates the likelihood that the molecule binds to the corresponding targets. In this paper, we propose a shape similarity profile that can be used as a molecular descriptor for ligand-based virtual screening. First, through three-dimensional (3D) structural clustering, 300 diverse ligands were extracted from the druggable protein-ligand database, sc-PDB. Then, each molecule under scrutiny was flexibly superimposed onto the 300 ligands. Superimpositions were scored by shape overlap and property similarity, producing a 300-dimensional similarity array termed the "Three-Dimensional Biologically Relevant Spectrum (BRS-3D)". Finally, quantitative or discriminant models were developed with the 300-dimensional descriptor using a machine learning method (support vector machine, SVM). The effectiveness of this approach was evaluated using 42 benchmark data sets from the G protein-coupled receptor (GPCR) ligand library and the GPCR decoy database (GLL/GDD). We compared the performance of BRS-3D with that of other state-of-the-art 2D and 3D molecular descriptors. The results showed that models built with BRS-3D performed best for most GLL/GDD data sets. We also applied BRS-3D to histone deacetylase 1 inhibitor screening and GPCR subtype selectivity prediction. The advantages and disadvantages of this approach are discussed.
Keywords: BRS-3D; QSAR; SVM; ligand-based virtual screening; molecular similarity profile; subtype selectivity
Year: 2016 PMID: 27869685 PMCID: PMC6273508 DOI: 10.3390/molecules21111554
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Figure 1. Physicochemical properties of BRCD-3D (3D Biologically-relevant Representative Compound Database) ligands and classifications of their corresponding targets. (A–F) Property distributions of the ligands, calculated with Pipeline Pilot 8.5. MW: molecular weight; AlogP: the octanol-water partition coefficient; HBAs: the number of hydrogen bond acceptors; HBDs: the number of hydrogen bond donors; PSA: polar surface area; RBs: the number of rotatable bonds; (G) Pie chart of the enzyme types of the targets. Details are shown in Supplementary Materials Table S3; (H) Pie chart of the SCOP classification of the targets. Entries without SCOP annotations are not taken into account. Details are shown in Supplementary Materials Table S4.
Figure 2. Comparison of the three methods for handling data imbalance. Red circles denote results of models built on data sets with a ligand:decoy ratio of 1:10. Purple triangles denote results of models built on data sets with the original ratio (1:39). Blue diamonds denote results of models trained with different weights for the ligand class and the decoy class in the SVM.
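The imbalance treatments compared in Figure 2 can be illustrated with a short sketch. This is not the authors' code: the function names are hypothetical, and the weight formula is the common "balanced" heuristic (total samples divided by the number of classes times the class size), which is an assumption because the class weights actually used are not stated here. The counts are those of the 5HT1A agonist data set (952 ligands, 37,128 decoys).

```python
import random

def balanced_class_weights(n_ligands, n_decoys):
    """Weight each class inversely to its size.

    Assumed heuristic for illustration; not the paper's stated weights.
    """
    total = n_ligands + n_decoys
    return {
        "ligand": total / (2 * n_ligands),
        "decoy": total / (2 * n_decoys),
    }

def undersample_decoys(decoys, n_ligands, ratio=10, seed=0):
    """Randomly keep `ratio` decoys per ligand (the 1:10 treatment)."""
    rng = random.Random(seed)
    keep = min(len(decoys), ratio * n_ligands)
    return rng.sample(decoys, keep)

n_ligands, n_decoys = 952, 37128

# Original ratio: decoys per ligand.
original_ratio = n_decoys / n_ligands

# "Weighted" treatment: keep all data, penalize ligand errors more heavily.
weights = balanced_class_weights(n_ligands, n_decoys)

# "1:10" treatment: discard decoys at random until 10 remain per ligand.
decoy_ids = list(range(n_decoys))  # placeholder identifiers
subset = undersample_decoys(decoy_ids, n_ligands)

print(original_ratio, weights, len(subset))
```

Weights of this kind could be passed to an SVM implementation that accepts per-class weights, for example through the `class_weight` argument of scikit-learn's `SVC`.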
Results of the discriminant models for the 42 GLL/GDD (G protein-coupled receptor (GPCR) ligand library and the GPCR decoy database) data sets. Models were built with the “weighted” method for handling the data imbalance. The CV AUC is a 10-fold cross-validation result of the training set. Accuracy, Precision, Recall, and MCC are the prediction results for the test set. Results of the other two treatments for data imbalance can be found in Supplementary Materials Table S6.
| No. | Data Sets | CV AUC | Accuracy | Precision | Recall | MCC |
|---|---|---|---|---|---|---|
| 1 | 5HT1A_Agonist | 0.989 | 0.994 | 0.986 | 0.763 | 0.865 |
| 2 | 5HT1A_Antagonist | 0.975 | 0.992 | 0.888 | 0.782 | 0.829 |
| 3 | 5HT1D_Agonist | 0.988 | 0.993 | 1.000 | 0.703 | 0.835 |
| 4 | 5HT1D_Antagonist | 0.980 | 0.995 | 0.981 | 0.825 | 0.898 |
| 5 | 5HT2A_Antagonist | 0.981 | 0.992 | 0.894 | 0.759 | 0.820 |
| 6 | 5HT2C_Agonist | 0.983 | 0.986 | 0.721 | 0.738 | 0.722 |
| 7 | 5HT2C_Antagonist | 0.957 | 0.991 | 1.000 | 0.625 | 0.787 |
| 8 | 5HT4R_Agonist | 0.992 | 0.991 | 1.000 | 0.638 | 0.795 |
| 9 | 5HT4R_Antagonist | 0.993 | 0.997 | 1.000 | 0.875 | 0.933 |
| 10 | AA1R_Antagonist | 0.986 | 0.992 | 0.894 | 0.750 | 0.814 |
| 11 | AA2AR_Antagonist | 0.985 | 0.995 | 0.983 | 0.808 | 0.889 |
| 12 | AA2BR_Antagonist | 0.984 | 0.993 | 0.894 | 0.797 | 0.841 |
| 13 | ACM1_Agonist | 0.985 | 0.992 | 0.851 | 0.820 | 0.831 |
| 14 | ACM3_Antagonist | 0.983 | 0.991 | 0.930 | 0.678 | 0.790 |
| 15 | ADA1A_Antagonist | 0.983 | 0.993 | 0.968 | 0.763 | 0.856 |
| 16 | ADA1B_Antagonist | 0.988 | 0.994 | 0.889 | 0.873 | 0.878 |
| 17 | ADA1D_Antagonist | 0.987 | 0.994 | 0.948 | 0.807 | 0.872 |
| 18 | ADA2A_Antagonist | 0.953 | 0.991 | 0.983 | 0.648 | 0.794 |
| 19 | ADA2B_Antagonist | 0.959 | 0.979 | 0.562 | 0.839 | 0.677 |
| 20 | ADA2C_Antagonist | 0.961 | 0.992 | 0.967 | 0.686 | 0.811 |
| 21 | ADRB1_Agonist | 0.995 | 0.992 | 0.912 | 0.738 | 0.816 |
| 22 | ADRB1_Antagonist | 0.986 | 0.991 | 0.964 | 0.643 | 0.783 |
| 23 | ADRB2_Agonist | 0.992 | 0.996 | 0.904 | 0.927 | 0.914 |
| 24 | ADRB2_Antagonist | 0.990 | 0.995 | 0.971 | 0.829 | 0.895 |
| 25 | ADRB3_Agonist | 0.994 | 0.996 | 0.982 | 0.860 | 0.917 |
| 26 | AG2R_Antagonist | 0.996 | 0.998 | 0.996 | 0.907 | 0.949 |
| 27 | CCKAR_Antagonist | 0.986 | 0.993 | 1.000 | 0.722 | 0.847 |
| 28 | CLTR1_Antagonist | 0.981 | 0.992 | 0.979 | 0.701 | 0.825 |
| 29 | DRD2_Antagonist | 0.977 | 0.992 | 0.951 | 0.726 | 0.827 |
| 30 | DRD3_Antagonist | 0.982 | 0.993 | 0.941 | 0.750 | 0.837 |
| 31 | DRD4_Antagonist | 0.993 | 0.995 | 0.982 | 0.827 | 0.899 |
| 32 | EDNRA_Antagonist | 0.987 | 0.994 | 0.932 | 0.809 | 0.865 |
| 33 | EDNRB_Antagonist | 0.986 | 0.993 | 0.902 | 0.814 | 0.853 |
| 34 | GASR_Antagonist | 0.990 | 0.995 | 0.979 | 0.816 | 0.891 |
| 35 | HRH3_Antagonist | 0.997 | 0.992 | 0.958 | 0.730 | 0.833 |
| 36 | LSHR_Antagonist | 0.990 | 0.989 | 1.000 | 0.543 | 0.733 |
| 37 | NK1R_Antagonist | 0.980 | 0.991 | 0.914 | 0.711 | 0.802 |
| 38 | OPRD_Agonist | 0.990 | 0.993 | 1.000 | 0.722 | 0.847 |
| 39 | OPRK_Agonist | 0.990 | 0.990 | 1.000 | 0.596 | 0.768 |
| 40 | TA2R_Antagonist | 0.991 | 0.994 | 0.974 | 0.772 | 0.864 |
| 41 | V1AR_Antagonist | 0.986 | 0.993 | 0.971 | 0.733 | 0.840 |
| 42 | V1BR_Antagonist | 0.983 | 0.992 | 0.969 | 0.689 | 0.813 |
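The test-set metrics in this table follow from the confusion matrix. The sketch below uses the standard definitions of accuracy, precision, recall, and the Matthews correlation coefficient (MCC); the counts are hypothetical and only mimic a 1:39 ligand:decoy ratio. It shows why accuracy alone says little on such imbalanced data, while MCC penalizes a model that misses ligands.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "mcc": mcc,
    }

# Hypothetical test set: 100 ligands and 3900 decoys (ratio 1:39).
m = classification_metrics(tp=76, fp=1, tn=3899, fn=24)

# A model that labels everything as a decoy still reaches high accuracy.
trivial = classification_metrics(tp=0, fp=0, tn=3900, fn=100)

for name in ("accuracy", "precision", "recall", "mcc"):
    print(f"{name}: model={m[name]:.3f} all-decoy={trivial[name]:.3f}")
```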
Figure 3. Comparison of SVM models using different BRS-3D feature subsets. For most data sets, prediction performance improved as the number of features increased.
SVM models based on Dragon 2D, MOE 3D, and BRS-3D descriptors.
| Data Set No. | Accuracy (Dragon 2D) | Accuracy (MOE 3D) | Accuracy (BRS-3D) | Precision (Dragon 2D) | Precision (MOE 3D) | Precision (BRS-3D) | Recall (Dragon 2D) | Recall (MOE 3D) | Recall (BRS-3D) | MCC (Dragon 2D) | MCC (MOE 3D) | MCC (BRS-3D) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.993 | 0.951 | 0.994 | 0.849 | 0.330 | 0.986 | 0.889 | 0.932 | 0.763 | 0.866 | 0.539 | 0.865 |
| 2 | 0.992 | 0.991 | 0.992 | 0.819 | 0.814 | 0.888 | 0.851 | 0.822 | 0.782 | 0.831 | 0.813 | 0.829 |
| 3 | 0.987 | 0.977 | 0.993 | 0.671 | 0.526 | 1.000 | 0.919 | 0.919 | 0.703 | 0.779 | 0.686 | 0.835 |
| 4 | 0.981 | 0.992 | 0.995 | 0.568 | 0.831 | 0.981 | 0.921 | 0.857 | 0.825 | 0.715 | 0.840 | 0.898 |
| 5 | 0.964 | 0.945 | 0.992 | 0.401 | 0.300 | 0.894 | 0.883 | 0.903 | 0.759 | 0.581 | 0.502 | 0.820 |
| 6 | 0.987 | 0.980 | 0.986 | 0.744 | 0.571 | 0.721 | 0.762 | 0.857 | 0.738 | 0.747 | 0.691 | 0.722 |
| 7 | 0.983 | 0.949 | 0.991 | 0.636 | 0.317 | 1.000 | 0.766 | 0.906 | 0.625 | 0.690 | 0.519 | 0.787 |
| 8 | 0.998 | 0.979 | 0.991 | 0.939 | 0.541 | 1.000 | 0.979 | 0.979 | 0.638 | 0.957 | 0.719 | 0.795 |
| 9 | 0.995 | 0.994 | 0.997 | 0.855 | 0.863 | 1.000 | 0.979 | 0.917 | 0.875 | 0.912 | 0.886 | 0.933 |
| 10 | 0.990 | 0.986 | 0.992 | 0.774 | 0.705 | 0.894 | 0.857 | 0.769 | 0.750 | 0.810 | 0.729 | 0.814 |
| 11 | 0.989 | 0.982 | 0.995 | 0.756 | 0.602 | 0.983 | 0.808 | 0.808 | 0.808 | 0.776 | 0.689 | 0.889 |
| 12 | 0.987 | 0.978 | 0.993 | 0.794 | 0.539 | 0.894 | 0.676 | 0.865 | 0.797 | 0.726 | 0.672 | 0.841 |
| 13 | 0.986 | 0.962 | 0.992 | 0.662 | 0.383 | 0.851 | 0.888 | 0.882 | 0.820 | 0.760 | 0.567 | 0.831 |
| 14 | 0.988 | 0.982 | 0.991 | 0.714 | 0.605 | 0.930 | 0.847 | 0.831 | 0.678 | 0.772 | 0.700 | 0.790 |
| 15 | 0.980 | 0.960 | 0.993 | 0.557 | 0.373 | 0.968 | 0.915 | 0.881 | 0.763 | 0.705 | 0.558 | 0.856 |
| 16 | 0.990 | 0.955 | 0.994 | 0.758 | 0.349 | 0.889 | 0.882 | 0.927 | 0.873 | 0.812 | 0.554 | 0.878 |
| 17 | 0.977 | 0.955 | 0.994 | 0.528 | 0.347 | 0.948 | 0.904 | 0.904 | 0.807 | 0.681 | 0.544 | 0.872 |
| 18 | 0.957 | 0.938 | 0.991 | 0.359 | 0.270 | 0.983 | 0.909 | 0.864 | 0.648 | 0.556 | 0.462 | 0.794 |
| 19 | 0.973 | 0.982 | 0.979 | 0.471 | 0.632 | 0.562 | 0.736 | 0.690 | 0.839 | 0.576 | 0.651 | 0.677 |
| 20 | 0.955 | 0.978 | 0.992 | 0.335 | 0.542 | 0.967 | 0.837 | 0.674 | 0.686 | 0.513 | 0.593 | 0.811 |
| 21 | 0.986 | 0.974 | 0.992 | 0.655 | 0.494 | 0.912 | 0.905 | 0.905 | 0.738 | 0.763 | 0.658 | 0.816 |
| 22 | 0.989 | 0.977 | 0.991 | 0.702 | 0.522 | 0.964 | 0.952 | 0.857 | 0.643 | 0.812 | 0.659 | 0.783 |
| 23 | 0.995 | 0.987 | 0.996 | 0.900 | 0.694 | 0.904 | 0.878 | 0.829 | 0.927 | 0.886 | 0.752 | 0.914 |
| 24 | 0.974 | 0.971 | 0.995 | 0.452 | 0.429 | 0.971 | 0.917 | 0.917 | 0.829 | 0.633 | 0.616 | 0.895 |
| 25 | 0.990 | 0.986 | 0.996 | 0.738 | 0.665 | 0.982 | 0.938 | 0.907 | 0.860 | 0.827 | 0.770 | 0.917 |
| 26 | 0.996 | 0.994 | 0.998 | 0.877 | 0.843 | 0.996 | 0.967 | 0.927 | 0.907 | 0.918 | 0.881 | 0.949 |
| 27 | 0.992 | 0.972 | 0.993 | 0.857 | 0.467 | 1.000 | 0.833 | 0.875 | 0.722 | 0.841 | 0.627 | 0.847 |
| 28 | 0.994 | 0.968 | 0.992 | 0.857 | 0.441 | 0.979 | 0.896 | 0.940 | 0.701 | 0.873 | 0.632 | 0.825 |
| 29 | 0.964 | 0.987 | 0.992 | 0.403 | 0.699 | 0.951 | 0.925 | 0.811 | 0.726 | 0.597 | 0.746 | 0.827 |
| 30 | 0.948 | 0.975 | 0.993 | 0.313 | 0.500 | 0.941 | 0.891 | 0.734 | 0.750 | 0.510 | 0.594 | 0.837 |
| 31 | 0.981 | 0.975 | 0.995 | 0.575 | 0.502 | 0.982 | 0.955 | 0.955 | 0.827 | 0.733 | 0.683 | 0.899 |
| 32 | 0.994 | 0.951 | 0.994 | 0.864 | 0.329 | 0.932 | 0.890 | 0.912 | 0.809 | 0.874 | 0.531 | 0.865 |
| 33 | 0.990 | 0.975 | 0.993 | 0.758 | 0.505 | 0.902 | 0.858 | 0.912 | 0.814 | 0.801 | 0.668 | 0.853 |
| 34 | 0.994 | 0.981 | 0.995 | 0.861 | 0.582 | 0.979 | 0.921 | 0.904 | 0.816 | 0.887 | 0.717 | 0.891 |
| 35 | 0.977 | 0.988 | 0.992 | 0.524 | 0.736 | 0.958 | 0.857 | 0.841 | 0.730 | 0.660 | 0.781 | 0.833 |
| 36 | 0.986 | 0.985 | 0.989 | 0.733 | 0.744 | 1.000 | 0.717 | 0.630 | 0.543 | 0.718 | 0.677 | 0.733 |
| 37 | 0.992 | 0.992 | 0.991 | 0.805 | 0.830 | 0.914 | 0.872 | 0.839 | 0.711 | 0.834 | 0.830 | 0.802 |
| 38 | 0.994 | 0.946 | 0.993 | 0.867 | 0.307 | 1.000 | 0.903 | 0.917 | 0.722 | 0.882 | 0.513 | 0.847 |
| 39 | 0.995 | 0.935 | 0.990 | 0.869 | 0.251 | 1.000 | 0.930 | 0.807 | 0.596 | 0.896 | 0.428 | 0.768 |
| 40 | 0.973 | 0.988 | 0.994 | 0.479 | 0.721 | 0.974 | 0.945 | 0.876 | 0.772 | 0.662 | 0.789 | 0.864 |
| 41 | 0.994 | 0.987 | 0.993 | 0.870 | 0.800 | 0.971 | 0.889 | 0.622 | 0.733 | 0.876 | 0.699 | 0.840 |
| 42 | 0.992 | 0.969 | 0.992 | 0.768 | 0.435 | 0.969 | 0.956 | 0.822 | 0.689 | 0.853 | 0.585 | 0.813 |
Figure 4. MCC values of SVM models based on BRS-3D and two other state-of-the-art descriptors.
Figure 5. The CB1/CB2 subtype selectivity prediction models. (A) Cross-validation Q² and test set R² of the regression models with different feature subsets; (B) cross-validation and test set RMSE of the regression models with different feature subsets; (C) relationship between experimental and predicted SR of the model with 20% BRS-3D features; (D) discriminant models with different feature subsets; and (E,F) distribution of the selective compounds in the chemical space, composed of the most important features.
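The regression statistics named in panels A and B can be computed as sketched below. The sketch assumes the conventional definitions: the coefficient of determination, 1 minus the residual sum of squares over the total sum of squares, and the root-mean-square error. Cross-validation Q² is then the same formula applied to predictions made for held-out compounds. The exact definitions used by the authors are not given here, and the numbers are made up for illustration.

```python
import math

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Applied to cross-validation predictions, this gives Q^2.
    """
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(ss_res / len(observed))

# Illustrative values only, not data from the paper.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

r2 = r_squared(observed, predicted)
err = rmse(observed, predicted)
print(f"R2 = {r2:.3f}, RMSE = {err:.3f}")
```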
Scheme 1. Workflow of the QSAR study based on BRS-3D. The process contains three steps: (A) Construction of BRCD-3D. 3D shape similarity calculations and structural clustering were used to extract a set of 300 diverse ligands from the druggable protein-ligand database, sc-PDB. This ligand set was named the BRCD-3D; (B) Calculation of BRS-3D. The objective compound was flexibly superimposed onto the 300 BRCD-3D ligands (magenta ones), resulting in 300 similarity scores. The array of the scores was defined as BRS-3D, which could be used as a multi-dimensional molecular descriptor in virtual screening and QSAR studies; (C) Model development. Discriminant or regression models were developed with machine learning methods (e.g., SVM), taking BRS-3D as the independent variable.
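Step (B) of the scheme amounts to scoring one compound against every reference ligand and collecting the scores in a fixed order. The sketch below shows only that bookkeeping. The names `similarity_profile` and `toy_score` are hypothetical, and `toy_score` is a deliberately simple stand-in (Tanimoto overlap of feature sets): the actual method scores flexible 3D superimpositions by shape overlap and property similarity, which requires dedicated software and is not reproduced here.

```python
from typing import Callable, FrozenSet, List, Sequence

Features = FrozenSet[str]

def similarity_profile(
    query: Features,
    references: Sequence[Features],
    score_fn: Callable[[Features, Features], float],
) -> List[float]:
    """Score the query against each reference, in reference order.

    With the 300 BRCD-3D ligands as references, the result would be a
    300-dimensional descriptor.
    """
    return [score_fn(query, ref) for ref in references]

def toy_score(a: Features, b: Features) -> float:
    """Stand-in similarity: Tanimoto overlap of two feature sets.

    NOT the shape/property superimposition score used for BRS-3D.
    """
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Three made-up reference "ligands" instead of the real 300.
references = [
    frozenset({"ring", "donor"}),
    frozenset({"ring", "acceptor", "charge"}),
    frozenset({"donor"}),
]
query = frozenset({"ring", "donor"})

profile = similarity_profile(query, references, toy_score)
print(profile)
```

Because every compound is scored against the same references in the same order, the resulting vectors are directly comparable and can serve as input features for a model such as an SVM.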
The 42 GLL/GDD data sets with more than 200 ligands.
| No. | Target | Target Name | Ligand Type | Ligand Count | Decoy Count |
|---|---|---|---|---|---|
| 1 | 5HT1A | 5-hydroxytryptamine receptor 1A | Agonist | 952 | 37,128 |
| 2 | 5HT1A | 5-hydroxytryptamine receptor 1A | Antagonist | 506 | 19,734 |
| 3 | 5HT1D | 5-hydroxytryptamine receptor 1D | Agonist | 558 | 21,762 |
| 4 | 5HT1D | 5-hydroxytryptamine receptor 1D | Antagonist | 315 | 12,285 |
| 5 | 5HT2A | 5-hydroxytryptamine receptor 2A | Antagonist | 725 | 28,275 |
| 6 | 5HT2C | 5-hydroxytryptamine receptor 2C | Agonist | 209 | 8151 |
| 7 | 5HT2C | 5-hydroxytryptamine receptor 2C | Antagonist | 318 | 12,402 |
| 8 | 5HT4R | 5-hydroxytryptamine receptor 4 | Agonist | 235 | 9165 |
| 9 | 5HT4R | 5-hydroxytryptamine receptor 4 | Antagonist | 241 | 9399 |
| 10 | AA1R | Adenosine receptor A1 | Antagonist | 280 | 10,920 |
| 11 | AA2AR | Adenosine receptor A2a | Antagonist | 361 | 14,079 |
| 12 | AA2BR | Adenosine receptor A2b | Antagonist | 370 | 14,430 |
| 13 | ACM1 | Muscarinic acetylcholine receptor M1 | Agonist | 806 | 31,434 |
| 14 | ACM3 | Muscarinic acetylcholine receptor M3 | Antagonist | 295 | 11,505 |
| 15 | ADA1A | Alpha-1A adrenergic receptor | Antagonist | 588 | 22,932 |
| 16 | ADA1B | Alpha-1B adrenergic receptor | Antagonist | 550 | 21,450 |
| 17 | ADA1D | Alpha-1D adrenergic receptor | Antagonist | 568 | 22,152 |
| 18 | ADA2A | Alpha-2A adrenergic receptor | Antagonist | 440 | 17,160 |
| 19 | ADA2B | Alpha-2B adrenergic receptor | Antagonist | 437 | 17,043 |
| 20 | ADA2C | Alpha-2C adrenergic receptor | Antagonist | 433 | 16,887 |
| 21 | ADRB1 | Beta-1 adrenergic receptor | Agonist | 209 | 8151 |
| 22 | ADRB1 | Beta-1 adrenergic receptor | Antagonist | 211 | 8229 |
| 23 | ADRB2 | Beta-2 adrenergic receptor | Agonist | 206 | 8034 |
| 24 | ADRB2 | Beta-2 adrenergic receptor | Antagonist | 204 | 7956 |
| 25 | ADRB3 | Beta-3 adrenergic receptor | Agonist | 643 | 25,077 |
| 26 | AG2R | Type-1 angiotensin II receptor | Antagonist | 1502 | 58,578 |
| 27 | CCKAR | Cholecystokinin receptor type A | Antagonist | 360 | 14,040 |
| 28 | CLTR1 | Cysteinyl leukotriene receptor 1 | Antagonist | 333 | 12,987 |
| 29 | DRD2 | D2 dopamine receptor | Antagonist | 529 | 20,631 |
| 30 | DRD3 | D3 dopamine receptor | Antagonist | 317 | 12,363 |
| 31 | DRD4 | D4 dopamine receptor | Antagonist | 665 | 25,935 |
| 32 | EDNRA | Endothelin-1 receptor | Antagonist | 676 | 26,364 |
| 33 | EDNRB | Endothelin B receptor | Antagonist | 561 | 21,879 |
| 34 | GASR | Gastrin/cholecystokinin type B receptor | Antagonist | 567 | 22,113 |
| 35 | HRH3 | Histamine H3 receptor | Antagonist | 313 | 12,207 |
| 36 | LSHR | Lutropin-choriogonadotropic hormone receptor | Antagonist | 230 | 8970 |
| 37 | NK1R | Substance-P receptor | Antagonist | 900 | 35,100 |
| 38 | OPRD | Delta-type opioid receptor | Agonist | 361 | 14,079 |
| 39 | OPRK | Kappa-type opioid receptor | Agonist | 284 | 11,076 |
| 40 | TA2R | Thromboxane A2 receptor | Antagonist | 725 | 28,275 |
| 41 | V1AR | Vasopressin V1a receptor | Antagonist | 225 | 8775 |
| 42 | V1BR | Vasopressin V1b receptor | Antagonist | 225 | 8775 |