Sandeep Kumar Dhanda, Deepak Singla, Alok K Mondal, Gajendra P S Raghava.
Abstract
BACKGROUND: Identification of drug-like molecules is one of the major challenges in drug discovery. Existing approaches such as Lipinski's rule of five (Ro5) and Oprea's rules have their own limitations. Thus, there is a need for a computational method that can predict the drug-likeness of a molecule precisely. In addition, there is a need for an algorithm that screens chemical libraries for drug-like properties.
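For orientation, Lipinski's rule of five flags a compound as likely to be poorly absorbed when it violates more than one of four criteria: molecular weight above 500 Da, calculated logP above 5, more than 5 hydrogen-bond donors, or more than 10 hydrogen-bond acceptors. The minimal Python sketch below applies that check to properties that are assumed to be precomputed; the `MolProps` container and both function names are hypothetical, and computing the properties from a structure is left to a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass
class MolProps:
    """Precomputed properties of one molecule (hypothetical container)."""
    mol_weight: float  # molecular weight in Da
    logp: float        # calculated octanol/water partition coefficient
    h_donors: int      # number of hydrogen-bond donors
    h_acceptors: int   # number of hydrogen-bond acceptors


def ro5_violations(p: MolProps) -> int:
    """Count how many of Lipinski's four criteria the molecule violates."""
    return sum([
        p.mol_weight > 500,
        p.logp > 5,
        p.h_donors > 5,
        p.h_acceptors > 10,
    ])


def passes_ro5(p: MolProps, max_violations: int = 1) -> bool:
    """Treat a molecule as drug-like if it has at most `max_violations` violations."""
    return ro5_violations(p) <= max_violations


if __name__ == "__main__":
    aspirin_like = MolProps(mol_weight=180.2, logp=1.2, h_donors=1, h_acceptors=4)
    bulky = MolProps(mol_weight=720.0, logp=6.3, h_donors=6, h_acceptors=12)
    print(passes_ro5(aspirin_like), ro5_violations(bulky))
```

The `max_violations` parameter makes the "more than one violation" cut-off explicit.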
Year: 2013 PMID: 24188205 PMCID: PMC3826839 DOI: 10.1186/1745-6150-8-28
Source DB: PubMed Journal: Biol Direct ISSN: 1745-6150 Impact factor: 4.540
Figure 1. Variance of components in our dataset.
Figure 2. Two-dimensional principal component analysis (PCA) plot for approved and experimental drugs; each drug molecule is represented by a circle.
Top-10 substructure fingerprints and their respective frequencies in our dataset

| Substructure fingerprint | Nfrag_aprd | Faprd | Nfrag_exp | Fexp | Faprd − Fexp |
|---|---|---|---|---|---|
| Primary_alcohol | 139 | 0.69 | 539 | 1.13 | −0.44 |
| 1-2Diol | 67 | 0.31 | 663 | 1.29 | −0.98 |
| Aldehyde | 4 | 0.16 | 79 | 1.35 | −1.19 |
| Alpha_hydroxyacid | 1 | 0.07 | 50 | 1.39 | −1.32 |
| Sulfenic_derivatives | 23 | 0.59 | 108 | 1.17 | −0.58 |
| Phosphoric_monoester | 5 | 0.06 | 289 | 1.4 | −1.34 |
| Phosphoric_diester | 3 | 0.2 | 47 | 1.33 | −1.13 |
| Phosphoric_acid_derivatives | 16 | 0.1 | 506 | 1.38 | −1.28 |
| Sugar_pattern_1 | 71 | 0.34 | 627 | 1.28 | −0.94 |
| Mixed_anhydride | 3 | 0.05 | 214 | 1.4 | −1.35 |

Nfrag_aprd: number of occurrences of the fragment in the approved class; Nfrag_exp: number of occurrences of the fragment in the experimental class; Faprd: frequency of the fragment in the approved class; Fexp: frequency of the fragment in the experimental class. A negative Faprd − Fexp value indicates that the fragment is less frequent in approved drugs than in experimental drugs.
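The footnote does not say how Faprd and Fexp are computed. The tabulated values are consistent with an enrichment ratio: the share of fragment-bearing molecules that fall in a class, divided by that class's share of the whole dataset, so that a value above 1 means the fragment is over-represented in that class. Under that reading the two ratios satisfy Faprd·p + Fexp·(1 − p) = 1, where p is the fraction of approved drugs, and the rows imply p ≈ 0.296. Both the definition and the value of p are assumptions inferred from the table, not statements from the paper; the sketch below recomputes three rows under them.

```python
def class_frequencies(n_aprd, n_exp, frac_aprd):
    """Return (Faprd, Fexp, Faprd - Fexp) for one fragment.

    Assumed definition: the share of fragment-bearing molecules that belong to a
    class, divided by that class's share of the whole dataset (1 = no enrichment).
    """
    total = n_aprd + n_exp
    if total == 0:
        raise ValueError("fragment does not occur in the dataset")
    f_aprd = (n_aprd / total) / frac_aprd
    f_exp = (n_exp / total) / (1.0 - frac_aprd)
    return f_aprd, f_exp, f_aprd - f_exp


# Assumed fraction of approved drugs in the dataset, inferred from the table.
FRAC_APRD = 0.2958

if __name__ == "__main__":
    rows = {"Primary_alcohol": (139, 539), "Aldehyde": (4, 79), "Mixed_anhydride": (3, 214)}
    for name, (n_a, n_e) in rows.items():
        f_a, f_e, diff = class_frequencies(n_a, n_e, FRAC_APRD)
        print(f"{name}: Faprd={f_a:.2f} Fexp={f_e:.2f} Faprd-Fexp={diff:.2f}")
```

The table's difference column appears to be taken from the rounded frequencies, so recomputing it from unrounded values can differ in the last digit (for example, Alpha_hydroxyacid gives −1.33 rather than −1.32).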
Highly significant MACCS fingerprints and their respective frequencies in our dataset

| MACCS key | Accuracy (%) | Nfrag_aprd | Faprd | Nfrag_exp | Fexp | Faprd − Fexp |
|---|---|---|---|---|---|---|
| MACCS112 | 66.35 | 1288 | 1.58 | 1473 | 0.76 | 0.82 |
| MACCS122 | 66.66 | 1105 | 1.57 | 1276 | 0.76 | 0.81 |
| MACCS144 | 69.21 | 972 | 1.64 | 1027 | 0.73 | 0.91 |
| MACCS66 | 73.97 | 557 | 1.98 | 395 | 0.59 | 1.39 |
| MACCS150 | 57.57 | 1227 | 1.36 | 1812 | 0.85 | 0.52 |
| MACCS138 | 65.91 | 910 | 1.52 | 1115 | 0.78 | 0.74 |

Nfrag_aprd: number of occurrences of the fragment in the approved class; Nfrag_exp: number of occurrences of the fragment in the experimental class; Faprd: frequency of the fragment in the approved class; Fexp: frequency of the fragment in the experimental class.
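The percentage in the second column of the MACCS table can be reproduced for all six keys if it is read as the accuracy of a one-key rule that predicts "approved" when the key is present and "experimental" when it is absent, with 1347 approved and 3206 experimental molecules in total. Those class sizes are inferred from the tabulated numbers and are not given in this excerpt; the sketch below is a consistency check under that assumption, not the authors' procedure.

```python
N_APRD = 1347  # assumed number of approved drugs (inferred from the table)
N_EXP = 3206   # assumed number of experimental drugs (inferred from the table)


def single_key_accuracy(n_aprd_with_key, n_exp_with_key, n_aprd=N_APRD, n_exp=N_EXP):
    """Accuracy (%) of the rule 'key present -> approved, key absent -> experimental'."""
    true_pos = n_aprd_with_key          # approved drugs that carry the key
    true_neg = n_exp - n_exp_with_key   # experimental drugs that lack the key
    return 100.0 * (true_pos + true_neg) / (n_aprd + n_exp)


if __name__ == "__main__":
    maccs = {"MACCS112": (1288, 1473), "MACCS66": (557, 395), "MACCS150": (1227, 1812)}
    for key, (a, e) in maccs.items():
        print(key, round(single_key_accuracy(a, e), 2))
```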
Figure 3. Representation of the selected MACCS keys.
Performance of models built on various fingerprints with two feature-selection algorithms

| Feature selection | No. of descriptors | Threshold | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | AUC |
|---|---|---|---|---|---|---|---|
| rm-useless | 52 | 0 | 70.23 | 80.6 | 77.53 | 0.49 | 0.82 |
| cfsSubsetEval | 9 | 0.3 | 70.9 | 71.68 | 71.45 | 0.4 | 0.77 |
| rm-useless | 1012 | 0 | 62.44 | 92.51 | 83.62 | 0.59 | 0.86 |
| cfsSubsetEval | 25 | −0.4 | 60.13 | 85.93 | 78.3 | 0.47 | 0.79 |
| rm-useless | 1024 | 0 | 61.99 | 92.36 | 83.37 | 0.58 | 0.86 |
| cfsSubsetEval | 40 | −0.2 | 63.62 | 85.84 | 79.27 | 0.5 | 0.81 |
| rm-useless | 1024 | 0 | 67.78 | 86.28 | 80.8 | 0.54 | 0.85 |
| cfsSubsetEval | 43 | 0 | 69.86 | 73.71 | 72.57 | 0.41 | 0.78 |
| rm-useless | 704 | 0 | 63.85 | 92.11 | 83.75 | 0.59 | 0.87 |
| cfsSubsetEval | 27 | 0.4 | 66.59 | 79.76 | 75.86 | 0.45 | 0.8 |
| rm-useless | 159 | 0 | 88.42 | 90.61 | 89.96 | 0.77 | 0.95 |
| cfsSubsetEval | 10 | 0 | 89.83 | 81.72 | 84.12 | 0.67 | 0.87 |
| rm-useless | 192 | 0 | 93.1 | 87.84 | 89.39 | 0.77 | 0.95 |
| cfsSubsetEval | 16 | −0.3 | 84.71 | 78.32 | 80.21 | 0.59 | 0.88 |
| rm-useless | 192 | 0 | 76.32 | 78.45 | 77.82 | 0.52 | 0.84 |
| cfsSubsetEval | 18 | 0 | 50.71 | 84.44 | 74.46 | 0.37 | 0.74 |
| rm-useless | 2273 | −0.2 | 63.92 | 90.42 | 82.58 | 0.57 | 0.85 |
| cfsSubsetEval | 57 | 0 | 72.31 | 80.82 | 78.3 | 0.51 | 0.82 |
| rm-useless | 2273 | 0 | 61.84 | 92.89 | 83.7 | 0.59 | 0.86 |
| cfsSubsetEval | 51 | 0 | 53.75 | 91.98 | 80.67 | 0.51 | 0.81 |

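Assuming `cfsSubsetEval` refers to Weka's CfsSubsetEval, it scores a descriptor subset with Hall's correlation-based feature selection (CFS) merit, which rewards descriptors that correlate with the class and penalizes descriptors that correlate with each other:

Merit(S) = k·r_cf / sqrt(k + k(k − 1)·r_ff)

where k is the subset size, r_cf the mean descriptor-class correlation and r_ff the mean descriptor-descriptor correlation. Weka measures these correlations differently for discrete attributes and pairs the evaluator with a separate search method; the sketch below substitutes absolute Pearson correlation and a plain greedy forward search, so it illustrates the idea rather than reproducing Weka's output. All function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset, columns, labels):
    """CFS merit of a non-empty subset of column indices, using |Pearson r|."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[i], labels)) for i in subset) / k
    if k == 1:
        r_ff = 0.0
    else:
        pairs = list(combinations(subset, 2))
        r_ff = sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(columns, labels):
    """Forward selection: add the column that most raises the merit; stop when none does."""
    selected, best = [], 0.0
    remaining = set(range(len(columns)))
    while remaining:
        merit, j = max((cfs_merit(selected + [c], columns, labels), c) for c in remaining)
        if merit <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = merit
    return selected, best
```

A redundant copy of an already selected descriptor adds no merit, which is why this style of selection keeps far fewer descriptors than simply dropping useless ones.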
Figure 4. Various sets of descriptors versus Matthews correlation coefficient (MCC).
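The evaluation measures can be related through a confusion matrix, taking approved drugs as the positive class, with MCC = (TP·TN − FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). As a worked check, the rm-useless row with 159 descriptors (88.42, 90.61, 89.96 and 0.77 in the sensitivity, specificity, accuracy and MCC positions) is consistent with TP = 1191, FN = 156, TN = 2905 and FP = 301, assuming 1347 approved and 3206 experimental molecules. These counts are reconstructed from the percentages, not reported values, and `metrics` is a hypothetical helper.

```python
import math


def metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity and accuracy (in %) and the MCC.

    Positive class: approved drugs; negative class: experimental drugs.
    """
    sens = 100.0 * tp / (tp + fn)   # approved drugs predicted as approved
    spec = 100.0 * tn / (tn + fp)   # experimental drugs predicted as experimental
    acc = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if a margin is empty
    return sens, spec, acc, mcc


if __name__ == "__main__":
    # Reconstructed (not reported) counts for the 159-descriptor rm-useless row.
    sens, spec, acc, mcc = metrics(tp=1191, fn=156, tn=2905, fp=301)
    print(f"Sn={sens:.2f} Sp={spec:.2f} Acc={acc:.2f} MCC={mcc:.2f}")
```

Running it prints the four values reported in that row. It also shows that accuracy is the class-size-weighted mean of sensitivity and specificity, whereas MCC stays informative when the classes are unbalanced.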
Performance of PCA-based models on MACCS descriptors

| No. of components | Threshold | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | AUC |
|---|---|---|---|---|---|---|
| 166 | −0.1 | 91.24 | 90.33 | 90.60 | 0.79 | 0.96 |
| 20 | 0.0 | 82.93 | 90.05 | 87.94 | 0.72 | 0.94 |
| 15 | 0.0 | 85.97 | 85.03 | 85.31 | 0.68 | 0.92 |
| 10 | 0.0 | 86.27 | 78.95 | 81.11 | 0.61 | 0.88 |
| 5 | −0.1 | 75.95 | 80.57 | 79.20 | 0.54 | 0.84 |

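The PCA models replace the MACCS descriptors with their top k principal components (k = 166, 20, 15, 10 and 5 in the table). A minimal pure-Python sketch of that reduction, assuming the components are the leading eigenvectors of the descriptor covariance matrix, uses power iteration with deflation; each eigenvalue divided by the trace of the covariance matrix gives the fraction of variance a component explains. The helper names are hypothetical, and for a 166-column matrix a library routine such as `numpy.linalg.eigh` is the practical choice.

```python
import math


def centre(rows):
    """Subtract each column's mean; returns (centred_rows, means)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows], means


def covariance(centred):
    """Sample covariance matrix (d x d) of mean-centred rows."""
    n, d = len(centred), len(centred[0])
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_components(cov, k, iters=500):
    """Leading k (eigenvalue, eigenvector) pairs via power iteration with deflation."""
    d = len(cov)
    a = [row[:] for row in cov]  # working copy, deflated after each component
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(d)]  # arbitrary start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in the deflated matrix
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance of the data along direction v.
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))
        pairs.append((lam, v))
        # Deflation: remove this component so the next pass finds the next one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return pairs


def project(centred, pairs):
    """Scores of each molecule on the retained components (the reduced descriptors)."""
    return [[sum(x * vj for x, vj in zip(row, v)) for _, v in pairs] for row in centred]
```

Keeping fewer components discards the directions of smallest variance first, which is the trade-off the table explores.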
Performance of various hybrid models developed using combinations of descriptors

| Model | No. of descriptors | Threshold | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | AUC |
|---|---|---|---|---|---|---|---|
| Hybrid-1 | 50 | 0 | 86.86 | 86.28 | 86.45 | 0.7 | 0.92 |
| Hybrid-2 | 50 | 0 | 74.09 | 65 | 67.69 | 0.36 | 0.73 |
| Hybrid-3 | 100 | −0.1 | 92.43 | 87.99 | 89.3 | 0.77 | 0.95 |
| Hybrid-4 | 296 | 0 | 90.57 | 89.86 | 90.07 | 0.78 | 0.96 |
| Hybrid-5 | 22 | 0 | 87.75 | 84.68 | 85.59 | 0.69 | 0.9 |

Hybrid-1: top 5 descriptors from each fingerprint type, ranked by positive correlation with activity; Hybrid-2: top 5 descriptors from each fingerprint type, ranked by negative correlation with activity; Hybrid-3: the descriptors of Hybrid-1 and Hybrid-2 combined; Hybrid-4: the descriptors selected by the CfsSubsetEval algorithm from all 10 fingerprint types, combined; Hybrid-5: the CfsSubsetEval algorithm run on the 296-descriptor set of Hybrid-4.
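The footnote's selection rules can be sketched as follows, assuming "activity" is the class label (1 for approved, 0 for experimental) and "correlation" is Pearson correlation; neither is stated in this excerpt, and `hybrid_sets` is a hypothetical helper. With five descriptors per direction and 10 fingerprint types, Hybrid-1 and Hybrid-2 each hold 50 descriptors and Hybrid-3 holds 100, matching the table. The cfsSubsetEval descriptor counts from the performance table likewise add up to the 296 descriptors of Hybrid-4.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def hybrid_sets(fingerprints, labels, top_n=5):
    """Return (hybrid1, hybrid2, hybrid3) as sets of (fingerprint_type, descriptor) pairs.

    fingerprints: {fingerprint_type: {descriptor_name: column_of_values}}
    labels: 1 for approved, 0 for experimental (assumed encoding of 'activity').
    """
    hybrid1, hybrid2 = set(), set()
    for fp_type, columns in fingerprints.items():
        # Rank descriptors of this type from most negative to most positive correlation.
        ranked = sorted(columns, key=lambda name: pearson(columns[name], labels))
        hybrid2.update((fp_type, name) for name in ranked[:top_n])
        hybrid1.update((fp_type, name) for name in ranked[-top_n:])
    return hybrid1, hybrid2, hybrid1 | hybrid2


# Descriptors kept by cfsSubsetEval per fingerprint type, in the order of the
# cfsSubsetEval rows of the performance table; their total is the size of Hybrid-4.
CFS_COUNTS = [9, 25, 40, 43, 27, 10, 16, 18, 57, 51]

if __name__ == "__main__":
    print(sum(CFS_COUNTS))
```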
Performance of models on the new training and validation datasets built using MACCS fingerprints

| Dataset | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | AUC |
|---|---|---|---|---|---|
| New training | 90.25 | 89.40 | 89.65 | 0.77 | 0.95 |
| Validation | 90.37 | 87.21 | 88.14 | 0.77 | 0.95 |
| New training | 85.70 | 88.65 | 87.78 | 0.72 | 0.94 |
| Validation | 81.85 | 87.21 | 85.62 | 0.67 | 0.92 |
| New training | 89.42 | 81.83 | 84.07 | 0.67 | 0.89 |
| Validation | 84.07 | 81.44 | 82.22 | 0.62 | 0.88 |

Number of descriptors in each type of fingerprint

| No. of descriptors | No. of descriptors |
|---|---|
| 1024 | 881 |
| 1024 | 166 |
| 1024 | 4860 |
| 307 | 4860 |
| 307 | 79 |
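Assuming `rm-useless` refers to Weka's RemoveUseless filter, which removes attributes that do not vary at all or that vary too much, its main effect on binary fingerprints is to drop bits that take the same value in every molecule. The sketch below implements only that constant-column criterion and ignores the "varies too much" rule; the function name is hypothetical.

```python
def remove_constant_columns(rows, names):
    """Drop descriptors whose value is identical across all molecules.

    rows: list of per-molecule descriptor vectors; names: descriptor names.
    Returns the filtered rows and the names of the descriptors that were kept.
    """
    keep = [j for j in range(len(names)) if len({row[j] for row in rows}) > 1]
    filtered = [[row[j] for j in keep] for row in rows]
    return filtered, [names[j] for j in keep]


if __name__ == "__main__":
    fingerprints = [[1, 0, 1], [1, 1, 0], [1, 0, 0]]  # toy data: bit "k1" is always set
    reduced, kept = remove_constant_columns(fingerprints, ["k1", "k2", "k3"])
    print(kept, reduced)
```

A bit that is set (or unset) in every molecule cannot separate approved from experimental drugs, so removing it loses no information for the classifier.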