Pelin Armutlu, Muhittin E Ozdemir, Fadime Uney-Yuksektepe, I Halil Kavakli, Metin Turkay.
Abstract
BACKGROUND: A priori analysis of the activity of drugs on a target protein by computational approaches can help narrow down drug candidates for further experimental tests. A large number of computational methods currently predict the activity of drugs on proteins. In this study, we treat activity prediction as a classification problem and aim to improve classification accuracy by introducing an algorithm that combines partial least squares regression with a mixed-integer programming based hyper-boxes classification method, in which drug molecules are classified as low active or high active according to their binding activity (IC50 values) on target proteins. We also aim to determine the most significant molecular descriptors for the drug molecules.
Year: 2008 PMID: 18834515 PMCID: PMC2572625 DOI: 10.1186/1471-2105-9-411
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
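The classification step described in the abstract can be illustrated with a minimal sketch. It assumes the hyper-boxes for each class are already known (in the paper they come from a mixed-integer programming model, which is not reproduced here) and that a molecule is assigned to the class of the nearest box. The function names, the IC50 threshold parameter, and the nearest-box rule are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

# A hyper-box is a pair (lower bounds, upper bounds), one bound per descriptor.
Box = Tuple[Sequence[float], Sequence[float]]


def label_by_ic50(ic50: float, threshold: float) -> str:
    """Label a molecule 'high' active if its IC50 is below the threshold.

    A lower IC50 means stronger inhibition. The threshold is a hypothetical
    parameter, not a value taken from the paper.
    """
    return "high" if ic50 < threshold else "low"


def distance_to_box(x: Sequence[float], box: Box) -> float:
    """Euclidean distance from point x to an axis-aligned box (0 if inside)."""
    lower, upper = box
    total = 0.0
    for xi, lo, hi in zip(x, lower, upper):
        gap = max(lo - xi, 0.0, xi - hi)  # 0 when lo <= xi <= hi
        total += gap * gap
    return total ** 0.5


def classify(x: Sequence[float], boxes: Dict[str, List[Box]]) -> str:
    """Assign x to the class that owns the nearest hyper-box (assumed rule)."""
    best_label, best_dist = None, float("inf")
    for label, class_boxes in boxes.items():
        for box in class_boxes:
            d = distance_to_box(x, box)
            if d < best_dist:
                best_label, best_dist = label, d
    if best_label is None:
        raise ValueError("no hyper-boxes were provided")
    return best_label
```

With two descriptors, `classify([0.5, 0.5], {"low": [([0, 0], [1, 1])], "high": [([2, 2], [3, 3])]})` falls inside the "low" box and is labelled accordingly; a point outside every box is given the label of the closest one.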
Comparison of R² values for PLS models.
| Data set | CoMFA* | CoMSIAbasic* | CoMSIAextra* | EVA* | HQSAR* | 2D* | 2.5D* | e-DragonPLS-7 | e-DragonPLS-10 | e-DragonPLS-15 |
| AchE | 0.88 | 0.86 | 0.86 | 0.96 | 0.72 | 0.40 | 0.38 | 0.84 | 0.90 | 0.95 |
| BZR | 0.61 | 0.62 | 0.62 | 0.51 | 0.64 | 0.51 | 0.52 | 0.51 | 0.67 | 0.79 |
| COX-2 | 0.70 | 0.69 | 0.69 | 0.68 | 0.70 | 0.62 | 0.68 | 0.53 | 0.61 | 0.73 |
| DHFR_RL | 0.79 | 0.76 | 0.75 | 0.81 | 0.81 | 0.61 | 0.65 | 0.42 | 0.53 | 0.64 |
| DHFR_PC | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 0.44 | 0.54 | 0.65 |
| DHFR_TG | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 0.40 | 0.51 | 0.66 |
| Cytochrome P450 C17 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 0.84 | 0.91 | 0.95 |
* PLS results reported by Sutherland et al. [26].
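For readers comparing the columns above, the following is a minimal sketch of how a coefficient of determination is commonly computed from observed and predicted activities. It assumes the standard definition 1 - SS_res / SS_tot; this page does not state which variant (fitted or cross-validated) the table reports, and the function name is illustrative.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared error of the model's predictions.
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0.0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot
```

Under this definition a model that predicts every activity exactly scores 1, and a model that always predicts the mean activity scores 0.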
Classification accuracies of each iteration.
| 91.89 | 100.00 | ||||
| 86.48 | 89.19 | 91.89 | |||
| 86.05 | 89.18 | ||||
| 90.90 | 96.36 | ||||
| 92.73 | 94.55 | ||||
| 90.09 | 92.73 | ||||
| 94.39 | 95.33 | 97.20 | 98.13 | ||
| 91.58 | 97.20 | ||||
| 88.78 | 89.72 | 90.65 | |||
| 94.73 | 96.99 | ||||
| 93.98 | 97.74 | ||||
| 94.73 | |||||
| 95.23 | 96.83 | 97.62 | |||
| 94.44 | 95.24 | 98.41 | |||
| 92.06 | 93.65 | ||||
| 96.24 | 97.74 | ||||
| 93.23 | 93.98 | 96.24 | |||
| 96.24 | 97.74 | ||||
| 86.36 | 90.00 | 97.20 | 100.00 | ||
| 100.00 | |||||
| 100.00 |
Comparison of classification accuracies of best WEKA classifiers with the MILP based hyper-boxes classification.
| Bayes Network | 79.28 | 77.48 | 78.38 | Bayes Network | 77.91 | 77.3 | 73.62 |
| Naive Bayes | 80.18 | 80.18 | 81.08 | Naive Bayes | 80.37 | 77.91 | 66.26 |
| Naive Bayes Simple | 81.08 | 80.18 | 81.98 | Naive Bayes Simple | 79.14 | 77.3 | 68.71 |
| Naive Bayes Updatable | 80.18 | 80.18 | 81.08 | Naive Bayes Updatable | 80.37 | 77.91 | 66.26 |
| Logistic | 79.28 | 80.18 | Logistic | 80.98 | 80.98 | ||
| Multilayer Perceptron | 82.88 | 81.08 | 81.08 | Multilayer Perceptron | 79.75 | 80.98 | 79.14 |
| SimpleLogistic | 82.88 | 79.28 | SimpleLogistic | 80.98 | 79.14 | ||
| SMO (WEKA SVM) | 79.28 | 80.18 | 80.18 | SMO (WEKA SVM) | 79.14 | 77.91 | 77.91 |
| IB1 | 70.27 | 80.18 | 77.48 | IB1 | 72.39 | 74.85 | 75.46 |
| IBk | 70.27 | 80.18 | 77.48 | IBk | 72.39 | 74.85 | 75.46 |
| Logit Boost | 82.88 | 81.08 | Logit Boost | 78.53 | 77.3 | 77.91 | |
| Multi Class Classifier | 79.28 | 80.18 | Multi Class Classifier | 80.98 | |||
| Threshold Selector | 47.75 | 68.47 | 60.36 | Threshold Selector | 78.53 | 76.69 | 75.46 |
| LMT | 82.88 | 79.28 | LMT | 80.98 | 79.14 | ||
| RandomForest | 80.18 | 80.18 | 81.98 | RandomForest | 77.3 | 79.75 | |
| OneR | 81.08 | 72.97 | 72.97 | OneR | 74.85 | 74.23 | 79.14 |
| Bayes Network | 77.33 | 78.09 | 73.05 | Bayes Network | 67.2 | 67.2 | 66.88 |
| Naive Bayes | 76.57 | 72.54 | Naive Bayes | 71.66 | 70.06 | 64.65 | |
| Naive Bayes Simple | 75.57 | 78.84 | 67 | Naive Bayes Simple | 72.29 | 70.06 | 64.65 |
| Naive Bayes Updatable | 76.57 | 72.54 | Naive Bayes Updatable | 71.66 | 70.06 | 64.65 | |
| Logistic | 75.82 | 78.84 | 75.57 | Logistic | 72.29 | 70.38 | 70.06 |
| Multilayer Perceptron | 76.32 | 77.08 | 75.06 | Multilayer Perceptron | 72.29 | ||
| SimpleLogistic | 74.56 | 77.83 | 75.31 | SimpleLogistic | 72.29 | 71.97 | 68.47 |
| SMO (WEKA SVM) | 72.54 | 79.09 | 72.54 | SMO (WEKA SVM) | 71.02 | 69.43 | 69.43 |
| IB1 | 75.31 | 79.09 | 75.82 | IB1 | 69.11 | 71.02 | 70.06 |
| IBk | 75.31 | 79.09 | 75.82 | IBk | 69.11 | 71.02 | 70.06 |
| Logit Boost | 77.33 | 78.34 | 78.34 | Logit Boost | 71.66 | 70.06 | 70.7 |
| Multi Class Classifier | 75.82 | 78.84 | 75.57 | Multi Class Classifier | 72.29 | 70.38 | 70.06 |
| Threshold Selector | 69.77 | 74.81 | 73.55 | Threshold Selector | 68.47 | 65.29 | 64.65 |
| LMT | 76.07 | 76.57 | 77.83 | LMT | 71.34 | 71.02 | 68.15 |
| RandomForest | 79.09 | RandomForest | 71.97 | 70.06 | |||
| OneR | 69.77 | 69.77 | 70.53 | OneR | 70.7 | 70.38 | 70.06 |
| Bayes Network | 63.72 | 71.78 | 70.5 | Bayes Network | 80.42 | 80.42 | 78.04 |
| Naive Bayes | 63.97 | 68.76 | 71.7 | Naive Bayes | 82.54 | 81.48 | 80.95 |
| Naive Bayes Simple | 63.97 | 67.75 | 71 | Naive Bayes Simple | 82.8 | 79.89 | 81.22 |
| Naive Bayes Updatable | 63.98 | 68.77 | 71.78 | Naive Bayes Updatable | 82.54 | 81.48 | 80.95 |
| Logistic | 73.8 | 78.58 | Logistic | 81.75 | 83.33 | 81.75 | |
| Multilayer Perceptron | 62.72 | 76.57 | 77.58 | Multilayer Perceptron | 82.8 | 82.8 | 84.13 |
| SimpleLogistic | 66.75 | 73.55 | 78.33 | SimpleLogistic | 80.42 | 81.22 | |
| SMO (WEKA SVM) | 64.99 | 73.05 | 79.59 | SMO (WEKA SVM) | 82.28 | 83.33 | 79.1 |
| IB1 | 62.97 | 75.06 | 81.11 | IB1 | 82.28 | 80.16 | 81.75 |
| IBk | 62.97 | 75.06 | IBk | 82.28 | 80.16 | 81.75 | |
| Logit Boost | 64.99 | 75.06 | 77.33 | Logit Boost | 83.33 | 81.48 | 81.48 |
| Multi Class Classifier | 69.52 | 73.8 | 78.59 | Multi Class Classifier | 81.75 | 83.33 | 81.75 |
| Threshold Selector | 64.99 | 69.52 | 78.59 | Threshold Selector | 83.33 | 79.1 | 81.22 |
| LMT | 65.24 | 77.83 | LMT | 83.07 | |||
| RandomForest | 68.51 | 77.08 | 77.83 | RandomForest | 82.8 | 80.95 | 83.07 |
| OneR | 61.46 | 66 | 62.72 | OneR | 79.89 | 79.89 | 80.16 |
Final average classification accuracies and corresponding standard deviations for 10-fold cross-validation with varying numbers of descriptors.
| Average Accuracy | Std. Dev | Average Accuracy | Std. Dev | ||||
| 4 Attributes | 80.83 | 4.36 | 4 Attributes | 82.15 | 2.76 | ||
| 6 Attributes | 83.36 | 3.67 | 6 Attributes | 91.67 | 1.86 | ||
| 7 Attributes | 100 | 0 | 7 Attributes | 96.99 | 2.14 | ||
| 8 Attributes | 96.36 | 1.89 | 8 Attributes | 96.64 | 0.72 | ||
| 10 Attributes | 91.89 | 2.22 | 10 Attributes | 97.74 | 0.82 | ||
| 12 Attributes | 86.63 | 3.28 | 12 Attributes | 97.37 | 1.33 | ||
| 15 Attributes | 89.18 | 1.18 | 15 Attributes | 94.73 | 1.94 | ||
| 20 Attributes | 83.65 | 3.26 | 20 Attributes | 95.25 | 3.28 | ||
| 4 Attributes | 86.83 | 1.36 | 4 Attributes | 81.27 | 4.72 | ||
| 6 Attributes | 88.36 | 2.57 | 6 Attributes | 94.48 | 3.97 | ||
| 7 Attributes | 96.36 | 2.06 | 7 Attributes | 97.62 | 2.22 | ||
| 8 Attributes | 93.65 | 3.83 | 8 Attributes | 96.15 | 0.82 | ||
| 10 Attributes | 94.55 | 2.37 | 10 Attributes | 98.41 | 1.18 | ||
| 12 Attributes | 95.63 | 1.06 | 12 Attributes | 92.18 | 2.83 | ||
| 15 Attributes | 92.73 | 1.46 | 15 Attributes | 93.65 | 0.98 | ||
| 20 Attributes | 86.25 | 2.12 | 20 Attributes | 94.25 | 4.02 | ||
| 4 Attributes | 91.86 | 3.86 | 4 Attributes | 84.94 | 1.47 | ||
| 6 Attributes | 94.36 | 1.42 | 6 Attributes | 94.03 | 3.49 | ||
| 7 Attributes | 98.13 | 1.73 | 7 Attributes | 97.74 | 1.62 | ||
| 8 Attributes | 97.65 | 1.23 | 8 Attributes | 96.05 | 0.72 | ||
| 10 Attributes | 97.2 | 2.29 | 10 Attributes | 96.24 | 2.47 | ||
| 12 Attributes | 96.63 | 2.16 | 12 Attributes | 95.42 | 1.79 | ||
| 15 Attributes | 90.65 | 3.06 | 15 Attributes | 97.74 | 2.78 | ||
| 20 Attributes | 88.06 | 1.41 | 20 Attributes | 93.5 | 2.67 |
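The averages and standard deviations in the table above come from 10-fold cross-validation. The sketch below shows one way such a summary could be produced: split the sample indices into ten folds, obtain one accuracy per fold, then report the mean and standard deviation. The interleaved fold assignment and the use of the sample (n - 1) standard deviation are assumptions; this page does not say how the authors partitioned the data or which estimator they used.

```python
import statistics
from typing import List, Sequence, Tuple


def fold_indices(n_samples: int, n_folds: int = 10) -> List[List[int]]:
    """Split indices 0..n_samples-1 into n_folds nearly equal test folds."""
    if n_folds < 2 or n_samples < n_folds:
        raise ValueError("need at least 2 folds and one sample per fold")
    indices = list(range(n_samples))
    # Interleaved assignment: fold i receives indices i, i + n_folds, ...
    return [indices[i::n_folds] for i in range(n_folds)]


def summarize(accuracies: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of per-fold accuracies."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)
```

Each fold serves once as the test set while the remaining folds are used for training; `summarize` is then called on the ten resulting accuracies.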
Descriptors leaving the 7-descriptor model and the descriptors replacing them.
| Leaving | maxmax1 | maxmax2 | maxmax3 |
| 0.96416 | 0.9491 | 0.67855 | |
| Entering | minmax1 | minmax2 | minmax3 |
| 0.5455 | 0.5783 | 0.5946 |
Comparison of classification accuracies of best WEKA classifiers with MILP based hyper-boxes classification on P450 C17 inhibitors.
| Bayes Network | |||
| Naive Bayes | 62.50 | 71.88 | 53.13 |
| Naive Bayes Simple | 62.50 | 68.75 | 50.00 |
| Naive Bayes Updatable | 62.50 | 71.88 | 53.13 |
| Logistic | 71.88 | 56.25 | 62.50 |
| Multilayer Perceptron | 62.50 | 71.88 | 59.38 |
| SimpleLogistic | 75.00 | 75.00 | |
| SMO | |||
| IB1 | 59.38 | 59.38 | |
| IBk | 59.38 | 59.38 | 62.50 |
| Logit Boost | 71.88 | 62.50 | 62.50 |
| Multi Class Classifier | 71.88 | 56.25 | 62.50 |
| Threshold Selector | 43.75 | 40.63 | 62.50 |
| LMT | 75.00 | 75.00 | |
| RandomForest | 75.00 | 68.75 | 65.63 |
| OneR | 75.00 | 71.88 | 75.00 |
Brief explanation of the most significant descriptors.
| Brief explanation | |
| 3D-MoRSE – signal 10/weighted by atomic masses | |
| d COMMA2 value/weighted by atomic polarizabilities | |
| 3D-MoRSE – signal 14/weighted by atomic Sanderson electronegativities | |
| 3D-MoRSE – signal 08/weighted by atomic masses | |
| number of acceptor atoms for H-bonds (N, O, F) | |
| Eigenvalue 04 from edge adj. matrix weighted by edge degrees | |
| d COMMA2 value/weighted by atomic van der Waals volumes |
Figure 1. Outline of classification approach.
Figure 2. Representative compounds from each QSAR data.
Figure 3. Schematic representation of multi-class data classification using hyper-boxes.