Sabina Smusz, Rafał Kurczab, Andrzej J Bojarski.
Abstract
BACKGROUND: The application of machine learning methods in virtual screening, in both classification and regression tasks, has grown in popularity over the past few years. However, the effectiveness of these methods depends strongly on many different factors.
Year: 2013 PMID: 23561266 PMCID: PMC3626618 DOI: 10.1186/1758-2946-5-17
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Machine learning methods used in the experiments with the optional abbreviations used in further work
| Method | Category | Parameters |
| Naïve Bayes (NB) | bayes | - |
| Sequential Minimal Optimization (SMO) | functions | The complexity parameter was set at 1, the epsilon for round-off error was 1.0E-12, and the option of normalizing training data was chosen. |
| | | Kernels: |
| | | 2) The polynomial kernel |
| | | 3) The RBF kernel |
| Instance-Based Learning (Ibk) | lazy | Brute-force nearest-neighbour search with the Euclidean distance function. |
| | | The number of neighbours used: |
| | | 1) |
| | | 2) 5 |
| | | 3) 10 |
| | | 4) 20 |
| Decorate | meta | One artificial example used during training; number of member classifiers in the Decorate ensemble: 10; maximum number of iterations: 10. |
| | | Base classifiers: |
| | | 1) |
| | | 2) J48 |
| Hyperpipes | misc | - |
| J48 | trees | 1) With reduced-error pruning |
| | | 2) |
| Random Forest (RF) | trees | Trees with unlimited depth, seed number: 1. |
| | | Number of generated trees: |
| | | 1) 5 |
| | | 2) 10 |
| | | 3) 50 |
| | | 4) |
Bolded parameters correspond to those providing the best results for a particular machine learning method (see the Results section and Additional file 1: Figure S1).
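The instance-based learner in the table is described as a brute-force nearest-neighbour search with the Euclidean distance function. The following is a minimal Python sketch of that idea, not the implementation used in the experiments. The function names (`euclidean`, `knn_predict`) and the toy four-bit fingerprints are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k):
    """Brute-force k-NN: measure the distance to every training example,
    keep the k closest, and return the majority label among them."""
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (fingerprint bits, class label).
train = [
    ((1, 1, 0, 0), "active"),
    ((1, 0, 0, 0), "active"),
    ((1, 1, 1, 0), "active"),
    ((0, 0, 1, 1), "inactive"),
    ((0, 1, 1, 1), "inactive"),
    ((0, 0, 0, 1), "inactive"),
]

label_k1 = knn_predict(train, (1, 0, 1, 0), k=1)
label_k3 = knn_predict(train, (0, 0, 1, 1), k=3)
print(label_k1, label_k3)
```

The search is exhaustive, so each prediction costs one distance computation per training example. The table lists neighbour counts such as 5, 10 and 20; the toy set here is too small for those values, hence the smaller `k`.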
Figure 1. A set of heat maps visualizing the values of the evaluation parameters obtained in a) common-test set mode experiments and b) various-test set mode experiments. Figure 1 presents the recall, precision and MCC values obtained in the experiments. Columns of maps refer to a particular evaluation parameter, rows to a particular target. Rows within maps correspond to different machine learning methods, whereas columns within maps refer to different training sets and the fingerprints used for molecule representation.
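For reference, the three evaluation parameters named in the caption can be computed from a binary confusion matrix using their standard definitions. The sketch below is illustrative only: the function name `evaluate` and the counts passed to it are hypothetical and are not results from the experiments.

```python
import math


def evaluate(tp, fp, tn, fn):
    """Return (recall, precision, MCC) for a binary confusion matrix.

    tp/fp/tn/fn are the counts of true positives, false positives,
    true negatives and false negatives. Undefined ratios are reported as 0.0.
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return recall, precision, mcc


# Hypothetical counts for illustration.
recall, precision, mcc = evaluate(tp=80, fp=20, tn=90, fn=10)
print(f"recall={recall:.3f} precision={precision:.3f} MCC={mcc:.3f}")
```

Recall and precision range from 0 to 1, whereas MCC ranges from -1 to 1 and takes all four cells of the confusion matrix into account, which makes it informative when actives and inactives are imbalanced.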
Figure 2. A set of heat maps visualizing the standard deviation of the evaluation parameters for experiments using sets with variously generated inactive compounds. Figure 2 presents the standard deviation of the recall, precision and MCC values obtained in the experiments. Columns of maps refer to a particular target, rows to the common-test and various-test set modes, respectively. Rows within maps correspond to different machine learning methods, whereas columns within maps refer to different parameters and fingerprints.
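The quantity plotted in Figure 2 is the spread of one evaluation parameter across runs that differ only in how the inactive compounds were generated. A minimal sketch of that calculation follows. The set names and MCC values are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not say which estimator was used.

```python
import statistics


def metric_spread(values):
    """Sample standard deviation of one evaluation parameter
    measured across several inactive-set variants."""
    return statistics.stdev(list(values))


# Hypothetical MCC values for one method, target and fingerprint,
# each obtained with a differently generated set of inactives.
mcc_by_inactive_set = {"set_A": 0.80, "set_B": 0.60, "set_C": 0.70}

spread = metric_spread(mcc_by_inactive_set.values())
print(f"MCC standard deviation across inactive sets: {spread:.3f}")
```

A small value indicates that the method's performance is largely insensitive to the way the inactives were chosen; a large value indicates the opposite.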
Number of decoys selected for each target
| Target | Number of actives | Number of decoys |
| COX-2 | 1126 | 39508 |
| M1 | 1155 | 32511 |
| HIV PR | 1135 | 11113 |
| Metalloproteinase | 788 | 19868 |
| 5-HT1A | 1101 | 38477 |
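Composition of training and test sets used in the experiments
| Target | Activity type | Activity code | Training set (actives/inactives) | Test set (actives/inactives) |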
| COX-2 | inhibitors | 78454 | 242/316 | 884/950 |
| M1 | agonists | 09249 | 281/315 | 874/950 |
| HIV PR | inhibitors | 71523 | 203/350 | 932/1100 |
| Metalloproteinase | inhibitors | 78432 | 144/280 | 644/800 |
| 5-HT1A | agonists | 06235 | 198/340 | 903/1050 |
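The two tables above can be cross-checked against each other. The sketch below assumes that the first number in each training and test pair counts active compounds and that the first numeric column of the decoy table is the total number of actives per target. Both readings are assumptions, since the column headers are not preserved here. All numbers are copied from the tables; the variable names are hypothetical.

```python
# First numeric column of the decoy table (assumed: total actives per target).
totals = {
    "COX-2": 1126,
    "M1": 1155,
    "HIV PR": 1135,
    "Metalloproteinase": 788,
    "5-HT1A": 1101,
}

# (training pair, test pair) from the composition table.
composition = {
    "COX-2": ((242, 316), (884, 950)),
    "M1": ((281, 315), (874, 950)),
    "HIV PR": ((203, 350), (932, 1100)),
    "Metalloproteinase": ((144, 280), (644, 800)),
    "5-HT1A": ((198, 340), (903, 1050)),
}

# Targets where training + test first-numbers do not add up to the total.
mismatches = [
    target
    for target, (train, test) in composition.items()
    if train[0] + test[0] != totals[target]
]

for target, (train, test) in composition.items():
    print(f"{target}: {train[0]} + {test[0]} = {train[0] + test[0]}")
print("mismatches:", mismatches)
```

Under these assumptions the sums agree for all five targets, for example 242 + 884 = 1126 for COX-2, which supports reading the first number of each pair as the count of actives.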