Jonna C Stålring, Lars A Carlsson, Pedro Almeida, Scott Boyer.
Abstract
BACKGROUND: Machine learning has a vast range of applications. In particular, advanced machine learning methods are routinely and increasingly used in quantitative structure-activity relationship (QSAR) modeling. QSAR data sets often encompass tens of thousands of compounds, and the size of proprietary as well as public data sets is growing rapidly. Hence, there is a demand for computationally efficient machine learning algorithms that are easily available to researchers without extensive machine learning knowledge. Because they uphold the scientific principles of transparency and reproducibility, Open Source solutions are increasingly acknowledged by regulatory authorities. Thus, an Open Source, state-of-the-art, high-performance machine learning platform that interfaces multiple customized machine learning algorithms for both graphical programming and scripting, and that can be used for large-scale development of QSAR models of regulatory quality, is of great value to the QSAR community.
Year: 2011 PMID: 21798025 PMCID: PMC3158423 DOI: 10.1186/1758-2946-3-28
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Figure 1. The architecture of AZOrange. The architecture and the major Open Source codes constituting AZOrange.
Optimized model hyper-parameters
| Algorithm | Parameter | Range |
|---|---|---|
| RF | The number of active attributes | |
| SVM | C | 2⁻⁵ to 2¹⁵ |
| | | 2⁻¹⁵ to 2³ |
| ANN | The number of hidden neurons (one-layer) | |
| CvBoost | Maximum branching depth | 1 to 20 |
| | The number of trees | 1 to 1000 |
| PLS | The number of components | 1 to |
Model hyper-parameters being optimized by default and their corresponding ranges.
¹ nAttr is the number of attributes in the data set.
² nEx is the number of examples in the data set.
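The SVM ranges in the table are most naturally read as powers of two (2^-5 to 2^15 for C, and 2^-15 to 2^3 for the second SVM parameter). A minimal sketch of how such a search grid could be enumerated is shown below. The table gives only the endpoints, so the exponent step of 2 and the helper name `power_of_two_grid` are assumptions made for illustration, not AZOrange's actual implementation.

```python
def power_of_two_grid(min_exp, max_exp, step=2):
    """Return 2**e for e = min_exp, min_exp + step, ..., up to max_exp.

    The step of 2 is an assumption; the table lists only the endpoints.
    """
    return [2.0 ** e for e in range(min_exp, max_exp + 1, step)]


# Range listed for C in the table: 2^-5 to 2^15.
c_grid = power_of_two_grid(-5, 15)

# Range listed for the second SVM parameter: 2^-15 to 2^3.
second_grid = power_of_two_grid(-15, 3)

# Every (C, second parameter) pair is one candidate in an exhaustive search.
candidates = [(c, g) for c in c_grid for g in second_grid]
```

A logarithmic grid of this kind covers several orders of magnitude with few candidates, which matters when each candidate requires training a model on tens of thousands of compounds.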
Figure 2. Generalization accuracy. Assessing the generalization accuracy of a learner with optimized model hyper-parameters using the double loop data sampling algorithm.
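The caption does not spell out the double loop data sampling algorithm. One common reading is nested cross-validation: an outer loop holds out data for estimating accuracy, and an inner loop selects hyper-parameters using only the remaining training data. The sketch below illustrates that reading under stated assumptions. The fold counts, the toy one-dimensional k-nearest-neighbour learner, and all function names are hypothetical and are not taken from AZOrange.

```python
import random


def k_folds(items, k):
    """Split items into k roughly equal folds (round-robin)."""
    return [items[i::k] for i in range(k)]


def split(folds, i):
    """Return (train, test) where fold i is the test set."""
    train = [row for j, fold in enumerate(folds) if j != i for row in fold]
    return train, folds[i]


def score(model, rows, predict):
    """Fraction of rows whose label is predicted correctly."""
    return sum(predict(model, x) == y for x, y in rows) / len(rows)


def cv_accuracy(data, k, fit, predict, param):
    """Mean k-fold accuracy of one hyper-parameter value."""
    folds = k_folds(data, k)
    scores = []
    for i in range(k):
        train, test = split(folds, i)
        scores.append(score(fit(train, param), test, predict))
    return sum(scores) / k


def double_loop(data, grid, fit, predict, outer_k=5, inner_k=3):
    """Outer loop estimates accuracy; inner loop picks the hyper-parameter."""
    folds = k_folds(data, outer_k)
    outer_scores = []
    for i in range(outer_k):
        train, test = split(folds, i)
        # Inner loop: the held-out outer fold is never seen here.
        best = max(grid, key=lambda p: cv_accuracy(train, inner_k, fit, predict, p))
        outer_scores.append(score(fit(train, best), test, predict))
    return sum(outer_scores) / outer_k


# Toy learner (hypothetical): 1-D k-nearest neighbours, hyper-parameter k.
def fit_knn(train, k):
    return (train, k)


def predict_knn(model, x):
    train, k = model
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return 1 if 2 * sum(y for _, y in nearest) > len(nearest) else 0


rng = random.Random(0)
data = [(x, 1 if x > 0.5 else 0) for x in (rng.random() for _ in range(60))]
accuracy = double_loop(data, grid=[1, 3, 5], fit=fit_knn, predict=predict_knn)
```

Because the hyper-parameter is chosen without access to the outer test fold, the outer score is not biased by the optimization itself.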
Figure 3. Parameter optimization. Using the "Parameter Optimizer" widget to optimize the parameters of any AZOrange learner.
Figure 4. Saving a model. Saving a trained model with optimized model hyper-parameters.
Figure 5. External test set. Testing the performance of a saved model on an external test set.
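The save-then-test workflow in the last two captions can be mimicked in a script. The following is only an illustrative sketch using the Python standard library: the toy threshold learner, the JSON file format, and every name in it are assumptions, and it does not reproduce AZOrange's own model format or widgets.

```python
import json
import os
import tempfile


def train(rows):
    """Toy learner (hypothetical): threshold halfway between class means."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    return {"threshold": (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2}


def predict(model, x):
    return 1 if x > model["threshold"] else 0


def accuracy(model, rows):
    return sum(predict(model, x) == y for x, y in rows) / len(rows)


training_set = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
# The external set is never used during training.
external_set = [(0.05, 0), (0.4, 0), (0.6, 1), (0.95, 1)]

# Save the trained model to disk.
model = train(training_set)
path = os.path.join(tempfile.mkdtemp(), "model.json")
with open(path, "w") as handle:
    json.dump(model, handle)

# Later: reload the saved model and test it on the external set.
with open(path) as handle:
    restored = json.load(handle)

external_accuracy = accuracy(restored, external_set)
```

Keeping the external set out of both training and hyper-parameter optimization is what makes its accuracy an independent check on the saved model.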