Comparing the Influence of Simulated Experimental Errors on 12 Machine Learning Algorithms in Bioactivity Modeling Using 12 Diverse Data Sets.

Isidro Cortes-Ciriano¹, Andreas Bender², Thérèse E Malliavin¹.

Abstract

To date, no systematic study has assessed the effect of random experimental errors on the predictive power of QSAR models. To address this gap, we have benchmarked the noise sensitivity of 12 learning algorithms on 12 data sets (15,840 models in total), namely the following: Support Vector Machines (SVM) with radial and polynomial (Poly) kernels, Gaussian Process (GP) with radial and polynomial kernels, Relevance Vector Machines (radial kernel), Random Forest (RF), Gradient Boosting Machines (GBM), Bagged Regression Trees, Partial Least Squares, and k-Nearest Neighbors. Model performance on the test set was used as a proxy to monitor the relative noise sensitivity of these algorithms as a function of the level of simulated noise added to the bioactivities from the training set. The noise was simulated by sampling from Gaussian distributions with increasing variances, which ranged from zero to the range of pIC50 values spanned by a given data set. General trends were identified by designing a full-factorial experiment, which was analyzed with a normal linear model. Overall, GBM displayed low noise tolerance, although its performance was comparable to RF, SVM Radial, SVM Poly, GP Poly, and GP Radial at low noise levels. Of practical relevance, we show that the bag fraction parameter has a marked influence on the noise sensitivity of GBM, suggesting that low values (e.g., 0.1-0.2) for this parameter should be set when modeling noisy data. The remaining 11 algorithms display a comparable noise tolerance, as a smooth and linear degradation of model performance is observed with the level of noise. However, SVM Poly and GP Poly display significant noise sensitivity at high noise levels in some cases. Overall, these results provide a practical guide to make informed decisions about which algorithm and parameter values to use according to the noise level present in the data.
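The noise-simulation protocol described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes evenly spaced variance levels between zero and the range of the training activities (the abstract gives only the two endpoints), and it uses a toy 1-D k-nearest-neighbors regressor on synthetic data in place of the 12 algorithms and 12 data sets of the study. All function names (`noise_variances`, `add_gaussian_noise`, `knn_predict`, `noise_sensitivity`) are hypothetical.

```python
import math
import random


def noise_variances(activities, n_levels):
    """Evenly spaced variances from 0 up to the range (max - min) of the activities.

    The even spacing and the number of levels are assumptions of this sketch.
    """
    value_range = max(activities) - min(activities)
    return [value_range * i / (n_levels - 1) for i in range(n_levels)]


def add_gaussian_noise(activities, variance, rng):
    """Return a copy of the activities with zero-mean Gaussian noise of the given variance."""
    sigma = math.sqrt(variance)
    return [y + rng.gauss(0.0, sigma) for y in activities]


def knn_predict(train_x, train_y, query, k=3):
    """Predict by averaging the k training points closest to the query (1-D descriptor)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    return sum(train_y[i] for i in nearest) / k


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def noise_sensitivity(train_x, train_y, test_x, test_y, n_levels=5, seed=0):
    """Test-set RMSE for each noise level; only the training activities are perturbed."""
    rng = random.Random(seed)
    results = []
    for variance in noise_variances(train_y, n_levels):
        noisy_y = add_gaussian_noise(train_y, variance, rng)
        predictions = [knn_predict(train_x, noisy_y, x) for x in test_x]
        results.append((variance, rmse(test_y, predictions)))
    return results


if __name__ == "__main__":
    # Synthetic toy data: one hypothetical descriptor x, with pIC50 = 5 + 0.4 * x.
    data_rng = random.Random(42)
    xs = [data_rng.uniform(0.0, 10.0) for _ in range(200)]
    ys = [5.0 + 0.4 * x for x in xs]
    for variance, error in noise_sensitivity(xs[:150], ys[:150], xs[150:], ys[150:]):
        print(f"variance={variance:.2f}  test RMSE={error:.3f}")
```

The test-set activities are left untouched, so the reported RMSE tracks how much noise in the training labels degrades predictions, which mirrors the use of test-set performance as a proxy for noise sensitivity.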

Year: 2015 PMID: 26038978 DOI: 10.1021/acs.jcim.5b00101

Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956

Cited

6 in total

1. Advancing computer-aided drug discovery (CADD) by big data and data-driven machine learning modeling. (Review)

Authors: Linlin Zhao; Heather L Ciallella; Lauren M Aleksunes; Hao Zhu
Journal: Drug Discov Today Date: 2020-07-11 Impact factor: 7.851

2. Compilation and physicochemical classification analysis of a diverse hERG inhibition database.

Authors: Remigijus Didziapetris; Kiril Lanevskij
Journal: J Comput Aided Mol Des Date: 2016-10-25 Impact factor: 3.686

3. ASAS-NANP symposium: mathematical modeling in animal nutrition: limitations and potential next steps for modeling and modelers in the animal sciences. (Review)

Authors: Marc Jacobs; Aline Remus; Charlotte Gaillard; Hector M Menendez; Luis O Tedeschi; Suresh Neethirajan; Jennifer L Ellis
Journal: J Anim Sci Date: 2022-06-01 Impact factor: 3.338

4. Improved large-scale prediction of growth inhibition patterns using the NCI60 cancer cell line panel.

5. Experimental Errors in QSAR Modeling Sets: What We Can Do and What We Cannot Do.

6. QSAR-derived affinity fingerprints (part 2): modeling performance for potency prediction.