The importance of outlier detection and training set selection for reliable environmental QSAR predictions.

Erik Furusjö, Anders Svenson, Magnus Rahmberg, Magnus Andersson.

Abstract

Empirical QSAR models are valid only in the domain in which they were trained and validated. Applying a model to substances outside its domain can lead to grossly erroneous predictions. Partial least squares (PLS) regression provides prediction diagnostics that can be used to decide whether a substance is within the model domain, i.e. whether the model prediction can be trusted. QSAR models for four different environmental end-points are used to demonstrate the importance of appropriate training set selection and how outlier diagnostics can increase the reliability of QSAR predictions. All models showed consistent results: test set prediction errors were very similar in magnitude to training set estimation errors when prediction outlier diagnostics were used to detect and remove outliers in the prediction data. Test set prediction errors for substances classified as outliers were much larger. The difference in the number of outliers between models with randomly and systematically selected training sets illustrates well the need for representative training data.
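The abstract does not say which PLS diagnostics the authors used, so the following is only a minimal sketch of the general idea, under stated assumptions. It fits a single-response PLS model with NIPALS, then flags a substance as outside the model domain when either its Hotelling-type distance in score space or its descriptor residual exceeds the 95th percentile seen in the training set. Both diagnostics, the 95% cut-off, the synthetic data, and all function names (`fit_pls1`, `diagnostics`, `predict_with_flags`) are illustrative assumptions, not the paper's method.

```python
import numpy as np

def diagnostics(model, X):
    """Return (score-space distance, descriptor residual) per row of X."""
    Xc = X - model["x_mean"]
    T = Xc @ model["R"]                                  # PLS scores
    t2 = np.sum(T**2 / model["score_var"], axis=1)       # Hotelling-type T^2
    resid = np.sum((Xc - T @ model["P"].T) ** 2, axis=1) # unexplained part of X
    return t2, resid

def fit_pls1(X, y, n_components=2):
    """Fit a single-response PLS model with the NIPALS algorithm."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)       # unit weight vector
        t = E @ w                    # scores for this component
        tt = t @ t
        p = E.T @ t / tt             # X loadings
        q_a = f @ t / tt             # y loading
        E = E - np.outer(t, p)       # deflate X
        f = f - q_a * t              # deflate y
        W.append(w); P.append(p); q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    R = W @ np.linalg.inv(P.T @ W)   # maps centred X directly to scores
    model = {"x_mean": x_mean, "y_mean": y_mean, "R": R, "P": P, "q": q}
    model["score_var"] = ((X - x_mean) @ R).var(axis=0, ddof=1)
    # Assumed cut-offs: empirical 95th percentiles of the training diagnostics.
    t2, resid = diagnostics(model, X)
    model["t2_limit"] = np.quantile(t2, 0.95)
    model["resid_limit"] = np.quantile(resid, 0.95)
    return model

def predict_with_flags(model, X):
    """Predict y and flag rows that fall outside the assumed model domain."""
    X = np.atleast_2d(X)
    t2, resid = diagnostics(model, X)
    y_hat = model["y_mean"] + (X - model["x_mean"]) @ model["R"] @ model["q"]
    outlier = (t2 > model["t2_limit"]) | (resid > model["resid_limit"])
    return y_hat, outlier

# Synthetic stand-in for a descriptor matrix and an end-point (not real data).
rng = np.random.default_rng(0)
Z = rng.normal(size=(40, 2))
X_train = Z @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(40, 6))
y_train = Z @ np.array([1.0, -0.5]) + 0.05 * rng.normal(size=40)

model = fit_pls1(X_train, y_train, n_components=2)
x_far = X_train.mean(axis=0) + 10.0  # far from every training substance
y_hat, outlier = predict_with_flags(model, np.vstack([X_train[0], x_far]))
print(outlier)
```

With percentile cut-offs of this kind, roughly 5% of the training substances are themselves flagged by each diagnostic, so the limits are a tuning choice rather than a fixed rule. Predictions for flagged rows are still returned but should be treated as untrusted.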

Year:  2005        PMID: 16153688     DOI: 10.1016/j.chemosphere.2005.07.002

Source DB:  PubMed          Journal:  Chemosphere        ISSN: 0045-6535            Impact factor:   7.086


  4 in total

1.  Problems with molecular mechanics implementations on the example of 4-benzoyl-1-(4-methyl-imidazol-5-yl)-carbonylthiosemicarbazide.

Authors:  Agata Siwek; Katarzyna Swiderek; Stefan Jankowski
Journal:  J Mol Model       Date:  2011-05-28       Impact factor: 1.810

2.  Quantile regression model for a diverse set of chemicals: application to acute toxicity for green algae.

Authors:  Jonathan Villain; Sylvain Lozano; Marie-Pierre Halm-Lemeille; Gilles Durrieu; Ronan Bureau
Journal:  J Mol Model       Date:  2014-11-29       Impact factor: 1.810

3.  Outliers in SAR and QSAR: 3. Importance of considering the role of water molecules in protein-ligand interactions and quantitative structure-activity relationship studies.

Authors:  Ki Hwan Kim
Journal:  J Comput Aided Mol Des       Date:  2021-03-13       Impact factor: 3.686

4.  RANdom SAmple Consensus (RANSAC) algorithm for material-informatics: application to photovoltaic solar cells.

Authors:  Omer Kaspi; Abraham Yosipof; Hanoch Senderowitz
Journal:  J Cheminform       Date:  2017-06-06       Impact factor: 5.514
