Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: focusing on applicability domain and overfitting by variable selection.

Igor V Tetko, Iurii Sushko, Anil Kumar Pandey, Hao Zhu, Alexander Tropsha, Ester Papa, Tomas Oberg, Roberto Todeschini, Denis Fourches, Alexandre Varnek.

Abstract

The estimation of the accuracy of predictions is a critical problem in QSAR modeling. The "distance to model" can be defined as a metric that quantifies the similarity between the training set molecules and the test set compound for the given property in the context of a specific model. It can be expressed in many different ways, e.g., using the Tanimoto coefficient, leverage, or correlation in the space of models. In this paper we have used mixtures of Gaussian distributions as well as statistical tests to evaluate six types of distances to models with respect to their ability to discriminate compounds with small and large prediction errors. The analysis was performed for twelve QSAR models of aqueous toxicity against T. pyriformis obtained with different machine-learning methods and various types of descriptors. The distance to model based on the standard deviation of predicted toxicity calculated from an ensemble of models afforded the best results. This distance also successfully discriminated molecules with small and large prediction errors for a mechanism-based model developed using log P and the Maximum Acceptor Superdelocalizability descriptors. Thus, the distance to model metric could also be used to augment mechanistic QSAR models by estimating their prediction errors. Moreover, the accuracy of prediction is mainly determined by the training set data distribution in the chemistry and activity spaces rather than by the QSAR approaches used to develop the models. We have shown that incorrect validation of a model may result in a wrong estimate of its performance and suggested how this problem could be circumvented. The toxicity of 3182 and 48774 molecules from the EPA High Production Volume (HPV) Challenge Program and EINECS (European chemical Substances Information System), respectively, was predicted, and the accuracy of prediction was estimated. The developed models are available online at http://www.qspr.org.
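
As an illustration of the ensemble-based distance to model mentioned in the abstract, the sketch below computes the standard deviation of an ensemble's predictions for each compound and compares the mean absolute error of compounds below and above a distance threshold. It is a minimal sketch, not the authors' procedure: the function names, the median threshold, and the synthetic data are assumptions made for this example, and the data are constructed so that ensemble disagreement tracks prediction error.

```python
import random
import statistics


def ensemble_std(predictions):
    """Distance to model for one compound: the sample standard deviation
    of the predictions made by the individual models of an ensemble."""
    return statistics.stdev(predictions)


def mean_abs_error_by_distance(distances, abs_errors, threshold):
    """Split compounds at a distance threshold and return the mean
    absolute error of the (near, far) groups."""
    near = [e for d, e in zip(distances, abs_errors) if d <= threshold]
    far = [e for d, e in zip(distances, abs_errors) if d > threshold]
    return statistics.mean(near), statistics.mean(far)


# Synthetic data (hypothetical, not taken from the paper): half of the
# compounds are predicted consistently by the ensemble, half are not.
rng = random.Random(0)
n_models = 20
true_values = [rng.uniform(-1.0, 3.0) for _ in range(200)]
noise_levels = [0.1 if i % 2 == 0 else 0.8 for i in range(200)]

distances, abs_errors = [], []
for y, sigma in zip(true_values, noise_levels):
    # Each ensemble member predicts the true value plus its own noise.
    preds = [y + rng.gauss(0.0, sigma) for _ in range(n_models)]
    distances.append(ensemble_std(preds))
    # Error of the ensemble-average prediction for this compound.
    abs_errors.append(abs(statistics.mean(preds) - y))

# Assumed threshold: the median distance over all compounds.
threshold = statistics.median(distances)
near_mae, far_mae = mean_abs_error_by_distance(distances, abs_errors, threshold)
print(f"MAE near the model: {near_mae:.3f}; far from the model: {far_mae:.3f}")
```

With real models, the distance would be computed from the ensemble members' predictions for each test compound, and the threshold would be chosen from the training or validation data rather than fixed at the median.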

Year:  2008        PMID: 18729318     DOI: 10.1021/ci800151m

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956


  76 in total

1.  Evaluating the applicability domain in the case of classification predictive models for carcinogenicity based on the counter propagation artificial neural network.

Authors:  Natalja Fjodorova; Marjana Novič; Alessandra Roncaglioni; Emilio Benfenati
Journal:  J Comput Aided Mol Des       Date:  2011-12-03       Impact factor: 3.686

2.  QSAR classification of metabolic activation of chemicals into covalently reactive species.

Authors:  Chin Yee Liew; Chuen Pan; Andre Tan; Ke Xin Magneline Ang; Chun Wei Yap
Journal:  Mol Divers       Date:  2012-02-28       Impact factor: 2.943

3.  Toward better QSAR/QSPR modeling: simultaneous outlier detection and variable selection using distribution of model features.

Authors:  Dongsheng Cao; Yizeng Liang; Qingsong Xu; Yifeng Yun; Hongdong Li
Journal:  J Comput Aided Mol Des       Date:  2010-11-13       Impact factor: 3.686

4.  Big-Data Science in Porous Materials: Materials Genomics and Machine Learning. (Review)

Authors:  Kevin Maik Jablonka; Daniele Ongari; Seyed Mohamad Moosavi; Berend Smit
Journal:  Chem Rev       Date:  2020-06-10       Impact factor: 60.622

5.  Iterative experimental and virtual high-throughput screening identifies metabotropic glutamate receptor subtype 4 positive allosteric modulators.

Authors:  Ralf Mueller; Eric S Dawson; Colleen M Niswender; Mariusz Butkiewicz; Corey R Hopkins; C David Weaver; Craig W Lindsley; P Jeffrey Conn; Jens Meiler
Journal:  J Mol Model       Date:  2012-05-17       Impact factor: 1.810

6.  Robust optimization of scoring functions for a target class.

Authors:  Markus H J Seifert
Journal:  J Comput Aided Mol Des       Date:  2009-05-27       Impact factor: 3.686

7.  QSAR modeling: where have you been? Where are you going to?

Authors:  Artem Cherkasov; Eugene N Muratov; Denis Fourches; Alexandre Varnek; Igor I Baskin; Mark Cronin; John Dearden; Paola Gramatica; Yvonne C Martin; Roberto Todeschini; Viviana Consonni; Victor E Kuz'min; Richard Cramer; Romualdo Benigni; Chihae Yang; James Rathman; Lothar Terfloth; Johann Gasteiger; Ann Richard; Alexander Tropsha
Journal:  J Med Chem       Date:  2014-01-06       Impact factor: 7.446

8.  The continuous molecular fields approach to building 3D-QSAR models.

Authors:  Igor I Baskin; Nelly I Zhokhova
Journal:  J Comput Aided Mol Des       Date:  2013-05-30       Impact factor: 3.686

9.  Cheminformatics analysis of assertions mined from literature that describe drug-induced liver injury in different species.

Authors:  Denis Fourches; Julie C Barnes; Nicola C Day; Paul Bradley; Jane Z Reed; Alexander Tropsha
Journal:  Chem Res Toxicol       Date:  2010-01       Impact factor: 3.739

10.  Prediction of Cytochrome P450 Profiles of Environmental Chemicals with QSAR Models Built from Drug-like Molecules.

Authors:  Hongmao Sun; Henrike Veith; Menghang Xia; Christopher P Austin; Raymond R Tice; Ruili Huang
Journal:  Mol Inform       Date:  2012-10-11       Impact factor: 3.353
