Toward better QSAR/QSPR modeling: simultaneous outlier detection and variable selection using distribution of model features.

Dongsheng Cao, Yizeng Liang, Qingsong Xu, Yifeng Yun, Hongdong Li.

Abstract

Building a robust and reliable QSAR/QSPR model requires attention to two tasks: selecting the optimal variable subset from a large pool of molecular descriptors, and detecting outliers among the samples. The two problems are similar in some respects and complementary in others. For a given learning algorithm and data set, one should therefore consider how variable selection and outlier detection interact. In this paper, we describe a consistent methodology for performing variable subset selection and outlier detection simultaneously. It rests on statistical distributions that can be simulated by building many cross-predictive linear models. The approach exploits the fact that the distribution of linear model coefficients provides a mechanism for ranking and interpreting the effects of variables, while the distribution of prediction errors provides a mechanism for differentiating outliers from normal samples. Statistics of these distributions, namely the mean and standard deviation, provide a feasible way to describe the information contained in the original samples. Several examples demonstrate the predictive ability of the proposed approach, comparing it with different approaches as well as their combinations.
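
The idea described above can be illustrated with a short sketch. This is not the authors' implementation: it assumes random train/test splitting, ordinary least squares as the linear model, and synthetic data. The function name `model_feature_distributions`, its parameters, and the scoring rule (absolute mean divided by standard deviation) are hypothetical choices made for illustration only.

```python
import numpy as np


def model_feature_distributions(X, y, n_models=500, train_frac=0.7, seed=0):
    """Fit many linear models on random subsamples and summarize the
    resulting coefficient and prediction-error distributions.

    Hypothetical helper: OLS and random splitting are assumptions,
    not details taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_train = int(round(train_frac * n))
    coefs = np.empty((n_models, p))
    # errors[m, i] holds the prediction error of sample i when it was
    # left out of model m, and NaN when it was used for training.
    errors = np.full((n_models, n), np.nan)

    for m in range(n_models):
        perm = rng.permutation(n)
        train, test = perm[:n_train], perm[n_train:]
        # Append an intercept column and solve the least-squares problem.
        A = np.column_stack([X[train], np.ones(n_train)])
        beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        coefs[m] = beta[:p]
        errors[m, test] = y[test] - (X[test] @ beta[:p] + beta[p])

    return {
        "coef_mean": coefs.mean(axis=0),
        "coef_std": coefs.std(axis=0, ddof=1),
        "err_mean": np.nanmean(errors, axis=0),
        "err_std": np.nanstd(errors, axis=0, ddof=1),
    }


# Synthetic demonstration: 2 informative descriptors out of 6,
# plus one sample whose response is deliberately corrupted.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=60)
y[5] += 10.0  # injected outlier

stats = model_feature_distributions(X, y)
# Rank variables by |mean| / std of their coefficient distribution.
variable_score = np.abs(stats["coef_mean"]) / stats["coef_std"]
top_variables = np.argsort(variable_score)[::-1][:2]
# Flag the sample with the largest absolute mean prediction error.
suspect = int(np.argmax(np.abs(stats["err_mean"])))
```

In this sketch, a variable whose coefficient stays far from zero across many models receives a high score, and a sample whose left-out prediction error is consistently large is a candidate outlier. Any cutoff for either quantity would have to be chosen for the data set at hand.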

Year:  2010        PMID: 21076934     DOI: 10.1007/s10822-010-9401-1

Source DB:  PubMed          Journal:  J Comput Aided Mol Des        ISSN: 0920-654X            Impact factor:   3.686

