
Reliably assessing prediction reliability for high dimensional QSAR data.

Jianping Huang, Xiaohui Fan.

Abstract

Predictability and prediction reliability are of the utmost importance in characterizing a good quantitative structure-activity relationship (QSAR) model. However, validation methods are insufficient to guarantee the prediction reliability of QSAR models. Moreover, high-dimensional samples pose a great challenge to traditional methods in terms of predictive power. Therefore, this study presents a predictive classifier, TreeEC, that can assess prediction reliability with high confidence, especially for high-dimensional QSAR data. Two approaches for assessing prediction reliability are provided: applicability domain and prediction confidence. We demonstrate that the applicability domain has difficulty guaranteeing a model's prediction reliability: samples very close to the domain center are often predicted more poorly than those outside the domain. Prediction confidence is instead more promising for assessing prediction reliability. On a large data set assessed by prediction confidence, external samples with a confidence greater than 95 % can be reliably predicted with an accuracy of 94 %, compared with the average accuracy of 84 %. We also show, using 11 public data sets, that TreeEC is less affected by high dimensionality than other popular methods. A free version of TreeEC with a user-friendly interface can be downloaded from http://pharminfo.zju.edu.cn/computation/TreeEC/TreeEC.html.
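The two reliability measures named in the abstract can be illustrated with a minimal sketch. The abstract does not say how TreeEC computes either quantity, so everything here is an assumption: confidence is taken as the rescaled distance of an ensemble's mean positive-class probability from 0.5, and the applicability domain is represented by the Euclidean distance of a sample from the centroid of the training descriptors. All function names are hypothetical and are not part of TreeEC.

```python
from math import sqrt


def prediction_confidence(p_positive):
    """Rescale a positive-class probability to a confidence in [0, 1].

    Assumed definition (not taken from the abstract): |p - 0.5| / 0.5,
    so p = 0.5 gives 0 and p = 0 or p = 1 gives 1.
    """
    return abs(p_positive - 0.5) / 0.5


def accuracy(probs, labels):
    """Fraction of samples whose thresholded prediction matches the label."""
    if not probs:
        return None
    hits = sum((p >= 0.5) == bool(y) for p, y in zip(probs, labels))
    return hits / len(probs)


def accuracy_above_confidence(probs, labels, min_conf):
    """Accuracy restricted to samples with confidence >= min_conf.

    Returns (accuracy, number_of_samples_kept); accuracy is None
    when no sample passes the threshold.
    """
    kept = [(p, y) for p, y in zip(probs, labels)
            if prediction_confidence(p) >= min_conf]
    kept_probs = [p for p, _ in kept]
    kept_labels = [y for _, y in kept]
    return accuracy(kept_probs, kept_labels), len(kept)


def distance_to_center(sample, training_set):
    """Euclidean distance from a sample to the training-set centroid.

    A simple stand-in for an applicability-domain measure; real
    implementations often scale descriptors first.
    """
    n = len(training_set)
    center = [sum(column) / n for column in zip(*training_set)]
    return sqrt(sum((x - c) ** 2 for x, c in zip(sample, center)))
```

Comparing `accuracy(probs, labels)` with `accuracy_above_confidence(probs, labels, 0.95)` on an external test set mirrors the kind of comparison the abstract reports between average accuracy and accuracy on high-confidence samples. The distance function gives the corresponding per-sample quantity for a domain-based assessment.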


Year: 2012    PMID: 23250826    DOI: 10.1007/s11030-012-9415-9

Source DB: PubMed    Journal: Mol Divers    ISSN: 1381-1991    Impact factor: 2.943


