
A Multi-Objective Genetic Algorithm for Outlier Removal.

Oren E Nahum, Abraham Yosipof, Hanoch Senderowitz.

Abstract

Quantitative structure-activity relationship (QSAR) or quantitative structure-property relationship (QSPR) models are developed to correlate the activities of sets of compounds with their structure-derived descriptors by means of mathematical models. The presence of outliers, namely, compounds that differ in some respect from the rest of the data set, compromises the ability of statistical methods to derive QSAR models with good prediction statistics. Hence, outliers should be removed from data sets prior to model derivation. Here we present a new multi-objective genetic algorithm for the identification and removal of outliers based on the k nearest neighbors (kNN) method. The algorithm was used to remove outliers from three different data sets of pharmaceutical interest (logBBB, factor 7 inhibitors, and dihydrofolate reductase inhibitors), and its performance was compared with that of five other methods for outlier removal. The results suggest that the new algorithm provides filtered data sets that (1) better maintain the internal diversity of the parent data sets and (2) give rise to QSAR models with much better prediction statistics. Filtered data sets that were equally good in terms of these metrics were obtained when another objective function (termed "preservation") was added to the algorithm, so that certain compounds are removed only with low probability. This option is highly useful when specific compounds should preferably be kept in the final data set, either because they have favorable activities or because they represent interesting molecular scaffolds. We expect this new algorithm to be useful in future QSAR applications.
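The abstract does not specify the objective functions or the genetic operators, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes two hypothetical objectives to be minimized (the number of compounds removed and the leave-one-out error of a kNN regression on the retained compounds) and, in place of a genetic search, exhaustively enumerates small removal sets on invented toy data before keeping the Pareto-optimal ones. All function names, the toy data, and the choice of k = 2 are assumptions made for illustration.

```python
from itertools import combinations


def knn_loo_error(X, y, keep, k=2):
    """Mean absolute leave-one-out error of kNN regression on retained rows."""
    idx = [i for i, kept in enumerate(keep) if kept]
    if len(idx) <= k:
        return float("inf")  # too few compounds left to form k neighbors
    total = 0.0
    for i in idx:
        # Rank the other retained compounds by squared Euclidean distance.
        others = sorted(
            (j for j in idx if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(X[i], X[j])),
        )
        pred = sum(y[j] for j in others[:k]) / k
        total += abs(pred - y[i])
    return total / len(idx)


def objectives(X, y, keep, k=2):
    """Assumed objectives, both minimized: (compounds removed, kNN LOO error)."""
    return (keep.count(False), knn_loo_error(X, y, keep, k))


def dominates(p, q):
    """True if p is no worse than q everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_front(scored):
    """Keep the (mask, score) pairs whose score no other score dominates."""
    return [(m, s) for m, s in scored if not any(dominates(t, s) for _, t in scored)]


# Invented toy data: one descriptor per compound; compound 5 is an obvious outlier.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [2.5]]
y = [0.0, 1.0, 2.0, 3.0, 4.0, 20.0]

# Exhaustive enumeration of removal sets of size 0..2 stands in for the GA search.
n = len(X)
candidates = []
for r in range(3):
    for removed in combinations(range(n), r):
        keep = [i not in removed for i in range(n)]
        candidates.append((keep, objectives(X, y, keep)))

front = pareto_front(candidates)
for keep, score in front:
    print([i for i, kept in enumerate(keep) if not kept], score)
```

On this toy set the non-dominated solutions are the unfiltered data set and the one that removes only compound 5. A "preservation" objective of the kind the abstract mentions could, under the same assumptions, be added as a third tuple element that penalizes removal of user-specified compounds; a real genetic algorithm would replace the enumeration with selection, crossover, and mutation over the removal masks.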


Year:  2015        PMID: 26553402     DOI: 10.1021/acs.jcim.5b00515

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956



1.  RANdom SAmple Consensus (RANSAC) algorithm for material-informatics: application to photovoltaic solar cells.

Authors:  Omer Kaspi; Abraham Yosipof; Hanoch Senderowitz
Journal:  J Cheminform       Date:  2017-06-06       Impact factor: 5.514

