Predicting the genotoxicity of secondary and aromatic amines using data subsetting to generate a model ensemble.

Brian E Mattioni, Gregory W Kauffman, Peter C Jurs, Laura L Custer, Stephen K Durham, Greg M Pearl.

Abstract

Binary quantitative structure-activity relationship (QSAR) models were developed to classify a data set of 334 aromatic and secondary amine compounds as genotoxic or nongenotoxic based on information calculated solely from chemical structure. Genotoxic endpoints for each compound were determined using the SOS Chromotest in both the presence and absence of an S9 rat liver homogenate. Compounds were considered genotoxic if assay results indicated a positive genotoxicity hit for either the S9-inactivated or the S9-activated assay. Each compound in the data set was encoded through the calculation of numerical descriptors that describe various aspects of chemical structure (e.g., topological, geometric, electronic, polar surface area). Furthermore, five additional descriptors that focused on the secondary and aromatic nitrogen atoms in each molecule were calculated specifically for this study. Descriptor subsets were examined using a genetic algorithm search engine interfaced with a k-Nearest Neighbor fitness evaluator to find the most information-rich subsets, which ultimately served as the final predictive models. Models were chosen for their ability to minimize the total number of misclassifications, with special attention given to those models that possessed fewer occurrences of positive toxicity hits being misclassified as nontoxic (false negatives). In addition, a subsetting procedure was used to form an ensemble of models using different combinations of compounds in the training and prediction sets. This was done to ensure that consistent results could be obtained regardless of training set composition. The procedure also allowed each compound to be externally validated three times by different training set data, with the resultant predictions being used in a "majority rules" voting scheme to produce a consensus prediction for each member of the data set.
The individual models produced an average training set classification rate of 71.6% and an average prediction set classification rate of 67.7%. However, the model ensemble was able to correctly classify the genotoxicity of 72.2% of all prediction set compounds.
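
The subsetting and "majority rules" voting idea can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not say how the training and prediction sets were drawn, so the repeated random partitioning, the fold count, the plain Euclidean k-nearest-neighbor classifier, and the names `knn_predict` and `consensus_predictions` are all assumptions made for illustration. The only property taken from the abstract is that every compound receives three external predictions, each from a model that did not see it during training, which are then combined by majority vote.

```python
import random
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    `train` is a list of (descriptor_vector, label) pairs. Squared Euclidean
    distance is an assumption; the abstract does not name a metric.
    """
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def consensus_predictions(data, n_rounds=3, n_folds=5, k=3, seed=0):
    """Return one consensus label per compound.

    Hypothetical scheme: in each of `n_rounds` random partitions, every
    compound is held out exactly once and predicted by a model trained on
    the remaining folds. With n_rounds=3 each compound gets three external
    predictions, which a majority vote reduces to one label.
    """
    rng = random.Random(seed)
    votes = [[] for _ in data]
    indices = list(range(len(data)))
    for _ in range(n_rounds):
        rng.shuffle(indices)
        folds = [indices[i::n_folds] for i in range(n_folds)]
        for fold in folds:
            held_out = set(fold)
            train = [data[i] for i in indices if i not in held_out]
            for i in fold:
                votes[i].append(knn_predict(train, data[i][0], k))
    return [Counter(v).most_common(1)[0][0] for v in votes]


# Made-up two-descriptor toy data (0 = nongenotoxic, 1 = genotoxic).
toy = [((0.1 * i, 0.0), 0) for i in range(10)]
toy += [((10.0 + 0.1 * i, 10.0), 1) for i in range(10)]
consensus = consensus_predictions(toy)
```

An odd number of rounds with binary labels means the vote can never tie. An ensemble of this kind can score higher than the average of its members because a compound misclassified by one model is still classified correctly whenever the other two agree on the right label.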

Year:  2003        PMID: 12767154     DOI: 10.1021/ci034013i

Source DB:  PubMed          Journal:  J Chem Inf Comput Sci        ISSN: 0095-2338


  4 in total

1.  Novel approach to evolutionary neural network based descriptor selection and QSAR model development.

Authors:  Zeljko Debeljak; Viktor Marohnić; Goran Srecnik; Marica Medić-Sarić
Journal:  J Comput Aided Mol Des       Date:  2006-04-11       Impact factor: 3.686

2.  Reverse fingerprinting, similarity searching by group fusion and fingerprint bit importance.

Authors:  Chris Williams
Journal:  Mol Divers       Date:  2006-09-21       Impact factor: 2.943

3.  Bioinformatics opportunities for identification and study of medicinal plants. (Review)

Authors:  Vivekanand Sharma; Indra Neil Sarkar
Journal:  Brief Bioinform       Date:  2012-05-15       Impact factor: 11.622

4.  Discovery of a First-in-Class Gut-Restricted RET Kinase Inhibitor as a Clinical Candidate for the Treatment of IBS.

Authors:  Hilary Schenck Eidam; John Russell; Kaushik Raha; Michael DeMartino; Donghui Qin; Huiping Amy Guan; Zhiliu Zhang; Gong Zhen; Haiyu Yu; Chengde Wu; Yan Pan; Gerard Joberty; Nico Zinn; Sylvie Laquerre; Sharon Robinson; Angela White; Amanda Giddings; Ehsan Mohammadi; Beverly Greenwood-Van Meerveld; Allen Oliff; Sanjay Kumar; Mui Cheung
Journal:  ACS Med Chem Lett       Date:  2018-05-24       Impact factor: 4.345
