Binary classification of a large collection of environmental chemicals from estrogen receptor assays by quantitative structure-activity relationship and machine learning methods.

Qingda Zang, Daniel M Rotroff, Richard S Judson.

Abstract

Thousands of environmental chemicals are subject to regulatory decisions about their endocrine-disrupting potential. The ToxCast and Tox21 programs have tested ∼8200 chemicals in a broad panel of in vitro high-throughput screening (HTS) assays for estrogen receptor (ER) agonist and antagonist activity. The present work uses this large data set to develop in silico quantitative structure-activity relationship (QSAR) models with machine learning (ML) methods and a novel approach to handling the imbalanced class distribution. Training compounds from the ToxCast project were categorized as active or inactive (binding or nonbinding) based on a composite ER Interaction Score derived from a collection of 13 ER in vitro assays. A total of 1537 ToxCast chemicals were used to derive and optimize the binary classification models, while 5073 additional chemicals from the Tox21 project, evaluated in 2 of the 13 in vitro assays, were used to validate model performance externally. To handle the imbalanced distribution of active and inactive chemicals, we developed a cluster-selection strategy to minimize information loss and increase predictive performance, and compared it with three currently popular techniques: cost-sensitive learning, oversampling of the minority class, and undersampling of the majority class. QSAR classification models relating the molecular structures of chemicals to their ER activity were built using linear discriminant analysis (LDA), classification and regression trees (CART), and support vector machines (SVM), with 51 molecular descriptors from QikProp and 4328 bits of structural fingerprints as explanatory variables. A random forest (RF) feature selection method was employed to extract the structural features most relevant to ER activity.
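The abstract names four ways of coping with class imbalance but does not spell out how the cluster-selection strategy works. The Python sketch below is a hypothetical illustration of cluster-based undersampling, not the authors' algorithm. It assumes each chemical is represented as the set of "on" bits of a structural fingerprint, groups the majority (inactive) class with a simple Tanimoto leader clustering, and then undersamples round-robin across clusters so that small clusters of unusual chemotypes are kept. Random undersampling, random oversampling, and "balanced" class weights for cost-sensitive learning (the n_total / (n_classes · n_c) heuristic that scikit-learn also uses) are included as baselines. All function names, the 0.5 similarity threshold, and the toy data are assumptions made for illustration.

```python
import random
from typing import Dict, List, Set

# A chemical is represented by the set of "on" bit indices of its
# structural fingerprint (assumption; toy data only, not from the paper).
Fingerprint = Set[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_clusters(fps: List[Fingerprint], threshold: float) -> List[List[int]]:
    """Greedy leader clustering: each compound joins the first cluster whose
    leader (first member) is at least `threshold` similar; otherwise it
    starts a new cluster."""
    clusters: List[List[int]] = []
    for i, fp in enumerate(fps):
        for members in clusters:
            if tanimoto(fps[members[0]], fp) >= threshold:
                members.append(i)
                break
        else:  # no cluster was similar enough
            clusters.append([i])
    return clusters


def cluster_undersample(majority: List[Fingerprint], n_keep: int,
                        threshold: float = 0.5, seed: int = 0) -> List[int]:
    """Hypothetical cluster-based undersampling: draw majority-class indices
    round-robin across clusters, so every cluster contributes a member
    before any cluster contributes a second one."""
    rng = random.Random(seed)
    clusters = leader_clusters(majority, threshold)
    for members in clusters:
        rng.shuffle(members)  # random member order within each cluster
    chosen: List[int] = []
    depth = 0
    while len(chosen) < n_keep and any(depth < len(m) for m in clusters):
        for members in clusters:
            if depth < len(members) and len(chosen) < n_keep:
                chosen.append(members[depth])
        depth += 1
    return chosen


def random_undersample(n_majority: int, n_keep: int, seed: int = 0) -> List[int]:
    """Baseline: keep a uniform random subset of majority-class indices."""
    return random.Random(seed).sample(range(n_majority), n_keep)


def random_oversample(n_minority: int, n_target: int, seed: int = 0) -> List[int]:
    """Baseline: keep all minority indices, then duplicate randomly chosen
    ones (with replacement) until n_target indices are reached."""
    rng = random.Random(seed)
    extra = [rng.randrange(n_minority) for _ in range(n_target - n_minority)]
    return list(range(n_minority)) + extra


def class_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Cost-sensitive learning: weight class c by n_total / (n_classes * n_c),
    so misclassifying a member of the rare class costs more."""
    total = sum(counts.values())
    return {c: total / (len(counts) * n) for c, n in counts.items()}


if __name__ == "__main__":
    # Toy inactive set: two structural families plus one singleton.
    majority = [{1, 2, 3}, {1, 2, 4}, {1, 2, 3, 4},
                {10, 11, 12}, {10, 11, 13}, {20}]
    print(leader_clusters(majority, 0.5))  # [[0, 1, 2], [3, 4], [5]]
    print(cluster_undersample(majority, 3))  # one index from each cluster
    # Hypothetical class counts, not the paper's.
    print(class_weights({"active": 200, "inactive": 800}))  # {'active': 2.5, 'inactive': 0.625}
```

Round-robin selection is the design choice that separates this sketch from random undersampling. A random subset tends to mirror the densest regions of chemical space, whereas drawing one member per cluster first keeps structurally isolated inactives in the training set. Whether this matches the paper's "minimize information loss" rationale in detail is an assumption.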
The best model was obtained using SVM in combination with a subset of descriptors selected from a large set by the RF algorithm. It recognized active and inactive compounds with accuracies of 76.1% and 82.8%, respectively, for a total accuracy of 81.6% on the internal test set, and reached a total accuracy of 70.8% on the external test set. These results demonstrate that combining high-quality experimental data with ML methods can yield robust models with excellent predictive accuracy, which are potentially useful for the virtual screening of chemicals in environmental risk assessment.
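The reported figures are class-wise accuracies (sensitivity for actives, specificity for inactives) plus an overall accuracy. The minimal Python sketch below computes these metrics from a confusion matrix; the counts in the usage lines are hypothetical, not the paper's data. Because total accuracy is a prevalence-weighted mean of the two class accuracies, acc = p·se + (1 − p)·sp, the reported internal-test values can be solved for the active fraction p. Assuming all three figures refer to the same internal test set, p = (82.8 − 81.6)/(82.8 − 76.1) ≈ 0.18, or roughly 16 to 20% actives once rounding of the published percentages is allowed for. Balanced accuracy is included because it is a common summary for imbalanced data; it is not a figure the abstract reports.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # actives predicted active
    fn: int  # actives predicted inactive
    tn: int  # inactives predicted inactive
    fp: int  # inactives predicted active


def sensitivity(c: Confusion) -> float:
    """Accuracy on the active class (true positive rate)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """Accuracy on the inactive class (true negative rate)."""
    return c.tn / (c.tn + c.fp)


def total_accuracy(c: Confusion) -> float:
    """Fraction of all compounds classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def balanced_accuracy(c: Confusion) -> float:
    """Unweighted mean of the two class accuracies (insensitive to prevalence)."""
    return (sensitivity(c) + specificity(c)) / 2


def implied_active_fraction(se: float, sp: float, acc: float) -> float:
    """Solve acc = p*se + (1 - p)*sp for the active fraction p."""
    return (sp - acc) / (sp - se)


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not the paper's data).
    c = Confusion(tp=76, fn=24, tn=414, fp=86)
    print(f"sensitivity={sensitivity(c):.3f}")  # sensitivity=0.760
    print(f"specificity={specificity(c):.3f}")  # specificity=0.828
    print(f"total accuracy={total_accuracy(c):.3f}")  # total accuracy=0.817
    # Active fraction implied by the abstract's internal-test figures.
    print(round(implied_active_fraction(0.761, 0.828, 0.816), 3))  # 0.179
```

The last line also shows why overall accuracy alone can flatter a model on imbalanced data: with few actives, total accuracy sits close to specificity.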

Year: 2013
PMID: 24279462
DOI: 10.1021/ci400527b
Source DB: PubMed
Journal: J Chem Inf Model
ISSN: 1549-9596
Impact factor: 4.956


Related articles (19 in total; 10 listed):

1.  Integrating docking scores and key interaction profiles to improve the accuracy of molecular docking: towards novel B-RafV600E inhibitors.

Authors:  Chun-Qi Hu; Kang Li; Ting-Ting Yao; Yong-Zhou Hu; Hua-Zhou Ying; Xiao-Wu Dong
Journal:  Medchemcomm       Date:  2017-07-24       Impact factor: 3.597

2.  Prediction of skin sensitization potency using machine learning approaches.

Authors:  Qingda Zang; Michael Paris; David M Lehmann; Shannon Bell; Nicole Kleinstreuer; David Allen; Joanna Matheson; Abigail Jacobs; Warren Casey; Judy Strickland
Journal:  J Appl Toxicol       Date:  2017-01-10       Impact factor: 3.446

3.  Undersampling: case studies of flaviviral inhibitory activities.

Authors:  Stephen J Barigye; José Manuel García de la Vega; Juan A Castillo-Garit
Journal:  J Comput Aided Mol Des       Date:  2019-11-26       Impact factor: 3.686

4.  In Silico Prediction of Physicochemical Properties of Environmental Chemicals Using Molecular Fingerprints and Machine Learning.

Authors:  Qingda Zang; Kamel Mansouri; Antony J Williams; Richard S Judson; David G Allen; Warren M Casey; Nicole C Kleinstreuer
Journal:  J Chem Inf Model       Date:  2017-01-09       Impact factor: 4.956

5.  Predictive Modeling of Estrogen Receptor Binding Agents Using Advanced Cheminformatics Tools and Massive Public Data.

Authors:  Kathryn Ribay; Marlene T Kim; Wenyi Wang; Daniel Pinolini; Hao Zhu
Journal:  Front Environ Sci       Date:  2016-03-08

6.  Machine Learning Models for Predicting Liver Toxicity.

Authors:  Jie Liu; Wenjing Guo; Sugunadevi Sakkiah; Zuowei Ji; Gokhan Yavas; Wen Zou; Minjun Chen; Weida Tong; Tucker A Patterson; Huixiao Hong
Journal:  Methods Mol Biol       Date:  2022

7.  Channel Interactions and Robust Inference for Ratiometric β-lactamase Assay Data: a Tox21 Library Analysis.

Authors:  Fjodor Melnikov; Jui-Hua Hsieh; Nisha S Sipes; Paul T Anastas
Journal:  ACS Sustain Chem Eng       Date:  2018-01-15       Impact factor: 8.198

8.  ADMET evaluation in drug discovery: 15. Accurate prediction of rat oral acute toxicity using relevance vector machine and consensus modeling.

Authors:  Tailong Lei; Youyong Li; Yunlong Song; Dan Li; Huiyong Sun; Tingjun Hou
Journal:  J Cheminform       Date:  2016-02-01       Impact factor: 5.514

9.  Multivariate models for prediction of human skin sensitization hazard.

Authors:  Judy Strickland; Qingda Zang; Michael Paris; David M Lehmann; David Allen; Neepa Choksi; Joanna Matheson; Abigail Jacobs; Warren Casey; Nicole Kleinstreuer
Journal:  J Appl Toxicol       Date:  2016-08-02       Impact factor: 3.446

10.  The identification of complex interactions in epidemiology and toxicology: a simulation study of boosted regression trees.

Authors:  Erik Lampa; Lars Lind; P Monica Lind; Anna Bornefalk-Hermansson
Journal:  Environ Health       Date:  2014-07-04       Impact factor: 5.984
