F Lunghini¹,², G Marcou¹, P Gantzer¹, P Azam², D Horvath¹, E Van Miert², A Varnek¹
Abstract
The European Registration, Evaluation, Authorization and Restriction of Chemical Substances (REACH) Regulation requires marketed chemicals to be evaluated for Ready Biodegradability (RB), and it accepts in silico predictions as a valid alternative to experimental testing. However, currently available models may not be relevant for predicting compounds of industrial interest, owing to limited accuracy and restricted applicability domains. In this work, we present a new, extended RB dataset (2830 compounds) obtained by merging several public data sources. It was used to train classification models, which were externally validated and benchmarked against existing tools on a set of 316 compounds from an industrial context. The new models performed well in terms of both predictive power (Balanced Accuracy (BA) = 0.74-0.79) and data coverage (83-91%). The Generative Topographic Mapping (GTM) approach identified several chemotypes and structural motifs unique to the industrial dataset, highlighting the chemical classes for which currently available models may give less reliable predictions. Finally, the public and industrial data were merged into a global dataset of 3146 compounds. This is the largest RB dataset reported in the literature so far, and it covers some chemotypes absent from the public data. Consequently, the predictive model developed on the global dataset has a larger applicability domain than existing ones.
Keywords: QSAR/QSPR; benchmarking; environmental fate; generative topographic mapping (GTM); REACH; ready biodegradability
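The abstract reports model quality as Balanced Accuracy together with data coverage but does not define either. For a binary classifier, BA is conventionally the mean of sensitivity and specificity (scikit-learn's `balanced_accuracy_score` computes the same quantity). The minimal sketch below assumes labels 1 = readily biodegradable (RB) and 0 = not readily biodegradable. It also assumes that "data coverage" means the fraction of compounds receiving a prediction, i.e. those inside the model's applicability domain, with `None` marking out-of-domain compounds. The abstract does not state how the authors compute coverage, so this is an illustrative assumption. The toy data are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on RB = 1) and specificity (recall on NRB = 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def evaluate_with_coverage(
    y_true: Sequence[int], y_pred: Sequence[Optional[int]]
) -> Tuple[float, float]:
    """Return (BA on in-domain compounds, coverage).

    Assumption: a prediction of None marks a compound outside the model's
    applicability domain; coverage is the fraction of compounds predicted.
    """
    covered = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]
    coverage = len(covered) / len(y_true)
    ba = balanced_accuracy([t for t, _ in covered], [p for _, p in covered])
    return ba, coverage


# Hypothetical toy data: 10 compounds, two fall outside the applicability domain.
labels = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
preds = [1, 0, 1, 0, 1, 0, None, 0, 1, None]
ba, cov = evaluate_with_coverage(labels, preds)
print(f"BA = {ba:.2f}, coverage = {cov:.0%}")  # BA = 0.75, coverage = 80%
```

Computing BA only on the covered subset mirrors the usual practice of reporting performance inside the applicability domain. A model can therefore trade coverage for accuracy, which is why the two figures are reported together.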
Year: 2019 PMID: 31858821 DOI: 10.1080/1062936X.2019.1697360
Source DB: PubMed Journal: SAR QSAR Environ Res ISSN: 1026-776X Impact factor: 3.000