Susann Vorberg, Igor V Tetko.
Abstract
Biodegradability describes the capacity of substances to be mineralized by free-living bacteria. It is a crucial property in estimating a compound's long-term impact on the environment. The ability to reliably predict biodegradability would reduce the need for laborious experimental testing. However, this endpoint is difficult to model due to unavailability or inconsistency of experimental data. Our approach makes use of the Online Chemical Modeling Environment (OCHEM) and its rich supply of machine learning methods and descriptor sets to build classification models for ready biodegradability. These models were analyzed to determine the relationship between characteristic structural properties and biodegradation activity. The distinguishing feature of the developed models is their ability to estimate the accuracy of prediction for each individual compound. The models developed using seven individual descriptor sets were combined in a consensus model, which provided the highest accuracy. The identified overrepresented structural fragments can be used by chemists to improve the biodegradability of new chemical compounds. The consensus model, the datasets used, and the calculated structural fragments are publicly available at http://ochem.eu/article/31660.
Keywords: Outlier detection; Ready biodegradability; Structural and functional interpretation
Year: 2013 PMID: 27485201 PMCID: PMC5175213 DOI: 10.1002/minf.201300030
Source DB: PubMed Journal: Mol Inform ISSN: 1868-1743 Impact factor: 3.353
The publisher did not receive permission from the copyright owner to include this object in this version of this product. Please refer either to the publisher's own online version of this product or the printed product where one exists.
Figure 1. (a) Accuracy plot (y‐axis provides the ratio of correct predictions) for the WEKA model based on CDK descriptors with Bagging‐STD as distance to model. The 20 % of compounds with the lowest distance to the model, used for determination of outliers, are emphasized in the plot. (b) Venn diagram of the overlap of excluded compounds regarding 50 % of the lowest distance‐to‐model compounds (for the three best models of the whole dataset).
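The idea behind an accuracy plot of this kind can be illustrated with a minimal Python sketch. It is not the OCHEM or WEKA implementation: the function names `bagging_std` and `accuracy_by_dm_fraction`, the input layout, and the toy numbers are all assumptions made for illustration. The sketch takes the standard deviation of predictions across bagged ensemble members as the distance to model, sorts compounds by that distance, and reports the ratio of correct predictions within the lowest-distance fractions.

```python
from statistics import pstdev


def bagging_std(member_predictions):
    """Distance to model as the spread of bagged predictions (illustrative).

    member_predictions: one list per ensemble member, each holding a
    predicted probability for every compound (hypothetical layout).
    Returns one standard deviation per compound.
    """
    n_compounds = len(member_predictions[0])
    return [
        pstdev([member[i] for member in member_predictions])
        for i in range(n_compounds)
    ]


def accuracy_by_dm_fraction(dm, correct, fractions=(0.2, 0.5, 1.0)):
    """Ratio of correct predictions among the lowest-distance compounds.

    dm: distance-to-model value per compound.
    correct: True where the predicted class matched the experimental class.
    fractions: shares of the dataset, taken from the lowest distance upward.
    """
    order = sorted(range(len(dm)), key=lambda i: dm[i])
    result = {}
    for fraction in fractions:
        k = max(1, round(fraction * len(order)))
        subset = order[:k]
        result[fraction] = sum(1 for i in subset if correct[i]) / k
    return result


if __name__ == "__main__":
    # Toy values only; they are not taken from the paper.
    dm = [0.05, 0.40, 0.10, 0.30, 0.20]
    correct = [True, False, True, False, True]
    print(accuracy_by_dm_fraction(dm, correct, fractions=(0.2, 0.6, 1.0)))
```

A compound that is mispredicted despite sitting in the lowest-distance fraction is a natural candidate for closer inspection as a possible outlier, which is how the caption describes the use of the emphasized 20 %.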
Figure 2. Biodegradability classes and the confidence of prediction as calculated by the consensus model for four compounds with a hexahydrotriazine group. The predictions for compounds A and D have the lowest accuracy and are flagged as outside the applicability domain of the model.
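How a consensus prediction with an applicability-domain flag might be assembled can be sketched as follows. This is a hedged illustration, not the published model: it assumes the consensus is a plain average of the individual models' predicted probabilities and that disagreement between models (their standard deviation) serves as the distance to model. The function name `consensus_predict`, the 0.5 class threshold, and the 0.25 cutoff are hypothetical values chosen for the example.

```python
from statistics import mean, pstdev


def consensus_predict(model_probs, class_threshold=0.5, std_cutoff=0.25):
    """Combine per-model probabilities for one compound (illustrative).

    model_probs: predicted probability of ready biodegradability from each
    individual model, e.g. one per descriptor set (assumed input).
    class_threshold and std_cutoff are hypothetical, not from the paper.
    """
    avg = mean(model_probs)
    spread = pstdev(model_probs)  # disagreement between the models
    label = (
        "readily biodegradable"
        if avg >= class_threshold
        else "not readily biodegradable"
    )
    return {
        "label": label,
        "probability": avg,
        "consensus_std": spread,
        "inside_domain": spread <= std_cutoff,
    }


if __name__ == "__main__":
    # Seven models that agree: prediction is kept inside the domain.
    print(consensus_predict([0.9] * 7))
    # Seven models that disagree strongly: flagged as outside the domain.
    print(consensus_predict([0.1, 0.9, 0.1, 0.9, 0.1, 0.9, 0.1]))
```

Under these assumptions, a compound on which the individual models disagree strongly receives a low-confidence prediction and is flagged, in the same spirit as compounds A and D in the figure.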