Sebastian Fritsch, Stefan Neumann, Jonas Schaub, Christoph Steinbeck, Achim Zielesny
Abstract
The Ertl algorithm for the automated detection and extraction of functional groups (FGs) in organic molecules is implemented on the basis of the Chemistry Development Kit (CDK). An FG analysis of the ChEMBL database compounds demonstrates that the chosen CDK aromaticity model has a distinct impact on the result. With an average processing time of less than a millisecond per molecule, FG extraction is fast enough to process even large compound databases.
Keywords: Aromaticity; CDK; Chemistry Development Kit; Cycle finder; Electron donation; Functional group
Year: 2019 PMID: 31165338 PMCID: PMC6549326 DOI: 10.1186/s13321-019-0361-8
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
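The core of the Ertl algorithm described in the abstract is a marking step followed by a merging step. The Python sketch below implements a simplified subset of Ertl's published marking rules as an illustration. It marks all heteroatoms, marks both atoms of every non-aromatic double or triple bond, and then merges bonded marked atoms into one group. The sketch is an assumption-laden stand-in, not the ErtlFunctionalGroupsFinder API. It omits the acetal-carbon and three-membered-heterocycle rules and the step that adds each group's environment. The bond-tuple encoding and all function names are hypothetical. Only non-aromatic multiple bonds trigger marking, so the aromatic flags (the output of whichever aromaticity model was applied) directly change which groups come out.

```python
from collections import defaultdict

# Hypothetical molecule encoding for this sketch (not the CDK data model):
#   elements: list of element symbols, implicit hydrogens omitted
#   bonds:    list of (i, j, order, aromatic) tuples; the aromatic flag is
#             assumed to have been set beforehand by an aromaticity model


def mark_atoms(elements, bonds):
    """Mark atoms using a subset of Ertl's rules (no acetal or 3-ring rules)."""
    # Rule: every heteroatom (anything other than C or H) is marked.
    marked = {i for i, el in enumerate(elements) if el not in ("C", "H")}
    for i, j, order, aromatic in bonds:
        # Rule: both ends of a non-aromatic double or triple bond are marked.
        # This covers C=C, C#C and carbons multiply bonded to a heteroatom
        # (the heteroatom end is already marked by the first rule).
        if order >= 2 and not aromatic:
            marked.update((i, j))
    return marked


def group_marked(n_atoms, bonds, marked):
    """Merge directly bonded marked atoms into groups via union-find."""
    parent = list(range(n_atoms))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, _order, _aromatic in bonds:
        if i in marked and j in marked:
            parent[find(i)] = find(j)

    groups = defaultdict(list)
    for atom in sorted(marked):
        groups[find(atom)].append(atom)
    return sorted(groups.values())


def find_groups(elements, bonds):
    """Return marked-atom groups (FG cores, without their environment)."""
    return group_marked(len(elements), bonds, mark_atoms(elements, bonds))


def phenol(aromatic):
    """Phenol as a Kekule ring plus OH; ring bonds flagged aromatic or not."""
    elements = ["C"] * 6 + ["O"]
    ring = [(k, (k + 1) % 6, 2 if k % 2 == 0 else 1, aromatic) for k in range(6)]
    return elements, ring + [(0, 6, 1, False)]


print(find_groups(*phenol(aromatic=True)))   # [[6]]
print(find_groups(*phenol(aromatic=False)))  # [[0, 1, 2, 3, 4, 5, 6]]
```

The two phenol calls mirror the effect shown in Fig. 1 in miniature. When the ring bonds are flagged as aromatic, only the hydroxyl oxygen forms a group. When the same ring is not perceived as aromatic, its alternating double bonds mark every ring carbon, and the ring merges with the oxygen into one large group.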
Fig. 1 Influence of the different CDK electron donation types on FG detection. Identified FGs are highlighted by a colored background; the same cycle finder algorithm, Cycles.all, is used for all electron donation types. Left: the daylight electron donation type assigns a fully aromatic ring structure with the corresponding FGs. Right: the electron donation types cdk, cdkAllowingExocyclic and piBonds assign an aromatic benzene ring plus an annulated aliphatic ring, which results in only one larger FG on the right (highlighted in pink) instead of the three FGs (highlighted in red and orange) obtained with the daylight electron donation type
Fig. 2 Performance snapshot of ErtlFunctionalGroupsFinder for FG extraction from 1.8 million ChEMBL compounds as a function of the number of parallel processing threads
Fig. 3 Frequencies of the twenty most frequent FGs in 1.8 million ChEMBL compounds for different electron donation types with the cycle finder Cycles.all
Fig. 4 Frequencies of the twenty most frequent FGs in 1.8 million ChEMBL compounds for different cycle finder algorithms with the daylight electron donation model
Fig. 5 Comparison of FG detection between ErtlFunctionalGroupsFinder (blue bars) and IFG RDKit (green bars) for different aromaticity/electron donation models. RDKit aromaticity model labels are abbreviated, e.g. IFG AROMATICITY_RDKIT to IFG RDKIT. For details, see the main text
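Figs. 3 and 4 rank the twenty most frequent FGs for each aromaticity configuration. The Python sketch below shows one way to produce a tally like that. It assumes that each molecule's extracted FGs are already available as canonical strings. The function name and the toy labels are hypothetical. The captions do not say whether molecules containing an FG or all FG occurrences are counted, so the sketch leaves that choice to a parameter.

```python
from collections import Counter


def top_fg_frequencies(fgs_per_molecule, k=20, count_molecules=True):
    """Return the k most frequent FG labels with their counts.

    fgs_per_molecule: iterable of lists of canonical FG strings, one list per
    molecule. If count_molecules is True, an FG found several times in one
    molecule is counted once for that molecule; otherwise every occurrence
    is counted.
    """
    counts = Counter()
    for fgs in fgs_per_molecule:
        counts.update(set(fgs) if count_molecules else fgs)
    # most_common sorts by descending count; ties keep first-seen order.
    return counts.most_common(k)


# Hypothetical per-molecule FG labels, for illustration only.
toy = [["hydroxyl", "carbonyl"], ["carbonyl", "carbonyl"], ["amine"]]
print(top_fg_frequencies(toy, k=20))
print(top_fg_frequencies(toy, k=20, count_molecules=False))
```

Running the tally once per electron donation type or cycle finder gives one ranking per configuration. The rankings can then be compared side by side.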