Kalai Vanii Jayaseelan, Pablo Moreno, Andreas Truszkowski, Peter Ertl, Christoph Steinbeck.
Abstract
BACKGROUND: Natural product-likeness of a molecule, i.e. the similarity of this molecule to the structure space covered by natural products, is a useful criterion in screening compound libraries and in designing new lead compounds. A closed-source implementation of a natural product-likeness score, which finds application in virtual screening, library design and compound selection, has previously been reported by one of us. In this note, we report an open-source and open-data re-implementation of this scoring system, illustrate its efficiency in ranking small molecules for natural product-likeness and discuss its potential applications.
Year: 2012 PMID: 22607271 PMCID: PMC3436723 DOI: 10.1186/1471-2105-13-106
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Molecule curation and atom signature generation workflow. This workflow takes compounds as input and performs curation and atom signature generation for every compound structure. A reader worker takes the input compounds in structure-data file (SDF) format; the input can be a single SDF file or a list of files. The number of compounds read and passed down the workflow in each iteration is set through an input port. As soon as the compounds are read, a worker tags every compound with a UUID, which makes it possible to keep track of each compound until the end of the scoring process. As the first curation step, a worker checks the connectedness of the atoms in each structure and removes counterions and other small disconnected fragments. Another worker removes linear and ring sugars from the structures. Finally, the structures are checked for elements other than non-metals, and any structure containing such elements is discarded. The curated molecules are then consumed by a further worker, which generates an atom signature for every atom in each structure. The generated atom signatures are written to a text file for re-use. At any step of the process, the curated and discarded structures can be written to an SDF file; in this workflow, the initially tagged compounds and the fully curated compounds are written to SDF files. This workflow is available for free download at http://www.myexperiment.org/workflows/2120.html.
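The curation steps of Figure 1 can be illustrated with a small, dependency-free sketch on a toy molecular graph. This is not the workflow's code and makes several assumptions: molecules are a hypothetical `Mol` class (element symbols plus a bond list), the connectivity check is assumed to keep the largest connected fragment, sugar removal is omitted, the `NON_METALS` set is our own choice, and `atom_signature` is a simplified height-limited neighbourhood string standing in for the canonical atom signatures the workflow actually generates.

```python
import uuid
from collections import deque
from dataclasses import dataclass, field

# Assumption: allowed (non-metal) elements; the workflow's exact list is not given here.
NON_METALS = {"H", "C", "N", "O", "P", "S", "Se", "F", "Cl", "Br", "I",
              "He", "Ne", "Ar", "Kr", "Xe", "Rn"}


@dataclass
class Mol:
    """Hypothetical minimal molecule: one element symbol per atom plus undirected bonds."""
    elements: list
    bonds: list
    props: dict = field(default_factory=dict)

    def neighbours(self):
        adj = [[] for _ in self.elements]
        for a, b in self.bonds:
            adj[a].append(b)
            adj[b].append(a)
        return adj


def largest_fragment(mol):
    """Keep only the largest connected component (drops counterions and small fragments)."""
    adj = mol.neighbours()
    seen = [False] * len(mol.elements)
    best = []
    for start in range(len(mol.elements)):
        if seen[start]:
            continue
        seen[start] = True
        component, queue = [], deque([start])
        while queue:  # breadth-first search over one component
            a = queue.popleft()
            component.append(a)
            for b in adj[a]:
                if not seen[b]:
                    seen[b] = True
                    queue.append(b)
        if len(component) > len(best):
            best = component
    keep = sorted(best)
    new_index = {old: new for new, old in enumerate(keep)}
    # A bond whose first atom is kept lies entirely inside the kept component.
    bonds = [(new_index[a], new_index[b]) for a, b in mol.bonds if a in new_index]
    return Mol([mol.elements[i] for i in keep], bonds, dict(mol.props))


def only_non_metals(mol):
    return all(e in NON_METALS for e in mol.elements)


def curate(mols):
    """Tag each molecule with a UUID, strip small fragments, split into kept/discarded."""
    curated, discarded = [], []
    for mol in mols:
        tagged = Mol(mol.elements, mol.bonds, {**mol.props, "uuid": str(uuid.uuid4())})
        fragment = largest_fragment(tagged)
        (curated if only_non_metals(fragment) else discarded).append(fragment)
    return curated, discarded


def atom_signature(mol, atom, height):
    """Simplified stand-in for an atom signature: a canonical string of the
    neighbourhood tree rooted at `atom`, up to `height` bonds away."""
    adj = mol.neighbours()

    def expand(a, parent, h):
        if h == 0:
            return mol.elements[a]
        children = sorted(expand(b, a, h - 1) for b in adj[a] if b != parent)
        return mol.elements[a] + ("(" + ",".join(children) + ")" if children else "")

    return "[" + expand(atom, -1, height) + "]"
```

For example, sodium acetate entered as acetate plus a disconnected Na atom is reduced to the four-atom acetate fragment and kept, whereas a structure in which carbon is bonded to iron is discarded. Because the UUID is stored on the molecule before curation, it survives every later step and can be used to join scores back to structures.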
Figure 2. NP-likeness scoring workflow. This workflow takes as input the atom signature files of the natural product, synthetic and query compound datasets. The scorer worker indexes the natural product and synthetic molecule signatures internally and generates NP-likeness scores for the query compounds based on the presence or absence of their atom signatures in the index. The higher the score, the higher the NP-likeness of the compound. The scores, each paired with the corresponding compound UUID, are written to a text file; a score's UUID can then be matched against the SDF output of the curation workflow (Figure 1) to retrieve the full structure. A further worker helps visualise the distribution of compound scores in a dataset. The scorer worker also rebuilds fragment structures from the atom signatures and assigns each fragment's score as a fragment property. These fragment structures are written to an SDF file, which is helpful for obtaining the structures of high-scoring fragments. An optional worker visualises the fragments rebuilt from the atom signatures. This workflow is available for free download at http://www.myexperiment.org/workflows/2121.html.
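The indexing and scoring step of Figure 2 can be sketched as follows. The caption does not give the scoring formula, so this is a hedged illustration under our own assumptions: each atom signature contributes the log-ratio of its frequency among natural products to its frequency among synthetic molecules, a molecule's score is the sum of these contributions divided by its number of atoms (one signature per atom), signatures absent from both training sets contribute zero, and add-one smoothing on the per-signature counts avoids division by zero. The function names, the natural-log base and the toy data are hypothetical.

```python
import math
from collections import Counter


def build_index(molecule_signatures):
    """Count, for each signature, how many molecules contain it (once per molecule)."""
    counts = Counter()
    for sigs in molecule_signatures:
        counts.update(set(sigs))
    return counts, len(molecule_signatures)


def fragment_score(sig, np_index, sm_index):
    """Assumed log-ratio contribution of one signature, with add-one smoothing."""
    np_counts, np_total = np_index
    sm_counts, sm_total = sm_index
    np_f, sm_f = np_counts[sig], sm_counts[sig]  # Counter returns 0 for unseen keys
    if np_f == 0 and sm_f == 0:
        return 0.0  # signature absent from the index: treated as neutral
    return math.log(((np_f + 1) / (sm_f + 1)) * (sm_total / np_total))


def np_likeness(query_sigs, np_index, sm_index):
    """Sum of per-atom signature contributions, normalised by the atom count."""
    if not query_sigs:
        return 0.0
    total = sum(fragment_score(s, np_index, sm_index) for s in query_sigs)
    return total / len(query_sigs)
```

Usage: with `np_index = build_index([["a", "b"], ["a", "c"]])` and `sm_index = build_index([["b", "d"], ["d"]])`, a query made only of signature `"a"` scores positive (more natural product-like), one made only of `"d"` scores negative, and `"b"`, seen equally often in both sets, contributes zero. Normalising by atom count keeps scores of small and large molecules on a comparable scale.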
Figure 3. Distribution of NP-likeness scores for the training (synthetic molecules and natural products) and test datasets. The synthetic molecules are a subset of the clean lead-like collection from the ZINC database, and the natural products are small molecules from the ChEMBL database that are referenced to the Journal of Natural Products. The more positive the score, the higher the NP-likeness, and vice versa.