| Literature DB >> 20298518 |
Syed Asad Rahman1, Matthew Bashton, Gemma L Holliday, Rainer Schrader, Janet M Thornton.
Abstract
BACKGROUND: Finding one small molecule (query) in a large target library is a challenging task in computational chemistry. Although several heuristic approaches are available using fragment-based chemical similarity searches, they fail to identify exact atom-bond equivalence between the query and target molecules and thus cannot be applied to complex chemical similarity searches, such as searching a complete or partial metabolic pathway.In this paper we present a new Maximum Common Subgraph (MCS) tool: SMSD (Small Molecule Subgraph Detector) to overcome the issues with current heuristic approaches to small molecule similarity searches. The MCS search implemented in SMSD incorporates chemical knowledge (atom type match with bond sensitive and insensitive information) while searching molecular similarity. We also propose a novel method by which solutions obtained by each MCS run can be ranked using chemical filters such as stereochemistry, bond energy, etc.Entities:
Year: 2009 PMID: 20298518 PMCID: PMC2820491 DOI: 10.1186/1758-2946-1-12
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Figure 1The compatibility graph between Isobutane and Cyclopropane will generate a compatibility graph with 36 edges and 12 vertices. There are 18 c-edges (green dotted lines) and non c-edges (red lines). This will lead to 18 MCS solutions, each of size 3.
Figure 2Flowchart describing the post-filtering step in the SMSD algorithm.
Figure 3Flowchart outlining the methodology of the Maximum Common Subgraph (MCS) algorithm used by SMSD.
Figure 4Head to head comparison of MCS jobs processed by SMSD and CDK-MCS. The similarity score frequencies (left axis) between each pair (colored boxes) of MCS solutions derived by the CDKMCS and the SMSD algorithm were sorted into bins ranging from 0 to 1 increasing in 0.01 increments. The cumulative percentage of the overall dataset for each bin is also shown (curves and right axis). Data shown in blue correspond to results from the SMSD algorithm while those in mauve correspond to CDKMCS. Data shown in yellow correspond to all the jobs that ran successfully by SMSD (includes 24 jobs which failed to run by CDKMCS). It is clear from the graph that the reported frequency of the SMSD similarity is almost similar to the CDKMCS similarity between the molecules. However the overall similarity between SMSD and CDKMCS is different because SMSD was able to process higher number of jobs than the latter. A good cut-off Tanimoto similarity score for reporting significant matches seems to be above 0.77 (at 99.9 percentile of the curve) for MCS based searches (indicated by the rightmost set of dashed lines).
Figure 5The similarity score frequencies (left axis) between each pair (colored boxes) of MCS solutions derived by the CDK-Fingerprint and the SMSD algorithm were sorted into bins ranging from 0 to 1 increasing in 0.01 increments. The cumulative percentage of the overall dataset for each bin is also shown (curves and right axis). Data shown in blue correspond to results from the SMSD algorithm while those in lilac correspond to CDK-Fingerprint. It is clear from the graph that the reported frequency of the SMSD similarity is different from the fingerprint similarity between the molecules. A good cut-off Tanimoto similarity score for reporting significant matches seems to be above 0.77 (at 99.9 percentile of the curve) for Fingerprint based searches and the MCS based search (indicated by the rightmost set of dashed lines).