Tadi Venkata Sivakumar1, Anirban Bhaduri1, Rajasekhara Reddy Duvvuru Muni1, Jin Hwan Park2, Tae Yong Kim3.
Abstract
BACKGROUND: Computation of reaction similarity is a prerequisite for several bioinformatics applications, including enzyme identification for specific biochemical reactions, enzyme classification, and mining for specific inhibitors. Reaction similarity is often assessed at one of two levels: (i) reaction-level similarity, a comparison across all the constituent substrates and products of a reaction; and (ii) transformation-level similarity, a comparison at the transformation center with various degrees of neighborhood. Existing reaction similarity computation tools are designed for specific applications and use different features and similarity measures. A single system integrating these diverse features enables comparison of the impact of different molecular properties on similarity score computation.
Keywords: Fingerprint; Reaction similarity; Similarity measures; Transformation similarity
Year: 2018 PMID: 29969981 PMCID: PMC6029250 DOI: 10.1186/s12859-018-2248-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Overview of the SimCAL system and list of available features
List of four fingerprints available in SimCAL
| S. No. | Name | Description |
|---|---|---|
| 1. | Circular Fingerprint | Circular fingerprint is based on CDK’s [ |
| 2. | Extended Fingerprint | Functionally equivalent to ExtendedFingerprinter of CDK [ |
| 3. | Substructure Fingerprint | This is a structural key type fingerprint which considers assessment of 307 different substructures and is based on KlekotaRothFingerprinter [ |
| 4. | Enhanced Fingerprint | An in-house improved version of the extended fingerprint that accounts for stereochemistry and charges on molecules. |
List of binary similarity measures included in SimCAL
| S. No. | Measure | Definition | Range |
|---|---|---|---|
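| 1. | Tanimoto | $\frac{a}{a+b+c}$ | [0, 1] |
| 2. | Dice | $\frac{2a}{2a+b+c}$ | [0, 1] |
| 3. | Ochiai | $\frac{a}{\sqrt{(a+b)(a+c)}}$ | [0, 1] |
| 4. | Simpson | $\frac{a}{\min(a+b,\ a+c)}$ | [0, 1] |
| 5. | Russell and Rao | $\frac{a}{n}$ | [0, 1] |
| 6. | Sokal and Michener | $\frac{a+d}{n}$ | [0, 1] |
| 7. | Faith | $\frac{a+0.5d}{n}$ | [0, 1] |
| 8. | Gower and Legendre | $\frac{a+d}{a+0.5(b+c)+d}$ | [0, 1] |
| 9. | Rogers and Tanimoto | $\frac{a+d}{a+2(b+c)+d}$ | [0, 1] |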
The measures correspond to those in [45]. a is the number of bits set in the fingerprints of both molecules; b is the number of bits set in the fingerprint of the first molecule but not the second; c is the number of bits set in the fingerprint of the second molecule but not the first; d is the number of bits unset in both fingerprints. The size of the fingerprint is n = a + b + c + d.
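The counts a, b, c and d can be turned into scores in a few lines. The following is a minimal Python sketch, not SimCAL's own code: it uses the commonly cited textbook forms of the nine measures, which are assumed here to match the definitions SimCAL uses. The function names `bit_counts` and `similarity_measures` are hypothetical, and a zero denominator is mapped to 0.0 as an arbitrary convention.

```python
from math import sqrt


def bit_counts(fp1, fp2):
    """Return (a, b, c, d) for two equal-length binary fingerprints."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    a = sum(1 for x, y in zip(fp1, fp2) if x and y)        # set in both
    b = sum(1 for x, y in zip(fp1, fp2) if x and not y)    # set in first only
    c = sum(1 for x, y in zip(fp1, fp2) if not x and y)    # set in second only
    d = len(fp1) - a - b - c                               # unset in both
    return a, b, c, d


def _ratio(num, den):
    # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    return num / den if den else 0.0


def similarity_measures(fp1, fp2):
    """Compute nine binary similarity measures (textbook forms, assumed)."""
    a, b, c, d = bit_counts(fp1, fp2)
    n = a + b + c + d
    return {
        "tanimoto": _ratio(a, a + b + c),
        "dice": _ratio(2 * a, 2 * a + b + c),
        "ochiai": _ratio(a, sqrt((a + b) * (a + c))),
        "simpson": _ratio(a, min(a + b, a + c)),
        "russell_rao": _ratio(a, n),
        "sokal_michener": _ratio(a + d, n),
        "faith": _ratio(a + 0.5 * d, n),
        "gower_legendre": _ratio(a + d, a + 0.5 * (b + c) + d),
        "rogers_tanimoto": _ratio(a + d, a + 2 * (b + c) + d),
    }


if __name__ == "__main__":
    # Hypothetical 8-bit fingerprints, for illustration only.
    first = [1, 1, 0, 0, 1, 0, 0, 0]
    second = [1, 0, 1, 0, 1, 0, 0, 0]
    for name, value in similarity_measures(first, second).items():
        print(f"{name}: {value:.3f}")
```

For these two illustrative fingerprints, a = 2, b = 1, c = 1 and d = 4. Measures that include d (for example Sokal and Michener) reward shared unset bits, whereas Tanimoto and Dice ignore them, which is one reason the same pair of molecules can score differently under different measures.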
Fig. 2 Exemplary computation of transformation based similarity
Fig. 3 Schematic processing of the fingerprint based similarity computation
Fig. 4 Receiver operating characteristic (ROC) curves for the various approaches. a Dependence of the accuracy of predicting similar and non-similar reactions on the cutoff (threshold) for the various approaches. b Dependence of the precision of predicting similar and non-similar reactions on the cutoff (threshold) for the various approaches. c Dependence of the recall (true positive rate) of predicting similar and non-similar reactions on the cutoff (threshold) for the various approaches
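The quantities plotted in the three panels can be illustrated with a short Python sketch. It assumes that a reaction pair is predicted "similar" when its similarity score is at or above the cutoff; that convention, the function name `metrics_at_cutoff`, and the sample scores and labels are all hypothetical and are not taken from the paper's evaluation.

```python
def metrics_at_cutoff(scores, labels, cutoff):
    """Accuracy, precision and recall when score >= cutoff means 'similar'."""
    tp = fp = tn = fn = 0
    for score, is_similar in zip(scores, labels):
        predicted_similar = score >= cutoff
        if predicted_similar and is_similar:
            tp += 1
        elif predicted_similar and not is_similar:
            fp += 1
        elif not predicted_similar and is_similar:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Made-up similarity scores and ground-truth labels, for illustration only.
    scores = [0.9, 0.8, 0.6, 0.4, 0.2]
    labels = [True, True, False, True, False]
    for cutoff in (0.3, 0.5, 0.7):
        print(cutoff, metrics_at_cutoff(scores, labels, cutoff))
```

Sweeping the cutoff over the score range and plotting each metric against it gives curves of the kind described in panels a to c.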
Fig. 5 Correlation matrix across the various approaches within SimCAL, three approaches of EC-BLAST, and the molecular signature based chemical similarity method