Kirsten E Beattie, Luna De Ferrari, John B O Mitchell.
Abstract
First, we identify InterPro sequence signatures representing evolutionary relatedness and, second, signatures identifying specific chemical machinery. Thus, we predict the chemical mechanisms of enzyme-catalyzed reactions from catalytic and non-catalytic subsets of InterPro signatures. We first scanned our 249 sequences using InterProScan and then used the MACiE database to identify the amino acid residues that are important for catalysis. The sequences were mutated in silico to replace these catalytic residues with glycine and were then scanned again using InterProScan. Signature matches from the original scan that disappeared on mutation were called catalytic. Mechanism was predicted using all signatures, only the 78 "catalytic" signatures, or only the 519 "non-catalytic" signatures. The non-catalytic signatures gave results indistinguishable from those for the whole feature set, with precision of 0.991 and sensitivity of 0.970. The catalytic signatures alone were less predictive, with precision and sensitivity of 0.791 and 0.735, respectively. These results show that our successful prediction of enzyme mechanism relies mostly on homology rather than on identifying catalytic machinery.
Keywords: InterPro; active site; enzyme catalysis; evolution; homology; reaction mechanism; sequence signatures
Year: 2015 PMID: 26740739 PMCID: PMC4696837 DOI: 10.4137/EBO.S31482
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
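The in silico mutation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy sequence, and the placeholder signature identifiers (`SIG_A` and so on) are all hypothetical, and the InterProScan runs themselves are not performed here; their results are simply passed in as sets. Treating signatures that survive the mutation as "non-catalytic" is also an assumption inferred from the abstract.

```python
from typing import Set, Tuple


def mutate_to_glycine(sequence: str, catalytic_positions: Set[int]) -> str:
    """Return a copy of `sequence` with the given 1-based positions replaced by glycine (G)."""
    residues = list(sequence)
    for pos in catalytic_positions:
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position {pos} is outside a sequence of length {len(residues)}")
        residues[pos - 1] = "G"
    return "".join(residues)


def split_signatures(original_hits: Set[str], mutant_hits: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Split the original scan's signatures into (catalytic, non_catalytic).

    Catalytic: matched the original sequence but disappeared after mutation.
    Non-catalytic (assumed definition): matched both original and mutant.
    """
    catalytic = original_hits - mutant_hits
    non_catalytic = original_hits & mutant_hits
    return catalytic, non_catalytic


# Toy data, invented for illustration only.
sequence = "MKHSDAE"
mutant = mutate_to_glycine(sequence, {3, 5})  # residues H3 and D5 become G

# Hypothetical scan results for the original and mutated sequences.
original_hits = {"SIG_A", "SIG_B", "SIG_C"}
mutant_hits = {"SIG_A", "SIG_C"}

catalytic, non_catalytic = split_signatures(original_hits, mutant_hits)
print(mutant, sorted(catalytic), sorted(non_catalytic))
```

In this toy case only `SIG_B` depends on the mutated residues, so it alone would be labeled catalytic.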
The numbers of catalytic and non-catalytic signatures, both in the raw data and in the refined set suitable for machine learning. The refined set had to contain at least two instances of each mechanism to permit training and testing, so all singleton mechanisms were removed during refinement. The total number of signatures is 556, which is less than the sum of 78 and 519 (597) because some signatures are catalytic in one sequence and non-catalytic in another.
| DATASET | RAW TOTAL SIGNATURES | SIGNATURES FOR ML |
|---|---|---|
| Catalytic | 300 | 78 |
| Non-catalytic | 1860 | 519 |
| Total | 2160 | 556 |
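The reason the totals are smaller than the column sums can be made concrete with a short sketch. It assumes that labels are assigned per sequence and then pooled across sequences, and that the total counts every signature appearing in at least one of the two pooled sets; the function name and toy data are hypothetical.

```python
from typing import Iterable, Set, Tuple


def pool_signature_labels(
    per_sequence: Iterable[Tuple[Set[str], Set[str]]]
) -> Tuple[Set[str], Set[str]]:
    """Pool per-sequence (catalytic, non_catalytic) sets across all sequences."""
    catalytic_all: Set[str] = set()
    non_catalytic_all: Set[str] = set()
    for catalytic, non_catalytic in per_sequence:
        catalytic_all |= catalytic
        non_catalytic_all |= non_catalytic
    return catalytic_all, non_catalytic_all


# Toy data: SIG_B is catalytic in the first sequence but non-catalytic in the second.
toy = [
    ({"SIG_B"}, {"SIG_A"}),
    (set(), {"SIG_B", "SIG_C"}),
]
cat, non_cat = pool_signature_labels(toy)
both = cat & non_cat   # signatures carrying both labels
total = cat | non_cat  # distinct signatures overall

# Inclusion-exclusion applied to the ML column of the table:
# |total| = |catalytic| + |non-catalytic| - |both|
n_both = 78 + 519 - 556
print(sorted(both), len(total), n_both)
```

Under these assumptions, the table's ML column implies that 41 signatures carry both labels.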
Figure 1. Illustration of the data, attributes, and labels used in this work. Sequences are represented by sets of InterPro signatures, together with their associated MACiE mechanism labels. Also shown is closely related information, such as PDB codes, EC numbers, and domain names, that can easily be associated with our data.
Micro-averaged precision and sensitivity for catalytic and non-catalytic signatures.
| SIGNATURES | PRECISION | SENSITIVITY | TP | FP | FN | ATTRIBUTES |
|---|---|---|---|---|---|---|
| Catalytic | 0.791 | 0.735 | 125 | 33 | 45 | 78 |
| Non-catalytic | 0.991 | 0.970 | 228 | 2 | 7 | 519 |
| All 556 from study | 0.991 | 0.970 | 228 | 2 | 7 | 556 |
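The precision and sensitivity columns follow directly from the TP, FP, and FN counts. The sketch below recomputes them, assuming the usual definitions (precision = TP / (TP + FP), sensitivity = TP / (TP + FN)) and assuming that micro-averaging means pooling the counts over all mechanism classes before dividing. The function name is illustrative, and the table's counts are treated as already pooled.

```python
from typing import Iterable, Tuple


def micro_average(counts: Iterable[Tuple[int, int, int]]) -> Tuple[float, float]:
    """Pool (TP, FP, FN) triples over classes, then return (precision, sensitivity)."""
    tp = fp = fn = 0
    for class_tp, class_fp, class_fn in counts:
        tp += class_tp
        fp += class_fp
        fn += class_fn
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return precision, sensitivity


# Pooled (TP, FP, FN) counts taken from the table above.
table_counts = {
    "Catalytic": (125, 33, 45),
    "Non-catalytic": (228, 2, 7),
    "All 556 from study": (228, 2, 7),
}

results = {name: micro_average([triple]) for name, triple in table_counts.items()}
for name, (precision, sensitivity) in results.items():
    print(f"{name}: precision={precision:.3f}, sensitivity={sensitivity:.3f}")
```

Rounded to three decimals, the recomputed values agree with the table: 0.791 and 0.735 for the catalytic set, and 0.991 and 0.970 for the other two rows.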
Figure 2. Classification performance of catalytic and non-catalytic signatures. Shown are the micro-averaged precision and sensitivity achieved using the catalytic and non-catalytic sets, along with the proportions of our InterPro signatures belonging to each group.