Luna De Ferrari, John B O Mitchell.
Abstract
BACKGROUND: In this work we predict enzyme function at the level of chemical mechanism, providing a finer granularity of annotation than traditional Enzyme Commission (EC) classes. Hence we can predict not only whether a putative enzyme in a newly sequenced organism has the potential to perform a certain reaction, but how the reaction is performed, using which cofactors and with susceptibility to which drugs or inhibitors, details with important consequences for drug and enzyme design. Work that predicts enzyme catalytic activity based on 3D protein structure features limits the prediction of mechanism to proteins already having either a solved structure or a close relative suitable for homology modelling.
Year: 2014 PMID: 24885296 PMCID: PMC4229970 DOI: 10.1186/1471-2105-15-150
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Datasets statistics
| Instances (proteins) | Attributes | Class values (mechanisms) |
|---|---|---|
| 248 | 134 | 82 |
| 248 | 82 | 82 |
| 248 | 82 | 82 |
| 248 | 456 | 82 |
| 248 | 162 | 82 |
| 248 | 743 | 82 |
| 248 | 322 | 82 |
| 290 | 917 | 290 |
| 35,171 | 4,418 | 82 |
| 68,667 (226,213) | 4,825 | 0 |
The table presents the number of instances (proteins), attributes (signatures or sequence identity values) and class values (mechanisms) for each dataset used in this work. For the swissprot-non-EC set, the instance count is the number of proteins that need prediction (those sharing a signature with the mechanism set); the total number of instances is given in parentheses.
Figure 1. The sequence identity and Euclidean distance of enzymes with the same and different mechanism. The diagram presents, for every pair of proteins in the mechanism + negative datasets, the percentage of identity between the two proteins’ sequences and also the Euclidean distance between their signature sets (in the InterPro attribute space). Protein couples having the same MACiE mechanism are represented as circles, while those with different MACiE mechanisms as triangles. The colour scale is logarithmic increasing from blue (for one instance) to light blue (2-3 instances), green (4-9), yellow (70-100), orange (250) and red (up to 433 instances) and represents the number of protein couples having that sequence identity and Euclidean distance. The dashed grey line shown, with equation Euclidean distance = 7 × sequence identity, separates most same-mechanism couples (on its right) from an area dense with different-mechanism couples on its left.
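The two quantities plotted in Figure 1 can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes that each protein is represented as a set of InterPro signature identifiers (a binary attribute vector), that sequence identity is on the x axis, and that identity is expressed as a fraction between 0 and 1 when compared against the dashed line (the caption does not state the scale used in the line's equation). The function names and the example signature identifiers are hypothetical.

```python
import math

def euclidean_distance(sigs_a, sigs_b):
    """Euclidean distance between two binary signature vectors.

    With 0/1 attributes, the squared distance equals the number of
    signatures present in exactly one of the two proteins.
    """
    return math.sqrt(len(set(sigs_a) ^ set(sigs_b)))

def right_of_line(identity, distance, slope=7.0):
    """True if a protein pair falls on the high-identity side of the line
    distance = slope * identity (assumed to be the 'right' side in Figure 1).

    `identity` is assumed to be a fraction in [0, 1].
    """
    return distance < slope * identity

# Hypothetical signature sets for two proteins
protein_a = {"IPR000001", "IPR000002", "IPR000003"}
protein_b = {"IPR000001", "IPR000004"}

pair_distance = euclidean_distance(protein_a, protein_b)
likely_same_mechanism = right_of_line(0.9, pair_distance)
```

Under these assumptions, a pair with high sequence identity and few differing signatures lands to the right of the line, which is where the caption reports most same-mechanism couples.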
Figure 2. Predicting mechanism using InterPro and Catalytic Site Atlas attributes. A comparison of the predictive performance of various sets of attributes in a leave-one-out evaluation of the mechanism dataset. The x axis starts at 60% to better highlight the small differences between the top methods.
Figure 3. Predicting mechanism using Catalytic Site Atlas 3D attributes. A comparison of the predictive performance of various sets of sequence-based (2D) and structure-based (3D) Catalytic Site Atlas attributes in a leave-one-out evaluation of the mechanism dataset. The x axis starts at 60%.
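Leave-one-out evaluation, as used for Figures 2 and 3, holds out each protein in turn, trains on all the others, and checks the prediction for the held-out protein. The sketch below illustrates the procedure only. The 1-nearest-neighbour classifier, the function names and the toy data are assumptions made for illustration and are not taken from the paper.

```python
def nearest_neighbour_label(query, training):
    """Return the label of the training protein whose signature set
    differs from `query` in the fewest signatures (first match wins ties)."""
    best_label, best_diff = None, None
    for sigs, label in training:
        diff = len(query ^ sigs)  # size of the symmetric difference
        if best_diff is None or diff < best_diff:
            best_label, best_diff = label, diff
    return best_label

def leave_one_out_accuracy(dataset):
    """Fraction of proteins whose label is recovered when each one
    is held out in turn and predicted from the remaining proteins."""
    correct = 0
    for i, (sigs, label) in enumerate(dataset):
        training = dataset[:i] + dataset[i + 1:]
        if nearest_neighbour_label(sigs, training) == label:
            correct += 1
    return correct / len(dataset)

# Hypothetical toy data: (signature set, mechanism label)
toy_dataset = [
    ({"S1", "S2"}, "mech_A"),
    ({"S1", "S2", "S3"}, "mech_A"),
    ({"S4"}, "mech_B"),
    ({"S4", "S5"}, "mech_B"),
]
accuracy = leave_one_out_accuracy(toy_dataset)
```

A mechanism represented by a single protein can never be recovered in this setting, because holding out that protein removes the only training example of its class.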
Prediction statistics
| False positives | False negatives |
|---|---|
| 12 | 23 |
| 6 | 21 |
| 16 | n/a |
| 0 | n/a |
This table presents the number of false positive and false negative predictions for the training + testing evaluations (using InterPro attributes). A detailed list of the predictions is available in Additional file 6.