Hamse Y Mussa, Luna De Ferrari, John B O Mitchell.
Abstract
BACKGROUND: We recently reported that one may be able to predict with high accuracy the chemical mechanism of an enzyme by employing a simple pattern recognition approach: a k Nearest Neighbour rule with k = 1 (k1NN) and 321 InterPro sequence signatures as enzyme features. The nearest-neighbour rule is known to be highly sensitive to errors in the training data, in particular when the available training dataset is small. This was the case in our previous study, in which our dataset comprised 248 enzymes annotated against 71 enzymatic mechanism labels from the MACiE database. In the current study, we have carefully re-analysed our dataset and prediction results to "explain" why a high variance k1NN rule exhibited such remarkable classification performance.
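The k1NN rule described in the abstract can be illustrated with a minimal sketch. It assumes that each enzyme is a binary vector (signature present or absent), that distance is the Hamming distance, and that performance is estimated by leave-one-out; the abstract does not state the metric or the evaluation protocol, so both are assumptions made for illustration only. The toy data and the labels `M_A` and `M_B` are hypothetical and are not taken from MACiE.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two binary feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_1nn(query: Sequence[int],
                train_X: List[Sequence[int]],
                train_y: List[str]) -> str:
    """Return the label of the single closest training example (k = 1)."""
    nearest = min(range(len(train_X)), key=lambda i: hamming(query, train_X[i]))
    return train_y[nearest]


def leave_one_out_accuracy(X: List[Sequence[int]], y: List[str]) -> float:
    """Hold out each example in turn and classify it from all the others."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if predict_1nn(X[i], rest_X, rest_y) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical toy data: 4 enzymes, 6 signatures, 2 mechanism labels.
X = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
]
y = ["M_A", "M_A", "M_B", "M_B"]

loo_accuracy = leave_one_out_accuracy(X, y)
```

In this sketch every held-out enzyme has a same-label neighbour that is closer than any enzyme of the other label, so the k = 1 rule makes no leave-one-out errors on the toy data. A single mislabelled training example would directly change the prediction for any query nearest to it, which is the sensitivity the abstract refers to.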
Year: 2015 PMID: 26634450 PMCID: PMC4669639 DOI: 10.1186/s13104-015-1730-7
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Fig. 1 Heatmap of our data matrix. The horizontal axis denotes InterPro signatures, whereas the vertical axis represents the enzyme sequence’s UniProt accession number and the corresponding MACiE enzymatic mechanism labels of the form M0123. The yellow colour signifies that feature x (InterPro signature) is present for the enzyme, while the red colour indicates that x is absent for this enzyme. Enzymes that possess the same MACiE mechanism label reside in a subspace of the feature space χ, which barely overlaps with other subspaces associated with other mechanisms. The inset depicts the heatmap for the dataset matrix corresponding to the InterPro signatures and names of enzymes with the MACiE enzymatic mechanism label M0218.
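One way to make the "barely overlapping subspaces" observation in the caption concrete is to collect, for each mechanism label, the set of signatures present in at least one of its enzymes, and then count the signatures shared between each pair of labels. This is a hedged sketch under that particular reading of "overlap", not the authors' analysis; the function names, the toy matrix and the labels `M_A`, `M_B` and `M_C` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple


def class_supports(X: List[Sequence[int]], y: List[str]) -> Dict[str, Set[int]]:
    """Map each label to the indices of features present in any of its rows."""
    supports: Dict[str, Set[int]] = {}
    for row, label in zip(X, y):
        present = {j for j, value in enumerate(row) if value}
        supports.setdefault(label, set()).update(present)
    return supports


def pairwise_overlap(supports: Dict[str, Set[int]]) -> Dict[Tuple[str, str], int]:
    """Count the features shared by each pair of labels."""
    return {
        (a, b): len(supports[a] & supports[b])
        for a, b in combinations(sorted(supports), 2)
    }


# Hypothetical presence/absence matrix: rows are enzymes, columns are signatures.
X = [
    [1, 1, 0, 0, 0, 0],  # M_A
    [1, 1, 1, 0, 0, 0],  # M_A
    [0, 0, 0, 1, 1, 0],  # M_B
    [0, 0, 0, 1, 1, 1],  # M_B
    [0, 0, 1, 0, 0, 0],  # M_C, shares feature 2 with M_A
]
y = ["M_A", "M_A", "M_B", "M_B", "M_C"]

supports = class_supports(X, y)
overlap = pairwise_overlap(supports)
```

Pairs with an overlap of zero occupy disjoint sets of signatures; when most pairs look like this, a nearest-neighbour rule has little opportunity to confuse one mechanism with another.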
Enzymatic MACiE mechanism labels w and the number of enzymes reported to possess this mechanism w
| MACiE mechanism label | Number of enzymes |
|---|---|
| M0346 | 3 |
| M0070 | 3 |
| M0118 | 2 |
| M0206 | 3 |
| M0034 | 3 |
| M0033 | 2 |
| M0235 | 3 |
| M0051 | 5 |
| M0312 | 4 |
| M0069 | 2 |
| M0050 | 4 |
| M0123 | 2 |
| M0248 | 3 |
| M0202 | 2 |
| M0007 | 5 |
| M0171 | 3 |
| M0255 | 2 |
| M0336 | 2 |
| M0117 | 2 |
| M0006 | 4 |
| M0131 | 2 |
| M0212 | 6 |
| M0017 | 3 |
| M0326 | 7 |
| M0218 | 12 |
| M0078 | 3 |
| M0314 | 4 |
| M0324 | 13 |
| M0175 | 4 |
| | 2 |
| M0045 | 5 |
| M0003 | 3 |
| M0147 | 7 |
| M0121 | 2 |
| | 2 |
| M0253 | 3 |
| M0026 | 3 |
| M0188 | 5 |
| M0130 | 4 |
| M0159 | 2 |
| M0213 | 4 |
| M0249 | 2 |
| M0055 | 3 |
| M0272 | 2 |
| M0122 | 2 |
| M0060 | 2 |
| M0148 | 2 |
| M0303 | 2 |
| M0029 | 2 |
| M0071 | 3 |
| M0099 | 6 |
| M0126 | 6 |
| M0262 | 2 |
| M0177 | 14 |
| M0013 | 4 |
| M0021 | 2 |
| M0015 | 2 |
| M0228 | 6 |
| M0058 | 2 |
| M0211 | 2 |
| M0309 | 2 |
| M0154 | 2 |
| M0244 | 2 |
| M0209 | 2 |
| M0270 | 3 |
| M0063 | 4 |
| M0328 | 2 |
| M0039 | 2 |
| M0252 | 3 |
| M0036 | 2 |
| M0080 | 2 |
Columns 1 and 2 denote enzymatic MACiE labels in our dataset and the number of enzymes reported for each, respectively. M0123, or similar, denotes the enzymatic mechanism’s label in the MACiE database. The two mechanism labels shown in italics are discussed in the main text