Yoshihiro Yamanishi, Masahiro Hattori, Masaaki Kotera, Susumu Goto, Minoru Kanehisa.
Abstract
MOTIVATION: The IUBMB's Enzyme Nomenclature system, commonly known as the Enzyme Commission (EC) numbers, plays a key role in classifying enzymatic reactions and in linking enzyme genes or proteins to reactions in metabolic pathways. Numerous reactions are known to occur in various pathways but have no official EC number, and most of them are unlikely ever to receive one because of the lack of published articles on enzyme assays.
Year: 2009 PMID: 19477985 PMCID: PMC2687977 DOI: 10.1093/bioinformatics/btp223
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. The alignment of reactant pairs and the definition of RDMs in an enzyme-catalyzed reaction (R00750 in KEGG). (a) The overall reaction, in which 4-hydroxy-2-oxopentanoate (C03589) is converted to pyruvate (C00022) and acetaldehyde (C00084) by lyases (aldehyde-lyases or oxo-acid-lyases: EC 4.1.2.- or 4.1.3.-). (b) The reaction is decomposed into two reactant pairs: reactant pair I (RP01083) containing pyruvate and reactant pair II (RP01084) containing acetaldehyde. The matched substructures obtained by SIMCOMP alignments are shown by dotted boxes for both pairs. Each structure is labeled with the KEGG atom types in order to reflect the environmental features of each atom, such as adjacent atoms and single, double, triple and aromatic bonds. The R, D and M atoms are colored red, blue and yellow, respectively. The matched substructure, excluding the R and M atoms, is colored green. (c) The RDM patterns extracted from the two reactant pairs, where asterisks indicate hydrogen atoms. An RDM pattern is a set of KEGG atom type changes; for reactant pair I these are C1b-C1a at the R atom, C1c-* at the D atoms and C5a-C5a at the M atoms.
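
As a minimal sketch of how such a pattern could be handled in code, the snippet below splits an RDM pattern into its R, D and M components and derives the partial keys (R:D, D:M, R:M, R, D, M) that appear in the tables of this record. The colon-separated string form and the names `RDM`, `parse_rdm` and `partial_keys` are assumptions made for illustration; they are not taken from the paper or from a KEGG API.

```python
from typing import Dict, NamedTuple


class RDM(NamedTuple):
    """One RDM pattern: atom type changes at the R, D and M atoms."""
    r: str
    d: str
    m: str


def parse_rdm(pattern: str) -> RDM:
    """Split an assumed 'R:D:M' string into its three components."""
    parts = pattern.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected three colon-separated parts, got {pattern!r}")
    return RDM(*parts)


def partial_keys(p: RDM) -> Dict[str, str]:
    """Return the full pattern and every partial combination, keyed by layer name."""
    return {
        "R:D:M": f"{p.r}:{p.d}:{p.m}",
        "R:D": f"{p.r}:{p.d}",
        "D:M": f"{p.d}:{p.m}",
        "R:M": f"{p.r}:{p.m}",
        "R": p.r,
        "D": p.d,
        "M": p.m,
    }


# Reactant pair I from the caption above.
pair_one = parse_rdm("C1b-C1a:C1c-*:C5a-C5a")
keys = partial_keys(pair_one)
print(keys["R:M"])  # C1b-C1a:C5a-C5a
```

Keying the partial patterns by layer name keeps an `R:D` key from being confused with an `R:M` key that happens to have the same string form.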
Statistics of the RDM patterns in the RPAIR database (as of June 2008)
| Statistics | Single mode | Multiple mode |
|---|---|---|
| Number of reactant pairs | 5327 | - |
| Number of reactions | - | 5669 |
| Number of unique ‘R:D:M’ types | 1877 | 2301 |
| Number of unique ‘R:D’ types | 1103 | 1443 |
| Number of unique ‘R:M’ types | 1376 | 2071 |
| Number of unique ‘D:M’ types | 1805 | 2272 |
| Number of unique ‘R’ atom types | 607 | 1078 |
| Number of unique ‘D’ atom types | 727 | 1031 |
| Number of unique ‘M’ atom types | 1131 | 1836 |
A total of 5327 reactant pairs were assigned from 5669 reactions involving 4302 compounds. The numbers of unique RDM patterns are shown here for all possible combinations of the different atom types: ‘R’, ‘D’ and ‘M’.
Fig. 2. An illustration of the reaction pattern profile for each RDM pattern and the computation of the reaction similarity.
Fig. 3. An illustration of the sequentially conducted partial matching procedure. In the process of prediction with each RDM type, the reaction similarity for each RDM type is evaluated in the weighted major voting.
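
The sequential procedure can be sketched roughly as follows. The layer order follows the per-layer table in this record (R:D:M, then R:D, D:M, R:M, R, D, M). Everything else is an assumption for illustration: the `similarity` function is a stand-in (the fraction of matching components), not the reaction similarity defined in the paper, and the template patterns and EC labels are toy data.

```python
from collections import defaultdict

# Layer order as listed in the per-layer performance table.
LAYERS = ["R:D:M", "R:D", "D:M", "R:M", "R", "D", "M"]


def layer_key(rdm, layer):
    """Project an (R, D, M) tuple onto the components named by the layer."""
    parts = dict(zip("RDM", rdm))
    return tuple(parts[name] for name in layer.split(":"))


def similarity(a, b):
    """Hypothetical stand-in weight: fraction of R, D, M components that agree."""
    return sum(x == y for x, y in zip(a, b)) / 3


def predict_ec(query, templates):
    """Try each layer in turn; the first layer with any hit decides by weighted vote."""
    for layer in LAYERS:
        votes = defaultdict(float)
        for rdm, ec in templates:
            if layer_key(rdm, layer) == layer_key(query, layer):
                votes[ec] += similarity(query, rdm)
        if votes:
            return layer, max(votes, key=votes.get)
    return None, None  # corresponds to a "(No hit)" row


# Toy templates; patterns and EC sub-subclasses are illustrative only.
templates = [
    (("C1b-C1a", "C1c-*", "C5a-C5a"), "4.1.3"),
    (("C1b-C1a", "C1c-*", "C1a-C1a"), "4.1.2"),
    (("C1b-C1a", "C1c-*", "C2a-C2a"), "4.1.2"),
]

print(predict_ec(("C1b-C1a", "C1c-*", "C5a-C5a"), templates))  # ('R:D:M', '4.1.3')
print(predict_ec(("C1b-C1a", "C1c-*", "C2c-C2c"), templates))  # ('R:D', '4.1.2')
```

In the second query no template matches the full pattern, so the search falls back to the R:D layer, where two of the three matching templates vote for the same class.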
The prediction performance for each individual prediction layer
| Layer | Statistic | Single mode: EC main | Single mode: EC sub | Single mode: EC subsub | Multiple mode: EC main | Multiple mode: EC sub | Multiple mode: EC subsub |
|---|---|---|---|---|---|---|---|
| R:D:M | Coverage | 0.769 | 0.769 | 0.769 | 0.754 | 0.754 | 0.754 |
| | Recall | 0.629 | 0.608 | 0.546 | 0.679 | 0.658 | 0.619 |
| | Precision | 0.817 | 0.790 | 0.710 | 0.901 | 0.873 | 0.821 |
| R:D | Coverage | 0.878 | 0.878 | 0.878 | 0.849 | 0.849 | 0.849 |
| | Recall | 0.718 | 0.680 | 0.594 | 0.788 | 0.751 | 0.698 |
| | Precision | 0.817 | 0.775 | 0.677 | 0.928 | 0.884 | 0.822 |
| D:M | Coverage | 0.773 | 0.773 | 0.773 | 0.730 | 0.730 | 0.730 |
| | Recall | 0.629 | 0.607 | 0.538 | 0.681 | 0.659 | 0.618 |
| | Precision | 0.813 | 0.785 | 0.696 | 0.932 | 0.902 | 0.847 |
| R:M | Coverage | 0.836 | 0.836 | 0.836 | 0.763 | 0.763 | 0.763 |
| | Recall | 0.619 | 0.547 | 0.472 | 0.695 | 0.652 | 0.612 |
| | Precision | 0.741 | 0.655 | 0.565 | 0.911 | 0.855 | 0.802 |
| R | Coverage | 0.938 | 0.938 | 0.938 | 0.895 | 0.895 | 0.895 |
| | Recall | 0.662 | 0.544 | 0.430 | 0.785 | 0.715 | 0.654 |
| | Precision | 0.706 | 0.581 | 0.458 | 0.877 | 0.799 | 0.731 |
| D | Coverage | 0.919 | 0.919 | 0.919 | 0.894 | 0.894 | 0.894 |
| | Recall | 0.710 | 0.646 | 0.538 | 0.803 | 0.738 | 0.658 |
| | Precision | 0.772 | 0.702 | 0.585 | 0.898 | 0.825 | 0.736 |
| M | Coverage | 0.862 | 0.862 | 0.862 | 0.789 | 0.789 | 0.789 |
| | Recall | 0.581 | 0.437 | 0.378 | 0.681 | 0.596 | 0.561 |
| | Precision | 0.674 | 0.506 | 0.438 | 0.863 | 0.755 | 0.711 |
Comparison of the prediction performance between the previous method (exact matching & simple major voting) and the proposed method (multi-layered matching & weighted major voting).
| Method | Statistic | Single mode: EC main | Single mode: EC sub | Single mode: EC subsub | Multiple mode: EC main | Multiple mode: EC sub | Multiple mode: EC subsub |
|---|---|---|---|---|---|---|---|
| Previous method | Coverage | 0.769 | 0.769 | 0.769 | 0.754 | 0.754 | 0.754 |
| | Recall | 0.629 | 0.608 | 0.546 | 0.679 | 0.658 | 0.619 |
| | Precision | 0.817 | 0.790 | 0.710 | 0.901 | 0.873 | 0.821 |
| Proposed method | Coverage | 0.961 | 0.961 | 0.961 | 0.933 | 0.933 | 0.933 |
| | Recall | 0.803 | 0.765 | 0.683 | 0.875 | 0.839 | 0.794 |
| | Precision | 0.835 | 0.796 | 0.711 | 0.937 | 0.899 | 0.851 |
The detailed performance for each layer in the prediction flow of the sequentially conducted partial matching procedure
| Layer | Total pairs | Total precision | EC1 Oxidoreductases pairs | EC1 Oxidoreductases precision | EC2 Transferases pairs | EC2 Transferases precision | EC3 Hydrolases pairs | EC3 Hydrolases precision | EC4 Lyases pairs | EC4 Lyases precision | EC5 Isomerases pairs | EC5 Isomerases precision | EC6 Ligases pairs | EC6 Ligases precision |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Single mode | ||||||||||||||
| R:D:M | 4097 | 73.8% | 1318 | 79.5% | 1340 | 78.6% | 825 | 52.4% | 316 | 86.7% | 107 | 97.1% | 191 | 57.5% |
| R:D | 629 | 71.0% | 283 | 67.1% | 84 | 73.8% | 87 | 74.7% | 111 | 84.6% | 55 | 60.0% | 9 | 33.3% |
| D:M | 14 | 42.8% | 3 | 0.0% | 5 | 60.0% | 2 | 50.0% | 4 | 50.0% | 0 | 0.0% | 0 | 0.0% |
| R:M | 187 | 47.0% | 52 | 40.3% | 57 | 45.6% | 34 | 55.8% | 38 | 50.0% | 2 | 50.0% | 4 | 50.0% |
| R | 117 | 43.5% | 45 | 60.0% | 16 | 56.2% | 12 | 41.6% | 33 | 24.2% | 9 | 22.2% | 2 | 0.0% |
| D | 57 | 42.1% | 27 | 29.6% | 5 | 60.0% | 4 | 50.0% | 10 | 80.0% | 9 | 22.2% | 2 | 50.0% |
| M | 19 | 10.5% | 8 | 12.5% | 7 | 14.2% | 0 | 0.0% | 3 | 0.0% | 0 | 0.0% | 1 | 0.0% |
| (No hit) | 207 | 0.0% | 61 | 0.0% | 38 | 0.0% | 32 | 0.0% | 50 | 0.0% | 23 | 0.0% | 3 | 0.0% |
| Total | 5327 | 71.1% | 1736 | 74.6% | 1514 | 76.4% | 964 | 54.4% | 515 | 78.6% | 182 | 78.0% | 209 | 55.5% |
| Multiple mode | ||||||||||||||
| R:D:M | 4274 | 87.6% | 1712 | 88.5% | 1415 | 88.9% | 648 | 81.0% | 227 | 95.5% | 107 | 97.1% | 165 | 75.7% |
| R:D | 670 | 84.1% | 287 | 83.2% | 97 | 90.7% | 104 | 89.4% | 113 | 92.0% | 59 | 59.3% | 10 | 50.0% |
| D:M | 8 | 62.5% | 1 | 100.0% | 3 | 66.6% | 4 | 50.0% | 0 | 0.0% | 0 | 0.0% | 0 | 0.0% |
| R:M | 96 | 78.1% | 31 | 83.8% | 28 | 75.0% | 19 | 84.2% | 16 | 68.7% | 2 | 50.0% | 0 | 0.0% |
| R | 123 | 56.0% | 45 | 66.6% | 26 | 50.0% | 18 | 61.1% | 27 | 37.0% | 6 | 66.6% | 1 | 100.0% |
| D | 100 | 37.0% | 45 | 22.2% | 28 | 60.7% | 3 | 66.6% | 14 | 50.0% | 8 | 0.0% | 2 | 50.0% |
| M | 20 | 45.0% | 5 | 80.0% | 7 | 42.8% | 2 | 50.0% | 5 | 20.0% | 0 | 0.0% | 1 | 0.0% |
| (No hit) | 378 | 0.0% | 143 | 0.0% | 76 | 0.0% | 44 | 0.0% | 81 | 0.0% | 23 | 0.0% | 11 | 0.0% |
| Total | 5669 | 85.1% | 2126 | 85.8% | 1604 | 87.4% | 798 | 81.4% | 402 | 87.0% | 182 | 79.1% | 179 | 73.7% |
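
The three statistics in the tables above appear to be linked. The sketch below assumes the usual definitions, which this record does not state explicitly: coverage is the fraction of queries that receive any prediction, precision is the fraction of those predictions that are correct, and recall is the fraction of all queries predicted correctly, so recall equals coverage times precision. Under that assumption, the "(No hit)" counts reproduce the coverage of the proposed method, and the product reproduces its EC sub-subclass recall to three decimals.

```python
def coverage(total: int, no_hit: int) -> float:
    """Fraction of queries for which some layer produced a prediction."""
    return (total - no_hit) / total


def recall(cov: float, precision: float) -> float:
    """Assumed relation: correct/total = (predicted/total) * (correct/predicted)."""
    return cov * precision


# Totals and "(No hit)" counts from the per-layer table.
single_cov = coverage(total=5327, no_hit=207)
multi_cov = coverage(total=5669, no_hit=378)
print(round(single_cov, 3), round(multi_cov, 3))  # 0.961 0.933

# EC sub-subclass precision of the proposed method (0.711 and 0.851).
print(round(recall(single_cov, 0.711), 3))  # 0.683
print(round(recall(multi_cov, 0.851), 3))   # 0.794
```

The same relation holds, up to rounding, for the individual layers; for example, 0.769 × 0.817 ≈ 0.628 against the reported R:D:M recall of 0.629 in single mode.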
Fig. 4. A screenshot of the output page.