Keisuke Ueno, Katsuhiko Mineta, Kimihito Ito, Toshinori Endo.
Abstract
BACKGROUND: Structural genomics approaches, particularly those solving the 3D structures of many proteins with unknown functions, have increased the desire for structure-based function predictions. However, prediction of enzyme function is difficult because one member of a superfamily may catalyze a different reaction than other members, whereas members of different superfamilies can catalyze the same reaction. In addition, conformational changes, mutations or the absence of a particular catalytic residue can prevent inference of the mechanism by which catalytic residues stabilize and promote the elementary reaction. A major hurdle for alignment-based methods for prediction of function is the absence (despite its importance) of a measure of similarity of the physicochemical properties of catalytic sites. To solve this problem, the physicochemical features radially distributed around catalytic sites should be considered in addition to structural and sequence similarities.
Year: 2012 PMID: 22536854 PMCID: PMC3408369 DOI: 10.1186/1472-6807-12-5
Source DB: PubMed Journal: BMC Struct Biol ISSN: 1472-6807
Figure 1. RDF of total charge. The line indicates the distances contributing to each catalytic residue peak for (A) the C4N atom of NAD in 1dc6 and (B) the FE atom of HEM in 1sog (PDB) and to each peak of degenerated catalytic residues and the RDFs for (C) 1dc6 and (D) 1sog (PDB).
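The idea of a radial distribution function (RDF) of total charge around a reference atom can be illustrated with a minimal sketch. The record does not give the authors' exact RDF definition, so the plain binned sum below, the function name `charge_rdf`, and the `r_max` and `bin_width` defaults are assumptions made for illustration only.

```python
import math

def charge_rdf(center, atoms, r_max=10.0, bin_width=0.5):
    """Sum atomic charges in concentric shells around `center`.

    center: (x, y, z) of the reference atom (for example, a ligand atom).
    atoms:  iterable of ((x, y, z), charge) pairs.
    Returns one total charge per shell of width `bin_width`.
    NOTE: hypothetical simplification, not the published RDF formula.
    """
    n_bins = int(round(r_max / bin_width))
    rdf = [0.0] * n_bins
    for coords, charge in atoms:
        r = math.dist(center, coords)  # Euclidean distance (Python 3.8+)
        if r >= r_max:
            continue  # atom lies outside the sampled radius
        # Clamp the index to guard against floating-point edge cases.
        index = min(int(r // bin_width), n_bins - 1)
        rdf[index] += charge
    return rdf
```

Peaks in the returned profile correspond to shells where charged atoms cluster, which is the kind of feature the figure traces back to individual catalytic residues.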
Effect of mutations on the physicochemical properties of active sites
| UniProt | Ligand | RDF, Euclidean (w/w, m/m) | RDF, Euclidean (w/m) | RDF, 1 – cosine (w/w, m/m) | RDF, 1 – cosine (w/m) | SiteEngine, 100 – match score (w/w, m/m) | SiteEngine, 100 – match score (w/m) |
|---|---|---|---|---|---|---|---|
| CCPR_YEAST | HEM | 198 | 195 | 0.0041 | 0.0041 | ||
| CHOD_STRS0 | FAD | 51.9 | 59.4 | ||||
| FPRA_MYCTU | FAD | 168 | 231 | 0.0039 | 0.0053 | ||
| | NDP/ODP | 228 | 340 | 0.0073 | 0.0147 | ||
| FRDA_SHEFN | FAD | 14.1 | 13.2 | ||||
| G3P_BACST | NAD | ||||||
| IDH_ECOLI | NAP | 0.0357 | 0.0312 | ||||
| MDH_ECOLI | NAD | 322 | 385 | 0.0023 | 0.0133 | 29.5 | 32.2 |
| NIA1_MAIZE | FAD | 163 | 300 | 0.0037 | 0.0132 | 25.0 | 37.2 |
| OYE1_SACPS | FMN | 201 | 224 | 0.0051 | 0.0064 | 34.3 | 40.3 |
The w/w and m/m columns show wild-type/wild-type or mutant/mutant pairs. The w/m columns show wild-type/mutant pairs. The results with statistically significant differences between the match and mismatch are shown in bold font. The statistical significance was assessed by Wilcoxon rank sum tests with a 5% significance level.
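The two RDF dissimilarity measures named in the table header, Euclidean distance and 1 – cosine, can be sketched as follows. The function names are hypothetical, and RDFs are assumed to be equal-length sequences of numbers; the record does not state how the authors implemented these measures.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length RDF vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_dissimilarity(u, v):
    """1 - cosine similarity between two equal-length RDF vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine is undefined for an all-zero vector")
    return 1.0 - dot / (norm_u * norm_v)
```

Euclidean distance is sensitive to the overall magnitude of the profiles, whereas 1 – cosine compares only their shape, which may explain why the two columns differ in scale by several orders of magnitude.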
Figure 2. Nonlinear projection of RDFs. The SOM was run using an RDF with an Epanechnikov neighborhood function in a [46, 28]-sized rectangular lattice (left) and a magnified section (right). Following training, each node was colored according to the enzymes or catalytic residues in the RDFs that were mapped onto it. The size of the squares or triangles indicates the relative frequency of the mapped RDFs.
Figure 3. Comparison between active sites in homologous enzymes mapped onto the SOM. (A) Superposition of 3-phospho-glyceraldehyde dehydrogenase (PDB code 1dc6, node [38, 5]; and PDB code 1nq5, node [38, 9]). (B) Superposition of cytochrome c peroxidase (PDB code 1sog, node [36, 8]; and PDB code 1dso, node [34, 13]). The catalytic sites are indicated by light blue (1dc6, 1sog) and gray (1nq5, 1dso). The replaced residues are denoted and colored.
SOM assignment of RDFs of oxidoreductases
| Node composition | EC | SCOP§ | Catalytic residues |
|---|---|---|---|
| Occupied by one class | 2,929 (156) | 1,966 (77) | 949 (67) |
| Conflict* | 129 (27) | 42 (9) | 112 (15) |
| All RDFs | 4,092 (241) | 2,526 (100) | 1,910 (231) |
The numbers indicate the RDF counts assigned to the nodes, and the number of classes is shown in parentheses. The SOM was run using the RDF with an Epanechnikov neighborhood function in a [46, 28]-sized rectangular lattice.
*One class is more than 80% of the total. §The nodes were labeled using SCOP[44].
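The node-composition rows can be read as a simple labeling rule applied to each SOM node. The sketch below is one interpretation of the footnote, assuming that a "Conflict" node holds several classes of which one accounts for more than 80% of the mapped RDFs; the function name and the `"mixed"` and `"empty"` categories are hypothetical additions for completeness.

```python
from collections import Counter

def node_composition(labels, threshold=0.8):
    """Classify a SOM node by the class labels of the RDFs mapped onto it.

    labels: list of class labels (EC numbers, SCOP classes, or catalytic
            residue sets), one per mapped RDF.
    NOTE: the threshold rule is an interpretation of the table footnote.
    """
    counts = Counter(labels)
    if not counts:
        return "empty"          # no RDF mapped to this node
    if len(counts) == 1:
        return "one_class"      # "Occupied by one class"
    top_count = counts.most_common(1)[0][1]
    if top_count / len(labels) > threshold:
        return "conflict"       # several classes, one exceeds the threshold
    return "mixed"              # several classes, none dominant
```

For example, a node holding nine RDFs of one EC number and one RDF of another would be labeled `"conflict"` under this reading.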
SOM assignment of RDFs of transferases
| Node composition | EC | SCOP§ | Catalytic residues |
|---|---|---|---|
| Occupied by one class | 885 (59) | 526 (37) | 356 (40) |
| Conflict* | 25 (6) | 12 (3) | 11 (2) |
| All RDFs | 1,444 (119) | 797 (60) | 736 (122) |
The numbers indicate the RDF counts assigned to the nodes, and the number of classes is shown in parentheses. The SOM was run using the RDF with a cut-Gaussian neighborhood function in a [40, 19]-sized rectangular lattice.
*One class is more than 80% of the total. §The nodes were labeled using SCOP[44].
Partial correlation between the different measures of oxidoreductases
| Measures | MAMMOTH | Needleman-Wunsch | Smith-Waterman | SiteEngine* | SOM distance |
|---|---|---|---|---|---|
| MAMMOTH | | 0.409 | 0.148 | −0.318 | −0.084 |
| Needleman-Wunsch | 0.409 | | 0.404 | −0.198 | 0.009 |
| Smith-Waterman | −0.148 | 0.404 | | −0.101 | −0.015 |
| SiteEngine* | −0.318 | −0.198 | −0.101 | | 0.052 |
| SOM distance | −0.084 | 0.009 | −0.015 | 0.052 | |
*The complement of the match score, that is, 100 minus the match score.
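A partial correlation between two measures, controlling for all the others, can be obtained from the inverse of the ordinary correlation matrix. The sketch below uses this standard definition; the record does not say how the authors computed the values in the table, and the helper names `invert` and `partial_correlations` are hypothetical.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment each row with the corresponding row of the identity matrix.
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partial_correlations(corr):
    """Partial correlation of each pair given all remaining variables.

    corr: full correlation matrix (list of lists, ones on the diagonal).
    Uses rho_ij = -P_ij / sqrt(P_ii * P_jj), where P is the inverse of corr.
    """
    prec = invert(corr)
    n = len(corr)
    return [[1.0 if i == j
             else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(n)]
            for i in range(n)]
```

Because the precision matrix of a symmetric correlation matrix is itself symmetric, the resulting table of partial correlations is symmetric about its diagonal.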
Partial correlation between the different measures of transferases
| Measures | MAMMOTH | Needleman-Wunsch | Smith-Waterman | SiteEngine* | SOM distance |
|---|---|---|---|---|---|
| MAMMOTH | | 0.375 | −0.020 | −0.284 | −0.078 |
| Needleman-Wunsch | 0.375 | | 0.642 | −0.309 | −0.006 |
| Smith-Waterman | −0.020 | 0.642 | | −0.142 | −0.058 |
| SiteEngine* | −0.284 | −0.309 | −0.142 | | 0.049 |
| SOM distance | −0.078 | −0.006 | −0.058 | 0.049 | |
*The complement of the match score, that is, 100 minus the match score.
Evaluation of the SOM distance with the RDFs for the prediction of enzyme function of oxidoreductases
| Dataset* | SOM distance (AUC) | SiteEngine (AUC) | Alignment (AUC) |
|---|---|---|---|
| MAMMOTH | 0.746 | 0.410 | 0.415 |
| Needleman-Wunsch | 0.729 | 0.558 | 0.654 |
| Smith-Waterman | 0.744 | 0.541 | 0.471 |
*The datasets were created by culling the pairs with greater than 25% pairwise identity. The SOM was run using an RDF with an Epanechnikov neighborhood function in a [46, 28]-sized rectangular lattice.
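An AUC for a distance measure can be computed directly as the probability that a same-function pair is ranked ahead of a different-function pair. The sketch below assumes that smaller distances indicate shared function and that each protein pair has already been labeled as same-function or different-function; neither the labeling procedure nor the function name comes from the record.

```python
def auc_from_distances(same_function, different_function):
    """AUC of a distance score for separating same-function pairs.

    same_function:      distances for pairs sharing a function.
    different_function: distances for pairs with different functions.
    Returns the fraction of (same, different) comparisons in which the
    same-function pair has the smaller distance; ties count as 0.5.
    """
    if not same_function or not different_function:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for s in same_function:
        for d in different_function:
            if s < d:
                wins += 1.0
            elif s == d:
                wins += 0.5
    return wins / (len(same_function) * len(different_function))
```

A value of 0.5 corresponds to random ranking, so entries below 0.5 in the table indicate a measure that ranks different-function pairs ahead of same-function pairs on that dataset.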
Evaluation of the SOM distance with the RDFs for the prediction of enzyme function of transferases
| Dataset* | SOM distance (AUC) | SiteEngine (AUC) | Alignment (AUC) |
|---|---|---|---|
| MAMMOTH | 0.800 | 0.626 | 0.376 |
| Needleman-Wunsch | 0.790 | 0.678 | 0.474 |
*The datasets were created by culling the pairs with greater than 15% pairwise identity. The SOM was run using an RDF with a cut-Gaussian neighborhood function in a [40, 19]-sized rectangular lattice.
Identification of remote orthologs assigned to the same nodes in the SOM
| PDB query | PDB target | EC number | Identity (%) | ETA |
|---|---|---|---|---|
| 1j1wA | 1xkdB | 1.1.1.42 | 9.9 | - |
| 2aczA | 1jryA | 1.3.99.1 | 17.1 | detected |
| 1nekA | 1jrxA | 1.3.99.1 | 17.4 | - |
| 1nenA | 1jrxA | 1.3.99.1 | 17.4 | - |
| 1qjdA | 2aczA | 1.3.99.1 | 17.4 | detected |
| 1d4dA | 2b76A | 1.3.99.1 | 18 | detected |
| 1d4eA | 1kfyM | 1.3.99.1 | 18 | - |
| 1i2zA | 1uh5A | 1.3.1.9 | 21.4 | - |
| 2gsmA | 2qpeA | 1.9.3.1 | 21.4 | - |
| 1ocrA | 2qpeA | 1.9.3.1 | 22.6 | - |
| 1qleA | 2qpeA | 1.9.3.1 | 22.6 | - |
| 1ar1A | 2qpeA | 1.9.3.1 | 23 | - |
| 1qr6B | 2dvmA | 1.1.1.38 | 23.1 | detected |
| 2dvmA | 1pjlE | 1.1.1.38 | 23.1 | - |
| 1d1gA | 1rb2A | 1.5.1.3 | 24.9 | - |
| 1ra2A | 1d1gA | 1.5.1.3 | 24.9 | - |
| 1cm0A | 1fy7A | 2.3.1.48 | 9.8 | - |
| 1cm0A | 1mj9A | 2.3.1.48 | 10.6 | - |
| 2dpmA | 1nw5A | 2.1.1.72 | 13.6 | - |
| 1nw7A | 2oreE | 2.1.1.72 | 14 | - |
| 1gc3E | 1oxoA | 2.6.1.1 | 15.5 | - |
| 1gc3F | 9aatA | 2.6.1.1 | 15.5 | - |
| 1ahgA | 1j32B | 2.6.1.1 | 15.8 | - |
| 1akaA | 1gc3F | 2.6.1.1 | 16 | - |
| 3bo5A | 1zkkB | 2.1.1.43 | 17.5 | - |
| 1g55A | 2qrvD | 2.1.1.37 | 17.6 | - |
| 3pgtA | 2caqA | 2.5.1.18 | 19.2 | - |
| 2fyfA | 1bjoA | 2.6.1.52 | 19.6 | - |
| 1dl5B | 1i1nA | 2.1.1.77 | 20.5 | - |
| 1dl5B | 1kr5A | 2.1.1.77 | 20.5 | - |
| 1i1nA | 1dl5A | 2.1.1.77 | 20.5 | - |
| 1kr5A | 1dl5A | 2.1.1.77 | 20.5 | detected |
| 3aatA | 1gc3H | 2.6.1.1 | 22.5 | - |
SOM predictions for the proteins with unknown function in structural genomics
| PDB (Ligand) | SOM | ETA |
|---|---|---|
| 1h2hA (NAD) | ||
| 1npdA (NAD) | 1.14.99.3 | |
| 1o61A (PLP) | 2.1.1.104 | |
| 1o8cA (NDP) | 2.3.3, 5.4.4, 6.3.2 | |
| 1rljA (FMN) | 2.4.1 | |
| 1t57A (FMN) | 3.2.1 | |
| 1ue8A (HEM) | 1.2.1.9 | 1.14.14, 2.3.2, 2.7.7, 3.5.4, 3.6.1, 4.2.99, 5.1.3 |
| 1ve3A (SAM) | ||
| 1ve3B (SAM) | 2.6.1.1 | |
| 1xq6A (NAP) | 1.2.4.4 | |
| 1y81A (COA) | 1.13.11, 2.3.2, 2.7.10, 2.8.1, 3.6.1, 3.6.3, 4.1.2, 4.3.1, 6.3.2 | |
| 1yoaA (FAD) | 1.3.1.24, 1.5.1.30 | 1.3.1, |
| 1yreD (COA) | 2.1.1.79 | 1.1.1, 2.3.1, 3.4.11, 3.4.22, 4.2.99 |
| 2e6uX (COA) | 2.5.1.18 | 3.5.1 |
| 2eisA (COA) | 2.5.1.6 | 3.1.2 |
| 2gluA (SAM) | 2.3.1.168 | |
| 2gqfA (FAD) | 1.3.1.26 | 1.1.1, 1.18.6, 1.3.3, 1.7.1, 2.7.1, 2.7.7, 3.2.1, 3.3.2, 3.4.21, 4.1.1, 6.3.3, 6.3.5 |
| 2gswA (FMN) | 1.5.1, 1.7.1, 3.1.4 | |
| 2ptfA (FMN) | ||
| 2q46A (NAP) | 1.2.4.4 | |
| 3cgvA (FAD) | 2.4.1, 6.1.1 | |
| 3dmeB (FAD) | 3.5.2 | |
| 3f2vA (FMN) | 1.10.99 |
The EC numbers compatible with the bound ligands are shown in bold font.
Figure 4. Comparison between active sites in caspase-1 and 3-phospho-glyceraldehyde dehydrogenase. Structures of active sites in (A) caspase-1 (PDB code 1bmq) and (B) 3-phospho-glyceraldehyde dehydrogenase (PDB code 1dc6) are drawn in stick representation. Comparison of the RDFs of the total charge for (C) 1bmq and (D) 1dc6, where the line indicates the distances contributing to each peak of the Cys-His catalytic diad and the RDFs for the C27 atom of MNO in 1bmq and the C4N atom of NAD in 1dc6.