Ursula Pieper, Ranyee Chiang, Jennifer J Seffernick, Shoshana D Brown, Margaret E Glasner, Libusha Kelly, Narayanan Eswar, J Michael Sauder, Jeffrey B Bonanno, Subramanyam Swaminathan, Stephen K Burley, Xiaojing Zheng, Mark R Chance, Steven C Almo, John A Gerlt, Frank M Raushel, Matthew P Jacobson, Patricia C Babbitt, Andrej Sali.
Abstract
To study the substrate specificity of enzymes, we use the amidohydrolase and enolase superfamilies as model systems; members of these superfamilies share a common TIM barrel fold and catalyze a wide range of chemical reactions. Here, we describe a collaboration between the Enzyme Specificity Consortium (ENSPEC) and the New York SGX Research Center for Structural Genomics (NYSGXRC) that aims to maximize the structural coverage of the amidohydrolase and enolase superfamilies. Using sequence- and structure-based protein comparisons, we first selected 535 target proteins from a variety of genomes for high-throughput structure determination by X-ray crystallography; 63 of these targets were not previously annotated as superfamily members. To date, 20 unique amidohydrolase and 41 unique enolase structures have been determined, increasing the fraction of sequences in the two superfamilies that can be modeled based on at least 30% sequence identity from 45% to 73%. We present case studies of proteins related to uronate isomerase (an amidohydrolase superfamily member) and mandelate racemase (an enolase superfamily member) to illustrate how this structure-focused approach can be used to generate hypotheses about sequence-structure-function relationships.
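The coverage figure quoted in the abstract (45% to 73% of sequences modelable at 30% or more sequence identity) is a simple ratio over the superfamily sequences. The Python sketch below is illustrative only and rests on two assumptions: every sequence has already been assigned the percent identity of its best available template, and alignment coverage is ignored. The function names and toy data are hypothetical and are not taken from the study.

```python
from typing import Dict


def modelable_fraction(best_identity: Dict[str, float], threshold: float = 30.0) -> float:
    """Fraction of sequences whose best template shares at least `threshold` % identity."""
    if not best_identity:
        return 0.0
    covered = sum(1 for pid in best_identity.values() if pid >= threshold)
    return covered / len(best_identity)


def add_templates(best_identity: Dict[str, float], new_hits: Dict[str, float]) -> Dict[str, float]:
    """Return a copy in which each sequence keeps the higher of its old and new best identity."""
    merged = dict(best_identity)
    for seq_id, pid in new_hits.items():
        merged[seq_id] = max(merged.get(seq_id, 0.0), pid)
    return merged


# Hypothetical toy data: percent identity of each sequence to its closest known structure.
before = {"seq1": 55.0, "seq2": 22.0, "seq3": 31.0, "seq4": 12.0}
# Hypothetical identities of some sequences to newly solved structures.
new_hits = {"seq2": 40.0, "seq4": 28.0}

after = add_templates(before, new_hits)
print(modelable_fraction(before), modelable_fraction(after))  # 0.5 0.75
```

A new structure raises coverage only when it gives some sequence a closer template than it had before. For that reason the sketch keeps the best identity per sequence and does not add up hits.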
Year: 2009 PMID: 19219566 PMCID: PMC2693957 DOI: 10.1007/s10969-008-9056-5
Source DB: PubMed Journal: J Struct Funct Genomics ISSN: 1345-711X
List of 80 NYSGXRC genomes (as of June 2005)
| Organism | Taxonomy ID | Organism | Taxonomy ID |
|---|---|---|---|
| | 56636 | | 1639 |
| | 63363 | Metagenome sequences (Gene synthesis) | 256318 |
| | 3702 | | 2190 |
| | 2234 | | 10090 |
| | 1396 | | 83332 |
| | 86665 | | 2104 |
| | 1423 | | 485 |
| | 1428 | | 487 |
| | 38323 | | 1180 |
| | 520 | | 9986 |
| | 139 | | 4530 |
| | 9913 | | 9940 |
| | 6239 | | 837 |
| | 197 | | 287 |
| | 5476 | | 303 |
| | 9615 | | 2261 |
| | 9925 | | 53953 |
| | 155892 | | 10116 |
| | 1488 | | 1063 |
| | 1717 | | 4932 |
| | 5207 | | 602 |
| | 5807 | | 4896 |
| | 1299 | | 42897 |
| | 881 | Simian immunodeficiency virus | 11723 |
| | 44689 | | 1280 |
| | 7227 | | 1282 |
| | 550 | | 1309 |
| | 1351 | | 1313 |
| | 9796 | | 1314 |
| | 562 | | 2287 |
| | 83334 | | 9823 |
| | 9685 | | 31033 |
| | 9031 | | 2303 |
| | 727 | | 50339 |
| | 64091 | | 2336 |
| | 210 | | 2130 |
| | 9606 | | 666 |
| Human immunodeficiency virus type 1 | 11676 | | 8355 |
| | 573 | | 2371 |
| | 446 | | 4577 |
Summary of new enolase and amidohydrolase X-ray crystal structures and automated template-based modeling results, including subgroup and family assignments
| PDB code | Database accession number (GenPept GI) | No. of sequences in PSI-BLAST alignment | No. of sequences with acceptable models and/or fold assignments | No. of models, >50% seq. ID (min. 50% template coverage) | No. of models, 30–50% seq. ID (min. 50% template coverage) | No. of models, <30% seq. ID (min. 50% template coverage) | Subgroup assignment | Family assignment |
|---|---|---|---|---|---|---|---|---|
| 2GL5 | 16420812 | 2,863 | 2,777 | 0 | 98 | 2,462 | Mandelate racemase-like | Galactonate dehydratase |
| 2GDQ | 2633433 | 2,234 | 2,129 | 1 | 0 | 2,036 | Mandelate racemase-like | None |
| 2GSH | 16420830 | 2,588 | 2,286 | 16 | 9 | 2,110 | Mandelate racemase-like | |
| 2HNE | 21115341 | 2,746 | 2,712 | 83 | 20 | 2,527 | Mandelate racemase-like | None |
| 2HZG | 77386310 | 2,667 | 2,341 | 1 | 1 | 2,248 | Mandelate racemase-like | None |
| 2I5Q | 15832389 | 2,566 | 2,340 | 21 | 13 | 2,206 | Mandelate racemase-like | None |
| 2NQL | 17743914 | 2,849 | 2,470 | 2 | 1 | 2,356 | Mandelate racemase-like | None |
| 2O56 | 16767118 | 3,016 | 2,968 | 15 | 127 | 2,735 | Mandelate racemase-like | None |
| 2OQH | 21225834 | 2,690 | 2,668 | 2 | 32 | 2,630 | Glucarate dehydratase-like | None |
| 2OQY | 23100298 | 2,700 | 2,631 | 1 | 0 | 3,004 | Muconate cycloisomerase-like | None |
| 2OVL | 21221904 | 2,670 | 2,656 | 1 | 97 | 2,534 | Mandelate racemase-like | None |
| 2OG9 | 91786345 | 2,669 | 2,664 | 10 | 75 | 2,553 | Mandelate racemase-like | |
| 2OLA | 88195610 | 2,719 | 2,697 | 5 | 3 | 2,652 | Muconate cycloisomerase-like | |
| 2OO6 | 91778214 | 3,271 | 3,221 | 3 | 2 | 3,111 | Mandelate racemase-like | None |
| 2OKT | 57650581 | 2,723 | 2,705 | 5 | 3 | 2,664 | Muconate cycloisomerase-like | |
| 2OPJ | 72161814 | 2,562 | 1,855 | 19 | 31 | 1,712 | Mandelate racemase-like | |
| 2OX4 | 56552160 | 2,733 | 2,639 | 11 | 136 | 2,449 | Mandelate racemase-like | None |
| 2OZ3 | 67154209 | 2,743 | 2,656 | 38 | 25 | 2,567 | Mandelate racemase-like | None |
| 2OZ8 | 13475907 | 2,821 | 2,674 | 0 | 0 | 2,641 | Mandelate racemase-like | None |
| 2POI | 46136735 | 2,747 | 2,661 | 13 | 52 | 2,561 | Mandelate racemase-like | None |
| 2OZT | 22294898 | 2,816 | 2,726 | 0 | 16 | 2,722 | Muconate cycloisomerase-like | |
| 2PCE | 83951697 | 2,693 | 2,683 | 1 | 16 | 2,635 | Muconate cycloisomerase-like | None |
| 2PGE | 51244103 | 2,779 | 2,767 | 1 | 19 | 2,768 | Muconate cycloisomerase-like | |
| 2PGW | 16263250 | 2,781 | 2,743 | 1 | 3 | 2,694 | Mandelate racemase-like | None |
| 2PMQ | 114764387 | 2,881 | 2,760 | 3 | 14 | 2,723 | Muconate cycloisomerase-like | None |
| 2POD | 53723090 | 2,745 | 2,732 | 12 | 97 | 2,585 | Mandelate racemase-like | Galactonate dehydratase |
| 2POZ | 13488170 | 2,861 | 2,836 | 1 | 162 | 2,687 | Mandelate racemase-like | None |
| 2PPG | 16262827 | 2,947 | 2,755 | 2 | 66 | 2,707 | Mandelate racemase-like | None |
| 2PS2 | 83774494 | 2,777 | 2,753 | 3 | 16 | 2,712 | Muconate cycloisomerase-like | None |
| 2QDE | 56478643 | 2,930 | 2,670 | 1 | 62 | 2,595 | Muconate cycloisomerase-like | None |
| 2QGY | 110347373 | 2,988 | 2,899 | 0 | 1 | 2,912 | Mandelate racemase-like | None |
| 2QQ6 | 108803396 | 3,238 | 3,216 | 0 | 201 | 3,081 | Mandelate racemase-like | Galactonate dehydratase |
| 2QYE | 83951695 | 3,128 | 3,121 | 0 | 21 | 3,110 | Muconate cycloisomerase-like | None |
| 3BJS | 6791043 | 3,261 | 2,897 | 4 | 82 | 2,810 | Mandelate racemase-like | None |
| 2QDD | 83951694 | 2,868 | 2,852 | 0 | 20 | 2,849 | Muconate cycloisomerase-like | |
| 3CAW | 42522147 | 2,220 | 2,139 | 0 | 0 | 2,137 | Muconate cycloisomerase-like | |
| 3CT2 | 70731221 | 3,483 | 2,771 | 84 | 77 | 2,667 | Muconate cycloisomerase-like | Muconate cycloisomerase |
| 3CYJ | 108805509 | 3,551 | 2,879 | 8 | 35 | 2,838 | Mandelate racemase-like | None |
| 3DDM | 33575875 | 3,603 | 3,576 | 6 | 27 | 3,591 | Mandelate racemase-like | None |
| 3BSM | 92115090 | 3,372 | 3,359 | 86 | 165 | 3,097 | Mannonate dehydratase-like | Mannonate dehydratase |
| Total, enolase superfamily (unique sequences) | | 7,013 | 5,804 | 398 | 766 | 5,190 | | |
| 2GOK | 17742376 | 3,001 | 2,943 | 96 | 103 | 2,678 | Imidazolonepropionase-like | Imidazolonepropionase |
| 2OOD | 27378991 | 3,609 | 3,572 | 0 | 160 | 3,440 | Guanine deaminase-like | None |
| 2OOF | 83646866 | 3,588 | 3,578 | 142 | 154 | 3,270 | Imidazolonepropionase-like | Imidazolonepropionase |
| 2I5G | 9951721 | 569 | 448 | 28 | 50 | 340 | None | None |
| 2I9U | 15023121 | 3,386 | 3,363 | 5 | 96 | 3,198 | Newfam59 | None |
| 2ICS | 29342885 | 3,433 | 3,334 | 3 | 28 | 3,209 | Unknown18 | None |
| 2IMR | 9911007 | 3,790 | 3,502 | 1 | 3 | 3,498 | None | None |
| 2OGJ | 17741648 | 3,527 | 3,510 | 5 | 18 | 3,395 | Newfam71 | None |
| 2P9B | 23466009 | 3,319 | 3,302 | 2 | 37 | 3,230 | Unknown41 | None |
| 2PAJ | 91783796 | 3,264 | 3,252 | 4 | 116 | 3,128 | Unknown55 | None |
| 2QO1 | 13422863 | 460 | 263 | 35 | 14 | 172 | Uronate isomerase-like | Uronate isomerase |
| 2Q6B | 15615056 | 306 | 189 | 3 | 0 | 167 | Uronate isomerase-like | Uronate isomerase |
| 2QS8 | 114773165 | 3,508 | 3,497 | 15 | 144 | 3,280 | Unknown43 | None |
| 2QT3 | 32455889 | 3,723 | 3,693 | 1 | 49 | 3,606 | Unknown95 | None |
| 2RAG | 16126978 | 911 | 602 | 7 | 53 | 502 | Newfam32 | None |
| 2R8C | 4447959 | 3,649 | 3,632 | 19 | 195 | 3,359 | Unknown47 | None |
| 3B40 | 9948434 | 1,149 | 656 | 16 | 34 | 504 | Newfam190 | None |
| 3CJP | 15896580 | 3,289 | 3,286 | 1 | 1,467 | 1,851 | Newfam63 | None |
| 3BE7 | 4436882 | 3,697 | 3,198 | 4 | 112 | 3,042 | Unknown42 | None |
| Total, amidohydrolase superfamily (unique sequences) | | 12,101 | 11,628 | 302 | 2,429 | 8,912 | | |
Only one entry is shown for structures determined in different crystal forms or ligand-binding states. A model is considered acceptable if it is based on a template with a significant PSI-BLAST E-value (cutoff 0.0001) or if it has a favorable GA341 model score (>0.7) [60].
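The acceptance rule in this note and the identity bins in the column headers can be written out directly. The Python sketch below is a minimal illustration and assumes one record per sequence-template pair. The record fields and function names are hypothetical. The boundary handling is also an assumption: an E-value of exactly 0.0001 is treated as significant, and exactly 30% or 50% identity falls in the 30–50% bin.

```python
from dataclasses import dataclass
from typing import Optional

# Cutoffs quoted in the table note; treating the E-value cutoff as inclusive is an assumption.
EVALUE_CUTOFF = 1e-4
GA341_CUTOFF = 0.7


@dataclass
class ModelResult:
    """Hypothetical record for one target sequence modeled on one template."""
    evalue: Optional[float]  # PSI-BLAST E-value of the template hit, if any
    ga341: Optional[float]   # GA341 score of the model, if computed
    identity: float          # percent sequence identity to the template
    coverage: float          # fraction of the template covered by the alignment (0-1)


def is_acceptable(m: ModelResult) -> bool:
    """Acceptable if the template hit is significant OR the model scores well on GA341."""
    significant = m.evalue is not None and m.evalue <= EVALUE_CUTOFF
    good_ga341 = m.ga341 is not None and m.ga341 > GA341_CUTOFF
    return significant or good_ga341


def identity_bin(m: ModelResult, min_coverage: float = 0.5) -> Optional[str]:
    """Assign one of the table's identity bins; None if template coverage is too low."""
    if m.coverage < min_coverage:
        return None
    if m.identity > 50:
        return ">50%"
    if m.identity >= 30:
        return "30-50%"
    return "<30%"


# Hypothetical example: a weak PSI-BLAST hit whose model is rescued by a good GA341 score.
m = ModelResult(evalue=0.02, ga341=0.85, identity=24.0, coverage=0.8)
print(is_acceptable(m), identity_bin(m))  # True <30%
```

The two criteria are combined with a logical OR, so a model built on a remote template can still count as acceptable on the strength of its GA341 score alone.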
Putative amidohydrolase superfamily members
| Database ID (GenPept GI) | Method | Organism | Length (residues) | Annotation available at target selection | Verification |
|---|---|---|---|---|---|
| 7462218 | Structure-based | | 434 | Conserved hypothetical protein | HMM |
| 7497374 | Structure-based | | 818 | Hypothetical protein C44B7.10 | HMM |
| 7500805 | Structure-based | | 313 | T21966 hypothetical protein F38E11.3— | HMM |
| 9948434 | Structure-based | | 448 | Probable dipeptidase precursor ( | HMM |
| 10173106 | Structure-based | | 427 | BH0493 | HMM |
| 10175729 | Structure-based | | 571 | DNA-dependent DNA polymerase beta chain | HMM |
| 13700943 | Structure-based | | 570 | DNA-dependent DNA polymerase beta chain | HMM |
| 14600641 | Structure-based | | 313 | 313aa long hypothetical microsomal dipeptidase | HMM |
| 14601853 | Template | | 394 | Hypothetical protein ( | HMM |
| 14602106 | Structure-based | | 327 | Hypothetical protein ( | HMM |
| 15600589 | Structure-based | | 325 | D82971 hypothetical protein PA5396 (imported)— | HMM |
| 15612748 | Structure-based | | 448 | BH0185 | HMM |
| 15614834 | Structure-based | | 310 | Dipeptidase | HMM |
| 15791917 | Structure-based | | 265 | Hypothetical protein Cj0556 | HMM |
| 15805850 | Structure-based | | 418 | Hydrolase, putative | HMM |
| 15896580 | Structure-based | | 262 | Predicted amidohydrolase (dihydroorotase family) | HMM |
| 15898656 | Structure-based | | 314 | Microsomal dipeptidase | HMM |
| 15925570 | Structure-based | | 336 | Conserved hypothetical protein | HMM |
| 16125737 | Structure-based | | 487 | Uronate isomerase (EC 5.3.1.12) (Glucuronate isomerase) (UronicDE isomerase) | HMM |
| 16126978 | Structure-based | | 417 | Dipeptidase | HMM |
| 16127409 | Structure-based | | 353 | Hypothetical protein | HMM |
| 16130781 | Structure-based | | 464 | Soluble protein involved in cell viability at the beginning of stationary phase; soluble protein involved in cell viability at the beginning of stationary phase, contains urease domain | HMM |
| 16410647 | Structure-based | | 570 | lmo1231 | HMM |
| 17556402 | Structure-based | | 352 | Hypothetical protein Y71D11A.3a | HMM |
| 19705473 | Structure-based | | 336 | 2-amino-3-carboxymuconate-6-semialdehyde decarboxylase | HMM |
| 19911227 | Structure-based | | 336 | 2-amino-3-carboxylmuconate-6-semialdehyde decarboxylase | HMM |
| 19911231 | Structure-based | | 401 | 2-amino-3-carboxylmuconate-6-semialdehyde decarboxylase | HMM |
| 24379660 | Structure-based | | 267 | conserved hypothetical protein | HMM |
| 33592291 | Structure-based | | 284 | Putative 2-pyrone-4,6-dicarboxylic acid hydrolase | HMM |
| 33593502 | Structure-based | | 341 | Putative dipeptidase | HMM |
| 39976001 | Sequence- and structure-based | | 417 | Hypothetical protein | HMM |
| 42527610 | Structure-based | | 371 | Dihydroorotase, putative | HMM |
| 42631159 | Structure-based | | 330 | Hypothetical protein | HMM |
| 51012913 | Structure-based | | 313 | YMR262W | HMM |
| 51968376 | Structure-based | | 346 | Unnamed protein product | HMM |
| 51968996 | Structure-based | | 346 | Unnamed protein product | HMM |
| 55980841 | Structure-based | | 369 | Amidohydrolase family protein | HMM |
| 60279993 | Structure-based | | 403 | PvdM | HMM |
| 66807941 | Structure-based | | 359 | Hypothetical protein | HMM |
| 66808659 | Structure-based | | 322 | Hypothetical protein | HMM |
| 1065989 | Sequence-based | | 577 | Adenine deaminase | HMM |
| 15023784 | Sequence-based | | 570 | Adenine deaminase | HMM |
| 24636152 | Structure-based | | 403 | Hypothetical protein C44B7.12 | HMM |
| 29377069 | Structure-based | | 444 | Chlorohydrolase family protein | HMM |
| 40788915 | Structure-based | | 777 | Q93075_chr3:10265710-10295706_H233R_V272I_L374P PUTATIVE DEOXYRIBONUCLEASE KIAA0218 (EC 3.1.21.-) | HMM |
| 45446932 | Sequence- and structure-based | | 774 | CG32626-PA, isoform A | HMM |
| 56203368 | Sequence- and structure-based | | 776 | Adenosine monophosphate deaminase 1 (isoform M | HMM |
| 56203369 | Sequence-based | | 780 | OTTHUMP00000059283 | HMM |
| 57230710 | Structure-based | | 469 | Hydrolase, putative | HMM |
| 63055053 | Structure-based | Homo sapiens | 761 | TatD DNase domain containing 2 | HMM |
| 68250266 | Structure-based | | 251 | Conserved putative deoxyribonuclease | HMM |
| 429129 | Sequence-based | | 797 | YB9Z_YEAST HYPOTHETICAL 92.9 KD PROTEIN IN SSH1-APE3 INTERGENIC REGION | Manual |
| 7293948 | Sequence-based | | 520 | CG5998-PA | Manual |
| 11463854 | Sequence-based | | 561 | Male-specific IDGF | Manual |
| 14602062 | Structure-based | | 375 | Hypothetical protein [ | Manual |
| 15898896 | Structure-based | | 269 | Conserved hypothetical protein | Manual |
| 16264026 | Template | | 466 | HYPOTHETICAL PROTEIN | Manual |
| 17646150 | Sequence- and structure-based | | 506 | Adenosine deaminase-related growth factor C | Manual |
| 23093239 | Sequence-based | | 561 | CG32178-PA | Manual |
| 25009707 | Sequence-based | | 561 | AT05468p | Manual |
| 33593596 | Structure-based | | 523 | Conserved hypothetical protein | Manual |
| 40744823 | Structure-based | | 562 | HYPOTHETICAL protein | Manual |
| 47678365 | Sequence-based | | 511 | Cat eye syndrome critical region protein 1 [Homo sapiens] | Manual |
| 49116836 | Sequence- and structure-based | | 510 | Hypothetical protein | Manual |
Tables listing all amidohydrolase and enolase superfamily targets are available at http://salilab.org/projects/enspec/. HMM: verification with a hidden Markov model; Manual: manual verification.
Fig. 1 Flowchart of the target expansion strategy: sequence-based target expansion (left) and structure-based target expansion (right)
Fig. 2 Phylogenetic tree of the organisms for the selected amidohydrolase targets. The numbers in parentheses represent the number of targets for confirmed (first number) and putative (second number) amidohydrolase superfamily members. The tree was generated using the NCBI Taxonomy Browser [61]
Success rates for the steps in the structural genomics pipeline as of June 2008. Each fraction is given relative to the number of targets that passed the preceding step.
| Step | Amidohydrolase superfamily, total | Amidohydrolase superfamily, fraction (%) | Enolase superfamily, total | Enolase superfamily, fraction (%) | Both superfamilies, total | Both superfamilies, fraction (%) |
|---|---|---|---|---|---|---|
| In pipeline | 279 | | 222 | | 501 | |
| Cloned | 254 | 91 | 206 | 93 | 460 | 92 |
| Expressed | 225 | 88 | 177 | 86 | 402 | 87 |
| Soluble | 167 | 74 | 112 | 63 | 279 | 69 |
| Purified | 110 | 66 | 67 | 60 | 177 | 63 |
| Crystallized | 63 | 57 | 44 | 66 | 107 | 60 |
| Unique structures | 20 | 32 | 41 | 93 | 61 | 57 |
| All structures | 25 | | 50 | | 75 | |
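The fraction columns of this table can be recomputed from the counts. Each fraction is the percentage of targets that passed a step, taken relative to the targets that passed the preceding step. The Python check below uses the counts exactly as they appear in the table; the function names are illustrative only. With conventional rounding it gives 89% instead of 88% for the amidohydrolase expression step (225/254). That one-point gap is probably a rounding difference. All other values match the table.

```python
STEPS = ["Cloned", "Expressed", "Soluble", "Purified", "Crystallized", "Unique structures"]

# Target counts from the table, starting with "In pipeline".
COUNTS = {
    "Amidohydrolase": [279, 254, 225, 167, 110, 63, 20],
    "Enolase": [222, 206, 177, 112, 67, 44, 41],
    "Both": [501, 460, 402, 279, 177, 107, 61],
}


def stepwise_success(counts):
    """Percentage passing each step, relative to the count after the preceding step."""
    return [round(100 * after / before) for before, after in zip(counts, counts[1:])]


def overall_yield(counts):
    """Percentage of targets entering the pipeline that yielded a unique structure."""
    return round(100 * counts[-1] / counts[0], 1)


for name, counts in COUNTS.items():
    print(name, dict(zip(STEPS, stepwise_success(counts))), overall_yield(counts))
```

Multiplying the step-wise rates gives the end-to-end yield. By this arithmetic about 7% of amidohydrolase targets and about 18% of enolase targets in the pipeline produced a unique structure.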
Comparison of template-based modeling statistics for the 61 ENSPEC/NYSGXRC structures and all 327 NYSGXRC structures (May 2007)
| Statistic | Amidohydrolase and enolase superfamily members | All NYSGXRC structures |
|---|---|---|
| Average number of sequences with acceptable models | 2,681 | 1,964 |
| Minimum/maximum number of sequences with acceptable models | 189/3,693 | 30/6,320 |
| Average number of sequences with >50% sequence identity, at least 50% coverage | 15 | 20 |
| Average number of sequences with 30–50% sequence identity, at least 50% coverage | 59 | 113 |
| Average number of sequences with <30% sequence identity, at least 50% coverage | 2,572 | 1,400 |
A model is considered acceptable if it is based on a template with a significant PSI-BLAST E-value (cutoff 0.0001) or if it has a favorable GA341 model score (>0.7).
Fig. 3a Cytoscape clustering for the amidohydrolase superfamily. The most homogeneous subgroups have been named; an additional figure with full subgroup coloring is available in the Supplemental Materials. Green diamonds: structures determined before the start of the ENSPEC/NYSGXRC project in June 2005. Red triangles: superfamily members in the target list. Purple squares: five divergent structures determined by ENSPEC/NYSGXRC. Blue squares: all other structures determined by ENSPEC/NYSGXRC. Ovals indicate subgroups: red, dihydroorotase 3-like; dark blue, urease-like; purple, NagA/AgaA-like; light blue, 2-pyrone-4,6-dicarboxylate lactonase-like; pink, uronate isomerase-like; orange, PHP-like; delft blue, membrane dipeptidase-like. b Cytoscape clustering for the enolase superfamily. Clusters are marked for four subgroups; the full subgroup assignments can be found in the Supplemental Materials. Green diamonds: structures determined before the start of the ENSPEC/NYSGXRC project in June 2005. Red triangles: superfamily members in the target list. Blue squares: all structures determined by ENSPEC/NYSGXRC. Ovals indicate subgroups: pink, mannonate dehydratase-like; orange, mandelate racemase-like; blue, muconate cycloisomerase-like; green, glucarate dehydratase-like
Fig. 4a Cytoscape network showing the uronate isomerase family. The E-value threshold for displaying edges is 10⁻¹⁰. The large cluster represents the "typical" uronate isomerases; sequences in this cluster are more similar to other members of the amidohydrolase superfamily than Bh0493 is. Bh0705 is shown in purple, and the structurally characterized enzyme from Thermotoga maritima is shown in red. On the right, the outlier uronate isomerase Bh0493 is shown in purple together with a small number of sequences of unknown function. b Ribbon diagram [62] of a superposition of the trimeric structures of Bh0493 (2Q6E, blue) and a uronate isomerase from Thermotoga maritima (1J5S, red). The inset shows the active-site residues of chain A of 2Q6E, including a Zn²⁺ ion
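In a sequence similarity network like the one in Fig. 4a, an edge is kept only when the pairwise E-value passes a threshold (here 10⁻¹⁰). The clusters are then the connected components of the resulting graph. The Python sketch below finds those components with a union-find structure. It assumes a precomputed list of (sequence, sequence, E-value) tuples with one value per pair. The sequence names and E-values are hypothetical, and the sketch reproduces only the grouping, not the Cytoscape layout.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def connected_clusters(pairs: Iterable[Tuple[str, str, float]],
                       threshold: float = 1e-10) -> List[Set[str]]:
    """Group sequences into connected components, keeping edges with E-value <= threshold."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, evalue in pairs:
        root_a, root_b = find(a), find(b)  # registers both nodes even if the edge is dropped
        if evalue <= threshold and root_a != root_b:
            parent[root_b] = root_a

    groups: Dict[str, Set[str]] = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical pairwise E-values.
pairs = [
    ("typical_1", "typical_2", 1e-40),
    ("typical_2", "typical_3", 1e-25),
    ("outlier", "unknown_1", 1e-15),
    ("typical_1", "outlier", 1e-5),  # too weak: no edge at the 1e-10 threshold
]
for cluster in connected_clusters(pairs):
    print(sorted(cluster))
```

Relaxing the threshold (for example to 10⁻³) lets the weak edge through and merges the outlier group into the main cluster. This is why the threshold used for display matters when a network is read as evidence that a group of sequences is divergent.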
Fig. 5 Mandelate racemase bound to the substrate analog atrolactate (1MDR, red) is shown superimposed with two structures of unknown function. In both superpositions, the active-site metal ligands D195, E221 and E247, the active-site His-Asp dyad (H297, D270), and a Lys-X-Lys motif (K164, K166) conserved in 1MDR and other members of the mandelate racemase subgroup are labeled (1MDR numbering). a Superposition of 2GL5 (blue) with 1MDR shows conservation of all of these active-site residues except the second Lys of the Lys-X-Lys motif, which is replaced in 2GL5 by Asp170; this residue faces away from the active site in 2GL5. b Superposition of 2POD (green) with 1MDR also shows conservation of all of the listed residues except the second Lys of the Lys-X-Lys motif, which is replaced in 2POD by Trp176
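The comparison in Fig. 5 amounts to checking whether, at each labeled 1MDR position, the structurally aligned residue in the other protein is the same amino acid. The Python sketch below is minimal and assumes the position-to-residue mapping has already been read off the superposition. The helper functions are hypothetical. The data encode only what the caption states: every labeled residue is conserved except K166, which corresponds to Asp in 2GL5 and Trp in 2POD.

```python
# Active-site positions labeled in Fig. 5 (1MDR numbering; the first letter is the residue type).
POSITIONS_1MDR = ["K164", "K166", "D195", "E221", "E247", "D270", "H297"]


def aligned_residues(substitutions):
    """Hypothetical helper: treat every labeled position as conserved unless listed."""
    aligned = {pos: pos[0] for pos in POSITIONS_1MDR}
    aligned.update(substitutions)
    return aligned


def substituted_positions(aligned):
    """Return {1MDR position: aligned residue} wherever the residue type differs from 1MDR."""
    return {pos: aligned.get(pos, "-") for pos in POSITIONS_1MDR
            if aligned.get(pos, "-") != pos[0]}


# Encodes only what the caption states: K166 is replaced by Asp (2GL5) or Trp (2POD).
print(substituted_positions(aligned_residues({"K166": "D"})))  # {'K166': 'D'}
print(substituted_positions(aligned_residues({"K166": "W"})))  # {'K166': 'W'}
```

A check like this shows which catalytic residues are missing. Structural context is still needed to interpret the substitution, for example the caption's observation that Asp170 in 2GL5 faces away from the active site.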