
Discovery of function in the enolase superfamily: D-mannonate and D-gluconate dehydratases in the D-mannonate dehydratase subgroup.

Daniel J Wichelecki, Bryan M Balthazor, Anthony C Chau, Matthew W Vetting, Alexander A Fedorov, Elena V Fedorov, Tiit Lukk, Yury V Patskovsky, Mark B Stead, Brandan S Hillerich, Ronald D Seidel, Steven C Almo, John A Gerlt.

Abstract

The continued increase in the size of the protein sequence databases as a result of advances in genome sequencing technology is overwhelming the ability to perform experimental characterization of function. Consequently, functions are assigned to the vast majority of proteins via automated, homology-based methods, with the result that as many as 50% are incorrectly annotated or unannotated (Schnoes et al. PLoS Comput. Biol. 2009, 5 (12), e1000605). This manuscript describes a study of the D-mannonate dehydratase (ManD) subgroup of the enolase superfamily (ENS) to investigate how function diverges as sequence diverges. Previously, one member of the subgroup had been experimentally characterized as ManD [dehydration of D-mannonate to 2-keto-3-deoxy-D-mannonate (equivalently, 2-keto-3-deoxy-D-gluconate)]. In this study, 42 additional members were characterized to sample sequence-function space in the ManD subgroup. These were found to differ in both catalytic efficiency and substrate specificity: (1) high efficiency (kcat/KM = 10^3 to 10^4 M^–1 s^–1) for dehydration of D-mannonate, (2) low efficiency (kcat/KM = 10^1 to 10^2 M^–1 s^–1) for dehydration of D-mannonate and/or D-gluconate, and (3) no activity with either D-mannonate or D-gluconate (or any other acid sugar tested). Thus, the ManD subgroup is not isofunctional and includes D-gluconate dehydratases (GlcDs) that are divergent from the GlcDs that have been characterized in the mandelate racemase subgroup of the ENS (Lamble et al. FEBS Lett. 2004, 576, 133–136) (Ahmed et al. Biochem. J. 2005, 390, 529–540). These observations signal caution for functional assignment based on sequence homology and lay the foundation for the studies of the physiological functions of the GlcDs and the promiscuous ManDs/GlcDs.

Year: 2014 PMID: 24697546 PMCID: PMC4007978 DOI: 10.1021/bi500264p

Source DB: PubMed Journal: Biochemistry ISSN: 0006-2960 Impact factor: 3.162

The massive influx of sequence data since the first bacterial genome sequence was published in 1995 has necessitated a reliance on homology-based annotations of protein function.[1,2] However, because this method assigns the function of the “closest” homologue, an estimated 30–50% of the functional annotations in the databases are incorrect,[3−5] with the magnitude of the problem increasing as the incorrect annotations are propagated in assigning functions to proteins discovered in newly sequenced genomes. In a study of several functionally diverse superfamilies, Schnoes, Babbitt, and co-workers concluded that 85% of misannotations resulted from annotations that are more detailed than justified.[3] Automated methods often are able to achieve high degrees of accuracy in the transfer of the first three Enzyme Commission (EC) code numbers, but accurate transfer of the fourth EC code number (substrate specificity) is much more difficult.[6] This study examines the d-mannonate dehydratase (ManD) subgroup of the enolase superfamily (ENS) to determine experimentally, on a large scale, how function diverges as sequence diverges in highly homologous enzymes. Our results illustrate the difficulty of accurately assigning function via homology-based methods and, also, provide insights into how different functions can arise in highly homologous enzymes. Two conserved features are shared by members of the ENS: mechanism and structure. The mechanism is general base-catalyzed abstraction of a proton alpha to a carboxylate group of the substrate to form an enediolate intermediate.[7] The enediolate intermediate is stabilized by coordination to an active site divalent metal cation (usually Mg2+). 
Furthermore, members of the ENS share a common structural motif: an (α + β) capping domain that contains the residues that determine substrate specificity and a modified TIM-barrel domain ((β/α)7β-barrel) that contains the residues that mediate acid/base chemistry.[8,9] Subgroups are differentiated by the conserved metal-binding residues at the ends of the third, fourth, and fifth β-strands as well as conserved catalytic acid/base residues at the ends of the second, third, sixth, and/or seventh β-strands of the modified TIM-barrel domain. The ENS is particularly interesting because its members share a common mechanism and the same structural motif but are functionally diverse (e.g., β-elimination and 1,1-proton transfer reactions).[10,11] Thus, the ENS is a good model to investigate how function diverges as sequence diverges. In 2007, Rakus and co-workers discovered the ManD subgroup of the ENS.[12] In that study, a protein from Novosphingobium aromaticivorans (NaManD) was structurally characterized and discovered to catalyze the syn-dehydration of d-mannonate to 2-keto-3-deoxy-d-mannonate (equivalently, 2-keto-3-deoxy-d-gluconate) (EC 4.2.1.8 and Uniprot ID A4XF23). The Mg2+-binding residues located at the ends of the third, fourth, and fifth β-strands are Asp 210, Glu 236, and Glu 262, respectively. The general base that abstracts the 2-proton is the Tyr 159-Arg 147 dyad in the “150–180s” loop between the second and third β-strands; the general acid that facilitates departure of the 3-hydroxyl group is His 212 located at the end of the third β-strand[12] (Figure 1). His 315, which is located at the end of the seventh β-strand and is hydrogen-bonded to the 5-hydroxyl group of the d-mannonate substrate, is also conserved. Since this study, no other members of the ManD subgroup have been experimentally characterized. 
Given the conservation of metal-binding residues, catalytic residues, and active site architecture, the assumption was that the entire subgroup is isofunctional.

Figure 1

Structural features of the ManDs (PDB 2QJJ). On the left, the “150–180s” loop (blue), TIM barrel (red), and capping domain (green) are displayed. The right inset shows the active site residues: metal binding Asp210, Glu236, Glu262 (green); Tyr159-Arg147 catalytic dyad (magenta); acidic His212 (blue); and conserved His315 at the end of the 7th β-strand (red). The d-mannonate ligand from the 2QJM structure is shown in light blue.

In this study, we sought to investigate how function diverges as sequence diverges within this subgroup and to determine if the ManD subgroup is, in fact, isofunctional. When the target proteins for this study were selected in April 2011, the ManD subgroup included NaManD[12] and 299 uncharacterized proteins that share ≥35% sequence identity [Structure–Function Linkage Database (http://sfld.rbvi.ucsf.edu/)]. [At the time of submission of this manuscript, the UniProtKB database contained the sequences for 2919 members of the ManD subgroup.] Forty-two members representing the breadth of sequence–function space were produced as soluble proteins and screened for activity using a library of acid sugars. Surprisingly, we found that the ManD subgroup is not isofunctional; instead, in addition to ManDs it also contains d-gluconate dehydratases (GlcDs) that catalyze the anti-dehydration of d-gluconate to 2-keto-3-deoxy-d-gluconate as well as promiscuous proteins that catalyze both the ManD and GlcD reactions (Scheme 1). In addition, a wide range of catalytic efficiencies (values of kcat/KM) were discovered. Using sequence similarity networks (SSNs),[13] the members with these divergent functions could be separated into isofunctional clusters. 
Furthermore, 16 unique crystal structures (for a total of 36 unliganded and liganded structures) were solved to survey sequence and structure space; these revealed conserved active site structures but divergent conformations for the “150–180s” loops that contain the general basic Tyr-Arg dyads and close over the active site to sequester the substrate from solvent. Taken together, the functional and structural data provide a comprehensive description of how in vitro function diverges as sequence diverges.

Scheme 1

Materials and Methods

Cloning, Expression, and Purification of Targets (AECOM)

pNIC28-BSA4-based expression vectors were transformed into BL21(DE3) Escherichia coli containing the pRIL plasmid (Stratagene) and used to inoculate a 10 mL 2xYT culture containing 25 μg/mL kanamycin and 34 μg/mL chloramphenicol. The cultures were allowed to grow overnight at 37 °C in a shaking incubator. The overnight culture was used to inoculate 2 L of PASM-5052 autoinduction media.[14] The culture was placed in a LEX48 airlift fermenter and incubated at 37 °C for 4 h and then at 22 °C overnight. The culture was harvested and pelleted by centrifugation. Cells were resuspended in lysis buffer (20 mM HEPES, pH 7.5, 500 mM NaCl, 20 mM imidazole, and 10% glycerol) and lysed by sonication. The lysate was clarified by centrifugation at 35 000g for 30 min. The protein was purified using an AKTAxpress FPLC (GE Healthcare). The lysate was loaded onto a 1 mL His60 column (Clontech), washed with 10 column volumes of lysis buffer, and eluted with buffer containing 20 mM HEPES, pH 7.5, 500 mM NaCl, 500 mM imidazole, and 10% glycerol. This partially purified protein was loaded onto a HiLoad S200 16/60 PR gel filtration column equilibrated with SECB buffer (20 mM HEPES, pH 7.5, 150 mM NaCl, 10% glycerol, and 5 mM DTT). The protein was analyzed by SDS-PAGE, flash frozen in liquid nitrogen, and stored at −80 °C.

Expression and Purification of N-Terminal His-Tagged Proteins (UIUC)

Genes in pET15b (Novagen) were expressed in E. coli strain BL21. Small-scale cultures were grown at 37 °C for 18 h in 5 mL of LB containing 100 μg/mL ampicillin and used to inoculate 1 L LB containing 100 μg/mL ampicillin. IPTG (500 μM) was added at OD600 nm = 0.6–0.8 to induce expression. The induced cultures then were grown for an additional 18 h at 37 °C. The cells were harvested by centrifugation at 5000 rpm for 10 min and resuspended in 70 mL of binding buffer (6 mM imidazole, 20 mM Tris-HCl, pH 7.9, 5 mM MgCl2, and 500 mM NaCl). The resuspended cells were lysed by sonication and centrifuged at 17 000 rpm for 30 min. The supernatant was loaded onto a column of 50 mL chelating Sepharose Fast Flow (Amersham Biosciences) charged with Ni2+ and eluted with a linear gradient of imidazole (0–1 M over 600 mL). Fractions were analyzed using SDS-PAGE. Fractions containing protein at high purity (>90%) were combined and dialyzed against 4 L of buffer containing 100 mM imidazole, 20 mM Tris-HCl, pH 7.9, 10 mM MgCl2, 150 mM NaCl, and 10% glycerol for 2 h at 4 °C. The protein was dialyzed in this manner a total of three times. Finally, the protein was concentrated to a maximum of ∼10 mg/mL (depending on solubility) and flash-frozen using liquid nitrogen and stored at −80 °C.

Expression and Purification of Tagless ManD Constructs (UIUC)

The genes in pET17b (Novagen) were expressed in E. coli strain BL21. Small-scale cultures were grown at 37 °C for 18 h in 5 mL of LB containing 100 μg/mL ampicillin and used to inoculate 1 L LB containing 100 μg/mL ampicillin. The 1 L cultures were grown for an additional 18 h at 37 °C without induction. The cells were harvested by centrifugation at 5000 rpm for 10 min and resuspended in 70 mL of binding buffer. The resuspended cells were lysed by sonication and centrifuged at 17 000 rpm for 30 min. The supernatant was loaded onto a 300 mL DEAE Sepharose column (Amersham Biosciences) and eluted with a linear gradient of NaCl (0–1 M over 1.6 L) in 10 mM Tris-HCl, pH 7.9, containing 5 mM MgCl2. Fractions containing the protein of interest at high purity were combined and dialyzed against 4 L of buffer containing 10 mM Tris-HCl, pH 7.9, and 5 mM MgCl2 for 2 h at 4 °C. The dialyzed protein was then loaded onto a 30 mL Q-Sepharose column (Amersham Biosciences) and eluted with a linear gradient of NaCl (0–1 M over 500 mL) in 10 mM Tris-HCl, pH 7.9, containing 5 mM MgCl2. Fractions containing the protein of interest at high purity were combined and dialyzed against 4 L of buffer containing 10 mM Tris-HCl, pH 7.9, and 5 mM MgCl2 for 2 h at 4 °C. Ammonium sulfate was added to a final concentration of 1 M, and the sample was loaded onto a 30 mL phenyl Sepharose column (Amersham Biosciences). The protein was eluted with a gradient of ammonium sulfate (1–0 M over 500 mL) in 10 mM Tris-HCl, pH 7.9, containing 5 mM MgCl2. Fractions with pure protein (SDS-PAGE) were combined and dialyzed against 4 L of buffer containing 10 mM Tris-HCl, pH 7.9, 5 mM MgCl2, 150 mM NaCl, and 10% glycerol for 2 h at 4 °C. Finally, the protein was concentrated to a maximum of ∼10 mg/mL (depending on solubility), flash frozen in liquid nitrogen, and stored at −80 °C.

Screen for Dehydration

Reactions to test for enzymatic activity were performed in acrylic, UV transparent 96-well plates (Corning Incorporated) using a library of 72 acid sugars (Figure S1, Supporting Information). Reactions (60 μL) contained 50 mM HEPES, pH 7.9, 10 mM MgCl2, 1 μM enzyme, and 1 mM acid sugar substrate (blanks lacked enzyme). The plates were incubated at 30 °C for 16 h. A 1% semicarbazide reagent solution (240 μL) was added to each well and incubated for 1 h at room temperature. The absorbances were measured at 250 nm (ε = 10 200 M^–1 cm^–1) using a Tecan Infinite M200PRO plate reader.
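The conversion from a plate reading to product concentration follows from the Beer–Lambert law and the volumes given above. The following is a minimal sketch, not the authors' analysis code: the function names, the 1 cm default path length, and the assumption that the absorbance is read on the full 300 μL mixture (60 μL reaction plus 240 μL reagent) are illustrative. Only the extinction coefficient and the volumes are taken from the text.

```python
# Sketch: semicarbazone absorbance at 250 nm -> product concentration.
EPSILON_250 = 10_200.0   # M^-1 cm^-1, from the text
REACTION_UL = 60.0       # reaction volume, from the text
REAGENT_UL = 240.0       # semicarbazide reagent volume, from the text


def product_conc_mM(a_sample, a_blank, path_cm=1.0):
    """Product concentration (mM) referred back to the 60 uL reaction.

    path_cm is an assumption: the effective path length in a plate well
    depends on fill volume and plate geometry and should be measured.
    """
    dilution = (REACTION_UL + REAGENT_UL) / REACTION_UL  # 5-fold dilution
    conc_in_well_M = (a_sample - a_blank) / (EPSILON_250 * path_cm)
    return conc_in_well_M * dilution * 1000.0


def percent_conversion(conc_mM, substrate_mM=1.0):
    """Fraction of the 1 mM substrate converted, in percent."""
    return 100.0 * conc_mM / substrate_mM


if __name__ == "__main__":
    c = product_conc_mM(a_sample=1.02, a_blank=0.0)
    print(round(c, 3), "mM;", round(percent_conversion(c), 1), "% converted")
```

Subtracting the no-enzyme blank before applying the extinction coefficient removes the background absorbance of the substrate and reagent.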

Kinetic Assays

Dehydration of d-mannonate and d-gluconate was quantitated using either a discontinuous assay with the semicarbazide reagent[15,16] or a continuous, coupled-enzyme spectrophotometric assay. In the latter assay, the product was phosphorylated using 2-keto-3-deoxy-d-gluconate kinase (KdgK) and ATP; formation of ADP was measured using pyruvate kinase (PK) and l-lactate dehydrogenase (LDH). The assay (200 μL) at 25 °C contained 50 mM potassium HEPES, pH 7.5, 5 mM MgCl2, 1.5 mM ATP, 1.5 mM PEP, 0.16 mM NADH, 9 units of PK, 9 units of LDH, 18 units of KdgK, and ManD/GlcD. Dehydration was quantitated by measuring the decrease in absorbance at 340 nm (ε = 6220 M^–1 cm^–1). Low-activity enzymes were characterized using the discontinuous assay; high-activity ManDs were characterized using the coupled assay.
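In the coupled assay, each dehydration event leads to the oxidation of one NADH (KdgK produces ADP, PK regenerates ATP and forms pyruvate, LDH reduces pyruvate). The sketch below shows how a measured slope at 340 nm could be converted into a rate. It is an illustration only: the function names and the 1 cm path length are assumptions, and it presumes that the coupling enzymes are not rate limiting.

```python
# Sketch: NADH absorbance slope -> dehydration rate in the coupled assay.
EPSILON_NADH_340 = 6220.0  # M^-1 cm^-1, from the text


def velocity_uM_per_s(dA340_per_min, path_cm=1.0):
    """Rate of product formation (uM/s) from the slope of A340 vs time.

    Assumes 1 NADH oxidized per product molecule formed; path_cm is an
    assumed cuvette path length.
    """
    rate_M_per_min = abs(dA340_per_min) / (EPSILON_NADH_340 * path_cm)
    return rate_M_per_min * 1e6 / 60.0


def turnover_per_s(v_uM_per_s, enzyme_uM):
    """Observed turnover (s^-1) at the substrate concentration used."""
    return v_uM_per_s / enzyme_uM


if __name__ == "__main__":
    v = velocity_uM_per_s(-0.3732)           # hypothetical slope, A/min
    print(round(v, 3), "uM/s")
    print(round(turnover_per_s(v, 0.5), 3), "s^-1 with 0.5 uM enzyme")
```

At substrate concentrations well below KM, the observed turnover divided by the substrate concentration approximates kcat/KM, which is one way the low-efficiency values could be estimated without reaching saturation.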

Site-Directed Mutagenesis (SDM)

Mutants were constructed using primers designed with the Agilent Technologies online webserver (https://www.genomics.agilent.com/) and purchased from Bio-Synthesis, Inc. Forward and reverse primers containing the mutations of interest are listed in Table S1, Supporting Information. PCR reactions (30 μL) contained 1 mM MgCl2, 1× Pfx Amp buffer, 0.33 mM dNTP, 0.33 μM of FOR/REV primer, and 1.25 units Pfx polymerase (Invitrogen Platinum Pfx DNA Polymerase kit). The templates were 50 ng ManD-containing pET15b (NaManD) or pET17b (CsManD). Amplifications were performed according to the manufacturer’s guidelines. After addition of DpnI (10 units), the reactions were incubated for 4 h at 37 °C. The DpnI-digested products were purified by gel electrophoresis, extracted, and transformed (electroporation, Bio-Rad Micropulsar Electroporator) into XL1 Blue competent cells. Finally, plasmids isolated from the transformants were sequenced to confirm the mutations.

Circular Dichroism of NaManD and CsManD Loop Mutants

The circular dichroism spectrum of a 10 μM solution of mutant enzyme in an optically clear borate buffer (50 mM boric acid, 100 mM KCl, 0.7 mM DTT, pH 8.0) was measured from 190 to 260 nm using a Jasco J-715 spectropolarimeter. Five replicate measurements were made.

Protein Crystallization and X-ray Diffraction Data Collection

Proteins were crystallized by the sitting-drop vapor diffusion method. The concentrated (usually 5–40 mg/mL) protein solutions (from 0.3 to 1 μL) were mixed with an equal volume of a precipitant solution and equilibrated at room temperature (∼294 K) against the same precipitant solution in clear tape-sealed 96-well INTELLI-plates (Art Robbins Instruments, Sunnyvale, CA). Crystallization was performed using either a TECAN crystallization robot (TECAN US, Research Triangle Park, NC) or a PHOENIX crystallization robot (Art Robbins Instruments) and four types of commercial crystallization screens: the WIZARD I&II screen (Emerald BioSystems, Bainbridge Island, WA); the INDEX HT and the CRYSTAL SCREEN HT (both from Hampton Research, Aliso Viejo, CA); and the MCSG screen (Microlytic, Woburn, MA). The appearance of protein crystals was monitored either by visual inspection or using a Rock Imager 1000 (Formulatrix, Waltham, MA) starting within 24 h of incubation and again at weeks 1, 2, 3, 5, 8, and 12. Where necessary, the crystallization conditions were optimized manually using 24-well Cryschem sitting drop plates (Hampton Research). The crystallization conditions for all crystal structures are listed in the PDB methods tab and the Supporting Information. The crystals were either directly frozen in liquid nitrogen or treated with a cryoprotectant (glycerol or ethylene glycol, 20–30%, vol/vol) before freezing. The X-ray diffraction data for the frozen crystals were collected at 100 K on the beamline X29A (National Synchrotron Light Source, Brookhaven National Laboratory, Upton, NY) using a wavelength of 1.075 Å or on the beamline 31-ID (LRL-CAT, Advanced Photon Source, Argonne National Laboratory, IL, USA) using a wavelength of 0.9793 Å. The diffraction data were processed and scaled with SCALA[17] (APS data) or HKL[18] (NSLS data). 
The crystal structures reported here were determined by molecular replacement using coordinates for similar structures from the PDB (listed in each PDB deposition as REMARK 200: STARTING MODEL) and PHASER MR software (the CCP4 program package suite).[19] Each structure was refined using the programs REFMAC[20] or PHENIX,[21] and the resulting models were rebuilt manually using COOT visualization and refinement software.[22] The data collection and refinement statistics for all crystal structures are listed in Table S2, Supporting Information.

Results and Discussion

Selection of Targets

In April 2011, the Structure–Function Linkage Database (SFLD; sfld.rbvi.ucsf.edu/) contained 300 sequences for the ManD subgroup of the ENS (NaManD and 299 uncharacterized homologues). These sequences were used to generate a sequence similarity network (SSN) with a BLASTP e-value threshold of 10^–80 (∼35% sequence identity) (Figure 2a).[13,23,24] As the BLASTP e-value threshold is decreased to 10^–190, the sequences segregate into several clusters sharing >70% sequence identity (Figure 2c).
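The effect of tightening the e-value threshold can be illustrated with a small sketch. This is not the software used to build the networks in this study; it only shows the idea that an edge is kept when the pairwise BLASTP e-value is at or below the threshold, and that clusters are the connected components of the resulting graph. The protein labels and e-values below are hypothetical.

```python
# Sketch: cluster sequences by connected components of a thresholded SSN.
def clusters(nodes, pair_evalues, threshold):
    """Return sorted clusters; an edge exists if e-value <= threshold."""
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), evalue in pair_evalues.items():
        if evalue <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical proteins and pairwise e-values, for illustration only.
NODES = ["P1", "P2", "P3", "P4"]
EVALUES = {("P1", "P2"): 1e-200, ("P2", "P3"): 1e-100, ("P3", "P4"): 1e-195}

if __name__ == "__main__":
    print(clusters(NODES, EVALUES, 1e-80))    # permissive threshold
    print(clusters(NODES, EVALUES, 1e-190))   # stringent threshold
```

With the permissive threshold all four hypothetical proteins fall into one cluster; with the stringent threshold the weaker P2–P3 edge is dropped and two clusters remain, which mirrors the segregation described for the ManD subgroup.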

Figure 2

Sequence similarity networks (SSNs) of the ManD subgroup at several e-value thresholds to illustrate the effect of increasing stringency on clustering. Panel A, 10^–80, ∼35% identity. Panel B, 10^–120, ∼45% identity. Panel C, 10^–190, ∼75% identity. Pink coloring indicates proteins predicted to be ManDs by the Structure Function Linkage Database. Green coloring indicates proteins that were purified and subjected to activity screening.

The genome neighborhoods of the genes encoding members of the ManD subgroup (±10 genes) were analyzed to aid in target selection for protein production and structure determination. The genome neighborhoods of some members encode 2-keto-3-deoxy-d-gluconate kinase (KdgK) and 2-keto-3-deoxy-d-gluconate-6-P aldolase (KdgA). KdgK and KdgA metabolize the 2-keto-3-deoxy-d-gluconate product of the ManD reaction to pyruvate and d-glyceraldehyde 3-phosphate, indicating a catabolic role for the proximal member of the ManD subgroup. Alternatively, for some members the genome neighborhoods lack these enzymes but contain, for example, dehydrogenases, suggesting divergent catalytic and metabolic functions. Targets for protein production by the Protein Core of the Enzyme Function Initiative (EFI; enzymefunction.org), functional characterization by the University of Illinois, and structure determination by the Structure Core of the EFI were chosen from both types of genome neighborhoods. A large number of targets (115) were chosen with the anticipation that not all targets would produce soluble, purified proteins.

Substrate Screening

Of the 115 targets, 42 were produced as soluble, purified proteins (sharing less than 95% sequence identity). The ManD SSN in Figure 2 highlights the diversity of purified proteins assayed. The proteins were screened for dehydration activity with a library of 72 acid sugars using a semicarbazide-based assay (Figure S1, Supporting Information).[25−27] The catalytically active proteins (23 of the 42 screened) utilize d-mannonate, d-gluconate, or both as substrates; no other hits were observed with the acid sugar library. Positive hits were verified with 1H NMR spectra of the products (2-keto-3-deoxy-d-mannonate/2-keto-3-deoxy-d-gluconate) before the proteins were subjected to more in-depth analyses to determine kinetic constants. The ability of some members to catalyze the dehydration of d-gluconate was not expected (vide infra). The kinetic characterizations revealed further unexpected divergence in function (Table 1). Among the newly characterized ManDs, seven dehydrate d-mannonate with catalytic efficiencies similar to that of NaManD (kcat/KM = 10^3 to 10^4 M^–1 s^–1). However, 16 targets showed low catalytic efficiencies (kcat/KM = 10^1 to 10^2 M^–1 s^–1); 19 showed no detectable activity with any member of the acid sugar library. Three of the 12 proteins that dehydrate d-mannonate with low catalytic efficiency also dehydrate d-gluconate with low catalytic efficiency. Furthermore, 4 of the 23 targets with no activity on d-mannonate dehydrate d-gluconate. Thus, the functionally characterized members were assigned into three categories according to catalytic efficiency and substrate specificity: (1) high-activity (kcat/KM = 10^3 to 10^4 M^–1 s^–1) and specific for d-mannonate; (2) low-activity (kcat/KM = 10^1 to 10^2 M^–1 s^–1) and specific for either d-mannonate or d-gluconate or promiscuous for both; and (3) no-activity with either d-mannonate or d-gluconate (or any acid sugar in the library). The SSN constructed with a threshold of 10^–190 (∼75% sequence identity) segregates groups with different catalytic efficiencies and substrate specificities (Figure 3).
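The three categories can be expressed as a simple rule on the measured kcat/KM values. The sketch below is illustrative only: the cutoff of 10^3 M^–1 s^–1 separating high from low activity is an assumption chosen to match the ranges quoted above, the function name is hypothetical, and the example values are taken from Table 1. The specificity labels follow those used in Figure 3 (M, G, M/G).

```python
# Sketch: assign activity class and substrate specificity from kcat/KM.
HIGH_CUTOFF = 1e3  # M^-1 s^-1; assumed boundary between high and low


def classify(man, glc):
    """man, glc: kcat/KM (M^-1 s^-1) for d-mannonate and d-gluconate,
    or None when no activity was detected with that substrate."""
    values = [v for v in (man, glc) if v is not None]
    if not values:
        return ("none", "-")
    level = "high" if max(values) >= HIGH_CUTOFF else "low"
    if man is not None and glc is not None:
        specificity = "M/G"
    elif man is not None:
        specificity = "M"
    else:
        specificity = "G"
    return (level, specificity)


if __name__ == "__main__":
    examples = {
        "NaManD/A4XF23": (3200, None),
        "CsManD/Q1QT89": (5, 40),
        "D4GJ14": (None, 120),
        "A5KUH4": (None, None),
    }
    for name, (man, glc) in examples.items():
        print(name, classify(man, glc))
```

Note that this rule reports any detected activity, so a high-activity ManD with a trace of d-gluconate activity would be labeled M/G even though it is described in the text as a ManD.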

Table 1

Kinetic Parameters for Members of the ManD Subgroup

Cluster	Uniprot ID	d-mannonate k_cat (s^–1)	d-mannonate k_cat/K_M (M^–1 s^–1)	d-gluconate k_cat (s^–1)	d-gluconate k_cat/K_M (M^–1 s^–1)	end of 7th β-strand	UxuA?
1	A5KUH4					Pro	yes
1	C9NUM5					Pro	no
1	C9Y5D5					Pro	yes
1	D0KC90					Pro	yes
1	A4W7D6					Pro	yes
1	D0X4R4					Pro	yes
1	A6AMN2					Pro	yes
1	Q6DAR4					Pro	yes
1	C6DI84					Pro	yes
6	B8HCK2					Pro	no
9	C9CN91					Pro	yes
9	C8ZZN2					Pro	yes
10	C7PW26					Pro	no
Singleton	A6M2W4					Pro	yes
Singleton	Q2CIN0					Pro	yes
Singleton	A8RQK7					Pro	yes
Singleton	C6CVY9					Gly	yes
Singleton	C9A1P5					Pro	yes
Singleton	B5GCP6					Pro	no
3	A6VRA1	0.02 ± 0.001	160			Ala	yes
3	E1V4Y0	0.03 ± 0.006	20	0.05 ± 0.004	20	Pro	yes
3	B3PDB1	0.03 ± 0.002	100			Ala	yes
3	CsManD/Q1QT89	0.02 ± 0.0005	5	0.04 ± 0.006	40	Pro	yes
4	Q8FHC7	0.02 ± 0.001	10			Pro	yes
4	A4WA78	0.02 ± 0.002	30			Pro	yes
4	B1ELW6	0.02 ± 0.001	20			Pro	yes
4	D8ADB5	0.01 ± 0.002	30			Pro	yes
4	J7KNU2	0.01 ± 0.001	20			Pro	yes
4	B5RAG0	0.01 ± 0.001	50			Pro	yes
5	D4GJ14			0.04 ± 0.003	120	Pro	yes
5	B5R541			0.05 ± 0.003	80	Pro	yes
5	B5QBD4			0.02 ± 0.0005	150	Pro	yes
5	C6CBG9	0.04 ± 0.002	50	0.03 ± 0.002	60	Pro	yes
Singleton	D9UNB2	0.004 ± 0.001	60			Pro	yes
8	D7BPX0			0.01 ± 0.002	40	Pro	no
2	Q1NAJ2	2 ± 0.07	4200			Ala	no
2	Q9AAR4	1 ± 0.006	12300			Ala	no
2	Q9A4L8	0.65 ± 0.02	1200			Ala	no
2	B0T4L2	0.3 ± 0.01	1200			Ala	no
2	B0T0B1	2 ± 0.2	12100	0.003 ± 0.001	5	Ala	no
2	A5V6Z0	4 ± 0.2	2900	0.01 ± 0.001	10	Ala	no
2	NaManD/A4XF23	1.3 ± 0.1	3200			Ala	no
7	G7TAD9	0.8 ± 0.03	4400			Ala	no

Figure 3

SSN (e-value threshold of 10^–190) showing the distribution of high- (green), low- (blue), and no-activity (red) proteins along with substrate specificities (M, d-mannonate; G, d-gluconate; M/G, d-mannonate and d-gluconate). Proteins for which structures were determined are marked with asterisks. The Pro and Ala residues associated with different substrate specificities for d-mannonate and d-gluconate are located in separate clusters: clusters 1, 4, 5, 6, 8, 9, and 10 contain Pro; clusters 2 and 7 contain Ala; and cluster 3 contains both. Pro-containing clusters exhibit low or no dehydration activity; Ala-containing clusters exhibit high dehydration activity with d-mannonate.

Divergence in Activity

Members with different in vitro activities are assumed to have different in vivo functions. Physiologically, the dehydration of d-mannonate to 2-keto-3-deoxy-d-mannonate is found in the d-glucuronate degradation pathway in which d-glucuronate is isomerized to 5-keto-d-mannonate and then reduced to d-mannonate. The d-mannonate is dehydrated, phosphorylated, and cleaved by an aldolase to form pyruvate and glyceraldehyde 3-phosphate. When this pathway was discovered in E. coli and Erwinia carotovora, dehydration of d-mannonate was found to be catalyzed by a dehydratase, UxuA, that is not a member of the ENS.[28,29] Therefore, the discovery that members of the ManD subgroup dehydrate d-mannonate with high catalytic efficiency implies convergent evolution of function in different superfamilies within the d-glucuronate catabolic pathway. Interestingly, the genomes of all of the organisms encoding high-activity ManDs lack the gene encoding UxuA; however, the genomes of the majority of organisms with low- or no-activity ManDs have a gene encoding UxuA (Table 1). This suggests that the high-activity ManDs perform the same role as UxuA in the encoding organisms; however, the low-activity members have a different metabolic function. In those organisms that encode low- or no-activity members of the ManD subgroup but no UxuA, growth on d-glucuronate likely is enabled by an alternate catabolic pathway, such as the uronate dehydrogenase or KduI pathway.[30,31] Therefore, members with low in vitro catalytic efficiencies likely have different in vivo metabolic functions, even if they can dehydrate d-mannonate. In work that will be described elsewhere, we are characterizing some of these divergent physiological functions.

d-Gluconate Dehydration

Although, in retrospect, the discovery of d-gluconate as a substrate for some members is not surprising because d-gluconate and d-mannonate are epimers at carbon-2 so they yield the same dehydration product, this stereochemical difference requires that a base other than the Tyr-Arg dyad in the “150–180s” loop abstract the 2-proton from d-gluconate. To investigate which residue could function as the d-gluconate specific base, d-mannonate and d-gluconate were modeled into the active site of the member from Chromohalobacter salexigens (CsManD) (Uniprot ID Q1QT89) that dehydrates both d-mannonate and d-gluconate (PDB code 3BSM). This was accomplished by superposing the structures of NaManD with d-mannonate in its active site (2QJM), Uniprot ID B5R541 with d-gluconate in its active site (3TWB), and unliganded CsManD (3BSM) (Figure 4). On the basis of this comparison, we hypothesized that the conserved His after the seventh β-strand is the base in d-gluconate dehydration (His 315 in CsManD).

Figure 4

A superposition of a structure with d-mannonate bound in the active site (2QJM, NaManD) with one with d-gluconate bound in the active site (3TWB; CsManD). In 2QJM, Tyr 159 is the general base that abstracts the 2-proton and His 212 is the general acid that catalyzes the departure of the 3-OH group from d-mannonate. In 3TWB, His 315 is proposed to be the general base that abstracts the 2-proton from d-gluconate or hydrogen bonds with the C5 hydroxyl of d-mannonate. The ε-nitrogen of His 315 is 3.0 Å from the C5 hydroxyl of d-mannonate and 3.1 Å from C2 of d-gluconate. Both distances are appropriate for proton abstraction or hydrogen bonding.

Site-directed mutagenesis was performed to convert His 315 in CsManD to either Asn or Gln. His 315 is conserved in all members of the ManD subgroup, including NaManD, because it hydrogen bonds to the 5-hydroxyl group of d-mannonate (Figure S2, Supporting Information).[12] Both mutations abolished dehydration activity with d-gluconate. However, the H315Q mutant maintained wild-type catalytic efficiency with d-mannonate (Table 2). In contrast, the H315N mutant was inactive with d-mannonate, presumably because it is not able to hydrogen bond to the 5-hydroxyl group. These studies support the suggested role of His 315 as the base for dehydration of d-gluconate.

Table 2

Kinetic Parameters for His 315 Mutants of CsManD^a

	d-mannonate			d-gluconate
protein	WT	H315Q	H315N	WT	H315Q	H315N
k_cat/K_M (M^–1 s^–1)	10	10	NA	40	NA	NA

^a WT = wild type; NA = no activity.

Structure Analysis

“New” crystal structures were solved for 12 sequence diverse members of the subgroup; structures previously were available for four other members. Taken together, a total of 36 unliganded and liganded structures are now available for members of the ManD subgroup (Table S3, Supporting Information). These structures were used to identify the general base that initiates dehydration of d-gluconate and also yielded insights into how structure diverges as sequence diverges. An overlay of one structure from each structure-containing cluster is shown in Figure 5 (the overlay includes only structures with ordered “150–180s” loops). The structures of the modified TIM-barrel and capping domains are highly conserved, although the conformations of the “150–180s” loop are divergent. The sequences of this loop are also highly variable (Figure S3, Supporting Information).

Figure 5

An overlay of NaManD (blue), CsManD (tan), Uniprot ID Q8FHC7 (green), Uniprot ID B5R541 (magenta), Uniprot ID A5KUH4 (red), and Uniprot ID A6M2W4 (gray) showing the overall structural homology. The “150–180s” loops are conformationally distinct.

Initially, the “150–180s” loops were proposed to contain the substrate specificity determinants in analogy with the “20s” loops in other ENS members.[32] Of the 36 structures available (Table S3, Supporting Information), 9 have an ordered “150–180s” loop covering the active site, 3 with a substrate bound. Seven of the 27 structures with disordered “150–180s” loops also have a substrate bound. The disorder of the loop, even in the presence of substrate, suggests that the substrate and the “150–180s” loop are not interacting strongly. In the structures with a bound substrate and an ordered loop, the only hydrogen bonds to the substrate involve ordered water molecules and backbone amide groups. This suggests a role other than determining substrate specificity for the “150–180s” loops. To determine the importance of the sequence of the “150–180s” loops, segments of the loops in CsManD (A164 to E172) and NaManD (V161 to E169) that are proximal to the substrate were mutated to AGAGGAGAG (Figure 6) to eliminate any ionic or hydrogen-bonding interactions with the substrate. Both mutants were expressed and purified as soluble proteins; correct folding was verified via circular dichroism and X-ray crystallography (Figures S4 and S5, Supporting Information). Screening of both mutants with the acid sugar library showed a complete loss of activity. We conclude that this loop is important for catalysis, although its exact contribution is unknown.

Figure 6

An overlay of CsManD (4F4R, blue ribbon, orange loop) and NaManD (2QJJ, tan ribbon, green loop) showing the regions of their “150–180s” loops that were mutated. The sequences of the loops are given with the area of mutagenesis highlighted with yellow.

Structural Differences between d-Mannonate/d-Gluconate Specific ManDs

The structures of proteins with divergent functions are very similar. A superposition of the structures of a high-activity ManD (NaManD), a low-activity ManD (Uniprot ID Q8FHC7), a low-activity GlcD (Uniprot ID B5R541), and a member with no activity (Uniprot ID A5KUH4) reveals that the identities and positions of the active site residues are conserved (Figure 7). In addition, the residues within 6 Å of the substrate are conserved. The primary site of divergence in the active site is a Pro/Ala substitution two residues after the conserved His at the end of the seventh β-strand; this His hydrogen bonds to the 5-hydroxyl group of the substrate in the ManD reaction or is the general base in the GlcD reaction (Figure 4). When this substitution is mapped onto the SSN, low- and no-activity proteins possess Pro at this position, but high-activity ManDs possess Ala (Figure 3). The single exception is cluster 3, which has only recently separated from cluster 2 (i.e., the two are connected at an e-value of 10⁻¹⁸⁴). Cluster 2 includes high-activity ManDs that contain Ala; cluster 3 contains low-activity proteins that contain either Ala or Pro. Members that contain Pro dehydrate both d-mannonate and d-gluconate, but members with Ala dehydrate only d-mannonate. Two of the 8 high-activity ManDs also dehydrate d-gluconate, but with very low catalytic efficiency. We hypothesize that the proteins in cluster 3 may be intermediates in the evolution of the GlcD function.

Figure 7

An overlay of the active sites of NaManD (2QJJ, high-activity, d-mannonate specific - red), CsManD (4F4R, low-activity, promiscuous for d-mannonate/d-gluconate - blue), Uniprot ID D4GJ14 (3T6C, low-activity, d-gluconate specific - green), and Uniprot ID A4W7D6 (3TJI, no-activity - magenta). The metal binding and acid/base residues are superimposable. The Pro/Ala dimorphism is also shown. The ligands are d-mannonate from 2QJM (red) and d-gluconate from 3T6C (green).

This hypothesis was investigated by constructing the NaManD A314P and CsManD P317A mutants. The A314P mutant of NaManD maintained its specificity for d-mannonate, but with a reduced catalytic efficiency (from 3200 M⁻¹ s⁻¹ to 60 M⁻¹ s⁻¹). The P317A mutant of CsManD retained low activity with both d-mannonate and d-gluconate; the catalytic efficiency for d-mannonate increased (from 5 M⁻¹ s⁻¹ to 100 M⁻¹ s⁻¹), but that for d-gluconate decreased (from 40 M⁻¹ s⁻¹ to 5 M⁻¹ s⁻¹). Thus, we conclude that (1) Ala favors d-mannonate dehydration and disfavors d-gluconate dehydration; and (2) Pro disfavors d-mannonate dehydration. A restriction of backbone flexibility may explain the observed change in substrate specificity and catalytic efficiency, with the flexibility associated with Ala allowing hydrogen bonding to the substrate and the rigidity associated with Pro promoting general-base catalysis.
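To make the size of these effects easier to compare, the sketch below computes the fold change in kcat/KM for each mutant and substrate from the values quoted above. The efficiencies are taken from the text; the dictionary layout and the `fold_change` helper are illustrative assumptions, not part of the original analysis.

```python
# kcat/KM values (M^-1 s^-1) are quoted from the text as (wild type, mutant);
# the data layout and helper name are hypothetical, for illustration only.
EFFICIENCIES = {
    ("NaManD A314P", "d-mannonate"): (3200.0, 60.0),
    ("CsManD P317A", "d-mannonate"): (5.0, 100.0),
    ("CsManD P317A", "d-gluconate"): (40.0, 5.0),
}


def fold_change(before: float, after: float) -> float:
    """Ratio of the mutant value to the wild-type value."""
    return after / before


for (mutant, substrate), (wild_type, variant) in EFFICIENCIES.items():
    ratio = fold_change(wild_type, variant)
    direction = "increase" if ratio > 1 else "decrease"
    # Report the magnitude as a number >= 1 regardless of direction.
    magnitude = ratio if ratio > 1 else 1 / ratio
    print(f"{mutant}, {substrate}: {magnitude:.1f}-fold {direction}")
```

On these numbers, A314P lowers the efficiency of NaManD for d-mannonate roughly 53-fold, whereas P317A raises the efficiency of CsManD for d-mannonate 20-fold and lowers that for d-gluconate 8-fold.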

Proteins with No Activity

The genes encoding many inactive members of the ManD subgroup are neighbors of genes encoding 2-keto-3-deoxy-d-gluconate kinase and 2-keto-3-deoxy-d-gluconate-6-phosphate aldolase that are downstream of ManD in the d-glucuronate catabolic pathway (Figure S6, Supporting Information). However, they have no activity with d-mannonate or the three carbon-2 or -3 epimers that would be dehydrated to 2-keto-3-deoxy-d-mannonate (d-altronate, d-allonate, or d-gluconate). Variations in pH, temperature, salt concentration, osmolarity, osmolytes, and metal cations as well as the presence of dithiothreitol and nucleotide mono-, di-, and triphosphates were tested to determine whether these enzymes may be subject to unanticipated regulation; however, no changes in activity were observed. d-Mannonate 6-phosphate and d-gluconate 6-phosphate also were tested, but no activity was observed. Perhaps these proteins function in multienzyme complexes (channeling) or utilize an acid sugar that is not present in our library.

Conclusions

The ManD subgroup of the ENS is an excellent example of homologous proteins that have divergent catalytic efficiencies and substrate specificities. Unexpectedly, we discovered that the ManD subgroup is not isofunctional: in addition to the ManDs, some members catalyze the dehydration of d-gluconate, and others are promiscuous for dehydration of both d-mannonate and d-gluconate. Clearly, automated methods would provide misleading/incorrect annotations that would be of limited/no value in deducing their metabolic functions.

We have also determined that the structural determinants of substrate specificity are both indirect and subtle: a Pro/Ala substitution appears to be the major determinant of specificity. In addition, the role of the sequence-divergent and conformationally flexible “150–180s” loop is uncertain. The side chains of the loop make no direct contacts with the substrate, so the loop does not appear to be a determinant of substrate specificity, in contrast to the well-established role of the corresponding loops in members of the muconate lactonizing enzyme (MLE) and mandelate racemase (MR) subgroups in the ENS. Although we attempted to determine whether the loop is involved in protein–protein interactions/substrate channeling, we could not obtain any conclusive results.

We also discovered that the catalytic efficiencies of members of the subgroup are highly variable, despite conservation of active site residues and structures. These data may provide important insights for the metabolic engineering community. The data illustrate how closely related sequences can perform different reactions and, therefore, may help guide studies that aim to redesign proteins for use in new pathways. Enzymological dogma has been that enzymes have evolved to achieve catalytic perfection; i.e., the reactions are diffusion-controlled. In vitro experiments alone cannot and will not provide biological insights into why the values of kcat/KM for some of the members of the subgroup are “low”. We have selected several of these members for future biological/metabolic characterization so that we might better understand the relationship between catalytic efficiencies and metabolic requirements.
