The continued increase in the size of the protein sequence databases as a result of advances in genome sequencing technology is overwhelming the ability to perform experimental characterization of function. Consequently, functions are assigned to the vast majority of proteins via automated, homology-based methods, with the result that as many as 50% are incorrectly annotated or unannotated (Schnoes et al. PLoS Comput. Biol. 2009, 5 (12), e1000605). This manuscript describes a study of the D-mannonate dehydratase (ManD) subgroup of the enolase superfamily (ENS) to investigate how function diverges as sequence diverges. Previously, one member of the subgroup had been experimentally characterized as ManD [dehydration of D-mannonate to 2-keto-3-deoxy-D-mannonate (equivalently, 2-keto-3-deoxy-D-gluconate)]. In this study, 42 additional members were characterized to sample sequence-function space in the ManD subgroup. These were found to differ in both catalytic efficiency and substrate specificity: (1) high efficiency (kcat/KM = 10³ to 10⁴ M⁻¹ s⁻¹) for dehydration of D-mannonate, (2) low efficiency (kcat/KM = 10¹ to 10² M⁻¹ s⁻¹) for dehydration of D-mannonate and/or D-gluconate, and (3) no activity with either D-mannonate or D-gluconate (or any other acid sugar tested). Thus, the ManD subgroup is not isofunctional and includes D-gluconate dehydratases (GlcDs) that are divergent from the GlcDs that have been characterized in the mandelate racemase subgroup of the ENS (Lamble et al. FEBS Lett. 2004, 576, 133-136) (Ahmed et al. Biochem. J. 2005, 390, 529-540). These observations signal caution for functional assignment based on sequence homology and lay the foundation for the studies of the physiological functions of the GlcDs and the promiscuous ManDs/GlcDs.
The massive
influx of sequence
data since the first bacterial genome sequence was published in 1995
has necessitated a reliance on homology-based annotations of protein
function.[1,2] However, because this method assigns the
function of the “closest” homologue, an estimated 30–50%
of the functional annotations in the databases are incorrect,[3−5] with the magnitude of the problem increasing as the incorrect annotations
are propagated in assigning functions to proteins discovered in newly
sequenced genomes. In a study of several functionally diverse superfamilies,
Schnoes, Babbitt, and co-workers concluded that 85% of misannotations
resulted from annotations that are more detailed than justified.[3] Automated methods often are able to achieve high
degrees of accuracy in the transfer of the first three Enzyme Commission
(EC) code numbers, but accurate transfer of the fourth EC code number
(substrate specificity) is much more difficult.[6] This study examines the d-mannonate dehydratase
(ManD) subgroup of the enolase superfamily (ENS) to determine experimentally,
on a large scale, how function diverges as sequence diverges in highly
homologous enzymes. Our results illustrate the difficulty of accurately
assigning function via homology-based methods and, also, provide insights
into how different functions can arise in highly homologous enzymes.

Two conserved features are shared by members of the ENS: mechanism
and structure. The mechanism is general base-catalyzed abstraction
of a proton alpha to a carboxylate group of the substrate to form
an enediolate intermediate.[7] The enediolate
intermediate is stabilized by coordination to an active site divalent
metal cation (usually Mg2+). Furthermore, members of the
ENS share a common structural motif: an (α + β) capping
domain that contains the residues that determine substrate specificity
and a modified TIM-barrel domain ((β/α)7β-barrel)
that contains the residues that mediate acid/base chemistry.[8,9] Subgroups are differentiated by the conserved metal-binding residues
at the ends of the third, fourth, and fifth β-strands as well
as conserved catalytic acid/base residues at the ends of the second,
third, sixth, and/or seventh β-strands of the modified TIM-barrel
domain. The ENS is particularly interesting because its members share
a common mechanism and the same structural motif but are functionally
diverse (e.g., β-elimination and 1,1-proton transfer reactions).[10,11] Thus, the ENS is a good model to investigate how function diverges
as sequence diverges.

In 2007, Rakus and co-workers discovered
the ManD subgroup of the
ENS.[12] In that study, a protein from Novosphingobium aromaticivorans (NaManD)
was structurally characterized and discovered to catalyze the syn-dehydration of d-mannonate to 2-keto-3-deoxy-d-mannonate (equivalently, 2-keto-3-deoxy-d-gluconate)
(EC 4.2.1.8 and Uniprot ID A4XF23). The Mg2+-binding residues
located at the ends of the third, fourth, and fifth β-strands
are Asp 210, Glu 236, and Glu 262, respectively. The general base
that abstracts the 2-proton is the Tyr 159-Arg 147 dyad in the “150–180s”
loop between the second and third β-strands; the general acid
that facilitates departure of the 3-hydroxyl group is His 212 located
at the end of the third β-strand[12] (Figure 1). His 315, located at the end of
the seventh β-strand that is hydrogen-bonded to the 5-hydroxyl
group of the d-mannonate substrate, is also conserved. Since
this study, no other members of the ManD subgroup have been experimentally
characterized. Given the conservation of metal-binding residues, catalytic
residues, and active site architecture, the assumption was that the
entire subgroup is isofunctional.
Figure 1
Structural features of the
ManDs (PDB 2QJJ). On the left, the
“150–180s” loop (blue), TIM barrel (red), and
capping domain (green) are displayed. The right inset shows the active
site residues: metal binding Asp210, Glu236, Glu262 (green); Tyr157-Arg149
catalytic dyad (magenta); acidic His212 (blue); and conserved His315
at the end of the 7th β-strand (red). The d-mannonate
ligand from the 2QJM structure is shown in light blue.
In this study, we sought to investigate how function
diverges as
sequence diverges within this subgroup and to determine if the ManD
subgroup is, in fact, isofunctional. When the target proteins for
this study were selected in April 2011, the ManD subgroup included NaManD[12] and 299 uncharacterized
proteins that share ≥35% sequence identity [Structure–Function
Linkage Database (http://sfld.rbvi.ucsf.edu/)]. [At the
time of submission of this manuscript, the UniProtKB database contained
the sequences for 2919 members of the ManD subgroup.] Forty-three
members representing the breadth of sequence–function space
were produced as soluble proteins and screened for activity using
a library of acid sugars. Surprisingly, we found that the ManD subgroup
is not isofunctional; instead, in addition to ManDs
it also contains d-gluconate dehydratases (GlcDs) that catalyze
the anti-dehydration of d-gluconate to 2-keto-3-deoxy-d-gluconate as well as promiscuous proteins that catalyze both
the ManD and GlcD reactions (Scheme 1). In
addition, a wide range of catalytic efficiencies (values of kcat/KM) were discovered.
Using sequence similarity networks (SSNs),[13] the members with these divergent functions could be separated into
isofunctional clusters. Furthermore, 16 unique crystal structures
(for a total of 36 unliganded and liganded structures) were solved
to survey sequence and structure space; these revealed conserved active
site structures but divergent conformations for the “150–180s”
loops that contain the general basic Tyr-Arg dyads and close over
the active site to sequester the substrate from solvent. Taken together,
the functional and structural data provide a comprehensive description
of how in vitro function diverges as sequence diverges.
Scheme 1
Materials and Methods
Cloning, Expression, and Purification of Targets (AECOM)
pNIC28-BSA4-based expression vectors were
transformed into BL21(DE3) Escherichia coli containing
the pRIL plasmid (Stratagene)
and used to inoculate a 10 mL 2xYT culture containing 25 μg/mL
kanamycin and 34 μg/mL chloramphenicol. The cultures were allowed
to grow overnight at 37 °C in a shaking incubator. The overnight
culture was used to inoculate 2 L of PASM-5052 autoinduction media.[14] The culture was placed in a LEX48 airlift fermenter
and incubated at 37 °C for 4 h and then at 22 °C overnight.
The culture was harvested and pelleted by centrifugation.

Cells were resuspended in lysis buffer (20 mM HEPES, pH 7.5, 500 mM NaCl,
20 mM imidazole, and 10% glycerol) and lysed by sonication. The lysate
was clarified by centrifugation at 35 000g for 30
min. The protein was purified using an AKTAxpress FPLC (GE Healthcare).
The lysate was loaded onto a 1 mL His60 column (Clontech), washed
with 10 column volumes of lysis buffer, and eluted with buffer containing
20 mM HEPES, pH 7.5, 500 mM NaCl, 500 mM imidazole, and 10% glycerol.
This partially purified protein was loaded onto a HiLoad S200 16/60
PR gel filtration column equilibrated with SECB buffer (20 mM HEPES,
pH 7.5, 150 mM NaCl, 10% glycerol, and 5 mM DTT). The protein was analyzed
by SDS-PAGE, flash frozen in liquid nitrogen, and stored at −80 °C.
Expression and Purification of N-Terminal His-Tagged Proteins (UIUC)
Genes in pET15b (Novagen) were expressed in E. coli strain BL21. Small-scale cultures were grown at
37 °C for 18 h in 5 mL of LB containing 100 μg/mL ampicillin
and used to inoculate 1 L LB containing 100 μg/mL ampicillin.
IPTG (500 μM) was added at OD600 nm = 0.6–0.8
to induce expression. The induced cultures then were grown for an
additional 18 h at 37 °C. The cells were harvested by centrifugation
at 5000 rpm for 10 min and resuspended in 70 mL of binding buffer
(6 mM imidazole, 20 mM Tris-HCl, pH 7.9, 5 mM MgCl2, and
500 mM NaCl). The resuspended cells were lysed by sonication and centrifuged
at 17 000 rpm for 30 min. The supernatant was loaded onto a
column of 50 mL chelating Sepharose Fast Flow (Amersham Biosciences)
charged with Ni2+ and eluted with a linear gradient of
imidazole (0–1 M over 600 mL). Fractions were analyzed using
SDS-PAGE. Fractions containing protein at high purity (>90%) were
combined and dialyzed against 4 L of buffer containing 100 mM imidazole,
20 mM Tris-HCl, pH 7.9, 10 mM MgCl2, 150 mM NaCl, and 10%
glycerol for 2 h at 4 °C. The protein was dialyzed in this manner
a total of three times. Finally, the protein was concentrated to a
maximum of ∼10 mg/mL (depending on solubility) and flash-frozen
using liquid nitrogen and stored at −80 °C.
Expression and Purification of Tagless ManD Constructs (UIUC)
The genes
in pET17b (Novagen) were expressed in E. coli strain
BL21. Small-scale cultures were grown at 37 °C for 18
h in 5 mL of LB containing 100 μg/mL ampicillin and used to
inoculate 1 L LB containing 100 μg/mL ampicillin. The 1 L cultures
were grown for an additional 18 h at 37 °C without induction.
The cells were harvested by centrifugation at 5000 rpm for 10 min
and resuspended in 70 mL of binding buffer. The resuspended cells
were lysed by sonication and centrifuged at 17 000 rpm for
30 min. The supernatant was loaded onto a 300 mL DEAE Sepharose column
(Amersham Biosciences) and eluted with a linear gradient of NaCl (0–1
M over 1.6 L) in 10 mM Tris-HCl, pH 7.9, containing 5 mM MgCl2. Fractions containing the protein of interest at high purity
were combined and dialyzed against 4 L buffer containing 10 mM Tris-HCl,
pH 7.9, and 5 mM MgCl2 for 2 h at 4 °C. The dialyzed
protein was then loaded onto a 30 mL Q-Sepharose column (Amersham
Biosciences) and eluted with a linear gradient of NaCl (0–1
M over 500 mL) in 10 mM Tris-HCl, pH 7.9, containing 5 mM MgCl2. Fractions containing the protein of interest at high purity
were combined and dialyzed against 4 L buffer containing 10 mM Tris-HCl,
pH 7.9, and 5 mM MgCl2 for 2 h at 4 °C. Ammonium
sulfate was added to a final concentration of 1 M, and the sample
was loaded onto a 30 mL Phenyl Sepharose column (Amersham Biosciences).
The protein was eluted with a gradient of ammonium sulfate (1–0
M over 500 mL) in 10 mM Tris-HCl, pH 7.9, containing 5 mM MgCl2. Fractions with pure protein (SDS-PAGE) were combined and
dialyzed against 4 L buffer containing 10 mM Tris-HCl, pH 7.9, 5 mM
MgCl2, 150 mM NaCl, and 10% glycerol for 2 h at 4 °C.
Finally, the protein was concentrated to a maximum of ∼10 mg/mL
(depending on solubility) and flash frozen in liquid nitrogen and
stored at −80 °C.
Screen for Dehydration
Reactions to test for enzymatic
activity were performed in acrylic, UV transparent 96-well plates
(Corning Incorporated) using a library of 72 acid sugars (Figure S1, Supporting Information). Reactions
(60 μL) contained 50 mM HEPES, pH 7.9, 10 mM MgCl2, 1 μM enzyme, and 1 mM of acid sugar substrate (blanks without
enzyme). The plates were incubated at 30 °C for 16 h. A 1% semicarbazide
reagent solution (240 μL) was added to each well and incubated
for 1 h at room temperature. The absorbances were measured at 250
nm (ε = 10 200 M⁻¹ cm⁻¹) using a Tecan Infinite M200PRO plate reader.
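The conversion from absorbance to product concentration in this screen follows the Beer-Lambert law. The following is a minimal sketch in Python, not the authors' analysis code. It assumes a 1 cm effective path length (the true path length in a 96-well plate depends on the fill volume, so `path_cm` is an assumption) and applies the 5-fold dilution implied by adding 240 μL of reagent to a 60 μL reaction. The function name and the example readings are hypothetical.

```python
# Beer-Lambert conversion for the semicarbazide end-point screen (illustrative sketch).
EPSILON_250 = 10_200.0  # M^-1 cm^-1, extinction coefficient quoted in the text
REACTION_UL = 60.0      # reaction volume quoted in the text
REAGENT_UL = 240.0      # semicarbazide reagent volume quoted in the text


def product_conc_mM(a_sample, a_blank, path_cm=1.0):
    """Return the product concentration (mM) referred to the 60 uL reaction.

    path_cm is an assumed effective path length, not a value from the text.
    """
    delta_a = a_sample - a_blank                      # subtract the enzyme-free blank
    conc_in_well_M = delta_a / (EPSILON_250 * path_cm)
    dilution = (REACTION_UL + REAGENT_UL) / REACTION_UL  # 5-fold dilution by the reagent
    return conc_in_well_M * dilution * 1000.0         # M -> mM


# Hypothetical readings: sample well 0.612, enzyme-free blank 0.102.
print(round(product_conc_mM(0.612, 0.102), 3))
```

With 1 mM substrate in each well, a value near 1 mM from this calculation would correspond to complete turnover during the 16 h incubation.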
Kinetic Assays
Dehydration of d-mannonate
and d-gluconate was quantitated using either a discontinuous
assay with the semicarbazide reagent[15,16] or a continuous,
coupled-enzyme spectrophotometric assay. In the latter assay, the
product was phosphorylated using 2-keto-3-deoxy-d-gluconate
kinase (KdgK) and ATP; formation of ADP was measured using pyruvate
kinase (PK) and l-lactate dehydrogenase (LDH). The assay
(200 μL) at 25 °C contained 50 mM potassium HEPES, pH 7.5,
5 mM MgCl2, 1.5 mM ATP, 1.5 mM PEP, 0.16 mM NADH, 9 units
of PK, 9 units of LDH, 18 units of KdgK, and ManD/GlcD. Dehydration
was quantitated by measuring the decrease in absorbance at 340 nm
(ε = 6220 M⁻¹ cm⁻¹). Low-activity
enzymes were characterized using the discontinuous assay; high-activity
ManDs were characterized using the coupled assay.
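The arithmetic behind the coupled assay can be sketched as follows. This is an illustrative Python example rather than the authors' procedure: it assumes that one NADH is oxidized per molecule of dehydration product (the stoichiometry expected for the KdgK/PK/LDH coupling) and a 1 cm path length. The function names and the example numbers are hypothetical.

```python
# Converting a coupled-assay absorbance slope into kinetic quantities (illustrative sketch).
EPSILON_340 = 6220.0  # M^-1 cm^-1 for NADH, as quoted in the text


def turnover_rate(slope_a340_per_min, enzyme_uM, path_cm=1.0):
    """Return the apparent turnover rate v/[E] in s^-1.

    slope_a340_per_min: change in A340 per minute (the sign is ignored).
    enzyme_uM: ManD/GlcD concentration in the cuvette, in micromolar.
    Assumes 1 NADH oxidized per product formed and a path length of path_cm.
    """
    rate_M_per_s = abs(slope_a340_per_min) / (EPSILON_340 * path_cm) / 60.0
    return rate_M_per_s / (enzyme_uM * 1e-6)


def michaelis_menten(s_mM, kcat, km_mM):
    """Turnover rate (s^-1) predicted by the Michaelis-Menten equation."""
    return kcat * s_mM / (km_mM + s_mM)


def catalytic_efficiency(kcat, km_mM):
    """Return kcat/KM in M^-1 s^-1 from kcat (s^-1) and KM (mM)."""
    return kcat / (km_mM * 1e-3)


# Hypothetical example: a slope of -0.0622 A340/min with 0.1 uM enzyme.
print(round(turnover_rate(-0.0622, 0.1), 2))
# Hypothetical kinetic constants: kcat = 2 s^-1, KM = 0.5 mM.
print(round(catalytic_efficiency(2.0, 0.5)))
```

In practice the rates measured at several substrate concentrations would be fitted to `michaelis_menten` to obtain kcat and KM; the fitting step is omitted here.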
Site-Directed Mutagenesis (SDM)
Mutants were constructed
using primers designed with the Agilent Technologies online webserver
(https://www.genomics.agilent.com/) and purchased from
Bio-Synthesis, Inc. Forward and reverse primers containing the mutations
of interest are listed in Table S1, Supporting
Information. PCR reactions (30 μL) contained 1 mM MgCl2, 1× Pfx Amp buffer, 0.33 mM dNTP, 0.33 μM of FOR/REV
primer, and 1.25 units Pfx polymerase (Invitrogen Platinum Pfx DNA
Polymerase kit). The templates were 50 ng ManD-containing pET15b (NaManD) or pET17b (CsManD). Amplifications
were performed according to the manufacturer’s guidelines.
After addition of DpnI (10 units), the reactions were incubated for
4 h at 37 °C. The DpnI-digested products were purified by gel
electrophoresis, extracted, and transformed (electroporation, Bio-Rad
MicroPulser Electroporator) into XL1 Blue competent cells. Finally,
plasmids isolated from the transformants were sequenced to confirm
the mutations.
Circular Dichroism of NaManD and CsManD Loop Mutants
The circular dichroism spectrum of a 10 μM
solution of mutant
enzyme in an optically clear borate buffer (50 mM boric acid, 100
mM KCl, 0.7 mM DTT, pH 8.0) was measured from 190 to 260 nm using
a Jasco J-715 spectropolarimeter. Five replicate measurements were
made.
Protein Crystallization and X-ray Diffraction Data Collection
Proteins were crystallized by the sitting-drop vapor diffusion
method. The concentrated (usually 5–40 mg/mL) protein solutions
(from 0.3 to 1 μL) were mixed with an equal volume of a precipitant
solution and equilibrated at room temperature (∼294 K) against
the same precipitant solution in clear tape-sealed 96-well INTELLI-plates
(Art Robbins Instruments, Sunnyvale, CA). Crystallization was performed
using either a TECAN crystallization robot (TECAN US, Research Triangle
Park, NC) or a PHOENIX crystallization robot (Art Robbins Instruments)
and four types of commercial crystallization screens: the WIZARD I&II
screen (Emerald BioSystems, Bainbridge Island, WA); the INDEX HT and
the CRYSTAL SCREEN HT (both from Hampton Research, Aliso Viejo, CA);
and the MCSG screen (Microlytic, Woburn, MA). The appearance of protein
crystals was monitored either by visual inspection or using a
Rock Imager 1000 (Formulatrix, Waltham, MA) starting within 24 h of
incubation and again at weeks 1, 2, 3, 5, 8, and 12. Where necessary,
the crystallization conditions were optimized manually using 24-well
Cryschem sitting drop plates (Hampton Research). The crystallization
conditions for all crystal structures are listed in the PDB methods
tab and the Supporting Information. The
crystals were either directly frozen in liquid nitrogen or treated
with a cryoprotectant (glycerol or ethylene glycol, 20–30%,
vol/vol) before freezing.

The X-ray diffraction data for the
frozen crystals were collected at 100 K on the beamline X29A (National
Synchrotron Light Source, Brookhaven National Laboratory, Upton, NY)
using a wavelength of 1.075 Å or on the beamline 31-ID (LRL-CAT,
Advanced Photon Source, Argonne National Laboratory, IL, USA) using
a wavelength of 0.9793 Å. The diffraction data were processed
and scaled with SCALA[17] (APS data) or HKL[18] (NSLS data). The crystal structures reported
here were determined by molecular replacement using coordinates for
similar structures from the PDB (listed in each PDB deposition as
REMARK 200: STARTING MODEL) and PHASER MR software (the CCP4 program
package suite).[19] Each structure was refined
using the programs REFMAC[20] or PHENIX,[21] and the resulting models were rebuilt manually
using COOT visualization and refinement software.[22] The data collection and refinement statistics for all crystal
structures are listed in Table S2, Supporting
Information.
Results and Discussion
Selection of Targets
In April 2011, the Structure–Function
Linkage Database (SFLD; sfld.rbvi.ucsf.edu/) contained 300 sequences
for the ManD subgroup of the ENS (NaManD and 299
uncharacterized homologues). These sequences were used to generate
a sequence similarity network (SSN) with a BLASTP e-value threshold
of 10⁻⁸⁰ (∼35% sequence identity) (Figure 2a).[13,23,24] As the BLASTP e-value threshold is decreased to 10⁻¹⁹⁰, the sequences segregate into several clusters sharing >70% sequence
identity (Figure 2c).
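The effect of the e-value threshold on clustering can be illustrated with a small sketch. The Python example below is not the software used to build the networks in this study; it assumes that an edge joins two sequences whenever their pairwise BLASTP e-value is at or below the threshold and that clusters are the connected components of the resulting graph. The sequence names and e-values are hypothetical toy data.

```python
from collections import defaultdict


def ssn_clusters(nodes, pair_evalues, threshold):
    """Return the connected components of a sequence similarity network.

    An edge is drawn between two sequences when their e-value <= threshold.
    Components are found with a simple union-find structure.
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for (a, b), evalue in pair_evalues.items():
        if evalue <= threshold:
            parent[find(a)] = find(b)      # merge the two components

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    # Largest clusters first, ties broken alphabetically.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# Hypothetical toy data: four sequences and their pairwise e-values.
nodes = ["seqA", "seqB", "seqC", "seqD"]
pair_evalues = {
    ("seqA", "seqB"): 1e-200,
    ("seqB", "seqC"): 1e-150,
    ("seqC", "seqD"): 1e-90,
    ("seqA", "seqD"): 1e-85,
}
loose = ssn_clusters(nodes, pair_evalues, 1e-80)    # permissive threshold
strict = ssn_clusters(nodes, pair_evalues, 1e-190)  # stringent threshold
print(loose)
print(strict)
```

At the permissive threshold all four toy sequences fall into one cluster; at the stringent threshold only the most similar pair remains connected, mirroring the segregation described above.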
Figure 2
Sequence similarity networks
(SSNs) of the ManD subgroup at several
e-value thresholds to illustrate the effect of increasing stringency
on clustering. Panel A, 10⁻⁸⁰, ∼35% identity.
Panel B, 10⁻¹²⁰, ∼45% identity. Panel C,
10⁻¹⁹⁰, ∼75% identity. Pink coloring indicates
proteins predicted to be ManDs by the Structure Function Linkage Database.
Green coloring indicates proteins that were purified and subjected
to activity screening.
The genome neighborhoods of the genes encoding members of
the ManD
subgroup (±10 genes) were analyzed to aid in target selection
for protein production and structure determination. The genome neighborhoods
of some members encode 2-keto-3-deoxy-d-gluconate kinase
(KdgK) and 2-keto-3-deoxy-d-gluconate-6-P aldolase (KdgA).
KdgK and KdgA metabolize the 2-keto-3-deoxy-d-gluconate product
of the ManD reaction to pyruvate and d-glyceraldehyde 3-phosphate,
indicating a catabolic role for the proximal member of the ManD subgroup.
Alternatively, for some members the genome neighborhoods lack these
enzymes but contain, for example, dehydrogenases, suggesting divergent
catalytic and metabolic functions. Targets for protein production
by the Protein Core of the Enzyme Function Initiative (EFI; enzymefunction.org),
functional characterization by the University of Illinois, and structure
determination by the Structure Core of the EFI were chosen from both
types of genome neighborhoods. A large number of targets (115) were
chosen with the anticipation that not all targets would produce soluble,
purified proteins.
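The genome neighborhood criterion can be sketched in a few lines of Python. This is an illustrative example only: it assumes a linear list of gene labels in chromosomal order (no wrap-around for circular chromosomes), and the gene labels, function names, and example contigs are hypothetical.

```python
def neighborhood(genes, index, window=10):
    """Return the genes within +/- window positions of genes[index], excluding it."""
    start = max(0, index - window)
    upstream = genes[start:index]
    downstream = genes[index + 1:index + 1 + window]
    return upstream + downstream


def encodes_kdg_pathway(genes, index, window=10):
    """True when both kdgK and kdgA occur in the neighborhood of genes[index]."""
    return {"kdgK", "kdgA"} <= set(neighborhood(genes, index, window))


# Hypothetical contigs: gene labels in chromosomal order.
contig_catabolic = ["dctP", "dctQ", "manD", "kdgK", "kdgA", "gntR"]
contig_divergent = ["adh", "manD", "tctC"]

print(encodes_kdg_pathway(contig_catabolic, contig_catabolic.index("manD")))
print(encodes_kdg_pathway(contig_divergent, contig_divergent.index("manD")))
```

A neighborhood that passes this check would be consistent with the catabolic role described above, whereas one that fails it would be a candidate for a divergent function.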
Substrate Screening
Of the 115 targets,
42 were produced
as soluble, purified proteins (sharing less than 95% sequence identity).
The ManD SSN in Figure 2 highlights the diversity
of purified proteins assayed. The proteins were screened for dehydration
activity with a library of 72 acid sugars using a semicarbazide-based
assay (Figure S1, Supporting Information).[25−27] The catalytically active proteins (24 of the 42 screened)
utilize d-mannonate, d-gluconate, or both as substrates;
no other hits were observed with the acid sugar library. Positive
hits were verified with 1H NMR spectra of the products
(2-keto-3-deoxy-d-mannonate/2-keto-3-deoxy-d-gluconate)
before the proteins were subjected to more in-depth analyses to determine
kinetic constants. The ability of some members to catalyze the dehydration
of d-gluconate was not expected (vide infra).

The kinetic characterizations revealed further unexpected
divergence in function (Table 1). Among the
newly characterized ManDs, seven dehydrate d-mannonate with
catalytic efficiencies similar to that of NaManD
(kcat/KM = 10³ to 10⁴ M⁻¹ s⁻¹). However, 16 targets showed low catalytic efficiencies (kcat/KM = 10¹ to 10² M⁻¹ s⁻¹); 19 showed no detectable activity with any member of the acid sugar
library. Three of the 12 proteins that dehydrate d-mannonate
with low catalytic efficiency also dehydrate d-gluconate
with low catalytic efficiency. Furthermore, 4 of the 23 targets with
no activity on d-mannonate dehydrate d-gluconate.
Thus, the functionally characterized members were assigned to three
categories according to catalytic efficiency and substrate specificity:
(1) high-activity (kcat/KM = 10³ to 10⁴ M⁻¹ s⁻¹) and specific for d-mannonate; (2)
low-activity (kcat/KM = 10¹ to 10² M⁻¹ s⁻¹) and specific for either d-mannonate or d-gluconate or promiscuous for both; and (3) no-activity with
either d-mannonate or d-gluconate (or any acid sugar
in the library). The SSN constructed with a threshold of 10⁻¹⁹⁰ (∼75% sequence identity) segregates groups with different
catalytic efficiencies and substrate specificities (Figure 3).
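The three categories can be expressed as a simple decision rule. The Python sketch below is illustrative and is not the authors' classification procedure: the boundary of 10³ M⁻¹ s⁻¹ between low and high activity (`high_cutoff`) is an assumption, the activity level is taken from the larger of the two kcat/KM values, and every substrate with detectable activity is reported, so a high-activity ManD with trace d-gluconate activity would be labeled M/G here. The function name is hypothetical.

```python
def classify(kcat_km_mannonate=None, kcat_km_gluconate=None, high_cutoff=1e3):
    """Assign an activity category and a substrate label.

    Values are kcat/KM in M^-1 s^-1; None means no detectable activity.
    high_cutoff is an assumed boundary between the low and high ranges.
    """
    values = [v for v in (kcat_km_mannonate, kcat_km_gluconate) if v is not None]
    if not values:
        return ("no-activity", "none")
    if kcat_km_mannonate is not None and kcat_km_gluconate is not None:
        substrate = "M/G"   # promiscuous: both substrates turned over
    elif kcat_km_mannonate is not None:
        substrate = "M"     # d-mannonate only
    else:
        substrate = "G"     # d-gluconate only
    level = "high-activity" if max(values) >= high_cutoff else "low-activity"
    return (level, substrate)


# NaManD (kcat/KM = 3200 with d-mannonate in Table 1).
print(classify(kcat_km_mannonate=3200))
# CsManD (kcat/KM = 10 with d-mannonate and 40 with d-gluconate in Table 2).
print(classify(kcat_km_mannonate=10, kcat_km_gluconate=40))
# A protein with no detectable activity.
print(classify())
```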
Table 1
Kinetic Parameters for Members of
the ManD Subgroup
Cluster
Uniprot ID
d-mannonate kcat (s⁻¹)
d-mannonate kcat/KM (M⁻¹ s⁻¹)
d-gluconate kcat (s⁻¹)
d-gluconate kcat/KM (M⁻¹ s⁻¹)
end of 7th β-strand
UxuA?
1
A5KUH4
Pro
yes
1
C9NUM5
Pro
no
1
C9Y5D5
Pro
yes
1
D0KC90
Pro
yes
1
A4W7D6
Pro
yes
1
D0X4R4
Pro
yes
1
A6AMN2
Pro
yes
1
Q6DAR4
Pro
yes
1
C6DI84
Pro
yes
6
B8HCK2
Pro
no
9
C9CN91
Pro
yes
9
C8ZZN2
Pro
yes
10
C7PW26
Pro
no
Singleton
A6M2W4
Pro
yes
Singleton
Q2CIN0
Pro
yes
Singleton
A8RQK7
Pro
yes
Singleton
C6CVY9
Gly
yes
Singleton
C9A1P5
Pro
yes
Singleton
B5GCP6
Pro
no
3
A6VRA1
0.02 ± 0.001
160
Ala
yes
3
E1V4Y0
0.03 ± 0.006
20
0.05 ± 0.004
20
Pro
yes
3
B3PDB1
0.03 ± 0.002
100
Ala
yes
3
CsManD/Q1QT89
0.02 ± 0.0005
5
0.04 ± 0.006
40
Pro
yes
4
Q8FHC7
0.02 ± 0.001
10
Pro
yes
4
A4WA78
0.02 ± 0.002
30
Pro
yes
4
B1ELW6
0.02 ± 0.001
20
Pro
yes
4
D8ADB5
0.01 ± 0.002
30
Pro
yes
4
J7KNU2
0.01 ± 0.001
20
Pro
yes
4
B5RAG0
0.01 ± 0.001
50
Pro
yes
5
D4GJ14
0.04 ± 0.003
120
Pro
yes
5
B5R541
0.05 ± 0.003
80
Pro
yes
5
B5QBD4
0.02 ± 0.0005
150
Pro
yes
5
C6CBG9
0.04 ± 0.002
50
0.03 ± 0.002
60
Pro
yes
Singleton
D9UNB2
0.004 ± 0.001
60
Pro
yes
8
D7BPX0
0.01 ± 0.002
40
Pro
no
2
Q1NAJ2
2 ± 0.07
4200
Ala
no
2
Q9AAR4
1 ± 0.006
12300
Ala
no
2
Q9A4L8
0.65 ± 0.02
1200
Ala
no
2
B0T4L2
0.3 ± 0.01
1200
Ala
no
2
B0T0B1
2 ± 0.2
12100
0.003 ± 0.001
5
Ala
no
2
A5V6Z0
4 ± 0.2
2900
0.01 ± 0.001
10
Ala
no
2
NaManD/A4XF23
1.3 ± 0.1
3200
Ala
no
7
G7TAD9
0.8 ± 0.03
4400
Ala
no
Figure 3
SSN (e-value threshold of 10⁻¹⁹⁰)
showing
the distribution of high- (green), low- (blue), and no-activity (red)
proteins along with substrate specificities (M, d-mannonate;
G, d-gluconate; M/G, d-mannonate and d-gluconate).
Proteins for which structures were determined are marked with asterisks.
The Pro and Ala residues associated with different substrate specificities
for d-mannonate and d-gluconate are located in separate
clusters: clusters 1, 4, 5, 6, 8, 9, and 10 contain Pro; clusters
2 and 7 contain Ala; and cluster 3 contains both. Pro-containing clusters
exhibit low or no dehydration activity; Ala-containing clusters exhibit
high dehydration activity with d-mannonate.
Divergence in Activity
Members with different in vitro activities are
assumed to have different in vivo functions. Physiologically,
the dehydration of d-mannonate to 2-keto-3-deoxy-d-mannonate is found
in the d-glucuronate degradation pathway in which d-glucuronate is isomerized to 5-keto-d-mannonate and then
reduced to d-mannonate. The d-mannonate is dehydrated,
phosphorylated, and cleaved by an aldolase to form pyruvate and glyceraldehyde
3-phosphate. When this pathway was discovered in E. coli and Erwinia carotovora, dehydration of d-mannonate was found to be catalyzed by a dehydratase, UxuA, that
is not a member of the ENS.[28,29] Therefore, the discovery
that members of the ManD subgroup dehydrate d-mannonate with
high catalytic efficiency implies convergent evolution of function
in different superfamilies within the d-glucuronate catabolic
pathway. Interestingly, the genomes of all of the organisms encoding
high-activity ManDs lack the gene encoding UxuA; however, the genomes
of the majority of organisms with low- or no-activity ManDs have a
gene encoding UxuA (Table 1). This suggests
that the high-activity ManDs perform the same role as UxuA in the
encoding organisms; however, the low-activity members have a different
metabolic function. In those organisms that encode low- or no-activity
members of the ManD subgroup but no UxuA, growth on d-glucuronate
likely is enabled by an alternate catabolic pathway, such as the uronate
dehydrogenase or KduI pathway.[30,31] Therefore, members
with low in vitro catalytic efficiencies likely have
different in vivo metabolic functions, even if they
can dehydrate d-mannonate. In work that will be described
elsewhere, we are characterizing some of these divergent physiological
functions.
d-Gluconate Dehydration
In retrospect,
the discovery of d-gluconate as a substrate for some members
is not surprising: d-gluconate and d-mannonate
are epimers at carbon-2 and therefore yield the same dehydration product.
However, this stereochemical difference requires that a base other than the
Tyr-Arg dyad in the “150–180s” loop abstract
the 2-proton from d-gluconate. To investigate which residue
could function as the d-gluconate specific base, d-mannonate and d-gluconate were modeled into the active
site of the member from Chromohalobacter salexigens (CsManD) (Uniprot ID Q1QT89) that dehydrates both d-mannonate and d-gluconate (PDB code 3BSM). This was accomplished
by superposing the structures of NaManD with d-mannonate in its active site (2QJM), Uniprot ID B5R541 with d-gluconate in its active site (3TWB), and unliganded CsManD
(3BSM) (Figure 4). On the basis of this comparison, we hypothesized
that the conserved His after the seventh β-strand is the base
in d-gluconate dehydration (His 315 in CsManD).
Figure 4
A superposition of a structure with d-mannonate bound
in the active site (2QJM, NaManD) with one with d-gluconate bound in the active site (3TWB; CsManD). In 2QJM, Tyr 161 is the general base that abstracts the 2-proton
and His 215 is the general acid that catalyzes the departure of the
3-OH group from d-mannonate. In 3TWB, His 315 is proposed
to be the general base that abstracts the 2-proton from d-gluconate or hydrogen bonds with the C5 hydroxyl of d-mannonate.
The ε-nitrogen of His 315 is 3.0 Å from the C5 hydroxyl
of d-mannonate and 3.1 Å from C2 of d-gluconate.
Both distances are appropriate for proton abstraction or hydrogen
bonding.
Site-directed mutagenesis was
performed to convert His 315 in CsManD to either
Asn or Gln. His 315 is conserved in all
members of the ManD subgroup, including NaManD, because
it hydrogen bonds to the 5-hydroxyl group of d-mannonate
(Figure S2, Supporting Information).[12] Both mutants abolished dehydration activity
with d-gluconate. However, the H315Q mutant maintained wild-type
catalytic efficiency with d-mannonate (Table 2). In contrast, the H315N mutant was inactive with d-mannonate, presumably because it is not able to hydrogen bond to
the 5-hydroxyl group. These studies support the suggested role of
His 315 as the base for dehydration of d-gluconate.
Table 2
Kinetic Parameters for His 315 Mutants of CsManD (a)
kcat/KM (M–1 s–1)
protein    d-mannonate    d-gluconate
WT         10             40
H315Q      10             NA
H315N      NA             NA
(a) WT = wild type; NA = no activity.
Structure Analysis
“New” crystal structures
were solved for 12 sequence diverse members of the subgroup; structures
previously were available for four other members. Taken together,
a total of 36 unliganded and liganded structures are now available
for members of the ManD subgroup (Table S3, Supporting
Information). These structures were used to identify the general
base that initiates dehydration of d-gluconate and also yielded
insights into how structure diverges as sequence diverges. An overlay
of one structure from each structure-containing cluster is shown in
Figure 5 (the overlay includes only structures
with ordered “150–180s” loops). The structures
of the modified TIM-barrel and capping domains are highly conserved,
although the conformations of the “150–180s”
loop are divergent. The sequences of this loop are also highly variable
(Figure S3, Supporting Information).
Figure 5
An overlay
of NaManD (blue), CsManD (tan),
Uniprot ID Q8FHC7 (green), Uniprot ID B5R541 (magenta),
Uniprot ID A5KUH4 (red), and Uniprot ID A6M2W4 (gray) showing the overall structural
homology. The “150–180s” loops are conformationally
distinct.
Initially, the “150–180s”
loops were proposed
to contain the substrate specificity determinants in analogy with
the “20s” loops in other ENS members.[32] Of the 36 structures available (Figure
S1, Supporting Information), 9 have an ordered “150–180s”
loop covering the active site, 3 with a substrate bound. Seven of
the 11 structures with disordered “150–180s”
loops also have a substrate bound. The disorder of the loop, even
in the presence of substrate, suggests that the substrate and the
“150–180s” loop are not interacting strongly.
In the structures with a bound substrate and an ordered loop, the
only hydrogen bonds to the substrate involve ordered water molecules
and backbone amide groups. This suggests a role other than determining
substrate specificity for the “150–180s” loops.
To determine the importance of the sequence of the “150–180s”
loops, segments of the loops in CsManD (A164 to E172)
and NaManD (V161 to E169) that are proximal to the
substrate were mutated to AGAGGAGAG (Figure 6) to eliminate any ionic or hydrogen-bonding interactions
with the substrate. Both mutants were expressed and purified as soluble
proteins; correct folding was verified via circular dichroism and
X-ray crystallography (Figures S4 and S5, Supporting
Information). Screening of both mutants with the acid sugar
library showed a complete loss of activity. We conclude that this
loop is important for catalysis, although its exact contribution is
unknown.
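The loop replacement described above can be sketched in a few lines of Python. This is only an illustration of the substitution, not the cloning protocol that was used: the function name `replace_segment` and the toy sequence are assumptions introduced here, and only the residue ranges (A164 to E172 in CsManD, V161 to E169 in NaManD) and the AGAGGAGAG replacement come from the text.

```python
def replace_segment(sequence: str, start: int, end: int, replacement: str) -> str:
    """Return `sequence` with residues start..end (1-based, inclusive) replaced."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("residue range lies outside the sequence")
    if len(replacement) != end - start + 1:
        raise ValueError("replacement must be as long as the segment it replaces")
    # Python slices are 0-based and end-exclusive, hence start - 1 and end.
    return sequence[:start - 1] + replacement + sequence[end:]


LOOP_REPLACEMENT = "AGAGGAGAG"  # nine-residue Ala/Gly stretch given in the text

# Toy 12-residue sequence (hypothetical, NOT a real ManD sequence).
toy = "MKTAYIAKQRQI"
mutant = replace_segment(toy, 3, 11, LOOP_REPLACEMENT)
print(mutant)
```

Both mutated ranges span nine residues (164 to 172 and 161 to 169), so the replacement preserves the length of the loop; with a real sequence the call would take the form `replace_segment(csmand_sequence, 164, 172, LOOP_REPLACEMENT)`, where `csmand_sequence` is a placeholder for the full-length protein sequence.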
Figure 6
An overlay of CsManD (4F4R, blue ribbon, orange
loop) and NaManD (2QJJ, tan ribbon, green loop) showing
the regions of their “150–180s” loops that were
mutated. The sequences of the loops are given with the area of mutagenesis
highlighted with yellow.
Structural Differences between d-Mannonate/d-Gluconate Specific ManDs
The structures of proteins with
divergent functions are very similar. A superposition of the structures
of a high-activity ManD (NaManD), a low-activity
ManD (Uniprot ID Q8FHC7), a low-activity GlcD (Uniprot ID B5R541), and a
member with no activity (Uniprot ID A5KUH4) reveals that the identities and
positions of the active site residues are conserved (Figure 7). In addition, the residues within 6 Å of
the substrate are conserved. The primary site of divergence in the
active site is a Pro/Ala substitution two residues after the conserved
His at the end of the seventh β-strand that hydrogen bonds to
the 5-hydroxyl group of the substrate in the ManD reaction or is the
general base in the GlcD reaction (Figure 4). When this substitution is mapped onto the SSN, low- and no-activity
proteins possess Pro at this position, but high-activity ManDs possess
an Ala (Figure 3). The single exception is
cluster 3, which is recently separated from cluster 2 (i.e., connected
at an e-value of 10^-184). Cluster 2 includes high-activity
ManDs that contain Ala; cluster 3 contains low-activity proteins that
contain either Ala or Pro. Members that contain Pro dehydrate both d-mannonate and d-gluconate, but members with Ala dehydrate
only d-mannonate. Two of the 8 high-activity ManDs also dehydrate d-gluconate, but with very low catalytic efficiency. We hypothesize
that the proteins in cluster 3 may be intermediates in the evolution
of the GlcD function.
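The statement that two clusters are "connected at an e-value of 10^-184" can be illustrated with a small sketch of how clusters in a sequence similarity network depend on the e-value cutoff. The node names and pairwise e-values below are invented for illustration, and the function `clusters_at_threshold` is a hypothetical helper, not the software used to build the SSN in this study; only the 10^-184 value is taken from the text.

```python
def clusters_at_threshold(nodes, edges, log10_evalue_cutoff):
    """Group nodes into connected components, keeping only edges whose
    log10(e-value) is at or below the cutoff (i.e., at least that stringent)."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b, log10_e in edges:
        if log10_e <= log10_evalue_cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical proteins and pairwise log10(e-values), for illustration only.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B", -200), ("C", "D", -195), ("B", "C", -184)]

print(clusters_at_threshold(nodes, edges, -184))  # permissive cutoff
print(clusters_at_threshold(nodes, edges, -190))  # more stringent cutoff
```

In this toy network the single edge at 10^-184 is the only link between the two pairs, so they form one cluster at that cutoff and separate as soon as the cutoff is made more stringent, which is the sense in which two clusters can be described as recently separated.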
Figure 7
An overlay of the active sites of NaManD
(2QJJ,
high-activity, d-mannonate specific - red), CsManD (4F4R, low-activity, promiscuous for d-mannonate/d-gluconate - blue), Uniprot ID D4GJ14 (3T6C, low-activity, d-gluconate specific - green), and Uniprot ID A4W7D6 (3TJI, no-activity
- magenta). The metal binding and acid/base residues are superimposable.
The Pro/Ala dimorphism is also shown. The ligands are d-mannonate
from 2QJM (red) and d-gluconate from 3T6C (green).
This hypothesis was investigated
by constructing the NaManD A314P and CsManD P317A mutants. The A314P mutant
of NaManD maintained its specificity for d-mannonate, but with a reduced catalytic efficiency (from 3200 M–1 s–1 to 60 M–1 s–1). The P317A mutant of CsManD maintained low activity
on d-mannonate and d-gluconate; the catalytic efficiency
for d-mannonate is increased (from 5 M–1 s–1 to 100 M–1 s–1), but that for d-gluconate is somewhat decreased (from
40 M–1 s–1 to 5 M–1 s–1). Thus, we conclude that (1) Ala favors d-mannonate dehydration and disfavors d-gluconate dehydration
and (2) Pro disfavors d-mannonate dehydration. A restriction
of backbone flexibility may explain the observed change in substrate
specificity and catalytic efficiency, with the flexibility associated
with Ala allowing hydrogen-bonding to the substrate and the rigidity
associated with Pro promoting general-base catalysis.
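To put the effects of the Pro/Ala substitutions on a common scale, the changes in kcat/KM quoted above can be converted to fold changes and to apparent differences in transition-state free energy using the standard relation ΔΔG‡ = −RT ln[(kcat/KM)mutant/(kcat/KM)WT]. The sketch below is an illustration added here, not an analysis reported in the study; the temperature of 298.15 K is an assumption, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal mol^-1 K^-1
T_KELVIN = 298.15   # ASSUMED temperature (25 °C); not stated in this section


def fold_change(wt: float, mutant: float) -> float:
    """Ratio of mutant to wild-type kcat/KM."""
    return mutant / wt


def ddg_transition_state(wt: float, mutant: float, temperature: float = T_KELVIN) -> float:
    """Apparent ddG (kcal/mol); positive means the mutant is less efficient."""
    return -R_KCAL * temperature * math.log(mutant / wt)


# kcat/KM values (M^-1 s^-1) quoted in the text: (wild type, mutant).
measurements = {
    "NaManD A314P, d-mannonate": (3200.0, 60.0),
    "CsManD P317A, d-mannonate": (5.0, 100.0),
    "CsManD P317A, d-gluconate": (40.0, 5.0),
}

for label, (wt, mut) in measurements.items():
    print(f"{label}: fold change {fold_change(wt, mut):.3g}, "
          f"ddG {ddg_transition_state(wt, mut):+.1f} kcal/mol")
```

Under these assumptions the three substitutions correspond to roughly +2.4, −1.8, and +1.2 kcal/mol, respectively, so the single-residue change shifts the apparent activation barrier by about 1 to 2.5 kcal/mol in either direction.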
Proteins with No Activity
The genes encoding many inactive
members of the ManD subgroup are neighbors of genes encoding 2-keto-3-deoxy-d-gluconate kinase and 2-keto-3-deoxy-d-gluconate-6-phosphate
aldolase that are downstream of ManD in the d-glucuronate
catabolic pathway (Figure S6, Supporting Information). However, they have no activity with d-mannonate or the
three carbon-2 or -3 epimers that would be dehydrated to 2-keto-3-deoxy-d-mannonate (d-altronate, d-allonate, or d-gluconate). Variations in pH, temperature, salt concentration,
osmolarity, osmolytes, and metal cations as well as the presence of
dithiothreitol and nucleotide mono-, di-, and triphosphates were tested
to determine whether these enzymes may be subject to unanticipated
regulation; however, no changes in activity were observed. d-Mannonate 6-phosphate and d-gluconate 6-phosphate also
were tested, but no activity was observed. Perhaps these proteins
function in multienzyme complexes (channeling) or utilize an acid
sugar that is not present in our library.
Conclusions
The ManD subgroup of the ENS is an excellent example of homologous
proteins that have divergent catalytic efficiencies and substrate
specificities. Unexpectedly, we discovered that the ManD subgroup
is not isofunctional: in addition to the ManDs, some members catalyze
the dehydration of d-gluconate, and others are promiscuous
for dehydration of both d-mannonate and d-gluconate.
Clearly, automated methods would provide misleading/incorrect annotations
that would be of limited/no value in deducing their metabolic functions.
We have also determined that the structural determinants of substrate
specificity are both indirect and subtle: a Pro/Ala substitution appears
to be the major determinant of specificity.
In addition, the role of the sequence-divergent and conformationally
flexible “150–180s” loop is uncertain. The side
chains of the loop make no direct contacts with the substrate, so
the loop does not appear to be a determinant of substrate specificity
as has been well-established for members of the muconate lactonizing
enzyme (MLE) and mandelate racemase (MR) subgroups in the ENS. Although
we attempted to determine whether the loop is involved in protein–protein
interactions/substrate channeling, we could not obtain any conclusive
results.
We also discovered that the catalytic efficiencies of members of
the subgroup are highly variable, despite conservation of active site
residues and structures. These data may provide important insights
for the metabolic engineering community. The data illustrate how
closely related sequences can perform different reactions and, therefore,
may help guide studies that aim to redesign proteins for use in new
pathways.
Enzymological dogma has been that enzymes have evolved to achieve
catalytic perfection; i.e., the reactions are diffusion-controlled. In vitro experiments alone cannot and will not provide biological
insights into why the values of kcat/KM for some of the members of the subgroup are
“low”. We have selected several of these for future
biological/metabolic characterization so that we might be able to
better understand the relationship between catalytic efficiencies
and metabolic requirements.