The recognition of carbohydrates by proteins is a fundamental aspect of communication within and between living cells. Understanding the molecular basis of carbohydrate-protein interactions is a prerequisite for the rational design of synthetic ligands. Here we report the high- to ultra-high-resolution crystal structures of the carbohydrate recognition domain of galectin-3 (Gal3C) in the ligand-free state (1.08 Å at 100 K, 1.25 Å at 298 K) and in complex with lactose (0.86 Å) or glycerol (0.9 Å). These structures reveal striking similarities in the positions of water and carbohydrate oxygen atoms in all three states, indicating that the binding site of Gal3C is preorganized to coordinate oxygen atoms in an arrangement that is nearly optimal for the recognition of β-galactosides. Deuterium nuclear magnetic resonance (NMR) relaxation dispersion experiments and molecular dynamics simulations demonstrate that all water molecules in the lactose-binding site exchange with bulk water on a time scale of nanoseconds or shorter. Nevertheless, molecular dynamics simulations identify transient water binding at sites that agree well with those observed by crystallography, indicating that the energy landscape of the binding site is maintained in solution. All heavy atoms of glycerol are positioned like the corresponding atoms of lactose in the Gal3C complexes. However, binding of glycerol to Gal3C is insignificant in solution at room temperature, as monitored by NMR spectroscopy or isothermal titration calorimetry under conditions where lactose binding is readily detected. These observations make a case for protein cryo-crystallography as a valuable screening method in fragment-based drug discovery and further suggest that identification of water sites might inform inhibitor design.
Interactions between carbohydrates and proteins mediate numerous important biological functions, such as signal transduction, cell adhesion, host–pathogen recognition, and the immune response.[1−4] Carbohydrate-recognizing proteins are involved in a number of human disease states, including inflammation and cancer. These key functional properties have stimulated significant efforts in drug design targeting carbohydrate-binding proteins.[5−8] Carbohydrate–protein interactions are typically weak, with dissociation constants of ∼1 mM. Binding is generally driven by a favorable enthalpy change[9−12] that is partly offset by an unfavorable (negative) entropy change. The low affinity is related to the inherent properties of saccharides, such as their lack of charges and of extended hydrophobic surfaces, both of which reduce the likelihood of forming strong interactions with proteins. Instead, the formation of carbohydrate–protein complexes involves relatively weak van der Waals interactions and hydrogen bonds to the carbohydrate hydroxyl groups, acetamides, and ring and glycosidic oxygens.
As a result of these specific properties of carbohydrate–protein interactions, the design of high-affinity inhibitors has proven to be challenging,[6−8] and the relative importance of the various types of interactions in driving carbohydrate recognition is a matter of intense research.[13−18] During the ligand binding process, significant solvent reorganization takes place across the contact surface.[19−22] Thus, considerable attention has been paid to the structure and dynamics of water molecules in carbohydrate-binding sites and to the role these play in mediating carbohydrate recognition by proteins.[20,23−26] Nonetheless, our current understanding of the thermodynamics and kinetics of the solvent reorganization process at the microscopic level remains incomplete.[27] A full understanding of carbohydrate recognition benefits strongly from atomic-resolution descriptions of both the liganded and unliganded states.
Galectins are small soluble proteins that constitute a family of mammalian lectins, defined by a carbohydrate recognition domain (CRD) with a conserved sequence motif that confers affinity for β-galactoside-containing glycans.[28] Galectins have important functions in both carbohydrate-dependent extracellular and carbohydrate-independent intracellular processes.[28−33] Even though galectins are synthesized in and primarily remain in the cytosol, they reach the extracellular space or the lumen of vesicles[9,34] by a nonclassical secretory pathway[35,36] and can then take part in the regulation of cellular trafficking of glycoproteins, signaling, and cell adhesion.[31,37] A growing body of evidence links galectins to important roles in cell growth, cell differentiation, cell cycle regulation, and apoptosis, making them potential pharmaceutical targets in inflammation, immunity, and cancer.[29,38−41] Thus, it is critical to understand the molecular driving forces for the ligand binding specificity of galectins.
Structures of several galectins in complex with natural and designed ligands are known, and they show that galectins bind oligosaccharides in a conserved recognition mode involving a network of hydrogen bonds and several bound waters, which form bridges between hydrogen-bonding partners in the protein–ligand complex.[16,25,42,43] As is typical for lectins, galectins bind the monosaccharide galactose in their conserved binding site with dissociation constants in the millimolar range. However, sugars adjacent to the galactose may interact with neighboring sites to provide stepwise boosts in affinity.[28,44] For example, for galectin-3, the addition of glucose, as in lactose, enhances affinity 50-fold (Kd ∼ 0.2 mM), and additional saccharides at position 3 of the galactose can give affinities in the low micromolar range. Moreover, non-natural derivatives at this position may enhance the affinity to the nanomolar range, e.g., by exploiting cation−π interactions with the surface residue Arg144.[16,45] Furthermore, recent work shows that triazole derivatization at C3 of galactose results in high affinity, similar to that observed for aromatic amido compounds.[46,47] Several studies have highlighted the significance of high-affinity and selective galectin inhibitors that act intracellularly, with potential use in modulating inflammatory processes and cancer growth.[30,48−51] To develop effective approaches for the structure-based design of potent galectin-3 inhibitors, it is important to understand the detailed molecular basis for carbohydrate recognition, based on the three-dimensional structure and physicochemical properties of the conserved binding motif. High-resolution structural information greatly aids in this respect.
Here we report a study that combines X-ray crystallography, NMR spectroscopy, molecular dynamics (MD) simulations, and isothermal titration calorimetry (ITC) to probe the role of water molecules in the binding site and the details of the hydrogen-bonding networks at the protein–ligand interface of Gal3C. Our results highlight the role of an oxygen coordination framework in the extended carbohydrate-binding site of galectin-3 that potentially can be further exploited in future drug design efforts.
Materials and Methods
Protein Expression and Purification
The galectin-3 carbohydrate recognition domain (Gal3C; amino acid residues 113–250) was expressed and purified either as a thioredoxin fusion construct[52] or as isolated Gal3C.[43] For the fusion construct, DNA encoding amino acids 113–250 was amplified by polymerase chain reaction from galectin-3 in pET3C[53] and cloned into the pET-32 Ek/LIC vector (Novagen, Madison, WI) according to the manufacturer’s instructions, as described previously for galectin-8.[44] For isolated Gal3C, DNA encoding amino acids 113–250 was inserted into pET3C without additional tags. The expression protocol is identical for the two constructs and has been reported previously.[52] The purification protocol for the isolated Gal3C domain is highly similar to that reported previously,[52,53] except that the final separation step on a lactosyl-Sepharose column, which follows cleavage of the expressed fusion product, is not needed. Typical yields of isolated Gal3C were 150 mg per liter of culture. The absence of lactose from the apo-Gal3C samples (used to determine the crystal structures of apo-Gal3C and glycerol-bound Gal3C) was verified by NMR spectroscopy.
Crystallization and Structure Determination
All Gal3C crystals were obtained using the hanging drop vapor diffusion method in NeXtal plates (Qiagen). The Gal3C–lactose complex was obtained by mixing 9.5 μL of protein solution [20 mg/mL Gal3C, 10 mM β-mercaptoethanol, 100 mM lactose, 10 mM sodium phosphate buffer (pH 7.5), 100 mM NaCl, and 0.02% NaN3] with 0.5 μL of 100 mM 3′-benzamido-N-acetyllactosamine on ice. After incubation for 1 h, the solution was centrifuged at 7000 rpm for 10 min at 4 °C, and hanging drops were set up using 4 μL drops containing equal volumes of the protein solution and a reservoir solution [30% (w/v) PEG 4000, 0.1 M Tris-HCl (pH 7.5), 0.1 M MgCl2, 0.4 M NaSCN, and 8 mM β-mercaptoethanol]. Immediately after setup, 0.3 μL of a seed bead solution (Hampton Research protocol) was added to the drop. The seed bead solution was made from lactose-containing Gal3C crystals crushed in stabilizing solution [1 mM lactose, 29% PEG 4000, 0.1 M Tris-HCl (pH 7.5), 0.1 M MgCl2, 0.4 M NaSCN, and 8 mM β-mercaptoethanol]. The largest crystals grew to dimensions of 0.4 mm × 0.4 mm × 0.5 mm within 1 month. This procedure was an attempt to replace lactose with 3′-benzamido-N-acetyllactosamine. The attempt failed because, although lactose has a lower affinity for Gal3C (Kd = 231 μM) than 3′-benzamido-N-acetyllactosamine (Kd = 18.2 μM),[45] the lactose concentration was 19-fold higher (95 mM vs 5 mM). The structure of the complex of 3′-benzamido-N-acetyllactosamine and Gal3C was determined previously using the same crystallization conditions but without lactose.[45]
Apo-Gal3C crystals were obtained from 4 μL drops containing equal volumes of a protein solution [19 mg/mL Gal3C, 10 mM sodium phosphate buffer (pH 7.5), 100 mM NaCl, 10 mM β-mercaptoethanol, and 0.02% NaN3] and a reservoir solution [30% (w/v) PEG 4000, 0.1 M Tris-HCl (pH 7.5 or 8.0), 0.1 M MgCl2, 0.4 M NaSCN, and 8 mM β-mercaptoethanol]. Apo crystals appeared overnight and grew within a few days to dimensions of 0.1 mm × 0.1 mm × 0.2 mm.
The Gal3C–lactose crystal was flash-cooled to 100 K in the cold N2 gas stream of a Cryojet (Oxford Diffraction) using a cryo solution consisting of 15% (v/v) glycerol, 25.5% (w/v) PEG 4000, 0.25 M NaSCN, 85 mM Tris-HCl (pH 7.5), 85 mM MgCl2, 7 mM β-mercaptoethanol, and 4 mM 3′-benzamido-N-acetyllactosamine. The same cryoprotectant but without 3′-benzamido-N-acetyllactosamine was used to cryoprotect the Gal3C–glycerol complex crystals, whereas 15% (v/v) PEG 400 [instead of 15% (v/v) glycerol] was used for apo-Gal3C. For data collection at room temperature, the crystal was mounted using the MicroRT kit (MiTeGen). The X-ray diffraction data for all the crystals were collected on a 165 mm marResearch CCD detector on beamline I911-5 of the MAX-II synchrotron in Lund, Sweden. Diffraction data were integrated and scaled using XDS and XSCALE.[54] Data statistics are listed in Table 1.
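The failure of the ligand-exchange attempt on the Gal3C–lactose crystals can be rationalized with a simple competitive-binding estimate. The Python sketch below is ours, not part of the original protocol: it assumes 1:1 binding of both ligands to a single site, rapid equilibrium, and negligible ligand depletion (free ligand taken equal to total ligand), and it uses the Kd values quoted above (231 μM and 18.2 μM) with the concentrations after mixing (95 and 5 mM).

```python
"""Rough competitive-binding estimate for the lactose / benzamido-LacNAc mixture."""


def competitive_occupancy(conc_a_mM, kd_a_mM, conc_b_mM, kd_b_mM):
    """Fractional occupancy of one site by two competing ligands A and B.

    Assumes 1:1 binding, rapid equilibrium, and free ligand ~= total ligand
    (depletion of ligand by the protein is ignored).
    """
    term_a = conc_a_mM / kd_a_mM
    term_b = conc_b_mM / kd_b_mM
    denom = 1.0 + term_a + term_b
    return term_a / denom, term_b / denom


# Concentrations after mixing 9.5 uL of protein solution (100 mM lactose)
# with 0.5 uL of 100 mM 3'-benzamido-N-acetyllactosamine.
lactose_mM = 100.0 * 9.5 / 10.0    # 95 mM
benzamido_mM = 100.0 * 0.5 / 10.0  # 5 mM

frac_lac, frac_bz = competitive_occupancy(lactose_mM, 0.231, benzamido_mM, 0.0182)
print(f"lactose: {frac_lac:.2f}, benzamido-LacNAc: {frac_bz:.2f}")
```

Under these assumptions, lactose is expected to occupy roughly 60% of the sites and 3′-benzamido-N-acetyllactosamine roughly 40%, so the roughly 13-fold affinity advantage of the latter does not outweigh the 19-fold concentration excess of lactose.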
Table 1. Crystallographic Data Collection and Refinement Statisticsᵃ

| | lactose | glycerol | apo (100 K) | apo (room temperature) |
|---|---|---|---|---|
| Data Collection | | | | |
| wavelength (Å) | 0.9078 | 0.9078 | 0.9078 | 0.9080 |
| resolution (Å) | 30.0–0.86 (0.88–0.86) | 30.0–0.90 (0.92–0.90) | 30.0–1.08 (1.11–1.08) | 30.0–1.25 (1.28–1.25) |
| space group | P2₁2₁2₁ | P2₁2₁2₁ | P2₁2₁2₁ | P2₁2₁2₁ |
| unit cell parameters a, b, c (Å) | 37.8, 58.3, 63.1 | 35.8, 58.2, 62.5 | 35.7, 58.3, 62.8 | 36.5, 58.2, 63.7 |
| no. of measured reflections | 680438 | 549501 | 319133 | 294647 |
| no. of unique reflections | 111079 | 93342 | 56940 | 37798 |
| completeness (%) | 98.9 (95.4) | 95.8 (87.3) | 99.9 (99.9) | 98.6 (97.7) |
| multiplicity | 6.1 (4.0) | 5.9 (4.6) | 5.6 (4.7) | 7.8 (5.7) |
| Rmerge (%)ᵇ | 5.0 (85.3) | 3.8 (79.2) | 7.0 (68.2) | 5.3 (85.2) |
| ⟨I/σ(I)⟩ | 19.3 (1.8) | 20.4 (2.1) | 11.6 (2.4) | 18.7 (2.5) |
| Refinement | | | | |
| resolution limits (Å) | 20.0–0.86 | 20.0–0.90 | 20.0–1.08 | 20.0–1.25 |
| Rmodel (%)ᶜ | 12.7 | 13.2 | 14.9 | 11.9 |
| Rfree (%)ᵈ | 14.2 | 15.0 | 18.2 | 16.6 |
| no. of waters | 275 | 262 | 320 | 168 |
| rmsd from ideal values: bond lengths (Å)ᵉ | 0.017 | 0.015 | 0.13 | 0.015 |
| rmsd from ideal values: angle distances (Å)ᶠ | 0.042 | 0.032 | 0.032 | 0.033 |
| average B factor (Å²) (protein/solvent/ligand) | 11.5/25.8/18.0 | 12.7/24.8/15.1 | 15.3/31.9/na | 16.6/42.8/na |
| average anisotropy for all atoms (protein/solvent/ligand) | 0.35/0.30/0.32 | 0.36/0.34/0.35 | 0.40/0.33/na | 0.38/0.41/na |
| data/parameter ratio | 7.4 | 6.6 | 3.6 | 2.9 |
| Ramachandran plotᵍ: residues in most favored regions | 98.5% (134/136) | 98.5% (134/136) | 97.1% (132/136) | 97.8% (133/136) |

ᵃ Values in parentheses correspond to the highest-resolution shell. The resolution limit was taken to be the point at which the ⟨I/σ(I)⟩ ratio was approximately equal to 2. na means not available.
ᵇ $R_{\mathrm{merge}} = \sum_{hkl}\sum_{i} \lvert I_i(hkl) - \langle I(hkl)\rangle \rvert \big/ \sum_{hkl}\sum_{i} I_i(hkl)$, where $I_i(hkl)$ is the ith measurement of reflection hkl and $\langle I(hkl)\rangle$ is its mean intensity.
ᶜ $R_{\mathrm{model}} = \sum_{hkl} \big\lvert \lvert F_o(hkl)\rvert - \lvert F_c(hkl)\rvert \big\rvert \big/ \sum_{hkl} \lvert F_o(hkl)\rvert$, where Fo and Fc are the observed and calculated structure factors, respectively.
ᵈ Calculated for a 5% random test set.
ᵉ Calculated from DFIX restraints in SHELXL.
ᶠ Calculated from DANG restraints in SHELXL.
ᵍ Calculated using Molprobity.[79]
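The Rmerge and Rmodel definitions in the table footnotes can be illustrated on a toy data set. The Python sketch below uses hypothetical intensities and structure-factor amplitudes (not data from this study); Rfree is the same R expression evaluated only over the test-set reflections that are excluded from refinement.

```python
"""Toy illustration of the Rmerge and crystallographic R-factor definitions."""

from statistics import mean


def r_merge(observations):
    """Rmerge from repeated intensity measurements.

    `observations` maps each unique reflection (h, k, l) to the list of its
    individual intensity measurements I_i(hkl).
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        i_mean = mean(intensities)  # <I(hkl)> for this reflection
        numerator += sum(abs(i - i_mean) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


def r_factor(f_obs, f_calc):
    """R = sum | |Fo| - |Fc| | / sum |Fo| over the same set of reflections."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(abs(fo) for fo in f_obs)


# Hypothetical toy data (not from this study).
obs = {(1, 0, 0): [100.0, 110.0, 90.0], (0, 2, 1): [50.0, 54.0]}
print(f"Rmerge = {100 * r_merge(obs):.1f}%")
print(f"R = {100 * r_factor([120.0, 80.0, 40.0], [114.0, 84.0, 41.0]):.1f}%")
```

For these toy values, Rmerge is about 5.9% and R about 4.6%.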
The structure of the lactose–Gal3C complex was determined by rigid-body refinement using the structure of another Gal3C–ligand complex as the initial model[45] and Refmac5,[55] as implemented in the CCP4 suite.[56] The apo and glycerol–Gal3C structures were determined in a similar manner, with the lactose–Gal3C structure minus the ligand serving as a starting model. In the initial stages, all refinement was conducted to a resolution of 1.4 Å using Refmac5, with 5% of the total reflections randomly set aside for cross validation. Subsequently, the resolution was gradually extended to the full resolution range, and refinement was conducted using SHELXL-97.[57] Manual model building was conducted using Coot.[58] The structure of lactose and the accompanying refinement restraints were generated using the CCP4 program Monomer Library Sketcher. In the final stages of refinement, many of the hydrogen atoms were visible in difference electron density maps and were added at calculated positions using the SHELXL riding model. Alternating steps of anisotropic refinement and minor structure adjustment were performed until convergence. Refinement statistics are listed in Table 1. Molecular images were generated using PyMOL.[59]
High-Resolution Protein NMR Experiments
Lactose-bound Gal3C and apo-Gal3C samples were prepared as reported previously,[52] using proteins expressed in [15N,1-13C1]glucose-containing minimal medium.[60,61] 1H–15N HSQC spectra were acquired with increasing concentrations of glycerol on a sample of Gal3C that was initially in the apo state at 0.46 mM. Nine additions of 10 μL of 25 mM glycerol in NMR buffer were made, yielding a final glycerol concentration of 5.9 mM, corresponding to 16 equiv with respect to Gal3C. The 1H and 15N spectral widths were 8000 and 1825 Hz, respectively. The static magnetic field strength was 14.1 T, and the temperature was 301 K.
The tautomeric state of the histidines was monitored by 1H–15N HMQC and 1H–13C HSQC spectra, centered at 200 and 128 ppm in the 15N and 13C dimensions, respectively, and covering spectral widths of 180 and 28 ppm, respectively. The magnetization transfer delay for 1H–15N HMQC was set to 22.2 ms, to refocus magnetization arising from 1JNH couplings. The static magnetic field strength was 11.7 T, and the temperature was 298 K.
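The dilution arithmetic of the glycerol titration can be checked with a short Python sketch. The starting sample volume is not given in the text, so the 300 μL used below is a hypothetical value; volumes are assumed to be additive, and Gal3C is assumed only to be diluted by each addition.

```python
"""Concentrations during a stepwise glycerol titration into an NMR sample."""


def titration_series(v0_uL, protein0_mM, stock_mM, add_uL, n_additions):
    """Return (glycerol_mM, protein_mM, equivalents) after each addition.

    Assumes additive volumes and that the protein is only diluted.
    """
    series = []
    for n in range(1, n_additions + 1):
        volume = v0_uL + n * add_uL                   # total volume after n additions
        glycerol = stock_mM * n * add_uL / volume     # glycerol added so far, diluted
        protein = protein0_mM * v0_uL / volume        # protein diluted by the additions
        series.append((glycerol, protein, glycerol / protein))
    return series


# The starting volume is not stated in the text; 300 uL is a hypothetical value.
steps = titration_series(v0_uL=300.0, protein0_mM=0.46, stock_mM=25.0,
                         add_uL=10.0, n_additions=9)
gly, prot, eq = steps[-1]
print(f"final glycerol {gly:.2f} mM, Gal3C {prot:.3f} mM, {eq:.1f} equiv")
```

With this assumed volume, the final glycerol concentration is about 5.8 mM and the glycerol:Gal3C ratio about 16, close to the values stated above; a different starting volume would shift both numbers.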
Low-field 2H Relaxation Dispersion Experiments
Lactose-bound Gal3C and apo-Gal3C samples were prepared as reported previously,[52] except that the solvent contained 50% 2H2O and the lactose concentration in the former sample was 20 mM. The longitudinal relaxation rate R1 of the water 2H magnetization was measured with an accuracy of ∼0.5–1.0% at seven different frequencies, using conventional cryomagnets for the higher frequencies and an iron-core magnet (Drusch EAR-35N) interfaced with a Tecmag Discovery console for the lower frequencies. The sample temperature was regulated to 299.8 ± 0.1 K. The relaxation rate was measured at each frequency for both apo-Gal3C and lactose-bound Gal3C, as well as for two matching reference solutions that did not contain Gal3C.
The relaxation dispersion is given by
$$R_1(\omega_0) - R_{1,\mathrm{ref}} = 0.2\,J(\omega_0) + 0.8\,J(2\omega_0)$$
where $R_{1,\mathrm{ref}}$ is the relaxation rate of the matching reference solution, $\omega_0$ is the 2H Larmor frequency, and the spectral density is modeled as
$$J(\omega) = \alpha + \beta\,\frac{\tau_\beta}{1 + (\omega\tau_\beta)^2}$$
in which α describes a frequency-independent contribution from water molecules that have rotational correlation times significantly longer than those in bulk water, but shorter than 1 ns, due to interactions with the external protein surface. β describes the frequency-dependent contribution from water molecules with longer correlation times (τβ) due to interactions with internal sites in the protein and is given by
$$\beta = \frac{N_\beta}{N_T}\,\omega_Q^2\,S_\beta^2$$
where ωQ is the 2H quadrupole coupling frequency, NT is the known ratio of water and protein molecules in the sample, Nβ is the number of water molecules with a τβ of >1 ns, and Sβ is the orientational order parameter of bound waters.
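To show how the model parameters shape a dispersion profile, the Python sketch below evaluates an excess 2H relaxation rate of the form R1(ω0) − R1,ref = 0.2J(ω0) + 0.8J(2ω0), with J(ω) = α + βτβ/[1 + (ωτβ)²] and β = (Nβ/NT)ωQ²Sβ². This explicit form is our reading of the model described above (an assumption, not a quotation of the original equations), and all numerical parameter values in the example are hypothetical.

```python
"""Sketch of a model-free 2H water relaxation dispersion profile."""

import math


def spectral_density(omega, alpha, beta, tau):
    """J(omega) = alpha + beta * tau / (1 + (omega * tau)**2), in s^-1."""
    return alpha + beta * tau / (1.0 + (omega * tau) ** 2)


def excess_r1(nu0_hz, alpha, beta, tau):
    """Excess 2H R1 (s^-1) over the reference solution at Larmor frequency nu0 (Hz)."""
    omega0 = 2.0 * math.pi * nu0_hz  # angular Larmor frequency (rad/s)
    return (0.2 * spectral_density(omega0, alpha, beta, tau)
            + 0.8 * spectral_density(2.0 * omega0, alpha, beta, tau))


def beta_from_sites(n_beta, n_total, omega_q, s_beta):
    """beta = (N_beta / N_T) * omega_Q**2 * S_beta**2, in s^-2."""
    return (n_beta / n_total) * omega_q**2 * s_beta**2


# Hypothetical parameters, chosen only to show the shape of the curve.
beta = beta_from_sites(n_beta=2, n_total=50_000, omega_q=8.7e5, s_beta=0.9)
for nu0 in (1e4, 1e5, 1e6, 1e7, 1e8):
    print(f"{nu0:8.0e} Hz  R1 excess = {excess_r1(nu0, 0.5, beta, 10e-9):.4f} s^-1")
```

In this form the excess rate falls from α + βτβ at low frequency to α at high frequency, with the drop centered roughly where ω0τβ is of order 1. This is why internal waters with τβ > 1 ns produce a dispersion step, whereas surface waters contribute only to the flat α term.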
Isothermal Titration Calorimetry
Experiments were performed as described previously,[45] in two series of 30 injections of 10 μL of either 58 or 117 mM glycerol (first injection of 5 μL) in 5 mM HEPES buffer (pH 7.4). The concentration of Gal3C was 0.1 mM.
Molecular Dynamics Simulations and Analysis
MD simulations were performed as described previously,[62] starting from the 100 K apo-Gal3C crystal structure reported here. Ten independent simulations were used, each 5 ns long. Snapshots were saved every picosecond for analysis.
Water occupancy was investigated by the clustering approach of Friesner and co-workers.[63,64] In each snapshot, water oxygen atoms within the active site were stored for clustering. To define the active site consistently, each MD snapshot was first fitted to the lactose–Gal3C structure by superimposing the backbone atoms of all residues within 10 Å of lactose. The active site was then defined as the maximal extent of the lactose molecule plus 1 Å in each direction. Subsequently, the water oxygen atoms within the active site were clustered by an iterative approach. In each iteration, the number of water oxygen atoms in all snapshots within 1 Å of a particular water oxygen atom in a given snapshot was counted, and the water molecule with the largest number was marked as a water site. That water molecule and all of its neighbors within 1 Å were removed from the list, and the process was repeated until the number of neighbors of a water site was lower than in bulk water.
The iterative procedure was applied to each of the 10 independent simulations, and the water sites identified in the individual simulations were then clustered using a single-linkage hierarchical approach.[65] Constraints were imposed such that two water sites identified from the same simulation could not be in the same cluster. The clustering was stopped when the minimal distance between any two clusters was larger than 1 Å. Euclidean distances between the water sites were used to determine their closeness. For each cluster of water sites, the population number and the maximal spatial extent were calculated. The occupancy is the fraction of the 10 independent simulations in which the water site was found. The maximal spatial extent is the maximal distance between two water molecules in the same cluster and indicates the precision of the position of the water site. The geometric centers of the water site clusters are reported.
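The two-stage water-site analysis can be sketched in Python with NumPy. This is an illustrative reimplementation written for this description, not the code of Friesner and co-workers or of the authors. Assumptions: snapshot coordinates are already superimposed and restricted to the active site; the bulk-water threshold is taken as the expected number of oxygens in a 1 Å sphere at a water number density of about 0.0334 Å⁻³, summed over snapshots; and the all-pairs distance computations are suitable only for small inputs.

```python
"""Sketch of a two-stage water-site clustering (illustrative, not the authors' code).

Stage 1: greedy density clustering of water oxygens pooled over the snapshots
of one simulation. Stage 2: constrained single-linkage merging of the sites
found in independent simulations.
"""

import itertools

import numpy as np

BULK_DENSITY = 0.0334  # approximate number density of liquid water (per cubic angstrom)


def water_sites(oxygens, n_snapshots, radius=1.0):
    """Return water-site coordinates from pooled oxygen positions (N x 3, angstrom).

    Stops when the best candidate has fewer neighbours than bulk water would
    place in a sphere of `radius`, summed over all snapshots.
    """
    bulk_count = n_snapshots * BULK_DENSITY * 4.0 / 3.0 * np.pi * radius**3
    remaining = np.asarray(oxygens, dtype=float).reshape(-1, 3)
    sites = []
    while len(remaining) > 0:
        diff = remaining[:, None, :] - remaining[None, :, :]
        within = np.linalg.norm(diff, axis=-1) <= radius
        counts = within.sum(axis=1)  # neighbour count, including the atom itself
        best = int(np.argmax(counts))
        if counts[best] < bulk_count:
            break
        sites.append(remaining[best])
        remaining = remaining[~within[best]]  # drop the site and its neighbours
    return np.array(sites).reshape(-1, 3)


def merge_across_runs(sites_per_run, cutoff=1.0):
    """Constrained single-linkage clustering of sites from independent runs."""
    points = [(run, np.asarray(p, dtype=float))
              for run, sites in enumerate(sites_per_run) for p in sites]
    clusters = [[i] for i in range(len(points))]

    def runs_of(cluster):
        return {points[i][0] for i in cluster}

    def single_linkage(a, b):
        return min(np.linalg.norm(points[i][1] - points[j][1]) for i in a for j in b)

    while True:
        best = None
        for ia, ib in itertools.combinations(range(len(clusters)), 2):
            if runs_of(clusters[ia]) & runs_of(clusters[ib]):
                continue  # constraint: sites from the same run cannot be merged
            d = single_linkage(clusters[ia], clusters[ib])
            if best is None or d < best[0]:
                best = (d, ia, ib)
        if best is None or best[0] > cutoff:
            break
        _, ia, ib = best
        clusters[ia].extend(clusters.pop(ib))  # ia < ib, so index ia stays valid

    summary = []
    for members in clusters:
        coords = np.array([points[i][1] for i in members])
        extent = max((float(np.linalg.norm(a - b))
                      for a, b in itertools.combinations(coords, 2)), default=0.0)
        summary.append({
            "center": coords.mean(axis=0),  # geometric center of the cluster
            "occupancy": len(runs_of(members)) / len(sites_per_run),
            "extent": extent,  # maximal spatial extent (angstrom)
        })
    return summary


# Hypothetical toy run: 100 snapshots, one well-defined site at the origin
# plus sparse background waters in a 10 A box.
rng = np.random.default_rng(0)
toy = np.vstack([rng.normal(0.0, 0.2, size=(80, 3)),
                 rng.uniform(-5.0, 5.0, size=(20, 3))])
print(water_sites(toy, n_snapshots=100))
```

The cannot-link constraint is enforced by skipping any candidate merge whose clusters already contain sites from the same simulation, so each final cluster holds at most one site per run and its occupancy is the number of runs represented divided by the total number of runs.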
Results and Discussion
We have determined the crystal structures of the lactose-bound [Protein Data Bank (PDB) entry 3ZSJ], glycerol-bound (PDB entry 3ZSK), and ligand-free (apo; PDB entry 3ZSL at 100 K and PDB entry 3ZSM at room temperature) states of Gal3C at atomic or near-atomic resolution (Table 1). Overall, these structures are highly isomorphous to each other (Figure 1A and Table S1 of the Supporting Information) and served as a basis for further investigations of oxygen recognition within the binding site of Gal3C. To this end, we performed MD simulations of apo-Gal3C to determine the population density and residence times of water molecules in the binding site. Furthermore, we conducted nuclear magnetic relaxation dispersion measurements to experimentally verify the residence times of water molecules bound to Gal3C. Finally, ITC and chemical shift mapping by NMR spectroscopy were used to study whether glycerol binds to Gal3C in solution under ambient conditions.
Figure 1
Atomic-resolution structures of Gal3C, in the presence of lactose (green) or glycerol (pink) and in the apo form (blue). (A) Gal3C displays an identical conformation in all complexes, and the binding sites are highly comparable. Lactose and glycerol molecules are shown as green and pink sticks, respectively. Key side chains coordinating the ligands are indicated. (B) The carbohydrate-binding site of the lactose–Gal3C complex at 0.86 Å resolution reveals water molecules W1–W5 (dark green spheres) that mediate hydrogen bonding between lactose and Gal3C. Additional water molecules that coordinate lactose, but not the protein, are shown as light green spheres. Hydrogen bonds are represented as dotted lines. (C) Electron density in the lactose-binding site, identifying important hydrogen atoms. The 2Fo – Fc electron density map contoured at 1.5σ (gray mesh) and the Fo – Fc map contoured at 2.0σ (pink) were obtained after removal of H atoms followed by 10 refinement cycles to remove bias. The Fo – Fc map was also contoured at a lower level of 1.6σ to visualize the H density around C6 and O6 of lactose.
Crystal structures similar to those reported here have been published previously,[66] but these were determined at lower resolution (1.35–2.45 Å) from samples with mixed populations of the glycerol-bound, lactose-bound, and apo states. Both of these factors may compromise the interpretation of the observed electron density. Our data mitigate such complications, because the diffraction experiments were conducted with crystals of well-defined samples with only a single state present. Thus, these structures form a solid basis for investigations of water and ligand coordination within the carbohydrate-binding site and for detailed comparisons with results from MD simulations and NMR spectroscopy.
The Structure of Lactose-Bound Gal3C Reveals Water Molecules and Hydrogen Atoms
The 0.86 Å resolution structure of Gal3C in complex with lactose yields a highly detailed description of the lactose-binding site (Figure 1B,C), including the positions of many hydrogen atoms (see below). The conformation of lactose is identical to that in the previously reported lower-resolution crystal structures of lactose- and N-acetyllactosamine (LacNAc)-bound Gal3C.[16,43,66] As observed previously,[66] the electron density clearly reveals the presence of both α and β anomers of lactose with equal occupancy (Figure S1A of the Supporting Information). However, the quality and detail of the electron density map are markedly improved at this resolution compared to those of previous structures. In particular, we can identify a substantially larger number of water molecules, 289 (of which 14 have partial occupancy), compared to the 197 seen in the previously reported structure at 1.35 Å.[66] In our structures, we have maintained a consistent numbering system for the water molecules, such that wherever possible an experimentally observed water molecule at a given position has the same number in all structures. Details are provided in Table S2 of the Supporting Information.
The binding site includes 11 water molecules. Five of these (W1–W5) are conserved in all available lactose- and LacNAc-bound Gal3C structures.[16,43,66] W2–W4 make important bridging hydrogen bonds between lactose and Gal3C residues Arg144, Asn160, Glu165, and Glu184 (Figure 1B). In addition, six previously unobserved water molecules (W6–W11) were identified. These coordinate lactose through hydrogen bonds involving O1–O3 of the galactose moiety and O1′, O2′, and O6′ of the glucose moiety (Figure 1B) but are loosely bound and do not contact any protein atoms.
The ultra-high-resolution structure clearly reveals the positions of many hydrogen atoms (Figure 1C and Figure S1B of the Supporting Information).
A total of 497 positive peaks above 2σ were experimentally observed at ideal hydrogen positions in the Fo – Fc difference omit map, calculated after all hydrogen atoms had been removed from the model. These peaks represent 45% of the theoretical number of hydrogen atoms and are mostly located close to the backbone N and Cα atoms (corresponding to 70 and 85% of all HN and Hα atoms, respectively). One hundred twenty intramolecular N–H···O hydrogen bonds (excluding those to water molecules) were predicted from the positions of riding hydrogen atoms using HBPLUS, with a maximal D–H···A distance of 3.5 Å and a minimal D–H···A angle of 90° as criteria.[67] The H atom omit map experimentally confirms the presence of 80 hydrogen bonds, comprising 63 of 80 possible main chain–main chain, nine of 18 possible side chain–main chain, and eight of 18 possible side chain–side chain interactions. Within the binding site, 49% of the possible H atoms were experimentally observed. These primarily belong to the main chain (80% visible), whereas the side chain hydrogens are less well determined (40% visible).
Because the ligand binding environment is a crucial factor to consider in the design of synthetic ligands, a detailed analysis of the hydrogen bonding pattern in the binding site is of particular interest. Direct identification of the hydrogen atoms improves the description of the hydrogen bonding patterns for key residues. Hydrogens are visible on several conserved amino acid residues of the binding site (Figure 1C). In addition, several hydrogens of lactose are observed, which allow the direct assignment of individual hydroxyl groups as either hydrogen bond donors or acceptors. For example, the electron density clearly defines the tautomeric state of His158, showing that Nδ1 carries the hydrogen (Figure 1C). Further, Nε2 is seen to accept a hydrogen bond from the galactose O4 hydroxyl group, whose H atom is observed. The tautomer assignment of His158 is verified by the 13C and 15N chemical shifts and cross-peak patterns in the 1H–13C and 1H–15N correlation spectra of the imidazole ring (Figure S2A of the Supporting Information), which provide unequivocal evidence that Nδ1 is protonated.[68] Two Hη atoms of Arg162 are observed to coordinate O3′ of the glucose moiety, which is the only glucose atom involved in a direct interaction with the protein (Figure 1C).
Hydrogens were also identified on the galactose hydroxyl groups O2 and O3, which interact with surface water molecules (W1, W4, and W7–W9). Aliphatic H atoms in four of the lactose CH groups (C1, C3, C5, and C4′) show good electron density (Figure 1C); the H atoms on galactose C3 and C5 interact with the π-electron system of Trp181.
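The geometric hydrogen-bond criteria used with HBPLUS can be expressed as a small Python check. This sketch does not reproduce HBPLUS itself; it assumes that the 3.5 Å limit refers to the donor–acceptor distance and that the angle is measured at the hydrogen atom, and the coordinates in the example are hypothetical.

```python
"""Geometric hydrogen-bond test in the spirit of the HBPLUS criteria (a sketch)."""

import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def dha_angle(donor, hydrogen, acceptor):
    """Angle D-H...A in degrees, measured at the hydrogen atom."""
    v1 = _sub(donor, hydrogen)
    v2 = _sub(acceptor, hydrogen)
    cos_theta = sum(x * y for x, y in zip(v1, v2)) / (_norm(v1) * _norm(v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))  # clamp rounding


def is_hbond(donor, hydrogen, acceptor, max_da=3.5, min_angle=90.0):
    """True if D...A <= max_da (angstrom) and the D-H...A angle >= min_angle (degrees).

    Assumes the distance limit applies to the donor-acceptor separation.
    """
    return (_norm(_sub(acceptor, donor)) <= max_da
            and dha_angle(donor, hydrogen, acceptor) >= min_angle)


# Hypothetical coordinates (angstrom): a near-linear N-H...O contact.
n_atom, h_atom, o_atom = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.3, 0.0)
print(is_hbond(n_atom, h_atom, o_atom), round(dha_angle(n_atom, h_atom, o_atom), 1))
```

Applying such a test to every donor, riding hydrogen, and acceptor triple gives a predicted hydrogen-bond list, which can then be compared against the hydrogen peaks actually observed in the omit map.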
Structure of Glycerol-Bound Gal3C
It is of interest to compare the binding modes of lactose and glycerol in their complexes with Gal3C, because glycerol can be viewed as a fragment of lactose. After a quick soak of apo-Gal3C crystals in glycerol, the 0.9 Å resolution electron density in the carbohydrate-binding site can be fitted with two glycerol molecules. In one of these, the oxygen and carbon atoms of glycerol are positioned identically to galactose atoms O4–O6 and C4–C6. The second glycerol molecule, modeled with 50% occupancy, occupies the same position as the O1, O3′, O5′, and C3′–C5′ atoms of glucose. This second glycerol molecule was not seen in the previously reported structure at 1.35 Å.[66] In addition to glycerol, the electron density can be fitted with six water molecules (W12–W17), five of which have partial occupancy (Figure 2A). W1–W5, present in the lactose- and LacNAc-bound structures, are also present in glycerol-bound Gal3C. The water- and glycerol-binding sites partially overlap, indicating that the electron density reports on the ensemble average of different glycerol binding modes throughout the crystal. A third, partially occupied glycerol molecule might be invoked (not shown) to explain the electron density (partly assigned to W12) that appears near the positions of the galactose C1 and C2 atoms in the lactose-bound state. Together, the oxygen atoms of glycerol in these overlapping binding modes perfectly map out the binding site observed for lactose, except for the most peripheral part of the glucose moiety (Figure 2A,C). In this respect, glycerol and lactose constitute a naturally occurring illustration of the principle of fragment-based drug discovery, in that the atoms of the initial fragment (glycerol) maintain their original orientations in the elaborated, higher-affinity ligand (lactose).
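The observation that glycerol oxygens occupy the positions of lactose oxygens can be quantified by nearest-neighbour matching after superposition. The Python sketch below is illustrative only: the coordinates and atom labels are hypothetical, and the 0.5 Å match cutoff is our assumption rather than a criterion used in this study.

```python
"""Nearest-neighbour matching of oxygen positions between two superimposed models."""

import math


def match_oxygens(reference, candidates, cutoff=0.5):
    """Map each reference oxygen to its nearest candidate oxygen.

    Both arguments map atom labels to (x, y, z) in angstrom and are assumed to
    share a common frame after superposition. A nearest candidate farther than
    `cutoff` is reported as None, together with its distance.
    """
    result = {}
    for label, xyz in reference.items():
        nearest, dist = min(((name, math.dist(xyz, cxyz)) for name, cxyz in candidates.items()),
                            key=lambda item: item[1])
        result[label] = (nearest if dist <= cutoff else None, round(dist, 2))
    return result


# Hypothetical coordinates (not taken from the deposited structures).
lactose_o = {"Gal O4": (0.0, 0.0, 0.0), "Gal O6": (2.5, 1.0, 0.0), "Glc O3'": (-3.0, 2.0, 0.5)}
fragment_o = {"GOL1 O1": (0.1, -0.2, 0.0), "GOL1 O3": (2.6, 1.3, 0.1), "W12": (6.0, 0.0, 0.0)}
for label, (match, dist) in match_oxygens(lactose_o, fragment_o).items():
    print(label, match, dist)
```

Reporting the distance even for unmatched atoms makes it easy to see which parts of the larger ligand (here the hypothetical glucose O3′) are not mimicked by the fragment or water oxygens.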
Figure 2
Carbohydrate-binding site of Gal3C. The 2Fo – Fc electron density maps, contoured at 1.0σ, are shown for the binding sites of glycerol-bound Gal3C and apo-Gal3C, superimposed on the structure of lactose (stick model) in lactose-bound Gal3C. (A) Electron density for glycerol-bound Gal3C. Glycerol and water molecules in the glycerol complex are shown as magenta sticks and spheres, respectively. To illustrate the molecular mimicry of lactose by glycerol, lactose is shown as thin sticks. The lactose and glycerol atom annotations are colored green and magenta, respectively. (B) Electron density for water molecules in apo-Gal3C. Water molecules are shown as blue spheres. Labels a and b denote alternate positions of water molecules W8 and W9. As in panel A, lactose is shown as thin sticks as an aid to interpretation. (C) Superposition of the lactose-bound Gal3C, glycerol-bound Gal3C, and apo-Gal3C structures, demonstrating the common oxygen recognition motif of the binding site. The oxygens of lactose (green) and glycerol (magenta) occupy the same positions as water molecules found in apo (blue spheres) and glycerol-bound (magenta spheres) Gal3C.
Binding of Glycerol to Gal3C Is Insignificant at Room Temperature
We used ITC and NMR to monitor the binding of glycerol to Gal3C at room temperature and under conditions (protein and ligand concentrations) similar to those used to study lactose binding. ITC experiments show no significant heat evolution beyond the heat of dilution upon addition of up to 130 equiv of glycerol to apo-Gal3C (Figure S3A of the Supporting Information). Similarly, 1H–15N HSQC spectra of Gal3C acquired with increasing concentrations of glycerol show that the protein chemical shifts are unperturbed by the addition of 16 equiv of glycerol (Figure S3B of the Supporting Information). Thus, both ITC and NMR show that binding of glycerol to Gal3C is very weak at room temperature, with a dissociation constant (Kd) above 100 mM. By contrast, lactose binds to Gal3C with a Kd of 230 μM, governed by a favorable enthalpy but an unfavorable entropy,[45,69] as is typical for interactions of galectins with oligosaccharides.[70] The lack of glycerol binding at ambient temperature is consistent with these thermodynamics of carbohydrate binding to Gal3C: evidently, the enthalpy of formation of the glycerol–Gal3C complex cannot overcome the entropic penalty at room temperature. This is explained by the smaller size of glycerol, which is essentially one-quarter of a lactose molecule and therefore occupies fewer of the oxygen-binding sites on Gal3C at any one time than lactose does. At the low temperature (100 K) of the cryocooled X-ray diffraction experiments, however, the thermodynamic balance is likely reversed, so that the favorable enthalpy of binding now overcomes the entropic penalty. In addition, the glycerol concentration in the cryoprotectant solution is very high (2 M), which further drives glycerol into the binding site of Gal3C in the crystal.
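The thermodynamic argument in this paragraph can be made concrete with a few lines of arithmetic. The sketch below (i) converts the reported lactose Kd of 230 μM and the glycerol bound of Kd > 100 mM into standard binding free energies (1 M standard state), (ii) gives an upper bound on glycerol site occupancy at the 2 M cryoprotectant concentration under simple single-site binding with ligand in large excess, and (iii) shows how ΔG = ΔH − TΔS changes sign with temperature when both ΔH and ΔS are negative. The ΔH and ΔS values in (iii) are hypothetical placeholders, not measured values for glycerol, and the model assumes ΔH and ΔS are temperature independent (ΔCp = 0), a strong simplification between 298 and 100 K.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def dg_from_kd(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy (J/mol), 1 M standard state: ΔG° = RT ln Kd."""
    return R * temp_k * math.log(kd_molar)


def fraction_bound(ligand_molar: float, kd_molar: float) -> float:
    """Single-site occupancy when free ligand is in large excess over protein."""
    return ligand_molar / (kd_molar + ligand_molar)


def dg_at(temp_k: float, dh: float, ds: float) -> float:
    """ΔG(T) = ΔH - TΔS (J/mol), assuming ΔH and ΔS do not vary with T."""
    return dh - temp_k * ds


print(f"lactose:  ΔG° ≈ {dg_from_kd(230e-6) / 1000:.1f} kJ/mol")         # about -20.8
print(f"glycerol: ΔG° > {dg_from_kd(0.100) / 1000:.1f} kJ/mol")          # about -5.7
print(f"glycerol occupancy at 2 M: ≤ {fraction_bound(2.0, 0.100):.2f}")  # ≤ 0.95

# Hypothetical ΔH = -20 kJ/mol and ΔS = -80 J/(mol K): sign flips at T = ΔH/ΔS = 250 K
for t in (298.0, 100.0):
    print(f"T = {t:.0f} K: ΔG = {dg_at(t, -20e3, -80.0) / 1000:+.1f} kJ/mol")
```

Because Kd for glycerol is only a lower bound, the 0.95 figure is an upper bound on occupancy; the point is that a 2 M ligand concentration can populate even a very weak site substantially.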
That glycerol binding to Gal3C could not be detected in solution at room temperature, yet glycerol binds in the crystal exactly as a fragment lead that can be elaborated into lactose, emphasizes the important role of protein crystallography in fragment-based drug discovery.[71,72]
Structure of Apo-Gal3C
The crystal structure of apo-Gal3C was determined at 1.08 Å resolution using PEG 400 as a cryoprotectant, to avoid glycerol binding. The conformation of the binding site does not differ significantly from that of either the lactose- or the glycerol-bound form (Figure 2). Despite the slightly lower resolution of the apo structure, the 2Fo – Fc electron density map contoured at 2σ shows electron density at Nδ1 of His158. As for lactose-bound Gal3C, the tautomeric state is corroborated by the 1H–13C and 1H–15N correlation spectra, which are virtually identical to those of the lactose-bound state (Figure S2B of the Supporting Information). Thus, the tautomeric state is the same in both forms, and the hydrogen bonding interaction persists in the apo state in solution.
The electron density at the binding site reveals six water molecules (W8–W11, W14, and W15) that align very well with oxygen positions of lactose (O6, O5, O4, O1, O3, and O3′, respectively) (see Figure 2B). The water molecules located at the O5 (W9a) and O6 (W8a) atoms were modeled with alternate conformations (W9b and W8b) that coincide with the C5 and C6 atoms of lactose; these alternates were assigned occupancies of 0.4 and 0.3, respectively. Two other water molecules with half-occupancy were found at positions equivalent to those of O2′ (W16) and O6′ (W17) in lactose. For comparison, the previously determined apo-Gal3C structure included only three water molecules in the binding site (corresponding to O3, O4′, and O6′).[66]
To eliminate the possibility that the observed water molecules could be explained by PEG 400 or fragments thereof, the structure of apo-Gal3C was also determined at room temperature (298 K), without the addition of either glycerol or PEG 400. The resulting structure at 1.25 Å resolution confirms the positions of the water molecules observed in the data at 100 K, except for the two partially occupied water molecules at the glucose O2′ and O6′ positions (W16 and W17 in Figure 2B), which have the fewest contacts with the protein. Conversely, the room-temperature data reveal an additional, fully occupied water molecule at the position equivalent to O5′ of glucose. The close agreement between the two apo structures in the number and positions of bound waters indicates that the low temperature of the cryocooled experiment does not alter the water binding properties of Gal3C in any major way.
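Statements such as "align very well with oxygen positions of lactose" amount to a nearest-neighbor comparison between two superimposed structures. A minimal sketch, assuming both coordinate sets are already in a common frame and using a hypothetical 1.0 Å cutoff; the function name, the water label "Wx", and all coordinates are illustrative toy values rather than data from the deposited models.

```python
import math

Coord = tuple[float, float, float]


def match_waters(waters: dict[str, Coord], ligand_atoms: dict[str, Coord],
                 cutoff: float = 1.0) -> dict[str, tuple[str, float]]:
    """Map each water to its nearest ligand atom if within `cutoff` Å."""
    if not ligand_atoms:
        raise ValueError("ligand_atoms must not be empty")
    matches = {}
    for w_name, w_xyz in waters.items():
        # Nearest ligand atom to this water oxygen
        atom, dist = min(((name, math.dist(w_xyz, xyz))
                          for name, xyz in ligand_atoms.items()),
                         key=lambda pair: pair[1])
        if dist <= cutoff:
            matches[w_name] = (atom, round(dist, 2))
    return matches


# Toy coordinates (Å): one water sits near "O4"; the other is far from every ligand atom
waters = {"W10": (0.2, 0.0, 0.0), "Wx": (5.0, 5.0, 5.0)}
lactose = {"O4": (0.0, 0.0, 0.0), "O6": (3.0, 0.0, 0.0)}
print(match_waters(waters, lactose))  # {'W10': ('O4', 0.2)}
```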
The Binding Site Is Preorganized To Recognize a Sugarlike Framework of Oxygens
The crystal structures presented here demonstrate that Gal3C coordinates water or hydroxyl oxygens (e.g., of lactose or glycerol) at identical positions in its binding site, as summarized in Figure 2C. The structures also emphasize the significance of conserved water-mediated hydrogen bonding in stabilizing the Gal3C ligands. Further, side chain conformations within the binding site and the loop regions on either side of the binding groove are indistinguishable among the apo, glycerol-bound, and lactose-bound forms; i.e., the site is fully preorganized in the crystallized apo form to accommodate a sugarlike framework. This view contrasts with the solution structure of apo-Gal3C, in which the loops surrounding the binding site have a different conformation in the absence of ligand.[73] Also, the NMR order parameter for the Trp181 side chain of apo-Gal3C indicates that it is highly flexible in solution.[45] These differences in conformation between the solution and crystalline states reflect crystal contacts near the binding site as well as different flexibilities of the protein in solution and in the crystal. Taken together, the NMR and crystal structures suggest that the crystal traps a conformation of the protein that is not the dominant one in solution but is optimized for ligand binding. Consistent with this conclusion, the previous observation that lactose diffuses very slowly out of Gal3C crystals[66] suggests that the crystalline environment restricts the protein flexibility required for ligand recognition and release.
Enhanced Occupancy of Water Molecules at Specific Sites in Apo-Gal3C in Solution
Molecular dynamics simulations of apo-Gal3C in water were conducted to study the water structure in the binding site (Figure 3 and Table 2), as a further probe of whether the observations made for the crystal structures also hold in solution. We identified 13 sites (S1–S13) with enhanced water density within the binding site. In general, the centers of the identified water sites are located at or close to the water molecules observed in the apo-Gal3C structure. Sites S3, S6, S7, and S9–S12 are very close to the water sites identified in a previous MD simulation initiated from the crystal structure of LacNAc-bound Gal3C.[74] However, that study identified water sites only from radial and angular distribution functions around the expected hydrogen-bonding atoms of the protein, whereas our identification was not biased by prior expectations and found all sites with high water density, regardless of their position relative to potential hydrogen-bonding partners in the protein.
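The clustering procedure is not spelled out in this section, so the following is only a generic sketch of one unbiased, density-based way to find water sites, not necessarily the method used here: water-oxygen positions from protein-aligned MD frames are binned on a cubic grid, voxels whose accumulated count exceeds a chosen multiple of the bulk expectation are kept, and face-connected dense voxels are merged into sites. The grid spacing, the enrichment factor, and all names are assumptions; the bulk number density of liquid water is about 0.0334 molecules per Å³.

```python
from collections import Counter, deque

Coord = tuple[float, float, float]
Voxel = tuple[int, int, int]


def water_sites(frames: list[list[Coord]], spacing: float,
                bulk_density: float = 0.0334,
                enrichment: float = 2.0) -> list[list[Voxel]]:
    """Cluster grid voxels whose water-oxygen density exceeds enrichment times bulk.

    frames: water-oxygen coordinates (Å) for each MD frame, protein-aligned.
    bulk_density: bulk water number density (molecules per Å^3).
    Returns a list of sites, each a list of face-connected voxel indices.
    """
    counts = Counter()
    for frame in frames:
        for x, y, z in frame:
            counts[(int(x // spacing), int(y // spacing), int(z // spacing))] += 1
    # Expected bulk count per voxel, summed over all frames, times the enrichment
    threshold = enrichment * bulk_density * spacing ** 3 * len(frames)
    dense = {v for v, n in counts.items() if n > threshold}

    sites, seen = [], set()
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        queue, site = deque([start]), []
        while queue:  # breadth-first flood fill over the six face neighbours
            i, j, k = queue.popleft()
            site.append((i, j, k))
            for nb in ((i + 1, j, k), (i - 1, j, k), (i, j + 1, k),
                       (i, j - 1, k), (i, j, k + 1), (i, j, k - 1)):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        sites.append(site)
    return sites


# Toy trajectory: two persistent waters share one site, a third forms another,
# and one water wanders to a new voxel in every frame.
frames = [[(0.25, 0.25, 0.25), (0.75, 0.25, 0.25), (5.25, 5.25, 5.25),
           (20.0 + i, 0.0, 0.0)] for i in range(10)]
# With only 10 frames, a high enrichment is needed to exclude single visits.
sites = water_sites(frames, spacing=0.5, enrichment=30.0)
print(sorted(len(s) for s in sites))  # [1, 2]
```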
Figure 3
Water sites identified in the MD simulations. The sites are labeled S1–S13 from left to right. The lactose molecule (green) is shown in the active site together with the water sites. The color scheme ranges from high occupancy (red, present in all 10 simulations) to low occupancy (yellow, present in only two of 10 simulations). The size of a sphere is related to the maximal extent (see the text); i.e., water sites defined with high precision are depicted as smaller spheres and vice versa. For reference, the water molecules in the crystal structures of lactose-bound Gal3C, glycerol-bound Gal3C, and apo-Gal3C are shown as green, magenta, and blue spheres, respectively.
Table 2
Details of the Water Site Clusters within 1.6 Å of Lactose Obtained from the MD Simulations of Apo-Gal3C
| water site | occupancyᵃ | maximal extent (Å)ᵇ | r(Lac) (Å)ᶜ | r(Gal3C) (Å)ᵈ | r(H2O) (Å)ᵉ |
|---|---|---|---|---|---|
| S1 | 1.0 | 0.6 | 1.0 (O3) | 3.3 (H158 CE1) | 1.6 (W1/W14) |
| S2 | 1.0 | 0.8 | 1.2 (O4) | 3.0 (H158 NE2/R144 NE1) | 0.4 (W4) |
| S3 | 1.0 | 0.6 | 0.5 (O4) | 2.5 (H158 NE2) | 0.6 (W10) |
| S4 | 1.0 | 1.7 | 0.8 (C5) | 3.4 (W181 CE2) | 0.6 (W9) |
| S5 | 0.4 | 0.6 | 0.4 (C6) | 3.5 (H158 CD2) | 0.5 (W8a) |
| S6 | 1.0 | 1.3 | 0.2 (O6) | 3.1 (N174 ND2) | 0.2 (W8b) |
| S7 | 0.7 | 1.1 | 1.0 (O6) | 3.0 (N174 ND2) | 1.4 (W8b) |
| S8 | 0.7 | 2.5 | 1.0 (O1) | 3.0 (R162 NH2) | 0.4 (W11) |
| S9 | 1.0 | 1.7 | 0.9 (O3′) | 2.1 (R162 NH2) | 1.1 (W15) |
| S10 | 0.2 | 0.5 | 0.3 (O3′) | 2.2 (R162 NH1) | 0.2 (W15) |
| S11 | 0.7 | 1.0 | 1.0 (O3′) | 2.6 (R162 NH1) | 1.0 (W15) |
| S12 | 0.9 | 0.8 | 1.6 (O2′) | 3.1 (E184 OE2) | 1.0 (W16) |
| S13 | 0.8 | 1.1 | 1.6 (O6′) | 4.1 (W181 CB) | 1.3 (W17) |

ᵃ Occupancy is the fraction of the 10 independent simulations in which a given water site was identified.
ᵇ Maximal extent is the maximal distance between two members in a water site.
ᶜ r(Lac) is the shortest distance to a lactose atom, identified within parentheses.
ᵈ r(Gal3C) is the shortest distance to a protein atom, identified within parentheses.
ᵉ r(H2O) is the shortest distance to a water oxygen in the apo crystal structure, identified within parentheses.
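The two per-site statistics defined in the table footnotes can be written down directly. A minimal sketch, assuming each water site is represented by the set of simulations in which it was found and by the positions of its members (taken here to be, for example, per-simulation site centers; what constitutes a "member" is an assumption). Function names and coordinates are illustrative.

```python
import math
from itertools import combinations

Coord = tuple[float, float, float]


def site_occupancy(found_in: set[int], n_simulations: int = 10) -> float:
    """Fraction of the independent simulations in which the site was identified."""
    return len(found_in) / n_simulations


def maximal_extent(members: list[Coord]) -> float:
    """Largest distance (Å) between any two members of one water site."""
    return max((math.dist(p, q) for p, q in combinations(members, 2)), default=0.0)


# Toy site found in 7 of 10 runs, with three member positions (Å)
print(site_occupancy({0, 2, 3, 5, 7, 8, 9}))                           # 0.7
print(round(maximal_extent([(0, 0, 0), (0.6, 0, 0), (0, 0.8, 0)]), 2))  # 1.0
```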
Water sites S2–S6, S8, and S10 are located very close (≤0.6 Å) to water molecules observed in apo-Gal3C near the positions of lactose atoms O4, C5, C6, O6, O1, and O3′ [W4, W10, W9, W8a, W8b, W11, and W15 in the apo structures (see Figure 2B and Table 2)]. Moreover, S1, S7, S9, and S11–S13 are only slightly offset (≤1.6 Å) from the observed positions of water or lactose oxygens. Notably, S12 matches well with the position of the acetyl group of LacNAc in the LacNAc–Gal3C complex.[43] S9 and S11 can be seen as alternative positions of the low-occupancy S10 site, 0.9–1.0 Å from O3′. S5 has a low occupancy of 0.4, in agreement with the 30% occupancy estimated for the water molecule (W8b) close to C6 in the apo-Gal3C structure. Similarly, site S12 is close to W16, which is only half-occupied.
Thus, essentially all water molecules in the apo crystal structures are close to water sites in the MD simulations of the apo state, which further supports our conclusion that the apo structure is indeed devoid of ligands, such as traces of lactose or PEG, and that the water sites persist in solution.
Comparing the MD-derived water sites in apo-Gal3C with the electron densities of the glycerol-bound Gal3C structure further aids in the interpretation of the latter. As noted above, we find water sites on and close to the glycerol molecules. Most of the glycerol molecule that overlaps with the O4–O6 atoms of lactose maps to water sites (S3–S6) in the MD simulations of apo-Gal3C, whereas the other, partially occupied glycerol molecule overlaps with only two water sites (S10 and S13). Notably, the relative populations of the two glycerol molecules match the relative number and occupancy of the corresponding water sites. However, two water molecules in the glycerol-bound Gal3C structure [W12 and W13, located near the C2 and O2 sites of lactose, respectively (Figure 2A)] correspond neither to any water site identified in the MD simulations nor to any water in the apo-Gal3C structure. This observation further supports the hypothesis that the electron density of these tentative waters might in fact be explained by a third glycerol molecule with low occupancy.
The Bound Water Molecules in the Lactose-Binding Site Exchange Rapidly with Bulk Water
To investigate the residence times of water molecules bound to Gal3C, we used nuclear magnetic relaxation dispersion (MRD) experiments, which sensitively detect long-lived waters bound to proteins.[75,76] We performed 2H relaxation dispersion experiments covering resonance frequencies from 2.5 to 92.1 MHz (Figure 4). The MRD profiles of apo-Gal3C and lactose-bound Gal3C are highly similar, indicating that lactose does not displace any long-lived water molecules in the binding site; had it done so, the dispersion step for the apo sample would have been greatly enhanced compared to that for the lactose-bound sample. Using a rotational diffusion correlation time (τ) of 7.3 ns,[45] the fitted relaxation dispersions yield NβSβ² values of 4.5 ± 0.1 and 4.3 ± 0.1 for the lactose-bound and apo states, respectively. The slightly enhanced dispersion observed for lactose-bound Gal3C relative to apo, 0.2 ± 0.1 (mean ± one standard deviation), can readily be explained by exchange of 2H between bulk solvent and the hydroxyl groups of lactose bound to Gal3C, which is estimated to contribute 0.20 to NβSβ², based on the exchange rate constants for glucose hydroxyl protons.[77] Both apo-Gal3C and lactose-bound Gal3C show a significant relaxation dispersion that is due to long-lived waters and to labile deuterons (e.g., in hydroxyl groups) in the protein; the latter are estimated to contribute 0.5 to NβSβ² at the present pH. The remaining dispersion step (NβSβ² ≈ 3.8) is due to long-lived waters. Assuming that these waters are relatively well ordered (⟨Sβ²⟩ = 0.75–0.95), we obtain an Nβ of 4–5. This value agrees with the apo-Gal3C and lactose-bound Gal3C crystal structures, each of which has three to five completely buried internal water molecules, all outside the lactose-binding site.
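The bookkeeping in this paragraph reduces to simple arithmetic on the fitted NβSβ² values. The sketch below reproduces it using only numbers quoted in the text (fitted values of 4.3 and 4.5, a labile protein deuteron contribution of 0.5, a lactose hydroxyl contribution of 0.20, and ⟨Sβ²⟩ between 0.75 and 0.95). The function name is hypothetical, and treating the contributions as independent, subtractable terms is our reading of the paragraph rather than a full MRD model.

```python
def long_lived_waters(n_s2_fit: float, n_s2_labile: float,
                      s2_low: float = 0.75,
                      s2_high: float = 0.95) -> tuple[float, float]:
    """Range of N_beta after removing labile-deuteron terms from N_beta * S_beta^2."""
    n_s2_water = n_s2_fit - n_s2_labile            # part attributed to long-lived waters
    return n_s2_water / s2_high, n_s2_water / s2_low  # more order means fewer waters


apo = long_lived_waters(4.3, 0.5)          # protein labile deuterons only
lac = long_lived_waters(4.5, 0.5 + 0.20)   # plus exchanging lactose hydroxyls
print([round(x, 1) for x in apo], [round(x, 1) for x in lac])  # [4.0, 5.1] [4.0, 5.1]
```

Both states give the same range, which is the quantitative form of the statement that lactose does not displace any long-lived water.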
Figure 4
Nuclear magnetic relaxation dispersion of apo-Gal3C (○) and lactose-bound Gal3C (●). Empty and filled squares represent the corresponding data from the reference samples without Gal3C. To facilitate comparison, the relaxation rates for the lactose samples are shifted down by the small (0.05 s⁻¹), frequency-independent difference between the two reference samples. The errors are comparable to the size of the symbols.
Importantly, the MRD data reveal that all bound waters in the binding site of apo-Gal3C exchange with bulk water on a time scale of nanoseconds or faster at room temperature, in keeping with the MD simulations, which show that the mean residence time of water molecules in the binding site is only 14 ps, with individual values ranging from a few picoseconds to 1.3 ns.
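How the residence times were extracted from the trajectories is not defined in this section. A common, simple approach, sketched here under that assumption, is to record for one water molecule whether it occupies a site in each saved frame and to take the lengths of uninterrupted runs. Names and the frame interval are hypothetical. Runs cut off at the trajectory ends are truncated, and excursions shorter than one frame are invisible, so real analyses often add a tolerance window or fit a survival correlation function instead.

```python
def residence_times(occupied: list[bool], dt_ps: float) -> list[float]:
    """Durations (ps) of uninterrupted stays of one water molecule in a site."""
    times, run = [], 0
    for in_site in occupied:
        if in_site:
            run += 1
        elif run:                 # a stay has just ended
            times.append(run * dt_ps)
            run = 0
    if run:                       # stay still ongoing at the last frame (truncated)
        times.append(run * dt_ps)
    return times


trace = [True, True, False, True, False, False, True, True, True]
stays = residence_times(trace, dt_ps=2.0)
print(stays, sum(stays) / len(stays))  # [4.0, 2.0, 6.0] 4.0
```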
Concluding Remarks: Implications for Drug Discovery and Design
The water molecules observed in the crystal structures of Gal3C are weakly bound at room temperature and have only slightly prolonged residence times relative to water molecules in the hydration layer surrounding the protein. Nonetheless, the enhanced water population translates into a potential of mean force that attracts oxygen atoms to these sites, as exemplified previously.[25,26,74,78] Thus, the weak interactions of water molecules with the binding site of Gal3C essentially map out a free energy surface for oxygen/hydroxyl binding. Ligands such as lactose, which have several oxygens arranged to fit this surface, will bind to the protein, partly as a consequence of the concerted forces driving each ligand oxygen atom into a given minimum of the energy surface. In contrast, binding of smaller ligands such as glycerol is penalized because they make fewer favorable interactions, which cannot overcome the unfavorable entropy of binding (loss of translational and rotational freedom) at room temperature. Indeed, binding of glycerol to Gal3C was not detected at room temperature by ITC or NMR, despite clear evidence of two glycerol molecules in the cryo-crystal structure. This observation is relevant to fragment-based drug discovery, emphasizing the role of protein cryo-crystallography in screening for suitable lead fragments that may otherwise escape detection.[71,72]
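The link between an enhanced water population and a potential of mean force can be written as W = −RT ln(ρ/ρ_bulk) in molar units, so that a site where the local water density is n times the bulk value lies RT ln n below bulk in free energy. A minimal sketch in kcal/mol; the function name and the density ratios in the example are illustrative and are not values derived from the Gal3C simulations.

```python
import math

R_KCAL = 0.0019872043  # molar gas constant, kcal mol^-1 K^-1


def pmf_from_density(local_density: float, bulk_density: float,
                     temp_k: float = 298.0) -> float:
    """Potential of mean force (kcal/mol) relative to bulk: W = -RT ln(rho/rho_bulk)."""
    return -R_KCAL * temp_k * math.log(local_density / bulk_density)


# Ratios 3 and 10 give about -0.65 and -1.36 kcal/mol at 298 K
for ratio in (3.0, 10.0):
    print(ratio, round(pmf_from_density(ratio, 1.0), 2))
```

Even a threefold local enrichment corresponds to only about 1 RT at room temperature, which fits the picture of weak, transient water binding that nonetheless shapes where ligand oxygens prefer to sit.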