
Carbohydrate-Aromatic Interactions in Proteins.

Kieran L Hudson¹, Gail J Bartlett¹, Roger C Diehl², Jon Agirre³, Timothy Gallagher¹, Laura L Kiessling²,⁴, Derek N Woolfson¹,⁵,⁶.

Abstract

Protein-carbohydrate interactions play pivotal roles in health and disease. However, defining and manipulating these interactions has been hindered by an incomplete understanding of the underlying fundamental forces. To elucidate common and discriminating features in carbohydrate recognition, we have quantitatively analyzed X-ray crystal structures of proteins with noncovalently bound carbohydrates. Within the carbohydrate-binding pockets, aliphatic hydrophobic residues are disfavored, whereas aromatic side chains are enriched. The greatest preference is for tryptophan, with a 9-fold increase in prevalence. Variations in the spatial orientation of amino acids around different monosaccharides indicate that specific carbohydrate C-H bonds interact preferentially with aromatic residues. These preferences are consistent with the electronic properties of both the carbohydrate C-H bonds and the aromatic residues. Carbohydrates that present patches of electropositive saccharide C-H bonds engage more often in CH-π interactions involving electron-rich aromatic partners. These electronic effects are also manifested when carbohydrate-aromatic interactions are monitored in solution: NMR analysis indicates that indole binds favorably to electron-poor C-H bonds of model carbohydrates, and a clear linear free energy relationship with substituted indoles supports the importance of complementary electronic effects in driving protein-carbohydrate interactions. Together, our data indicate that electrostatic and electronic complementarity between carbohydrates and aromatic residues plays a key role in driving protein-carbohydrate complexation. Moreover, these weak noncovalent interactions influence which saccharide residues bind to proteins, and how they are positioned within carbohydrate-binding sites.


Year: 2015 PMID: 26561965 PMCID: PMC4676033 DOI: 10.1021/jacs.5b08424

Source DB: PubMed Journal: J Am Chem Soc ISSN: 0002-7863 Impact factor: 15.419

Introduction

There is growing appreciation of the fundamental roles of protein–carbohydrate interactions in biologically and medically important processes. Inhibiting or co-opting these interactions could lead to new classes of therapeutics,[1] but despite a few notable successes,[2,3] harnessing and controlling these interactions remains challenging. To elucidate and intervene in the biological processes mediated by protein–carbohydrate interactions, an understanding of their molecular basis is critical. Substantial advances are being made in this area.[4] Nonetheless, the precise nature and balance of forces that drive the complexation of carbohydrates by proteins are not fully understood. The importance of hydrogen bonds between the carbohydrate hydroxyl groups and polar moieties of amino acids in the binding of carbohydrates by proteins is well recognized.[5−7] However, the role played by hydrophobic aliphatic and aromatic side chains in binding water-soluble carbohydrates is more obscure, with emphasis placed on interactions with carbohydrate C–H groups through the hydrophobic effect.[8] Aromatic residues have long been implicated in binding carbohydrates.[5,9] Carbohydrate–aromatic interactions are increasingly the subject of study in their own right,[10] and an underlying contributor to affinity is the CH−π interaction, i.e., the interaction of an aromatic π-system with a C–H bond.[11,12] Indeed, carbohydrate–aromatic interactions have been examined in model systems using a variety of methods, including computational studies; investigation of the folding of synthetic glycopeptides designed to form intramolecular interactions; and the interrogation of small-molecule systems by solution-phase NMR studies.[10,13−25] These fundamental studies establish the importance of carbohydrate–aromatic interactions, but some gaps in knowledge remain: the relative propensities of specific monosaccharides and aromatic residues to participate in carbohydrate–aromatic interactions have not been quantified, nor is it known whether certain carbohydrate C–H bonds are more prone than others to engage in them. Addressing these issues would aid in understanding and predicting the features of protein–carbohydrate complexes, and it would facilitate the design of efficacious inhibitors.

Answering these questions depends on understanding the forces underlying carbohydrate–aromatic interactions. CH−π interactions are generally agreed to have a dispersion, or van der Waals, component. However, additional electrostatic contributions, namely potentially attractive interactions between partial positive charges on C–H protons and the electronegative π-system, are less certain.[17,26] Consequently, the importance of electronic effects in the interacting species, i.e., the factors that affect these charges, such as inductive and stereoelectronic effects, has not been established. Theoretical and experimental studies of model carbohydrate–aromatic complexes have found cases both where electronics are important for CH−π interactions,[22,24,25] and where they do not play a major role.[16,18,21,23] Structural bioinformatics analyses allow protein–carbohydrate interactions to be probed directly at the atomistic level. To date, such analyses have been restricted to specific protein families or carbohydrate residues.[17,27] Thus, there is not yet a general understanding of how the structural properties of individual monosaccharides lead to their binding and discrimination through the inherent characteristics and positioning of amino acids within carbohydrate-binding sites in proteins. The increased size of the Protein Data Bank (PDB) over the past decade[28] provides a rich source of structural data on protein–carbohydrate complexes.[29] We reasoned that quantitative analyses across all protein classes would uncover general and clear principles of protein–carbohydrate interactions, should they exist.
Our analyses reveal that the noncovalently bound carbohydrates make more-numerous and more-specific contacts with protein side chains than do covalently attached carbohydrates (i.e., in glycoproteins) in the PDB. In the binding sites of the former, polar amino acids mostly occur with frequencies expected by chance; aliphatic hydrophobic residues are underrepresented, whereas electron-rich aromatic side chains, particularly tryptophan, are favored. Moreover, there are preferred relative orientations of the aromatic and carbohydrate rings, which depend on the identity of the saccharide residue. CH−π interactions to the electronegative aromatic rings are observed more frequently for more-electropositive C–H bonds, indicating important contributions from both orbital overlap and complementary electronics between the carbohydrate and π-system. This analysis is supported by determination of linear free energy relationships using substituted indoles and methyl glycosides, which highlight a key role for electronic effects in CH−π interactions.
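A linear free energy relationship of the kind mentioned above is usually written as a Hammett-type equation, log(K_X/K_H) = ρσ, where σ is a tabulated substituent constant and ρ measures the sensitivity of the interaction to electronic effects. The sketch below is only an illustration under stated assumptions: it fits a straight line by ordinary least squares to paired (σ, response) data. The helper name `fit_lfer` is hypothetical, and the choice of response (for example, a log association constant or a chemical-shift perturbation) and of σ scale are assumptions, since neither is specified in this section.

```python
from typing import Sequence, Tuple


def fit_lfer(sigma: Sequence[float], response: Sequence[float]) -> Tuple[float, float, float]:
    """Fit response = rho * sigma + intercept by ordinary least squares.

    Hypothetical helper for illustration; returns (rho, intercept, r_squared).
    """
    n = len(sigma)
    if n != len(response) or n < 2:
        raise ValueError("need at least two paired (sigma, response) points")
    mean_x = sum(sigma) / n
    mean_y = sum(response) / n
    sxx = sum((x - mean_x) ** 2 for x in sigma)
    if sxx == 0:
        raise ValueError("sigma values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sigma, response))
    rho = sxy / sxx                      # slope = reaction (sensitivity) constant
    intercept = mean_y - rho * mean_x
    ss_res = sum((y - (rho * x + intercept)) ** 2 for x, y in zip(sigma, response))
    ss_tot = sum((y - mean_y) ** 2 for y in response)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return rho, intercept, r_squared
```

With log(K_X/K_H) as the response, a negative ρ would mean that electron-donating substituents (σ < 0) strengthen binding, which is the direction expected if electron-rich indoles are the preferred partners.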

Experimental Section

To generate the protein–carbohydrate interaction database, context data were obtained from GlyVicinity[30,31] for amino acids with any atom within 4.0 Å of any atom of a carbohydrate moiety. Because structures deposited in the PDB[28] may contain errors, a problem inherent in any attempt to extract chemical information from a public structural biology repository,[32,33] strict validation criteria were employed. The carbohydrate residues within all of the PDB entries listed by GlyVicinity were validated with the Privateer software[34] according to two criteria: first, only monosaccharides adopting the strongly preferred minimal-energy conformation (⁴C₁ for d-sugars, ¹C₄ for l-sugars) were considered; second, only models with a good fit to bias-minimized electron density were selected. Only PDB entries deposited with structure factors, i.e., experimental data, were considered. The selected agreement metric was the real-space correlation coefficient (RSCC), with a minimum cutoff value of 0.8. Because the significance of this indicator decreases as resolution worsens, only entries with a reported resolution of 2.0 Å or better were included. From these entries, the coordinates of the identified monosaccharide and amino-acid residues were extracted from the parent PDB files where possible; examples in which the nearby amino acids were identical (as in homo-oligomeric crystals) were discounted. The data set for each examined monosaccharide was obtained using the GlyVicinity assignment of the monosaccharide, with erroneous assignments removed. For each monosaccharide class, the structures in which it was found were culled using CD-HIT[35] at 95% pairwise protein sequence identity, to maximize the data available for each carbohydrate type while minimizing bias from identical protein structures and point mutations.
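The structure-level filters listed above can be summarized as a single predicate. This is a hypothetical sketch, not the Privateer or GlyVicinity workflow: the `SugarRecord` fields are assumed to have been populated beforehand (for example, from validation reports), treating the 0.8 RSCC and 2.0 Å cutoffs as inclusive is an assumption, and the CD-HIT redundancy culling is not reproduced.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SugarRecord:
    """Hypothetical per-monosaccharide record assembled from validation output."""
    pdb_id: str
    residue: str                 # residue code, carried as a label only
    is_l_sugar: bool             # True for L-sugars such as L-fucose
    conformation: str            # ring conformation label, e.g. "4C1" or "1C4"
    rscc: float                  # real-space correlation coefficient
    resolution: float            # reported resolution in angstroms
    has_structure_factors: bool  # experimental data deposited with the entry


RSCC_MIN = 0.8        # minimum fit to bias-minimized density
RESOLUTION_MAX = 2.0  # angstroms; "2.0 A or better"


def passes_validation(rec: SugarRecord) -> bool:
    # Expected lowest-energy chair: 1C4 for L-sugars, 4C1 for D-sugars.
    expected = "1C4" if rec.is_l_sugar else "4C1"
    return (
        rec.has_structure_factors
        and rec.resolution <= RESOLUTION_MAX
        and rec.conformation == expected
        and rec.rscc >= RSCC_MIN
    )


def filter_records(records: Iterable[SugarRecord]) -> List[SugarRecord]:
    """Keep only records satisfying every criterion, preserving input order."""
    return [r for r in records if passes_validation(r)]
```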
The relative occurrence of each amino acid in the vicinity of all of the investigated monosaccharides was compared to that in the UniProtKB/Swiss-Prot data bank.[36,37] Propensity = (proportion of an amino acid in the data set)/(proportion of that amino acid in UniProtKB); error bars represent 95% confidence intervals assuming a normal approximation to the binomial distribution. Amino acids interacting with the α- or β-faces were defined as those whose side-chain center lay within 6 Å of the ring atoms or C6 of the carbohydrate. CH−π interactions were identified using three parameters adapted from those used previously in a study of proteins.[38] If multiple C–H bonds satisfied these criteria for a single aromatic ring, the one with the smallest C-projection distance was taken as the primary interacting C–H bond. To generate electrostatic surface potentials (ESPs), minimized conformations were obtained from density functional theory (B3LYP/6-31+G(d)) calculations in the gas phase using Gaussian09.[39] ESPs were then generated from Hartree–Fock (B3LYP/6-31G(d)) energy calculations of these conformations at an isovalue of 0.002 and visualized using GaussView 5.[40] For the NMR experiments, indole, 5-substituted indoles, and deuterium oxide were obtained from Sigma-Aldrich and TCI. 4,4-Dimethyl-4-silapentane-1-sulfonic acid (DSS) was obtained from Uvasol. Glycosides (other than methyl β-d-mannopyranoside, whose synthesis is outlined in the Supporting Information) were obtained from Pfanstiehl and Sigma-Aldrich. All chemicals were of at least 97% purity. Solutions were prepared on a weight-per-volume basis. Proton NMR spectra were acquired in D₂O on a Bruker Avance 500 MHz spectrometer equipped with a DCH cryoprobe. Experiments used a spectral window from 11 to −1 ppm, a 4 s acquisition time, a 2 s relaxation delay, and 64 scans. NMR experiments with a relaxation delay of 15 s were run to verify the indole concentration. Chemical shifts were referenced to the trimethylsilyl peak of DSS at δ = 0 ppm.
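The propensity and its error bars, as defined above, can be computed as follows. This is a minimal sketch assuming a Wald (normal-approximation) interval on the binding-site proportion, with the reference UniProtKB fraction treated as exact because the reference set is very large; the function name `propensity_with_ci` is hypothetical, and the text does not state whether uncertainty was propagated in exactly this way.

```python
import math
from typing import Tuple

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def propensity_with_ci(count: int, total: int, ref_fraction: float) -> Tuple[float, float, float]:
    """Propensity of one amino acid in binding sites relative to a reference set.

    count: occurrences of the amino acid among proximal residues
    total: all proximal residues in the data set
    ref_fraction: fraction of that amino acid in the reference (e.g. UniProtKB)
    Returns (propensity, lower, upper) for a 95% interval.
    """
    if total <= 0 or not 0 <= count <= total or not 0.0 < ref_fraction <= 1.0:
        raise ValueError("invalid counts or reference fraction")
    p = count / total
    half = Z_95 * math.sqrt(p * (1.0 - p) / total)  # normal approximation to the binomial
    lower = max(0.0, p - half) / ref_fraction       # clamp: a proportion cannot be negative
    upper = min(1.0, p + half) / ref_fraction
    return p / ref_fraction, lower, upper
```

A propensity whose whole interval lies above 1 indicates enrichment relative to chance; one whose interval lies below 1 indicates depletion.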
For the data points shown, three series of experiments were conducted at the same glycoside and indole concentrations: indole only, glycoside only, and mixed samples. The chemical shifts were averaged over three replicates, and the chemical-shift perturbations were reported as Δδ = δ(indole) − δ(indole-free).
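The perturbation defined above reduces to a difference of replicate means. The sketch assumes replicate shifts (in ppm) for a single glycoside proton, already referenced to DSS, from the mixed and glycoside-only samples; `chemical_shift_perturbation` is a hypothetical name.

```python
from statistics import mean
from typing import Sequence


def chemical_shift_perturbation(with_indole_ppm: Sequence[float],
                                indole_free_ppm: Sequence[float]) -> float:
    """Delta-delta = mean shift with indole minus mean indole-free shift (ppm).

    Inputs are replicate shifts for one glycoside proton, assumed referenced
    to DSS at 0 ppm and measured at the same glycoside concentration.
    """
    if not with_indole_ppm or not indole_free_ppm:
        raise ValueError("need at least one replicate for each condition")
    return mean(with_indole_ppm) - mean(indole_free_ppm)
```

Under this sign convention, a negative Δδ is an upfield shift, which is the effect expected from aromatic ring-current shielding of a proton positioned over the indole face.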

Results and Discussion

A Database of Protein–Carbohydrate Interactions

To examine features of protein–carbohydrate interactions, we first used GlyVicinity[31] to create a structural database of monosaccharide residues (i.e., free monosaccharides, or separated constituents of larger oligosaccharides) together with proximal amino acids from X-ray crystal structures in the PDB. Strict validation criteria were set to avoid incorporating entries with incorrect nomenclature,[32] unlikely conformations, or poorly fitted experimental data.[33] For the analyses discussed herein, we used the data in their broadest form: we chose seven of the most frequently occurring biologically relevant carbohydrates in the data set, as both α- and β-anomers, namely d-glucose (d-Glc), d-galactose (d-Gal), d-N-acetylglucosamine (d-GlcNAc), d-N-acetylgalactosamine (d-GalNAc), d-mannose (d-Man), d-xylose (d-Xyl), and l-fucose (l-Fuc). We treated each residue as an isolated unit, considering only the pyranose form and ignoring any modifications of the hydroxyl groups (e.g., O-methylation, O-phosphorylation). We recognize that substituents on the carbohydrate frameworks may well affect interactions; our focus on unmodified saccharide residues was simply to maximize the available data and to find general, or first-order, interactions between carbohydrates and their protein hosts. The resulting data set encompassed carbohydrate moieties that could be divided into two groups: covalently bound glycans (from glycoproteins) and ligands bound noncovalently to proteins, Table S1. The overall database provides a means to interrogate many features of protein–carbohydrate complexes in finer detail. An initial scan of the database indicated that carbohydrate residues in the glycans of glycosylated proteins made fewer close contacts with protein side chains than did the same monosaccharides bound as ligands in protein–carbohydrate complexes, Tables and S2.
For the four cases with sufficient examples to allow comparison (α/β-d-Man, α-l-Fuc, and β-d-GlcNAc), the covalently bound carbohydrates made, on average, approximately one-half to two-thirds as many contacts with protein side chains, and fewer than one-fifth as many CH−π interactions, as the corresponding noncovalently bound carbohydrates. These differences are perhaps not surprising, as the covalent linkage in glycoproteins removes the need for effective noncovalent interactions to hold the carbohydrate to the protein. An interesting additional possibility, however, is that such interactions may be less likely to occur in glycoproteins because the glycan can instead participate in intermolecular protein−carbohydrate interactions. In that case, the saccharide’s most effective binding face would not be occluded through an intramolecular interaction but left free to engage in an intermolecular one. Because such binding partners are absent from the X-ray crystal structures, whether these trade-offs occur cannot be determined. Whatever the reasons for the lower density of protein–carbohydrate interactions in the glycans, we focused our subsequent analyses on noncovalent protein–carbohydrate complexes, Table , as we were interested in the interactions of carbohydrate ligands for this study.

Table 1

Complete Tables of Statistics by Monosaccharide of All Classes Investigated from Noncovalent Species

Total examples in data set.

Total proximal amino acids across data set, and composition of these.

Average proximal amino acids per example, and standard deviation.

Average number of amino acids associated with each carbohydrate face, and composition of these.

Facial distribution of CH−π interactions, and average per example.

Aromatic Amino Acids Are Markedly Preferred in Carbohydrate-Binding Sites

The frequencies of amino acids proximal to carbohydrates were normalized to their occurrence in all protein sequences, Figure . Independent of the normalization method employed (Figure S1), three trends emerged. First, polar, hydrogen-bonding residues showed only a small overall preference for these binding sites, although aspartic acid (Asp) and asparagine (Asn) were particularly favored, occurring approximately twice as often as expected by chance. Second, without exception, aliphatic residues were disfavored in carbohydrate-binding pockets. This exclusion would not be expected if the hydrophobic effect alone played a major role in carbohydrate binding. Third, and most conspicuously, three of the four aromatic residues contacted carbohydrates more frequently than expected by chance, in the order tryptophan (Trp) ≫ tyrosine (Tyr) > histidine (His). These last two observations highlight that carbohydrate–aromatic interactions are a key defining characteristic of carbohydrate-binding sites, whereas hydrophobic interactions per se are not. They also reveal that not all aromatic residues are equivalent: some are more likely than others to interact with carbohydrates.

Figure 1

Amino acids proximal to carbohydrates in X-ray crystal structures of protein–carbohydrate complexes. Propensities of amino acids (in order of increasing hydrophobicity[41]) in carbohydrate-binding sites from the data set compared to the distribution of amino acids across all proteins in UniProt.[37] Alternative methods for normalization are given in Figure S1; however, the overall trends shown here are preserved. Color code: white, hydrogen-bonding side chains; gray, aliphatic hydrophobic side chains, including Gly, Pro, Cys and Met; beige, aromatic side chains.

The Positional Distributions of Aromatic Residues around Carbohydrates Are Biased

We examined the identified aromatic residues in detail, postulating that the juxtaposition of carbohydrate and aromatic residues should illuminate the forces that drive protein–carbohydrate interactions. In the following, we illustrate our observations and arguments by comparing two well-represented isomers, β-d-Glc and β-d-Gal, which differ only in the configuration of the 4-hydroxyl group, Figure A,D. The general and discriminating features emerging from this comparison are emblematic of those that we observed more broadly for carbohydrate–protein complexation, Figure S2 and Table .

Figure 2

Distribution of aromatic and aliphatic amino acids around carbohydrates. (A–C) β-d-Glc, and (D–F) β-d-Gal. (A, D) α- and β-faces and ring C–H bonds. (B, E) Centers, represented as spheres, of aromatic and aliphatic side chains interacting with the faces of the carbohydrates (i.e., within 6 Å of any carbohydrate carbon or the ring oxygen). (C, F) Proportions of aromatic and aliphatic side chains interacting with the α- and β-faces reported to the nearest carbon atom of the pyranose ring. See Figure S2 for the analyses for all monosaccharides.

We compared amino-acid distributions around β-d-Glc and β-d-Gal by first focusing on the two distinct faces of the carbohydrate rings, the α- and β-faces, Figure A,D. Each face presents a particular set of C–H bonds, and these differ in stereochemistry and stereoelectronics between monosaccharide configurations. With its all-equatorial arrangement of hydroxyl and alkoxyl groups, β-d-Glc has approximate symmetry: a polar perimeter in the plane of the ring separates the α- and β-faces, each made up of C–H bonds lying above or below it. These properties have been exploited to design synthetic carbohydrate-binding receptors.[16] Consistent with this C–H bond arrangement, we found similar numbers of aliphatic and aromatic contacts on the β-face, and a slight (2.7-fold) preference for aromatic over aliphatic residues on the α-face, Figure B, Video S1 and Table . To determine how the different C–H bonds interact with the local protein environment, we quantified the proportions of side chains nearest each carbon of the carbohydrate, Figure C. Our observations largely tracked the direction of the C–H bonds, with both aromatic and aliphatic residues found more often on the face toward which a given C–H bond was oriented. For example, contacts to both aromatic and aliphatic side chains on the β-face were made by C(2)–H and C(4)–H; those on the α-face were largely made by C(1)–H, C(3)–H, and C(5)–H, whereas C6 showed no facial preference, presumably because of rotation around the C5–C6 bond.

In contrast, β-d-Gal exhibited marked differences in amino-acid environment between the α- and β-faces, Table , Figure D–F, Video S2. These findings underscore the importance of carbohydrate stereochemistry, as the change in configuration at C4 has a major effect on interactions with aliphatic and aromatic amino acids. In detail, aliphatic residues were largely excluded from the α-face of β-d-Gal, whereas aromatic side chains were prevalent, with a 14-fold preference for aromatic over aliphatic residues. This preference was especially strong at the C(4)–H and C(5)–H positions, Figure F, and was much starker than that observed for the C–H protons of β-d-Glc, indicating more-favorable interactions with aromatics. Analogous variations in C–H bond interactions were seen for other monosaccharides, Figure S2. For example, α-d-Glc has its only axial hydroxyl on the α-face, the reverse of the situation in β-d-Gal. Correspondingly, and opposite to β-d-Gal, we found a strong preference for the C–H bonds on the β-face of α-d-Glc to interact with aromatic residues, but little discrimination for those on the α-face, Figure S2A. Thus, C–H bonds that appear chemically similar, such as the C(4)–H bonds of β-d-Glc and β-d-Gal, have different preferences for interaction with aromatic moieties.
Moreover, this preference for aromatics comes at the expense of aliphatic amino acids, which further argues against the hydrophobic effect as the explanation. We therefore sought to elucidate the role of electronics in carbohydrate–aromatic interactions by investigating the electrostatic potentials of the aromatic moieties and the carbohydrate C–H bonds.
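The facial assignment and nearest-carbon bookkeeping behind the β-d-Glc/β-d-Gal comparison above can be sketched geometrically. The sketch makes several assumptions that the text does not state: the side-chain "center" is supplied as one precomputed point, the ring plane is a least-squares fit through C1–C5 and O5, and the two sides are labeled only "plus" and "minus" relative to a normal oriented by the ring-atom order C1→C2→C3. Mapping these labels onto the α- and β-faces requires fixing a convention not reproduced here, and names such as `classify_contact` are hypothetical.

```python
import numpy as np
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical atom labels; actual PDB atom names for a residue may differ.
RING = ("C1", "C2", "C3", "C4", "C5", "O5")
CONTACT_ATOMS = RING + ("C6",)  # ring atoms plus C6 define proximity
CARBONS = ("C1", "C2", "C3", "C4", "C5", "C6")


def ring_plane(coords: Dict[str, np.ndarray]) -> Tuple[np.ndarray, np.ndarray]:
    """Centroid and unit normal of the least-squares plane through the ring atoms."""
    pts = np.array([coords[a] for a in RING], dtype=float)
    centroid = pts.mean(axis=0)
    # The right singular vector with the smallest singular value is the plane normal.
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]
    # Orient the normal by the ring order C1 -> C2 -> C3 (right-hand rule) so the
    # sign is reproducible; which side is alpha or beta is an assumption.
    ref = np.cross(pts[1] - pts[0], pts[2] - pts[1])
    if np.dot(normal, ref) < 0:
        normal = -normal
    return centroid, normal


def classify_contact(center: Sequence[float], coords: Dict[str, np.ndarray],
                     cutoff: float = 6.0) -> Optional[Tuple[str, str]]:
    """Return (side, nearest ring carbon) for a side-chain center, or None if too far."""
    c = np.asarray(center, dtype=float)
    dists = {a: float(np.linalg.norm(c - np.asarray(coords[a], dtype=float)))
             for a in CONTACT_ATOMS}
    if min(dists.values()) > cutoff:
        return None
    centroid, normal = ring_plane(coords)
    side = "plus" if float(np.dot(c - centroid, normal)) > 0 else "minus"
    nearest = min(CARBONS, key=lambda a: dists[a])
    return side, nearest
```

Tallying the returned (side, carbon) pairs separately for aromatic and aliphatic side chains gives per-carbon, per-face proportions of the kind described for each monosaccharide.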

Role of Electronics in CH−π Interactions

Unlike aliphatic residues, aromatic amino acids present electronegative π-electron systems above and below the planes of the aromatic rings that can interact with carbohydrate C–H bonds through CH−π interactions.[10] We posited that if electrostatic contributions are important for CH−π interactions in protein–carbohydrate complexes, differences in the electronics of the aromatic systems and carbohydrate C–H bonds would determine participation in such interactions. We identified CH−π interactions in the data set using a three-parameter operational definition for the interaction[27] (Figure A), and then we probed for any correlations between the electronics of aromatic and carbohydrate rings, calculated and visualized as electrostatic surface potentials (ESPs), at the sites of the interactions.

Figure 3

Definition of parameters for CH−π interactions and participating amino acids. (A) Parameters used to identify CH−π interactions:[38] CH−π angle (θ, ≤ 40°), CH−π distance (C-X, ≤ 4.5 Å), C-projection distance (Cp–X, ≤ 1.6 Å for His and TrpA; ≤ 2.0 Å for Phe, TrpB, Tyr). (B) Raw-count distribution of aromatic side chains identified making CH−π interactions with carbohydrates. For Trp, CH−π interactions were identified for cases where either the five- or six-membered ring interacts with a CH proton, TrpA and TrpB, respectively, and where the two rings both interact with separate CH protons, TrpA+B. (C) Structure of proteinogenic aromatic amino acids, with corresponding electrostatic surface potentials for the π-systems (highlighted in beige) of the side-chain moieties: indole (Trp); phenol (Tyr); benzene (Phe); imidazole (His). For indole and phenol, the forms as hydrogen-bond donors (H-bonded to water) are shown, as these are predominant in protein X-ray crystal structures.[42] To show the differences in the π-systems, the scale is shown from ≥130 kJ mol–1 (electropositive, blue) through neutral (green) to ≤ −130 kJ mol–1 (electronegative, red).
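The three geometric criteria of Figure 3A, together with the rule for choosing the primary C–H bond, can be expressed as follows. This is a sketch under assumptions: X is taken as the centroid of the aromatic ring atoms, θ is interpreted as the angle between the C→H bond vector and the H→X vector (equivalently 180° minus the C–H···X angle), and the Cp–X cutoff is chosen by ring size (1.6 Å for five-membered rings such as His and TrpA, 2.0 Å for six-membered rings). The interpretation of θ in particular should be checked against the original definition,[38] and all function names are hypothetical.

```python
import numpy as np
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence, Tuple

Vec = Sequence[float]


@dataclass(frozen=True)
class CHPiGeometry:
    theta_deg: float  # angle between C->H and H->X vectors (assumed definition)
    c_x: float        # distance from carbon C to ring centroid X (angstroms)
    cp_x: float       # distance from projection of C onto the ring plane to X


def ch_pi_geometry(c: Vec, h: Vec, ring_atoms: Sequence[Vec]) -> CHPiGeometry:
    c_arr = np.asarray(c, dtype=float)
    h_arr = np.asarray(h, dtype=float)
    pts = np.asarray(ring_atoms, dtype=float)
    x = pts.mean(axis=0)                       # ring centroid X
    _, _, vt = np.linalg.svd(pts - x)
    n = vt[-1]                                 # unit normal of the ring plane
    ch, hx = h_arr - c_arr, x - h_arr
    cos_t = np.dot(ch, hx) / (np.linalg.norm(ch) * np.linalg.norm(hx))
    theta = float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0))))
    cp = c_arr - np.dot(c_arr - x, n) * n      # projection of C onto the plane
    return CHPiGeometry(theta, float(np.linalg.norm(x - c_arr)),
                        float(np.linalg.norm(cp - x)))


def is_ch_pi(g: CHPiGeometry, ring_size: int,
             theta_max: float = 40.0, cx_max: float = 4.5) -> bool:
    cp_max = 1.6 if ring_size == 5 else 2.0    # five- vs six-membered rings
    return g.theta_deg <= theta_max and g.c_x <= cx_max and g.cp_x <= cp_max


def primary_ch(candidates: Iterable[Tuple[str, CHPiGeometry, int]]) -> Optional[str]:
    """Among C-H bonds passing all criteria for one ring, return the smallest Cp-X."""
    passing = [(g.cp_x, label) for label, g, size in candidates if is_ch_pi(g, size)]
    return min(passing)[1] if passing else None
```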

We found that, across our database, the four aromatic side chains engaged in CH−π interactions with carbohydrate C–H bonds to different extents, in the order Trp > Tyr > phenylalanine (Phe) > His, Figure B. This ranking reflects the ESPs of these side chains (Figures C and S4A–I) and implies that electron-rich aromatic systems are the most likely to engage in CH−π interactions. In principle, the ranking could stem solely from the relative surface areas of the aromatic side chains. When normalized for the surface area of the π-systems, however, the most electron-rich residue, Trp, remained the most common acceptor of CH−π interactions, Figure S5. The preference for Tyr over Phe also supports the importance of electronics. The aromatic systems of Tyr and Phe present similar surface areas, each comprising a six-membered carbocyclic ring. Notably, a study of such interactions between amino acids within protein crystal structures found that Phe and Tyr were equally likely to act as CH−π acceptors,[38] possibly highlighting differences between intra- and intermolecular systems. In terms of electronics, however, the two systems are not equivalent. Participation of the Tyr hydroxyl in hydrogen bonding as an H-bond donor, which is the case for almost all examples of Tyr in proteins,[42] increases the electron density of the Tyr π-system, Figure S4. As shown by the ESPs, Figures C and S4C–F, this increases the electronegativity of the π-system, making it a preferred acceptor over Phe. Trp is almost always involved as an H-bond donor in proteins,[42] which increases the electronegativity of its π-system beyond that of H-bonded Tyr, Figure S4A,B. Interpretation of the data for the His side chain is complicated by the different hydrogen-bonding and protonation states that it can adopt; however, its involvement in CH−π interactions in protein–carbohydrate complexes, Figure B, and in proteins in general,[38] is relatively small.

It is striking that the ranking of aromatic amino acids involved in CH−π interactions closely aligns with that observed for cation−π interactions in similar ligand-binding systems.[43] For many cation−π interactions, such as those of the tetramethylammonium cation, the interaction of the positive charge with electron-rich aromatic rings is mediated by C–H protons,[44] and this can be viewed as analogous to a CH−π interaction involving extremely polarized C–H bonds.

Importance of the Electronics of the Carbohydrate C–H Bond

Next, we investigated whether involvement in CH−π interactions also depended on the electronics of the carbohydrate C–H bonds. Such a preference could contribute to carbohydrate discrimination, because the partial positive charge on the carbohydrate C–H protons results from the overall hydroxyl stereochemistry. To compare the C–H protons, we therefore examined the ESPs of the different monosaccharides in more detail. We considered β-d-Gal first, Figure A, because carbohydrate–aromatic interactions are already known to play key roles in its binding;[9] indeed, of all the well-represented monosaccharides, it made the highest proportion of CH−π interactions in our analysis, Table . Although steric hindrance can limit the ability of some C–H bonds (e.g., C(2)–H) to participate in CH−π interactions, the data suggested that electronic effects are critical. The configuration of the hydroxyl groups of β-d-Gal gives a cluster of C–H bonds on its α-face, formed by C(1)–H, C(3)–H, and C(5)–H and extending to the edge where C(4)–H and one of the C(6)–H bonds are located, Figure B. Although often described as a “nonpolar patch”,[6−8] this region is in fact partially positive according to the ESP, and this “positive patch” corresponds to the area where interacting side chains are almost exclusively aromatic, Figure E,F. One way to rationalize this particularly electropositive patch is through stereoelectronic effects that render the C–H protons more positive: the axial C4 hydroxyl withdraws electron density from the C3 and C5 protons via overlap of the C–H σ orbitals with the σ* orbital of the C(4)–O bond, and the C4 proton is rendered electron-poor through overlap with the σ* orbital of the ring C–O bond.

Figure 4

Superposition of the subset of aromatic side chains engaged in CH−π interactions revealed them located predominantly over the most electropositive C–H bonds of C4 and C5, Figure C and Video S3. Very few examples interacted with the C(2)–H of the β-face, for which the electrostatic potential is more neutral. That the more-positive protons of the carbohydrate interact more frequently with the electron-rich aromatic systems is consistent with a contribution from electrostatics to CH−π interactions. To test the importance of electronics more generally, we compared the ESPs of further carbohydrates and assessed their engagement in CH−π interactions, Figures , S6 and S7. In all cases, our findings support a role for an electrostatic contribution to the CH−π interactions. As the electronics of the carbohydrate C–H bonds are determined by the identity of the monosaccharide and the anomer, this leads to distinct modes of interaction for the different classes. For example, β-d-Gal and β-d-Glc more often than not engaged in CH−π interactions with proximal aromatic residues; however, such contacts were less common in binding sites of α-d-Man, α-l-Fuc, α-d-Xyl, and α- and β-d-GlcNAc, which do not present such electropositive C–H bonds, Table and Figure S6.

Figure 5

Hydroxyl group stereochemistry influences carbohydrate electrostatics and CH−π interactions. (A) β-d-Gal, (B) β-d-Glc, and (C) α-d-Glc. Column 1: Stick models for representative minimized conformations viewed from the α-faces with C–H protons numbered. Column 2: Normalized calculated ESPs for the same orientation of the minimized conformation. The scale is shown from ≥260 kJ mol–1 (electropositive, blue) through neutral (green) to ≤ −260 kJ mol–1 (electronegative, red); as with Figure B this is double that used for the aromatic systems in Figure C. Column 3: The distributions of aromatic side chains that form CH−π interactions with the monosaccharides. Column 4: Average frequency of involvement of the monosaccharide C–H protons in the CH−π interactions. For complete analyses for all monosaccharides see Figures S6 and S7.

Relationship between carbohydrate electrostatic surface potential and formation of CH−π interactions. (A) Orthogonal views, in stick-model representation with C–H protons numbered systematically, of a minimized conformation of β-d-Gal that is representative of the majority of those found in the database and has the ω-angle favored by Gal in solution and in protein crystal structures.[45] (B) ESP calculated for the minimized conformation. To show the differences between the C–H bonds, the scale runs from ≥260 kJ mol–1 (electropositive, blue) through neutral (green) to ≤ −260 kJ mol–1 (electronegative, red). This is double the range used for the aromatic systems; i.e., similar changes in color here signify bigger differences than in Figure C. (C) Juxtaposed aromatic moieties of amino acids engaged in CH−π interactions with β-d-Gal.

The α-faces of the β-d-Glc and β-d-Gal isomers are sterically similar, Figure A,B, yet the propensities of the two carbohydrates to engage in CH−π interactions on this face differed. This is because the α-face C–H protons are comparatively more electropositive for β-d-Gal, which should promote CH−π interactions, particularly those involving the C4 and C5 protons, Figure A. For β-d-Gal, 97% of CH−π interactions occurred on the α-face, at an average of almost one interaction per example, Table . The corresponding α-face protons of β-d-Glc are less electropositive and, as a result, CH−π interactions were less frequent, Figure B: for β-d-Glc, 68% of interactions occurred on the α-face, at just over 0.5 per example on average. Examination of other, albeit less well represented, monosaccharides in our database provided further support for electronic effects, Figures S6 and S7. For example, for both α-d-Gal and α-d-Glc, the axial hydroxyl on the α-face reduces the electropositivity of the α-face C–H bonds, and correspondingly their involvement in CH−π interactions, compared with the β-anomers, Figures S6A,C and S7A,C.
For α-d-Glc, the most positive C–H bonds are on the β-face, and this is where most CH−π interactions occurred, Figure C. Disruption or reduction of the electropositive patches led to less involvement in CH−π interactions. For α-d-Man, the 1,2-diaxial arrangement of hydroxyl groups prevents any of the C–H protons from being strongly electropositive, Figures S6E and S7E. The CH−π interactions of α-l-Fuc illustrate particularly well that electrostatics contribute over and above hydrophobic or simple steric effects: the lack of oxygen at C6 relative to α-d-Gal reduces the electropositivity of the C–H protons at C5 and C6, and correspondingly Fuc forms fewer CH−π interactions, despite being the more hydrophobic overall, Figures S6M and S7M.
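The face-specific figures quoted for β-d-Gal (97% of contacts on the α-face, almost one contact per example) and β-d-Glc (68%, just over 0.5 per example) reduce to simple bookkeeping once each contact has been assigned to a face. The sketch below, under stated assumptions, assigns a point (for example an aromatic ring centroid) to a face using the sign of its offset along the pyranose ring normal, then computes the α-face fraction and the mean number of contacts per example. The helper names and the sign convention linking the normal to the α-face are hypothetical and should be checked against a structure of known configuration.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def pyranose_face(point: Vec, ring: Sequence[Vec]) -> str:
    """Classify which face of a pyranose ring `point` lies on.

    ASSUMPTION: `ring` lists the atoms of a D-sugar in the order C1, C2, C3, C4, C5, O5,
    and with that ordering the summed-cross-product normal is taken to point toward the
    alpha-face. Verify this sign convention on a known structure before relying on it.
    """
    n = len(ring)
    c = tuple(sum(p[i] for p in ring) / n for i in range(3))
    normal = [0.0, 0.0, 0.0]
    for i in range(n):
        x = _cross(_sub(ring[i], c), _sub(ring[(i + 1) % n], c))
        for k in range(3):
            normal[k] += x[k]
    # Signed offset of the point along the ring normal decides the face.
    side = sum(normal[k] * (point[k] - c[k]) for k in range(3))
    return "alpha" if side > 0 else "beta"


def face_statistics(examples: List[List[str]]) -> Tuple[float, float]:
    """Return (fraction of CH-pi contacts on the alpha-face, mean contacts per example).

    Each inner list holds the faces of the contacts found in one binding-site example;
    examples with no contacts still count toward the per-example average.
    """
    total = sum(len(faces) for faces in examples)
    alpha = sum(face == "alpha" for faces in examples for face in faces)
    fraction = alpha / total if total else 0.0
    per_example = total / len(examples) if examples else 0.0
    return fraction, per_example
```

For instance, `face_statistics([["alpha", "alpha"], ["alpha", "beta"], []])` counts four contacts over three examples, giving an α-face fraction of 0.75 and about 1.33 contacts per example. Counting contact-free sites in the denominator keeps the per-example average comparable to a quantity such as "almost one interaction per example".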

Electronic Effects Promote Carbohydrate–Aromatic Interactions in Solution

Finally, as an experimental test, we probed how our two exemplar carbohydrate residues, β-d-Glc and β-d-Gal, interacted with aromatic groups in aqueous solution. We used 1H NMR spectroscopy to follow the association of indole (as a Trp surrogate) with the two β-methyl glycosides. In both cases, some, but not all, of the carbohydrate C–H protons showed small but measurable and reproducible upfield shifts (negative Δδ) indicative of CH−π interactions,[13] Figures A, S8, and S9. Moreover, the magnitudes of the changes differed between protons, and the NMR data, Figures A and S9, are in good agreement with the database-derived propensities, Figure . As predicted, carbohydrate–aromatic interactions were stronger for β-methyl-d-Gal than for β-methyl-d-Glc. For the former, larger chemical-shift changes were observed for the C1, C3, C4, and C5 protons, i.e., all on the electropositive α-face of the monosaccharide. The interactions with β-methyl-d-Glc were weaker, consistent with its less electropositive α-face and with our database analysis, Figures A and S9. Indole gave stronger CH−π interactions than previously reported for phenol or benzene,[13] in accord with the observed preference for Trp in carbohydrate-binding sites, Table S3. Our findings agree with those of others on model peptides[14] and on methyl glycosides with the free amino acids l-Phe, l-Trp, and l-Tyr.[15] Again, these data suggest that favorable CH−π interactions make critical contributions to the binding of some, but not all, saccharides.
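The sign convention for the NMR perturbations (Δδ = δindole − δindole-free, with negative values indicating upfield shifts) and the normalization to 7.5 mM indole described in the Figure 6 caption can be written out explicitly. The sketch below assumes the perturbation is directly proportional to indole concentration, i.e., a straight line through the origin; the function names and the example shift values are hypothetical.

```python
def chemical_shift_perturbation_ppb(delta_with_indole_ppm: float,
                                    delta_indole_free_ppm: float) -> float:
    """Return delta(indole) - delta(indole-free), converted from ppm to ppb.

    Negative values correspond to upfield shifts, the signature attributed to CH-pi contacts.
    """
    return (delta_with_indole_ppm - delta_indole_free_ppm) * 1000.0


def normalize_to_reference(perturbation_ppb: float, indole_mM: float,
                           reference_mM: float = 7.5) -> float:
    """Scale a perturbation measured at `indole_mM` to the reference indole concentration.

    ASSUMPTION: the perturbation is directly proportional to indole concentration
    (a straight line through the origin) over the concentration range used.
    """
    if indole_mM <= 0:
        raise ValueError("indole concentration must be positive")
    return perturbation_ppb * (reference_mM / indole_mM)


if __name__ == "__main__":
    # Hypothetical shifts (ppm) for one ring proton, measured at 3.75 mM indole.
    dd = chemical_shift_perturbation_ppb(3.6480, 3.6500)
    print(f"measured: {dd:.1f} ppb; normalized to 7.5 mM: {normalize_to_reference(dd, 3.75):.1f} ppb")
```

With these hypothetical numbers, a proton that moves from 3.6500 to 3.6480 ppm has Δδ = −2.0 ppb, which scales to −4.0 ppb at 7.5 mM. If the measured concentration dependence had a nonzero intercept, a proportional rescaling like this would no longer be appropriate.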

Figure 6

1H NMR chemical shift perturbations in carbohydrate–aromatic interactions in solution. (A) Interactions between methyl glycosides and 7.5 mM indole in D2O. The circle color and size are scaled to represent the chemical-shift change relative to indole-free solutions (Δδ = δindole – δindole-free). From left to right: β-d-Gal, β-d-Glc, and β-d-Man. (B) Δδ for the H5 and methyl C–H protons of methyl-β-d-Gal versus the Hammett σp parameter of the 5-substituent in a series of substituted indoles. To account for solubility limitations, all perturbations were normalized to 7.5 mM indole using the linear dependence of chemical-shift perturbation on indole concentration, Figure S9. Linear fits of the data are shown for H5 (gradient = 5.7, R2 = 0.86) and Me (gradient = 2.1, R2 = 0.63). Δδ values were independent of glycoside concentration. ppb = parts per billion.

Our analyses of the ESPs suggested that other saccharides, less well represented in our bioinformatics study, also present clusters of electropositive C–H bonds that might facilitate favorable CH−π interactions. One such carbohydrate epitope is β-d-Man. Because of the axial C(2)–OH, the α-face C–H bonds of β-d-Man (at C1, C2, C3, and C5) form an electropositive patch analogous to that of β-d-Gal, Figure S6F. We therefore postulated that β-d-Man should engage in CH−π interactions at these positions. The relatively small number of β-d-Man examples in our structural database supported this hypothesis, Table . By 1H NMR, we detected CH−π interaction strengths similar to those observed for β-methyl-d-Gal. As predicted, indole interacted with the most electropositive C–H protons on the α-face of β-d-Man, Figures A and S9.

To examine electronic effects in the associations in solution further, we carried out a linear free energy (Hammett) analysis of the binding of methyl-β-d-Gal to different 5-substituted indoles, Figures S10 and S4. We monitored changes in chemical shift for the most perturbed Gal ring proton, C(5)–H, Figure B.
Electron-rich indoles gave larger changes in chemical shift than did indole itself, indicating that the former engaged in stronger CH−π interactions. In contrast, electron-poor indoles afforded weaker interactions, and the strongly electron-withdrawing nitro-substituent appeared to abolish the interactions entirely. The linear trend observed, Figure B, indicates that electronic effects are critical in CH−π interactions.
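The Hammett analysis amounts to an ordinary least-squares line through Δδ(H5) plotted against σp, with R² as a measure of linearity (the Figure 6 caption reports a gradient of 5.7 and R² = 0.86 for H5). The sketch below implements that fit with the standard library. The σp values are standard tabulated Hammett constants included only for illustration; which substituents were actually tested is not stated in this section, and the Δδ values in the demo are synthetic, not the study's data.

```python
from typing import Sequence, Tuple

# Standard tabulated Hammett sigma_p constants; this substituent set is illustrative only.
SIGMA_P = {"OMe": -0.27, "Me": -0.17, "H": 0.00, "F": 0.06, "Cl": 0.23, "CN": 0.66}


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit y = slope * x + intercept; returns (slope, intercept, R^2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Coefficient of determination: 1 - residual sum of squares / total sum of squares.
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r2


if __name__ == "__main__":
    subs = ["OMe", "Me", "H", "F", "Cl", "CN"]
    sigma = [SIGMA_P[s] for s in subs]
    # SYNTHETIC perturbations (ppb): a known line plus small offsets, not measured data.
    noise = [0.2, -0.1, 0.0, 0.1, -0.2, 0.1]
    dd = [5.7 * s - 3.0 + e for s, e in zip(sigma, noise)]
    slope, intercept, r2 = linear_fit(sigma, dd)
    print(f"gradient = {slope:.2f} ppb per sigma unit, R^2 = {r2:.2f}")
```

A positive gradient means that more electron-donating substituents (negative σp) give more negative, i.e., larger upfield, perturbations, which is the trend described above. A point such as the nitro-substituted indole, whose interaction appeared to be abolished, would need a deliberate decision about whether it belongs in the fit.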

Conclusions

In summary, we provide a quantitative assessment of the interactions made between protein side chains and the pyranose forms of the most common monosaccharides across all high-resolution structures of protein–carbohydrate complexes in the Protein Data Bank. We have quantified biases in amino-acid occurrence in the immediate vicinities of the carbohydrates, with a preponderance of aromatic residues, particularly the electron-rich side chain of tryptophan, above and/or below the plane of the carbohydrate rings. This preference for aromatics comes at the expense of aliphatic hydrophobic residues. Thus, the faces of the carbohydrate are not simply sequestered through the hydrophobic effect. Our data indicate that two effects are at play. As a first-order effect, the electronegative faces of the aromatic rings engage in favorable electrostatic interactions with certain electropositive faces of the carbohydrates. In addition, a more specific and more intimate second-order effect operates: polarized, electropositive C–H bonds of the carbohydrate engage in CH−π interactions with a contacting aromatic ring. This model is supported by calculations of the electrostatic surface potentials of both the carbohydrate and arene rings, by examination of the proximity of individual carbohydrate carbon atoms to the aromatic groups, and by the linear free energy relationship analysis. Moreover, because the electrostatic surfaces, and importantly the electropositive character of the C–H bonds, differ between carbohydrate isomers, the aromatic side chains engage with different regions of the carbohydrate. This provides a mechanism that not only contributes to the binding of carbohydrates by proteins but also allows binding sites to discriminate between one monosaccharide and other, closely similar structures.

These bioinformatics and experimental findings provide a strong framework for understanding the fundamental forces underpinning protein–carbohydrate interactions, and they have implications for studies of their molecular recognition. For instance, increasing the electropositivity of C–H bonds should facilitate carbohydrate binding via improved carbohydrate–aromatic interactions. In this way, carbohydrates with electron-withdrawing O-acylated or O-sulfated groups could form stronger CH−π interactions. Similarly, hydrogen bonding or calcium-ion coordination to key carbohydrate hydroxyl groups could increase the strength of CH−π interactions. Given the vital role that carbohydrate–protein interactions play in biology, one strategy for designing glycomimetic drugs would be to exploit specific CH−π interactions, or the general presence of electron-rich aromatic rings, to complement the electropositive faces of carbohydrates in binding sites.
While the importance of CH−π interactions in carbohydrate-based environments is apparent from our studies, this class of interaction also plays roles in ligand binding more widely, in the structures of macromolecules including proteins, and in the mechanisms of chemical reactions.[12] Therefore, an appreciation of the impact of stereoelectronic effects on these and similar noncovalent interactions could find application in many contexts.

References (37 in total; 10 listed)

1. M Brandl, M S Weiss, A Jabs, J Sühnel, R Hilgenfeld. C-H...pi-interactions in proteins. J Mol Biol, 2001-03-16.

2. Thomas Lütteke, Martin Frank, Claus-W von der Lieth. Data mining the protein data bank: automatic detection and assignment of carbohydrate structures. Carbohydr Res, 2004-04-02.

3. Weizhong Li, Adam Godzik. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics, 2006-05-26.

4. Seiji Tsuzuki, Asuka Fujii. Nature and physical origin of CH/pi interaction: significant difference from conventional hydrogen bonds. Phys Chem Chem Phys, 2008-04-04.

5. K N Kirschner, R J Woods. Solvent interactions determine carbohydrate conformation. Proc Natl Acad Sci U S A, 2001-08-28.

6. María del Carmen Fernández-Alonso, Francisco Javier Cañada, Jesús Jiménez-Barbero, Gabriel Cuevas. Molecular recognition of saccharides by proteins. Insights on the origin of the carbohydrate-aromatic interactions. J Am Chem Soc, 2005-05-25.

7. Sophie Vandenbussche, Dolores Díaz, María Carmen Fernández-Alonso, Weidong Pan, Stéphane P Vincent, Gabriel Cuevas, Francisco Javier Cañada, Jesús Jiménez-Barbero, Kristin Bartik. Aromatic-carbohydrate interactions: an NMR and computational study of model systems. Chemistry, 2008.

8. Zachary R Laughrey, Sarah E Kiehna, Alex J Riemen, Marcey L Waters. Carbohydrate-pi interactions: what are they worth? J Am Chem Soc, 2008-10-10.

9. Thomas Lütteke, Martin Frank, Claus-W von der Lieth. Carbohydrate Structure Suite (CSS): analysis of carbohydrate 3D structures derived from the PDB. Nucleic Acids Res, 2005-01-01.

10. Amos Bairoch, Rolf Apweiler, Cathy H Wu, Winona C Barker, Brigitte Boeckmann, Serenella Ferro, Elisabeth Gasteiger, Hongzhan Huang, Rodrigo Lopez, Michele Magrane, Maria J Martin, Darren A Natale, Claire O'Donovan, Nicole Redaschi, Lai-Su L Yeh. The Universal Protein Resource (UniProt). Nucleic Acids Res, 2005-01-01.
