Literature DB >> 29028318

Combination of Cα-H Hydrogen Bonds and van der Waals Packing Modulates the Stability of GxxxG-Mediated Dimers in Membranes.

Samantha M Anderson¹, Benjamin K Mueller¹, Evan J Lange¹, Alessandro Senes¹.

Abstract

The GxxxG motif is frequently found at the dimerization interface of a transmembrane structural motif called GASright, which is characterized by a short interhelical distance and a right-handed crossing angle between the helices. In GASright dimers, such as glycophorin A (GpA), BNIP3, and members of the ErbB family, the backbones of the helices are in contact, and they invariably display networks of 4 to 8 weak hydrogen bonds between Cα-H carbon donors and carbonyl acceptors on opposing helices (Cα-H···O═C hydrogen bonds). These networks of weak hydrogen bonds at the helix-helix interface are presumably stabilizing, but their energetic contribution to dimerization has yet to be determined experimentally. Here, we present a computational and experimental structure-based analysis of GASright dimers of different predicted stabilities, which show that a combination of van der Waals packing and Cα-H hydrogen bonding predicts the experimental trend of dimerization propensities. This finding provides experimental support for the hypothesis that the networks of Cα-H hydrogen bonds are major contributors to the free energy of association of GxxxG-mediated dimers. The structural comparison between groups of GASright dimers of different stabilities reveals distinct sequence as well as conformational preferences. Stability correlates with shorter interhelical distances, narrower crossing angles, better packing, and the formation of larger networks of Cα-H hydrogen bonds. The identification of these structural rules provides insight on how nature could modulate stability in GASright and finely tune dimerization to support biological function.

Entities: CellLine Chemical Disease Gene Species

Mesh：

Substances：
Glycophorins

Year: 2017 PMID： 29028318 PMCID： PMC5927632 DOI： 10.1021/jacs.7b07505

Source DB: PubMed Journal: J Am Chem Soc ISSN： 0002-7863 Impact factor: 15.419

Introduction

Oligomerization is critical for the biological function of many membrane proteins. In particular, oligomerization is important for the bitopic or “single-pass” proteins [i.e., those that span the membrane bilayer with a single transmembrane (TM) helix], which are the largest class of integral membrane proteins.[1−3] Over 2300 single-pass proteins are predicted to exist in the human proteome alone, including oligomerizing systems such as receptor tyrosine kinases,[4−8] cytokine receptors,[9,10] integrins,[11,12] cadherins,[13] apoptotic regulators,[14−16] enzymes,[17] immunological complexes,[18] and many more.[19] The TM helices often have a critical role in driving and modulating the oligomerization of these systems, frequently acting in cooperation with the proteins’ soluble domains. Deciphering the rules that govern TM helix oligomerization in these systems is critical to understanding function and mechanisms of disease in a broad array of biological events. The oligomerization of TM helices is often mediated by structural motifs that are evolutionarily optimized for protein–protein interactions.[20,21] One of the most prevalent dimerization motifs for single-pass proteins is the fold of the glycophorin A dimer (GpA), which is named GASright from the right-handed crossing angle between the helices (near −40°), and the presence of small amino acids (Gly, Ala, Ser: GAS).[20] These small residues are arranged to form GxxxG and GxxxG-like sequence motifs (GxxxG, GxxxA, SxxxG, etc.)[22−24] typically found at the GASright dimerization interface (Figure a). As extensively reviewed by Teese and Langosch, GxxxG sequence motifs are prevalent in biology, and they are frequently associated with parallel, right-handed GASright structures (although GxxxG can also be found in antiparallel or left-handed dimers and even at lipid-binding sites).[19] The sequence context surrounding the GxxxG motif can modulate stability,[25,26] and thus, the versatile GASright motif can be found both in proteins that form very stable “structural” dimers (such as GpA[27] and BNIP3[16]), as well as in weaker and dynamic systems in which changes in conformation or oligomerization state are necessary for supporting function (such as signaling in members of the ErbB receptor tyrosine kinase[4,5,7,28−30] and integrin families[31−33]). Despite its common occurrence and importance, however, the fundamental physical rules that determine the strength of GASright dimerization are yet not well understood.

Figure 1

The GASright dimerization motif. (a) The GASright motif is a right-handed helical dimer with a short interhelical distance (6.3–7.5 Å) and a right-handed crossing angle of approximately −40°. The GxxxG sequence pattern at the crossing point (red) allows the backbones to come into contact. (b) The contact enables formation of networks of weak interhelical H bonds between Cα–H donors and carbonyl oxygen acceptors (shown in detail in c). The major unknown is the contribution of weak hydrogen bonds that occur at the interface of GASright dimers to the free energy of dimerization. GASright invariably displays networks of hydrogen bonds formed by Cα–H carbon donors and carbonyl acceptors (Cα–H···O=C), occurring in four to eight instances between atoms on opposing helices at the association interface (Figure , panels b and c).[34] In general, hydrogen bonding can be a stabilizing force in membrane proteins, and it has been shown that “canonical” hydrogen bonds (i.e., those formed by oxygen or nitrogen donors) can drive the interaction of TM helices.[35−39] Carbon is a weaker donor than oxygen or nitrogen, but Cα–H groups are activated by the flanking electron-withdrawing amide groups in the peptide backbone, and thus the strength of Cα–H hydrogen bonds has been estimated to be as much as one-half of the N–H donors in vacuum.[40,41] Therefore, it is plausible that multiple Cα–H hydrogen bonds occurring at the dimerization interface would contribute significantly to the free energy of association in GASright dimers.[34,42] Nevertheless, experimental demonstration of this hypothesis has, so far, remained elusive. A major technical challenge in measuring the contribution of Cα–H hydrogen bonds to TM helix association in GASright dimers is the fact that both the donor and acceptor groups are part of the backbone, making a rational mutation strategy difficult to implement. To date, there have been only two experimental studies that have probed the contribution of Cα–H hydrogen bonds in membrane proteins. One of these studies was not performed on a GASright dimer but rather on the 7-TM helix membrane protein bacteriorhodopsin.[43] The study focused on the interaction between a Cα–H hydrogen bond donor and a threonine hydroxyl group acceptor and found that the removal of the side-chain acceptor group by mutation did not destabilize folding. However, it should be noted that the study targeted one isolated Cα–H hydrogen bond that occurs in the context of a large, multispan membrane protein. A second study investigated the energy of interaction of a Cα–H hydrogen bond in a GASright dimer by IR spectroscopy, estimating a favorable interaction energy of −0.88 kcal/mol between the Cα–H donor of Gly 79 and the carbonyl of Ile 76 of GpA.[44] This result supports the notion that Cα–H hydrogen bonds are likely significantly stabilizing. However, it is understood that geometry can play a significant role in determining the strength of Cα–H hydrogen bonds,[45] and this study is limited to a single specific bond among the many found in GpA. Moreover, the study measured hydrogen bonding strength but not its contribution to the free energy of dimerization, which has not been yet directly assessed. The hypothesis that Cα–H hydrogen bonds are major contributors to the free energy of GASright dimerization remains compelling, particularly given by the unique ability of the structural motif to form this unusual feature. In fact, among all possible symmetric homodimeric configurations, GASright is the only one that promotes the formation of a large number of concurrent Cα–H hydrogen bonds.[42] This ability arises from three unique aspects of the geometry of GASright: (1) a crossing angle that precisely aligns Cα–H donors and carbonyl acceptors across two helices, (2) the presence of Gly at certain specific positions (producing the GxxxG pattern), where they are necessary to prevent clashing between the close helices, and (3) the ability of those same Gly residues to increment the number of Cα–H bonds by donating their second Hα. Therefore, GASright appears to be a structural motif optimized for the formation of Cα–H hydrogen bond networks. We found that an algorithm (CATM) based on the simultaneous optimization of van der Waals forces and Cα–H hydrogen bonding was able to predict the small database of known three-dimensional structures of GASright homodimers to near atomic precision,[42] another finding that indirectly reinforces the importance of these forces in dimerization. The CATM algorithm was later successfully applied to predict the interface of a previously uncharacterized GxxxG-containing dimer, ADCK3, a mitochondrial protein that plays an essential role in the biosynthesis of coenzyme Q.[17] CATM can capture, with remarkable accuracy, the structural features of a variety of GASright dimers. The success in predicting structure raises the question of whether the underlying energetic model can also capture, at least in part, the energetics of GASright dimerization. To address this question, here we have combined CATM with a high-throughput biological assay to examine the relationship between structure and stability of GASright dimers of various geometries. We have applied CATM to the over 2300 sequences of TM domains of single-pass proteins present in the human genome, predicting the structure of hundreds of potential GASright dimers. We then selected candidates that represent a range of predicted dimerization stabilities and assessed their association propensity with TOXCAT, a widely used in vivo reporter assay that is sensitive to the relative association of TM dimers in a biological membrane.[46] After several steps of experimental validation, we obtained computational and experimental measurements for 26 well-behaved candidate GASright homodimers. We observe a significant correlation in the overall trend of energies predicted computationally and the dimerization propensities measured experimentally. These data provide the first experimental evidence for a model in which a combination of van der Waals forces and Cα–H hydrogen bonding acts as a primary source of stability, modulating the strength of GASright association.

Results and Discussion

Structural Prediction of GASright Homodimers

The CATM algorithm is designed to predict the structure of potential GASright homodimers from the amino acid sequence of a TM domain by docking the two helices and simultaneously optimizing van der Waals interactions, weak and canonical hydrogen bondings, and an implicit membrane solvation model. The algorithm only considers potential GASright conformations, and it does not explore the entire conformational range of a generic TM helix dimer, which makes it efficient and capable of searching for potential GASright dimers in high-throughput in large databases of TM sequences. To create a diverse set of predicted GASright dimer structures to be tested experimentally, we drew sequences from the human proteome. The Uniprot database of annotated protein sequences currently identifies 2383 human proteins containing a single TM domain.[47] When these TM domain sequences were run through CATM, they produced 1141 potential GASright dimers with a negative (i.e., favorable) energy score (dimer energy–monomer energy). The CATM scores assume a broad range of association energies, from −70 to 0 kcal/mol, with a skewed bell distribution (Figure a). The left tail of the distribution contains sequences enriched in well-packed structures with extensive Cα–H hydrogen bonding networks that are predicted to be very stable (Figure , panels b and c). The top 10% of the predicted structures form an average of 6.0 ± 1.7 Cα–H bonds. The predicted structures represent a rich repertoire of potential GASright dimers covering a wide range of predicted stabilities for follow-up experimental analysis.

Figure 2

Energy distribution of CATM predicted GASright dimers in human single-pass sequences. (a) Histogram of calculated energies of human GASright dimers. CATM identified 1141 sequences that produced a model with a negative (favorable) energy of association. (b) Extensive complementary packing, as well as (c) the characteristic networks of Cα–H hydrogen bonds displayed by the lowest energy structures, chondroitin sulfate glucuronyltransferase (Uniprot accession Q9P2E5). For the subsequent experimental phase, we did not consider any sequence whose dimer interface contained strongly polar residues (Asp, Glu, Arg, Lys, His, Asn, and Gln) or Pro. We chose to exclude these residues because proline has a tendency to form kinks in helices that are difficult to predict,[48,49] while strongly polar residues have a propensity to drive TM association through the formation of interhelical hydrogen bonds.[37−39,50] Their inclusion would have increased the probability of dimers mediated by nonspecific interfaces, breaking the desired structural correspondence between the model predicted by CATM and the constructs in experimental conditions. These exclusions reduced the number of available sequences to 668. We also excluded sequences with predicted marginal stability (a score higher than −5 kcal/mol). From the remaining 604 sequences, we randomly selected 65 diverse candidates for experimental analysis (Tables S1–S4).

Experimental Strategy: TOXCAT Assay Using Standardized Sequences

To experimentally assess the dimerization of the 65 predicted GASright dimers and their mutants, we used TOXCAT, a widely adopted assay that measures TM homo-oligomerization in biological membranes.[46] This system is based on the in vivo expression of a chimeric protein in the inner membrane of Escherichia coli in which the TM domain of interest is fused to the ToxR transcriptional activator. Dimerization of the TM helices brings together two ToxR subunits, which bind to a specific promoter, activating the expression of the reporter gene chloramphenicol acetyltransferase (CAT). Quantification of CAT thus provides an indication of the extent of TM helix–helix association in a biological membrane (Figure a).

Figure 3

Experimental design. (a) TOXCAT is an in vivo assay based on a construct in which the TM domain under investigation is fused to the ToxR transcriptional activator. TM association results in the expression of a reporter gene in E. coli cells, which can be quantified. (b) To reduce variability in TOXCAT, the eight interfacial amino acids identified by CATM in the wild-type sequences (top) were “stitched” into a standardized poly-Leu sequence (bottom). Standardization of the predicted constructs retains the geometry of the interface while controlling the length of the TM helix, the position of the crossing point, and the hydrophobicity of the TM segment. The general relationship between reporter gene expression in TOXCAT and thermodynamic stability of any given dimer is likely complex, but reasonable correlation has been found for collections of point mutants of GpA and their energy of dimerization in detergents.[51,52] In these studies, the constructs are homogeneous, having identical length of the TM region, nearly identical sequence, and comparable hydrophobicity. Because TOXCAT’s response may be dependent on these variables,[15,53,54] controlling them is likely to simplify the comparison between constructs. The predicted lengths of the TM domains of the 2383 human single-pass sequences in Uniprot range widely,[22] and their estimated ΔG of membrane insertion ranges from −6.7 to +11.9 kcal/mol (using the biological ΔGapp predictor[55]). To reduce heterogeneity as much as possible, we adopted a strategy of “stitching” the 8 positions predicted by CATM to be at the helix–helix interface of a standardized TM helix of 21 amino acids consisting of a poly-Leu backbone (LLLxxLLxxLLxxLLxxLILI, where the x represents the variable interfacial positions). As illustrated in Figure b, this stitching strategy ensures that all constructs have the same TM domain length and that the predicted interface is centered in the middle of the membrane. Perhaps most importantly, the standardized sequence reduces the variability in hydrophobicity. Because the noninterfacial residues in all the constructs remain constant, the ΔGapp range for membrane insertion is reduced to −6.6 to −2.9 kcal/mol, likely leading to a more consistent expression of the constructs in the E. coli membrane. Another important reason for standardizing all noninterfacial positions is that the strategy removes potential alternative dimerization interfaces that may be present within the wild-type sequence because only the amino acids involved in the predicted GASright interfaces are carried over into the standardized constructs. There is an existing precedent for such a strategy with GASright homodimers: it has been shown that the interfacial residues of GpA in a leucine backbone behave similarly to the wild-type sequence.[46] In addition, a pure poly-Leu sequence has a relatively low propensity for self-association in TOXCAT,[37,39,56] which is important for reducing the risk of alternate interfaces. To ensure that the interfaces of the standardized sequences were consistent with those initially predicted for the wild-type sequences, the standardized sequences were also evaluated with CATM (Tables S1–S4). We found that CATM consistently predicts nearly identical interfaces for wild-type and standardized constructs. The computed energies that we report for our analysis below correspond to those calculated using the standardized poly-Leu construct and not the original wild-type sequences.

Experimental Validation of the Predicted Structures

To partially validate the predicted structural models, we adopted a mutagenesis strategy. Saturation mutagenesis has been commonly used to identify or confirm the interface of TM dimers.[15,17,57−59] Because it would be impractical to perform saturation mutagenesis of all 65 candidate constructs, we opted to introduce in each construct a single mutation predicted to be highly detrimental, selecting the most sensitive interfacial position of GASright homodimers, the so-called “C1” position, as defined in our previous work.[42] The C1 position is one of the residues near the crossing point of the helical dimer. In GASright homodimers, C1 is required to be occupied by Gly in order to allow contact between the backbones of the two helices.[42] Substitution of Gly at C1 with a large hydrophobic amino acid, such as Ile, would push the helices apart and completely eliminate any potential association mediated by the predicted interface. We computationally verified that all models of C1Gly→Ile variants contained significant clashes. Introduction of this control enabled the removal of constructs that retained significant association in TOXCAT after the C1Gly→Ile mutation, since these results suggest that the dimerization observed experimentally was not mediated by the predicted GASright structural model (or, alternatively, that a second possible dimerization interface is also present in the construct, which is not disrupted by the C1Gly→Ile mutation). To confirm proper membrane insertion, each of the 65 constructs and their C1Gly→Ile variants was tested for its ability to support growth in minimal medium containing maltose as the only carbon source, as standard practice in TOXCAT.[46] A total of 15 constructs (wild type or C1 variant) did not fully grow in these conditions (Table S2). These constructs were not further considered in the study. We then eliminated constructs whose TOXCAT signal was below the minimal threshold of a pure poly-Leu construct because we would not be able to differentiate specific GASright-mediated dimerization from background association. A pure poly-Leu construct displays approximately 30% of the CAT expression level of the GpA standard, therefore any construct below the 30% threshold was eliminated (10 constructs, Table S3). Finally, any constructs whose C1Gly→Ile control variant scored above 30% of relative CAT expression level were also eliminated from the analysis because they did not match our expected model, as explained earlier (14 constructs, Table S4). As an exception to this rule, if a C1Gly→Ile mutation reduced the “wild-type” CAT activity by at least 75% we retained it for analysis, even if it was above the 30% threshold, because of the dramatic reduction in dimerization (3 constructs). The final 26 GASright constructs are listed in Table S1. Their predicted structural models are illustrated in Figure S1. The progression from the 2328 genomic sequence to the final 26 experimental constructs is summarized in Table S5. We verified the expression of the ToxR-TM-MBP chimeras of the 26 constructs by Western blots: the constructs displayed rather homogeneous levels of expression, with a standard deviation of 22% (Figure S2).

Cα–H Hydrogen Bonds and vdW Predict Experimental Association Propensities

The comparison of association energies calculated with CATM and dimerization propensities assessed by TOXCAT for the 26 selected constructs is shown in Figure a. The plot shows a statistically significant correlation (R2 = 0.441, p < 0.0005, t test of linear regression slope). One clear outlier is present in the plot (the TNR12 construct, TOXCAT 119%, CATM −6 kcal/mol, highlighted in gray): if this point is excluded, the R2 increases to 0.647 (p < 0.000005). The correlation is also statistically significant by rank order correlation coefficient analysis, which does not assume a linear model (r = −0.683, p < 0.005, and r = −0.827, p < 0.000001, with and without TNR12, respectively).[60] Some of the variance is likely due to the biological nature of the TOXCAT assay, some to imprecision by CATM in predicting the structures, and the remaining variance can be attributed to the limitations of the energy model, which was constructed solely on its ability to predict structure. However, the energetic model is clearly able to capture the trend of dimerization propensities observed experimentally.

Figure 4

Comparison of CATM energies with apparent TOXCAT dimerization. (a) Comparison of CATM energy score of 26 sequences and their TOXCAT signal (measured as the enzymatic activity of the reporter gene CAT). The points are color-coded according to the grouping in (b). The error bars represent the standard deviation among replicates. The dashed line represents the linear regression fit of the data, with the exclusion of the outlier point highlighted in gray (R2 = 0.647, p < 0.000005). (b) Same data as in (a), grouped and averaged in five bins based on CAT activity from weak (>25%, magenta) to very strong (>125%, blue), in 25% intervals. The error bars represent the standard error of the average. The dashed line is the linear regression of the data (R2 = 0.931, p < 0.01). The groups are the base of the analysis reported in Figure .

Figure 5

Sequence and structural bias occur in groups with different stabilities. (a) Sequences of the 26 constructs ranked by TOXCAT signal showing the groups, as defined in Figure . GxxxG motifs are underlined with a solid line, GxxxG-like motifs with a dotted line. Color coding as in Figure b. (b) Nomenclature of the interfacial positions, as defined previously.[42] The sequence and structural biases of the groups of experimental constructs (orange symbols) are illustrated for (c) the number of Cα–H hydrogen bonds, which increases with stability, (d) the interhelical distance, and (e) crossing-angle, which decrease with stability, and the fraction of sequences containing (f) GxxxG and (g) Gly at the N1 position, which also increase. Data also reported in Table . The same trends are observed in groups of different stabilities computed from the entire data set of 670 structures predicted from the human proteome (blue symbols).

Structural and Sequence Analysis of Groups with Distinct Stability

Interesting differences in structural and sequence features are observed among constructs with different dimerization propensities. To appreciate these structural and energetic properties that distinguish strong from weaker dimers, we grouped the data according to five levels of TOXCAT signal, using five 25%-wide bins, from very weak (25–50% GpA) to very strong apparent dimerization (>125% GpA). We first confirmed that the energy model is sensitive enough to distinguish between the five stability groups. Indeed, proportionality is retained after TOXCAT and CATM values are averaged within each groups (Figure b). Linear regression of these averaged values produces a significant fit (p < 0.01), with a R2 value of 0.931 if the TNR12 outlier is excluded, and a R2 value of 0.883 when TNR12 is included (p < 0.05, Figure S3). The regression analyses of Figure (panels a and b) produce two distinct equations of the line, which is an expected mathematical outcome of averaging. However, it should be noted that a linear relationship is likely not the correct physical model and is not necessarily expected.[61] What is important is that there is proportionality between TOXCAT and CATM outcomes, and that the energetic model is able to clearly differentiate among the five sets of constructs. Therefore, the grouping is suitable for a comparative analysis of sequence and structural features that characterize constructs with increasing apparent stability. Statistical analysis of the trends independent of grouping is also provided.

Stability Correlates with Sequence Biases

The results of the sequence and structural features of the five groups are summarized in Table and Figure . Some sequence biases at the interface of the predicted dimers were already present in the initial pool of 604 sequences, as expected for a selection of GASright dimers (Figure S4). Most notably, the sequences are enriched with GxxxG and GxxxG-like motifs, and Gly is nearly absolutely preserved at position C1, where this amino acid is required for interhelical backbone contact (the nomenclature of the positions is defined in Figure b).[42] However, on top of these biases, a number of interesting trends emerged within our experimental pool that correlate statistically with their stability.

Table 1

Energetic and Geometric Properties of Groups of Constructs of Different Apparent Dimerization and Statistical Significance of the Distributionsa

TOXCAT range (% GpA)	25–50%	50–75%	75–100%	100–125%	125+%	correlation with TOXCAT
number of constructs	7	7	4	4	3
average TOXCAT (% GpA)	37 ± 1	58 ± 2	88 ± 4	118 ± 2	141 ± 7
CATM energy score (kcal/mol)b	–14.7 ± 2.5	–27.1 ± 2.5	–30.0 ± 2.0	–39.1 ± 2.8	–41.7 ± 7.5	p < 0.000001d
van der Waals (kcal/mol)	–26.2 ± 5.3	–33.7 ± 4.5	–33.6 ± 2.1	–39.3 ± 2.4	–39.0 ± 11.1	p < 0.005d
Cα-H hydrogen bonding (kcal/mol)	–5.2 ± 1.1	–8.0 ± 1.9	–9.7 ± 0.5	–12.0 ± 2.3	–13.0 ± 0.8	p < 0.000001d
solvation (kcal/mol)	16.7 ± 1.9	14.2 ± 1.9	13.3 ± 2.0	11.7 ± 2.4	10.6 ± 2.7	p < 0.00005d
crossing angle (deg)	–51 ± 4	–47 ± 6	–49 ± 2	–41 ± 7	–39 ± 7	p < 0.01d
number of Cα-H hydrogen bonds	4.6 ± 1.0	5.1 ± 1.1	6.0 ± 0.0	7.5 ± 1.0	8.0 ± 0.0	N/Ae
interface surface area (Å²)	4810 ± 490	4660 ± 500	4630 ± 190	4770 ± 540	4510 ± 280	–
interhelical distance (Å)	7.1 ± 0.2	6.7 ± 0.3	6.5 ± 0.1	6.4 ± 0.1	6.5 ± 0.0	p < 0.00005d
van der Waals/interface surface area [kcal/(mol Å²)]	–0.0054 ± 0.0012	–0.0073 ± 0.0015	–0.0073 ± 0.0005	–0.0083 ± 0.0008	–0.0086 ± 0.0021	p < 0.001d
sequences with GxxxG	2/7	4/7	3/4	4/4	3/3	p < 0.01f
sequences with Sm-xxx-Smc	6/7	7/7	4/4	4/4	3/3	–
sequences with Gly at N1	0/7	3/7	3/4	4/4	3/3	p < 0.0001f
sequences with Gly at C1	7/7	7/7	4/4	4/4	3/3	–
sequences with Gly at C5	2/7	1/7	0/4	2/4	1/3	–

All values are reported as averages ± standard deviation unless noted. The outlier TNR12 was excluded from the 100–125% group.

Values are reported as averages ± standard error as in Figure .

Sm-xxx-Sm are defined by any combinations of Gly, Ala, Ser, and Cys at the first and last position.

Rank order (Spearman) correlation analysis.[60]

Rank correlation statistics not applicable to noncontinuous variable.

Point Biserial Correlation analysis.[62]

All values are reported as averages ± standard deviation unless noted. The outlier TNR12 was excluded from the 100–125% group. Values are reported as averages ± standard error as in Figure . Sm-xxx-Sm are defined by any combinations of Gly, Ala, Ser, and Cys at the first and last position. Rank order (Spearman) correlation analysis.[60] Rank correlation statistics not applicable to noncontinuous variable. Point Biserial Correlation analysis.[62] Sequence and structural bias occur in groups with different stabilities. (a) Sequences of the 26 constructs ranked by TOXCAT signal showing the groups, as defined in Figure . GxxxG motifs are underlined with a solid line, GxxxG-like motifs with a dotted line. Color coding as in Figure b. (b) Nomenclature of the interfacial positions, as defined previously.[42] The sequence and structural biases of the groups of experimental constructs (orange symbols) are illustrated for (c) the number of Cα–H hydrogen bonds, which increases with stability, (d) the interhelical distance, and (e) crossing-angle, which decrease with stability, and the fraction of sequences containing (f) GxxxG and (g) Gly at the N1 position, which also increase. Data also reported in Table . The same trends are observed in groups of different stabilities computed from the entire data set of 670 structures predicted from the human proteome (blue symbols). The first trend is the frequency of the GxxxG motif, which increases from the least to the most stable groups (Figure c, orange symbols, and Table ). In particular, the three more stable groups (>75%, >100%, >125%) contain GxxxG motifs in all but one construct, formed by the C1 Gly and a second Gly either at N1 (the position at i-4 from C1) or at C5 (the position at i+4); conversely, in the two less stable groups (>50% and >25%) GxxxG is found in just 43% of the sequences. The biased distribution of GxxxG containing sequences is confirmed, independently from the grouping scheme, using Point Biserial correlation statistics, which measures correlation between a continuous variable (TOXCAT signal) and a binary variable (occurrence of GxxxG) (correlation coefficient r = +0.63, p < 0.001).[62] The fact that GxxxG is present in the most stable constructs is not surprising. However, it should be noted that some low-stability sequences also contain GxxxG, further demonstrating that the presence of the sequence motif is not the sole determinant of stability.[25,26] Notably, all sequences that do not contain a GxxxG motif contain a Small-xxx-Small motif (i.e., GxxxA, SxxxG, etc.), with the exception of one low affinity construct (1A32–2). The GxxxG motifs in the sequence can be formed by the invariable Gly at C1 together with a second Gly at either N1 or C5. However, the marked increase of GxxxG in the most stable constructs is primarily due to the presence of a Gly at position N1 (Figure d). In the three most stable groups 10 out of 11 of the sequences have a Gly at N1, whereas Gly occurs rarely at the same position in the lower stability groups (p < 0.0001). Conversely, Gly at C1 is rarer and its presence does not correlate with apparent stability.

Stability Correlates with Structural Features

These trends suggest that distinct sequence biases occur among GASright dimers of different stabilities. To understand their physical basis, we looked at how structural parameters varied as a function of apparent stability. We observed numerous structure-related differences, which are summarized in Table and Figure . The analysis indicates that as stability increases, (i) the distance between the helices becomes increasingly shorter, (ii) the crossing angle becomes smaller, (iii) the structural models become increasingly better packed, and (iv) they display larger networks of Cα–H hydrogen bonds. The interhelical distance (measured between the helical axes) progressively decreases from an average of 7.1 Å for the lowest stability set, down to 6.5 Å for the most stable group, which is near the closest two helices can approach before their backbones would sterically clash (Figure e). The correlation between TOXCAT and interhelical distance is statistically significant (rank order spearman correlation coefficient r = +0.74, p < 0.0005, Table ). We also observe a reduction of the interhelical angle, which progressively decreases toward −40° (p < 0.01, Figure f). These geometric changes are favored by the presence of Gly at N1, as discussed in the previous section. The tighter interhelical contact in the most stable constructs leads to an increase of favorable van der Waals interactions between the helices (p < 0.005), which improve by 49% from the lowest to the highest dimerizing groups from −26.2 to −39.0 kcal/mol. These improved van der Waals interactions do not originate from a larger dimer interface (which remains relatively constant across the sets), therefore they are attributable to better packing. The more intimate interhelical contact also favors a very significant change in hydrogen bonding: the number of interhelical Cα–H hydrogen bonds increases from 4.6 to 8.0 on average (Figure g). Correspondingly, the average contribution of hydrogen bonding to the binding energy more than doubles from −5.2 to −13.0 kcal/mol (p < 0.000001). Finally, we observe a reduction of the cost of desolvating the helices (from +17 kcal/mol to +11 kcal/mol) that also contributes to the better energy score computed in CATM for the more stable dimers (p < 0.00005). To further investigate these sequence and geometry biases, we performed a similar analysis on the entire set of 604 poly-Leu predicted structures, grouping the results by decreasing CATM energy (blue symbols in Figure and Table S7). We observe very similar trends across all examined variables. A progressive reduction of the crossing angle and interhelical distance is observed as the energy score decreases, along with an increased number of Cα–H hydrogen bonds and improvement of the packing. Similarly, presence of a GxxxG motif increases, reaching 100% in the lowest energy groups. The trend mirrors the presence of Gly at position N1, whereas the presence of Gly at C5 (the second position that can form a GxxxG motif with C1) also increases but not as dramatically, topping at 50% in the lowest energy group. In summary, the model suggests that the stronger interactions tend to be formed by helices that have a closer distance and a smaller crossing angle. These geometries tend to be favored by the presence of a second Gly at N1, forming a GxxxG motif with the Gly at C1, although the precise stability and conformation of each dimer is influenced by its entire sequence context. It is possible that some of the observed results may be influenced by the current experimental conditions. For example, the optimal crossing angle could be sensitive to the thickness of the membrane and the length of the TM helices, which were not varied in either the computational or the TOXCAT experiments. Nevertheless, these biases provide important insight into how the GASright sequence is able to modulate stability, a feature that is likely important for a structural motif that is found in both stable constitutive dimers, as well as in weaker “dynamic” or transient dimers, where dissociation or conformational change is required for the function of the protein.

Conclusions

An unusual interaction is at the core of one of the most common transmembrane motifs, and yet the contribution of these Cα–H hydrogen bond networks to the free energy of dimerization has remained uncertain. This is in part due to scarce availability of structures, which poses a serious hurdle to understanding the structural basis of TM helix oligomerization. Structure-based analysis has been possible for a few structurally characterized dimeric systems, such as GpA[25,63] and BNIP3.[16] Conversely, large-scale comparative analyses, based either on combinatorial libraries,[23,64−68] comprehensive protein families,[69] or homology-based clusters of human proteins[53] have been performed primarily on sequences of unknown structure. Computational modeling has often been applied in coordination to these approaches to aid in the interpretation of experimental data.[11,17,18,57,58,70−76] An advance of the present work is the availability of a reliable structural prediction method, which has enabled the design of an experimental analysis of dimers of diverse stabilities to test pre-existing structural and energetic models. This analysis addressed the question of whether Cα–H hydrogen bonding and van der Waals forces are predictive of the dimerization propensity of GASright dimers. The results provide the first experimental support for the hypothesis that Cα–H hydrogen bonds are indeed major determinants of dimerization in GxxxG-mediated dimers. Our data complement the only other experimental report in the literature that has shown that Cα–H hydrogen bonds have the potential to stabilize GASright, a study by Arbely & Arkin that measured the strength of a single hydrogen bond interaction in a well-characterized GASright model system;[44] here, we addressed the role of Cα–H hydrogen bonds as contributor to the free energy of dimerization, examined at the level of the entire structural motif. We found that a simple energy model combining Cα–H hydrogen bonding and van der Waals already forms a good base when tested in biological membranes, albeit in standardized sequence conditions. The present analysis also provides initial insight on how change in the sequence and geometry may modulate these terms and therefore overall stability in GASright. The results also suggest that, with more data, a similar strategy would likely support the development of a more sophisticated energy function, which would provide further insight into the forces involved in GASright association as well as improve our ability to accurately predict structure and stability of these dimers from primary sequence data alone.

Methods

Software

All calculations were implemented and performed using MSL v. 1.1,[77] an open source C++ library that is freely available at http://msl-libraries.org.

Prediction of GASright Structure and Dimerization Energy

The structure of GASright dimers was predicted from a database of 2383 human sequences annotated as single-pass membrane proteins in Uniprot (as of November 2, 2016).[47] Structural prediction was performed with the program CATM.[42] Side chain mobility was modeled using the energy-based conformer library applied at the 95% level.[78] Energies were determined using the CHARMM 22 van der Waals function,[79] the IMM1 membrane implicit solvation model,[80] and the hydrogen bonding function of SCWRL 4,[81] as implemented in MSL,[77] with the following parameters for Cα donors, as reported previously: B = 60.278, D0 = 2.3 Å, σd = 1.202 Å, αmax = 74.0°, and βmax = 98.0°.[42] The CATM algorithm was described in detail previously.[42] Briefly, the sequence of interest is threaded into a set of different registers at each of 463 representative geometries. If sequence-based filtering rules are met, the sequence is built on the backbone in all atoms and the helices are docked by reducing the interhelical distance in steps. At each step, the side chains are optimized and the interaction energy is evaluated until a minimum energy is found. To further optimize the dimer, the geometry is then subjected to Monte Carlo backbone perturbation cycles in which all interhelical parameters (distance, Z shift, axial rotation, and crossing angle) are locally varied. If the final interaction energy (calculated as the energy of the dimer minus the energy of two monomers separated at long distance) is negative, the solution is accepted. The solutions are then clustered using an RMSD criterion to produce a series of distinct models. The computation produced 1141 structures of predicted GASright homodimers. These structures are available at http://seneslab.org/CATM.

Cloning and Expression of Chimeric Proteins in MM39 Cells and MalE Complementation Assay

DNA sequences containing the transmembrane region of interest were cut with NheI and DpnII restriction enzymes and cloned into the NheI-BamHI restriction sites of the pccKAN vector as previously described.[17,57] The TOXCAT constructs were transformed into MM39 cells. A freshly streaked colony was inoculated into 3 mL of LB broth containing 100 μg/mL ampicillin and grown overnight at 37 °C. 50 μL of overnight cultures were inoculated into 3 mL of LB broth and grown to an OD600 of 0.8–1.0 at 37 °C. After recording the optical density, 1 mL of cells was spun down for 15 min at 17000g and resuspended in 500 mL of sonication buffer (25 mM Tris-HCl, 2 mM EDTA, pH 8.0). Cells were lysed by probe sonication at medium power for 10 s over ice. An aliquot was removed from each sample and stored in SDS–PAGE loading buffer for immunoblotting. The lysates were then cleared by centrifugation at 17000g, and the supernatant was kept on ice for chloramphenicol acetyltransferase (CAT) activity assay. To confirm proper membrane insertion and orientation of the TOXCAT constructs, overnight cultures were plated on M9 minimal medium plates containing 0.4% maltose as the only carbon source and grown at 37 °C for 48–72 h. The variants that did not grow in these conditions were not considered for this study.

Chloramphenicol Acetyltransferase (CAT) Spectrophotometric Assay

CAT activity was measured as described.[57,82] Briefly, 750 μL of buffer containing 0.4 mg/mL 5,5′- dithiobis(2-nitrobenzoic acid) or Ellman’s reagent and 0.1 M Tris-HCl, pH 7.8, were mixed with 250 μL of 0.4 mM acetyl CoA and 40 μL of cleared cell lysates, and the absorbance at 412 nm was measured for 2 min to establish basal enzyme activity rate. After addition of 40 μL of 2.5 mM chloramphenicol in 10% ethanol, the absorbance was measured for an additional 2 min to determine CAT activity. The basal CAT activity was subtracted, and the value was normalized by the cell density measured as OD600. All measurements were determined by at least four independently cultured biological replicates, each of which was measured with two technical replicates.

Quantification of Expression by Immunoblotting

Protein expression was confirmed by immunoblotting. The cell lysates were normalized by cell density and loaded onto a NuPAGE 4–12% bis-tris SDS–PAGE gel (Invitrogen) and then transferred to PVDF membranes (VWR) for 1 h at 100 millivolts. Blots were blocked using 5% bovine serum albumin (US Biologicals) in TBS-Tween buffer (50 mM Tris, 150 mM NaCl, 0.05% Tween 20) for overnight at 4 °C, incubated with goat biotinylated anti-maltose binding protein antibodies (Vector laboratories) for 2 h at room temperature, followed by peroxidase-conjugated streptavidin antigoat secondary antibodies (Jackson ImmunoResearch) for 2 h at 4 °C. Blots were developed with the Pierce ECL Western Blotting Substrate Kit; 1 mL of ECL solution was added to the blot and incubated for 90 s. Chemiluminescence was measured using an ImageQuant LAS 4000 (GE Healthsciences). Individual bands were quantified by ImageJ.

79 in total

1. The Calpha ---H...O hydrogen bond: a determinant of stability and specificity in transmembrane helix interactions.

Authors: A Senes; I Ubarretxena-Belandia; D M Engelman
Journal: Proc Natl Acad Sci U S A Date: 2001-07-31 Impact factor: 11.205

2. Activation of integrin alphaIIbbeta3 by modulation of transmembrane helix associations.

Authors: Renhao Li; Neal Mitra; Holly Gratkowski; Gaston Vilaire; Rustem Litvinov; Chandrasekaran Nagasami; John W Weisel; James D Lear; William F DeGrado; Joel S Bennett
Journal: Science Date: 2003-05-02 Impact factor: 47.728

3. Thermodynamics of glycophorin A transmembrane helix dimerization in C14 betaine micelles.

Authors: Karen G Fleming; Cha-Chi Ren; Abigail K Doura; Matthew E Eisley; Felix J Kobus; Ann Marie Stanley
Journal: Biophys Chem Date: 2004-03-01 Impact factor: 2.352

4. Sequence determinants of a transmembrane proton channel: an inverse relationship between stability and function.

Authors: Amanda L Stouffer; Vikas Nanda; James D Lear; William F DeGrado
Journal: J Mol Biol Date: 2005-01-21 Impact factor: 5.469

5. Unique dimeric structure of BNip3 transmembrane domain suggests membrane permeabilization as a cell death trigger.

Authors: Eduard V Bocharov; Yulia E Pustovalova; Konstantin V Pavlov; Pavel E Volynsky; Marina V Goncharuk; Yaroslav S Ermolyuk; Dmitry V Karpunin; Alexey A Schulga; Michail P Kirpichnikov; Roman G Efremov; Innokenty V Maslennikov; Alexander S Arseniev
Journal: J Biol Chem Date: 2007-04-04 Impact factor: 5.157

6. An energy-based conformer library for side chain optimization: improved prediction and adjustable sampling.

Authors: Sabareesh Subramaniam; Alessandro Senes
Journal: Proteins Date: 2012-06-18

Review 7. Transmembrane helix dimerization: beyond the search for sequence motifs.

Authors: Edwin Li; William C Wimley; Kalina Hristova
Journal: Biochim Biophys Acta Date: 2011-09-01

8. Statistical analysis of amino acid patterns in transmembrane helices: the GxxxG motif occurs frequently and in association with beta-branched residues at neighboring positions.

Authors: A Senes; M Gerstein; D M Engelman
Journal: J Mol Biol Date: 2000-02-25 Impact factor: 5.469

9. The structure of a receptor with two associating transmembrane domains on the cell surface: integrin alphaIIbbeta3.

Authors: Jieqing Zhu; Bing-Hao Luo; Patrick Barth; Jack Schonbrun; David Baker; Timothy A Springer
Journal: Mol Cell Date: 2009-04-24 Impact factor: 17.970

10. Measuring transmembrane helix interaction strengths in lipid bilayers using steric trapping.

Authors: Heedeok Hong; Yu-Chu Chang; James U Bowie
Journal: Methods Mol Biol Date: 2013

17 in total

1. The Cytokinin Oxidase/Dehydrogenase CKX1 Is a Membrane-Bound Protein Requiring Homooligomerization in the Endoplasmic Reticulum for Its Cellular Activity.

Authors: Michael C E Niemann; Henriette Weber; Tomáš Hluska; Georgeta Leonte; Samantha M Anderson; Ondřej Novák; Alessandro Senes; Tomáš Werner
Journal: Plant Physiol Date: 2018-01-04 Impact factor: 8.340

2. Engineered occluded apo-intermediate of LacY.

Authors: Irina Smirnova; Vladimir Kasho; H Ronald Kaback
Journal: Proc Natl Acad Sci U S A Date: 2018-11-26 Impact factor: 11.205

3. De novo designed transmembrane peptides activating the α5β1 integrin.

Authors: Marco Mravic; Hailin Hu; Zhenwei Lu; Joel S Bennett; Charles R Sanders; A Wayne Orr; William F DeGrado
Journal: Protein Eng Des Sel Date: 2018-05-01 Impact factor: 1.650

4. Packing of apolar side chains enables accurate design of highly stable membrane proteins.

Authors: Marco Mravic; Jessica L Thomaston; Maxwell Tucker; Paige E Solomon; Lijun Liu; William F DeGrado
Journal: Science Date: 2019-03-29 Impact factor: 47.728

5. Impact of Cholesterol Concentration and Lipid Phase on Structure and Fluctuation of Amyloid Precursor Protein.

Authors: George A Pantelopulos; Afra Panahi; John E Straub
Journal: J Phys Chem B Date: 2020-11-02 Impact factor: 2.991

Review 6. Regulation of the alternative β-secretase meprin β by ADAM-mediated shedding.

Authors: Franka Scharfenberg; Fred Armbrust; Liana Marengo; Claus Pietrzik; Christoph Becker-Pauly
Journal: Cell Mol Life Sci Date: 2019-06-14 Impact factor: 9.261

Review 7. Membrane receptor activation mechanisms and transmembrane peptide tools to elucidate them.

Authors: Justin M Westerfield; Francisco N Barrera
Journal: J Biol Chem Date: 2019-12-25 Impact factor: 5.157

8. Structure of APP-C99_1-99 and implications for role of extra-membrane domains in function and oligomerization.

Authors: George A Pantelopulos; John E Straub; D Thirumalai; Yuji Sugita
Journal: Biochim Biophys Acta Biomembr Date: 2018-04-24 Impact factor: 3.747

9. Folding and Misfolding of Human Membrane Proteins in Health and Disease: From Single Molecules to Cellular Proteostasis.

Authors: Justin T Marinko; Hui Huang; Wesley D Penn; John A Capra; Jonathan P Schlebach; Charles R Sanders
Journal: Chem Rev Date: 2019-01-04 Impact factor: 60.622

Review 10. De novo protein design, a retrospective.

Authors: Ivan V Korendovych; William F DeGrado
Journal: Q Rev Biophys Date: 2020-02-11 Impact factor: 5.318