Literature DB >> 31461441

A lipophilicity-based energy function for membrane-protein modelling and design.

Jonathan Yaacov Weinstein¹, Assaf Elazar¹, Sarel Jacob Fleishman¹.

Abstract

Membrane-protein design is an exciting and increasingly successful research area which has led to landmarks including the design of stable and accurate membrane-integral proteins based on coiled-coil motifs. Design of topologically more complex proteins, such as most receptors, channels, and transporters, however, demands an energy function that balances contributions from intra-protein contacts and protein-membrane interactions. Recent advances in water-soluble all-atom energy functions have increased the accuracy in structure-prediction benchmarks. The plasma membrane, however, imposes different physical constraints on protein solvation. To understand these constraints, we recently developed a high-throughput experimental screen, called dsTβL, and inferred apparent insertion energies for each amino acid at dozens of positions across the bacterial plasma membrane. Here, we express these profiles as lipophilicity energy terms in Rosetta and demonstrate that the new energy function outperforms previous ones in modelling and design benchmarks. Rosetta ab initio simulations starting from an extended chain recapitulate two-thirds of the experimentally determined structures of membrane-spanning homo-oligomers with <2.5Å root-mean-square deviation within the top-predicted five models (available online: http://tmhop.weizmann.ac.il). Furthermore, in two sequence-design benchmarks, the energy function improves discrimination of stabilizing point mutations and recapitulates natural membrane-protein sequences of known structure, thereby recommending this new energy function for membrane-protein modelling and design.

Entities: Chemical Disease Gene Species

Mesh：

Substances：

Year: 2019 PMID： 31461441 PMCID： PMC6736313 DOI： 10.1371/journal.pcbi.1007318

Source DB: PubMed Journal: PLoS Comput Biol ISSN： 1553-734X Impact factor: 4.475

Introduction

Membrane proteins have essential biological roles as receptors, channels, and transporters. Over the past decade, significant progress has been made in the design of membrane proteins, including the first design of membrane-integral inhibitors [1], a transporter [2,3], and a de novo designed structure based on coiled-coil motifs [4]. Despite this exciting progress, however, modelling, design, and engineering of membrane proteins lag far behind those of soluble proteins [5]. This lag is due, in part, to the relatively small number of high-resolution membrane-protein structures [6] and is exacerbated by these proteins’ typically large size. Clearly, however, the heterogeneity of a membrane protein’s environment, comprising water, lipid, and polar headgroups, is the most significant complication [7]. Therefore, modelling solvation is a fundamental problem that impacts all membrane-protein structure prediction and design. Current energy functions used in modelling and design incorporate simplified solvation models [8]. For instance, RosettaMP uses information inferred from water-to-cyclohexane partitioning [9] as a proxy for amino acid solvation in the plasma membrane [10,11]. Due to these simplifications, expert analysis has been a prerequisite for accurate membrane-protein modelling and design [12,13]. Automating modelling and design processes and extending them to complex membrane proteins will likely require an accurate energy function that correctly balances intra-protein interactions, membrane solvation and water solvation [14-16]. To understand the contributions to membrane-protein solvation, we recently established a high-throughput experimental screen, called deep sequencing TOXCAT-β-lactamase (dsTβL), which quantified apparent amino acid transfer energies from the cytosol to the E. coli plasma membrane [17]. From the resulting data, we inferred apparent position-specific insertion profiles for each amino acid relative to alanine, reconciling previously conflicting lines of evidence [18]. Foremost, the lipophilicity inferred for hydrophobic residues, such as Leu, Ile, and Phe, was greater than previously measured in some membrane mimics, including the water-to-cyclohexane transfer energies that are the basis for membrane solvation in Rosetta [9,11,16] (approximately 2 kcal/mol according to dsTβL compared to ½ kcal/mol), and in line with theoretical considerations [19,20]. Second, the profiles exhibited a strong 2 kcal/mol preference for Arg and Lys in the intracellular side of the plasma membrane compared to the extracellular side. While this preference, known as the “positive-inside” rule, was revealed based on sequence analysis 30 years ago [21-23], the dsTβL assay was the first to indicate a large energy gap favouring positively charged residues in the intracellular relative to the extracellular membrane leaflet. The accuracy and generality of the dsTβL apparent transfer energies were partly verified by demonstrating that they correctly predicted the locations and orientations of membrane spans directly from sequence even in several large and complex eukaryotic transporters [24]. Taken together, these results provided reassurance that the dsTβL apparent insertion energies correctly balanced essential aspects of membrane-protein solvation. As the next step towards accurate all-atom membrane-protein modelling and design, we develop a new lipophilicity-based energy term based on the dsTβL amino acid-specific insertion profiles and integrate this energy term in the Rosetta centroid-level and all-atom potentials. We furthermore develop a strategy to enhance conformational sampling of membrane-spanning helical segments and of helix-tilt angles observed in naturally occurring membrane proteins. Encouragingly, the new energy function outperforms previous ones in three benchmarks essential to modelling and design: atomistic ab initio structure prediction starting from completely extended chains of single-spanning membrane homo-oligomers of known structure, prediction of mutational effects on stability, and sequence recovery in combinatorial sequence design. An automated web server for structure prediction in transmembrane homooligomeric proteins (TMHOP) is available at (http://tmhop.weizmann.ac.il) and may enable modelling of the membrane-spanning domains of receptors. We conclude that the combination of lipophilicity and energetics developed for soluble proteins provides a basis for accurate structure prediction and design of membrane proteins.

Results

A lipophilicity-based membrane-protein energy function

The recent all-atom energy function in Rosetta, ref2015, is dominated by physics-based terms, including van der Waals packing, hydrogen bonding, electrostatics and water solvation [25]. This energy function was parameterized on a large set of crystallographic structures and experimental data of water-soluble proteins and was shown to outperform previous energy functions in several structure-prediction benchmarks. For membrane-protein modelling and design, however, the ref2015 solvation potential is relevant only to the water-embedded regions of the protein; a different potential is required to model the energetics of amino acids near and within different regions of the plasma membrane. Accordingly, we sought to replace the ref2015 solvation model with one that encodes a gradual transition from the default water-solvation that evaluates regions distant from the plasma membrane and the dsTβL insertion profiles near and within the plasma membrane. The dsTβL profiles were inferred from an experimental mutation analysis of a monomeric membrane span into which each of the 20 amino acids were individually introduced at each position [17]; the profiles were then normalized to express the apparent transfer energy for each amino acid at each position relative to a theoretical poly-Ala membrane span, yielding apparent ΔΔGAla—>mut at each position across the plasma membrane (Fig 1). As a first step to encoding these energy profiles in Rosetta, we smoothed these profiles and symmetrised them with respect to the presumed membrane midplane, except the profiles for Arg, His, and Lys, for which the “positive-inside” rule applies (S1 Fig).

Fig 1

The lipophilicity-based ref2015_memb energy function.

The lipophilicity-based ref2015_memb energy function.

Membrane-insertion profiles for six representative amino acids are shown. Raw dsTβL data (purple dots), ref2015 (dashed green line), the ref2015_memb potential (dashed blue line) and the dsTβL profiles (red line). Negative and positive membrane depths indicate the inner and outer membrane leaflets, respectively; the presumed membrane midplane is at 0. Next, we implemented an iterative strategy to encode the dsTβL energetics in a modified ref2015 all-atom energy function which we called ref2015_memb. To enable efficient conformational search as required in ab initio structure prediction and de novo design, we also encoded this energetics in the centroid-level energy function [26]. As a reference state in both all-atom and centroid-level modelling, we generated an ideal poly-Ala α helix and placed it perpendicular to the membrane plane. At each position along the helix (including the aqueous and membrane phases), we introduced each of the 19 point mutations, relaxed the models using the all-atom or centroid-level energy function, and computed the energy difference due to each single-point mutation ΔΔGAla—>mut. In the first iteration of these calculations, the unmodified ref2015 or centroid-level energy functions were used, resulting, as expected, in large deviations from the apparent energies observed in the dsTβL profiles (red lines in Fig 1). We then added a new context-dependent 1-body energy term, called MPResidueLipophilicity, which encoded the difference between the computed and dsTβL energies for each mutation at each position, ΔΔΔGAla—>mut. We iterated mutation, relaxation, energy calculations, and MPResidueLipophilicity updates for each of the mutations at each position up to ten times, noting that the computed energies converged with the trends observed in the experiment (blue and red lines in Fig 1, respectively). Scripts for calibrating the all-atom and centroid energy functions are available at github.com/Fleishman-Lab/membrane_protein_energy_function to enable adapting future improvements of the Rosetta energy functions to encode the dsTβL energetics. The dsTβL apparent energy profiles were inferred from a monomeric segment [17]. Consequently, the profiles express the lipophilicity of each amino acid relative to Ala across the membrane when that amino acid is maximally solvent-exposed. To account for amino acid burial in multispan or oligomeric membrane proteins, we derived a continuous, differentiable and easily computable weighting term that expresses the extent of a residue’s burial in other protein segments. For any amino acid, this weighting term is based on the number of heavy-atom neighbours within 6 and 12 Å distance of the amino acid’s Cβ atom (Eq 1) resulting in a weight that expresses the extent to which a residue is buried in other protein segments or exposed to solvent (0 to 1, respectively). Water-embedded and completely buried positions are treated with the ref2015 solvation energy; fully membrane-exposed positions are treated with the MPResidueLipophilicity energy, and positions of intermediate exposure are treated with a linearly weighted sum of the two terms. In summary, the actual contribution from solvation of an amino acid is a function of its exposure to the membrane and depends on the amino acid’s lipophilicity according to the dsTβL apparent energy and the position’s location relative to the membrane midplane. Note that this energy term averages lipophilicity contributions in the plasma membrane and does not express atomic contributions to solvation that are likely to be important in calculating membrane-protein energetics in different types of biological membranes [11,27], in non-helical membrane-exposed segments, or surrounding water-filled cavities [28]. The dsTβL assay reports on residue-specific insertion into the plasma membrane. Ab initio modelling and de novo design, however, also require a potential that addresses the protein backbone solvation. Although the low-dielectric environment in the core of the membrane enforces a strong tendency for forming canonical α helices [7], deviations from canonical α helicity can make important contributions to membrane-protein structure and function [29]. Therefore, we encoded an energy term, called MPHelicality, that allows sampling backbone dihedral angles and penalises deviations from α helicity (Eq 5). MPHelicality enforces strong constraints on the dihedral angles in the lipid-exposed surfaces at the core of the membrane and is attenuated in regions that are buried in other protein segments and in the extra-membrane environment (using the same weighting as for lipophilicity, Eq 1); this term thus allows significant deviations from α helicity only in buried or water-embedded regions. In preliminary ab initio calculations starting from a fully extended chain, we noticed that conformational sampling significantly favoured large helical tilt angles relative to the membrane normal (θ in Fig 2). By contrast, 50% of naturally observed membrane spans exhibit small tilt angles in the range 15–30°. The skew in conformational sampling towards large tilt angles is expected from previous theoretical investigations according to which the distribution of helix-tilt angles in random sampling is proportional to sin(θ), substantially preferring large angles compared to the distribution observed in natural membrane proteins [30]. To eliminate this skew in conformational sampling, we introduced another energy term, called MPSpanAngle (Eq 4 and Fig 2), that strongly penalised large tilt angles, guiding ab initio sampling to tilt angles observed in natural proteins.

Fig 2

Observed versus expected tilt angles in membrane-spanning helices relative to the membrane normal.

Observed versus expected tilt angles in membrane-spanning helices relative to the membrane normal.

The distribution of helix tilt angles (θ in the inset sphere) in natural membrane proteins shows a strong preference for small angles (red bars, left), whereas the distribution resulting from random conformational sampling is proportional to sin(θ) (blue bars) [30] significantly overrepresenting large tilt angles. The MPSpanAngle energy term (green line; Eq 4) penalises large tilt angles and focuses ab initio conformational sampling on tilt angles observed in membrane-protein structures. inset The expected distribution of helix-tilt angles is proportional to the circumference of a circle plotted by that helix around an axis perpendicular to the membrane-normal (panel adapted from ref. [30]). The membrane plane is depicted as a grey circle. In summary, ref2015_memb encodes three new energy terms relative to the soluble energy function ref2015: (1) a lipophilicity term based on amino acid type, membrane-depth, and burial; (2) a penalty on deviations from α helicity in backbone-dihedral angles; and (3) a penalty on the sampling of large tilt angles with respect to the membrane-normal (S1 Table). In the calculations reported below, the penalties on deviations from α helicity and helix-tilt angles are implemented in all centroid-level ab initio structure prediction simulations; all-atom calculations use the ref2015 energy modified with the lipophilicity term (ref2015_memb).

Ab initio structure prediction in membrane proteins

Previous structure-prediction benchmarks started from canonical α helices or from monomers obtained from experimental structures of homodimers and used the bound-structures in grid search or rigid-body docking [11,16,31-34]. Additionally, structure-prediction studies used experimental constraints, conservation analysis or correlated-mutation analysis to predict residue contacts in order to constrain conformational sampling [12,13,35-40]. Several automated predictors dedicated to single-span dimers used shape complementarity [41,42], sequence-packing motifs [43] or comparative modelling [44], but to the best of our knowledge, ab initio modelling calculations, starting from a fully extended chain, have not been described. Given that deviations from canonical α helicity make important contributions to membrane-protein structure and function [29], we decided to apply a more stringent test using ab initio modelling, sampling all symmetric backbone, sidechain, and rigid-body degrees of freedom. To test ab initio modelling using the new energy function, we applied the fold and dock protocol [45], which has been successfully applied in a variety of soluble-protein structure prediction and design studies [46-49]. Briefly, fold and dock starts from an extended chain and conducts several hundred iterations of symmetric centroid-level backbone-fragment insertion and relaxation moves. It then applies symmetric all-atom refinement including all dihedral sidechain and backbone degrees of freedom (S1 Movie). To generate an energy landscape, we ran 5,000 independent trajectories (50,000 for high-order oligomers) for every 19 and 21 residue subsequence of each homooligomer, filtered the resulting models according to energy and structure parameters (Methods), and isolated the lowest-energy 10% of the models. Models were then clustered according to their energies and conformations, and five cluster representatives were compared to the experimental structures (Figs 3 and 4, Table 1). For comparison, we applied the described methodology using ref2015_memb, ref2015 and the current membrane-protein energy function in Rosetta, RosettaMP [11].

Fig 3

Energy landscapes for the TMHOP ab initio structure prediction benchmark.

All models that passed the energy and structure-based filters are shown as semi-transparent grey dots. Each of the five lowest-energy clusters is indicated by coloured circles (lowest-to-highest energy: red, blue, green, purple and black). The PDB entry is indicated on each panel, and the oligomeric state is specified by grey circles for oligomeric states than homodimers. Y-axes report the ref2015_memb energy normalised by the monomeric sequence length of each model.

Fig 4

Structural comparison of the top-predicted model (by RMSD) produced by TMHOP and the experimentally determined structure (gold and grey, respectively).

PDB entry, RMSD and the model’s ranking (in energy) among the top-5 predicted models are indicated. Only accurately predicted structures (< 2.5 Å) are shown.

Table 1

TMHOP Structure prediction benchmark.

Grey cells indicate RMSD < 2.5Å or fraction of native contacts using ref2015_memb > 0.7.

			RMSD of nearest model structure (Å)
PDB code	# subunits¹	fraction of native contacts²	ref2015_memb	RosettaMP	ref2015	PREDDIMER	TMDIM
2J5D	2	0.92	0.95	1.04	1.06	8.37	2.44
2L6W	2	0.90	0.98	2.67	2.04	2.28	5.22
1AFO	2	0.87	1.02	2.87	1.13	1.99	0.84
2L2T	2	0.86	1.01	4.77	6.40	1.85	0.75
2MEU	2	0.80	1.17	2.81	4.65	2.71	4.05
2J7A	2	0.80	0.85	3.36	0.94	9.74	6.54
2M0B	2	0.72	1.12	1.57	1.55	3.22	1.78
2K9Y	2	0.58	2.39	1.82	1.59	1.89	3.88
2K1K	2	0.54	1.62	1.38	1.18	1.77	1.51
2LZ3	2	0.47	2.24	2.14	1.95	2.09	3.56
2MK9	2	0.41	2.30	2.15	2.56	6.88	2.95
2HAC	2	0.30	2.13	3.24	2.31	2.06	2.34
2JWA	2	0.17	2.63	1.36	NA	2.42	2.21
2LZL	2	0.00	4.72	4.54	NA	3.64	3.55
2L9U	2	0.00	5.22	4.15	NA	4.24	1.66
2L34	2	0.00	3.98	3.92	NA	1.12	4.84
2MIC	2	0.00	3.25	3.33	NA	8.70	5.57
3LBW	4	0.14	2.45	1.82	7.10	NA	NA
2KIX	4	0.10	3.43	4.06	NA	NA	NA
2KYV	5	0.22	2.29	1.82	1.46	NA	NA

1 oligomeric state (dimer, tetramer, or pentamer)

2 fraction of native contacts in the lowest RMSD model from ref2015_memb. A contact is defined as two positions across the interface that are closer than 8 Å. The fraction of all contacts in the native structure recapitulated by the model is reported.

Energy landscapes for the TMHOP ab initio structure prediction benchmark.

Structural comparison of the top-predicted model (by RMSD) produced by TMHOP and the experimentally determined structure (gold and grey, respectively).

PDB entry, RMSD and the model’s ranking (in energy) among the top-5 predicted models are indicated. Only accurately predicted structures (< 2.5 Å) are shown.

TMHOP Structure prediction benchmark.

Grey cells indicate RMSD < 2.5Å or fraction of native contacts using ref2015_memb > 0.7. 1 oligomeric state (dimer, tetramer, or pentamer) 2 fraction of native contacts in the lowest RMSD model from ref2015_memb. A contact is defined as two positions across the interface that are closer than 8 Å. The fraction of all contacts in the native structure recapitulated by the model is reported. The Protein Data Bank (PDB) contains 17 nonredundant (sequence identity <80%) NMR and X-ray crystallographic structures (adopted from Lomize et al. [44]) of natural single-span homodimers, two tetramers and one pentameric structure. Of the 20 cases in the benchmark, fold-and-dock simulations using ref2015_memb predicted near-native (<2.5 Å root-mean-square deviation [RMSD]) low-energy models for 14 homooligomers compared to nine using RosettaMP; of the 14 oligomers accurately predicted by ref2015_memb, the soluble energy function ref2015 also resulted in nine correct predictions. Prediction success rate using ref2015_memb was somewhat higher for right-handed relative to left-handed homodimers (80 and 50%, respectively; S2 Table), reflecting the tendency of right-handed homodimers to be more tightly packed [31], and in 11 cases, a near-native prediction was found among the top 3 lowest-energy predicted models (Fig 3). Of the three high-order oligomers tested, ref2015_memb successfully recapitulated the structures of the M2 tetramer and phospholamban pentamer. The PREDDIMER [42] and TMDIM [43] structure-prediction web servers, which do not use ab initio modelling, found models at <2.5 Å RMSD for nine and eight of the 17 homodimers, respectively. Thus, ab initio calculations using ref2015_memb accurately predict structures in two-thirds of the homooligomers in our benchmark, including high-order oligomers that cannot be predicted by other automated methods. Given the high success rate of the ab initio calculations, we developed a web-accessible server for predicting the structures of membrane-spanning homo-oligomers such as are observed in receptor tyrosine kinases and other membrane proteins (http://tmhop.weizmann.ac.il). The successfully predicted homooligomers exhibit different structural packing motifs. The majority of the homodimer interfaces are mediated by the ubiquitous Gly-xxx-Gly motif [50], in which two small amino acids separated by three positions on the primary sequence enable close packing between the helices. There is uncertainty whether these motifs additionally form stabilising Cα hydrogen bonds [51,52]. Our structure-prediction analysis cannot resolve this uncertainty; note, however, that the new energy function ref2015_memb does not encode terms for Cα hydrogen bonds and yet recapitulates a large fraction of the homodimer structures (Figs 3 and 4, and Table 1). The underlying reason for successful prediction is that the dsTβL energetics encodes a strong penalty on exposing Gly residues to the lipid bilayer (approximately 2 kcal/mol/Gly at the membrane mid-plane; Fig 1), driving the burial of Gly amino acids within the homodimer interface (i.e., “solvophobicity”). Thus, lipophilicity and interfacial residue packing are sufficient for accurate structure prediction in a large fraction of the targets we examined. In several single-spanning membrane receptors, conformational change in the membrane domain is thought to underlie receptor activation. For instance, past modelling of the ErbB2 membrane domain suggested two non-overlapping interaction sites involving two small-xxx-small motifs within the membrane domain and a molecular switching mechanism that underlies receptor activation [54]. The only experimental structure for ErbB2 involves the N-terminal small-xxx-small motif [55], which is recapitulated by the second predicted cluster (Fig 5A), whereas in the fourth predicted cluster, dimerisation is mediated via the C-terminal motif (Fig 5B), suggesting that in some cases, TMHOP may provide structural hypotheses for alternative binding modes for receptor homooligomeric domains.

Fig 5

Low-energy predicted models of ErbB2 recapitulate two binding modes thought to underlie receptor activation.

Low-energy predicted models of ErbB2 recapitulate two binding modes thought to underlie receptor activation.

(A) The ErbB2 experimental structure (PDB entry 2JWA, grey) and a representative of the second-best cluster (orange). (B) The fourth-best cluster (orange), RMSD 5.8 Å from 2JWA, exhibits a right-handed conformation in which Gly68 and Gly72 are buried, in qualitative agreement with experiments [53] and modelling [54]. Using the dsTβL assay, we also examined the effects of dozens of point mutations in glycophorin A on apparent association energy (ΔΔG) in the bacterial plasma membrane [17]. As a stringent test of the new energy function, we conducted fold-and-dock calculations using both ref2015_memb and RosettaMP starting from the sequences of each of the point mutants. To reduce uncertainty in interpreting the experimental results, we focused on 32 mutations that exhibited large apparent energy changes in the experiment (|ΔΔG| ≥ 2 kcal/mol) and compared the median computed ΔΔG of the lowest-energy models to the experimental observation (Fig 6, S3 Table). ref2015_memb outperformed RosettaMP, correctly assigning 81% of mutations as stabilizing or destabilizing compared to 66% for RosettaMP. The six false-positive predictions using ref2015_memb are due to mutations at position Gly86, which is exposed to the membrane, explaining why our simulations predict these mutations to be neutral or favourable. Note that as observed in studies of mutational effects on stability in soluble proteins, the correlation coefficient between computed and observed values is low (Pearson r2 = 0.21 and 0.02 for ref2015_memb and RosettaMP, respectively) [56-59]. Such low correlation coefficients provide an impetus for improving the energy function; however, as we previously demonstrated, discriminating stabilizing from destabilizing mutations is sufficient to enable the design of accurate, stable, and functionally efficient proteins [59-64].

Fig 6

Predicted versus experimental ΔΔG values of single-point mutations in glycophorin A.

Predicted versus experimental ΔΔG values of single-point mutations in glycophorin A.

The structure of every point mutant was predicted ab initio, and the median ΔΔG relative to the wild type sequence is reported. Only point mutations that exhibited |ΔΔG| ≥ 2 kcal/mol in the experiment were analysed. TP, TN, FP, and FN—true positive, true negative, false positive, and false negative, respectively. We next tested sequence-recovery rates using combinatorial sequence optimisation based on ref2015, ref2015_memb, and RosettaMP in a benchmark of 20 non-redundant structures (<80% sequence identity) ranging in size from 124–765 amino acids [65]. ref2015_memb outperformed the other energy functions, exhibiting 83% sequence recovery, on average, when each design was compared to the target’s natural homologs (Table 2). To our surprise, the soluble energy function ref2015 outperformed RosettaMP in this test and was almost as successful as ref2015_memb (78% overall success), implying that the packing and electrostatic models of ref2015 [25] enabled at least some of the improvement observed in sequence recovery by ref2015_memb (see S1 Table for a comparison of the energy functions). High sequence recovery in both buried and exposed positions implies that ref2015_memb may be applied effectively to design large and complex membrane proteins.

Table 2

Sequence recovery rates in Rosetta combinatorial sequence optimisation.

	Sequence recovery¹			Homology recovery²
	buried	exposed	all	buried	exposed	all
ref2015_memb	0.52	0.32	0.42	0.86	0.81	0.83
ref2015	0.53	0.33	0.43	0.85	0.71	0.78
RosettaMP	0.23	0.20	0.21	0.64	0.70	0.67

1 Only exact matches to the natural protein sequence are counted as recovered

2 For each target protein, a position-specific scoring matrix (PSSM) was computed from a multiple-sequence alignment. At each position, recovery was considered if the amino acid identity had a PSSM score ≥ 0.

1 Only exact matches to the natural protein sequence are counted as recovered 2 For each target protein, a position-specific scoring matrix (PSSM) was computed from a multiple-sequence alignment. At each position, recovery was considered if the amino acid identity had a PSSM score ≥ 0.

Discussion

An accurate energy function is a prerequisite for automated modelling and design, and solvation makes a critical contribution to protein structure and function. The recent dsTβL apparent energies of insertion into the plasma membrane [17] enabled us to derive an empirical lipophilicity-based energy function for Rosetta. The results demonstrate that ref2015_memb outperforms RosettaMP in three benchmarks that are important for structure prediction and design. As ref2015_memb is based on the current state-of-the-art water-soluble Rosetta energy function, prediction accuracy is high for ref2015_memb both in soluble regions and in the core of the membrane domain. Thus, the lipophilicity preferences inferred from the dsTβL energetics together with the residue packing calculations in Rosetta enable accurate modelling in several ab initio prediction cases. The current energy function and the fold and dock procedure accurately model homooligomeric interactions in the membrane and the effects of point mutations, suggesting that they may enable the accurate design of homooligomeric single-span receptor-like transmembrane domains. The high accuracy models generated by the TMHOP method also suggest that laborious and often failed experiments to determine the structures of homooligomeric receptor membrane domains may be circumvented through ab initio modelling. Nevertheless, certain important attributes of membrane-protein energetics are not yet addressed by ref2015_memb; for instance, atomic-level solvation and the impact on electrostatic interactions due to changes in the dielectric constant in various parts of the membrane are currently not treated [16,28] and warrant further research. Furthermore, the dsTβL profiles are based on measurements conducted on α-helical proteins in E. coli inner membranes. Ref2015_memb may, therefore, not perform as well in outer-membrane proteins or in proteins residing in membranes with a substantially different lipid composition. The benchmark reported here provides a basis on which improvements in the energy function can be verified. Furthermore, structure prediction in heterooligomers is important for understanding receptor cross-activation and for the design of membrane inhibitors [1]. In preliminary calculations, however, we found that fold and dock simulations of heterooligomeric systems fail to converge due to the much larger conformation space open to a non-symmetric system. A potentially exciting extension of the current work is to use the information on preferred crossing angles between membrane helices to constrain conformational sampling in heterooligomers [66,67]. We recently showed that evolution-guided atomistic design calculations, which use phylogenetic analysis to guide the design processes [68], enabled the automated, accurate and effective design of large and topologically complex soluble proteins. Designed proteins exhibited atomic accuracy, high expression levels, stability [59,60,69], binding affinity, specificity [64], and catalytic efficiency [62,63]. Membrane proteins are typically large and challenging targets for conventional protein-engineering and design methods. Looking ahead, we anticipate that evolution-guided atomistic design using ref2015_memb may enable reliable design in this important but often formidable class of proteins.

Methods

Rosetta source code

All code is available in the Rosetta release at www.rosettacommons.org (git version: b210d6d5a0c21208f4f874f62b2909f926379c0f). Command lines and RosettaScripts [70] are available in the supplement.

Membrane-insertion profiles

The original dsTβL insertion profiles [17] were modified to generate smooth and symmetric functions [24]. The polar and charged residues Asp, Glu, Gln and Asn, which exhibited few counts in the deep sequencing analysis, were averaged such that the insertion energy at the membrane core (-10 to 10 Å; negative values correspond to the inner membrane leaflet and positive values to the outer leaflet) was applied uniformly to the entire membrane span. The profile for His was capped at the maximal value observed in the experiment (2.3 kcal/mol) between 0Å (membrane midplane) and 20 Å. The dsTβL profile for Cys is unusually asymmetric. Cys residues are rare in membrane proteins [71] and are likely to have similar polarity to Ser. We, therefore, applied the profile measured for Ser to Cys. To convert the values from the dsTβL insertion profiles to Rosetta energy units (R.e.u.) they were multiplied by 2.94 following the interpolation reported in ref. [25]. The dsTβL profiles spanned 27 positions, and we correspondingly translated them to span -20 to +20 Å relative to the membrane midplane.

Residue lipophilicity

The context-dependent, one-body energy term MPResidueLipophilicity was implemented to encode the dsTβL insertion profiles in ref2015. Starting from an ideal poly Ala α helix embedded perpendicular to a virtual membrane, every position was mutated to all 19 identities, relaxed, and the energy difference between the ref2015 energy and the dsTβL energy was implemented in MPResidueLipophilicity. This process was repeated ten times to reach convergence, and the resulting energy profiles were fitted by a cubic spline [72], generating continuous, differentiable functions for all 19 amino acids relative to Ala, which was assumed to be 0 throughout the membrane. The splines were recorded in the Rosetta database and are loaded at runtime. Insertion -profile adjustments were done using a python3 script available at github.com/Fleishman-Lab/membrane_protein_energy_function.

Residue burial

The number of protein atoms within 6 and 12 Å of each amino acid’s Cβ atom is computed and transformed to a burial score (Eq 1). We used sigmoid functions which range from 0 to 1, corresponding to completely buried and completely lipid-exposed, respectively. Where N is the number of heavy atoms and S and O determine the slope and offset of the sigmoids and are different for all-atom and centroid calculations. Each parameter has different thresholds at 6 or 12 Å. For all-atom calculations, S = 0.15 and 0.5 and O = 20 and 475, for 6 and 12 Å radii, respectively. For centroid-level calculations, S = 0.15 and 5 and O = 20 and 220 for 6 and 12Å radii, respectively. For each amino acid, the product of the 6 and 12Å sigmoid functions is taken, producing a continuous, differentiable function that transitions from buried to exposed states. These parameters were determined by visualising the burial scores of all amino acids in several polytopic membrane proteins of known structure.

Tilt-angle (Θ; Fig 2) penalty

All membrane-spanning helices reported in the PDBTM [73] dataset (version 20170210) were analyzed for their tilt angles with respect to the membrane normal. A second-degree polynomial was fitted to this distribution using scikit-learn [74]. As Bowie noted, the expected distribution function of helix-tilt angles is sin(Θ) [30]. We, therefore, used a partition function to convert the expected distribution (sin(Θ)) and observed one (Eq 2) to energy functions, finally subtracting the expected energy from the observed one to derive the helix-tilt penalty function: Where θ is given in degrees. In order to simplify runtime calculations, we approximated Eq 3 using a third-degree polynomial (using scikit-learn) (Fig 2).

Penalizing deviations from ideal α helicity

The MPHelicality energy term penalizes the energy of every position that exhibits ϕ-ψ torsion angles significantly different from ideal α helices. A paraboloid function was manually calibrated to express a penalty for any given (ϕ, ψ). The paraboloid centre, for which the penalty is 0, was set to the centre of the helical region according to the Ramachandran plot (ϕ = 60°, ψ = 45°) [75]. The paraboloid curvature was set to 25, such that the penalty is low throughout the ϕ-ψ torsion angles space observed for α helices [75]. As segments buried against the protein should not be penalized to the same extent as those completely exposed to the membrane, the burial approximation of Eq 1 is used to weight MPHelicality. Moreover, as the protein extends outside of the membrane, the penalty is attenuated with a function that follows the trend observed for the hydrophobic residues, Leu, Ile, and Phe (see Fig 1A). In effect, the MPHelicality term favours α helicity in lipid-exposed surfaces in the core of the membrane, thereby enforcing some of the electrostatic and solvophobic effects that are essential for correctly modelling the backbone but are not expressed in the residue-specific dsTβL energy profiles. Where ϕ and ψ are given in degrees, z is the distance from the membrane midplane of residue i, and burial is calculated as in Eq 1.

A benchmark for structure prediction of single-span homooligomers

17 structures of single-span homodimers, two homotetramers and one pentamer were selected from the PDB (S2 Table). For each structure, a 20–30 residue segment comprising the membrane-spanning domain was manually chosen. A sliding window then extracted all 19 or 21 residue subsequences. For each subsequence, three and nine residue backbone fragments were generated using the Rosetta fragment picker application [76]. The fold-and-dock protocol [45] was used to compute 5000 models (50,000 models for tetramers and the phospholamban pentamer), and the lowest-energy 10% of the models were subsequently filtered using structure and energy-based filters (solvent accessible surface area >500 Å; shape complementarity [77] Sc>0.5; ΔΔGbinding <-5 R.e.u.; rotameric binding strain [78] < 4 R.e.u.; helicality <0.1 R.e.u. (computed using Eq 5); and closest distance between the interacting helices < 9 Å, as calculated by the filter HelixHelixAngle). For each target, the filtered models from all subsequences were then pooled together and clustered using a score-wise clustering algorithm. This is an iterative process, where each iteration calculates the RMSD of all unclustered models to the best-energy model, and removes the ones closer than 4 Å. RMSD to NMR structures were calculated with respect to the first model in the PDB entry.

A benchmark for ΔΔG predictions of single-spanning homodimers

Glycophorin A mutants that exhibited |ΔΔG| > 2 kcal/mol according to the dsTβL study [17] were modelled using the same fold-and-dock protocol described for the structure prediction of homodimers. To reduce computational load, we used a single sequence (73-ITLIIFGVMAGVIGTILLI-91), and the median of computed ΔΔG for the top models was reported (models were filtered using structure and energy based filters; solvent accessible surface area > 600 Å, packing > 0.4, shape complementarity > 0.5, ΔΔGbinding < -10 R.e.u., binding strain < 4 R.e.u. MPHelicality < 0.1, minimal atomic distance between helices < 4.5 Å and minimal distance between helix vectors < 8 Å. Of these models, only the top 10% scoring models were used).

Sequence-recapitulation benchmark

20 structures of polytopic membrane-spanning proteins were taken from ref. [65], 11 of which were symmetric complexes. All were refined (eliminating sidechain conformation information before refinement), and for each protein, 100 designs were computed using combinatorial sequence design followed by sidechain and backbone minimization, and the lowest-energy 10 designs were checked for the fraction of mutations relative to the target protein. For each target protein, a multiple-sequence alignment was prepared: homologous sequences were automatically collected using BLASTP [79] on the nonredundant sequence database [80] with a maximal number of targets set to 3,000 and an e-value ≤ 10−4. All sequences were clustered using CD-hit [81] with a 90% sequence identity threshold. Sequences were then aligned using MUSCLE [82] with default parameters. A position-specific scoring matrix (PSSM) was calculated using PSI-BLAST [83]. In the sequence-recovery benchmark, where homologous sequences are considered, the substitution of a given position to an identity with a PSSM score ≥ 0 is considered a match.

Comparison of energy term weights.

ref2015_memb is identical to the ref2015 energy function, except for the addition of the mp_ terms, whereas RosettaMP is based on score12[10]. For a detailed explanation on energy terms see ref [25,26,84]. 1 used only in centroid-level sampling. 2 used in centroid-level sampling and full-atom sampling of single spanning proteins. (PDF) Click here for additional data file.

Structures in the prediction benchmark.

1small-xxx-small in sequence. (PDF) Click here for additional data file.

Raw data on the ΔΔG prediction data.

(PDF) Click here for additional data file.

Structures for the sequence recovery benchmark.

(PDF) Click here for additional data file.

A representative fold and dock simulation for glycophorin A.

In the first 25 seconds of the movie, simulations are in centroid mode, followed by all-atom refinement. (MP4) Click here for additional data file.

Adjustment and calibration of insertion profiles.

Each panel shows membrane insertion profiles for a different amino acid. Raw dsTβL data (purple dots, right-hand Y-axis), dsTβL adjusted profiles (red line), ref2015 insertion profiles (green) and ref2015_memb profiles (blue dashed line). Different residues affect the α helix differently, and therefore have different baselines. Note that Pro has no profile under ref2015_memb due to its effect on the backbone. (TIF) Click here for additional data file.

RosettaScripts XMLs and command lines for all benchmarks.

(PDF) Click here for additional data file.

Top models from structure prediction benchmark.

(GZ) Click here for additional data file.

81 in total

1. Multipass membrane protein structure prediction using Rosetta.

Authors: Vladimir Yarov-Yarovoy; Jack Schonbrun; David Baker
Journal: Proteins Date: 2006-03-01

2. E(z), a depth-dependent potential for assessing the energies of insertion of amino acid side-chains into membranes: derivation and applications to determining the orientation of transmembrane and interfacial helices.

Authors: Alessandro Senes; Deborah C Chadi; Peter B Law; Robin F S Walters; Vikas Nanda; William F Degrado
Journal: J Mol Biol Date: 2006-09-12 Impact factor: 5.469

3. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences.

Authors: Weizhong Li; Adam Godzik
Journal: Bioinformatics Date: 2006-05-26 Impact factor: 6.937

4. Helix-packing motifs in membrane proteins.

Authors: R F S Walters; W F DeGrado
Journal: Proc Natl Acad Sci U S A Date: 2006-09-05 Impact factor: 11.205

Review 5. Has the code for protein translocation been broken?

Authors: Dalit Shental-Bechor; Sarel J Fleishman; Nir Ben-Tal
Journal: Trends Biochem Sci Date: 2006-03-10 Impact factor: 13.807

6. Computational design of peptides that target transmembrane helices.

Authors: Hang Yin; Joanna S Slusky; Bryan W Berger; Robin S Walters; Gaston Vilaire; Rustem I Litvinov; James D Lear; Gregory A Caputo; Joel S Bennett; William F DeGrado
Journal: Science Date: 2007-03-30 Impact factor: 47.728

7. Spatial structure of the dimeric transmembrane domain of the growth factor receptor ErbB2 presumably corresponding to the receptor active state.

Authors: Eduard V Bocharov; Konstantin S Mineev; Pavel E Volynsky; Yaroslav S Ermolyuk; Elena N Tkach; Alexander G Sobol; Vladimir V Chupin; Michail P Kirpichnikov; Roman G Efremov; Alexander S Arseniev
Journal: J Biol Chem Date: 2008-01-04 Impact factor: 5.157

8. Generalized fragment picking in Rosetta: design, protocols and applications.

Authors: Dominik Gront; Daniel W Kulp; Robert M Vernon; Charlie E M Strauss; David Baker
Journal: PLoS One Date: 2011-08-24 Impact factor: 3.240

9. The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design.

Authors: Rebecca F Alford; Andrew Leaver-Fay; Jeliazko R Jeliazkov; Matthew J O'Meara; Frank P DiMaio; Hahnbeom Park; Maxim V Shapovalov; P Douglas Renfrew; Vikram K Mulligan; Kalli Kappel; Jason W Labonte; Michael S Pacella; Richard Bonneau; Philip Bradley; Roland L Dunbrack; Rhiju Das; David Baker; Brian Kuhlman; Tanja Kortemme; Jeffrey J Gray
Journal: J Chem Theory Comput Date: 2017-05-12 Impact factor: 6.006

10. Mutational scanning reveals the determinants of protein insertion and association energetics in the plasma membrane.

Authors: Assaf Elazar; Jonathan Weinstein; Ido Biran; Yearit Fridman; Eitan Bibi; Sarel Jacob Fleishman
Journal: Elife Date: 2016-01-29 Impact factor: 8.140

10 in total

1. Protein Structure Prediction and Design in a Biologically Realistic Implicit Membrane.

Authors: Rebecca F Alford; Patrick J Fleming; Karen G Fleming; Jeffrey J Gray
Journal: Biophys J Date: 2020-03-14 Impact factor: 4.033

2. Structure and receptor recognition by the Lassa virus spike complex.

Authors: Michael Katz; Jonathan Weinstein; Maayan Eilon-Ashkenazy; Katrin Gehring; Hadas Cohen-Dvashi; Nadav Elad; Sarel J Fleishman; Ron Diskin
Journal: Nature Date: 2022-02-16 Impact factor: 69.504

3. In Silico Screening of Bioactive Compounds of Representative Seaweeds to Inhibit SARS-CoV-2 ACE2-Bound Omicron B.1.1.529 Spike Protein Trimer.

Authors: Muruganantham Bharathi; Bhagavathi Sundaram Sivamaruthi; Periyanaina Kesika; Subramanian Thangaleela; Chaiyavat Chaiyasut
Journal: Mar Drugs Date: 2022-02-17 Impact factor: 5.118

4. Local Bilayer Hydrophobicity Modulates Membrane Protein Stability.

Authors: Dagan C Marx; Karen G Fleming
Journal: J Am Chem Soc Date: 2021-01-07 Impact factor: 15.419

5. Modeling Immunity with Rosetta: Methods for Antibody and Antigen Design.

Authors: Clara T Schoeder; Samuel Schmitz; Jared Adolf-Bryfogle; Alexander M Sevy; Jessica A Finn; Marion F Sauer; Nina G Bozhanova; Benjamin K Mueller; Amandeep K Sangha; Jaume Bonet; Jonathan H Sheehan; Georg Kuenze; Brennica Marlow; Shannon T Smith; Hope Woods; Brian J Bender; Cristina E Martina; Diego Del Alamo; Pranav Kodali; Alican Gulsevin; William R Schief; Bruno E Correia; James E Crowe; Jens Meiler; Rocco Moretti
Journal: Biochemistry Date: 2021-03-11 Impact factor: 3.162

Review 6. Membrane proteins enter the fold.

Authors: Dagan C Marx; Karen G Fleming
Journal: Curr Opin Struct Biol Date: 2021-05-08 Impact factor: 7.786

7. Diverse Scientific Benchmarks for Implicit Membrane Energy Functions.

Authors: Rebecca F Alford; Rituparna Samanta; Jeffrey J Gray
Journal: J Chem Theory Comput Date: 2021-07-26 Impact factor: 6.578

8. Computational structure prediction provides a plausible mechanism for electron transfer by the outer membrane protein Cyc2 from Acidithiobacillus ferrooxidans.

Authors: Virginia Jiang; Sagar D Khare; Scott Banta
Journal: Protein Sci Date: 2021-05-25 Impact factor: 6.993

9. Prediction of amphipathic helix-membrane interactions with Rosetta.

Authors: Alican Gulsevin; Jens Meiler
Journal: PLoS Comput Biol Date: 2021-03-17 Impact factor: 4.475

10. De novo-designed transmembrane domains tune engineered receptor functions.

Authors: Assaf Elazar; Nicholas J Chandler; Ashleigh S Davey; Jonathan Y Weinstein; Julie V Nguyen; Raphael Trenker; Ryan S Cross; Misty R Jenkins; Melissa J Call; Matthew E Call; Sarel J Fleishman
Journal: Elife Date: 2022-05-04 Impact factor: 8.713

10 in total