Highly Coarse-Grained Representations of Transmembrane Proteins.

Jesper J Madsen¹, Anton V Sinitskiy¹, Jianing Li¹, Gregory A Voth¹.

Abstract

Numerous biomolecules and biomolecular complexes, including transmembrane proteins (TMPs), are symmetric or at least have approximate symmetries. Highly coarse-grained models of such biomolecules, aiming at capturing the essential structural and dynamical properties on resolution levels coarser than the residue scale, must preserve the underlying symmetry. However, making these models obey the correct physics is in general not straightforward, especially at the highly coarse-grained resolution where multiple (∼3-30 in the current study) amino acid residues are represented by a single coarse-grained site. In this paper, we propose a simple and fast method of coarse-graining TMPs obeying this condition. The procedure involves partitioning transmembrane domains into contiguous segments of equal length along the primary sequence. For the coarsest (lowest-resolution) mappings, it turns out to be most important to satisfy the symmetry in a coarse-grained model. As the resolution is increased to capture more detail, however, it becomes gradually more important to match modular repeats in the secondary structure (such as helix-loop repeats) instead. A set of eight TMPs of various complexity, functionality, structural topology, and internal symmetry, representing different classes of TMPs (ion channels, transporters, receptors, adhesion, and invasion proteins), has been examined. The present approach can be generalized to other systems possessing exact or approximate symmetry, allowing for reliable and fast creation of multiscale, highly coarse-grained mappings of large biomolecular assemblies.

Substances:
Membrane Proteins

Year: 2017 PMID: 28043122 PMCID: PMC5312841 DOI: 10.1021/acs.jctc.6b01076

Source DB: PubMed Journal: J Chem Theory Comput ISSN: 1549-9618 Impact factor: 6.006

Introduction

The symmetry of biomolecules originating from gene duplication and consolidated by evolution,[1−3] while often only approximate, is intimately linked to functionality.[4] For transmembrane proteins (TMPs) in particular, symmetry is one of the common properties shared in their functional states,[5,6] and it has been related to dynamics,[7] fast folding kinetics,[8] high stability,[2] and allosteric regulation.[9−11] In addition, engineering of proteins with internal symmetry has become an emerging field with a growing body of reported success.[12−14] TMPs, such as G protein-coupled receptors and ion channels, are crucial targets in drug discovery due to their physiological roles as direct receptors for drug-like solutes;[15,16] for receptors of neurotransmitters, for instance,[17−19] an indirect and less specific mechanism has also been suggested, whereby solutes absorbed into the lipid bilayer[20−24] can affect the receptor. With around half of current drug targets being TMPs[25] and most of those drugs targeting only a few members,[26] it is hardly surprising that the study of TMPs has become an active field of research with an increasing amount of experimental and computational effort directed at potential pharmaceutical applications. Despite advances in both hardware and software for atomistic molecular simulations,[27−30] there is still a large gap between the duration of all-atom molecular dynamics (MD) trajectories produced on a routine basis (typically, microseconds) and time scales of biologically relevant events observed in experiments involving TMPs (usually milliseconds to seconds[31,32]). One fruitful strategy to overcome this gap and better bridge experiments and simulations is to apply a coarse-graining approach.
The structure-based coarse-grained (CG) representation encompasses a reduced level of detail of the system, as atoms are grouped into “effective” particles also termed CG sites, and many of the biofunctionally irrelevant degrees of freedom are integrated out. One level of coarse-graining is the “high resolution” level in which each amino acid is represented by several CG sites or “beads”. Another level of coarse-graining is the “low resolution” highly CG level, where each CG site or bead represents some number of amino acids (e.g., tens or more). This paper concerns the latter limit of CG models. A variety of modern coarse-graining approaches have been developed to define highly CG protein models, including essential dynamics coarse-graining,[33−36] topology representing network,[37] and rigid unit recognition.[38] At the highly coarse-grained level, constructing CG models that satisfy the correct underlying physics is by no means a trivial task, and often the resulting models are neither unique nor transferable.[39,40] To simulate TMPs at the very large spatial and temporal scales relevant for most biological processes, it is both useful and necessary to resort to models of the lowest resolution, such as ultracoarse-grained (UCG) models,[41,42] where one CG site represents many amino acid residues and may also have internal “states” to represent the various conformations, chemical forms, etc., of the amino acids eliminated from each CG site. The UCG methodology, often motivated in the context of modeling of the actin filament,[41] has only recently been applied to other families of proteins[43] but not yet to TMPs.
This work therefore describes our most recent efforts to construct highly CG models for TMPs based on the essential dynamics coarse-graining (ED-CG) method.[33−36] The ED-CG method (or a similar approach[44]) is a systematic variational way of creating CG models that capture the most essential functional motions of biomolecules by a direct mapping of their atomistic motions. In this context, the essential dynamics[45] from the atomistic simulations is used as a proxy for the functionally relevant motions. The ED-CG method determines the assignment of atoms to CG sites (the CG mapping) such that the essential dynamics subspace is best preserved at a given resolution.[33] The ED-CG method has been applied to a variety of globular proteins and protein complexes, including a ribosome,[46] actin filaments,[47] and a hydrogenase.[48] However, two limitations of the ED-CG method should be taken into consideration. First, the ED-CG approach does not by itself automatically determine the optimal resolution level of a CG model. The total number of CG sites is an externally set parameter by the user. (We note that this issue has been partially resolved in our previous work where we developed a set of criteria to choose optimal numbers of CG sites in different parts of a large biomolecular complex in a self-consistent way.[35]) Second, there is no guarantee that the ED-CG technique will create the same CG model for a protein in two or more discrete functional states. A previous study from our group shows that the ED-CG models of globular proteins like G-actin only share 60–80% of similarity between the ATP- and ADP-bound states.[47] This creates a difficulty in using a CG representation, especially when it is desirable to study effects of transitions between distinct topological conformations. In this work, we have focused on addressing these issues for an important class of proteins, namely, TMPs. 
As pointed out in prior work,[41] it is important to understand the biomolecular features and peculiarities of the systems in order to construct meaningful CG models. It is generally appreciated in the field of coarse-graining that even small “inadequacies” in the CG mapping can manifest as damage beyond repair when the usual pairwise interaction potentials are used; two-site methanol is a classic example of a problematic CG mapping for a molecular liquid.[49,50] For TMPs, the membrane environment imposes particular constraints onto the structure and dynamics of the transmembrane domains inserted into the lipid bilayer[51] and differentiates them from extra- or intracellular domains of TMPs or their soluble counterparts. Such constraints give rise to many intriguing structural and dynamic properties of TMPs to account for their functions, such as symmetry. Although TMPs often exist in multimeric symmetric complexes of several repeating subunits with similar tertiary structures (even though the primary sequences of these subunits may be diverse),[6] they also frequently possess approximate internal symmetry. This work is primarily focused on TMPs with approximate internal symmetry, but the findings have the potential to be extended to cases with generalized symmetry; a comparison is made between CG models built using ED-CG methods and ones built on a simple and intuitive heuristic that exploits the molecular symmetry. It is shown that, by exploiting symmetry, we are able to construct CG mappings of TMPs for highly CG simulations consistent with the mappings resulting from the systematic “bottom-up” ED-CG method without the need for fine-grained MD trajectories and complex numerical optimization schemes.

Models, Theory, and Methods

In principle, the ED-CG method could be applied to all the atoms in a given protein structure. However, as a matter of practice, we use a residue-based strategy instead, wherein the position of each residue is represented solely by its Cα atom. Given a protein of N_res amino acid residues and N_CG CG sites to assign (N_res ≫ N_CG), we can calculate the ED-CG variational residual χ2 and use it as a measure of the accuracy of a CG mapping to an underlying atomistic MD trajectory with n_t frames. As defined in prior work,[33] the residual is given by

$$\chi^2 = \frac{1}{3N_{\mathrm{CG}}}\sum_{I=1}^{N_{\mathrm{CG}}}\frac{1}{n_t}\sum_{t=1}^{n_t}\sum_{i\in I}\sum_{j\ge i\in I}\left|\Delta\mathbf{r}_i^{\mathrm{ED}}(t)-\Delta\mathbf{r}_j^{\mathrm{ED}}(t)\right|^2$$

where Δr_i^ED(t) is the fluctuation of the Cα atom of residue i in the essential subspace at time t, calculated from principal component analysis[52] of the atomistic MD simulation. If the Cα atom of another residue j exhibits motion (in the essential subspace) similar to that of the Cα atom of residue i, then it is reasonable to assign residues i and j to the same CG site I. This idea is mirrored in the definition of χ2 (a “cost function”) by summing terms of fluctuation differences |Δr_i^ED(t) – Δr_j^ED(t)|2 over pairs of atoms belonging to the same CG site. In this scheme, the ED-CG method samples a variety of possible ways to group atoms/residues and selects the one with the minimum residual χ2 as the optimal CG model.[35]
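As an illustration of how such a residual might be evaluated for one candidate mapping, the following NumPy sketch sums squared fluctuation differences over residue pairs within each CG site and averages over frames. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `edcg_residual`, the array layout, and the 1/(3 N_CG) normalization (taken here to follow the definition in ref 33) are assumptions, and the input fluctuations are assumed to have already been projected onto the essential subspace.

```python
import numpy as np

def edcg_residual(fluct, site_of_residue):
    """Residual chi^2 of one CG mapping (illustrative sketch).

    fluct           : array (n_frames, n_res, 3); C-alpha fluctuations, assumed
                      to be already projected onto the essential subspace.
    site_of_residue : length-n_res sequence of CG site labels, one per residue.
    """
    fluct = np.asarray(fluct, dtype=float)
    labels = np.asarray(site_of_residue)
    sites = np.unique(labels)
    total = 0.0
    for s in sites:
        x = fluct[:, labels == s, :]              # (n_frames, m, 3)
        m = x.shape[1]
        # Identity: sum_{i<j} |x_i - x_j|^2 = m * sum_i |x_i|^2 - |sum_i x_i|^2
        sum_sq = (x ** 2).sum(axis=(1, 2))        # per frame
        sq_sum = (x.sum(axis=1) ** 2).sum(axis=1) # per frame
        total += (m * sum_sq - sq_sum).mean()     # average over frames
    # Normalization by 3 * N_CG is an assumption (see lead-in).
    return total / (3.0 * len(sites))
```

A mapping that places every residue in its own site gives a residual of zero by construction, so the quantity is only informative when compared between mappings with the same number of CG sites.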

Sequence-Based and Space-Based ED-CG Methods

The ED-CG approach comes in two main variations, namely, sequence-based[33] and space-based[36] ED-CG. Both methods group the atoms into CG sites based on minimizing intrasite correlated fluctuations, but the two variants apply different rules in sampling to locate the global minimum of χ2. The sequence-based ED-CG method divides the primary sequence of the protein into contiguous CG domains, while the space-based ED-CG method favors CG site definitions with atoms/residues close in three-dimensional space. Given the contiguous sequence constraint, the sequence-based ED-CG method is less demanding in sampling, but it does not permit nonadjacent domains in the same CG site, even if they are correlated in fluctuation but separated in the sequence (for example, in the case of a disulfide bond). Because of the much greater number of CG mappings allowed by space-based ED-CG, a brute-force search for the global minimum of χ2 would require looking through an exponentially greater number of combinations in comparison to sequence-based ED-CG. The use of simulated annealing and steepest descent techniques significantly decreases the number of combinations to be considered.[33] Nevertheless, the amount of computation required to achieve a reasonably low value of χ2 is still greater in the case of space-based ED-CG, and this gap increases with the number of atoms or residues in the biomolecule under investigation.
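The difference in search-space size can be made concrete with a short counting sketch. It assumes that a sequence-based mapping is a split of the chain into nonempty contiguous segments, and it uses the number of unrestricted set partitions (Stirling numbers of the second kind) as an upper bound on what a space-based search could visit; the function names are hypothetical, and the actual space-based method restricts sites to spatially close residues.

```python
from math import comb

def n_contiguous_mappings(n_res, n_cg):
    """Ways to cut a chain of n_res residues into n_cg nonempty contiguous
    segments: choose n_cg - 1 of the n_res - 1 possible cut positions."""
    return comb(n_res - 1, n_cg - 1)

def n_unrestricted_mappings(n_res, n_cg):
    """Ways to partition n_res residues into n_cg nonempty unlabeled groups
    (Stirling number of the second kind), via S(n,k) = k*S(n-1,k) + S(n-1,k-1).
    Used here only as an upper bound for a space-based search."""
    row = [1] + [0] * n_cg                 # S(0, k) for k = 0..n_cg
    for n in range(1, n_res + 1):
        new = [0] * (n_cg + 1)
        for k in range(1, min(n, n_cg) + 1):
            new[k] = k * row[k] + row[k - 1]
        row = new
    return row[n_cg]

# Example: 75 residues (the size of TMEM14A in this work) and 6 CG sites.
print(n_contiguous_mappings(75, 6), n_unrestricted_mappings(75, 6))
```

Python integers are arbitrary precision, so the counts are exact even when they become very large.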

Power Law Scaling of the ED-CG Residual χ2

In our prior work,[35] it was demonstrated that the ED-CG residual χ2 for the optimal CG map with a given number of CG sites can be approximated by a simple function of the protein size and the number of CG sites, where the anomalous dimension γ is a protein-specific parameter, δ is a protein-independent coefficient, and C′(T) is a temperature-dependent prefactor. For a wide class of proteins, the value of γ was found[35] to range from 0.00 to 0.91 (however, TMPs were not included in the studied set of proteins), while δ ≈ 0.35.

Internal Symmetry, Protein Fluctuation, and Symmetric CG Models

Internal symmetry will provide additional restraints in the coarse-graining of TMPs. In the context of biomolecules, we use the term internal symmetry for symmetry operations obeyed by the three-dimensional structure of the primary polypeptide chain sequence. On the basis of normal-mode analysis of MD simulations and group theory, Matsunaga and co-workers revealed that structural symmetry of homooligomers is a principal determinant of the entire protein complex’s symmetric fluctuation.[7] In the same way, TMPs with internal symmetry should also have symmetric thermal fluctuation, which can be captured by ED-CG methods. Mapped onto the CG model, the symmetric domains of a TMP should result in identical CG domains. This directly suggests that the CG model should better describe symmetric fluctuation of the target TMP if it is consistent with the protein symmetry. In the simplest case of building a two-site CG model for a protein with perfect 2-fold symmetry, we can always obtain the lowest ED-CG residual χ2 when each CG site contains half of the residues. Our direct method (without ED-CG) of systematically constructing directly comparable CG mappings (of adjustable resolution) that satisfy the three-dimensional structural symmetry of the molecule they represent is as follows. The contiguous protein sequence is evenly divided into N_CG domains, which gives rise to a CG model that has an identical number of residues in each CG site (setting aside rounding errors); we shall refer to this construction as a symmetric model in the present work because these mappings satisfy a modular symmetry in the sense that each CG site is of equal size and separation in sequence space (N.B. only a subset of these mappings will be consistent with the structural symmetry of the molecule).
We have collected a representative benchmark data set of eight important TMPs from the Protein Data Bank (PDB)[53] (Table 1) that all exhibit approximate internal symmetry in order to compare the CG models built by the ED-CG method to these symmetric CG models and thereby elaborate on the necessity of preserving symmetries, exact or approximate, when constructing highly CG mappings for biomolecular systems.
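The equal-length partition described above can be sketched in a few lines of NumPy. The function name `symmetric_mapping` and the particular rounding convention (integer floor of i·N_CG/N_res, so that site sizes differ by at most one residue) are assumptions made for illustration; the text only specifies that the sequence is divided evenly, setting aside rounding errors.

```python
import numpy as np

def symmetric_mapping(n_res, n_cg):
    """Assign each residue index 0..n_res-1 to one of n_cg contiguous,
    (nearly) equal-length segments along the primary sequence.

    Returns an integer array of CG site labels, nondecreasing along the chain.
    The rounding convention is an assumption of this sketch.
    """
    if not 1 <= n_cg <= n_res:
        raise ValueError("need 1 <= n_cg <= n_res")
    return (np.arange(n_res) * n_cg) // n_res

# Example: a 75-residue chain mapped onto 6 sites.
labels = symmetric_mapping(75, 6)
print(np.bincount(labels))   # number of residues per CG site
```

No trajectory data enter this construction, which is the point of the heuristic: the mapping depends only on the chain length and the chosen number of sites.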

Table 1

Transmembrane Proteins Analyzed in This Work Belong to Different Structural Types and Approximate Symmetry Groups

protein	PDB ID code	residues	approximate symmetry point group	number of modular repeats	structure type
human integral membrane protein (hIMP) TMEM14A	2LOP(75)	25–99	C₃	3	α-helical bundle
transmembrane domain of nicotinic acetylcholine receptor (nAChR) β2 subunit	2KSR(76)	25–164	C₄	4	α-helical bundle
human water channel aquaporin-1 (AQP1)	1H6I(77)	9–233	S₂ (=C_i)	8	α-helical bundle
mitochondrial ADP/ATP carrier	1OKC(78)	2–293	C₃	9	α-helical bundle
ammonia transporter (AMT1)	2B2F(79)	1–391	S₂ (=C_i)	11	α-helical bundle
cytochrome c oxidase subunit 1 (COX1)-β	1QLE(80)	17–554	C₃	12	α-helical bundle
outer membrane protein X (OmpX)	1Q9F(81)	1–148	C₄	8	β-barrel
outer membrane protein A (OmpA)	2GE4(82)	0–176	C₄	8	β-barrel

Modeling and Simulations of Transmembrane Proteins

We selected a set of test cases by choosing TMPs with internal symmetry and no missing residues in the sequence. Our set of eight proteins represents TMPs of different size, structure, symmetry, function, and complexity and includes structures of either α-helical bundles or β-barrels (see Table 1 and Figure 1). We note that all of these proteins are folded and fluctuate around the stable equilibrium structure with no large-scale conformational rearrangements.

Figure 1

Cartoon representations of eight transmembrane proteins studied in this work. Different colors are used to show symmetric units. PDB ID codes are indicated in parentheses.

These protein models were set up in a membrane-bound environment before performing the atomistic MD simulations. With Maestro (Schrödinger, Inc.), each PDB structure was prepared using Protein Preparation Wizard and embedded in a 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) bilayer by the System Builder. The TMP-membrane assemblies were placed in the simulation boxes, which were filled with explicit water (TIP3P water model[54]) and physiological salt (0.15 M NaCl) on both sides of the membrane. The distance between protein atoms and the box boundaries was at least 12 Å in all directions. CHARMM22/CMAP protein[55,56] and CHARMM36 lipid[57] force fields were used to assign parameters with the tool Viparr.[58] After a 9-step standard relaxation protocol, which has been successfully applied in previous studies,[59−61] each atomistic MD simulation was run for 30 ns in the isothermal–isobaric ensemble with constant temperature, T = 310 K, and constant pressure, P = 1 atm, using the Martyna-Tobias-Klein coupling scheme.[62] Electrostatic forces were calculated using the particle mesh Ewald technique.[63,64] van der Waals and short-range electrostatics were cut off at 9 Å. Long-range electrostatics were updated every third time step. All MD simulations were performed in the Desmond 3.0 simulation package[65] with an integration time step of 2 fs. Thereafter, we applied the ED-CG method[33] to build the CG models from the simulated all-atom MD trajectories.

Data Analysis

Structure visualization was performed with VMD[66] and PyMOL.[67] Plots were prepared using Grace (xmgrace; http://plasma-gate.weizmann.ac.il/Grace) and NumPy[68]/matplotlib.[69] For the different sets of CG models for each TMP with the same CG resolution level, we computed and analyzed the naïve model similarity, defined as the fraction of residues assigned to the same CG sites in the two compared models,

$$\mathrm{similarity}(M, N) = \frac{1}{N_{\mathrm{res}}}\sum_{i=1}^{N_{\mathrm{res}}}\delta_{M(i),N(i)}$$

where N_res is the number of residues and δ is the Kronecker delta function, adding to the similarity whenever residue i is mapped to the same CG site by the two mappings of equal resolution, M and N.
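The naïve similarity described above reduces to a one-line comparison of two label arrays. The sketch below assumes that both mappings are stored as per-residue CG site labels and that the labels are numbered consistently (for example, in order along the sequence); the function name `mapping_similarity` is hypothetical.

```python
import numpy as np

def mapping_similarity(map_a, map_b):
    """Fraction of residues assigned to the same CG site by two mappings
    of equal resolution (mean of the Kronecker delta over residues).

    Assumes both inputs are per-residue site labels with a shared
    numbering of the CG sites.
    """
    a = np.asarray(map_a)
    b = np.asarray(map_b)
    if a.shape != b.shape:
        raise ValueError("mappings must cover the same residues")
    return float(np.mean(a == b))

# Two 2-site mappings of a 4-residue chain that differ in one residue.
print(mapping_similarity([0, 0, 1, 1], [0, 1, 1, 1]))
```

Because the measure compares labels residue by residue, a one-residue shift of a segment boundary lowers the similarity only slightly, whereas a relabeling of the sites would lower it drastically; hence the consistent-numbering assumption.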

Results and Discussion

ED-CG and Symmetric Models of Transmembrane Proteins

Initial tests were performed to compare the CG models built with space- and sequence-based ED-CG methods. We decided to proceed with the sequence-based variant as a suitable representative approach because the space-based method exhibited much slower convergence rates, while results were almost identical for the systems and resolutions studied in this paper (in general, though, they will not be). To compare the highly CG models built with the ED-CG method to the symmetric CG models, we calculated the value of the residual χ2 (Figure 2) over a range of different numbers of CG sites (i.e., the CG resolution), corresponding to the highly CG mapping regime where multiple amino acid residues are represented by a single CG site. It is seen that the residuals for the symmetric models tend to exhibit an oscillatory behavior compared to ED-CG models and that the period of this oscillation depends on the CG resolution. These oscillatory “footprints” indicate that the collective dynamics, which encompasses symmetric modes for structurally symmetric molecules, is better captured by CG mappings that preserve the dominant symmetries. Since the calculated ED-CG χ2 residuals are a good proxy for the lower bound of the residual χ2 at a certain mapping resolution, we can identify a subset of the symmetric models that is optimal in the sense that the symmetric residual χ2 is almost identical to its lower bound for these models. The error of the symmetric model can be estimated by comparing its residual χ2 to that of the ED-CG χ2-residual-minimized mapping. While this error tends to be small, it increases systematically whenever the CG mapping does not preserve the structural symmetry of the TMPs, giving rise to what we shall call a symmetry mismatch that appears as an oscillatory difference in χ2 between the symmetric model and the ED-CG model (Figure 2).
The symmetry mismatch can therefore be eliminated to the point where the residual χ2 of the symmetric model ≅ χ2 of the ED-CG model by appropriately choosing the symmetric model that optimally aligns with the topological features of the TMP. The penalty for a symmetry mismatch follows the same power law relation as the ED-CG χ2 residual, and the relative error is therefore strongly dampened as the resolution of the mapping is increased.

Figure 2

Plots of the χ2 residuals for the symmetric mappings (squares, green) and the ED-CG method resulting mappings (circles, red) for the eight transmembrane proteins plotted against numbers of CG sites (N). The panel with blue dots below each major plot shows the difference in χ2 between the symmetric model and the ED-CG model. Note the logarithmic scale for the y axis in the plotted χ2 residuals.

Optimal Symmetric Models in the CG Regime with ∼10–20 Amino Acid Residues per CG Site Satisfy Symmetry

For all the test cases (Table 1 and Figure 1), it is observed that, for low values of N_CG (highly CG models), the optimal subset of symmetric models always contains models for which the number of CG sites complies with the symmetry point group in the sense that N_CG mod S = 0 for a TMP with approximate S-fold internal symmetry. When this rule is not obeyed, there will in general be a penalty in the χ2 residual. Our results also show a number of differences between small and large TMPs. For small proteins, such as TMEM14A (75 residues), we observe excellent agreement between the two CG models when N_CG is a multiple of 3, which can be visually understood by looking at the CG map with 6 sites (Figure 3). The relatively large symmetry-mismatch penalty observed in the χ2 residual for TMEM14A is attributed to two factors: (1) the small size of the protein and (2) the fact that the protein has three modular repeats (α-helices in this case), which coincides with the approximate 3-fold axis of symmetry (C3). For the larger TMPs in our set of test cases, this effect is weaker (Figure 2). Model similarity between the ED-CG mapping and the symmetric mapping at this level of resolution was very high (∼80–90%) in all tested cases.
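The selection rule for low-resolution models can be written as a simple filter on the number of CG sites. The sketch below lists the site counts that are whole multiples of the symmetry order, as in the multiple-of-3 case discussed for TMEM14A; the function name and its arguments are hypothetical, and the filter says nothing about how large the χ2 penalty is for the excluded counts.

```python
def symmetry_compatible_site_counts(fold, max_sites):
    """CG site counts up to max_sites that are whole multiples of the
    order `fold` of the approximate internal symmetry."""
    if fold < 1 or max_sites < 1:
        raise ValueError("fold and max_sites must be positive")
    return [n for n in range(1, max_sites + 1) if n % fold == 0]

# Approximate 3-fold symmetry (as for TMEM14A), up to 12 CG sites.
print(symmetry_compatible_site_counts(3, 12))
```

At higher resolution, the same filter could be applied with the number of modular repeats in place of the symmetry order, in line with the observation that matching the helix-loop or strand-loop repeats becomes the more relevant criterion.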

Figure 3

An example of a symmetric CG map for the protein TMEM14A. The backbone of the atomistic X-ray crystal structure is shown as translucent ribbons. The corresponding CG sites of the mapped structure are shown as solid spheres. The approximate C₃ symmetry axis is indicated by a vertical solid line. The geometric planes that flank the molecule in the long (transmembrane) dimension are indicated by dashed-line triangles.

Optimal Symmetric Models in the CG Regime with ∼5–10 Amino Acid Residues per CG Site Satisfy Modular Repeats in the Secondary Structure

The symmetry mismatch penalty for the higher resolution models is negligible. While model similarity between the ED-CG mapping and the symmetric mapping at this level of resolution for the tested cases varied somewhat (∼45–75%), the absolute difference in the values of the residual χ2 is subtle (Figure 2, lower panels). This makes physical sense because, at a certain threshold resolution (here, ∼10 amino acids per CG site), there will be enough CG sites in the asymmetric subunit to adequately represent the dynamics of the unit in the essential subspace. However, it turns out that longer-period oscillations appear instead. These oscillations can be interpreted as mismatches (albeit numerically very small compared with the previously described symmetry mismatches) to the modular repeats in the secondary structure of the TMP. For α-helical bundles (β-barrels), the modular repeats are the individual helix-loop (strand-loop) motifs.

Physical Significance of the Anomalous Dimension γ

On the basis of the data plotted in Figure 2, we calculated the values of the anomalous dimension γ and the temperature-dependent prefactor C(T,N), as defined by the power-law scaling relation for χ2 introduced above. As shown in Table 2, γ falls in a small range around 1.0 for α-helical bundles and in another small range around 1.5 for β-barrels. These values are generally higher than the values previously reported for globular proteins like ubiquitin (γ = 0.50) or G-actin (γ = 0.33), implying that χ2 decreases faster for TMPs than for other proteins when the resolution of the CG mapping is increased. Our results also show that the anomalous dimension γ falls within a very well-defined range for specific TMPs with similar topology. In addition, the similar γ values between the sequence-based ED-CG models and the symmetric CG models indicate good agreement with respect to scaling behavior through the whole range of mapping resolutions.
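One common way to estimate such an exponent is a straight-line fit in log-log space. The sketch below assumes that, at fixed protein size and temperature, χ2 depends on the number of CG sites as a prefactor times N_CG^(−γ); the function name, the use of an unweighted least-squares fit, and the synthetic input data are assumptions, and the fitting procedure actually used for Table 2 is not specified in the text.

```python
import numpy as np

def fit_anomalous_dimension(n_cg, chi2):
    """Fit chi2 ~ prefactor * n_cg**(-gamma) by linear regression of
    log(chi2) on log(n_cg). Returns (gamma, prefactor, r_squared)."""
    x = np.log(np.asarray(n_cg, dtype=float))
    y = np.log(np.asarray(chi2, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    r_squared = 1.0 - resid.var() / y.var()
    return -slope, float(np.exp(intercept)), r_squared

# Synthetic example only (not data from this work): an exact power law
# with gamma = 1 sampled at N_CG = 1, ..., 10.
n = np.arange(1, 11, dtype=float)
gamma, prefactor, r2 = fit_anomalous_dimension(n, 5.0 / n)
print(gamma, prefactor, r2)
```

For real χ2(N_CG) curves the oscillatory symmetry-mismatch component would appear as scatter around the fitted line, so the coefficient of determination gives a rough indication of how well a single power law describes the data.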

Table 2

Anomalous Dimensions γ of TMPs Are Close to 1, Unlike Those of Globular Proteinsa

protein	N_res	ED-CG γ	sym. γ
human integral membrane protein (hIMP) TMEM14A	75	1.10 (0.02)	0.95 (0.04)
transmembrane domain of nicotinic acetylcholine receptor (nAChR) β2 subunit	140	0.96 (0.01)	0.99 (0.03)
human water channel aquaporin-1 (AQP1)	225	0.96 (0.03)	0.98 (0.01)
mitochondrial ADP/ATP carrier	292	1.06 (0.04)	1.08 (0.05)
ammonia transporter (AMT1)	391	1.01 (0.01)	1.03 (0.02)
cytochrome c oxidase subunit 1 (COX1)-β	538	1.15 (0.02)	1.18 (0.03)
outer membrane protein X (OmpX)	148	1.54 (0.05)	1.57 (0.04)
outer membrane protein A (OmpA)	177	1.44 (0.05)	1.49 (0.03)

Standard deviations of our estimates of γ are shown in parentheses.

To explain why the values of γ in the case of TMPs are typically higher than in the case of globular proteins studied earlier, we studied two simplified models: one of a solid ball and the other of a straight rod. The anomalous dimensions for these two extreme case model systems are demonstrated to be 0 and 1, respectively (see Appendix A for details). Most proteins considered in the previous work[35] are globular; hence, it is reasonable that their anomalous dimensions are typically closer to 0. On the other hand, most TMPs considered in this work are formed by sets of transmembrane α-helices. A set of straight rods, in the approximation of weak interactions between the rods, has the same anomalous dimension as a single rod does (for details, see Appendix B). This analytical result explains why the anomalous dimensions of TMPs are closer to 1 and, therefore, greater than those for globular proteins. The difference in the anomalous dimensions of the two groups of proteins (or, in general, any biomolecules) leads to an interesting consequence for a multimolecular complex formed by n_rod weakly interacting “rod-shaped” components (such as α-helices embedded into a lipid bilayer) and n_ball “ball-shaped” molecules (such as extra- or intracellular parts of membrane-associated proteins). In this case, an increase in the average resolution level of the CG model of the complex leads to a higher resolution representation of the “ball-shaped” parts in comparison to the “rod-shaped” parts or, in other words, the new CG sites added to the complex upon increasing resolution mainly end up in “ball-shaped” (e.g., extra- or intracellular) components of the complex.
In mathematical terms, if the total number of CG sites in the multimolecular complex is denoted N, then the ratio of the optimal number of CG sites per each “rod-shaped” component, N_rod, to the optimal number of CG sites per each “ball-shaped” component, N_ball, decreases asymptotically as N → ∞ or, equivalently, the fraction of CG sites within “rod-shaped” components decreases with the increase of the resolution level of a CG model of the complex. Inversely, in coarser CG models, for example, UCG models,[41] the optimal distribution of the CG sites implies a more detailed description of “rod-shaped” components (e.g., filamentous proteins or α-helices in a protein). The oscillatory behavior of the χ2(N) curves for TMPs with n-fold rotational or rotoreflection symmetry can be explained on the basis of the universal power-law scaling behavior for χ2 and the fact that the anomalous dimension for straight rods equals 1 (see details in Appendix B). The dependence of χ2 on N predicted by this simple model is shown in Figure 4 in black solid lines. The behavior of these χ2(N) curves is qualitatively similar to those in Figure 2 (especially, TMEM14A, AMT1, and COX1-β) despite the fact that the model of weakly interacting rods provides a simplified representation of dynamical behavior of TMPs.

Figure 4

A model of n = 3 (left panels) and n = 7 (right panels) weakly interacting straight rods demonstrates an oscillatory behavior of the χ2(N) curves (shown with solid lines and circles; the corresponding χ2(N) curves are shown with dashed lines). Therefore, the damped oscillatory behavior of the χ2(N) curves for TMPs analyzed in this Article (see Figure 2) is qualitatively captured by the simple model approximating TMPs by several interacting rods. Note the logarithmic scale for the y axis in the top panels.

Connection to Information Content in the CG Model

Very recently, Foley et al.[70] investigated the connection between the entropic component of the potential of mean force (PMF) and the CG representation both in general terms and for concrete models, notably the Gaussian linear chain model where an exact explicit PMF could be derived. Their analysis suggests that there are bounds on the resolution range wherein information-efficient CG mappings can be found. Our results presented herein add a new perspective by emphasizing that careful consideration of structural symmetries and local modularities in approximately symmetric transmembrane proteins may help to choose between CG mappings that preserve a comparable fraction of nontrivial information.

Conclusions

In this work, we have demonstrated that accurate and precise CG mappings can be generated for a diverse class of TMPs without the use of computationally expensive MD simulations and subsequent global residual χ2 minimization. To investigate the design principle in a general sense, we have studied CG mappings that partition transmembrane domains into contiguous segments of equal length along the primary sequence. The relative error in χ2 resulting from the use of the proposed heuristic rule is oscillatory and strongly damped, which has two practical consequences. First, symmetry mismatch generally decreases for an increasing number of CG sites. Second, it is possible for the heuristic to produce CG mappings with negligible relative difference in χ2 values to ED-CG methods even in the UCG regime, as long as the number of CG sites agrees with the overall symmetry group of the system (most important for low-resolution CG models) and conforms with the modular repeats (most important for medium-resolution CG models). It is likely that this heuristic will be especially useful when used in conjunction with other procedures to select optimal CG mappings on a case-by-case basis. Moreover, from the analysis of simple models, we predict that low resolution UCG models generated with the ED-CG approach should contain more CG sites in “rod-shaped” parts of proteins and protein complexes, such as α-helices immersed into a lipid bilayer, while higher-resolution CG models with more CG sites contain a larger fraction of CG sites in “ball-shaped” parts of the system, such as extra- or intracellular parts. In summary, our study provides new insight into highly CG modeling of TMPs and facilitates CG simulations by demonstrating that simple symmetry-preserving CG mappings are fast and reliable constructions, which have potential applications to future highly CG (or UCG) simulations of large TMPs and TMP assemblies on long time scales.

Table 3

Anomalous Dimensions γ of a Solid Ball and a Straight Rod Converge to 0 and 1, Respectively, in the Continuous Limit of the Number of Pseudo-Atoms N → ∞, Confirming the Validity of eq a

	N_res	500	1000	5000
solid ball	γ	0.063	0.041	0.020
solid ball	R²	0.99992	0.99998	0.99999
straight rod	γ	1.00005	1.00001	1.00000
straight rod	R²	1.00000	1.00000	1.00000

Calculations were performed using the χ2(N) values at N = 1, 2, ..., 9, 10. The coefficients of determination (R2) are very close to 1, showing the applicability of eq .

66 in total

1. The Protein Data Bank.

Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. Receptor desensitization by neurotransmitters in membranes: are neurotransmitters the endogenous anesthetics?

Authors: Robert S Cantor
Journal: Biochemistry Date: 2003-10-21 Impact factor: 3.162

3. Coarse-Grained Representations of Large Biomolecular Complexes from Low-Resolution Structural Data.

Authors: Zhiyong Zhang; Gregory A Voth
Journal: J Chem Theory Comput Date: 2010-08-23 Impact factor: 6.006

Review 4. Biomolecular simulation: a computational microscope for molecular biology.

Authors: Ron O Dror; Robert M Dirks; J P Grossman; Huafeng Xu; David E Shaw
Journal: Annu Rev Biophys Date: 2012 Impact factor: 12.981

5. The multiscale coarse-graining method. IV. Transferring coarse-grained potentials between temperatures.

Authors: Vinod Krishna; Will G Noid; Gregory A Voth
Journal: J Chem Phys Date: 2009-07-14 Impact factor: 3.488

6. Constructing Optimal Coarse-Grained Sites of Huge Biomolecules by Fluctuation Maximization.

Authors: Min Li; John Zenghui Zhang; Fei Xia
Journal: J Chem Theory Comput Date: 2016-03-14 Impact factor: 6.006

7. Binding of serotonin to lipid membranes.

Authors: Günther H Peters; Chunhua Wang; Nicolaj Cruys-Bagger; Gustavo F Velardez; Jesper J Madsen; Peter Westh
Journal: J Am Chem Soc Date: 2013-01-31 Impact factor: 15.419

8. Optimization of the additive CHARMM all-atom protein force field targeting improved sampling of the backbone φ, ψ and side-chain χ(1) and χ(2) dihedral angles.

Authors: Robert B Best; Xiao Zhu; Jihyun Shim; Pedro E M Lopes; Jeetain Mittal; Michael Feig; Alexander D Mackerell
Journal: J Chem Theory Comput Date: 2012-07-18 Impact factor: 6.006

9. Refining the treatment of membrane proteins by coarse-grained models.

Authors: Igor Vorobyov; Ilsoo Kim; Zhen T Chu; Arieh Warshel
Journal: Proteins Date: 2015-12-09

10. Interactions of protein kinase C-α C1A and C1B domains with membranes: a combined computational and experimental study.

Authors: Jianing Li; Brian P Ziemba; Joseph J Falke; Gregory A Voth
Journal: J Am Chem Soc Date: 2014-08-11 Impact factor: 15.419
