Principal component analysis of alpha-helix deformations in transmembrane proteins.

Alexander Bevacqua¹, Sachit Bakshi², Yu Xia¹.

Abstract

α-helices are deformable secondary structural components regularly observed in protein folds. The overall flexibility of an α-helix can be resolved into constituent physical deformations such as bending in two orthogonal planes and twisting along the principal axis. We used Principal Component Analysis to identify and quantify the contribution of each of these dominant deformation modes in transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins. Using three α-helical samples from Protein Data Bank entries spanning these three cellular contexts, we determined that the relative contributions of these modes towards total deformation are independent of the α-helix's surroundings. This conclusion is supported by the observation that the identities of the top three deformation modes, the scaling behaviours of mode eigenvalues as a function of α-helix length, and the percentage contribution of individual modes on total variance were comparable across all three α-helical samples. These findings highlight that α-helical deformations are independent of cellular location and will prove to be valuable in furthering the development of flexible templates in de novo protein design.

Year: 2021 PMID: 34525125 PMCID: PMC8443038 DOI: 10.1371/journal.pone.0257318

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

α-helices are deformable bodies

The α-helix is an essential secondary structural component commonly observed in native state protein folds. α-helices are broadly classified as a series of backbone atoms arranged in a right-handed helix with a large dipole moment through backbone carbonyl groups that all point in the same direction. The Ramachandran diagram studies backbone steric clashes and degrees of freedom to conclude on which dihedral angles are most appropriate for the α-helix [1]. The helical geometry is typically specified as having a periodicity of 3.6 residues and a rise of 5.4 Å per helix turn. Although these parameters are generally used to specify the α-helix, by no means is it an immutable structure. α-helices are flexible bodies, as further evidenced by the variety of helical deformations that are recorded in Protein Data Bank (PDB) submissions [2]. The ability to quantify the deformations of flexible elements in a protein fold is paramount for the development of flexible templates in computational de novo protein design. The earliest computational protein design strategies focused on rigid backbone templates. The atomic coordinates of these templates were fixed to simplify the design process and reduce the combinatorial complexity in searching for an optimal protein fold [3]. Studies done with these fixed templates identified sets of side-chain conformations, known as rotamers, that could build a stable protein core for the de novo protein [3]. These protein cores were well-suited for folding by hydrophobic collapse, thereby providing a low-energy structure which could stabilize the surface regions [3]. Although the rigid backbone template is a relatively simple model, it is scrutinized for ignoring backbone flexibility. The superposition of 20 different nuclear magnetic resonance structures of PDB entry 1AEL shows slight positional variations in the backbone atom positions [3]. 
This implies that rigid templates do not properly balance packing energies and deformation energies [4]. Flexible templates offer more design parameters to refine, which introduces the possibility that these templates can further optimize the free energy of a protein fold, with the drawback of a greater computational complexity. These additional parameters stem from backbone flexibility on the atomic scale and the collective flexible motions of secondary structures. The collective deformations experienced by α-helices can be resolved into individual deformation modes (such as bending and twisting), which from a computational standpoint, represent additional degrees of freedom in the de novo protein design process over existing rigid template design studies [5, 6].

α-helix flexibility is analyzed through constituent deformation modes

α-helix flexibility can be investigated using Principal Component Analysis (PCA) on the atomic coordinates of α-helices collected from the PDB. PCA is a data-driven analysis that can be performed on a sample of static α-helical structures to reveal their principal components. In this context, principal components and deformation modes are interchangeable terms because they both originate from two distinct models (PCA and normal mode analysis) that draw similar conclusions on the flexibility of an α-helix. These modes are each represented by one physical deformation and their individual contribution to the overall deformation of the α-helix is quantified by an eigenvalue (λ). We illustrate the three dominant principal components exhibited in α-helices in Fig 1.

Fig 1

The three dominant deformation modes correspond to three physical deformations seen in α-helices with 18 residues (L = 18).

The collections of individual atom displacements on these deformed α-helices lead to individual deformation modes. (A) The first deformation mode, Bend 1, has the largest eigenvalue and it is associated with bending of the α-helix in one plane. (B) The second deformation mode, Bend 2, has the second largest eigenvalue and it is associated with bending of the α-helix in another plane, orthogonal to the first one. (C) The third deformation mode captures the twisting of the α-helix along its principal axis, and it has the third largest eigenvalue. (A)-(C) In each subfigure, the two α-helices are individual helices from the PDB in the transmembrane α-helix dataset that represent the two extreme cases of each deformation mode. The arrows illustrate the displacement vector from each atom of a standard α-helix (with a periodicity Δθ of 3.6 residues per helix turn, a rise Δz of 1.5 Å per residue) to its corresponding atom on the deformed α-helix. The tails of these arrows are all translated to the corresponding atom on the deformed α-helix to more easily illustrate how each atom is pulled under the influence of a particular deformation mode.

Previous work identified that the three dominant modes of flexibility from the PCA of α-helices are two bending modes and one twist mode [4]. The two largest eigenvalues capture two nearly degenerate bending modes in two orthogonal planes, which is owed to the approximate cylindrical symmetry of an α-helix [4]. The third largest eigenvalue represents a twisting mode along the principal axis of the α-helix [4]. Each deformation mode has a pair of extreme cases, which are shown individually in each subfigure of Fig 1A–1C, but when these extremes are superimposed, they provide a visual aid on the bounds between which an α-helix may deform (See S1 Fig). The work done by Emberly et al. determined these three dominant deformation modes and studied the scaling behaviour of the eigenvalues as a function of the α-helix length [4]. 
We aim to expand on that research by elaborating on how the dominant deformation modes and scaling behaviour depend on the location of the α-helix in the cell, namely, whether the protein is surrounded by membrane or aqueous environments. In the past decades, bioinformaticians struggled with the scarcity of high-resolution structural information of transmembrane proteins [7-9]. The amount of publicly available transmembrane data over time has been tracked by Stephen White and co-workers, where they catalogue high-resolution structures of membrane proteins as part of their mpstruc database [10]. In 2003, at the time of the work completed by Emberly et al. [4], 88 membrane proteins were listed on the mpstruc database [10]. This shortage of data would not have led to a comprehensive and convincing analysis for comparing the deformation modes of α-helices in soluble proteins and membrane proteins. Our work covers three different α-helix types: transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins. We aim to substantiate and validate the conclusions reached by Emberly et al. [4] using a dataset that is over 500% the size of theirs. Furthermore, we expand the study of dominant principal components into several cellular environments to examine how an α-helix’s cellular milieu affects the physical deformations it experiences in its native state. As an α-helix approaches its native state conformation, the total deformation it experiences will be partitioned between bending and twisting. We study this partition using the variance explained by each principal component as a function of the α-helix length across membrane and aqueous environments. If these profiles are similar between cellular environments, then the variance explained by each deformation mode would exclusively rely on α-helix geometry. 
The variance explained by each principal component as a function of the α-helix length consequently describes an important relationship between the proportion of deformation manifested as bending or twisting, the cellular milieu of the α-helix, and the α-helix length; however, these profiles would not describe differences in α-helical mechanical properties (intensive properties) across cellular milieus. For example, prior work from Bavi et al. used molecular dynamics to estimate the Young’s modulus of α-helices from M. tuberculosis and E. coli homolog mechanosensitive channels [11]. Their work concludes that the Young’s modulus from α-helix stretching simulations is higher in a vacuum than it is in water [11], but this result would not describe exactly how variance is partitioned between the constituent modes.

Transmembrane and soluble proteins have notable similarities and differences

Transmembrane α-helices and α-helices in soluble proteins have different amino acid compositions. The analysis done by Baeza-Delgado et al. on amino acid composition in α-helices revealed that transmembrane α-helices possess glycine and large hydrophobic amino acids such as leucine, valine, isoleucine, and phenylalanine more frequently whereas polar amino acids like glutamate, lysine, asparagine, arginine, and glutamine were less prevalent [8]. Although their study had 792 transmembrane α-helices and 7348 α-helices in soluble proteins compared to our study with 6075 transmembrane α-helices and 6716 α-helices in soluble proteins, our conclusions on the most prevalent amino acid types were the same (S2 Fig). In a bioinformatic study of the yeast membrane proteome where membrane-embedded transmembrane residues were compared with extramembrane residues, it was concluded that for a fixed degree of residue burial, transmembrane regions evolve 42% more slowly than extramembrane regions using the ratio of the rate of nonsynonymous substitutions to the rate of synonymous substitutions at the DNA level [12]. The transmembrane regions evolve more slowly since the membrane environment imposes greater selective constraint than the aqueous environment surrounding the extramembrane regions [12-14]. Even more, residue evolutionary rate scales in a strong, positive, and linear trend with relative solvent accessibility in both transmembrane and extramembrane regions of membrane proteins [12]. Although extramembrane regions of membrane proteins and soluble proteins have different functional roles, they are both surrounded by an aqueous environment and have similar linear relationships between residue-level evolutionary rate and relative solvent accessibility [12]. Hydrogen bonding is a crucial force in preserving native state transmembrane protein folds. 
A polar residue in a transmembrane protein is thermodynamically unfavourable unless it is in a hydrogen bonded state as a result of the low dielectric constant of the membrane environment [15]. Transmembrane apolar to polar mutations can lead to non-native hydrogen bonding which can compromise protein function and lead to diseased phenotypes [15]. The glycine-to-arginine mutation alone leads to 4.8% of all transmembrane domain phenotypic mutations, which is statistically more frequent than its occurrence in soluble proteins [15]. More generally, Partridge et al. determined that residues which participate in hydrogen bonds “are overrepresented as molecular causes of disease when they replace a native [transmembrane domain] residue” [16]. Transmembrane α-helices exhibit structural irregularities more frequently than α-helices in soluble proteins. The standard α-helix is defined in terms of several key metrics including the number of residues per turn (which falls between 3.4 and 4.0) and the rise per residue (between 1.36 Å and 1.76 Å) [17]. α-helix structural irregularities include kinks, the 3₁₀-helix, and the π-helix [17]. If the local bending angle at a residue within an α-helix is greater than 20°, then the hydrogen bond between residue i and i+4 is broken, and it is consequently called a kinked helix [17]. Hall et al. determined that 44% of transmembrane α-helices had a significant helical kink, with 35% of those kinks caused by proline [18]. The angles of proline-based helical kinks are modulated by proximal serines and threonines [18, 19]. Non-proline kinks were mainly associated with serines and glycines at the center of the kink [7, 18]. In particular, the serine side chain of residue i forms a hydrogen bond with either residue i−4 or i+4 [7, 18]. The 3₁₀-helix is a tight-turning and tall α-helix with a periodicity of less than 3.4 residues per helix turn and a rise of greater than 1.76 Å per residue [17]. The π-helix is a wide-turning and short α-helix with a periodicity of greater than 4.0 residues per helix turn and a rise of less than 1.36 Å per residue [17]. Kinks (K), kinks associated with tight turns (K−3₁₀), and kinks associated with wide turns (K−π) are more frequently observed irregularities in transmembrane α-helices than in α-helices in soluble proteins [17]. More specifically, the ratios (TM:soluble) are 6:1 for K, 9:5 for tight turns, and 11:4 for wide turns [17]. These irregularities are biologically relevant as White et al. show that serine and threonine motifs shape the local structure of transmembrane α-helices through local kinking to improve both solvation and flexibility [20]. In response to the similarities and differences between transmembrane and soluble proteins at the residue level, we studied the effect of an α-helix’s cellular environment on its deformation modes, the scaling behaviour of its eigenvalues, and the contribution of each physical deformation to the overall flexibility of the secondary structure.
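The geometric thresholds quoted above from [17] can be sketched as a small classifier. This is only an illustration of the stated cut-offs; the function names, the label strings, and the "unclassified" fallback for mixed cases are assumptions introduced here and are not part of the cited work.

```python
def classify_helix_geometry(residues_per_turn: float, rise_per_residue: float) -> str:
    """Label a helix geometry using the thresholds quoted from [17].

    residues_per_turn: helical periodicity (residues per turn).
    rise_per_residue:  rise along the helix axis, in angstroms per residue.
    """
    # Tight-turning and tall: fewer than 3.4 residues/turn, rise above 1.76 A.
    if residues_per_turn < 3.4 and rise_per_residue > 1.76:
        return "3_10-helix"
    # Wide-turning and short: more than 4.0 residues/turn, rise below 1.36 A.
    if residues_per_turn > 4.0 and rise_per_residue < 1.36:
        return "pi-helix"
    # Standard range for both metrics.
    if 3.4 <= residues_per_turn <= 4.0 and 1.36 <= rise_per_residue <= 1.76:
        return "standard alpha-helix"
    # Mixed cases are not described in the text; this label is a placeholder.
    return "unclassified"


def is_kinked(local_bend_angle_deg: float) -> bool:
    """A local bending angle above 20 degrees marks a kink, per the text."""
    return local_bend_angle_deg > 20.0
```

For instance, the textbook geometry of 3.6 residues per turn and 1.5 Å per residue falls in the standard range.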

Results and discussion

There are notable comparisons between transmembrane proteins and soluble proteins highlighted by previous research on amino acid propensity, residue-level evolutionary rates, hydrogen bonding, and the frequency of structural irregularities. We investigated the effect of the surrounding environment on the deformation behaviours of transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins. Because α-helices are deformable bodies, their flexibility can be quantified through the collective deformations of their residues using Principal Component Analysis (PCA) [4].

The total deformation of an α-helix can be resolved into deformation modes

N α-helices of a given length (L residues) were collected from PDB entries (See ). Once the α-helices were structurally aligned, the raw data for PCA consisted of an N by 3L matrix of transformed 3D α-carbon atomic coordinates. We decided to use the α-carbon positions instead of all backbone atoms because the α-carbon positions appropriately capture the geometry of the backbone and because this choice remains consistent with Emberly et al. [4]. Upon performing PCA, the total deformation of the α-helical sample was segmented into constituent modes, with each mode describing a part of the total deformation. The contribution of each mode to the flexibility of an α-helix is quantified with an eigenvalue (λ). These eigenvalues measure the variance in Å² captured by an individual deformation mode. The eigenvalues associated with each of the 3L principal components were computed for transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins in the range 10≤L≤25 for a total of 48 sets of eigenvalues.
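The PCA step described above can be sketched as follows. This is a minimal illustration, assuming the helices have already been structurally aligned and are stored as an array of shape (N, L, 3); the function name `helix_pca` and the use of the sample covariance (dividing by N − 1) are assumptions made here, not details taken from the study.

```python
import numpy as np


def helix_pca(coords: np.ndarray):
    """PCA on aligned alpha-carbon coordinates.

    coords: array of shape (N, L, 3) holding N aligned helices of L residues.
    Returns (eigenvalues, modes): eigenvalues in descending order (units of
    squared coordinate units, e.g. A^2) and the matching modes as columns
    of a (3L, 3L) matrix.
    """
    n_helices = coords.shape[0]
    # Flatten each helix into one row of length 3L: the N x 3L data matrix.
    data = coords.reshape(n_helices, -1)
    # Displacements from the mean structure.
    centered = data - data.mean(axis=0)
    # 3L x 3L sample covariance matrix (assumption: N - 1 normalisation).
    covariance = centered.T @ centered / (n_helices - 1)
    # eigh suits symmetric matrices; it returns ascending eigenvalues.
    eigenvalues, modes = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], modes[:, order]
```

With L = 18 this yields 54 eigenvalues per sample, and their sum equals the total variance of the sample, which is the quantity later used for normalization.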

The deformation modes have different magnitudes in different cellular milieu

In Fig 2, the ten PCA modes with the largest eigenvalues are presented for α-helices with 18 residues (L = 18). Modes #1–3 in Fig 2 represent the three dominant deformation modes that were illustrated in Fig 1: Bend 1, Bend 2, and Twist. The triplet bars for each mode in Fig 2A are included to compare the eigenvalues in the three types of α-helices that we studied. The PCA eigenvalues for L = 12, 15, 21, and 24 can be found in S3 Fig. Since α-helices are roughly cylindrical in shape, the two bending modes have similar eigenvalues. This observation is supported by the work done by Emberly et al., in which they also report a nearly degenerate pair of PCA bending modes with nearly identical eigenvalues [4]. Across all three α-helix types in Fig 2A, the twisting mode represented a smaller contribution to the total deformation with the third largest eigenvalue.

Fig 2

The ten principal components with the largest eigenvalues (λ) from 18-residue transmembrane α-helices (N = 6075), extramembrane α-helices (N = 2198), and α-helices in soluble proteins (N = 6716).

(A) The eigenvalues (λ). (B) The eigenvalues, when normalized by total variance.

The deformation modes that we elucidated from our samples were larger in magnitude (i.e. the eigenvalues were larger) than those published by Emberly et al. [4] in the range of 10≤L≤25. This implies that the total variance in each of our α-helical samples was greater than the total variance in their dataset. This is due to the fact that their threshold for accepting potential candidate α-helices (done by selecting unbroken series of residues with dihedral angles {ϕ,ψ = −50°±30°,−50°±30°}) [4] was more stringent than ours. In other words, their study was more likely than our study to reject α-helices with more extreme deformation types. On the topic of total variance exhibited by a helical dataset, since there are different physical constraints in the plasma membrane and the cytoplasm due to differences in hydrogen bonding and electrostatic interactions between the two environments, the total variance in helical deformation will be different in each cellular setting. Therefore, for each respective mode in transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins, the eigenvalues should not equal one another, and the amplitude of the individual deformation modes cannot be meaningfully compared across different cellular milieus. To address differences in total variance between each dataset, we normalized the eigenvalues by the total variance in their respective datasets as shown in Fig 2B. The resulting percentage of variance explained is a more worthwhile metric to compare since it describes on a percentage basis the way that total deformation is partitioned between constituent modes. In the range 10≤L≤25, focusing on individual deformation modes, we found the eigenvalues between transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins were different. 
This suggests that the eigenvalues of the deformation modes of an α-helix depend on its cellular environment, owing to differences in the physical constraints of these environments. The amplitudes of the α-helical deformation modes rely on the environmental constraints which restrict their deformation. Other metrics such as the helix’s scaling behaviour may not necessarily be reliant on these constraints. To investigate this claim further, we studied the scaling behaviour of the three dominant deformation modes.
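To make concrete why a dihedral-angle threshold of the kind attributed above to Emberly et al. [4] rejects strongly deformed helices, the selection of unbroken runs can be sketched as below. This is a hypothetical illustration only: the function name, the `min_length` default of 10 (chosen to match the shortest helix length studied here), and the input format are assumptions, and this is not the selection code of either study.

```python
def find_helical_runs(dihedrals, center=-50.0, tolerance=30.0, min_length=10):
    """Find unbroken runs of residues with phi and psi near `center`.

    dihedrals: sequence of (phi, psi) pairs in degrees, one per residue.
    Returns a list of (start, end) index pairs, end exclusive, for every
    run of at least `min_length` residues in which both angles lie within
    `center` +/- `tolerance`.
    """
    runs = []
    start = None
    for i, (phi, psi) in enumerate(dihedrals):
        in_window = (abs(phi - center) <= tolerance
                     and abs(psi - center) <= tolerance)
        if in_window and start is None:
            start = i  # a new candidate run begins here
        elif not in_window and start is not None:
            if i - start >= min_length:
                runs.append((start, i))
            start = None  # one out-of-window residue breaks the run
    # Close a run that extends to the final residue.
    if start is not None and len(dihedrals) - start >= min_length:
        runs.append((start, len(dihedrals)))
    return runs
```

A single residue outside the window splits a helix into two shorter candidates, so a narrower window tends to discard kinked or otherwise strongly deformed helices and lowers the total variance of the resulting sample.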

α-helices have comparable scaling behaviours, irrespective of cellular environment

The eigenvalues (λ) of the first three deformation modes were scaled as a function of the α-helix length (L) using a power law function (λ∝L^∎, where ∎ stands for the scaling exponent). The scaling exponents associated with each of the three types of α-helices are summarized in Table 1 (with more details in S1 Table).

Table 1

The scaling exponents derived from a power law relationship between the eigenvalues (λ) of the first three deformation modes and the α-helix length (L).

λ∝L^∎	Transmembrane α-helices	Extramembrane α-helices	α-helices in soluble proteins	α-helices in soluble proteins [4]
Bend 1	3.3	3.2	3.4	4
Bend 2	3.6	3.5	3.6	4
Twist	2.7	2.3	2.7	2

The first three columns of entries in Table 1 contain the empirical scaling exponents associated with the eigenvalues of the top three deformation modes in the range 10≤L≤25. These exponents were calculated by preparing a log-log plot of the α-helix lengths against the PCA mode eigenvalues and identifying the slope of the linear relationship. For Bend 1 and Bend 2, the scaling exponents were very similar between transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins. The scaling exponent of Twist is similar between the three different types of helices, especially for the transmembrane α-helices and α-helices in soluble proteins. Moreover, the scaling exponents of the twisting mode are consistently lower than the scaling exponents of the two bending modes across all three α-helix types. The distinction between the bending mode exponents and the twisting mode exponent exists due to the way in which the deformation modes induce displacements away from a mean α-helical structure: for bending modes, these displacements increase quadratically with α-helix length (δx ≈ L²/R, λ ∝ L⁴) [4]; however, for the twisting mode, these displacements increase linearly with helix length (δx ≈ Lδθ, λ ∝ L²) [4]. In this approach, the scaling of PCA eigenvalues of an α-helix was likened to the scaling of a fluctuating elastic rod in thermal equilibrium [4], irrespective of the rod’s surrounding environment. The final column of Table 1 summarizes a key conclusion made by Emberly et al. in their comparisons of the principal components of PCA with the dynamical normal modes of normal mode analysis (NMA) [4]. Unlike PCA, which summarizes a set of related static atomic structures, NMA describes protein dynamics through the collective motions of atoms [21-23]. Emberly et al. used a spring model describing the thermodynamics of a free α-helix to determine normal mode eigenvalues representative of the lowest energy deformations and described an inverse relationship between the principal component eigenvalues and the spring constants [4]. In their study, since the top three principal components agreed with the three lowest-energy normal modes, they concluded that the scaling behaviours between PCA modes and normal modes must also match [4]. By approximating an α-helix as an elastic rod, they identified that the two bending modes scale with λ ∝ L⁴ and that the twisting mode scales with λ ∝ L² [4]. In other words, the data-driven methods of PCA and the fundamental physics arguments of NMA reach the same conclusions on how α-helices behave as deformable bodies. In principle, the results of the NMA should be the same regardless of which environment the elastic rod is located in, so the α-helical normal modes identified by Emberly et al. are extendable to membrane environments [4]. The results in Table 1 show consistency in PCA scaling behaviour of mode eigenvalues between transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins. This is evidence that the way deformation depends on α-helix geometry (i.e., scales with helix length) is independent of cellular microenvironment.
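The log-log slope estimate described above can be sketched with an ordinary least-squares fit. This is a minimal illustration, assuming one eigenvalue per helix length for a given mode; the function name `fit_scaling_exponent` and the choice of unweighted least squares are assumptions made here, since the text specifies only that the slope of the log-log relationship was used.

```python
import math


def fit_scaling_exponent(lengths, eigenvalues):
    """Estimate the exponent in eigenvalue ~ prefactor * length**exponent.

    Fits a straight line to log(eigenvalue) versus log(length) by ordinary
    least squares and returns (exponent, prefactor).
    """
    xs = [math.log(length) for length in lengths]
    ys = [math.log(value) for value in eigenvalues]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the least-squares line is cov(x, y) / var(x).
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    exponent = s_xy / s_xx
    prefactor = math.exp(mean_y - exponent * mean_x)
    return exponent, prefactor
```

Applied to synthetic eigenvalues that follow an exact fourth-power law over 10≤L≤25, the fit recovers an exponent of 4, the elastic-rod value for bending; on real samples the fitted slopes would be expected to deviate from that idealized value, as the empirical entries in Table 1 do.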

The contribution of each deformation mode as a fraction of total α-helix flexibility

Next, we investigated the percentage of contribution made by each deformation mode to the overall flexibility. Since the eigenvalues each measure the variance in Å² captured by an individual deformation mode and the total variance was different in each of the three α-helical samples that we investigated, it would be worthwhile to normalize the eigenvalues across all three α-helical samples as a percentage of their total variance (from all 3L deformation modes) for 10≤L≤25. Then, eigenvalue trends can be observed independent of the differences in total variance between the three α-helix samples. The eigenvalues of the deformation modes are normalized in Fig 3 to display trends across the principal component number and trends along the α-helix length. When comparing transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins in Fig 3, the collection of sixteen lines in each panel are all generally concave up. By inspection, the blue lines, which describe the relative contribution of each deformation mode for L = 25, have a much greater concavity (steeper initial ‘slope’) than the red lines, which describe the relative contribution of each deformation mode for L = 10. This means that the fraction (λ₁+λ₂)/λ₃ is much greater in 25-residue α-helices than in 10-residue α-helices. This follows our intuition well since we expect large, exaggerated bends to hold a greater contribution to the total deformation in the longer α-helices. In fact, the percentage of variance explained by the twisting mode is lower in 25-residue α-helices than in 10-residue α-helices across all three α-helix types shown in Fig 3.
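The normalization described above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the bend-to-twist ratio is assumed here to mean the sum of the two largest eigenvalues divided by the third largest.

```python
def percent_variance_explained(eigenvalues):
    """Express each eigenvalue as a percentage of the total variance.

    eigenvalues: all 3L eigenvalues of one sample, so that their sum is
    the total variance of that sample.
    """
    total = sum(eigenvalues)
    return [100.0 * value / total for value in eigenvalues]


def bend_to_twist_ratio(eigenvalues):
    """Ratio of the two largest eigenvalues (assumed to be the bending
    modes) to the third largest (assumed to be the twisting mode)."""
    ordered = sorted(eigenvalues, reverse=True)
    return (ordered[0] + ordered[1]) / ordered[2]
```

Because every eigenvalue is divided by the same total, the ratio between modes is unchanged by the normalization, while differences in total variance between samples drop out of the comparison.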

Fig 3

Each line represents the percentage of total variance explained by the first ten principal components for α-helices of a certain length (L).

Each line represents the percentage of total variance explained by the first ten principal components for α-helices of a certain length (L).

Sixteen lines are plotted to illustrate this trend in the range 10≤L≤25. The length of the α-helix in question is represented by the colour and thickness of each line. These distributions were plotted for (A) transmembrane α-helices, (B) extramembrane α-helices, and (C) α-helices in soluble proteins. The structures of PDB entries 3JBR [24] and 5AM9 [25] are shown for illustrative purposes.

While the fourth and fifth deformation modes are not negligible in magnitude when compared with the three dominant deformation modes, we decided to focus on the first three because they capture the majority of variance explained. This is illustrated more clearly in Fig 4, where we can more closely examine how Bend 1, Bend 2, and Twist (the most prominent physical deformations) contribute the majority of variance explained in each cellular environment.

Fig 4

The percentage of total variance explained by each of the first three principal components individually (red, blue, and green) and combined (pink) for α-helices with helix lengths ( The red, blue, and green lines represent the contributions of Bend 1, Bend 2, and Twist modes respectively towards explaining the total variance. The pink line represents the summed contributions of the first three principal components towards explaining the total variance. These results are plotted for (A) transmembrane α-helices, (B) extramembrane α-helices, and (C) α-helices in soluble proteins. The structures of PDB entries 3JBR [24] and 5AM9 [25] are shown for illustrative purposes.

Following each of the pink lines in Fig 4 from left to right, the summed contributions of the first three principal components describe around 60% of the variance explained for L = 10 and the variance explained rises to around 75% as the α-helix length increases to L = 25. This observation is invariant to changes in the location of α-helices in the cell. This remarkable similarity between transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins is another indication that α-helix principal components rely primarily, if not solely, on helix geometry as opposed to cellular environment. The relative importance of the bending modes in explaining the total variance within all three samples increases as the α-helix gets longer as illustrated by the red and blue lines in Fig 4. The relative importance of the twist mode in explaining the total variance within all three samples lowers as the α-helix gets longer as illustrated by the green line in Fig 4. These directional trends match the results of the previous study done by Emberly et al. on 680 α-helices in coiled-coil structures [4]. In these coiled-coil motifs, once α-helix lengths exceeded 80 residues, higher-order harmonics of the bend mode become lower in energy than the twist mode (i.e. 
the higher-order harmonics of the bend mode explain a greater percentage of variance than the twist mode) [4]. This means that in α-helices in coiled-coil motifs with lengths greater than 80 residues (and in free α-helices with lengths exceeding 33 residues), the twisting mode will cease to be the third lowest normal mode (and therefore will no longer be the third largest eigenvalue in PCA either as we had represented in Fig 2) [4]. This is consistent with the steady decrease in the percentage of variance explained by the twisting mode in the range 10≤L≤25 across all three α-helix types that we observed in Fig 4. The diminishing importance of the twisting mode across all α-helix types as L increases implies that higher-order harmonics of the bending mode will overshadow the twisting mode in longer α-helices regardless of the α-helix’s location in the cell. This overshadowing of the twisting mode will rarely be a concern in transmembrane α-helices, and consequently transmembrane protein design since the thickness of the cell membrane imposes a natural constraint on the maximal length of transmembrane α-helices. Returning to the computational work on an α-helix’s Young’s modulus by Bavi et al., they determined that water acts as a ‘lubricant’ as the TM1 α-helix in a mechanosensitive channel pore is elongated [11]. At first glance, since the reported Young’s modulus of their simulated α-helix is higher in a vacuum than it is in water (i.e., the α-helix is stiffer in a vacuum than in water) [11], it would appear to contradict our conclusion that deformation mode scaling behaviour and percentage of variance explained are independent of cellular surroundings. Deformation modes (including the profiles of variance explained) across cellular milieus cannot be directly compared with an intensive property like Young’s modulus. 
For Bavi et al., the difference in Young’s modulus is attributed to changes in the number of hydrogen bonds between the solvent and the helix [11], but for our study, a constant number of native state hydrogen bonds are automatically accounted for in the static PDB structure of each α-helix. Consequently, it is possible to have a lower Young’s modulus in an aqueous environment, while also maintaining the same percentage of variance explained profile seen in both a membrane environment and an aqueous environment. We considered the possibility that the resolution of the protein structures used to pursue our study could affect the deformation mode eigenvalues, scaling behaviour, and percentage of total variance explained that we observe. The average resolution of soluble proteins collected in our study is 2.31 Å and the average resolution of membrane proteins collected in our study is 3.02 Å (see the histograms in S4 Fig). We repeated our analysis on structures within our original three datasets that have a resolution of ≤ 3 Å. The ten largest eigenvalues of 18-residue α-helices across the three datasets in protein structures with a resolution of ≤ 3 Å are presented in S5 Fig. Using these eigenvalues, the scaling exponents (in S2 Table) and the percentage of variance explained by each deformation mode (in S6 and S7 Figs) were calculated. The results of our high-resolution analysis closely match the ones presented in our main study, except for the extramembrane α-helices’ scaling behaviour. With a resolution of 3 Å as an upper bound, the extramembrane α-helix dataset shrank to about 20% of its original size. As presented in S2 Table, this resulted in a Bend 2 scaling exponent of 2.9 (NMA predicts a scaling exponent of 4 for bending modes) and a Twist scaling exponent of 2.1 (NMA predicts a scaling exponent of 2 for the twisting mode). Future work stemming from our analysis could go in several directions. 
We decided to use L α-carbons in each α-helix for PCA to remain consistent with Emberly et al. [4] and pursued the assumption that in any one α-helix, if side chain-environment interactions led to some native state structural deformation of the backbone, then it might be manifested in the corresponding α-carbon coordinates that we see in the PDB. It would be worthwhile to include side chain identities in PCA, which would imply that the dataset would need to be segmented by cellular microenvironment, α-helix length, as well as by sequence. This would require a far greater amount of data than is available now. Moreover, in future work, α-helices could be stratified by their degree of solvent exposure, but this would also require more data than is available now, especially for membrane proteins. In addition to including residue identity and degree of solvent exposure, future analyses could include all α-helix backbone atoms. This would open the possibility of using torsion angle representations since this approach follows the assumption that bond lengths are invariant. Since the distance between α-carbons is not uniform, this internal representation would not be accurate with the α-carbon dataset we used to pursue this study. Furthermore, an analysis of all α-helix backbone atoms could lead to an improved understanding of how the prevalence of structural irregularities such as kinks across transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins depends on α-helix length. In our analysis, the top three deformation modes are manifested as Bend 1, Bend 2, and Twist specifically because PCA outputs the principal components using an orthogonal basis. We selected PCA as it is considered a data-driven counterpart to NMA [4]. It is possible as future work to analyze the α-helix atomic coordinates using other data-driven approaches such as Independent Component Analysis (ICA), which will not force the components into an orthogonal basis. 
At the same time, the independent components likely will present the results differently in such a way that they would not be directly comparable to NMA.

Conclusion

We investigated the relationship between the cellular surroundings of an α-helix and its deformation modes by performing PCA on three α-helical samples representative of three different cellular contexts: transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins. Our findings confirmed that for α-helices with lengths in the range of 10–25 residues, the total deformation is described primarily by two nearly degenerate bending modes and a twisting mode. The eigenvalues, which quantify the variance in the sample captured by each individual deformation mode, were calculated across all three cellular milieus and used to study the scaling behaviour of the eigenvalues as a function of the α-helix length using a power law function. The scaling exponents were consistent across the three types of α-helices even though the eigenvalues were not comparable. The independence of deformation mode scaling behaviour on cellular surroundings supports the theory and applicability of normal mode analysis in diverse cellular contexts [4]. The different physical constraints of each cellular environment led to differences in the total variance of each dataset, implying that the amplitudes of individual deformation modes were different across the three different samples. We then studied the contribution of each deformation mode as a fraction of the total deformability in our α-helical samples by plotting mode eigenvalues that were normalized by the total variance of their respective datasets. From these plots, we inferred that the relative contributions of the bending modes and the twisting mode towards the total deformation relied on the length of the α-helix, and not on its environment. 
The similarity between the scaling behaviour and percentage of variance explained profiles of transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins can be incorporated in flexible templates in computational protein design to refine the structures of de novo transmembrane proteins.

Methods

667 PDB entries classified as α-helical transmembrane proteins were collected from the mpstruc database for Membrane Proteins of Known 3D Structure [10]. These PDB files have α-helix annotations. The corresponding entry for each protein was collected from the Orientations of Proteins in Membranes (OPM) Database from the University of Michigan [26, 27]. The OPM PDB files modify the standard Research Collaboratory for Structural Bioinformatics (RCSB) PDB entries by rotating the coordinate system of the 3D atomic coordinates [26, 27]. They set the origin (0,0,0) at the center of the membrane bilayer as illustrated in S8A Fig. The z-axis points toward the extracellular space and is normal to the membrane. The OPM PDB files also include the ‘½ of bilayer thickness’ remark at the top of the file [26, 27]. This reported bilayer thickness was used to determine which α-carbons are located inside the membrane. RCSB PDB files have α-helix annotation information whereas OPM PDB files have transmembrane region information. When these two pieces of information are brought together, transmembrane α-helical regions can be properly identified and annotated. Each residue (or more specifically, the α-carbon associated with each residue) of the 667 α-helical transmembrane proteins was annotated as either part of an α-helix, as part of a transmembrane region, as part of a transmembrane α-helix (both), or having no annotation (neither). Once annotation is complete, the output files are imported into MATLAB for structural alignment.

To prepare the input data for PCA, N α-helices of equal amino acid length (L) must first be superposed. The goal is to optimally overlay each candidate α-helix (represented as an L by 3 matrix) with the ideal α-helix using only translations and rotations. We parameterized an ideal α-helix with a periodicity Δθ of 3.6 residues per helix turn, a rise Δz of 1.5 Å per residue, and a radius of 2.3 Å. 
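As a rough illustration of the ideal α-helix parameterization described above, the following Python sketch generates L α-carbon coordinates from the stated periodicity, rise, and radius. The function name `ideal_helix`, the choice of the z-axis as the helix axis, and the starting phase are assumptions made for this example; they are not taken from the authors' MATLAB code.

```python
import math

def ideal_helix(L, residues_per_turn=3.6, rise=1.5, radius=2.3):
    """Return L alpha-carbon (x, y, z) coordinates of an ideal helix, in angstroms."""
    # Angular step per residue: one full turn spread over 3.6 residues (100 degrees).
    dtheta = 2.0 * math.pi / residues_per_turn
    coords = []
    for i in range(L):
        theta = i * dtheta
        # Increasing theta together with increasing z gives a right-handed helix.
        coords.append((radius * math.cos(theta),
                       radius * math.sin(theta),
                       i * rise))
    return coords

# Example: an 18-residue ideal helix (L = 18).
helix = ideal_helix(18)
```

With these parameters the rise per turn is 3.6 × 1.5 Å = 5.4 Å, which agrees with the canonical α-helix geometry.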
Complete details on α-helical superposition are in the Supporting Information with accompanying illustrations in S9 Fig. The PCA function in MATLAB [coeff, score, latent,~,explained,~] = pca(___) was used to identify principal components, calculate their associated eigenvalues, and the percentage of variance explained. This protocol was done sixteen times (10≤L≤25) for transmembrane α-helices to study the scaling relationship of deformation mode eigenvalues as a function of α-helix length. A biplot of orthonormal principal component coefficients (of the 3L PCA variables) and principal component scores for each of the N observations were used to identify pairs of extreme observations for each of the first three deformation types: Bend 1, Bend 2, and Twist. These extreme α-helix observations were used for illustrative purposes in Fig 1. The entire methodology outlined above was repeated for two other types of α-helices: extramembrane α-helices and α-helices in soluble proteins. This was done to verify Emberly et al.’s results [4] on α-helix deformation modes and to highlight any potential differences in α-helix flexibility that would arise from its dependence on the surrounding environment. The 667 PDB entries that were used to collect transmembrane α-helix data were also used to collect extramembrane α-helix data. α-carbon atomic coordinates annotated with ‘Alpha Helix’ in S8B Fig were used as extramembrane α-helix data for import into MATLAB for superposition as well as for PCA. 959 PDB entries were consulted to acquire the data for α-helices in soluble proteins. Files resembling the one in S8B Fig for soluble proteins were prepared in Python 3, and the data was imported into MATLAB for superposition and PCA as outlined in the above methodology. 
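For readers who do not use MATLAB, the PCA step can be sketched in Python with NumPy. This is a minimal illustration and not the authors' code: the function below returns quantities that play the same roles as `coeff`, `score`, `latent`, and `explained`, and the input matrix `X` is a synthetic stand-in for the N by 3L matrix of coordinate displacements.

```python
import numpy as np

def pca(X):
    """PCA of an N-by-3L matrix X (rows = helices, columns = coordinate displacements)."""
    Xc = X - X.mean(axis=0)                      # centre each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    latent = s**2 / (X.shape[0] - 1)             # eigenvalues of the sample covariance matrix
    coeff = Vt.T                                 # columns are orthonormal principal components
    score = Xc @ coeff                           # projection of each helix onto the components
    explained = 100.0 * latent / latent.sum()    # percentage of total variance per component
    return coeff, score, latent, explained

# Synthetic stand-in for real data: N = 200 "helices" with L = 10 (30 columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30)) * np.linspace(3.0, 0.1, 30)
coeff, score, latent, explained = pca(X)
```

Dividing each eigenvalue by the sum of all eigenvalues, as in `explained`, is the normalization by total variance that allows datasets with different overall variance to be compared.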
Once the main deformation modes of each α-helix type were characterized as shown in Fig 1, the scaling behaviours of each mode for each α-helix type were studied (i.e., the relationships between eigenvalues (λ) and α-helix length (L) were elucidated). The scaling exponents recorded in Table 1 were calculated using a log-log plot of the α-helix lengths (10≤L≤25) against the PCA mode eigenvalues using the Curve Fitting Toolbox in MATLAB. The three dominant deformation modes were inspected individually under a power law function. When the eigenvalue data was fit to the relationship log(λ) = a·log(L) + b, the parameter a was the appropriate scaling exponent to fulfill the λ ∝ L^a relationship in Table 1.
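The power law fit can be illustrated with an ordinary least-squares line through the log-log data. The sketch below is an assumption-laden stand-in for the Curve Fitting Toolbox step: it writes the fitted line as log(λ) = a·log(L) + b, where a is our label for the scaling exponent, and it uses synthetic eigenvalues rather than the values reported in Table 1.

```python
import math

def fit_power_law(lengths, eigenvalues):
    """Least-squares fit of log(lambda) = a*log(L) + b; returns (a, b)."""
    xs = [math.log(L) for L in lengths]
    ys = [math.log(lam) for lam in eigenvalues]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    a = sxy / sxx                 # slope of the log-log line = scaling exponent
    b = y_mean - a * x_mean       # intercept = log of the prefactor
    return a, b

# Synthetic eigenvalues that follow lambda = 0.002 * L**4 exactly (not real data).
lengths = list(range(10, 26))     # 10 <= L <= 25, sixteen helix lengths
eigenvalues = [0.002 * L**4 for L in lengths]
a, b = fit_power_law(lengths, eigenvalues)
```

For this synthetic input the fitted slope recovers the exponent of 4 that NMA predicts for bending modes; real eigenvalues would scatter around the fitted line.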

S1 Fig. The three dominant deformation modes seen in α-helices with 18 residues (L = 18).

(A)-(C) In each subfigure, α-helix 1 and α-helix 2 are individual helices from the PDB in the transmembrane α-helix dataset. More specifically, they represent the two extreme cases of each deformation mode in the transmembrane α-helix dataset.

S2 Fig. Amino acid distribution representative of 18-residue transmembrane α-helices (N = 6075), extramembrane α-helices (N = 2198), and α-helices in soluble proteins (N = 6716).

S3 Fig. The ten principal components with the largest eigenvalues (λ) from 12-, 15-, 21-, and 24-residue transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins.

S4 Fig. A pair of normalized histograms presenting the resolution of the structures used to pursue our original analysis.

(A) The membrane protein PDB entries, specifically the transmembrane α-helix and extramembrane α-helix datasets, have an average resolution of 3.02 Å. (B) The soluble protein PDB entries used in this analysis have an average resolution of 2.31 Å.

S5 Fig. The ten principal components with the largest eigenvalues (λ) from 18-residue transmembrane α-helices (N = 2617), extramembrane α-helices (N = 428), and α-helices in soluble proteins (N = 5360), all for our analysis of only high-resolution structures (≤ 3 Å).

(A) The eigenvalues (λ). (B) The eigenvalues, when normalized by total variance.

S6 Fig. Each line represents the percentage of total variance explained by the first ten principal components for α-helices of a certain length (L) for our analysis of only high-resolution structures (≤ 3 Å).

Sixteen lines are plotted to illustrate this trend in the range 10≤L≤25. The length of the α-helix in question is represented by the colour and thickness of each line. These distributions were plotted for (A) transmembrane α-helices, (B) extramembrane α-helices, and (C) α-helices in soluble proteins.

S7 Fig. The percentage of total variance explained by each of the first three principal components individually (red, blue, and green) and combined (pink) for α-helices with helix lengths (L).

The red, blue, and green lines represent the contributions of Bend 1, Bend 2, and Twist modes respectively towards explaining the total variance. The pink line represents the summed contributions of the first three principal components towards explaining the total variance. These results are plotted for (A) transmembrane α-helices, (B) extramembrane α-helices, and (C) α-helices in soluble proteins.

S8 Fig. An overview of our transmembrane α-helix annotation methods.

(A) A cartoon representation of transformed 3D atomic coordinates in the Orientations of Proteins in Membranes (OPM) Database. When |z| < ½ lipid bilayer thickness, the α-carbon is part of a transmembrane region. (B) A piece of an outputted annotation text file: The preprocessed data from the RCSB and OPM PDB files include amino acid identity, residue number, protein subunit, α-carbon coordinates measured in Å, and the appropriate annotations.
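The annotation rule in this caption can be expressed as a short Python sketch. The function name `annotate_residue`, the label strings, and the numeric values are hypothetical and chosen only for illustration; the rule itself (an α-carbon is in the membrane when |z| is less than half the bilayer thickness) is the one stated in the caption.

```python
def annotate_residue(z, in_helix, half_thickness):
    """Label one alpha-carbon from its OPM z-coordinate (angstroms) and its helix annotation."""
    in_membrane = abs(z) < half_thickness   # |z| < 1/2 bilayer thickness
    if in_helix and in_membrane:
        return "transmembrane alpha-helix"  # both annotations apply
    if in_helix:
        return "alpha-helix"                # helical but outside the membrane
    if in_membrane:
        return "transmembrane region"       # inside the membrane but not helical
    return "none"                           # neither annotation applies

# Hypothetical input: a half-thickness of 15.0 angstroms and three residues.
residues = [(2.1, True), (22.4, True), (-9.8, False)]
labels = [annotate_residue(z, h, 15.0) for z, h in residues]
```

Runs of consecutive residues carrying the first label would then be candidates for the transmembrane α-helix dataset, and runs carrying the second label for the extramembrane dataset.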

S9 Fig. An overview of our α-helix superposition methods.

(A) The candidate α-helix and the ideal α-helix are not yet optimally superposed. (B) In the first step of superposition, the centroid of the candidate α-helix is translated to the origin. (C) In the second step of superposition, the candidate α-helix is rotated with respect to the ideal α-helix. (D) The displacement between the z-coordinate of α-carbon 6 in candidate α-helix 3 of the sample and the z-coordinate of α-carbon 6 in the mean α-helix is one of many data points in the raw data for PCA. (E) The raw data for PCA is an N by 3L matrix recording the displacements between each atomic coordinate of the transformed candidate α-helix and the corresponding atomic coordinate in the mean α-helix.
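The two superposition steps in this caption (translation of the centroid to the origin, then rotation onto the ideal α-helix) can be sketched with the Kabsch algorithm, a standard least-squares rigid-body fit. The authors' exact procedure is given in their Supporting Information, so this NumPy example is a substitute technique under that assumption, and the test helix is synthetic.

```python
import numpy as np

def superpose(candidate, reference):
    """Rigidly superpose candidate (L x 3) onto reference (L x 3); return moved coordinates."""
    P = candidate - candidate.mean(axis=0)     # step 1: centroid to the origin
    Q = reference - reference.mean(axis=0)
    H = P.T @ Q                                # 3 x 3 cross-covariance of the two point sets
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T    # step 2: optimal rotation
    return P @ R.T

# Synthetic 18-residue ideal helix (3.6 residues per turn, 1.5 A rise, 2.3 A radius).
i = np.arange(18)
theta = i * 2 * np.pi / 3.6
reference = np.column_stack([2.3 * np.cos(theta), 2.3 * np.sin(theta), 1.5 * i])

# Candidate = the same helix after an arbitrary rotation and translation.
angle = 0.7
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
Rx = np.array([[1.0, 0.0, 0.0],
               [0.0, np.cos(angle), -np.sin(angle)],
               [0.0, np.sin(angle),  np.cos(angle)]])
candidate = reference @ (Rz @ Rx).T + np.array([5.0, -3.0, 12.0])

aligned = superpose(candidate, reference)
# Flattened displacements form one row of the N x 3L matrix; here the centred
# reference stands in for the mean helix of the sample.
row = (aligned - (reference - reference.mean(axis=0))).ravel()
```

Because the candidate here is an exact rigid copy of the reference, the displacements are numerically zero; for real helices the residual displacements are what PCA decomposes into bending and twisting modes.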

S1 Table. The power law relationship between the eigenvalues (λ) of the first three deformation modes and the α-helix length (L).

S2 Table. The scaling exponents derived from a power law relationship between the eigenvalues (λ) of the first three deformation modes and the α-helix length (L) for our analysis of only high-resolution structures (≤ 3 Å).

17 Jun 2021

PONE-D-21-09843

Principal component analysis of alpha-helix deformations in transmembrane proteins

PLOS ONE

Dear Dr. Xia,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that the manuscript does not fully meet PLOS ONE’s publication criteria as it currently stands. As you will see from the attached review comments, reviewer #1 has pointed out some serious reservations against the publication, and in our opinion, the comments need to be addressed before a decision can be made. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. A further review of the manuscript shall be necessary.

Please submit your revised manuscript by Aug 01 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. 
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Parag A. Deshpande Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match. When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section. 3. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? 
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: No

Reviewer #2: Yes

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

5. Review Comments to the Author. Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript is technically sound. 
Principal Component Analysis is a well known mathematical approach for identifying correlation of variables such as collective motions in systems, if the variables are defined as the deviation of Cartesian coordinates from a reference structure. The reference structure is in this case a 'perfect' a-helix peptide of variable length L. The authors show that the three first principal components correspond to deformations and a twist mode of the helical spine. These results seem to be in good agreement with the lowest energy vibrational modes predicted from NMA (Emberly).

However, the main conclusion arising from this paper is that the 'flexibility' of alpha-helices, which is described in terms of deformations and twisting modes, does not depend on the environment. That means that an alpha helix behaves similarly in solution and embedded in a membrane. This cannot be the case since the interactions (in particular H-bond, electrostatics, etc) with the environment are different (see discussion on page 7 and on Young's modulus in page 16). Since the authors only consider the Calpha of the a-helix for their PCA, specific interactions between side chains and environment are not taken into account. Furthermore, although the amino acid composition of a-helices in different environments is discussed in pages 6-8, this property is neglected when using only Calpha for PCA. Therefore I consider that the data does not support the conclusions regarding the flexibility of the entire alpha helix in different environment.

Furthermore, there are several points that should be further discussed:

1) Fig 2: Why are only the results for L=18 plotted and further discussed, since the eigenvalues computed for other L values are different (Figure 3)? Eigenvalues 4 and 5 are not negligible compared to 3.

2) In the case of soluble proteins, did the authors only select the solvent exposed a-helices, or simply all helices in the PDB were considered for the analysis? 
3) A schematic view of Bend 1 and Bend 2, in the form of structure with arrows describing the displacements, would be helpful. Does the kink of the bending change with the length?

4) What do we learn by scaling the eigenvalues? How are the scaling factors determined?

5) Caption in Fig 3 should be written in more detail. Protein structures are hardly visible.

6) What is the resolution of the structures used in this study? Are high resolution structures used for the analysis? If not, the fact that the authors do not see any difference in the PC of a-helices in different environments is most likely a result of the inaccuracy of structure determination.

Reviewer #2: The work follows in the footsteps of an older study [4] and analyses the eigenvalues of the top 3 values (and others as well) of the PCA of the superimposed helices in different environments. As a suggestion: it would be interesting to understand if these results are due to the methodology of representing the data: that is running a PCA calculation over helices that are initially superimposed: that the helices are generally placed so that their main axis is aligned implies that the first two eigenvalues will be those perpendicular to this axis (spanning the two dimensions of the plane). If an internal representation was used (e.g., torsion angles) no superposition was needed, and then maybe the main deformation directions were different? Or alternatively, if ICA was used instead of PCA, there wouldn't be the constraint that these are perpendicular to each other.

Minor comments: (1) It would be helpful if there are more details why the principal components and deformation modes are interchangeable terms (line 66); this is described in greater detail in [4], but it would be helpful if one is not required to go there. (2) Does Figure 1 show the average, and the deformed helix (similar to the figure in [4])? In this case, and otherwise, more details are needed on what exactly is shown.

6. 
PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

12 Aug 2021

Please see our Response to Reviewers, where we respond to each point raised in the Decision Letter. Our response is also included below:

RESPONSE TO COMMENTS FROM REVIEWER #1

Reviewer’s Comments: The manuscript is technically sound. Principal Component Analysis is a well known mathematical approach for identifying correlation of variables such as collective motions in systems, if the variables are defined as the deviation of Cartesian coordinates from a reference structure. The reference structure is in this case a 'perfect' a-helix peptide of variable length L. 
The authors show that the three first principal components correspond to deformations and a twist mode of the helical spine. These results seem to be in good agreement with the lowest energy vibrational modes predicted from NMA (Emberly).

Author’s Response: We thank the reviewer for their positive evaluation of our work and for their very insightful comments, which helped us refine our manuscript and the figures which accompany it. We have addressed all the reviewer’s comments as described below.

Reviewer’s Comments: However, the main conclusion arising from this paper is that the 'flexibility' of alpha-helices, which is described in terms of deformations and twisting modes, does not depend on the environment. That means that an alpha helix behaves similarly in solution and embedded in a membrane. This cannot be the case since the interactions (in particular H-bond, electrostatics, etc) with the environment are different (see discussion on page 7 and on Young's modulus in page 16).

Author’s Response: We thank the reviewer for this very important comment. We completely agree with the reviewer that the flexibility of α-helices depends strongly on the environment, due to large differences in hydrogen bonding, electrostatics, and packing between α-helices in soluble and membrane proteins. Our main conclusion in the paper is that, despite these large differences in the flexibility of α-helices from soluble and membrane proteins, Principal Component Analysis (PCA) reveals that several specific deformation properties of α-helices do not depend on their cellular environment. These include the physical nature of the top three deformation modes (two degenerate bending modes followed by the twisting mode), the percentage of total variance in helical deformation explained by each deformation mode, and the scaling behaviour of the deformation mode over the length of the helix. 
In response to the reviewer’s comment, we have added the following discussions to the manuscript to clarify that there are indeed significant differences in the overall flexibility of α-helices in different environments. Page 11, Lines 238-248: “On the topic of total variance exhibited by a helical dataset, since there are different physical constraints in the plasma membrane and the cytoplasm due to differences in hydrogen bonding and electrostatic interactions between the two environments, the total variance in helical deformation will be different in each cellular setting. Therefore, for each respective mode in transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins, the eigenvalues should not equal one another, and the amplitude of the individual deformation modes cannot be meaningfully compared across different cellular milieus. To address differences in total variance between each dataset, we normalized the eigenvalues by the total variance in their respective datasets as shown in Fig 2B. The resulting percentage of variance explained is a more worthwhile metric to compare since it describes on a percentage basis the way that total deformation is partitioned between constituent modes.” Page 12, Lines 250-257: “In the range 10≤L≤25, focusing on individual deformation modes, we found the eigenvalues between transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins were different. This suggests that the eigenvalues of the deformation modes of an α-helix depend on its cellular environment, owing to differences in the physical constraints of these environments. The amplitudes of the α-helical deformation modes rely on the environmental constraints which restrict their deformation. Other metrics such as the helix’s scaling behaviour may not necessarily be reliant on these constraints. 
To investigate this claim further, we studied the scaling behaviour of the three dominant deformation modes.”

Pages 20-21, Lines 442-454: “The eigenvalues, which quantify the variance in the sample captured by each individual deformation mode, were calculated across all three cellular milieus and used to study the scaling behaviour of the eigenvalues as a function of the α-helix length using a power law function. The scaling exponents were consistent across the three types of α-helices even though the eigenvalues were not comparable. The independence of deformation mode scaling behaviour on cellular surroundings supports the theory and applicability of normal mode analysis in diverse cellular contexts [4]. The different physical constraints of each cellular environment led to differences in the total variance of each dataset, implying that the amplitudes of individual deformation modes were different across the three different samples. We then studied the contribution of each deformation mode as a fraction of the total deformability in our α-helical samples by plotting mode eigenvalues that were normalized by the total variance of their respective datasets. From these plots, we inferred that the relative contributions of the bending modes and the twisting mode towards the total deformation relied on the length of the α-helix, and not on its environment.”

Reviewer’s Comments: Since the authors only consider the Calpha of the a-helix for their PCA, specific interactions between side chains and environment are not taken into account. Furthermore, although the amino acid composition of a-helices in different environments is discussed in pages 6-8, this property is neglected when using only Calpha for PCA. Therefore I consider that the data does not support the conclusions regarding the flexibility of the entire alpha helix in different environment.

Author’s Response: We thank the reviewer for this important comment. 
We made the decision to only consider the α-carbons of the α-helix for PCA to remain consistent with Emberly et al.’s prior work so that we can compare our results with theirs, and because α-carbon position appropriately captures the geometry of the backbone of the α-helix. Furthermore, we pursued the assumption that if side chain-environment interactions led to some native state structural deformation of the backbone of the α-helix, then it might be manifested in the corresponding α-carbon coordinates. While it is true that presenting the PCA results of only α-carbons lowers the resolution of the analysis when compared with an analysis on all atoms, the approach we took in preparing the data was consistent across all α-helical samples with very different sequences. In response to the reviewer’s comment, we added this as a caveat on Page 9, Lines 202-204. We agree that including side chain atoms’ positions in PCA would lead to a more comprehensive analysis of the interactions between side chains and the environment. However, this is not feasible at this moment due to lack of data. Since PCA requires the same number (and type) of atoms for all structures, the α-helix data will need to be segmented by cellular microenvironment, length, as well as by sequence. This would require a much greater amount of data than is available now. In response to the reviewer’s comment, we have revised the manuscript to include the following text in the Results and Discussion section (Page 19, Lines 408-415): “Future work stemming from our analysis could go in several directions. We decided to use L α-carbons in each α-helix for PCA to remain consistent with Emberly et al. [4] and pursued the assumption that in any one α-helix, if side chain-environment interactions led to some native state structural deformation of the backbone, then it might be manifested in the corresponding α-carbon coordinates that we see in the PDB. 
It would be worthwhile to include side chain identities in PCA, which would imply that the dataset would need to be segmented by cellular microenvironment, α-helix length, as well as by sequence. This would require a far greater amount of data than is available now.” Reviewer’s Comments: Furthermore, there are several points that should be further discussed: 1) Fig 2: Why are only the results for L=18 plotted and further discussed, since the eigenvalues computed for other L values are different (Figure 3)? Eigenvalues 4 and 5 are not negligible compared to 3. Author’s Response: We thank the reviewer for these important comments. Fig 2 presents the deformation modes for L = 18 alone and its purpose is to act as an introduction into the topic of PCA. For the sake of brevity, we only included L = 18 in Fig 2 because soon after, we define a metric that we found more important than the deformation mode eigenvalues: the percentage of variance explained by each deformation mode. Fig 3 and Fig 4 present this more important metric in a systematic way, for helices of all lengths (10≤L≤25). At the same time, we agree with the reviewer that it is important for readers to be able to consult the deformation mode eigenvalue results for other L values. In response to the reviewer’s comments, we have included in the manuscript the deformation mode eigenvalue results for additional values of L (L = 12, 15, 21, and 24) in a new supplementary figure (S3 Fig), which is analogous to Fig 2, and revised the manuscript’s text accordingly (Page 10, Lines 217-218). Across all values of L, we decided that the top 3 deformation modes are an appropriate cut-off since they together contribute over 60% to 75% of total variance explained as observed in Fig 4. Furthermore, these modes were studied because they are the same ones studied by Emberly et al. Eigenvalues 4 and 5 could be analyzed in the same ways that were done for eigenvalues 1 through 3, but doing so is beyond the scope of the current study. 
In response to the reviewer’s comments, we revised the manuscript to include the following text to provide further justification on why we focused on the top three deformation modes even though deformation modes 4 and 5 are not negligible in magnitude compared to the third eigenvalue (Page 16, Lines 336-340): “While the fourth and fifth deformation modes are not negligible in magnitude when compared with the three dominant deformation modes, we decided to focus on the first three because they capture the majority of variance explained. This is illustrated more clearly in Fig 4, where we can more closely examine how Bend 1, Bend 2, and Twist – the most prominent physical deformations – contribute the majority of variance explained in each cellular environment.” Reviewer’s Comments: 2) In the case of soluble proteins, did the authors only select the solvent exposed a-helices, or simply all helices in the PDB were considered for the analysis? Author’s Response: We selected a representative set of α-helices directly from PDB entries of soluble proteins. These α-helices may be either buried or exposed in the soluble protein. Future work may include further stratification of α-helices according to their degree of solvent exposure, but currently there is not enough data for doing so, especially for membrane proteins. In response to the reviewer’s comment, we have revised the manuscript to discuss this direction for future work (Page 19, Lines 415-417). Reviewer’s Comments: 3) A schematic view of Bend 1 and Bend 2, in the form of structure with arrows describing the displacements would be helpful. Author’s Response: We thank the reviewer for this helpful comment. In response to the reviewer’s suggestion, we created a new Fig 1, adding arrows with displacements to each atom. We agree with the reviewer that the new Fig 1 better illustrates to the reader how the collections of individual atom displacements lead to individual deformation modes. 
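The cut-off argument made earlier in this response (the top three deformation modes capture the majority of variance, while modes 4 and 5 are not negligible next to mode 3) can be illustrated with a minimal sketch. This is not the authors' code. The eigenvalues are hypothetical placeholders chosen only to mimic that pattern, and the function names `variance_explained` and `cumulative_top_k` are assumptions made for illustration.

```python
def variance_explained(eigenvalues):
    """Return each eigenvalue as a percentage of the total variance."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("total variance must be positive")
    return [100.0 * ev / total for ev in eigenvalues]


def cumulative_top_k(eigenvalues, k=3):
    """Percentage of total variance captured by the k largest eigenvalues."""
    ranked = sorted(eigenvalues, reverse=True)
    return sum(variance_explained(ranked)[:k])


# Hypothetical PCA eigenvalues (arbitrary units), NOT taken from the paper.
# The first three stand in for Bend 1, Bend 2 and Twist.
example_eigenvalues = [40, 30, 10, 8, 6, 3, 2, 1]

percentages = variance_explained(example_eigenvalues)
top3 = cumulative_top_k(example_eigenvalues, k=3)

for mode, pct in enumerate(percentages, start=1):
    print(f"mode {mode}: {pct:.1f}% of total variance")
print(f"top 3 modes together: {top3:.1f}%")
```

Normalizing by the total variance of each dataset is what allows samples with different overall deformability to be compared on the same scale, which is the role the responses assign to Fig 3 and Fig 4.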
Reviewer’s Comments: Does the kink of the bending change with the length? Author’s Response: We thank the reviewer for raising this very interesting question. The current study focuses on PCA-based deformation modes (e.g., bending and twisting), rather than on kinks in helices. While a quantitative analysis of kinks is beyond the scope of the current study, it is a worthwhile topic for future work. In response to the reviewer’s comment, we have revised the manuscript to discuss this direction for future work (Page 19, Lines 423-426). Reviewer’s Comments: 4) What do we learn by scaling the eigenvalues? Author’s Response: Assuming that α-helices are perfectly elastic rods, normal mode analysis (NMA) predicts that the eigenvalue associated with each dynamical normal mode has a characteristic scaling exponent when plotted against helix length. In particular, the eigenvalue associated with the bending mode is predicted to scale quartically with helix length (λ∝L^4), whereas the eigenvalue associated with the twisting mode is predicted to scale quadratically with helix length (λ∝L^2). In our current study, we compared the scaling behaviour of the PCA eigenvalues of α-helices in different cellular microenvironments. We find that the scaling exponents of the top three PCA eigenvalues of α-helices are similar across different microenvironments, and they broadly agree with the NMA-based predictions. The similarity between these scaling exponents suggests a homogeneity in how the magnitude of each deformation mode scales as a function of α-helix length across all cellular environments. Furthermore, their consistency supports the theory that the principal components exhibited in an α-helix are comparable with NMA-based dynamical normal modes described by Emberly et al. Reviewer’s Comments: How are the scaling factors determined? 
Author’s Response: The scaling exponents are determined using the Curve Fitting Toolbox in MATLAB as described in Methods (Page 23, Lines 509-514): “The scaling exponents recorded in Table 1 were calculated using a log-log plot of the α-helix lengths (10≤L≤25) against the PCA mode eigenvalues using the Curve Fitting Toolbox in MATLAB. The three dominant deformation modes were inspected individually under a power law function. When the eigenvalue data was fit to the relationship log(λ)=a log(L)+b, the parameter a was the appropriate scaling exponent to fulfill the λ∝L^a relationship in Table 1.” Furthermore, additional information is included in the Supporting Information (S1 Table and the text beneath that table): “The power law relationship for transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins was determined by establishing the best fit for the parameters a and b in log(λ)=a log(L)+b. The scaling exponent (slope) and intercept are tabulated alongside their 95% confidence intervals in parentheses.” Reviewer’s Comments: 5) Caption in Fig 3, should be written in more detail. Protein structures are hardly visible. Author’s Response: We appreciate this important feedback and agree with the reviewer’s comments. In response to the reviewer’s comments, we revised the Fig 3 caption to provide more detail on the relationship between the length of the α-helix, and the colour and thickness of each line (Page 15, Lines 329-334; Fig 3): “Fig 3. Each line represents the percentage of total variance explained by the first ten principal components for α-helices of a certain length (L). Sixteen lines are plotted to illustrate this trend in the range 10≤L≤25. The length of the α-helix in question is represented by the colour and thickness of each line. These distributions were plotted for (A) transmembrane α-helices, (B) extramembrane α-helices, and (C) α-helices in soluble proteins. 
The structures of PDB entries 3JBR [24] and 5AM9 [25] are shown for illustrative purposes.” Furthermore, in response to the reviewer’s comments, we increased the size of the images of the protein structures in both Fig 3 and Fig 4. Reviewer’s Comments: 6) What is the resolution of the structures used in this study? Are high resolution structures used for the analysis? If not, the fact that the authors do not see any difference in the PC of a-helices in different environments is most likely a result of the inaccuracy of structure determination. Author’s Response: We thank the reviewer for this very important comment. In response to the reviewer’s comment, we have revised the manuscript to include a new supplementary figure S4 Fig showing normalized histograms of the resolutions of the structures used in our study for both membrane and soluble proteins. This figure shows that the average resolution of proteins used in this study is 2.31 Å for soluble proteins and 3.02 Å for membrane proteins. The reviewer raises an important point that since soluble proteins have, on average, a better resolution than membrane proteins, our results and conclusions could potentially be confounded by the low-resolution structures included in our analysis. To investigate this, we repeated our study using only high-resolution structures for both membrane and soluble proteins with a resolution of ≤ 3 Å. Our results are broadly consistent, and our conclusions remain unchanged. We present these new results using exclusively α-helices from high-resolution structures in the Supporting Information (S5 Fig, S2 Table, S6 Fig, and S7 Fig). We added a new part to our discussion (Pages 18-19, Lines 393-406) that describes the high-resolution analysis we pursued: “We considered the possibility that the resolution of the protein structures used to pursue our study could affect the deformation mode eigenvalues, scaling behaviour, and percentage of total variance explained that we observe. 
The average resolution of soluble proteins collected in our study is 2.31 Å and the average resolution of membrane proteins collected in our study is 3.02 Å (see the histograms in S4 Fig). We repeated our analysis on structures within our original three datasets that have a resolution of ≤ 3 Å. The ten largest eigenvalues of 18-residue α-helices across the three datasets in protein structures with a resolution of ≤ 3 Å are presented in S5 Fig. Using these eigenvalues, the scaling exponents (in S2 Table), and the percentage of variance explained by each deformation mode (in S6 Fig and S7 Fig) were calculated. The results of our high-resolution analysis closely match the ones presented in our main study, except for the extramembrane α-helices’ scaling behaviour. With a resolution of 3 Å as an upper bound, the extramembrane α-helix dataset shrank to about 20% of its original size. As presented in S2 Table, this resulted in a Bend 2 scaling exponent of 2.9 (NMA predicts a scaling exponent of 4 for bending modes) and a Twist scaling exponent of 2.1 (NMA predicts a scaling exponent of 2 for the twisting mode).” RESPONSE TO COMMENTS FROM REVIEWER #2 Reviewer’s Comments: The work follows in the footsteps of an older study [4] and analyses the eigenvalues of the top 3 values (and others as well) of the PCA of the superimposed helices in different environments. Author’s Response: We thank the reviewer for their positive evaluation of our work and for their helpful suggestions on improving the clarity of our work and on suggesting future directions for our work. We revised our manuscript to address their comments, as described below. 
Reviewer’s Comments: As a suggestion: it would be interesting to understand if these results are due to the methodology of representing the data: that is running a PCA calculation over helices that are initially superimposed: that the helices are generally placed so that their main axis is aligned implies that the first two eigenvalues will be those perpendicular to this axis (spanning the two dimensions of the plane). If an internal representation was used (e.g., torsion angles) no superposition was needed, and then maybe the main deformation directions were different? Or alternatively, if ICA was used instead of PCA, there wouldn't be the constraint that these are perpendicular to each other. Author’s Response: We thank the reviewer for this important suggestion. While the purpose of the current study is to compare the deformation behaviour of α-helices in membrane versus soluble proteins within the analytical framework of Emberly et al., who used Cartesian representation and PCA, we completely agree with the reviewer that torsion angle representation and ICA are important topics for future work. While the current study only uses α-carbons, we believe that the torsion angle representation would be more appropriate for a future analysis done on all backbone atoms since this approach would assume that bond lengths are invariant. In addition, we agree with the reviewer that it would be interesting to apply ICA to our dataset, especially because ICA does not have an orthogonal basis, but at the same time it would be more difficult to directly compare ICA results to normal mode analysis (NMA). In response to the reviewer’s comments, we revised the manuscript to include the following discussion of future work (Page 19, Lines 419-423): “In addition to including residue identity and degree of solvent exposure, future analyses could include all α-helix backbone atoms. 
This would open the possibility of using torsion angle representations since this approach follows the assumption that bond lengths are invariant. Since the distance between α-carbons is not uniform, this internal representation would not be accurate with the α-carbon dataset we used to pursue this study.” Page 20, Lines 428-434: “In our analysis, the top three deformation modes are manifested as Bend 1, Bend 2, and Twist specifically because PCA outputs the principal components using an orthogonal basis. We selected PCA as it is considered a data-driven counterpart to NMA [4]. It is possible as future work to analyze the α-helix atomic coordinates using other data-driven approaches such as Independent Component Analysis (ICA), which will not force the components into an orthogonal basis. At the same time, the independent components likely will present the results differently in such a way that they would not be directly comparable to NMA.” Reviewer’s Comments: Minor comments: (1) It would be helpful if there are more details why the principal components and deformation modes are interchangeable terms (line 66) -- this is described in greater detail in [4], but it would be helpful if one is not required to go there. Author’s Response: We thank the reviewer for this helpful comment. We had originally written that line to state the interchangeability of the terms ‘principal components’ and ‘deformation modes’ stemming from Emberly et al. [4] so that there would be no doubt in the reader’s mind about this. We agree that we should expand this statement to more clearly justify this. 
In response to the reviewer’s comment, we have revised the manuscript to provide more detail on the interchangeability between these two terms (Pages 3-4, Lines 65-68): “In this context, principal components and deformation modes are interchangeable terms because they both originate from two distinct models (PCA and normal mode analysis) that draw similar conclusions on the flexibility of an α-helix.” Reviewer’s Comments: (2) Does Figure 1 show the average, and the deformed helix (similar to the figure in [4])? In this case, and otherwise, more details are needed on what exactly is shown. Author’s Response: This is an excellent point. For the sake of clarity, we should describe in greater detail exactly what α-helix 1 and α-helix 2 mean and where they come from. In each subplot (A)-(C), α-helix 1 and α-helix 2 are individual helices from the PDB in our transmembrane α-helix dataset. They are plotted specifically because they represent the most extreme cases in our dataset of each deformation type, which makes it easier for the reader to see what the principal components should look like. They are here as a visual aid to provide a more intuitive understanding of what the three dominant principal components resolved from an α-helix are. Based on Reviewer #1’s suggestion, we decided to create a new Fig 1 and relocate the Fig 1 from the original manuscript to the Supporting Information (S1 Fig). Fig 1, S1 Fig, and their captions reflect the changes described in this response (Page 4, Lines 80-86; Fig 1). Submitted filename: Response to Reviewers.docx 31 Aug 2021 Principal component analysis of alpha-helix deformations in transmembrane proteins PONE-D-21-09843R1 Dear Dr. Xia, We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. 
Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Parag A. Deshpande Academic Editor PLOS ONE Additional Editor Comments (optional): Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #1: All comments have been addressed Reviewer #2: All comments have been addressed ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. 
Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Yes ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: N/A ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: No Reviewer #2: No ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: As I mentioned before. The authors address all points in a precise manner. 
Thus, I recommend the current version of the manuscript for publication Reviewer #2: (No Response) ********** 7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No 6 Sep 2021 PONE-D-21-09843R1 Principal component analysis of alpha-helix deformations in transmembrane proteins Dear Dr. Xia: I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. Parag A. Deshpande Academic Editor PLOS ONE