
Statistical Analysis of Terminal Extensions of Protein β-Strand Pairs.

Ning Zhang, Shan Gao, Lei Zhang, Jishou Ruan, Tao Zhang.

Abstract

Long-range interactions, which are required for accurate prediction of the tertiary structures of β-sheet-containing proteins, are still difficult to simulate. Many computational efforts have been made to remedy this problem and to facilitate β-sheet structure prediction. However, known efforts on β-sheets mainly focus on interresidue contacts or amino acid partners. In this study, to go one step further, we studied β-sheets on the strand level by performing a statistical analysis of the terminal extensions of paired β-strands. In most cases, the two paired β-strands have different lengths, and terminal extensions exist. The terminal extensions are the parts of the paired strands that extend beyond the common paired part. However, we found that the best pairing requires a terminal alignment and that β-strands tend to pair so as to make bigger common parts. As a result, in 96.97% of β-strand pairs the ratio of the paired common part to the whole length is at least 25%. Also, in 94.26% and 95.98% of β-strand pairs the ratio of the paired common part to the length of the first and the second β-strand, respectively, is at least 40%. Interstrand register predictions that search for interacting β-strands among several alternative offsets should comply with this rule, which reduces the computational search space and can improve the performance of algorithms.


Year:  2013        PMID: 23424587      PMCID: PMC3569888          DOI: 10.1155/2013/909436

Source DB:  PubMed          Journal:  Adv Bioinformatics        ISSN: 1687-8027


1. Introduction

Protein structure prediction is still extremely challenging in bioinformatics [1, 2]. Usually, structural information for protein sequences with no detectable homology to a protein of known structure can be obtained by predicting the arrangement of their secondary structural elements [3]. The two predominant protein secondary structures are α-helices and β-sheets. However, whereas the early availability of suitable α-helical model systems and sustained research have resulted in a detailed understanding of the α-helix, comparatively little is known about the β-sheet [4]. Tertiary structures of β-sheet-containing proteins are especially difficult to simulate [3, 5]. Unlike α-helices, β-sheets are more complex because they result from a combination of two or more disjoint peptide segments, called β-strands. Therefore, the β-sheet topology is very useful for elucidating protein folding pathways [6, 7], for predicting tertiary structures [3, 8–11], and even for designing new proteins [12–14]. As fundamental components, β-sheets are abundant in protein domains. In a β-sheet, multiple β-strands are held together by hydrogen bonds, and adjacent strands can be classified as parallel or antiparallel. Adjacent β-strands bring residues that are distant in the sequence into close spatial contact with one another and constitute a specific mode of amino acid pairing interactions (like DNA base pairing) [1, 15–17]. There is a growing recognition of the importance of the strand-to-strand interactions among β-sheets [18]. Several studies, including statistical studies examining frequencies of nearest-neighbor amino acids in β-sheets, found significantly different preferences for certain interstrand amino acid pairs at nonhydrogen-bonded and hydrogen-bonded sites [1, 17, 19, 20]. Dou et al. [21] created a comprehensive database of interchain β-sheet (ICBS) interactions.
We also developed the SheetsPair database [22] to compile both the interchain and the intrachain amino acid pairs. Generally speaking, previous work on β-sheets mainly focused on interresidue contacts or amino acid partners [23–28]. Prediction of interresidue contacts in β-sheets is interesting, and ab initio structure prediction is also useful for understanding protein folding [29, 30]. Our previous studies showed that the interstrand amino acid pairs play a significant role in determining the parallel or antiparallel orientation of β-strands [15] and that the statistical results could possibly be used to predict the β-strand orientation [16]. Cheng and Baldi [11] introduced the BETAPRO method to predict and assemble β-strands into a β-sheet, in which a single misprediction of one amino acid pairing in the first stage can be amplified by the subsequent stages and result in a seriously wrong set of partner assignments between β-strands. However, those studies can be viewed as initial steps of β-sheet studies relative to predicting strand-level pairing [25]. In this paper, to go one step further, we investigate β-strand pairing on the strand level in order to explore the rules by which β-strands form a β-sheet. Many results have shown the importance of statistical analysis in protein structure studies [15, 16]. In particular, statistical information could provide a starting point for de novo computational design methods, which are now becoming successful for short, single-chain proteins [14], as well as for methods of protein structure prediction and for understanding protein folding mechanisms [31, 32]. Fooks et al. [1] also indicated that such statistical analysis results would be useful for protein structure prediction. Therefore, we advocate using the tools of statistics and informatics to study β-sheets and to generate new rules for algorithm development. In this study, we focused on the terminal extensions of paired β-strands.

2. Results

2.1. Dataset

All protein structure data used in this study were taken from a PISCES [33, 34] dataset generated on May 16, 2009. In the dataset, the percentage identity cutoff is 25%, the resolution cutoff is 2.0 angstroms, and the R-factor cutoff is 0.25. Secondary structures were assigned from the experimentally determined tertiary structures by using the DSSP program. In addition to excluding proteins containing disordered regions [35–37], all data were further preprocessed according to the following criteria: (i) protein chains containing no β-sheet were removed; (ii) protein chains with nonstandard three-letter residue names (such as DPN, EFC, ABA, C5C, PLP, etc.) were removed, since these indicate that the protein chains have covalently bound ligands or modified residues; (iii) protein chains with uncertain structures or incorrect data were removed. Since β-bulges tend to be isolated and rare [11], we did not consider β-bulges in this study, following several previous studies [1, 3]. Note that in the special case of β-bulges, no amino acid pair is assigned. Finally, 2,315 protein chains were extracted, containing 19,214 β-strand pairs.

2.2. The β-Sheet Structure

A β-sheet, in which two or more β-strands are arranged in a specific conformation, is illustrated in Figure 1(a) by a protein example (PDB code 1HZT). Adjacent strands, the so-called strand pairs, can run either in the same direction (parallel) or in opposite directions (antiparallel). In protein 1HZT, there are 3 β-sheets called A, B, and C, formed by 10 different β-strands numbered from 1 to 10, making 7 different β-strand pairs. The 10 β-strands can be named by the β-sheet each belongs to and by their index numbers in the order of partnership. For example, the 3 β-strands forming β-sheet A can be called “A1,” “A2,” and “A3,” while the 4 β-strands forming β-sheet B can be called “B1,” “B2,” “B3,” and “B4,” respectively. “A1-A2,” “A2-A3,” “B1-B2,” “B2-B3,” and “B3-B4” are all β-strand pairs. Sequences of the 10 β-strands with their initial and ending residue numbers are also given in Figure 1(b).
Figure 1

An illustrated example of β-strand pairing in a β-sheet (PDB code: 1HZT). (a) The sketch of the tertiary structure of the protein produced by using RASMOL. Protein 1HZT is an α/β protein with 10 β-strands numbered from 1 to 10, forming seven different strand pairs. (b) The sequences of the 10 β-strands with their initial and ending residue numbers. (c) The 10 β-strands in the linear primary sequence. (d) An example of a β-strand partnership graph. The pairing is between strands “B3” and “B4,” with the light gray box representing the common pairing part.

2.3. Different Lengths of Paired β-Strands

For a β-strand pair, the terminal of one β-strand does not always align with the terminal of the other (Figure 2), making “terminal extensions” besides the common paired parts. Note that only amino acids in the common part construct amino acid pairs.
Figure 2

A schematic diagram of terminal extensions of β-strand pairs. The two blank lines represent the two β-strands, respectively. The light gray box represents the common pairing part of the two β-strands with amino acid pairing.

Why do “terminal extensions” exist so widely in β-strand pairs? We first investigated the lengths of the two paired β-strands and then calculated, for each length difference, the percentage of pairs in which each “terminal extension” does or does not exist. Results are shown in Table 1.
Table 1

Statistical results of lengths of two paired β-strands and percent of samples in each case whether the two “terminal extensions” exist or not.

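| Abs (SL 1 − SL 2)* | Number of pairs | Percent | Percent of Et 1 = 0 and Et 2 = 0 | Percent of Et 1 = 0 and Et 2 > 0 | Percent of Et 1 > 0 and Et 2 = 0 | Percent of Et 1 > 0 and Et 2 > 0 |
|---|---|---|---|---|---|---|
| 0 | 5673 | 29.53% | 82.95% | 0.00% | 0.00% | 17.05% |
| 1 | 5633 | 29.32% | 0.00% | 40.48% | 44.70% | 14.82% |
| 2 | 3170 | 16.50% | 0.00% | 30.57% | 41.10% | 28.33% |
| 3 | 1798 | 9.36% | 0.00% | 28.59% | 31.15% | 40.27% |
| 4 | 1016 | 5.29% | 0.00% | 26.77% | 29.82% | 43.41% |
| 5 | 618 | 3.22% | 0.00% | 29.13% | 27.51% | 43.37% |
| 6 | 401 | 2.09% | 0.00% | 30.42% | 25.69% | 43.89% |
| 7 | 323 | 1.68% | 0.00% | 25.39% | 32.51% | 42.11% |
| 8 | 247 | 1.29% | 0.00% | 16.19% | 30.36% | 53.44% |
| 9 | 101 | 0.53% | 0.00% | 26.73% | 24.75% | 48.51% |
| 10 | 69 | 0.36% | 0.00% | 20.29% | 39.13% | 40.58% |
| >10 | 165 | 0.86% | 0.00% | 15.15% | 27.27% | 57.58% |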

 *Absolute value of the difference of SL 1  −  SL 2.

As shown in Table 1, pairs in which the two β-strands have the same length account for only 29.53% of all samples. In the other 70.47% of samples, the lengths of the two paired β-strands are different.

2.4. Statistical Results of Variables

We define the following variables. Let SL 1 and SL 2 represent the lengths of the two paired β-strands, respectively. The length of the β-strand with the smaller strand number (strand numbers can be obtained from the PDB database) is defined as SL 1, while the length of the other β-strand is defined as SL 2. Let PL stand for the length of the common part, which is often smaller than SL 1 and SL 2. Terminal extensions can be found in either of the two β-strands. We define the lengths of the two terminal extensions as Et 1 and Et 2, respectively. The length of the terminal extension of the β-strand with length SL 1 is defined as Et 1, while the other is defined as Et 2. Let EL represent the whole length; EL = PL + Et 1 + Et 2. Then, the pairing ratio R can be calculated by

$$R = \frac{PL}{EL} \times 100\%.$$

The ratio of the common paired part to the length of each β-strand (i = 1, 2) can be calculated by

$$Rt_i = \frac{PL}{SL_i} \times 100\%, \quad i = 1, 2.$$

A small percentage of β-strand pairs have no “terminal extensions”; for these, the R, Rt 1, and Rt 2 values are all 100%. We calculated PL, Et 1, Et 2, and EL for all β-strand pairs in the present dataset. Table 2 gives the range of these variables as well as the averages and standard deviations.
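The definitions above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: it takes R as PL/EL and Rt i as PL/SL i (as described for Figure 4), and the class name `StrandPair`, its field names, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrandPair:
    """One paired beta-strand record (hypothetical container, lengths in residues)."""
    sl1: int  # SL 1: length of the strand with the smaller strand number
    sl2: int  # SL 2: length of the other strand
    pl: int   # PL: length of the common paired part
    et1: int  # Et 1: terminal extension length of the SL 1 strand
    et2: int  # Et 2: terminal extension length of the SL 2 strand

    @property
    def el(self) -> int:
        # EL = PL + Et 1 + Et 2, the whole length of the pair
        return self.pl + self.et1 + self.et2

    def ratios(self) -> tuple[float, float, float]:
        """Return (R, Rt 1, Rt 2) as percentages."""
        r = 100.0 * self.pl / self.el
        rt1 = 100.0 * self.pl / self.sl1
        rt2 = 100.0 * self.pl / self.sl2
        return r, rt1, rt2


# Made-up example values, not taken from the dataset.
pair = StrandPair(sl1=6, sl2=5, pl=4, et1=2, et2=1)
r, rt1, rt2 = pair.ratios()
print(f"EL={pair.el}  R={r:.2f}%  Rt1={rt1:.2f}%  Rt2={rt2:.2f}%")
```

For a pair without terminal extensions (for instance `StrandPair(5, 5, 5, 0, 0)`), all three ratios evaluate to 100%, matching the statement in the text.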
Table 2

Statistical results of variables of β-strand pairs in the current dataset.

| Variable | Minimum value | Maximum value | Average | Standard deviation |
|---|---|---|---|---|
| SL 1 | 1 | 25 | 4.99 | 2.82 |
| SL 2 | 1 | 25 | 4.90 | 2.80 |
| PL | 1 | 23 | 4.86 | 2.26 |
| Et 1 | 0 | 18 | 1.15 | 1.79 |
| Et 2 | 0 | 22 | 1.03 | 1.64 |
| EL | 2 | 29 | 7.03 | 3.09 |

We also calculated R, Rt 1, and Rt 2 for all β-strand pairs in the present dataset. The distribution of these variables is shown in Figure 3.
Figure 3

Distribution of R, Rt 1, and Rt 2 variables in the current dataset.

3. Discussion

3.1. Strands Tend to Align Their Terminals

For the 70.47% of samples in which the two strands have different lengths, the differences are not big in most cases. Each length difference above 5 accounts for only a small percentage of samples (at most 2.09%, and about 6.8% in total). When the lengths differ, the two strands obviously cannot align both terminals (that is, have both Et 1 = 0 and Et 2 = 0). They have two options: either align at only one terminal, leaving one “terminal extension,” or align at neither terminal, leaving two “terminal extensions.” It can be seen from Table 1 that most β-strand pairs fall into the former case. For example, for a length difference of 1, the former case accounts for 85.18% of pairs while the latter accounts for only 14.82%. This is consistent with the case of same-length strand pairs, in which β-strands tend to align their terminals with each other. These results suggest that β-strands tend to align their terminals: in different-length strand pairs, they still retain one terminal alignment, although they cannot align both ends.
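The percentages discussed in this paragraph can be re-derived from Table 1 with simple arithmetic. The short Python sketch below copies the relevant Table 1 values by hand; the variable names are illustrative and are not part of the original analysis.

```python
# Number of pairs per length difference, copied from Table 1.
pairs_by_diff = {
    "0": 5673, "1": 5633, "2": 3170, "3": 1798, "4": 1016, "5": 618,
    "6": 401, "7": 323, "8": 247, "9": 101, "10": 69, ">10": 165,
}
total_pairs = sum(pairs_by_diff.values())  # should equal the 19,214 pairs in the dataset

# Share of pairs whose length difference is above 5.
above_5 = sum(pairs_by_diff[k] for k in ("6", "7", "8", "9", "10", ">10"))
share_above_5 = 100.0 * above_5 / total_pairs

# Length difference 1: percentages copied from the Table 1 row.
one_end_aligned = 40.48 + 44.70   # exactly one terminal extension
no_end_aligned = 14.82            # both terminal extensions present

print(f"total pairs: {total_pairs}")
print(f"length difference > 5: {share_above_5:.2f}% of pairs")
print(f"difference 1, one terminal aligned: {one_end_aligned:.2f}%")
print(f"difference 1, no terminal aligned: {no_end_aligned:.2f}%")
```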

3.2. Small “Terminal Extensions”

From Table 2, it can be seen that β-strands are not very long, ranging from 1 to 25 residues with an average length of about 5 amino acids. The averages and the standard deviations of the lengths of the two paired β-strands (SL 1 and SL 2) are similar. The length of the common part PL has a range similar to that of the lengths of β-strands. This indicates that although “terminal extensions” exist, the common paired parts occupy most of each β-strand, while the “terminal extensions” occupy only a small part. This interpretation is also supported by the fact that the maximum value of EL is 29, only a little bigger than the maximum strand length, and by the fact that on average each “terminal extension” contains only about 1 amino acid (Et 1 = 1.15 and Et 2 = 1.03 in Table 2). Figure 3 gives the percentage of samples for R, Rt 1, and Rt 2 in each range of their possible values (from 0% to 100%). It can be seen that the distributions of Rt 1 and Rt 2 are similar. More than half of the β-strand pairs have these two variables above 95% (that is, in the range 95–100%). A big Rt 1 or Rt 2 means a big common part of the β-strands, or small “terminal extensions.” Few β-strand pairs have small values of R, Rt 1, and Rt 2, which indicates that most β-strands do not pair with a small “common part” or big “terminal extensions.” It can be concluded from the results that β-strands tend to pair with bigger common parts, leaving smaller “terminal extensions.”

3.3. Possible Reasons for β-Strand Extensions

Why do “terminal extensions” exist so widely in β-strand pairs? The fact that the lengths of two paired β-strands are not the same in most cases, as shown in Table 1, may be one of the possible reasons. If paired β-strands have the same length, most of them (82.95%) tend to align their terminals with each other, leaving no “terminal extensions.” A β-strand is led to pair with another by several kinds of potential forces. Steward and Thornton [3] indicated that a single β-strand was still able to recognize a noninteracting β-strand with greater accuracy than is the case for two random sequences. The potential forces include hydrogen bonds, van der Waals forces, electrostatic interactions, ionic bonds, hydrophobic effects, and so forth. Parisien and Major [38] revealed that, among all the forces, the most important one was the construction of a hydrophobic face. It is conceivable that a residue of one β-strand prefers to pair with a residue of another so as to reach a stable state of hydrophobic effects. Optimizing such interactions may result in extensions, which could be the second reason, since more often than not the “terminal alignment” is not the optimal pairing. A third possible reason could be the nucleation events that initiate β-sheet folding: amino acids in the central part could pair first, and the pairing could then extend toward the terminals. Another reason is the role of the nonpaired terminal amino acids in stabilizing the β-sheet structure. Several other studies have identified their key roles in modulating protein folding rates, stability, and folding mechanism [39–43]. Therefore, the β-strand terminals could also be important factors in β-sheet formation.

3.4. Ratio Rule of Pairing Strand Alignment

To quantify the pairing common part of paired β-strands, we calculated the cumulative percent of variables R, Rt 1, and Rt 2 and depicted them in Figure 4.
Figure 4

Cumulative percentages (CPs) of R, Rt 1, and Rt 2 calculated from the present dataset. The horizontal axis denotes the percentage of the common paired region PL relative to EL (for curve R) or to SL (for curves Rt 1 and Rt 2). Points on the R curve denote the cumulative percentages of samples whose R = PL/EL is equal to or bigger than the corresponding abscissa value. Points on the Rt 1 and Rt 2 curves denote the cumulative percentages of samples whose Rt 1 = PL/SL 1 or Rt 2 = PL/SL 2 is equal to or bigger than the corresponding abscissa value, respectively.

From Figure 4, it can be seen that when Rt 1 ≥ 40% and Rt 2 ≥ 40%, the cumulative percentages reach 94.26% and 95.98%, respectively, while for R ≥ 40% the cumulative percentage is only 89.89%. When R ≥ 25%, the cumulative percentage reaches 96.97%. Therefore, a rule for the alignment of a β-strand pair can be stated as follows:

$$R \geq 25\% \quad \text{and} \quad Rt_i \geq 40\%, \quad i = 1, 2.$$

Almost all samples (above 94%) obey this rule. In a β-strand alignment prediction algorithm, all possible pairings should be examined and scored, which is a time-consuming task. Kato et al. [44] stated that prediction of planar β-sheet structures was NP-hard in the present state of our knowledge (http://en.wikipedia.org/wiki/NP-hard). The rule above could be used as a constraint on the relative positions in β-strand alignment to reduce the computational search space, which could help to develop high-speed β-strand topology prediction algorithms.

4. Conclusion

At the most straightforward level, full “identification” of a β-strand pair could consist of (i) finding the interacting partner β-strand(s), (ii) predicting the relative orientation (i.e., parallel or antiparallel), and (iii) shifting the relative positions of the two interacting β-strands [15, 16]. In this study, we focused on the third aspect. The formation of protein structure and the protein folding mechanism are very complex, and the mechanisms of β-sheet formation are unclear [45]. However, simple rules could contribute to developing new algorithms for the full prediction of β-sheets and to understanding protein folding pathways in ongoing research. In this study, to go one step further, we studied β-sheets on the strand level instead of the amino acid level. Statistical analyses of the terminal extensions of paired β-strands were performed, and a simple rule, “R ≥ 25% and Rt i ≥ 40%, i = 1, 2,” was derived. Steward and Thornton [3] developed an information theory approach to predict the relative offset positions by shifting one β-strand up to 10 residues to either side of the observed position. Such a rule could be used in similar studies. We believe that the conclusions presented in this study could contribute to predicting protein structures and to developing β-sheet prediction methods.
References (10 of 37 listed)

1.  A minimal peptide scaffold for beta-turn display: optimizing a strand position in disulfide-cyclized beta-hairpins.

Authors:  A G Cochran; R T Tong; M A Starovasnik; E J Park; R S McDowell; J E Theaker; N J Skelton
Journal:  J Am Chem Soc       Date:  2001-01-31       Impact factor: 15.419

2.  Contributions of residue pairing to beta-sheet formation: conservation and covariation of amino acid residue pairs on antiparallel beta-strands.

Authors:  Y Mandel-Gutfreund; S M Zaremba; L M Gregoret
Journal:  J Mol Biol       Date:  2001-02-02       Impact factor: 5.469

3.  Matching protein beta-sheet partners by feedforward and recurrent neural networks.

Authors:  P Baldi; G Pollastri; C A Andersen; S Brunak
Journal:  Proc Int Conf Intell Syst Mol Biol       Date:  2000

4.  Protein beta-sheet nucleation is driven by local modular formation.

Authors:  Brent Wathen; Zongchao Jia
Journal:  J Biol Chem       Date:  2010-04-10       Impact factor: 5.157

5.  Ranking the factors that contribute to protein beta-sheet folding.

Authors:  Marc Parisien; François Major
Journal:  Proteins       Date:  2007-09-01

6.  Prediction of the parallel/antiparallel orientation of beta-strands using amino acid pairing preferences and support vector machines.

Authors:  Ning Zhang; Guangyou Duan; Shan Gao; Jishou Ruan; Tao Zhang
Journal:  J Theor Biol       Date:  2009-12-24       Impact factor: 2.691

7.  The interstrand amino acid pairs play a significant role in determining the parallel or antiparallel orientation of beta-strands.

Authors:  Ning Zhang; Jishou Ruan; Guangyou Duan; Shan Gao; Tao Zhang
Journal:  Biochem Biophys Res Commun       Date:  2009-06-18       Impact factor: 3.575

8.  Dynamic programming algorithms and grammatical modeling for protein beta-sheet prediction.

Authors:  Yuki Kato; Tatsuya Akutsu; Hiroyuki Seki
Journal:  J Comput Biol       Date:  2009-07       Impact factor: 1.479

9.  Modulating protein folding rates in vivo and in vitro by side-chain interactions between the parallel beta strands of green fluorescent protein.

Authors:  J S Merkel; L Regan
Journal:  J Biol Chem       Date:  2000-09-22       Impact factor: 5.157

10.  A cross-strand Trp Trp pair stabilizes the hPin1 WW domain at the expense of function.

Authors:  Marcus Jäger; Maria Dendle; Amelia A Fuller; Jeffery W Kelly
Journal:  Protein Sci       Date:  2007-08-31       Impact factor: 6.725
