Ning Zhang1, Yuanming Feng2, Shan Gao3, Jishou Ruan4, Tao Zhang5. 1. Department of Biomedical Engineering, Tianjin University, Tianjin Key Lab of BME Measurement, Tianjin, 300072, PR China; College of Life Sciences, Nankai University, Tianjin, PR China, 300071. 2. Department of Biomedical Engineering, Tianjin University, Tianjin Key Lab of BME Measurement, Tianjin, 300072, PR China. 3. College of Mathematical Science, Nankai University, Tianjin 300071, PR China; College of Life Sciences, Nankai University, Tianjin, PR China, 300071. 4. College of Mathematical Science, Nankai University, Tianjin 300071, PR China; State Key Laboratory for Medical Chemical and Biology at Nankai University, Tianjin, PR China, 300071. 5. College of Life Sciences, Nankai University, Tianjin, PR China, 300071.
Abstract
The folding of denatured proteins into their native conformations is called Anfinsen's dogma, and is the rationale for predicting protein structures based on primary sequences. Through the last 40 years of study, all available algorithms which either predict 3D or 2D protein structures, or predict the rate of protein folding based on the amino acid sequence alone, are limited in accuracy (80 %). This fact has led some researchers to look for the lost information, from mRNA to protein sequences, and it encourages us to rethink the rationale of Anfinsen's dogma. In this study, we focus on the relationship between the strand and its partners. We find two rules based on a non-redundant dataset taken from the PDB database. We refer to these two rules as the "first coming first pairing" rule and the "loveless" rule. The first coming first pairing rule indicates that a given strand prefers to pair with the next strand, if the connected region is flexible enough. The loveless rule means that the affinities between a given strand and another strand are comparable to the affinity between the given strand and its partner. Of course, the affinities between the given strand and a helix/coil peptide are significantly less than the affinity between the given strand and its partner. These two rules suggest that in protein folding, we have folding taking place during translation, and suggest also that a denatured protein is not the same as its primary sequence. Rechecking the original Anfinsen experiments, we find that the method used to denature protein in the experiment simply breaks the disulfide bonds, while the helices and sheets remain intact. In other words, denatured proteins still retain all helices and beta sheets, while the primary sequence does not. Although further verification via biological experiments is needed, our results as shown in this study may reveal a new insight for studying protein folding.
Keywords:
beta-sheet; helix; near-neighbor pairing; protein folding; strand-level
Anfinsen's dogma holds that the 3D structure of a protein is fully determined by its amino acid sequence (Anfinsen, 1973[1]). It is the rationale behind the protein folding problem and de novo structure prediction, both of which have made tremendous progress; examples include the fragment assembly method (Bradley et al., 2005[6]; Lee et al., 2005[22]; Fujitsuka et al., 2006[14]) and the TASSER method (Wu et al., 2007[37]; Zhou et al., 2007[44]; Zhou and Skolnick, 2007[45]). However, none of the available algorithms that describe how amino acid sequences fold into their native structures (Fooks et al., 2006[13]; Parisien and Major, 2007[29]; Dorn and Souza, 2010[10]) has reached the ideal accuracy. Protein folding kinetics and protein design remain challenging problems (Huang and Gromiha, 2010[17]; Bowman et al., 2011[5]). Why does de novo protein structure prediction face such obstacles? We should rethink Anfinsen's dogma. Does it have flaws? Is a denatured protein really the same as its primary sequence?
The functional areas on protein tertiary structures often involve secondary structure elements (i.e., α-helices and β-sheets). The study of secondary structures is very important for understanding protein folding and solving structure prediction problems (Steward and Thornton, 2002[32]; Zhang et al., 2005[39]). It is therefore worthwhile to mine knowledge from secondary structures. Of the two, the α-helix has been understood in much detail, while comparatively little is known about the β-sheet (Jäger et al., 2007[18]). The tertiary structures of β-sheet-containing proteins are especially difficult to simulate (Steward and Thornton, 2002[32]; Kuhn et al., 2004[20]; Wathen and Jia, 2010[35]). Unlike α-helices, which are folded from one peptide, β-sheets are folded from two or more disjoint peptides (strands). In this structure, adjacent β-strands bring distant residues into close contact with one another and constitute a specific mode of amino acid pairing (like DNA base pairing) (Fooks et al., 2006[13]; Ashkenazy et al., 2011[2]; Zhang et al., 2009[41], 2010[40]).
Studies on β-sheets have become interesting problems in bioinformatics. There is a growing recognition of the importance of strand-to-strand interactions among β-sheets (Nowick, 2008[28]). Several studies, including statistical studies examining the frequencies of nearest-neighbor amino acids, found significantly different preferences for certain inter-strand amino acid pairs (Russell and Cochran, 2001[31]; Fooks et al., 2006[13]; Ashkenazy et al., 2011[2]). Dou and his colleagues created a comprehensive database for interchain β-sheet (ICBS) interactions (Dou et al., 2004[11]). In our previous studies, we also constructed the SheetsPair database (Zhang et al., 2007[42]) to compile both interchain and intrachain amino acid pairs.
The known efforts on β-sheets focus mainly on inter-residue contacts or amino acid partners (Baldi et al., 2000[4]; Zhang et al., 2005[43]; Grana et al., 2005[15]; Halperin et al., 2006[16]; Kundrotas and Alexov, 2006[21]; Cheng and Baldi, 2007[8]). Although predictions of inter-residue contacts are interesting and useful for an understanding of protein folding (Zhang et al., 2005[39]; Cheng and Baldi, 2007[8]), those studies should be viewed as the initial steps of β-sheet studies (Baldi et al., 2000[4]). BETAPRO, a method that assembles β-strands to predict β-sheets, was introduced by Cheng and Baldi (2005[9]). However, BETAPRO is based on the prediction results of residue contacts, so a single mis-prediction of one amino acid pair in the first stage can be amplified through subsequent stages and result in seriously incorrect strand pairs. Kato et al. (2009[19]) stated that the prediction of planar β-sheet structures belongs to the NP-hard class of complexity in our present state of knowledge. Our previous studies showed that interstrand amino acid pairs play a significant role in determining the parallel or antiparallel orientation of β-strands (Zhang et al., 2009[41]), and that the statistical results could possibly be used to predict β-strand orientation (Zhang et al., 2010[40]). In the present study, we take further steps in the investigation of β-sheets at the strand level, in the hope of gaining insight into how a strand selects its partners.
Dataset
All protein structures used in this study were taken from a PISCES (Wang and Dunbrack, 2003[33], 2005[34]) dataset generated on May 16, 2009. In this dataset, the percentage identity cutoff is 25 %, the resolution cutoff is 2.0 angstroms, and the R-factor cutoff is 0.25. Besides removing the proteins containing disordered regions (Ferron et al., 2006[12]; Linding et al., 2003[24]; Liu et al., 2009[25]), all data were further pre-processed according to the following criteria: (1) protein chains having no β-sheet were removed; (2) protein chains containing non-standard residues (e.g., DPN, EFC, ABA, C5C, PLP) were removed, because these chains have covalently bound ligands or modified residues; (3) protein chains having uncertain structures or incorrect data were removed. Finally, 2,298 protein chains were kept, from which 6,740 parallel β-strand pairs and 12,474 antiparallel β-strand pairs were obtained.
Results and Discussion of the Statistical Analysis of BSD
The β-strand Distance (BSD)
A β-sheet is folded from two or more extended strands. We select the protein 1HZT (PDB code) as an illustrative example, shown in Figure 1(a). Protein 1HZT has three β-sheets, called A, B and C. They are folded from 10 different β-strands, numbered 1 to 10 from the N-terminus to the C-terminus. The locations and the amino acids of the 10 β-strands are shown in Figure 1(b). In each β-sheet folded from multiple strands, the subunit formed by two strands is referred to as a β-strand pair. Thus, each β-sheet has at least one β-strand pair. All β-strand pairs (SR) may be classified into parallel and antiparallel pairs, according to the directions of the two strands in the pair. For example, protein 1HZT has 2 parallel and 5 antiparallel pairs. The strands 'B3' and 'B4' form an antiparallel pair, shown in Figure 1(d).
Figure 1
Illustration of β-strand pairing in a β-sheet (1HZT) (a) The sketch of the tertiary structure of the protein produced using RASMOL. Protein 1HZT is an α/β protein having 10 β-strands, numbered from 1 to 10 from N-terminal to C-terminal. These 10 β-strands fold into three β-sheets, and we can produce 7 strand pairs. (b) The sequences of the 10 β-strands with their initial and ending residue numbers. (c) The 10 β-strands in the linear primary sequence. The BSD of A1 and B2 is 1, while the BSD of A1 and C2 is 2 and the BSD of A1 and C1 is 3. (d) An example of a β-strand pair formed by strand “B3” and “B4”, with the light gray box representing the common region of the pair.
The β-sheet topology or architecture (i.e., the pairing assignments of all the constituent β-strands) is essential for understanding a protein's tertiary structure (Zhang and Kim, 2000[38]). In this study, the β-strand distance (BSD) of a strand pair is defined by counting β-strands along the primary sequence from one paired strand to the other, so that it equals the number of intervening β-strands plus one. The BSD considers only the number of strands, regardless of the number of residues between them (Figure 1(c)). Thus the BSD is 1 when there are no other β-strands between the two paired strands along the primary sequence, and we refer to this case as 'nearest pairing' in this study.
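As a minimal sketch of this definition, the BSD can be computed as the difference in strand rank along the chain, which agrees with the statement that the BSD is 1 when no other strand lies between the two partners. The `Strand` tuple representation, the function names and the residue ranges below are assumptions made for illustration; they do not come from the study.

```python
from typing import Dict, List, Tuple

Strand = Tuple[int, int]  # (first_residue, last_residue), an assumed representation


def strand_ranks(strands: List[Strand]) -> Dict[Strand, int]:
    """Rank strands 1..n from N-terminus to C-terminus by their start residue."""
    ordered = sorted(strands, key=lambda s: s[0])
    return {strand: rank for rank, strand in enumerate(ordered, start=1)}


def bsd(a: Strand, b: Strand, ranks: Dict[Strand, int]) -> int:
    """BSD of a strand pair: intervening strands plus one, i.e. the rank difference."""
    if a == b:
        raise ValueError("a strand pair needs two distinct strands")
    return abs(ranks[a] - ranks[b])


# Hypothetical chain with four strands (residue ranges are made up).
strands = [(5, 9), (14, 20), (31, 36), (60, 66)]
ranks = strand_ranks(strands)
print(bsd((5, 9), (14, 20), ranks))   # nearest pairing, no strand in between
print(bsd((60, 66), (5, 9), ranks))   # two interval strands in between
```

Note that the number of residues separating two strands never enters the calculation; only the strand order matters.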
The “First Coming First Pairing” rule
Based on the benchmark dataset consisting of the 6,740 parallel pairs and the 12,474 antiparallel pairs, we compute the β-strand distance (BSD) for all strand pairs. The proportions of strand pairs with each BSD, for the set of 6,740 parallel pairs, the set of 12,474 antiparallel pairs and the entire benchmark dataset, are shown in Table 1 and Figure 2. From Table 1, we note that the maximal BSD within the parallel pairs is 30, and the maximal BSD within the antiparallel pairs is 54. Typically, the occurrence rate of strand pairs with BSD=1 is about 60 %, the rate of strand pairs with BSD less than 3 is about 80 %, and the rate of strand pairs with BSD less than 10 is about 97 %. The cumulative percentages, as a function of BSD, are shown in Figure 2(a). The curve rises sharply when the BSD is small and levels off once the BSD exceeds 10. Moreover, this trend does not depend on whether the parallel set or the antiparallel set is selected. Figure 2(b) shows that strand pairs with BSD=1 predominate, while strand pairs with BSD=2 or BSD=3 are both minor. Notably, strand pairs with BSD>3 are rare. This suggests that a β-strand most often chooses its nearest strands as partners. We term this propensity the “First Coming First Pairing” rule.
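The occurrence rates and cumulative percentages described above can be tabulated as in the following sketch. The function name is hypothetical and the ten toy BSD values are invented for illustration; they are not the counts behind Table 1.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def bsd_distribution(bsds: Iterable[int]) -> Dict[int, Tuple[float, float]]:
    """Map each BSD value to (percent of pairs, cumulative percent of pairs)."""
    counts = Counter(bsds)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no strand pairs given")
    table = {}
    running = 0
    for d in range(1, max(counts) + 1):
        running += counts[d]  # Counter returns 0 for BSD values that never occur
        table[d] = (100.0 * counts[d] / total, 100.0 * running / total)
    return table


# Toy input only: ten made-up BSD values, not the real dataset.
toy_bsds = [1] * 6 + [2] * 2 + [3] + [7]
for d, (pct, cum) in bsd_distribution(toy_bsds).items():
    print(f"BSD={d}: {pct:.1f} % of pairs, {cum:.1f} % cumulative")
```

Running the same function separately on the parallel pairs, the antiparallel pairs and their union would give the three curves of a plot like Figure 2.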
Table 1
The maximal BSD and the cumulative rates of the strand pairs having different BSDs
Figure 2
(a) Cumulative percentage of β-strand pairs as BSD increases. (b) Distribution of β-strand pairs as BSD changes (truncated at 17, since the percentages are almost 0 when BSD>17). Both panels show the sets of parallel, antiparallel and all strand pairs.
Among all non-nearest β-strand pairs (BSD>1), we mainly consider those with BSD=2 and BSD=3. In these pairs, one or two interval strands are sandwiched between the two paired strands. The interval strands may join the same β-sheet as the given non-nearest β-strand pair, or they may join another β-sheet. In the former case, we call the interval strands “national strands”. In the latter case, we call them “foreign strands”. Based on all strand pairs with BSD=2 or BSD=3, we compute the rates of national and foreign interval strands, with the statistical results shown in Table 2.
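The national/foreign labelling can be sketched as follows. The sketch assumes that strands are identified by their rank along the chain and that each strand belongs to exactly one sheet; both are simplifications introduced here, and the sheet assignment in the example is made up.

```python
from typing import Dict


def classify_interval_strands(i: int, j: int, sheet_of: Dict[int, str]) -> Dict[int, str]:
    """Label every strand lying between paired strands i and j.

    Strands are identified by rank along the chain; sheet_of maps a rank to
    the label of the sheet that strand belongs to.
    """
    lo, hi = sorted((i, j))
    if sheet_of[lo] != sheet_of[hi]:
        raise ValueError("paired strands must belong to the same sheet")
    home = sheet_of[lo]
    return {
        k: "national" if sheet_of[k] == home else "foreign"
        for k in range(lo + 1, hi)
    }


# Hypothetical sheet assignment for a five-strand chain.
sheet_of = {1: "A", 2: "B", 3: "A", 4: "A", 5: "B"}
print(classify_interval_strands(1, 3, sheet_of))  # BSD=2: one interval strand
print(classify_interval_strands(1, 4, sheet_of))  # BSD=3: two interval strands
```

Counting the labels over all BSD=2 and BSD=3 pairs would give percentages of the kind reported in Table 2.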
Table 2
Percentages of national interval and foreign interval strands, based on all β-strand pairs with BSD=2 or BSD=3 respectively
When the BSD=2, Table 2 shows that the rate of national interval strands is much greater than that of foreign interval strands. This suggests that the initial strand must wait for the next-nearest strand to become its partner if the connecting region between the initial strand and its nearest strand is not flexible enough. The nearest strand will then most often pair with another strand within the same sheet as the initial strand, and is only infrequently paired with a strand of another sheet.
When the BSD=3, Table 2 shows that the pairing states of the two interval strands differ between the parallel and antiparallel cases, although the two interval strands tend to pair with each other in both. In the parallel case, the two interval strands overwhelmingly prefer to pair with each other, either remaining in the same sheet as the initial strand or going outside of that sheet; it is rare that they do not pair with each other and become separated into two sheets (4.81 %). In the antiparallel case, however, this is not rare, as seen by the rate of 17.05 %. In each case, an explanation similar to that for BSD=2 may be given. The nearest and the next-nearest strands of the initial strand are both blocked, and the initial strand must await the third-nearest strand to partner with. In the former case (the same β-sheet), a possible blocking factor is that a strand cannot partner if it has already paired with others, since one strand can have no more than 2 partners. In the latter case (a different β-sheet), the different β-sheet formed by the two interval strands could be a stronger blocker, preventing the potential partner strands from approaching each other in 3D space. As a matter of fact, the two interval strands are themselves nearest-paired in most cases (see below). It is from this perspective that we offer the possible explanation that one nearest pairing blocks another, with the result that the BSD of the second pair is 3.
For BSD=3 pairs (as shown in Figure 3), we further investigate all possible pairing styles, with results shown in Table 3. From Table 3, it is interesting to note that the case where the two interval strands 'f' and 'g' pair together accounts for the majority (74.03 % overall). Note that the f-g pairing is itself a nearest pairing (BSD=1), obeying the “First Coming First Pairing” rule. One possible explanation is that the rule is first obeyed between strands 'f' and 'g', which pair together in the first stage. Due to the blocking initiated by the f-g pairing, strand 'a' can choose neither its nearest neighbor 'f' nor the next-nearest 'g' as its partner. Thus, it must choose the next-next-nearest, 'b', resulting in a BSD=3 pair. Collectively, although the BSD=3 case does not outwardly obey the “First Coming First Pairing” rule, it could indeed be a consequence of that rule. Another observation supporting this assumption is the first style in Table 3, in which 'f' and 'g' cannot pair together. This could be due to blocking caused by the a-f pair (BSD=1) and the g-b pair (BSD=1), in which two nearest pairings block another nearest pairing, 'f-g'.
Figure 3
Strands along the primary sequence of a BSD=3 pair
The four dark gray lines represent the four β-strands in the primary sequence, while strand 'a' and 'b' are the given β-strand pair with BSD=3. Strand 'f' and 'g' are the 2 interval strands.
Table 3
Percentage of occurrences and cases of each of all possible pairing styles of a BSD=3 pair*
In summary, the “First Coming First Pairing” rule is encountered widely in β-strand pairing, but it does not hold for all strand pairs. One possible reason is that already-paired strands may hinder others from pairing with their nearest neighbors, given that one strand can only have 1 or 2 partners. There could be other reasons, in view of the complexity of protein folding. Regardless, the “First Coming First Pairing” rule remains important in β-strand pairing and could eventually shed light on protein folding pathways.
Results and Discussion of the Features of Real vs. Pseudo Strand Pairs
Terminal extensions of β-strand pairs
For the two strands in a pair, the N or C terminus of one strand does not always align with the N or C terminus of the other, giving rise to terminal extensions beside the common pairing region (Figure 1(d)). Let PL stand for the length of the common region, Et1 and Et2 for the lengths of the two terminal extensions, and EL for the total pair length (i.e., EL = PL + Et1 + Et2). The common pairing ratio R is then calculated by:
$$R = \frac{\mathrm{PL}}{\mathrm{EL}} \times 100\,\% = \frac{\mathrm{PL}}{\mathrm{PL} + \mathrm{Et}_1 + \mathrm{Et}_2} \times 100\,\%$$
If the lengths of the two strands are SL1 and SL2, respectively, the ratio of the common pairing region to the length of each strand is calculated by:
$$\mathrm{Rt}_i = \frac{\mathrm{PL}}{\mathrm{SL}_i} \times 100\,\%, \quad i = 1, 2$$
The R, Rt1 and Rt2 of every strand pair in our dataset have been calculated in the present study. Results are shown in Figure 4. It can be seen from Figure 4 that when Rt1≥40 % and Rt2≥40 %, the cumulative percentages of the two strands reach 94.26 % and 95.98 %, respectively; and when R≥25 %, the cumulative percentage rises to 96.97 %. Therefore, a rule of β-strand pairing could be as follows:
R≥25 % and Rti≥40 %
To reduce the computational search space, we use this rule in subsequent steps when we traverse all possible relative positions of two specific strands to pair.
Figure 4
Cumulative percentages (CP) of R, Rt1 and Rt2 calculated from the present dataset
The horizontal axis denotes the percentage of the common paired region PL relative to EL (for curve R) or to SL (for curves Rt1 and Rt2). Points on the R curve denote the cumulative percentages of samples whose R=PL/EL equals or exceeds the corresponding abscissa value. Points on the Rt1 and Rt2 curves denote the cumulative percentages of samples whose Rt1=PL/SL1 or Rt2=PL/SL2 equals or exceeds the corresponding abscissa value, respectively.
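A minimal sketch of the three ratios and of the resulting filter (R≥25 % and Rti≥40 %) is given below. The function names, the default thresholds passed as parameters and the example lengths are illustrative assumptions; only the formulas R = PL/EL and Rti = PL/SLi and the two cutoffs come from the text.

```python
from typing import Tuple


def pairing_ratios(pl: int, et1: int, et2: int, sl1: int, sl2: int) -> Tuple[float, float, float]:
    """Return (R, Rt1, Rt2) in percent for one placement of two strands."""
    el = pl + et1 + et2          # total pair length EL
    r = 100.0 * pl / el          # common pairing ratio R
    rt1 = 100.0 * pl / sl1       # share of strand 1 inside the common region
    rt2 = 100.0 * pl / sl2       # share of strand 2 inside the common region
    return r, rt1, rt2


def satisfies_rule(r: float, rt1: float, rt2: float,
                   r_min: float = 25.0, rt_min: float = 40.0) -> bool:
    """Check the pairing rule R >= 25 % and Rt_i >= 40 % for both strands."""
    return r >= r_min and rt1 >= rt_min and rt2 >= rt_min


# Made-up example: strands of length 6 and 5 sharing a 4-residue common region,
# with extensions of 2 and 1 residues at the two ends of the pair.
ratios = pairing_ratios(pl=4, et1=2, et2=1, sl1=6, sl2=5)
print(ratios, satisfies_rule(*ratios))

# A placement with a single shared residue is filtered out.
ratios = pairing_ratios(pl=1, et1=5, et2=4, sl1=6, sl2=5)
print(ratios, satisfies_rule(*ratios))
```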
Real β-strand pairs and pseudo strand pairs
In order to investigate the assignments of β-strand pairs, we analyzed three types of 'pseudo' strand pairs in addition to the 'real' β-strand pairs. The pseudo pairs are generated from primary sequences by randomly selecting stretches of different secondary structures as alternative partners of a β-strand. Since such pairs never occur in a functional protein, they are called “Pseudo Strand Pairs”.
The real β-strand pairs are denoted as SR (a β-Strand with its Real partner β-strand). The three types of pseudo strand pairs are denoted as: (i) SS (a β-Strand with a no-real-partner β-Strand, i.e., the partner is randomly selected from the other β-strand stretches of the primary sequence); (ii) SH (a β-Strand with a randomly selected α-Helix stretch from the primary sequence); (iii) SC (a β-Strand with a randomly selected Coil stretch from the primary sequence). The random selection was repeated 5 times.
Ultimately, four types of pairs were obtained. In the next step, features of these pairs were extracted and compared.
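The random construction of SS, SH and SC pairs might look like the sketch below. It assumes that whole secondary-structure segments are drawn, with replacement, from the same chain; the source does not state how stretch lengths were handled. The function name and the example segments are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def pseudo_pairs(strand: str,
                 real_partners: Sequence[str],
                 strands: Sequence[str],
                 helices: Sequence[str],
                 coils: Sequence[str],
                 rng: random.Random,
                 repeats: int = 5) -> Dict[str, List[Tuple[str, str]]]:
    """Draw `repeats` random partners of each pseudo type for one β-strand."""
    # SS partners are strands of the chain that are not real partners of `strand`.
    non_partners = [s for s in strands if s != strand and s not in real_partners]
    pools = {"SS": non_partners, "SH": list(helices), "SC": list(coils)}
    return {
        label: [(strand, rng.choice(pool)) for _ in range(repeats)] if pool else []
        for label, pool in pools.items()
    }


# Made-up segments from a hypothetical chain.
strands = ["VKVTV", "TYEVR", "KIVAL", "GVFVS"]
helices = ["AEELLKKA", "PEDLARL"]
coils = ["GSGD", "NPDG", "SGN"]
pairs = pseudo_pairs("VKVTV", ["TYEVR"], strands, helices, coils, random.Random(0))
for label, drawn in pairs.items():
    print(label, drawn)
```

Passing an explicit `random.Random` instance keeps the draws reproducible across runs.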
Feature extraction from the four types of pairs
Many studies (Asogawa, 1997[3]; Steward and Thornton, 2002[32]; Fooks et al., 2006[13]) suggest that amino acid pairing in β-sheets carries implicit information that is not only helpful for β-sheet structure prediction, but also significant for disclosing the potential mechanisms and rules of β-sheet assembly. In this study, to extract features of the four types of strand pairs above, we used the Average Amino Acid Pairing Encoding Matrix (APEM), which was generated in our previous study (Zhang et al., 2009[41], 2010[40]). The matrix compiles information regarding the amino acid pairs. The APEM is an upper triangular matrix, since only 210 possible amino acid pairs are considered, regardless of the order of the two amino acids within one pair. An element in the matrix was defined as follows:
in which Ai and Aj are the two amino acids forming an inter-strand pair, and P(Ai : Aj) represents the observed frequency of the amino acid pair Ai : Aj. The terms P(Ai) and P(Aj) are the background probabilities generated by counting the single amino acid frequencies of Ai and Aj, respectively, across all protein sequences in the dataset, similar to the previous work by Bryan et al. (2009[7]).
The feature extraction steps were as follows. Firstly, each element m(Ai : Aj) in the APEM was transformed by:
The average value of all r(Ai : Aj) was calculated by:
Then, we defined the feature score f and the feature score d as follows:
where PL represents the length of the common pairing region of the two strands; Rpos(Ai : Aj) and Rneg(Ai : Aj) were calculated by:
For each strand pair (both real and pseudo), all possible relative pairing positions were traversed according to the rule (R≥25 % and Rti≥40 %), in both parallel and antiparallel fashions. The relative pairing position and the orientation were taken to be those giving the maximum f value. This f value was one of the features used. At this position, the corresponding d value was calculated, which became the other feature. For each pair of strands, one set of the two features was thus calculated and used in the next step.
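The traversal of relative pairing positions can be sketched as below. Because the formula for the f score is not reproduced here, `score` is a hypothetical callable supplied by the caller, and the toy scorer in the example (counting V-V pairs) is a placeholder, not the APEM-based score. The sketch also assumes that EL equals the total span covered by the two strands in a given placement, i.e. SL1 + SL2 - PL.

```python
from typing import Callable, Iterator, List, Optional, Tuple

ResiduePairs = List[Tuple[str, str]]


def placements(s1: str, s2: str) -> Iterator[Tuple[str, int, ResiduePairs]]:
    """Yield (orientation, offset, aligned residue pairs) for every overlap.

    offset is the index in s1 that faces the first residue of the partner;
    for the antiparallel case the partner is s2 reversed.
    """
    for orientation, partner in (("parallel", s2), ("antiparallel", s2[::-1])):
        for offset in range(-(len(partner) - 1), len(s1)):
            start = max(0, offset)
            end = min(len(s1), offset + len(partner))
            yield orientation, offset, [(s1[k], partner[k - offset]) for k in range(start, end)]


def passes_rule(pl: int, sl1: int, sl2: int) -> bool:
    """R >= 25 % and Rt_i >= 40 %, taking EL as the span SL1 + SL2 - PL (assumed)."""
    el = sl1 + sl2 - pl
    return 100.0 * pl / el >= 25.0 and 100.0 * pl / sl1 >= 40.0 and 100.0 * pl / sl2 >= 40.0


def best_placement(s1: str, s2: str,
                   score: Callable[[ResiduePairs], float]) -> Optional[Tuple[float, str, int]]:
    """Return (best score, orientation, offset) over all placements passing the rule."""
    best = None
    for orientation, offset, pairs in placements(s1, s2):
        if not passes_rule(len(pairs), len(s1), len(s2)):
            continue
        value = score(pairs)
        if best is None or value > best[0]:
            best = (value, orientation, offset)
    return best


def toy_score(pairs: ResiduePairs) -> float:
    """Placeholder scorer: number of facing V-V pairs (not the f score of the study)."""
    return sum(1 for a, b in pairs if a == b == "V")


print(best_placement("AVAVA", "VAV", toy_score))
```

Filtering with the rule before scoring is what shrinks the search space: placements that share too small a common region are never scored.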
The non-conservative (loveless) propensity of β-strand partners
We investigated and compared the extracted features of the real and the three pseudo strand pairs. Scatter plots of the d value (y) versus the f value (x) for the four types of pairs are given in Figure 5. It is obvious from Figure 5 that the distributions of the SR and SS features are similar (Figure 5(a) and (b)), while those of the SH and SC pairs differ from them (Figure 5(c) and (d)). It can also be seen that the distribution of SH (c) is slightly more similar to SR than that of SC (d).
Figure 5
Scatter plot of d value (y) versus f value (x) of real β-strand pairs and pseudo strand pairs. (a) SR: real pairs (a β-Strand with its Real partner β-strand); (b) SS: pseudo pairs (a β-Strand with a no-real-partner β-Strand, i.e., the partner is randomly selected from the other β-strand stretches of the primary sequence); (c) SH: pseudo pairs (a β-strand with a randomly selected α-Helix stretch from the primary sequence); (d) SC: pseudo pairs (a β-strand with a randomly selected Coil stretch from the primary sequence)
Since the similarities between the four types cannot be clearly demonstrated in a scatter plot, we adopt a well-known pattern recognition method, the support vector machine (SVM), to attempt to distinguish the features of the four types of pairs. Here, SVM was used only to distinguish features, not for prediction as in Zhang et al. (2009[41]). The distinguishing results obtained via SVM are shown in Table 4.
Table 4
Results of feature distinguishing between the four types of pairs, using SVM. (7-fold cross-validation test. RBF kernel function, with c and gamma set to the default value in LibSVM 2.83. )*
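The logic of a pairwise distinguishing test under 7-fold cross-validation can be sketched with the standard library alone. The block below is not the classifier used in the study: it swaps the RBF-kernel SVM of LibSVM for a simple nearest-centroid rule, and it uses synthetic two-feature points rather than real f and d values. All names are hypothetical. It only illustrates why an accuracy near 50 % means that two classes of pairs have indistinguishable features.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def k_fold_indices(n: int, k: int, rng: random.Random) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k folds."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[f::k] for f in range(k)]


def centroid(points: Sequence[Point]) -> Point:
    return tuple(mean(c) for c in zip(*points))


def sq_dist(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cross_validated_accuracy(samples: Sequence[Point], labels: Sequence[str],
                             k: int = 7, seed: int = 0) -> float:
    """k-fold accuracy of a nearest-centroid classifier (stand-in for the SVM)."""
    folds = k_fold_indices(len(samples), k, random.Random(seed))
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in range(len(samples)) if i not in held_out]
        centroids = {
            lab: centroid([samples[i] for i in train if labels[i] == lab])
            for lab in sorted(set(labels))
        }
        for i in test_idx:
            pred = min(centroids, key=lambda lab: sq_dist(samples[i], centroids[lab]))
            correct += pred == labels[i]
    return correct / len(samples)


def gaussian_cloud(rng: random.Random, center: Point, n: int) -> List[Point]:
    """Synthetic (f, d)-like points scattered around a center."""
    return [(rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)) for _ in range(n)]


rng = random.Random(1)
labels = ["class 0"] * 140 + ["class 1"] * 140
separated = gaussian_cloud(rng, (0.0, 0.0), 140) + gaussian_cloud(rng, (6.0, 6.0), 140)
overlapping = gaussian_cloud(rng, (0.0, 0.0), 140) + gaussian_cloud(rng, (0.0, 0.0), 140)
print("well separated classes:", cross_validated_accuracy(separated, labels))
print("identically distributed classes:", cross_validated_accuracy(overlapping, labels))
```

With identically distributed classes no classifier can do much better than chance, which is the sense in which a poor distinguishing result indicates similar features.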
From Table 4, it can be seen that the classification efficiency of SR-SH, SR-SC, SS-SH and SS-SC is acceptable, while the SH-SC result is poor and the SR-SS result is very poor. This is consistent with the observations from the scatter plot. The features of SR and SS are so similar that even SVM cannot distinguish them. The poor efficiency of SH-SC indicates that helix and coil stretches have similar characteristics, which differ from those of β-strand pairs. In view of strand pair formation, there is no significant difference between helix and coil segments. The much better distinguishing results of SR-SH and SR-SC indicate that a strand has the ability to distinguish its real partner from helix or coil segments. The moderately good distinguishing of SS-SH and SS-SC indicates that other non-real-partner strands have this ability as well. However, the very poor SR-SS distinguishing suggests that a strand cannot distinguish its real partner from other non-real-partner β-strands.
From these results, it can be concluded that partner selection is loveless for a single β-strand. Although a single β-strand has the ability to distinguish its partner from a helix or a coil, it lacks the ability to distinguish it from other non-partner β-strands. Similar results were obtained in earlier studies. Ren et al. (2006[30]) reported that pairs of residues on neighboring strands were neither more strongly conserved nor more strongly covariant than pairs of the same type in non-interacting positions. Mandel-Gutfreund et al. (2001[26]) found that residue pairs in antiparallel β-sheets were equally conserved and covaried as much as non-interacting residue pairs. Steward and Thornton (2002[32]) also indicated that a single β-strand was able to recognize a non-interacting β-strand with greater accuracy than in the case of recognition between two random sequences. However, these studies did not consider partners among randomly selected stretches of different secondary structures. The loveless nature of β-strand partners could also help to explain why β-sheet structures are so especially difficult to simulate in protein tertiary structure predictions (Steward and Thornton, 2002[32]; Kuhn et al., 2004[20]).
Conclusion
The “First Coming First Pairing” rule implies that a β-strand is inclined to pair with its nearest neighbor strands, or with strands not far from it along the primary linear sequence. The analysis of pseudo strand pairs indicates that partner recognition is not conservative. Combining these two findings, it can be concluded that in the process of β-sheet formation, the pairing of β-strands is not exclusively driven by specific residues or interacting amino acid pairs. Instead, in most cases, a single β-strand may follow the “First Coming First Pairing” rule to choose its partner. It prefers first to choose the nearest neighbor (with the smallest BSD value). However, if the nearest neighbor is blocked, it must choose the next-nearest neighbor, and if the next-nearest is also blocked, it must then choose the next-next one, which still has a relatively small BSD value. These results are in agreement with the earlier study by Wathen and Jia (2010[35]), who investigated the initial nucleation step of β-sheet formation and indicated that nucleation was not primarily driven by specific interacting residue pairs; instead, β-nucleation was a local phenomenon resulting from either sequential or topological proximity. However, not all β-strand pairs obey this rule, and the reason is currently unclear. Chaperones may be one reason and the surrounding conditions another, but there are likely to be other reasons as well (Meiler and Baker, 2003[27]).
The findings in this study complement Anfinsen's discovery. Anfinsen discovered several decades ago that denatured proteins can spontaneously self-assemble into their native conformations (Anfinsen, 1973[1]). In various studies, Anfinsen further showed that the denaturation of ribonuclease could be completely reversed by removing the denaturing chemicals or by lowering the temperature. The ribonuclease could fold back to its natural functional state on its own. Therefore, Anfinsen concluded that the amino acid sequence determines the structure of a protein. However, it still remains a challenge to explain how proteins fold into their native structures directly from their primary sequences (Bowman et al., 2011[5]). It is worth pointing out that secondary structures were not known in Anfinsen's time, and secondary structures may not be totally collapsed during his denaturation process (Li et al., 1998[23]). How do amino acids located far apart in the primary sequence find one another to interact in 3D space? Studies have shown that the specificity of side-chain/side-chain interactions between residues on neighboring strands seems to be very weak (Wouters and Curmi, 1995[36]). As a consequence, the interactions between amino acids, and hence the 3D structures, cannot always be predicted if only the primary sequence is given. Since the folding of most proteins is carried out simultaneously with translation, rather than after translation, the near-neighbor pairing propensity can be regarded in terms of the earliest-translated strands participating in pairing first. It is conceivable that this propensity follows from co-translational folding, in which the already translated region of the primary sequence could partially determine the secondary structures and their neighbor interactions.
In conclusion, we suggest that the 3D structure of a protein could be determined not only by the primary sequence, but also by the nearest-neighbor interacting secondary structure elements, which in turn may be determined by local primary sequences. Although further verification must be done via biological experiments, the statistical results in the present study point towards the notion that the nearest pairing propensity of secondary structure elements could be a potential rule among the many unknown factors that determine protein folding. This in turn could contribute to protein structure prediction and to understanding the mechanisms of protein folding.
Notes
Jishou Ruan and Tao Zhang (College of Life Sciences, Nankai University, Tianjin, PR China, 300071; Tel: +86 022 23500237; zhangtao@nankai.edu.cn) contributed equally as corresponding authors.
Acknowledgement
We would like to thank Michelle Hanlon from Cross Cancer Institute, Edmonton, Alberta, Canada for her kind help. This work was supported by grants from the National Natural Science Foundation of China (31171053, 11232005, 68075049, 10671100, 31150110577, 31050110432 and 81171342) and the Tianjin research program of application foundation and advanced technology (12JCZDJC22300).
Authors: Allen W Bryan; Matthew Menke; Lenore J Cowen; Susan L Lindquist; Bonnie Berger Journal: PLoS Comput Biol Date: 2009-03-27 Impact factor: 4.475