Abstract
We present an alternative approach to predicting protein 3D folds, based on determining rules that specify the distribution in polypeptide sequences of "favorable" residues, which are mainly responsible for formation of a given fold, and "unfavorable" residues, which are incompatible with that fold. The process of determining favorable and unfavorable residues is iterative. The starting assumptions are based on the general principles of protein structure formation as well as structural features peculiar to the protein fold under investigation. The initial assumptions are tested one by one against the set of all known proteins with a given structure. An assumption is accepted as a "rule of amino acid distribution" for the protein fold if it holds true for all, or nearly all, structures. If an assumption is not accepted as a rule, it can either be modified to better fit the data and tested again in the next step of the iterative search algorithm, or rejected. We determined the set of amino acid distribution rules for a large group of beta sandwich-like proteins characterized by a specific arrangement of strands in two beta sheets. This set of rules was shown to be highly sensitive (~90%) and very specific (~99%) for identifying sequences of proteins with the specified beta sandwich fold. One advantage of the proposed approach is that it does not require query proteins to have a high degree of homology to proteins with known structure. So long as the query protein satisfies the residue distribution rules, it can be confidently assigned to its respective protein fold. Another advantage is that the approach allows for a better understanding of which residues play an essential role in protein fold formation. It may, therefore, facilitate rational protein engineering design.
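The accept, modify, or reject loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `Assumption` class, the `widen` modification step, the 0.95 acceptance threshold, the round limit, and the toy strands are all assumptions introduced here for the example.

```python
from dataclasses import dataclass, replace
from typing import Optional, Sequence


@dataclass(frozen=True)
class Assumption:
    """Hypothetical candidate rule: the count of `residues` in a strand lies in [low, high]."""
    residues: frozenset
    low: int
    high: int

    def holds(self, strand: str) -> bool:
        count = sum(1 for aa in strand if aa in self.residues)
        return self.low <= count <= self.high


def coverage(assumption: Assumption, strands: Sequence[str]) -> float:
    """Fraction of the known strands for which the assumption holds."""
    return sum(assumption.holds(s) for s in strands) / len(strands)


def widen(assumption: Assumption) -> Assumption:
    """Illustrative modification step: relax both bounds by one residue."""
    return replace(assumption, low=max(0, assumption.low - 1), high=assumption.high + 1)


def iterate_rule(
    assumption: Assumption,
    strands: Sequence[str],
    threshold: float = 0.95,  # assumed stand-in for "all, or nearly all"
    max_rounds: int = 3,      # assumed limit before the assumption is rejected
) -> Optional[Assumption]:
    """Test, modify, and re-test an assumption; return the accepted rule or None."""
    for _ in range(max_rounds):
        if coverage(assumption, strands) >= threshold:
            return assumption          # accepted as a rule
        assumption = widen(assumption)  # modified and tested again
    return None                         # rejected


# Toy data (not from the paper): three short strands in one-letter codes.
known_strands = ["VIVAV", "LKVEV", "TVSIT"]
start = Assumption(frozenset("WIFLCVMAY"), low=3, high=3)
accepted = iterate_rule(start, known_strands)
print(accepted)
```

On the toy data the starting bounds are too strict, so the assumption is relaxed twice before it covers every strand; with fewer allowed rounds it would be rejected instead.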
Year: 2015 PMID: 25625198 PMCID: PMC4384110 DOI: 10.3390/biom5010041
Source DB: PubMed Journal: Biomolecules ISSN: 2218-273X
Figure 1. Supersecondary structure of sandwich-like proteins with a specific “interlock” arrangement of strands 2, 3, 5 and 6. Beta strands are represented by arrows and protein loops are shown as lines. The interlock strands are shown in red. Six positions in each strand are considered. In this model, it is assumed that residues at positions 1, 3 and 5 are directed inside (hydrophobic positions), whereas residues at positions 2, 4 and 6 are on the surface (hydrophilic positions).
Figure 2. An arrangement of strands 1, 2 and 3 in a beta sheet that is disallowed because the loops would overlap.
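The six-position strand model in the Figure 1 caption can be illustrated with a short sketch. The function name `split_positions` is hypothetical, and the sketch assumes the six-residue window has already been aligned so that its first residue sits at position 1; how that register is chosen is not covered here.

```python
from typing import Tuple


def split_positions(strand: str) -> Tuple[str, str]:
    """Split a six-residue strand into inward- and outward-facing residues.

    Positions 1, 3, 5 are treated as inward (hydrophobic positions) and
    positions 2, 4, 6 as outward (hydrophilic positions), following the model
    in the figure caption. Assumes the window is already in register.
    """
    if len(strand) != 6:
        raise ValueError("the model considers exactly six positions per strand")
    inward = strand[0::2]   # positions 1, 3, 5
    outward = strand[1::2]  # positions 2, 4, 6
    return inward, outward


# Toy strand (not from the paper) in one-letter amino acid codes.
inward, outward = split_positions("VKVTVE")
print(inward, outward)
```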
The rules of assignment of residues at hydrophobic positions in strands.
| Rule No. | Residues | Strand 2 (interlock) | Strand 3 (interlock) | Strand 5 (interlock) | Strand 6 (interlock) | Total, Interlock Strands | Strand 1 (non-interlock) | Strand 4 (non-interlock) | Strand 7 (non-interlock) | Total, Non-Interlock Strands | Total, All Strands |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Trp, Ile, Phe, Leu, Cys, Val, Met, Ala, Tyr | 3 | 3 | ≥1 | ≥1 | ≥9 and ≤12 | ≥1 | ≥1 | ≥1 | ≥4 and ≤6 | ≥13 and ≤17 |
| 2 | Trp, Ile, Phe, Leu, Val, Met | ≥1 | ≥1 | ≥1 | ≥1 | ≥6 and ≤10 | ≥0 | ≥0 | ≥0 | ≥2 and ≤5 | ≥8 and ≤15 |
| 3 | Ala | ≤1 | ≤1 | ≤1 | ≤1 | ≤2 | ≤1 | ≤1 | ≤1 | ≤2 | ≤2 |
| 4 | Cys | ≤1 | ≤1 | ≤1 | ≤1 | ≤2 | ≤1 | ≤1 | ≤1 | ≤2 | ≤2 |
| 5 | Tyr | ≤1 | ≤1 | ≤1 | ≤1 | ≤1 | ≤1 | ≤1 | ≤1 | ≤1 | ≤2 |
| 6 | Phe, Trp, Tyr | ≤2 | ≤2 | ≤2 | ≤2 | ≤3 | ≤1 | ≤1 | ≤1 | ≤1 | ≤3 |
| 7 | Pro | 0 | 0 | 0 | 0 | 0 | ≤1 | ≤1 | ≤1 | ≤1 | ≤1 |
| 8 | Gly | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | Asp, Glu, Arg, Lys, His | 0 | 0 | ≤2 | ≤2 | ≤2 | ≤2 | ≤2 | ≤2 | ≤3 | ≤4 |
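As an illustration of how rows of the table above might be applied, the sketch below encodes rule 1 and rule 8 for hydrophobic positions and checks them against a toy protein. It reads each table entry as an inclusive minimum or maximum count. The data layout (a dict from strand number to the three residues at positions 1, 3 and 5), the function names, and the example sequences are assumptions made for this example, not part of the paper.

```python
# Residues named in rule 1, as one-letter codes:
# Trp, Ile, Phe, Leu, Cys, Val, Met, Ala, Tyr
RULE1_RESIDUES = set("WIFLCVMAY")

INTERLOCK = (2, 3, 5, 6)
NON_INTERLOCK = (1, 4, 7)

# Rule 1 bounds per strand as (min, max); None means no upper bound is stated.
RULE1_PER_STRAND = {
    2: (3, 3), 3: (3, 3), 5: (1, None), 6: (1, None),
    1: (1, None), 4: (1, None), 7: (1, None),
}
RULE1_INTERLOCK_TOTAL = (9, 12)
RULE1_NON_INTERLOCK_TOTAL = (4, 6)
RULE1_ALL_TOTAL = (13, 17)


def within(value, bounds):
    """True if value lies inside the inclusive (min, max) bounds."""
    low, high = bounds
    return value >= low and (high is None or value <= high)


def satisfies_rule1(inward: dict) -> bool:
    """Check rule 1 on the residues at hydrophobic positions of strands 1-7."""
    counts = {n: sum(aa in RULE1_RESIDUES for aa in seq) for n, seq in inward.items()}
    interlock = sum(counts[n] for n in INTERLOCK)
    non_interlock = sum(counts[n] for n in NON_INTERLOCK)
    return (
        all(within(counts[n], b) for n, b in RULE1_PER_STRAND.items())
        and within(interlock, RULE1_INTERLOCK_TOTAL)
        and within(non_interlock, RULE1_NON_INTERLOCK_TOTAL)
        and within(interlock + non_interlock, RULE1_ALL_TOTAL)
    )


def satisfies_rule8(inward: dict) -> bool:
    """Rule 8: no Gly at any hydrophobic position in any strand."""
    return all("G" not in seq for seq in inward.values())


# Toy input (not from the paper): residues at positions 1, 3, 5 of each strand.
example = {1: "VAT", 2: "VIL", 3: "FVL", 4: "VLT", 5: "LVS", 6: "IAV", 7: "YVS"}
print(satisfies_rule1(example), satisfies_rule8(example))
```

The remaining rows could be encoded the same way; keeping the bounds as data rather than code makes it easier to revise a rule when an assumption is modified.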
The rules of assignment of residues at hydrophilic positions in strands.
| Rule No. | Residues | Strand 2 (interlock) | Strand 3 (interlock) | Strand 5 (interlock) | Strand 6 (interlock) | Total, Interlock Strands | Strand 1 (non-interlock) | Strand 4 (non-interlock) | Strand 7 (non-interlock) | Total, Non-Interlock Strands | Total, All Strands |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Gln, Glu, Arg, Thr, Ser, Tyr, Asp, His, Lys, Asn | ≥1 | ≥1 | ≥1 | ≥1 | ≥7 and ≤10 | ≥1 | ≥1 | ≥1 | ≥5 and ≤8 | ≥13 and ≤18 |
| 2 | Glu, Arg, Lys, Asp, His | ≥0 | ≥0 | ≥0 | ≥0 | ≥0 and ≤5 | ≥0 | ≥0 | ≥0 | ≥0 and ≤6 | ≥1 and ≤9 |
| 3 | Pro | 0 | 0 | ≤1 | 0 | ≤1 | ≤1 | ≤1 | ≤1 | ≤2 | ≤2 |
| 4 | Gly | ≤1 | ≤1 | ≤1 | ≤1 | ≤2 | ≤1 | ≤1 | ≤1 | ≤1 | ≤1 |
| 5 | Pro + Gly | ≤1 | ≤1 | ≤1 | ≤1 | ≤3 | ≤1 | ≤1 | ≤1 | ≤2 | ≤3 |
The rules of assignment of residues at any position in strands.
| Rule No. | Residues | Strand 2 (interlock) | Strand 3 (interlock) | Strand 5 (interlock) | Strand 6 (interlock) | Total, Interlock Strands | Strand 1 (non-interlock) | Strand 4 (non-interlock) | Strand 7 (non-interlock) | Total, Non-Interlock Strands | Total, All Strands |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Trp, Ile, Phe, Leu, Cys, Val, Met, Ala, Tyr | ≥3 | ≥3 | ≥1 | ≥2 | ≥10 and ≤18 | ≥1 and ≤4 | ≥1 and ≤4 | ≥1 and ≤4 | ≥5 and ≤9 | ≥17 and ≤26 |
| 2 | Ala | ≤1 | ≤1 | ≤2 | ≤1 | ≤3 | ≤2 | ≤2 | ≤2 | ≤3 | ≥0 and ≤4 |
| 3 | Phe, Trp, Tyr | ≤2 | ≤3 | ≤2 | ≤2 | ≤6 | ≤2 | ≤1 | ≤2 | ≤3 | ≥2 and ≤8 |
| 4 | Gln, Glu, Arg, Thr, Tyr, Asp, His, Lys, Ser, Asn | ≥1 | ≥1 | ≥1 | ≥1 | ≥8 and ≤15 | ≥2 and ≤5 | ≥2 and ≤5 | ≥2 and ≤5 | ≥9 and ≤13 | ≥18 and ≤27 |
| 5 | Glu, Arg, Lys, Asp, His | ≤2 | ≤3 | ≤2 | ≤2 | ≥1 and ≤7 | ≤3 | ≤4 | ≤4 | ≥1 and ≤7 | ≥1 and ≤12 |
| 6 | Pro, Gly, Asn, Asp, Glu | ≤1 | ≤3 | ≤2 | ≤2 | ≤5 | ≤2 | ≤4 | ≤4 | ≥2 and ≤6 | ≥2 and ≤10 |
| 7 | Pro | 0 | 0 | ≤1 | 0 | ≤1 | ≤2 | ≤2 | ≤1 | ≥0 and ≤3 | ≥0 and ≤3 |
| 8 | Gly | ≤1 | ≤1 | ≤1 | ≤1 | ≤2 | ≤1 | ≤1 | ≤1 | ≥0 and ≤1 | ≥0 and ≤2 |
| 9 | Pro + Gly + Ala | ≤2 | ≤1 | ≤2 | ≤1 | ≤4 | ≤3 | ≤2 | ≤2 | ≥1 and ≤5 | ≥0 and ≤6 |
Tests of specificity and sensitivity.
| Rules | Target Proteins (144) | Other: Beta-Proteins (3951) | Other: Alpha-Proteins (3006) | Other: c-Protein (3925) | Other: d-Protein (3950) | Other: Total (14,832) |
|---|---|---|---|---|---|---|
| 1 All rules | 130 (90%) | 91 (2%) | 3 (0.1%) | 33 (0.8%) | 21 (0.5%) | 148 (0.9%) |
| 2 The rules of | 136 (94%) | 2120 (54%) | 442 (15%) | 2259 (58%) | 1369 (35%) | 6190 (41.7%) |
| 3 The rules of | 134 (93%) | 1653 (42%) | 223 (7%) | 1292 (33%) | 808 (20%) | 3976 (26.8%) |
| 4 The rules of | 130 (90%) | 132 (3%) | 6 (0.2%) | 50 (1%) | 34 (0.8%) | 222 (2.4%) |
| 5 The rules of | 130 (90%) | 796 (20%) | 100 (3%) | 896 (23%) | 432 (11%) | 2224 (14.9%) |
| 6 All rules, exception rule 1a in | 0 | 276 (7%) | 15 (0.5%) | 128 (3%) | 71 (2%) | 490 (3%) |
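The sensitivity and specificity figures quoted for the full rule set can be reproduced from the first row of the table above. The sketch below assumes the usual definitions: sensitivity is the fraction of target proteins that satisfy the rules, and specificity is the fraction of other proteins that do not. The function and variable names are introduced here for illustration only.

```python
def sensitivity(accepted_targets: int, total_targets: int) -> float:
    """Fraction of target proteins that satisfy the rules."""
    return accepted_targets / total_targets


def specificity(accepted_others: int, total_others: int) -> float:
    """Fraction of non-target proteins that fail the rules."""
    return 1 - accepted_others / total_others


# Counts from the "All rules" row: 130 of 144 target proteins and
# 148 of 14,832 other proteins satisfy the full rule set.
sens = sensitivity(130, 144)
spec = specificity(148, 14832)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

These values agree with the approximate figures of 90% sensitivity and 99% specificity stated in the abstract.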