Brinda Vallat, Carlos Madrid-Aliste, Andras Fiser.
Abstract
Predicting the three-dimensional structure of proteins from their amino acid sequences remains a challenging problem in molecular biology. While the current structural coverage of proteins is almost exclusively provided by template-based techniques, modeling the remaining protein sequences increasingly requires template-free methods. However, template-free modeling methods are much less reliable and are usually applicable only to smaller proteins, leaving much room for improvement. We present here a novel computational method that uses a library of supersecondary structure fragments, known as Smotifs, to model protein structures. The library of Smotifs has saturated over time, providing a theoretical foundation for efficient modeling. The method relies on weak sequence signals from remotely related protein structures to create a library of Smotif fragments specific to the target protein sequence. This Smotif library is exploited in a fragment assembly protocol to sample decoys, which are assessed by a composite scoring function. Since the Smotif fragments are larger than those used in other fragment-based methods, the proposed modeling algorithm, SmotifTF, can employ exhaustive sampling during decoy assembly. SmotifTF successfully predicts the overall fold of the target proteins in about 50% of the test cases and performs competitively when compared to other state-of-the-art prediction methods, especially when the sequence signal to remote homologs diminishes. Smotif-based modeling is complementary to current prediction methods and provides a promising direction in addressing the structure prediction problem, especially when targeting larger proteins for modeling.
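The idea of exhaustive sampling over a small, target-specific fragment library can be illustrated with a toy example. The following is only a minimal sketch and not the SmotifTF implementation: the fragment identifiers, the two per-fragment scores, the equal weights, and the function names `composite_score` and `assemble_exhaustively` are all assumptions made for illustration. It enumerates every combination of one candidate fragment per Smotif position and ranks the resulting decoys by a weighted sum.

```python
from itertools import product

# Hypothetical candidate fragments for each Smotif position in a target.
# Each candidate is (fragment_id, sequence_score, geometry_score); all values
# are invented for illustration and are not taken from the paper.
candidates = [
    [("frag_a1", 0.9, 0.4), ("frag_a2", 0.6, 0.8)],
    [("frag_b1", 0.5, 0.5), ("frag_b2", 0.7, 0.9), ("frag_b3", 0.2, 0.3)],
    [("frag_c1", 0.8, 0.6)],
]


def composite_score(decoy, w_seq=0.5, w_geom=0.5):
    """Weighted sum of per-fragment terms (an assumed, simplified scoring form)."""
    return sum(w_seq * seq + w_geom * geom for _, seq, geom in decoy)


def assemble_exhaustively(candidates):
    """Enumerate every combination of one fragment per position and rank them."""
    decoys = list(product(*candidates))
    return sorted(decoys, key=composite_score, reverse=True)


ranked = assemble_exhaustively(candidates)
best_ids = [frag_id for frag_id, _, _ in ranked[0]]
print(len(ranked), best_ids)
```

The number of decoys is the product of the candidate counts per position, so full enumeration stays tractable only while each position has few candidates and the target has few positions, which is consistent with the abstract's point that larger fragments make exhaustive sampling feasible.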
Year: 2015 PMID: 26252221 PMCID: PMC4529212 DOI: 10.1371/journal.pcbi.1004419
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1. Flowchart of the SmotifTF prediction algorithm.
GDT_TS values of the top-scoring models obtained with the SmotifTF method using dynamic Smotif libraries generated at different e-value cutoffs.
| PDB code and chain | No cutoff | e-value > 10⁻¹⁰ | e-value > 10⁻⁵ | e-value > 10⁻¹ | e-value > 10⁰ |
|---|---|---|---|---|---|
| 1aabA | 67.06 | 46.43 | 46.43 | 46.43 | 46.43 |
| 1bqzA | 58.08 | 39.62 | 39.62 | 39.62 | 39.62 |
| 1dcjA | 80.36 | 45.36 | 45.36 | 36.07 | 35.00 |
| 1hdnA | 52.44 | 32.32 | 32.32 | 32.32 | 26.83 |
| 1iloA | 51.35 | 47.97 | 39.87 | 35.47 | 27.70 |
| 1khmA | 40.29 | 36.46 | 36.46 | 36.46 | 40.28 |
| 1lq7A | 61.91 | 61.91 | 61.91 | 61.91 | 61.91 |
| 1myoA | 63.38 | 18.20 | 20.61 | 20.61 | 20.61 |
| 1ng7A | 44.00 | 44.00 | 44.00 | 38.50 | 38.50 |
| 1om2A | 45.06 | 45.60 | 45.60 | 45.60 | 45.06 |
| 1pveA | 56.13 | 62.74 | 62.74 | 58.96 | 58.96 |
| 1rg6A | 37.50 | 28.33 | 28.33 | 32.32 | 30.00 |
| 1uzcA | 40.18 | 38.39 | 38.39 | 38.39 | 38.39 |
| 1ss2A | 36.72 | 31.64 | 30.08 | 25.78 | 28.91 |
| 1tizA | 71.15 | 55.39 | 47.69 | 38.08 | 41.54 |
| 1wgnA | 78.13 | 63.54 | 49.48 | 51.04 | 51.04 |
| 1wgwA | 62.79 | 40.99 | 40.99 | 40.99 | 40.00 |
| 1wgvA | 30.06 | 27.38 | 25.60 | 25.60 | 25.16 |
| 1wjtA | 46.43 | 49.03 | 37.34 | 37.34 | 41.56 |
| 1wgqA | 70.43 | 34.14 | 29.30 | 31.99 | 36.29 |
| Mean | 54.67 | 42.47 | 40.11 | 38.70 | 38.71 |
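The Mean row can be checked directly from the per-target values. The sketch below copies the first two data columns of the table (no cutoff, and e-value > 10⁻¹⁰) and recomputes their means; it also counts how many targets lose GDT_TS once the library is restricted to hits above the 10⁻¹⁰ cutoff. The variable names are illustrative only.

```python
from statistics import mean

# GDT_TS of the top-scoring model per target, copied from the table above:
# (no cutoff, e-value > 10^-10)
gdt_ts_by_target = {
    "1aabA": (67.06, 46.43), "1bqzA": (58.08, 39.62), "1dcjA": (80.36, 45.36),
    "1hdnA": (52.44, 32.32), "1iloA": (51.35, 47.97), "1khmA": (40.29, 36.46),
    "1lq7A": (61.91, 61.91), "1myoA": (63.38, 18.20), "1ng7A": (44.00, 44.00),
    "1om2A": (45.06, 45.60), "1pveA": (56.13, 62.74), "1rg6A": (37.50, 28.33),
    "1uzcA": (40.18, 38.39), "1ss2A": (36.72, 31.64), "1tizA": (71.15, 55.39),
    "1wgnA": (78.13, 63.54), "1wgwA": (62.79, 40.99), "1wgvA": (30.06, 27.38),
    "1wjtA": (46.43, 49.03), "1wgqA": (70.43, 34.14),
}

mean_no_cutoff = mean([no_cut for no_cut, _ in gdt_ts_by_target.values()])
mean_strict = mean([strict for _, strict in gdt_ts_by_target.values()])

# Targets whose top model gets worse once the closest hits are excluded.
n_dropped = sum(strict < no_cut for no_cut, strict in gdt_ts_by_target.values())

print(round(mean_no_cutoff, 2), round(mean_strict, 2), n_dropped)
```

The recomputed means agree with the Mean row (54.67 and 42.47), and 15 of the 20 targets score lower under the 10⁻¹⁰ cutoff than with no cutoff.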
Fig 2. Performance evaluation in the training set.
Prediction quality (assessed as the mean GDT_TS of the top-scoring model against the native structure) is plotted on the X-axis for 20 cases at different e-value cutoffs used in generating the dynamic Smotif library. The data points for the different e-value cutoffs are shown with different symbols: no cutoff (square), 10⁻¹⁰ (circle), 10⁻⁵ (triangle), 10⁻¹ (star) and 10⁰ (diamond). The dual Y-axes correspond to the mean number of hits in the dynamic Smotif database (right axis, inverted scale, black data points) and to the mean e-value of the best hit in the dynamic database (left axis, log scale, red data points), respectively.
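For readers unfamiliar with the metric, GDT_TS is conventionally the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of residues whose Cα atoms lie within that cutoff of the native position. The sketch below computes this from a list of per-residue distances for a single, already fixed superposition. That is a simplification: standard GDT implementations search for the superposition that maximizes the count at each cutoff. The function name `gdt_ts` and the example distances are assumptions for illustration.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Return GDT_TS (0-100) from per-residue C-alpha distances in angstroms.

    Assumes the model is already superposed on the native structure; no
    search over alternative superpositions is performed.
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    # Count residues within each cutoff, then average the four fractions.
    within = sum(sum(d <= cutoff for d in distances) for cutoff in cutoffs)
    return 100.0 * within / (len(distances) * len(cutoffs))


# Hypothetical distances for a 10-residue model (invented for illustration).
example_distances = [0.5, 0.8, 1.5, 1.9, 3.0, 3.5, 6.0, 7.5, 9.0, 12.0]
score = gdt_ts(example_distances)
print(score)
```

In this example 2, 4, 6 and 8 of the 10 residues fall within the 1, 2, 4 and 8 Å cutoffs, so the score is 50.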
Performance of SmotifTF on the benchmarking test set in comparison to other methods.
| PDB | Nres | SS | e-value | SmotifTF | I-TASSER | HHpred | Rosetta |
|---|---|---|---|---|---|---|---|
| 4v1am | 109 | Mainly-α | 0.000001 | 47.64 | 53.21 | 55.28 | 30.73 |
| 4rd5A | 156 | α+β | 0.006 | 16.67 | 17.31 | 20.35 | 20.99 |
| 2mpvA | 145 | Mainly-β | 0.0098 | 25.71 | 55.69 | 38.10 | 12.93 |
| 4ux3B | 64 | α+β | 0.029 | 25.00 | 32.81 | 42.58 | 32.42 |
| 4nknA | 116 | Mainly-α | 0.039 | 34.55 | 22.61 | 20.00 | 50.00 |
| 3wzsA | 140 | α+β | 0.072 | 26.79 | 57.14 | 47.68 | 29.46 |
| 4wwrA | 49 | Mainly-α | 0.12 | 57.65 | 52.55 | 54.08 | 49.49 |
| 4pqzA | 131 | Mainly-α | 0.17 | 34.92 | 23.47 | 20.61 | 27.10 |
| 4ro3A | 103 | α+β | 0.27 | 41.26 | 55.58 | 53.88 | 34.95 |
| 4uzxA | 54 | Mainly-α | 0.41 | 56.94 | 61.57 | 37.96 | 52.78 |
| 4waiA | 82 | Mainly-α | 0.79 | 30.18 | 35.67 | 26.52 | 33.84 |
| 4o7kA | 190 | α+β | 0.8 | 20.83 | 12.90 | 11.97 | 19.34 |
| 2mpoA | 182 | Mainly-β | 2.1 | 22.35 | 15.52 | 10.30 | 12.50 |
| 4ndsA | 74 | α+β | 2.4 | 32.43 | 43.24 | 29.05 | 33.78 |
| 4qtnA | 236 | Mainly-α | 2.6 | 21.78 | 22.69 | 15.42 | 18.51 |
| 4wyqA | 119 | Mainly-α | 5.5 | 34.03 | 29.62 | 21.22 | 32.35 |
| Mean (all rows) | 122.50 | | 0.96 | 33.05 | 36.97 | 31.56 | 30.70 |
| Mean (rows with e-value of best hit > 0.1) | 132.00 | | 1.52 | 35.24 | 35.28 | 28.10 | 31.46 |
| Mean (rows with e-value of best hit > 2.0) | 152.75 | | 3.15 | 27.65 | 27.77 | 19.00 | 24.29 |
Nres: number of residues in the query protein.
SS: major secondary structure class according to DSSP [57].
e-value: e-value of the best hit in the dynamic database.
SmotifTF, I-TASSER, HHpred, Rosetta: GDT_TS score of the best-scoring model when compared to the native structure.
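The subset means reported at the bottom of the table can be reproduced by filtering rows on the e-value of the best hit. The sketch below copies the e-value and GDT_TS columns from the table and averages each method over all rows, rows with e-value > 0.1, and rows with e-value > 2.0. The helper name `subset_means` is illustrative.

```python
from statistics import mean

METHODS = ("SmotifTF", "I-TASSER", "HHpred", "Rosetta")

# (PDB chain, e-value of best hit, GDT_TS per method), copied from the table above.
rows = [
    ("4v1am", 0.000001, (47.64, 53.21, 55.28, 30.73)),
    ("4rd5A", 0.006, (16.67, 17.31, 20.35, 20.99)),
    ("2mpvA", 0.0098, (25.71, 55.69, 38.10, 12.93)),
    ("4ux3B", 0.029, (25.00, 32.81, 42.58, 32.42)),
    ("4nknA", 0.039, (34.55, 22.61, 20.00, 50.00)),
    ("3wzsA", 0.072, (26.79, 57.14, 47.68, 29.46)),
    ("4wwrA", 0.12, (57.65, 52.55, 54.08, 49.49)),
    ("4pqzA", 0.17, (34.92, 23.47, 20.61, 27.10)),
    ("4ro3A", 0.27, (41.26, 55.58, 53.88, 34.95)),
    ("4uzxA", 0.41, (56.94, 61.57, 37.96, 52.78)),
    ("4waiA", 0.79, (30.18, 35.67, 26.52, 33.84)),
    ("4o7kA", 0.8, (20.83, 12.90, 11.97, 19.34)),
    ("2mpoA", 2.1, (22.35, 15.52, 10.30, 12.50)),
    ("4ndsA", 2.4, (32.43, 43.24, 29.05, 33.78)),
    ("4qtnA", 2.6, (21.78, 22.69, 15.42, 18.51)),
    ("4wyqA", 5.5, (34.03, 29.62, 21.22, 32.35)),
]


def subset_means(rows, min_evalue=None):
    """Mean GDT_TS per method over rows whose best-hit e-value exceeds min_evalue."""
    selected = [r for r in rows if min_evalue is None or r[1] > min_evalue]
    means = {
        method: round(mean([r[2][i] for r in selected]), 2)
        for i, method in enumerate(METHODS)
    }
    return len(selected), means


for threshold in (None, 0.1, 2.0):
    print(threshold, subset_means(rows, threshold))
```

The three subsets contain 16, 10 and 4 targets, respectively. As the Mean rows show, the gap between SmotifTF and I-TASSER narrows from about 4 GDT_TS units over all rows to about 0.1 once only targets with weak homology signal are considered.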
Fig 3. Examples of SmotifTF predictions in the benchmark test set.
The structural superposition of the top-scoring model (pink cartoon) with the native structure (green cartoon) is shown in the middle. The proteins that provide the Smotif fragments to the top-scoring model are shown in grey cartoon, with the Smotifs themselves colored according to the secondary structure elements present in them (helix = red, strand = yellow, loop = green). The PDB id, chain id and residue numbers of the Smotif fragments are shown along with the root mean square deviation (RMSD) of the respective Smotif fragments compared to the corresponding native Smotif. The SCOP ids of the proteins are provided, where available. (a) N-terminal domain of a protein of unknown function from Vibrio cholerae (PDB: 4ro3A); (b) RNA-binding protein Tho1 from Saccharomyces cerevisiae (PDB: 4uzxA); (c) mammalian endoribonuclease Dicer (PDB: 4wyqA).