Pantelis G Bagos, Georgios N Tsaousis, Stavros J Hamodrakas.
Abstract
The number of determined membrane protein structures has been shown to grow exponentially, at approximately the same rate as that of water-soluble proteins. To investigate how this growth affects the performance of prediction algorithms for both alpha-helical and beta-barrel membrane proteins, we conducted a prospective study based on historical records. We trained separate hidden Markov models on training sets of different sizes and evaluated their performance on topology prediction for the two classes of transmembrane proteins. We show that the existing top-scoring algorithms for predicting the transmembrane segments of alpha-helical membrane proteins perform slightly better than those for beta-barrel outer membrane proteins in all measures of accuracy. Following the same rationale, a meta-analysis of the performance of secondary structure prediction algorithms indicates that existing algorithmic techniques cannot be further improved simply by adding more non-homologous sequences to the training sets. The upper limit for secondary structure prediction is estimated to be no more than 70% and 80% of correctly predicted residues for single-sequence-based and multiple-sequence-based methods, respectively. Therefore, efforts should concentrate on new techniques for developing better-scoring predictors.
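The "correctly predicted residues" figure quoted above is the usual per-residue accuracy, reported as Q3 when three states (helix, strand, coil) are used. A minimal sketch of that calculation follows; the function name `q3` and the one-letter state codes `H`, `E`, `C` are illustrative assumptions, not taken from the paper.

```python
def q3(observed: str, predicted: str) -> float:
    """Percentage of residues whose predicted state matches the observed one.

    Both arguments are equal-length strings of per-residue state codes
    (here assumed to be H = helix, E = strand, C = coil).
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    # Toy 12-residue example: two residues are mispredicted as coil.
    print(round(q3("HHHHEEEECCCC", "HHHCEEECCCCC"), 2))
```

In the toy call, 10 of 12 residues agree, so the score is about 83.33%.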
Year: 2009 PMID: 19944385 PMCID: PMC5054404 DOI: 10.1016/S1672-0229(08)60041-8
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Studies included in the meta-analysis for the accuracy of the secondary structure prediction algorithms
| Year | Training set (No. of proteins) | Q3 (%) | Evolutionary information |
|---|---|---|---|
| 1978 | 29 | 53 | NO |
| 1978 | 25 | 57 | NO |
| 1986 | 61 | 62.2 | NO |
| 1987 | 59 | 61.3 | NO |
| 1987 | 68 | 63 | NO |
| 1987 | 25 | 66 | YES |
| 1988 | 62 | 58.7 | NO |
| 1988 | 106 | 64.3 | NO |
| 1989 | 48 | 63 | NO |
| 1990 | 62 | 64 | NO |
| 1992 | 107 | 66.4 | NO |
| 1993 | 91 | 64.5 | NO |
| 1993 | 126 | 72 | YES |
| 1993 | 110 | 68 | NO |
| 1996 | 318 | 72.9 | YES |
| 1996 | 318 | 67 | NO |
| 1996 | 267 | 64.4 | NO |
| 1996 | 126 | 71.3 | YES |
| 1996 | 126 | 66.3 | NO |
| 1997 | 556 | 75 | YES |
| 1997 | 402 | 67.5 | NO |
| 1997 | 512 | 68 | NO |
| 1997 | 512 | 72.4 | YES |
| 1997 | 90 | 73.5 | YES |
| 1997 | 304 | 72 | YES |
| 1997 | 473 | 67 | NO |
| 1999 | 1,180 | 76.6 | YES |
| 1999 | 681 | 76.6 | YES |
| 1999 | 396 | 72.9 | YES |
| 1999 | 187 | 76.5 | YES |
| 2000 | 480 | 76.4 | YES |
| 2000 | 496 | 76.7 | YES |
| 2000 | 1,032 | 80.6 | YES |
| 2000 | 452 | 68.8 | NO |
| 2001 | 513 | 73.5 | YES |
| 2001 | 396 | 73.7 | YES |
| 2001 | 396 | 68.8 | NO |
| 2001 | 126 | 75.1 | YES |
| 2002 | 513 | 73.5 | YES |
| 2002 | 513 | 67.5 | NO |
| 2002 | 1,180 | 78.13 | YES |
| 2003 | 480 | 78.5 | YES |
| 2003 | 126 | 72.8 | YES |
| 2003 | 1,460 | 77.07 | YES |
| 2004 | 513 | 75.2 | YES |
| 2004 | 1,612 | 70.2 | NO |
| 2004 | 513 | 77 | YES |
| 2004 | 513 | 78.44 | YES |
| 2004 | 513 | 76.5 | YES |
| 2005 | 3,553 | 77.1 | YES |
| 2005 | 860 | 78.4 | YES |
| 2005 | 396 | 76.3 | YES |
| 2005 | 2,171 | 79 | YES |
| 2005 | 513 | 79.4 | YES |
| 2005 | 513 | 69 | NO |
| 2005 | 513 | 76.4 | YES |
| 2005 | 374 | 76 | YES |
| 2005 | 3,925 | 81.8 | YES |
| 2005 | 297 | 70 | YES |
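As an illustration of how the rows above can be related to training set size, the following sketch fits an ordinary least-squares line of Q3 against the number of training proteins, separately for methods without and with evolutionary information. It is not the authors' code: the helper name `linear_fit` is hypothetical, Q3 is kept in percent, and the RMSE here is the plain root mean squared residual, which may differ from the definition used in the paper.

```python
import numpy as np

# (training set size, Q3 in %) pairs copied from the table above.
# SINGLE: evolutionary information = NO; MULTIPLE: evolutionary information = YES.
SINGLE = [
    (29, 53), (25, 57), (61, 62.2), (59, 61.3), (68, 63), (62, 58.7),
    (106, 64.3), (48, 63), (62, 64), (107, 66.4), (91, 64.5), (110, 68),
    (318, 67), (267, 64.4), (126, 66.3), (402, 67.5), (512, 68), (473, 67),
    (452, 68.8), (396, 68.8), (513, 67.5), (1612, 70.2), (513, 69),
]
MULTIPLE = [
    (25, 66), (126, 72), (318, 72.9), (126, 71.3), (556, 75), (512, 72.4),
    (90, 73.5), (304, 72), (1180, 76.6), (681, 76.6), (396, 72.9), (187, 76.5),
    (480, 76.4), (496, 76.7), (1032, 80.6), (513, 73.5), (396, 73.7),
    (126, 75.1), (513, 73.5), (1180, 78.13), (480, 78.5), (126, 72.8),
    (1460, 77.07), (513, 75.2), (513, 77), (513, 78.44), (513, 76.5),
    (3553, 77.1), (860, 78.4), (396, 76.3), (2171, 79), (513, 79.4),
    (513, 76.4), (374, 76), (3925, 81.8), (297, 70),
]


def linear_fit(points):
    """Fit y = intercept + slope * x by least squares.

    Returns (intercept, slope, rmse), where rmse is the root mean
    squared residual of the fitted line (an assumed definition).
    """
    x = np.array([p[0] for p in points], dtype=float)
    y = np.array([p[1] for p in points], dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (intercept + slope * x)
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    return float(intercept), float(slope), rmse


if __name__ == "__main__":
    for name, pts in (("single", SINGLE), ("multiple", MULTIPLE)):
        intercept, slope, rmse = linear_fit(pts)
        print(f"Q3 ({name}): intercept={intercept:.2f}, "
              f"slope={slope:.5f} per protein, RMSE={rmse:.2f}")
```

A straight line cannot express a plateau, which is why a saturating (non-linear) fit is needed to estimate an upper limit of the kind quoted in the abstract; the linear fit only serves as a baseline for comparison.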
Results obtained from the linear and non-linear regression for secondary structure, α-helical membrane and β-barrel membrane proteins
| | | | | RMSE |
|---|---|---|---|---|
| Non-linear | | | | |
| Q | 0.869 (0.010) | 0.153 (0.034) | −7.876 (2.579) | 0.0070 |
| C | 0.734 (0.0217) | 0.153 (0.036) | −2.183 (1.516) | 0.0149 |
| SOV | 0.874 (0.0121) | 0.216 (0.041) | −1.398 (1.184) | 0.0132 |
| Q | 0.884 (0.018) | 0.019 (0.020) | −140.0415 (145.267) | 0.0093 |
| C | 0.776 (0.098) | 0.012 (0.018) | −139.895 (183.753) | 0.0213 |
| SOV | 0.904 (0.013) | 1.984 (−) | 14.583 (0.366) | 0.0376 |
| Secondary structure | | | | |
| Q3 (single) | 0.679 (0.006) | 0.022 (0.004) | −50.405 (17.613) | 0.0182 |
| Q3 (multiple) | 0.790 (0.011) | 0.002 (7.4×10⁻⁴) | −976.918 (351.151) | 0.0219 |
| Linear | | | | |
| Q | 0.735 (0.014) | 0.007 (9.7×10⁻⁴) | – | 0.0157 |
| C | 0.462 (0.028) | 0.013 (0.002) | – | 0.0322 |
| SOV | 0.646 (0.034) | 0.012 (0.002) | – | 0.0389 |
| Q | 0.843 (0.006) | 3.3×10⁻⁴ (8.6×10⁻⁵) | – | 0.0093 |
| C | 0.655 (0.013) | 7.9×10⁻⁴ (1.9×10⁻⁴) | – | 0.0206 |
| SOV | 0.879 (0.025) | 3.3×10⁻⁴ (3.7×10⁻⁴) | – | 0.0398 |
| Secondary structure | | | | |
| Q3 (single) | 0.627 (0.009) | 7.1×10⁻⁵ (2.2×10⁻⁵) | – | 0.0349 |
| Q3 (multiple) | 0.740 (0.007) | 2.0×10⁻⁵ (7.0×10⁻⁶) | – | 0.0269 |
Figure 1. The prediction accuracy (Q3) of secondary structure prediction algorithms in relation to the size of the training set. Single sequence methods are depicted with squares and multiple alignment-based ones are depicted with triangles. The non-linear regression curves for single sequence and multiple alignment ones are depicted with solid and dotted lines respectively.
Figure 2. The prediction accuracy (Q) of prediction algorithms for α-helical membrane proteins in relation to the size of the training set. The non-linear and linear regression curves are depicted with solid and dotted lines respectively.
Figure 3. The prediction accuracy (Q) of prediction algorithms for β-barrel membrane proteins in relation to the size of the training set. The non-linear and linear regression curves are depicted with solid and dotted lines respectively.