Xiaoyang Jing1, Qiwen Dong2, Ruqian Lu1.
Abstract
BACKGROUND: In structural biology, protein residue-residue contacts play a crucial role in protein structure prediction. Predicted residue-residue contacts can effectively constrain the conformational search space, which is significant for de novo protein structure prediction. Over the last few decades, researchers have developed various methods to predict residue-residue contacts; in recent years, fusion methods in particular have achieved significant performance. In this work, a novel fusion method based on a ranking strategy is proposed to predict contacts. Unlike traditional regression or classification strategies, the contact prediction task is treated as a ranking task. First, two kinds of features are extracted from correlated mutation methods and ensemble machine-learning classifiers; the proposed method then uses a learning-to-rank algorithm to predict the contact probability of each residue pair.
Keywords: Fusion method; Learning-to-rank; Protein residue-residue contact prediction
Year: 2017 PMID: 28865433 PMCID: PMC5581475 DOI: 10.1186/s12859-017-1811-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
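The ranking formulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which learning-to-rank algorithm is used, so a simple pairwise logistic ranker with a linear scorer stands in here. The two features per residue pair (a correlated-mutation score and a classifier probability), the toy values, and the names `train_pairwise_ranker` and `rank_pairs` are all hypothetical.

```python
import math
import random


def train_pairwise_ranker(features, labels, epochs=200, lr=0.1, seed=0):
    """Fit a linear scorer so that contacts score above non-contacts.

    features: one vector per residue pair (hypothetical layout:
              [correlated-mutation score, classifier probability]).
    labels:   1 for a true contact, 0 otherwise.
    """
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    pos = [x for x, y in zip(features, labels) if y == 1]
    neg = [x for x, y in zip(features, labels) if y == 0]
    # Every (contact, non-contact) combination is one training example.
    pairs = [(p, n) for p in pos for n in neg]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for p, n in pairs:
            diff = [a - b for a, b in zip(p, n)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Derivative of the pairwise loss log(1 + exp(-margin)).
            grad = -1.0 / (1.0 + math.exp(min(margin, 50.0)))
            w = [wi - lr * grad * di for wi, di in zip(w, diff)]
    return w


def rank_pairs(w, pair_ids, features):
    """Return residue pairs sorted from highest to lowest score."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in features]
    order = sorted(zip(scores, pair_ids), key=lambda t: t[0], reverse=True)
    return [pid for _, pid in order]


# Toy data (made up for illustration only).
pair_ids = [(1, 8), (2, 20), (3, 40), (5, 12), (7, 35), (10, 17)]
features = [[0.9, 0.8], [0.2, 0.7], [0.8, 0.3],
            [0.1, 0.2], [0.3, 0.1], [0.7, 0.9]]
labels = [1, 0, 1, 0, 0, 1]

w = train_pairwise_ranker(features, labels)
ranked = rank_pairs(w, pair_ids, features)
print(ranked)
```

The point of the ranking view is that only the relative order of residue pairs matters, which matches how contact predictions are evaluated: by the precision of the top-ranked pairs.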
Fig. 1 The overall flowchart of the proposed contact prediction framework
Comparison of the proposed method with other methods on the CASP11 dataset
| Methods^a | Short-range | | | Medium-range | | | Long-range | | |
|---|---|---|---|---|---|---|---|---|---|
| | Top 5 | L/10 | L/5 | Top 5 | L/10 | L/5 | Top 5 | L/10 | L/5 |
| PSICOV | 35.12% | 24.59% | 19.00% | 34.47% | 26.75% | 21.82% | 40.98% | 33.12% | 28.02% |
| CCMpred | 40.00% | 30.13% | 22.60% | 40.33% | 31.66% | 26.36% | 43.90% | 38.55% | 33.51% |
| GREMLIN | 40.33% | 29.71% | 22.80% | 40.49% | 32.19% | 26.55% | 43.25% | 38.19% | 33.64% |
| RF-classifiers^b | 62.76% | 50.11% | 42.18% | 37.87% | 31.69% | 28.27% | 25.41% | 22.74% | 19.85% |
| RRCRank | | | | | | | | | |
^a The best results are shown in bold font. ^b The average of three independent RF-classifiers for each contact category.
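The Top 5, L/10, and L/5 columns report the precision of the highest-ranked residue pairs within each separation range, where L is the sequence length. A minimal sketch of that computation follows. The separation ranges (short 6 to 11, medium 12 to 23, long 24 or more residues) follow the common CASP convention and are an assumption here, as is the integer rounding of L/10; the function name `top_k_precision` and the toy data are hypothetical.

```python
def top_k_precision(scored_pairs, true_contacts, k, min_sep, max_sep=None):
    """Precision of the k highest-scored pairs within a separation range.

    scored_pairs:  iterable of (i, j, score) residue pairs.
    true_contacts: set of (i, j) tuples with i < j.
    """
    candidates = [
        (score, i, j)
        for i, j, score in scored_pairs
        if abs(i - j) >= min_sep and (max_sep is None or abs(i - j) <= max_sep)
    ]
    candidates.sort(key=lambda t: t[0], reverse=True)
    top = candidates[:k]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if (min(i, j), max(i, j)) in true_contacts)
    # Divides by the number of pairs actually selected, which can be
    # smaller than k when few pairs fall in the range.
    return hits / len(top)


# Toy example (made-up scores and contacts).
seq_len = 50
scored = [(1, 30, 0.90), (2, 40, 0.80), (3, 35, 0.70),
          (5, 45, 0.60), (10, 17, 0.95), (4, 50, 0.50)]
true_contacts = {(1, 30), (3, 35), (4, 50)}

long_top5 = top_k_precision(scored, true_contacts, k=5, min_sep=24)
long_top2 = top_k_precision(scored, true_contacts, k=2, min_sep=24)
long_l10 = top_k_precision(scored, true_contacts,
                           k=max(1, seq_len // 10), min_sep=24)
short_top5 = top_k_precision(scored, true_contacts, k=5, min_sep=6, max_sep=11)
print(long_top5, long_top2, long_l10, short_top5)
```

Per-target precisions computed this way would then be averaged over all targets to give one table cell; that averaging step is likewise an assumption about the evaluation protocol.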
Fig. 2 Comparison of the top L/5 prediction performance between RRCRank and other methods. (a) PSICOV. (b) GREMLIN. (c) CCMpred. (d) RF-classifiers. (The line x = y is shown for reference)
Comparison of the proposed method with other methods on the CASP12 dataset
| Methods^a | Short-range | | | Medium-range | | | Long-range | | |
|---|---|---|---|---|---|---|---|---|---|
| | Top 5 | L/10 | L/5 | Top 5 | L/10 | L/5 | Top 5 | L/10 | L/5 |
| PSICOV | 33.09% | 25.99% | 19.44% | 38.55% | 31.54% | 23.86% | 37.09% | 33.65% | 28.01% |
| CCMpred | 40.00% | 31.56% | 24.10% | | 36.52% | 30.22% | 41.82% | 38.54% | |
| GREMLIN | 40.00% | 30.75% | 24.08% | 46.18% | 35.84% | | 44.00% | 37.59% | 34.31% |
| RF-classifiers | 55.27% | 45.78% | 37.81% | 31.64% | 29.11% | 23.67% | 29.45% | 23.04% | 20.16% |
| RRCRank | | | | 42.18% | | 29.93% | | | 34.37% |
^a The best results are shown in bold font.
Comparison of the proposed method with traditional strategies
| | Methods^a | Short-range | | | Medium-range | | | Long-range | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | | Top 5 | L/10 | L/5 | Top 5 | L/10 | L/5 | Top 5 | L/10 | L/5 |
| All targets | SVR | | 53.57% | 44.98% | 43.44% | 36.20% | 30.44% | 38.37% | 32.84% | 27.96% |
| | SVC | 62.60% | 49.68% | 42.26% | 38.20% | 32.05% | 27.56% | 36.89% | 29.75% | 26.51% |
| | RRCRank | 67.48% | | | | | | | | |
| Hard targets | SVR | 56.80% | 45.83% | 39.53% | 35.20% | 31.06% | 26.07% | 19.60% | 16.40% | 14.18% |
| | SVC | 54.00% | 44.77% | 38.52% | 35.60% | 30.24% | 26.09% | 20.00% | 15.62% | 14.77% |
| | RRCRank | | | | | | | | | |
^a The best results are shown in bold font for each category.
Comparison of the proposed method with state-of-the-art methods on the CASP11 dataset
| | Methods^a | Short-range | | | Medium-range | | | Long-range | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | | Top 5 | L/10 | L/5 | Top 5 | L/10 | L/5 | Top 5 | L/10 | L/5 |
| All targets | CONSIP2 | | | | | | | | | |
| | Shen-Group | 58.31% | 50.01% | 43.11% | 47.61% | 41.10% | 36.07% | 34.37% | 33.30% | 28.94% |
| | MULTICOM-CLUSTER | 68.13% | 55.47% | 46.17% | 49.27% | 41.12% | 37.52% | 35.12% | 30.27% | 26.32% |
| | UCI-IGB-CMpro | 52.20% | 42.79% | 36.09% | 48.94% | 41.68% | 36.09% | 36.75% | 30.38% | 28.10% |
| | RRCRank | 67.48% | 54.97% | 46.02% | 47.38% | 37.87% | 31.74% | 48.69% | 40.78% | 34.77% |
| Hard targets | CONSIP2 | | | | | | | | | |
| | Shen-Group | 60.89% | 50.74% | 43.72% | 48.89% | 41.62% | 35.25% | 29.33% | 27.29% | 22.94% |
| | MULTICOM-CLUSTER | 62.40% | 52.91% | 43.81% | 50.00% | 40.31% | 35.74% | 24.40% | 22.09% | 17.89% |
| | UCI-IGB-CMpro | 51.20% | 42.58% | 36.64% | 49.20% | 41.23% | 34.94% | 24.80% | 19.93% | 18.42% |
| | RRCRank | 57.20% | 46.06% | 39.72% | 40.00% | 31.39% | 26.29% | 30.40% | 23.31% | 18.57% |
^a The best results are shown in bold font for each category.
Fig. 3 Comparison of the top L/5 prediction performance between RRCRank and four leading methods in CASP11. (a) CONSIP2. (b) Shen-Group. (c) MULTICOM-CLUSTER. (d) UCI-IGB-CMpro. (The line x = y is shown for reference)
The p-values of Student’s t-test for the difference in prediction precision between different methods on the CASP11 dataset
| Methods | CONSIP2 | Shen-Group | MULTICOM-CLUSTER | UCI-IGB-CMpro | RRCRank |
|---|---|---|---|---|---|
| CONSIP2 | 1.00E+00 | 1.28E-145 | 9.44E-36 | 1.15E-54 | 1.02E-27 |
| Shen-Group | 1.28E-145 | 1.00E+00 | 1.53E-51 | 6.28E-34 | 1.37E-61 |
| MULTICOM-CLUSTER | 9.44E-36 | 1.53E-51 | 1.00E+00 | 1.00E-03 | 1.00E-01 |
| UCI-IGB-CMpro | 1.15E-54 | 6.28E-34 | 1.00E-03 | 1.00E+00 | 8.34E-07 |
| RRCRank | 1.02E-27 | 1.37E-61 | 1.00E-01 | 8.34E-07 | 1.00E+00 |