Hongjie Wu1, Hongmei Huang1, Weizhong Lu2, Qiming Fu1,3, Yijie Ding1, Jing Qiu1, Haiou Li1.
Abstract
BACKGROUND: In ab initio protein-structure prediction, a large set of structural decoys is often generated, and the best five or three candidates must then be selected from those decoys. The clustered central structures with the largest number of neighbors are frequently regarded as the near-native protein structures with the lowest free energy; however, limitations in clustering methods and three-dimensional structural-distance assessments make it difficult to identify the exact order of the best five or three near-native candidate structures.
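The neighbor-counting idea described above can be illustrated with a minimal sketch. It assumes a precomputed pairwise distance matrix (for example, RMSD between decoys) and a fixed cutoff; the function name `rank_cluster_centers`, the cutoff value, and the greedy removal of cluster members are illustrative assumptions, not the actual SPICKER, Calibur, or Durandal implementations.

```python
def rank_cluster_centers(dist, cutoff, top_n=5):
    """Greedily rank decoys by how many neighbors lie within `cutoff`.

    dist   : square list-of-lists of pairwise distances (hypothetical input)
    cutoff : distance below which two decoys count as neighbors
    top_n  : maximum number of cluster centers to return
    Returns a list of (center_index, cluster_size) tuples.
    """
    remaining = set(range(len(dist)))
    ranked = []
    while remaining and len(ranked) < top_n:
        def neighbor_count(i):
            # Count the still-unassigned decoys within the cutoff of decoy i.
            return sum(1 for j in remaining if j != i and dist[i][j] <= cutoff)

        # Ties are broken in favor of the lowest index.
        center = max(sorted(remaining), key=neighbor_count)
        members = {j for j in remaining if j == center or dist[center][j] <= cutoff}
        ranked.append((center, len(members)))
        # Remove the whole cluster before searching for the next center.
        remaining -= members
    return ranked


# Toy distance matrix: decoys 0-2 form one group, decoys 3-4 another.
toy_dist = [
    [0, 1, 1, 9, 9],
    [1, 0, 1, 9, 9],
    [1, 1, 0, 9, 9],
    [9, 9, 9, 0, 1],
    [9, 9, 9, 1, 0],
]
centers = rank_cluster_centers(toy_dist, cutoff=2)
```

In this toy case the larger group is ranked first. The ordering depends entirely on the cutoff and the distance measure, which is the sensitivity the abstract points to.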
Keywords: Protein structural prediction; Random forest; SPICKER
Year: 2019 PMID: 31874596 PMCID: PMC6929337 DOI: 10.1186/s12859-019-3257-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Schematic of the re-ranking method via random forest classification
Improved detection of near-native structures via random forest classification
Datasets
| Data set | Number of proteins | Average length |
|---|---|---|
| I-TASSER Decoy Set-I | 43 | 80 |
| QUARK Decoy Set | 145 | 107 |
| CASP10 dataset | 54 | 212 |
| CASP11 dataset | 39 | 203 |
RMSD comparison of the first model of 43 proteins
| PDB | Len (a) | Best (b) | SPICKER Original (c) | RF_SPICKER (d) | Calibur Original (c) | RF_Calibur (e) | Durandal Original (c) | RF_Durandal (f) |
|---|---|---|---|---|---|---|---|---|
| 1abv_ | 103 | 4.81 | 13.93 | 13.17 | 13.17 | 13.57 | ||
| 1af7__ | 72 | 2.92 | 5.73 | 5.73 | 4.45 | 4.45 | 10.28 | |
| 1ah9_ | 63 | 1.88 | 4.66 | 4.66 | 3.31 | 3.02 | 3.02 | |
| 1b4bA | 71 | 4.20 | 7.18 | 5.57 | 5.57 | 5.54 | 5.54 | |
| 1b72A | 49 | 2.36 | 5.08 | 3.73 | 3.23 | 3.23 | ||
| 1bm8_ | 99 | 6.67 | 7.18 | 7.18 | 7.07 | 7.07 | 7.48 | 7.48 |
| 1bq9A | 53 | 3.98 | 7.39 | 8.18 | 8.36 | 8.36 | ||
| 1cewI | 108 | 3.20 | 3.92 | 3.92 | 12.49 | 3.75 | 3.75 | |
| 1cqkA | 101 | 1.40 | 2.78 | 2.78 | 1.95 | 2.37 | 2.37 | |
| 1dcjA_ | 73 | 9.31 | 11.66 | 12.18 | 11.97 | |||
| 1di2A_ | 69 | 1.32 | 2.49 | 2.49 | 2.62 | 2.49 | 2.49 | |
| 1dtjA_ | 74 | 1.58 | 3.22 | 2.83 | 1.88 | 1.88 | ||
| 1egxA | 115 | 1.93 | 2.31 | 2.31 | 2.95 | 2.59 | 2.59 | |
| 1g1cA | 98 | 2.16 | 2.97 | 2.97 | 2.65 | 2.49 | 2.49 | |
| 1gjxA | 77 | 5.01 | 7.30 | 14.09 | 8.09 | 8.09 | ||
| 1gnuA | 117 | 4.06 | 7.09 | 9.15 | 9.54 | 9.54 | ||
| 1gpt_ | 47 | 2.79 | 5.52 | 5.53 | 6.29 | 3.68 | 3.68 | |
| 1gyvA | 117 | 2.69 | 3.78 | 3.78 | 3.63 | 3.39 | 3.39 | |
| 1hbkA | 89 | 2.69 | 3.57 | 3.57 | 3.48 | 3.48 | 3.48 | |
| 1itpA | 68 | 4.10 | 11.23 | 10.92 | 11.48 | 11.48 | ||
| 1jnuA | 104 | 2.30 | 3.45 | 3.45 | 3.21 | 2.76 | 2.76 | |
| 1kjs_ | 74 | 4.65 | 8.67 | 8.44 | 8.75 | |||
| 1mkyA3 | 81 | 3.68 | 5.16 | 5.16 | 5.54 | 5.49 | 5.49 | |
| 1mla_2 | 70 | 2.04 | 3.18 | 3.18 | 2.82 | 3.38 | 3.38 | |
| 1mn8A | 84 | 5.14 | 6.69 | 6.69 | 7.45 | 7.45 | 10.38 | 10.38 |
| 1n0uA4 | 69 | 3.14 | 4.59 | 4.59 | 4.62 | 4.28 | 4.28 | |
| 1ne3A | 56 | 3.16 | 6.63 | 6.09 | 5.96 | 5.96 | ||
| 1no5A | 93 | 6.12 | 10.82 | 10.69 | 11 | 11 | ||
| 1npsA | 88 | 1.81 | 3.07 | 3.07 | 2.74 | 8.29 | ||
| 1o2fB_ | 77 | 4.08 | 7.41 | 9.03 | 3.91 | 3.91 | ||
| 1ogwA_ | 77 | 0.96 | 1.81 | 1.81 | 1.34 | 2.43 | 3.00 | |
| 1pgx_ | 59 | 2.79 | 3.42 | 3.42 | 4.19 | 3.26 | 3.26 | |
| 1r69_ | 61 | 1.30 | 2.28 | 2.28 | 2.14 | 1.99 | 1.99 | |
| 1shfA | 59 | 1.18 | 2.86 | 2.86 | 2.75 | 1.29 | 1.29 | |
| 1sro_ | 71 | 2.59 | 3.54 | 3.89 | 3.54 | 3.54 | ||
| 1tfi_ | 47 | 2.49 | 5.72 | 5.08 | 5.08 | 4.48 | 4.48 | |
| 1thx_ | 108 | 1.71 | 2.67 | 2.67 | 2.27 | 2.10 | 2.10 | |
| 1tif_ | 59 | 6.47 | 7.45 | 7.45 | 7.57 | 7.57 | 9.44 | |
| 1tig_ | 88 | 3.00 | 9.12 | 3.58 | 3.58 | 4.25 | 4.25 | |
| 1vcc_ | 76 | 4.52 | 6.53 | 6.53 | 8.13 | 7.46 | 7.46 | |
| 256bA | 106 | 2.75 | 3.20 | 3.20 | 6.23 | 3.73 | ||
| 2pcy_ | 99 | 3.87 | 5.46 | 5.46 | 2.12 | 4.71 | 4.71 | |
| 2a0b_ | 118 | 2.05 | 2.20 | 2.20 | 3.48 | 2.75 | 2.75 | |
| Average | 81.09 | 3.28 | 5.36 | 5.53 | 5.36 | |||
a: Length of the protein sequence
b: RMSD between the best model in the decoy set and the native structure
c: RMSD of the first model predicted by SPICKER, Calibur and Durandal
d: RMSD of the first model predicted by the random forest classification from SPICKER results
e: RMSD of the first model predicted by the random forest classification from Calibur results
f: RMSD of the first model predicted by the random forest classification from Durandal results
RMSD values in bold italic indicate that the RF methods (RF_SPICKER, RF_Calibur and RF_Durandal) obtain a lower RMSD than the corresponding original methods
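As a reminder of what the RMSD columns measure, the following is a minimal sketch of a root-mean-square deviation between two coordinate sets. It assumes the two structures have equal length, have matched atoms (for example, C-alpha atoms), and have already been superimposed; the function name `rmsd` and the toy coordinates are illustrative only, and the optimal superposition step is not shown.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) tuples.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between matched atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: each matched atom pair is 2 units apart.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
native = [(0.0, 0.0, 2.0), (1.0, 2.0, 0.0)]
value = rmsd(model, native)
```

A lower value means the model lies closer to the reference structure, which is why a lower RMSD for an RF-selected first model counts as an improvement in the table.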
Fig. 2 Comparison of RMSD of the second model in the absence of the first model
Fig. 3 Comparison of the RMSD of the third model and the fourth model. a. Comparison of the RMSD of the third model. b. Comparison of the RMSD of the fourth model
Fig. 4 Comparison of the numbers of correct predictions
Fig. 5 Visual comparison of random forest classifier and current prediction methods on 1dcjA and 1kjs_