Yuangen Yao, Rong Gui, Quan Liu, Ming Yi, Haiyou Deng.
Abstract
BACKGROUND: As one of the most successful knowledge-based energy functions, the distance-dependent atom-pair potential is widely used in all aspects of protein structure prediction, including conformational search, model refinement, and model assessment. During the last two decades, great efforts have been made to improve the reference state of the potential, while other factors that also strongly affect its performance have received comparatively little attention.
Keywords: Distance cutoff; Distance-dependent atom-pair potential; Protein structure prediction; Reference state; Residue interval
Year: 2017 PMID: 29221443 PMCID: PMC5723101 DOI: 10.1186/s12859-017-1983-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Brief description of six reference states for distance-dependent atom-pair potential
| Reference state^a | Description |
|---|---|
| Averaging (ave-) | Take the average distance distribution over different atom types from experimental conformations as the reference state, which means the distance distributions for all types of atom pair are identical in the reference state |
| Quasi-chemical approximation (kbp-) | Use the overall distance distribution of atom pairs from experimental structures and calculate the specific distance distribution of atom types i and j based on the mole fractions (on the whole dataset) of atom types i and j |
| Finite ideal-gas (dfire-) | Treat the reference state as a finite ideal gas, in which the probability of finding an atom pair in a particular distance bin increases as r^a, with a to-be-determined constant a (a < 2) |
| Spherical non-interacting (dope-) | Treat the reference state as a sphere in which all atoms of a protein are evenly distributed without interaction. The size of the sphere is determined by the corresponding experimental structure |
| Random-walk chain (rw-) | Treat the reference state as an ideal random-walk chain of a rigid step length, which mimics well the generic entropic elasticity and inherent connectivity of polymer protein molecules and yet ignores the atomic interactions of amino acids |
| Atom-shuffled (srs-) | Generate a shuffled structure dataset by preserving all atomic positions while shuffling atom identities within each of the experimental structures |

^a The abbreviation is given in parentheses
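
As a rough illustration of how a reference state enters the potential, the sketch below applies the averaging (ave-) idea from the table: the expected count for each atom-type pair is its total count multiplied by a distance distribution shared by all pair types, and the energy follows from the inverse Boltzmann relation. The function name, the array layout `counts[i, j, bin]`, the `kT` scale, and the pseudocount are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def averaging_potential(counts, kT=1.0, pseudocount=1e-6):
    """Hypothetical helper: energy table from observed pair counts.

    counts[i, j, b] is the observed number of (atom type i, atom type j)
    pairs falling in distance bin b, collected from experimental structures.
    """
    counts = np.asarray(counts, dtype=float)

    # Total observations of each atom-type pair, summed over distance bins.
    pair_totals = counts.sum(axis=2, keepdims=True)        # shape (T, T, 1)

    # Distance distribution averaged over all atom-type pairs.
    bin_fraction = counts.sum(axis=(0, 1)) / counts.sum()  # shape (B,)

    # Reference state: every pair type follows the same distance distribution.
    expected = pair_totals * bin_fraction                   # shape (T, T, B)

    # Inverse Boltzmann relation; the pseudocount (an assumption of this
    # sketch) keeps empty bins from producing log(0).
    return -kT * np.log((counts + pseudocount) / (expected + pseudocount))


# Toy counts for 2 atom types and 2 distance bins (made-up numbers).
toy_counts = np.array([[[8, 2], [2, 8]],
                       [[2, 8], [5, 5]]])
toy_energy = averaging_potential(toy_counts)
```

In this toy case the (0, 0) pair is over-represented in the first bin relative to the shared distribution, so `toy_energy[0, 0, 0]` is negative (favourable) and `toy_energy[0, 0, 1]` is positive.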
Fig. 1 Flowchart of the study. Step 1. PDB dataset preparation; Step 2. Potential construction; Step 3. Potential application; Step 4. Result analysis
Basic information of the six groups of structural decoy sets
| Set name | Number of sets | Average length^a | Number of structures |
|---|---|---|---|
| I-TASSER | 56 | 80 (47–118) | 24,707 |
| Moulder | 20 | 174 (81–340) | 6406 |
| Rosetta | 58 | 83 (50–146) | 5858 |
| 3DRobot | 200 | 133 (80–240) | 60,200 |
| CASP10 | 72 | 224 (24–587) | 5805 |
| CASP11 | 62 | 206 (37–462) | 4522 |
| Total/Ave | 468 | 146 | 107,498 |

^a The length range is given in parentheses
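
The Total/Ave row can be checked with a few lines of arithmetic. The sketch below copies the per-group numbers from the table; reading the overall average length as a mean weighted by the number of sets in each group is an assumption of this example (the table does not say how the average was formed), although that weighting does reproduce the reported value.

```python
# (number of sets, average length, number of structures), copied from the table.
decoy_groups = {
    "I-TASSER": (56, 80, 24707),
    "Moulder": (20, 174, 6406),
    "Rosetta": (58, 83, 5858),
    "3DRobot": (200, 133, 60200),
    "CASP10": (72, 224, 5805),
    "CASP11": (62, 206, 4522),
}

total_sets = sum(n_sets for n_sets, _, _ in decoy_groups.values())
total_structures = sum(n_structs for _, _, n_structs in decoy_groups.values())

# Assumed weighting: each group's average length counts once per decoy set.
weighted_length = sum(
    n_sets * length for n_sets, length, _ in decoy_groups.values()
) / total_sets

print(total_sets, total_structures, round(weighted_length))
```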
Fig. 2 The variation of R1-num with the distance cutoff and residue interval for potentials based on different reference states. R1-num refers to the number of decoy sets whose native structure is given the lowest energy score by the potential. a. aveREF. b. kbpREF. c. dfireREF. d. dopeREF. e. rwREF. f. srsREF
Fig. 3 The variation of the average PCC between energy score and TM-score with the distance cutoff and residue interval for potentials based on different reference states. PCC refers to Pearson’s correlation coefficient. Because a lower energy score and a higher TM-score both indicate a better model, the PCC is usually negative, and lower (more negative) values are better. a. aveREF. b. kbpREF. c. dfireREF. d. dopeREF. e. rwREF. f. srsREF
Fig. 4 The variation of average R1-num (over all 16 residue intervals) with distance cutoff for the six groups of decoy sets. R1-num refers to the number of decoy sets whose native structure is given the lowest energy score by the potential. a. I-TASSER decoy sets. b. Moulder decoy sets. c. Rosetta decoy sets. d. 3DRobot decoy sets. e. CASP10 decoy sets. f. CASP11 decoy sets
Fig. 5 The distribution of MolProbity scores from two typical decoy sets. a 1ail decoy set from the Rosetta decoy sets; b 1PSRA decoy set from the 3DRobot decoy sets. The native structure is highlighted by open circles
Fig. 6 The variation of average R1-num (over all 18 distance cutoffs) with residue interval for the six groups of decoy sets. R1-num refers to the number of decoy sets whose native structure is given the lowest energy score by the potential. a. I-TASSER decoy sets. b. Moulder decoy sets. c. Rosetta decoy sets. d. 3DRobot decoy sets. e. CASP10 decoy sets. f. CASP11 decoy sets
Fig. 7 The number of potentials that can recognize the native structure for each set from the I-TASSER and CASP11 decoy sets. There are 288 (18 distance cutoffs × 16 residue intervals) potentials for each reference state, and 1728 (288 × 6 reference states) potentials in total
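
The parameter grid behind these counts can be enumerated directly. In the sketch below, the six prefixes come from the reference-state table and the `ref-cutoff-interval` naming follows labels such as rw-17-3 used in the comparison table. The concrete cutoff values (5 to 22) and residue intervals (0 to 15) are assumptions chosen only to match the stated counts of 18 and 16; the captions do not list the actual values.

```python
from itertools import product

REFERENCE_STATES = ["ave", "kbp", "dfire", "dope", "rw", "srs"]

# Assumed ranges: only the counts (18 cutoffs, 16 intervals) come from the text.
DISTANCE_CUTOFFS = range(5, 23)    # hypothetical cutoffs 5..22
RESIDUE_INTERVALS = range(0, 16)   # hypothetical intervals 0..15

# One name per potential, e.g. "rw-17-3" for the random-walk reference state
# with distance cutoff 17 and residue interval 3.
potential_names = [
    f"{ref}-{cutoff}-{interval}"
    for ref, cutoff, interval in product(
        REFERENCE_STATES, DISTANCE_CUTOFFS, RESIDUE_INTERVALS
    )
]

per_reference_state = len(DISTANCE_CUTOFFS) * len(RESIDUE_INTERVALS)
print(per_reference_state, len(potential_names))
```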
Fig. 8 The distribution of PCC between energy score and TM-score from 1728 potentials for the six groups of decoy sets. The PCC (Pearson’s correlation coefficient) values are binned with a width of 0.1
Performance comparisons between the potentials we built and several widely-used statistical potentials
| Decoy sets | Measurements | Dfire | RW | GOAP | ave-6-6^d | rw-17-3^e | Best^f |
|---|---|---|---|---|---|---|---|
| I-TASSER | R1-num^a | 43 | 53 | 45 | 42 | 41 | 56 (6) |
| | Z-score^b | 2.80 | 4.42 | 4.98 | 2.42 | 2.97 | 11.21 (dope-5-0) |
| | PCC^c | −0.47 | −0.50 | −0.50 | −0.09 | −0.51 | −0.55 (rw-15/16-0) |
| Moulder | R1-num | 18 | 19 | 19 | 19 | 19 | 20 (89) |
| | Z-score | 2.67 | 2.78 | 3.48 | 2.97 | 2.75 | 8.17 (rw-8-0) |
| | PCC | −0.84 | −0.83 | −0.88 | −0.52 | −0.88 | −0.89 (rw-16-2) |
| Rosetta | R1-num | 22 | 20 | 45 | 41 | 18 | 48 (ave-8-14, srs-6-8) |
| | Z-score | 1.55 | 1.48 | 3.38 | 3.11 | 1.46 | 3.56 (srs-6-7) |
| | PCC | −0.37 | −0.36 | −0.51 | −0.31 | −0.36 | −0.45 (srs-6-13/15) |
| 3DRobot | R1-num | 1 | 0 | 94 | 176 | 19 | 184 (ave-5-5/6) |
| | Z-score | 0.83 | −0.30 | 1.85 | 3.19 | 1.16 | 3.50 (ave-5-5) |
| | PCC | −0.86 | −0.85 | −0.90 | −0.70 | −0.86 | −0.88 (ave-19/20/21-5) |
| CASP10 | R1-num | 26 | 16 | 41 | 53 | 31 | 55 (ave-7-6/7/8) |
| | Z-score | 0.76 | 0.86 | 1.60 | 1.34 | 1.31 | 1.70 (dope-6-10/11/12) |
| | PCC | −0.40 | −0.41 | −0.53 | −0.22 | −0.54 | −0.56 (rw-18-3, rw-19-4) |
| CASP11 | R1-num | 24 | 15 | 37 | 47 | 33 | 49 (14) |
| | Z-score | 0.82 | 1.01 | 1.91 | 1.37 | 1.50 | 1.72 (dope-6-11) |
| | PCC | −0.36 | −0.40 | −0.54 | −0.23 | −0.52 | −0.52 (rw-17-3) |
| Total/Average | R1-num | 134 | 123 | 281 | 378 | 161 | 378 (ave-6-6) |
| | Z-score | 1.20 | 0.95 | 2.40 | 2.55 | 1.55 | 2.66 (dope-5-5) |
| | PCC | −0.60 | −0.61 | −0.68 | −0.43 | −0.66 | −0.66 (rw-17/18-3) |

^a The number of decoy sets whose native structure is given the lowest energy score by the potential
^b Defined as (
^c The average Pearson’s correlation coefficient between the energy score and TM-score of all structures in each decoy set, including the native structure
^d The potential based on the averaging reference state with both distance cutoff and residue interval set to 6
^e The potential based on the random-walk chain reference state with distance cutoff = 17 and residue interval = 3
^f The best values among the results of all 1728 potentials with different reference states, distance cutoffs and residue intervals. The potentials that achieve these values are given in parentheses (e.g. rw-15/16-0 means the potentials rw-15-0 and rw-16-0). Only the number of potentials is given in parentheses if more than 3 potentials achieve the best value
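
The three measurements in the table can be sketched for a single decoy set as follows. The R1 test and the PCC follow the footnotes directly. The Z-score formula is an assumption: the footnote defining it is cut off in this copy, so the code uses a commonly used form, (mean decoy energy − native energy) / standard deviation of decoy energies, which gives positive values when the native structure scores well below the decoys. The function name and argument layout are likewise hypothetical.

```python
import numpy as np


def evaluate_decoy_set(energies, tm_scores, native_index=0):
    """Hypothetical helper returning (native_ranked_first, z_score, pcc)."""
    energies = np.asarray(energies, dtype=float)
    tm_scores = np.asarray(tm_scores, dtype=float)

    native_energy = energies[native_index]
    decoy_energies = np.delete(energies, native_index)

    # R1: the native structure has the lowest energy in the set.
    # R1-num is the number of decoy sets for which this is True.
    native_ranked_first = bool(native_energy < decoy_energies.min())

    # Assumed Z-score definition (the source footnote is truncated).
    z_score = (decoy_energies.mean() - native_energy) / decoy_energies.std()

    # PCC between energy and TM-score over all structures, native included.
    pcc = float(np.corrcoef(energies, tm_scores)[0, 1])

    return native_ranked_first, float(z_score), pcc


# Made-up example: the native (index 0) has TM-score 1.0 and the lowest energy.
ranked_first, z, pcc = evaluate_decoy_set(
    energies=[-10.0, -4.0, -6.0, -8.0],
    tm_scores=[1.0, 0.4, 0.6, 0.8],
)
```

Summing `native_ranked_first` over all sets in a group gives that group's R1-num, while the Z-score and PCC are averaged over the sets.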
Fig. 9 The performance variation when applying potentials with different residue intervals. a The variation of average PCC between energy score and TM-score (over six groups of decoy sets and potentials of different distance cutoffs and reference states); b The variation of average R1-num (over potentials of different distance cutoffs and reference states)