Pralay Mitra, David Shultis, Yang Zhang.
Abstract
Protein design aims to identify new protein sequences with a desired structure and biological function. Most current de novo protein design methods rely on physics-based force fields to search for low free-energy states, following Anfinsen's thermodynamic hypothesis. A major obstacle for such approaches is the inaccuracy of the force fields, which cannot accurately describe atomic interactions or distinguish correct folds. We developed a new web server, EvoDesign, to design optimal protein sequences for given scaffolds, along with multiple sequence- and structure-based features to assess the foldability and goodness of the designs. EvoDesign uses an evolution-profile-based Monte Carlo search, with the profiles constructed from homologous structure families in the Protein Data Bank. A set of local structure features, including secondary structure, torsion angle and solvation, are predicted by single-sequence neural-network training and used to smooth the sequence motif and accommodate the physicochemical packing. The EvoDesign algorithm has been extensively tested in large-scale protein design experiments, which demonstrate enhanced foldability and structural stability of the designed sequences compared with physics-based design methods. The EvoDesign server is freely available at http://zhanglab.ccmb.med.umich.edu/EvoDesign.
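The profile-guided Monte Carlo search mentioned in the abstract can be illustrated with a toy sketch. The abstract does not give EvoDesign's actual energy function, move set, or temperature schedule, so everything below is an assumption made for illustration: the `profile` is a hypothetical per-position score table, the energy is simply the negative sum of those scores, and single-residue mutations are accepted with the standard Metropolis criterion. The local structure features and any physics-based terms are left out.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def profile_energy(seq, profile):
    """Toy energy (assumption): negative sum of per-position profile scores."""
    return -sum(profile[i][aa] for i, aa in enumerate(seq))


def toy_profile(target, match=2.0, mismatch=-1.0):
    """Hypothetical profile that rewards the residues of one target sequence."""
    return [{aa: (match if aa == t else mismatch) for aa in AMINO_ACIDS}
            for t in target]


def monte_carlo_design(profile, steps=5000, temperature=1.0, seed=0):
    """Metropolis search over sequences; returns the best sequence and its energy."""
    rng = random.Random(seed)
    length = len(profile)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    energy = profile_energy(seq, profile)
    best_seq, best_energy = list(seq), energy
    for _ in range(steps):
        pos = rng.randrange(length)
        old, new = seq[pos], rng.choice(AMINO_ACIDS)
        # Energy change of the single-residue mutation old -> new.
        delta = profile[pos][old] - profile[pos][new]
        # Metropolis rule: always accept downhill moves, sometimes uphill ones.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            seq[pos] = new
            energy += delta
            if energy < best_energy:
                best_seq, best_energy = list(seq), energy
    return "".join(best_seq), best_energy


if __name__ == "__main__":
    designed, score = monte_carlo_design(toy_profile("ACDEFGHIKL"), temperature=0.5)
    print(designed, score)
```

In a realistic setting the profile would be derived from an alignment of structural homologs of the scaffold rather than from a single target sequence, and several independent runs would be clustered afterwards.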
Year: 2013 PMID: 23671331 PMCID: PMC3692067 DOI: 10.1093/nar/gkt384
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Overview of the EvoDesign server. The process is divided into three steps. Pre-processing and clustering run on a single processor, whereas the simulations run in parallel on 10 processors.
Figure 2. Screenshot of the EvoDesign result page. The user choices and input structure are shown in regions A and B, which appear as soon as the job is submitted. The EvoDesign summary appears in region C once the job completes successfully.
Accuracy versus efficiency of the EvoDesign server on seven non-homologous proteins at different sizes
Columns 4 to 7 report computational time (h); columns 8 to 15 report goodness of design. NRE: normalized relative error.

| PDB ID (SCOP class)ᵃ | Protein length | Physics-based force field included? | Pre-processing time (h) | Simulation time (h) | Clustering time (h) | Total time (h) | Sequence identity (%) | NRE, secondary structure | NRE, solvent accessibility | NRE, torsion angle Φ | NRE, torsion angle Ψ | RMSDᵇ (Å), I-TASSER | RMSDᵇ (Å), SPARKS-X | RMSDᵇ (Å), Rosetta |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1GUT_A (b) | 52 | No | 0.5 | 3.8 | 0.1 | 4.4 | 22 | 1.0 | −0.2 | −0.0 | −0.1 | 0.8 | 1.2 | 2.9 |
| 1GUT_A (b) | 52 | Yes | 1.6 | 16.5 | 0.1 | 18.2 | 31 | 0.5 | 0.0 | −0.1 | −0.3 | 0.5 | 1.1 | 5.2 |
| 1V5I_B (a + b) | 71 | No | 0.7 | 4.4 | 0.1 | 5.2 | 11 | 0.0 | 0.2 | 0.4 | 1.1 | 3.3 | 3.4 | 9.5 |
| 1V5I_B (a + b) | 71 | Yes | 2.5 | 16.3 | 0.1 | 18.9 | 24 | −0.2 | 0.2 | 0.4 | 1.3 | 1.5 | 4.7 | 8.2 |
| 1BKR_A (a) | 109 | No | 1.0 | 5.8 | 0.2 | 7.0 | 27 | 0.1 | 0.0 | 0.1 | −0.1 | 0.4 | 1.9 | 5.3 |
| 1BKR_A (a) | 109 | Yes | 4.6 | 24.1 | 0.2 | 28.9 | 30 | 0.2 | −0.0 | 0.1 | −0.1 | 0.3 | 1.2 | 2.7 |
| 1T3Y_A (a + b) | 132 | No | 1.3 | 6.7 | 0.2 | 8.2 | 18 | −0.0 | 0.1 | 0.0 | −0.1 | 1.9 | 2.0 | 4.9 |
| 1T3Y_A (a + b) | 132 | Yes | 6.5 | 59.0 | 0.2 | 65.7 | 24 | −0.2 | −0.0 | 0.2 | −0.2 | 1.8 | 2.4 | 5.5 |
| 2GMY_A (a) | 148 | No | 1.5 | 7.4 | 0.3 | 9.2 | 14 | 1.0 | 0.0 | 0.4 | 0.3 | 1.0 | 1.8 | 12.4 |
| 2GMY_A (a) | 148 | Yes | 5.0 | 34.0 | 0.3 | 39.3 | 20 | 0.4 | 0.0 | 0.3 | 0.2 | 0.3 | 3.1 | 9.9 |
| 1Y25_A (a/b) | 165 | No | 1.5 | 8.1 | 0.3 | 9.9 | 17 | 0.2 | 0.3 | 0.3 | 0.4 | 2.3 | 6.7 | 10.8 |
| 1Y25_A (a/b) | 165 | Yes | 6.6 | 58.3 | 0.3 | 65.2 | 29 | 0.1 | 0.1 | 0.2 | 0.1 | 1.2 | 1.6 | 14.0 |
| 2PTH_A (a/b) | 194 | No | 1.9 | 9.0 | 0.3 | 11.2 | 12 | 0.7 | 0.6 | 0.7 | 0.8 | 1.7 | 17.1 | 16.7 |
| 2PTH_A (a/b) | 194 | Yes | 7.3 | 45.5 | 0.3 | 53.1 | 20 | 0.7 | 0.3 | 0.4 | 0.6 | 0.9 | 2.1 | 15.4 |
| Average | | No | 1.2 | 6.4 | 0.2 | 7.9 | 17 | 0.4 | 0.1 | 0.3 | 0.3 | 1.6 | | |
| Average | | Yes | 4.9 | 36.2 | 0.2 | 41.3 | 25 | 0.2 | 0.1 | 0.2 | 0.2 | 0.9 | | |
ᵃ (a) means class α and (b) means class β.
ᵇ RMSD is computed between the target scaffold and the model structure built by I-TASSER, Rosetta and SPARKS-X from the designed sequence (see Supplementary Table S1 for detailed results, including TM-score and alignment coverage for both the first model and the best of the top 10 models).
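The trade-off summarized in the Average rows can be re-derived from the per-protein rows. The sketch below copies three columns from the table (total time, sequence identity, I-TASSER RMSD) and recomputes their means with and without the physics-based force field; the variable and function names are illustrative only.

```python
from statistics import mean

# (total time in h, sequence identity in %, I-TASSER RMSD in Å), one tuple per
# protein, copied from the table rows in order of increasing protein length.
without_ff = [(4.4, 22, 0.8), (5.2, 11, 3.3), (7.0, 27, 0.4), (8.2, 18, 1.9),
              (9.2, 14, 1.0), (9.9, 17, 2.3), (11.2, 12, 1.7)]
with_ff = [(18.2, 31, 0.5), (18.9, 24, 1.5), (28.9, 30, 0.3), (65.7, 24, 1.8),
           (39.3, 20, 0.3), (65.2, 29, 1.2), (53.1, 20, 0.9)]


def column_means(rows):
    """Mean of each column, rounded to one decimal place."""
    return tuple(round(mean(col), 1) for col in zip(*rows))


def slowdown(fast_rows, slow_rows):
    """Ratio of mean total run times (first column of each row)."""
    return mean(r[0] for r in slow_rows) / mean(r[0] for r in fast_rows)


if __name__ == "__main__":
    print("without force field:", column_means(without_ff))
    print("with force field:   ", column_means(with_ff))
    print("slowdown factor:    ", round(slowdown(without_ff, with_ff), 1))
```

On these seven proteins, including the physics-based force field costs roughly five times more total run time on average, while the average sequence identity rises and the average I-TASSER RMSD falls.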