Aleksandra E. Badaczewska-Dawid, Andrzej Kolinski, Sebastian Kmiecik.
Abstract
Three-dimensional protein structures, whether determined experimentally or theoretically, are often of too low resolution. In this mini-review, we outline computational methods for reconstructing protein structures from incomplete, coarse-grained models to all-atom models. Typical reconstruction schemes can be divided into four major steps. Usually, the first step is reconstruction of the protein backbone chain starting from the C-alpha trace. This is followed by side-chain rebuilding based on the protein backbone geometry. Subsequently, hydrogen atoms can be reconstructed. Finally, the resulting all-atom models may require structure optimization. Many methods are available to perform each of these tasks. We discuss the available tools and their potential applications in integrative modeling pipelines that can transfer coarse-grained information from computational predictions, or experiment, to all-atom structures.
Keywords: Coarse-grained modeling; Protein modeling; Protein reconstruction; Structure prediction; Structure refinement
Year: 2019; PMID: 31969975; PMCID: PMC6961067; DOI: 10.1016/j.csbj.2019.12.007
Source DB: PubMed; Journal: Comput Struct Biotechnol J; ISSN: 2001-0370; Impact factor: 7.271
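The abstract describes reconstruction as a chain of rebuilding steps (backbone, side chains, hydrogens). Many of the purely geometric operations inside those steps, such as adding a carbonyl oxygen, growing a side chain from rotamer dihedrals or attaching a hydrogen, reduce to placing one atom relative to three already-placed atoms from a bond length, a bond angle and a dihedral angle. The Python sketch below shows that operation using the natural extension reference frame (NeRF) construction. It is an illustration under stated assumptions, not code from any tool reviewed here: the function name `place_atom` and its argument order are hypothetical, and the ideal geometry values must be supplied by the caller (for example from a force field or rotamer library).

```python
import numpy as np


def place_atom(a, b, c, bond_length, bond_angle_deg, torsion_deg):
    """Return coordinates of atom D bonded to C such that |CD| = bond_length,
    angle B-C-D = bond_angle_deg and dihedral A-B-C-D = torsion_deg.

    Hypothetical helper illustrating the NeRF construction; A, B and C must
    not be collinear.
    """
    a, b, c = (np.asarray(x, dtype=float) for x in (a, b, c))
    theta = np.radians(bond_angle_deg)
    phi = np.radians(torsion_deg)

    # D expressed in a local frame centred on C whose first axis points along B->C
    d_local = np.array([
        -bond_length * np.cos(theta),
        bond_length * np.sin(theta) * np.cos(phi),
        bond_length * np.sin(theta) * np.sin(phi),
    ])

    # Orthonormal local frame built from the three reference atoms
    bc = (c - b) / np.linalg.norm(c - b)
    n = np.cross(b - a, bc)
    n = n / np.linalg.norm(n)  # normal to the A-B-C plane
    m = np.cross(n, bc)        # completes the right-handed frame

    # Rotate the local vector into the global frame and translate to C
    return c + np.column_stack((bc, m, n)) @ d_local


# Usage with simple reference coordinates (A, B, C in the xy-plane)
d = place_atom((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
               bond_length=1.0, bond_angle_deg=120.0, torsion_deg=90.0)
# d is approximately (1.5, 0.0, 0.866): 1.0 from C, 120 degrees at C, +90 degree dihedral
```

Several backbone methods in the table work with whole library fragments rather than single atoms, so this per-atom construction is only one possible building block of a rebuilding step.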
Overview of protein reconstruction methods. The accuracy of some methods is evaluated using RMSD values between reconstructed and reference structures, measured on alpha carbons (RMSDCA), backbone heavy atoms (RMSDBB) or side-chain heavy atoms (RMSDSC). The accuracy of side-chain reconstruction is also evaluated using the first (χ1) and, where applicable, the second (χ2) side-chain dihedral (chi) angles.
| Software availability* | Reconstruction task** | Description*** | Benchmark sets and comments*** | Method, reference and year of the last publication |
|---|---|---|---|---|
| server (confold) + standalone (confold2): | CM → CA | The method translates contact maps into distance restraints and uses them as input to a distance geometry algorithm that builds tertiary structure models. CONFOLD2 predicts 200 models using various subsets of input contacts and selects the top five models by clustering. | CONFOLD2 is an improved version of the CONFOLD method. Structure predictions for 150 proteins from the PSICOV dataset and for CASP12 targets showed that for most protein sequences CONFOLD2 was able to capture the structural fold of the protein. | |
| standalone | CM → CA | A heuristic procedure for building tertiary structure models from contact maps. | Tested on 100 non-redundant single-domain protein chains (α, β, α+β, α/β; 55 to 786 residues) from SCOP release 1.67. FT-COMAR is much more tolerant to underprediction than to overprediction of contacts. It can ignore up to 75% of the contact map and still compute a protein structure with RMSDCA < 4 Å (assuming that the remaining 25% contains no errors). | |
| server + standalone: | CM → AA | The method transforms contact maps into distance restraints and uses them as input to MODELLER. | Tested on 45 single-domain targets from the CASP10 experiment and 150 proteins of the PSICOV dataset. The tests showed that GDFuzz3D is slightly more accurate (based on TM-score and RMSD) than FT-COMAR and slightly less accurate than PconsFold, but more computationally efficient. | |
| standalone: | CM → AA | Merges PconsC contact prediction tool | Tested on 150 proteins (52 to 266 residues) of the PSICOV dataset. The input sequence can come from a PDB header (instead of an ATOM section) to avoid internal chain gaps. This approach enables protein structure prediction of single-domain targets. PconsFold performance was also compared to that of GDFuzz3D. | |
| standalone: | SICHO → AA | Method for | Tested on 13 high-resolution X-ray structures. Reconstruction quality RMSDCA: < 0.6 Å on experimental structures. | |
| standalone: | SURPASS → CA | Method for | Tested on PISCES_4600, BAKER_62 and other various proteins (α, β, α+β, α/β; size from 56 to 1016 residues). Reconstruction quality RMSDCA: < 0.5 Å on experimental structures and 1–2 Å on distorted models. | |
| standalone: | CA → BB | Uses a library of 5148 four-residue backbone fragments (quadrilaterals) and the algorithm described by Milik et al. | Tested on 81 non-redundant experimental protein structures and near-native decoys. Reconstruction quality RMSDBB < 0.7 Å on experimental structures. Available as part of the BioShell package. The algorithm is implemented in Java. BBQ performance was also compared to that of PD2 and other tools. | |
| standalone | CA → BB | Uses a library of high-resolution structural fragments 4 to 14 residues long and a local fit approximation algorithm. Newer version | Tested on all known human structures from the PDB (935, Park & Levitt protein set), with a global RMSD of 0.48 Å. | |
| server + standalone: | CA → BB | Uses a library of 528 short backbone fragments obtained using Gaussian mixture models (GMMs). The accuracy of reconstruction can be improved by additional (optional) energy gradient minimization. | Tested on 15 low-resolution and 28 high-resolution protein structures. Reconstruction quality RMSDBB < 0.4 Å on experimental structures. When combined with Rosetta, PD2 method | |
| server: | CA → BB | Uses a 27-letter hidden Markov model-derived structural alphabet described by 155 backbone fragments from known protein structures and a greedy algorithm (based on the OPEP force field) to obtain an optimal combination of fragments. | Tested on the Adcock subset of 14 proteins (58 to 437 residues) and a subset of 7 PDB newcomers of up to 666 residues. Reconstruction quality RMSDBB is near 0.4 Å for experimental structures. The | |
| server | BB → SC | Uses Dunbrack backbone-dependent rotamer library, SCWRL3-based scoring function and clash-reduction guided iterative search (CIS) with conjugate gradients optimization of rotamers (rotamer relaxation, RR). CIS-RR | Tested on 180 proteins (SCWRL3 test set) and 65 high-resolution crystal structures of proteins. Compared to other tools (SCWRL4, IRECS and SCAP) reconstruction accuracy is similar but | |
| standalone: | BB → SC | Uses a coarse-grained backbone-dependent rotamer library, a heuristic greedy iteration scheme and an effective score (based on the knowledge-based scoring term ROTA 10 Å) for ranking all SC rotamers by the probability of the rotamer conformation. | Tested on 641 high-resolution X-ray structures (194 with a single conformation for all SCs and 447 with at least one SC in multiple conformations). Reconstruction accuracy is similar to SCWRL3 and SCAP (RMSDSC ~1.5 Å). Allows the use of an additional template of side-chain conformations. | |
| standalone: available on request from the authors | BB → SC | Uses optimized OPLS parameters for long-range and multi-body terms (van der Waals and electrostatic terms), a hydrogen-bonding potential and the frequency of rotameric states from the PDB. The library contains 49,042 discrete rotamers. | Tested on 65 high-resolution X-ray structures. Highly accurate tool for SC reconstruction. | |
| standalone: | BB → SC | Uses rotamer frequency and van der Waals potentials plus two additional unique energy terms: a short-range, orientation-dependent pairwise term (OPUS-PSP) for side-chain packing interactions and a term for explicit solvation effects. In the newer OPUS_Rota2 version, OPUS-PSP has been replaced by the OPUS-DASF term, which describes the relative positions of atoms on the side chains. | Tested on 65 high-resolution X-ray structures and a 379-protein PISCES subset (sequence identity 30%, resolution 1.8 Å). | |
| standalone: | BB → SC | Uses a flexible (-o, slow modeling) or rigid (-star, fast modeling) rotamer model. The energy terms include distance and orientation-dependent potentials and side chain dihedral angle potential energy function. The library of sub-rotamers was derived by perturbation of dihedral angles of rotamers from Dunbrack and Cohen | Tested on 218 proteins and a RAPPER decoy set. | |
| server: | BB → SC | Uses position-dependent | Tested on a set of 639 non-redundant and a blind set of 95 | |
| standalone | BB → SC | Uses a backbone-dependent rotamer library, optimized energy terms and a clash elimination strategy to guide the optimization of side-chain conformations. Combinatorial search includes dead-end elimination, graph theory-based, branch-and-terminate, backtrack and Monte Carlo algorithms. | Tested on 2412 high-resolution (≤1.8 Å) structures with complete side chains obtained from the PISCES server. RASP had comparable prediction accuracy (% χ1, % χ1+2, RMSD) and returned | |
| standalone: | BB → SC | Heuristic approach using optimized CHARMM parameters for van der Waals and torsion-angle terms in an iterative repacking protocol. The library contains 7562 discrete rotamers expressed in terms of 1) Cartesian coordinates and 2) dihedral angles. | Tested on 33 high-resolution protein structures (66–328 residues) not included in the creation of the rotamer library. For multi-chain proteins, only the first chain was used. Reconstruction quality RMSDSC < 2 Å. | |
| standalone: | BB → SC | Uses a backbone-dependent rotamer library (the same as SCWRL3), interaction scores by dead-end elimination and energy minimization by tree decomposition. This tool does not attempt to regularize the backbone geometry or resolve punched rings. | Tested on 180 experimental structures from the SCWRL3 benchmark set of proteins. This approach was several times faster than SCWRL3, especially on larger proteins or cases with heavy atomic clashes. SCATD is freely available and was only tested on a Debian Linux machine. | |
| standalone: | BB → AA | Uses a backbone-dependent rotamer library based on kernel density estimates to provide rotamer frequencies and torsional angles, a tree decomposition algorithm to solve the side chain packing problem, specific potentials (anisotropic hydrogen-bonding, soft pairwise van der Waals), and fast collision detection. | Optimized on a set of 100 protein structures and tested on 379 X-ray structures with electron densities available from UEDS | |
| server + standalone: | BB → SC rotamer optimization | Uses a machine learning approach based on 156 neural networks that are trained to compute an energy function based on pairwise contact distances and a backbone-dependent rotamer library (the same as OPUS-Rota | Tested on the SCWRL4 benchmark set (379 proteins), 94 proteins from CASP9, 7 large protein complexes and a ribosome with and without RNA. SIDEpro can | |
| standalone: | BB → SC | Uses side chain free energy in a molecular dynamics simulations scheme. During the optimization of side chain packing, each rotamer state is represented by a single oriented CG bead (3 spatial and 2 orientation coordinates). Uses a combination of isotropic (excluded volume) and directional interactions (chemical character, e.g. polar, aromatic) for each pair of interacting side chains or backbones. The side chain model is trained by the maximum-likelihood scheme. The NDRD rotamer library | Tested on a large, non-redundant set of crystal structures of globular proteins from the PDB with 50–500 residues and resolution < 2.2 Å (6255 chains). The method gave similar accuracy of chi1 angle as SCWRL4 and RASP, but is | |
| All-atom reconstruction from CA-trace | ||||
| standalone: | CA → AA | The Rosetta protocol ca_to_allatom reconstructs the AA structure and performs structure refinement. Uses the initial Cα trace (with a user-defined parameter specifying how far Cα atoms are allowed to deviate from the initial model) and rigid-body perturbation of secondary structure fragments from known protein structures. The protocol includes | Tested on 8 proteins (101 to 310 residues) from cryo-EM maps at 5 and 10 Å resolution. | |
| standalone | CA → AA | Uses a strictly geometric approach based on Cα triplets and parameters from the Amber03 force field for rebuilding the protein backbone and Cβ. The side chain is rebuilt based on the definition of the united atom for the side group. | Tested on 5 experimental protein structures with reconstruction quality RMSDBB: <0.8 Å, | |
| standalone: | CA → AA | Uses protein template(s) in CG representation (it can be in Cα-trace) to create a set of distance restraints that guide the reconstruction. Stereochemical restraints (bond lengths and angles) are obtained from the CHARMM force field and statistical analysis of known structures. MODELLER employs various structure optimization techniques. | Available as part of the Modeller package. The algorithm is | |
| server + standalone: | CA → AA | Reconstructs and refines protein structures, first the BB only and, after adding SC, the entire structure. Both side-chain and backbone atoms are flexible during refinement simulations, while the conformational search is driven by a physics- and knowledge-based force field. It can optionally use secondary structure assignment/prediction to drive the refinement. The method can start from the CA, BB or SC model. | Tested on 261 proteins of up to 150 residues (148 hard targets for I-TASSER and 113 with good templates). Compared to other tools, ModRefiner was better at side-chain packing and | |
| standalone: | CA → AA | Uses a backbone fragment library, a rotamer library and the backbone reconstruction algorithm described by Milik et al. | Tested on 30 high-quality X-ray structures (reconstruction quality RMSDAA 1.0–1.5 Å) and on a set of 500 low-resolution protein models. Initial Cα coordinates can be distorted. This approach enables | |
| server available on request from the authors | CA → AA | Uses a geometric approach to place the backbone atoms at the average positions derived from known protein structures (based on the algorithm by Milik et al.). | Tested on CG trajectories of the SH3 and S6 systems and a subset of 2945 non-redundant experimental structures from the PDB. This approach enables reconstruction of all-atom details from large regions of | |
| server + standalone: | CA → AA | Uses a backbone isomer library (528,798 fragments) and a backbone-dependent rotamer library (SCWRL) to reconstruct atomic details. The backbone rebuilding stage includes | Tested on 230 non-redundant proteins of up to 300 residues (experimental structures and CG decoys generated by I-TASSER in CASP8). This approach can remove steric clashes, retain the correct topology and improve the backbone hydrogen-bonding network. | |
| standalone: | SC → AA | Adds missing hydrogen and OXT atoms. Uses atom types and either a steric-only or an H-bond (default, slower) criterion to determine the number and positions of added hydrogens. Bond lengths are taken from Amber parm99 parameters. | | |
| standalone: | SC → AA | The algorithm starts from random positions of hydrogen atoms and optimizes them using an iterative procedure of molecular dynamics simulations and Powell energy minimization steps. The energy function includes bonded and van der Waals terms. | The method is able to | |
| server | SC → AA | Uses a force field with the concept of | Tested successfully in modeling binding affinities of protein-ligand complexes: β secretase (2va7), mutant HIV-1 reverse transcriptase (2opq) and human sialidase NEU2 complexed with an isobutyl ether mimetic inhibitor (2f11). The method | |
| standalone: | SC → AA | Uses a geometry-based approach and performs molecular dynamics simulations. Uses different bond lengths and angles according to selected force field parameters. The energy function includes bonded terms, van der Waals and electrostatics. | The method enables | |
| server + standalone: | SC → AA | Combines local geometry restraints with a conformational search that minimizes atomic overlap, encourages hydrogen bonding and optimizes electrostatic interactions. Local geometries of the initial positions of H atoms are taken from the CHARMM22 force field. | Tested on three sets of experimental data: high-resolution X-ray crystallography, structures from neutron diffraction, and NOE proton-proton distance restraints. Compared with other methods (CHARMM and REDUCE), HAAD was | |
| standalone: | SC → AA | Searches for hydrogen atom positions at 10° (ϕ = 10) or 3° (ϕ = 3) intervals around the axis of a cone whose side equals the bond length, or places hydrogens using geometric criteria. Uses different bond lengths and angles according to the selected version of the CHARMM force field. The energy function includes torsion-angle, van der Waals and electrostatic terms. | All hydrogen atoms (including non-polar) are described explicitly. | |
| standalone | SC → AA | Uses a geometry-based and molecular mechanics approach to place all non-hydroxyl hydrogen atoms. For hydroxyl and water hydrogens it uses systematic search of torsion angles. The energy function includes torsion angle (from CHARMM), van der Waals, solvation and continuum electrostatics. | ||
| standalone: | SC → AA | Molecular visualization tools that use only geometric criteria, without minimization. | ||
| Standalone available on request from the authors | SC → AA | Predicts hydrogen geometry, ionization, and tautomer states for macromolecular structures based on 3D coordinates. The energy model includes van der Waals, electrostatics, solvation, rotamer, tautomer, and titration effects. Optimal states are chosen according to a chemical model derived from the MMFF94 force field. | Tested on ultra-high resolution X-ray structures. The method | |
| server: | SC → AA | Adds hydrogen atom positions based on optimal hydrogen-bond networks at the protein-ligand interface. Networks are modeled as graphs. Uses an efficient dynamic programming approach that stores partial solutions and combines them into globally optimal solutions. The algorithm is split into two phases: initialization (performed only once) and optimization. | | |
| standalone: | SC → AA | Adds hydrogens based on expected bond lengths and angles. Places hydrogens to optimize local H-bonding networks, avoid steric overlaps and | | |
| server: | SC → AA | Adds all missing hydrogens to the structure. It contains several servers that additionally compute all possible hydrogen bonds but, by default, do not determine which bonds would be most favorable. | Uses the Optimal Hydrogen Bonds server to compute the best possible hydrogen-bond network. The program runs much more slowly when the system contains many water molecules. Designed for Linux systems. | |
| standalone: | Protein-lipid MARTINI → AA | Method for reconstruction from the MARTINI coarse-grained representation of | Tested on 6 systems including lipid bilayers, proteins in solution (YvoA), membrane proteins (ASIC) and peptides (WALP). Reconstruction quality RMSDBB: <0.6 Å. The approach enables integral backmapping and reconstructing complete systems, including the solvent. | |
| Standalone available on request from the authors MemProtMD database: | Protein-lipid CG → AA optimization | Method for reconstruction from the MARTINI coarse-grained representation of | Tested on 10 membrane protein-lipid bilayer systems of different size and complexity, generated by self-assembly CGMD simulations (leuT, aquaporin, ELIC, ASIC, Cyt Ox, KcsA, SERCA, β2AdR/lysozyme, OmpC, OSC). This approach does not attempt to convert united water particles. | |
| Standalone available on request from the authors | Protein-DNA | Method for reconstruction from coarse-grained representation of | A library of 22,347 DNA fragments is derived from high-resolution X-ray structures from PDB. Tested on 180 complex protein-DNA experimental structures with | |
* Links to web servers or standalone methods have been provided only if they were working at the time of writing this publication.
** Reconstruction tasks performed by the outlined methods are summarized in the "Reconstruction task" column using the following abbreviations: contact map (CM), alpha carbon atoms (CA), backbone atoms (BB), backbone and side-chain atoms (SC), all-atom representation that includes backbone, side-chain and hydrogen atoms (AA), coarse-grained representation (CG).
*** Some major or unique features are bolded for the reader's convenience.
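The table reports accuracy as RMSD over selected atom sets (CA, backbone or side-chain heavy atoms) and as agreement of the χ1 and χ2 side-chain dihedrals. A minimal Python sketch of both measures follows. It assumes that RMSD is computed after optimal rigid-body superposition with the Kabsch algorithm, which the table does not specify for every benchmark, and it judges χ agreement with a 40° tolerance, a commonly used convention that is an assumption here rather than the criterion of any particular study. The names `superposed_rmsd`, `dihedral` and `chi_agrees` are hypothetical.

```python
import numpy as np


def superposed_rmsd(mobile, reference):
    """RMSD between two matched (N, 3) coordinate arrays after optimal
    rigid-body superposition of `mobile` onto `reference` (Kabsch algorithm)."""
    p = np.asarray(mobile, dtype=float)
    q = np.asarray(reference, dtype=float)
    p = p - p.mean(axis=0)  # remove translation
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)  # SVD of the 3x3 covariance matrix
    # Flip the last axis if needed so the result is a rotation, not a reflection
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rotation.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (IUPAC sign convention) defined by four points,
    e.g. chi1 of serine from its N, CA, CB and OG atoms."""
    p0, p1, p2, p3 = (np.asarray(x, dtype=float) for x in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = (p2 - p1) / np.linalg.norm(p2 - p1)
    b2 = p3 - p2
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def chi_agrees(predicted_deg, reference_deg, tolerance_deg=40.0):
    """True if two dihedrals agree within the tolerance, accounting for
    360-degree periodicity (170 and -170 differ by 20 degrees)."""
    delta = (predicted_deg - reference_deg + 180.0) % 360.0 - 180.0
    return abs(delta) <= tolerance_deg
```

For RMSDCA, only the matched CA coordinates of the reconstructed and reference models are passed to `superposed_rmsd`. Side chains with symmetric terminal groups (for example χ2 of Phe or Tyr) would need extra handling that is not shown.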
Fig. 1. Typical stages of protein structure reconstruction. The required range of reconstruction stages depends on the resolution of the initial models. For some deeply coarse-grained (CG) models, the first step is to reconstruct the positions of C-alpha (CA) atoms. For most medium-resolution CG models, recovering atomistic details starts with backbone (BB) reconstruction from the CA atoms, followed by side-chain (SC) reconstruction and, subsequently, the addition of hydrogen atoms. The geometry of the final all-atom structure can be further improved using various refinement techniques.
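Fig. 1 notes that for some deeply coarse-grained inputs the first stage is recovering CA positions; in the contact-map rows of the table (e.g., CONFOLD2) this is done with a distance geometry algorithm. The Python sketch below shows only the core embedding step of distance geometry, classical multidimensional scaling of a metric matrix, under the strong simplifying assumption that a complete and exact CA–CA distance matrix is available. Real contact-map methods start from sparse, noisy contacts converted to distance bounds and also need bound smoothing, sampling of distances within those bounds and restrained refinement, none of which is shown. The names `embed_from_distances` and `pairwise_distances` are hypothetical.

```python
import numpy as np


def embed_from_distances(dist):
    """Recover 3D coordinates from a complete, exact (N, N) distance matrix by
    classical multidimensional scaling (metric-matrix embedding).

    The result is unique only up to translation, rotation and reflection.
    """
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    # Double centering converts squared distances into a Gram (metric) matrix
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering
    # Keep the three largest eigenpairs (eigh returns them in ascending order)
    vals, vecs = np.linalg.eigh(gram)
    top = np.argsort(vals)[::-1][:3]
    scale = np.sqrt(np.clip(vals[top], 0.0, None))  # guard tiny negative values
    return vecs[:, top] * scale


def pairwise_distances(xyz):
    """All-against-all Euclidean distances for an (N, 3) array."""
    diff = xyz[:, None, :] - xyz[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Usage: an idealized alpha-helical CA trace (about 100 degrees per residue,
# 1.5 Angstrom rise, 2.3 Angstrom radius; approximate values assumed here)
t = np.arange(12)
helix = np.column_stack((2.3 * np.cos(np.radians(100.0 * t)),
                         2.3 * np.sin(np.radians(100.0 * t)),
                         1.5 * t))
coords = embed_from_distances(pairwise_distances(helix))
max_error = np.abs(pairwise_distances(coords) - pairwise_distances(helix)).max()
```

Because distances cannot distinguish a structure from its mirror image, the embedded trace may come out as a left-handed helix; chirality has to be checked and, if necessary, corrected in a separate step.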
Fig. 2. Example tripeptide shown in all-atom and the corresponding coarse-grained resolutions. Various coarse-grained modeling tools are shown: Rosetta-centroid, MARTINI, CABS, UNRES, SICHO and SURPASS. Note that most coarse-grained models use explicit positions of (pseudo)atoms, while Rosetta uses a set of torsional angles φ, ψ, ω to describe backbone geometry. The legend explaining the colors of atoms and pseudoatoms is shown in the top right.
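Fig. 2 contrasts all-atom and coarse-grained views of the same tripeptide. To make the forward direction (all-atom to coarse-grained) concrete, the Python sketch below maps each residue to two pseudo-atoms: its CA atom and the centroid of its side-chain heavy atoms. This two-bead scheme is a simplified, hypothetical mapping chosen for illustration; it is similar in spirit to side-chain-centroid models but does not reproduce the exact pseudo-atom definitions of CABS, UNRES, MARTINI or any other model shown in the figure. The function name `coarse_grain` and the input format (tuples of residue id, atom name and coordinates) are assumptions.

```python
import numpy as np

# Atom names treated as backbone in this simplified, hypothetical scheme
BACKBONE_ATOMS = {"N", "CA", "C", "O", "OXT"}


def coarse_grain(atoms):
    """Map all-atom residues to (residue_id, CA bead, side-chain centroid bead).

    `atoms` is an iterable of (residue_id, atom_name, (x, y, z)) tuples.
    Hydrogens (names starting with "H") are ignored; glycine, which has no
    side-chain heavy atoms, gets None as its side-chain bead.
    """
    residues = {}
    for res_id, name, xyz in atoms:
        # dicts preserve insertion order, so residues keep their input order
        residues.setdefault(res_id, {})[name] = np.asarray(xyz, dtype=float)

    beads = []
    for res_id, res_atoms in residues.items():
        side_chain = [xyz for name, xyz in res_atoms.items()
                      if name not in BACKBONE_ATOMS and not name.startswith("H")]
        centroid = np.mean(side_chain, axis=0) if side_chain else None
        beads.append((res_id, res_atoms["CA"], centroid))
    return beads
```

A reconstruction method works in the opposite direction, starting from beads like these and regenerating the missing atoms, which is why a single coarse-grained model is compatible with many all-atom structures.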