Gábor Bunkóczi, Nathaniel Echols, Airlie J McCoy, Robert D Oeffner, Paul D Adams, Randy J Read.
Abstract
Phaser.MRage is a molecular-replacement automation framework that implements a full model-generation workflow and provides several layers of model exploration to the user. It is designed to handle a large number of models and can distribute calculations efficiently onto parallel hardware. In addition, phaser.MRage can identify correct solutions and use this information to accelerate the search. Firstly, it can quickly score all alternative models of a component once a correct solution has been found. Secondly, it can perform extensive analysis of identified solutions to find protein assemblies and can employ assembled models for subsequent searches. Thirdly, it is able to use a priori assembly information (derived from, for example, homologues) to speculatively place and score molecules, thereby customizing the search procedure to a certain class of protein molecule (for example, antibodies) and incorporating additional biological information into molecular replacement.
Keywords: automation; molecular replacement; phaser.MRage; pipeline
Year: 2013 PMID: 24189240 PMCID: PMC3817702 DOI: 10.1107/S0907444913022750
Source DB: PubMed Journal: Acta Crystallogr D Biol Crystallogr ISSN: 0907-4449
Figure 1. Phaser.MRage workflow showing the model-generation hierarchy. The Ensembles stage (indicated with white text on a light green background) is used directly in molecular-replacement calculations. Blue arrows indicate existing processing steps. The empty arrow highlights a possible automation step that could select models for a multi-model ensemble from a set of hits detected by a homology search. Users can select models using any combination of the displayed stages, and the highlighted steps will be performed to convert those into the Ensembles stage. The molecular graphics in this figure were rendered with CCP4mg (McNicholas et al., 2011).
Figure 2. Simplified molecular-replacement workflow of an extension cycle, including solution analyses. The process starts with partial solutions taken from the previous cycle (marked with a light blue box) and ends with refined solutions that will be propagated to the next cycle (marked with white text on a light green background), if any. Common molecular-replacement steps are performed with each model that is applicable to a given partial structure (decided by the composition). If a clear solution is identified, quick scoring by superposition can be performed. The dead-end ‘Rejected packing’ is shown to highlight a potential automation step, namely automatic model pruning, if rejected solutions are found with good statistics. The grey dashed box highlights solution-analysis steps. Assemblies identified after refinement (or specified by the user) are used to fill in missing molecules (also shown as white text on a light green background), which enter the workflow as a translation peak (in the following cycle; indicated by the empty arrow). Assemblies can also be used to augment the model list (if requested). With the exception of solution categorization and steps in solution analysis, all processing steps indicated by arrows can run in parallel. The molecular graphics in this figure were rendered with CCP4mg (McNicholas et al., 2011).
Figure 3. Speedup of the trypsin example (full mode) relative to execution on a single CPU (total time 41 min 30 s). Timings are single measurements performed on a 64-CPU machine. Parallel jobs were run on separate threads. Speedup factors were not corrected for the input-processing (including anisotropic scaling) and job-startup overhead.
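As a rough illustration of how a speedup factor such as the one plotted in Figure 3 is defined, the sketch below divides the single-CPU wall time (41 min 30 s, from the caption) by a parallel wall time. The parallel timing used in the example is a hypothetical placeholder, not a measurement from the paper, and the Amdahl-style bound with its serial fraction is likewise an assumption added only to show why uncorrected overhead (input processing, job startup) limits the attainable speedup.

```python
# Single-CPU wall time quoted in the Figure 3 caption: 41 min 30 s.
SERIAL_SECONDS = 41 * 60 + 30


def speedup(serial_seconds: float, parallel_seconds: float) -> float:
    """Speedup relative to single-CPU execution (serial time / parallel time)."""
    return serial_seconds / parallel_seconds


def efficiency(serial_seconds: float, parallel_seconds: float, n_cpus: int) -> float:
    """Parallel efficiency: speedup divided by the number of CPUs used."""
    return speedup(serial_seconds, parallel_seconds) / n_cpus


def amdahl_speedup(n_cpus: int, serial_fraction: float) -> float:
    """Upper bound on speedup when a fixed fraction of the run cannot be parallelized.

    `serial_fraction` is an assumed parameter standing in for the uncorrected
    overhead; the paper does not report its value.
    """
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cpus)


if __name__ == "__main__":
    # HYPOTHETICAL parallel timing, for illustration only (not from the paper).
    hypothetical_parallel_seconds = 415.0
    hypothetical_cpus = 8
    print("speedup:", speedup(SERIAL_SECONDS, hypothetical_parallel_seconds))
    print("efficiency:", efficiency(SERIAL_SECONDS, hypothetical_parallel_seconds, hypothetical_cpus))
    print("Amdahl bound (assumed 5% serial):", amdahl_speedup(hypothetical_cpus, 0.05))
```

Because the reported factors include the overhead, they would be expected to fall below the ideal line as the CPU count grows, which is the behaviour the bound above models.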
Course of the structure solution for shiga-like toxin (four pentamers, using a monomer as a search model)
Significant solutions appear when placing the second copy. As the solution becomes more and more complete, the program identifies a pentameric assembly, adds it to the list of search models and uses it to locate a full pentamer with very clear statistics. Note the low translation-function Z-score obtained for the last molecule, which is a consequence of its high B factors. It is difficult to locate this molecule using conventional searches and it requires a very thorough exploration. However, when placed in an approximately correct location predicted from available assembly information it is found immediately.
| Index | Model | TFZ | LLG | ΔLLG |
|---|---|---|---|---|
| 1 | Monomer | 5.3 | 43.1 | 43.1 |
| 2 | Monomer | 11.9 | 154.7 | 111.6 |
| … | | | | |
| 10 | Monomer | 22.3 | 2005.9 | 293.3 |
| 11 | 5 × monomer | 42.6 | 3889.3 | 1883.4 |
| 16 | Monomer | 33.5 | 4545.4 | 656.1 |
| … | | | | |
| 19 | Monomer | 38.1 | 6557.1 | 673.9 |
| 20 | Monomer | 9.2 | 7322.9 | 765.8 |
TFZ: translation-function Z-score.
LLG: log-likelihood gain.
ΔLLG: change in LLG from the previous step.
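The ΔLLG column can be cross-checked from the LLG column wherever two consecutive steps are both shown. The following is a minimal sketch of that arithmetic using only the values printed in the table. It assumes that the first placement is compared against an LLG of zero, and that the pentamer placed at index 11 accounts for indices 11 to 15, so that index 16 is the next step; both assumptions are inferred from the table rather than stated in the text. Rows whose predecessor is elided (indices 10 and 19) are left unchecked.

```python
# LLG and reported ΔLLG values copied from the table (keyed by index).
llg = {1: 43.1, 2: 154.7, 10: 2005.9, 11: 3889.3, 16: 4545.4, 19: 6557.1, 20: 7322.9}
reported_delta = {1: 43.1, 2: 111.6, 10: 293.3, 11: 1883.4, 16: 656.1, 19: 673.9, 20: 765.8}

# (previous index, current index) pairs for steps whose predecessor is shown.
# None marks the first placement; 11 -> 16 assumes the pentamer spans indices 11-15.
consecutive_steps = [(None, 1), (1, 2), (10, 11), (11, 16), (19, 20)]


def delta_llg(previous_index, current_index):
    """Change in LLG relative to the previous step, rounded to one decimal place."""
    baseline = 0.0 if previous_index is None else llg[previous_index]
    return round(llg[current_index] - baseline, 1)


computed_delta = {cur: delta_llg(prev, cur) for prev, cur in consecutive_steps}

if __name__ == "__main__":
    for index, value in computed_delta.items():
        status = "matches" if value == reported_delta[index] else "differs from"
        print(f"index {index}: computed ΔLLG {value} {status} reported {reported_delta[index]}")
```

For the pentamer step, the gain of 1883.4 in a single placement is several times larger than any single-monomer gain listed before it, which is consistent with the "very clear statistics" noted in the text.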
Figure 4. Solution process for the glutathione synthase example: (a) target structure, (b) N-terminal domain found (PDB entry 3nzt; hit 4; 29% identical), (c) C-terminal domain found (PDB entry 1uc8; hit 7; 21% identical), (d) structure after phenix.autobuild (Terwilliger et al., 2008). The grey trace indicates the correct structure. This figure was created using PyMOL (v.1.6; Schrödinger LLC).