Jason W Schmidberger, Mark A Bate, Cyril F Reboul, Steve G Androulakis, Jennifer M N Phan, James C Whisstock, Wojtek J Goscinski, David Abramson, Ashley M Buckle.
Abstract
BACKGROUND: The crystallographic determination of protein structures can be computationally demanding and, for difficult cases, can benefit from user-friendly interfaces to high-performance computing resources. Molecular replacement (MR) is a popular protein crystallographic technique that exploits the structural similarity between proteins that share some sequence similarity. However, the need to trial permutations of search models, space group symmetries and other parameters makes MR time- and labour-intensive. Because MR calculations are embarrassingly parallel, they are ideally suited to distributed computing. To address this problem we have developed MrGrid, web-based software that allows multiple MR calculations to be executed across a grid of networked computers, enabling high-throughput MR. METHODOLOGY/PRINCIPAL
Year: 2010 PMID: 20386612 PMCID: PMC2850370 DOI: 10.1371/journal.pone.0010049
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. MrGrid web interface showing user input for (a) MTZ file, sequence and space group(s); and (b) search model(s) and RMSD values.
Figure 2. Typical results interface showing PHASER jobs running on Xgrid, allowing the user to view the results of completed jobs independently of others still running.
List of test case proteins extracted from the Protein Data Bank (PDB).
| PDB ID | Protein Name | Space Group | Resolution Limit (Å) | Molecular Mass (Da) |
| 2GPZ | Transthyretin-like protein | | 2.5 | 12700 |
| 2NO4 | Haloacid Dehalogenase | | 1.9 | 24000 |
| 2CWQ | Hypothetical protein TTHA0727 | | 1.9 | 12581 |
| 2ENX | Mn-dependent inorganic pyrophosphatase | | 2.8 | 33597 |
| 2RH5 | Adenylate kinase | | 2.48 | 23231 |
| 1S3G | Adenylate kinase | | 2.25 | 23888 |
| 2JCB | 5-Formyl-tetrahydrofolate cycloligase | | 1.6 | 23385 |
| 2H74 | Thioredoxin | | 2.4 | 11807 |
| 1FB0 | Thioredoxin | | 2.26 | 11782 |
| 2MM1 | Myoglobin | | 2.8 | 17184 |
Details about respective datasets are also listed.
Summary of MrGrid results for 10 test cases studied.
| PDB ID | # SGs in Point Group | # Search Models | # Jobs | Linear Time (mins) | MrGrid Time (mins) | Speed Up Factor |
| 2JCB | 1 | 4 | 4 | 40.63 | 20.60 | 1.97 |
| 2ENX | 1 | 7 | 7 | 17.65 | 4.12 | 4.28 |
| 2NO4 | 3 | 5 | 15 | 1339.22 | 437.92 | 3.06 |
| 2RH5 | 2 | 8 | 16 | 98.32 | 25.20 | 3.90 |
| 2CWQ | 3 | 8 | 24 | 1424.50 | 309.50 | 4.64 |
| 2GPZ | 6 | 4 | 24 | 76.89 | 13.40 | 5.74 |
| 1S3G | 3 | 8 | 24 | 438.90 | 66.97 | 6.55 |
| 1FB0 | 3 | 9 | 27 | 204.57 | 29.67 | 6.89 |
| 2MM1 | 3 | 12 | 36 | 80.68 | 10.70 | 7.54 |
| 2H74 | 6 | 9 | 54 | 272.47 | 22.20 | 12.27 |
Note – Search model count includes a ‘self’ model, which was the actual protein being investigated.
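The job counts in the table are consistent with one MR job per pairing of a candidate space group with a search model (for example, 6 × 9 = 54 for 2H74). The following is a minimal sketch of that enumeration, not MrGrid's actual code; the space group and model labels are hypothetical placeholders, and the real software may take additional parameters into account.

```python
from itertools import product

# Hypothetical placeholder labels: the 2H74 test case lists 6 space groups
# in the point group and 9 search models (including the 'self' model).
space_groups = [f"SG_{i}" for i in range(1, 7)]
search_models = [f"model_{i}" for i in range(1, 10)]

# One independent job per (space group, search model) pair.
# Because no job depends on another, the jobs can be dispatched in parallel.
jobs = list(product(space_groups, search_models))

print(len(jobs))  # number of jobs to submit to the grid
```

Each tuple in `jobs` would correspond to one independent calculation, which is what makes the workload embarrassingly parallel.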
Specifications of the Xgrid resource utilized during this study.
| Machine # | Machine type | Operating System | Processors (GHz) | RAM (GB) |
| 1 | G4 iMac | OS X 10.5.2 | 1.42 | 1 |
| 2 | G5 iMac | OS X 10.4.11 | 2 | 2 |
| 3 | Intel iMac | OS X 10.4.11 | (2×) 2.16 | 2 |
| 4 | G4 MacMini | OS X 10.5 | 1.42 | 1 |
| 5 | Intel iMac | OS X 10.4.11 | (2×) 2.16 | 2 |
| 6 | Intel Quad core Duo | OS X 10.4.11 | (8×) 3 | 8 |
| 7 | Intel MacBook | OS X 10.4.11 | (2×) 2 | 2 |
| 8 | G5 iMac | OS X 10.4.11 | (2×) 2 | 2 |
| 9 | G5 iMac | OS X 10.5.2 | (2×) 2 | 2 |
| 10 | Intel iMac | OS X 10.4.11 | (2×) 1.83 | 2 |
| 11 | G5 iMac | OS X 10.4.11 | 1.8 | 2 |
| 12 | Intel iMac | OS X 10.4.11 | (2×) 2 | 2 |
| 13 | Intel iMac | OS X 10.5.1 | (2×) 2.4 | 2 |
| Total | | | 65.54 | 30 |
It should be noted that at particular times some workstations were in use by their operator and thus unavailable to Xgrid.
Figure 3. Graph depicting the linear relationship between the number of jobs submitted to the Xgrid and the respective speed-up values.
Speed-up is calculated by dividing the linear run time by the MrGrid total run time. Linear run time is defined as the sum of the run times of all jobs (job1_runtime + job2_runtime + … + jobN_runtime). MrGrid total run time is defined as the time difference between the start of the first job and the end of the last job (jobN_finish - job1_start). The linear run time is intended to estimate how long the jobs would take to run sequentially on one computer. r² represents the ‘goodness of fit’ of the linear regression line to the data points. y is the intercept on the y axis.
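The speed-up definition and the regression in Figure 3 can be sketched as follows. This is an illustrative reading of the caption rather than the authors' analysis script: the function names are hypothetical, the total run time is taken as latest finish minus earliest start, and the job counts and speed-up factors are copied from the results table above.

```python
def speed_up(jobs):
    """Speed-up for a list of (start, finish) times in minutes.

    Linear run time is the sum of the individual run times; total run
    time is taken as latest finish minus earliest start (an assumed
    reading of 'jobN_finish - job1_start').
    """
    linear = sum(finish - start for start, finish in jobs)
    total = max(f for _, f in jobs) - min(s for s, _ in jobs)
    return linear / total


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, plus r^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Job counts and speed-up factors as listed in the results table.
n_jobs = [4, 7, 15, 16, 24, 24, 24, 27, 36, 54]
factors = [1.97, 4.28, 3.06, 3.90, 4.64, 5.74, 6.55, 6.89, 7.54, 12.27]

slope, intercept, r2 = linear_fit(n_jobs, factors)
print(f"slope={slope:.3f} intercept={intercept:.3f} r2={r2:.3f}")
```

For example, three hypothetical jobs running over `[(0, 10), (0, 20), (5, 25)]` have a linear run time of 50 minutes and a total run time of 25 minutes, so `speed_up` returns 2.0.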