Evgeny Krissinel, Ville Uski, Andrey Lebedev, Martyn Winn, Charles Ballard.
Abstract
Modern crystallographic computing is characterized by the growing role of automated structure-solution pipelines, which represent complex expert systems utilizing a number of program components, decision makers and databases. They also require considerable computational resources and regular database maintenance, which are increasingly difficult to provide at the level of individual desktop-based CCP4 setups. On the other hand, there is a significant growth in data processed in the field, which raises the issue of centralized facilities for keeping both the data collected and structure-solution projects. The paradigm of distributed computing and data management offers a convenient approach to tackling these problems, which has become more attractive in recent years owing to the popularity of mobile devices such as tablets and ultra-portable laptops. In this article, an overview is given of developments by CCP4 aimed at bringing distributed crystallographic computations to a wide crystallographic community.
Keywords: computational cloud; crystallographic computing; data and project management; distributed computing; web services
Year: 2018 PMID: 29533240 PMCID: PMC5947778 DOI: 10.1107/S2059798317014565
Source DB: PubMed Journal: Acta Crystallogr D Struct Biol ISSN: 2059-7983 Impact factor: 7.652
The main automated crystallographic pipelines distributed by CCP4
| Pipeline | Description | Use of databases | Expected CPU time |
|---|---|---|---|
| | Data processing | 0 | Minutes to hours |
| | Molecular replacement | 3 GB | Hours to days |
| | Molecular replacement | 3 GB | Hours to days |
| | Molecular replacement | 26 GB | Hours to days |
| | | | Days to weeks |
| | Contaminant searches and sequence-less MR | 25 GB | Hours to days |
| | Experimental phasing (SAD, SIRAS, MAD) | 0 | Hours to days |
| | Fragment-based MR | ∼1 GB | Hours to weeks |
| | Automated phasing | 0 | Minutes to weeks |
| | Model building | 0 | Minutes to hours |
| | DNA/RNA model building | 0 | Minutes to hours |
Figure 1. Schematic of a conventional web-service setup. The setup contains four basic elements: (1) a web-server machine, (2) data storage, (3) computational machines connected to the web server via an internal network and (4) client devices communicating with the web server via http or https protocols. (1), (2) and (3) can all be placed on physical machines individually or shared in any combination.
Figure 2. General workflow of CCP4 web services.
Figure 3. CCP4 Cloud schematic. (1) Client device. (2) Front-end virtual machine. (3) Persistent data storage. (4) Data-producing facility (for example a synchrotron). (5) Local number-crunching facility. (6) Number-crunching virtual machines. Black lines indicate in-house communications; blue fuzzy lines correspond to external http(s) connections.
Figure 4. A snapshot of the CCP4 Cloud FEVM desktop.
Figure 5. Schematic of the CCP4 web application. (1) Front-end machine (FE). (2) Data storage. (3) Client machines with optional local servers. (4) Number-cruncher servers (NCs). (5) Data-producing facility. Black lines indicate in-house communications; blue fuzzy lines correspond to http(s) connections.
Figure 6. A snapshot of the jsCoFE project window.
CCP4 tasks currently available through jsCoFE
| Task | Description |
|---|---|
| Data import | Import of project data: merged and unmerged MTZ files, PDB/mmCIF files, sequence files. The file type is recognized automatically and all data are arranged in data sets at the metadata level, such that all subsequent jobs can operate with data sets rather than explicit reference to files. For example, MTZ files can be logically split into several data sets, PDB files may be split into chains, and files with multiple sequences are split into individual sequence entities. |
| Convert to structure | Association of coordinate and density/phase data, usually after data import. In |
| POINTLESS/AIMLESS | |
| Reindex | Changing the space group, usually required in the case of reflection-data enantiomorphism |
| AsuDef | Definition of the asymmetric unit and Matthews analysis |
| BALBES | Automatic molecular replacement with |
| MoRDa | Automatic molecular replacement with |
| MR ensembling from sequence | Making MR model ensembles through sequence searches in the PDB |
| MR ensembling from coordinates | Making MR model ensembles from given coordinate data |
| MOLREP | Molecular replacement with |
| Phaser-MR | Molecular replacement with |
| SHELX-MR | After-MR autotracing with |
| CRANK2 | Automatic experimental phasing with |
| SHELX-auto | Automatic experimental phasing (SAD, MAD, SIRAS) using the |
| SHELX-Substr | Heavy-atom location with |
| Phaser-EP | Experimental phasing with |
| Parrot | Density modification with |
| REFMAC | Macromolecular refinement with |
| LORESTR | Low-resolution refinement pipeline |
| Buccaneer-MR | Automatic model building with |
| MakeLigand | Making ligand structure and restraints with |
| FitLigand | Fitting ligand with |
| FitWaters | Fitting water molecules with |
| Zanuda | Space-group validation with |
| GESAMT | Pairwise and multiple structural alignment and structural searches in the PDB with |
| PISA | Oligomeric state and interface analysis with |
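
The Data import row above notes that files with multiple sequences are split into individual sequence entities. A minimal sketch of that idea follows, assuming FASTA-formatted input (the source says only "sequence files"). The names `SequenceEntity` and `split_sequences` are hypothetical and chosen for illustration; this is not the jsCoFE implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SequenceEntity:
    """One sequence record (hypothetical stand-in for a metadata-level entity)."""
    name: str
    sequence: str


def split_sequences(fasta_text: str) -> List[SequenceEntity]:
    """Split FASTA-formatted text into one entity per '>' header line.

    Assumption: input is FASTA; lines before the first header are ignored.
    """
    entities: List[SequenceEntity] = []
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                entities.append(SequenceEntity(name, "".join(chunks)))
            name = line[1:].strip()
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence may span several lines
    if name is not None:
        entities.append(SequenceEntity(name, "".join(chunks)))
    return entities
```

Each returned entity can then be referenced on its own by later jobs, which is the point of arranging imported data at the metadata level rather than by file.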
Comparison of the discussed approaches to distributed computing models and the traditional CCP4 desktop setup in crystallography
| Criteria | Web services | | | Traditional desktop setup |
|---|---|---|---|---|
| Functionality | Limited to automatic structure-solution pipelines | Full | Subject to the level of development; currently lacks a number of tasks and interactive graphical tools ( | Full |
| Interface complexity | Easy to use | Usual | Usual | Usual |
| User projects support | No | Yes | Yes | Yes |
| Bandwidth requirements | Low | High | Medium to low | N/A |
| Suitability for standalone use on desktops | Not suitable | Can execute jobs on remote servers from ordinary | Allows desktop setups in flexible configurations | Fully suitable |
| Suitability for use on mobile devices | Yes | Not very suitable owing to the specific graphical design and extensive use of mouse | Yes | No |
| Hardware requirements | Low beyond number-crunching facilities | High in addition to number-crunching facilities | Low beyond number-crunching facilities | Medium for most tasks; high for automatic structure solvers |
| Additional software requirements | None | High | Low | N/A |
| Portability (suitability for corporate deployment) | May require a custom installation | Requires IT support with cloud setup experience | High | N/A |
| Maintenance burden | Low | Dependent on local IT support | Low | Low to medium |