Jens Krüger, Richard Grunzke, Sonja Herres-Pawlis, Alexander Hoffmann, Luis de la Garza, Oliver Kohlbacher, Wolfgang E Nagel, Sandra Gesing.
Abstract
Virtual high-throughput screening (vHTS) is an invaluable method in modern drug discovery. It permits screening large datasets or databases of chemical structures for those that may bind to a drug target. Virtual screening is typically performed by docking code, which often runs sequentially. Processing of huge vHTS datasets can be parallelized by chunking the data because individual docking runs are independent of each other. The goal of this work is to find the splitting that maximizes the speedup while taking into account the overhead and the available cores on Distributed Computing Infrastructures (DCIs). We have conducted thorough performance studies accounting not only for the runtime of the docking itself, but also for structure preparation. Performance studies were conducted via the workflow-enabled science gateway MoSGrid (Molecular Simulation Grid). As input we used benchmark datasets for protein kinases. Our performance studies show that docking workflows can be made to scale almost linearly up to 500 concurrent processes, even when distributed over large DCIs, thus accelerating vHTS campaigns significantly.
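The chunking idea described in the abstract can be illustrated with a minimal Python sketch. It is not the LigandFileSplitter implementation; the function name `split_into_chunks`, the placeholder ligand names, and the library size of 11200 ligands are assumptions made for illustration only (the size is chosen so that the resulting job counts match those listed in the wall time table for IMGDock).

```python
from typing import Iterator, List, Sequence


def split_into_chunks(ligands: Sequence[str], ligands_per_file: int) -> Iterator[List[str]]:
    """Yield consecutive chunks holding at most `ligands_per_file` ligands each."""
    if ligands_per_file < 1:
        raise ValueError("ligands_per_file must be positive")
    for start in range(0, len(ligands), ligands_per_file):
        yield list(ligands[start:start + ligands_per_file])


# Hypothetical ligand library; the size is an assumption, not a figure from the source.
ligands = [f"ligand_{i}" for i in range(11200)]

# Each chunk corresponds to one independent docking job.
job_counts = {
    size: sum(1 for _ in split_into_chunks(ligands, size))
    for size in (25, 50, 75, 100)
}
print(job_counts)
```

Because the docking runs are independent, each chunk can be submitted as a separate job; smaller chunks yield more jobs to distribute, at the cost of more per-job overhead.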
Year: 2014 PMID: 25032219 PMCID: PMC4083208 DOI: 10.1155/2014/624024
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1: A docking workflow for CADDSuite including the preparation of the receptor and the preparation of the ligands.
Figure 2: Example of a metaworkflow containing basic workflows.
Figure 3: The performance of LigandFileSplitter for different sizes of data chunks.
Wall time distribution for IMGDock.
| Jobs | 448 | 224 | 150 | 112 |
| --- | --- | --- | --- | --- |
| Ligands per file | 25 | 50 | 75 | 100 |
| Wall time (h) | 301 | 326 | 357 | 444 |
| Average wall time per job (s) | 2422 | 5239 | 8586 | 14275 |
| Average wall time per ligand (s) | 97 | 104 | 115 | 143 |
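
The rows of the table are related by simple arithmetic, which the following Python sketch checks. It rests on two assumptions that the source does not state explicitly: that the wall time in hours is the sum over all jobs, and that every file holds exactly the nominal number of ligands. Under these assumptions the derived values agree with the reported ones to within one hour and one second, respectively.

```python
# Columns copied from the IMGDock wall time table.
ligands_per_file = [25, 50, 75, 100]
jobs = [448, 224, 150, 112]
reported_hours = [301, 326, 357, 444]
avg_sec_per_job = [2422, 5239, 8586, 14275]
reported_sec_per_ligand = [97, 104, 115, 143]

derived_hours = []
derived_sec_per_ligand = []
for chunk, n_jobs, sec_per_job in zip(ligands_per_file, jobs, avg_sec_per_job):
    # Assumption: the total wall time is the sum of all per-job wall times.
    derived_hours.append(n_jobs * sec_per_job / 3600)
    # Assumption: every file contains exactly `chunk` ligands.
    derived_sec_per_ligand.append(sec_per_job / chunk)

for chunk, hours, ref_hours, per_ligand, ref_per_ligand in zip(
    ligands_per_file, derived_hours, reported_hours,
    derived_sec_per_ligand, reported_sec_per_ligand,
):
    print(
        f"{chunk:>3} ligands/file: "
        f"{hours:6.1f} h derived vs {ref_hours} h reported, "
        f"{per_ligand:6.1f} s/ligand derived vs {ref_per_ligand} s reported"
    )
```

The small remaining differences are consistent with rounding and with files that are not completely filled.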
Figure 4: The distributions of the runtime classes of the docking runs with 25, 50, and 75 ligands are similar. This is also reflected in their runtimes for the whole process, which tend to increase with the size of the input files.
Figure 5: The distribution of the runtime classes of the simulations with 100 ligands differs significantly from the three other cases and reflects the increased runtime in this use case.