Daniele D'Agostino, Giulia Pasquale, Andrea Clematis, Carlo Maj, Ettore Mosca, Luciano Milanesi, Ivan Merelli.
Abstract
There is an increasing awareness of the pivotal role of noise in biochemical processes and of the effect of molecular crowding on the dynamics of biochemical systems. This awareness has given rise to a strong need for suitable and sophisticated algorithms for the simulation of biological phenomena that take into account both spatial effects and noise. However, the high computational effort characterizing simulation approaches, coupled with the need to simulate each model several times to obtain statistically relevant information on its behaviour, makes such algorithms very time-consuming for studying real systems. So far, different parallelization approaches have been deployed to reduce the computational time required to simulate the temporal dynamics of biochemical systems using stochastic algorithms. In this work we discuss these aspects for the spatial TAU-leaping in crowded compartments (STAUCC) simulator, a voxel-based method for the stochastic simulation of reaction-diffusion processes that relies on the Sτ-DPP algorithm. In particular, we present how the characteristics of the algorithm can be exploited for an effective parallelization on present-day heterogeneous HPC architectures.
Year: 2014 PMID: 25045716 PMCID: PMC4082941 DOI: 10.1155/2014/980501
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Algorithm 1: The pseudocode of the Sτ-DPP algorithm.
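The pseudocode itself is not reproduced here. As a rough orientation only, the following is a minimal sketch of a generic fixed-step tau-leaping update for a single compartment. It is not the Sτ-DPP algorithm: it omits the selection of the leap size, diffusion between compartments, and crowding effects. The function names, the rate constants, and the `G1+` label for the activated gene are illustrative assumptions.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate with Knuth's method (adequate for small means)."""
    if mean <= 0.0:
        return 0
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def tau_leap_step(state, reactions, tau, rng):
    """Advance one compartment by a fixed leap of length tau.

    state     -- dict mapping species name to copy number
    reactions -- list of (rate_constant, reactants, products), where
                 reactants and products map species name to stoichiometry
    """
    new_state = dict(state)
    for rate, reactants, products in reactions:
        # Mass-action propensity, evaluated on the state at the start of the leap.
        propensity = rate
        for species, n in reactants.items():
            propensity *= math.comb(state[species], n)
        firings = poisson(rng, propensity * tau)
        # Cap the firings so that no reactant count becomes negative.
        for species, n in reactants.items():
            firings = min(firings, new_state[species] // n)
        for species, n in reactants.items():
            new_state[species] -= n * firings
        for species, n in products.items():
            new_state[species] = new_state.get(species, 0) + n * firings
    return new_state


# Hypothetical example: binding and unbinding of F1 at gene G1.
# The rate constants 0.5 and 0.1 are made up for illustration.
example_state = {"G1": 1, "F1": 5, "G1+": 0}
example_reactions = [
    (0.5, {"G1": 1, "F1": 1}, {"G1+": 1}),  # G1 + F1 -> G1+
    (0.1, {"G1+": 1}, {"G1": 1, "F1": 1}),  # G1+ -> G1 + F1
]
```

Calling `tau_leap_step(example_state, example_reactions, 0.1, random.Random(0))` repeatedly yields one stochastic trajectory. As the abstract notes, many such trajectories are needed to obtain statistically relevant results, which is what motivates parallelization.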
Figure 1: Schematic representation of the space domain. The 2D grid of compartments represents a section of the nucleus in which two genes, G1 and G2 (black rectangles), are located. Four regulatory factors (represented by circles of different colours) can diffuse within this environment. Compartments filled in yellow have less free space, which models the macromolecular crowding due to chromatin. The representation is not to scale with the actual sizes used in simulations.
Figure 2: Evolution of the G1 state probability starting from an initial condition in which an activator is placed closer to the gene than an inhibitor. The vertical axis shows the frequency with which G1 was found in each state, and the horizontal axis shows the simulation time.
List of rules defining reaction and diffusion processes. (x, y) are the compartment's coordinates, n ∈ {−1, 0, 1}, and z ∈ {1, 2, 3, 4}.
| Process | Rule |
|---|---|
| Activation of G1 mediated by F1 | G1 + F1 → G1⁺ |
| Dissociation of F1 from G1 | G1⁺ → G1 + F1 |
| Activation of G2 mediated by F1 | G2 + F1 → G2⁺ |
| Dissociation of F1 from G2 | G2⁺ → G2 + F1 |
| Inhibition of G1 mediated by F2 | G1 + F2 → G1⁻ |
| Dissociation of F2 from G1 | G1⁻ → G1 + F2 |
| Inhibition of G2 mediated by F2 | G2 + F2 → G2⁻ |
| Dissociation of F2 from G2 | G2⁻ → G2 + F2 |
| Activation of G1 mediated by F3 | G1 + F3 → G1⁺ |
| Dissociation of G1 from F3 | G1⁺ → G1 + F3 |
| Inhibition of G2 mediated by F3 | G2 + F3 → G2⁻ |
| Dissociation of F3 from G2 | G2⁻ → G2 + F3 |
| Inhibition of G1 mediated by F4 | G1 + F4 → G1⁻ |
| Dissociation of G1 from F4 | G1⁻ → G1 + F4 |
| Activation of G2 mediated by F4 | G2 + F4 → G2⁺ |
| Dissociation of F4 from G2 | G2⁺ → G2 + F4 |
| Diffusion from ( | F |
Evaluation of TimePerStep: execution time in milliseconds for the sequential version (1 core) and speedup values for the parallel version using up to 4 nodes of the first cluster, connected via InfiniBand.
| Number of compartments | 1 core (time) | 2 cores | 4 cores | 8 cores | 16 cores | 24 cores | 32 cores |
|---|---|---|---|---|---|---|---|
| 256 | 2.9 ms | 1.86 | 2.80 | 3.50 | 3.20 | 2.08 | 2.12 |
| 1024 | 14.6 ms | 1.81 | 3.90 | 5.30 | 6.47 | 7.03 | 7.54 |
| 4096 | 63.2 ms | 1.87 | 3.70 | 6.10 | 10.40 | 15.70 | 19.80 |
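To make the scaling behaviour in this table easier to read, the speedups can be converted to parallel efficiency (speedup divided by number of cores). The sketch below copies the values from the table; the helper names `efficiency` and `best_core_count` are illustrative and not part of the original work.

```python
# Speedup values copied from the table above, keyed by number of compartments.
CORES = [2, 4, 8, 16, 24, 32]
SPEEDUP = {
    256: [1.86, 2.80, 3.50, 3.20, 2.08, 2.12],
    1024: [1.81, 3.90, 5.30, 6.47, 7.03, 7.54],
    4096: [1.87, 3.70, 6.10, 10.40, 15.70, 19.80],
}


def efficiency(speedups, cores=CORES):
    """Parallel efficiency = speedup / number of cores, rounded to 2 decimals."""
    return [round(s / p, 2) for s, p in zip(speedups, cores)]


def best_core_count(speedups, cores=CORES):
    """Core count at which the measured speedup is highest."""
    return max(zip(speedups, cores))[1]


for compartments, speedups in SPEEDUP.items():
    print(compartments, best_core_count(speedups), efficiency(speedups))
```

For the 256-compartment system the speedup peaks at 8 cores and then degrades, whereas for 4096 compartments it keeps growing up to 32 cores, where the efficiency is about 0.62. Larger systems therefore make better use of additional cores.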
Evaluation of the execution times on the node of the second cluster equipped with the Intel Xeon E5645 CPU and the GTX-580 device. Speedup values are computed from the TimePerStep values.
| Number of compartments | Blocks-threads | | TimePerStep | Speedup |
|---|---|---|---|---|
| 256 | Seq. | 81.2 | 2.1 | — |
| | 1-32 | 163.7 | 6.5 | 0.3 |
| | 32-32 | 170.8 | 0.4 | 5.8 |
| | 64-32 | 159.8 | 0.2 | 12.4 |
| | 128-32 | 186.7 | 0.1 | 22.6 |
| | 256-32 | 162.7 | 0.1 | 24.5 |
| 1024 | Seq. | 82.5 | 7.6 | — |
| | 1-32 | 193.5 | 22.1 | 0.3 |
| | 32-32 | 179.1 | 1.4 | 5.5 |
| | 64-32 | 188.6 | 0.6 | 12.0 |
| | 128-32 | 182.2 | 0.4 | 20.4 |
| 4096 | Seq. | 88.9 | 29.9 | — |
| | 1-32 | 175.6 | 80.4 | 0.4 |
| | 32-32 | 177.0 | 4.9 | 6.1 |
| | 64-32 | 205.0 | 2.4 | 12.3 |
Evaluation of the execution times on the three different CUDA devices available on the second cluster nodes for the 4096-compartment system. The number of blocks we consider is a multiple of the number of streaming multiprocessors (SMs) available on these devices. Speedup values are computed from the TimePerStep values.
| Device | Blocks-threads | | TimePerStep | Speedup |
|---|---|---|---|---|
| CPU | — | 88.9 | 29.9 | — |
| GTX-580 | 32-32 | 177.0 | 4.9 | 6.1 |
| | 64-32 | 205.0 | 2.4 | 12.3 |
| K20 | 13-128 | 198.6 | 2.9 | 10.2 |
| | 26-128 | 182.7 | 1.5 | 20.5 |
| | 52-128 | 189.9 | 0.8 | 37.8 |
| | 78-128 | 208.6 | 0.9 | 32.8 |
| GTX-Titan | 14-128 | 137.4 | 2.2 | 13.5 |
| | 28-128 | 136.1 | 1.1 | 27.6 |
| | 56-128 | 139.2 | 0.5 | 57.4 |
| | 84-128 | 146.5 | 0.6 | 48.1 |
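The caption states that block counts are multiples of the number of SMs. The short sketch below checks this for the table rows and reports which multiple gave the lowest TimePerStep on each device. The SM counts (16 for the GTX-580, 13 for the K20, 14 for the GTX-Titan) are an assumption taken from vendor specifications, not from the text above, and the helper names are illustrative.

```python
# Assumption: SM counts from vendor specifications, not stated in the source.
SM_COUNT = {"GTX-580": 16, "K20": 13, "GTX-Titan": 14}

# (number of blocks, TimePerStep) pairs copied from the table above.
RUNS = {
    "GTX-580": [(32, 4.9), (64, 2.4)],
    "K20": [(13, 2.9), (26, 1.5), (52, 0.8), (78, 0.9)],
    "GTX-Titan": [(14, 2.2), (28, 1.1), (56, 0.5), (84, 0.6)],
}


def blocks_per_sm(device):
    """Return the blocks-per-SM multiple of every run on the given device."""
    sms = SM_COUNT[device]
    multiples = []
    for blocks, _ in RUNS[device]:
        if blocks % sms != 0:
            raise ValueError(f"{blocks} blocks is not a multiple of {sms} SMs")
        multiples.append(blocks // sms)
    return multiples


def fastest_run(device):
    """Return (blocks, blocks per SM) for the run with the lowest TimePerStep."""
    blocks, _ = min(RUNS[device], key=lambda run: run[1])
    return blocks, blocks // SM_COUNT[device]


for device in RUNS:
    print(device, blocks_per_sm(device), fastest_run(device))
```

Under these assumptions, the K20 and GTX-Titan runs use 1, 2, 4, and 6 blocks per SM, and on both devices the lowest TimePerStep is measured at 4 blocks per SM. Going to 6 blocks per SM gives a slightly higher TimePerStep.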