Takehiro Shimoda, Shuji Suzuki, Masahito Ohue, Takashi Ishida, Yutaka Akiyama.
Abstract
BACKGROUND: Hardware accelerators will provide solutions to computationally complex problems in bioinformatics. However, the effect of acceleration depends on the nature of the application, so selecting an appropriate accelerator requires some consideration.
Year: 2015 PMID: 25707855 PMCID: PMC4331681 DOI: 10.1186/1752-0509-9-S1-S6
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Figure 1. Process flow of FFT-based protein-protein docking tools.
Docking calculation time profile using one CPU core (PDB ID: [PDB:1JK9]).
| Process | Time [s] | Ratio [%] |
|---|---|---|
| P1. Initialization | 0.0 | 0.0 |
| P2. Receptor voxelization | 0.3 | 0.2 |
| P3. Forward FFT of receptor | 0.1 | 0.0 |
| P4. Ligand rotation & voxelization | 12.9 | 6.9 |
| P5. Forward FFT of ligand | 69.8 | 37.5 |
| P6. Convolution | 27.4 | 14.7 |
| P7. Inverse FFT | 71.5 | 38.4 |
| P8. Identifying best solutions | 4.3 | 2.3 |
| P9. Post processes | 0.0 | 0.0 |
| Total | 186.4 | 100.0 |
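The FFT-related steps (P5 to P7) account for about 90% of this profile because the ligand is scored against the receptor over all translations at once by a correlation computed in Fourier space. The following is a minimal one-dimensional sketch of that idea in Python, not the tool's actual implementation: the toy `receptor` and `ligand` grids, the function names, and the naive DFT used in place of a 3-D FFT library are all assumptions made for illustration.

```python
import cmath

def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT library)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(values[m] * cmath.exp(sign * 2j * cmath.pi * m * k / n)
                  for m in range(n))
        out.append(acc / n if inverse else acc)
    return out

def correlation_scores(receptor, ligand):
    """Score every cyclic translation of the ligand in one pass."""
    r_hat = dft(receptor)                      # analogous to P3: forward FFT of receptor
    l_hat = dft(ligand)                        # analogous to P5: forward FFT of ligand
    product = [a * b.conjugate() for a, b in zip(r_hat, l_hat)]  # analogous to P6
    return [v.real for v in dft(product, inverse=True)]          # analogous to P7

def direct_scores(receptor, ligand):
    """Reference O(n^2) correlation computed without any transform."""
    n = len(receptor)
    return [sum(receptor[m] * ligand[(m - t) % n] for m in range(n))
            for t in range(n)]

# Hypothetical 1-D "grids"; real tools use 3-D voxel grids per ligand rotation.
receptor = [0, 0, 1, 2, 1, 0, 0, 0]
ligand = [1, 2, 1, 0, 0, 0, 0, 0]

scores = correlation_scores(receptor, ligand)
# Analogous to P8: pick the translation with the highest score.
best_shift = max(range(len(scores)), key=lambda t: scores[t])
print("best translation:", best_shift)
```

In a real docking run this transform, multiply, and inverse-transform sequence is repeated for every ligand rotation, which is consistent with P5 and P7 being the most expensive steps in the table.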
Figure 2. FFT size of the proteins registered in the PDB (experimentally determined by X-ray diffraction, 78,958 structures).
Differences in parallelization among the GPU, MIC offload, and MIC native implementations (n is the number of MIC threads).
| Implementation | Target of parallelization | #threads used for one ligand angle | Consumption of accelerator memory |
|---|---|---|---|
| GPU | Each process in one ligand angle | All GPU threads | For only one ligand |
| MICoffload | Each process in one ligand angle | All MIC threads | For only one ligand |
| MICnative | Loop of rotational angles of ligand | One MIC thread | For |
Computational environment.
| Item | CPU/MIC node | GPU node |
|---|---|---|
| CPU | Intel Xeon E5-2670, 2.60 GHz (8 cores) | Intel Xeon X5670, 2.93 GHz (6 cores) |
| Memory | 54 GB | 64 GB |
| Accelerator | Intel Xeon Phi 5110P, 1.05 GHz (60 cores) | NVIDIA Tesla K20X, 0.73 GHz (2,688 CUDA cores) |
| Accelerator memory | 8 GB | 6 GB |
| OS | CentOS 6.3 | SUSE LES 11 SP1 |
| Compiler | Intel C++ Compiler 13.0 | Intel C++ Compiler 13.0 |
| FFT library | Intel MKL 11.0 | cuFFT (CUDA 5.0) |
Figure 3. Acceleration rate for each system based on the total docking calculation time for 352 protein complexes.
Total docking calculation times for 352 protein complexes.
| | 1CPU | 8CPUs | GPU | MICoffload | MICnative |
|---|---|---|---|---|---|
| Total docking time [hour] | 30.8 | 4.9 | 2.0 | 9.4 | 6.0 |
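The acceleration rate of each system relative to one CPU core follows directly from these totals. As a quick check, the short Python sketch below (variable names are illustrative) divides the 1CPU time by each system's time.

```python
# Total docking times in hours for 352 protein complexes, copied from the table.
total_hours = {
    "1CPU": 30.8,
    "8CPUs": 4.9,
    "GPU": 2.0,
    "MICoffload": 9.4,
    "MICnative": 6.0,
}

baseline = total_hours["1CPU"]
# Acceleration rate = baseline time / system time.
acceleration = {name: baseline / hours for name, hours in total_hours.items()}

for name, rate in acceleration.items():
    print(f"{name}: {rate:.1f}x")
```

Because the inputs are rounded to one decimal place, the resulting rates (roughly 6.3× for 8CPUs, 15.4× for the GPU, 3.3× for MICoffload, and 5.1× for MICnative) are approximate.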
Figure 4. Images showing protein pairs of different sizes.
Docking calculation times and acceleration rates for three proteins of different sizes.
| | Small | Medium | Large |
|---|---|---|---|
| Receptor (#residues) | GRB2 C-ter | CCS metallochaperone (249) | Nitrogenase Mo-Fe protein (2026) |
| Ligand (#residues) | Vav N-ter | SOD1 superoxide dismutase (153) | Nitrogenase Fe protein (578) |
| PDB ID | [PDB: | [PDB: | [PDB: |
| FFT size | 80 × 80 × 80 | 128 × 128 × 128 | 216 × 216 × 216 |
| Docking time [s] (vs. 1CPU), 1CPU | 38.3 (1.0×) | 186.4 (1.0×) | 1105.6 (1.0×) |
| Docking time [s] (vs. 1CPU), 8CPUs | 8.4 (4.6×) | 38.5 (4.8×) | 177.5 (6.2×) |
| Docking time [s] (vs. 1CPU), GPU | 5.8 (6.6×) | 10.8 (17.3×) | 62.2 (17.8×) |
| Docking time [s] (vs. 1CPU), MICoffload | 58.7 (0.7×) | 77.0 (2.4×) | 180.6 (6.1×) |
| Docking time [s] (vs. 1CPU), MICnative | 7.6 (5.0×) | 26.8 (7.0×) | 310.5 (3.6×) |
The MIC native mode (MICnative) used an optimized number of threads, the largest number available for each protein size (small = 240 threads, medium = 171 threads, and large = 38 threads).
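One plausible reading of these thread counts, stated here as an assumption rather than something this section confirms, is that each native-mode thread holds its own ligand grid, so the count is capped by the 8 GB of accelerator memory as the FFT size grows. The Python sketch below only derives an upper bound on the memory available per thread under that assumption; it treats 8 GB as 8192 MiB and assumes all of it is usable, and the variable names are illustrative.

```python
# Assumption: 8 GB of accelerator memory treated as 8192 MiB, all of it usable.
ACCELERATOR_MEMORY_MIB = 8 * 1024

# FFT edge length and thread count per protein size, copied from the text.
cases = {
    "small": {"fft_size": 80, "threads": 240},
    "medium": {"fft_size": 128, "threads": 171},
    "large": {"fft_size": 216, "threads": 38},
}

budget = {}
for name, case in cases.items():
    voxels = case["fft_size"] ** 3
    # Upper bound on memory per thread if memory is split evenly.
    mib_per_thread = ACCELERATOR_MEMORY_MIB / case["threads"]
    # The same bound expressed per grid voxel.
    bytes_per_voxel = mib_per_thread * 1024 * 1024 / voxels
    budget[name] = (mib_per_thread, bytes_per_voxel)
    print(f"{name}: <= {mib_per_thread:.1f} MiB/thread, "
          f"<= {bytes_per_voxel:.1f} bytes/voxel")
```

The small case is probably not memory-bound at all: 240 threads matches the 60 cores of the Xeon Phi 5110P with four hardware threads each.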
Docking calculation time results for the protein complex (PDB ID: [PDB:1JK9]) for each process (in seconds).
| Process | 1CPU | 8CPUs | GPU | MICoffload | MICnative |
|---|---|---|---|---|---|
| P1. Initialization | 0.0 | 0.0 | 0.8 | 4.0 | 0.7 |
| P2. Receptor voxel | 0.3 | 0.3 (1.1×) | 0.3 (1.1×) | 0.3 (1.1×) | 4.4 (0.1×) |
| P3. Receptor FFT | 0.1 | 0.1 (1.0×) | 0.0 (1.7×) | 1.0 (0.1×) | 0.3 (0.2×) |
| P4. Ligand rot & voxel | 12.9 | 3.4 (3.8×) | 2.3 (5.5×) | 7.4 (1.7×) | 1.2 (11.1×) |
| P5. Ligand FFT | 69.8 | 14.2 (4.9×) | 2.2 (31.1×) | 15.1 (4.6×) | 7.9 (8.9×) |
| P6. Convolution | 27.4 | 4.6 (5.9×) | 1.1 (25.6×) | 13.9 (2.0×) | 3.6 (7.7×) |
| P7. Inverse FFT | 71.5 | 14.1 (5.1×) | 2.2 (31.8×) | 15.2 (4.7×) | 8.3 (8.6×) |
| P8. Identifying best solutions | 4.3 | 1.7 (2.5×) | 1.7 (2.5×) | 9.8 (0.4×) | 0.3 (12.5×) |
| P9. Post processes | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Data transfer | | | 0.6 | 10.1 | |
| Total | 186.4 | 38.5 (4.8×) | 10.8 (17.3×) | 77.0 (2.4×) | 26.8 (7.0×) |
The values shown in parentheses are the acceleration rates relative to one CPU core. MIC native used 171 threads.
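To see how the bottleneck shifts once the FFT steps are accelerated, the share of total time spent in P5 to P7 can be computed for each system from the table. The Python sketch below does only that arithmetic; the dictionary layout and names are illustrative.

```python
# Times in seconds for PDB 1JK9, copied from the per-process table.
times = {
    "1CPU": {"P5": 69.8, "P6": 27.4, "P7": 71.5, "total": 186.4},
    "8CPUs": {"P5": 14.2, "P6": 4.6, "P7": 14.1, "total": 38.5},
    "GPU": {"P5": 2.2, "P6": 1.1, "P7": 2.2, "total": 10.8},
    "MICoffload": {"P5": 15.1, "P6": 13.9, "P7": 15.2, "total": 77.0},
    "MICnative": {"P5": 7.9, "P6": 3.6, "P7": 8.3, "total": 26.8},
}

# Fraction of the total spent in ligand FFT, convolution, and inverse FFT.
fft_share = {
    name: (t["P5"] + t["P6"] + t["P7"]) / t["total"]
    for name, t in times.items()
}

for name, share in fft_share.items():
    print(f"{name}: {share * 100:.1f}% of total time in P5-P7")
```

On these numbers the P5 to P7 share falls from about 90% on one CPU core to about 51% on the GPU, so initialization, voxelization, solution ranking, and data transfer weigh more heavily once the FFT steps are fast.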