Heeseung Jo, Jinkyu Jeong, Myoungho Lee, Dong Hoon Choi.
Abstract
Recently, biological applications have started to be reimplemented to exploit the many cores of GPUs for better computation performance. Therefore, by providing virtualized GPUs to VMs in a cloud computing environment, many biological applications can move into the cloud to enhance their computation performance and use elastic cloud computing resources while reducing computation expenses. In this paper, we propose a BioCloud system architecture that enables VMs to use GPUs in a cloud environment. Because much of the previous research has focused on mechanisms for sharing GPUs among VMs, it cannot achieve sufficient performance for biological applications, for which computation throughput is more crucial than sharing. The proposed system exploits the pass-through mode of the PCI Express (PCI-E) channel. By allowing each VM to access the underlying GPUs directly, applications achieve almost the same performance as in a native environment. In addition, our scheme multiplexes GPUs by using the hot plug-in/out device features of the PCI-E channel. By adding or removing GPUs to or from each VM on demand, VMs on the same physical host can time-share the GPUs. We implemented the proposed system using the Xen VMM and NVIDIA GPUs and showed that our prototype is highly effective for biological GPU applications in a cloud environment.
Year: 2013 PMID: 23710465 PMCID: PMC3654629 DOI: 10.1155/2013/939460
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1The architecture of a GPU equipped machine on PCI-E channel.
Figure 2The system architecture of direct pass-through GPU virtualization.
Figure 3The overall sequence of the coarse-grained GPU sharing mechanism.
Figure 4The detailed modules and operations of the GPU-Admin and the GPU-Manager.
The functions that are wrapped by the WrapCUDA library.
| GPU allocation call | GPU deallocation call |
|---|---|
| cudaGetDeviceCount() | cudaThreadExit() |
| cudaGetDevice() | |
| cudaMalloc() | |
| cudaDeviceReset() | |
| cudaChooseDevice() | |
| cudaDeviceSynchronize() | |
Algorithm 1An example implementation of the WrapCUDA library function.
The specifications of the evaluation system.
| Device | Specification |
|---|---|
| CPU | Intel(R) Xeon(R) E5620 (2.40 GHz) |
| Chipset | Intel(R) 5520 |
| Memory | DDR3 1333 MHz (24 GB) |
| PCI slot | PCI Express Gen2 (4 slots) |
| GPU | NVIDIA Quadro FX 3800 (4 units) |
Figure 5The performance comparison with other schemes.
Figure 6The performance evaluation using biological applications.
The time for hot plug-in/out operations on the PCI-E channel.
| Operation | Time (seconds) |
|---|---|
| GPU allocation (hot plug-in) | 1.3 ± 0.1 |
| GPU deallocation (hot plug-out) | 1.3 ± 0.1 |
Figure 7The GPU sharing effect using BarraCUDA.
Figure 8The average waiting time of VMs.