Leiming Yu¹, Fanny Nina-Paravecino¹, David Kaeli¹, Qianqian Fang².
Abstract
We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language (OpenCL) framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable, vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs. Copyright (2018) Society of Photo-Optical Instrumentation Engineers (SPIE).
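To make the parallel decomposition described in the abstract concrete, here is a minimal Python sketch of a photon random walk split across independent workers. It is not the MCX-CL kernel: it assumes an infinite homogeneous medium, isotropic scattering (rather than the Henyey-Greenstein phase function commonly used for tissue), and terminates low-weight photons by depositing their residual weight instead of using Russian roulette. The names `simulate_batch` and `simulate_parallel` and all parameters are hypothetical. Each worker stands in for one GPU thread: it owns an independent random-number stream and its own tally, and the tallies are summed at the end. Python threads do not execute this code concurrently because of the GIL, so the sketch shows the structure of the work split, not a speedup.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def isotropic_direction(rng):
    """Sample a unit direction vector uniformly on the sphere."""
    cos_t = 2.0 * rng.random() - 1.0
    sin_t = math.sqrt(1.0 - cos_t * cos_t)
    phi = 2.0 * math.pi * rng.random()
    return sin_t * math.cos(phi), sin_t * math.sin(phi), cos_t

def simulate_batch(n_photons, mua, mus, seed, n_bins=50, dr=0.1, w_min=1e-4):
    """Hypothetical worker: trace n_photons from the origin in an infinite
    homogeneous medium and tally absorbed weight in radial shells of width dr."""
    rng = random.Random(seed)          # independent RNG stream per worker
    mut = mua + mus                    # total interaction coefficient
    albedo = mus / mut
    tally = [0.0] * n_bins
    for _ in range(n_photons):
        x = y = z = 0.0
        ux, uy, uz = isotropic_direction(rng)
        w = 1.0                        # photon packet weight
        while True:
            step = -math.log(1.0 - rng.random()) / mut   # exponential free path
            x += ux * step
            y += uy * step
            z += uz * step
            b = min(int(math.sqrt(x * x + y * y + z * z) / dr), n_bins - 1)
            dw = w * (1.0 - albedo)    # implicit absorption at this interaction
            tally[b] += dw
            w -= dw
            if w < w_min:              # simplification: deposit residual weight
                tally[b] += w
                break
            ux, uy, uz = isotropic_direction(rng)        # isotropic scattering
    return tally

def simulate_parallel(total_photons, n_workers, mua, mus, base_seed=1234, **kw):
    """Split photons evenly across workers, run them, and sum their tallies."""
    base, rem = divmod(total_photons, n_workers)
    jobs = [(base + (1 if i < rem else 0), base_seed + i) for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda j: simulate_batch(j[0], mua, mus, j[1], **kw), jobs))
    return [sum(col) for col in zip(*parts)]
```

Because every photon deposits exactly its unit weight before it terminates, `sum(simulate_parallel(1000, 4, 1.0, 9.0))` should equal 1000 up to floating-point rounding, which makes a quick energy-conservation check. Fixed per-worker seeds also make the result reproducible regardless of thread scheduling.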
Keywords: Monte Carlo; Open Computing Language; heterogeneous computing; photon transport
Year: 2018 PMID: 29374404 PMCID: PMC5785911 DOI: 10.1117/1.JBO.23.1.010504
Source DB: PubMed Journal: J Biomed Opt ISSN: 1083-3668 Impact factor: 3.170
Fig. 1. Generalized parallel Monte Carlo photon transport simulation workflow for heterogeneous systems.
Fig. 2. The MCX-CL simulation speed on different computing devices after applying three optimization schemes: Opt1, using the hardware-native math library; Opt2, using an optimized thread configuration; and Opt3, reducing thread divergence. The throughputs for the B1, B2, and B2a benchmarks are shown as stacked bars, and the four bars for each device are, from left to right, baseline (•), Opt1 (+), Opt1+2 (×), and Opt1+2+3 (#). The inset compares the speed of the OpenCL and CUDA versions of the algorithm on NVIDIA GPUs.
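Opt3 targets thread divergence. On GPUs, threads are scheduled in groups (warps of 32 on NVIDIA hardware) that execute in lockstep, so when photons in the same group need different numbers of steps, lanes that finish early sit idle until the slowest lane is done. The toy cost model below is an illustration under stated assumptions, not the authors' analysis: it assumes analog absorption (so per-photon step counts are geometric), a group width of 32, and one unit of cost per interaction; `simd_efficiency`, `analog_steps`, and `lane_work` are hypothetical names. It also hints at how thread configuration (Opt2) interacts with divergence: when each lane runs several photons back to back, per-photon variation partly averages out.

```python
import math
import random

def simd_efficiency(lane_work, width=32):
    """Toy lockstep model: each group of `width` lanes runs until its slowest
    lane finishes. Returns useful lane-slots / issued lane-slots."""
    useful = issued = 0
    for start in range(0, len(lane_work), width):
        group = lane_work[start:start + width]
        useful += sum(group)
        issued += max(group) * width   # idle lanes still occupy issue slots
    return useful / issued if issued else 1.0

def analog_steps(rng, albedo):
    """Interactions before absorption in analog MC: geometric with
    P(N >= k + 1) = albedo**k."""
    return int(math.log(1.0 - rng.random()) / math.log(albedo)) + 1

def lane_work(photons_per_lane, n_lanes, albedo, seed=0):
    """Total interactions per lane when each lane simulates
    `photons_per_lane` photons one after another (hypothetical workload)."""
    rng = random.Random(seed)
    return [sum(analog_steps(rng, albedo) for _ in range(photons_per_lane))
            for _ in range(n_lanes)]

for k in (1, 4, 16):
    print(k, round(simd_efficiency(lane_work(k, 1024, 0.9)), 3))
```

In this model the relative spread of a lane's total work shrinks roughly as 1/√k with k photons per lane, so the printed efficiency typically rises with k. Real kernels diverge for more reasons (boundary crossings, branches in the scattering code), which this sketch ignores.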
Fig. 3. Validation of workload-balancing strategies. (a) Comparison between thread-level and workgroup-level load-balancing approaches using benchmark B1; (b) comparison among three device-level load-balancing strategies; and (c) acceleration using multiple NVIDIA 1080Ti GPUs for the three benchmarks, with linear acceleration (the ideal case) shown as dashed lines.
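Panels (a) and (b) concern how photons are divided among threads and among devices. The sketch below shows two simple static partitions, offered as assumptions rather than as the three strategies evaluated in the paper: an even split (as one might use across threads) and a split proportional to each device's measured throughput, using largest-remainder rounding so the counts sum exactly to the requested total. `makespan` estimates when the slowest device finishes, which is the quantity load balancing tries to minimize. The function names and the device speeds are hypothetical.

```python
def split_evenly(total, parts):
    """Divide `total` photons into `parts` counts that differ by at most one."""
    base, rem = divmod(total, parts)
    return [base + (1 if i < rem else 0) for i in range(parts)]

def split_by_speed(total, speeds):
    """Proportional split with largest-remainder rounding; counts sum to `total`."""
    s = float(sum(speeds))
    exact = [total * v / s for v in speeds]
    counts = [int(e) for e in exact]
    leftover = total - sum(counts)
    # Give the remaining photons to the devices with the largest fractional parts.
    order = sorted(range(len(speeds)), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts

def makespan(counts, speeds):
    """Finishing time of the slowest device (speeds in photons per time unit)."""
    return max(c / v for c, v in zip(counts, speeds))

speeds = [3.0, 1.0]   # hypothetical: device 0 is three times faster than device 1
total = 1_000_000
even = split_evenly(total, len(speeds))
prop = split_by_speed(total, speeds)
print(even, makespan(even, speeds))   # [500000, 500000] 500000.0
print(prop, makespan(prop, speeds))   # [750000, 250000] 250000.0
```

With proportional shares every device finishes at about the same time, total / sum(speeds), whereas the even split leaves the faster device idle while the slower one finishes. A static split like this depends on accurate speed estimates; dynamic schemes that hand out work in chunks trade extra scheduling overhead for robustness to those estimates.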