Qianqian Fang, Shijie Yan.
Abstract
The mesh-based Monte Carlo (MMC) algorithm is increasingly used as the gold standard for developing new biophotonics modeling techniques in 3-D complex tissues, including both diffusion-based and various Monte Carlo (MC)-based methods. Compared to multilayered and voxel-based MCs, MMC can utilize tetrahedral meshes to gain improved anatomical accuracy, but at the cost of higher computational and memory demands. Previous attempts to accelerate MMC using graphics processing units (GPUs) have yielded limited performance improvement and are not publicly available. We report a highly efficient MMC implementation (MMCL) that uses the OpenCL heterogeneous computing framework, and demonstrate a speedup ratio of up to 420× compared to state-of-the-art single-threaded CPU simulations. The MMCL simulator supports almost all advanced features found in our widely disseminated MMC software, such as a dozen complex source forms, wide-field detectors, boundary reflection, photon replay, and storing a rich set of detected photon information. Furthermore, this tool supports a wide range of GPUs/CPUs across vendors and is freely available with full source code and benchmark suites at http://mcx.space/#mmc.
Keywords: Monte Carlo method; heterogeneous computing; light transport; optical imaging
Year: 2019 PMID: 31746154 PMCID: PMC6863969 DOI: 10.1117/1.JBO.24.11.115002
Source DB: PubMed Journal: J Biomed Opt ISSN: 1083-3668 Impact factor: 3.170
Fig. 1. Illustration of ray-tetrahedron intersection testing using the branchless Badouel algorithm. The signed distances from the current position to the four facets of the tetrahedron are measured along the current direction. A negative distance means the intersection lies in the opposite direction.
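The signed-distance test described in the Fig. 1 caption can be sketched in a few lines. The code below is a minimal illustration only, written as a plain Python loop rather than the branchless, vectorized form used in MMC/MMCL; the function names (`signed_distances`, `exit_facet`), the facet numbering, and the unit-tetrahedron example are assumptions made for this sketch and are not taken from the software.

```python
import math

# Hypothetical facet numbering: facet i is the triangle opposite node i.
FACETS = ((1, 2, 3), (0, 2, 3), (0, 1, 3), (0, 1, 2))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def signed_distances(p, v, nodes):
    """Signed distance from p to each facet plane, measured along v."""
    dist = []
    for i, j, k in FACETS:
        a, b, c = nodes[i], nodes[j], nodes[k]
        n = cross(sub(b, a), sub(c, a))      # facet plane normal (unnormalized)
        denom = dot(n, v)
        if denom == 0.0:                     # ray parallel to this facet plane
            dist.append(math.inf)
        else:
            dist.append(dot(n, sub(a, p)) / denom)
    return dist


def exit_facet(p, v, nodes):
    """Return (distance, facet index) of the nearest facet ahead of the ray."""
    dist = signed_distances(p, v, nodes)
    ahead = [(t, i) for i, t in enumerate(dist) if t > 0.0]
    return min(ahead, default=(math.inf, -1))


# Example: a photon inside the unit tetrahedron travelling along +z.
nodes = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
print(signed_distances((0.1, 0.1, 0.1), (0.0, 0.0, 1.0), nodes))
print(exit_facet((0.1, 0.1, 0.1), (0.0, 0.0, 1.0), nodes))
```

In this example the facet in the z = 0 plane yields a negative distance because it lies behind the photon, and the exit facet is the one with the smallest positive distance.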
Summary of the meshes and baseline simulation speeds (in photon/ms, the higher the faster) for the selected benchmarks. The baseline speeds were measured using a single-thread (MMC-1) or eight-thread (MMC-8) SSE4-enabled MMC on an Intel i7-7700K CPU.
| Benchmark | B1 (cube60) | B1D (d-cube60) | B2 (sphshells) | B2D (d-sphshells) | B3 (colin27) | B4 (skin-vessel) |
|---|---|---|---|---|---|---|
| Node# | 29,791 | 8 | 604,297 | 3,723 | 70,226 | 76,450 |
| Elem.# | 135,000 | 6 | 3,733,387 | 21,256 | 423,375 | 483,128 |
| MMC-1 | 67.25 | 100.06 | 5.73 | 10.39 | 12.34 | 36.72 |
| MMC-8 | 351.26 | 568.06 | 26.43 | 57.49 | 67.43 | 150.42 |
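As a quick reading aid for the baseline table, the short script below computes how much faster the eight-thread runs are than the single-thread runs in each benchmark. The speeds are copied from the table above; the dictionary and variable names are only illustrative.

```python
# Baseline speeds (photon/ms) copied from the table above.
mmc1 = {"B1": 67.25, "B1D": 100.06, "B2": 5.73,
        "B2D": 10.39, "B3": 12.34, "B4": 36.72}
mmc8 = {"B1": 351.26, "B1D": 568.06, "B2": 26.43,
        "B2D": 57.49, "B3": 67.43, "B4": 150.42}

# Ratio of eight-thread to single-thread speed for each benchmark.
scaling = {name: mmc8[name] / mmc1[name] for name in mmc1}

for name, ratio in scaling.items():
    print(f"{name}: {ratio:.2f}x")
```

For these numbers, the eight-thread speedup falls between roughly 4× and 6× across the six benchmarks.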
Fig. 2. Fluence (in log scale) contour plots of MMC and MMCL in various benchmarks: (a) B1/B1D, (b) B2/B2D, (c) B3, and (d) B4. In (b), we also include a voxel-based MC (MCX-CL) output for comparison. Black dashed lines mark tissue boundaries.
Fig. 3. Speeds of MMCL in six benchmarks (B1 to B4, single-grid; B1D/B2D, dual-grid): (a) speedup ratios over a single-threaded (on i7-7700K) SSE4 MMC and (b) speeds in dual-grid simulations compared to MCX-CL. In (a), we also report the speed (photon/ms, light-blue) and speedups over the single-threaded (red) and multithreaded (green) MMC in the labels for benchmark B3 (Colin27).
Simulation speed (in photon/ms, the higher the faster) of MMCL in six benchmarks (B1 to B4, single grid; B1D and B2D, dual-grid); we also report the voxel-based MCX-CL speed in benchmarks B1D and B2D. The master script to reproduce the above results can be found in the “mmc/examples/mmclbench/” folder of our software.
| Device | B1 | B2 | B3 | B4 | B1D | B2D | B1D (MCX-CL) | B2D (MCX-CL) |
|---|---|---|---|---|---|---|---|---|
| NVIDIA Titan V | 7874.02 | 899.25 | 5198.05 | 8829.24 | 23359.03 | 3709.20 | 37835.79 | 9353.66 |
| NVIDIA RTX 2080 | 7319.04 | 465.62 | 3930.20 | 4147.83 | 22202.49 | 3295.87 | 43917.44 | 10095.91 |
| NVIDIA GTX 1080Ti | 2959.81 | 357.89 | 1008.48 | 2721.38 | 6079.03 | 924.16 | 20648.36 | 3826.14 |
| NVIDIA GTX 1080 | 2642.92 | 318.22 | 843.28 | 1759.82 | 4547.73 | 665.72 | 15351.55 | 2796.73 |
| NVIDIA GTX 980Ti | 2254.44 | 295.36 | 663.21 | 2925.43 | 3872.37 | 542.73 | 12211.50 | 2319.06 |
| AMD Vega 20 | 1315.62 | 189.17 | 424.07 | 2839.54 | 2579.51 | 378.00 | 29577.05 | 7105.30 |
| AMD Vega 10 | 1086.45 | 161.42 | 326.89 | 2444.87 | 2170.52 | 302.46 | 25680.53 | 5865.45 |
| Dual Xeon E5-2658v3 | 708.19 | 103.86 | 265.00 | 756.74 | 1405.90 | 167.69 | 1127.86 | 181.83 |
| Intel i7-8700K | 538.80 | 49.29 | 126.80 | 313.77 | 670.12 | 80.92 | 597.07 | 89.02 |
| Intel i7-7700K | 434.86 | 35.41 | 103.64 | 188.83 | 522.77 | 67.28 | 397.16 | 58.96 |
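The speedup ratios plotted in Fig. 3(a) can be recomputed from the two tables by dividing each MMCL speed by the single-thread (MMC-1) baseline of the same benchmark. The sketch below does this for two of the listed GPUs; the numbers are copied from the tables, while the variable names and the choice of devices are only for illustration.

```python
# Single-thread MMC baseline speeds (photon/ms) from the first table.
mmc1 = {"B1": 67.25, "B2": 5.73, "B3": 12.34,
        "B4": 36.72, "B1D": 100.06, "B2D": 10.39}

# MMCL speeds (photon/ms) for two devices from the second table.
mmcl = {
    "NVIDIA Titan V": {"B1": 7874.02, "B2": 899.25, "B3": 5198.05,
                       "B4": 8829.24, "B1D": 23359.03, "B2D": 3709.20},
    "NVIDIA RTX 2080": {"B1": 7319.04, "B2": 465.62, "B3": 3930.20,
                        "B4": 4147.83, "B1D": 22202.49, "B2D": 3295.87},
}

# Speedup of MMCL over single-thread MMC, per device and benchmark.
speedup = {
    device: {name: speed / mmc1[name] for name, speed in speeds.items()}
    for device, speeds in mmcl.items()
}

for device, ratios in speedup.items():
    best = max(ratios, key=ratios.get)
    print(f"{device}: largest speedup {ratios[best]:.1f}x in {best}")
```

With these table values, the largest ratio is the Titan V in benchmark B3 (Colin27), about 421×, which is consistent with the "up to 420×" figure quoted in the abstract.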