Abstract
This paper evaluates the impact of different data layouts on the computational efficiency of the GPU-accelerated Inverse Distance Weighting (IDW) interpolation algorithm. First, we redesign and improve our previous GPU implementation, which exploits CUDA dynamic parallelism (CDP). Then we implement three GPU versions, i.e., the naive version, the tiled version, and the improved CDP version, on five data layouts: the Structure of Arrays (SoA), the Array of Structures (AoS), the Array of aligned Structures (AoaS), the Structure of Arrays of aligned Structures (SoAoS), and the Hybrid layout. We also carry out several groups of experimental tests to evaluate the impact. The experimental results show that the layouts AoS and AoaS achieve better performance than the layout SoA for both the naive version and the tiled version, while the layout SoA is the best choice for the improved CDP version. We also observe that the two combined data layouts (SoAoS and Hybrid) show no notable performance gains over the three basic layouts. Because the tiled version is the fastest of the three versions, we recommend the layout AoaS as the best choice in practical applications. The source code of all implementations is publicly available.
Keywords: CUDA dynamic parallelism; Data layout; GPU; IDW interpolation
Year: 2016 PMID: 26877902 PMCID: PMC4735051 DOI: 10.1186/s40064-016-1731-6
Source DB: PubMed Journal: Springerplus ISSN: 2193-1801
Fig. 1 Data layouts: Array-of-Structures (AoS) and Structure-of-Arrays (SoA). a AoS; b SoA
Fig. 2 Performance comparison of the original (old) and the improved (new) CDP versions. a Execution time of the new and old versions; b speedups of the new version over the old version
Fig. 3 Data layout: Array of aligned Structures (AoaS). a Single precision; b double precision
Fig. 4 Several built-in data types in CUDA. (The data type double4 is aligned into two 16-byte words)
Fig. 5 Data layout: Structure of Arrays of aligned Structures (SoAoS)
Fig. 6 The hybrid data layout by combining AoS and AoV
Fig. 7 Performance of GPU implementations on single precision
Fig. 8 Performance of GPU implementations on double precision
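
The three basic layouts named in the abstract (AoS, SoA, AoaS) can be illustrated with a minimal host-side C++ sketch. This is not the authors' code: the type and function names (`PointAoS`, `PointAoaS`, `PointsSoA`, `idw_aos`, `idw_soa`, `to_soa`), the 2D coordinates, and the default power of 2 are all assumptions made for illustration. The interpolation follows the standard IDW form, in which the predicted value is the weighted average of the known values with weights 1/d^p. The loop is a plain CPU loop standing in for a naive GPU kernel, and the padded 16-byte struct only mimics the alignment of CUDA's built-in `float4`.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// AoS: one struct per data point; the fields of a point are contiguous.
struct PointAoS { float x, y, z; };

// AoaS: the same fields, padded and aligned to 16 bytes
// (hypothetical stand-in for a float4-style aligned structure).
struct alignas(16) PointAoaS { float x, y, z, pad; };

// SoA: one contiguous array per field.
struct PointsSoA { std::vector<float> x, y, z; };

// Convert an AoS point set to SoA (illustrative helper).
PointsSoA to_soa(const std::vector<PointAoS>& pts) {
    PointsSoA s;
    for (const PointAoS& p : pts) {
        s.x.push_back(p.x);
        s.y.push_back(p.y);
        s.z.push_back(p.z);
    }
    return s;
}

// Naive IDW at query (qx, qy) over an AoS point set.
// Assumes pts is non-empty; returns the known value on an exact hit.
float idw_aos(const std::vector<PointAoS>& pts, float qx, float qy,
              float power = 2.0f) {
    float num = 0.0f, den = 0.0f;
    for (const PointAoS& p : pts) {
        float dx = qx - p.x, dy = qy - p.y;
        float d2 = dx * dx + dy * dy;           // squared distance
        if (d2 == 0.0f) return p.z;             // query coincides with a data point
        float w = 1.0f / std::pow(d2, 0.5f * power);  // weight = 1 / d^power
        num += w * p.z;
        den += w;
    }
    return num / den;
}

// The same computation over an SoA point set; only the memory access differs.
float idw_soa(const PointsSoA& pts, float qx, float qy, float power = 2.0f) {
    float num = 0.0f, den = 0.0f;
    for (std::size_t i = 0; i < pts.x.size(); ++i) {
        float dx = qx - pts.x[i], dy = qy - pts.y[i];
        float d2 = dx * dx + dy * dy;
        if (d2 == 0.0f) return pts.z[i];
        float w = 1.0f / std::pow(d2, 0.5f * power);
        num += w * pts.z[i];
        den += w;
    }
    return num / den;
}
```

Both functions compute the same result, so any performance difference between them comes from how each layout places the coordinates in memory, which is the effect the paper measures on the GPU.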