Gang Mei, Liangliang Xu, Nengxiong Xu.
Abstract
This paper focuses on designing and implementing parallel adaptive inverse distance weighting (AIDW) interpolation algorithms on the graphics processing unit (GPU). AIDW is an improved version of the standard IDW that adaptively determines the power parameter according to the spatial distribution pattern of the data points and thereby achieves more accurate predictions than the standard IDW. We first present two versions of the GPU-accelerated AIDW: a naive version that does not use shared memory and a tiled version that does. We implement both versions with two data layouts, structure of arrays (SoA) and array of aligned structures (AoaS), in both single and double precision. We then evaluate the performance of the parallel AIDW by comparing it with the corresponding serial algorithm on three machines equipped with the GPUs GT730M, M5000 and K40c. The experimental results indicate that: (i) there is no significant difference in computational efficiency between the two data layouts; (ii) the tiled version is always slightly faster than the naive version; and (iii) in single precision the speed-up reaches up to 763 (on the GPU M5000), while in double precision the highest speed-up is 197 (on the GPU K40c). To benefit the community, all source code and testing data related to the presented parallel AIDW algorithm are publicly available.
Keywords: geographic information system; graphics processing unit; inverse distance weighting; parallel algorithm; spatial interpolation
Year: 2017 PMID: 28989754 PMCID: PMC5627094 DOI: 10.1098/rsos.170436
Source DB: PubMed Journal: R Soc Open Sci ISSN: 2054-5703 Impact factor: 2.963
Figure 1. (a-g) Demonstration of the finding of k nearest neighbours (k=10).
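As a rough illustration of the neighbour search in Figure 1 and of how an adaptive power parameter could be derived from it, the Python sketch below finds the k nearest data points by brute force and converts their mean distance into a power value. The function names, the expected-distance formula for a random point pattern, the power range and the linear mapping are all assumptions made for illustration; this record does not give the paper's actual search procedure or mapping.

```python
import math

def expected_nn_distance(n_points, area):
    """Expected nearest-neighbour distance for a random pattern (assumed formula)."""
    return 1.0 / (2.0 * math.sqrt(n_points / area))

def mean_knn_distance(query, points, k=10):
    """Brute-force mean distance from `query` to its k nearest data points."""
    qx, qy = query
    dists = sorted(math.hypot(px - qx, py - qy) for px, py in points)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)

def adaptive_power(r_obs, r_exp, p_min=1.0, p_max=5.0):
    """Hypothetical mapping: clamp r_obs / r_exp to [0, 2], then scale linearly.

    Dense neighbourhoods (small ratio) get a small power, sparse ones a large
    power. The range [p_min, p_max] is a placeholder, not a value from the paper.
    """
    ratio = min(max(r_obs / r_exp, 0.0), 2.0) / 2.0
    return p_min + (p_max - p_min) * ratio
```

A brute-force search costs one distance evaluation per data point for every query, which is simple but is also the kind of uniform, independent work that maps well onto one GPU thread per interpolated point.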
Figure 2. Data layouts SoA (a), AoS (b) and AoaS (c).
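The three layouts named in Figure 2 differ only in how the coordinates of each point are arranged in memory. The sketch below mimics them in single precision with Python's `struct` and `array` modules. The field order (x, y, z) and the 16-byte size of the aligned structure are assumptions for illustration, not details taken from the paper.

```python
import struct
from array import array

points = [(0.0, 0.0, 5.0), (1.0, 1.0, 7.0)]  # (x, y, z) samples, made up

# SoA: one contiguous array per coordinate.
soa = {
    "x": array("f", [p[0] for p in points]),
    "y": array("f", [p[1] for p in points]),
    "z": array("f", [p[2] for p in points]),
}

# AoS: x, y, z of each point stored back to back (12 bytes per point).
AOS = struct.Struct("<3f")
# AoaS: the same fields plus 4 padding bytes, so each point fills 16 bytes.
AOAS = struct.Struct("<3f4x")

def pack_points(layout, pts):
    """Serialise points into one byte buffer using the given per-point layout."""
    return b"".join(layout.pack(*p) for p in pts)

aos_buf = pack_points(AOS, points)
aoas_buf = pack_points(AOAS, points)
```

The padding costs memory (32 bytes instead of 24 for two points here) in exchange for every structure starting on a 16-byte boundary.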
Figure 3. A CUDA kernel of the naive version of GPU-accelerated AIDW.
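The kernel listing itself is not reproduced in this record. As a serial stand-in, the Python sketch below shows the work one GPU thread would do for a single interpolated point: a naive pass that reads every data point directly, and a tiled pass that first stages a fixed-size chunk, which stands in for the copy into shared memory. The function names, the tile size and the fixed `power` argument are assumptions; in AIDW the power would be chosen adaptively for each interpolated point.

```python
def idw_naive(qx, qy, data, power):
    """Weighted average over all data points (x, y, z), read one by one."""
    num = den = 0.0
    for x, y, z in data:
        d2 = (x - qx) ** 2 + (y - qy) ** 2
        if d2 == 0.0:
            return z                      # query coincides with a data point
        w = d2 ** (-power / 2.0)          # equals 1 / distance**power
        num += w * z
        den += w
    return num / den

def idw_tiled(qx, qy, data, power, tile=4):
    """Same sum, but data points are consumed tile by tile."""
    num = den = 0.0
    for start in range(0, len(data), tile):
        staged = data[start:start + tile]  # stands in for a shared-memory tile
        for x, y, z in staged:
            d2 = (x - qx) ** 2 + (y - qy) ** 2
            if d2 == 0.0:
                return z
            w = d2 ** (-power / 2.0)
            num += w * z
            den += w
    return num / den
```

Both functions accumulate the same terms in the same order, so they return identical values. On a GPU the tiled form can help because all threads of a block reuse each staged tile instead of each thread fetching every data point from global memory.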
Specifications of the three machines used for the experimental tests.
| specifications | PC no. 1 | PC no. 2 | PC no. 3 |
|---|---|---|---|
| CPU | Intel Core i7-4700MQ | Intel Xeon E5-2650 v3 | Intel Xeon E5-2680 v2 |
| CPU frequency (GHz) | 2.40 | 2.30 | 2.80 |
| CPU RAM (GB) | 4 | 144 | 96 |
| CPU core | 8 | 40 | 40 |
| GPU | GeForce GT 730M | Quadro M5000 | Tesla K40c |
| GPU memory (GB) | 1 | 8 | 12 |
| GPU core | 384 | 2048 | 2880 |
| OS | Windows 7 Professional | Windows 7 Professional | Windows 7 Professional |
| compiler | Visual Studio 2010 | Visual Studio 2010 | Visual Studio 2010 |
| CUDA version | v. 7.0 | v. 7.0 | v. 7.0 |
Execution time (ms) of CPU and GPU implementations of the AIDW method on single precision on the PC equipped with NVIDIA GPU GeForce GT730M.
| version | data layout | 10 K | 50 K | 100 K | 500 K | 1000 K |
|---|---|---|---|---|---|---|
| CPU | - | 6791 | 168 234 | 673 806 | 16 852 984 | 67 471 402 |
| GPU naive | SoA | 65 | 863 | 2884 | 63 599 | 250 574 |
| GPU naive | AoaS | 66 | 875 | 2933 | 64 593 | 254 488 |
| GPU tiled | SoA | 61 | 714 | 2242 | 43 843 | 168 189 |
| GPU tiled | AoaS | 62 | 722 | 2276 | 44 891 | 172 605 |
Figure 4. Speed-ups of the GPU-accelerated AIDW method on the GPU GT730M. (a) On single precision and (b) on double precision.
Execution time (ms) of CPU and GPU implementations of the AIDW method on double precision on the PC equipped with NVIDIA GPU GeForce GT730M.
| version | data layout | 10 K | 50 K | 100 K | 500 K | 1000 K |
|---|---|---|---|---|---|---|
| CPU | - | 6791 | 168 234 | 673 806 | 16 852 984 | 67 471 402 |
| GPU naive | SoA | 924 | 20 761 | 82 400 | 2 047 590 | 8 184 090 |
| GPU naive | AoaS | 929 | 20 821 | 82 524 | 2 050 269 | 8 199 389 |
| GPU tiled | SoA | 915 | 20 521 | 81 306 | 2 017 650 | 8 062 332 |
| GPU tiled | AoaS | 916 | 20 505 | 81 218 | 2 016 367 | 8 057 219 |
Execution time (ms) of CPU and GPU implementations of the AIDW method on single precision on the PC equipped with NVIDIA GPU Quadro M5000.
| version | data layout | 10 K | 50 K | 100 K | 500 K | 1000 K |
|---|---|---|---|---|---|---|
| CPU | - | 5897 | 141 582 | 576 228 | 14 224 126 | 57 207 987 |
| GPU naive | SoA | 24 | 325 | 917 | 20 307 | 93 556 |
| GPU naive | AoaS | 21 | 269 | 867 | 19 685 | 87 992 |
| GPU tiled | SoA | 16 | 258 | 859 | 19 225 | 79 692 |
| GPU tiled | AoaS | 16 | 257 | 833 | 18 640 | 75 646 |
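The speed-up quoted in the abstract can be recomputed from this table as CPU time divided by GPU time. The short sketch below does so for the tiled AoaS row on the Quadro M5000 in single precision; the variable names are illustrative, and the timings are copied from the table.

```python
# Execution times in ms, Quadro M5000, single precision.
cpu_ms = {"10 K": 5897, "50 K": 141582, "100 K": 576228,
          "500 K": 14224126, "1000 K": 57207987}
gpu_tiled_aoas_ms = {"10 K": 16, "50 K": 257, "100 K": 833,
                     "500 K": 18640, "1000 K": 75646}

# Speed-up = serial CPU time / parallel GPU time, per data size.
speedups = {size: cpu_ms[size] / gpu_tiled_aoas_ms[size] for size in cpu_ms}

best_size = max(speedups, key=speedups.get)
for size, s in speedups.items():
    print(f"{size:>6}: {s:7.1f}x")
print("highest speed-up at", best_size)
```

The highest ratio occurs at 500 K and rounds to 763, which matches the single-precision figure reported for the M5000.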
Figure 5. Speed-ups of the GPU-accelerated AIDW method on the GPU M5000. (a) On single precision and (b) on double precision.
Execution time (ms) of CPU and GPU implementations of the AIDW method on double precision on the PC equipped with NVIDIA GPU Quadro M5000.
| version | data layout | 10 K | 50 K | 100 K | 500 K | 1000 K |
|---|---|---|---|---|---|---|
| CPU | - | 5897 | 141 582 | 576 228 | 14 224 126 | 57 207 987 |
| GPU naive | SoA | 305 | 5121 | 17 626 | 406 338 | 1 628 726 |
| GPU naive | AoaS | 306 | 5127 | 17 635 | 404 180 | 1 643 925 |
| GPU tiled | SoA | 263 | 5038 | 17 589 | 402 488 | 1 626 039 |
| GPU tiled | AoaS | 263 | 5040 | 17 587 | 401 551 | 1 616 581 |
Execution time (ms) of CPU and GPU implementations of the AIDW method on single precision on the PC equipped with NVIDIA GPU Tesla K40c.
| version | data layout | 10 K | 50 K | 100 K | 500 K | 1000 K |
|---|---|---|---|---|---|---|
| CPU | - | 5195 | 118 576 | 475 270 | 11 793 605 | 47 368 231 |
| GPU naive | SoA | 39 | 406 | 1527 | 31 166 | 123 601 |
| GPU naive | AoaS | 33 | 356 | 1236 | 27 314 | 107 927 |
| GPU tiled | SoA | 30 | 372 | 1307 | 28 835 | 112 968 |
| GPU tiled | AoaS | 30 | 326 | 1216 | 24 518 | 97 235 |
Figure 6. Speed-ups of the GPU-accelerated AIDW method on the GPU K40c. (a) On single precision and (b) on double precision.
Execution time (ms) of CPU and GPU implementations of the AIDW method on double precision on the PC equipped with NVIDIA GPU Tesla K40c.
| version | data layout | 10 K | 50 K | 100 K | 500 K | 1000 K |
|---|---|---|---|---|---|---|
| CPU | - | 5195 | 118 576 | 475 270 | 11 793 605 | 47 368 231 |
| GPU naive | SoA | 56 | 934 | 3156 | 73 818 | 292 422 |
| GPU naive | AoaS | 56 | 936 | 3166 | 74 268 | 293 565 |
| GPU tiled | SoA | 48 | 789 | 2649 | 61 640 | 242 046 |
| GPU tiled | AoaS | 48 | 786 | 2639 | 61 416 | 240 968 |
Figure 7. Performance comparison of the layouts SoA and AoaS.
Figure 8. Performance comparison of the naive version and tiled version.