
SparkGIS: Resource Aware Efficient In-Memory Spatial Query Processing.

Furqan Baig, Hoang Vo, Tahsin Kurc, Joel Saltz, Fusheng Wang.

Abstract

Much effort has been devoted to supporting high-performance spatial queries on large volumes of spatial data in distributed spatial computing systems, especially in the MapReduce paradigm. Recent work has focused on extending spatial MapReduce frameworks to leverage the high-performance in-memory distributed processing capabilities of systems such as Spark. However, the performance advantage comes with the requirement of sufficient memory and comprehensive configuration. When these requirements are not met, processing falls back to disk I/O, defeating the purpose of such systems, or in the worst case the job runs out of memory and fails. The problem is aggravated for spatial processing, since the underlying in-memory systems are oblivious to spatial data features and characteristics. In this paper we present SparkGIS, an in-memory oriented spatial data querying system for high-throughput, low-latency spatial query handling that adapts Apache Spark's distributed processing capabilities. It supports basic spatial queries, including containment, spatial join, and k-nearest neighbor, and allows these to be extended into complex query pipelines. SparkGIS mitigates skew in distributed processing by supporting several dynamic partitioning algorithms suitable for a rich set of contemporary application scenarios. Multilevel in-memory indexes, global and local, pre-generated and on-demand, allow SparkGIS to prune input data and apply compute-intensive operations only to the subset of relevant spatial objects. Finally, SparkGIS employs dynamic query rewriting to gracefully manage large spatial query workflows that exceed available distributed resources. Our comparative evaluation shows that SparkGIS performs on par with contemporary Spark-based platforms for relatively small queries and outperforms them on larger data and memory-intensive workflows through dynamic query rewriting and efficient spatial data management.
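
The filter-and-refine idea behind the partitioning and index pruning described in the abstract can be illustrated with a small sketch. This is not SparkGIS code and does not use Spark: it is plain Python, it assumes objects are reduced to axis-aligned bounding boxes, and it uses a fixed uniform grid as a stand-in for the dynamic partitioning algorithms the abstract mentions. All function names and the `cell_size` parameter are hypothetical.

```python
from collections import defaultdict
from itertools import product

# A box is a tuple (xmin, ymin, xmax, ymax). All names are illustrative only.

def intersects(a, b):
    """Return True when two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def cells_for(box, cell_size):
    """Return every grid cell (ix, iy) that the box overlaps."""
    x0, y0 = int(box[0] // cell_size), int(box[1] // cell_size)
    x1, y1 = int(box[2] // cell_size), int(box[3] // cell_size)
    return product(range(x0, x1 + 1), range(y0, y1 + 1))

def partition(boxes, cell_size):
    """Coarse 'global' step: map each grid cell to the ids of boxes in it."""
    grid = defaultdict(list)
    for idx, box in enumerate(boxes):
        for cell in cells_for(box, cell_size):
            grid[cell].append(idx)
    return grid

def spatial_join(left, right, cell_size=10.0):
    """Return sorted (left_id, right_id) pairs whose boxes intersect.

    Only boxes that share a grid cell are compared (filter); the exact
    intersection test then runs on those candidates alone (refine).
    """
    left_grid = partition(left, cell_size)
    right_grid = partition(right, cell_size)
    results = set()  # a set drops duplicates from boxes spanning several cells
    for cell, left_ids in left_grid.items():
        for i in left_ids:
            for j in right_grid.get(cell, ()):
                if intersects(left[i], right[j]):
                    results.add((i, j))
    return sorted(results)
```

Calling `spatial_join(left, right)` on two lists of boxes returns the index pairs that intersect. In a distributed setting each grid cell would roughly correspond to one partition processed independently, which is why skewed data (many objects in few cells) motivates smarter partitioning than a uniform grid.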


Keywords:  Computing methodologies → MapReduce algorithms; In-Memory processing; Information systems → MapReduce-based systems; MapReduce; Spark; Spatial processing; Spatial-temporal systems; Theory of computation → MapReduce algorithms

Year:  2017        PMID: 30035278      PMCID: PMC6054321     

Source DB:  PubMed          Journal:  Proc ACM SIGSPATIAL Int Conf Adv Inf


  1 in total

1.  Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce.

Authors:  Ablimit Aji; Fusheng Wang; Hoang Vo; Rubao Lee; Qiaoling Liu; Xiaodong Zhang; Joel Saltz
Journal:  Proceedings VLDB Endowment       Date:  2013-08
  4 in total

1.  A PID-Based kNN Query Processing Algorithm for Spatial Data.

Authors:  Baiyou Qiao; Ling Ma; Linlin Chen; Bing Hu
Journal:  Sensors (Basel)       Date:  2022-10-09       Impact factor: 3.847

2.  Efficient 3D Spatial Queries for Complex Objects.

Authors:  Dejun Teng; Yanhui Liang; Hoang Vo; Jun Kong; Fusheng Wang
Journal:  ACM Trans Spat Algorithms Syst       Date:  2022-02-12

3.  SPEAR: Dynamic Spatio-Temporal Query Processing over High Velocity Data Streams.

Authors:  Furqan Baig; Dejun Teng; Jun Kong; Fusheng Wang
Journal:  Proc Int Conf Data Eng       Date:  2021-06-22

4.  3DPro: Querying Complex Three-Dimensional Data with Progressive Compression and Refinement.

Authors:  Dejun Teng; Yanhui Liang; Furqan Baig; Jun Kong; Vo Hoang; Fusheng Wang
Journal:  Adv Database Technol       Date:  2022 Mar-Apr
