Omar Batarfi, Radwa Elshawi, Ayman Fayoumi, Ahmed Barnawi, Sherif Sakr.
Abstract
The graph is a popular data model, widely used for modeling structural relationships between objects. In many real-world graphs, the vertices and edges need to be associated with descriptive attributes. Such graphs are referred to as attributed graphs. G-SPARQL has been proposed as an expressive language, with a centralized execution engine, for querying attributed graphs. G-SPARQL supports various types of graph querying operations, including reachability, pattern matching and shortest path, where any G-SPARQL query may include value-based predicates on the descriptive information (attributes) of the graph edges and vertices in addition to the structural predicates. In general, a main limitation of centralized systems is that their vertical scalability is always restricted by the physical limits of computer systems. This article describes the design, implementation and performance evaluation of DG-SPARQL, a distributed, hybrid and adaptive parallel execution engine for G-SPARQL queries. In this engine, the topology of the graph is distributed over the main memory of the underlying nodes, while the graph data are maintained in a relational store that is replicated on the disk of each of the underlying nodes. DG-SPARQL evaluates parts of the query plan via SQL queries, which are pushed to the underlying relational stores, while other parts of the query plan, as necessary, are evaluated via indexless memory-based graph traversal algorithms. Our experimental evaluation shows the efficiency and scalability of DG-SPARQL in querying massive attributed graph datasets, as well as its ability to outperform Apache Giraph, a popular distributed graph processing system, by orders of magnitude.
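The hybrid evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the DG-SPARQL implementation: the table `node_attr`, the `adjacency` map, the sample data and all function names are hypothetical, a single in-memory SQLite database stands in for the replicated relational store, and a single-process dictionary stands in for the distributed in-memory topology. The value-based predicate is pushed down as SQL, and the structural predicate (reachability) is evaluated by an indexless breadth-first traversal.

```python
import sqlite3
from collections import deque

# Hypothetical toy data: attributes live in a relational store,
# topology lives in an in-memory adjacency map.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node_attr (node_id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO node_attr VALUES (?, ?, ?)",
    [(1, "Alice", 42), (2, "Bob", 35), (3, "Carol", 51), (4, "Dave", 28)],
)

# Directed edges, each labelled with a relationship name.
adjacency = {
    1: [("knows", 2)],
    2: [("knows", 3)],
    3: [],
    4: [("knows", 1)],
}


def select_nodes_sql(min_age):
    """Value-based predicate, pushed to the relational store as SQL."""
    rows = conn.execute(
        "SELECT node_id FROM node_attr WHERE age >= ? ORDER BY node_id",
        (min_age,),
    )
    return [row[0] for row in rows]


def reachable(source, target, relation=None):
    """Structural predicate, evaluated by breadth-first traversal in memory."""
    seen, queue = {source}, deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            return True
        for label, neighbour in adjacency.get(current, []):
            if relation is not None and label != relation:
                continue
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


def older_nodes_reachable_from(source, min_age):
    """Combine both parts: SQL filter first, then in-memory reachability."""
    return [
        node
        for node in select_nodes_sql(min_age)
        if node != source and reachable(source, node, "knows")
    ]
```

For example, `older_nodes_reachable_from(1, 40)` first asks the relational store for nodes with `age >= 40` and then keeps only those that the traversal reaches from node 1 over `knows` edges.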
Year: 2016 PMID: 27350905 PMCID: PMC4899405 DOI: 10.1186/s40064-016-2251-0
Source DB: PubMed Journal: Springerplus ISSN: 2193-1801
Fig. 1 A sample attributed graph for bibliographic network
Fig. 2 The grammar of G-SPARQL language (Sakr et al. 2012)
Fig. 3 DSM relational encoding of attributed graph of Fig. 1
Fig. 4 The four paradigms for building data storage and querying systems (Hammoud et al. 2015)
Fig. 5 The architecture of DG-SPARQL query execution engine
G-SPARQL algebraic operators (Sakr et al. 2012)
| Operator | Description |
|---|---|
| NgetAttVal | Returns the values of an attribute for a set of nodes |
| EgetAttVal | Returns the values of an attribute for a set of edges |
| getEdgeNodes | Returns adjacent nodes, optionally through a specific relation, for a set of graph nodes |
| strucPred | Returns a set of vertices that are adjacent to other vertices with a specific relationship and optionally returns the connecting edges |
| edgeJoin | Returns pairs of vertices that are connected with an edge, optionally of a specified relationship, and optionally returns the connecting edges |
| pathJoin | Returns pairs of vertices which are connected by a sequence of edges of any length, optionally with a specified relationship, and optionally returns connecting paths |
| sPathJoin | Returns pairs of vertices which are connected by a sequence of edges of any length, optionally with a specified relationship, and returns the |
| filterPath | Returns paths that satisfy a condition |
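To make the operator descriptions above more concrete, here is a small Python sketch of several of them over a toy in-memory attributed graph. It is an illustration only, not the engine's implementation: the sample graph, the snake_case function names, the choice to follow outgoing edges only, and the use of breadth-first search to produce one connecting path per pair are all assumptions. `sPathJoin` is left out because its description is cut short in the table.

```python
from collections import deque

# Hypothetical attributed graph: node attributes plus labelled directed edges.
node_attrs = {
    "a1": {"name": "Alice"},
    "a2": {"name": "Bob"},
    "p1": {"title": "Graph Queries"},
}
edges = [
    # (edge id, source, relationship, target, edge attributes)
    ("e1", "a1", "authorOf", "p1", {"order": 1}),
    ("e2", "a2", "authorOf", "p1", {"order": 2}),
    ("e3", "a1", "knows", "a2", {}),
]


def n_get_att_val(nodes, attribute):
    """NgetAttVal: values of an attribute for a set of nodes."""
    return {n: node_attrs[n][attribute]
            for n in nodes if attribute in node_attrs.get(n, {})}


def e_get_att_val(edge_ids, attribute):
    """EgetAttVal: values of an attribute for a set of edges."""
    wanted = set(edge_ids)
    return {eid: attrs[attribute]
            for eid, _, _, _, attrs in edges
            if eid in wanted and attribute in attrs}


def get_edge_nodes(nodes, relation=None):
    """getEdgeNodes: (node, adjacent node) pairs, optionally for one relation."""
    wanted = set(nodes)
    return {(s, t) for _, s, rel, t, _ in edges
            if s in wanted and (relation is None or rel == relation)}


def edge_join(left, right, relation=None):
    """edgeJoin: pairs (s, t), s in left and t in right, joined by one edge."""
    targets = set(right)
    return {(s, t) for s, t in get_edge_nodes(left, relation) if t in targets}


def _successors(node, relation):
    return [t for _, s, rel, t, _ in edges
            if s == node and (relation is None or rel == relation)]


def path_join(left, right, relation=None):
    """pathJoin: pairs joined by one or more edges, with one connecting path."""
    result = {}
    for start in left:
        parent = {}  # node -> predecessor on the path found by BFS
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in _successors(current, relation):
                if nxt not in parent:
                    parent[nxt] = current
                    queue.append(nxt)
        for target in right:
            if target in parent:
                path, node = [target], parent[target]
                while node != start:
                    path.append(node)
                    node = parent[node]
                path.append(start)
                result[(start, target)] = path[::-1]
    return result


def filter_path(paths, condition):
    """filterPath: keep only the paths that satisfy a condition."""
    return {pair: p for pair, p in paths.items() if condition(p)}
```

In this sketch, `edge_join(["a1", "a2"], ["p1"], "authorOf")` pairs each author with the paper, and `filter_path(path_join(["a1"], ["p1"]), lambda p: len(p) - 1 <= 2)` keeps only connecting paths of at most two edges.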
Fig. 6 An example DAG plan for G-SPARQL
Fig. 7 Average query execution times of DG-SPARQL vs. Giraph on LUMB datasets: (a) query type QT1, (b) query type QT2, (c) query type QT3, (d) query type QT4
Fig. 8 Speed-up improvement of query execution time in response to increasing the number of slave nodes (partitions)