Most relevant point query on road networks.

Zining Zhang¹, Shenghong Yang¹, Yunchuan Qin¹, Zhibang Yang¹, Yang Huang¹, Xu Zhou¹.

Abstract

Graphs are widespread in many real-life applications. A fundamental and popular research problem on graphs is investigating the relationship between two given vertices. The relationship between nodes in a graph can be measured by the shortest distance. Moreover, the number of paths is also a popular metric for assessing the relationship between nodes. In many location-based services, users make decisions on the basis of both metrics. To address this need, we propose a new hybrid metric based on the number of paths within a distance constraint for road networks, which are special graphs. Based on it, a most relevant node query on road networks is defined. To answer this query, we first propose a Shortest-Distance Constrained DFS, which uses the shortest distance to prune unqualified nodes. To further improve query efficiency, we present the Batch Query DFS algorithm, which needs only one DFS search. Our experiments on four real-life road networks demonstrate the performance of the proposed algorithms.

Keywords: Graph; Path enumeration; Relevant vertices; Road networks

Year: 2022 PMID: 35789916 PMCID: PMC9244333 DOI: 10.1007/s00521-022-07485-x

Source DB: PubMed Journal: Neural Comput Appl ISSN: 0941-0643 Impact factor: 5.102

Introduction

Graphs are used in many practical applications, such as social networks [1], information networks, gene networks, protein interaction networks, and road networks. A fundamental problem in graph data management is to analyze the relationship between two given vertices; the relationship is commonly measured by the shortest distance [2, 3] or the k-shortest paths [4, 5]. Based on them, variants such as KNN [6, 7] and k-closest pairs [8, 9] have been proposed. By analyzing the relationship between vertices, we can make decisions, forecasts or classifications. For example, applications in medicine include tracking infectious diseases, predicting drug side effects and calculating the effect of public health interventions. Recently, this kind of analysis has been applied to COVID-19 outbreak prediction. KNN is also used for classification in machine learning. In medical imaging, machine learning helps to judge whether a patient has a disease, to classify findings as benign or malignant, and to predict whether precursors of disease are present.

Recently, the path enumeration problem has been proposed to assess how one vertex affects another, and it is attracting growing attention [10]. For example, in biological networks, the interaction chain between two substances can be obtained by enumerating the paths between them [11]. In e-commerce networks, illegal acts such as money laundering can be detected by enumerating the paths between two entities and checking whether a cycle forms when a new edge is inserted [12]. [13, 14] study the path enumeration problem with a hop constraint, which efficiently yields a limited set of essential paths.

Many real-life applications, especially location-based services, need to compute shortest paths. For example, we can select the nearest restaurant by calculating the shortest path between two locations in the road network. However, in many scenarios determining only the shortest path is not enough. Users may be interested in alternative paths that have certain attributes but are longer than the shortest path. There have been many works on the shortest path with constraints [15]. For example, when we go to a restaurant within a few kilometers, we usually choose the nearest one. However, roads may be closed in real life because of accidents. If the remaining routes to that destination are much longer, we will choose another restaurant. Inspired by this, we present a new metric to estimate the correlation between two points: the number of paths under a distance constraint measures the selectivity of a destination, and a candidate node with more paths is more likely to be chosen.

As shown in Fig. 1, assume that s is the source point and that there are two target points of interest, one of which must be chosen. Two paths lead from s to the first target, with lengths 3.5 km and 4.2 km. Two paths lead to the second target, with lengths 3.2 km and 5.1 km. Among these paths, the 3.2 km path is the shortest, so the second target would be chosen as the final result.

Fig. 1

An illustration of choosing the restaurant in a road network

However, all kinds of situations may occur on the road network. If part of the network is blocked for some reason, the paths through it can no longer be used, and each target may be left with a single usable path. Suppose the second target can then be reached only through its 5.1 km path. If we need to reach the point of interest within a distance constraint of 5 km to keep the cost from becoming too high, that path is not a good choice, since it cannot reach the second target within 5 km. The first target, however, can still be reached within 5 km through its remaining path. Finally, the first target is selected as the result instead of the second. Motivated by this scenario, the number of paths can be used to measure the relevance between points on road networks.

In this paper, we present a new metric to measure the relevance of two nodes on road networks based on the number of paths within a distance constraint d. The more paths whose lengths do not exceed d, the stronger the correlation between the two nodes. In this way, the chosen destination t has the largest number of candidate paths under the constraint d. Based on this new metric, the most relevant point query on road networks is formulated.

Processing the most relevant point query efficiently faces the following challenges. First, the main obstacle is calculating the number of paths that satisfy the distance constraint from the source point to every destination point; path enumeration is hard, and its running time increases exponentially. Second, similar to the work in [14], even a small distance constraint d involves a vast search space, because the number of paths increases exponentially w.r.t. d. To reduce redundant computation, we use DFS to count the paths of all target points in a single search process. In addition, unqualified nodes are pruned as early as possible using a lower bound on the distance of each point, which accelerates the query procedure.

Our principal contributions are summarized as follows. We analyze the limitations of the existing ways of measuring the relevance of two points, propose a new metric, and formulate the most relevant point query on road networks for the first time. We use the shortest distance for pruning and use BC-DFS of [14] as the basic algorithm. We prove in Sect. 4.4 that all results can be found in one search, and we propose an optimized algorithm accordingly. We conduct experiments on four real road networks; the optimized algorithm performs better than the basic algorithm when d is small. Moreover, based on some deficiencies observed in the experiments, we point out directions for future work.

Organization

The rest of this paper is organized as follows. We give related work in Sect. 2, and we introduce relevant concepts and formally define the problem in Sect. 3. In Sect. 4, we propose the basic scheme of the solution and the optimization algorithm. In Sect. 5, we conduct experimental research. This paper is summarized in Sect. 6.

Related work

In Sect. 2, we discuss some existing algorithms related to our problem.

KNN and K-closest pairs

KNN finds the k nearest neighbors of a source point within a target set. It is closely related to daily life: we can use it to find nearby restaurants, nearby people and so on. Most existing algorithms are based on divide and conquer; they search and prune through partitions. [6] proposed G-Tree, an efficient index for KNN search on road networks. Compared with R-Tree, G-Tree adds two features to adapt to KNN search on road networks. The first is a balanced tree structure, which helps prune subtrees. The road network is recursively divided into subnetworks using a multi-level graph partition algorithm [16], and each subnetwork corresponds to a node of the G-Tree. The partitioning keeps the number of boundary points as small as possible while keeping the subgraphs almost equal in size. The second is an efficient way to calculate the minimum distance from the query location to a tree node for best-first search. Leaf nodes store the shortest distances from all their vertices to their boundary points, and non-leaf nodes store the distances between all boundary points of their child nodes. The minimum distance from a point to a tree node is then calculated by dynamic programming. A tree node whose minimum distance is greater than the distance of the current k-th neighbor can be pruned; otherwise it is added to the priority queue for best-first search.

Recently, many different methods have been applied to these problems. For example, [17] extended their algorithm to the GPU, which greatly accelerated the process. In higher-dimensional spaces, the complexity increases dramatically, and machine learning, which is currently very popular, has been used to solve KNN (e.g., [18, 19]).

K-Closest Pairs extends KNN: it finds the k closest pairs between a source set and a target set. [20] proposed a pruning heuristic and two updating strategies for minimizing the pruning distance and used them in the design of three non-incremental branch-and-bound algorithms for K-CPQ between spatial objects stored in two R-trees. [21] studied the problem of processing KCPQs between RAM-based point sets using plane-sweep (PS) algorithms. [9] proposed a tree index based on G-Tree to solve the K-Closest Pairs problem. The goal is to maximize the minimum network distance between subgraphs. Therefore, it uses LEM [22] to iteratively select the two subgraphs with the shortest distance for folding. Another difference is that it stores the minimum network distance between each pair of boundary nodes of any two different leaf nodes of the tree. [23] proposed a branch-and-bound framework with effective lower and upper bound pruning techniques and early stopping conditions for efficiently retrieving relevant top-k closest pairs.

Shortest path enumeration

The “shortest path” is the shortest of all paths between two points. To enumerate all paths within d, we can repeatedly run a k-shortest paths algorithm, increasing the number of paths k until the latest path found exceeds the distance constraint d. There are many classic algorithms (e.g., [4, 22, 24–26]). The most representative work is Yen’s algorithm [27], which finds the next shortest path by repeatedly deviating from the current shortest paths. Many existing algorithms are optimizations of it. For example, [4] abstracts the shortest distance into a point and uses Yen’s algorithm together with a landmark index to obtain a lower bound, which then allows pruning in a best-first search. Chondrogiannis et al. studied an interesting problem in [5]: finding k-shortest paths that are sufficiently dissimilar and as short as possible. To compute kSPwLO (k-shortest paths with limited overlap) queries, they proposed two exact algorithms, one-pass and multi-pass. They also studied two classes of heuristic algorithms: (a) performance-oriented heuristics that trade shortness for performance, and (b) completeness-oriented heuristics that trade dissimilarity for completeness. The performance of these methods is not ideal on road networks with multiple target points.

Simple path enumeration

There are several existing works on enumerating s-t simple paths [13, 14, 28–33]. The focus of [28] is how to construct a succinct representation of simple paths. [30, 31] proposed polynomial delay algorithms for the s-t path enumeration problem, but their practical performance is not ideal. In recent years, You Peng et al. have done many studies. [14] studies hop-constrained s-t simple path enumeration, which lets people focus on a limited number of important paths. They proposed BC-DFS and JOIN to solve this problem. The idea of BC-DFS is “do not fall into the same trap twice by learning from mistakes.” JOIN searches from the source vertex and the target vertex to find a middle vertex cut and then joins the partial paths on that cut. JOIN performs well; its time and space bounds are expressed in terms of the number of hop-constrained s-t paths, the number of edges m, and the hop constraint k. However, JOIN is designed for unweighted graphs, and the middle vertex cut cannot be found in a weighted graph. [32] then proposed PEFP, the first FPGA-based algorithm for the problem. On the host side, a preprocessing algorithm, Pre-BFS, reduces the graph size and the search space. On the FPGA side, PEFP uses a novel DFS-based batching technique to save on-chip memory efficiently. Thanks to hardware acceleration, its performance is better than that of JOIN. Their latest work is [33]. In addition to the existing JOIN and BC-DFS, it proposes the SCB algorithm. The main idea of SCB is that, when a result is found by DFS, the search needs to go back at most k steps to obtain new valid sub-paths while blocking some vertices. Many invalid vertices can be avoided during the process if they violate the diversity constraints.

Preliminary

In this section, we introduce relevant definitions. We mainly study the situation in the road network. Firstly, the road network is transformed into a graph, and then the related concepts and problem definitions are introduced.

Road networks

A road network is modeled as an undirected weighted graph $G = (V, E, W)$, where $V$ is a vertex set and $E \subseteq V \times V$ is an edge set. A vertex $v \in V$ is either a road intersection or an end of a road, and an edge $e = (u, v) \in E$ represents a road segment that enables travel between vertices $u$ and $v$. $W$ assigns a real-valued weight w(e) to an edge e that represents the corresponding road segment’s length.

Distance-constrained s-t path

A path p from a vertex u to a vertex v is a sequence of vertices $(v_0, v_1, \dots, v_k)$ with $v_0 = u$ and $v_k = v$ such that $(v_{i-1}, v_i) \in E$ for every $1 \le i \le k$. In this paper, we denote a path from u to v by p(u, v). A simple path is a loop-free path in which no vertex or edge is repeated. By len(p), we denote the length of the path p (i.e., the sum of the weights of the edges of p). If $len(p) \le d$, where d is the pre-defined distance constraint, we say that p is a distance-constrained path. For simplicity, we use path to denote a distance-constrained simple path.
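The definitions above can be made concrete with a small sketch. It assumes, purely for illustration, that the road network is stored as a nested dictionary with `graph[u][v]` holding w(e) for the edge between u and v; the helper names `path_length`, `is_simple` and `is_distance_constrained` are hypothetical and do not come from the paper.

```python
from typing import Dict, List

# Assumed representation (not from the paper): graph[u][v] is the weight
# of the undirected edge (u, v), stored in both directions.
Graph = Dict[str, Dict[str, float]]


def path_length(graph: Graph, path: List[str]) -> float:
    """len(p): sum of the edge weights between consecutive vertices of p.

    Raises KeyError if two consecutive vertices are not joined by an edge.
    """
    return sum(graph[u][v] for u, v in zip(path, path[1:]))


def is_simple(path: List[str]) -> bool:
    """A path is simple when no vertex appears more than once."""
    return len(set(path)) == len(path)


def is_distance_constrained(graph: Graph, path: List[str], d: float) -> bool:
    """True when p is a simple path whose length does not exceed d."""
    return is_simple(path) and path_length(graph, path) <= d
```

With these helpers, num(u, v) is simply the number of vertex sequences from u to v for which `is_distance_constrained` holds.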

Problem definition

Given two vertices u and v, let num(u, v) denote the number of paths from u to v. The more paths between u and v, the more relevant they are. Given a graph G, a distance d, the source vertex s and the target set T, a most relevant point query returns a point t where t satisfies the following conditions.
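Under the relevance notion just stated (more paths within the constraint means higher relevance), one plausible way to write the query is as an arg-max over the target set. This is a sketch under that assumption; the symbol $t^{*}$ and the handling of ties are not taken from the paper.

```latex
\[
  num(s, t) \;=\; \bigl|\{\, p(s, t) \;:\; p \text{ is simple and } len(p) \le d \,\}\bigr|,
  \qquad
  t^{*} \in \arg\max_{t \in T} \, num(s, t).
\]
```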

Search algorithm

Basic idea

To solve the problem in this paper, we introduce two solutions.

Shortest-distance constrained DFS

A simple solution is to run a DFS from the source point for each target point in turn. The search along a branch stops when the target point is reached or the path length exceeds d. Once all qualified paths of the current target point have been found, we go on to count the paths of the next target point. Note that the search never revisits a vertex in the visited set, which avoids loops. After all branches of the current vertex have been explored, the vertex is removed from the visited set.

To reduce the amount of computation, we use a simple pruning rule. We calculate the shortest distance sd[u] from the target point to every vertex u with Dijkstra's algorithm. Since our setting has a distance constraint d, if the shortest distance is greater than d, we set sd[u] to infinity, which means that the target is unreachable from u within d. During the search, when the length of the current path plus sd[u] is greater than d, the branch can be pruned directly, because sd[u] is the minimum length required from u to the endpoint.

Figure 2 shows an example of the basic method, with a graph G, a source vertex, a target set of two vertices, and a distance constraint of 5 km. The number on each edge is the actual distance between its two endpoints. First, we compute the shortest distances from the target vertices. For the first target, one vertex has a shortest distance of 3.3 km; when the distance already used plus 3.3 km exceeds 5 km at that vertex, we do not need to continue exploring from it. The same holds for other such vertices. We find two paths within 5 km, with lengths 4.4 km and 4.3 km. For the second target, two vertices have shortest distances of 5.6 km and 5.8 km, so they do not need to be explored. Similarly, we find three paths within 5 km for the second target, with lengths 4.8 km, 5 km and 5 km. Although the shortest path leads to the first target, the second target has more paths within the constraint (three versus two), so we choose it as the result.

Fig. 2

An example by using SC-DFS

The pseudocode of SC-DFS is shown in Algorithm 1. We compute the shortest distances by applying Dijkstra (line 3) and then calculate num(s, t) for each pair with CalculateNum (line 4). Then we choose the point t whose path number is maximal (lines 5-7). Finally, t is returned as the most relevant point. As shown in Algorithm 2, CalculateNum() computes the number of paths between two points. Initially, visited[u] is set to false for every vertex u, and the current path distance dis is initialized to 0. If the vertex currently visited is the target node, we increase the number of paths by one (lines 2-4). visited[u] is used to check whether u has already been accessed (line 8). Lines 9-10 check whether the current path satisfies the distance constraint.
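The procedure described above can be sketched as follows. This is a minimal illustration rather than the authors' Algorithms 1 and 2: the nested-dictionary graph layout and the function names `shortest_distances`, `calculate_num` and `sc_dfs` are assumptions, vertex labels are assumed to be strings, and ties between targets are broken by order of appearance.

```python
import heapq
from typing import Dict, List, Tuple

# Assumed representation: graph[u][v] is the weight of the undirected edge (u, v).
Graph = Dict[str, Dict[str, float]]
INF = float("inf")


def shortest_distances(graph: Graph, target: str, d: float) -> Dict[str, float]:
    """Dijkstra from the target. Vertices farther than d are omitted,
    which plays the role of setting sd[u] to infinity."""
    sd = {target: 0.0}
    heap = [(0.0, target)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > sd.get(u, INF):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = du + w
            if nd <= d and nd < sd.get(v, INF):
                sd[v] = nd
                heapq.heappush(heap, (nd, v))
    return sd


def calculate_num(graph: Graph, s: str, t: str, d: float,
                  sd: Dict[str, float]) -> int:
    """Count simple s-t paths of length at most d with a DFS."""
    visited = {s}

    def dfs(u: str, dis: float) -> int:
        if u == t:
            return 1  # one qualified path found; stop this branch
        count = 0
        for v, w in graph[u].items():
            if v in visited:
                continue  # keep the path simple
            if dis + w + sd.get(v, INF) > d:
                continue  # even the shortest continuation exceeds d
            visited.add(v)
            count += dfs(v, dis + w)
            visited.discard(v)  # clear v once its branches are done
        return count

    if sd.get(s, INF) > d:
        return 0
    return dfs(s, 0.0)


def sc_dfs(graph: Graph, s: str, targets: List[str], d: float) -> Tuple[str, int]:
    """Run one pruned DFS per target and return (best target, its path count)."""
    best, best_num = None, -1
    for t in targets:
        sd = shortest_distances(graph, t, d)
        n = calculate_num(graph, s, t, d, sd)
        if n > best_num:
            best, best_num = t, n
    return best, best_num
```

Note that each target triggers its own Dijkstra run and its own DFS from s, which is the repeated work that the batch approach in Sect. 4.4 aims to avoid.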

Barrier-based constrained DFS

The second baseline algorithm draws on the idea of [14], namely BC-DFS. The idea of BC-DFS is “do not fall into the same trap twice by learning from mistakes.” BC-DFS may explore wrong branches, but it learns from the mistakes. For each outgoing neighbor v of u to be visited, BC-DFS maintains a barrier for v. If the length of the current search stack plus the barrier of v exceeds the hop constraint, the search does not continue through v. Fig. 3a gives an example with a hop constraint and a search stack S. We explore the outgoing neighbors of u but cannot reach t within the hop constraint, so we set the barrier of u to 2, which means that at least 2 hops are needed from u to t. Later, when we search another path, we do not explore u again, because the length of stack S plus the barrier of u is 4, which exceeds the hop constraint.

Fig. 3

An example by using BC-DFS

BC-DFS is a polynomial delay algorithm with O(km) time per output, where k is the hop constraint and m is the number of edges. It can be extended to road networks. However, the performance of BC-DFS is not good in some cases. First, as shown in Fig. 3b, the barrier of u does not affect the vertices on the left: we set the barrier of u to 2 after a failed search, but when we explore on the left, the search never reaches u again, so some barriers bring no benefit to other parts of the graph. Second, for our problem, it still needs a one-to-one search, and a different barrier has to be set for each search.

Batch query DFS

As discussed in Sect. 4.2, SC-DFS uses the shortest distance to prune unqualified nodes. BC-DFS is not well suited to our problem, because its inspection cost is too high and it is not easy to set barriers for multiple target points. Inspired by Dijkstra's algorithm, we can share the paths already searched. A subpath of a shortest path is also a shortest path, which is why Dijkstra's algorithm can find all shortest paths from one point in a single search. We likewise prove that the number of paths for all target nodes can be obtained in only one search. Accordingly, we do not stop the search procedure when we reach a target.

Theorem 1

The number of paths for all target points can be found in BQ-DFS.

Proof

As shown in Fig. 4, suppose that there are two target points. A path from the source point s to the second target either passes through the first target or does not. In the first case, when BQ-DFS reaches the first target, it records the path in the path number of the first target; the path from s to the first target has then been explored. The search does not stop but continues downward. If the path from s to the first target has a certain length, we only need to explore onward from the first target within the remaining distance, i.e., the constraint distance d minus that length. The path from s to the first target is thus shared. In SC-DFS, by contrast, the search stops on reaching the first target and restarts from s. BQ-DFS avoids a large number of repeated searches, because the paths leading to the first target are shared. In the second case, the first target does not affect these paths. Similarly, when there are more target points, the paths of the other target points can also be found by sharing the explored paths in one search.

Fig. 4

Proof outline for BQ-DFS

The pruning strategy of SC-DFS is too expensive for multiple target vertices, so we do not use the shortest distance as the lower bound. In BQ-DFS, we find all paths in one search. If we used the shortest distance to prune, we would have to choose the minimum distance over the targets as the lower bound to ensure that no part of the search space containing results is pruned. Hence the points pruned in SC-DFS could not be pruned here.

Example

In BQ-DFS, since Dijkstra costs too much and has poor pruning efficiency here, we bound the search only by the constraint d. Figure 5 shows an example. When we reach the first target vertex, we record one more path from s to it and then continue to explore. If the next vertex is also a target and the current length is still within d, we record a path from s to that vertex as well. When we reach a vertex v at which the current distance exceeds d, we stop searching and backtrack.

Fig. 5

An example by using BQ-DFS

Algorithm 3 shows the pseudocode of BQ-DFS. We use only d to check whether to continue the search (lines 8-9). When the search reaches a target point, we increase its path count and do not stop exploring (lines 2-3). Finally, we select the point with the largest number of paths as the result of the most relevant point query.
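The single-search idea can be sketched as follows. This is a minimal illustration under stated assumptions, not a transcription of Algorithm 3: the nested-dictionary graph layout and the name `bq_dfs` are hypothetical, the target list is assumed to be non-empty, and ties are broken by order of appearance in the list.

```python
from typing import Dict, List, Tuple

# Assumed representation: graph[u][v] is the weight of the undirected edge (u, v).
Graph = Dict[str, Dict[str, float]]


def bq_dfs(graph: Graph, s: str, targets: List[str],
           d: float) -> Tuple[str, Dict[str, int]]:
    """Count, in one DFS from s, the simple paths of length <= d to every target.

    Returns the target with the most such paths and the full count table.
    """
    target_set = set(targets)
    num = {t: 0 for t in targets}
    visited = {s}

    def dfs(u: str, dis: float) -> None:
        if u in target_set:
            num[u] += 1  # record the path, but keep exploring past the target
        for v, w in graph[u].items():
            if v in visited:
                continue  # keep the path simple
            if dis + w > d:
                continue  # only the constraint d bounds the search
            visited.add(v)
            dfs(v, dis + w)
            visited.discard(v)

    dfs(s, 0.0)
    best = max(targets, key=lambda t: num[t])
    return best, num
```

Because the search continues past each target, a prefix such as s to the first target is explored once and reused for every target that lies beyond it, at the price of giving up the per-target shortest-distance pruning.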

Experiments

In this section, we evaluate the efficiency of proposed algorithms by comprehensive experiments.

Experimental setting

Datasets

We evaluate our algorithms on three real-world road networks, namely Amsterdam, Berlin and Oslo [34], each with hundreds of thousands of vertices. In addition, we use the real traffic network of Beijing, which has more than one hundred thousand vertices and edges. To explore the influence of edge weight, we also calculate the average weight of each road network. The statistics of these datasets are given in Table 1.

Table 1

Datasets

Data	Number of vertices	Number of edges	Average weight
Amsterdam	106,600	130,091	28
Berlin	428,769	504,229	31
Oslo	305,175	330,633	16
Beijing	165,990	225,998	343

Query sets

To evaluate the search performance, we randomly choose 100 vertices as the query location, and for each query location, we generate 50 groups of target objects. We set d based on the average weight and report the average query response time of the algorithms to evaluate their time efficiency.

Algorithms

We evaluate the performance of three algorithms as follows:

SC-DFS: The distance-constrained DFS pruned by shortest distance presented in Sect. 4.2.

BC-DFS: The barrier-based constrained DFS introduced in Sect. 4.3.

BQ-DFS: The Batch Query DFS presented in Sect. 4.4.

Implementation

All algorithms were implemented in C++ and conducted on an Intel(R) Core(TM) CPU i7-7700HQ@2.80GHz with 16GB RAM.

Evaluation of algorithms

First, we compare the average running time of the three algorithms (SC-DFS, BC-DFS, BQ-DFS) on the four road networks. To obtain comparable running times, the distance constraint d is set to 1500 m on the Beijing road network and to 400 m on the others. As shown in Fig. 6, the average running time of all three algorithms increases as the graph size grows. Overall, BQ-DFS performs best, followed by BC-DFS, and SC-DFS performs worst. BQ-DFS can be regarded as an upgraded version of SC-DFS, because it improves query performance by sharing the search cost. Although BC-DFS can terminate the search early, its barrier is set for one target point, and for multiple target points it becomes difficult to set the barrier. Therefore, it can only search the targets one by one, and its performance is worse than that of BQ-DFS. SC-DFS has the worst performance on all four road networks, with an average running time dozens of times that of the other algorithms.

Fig. 6

Average runtime comparison

Effect of distance-constraint d

In the experiments of [14], the average running time and the number of paths were measured while varying the hop constraint k, and both were found to grow exponentially with k. We similarly evaluate the running time of the algorithms on different road networks by varying the distance constraint d. Unlike [14], which uses unweighted graphs, we use weighted graphs. We set the distance constraint d with reference to their experimental results and to the average weight of each road network in Table 1: d is set to 400 m, 500 m, 600 m, 700 m, 800 m and 900 m for Amsterdam, Berlin and Oslo, and to 1500 m, 2000 m, 2500 m and 3000 m for Beijing. As shown in Fig. 7, for the smaller values of d on each road network, we observe results similar to those in Fig. 6. For the larger values of d, the performance of BQ-DFS and BC-DFS becomes similar. As Sect. 5.4 shows, when d increases, the number of paths explodes, which makes the search very costly. Since the average weight is small relative to d, an edge may be visited many times, and the number of visits increases exponentially. In addition, we find that the low growth rate of SC-DFS is due to its use of the shortest distance for pruning: computing the shortest distance for each point takes considerable time up front, and the subsequent pruning slows the growth, although the running time still increases as d grows.

Fig. 7

Effect of distance-constraint d

Number of d-constrained paths

In this subsection, we report the average number of paths on all datasets for different values of the distance constraint in Fig. 8. As expected, the number of paths grows exponentially with d. The number of paths increases rapidly because the edge weights are small relative to d; thus, more edges are visited and the same edges are visited repeatedly. However, paths with heavy repetition are of little significance. We leave this issue to future work.

Fig. 8

Number of d-constrained paths

Conclusion

In this paper, we first define a new constrained relationship model for evaluating the relationship between two points in a graph. Second, we propose a basic algorithm named SC-DFS, which uses the shortest distance to prune. To improve efficiency, we propose a better algorithm called BQ-DFS, which reduces repeated searches because the explored paths can be shared. On the practical side, an extensive empirical study on four real-life graphs shows that BQ-DFS significantly outperforms BC-DFS when the distance constraint is small. In the future, we will boost query performance with an index. It would also be interesting to investigate parallel algorithms for the most relevant point query.

2 in total

1. A query language for biological networks.

Authors: Ulf Leser
Journal: Bioinformatics Date: 2005-09-01 Impact factor: 6.937

2. Efficient kNN Classification With Different Numbers of Nearest Neighbors.

Authors: Shichao Zhang; Xuelong Li; Ming Zong; Xiaofeng Zhu; Ruili Wang
Journal: IEEE Trans Neural Netw Learn Syst Date: 2017-04-12 Impact factor: 10.451
