Background
Network Distance
- In many real applications accessibility of objects is restricted by a spatial network
- Examples
- Driver looking for nearest gas station
- Mobile user looking for nearest restaurant
- Examples
- Shortest path distance used instead of Euclidean distance
- SP(a,b) = path between a and b with the minimum accumulated length
Challenges
- Euclidean distance is no longer relevant
- R-tree may not be useful, when search is based on shortest path distance
- Graph cannot be flattened to a one-dimensional space
- Special storage and indexing techniques for graphs are required
- Graph properties may vary
- directed vs. undirected
- length, time, etc. as edge weights
Modeling and Storing Spatial Networks
Modeling Spatial Networks
- Adjacency matrix only appropriate for dense graphs
- Spatial networks are sparse: use adjacency lists instead
Storing Large Spatial Networks
- Problem: adjacency lists representation may not fit in memory if graph is large
- Solution:
- partition adjacency lists to disk blocks (based on proximity)
- create B+-tree index on top of partitions (based on node-id)
Shortest Path Search
- Given a graph G(V,E), and two nodes s,t in V, find the shortest path from s to t
- A classic algorithmic problem
- Studied extensively since the 1950’s
- Several methods:
- Dijkstra’s algorithm
- A*-search
- Bi-directional search
Dijkstra’s Shortest Path Search
- idea: incrementally explore the graph around s, visitingnodes in distance order to suntil t is found (like NN)
Algorithm
Example
Illustrating
- Find the shortest path between a and b.
- Worst-case performance O(|E| + |V|log|V| )
A*-search
Description
Dijkstra’s search explores nodes around s without a specific search direction until t is found
Idea: improve Dijkstra’s algorithm by directing search towards t
Due to triangular inequality, Euclidean distance is a lower bound of network distance
Use Euclidean distance to lower bound network distance based on known information:
- Nodes are visited in increasing SPD(s,v)+dist(v,t) order
- SPD(s,v): shortest path distance from s to v (computed by Dijkstra)
- dist(v,t): Euclidean distance between v and t
- Original Dijkstra visits nodes in increasing SPD(s,v) order
- Nodes are visited in increasing SPD(s,v)+dist(v,t) order
Example
Illustrating
- Find the shortest path between s and t.
- f(p) = Dijkstra_dist(s, p) + Euclidean_dist(p, t)
Bi-directional search
- Dijkstra’s search explores nodes around s without a specific search direction until t is found
- Idea: search can be performed concurrently from s and from t (backwards)
- The shortest path tree of s and the (backward) shortest path tree of t are computed in concurrently
- One queue Q_s for forward and one queue Q_t for backward search
- Node visits are prioritized based on min(SPD(s,v), SPD(v,t))
- If v already visited from s and v is in Qt, then candidate shortest path: p(s,v)+p(v,t) (if v already visited from t and v in Q_s symmetric)
- If v is visited by both s and t terminate search; report best candidate shortest path
Example
Discussions
- A* and bi-directional search can be combined to powerful search techniques
- A* can only be applied if lower distance bounds are available
- All versions of Dijkstra’s search require non-negative edge weights
- Bellman-Ford is an algorithm for arbitrary negative edges
Spatial queries over spatial networks
Introduction
Source/Destination on Edges
- We have assumed that points s and t are nodes of the network
- In practice s and t could be arbitrary points on edges
- Mobile user locations
- Solve problem by introducing 2 more nodes
Spatial Queries over Spatial Networks
- Data:
- A (static) spatial network (e.g., city map)
- A (dynamic) set of spatial objects
- Spatial queries based on network distance:
- Selections. Ex: find gas stations within 10km driving distance from here
- Nearest neighbor search. Ex: find k nearest restaurants from present position
- Joins. Ex: find pairs of restaurants and hotels at most 100m from each other
Methodology
- Store (and index) the spatial network
- Graph component (indexes connectivity information)
- Spatial component (indexes coordinates of nodes, edges, etc.)
- Store (and index) the sets of spatial objects
- Ex., one spatial relation for restaurants, one spatial relation for hotels, one relation for mobile users, etc.
- Given a spatial location p, use spatial component of network to find the network edge containing p
- Given a network edge, use network component to traverse neighboring edges
- Given a neighboring edge, use spatial indexes to find objects on them
Evaluation of Spatial Selections (1)
- Query: find all objects in spatial relation R, within network distance ε from location q
- Method:
- Use spatial index of network (R-tree indexing network edges) to find edge n_1n_2, which includes q
- Use adjacency index of network (graph component) and apply Dijkstra’s algorithm to progressively retrieve edges that are within network distance ε from location q
- For all these edges apply a spatial selection on the R-tree that indexes R to find the results
Example
Example: Find restaurants at most distance 10 from q
Step 1: find network edge which contains q
- Step 2: traverse network to find all edges (or parts of them within distance 10 from q)
- Step 3: find restaurants that intersect the subnetwork computed at step 2
Evaluation of Spatial Selections (2)
Description
- Query: find all objects in spatial relation R, within network distance ε from location q
- Alternative method based on Euclidean bounds:
- Assumption: Euclidean distance is a lower-bound of network distance:
- dist(v,u) ≤ SPD(v,u), for any v,u
- Use R-tree on R to find set S of objects such that for each o in S: dist(q,o) ≤ ε
- For each o in S:
- find where o is located in the network (use Network R-tree)
- compute SPD(q,o) (e.g. use A*)
- If SPD(q,o) ≤ ε then output o
- Assumption: Euclidean distance is a lower-bound of network distance:
Example
Example: Find restaurants at most distance 10 from q
Step 1: find restaurants for which the Euclidean distance to q is at most 10: S={r1,r2,r3}
- Step 2: for each restaurant in S, compute SPD to q and verify if it is indeed a correct result
Evaluation of NN search (1)
- Query: find in spatial relation R the nearest object to a given location q
- Method:
- Use spatial index of network (R-tree indexing network edges) to find edge n_1n_2, which includes q
- Use adjacency index of network (graph component) and apply Dijkstra’s algorithm to progressively retrieve edges in order of their distance to q
- For each edge apply a spatial selection on the R-tree that indexes R to find any objects
- Keep track of nearest object found so far; use its shortest path distance to terminate network browsing
Example
- Example: Find nearest restaurant to q
- Step: in ppt 31
Evaluation of NN search (2)
- Query: find in spatial relation R the nearest object to a given location q
- Alternative method based on Euclidean bounds:
- Assumption: Euclidean distance lower-bounds network distance:
- dist(v,u) ≤ SPD(v,u), for any v,u
- Assumption: Euclidean distance lower-bounds network distance:
Spatial Join Queries
Description
- Query: find pairs (r,s), such that r in relation R, s in relation S, and SPD(r,s)≤ε
- Methods:
- For each r in R, do an ε-distance selection queries for objects in S (Index Nested Loops)
- For each pair (r,s), such that Euclidean dist(r,s)≤ε compute SPD(r,s) and verify SPD(r,s)≤ε
Notes on Query Evaluation based on Network Distance
- For each query type, there are methods based on network browsing and methods based on Euclidean bounds
- Network browsing methods are fast if network edges are densely populated with points of interest
- A limited network traversal can find the result fast
- Methods based on Euclidean bounds are good if the searched POIs are sparsely distributed in the network
- Few verifications with exact SP searches are required
- Directed SP search (e.g. using A*) avoids visiting empty parts of the network
Advanced indexing techniques for spatial networks
Shortest Path Materialization and Indexing in Large Graphs
- Dijkstra’s algorithm and related methods could be very expensive on very large graphs
- (Partial) materialization of shortest paths in static graphs can accelerate search
Hierarchical Path Materialization
- Idea: Partition graph G into G_1,G_2,G_3,… based on connectivity and proximity of nodes
- Every edge of G goes to exactly one G_i
- Border nodes belong to more than one G_i’s
- For each G_i compute and materialize SPs between every pair of nodes in G_i (matrix M_i)
- Partitions are small enough for materialization space overhead to be low
- Compute and materialize SPs between every pair of border nodes (matrix B)
- If border nodes too many, hierarchically partition them into 2nd-level partitions
algorithm
Illustrating
- Good partitioning if:
- small partitions
- few combinations examined for SP search
- Real road networks:
- Non-highway nodes in local partitions
- Highway nodes become border nodes
Compressing Materialized Paths
- Distance matrix with successors has O(n_2) space cost
- Motivation: reduce space by grouping targets based on common successors
algorithm
- Create and encode one space partitioning defined by targets of the same successor
- For each node s, index Is a set of <succ,R> pairs:
- succ: a successor of s
- R: a continuous region, such that for each t in R, the successor of s in SP(s,t) is succ
- To compute SP(s,t) for a given s, t:
- SP=s
- Use spatial index Is to find <succ,R>, such that t in R
- SP = SP + (s,succ)
- If succ = t, report SP and terminate
- Otherwise s=succ; Goto step 2
Summary
- Indexing and search of spatial networks is different than spatial indexing
- Shortest path distance is used instead of Euclidean distance, to define range queries, nearest neighbor search, and spatial joins
- Spatial networks could be too large to fit in memory
- Disk-based index for adjacency lists is used
- Several shortest path algorithms
- Spatial queries can be evaluated using Euclidean bounds
- Advanced indexing methods for shortest path search on large graphs