Selectivity Estimation of XPath for Cyclic Graphs
Yun Peng
Outline
Motivation
Problem definition
Prime number labeling
Selectivity estimation
Implementation
Motivation
Selectivity estimation is one of the most important query optimization techniques for retrieving subgraphs from large graph databases efficiently.
An Example
The query q = //faculty[//RA][//TA] lists all faculty nodes that have both an RA and a TA. There are two plans for evaluating it.
Plan 1: find the faculty nodes that have an RA (result set size 3), then keep those in the intermediate result that also have a TA.
Plan 2: find the faculty nodes that have a TA (result set size 2), then keep those in the intermediate result that also have an RA.
Plan 2 starts from the smaller intermediate result, so knowing the result sizes in advance lets the optimizer prefer it.
[Figure: example tree with root "department", four "faculty" children, and "name", "RA", and "TA" leaf nodes]
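The plan choice above can be sketched in a few lines. This is only an illustration: the function name `order_predicates` and the dictionary of estimates are hypothetical, and the sizes 3 and 2 are the result set sizes quoted on the slide.

```python
def order_predicates(estimated_sizes):
    """Return the predicates sorted so the most selective one is evaluated first."""
    return sorted(estimated_sizes, key=estimated_sizes.get)


# Result set sizes taken from the example: 3 faculty nodes have an RA, 2 have a TA.
estimated_sizes = {"//RA": 3, "//TA": 2}
plan = order_predicates(estimated_sizes)
print(plan)  # ['//TA', '//RA']
```

Evaluating the predicate with the smallest estimated result first keeps the intermediate result small, which is the reason an optimizer needs selectivity estimates before running the query.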
Problem Definition
Selectivity estimation: given a query, estimate how many results it produces without performing a costly evaluation.
[Figure: the department/faculty example tree]
q=//faculty[//RA]
Selectivity(q) = 3
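For reference, the exact selectivity that an estimator tries to approximate can be computed by brute force on a small document. The XML below is a hypothetical tree chosen only to match the sizes quoted in the example (3 faculty nodes with an RA, 2 with a TA); it is not the tree from the figure, and the helper name `selectivity` is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: 4 faculty nodes, 3 with an RA and 2 with a TA.
DOC = """
<department>
  <faculty><name/><RA/></faculty>
  <faculty><name/><TA/><RA/></faculty>
  <faculty><TA/><RA/></faculty>
  <faculty><name/></faculty>
</department>
"""


def selectivity(root, tag, required):
    """Count `tag` elements that have a descendant for every tag in `required`."""
    return sum(
        1
        for node in root.iter(tag)
        if all(node.find(f".//{name}") is not None for name in required)
    )


root = ET.fromstring(DOC)
print(selectivity(root, "faculty", ["RA"]))        # 3
print(selectivity(root, "faculty", ["TA"]))        # 2
print(selectivity(root, "faculty", ["RA", "TA"]))  # 2
```

This exact count requires visiting the data, which is the costly evaluation that selectivity estimation is meant to avoid.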
Our methodology skeleton
Step 1: label the graph nodes (precomputed)
Step 2: estimate query selectivity from the precomputed labels (when a query arrives)
Prime number labeling
Label each graph node with an integer that is a product of prime numbers
Prime number labeling (cont.)
Divisibility of labels implies an ancestor-descendant relationship
For example, 3*5*7*11 is divisible by 11, so node g is a descendant of node a
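The divisibility test can be illustrated with a minimal sketch. The slides do not spell out how primes are assigned, so the scheme here is an assumption: every node owns one distinct prime, and its label is the least common multiple of its own prime and the labels of its children. The graph, the prime assignment, and the function names are all hypothetical, and the recursion only handles acyclic graphs.

```python
from math import gcd

# Hypothetical acyclic graph, edges from parent to children; g has two parents.
GRAPH = {
    "a": ["b", "c"],
    "b": ["d", "e", "g"],
    "c": ["f", "g"],
    "d": [], "e": [], "f": [], "g": [],
}
# Assumption: one distinct prime per node.
PRIMES = {"a": 2, "b": 3, "c": 5, "d": 7, "e": 11, "f": 13, "g": 17}


def compute_label(node, graph, primes, memo):
    """Label = lcm of the node's own prime and the labels of its children."""
    if node not in memo:
        value = primes[node]
        for child in graph[node]:
            child_label = compute_label(child, graph, primes, memo)
            value = value * child_label // gcd(value, child_label)
        memo[node] = value
    return memo[node]


def is_descendant(v, u, labels):
    """True if v is a proper descendant of u, decided by divisibility alone."""
    return u != v and labels[u] % labels[v] == 0


labels = {}
for name in GRAPH:
    compute_label(name, GRAPH, PRIMES, labels)

print(is_descendant("g", "a", labels))  # True
print(is_descendant("f", "b", labels))  # False
```

Using the least common multiple rather than a plain product keeps each prime at exponent one even when a node is reachable along several paths.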
Optimization
Replace integers by vectors

a  1 1 1 1
b  1 1 0 0
c  1 0 1 1
d  1 0 0 0
e  0 1 0 0
f  0 0 1 0
g  1 0 0 1
Optimization (cont.)
$VL(a) - VL(b) \ge 0$ (componentwise) implies node b is a descendant of node a
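A small sketch of the vector form of the test follows. It assumes the condition is the componentwise counterpart of divisibility (every entry of VL(a) is at least the matching entry of VL(b)), and it assumes the four columns stand for the primes 3, 5, 7, 11 from the earlier example. The vectors are the ones listed on the Optimization slide; the function names are hypothetical.

```python
# Vector labels from the slide, one row per node.
VL = {
    "a": (1, 1, 1, 1),
    "b": (1, 1, 0, 0),
    "c": (1, 0, 1, 1),
    "d": (1, 0, 0, 0),
    "e": (0, 1, 0, 0),
    "f": (0, 0, 1, 0),
    "g": (1, 0, 0, 1),
}
COLUMN_PRIMES = (3, 5, 7, 11)  # assumption: one prime per column


def dominates(u, v):
    """True if VL(u) - VL(v) >= 0 in every component."""
    return all(x >= y for x, y in zip(VL[u], VL[v]))


def to_integer(vector):
    """Rebuild the integer label that the 0/1 vector stands for."""
    value = 1
    for bit, prime in zip(vector, COLUMN_PRIMES):
        if bit:
            value *= prime
    return value


print(dominates("a", "g"))    # True
print(dominates("b", "g"))    # False
print(to_integer(VL["a"]))    # 1155 = 3*5*7*11
```

The vector test avoids multiplying and dividing large integers: for 0/1 vectors, dominance gives the same answer as checking whether one product divides the other.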
Selectivity Estimation
Two-dimensional histogram
Originally designed for selectivity estimation on trees [Jagadish 2004]
Label each tree node with an interval (l, r)
Represent the interval as a point (l, r) in the xOy coordinate system
Partition the xOy plane into grid cells that serve as buckets
Estimate results using this histogram
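The histogram idea can be sketched as follows. This is an illustrative simplification, not the method of the cited work: it assumes a node with interval (l, r) is a descendant of a node with interval (L, R) when L < l and r < R, it uses a square k-by-k grid, and it assumes points are spread uniformly inside each cell. All names and the sample intervals are hypothetical.

```python
def build_histogram(points, lo, hi, k):
    """Count (l, r) points per cell of a k x k grid over [lo, hi) x [lo, hi)."""
    width = (hi - lo) / k
    hist = [[0] * k for _ in range(k)]
    for l, r in points:
        i = min(int((l - lo) / width), k - 1)
        j = min(int((r - lo) / width), k - 1)
        hist[i][j] += 1
    return hist, width


def overlap(a_lo, a_hi, b_lo, b_hi):
    """Length of the intersection of two 1-D ranges."""
    return max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))


def estimate_descendants(hist, width, lo, L, R):
    """Estimate how many points have both l and r inside (L, R)."""
    total = 0.0
    for i, column in enumerate(hist):
        x0 = lo + i * width
        fx = overlap(x0, x0 + width, L, R) / width  # covered share along l
        for j, count in enumerate(column):
            y0 = lo + j * width
            fy = overlap(y0, y0 + width, L, R) / width  # covered share along r
            total += count * fx * fy
    return total


# Hypothetical interval labels of a small tree.
points = [(0, 15), (1, 6), (7, 14), (2, 3), (4, 5), (8, 9), (10, 11), (12, 13)]
hist, width = build_histogram(points, 0, 16, 4)
print(estimate_descendants(hist, width, 0, 7, 14))
```

Only the bucket counts are consulted at query time, so the cost of an estimate depends on the grid size rather than on the number of nodes; a finer grid trades memory for accuracy.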
Selectivity Estimation (cont.)
Consecutive Ones Property Matrix
Given a 0/1 matrix, if the columns can be ordered so that the 1s in every row are consecutive, the matrix is called a consecutive ones property matrix (C1P matrix)
Finding such a column order takes linear time
Finding the largest C1P submatrix is NP-hard, and if each column contains more than 3 ones, the problem cannot be approximated in polynomial time
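The definition can be made concrete with a short sketch. The search below is a brute force over all column orders, so it is exponential and only suitable for tiny matrices; it is not the linear-time algorithm mentioned above. The function names are hypothetical, and the matrix is the 7-by-4 vector-label matrix from the Optimization slide.

```python
from itertools import permutations


def rows_consecutive(matrix, order):
    """True if, with columns arranged as `order`, every row's 1s are adjacent."""
    for row in matrix:
        positions = [pos for pos, col in enumerate(order) if row[col] == 1]
        if positions and positions[-1] - positions[0] + 1 != len(positions):
            return False
    return True


def find_c1p_order(matrix):
    """Return the first column order that makes all rows consecutive, or None."""
    n_cols = len(matrix[0])
    for order in permutations(range(n_cols)):
        if rows_consecutive(matrix, order):
            return order
    return None


MATRIX = [
    [1, 1, 1, 1],  # a
    [1, 1, 0, 0],  # b
    [1, 0, 1, 1],  # c
    [1, 0, 0, 0],  # d
    [0, 1, 0, 0],  # e
    [0, 0, 1, 0],  # f
    [1, 0, 0, 1],  # g
]

# In the column order shown on the slide, rows c and g have gaps between their 1s.
print(rows_consecutive(MATRIX, (0, 1, 2, 3)))  # False
print(find_c1p_order(MATRIX))
```

When a valid order exists, each row reduces to the pair (position of first 1, position of last 1), which is the kind of interval label the two-dimensional histogram works with.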
Add extra columns

Original matrix:

   0 1 2 3
a  1 1 1 1
b  1 1 0 0
c  1 0 1 1
d  1 0 0 0
e  0 1 0 0
f  0 0 1 0
g  1 0 0 1

After adding column 4 (Map: 4 → 0):

   0 1 2 3 4
a  1 1 1 1 0
b  1 1 0 0 0
c  0 0 1 1 1
d  1 0 0 0 0
e  0 1 0 0 0
f  0 0 1 0 0
g  0 0 0 1 1
Add extra columns
Open question: given a 0/1 matrix, is adding the minimum number of extra columns such that the resulting matrix is a C1P matrix NP-hard?
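The column-adding step from the example can be sketched as below. The operation shown (append a copy of one column and move the 1s of selected rows into the copy) is read off the before/after matrices on the slide; the names `split_column`, `ones_interval`, and `column_map` are hypothetical, and which rows to move is supplied by hand here rather than chosen by an algorithm.

```python
ROWS = ["a", "b", "c", "d", "e", "f", "g"]
MATRIX = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
]


def split_column(matrix, col, rows_to_move):
    """Append a copy of `col`; the listed rows carry their 1 in the copy instead."""
    result = []
    for i, row in enumerate(matrix):
        new_row = list(row) + [0]
        if i in rows_to_move and row[col] == 1:
            new_row[col] = 0
            new_row[-1] = 1
        result.append(new_row)
    return result


def ones_interval(row):
    """Return (first, last) index of the 1s, or None if they are not adjacent."""
    idx = [j for j, bit in enumerate(row) if bit]
    if not idx or idx[-1] - idx[0] + 1 != len(idx):
        return None
    return (idx[0], idx[-1])


# Move the column-0 entries of rows c and g into a new column 4.
expanded = split_column(MATRIX, 0, {2, 6})
column_map = {4: 0}  # new column 4 stands for original column 0
intervals = {name: ones_interval(row) for name, row in zip(ROWS, expanded)}
print(intervals)
```

The map records which original column each added column duplicates, so the original vector labels can still be recovered from the expanded matrix.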
Heuristic algorithm
Duplicate
Merge

    1 2 3 4 5 6
r1  1 1 1 0 1 1
r2  0 1 1 0 1 0
r3  0 0 1 1 1 1
Heuristic algorithm (cont.)

Before reordering:

    1 2 3 4 5 6
r1  1 1 1 0 1 1
r2  0 1 1 0 1 0
r3  0 0 1 1 1 1

After reordering the columns:

    1 2 3 6 5 4
r1  1 1 1 1 1 0
r2  0 1 1 0 1 0
r3  0 0 1 1 1 1
Selectivity Estimation (cont.)
Implementation