PrefJoin: An Efficient Preference-Aware Join Operator
Mohamed E. Khalefa, Mohamed F. Mokbel, Justin Levandoski
2
Outline
• Preference Queries
• Implementing a Preference Join
• The PrefJoin Operator
• Performance Analysis
• Conclusion
3
Preference Queries
SELECT *
FROM Hotels H, Restaurants R
WHERE H.city = R.city
PREFERRING MIN H.Price, MAX H.Rating, MIN BeachDistance(H.Location, Beach),
MIN R.Price, MAX R.Rating, MIN R.WaitTime
Top-K [VLDB99]
Skyline [ICDE01]
K-Dominance [SIGMOD06]
K-Frequency [EDBT06]
Multi-Objective [VLDB04]
Using Skyline/K-Dominance/K-Frequency/...
4
Outline
• Preference Methods
• Implementing a Preference Join
• The PrefJoin Operator
• Performance Analysis
• Conclusion
5
The “On-Top” Implementation
SELECT *
FROM Hotels H, Restaurants R
WHERE H.city = R.city
PREFERRING MIN H.Price, MAX H.Rating, MIN BeachDistance(H.Location, Beach),
MIN R.Price, MAX R.Rating, MIN R.WaitTime
[Figure: Restaurants and Hotels feed a Join operator; a preference operator (Top-K, Skyline, Multi-Objective, K-Frequency, or K-Dominance) sits on top of the join.]
• Easy to implement
• Inefficient
6
The “Custom” Implementation
[Figure: one dedicated join operator per preference method, each over Restaurants and Hotels: SkylineJoin, K-DomJoin, K-FreqJoin, Top-KJoin, Multi-ObjJoin, …]
• Good performance
• Infeasible
Existing custom operators cited on the slide:
• Multi-relational skyline [ICDE07]
• Equijoin skyline [ICDE10]
• Progressive multi-criteria [ICDE10]
• TA & NRA [PODS01]
• Klee [VLDB05]
• Rank-Join [VLDB03]
7
Outline
• Preference Methods
• Implementing a Preference Join
• The PrefJoin Operator
– Architecture
– Functionality
• Performance Analysis
• Conclusion
8
The PrefJoin Architecture
[Figure: a single PrefJoin operator over Restaurants and Hotels, with preference methods such as Skyline, K-Dominance, and K-Frequency plugged into it.]
• Good performance
• Extensible architecture / Sustainable
9
The PrefJoin Architecture: Comparisons
The On-Top Approach
[Figure: a Join over Restaurants and Hotels, with Top-K, Skyline, Multi-Objective, K-Frequency, or K-Dominance on top.]
Work: Easy to Implement
Performance: Poor
The Custom Approach
[Figure: separate SkylineJoin, K-DomJoin, Top-KJoin, Multi-ObjJoin, K-FreqJoin, … operators, each over R and H.]
Work: Difficult/Unsustainable
Performance: Good
PrefJoin
[Figure: one PrefJoin operator over Hotels and Restaurants, with K-Frequency, Skyline, Top-K, Multi-Obj, and K-Dom plugged in.]
Work: Easy to Implement/Sustainable
Performance: Good
10
Outline
• Preference Methods
• Implementing a Preference Join
• The PrefJoin Operator
– Architecture
– Functionality
• Performance Analysis
• Conclusion
11
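PrefJoin Functionality
[Figure: four-phase pipeline over input relations R1, R2, …, Rm, parameterized by three “plugin” functions.]
• Phase 1, Local Pruning: Plocal is applied to each input relation and yields its LocalPref set
• Phase 2, Data Preparation: Ppairwise is applied to each LocalPref set and yields DB(t) for each tuple t
• Phase 3, Joining: yields the Candidate Preference Set
• Phase 4, Refinement: Prefine is applied to the candidates and yields the Final Preference Set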
12
PrefJoin Functionality: Plugin Functions
• Semantics of three plugin functions determine preference join type
Plocal Ppairwise Prefine Resulting join
Skyline Skyline Null SkylineJoin
Skyline Skyline K-Dominance K-DominanceJoin
Multi-Objective Multi-Objective Multi-Objective Multi-ObjectiveJoin
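One way to picture the plugin idea is a small container holding the three functions. The sketch below is only an illustration: the class name `PrefJoinPlugins`, the function signatures, and the convention that every attribute is oriented so that smaller is better are assumptions, not the authors' interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Vec = Tuple[float, ...]  # preference attributes, all oriented so that smaller is better


def skyline_dominates(a: Vec, b: Vec) -> bool:
    """a dominates b: no worse in every attribute and different in at least one."""
    return all(x <= y for x, y in zip(a, b)) and a != b


@dataclass(frozen=True)
class PrefJoinPlugins:  # hypothetical container, not the authors' API
    p_local: Callable[[Vec, Vec], bool]      # used in Phase 1
    p_pairwise: Callable[[Vec, Vec], bool]   # used in Phase 2
    p_refine: Optional[Callable[[List[Vec]], List[Vec]]]  # None means "skip Phase 4"


# Skyline / Skyline / Null = SkylineJoin, as listed on the slide
SKYLINE_JOIN = PrefJoinPlugins(skyline_dominates, skyline_dominates, None)


def needs_refinement(plugins: PrefJoinPlugins) -> bool:
    """Phase 4 runs only when a refine function is plugged in."""
    return plugins.p_refine is not None
```

Under this reading, a K-DominanceJoin would reuse the two skyline functions and supply a k-dominance filter as `p_refine`.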
14
Phase 1: Local Pruning
• Filter out tuples from each input relation that are guaranteed not to be preference answers
• Filtered tuples are never considered again
SELECT * FROM Hotels H, Restaurants R
WHERE H.city = R.city
PREFERRING MIN H.Price, MAX H.Rating, MIN BeachDistance(H.Location, Beach),
MIN R.Price, MAX R.Rating, MIN R.WaitTime
USING SKYLINE
Hotels (P = Price, D = BeachDistance, R = Rating)
Hid City P D R
1 Hawaii 2 2 7
2 Hawaii 5 3 8
3 Hawaii 4 1 5
4 Hawaii 5 5 3
5 Hawaii 5 4 2
6 Seattle 3 3 8
7 Seattle 3 1 5
8 Seattle 2 3 5
9 Seattle 4 4 6
10 Seattle 5 5 3
Restaurants (P = Price, R = Rating, W = WaitTime)
Rid City P R W
1 Hawaii 5 3 1
2 Hawaii 3 4 2
3 Hawaii 4 5 4
4 Hawaii 5 4 8
5 Hawaii 7 1 6
6 Seattle 4 3 1
7 Seattle 5 5 4
8 Seattle 3 3 4
9 Seattle 4 2 5
10 Seattle 6 1 6
Hotel buckets, partitioned by h(city)
Hawaii
Hid P D R
1 2 2 7
2 5 3 8
3 4 1 5
4 5 5 3
5 5 4 2
Seattle
Hid P D R
6 3 3 8
7 3 1 5
8 2 3 5
9 4 4 6
10 5 5 3
Restaurant buckets, partitioned by h(city)
Seattle
Rid P R W
6 4 3 1
7 5 5 4
8 3 3 4
9 1 2 5
10 6 1 6
Hawaii
Rid P R W
1 5 3 1
2 3 4 2
3 4 5 4
4 5 4 6
5 7 4 6
Plocal is applied to each bucket; the surviving tuples form LocalPref (Hotels) and LocalPref (Restaurants).
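The local pruning step for the skyline case can be sketched as follows, using the Hotels table from this slide. This is a minimal illustration that assumes Plocal is a per-bucket skyline. The helper names (`key`, `dominates`, `local_prune`) are made up for the example, and Rating is negated so that every attribute is "smaller is better".

```python
from collections import defaultdict

# Hotels from the slide: (Hid, City, Price, Distance, Rating)
HOTELS = [
    (1, "Hawaii", 2, 2, 7), (2, "Hawaii", 5, 3, 8), (3, "Hawaii", 4, 1, 5),
    (4, "Hawaii", 5, 5, 3), (5, "Hawaii", 5, 4, 2),
    (6, "Seattle", 3, 3, 8), (7, "Seattle", 3, 1, 5), (8, "Seattle", 2, 3, 5),
    (9, "Seattle", 4, 4, 6), (10, "Seattle", 5, 5, 3),
]


def key(hotel):
    """Preference vector: MIN Price, MIN Distance, MAX Rating (negated)."""
    _, _, price, dist, rating = hotel
    return (price, dist, -rating)


def dominates(a, b):
    """a is no worse than b everywhere and differs somewhere."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def local_prune(rows):
    """Group rows by join key (city) and keep each bucket's skyline."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[1]].append(row)
    kept = []
    for bucket in buckets.values():
        kept += [t for t in bucket
                 if not any(dominates(key(o), key(t)) for o in bucket)]
    return kept


local_pref_hotels = local_prune(HOTELS)
print([h[0] for h in local_pref_hotels])  # [1, 2, 3, 6, 7, 8]
```

The surviving ids match the Hotel LocalPref set shown on the Phase 2 slide.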
16
Phase 2: Data Preparation
• Associate dominance metadata with tuples
• Helps to reduce output of join phase
Hotel LocalPref Set (DB column produced by Ppairwise; S = Seattle, H = Hawaii)
Hid City P D R DB
1 Hawaii 2 2 7 null
2 Hawaii 5 3 8 S
3 Hawaii 4 1 5 S
6 Seattle 3 3 8 null
7 Seattle 3 1 5 null
8 Seattle 2 3 5 H
Hotel Buckets
Hawaii
Hid P D R
1 2 2 7
2 5 3 8
3 4 1 5
Seattle
Hid P D R
6 3 3 8
7 3 1 5
8 2 3 5
9 4 4 6
10 5 5 3
Restaurants LocalPref Set (DB column produced by Ppairwise)
Rid City P R W DB
1 Hawaii 5 3 1 S
2 Hawaii 3 4 2 null
3 Hawaii 4 5 4 null
6 Seattle 4 3 1 null
7 Seattle 5 5 4 H
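The DB column can be reproduced with a short sketch. It reads DB(t) as the set of other buckets (join-key values) that contain a tuple dominating t. That reading is an assumption inferred from the values on the slide, and the function name `dominating_buckets` is hypothetical. Rating is negated so that smaller is better in every position.

```python
def dominates(a, b):
    """a is no worse than b everywhere and differs somewhere."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def dominating_buckets(rows):
    """Map tuple id -> set of other cities holding a tuple that dominates it."""
    db = {}
    for tid, city, vec in rows:
        db[tid] = {c for _, c, v in rows if c != city and dominates(v, vec)}
    return db


# Hotel LocalPref set from the slide: (Hid, City, (Price, Distance, -Rating))
HOTEL_LOCALPREF = [
    (1, "Hawaii", (2, 2, -7)), (2, "Hawaii", (5, 3, -8)), (3, "Hawaii", (4, 1, -5)),
    (6, "Seattle", (3, 3, -8)), (7, "Seattle", (3, 1, -5)), (8, "Seattle", (2, 3, -5)),
]

db_hotels = dominating_buckets(HOTEL_LOCALPREF)
for hid, buckets in db_hotels.items():
    print(hid, sorted(buckets) or "null")
```

For example, hotel 2 (5, 3, 8) is dominated by Seattle's hotel 6 (3, 3, 8), so its DB set is {Seattle}, which agrees with the "S" entry in the table.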
18
Phase 3: Joining
• Join input to produce candidate preference set
– Use metadata from previous phase as extra join predicate
– Greatly reduces false positive preference answers
Hotels
Hid City P D R DB
1 Hawaii 2 2 7 null
2 Hawaii 5 3 8 S
3 Hawaii 4 1 5 S
6 Seattle 3 3 8 null
7 Seattle 3 1 5 null
8 Seattle 2 3 5 H
Restaurants
Rid City P R W DB
1 Hawaii 5 3 1 S
2 Hawaii 3 4 2 null
3 Hawaii 4 5 4 null
6 Seattle 4 3 1 null
7 Seattle 5 5 4 H
Legend: DB set intersection is not null (pair is pruned) / DB set intersection is null (pair is kept)
Candidate Preference Set
Hid Rid City HP HD HR RP RR RW
1 1 Hawaii 2 2 7 5 3 1
1 2 Hawaii 2 2 7 3 4 2
1 3 Hawaii 2 2 7 4 5 4
2 2 Hawaii 5 3 8 3 4 2
2 3 Hawaii 5 3 8 4 5 4
3 2 Hawaii 4 1 5 3 4 2
3 3 Hawaii 4 1 5 4 5 4
6 6 Seattle 3 3 8 4 3 1
6 7 Seattle 3 3 8 5 5 4
7 6 Seattle 3 1 5 4 3 1
7 7 Seattle 3 1 5 5 5 4
8 6 Seattle 2 3 5 4 3 1
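The extra join predicate can be illustrated with the DB values listed on this slide. The sketch assumes a pair survives only when the two DB sets share no bucket: if both tuples are dominated by tuples from the same other city, the join of those dominating tuples dominates the pair. Variable names are illustrative only.

```python
# (id, city, DB set) as listed on the slide; "S" = Seattle, "H" = Hawaii
HOTELS = [
    (1, "Hawaii", set()), (2, "Hawaii", {"S"}), (3, "Hawaii", {"S"}),
    (6, "Seattle", set()), (7, "Seattle", set()), (8, "Seattle", {"H"}),
]
RESTAURANTS = [
    (1, "Hawaii", {"S"}), (2, "Hawaii", set()), (3, "Hawaii", set()),
    (6, "Seattle", set()), (7, "Seattle", {"H"}),
]

# Equi-join on city, plus the extra predicate: DB(h) and DB(r) must not intersect.
candidates = [
    (hid, rid)
    for hid, h_city, h_db in HOTELS
    for rid, r_city, r_db in RESTAURANTS
    if h_city == r_city and not (h_db & r_db)
]

print(len(candidates))  # 12
print(candidates)
```

The pairs (2, 1), (3, 1), and (8, 7) are dropped, leaving the 12 rows of the Candidate Preference Set.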
20
Phase 4: Refinement
• Apply final preference evaluation to join
• Guarantees correct final preference answer
• Optional phase
– Skyline does not require refinement phase
– K-dominance does require refinement phase
Final Preference Answer (output of Prefine)
Hid Rid City HP HD HR RP RR RW
1 1 Hawaii 2 2 7 5 3 1
1 2 Hawaii 2 2 7 3 4 2
1 3 Hawaii 2 2 7 4 5 4
2 2 Hawaii 5 3 8 3 4 2
2 3 Hawaii 5 3 8 4 5 4
3 2 Hawaii 4 1 5 3 4 2
3 3 Hawaii 4 1 5 4 5 4
6 6 Seattle 3 3 8 4 3 1
6 7 Seattle 3 3 8 5 5 4
7 6 Seattle 3 1 5 4 3 1
7 7 Seattle 3 1 5 5 5 4
8 6 Seattle 2 3 5 4 3 1
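For the K-dominance case, a refinement function might look like the sketch below. It assumes the usual k-dominance definition (p k-dominates q if p is no worse in at least k attributes and strictly better in at least one) and uses made-up points, not the slide data. `refine` is a hypothetical name for what the slides call Prefine.

```python
def k_dominates(p, q, k):
    """p is no worse than q in at least k attributes and strictly better in one."""
    no_worse = sum(x <= y for x, y in zip(p, q))
    better = sum(x < y for x, y in zip(p, q))
    return no_worse >= k and better >= 1


def refine(candidates, k):
    """Keep candidates that no other candidate k-dominates."""
    return [q for q in candidates
            if not any(k_dominates(p, q, k) for p in candidates)]


# Hypothetical 3-attribute points, smaller is better in every position.
POINTS = [(1, 1, 5), (2, 2, 1), (3, 3, 3)]

print(refine(POINTS, 3))  # [(1, 1, 5), (2, 2, 1)]  (k = all attributes: plain skyline)
print(refine(POINTS, 2))  # [(1, 1, 5)]
```

With k equal to the number of attributes the filter reduces to the ordinary skyline. A smaller k is stricter, so the candidate set produced with skyline plugins can still contain tuples that must be removed in this phase.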
21
Outline
• Preference Methods
• Implementing a Preference Join
• The PrefJoin Operator
• Performance Analysis
• Conclusion
22
Performance Analysis
• PrefJoin implemented in PostgreSQL
• Comparison of performance against
– FlexPref [ICDE10]: generic, extensible join
– SkylineJoin [ICDE07]: skyline-specific join
23
Scalability Experiment
• Performance for increasing input sizes
– Skyline
– K-Dominance
– Multi-objective
24
Varying Number of Preference Attributes
• Increasing number of preference attributes for Skyline preference method
• Increased number of attributes increases preference answer cardinality
25
Outline
• Preference Methods
• Implementing a Preference Join
• The PrefJoin Operator
• Performance Analysis
• Conclusion
26
Conclusion and Summary
• Many (possibly infinite) preference methods
• Three approaches to supporting preference join queries
– “On-top” approach: easy but inefficient
– “Custom implementation” approach: efficient yet infeasible
– PrefJoin’s “extensible” approach: efficient and feasible
• PrefJoin architecture
– Four-phase approach
– Uses three “plug-in” preference functions to determine preference join semantics
• Performance analysis
– Experiments with PostgreSQL implementation
– Superior performance compared to existing custom and generic preference join algorithms
27
Thank You
Questions
28
Preference Method Examples
SELECT * FROM Restaurants R
PREFERRING MIN R.Price, MIN R.Distance
Restaurants
Id Price Distance
R1 1 9
R2 7 5.5
R3 5 4
R4 6 5
R5 10 1
R6 9 6
R7 11 2
R8 8 10
The Skyline Method
[Figure: restaurants R1 to R8 plotted with Price on one axis and Distance on the other.]
Skyline answer: {R1, R3, R5}
The Top-K Domination Method
[Figure: the same Price/Distance plot of R1 to R8.]
Top-K Domination answer: {R3, R4, R2}
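Both answers can be checked against the Restaurants table with a few lines. The sketch assumes the standard skyline definition and takes Top-K Domination to mean "rank tuples by how many other tuples they dominate", with K = 3 to match the three-element answer on the slide.

```python
# Restaurants from the slide: id -> (Price, Distance), both to be minimized
RESTAURANTS = {
    "R1": (1, 9), "R2": (7, 5.5), "R3": (5, 4), "R4": (6, 5),
    "R5": (10, 1), "R6": (9, 6), "R7": (11, 2), "R8": (8, 10),
}


def dominates(a, b):
    """a is no worse than b everywhere and differs somewhere."""
    return all(x <= y for x, y in zip(a, b)) and a != b


# Skyline: restaurants that no other restaurant dominates.
skyline = [rid for rid, vec in RESTAURANTS.items()
           if not any(dominates(other, vec) for other in RESTAURANTS.values())]

# Top-K domination: score = number of restaurants a given restaurant dominates.
score = {rid: sum(dominates(vec, other) for other in RESTAURANTS.values())
         for rid, vec in RESTAURANTS.items()}
top3 = sorted(score, key=score.get, reverse=True)[:3]

print(skyline)  # ['R1', 'R3', 'R5']
print(top3)     # ['R3', 'R4', 'R2']
```

R3 dominates four restaurants (R2, R4, R6, R8), R4 dominates three, and R2 dominates two, which gives the ordering shown on the slide.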