PrefJoin: An Efficient Preference-Aware Join Operator Mohamed E. Khalefa Mohamed F. Mokbel Justin Levandoski


PrefJoin: An Efficient Preference-Aware Join Operator

Mohamed E. Khalefa, Mohamed F. Mokbel, Justin Levandoski


Outline

• Preference Queries
• Implementing a Preference Join
• The PrefJoin Operator
• Performance Analysis
• Conclusion


Preference Queries

SELECT *
FROM Hotels H, Restaurants R
WHERE H.city = R.city
PREFERRING MIN H.Price, MAX H.Rating, MIN BeachDistance(H.Location, Beach),
           MIN R.Price, MAX R.Rating, MIN R.WaitTime
USING Skyline / K-Dominance / K-Frequency / ...

Preference methods:
• Top-K [VLDB99]
• Skyline [ICDE01]
• K-Dominance [SIGMOD06]
• K-Frequency [EDBT06]
• Multi-Objective [VLDB04]


Outline

• Preference Methods
• Implementing a Preference Join
• The PrefJoin Operator
• Performance Analysis
• Conclusion


The “On-Top” Implementation

SELECT *
FROM Hotels H, Restaurants R
WHERE H.city = R.city
PREFERRING MIN H.Price, MAX H.Rating, MIN BeachDistance(H.Location, Beach),
           MIN R.Price, MAX R.Rating, MIN R.WaitTime

[Diagram: a Join of Restaurants and Hotels, with one preference operator (Top-K, Skyline, Multi-Objective, K-Frequency, or K-Dominance) evaluated on top of the join output]

• Easy to implement
• Inefficient


The “Custom” Implementation

[Diagram: one custom join operator per preference method, each reading Restaurants and Hotels directly: SkylineJoin, K-DomJoin, K-FreqJoin, Top-KJoin, Multi-ObjJoin, …]

• Good performance
• Infeasible

Cited algorithms:
• Multi-relational skyline [ICDE07]
• Equijoin skyline [ICDE10]
• Progressive multi-criteria [ICDE10]
• TA & NRA [PODS01]
• Klee [VLDB05]
• Rank-Join [VLDB03]


Outline

• Preference Methods
• Implementing a Preference Join
• The PrefJoin Operator
  – Architecture
  – Functionality
• Performance Analysis
• Conclusion


The PrefJoin Architecture

[Diagram: a single PrefJoin operator reads Restaurants and Hotels; preference methods such as Skyline, K-Dominance, and K-Frequency plug into it]

• Good performance
• Extensible architecture / Sustainable


The PrefJoin Architecture: Comparisons

The On-Top Approach
[Diagram: a Join of Restaurants and Hotels, with Top-K, Skyline, Multi-Objective, K-Frequency, or K-Dominance evaluated on top]
Work: Easy to implement
Performance: Poor

The Custom Approach
[Diagram: SkylineJoin, K-DomJoin, Top-KJoin, Multi-ObjJoin, and K-FreqJoin, each over R and H]
Work: Difficult/Unsustainable
Performance: Good

PrefJoin
[Diagram: one PrefJoin operator over Hotels and Restaurants, with K-Frequency, Skyline, Top-K, Multi-Obj, and K-Dom plugged in]
Work: Easy to implement / Sustainable
Performance: Good


Outline

• Preference Methods
• Implementing a Preference Join
• The PrefJoin Operator
  – Architecture
  – Functionality
• Performance Analysis
• Conclusion


PrefJoin Functionality

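Input relations: R1, R2, …, Rm

Phase 1, Local Pruning: Plocal is applied to each input relation and yields one LocalPref set per relation
Phase 2, Data Preparation: Ppairwise is applied to each LocalPref set and yields DB(t) for each tuple t
Phase 3, Joining: the prepared inputs are joined into the Candidate Preference Set
Phase 4, Refinement: Prefine turns the Candidate Preference Set into the Final Preference Set

“Plugin” functions: Plocal, Ppairwise, Prefine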


PrefJoin Functionality: Plugin Functions

• Semantics of three plugin functions determine preference join type

Plocal            Ppairwise         Prefine           Resulting join
Skyline           Skyline           Null              SkylineJoin
Skyline           Skyline           K-Dominance       K-DominanceJoin
Multi-Objective   Multi-Objective   Multi-Objective   Multi-ObjectiveJoin


Phase 1: Local Pruning

• Filter tuples from each input relation guaranteed not to be preference answers
• Filtered tuples are never considered again

SELECT *
FROM Hotels H, Restaurants R
WHERE H.city = R.city
PREFERRING MIN H.Price, MAX H.Rating, MIN BeachDistance(H.Location, Beach),
           MIN R.Price, MAX R.Rating, MIN R.WaitTime
USING SKYLINE

Hotels (P = Price, D = BeachDistance, R = Rating)

Hid  City     P  D  R
1    Hawaii   2  2  7
2    Hawaii   5  3  8
3    Hawaii   4  1  5
4    Hawaii   5  5  3
5    Hawaii   5  4  2
6    Seattle  3  3  8
7    Seattle  3  1  5
8    Seattle  2  3  5
9    Seattle  4  4  6
10   Seattle  5  5  3

Restaurants (P = Price, R = Rating, W = WaitTime)

Rid  City     P  R  W
1    Hawaii   5  3  1
2    Hawaii   3  4  2
3    Hawaii   4  5  4
4    Hawaii   5  4  8
5    Hawaii   7  1  6
6    Seattle  4  3  1
7    Seattle  5  5  4
8    Seattle  3  3  4
9    Seattle  4  2  5
10   Seattle  6  1  6

h(city) partitions each relation into one bucket per city, and Plocal is applied to each bucket.

Hotel bucket: Hawaii

Hid  P  D  R
1    2  2  7
2    5  3  8
3    4  1  5
4    5  5  3
5    5  4  2

Hotel bucket: Seattle

Hid  P  D  R
6    3  3  8
7    3  1  5
8    2  3  5
9    4  4  6
10   5  5  3

Restaurant bucket: Seattle

Rid  P  R  W
6    4  3  1
7    5  5  4
8    3  3  4
9    1  2  5
10   6  1  6

Restaurant bucket: Hawaii

Rid  P  R  W
1    5  3  1
2    3  4  2
3    4  5  4
4    5  4  6
5    7  4  6

Output: LocalPref (Hotels) and LocalPref (Restaurants)
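The local pruning step can be sketched as follows for the Hotels relation, assuming Plocal is the skyline and that it is evaluated separately inside each h(city) bucket. The function names `hotel_dominates` and `local_prune` are hypothetical and are not part of PrefJoin's actual interface.

```python
from collections import defaultdict

# Hotels from the slide: (Hid, City, Price, Distance, Rating).
HOTELS = [
    (1, "Hawaii", 2, 2, 7), (2, "Hawaii", 5, 3, 8), (3, "Hawaii", 4, 1, 5),
    (4, "Hawaii", 5, 5, 3), (5, "Hawaii", 5, 4, 2),
    (6, "Seattle", 3, 3, 8), (7, "Seattle", 3, 1, 5), (8, "Seattle", 2, 3, 5),
    (9, "Seattle", 4, 4, 6), (10, "Seattle", 5, 5, 3),
]


def hotel_dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere
    (MIN price, MIN distance, MAX rating)."""
    # Negate the rating so that smaller is better in every position.
    ka = (a[2], a[3], -a[4])
    kb = (b[2], b[3], -b[4])
    return all(x <= y for x, y in zip(ka, kb)) and ka != kb


def local_prune(tuples, dominates):
    """Hypothetical stand-in for Plocal = Skyline, applied per city bucket."""
    buckets = defaultdict(list)
    for t in tuples:
        buckets[t[1]].append(t)  # h(city): group tuples by join key
    kept = []
    for bucket in buckets.values():
        # Keep a tuple only if nothing in its own bucket dominates it.
        kept.extend(t for t in bucket
                    if not any(dominates(o, t) for o in bucket))
    return kept


local_pref_hotels = local_prune(HOTELS, hotel_dominates)
print(sorted(t[0] for t in local_pref_hotels))  # [1, 2, 3, 6, 7, 8]
```

Hotels 4, 5, 9, and 10 are dropped because another hotel in the same city dominates each of them, which matches the Hotel LocalPref set shown for Phase 2.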


Phase 2: Data Preparation

• Associate dominance metadata with tuples
• Helps to reduce output of join phase

Hotel LocalPref Set (DB filled in by Ppairwise)

Hid  City     P  D  R  DB
1    Hawaii   2  2  7  null
2    Hawaii   5  3  8  S
3    Hawaii   4  1  5  S
6    Seattle  3  3  8  null
7    Seattle  3  1  5  null
8    Seattle  2  3  5  H

Hotel Buckets

Hawaii

Hid  P  D  R
1    2  2  7
2    5  3  8
3    4  1  5

Seattle

Hid  P  D  R
6    3  3  8
7    3  1  5
8    2  3  5
9    4  4  6
10   5  5  3

Restaurants LocalPref Set (DB filled in by Ppairwise)

Rid  City     P  R  W  DB
1    Hawaii   5  3  1  S
2    Hawaii   3  4  2  null
3    Hawaii   4  5  4  null
6    Seattle  4  3  1  null
7    Seattle  5  5  4  H
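One way to read the DB column is that DB(t) lists the other city buckets that contain a tuple dominating t. That reading is inferred from the example values (S and H, presumably Seattle and Hawaii) and is not stated on the slide. The sketch below applies it to the Hotel LocalPref set; `dominating_buckets` is a hypothetical name, and drawing dominators only from the LocalPref set is also an assumption.

```python
# Hotel LocalPref set from the slide: (Hid, City, Price, Distance, Rating).
LOCAL_PREF_HOTELS = [
    (1, "Hawaii", 2, 2, 7), (2, "Hawaii", 5, 3, 8), (3, "Hawaii", 4, 1, 5),
    (6, "Seattle", 3, 3, 8), (7, "Seattle", 3, 1, 5), (8, "Seattle", 2, 3, 5),
]


def hotel_dominates(a, b):
    """MIN price, MIN distance, MAX rating (rating negated so lower is better)."""
    ka = (a[2], a[3], -a[4])
    kb = (b[2], b[3], -b[4])
    return all(x <= y for x, y in zip(ka, kb)) and ka != kb


def dominating_buckets(local_pref, dominates):
    """Assumed reading of Ppairwise: map each tuple id to the set of *other*
    buckets (cities) that hold at least one tuple dominating it."""
    return {
        t[0]: {o[1] for o in local_pref
               if o[1] != t[1] and dominates(o, t)}
        for t in local_pref
    }


db = dominating_buckets(LOCAL_PREF_HOTELS, hotel_dominates)
for hid, cities in db.items():
    print(hid, sorted(cities) or "null")
```

Under these assumptions the result agrees with the DB column above: hotels 2 and 3 are dominated from the Seattle bucket, hotel 8 from the Hawaii bucket, and hotels 1, 6, and 7 have an empty (null) set.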


Phase 3: Joining

• Join input to produce candidate preference set
  – Use metadata from previous phase as extra join predicate
  – Greatly reduces false positive preference answers

Hotels

Hid  City     P  D  R  DB
1    Hawaii   2  2  7  null
2    Hawaii   5  3  8  S
3    Hawaii   4  1  5  S
6    Seattle  3  3  8  null
7    Seattle  3  1  5  null
8    Seattle  2  3  5  H

Restaurants

Rid  City     P  R  W  DB
1    Hawaii   5  3  1  S
2    Hawaii   3  4  2  null
3    Hawaii   4  5  4  null
6    Seattle  4  3  1  null
7    Seattle  5  5  4  H

Pairs whose DB set intersection is not null are excluded; pairs whose DB set intersection is null are joined.

Candidate Preference Set

Hid  Rid  City     HP  HD  HR  RP  RR  RW
1    1    Hawaii   2   2   7   5   3   1
1    2    Hawaii   2   2   7   3   4   2
1    3    Hawaii   2   2   7   4   5   4
2    2    Hawaii   5   3   8   3   4   2
2    3    Hawaii   5   3   8   4   5   4
3    2    Hawaii   4   1   5   3   4   2
3    3    Hawaii   4   1   5   4   5   4
6    6    Seattle  3   3   8   4   3   1
6    7    Seattle  3   3   8   5   5   4
7    6    Seattle  3   1   5   4   3   1
7    7    Seattle  3   1   5   5   5   4
8    6    Seattle  2   3   5   4   3   1
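The extra join predicate can be sketched as an equijoin on city that also requires the two DB sets to be disjoint. The DB values below are copied from the Hotels and Restaurants tables for this phase (null written as an empty set); `candidate_join` is a hypothetical name, not the operator's real interface.

```python
# (id, City, DB) for each LocalPref tuple; "S" and "H" are the slide's labels.
HOTELS = [
    (1, "Hawaii", set()), (2, "Hawaii", {"S"}), (3, "Hawaii", {"S"}),
    (6, "Seattle", set()), (7, "Seattle", set()), (8, "Seattle", {"H"}),
]
RESTAURANTS = [
    (1, "Hawaii", {"S"}), (2, "Hawaii", set()), (3, "Hawaii", set()),
    (6, "Seattle", set()), (7, "Seattle", {"H"}),
]


def candidate_join(hotels, restaurants):
    """Equijoin on city, keeping a pair only if the DB sets do not intersect."""
    return [
        (h[0], r[0])
        for h in hotels
        for r in restaurants
        if h[1] == r[1] and not (h[2] & r[2])
    ]


candidates = candidate_join(HOTELS, RESTAURANTS)
print(len(candidates))  # 12
print(candidates)
```

A plain equijoin on city would produce 15 pairs here. The DB predicate drops (2, 1), (3, 1), and (8, 7), leaving the 12 (Hid, Rid) pairs listed in the Candidate Preference Set.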


Phase 4: Refinement

• Apply final preference evaluation to the join output
• Guarantees correct final preference answer
• Optional phase
  – Skyline does not require refinement phase
  – K-dominance does require refinement phase

Candidate Preference Set → Prefine → Final Preference Answer

Candidate Preference Set

Hid  Rid  City     HP  HD  HR  RP  RR  RW
1    1    Hawaii   2   2   7   5   3   1
1    2    Hawaii   2   2   7   3   4   2
1    3    Hawaii   2   2   7   4   5   4
2    2    Hawaii   5   3   8   3   4   2
2    3    Hawaii   5   3   8   4   5   4
3    2    Hawaii   4   1   5   3   4   2
3    3    Hawaii   4   1   5   4   5   4
6    6    Seattle  3   3   8   4   3   1
6    7    Seattle  3   3   8   5   5   4
7    6    Seattle  3   1   5   4   3   1
7    7    Seattle  3   1   5   5   5   4
8    6    Seattle  2   3   5   4   3   1
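The optional nature of this phase can be sketched as a `refine` step that passes candidates through unchanged when Prefine is null and otherwise applies the plugged-in function. The k-dominance test below uses the usual definition (no worse in at least k dimensions and strictly better in at least one), which is assumed here because the slides do not spell it out. All names and the three toy tuples are hypothetical, with lower values treated as better in every dimension.

```python
def k_dominates(p, q, k):
    """p k-dominates q: no worse in at least k dimensions and strictly
    better in at least one (lower values are better)."""
    no_worse = sum(x <= y for x, y in zip(p, q))
    better = sum(x < y for x, y in zip(p, q))
    return no_worse >= k and better >= 1


def k_dominance_refine(k):
    """Build a Prefine-style plugin that drops k-dominated candidates."""
    def p_refine(candidates):
        return [t for t in candidates
                if not any(k_dominates(o, t, k) for o in candidates)]
    return p_refine


def refine(candidates, p_refine=None):
    """Phase 4: skip when no refinement plugin is given (e.g. Skyline)."""
    if p_refine is None:
        return list(candidates)
    return p_refine(candidates)


toy = [(1, 1, 3), (2, 2, 1), (3, 3, 2)]  # hypothetical candidate tuples
print(refine(toy))                         # unchanged: no Prefine
print(refine(toy, k_dominance_refine(2)))  # [(1, 1, 3)]
```

With k = 2, the first tuple beats the other two in two of three dimensions, so only it survives, while no tuple is removed when Prefine is null.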


Outline

• Preference Methods
• Implementing a Preference Join
• The PrefJoin Operator
• Performance Analysis
• Conclusion


Performance Analysis

• PrefJoin implemented in PostgreSQL

• Comparison of performance against
  – FlexPref [ICDE10]: generic, extensible join
  – SkylineJoin [ICDE07]: skyline-specific join


Scalability Experiment

• Performance for increasing input sizes
  – Skyline
  – K-Dominance
  – Multi-objective


Varying Number of Preference Attributes

• Increasing number of preference attributes for Skyline preference method

• Increased number of attributes increases preference answer cardinality


Outline

• Preference Methods
• Implementing a Preference Join
• The PrefJoin Operator
• Performance Analysis
• Conclusion


Conclusion and Summary

• Many (possibly infinite) preference methods
• Three approaches to supporting preference join queries
  – “On-top” approach: easy but inefficient
  – “Custom implementation” approach: efficient yet infeasible
  – PrefJoin’s “extensible” approach: efficient and feasible
• PrefJoin architecture
  – Four-phase approach
  – Uses three “plug-in” preference functions to determine preference join semantics
• Performance analysis
  – Experiments with PostgreSQL implementation
  – Superior performance compared to existing custom and generic preference join algorithms


Thank You

Questions


Preference Method Examples

SELECT *
FROM Restaurants R
PREFERRING MIN R.Price, MIN R.Distance

Restaurants

Id  Price  Distance
R1  1      9
R2  7      5.5
R3  5      4
R4  6      5
R5  10     1
R6  9      6
R7  11     2
R8  8      10

The Skyline Method
[Plot: restaurants R1 to R8, Price on the horizontal axis, Distance on the vertical axis]
Skyline answer: {R1, R3, R5}

The Top-K Domination Method
[Plot: restaurants R1 to R8, Price on the horizontal axis, Distance on the vertical axis]
Top-K Domination answer: {R3, R4, R2}
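Both answers can be reproduced from the Restaurants table with a short sketch. It assumes the standard dominance test (no worse in both attributes and strictly better in at least one) and takes K = 3 for the top-K domination method, since the slide lists three answers. The function names are illustrative only.

```python
# Restaurants from the slide: name -> (Price, Distance); both are minimised.
RESTAURANTS = {
    "R1": (1, 9), "R2": (7, 5.5), "R3": (5, 4), "R4": (6, 5),
    "R5": (10, 1), "R6": (9, 6), "R7": (11, 2), "R8": (8, 10),
}


def dominates(a, b):
    """a is no worse than b in every attribute and differs in at least one."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def skyline(points):
    """Names of points that no other point dominates."""
    return sorted(
        name for name, p in points.items()
        if not any(dominates(q, p) for q in points.values())
    )


def top_k_domination(points, k):
    """The k points that dominate the largest number of other points."""
    score = {
        name: sum(dominates(p, q) for q in points.values())
        for name, p in points.items()
    }
    return sorted(points, key=lambda name: -score[name])[:k]


print(skyline(RESTAURANTS))              # ['R1', 'R3', 'R5']
print(top_k_domination(RESTAURANTS, 3))  # ['R3', 'R4', 'R2']
```

R3 dominates four restaurants (R2, R4, R6, R8), R4 dominates three, and R2 dominates two, which gives the ranking shown for the top-K domination method.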