
PrefJoin: An Efficient Preference-Aware Join Operator

Mohamed E. Khalefa, Mohamed F. Mokbel, Justin Levandoski

Outline

• Preference Queries
• Implementing a Preference Join
• The PrefJoin Operator
• Performance Analysis
• Conclusion

Preference Queries

SELECT *
FROM Hotels H, Restaurants R
WHERE H.city = R.city
PREFERRING MIN H.Price, MAX H.Rating, MIN BeachDistance(H.Location, Beach),
           MIN R.Price, MAX R.Rating, MIN R.WaitTime

Top-K [VLDB99]

Skyline [ICDE01]

K-Dominance [SIGMOD06]

K-Frequency [EDBT06]

Multi-Objective [VLDB04]

Using Skyline/K-Dominance/K-Frequency/...


The “On-Top” Implementation

SELECT *
FROM Hotels H, Restaurants R
WHERE H.city = R.city
PREFERRING MIN H.Price, MAX H.Rating, MIN BeachDistance(H.Location, Beach),
           MIN R.Price, MAX R.Rating, MIN R.WaitTime

Query plan: a standard Join over Restaurants and Hotels, with the preference method (Top-K, Skyline, Multi-Objective, K-Frequency, or K-Dominance) evaluated on top of the join result.

• Easy to implement
• Inefficient

The “Custom” Implementation

One dedicated join operator per preference method, each over Restaurants and Hotels: SkylineJoin, K-DomJoin, K-FreqJoin, Top-KJoin, Multi-ObjJoin, …

• Good performance
• Infeasible

Existing custom algorithms:

• Multi-relational skyline [ICDE07]
• Equijoin skyline [ICDE10]
• Progressive multi-criteria [ICDE10]
• TA & NRA [PODS01]
• Klee [VLDB05]
• Rank-Join [VLDB03]

Outline

• Preference Methods
• Implementing a Preference Join
• The PrefJoin Operator
  – Architecture
  – Functionality
• Performance Analysis
• Conclusion

The PrefJoin Architecture

A single PrefJoin operator over Restaurants and Hotels, with the preference methods (Skyline, K-Dominance, K-Frequency) plugged into it.

• Good performance
• Extensible architecture / sustainable

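The PrefJoin Architecture: Comparisons

The On-Top Approach
Plan: the preference method (Top-K, Skyline, Multi-Objective, K-Frequency, K-Dominance) evaluated on top of a Join over Restaurants and Hotels
• Work: easy to implement
• Performance: poor

The Custom Approach
Plan: a separate join operator per preference method (SkylineJoin, K-DomJoin, Top-KJoin, Multi-ObjJoin, K-FreqJoin), each over R and H
• Work: difficult / unsustainable
• Performance: good

PrefJoin
Plan: a single PrefJoin operator over Hotels and Restaurants, with the preference methods (K-Frequency, Skyline, Top-K, Multi-Obj, K-Dom) plugged in
• Work: easy to implement / sustainable
• Performance: good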


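PrefJoin Functionality

Input relations: R1, R2, …, Rm

• Phase 1, Local Pruning: Plocal is applied to each input relation and produces one LocalPref set per relation
• Phase 2, Data Preparation: Ppairwise is applied to each LocalPref set and attaches DB(t) to every tuple t
• Phase 3, Joining: the prepared inputs are joined into the Candidate Preference Set
• Phase 4, Refinement: Prefine turns the Candidate Preference Set into the Final Preference Set

Plocal, Ppairwise, and Prefine are the “plugin” functions.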

PrefJoin Functionality: Plugin Functions

• The semantics of the three plugin functions determine the preference join type

Join type             Plocal            Ppairwise         Prefine
SkylineJoin           Skyline           Skyline           Null
K-DominanceJoin       Skyline           Skyline           K-Dominance
Multi-ObjectiveJoin   Multi-Objective   Multi-Objective   Multi-Objective
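The plugin assignments listed above can be read as configuration data: each join type is a triple of preference methods, one per plugin slot. The sketch below is illustrative only and stores method names rather than implementations; `PluginSet`, `JOIN_TYPES`, and `needs_refinement` are hypothetical names, not part of the PrefJoin implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PluginSet:
    """Preference method plugged into each slot (names only, for illustration)."""
    p_local: str
    p_pairwise: str
    p_refine: Optional[str]  # None stands for "Null" on the slide


# The three join types listed on the slide, keyed by the join they produce.
JOIN_TYPES = {
    "SkylineJoin": PluginSet("Skyline", "Skyline", None),
    "K-DominanceJoin": PluginSet("Skyline", "Skyline", "K-Dominance"),
    "Multi-ObjectiveJoin": PluginSet(
        "Multi-Objective", "Multi-Objective", "Multi-Objective"
    ),
}


def needs_refinement(join_type: str) -> bool:
    """A join type needs Phase 4 only when a Prefine method is plugged in."""
    return JOIN_TYPES[join_type].p_refine is not None


print(needs_refinement("SkylineJoin"))      # False
print(needs_refinement("K-DominanceJoin"))  # True
```

Under this reading, supporting a new preference method means adding one entry to the table rather than writing a new join operator.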


Phase 1: Local Pruning

• Filter tuples from each input relation guaranteed not to be preference answers
• Filtered tuples are never considered again

SELECT *
FROM Hotels H, Restaurants R
WHERE H.city = R.city
PREFERRING MIN H.Price, MAX H.Rating, MIN BeachDistance(H.Location, Beach),
           MIN R.Price, MAX R.Rating, MIN R.WaitTime
USING SKYLINE

Hotels (P = Price, D = BeachDistance, R = Rating)

Hid  City     P  D  R
1    Hawaii   2  2  7
2    Hawaii   5  3  8
3    Hawaii   4  1  5
4    Hawaii   5  5  3
5    Hawaii   5  4  2
6    Seattle  3  3  8
7    Seattle  3  1  5
8    Seattle  2  3  5
9    Seattle  4  4  6
10   Seattle  5  5  3

Restaurants (P = Price, R = Rating, W = WaitTime)

Rid  City     P  R  W
1    Hawaii   5  3  1
2    Hawaii   3  4  2
3    Hawaii   4  5  4
4    Hawaii   5  4  8
5    Hawaii   7  1  6
6    Seattle  4  3  1
7    Seattle  5  5  4
8    Seattle  3  3  4
9    Seattle  4  2  5
10   Seattle  6  1  6

Each relation is partitioned by h(city) into one bucket per city.

Hotels, Hawaii bucket

Hid  P  D  R
1    2  2  7
2    5  3  8
3    4  1  5
4    5  5  3
5    5  4  2

Hotels, Seattle bucket

Hid  P  D  R
6    3  3  8
7    3  1  5
8    2  3  5
9    4  4  6
10   5  5  3

Restaurants, Hawaii bucket

Rid  P  R  W
1    5  3  1
2    3  4  2
3    4  5  4
4    5  4  6
5    7  4  6

Restaurants, Seattle bucket

Rid  P  R  W
6    4  3  1
7    5  5  4
8    3  3  4
9    1  2  5
10   6  1  6

Plocal is applied to each of the four buckets, producing LocalPref (Hotels) and LocalPref (Restaurants).
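As a minimal sketch of this phase for the Hotels relation, assuming Plocal is the skyline method applied independently to each city bucket: the function names `dominates` and `local_pruning` are illustrative, and the rows are copied from the Hotels table above.

```python
from collections import defaultdict

# (Hid, City, Price, BeachDistance, Rating), copied from the Hotels table.
HOTELS = [
    (1, "Hawaii", 2, 2, 7), (2, "Hawaii", 5, 3, 8), (3, "Hawaii", 4, 1, 5),
    (4, "Hawaii", 5, 5, 3), (5, "Hawaii", 5, 4, 2),
    (6, "Seattle", 3, 3, 8), (7, "Seattle", 3, 1, 5), (8, "Seattle", 2, 3, 5),
    (9, "Seattle", 4, 4, 6), (10, "Seattle", 5, 5, 3),
]


def dominates(a, b):
    """True if hotel a is no worse than b on every preference attribute and
    strictly better on at least one (MIN price, MIN distance, MAX rating)."""
    _, _, price_a, dist_a, rating_a = a
    _, _, price_b, dist_b, rating_b = b
    no_worse = price_a <= price_b and dist_a <= dist_b and rating_a >= rating_b
    better = price_a < price_b or dist_a < dist_b or rating_a > rating_b
    return no_worse and better


def local_pruning(rows):
    """Bucket rows by city, then keep only each bucket's skyline."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[1]].append(row)
    survivors = []
    for bucket in buckets.values():
        survivors += [t for t in bucket if not any(dominates(o, t) for o in bucket)]
    return survivors


local_pref_hotels = [row[0] for row in local_pruning(HOTELS)]
print(local_pref_hotels)  # [1, 2, 3, 6, 7, 8]
```

Hotels 4 and 5 are dominated by hotel 1, and hotels 9 and 10 by hotel 6, so none of them can contribute to a skyline answer once joined with a restaurant from the same city.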


Phase 2: Data Preparation

• Associate dominance metadata with tuples
• Helps to reduce output of join phase

Hotel Buckets

Hawaii

Hid  P  D  R
1    2  2  7
2    5  3  8
3    4  1  5

Seattle

Hid  P  D  R
6    3  3  8
7    3  1  5
8    2  3  5
9    4  4  6
10   5  5  3

Ppairwise fills in the DB column of each LocalPref set. DB names the other buckets (H = Hawaii, S = Seattle) that contain a tuple dominating this one; null means there is none.

Hotel LocalPref Set

Hid  City     P  D  R  DB
1    Hawaii   2  2  7  null
2    Hawaii   5  3  8  S
3    Hawaii   4  1  5  S
6    Seattle  3  3  8  null
7    Seattle  3  1  5  null
8    Seattle  2  3  5  H

Restaurants LocalPref Set

Rid  City     P  R  W  DB
1    Hawaii   5  3  1  S
2    Hawaii   3  4  2  null
3    Hawaii   4  5  4  null
6    Seattle  4  3  1  null
7    Seattle  5  5  4  H
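The DB column can be recomputed from the two LocalPref sets. The sketch below assumes that DB(t) records every other bucket holding at least one tuple that dominates t under skyline dominance; this reading is an assumption, chosen because it reproduces the DB values shown on the slide. The function name `dominating_buckets` is illustrative, and ratings are negated so that smaller is better on every attribute.

```python
# (id, city, attributes) with every attribute oriented so smaller is better
# (ratings negated). Rows are the Hotel and Restaurants LocalPref sets.
HOTEL_LOCAL_PREF = [
    (1, "Hawaii", (2, 2, -7)), (2, "Hawaii", (5, 3, -8)), (3, "Hawaii", (4, 1, -5)),
    (6, "Seattle", (3, 3, -8)), (7, "Seattle", (3, 1, -5)), (8, "Seattle", (2, 3, -5)),
]
RESTAURANT_LOCAL_PREF = [
    (1, "Hawaii", (5, -3, 1)), (2, "Hawaii", (3, -4, 2)), (3, "Hawaii", (4, -5, 4)),
    (6, "Seattle", (4, -3, 1)), (7, "Seattle", (5, -5, 4)),
]


def dominates(a, b):
    """Skyline dominance: no worse everywhere, strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def dominating_buckets(local_pref):
    """Map each tuple id to the set of other cities holding a dominating tuple."""
    db = {}
    for tid, city, attrs in local_pref:
        db[tid] = {
            other_city
            for _, other_city, other_attrs in local_pref
            if other_city != city and dominates(other_attrs, attrs)
        }
    return db


hotel_db = dominating_buckets(HOTEL_LOCAL_PREF)
restaurant_db = dominating_buckets(RESTAURANT_LOCAL_PREF)
print(hotel_db[2], hotel_db[8])  # {'Seattle'} {'Hawaii'}
```

An empty set corresponds to null in the tables; for example, hotel 2 in Hawaii is dominated by hotel 6 in Seattle, so its DB is S.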


Phase 3: Joining

• Join input to produce candidate preference set
  – Use metadata from previous phase as extra join predicate
  – Greatly reduces false positive preference answers

Hotels

Hid  City     P  D  R  DB
1    Hawaii   2  2  7  null
2    Hawaii   5  3  8  S
3    Hawaii   4  1  5  S
6    Seattle  3  3  8  null
7    Seattle  3  1  5  null
8    Seattle  2  3  5  H

Restaurants

Rid  City     P  R  W  DB
1    Hawaii   5  3  1  S
2    Hawaii   3  4  2  null
3    Hawaii   4  5  4  null
6    Seattle  4  3  1  null
7    Seattle  5  5  4  H

• DB set intersection is not null: the pair is not joined
• DB set intersection is null: the pair is joined

Candidate Preference Set

Hid  Rid  City     HP  HD  HR  RP  RR  RW
1    1    Hawaii   2   2   7   5   3   1
1    2    Hawaii   2   2   7   3   4   2
1    3    Hawaii   2   2   7   4   5   4
2    2    Hawaii   5   3   8   3   4   2
2    3    Hawaii   5   3   8   4   5   4
3    2    Hawaii   4   1   5   3   4   2
3    3    Hawaii   4   1   5   4   5   4
6    6    Seattle  3   3   8   4   3   1
6    7    Seattle  3   3   8   5   5   4
7    6    Seattle  3   1   5   4   3   1
7    7    Seattle  3   1   5   5   5   4
8    6    Seattle  2   3   5   4   3   1
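The extra join predicate can be sketched as a set-disjointness test on the DB columns. The reasoning assumed here: if a hotel and a restaurant are both dominated by tuples from the same other bucket, the pair formed by those dominating tuples dominates the joined pair, so the joined pair can be skipped. The function name `candidate_join` is illustrative; the rows and DB values are copied from the Hotels and Restaurants tables of this phase.

```python
# (id, city, DB) from the Phase 3 input tables; null is the empty set,
# "S" = Seattle bucket, "H" = Hawaii bucket.
HOTELS = [
    (1, "Hawaii", set()), (2, "Hawaii", {"S"}), (3, "Hawaii", {"S"}),
    (6, "Seattle", set()), (7, "Seattle", set()), (8, "Seattle", {"H"}),
]
RESTAURANTS = [
    (1, "Hawaii", {"S"}), (2, "Hawaii", set()), (3, "Hawaii", set()),
    (6, "Seattle", set()), (7, "Seattle", {"H"}),
]


def candidate_join(hotels, restaurants):
    """Equijoin on city, keeping a pair only when the two DB sets are disjoint."""
    return [
        (hid, rid)
        for hid, h_city, h_db in hotels
        for rid, r_city, r_db in restaurants
        if h_city == r_city and h_db.isdisjoint(r_db)
    ]


candidates = candidate_join(HOTELS, RESTAURANTS)
print(len(candidates))  # 12
```

A plain equijoin of the two LocalPref sets would produce 15 pairs; the DB test removes (2, 1), (3, 1), and (8, 7), leaving the 12 candidate pairs listed above.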


Phase 4: Refinement

• Apply final preference evaluation to join
• Guarantees correct final preference answer
• Optional phase
  – Skyline does not require refinement phase
  – K-dominance does require refinement phase

Candidate Preference Set → Prefine → Final Preference Answer

Final Preference Answer

Hid  Rid  City     HP  HD  HR  RP  RR  RW
1    1    Hawaii   2   2   7   5   3   1
1    2    Hawaii   2   2   7   3   4   2
1    3    Hawaii   2   2   7   4   5   4
2    2    Hawaii   5   3   8   3   4   2
2    3    Hawaii   5   3   8   4   5   4
3    2    Hawaii   4   1   5   3   4   2
3    3    Hawaii   4   1   5   4   5   4
6    6    Seattle  3   3   8   4   3   1
6    7    Seattle  3   3   8   5   5   4
7    6    Seattle  3   1   5   4   3   1
7    7    Seattle  3   1   5   5   5   4
8    6    Seattle  2   3   5   4   3   1
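To illustrate a refinement step for K-dominance, the sketch below uses the usual definition: p k-dominates q when p is no worse than q on at least k attributes and strictly better on at least one of them. The three candidate tuples are made-up values chosen for readability, not data from the slides, all attributes are oriented so that smaller is better, and the function names are illustrative.

```python
def k_dominates(p, q, k):
    """p k-dominates q: no worse on at least k attributes, strictly better on one."""
    no_worse = sum(x <= y for x, y in zip(p, q))
    strictly_better = any(x < y for x, y in zip(p, q))
    return no_worse >= k and strictly_better


def refine_k_dominance(candidates, k):
    """Keep the candidates that no other candidate k-dominates."""
    return [
        t for t in candidates
        if not any(k_dominates(other, t, k) for other in candidates)
    ]


# Hypothetical candidate set with three attributes per tuple.
CANDIDATES = [(1, 1, 3), (2, 2, 1), (3, 3, 2)]

full = refine_k_dominance(CANDIDATES, k=3)     # k = all attributes: plain skyline
relaxed = refine_k_dominance(CANDIDATES, k=2)  # stricter filter than the skyline
print(full)     # [(1, 1, 3), (2, 2, 1)]
print(relaxed)  # [(1, 1, 3)]
```

With k equal to the number of attributes the filter reduces to the skyline. A smaller k removes more tuples, which is consistent with a K-dominance join needing a final pass over the candidate set while a skyline join does not.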


Performance Analysis

• PrefJoin implemented in PostgreSQL

• Comparison of performance against
  – FlexPref [ICDE10]: generic, extensible join
  – SkylineJoin [ICDE07]: skyline-specific join

Scalability Experiment

• Performance for increasing input sizes
  – Skyline
  – K-Dominance
  – Multi-objective


Varying Number of Preference Attributes

• Increasing number of preference attributes for Skyline preference method

• Increased number of attributes increases preference answer cardinality


Conclusion and Summary

• Many (possibly infinite) preference methods
• Three approaches to supporting preference join queries
  – “On-top” approach: easy but inefficient
  – “Custom implementation” approach: efficient yet infeasible
  – PrefJoin’s “extensible” approach: efficient and feasible
• PrefJoin architecture
  – Four-phase approach
  – Uses three “plug-in” preference functions to determine preference join semantics
• Performance analysis
  – Experiments with PostgreSQL implementation
  – Superior performance compared to existing custom and generic preference join algorithms


Thank You

Questions

Preference Method Examples

SELECT *
FROM Restaurants R
PREFERRING MIN R.Price, MIN R.Distance

Restaurants

Id  Price  Distance
R1  1      9
R2  7      5.5
R3  5      4
R4  6      5
R5  10     1
R6  9      6
R7  11     2
R8  8      10

[Two scatter plots of R1 to R8, with axes Price and Distance]

The Skyline Method: skyline answer {R1, R3, R5}
The Top-K Domination Method: top-K domination answer {R3, R4, R2}
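Both answers can be reproduced from the Restaurants table with a few lines of code. This is a minimal sketch, assuming that top-K domination ranks each restaurant by how many other restaurants it dominates and that K = 3 for the answer shown; the function names are illustrative.

```python
# Id -> (Price, Distance); both attributes are minimized.
RESTAURANTS = {
    "R1": (1, 9), "R2": (7, 5.5), "R3": (5, 4), "R4": (6, 5),
    "R5": (10, 1), "R6": (9, 6), "R7": (11, 2), "R8": (8, 10),
}


def dominates(a, b):
    """a is no worse than b on both attributes and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Ids of the points that no other point dominates."""
    return sorted(
        name for name, p in points.items()
        if not any(dominates(q, p) for q in points.values())
    )


def top_k_domination(points, k):
    """Ids of the k points that dominate the most other points."""
    score = {
        name: sum(dominates(p, q) for q in points.values())
        for name, p in points.items()
    }
    return sorted(score, key=lambda name: -score[name])[:k]


print(skyline(RESTAURANTS))              # ['R1', 'R3', 'R5']
print(top_k_domination(RESTAURANTS, 3))  # ['R3', 'R4', 'R2']
```

R3 dominates four restaurants, R4 three, and R2 two, which gives the top-3 order shown. R1 and R5 are skyline members yet each dominates only one restaurant, so the two methods return different answers on the same data.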
