Creating Competitive Products

Qian Wan [1], Raymond Chi-Wing Wong [1], Ihab F. Ilyas [2], M. Tamer Ozsu [2], Yu Peng [1]

[1] Hong Kong University of Science and Technology
[2] University of Waterloo

Presented by Qian Wan
Prepared by Qian Wan


Outline

• Background
  – Skyline, Related Work
• Motivation
  – Examples, Problem Definition
• Algorithm
  – Framework, Grouping, Pruning
• Experiments
  – Synthetic, Real data
  – 6 factors
• Conclusions

Skyline

• Definition
  – The skyline contains the points which are not dominated by others
• Hotel searching problem
  – Distance to beach vs. price
  – Dominance
  – Skyline

[Figure: hotels H1 to H9 plotted by distance to beach (Dist) against price (Price); a second panel with the same axes shows only H1 and H2.]
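The dominance test and the skyline definition on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the coordinates are made up (the values in the slide's figure are not recoverable), and smaller values are assumed to be better on every attribute.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q: no worse on every attribute, strictly better on at least one.

    Assumption: smaller values are preferred on every attribute.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Point]) -> List[Point]:
    """Quadratic-time skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(o, p) for o in points)]


# Hypothetical (distance-to-beach, price) pairs, not taken from the slides.
hotels = [(1, 9), (2, 6), (4, 3), (7, 2), (3, 8), (5, 5), (8, 4)]
hotel_skyline = skyline(hotels)
print(hotel_skyline)
```

The nested scan is the simplest possible formulation; the single-table algorithms cited under Related Work exist precisely to avoid this quadratic cost.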

Related Work

• Skyline Queries in DBMS [S. Borzsonyi, 2001]
• Single-Table Skyline Queries
  – Bitmaps [K.L. Tan, 2001], Nearest Neighbor [D. Kossmann, 2002], Branch and Bound Skylines [D. Papadias, 2005]
• Multi-Table Skyline Queries
  – Natural Join [W. Jin, 2007] [D. Sun, 2008]
  – Our Work
    • Join different source tables via a “Cartesian product”-like procedure.

A Travel Agency’s Database

Source Tables ($T_1$, $T_2$)

Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100

Existing Vacation Packages ($T_E$)

Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150
P4       1            150                4            300

Newly Created Vacation Packages ($T_Q$)

Package      No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)    0            100                3            220
Q2(F1,H2)    0            200                2            210
Q3(F1,H3)    0            400                1            200
…            …            …                  …            …
Q24(F4,H6)   2            200                3            210

Skyline tuples

1. Direct attributes
2. Indirect attributes
3. One indirect attribute characteristic, e.g. Travel Agency (Price), PC Manufacturer (Price) and Logistics Transportation Service (Price)
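How a new package is derived from the source tables can be sketched as follows. The direct attributes (number of stops, distance to beach, hotel class) are copied unchanged, and the single indirect attribute, price, is assumed here to be the sum of the flight cost and the hotel cost, which matches Q1 to Q3 in the table above. The function name `make_package` is illustrative only.

```python
from itertools import product

# Source tables as shown on this slide.
flights = {"F1": (0, 120), "F2": (1, 100)}  # (no_of_stops, flight_cost)
hotels = {                                   # (distance, hotel_class, hotel_cost)
    "H1": (100, 3, 100),
    "H2": (200, 2, 90),
    "H3": (400, 1, 80),
}


def make_package(flight, hotel):
    """Combine one flight and one hotel into a package tuple.

    Direct attributes are copied; the indirect attribute (price) is
    assumed to be the sum of the two costs.
    """
    stops, flight_cost = flight
    distance, hotel_class, hotel_cost = hotel
    return (stops, distance, hotel_class, flight_cost + hotel_cost)


# Cartesian-product-like join of the two source tables.
packages = {
    (f, h): make_package(flights[f], hotels[h])
    for f, h in product(flights, hotels)
}
for key, pkg in packages.items():
    print(key, pkg)
```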

Finding Competitive Products

• Given a set of source tables $T_1, T_2, \ldots, T_k$
• Market packages $T_E$
• New packages $T_Q$
• Then, a tuple $q$ in $T_Q$ is said to be a competitive product if $q$ is in the skyline with respect to $T_E \cup T_Q$

Naïve Solution

Source Tables

Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80
H4     150                2            150
H5     170                2            140
H6     200                3            120

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100
F3      2            80
F4      2            90

Newly Created Vacation Packages

Package      No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(f1,h1)    0            100                3            220
Q2(f1,h2)    0            200                2            210
Q3(f1,h3)    0            400                1            200
…            …            …                  …            …
Q7(f2,h1)    1            100                3            200
…            …            …                  …            …
Q13(f3,h1)   2            100                3            180
…            …            …                  …            …
Q24(f4,h6)   2            200                3            210

Existing Vacation Packages

Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150
P4       1            150                4            300

1. Intra-dominance checking
2. Inter-dominance checking

Competitive Products

Package      No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(f1,h1)    0            100                3            220
Q2(f1,h2)    0            200                2            210
Q3(f1,h3)    0            400                1            200
…            …            …                  …            …
Q7(f2,h1)    1            100                3            200
…            …            …                  …            …
Q13(f3,h1)   2            100                3            180
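The naïve solution on this slide can be sketched as below: generate every flight and hotel combination, then run both dominance checks on each one. Two assumptions are made that the slide does not state explicitly: smaller values are preferred on every attribute (including hotel class), and price is the sum of the two costs. Under those assumptions the slide's tables yield the same five packages listed under Competitive Products.

```python
from itertools import product


def dominates(p, q):
    """p is no worse than q everywhere and strictly better somewhere (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


flights = {"f1": (0, 120), "f2": (1, 100), "f3": (2, 80), "f4": (2, 90)}
hotels = {
    "h1": (100, 3, 100), "h2": (200, 2, 90), "h3": (400, 1, 80),
    "h4": (150, 2, 150), "h5": (170, 2, 140), "h6": (200, 3, 120),
}
existing = {
    "P1": (0, 130, 2, 250), "P2": (1, 140, 2, 170),
    "P3": (1, 300, 1, 150), "P4": (1, 150, 4, 300),
}

# Generate all 24 combinations, numbered Q1..Q24 in flight-major order.
new_packages = {}
for i, (f, h) in enumerate(product(flights, hotels), start=1):
    stops, flight_cost = flights[f]
    distance, hotel_class, hotel_cost = hotels[h]
    new_packages[f"Q{i}({f},{h})"] = (stops, distance, hotel_class, flight_cost + hotel_cost)


def naive_competitive(new, market):
    """Keep the new packages that survive both dominance checks."""
    result = []
    for name, q in new.items():
        intra = any(dominates(o, q) for o in new.values())     # vs. other new packages
        inter = any(dominates(p, q) for p in market.values())  # vs. existing packages
        if not intra and not inter:
            result.append(name)
    return result


competitive = naive_competitive(new_packages, existing)
print(competitive)
```

The cost is quadratic in the number of combinations, which itself is the product of the source-table sizes; the rest of the deck is about avoiding that blow-up.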

Algorithm Overview

• Intra-dominance checking (Framework)
  – To find the skyline in the source tables
• Inter-dominance checking
  – Skyline in existing market packages
  – R*-tree index on existing market packages
  – Full pruning
  – Partial pruning
• Post-processing

Intra-dominance Checking

Source Tables

Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80
H4     150                2            150
H5     170                2            140
H6     200                3            120

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100
F3      2            80
F4      2            90

Skyline Tuples of Source Tables

Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80
H4     150                2            150
H5     170                2            140

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100
F3      2            80

Newly Created Vacation Packages

Package      No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(f1,h1)    0            100                3            220
Q2(f1,h2)    0            200                2            210
Q3(f1,h3)    0            400                1            200
…            …            …                  …            …
Q7(f2,h1)    1            100                3            200
…            …            …                  …            …
Q13(f3,h1)   2            100                3            180
…            …            …                  …            …
Q15(f3,h5)   2            170                3            200

1. NO intra-dominance checking (one indirect attribute)
2. NO competitive products are missing

Competitive Products

Package      No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(f1,h1)    0            100                3            220
Q2(f1,h2)    0            200                2            210
Q3(f1,h3)    0            400                1            200
…            …            …                  …            …
Q7(f2,h1)    1            100                3            200
…            …            …                  …            …
Q13(f3,h1)   2            100                3            180
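The preprocessing step on this slide can be sketched as: take the skyline of each source table first, then combine only the surviving tuples. The sketch below assumes smaller values are preferred on every attribute and that price is the sum of the two costs. It also checks, for this particular data, the slide's claim that no intra-dominance checking is needed among the resulting combinations when there is one indirect attribute.

```python
from itertools import product


def dominates(p, q):
    """p is no worse than q everywhere and strictly better somewhere (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(table):
    """Return the rows of a {name: tuple} table that no other row dominates."""
    return {k: v for k, v in table.items()
            if not any(dominates(o, v) for o in table.values())}


flights = {"F1": (0, 120), "F2": (1, 100), "F3": (2, 80), "F4": (2, 90)}
hotels = {
    "H1": (100, 3, 100), "H2": (200, 2, 90), "H3": (400, 1, 80),
    "H4": (150, 2, 150), "H5": (170, 2, 140), "H6": (200, 3, 120),
}

sky_flights = skyline(flights)  # F4 is dominated by F3
sky_hotels = skyline(hotels)    # H6 is dominated by H1

# Combine only the skyline tuples of the source tables.
candidates = {}
for f, h in product(sky_flights, sky_hotels):
    stops, flight_cost = sky_flights[f]
    distance, hotel_class, hotel_cost = sky_hotels[h]
    candidates[(f, h)] = (stops, distance, hotel_class, flight_cost + hotel_cost)

# Sanity check: no candidate dominates another candidate.
intra_dominated = [k for k, q in candidates.items()
                   if any(dominates(o, q) for o in candidates.values())]
print(list(sky_flights), list(sky_hotels), len(candidates), intra_dominated)
```

Here the candidate set shrinks from 24 to 15 combinations before any inter-dominance checking is done.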

Inter-dominance Checking

Existing Vacation Packages

Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150
P4       1            150                4            300

Skyline in Existing Vacation Packages

Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150

No Missing Competitive Products

R* Tree will speed up the inter-dominance checking

[Figure: R*-tree over the existing packages with nodes R0 to R5; inter-dominance checking is performed as a range query.]
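A small sketch of the two ideas on this slide: reduce the existing packages to their skyline, then test each new package with a range query over its dominance region (the box between the origin and the package). A linear scan stands in for the R*-tree here, so this shows only the logic, not the index; smaller values are assumed to be better on every attribute.

```python
def dominates(p, q):
    """p is no worse than q everywhere and strictly better somewhere (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(table):
    """Return the rows of a {name: tuple} table that no other row dominates."""
    return {k: v for k, v in table.items()
            if not any(dominates(o, v) for o in table.values())}


existing = {
    "P1": (0, 130, 2, 250), "P2": (1, 140, 2, 170),
    "P3": (1, 300, 1, 150), "P4": (1, 150, 4, 300),
}
sky_existing = skyline(existing)  # P4 is dominated by P2 and drops out


def is_inter_dominated(q, market):
    """Range query over the box [origin, q]: does any market package dominate q?

    A real implementation would answer this with an R*-tree; the linear
    scan below is a stand-in with the same result.
    """
    return any(dominates(p, q) for p in market.values())


q_f2_h2 = (1, 200, 2, 190)  # dominated by P2
q_f1_h1 = (0, 100, 3, 220)  # not dominated by any existing package
print(list(sky_existing), is_inter_dominated(q_f2_h2, sky_existing),
      is_inter_dominated(q_f1_h1, sky_existing))
```

Checking against the skyline gives the same answer as checking against the full table, because any package dominated by P4 is also dominated by whatever dominates P4.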

Grouping

Existing Vacation Packages

Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150

Skyline Tuples of Source Tables

Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80
H4     150                2            150
H5     170                2            140

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100
F3      2            80

Newly Created Vacation Packages

Package      No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(f1,h1)    0            100                3            220
Q2(f1,h2)    0            200                2            210
Q3(f1,h3)    0            400                1            200
…            …            …                  …            …
Q7(f2,h1)    1            100                3            200
…            …            …                  …            …
Q13(f3,h1)   2            100                3            180
…            …            …                  …            …
Q15(f3,h5)   2            170                3            200

Competitive Products

Package      No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(f1,h1)    0            100                3            220
Q2(f1,h2)    0            200                2            210
Q3(f1,h3)    0            400                1            200
…            …            …                  …            …
Q7(f2,h1)    1            100                3            200
…            …            …                  …            …
Q13(f3,h1)   2            100                3            180

Groups in the source tables: A1, A2, B1, B2

C1 = {A1, B1}
C4 = {A2, B2}

Full Pruning

Full Pruning

Existing Vacation Packages

Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150

Group  Best Representative
C1     B1
C2     B2
…      …
Ci     Bi
…      …
Cj     Bj
…      …
Ck     Bk

Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q(f2,h4)    1            150                4            250
Q’(f2,h5)   1            170                4            240

Best Representative

Package  No-of-stops  Distance-to-beach  Hotel-class  Price
Min      1            150                4            240

Quality of Best Representative: tightness of each group (Clustering, e.g. KMeans)
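The best-representative idea on this slide can be sketched as follows: take the attribute-wise minimum over a group's members as a virtual tuple, and if some existing package dominates that tuple, discard the whole group without examining its members. The sketch assumes smaller values are better on every attribute and uses the two group members and the existing packages shown above; `best_representative` is an illustrative name.

```python
def dominates(p, q):
    """p is no worse than q everywhere and strictly better somewhere (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def best_representative(group):
    """Attribute-wise minimum over the group: a virtual tuple at least as good as any member."""
    return tuple(min(column) for column in zip(*group))


existing_skyline = [(0, 130, 2, 250), (1, 140, 2, 170), (1, 300, 1, 150)]  # P1, P2, P3
group = [(1, 150, 4, 250), (1, 170, 4, 240)]                               # Q and Q' from the slide

rep = best_representative(group)
fully_pruned = any(dominates(p, rep) for p in existing_skyline)
print(rep, fully_pruned)
```

The test is safe because the representative is at least as good as every member on every attribute, so anything that dominates it dominates each member as well. It is also conservative: a loose group gives a representative that is hard to dominate, which is why the slide ties pruning quality to the tightness of each group.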

Partial Pruning

• Partial Pruning
  – Full pruning prunes all members in the group
  – Partial pruning prunes some members in the group
  – Partial pruning is used when full pruning cannot be applied
• Idea
  – Direct attributes do not change
  – Estimate the best possible value for the indirect attributes
  – Eliminate a combination if
    • It is dominated on all direct attributes
    • It is dominated on all indirect attributes according to their best estimation

Post-processing

• More than one indirect attribute
  – Calculation
• Previous algorithm → intra-dominance checking
  – Any existing skyline algorithm
  – Post-processing cost depends on the size of the set of competitive products

Experiments

• Pentium IV 2.4GHz PC with 4GB memory, Linux platform, C++
• Synthetic anti-correlated datasets
• Real datasets, Travel Agency A and Travel Agency B
  – A: 296 packages, 1014 hotels and 4394 flights
  – B: 149 packages, 995 hotels and 866 flights
• Implementation
  – Algorithm for Creating Competitive Products (ACCP)
  – Baseline algorithm
  – Naïve algorithm

Algorithm  Preprocessing  R* Tree  Pruning
ACCP       Yes            Yes      Yes
Baseline   Yes            Yes      No
Naïve      No             No       No

Synthetic Datasets

Parameters                                     Default value
No. of attributes in each source table         4
No. of indirect attributes in a product table  1
No. of source tables                           2
No. of clusters in each source table           2
Size of existing packages                      5M
Size of each source table                      100k

• Schema is the same as in the example
• Anti-correlated
• 6 factors
• Measurement
  – Execution time
  – Pruning power
  – Ratio of competitive products out of all combinations
  – Memory usage

Experiments

Parameters                                     Execution time  Pruning Power  Ratio of Competitive Products  Memory Usage
No. of attributes in each source table         1               2              3                              4
No. of indirect attributes in a product table  5               6              7                              8
No. of source tables                           9               10             11                             12
No. of clusters in each source table           13              14             15                             16
Size of existing packages                      17              18             19                             20
Size of each source table                      21              22             23                             24

Experiments

From 100k to 500k

Full pruning & partial pruning

TQ, TQ’, and TR

Pruning power slightly increases

Parameters                                     Default value
No. of attributes in each source table         4
No. of indirect attributes in a product table  1
No. of source tables                           2
No. of clusters in each source table           6
Size of existing packages                      5M
Size of each source table                      100k

Conclusions

• Creating Competitive Products
  – Example
  – Problem Definition
• Algorithms
  – Framework
  – Intra-dominance checking
  – Inter-dominance checking
  – Post-processing
• Experiments
  – Synthetic anti-correlated datasets
  – Real datasets

THANK YOU!

Q&A

APPENDIX

Partial Pruning

Existing Vacation Packages

Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150

Skyline Tuples of Source Tables

Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80
H4     150                2            150
H5     170                2            140

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100
F3      2            80

Newly Created Vacation Packages

Package      No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(f1,h1)    0            100                3            220
Q2(f1,h2)    0            200                2            210
Q3(f1,h3)    0            400                1            200
…            …            …                  …            …
Q7(f2,h1)    1            100                3            200
…            …            …                  …            …
Q13(f3,h1)   2            100                3            180
…            …            …                  …            …
Q15(f3,h5)   2            170                3            200

Competitive Products

Package      No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(f1,h1)    0            100                3            220
Q2(f1,h2)    0            200                2            210
Q3(f1,h3)    0            400                1            200
…            …            …                  …            …
Q7(f2,h1)    1            100                3            200
…            …            …                  …            …
Q13(f3,h1)   2            100                3            180

Groups in the source tables: A1, B1

C1 = {A1, B1}

Full Pruning

Meta Transformation

Existing Vacation Packages

Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150

Existing package P2

Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P2       1            140                2            170

P2 restricted to the flight attributes

Package  No-of-stops  Price
P2       1            170

P2 restricted to the hotel attributes

Package  Distance-to-beach  Hotel-class  Price
P2       140                2            170

Source table groups (A1, B1)

Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80
Min    400                1            80

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100
Min     1            100

Meta-Hotel

Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            200
H2     200                2            190
H3     400                1            180

Meta-Flight

Flight  No-of-stops  Flight-cost
F1      0            200
F2      1            180

• No inter-dominance checking for {F2} × {H2}
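One way to read the meta transformation on this slide is sketched below; it is an interpretation of the tables, not the authors' code. Each source tuple's cost is replaced by the best price any combination containing it could reach (its own cost plus the minimum cost in the other table), and a combination is skipped when a single existing package dominates both its meta-flight and its meta-hotel. Smaller values are assumed to be better on every attribute, and the helper name `can_skip` is hypothetical.

```python
def dominates(p, q):
    """p is no worse than q everywhere and strictly better somewhere (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


flights = {"F1": (0, 120), "F2": (1, 100)}                              # (stops, cost)
hotels = {"H1": (100, 3, 100), "H2": (200, 2, 90), "H3": (400, 1, 80)}  # (distance, class, cost)
existing = {"P1": (0, 130, 2, 250), "P2": (1, 140, 2, 170), "P3": (1, 300, 1, 150)}

min_flight_cost = min(v[-1] for v in flights.values())  # 100, the "Min" flight row
min_hotel_cost = min(v[-1] for v in hotels.values())    # 80, the "Min" hotel row

# Meta tables: cost becomes the best price reachable with that tuple.
meta_flights = {f: (stops, cost + min_hotel_cost) for f, (stops, cost) in flights.items()}
meta_hotels = {h: (dist, cls, cost + min_flight_cost) for h, (dist, cls, cost) in hotels.items()}


def can_skip(f, h, package):
    """True if `package` dominates both the meta-flight and the meta-hotel of (f, h)."""
    stops, dist, cls, price = package
    return (dominates((stops, price), meta_flights[f])
            and dominates((dist, cls, price), meta_hotels[h]))


skipped = [(f, h) for f in flights for h in hotels
           if any(can_skip(f, h, p) for p in existing.values())]
print(meta_flights, meta_hotels, skipped)
```

With P2 alone this reproduces the slide's note that {F2} × {H2} needs no inter-dominance checking; scanning all three existing packages additionally skips (F2, H3) because of P3.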

Experiments

From 2.5M to 10M

Parameters                                     Default value
No. of attributes in each source table         4
No. of indirect attributes in a product table  1
No. of source tables                           2
No. of clusters in each source table           6
Size of existing packages                      5M
Size of each source table                      100k

More competitive

Slightly decreases

Experiments

Travel Agency A Package Generation Set

1. A: 296 packages, 1014 hotels and 4394 flights. B: 149 packages, 995 hotels and 866 flights
2. Source tables from B, and packages from A
3. Vary discount from 0 to 0.50
4. Efficiency: ACCP (44.74s) and Baseline (84.47s)
5. $|SKY| / |T_Q|$
6. $|DOM| / |T_E|$

[Figure legend: DOM, SKY]