1
Creating Competitive Products
Qian Wan[1], Raymond Chi-Wing Wong[1], Ihab F. Ilyas[2], M. Tamer Ozsu[2], Yu Peng[1]
[1] Hong Kong University of Science and Technology
[2] University of Waterloo
Presented by Qian Wan
Prepared by Qian Wan
2
Outline
• Background
  – Skyline, Related Work
• Motivation
  – Examples, Problem Definition
• Algorithm
  – Framework, Grouping, Pruning
• Experiments
  – Synthetic, Real data
  – 6 factors
• Conclusions
3
Skyline
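• Definition
  – The skyline contains the points that are not dominated by any other point
  – A point p dominates a point q if p is no worse than q on every attribute and strictly better on at least one
• Hotel searching problem
  – Distance to beach vs. Price
  – Dominance
  – Skyline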
[Figure: hotels H1 to H9 plotted by Dist (distance to beach) against Price, illustrating dominance and the skyline; a second panel shows only H1 and H2]
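The dominance test and the skyline definition above can be sketched in Python. This is a minimal, naive O(n²) sketch, assuming smaller values are preferred on every attribute (closer to the beach, cheaper). The function names are illustrative, and the sample points are hypothetical: they reuse the Distance-to-beach and Hotel-cost columns of the hotel table in the running example.

```python
def dominates(p, q):
    """True if p is no worse than q on every attribute and strictly better on
    at least one (assumption: smaller values are preferred everywhere)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Naive skyline: keep every point that no other point dominates.
    A point never dominates itself, so no self-exclusion is needed."""
    return [name for name, p in points.items()
            if not any(dominates(q, p) for q in points.values())]


# Illustrative hotels as (distance to beach, price).
hotels = {
    "H1": (100, 100), "H2": (200, 90), "H3": (400, 80),
    "H4": (150, 150), "H5": (170, 140), "H6": (200, 120),
}
print(skyline(hotels))  # ['H1', 'H2', 'H3']
```

H4, H5 and H6 drop out because H1 is both closer to the beach and cheaper than each of them.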
4
Related Work
• Skyline Queries in DBMS [S. Borzsonyi, 2001]
• Single-Table Skyline Queries
  – Bitmaps [K.L. Tan, 2001], Nearest Neighbor [D. Kossmann, 2002], Branch and Bound Skylines [D. Papadias, 2005]
• Multi-Table Skyline Queries
  – Natural Join [W. Jin, 2007] [D. Sun, 2008]
  – Our Work
    • Joins different source tables via a “Cartesian product”-like procedure
6
A Travel Agency’s Database

Source Tables ($T_1$, $T_2$)
Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100

Existing Vacation Packages ($T_E$)
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150
P4       1            150                4            300

Newly Created Vacation Packages ($T_Q$)
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q24(F4,H6)  2            200                3            210

1. Direct attributes (No-of-stops, Distance-to-beach, Hotel-class) are taken from a single source tuple.
2. Indirect attributes (Price) are computed from several source tuples: Price = Hotel-cost + Flight-cost.
3. Typical setting: one indirect attribute, e.g. travel agency (Price), PC manufacturer (Price), and logistics/transportation service (Price).
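The “Cartesian product”-like construction of new packages can be sketched as follows, assuming (as the numbers on this slide indicate) that direct attributes are copied from the chosen hotel and flight and that Price = Hotel-cost + Flight-cost. The function name `create_packages` is hypothetical.

```python
from itertools import product

# Source tables from the slide.
hotels = {"H1": (100, 3, 100), "H2": (200, 2, 90), "H3": (400, 1, 80)}  # (distance, class, cost)
flights = {"F1": (0, 120), "F2": (1, 100)}                                # (stops, cost)


def create_packages(flights, hotels):
    """Combine every flight with every hotel.
    Direct attributes are copied; the indirect attribute Price is the sum of costs."""
    packages = {}
    for (f, (stops, f_cost)), (h, (dist, cls, h_cost)) in product(flights.items(), hotels.items()):
        packages[f"({f},{h})"] = (stops, dist, cls, f_cost + h_cost)
    return packages


t_q = create_packages(flights, hotels)
for name, attrs in t_q.items():
    print(name, attrs)
# First line printed: (F1,H1) (0, 100, 3, 220), i.e. Q1 on the slide
```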
7
Finding Competitive Products
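• Given a set of source tables $T_1, T_2, \ldots, T_k$
• Market packages (existing products) $T_E$
• New packages $T_Q$, each combining one tuple from every source table
• Then, a tuple $q$ in $T_Q$ is said to be a competitive product if $q$ is in the skyline with respect to $T_E \cup T_Q$.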
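Written out formally, assuming (as in the running example) that smaller values are preferred on every attribute, dominance and the set of competitive products are as follows; the symbol $\prec$ (“dominates”) and the name Competitive are our notation, not the slides’.

```latex
p \prec q \iff \big(\forall i:\ p_i \le q_i\big) \wedge \big(\exists j:\ p_j < q_j\big)

\mathrm{Competitive}(T_E, T_Q) = \{\, q \in T_Q \mid \nexists\, t \in T_E \cup T_Q :\ t \prec q \,\}
```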
8
Naïve Solution
Source Tables
Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80
H4     150                2            150
H5     170                2            140
H6     200                3            120

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100
F3      2            80
F4      2            90

Existing Vacation Packages
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150
P4       1            150                4            300

Newly Created Vacation Packages
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q7(F2,H1)   1            100                3            200
…           …            …                  …            …
Q13(F3,H1)  2            100                3            180
…           …            …                  …            …
Q24(F4,H6)  2            200                3            210

1. Intra-dominance checking
2. Inter-dominance checking

Competitive Products
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q7(F2,H1)   1            100                3            200
…           …            …                  …            …
Q13(F3,H1)  2            100                3            180
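The naïve solution can be sketched end to end on the full example above. This is a brute-force sketch, assuming smaller is better on all four package attributes and Price = Hotel-cost + Flight-cost; packages are numbered flight-major so that Q7 = (F2,H1) and Q13 = (F3,H1), as on the slide. The function name `naive_competitive` is hypothetical.

```python
from itertools import product


def dominates(p, q):
    """p dominates q: no worse on every attribute, strictly better on one (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


hotels = {"H1": (100, 3, 100), "H2": (200, 2, 90), "H3": (400, 1, 80),
          "H4": (150, 2, 150), "H5": (170, 2, 140), "H6": (200, 3, 120)}  # (distance, class, cost)
flights = {"F1": (0, 120), "F2": (1, 100), "F3": (2, 80), "F4": (2, 90)}  # (stops, cost)
existing = {"P1": (0, 130, 2, 250), "P2": (1, 140, 2, 170),
            "P3": (1, 300, 1, 150), "P4": (1, 150, 4, 300)}             # (stops, distance, class, price)


def naive_competitive(flights, hotels, existing):
    # Generate all |flights| x |hotels| packages, numbered flight-major.
    t_q = {}
    pairs = product(flights.items(), hotels.items())
    for i, ((f, (stops, f_cost)), (h, (dist, cls, h_cost))) in enumerate(pairs, start=1):
        t_q[f"Q{i}({f},{h})"] = (stops, dist, cls, f_cost + h_cost)
    # 1. Intra-dominance checking: keep packages that no other new package dominates.
    sky_q = {n: q for n, q in t_q.items()
             if not any(dominates(o, q) for o in t_q.values())}
    # 2. Inter-dominance checking: drop packages that an existing package dominates.
    return [n for n, q in sky_q.items()
            if not any(dominates(p, q) for p in existing.values())]


result = naive_competitive(flights, hotels, existing)
print(result)  # ['Q1(F1,H1)', 'Q2(F1,H2)', 'Q3(F1,H3)', 'Q7(F2,H1)', 'Q13(F3,H1)']
```

It reproduces the five competitive products listed on the slide. Both checks compare packages pairwise over all 24 combinations, which is the cost the later steps try to avoid.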
10
Algorithm Overview
• Intra-dominance checking (Framework)
  – To find the skyline in the source tables
• Inter-dominance checking
  – Skyline in existing market packages
  – R*-tree indices on existing market packages
  – Full pruning
  – Partial pruning
• Post-processing
11
Intra-dominance Checking
Source Tables
Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80
H4     150                2            150
H5     170                2            140
H6     200                3            120

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100
F3      2            80
F4      2            90

Skyline Tuples of Source Tables
Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80
H4     150                2            150
H5     170                2            140

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100
F3      2            80

Newly Created Vacation Packages (combinations of skyline tuples only)
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q7(F2,H1)   1            100                3            200
…           …            …                  …            …
Q13(F3,H1)  2            100                3            180
…           …            …                  …            …
Q15(F3,H5)  2            170                2            220

1. No intra-dominance checking is needed among the new packages (one indirect attribute): a package containing a dominated source tuple is itself dominated, and packages built only from skyline tuples cannot dominate one another.
2. No competitive products are missed.

Competitive Products
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q7(F2,H1)   1            100                3            200
…           …            …                  …            …
Q13(F3,H1)  2            100                3            180
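A sketch of the intra-dominance step on the same data, assuming smaller is better on every attribute (including cost) within each source table and Price = Flight-cost + Hotel-cost: compute each source table’s skyline, combine only skyline tuples, and check that, with a single summed indirect attribute, none of the resulting packages dominates another. The names `skyline` and `t_q_prime` are illustrative.

```python
from itertools import product


def dominates(p, q):
    """p dominates q: no worse on every attribute, strictly better on one (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(table):
    """Keep the tuples of a {name: attributes} table that no other tuple dominates."""
    return {n: t for n, t in table.items()
            if not any(dominates(o, t) for o in table.values())}


hotels = {"H1": (100, 3, 100), "H2": (200, 2, 90), "H3": (400, 1, 80),
          "H4": (150, 2, 150), "H5": (170, 2, 140), "H6": (200, 3, 120)}
flights = {"F1": (0, 120), "F2": (1, 100), "F3": (2, 80), "F4": (2, 90)}

sky_h = skyline(hotels)   # H6 drops out (dominated by H2, among others)
sky_f = skyline(flights)  # F4 drops out (dominated by F3)

# Combine only skyline tuples; Price = flight cost + hotel cost.
t_q_prime = {f"({f},{h})": (s, d, c, fc + hc)
             for (f, (s, fc)), (h, (d, c, hc)) in product(sky_f.items(), sky_h.items())}

print(sorted(sky_h), sorted(sky_f), len(t_q_prime))
# With one summed indirect attribute, the skyline of t_q_prime is t_q_prime itself.
print(skyline(t_q_prime) == t_q_prime)
```

The second print confirms on this data that skipping intra-dominance checking loses nothing; the argument relies on there being only one indirect attribute.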
13
Inter-dominance Checking
Existing Vacation Packages
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150
P4       1            150                4            300

Skyline in Existing Vacation Packages
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150

• Only the skyline of the existing packages is needed, so no competitive products are missed: P4 is dominated by P2, and any package dominated by P4 is also dominated by P2.
• An R*-tree on the existing packages speeds up inter-dominance checking; each check is a range query.
[Figure: R*-tree over the existing packages (nodes R0 to R5); inter-dominance checking as a range query]
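A sketch of inter-dominance checking as a range query, assuming smaller is better on every attribute: a new package q is dominated exactly when some existing skyline package lies in the box bounded above by q and differs from q. A linear scan stands in for the R*-tree here; `range_query` and `is_dominated_by_market` are hypothetical helpers, not an R*-tree library API.

```python
def dominates(p, q):
    """p dominates q: no worse on every attribute, strictly better on one (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


existing = {"P1": (0, 130, 2, 250), "P2": (1, 140, 2, 170),
            "P3": (1, 300, 1, 150), "P4": (1, 150, 4, 300)}

# Keep only the skyline of the existing packages (P4 is dominated by P2).
market_sky = [p for p in existing.values()
              if not any(dominates(o, p) for o in existing.values())]


def range_query(points, upper):
    """Points inside the box (-inf, upper] on every dimension.
    Linear scan standing in for an R*-tree range query."""
    return [p for p in points if all(a <= b for a, b in zip(p, upper))]


def is_dominated_by_market(q, market):
    # Every point in the box is <= q everywhere; any such point other than q dominates q.
    return any(p != q for p in range_query(market, q))


print(is_dominated_by_market((0, 100, 3, 220), market_sky))  # Q1: False
print(is_dominated_by_market((1, 200, 2, 190), market_sky))  # (F2,H2): True, via P2
```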
15
Grouping
Existing Vacation Packages (skyline)
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150

Skyline Tuples of Source Tables
Hotel  Distance-to-beach  Hotel-class  Hotel-cost   (grouped into A1, A2)
H1     100                3            100
H2     200                2            90
H3     400                1            80
H4     150                2            150
H5     170                2            140

Flight  No-of-stops  Flight-cost   (grouped into B1, B2)
F1      0            120
F2      1            100
F3      2            80

Combined groups: C1 = {A1, B1}, …, C4 = {A2, B2}; a combined group may be eliminated as a whole by full pruning.

Newly Created Vacation Packages
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q7(F2,H1)   1            100                3            200
…           …            …                  …            …
Q13(F3,H1)  2            100                3            180
…           …            …                  …            …
Q15(F3,H5)  2            170                2            220

Competitive Products
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q7(F2,H1)   1            100                3            200
…           …            …                  …            …
Q13(F3,H1)  2            100                3            180
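A sketch of grouping under a hypothetical grouping rule: each source table’s skyline is sorted on one attribute and cut into contiguous groups, a simple stand-in for the clustering (e.g. k-means) the slides mention. Combined groups are then every pairing of one hotel group with one flight group, which yields C1 = {A1, B1} through C4 = {A2, B2} for two groups per table. The helper `partition` is hypothetical.

```python
from itertools import product


def partition(table, k, key):
    """Hypothetical grouping rule: sort tuple names by attribute `key` and cut the
    sorted list into at most k contiguous groups (stand-in for clustering)."""
    names = sorted(table, key=lambda n: table[n][key])
    size = -(-len(names) // k)  # ceiling division
    return [names[i:i + size] for i in range(0, len(names), size)]


sky_hotels = {"H1": (100, 3, 100), "H2": (200, 2, 90), "H3": (400, 1, 80),
              "H4": (150, 2, 150), "H5": (170, 2, 140)}
sky_flights = {"F1": (0, 120), "F2": (1, 100), "F3": (2, 80)}

hotel_groups = partition(sky_hotels, 2, key=0)    # A1, A2, split by distance to beach
flight_groups = partition(sky_flights, 2, key=0)  # B1, B2, split by number of stops

# Combined groups: every pairing of one hotel group with one flight group.
combined = {f"C{i}": (a, b)
            for i, (a, b) in enumerate(product(hotel_groups, flight_groups), start=1)}
for name, (a, b) in combined.items():
    print(name, a, b)
```

Whatever grouping rule is used, the combined groups together cover every package built from skyline tuples exactly once (15 here), so pruning a whole group never skips an unchecked package.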
16
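Full Pruning
Existing Vacation Packages (skyline)
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150

Groups C1, C2, …, Ci, …, Cj, …, Ck with best representatives B1, B2, …, Bi, …, Bj, …, Bk

Example: two packages of one group
Package    No-of-stops  Distance-to-beach  Hotel-class  Price
Q(F2,H4)   1            150                4            250
Q'(F2,H5)  1            170                4            240

Best representative (attribute-wise minimum)
Package    No-of-stops  Distance-to-beach  Hotel-class  Price
Min        1            150                4            240

• If an existing package dominates the best representative of a group, it dominates every member of that group, so the whole group is pruned (here P2 = (1, 140, 2, 170) dominates Min).
• Quality of the best representative: tightness of each group (clustering, e.g. k-means).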
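Full pruning can be sketched with the example on this slide, assuming smaller is better on every attribute. The best representative is taken as the attribute-wise minimum over the group’s packages (the Min row); since every member is no better than it on any attribute, an existing package that dominates the representative dominates the whole group. The function names are hypothetical.

```python
def dominates(p, q):
    """p dominates q: no worse on every attribute, strictly better on one (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def best_representative(members):
    """Attribute-wise minimum over a group's packages (the Min row)."""
    return tuple(min(column) for column in zip(*members))


def full_prune(members, market_skyline):
    """Return (representative, pruned). Every member is >= the representative on all
    attributes, so a market package dominating the representative dominates them all."""
    rep = best_representative(members)
    return rep, any(dominates(p, rep) for p in market_skyline)


market = [(0, 130, 2, 250), (1, 140, 2, 170), (1, 300, 1, 150)]  # P1, P2, P3
group = [(1, 150, 4, 250), (1, 170, 4, 240)]                      # Q(F2,H4), Q'(F2,H5)
print(full_prune(group, market))  # ((1, 150, 4, 240), True)
```

A loose group yields a weak representative that rarely gets dominated, which is why group tightness matters.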
18
Partial Pruning
• Full pruning prunes all members of a group; partial pruning prunes only some members
• Partial pruning is used when full pruning cannot be applied
• Idea
  – Direct attributes do not change
  – Estimate the best possible value of each indirect attribute
  – Eliminate a combination if an existing package dominates it
    – on all direct attributes, and
    – on all indirect attributes according to their best estimates
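The partial-pruning idea can be sketched with the numbers from the Meta Transformation slide in the appendix. This is our reading of that slide, assuming smaller is better everywhere and Price = Hotel-cost + Flight-cost: each source tuple’s cost is replaced by the best possible package price (its own cost plus the cheapest cost in the other group), the existing package is split into a flight side and a hotel side, and every pair of a dominated flight and a dominated hotel is pruned without being enumerated against the market. Names such as `meta_transform` and `partial_prune` are hypothetical.

```python
def weakly_dominates(p, q):
    """p is no worse than q on every attribute (smaller is better)."""
    return all(a <= b for a, b in zip(p, q))


def strictly_better_somewhere(p, q):
    return any(a < b for a, b in zip(p, q))


def meta_transform(flights, hotels):
    """Replace each cost by the best possible package price: the tuple's own cost
    plus the cheapest cost in the other group."""
    min_f = min(cost for _, cost in flights.values())
    min_h = min(cost for _, _, cost in hotels.values())
    meta_f = {f: (s, c + min_h) for f, (s, c) in flights.items()}
    meta_h = {h: (d, k, c + min_f) for h, (d, k, c) in hotels.items()}
    return meta_f, meta_h


def partial_prune(flights, hotels, package):
    """(flight, hotel) pairs that `package` = (stops, distance, class, price) must dominate."""
    stops, dist, cls, price = package
    meta_f, meta_h = meta_transform(flights, hotels)
    f_side, h_side = (stops, price), (dist, cls, price)  # package split into two sides
    bad_f = [f for f, m in meta_f.items() if weakly_dominates(f_side, m)]
    bad_h = [h for h, m in meta_h.items() if weakly_dominates(h_side, m)]
    # A real pair is no better than its meta values on any attribute, so weak dominance
    # on both sides plus a strict gap on either side means the package dominates the pair.
    return [(f, h) for f in bad_f for h in bad_h
            if strictly_better_somewhere(f_side, meta_f[f])
            or strictly_better_somewhere(h_side, meta_h[h])]


hotels_a1 = {"H1": (100, 3, 100), "H2": (200, 2, 90), "H3": (400, 1, 80)}  # (distance, class, cost)
flights_b1 = {"F1": (0, 120), "F2": (1, 100)}                                # (stops, cost)
p2 = (1, 140, 2, 170)
print(meta_transform(flights_b1, hotels_a1))
print(partial_prune(flights_b1, hotels_a1, p2))  # [('F2', 'H2')]
```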
19
Algorithm Overview
• Framework
• Intra-dominance checking
  – To find the skyline in the source tables
• Inter-dominance checking
  – Skyline in existing market packages
  – R*-tree indices on existing market packages
  – Full pruning
  – Partial pruning
• Post-processing
20
Post-processing
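• Needed when there is more than one indirect attribute
  – Calculation: run the previous algorithm, then perform intra-dominance checking on its output
  – Intra-dominance checking can use any existing skyline algorithm
  – Post-processing cost depends on the number of competitive products found by the previous algorithm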
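Why post-processing is needed with more than one indirect attribute can be seen on a tiny hypothetical example (not from the slides): two sources whose tuples carry two summed indirect attributes, cost and travel time, with direct attributes omitted. Every source tuple is in its table’s skyline, yet one combination dominates another, so an extra skyline pass over the candidates is required. The naive `skyline` below merely stands in for “any existing skyline algorithm”.

```python
from itertools import product


def dominates(p, q):
    """p dominates q: no worse on every attribute, strictly better on one (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(table):
    """Naive skyline over a {name: attributes} table."""
    return {n: t for n, t in table.items()
            if not any(dominates(o, t) for o in table.values())}


# Hypothetical sources; each tuple is (cost, travel_time), both summed when combined.
source_a = {"a1": (1, 10), "a2": (2, 1)}
source_b = {"b1": (10, 1), "b2": (1, 2)}

candidates = {a + b: (ca + cb, ta + tb)
              for (a, (ca, ta)), (b, (cb, tb)) in product(source_a.items(), source_b.items())}
print(candidates)                   # {'a1b1': (11, 11), 'a1b2': (2, 12), 'a2b1': (12, 2), 'a2b2': (3, 3)}
print(sorted(skyline(candidates)))  # ['a1b2', 'a2b1', 'a2b2']: a2b2 dominates a1b1
```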
22
Experiments
• Pentium IV 2.4GHz PC with 4GB memory, Linux platform, C++
• Synthetic anti-correlated datasets
• Real datasets: Travel Agency A and Travel Agency B
  – A: 296 packages, 1014 hotels and 4394 flights
  – B: 149 packages, 995 hotels and 866 flights
• Implementation
  – Algorithm for Creating Competitive Products (ACCP)
  – Baseline algorithm
  – Naïve algorithm

Algorithm  Preprocessing  R* Tree  Pruning
ACCP       Yes            Yes      Yes
Baseline   Yes            Yes      No
Naïve      No             No       No
23
Synthetic Datasets
Parameter                                       Default value
No. of attributes in each source table          4
No. of indirect attributes in a product table   1
No. of source tables                            2
No. of clusters in each source table            2
Size of existing packages                       5M
Size of each source table                       100k

• Schema is the same as in the example
• Anti-correlated
• 6 factors
• Measurement
  – Execution time
  – Pruning power
  – Ratio of competitive products out of all combinations
  – Memory usage
24
Experiments
Parameter                                       Execution time  Pruning power  Ratio of competitive products  Memory usage
No. of attributes in each source table          1               2              3                              4
No. of indirect attributes in a product table   5               6              7                              8
No. of source tables                            9               10             11                             12
No. of clusters in each source table            13              14             15                             16
Size of existing packages                       17              18             19                             20
Size of each source table                       21              22             23                             24
25
Experiments
[Figure: parameter varied from 100k to 500k; curves for full pruning and partial pruning; $T_Q$, $T_Q'$, and $T_R$]
Pruning power slightly increases.

Parameter                                       Default value
No. of attributes in each source table          4
No. of indirect attributes in a product table   1
No. of source tables                            2
No. of clusters in each source table            6
Size of existing packages                       5M
Size of each source table                       100k
26
Outline
• Background
  – Skyline
• Motivation
  – Examples & Problem Definition
• Algorithm
  – Framework, Partition, Pruning
• Experiments
  – On both synthetic and real data
  – Over 6 factors
• Conclusions
27
Conclusions
• Creating Competitive Products
  – Example
  – Problem Definition
• Algorithms
  – Framework
  – Intra-dominance checking
  – Inter-dominance checking
  – Post-processing
• Experiments
  – Synthetic anti-correlated datasets
  – Real datasets
28
THANK YOU!
Q&A
29
APPENDIX
30
Partial Pruning
Existing Vacation Packages (skyline)
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150

Skyline Tuples of Source Tables
Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80
H4     150                2            150
H5     170                2            140

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100
F3      2            80

Groups: A1 (hotels), B1 (flights); C1 = {A1, B1}
Full Pruning

Newly Created Vacation Packages
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q7(F2,H1)   1            100                3            200
…           …            …                  …            …
Q13(F3,H1)  2            100                3            180
…           …            …                  …            …
Q15(F3,H5)  2            170                2            220

Competitive Products
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q7(F2,H1)   1            100                3            200
…           …            …                  …            …
Q13(F3,H1)  2            100                3            180

Meta Transformation
Selected existing package
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P2       1            140                2            170

P2 split into a flight side and a hotel side
Package  No-of-stops  Price
P2       1            170

Package  Distance-to-beach  Hotel-class  Price
P2       140                2            170

Group A1 (hotels)
Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80

Group B1 (flights)
Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100

Min flight in B1 (used for Meta-Hotel): 1 100
Min hotel in A1 (used for Meta-Flight): 400 1 80

Meta-Hotel (Hotel-cost + cheapest flight cost)
Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            200
H2     200                2            190
H3     400                1            180

Meta-Flight (Flight-cost + cheapest hotel cost)
Flight  No-of-stops  Flight-cost
F1      0            200
F2      1            180

• No inter-dominance checking is needed for {F2} × {H2}: the hotel side of P2 (140, 2, 170) dominates Meta-Hotel H2 (200, 2, 190), and its flight side (1, 170) dominates Meta-Flight F2 (1, 180).
32
Experiments
[Figure: parameter varied from 2.5M to 10M, annotated “More competitive” and “Slightly decreases”]

Parameter                                       Default value
No. of attributes in each source table          4
No. of indirect attributes in a product table   1
No. of source tables                            2
No. of clusters in each source table            6
Size of existing packages                       5M
Size of each source table                       100k
33
Experiments
Travel Agency A Package Generation Set
1. A: 296 packages, 1014 hotels and 4394 flights; B: 149 packages, 995 hotels and 866 flights
2. Source tables from B, and packages from A
3. Vary discount from 0 to 0.50
4. Efficiency: ACCP (44.74 s) and Baseline (84.47 s)
5. $|SKY|/|T_Q|$
6. $|DOM|/|T_E|$
[Figure: curves labeled DOM and SKY]