GEOMETRY APPROACH FOR K-REGRET QUERY
ICDE 2014

PENG PENG, RAYMOND CHI-WING WONG
CSE, HKUST


OUTLINE
1. Introduction
2. Contributions
3. Preliminary
4. Related Work
5. Geometry Property
6. Algorithm
7. Experiment
8. Conclusion

1. INTRODUCTION

Multi-criteria decision making:
• Design a query that returns a number of “interesting” objects to the user.

Traditional queries:
• Top-k queries
• Skyline queries

1. INTRODUCTION

Top-k queries
• Utility function f.
• Given a particular utility function f, the utility of every point in D can be computed.
• The output is a set of k points with the highest utilities.

Skyline queries
• No utility function is required.
• A point is said to be a skyline point if it is not dominated by any other point in the dataset.
• Assume that a greater value in an attribute is preferable.
• We say that q is dominated by p if and only if q[i] ≤ p[i] for each i ∈ [1, d] and there exists an i such that q[i] < p[i].
• The output is the set of skyline points.
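The two traditional queries can be sketched as follows. This is a minimal illustration rather than code from the talk: the point set `D`, the equal-weight linear `utility`, and all function names are assumptions made for the example.

```python
# Illustrative sketch; D, utility() and all function names are hypothetical.

def top_k(D, f, k):
    """Return the k points of D with the highest utility under f."""
    return sorted(D, key=f, reverse=True)[:k]

def dominates(p, q):
    """True if p dominates q: p >= q in every attribute and p > q in at least one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def skyline(D):
    """Return the points of D that are not dominated by any other point of D."""
    return [q for q in D if not any(dominates(p, q) for p in D)]

def utility(p):
    # Assumed linear utility with equal weights on both attributes.
    return 0.5 * p[0] + 0.5 * p[1]

D = [(1.0, 0.2), (0.2, 1.0), (0.8, 0.8), (0.5, 0.5)]
best = top_k(D, utility, 1)   # needs a utility function; the output size is fixed by k
sky = skyline(D)              # needs no utility function; the output size is data-dependent
```

On this toy data the top-1 query returns only (0.8, 0.8), while the skyline keeps every point except the dominated (0.5, 0.5).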

LIMITATIONS OF TRADITIONAL QUERIES

Traditional queries
• Top-k queries
  • Advantage: the output size is given by the user and is controllable.
  • Disadvantage: the utility function is assumed to be known.
• Skyline queries
  • Advantage: there is no assumption that the utility function is known.
  • Disadvantage: the output size cannot be controlled.

Recently proposed query (VLDB 2010)
• k-regret queries
  • Advantage: there is no assumption that the utility function is known, and the output size is given by the user and is controllable.

2. CONTRIBUTIONS

• We give some theoretical properties of k-regret queries.
  • We give a geometric explanation of a k-regret query.
  • We define happy points, a set of candidate points for the k-regret query.
  • Significance: all existing algorithms, and new algorithms yet to be developed for the k-regret query, can use our happy points to find the solution of the k-regret query more efficiently and more effectively.
• We propose two algorithms for answering a k-regret query:
  • GeoGreedy algorithm
  • StoredList algorithm
• We conduct comprehensive experimental studies.

3. PRELIMINARY

Notations in k-regret queries
• Utility function f: assigns a utility f(p) to each point p.
• Maximum utility: the highest utility f(p) over the points p of a given set.

[Example: a small point set D, a subset S and three utility functions, with their maximum utilities.]

3. PRELIMINARY

Notations in k-regret queries
• Regret ratio rr(S, f) = (max_{p ∈ D} f(p) - max_{p ∈ S} f(p)) / max_{p ∈ D} f(p).
  Measures how bad a user with utility function f feels after receiving the output S. If it is 1, the user feels bad; if it is 0, the user feels happy.

, ,.029; , ,;

, ,.

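• Maximum regret ratio mrr(S) = sup_{f ∈ F} rr(S, f), where F is the set of utility functions considered.
  Measures how bad a user feels after receiving the output S, whichever utility function in F the user has. A user feels better when mrr(S) is smaller.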
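The two measures can be sketched in a few lines. This is an illustration under stated assumptions, not the talk's own example: the points in `D`, the output set `S`, and the three linear utility functions in `F` are made up, and the regret ratio is taken to be the relative loss in best utility when the user sees only S instead of D. The true maximum regret ratio ranges over a possibly infinite function class; here it is evaluated over a finite list only.

```python
# Illustrative sketch; D, S, F and all function names are hypothetical.

def regret_ratio(D, S, f):
    """Relative loss in the best utility under f when only S is shown instead of D."""
    best_in_D = max(f(p) for p in D)
    best_in_S = max(f(p) for p in S)
    return (best_in_D - best_in_S) / best_in_D

def max_regret_ratio(D, S, F):
    """Worst regret ratio of S over a finite collection F of utility functions."""
    return max(regret_ratio(D, S, f) for f in F)

def linear(w):
    """Build the linear utility function f(p) = w . p for a weight vector w."""
    return lambda p: sum(wi * xi for wi, xi in zip(w, p))

D = [(1.0, 0.2), (0.2, 1.0), (0.8, 0.8), (0.5, 0.5)]
S = [(1.0, 0.2), (0.2, 1.0)]
F = [linear((1.0, 0.0)), linear((0.5, 0.5)), linear((0.0, 1.0))]

ratios = [regret_ratio(D, S, f) for f in F]
mrr = max_regret_ratio(D, S, F)
```

With these assumed inputs, the two single-attribute users lose nothing, while the equal-weight user loses about a quarter of the best achievable utility, so the maximum regret ratio over `F` is about 0.25.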

3. PRELIMINARY

Problem definition
• Given a d-dimensional database D of size n and an integer k, a k-regret query finds a set S of at most k points of D such that the maximum regret ratio of S is minimized.
• Let be the maximum regret ratio of the optimal solution.

Example
• Given a set of points , each of which is represented as a 2-dimensional vector.
• A 2-regret query on these 4 points selects 2 of them as the output such that the maximum regret ratio of the selected points is the smallest among all possible selections.
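For a handful of points, the problem definition can be checked by exhaustive search. The sketch below is only an approximation under stated assumptions: the four points in `D` are hypothetical, utility functions are assumed to be linear with non-negative weights, and the infinite function class is replaced by 101 sampled weight directions.

```python
# Illustrative brute-force sketch; D, weights and all function names are hypothetical.
from itertools import combinations
from math import cos, sin, pi

def max_regret_ratio(D, S, weights):
    """Largest regret ratio of S over the sampled linear utility functions."""
    worst = 0.0
    for w in weights:
        best_in_D = max(w[0] * p[0] + w[1] * p[1] for p in D)
        best_in_S = max(w[0] * p[0] + w[1] * p[1] for p in S)
        worst = max(worst, (best_in_D - best_in_S) / best_in_D)
    return worst

def brute_force_k_regret(D, k, weights):
    """Try every k-subset of D and keep the one with the smallest maximum regret ratio."""
    return min(combinations(D, k), key=lambda S: max_regret_ratio(D, S, weights))

# 101 weight vectors whose directions sweep the first quadrant from (1, 0) to (0, 1).
weights = [(cos(i * pi / 200), sin(i * pi / 200)) for i in range(101)]
D = [(1.0, 0.2), (0.2, 1.0), (0.9, 0.7), (0.5, 0.5)]
S = brute_force_k_regret(D, 2, weights)
```

The search cost grows with the number of k-subsets, which is why it only serves as a reference for tiny inputs.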

4. RELATED WORK

Variations of top-k queries
• Personalized top-k queries (Information Systems 2009)
  - Partial information about the utility function is assumed to be known.
• Diversified top-k queries (SIGMOD 2012)
  - The utility function is assumed to be known.
- No assumption on the utility function is made for a k-regret query.

Variations of skyline queries
• Representative skyline queries (ICDE 2009)
  - The importance of a skyline point changes when the data is contaminated.
• k-dominating skyline queries (ICDE 2007)
  - The importance of a skyline point changes when the data is contaminated.
- We do not need to consider the importance of a skyline point in a k-regret query.

Hybrid queries
• Top-k skyline queries (OTM 2005)
  - The importance of a skyline point changes when the data is contaminated.
• ε-skyline queries (ICDE 2008)
  - No bound is guaranteed and it is unknown how to choose ε.
- The maximum regret ratio used in a k-regret query is bounded.

4. RELATED WORK

k-regret queries
• Regret-Minimizing Representative Databases (VLDB 2010)
  • First proposes the k-regret query;
  • Proves a worst-case upper bound and a lower bound for the maximum regret ratio of the k-regret query;
  • Proposes the fastest algorithm known so far for answering a k-regret query.
• Interactive Regret Minimization (SIGMOD 2012)
  • Proposes an interactive version of the k-regret query and an algorithm to answer it.
• Computing k-Regret Minimizing Sets (VLDB 2014)
  • Proves the NP-completeness of a k-regret query;
  • Defines a new k-regret minimizing set query and proposes two algorithms to answer this new query.

5. GEOMETRY PROPERTY

• Geometric explanation of the maximum regret ratio given an output set S
• Happy points and their properties

GEOMETRY EXPLANATION OF THE MAXIMUM REGRET RATIO

• Maximum regret ratio: how do we compute it given an output set S?
  • The function space F can be infinite.
  • The method used in “Regret-Minimizing Representative Databases” (VLDB 2010): linear programming.
  • It is time-consuming when linear programming has to be called independently for different S.

GEOMETRY EXPLANATION OF THE MAXIMUM REGRET RATIO

• Maximum regret ratio: we compute it with a geometric method, which is
  • straightforward and easily understood;
  • faster to compute.

AN EXAMPLE IN 2-D

[Figure: points p1 to p6 plotted in the unit square.]

AN EXAMPLE IN 2-D

[Figure: the same points p1 to p6 in the unit square, together with the output set S.]

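GEOMETRY EXPLANATION OF THE MAXIMUM REGRET RATIO

Critical ratio
• The p-critical point given S, denoted by p′, is defined as the intersection between the vector from the origin o to p and the surface of Conv(S).
• Critical ratio: cr(p, S) = ‖p′‖ / ‖p‖.

[Figure: the point p = (0.67, 0.82), the line 0.8x + 0.7y = 1, the origin o, and the critical point p′ = (0.6, 0.74) where the vector from o to p crosses the line.]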

GEOMETRY EXPLANATION OF THE MAXIMUM REGRET RATIO

Lemma 0: mrr(S) = max_{p ∈ D} (1 - cr(p, S)), where cr(p, S) is the critical ratio of p given S.

• According to the lemma above, we first compute cr(p, S) for each point p that is outside Conv(S) and then take the greatest value of 1 - cr(p, S), which is the maximum regret ratio of S.
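The critical ratio can be worked through on the numbers shown in the figure: p = (0.67, 0.82) and the line 0.8x + 0.7y = 1. The sketch below assumes that this line is the face of Conv(S) crossed by the vector from the origin to p, and that the critical ratio is the length of p′ divided by the length of p; the function names are hypothetical.

```python
# Illustrative sketch; function names are hypothetical.
from math import hypot

def critical_point(p, w):
    """Intersection p' of the ray from the origin through p with the hyperplane w . x = 1."""
    scale = 1.0 / sum(wi * xi for wi, xi in zip(w, p))
    return tuple(scale * xi for xi in p)

def critical_ratio(p, w):
    """cr = ||p'|| / ||p||, where p' is the critical point of p on the hyperplane w . x = 1."""
    p_prime = critical_point(p, w)
    return hypot(*p_prime) / hypot(*p)

w = (0.8, 0.7)       # the line 0.8x + 0.7y = 1 from the figure
p = (0.67, 0.82)     # the point p from the figure
p_prime = critical_point(p, w)
cr = critical_ratio(p, w)
regret = 1.0 - cr    # the value that this point contributes in Lemma 0
```

Since 0.8 · 0.67 + 0.7 · 0.82 = 1.11, the point is scaled by 1/1.11 ≈ 0.90, which gives p′ ≈ (0.60, 0.74) as in the figure and a regret contribution of about 0.10.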

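AN EXAMPLE IN 2-D

[Example: for a given output set, the critical ratios of the points outside its convex hull are computed, and the maximum regret ratio follows from Lemma 0.]

[Figure: points p1 to p6 in the unit square, with the critical points p5′, p6′ and p2′.]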

HAPPY POINT

The set VC is defined as a set of d-dimensional points of size d, VC = {vc_1, ..., vc_d}, where for each point vc_i and each j ∈ [1, d], we have vc_i[j] = 1 when j = i, and vc_i[j] = 0 when j ≠ i. In a 2-dimensional space, VC = {vc_1, vc_2}, where vc_1 = (1, 0) and vc_2 = (0, 1).

[Figure: points p1 to p6 together with vc_1 and vc_2.]

HAPPY POINT

Example in the 2-dimensional case:

[Figure: points p1 to p6 together with vc_1 and vc_2.]

HAPPY POINT

Definition of domination:
• We say that q is dominated by p if and only if q[i] ≤ p[i] for each i ∈ [1, d] and there exists an i such that q[i] < p[i].

Definition of subjugation:
• We say that q is subjugated by p if and only if q is on or below all the hyperplanes containing the faces of and is below at least one hyperplane containing a face of .
• We say that q is subjugated by p if and only if for each and there exists a such that .

AN EXAMPLE IN 2-D

One point subjugates another when the second lies below both lines shown in the figure; it does not subjugate a point that lies above one of the lines.

[Figure: points p1 to p6 together with vc_1 and vc_2.]

HAPPY POINT

Lemma 1:
• There may exist a point in , which cannot be found in the optimal solution of a k-regret query.

Example:
• In the example shown below, the optimal solution of a 3-regret query is , where is not a point in .

[Figure: points p1, p2, p3, p5 and p6 together with vc_1 and vc_2.]

AN EXAMPLE IN 2-D

Lemma 2:

Example:

[Figure: points p1 to p6 together with vc_1 and vc_2.]

HAPPY POINT

All existing studies use the skyline points as candidate points for the k-regret query.

Lemma 3:
• Let be the maximum regret ratio of the optimal solution. Then, there exists an optimal solution of a k-regret query that is a subset of the happy points when .

Example:
• Based on Lemma 3, we compute the optimal solution based on the happy points instead of the skyline points.

6. ALGORITHM

Geometry Greedy algorithm (GeoGreedy)
• Pick the boundary points of the dataset (a set of size d) and insert them into an output set;
• Repeatedly compute the regret ratio for each point that is outside the convex hull constructed from the up-to-date output set, and add the point that currently achieves the maximum regret ratio to the output set;
• The algorithm stops when the output size is k or all the points in are selected.

Stored List algorithm (StoredList)
• Preprocessing step:
  • Call the GeoGreedy algorithm to return the output of an n-regret query;
  • Store the points of the output set in a list, in the order in which they were selected.
• Query step:
  • Return the first k points of the list as the output of a k-regret query.
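The steps above can be sketched for the 2-dimensional case. This is a simplified illustration, not the authors' implementation: the data in `D` and all function names are hypothetical, linear utilities with non-negative weights are assumed, and the critical ratio is obtained by an exact scan over candidate weight vectors instead of an explicit convex hull construction. The sketch also assumes strictly positive coordinates and k ≥ 2.

```python
# Illustrative 2-D sketch; D and all function names are hypothetical.
from itertools import combinations

def critical_ratio(q, S):
    """Critical ratio of q given S in 2-D, for linear utilities with non-negative weights.

    Computed as the minimum over weights w >= 0 with w . q = 1 of max_{s in S} w . s.
    Writing w = (t / q[0], (1 - t) / q[1]) with t in [0, 1], each s gives a line in t,
    and the minimum of their upper envelope lies at t = 0, t = 1 or a pairwise crossing.
    """
    lines = [(s[1] / q[1], s[0] / q[0]) for s in S]   # (value at t = 0, value at t = 1)
    ts = [0.0, 1.0]
    for (b1, a1), (b2, a2) in combinations(lines, 2):
        denom = (a1 - b1) - (a2 - b2)
        if abs(denom) > 1e-12:
            t = (b2 - b1) / denom
            if 0.0 <= t <= 1.0:
                ts.append(t)
    return min(max(b + t * (a - b) for b, a in lines) for t in ts)

def geo_greedy(D, k):
    """Greedy selection: boundary points first, then the point with the largest regret."""
    S = []
    for i in range(2):                       # boundary points: largest value per dimension
        b = max(D, key=lambda p: p[i])
        if b not in S:
            S.append(b)
    while len(S) < k:
        outside = [(critical_ratio(p, S), p) for p in D if p not in S]
        outside = [(cr, p) for cr, p in outside if cr < 1.0]   # cr >= 1: not outside the hull
        if not outside:
            break
        S.append(min(outside)[1])            # smallest cr means largest regret 1 - cr
    return S

def build_stored_list(D):
    """Preprocessing step: run the greedy selection once with the largest possible k."""
    return geo_greedy(D, len(D))

def query(stored, k):
    """Query step: the first k stored points answer a k-regret query."""
    return stored[:k]

D = [(1.0, 0.2), (0.2, 1.0), (0.8, 0.8), (0.95, 0.5), (0.6, 0.9), (0.5, 0.5)]
stored = build_stored_list(D)
answer = query(stored, 3)
```

Because each greedy step only appends to the output set, the answer for a smaller k is a prefix of the answer for a larger k, which is what lets the query step reduce to slicing the stored list.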

7. EXPERIMENT

Datasets
• Experiments on synthetic datasets
• Experiments on real datasets
  • Household dataset
  • NBA dataset
  • Color dataset
  • Stocks dataset

Algorithms:
• Greedy algorithm (VLDB 2010)
• GeoGreedy algorithm
• StoredList algorithm

Measurements:
• The maximum regret ratio
• The query time

7. EXPERIMENT

Experiments
• Relationship among D_conv, D_happy and D_sky
• Effect of happy points
• Performance of our method

RELATIONSHIP AMONG D_conv, D_happy AND D_sky

Dataset      |D_conv|   |D_happy|   |D_sky|
Household    926        1332        9832
NBA          65         75          447
Color        124        151         1023
Stocks       396        449         3042

EFFECT OF HAPPY POINTS

Household: maximum regret ratio
[Figure: two plots of the maximum regret ratio on the Household dataset, one per candidate set ("The result based on ...").]

EFFECT OF HAPPY POINTS

Household: query time
[Figure: two plots of the query time on the Household dataset, one per candidate set ("The result based on ...").]

PERFORMANCE OF OUR METHOD

Experiments on synthetic datasets
• Maximum regret ratio
[Figure: two panels, "Effect of d" and "Effect of n".]

PERFORMANCE OF OUR METHOD

Experiments on synthetic datasets
• Query time
[Figure: two panels, "Effect of d" and "Effect of n".]

PERFORMANCE OF OUR METHOD

Experiments on synthetic datasets
• Maximum regret ratio
[Figure: two panels, "Effect of k" and "Effect of large k".]

PERFORMANCE OF OUR METHOD

Experiments on synthetic datasets
• Query time
[Figure: two panels, "Effect of k" and "Effect of large k".]

8. CONCLUSION

• We studied the k-regret query in this paper.
• We proposed happy points, a set of candidate points for the k-regret query that is much smaller than the set of skyline points, so that the solution of the k-regret query can be found more efficiently and effectively.
• We conducted experiments on both synthetic and real datasets.
• Future directions:
  • Average regret ratio minimization
  • Interactive version of a k-regret query

THANK YOU!

GEOGREEDY ALGORITHM

[Figure: pseudocode listing of the GeoGreedy algorithm.]

GEOGREEDY ALGORITHM

An example in 2-D: in the following, we answer a 4-regret query using the GeoGreedy algorithm.

[Figure: points p1 to p6 plotted in the unit square.]

GEOGREEDY ALGORITHM

Lines 2-4:

[Figure: the unit square with only p5 and p6 shown.]

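GEOGREEDY ALGORITHM

Lines 2-4:
• The output set S is initialized with p5 and p6.

Lines 5-10 (Iteration 1):
• The critical ratios of p1 and p2 (critical points p1′ and p2′ in the figure) are compared, and we add p1 to S.

[Figure: points p1 to p6 in the unit square, with the critical points p1′ and p2′.]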

GEOGREEDY ALGORITHM

Lines 5-10 (Iteration 2):
• After Iteration 1, S = {p5, p6, p1}.
• We can only compute the critical ratio of p2, which is less than 1, so we add p2 to S.

[Figure: points p1 to p6 in the unit square, with the critical point p2′.]

STOREDLIST ALGORITHM

Stored List algorithm
• Pre-compute the outputs of the GeoGreedy algorithm for the different output sizes.
• An output of smaller size is a subset of an output of larger size.
• Store the output of size n in a list, in the order of selection.

STOREDLIST ALGORITHM

After two iterations of the GeoGreedy algorithm, the output set is S = {p5, p6, p1, p2}.

Since the critical ratio of each unselected point is at least 1, we stop the GeoGreedy algorithm, and S is the output set with the greatest size.

We store the output in a list L, which ranks the selected points in the order in which they were added to S.

That is, L = (p5, p6, p1, p2).

When a 3-regret query is issued, we return the set {p5, p6, p1}.

EFFECT OF HAPPY POINTS

NBA: maximum regret ratio
[Figure: two plots of the maximum regret ratio on the NBA dataset, one per candidate set ("The result based on ...").]

EFFECT OF HAPPY POINTS

NBA: query time
[Figure: two plots of the query time on the NBA dataset, one per candidate set ("The result based on ...").]

EFFECT OF HAPPY POINTS

Color: maximum regret ratio
[Figure: two plots of the maximum regret ratio on the Color dataset, one per candidate set ("The result based on ...").]

EFFECT OF HAPPY POINTS

Color: query time
[Figure: two plots of the query time on the Color dataset, one per candidate set ("The result based on ...").]

EFFECT OF HAPPY POINTS

Stocks: maximum regret ratio
[Figure: two plots of the maximum regret ratio on the Stocks dataset, one per candidate set ("The result based on ...").]

EFFECT OF HAPPY POINTS

Stocks: query time
[Figure: two plots of the query time on the Stocks dataset, one per candidate set ("The result based on ...").]

PRELIMINARY

[Example: worked computation for the running example of the preliminary notations.]

AN EXAMPLE IN 2-D

Points (normalized):

[Figure: the normalized points p1 to p6, shown once in the unit square and once together with vc_1 and vc_2.]