A FAIR ASSIGNMENT FOR MULTIPLE PREFERENCE QUERIES
Leong Hou U, Nikos Mamoulis, Kyriakos Mouratidis
Group 10: Paolo Barboni, Tommaso Campanella, Simone Manco



Scenario

• Some users want to select objects with specific features, based on their preferences

• These requests are performed as database queries

• Queries express users’ preferences by assigning different weights to the attributes of the searched objects

• These are the so-called preference queries


Scenario

• The result of a preference query is the object in the database with the highest aggregate score

• If multiple preference queries are issued simultaneously, an object may be the best result for many of them:
  – Which user will be matched to the object?
  – Which results will the other users receive?

A FAIR ASSIGNMENT PROBLEM


Scenario - Example

• Internship assignment, based on students’ preferences in terms of:
  – nature of the job
  – salary
  – office location
  – other features…

• For a single student, the system returns a set of top-k results with respect to his/her preference function

• An available internship position could be the top-1 choice of many interested students, but it can only be assigned to one of them

• The system must look for a fair 1-1 matching between the users and the objects
  – Stable Marriage Problem (SMP)


Scenario - Example

[Figure: positions a, b, c and d plotted by salary (X) and standing (Y), with the best point and the preference functions f1 and f2]

Users’ preference functions: f1 = 0.8X + 0.2Y, f2 = 0.5X + 0.5Y

Positions’ attributes (X, Y): a = (0.5, 0.6), b = (0.2, 0.7), c = (0.8, 0.2), d = (0.4, 0.4)
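The example can be reproduced in a few lines. This is a minimal sketch, assuming each preference function is a plain weighted sum of the attributes (as f1 and f2 are); the helper names `score` and `top1` are illustrative, not taken from the paper.

```python
from typing import Dict, Sequence, Tuple

Vec = Tuple[float, ...]

# Data of the internship example: weights on (salary X, standing Y)
functions: Dict[str, Vec] = {"f1": (0.8, 0.2), "f2": (0.5, 0.5)}
positions: Dict[str, Vec] = {
    "a": (0.5, 0.6), "b": (0.2, 0.7), "c": (0.8, 0.2), "d": (0.4, 0.4),
}


def score(weights: Sequence[float], values: Sequence[float]) -> float:
    """Aggregate score of an object under a linear preference function."""
    return sum(w * v for w, v in zip(weights, values))


def top1(weights: Sequence[float], objects: Dict[str, Vec]) -> str:
    """Answer of a single top-1 preference query (linear scan)."""
    return max(objects, key=lambda o: score(weights, objects[o]))


for name, w in functions.items():
    ranking = sorted(positions, key=lambda o: score(w, positions[o]), reverse=True)
    print(name, "top-1:", top1(w, positions), "ranking:", ranking)
```

Here f1 ranks c first (0.68) and f2 ranks a first (0.55), so the two top-1 answers do not collide; the fairness problem appears when several functions share the same top-1 object.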


Related Algorithms

• The 1-1 assignment problem is related to three types of search:
  – Spatial assignment problem (model: SMP)
    • Chain algorithm
  – Skyline queries
    • Branch-and-Bound Skyline (BBS) algorithm
  – Top-k search
    • Threshold Algorithm (TA)

• Stable matching: given two datasets A and B, a 1-1 matching M is stable if there are no two pairs (a, b) and (a’, b’) in M such that a prefers b’ to b, and b’ prefers a to a’ (where a, a’ ∈ A and b, b’ ∈ B).
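The definition can be checked mechanically by searching for a blocking pair. A minimal sketch, assuming preferences are given as numeric scores (higher means more preferred) and M is a complete 1-1 matching stored as a dict from A to B; `is_stable` is a hypothetical helper. The scores are the f(o) values of f1 and f2 on positions a and c from the internship example, used by both sides.

```python
from typing import Dict

Prefs = Dict[str, Dict[str, float]]


def is_stable(matching: Dict[str, str], pref_a: Prefs, pref_b: Prefs) -> bool:
    """True if there are no (a, b), (a2, b2) in M such that a prefers b2 to b
    and b2 prefers a to a2 (higher value = more preferred)."""
    for a, b in matching.items():
        for a2, b2 in matching.items():
            if a2 == a:
                continue
            if pref_a[a][b2] > pref_a[a][b] and pref_b[b2][a] > pref_b[b2][a2]:
                return False  # a and b2 would both rather be matched together
    return True


# Both sides rank each other by the same score f(o)
pref_f = {"f1": {"a": 0.52, "c": 0.68}, "f2": {"a": 0.55, "c": 0.50}}
pref_o = {"a": {"f1": 0.52, "f2": 0.55}, "c": {"f1": 0.68, "f2": 0.50}}

print(is_stable({"f1": "c", "f2": "a"}, pref_f, pref_o))  # True
print(is_stable({"f1": "a", "f2": "c"}, pref_f, pref_o))  # False
```

In the second matching, f1 prefers c to a and c prefers f1 to f2, so (f1, c) blocks it.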


Spatial Assignment Problem – Chain Algorithm

• Its goal is to find stable pairs

• Its preference function is based on the Euclidean distance:
  – a prefers b’ to b if dist(a, b’) < dist(a, b)

• A pair (a, b) is stable if and only if a’s closest object is b and b’s closest object is a, where a and b are among the unassigned (remaining) objects in A and B

o Chain algorithm:
  1. pick an object a from A (randomly) or from the queue Q;
  2. find the NN (nearest neighbour) aB ∈ B of a ∈ A;
  3. find the NN a’ ∈ A of aB ∈ B;
  4. if a ≠ a’, aB is pushed into the queue Q; otherwise the pair (a, aB) is output as a result pair, and a and aB are removed from A and B.
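A compact sketch of the chain idea, under stated assumptions: points are 2-D tuples in dicts keyed by distinct names, nearest neighbours are found by linear scan instead of a spatial index, all distances are distinct, and the queue Q is used as a stack (last in, first out), which is one common way to realise the chain. `chain_matching` is a hypothetical name.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def chain_matching(A: Dict[str, Point], B: Dict[str, Point]) -> List[Tuple[str, str]]:
    """Output mutual-nearest-neighbour (stable) pairs until A or B is empty."""
    sets = {"A": dict(A), "B": dict(B)}
    other = {"A": "B", "B": "A"}
    q: List[Tuple[str, str]] = []  # Q, used here as a stack of (set, name)
    pairs: List[Tuple[str, str]] = []
    while sets["A"] and sets["B"]:
        if not q:
            q.append(("A", next(iter(sets["A"]))))  # step 1: pick an object of A
        side, x = q[-1]
        target = sets[other[side]]
        here = sets[side][x]
        # steps 2-3: nearest neighbour of x in the opposite set
        y = min(target, key=lambda n: math.dist(here, target[n]))
        if len(q) >= 2 and q[-2] == (other[side], y):
            # x and y are each other's NN: output the pair and remove both
            q.pop()
            q.pop()
            a, b = (x, y) if side == "A" else (y, x)
            pairs.append((a, b))
            del sets["A"][a]
            del sets["B"][b]
        else:
            q.append((other[side], y))  # step 4: extend the chain via Q
    return pairs


A = {"a1": (0.0, 0.0), "a2": (3.0, 0.0)}
B = {"b1": (2.0, 0.0), "b2": (10.0, 0.0)}
print(chain_matching(A, B))  # [('a2', 'b1'), ('a1', 'b2')]
```

Starting from a1, the chain runs a1 → b1 → a2 → b1, so (a2, b1) is output first; a1 then matches the only remaining object b2. With distinct distances, distances strictly decrease along the chain, so it cannot cycle.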


Skyline Queries – BBS Algorithm

• A different approach exploits the concept of the skyline of a set:
  – the skyline of O consists of all points o ∈ O that are not dominated by any other point in O (a point dominates another if it is at least as good in every attribute and strictly better in at least one)

• It is faster if the objects are indexed by an R-tree

o BBS algorithm:
  1. Compute the skyline of O by accessing the minimum number of R-tree nodes (the algorithm is I/O optimal)
  2. Access the nodes of the tree in ascending order of distance from the sky point, the (imaginary) most preferable possible object
  3. Once a data object is found, it is added to the skyline, and all R-tree nodes/subtrees dominated by it are pruned
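The visiting order and the pruning rule can be illustrated without an R-tree. This sketch processes individual points (not R-tree nodes) in ascending L1 distance from the sky point and keeps a point only if no skyline point found so far dominates it. It assumes larger values are better and takes the sky point as the per-dimension maximum. It follows the spirit of BBS, but it is not BBS itself, since there are no nodes to prune.

```python
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q: at least as good everywhere, strictly better somewhere
    (larger values are assumed to be better)."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def skyline(objects: Dict[str, Vec]) -> List[str]:
    # The sky point takes the best value of every dimension
    sky_point = tuple(max(col) for col in zip(*objects.values()))

    def dist(name: str) -> float:  # L1 distance from the sky point
        return sum(s - x for s, x in zip(sky_point, objects[name]))

    result: List[str] = []
    for name in sorted(objects, key=dist):  # ascending distance, as in BBS
        # a dominator is always closer to the sky point, so it was already seen
        if not any(dominates(objects[s], objects[name]) for s in result):
            result.append(name)
    return result


positions = {"a": (0.5, 0.6), "b": (0.2, 0.7), "c": (0.8, 0.2), "d": (0.4, 0.4)}
print(skyline(positions))  # a, b, c are in the skyline; d is dominated by a
```

Checking only against the skyline found so far is enough: if a dominated point q dominates p, then some skyline point dominating q also dominates p.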


BBS Algorithm – Example

[Figure: sample objects a–m indexed by an R-tree with root entries M1, M2, M3 and child entries m1–m7 (leaf order: g h a c d e i j l k m b f ...), with the sky point marked]

INN Heap = {M1, M2, M3}
INN Heap = {m1, m2, m3, M2, M3}
INN Heap = {e, i, m1, m2, M2, M3}
Osky = {e}
Osky = {e, a}
INN Heap = {m2}
INN Heap = {a}


Skyline Queries – DeltaSky Algorithm

• It is used on dynamic datasets, where objects can be added or removed

• It determines the intersection between an MBR and the EDR without explicitly computing the EDR itself

• For each deletion from Osky, DeltaSky traverses the R-tree once

• If many deletions are performed, DeltaSky incurs a high I/O cost

• EDR: Exclusive Dominance Region
• MBR: Minimum Bounding Rectangle


Top-k search – Threshold Algorithm

• O is a collection of n objects; each object o has D attributes

• There are D sorted lists S1, S2, …, SD, one for each attribute, ordered by the atomic scores

• A top-k query, based on an aggregate function f, retrieves a k-subset Otopk of O (k < n) such that f(o) ≥ f(o’) for all o ∈ Otopk and o’ ∈ (O − Otopk)

• The most widely used algorithm for top-k queries is the Threshold Algorithm (TA):
  – it pops objects from the sorted lists in a round-robin manner
  – for each object o, f(o) is computed
  – the set of k objects with the highest scores is maintained
  – the search terminates when the k-th score is greater than or equal to the threshold T, i.e., f applied to the last atomic scores seen in the lists
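A minimal in-memory sketch of TA, assuming a linear aggregate function with non-negative weights (so f is monotone), objects held in a dict so that random access is a lookup, and the threshold checked after each round of sorted accesses; `threshold_algorithm` is an illustrative name.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, ...]


def threshold_algorithm(objects: Dict[str, Vec], weights: Sequence[float], k: int) -> List[str]:
    def f(name: str) -> float:  # random access: full aggregate score
        return sum(w * x for w, x in zip(weights, objects[name]))

    d = len(weights)
    # S1..SD: one list per attribute, sorted on the atomic score (descending)
    lists = [sorted(objects, key=lambda o: objects[o][i], reverse=True) for i in range(d)]
    top: List[Tuple[float, str]] = []  # min-heap holding the k best (score, name)
    seen = set()
    for depth in range(len(objects)):
        for i in range(d):  # round-robin sorted access
            o = lists[i][depth]
            if o in seen:
                continue
            seen.add(o)
            s = f(o)
            if len(top) < k:
                heapq.heappush(top, (s, o))
            elif s > top[0][0]:
                heapq.heapreplace(top, (s, o))
        # threshold T: f applied to the last atomic score seen in each list
        t = sum(w * objects[lists[i][depth]][i] for i, w in enumerate(weights))
        if len(top) == k and top[0][0] >= t:
            break  # the k-th score is >= T: no unseen object can do better
    return [o for _, o in sorted(top, reverse=True)]


positions = {"a": (0.5, 0.6), "b": (0.2, 0.7), "c": (0.8, 0.2), "d": (0.4, 0.4)}
print(threshold_algorithm(positions, (0.8, 0.2), 2))  # ['c', 'a']
```

For f1 = 0.8X + 0.2Y the search stops after the second round: the 2nd best score, f1(a) = 0.52, reaches T = 0.8·0.5 + 0.2·0.6 = 0.52, so d is never accessed.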


Top-k search - BRS & Onion

• Branch-and-Bound Ranked Search (BRS):
  1. Visit R-tree nodes in an order determined by the preference function f
  2. maxscore(M) is an upper bound of the score of any object inside the MBR M
  3. Nodes are accessed in descending maxscore order
  4. Terminate when the score of the k-th best object is no smaller than the next node’s maxscore

• Onion:
  1. Compute the convex hull of the data objects and set it as the first layer
  2. Remove the hull objects
  3. Expand the layers from the first (outermost) one moving inwards
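The BRS steps can be sketched over a two-level "tree" in which the root holds leaf nodes and each leaf holds objects. This is a simplified sketch, assuming non-negative weights (so the maxscore of an MBR is the score of its upper corner) and in-memory nodes instead of a disk-based R-tree; `brs_topk` is a hypothetical name. Nodes and objects share one max-heap keyed on maxscore or score.

```python
import heapq
from typing import Dict, List, Sequence, Tuple, Union

Vec = Tuple[float, ...]
Node = Dict[str, Vec]  # a leaf node: object name -> attribute values


def score(w: Sequence[float], p: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, p))


def maxscore(w: Sequence[float], node: Node) -> float:
    # With non-negative weights, f is maximal at the upper corner of the MBR
    upper = tuple(max(col) for col in zip(*node.values()))
    return score(w, upper)


def brs_topk(w: Sequence[float], root: List[Node], k: int) -> List[str]:
    counter = 0  # tie-breaker so heap entries never compare nodes
    heap: List[Tuple[float, int, Union[str, Node]]] = []
    for node in root:
        heap.append((-maxscore(w, node), counter, node))
        counter += 1
    heapq.heapify(heap)
    result: List[str] = []
    while heap and len(result) < k:
        _, _, entry = heapq.heappop(heap)  # highest (max)score first
        if isinstance(entry, str):
            # an object popped before every remaining entry scores at least
            # as high as every remaining maxscore, so it is safe to report
            result.append(entry)
        else:
            for name, p in entry.items():  # expand the node
                heapq.heappush(heap, (-score(w, p), counter, name))
                counter += 1
    return result


root = [{"a": (5, 6), "b": (2, 7)}, {"c": (8, 2), "d": (4, 4)}]
print(brs_topk((0.8, 0.2), root, 2))  # ['c', 'a']
```

Reporting an object only when it is popped ahead of every remaining node is the termination rule of step 4 expressed through the heap order.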


Problem Statement

• F is a set of user preference functions over a set O of multidimensional objects

• The score f(o) of an object o is $f(o) = \sum_{i=1}^{D} f.\alpha_i \cdot o_i$, where D is the dimensionality, f.αi ≥ 0 is the weight of f on dimension i (the weights of each function sum to 1), and oi is the value of o on dimension i

• Our goal is to find a stable 1-1 matching between F and O

• A function-object pair (f, o) ∈ F × O is stable if there is no function f’ ∈ F, f’ ≠ f, with f’(o) > f(o), and no object o’ ∈ O, o’ ≠ o, with f(o’) > f(o), where F and O are the sets of unassigned (remaining) functions and objects


Algorithms – Brute Force Search

Assumption: F is kept in memory; O is indexed by an R-tree (RO) on disk

• Progressive technique

• Issue top-1 queries against O, one for every function in F (yielding |F| pairs)

• The pair (f, o) with the highest f(o) value is stable:
  – o is the top-1 preference of f
  – f’(o) cannot be greater than f(o) for any function f’ ≠ f, since f’(o) cannot exceed the score of the top-1 object of f’, which in turn cannot exceed f(o)

• After the pair (f, o) is added to the result:
  – o is removed from RO
  – if o was also the top-1 object of another function f’ ≠ f, the top-1 search must be re-applied for f’

• Improvement: by maintaining the search heap of each top-1 query, a search can be resumed instead of restarted
  – Drawback: large amount of memory
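A minimal in-memory sketch of the brute-force search, assuming linear preference functions and linear scans in place of R-tree top-1 queries; `brute_force_matching` is an illustrative name. It keeps the current top-1 answer of every function and re-runs a query only for the functions whose answer was just assigned. The function f3 = 0.9X + 0.1Y is hypothetical, added to the internship data to create a conflict on c.

```python
from typing import Dict, List, Tuple

Vec = Tuple[float, ...]


def score(w: Vec, o: Vec) -> float:
    return sum(a * b for a, b in zip(w, o))


def brute_force_matching(funcs: Dict[str, Vec], objs: Dict[str, Vec]) -> List[Tuple[str, str]]:
    remaining = dict(objs)

    def top1(f: str) -> str:
        # top-1 query of f against the remaining objects (linear scan, no R-tree)
        return max(remaining, key=lambda o: score(funcs[f], remaining[o]))

    best = {f: top1(f) for f in funcs} if remaining else {}
    pairs: List[Tuple[str, str]] = []
    while best:
        # the pair with the highest f(o) among all top-1 answers is stable
        f, o = max(best.items(), key=lambda fo: score(funcs[fo[0]], remaining[fo[1]]))
        pairs.append((f, o))
        del best[f]
        del remaining[o]  # o is removed from the dataset
        if not remaining:
            break
        for g, og in list(best.items()):
            if og == o:  # g lost its top-1 object: re-apply its top-1 search
                best[g] = top1(g)
    return pairs


funcs = {"f1": (0.8, 0.2), "f2": (0.5, 0.5), "f3": (0.9, 0.1)}  # f3 is hypothetical
objs = {"a": (0.5, 0.6), "b": (0.2, 0.7), "c": (0.8, 0.2), "d": (0.4, 0.4)}
print(brute_force_matching(funcs, objs))  # [('f3', 'c'), ('f2', 'a'), ('f1', 'd')]
```

Both f1 and f3 start with c as their top-1 answer; f3 wins it (0.74 > 0.68), and only f1 has to repeat its search.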


Algorithms – Skyline-Based Search

• Observation: if F contains only monotone functions, then the top-1 object of every function can be found in the skyline Osky of O

• Stable function-object pairs between Osky and F are found and output
  – Osky is computed and maintained

First the skyline Osky is computed; then, while there are unassigned functions, the pair (f, o) with the highest f(o) score is found; finally, f and o are removed from F and O, and Osky is updated.

algorithm SB(set F, R-tree RO)
    Osky := ∅
    while |F| > 0 do
        if Osky = ∅ then
            Osky := ComputeSkyline(RO)
        else
            UpdateSkyline(Osky, o, RO)
        (f, o) := BestPair(F, Osky)
        output (f, o)
        F := F − f; O := O − o; Osky := Osky − o
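The SB loop can be sketched in memory under two simplifying assumptions: the skyline is recomputed from scratch in every loop (where the slides use UpdateSkyline), and BestPair scans the whole F × Osky cross product. All names are illustrative. With monotone linear functions the best pair always lies in F × Osky, so, barring score ties, the output matches a brute-force search over all of O.

```python
from typing import Dict, List, Tuple

Vec = Tuple[float, ...]


def score(w: Vec, o: Vec) -> float:
    return sum(a * b for a, b in zip(w, o))


def dominates(p: Vec, q: Vec) -> bool:
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def compute_skyline(objs: Dict[str, Vec]) -> Dict[str, Vec]:
    # naive O(n^2) skyline; stands in for an R-tree based ComputeSkyline
    return {n: p for n, p in objs.items()
            if not any(dominates(q, p) for m, q in objs.items() if m != n)}


def best_pair(F: Dict[str, Vec], sky: Dict[str, Vec]) -> Tuple[str, str]:
    # brute-force BestPair over the cross product F x Osky
    return max(((f, o) for f in F for o in sky),
               key=lambda fo: score(F[fo[0]], sky[fo[1]]))


def sb_matching(funcs: Dict[str, Vec], objs: Dict[str, Vec]) -> List[Tuple[str, str]]:
    F, O = dict(funcs), dict(objs)
    pairs: List[Tuple[str, str]] = []
    while F and O:
        sky = compute_skyline(O)  # recomputed each loop instead of UpdateSkyline
        f, o = best_pair(F, sky)
        pairs.append((f, o))
        del F[f]
        del O[o]
    return pairs


funcs = {"f1": (0.8, 0.2), "f2": (0.5, 0.5), "f3": (0.9, 0.1)}  # f3 is hypothetical
objs = {"a": (0.5, 0.6), "b": (0.2, 0.7), "c": (0.8, 0.2), "d": (0.4, 0.4)}
print(sorted(compute_skyline(objs)))  # ['a', 'b', 'c']
print(sb_matching(funcs, objs))       # [('f3', 'c'), ('f2', 'a'), ('f1', 'd')]
```

Object d only enters the skyline after a (which dominates it) has been assigned.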


Algorithms – Skyline-Based Search (Example)



Implementation - BestPair

• A brute-force implementation is not efficient:
  – it requires |F| · |Osky| comparisons (cross product F × Osky)

• Another approach is to index either F or Osky:
  – indexing Osky is not practical (because of the number of updates)
  – F is indexed, since only one deletion is performed on it in each loop

• Functions are indexed as sorted lists, one for each coefficient

• A reverse top-1 search is applied on the lists, where the roles of objects and functions are swapped

• Each list L1, …, LD (D is the dimensionality) holds the (f.αi, f) pairs of all functions f ∈ F, sorted on f.αi in descending order

• The threshold T can be calculated as $T = \sum_{i=1}^{D} l_i \cdot o_i$, where li is the last value seen in Li
  – the li may sum to more than 1, whereas the coefficients of every function sum to 1, so T can be tightened by normalizing them

Normalization algorithm:
  1. Rank the dimensions in descending order of o’s corresponding values
  2. Set B = 1; for each dimension i, in that order: βi = min{B, li}, B = B − βi
  3. The tight threshold is $T_{tight} = \sum_{i=1}^{D} \beta_i \cdot o_i$
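The two thresholds can be compared directly. A minimal sketch, assuming every function's coefficients are non-negative and sum to 1; `naive_threshold` and `tight_threshold` are hypothetical names. Choosing the βi greedily in descending order of oi is the classic fractional-knapsack argument: it maximizes Σ βi·oi subject to βi ≤ li and Σ βi ≤ 1.

```python
from typing import Sequence


def naive_threshold(last_seen: Sequence[float], o: Sequence[float]) -> float:
    """T = sum_i l_i * o_i; may correspond to a function whose weights sum to more than 1."""
    return sum(l * x for l, x in zip(last_seen, o))


def tight_threshold(last_seen: Sequence[float], o: Sequence[float]) -> float:
    """Best score any unseen normalized function could reach on o."""
    order = sorted(range(len(o)), key=lambda i: o[i], reverse=True)  # step 1
    budget = 1.0  # B
    beta = [0.0] * len(o)
    for i in order:  # step 2: beta_i = min{B, l_i}, B = B - beta_i
        beta[i] = min(budget, last_seen[i])
        budget -= beta[i]
    return sum(b * x for b, x in zip(beta, o))  # step 3


o = (10, 6, 8)
print(naive_threshold((0.8, 0.8, 0.9), o))  # about 20
print(tight_threshold((0.8, 0.8, 0.9), o))  # about 9.6
print(tight_threshold((0.5, 0.8, 0.9), o))  # about 9.0
```

On o = (10, 6, 8) the naive bound (about 20) is more than twice the tight one, which is what allows the search to stop early.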


Implementation - BestPair (Example)

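o = (10, 6, 8)
fa = 0.8X + 0.1Y + 0.1Z
fb = 0.2X + 0.8Y + 0.0Z
fc = 0.5X + 0.4Y + 0.1Z
fd = 0.0X + 0.1Y + 0.9Z
fe = 0.2X + 0.4Y + 0.4Z

L1          L2          L3
fa (0.8)    fb (0.8)    fd (0.9)
fc (0.5)    fe (0.4)    fe (0.4)
fe (0.2)    fc (0.4)    fc (0.1)
fb (0.2)    fd (0.1)    fa (0.1)
fd (0.0)    fa (0.1)    fb (0.0)

Dimensions ranked by o’s values: X (10), Z (8), Y (6)

Step 1: fa, fb and fd are accessed: fa(o) = 9.4, fb(o) = 6.8, fd(o) = 7.8
  l1 = 0.8, l2 = 0.8, l3 = 0.9
  B = 1; β1 = min{B, l1} = 0.8, B = B − 0.8 = 0.2; β3 = min{B, l3} = 0.2, B = 0
  β1 = 0.8, β2 = 0, β3 = 0.2 → Ttight = 0.8·10 + 0.2·8 = 9.6 > 9.4, so the search continues

Step 2: fc is accessed: fc(o) = 8.2
  l1 = 0.5, l2 = 0.8, l3 = 0.9
  B = 1; β1 = min{B, l1} = 0.5, B = B − 0.5 = 0.5; β3 = min{B, l3} = 0.5, B = 0
  β1 = 0.5, β2 = 0, β3 = 0.5 → Ttight = 0.5·10 + 0.5·8 = 9 ≤ 9.4, so the search stops

fbest = fa, with fa(o) = 9.4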
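The reverse top-1 search for one object can be sketched as TA over the coefficient lists. This is a hypothetical in-memory version, assuming round-robin access, the threshold checked after every sorted access, and the tight threshold recomputed from the last values seen; `reverse_top1` returns o.fbest, its score and the number of sorted accesses.

```python
from typing import Dict, Optional, Sequence, Tuple


def tight_threshold(last_seen: Sequence[float], o: Sequence[float]) -> float:
    order = sorted(range(len(o)), key=lambda i: o[i], reverse=True)
    budget, total = 1.0, 0.0
    for i in order:
        beta = min(budget, last_seen[i])
        budget -= beta
        total += beta * o[i]
    return total


def reverse_top1(o: Sequence[float],
                 funcs: Dict[str, Sequence[float]]) -> Tuple[Optional[str], float, int]:
    """Return (o.fbest, its score, number of sorted accesses)."""
    d, n = len(o), len(funcs)
    # L1..LD: (f.alpha_i, f) lists sorted on f.alpha_i in descending order
    lists = [sorted(funcs, key=lambda f: funcs[f][i], reverse=True) for i in range(d)]
    last_seen = [funcs[lst[0]][i] for i, lst in enumerate(lists)]  # bounds before access
    pos = [0] * d
    best_f: Optional[str] = None
    best_s = float("-inf")
    seen = set()
    accesses, i = 0, 0
    while any(p < n for p in pos):
        if pos[i] < n:
            f = lists[i][pos[i]]
            pos[i] += 1
            accesses += 1
            last_seen[i] = funcs[f][i]
            if f not in seen:
                seen.add(f)
                s = sum(a * x for a, x in zip(funcs[f], o))
                if s > best_s:
                    best_f, best_s = f, s
            if best_s >= tight_threshold(last_seen, o):
                break  # no unseen function can beat best_f on o
        i = (i + 1) % d  # round-robin over the lists
    return best_f, best_s, accesses


funcs = {
    "fa": (0.8, 0.1, 0.1), "fb": (0.2, 0.8, 0.0), "fc": (0.5, 0.4, 0.1),
    "fd": (0.0, 0.1, 0.9), "fe": (0.2, 0.4, 0.4),
}
print(reverse_top1((10, 6, 8), funcs))  # fa, about 9.4, after 4 accesses
```

On the example data it reads fa, fb, fd and fc, and stops once the tight threshold drops to 9, matching the step-by-step computation above; fe is never accessed.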


Implementation - BestPair (Improvements)

• TA access order
  – The access order changes from round-robin to descending order of li·oi (li is the last value seen in each Li)

• Resuming search
  – The state of the search previously applied for each object in Osky is stored, so that the search can be resumed if necessary
  – The drawback of this method is the extra memory required

• Iterative solution: the queue’s maximum capacity is set to Ω = ω·|F|
  – the queue stores only the top-Ω functions
  – Ω is decreased by 1 when an element is popped from the queue; if Ω = 0, its value is reset to ω·|F|
  – this allows controlling the trade-off between execution time and memory usage
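The modified access order only changes which list is read next. A hypothetical helper, assuming the last-seen values li and the object's values oi are available and that exhausted lists are skipped:

```python
from typing import Sequence


def next_list(last_seen: Sequence[float], o: Sequence[float],
              exhausted: Sequence[bool]) -> int:
    """Index of the non-exhausted list Li with the largest li * oi."""
    candidates = [i for i in range(len(o)) if not exhausted[i]]
    return max(candidates, key=lambda i: last_seen[i] * o[i])


o = (10, 6, 8)
print(next_list((0.8, 0.8, 0.9), o, (False, False, False)))  # 0: 8.0 > 7.2 > 4.8
print(next_list((0.5, 0.8, 0.9), o, (False, False, False)))  # 2: 7.2 > 5.0 > 4.8
```

In a TA loop this rule replaces the round-robin choice of the next list, so the lists that contribute most to the threshold Σ li·oi are read first.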


Implementation – UpdateSkyline (Example)

• To minimize the tree traversal cost during skyline maintenance, the entries dominated by a skyline object o are pruned and added to its pruned list o.plist

• To minimize the required memory, each pruned entry is kept in the plist of only one skyline object

[Figure: R-tree entries m1, M2, M3 and objects a, b, c, d]

Scand = {m1, c, M2, M3}
Scand = {c, M2, M3, a, b, d}
Scand = {M2, M3, a, b, d}; Osky = {c}
Scand = {M3, a, b, d}; c.plist = {M2}
Scand = {a, b, d}; c.plist = {M2, M3}
Scand = {b, d}; Osky = {a, c}
Scand = {d}; Osky = {a, b, c}
Scand = {}; b.plist = {d}

algorithm UpdateSkyline(set Osky, object o, R-tree RO)
    Scand := {E | E ∈ o.plist, E ∉ o’.plist ∀ o’ ∈ Osky}
    new Osky := ResumeSkyline(Scand, Osky)

algorithm ResumeSkyline(set Scand, set Osky)
    while Scand is not empty do
        de-heap top entry E of Scand
        if E is dominated by any o ∈ Osky then
            add E to o.plist
        else    ⊳ not dominated by any skyline object
            if E is a non-leaf entry then
                visit node N pointed by E
                for all entries E’ ∈ N do
                    push E’ into Scand
            else
                Osky := Osky ∪ E
    return Osky
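The plist mechanism can be sketched on plain points, leaving out R-tree nodes: each dominated point is stored in the plist of exactly one dominating skyline point, and removing a skyline point only re-examines its own plist. Assumptions: distinct points, larger values are better, and candidates are processed in descending coordinate sum (equivalently, ascending L1 distance from the sky point). The function names are illustrative.

```python
from typing import Dict, Iterable, List, Tuple

Point = Tuple[int, ...]


def dominates(p: Point, q: Point) -> bool:
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def resume_skyline(cand: Iterable[Point], sky: List[Point],
                   plist: Dict[Point, List[Point]]) -> None:
    # a point can only be dominated by one with a larger coordinate sum,
    # so processing by descending sum sees every dominator first
    for e in sorted(cand, key=sum, reverse=True):
        owner = next((s for s in sky if dominates(s, e)), None)
        if owner is not None:
            plist[owner].append(e)  # pruned: kept in one plist only
        else:
            sky.append(e)  # not dominated by any skyline object
            plist[e] = []


def compute_skyline(points: Iterable[Point]) -> Tuple[List[Point], Dict[Point, List[Point]]]:
    sky: List[Point] = []
    plist: Dict[Point, List[Point]] = {}
    resume_skyline(points, sky, plist)
    return sky, plist


def update_skyline(o: Point, sky: List[Point], plist: Dict[Point, List[Point]]) -> None:
    """Remove skyline object o; only the entries in o.plist can enter the skyline."""
    sky.remove(o)
    resume_skyline(plist.pop(o), sky, plist)


pts = [(9, 1), (5, 6), (2, 7), (4, 5), (3, 3)]
sky, plist = compute_skyline(pts)
print(sky, plist[(5, 6)])  # [(5, 6), (9, 1), (2, 7)] [(4, 5), (3, 3)]
update_skyline((5, 6), sky, plist)
print(sky, plist[(4, 5)])  # [(9, 1), (2, 7), (4, 5)] [(3, 3)]
```

A point promoted from o.plist cannot dominate an existing skyline object: by transitivity that object would then be dominated by o, so the rest of the skyline never needs re-checking.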


Algorithms – Skyline-Based Search (Optimization)

• The number of loops required can be reduced if multiple stable function-object pairs are output in each loop

algorithm SB(set F, R-tree RO)
    Osky := ∅; Odel := ∅
    while |F| > 0 do    ⊳ more unassigned functions
        if Osky = ∅ then
            Osky := ComputeSkyline(RO)
        else
            UpdateSkyline(Osky, Odel, RO)
        Odel := ∅; Fbest := ∅
        for all o ∈ Osky do
            find function o.fbest ∈ F that maximizes f(o)
            Fbest := Fbest ∪ o.fbest
        for all f ∈ Fbest do
            find object f.obest ∈ Osky that maximizes f(o)
        for all f ∈ Fbest do
            if (f.obest).fbest = f then
                output (f, f.obest)
                F := F − f; O := O − f.obest
                Osky := Osky − f.obest; Odel := Odel ∪ f.obest

• Fbest is the subset of F that contains, for each o ∈ Osky, the function o.fbest that maximizes f(o)

• For each f ∈ Fbest, the object f.obest that maximizes f(o) is coupled with the function f

• If (f.obest).fbest = f, then (f, f.obest) is stable: f is removed from F, and f.obest is removed from O and Osky

• At least one pair is guaranteed to be output
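The multi-pair step can be sketched in memory as well. Assumptions: linear functions, no score ties, and the skyline recomputed from scratch in each loop instead of UpdateSkyline(Osky, Odel, RO); the function names and the extra function f3 are hypothetical.

```python
from typing import Dict, List, Tuple

Vec = Tuple[float, ...]


def score(w: Vec, o: Vec) -> float:
    return sum(a * b for a, b in zip(w, o))


def dominates(p: Vec, q: Vec) -> bool:
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def compute_skyline(objs: Dict[str, Vec]) -> Dict[str, Vec]:
    return {n: p for n, p in objs.items()
            if not any(dominates(q, p) for m, q in objs.items() if m != n)}


def stable_pairs(F: Dict[str, Vec], sky: Dict[str, Vec]) -> List[Tuple[str, str]]:
    """All pairs (f, f.obest) with (f.obest).fbest = f, found in one loop."""
    fbest = {o: max(F, key=lambda f: score(F[f], sky[o])) for o in sky}  # o.fbest
    Fbest = set(fbest.values())
    obest = {f: max(sky, key=lambda o: score(F[f], sky[o])) for f in Fbest}  # f.obest
    return [(f, obest[f]) for f in sorted(Fbest) if fbest[obest[f]] == f]


def sb_multi(funcs: Dict[str, Vec], objs: Dict[str, Vec]) -> List[Tuple[str, str]]:
    F, O = dict(funcs), dict(objs)
    pairs: List[Tuple[str, str]] = []
    while F and O:
        found = stable_pairs(F, compute_skyline(O))
        if not found:  # only possible with score ties, not handled here
            raise ValueError("score ties are not supported")
        for f, o in found:
            pairs.append((f, o))
            del F[f]
            del O[o]
    return pairs


funcs = {"f1": (0.8, 0.2), "f2": (0.5, 0.5), "f3": (0.9, 0.1)}  # f3 is hypothetical
objs = {"a": (0.5, 0.6), "b": (0.2, 0.7), "c": (0.8, 0.2), "d": (0.4, 0.4)}
print(stable_pairs(funcs, compute_skyline(objs)))  # [('f2', 'a'), ('f3', 'c')]
print(sb_multi(funcs, objs))  # [('f2', 'a'), ('f3', 'c'), ('f1', 'd')]
```

In the first loop both (f2, a) and (f3, c) are mutual best choices and are output together, so only two loops are needed instead of three.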


Problem Variants

• Objects and functions with capacities
  – Multiple objects/functions may share the same features; they are represented by only one object/function with a capacity attribute
  – Once a pair is found, the capacities of f and o are reduced by 1

• Functions with different priorities
  – f.γ is the priority of the function
  – To increase the efficiency of TA, a skyline Fsky is built on the functions
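A brute-force sketch of the capacity variant. It assumes capacities are positive integers and that a function or object with capacity k stands for k identical copies, so the same pair may be output more than once. The capacities in the usage line are hypothetical, and `capacitated_matching` is an illustrative name.

```python
from typing import Dict, List, Tuple

Vec = Tuple[float, ...]


def score(w: Vec, o: Vec) -> float:
    return sum(a * b for a, b in zip(w, o))


def capacitated_matching(funcs: Dict[str, Vec], f_cap: Dict[str, int],
                         objs: Dict[str, Vec], o_cap: Dict[str, int]) -> List[Tuple[str, str]]:
    fc, oc = dict(f_cap), dict(o_cap)
    pairs: List[Tuple[str, str]] = []
    while any(fc.values()) and any(oc.values()):
        # best pair among functions and objects with residual capacity
        f, o = max(((g, p) for g in funcs if fc[g] > 0 for p in objs if oc[p] > 0),
                   key=lambda gp: score(funcs[gp[0]], objs[gp[1]]))
        pairs.append((f, o))
        fc[f] -= 1  # once a pair is found, both capacities drop by 1
        oc[o] -= 1
    return pairs


funcs = {"f1": (0.8, 0.2), "f2": (0.5, 0.5)}
objs = {"a": (0.5, 0.6), "c": (0.8, 0.2)}
print(capacitated_matching(funcs, {"f1": 2, "f2": 1}, objs, {"a": 1, "c": 2}))
# [('f1', 'c'), ('f1', 'c'), ('f2', 'a')]
```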


Experiments

• Three types of synthetic datasets:
  – independent: values are generated uniformly and independently
  – correlated: an object’s values are close in all dimensions (if an object is good in one dimension, it is likely to be good in the other ones too)
  – anti-correlated: objects that are good in one dimension tend to be poor in the other ones

Parameter                 Values
|F| (in thousands)        1, 2.5, 5, 10, 20
|O| (in thousands)        10, 50, 100, 200, 400
Dimensionality D          3, 4, 5, 6
Capacity k                1, 2, 4, 8, 16
Function priority γ       1, 2, 4, 8, 16
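One simple way to produce the three distributions is sketched below. The generators are illustrative assumptions, not the ones used in the experiments: correlated points add small Gaussian noise to a shared base value, and anti-correlated points have coordinates that sum to 1 (normalized exponential draws). The sketch prints the correlation between the first two dimensions of each sample.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def independent(n: int, d: int, rng: random.Random) -> List[Point]:
    return [tuple(rng.random() for _ in range(d)) for _ in range(n)]


def correlated(n: int, d: int, rng: random.Random, spread: float = 0.05) -> List[Point]:
    pts = []
    for _ in range(n):
        base = rng.random()  # shared "quality" of the object
        pts.append(tuple(min(1.0, max(0.0, base + rng.gauss(0.0, spread)))
                         for _ in range(d)))
    return pts


def anti_correlated(n: int, d: int, rng: random.Random) -> List[Point]:
    pts = []
    for _ in range(n):
        w = [rng.expovariate(1.0) for _ in range(d)]
        total = sum(w)
        pts.append(tuple(x / total for x in w))  # coordinates sum to 1
    return pts


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(42)
for gen in (independent, correlated, anti_correlated):
    pts = gen(2000, 2, rng)
    print(gen.__name__, round(pearson([p[0] for p in pts], [p[1] for p in pts]), 2))
```

Correlated samples should show a coefficient close to +1 and two-dimensional anti-correlated samples one close to −1. In more dimensions, the pairwise anti-correlation of this generator is weaker.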


Experiments – |F| and |O| Dependency


Experiments – Dimensionality D


Experiments – Capacity k and Priority γ


Experiments – Real Data (Zillow and NBA)


Conclusions

• SB is proven to be:
  – I/O optimal, by using an incremental skyline maintenance algorithm that is itself proven to be I/O optimal
  – CPU optimal, by accelerating the matching between functions and skyline objects and by identifying multiple stable pairs in each iteration


Conclusions

THANK YOU FOR YOUR ATTENTION

Dedicated to Chip…. RIP