
On Optimal Worst-Case Matching

Cheng Long (Hong Kong University of Science and Technology)
Raymond Chi-Wing Wong (Hong Kong University of Science and Technology)
Philip S. Yu (University of Illinois at Chicago)
Minhao Jiang (Hong Kong University of Science and Technology)

Presented by Raymond Chi-Wing Wong
Prepared by Raymond Chi-Wing Wong

Outline

1. Introduction
2. Problem Definition
3. Related Work
4. Algorithm – Swap-Chain
5. Empirical Study
6. Conclusion

1. Introduction

P = {p1, p2, p3} (hospitals)
O = {o1, o2, o3} (residential estates)

Some existing studies consider the capacities of hospitals and the demands of customers.

Return an assignment between P and O such that the “condition” of the assignment is satisfied.

Different applications have different conditions.

Worst-case Optimized Condition: In the assignment, the maximum matching distance (mmd) between a residential estate o and a hospital p is minimized.

[Figure: hospitals p1, p2, p3 and residential estates o1, o2, o3, every node labeled with weight 1; the worst-case optimized assignment uses matches with distances 4, 5 and 6.]

Worst-case Optimized Assignment: mmd = 6

Many applications need the worst-case optimized assignment.

1. Emergency applications (e.g., hospital allocation, fire stations and police stations)
2. Logistics, Data Warehouse Allocation
3. Mail Delivery
4. Profile Matching

In the Hong Kong ambulance service, the minimized maximum distance is 12 minutes (driving distance).

The worst-case dissatisfaction rate among customers is minimized.

[Wong et al, VLDB 2009] Fair Condition: In the assignment, each o ∈ O is allocated to the p ∈ P that (i) is as near to o as possible, and (ii) has not exhausted its servicing capacity in serving other closer estates.

[Figure: the same hospitals and residential estates; the fair assignment uses matches with distances 10, 3 and 2.]

Fair Assignment: mmd = 10
Worst-case Optimized Assignment: mmd = 6
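As a rough illustration of the fair condition, the sketch below greedily matches the closest pair that still has unmet demand and spare capacity. It is not taken from [Wong et al, VLDB 2009]. The function name `fair_assignment` and the table `DIST` are assumptions: `DIST` is a hypothetical distance table reconstructed to agree with the distances shown in the figures, and the placement of the distances 9 and 11 is a guess.

```python
# Hypothetical distances DIST[o][p] for the running example (an assumption,
# reconstructed to agree with the distances shown in the figures).
DIST = {
    "o1": {"p1": 5, "p2": 3, "p3": 9},
    "o2": {"p1": 10, "p2": 7, "p3": 4},
    "o3": {"p1": 11, "p2": 6, "p3": 2},
}


def fair_assignment(dist, demand, capacity):
    """Greedily serve the closest (o, p) pair with demand and capacity left."""
    demand = dict(demand)      # copy so the caller's weights are untouched
    capacity = dict(capacity)
    pairs = sorted((d, o, p) for o, row in dist.items() for p, d in row.items())
    matches = []               # entries are (o, p, units, distance)
    for d, o, p in pairs:
        units = min(demand[o], capacity[p])
        if units > 0:
            matches.append((o, p, units, d))
            demand[o] -= units
            capacity[p] -= units
    return matches


# Unit demands and capacities, as in the running example.
matches = fair_assignment(DIST, {o: 1 for o in DIST}, {"p1": 1, "p2": 1, "p3": 1})
mmd = max(d for _, _, _, d in matches)
print(matches, mmd)
```

On this table the greedy pass picks the matches with distances 2, 3 and 10, which agrees with the fair assignment in the figure.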

[U et al, VLDBJ 2010] Globally Optimized Condition: The total “cost” of the assignment (i.e., the sum of the matching distances) is minimized.

[Figure: the same hospitals and residential estates; the globally optimized assignment uses matches with distances 7, 5 and 2.]

Fair Assignment: mmd = 10
Globally Optimized Assignment: mmd = 7
Worst-case Optimized Assignment: mmd = 6
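To make the globally optimized condition concrete, the sketch below enumerates every one-to-one assignment and keeps the one with the smallest total distance. It assumes unit demands and capacities, and it is a brute-force illustration rather than the method of [U et al, VLDBJ 2010]. `DIST` is a hypothetical distance table reconstructed to agree with the figures, and `globally_optimized` is a name introduced here.

```python
from itertools import permutations

# Hypothetical distances DIST[o][p] for the running example (an assumption).
DIST = {
    "o1": {"p1": 5, "p2": 3, "p3": 9},
    "o2": {"p1": 10, "p2": 7, "p3": 4},
    "o3": {"p1": 11, "p2": 6, "p3": 2},
}


def globally_optimized(dist):
    """Return the one-to-one assignment with the smallest total distance."""
    estates = sorted(dist)
    hospitals = sorted(dist[estates[0]])
    best = min(
        permutations(hospitals, len(estates)),
        key=lambda perm: sum(dist[o][p] for o, p in zip(estates, perm)),
    )
    return dict(zip(estates, best))


assignment = globally_optimized(DIST)
total = sum(DIST[o][p] for o, p in assignment.items())
mmd = max(DIST[o][p] for o, p in assignment.items())
print(assignment, total, mmd)
```

The enumeration grows factorially, so it is only meant for checking small examples like this one.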

The assignment with this globally optimized condition is said to be a globally optimized assignment.

Existing spatial assignment methods do not solve the problem of finding the worst-case optimized assignment well.


2. Problem Definition

P = {p1, p2, p3} (hospitals)
O = {o1, o2, o3} (residential estates)

Each o ∈ O is associated with its demand o.w (which is a positive integer).
Each p ∈ P is associated with its capacity p.w (which is a positive integer).

Problem: to find an assignment between P and O such that the maximum matching distance (mmd) is minimized.

[Figure: hospitals p1, p2, p3 and residential estates o1, o2, o3, every node labeled with weight 1; the worst-case optimized assignment uses matches with distances 4, 5 and 6, so mmd = 6.]
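The problem statement can be checked on the running example with a short sketch: one helper tests that an assignment meets every demand without exceeding any capacity, and another computes its mmd. The representation (a list of `(o, p, units)` triples), the helper names, and the distance table `DIST` are assumptions made for illustration. `DIST` is reconstructed to agree with the figures.

```python
# Hypothetical distances DIST[o][p] for the running example (an assumption).
DIST = {
    "o1": {"p1": 5, "p2": 3, "p3": 9},
    "o2": {"p1": 10, "p2": 7, "p3": 4},
    "o3": {"p1": 11, "p2": 6, "p3": 2},
}
DEMAND = {"o1": 1, "o2": 1, "o3": 1}    # o.w
CAPACITY = {"p1": 1, "p2": 1, "p3": 1}  # p.w


def is_valid(assignment, demand, capacity):
    """Every demand is met exactly and no capacity is exceeded."""
    served = {o: 0 for o in demand}
    used = {p: 0 for p in capacity}
    for o, p, units in assignment:
        served[o] += units
        used[p] += units
    return served == demand and all(used[p] <= capacity[p] for p in capacity)


def mmd(assignment, dist):
    """Maximum matching distance over all matches that carry at least one unit."""
    return max(dist[o][p] for o, p, units in assignment if units > 0)


FAIR = [("o1", "p2", 1), ("o2", "p1", 1), ("o3", "p3", 1)]
GLOBAL = [("o1", "p1", 1), ("o2", "p2", 1), ("o3", "p3", 1)]
WORST_CASE = [("o1", "p1", 1), ("o2", "p3", 1), ("o3", "p2", 1)]

for name, a in [("fair", FAIR), ("global", GLOBAL), ("worst-case", WORST_CASE)]:
    print(name, is_valid(a, DEMAND, CAPACITY), mmd(a, DIST))
```

All three assignments are valid, and their mmd values are 10, 7 and 6, as on the introduction slides.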


3. Related Work

Bottleneck Matching Problem (BMP)

Given two sets of objects, namely A and B, and a matching distance (cost) between each object in A and each object in B, BMP is to find a perfect matching (or assignment) between A and B which minimizes the maximum matching distance.

This problem considers that the demand of each object in A and the capacity of each object in B are both equal to 1.

Problem: to find an assignment between P and O such that the maximum matching distance (mmd) is minimized.

Our problem considers that the demand of each object in O and the capacity of each object in P can be any positive integer.

Bottleneck Matching Problem (BMP)

Threshold: the fastest algorithm

The algorithm needs to materialize all pairwise distances. Thus, it is not very scalable.

Spatial Assignment Problem: Fair Assignment and Globally Optimized Assignment

As we described before, they do not address our problem well.

Major Contribution

We propose an efficient and scalable algorithm (called Swap-Chain) for this problem. It is more efficient and scalable than the adapted algorithm for the bottleneck matching problem.


4. Algorithm – Swap-Chain

Swap-Chain involves the following 3 steps.

Step 1 (Initialization)
Step 2 (Assignment Adjustment)
Step 3 (Iterative Step)

Step 1 (Initialization): Find a full assignment A using a given condition (e.g., fair assignment, globally optimized assignment or random assignment).

[Figure: the fair assignment on the running example, with matches of distances 10, 3 and 2; mmd = 10.]

Step 2 (Assignment Adjustment): Re-assign some matches in A to form another full assignment A’ such that the mmd value of A’ is smaller than that of A.

[Figure: before the adjustment the matches have distances 10, 3 and 2 (mmd = 10); after the adjustment they have distances 7, 5 and 2 (mmd = 7).]

Step 3 (Iterative Step): Repeat Step 2 until it is not possible to perform the assignment adjustment step (Step 2).

[Figure: a second adjustment changes the matches from distances 7, 5 and 2 (mmd = 7) to distances 6, 5 and 4 (mmd = 6).]

After this assignment adjustment step, we cannot re-adjust the assignment again so that the mmd value of the adjusted assignment is smaller. This is the final solution for our problem.

Step 1 is easy. How can we perform Step 2 (i.e., how do we re-assign some matches in A such that the mmd value of the assignment is decreased)?

Algorithm Swap-Chain makes use of extreme matches for re-adjusting the assignment in Step 2.

Given an assignment A, a match in A is called an extreme match if the matching distance of this match is equal to the mmd value of A.
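The definition of an extreme match translates directly into code. In the sketch below an assignment is a list of `(o, p)` pairs. The function name `extreme_matches` and the distance table `DIST` are assumptions for illustration, with `DIST` reconstructed to agree with the figures.

```python
# Hypothetical distances DIST[o][p] for the running example (an assumption).
DIST = {
    "o1": {"p1": 5, "p2": 3, "p3": 9},
    "o2": {"p1": 10, "p2": 7, "p3": 4},
    "o3": {"p1": 11, "p2": 6, "p3": 2},
}


def extreme_matches(assignment, dist):
    """Return the matches whose distance equals the mmd value of the assignment."""
    worst = max(dist[o][p] for o, p in assignment)
    return [(o, p) for o, p in assignment if dist[o][p] == worst]


fair = [("o1", "p2"), ("o2", "p1"), ("o3", "p3")]  # matches of distances 3, 10, 2
print(extreme_matches(fair, DIST))
```

An assignment can have several extreme matches when more than one match attains the mmd value, which is why the function returns a list.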

Consider the assignment obtained just after Step 1.

[Figure: the fair assignment with matches of distances 10, 3 and 2 (mmd = 10); the match with distance 10 is an extreme match.]

Step (a): Break the extreme match (o, p).

Step (b): Find a set of objects in O and P to be involved in the assignment adjustment.

[Figure: with the extreme match broken, a chain is built from o2 using range queries. List: o2, p2, o1, p1. The new matches on the chain have distances 7 and 5.]

We continue these sub-steps again to reduce the mmd value of the assignment.

Step (a): Break the extreme match (o, p).

[Figure: the adjusted assignment has matches of distances 7, 5 and 2 (mmd = 7); the match with distance 7 is an extreme match.]

Step (b): Find a set of objects in O and P to be involved in the assignment adjustment.

[Figure: a chain is built from o2 using range queries. List: o2, p3, o3, p2. The new matches on the chain have distances 4 and 6.]

We cannot re-adjust the assignment anymore to reduce its mmd value.

[Figure: the final assignment has matches of distances 6, 5 and 4 (mmd = 6).]

This is the final solution.
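The walk-through above can be imitated by a simplified sketch. It is not the authors' implementation: it assumes unit demands and capacities, it replaces the indexed range query with a linear scan over P, and it finds the chain by a depth-first search for an augmenting path that only uses pairs closer than the broken extreme match. The names `swap_chain_unit` and `extend`, and the distance table `DIST` (reconstructed to agree with the figures), are assumptions.

```python
# Hypothetical distances DIST[o][p] for the running example (an assumption).
DIST = {
    "o1": {"p1": 5, "p2": 3, "p3": 9},
    "o2": {"p1": 10, "p2": 7, "p3": 4},
    "o3": {"p1": 11, "p2": 6, "p3": 2},
}


def swap_chain_unit(dist, match):
    """Lower the mmd of a full assignment `match` (dict o -> p), unit weights only."""
    match = dict(match)
    all_p = sorted({p for row in dist.values() for p in row})
    while True:
        # Step (a): pick an extreme match and break it.
        o_star = max(match, key=lambda o: dist[o][match[o]])
        p_star = match.pop(o_star)
        limit = dist[o_star][p_star]
        owner = {p: o for o, p in match.items()}  # p -> o currently matched to p

        # Step (b): search for a chain of re-matches that only uses pairs
        # strictly closer than `limit` (a linear scan stands in for a range query).
        def extend(o, visited):
            for p in all_p:
                if dist[o][p] < limit and p not in visited:
                    visited.add(p)
                    if p not in owner or extend(owner[p], visited):
                        owner[p] = o
                        return True
            return False

        if not extend(o_star, set()):
            match[o_star] = p_star  # no chain exists: restore the match and stop
            return match
        match = {o: p for p, o in owner.items()}


fair = {"o1": "p2", "o2": "p1", "o3": "p3"}  # Step 1 result, mmd = 10
final = swap_chain_unit(DIST, fair)
final_mmd = max(DIST[o][p] for o, p in final.items())
print(final, final_mmd)
```

On this table the sketch follows the same two chains as the slides (o2, p2, o1, p1 and then o2, p3, o3, p2) and stops with mmd = 6. Handling general demands and capacities would need per-unit bookkeeping that is omitted here.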

In the algorithm, we have to perform range queries on P, so we build an index on P. The time complexity of building the index on P is O(n log n).

The time complexity of Swap-Chain is O(R · n · (log n + k)), where R is the number of extreme matches found in Swap-Chain and k << n.

R is typically a small number. In our experiments, R is equal to 500 on average when the dataset size is 1M.

The space occupied by Swap-Chain mainly comes from the index on P (which is O(n log n)).

Swap-Chain can be extended to handle the non-spatial problem. Due to the time limit, we do not discuss the details here.


5. Empirical Study

Synthetic Dataset
P and O: uniform distribution

Real Dataset
4 datasets in Canada: AB (Alberta), BC (British Columbia), ON (Ontario), QC (Quebec)
For each dataset, O is a set of populated areas and P is a set of fire stations.

Measurements: execution time and memory

Our proposed algorithm: Swap-Chain

Two Sets of Experiments
1. Comparison with existing spatial assignment algorithms
2. Comparison with an adapted algorithm for the bottleneck matching problem, Threshold-Adapt (TA)

First Set: Comparison with Existing Spatial Assignment
[Figures: results on the real datasets and the synthetic datasets.]

Second Set: Comparison with the Adapted Algorithm Threshold-Adapt (TA)
[Figures: results on the real datasets and the synthetic datasets (synthetic dataset sizes in thousands).]


6. Conclusion

Problem: finding the worst-case optimized assignment
Algorithm: Swap-Chain, which is efficient and scalable
Experiments

Q&A


1. Introduction

Bichromatic Reverse Nearest Neighbor (BRNN or RNN)

Given: P and O are two sets of objects in the same data space.

Problem: Given an object p ∈ P, a BRNN query finds all the objects o ∈ O whose nearest neighbor (NN) in P is p.

NN: Nearest neighbor
RNN: Reverse nearest neighbor

P = {p1, p2, p3} (hospitals)
O = {o1, o2, o3} (residential estates)

[Figure: each residential estate is annotated with its NN in P (p3 for two estates and p2 for the third), and each hospital with its RNN set ({o2, o3}, {o1} and {}).]

Capacities of hospitals are not considered.
Demands of customers are not considered.
There is a serving capacity of p3.
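A brute-force sketch of the BRNN query makes the definition concrete: compute each estate's nearest hospital, then collect the estates whose nearest hospital is the query object. The helper names and the distance table `DIST` are assumptions. `DIST` is a hypothetical table reconstructed to agree with the figures, not data from the slides.

```python
# Hypothetical distances DIST[o][p] for the running example (an assumption).
DIST = {
    "o1": {"p1": 5, "p2": 3, "p3": 9},
    "o2": {"p1": 10, "p2": 7, "p3": 4},
    "o3": {"p1": 11, "p2": 6, "p3": 2},
}


def nearest_neighbor(o, dist):
    """Return the object in P that is closest to o."""
    return min(dist[o], key=dist[o].get)


def brnn(p, dist):
    """Return all objects in O whose nearest neighbor in P is p."""
    return sorted(o for o in dist if nearest_neighbor(o, dist) == p)


for p in ("p1", "p2", "p3"):
    print(p, brnn(p, DIST))
```

With this table p3 receives two estates and p1 receives none, which shows why ignoring capacities and demands can produce an unbalanced allocation.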

3. Related Work

Bottleneck Matching Problem (BMP)

One may come up with a straightforward solution to our problem as follows.

For each object p in P (in our problem), we duplicate this object p p.w times.
For each object o in O (in our problem), we duplicate this object o o.w times.

Thus, we can use an existing algorithm for BMP to solve our problem.

However, this approach is cumbersome and undesirable (especially when the capacities/demands are very large).
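The duplication step can be sketched as follows, which also shows why it becomes cumbersome: the number of unit copies equals the total demand or capacity, not the number of objects. The helper names `expand` and `unit_distance`, and the weights and distances in the example, are hypothetical.

```python
def expand(weights):
    """Turn {name: w} into a list with w unit copies (name, copy_index) per object."""
    return [(name, i) for name, w in weights.items() for i in range(w)]


def unit_distance(dist, o_copy, p_copy):
    """A copy inherits the distance of the object it was duplicated from."""
    return dist[o_copy[0]][p_copy[0]]


# Hypothetical demands, capacities and distances for illustration.
demand = {"o1": 2, "o2": 1}
capacity = {"p1": 1, "p2": 3}
dist = {"o1": {"p1": 4, "p2": 8}, "o2": {"p1": 6, "p2": 5}}

unit_o = expand(demand)
unit_p = expand(capacity)
pairs = [(o, p, unit_distance(dist, o, p)) for o in unit_o for p in unit_p]
print(len(unit_o), len(unit_p), len(pairs))
```

Two estates and two hospitals already produce 3 × 4 = 12 unit pairs here. With capacities in the thousands, the number of pairwise distances a BMP algorithm would have to handle grows accordingly.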

Outline

1. Introduction
2. Problem Definition
3. Related Work
4. Algorithm – Threshold-Adapt
5. Algorithm – Swap-Chain
6. Empirical Study
7. Conclusion

4. Algorithm – Threshold-Adapt

Threshold-Adapt is an algorithm which searches for the “best” solution in the solution space.

Threshold-Adapt (for demands/capacities equal to any positive integer) shares the same skeleton with Threshold (for demands/capacities equal to 1).

However, this algorithm is not scalable.

Before we introduce this algorithm, we give two concepts.

Concept 1: Full assignment
Concept 2: Feasibility

Concept 1: Full Assignment

Suppose that the total demand from O is at most the total capacity from P.

An assignment A between O and P is said to be full if each object in O is matched with an object in P.

[Figure: the assignment with matches of distances 10, 3 and 2 is a full assignment; the same assignment without the match of distance 2 is a non-full assignment.]

We want to find the full assignment with the smallest mmd value.

[Figure: the full assignment with matches of distances 10, 3 and 2 has mmd = 10; the full assignment with matches of distances 4, 5 and 6 has mmd = 6. We choose the latter since it has the smallest mmd value.]

Concept 2: Feasibility

A value is said to be feasible for our problem if there exists a full assignment such that its mmd value is at most this value.

Consider a value 6. There exists a full assignment (with matches of distances 4, 5 and 6) such that its mmd value is at most 6. Thus, 6 is feasible.

Consider a value 10. There exists a full assignment such that its mmd value is at most 10. Thus, 10 is feasible. There can be more than one full assignment whose mmd value is at most 10 (e.g., the assignment with matches of distances 4, 5 and 6, and the assignment with matches of distances 10, 3 and 2).

Consider a value 1. There does not exist a full assignment such that its mmd value is at most 1. Thus, 1 is not feasible.
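Feasibility can be tested mechanically. The sketch below covers only the special case of unit demands and capacities, where the test reduces to bipartite matching by augmenting paths; general weights would need a maximum-flow computation instead. The function names and the distance table `DIST` (reconstructed to agree with the figures) are assumptions.

```python
# Hypothetical distances DIST[o][p] for the running example (an assumption).
DIST = {
    "o1": {"p1": 5, "p2": 3, "p3": 9},
    "o2": {"p1": 10, "p2": 7, "p3": 4},
    "o3": {"p1": 11, "p2": 6, "p3": 2},
}


def is_feasible(v, dist):
    """True if a full assignment exists using only pairs with distance <= v."""
    owner = {}  # p -> o currently matched to p

    def try_match(o, visited):
        for p, d in dist[o].items():
            if d <= v and p not in visited:
                visited.add(p)
                # Take p if it is free, or if its current owner can move elsewhere.
                if p not in owner or try_match(owner[p], visited):
                    owner[p] = o
                    return True
        return False

    return all(try_match(o, set()) for o in dist)


for v in (1, 5, 6, 10):
    print(v, is_feasible(v, DIST))
```

On this table the values 6 and 10 are feasible while 1 and 5 are not, consistent with the optimal mmd value of 6.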

We have described the two concepts (Concept 1: Full Assignment and Concept 2: Feasibility). We present Threshold-Adapt next.

Let S be the set of all pairwise distances between O and P.

[Figure: the running example with all nine pairwise distances drawn.]

S = {5, 3, 9, 10, 7, 4, 11, 6, 2}

Observation: The optimal mmd value (i.e., 6) is in S.

Step 1: For each value v in S, we determine whether v is feasible for our problem.
Step 2: Find the smallest value v which is feasible.
Step 3: Return the full assignment with its mmd value equal to v.

There are two remaining issues.

1. Issue 1: How to determine whether a value v is feasible. This can be done by a maximum-flow algorithm.
2. Issue 2: How to improve the efficiency of this algorithm. It can be sped up by binary search.
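The three steps and the binary-search speed-up can be sketched for the unit-weight case as follows. This is an illustration under stated assumptions, not the Threshold-Adapt implementation: feasibility is tested by augmenting-path bipartite matching instead of a general maximum-flow algorithm, the largest distance is assumed to be feasible, and the names and the distance table `DIST` (reconstructed to agree with the figures) are hypothetical.

```python
# Hypothetical distances DIST[o][p] for the running example (an assumption).
DIST = {
    "o1": {"p1": 5, "p2": 3, "p3": 9},
    "o2": {"p1": 10, "p2": 7, "p3": 4},
    "o3": {"p1": 11, "p2": 6, "p3": 2},
}


def is_feasible(v, dist):
    """Unit-weight feasibility test: match every o using only pairs with distance <= v."""
    owner = {}

    def try_match(o, visited):
        for p, d in dist[o].items():
            if d <= v and p not in visited:
                visited.add(p)
                if p not in owner or try_match(owner[p], visited):
                    owner[p] = o
                    return True
        return False

    return all(try_match(o, set()) for o in dist)


def smallest_feasible(dist):
    """Binary search over the sorted set S of pairwise distances."""
    values = sorted({d for row in dist.values() for d in row.values()})
    lo, hi = 0, len(values) - 1  # assumes values[hi] is feasible
    while lo < hi:
        mid = (lo + hi) // 2
        if is_feasible(values[mid], dist):
            hi = mid       # a feasible value: the optimum is at mid or below
        else:
            lo = mid + 1   # infeasible: the optimum lies above mid
    return values[lo]


print(smallest_feasible(DIST))
```

Binary search is valid because feasibility is monotone: if v is feasible, every larger value is feasible too. Note that the sketch still materializes all pairwise distances, which is the scalability problem the slides point out.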

The time complexity of Threshold-Adapt is O(n^2) plus O(log n) times the time complexity of the maximum-flow algorithm.

The space complexity of Threshold-Adapt is O(n^2).

This algorithm is not scalable.

4. Algorithm – Swap-Chain

The time complexity of Swap-Chain is the sum of three terms: the time complexity of building the index on P, the time complexity of Step 1, and R · I, where

R is the number of extreme matches found in Swap-Chain, and
I is the time complexity of performing the re-matching operation for a given extreme match.

The time complexity of Step 1 is O(n log n) if the fair assignment is used.

R is typically a small number. In our experiments, R is equal to 500 on average when the dataset size is 1M.

I = O(n (log n + k)) where k << n.

6. Empirical Study

Non-Spatial Problem
[Figure: results on the real datasets.]