
On Optimal Worst-Case Matching

Cheng Long (Hong Kong University of Science and Technology)

Raymond Chi-Wing Wong (Hong Kong University of Science and Technology)

Philip S. Yu (University of Illinois at Chicago)

Minhao Jiang (Hong Kong University of Science and Technology)

Presented by Raymond Chi-Wing Wong

Prepared by Raymond Chi-Wing Wong

Outline

1. Introduction
2. Problem Definition
3. Related Work
4. Algorithm – Swap-Chain
5. Empirical Study
6. Conclusion

1. Introduction

P = {p1, p2, p3}: hospitals

O = {o1, o2, o3}: residential estates

Some existing studies consider the capacities of hospitals and the demands of customers.

Return an assignment between P and O such that the “condition” of the assignment is satisfied.

[Figure: the example with its worst-case optimized assignment; the matched pairs have distances 4, 5 and 6.]

Different applications have different conditions.

Worst-case Optimized Condition: In the assignment, the maximum matching distance (mmd) between a residential-estate o and a hospital p is minimized.

Worst-case Optimized Assignment: mmd = 6


Many applications need the worst-case optimized assignment.

1. Emergency applications (e.g., allocation of hospitals, fire stations and police stations). In the Hong Kong ambulance service, the minimized maximum distance is 12 minutes (driving distance).
2. Logistics, data warehouse allocation
3. Mail delivery
4. Profile matching. The worst-case dissatisfaction rate among customers is minimized.

[Figure: the same example under the fair assignment; the matched pairs have distances 10, 3 and 2.]

[Wong et al., VLDB 2009] Fair Condition: In the assignment, each o ∈ O is allocated to the p ∈ P that (i) is as near to o as possible, and (ii) has not exhausted its servicing capacity in serving other, closer estates.

Fair Assignment: mmd = 10

Worst-case Optimized Assignment: mmd = 6

[Figure: the same example under the globally optimized assignment; the matched pairs have distances 7, 5 and 2.]


[U et al., VLDBJ 2010] Globally Optimized Condition: The total “cost” of the assignment (i.e., the sum of the matching distances) is minimized.

Globally Optimized Assignment: mmd = 7
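The difference between the globally optimized and the worst-case optimized conditions can be checked by brute force on the three-by-three example. The sketch below is illustrative only: the distance table `DIST` is a reconstruction that uses the nine distances appearing in the slides, but which value belongs to which pair is an assumption, and every demand and capacity is assumed to be 1.

```python
from itertools import permutations

# Assumed distance table: the nine values come from the slides, but the
# pairing of values to (estate, hospital) pairs is a reconstruction.
DIST = {
    "o1": {"p1": 5, "p2": 3, "p3": 9},
    "o2": {"p1": 10, "p2": 7, "p3": 4},
    "o3": {"p1": 11, "p2": 6, "p3": 2},
}
O = ["o1", "o2", "o3"]
P = ["p1", "p2", "p3"]


def all_assignments():
    """Enumerate every one-to-one assignment (unit demands and capacities)."""
    for perm in permutations(P):
        yield dict(zip(O, perm))


def total_cost(assignment):
    """Sum of the matching distances (the globally optimized objective)."""
    return sum(DIST[o][p] for o, p in assignment.items())


def mmd(assignment):
    """Maximum matching distance (the worst-case optimized objective)."""
    return max(DIST[o][p] for o, p in assignment.items())


global_opt = min(all_assignments(), key=total_cost)
worst_case_opt = min(all_assignments(), key=mmd)

print(total_cost(global_opt), mmd(global_opt))  # 14 7
print(mmd(worst_case_opt))                      # 6
```

Enumerating all permutations is only workable for tiny inputs; it is used here to make the two objectives concrete, not as a practical method.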


The assignment with this globally optimized condition is said to be a globally optimized assignment.


Existing spatial assignment methods do not solve the problem of finding the worst-case optimized assignment well.


2. Problem Definition


Problem: find an assignment between P and O such that the maximum matching distance (mmd) is minimized.

In the running example, mmd = 6.

Each o ∈ O is associated with its demand o.w (which is a positive integer).

Each p ∈ P is associated with its capacity p.w (which is a positive integer).
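To make the definition concrete, the following sketch represents an assignment as a list of `(o, p, units)` triples, checks it against the demands and capacities, and computes its mmd. The representation, the function names and the distance table are illustrative assumptions (the pairing of distances to pairs is reconstructed from the example, and all weights are taken to be 1); "valid" here means that every demand is served exactly and no capacity is exceeded.

```python
# Assumed distance table reconstructed from the running example.
DIST = {
    "o1": {"p1": 5, "p2": 3, "p3": 9},
    "o2": {"p1": 10, "p2": 7, "p3": 4},
    "o3": {"p1": 11, "p2": 6, "p3": 2},
}


def is_valid(assignment, demand, capacity):
    """Every demand o.w is served exactly; no capacity p.w is exceeded."""
    served = dict.fromkeys(demand, 0)
    used = dict.fromkeys(capacity, 0)
    for o, p, units in assignment:
        served[o] += units
        used[p] += units
    return served == demand and all(used[p] <= capacity[p] for p in capacity)


def mmd(assignment, dist):
    """Maximum matching distance over the matches that carry units."""
    return max(dist[o][p] for o, p, units in assignment if units > 0)


demand = {"o1": 1, "o2": 1, "o3": 1}      # o.w, assumed to be 1 here
capacity = {"p1": 1, "p2": 1, "p3": 1}    # p.w, assumed to be 1 here

fair = [("o1", "p2", 1), ("o2", "p1", 1), ("o3", "p3", 1)]
worst_case = [("o1", "p1", 1), ("o2", "p3", 1), ("o3", "p2", 1)]

print(is_valid(fair, demand, capacity), mmd(fair, DIST))              # True 10
print(is_valid(worst_case, demand, capacity), mmd(worst_case, DIST))  # True 6
```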



3. Related Work

Bottleneck Matching Problem (BMP)

Given two sets of objects, A and B, and a matching distance between each object in A and each object in B, BMP is to find a perfect matching (or assignment) between A and B that minimizes the maximum matching distance.

BMP assumes that the demand of each object in A and the capacity of each object in B are both equal to 1.

Our problem allows the demand of each object in A and the capacity of each object in B to be any positive integer.

Threshold is the fastest algorithm for BMP.

The algorithm needs to materialize all pairwise distances. Thus, it does not scale well.

Spatial Assignment Problem: Fair Assignment, Globally Optimized Assignment

As described before, they do not address our problem well.


Major Contribution

We propose an efficient and scalable algorithm (called Swap-Chain) for this problem.

Swap-Chain is more efficient and scalable than the adapted algorithm for the bottleneck problem.




4. Algorithm – Swap-Chain

Swap-Chain involves the following 3 steps.

Step 1 (Initialization)
Step 2 (Assignment Adjustment)
Step 3 (Iterative Step)



Step 1 (Initialization): Find a full assignment A using a given condition (e.g., fair assignment, globally optimized assignment or random assignment).

Example: the fair assignment, with mmd = 10.


Step 2 (Assignment Adjustment): Re-assign some matches in A to form another full assignment A’ such that the mmd value of A’ is smaller than that of A.


[Figure: after one adjustment the matched pairs have distances 2, 5 and 7, so the mmd drops from 10 to 7.]


Step 3 (Iterative Step): Repeat Step 2 until no further assignment adjustment is possible.


[Figure: after a second adjustment the matched pairs have distances 6, 5 and 4, so the mmd drops from 7 to 6.]

After this adjustment, no further re-adjustment yields a smaller mmd value. This is the final solution for our problem.

Step 1 is easy. How can we perform Step 2 (i.e., how do we re-assign some matches in A so that the mmd value of the assignment decreases)?


Algorithm Swap-Chain makes use of extreme matches to re-adjust the assignment in Step 2.

Given an assignment A, a match in A is called an extreme match if the matching distance of this match is equal to the mmd value of A.
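The definition translates directly into a few lines of code. This is a minimal sketch with unit demands and capacities; the `DIST` table is an assumed reconstruction of the running example, and `extreme_matches` is a hypothetical helper name.

```python
# Assumed distance table reconstructed from the running example.
DIST = {
    "o1": {"p1": 5, "p2": 3, "p3": 9},
    "o2": {"p1": 10, "p2": 7, "p3": 4},
    "o3": {"p1": 11, "p2": 6, "p3": 2},
}


def extreme_matches(assignment, dist):
    """Return the matches whose distance equals the mmd of the assignment."""
    mmd = max(dist[o][p] for o, p in assignment.items())
    return [(o, p) for o, p in assignment.items() if dist[o][p] == mmd]


fair = {"o1": "p2", "o2": "p1", "o3": "p3"}   # matched distances 3, 10, 2
print(extreme_matches(fair, DIST))            # [('o2', 'p1')]
```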

Consider the assignment obtained just after Step 1 (mmd = 10).

[Figure: the fair assignment, with matched distances 10, 3 and 2; the match with distance 10 is an extreme match.]

Step (a): Break the extreme match (o, p)


Step (b): Find a set of objects in O and P to be involved in the assignment adjustment.

A chain from o2 is built with range queries: the list is o2, p2, o1, p1, and the two new matches have distances 7 and 5.


We repeat these sub-steps to reduce the mmd value of the assignment further.

[Figure: the current assignment, with matched distances 7, 5 and 2; the match with distance 7 is now an extreme match.]


Step (b): Find a set of objects in O and P to be involved in the assignment adjustment.

A chain from o2 is built with range queries: the list is o2, p3, o3, p2, and the two new matches have distances 4 and 6.


We cannot re-adjust the assignment anymore to reduce its mmd value.

[Figure: the final assignment, with matched distances 5, 4 and 6 (mmd = 6).]

The final solution.
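The walk-through above can be sketched in code under strong simplifications. This is not the authors' implementation: it assumes unit demands and capacities, replaces the indexed range query with a linear scan over all hospitals, and uses a breadth-first search to find the chain. The `DIST` table is an assumed reconstruction of the example, and `adjust_once` and `swap_chain` are hypothetical names.

```python
from collections import deque

# Assumed distance table reconstructed from the running example.
DIST = {
    "o1": {"p1": 5, "p2": 3, "p3": 9},
    "o2": {"p1": 10, "p2": 7, "p3": 4},
    "o3": {"p1": 11, "p2": 6, "p3": 2},
}


def mmd(assign):
    return max(DIST[o][p] for o, p in assign.items())


def adjust_once(assign):
    """Break one extreme match and try to re-match along a chain.

    Returns True if the assignment was changed, False if no chain exists.
    """
    limit = mmd(assign)
    start = next(o for o, p in assign.items() if DIST[o][p] == limit)
    # Step (a): break the extreme match; its hospital becomes unowned.
    owner = {p: o for o, p in assign.items() if o != start}
    parent = {}                      # hospital -> estate that reached it
    queue = deque([start])
    while queue:
        o = queue.popleft()
        # Stand-in for the range query: hospitals strictly closer than mmd.
        for p, d in DIST[o].items():
            if d >= limit or p in parent:
                continue
            parent[p] = o
            if p not in owner:       # reached a hospital with spare capacity
                while True:          # swap matches back along the chain
                    o_cur = parent[p]
                    previous = assign[o_cur]
                    assign[o_cur] = p
                    if o_cur == start:
                        return True
                    p = previous
            queue.append(owner[p])
    return False


def swap_chain(initial):
    assign = dict(initial)           # Step 1: start from a full assignment
    while adjust_once(assign):       # Steps 2 and 3: adjust until stuck
        pass
    return assign


fair = {"o1": "p2", "o2": "p1", "o3": "p3"}
final = swap_chain(fair)
print(final, mmd(final))  # {'o1': 'p1', 'o2': 'p3', 'o3': 'p2'} 6
```

On this example the sketch follows the same two chains as the slides (o2, p2, o1, p1 and then o2, p3, o3, p2) and stops at mmd = 6. Each new match is required to be strictly shorter than the current mmd, so every successful adjustment removes one extreme match.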

In the algorithm, we have to perform range queries on P, so we build an index on P.

The time complexity of building the index on P is O(n log n).

The time complexity of Swap-Chain is O(R · n · (log n + k)), where R is the number of extreme matches found in Swap-Chain and k << n.

R is typically a small number. In our experiments, R is equal to 500 on average when the dataset size is 1M.
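For a rough sense of scale, the bound can be evaluated with the figures quoted above. The base-2 logarithm is an assumption, and k is not quantified on the slide, so its term is left symbolic.

```latex
\[
R \cdot n \cdot (\log_2 n + k)
  \approx 500 \cdot 10^{6} \cdot (20 + k)
  = 5 \times 10^{8} \cdot (20 + k),
\qquad \text{since } \log_2 10^{6} \approx 20 .
\]
```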

The space occupied by Swap-Chain mainly comes from the index on P (which is O(n log n)).


Swap-Chain can be extended to handle the non-spatial problem.

Due to the time limit, we do not discuss the details here.


5. Empirical Study

Synthetic dataset: P and O follow a uniform distribution.

Real datasets: 4 datasets in Canada, namely AB (Alberta), BC (British Columbia), ON (Ontario) and QC (Quebec).

For each dataset, O is a set of populated areas and P is a set of fire stations.

Measurements: execution time and memory

Algorithm: our proposed algorithm Swap-Chain

Two sets of experiments:
1. Comparison with existing spatial assignment
2. Comparison with an adapted algorithm for the bottleneck problem (Threshold-Adapt (TA))

First Set: Comparison with Existing Spatial Assignment

[Figure: results on the real dataset and the synthetic dataset.]

Second Set: Comparison with the Adapted Algorithm Threshold-Adapt (TA)

[Figure: results on the real dataset.]

[Figure: results on the synthetic dataset; axes in thousands.]



6. Conclusion

Problem: finding the worst-case optimized assignment

Algorithm: Swap-Chain, which is efficient and scalable

Experiments


Q&A


1. Introduction

Bichromatic Reverse Nearest Neighbor (BRNN or RNN)

Given: P and O are two sets of objects in the same data space.

Problem: Given an object p ∈ P, a BRNN query finds all the objects o ∈ O whose nearest neighbor (NN) in P is p.

NN: nearest neighbor. RNN: reverse nearest neighbor.

P = {p1, p2, p3}: hospitals

O = {o1, o2, o3}: residential estates

[Figure: o2 and o3 have NN p3 and o1 has NN p2, so RNN(p3) = {o2, o3}, RNN(p2) = {o1} and RNN(p1) = {}.]

Capacities of hospitals are not considered.

Demands of customers are not considered.

There is a serving capacity of p3.
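The NN and RNN sets in this example can be reproduced with a short brute-force sketch. The distance table `DIST` is an assumed reconstruction (the pairing of distance values to pairs is not stated on the slides), and `nearest_neighbor` and `brnn` are hypothetical helper names.

```python
# Assumed distance table reconstructed from the running example.
DIST = {
    "o1": {"p1": 5, "p2": 3, "p3": 9},
    "o2": {"p1": 10, "p2": 7, "p3": 4},
    "o3": {"p1": 11, "p2": 6, "p3": 2},
}


def nearest_neighbor(o, dist):
    """Return the object in P that is nearest to o."""
    return min(dist[o], key=dist[o].get)


def brnn(p, dist):
    """Return all objects in O whose nearest neighbor in P is p."""
    return {o for o in dist if nearest_neighbor(o, dist) == p}


for p in ("p1", "p2", "p3"):
    print(p, sorted(brnn(p, DIST)))
# p1 []
# p2 ['o1']
# p3 ['o2', 'o3']
```

Note that p3 receives two estates here regardless of its capacity, which is the limitation the slide points out.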

3. Related Work

Bottleneck Matching Problem (BMP)

Given two sets of objects, A and B, and a matching distance between each object in A and each object in B, BMP is to find a perfect matching (or assignment) between A and B that minimizes the maximum matching distance.

BMP assumes that the demand of each object in A and the capacity of each object in B are both equal to 1.

Our problem allows the demand of each object in A and the capacity of each object in B to be any positive integer.

One may come up with a straightforward solution to our problem as follows.

For each object p in P (in our problem), we duplicate p p.w times.
For each object o in O (in our problem), we duplicate o o.w times.

Then, we use an existing algorithm for BMP to solve our problem.

However, this approach is cumbersome and undesirable (especially when the capacities/demands are very large).
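A small sketch of the duplication step shows why this reduction becomes unwieldy. The weights below are made-up illustrative values, and `duplicate` is a hypothetical helper name.

```python
def duplicate(weights):
    """Expand each object into one unit-weight copy per unit of weight."""
    return [(name, i) for name, w in weights.items() for i in range(w)]


capacity = {"p1": 1000, "p2": 2000}           # hypothetical p.w values
demand = {"o1": 500, "o2": 700, "o3": 1800}   # hypothetical o.w values

p_copies = duplicate(capacity)
o_copies = duplicate(demand)

# Five original objects become 6000 unit objects, and a BMP algorithm that
# materializes all pairwise distances would need 3000 * 3000 entries.
print(len(p_copies), len(o_copies), len(p_copies) * len(o_copies))
# 3000 3000 9000000
```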

Outline

1. Introduction
2. Problem Definition
3. Related Work
4. Algorithm – Threshold-Adapt
5. Algorithm – Swap-Chain
6. Empirical Study
7. Conclusion

4. Algorithm – Threshold-Adapt

Threshold-Adapt is an algorithm that searches for the “best” solution in the solution space.

Threshold-Adapt (for demands/capacities equal to any positive integer) shares the same skeleton with Threshold (for demands/capacities equal to 1).

However, this algorithm is not scalable.


Before we introduce this algorithm, we give two concepts.

Concept 1: Full assignment
Concept 2: Feasibility


Concept 1: Full Assignment

Suppose that the total demand from O is at most the total capacity of P.

An assignment A between O and P is said to be full if each object in O is matched with an object in P.

[Figure: a full assignment, with matched distances 10, 3 and 2.]


[Figure: a non-full assignment, containing only the matches with distances 10 and 3.]



We want to find the full assignment with the smallest mmd value.

[Figure: a full assignment with matched distances 10, 3 and 2 (mmd = 10).]



[Figure: a full assignment with matched distances 4, 5 and 6 (mmd = 6).]

We choose this full assignment since it has the smallest mmd value.



Concept 2: Feasibility

A value is said to be feasible for our problem if there exists a full assignment whose mmd value is at most this value.

Consider the value 6. There exists a full assignment whose mmd value is at most 6 (the one with matched distances 4, 5 and 6). Thus, 6 is feasible.



Consider the value 10. 10 is feasible.

There can be more than one full assignment whose mmd value is at most 10 (e.g., the one with matched distances 4, 5 and 6 and the one with matched distances 10, 3 and 2).



Consider the value 1. There does not exist a full assignment whose mmd value is at most 1. Thus, 1 is not feasible.
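The three cases above can be checked by brute force. This is a sketch for the tiny example only: it assumes unit demands and capacities and uses an assumed reconstruction of the distance table; `is_feasible` is a hypothetical helper name.

```python
from itertools import permutations

# Assumed distance table reconstructed from the running example.
DIST = {
    "o1": {"p1": 5, "p2": 3, "p3": 9},
    "o2": {"p1": 10, "p2": 7, "p3": 4},
    "o3": {"p1": 11, "p2": 6, "p3": 2},
}
O = ["o1", "o2", "o3"]
P = ["p1", "p2", "p3"]


def is_feasible(v):
    """True if some full assignment has every matching distance <= v."""
    return any(
        all(DIST[o][p] <= v for o, p in zip(O, perm))
        for perm in permutations(P)
    )


print(is_feasible(6), is_feasible(10), is_feasible(1))  # True True False
```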


We have described the two concepts. We present Threshold-Adapt next.


Let S be the set of all pairwise distances between O and P.

In the example, S = {5, 3, 9, 10, 7, 4, 11, 6, 2}.

Observation: The optimal mmd value (i.e., 6) is in S.

Step 1: For each value v in S, determine whether v is feasible for our problem.
Step 2: Find the smallest value v that is feasible.
Step 3: Return the full assignment with its mmd value equal to v.


There are two remaining issues.

Issue 1: How to determine whether a value v is feasible. This can be done with a maximum-flow algorithm.

Issue 2: How to improve the efficiency of this algorithm. The search can be sped up by binary search.
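The overall scheme can be sketched as follows. This is a simplified illustration rather than the Threshold-Adapt implementation: it assumes unit demands and capacities, so the maximum-flow test reduces to bipartite matching by augmenting paths, and it assumes that the largest distance is feasible. The `DIST` table is an assumed reconstruction of the example, and the function names are hypothetical.

```python
# Assumed distance table reconstructed from the running example.
DIST = {
    "o1": {"p1": 5, "p2": 3, "p3": 9},
    "o2": {"p1": 10, "p2": 7, "p3": 4},
    "o3": {"p1": 11, "p2": 6, "p3": 2},
}


def feasible(v, dist):
    """Return a full assignment using only pairs with distance <= v, or None.

    Unit-capacity stand-in for the maximum-flow test (augmenting paths).
    """
    owner = {}  # hospital -> estate currently matched to it

    def try_assign(o, seen):
        for p, d in dist[o].items():
            if d > v or p in seen:
                continue
            seen.add(p)
            # Take p if it is free, or if its current estate can move away.
            if p not in owner or try_assign(owner[p], seen):
                owner[p] = o
                return True
        return False

    for o in dist:
        if not try_assign(o, set()):
            return None
    return {o: p for p, o in owner.items()}


def threshold_adapt(dist):
    """Binary-search the sorted distinct distances for the smallest feasible one."""
    s = sorted({d for row in dist.values() for d in row.values()})
    lo, hi, best = 0, len(s) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        assignment = feasible(s[mid], dist)
        if assignment is not None:
            best = (s[mid], assignment)
            hi = mid - 1          # feasible: look for a smaller value
        else:
            lo = mid + 1          # infeasible: a larger value is needed
    return best


value, assignment = threshold_adapt(DIST)
print(value, assignment)  # 6 {'o1': 'p1', 'o2': 'p3', 'o3': 'p2'}
```

Binary search is valid because feasibility is monotone: if v is feasible, every larger value is feasible too. Note that the sketch still materializes the whole distance table, which is the scalability issue raised for this approach.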


The time complexity of Threshold-Adapt is O(n^2) plus O(log n) times the time complexity of the maximum-flow algorithm.


The space complexity of Threshold-Adapt is O(n^2).

This algorithm is not scalable.

4. Algorithm – Swap-Chain

The time complexity of Swap-Chain is the sum of three terms: the time to build the index on P, the time complexity of Step 1, and R · I, where R is the number of extreme matches found in Swap-Chain and I is the time complexity of performing the re-matching operation for a given extreme match.

The time complexity of Step 1 is O(n log n) if the fair assignment is used.

R is typically a small number. In our experiments, R is equal to 500 on average when the dataset size is 1M.

I = O(n (log n + k)), where k << n.

6. Empirical Study

Non-Spatial Problem

[Figure: results on the real dataset.]
