On Optimal Worst-Case Matching
Cheng Long (Hong Kong University of Science and Technology)
Raymond Chi-Wing Wong (Hong Kong University of Science and Technology)
Philip S. Yu (University of Illinois at Chicago)
Minhao Jiang (Hong Kong University of Science and Technology)
Presented by Raymond Chi-Wing Wong
Prepared by Raymond Chi-Wing Wong
Outline
1. Introduction
2. Problem Definition
3. Related Work
4. Algorithm – Swap-Chain
5. Empirical Study
6. Conclusion
1. Introduction
P = {p1, p2, p3}: hospitals
O = {o1, o2, o3}: residential estates
Some existing studies consider the capacities of hospitals and the demands of customers.
Return an assignment between P and O such that the “condition” of the assignment is satisfied.
Different applications have different conditions.
Worst-case Optimized Condition: In the assignment, the maximum matching distance (mmd) between a residential estate o and a hospital p is minimized.
[Figure: worst-case optimized assignment on the example; matching distances 4, 5 and 6]
Worst-case Optimized Assignment: mmd = 6
Many applications need the worst-case optimized assignment.
1. Emergency applications (e.g., hospital allocation, fire stations and police stations)
In the Hong Kong ambulance service, the minimized maximum distance is 12 minutes (driving distance).
2. Logistics, Data Warehouse Allocation
3. Mail Delivery
4. Profile Matching
The worst-case dissatisfaction rate among customers is minimized.
[Wong et al., VLDB 2009] Fair Condition: In the assignment, each o ∈ O is allocated to the p ∈ P that (i) is as near to o as possible, and (ii) whose servicing capacity has not been exhausted in serving other closer estates.
[Figure: fair assignment on the example; matching distances 10, 3 and 2]
Fair Assignment: mmd = 10
Worst-case Optimized Assignment: mmd = 6
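One simple reading of the fair condition is to repeatedly match the closest remaining pair until demands or capacities run out. The sketch below follows that reading and is not taken from the cited paper. The distance table `DIST` is a hypothetical reconstruction chosen to agree with the distances shown in the figures, and all demands and capacities are assumed to be 1.

```python
# Hypothetical (estate, hospital) distances, chosen to agree with the slide figures.
DIST = {
    ("o1", "p1"): 5, ("o1", "p2"): 3, ("o1", "p3"): 9,
    ("o2", "p1"): 10, ("o2", "p2"): 7, ("o2", "p3"): 4,
    ("o3", "p1"): 11, ("o3", "p2"): 6, ("o3", "p3"): 2,
}

def fair_assignment(dist, demand, capacity):
    """Greedy closest-pair matching; returns (estate, hospital, amount) triples."""
    need = dict(demand)      # remaining demand per estate
    room = dict(capacity)    # remaining capacity per hospital
    matches = []
    for (o, p), _ in sorted(dist.items(), key=lambda item: item[1]):
        amount = min(need[o], room[p])
        if amount > 0:
            matches.append((o, p, amount))
            need[o] -= amount
            room[p] -= amount
    return matches

# Assumed unit demands and capacities.
DEMAND = {"o1": 1, "o2": 1, "o3": 1}
CAPACITY = {"p1": 1, "p2": 1, "p3": 1}

fair = fair_assignment(DIST, DEMAND, CAPACITY)
fair_mmd = max(DIST[o, p] for o, p, _ in fair)
```

On this table the greedy pass matches o3 with p3 (distance 2), then o1 with p2 (distance 3), and is left with o2 and p1 (distance 10), which is why the fair assignment has a large mmd.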
[U et al., VLDBJ 2010] Globally Optimized Condition: The total “cost” of the assignment (i.e., the sum of the matching distances) is minimized.
[Figure: globally optimized assignment on the example; matching distances 7, 5 and 2]
Fair Assignment: mmd = 10
Globally Optimized Assignment: mmd = 7
Worst-case Optimized Assignment: mmd = 6
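The globally optimized condition can be checked on the small example by brute force. The sketch below enumerates all one-to-one assignments and keeps the one with the smallest total distance. The table `DIST` is a hypothetical reconstruction consistent with the figure values, unit demands and capacities are assumed, and enumeration is only practical for tiny inputs.

```python
from itertools import permutations

# Hypothetical (estate, hospital) distances, chosen to agree with the slide figures.
DIST = {
    ("o1", "p1"): 5, ("o1", "p2"): 3, ("o1", "p3"): 9,
    ("o2", "p1"): 10, ("o2", "p2"): 7, ("o2", "p3"): 4,
    ("o3", "p1"): 11, ("o3", "p2"): 6, ("o3", "p3"): 2,
}
ESTATES = ["o1", "o2", "o3"]
HOSPITALS = ["p1", "p2", "p3"]

def globally_optimized(estates, hospitals, dist):
    """Return the one-to-one assignment with the smallest sum of distances."""
    best = min(
        permutations(hospitals),
        key=lambda perm: sum(dist[o, p] for o, p in zip(estates, perm)),
    )
    return dict(zip(estates, best))

best = globally_optimized(ESTATES, HOSPITALS, DIST)
total = sum(DIST[o, p] for o, p in best.items())
worst = max(DIST[o, p] for o, p in best.items())
```

Here the smallest total is 5 + 7 + 2 = 14, but the largest single distance in that assignment is 7, which is still above the worst-case optimum of 6.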
An assignment satisfying the globally optimized condition is said to be a globally optimized assignment.
Existing spatial assignment methods cannot solve the problem of finding the worst-case optimized assignment well.
2. Problem Definition
P = {p1, p2, p3}: hospitals
O = {o1, o2, o3}: residential estates
Each o ∈ O is associated with its demand o.w (which is a positive integer).
Each p ∈ P is associated with its capacity p.w (which is a positive integer).
Problem: to find an assignment between P and O such that the maximum matching distance (mmd) is minimized.
[Figure: worst-case optimized assignment on the example; matching distances 4, 5 and 6; mmd = 6]
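As a concrete reading of the problem statement, the sketch below computes the mmd of an assignment and finds the worst-case optimum by enumeration. It assumes every demand and capacity equals 1, and `DIST` is a hypothetical distance table consistent with the values in the figures. Enumeration is only meant to make the objective explicit, not to be efficient.

```python
from itertools import permutations

# Hypothetical (estate, hospital) distances, chosen to agree with the slide figures.
DIST = {
    ("o1", "p1"): 5, ("o1", "p2"): 3, ("o1", "p3"): 9,
    ("o2", "p1"): 10, ("o2", "p2"): 7, ("o2", "p3"): 4,
    ("o3", "p1"): 11, ("o3", "p2"): 6, ("o3", "p3"): 2,
}

def mmd(assignment, dist=DIST):
    """Maximum matching distance of an {estate: hospital} assignment."""
    return max(dist[o, p] for o, p in assignment.items())

def worst_case_optimal(estates, hospitals, dist=DIST):
    """Enumerate all one-to-one assignments and keep the one with the smallest mmd."""
    candidates = (dict(zip(estates, perm)) for perm in permutations(hospitals))
    best = min(candidates, key=lambda a: mmd(a, dist))
    return best, mmd(best, dist)

best, best_mmd = worst_case_optimal(["o1", "o2", "o3"], ["p1", "p2", "p3"])
```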
3. Related Work
Bottleneck Matching Problem (BMP)
Given two sets of objects, namely A and B, and a matching distance (cost) between each object in A and each object in B, BMP is to find a perfect matching (or assignment) between A and B which minimizes the maximum matching distance.
BMP considers that the demand of each object in A and the capacity of each object in B are both equal to 1.
Our problem considers that each demand and each capacity can be any positive integer.
Problem: to find an assignment between P and O such that the maximum matching distance (mmd) is minimized.
Threshold: the fastest algorithm for BMP
The algorithm has to materialize all pairwise distances. Thus, it is not very scalable.
Spatial Assignment Problem: Fair Assignment, Globally Optimized Assignment
As we described before, they do not address our problem well.
Major Contribution
We propose an efficient and scalable algorithm (called Swap-Chain) for this problem.
Swap-Chain is more efficient and scalable than the adapted algorithm for the bottleneck problem.
4. Algorithm – Swap-Chain
Swap-Chain involves the following 3 steps.
Step 1 (Initialization)
Step 2 (Assignment Adjustment)
Step 3 (Iterative Step)
Step 1 (Initialization): Find a full assignment A using a given condition (e.g., fair assignment, globally optimized assignment or random assignment).
[Figure: fair assignment on the example; matching distances 10, 3 and 2; mmd = 10]
Step 2 (Assignment Adjustment): Re-assign some matches in A to form another full assignment A’ such that the mmd value of A’ is smaller than that of A.
[Figure: before the adjustment, matching distances 10, 3 and 2 (mmd = 10); after the adjustment, matching distances 2, 5 and 7 (mmd = 7)]
Step 3 (Iterative Step): Repeat Step 2 until the assignment adjustment step can no longer be performed.
[Figure: a second adjustment changes the matching distances from 2, 5 and 7 to 6, 5 and 4; mmd goes from 10 to 7 to 6]
After this assignment adjustment step, we cannot adjust the assignment again so that the mmd value of the adjusted assignment is smaller. This is the final solution for our problem.
Step 1 is easy. How can we perform Step 2 (i.e., how do we re-assign some matches in A such that the mmd value of the assignment is decreased)?
Algorithm Swap-Chain makes use of extreme matches for re-adjusting the assignment in Step 2.
Given an assignment A, a match in A is called an extreme match if the matching distance of this match is equal to the mmd value of A.
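The definition of an extreme match translates directly into a small helper. The sketch below is illustrative only: `DIST` is a hypothetical distance table consistent with the figure values, and the assignment shown is the fair assignment from the example with unit demands and capacities.

```python
# Hypothetical (estate, hospital) distances, chosen to agree with the slide figures.
DIST = {
    ("o1", "p1"): 5, ("o1", "p2"): 3, ("o1", "p3"): 9,
    ("o2", "p1"): 10, ("o2", "p2"): 7, ("o2", "p3"): 4,
    ("o3", "p1"): 11, ("o3", "p2"): 6, ("o3", "p3"): 2,
}

def extreme_matches(assignment, dist=DIST):
    """Return (mmd, matches whose distance equals the mmd)."""
    worst = max(dist[o, p] for o, p in assignment.items())
    return worst, [(o, p) for o, p in assignment.items() if dist[o, p] == worst]

fair = {"o1": "p2", "o2": "p1", "o3": "p3"}
worst, extremes = extreme_matches(fair)
```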
Consider the assignment obtained just after Step 1.
[Figure: fair assignment with matching distances 10, 3 and 2 (mmd = 10); the match with distance 10 is an extreme match]
Step (a): Break the extreme match (o, p).
Step (b): Find a set of objects in O and P to be involved in the assignment adjustment.
List: o2, p2, o1, p1 (a chain from o2)
[Figure: the extreme match is broken; range queries find the objects on the chain, and the new matches have distances 7 and 5]
We repeat these sub-steps to reduce the mmd value of the assignment further.
[Figure: assignment with matching distances 7, 5 and 2; the match with distance 7 is an extreme match]
Step (a): Break the extreme match (o, p).
Step (b): Find a set of objects in O and P to be involved in the assignment adjustment.
List: p3, o3, p2 (a chain from o2)
[Figure: range queries find the objects on the chain, and the new matches have distances 4 and 6]
We cannot adjust the assignment any further to reduce its mmd value.
[Figure: final assignment with matching distances 5, 4 and 6; mmd goes from 10 to 7 to 6]
The final solution.
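The walk-through above can be imitated with a simplified sketch. It is not the authors' implementation: it assumes unit demands and capacities, replaces the indexed range query with a linear scan over `HOSPITALS`, and searches for the chain with a plain depth-first search. The table `DIST` and the helper names `find_chain` and `swap_chain` are hypothetical.

```python
# Hypothetical (estate, hospital) distances, chosen to agree with the slide figures.
DIST = {
    ("o1", "p1"): 5, ("o1", "p2"): 3, ("o1", "p3"): 9,
    ("o2", "p1"): 10, ("o2", "p2"): 7, ("o2", "p3"): 4,
    ("o3", "p1"): 11, ("o3", "p2"): 6, ("o3", "p3"): 2,
}
HOSPITALS = ["p1", "p2", "p3"]

def find_chain(o, limit, assignment, owner, visited):
    """Step (b): re-match o using only distances below limit.

    Returns the chain [o, p, o', p', ...] and applies it, or returns None
    and leaves the assignment untouched.
    """
    for p in HOSPITALS:  # stands in for a range query around o
        if DIST[o, p] >= limit or p in visited:
            continue
        visited.add(p)
        displaced = owner.get(p)
        tail = []
        if displaced is not None:
            tail = find_chain(displaced, limit, assignment, owner, visited)
            if tail is None:
                continue
        assignment[o] = p
        owner[p] = o
        return [o, p] + tail
    return None

def swap_chain(initial):
    """Steps 2 and 3: keep breaking an extreme match while a chain exists."""
    assignment = dict(initial)
    owner = {p: o for o, p in assignment.items()}
    chains = []
    while True:
        o, p = max(assignment.items(), key=lambda match: DIST[match])
        del owner[p]  # step (a): break the extreme match (o, p)
        chain = find_chain(o, DIST[o, p], assignment, owner, set())
        if chain is None:
            owner[p] = o  # no improvement possible: restore the match and stop
            return assignment, chains
        chains.append(chain)

final, chains = swap_chain({"o1": "p2", "o2": "p1", "o3": "p3"})
final_mmd = max(DIST[o, p] for o, p in final.items())
```

Starting from the fair assignment, this sketch performs two adjustments and stops at mmd = 6, matching the sequence 10, 7, 6 in the figures. With several extreme matches of equal distance, or with general capacities, the stopping rule here is a simplification.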
In the algorithm, we have to perform range queries on P.
We build an index on P.
The time complexity of building the index on P is O(n log n).
The time complexity of Swap-Chain is O(R · n · (log n + k)), where k << n.
R is the number of extreme matches found in Swap-Chain.
R is typically a small number. In our experiments, R is equal to 500 on average when the dataset size is 1M.
The space occupied by Swap-Chain mainly comes from the index on P (which is O(n log n)).
Swap-Chain can be extended to handle the non-spatial problem.
Due to the time limit, we do not discuss the details here.
5. Empirical Study
Synthetic Dataset
P and O: uniform distribution
Real Dataset
4 datasets in Canada: AB (Alberta), BC (British Columbia), ON (Ontario), QC (Quebec)
For each dataset, O is a set of populated areas and P is a set of fire stations.
Measurements: Execution Time, Memory
Our proposed algorithm: Swap-Chain
Two Sets of Experiments
1. Comparison with Existing Spatial Assignment
2. Comparison with an adapted algorithm for the bottleneck problem (Threshold-Adapt (TA))
First Set: Comparison with Existing Spatial Assignment
[Figure: results on the real dataset and the synthetic dataset]
Second Set: Comparison with the Adapted Algorithm Threshold-Adapt (TA)
[Figure: results on the real dataset]
[Figure: results on the synthetic dataset (axes in thousands)]
6. Conclusion
Problem: finding the worst-case optimized assignment
Algorithm: Swap-Chain, which is efficient and scalable
Experiments
1. Introduction
Bichromatic Reverse Nearest Neighbor (BRNN or RNN)
Given: P and O are two sets of objects in the same data space.
Problem: Given an object p ∈ P, a BRNN query finds all the objects o ∈ O whose nearest neighbor (NN) in P is p.
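A BRNN query can be illustrated with a brute-force scan. The coordinates below are hypothetical and are not taken from the slides; they are chosen only so that the resulting RNN sets are {o2, o3}, {o1} and {}. A real system would answer the nearest-neighbor step with a spatial index rather than a linear scan.

```python
import math

# Hypothetical coordinates for hospitals (P) and residential estates (O).
P = {"p1": (0, 0), "p2": (10, 0), "p3": (5, 8)}
O = {"o1": (9, 1), "o2": (5, 6), "o3": (6, 9)}

def nearest_in_p(point, hospitals=P):
    """Name of the hospital closest to the given point."""
    return min(hospitals, key=lambda name: math.dist(hospitals[name], point))

def brnn(p, hospitals=P, estates=O):
    """All estates whose nearest neighbor in P is p."""
    return [o for o, xy in estates.items() if nearest_in_p(xy, hospitals) == p]

rnn_sets = {p: brnn(p) for p in P}
```

Note that nothing in the query limits how many estates fall into one RNN set, which is why capacities and demands are not reflected in it.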
NN: Nearest neighbor
RNN: Reverse nearest neighbor
P = {p1, p2, p3}: hospitals
O = {o1, o2, o3}: residential estates
[Figure: NN/RNN example; the estates' nearest neighbors in P are p3, p3 and p2, and the hospitals' RNN sets are {o2, o3}, {o1} and {}]
Capacities of hospitals are not considered.
Demands of customers are not considered.
There is a serving capacity of p3.
3. Related Work
Bottleneck Matching Problem (BMP)
Given two sets of objects, namely A and B, and a matching distance (cost) between each object in A and each object in B, BMP is to find a perfect matching (or assignment) between A and B which minimizes the maximum matching distance.
BMP considers that the demand of each object in A and the capacity of each object in B are both equal to 1.
Our problem considers that each demand and each capacity can be any positive integer.
Problem: to find an assignment between P and O such that the maximum matching distance (mmd) is minimized.
One may come up with a straightforward solution to our problem as follows.
For each object p in P (in our problem), we duplicate p p.w times.
For each object o in O (in our problem), we duplicate o o.w times.
Then, we use an existing algorithm for BMP to solve our problem.
However, this approach is cumbersome and undesirable (especially when the capacities/demands are very large).
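A small sketch shows why the duplication approach becomes cumbersome. The helper `expand` and the weights below are hypothetical. Each object is copied once per unit of weight, so the number of pairwise distances that a BMP algorithm would have to handle grows with the product of the total demand and the total capacity.

```python
def expand(weights):
    """Duplicate each object name once per unit of its weight."""
    return [f"{name}#{i}" for name, w in weights.items() for i in range(w)]

# Assumed example weights: three estates and three hospitals, each with weight 1000.
demand = {"o1": 1000, "o2": 1000, "o3": 1000}
capacity = {"p1": 1000, "p2": 1000, "p3": 1000}

copies_o = expand(demand)
copies_p = expand(capacity)
pairs_to_materialize = len(copies_o) * len(copies_p)
```

With only 3 objects on each side, the expanded instance already has 3,000 copies per side and 9,000,000 pairwise distances.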
Outline
1. Introduction
2. Problem Definition
3. Related Work
4. Algorithm – Threshold-Adapt
5. Algorithm – Swap-Chain
6. Empirical Study
7. Conclusion
4. Algorithm – Threshold-Adapt
Threshold-Adapt is an algorithm which searches for the “best” solution in the solution space.
Threshold-Adapt (for demands/capacities equal to any positive integer) shares the same skeleton as Threshold (for demands/capacities equal to 1).
However, this algorithm is not scalable.
Before we introduce this algorithm, we give two concepts.
Concept 1: Full Assignment
Concept 2: Feasibility
Concept 1: Full Assignment
Suppose that the total demand from O is at most the total capacity of P.
An assignment A between O and P is said to be full if each object in O is matched with an object in P.
[Figure: a full assignment (matching distances 10, 3 and 2) and a non-full assignment (matching distances 10 and 3 only)]
We want to find the full assignment with the smallest mmd value.
[Figure: a full assignment with matching distances 10, 3 and 2 (mmd = 10), and a full assignment with matching distances 4, 5 and 6 (mmd = 6)]
We choose the full assignment with mmd = 6 since it has the smallest mmd value.
Concept 2: Feasibility
A value is said to be feasible for our problem if there exists a full assignment such that its mmd value is at most this value.
Consider the value 6. There exists a full assignment (matching distances 4, 5 and 6) whose mmd value is at most 6, so 6 is feasible.
Consider the value 10. There exists a full assignment whose mmd value is at most 10, so 10 is feasible. There can be more than one full assignment whose mmd value is at most 10 (e.g., the one with matching distances 4, 5 and 6 and the one with matching distances 10, 3 and 2).
Consider the value 1. There does not exist a full assignment whose mmd value is at most 1, so 1 is not feasible.
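The feasibility concept can be checked directly on the small example. The sketch below tests a value by enumerating all one-to-one assignments, which is only workable for tiny inputs. It assumes unit demands and capacities, and `DIST` is a hypothetical distance table consistent with the figure values.

```python
from itertools import permutations

# Hypothetical (estate, hospital) distances, chosen to agree with the slide figures.
DIST = {
    ("o1", "p1"): 5, ("o1", "p2"): 3, ("o1", "p3"): 9,
    ("o2", "p1"): 10, ("o2", "p2"): 7, ("o2", "p3"): 4,
    ("o3", "p1"): 11, ("o3", "p2"): 6, ("o3", "p3"): 2,
}
ESTATES = ["o1", "o2", "o3"]
HOSPITALS = ["p1", "p2", "p3"]

def is_feasible(value, estates=ESTATES, hospitals=HOSPITALS, dist=DIST):
    """True if some full assignment has every matching distance <= value."""
    return any(
        all(dist[o, p] <= value for o, p in zip(estates, perm))
        for perm in permutations(hospitals)
    )

results = {v: is_feasible(v) for v in (1, 5, 6, 10)}
```

Feasibility is monotone: once a value is feasible, every larger value is feasible too, which is the property a binary search over candidate values relies on.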
We have described the two concepts (Concept 1: Full Assignment; Concept 2: Feasibility).
We present Threshold-Adapt next.
Let S be the set of all pairwise distances between O and P.
S = {5, 3, 9, 10, 7, 4, 11, 6, 2}
[Figure: the example with all nine pairwise distances shown]
Observation: The optimal mmd value (i.e., 6) is in S.
Step 1: For each value v in S, determine whether v is feasible for our problem.
Step 2: Find the smallest value v which is feasible.
Step 3: Return the full assignment with its mmd value equal to v.
There are two remaining issues.
Issue 1: How to determine whether a value v is feasible. This can be done by a maximum-flow algorithm.
Issue 2: How to improve the efficiency of this algorithm. This can be sped up by binary search.
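The steps and the two issues above can be put together in a compact sketch. It is an illustration under assumptions rather than the authors' code: feasibility is tested with a plain Edmonds-Karp maximum flow, the candidate values are searched by binary search, and it returns only the optimal mmd value, not the assignment itself. `DIST` is a hypothetical distance table consistent with the figure values, and the function names are made up for this example.

```python
from collections import deque

# Hypothetical (estate, hospital) distances, chosen to agree with the slide figures.
DIST = {
    ("o1", "p1"): 5, ("o1", "p2"): 3, ("o1", "p3"): 9,
    ("o2", "p1"): 10, ("o2", "p2"): 7, ("o2", "p3"): 4,
    ("o3", "p1"): 11, ("o3", "p2"): 6, ("o3", "p3"): 2,
}

def max_flow(cap, s, t):
    """Edmonds-Karp on a dict-of-dicts of residual capacities (modified in place)."""
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck

def is_feasible(value, dist, demand, capacity):
    """Issue 1: value is feasible if all demand can flow along pairs with dist <= value."""
    cap = {"s": {}, "t": {}}
    for o, w in demand.items():
        cap["s"][("o", o)] = w
        cap[("o", o)] = {"s": 0}
    for p, w in capacity.items():
        cap[("p", p)] = {"t": w}
        cap["t"][("p", p)] = 0
    for (o, p), d in dist.items():
        if d <= value:
            cap[("o", o)][("p", p)] = demand[o]
            cap[("p", p)][("o", o)] = 0
    return max_flow(cap, "s", "t") == sum(demand.values())

def threshold_adapt(dist, demand, capacity):
    """Issue 2: binary search for the smallest feasible value in S."""
    values = sorted(set(dist.values()))  # S: all pairwise distances, materialized
    lo, hi = 0, len(values) - 1
    if not is_feasible(values[hi], dist, demand, capacity):
        return None  # no full assignment exists at all
    while lo < hi:
        mid = (lo + hi) // 2
        if is_feasible(values[mid], dist, demand, capacity):
            hi = mid
        else:
            lo = mid + 1
    return values[lo]

unit = threshold_adapt(DIST, {"o1": 1, "o2": 1, "o3": 1}, {"p1": 1, "p2": 1, "p3": 1})
roomy = threshold_adapt(DIST, {"o1": 1, "o2": 1, "o3": 1}, {"p1": 1, "p2": 1, "p3": 2})
```

The sketch makes the scalability concern visible: `values` is built from every pairwise distance, so both time and space start at quadratic in the number of objects before any flow computation runs.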
The time complexity of Threshold-Adapt is O(n^2 + (time complexity of the maximum-flow algorithm) · log n).
The space complexity of Threshold-Adapt is O(n^2).
This algorithm is not scalable.
4. Algorithm – Swap-Chain
The time complexity of Swap-Chain is O((time complexity of Step 1) + (time complexity of building the index on P) + R · I), where
R is the number of extreme matches found in Swap-Chain, and
I is the time complexity of performing the re-matching operation for a given extreme match.
The time complexity of Step 1 is O(n log n) if the fair assignment is used.
R is typically a small number. In our experiments, R is equal to 500 on average when the dataset size is 1M.
I = O(n (log n + k)), where k << n.