8/8/2019 Mining the Matrix
http://slidepdf.com/reader/full/mining-the-matrix 1/22
“Mining the Matrix” – Applying
Data Mining Concepts to Large
Scale MIPs
J.L. Higle
May, 2004
Groningen WSIP
Outline
• “VanTran” paratransit driver schedules (a problem formulation)
• Problem Characteristics (or why I gave up on that idea…)
• Observations based on SSD computations (relationship to some data mining techniques)
• An initial foray…
• Computational results so far
• What’s next…
VanTran – Paratransit service provider in Tucson
• ± 80 vans, each holding up to 3 wheelchairs and 10 ambulatory riders
• Reservation-based ride service
• Changes in federal regulation impose stricter service requirements
• Driver schedules (“routes”) are set well in advance of the day of service, and remain constant over lengthy periods of time.
• Demand changes each day, so vehicle movements vary.
Question: Given that service requirements are changing, how should the driver schedules be adapted to accommodate this change?
My response… “This sounds like a recourse problem…”
  first stage ~ “schedule drivers”
  second stage ~ “assign customers to drivers”
One formulation (or something like that…):
Variables:
  s_rt  = 1 if driver r “starts” in period t; 0 otherwise
  e_rt  = 1 if driver r “ends” in period t; 0 otherwise
  D_r^- = 1 if r drives < 2 hrs; 0 otherwise
  D_r^+ = 1 if r drives > 10 hrs; 0 otherwise
  x_ir  = 1 if customer i assigned to r; 0 otherwise
  u_i   = 1 if customer i not assigned; 0 otherwise
  z_ij  = 1 if i and j assigned together; 0 otherwise
Min  Σ_r Σ_t t(e_rt − s_rt) + p^- Σ_r D_r^- + p^+ Σ_r D_r^+ + M Σ_i u_i + Σ_(i,j) d_ij z_ij
s.t.
  Σ_t t·e_rt − Σ_t t·s_rt ≥ 2 − M·D_r^-      ∀ r       (driver’s schedule is too short)
  Σ_t t·e_rt − Σ_t t·s_rt ≤ 10 + M·D_r^+     ∀ r       (driver’s on the road too long)
  s_rt + e_rt ≤ 1                            ∀ r, t    (don’t start and end at the same time)
  Σ_r Σ_{q ≤ t} (s_rq − e_rq) ≥ 1            ∀ t       (≥ 1 driver on the road at all times)
  Σ_r x_ir + u_i = 1                         ∀ i       (account for each customer)
  z_ij ≥ x_ir + x_jr − 1                     ∀ i, j, r (i, j assigned to same driver?)
  Σ_{i ∈ I_t} x_ir ≤ n_c                     ∀ r, t    (seat capacity)
  Σ_t t·s_rt ≤ t_i^pick + M(1 − x_ir)        ∀ i, r    (don’t assign i to r if r starts too late)
  Σ_t t·e_rt ≥ t_i^drop − M(1 − x_ir)        ∀ i, r    (don’t assign i to r if r ends too early)
Plus a few extra “tie-breaking” constraints to tighten the relaxation
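As a rough illustration only (not the model code from the talk), the two simplest constraint families above — “account for each customer” and “seat capacity” — can be checked on a toy assignment; the time-indexed start/end constraints are deliberately omitted here, so this is a simplified sketch, not the full model:

```python
import numpy as np

def check_assignment(x, u, seats):
    """Check two constraint families from the VanTran model on toy data:
    x[i, r] = 1 if customer i is assigned to driver r,
    u[i]    = 1 if customer i is left unassigned.
    (Simplified: seat capacity is checked without the time index I_t.)"""
    # "account for each customer": sum_r x_ir + u_i == 1 for every i
    covered = np.all(x.sum(axis=1) + u == 1)
    # "seat capacity": no driver carries more customers than seats
    within_capacity = np.all(x.sum(axis=0) <= seats)
    return bool(covered and within_capacity)

# 3 customers, 2 drivers, 2 seats each; customer 2 goes unserved
x = np.array([[1, 0], [0, 1], [0, 0]])
u = np.array([0, 0, 1])
print(check_assignment(x, u, seats=2))  # -> True
```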
Problem Characteristics (or why I gave up on that idea…)
• Hmm… how about 4 drivers, 20 customers, and only one demand scenario…
• This results in a MIP – 605 binaries, 210 continuous, 1546 constraints…
• I let CPLEX work on it all night… after 5,000,000 B&B nodes I managed to get the gap down to 7.7% (ouch)
• VanTran’s got 80 vehicles, ± 60 drivers, thousands of passengers, random daily demand…
I abandoned that approach… This is a large (binary) MIP – lots of rows, lots of columns – that’s probably going to be solved lots of times…
SSD – Stochastic Scenario Decomposition
(Higle, Rayco, Sen ’01)
• Solution method for multistage SLP
• Uses scenario decomposition + piecewise linear concave approximation of the dual objective function
• Uses randomly generated observations for successive improvement of the approximation
  – adaptive sampling (aka “internal”) as opposed to nonadaptive sampling (aka “external”)
SSD: Salient Features
• New observations – increase master program column dimension
• New cuts – increase master program row dimension
• Growth in the column dimension is the more problematic of the two…
To solve the master program we introduced some column and row aggregation to reduce the size, as follows:
• aggregate most of the cuts (except for two… “new”, “incumbent”)
• represent each column with a 4-tuple:
  – current value
  – coefficient in “new” cut
  – coefficient in “incumbent” cut
  – coefficient in “aggregated” cut
• columns with similar 4-tuples are aggregated
Note: aggregation ignores the scenario and stage associated with a column… it looks at the “data” and considers only similarities in the data.
Note also: it worked surprisingly well on a variety of problems.
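A minimal sketch of the idea: group columns whose 4-tuples agree after rounding. The rounding rule used here is an assumption for illustration, not the actual similarity criterion used in SSD:

```python
from collections import defaultdict

def aggregate_by_tuple(columns, digits=2):
    """Group master-program columns whose 4-tuples (current value,
    "new"-cut coefficient, "incumbent"-cut coefficient, aggregated-cut
    coefficient) agree after rounding. A crude stand-in for the
    similarity test used in SSD (the rounding rule is hypothetical)."""
    groups = defaultdict(list)
    for name, tup in columns.items():
        key = tuple(round(v, digits) for v in tup)
        groups[key].append(name)
    return list(groups.values())

cols = {
    "x1": (0.50, 1.0, 0.0, 2.0),
    "x2": (0.501, 1.0, 0.0, 2.0),   # nearly identical to x1 -> merged
    "x3": (0.90, 0.0, 1.0, 0.5),
}
print(aggregate_by_tuple(cols))  # -> [['x1', 'x2'], ['x3']]
```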
The aggregation scheme used by SSD can be viewed as a form of “data mining” of the master program matrix.
“Data Mining” is a catch-all phrase – a collection of techniques used to draw some meaning from a large data set and/or reduce its storage requirements.
What’s the connection…
Large problems → Big Matrices → lots of “data”
“VanTran” takes a long time to solve because it’s hard to choose from “similar” solutions, and lots of “ties” have to be broken.
Perhaps columns can be “clustered” so that tie-breaking can be postponed until later...
Perhaps the “information” contained in the constraints can be represented in a compressed form…
Variable Clustering
x_b, x_c are the binary and continuous variables; the constraints are
  A_b x_b + A_c x_c ≥ r, where A_b = [a_1, …, a_n], a_i ∈ R^m.
For each pair of columns i and j, calculate d_ij = D(a_i, a_j) for some “distance” D.
If {a_i : i ∈ I} are all “close” to each other, replace the columns by a “clustered” variable X_I (general integer) so that the constraints become:
  Σ_I a_I X_I + A_c x_c ≥ r,
where a_I = (1/|I|) Σ_{i ∈ I} a_i and X_I ≤ u_I ≡ |I| (obj. coeffs. similarly defined).
The resulting MIP has fewer variables, and is likely to spend less time trying to break ties.
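Building the clustered column a_I and bound u_I from a set of original columns can be sketched in a few lines (illustrative code, assuming the clusters have already been chosen):

```python
import numpy as np

def cluster_columns(A, clusters):
    """Build the clustered constraint matrix: each cluster I of column
    indices is replaced by one general-integer variable X_I with
    column a_I = (1/|I|) * sum_{i in I} a_i and bound u_I = |I|."""
    A_clustered = np.column_stack(
        [A[:, idx].mean(axis=1) for idx in clusters])
    bounds = [len(idx) for idx in clusters]
    return A_clustered, bounds

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 2.0]])
A_c, u = cluster_columns(A, clusters=[[0, 1], [2]])
print(A_c)   # columns: average of cols 0-1, then col 2
print(u)     # -> [2, 1]
```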
A solution to the clustered MIP can be converted to a solution to the original MIP (“parsed”) as follows:
  X̂_I = 0    ⇒  x_i = 0 for all i ∈ I
  X̂_I = u_I  ⇒  x_i = 1 for all i ∈ I
If 0 < X̂_I < u_I, the “cluster” must be undone, which can be accomplished with a simple MIP… during which x_c can be assigned as well.
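The parsing rule is simple enough to state as code (a sketch of the rule above; the ambiguous case is only flagged here, not resolved by the small disaggregation MIP):

```python
def parse_cluster(X_I, u_I):
    """Translate the clustered value X_I back to the original binaries
    (the "parsing" step). Returns a list of 0/1 values, or None when
    0 < X_I < u_I, in which case the cluster must be undone by a
    small MIP."""
    if X_I == 0:
        return [0] * u_I        # every x_i in the cluster is 0
    if X_I == u_I:
        return [1] * u_I        # every x_i in the cluster is 1
    return None                 # ambiguous: disaggregate via a small MIP

print(parse_cluster(0, 3))  # -> [0, 0, 0]
print(parse_cluster(3, 3))  # -> [1, 1, 1]
print(parse_cluster(1, 3))  # -> None
```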
Two questions –
How should “distance” be defined?
How should variables be “linked” to form clusters?
• Possible “Distance” definitions…
  • cityblock:   d_ij = Σ_k |a_ki − a_kj|
  • euclidean:   d_ij = ( Σ_k (a_ki − a_kj)² )^(1/2)
• Less standard:
  • indicator:   d_ij = Σ_k |u(a_ki) − u(a_kj)|, where u(a) = 1 if a > 0; 0 otherwise
  • correlation: d_ij = 1 − ρ(a_i, a_j)
  • hamming:     percent of coordinates that differ: d_ij = (1/n) Σ_k |a_ki − a_kj| (0/1 data)
  • jaccard:     the hamming distance restricted to coordinates where at least one entry is nonzero: d_ij = (1/n) Σ_{k ∈ nonzeroes} |a_ki − a_kj|
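Several of these distances are available directly in SciPy’s `pdist` (a rough illustration on toy columns, not the code used in the talk; the jaccard/hamming metrics expect boolean-like data, so the nonzero pattern is thresholded first):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Columns of the constraint matrix, one ROW per column (pdist convention).
cols = np.array([[1.0, 0.0, 2.0],
                 [1.0, 0.0, 0.0],
                 [0.0, 3.0, 2.0]])

# 'cityblock', 'euclidean', 'hamming', 'jaccard', 'correlation' match
# the standard definitions listed above.
for metric in ("cityblock", "euclidean"):
    print(metric, squareform(pdist(cols, metric)).round(3))

nz = cols != 0                      # indicator of the nonzero pattern
print("hamming", squareform(pdist(nz, "hamming")).round(3))
print("jaccard", squareform(pdist(nz, "jaccard")).round(3))
```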
Possible “linking” definitions:
“Linking” creates a hierarchical tree that indicates the order in which clusters are aggregated. Some possible methods for aggregating two sub-clusters:
• single: min. smallest distance between elements of both sets
• complete: min. largest distance between elements of both sets
• average: min. average distance between objects in the two sets
• ward: minimize inner squared distance
JH Confession: I didn’t want to code this, so I just used Matlab… these are some of the standard linkages that Matlab provides…
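The Matlab workflow has a direct SciPy analogue — distance, linkage, then a tree cutoff. A hedged sketch on toy data (not the VanTran matrix), showing both ways the tree can be cut, by cluster count or by distance cutoff:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Toy stand-in for constraint-matrix columns (one row per column).
cols = np.array([[0.0, 0.0], [0.1, 0.0],   # two nearly identical columns
                 [5.0, 5.0], [5.1, 5.0]])  # two more, far from the first pair

d = pdist(cols, "cityblock")
# SciPy offers the same standard linkages as Matlab:
# 'single', 'complete', 'average', 'ward'.
Z = linkage(d, method="average")

# Cut the tree either at a fixed number of clusters ...
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)            # the two close pairs end up in separate clusters
# ... or at a distance cutoff, analogous to the 0.50 / 0.75 runs later on.
labels_cut = fcluster(Z, t=0.5, criterion="distance")
print(labels_cut)
```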
My initial foray…
MIP Solver: CPLEX 8.1 (BTTOL 0.1, EMPHASIS feasibility)
6 Distances × 4 Linkages × 3 Cluster Sizes = 72 runs
VanTran: 605 Binary, 210 continuous variables … 1546 constraints
Best known objective value = 605
CPLEX required 1,490,789 nodes to identify it
Gap after 5,000,000 nodes (37,481,441 iterations) is 7.7%
Some especially good combinations:

Dist         Link      ClusterSize  ObjVal  Nodes   Iters
Jaccard      Ward      300          607     1,240   8,881
Indicator    Average   300          609     940     8,543
CityBlock    Average   0.50         616.5   10,000  61,359
Correlation  Complete  0.50         616.5   10,000  65,473
Correlation  Single    0.50         616.5   10,000  108,396
Euclidean    Average   300          617     5,264   62,120
Euclidean    Single    0.50         618.5   10,000  73,302
Correlation  Average   0.50         622.5   10,000  66,865
CityBlock    Single    300          626     359     5,814
CityBlock    Single    0.75         626     359     5,814

Not quite so good…
Euclidean    Single    300          815     350     4,230
Euclidean    Average   50           902     4,282   50,514

The rest… average (std dev): 1951.3 (260.5), 217.4 (1283.4), 1617.8 (6708.2)
(Note: 28 runs did not return a feasible solution)
(Note: Node limit = 10,000. Relative error gap tolerance = 0.03)
Some significant correlations:

measure                  measure               correlation
Parsed Objective Value   # of clusters         -0.369
                         MIP solution node id  -0.604
                         MIP iteration count   -0.706
MIP1 solution node id    # of clusters          0.523
[Interaction plot (data means) for Parsed Objective Value, by Distance (Euc, Ham, Ind, Jac, Abs, Cor), Linkage (Single, Ward, Average, Complete), and CutOff (50, 75, 300).]
What’s next…
• An obvious next step
  – use the parsed solution as an initial solution for the original MIP
  – slight problem: lower bounds are still weak, so this doesn’t help much
• An obvious question – can we “mine” for improved lower bounds?
  (maybe, but not ready for prime time…)
• Does this generalize beyond VanTran?
Experimentation with some MIPLIB problems:
Problems and Characteristics

Problem    Binaries  Continuous  Row Types            Notes
10teams    1800      225         110 G, 40 P, 80 S    correlations n/a
danoint    56        465         256 G, 392 U
egout      55        86          43 G, 55 U
fiber      1254      44          44 G, 378 U
khb05250   24        1326        77 G, 24 U
misc06     112       1696        820 G
mod011     96        10862       4400 G, 16 S, 64 U
rentacar   55        9502        6674 G, 55 U         correlations n/a
rgn        100       80          20 G, 4 P

G: General  P: Packing  S: Special Ordered Set  U, L: Upper, Lower Bound
Problem    Combinations  # within 1.5% of best known solution  # within 0.01% of best known solution  # w/o initial solution
10teams    60            60
danoint    72            12                                    12
egout      72            72                                    54
fiber      72            ***                                   69
khb05250   72            68                                    49
misc06     72            72                                    0
mod011     72            72                                    72
rentacar   60            60
rgn        72            49                                    49

*** when the “parsed” solution was used to initialize B&B, the optimal solution was identified within 285 nodes, 4,300 iterations
Combination Summary: # within 1.5% of best known solution

             Average  Complete  Single  Ward
CityBlock    15       15        15      15
Correlation  13       15        13      13
Euclidean    17       15        15      15
Hamming      15       14        13      15
Indicator    15       15        13      16
Jaccard      14       14        13      13

Each Distance/Linkage combination has 3 cluster sizes × 9 problems = 27 combinations.
(correlation: 7 problems, so 21 combinations)
Conclusions?
• There’s still a lot of work ~ data analysis/interpretation to do
• It appears that there may be some problem classes where this type of approach is beneficial
• The “parsed” solution isn’t necessarily feasible, but a complete recourse-type formulation should eliminate that problem
• Lower bounds… might be achievable through a row aggregation scheme