“Mining the Matrix” – Applying Data Mining Concepts to Large Scale MIPs

J.L. Higle
May, 2004
Groningen WSIP



Outline

• “VanTran” paratransit driver schedules (a problem formulation)

• Problem characteristics (or why I gave up on that idea…)

• Observations based on SSD computations (relationship to some data mining techniques)

• An initial foray…

• Computational results so far

• What’s next…


VanTran – Paratransit service provider in Tucson 

• ±80 vans, each holding up to 3 wheelchairs and 10 ambulatory riders

• Reservation-based ride service

• Changes in federal regulation impose stricter service requirements

• Driver schedules (“routes”) are set well in advance of the day of service, and remain constant over lengthy periods of time.

• Demand changes each day, so vehicle movements vary.

Question: Given that service requirements are changing, how should the driver schedules be adapted to accommodate this change?

My response… “This sounds like a recourse problem”:
  first stage ~ schedule drivers
  second stage ~ assign customers to drivers


One formulation

(or something like that…)

Variables:
  s_rt = 1 if driver r “starts” in period t; 0 otherwise
  e_rt = 1 if driver r “ends” in period t; 0 otherwise
  D²_r = 1 if r drives < 2 hrs; 0 otherwise
  D¹⁰_r = 1 if r drives > 10 hrs; 0 otherwise
  x_ir = 1 if customer i assigned to r; 0 otherwise
  u_i = 1 if customer i not assigned; 0 otherwise
  z_ij = 1 if i and j assigned together; 0 otherwise


Min  (weighted penalties: schedules that are too short, drivers on the road too long, unassigned customers, and customer-pairing costs)

s.t.
  Σ_t t·(e_rt − s_rt) ≥ 2 − M·D²_r, ∀r            driver’s schedule is too short
  Σ_t t·(e_rt − s_rt) ≤ 10 + M·D¹⁰_r, ∀r          driver’s on the road too long
  s_rt + e_rt ≤ 1, ∀r,t                            don’t start and end at the same time
  Σ_r Σ_{q≤t} (s_rq − e_rq) ≥ 1, ∀t                ≥ 1 driver on the road at all times
  Σ_r x_ir + u_i = 1, ∀i                           account for each customer
  z_ij ≥ x_ir + x_jr − 1, ∀i,j,r                   i, j assigned to same driver?
  Σ_{i∈I(t)} x_ir ≤ c_r, ∀r,t                      seat capacity
  Σ_t t·s_rt ≤ t_i^pick + M(1 − x_ir), ∀i,r        don’t assign i to r if r starts too late
  Σ_t t·e_rt ≥ t_i^drop − M(1 − x_ir), ∀i,r        don’t assign i to r if r ends too early

Plus a few extra “tie-breaking” constraints to tighten the relaxation


Problem Characteristics (or why I gave up on that idea…)

• Hmm… how about 4 drivers, 20 customers, and only one demand scenario…

• This results in a MIP: 605 binaries, 210 continuous, 1546 constraints…

• I let CPLEX work on it all night… after 5,000,000 B&B nodes I managed to get the gap down to 7.7% (ouch)

• VanTran’s got 80 vehicles, ±60 drivers, thousands of passengers, random daily demand…

I abandoned that approach… This is a large (binary) MIP, with lots of rows and lots of columns, that’s probably going to be solved lots of times…


SSD: Stochastic Scenario Decomposition
(Higle, Rayco, Sen ’01)

• Solution method for multistage SLPs

• Uses scenario decomposition + piecewise linear concave approximation of the dual objective function

• Uses randomly generated observations for successive improvement of the approximation
  – adaptive sampling (aka “internal”) as opposed to nonadaptive sampling (aka “external”)


SSD: Salient Features

• New observations increase the master program’s column dimension
• New cuts increase the master program’s row dimension
• Growth in the column dimension is the more problematic…

To solve the master program we introduced some column and row aggregation to reduce the size, as follows:

• aggregate most of the cuts (except for two… “new”, “incumbent”)
• represent each column with a 4-tuple:
  – current value
  – coefficient in “new” cut
  – coefficient in “incumbent” cut
  – coefficient in “aggregated” cut
• columns with similar 4-tuples are aggregated

Note: aggregation ignores the scenario and stage associated with a column… it looks at the “data” and considers only similarities in the data.

Note also: it worked surprisingly well on a variety of problems.
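The grouping step above can be sketched in a few lines. This is a minimal illustration rather than the SSD implementation; the rounding tolerance and the exact grouping rule are assumptions:

```python
from collections import defaultdict

def aggregate_columns(tuples, tol=0.1):
    """Group columns whose 4-tuples (current value, coefficient in the
    "new" cut, in the "incumbent" cut, in the "aggregated" cut) agree
    after rounding to a tolerance; each group becomes one column."""
    groups = defaultdict(list)
    for idx, t in enumerate(tuples):
        key = tuple(round(v / tol) for v in t)
        groups[key].append(idx)
    return list(groups.values())

cols = [(1.0, 0.5, 0.2, 0.0),
        (1.02, 0.51, 0.19, 0.0),   # nearly identical to the first column
        (3.0, 0.0, 1.0, 0.7)]
print(aggregate_columns(cols))     # → [[0, 1], [2]]
```

Coarsening `tol` merges more columns and shrinks the master program further, at the price of a rougher approximation.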


The aggregation scheme used by SSD can be viewed as a form of “data mining” of the master program matrix.

“Data Mining” is a catch-all phrase: a collection of techniques used to draw some meaning from a large data set and/or reduce its storage requirements.

What’s the connection…

Large problems → big matrices → lots of “data”

“VanTran” takes a long time to solve because it’s hard to choose from “similar” solutions, and lots of “ties” have to be broken.

Perhaps columns can be “clustered” so that tie-breaking can be postponed until later...

Perhaps the “information” contained in the constraints can be represented in a compressed form…


Variable Clustering

x_b, x_c are the binary and continuous variables; A_b x_b + A_c x_c ≥ r are the constraints, i.e., A_b = [a_1, …, a_{n_b}] where a_i ∈ R^m.

For each pair of columns i and j, calculate d_ij = D(a_i, a_j) for some “distance” D.

If {a_i : i ∈ I} are all “close” to each other, replace the columns by a “clustered” variable X_I (general integer), so that the constraints become

  Σ_I ā_I X_I + A_c x_c ≥ r,

where ā_I = (1/|I|) Σ_{i∈I} a_i and X_I ≤ u_I |I| (obj. coeffs. similarly defined).

The resulting MIP has fewer variables, and is likely to spend less time trying to break ties.
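As a concrete sketch of the substitution (plain Python; u_I = 1 is assumed for binary columns, and the function name is mine, not the talk’s):

```python
def cluster_columns(A, clusters, u=1):
    """Replace each cluster I of columns of A (a row-major list of
    lists) by its average column a_bar_I; the clustered general-integer
    variable X_I gets the upper bound |I| * u_I (u_I = u assumed)."""
    A_bar = [[sum(row[i] for i in I) / len(I) for I in clusters]
             for row in A]
    ub = [len(I) * u for I in clusters]
    return A_bar, ub

A = [[1.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 1.0, 1.0]]
A_bar, ub = cluster_columns(A, [[0, 1], [2]])   # columns 0 and 1 are identical
print(A_bar)   # → [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(ub)      # → [2, 1]
```

When the clustered columns are exactly equal, as here, the substitution is lossless; for merely “close” columns the averaged matrix is an approximation of the original constraints.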


A solution to the clustered MIP can be converted to a solution to the original MIP (“parsed”) as follows:

  If X̂_I = 0, then x_i = 0 for all i ∈ I.
  If X̂_I = u_I |I|, then x_i = u_I for all i ∈ I.
  If 0 < X̂_I < u_I |I|, the “cluster” must be undone, which can be accomplished with a simple MIP… during which x_c can be assigned as well.
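A sketch of the parsing rule for one cluster. The intermediate case is resolved by a small MIP in the talk; here a greedy stand-in picks the cheapest members, purely for illustration:

```python
def parse_cluster(X_hat, I, cost):
    """Undo a clustered value X_hat, read as the number of members of
    cluster I set to 1 (binary x_i, u_I = 1).  The extreme cases are
    forced; an intermediate value requires a choice -- the talk solves
    a simple MIP, here we greedily take the X_hat cheapest members."""
    if X_hat == 0:
        return {i: 0 for i in I}
    if X_hat == len(I):
        return {i: 1 for i in I}
    chosen = set(sorted(I, key=lambda i: cost[i])[:X_hat])
    return {i: int(i in chosen) for i in I}

print(parse_cluster(0, [3, 4, 5], {}))                        # → {3: 0, 4: 0, 5: 0}
print(parse_cluster(2, [3, 4, 5], {3: 5.0, 4: 1.0, 5: 2.0}))  # → {3: 0, 4: 1, 5: 1}
```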


Two questions:

  How should “distance” be defined?
  How should variables be “linked” to form clusters?

Possible “distance” definitions…

• cityblock:  d_ij = Σ_k |a_ki − a_kj|
• euclidean:  d_ij = Σ_k (a_ki − a_kj)²

Less standard:

• indicator:  d_ij = Σ_k |u(a_ki) − u(a_kj)|, where u(a) = 1 if a > 0; 0 otherwise
• correlation:  d_ij = 1 − ρ(a_i, a_j)
• hamming:  percent of coordinates that differ
• jaccard:  d_ij = (1/n) Σ_{k ∈ nonzeroes} |a_ki − a_kj|
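Written out in code, following the formulas above (plain Python; correlation omitted; the u(·) reading of the indicator distance is my reconstruction of the garbled slide):

```python
def cityblock(a, b):  return sum(abs(x - y) for x, y in zip(a, b))
def euclidean(a, b):  return sum((x - y) ** 2 for x, y in zip(a, b))  # squared form, as on the slide

def u(x):             return 1 if x > 0 else 0   # u(a) = 1 if a > 0; 0 otherwise
def indicator(a, b):  return sum(abs(u(x) - u(y)) for x, y in zip(a, b))
def hamming(a, b):    return sum(x != y for x, y in zip(a, b)) / len(a)
def jaccard(a, b):
    # like hamming, but restricted to coordinates where either entry is nonzero
    nz = [(x, y) for x, y in zip(a, b) if x != 0 or y != 0]
    return sum(x != y for x, y in nz) / len(nz) if nz else 0.0

a = [1.0, 0.0, 1.0, 2.0]   # two columns of the constraint matrix
b = [1.0, 1.0, 0.0, 2.0]
print(cityblock(a, b), hamming(a, b), jaccard(a, b))   # → 2.0 0.5 0.5
```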


Possible “linking” definitions:

“Linking” creates a hierarchical tree that indicates the order in which clusters are aggregated. Some possible methods for aggregating two sub-clusters:

• single: min. smallest distance between elements of both sets
• complete: min. largest distance between elements of both sets
• average: min. average distance between objects in the two sets
• ward: minimize inner squared distance

JH Confession: I didn’t want to code this, so I just used Matlab… these are some of the standard linkages that Matlab provides…
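The agglomeration loop behind these linkage rules is simple. Here is a naive single-linkage sketch (the other rules change only how the inter-cluster distance is scored; Matlab’s `linkage`, used in the talk, builds the full tree instead of stopping at a cluster count):

```python
def single_linkage(D, n_clusters):
    """Naive single-linkage agglomeration: repeatedly merge the two
    clusters whose closest members are nearest, until n_clusters
    remain.  D is a full symmetric distance matrix (list of lists)."""
    clusters = [{i} for i in range(len(D))]
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = min(D[i][j] for i in clusters[p] for j in clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] |= clusters.pop(q)   # merge the closest pair
    return clusters

D = [[0.0, 0.0, 2.0, 2.0],   # columns 0 and 1 are identical
     [0.0, 0.0, 2.0, 2.0],
     [2.0, 2.0, 0.0, 3.0],
     [2.0, 2.0, 3.0, 0.0]]
print(single_linkage(D, 3))  # → [{0, 1}, {2}, {3}]
```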


My initial foray…

MIP solver: CPLEX 8.1 (BTTOL 0.1, EMPHASIS feasibility)

6 distances × 4 linkages × 3 cluster sizes = 72 runs

VanTran: 605 binary, 210 continuous variables… 1546 constraints

Best known objective value = 605
CPLEX required 1,490,789 nodes to identify it
Gap after 5,000,000 nodes (37,481,441 iterations) is 7.7%


Some especially good combinations:

Dist         Link      ClusterSize   ObjVal   Nodes    Iters
Jaccard      Ward      300           607      1,240    8,881
Indicator    Average   300           609      940      8,543
CityBlock    Average   0.50          616.5    10,000   61,359
Correlation  Complete  0.50          616.5    10,000   65,473
Correlation  Single    0.50          616.5    10,000   108,396
Euclidean    Average   300           617      5,264    62,120
Euclidean    Single    0.50          618.5    10,000   73,302
Correlation  Average   0.50          622.5    10,000   66,865
CityBlock    Single    300           626      359      5,814
CityBlock    Single    0.75          626      359      5,814

Not quite so good…

Euclidean    Single    300           815      350      4,230
Euclidean    Average   50            902      4,282    50,514

The rest… average (std dev):   1951.3 (260.5)   217.4 (1283.4)   1617.8 (6708.2)
(Note: 28 did not return a feasible solution)

(Note: node limit = 10,000. Relative error gap tolerance = 0.03)


Some significant correlations:

measure                  measure                 correlation
Parsed objective value   # of clusters           −0.369
Parsed objective value   MIP solution node id    −0.604
Parsed objective value   MIP iteration count     −0.706
MIP1 solution node id    # of clusters            0.523


[Interaction plot (data means) for Parsed Objective Value: panels by Dist (Euc, Ham, Ind, Jac, Abs, Cor), Link (Single, Ward, Complete, Average), and CutOff (300, 75, 50); plotted means range roughly 1000–2000.]


What’s next…

• An obvious next step:
  – use the solution as an initial solution for the original MIP
  – slight problem: lower bounds are still weak, so it doesn’t help much

• Obvious questions:
  – can we “mine” for improved lower bounds?
    (maybe, but not ready for prime time…)

• Does this generalize beyond VanTran?


Experimentation with some MIPLIB problems:

Problems and Characteristics

Problem    Binaries  Continuous  Row Types           Notes
10teams    1800      225         110 G, 40 P, 80 S   correlations n/a
danoint    56        465         256 G, 392 U
egout      55        86          43 G, 55 U
fiber      1254      44          44 G, 378 U
khb05250   24        1326        77 G, 24 U
misc06     112       1696        820 G
mod011     96        10862       4400 G, 16 S, 64 U
rentacar   55        9502        6674 G, 55 U        correlations n/a
rgn        100       80          20 G, 4 P

G: General   P: Packing   S: Special Ordered Set   U, L: Upper, Lower Bound


Problem    Combinations   # within 1.5% of       # within 0.01% of      # w/o initial
                          best known solution    best known solution    solution
10teams    60             60
danoint    72             12                     12
egout      72             72                     54
fiber      72             ***                    69
khb05250   72             68                     49
misc06     72             72                     0
mod011     72             72                     72
rentacar   60             60
rgn        72             49                     49

*** when the “parsed” solution was used to initialize B&B, the optimal solution was identified within 285 nodes, 4,300 iterations


Combination summary: # within 1.5% of best known solution

             Average  Complete  Single  Ward
CityBlock    15       15        15      15
Correlation  13       15        13      13
Euclidean    17       15        15      15
Hamming      15       14        13      15
Indicator    15       15        13      16
Jaccard      14       14        13      13

Each distance/linkage combination has 3 cluster sizes × 9 problems = 27 combinations.
(correlation: 7 problems, so 21 combinations)


Conclusions?

• There’s still a lot of work (data analysis/interpretation) to do

• It appears that there may be some problem classes where this type of approach is beneficial

• The “parsed” solution isn’t necessarily feasible, but a complete-recourse type of formulation should eliminate that problem

• Lower bounds… might be achievable through a row aggregation scheme