“Mining the Matrix” – Applying Data Mining Concepts to Large Scale MIPs

J.L. Higle
May, 2004
Groningen WSIP



Outline

• “VanTran” paratransit driver schedules (a problem formulation)

• Problem characteristics (or why I gave up on that idea…)

• Observations based on SSD computations (relationship to some data mining techniques)

• An initial foray…

• Computational results so far

• What’s next…


VanTran – Paratransit service provider in Tucson 

• ±80 vans, each holding up to 3 wheelchairs and 10 ambulatory riders

• Reservation-based ride service

• Changes in federal regulation impose stricter service requirements

• Driver schedules (“routes”) are set well in advance of the day of service, and remain constant over lengthy periods of time.

• Demand changes each day, so vehicle movements vary.

Question: Given that service requirements are changing, how should the driver schedules be adapted to accommodate this change?

My response… “This sounds like a recourse problem”:
  first stage ~ schedule drivers
  second stage ~ assign customers to drivers


One formulation

(or something like that…)

Variables:
  s_rt = 1 if driver r “starts” in period t; 0 otherwise
  e_rt = 1 if driver r “ends” in period t; 0 otherwise
  D²_r = 1 if r drives < 2 hrs; 0 otherwise
  D¹⁰_r = 1 if r drives > 10 hrs; 0 otherwise
  x_ir = 1 if customer i assigned to r; 0 otherwise
  u_i = 1 if customer i not assigned; 0 otherwise
  z_ij = 1 if i and j assigned together; 0 otherwise


Min  (weighted penalties: schedules that are too short, drivers on the road too long, unassigned customers, and customer-pairing costs)

s.t.
  Σ_t t·(e_rt − s_rt) ≥ 2 − M·D²_r, ∀r            driver’s schedule is too short
  Σ_t t·(e_rt − s_rt) ≤ 10 + M·D¹⁰_r, ∀r          driver’s on the road too long
  s_rt + e_rt ≤ 1, ∀r,t                            don’t start and end at the same time
  Σ_r Σ_{q≤t} (s_rq − e_rq) ≥ 1, ∀t                ≥ 1 driver on the road at all times
  Σ_r x_ir + u_i = 1, ∀i                           account for each customer
  z_ij ≥ x_ir + x_jr − 1, ∀i,j,r                   i, j assigned to same driver?
  Σ_{i∈I(t)} x_ir ≤ c_r, ∀r,t                      seat capacity
  Σ_t t·s_rt ≤ t_i^pick + M(1 − x_ir), ∀i,r        don’t assign i to r if r starts too late
  Σ_t t·e_rt ≥ t_i^drop − M(1 − x_ir), ∀i,r        don’t assign i to r if r ends too early

Plus a few extra “tie-breaking” constraints to tighten the relaxation


Problem Characteristics (or why I gave up on that idea…)

• Hmm… how about 4 drivers, 20 customers, and only one demand scenario…

• This results in a MIP: 605 binaries, 210 continuous, 1546 constraints…

• I let CPLEX work on it all night… after 5,000,000 B&B nodes I managed to get the gap down to 7.7% (ouch)

• VanTran’s got 80 vehicles, ±60 drivers, thousands of passengers, random daily demand…

I abandoned that approach… This is a large (binary) MIP, with lots of rows and lots of columns, that’s probably going to be solved lots of times…


SSD: Stochastic Scenario Decomposition
(Higle, Rayco, Sen ’01)

• Solution method for multistage SLPs

• Uses scenario decomposition + piecewise linear concave approximation of the dual objective function

• Uses randomly generated observations for successive improvement of the approximation
  – adaptive sampling (aka “internal”) as opposed to nonadaptive sampling (aka “external”)


SSD: Salient Features

• New observations increase the master program’s column dimension
• New cuts increase the master program’s row dimension
• Growth in the column dimension is the more problematic…

To solve the master program we introduced some column and row aggregation to reduce the size, as follows:

• aggregate most of the cuts (except for two… “new”, “incumbent”)
• represent each column with a 4-tuple:
  – current value
  – coefficient in “new” cut
  – coefficient in “incumbent” cut
  – coefficient in “aggregated” cut
• columns with similar 4-tuples are aggregated

Note: aggregation ignores the scenario and stage associated with a column… it looks at the “data” and considers only similarities in the data.

Note also: it worked surprisingly well on a variety of problems.
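The grouping step above can be sketched in a few lines. This is a minimal illustration rather than the SSD implementation; the rounding tolerance and the exact grouping rule are assumptions:

```python
from collections import defaultdict

def aggregate_columns(tuples, tol=0.1):
    """Group columns whose 4-tuples (current value, coefficient in the
    "new" cut, in the "incumbent" cut, in the "aggregated" cut) agree
    after rounding to a tolerance; each group becomes one column."""
    groups = defaultdict(list)
    for idx, t in enumerate(tuples):
        key = tuple(round(v / tol) for v in t)
        groups[key].append(idx)
    return list(groups.values())

cols = [(1.0, 0.5, 0.2, 0.0),
        (1.02, 0.51, 0.19, 0.0),   # nearly identical to the first column
        (3.0, 0.0, 1.0, 0.7)]
print(aggregate_columns(cols))     # → [[0, 1], [2]]
```

Coarsening `tol` merges more columns and shrinks the master program further, at the price of a rougher approximation.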


The aggregation scheme used by SSD can be viewed as a form of “data mining” of the master program matrix.

“Data Mining” is a catch-all phrase: a collection of techniques used to draw some meaning from a large data set and/or reduce its storage requirements.

What’s the connection…

Large problems → big matrices → lots of “data”

“VanTran” takes a long time to solve because it’s hard to choose from “similar” solutions, and lots of “ties” have to be broken.

Perhaps columns can be “clustered” so that tie-breaking can be postponed until later...

Perhaps the “information” contained in the constraints can be represented in a compressed form…


Variable Clustering

x_b, x_c are the binary and continuous variables; A_b x_b + A_c x_c ≥ r are the constraints, i.e., A_b = [a_1, …, a_{n_b}] where a_i ∈ R^m.

For each pair of columns i and j, calculate d_ij = D(a_i, a_j) for some “distance” D.

If {a_i : i ∈ I} are all “close” to each other, replace the columns by a “clustered” variable X_I (general integer), so that the constraints become

  Σ_I ā_I X_I + A_c x_c ≥ r,

where ā_I = (1/|I|) Σ_{i∈I} a_i and X_I ≤ u_I |I| (obj. coeffs. similarly defined).

The resulting MIP has fewer variables, and is likely to spend less time trying to break ties.
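As a concrete sketch of the substitution (plain Python; u_I = 1 is assumed for binary columns, and the function name is mine, not the talk’s):

```python
def cluster_columns(A, clusters, u=1):
    """Replace each cluster I of columns of A (a row-major list of
    lists) by its average column a_bar_I; the clustered general-integer
    variable X_I gets the upper bound |I| * u_I (u_I = u assumed)."""
    A_bar = [[sum(row[i] for i in I) / len(I) for I in clusters]
             for row in A]
    ub = [len(I) * u for I in clusters]
    return A_bar, ub

A = [[1.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 1.0, 1.0]]
A_bar, ub = cluster_columns(A, [[0, 1], [2]])   # columns 0 and 1 are identical
print(A_bar)   # → [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(ub)      # → [2, 1]
```

When the clustered columns are exactly equal, as here, the substitution is lossless; for merely “close” columns the averaged matrix is an approximation of the original constraints.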


A solution to the clustered MIP can be converted to a solution to the original MIP (“parsed”) as follows:

  If X̂_I = 0, then x_i = 0 for all i ∈ I.
  If X̂_I = u_I |I|, then x_i = u_I for all i ∈ I.
  If 0 < X̂_I < u_I |I|, the “cluster” must be undone, which can be accomplished with a simple MIP… during which x_c can be assigned as well.
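A sketch of the parsing rule for one cluster. The intermediate case is resolved by a small MIP in the talk; here a greedy stand-in picks the cheapest members, purely for illustration:

```python
def parse_cluster(X_hat, I, cost):
    """Undo a clustered value X_hat, read as the number of members of
    cluster I set to 1 (binary x_i, u_I = 1).  The extreme cases are
    forced; an intermediate value requires a choice -- the talk solves
    a simple MIP, here we greedily take the X_hat cheapest members."""
    if X_hat == 0:
        return {i: 0 for i in I}
    if X_hat == len(I):
        return {i: 1 for i in I}
    chosen = set(sorted(I, key=lambda i: cost[i])[:X_hat])
    return {i: int(i in chosen) for i in I}

print(parse_cluster(0, [3, 4, 5], {}))                        # → {3: 0, 4: 0, 5: 0}
print(parse_cluster(2, [3, 4, 5], {3: 5.0, 4: 1.0, 5: 2.0}))  # → {3: 0, 4: 1, 5: 1}
```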


Two questions:

  How should “distance” be defined?
  How should variables be “linked” to form clusters?

Possible “distance” definitions…

• cityblock:  d_ij = Σ_k |a_ki − a_kj|
• euclidean:  d_ij = Σ_k (a_ki − a_kj)²

Less standard:

• indicator:  d_ij = Σ_k |u(a_ki) − u(a_kj)|, where u(a) = 1 if a > 0; 0 otherwise
• correlation:  d_ij = 1 − ρ(a_i, a_j)
• hamming:  percent of coordinates that differ
• jaccard:  d_ij = (1/n) Σ_{k ∈ nonzeroes} |a_ki − a_kj|
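Written out in code, following the formulas above (plain Python; correlation omitted; the u(·) reading of the indicator distance is my reconstruction of the garbled slide):

```python
def cityblock(a, b):  return sum(abs(x - y) for x, y in zip(a, b))
def euclidean(a, b):  return sum((x - y) ** 2 for x, y in zip(a, b))  # squared form, as on the slide

def u(x):             return 1 if x > 0 else 0   # u(a) = 1 if a > 0; 0 otherwise
def indicator(a, b):  return sum(abs(u(x) - u(y)) for x, y in zip(a, b))
def hamming(a, b):    return sum(x != y for x, y in zip(a, b)) / len(a)
def jaccard(a, b):
    # like hamming, but restricted to coordinates where either entry is nonzero
    nz = [(x, y) for x, y in zip(a, b) if x != 0 or y != 0]
    return sum(x != y for x, y in nz) / len(nz) if nz else 0.0

a = [1.0, 0.0, 1.0, 2.0]   # two columns of the constraint matrix
b = [1.0, 1.0, 0.0, 2.0]
print(cityblock(a, b), hamming(a, b), jaccard(a, b))   # → 2.0 0.5 0.5
```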


Possible “linking” definitions:

“Linking” creates a hierarchical tree that indicates the order in which clusters are aggregated. Some possible methods for aggregating two sub-clusters:

• single: min. smallest distance between elements of both sets
• complete: min. largest distance between elements of both sets
• average: min. average distance between objects in the two sets
• ward: minimize inner squared distance

JH Confession: I didn’t want to code this, so I just used Matlab… these are some of the standard linkages that Matlab provides…
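The agglomeration loop behind these linkage rules is simple. Here is a naive single-linkage sketch (the other rules change only how the inter-cluster distance is scored; Matlab’s `linkage`, used in the talk, builds the full tree instead of stopping at a cluster count):

```python
def single_linkage(D, n_clusters):
    """Naive single-linkage agglomeration: repeatedly merge the two
    clusters whose closest members are nearest, until n_clusters
    remain.  D is a full symmetric distance matrix (list of lists)."""
    clusters = [{i} for i in range(len(D))]
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = min(D[i][j] for i in clusters[p] for j in clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] |= clusters.pop(q)   # merge the closest pair
    return clusters

D = [[0.0, 0.0, 2.0, 2.0],   # columns 0 and 1 are identical
     [0.0, 0.0, 2.0, 2.0],
     [2.0, 2.0, 0.0, 3.0],
     [2.0, 2.0, 3.0, 0.0]]
print(single_linkage(D, 3))  # → [{0, 1}, {2}, {3}]
```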


My initial foray…

MIP solver: CPLEX 8.1 (BTTOL 0.1, EMPHASIS feasibility)

6 distances × 4 linkages × 3 cluster sizes = 72 runs

VanTran: 605 binary, 210 continuous variables… 1546 constraints

Best known objective value = 605
CPLEX required 1,490,789 nodes to identify it
Gap after 5,000,000 nodes (37,481,441 iterations) is 7.7%


Some especially good combinations:

Dist         Link      ClusterSize   ObjVal   Nodes    Iters
Jaccard      Ward      300           607      1,240    8,881
Indicator    Average   300           609      940      8,543
CityBlock    Average   0.50          616.5    10,000   61,359
Correlation  Complete  0.50          616.5    10,000   65,473
Correlation  Single    0.50          616.5    10,000   108,396
Euclidean    Average   300           617      5,264    62,120
Euclidean    Single    0.50          618.5    10,000   73,302
Correlation  Average   0.50          622.5    10,000   66,865
CityBlock    Single    300           626      359      5,814
CityBlock    Single    0.75          626      359      5,814

Not quite so good…

Euclidean    Single    300           815      350      4,230
Euclidean    Average   50            902      4,282    50,514

The rest… average (std dev):   1951.3 (260.5)   217.4 (1283.4)   1617.8 (6708.2)
(Note: 28 did not return a feasible solution)

(Note: node limit = 10,000. Relative error gap tolerance = 0.03)


Some significant correlations:

measure                  measure                 correlation
Parsed objective value   # of clusters           −0.369
Parsed objective value   MIP solution node id    −0.604
Parsed objective value   MIP iteration count     −0.706
MIP1 solution node id    # of clusters            0.523


[Interaction plot (data means) for Parsed Objective Value: panels by Dist (Euc, Ham, Ind, Jac, Abs, Cor), Link (Single, Ward, Complete, Average), and CutOff (300, 75, 50); plotted means range roughly 1000–2000.]


What’s next…

• An obvious next step:
  – use the solution as an initial solution for the original MIP
  – slight problem: lower bounds are still weak, so it doesn’t help much

• Obvious questions:
  – can we “mine” for improved lower bounds?
    (maybe, but not ready for prime time…)

• Does this generalize beyond VanTran?


Experimentation with some MIPLIB problems:

Problems and Characteristics

Problem    Binaries  Continuous  Row Types           Notes
10teams    1800      225         110 G, 40 P, 80 S   correlations n/a
danoint    56        465         256 G, 392 U
egout      55        86          43 G, 55 U
fiber      1254      44          44 G, 378 U
khb05250   24        1326        77 G, 24 U
misc06     112       1696        820 G
mod011     96        10862       4400 G, 16 S, 64 U
rentacar   55        9502        6674 G, 55 U        correlations n/a
rgn        100       80          20 G, 4 P

G: General   P: Packing   S: Special Ordered Set   U, L: Upper, Lower Bound


Problem    Combinations   # within 1.5% of       # within 0.01% of      # w/o initial
                          best known solution    best known solution    solution
10teams    60             60
danoint    72             12                     12
egout      72             72                     54
fiber      72             ***                    69
khb05250   72             68                     49
misc06     72             72                     0
mod011     72             72                     72
rentacar   60             60
rgn        72             49                     49

*** when the “parsed” solution was used to initialize B&B, the optimal solution was identified within 285 nodes, 4,300 iterations


Combination summary: # within 1.5% of best known solution

             Average  Complete  Single  Ward
CityBlock    15       15        15      15
Correlation  13       15        13      13
Euclidean    17       15        15      15
Hamming      15       14        13      15
Indicator    15       15        13      16
Jaccard      14       14        13      13

Each distance/linkage combination has 3 cluster sizes × 9 problems = 27 combinations.
(correlation: 7 problems, so 21 combinations)


Conclusions?

• There’s still a lot of work (data analysis/interpretation) to do

• It appears that there may be some problem classes where this type of approach is beneficial

• The “parsed” solution isn’t necessarily feasible, but a complete-recourse type of formulation should eliminate that problem

• Lower bounds… might be achievable through a row aggregation scheme