
Submitted manuscript

Mixed-Integer Models for Nonseparable Piecewise Linear Optimization: Unifying Framework and Extensions

Juan Pablo Vielma, Shabbir Ahmed and George Nemhauser

H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332

{jvielma,sahmed,gnemhaus}@isye.gatech.edu

We study the modeling of non-convex piecewise linear functions as Mixed Integer Programming (MIP)

problems. We review several new and existing MIP formulations for continuous piecewise linear functions

with special attention paid to multivariate non-separable functions. We compare these formulations with

respect to their theoretical properties and their relative computational performance. In addition, we study

the extension of these formulations to lower semicontinuous piecewise linear functions.

Key words : Mixed Integer Programming, Piecewise Linear Functions

History : September, 2008

1. Introduction

We consider optimization problems involving piecewise linear functions modeled as Mixed Integer

Programming (MIP) problems. When the functions considered are convex these problems can be

modeled as Linear Programming (LP) problems, so we focus on non-convex functions for which

the optimization problem is NP-hard even when all the functions are univariate (Keha et al. 2006).

Non-convex piecewise linear functions are generally used to approximate non-linearities arising

from factors such as economies of scale or complex technological processes. They also naturally

appear as cost functions of supply chain problems to model discounts for high volume and fixed

charges. Applications of optimization problems with non-convex piecewise linear functions include

production planning (Fourer et al. 1993), optimization of electronic circuits (Graf et al. 1990),

operation planning of gas networks (Martin et al. 2006), process engineering (Bergamini et al. 2005,

2008), merge-in-transit (Croxton et al. 2003b) and other network flow problems with non-convex

piecewise linear objective functions (Croxton et al. 2007).


Vielma, Ahmed and Nemhauser: Mixed-Integer Models for Piecewise Linear Optimization2 Article submitted to ; manuscript no.

Optimization problems involving non-convex piecewise linear functions can be solved with specialized

algorithms (de Farias Jr. et al. 2008, Keha et al. 2006, Tomlin 1981) or they can be modeled

as MIPs (Lowe 1984, Sherali 2001, Croxton et al. 2003a, Balakrishnan and Graves 1989, Keha et al.

2004, Dantzig 1960, Wilson 1998, Lee and Wilson 2001, Jeroslow and Lowe 1985, Padberg 2000,

Vielma and Nemhauser 2008a, Magnanti and Stratila 2004, Markowitz and Manne 1957) and solved

with a general purpose MIP solver. The advantage of this latter approach is that it capitalizes on

the advanced technology available in state of the art MIP solvers (Vielma et al. 2008). MIP models

for non-convex piecewise linear functions have been extensively studied, but existing comparisons

(Croxton et al. 2003a, Keha et al. 2004, Jeroslow and Lowe 1985) only concentrate on the case in

which the functions are separable (i.e. can be written as the sum of univariate functions). When

a non-separable function is known analytically it can sometimes be converted into a separable

one by algebraic manipulations (Tomlin 1981). However this conversion might be undesirable for

numerical reasons (Martin et al. 2006) and because it can result in weaker formulations (Croxton

et al. 2007). Furthermore, in many applications the functions come from complicated simulation

models (Lasdon and Waren 1980) and are not known analytically.

The main objective of this paper is to unify the numerous MIP models for piecewise linear

functions into a common framework which considers the possibility of non-separable functions and

discontinuities directly. In addition, we present a theoretical and computational comparison of the

models considered. Because models for separable multivariate functions can be obtained directly

from models for univariate functions we will assume that multivariate functions are non-separable.

The remainder of the paper is organized as follows. In Section 2 we study the MIP modeling of

continuous piecewise linear functions and define concepts that will be used throughout the paper.

In Section 3 we give several MIP models for continuous piecewise linear functions and in Section 4

we study some properties of these formulations. In Section 5, we present computational results

comparing the formulations for continuous functions. In Section 6, we study the extension of the

formulations to lower semicontinuous functions and in Section 7 we present computational results

comparing the formulations for this class of functions. In Section 8 we present some final remarks.


2. Modeling Piecewise Linear Functions

An appropriate way of modeling a piecewise linear function f : D ⊂ Rn → R is to model its epigraph,

given by epi(f) := {(x, z) ∈ D×R : f(x) ≤ z}. For example, the epigraph of the function in

Figure 1(a) is depicted in Figure 1(b).

Figure 1 A continuous piecewise linear function and its epigraph as the union of polyhedra. (a) f, with f(0) = 10, f(1) = 32, f(2) = 40, f(4) = 5 and f(5) = 15. (b) epi(f).

For simplicity, we assume that the function domain D is bounded and f is only used in a

constraint of the form f(x)≤ 0 or as an objective function that is being minimized. We then need

a model of epi(f) since f(x)≤ 0 can be modeled as (x, z) ∈ epi(f), z ≤ 0 and the minimization of

f can be achieved by minimizing z subject to (x, z) ∈ epi(f). For continuous functions we can also

work with their graphs, but modeling the epigraph will allow us to extend most of the results to some

discontinuous functions and will simplify the analysis of formulation properties.

Following the theory developed by Jeroslow and Lowe (Jeroslow 1987, 1989, Jeroslow and Lowe

1984, 1985, Lowe 1984), we say that a polyhedron P ⊂Rn×R×Rp×Rq is a binary mixed-integer

programming model for a set S ⊂Rn×R if

(x, z)∈ S⇔∃(λ,y)∈Rp×{0,1}q s.t. (x, z,λ, y)∈ P. (1)

Under the bounded domain assumption, Jeroslow and Lowe prove that the epigraph of a function

can be modeled as a binary mixed-integer programming model if and only if it is a union of

polyhedra with a common recession cone given by C+n := {(0, z) ∈Rn×R : z ≥ 0}. This condition


is a special case of the results in Jeroslow (1989), which also consider unbounded domains and

more general uses of f in a mathematical program. Furthermore, this condition implies that for a

function f :D⊂R→R we have that epi(f) can be modeled as a binary mixed-integer programming

model if and only if f is piecewise linear and lower semicontinuous. Our definition of a piecewise

linear function is motivated by the extension of this characterization to the multivariate case.

A single variable continuous piecewise linear function f : [0, u]→R can be described as

\[ f(x) := m_i x + c_i, \qquad x \in [d_{i-1}, d_i], \quad \forall i \in \{1, \dots, K\} \tag{2} \]

for some $K \in \mathbb{Z}_+$, $\{m_i\}_{i=1}^K \subset \mathbb{R}$, $\{c_i\}_{i=1}^K \subset \mathbb{R}$ and $\{d_k\}_{k=0}^K \subset \mathbb{R}$ such that $0 = d_0 < d_1 < \dots < d_K = u$.

For example, the function f depicted in Figure 1(a) can be described in form (2) with K = 4, m1 = 22,

m2 = 8, m3 = −17.5, m4 = 10, c1 = 10, c2 = 24, c3 = 75, c4 = −35, d0 = 0, d1 = 1, d2 = 2, d3 = 4 and

d4 = 5. A natural extension to the multivariate case is given by

Definition 1 (Continuous Piecewise Linear Function). Let D ⊂ Rn be a compact set.

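A continuous function $f : D \subset \mathbb{R}^n \to \mathbb{R}$ is a piecewise linear function if and only if there exist

$\{m_P\}_{P \in \mathcal{P}} \subseteq \mathbb{R}^n$, $\{c_P\}_{P \in \mathcal{P}} \subseteq \mathbb{R}$ and a finite family of polytopes $\mathcal{P}$ such that $D = \bigcup_{P \in \mathcal{P}} P$ and

\[ f(x) := m_P x + c_P, \qquad x \in P, \quad \forall P \in \mathcal{P}. \tag{3} \]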

Note that D does not need to be convex or connected and that the boundedness assumption is

for simplicity. Furthermore, if $x \in P_1 \cap P_2$ for two polytopes $P_1, P_2 \in \mathcal{P}$, the definition implies that

$m_{P_1} x + c_{P_1} = m_{P_2} x + c_{P_2}$, which ensures the continuity of f on D. In addition, Definition 1 does

not specify how the polytopes are described as this is formulation dependent. In some formulations

the polytopes are given as the convex hull of a finite number of points and in others the polytopes

are given as a system of linear inequalities. The finite family of polytopes P is usually taken to

be a triangulation of D (Lee and Wilson 2001, Martin et al. 2006, Wilson 1998) and in fact some

models will require this. For any family of polytopes P we denote the set of vertices of the family

by V(P) :=⋃P∈P V (P ) where V (P ) is the set of vertices of P . When P is a triangulation this

coincides with the usual definition of vertices of a triangulation.
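As a small illustration of form (2), the following sketch evaluates the univariate example of Figure 1(a) from the coefficients listed above and checks that adjacent pieces agree at the interior breakpoints. The function names `f` and `is_continuous` are hypothetical helpers introduced here, not part of the paper.

```python
from bisect import bisect_left

# Data of form (2) for the function of Figure 1(a), as listed in the text.
d = [0, 1, 2, 4, 5]        # breakpoints d_0 < ... < d_K
m = [22, 8, -17.5, 10]     # slopes m_1, ..., m_K
c = [10, 24, 75, -35]      # intercepts c_1, ..., c_K

def f(x):
    """Evaluate f(x) = m_i x + c_i on a piece [d_{i-1}, d_i] containing x."""
    if not d[0] <= x <= d[-1]:
        raise ValueError("x is outside the domain [d_0, d_K]")
    i = max(1, bisect_left(d, x))   # smallest i >= 1 with x <= d_i
    return m[i - 1] * x + c[i - 1]

def is_continuous(tol=1e-9):
    """Adjacent pieces must take the same value at each interior breakpoint."""
    return all(
        abs((m[i - 1] * d[i] + c[i - 1]) - (m[i] * d[i] + c[i])) <= tol
        for i in range(1, len(m))
    )

breakpoint_values = [f(x) for x in d]
```

With these coefficients the breakpoint values are 10, 32, 40, 5 and 15, so the pair (v, f(v)) for each vertex v is all that the vertex-based formulations of Section 3 need.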


Using the approach of modeling epi(f) as a union of polyhedra, Balas (1979) and Jeroslow

and Lowe introduce two standard ways of modeling f . An advantage of this approach is that it

allows for a simple treatment of lower semicontinuous functions. In addition, with this definition

the epigraph of a continuous piecewise linear function is the union of polyhedra given by

\[ \operatorname{epi}(f) = C_n^+ + \bigcup_{P \in \mathcal{P}} \operatorname{conv}\left( \{(v, f(v))\}_{v \in V(P)} \right) \tag{4} \]

where conv denotes the convex hull operation and + denotes the Minkowski addition of sets. For

the function given in Figure 1(a) this characterization is illustrated in Figure 1(b) and detailed in

Appendix EC.1.

3. Mixed Integer Programming Models for Piecewise Linear Functions

In this section we review several new and existing formulations for continuous functions. In

Appendix EC.1 we illustrate the formulations for the function depicted in Figure 1(a).

3.1. Disaggregated convex combination models

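All formulations in this section represent $(x, z) \in \operatorname{epi}(f)$ as the convex combination of points $(v, f(v))$ for $v \in \mathcal{V}(\mathcal{P})$ plus a ray in $C_n^+$. They have one continuous variable for each $v \in V(P)$ and for each $P \in \mathcal{P}$ to represent a point $(x, z) \in \operatorname{epi}(f)$ as $(x, z) = r + \sum_{P \in \mathcal{P}} \sum_{v \in V(P)} \lambda_{P,v} (v, f(v))$, for $r \in C_n^+$ and $\{\lambda_{P,v}\}_{P \in \mathcal{P},\, v \in V(P)} \subset \mathbb{R}_+$ such that $\sum_{P \in \mathcal{P}} \sum_{v \in V(P)} \lambda_{P,v} = 1$.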

3.1.1. Basic Model

This formulation has no requirement on the family of polytopes and is given by

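\begin{align}
& \sum_{P \in \mathcal{P}} \sum_{v \in V(P)} \lambda_{P,v}\, v = x, \qquad \sum_{P \in \mathcal{P}} \sum_{v \in V(P)} \lambda_{P,v} (m_P v + c_P) \le z \tag{5a} \\
& \lambda_{P,v} \ge 0 \quad \forall P \in \mathcal{P},\ v \in V(P), \qquad \sum_{v \in V(P)} \lambda_{P,v} = y_P \quad \forall P \in \mathcal{P} \tag{5b} \\
& \sum_{P \in \mathcal{P}} y_P = 1, \qquad y_P \in \{0,1\} \quad \forall P \in \mathcal{P}. \tag{5c}
\end{align}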

This formulation has been studied in Croxton et al. (2003a), Jeroslow (1987), Jeroslow and Lowe

(1984), Lowe (1984), Meyer (1976) and Sherali (2001) and is sometimes referred to as the convex

combination model. To distinguish it from the formulation in Section 3.2 we instead refer to it as

the disaggregated convex combination model and denote it by DCC.
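To make the role of the variables in DCC concrete, the sketch below builds values of λ and y that represent a given point of the univariate example of Figure 1(a) and then checks constraints (5a)-(5c) directly. It is only an illustration under the assumption that the polytopes are the intervals between consecutive breakpoints; `dcc_point` and `satisfies_dcc` are hypothetical helper names.

```python
# Breakpoints and breakpoint values of the Figure 1(a) example (from form (2)).
d = [0, 1, 2, 4, 5]
fvals = {0: 10, 1: 32, 2: 40, 4: 5, 5: 15}

def dcc_point(x):
    """Return (lam, y, z) with z = f(x); polytope i is the interval [d[i], d[i+1]]."""
    K = len(d) - 1
    i = next(k for k in range(K) if d[k] <= x <= d[k + 1])
    t = (x - d[i]) / (d[i + 1] - d[i])
    # One lambda per (polytope, vertex) pair, so shared vertices get two copies.
    lam = {(k, v): 0.0 for k in range(K) for v in (d[k], d[k + 1])}
    lam[(i, d[i])], lam[(i, d[i + 1])] = 1.0 - t, t
    y = [int(k == i) for k in range(K)]
    z = sum(w * fvals[v] for (_, v), w in lam.items())
    return lam, y, z

def satisfies_dcc(x, z, lam, y, tol=1e-9):
    """Check (5a)-(5c); m_P v + c_P equals f(v) because f is continuous."""
    K = len(d) - 1
    return (
        abs(sum(w * v for (_, v), w in lam.items()) - x) <= tol           # (5a), x part
        and sum(w * fvals[v] for (_, v), w in lam.items()) <= z + tol     # (5a), z part
        and all(w >= -tol for w in lam.values())                           # (5b)
        and all(abs(sum(w for (k, _), w in lam.items() if k == i) - y[i]) <= tol
                for i in range(K))                                         # (5b)
        and sum(y) == 1 and all(b in (0, 1) for b in y)                    # (5c)
    )

lam, y, z = dcc_point(3.0)
```

For x = 3 the point lies in the third interval, so only the two λ variables of that interval are non-zero and any z at or above 22.5 is accepted, which is the epigraph behaviour described above.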


3.1.2. Logarithmic Model

Using ideas from Ibaraki (1976), Vielma and Nemhauser (2008a) and Vielma and Nemhauser (2008b) we can reduce the number of binary variables and constraints of DCC. To do this we identify each polytope in $\mathcal{P}$ with a binary vector in $\{0,1\}^{\lceil \log_2 |\mathcal{P}| \rceil}$ through an injective function $B : \mathcal{P} \to \{0,1\}^{\lceil \log_2 |\mathcal{P}| \rceil}$. We then use $\lceil \log_2 |\mathcal{P}| \rceil$ binary variables $y \in \{0,1\}^{\lceil \log_2 |\mathcal{P}| \rceil}$ to force $\sum_{v \in V(P)} \lambda_{P,v} = 1$ when $y = B(P)$.

The resulting formulation has no requirement on the family of polytopes and is given by

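\begin{align}
& \sum_{P \in \mathcal{P}} \sum_{v \in V(P)} \lambda_{P,v}\, v = x, \qquad \sum_{P \in \mathcal{P}} \sum_{v \in V(P)} \lambda_{P,v} (m_P v + c_P) \le z \tag{6a} \\
& \lambda_{P,v} \ge 0 \quad \forall P \in \mathcal{P},\ v \in V(P), \qquad \sum_{P \in \mathcal{P}} \sum_{v \in V(P)} \lambda_{P,v} = 1 \tag{6b} \\
& \sum_{P \in \mathcal{P}^+(B,l)} \sum_{v \in V(P)} \lambda_{P,v} \le y_l, \qquad \sum_{P \in \mathcal{P}^0(B,l)} \sum_{v \in V(P)} \lambda_{P,v} \le (1 - y_l), \qquad y_l \in \{0,1\} \quad \forall l \in L(\mathcal{P}), \tag{6c}
\end{align}

where $B : \mathcal{P} \to \{0,1\}^{\lceil \log_2 |\mathcal{P}| \rceil}$ is any injective function, $\mathcal{P}^+(B,l) := \{P \in \mathcal{P} : B(P)_l = 1\}$, $\mathcal{P}^0(B,l) := \{P \in \mathcal{P} : B(P)_l = 0\}$ and $L(\mathcal{P}) := \{1, \dots, \lceil \log_2 |\mathcal{P}| \rceil\}$. We refer to it as the logarithmic disaggregated convex combination model and denote it by DLog.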
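The effect of (6c) can be checked on a small case. The sketch below assumes one particular injective function B, the binary expansion of a polytope's index (the formulation allows any injective B), and computes which polytopes may still carry non-zero λ variables once the binary vector y is fixed. All names are hypothetical.

```python
def dlog_sets(num_polytopes):
    """Return (B, P_plus, P_zero) for polytopes indexed 0..num_polytopes-1.

    B[p] is the binary expansion of p using ceil(log2(num_polytopes)) bits;
    this is one injective choice among many.
    """
    bits = (num_polytopes - 1).bit_length()       # equals ceil(log2(num_polytopes))
    B = {p: [(p >> l) & 1 for l in range(bits)] for p in range(num_polytopes)}
    P_plus = [{p for p in B if B[p][l] == 1} for l in range(bits)]
    P_zero = [{p for p in B if B[p][l] == 0} for l in range(bits)]
    return B, P_plus, P_zero

def allowed_polytopes(y, B, P_plus, P_zero):
    """Polytopes whose lambda variables are not forced to zero by (6c)."""
    allowed = set(B)
    for l, y_l in enumerate(y):
        # y_l = 0 zeroes the lambdas of P_plus[l]; y_l = 1 zeroes those of P_zero[l].
        allowed -= P_plus[l] if y_l == 0 else P_zero[l]
    return allowed

B5, plus5, zero5 = dlog_sets(5)
```

When |P| is not a power of two some binary vectors correspond to no polytope; for those vectors every λ is forced to zero, which conflicts with (6b), so such values of y are infeasible.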

3.2. Convex combination models

The formulations in this section reduce the number of continuous variables of DCC by aggregating

variables associated with a point in V(P) that belongs to more than one polytope in P. The

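resulting formulations have one continuous variable for each $v \in \mathcal{V}(\mathcal{P})$ and hence represent point $(x, z) \in \operatorname{epi}(f)$ as $(x, z) = r + \sum_{v \in \mathcal{V}(\mathcal{P})} \lambda_v (v, f(v))$, for $r \in C_n^+$ and $\lambda \in \mathbb{R}_+^{\mathcal{V}(\mathcal{P})}$ such that $\sum_{v \in \mathcal{V}(\mathcal{P})} \lambda_v = 1$.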

3.2.1. Basic Model


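\begin{align}
& \sum_{v \in \mathcal{V}(\mathcal{P})} \lambda_v\, v = x, \qquad \sum_{v \in \mathcal{V}(\mathcal{P})} \lambda_v (m_P v + c_P) \le z \tag{7a} \\
& \lambda_v \ge 0 \quad \forall v \in \mathcal{V}(\mathcal{P}), \qquad \sum_{v \in \mathcal{V}(\mathcal{P})} \lambda_v = 1 \tag{7b} \\
& \lambda_v \le \sum_{P \in \mathcal{P}(v)} y_P \quad \forall v \in \mathcal{V}(\mathcal{P}), \qquad \sum_{P \in \mathcal{P}} y_P = 1, \qquad y_P \in \{0,1\} \quad \forall P \in \mathcal{P}, \tag{7c}
\end{align}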


where P(v) := {P ∈P : v ∈ P}. This formulation is studied in Dantzig (1963, 1960), Garfinkel and

Nemhauser (1972), Jeroslow and Lowe (1985), Keha et al. (2004), Lee and Wilson (2001), Lowe

(1984), Nemhauser and Wolsey (1988), Padberg (2000) and Wilson (1998) and is sometimes referred

to as the lambda method. We refer to this formulation as the convex combination model and denote

it by CC.

3.2.2. Logarithmic Model

As in DLog's construction we can reduce the number of binary variables and constraints of CC by identifying each polytope in $\mathcal{P}$ with a binary vector in $\{0,1\}^{\lceil \log_2 |\mathcal{P}| \rceil}$ through an injective function $B : \mathcal{P} \to \{0,1\}^{\lceil \log_2 |\mathcal{P}| \rceil}$. However, we now need B to comply with conditions that can be interpreted as the construction of a binary branching scheme for the effect of (7c) on $\lambda \in \mathbb{R}^{\mathcal{V}(\mathcal{P})}$. This constraint requires the non-zero λ variables to be associated with the vertices of a polytope in $\mathcal{P}$:

\[ \exists P \in \mathcal{P} \ \text{s.t.} \ \{v \in \mathcal{V}(\mathcal{P}) : \lambda_v > 0\} \subset V(P). \tag{8} \]

A binary branching scheme for (8) imposes it by fixing to zero disjoint sets of λ variables in each

side of a series of branching dichotomies. For example, for the function depicted in Figure 1(a)

we have P = {[0,1], [1,2], [2,4], [4,5]} and we can force (8) by the branching scheme given by the

following two dichotomies: (λ2 = 0 or λ0 = λ5 = 0) and (λ4 = λ5 = 0 or λ0 = λ1 = 0).
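The two dichotomies of this example can be checked by enumeration. The sketch below lists, for each of the four ways of choosing a side in every dichotomy, the vertices whose λ variables are not fixed to zero; the helper name `surviving_vertex_sets` is hypothetical.

```python
from itertools import product

vertices = {0, 1, 2, 4, 5}
polytopes = [{0, 1}, {1, 2}, {2, 4}, {4, 5}]       # V(P) for each interval in P
dichotomies = [({2}, {0, 5}), ({4, 5}, {0, 1})]    # (L_s, R_s) pairs from the text

def surviving_vertex_sets(vertices, dichotomies):
    """For every choice of side, return the vertices not fixed to zero."""
    result = []
    for sides in product((0, 1), repeat=len(dichotomies)):
        fixed = set().union(*(dich[side] for dich, side in zip(dichotomies, sides)))
        result.append(vertices - fixed)
    return result

survivors = surviving_vertex_sets(vertices, dichotomies)
```

Each of the four choices leaves exactly the two endpoints of one interval, so two binary variables are enough to enforce (8) for the four intervals of this example.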

In general, a branching scheme for (8) is a family of dichotomies $\{L_s, R_s\}_{s \in S}$ indexed by a finite set S and with $L_s, R_s \subset \mathcal{V}(\mathcal{P})$ such that for every $P \in \mathcal{P}$ we have $V(P) = \bigcap_{s \in S} \left( \mathcal{V}(\mathcal{P}) \setminus T_s \right)$, where $T_s = L_s$ or $T_s = R_s$ for each $s \in S$. For such a branching scheme a valid formulation is given by

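\begin{align}
& \sum_{v \in \mathcal{V}(\mathcal{P})} \lambda_v\, v = x, \qquad \sum_{v \in \mathcal{V}(\mathcal{P})} \lambda_v (m_P v + c_P) \le z \tag{9a} \\
& \lambda_v \ge 0 \quad \forall v \in \mathcal{V}(\mathcal{P}), \qquad \sum_{v \in \mathcal{V}(\mathcal{P})} \lambda_v = 1 \tag{9b} \\
& \sum_{v \in L_s} \lambda_v \le y_s, \qquad \sum_{v \in R_s} \lambda_v \le (1 - y_s), \qquad y_s \in \{0,1\} \quad \forall s \in S. \tag{9c}
\end{align}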

For (9) to have a logarithmic number of binary variables, we need a branching scheme with a logarithmic number of dichotomies. Such a scheme was introduced in Vielma and Nemhauser (2008a) and Vielma and Nemhauser (2008b) for the case when the family of polytopes P is topologically equivalent or compatible (Aichholzer et al. 2003) with a triangulation known as J1 or “Union Jack” (Todd 1977). For simplicity we first describe the formulation for the case when P = J1 and then show how to extend the formulation to the case where P is compatible with J1.

Figure 2 Examples of triangulations of subsets of R^2. (a) J1 triangulation of [0,2]^2. (b) 1/2 scaled J1 triangulation of [0,2]^2.

$J_1$ is defined for $D = [0, K]^n$ with $K \in \mathbb{Z}$ even. The vertex set of $J_1$ is given by $V = \{0, \dots, K\}^n$. The simplices of $J_1$ are constructed as follows. Let $N = \{1, \dots, n\}$, $V_0 = \{v \in V : v_i \text{ is odd}, \ \forall i \in N\}$, $\operatorname{Sym}(N)$ be the group of all permutations on $N$ and $e^i$ be the i-th unit vector of $\mathbb{R}^n$. For each $(v^0, \pi, s) \in V_0 \times \operatorname{Sym}(N) \times \{-1, 1\}^n$ define $j_1(v^0, \pi, s)$ to be the simplex whose vertices are $\{y^i\}_{i=0}^n$, where $y^0 = v^0$ and $y^i = y^{i-1} + s_{\pi(i)} e^{\pi(i)}$ for each $i \in N$. Triangulation $J_1$ of D is given by all these simplices, which is illustrated in Figure 2(a) for $D = [0,2]^2$.

A branching scheme for $J_1$ is constructed by dividing index set S into two sets $S_1$ and $S_2$. The first set is given by $S_1 := N \times \{1, \dots, \lceil \log_2(K) \rceil\}$ and $L_{(s_1,s_2)} := \{v \in V : v_{s_1} \in O(s_2, 1)\}$, $R_{(s_1,s_2)} := \{v \in V : v_{s_1} \in O(s_2, 0)\}$ for each $(s_1, s_2) \in S_1$, where

\[ O(l, b) := \left\{ k \in \{0, \dots, K\} : (k = 0 \text{ or } G^k_l = b) \text{ and } (k = K \text{ or } G^{k+1}_l = b) \right\} \]

for an arbitrary but fixed set of binary vectors $(G^l)_{l=1}^K \subset \{0,1\}^{\lceil \log_2(K) \rceil}$ such that $G^l$ and $G^{l+1}$ differ in at most one component for each $l \in \{1, \dots, K - 1\}$. There are many different sets of vectors with this property and they are usually referred to as reflected binary or Gray codes (Wilf 1989). The second set is given by $S_2 := \{(s_1, s_2) \in N^2 : s_1 < s_2\}$ and $L_{(s_1,s_2)} := \{v \in V : v_{s_1} \text{ is even and } v_{s_2} \text{ is odd}\}$, $R_{(s_1,s_2)} := \{v \in V : v_{s_1} \text{ is odd and } v_{s_2} \text{ is even}\}$ for each $(s_1, s_2) \in S_2$.
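The construction above can be sketched in a few lines for small n and K. The code below enumerates the simplices of J1, builds the dichotomies for S1 and S2, and enumerates the vertex sets left unfixed by each choice of sides. It assumes the standard reflected binary Gray code for the vectors G (any code with the stated property would do), uses 0-based coordinate and bit indices, and all function names are hypothetical.

```python
from itertools import permutations, product

def j1_simplices(n, K):
    """Vertex sets of all simplices j1(v0, pi, s) of J1 on [0, K]^n (K even)."""
    simplices = set()
    for v0 in product(range(1, K, 2), repeat=n):       # every coordinate odd
        for pi in permutations(range(n)):
            for s in product((-1, 1), repeat=n):
                y, verts = list(v0), [v0]
                for i in pi:                             # y^i = y^{i-1} + s_pi(i) e^pi(i)
                    y[i] += s[i]
                    verts.append(tuple(y))
                simplices.add(frozenset(verts))
    return simplices

def gray_vectors(K):
    """G[k] for k = 1..K with ceil(log2 K) bits (reflected binary Gray code)."""
    bits = (K - 1).bit_length()
    G = {k: [(((k - 1) ^ ((k - 1) >> 1)) >> l) & 1 for l in range(bits)]
         for k in range(1, K + 1)}
    return G, bits

def O(G, K, l, b):
    return {k for k in range(K + 1)
            if (k == 0 or G[k][l] == b) and (k == K or G[k + 1][l] == b)}

def j1_branching_scheme(n, K):
    """Return (V, scheme) with scheme a list of (L_s, R_s): S1 first, then S2."""
    G, bits = gray_vectors(K)
    V = list(product(range(K + 1), repeat=n))
    scheme = []
    for i, l in product(range(n), range(bits)):          # S1
        scheme.append(({v for v in V if v[i] in O(G, K, l, 1)},
                       {v for v in V if v[i] in O(G, K, l, 0)}))
    for i in range(n):                                   # S2, pairs i < j
        for j in range(i + 1, n):
            scheme.append(({v for v in V if v[i] % 2 == 0 and v[j] % 2 == 1},
                           {v for v in V if v[i] % 2 == 1 and v[j] % 2 == 0}))
    return set(V), scheme

def surviving_sets(n, K):
    """Vertex sets left unfixed, one per choice of side in every dichotomy."""
    V, scheme = j1_branching_scheme(n, K)
    out = set()
    for sides in product((0, 1), repeat=len(scheme)):
        fixed = set().union(*(d[side] for d, side in zip(scheme, sides)))
        out.add(frozenset(V - fixed))
    return out
```

For n = 2 and K = 4 the 32 choices of sides correspond one to one with the 32 simplices. For n = 3 and K = 2 there are more choices than simplices, and the extra choices leave only a subset of the vertices of some simplex, which is still consistent with (8).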

Following Vielma and Nemhauser (2008a) and Vielma and Nemhauser (2008b) we refer to the

formulation obtained with this scheme as the logarithmic branching convex combination model and


denote it by Log. As mentioned before, Log can be extended to any family of polytopes P that is

compatible with the J1 triangulation. This requires the existence of a bijection ϕ : {0, . . . ,K}n→

V(P) between the vertices of J1 and the family P such that v1, . . . , vn+1 are the vertices of a simplex

in J1 if and only if ϕ(v1), . . . ,ϕ(vn+1) are the vertices of a polytope in P. For example, taking

ϕ : {0, . . . ,4}2 → {0,1/2,1,3/2,2}2 given by ϕ(v1, v2) = (v1/2, v2/2) we have that the 1/2 scaled

J1 triangulation depicted in Figure 2(b) is compatible with the J1 triangulation of [0,4]2. Using

bijection ϕ the formulation for P is simply obtained by replacing (9a) by $\sum_{v \in \mathcal{V}(\mathcal{P})} \lambda_v \varphi(v) = x$ and $\sum_{v \in \mathcal{V}(\mathcal{P})} \lambda_v (m_P \varphi(v) + c_P) \le z$.

A similar formulation can be obtained from a branching scheme introduced in Martin et al.

(2006), but the resulting formulation has a linear instead of logarithmic number of binary variables.

3.3. Multiple choice model


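\begin{align}
& \sum_{P \in \mathcal{P}} x^P = x, \qquad \sum_{P \in \mathcal{P}} (m_P x^P + c_P y_P) \le z \tag{10a} \\
& A_P x^P \le y_P b_P \quad \forall P \in \mathcal{P} \tag{10b} \\
& \sum_{P \in \mathcal{P}} y_P = 1, \qquad y_P \in \{0,1\} \quad \forall P \in \mathcal{P}, \tag{10c}
\end{align}

where $A_P x \le b_P$ is the set of linear inequalities describing P. This formulation has been studied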

in Balakrishnan and Graves (1989), Croxton et al. (2003a), Jeroslow and Lowe (1984) and Lowe

(1984). We refer to this formulation as the multiple choice model and denote it by MC.

3.4. Incremental model

This formulation requires P to be a triangulation with a special ordering property. This property

always holds for univariate functions so for simplicity we describe the formulation for this case first.

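For univariate function $f : [l, u] \to \mathbb{R}$ and for $\mathcal{P} = \{[d_{k-1}, d_k]\}_{k=1}^K$ where $l = d_0 \le d_1 \le \dots \le d_K = u$, the formulation is given by

\begin{align}
& d_0 + \sum_{k=1}^K \delta_k (d_k - d_{k-1}) = x, \qquad f(d_0) + \sum_{k=1}^K \delta_k \left( f(d_k) - f(d_{k-1}) \right) \le z \tag{11a} \\
& \delta_1 \le 1, \qquad \delta_K \ge 0, \qquad \delta_{k+1} \le y_k \le \delta_k, \qquad y_k \in \{0,1\} \quad \forall k \in \{1, \dots, K - 1\}. \tag{11b}
\end{align}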
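The filling behaviour imposed by (11b) can be illustrated on the function of Figure 1(a). The sketch below computes one choice of δ and y for a given x and evaluates the left-hand sides of (11a). It assumes strictly increasing breakpoints so that no interval has zero length, and `inc_point` is a hypothetical helper name.

```python
# Breakpoints and breakpoint values of the Figure 1(a) example (from form (2)).
d = [0, 1, 2, 4, 5]
fd = [10, 32, 40, 5, 15]

def inc_point(x):
    """Return (delta, y, x_rebuilt, z) satisfying (11a)-(11b) with z = f(x)."""
    K = len(d) - 1
    # delta_k is the filled fraction of interval k, clipped to [0, 1].
    delta = [min(1.0, max(0.0, (x - d[k - 1]) / (d[k] - d[k - 1])))
             for k in range(1, K + 1)]
    # y_k = 1 once interval k is completely filled, so delta_{k+1} may be positive.
    y = [1 if x >= d[k] else 0 for k in range(1, K)]
    x_rebuilt = d[0] + sum(delta[k - 1] * (d[k] - d[k - 1]) for k in range(1, K + 1))
    z = fd[0] + sum(delta[k - 1] * (fd[k] - fd[k - 1]) for k in range(1, K + 1))
    return delta, y, x_rebuilt, z

def ordering_holds(delta, y):
    """Check delta_1 <= 1, delta_K >= 0 and delta_{k+1} <= y_k <= delta_k."""
    return (delta[0] <= 1 and delta[-1] >= 0
            and all(delta[k + 1] <= y[k] <= delta[k] for k in range(len(y))))

delta, y, x_rebuilt, z = inc_point(3.0)
```

For x = 3 the first two intervals are filled completely, the third is half filled and the fourth is empty, which is the incremental representation that the multivariate ordering properties generalize.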


The extension to multivariate functions (Wilson 1998) requires the family of polytopes to be a

triangulation T that complies with the following ordering properties:

O1. The simplices in $T$ can be ordered as $T_1, \dots, T_{|T|}$ so that $T_i \cap T_{i-1} \ne \emptyset$ for $i \in \{2, \dots, |T|\}$.

O2. For the order above, the vertices of each simplex $T_i$ can be ordered as $v_i^0, \dots, v_i^{|V(T_i)|-1}$ in a way such that $v_{i-1}^{|V(T_i)|-1} = v_i^0$ for $i \in \{2, \dots, |T|\}$.

These properties are required to represent (x, z) incrementally akin to (11a) for the univariate case.

Fortunately these conditions are met for many triangulations including J1 (Wilson 1998).

For a given order complying with O1–O2 the formulation is given by

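\begin{align}
& v_0^0 + \sum_{i=1}^{|T|} \sum_{j=1}^{|V(T_i)|-1} \delta_i^j \left( v_i^j - v_i^0 \right) = x, \qquad f(v_0^0) + \sum_{i=1}^{|T|} \sum_{j=1}^{|V(T_i)|-1} \delta_i^j \left( f(v_i^j) - f(v_i^0) \right) \le z \tag{12a} \\
& \sum_{j=1}^{|V(T_1)|-1} \delta_1^j \le 1, \qquad \delta_i^j \ge 0 \quad \forall i \in \{1, \dots, |T|\},\ j \in \{1, \dots, |V(T_i)| - 1\} \tag{12b} \\
& y_i \le \delta_i^{|V(T_i)|-1}, \qquad \sum_{j=1}^{|V(T_{i+1})|-1} \delta_{i+1}^j \le y_i, \qquad y_i \in \{0,1\} \quad \forall i \in \{1, \dots, |T| - 1\}. \tag{12c}
\end{align}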

This formulation has been studied in Croxton et al. (2003a), Dantzig (1963, 1960), Keha et al.

(2004), Markowitz and Manne (1957), Padberg (2000), Sherali (2001), Vajda (1964) and Wilson

(1998) and it is sometimes referred to as the delta method. Following Croxton et al. (2003a) and

Keha et al. (2004) we refer to it as the incremental model and denote it by Inc.

4. Properties of Mixed Integer Programming Formulations

In this section we study some properties of the formulations. We begin by studying the strength

of the formulations as a model of epi(f) ignoring possible interactions with other constraints. For

this case a motivating problem is the minimization of f :D⊂Rn→R over its domain D given by

\[ \min_{x \in D} f(x) = \min_{(x,z) \in \operatorname{epi}(f)} z. \tag{13} \]

We then study the effects of interactions with other constraints using as a motivating problem

\[ \min_{x \in X} f(x) = \min_{(x,z) \in \operatorname{epi}(f) \cap (X \times \mathbb{R})} z, \tag{14} \]


where X ⊂ D is any compact set. Finally, we study the sizes of the formulations and their requirements

on the family of polytopes P used to describe the piecewise linear function.

Consider a MIP formulation of epi(f) given by a polytope P ⊂ Rn+p+q+1 complying with (1).

The linear programming (LP) relaxation of the formulation is then simply P . Alternative MIP

formulations are usually compared with respect to the tightness of their LP relaxation in the

absence of additional constraints. In this regard, the strongest possible property of a MIP formulation

is to require that all vertices of its LP relaxation comply with the corresponding integrality

requirements. Formulations with this property are referred to as locally ideal in Padberg (2000)

and Padberg and Rijal (1996). It is shown in Lee and Wilson (2001), Padberg (2000) and Wilson

(1998) that CC is not locally ideal. However all of the other formulations from Section 3 are locally

ideal.

Theorem 1. All formulations from Section 3 except CC are locally ideal.

The proof of this and other statements in this section are given in Appendix EC.2. For a locally

ideal formulation P of epi(f) we have

\[ \min_{(x,z,\lambda,y) \in P} z = \min_{x \in D} f(x), \tag{15} \]

which allows solving (13) directly as an LP and can be useful for solving (14) with a branch-and-bound

algorithm. However, as noted in Croxton et al. (2003a) and Keha et al. (2004), property

(15) might still hold for non-locally ideal formulations such as CC. In fact, we will see that (15) is

implied by a geometric property introduced by Jeroslow and Lowe, but is weaker than the locally

ideal property.

A slightly restricted version of Proposition 3.1 in Jeroslow and Lowe (1984) states that for any

closed set S ⊂Rn×R and for any binary mixed-integer programming model P ⊂Rn+p+q+1 for S,

the projection of P onto the first n+ 1 variables contains the convex hull of S. Jeroslow and Lowe

referred to a model P of S as sharp when the projection is exactly the convex hull of S. By letting

S be the epigraph of piecewise linear function f we directly get the following result.


Theorem 2. (Croxton et al. 2003a, Jeroslow and Lowe 1984, Lowe 1984) Let $D \subset \mathbb{R}^n$ be a polytope, $f : D \to \mathbb{R}$ be a continuous piecewise linear function, $P \subset \mathbb{R}^{n+p+q+1}$ be a MIP formulation for epi(f) satisfying (1) and $P_{(x,z)}$ be the projection of P onto (x, z). Then $\operatorname{epi}(\operatorname{convenv}_D(f)) = \operatorname{conv}(\operatorname{epi}(f)) \subset P_{(x,z)}$, where $\operatorname{convenv}_D(f)$ is the lower convex envelope of f over D.

A formulation P of epi(f) is said to be sharp when $\operatorname{epi}(\operatorname{convenv}_D(f)) = \operatorname{conv}(\operatorname{epi}(f)) = P_{(x,z)}$.

Because minx∈D f(x) = minx∈D convenvD(f)(x) we have that (15) holds for sharp formulations.

Sharpness has been shown to hold for some formulations in Croxton et al. (2003a), Jeroslow (1987,

1989), Jeroslow and Lowe (1984, 1985), Keha et al. (2004), Lowe (1984), Padberg (2000) and

Sherali (2001) and the following proposition states that it holds for any locally ideal formulation.

Proposition 1. Any locally ideal formulation is sharp.

We then directly have that all formulations except CC are sharp. As noted in Section 3.2,

CC can be obtained from DCC in a way which reduces its tightness. Fortunately, this loss of

tightness does not affect the sharpness properties of CC so the following theorem holds.

Theorem 3. All formulations from Section 3 are sharp.
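For a univariate function the lower convex envelope that appears in the definition of sharpness is simply the lower convex hull of the breakpoints. The sketch below computes it for the function of Figure 1(a) with a monotone chain scan; this is only an illustration of the envelope itself and does not solve any of the LP relaxations. The names `cross` and `lower_convex_envelope` are hypothetical.

```python
# Breakpoints (d_k, f(d_k)) of the Figure 1(a) example, computed from form (2).
d = [0, 1, 2, 4, 5]
m = [22, 8, -17.5, 10]
c = [10, 24, 75, -35]
points = [(d[0], m[0] * d[0] + c[0])] + [
    (d[i], m[i - 1] * d[i] + c[i - 1]) for i in range(1, len(d))
]

def cross(o, a, b):
    """Positive when the turn o -> a -> b is counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def lower_convex_envelope(pts):
    """Breakpoints of the lower convex envelope (lower hull, left to right)."""
    hull = []
    for p in sorted(pts):
        # Drop the last point while it lies on or above the segment hull[-2] -> p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull

envelope = lower_convex_envelope(points)
```

Here the envelope keeps only the breakpoints at x = 0, 4 and 5, and its minimum coincides with the minimum of f, in line with the argument that (15) holds for sharp formulations.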

Sharpness is not preserved when x complies with additional constraints, so a property similar

to (15) does not hold for (14). However, it is still possible to characterize the LP bound obtained

when a sharp formulation is used to model the objective function of a larger model. The following

theorem follows directly from the definitions of sharpness and convex envelopes.

Theorem 4. Let D ⊂ Rn be a polytope, f : D→ R be a continuous piecewise linear function,

P ⊂Rn+p+q+1 be a sharp binary mixed-integer programming model for epi(f) and X be a compact

set. Then $\min_{x,z,\lambda,y} \{ z : (x, z, \lambda, y) \in P,\ x \in X \} = \min_{x \in X} \operatorname{convenv}_D(f)(x)$.

For the case where X is a polytope this has also been studied in Croxton et al. (2003a) and

Croxton et al. (2007) and together with Theorem 3 yields the following corollary.

Corollary 1. All formulations from Section 3 give the same LP bound for solving (14).


Now we present the sizes of all the formulations given in Section 3. We give the number of extra

constraints and extra variables besides z and x and also indicate the number of extra variables

that are binary. Table 1 shows this information for all models. Except for Log and MC the sizes

are given as a function of n, |P| and the number of vertices |V(P)| or |V (P )|. For MC the size is

a function of n, |P| and the number of facets of polytope P denoted by F (P ). In particular if P is

a triangulation we have that |F (P )| ≤ n+ 1 for all P ∈P. For Log the size is a function of |V(P)|

and |S| where S is the branching scheme for the J1 triangulation of $[0,K]^n$. In this case we have $|\mathcal{P}| = K^n n!$ and $|S| = n \lceil \log_2(K) \rceil + n(n-1)/2$, but it is not clear how to explicitly relate these numbers together when n > 2. However we can see that |S| grows asymptotically as $\log_2(|\mathcal{P}|)$ only when n is fixed. More specifically, for fixed n we have $|S| \sim \log_2(|\mathcal{P}|)$ (i.e. $\lim_{K \to \infty} |S| / \log_2(|\mathcal{P}|) = 1$) with $|S| = \log_2(|\mathcal{P}|)$ for K of the form $2^r$, but for fixed K we have $\log_2(|\mathcal{P}|) \in o(|S|)$ (i.e. $\lim_{n \to \infty} \log_2(|\mathcal{P}|) / |S| = 0$).
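The two growth regimes can be tabulated numerically from the expressions for |P| and |S| given above. The sketch below uses the log-gamma function to obtain log2(n!) without forming large factorials; `log_sizes` is a hypothetical helper, and exact equality of the two quantities is only checked here for n = 2, where it is easy to verify by hand.

```python
from math import lgamma, log, log2

def log_sizes(n, K):
    """Return (|S|, log2|P|) for the J1 triangulation of [0, K]^n."""
    bits = (K - 1).bit_length()                 # equals ceil(log2(K))
    size_S = n * bits + n * (n - 1) // 2
    log2_P = n * log2(K) + lgamma(n + 1) / log(2)   # log2(K^n n!)
    return size_S, log2_P

# Fixed n, growing K: the ratio |S| / log2|P| approaches 1.
ratio_fixed_n = [log_sizes(3, 2 ** r)[0] / log_sizes(3, 2 ** r)[1] for r in (1, 10, 40)]

# Fixed K, growing n: the ratio log2|P| / |S| approaches 0.
ratio_fixed_K = [log_sizes(n, 2)[1] / log_sizes(n, 2)[0] for n in (10, 100, 1000)]
```

The second ratio decays slowly, roughly like 2 log2(n) / n, so for a fixed grid size the number of binary variables of Log eventually exceeds log2 of the number of simplices by a growing factor.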

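Table 1 Sizes of Formulations

Model   Constraints                                Additional Variables                                            Binaries
DCC     $n + |\mathcal{P}| + 2$                    $|\mathcal{P}| + \sum_{P \in \mathcal{P}} |V(P)|$               $|\mathcal{P}|$
DLog    $n + 2\lceil \log_2(|\mathcal{P}|) \rceil + 2$   $2\lceil \log_2(|\mathcal{P}|) \rceil + \sum_{P \in \mathcal{P}} |V(P)|$   $2\lceil \log_2(|\mathcal{P}|) \rceil$
CC      $n + 3 + |\mathcal{V}(\mathcal{P})|$       $|\mathcal{V}(\mathcal{P})| + |\mathcal{P}|$                    $|\mathcal{P}|$
Log     $n + 2 + 2|S|$                             $|\mathcal{V}(\mathcal{P})| + |S|$                              $|S|$
MC      $n + 2 + \sum_{P \in \mathcal{P}} F(P)$    $(n + 1)|\mathcal{P}|$                                          $|\mathcal{P}|$
Inc     $1 + 2|\mathcal{P}|$                       $|\mathcal{P}| - 1 + \sum_{P \in \mathcal{P}} (|V(P)| - 1)$     $|\mathcal{P}| - 1$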

Finally, we summarize the requirements that the different formulations have on the family of

polytopes P used to describe the piecewise linear function. The first type of requirement concerns

the description of the polytopes in P as either the convex hull of a finite number of points (vertex

representation) or as the feasible region of a system of linear inequalities (inequality representation).

Although conversion between the two descriptions can be done efficiently for special cases of P

such as triangulations, the description requirements can be an important factor in the choice of the

formulation when general polytopes are used. We have seen that every formulation except MC uses

the vertex representation. The second type of requirements concerns the need for a particular family

of polytopes P. Although requiring P to be of a special class such as a triangulation is usually not


too restrictive, it can be an important factor when the function is constructed as the interpolation

of a non-linear function (Carnicer and Floater 1996, Pottmann et al. 2000). We have seen that

DCC, DLog, CC, and MC have no requirement on P. Inc requires P to be any triangulation

which complies with conditions O1–O2 described in Section 3.4 and Log requires P to be compatible

with the J1 triangulation.

5. Computational Experiments for Continuous Functions

In this section we computationally test the formulations for continuous piecewise linear functions.

Our tests are on transportation problems with piecewise linear objective functions. We believe

these problems provide enough additional constraints to provide meaningful results while allowing

the piecewise linear objectives to dominate the optimization effort.

All models were generated using Ilog Concert 2 and solved using CPLEX 11 on a 2.4GHz

workstation with 2GB of RAM. Furthermore, all tests were run with a time limit of 10000 seconds.

5.1. Continuous Separable Concave Functions

The first set of experiments considers formulations for univariate functions. The instances tested

for these formulations are the same transportation problems with concave separable piecewise

linear objectives considered in Vielma and Nemhauser (2008a). These instances are based on the

10 × 10 transportation problems used in Keha et al. (2006) and Vielma et al. (2008). Each of the

problems includes the supply and demand information and capacities $u_e$ for each arc e. The problems

also include the subdivision of $[0, u_e]$ into 4 randomly selected intervals and their generation is

described in Keha et al. (2006). For each of the 5 instances we constructed several randomly

generated piecewise linear separable objective functions. These objective functions are of the form $\sum_{e \in E} f_e(x_e)$, where E is the set of arcs of the transportation problem and $f_e(x_e)$ is a continuous

non-decreasing concave piecewise linear function of the flow xe on arc e. We chose to use this

class of functions because they are widely used in practice and are challenging enough to provide

meaningful computational results. Each fe(xe) is affine in K segments and has fe(0) = 0. The

slopes for each segment of a particular fe were generated by obtaining a sample of size K from


set {z/1000 : z ∈ {1, . . . ,2000}} and sorting them to ensure concavity. We considered K = 4, 8, 16,

and 32 and for each K and for each of the 5 transportation problems we generated 20 objective

functions for a total of 100 instances for each K. To obtain the subdivisions of [0, ue] into 8, 16 and

32 intervals we simply recursively divided in half each of the intervals starting with the original 4

from Keha et al. (2006). Furthermore, we independently generated the objective functions for each

choice of K.
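The generation procedure just described can be sketched as follows. This is an illustration only: the text does not say whether slopes are sampled with or without replacement (sampling without replacement is assumed here), and the function names and the example breakpoints are hypothetical.

```python
import random

def halve(breakpoints):
    """Split every interval in half, doubling the number of intervals."""
    out = [breakpoints[0]]
    for a, b in zip(breakpoints, breakpoints[1:]):
        out += [(a + b) / 2, b]
    return out

def random_concave_pwl(breakpoints, rng):
    """Return (slopes, values) of a concave, non-decreasing f with f(0) = 0.

    Slopes are drawn from {z/1000 : z in 1..2000} (assumed without replacement)
    and sorted in non-increasing order to ensure concavity.
    """
    K = len(breakpoints) - 1
    slopes = sorted((z / 1000 for z in rng.sample(range(1, 2001), K)), reverse=True)
    values = [0.0]
    for k in range(K):
        values.append(values[-1] + slopes[k] * (breakpoints[k + 1] - breakpoints[k]))
    return slopes, values

rng = random.Random(0)                      # fixed seed, for reproducibility only
bp8 = halve([0, 10, 25, 40, 60])            # hypothetical 4 intervals split into 8
slopes, values = random_concave_pwl(bp8, rng)
```

Because every slope is positive and the slopes are non-increasing, the resulting function is non-decreasing and concave, matching the class of objectives used in these experiments.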

We tested all mixed integer formulations from Section 3 and in addition we tested the traditional

SOS2 formulation of piecewise linear functions (see for example Keha et al. (2004)) which does

not include binary variables. We implemented this formulation using CPLEX’s built in support for

SOS2 constraints and we refer to it as SOS2 in the computational results.

Table 2 shows the minimum, average, maximum and standard deviation of the solve times in

seconds. The table also shows the number of times the solves failed because the time limit was

reached and the number of times each formulation had the fastest solve time (win or tie).

Table 2 Solve times for univariate continuous functions [s].

(a) 4 segments.
model   min    avg    max     std    win   fail
MC      0.2    1.3    8.3     1.5    45    0
SOS2    0.1    1.7    7.9     1.3    26    0
Log     0.2    2.1    12.4    2.3    24    0
DLog    0.3    2.1    10.3    2.2    4     0
Inc     0.4    2.4    11.6    2.5    2     0
DCC     0.3    2.6    14.0    2.5    0     0
CC      0.3    4.6    23.0    4.3    0     0

(b) 8 segments.
model   min    avg    max     std    win   fail
MC      1.2    9.9    39      7.0    41    0
Log     0.6    12.3   84      10.5   31    0
DLog    0.8    13.2   91      11.6   5     0
SOS2    0.8    15.8   202     23.0   23    0
DCC     2.6    42.7   252     46.6   0     0
Inc     5.1    43.0   163     29.3   0     0
CC      2.6    81.0   570     96.6   0     0

(c) 16 segments.
model   min    avg    max     std    win   fail
Log     0.5    24     96      18     80    0
DLog    0.8    32     132     25     17    0
MC      1.9    97     730     122    2     0
SOS2    1.9    109    1030    167    1     0
Inc     29.8   302    1442    239    0     0
CC      3.9    351    3691    517    0     0
DCC     3.9    1366   10000   2120   0     3

(d) 32 segments.
model   min    avg    max     std    win   fail
Log     2.5    43     194     39     90    0
DLog    5.5    63     328     53     8     0
SOS2    10.0   925    10000   1900   2     2
Inc     271.0  981    4039    685    0     0
CC      67.5   1938   10000   2560   0     4
MC      22.5   2246   10000   3208   0     9
DCC     89.6   8163   10000   3141   0     69

For K = 4 we see that the average solve time for all formulations is of the same order of magnitude,

but for larger K’s the difference between models becomes noticeable. Many conclusions could


be extracted from these results, but they should be taken with care as they can depend on both

the instances and the solver used. For example, MC is faster on average than Inc for K’s ranging

from 4 to 16, but in previous tests using CPLEX 9.1 the average solve time for Inc was always

better than or comparable to MC. Nevertheless, we make the following observations.

We see that the logarithmic formulations Log and DLog can have a significant advantage over

the other formulations (up to over an order of magnitude for K = 32) for K’s larger than 4 and that,

as expected, this advantage grows with K. Another interesting observation concerns SOS2, which

in previous tests with CPLEX 9.1 was significantly slower than most mixed integer programming

formulations. It seems that the reason for this bad performance was more of an implementation

issue than a property of the SOS2 based formulation (Vielma et al. 2008). As the results show, the

implementation of SOS2 constraints has been significantly improved in CPLEX 11 which allows

SOS2 to always be among the 5 best formulations. In fact, it is only for K = 32 that we have

mixed integer formulations outperforming SOS2 by more than an order of magnitude.

In an attempt to explain the results from Table 2 we study some characteristics of the solves by

CPLEX. In Table 3 we present some results for the instances with K = 8 (Appendix EC.3 includes

the same results for K = 4, K = 16, and K = 32). Corollary 1 states that all MIP formulations

should provide the same LP relaxation bound and so should SOS2 (Keha et al. 2004). We confirmed

this is true up to small numerical errors and, as expected, the common bound was not equal to

the optimal MIP solution resulting in an average integrality GAP of 5% (calculated as 100(zIP −

zLP )/zIP where zIP and zLP are the optimal values of the mixed integer program and its LP

relaxation respectively). However, an equality in the LP relaxation bound does not necessarily imply

an equality on the LP bound obtained at the root node by CPLEX as this includes preprocessing

and cuts. For this reason we present in Table 3(a) the percentage of the integrality GAP that

was closed by CPLEX at the root node for the different formulations (this was calculated as

100(zroot − zLP )/(zIP − zLP ) where zroot is the optimal values of the root relaxation obtained by

CPLEX after preprocessing and cutting planes). A second issue is the time required to solve the

LP relaxation of the different formulations, which we present in Table 3(b). Because the solve


times were very small for all formulations, the results from Table 3(b) are in milliseconds. Finally, in Table 3(c) we present the number of nodes processed by CPLEX.

model    min    avg    max    std
MC      17.4     37   61.2    9.5
DCC     11.8     23   38.6    6.0
DLog     4.7     20   46.7    7.2
Log      6.2     19   35.6    6.6
Inc      5.3     19   39.3    6.9
CC       9.3     16   39.2    4.9
SOS2     0.0      0    1.2    0.1

(a) GAP closed at root node by CPLEX. [%]

model    min    avg    max    std
SOS2       0    4.3     10    4.3
Log        0    9.5     20    9.5
DCC        0   11.5     20   11.5
CC         0   11.6     20   11.6
DLog       0   11.9     20   11.9
MC        10   24.3     40   24.3
Inc       20   38.1     50   38.1

(b) LP relaxation solve time. [ms]

model    min     avg      max     std
MC        64     535     2003     301
Inc      142    1970     8814    1611
DLog     134    2419    17114    2415
Log      120    2591    22541    2777
DCC      549   13956   120035   19253
CC       500   17276   127110   22467
SOS2     606   21833   337199   39081

(c) Branch-and-bound nodes processed.

Table 3 Solve characteristics for univariate continuous functions and K = 8.

The complexity of CPLEX makes it hard to infer categorical conclusions about these results, but we will comment on some

interesting patterns. Note that a larger formulation might have an LP relaxation which is slower

to solve, but it might allow CPLEX to close a larger percentage of the integrality GAP. This can

lead to fewer branch-and-bound nodes needed to solve the problem, which can translate to faster

solve speeds. An example of this behavior is MC, which has the second slowest solve time for its LP

relaxation, but allows CPLEX to close the largest percentage of the integrality GAP resulting in

the best performances in both number of nodes and solve times. On the other hand, having a small

formulation can have the reverse effect on the LP relaxation solve speeds and closed GAP, but

might still provide an advantage. For example, SOS2 is one of the smallest formulations as it does

not include any binary variables. We can see that CPLEX does not close a significant percentage

of the integrality GAP for SOS2, which translates into a need to process a large number of nodes.

However, having the fastest solve time for its LP relaxation allows this formulation to still have

an excellent performance with respect to solve times. Still, faster solves of its LP relaxation and


large percentages of root GAP closed might not necessarily translate to better performance. For

example, DCC is on average better than Inc and DLog with respect to both solve speed of its LP

relaxation and GAP closed at the root node. However, Inc and DLog have a better or comparable

performance than DCC in terms of both solve times and nodes processed. This is particularly

surprising for DLog which is essentially the same as DCC but with fewer variables. A possible

explanation for this behavior is that Log, Inc and DLog allow CPLEX to perform a more effective

branch-and-bound search. DCC produces unbalanced branch-and-bound trees as fixing a binary

variable to zero produces very little change compared to fixing the same variable to one. In contrast,

Log and DLog are designed to produce balanced branch-and-bound trees, and Inc also produces

a fairly balanced tree since fixing a binary to a particular value in Inc usually fixes many other

variables to take the same value.
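
The two percentages used in this discussion, the integrality GAP 100(zIP − zLP)/zIP and the share of that GAP closed at the root node 100(zroot − zLP)/(zIP − zLP), can be computed as in the following Python sketch. The function names and the numerical values are hypothetical and were chosen only to make the arithmetic easy to check; they are not data from the experiments.

```python
def integrality_gap_pct(z_ip: float, z_lp: float) -> float:
    """Integrality GAP in percent: 100 (z_IP - z_LP) / z_IP."""
    return 100.0 * (z_ip - z_lp) / z_ip


def root_gap_closed_pct(z_ip: float, z_lp: float, z_root: float) -> float:
    """Percentage of the integrality GAP closed at the root node:
    100 (z_root - z_LP) / (z_IP - z_LP)."""
    if z_ip == z_lp:
        raise ValueError("there is no integrality gap to close")
    return 100.0 * (z_root - z_lp) / (z_ip - z_lp)


# Hypothetical objective values (not taken from the paper).
z_ip, z_lp, z_root = 200.0, 190.0, 193.0
print(integrality_gap_pct(z_ip, z_lp))          # 5.0
print(root_gap_closed_pct(z_ip, z_lp, z_root))  # 30.0
```

Under these definitions a formulation such as SOS2, for which the root bound stays at the LP relaxation bound, has a closed GAP of 0% even though its LP relaxation bound equals that of the other formulations.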

5.2. Continuous Non-Separable Functions

We now consider non-separable functions of two variables. For these experiments we selected a

series of two commodity transportation problems with 5 supply nodes and 2 demand nodes. These

instances were constructed by combining two 5× 2 transportation problems generated in a man-

ner similar to the instances used in Vielma et al. (2008). The supplies, demands and individual

commodity arc capacities for each commodity were obtained from two different transportation

problems and the joint arc capacities were set to 3/4 of the sum of the corresponding individual

arc capacities. We considered an objective function of the form $\sum_{e \in E} f_e(x^1_e, x^2_e)$ where $E$ is the common set of 10 arcs of the transportation problems and $f_e(x^1_e, x^2_e)$ is a piecewise linear function of the flows $x^i_e$ in arc $e$ of commodity $i$ for $i = 1,2$. Each $f_e(x^1_e, x^2_e)$ for arc $e$ with individual arc capacities $u^i_e$ for commodity $i = 1,2$ was constructed by triangulating $[0, u^1_e] \times [0, u^2_e]$ with the $J_1$ triangulation induced by the grid obtained from the subdivision of $[0, u^1_e]$ and $[0, u^2_e]$ into $K$ intervals as determined from the respective original transportation problems. For $K$ ranging from 4 to 16 the number of vertices and triangles range from 25 to 289 and from 32 to 512 respectively. Using this triangulation we then obtained $f_e(x^1_e, x^2_e)$ by interpolating $g(\|(x^1_e, x^2_e)\|)$ where $\|\cdot\|$ is the Euclidean norm and $g : [0, \|(u^1_e, u^2_e)\|] \to \mathbb{R}$ is a continuous concave piecewise linear function

randomly generated in a similar way to the univariate functions of Section 5.1. The idea of this

function is to use the sub-linearity of the Euclidean norm to consider discounts for sending the two

commodities in the same arc and concave function g to consider economies of scale. We selected

5 combinations of different pairs of the original transportation problems and for each one of these

we generated 20 objective functions for a total of 100 instances for each K.
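
The vertex and triangle counts quoted above can be reproduced with a small Python sketch of the grid construction. It assumes the usual Union Jack pattern for the J1 triangulation, in which the diagonal of each grid square joins its two corners with even coordinate sum; the helper names `j1_triangulation` and `vertex_values` are hypothetical and this is not the code used for the experiments.

```python
import math


def j1_triangulation(k: int):
    """Triangles of a Union Jack (J1) triangulation of a k x k grid.

    Vertices are integer index pairs (i, j) with 0 <= i, j <= k.
    Assumption: in each square the diagonal joins the two corners
    whose index sum is even.
    """
    triangles = []
    for i in range(k):
        for j in range(k):
            a, b, c, d = (i, j), (i + 1, j), (i + 1, j + 1), (i, j + 1)
            if (i + j) % 2 == 0:      # diagonal a-c
                triangles += [(a, b, c), (a, c, d)]
            else:                     # diagonal b-d
                triangles += [(a, b, d), (b, c, d)]
    return triangles


def vertex_values(breaks1, breaks2, g):
    """Values g(||(x1, x2)||) at the grid vertices; the piecewise linear
    function interpolates these values on every triangle."""
    return {(i, j): g(math.hypot(x1, x2))
            for i, x1 in enumerate(breaks1)
            for j, x2 in enumerate(breaks2)}


for k in (4, 8, 16):
    # number of intervals, number of vertices, number of triangles
    print(k, (k + 1) ** 2, len(j1_triangulation(k)))
```

For K = 4 and K = 16 the loop prints 25 vertices with 32 triangles and 289 vertices with 512 triangles, matching the ranges stated in the text.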

Table 4 shows the usual statistics for the solve times with different grid sizes for all the appropriate formulations. We again used a limit of 10000 seconds and only tested a formulation for the next larger K if it had failed in fewer than 5 instances for the previous K.

model     min     avg     max      std  wins  fail
Log       0.4     2.7     9.3      2.0    93     0
MC        1.2     5.6    17.1      3.1     7     0
DLog      1.6     7.6    25.5      5.2     0     0
CC        5.9    17.8   107.2     14.5     0     0
Inc       2.8    31.7   126.5     25.8     0     0
DCC       8.1    36.8   476.1     50.6     0     0

(a) 4 × 4 grid.

model     min     avg     max      std  wins  fail
Log       1.7      13      33      5.4   100     0
DLog     17.8      45     135     20.2     0     0
MC       30.9     398    5328    583.6     0     0
Inc      99.5     769    6543   1110.5     0     0
CC      102.9    4412   10000   3554.6     0    13
DCC     237.0    6176   10000   3385.9     0    31

(b) 8 × 8 grid.

model     min     avg     max      std  wins  fail
Log        27      56     118       19   100     0
DLog      125     325    1064      128     0     0
Inc       772    4857   10000     3429     0    20
MC       2853    9266   10000     1678     0    77

(c) 16 × 16 grid.

Table 4 Solve times for two variable multi-commodity transportation problems [s].

Logarithmic models Log and DLog were among the best performers for all grid sizes, probably because for two variable functions |P| grows much faster with K than in the univariate case. For example, for K = 4 a K × K grid yields |P| = 32, which is comparable to K = 32 in the univariate case. In addition, the smaller number of continuous variables is what probably allows Log to be the best performer overall.

6. Extension to Lower Semicontinuous Functions

In this section we study the extension of the formulations to discontinuous functions such as the

ones in Figure 3. Consider first the univariate piecewise linear discontinuous function g depicted in


Figure 3(a), for which $g^-(d) = \lim_{x \to d,\, x \le d} g(x)$ and $g^+(d) = \lim_{x \to d,\, x \ge d} g(x)$. Function $g$ is now only affine in $[0,2)$, $\{2\}$, $(2,4]$ and $(4,5]$. However, because $g$ is lower semicontinuous we have that epi($g$) is closed and is still the union of polyhedra with common recession cone $C^+_1$. Hence we can model epi($g$) as a binary mixed-integer programming problem.

[Figure 3: (a) univariate function g with breakpoints 0, 2, 4, 5 and labeled values g(0) = 1, g^-(2) = 4, g(2) = 2, g^+(2) = 3, g(4) = 0, g^+(4) = 1, g(5) = 3; (b) bivariate function h with axes x and y.]

Figure 3 Lower semicontinuous piecewise linear functions.

The example from Figure 3(a) shows that to consider discontinuous univariate piecewise linear functions we need to use intervals that are

not necessarily of the form [di−1, di] for di−1 < di. The inclusion of points described as {d}= [d, d]

complies with Definition 1 as we did not require the polytopes to be full dimensional. In contrast,

the inclusion of non closed intervals such as [0,2) requires the use of sets other than polytopes. The

simplest extension we can use is to consider bounded sets that can be described by a finite number

of strict and non-strict linear inequalities. These sets are usually referred to as copolytopes (Kannan

1992). Using copolytopes instead of polytopes we get the following definition for not necessarily

continuous piecewise linear functions.

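Definition 2 (Piecewise Linear Function). Let $D \subset \mathbb{R}^n$ be a compact set. A (not necessarily continuous) function $f : D \subset \mathbb{R}^n \to \mathbb{R}$ is piecewise linear if and only if there exists a finite family of copolytopes $\mathcal{P}$ complying with $D = \bigcup_{P \in \mathcal{P}} P$ and (3) for some $\{m_P\}_{P \in \mathcal{P}} \subseteq \mathbb{R}^n$ and $\{c_P\}_{P \in \mathcal{P}} \subseteq \mathbb{R}$.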

For example, function h from Figure 3(b) can be described as

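$$h(x, y) := \begin{cases} 3 & (x, y) \in P_1 \\ 2 & (x, y) \in P_2 \\ 2 & (x, y) \in P_3 \\ 0 & (x, y) \in P_4 \end{cases} \tag{16}$$

for $P_1 = (0,1]^2$, $P_2 = \{(x, y) \in \mathbb{R}^2 : x = 0,\ y > 0\}$, $P_3 = \{(x, y) \in \mathbb{R}^2 : y = 0,\ x > 0\}$ and $P_4 = \{(0,0)\}$.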

A piecewise linear function as defined in Definition 2 is not necessarily lower semicontinuous,

but this condition is crucial for obtaining a mixed integer programming model. For a lower semi-

continuous piecewise linear function f we have a direct extension of characterization (4) to

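$$\operatorname{epi}(f) = C^+_n + \bigcup_{P \in \mathcal{P}} \operatorname{conv}\left(\{(v, m_P v + c_P)\}_{v \in V(\overline{P})}\right), \tag{17}$$

where $V(\overline{P})$ denotes the set of vertices of the closure $\overline{P}$ of $P$. We note that the closure of a copolytope $P = \{x \in \mathbb{R}^n : a_i x \le b_i\ \forall i \in \{1, \ldots, p\},\ a_i x < b_i\ \forall i \in \{p+1, \ldots, m\}\}$ is $\overline{P} = \{x \in \mathbb{R}^n : a_i x \le b_i\ \forall i \in \{1, \ldots, m\}\}$. We now study the extension of formulations from Section 3 to the lower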

semicontinuous case and comment on the properties of the extensions.
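
To make characterization (17) concrete, the following Python sketch evaluates the lower boundary of epi(g) for the univariate function g of Figure 3(a) by taking, for each piece, the segment spanned by the vertices of its closure. The piece data are read off the values labeled in Figure 3(a); the names `PIECES`, `lower_envelope` and `in_epigraph` are hypothetical and the code is only an illustration of the characterization, not one of the MIP formulations.

```python
# Closure of each piece of g, given by its two end vertices (d, value).
# One-sided limits are used at the open ends of the intervals.
PIECES = [
    ((0.0, 1.0), (2.0, 4.0)),   # [0, 2): g(0) = 1,   g^-(2) = 4
    ((2.0, 2.0), (2.0, 2.0)),   # {2}:    g(2) = 2
    ((2.0, 3.0), (4.0, 0.0)),   # (2, 4]: g^+(2) = 3, g(4) = 0
    ((4.0, 1.0), (5.0, 3.0)),   # (4, 5]: g^+(4) = 1, g(5) = 3
]


def lower_envelope(x: float) -> float:
    """Smallest z such that (x, z) lies in the union of segments in (17)
    shifted by the cone of upward directions."""
    best = float("inf")
    for (d0, f0), (d1, f1) in PIECES:
        if d0 <= x <= d1:
            t = 0.0 if d1 == d0 else (x - d0) / (d1 - d0)
            best = min(best, f0 + t * (f1 - f0))
    return best


def in_epigraph(x: float, z: float) -> bool:
    """Membership test for epi(g) on the domain [0, 5]."""
    return z >= lower_envelope(x)


print(lower_envelope(2.0), lower_envelope(4.0))  # 2.0 0.0
```

At the discontinuities x = 2 and x = 4 the minimum over the closed pieces returns g(2) = 2 and g(4) = 0, which is exactly where lower semicontinuity is needed: the closures add the points (2, 4), (2, 3) and (4, 1), and these lie in epi(g) only because g takes the smallest of the competing values there.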

Formulations DCC, DLog and MC directly model epi($f$) so their extension to the lower semicontinuous case is achieved by replacing characterization (4) of epi($f$) for continuous $f$ by characterization (17) of epi($f$) for lower semicontinuous $f$. Because $V(P)$ in (4) is replaced by $V(\overline{P})$ in (17) the extension of DCC is obtained by replacing $V(P)$ by $V(\overline{P})$ in (5). For univariate functions this extension has been noted in Croxton et al. (2003a) and Sherali (2001). Similarly, the extension of DLog is obtained by replacing $V(P)$ by $V(\overline{P})$ in (6). The extension of MC is obtained from (10) by replacing (10b) by $A_P \lambda_P \le y_P b_P\ \forall P \in \mathcal{P}$ where $A_P \lambda_P \le b_P$ is the set of linear inequalities describing polytope $\overline{P}$. For univariate functions this extension has been noted in Croxton et al. (2003a). In Appendix EC.4 we illustrate these formulations for the function depicted in Figure 3(a).

Additionally, for simple discontinuities we can use ad-hoc techniques to adapt other formulations

as well. We explore two such techniques for univariate functions.

The first technique is from Vielma et al. (2008) and involves duplicating break points at which

a univariate function is discontinuous. For a univariate lower semicontinuous piecewise linear

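function $f : [0, u] \to \mathbb{R}$ we always have an integer $K$ and real numbers $(d_k)_{k=0}^K$ and $(f_k)_{k=0}^K$ such that $0 = d_0 \le d_1 \le \ldots \le d_K = u$, $f_k$ is equal to $f(d_k)$, $f^-(d_k)$ or $f^+(d_k)$ and

$$\operatorname{epi}(f) = C^+_1 + \left(\bigcup_{k=1}^K \operatorname{conv}\left(\{(d_{k-1}, f_{k-1}), (d_k, f_k)\}\right)\right).$$

Using this characterization we can adapt CC to obtain the formulation given by

$$\sum_{k=0}^K \lambda_k d_k = x, \quad \sum_{k=0}^K \lambda_k f_k \le z, \quad \sum_{k=0}^K \lambda_k = 1, \quad \lambda_k \ge 0 \ \ \forall k \in \{0, \ldots, K\} \tag{18a}$$

$$\lambda_0 \le y_1, \quad \lambda_K \le y_K, \quad \lambda_k \le y_k + y_{k+1} \ \ \forall k \in \{1, \ldots, K-1\}, \quad \sum_{k=1}^K y_k = 1, \quad y \in \{0,1\}^K. \tag{18b}$$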

We can also adapt Inc, Log and SOS2. For Log we replace (18b) by the corresponding constraints (9c), which in this case are $\sum_{k \in L_s} \lambda_k \le y_s$, $\sum_{k \in R_s} \lambda_k \le (1 - y_s)$ and $y_s \in \{0,1\}$ for all $s \in \{1, \ldots, \lceil \log_2(K) \rceil\}$, where

$$L_s := \{k \in \{0, \ldots, K\} : (k = 0 \text{ or } G^k_s = 1) \text{ and } (k = K \text{ or } G^{k+1}_s = 1)\}$$

and

$$R_s := \{k \in \{0, \ldots, K\} : (k = 0 \text{ or } G^k_s = 0) \text{ and } (k = K \text{ or } G^{k+1}_s = 0)\}$$

for an arbitrary but fixed set of vectors $(G^l)_{l=1}^K \subset \{0,1\}^{\lceil \log_2(K) \rceil}$ that form a Gray code. For Inc we obtain the formulation given by

$$d_0 + \sum_{k=1}^K \delta_k (d_k - d_{k-1}) = x, \quad f_0 + \sum_{k=1}^K \delta_k (f_k - f_{k-1}) \le z, \tag{19a}$$

$$\delta_1 \le 1, \quad \delta_K \ge 0, \quad \delta_{k+1} \le y_k \le \delta_k, \quad y_k \in \{0,1\} \ \ \forall k \in \{1, \ldots, K-1\}. \tag{19b}$$

For SOS2 the adaptation is analogous to the one for CC and is described in Vielma et al. (2008).

We denote these models CC Dup, Inc Dup, Log Dup and SOS2 Dup. In Appendix EC.4 we

illustrate them for the function depicted in Figure 3(a).
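
The index sets used by the Log adaptation can be enumerated directly from a Gray code. The Python sketch below assumes a reflected binary Gray code and reads the definitions as: breakpoint k belongs to the set for bit s with value 1 (respectively 0) when bit s of the code of every interval adjacent to k equals 1 (respectively 0). The function names `gray_code` and `log_sets` are hypothetical.

```python
def gray_code(bits: int):
    """Reflected binary Gray code with 2**bits words; word[s] is bit s."""
    return [[((l ^ (l >> 1)) >> s) & 1 for s in range(bits)]
            for l in range(2 ** bits)]


def log_sets(K: int, G):
    """Index sets L_s and R_s over breakpoints 0..K.

    G[l - 1] is the code of interval l (the l-th interval, l = 1..K).
    Breakpoint k is adjacent to interval k (if k > 0) and to interval
    k + 1 (if k < K).
    """
    bits = len(G[0])
    L, R = [], []
    for s in range(bits):
        def adjacent_bits_equal(k, value):
            left = k == 0 or G[k - 1][s] == value   # interval k
            right = k == K or G[k][s] == value      # interval k + 1
            return left and right
        L.append([k for k in range(K + 1) if adjacent_bits_equal(k, 1)])
        R.append([k for k in range(K + 1) if adjacent_bits_equal(k, 0)])
    return L, R


G = gray_code(2)
print(G)               # [[0, 0], [1, 0], [1, 1], [0, 1]]
print(log_sets(4, G))  # ([[2], [3, 4]], [[0, 4], [0, 1]])
```

For K = 4 the first constraint pair reads λ2 ≤ y1 and λ0 + λ4 ≤ 1 − y1, and the second reads λ3 + λ4 ≤ y2 and λ0 + λ1 ≤ 1 − y2. Because consecutive code words differ in a single bit, each breakpoint is excluded from at most one pair, which is what lets any 0-1 assignment of y leave exactly two adjacent multipliers free.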

The second technique can be applied when all discontinuities of f are caused by fixed charge

type jumps. In this case, f is the sum of a continuous function fC of the form (2) and a lower

semicontinuous non-decreasing step function

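$$f_J(x) := \begin{cases} 0 & x = 0 \\ b_k & x \in (d_{k-1}, d_k] \ \ \forall k \in \{1, \ldots, K\} \end{cases} \tag{20}$$

for $(d_k)_{k=0}^K \in \mathbb{R}^{K+1}$, $(b_k)_{k=1}^K \in \mathbb{R}^K_+$ such that $0 = d_0 < d_1 < \ldots < d_K = u$ and $0 \le b_1 \le b_2 \le \ldots \le b_K$. Hence, for $(m_k)_{k=1}^K \in \mathbb{R}^K$ and $(c_k)_{k=1}^K \in \mathbb{R}^K$, $f$ can be described as

$$f(x) := \begin{cases} c_1 & x = 0 \\ m_k x + c_k + b_k & x \in (d_{k-1}, d_k] \ \ \forall k \in \{1, \ldots, K\}. \end{cases} \tag{21}$$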

By using the relation f = fC + fJ we can construct a model for epi(f) from models for epi(fC)

and epi(fJ). This combination of models is referred to as model linkage in Jeroslow and Lowe

(1985) where it is shown to computationally perform relatively poorly, in part because formulation


sharpness is not preserved by model linkage and in part because of poor coordination between the

binary variables of the linked models. Fortunately, as noted in Lowe (1984), it is sometimes possible

to improve model coordination by using ad-hoc techniques. We illustrate this possible coordination

by using two specific examples. In both cases we need a lower semicontinuous function f : [0, u]→R

which is continuous and zero valued at zero and hence has 0 = c1 = b1 in characterizations (20) and

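(21). The first coordination is for the model obtained by linking CC and the model of $f_J$ given by

$$\sum_{k=0}^K d_k \lambda_{d_k} = x, \quad \sum_{k=1}^K b_k w_k \le z, \quad \sum_{k=0}^K \lambda_{d_k} = 1, \quad \sum_{k=1}^K w_k = 1, \quad 0 \le \lambda_{d_0} \le w_1, \quad 0 \le \lambda_{d_K} \le w_K \tag{22a}$$

$$0 \le \lambda_{d_k} \le (w_k + w_{k+1}) \ \ \forall k \in \{1, \ldots, K-1\}, \quad w_k \in \{0,1\} \ \ \forall k \in \{1, \ldots, K\}. \tag{22b}$$

To coordinate we identify the $\lambda_{d_k}$ variables of the models and force $w_k = y_{[d_{k-1}, d_k]}$. The resulting model is given by

$$\sum_{k=0}^K d_k \lambda_{d_k} = x, \quad \lambda_{d_0} m_1 d_0 + \sum_{k=1}^K \left(\lambda_{d_k}(m_k d_k + c_k) + b_k w_k\right) \le z, \quad \sum_{k=0}^K \lambda_{d_k} = 1, \quad 0 \le \lambda_{d_0} \le w_1 \tag{23a}$$

$$0 \le \lambda_{d_K} \le w_K, \quad 0 \le \lambda_{d_k} \le (w_k + w_{k+1}) \ \ \forall k \in \{1, \ldots, K-1\}, \quad \sum_{k=1}^K w_k = 1, \quad w \in \{0,1\}^K. \tag{23b}$$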

We refer to this formulation as the coordinated convex combination model and denote it by

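CC Coord. A similar coordination can be achieved by linking Inc and another model of $f_J$. The resulting model is given by

$$\sum_{k=1}^K \delta_k (d_k - d_{k-1}) = x, \quad \sum_{k=1}^K (m_k d_k - m_k d_{k-1}) \delta_k + \sum_{k=1}^{K-1} (b_{k+1} - b_k) y_k \le z \tag{24a}$$

$$\delta_1 \le 1, \quad \delta_K \ge 0, \quad \delta_{k+1} \le y_k \le \delta_k, \quad y_k \in \{0,1\} \ \ \forall k \in \{1, \ldots, K-1\}. \tag{24b}$$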

This model has been studied in Keha (2003). We refer to this formulation as the coordinated

incremental model and denote it by Inc Coord. We illustrate these formulations in Appendix EC.4.
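
As a sanity check of the coordinated incremental model, the following Python sketch builds, for a given x, the incremental solution (δ, y) with the smallest objective and compares the left-hand side of the z constraint in (24a) with a direct evaluation of f from (21). It assumes d0 = 0, c1 = b1 = 0 and a continuous fC, as required in the text; the function names and the test data are hypothetical.

```python
def inc_coord_value(x, d, m, b):
    """Value of the z-constraint left-hand side in (24a) at the cheapest
    incremental solution for x.

    d: breakpoints d[0..K] with d[0] = 0; m: slopes m[1..K] of f_C;
    b: jump levels b[1..K] with b[1] = 0. Index 0 of m and b is padding.
    """
    K = len(d) - 1
    # delta_k: fraction of interval k that is filled by x.
    delta = [0.0] + [min(1.0, max(0.0, (x - d[k - 1]) / (d[k] - d[k - 1])))
                     for k in range(1, K + 1)]
    # y_k = 1 only if interval k + 1 is entered; at x = d_k the cheaper
    # choice y_k = 0 still satisfies delta_{k+1} <= y_k <= delta_k.
    y = [0] + [1 if delta[k + 1] > 0 else 0 for k in range(1, K)]
    z = sum(m[k] * (d[k] - d[k - 1]) * delta[k] for k in range(1, K + 1))
    z += sum((b[k + 1] - b[k]) * y[k] for k in range(1, K))
    return z


def f_direct(x, d, m, b):
    """f from (21) with c_1 = b_1 = 0 and f_C continuous."""
    if x == 0:
        return 0.0
    K = len(d) - 1
    k = next(k for k in range(1, K + 1) if d[k - 1] < x <= d[k])
    f_c = sum(m[j] * (d[j] - d[j - 1]) for j in range(1, k))
    f_c += m[k] * (x - d[k - 1])
    return f_c + b[k]


# Hypothetical data: three segments with fixed charge jumps at 1 and 2.
d = [0.0, 1.0, 2.0, 4.0]
m = [0.0, 2.0, 1.0, -0.5]
b = [0.0, 0.0, 3.0, 5.0]
for x in (0.0, 1.0, 1.5, 2.0, 4.0):
    print(x, inc_coord_value(x, d, m, b), f_direct(x, d, m, b))
```

The two values agree at every test point, including the breakpoints, where the freedom to keep y_k = 0 while δ_k = 1 reproduces the lower semicontinuous value of f.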

Regarding the properties of the formulations, it is direct that Proposition 1, Theorem 2 and

Theorem 4 also hold for lower semicontinuous piecewise linear functions. It is also direct that DCC,

DLog and MC remain locally ideal for lower semicontinuous functions, that Inc Dup, Log Dup

and Inc Coord are locally ideal and that CC Dup is sharp, but not locally ideal. Finally, it is

direct that CC Coord is not locally ideal, but the following proposition holds.

Proposition 2. CC Coord is sharp.


7. Computational Experiments for Lower Semicontinuous Functions

In this section we computationally test the MIP formulations for lower semicontinuous piecewise

linear functions. We use the same transportation problems from Section 5.

7.1. Discontinuous Separable Functions

The first set of experiments considers formulations for univariate lower semicontinuous functions.

The instances tested in this section were obtained from the transportation problems from Sec-

tion 5.1 by modifying functions fe(xe) of the flow xe on arc e. Each function fe(xe) affine in

segments {[dk−1, dk]}Kk=1 was transformed into a discontinuous function by adding fixed charge

jumps in each of the breakpoints {dk}Kk=0. Each jump was randomly generated by independently

selecting an integer in [10,50] using a uniform distribution.

We tested MC, DCC and DLog as they can directly handle lower semicontinuous functions.

However, we modified DLog as it initially performed poorly (for K = 4 it had an average

solve time of 562 seconds and a maximum solve time of 6615 seconds). We believe that this

poor performance was due to $|\mathcal{P}|$ not being a power of two (for $K = 4$ we have $\mathcal{P} = \{\{d_0\}, (d_0, d_1], (d_1, d_2], (d_2, d_3], (d_3, d_4]\}$ with $d_0 = 0$, so $|\mathcal{P}| = 5$) as this is a common problem with binary encoded formulations (Coppersmith and Lee 2005). To resolve this we subtracted $f^+_e(0)$ from each function $f_e(x_e)$ and reset the value of $f_e(0)$ to 0. This eliminated the fixed charges at 0 leaving each $f_e(x_e)$ continuous and zero valued at 0. To restore the fixed charges we added a binary variable $y_e \in \{0,1\}$ for each $e \in E$ with objective coefficient equal to the original fixed charge $f^+_e(0)$ and constraint $x_e \le u_e y_e$.

We also tested CC Coord and Inc Coord with the fixed charge elimination technique because they

require functions that are continuous and zero valued at 0. We additionally tested the formulations

obtained by applying the break point duplication technique to CC, Log, Inc and SOS2. Additional

combinations of models and techniques are not included either because they are redundant (e.g.

DCC directly handles lower semicontinuous functions and hence does not require the break point

duplication technique) or because they are not compatible (e.g. we are not aware of any effective

coordination technique for Log). Table 5 shows the usual statistics for these instances.
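
The effect of the fixed charge elimination on the binary encoding used by DLog can be seen by counting code words. The Python sketch below is only a counting illustration with hypothetical function names: it compares the number of pieces K + 1 before the elimination (the point {0} plus K half-open intervals) with the K pieces that remain afterwards. It does not by itself establish the cause of the slow solves, which the text offers as a conjecture.

```python
def binaries_needed(num_pieces: int) -> int:
    """Number of binary variables in a logarithmic encoding,
    i.e. the ceiling of log2(num_pieces)."""
    return (num_pieces - 1).bit_length()


def unused_codes(num_pieces: int) -> int:
    """Binary vectors that do not correspond to any piece."""
    return 2 ** binaries_needed(num_pieces) - num_pieces


print("K  bits(K+1)  unused(K+1)  bits(K)  unused(K)")
for K in (4, 8, 16, 32):
    with_point = K + 1   # pieces before the fixed charge elimination
    print(K, binaries_needed(with_point), unused_codes(with_point),
          binaries_needed(K), unused_codes(K))
```

For K = 4 the five pieces need three binary variables and leave three of the eight code words unused, whereas the four pieces that remain after the elimination need two binary variables with no unused code word, plus the single fixed charge variable ye per arc.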


model        min     avg     max     std   win  fail
MC           0.5     5.5      30     5.2    76     0
Inc Coord    0.8     7.3      40     6.3    15     0
DLog FC      0.8     9.0      41     6.5     6     0
Inc Dup      1.0    10.7      61     8.4     3     0
Log Dup      1.0    13.0      69     8.7     0     0
DCC          2.0    14.8      75     9.5     0     0
CC Coord     1.1    15.7     116    14.1     0     0
SOS2 Dup     3.2    56.7     522    75.3     0     0
CC Dup       7.4    78.9     646   105.3     0     0

(a) 4 segments.

model        min     avg     max     std   win  fail
MC           0.0      16     107      23    86     0
DLog FC      0.3      32     123      21     9     0
Log Dup      2.1      43     241      38     4     0
Inc Coord    7.9      70     298      51     0     0
Inc Dup     18.7      84     300      51     0     0
DCC          0.0     366   10000    1110     1     1
SOS2 Dup     8.8     476    5919     853     0     0
CC Coord    21.3     699    5438    1014     0     0
CC Dup       8.1     895   10000    1644     0     2

(b) 8 segments.

model        min     avg     max     std   win  fail
DLog FC       23     106     445      88    55     0
MC            13     263    2697     401    29     0
Log Dup       12     331   10000    1055    16     1
Inc Coord    108     333    2037     247     0     0
Inc Dup      105     405    1548     278     0     0
SOS2 Dup      51    1952   10000    2587     0     6
CC Dup       177    4409   10000    3223     0    18
CC Coord     342    6018   10000    3624     0    36
DCC          110    8046   10000    3551     0    76

(c) 16 segments.

model        min     avg     max     std   win  fail
DLog FC       54     779    5395     958    84     0
Inc Coord    287    1586   10000    1457     1     1
Inc Dup      315    1935   10000    1984     2     4
Log Dup       77    2661   10000    3268     4    12
MC           116    4282   10000    4070     9    30

(d) 32 segments.

Table 5 Solve times for univariate discontinuous functions [s].

Again MC is one of the best performers except for K = 32 where the logarithmic models again

have the advantage. The duplication and coordination techniques only seem to work well for

Inc and Log which were already faster than CC in the continuous case. This could explain their

advantage when using the duplication and coordination techniques as well. However, this expla-

nation does not hold for SOS2, which did very well in the continuous case, but performed poorly

here.

7.2. Discontinuous Non-Separable Functions

The set of experiments in this section considers non-separable functions of two variables. The

instances tested in this section were obtained from the 5 × 2 multi-commodity transportation problems from Section 5.2 by replacing function $f_e(x^1_e, x^2_e)$ of the flows $x^i_e$ in arc $e$ of commodity $i$ for $i = 1,2$. To define the new function we use the $K \times K$ grid $\{d^1_0, \ldots, d^1_K\} \times \{d^2_0, \ldots, d^2_K\}$ obtained from the subdivision of $[0, u^1_e]$ and $[0, u^2_e]$ into $K$ intervals as determined from the respective original transportation problems. We select two random samples of size $K$ from set $\{0, 1, \ldots, 10K - 1\}$ and sort them in non-increasing order to obtain $(r^i_k)_{k=1}^K$ for each $i = 1,2$. We then define $s^i_0 = 0$ and $s^i_k = r^i_k (d^i_k - d^i_{k-1}) + s^i_{k-1}$ for each $k \in \{1, \ldots, K\}$ and $i = 1,2$. Writing $(x, y) = (x^1_e, x^2_e)$, $f_e(x^1_e, x^2_e)$ is defined as

$$f_e(x^1_e, x^2_e) := \begin{cases} x^1_e + x^2_e & (x, y) = (0,0) \\ x^1_e + x^2_e + s^1_k & x \in (d^1_{k-1}, d^1_k],\ y = 0 \\ x^1_e + x^2_e + s^2_k & y \in (d^2_{k-1}, d^2_k],\ x = 0 \\ x^1_e + x^2_e + 0.75(s^1_k + s^2_l) & (x, y) \in (d^1_{k-1}, d^1_k] \times (d^2_{l-1}, d^2_l]. \end{cases}$$

The idea is that for each commodity there is a fixed shipping charge for arc $e$ that depends on the interval $(d^i_{k-1}, d^i_k]$ in which the amount $x^i_e$ shipped falls. We have that this fixed charge divided by the amount shipped is non-increasing because of economies of scale and that if both commodities are shipped through arc $e$ there is a 25% discount on the sum of the fixed charges (the sum is multiplied by 0.75).
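
The construction of the fixed charges can be sketched in Python as follows. The sketch assumes that the samples are drawn without replacement from {0, 1, ..., 10K − 1} and that the joint charge is 0.75 times the sum of the individual charges; the function names, the breakpoints and the random seed are hypothetical and this is not the generator used for the experiments.

```python
import random


def fixed_charges(breaks, rng):
    """Cumulative charges s_0..s_K for one commodity, with
    s_k = r_k (d_k - d_{k-1}) + s_{k-1} and r sorted non-increasingly."""
    K = len(breaks) - 1
    r = sorted(rng.sample(range(10 * K), K), reverse=True)
    s = [0.0]
    for k in range(1, K + 1):
        s.append(r[k - 1] * (breaks[k] - breaks[k - 1]) + s[-1])
    return s


def interval_index(x, breaks):
    """Index k with x in (d_{k-1}, d_k], or 0 when x == 0."""
    if x == 0:
        return 0
    return next(k for k in range(1, len(breaks))
                if breaks[k - 1] < x <= breaks[k])


def f_e(x1, x2, breaks1, breaks2, s1, s2):
    """Shipping cost on one arc: flows plus fixed charges, with the
    joint charge scaled by 0.75 when both commodities are shipped."""
    k = interval_index(x1, breaks1)
    l = interval_index(x2, breaks2)
    if k and l:
        charge = 0.75 * (s1[k] + s2[l])
    else:
        charge = s1[k] + s2[l]   # s[0] == 0 covers the axes and the origin
    return x1 + x2 + charge


rng = random.Random(0)                # hypothetical seed
breaks = [0.0, 1.0, 2.0, 3.0, 4.0]    # hypothetical breakpoints, K = 4
s1, s2 = fixed_charges(breaks, rng), fixed_charges(breaks, rng)
print(f_e(0.0, 0.0, breaks, breaks, s1, s2))  # 0.0
```

Because the rates r are sorted in non-increasing order, the ratio of s_k to d_k cannot increase with k, which is the economies of scale property described in the text.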

We only tested MC, DCC and DLog as they can handle general lower semicontinuous piecewise

linear functions. Table 6 shows the usual statistics for different grid sizes.

model     min     avg     max     std  wins  fail
MC        0.1     2.3     8.8     1.8    97     0
DLog      0.4     6.0    19.3     3.9     3     0
DCC       0.9     9.9    29.8     6.6     0     0

(a) 4 × 4 grid.

model     min     avg     max     std  wins  fail
DLog      1.1      17      59      11    51     0
MC        1.0      19     122      18    49     0
DCC       8.4      83     377      64     0     0

(b) 8 × 8 grid.

model     min     avg     max     std  wins  fail
DLog      4.8      55     201      36    96     0
MC       10.2     209    1138     195     4     0
DCC      51.2     890    2993     542     0     0

(c) 16 × 16 grid.

model     min     avg     max     std  wins  fail
DLog       56     319    1385     201   100     0
MC        151    4310   10000    3780     0    25
DCC      1648    8504   10000    2545     0    65

(d) 32 × 32 grid.

Table 6 Solve times for non-separable functions [s].

We again see that MC is always faster than DCC and is only significantly slower than DLog for the largest grids. Finally,

we note that the smaller solve times for these instances when compared to the ones in Section 5.2

could be due to the fact that here the only nonlinearities in the objective functions are fixed charges.


8. Conclusions

We studied the modeling of piecewise linear functions as MIPs. We reviewed several new and

existing formulations for continuous functions with particular attention paid to their extension to

the multivariate non-separable case. We also compared these formulations both with respect to

their theoretical properties and their relative computational performance. In addition we studied

several ways to extend these formulations to consider lower semicontinuous functions.

Because of the limited computational experiments it is hard to reach categorical conclusions.

However there are several trends that, combined with the theoretical properties of the formulations,

provide general guidelines for the use of the different formulations by practitioners. For example,

when the number of polytopes defining the piecewise linear function is small MC seems to be one of

the best choices. Furthermore it seems to be always preferable to CC and DCC. Another example

concerns functions defined by a large number of polytopes. In this case the sizes of logarithmic

formulations DLog and Log can give them a significant computational advantage. Finally, for lower

semicontinuous functions it seems that, with the exception of SOS2 Dup, special ad-hoc techniques

only provide an advantage when they are used to adapt formulations that already performed well

in the continuous case.

Acknowledgments

This research has been supported by NSF grants CMMI-0522485 and CMMI-0758234, AFOSR grant FA9550-

07-1-0177 and Exxon Mobil Upstream Research Company. The authors would also like to thank an anony-

mous associate editor and two anonymous referees for their prompt review and thoughtful comments.

References

Aichholzer, O., F. Aurenhammer, F. Hurtado, H. Krasser. 2003. Towards compatible triangulations. Theoret.

Comput. Sci. 296 3–13.

Balakrishnan, A., S. C. Graves. 1989. A composite algorithm for a concave-cost network flow problem.

Networks 19 175–202.

Balas, E. 1979. Disjunctive programming. Ann. of Discrete Math. 5 3–51.


Bergamini, M. L., P. Aguirre, I. Grossmann. 2005. Logic-based outer approximation for globally optimal

synthesis of process networks. Comput. and Chemical Engrg. 29 1914–1933.

Bergamini, M. L., I. Grossmann, N. Scenna, P. Aguirre. 2008. An improved piecewise outer-approximation algorithm for the global optimization of MINLP models involving concave and bilinear terms. Comput. and Chemical Engrg. 32 477–493.

Carnicer, J. M., M. S. Floater. 1996. Piecewise linear interpolants to Lagrange and Hermite convex scattered

data. Numer. Algorithms 13 345–364.

Coppersmith, D., J. Lee. 2005. Parsimonious binary-encoding in integer programming. Discrete Optim. 2

190–200.

Croxton, K. L., B. Gendron, T. L. Magnanti. 2003a. A comparison of mixed-integer programming models

for nonconvex piecewise linear cost minimization problems. Management Sci. 49 1268–1273.

Croxton, K. L., B. Gendron, T. L. Magnanti. 2003b. Models and methods for merge-in-transit operations.

Transportation Sci. 37 1–22.

Croxton, K. L., B. Gendron, T. L. Magnanti. 2007. Variable disaggregation in network flow problems with

piecewise linear costs. Oper. Res. 55 146–157.

Dantzig, G. B. 1960. On the significance of solving linear-programming problems with some integer variables.

Econometrica 28 30–44.

Dantzig, G. B. 1963. Linear Programming and Extensions. Princeton University Press, Princeton.

de Farias Jr., I. R., M. Zhao, H. Zhao. 2008. A special ordered set approach for optimizing a discontinuous

separable piecewise linear function. Oper. Res. Lett. 36 234–238.

Fourer, R., D. M. Gay, B. W. Kernighan. 1993. AMPL–A Modeling Language for Mathematical Programming .

The Scientific Press.

Garfinkel, R. S., G. L. Nemhauser. 1972. Integer Programming . Wiley.

Graf, T., P. Vanhentenryck, C. Pradelleslasserre, L. Zimmer. 1990. Simulation of hybrid circuits in constraint

logic programming. Comput. & Math. With Appl. 20 45–56.

Ibaraki, T. 1976. Integer programming formulation of combinatorial optimization problems. Discrete Math.

16 39–52.


Jeroslow, R. G. 1987. Representability in mixed integer programming 1: characterization results. Discrete Appl. Math. 17 223–243.

Jeroslow, R. G. 1989. Representability of functions. Discrete Appl. Math. 23 125–137.

Jeroslow, R. G., J. K. Lowe. 1984. Modeling with integer variables. Math. Programming Stud. 22 167–184.

Jeroslow, R. G., J. K. Lowe. 1985. Experimental results on the new techniques for integer programming

formulations. J. of the Oper. Res. Soc. 36 393–403.

Kannan, R. 1992. Lattice translates of a polytope and the Frobenius problem. Combinatorica 12 161–177.

Keha, A. B. 2003. A polyhedral study of nonconvex piecewise linear optimization. Ph.D. thesis, Georgia

Institute of Technology.

Keha, A. B., I. R. de Farias, G. L. Nemhauser. 2004. Models for representing piecewise linear cost functions.

Oper. Res. Lett. 32 44–48.

Keha, A. B., I. R. de Farias, G. L. Nemhauser. 2006. A branch-and-cut algorithm without binary variables

for nonconvex piecewise linear optimization. Oper. Res. 54 847–858.

Lasdon, L. S., A. D. Waren. 1980. A survey of nonlinear programming applications. Oper. Res. 28 1029–1073.

Lee, J., D. Wilson. 2001. Polyhedral methods for piecewise-linear functions I: the lambda method. Discrete Appl. Math. 108 269–285.

Lowe, J. K. 1984. Modelling with integer variables. Ph.D. thesis, Georgia Institute of Technology.

Magnanti, T. L., D. Stratila. 2004. Separable concave optimization approximately equals piecewise linear

optimization. G. L. Nemhauser, D. Bienstock, eds., IPCO , Lecture Notes in Computer Science, vol.

3064. Springer, 234–243.

Markowitz, H., A. Manne. 1957. On the solution of discrete programming-problems. Econometrica 25

84–110.

Martin, A., M. Moller, S. Moritz. 2006. Mixed integer models for the stationary case of gas network opti-

mization. Math. Programming 105 563–582.

Meyer, R. R. 1976. Mixed integer minimization models for piecewise-linear functions of a single variable.

Discrete Math. 16 163–171.

Nemhauser, G. L., L. A. Wolsey. 1988. Integer and combinatorial optimization. Wiley-Interscience.


Padberg, M. 2000. Approximating separable nonlinear functions via mixed zero-one programs. Oper. Res.

Lett. 27 1–5.

Padberg, M. W., M. P. Rijal. 1996. Location, Scheduling, Design, and Integer Programming . Springer.

Pottmann, H., R. Krasauskas, B. Hamann, K. I. Joy, W. Seibold. 2000. On piecewise linear approximation

of quadratic functions. Journal for Geometry and Graphics 4 9–31.

Sherali, H. D. 2001. On mixed-integer zero-one representations for separable lower-semicontinuous piecewise-

linear functions. Oper. Res. Lett. 28 155–160.

Todd, M. J. 1977. Union Jack triangulations. S. Karamardian, ed., Fixed Points: algorithms and applications.

Academic Press, 315–336.

Tomlin, J.A. 1981. A suggested extension of special ordered sets to non-separable non-convex program-

ming problems. P. Hansen, ed., Studies on Graphs and Discrete Programming , Annals of Discrete

Mathematics, vol. 11. North Holland, 359–370.

Vajda, S. 1964. Mathematical Programming . Addison-Wesley.

Vielma, J. P., A. B. Keha, G. L. Nemhauser. 2008. Nonconvex, lower semicontinuous piecewise linear

optimization. Discrete Optim. 5 467–488.

Vielma, J. P., G. L. Nemhauser. 2008a. Modeling disjunctive constraints with a logarithmic number of binary

variables and constraints. A. Lodi, A. Panconesi, G. Rinaldi, eds., IPCO , Lecture Notes in Computer

Science, vol. 5035. Springer, 199–213.

Vielma, J. P., G. L. Nemhauser. 2008b. Modeling disjunctive constraints with a logarithmic number of binary

variables and constraints. Georgia Institute of Technology .

Wilf, H. S. 1989. Combinatorial algorithms–an update, CBMS-NSF regional conference series in applied mathematics, vol. 55. Society for Industrial and Applied Mathematics.

Wilson, D. L. 1998. Polyhedral methods for piecewise-linear functions. Ph.D. thesis, University of Kentucky.

e-companion to Vielma, Ahmed and Nemhauser: Mixed-Integer Models for Piecewise Linear Optimization ec1

Appendices

EC.1. Example for Continuous Functions

The function given in Figure 1(a) can be described as

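$$f(x) := \begin{cases} 22x + 10 & x \in [0,1] \\ 8x + 24 & x \in [1,2] \\ -17.5x + 75 & x \in [2,4] \\ 10x - 35 & x \in [4,5] \end{cases} \tag{EC.1}$$

and characterization (4) of its epigraph is given by

$$\operatorname{epi}(f) = \{(0, r) : r \ge 0\} + \Big(\operatorname{conv}(\{(0,10), (1,32)\}) \cup \operatorname{conv}(\{(1,32), (2,40)\}) \cup \operatorname{conv}(\{(2,40), (4,5)\}) \cup \operatorname{conv}(\{(4,5), (5,15)\})\Big).$$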

Note that this representation can be simplified by replacing $\operatorname{conv}(\{(2,40), (4,5)\}) \cup \operatorname{conv}(\{(4,5), (5,15)\})$ with $\operatorname{conv}(\{(2,40), (4,5), (5,15)\})$, but this requires detecting that $(\operatorname{conv}(\{(2,40), (4,5)\}) + \{(0, r) : r \ge 0\}) \cup (\operatorname{conv}(\{(4,5), (5,15)\}) + \{(0, r) : r \ge 0\})$ is in fact a polyhedron.
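One way such a detection could be carried out for a continuous univariate function is to compare the slopes of adjacent pieces: two consecutive segments have a convex union of epigraphs exactly when the slope does not decrease across the shared breakpoint. The following Python sketch illustrates this for (EC.1); it is not taken from the paper, and the helper names `slopes` and `mergeable` are hypothetical.

```python
# Breakpoints (x, f(x)) of the function in (EC.1).
BREAKPOINTS = [(0, 10), (1, 32), (2, 40), (4, 5), (5, 15)]

def slopes(points):
    """Slope of each linear piece between consecutive breakpoints."""
    return [(y1 - y0) / (x1 - x0)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def mergeable(points):
    """Indices i such that pieces i and i+1 together form a convex function.

    For a continuous univariate function this holds when the slope does not
    decrease across the shared breakpoint.
    """
    s = slopes(points)
    return [i for i in range(len(s) - 1) if s[i] <= s[i + 1]]

print(slopes(BREAKPOINTS))     # [22.0, 8.0, -17.5, 10.0]
print(mergeable(BREAKPOINTS))  # [2], i.e. the pieces on [2,4] and [4,5]
```

Only the last two pieces pass the test, which agrees with the simplification described above.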

We now describe the formulations in Section 3 for the function defined in (EC.1). DCC is given

by

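\[
\begin{aligned}
& 0\lambda_{[0,1],0} + 1(\lambda_{[0,1],1} + \lambda_{[1,2],1}) + 2(\lambda_{[1,2],2} + \lambda_{[2,4],2}) + 4(\lambda_{[2,4],4} + \lambda_{[4,5],4}) + 5\lambda_{[4,5],5} = x \\
& 10\lambda_{[0,1],0} + 32(\lambda_{[0,1],1} + \lambda_{[1,2],1}) + 40(\lambda_{[1,2],2} + \lambda_{[2,4],2}) + 5(\lambda_{[2,4],4} + \lambda_{[4,5],4}) + 15\lambda_{[4,5],5} \le z \\
& \lambda_{[0,1],0},\ \lambda_{[0,1],1},\ \lambda_{[1,2],1},\ \lambda_{[1,2],2},\ \lambda_{[2,4],2},\ \lambda_{[2,4],4},\ \lambda_{[4,5],4},\ \lambda_{[4,5],5} \ge 0 \\
& \lambda_{[0,1],0} + \lambda_{[0,1],1} = y_{[0,1]}, \quad \lambda_{[1,2],1} + \lambda_{[1,2],2} = y_{[1,2]}, \quad \lambda_{[2,4],2} + \lambda_{[2,4],4} = y_{[2,4]}, \quad \lambda_{[4,5],4} + \lambda_{[4,5],5} = y_{[4,5]} \\
& y_{[0,1]} + y_{[1,2]} + y_{[2,4]} + y_{[4,5]} = 1, \quad y_{[0,1]},\ y_{[1,2]},\ y_{[2,4]},\ y_{[4,5]} \in \{0,1\}.
\end{aligned}
\]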
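To make the role of each variable concrete, the sketch below builds, for a given $x$, a point $(\lambda, y, z)$ with $z = f(x)$ and checks it against the DCC constraints listed above. It is an illustrative assumption rather than part of the paper: the names `dcc_point` and `satisfies_dcc` are hypothetical, and the tie-break at shared breakpoints (first piece containing $x$) is arbitrary.

```python
# Pieces of (EC.1): (left endpoint, right endpoint, f(left), f(right)).
PIECES = [(0, 1, 10, 32), (1, 2, 32, 40), (2, 4, 40, 5), (4, 5, 5, 15)]

def dcc_point(x):
    """Return (lam, y, z) feasible for DCC with z = f(x).

    lam[i] = (weight on left endpoint, weight on right endpoint) of piece i,
    y[i] = 1 for the single active piece. At a shared breakpoint the first
    piece containing x is chosen (an arbitrary tie-break).
    """
    lam = [(0.0, 0.0)] * len(PIECES)
    y = [0] * len(PIECES)
    for i, (a, b, fa, fb) in enumerate(PIECES):
        if a <= x <= b:
            t = (x - a) / (b - a)      # relative position of x inside [a, b]
            lam[i] = (1.0 - t, t)
            y[i] = 1
            return lam, y, (1.0 - t) * fa + t * fb
    raise ValueError("x is outside the domain [0, 5]")

def satisfies_dcc(lam, y, x, z, tol=1e-9):
    """Check the DCC constraints for a candidate point."""
    x_lhs = sum(l * a + r * b for (l, r), (a, b, _, _) in zip(lam, PIECES))
    z_lhs = sum(l * fa + r * fb for (l, r), (_, _, fa, fb) in zip(lam, PIECES))
    return (abs(x_lhs - x) <= tol and z_lhs <= z + tol
            and all(l >= 0 and r >= 0 for l, r in lam)
            and all(abs(l + r - yi) <= tol for (l, r), yi in zip(lam, y))
            and sum(y) == 1 and all(yi in (0, 1) for yi in y))

lam, y, z = dcc_point(3.0)
print(y, z, satisfies_dcc(lam, y, 3.0, z))  # [0, 0, 1, 0] 22.5 True
```

For $x = 3$ only the piece $[2,4]$ is active, with equal weight on its two endpoints.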

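For $B([0,1]) = (0,0)^T$, $B([1,2]) = (0,1)^T$, $B([2,4]) = (1,1)^T$, $B([4,5]) = (1,0)^T$ DLog is given by

\[
\begin{aligned}
& 0\lambda_{[0,1],0} + 1(\lambda_{[0,1],1} + \lambda_{[1,2],1}) + 2(\lambda_{[1,2],2} + \lambda_{[2,4],2}) + 4(\lambda_{[2,4],4} + \lambda_{[4,5],4}) + 5\lambda_{[4,5],5} = x \\
& 10\lambda_{[0,1],0} + 32(\lambda_{[0,1],1} + \lambda_{[1,2],1}) + 40(\lambda_{[1,2],2} + \lambda_{[2,4],2}) + 5(\lambda_{[2,4],4} + \lambda_{[4,5],4}) + 15\lambda_{[4,5],5} \le z \\
& \lambda_{[0,1],0},\ \lambda_{[0,1],1},\ \lambda_{[1,2],1},\ \lambda_{[1,2],2},\ \lambda_{[2,4],2},\ \lambda_{[2,4],4},\ \lambda_{[4,5],4},\ \lambda_{[4,5],5} \ge 0 \\
& \lambda_{[2,4],2} + \lambda_{[2,4],4} + \lambda_{[4,5],4} + \lambda_{[4,5],5} \le y_1, \quad \lambda_{[0,1],0} + \lambda_{[0,1],1} + \lambda_{[1,2],1} + \lambda_{[1,2],2} \le (1 - y_1) \\
& \lambda_{[1,2],1} + \lambda_{[1,2],2} + \lambda_{[2,4],2} + \lambda_{[2,4],4} \le y_2, \quad \lambda_{[0,1],0} + \lambda_{[0,1],1} + \lambda_{[4,5],4} + \lambda_{[4,5],5} \le (1 - y_2) \\
& y_1, y_2 \in \{0,1\}.
\end{aligned}
\]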
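The effect of the two constraint pairs can be checked by enumeration: for each binary $y$, constraint pair $l$ forces $\lambda_{P,\cdot} = 0$ for every interval $P$ whose code differs from $y$ in bit $l$, so exactly one interval remains. The Python sketch below is an illustration under that reading; the name `active_intervals` is hypothetical.

```python
from itertools import product

# Binary codes B(P) assigned to each interval in the DLog example.
CODES = {"[0,1]": (0, 0), "[1,2]": (0, 1), "[2,4]": (1, 1), "[4,5]": (1, 0)}

def active_intervals(y):
    """Intervals whose lambda variables may be nonzero for a binary y.

    Constraint pair l forces lambda_P = 0 whenever B(P)_l differs from y_l,
    so only intervals whose full code equals y remain.
    """
    return [name for name, code in CODES.items()
            if all(code[l] == y[l] for l in range(len(y)))]

for y in product((0, 1), repeat=2):
    print(y, active_intervals(y))
```

Each of the four values of $(y_1, y_2)$ selects a single interval, which is how two binary variables replace the four used by DCC.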

CC is given by

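\[
\begin{aligned}
& 0\lambda_0 + 1\lambda_1 + 2\lambda_2 + 4\lambda_4 + 5\lambda_5 = x, \quad 10\lambda_0 + 32\lambda_1 + 40\lambda_2 + 5\lambda_4 + 15\lambda_5 \le z \\
& \lambda_0, \lambda_1, \lambda_2, \lambda_4, \lambda_5 \ge 0, \quad \lambda_0 + \lambda_1 + \lambda_2 + \lambda_4 + \lambda_5 = 1 \\
& \lambda_0 \le y_{[0,1]}, \quad \lambda_1 \le y_{[0,1]} + y_{[1,2]}, \quad \lambda_2 \le y_{[1,2]} + y_{[2,4]}, \quad \lambda_4 \le y_{[2,4]} + y_{[4,5]}, \quad \lambda_5 \le y_{[4,5]} \\
& y_{[0,1]} + y_{[1,2]} + y_{[2,4]} + y_{[4,5]} = 1, \quad y_{[0,1]},\ y_{[1,2]},\ y_{[2,4]},\ y_{[4,5]} \in \{0,1\}.
\end{aligned}
\]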

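For $G^1 = (0,0)^T$, $G^2 = (1,0)^T$, $G^3 = (1,1)^T$, $G^4 = (0,1)^T$ and $\varphi(0) = 0$, $\varphi(1) = 1$, $\varphi(2) = 2$, $\varphi(3) = 4$, $\varphi(4) = 5$, Log is given by

\[
\begin{aligned}
& 0\lambda_0 + 1\lambda_1 + 2\lambda_2 + 4\lambda_3 + 5\lambda_4 = x, \quad 10\lambda_0 + 32\lambda_1 + 40\lambda_2 + 5\lambda_3 + 15\lambda_4 \le z \\
& \lambda_0, \lambda_1, \lambda_2, \lambda_3, \lambda_4 \ge 0, \quad \lambda_0 + \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 = 1 \\
& \lambda_2 \le y_1, \quad \lambda_0 + \lambda_4 \le (1 - y_1), \quad \lambda_3 + \lambda_4 \le y_2, \quad \lambda_0 + \lambda_1 \le (1 - y_2), \quad y_1, y_2 \in \{0,1\}.
\end{aligned}
\]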
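The index sets on the left-hand sides of the Log constraints can be read off the Gray code. A vertex is bounded by $y_l$ when bit $l$ equals 1 on every interval that contains it, and by $1 - y_l$ when bit $l$ equals 0 on every such interval. The sketch below reproduces the sets of this example under that assumption; `branching_sets` is a hypothetical helper name, and vertices are indexed $0, \dots, 4$ as in the formulation.

```python
# Gray code G^k for the k-th interval [phi(k-1), phi(k)], k = 1..4.
GRAY = [(0, 0), (1, 0), (1, 1), (0, 1)]

def branching_sets(gray):
    """For each bit l return (L, R): vertices bounded by y_l and by 1 - y_l.

    Vertex v touches intervals v and v+1 (1-based, where they exist). It goes
    to L if bit l equals 1 on all of them, to R if it equals 0 on all of
    them, and to neither set otherwise.
    """
    n_int = len(gray)
    result = []
    for l in range(len(gray[0])):
        left, right = [], []
        for v in range(n_int + 1):
            bits = {gray[k - 1][l] for k in (v, v + 1) if 1 <= k <= n_int}
            if bits == {1}:
                left.append(v)
            elif bits == {0}:
                right.append(v)
        result.append((left, right))
    return result

print(branching_sets(GRAY))
# [([2], [0, 4]), ([3, 4], [0, 1])]
```

The output matches the four inequalities above: $\lambda_2 \le y_1$, $\lambda_0 + \lambda_4 \le 1 - y_1$, $\lambda_3 + \lambda_4 \le y_2$ and $\lambda_0 + \lambda_1 \le 1 - y_2$.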

Finally, MC is given by

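\[
\begin{aligned}
& x_{[0,1]} + x_{[1,2]} + x_{[2,4]} + x_{[4,5]} = x \\
& (22x_{[0,1]} + 10y_{[0,1]}) + (8x_{[1,2]} + 24y_{[1,2]}) + (-17.5x_{[2,4]} + 75y_{[2,4]}) + (10x_{[4,5]} - 35y_{[4,5]}) \le z \\
& 0y_{[0,1]} \le x_{[0,1]} \le y_{[0,1]}, \quad 1y_{[1,2]} \le x_{[1,2]} \le 2y_{[1,2]}, \quad 2y_{[2,4]} \le x_{[2,4]} \le 4y_{[2,4]}, \quad 4y_{[4,5]} \le x_{[4,5]} \le 5y_{[4,5]} \\
& y_{[0,1]} + y_{[1,2]} + y_{[2,4]} + y_{[4,5]} = 1, \quad y_{[0,1]},\ y_{[1,2]},\ y_{[2,4]},\ y_{[4,5]} \in \{0,1\}
\end{aligned}
\]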

and Inc is given by

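\[
\begin{aligned}
& 10 + 22\delta_1 + 8\delta_2 - 35\delta_3 + 10\delta_4 \le z, \quad 0 + \delta_1 + \delta_2 + 2\delta_3 + \delta_4 = x \\
& y_1 \le \delta_1 \le 1, \quad y_2 \le \delta_2 \le y_1, \quad y_3 \le \delta_3 \le y_2, \quad 0 \le \delta_4 \le y_3, \quad y_1, y_2, y_3 \in \{0,1\}.
\end{aligned}
\]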
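In Inc the variable $\delta_k$ can be read as the fraction of the $k$-th piece that has been traversed, and $y_k = 1$ signals that piece $k$ is completely filled so that piece $k+1$ may start. The following sketch computes such an encoding for a given $x$; it is an illustration only, and `inc_point` is a hypothetical name.

```python
BREAKS = [0, 1, 2, 4, 5]        # breakpoints of (EC.1)
VALUES = [10, 32, 40, 5, 15]    # f at each breakpoint

def inc_point(x):
    """Return (delta, y, z) with z = f(x) in the incremental encoding.

    delta[k] is the filled fraction of piece k+1; y[k] = 1 once piece k+1 is
    completely filled, consistent with y_k <= delta_k and delta_{k+1} <= y_k.
    """
    delta = []
    for a, b in zip(BREAKS, BREAKS[1:]):
        delta.append(min(max((x - a) / (b - a), 0.0), 1.0))
    y = [1 if d >= 1.0 else 0 for d in delta[:-1]]
    z = VALUES[0] + sum((VALUES[k + 1] - VALUES[k]) * d
                        for k, d in enumerate(delta))
    return delta, y, z

print(inc_point(3.0))   # ([1.0, 1.0, 0.5, 0.0], [1, 1, 0], 22.5)
```

For $x = 3$ the first two pieces are full, the third is half traversed, and $z = 10 + 22 + 8 - 17.5 = 22.5 = f(3)$.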


EC.2. Proofs

Theorem 1. All formulations from Section 3 except CC are locally ideal.

Proof of Theorem 1. All models except CC, DLog and Log have been previously shown to be

locally ideal (Balas 1979, Jeroslow and Lowe 1984, Lowe 1984, Padberg 2000, Sherali 2001, Wilson

1998), so we only need to prove that DLog and Log are locally ideal.

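For Log assume for contradiction that there exists a vertex $(x, z, \lambda, y)$ of (9) such that $y_s \in (0,1)$ for some $s \in S$. We divide the proof into two main cases.

Case 1: $\sum_{v \in L_s} \lambda_v < y_s$ and $\sum_{v \in R_s} \lambda_v < (1 - y_s)$. For $\varepsilon > 0$ define $(x^1, z^1, \lambda^1, y^1)$ and $(x^2, z^2, \lambda^2, y^2)$ as $x^1 = x^2 = x$, $z^1 = z^2 = z$, $\lambda^1 = \lambda^2 = \lambda$, $y^1_s = y_s + \varepsilon$, $y^2_s = y_s - \varepsilon$ and $y^1_t = y^2_t = y_t$ for all $t \in S \setminus \{s\}$. For sufficiently small $\varepsilon$ we have that $(x^1, z^1, \lambda^1, y^1)$ and $(x^2, z^2, \lambda^2, y^2)$ comply with (9) and $(x, z, \lambda, y) = \tfrac{1}{2}(x^1, z^1, \lambda^1, y^1) + \tfrac{1}{2}(x^2, z^2, \lambda^2, y^2)$. This contradicts $(x, z, \lambda, y)$ being a vertex.

Case 2: $\sum_{v \in L_s} \lambda_v = y_s$ or $\sum_{v \in R_s} \lambda_v = (1 - y_s)$. Without loss of generality we may assume that $\sum_{v \in L_s} \lambda_v = y_s$. We then have $v_s \in L_s$ such that $0 < \lambda_{v_s} < 1$ and $v_l \notin L_s$ such that $0 < \lambda_{v_l} < 1$. If $\sum_{v \in R_s} \lambda_v = (1 - y_s)$ we additionally select $v_l \in R_s$. For $\varepsilon > 0$ we define $(x^1, z^1, \lambda^1, y^1)$ and $(x^2, z^2, \lambda^2, y^2)$ in the following way. First let $\lambda^1_k = \lambda^2_k = \lambda_k$ for all $k \notin \{v_s, v_l\}$, $\lambda^1_{v_s} = \lambda_{v_s} + \varepsilon$, $y^1_s = y_s + \varepsilon$, $\lambda^2_{v_s} = \lambda_{v_s} - \varepsilon$, $y^2_s = y_s - \varepsilon$, $\lambda^1_{v_l} = \lambda_{v_l} - \varepsilon$ and $\lambda^2_{v_l} = \lambda_{v_l} + \varepsilon$. To define $y^1_t$ and $y^2_t$ for each $t \in S \setminus \{s\}$ we only need to consider the following four cases (note that $L_t \cap R_t = \emptyset$ and that without loss of generality we can exchange $R_t$ and $L_t$):

(a) $v_s, v_l \in L_t$ and $v_s, v_l \notin R_t$.

(b) $v_s \in L_t$ and $v_l \in R_t$.

(c) $v_s \in L_t$, $v_l \notin L_t$ and $v_l \notin R_t$ (case $v_l \in L_t$, $v_s \notin L_t$ and $v_s \notin R_t$ is analogous).

(d) $v_s, v_l \notin L_t$ and $v_s, v_l \notin R_t$.

For case (a) we can simply set $y^1_t = y^2_t = y_t$. For case (b) we have $0 < y_t < 1$ and we can set $y^1_t = y_t + \varepsilon$ and $y^2_t = y_t - \varepsilon$. For case (c) we either have $\sum_{v \in L_t} \lambda_v < y_t$ or $\sum_{v \in L_t} \lambda_v = y_t$. For the first case we can simply set $y^1_t = y^2_t = y_t$. For the second case we have $0 < y_t < 1$ and $\sum_{v \in R_t} \lambda_v < (1 - y_t)$ and we can set $y^1_t = y_t + \varepsilon$ and $y^2_t = y_t - \varepsilon$. For case (d) we can set $y^1_t = y^2_t = y_t$. Finally we set $x^1 = x + \varepsilon(v_s - v_l)$, $x^2 = x - \varepsilon(v_s - v_l)$, $z^1 = z + \varepsilon(f(v_s) - f(v_l))$ and $z^2 = z - \varepsilon(f(v_s) - f(v_l))$. We again have that for sufficiently small $\varepsilon$, $(x^1, z^1, \lambda^1, y^1)$ and $(x^2, z^2, \lambda^2, y^2)$ comply with (9) and $(x, z, \lambda, y) = \tfrac{1}{2}(x^1, z^1, \lambda^1, y^1) + \tfrac{1}{2}(x^2, z^2, \lambda^2, y^2)$.

For DLog the proof is analogous. $\square$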

Proposition 1. Any locally ideal formulation is sharp.

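Proof of Proposition 1. We need to prove $P_{(x,z)} \subset \operatorname{conv}(\operatorname{epi}(f))$. If $(x, z) \in P_{(x,z)}$ then because $P$ is locally ideal there exist $\lambda \in \mathbb{R}^p$, $y \in [0,1]^q$ such that $(x, z, \lambda, y) = (0, h, 0, 0) + \sum_{i \in I} \mu_i (x^i, z^i, \lambda^i, y^i)$ for $h \ge 0$, $|I| < \infty$, $\mu \in \mathbb{R}^I_+$, $\sum_{i \in I} \mu_i = 1$, and $(x^i, z^i, \lambda^i, y^i) \in P$ with $y^i \in \{0,1\}^q$ for every $i \in I$. Then by (1) $(x^i, z^i) \in \operatorname{epi}(f)$ for all $i \in I$ and hence $(x, z) \in \operatorname{conv}(\operatorname{epi}(f))$. $\square$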

Theorem 3. All formulations from Section 3 are sharp.

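Proof of Theorem 3. This is direct from Theorem 1 for all formulations except CC. For CC the result follows by noting that the projection onto the $x$ and $z$ variables of the polyhedron given by $\sum_{v \in \mathcal{V}(\mathcal{P})} \lambda_v v = x$, $\sum_{v \in \mathcal{V}(\mathcal{P})} \lambda_v f(v) \le z$, $\lambda_v \ge 0 \ \forall v \in \mathcal{V}(\mathcal{P})$ and $\sum_{v \in \mathcal{V}(\mathcal{P})} \lambda_v = 1$ is clearly contained in $\operatorname{conv}(\operatorname{epi}(f))$. $\square$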

Proposition 2. CC Coord is sharp.

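Proof of Proposition 2. It suffices to show that for $f$ defined in (21) and for any vertex $(\lambda^*, w^*)$ of the LP relaxation of

\[
\sum_{k=0}^{K} \lambda_{d_k} = 1, \quad \sum_{k=1}^{K} w_k = 1, \quad w \in \{0,1\}^K \tag{EC.2a}
\]
\[
0 \le \lambda_{d_0} \le w_1, \quad 0 \le \lambda_{d_K} \le w_K, \quad 0 \le \lambda_{d_k} \le (w_k + w_{k+1}) \quad \forall k \in \{1, \dots, K-1\} \tag{EC.2b}
\]

we have $(x^*, z^*) \in \operatorname{conv}(\operatorname{epi}(f))$ for $z^* := \lambda^*_{d_0} m_1 d_0 + \sum_{k=1}^{K} \left( \lambda^*_{d_k}(m_k d_k + c_k) + b_k w^*_k \right)$, $x^* := \sum_{k=0}^{K} d_k \lambda^*_{d_k}$.

From Proposition 4 of Lee and Wilson (2001) we have that the vertices of (EC.2) are of the following forms:

1. $\lambda^*_{d_l} = w^*_l = 1$, $\lambda^*_{d_k} = 0 \ \forall k \ne l$, $w^*_k = 0 \ \forall k \ne l$.

2. $\lambda^*_{d_l} = w^*_{l+1} = 1$, $\lambda^*_{d_k} = 0 \ \forall k \ne l$, $w^*_k = 0 \ \forall k \ne l+1$.

3. $\lambda^*_{d_{l-1}} = \lambda^*_{d_l} = w^*_l = w^*_{l+1} = 1/2$, $\lambda^*_{d_k} = 0 \ \forall k \notin \{l-1, l\}$, $w^*_k = 0 \ \forall k \notin \{l, l+1\}$.

4. $\lambda^*_{d_{l-1}} = \lambda^*_{d_l} = w^*_{l-1} = w^*_l = 1/2$, $\lambda^*_{d_k} = 0 \ \forall k \notin \{l-1, l\}$, $w^*_k = 0 \ \forall k \notin \{l-1, l\}$.

For case 1 we have $x^* = d_l$ and $z^* = m_l d_l + c_l + b_l$ so $(x^*, z^*) \in \operatorname{epi}(f)$. For case 2 and $l \ge 1$ we have $x^* = d_l$ and $z^* = m_l d_l + c_l + b_{l+1} \ge m_l d_l + c_l + b_l$ so $(x^*, z^*) \in \operatorname{epi}(f)$. For case 2 and $l = 0$ we have $x^* = d_0$ and $z^* = 0$ so $(x^*, z^*) \in \operatorname{epi}(f)$. For case 3 we have $x^* = (d_{l-1} + d_l)/2$ and $z^* = (m_{l-1} d_{l-1} + c_{l-1} + b_l + m_l d_l + c_l + b_{l+1})/2 \ge m_l (d_{l-1} + d_l)/2 + c_l + b_l$ so $(x^*, z^*) \in \operatorname{epi}(f)$. For case 4 we have $x^* = (d_{l-1} + d_l)/2$ and $z^* = (m_{l-1} d_{l-1} + c_{l-1} + b_{l-1} + m_l d_l + c_l + b_l)/2$, but $(d_k, m_k d_k + c_k + b_k) \in \operatorname{epi}(f)$ so $(x^*, z^*) \in \operatorname{conv}(\operatorname{epi}(f))$. $\square$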

EC.3. Additional Computational Results for Univariate Continuous Functions

In this section we extend the computational results of Section 5.1 by considering instances with

K = 4, 16 and 32. Tables EC.1, EC.2 and EC.3 present the same statistics as Table 3 for this set

of instances. We note that for these values of K the average integrality GAPS are 4%, 6% and 6%

respectively.


model    min    avg    max    std
MC        26     58    100   17.7
DLog      16     37     62   10.8
Inc       12     37    100   16.0
DCC       16     36     60   10.2
Log       13     36     61   10.0
CC        11     25     43    6.7
SOS2       0      0      0    0.0

(a)

model    min    avg    max    std
SOS2       0    2.8     10    2.8
Log        0    5.3     10    5.3
DLog       0    5.7     10    5.7
DCC        0    5.9     10    5.9
CC         0    7.1     10    7.1
MC         0    9.4     20    9.4
Inc        0   12.4     20   12.4

(b) LP relaxation solve time. [ms]

model    min    avg    max    std
MC         0    234    891    216
Inc        1    357   2081    365
DLog      22    504   2677    529
Log       14    587   3569    617
DCC       10    798   5960    897
CC        30    964   8938   1139
SOS2     220   1974  13434   1833

(c) Branch-and-bound nodes processed.

Table EC.1 Solve characteristics for univariate continuous functions and K = 4.

model    min    avg    max    std
MC      10.9     26     53    7.2
DCC      7.1     17     48    5.9
DLog     2.0     17     51    8.1
Log      2.0     17     51    7.7
Inc      2.6     14     35    5.9
CC       5.6     10     21    2.8
SOS2     0.0      0      0    0.0

(a)

model    min    avg    max    std
SOS2       0    5.9     10    5.9
Log        0   15.8     20   15.8
CC        10   23.3     40   23.3
DLog      10   27.7     40   27.7
DCC       10   29.9     40   29.9
MC        50   89.6    120   89.5
Inc       90  139.0    180  138.9

(b) LP relaxation solve time. [ms]

model     min      avg      max     std
MC         52     2809    27890    4392
DLog       44     4129    19900    3978
Log        45     4428    23921    4167
Inc       204     5139    25162    4118
CC        245    28895   241524   38696
SOS2     1487    98050   959307  155930
DCC       461   302134  2345087  461223

(c) Branch-and-bound nodes processed.

Table EC.2 Solve characteristics for univariate continuous functions and K = 16.




model    min    avg    max    std
MC       9.5   18.8   31.4    4.8
Log      1.7   15.0   32.3    6.7
DCC      5.8   13.4   25.4    3.7
Inc      1.8    9.5   24.4    4.3
CC       2.0    5.6    9.5    1.5
DLog     0.1    1.8   11.6    1.9
SOS2     0.0    0.0    0.0    0.0

(a)

model    min    avg    max    std
SOS2       0     12     20     11
Log       10     30     40     30
CC        20     39     60     39
DLog      40     60     70     60
DCC       60     93    110     93
MC       230    418    600    418
Inc      410    534    670    534

(b) LP relaxation solve time. [ms]

model     min      avg      max     std
DLog      382     4776    27375    4926
Log       276     5287    25797    5505
Inc       964     8196    40352    7315
MC        471    28855   146197   37678
CC       1762    80224   505999  103995
SOS2     2752   471156  4707352  943424
DCC      5097   916227  1485910  389175

(c) Branch-and-bound nodes processed.

Table EC.3 Solve characteristics for univariate continuous functions and K = 32.



EC.4. Example for Lower Semicontinuous Functions

The function given in Figure 3(a) can be described as

\[
g(x) := \begin{cases}
1.5x + 1 & x \in [0,2) \\
2 & x \in [2,2] \\
-1.5x + 6 & x \in (2,4] \\
2x - 7 & x \in (4,5]
\end{cases}
\tag{EC.3}
\]

and characterization (17) of its epigraph is given by

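\[
\operatorname{epi}(g) = \{(0, r) : r \ge 0\} + \Big( \operatorname{conv}(\{(0,1), (2,4)\}) \cup \operatorname{conv}(\{(2,2)\}) \cup \operatorname{conv}(\{(2,3), (4,0)\}) \cup \operatorname{conv}(\{(4,1), (5,3)\}) \Big).
\]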
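The points appearing in this characterization are the values and one-sided limits of $g$ at its breakpoints. As a small numerical check, the sketch below evaluates $g$ (taking the first piece on $[0,2)$, as the point $(2,4)$ in the characterization indicates) and confirms that the value at each discontinuity does not exceed either one-sided limit, which is what lower semicontinuity requires. The step `h` and the function names are assumptions made for illustration.

```python
def g(x):
    """Evaluate the lower semicontinuous example, first piece taken on [0, 2)."""
    if 0 <= x < 2:
        return 1.5 * x + 1
    if x == 2:
        return 2.0
    if 2 < x <= 4:
        return -1.5 * x + 6
    if 4 < x <= 5:
        return 2 * x - 7
    raise ValueError("x is outside the domain [0, 5]")

def one_sided_limits(x, h=1e-9):
    """Numerical left and right limits of g at an interior point x."""
    return g(x - h), g(x + h)

for point in (2, 4):
    left, right = one_sided_limits(point)
    # Lower semicontinuity: the value at the point is at most both limits.
    print(point, g(point), round(left, 6), round(right, 6))
```

At $x = 2$ the value is 2 with limits 4 and 3, and at $x = 4$ the value is 0 with limits 0 and 1.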

We now describe the formulations in Section 3 for the function defined in (EC.3). DCC is given

by

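\[
\begin{aligned}
& 0\lambda_{[0,2),0} + 2(\lambda_{[0,2),2} + \lambda_{[2,2],2} + \lambda_{(2,4],2}) + 4(\lambda_{(2,4],4} + \lambda_{(4,5],4}) + 5\lambda_{(4,5],5} = x \\
& 1\lambda_{[0,2),0} + 4\lambda_{[0,2),2} + 2\lambda_{[2,2],2} + 3\lambda_{(2,4],2} + 0\lambda_{(2,4],4} + 1\lambda_{(4,5],4} + 3\lambda_{(4,5],5} \le z \\
& \lambda_{[0,2),0},\ \lambda_{[0,2),2},\ \lambda_{[2,2],2},\ \lambda_{(2,4],2},\ \lambda_{(2,4],4},\ \lambda_{(4,5],4},\ \lambda_{(4,5],5} \ge 0 \\
& \lambda_{[0,2),0} + \lambda_{[0,2),2} = y_{[0,2)}, \quad \lambda_{[2,2],2} = y_{[2,2]}, \quad \lambda_{(2,4],2} + \lambda_{(2,4],4} = y_{(2,4]}, \quad \lambda_{(4,5],4} + \lambda_{(4,5],5} = y_{(4,5]} \\
& y_{[0,2)} + y_{[2,2]} + y_{(2,4]} + y_{(4,5]} = 1, \quad y_{[0,2)},\ y_{[2,2]},\ y_{(2,4]},\ y_{(4,5]} \in \{0,1\}.
\end{aligned}
\]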

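For $B([0,2)) = (0,0)^T$, $B([2,2]) = (0,1)^T$, $B((2,4]) = (1,1)^T$, $B((4,5]) = (1,0)^T$ DLog is given by

\[
\begin{aligned}
& 0\lambda_{[0,2),0} + 2(\lambda_{[0,2),2} + \lambda_{[2,2],2} + \lambda_{(2,4],2}) + 4(\lambda_{(2,4],4} + \lambda_{(4,5],4}) + 5\lambda_{(4,5],5} = x \\
& 1\lambda_{[0,2),0} + 4\lambda_{[0,2),2} + 2\lambda_{[2,2],2} + 3\lambda_{(2,4],2} + 0\lambda_{(2,4],4} + 1\lambda_{(4,5],4} + 3\lambda_{(4,5],5} \le z \\
& \lambda_{[0,2),0},\ \lambda_{[0,2),2},\ \lambda_{[2,2],2},\ \lambda_{(2,4],2},\ \lambda_{(2,4],4},\ \lambda_{(4,5],4},\ \lambda_{(4,5],5} \ge 0 \\
& \lambda_{(2,4],2} + \lambda_{(2,4],4} + \lambda_{(4,5],4} + \lambda_{(4,5],5} \le y_1, \quad \lambda_{[0,2),0} + \lambda_{[0,2),2} + \lambda_{[2,2],2} \le (1 - y_1) \\
& \lambda_{[2,2],2} + \lambda_{(2,4],2} + \lambda_{(2,4],4} \le y_2, \quad \lambda_{[0,2),0} + \lambda_{[0,2),2} + \lambda_{(4,5],4} + \lambda_{(4,5],5} \le (1 - y_2) \\
& y_1, y_2 \in \{0,1\}.
\end{aligned}
\]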

MC is given by

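\[
\begin{aligned}
& x_{[0,2)} + x_{[2,2]} + x_{(2,4]} + x_{(4,5]} = x \\
& (1.5x_{[0,2)} + 1y_{[0,2)}) + (0x_{[2,2]} + 2y_{[2,2]}) + (-1.5x_{(2,4]} + 6y_{(2,4]}) + (2x_{(4,5]} - 7y_{(4,5]}) \le z \\
& 0y_{[0,2)} \le x_{[0,2)} \le y_{[0,2)}, \quad 2y_{[2,2]} \le x_{[2,2]} \le 2y_{[2,2]}, \quad 2y_{(2,4]} \le x_{(2,4]} \le 4y_{(2,4]}, \quad 4y_{(4,5]} \le x_{(4,5]} \le 5y_{(4,5]} \\
& y_{[0,2)} + y_{[2,2]} + y_{(2,4]} + y_{(4,5]} = 1, \quad y_{[0,2)},\ y_{[2,2]},\ y_{(2,4]},\ y_{(4,5]} \in \{0,1\}.
\end{aligned}
\]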

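For $g$ defined in (EC.3), $\operatorname{epi}(g) = C_1^+ + \left( \bigcup_{k=1}^{K} \operatorname{conv}(\{(d_{k-1}, f_{k-1}), (d_k, f_k)\}) \right)$ for $K = 6$, $d_0 = 0$, $d_1 = d_2 = d_3 = 2$, $d_4 = d_5 = 4$, $d_6 = 5$, $f_0 = g(0) = 1$, $f_1 = g^-(2) = 4$, $f_2 = g(2) = 2$, $f_3 = g^+(2) = 3$, $f_4 = g(4) = 0$, $f_5 = g^+(4) = 1$ and $f_6 = g(5) = 3$. CC Dup is given by

\[
\begin{aligned}
& 0\lambda_0 + 2(\lambda_1 + \lambda_2 + \lambda_3) + 4(\lambda_4 + \lambda_5) + 5\lambda_6 = x, \quad 1\lambda_0 + 4\lambda_1 + 2\lambda_2 + 3\lambda_3 + 0\lambda_4 + 1\lambda_5 + 3\lambda_6 \le z \\
& \lambda_0, \lambda_1, \lambda_2, \lambda_3, \lambda_4, \lambda_5, \lambda_6 \ge 0, \quad \lambda_0 + \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5 + \lambda_6 = 1 \\
& \lambda_0 \le y_1, \quad \lambda_1 \le y_1 + y_2, \quad \lambda_2 \le y_2 + y_3, \quad \lambda_3 \le y_3 + y_4, \quad \lambda_4 \le y_4 + y_5, \quad \lambda_5 \le y_5 + y_6, \quad \lambda_6 \le y_6 \\
& y_1 + y_2 + y_3 + y_4 + y_5 + y_6 = 1, \quad y \in \{0,1\}^6.
\end{aligned}
\]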


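For $G^1 = (0,0,0)^T$, $G^2 = (1,0,0)^T$, $G^3 = (1,1,0)^T$, $G^4 = (0,1,0)^T$, $G^5 = (0,1,1)^T$, $G^6 = (1,1,1)^T$ Log Dup is given by

\[
\begin{aligned}
& 0\lambda_0 + 2(\lambda_1 + \lambda_2 + \lambda_3) + 4(\lambda_4 + \lambda_5) + 5\lambda_6 = x, \quad 1\lambda_0 + 4\lambda_1 + 2\lambda_2 + 3\lambda_3 + 0\lambda_4 + 1\lambda_5 + 3\lambda_6 \le z \\
& \lambda_0, \lambda_1, \lambda_2, \lambda_3, \lambda_4, \lambda_5, \lambda_6 \ge 0, \quad \lambda_0 + \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5 + \lambda_6 = 1 \\
& \lambda_2 + \lambda_6 \le y_1, \quad \lambda_0 + \lambda_4 \le (1 - y_1), \quad \lambda_3 + \lambda_4 + \lambda_5 + \lambda_6 \le y_2, \quad \lambda_0 + \lambda_1 \le (1 - y_2) \\
& \lambda_5 + \lambda_6 \le y_3, \quad \lambda_0 + \lambda_1 + \lambda_2 + \lambda_3 \le (1 - y_3), \quad y \in \{0,1\}^3.
\end{aligned}
\]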
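Three binary variables give eight vectors, but only six Gray codes are used here. A quick enumeration shows what happens to the two unused vectors: every $\lambda_v$ is forced to zero, so they cannot satisfy $\sum_v \lambda_v = 1$ and are excluded automatically. The sketch below is illustrative; the index sets are read off the three constraint pairs above and `support` is a hypothetical helper name.

```python
from itertools import product

# (L, R) index sets of the three Log Dup constraint pairs:
# the sum of lambda over L is <= y_l, the sum over R is <= 1 - y_l.
PAIRS = [({2, 6}, {0, 4}), ({3, 4, 5, 6}, {0, 1}), ({5, 6}, {0, 1, 2, 3})]

def support(y):
    """Indices v for which lambda_v may be positive under a binary y."""
    allowed = set(range(7))
    for (left, right), bit in zip(PAIRS, y):
        # y_l = 1 zeroes the R side, y_l = 0 zeroes the L side.
        allowed -= right if bit == 1 else left
    return sorted(allowed)

for y in product((0, 1), repeat=3):
    print(y, support(y))
```

Each Gray code $G^k$ leaves exactly the adjacent pair $\{k-1, k\}$, while $(0,0,1)$ and $(1,0,1)$ leave an empty support.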

Inc Dup is given by

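\[
\begin{aligned}
& 0 + 2\delta_1 + 0\delta_2 + 0\delta_3 + 2\delta_4 + 0\delta_5 + 1\delta_6 = x, \quad 1 + 3\delta_1 - 2\delta_2 + 1\delta_3 - 3\delta_4 + 1\delta_5 + 2\delta_6 \le z \\
& y_1 \le \delta_1 \le 1, \quad y_2 \le \delta_2 \le y_1, \quad y_3 \le \delta_3 \le y_2, \quad y_4 \le \delta_4 \le y_3, \quad y_5 \le \delta_5 \le y_4, \quad 0 \le \delta_6 \le y_5, \quad y \in \{0,1\}^5.
\end{aligned}
\]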
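The coefficients of $\delta_k$ in Inc Dup are simply the differences between consecutive duplicated breakpoints $d_k$ and values $f_k$ given above for $K = 6$. A minimal sketch that recomputes them (the helper name `increments` is hypothetical):

```python
# Duplicated breakpoints d_k and values f_k for g, as listed for K = 6.
D = [0, 2, 2, 2, 4, 4, 5]
F = [1, 4, 2, 3, 0, 1, 3]

def increments(values):
    """Differences between consecutive entries of a list."""
    return [b - a for a, b in zip(values, values[1:])]

print(increments(D))  # [2, 0, 0, 2, 0, 1]   coefficients in the x row
print(increments(F))  # [3, -2, 1, -3, 1, 2] coefficients in the z row
```

A zero increment in $x$ paired with a nonzero increment in $z$ corresponds to a jump of $g$ at a duplicated breakpoint.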

Finally, we describe formulations with the coordination technique for the function $\tilde{g} = \tilde{g}_C + \tilde{g}_J$ in Figure EC.1. This function can be described in form (21) for $K = 2$, $d_0 = 0$, $d_1 = 1$, $d_2 = 2$, $m_1 = 2$, $m_2 = 1$, $c_1 = 0$, $c_2 = 1$, $b_1 = 0$ and $b_2 = 1$, which yields

\[
\tilde{g}(x) := \begin{cases}
2x & x \in [0,1] \\
x + 2 & x \in (1,2].
\end{cases}
\tag{EC.4}
\]

[Figure EC.1: three panels, (a) $\tilde{g}_C$, (b) $\tilde{g}_J$, (c) $\tilde{g}$.]

Figure EC.1 Decomposition of fixed charged lower semicontinuous piecewise linear function.

For g̃, CC Coord is given by

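\[
\begin{aligned}
& 0\lambda_0 + 1\lambda_1 + 2\lambda_2 = x, \quad 0w_1 + 1w_2 + 0\lambda_0 + 2\lambda_1 + 3\lambda_2 \le z, \quad \lambda_0 + \lambda_1 + \lambda_2 = 1 \\
& \lambda_0 \le w_1, \quad \lambda_1 \le w_1 + w_2, \quad \lambda_2 \le w_2, \quad \lambda_0, \lambda_1, \lambda_2 \ge 0, \quad w_1 + w_2 = 1, \quad w_1, w_2 \in \{0,1\}
\end{aligned}
\]

and Inc Coord is given by

\[
\delta_1 + \delta_2 = x, \quad 2\delta_1 + 1\delta_2 + w_1 \le z, \quad w_1 \le \delta_1 \le 1, \quad 0 \le \delta_2 \le w_1, \quad w_1 \in \{0,1\}.
\]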
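As a sanity check on CC Coord, one can compute the smallest $z$ it allows for a fixed $x$ and compare it with $\tilde{g}(x)$. With $w$ binary, exactly one of $w_1, w_2$ equals 1, so $\lambda$ is supported on two adjacent breakpoints and is determined by $x$. The sketch below tries both choices; it is an illustration under these observations, and the names `z_bound` and `cc_coord_min_z` are hypothetical.

```python
def z_bound(w, lam):
    """Left-hand side of the z-constraint of CC Coord for the example."""
    w1, w2 = w
    lam0, lam1, lam2 = lam
    return 0 * w1 + 1 * w2 + 0 * lam0 + 2 * lam1 + 3 * lam2

def cc_coord_min_z(x):
    """Smallest z feasible in CC Coord for a given x in [0, 2]."""
    candidates = []
    if 0 <= x <= 1:   # w = (1, 0) forces lambda2 = 0, so lambda = (1 - x, x, 0)
        candidates.append(z_bound((1, 0), (1 - x, x, 0)))
    if 1 <= x <= 2:   # w = (0, 1) forces lambda0 = 0, so lambda = (0, 2 - x, x - 1)
        candidates.append(z_bound((0, 1), (0, 2 - x, x - 1)))
    if not candidates:
        raise ValueError("x is outside the domain [0, 2]")
    return min(candidates)

for x in (0.5, 1, 1.5, 2):
    print(x, cc_coord_min_z(x))
```

The values obtained, 1.0, 2, 3.5 and 4, coincide with $\tilde{g}(x)$ from (EC.4); at $x = 1$ both choices of $w$ are feasible and the minimum picks the lower value 2.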
