

OPTIMIZATION OVER SYMMETRIC CONES UNDER UNCERTAINTY

By

BAHA’ M. ALZALG

A dissertation submitted in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

WASHINGTON STATE UNIVERSITY

Department of Mathematics

DECEMBER 2011


To the Faculty of Washington State University:

The members of the Committee appointed to examine the dissertation of

BAHA’ M. ALZALG find it satisfactory and recommend that it be accepted.

K. A. Ariyawansa, Professor, Chair

Robert Mifflin, Professor

David S. Watkins, Professor


For all the people


ACKNOWLEDGEMENTS

My greatest appreciation and my most sincere “Thank You!” go to my advisor Pro-

fessor Ari Ariyawansa for his guidance, advice, and help during the preparation of this

dissertation. I am also grateful to him for offering me the position of research assistant and

for giving me the opportunity to write papers and attend several conferences and workshops

in North America and overseas.

I wish also to express my appreciation and gratitude to Professor Robert Mifflin and

Professor David S. Watkins for taking the time to serve as committee members and for

various ways in which they helped me during all stages of my doctoral studies.

I want to thank all faculty, staff and graduate students in the Department of Mathe-

matics at Washington State University. I would especially like to thank Associate Professor

Bala Krishnamoorthy from the faculty, Kris Johnson from the staff, and Pietro Paparella

from the students for their kind help.

Finally, no words can express my gratitude to my parents and my grandmother for

their love and prayers. I also owe special gratitude to my brothers and sisters, and to

relatives in Jordan, for their support and encouragement.


OPTIMIZATION OVER SYMMETRIC CONES UNDER UNCERTAINTY

Abstract

by Baha’ M. Alzalg, Ph.D.

Washington State University

December 2011

Chair: Professor K. A. Ariyawansa

We introduce and study two-stage stochastic symmetric programs (SSPs) with recourse to handle uncertainty in the data defining (deterministic) symmetric programs, in which a linear function is minimized over the intersection of an affine set and a symmetric cone. We present a logarithmic barrier decomposition-based interior point algorithm for solving these problems and prove its polynomial complexity. Our convergence analysis proceeds by showing that the log barrier associated with the recourse function of SSPs behaves as a strongly self-concordant barrier and forms a self-concordant family on the first-stage solutions. Since our analysis applies to all symmetric cones, this algorithm extends Zhao's results [48] for two-stage stochastic linear programs and Mehrotra and Ozevin's results [25] for two-stage stochastic semidefinite programs (SSDPs). We also present another class of polynomial-time decomposition algorithms for SSPs based on the volumetric barrier. While this extends the work of Ariyawansa and Zhu [10] for SSDPs, our analysis exploits the special algebraic structure associated with the symmetric cone that is not utilized in [10]. As a consequence, we are able to significantly simplify the proofs of central results. We then describe four applications leading to the SSP problem in which, in particular, the underlying symmetric cones are second-order cones and rotated quadratic cones.


Contents

Acknowledgements

Abstract

1 Introduction and Background
  1.1 Introduction
  1.2 What is a symmetric cone?
  1.3 Symmetric cones and Euclidean Jordan algebras

2 Stochastic Symmetric Optimization Problems
  2.1 The stochastic symmetric optimization problem
    2.1.1 Definition of an SSP in primal standard form
    2.1.2 Definition of an SSP in dual standard form
  2.2 Problems that can be cast as SSPs

3 A Class of Polynomial Logarithmic Barrier Decomposition Algorithms for Stochastic Symmetric Programming
  3.1 The log barrier problem for SSPs
    3.1.1 Formulation and assumptions
    3.1.2 Computation of ∇_x η(μ, x) and ∇²_{xx} η(μ, x)
  3.2 Self-concordance properties of the log-barrier recourse
    3.2.1 Self-concordance of the recourse function
    3.2.2 Parameters of the self-concordant family
  3.3 A class of logarithmic barrier algorithms for solving SSPs
  3.4 Complexity analysis
    3.4.1 Complexity for the short-step algorithm
    3.4.2 Complexity for the long-step algorithm

4 A Class of Polynomial Volumetric Barrier Decomposition Algorithms for Stochastic Symmetric Programming
  4.1 The volumetric barrier problem for SSPs
    4.1.1 Formulation and assumptions
    4.1.2 The volumetric barrier problem for SSPs
    4.1.3 Computation of ∇_x η(μ, x) and ∇²_{xx} η(μ, x)
  4.2 Self-concordance properties of the volumetric barrier recourse
    4.2.1 Self-concordance of η(μ, ·)
    4.2.2 Parameters of the self-concordant family
  4.3 A class of volumetric barrier algorithms for solving SSPs
  4.4 Complexity analysis
    4.4.1 Complexity for the short-step algorithm
    4.4.2 Complexity for the long-step algorithm

5 Some Applications
  5.1 Two applications of SSOCPs
    5.1.1 Stochastic Euclidean facility location problem
    5.1.2 Portfolio optimization with loss risk constraints
  5.2 Two applications of SRQCPs
    5.2.1 Optimal covering random ellipsoid problem
    5.2.2 Structural optimization

6 Related Open Problems: Multi-Order Cone Programming Problems
  6.1 Multi-order cone programming problems
  6.2 Duality
  6.3 Multi-order cone programming problems over integers
  6.4 Multi-order cone programming problems under uncertainty
  6.5 An application
    6.5.1 CERFLPs: an MOCP model
    6.5.2 DERFLPs: a 0-1MOCP model
    6.5.3 ERFLPs with integrality constraints: an MIMOCP model
    6.5.4 Stochastic CERFLPs: an SMOCP model

7 Conclusion


List of Abbreviations

CERFLP continuous Euclidean-rectilinear facility location problem

CFLP continuous facility location problem

DERFLP discrete Euclidean-rectilinear facility location problem

DFLP discrete facility location problem

DLP deterministic linear programming

DRQCP deterministic rotated quadratic cone programming

DSDP deterministic semidefinite programming

DSOCP deterministic second-order cone programming

DSP deterministic symmetric programming

EFLP Euclidean facility location problem

ERFLP Euclidean-rectilinear facility location problem

ESFLP Euclidean single facility location problem

FLP facility location problem

KKT Karush-Kuhn-Tucker conditions

MFLP multiple facility location problem

MIMOCP mixed integer multi-order cone programming

MOCP (deterministic) multi-order cone programming

0-1MOCP 0-1 multi-order cone programming

POCP pth-order cone programming

RFLP rectilinear facility location problem

SFLP stochastic facility location problem

SLP stochastic linear programming

SMOCP stochastic multi-order cone programming

SRQCP stochastic rotated quadratic cone programming

SSDP stochastic semidefinite programming


SSOCP stochastic second-order cone programming

SSP stochastic symmetric programming

Arw(x) the arrow-shaped matrix associated with the vector x

Aut(K) the automorphism group of a cone K

diag(·) the operator that maps its argument to a block diagonal matrix

GL(n,R) the general linear group of degree n over R

int(K) the interior of a cone K

E^n the n-dimensional real vector space whose elements are indexed from 0

E^n_+ the second-order cone of dimension n

Ê^n_+ the rotated quadratic cone of dimension n

H^n the space of complex Hermitian matrices of order n

H^n_+ the cone of complex Hermitian positive semidefinite matrices of order n

K_J the cone of squares of a Euclidean Jordan algebra J

Q^n_p the pth-order cone of dimension n

QH^n the space of quaternion Hermitian matrices of order n

QH^n_+ the cone of quaternion Hermitian positive semidefinite matrices of order n

S^n the space of real symmetric matrices of order n

S^n_+ the cone of real symmetric positive semidefinite matrices of order n

R^n the space of real vectors of dimension n

R^n_+ the nonnegative orthant cone of R^n

‖x‖ the Frobenius norm of an element x

‖x‖_2 the Euclidean norm of a vector x

‖x‖_p the p-norm of a vector x

⌊x⌋ the linear representation of an element x

⌈x⌉ the quadratic representation of an element x


X ⪰ 0 the matrix X is positive semidefinite

X ≻ 0 the matrix X is positive definite

x ⪰ 0 the vector x lies in a second-order cone of appropriate dimension

x ≻ 0 the vector x lies in the interior of a second-order cone of appropriate dimension

x ⪰^N 0 the vector x lies in the Cartesian product of N second-order cones with appropriate dimensions

x ⪰̂ 0 the vector x lies in a rotated quadratic cone of appropriate dimension

x ⪰̂^N 0 the vector x lies in the Cartesian product of N rotated quadratic cones with appropriate dimensions

x ⪰_{K_J} 0 x is an element of a symmetric cone K_J

x ≻_{K_J} 0 x is an element of the interior of a symmetric cone K_J

x ⪰_{⟨p⟩} 0 the vector x lies in a pth-order cone of appropriate dimension

x ⪰^N_{⟨p⟩} 0 the vector x lies in the Cartesian product of N pth-order cones with appropriate dimensions

x ⪰_{⟨p_1,p_2,...,p_N⟩} 0 the vector x lies in the Cartesian product of N cones of orders p_1, p_2, . . . , p_N with appropriate dimensions

x ∘ y the Jordan multiplication of elements x and y of a Jordan algebra

x • y the inner product trace(x ∘ y) of elements x and y of a Euclidean Jordan algebra


Chapter 1

Introduction and Background

1.1 Introduction

The purpose of this dissertation is to introduce the two-stage stochastic symmetric

programs (SSPs)¹ with recourse and to study this problem in the dual standard form:

    max   c^T x + E[Q(x, ω)]
    s.t.  Ax + ξ = b                                            (1.1.1)
          ξ ∈ K_1,

where x and ξ are the first-stage decision variables, and Q(x, ω) is the maximum of the

problem

    max   d(ω)^T y
    s.t.  W(ω) y + ζ = h(ω) − T(ω) x                            (1.1.2)
          ζ ∈ K_2,

¹Based on tradition in the optimization literature, we use the term stochastic symmetric program to mean the generic form of a problem, and the term stochastic symmetric programming to mean the field of activities based on that problem. While both will be denoted by the acronym SSP, the plural of the first usage will be denoted by the acronym SSPs. The acronyms DLP, DRQCP, DSDP, DSOCP, DSP, MIMOCP, MOCP, 0-1MOCP, SLP, SMOCP, SRQCP, SSDP, and SSOCP are defined and used in accordance with this custom.


where y and ζ are the second-stage variables, E[Q(x, ω)] := ∫_Ω Q(x, ω) P(dω), the matrix

A and the vectors b and c are deterministic data, and the matrices W(ω) and T(ω) and

the vectors h(ω) and d(ω) are random data whose realizations depend on an underlying

outcome ω in an event space Ω with a known probability function P. The cones K_1 and K_2

are symmetric cones (i.e., closed, convex, pointed, self-dual cones with their automorphism

groups acting transitively on their interiors) in R^{n_1} and R^{n_2}, respectively. (Here, n_1

and n_2 are positive integers.)

The birth of symmetric programming (also known as symmetric cone programming

[36]) as a subfield of convex optimization can be dated back to 2003. The main motivation

of this generalization was its ability to handle many important applications that cannot be

covered by linear programming. In symmetric programs, we minimize a linear function

over the intersection of an affine set and a so-called symmetric cone. In particular, if

the symmetric cone is the nonnegative orthant, the result is linear programming; if it is the

second-order cone, the result is second-order cone programming; and if it is the

semidefinite cone (the cone of all real symmetric positive semidefinite matrices), the result

is semidefinite programming. It is seen that symmetric programming is a generalization

of linear programming that includes second-order cone programming and semidefinite

programming as special cases. We shall refer to such problems as deterministic linear

programs (DLPs), deterministic second-order cone programs (DSOCPs), deterministic

semidefinite programs (DSDPs) and deterministic symmetric programs (DSPs) because

the data defining applications leading to such problems is assumed to be known with

certainty. It has been found that DSPs are related to many application areas including

finance, geometry, robust linear programming, matrix optimization, norm minimization

problems, and relaxations in combinatorial optimization. We refer the reader to the survey

papers by Todd [38] and Vandenberghe and Boyd [41] which discuss, in particular, DSDP

and its applications, and the survey papers of Alizadeh and Goldfarb [1], and Lobo, et al.

[23] which discuss, in particular, DSOCPs with a number of applications in many areas


including a variety of engineering applications.

Deterministic optimization problems are formulated to find optimal decisions in prob-

lems with certainty in data. In fact, in some applications we cannot specify the model

entirely because it depends on information that is not available at the time of formu-

lation but will be determined at some point in the future. Stochastic programs have

been studied since the 1950s to handle those problems that involve uncertainty in data. See

[17, 32, 33] and references contained therein. In particular, two-stage stochastic linear

programs (SLPs) have been established to formulate many applications (see [15] for ex-

ample) of linear programming with uncertain data. There are efficient algorithms (both

interior and noninterior point) for solving SLPs. The class of SSP problems may be viewed

as an extension of DSPs (by allowing uncertainty in data) on the one hand, and as an

extension of SLPs (where K_1 and K_2 are both nonnegative orthant cones) or, more

generally, stochastic semidefinite programs (SSDPs) with recourse [9, 25] (where K1 and

K2 are both semidefinite cones) on the other hand.

Interior point methods [30] are considered to be one of the most successful classes of

algorithms for solving deterministic (linear and nonlinear) convex optimization problems.

This provides motivation to investigate whether decomposition-based interior point algo-

rithms can be developed for stochastic programming. Zhao [48] derived a decomposition

algorithm for SLPs based on a logarithmic barrier and proved its polynomial complexity.

Mehrotra and Ozevin [25] have proved important results that extend the work of Zhao

[48] to the case of SSDPs including a derivation of a polynomial logarithmic barrier de-

composition algorithm for this class of problems that extends Zhao’s algorithm for SLPs.

An alternative to the logarithmic barrier is the volumetric barrier of Vaidya [40] (see also

[5, 6, 7]). It has been observed [8] that certain cutting plane algorithms [21] for SLPs

based on the volumetric barrier perform better in practice than those based on the loga-

rithmic barrier. Recently, Ariyawansa and Zhu [10] have derived a class of decomposition

algorithms for SSDPs based on a volumetric barrier analogous to the work of Mehrotra and


Ozevin [25] by utilizing the work of Anstreicher [7] for DSDP.

Concerning algorithms for DSPs, Schmieta and Alizadeh [35, 36] and Rangarajan [34]

have extended interior point algorithms for DSDP to DSP. Concerning algorithms for

SSPs, we know of no interior point algorithms for solving them that exploit the special

structure of the symmetric cone as is done in [35, 36, 34] for DSP. The question that

naturally arises now is whether interior point methods could be derived for solving SSPs,

and if the answer is affirmative, it is important to ask whether or not we can prove the

polynomial complexity of the resulting algorithms. Our particular concern in this disser-

tation is to extend decomposition-based interior point methods for SSDPs to stochastic

optimization problems over all symmetric cones based on both logarithmic and volumet-

ric barriers and also to prove the polynomial complexity of the resulting algorithms. In

fact, there is a unifying theory [18] based on Euclidean Jordan algebras that connects all

symmetric cones. This theory is very important for a sound understanding of the Jordan

algebraic characterization of symmetric cones and, hence, a correct understanding of the

close equivalence between Problem (1.1.1, 1.1.2) and our constructive definition of

an SSP, which will be presented in Chapter 2.

Now let us indicate briefly how to solve Problem (1.1.1, 1.1.2). We assume that the

event space Ω is discrete and finite. In practice, the case of interest is when the random

data have a finite number of realizations, because the inputs of the modeling process

require that SSPs be solved for a finite number of scenarios. The general scheme of the

decomposition-based interior point algorithms is as follows. As we have an SSP with

finitely many scenarios, we can explicitly formulate the problem as a large-scale DSP; we

then use primal-dual interior point methods to solve the resulting large-scale formulation

directly, which can be successfully accomplished by utilizing the special structure of the

underlying symmetric cones.
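For concreteness, if Ω = {ω_1, . . . , ω_K} with probabilities p_1, . . . , p_K, the large-scale DSP just mentioned can be sketched as follows (a minimal illustration of the extensive form of Problem (1.1.1, 1.1.2) under the finiteness assumption; the scenario notation ω_k, p_k here is ours, and the precise formulation used by the algorithms appears in later chapters):

    max   c^T x + Σ_{k=1}^K p_k d(ω_k)^T y_k
    s.t.  Ax + ξ = b
          T(ω_k) x + W(ω_k) y_k + ζ_k = h(ω_k),   k = 1, . . . , K,
          ξ ∈ K_1,  ζ_k ∈ K_2,                    k = 1, . . . , K.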

This dissertation is organized as follows. In the remaining part of this chapter, we

outline a minimal foundation of the theory of Euclidean Jordan algebras.


In Chapter 2 we explicitly introduce the constructive definition

of the SSP problem in both the primal and dual standard forms and then introduce six general

classes of optimization problems that can be cast as SSPs. The focus of Chapter 3 is on

extending the work of Mehrotra and Ozevin [25] to the case of SSPs by deriving a class

of logarithmic barrier decomposition algorithms for SSPs and establishing the polynomial

complexity of the resulting algorithm.

In Chapter 4, we extend the work of Ariyawansa and Zhu [10] to the case of SSPs

by deriving a class of volumetric barrier decomposition algorithms for SSPs and proving

polynomial complexity of certain members of the class of algorithms.

Chapter 5 is devoted to describing four applications leading to two important

special cases of SSPs. More specifically, we describe the stochastic Euclidean facility

location problem and the portfolio optimization problem with loss risk constraints as two

applications of SSPs when the symmetric cones K1 and K2 are both second-order cones,

and then describe the optimal covering random ellipsoid problem and an application in

structural optimization as two applications of SSPs when the symmetric cones K1 and K2

are both rotated quadratic cones (see §2.2 for definitions).

The material in Chapters 3–5 is independent, but it essentially depends on the

material of Chapters 1 and 2. So, after reading Chapter 2, one can proceed directly to

Chapter 3 and/or Chapter 4 concerning theoretical results (algorithms and complexity

analysis), and/or to Chapter 5 concerning application models (see Figure 1.1).

In Chapter 6, we propose the so-called multi-order cone programming problem as a

new conic optimization problem. This problem is beyond the scope of this dissertation

because it is over non-symmetric cones, so we will leave it as an unsolved open problem.

We indicate weak and strong duality relations of this optimization problem and describe

an application of it.

We conclude this dissertation in Chapter 7 by summarizing its contributions and

indicating some possible future research directions.


Figure 1.1: The material has been organized into seven chapters. After reading Chapter 2, the reader can proceed directly to Chapter 3, Chapter 4, and/or Chapter 5.

We gave a concise definition of symmetric cones immediately after Problem (1.1.1,

1.1.2). In §1.2, we write down this definition more explicitly, and in §1.3 we review the

theory of Euclidean Jordan algebras that connects all symmetric cones. The text of Faraut

and Koranyi [18] covers the foundations of this theory.

In Chapter 2, the reader will see how general and abstract the problem is. In order to

make our presentation concrete, we will describe two interesting examples of symmetric

cones in some detail throughout this chapter: the second-order cone and the cone of

real symmetric positive semidefinite matrices. Our presentation in §1.2 and §1.3 mostly

follows that of [18] and [36], and, in particular, most examples in §1.3 are taken from [36].

We now introduce some notation that will be used in the sequel. We use R to

denote the field of real numbers. If A ⊆ R^k and B ⊆ R^l, then the Cartesian product of

A and B is A × B := {(x; y) : x ∈ A and y ∈ B}. We use R^{m×n} to denote the vector

space of real m × n matrices. The matrices 0_n, I_n ∈ R^{n×n} denote, respectively, the zero

and the identity matrices of order n (we will write 0_n and I_n as 0 and I when n is known

from the context).

All vectors we use are column vectors with superscript “T” indicating transposition. We

use “,” for adjoining vectors and matrices in a row, and use “;” for adjoining them in a

column. So, for example, if x, y, and z are vectors, we have:


    [x; y; z] = (x^T, y^T, z^T)^T = (x; y; z).

For each vector x ∈ R^n whose first entry is indexed with 0, we write x̄ for the subvector

consisting of entries 1 through n − 1; therefore x = (x_0; x̄) ∈ R × R^{n−1}. We let E^n denote

the n-dimensional real vector space R × R^{n−1} whose elements x are indexed with 0, and

denote the space of real symmetric matrices of order n by S^n.

1.2 What is a symmetric cone?

This section and the next section are elementary. We start with some basic definitions,

leading finally to the definition of a symmetric cone.

Let V be a finite-dimensional Euclidean vector space over R with inner product 〈·, ·〉.

A subset S of V is said to be convex if it is closed with respect to convex combinations

of finite subsets of S, i.e., for any λ ∈ (0, 1), x,y ∈ S implies that λx + (1 − λ)y ∈ S.

A subset K ⊂ V is said to be a cone if it is closed under scalar multiplication by positive

real numbers, i.e., if for any λ > 0, x ∈ K implies that λx ∈ K. A convex cone is a cone

that is also a convex set. A cone is said to be closed if it is closed with respect to taking

limits, solid if it has a nonempty interior, pointed if it contains no lines or, alternatively,

it does not contain two opposite nonzero vectors (so the origin is an extreme point), and

regular if it is a closed, convex, pointed, solid cone.

We denote by GL(n,R) the general linear group of degree n over R (i.e., the set of n × n invertible

matrices with entries from R, together with the operation of ordinary matrix multiplica-

tion). For a regular cone K ⊂ V, we denote by int(K) the interior of K, and by Aut(K)

the automorphism group of K, i.e., Aut(K) := {ϕ ∈ GL(n,R) : ϕ(K) = K}.

Definition 1.2.1. Let V be a finite-dimensional real Euclidean space. A regular cone K ⊂ V


is said to be homogeneous if for each u,v ∈ int(K), there exists an invertible linear map

ϕ : V −→ V such that

1. ϕ(K) = K, i.e., ϕ is an automorphism of K, and

2. ϕ(u) = v.

In other words, Aut(K) acts transitively on the interior of K.

A regular cone K ⊂ V is said to be self-dual if it coincides with its dual cone K^*, i.e.,

    K = K^* := {s ∈ V : 〈x, s〉 ≥ 0, ∀x ∈ K},

and symmetric if it is both homogeneous and self-dual.

Almost all conic optimization problems in real-world applications are associated with

symmetric cones such as the nonnegative orthant cone, the second-order cone (see below

for definitions), and the cone of positive semidefinite matrices over the real or complex

numbers. Except for Chapter 6, which proposes an unsolved open optimization problem

over a non-symmetric cone, our focus in this dissertation is on those cones that are

symmetric.

The material in the rest of this section is not needed later because the theory of Jordan

algebras will lead us independently to the same results. However, we include this material

for the sake of clarity of the examples and completeness.

Example 1. The second-order cone E^n_+.

We show that the second-order cone (also known as the quadratic, Lorentz, or ice

cream cone) of dimension n,

    E^n_+ := {ξ ∈ E^n : ξ_0 ≥ ‖ξ̄‖_2},

with the standard inner product 〈ξ, ζ〉 := ξ^T ζ, is a symmetric cone.


In Lemma 6.2.1, we prove that the dual cone of the pth-order cone is the qth-order cone,

i.e.,

    {ξ ∈ E^n : ξ_0 ≥ ‖ξ̄‖_p}^* = {ξ ∈ E^n : ξ_0 ≥ ‖ξ̄‖_q},

where 1 ≤ p ≤ ∞ and q is conjugate to p in the sense that 1/p + 1/q = 1. Picking p = 2

(hence q = 2) implies that (E^n_+)^* = E^n_+. This demonstrates the self-duality of E^n_+. So, to

prove that the cone E^n_+ is symmetric, we only need to show that it is homogeneous. The

proof follows that of Example 1 in §2 of [18, Chapter I].

First, notice that E^n_+ can be redefined as

    E^n_+ = {ξ ∈ E^n : ξ^T J ξ ≥ 0, ξ_0 ≥ 0},   where   J := [1, 0^T; 0, −I_{n−1}].

Notice also that each element of the group G := {A ∈ R^{n×n} : A^T J A = J} maps E^n_+ onto

itself (because, for every A ∈ G, we have that (Aξ)^T J (Aξ) = ξ^T J ξ), and so does the

direct product H := R_+ × G. It now remains to show that the group H acts transitively

on the interior of E^n_+. To do so, it is enough to show that, for any x ∈ int(E^n_+), there exists

an element in H that maps e to x, where e := (1; 0) ∈ E^n.

Note that we may write x as x = λy with λ = √(x^T J x) and y ∈ E^n. Moreover, there

exists a reflector matrix Q̃ such that ȳ = Q̃ (0; . . . ; 0; r), with r = ‖ȳ‖_2. We then have

    y_0² − r² = y_0² − ‖ȳ‖_2² = y^T J y = (1/λ²) x^T J x = 1.

Therefore, there exists t ≥ 0 such that y_0 = cosh t and r = sinh t. We define

    Q := [1, 0^T; 0, Q̃]   and   H_t := [cosh t, 0^T, sinh t; 0, I_{n−2}, 0; sinh t, 0^T, cosh t].

Observing that Q, H_t ∈ G yields that Q H_t ∈ G, and therefore λ Q H_t ∈ H. The result

follows by noting that x = λ Q H_t e.
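As a quick numerical illustration of this construction (a minimal sketch, not part of the original text; it assumes NumPy, and all names here are our own), one can verify that H_t preserves J, and hence maps E^n_+ onto itself, and that elements of H applied to e stay in the interior of the cone:

import numpy as np

n = 4
J = np.diag([1.0] + [-1.0] * (n - 1))   # J = [1, 0^T; 0, -I_{n-1}]
e = np.zeros(n); e[0] = 1.0

t = 0.7
Ht = np.eye(n)
Ht[0, 0] = Ht[-1, -1] = np.cosh(t)
Ht[0, -1] = Ht[-1, 0] = np.sinh(t)

# H_t belongs to G: H_t^T J H_t = J, so it maps E^n_+ onto itself.
assert np.allclose(Ht.T @ J @ Ht, J)

# lambda * H_t is an element of H = R_+ x G; applied to e, the image is interior.
x = 2.5 * (Ht @ e)
assert x[0] > np.linalg.norm(x[1:])     # x in int(E^n_+): x_0 > ||x_bar||_2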

Example 2. The cone of real symmetric positive semidefinite matrices S^n_+.

The residents of the interior of the cone of real symmetric positive semidefinite matrices of

order n, S^n_+, are the positive definite matrices; the residents of the boundary of this cone

are the singular positive semidefinite matrices (having at least one 0 eigenvalue); and

the only matrix that resides at the origin is the zero matrix, the positive semidefinite

matrix having all 0 eigenvalues. We now show that S^n_+, with the Frobenius inner product 〈U, V〉 := trace(UV),

is a symmetric cone.

We first verify that S^n_+ is self-dual, i.e.,

    S^n_+ = (S^n_+)^* = {U ∈ S^n : trace(UV) ≥ 0 for all V ∈ S^n_+}.

We can easily prove that S^n_+ ⊆ (S^n_+)^* by assuming that X ∈ S^n_+ and observing that, for

any Y ∈ S^n_+, we have

    trace(XY) = trace(X^{1/2} Y X^{1/2}) ≥ 0.

Here we used the positive semidefiniteness of X^{1/2} Y X^{1/2} to obtain the inequality. In fact, any

U ∈ S^n can be written as U = QΛQ^T, where Q ∈ R^{n×n} is an orthogonal matrix and

Λ ∈ R^{n×n} is a diagonal matrix whose diagonal entries are the eigenvalues of U (see for

example Watkins [43, Theorem 5.4.20]). Therefore, if U ∈ S^n_+ we have that

    trace(U) = trace(QΛQ^T) = trace(ΛQ^TQ) = trace(Λ) = Σ_{i=1}^n λ_i ≥ 0.


To prove that (Sn+)∗ ⊆ Sn+, assume that X /∈ Sn+, then there exists a nonzero vector y ∈ Rn

such that trace(X yyT) = yTXy < 0, which shows that X /∈ (Sn+)∗.

So, to prove that the cone S^n_+ is symmetric, it remains to show that it is homogeneous.

The proof follows exactly that of Example 2 in §2 of [18, Chapter I].

For P ∈ GL(n,R), we define a linear map ϕ_P : S^n −→ S^n by

    ϕ_P(X) = PXP^T.

Note that ϕ_P maps S^n_+ into itself. By the Cholesky decomposition theorem, if X ∈ int(S^n_+)

then X can be decomposed into a product X = LL^T = ϕ_L(I_n), where L ∈ GL(n,R), which

establishes the result.
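The two halves of this argument are easy to test numerically. The following sketch (ours, not from the original text; it assumes NumPy) checks the self-duality inequality on random positive semidefinite matrices and realizes the map ϕ_L via a Cholesky factor:

import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
X = A.T @ A + 1e-8 * np.eye(n)   # A^T A is PSD; the jitter makes X positive definite
B = rng.standard_normal((n, n))
Y = B.T @ B                      # positive semidefinite

# Self-duality: trace(XY) >= 0 for all PSD X, Y (Example 2).
assert np.trace(X @ Y) >= -1e-10

# Homogeneity via Cholesky: X = L L^T = phi_L(I_n) with L in GL(n, R).
L = np.linalg.cholesky(X)
assert np.allclose(L @ np.eye(n) @ L.T, X)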

1.3 Symmetric cones and Euclidean Jordan algebras

In this section we review some of the basic definitions and theory of Euclidean

Jordan algebras that are necessary for our subsequent development. We will also see how

we can use Euclidean Jordan algebras to obtain symmetric cones.

Let J be a finite-dimensional vector space over R. A map ∘ : J × J −→ J is

called bilinear if for all x, y, z ∈ J and for all α, β ∈ R, we have that (αx + βy) ∘ z =

α(x ∘ z) + β(y ∘ z) and x ∘ (αy + βz) = α(x ∘ y) + β(x ∘ z).

Definition 1.3.1. A finite-dimensional vector space J over R is called an algebra over R

if a bilinear map ∘ : J × J −→ J exists.

Let x be an element in an algebra J ; then for n ≥ 2 we define x^n recursively by

x^n := x ∘ x^{n−1}.

Definition 1.3.2. Let J be a finite-dimensional R-algebra with a bilinear product ∘ :

J × J −→ J . Then (J , ∘) is called a Jordan algebra if for all x, y ∈ J :

1. x ∘ y = y ∘ x (commutativity);

2. x ∘ (x² ∘ y) = x² ∘ (x ∘ y) (Jordan's axiom).

The product x ∘ y between two elements x and y of a Jordan algebra (J , ∘) is called the

Jordan multiplication between x and y. A Jordan algebra (J , ∘) has an identity element if

there exists a (necessarily unique) element e ∈ J such that x ∘ e = e ∘ x = x for all x ∈ J .

A Jordan algebra (J , ∘) is not necessarily associative, that is, x ∘ (y ∘ z) = (x ∘ y) ∘ z may

not hold in general. However, it is power associative, i.e., x^p ∘ x^q = x^{p+q} for all integers

p, q ≥ 1.

Example 3. It is easy to see that the space R^{n×n} of n × n real matrices with the Jordan

multiplication X ∘ Y := (XY + YX)/2 forms a Jordan algebra with identity I_n.

Example 4. It can be verified that the space E^n with the Jordan multiplication x ∘ y :=

(x^T y; x_0 ȳ + y_0 x̄) forms a Jordan algebra with the identity (1; 0) ∈ E^n.

Definition 1.3.3. A Jordan algebra J is called Euclidean if there exists an inner product

〈·, ·〉 on (J , ∘) such that for all x, y, z ∈ J :

1. 〈x, x〉 > 0 ∀ x ≠ 0 (positive definiteness);

2. 〈x, y〉 = 〈y, x〉 (symmetry);

3. 〈x, y ∘ z〉 = 〈x ∘ y, z〉 (associativity).

That is, J admits a positive definite, symmetric bilinear form which is also associative.

In the sequel, we consider only Euclidean Jordan algebras with identity. We simply

denote the Euclidean Jordan algebra (J , ∘) by J .

Example 5. The space R^{n×n} is not a Euclidean Jordan algebra. However, under the

operation "∘" defined in Example 3, the subspace S^n is a Jordan subalgebra of R^{n×n} and,

indeed, is a Euclidean Jordan algebra with the inner product 〈X, Y〉 = trace(X ∘ Y) =

trace(XY). While both the symmetry and the associativity can be easily proved by using the

fact that trace(XY) = trace(YX), the positive definiteness can be immediately obtained

by observing that trace(X²) > 0 for X ≠ 0.

Example 6. It is easy to verify that the space E^n (with "∘" defined as in Example 4)

is a Euclidean Jordan algebra with the inner product 〈x, y〉 = x^T y.
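Since E^n with this product recurs throughout the dissertation, a small numerical check may help fix ideas. The sketch below (ours, assuming NumPy; jordan is a hypothetical helper name) implements x ∘ y from Example 4 and verifies commutativity and Jordan's axiom on random data:

import numpy as np

def jordan(x, y):
    # x o y := (x^T y; x0*ybar + y0*xbar), the Jordan product on E^n (Example 4).
    return np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))

rng = np.random.default_rng(1)
x, y = rng.standard_normal(5), rng.standard_normal(5)
x2 = jordan(x, x)

assert np.allclose(jordan(x, y), jordan(y, x))                          # commutativity
assert np.allclose(jordan(x, jordan(x2, y)), jordan(x2, jordan(x, y)))  # Jordan's axiom

e = np.zeros(5); e[0] = 1.0
assert np.allclose(jordan(x, e), x)                                     # identity (1; 0)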

Many of the results below also hold for the general setting of Jordan algebras, but

here we focus entirely on Euclidean Jordan algebras with identity as that generality is not

needed for our subsequent development.

Since a Euclidean Jordan algebra J is power associative, we can define the concepts

of rank, minimal and characteristic polynomials, eigenvalues, trace, and determinant for

it.

Definition 1.3.4. Let J be a Euclidean Jordan algebra. Then

1. for x ∈ J , deg(x) := min{r > 0 : {e, x, x², . . . , x^r} is linearly dependent} is called

the degree of x;

2. rank(J ) := max_{x ∈ J} deg(x) is called the rank of J .

Let x be an element of degree d in a Euclidean Jordan algebra J . We define R[x] as

the set of all polynomials in x over R:

    R[x] := { Σ_{i=0}^∞ a_i x^i : a_i ∈ R, a_i = 0 for all but a finite number of i }.

Since {e, x, x², . . . , x^d} is linearly dependent, there exists a nonzero monic polynomial

q ∈ R[x] of degree d such that q(x) = 0. In other words, there are real numbers

a_1(x), a_2(x), . . . , a_d(x), not all zero, such that

    q(x) := x^d − a_1(x) x^{d−1} + a_2(x) x^{d−2} + · · · + (−1)^d a_d(x) e = 0.


Clearly q is of minimal degree among those polynomials in R[x] having the above

properties, so we call it (or, alternatively, the polynomial p(λ) := λ^d − a_1(x) λ^{d−1} +

a_2(x) λ^{d−2} + · · · + (−1)^d a_d(x)) the minimal polynomial of x. Note that the minimal

polynomial of an element x ∈ J is unique, because otherwise, as it is monic, we would have

a contradiction to the minimality of its degree.

An element x ∈ J is called regular if deg(x) = rank(J ). If x is a regular element of a

Euclidean Jordan algebra, then we define its characteristic polynomial to be equal to its

minimal polynomial. We have the following proposition.

Proposition 1.3.1 ([18, Proposition II.2.1]). Let J be an algebra with rank r. The

set of regular elements is open and dense in J . There exist polynomials a_1, a_2, . . . , a_r

such that the minimal polynomial of every regular element x is given by p(λ) = λ^r −

a_1(x) λ^{r−1} + a_2(x) λ^{r−2} + · · · + (−1)^r a_r(x). The polynomials a_1, a_2, . . . , a_r are unique and

a_i is homogeneous of degree i.

The polynomial p(λ) is called the characteristic polynomial of the regular element x.

Since the set of regular elements is dense in J , by continuity we can extend characteristic

polynomials to all x in J . So, the minimal polynomial coincides with the characteristic

polynomial for regular elements and divides the characteristic polynomial of non-regular

elements.

Definition 1.3.5. Let x be an element in a rank-r algebra J ; then its eigenvalues are the

roots λ_1, λ_2, . . . , λ_r of its characteristic polynomial p(λ) = λ^r − a_1(x) λ^{r−1} + a_2(x) λ^{r−2} +

· · · + (−1)^r a_r(x).

Whereas the minimal polynomial has only simple roots, it is possible, in the case of

non-regular elements, that the characteristic polynomial has multiple roots. Indeed, the

characteristic and minimal polynomials have the same set of roots, except for their mul-

tiplicities. In fact, we can define the degree of x to be the number of distinct eigenvalues

of x.


Definition 1.3.6. Let x be an element in a rank-r algebra J , and let λ_1, λ_2, . . . , λ_r be the

roots of its characteristic polynomial p(λ) = λ^r − a_1(x) λ^{r−1} + a_2(x) λ^{r−2} + · · · + (−1)^r a_r(x).

Then

1. trace(x) := λ_1 + λ_2 + · · · + λ_r = a_1(x) is the trace of x in J ;

2. det(x) := λ_1 λ_2 · · · λ_r = a_r(x) is the determinant of x in J .

Example 7. All these concepts (characteristic polynomial, eigenvalues, trace, determinant,

etc.) reduce to the corresponding concepts used in S^n. Observe that rank(S^n) = n because

deg(X) is the number of distinct eigenvalues of X, which is, indeed, at most n.

Example 8. We can easily prove the following quadratic identity for x ∈ E^n:

    x² − 2x_0 x + (x_0² − ‖x̄‖_2²) e = 0.

Thus, we can define the polynomial p(λ) := λ² − 2x_0 λ + (x_0² − ‖x̄‖_2²) as the characteristic

polynomial of x in E^n, and its two roots, λ_{1,2} = x_0 ± ‖x̄‖_2, are the eigenvalues of x. We also

have that trace(x) = 2x_0 and det(x) = x_0² − ‖x̄‖_2². Observe also that λ_1 = λ_2 if and only

if x̄ = 0, in which case x is a multiple of the identity. Thus, every x ∈ E^n − {αe : α ∈ R}

has degree 2. This implies that rank(E^n) = 2, which is, unlike the rank of S^n, independent

of the dimension of the underlying vector space.
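These formulas are easy to verify numerically. A minimal sketch (ours, not from the original text; it assumes NumPy and reuses the jordan helper from the previous sketch):

import numpy as np

def jordan(u, v):
    # Jordan product on E^n (Example 4): u o v = (u^T v; u0*vbar + v0*ubar).
    return np.concatenate(([u @ v], u[0] * v[1:] + v[0] * u[1:]))

x = np.array([3.0, 1.0, 2.0])            # x = (x0; xbar) in E^3
r = np.linalg.norm(x[1:])
lam1, lam2 = x[0] + r, x[0] - r          # eigenvalues x0 +/- ||xbar||_2

assert np.isclose(lam1 + lam2, 2 * x[0])          # trace(x) = 2 x0
assert np.isclose(lam1 * lam2, x[0]**2 - r**2)    # det(x) = x0^2 - ||xbar||_2^2

# Characteristic identity: x^2 - 2 x0 x + det(x) e = 0.
e = np.array([1.0, 0.0, 0.0])
assert np.allclose(jordan(x, x) - 2 * x[0] * x + (x[0]**2 - r**2) * e, 0.0)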

For an element x ∈ J , let ⌊x⌋ : J −→ J be the linear map defined by ⌊x⌋y := x ∘ y

for all y ∈ J . Note that ⌊x⌋e = x and ⌊x⌋x = x². Note also that the operators ⌊x⌋ and

⌊x²⌋ commute because, by Jordan's axiom, ⌊x⌋⌊x²⌋y = ⌊x²⌋⌊x⌋y.

Example 9. An equivalent way of dealing with symmetric matrices is dealing with vectors

obtained from the vectorization of symmetric matrices. The operator vec : S^n −→ R^{n²}

creates a column vector from a matrix X by stacking its column vectors below one another.

Note that

    vec(X ∘ Y) = vec((XY + YX)/2) = (1/2)(vec(XYI) + vec(IYX)) = (1/2)(I ⊗ X + X ⊗ I) vec(Y) =: ⌊X⌋ vec(Y),

where we used the fact that vec(ABC) = (C^T ⊗ A) vec(B) to obtain the last equality (here,

the operator ⊗ : R^{m×n} × R^{k×l} −→ R^{mk×nl} is the Kronecker product, which maps the pair

of matrices (A, B) into the matrix A ⊗ B whose (i, j) block is a_{ij}B for i = 1, 2, . . . , m and

j = 1, 2, . . . , n). This gives the explicit formula ⌊X⌋ = (1/2)(I ⊗ X + X ⊗ I) of the ⌊·⌋ operator for S^n.
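A short numerical check of this formula (ours, not from the original text; it assumes NumPy, and note that reshape with order='F' realizes the column-stacking vec):

import numpy as np

rng = np.random.default_rng(4)
n = 3
X = rng.standard_normal((n, n)); X = (X + X.T) / 2   # symmetric X
Y = rng.standard_normal((n, n)); Y = (Y + Y.T) / 2   # symmetric Y

I = np.eye(n)
LX = 0.5 * (np.kron(I, X) + np.kron(X, I))   # the operator |X| as an n^2 x n^2 matrix

vec = lambda M: M.reshape(-1, order='F')     # column-stacking vec(.)

# |X| vec(Y) = vec(X o Y) with X o Y = (XY + YX)/2.
assert np.allclose(LX @ vec(Y), vec((X @ Y + Y @ X) / 2))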

Example 10. The explicit formula of the ⌊·⌋ operator for E^n can be immediately given

by considering the following multiplication:

    x ∘ y = (x^T y; x_0 ȳ + y_0 x̄) = [x_0, x̄^T; x̄, x_0 I] y =: Arw(x) y = ⌊x⌋ y.

Here Arw(x) ∈ S^n is the arrow-shaped matrix associated with the vector x ∈ E^n.
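A minimal numerical sketch (ours, assuming NumPy; the helper names are hypothetical) that builds Arw(x) and confirms ⌊x⌋y = Arw(x) y = x ∘ y:

import numpy as np

def arw(x):
    # Arrow-shaped matrix: Arw(x) = [x0, xbar^T; xbar, x0 I].
    A = x[0] * np.eye(x.size)
    A[0, 1:] = x[1:]
    A[1:, 0] = x[1:]
    return A

def jordan(u, v):
    return np.concatenate(([u @ v], u[0] * v[1:] + v[0] * u[1:]))

rng = np.random.default_rng(2)
x, y = rng.standard_normal(4), rng.standard_normal(4)
# |x| y = Arw(x) y reproduces the Jordan product on E^n.
assert np.allclose(arw(x) @ y, jordan(x, y))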

For x, y ∈ J , let ⌈x, y⌉ : J −→ J be the linear operator defined by

    ⌈x, y⌉ := ⌊x⌋⌊y⌋ + ⌊y⌋⌊x⌋ − ⌊x ∘ y⌋.

Therefore, in addition to ⌊x⌋, we can define another linear map ⌈x⌉ : J −→ J associated

with x, called the quadratic representation, defined by

    ⌈x⌉ := 2⌊x⌋² − ⌊x²⌋ = ⌈x, x⌉.


Example 11. Continuing Example 9, we have

    ⌈X, Z⌉ vec(Y) = ⌊X⌋⌊Z⌋ vec(Y) + ⌊Z⌋⌊X⌋ vec(Y) − ⌊X ∘ Z⌋ vec(Y)
                  = vec(X ∘ (Z ∘ Y)) + vec(Z ∘ (X ∘ Y)) − vec((X ∘ Z) ∘ Y)
                  = vec((XZY + XYZ + ZYX + YZX)/4) + vec((ZXY + ZYX + XYZ + YXZ)/4)
                    − vec((XZY + ZXY + YXZ + YZX)/4)
                  = (1/2)(vec(XYZ) + vec(ZYX))
                  = (1/2)(X ⊗ Z + Z ⊗ X) vec(Y).

This shows that ⌈X, Z⌉ = (1/2)(X ⊗ Z + Z ⊗ X) and, in particular, ⌈X⌉ = X ⊗ X.

Example 12. Continuing Example 10, we can easily verify that

    ⌈x⌉ := 2 Arw²(x) − Arw(x²) = [‖x‖_2², 2x_0 x̄^T; 2x_0 x̄, det(x) I + 2 x̄ x̄^T].

Notice that ⌊e⌋ = ⌈e⌉ = I, trace(e) = r, and det(e) = 1 (since all the eigenvalues of e

are equal to one). Notice also that the linear operator ⌊x⌋ is symmetric with respect to

〈·, ·〉 because, by the associativity of the inner product 〈·, ·〉, we have that 〈⌊x⌋y, z〉 =

〈x ∘ y, z〉 = 〈x, y ∘ z〉 = 〈y, ⌊x⌋z〉. This implies that ⌈x⌉ is also symmetric with respect

to 〈·, ·〉. It is also easy to see that ⌊y + z⌋ = ⌊y⌋ + ⌊z⌋, and consequently

⌈x, y + z⌉ = ⌈x, y⌉ + ⌈x, z⌉.

As the operator ⌈X⌉ in Example 11 plays an important role in the development of

the interior point methods for DSDP (SSDP), and the operator in Example 12 plays an

important role in the development of the interior point methods for DSOCP (SSOCP),

we would expect that the operator ⌈x⌉ will play a similar role in the development of the

interior point methods for DSP (SSP).
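For E^n, the two expressions for ⌈x⌉ in Example 12 can be checked against each other numerically. A minimal sketch (ours, not from the original text; it assumes NumPy and the helpers from the previous sketches):

import numpy as np

def arw(x):
    A = x[0] * np.eye(x.size)
    A[0, 1:] = x[1:]; A[1:, 0] = x[1:]
    return A

def jordan(u, v):
    return np.concatenate(([u @ v], u[0] * v[1:] + v[0] * u[1:]))

rng = np.random.default_rng(3)
x = rng.standard_normal(4)
Qx = 2 * arw(x) @ arw(x) - arw(jordan(x, x))   # |x| form: 2 Arw^2(x) - Arw(x^2)

# Closed form from Example 12: [||x||_2^2, 2 x0 xbar^T; 2 x0 xbar, det(x) I + 2 xbar xbar^T].
n = x.size
detx = x[0]**2 - np.linalg.norm(x[1:])**2
C = np.zeros((n, n))
C[0, 0] = x @ x
C[0, 1:] = 2 * x[0] * x[1:]; C[1:, 0] = 2 * x[0] * x[1:]
C[1:, 1:] = detx * np.eye(n - 1) + 2 * np.outer(x[1:], x[1:])
assert np.allclose(Qx, C)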

A spectral decomposition is a decomposition of x into idempotents together with its

eigenvalues. Recall that two elements c_1, c_2 ∈ J are said to be orthogonal if c_1 ∘ c_2 = 0.


A set of elements of J is orthogonal if all its elements are mutually orthogonal to each

other. An element c ∈ J is said to be an idempotent if c² = c. An idempotent is primitive

if it is non-zero and cannot be written as a sum of two (necessarily orthogonal) non-zero

idempotents.

Definition 1.3.7. Let J be a Euclidean Jordan algebra. Then a subset {c_1, c_2, . . . , c_r}

of J is called:

1. a complete system of orthogonal idempotents if it is an orthogonal set of idempotents

where c_1 + c_2 + · · · + c_r = e;

2. a Jordan frame if it is a complete system of orthogonal primitive idempotents.

Example 13. Let {q_1, q_2, . . . , q_n} be an orthonormal subset of R^n (all its vectors are

mutually orthogonal and of unit norm). Then the set {q_1q_1^T, q_2q_2^T, . . . , q_nq_n^T} is

a Jordan frame in S^n. In fact, by the orthonormality, we have that

    (q_i q_i^T) ∘ (q_j q_j^T) = 0_n if i ≠ j,   (q_i q_i^T) ∘ (q_i q_i^T) = q_i q_i^T,

and that Σ_{i=1}^n q_i q_i^T = I_n.

Example 14. Let x be a vector in E^n with x̄ ≠ 0. It is easy to see that the set

    { (1/2)(1; x̄/‖x̄‖_2), (1/2)(1; −x̄/‖x̄‖_2) }

is a Jordan frame in E^n.

Theorem 1.3.1 (Spectral decomposition (I), [18]). Let J be a Euclidean Jordan algebra

with rank r. Then for x ∈ J there exist real numbers λ_1, λ_2, . . . , λ_r and a Jordan frame

{c_1, c_2, . . . , c_r} such that x = λ_1c_1 + λ_2c_2 + · · · + λ_rc_r, where λ_1, λ_2, . . . , λ_r are the eigenvalues

of x.

It is immediately seen that the eigenvalues of elements of Euclidean Jordan algebras

are always real, which is not the case for non-Euclidean Jordan algebras.


Example 15. It is known that for any X ∈ S^n there exist an orthogonal matrix Q ∈ R^{n×n}

and a diagonal matrix Λ ∈ R^{n×n} such that X = QΛQ^T. In fact, λ_1, λ_2, . . . , λ_n, the

diagonal entries of Λ, and q_1, q_2, . . . , q_n, the columns of Q, can be used to rewrite X

equivalently as

    X = λ_1 C_1 + λ_2 C_2 + · · · + λ_n C_n,   where C_i := q_i q_i^T,

which, in view of Example 13, gives the spectral decomposition (I) of X in S^n.

Example 16. Using Example 14, the spectral decomposition (I) of x in E^n can be obtained

by considering the following identity:

    x = (x_0 + ‖x̄‖_2) · (1/2)(1; x̄/‖x̄‖_2) + (x_0 − ‖x̄‖_2) · (1/2)(1; −x̄/‖x̄‖_2),

with λ_1 := x_0 + ‖x̄‖_2 and λ_2 := x_0 − ‖x̄‖_2 the eigenvalues, and c_1, c_2 the two idempotents

of Example 14.
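The following sketch (ours, not from the original text; it assumes NumPy) verifies on a concrete x ∈ E³ that c_1, c_2 of Example 14 form a Jordan frame and that the identity above recovers x:

import numpy as np

def jordan(u, v):
    return np.concatenate(([u @ v], u[0] * v[1:] + v[0] * u[1:]))

x = np.array([1.0, 3.0, -4.0])
r = np.linalg.norm(x[1:])
lam1, lam2 = x[0] + r, x[0] - r
c1 = 0.5 * np.concatenate(([1.0],  x[1:] / r))
c2 = 0.5 * np.concatenate(([1.0], -x[1:] / r))

e = np.zeros_like(x); e[0] = 1.0
assert np.allclose(jordan(c1, c1), c1) and np.allclose(jordan(c2, c2), c2)  # idempotents
assert np.allclose(jordan(c1, c2), 0.0)                                     # orthogonal
assert np.allclose(c1 + c2, e)                                              # complete system
assert np.allclose(lam1 * c1 + lam2 * c2, x)                                # x recovered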

Theorem 1.3.2 (Spectral decomposition (II), [18]). Let J be a Euclidean Jordan algebra.

Then for x ∈ J there exist unique real numbers λ_1, λ_2, . . . , λ_k, all distinct, and a unique

complete system of orthogonal idempotents {c_1, c_2, . . . , c_k} such that x = λ_1c_1 + λ_2c_2 + · · · +

λ_kc_k.

Continuing Examples 15 and 16, we have the following two examples [36].

Example 17. To write the spectral decomposition (II) of X in S^n, let λ_1 > λ_2 > · · · > λ_k

be the distinct eigenvalues of X such that, for each i = 1, 2, . . . , k, the eigenvalue λ_i has

multiplicity m_i and orthogonal eigenvectors q_{i1}, q_{i2}, . . . , q_{im_i}. Then X can be written as

    X = λ_1 C_1 + λ_2 C_2 + · · · + λ_k C_k,   where C_i := Σ_{j=1}^{m_i} q_{ij} q_{ij}^T,

and where the set {C_1, C_2, . . . , C_k} is an orthogonal system of idempotents with Σ_{i=1}^k C_i = I_n.

Notice that for each eigenvalue λ_i, the matrix C_i is uniquely determined, even though the

corresponding eigenvectors q_{i1}, q_{i2}, . . . , q_{im_i} may not be unique.

Example 18. By Example 16, the eigenvalues of x ∈ E^n are λ_{1,2} = x_0 ± ‖x̄‖_2, and

therefore, as mentioned earlier, only multiples of the identity have multiple eigenvalues. In

fact, if x = αe for some α ∈ R, then its spectral decomposition (II) is simply αe (here

{e} is the singleton system of orthogonal idempotents).

Definition 1.3.8. Let J be a Euclidean Jordan algebra. We say that x ∈ J is positive

semidefinite (positive definite) if all its eigenvalues are nonnegative (positive).

Proposition 1.3.2 ([18, Proposition III.2.2]). If the elements x and y are positive defi-

nite, then the element ⌈x⌉y is so.

Definition 1.3.9. We say that two elements x and y of a Euclidean Jordan algebra

are simultaneously decomposed if there is a Jordan frame {c_1, c_2, . . . , c_r} such that x =

Σ_{i=1}^r λ_i c_i and y = Σ_{i=1}^r μ_i c_i.

For x ∈ J with spectral decomposition x = λ_1 c_1 + λ_2 c_2 + · · · + λ_k c_k, it is now possible

to rewrite the definition of x² as

    x² := λ_1² c_1 + λ_2² c_2 + · · · + λ_k² c_k = x ∘ x.

We also have the following definition.

Definition 1.3.10. Let x be an element of a Euclidean Jordan algebra J with a spectral

decomposition x := λ_1 c_1 + λ_2 c_2 + · · · + λ_k c_k. Then

1. the square root of x is x^{1/2} := λ_1^{1/2} c_1 + λ_2^{1/2} c_2 + · · · + λ_k^{1/2} c_k whenever all λ_i ≥ 0,

and is undefined otherwise;

2. the inverse of x is x^{−1} := λ_1^{−1} c_1 + λ_2^{−1} c_2 + · · · + λ_k^{−1} c_k whenever all λ_i ≠ 0, and is

undefined otherwise.


More generally, if f is any real-valued continuous function, then it is also possible to

extend the above definition to define f(x) as

    f(x) := f(λ_1) c_1 + f(λ_2) c_2 + · · · + f(λ_k) c_k.

Observe that x^{−1} ∘ x = e. We call x invertible if x^{−1} is defined, and non-invertible or

singular otherwise. Note that every positive definite element is invertible and its inverse

is also positive definite.
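A minimal numerical sketch of these definitions on E^n (ours, not from the original text; it assumes NumPy, and spectral is our helper implementing Examples 14 and 16):

import numpy as np

def jordan(u, v):
    return np.concatenate(([u @ v], u[0] * v[1:] + v[0] * u[1:]))

def spectral(x):
    r = np.linalg.norm(x[1:])
    c1 = 0.5 * np.concatenate(([1.0],  x[1:] / r))
    c2 = 0.5 * np.concatenate(([1.0], -x[1:] / r))
    return x[0] + r, x[0] - r, c1, c2

x = np.array([3.0, 1.0, 2.0])                  # positive definite: x0 > ||xbar||_2
lam1, lam2, c1, c2 = spectral(x)
sqrt_x = np.sqrt(lam1) * c1 + np.sqrt(lam2) * c2
inv_x  = (1 / lam1) * c1 + (1 / lam2) * c2

e = np.array([1.0, 0.0, 0.0])
assert np.allclose(jordan(sqrt_x, sqrt_x), x)  # (x^{1/2})^2 = x
assert np.allclose(jordan(inv_x, x), e)        # x^{-1} o x = e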

Remark 1. The equality x ∘ y = e need not imply that y = x^{−1}, as can be seen in the

following equality [18, Chapter II]:

    [1, 0; 0, −1] ∘ [1, 1; 1, −1] = [1, 0; 0, 1] = I,

where X := [1, 0; 0, −1] = X^{−1}, while Y := [1, 1; 1, −1] ≠ X^{−1}.

We define the differential operator D_x : J −→ J by

    D_x f(x) := (df(λ_1)/dλ_1) c_1 + (df(λ_2)/dλ_2) c_2 + · · · + (df(λ_k)/dλ_k) c_k.

The Jacobian matrix ∇_x f(x) is defined so that (∇_x f(x))^T y = (D_x f(x)) ∘ y for all

y ∈ J . For n ≥ 2, we define D_x^n f(x) recursively by D_x^n f(x) := D_x(D_x^{n−1} f(x)). For

instance, D_x² x = D_x(D_x x) = D_x e = 0, and D_x x^{−1} = −x^{−2}, provided that x is invertible.

More generally, if y is a function of x and they are both simultaneously decomposed,

then D_x f(y) := D_y f(y) ∘ D_x y. For example, D_x y^{−1} = −y^{−2} ∘ D_x y, provided that y

is invertible. Note that, if y and z are functions of x and they are all simultaneously

decomposed, then D_x(y ± z) = D_x y ± D_x z, and D_x(y ∘ z) = (D_x y) ∘ z + y ∘ (D_x z).

This differential operator will play an important role when computing

partial derivatives.


There is a one-to-one correspondence between Euclidean Jordan algebras and sym-

metric cones.

Definition 1.3.11. If J is a Euclidean Jordan algebra, then its cone of squares is the set

    K_J := {x² : x ∈ J }.

Such a one-to-one correspondence between (cones of squares of) Euclidean Jordan

algebras and symmetric cones is given by the following fundamental result, which says

that a cone is symmetric if and only if it is the cone of squares of some Euclidean Jordan

algebra.

Theorem 1.3.3 (Jordan algebraic characterization of symmetric cones, [18]). A regular

cone K is symmetric iff K = K_J for some Euclidean Jordan algebra J .

The above result implies that an element is positive semidefinite if and only if it

belongs to the cone of squares, and it is positive definite if and only if it belongs to the

interior of the cone of squares. In other words, an element x in a Euclidean Jordan algebra

J is positive semidefinite if and only if x ∈ K_J , and is positive definite if and only if

x ∈ int(K_J ), where int(K_J ) denotes the interior of the cone K_J .

The following notation will be used throughout the dissertation. For a Euclidean

Jordan algebra J , we write x ⪰_{K_J} 0 and x ≻_{K_J} 0 to mean that x ∈ K_J and x ∈ int(K_J ),

respectively. We also write x ⪰_{K_J} y (or y ⪯_{K_J} x) and x ≻_{K_J} y (or y ≺_{K_J} x) to mean

that x − y ⪰_{K_J} 0 and x − y ≻_{K_J} 0, respectively.

Example 19. The cone K_{S^n} is, indeed, S^n_+, the cone of real symmetric positive semidef-

inite matrices of order n. This can be seen in view of the fact that a symmetric matrix is

the square of another symmetric matrix if and only if it is a positive semidefinite matrix. To

prove this fact, suppose that X is positive semidefinite; then it has nonnegative eigenvalues


λ_1, λ_2, . . . , λ_n and, by the spectral theorem for real symmetric matrices, there exist an or-

thogonal matrix Q ∈ R^{n×n} and a diagonal matrix² Λ := diag(λ_1; λ_2; . . . ; λ_n) ∈ R^{n×n} such

that X = QΛQ^T. By letting Y := QΛ^{1/2}Q^T ∈ S^n, where Λ^{1/2} := diag(√λ_1; √λ_2; . . . ; √λ_n),

it follows that

    X = QΛQ^T = QΛ^{1/2}Λ^{1/2}Q^T = (QΛ^{1/2}Q^T)(QΛ^{1/2}Q^T) = Y².

To prove the other direction, let us assume that X = Y² for some Y ∈ S^n. It is clear

that X ∈ S^n. To show that X is positive semidefinite, let (λ, v) be an eigenpair of Y;

then λ ∈ R (every real symmetric matrix has real eigenvalues). Furthermore, we have

that Xv = Y²v = Y(λv) = λ²v. Thus, (λ², v) is an eigenpair of X. Since Y has a complete

set of such eigenpairs, X has only nonnegative eigenvalues, and this completes the proof.

Example 20. The cone of squares of E^n is the second-order cone of dimension n:

    E^n_+ := {ξ ∈ E^n : ξ_0 ≥ ‖ξ̄‖_2}.

To see this, recall that the cone of squares (with respect to "∘" defined in Example 4) is

    K_{E^n} = {ζ² : ζ ∈ E^n} = {(‖ζ‖_2²; 2ζ_0 ζ̄) : ζ ∈ E^n}.

Thus, any x ∈ K_{E^n} can be written as x = (‖y‖_2²; 2y_0 ȳ) for some y ∈ E^n. It follows that

    x_0 = ‖y‖_2² = y_0² + ‖ȳ‖_2² ≥ 2|y_0| ‖ȳ‖_2 = ‖x̄‖_2,

where the inequality follows by observing that (|y_0| − ‖ȳ‖_2)² ≥ 0. This means that x ∈ E^n_+

and hence K_{E^n} ⊆ E^n_+. The proof of the other direction can be found in §4 of [1].

²The operator diag(·) maps its argument to a block diagonal matrix; for example, if x ∈ R^n, then diag(x) is the n × n diagonal matrix with the entries of x on the diagonal.
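A quick numerical illustration of the inclusion K_{E^n} ⊆ E^n_+ (ours, not from the original text; it assumes NumPy):

import numpy as np

def jordan(u, v):
    return np.concatenate(([u @ v], u[0] * v[1:] + v[0] * u[1:]))

rng = np.random.default_rng(5)
for _ in range(100):
    z = rng.standard_normal(4)
    x = jordan(z, z)                                # x = z^2 = (||z||_2^2; 2 z0 zbar)
    assert x[0] >= np.linalg.norm(x[1:]) - 1e-12    # x lies in E^4_+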


Theorem 1.3.4 ([18, Theorem III.1.5]). Let J be a Jordan algebra over R with identity

element e. The following properties are equivalent:

1. J is a Euclidean Jordan algebra.

2. The symmetric bilinear form trace(x ∘ y) is positive definite.

A direct consequence of the above theorem is that if J is a Euclidean Jordan algebra,

then x • y := trace(x ∘ y) is an inner product. In the sequel, we define the inner product

as 〈x, y〉 := x • y = trace(x ∘ y) and call it the Frobenius inner product. It is easy to

see that, for x, y, z ∈ J : x • e = trace(x), x • y = y • x, (x + y) • z = x • z + y • z,

x • (y + z) = x • y + x • z, and (x ∘ y) • z = x • (y ∘ z).

For x ∈ J , the Frobenius norm (denoted by ‖·‖_F, or simply ‖·‖) is defined as ‖x‖ :=

√(x • x). We can also define various norms on J as functions of eigenvalues. For example,

the definition of the Frobenius norm can be rewritten as ‖x‖ := √(λ_1² + λ_2² + · · · + λ_k²) =

√trace(x²) = √(x • x). The Cauchy–Schwarz inequality holds for the Frobenius inner

product, i.e., for x, y ∈ J , |x • y| ≤ ‖x‖ ‖y‖ (see for example [34]). Note that ‖e‖ =

√(e • e) = √r.

Let x ∈ J and t ∈ R. For a function g := g(x, t) from J × R into R, we will use "g′"

for the partial derivative of g with respect to t, and "∇_x g", "∇²_{xx} g", "∇³_{xxx} g" to denote

the gradient, Hessian, and third-order partial derivative of g with respect to x. For a

function y := y(x, t) from J × R into J , we will also use "y′" for the partial derivative

of y with respect to t, "D_x y" to denote the first partial derivative of y with respect to

x, and "∇_x y" to denote the Jacobian matrix of y (with respect to x).

Let h1, h2, . . . , hk ∈ J. For a function f from J into R, we write

    ∇ᵏx...x f(x)[h1, h2, . . . , hk] := ∂ᵏ f(x + t1h1 + · · · + tkhk) / (∂t1 · · · ∂tk) |_{t1=···=tk=0}

to denote the value of the kth differential of f taken at x along the directions h1, h2, . . . , hk.


We now present some handy tools that will help with our computations.

Lemma 1.3.1. Let J be a Euclidean Jordan algebra with identity e, and x, y, z ∈ J. Then

1. (ln det(e + tx))′|_{t=0} = trace(x) and, more generally, (ln det(y + tx))′|_{t=0} = y⁻¹ • x, provided that det(e + tx) and det(y + tx) are positive.

2. (trace((e + tx)⁻¹))′|_{t=0} = −trace(x) and, more generally, (trace((e + t x ∘ y)⁻¹))′|_{t=0} = −x • y, provided that e + tx (respectively, e + t x ∘ y) is invertible.

3. ∇x ln det x = x⁻¹, provided that det x is positive (so x is invertible). More generally, if y is a function of x, then ∇x ln det y = (∇x y)^T y⁻¹, provided that det y is positive.

4. ∇x x⁻¹ = −⌈x⁻¹⌉, provided that x is invertible, and hence ∇²xx ln det x = ∇x x⁻¹ = −⌈x⁻¹⌉. More generally, if y is a function of x, then ∇x y⁻¹ = −⌈y⁻¹⌉∇x y, provided that y is invertible.

5. ∇x ⌈x⌉[y] = 2⌈x, y⌉.

6. If x and y are functions of t, where t ∈ R, then (x ∘ y)′ = x ∘ y′ + x′ ∘ y. In other words, (⌊x⌋y)′ = ⌊x′⌋y + ⌊x⌋y′. Therefore ⌊x⌋′ = ⌊x′⌋, ⌈x, y⌉′ = ⌈x, y′⌉ + ⌈x′, y⌉ and, in particular, ⌈x⌉′ = 2⌈x, x′⌉.

7. ∇x trace(x) = e = Dx x and, more generally, if y is a function of x and they are both simultaneously decomposed, then ∇x trace(y) = (∇x y)^T e = Dx y. Hence, if y and z are functions of x and they are all simultaneously decomposed, then

    ∇x(y • z) = Dx(y ∘ z) = (Dx y) ∘ z + (Dx z) ∘ y = (∇x y)^T z + (∇x z)^T y.


The proofs of most of these statements are straightforward. We only indicate that item 1 follows from the facts that (det(e + tx))′|_{t=0} = trace(x) and that (det(y + tx))′|_{t=0} = det(y)(y⁻¹ • x) [18, Proposition III.4.2]; the first statement in item 2 follows from spectral decomposition (I); the first statement in item 3 is taken from [18, Proposition III.4.2]; the first statement in item 4 is taken from [18, Proposition II.3.3]; item 5 is taken from §3 of [18, Chapter II]; the first statement in item 6 is taken from §4 of [18, Chapter II]; and the first statement in item 7 is obtained by using item 3 and the observation that det(exp(x)) = exp(trace(x)).
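For instance, item 3 can be checked numerically in the special case J = Sⁿ, where x⁻¹ • h = trace(X⁻¹H); the following finite-difference sketch (assuming NumPy) compares both sides:

    import numpy as np

    rng = np.random.default_rng(2)
    B = rng.standard_normal((4, 4))
    X = B @ B.T + 4.0 * np.eye(4)                  # symmetric positive definite
    H = rng.standard_normal((4, 4)); H = (H + H.T) / 2.0

    t = 1e-6                                       # central difference of ln det at X
    fd = (np.linalg.slogdet(X + t * H)[1] - np.linalg.slogdet(X - t * H)[1]) / (2 * t)
    exact = np.trace(np.linalg.solve(X, H))        # x^{-1} . h = trace(X^{-1} H)
    assert abs(fd - exact) < 1e-6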

Definition 1.3.12. We say two elements x, y of a Euclidean Jordan algebra J operator commute if ⌊x⌋⌊y⌋ = ⌊y⌋⌊x⌋. In other words, x and y operator commute if, for all z ∈ J, we have that x ∘ (y ∘ z) = y ∘ (x ∘ z).

We remind the reader of the fact that two matrices X, Y ∈ Sⁿ commute if and only if XY is symmetric, if and only if X and Y can be simultaneously diagonalized, i.e., they share a common system of orthonormal eigenvectors (the same Q). This fact is generalized in the following theorem, which also applies to multiple-block elements.

Theorem 1.3.5 ([36, Theorem 27]). Two elements of a Euclidean Jordan algebra operator

commute if and only if they are simultaneously decomposed.

Note that if two operator-commuting elements x and y are invertible, then so is x ∘ y. Moreover, (x ∘ y)⁻¹ = x⁻¹ ∘ y⁻¹ and det(x ∘ y) = det(x) det(y) (see also [18, Proposition II.2.2]). In Remark 1 we mentioned that the equality x ∘ y = e may not imply that y = x⁻¹. However, the equality x ∘ y = e does imply that y = x⁻¹ when the elements x and y operator commute.

Lemma 1.3.2 (Properties of ⌈·⌉). Let x and y be elements of a Euclidean Jordan algebra with rank r and dimension n, x invertible, and let k be an integer. Then

1. det(⌈x⌉y) = det²(x) det(y).

2. ⌈x⌉x⁻¹ = x and ⌈x⌉e = x².

3. ⌈⌈y⌉x⌉ = ⌈y⌉⌈x⌉⌈y⌉.

4. ⌈xᵏ⌉ = ⌈x⌉ᵏ.

The first three items of the above lemma are taken from [18, Chapters II and III], and the last one is taken from [36, §2].

We use "," for adjoining elements of a Euclidean Jordan algebra J in a row, and ";" for adjoining them in a column. Thus, if J is a Euclidean Jordan algebra and xi ∈ J for i = 1, 2, . . . , m, we write (x1; x2; . . . ; xm) for the column whose blocks are x1, x2, . . . , xm, an element of J × J × · · · × J (m times), and we write

    (x1; x2; . . . ; xm)^T := (x1, x2, . . . , xm) := [x1 x2 · · · xm].

As we mentioned earlier, we also use the superscript "T" to indicate transposition of column vectors in Rⁿ.

We end this section with the following lemma which is essentially a part of Lemma 12

in [36].

Lemma 1.3.3. Let x ∈ J with a spectral decomposition x = λ1c1 + λ2c2 + · · · + λrcr. Then the following statements hold.

1. The matrices ⌊x⌋ and ⌈x⌉ commute and thus share a common system of eigenvectors.

2. Every eigenvalue of ⌊x⌋ has the form (1/2)(λi + λj) for some i, j ≤ r. In particular, x ≽_{KJ} 0 (x ≻_{KJ} 0) if and only if ⌊x⌋ ⪰ 0 (⌊x⌋ ≻ 0). The eigenvalues of x, the λi's, are among the eigenvalues of ⌊x⌋.


Chapter 2

Stochastic Symmetric Optimization

Problems

In this chapter, we use the Jordan algebraic characterization of symmetric cones to define

the SSP problem in both the primal and dual standard forms. We then see how this

problem can include some general optimization problems as special cases.

2.1 The stochastic symmetric optimization problem

We define a problem based on the DSP problem, analogous to the way the SLP problem is defined based on the DLP problem. To do so, we first introduce the definition of a DSP.

Let J be a Euclidean Jordan algebra with dimension n and rank r. The DSP problem and its dual [36] are

    (P1)  min  c • x                                  (D1)  max  b^T y
          s.t. ai • x = bi,  i = 1, 2, . . . , m            s.t. ∑_{i=1}^m yi ai ≼_{KJ} c
               x ≽_{KJ} 0;                                       y ∈ Rᵐ,


where c,ai ∈ J for i = 1, 2, . . . ,m, b ∈ Rm, x is the primal variable, y is the dual

variable, and, as mentioned in §2.1, KJ is the cone of squares of J .

The pair (P1, D1) can be rewritten in the following compact form:

    (P2)  min  c • x                 (D2)  max  b^T y
          s.t. Ax = b                      s.t. A^T y ≼_{KJ} c
               x ≽_{KJ} 0;                      y ∈ Rᵐ,

where A := (a1; a2; . . . ; am) is a linear operator that maps J into Rᵐ and A^T is a linear operator that maps Rᵐ into J such that x • A^T y = (Ax)^T y. In fact, we can prove weak and strong duality properties for the pair (P2, D2) as justification for referring to them as a primal–dual pair; see, for example, [30].

In the rest of this section, we assume that m1, m2, n1, n2, r1, and r2 are positive integers, and that J1 and J2 are Euclidean Jordan algebras with identities e1 and e2, dimensions n1 and n2, and ranks r1 and r2, respectively.

2.1.1 Definition of an SSP in primal standard form

We define the primal form of the SSP based on the primal form of the DSP. Let ai, tj ∈ J1 and wj ∈ J2 be given for i = 1, 2, . . . , m1 and j = 1, 2, . . . , m2. Let A := (a1; a2; . . . ; am1) be the linear operator that maps x to the m1-dimensional vector whose ith component is ai • x, let b ∈ Rᵐ¹ and c ∈ J1, let T := (t1; t2; . . . ; tm2) be the linear operator that maps x to the m2-dimensional vector whose jth component is tj • x, let W := (w1; w2; . . . ; wm2) be the linear operator that maps y to the m2-dimensional vector whose jth component is wj • y, and let h ∈ Rᵐ² and d ∈ J2. We also assume that A, b and c are deterministic data, and that T, W, h and d are random data whose realizations depend on an underlying outcome ω in an event space Ω with a known probability function P. Given this data, an SSP with recourse in


primal standard form is

    min  c • x + E[Q(x, ω)]
    s.t. Ax = b                                          (2.1.1)
         x ≽_{KJ1} 0,

where Q(x, ω) is the minimum value of the problem

    min  d(ω) • y
    s.t. W(ω)y = h(ω) − T(ω)x                            (2.1.2)
         y ≽_{KJ2} 0,

where x is the first-stage decision variable, y is the second-stage variable, and

    E[Q(x, ω)] := ∫_Ω Q(x, ω) P(dω).

2.1.2 Definition of an SSP in dual standard form

In many applications we find that an SSP defined based on the dual standard form (D2) is more useful. In this part we define the dual form of the SSP based on the dual form of the DSP. Let ai ∈ J1 and ti, wj ∈ J2 be given for i = 1, 2, . . . , m1 and j = 1, 2, . . . , m2. Let A := (a1, a2, . . . , am1), b ∈ J1, c ∈ Rᵐ¹, T := (t1, t2, . . . , tm1), W := (w1, w2, . . . , wm2), h ∈ J2, and d ∈ Rᵐ². We also assume that A, b and c are deterministic data, and T, W, h and d are random data whose realizations depend on an underlying outcome ω in an event space Ω with a known probability function P. Given this data, an SSP with recourse in dual standard form is

    max  c^T x + E[Q(x, ω)]
    s.t. Ax ≼_{KJ1} b,                                   (2.1.3)

where Q(x, ω) is the maximum value of the problem

    max  d(ω)^T y
    s.t. W(ω)y ≼_{KJ2} h(ω) − T(ω)x,                     (2.1.4)

where x ∈ Rᵐ¹ is the first-stage decision variable, y ∈ Rᵐ² is the second-stage variable, and

    E[Q(x, ω)] := ∫_Ω Q(x, ω) P(dω).

In fact, it is also possible to define SSPs in "mixed forms," where the first stage is based on the primal problem (P2) while the second stage is based on the dual problem (D2), and vice versa.

2.2 Problems that can be cast as SSPs

It is interesting to mention that almost all conic optimization problems in real-world applications are associated with symmetric cones and that, as an illustration of the modeling power of conic programming, all deterministic convex programming problems can be formulated as deterministic conic programs (see [29]). As a consequence, by

considering the stochastic counterpart of this result, it is straightforward to show that all

stochastic convex programming problems can be formulated as stochastic conic programs.

In this section we will see how SSPs can include some general optimization problems as

special cases. We start with two-stage stochastic linear programs with recourse.

Problem 1. Stochastic linear programs:

It is clear that the space of real numbers R with Jordan multiplication x ∘ y := xy and inner product x • y := xy forms a Euclidean Jordan algebra. Since a real number is the square of another real number if and only if it is nonnegative, the cone of squares of R is indeed R₊, the set of all nonnegative real numbers. This verifies that R₊ is a symmetric cone. The nonnegative orthant Rᵖ₊ of Rᵖ is also a symmetric cone because it is just


    Euclidean Jordan algebra        Rᵖ = {(x1; x2; . . . ; xp) : xi ∈ R, i = 1, 2, . . . , p}
    Symmetric cone                  Rᵖ₊ = {x ∈ Rᵖ : xi ≥ 0, i = 1, 2, . . . , p}
    Conic inequality                x ≽_{Rᵖ₊} 0 ≡ x ≥ 0  and  x ≻_{Rᵖ₊} 0 ≡ x > 0
    Jordan multiplication           x ∘ y = (x1y1; x2y2; · · · ; xpyp) = diag(x)y, so ⌊x⌋ = diag(x)
    Inner product                   x • y = x1y1 + x2y2 + · · · + xpyp = x^T y
    Identity element                e = (1; 1; . . . ; 1)
    Spectral decomposition          x = x1(1; 0; . . . ; 0) + x2(0; 1; . . . ; 0) + · · · + xp(0; 0; . . . ; 1),
                                    i.e., λi = xi and ci is the ith standard unit vector
    Cone rank                       rank(Rᵖ₊) = p
    Expression for trace            trace(x) = x1 + x2 + · · · + xp
    Expression for determinant      det(x) = x1 x2 · · · xp
    Expression for Frobenius norm   ‖x‖ = √(x1² + x2² + · · · + xp²) = ‖x‖₂
    Expression for inverse          x⁻¹ = (1/x1; 1/x2; . . . ; 1/xp)  (if xi ≠ 0 for all i)
    Expression for square root      x^{1/2} = (√x1; √x2; . . . ; √xp)  (if xi ≥ 0 for all i)
    Log barrier function            −ln det(x) = −∑_{i=1}^p ln xi  (if xi > 0 for all i)

Table 2.1: The Euclidean Jordan algebraic structure of the nonnegative orthant cone.

the Cartesian product of the symmetric cones R₊, R₊, . . . , R₊ (p times). Table 2.1 summarizes the Euclidean Jordan algebraic structure of the nonnegative orthant cone Rᵖ₊.
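The rows of Table 2.1 are easy to make concrete; a minimal sketch, assuming NumPy (all function names are ours):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    e = np.ones_like(x)                            # identity element (1; 1; ...; 1)

    jordan_prod = lambda u, v: u * v               # x o y = diag(x) y, componentwise
    trace = lambda u: u.sum()                      # trace(x) = x1 + ... + xp
    det = lambda u: u.prod()                       # det(x) = x1 x2 ... xp
    barrier = lambda u: -np.log(u).sum()           # -ln det(x), requires x > 0

    assert np.allclose(jordan_prod(x, e), x)       # e is the identity
    assert trace(x) == 6.0 and det(x) == 6.0
    assert np.isclose(barrier(x), -np.log(det(x)))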

When J1 = Rⁿ¹ and J2 = Rⁿ², the spaces of vectors of dimensions n1 and n2, respectively, with Jordan multiplication x ∘ y := diag(x)y and the standard inner product x • y := x^T y, we have KJ1 = Rⁿ¹₊ and KJ2 = Rⁿ²₊, the nonnegative orthants of Rⁿ¹ and Rⁿ², respectively, and hence the SSP problem (2.1.1, 2.1.2) becomes the SLP problem:

    min  c^T x + E[Q(x, ω)]
    s.t. Ax = b
         x ≥ 0,


where Q(x, ω) is the minimum value of the problem

    min  d(ω)^T y
    s.t. W(ω)y = h(ω) − T(ω)x
         y ≥ 0.

Two-stage SLPs with recourse have many practical applications; see, for example, [15].

Problem 2. Stochastic second-order cone programs:

When J1 = Eⁿ¹ and J2 = Eⁿ², with Jordan multiplication x ∘ y := (x^T y; x0ȳ + y0x̄) and inner product x • y := x^T y, we have KJ1 = Eⁿ¹₊ and KJ2 = Eⁿ²₊, the second-order cones of dimensions n1 and n2, respectively (see Example 20), and hence we obtain the SSOCP with recourse. Table 2.2¹ summarizes the Euclidean Jordan algebraic structure of the second-order cone in both single- and multiple-block forms.

¹The direct sum of two square matrices A and B is the block diagonal matrix A ⊕ B := [A 0; 0 B].

To introduce the SSOCP problem, we first introduce some notation that will be used throughout this part and in Chapter 5.

For simplicity's sake, we write the single-block second-order cone inequality x ≽_{Eᵖ₊} 0 as x ≽ 0 (to mean that x ∈ Eᵖ₊) when p is known from the context, and the multiple-block second-order cone inequality x ≽_{Eᵖ¹₊×Eᵖ²₊×···×Eᵖᴺ₊} 0 as x ≽_N 0 (to mean that x ∈ Eᵖ¹₊ × Eᵖ²₊ × · · · × Eᵖᴺ₊) when p1, p2, . . . , pN are known from the context. It is immediately seen that, for every vector x ∈ Rᵖ where p = ∑_{i=1}^N pi, we have x ≽_N 0 if and only if x is partitioned conformally as x = (x1; x2; . . . ; xN) with xi ≽ 0 for i = 1, 2, . . . , N. We also write x ≽ y (x ≽_N y) or y ≼ x (y ≼_N x) to mean that x − y ≽ 0 (x − y ≽_N 0).

We are now ready to introduce the definition of an SSOCP with recourse.

Let N1, N2 ≥ 1 be integers. For i = 1, 2, . . . , N1 and j = 1, 2, . . . , N2, let m1, m2, n1, n2, n1i, n2j be positive integers such that n1 = n11 + n12 + · · · + n1N1 and n2 = n21 + n22 + · · · + n2N2. An SSOCP with recourse in primal standard form is defined based on deterministic data A ∈ Rᵐ¹ˣⁿ¹, b ∈


    Euclidean Jordan alg. J      Eᵖ = {x = (x0; x̄) ∈ R × Rᵖ⁻¹}          Eᵖ¹ × · · · × Eᵖᴺ
    Symmetric cone K_J           Eᵖ₊ = {x ∈ Eᵖ : x0 ≥ ‖x̄‖₂}             Eᵖ¹₊ × · · · × Eᵖᴺ₊
    x ≽_{K_J} 0 (x ≻_{K_J} 0)    x ≽ 0 (x ≻ 0)                           x ≽_N 0 (x ≻_N 0)
    Jordan product x ∘ y         (x^T y; x0ȳ + y0x̄) = Arw(x)y           (x1 ∘ y1; . . . ; xN ∘ yN)
    Inner product x • y          x^T y                                   x1^T y1 + · · · + xN^T yN
    The identity e               (1; 0)                                  (e1; . . . ; eN)
    The matrix ⌊x⌋               Arw(x)                                  Arw(x1) ⊕ · · · ⊕ Arw(xN)
    Spectral decomposition       x = λ1 c1 + λ2 c2 with                  follows from the decomposition
                                 λ_{1,2} = x0 ± ‖x̄‖₂ and                of each block xi, 1 ≤ i ≤ N
                                 c_{1,2} = (1/2)(1; ±x̄/‖x̄‖₂)
    rank(K_J)                    2                                       2N
    Expression for trace(x)      λ1 + λ2 = 2x0                           ∑_{i=1}^N trace(xi)
    Expression for det(x)        λ1λ2 = x0² − ‖x̄‖₂²                     ∏_{i=1}^N det(xi)
    Frobenius norm ‖x‖           √(λ1² + λ2²) = √2 ‖x‖₂                  √(∑_{i=1}^N ‖xi‖²)
    Inverse x⁻¹                  λ1⁻¹c1 + λ2⁻¹c2 = (x0; −x̄)/det(x)      (x1⁻¹; . . . ; xN⁻¹) (if xi⁻¹ exists
                                 (if det(x) ≠ 0; o/w x is singular)      for all i; o/w x is singular)
    Log barrier function         −ln(x0² − ‖x̄‖₂²)                       −∑_{i=1}^N ln det xi

Table 2.2: The Euclidean Jordan algebraic structure of the second-order cone in single- and multiple-block forms.
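The single-block column of Table 2.2 can be verified directly; the sketch below (assuming NumPy; function names are ours) forms the spectral decomposition and the arrow matrix Arw(x) = ⌊x⌋:

    import numpy as np

    def soc_spectral(x):
        # x = lam1*c1 + lam2*c2 with lam_{1,2} = x0 +/- ||xbar||_2 (Table 2.2).
        x0, xbar = x[0], x[1:]
        r = np.linalg.norm(xbar)
        u = xbar / r                               # assumes xbar != 0
        c1 = 0.5 * np.concatenate(([1.0],  u))
        c2 = 0.5 * np.concatenate(([1.0], -u))
        return x0 + r, x0 - r, c1, c2

    def arrow(x):
        # Arw(x): the matrix representing y -> x o y.
        A = x[0] * np.eye(len(x))
        A[0, 1:], A[1:, 0] = x[1:], x[1:]
        return A

    x = np.array([3.0, 1.0, 2.0])
    l1, l2, c1, c2 = soc_spectral(x)
    assert np.allclose(l1 * c1 + l2 * c2, x)                 # spectral decomposition
    assert np.isclose(l1 + l2, 2 * x[0])                     # trace(x) = 2*x0
    assert np.isclose(l1 * l2, x[0]**2 - x[1:] @ x[1:])      # det(x) = x0^2 - ||xbar||^2
    assert np.allclose(arrow(x) @ x,
                       np.concatenate(([x @ x], 2 * x[0] * x[1:])))  # Arw(x)x = x o x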

Rᵐ¹ and c ∈ Rⁿ¹, and random data T ∈ Rᵐ²ˣⁿ¹, W ∈ Rᵐ²ˣⁿ², h ∈ Rᵐ² and d ∈ Rⁿ² whose realizations depend on an underlying outcome ω in an event space Ω with a known probability function P. Given this data, the two-stage SSOCP in primal standard form is the problem

    min  c^T x + E[Q(x, ω)]
    s.t. Ax = b                                          (2.2.1)
         x ≽_{N1} 0,

where x ∈ Rⁿ¹ is the first-stage decision variable and Q(x, ω) is the minimum value of the problem

    min  d(ω)^T y
    s.t. W(ω)y = h(ω) − T(ω)x                            (2.2.2)
         y ≽_{N2} 0,


where y ∈ Rⁿ² is the second-stage variable and

    E[Q(x, ω)] := ∫_Ω Q(x, ω) P(dω).

Note that if N1 = n1 and N2 = n2, then the SSOCP (2.2.1, 2.2.2) reduces to an SLP. In fact, if N1 = n1, then n1i = 1 for each i = 1, 2, . . . , N1, and so xi ∈ E¹₊ := {t ∈ R : t ≥ 0} for each i = 1, 2, . . . , N1. Thus, the constraint x ≽_{N1} 0 means the same as x ≥ 0, i.e., x lies in the nonnegative orthant of Rⁿ¹. The same situation occurs for y ≽_{N2} 0. Thus the SLP is a special case of the SSOCP (2.2.1, 2.2.2).

Stochastic quadratic programs (SQPs) are also a special case of SSOCPs. To demonstrate this, recall that a two-stage SQP with recourse is defined based on deterministic data C ∈ Sⁿ¹₊, c ∈ Rⁿ¹, A ∈ Rᵐ¹ˣⁿ¹ and b ∈ Rᵐ¹, and random data H ∈ Sⁿ²₊, d ∈ Rⁿ², T ∈ Rᵐ²ˣⁿ¹, W ∈ Rᵐ²ˣⁿ², and h ∈ Rᵐ² whose realizations depend on an underlying outcome in an event space Ω with a known probability function P. Given this data, an SQP with recourse is

    min  q1(x, ω) := x^T Cx + c^T x + E[Q(x, ω)]
    s.t. Ax = b                                          (2.2.3)
         x ≥ 0,

where x ∈ Rⁿ¹ is the first-stage decision variable and Q(x, ω) is the minimum value of the problem

    min  q2(y, ω) := y^T H(ω)y + d(ω)^T y
    s.t. W(ω)y = h(ω) − T(ω)x                            (2.2.4)
         y ≥ 0,

where y ∈ Rⁿ² is the second-stage decision variable and

    E[Q(x, ω)] := ∫_Ω Q(x, ω) P(dω).


Observe that the objective function of (2.2.3) can be written as (see [1])

    q1(x, ω) = ‖u‖² + E[Q(x, ω)] − (1/4) c^T C⁻¹c,  where u = C^{1/2}x + (1/2) C^{−1/2}c.

Similarly, the objective function of (2.2.4) can be written as

    q2(y, ω) = ‖v‖² − (1/4) d(ω)^T H(ω)⁻¹d(ω),  where v = H(ω)^{1/2}y + (1/2) H(ω)^{−1/2}d(ω).

Thus, problem (2.2.3, 2.2.4) can be transformed into the SSOCP

    min  u0
    s.t. ū − C^{1/2}x = (1/2) C^{−1/2}c
         Ax = b                                          (2.2.5)
         u ≽ 0, x ≥ 0,

where Q(x, ω) is the minimum value of the problem

    min  v0
    s.t. v̄ − H(ω)^{1/2}y = (1/2) H(ω)^{−1/2}d(ω)
         W(ω)y = h(ω) − T(ω)x                            (2.2.6)
         v ≽ 0, y ≥ 0,

where u = (u0; ū) and v = (v0; v̄) are auxiliary variables and

    E[Q(x, ω)] := ∫_Ω Q(x, ω) P(dω).

Note that the SQP problem (2.2.3, 2.2.4) and the SSOCP problem (2.2.5, 2.2.6) have the same minimizing solutions, but their optimal objective values are equal only up to constants. More precisely, the difference between the optimal objective values of (2.2.4) and (2.2.6) would be −(1/2) d(ω)^T H(ω)⁻¹d(ω). Similarly, the optimal objective values of (2.2.3, 2.2.4) and (2.2.5, 2.2.6) will differ by

    −(1/2) c^T C⁻¹c − (1/2) ∫_Ω d(ω)^T H(ω)⁻¹d(ω) P(dω).
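The completion-of-squares identity underlying this transformation is easy to verify numerically; a minimal sketch assuming NumPy (names are ours):

    import numpy as np

    rng = np.random.default_rng(3)
    B = rng.standard_normal((4, 4))
    C = B @ B.T + np.eye(4)                        # positive definite C
    c, x = rng.standard_normal(4), rng.standard_normal(4)

    lam, Q = np.linalg.eigh(C)
    Chalf = Q @ np.diag(np.sqrt(lam)) @ Q.T        # C^{1/2}
    u = Chalf @ x + 0.5 * np.linalg.solve(Chalf, c)   # u = C^{1/2}x + (1/2)C^{-1/2}c

    lhs = x @ C @ x + c @ x                        # quadratic part of q1
    rhs = u @ u - 0.25 * c @ np.linalg.solve(C, c) # ||u||^2 - (1/4) c^T C^{-1} c
    assert np.isclose(lhs, rhs)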

In §5.1 of Chapter 5, we describe two applications that illustrate the applicability of

two-stage SSOCPs with recourse.

Problem 3. Stochastic rotated quadratic cone programs:

For each vector x ∈ Rⁿ indexed from 0, we now write x̄ for the subvector consisting of entries 2 through n − 1; therefore x = (x0; x1; x̄) ∈ R × R × Rⁿ⁻². We let Êⁿ denote the n-dimensional real vector space R × R × Rⁿ⁻² whose elements x are indexed from 0.

The rotated quadratic cone [1] of dimension n is defined by

    Êⁿ₊ := {x = (x0; x1; x̄) ∈ Êⁿ : 2x0x1 ≥ ‖x̄‖₂², x0 ≥ 0, x1 ≥ 0}.

The constraint 2x0x1 ≥ ‖x̄‖₂² on x is called a hyperbolic constraint. It is clear that the cones Êⁿ₊ and Eⁿ₊ have the same Euclidean Jordan algebraic structure. In fact, the latter is obtained by rotating the former through an angle of forty-five degrees in the x0x1-plane [1]. More specifically, writing x ≽̂ 0 (x ≽̂_N 0) to mean that x ∈ Êᵖ₊ (x ∈ Êᵖ¹₊ × Êᵖ²₊ × · · · × Êᵖᴺ₊), one can easily see that the hyperbolic constraint (x0; x1; x̄) ≽̂ 0 is equivalent to the second-order cone constraint (2x0 + x1; 2x0 − x1; 2x̄) ≽ 0. In fact,

    2x0 + x1 ≥ ‖(2x0 − x1; 2x̄)‖₂ ⟺ (2x0 + x1)² ≥ ‖(2x0 − x1; 2x̄)‖₂²
                                  ⟺ 4x0x1 ≥ −4x0x1 + 4‖x̄‖₂²
                                  ⟺ 2x0x1 ≥ ‖x̄‖₂².
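This equivalence can also be checked numerically on random points; a minimal sketch assuming NumPy (names are ours):

    import numpy as np

    def in_rotated(x):
        # 2*x0*x1 >= ||xbar||_2^2 with x0 >= 0 and x1 >= 0.
        return x[0] >= 0 and x[1] >= 0 and 2 * x[0] * x[1] >= x[2:] @ x[2:]

    def in_soc(x):
        # x0 >= ||(x1; ...; x_{n-1})||_2.
        return x[0] >= np.linalg.norm(x[1:])

    def rotate(x):
        # The equivalent second-order cone point (2x0 + x1; 2x0 - x1; 2xbar).
        return np.concatenate(([2 * x[0] + x[1], 2 * x[0] - x[1]], 2 * x[2:]))

    rng = np.random.default_rng(4)
    for _ in range(1000):
        x = rng.standard_normal(5)
        assert in_rotated(x) == in_soc(rotate(x))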

So, given the same setting as in Problem (2.2.1, 2.2.2), the two-stage stochastic rotated quadratic cone program (SRQCP) in primal standard form is the problem

    min  c^T x + E[Q(x, ω)]
    s.t. Ax = b
         x ≽̂_{N1} 0,

where x ∈ Rⁿ¹ is the first-stage decision variable and Q(x, ω) is the minimum value of the problem

    min  d(ω)^T y
    s.t. W(ω)y = h(ω) − T(ω)x
         y ≽̂_{N2} 0.

In §5.2 of Chapter 5, we describe two applications that illustrate the applicability of

two-stage SRQCPs with recourse.

Problem 4. Stochastic semidefinite programs:

When J1 = Sⁿ¹ and J2 = Sⁿ², the spaces of real symmetric matrices of orders n1 and n2, respectively, with Jordan multiplication X ∘ Y := (1/2)(XY + YX) and inner product X • Y := trace(XY), we have KJ1 = Sⁿ¹₊ and KJ2 = Sⁿ²₊, the cones of real symmetric positive semidefinite matrices of orders n1 and n2, respectively (see Example 19). We simply write the linear matrix inequality X ≽_{Sᵖ₊} 0 as X ⪰ 0 (to mean that X ∈ Sᵖ₊). Hence the SSP problem (2.1.1, 2.1.2) becomes the SSDP problem (see also [9, Subsection 2.1]):

    min  C • X + E[Q(X, ω)]
    s.t. AX = b                                          (2.2.7)
         X ⪰ 0,

where Q(X, ω) is the minimum value of the problem

    min  D(ω) • Y
    s.t. W(ω)Y = h(ω) − T(ω)X                            (2.2.8)
         Y ⪰ 0,


    Euclidean Jordan alg. J    Sᵖ = {X ∈ Rᵖˣᵖ : X^T = X}           vec(Sᵖ) = {vec(X) ∈ Rᵖ² : X^T = X}
    Symmetric cone K_J         Sᵖ₊ = {X ∈ Sᵖ : X ⪰ 0}               vec(Sᵖ₊) = {vec(X) ∈ Rᵖ² : X ⪰ 0}
    Jordan product x ∘ y       X ∘ Y = (1/2)(XY + YX)               vec(X) ∘ vec(Y) = (1/2) vec(XY + YX)
    Inner product x • y        X • Y = trace(XY)                    vec(X) • vec(Y) = vec(X)^T vec(Y)
    The identity e             I_p                                  vec(I_p)
    The matrix ⌊X⌋             (1/2)(I ⊗ X + X ⊗ I)                 (1/2)(I ⊗ X + X ⊗ I)
    The matrix ⌈X⌉             X ⊗ X                                X ⊗ X
    Spectral decomposition     X = λ1 q1q1^T + · · · + λp qpqp^T    vec(X) = λ1 vec(q1q1^T) + · · · + λp vec(qpqp^T)
    rank(K_J)                  p                                    p

Table 2.3: The Euclidean Jordan algebraic structure of the positive semidefinite cone in matrix and vectorized matrix forms.

where X ∈ Sⁿ¹ is the first-stage decision variable, Y ∈ Sⁿ² is the second-stage variable, and

    E[Q(X, ω)] := ∫_Ω Q(X, ω) P(dω).

Here, clearly, C ∈ Sⁿ¹, D ∈ Sⁿ², and the linear operator A : Sⁿ¹ → Rᵐ¹ is defined by

    AX := (A1 • X; A2 • X; . . . ; Am1 • X),

where Ai ∈ Sⁿ¹ for i = 1, 2, . . . , m1. The linear operators W : Sⁿ² → Rᵐ² and T : Sⁿ¹ → Rᵐ² are defined in a similar manner. Problem (2.2.7, 2.2.8) can also be written in a vectorized matrix form (see [25, Section 1]). See Table 2.3 for a summary of the Euclidean Jordan algebraic structure of the positive semidefinite cone in both matrix and vectorized matrix forms. Some applications leading to two-stage SSDPs with recourse can be found in [49].
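The operator column of Table 2.3 can be checked directly: with column-stacking vec(·), the matrix (1/2)(I ⊗ X + X ⊗ I) sends vec(Y) to vec(X ∘ Y). A sketch assuming NumPy:

    import numpy as np

    rng = np.random.default_rng(5)
    n = 4
    X = rng.standard_normal((n, n)); X = (X + X.T) / 2
    Y = rng.standard_normal((n, n)); Y = (Y + Y.T) / 2

    L = 0.5 * (np.kron(np.eye(n), X) + np.kron(X, np.eye(n)))  # the matrix for |X|
    vec = lambda M: M.reshape(-1, order="F")                   # column stacking
    assert np.allclose(L @ vec(Y), vec((X @ Y + Y @ X) / 2))   # vec(X o Y)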

Problem 5. Stochastic programming over complex Hermitian positive semidef-

inite matrices:

Recall that a square matrix with complex entries is called Hermitian if it is equal to its


own conjugate transpose, and that the eigenvalues of a (positive semidefinite) Hermitian

matrix are always (nonnegative) real valued.

Let Hᵖ denote the space of complex Hermitian matrices of order p. Equipped with the Jordan multiplication X ∘ Y := (XY + YX)/2, the space Hᵖ forms a Euclidean Jordan algebra. By following an argument similar to that in Example 19 (but using the spectral theorem for complex Hermitian matrices instead of the spectral theorem for real symmetric matrices; see [43, Theorem 5.4.13]), we can show that the cone of squares of the space of complex Hermitian matrices of order p is the cone of complex Hermitian positive semidefinite matrices of order p, denoted by Hᵖ₊. The Euclidean Jordan algebraic structure of the cone of complex Hermitian positive semidefinite matrices is analogous to that of the cone of real symmetric positive semidefinite matrices.

Another subclass of stochastic symmetric programming problems is obtained when J1 = Hⁿ¹ and J2 = Hⁿ², so that KJ1 = Hⁿ¹₊ and KJ2 = Hⁿ²₊, the cones of complex Hermitian positive semidefinite matrices of orders n1 and n2, respectively. In fact, it is impossible to deal with the field of complex numbers directly, as it is an unordered field. However, this can be overcome by defining a transformation that maps complex Hermitian matrices to real symmetric matrices; see, for example, [35, 18].

Problem 6. Stochastic programming over quaternion Hermitian positive semidef-

inite matrices:

We first recall some notions of quaternions and quaternion matrices. The elements of the field of real quaternions have the form x = x0 + x1i + x2j + x3k, where x0, x1, x2, x3 ∈ R and i, j, and k are abstract symbols satisfying i² = j² = k² = ijk = −1. The conjugate of x is x* = x0 − x1i − x2j − x3k. If A is a p × p quaternion matrix and A* is its conjugate transpose, then A is called Hermitian if A = A*. Note that, if A is Hermitian, then w*Aw is a real number for any p-dimensional column vector w with quaternion components. The matrix A is called positive semidefinite if it is Hermitian


and w*Aw ≥ 0 for any p-dimensional column vector w with quaternion components. A (positive semidefinite) quaternion Hermitian matrix has (nonnegative) real eigenvalues.

Let QHᵖ denote the space of quaternion Hermitian matrices of order p. Equipped with the Jordan multiplication X ∘ Y := (XY + YX)/2, the space QHᵖ forms a Euclidean Jordan algebra. It has been shown [46] that the spectral theorem also holds for quaternion matrices (i.e., any quaternion Hermitian matrix is conjugate to a real diagonal matrix). So, by following an argument similar to that in Example 19, we can show that the cone of squares of the space of quaternion Hermitian matrices of order p is the cone of quaternion Hermitian positive semidefinite matrices of order p, denoted by QHᵖ₊. The Euclidean Jordan algebraic structure of this cone is analogous to that of the cone of real symmetric positive semidefinite matrices and that of the cone of complex Hermitian positive semidefinite matrices.

The last class of problems is obtained when J1 = QHⁿ¹ and J2 = QHⁿ² with X ∘ Y := (XY + YX)/2. Hence KJ1 = QHⁿ¹₊ and KJ2 = QHⁿ²₊, the cones of quaternion Hermitian positive semidefinite matrices of orders n1 and n2, respectively. Similar to Problem 5, there is a difficulty in dealing with quaternion entries, but this difficulty can be overcome by defining a transformation that maps Hermitian matrices with quaternion entries to real symmetric matrices; see, for example, [35, 18].

Of course, it is also possible to consider a "mixed problem," such as the one obtained by selecting J1 = Rⁿ¹ and J2 = Eⁿ², or J1 = Eⁿ¹ and J2 = Sⁿ², etc.

In many applications we find that the models lead to SSPs in dual standard form. The focus of this dissertation is on the SSP in dual standard form. In the next two chapters, we will focus on interior point algorithms for solving the SSP problem (2.1.3) and (2.1.4).


Chapter 3

A Class of Polynomial Logarithmic

Barrier Decomposition Algorithms

for Stochastic Symmetric

Programming

In this chapter, we turn our attention to the study of logarithmic barrier decomposition-

based interior point methods for the general SSP problem. Since our algorithm applies to

all symmetric cones, this work extends Zhao's work [48], which focuses particularly on the SLP problem (Problem 1), and Mehrotra and Ozevin's work [25], which focuses particularly on the SSDP problem (Problem 4). More specifically, we use logarithmic barrier interior point methods to present a Benders decomposition-based algorithm for solving the SSP

problem (2.1.3) and (2.1.4) and then prove its polynomial complexity. Our convergence

analysis proceeds by showing that the log barrier associated with the recourse function

of SSPs behaves as a strongly self-concordant barrier and forms a self-concordant family

on the first stage solutions. Our procedure closely follows that of [25] (which essentially

follows the procedure of [48]), but our setting is much more general. The results of this


chapter have been submitted for publication [3].

3.1 The log barrier problem for SSPs

We begin by presenting the extensive formulation of the SSP problem (2.1.3, 2.1.4) and

the log barrier [30] problems associated with them.

3.1.1 Formulation and assumptions

We now examine (2.1.3, 2.1.4) when the event space Ω is finite. Let {(T(k), W(k), h(k), d(k)) : k = 1, 2, . . . , K} be the set of the possible values of the random variables (T(ω), W(ω), h(ω), d(ω)) and let

    pk := P[(T(ω), W(ω), h(ω), d(ω)) = (T(k), W(k), h(k), d(k))]

be the associated probability for k = 1, 2, . . . , K. Then Problem (2.1.3, 2.1.4) becomes

    max  c^T x + ∑_{k=1}^K pk Q(k)(x)
    s.t. Ax ≼_{KJ1} b,                                   (3.1.1)

where, for k = 1, 2, . . . , K, Q(k)(x) is the maximum value of the problem

    max  d(k)^T y(k)
    s.t. W(k)y(k) ≼_{KJ2} h(k) − T(k)x,                  (3.1.2)

where x ∈ Rᵐ¹ is the first-stage decision variable, and y(k) ∈ Rᵐ² is the second-stage variable for k = 1, 2, . . . , K. Now, for convenience we redefine d(k) as d(k) := pk d(k) for


k = 1, 2, . . . , K, and rewrite Problem (3.1.1, 3.1.2) as

    max  c^T x + ∑_{k=1}^K Q(k)(x)
    s.t. Ax + s = b                                      (3.1.3)
         s ≽_{KJ1} 0,

where, for k = 1, 2, . . . , K, Q(k)(x) is the maximum value of the problem

    max  d(k)^T y(k)
    s.t. W(k)y(k) + s(k) = h(k) − T(k)x                  (3.1.4)
         s(k) ≽_{KJ2} 0.

Let ν(k) be the second-stage dual multiplier. The dual of Problem (3.1.4) is

    min  (h(k) − T(k)x) • ν(k)
    s.t. W(k)^T ν(k) = d(k)                              (3.1.5)
         ν(k) ≽_{KJ2} 0.

The log barrier problem associated with Problem (3.1.3, 3.1.4) is

    max  η(µ, x) := c^T x + ∑_{k=1}^K ρ(k)(µ, x) + µ ln det s
    s.t. Ax + s = b                                      (3.1.6)
         s ≻_{KJ1} 0,

where, for k = 1, 2, . . . , K, ρ(k)(µ, x) is the maximum value of the problem

    max  d(k)^T y(k) + µ ln det s(k)
    s.t. W(k)y(k) + s(k) = h(k) − T(k)x                  (3.1.7)
         s(k) ≻_{KJ2} 0.


(Here µ > 0 is a barrier parameter.) If, for some k, Problem (3.1.7) is infeasible, then we define ∑_{k=1}^K ρ(k)(µ, x) := −∞. The log barrier problem associated with Problem (3.1.5) is the problem

    min  (h(k) − T(k)x) • ν(k) − µ ln det ν(k)
    s.t. W(k)^T ν(k) = d(k)                              (3.1.8)
         ν(k) ≻_{KJ2} 0,

which is the Lagrangian dual of (3.1.7). Because Problems (3.1.7) and (3.1.8) are, respectively, concave and convex, (y(k), s(k)) and ν(k) are optimal solutions to (3.1.7) and (3.1.8), respectively, if and only if they satisfy the following optimality conditions:

    s(k) ∘ ν(k) = µ e2,
    W(k)y(k) + s(k) = h(k) − T(k)x,
    W(k)^T ν(k) = d(k),                                  (3.1.9)
    s(k) ≻_{KJ2} 0,  ν(k) ≻_{KJ2} 0.

The elements s(k) and ν(k) may not operator commute, so the equality s(k) = µ ν(k)⁻¹ may not hold (see Remark 1 in Chapter 1). In fact, we need to scale the optimality conditions (3.1.9) so that the scaled elements are simultaneously decomposed.

Let p ≻_{KJ2} 0. From now on, with respect to p, we define

    s̃(k) := ⌈p⁻¹⌉s(k),  ν̃(k) := ⌈p⌉ν(k),  h̃(k) := ⌈p⁻¹⌉h(k),  W̃(k) := ⌈p⁻¹⌉W(k),  T̃(k) := ⌈p⁻¹⌉T(k).

Recall that ⌈p⌉⌈p⁻¹⌉ = ⌈p⌉⌈p⌉⁻¹ = I. We have the following lemma and proposition.

Lemma 3.1.1 (Lemma 28, [36]). Let p be an invertible element in J2. Then s ∘ ν = µe2 if and only if s̃ ∘ ν̃ = µe2.

Proposition 3.1.1. (y, s, ν) satisfies the optimality conditions (3.1.9) if and only if


(y, s̃, ν̃) satisfies the relaxed optimality conditions:

    s̃(k) ∘ ν̃(k) = µe2,
    W̃(k)y(k) + s̃(k) = h̃(k) − T̃(k)x,
    W̃(k)^T ν̃(k) = d(k),                                 (3.1.10)
    s̃(k) ≻_{KJ2} 0,  ν̃(k) ≻_{KJ2} 0.

Proof. The proof follows from Lemma 3.1.1, Proposition 1.3.2, and the fact that ⌈p⌉(KJ2) = KJ2 and, likewise, as an operator, ⌈p⌉(int(KJ2)) = int(KJ2), because KJ2 is symmetric. □

To our knowledge, this is the first time this effective way of scaling has been used for stochastic programming; it was originally proposed by Monteiro [28] and Zhang [47] for DSDP, and afterwards generalized by Schmieta and Alizadeh [36] for DSP.
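Specialized to J2 = Sⁿ, where ⌈p⌉X = PXP, Lemma 3.1.1 says that the centering equation survives this change of variables for any choice of p ≻ 0; a minimal numerical sketch, assuming NumPy (the construction is ours):

    import numpy as np

    rng = np.random.default_rng(6)
    n, mu = 4, 0.7

    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    d = rng.uniform(0.5, 2.0, n)
    S = Q @ np.diag(d) @ Q.T                       # S and V operator commute and
    V = mu * Q @ np.diag(1.0 / d) @ Q.T            # satisfy S o V = mu*I by construction

    B = rng.standard_normal((n, n))
    P = B @ B.T + np.eye(n)                        # an arbitrary scaling point p > 0
    Pinv = np.linalg.inv(P)
    S_t, V_t = Pinv @ S @ Pinv, P @ V @ P          # scaled pair (|p^{-1}|s, |p|v)
    assert np.allclose((S_t @ V_t + V_t @ S_t) / 2, mu * np.eye(n))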

With this change of variables, Problem (3.1.6, 3.1.7) becomes

    max  η(µ, x) := c^T x + ∑_{k=1}^K ρ(k)(µ, x) + µ ln det s
    s.t. Ax + s = b                                      (3.1.11)
         s ≻_{KJ1} 0,

where, for k = 1, 2, . . . , K, ρ(k)(µ, x) is the maximum value of the problem

    max  d(k)^T y(k) + µ ln det s̃(k)
    s.t. W̃(k)y(k) + s̃(k) = h̃(k) − T̃(k)x                 (3.1.12)
         s̃(k) ≻_{KJ2} 0,


and Problem (3.1.8) becomes

    min  (h̃(k) − T̃(k)x) • ν̃(k) − µ ln det ν̃(k)
    s.t. W̃(k)^T ν̃(k) = d(k)                             (3.1.13)
         ν̃(k) ≻_{KJ2} 0.

Note that Problem (3.1.7) and Problem (3.1.12) have the same maximizer, but their optimal objective values are equal only up to a constant. More precisely, the difference between their optimal objective values would be µ ln det p² (see item 1 of Lemma 1.3.2). Similarly, Problems (3.1.8) and (3.1.13) have the same minimizer, but their optimal objective values differ by the same value (µ ln det p²).

The SSP (3.1.11, 3.1.12) can be equivalently written as a DSP:

    max  c^T x + µ ln det s + ∑_{k=1}^K (d(k)^T y(k) + µ ln det s̃(k))
    s.t. Ax + s = b
         W̃(k)y(k) + s̃(k) = h̃(k) − T̃(k)x,  k = 1, 2, . . . , K        (3.1.14)
         s ≻_{KJ1} 0,  s̃(k) ≻_{KJ2} 0,  k = 1, 2, . . . , K,

where the first two terms of the objective give η(µ, x) and the kth summand is ρ(k)(µ, x).

In Subsection 3.1.2, we need to compute ∇xη(µ, x) and ∇²xxη(µ, x) so that we can determine the Newton direction defined by

    ∆x := −∇²xxη(µ, x)⁻¹ ∇xη(µ, x)

for our algorithms. We shall see that each choice of p leads to a different search direction (see Algorithm 1). As we mentioned earlier, we are interested in the class of p for which the scaled elements are simultaneously decomposed. In view of Theorem 1.3.5, it is enough to choose p so that s̃(k) and ν̃(k) operator commute. That is, we restrict our attention to


the following set of scalings:

    C(s(k), ν(k)) := {p ≻_{KJ2} 0 : s̃(k) and ν̃(k) operator commute}.

We introduce the following definition [1].

Definition 3.1.1. The set of directions ∆x arising from those p ∈ C(s(k),ν(k)) is called

the commutative class of directions, and a direction in this class is called a commutative

direction.

It is clear that p = e may not be in C(s(k), ν(k)). The following choices of p show that the set C(s(k), ν(k)) is not empty.

We may choose p = s(k)^{1/2} and get s̃(k) = e2. In fact, by using Lemma 1.3.2, it can be seen that

    s̃(k) = ⌈p⁻¹⌉s(k) = ⌈s(k)^{−1/2}⌉(s(k)^{1/2})² = (⌈s(k)^{−1/2}⌉⌈s(k)^{1/2}⌉)e2
          = (⌈s(k)^{1/2}⌉⁻¹⌈s(k)^{1/2}⌉)e2 = e2.

We may choose p = ν(k)^{−1/2} and get ν̃(k) = e2. To see this, note that by using Lemma 1.3.2 we have

    ν̃(k) = ⌈p⌉ν(k) = ⌈ν(k)^{−1/2}⌉(ν(k)^{1/2})² = (⌈ν(k)^{−1/2}⌉⌈ν(k)^{1/2}⌉)e2
          = (⌈ν(k)^{1/2}⌉⁻¹⌈ν(k)^{1/2}⌉)e2 = e2.

The above two choices of directions are well known in commutative classes. These two choices form a class of Newton directions derived by Helmberg et al. [20], Monteiro [28], and Kojima et al. [22], and are referred to as the HRVW/KSH/M directions. It is interesting to mention that the popular search direction due to Nesterov and Todd (NT) is also in a commutative class, because in this case one chooses p in such a way that ν̃(k) = s̃(k).

More precisely, in the NT direction we choose

    p = (⌈ν(k)^{1/2}⌉(⌈ν(k)^{1/2}⌉s(k))^{−1/2})^{−1/2} = (⌈s(k)^{−1/2}⌉(⌈s(k)^{1/2}⌉ν(k))^{1/2})^{−1/2},

and, using Lemma 1.3.2, we get

    ⌈p⌉²ν(k) = ⌈p²⌉ν(k)
             = ⌈⌈ν(k)^{1/2}⌉(⌈ν(k)^{1/2}⌉s(k))^{−1/2}⌉⁻¹ ν(k)
             = ⌈⌈ν(k)^{−1/2}⌉(⌈ν(k)^{1/2}⌉s(k))^{1/2}⌉ ν(k)
             = (⌈ν(k)^{−1/2}⌉⌈(⌈ν(k)^{1/2}⌉s(k))^{1/2}⌉⌈ν(k)^{−1/2}⌉) ν(k)
             = (⌈ν(k)^{−1/2}⌉⌈(⌈ν(k)^{1/2}⌉s(k))^{1/2}⌉) e2
             = (⌈ν(k)^{−1/2}⌉⌈ν(k)^{1/2}⌉) s(k)
             = s(k).

Thus, ν̃(k) = ⌈p⌉ν(k) = ⌈p⁻¹⌉s(k) = s̃(k).

We proceed by making some assumptions. First we define

    F1 := {x : Ax + s = b, s ≻_{KJ1} 0};
    F(k)(x) := {y(k) : W̃(k)y(k) + s̃(k) = h̃(k) − T̃(k)x, s̃(k) ≻_{KJ2} 0}  for k = 1, 2, . . . , K;
    F2(k) := {x : F(k)(x) ≠ ∅}  for k = 1, 2, . . . , K;
    F2 := ∩_{k=1}^K F2(k);
    F0 := F1 ∩ F2;
    F := {(x, s, γ) × (y(1), s̃(1), ν̃(1), . . . , y(K), s̃(K), ν̃(K)) : Ax + s = b, s ≻_{KJ1} 0,
          W̃(k)y(k) + s̃(k) = h̃(k) − T̃(k)x, s̃(k) ≻_{KJ2} 0, W̃(k)^T ν̃(k) = d(k),
          ν̃(k) ≻_{KJ2} 0, k = 1, 2, . . . , K; A^T γ + ∑_{k=1}^K T̃(k)^T ν̃(k) = c}.

Here γ is the first-stage dual multiplier. Now we make

Assumption 3.1.1. The matrices A and W (k) for all k have full column rank.

Assumption 3.1.2. The set F is nonempty.

Assumption 3.1.1 is for convenience. Assumption 3.1.2 guarantees strong duality for

first- and second-stage SSPs. In other words, it requires that Problem (3.1.14) and its dual

have strictly feasible solutions. This implies that problems (3.1.11-3.1.14) have a unique

solutions. Note that for a given µ > 0,∑K

k=1 ρ(k)(µ,x) <∞ if and only if x ∈ F2. Hence,

the feasible region for (3.1.11) is described implicitly by F0. Throughout the paper we

denote the optimal solution of the first-stage problem (3.1.11) by x(µ), and the solutions

of the optimality conditions (3.1.10) by (y(k)(µ,x), s(k)(µ,x), ν(k)(µ,x)). The following

proposition establishes the relationship between the optimal solutions of Problems (3.1.11,

3.1.12) and those of Problem (3.1.14).

Proposition 3.1.2. Let µ > 0 be fixed. Then (x(µ), s(µ); y(1)(µ), s̃(1)(µ); · · · ; y(K)(µ), s̃(K)(µ)) is the optimal solution of (3.1.14) if and only if (x(µ), s(µ)) is the optimal solution of (3.1.11) and (y(1)(µ), s̃(1)(µ); · · · ; y(K)(µ), s̃(K)(µ)) are the optimal solutions of (3.1.12) for the given µ and x = x(µ).

3.1.2 Computation of ∇xη(µ, x) and ∇²xxη(µ, x)

In order to compute ∇xη(µ, x) and ∇²xxη(µ, x), we need to determine the derivative of ρ(k)(µ, x) with respect to x. Let (y(k), ν̃(k), s̃(k)) := (y(k)(µ, x), ν̃(k)(µ, x), s̃(k)(µ, x)). We


first note that from (3.1.10) we have

    (h̃(k) − T̃(k)x) • ν̃(k) = (W̃(k)y(k) + s̃(k)) • ν̃(k) = y(k)^T(W̃(k)^T ν̃(k)) + s̃(k) • ν̃(k) = y(k)^T d(k) + r2µ,

where in the second equality we used the observation that

    W̃(k)y(k) • ν̃(k) = ∑_{i=1}^{m2} (yi(k) w̃i(k)) • ν̃(k) = ∑_{i=1}^{m2} yi(k) (w̃i(k) • ν̃(k)) = y(k)^T(W̃(k)^T ν̃(k))

(here w̃i(k) ∈ J2 is the ith column of W̃(k)), and in the last equality we used that trace(e2) = rank(J2) = r2. This implies that

    (h̃(k) − T̃(k)x) • ν̃(k) − µ ln det ν̃(k) = ρ(k)(µ, x) − µ ln det s̃(k) + r2µ − µ ln det ν̃(k)
                                            = ρ(k)(µ, x) + r2µ − µ ln det(s̃(k) ∘ ν̃(k))
                                            = ρ(k)(µ, x) + r2µ(1 − ln µ).

Thus,

    ρ(k)(µ, x) = (h̃(k) − T̃(k)x) • ν̃(k) − µ ln det ν̃(k) − r2µ(1 − ln µ).        (3.1.15)

Differentiating (3.1.15) and using the optimality conditions (3.1.10), we obtain

    ∇xρ(k)(µ, x) = ∇x((h̃(k) − T̃(k)x) • ν̃(k)) − µ ∇x ln det ν̃(k)
                 = (∇x(h̃(k) − T̃(k)x))^T ν̃(k) + (∇xν̃(k))^T(h̃(k) − T̃(k)x) − µ(∇xν̃(k))^T ν̃(k)⁻¹
                 = −T̃(k)^T ν̃(k) + (∇xν̃(k))^T(W̃(k)y(k) + s̃(k)) − µ(∇xν̃(k))^T ν̃(k)⁻¹
                 = −T̃(k)^T ν̃(k) + (∇xν̃(k))^T(W̃(k)y(k)) + (∇xν̃(k))^T(s̃(k) − µν̃(k)⁻¹)
                 = −T̃(k)^T ν̃(k) + (∇xν̃(k))^T(W̃(k)y(k)).

From (3.1.10) we have that w̃i(k) • ν̃(k) = di(k), where di(k) is the ith entry of d(k). This implies that


    (∇xν̃(k))^T(W̃(k)y(k)) = (∇xν̃(k))^T ∑_{i=1}^{m2} yi(k) w̃i(k)
                          = ∑_{i=1}^{m2} yi(k) (∇xν̃(k))^T w̃i(k)
                          = ∑_{i=1}^{m2} yi(k) ∇x(ν̃(k) • w̃i(k))
                          = 0.

Thus,

    ∇xρ(k)(µ, x) = −T̃(k)^T ν̃(k)  and  ∇²xxρ(k)(µ, x) = −T̃(k)^T ∇xν̃(k).

Therefore, we also need to determine the derivative of ν̃(k) with respect to x. Differentiating (3.1.10) with respect to x, we get the system

    ∇xν̃(k) = −µ⌈s̃(k)⁻¹⌉∇xs̃(k),
    W̃(k)∇xy(k) + ∇xs̃(k) = −T̃(k),                        (3.1.16)
    W̃(k)^T ∇xν̃(k) = 0.

Solving the system (3.1.16), we obtain

    ∇xs̃(k) = −⌈s̃(k)^{1/2}⌉P(k)⌈s̃(k)^{−1/2}⌉T̃(k),
    ∇xν̃(k) = µ⌈s̃(k)^{−1/2}⌉P(k)⌈s̃(k)^{−1/2}⌉T̃(k),       (3.1.17)
    ∇xy(k) = −R(k)⁻¹W̃(k)^T⌈s̃(k)⁻¹⌉T̃(k),

where

    R(k) := R(k)(µ, x) = W̃(k)^T⌈s̃(k)⁻¹⌉W̃(k),
    P(k) := P(k)(µ, x) = I − ⌈s̃(k)^{−1/2}⌉W̃(k)R(k)⁻¹W̃(k)^T⌈s̃(k)^{−1/2}⌉.     (3.1.18)

Observe that, by differentiating Ax + s = b with respect to x, we get ∇xs = −A. We then have

    ∇xη(µ, x) = c + ∑_{k=1}^K ∇xρ(k)(µ, x) + µ(∇xs)^T s⁻¹
              = c − ∑_{k=1}^K T̃(k)^T ν̃(k) − µA^T s⁻¹,                     (3.1.19)

and

    ∇²xxη(µ, x) = −∑_{k=1}^K T̃(k)^T ∇xν̃(k) + µA^T⌈s⁻¹⌉∇xs
                = −µ∑_{k=1}^K T̃(k)^T⌈s̃(k)^{−1/2}⌉P(k)⌈s̃(k)^{−1/2}⌉T̃(k) − µA^T⌈s⁻¹⌉A.   (3.1.20)

3.2 Self-concordance properties of the log-barrier recourse

The notion of so-called self-concordant functions, introduced by Nesterov and Nemirovskii [30], allows us to develop polynomial-time path-following interior point methods for solving SSPs. In this section, we prove that the recourse function with log barrier is a strongly self-concordant function and leads to a strongly self-concordant family with appropriate parameters.

3.2.1 Self-concordance of the recourse function

This subsection is devoted to showing that η(µ, ·) is a µ-strongly self-concordant barrier on F0. First, we have the following definition.

Definition 3.2.1 (Nesterov and Nemirovskii [30, Definition 2.1.1]). Let E be a finite-dimensional real vector space, let G be an open nonempty convex subset of E, and let f be a C³, convex mapping from G to R. Then f is called α-self-concordant on G with the parameter α > 0 if for every x ∈ G and h ∈ E the following inequality holds:

    |∇³xxx f(x)[h, h, h]| ≤ 2α^{−1/2} (∇²xx f(x)[h, h])^{3/2}.        (3.2.1)

An α-self-concordant function f on G is called strongly α-self-concordant if f tends to infinity for any sequence approaching a boundary point of G.

We note that in the above definition the set G is assumed to be open. However,

relative openness would be sufficient to apply the definition. See also [30, Item A, Page

57].
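In the one-dimensional case J = R, the barrier f(s) = −µ ln s has f″(s) = µ/s² and f‴(s) = −2µ/s³, so (3.2.1) holds with α = µ, and with equality; a small numerical check assuming NumPy:

    import numpy as np

    mu = 0.3
    for s in np.linspace(0.1, 5.0, 50):
        d2 = mu / s**2                             # f''(s)
        d3 = -2.0 * mu / s**3                      # f'''(s)
        # |f'''| <= 2 * mu^{-1/2} * (f'')^{3/2}, here with equality
        assert abs(d3) <= 2.0 * mu**-0.5 * d2**1.5 + 1e-12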

The proof of the strong self-concordance of the function η(µ, ·) relies on two lemmas that we state and prove below. We point out that the proof of the first lemma is given in [30, Proposition 5.4.5] for s ∈ Sⁿ¹₊₊ := int(Sⁿ¹₊), i.e., for s in the interior of the cone of real symmetric positive semidefinite matrices of order n1. We now generalize this proof to the case where s lies in the interior of an arbitrary symmetric cone of dimension n1, where n1, as mentioned before, is any positive integer.

Lemma 3.2.1. For any fixed µ > 0, the function f(s) := −µ ln det s is a µ-strongly self-concordant barrier on KJ1.

Proof. Let {si}_{i=1}^∞ be any sequence in int(KJ1). It is clear that f(si) tends to ∞ when si approaches a point on the boundary of KJ1. It remains to show that (3.2.1) is satisfied for f(s) on KJ1. Let s ≻_{KJ1} 0 and h ∈ J1. Then there exists an element u ∈ J1 such that s = u², and therefore, by Lemmas 1.3.1 and 1.3.2, we have that

    ∇sf(s)[h] = −µ(ln det(s + th))′|_{t=0}
              = −µ(ln det(u² + th))′|_{t=0}
              = −µ(ln det(⌈u⌉(e1 + t⌈u⌉⁻¹h)))′|_{t=0}
              = −µ(ln det(u²) + ln det(e1 + t⌈u⌉⁻¹h))′|_{t=0}
              = −µ trace(⌈u⌉⁻¹h)
              = −µ(s⁻¹ • h),

    ∇²ssf(s)[h, h] = ∇s(∇sf(s)[h]) • h
                   = −µ(∇s(s⁻¹ • h)) • h
                   = −µ(Ds(s⁻¹ ∘ h)) • h
                   = µ(s⁻² ∘ h) • h
                   = µ(s⁻¹ ∘ h) • (s⁻¹ ∘ h),

    ∇³sssf(s)[h, h, h] = ∇s(∇²ssf(s)[h, h]) • h
                       = µ(∇s((s⁻¹ ∘ h) • (s⁻¹ ∘ h))) • h
                       = µ(Ds((s⁻¹ ∘ h) ∘ (s⁻¹ ∘ h))) • h
                       = −2µ((s⁻² ∘ h) ∘ (s⁻¹ ∘ h)) • h
                       = −2µ(s⁻¹ ∘ h) • ((s⁻¹ ∘ h) ∘ (s⁻¹ ∘ h)).

Let h̄ := s⁻¹ ∘ h ∈ J1, and let λ1, λ2, . . . , λp be its eigenvalues. The result is established by observing that

    |∇³sssf(s)[h, h, h]| = 2µ|trace(h̄³)| = 2µ|∑_{i=1}^p λi³| ≤ 2µ(∑_{i=1}^p λi²)^{3/2}
                         = 2µ(trace(h̄²))^{3/2} = 2µ^{−1/2}(∇²ssf(s)[h, h])^{3/2}.  □

Lemma 3.2.2. For any fixed µ > 0, ρ(k)(µ, x) is a µ-strongly self-concordant barrier on F2(k), k = 1, 2, . . . , K.

Proof. Let {xi}_{i=1}^∞ be any sequence in F2(k). It is clear that ρ(k)(µ, xi) tends to ∞ when xi approaches a point on the boundary of F2(k). It remains to show that (3.2.1) is satisfied for ρ(k)(µ, x) on F2(k). For any µ > 0, x ∈ F2(k), and d ∈ Rᵐ¹, we define the univariate function

    Ψ(k)(t) := ∇²xxρ(k)(µ, x + td)[d, d].

Note that Ψ(k)(0) = ∇²xxρ(k)(µ, x)[d, d] and Ψ(k)′(0) = ∇³xxxρ(k)(µ, x)[d, d, d]. So, to prove the lemma, it is enough to show that

    |Ψ(k)′(0)| ≤ (2/√µ)|Ψ(k)(0)|^{3/2}.

Let (ν̃(k)(t), s̃(k)(t), P(k)(t), R(k)(t)) := (ν̃(k)(µ, x + td), s̃(k)(µ, x + td), P(k)(µ, x + td), R(k)(µ, x + td)). We also define

    u(k)(t) := √µ P(k)(t)⌈s̃(k)^{−1/2}(t)⌉T̃(k)d.

By the notation introduced above, we have (ν̃(k), s̃(k)) = (ν̃(k)(0), s̃(k)(0)), and we also let (P(k), R(k), u(k)) := (P(k)(0), R(k)(0), u(k)(0)).

Notice that, by the definition of P(k), we have P(k)² = P(k). Using (3.1.17) and (3.1.18), we get

    Ψ(k)(0) = ∇²xxρ(k)(µ, x)[d, d]
            = −(T̃(k)^T ∇xν̃(k))[d, d]
            = −µ(T̃(k)^T⌈s̃(k)^{−1/2}⌉P(k)⌈s̃(k)^{−1/2}⌉T̃(k))[d, d]
            = −µ d^T(T̃(k)^T⌈s̃(k)^{−1/2}⌉P(k)²⌈s̃(k)^{−1/2}⌉T̃(k))d
            = −u(k) • u(k)
            = −‖u(k)‖².

Hence Ψ(k)′(0) = −2u(k) • u(k)′. So, in order to bound |Ψ(k)′(0)|, we need to compute the derivative of u(k) with respect to t. Using (3.1.18), we have

(writing, inside this computation, s̃ := s̃(k)(t), W̃ := W̃(k), T̃ := T̃(k), R := R(k)(t), and P := P(k)(t), with "′" denoting differentiation with respect to t)

    u(k)′ = √µ (P⌈s̃^{−1/2}⌉)′T̃d
          = √µ {⌈s̃^{−1/2}⌉ − ⌈s̃^{−1/2}⌉W̃R⁻¹W̃^T⌈s̃^{−1/2}⌉²}′T̃d
          = √µ {⌈s̃^{−1/2}⌉′ − ⌈s̃^{−1/2}⌉′W̃R⁻¹W̃^T⌈s̃^{−1/2}⌉² − ⌈s̃^{−1/2}⌉W̃(R⁻¹)′W̃^T⌈s̃^{−1/2}⌉²
                − ⌈s̃^{−1/2}⌉W̃R⁻¹W̃^T(⌈s̃^{−1/2}⌉²)′}T̃d
          = √µ {⌈s̃^{−1/2}⌉′ − ⌈s̃^{−1/2}⌉′W̃R⁻¹W̃^T⌈s̃^{−1/2}⌉² + ⌈s̃^{−1/2}⌉W̃R⁻¹R′R⁻¹W̃^T⌈s̃^{−1/2}⌉²
                − ⌈s̃^{−1/2}⌉W̃R⁻¹W̃^T(⌈s̃^{−1/2}⌉⌈s̃^{−1/2}⌉′ + ⌈s̃^{−1/2}⌉′⌈s̃^{−1/2}⌉)}T̃d
          = √µ {⌈s̃^{−1/2}⌉′ − ⌈s̃^{−1/2}⌉′W̃R⁻¹W̃^T⌈s̃^{−1/2}⌉²
                + ⌈s̃^{−1/2}⌉W̃R⁻¹W̃^T(⌈s̃^{−1/2}⌉⌈s̃^{−1/2}⌉′ + ⌈s̃^{−1/2}⌉′⌈s̃^{−1/2}⌉)W̃R⁻¹W̃^T⌈s̃^{−1/2}⌉²
                − ⌈s̃^{−1/2}⌉W̃R⁻¹W̃^T(⌈s̃^{−1/2}⌉⌈s̃^{−1/2}⌉′ + ⌈s̃^{−1/2}⌉′⌈s̃^{−1/2}⌉)}T̃d
          = √µ {⌈s̃^{−1/2}⌉′(I − W̃R⁻¹W̃^T⌈s̃^{−1/2}⌉²)
                − ⌈s̃^{−1/2}⌉W̃R⁻¹W̃^T(⌈s̃^{−1/2}⌉⌈s̃^{−1/2}⌉′ + ⌈s̃^{−1/2}⌉′⌈s̃^{−1/2}⌉)(I − W̃R⁻¹W̃^T⌈s̃^{−1/2}⌉²)}T̃d
          = √µ {⌈s̃^{−1/2}⌉′ − ⌈s̃^{−1/2}⌉W̃R⁻¹W̃^T(⌈s̃^{−1/2}⌉⌈s̃^{−1/2}⌉′ + ⌈s̃^{−1/2}⌉′⌈s̃^{−1/2}⌉)}(⌈s̃^{−1/2}⌉⁻¹P⌈s̃^{−1/2}⌉)T̃d
          = {⌈s̃^{−1/2}⌉′ − ⌈s̃^{−1/2}⌉W̃R⁻¹W̃^T(⌈s̃^{−1/2}⌉⌈s̃^{−1/2}⌉′ + ⌈s̃^{−1/2}⌉′⌈s̃^{−1/2}⌉)}⌈s̃^{−1/2}⌉⁻¹u(k),

where we used (R⁻¹)′ = −R⁻¹R′R⁻¹ with R′ = W̃^T(⌈s̃^{−1/2}⌉⌈s̃^{−1/2}⌉′ + ⌈s̃^{−1/2}⌉′⌈s̃^{−1/2}⌉)W̃, and the identity I − W̃R⁻¹W̃^T⌈s̃^{−1/2}⌉² = ⌈s̃^{−1/2}⌉⁻¹P⌈s̃^{−1/2}⌉.

Notice that, for any ξ ∈ Rᵐ², we have

    u(k) • ⌈s̃^{−1/2}⌉W̃ξ = √µ (P⌈s̃^{−1/2}⌉T̃d) • ⌈s̃^{−1/2}⌉W̃ξ
                         = √µ (⌈s̃^{−1/2}⌉T̃d) • P⌈s̃^{−1/2}⌉W̃ξ
                         = √µ (⌈s̃^{−1/2}⌉T̃d) • (⌈s̃^{−1/2}⌉W̃ξ − ⌈s̃^{−1/2}⌉W̃R⁻¹(W̃^T⌈s̃^{−1/2}⌉²W̃)ξ)
                         = 0,

since W̃^T⌈s̃^{−1/2}⌉²W̃ = R.

This implies that

    Ψ(k)′(0) = −2u(k) • u(k)′ = −2u(k) • (⌈s̃^{−1/2}⌉′⌈s̃^{−1/2}⌉⁻¹u(k)).        (3.2.2)

By (3.2.2), (3.1.17) and (3.1.18) and using norm inequalities, and writing ν̃ := ν̃(k)(t), we get

    |Ψ(k)′(0)| = 2|u(k) • ⌈s̃^{−1/2}⌉′⌈s̃^{1/2}⌉u(k)|
              = |u(k) • (⌈s̃^{−1/2}⌉′⌈s̃^{1/2}⌉ + ⌈s̃^{1/2}⌉⌈s̃^{−1/2}⌉′)u(k)|
              = |u(k) • ⌈s̃^{1/2}⌉(⌈s̃^{−1/2}⌉⌈s̃^{−1/2}⌉′ + ⌈s̃^{−1/2}⌉′⌈s̃^{−1/2}⌉)⌈s̃^{1/2}⌉u(k)|
              = |u(k) • ⌈s̃^{1/2}⌉(⌈s̃^{−1/2}⌉²)′⌈s̃^{1/2}⌉u(k)|
              = |u(k) • ⌈s̃^{1/2}⌉⌈s̃⁻¹⌉′⌈s̃^{1/2}⌉u(k)|
              = |u(k) • ⌈ν̃^{−1/2}⌉⌈ν̃⌉′⌈ν̃^{−1/2}⌉u(k)|
              = 2|u(k) • ⌈ν̃^{−1/2}⌉⌈ν̃, ν̃′⌉⌈ν̃^{−1/2}⌉u(k)|
              = 2|u(k) • ⌈ν̃^{−1/2}⌉⌈ν̃, ∇xν̃(k)[d]⌉⌈ν̃^{−1/2}⌉u(k)|
              = 2|u(k) • ⌈e2, ⌈ν̃^{−1/2}⌉∇xν̃(k)[d]⌉u(k)|
              = 2|u(k) • ⌊⌈ν̃^{−1/2}⌉∇xν̃(k)[d]⌋u(k)|
              ≤ 2‖u(k)‖ ‖⌊⌈ν̃^{−1/2}⌉∇xν̃(k)[d]⌋u(k)‖
              ≤ 2‖u(k)‖² ‖⌈ν̃^{−1/2}⌉∇xν̃(k)[d]‖
              = 2µ⁻¹‖u(k)‖² ‖⌈s̃^{1/2}⌉∇xν̃(k)[d]‖
              = 2µ^{−1/2}‖u(k)‖³
              = 2µ^{−1/2}|Ψ(k)(0)|^{3/2},

where we used ν̃′(t) = ∇xν̃(k)[d]. The lemma is established. □

Theorem 3.2.1. For any fixed µ > 0, η(µ, x) is a µ-strongly self-concordant barrier on F0.

Proof. It is trivial to see that the linear function c^T x is µ-strongly self-concordant on F1 (indeed, both sides of (3.2.1) are identically zero). By Lemma 3.2.1, one can see that the function µ ln det s, with s = b − Ax, is also a µ-strongly self-concordant barrier on F1. By Lemma 3.2.2 and [30, Proposition 2.1.1(ii)], we can conclude that ∑_{k=1}^K ρ(k)(µ, x) is a µ-strongly self-concordant barrier on F2. The theorem is then established by [30, Proposition 2.1.1(ii)]. □

3.2.2 Parameters of the self-concordant family

We have shown that the recourse function η(µ, ·) is a strongly self-concordant function. Having such a property, this function enjoys many nice features. In this subsection, we show that the family of functions {η(µ, ·) : µ > 0} is a strongly self-concordant family with appropriate parameters. We first introduce the definition of self-concordant families.

Definition 3.2.2 (Nesterov and Nemirovskii [30, Definition 3.1.1]). Let R₊₊ be the set of all positive real numbers, and let G be an open nonempty convex subset of Rⁿ. For µ ∈ R₊₊, let fµ : R₊₊ × G → R be a family of functions indexed by µ, and let α1(µ), α2(µ), α3(µ), α4(µ) and α5(µ) : R₊₊ → R₊₊ be continuously differentiable functions of µ. Then the family of functions {fµ}_{µ∈R₊₊} is called strongly self-concordant with the parameters α1, α2, α3, α4, α5 if the following conditions hold:

(i) fµ is continuous on R₊₊ × G and, for fixed µ ∈ R₊₊, fµ is convex on G. fµ has three partial derivatives on G, which are continuous on R₊₊ × G and continuously differentiable with respect to µ on R₊₊.

(ii) For any µ ∈ R₊₊, the function fµ is strongly α1(µ)-self-concordant.

(iii) For any (µ, x) ∈ R₊₊ × G and any h ∈ Rⁿ,

    |(∇xfµ(µ, x)[h])′ − (ln α3(µ))′ ∇xfµ(µ, x)[h]| ≤ α4(µ)α1(µ)^{1/2} (∇²xxfµ(µ, x)[h, h])^{1/2},
    |(∇²xxfµ(µ, x)[h, h])′ − (ln α2(µ))′ ∇²xxfµ(µ, x)[h, h]| ≤ 2α5(µ) ∇²xxfµ(µ, x)[h, h].

In this section, we need to compute (∇xη(µ, x))′, and in order to do so we need to determine the partial derivative of ν̃(k)(µ, x) with respect to µ. Let (y(k)′, ν̃(k)′, s̃(k)′) denote the partial derivatives of (y(k)(µ, x), ν̃(k)(µ, x), s̃(k)(µ, x)) with respect to µ. Differentiating (3.1.10) with respect to µ, we get the system

    s̃(k) ∘ ν̃(k)′ + ν̃(k) ∘ s̃(k)′ = e2,
    W̃(k)y(k)′ + s̃(k)′ = 0,                              (3.2.3)
    W̃(k)^T ν̃(k)′ = 0.

Solving the system (3.2.3), we obtain

    s̃(k)′ = W̃(k)R(k)⁻¹W̃(k)^T s̃(k)⁻¹,
    ν̃(k)′ = ⌈s̃(k)^{−1/2}⌉P(k)e2,                        (3.2.4)
    y(k)′ = −R(k)⁻¹W̃(k)^T s̃(k)⁻¹.

The proof of the strong self-concordance of the family {η(µ, ·) : µ > 0} depends on the following two lemmas.

Lemma 3.2.3. For any µ > 0, x ∈ F0 and h ∈ Rᵐ¹, the following inequality holds:

    |(∇xη(µ, x)^T[h])′| ≤ (−((r1 + Kr2)/µ) ∇²xxη(µ, x)[h, h])^{1/2}.


Proof. By differentiating (3.1.19) with respect to µ and applying (3.2.4), we obtain

    (∇xη(µ, x))′ = −∑_{k=1}^K T̃(k)^T⌈s̃(k)^{−1/2}⌉P(k)e2 − A^T s⁻¹ = −(1/√µ) Bε,

where B ∈ Rᵐ¹ˣ⁽ᴷⁿ²⁺ⁿ¹⁾ is defined by

    B := [√µ T̃(1)^T⌈s̃(1)^{−1/2}⌉P(1), . . . , √µ T̃(K)^T⌈s̃(K)^{−1/2}⌉P(K), √µ A^T⌈s^{−1/2}⌉]

and

    ε := (e2; . . . ; e2; e1) ∈ J2 × · · · × J2 × J1  (K copies of e2).

Notice that, in view of (3.1.20), we have

    BB^T = µ∑_{k=1}^K T̃(k)^T⌈s̃(k)^{−1/2}⌉P(k)⌈s̃(k)^{−1/2}⌉T̃(k) + µA^T⌈s⁻¹⌉A = −∇²xxη(µ, x).

This gives

    −(∇xη(µ, x))′^T ∇²xxη(µ, x)⁻¹ (∇xη(µ, x))′ = (1/µ) ε • B^T(BB^T)⁻¹Bε
                                               ≤ (1/µ) ε • ε
                                               = (1/µ)(e1 • e1 + K e2 • e2).

Recall that e1 • e1 = rank(J1) = r1 and e2 • e2 = rank(J2) = r2. It follows that

    −(∇xη(µ, x))′^T ∇²xxη(µ, x)⁻¹ (∇xη(µ, x))′ ≤ (r1 + Kr2)/µ.        (3.2.5)

We then have

    |(∇xη(µ, x)^T[h])′| ≤ (−(∇xη(µ, x))′^T ∇²xxη(µ, x)⁻¹ (∇xη(µ, x))′)^{1/2} (−∇²xxη(µ, x)[h, h])^{1/2}
                        ≤ (−((r1 + Kr2)/µ) ∇²xxη(µ, x)[h, h])^{1/2},

as desired. □


Lemma 3.2.4. For any µ > 0, x ∈ F0 and h ∈ Rᵐ¹, the following inequality holds:

    |(∇²xxη(µ, x)[h, h])′| ≤ −(2√r2/µ) ∇²xxη(µ, x)[h, h].

Proof. Let (ν̃(k), s̃(k), P(k), R(k)) := (ν̃(k)(µ, x), s̃(k)(µ, x), P(k)(µ, x), R(k)(µ, x)). We fix h ∈ Rᵐ¹ and define

    u(k) := √µ P(k)⌈s̃(k)^{−1/2}⌉T̃(k)h.

Now by following some similar steps as in the proof of Lemma 3.2.2, and using (3.2.2),

(3.1.18), (3.2.3), (3.1.10), and (3.2.4), we get

(u(k) • u(k))′ = 2 u(k) • u(k)′

= 2 u(k) •(⌈s(k)−1/2

(t)⌉′⌈s(k)−1/2

(t)⌉−1

u(k))

= u(k) •⌈s(k)1/2(t)

⌉⌈s(k)−1

(t)⌉′⌈s(k)1/2(t)

⌉u(k)

= u(k) •⌈ν(k)−1/2

(t)⌉⌈ν(k)(t)

⌉′⌈ν(k)−1/2

(t)⌉u(k)

= 2 u(k) •⌈ν(k)−1/2

(t)⌉⌈ν(k)(t), ν(k)′(t)

⌉⌈ν(k)−1/2

(t)⌉u(k)

= 2 u(k) •⌈e2, ν

(k)−1/2(t) ν(k)′(t) ν(k)−1/2

(t)⌉u(k)

= 2 u(k) •⌊ν(k)−1/2

(t) ν(k)′(t) ν(k)−1/2(t)⌋u(k)

= 2 u(k) •⌊ν(k)−1

(t) ν(k)′(t)⌋u(k)

= 2µ−1 u(k) •⌊s(k)(t) ν(k)′(t)

⌋u(k)

= 2µ−1 u(k) •⌊e2 − ν(k)(t) s(k)′(t)

⌋u(k)

= 2µ−1 u(k) •⌊e2 − µ s(k)−1

(t) s(k)′(t)⌋u(k)

≤ 2µ−1‖u(k)‖2∥∥∥e2 − µ

⌈s(k)−1/2

(t)⌉s(k)′(t)

∥∥∥= 2µ−1‖u(k)‖2

∥∥∥e2 − µ⌈s(k)−1/2

(t)⌉W (k)R(k)−1

W (k)T s(k)−1∥∥∥

= 2µ−1‖u(k)‖2 ‖e2 − µ (e2 − P (k)e2)‖

≤ 2µ−1‖u(k)‖2 ‖e2‖,

where the last inequality is obtained by observing that e2 − P (k)e2 KJ2 0, which can be

62

Page 74: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

immediately seen by noting that P (k) I.

Recall that ‖e2‖2 = rank(J2) = r2. This gives us

|(u(k) • u(k))′| ≤2√r2

µu(k) • u(k).

From (3.1.20), we have that

∇2xxη(µ,x)[h,h] = −

K∑k=1

u(k) • u(k) − µ (AT⌈s−1⌉A)[h,h].

Therefore, for any h ∈ Rn, we have

|∇2xxη(µ,x)[h,h]′| ≤

K∑k=1

|(u(k) • u(k))′|+ (AT⌈s−1⌉A)[h,h]

≤2√r2

µ

K∑k=1

u(k) • u(k) + (AT⌈s−1⌉A)[h,h]

≤ −2√r2µ∇2

xxη(µ,x)[h,h].

This completes the proof. 2

Theorem 3.2.2. The family η(µ, ·) : µ > 0 is a strongly self-concordant family with

the following parameters

α1(µ) = µ, α2(µ) = α3(µ) = 1, α4(µ) =

√r1 +Kr2

µ, α5(µ) =

√r2

µ.

Proof. It is clear that condition (i) of Definition 3.2.2 holds. Theorem 3.2.1 shows that

condition (ii) is satisfied and Lemmas 3.2.4 and 3.2.3 show that condition (iii) is satisfied.

2

63

Page 75: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

3.3 A class of logarithmic barrier algorithms for solv-

ing SSPs

In §3.2 we have established that the parametric functions η(µ, ·) constitute a strongly

self-concordant family. Therefore, it is straightforward to develop primal path following

interior point algorithms for solving SSP (3.1.3, 3.1.4). In this section we introduce a

class of log barrier algorithms for solving this problem. This class is stated formally in

Algorithm 1.

Algorithm 1 The Decomposition Algorithm for Solving SSP

Require: ε > 0, γ ∈ (0, 1), θ > 0, β > 0, x0 ∈ F0 and µ0 > 0.x := x0, µ := µ0

while µ ≥ ε dofor k = 1, 2, . . . , K do

solve (3.1.9) to obtain (y(k), s(k),ν(k))choose a scaling element p ∈ C(s(k),ν(k)) and compute (s(k), ν(k))

end forcompute ∆x := −∇2

xxη(µ,x)−1∇xη(µ,x) using (3.1.19) and (3.1.20)

compute δ(µ,x) :=

√− 1

µ∆xT∇2

xxη(µ,x)∆x using (3.1.20)

while δ > β dox := x+ θ∆xfor k = 1, 2, . . . , K do

solve (3.1.9) to obtain (y(k), s(k),ν(k))choose a scaling element p ∈ C(s(k),ν(k)) and compute (s(k), ν(k))

end forcompute ∆x := −∇2

xxη(µ,x)−1∇xη(µ,x) using (3.1.19) and (3.1.20)

compute δ(µ,x) :=

√− 1

µ∆xT∇2

xxη(µ,x)∆x using (3.1.20)

end whileµ := γµapply inverse scaling to (s(k), ν(k))

end while

Our algorithm is initialized with a starting point x0 ∈ F0 and a starting value µ0 > 0

for the barrier parameter µ, and is indexed by a parameter γ ∈ (0, 1). We use δ as a

measure of the proximity of the current point x to the central path, and β as a threshold

64

Page 76: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

for that measure. If the current x is too far away from the central path in the sense that

δ > β, Newton’s method is applied to find a point close to the central path. Then the

value of µ is reduced by a factor γ and the whole precess is repeated until the value of

µ is within the tolerance ε. By tracing the central path as µ approaches zero, a strictly

feasible ε-optimal solution to (3.1.11) will be generated.

3.4 Complexity analysis

In this section we present the complexity analysis for two variants of algorithms: short-

step algorithms and long-step algorithms, which are controlled by the input value of γ in

Algorithm 1.

As mentioned in [25], the first part of the following proposition follows directly from

the definition of self concordance and is due to [30, Theorem 2.1.1]. The second part is

results from the first part and is given in [48] without proof.

Proposition 3.4.1. For any µ > 0 and x ∈ F0, we denote ∆x := −∇2xxη(µ,x)−1∇xη(µ,x)

and δ :=√− 1µ∆xT∇2

xxη(µ,x)∆x. Then for δ < 1, τ ∈ [0, 1] and any h,h1,h2 ∈ Rm2 we

have

(i) −(1−τδ)2hT∇2xxη(µ,x)h ≤ −hT∇2

xxη(µ,x+τ∆x)h ≤ −(1−τδ)−2hT∇2xxη(µ,x)h,

(ii) |hT1 (∇2

xxη(µ,x+ τ∆x)−∇2xxη(µ,x))h2| ≤ [(1− τδ)−2 − 1]

√−hT

1∇2xxη(µ,x)h1√

−hT2∇2

xxη(µ,x)h2.

The following lemma is essentially Theorem 2.2.3 of [30] and describes the behavior

of the Newton method as applied to η(µ, ·).

Lemma 3.4.1. For any µ > 0 and x ∈ F0, let ∆x be the Newton direction defined by

∆x := −∇2xxη(µ,x)−1∇xη(µ,x); δ := δ(µ,x) =

√− 1µ∆xT∇2

xxη(µ,x)∆x, x+ = x +

∆x, ∆x+ be the Newton direction calculated at x+, and δ(µ,x+) :=√− 1µ∆x+T∇2

xxη(µ,x+)∆x+.

Then the following relations hold:

65

Page 77: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

(i) If δ < 2−√

3, then δ(µ,x+) ≤(

δ

1− δ

)2

≤ δ2.

(ii) If δ ≥ 2−√

3, then η(µ,x)− η(µ,x+ θ∆x) ≥ µ(δ− ln(1 + δ)), where θ = (1 + δ)−1.

3.4.1 Complexity for short-step algorithm

In the short-step version of the algorithm, we decrease the barrier parameter by a factor

γ := 1 − σ/√r1 +Kr2, with σ < 0.1, in each iteration. The kth iteration of the short-

step algorithm is performed as follows: at the beginning of the iteration, we have µ(k−1)

and x(k−1) on hand and x(k−1) is close to the central path, i.e., δ(µ(k−1),x(k−1)) ≤ β.

After the barrier parameter µ is reduced from µ(k−1) to µk := γµ(k−1), we have that

δ(µk,x(k−1)) ≤ 2β. Then a full Newton step with size θ = 1 is taken to produce a new

point xk with δ(µk,xk) ≤ β. We now show that, in this class of algorithms, only one

Newton step is sufficient for recentering after updating the parameter µ. For the purpose

of proving this result, we present the following proposition which is a restatement of [30,

Theorem 3.1.1].

Proposition 3.4.2. Let χκ(η;µ, µ+) :=(

1+r22

+√r1+Kr2κ

)lnγ−1. Assume that δ(µ,x) <

κ and µ+ := γµ satisfies

χκ(η;µ, µ+) ≤ 1− δ(µ,x)

κ.

Then δ(µ+,x) < κ.

Lemma 3.4.2. Let µ+ = γµ, where γ = 1 − σ/√r1 +Kr2 and σ ≤ 0.1, and let β =

(2−√

3)/2. If δ(µ,x) ≤ β, then δ(µ+,x) ≤ 2β.

Proof. Let κ := 2β = 2 −√

3. Since δ(µ, y) ≤ κ/2, one can verify that for σ ≤ 0.1, µ+

satisfies

χκ(η;µ, µ+) ≤ 1

2≤ 1− δ(µ, y)

κ.

By Proposition 3.4.2, we have δ(µ+, y) ≤ κ. 2

66

Page 78: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

By Lemmas 3.4.1(i) and 3.4.2, we conclude that we can reduce the parameter µ by the

factor γ := 1 − σ/√r1 +Kr2, σ < 0.1, at each iteration, and that only one Newton step

is sufficient to restore proximity to the central path. So we have the following complexity

result for short-step algorithms.

Theorem 3.4.1. Consider Algorithm 1 and let µ0 be the initial barrier parameter, ε >

0 the stopping criterion, and β = (2 −√

3)/2. If the starting point x0 is sufficiently

close to the central path, i.e., δ(µ0,x0) ≤ β, then the short-step algorithm reduces the

barrier parameter µ at a linear rate and terminates with at most O(√r1 +Kr2 ln(µ0/ε))

iterations.

3.4.2 Complexity for long-step algorithm

In the long-step version of the algorithm, the barrier parameter is decreased by an arbi-

trary constant factor γ ∈ (0, 1). It has a potential for larger decrease on the objective

function value, however, several damped Newton steps might be needed for restoring the

proximity to the central path.

The kth iteration of the long-step algorithms is performed as follows: at the beginning

of the iteration we have a point x(k−1), which is sufficiently close to x(µ(k−1)), where

x(µ(k−1)) is the solution to (3.1.11) for µ := µ(k−1). The barrier parameter is reduced

from µ(k−1) to µk := γµ(k−1), where γ ∈ (0, 1), and then a search is started to find a point

xk that is sufficiently close to x(µk). The long-step algorithm generates a finite sequence

consisting of N points in F0, and we finally take xk to be equal to the last point of this

sequence. We want to determine an upper bound on N , the number of Newton iterations

that are needed to find the point xk.

For µ > 0 and x ∈ F0, we define the function φ(µ,x) := η(µ,x(µ)) − η(µ,x), which

represents the difference between the objective value η(µk,x(k)) at the end of kth iteration

and the minimum objective value η(µk,x(µk−1)) at the beginning of kth iteration. Then

67

Page 79: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

our task is to find an upper bound on φ(µ+,x). To do so, we first give upper bounds on

φ(µ,x) and φ′(µ,x) respectively. We have the following lemma.

Lemma 3.4.3. Let µ > 0 and x ∈ F0, we denote ∆x := x(µ)− x and define

δ := δ(µ,x) =

√− 1

µ∆xT∇2

xxη(µ,x)∆x.

For any µ > 0 and x ∈ F0, if δ < 1, then the following inequalities hold:

φ(µ,x) ≤ µ

1− δ+ ln(1− δ)

), (3.4.1)

|φ′(µ,x)| ≤ −√r1 +Kr2 ln(1− δ). (3.4.2)

Proof.

φ(µ,x) := η(µ,x(µ))− η(µ,x) :=

∫ 1

0

∇xη(µ,x+ τ∆x)T∆xdτ.

Since x(µ) is the optimal solution, we have

∇xη(µ,x(µ)) = 0 (3.4.3)

Hence, with the aid of Proposition 3.4.1(i), we get

φ(µ,x) =

∫ 1

0

∫ τ

1

∆xT∇2xxη(µ,x+ α∆x)∆x dαdτ

≤ −∫ 1

0

∫ 1

τ

∆xT∇2xxη(µ,x)∆x

(1− αδ)2dαdτ

=

∫ 1

0

∫ 1

τ

µδ2

(1− αδ)2dαdτ

= µ

1− δ+ ln(1− δ)

),

which establishes (3.4.1).

Now, for any µ > 0, by applying the chain rule, using (3.4.3), and applying the

68

Page 80: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

Mean-Value Theorem, we get

φ′(µ,x) = η′(µ,x(µ))− η′(µ,x) +∇xη(µ,x(µ))Tx′(µ)

= η′(µ,x(µ))− η′(µ,x) = ∇xη(µ,x+$∆x)T∆x,(3.4.4)

for some $ ∈ (0, 1). Hence

|φ′(µ,x)| =

∣∣∣∣∫ 1

0

∇xη′(µ,x+ τ∆x)T∆x dτ

∣∣∣∣≤

∫ 1

0

√−∆xT∇2

xxη(µ,x+ τ∆x)∆x√−∇xη′(µ,x+ τ∆x)T[∇2

xxη(µ,x+ τ∆x)]−1∇xη′(µ,x+ τ∆x) dτ.

Then, by using (3.2.5) and Proposition 3.4.1(i), we obtain

|φ′(µ,x)| ≤∫ 1

0

√−∆xT∇2

xxη(µ,x)∆x

1− τ δ

√r1 +Kr2

µdτ

=

∫ 1

0

δ√µ

1− τ δ

√r1 +Kr2

µdτ

=√r1 +Kr2

∫ 1

0

δ

1− τ δdτ

= −√r1 +Kr2 ln(1− δ),

which establishes (3.4.2). 2

Lemma 3.4.4. Let µ > 0 and x ∈ F0 be such that δ < 1, where δ is as defined in Lemma

3.4.3. Let µ+ := γµ with γ ∈ (0, 1). Then

η(µ+,x(µ+))− η(µ+,x) ≤ O(r1 +Kr2)µ+.

Proof. Differentiating (3.4.4) with respect to µ, we get

φ′′(µ,x) = η′′(µ,x(µ))− η′′(µ,x) +∇xη′(µ,x(µ))Tx′(µ). (3.4.5)

69

Page 81: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

We will work on the right-hand side of (3.4.5) by verifying that the second term

η′′(µ,x) is nonnegative and then bounding the first and the last term.

From the definition of η(x, µ), we have that η′′(µ,x) =∑K

k=1 ρ′′k(µ,x). Differentiating

ρk(µ,x) with respect to µ and using (3.2.4) and (3.1.10) gives

ρ′k(µ,x) = d(k)Ty(k)′ + ln det s(k) + µ s(k)−1 • s(k)′

= ln det s(k) + d(k)T(−R(k)−1W (k)T s(k)−1

) + µ s(k)−1 • (W (k)R(k)−1W (k)T s(k)−1

)

= ln det s(k) + d(k)T(−R(k)−1W (k)T s(k)−1

) + ν(k) • (W (k)R(k)−1W (k)T s(k)−1

)

= ln det s(k) + (−d(k) + W (k)Tν(k))T(R(k)−1W (k)T s(k)−1

)

= ln det s(k).

Observe that (I − P (k))2 = I − P (k) (remember P (k)2= P (k)). Therefore, by differen-

tiating ρ′k(µ,x) with respect to µ and using (3.2.4) and (3.1.18), we obtain

ρ′′k(µ,x) = s(k)−1 • s(k)′

= s(k)−1 • (W (k)R(k)−1W (k)T s(k)−1

)

= s(k)−1 •(⌈s(k)1/2

⌉ (I − P (k)

) ⌈s(k)1/2

⌉s(k)−1

)=

∥∥∥ (I − P (k)) ⌈s(k)1/2

⌉s(k)−1

∥∥∥2

≥ 0.

Thus, for µ > 0 and x ∈ F0, η′′(µ,x) ≥ 0, and hence η(µ,x) is a convex function of

µ. In addition, using (3.1.18), we also have

ρ′′k(µ,x) = s(k)−1 •(⌈s(k)⌉s(k)−1 −

⌈s(k)1/2

⌉P (k)

⌈s(k)1/2

⌉s(k)−1

)= s(k)−1 •

⌈s(k)⌉s(k)−1 − s(k)−1 •

⌈s(k)1/2

⌉P (k)

⌈s(k)1/2

⌉s(k)−1

= s(k)−1 •⌈s(k)⌉s(k)−1 −

∥∥∥P (k)⌈s(k)1/2

⌉s(k)−1

∥∥∥2

≤ s(k)−1 •⌈s(k)⌉s(k)−1

= s(k)−1 • s(k)

70

Page 82: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

= µ−1 ν(k) • s(k)

= µ−1 trace(e2)

=r2

µ.

Hence

η′′(µ,x(µ)) ≤ Kr2

µ. (3.4.6)

By differentiating (3.4.3) with respect to µ, we get

∇xη′(µ,x(µ)) +∇2

xxη(µ,x(µ))x′(µ) = 0,

or equivalently

x′(µ) = −∇2xxη(µ,x(µ))−1∇xη

′(µ,x(µ)).

Hence, by using (3.2.5), we have

−∇xη′(µ,x(µ))Tx′(µ) = ∇xη

′(µ,x(µ))T∇2xxη(µ,x(µ))−1∇xη

′(µ,x(µ))

≤ 1

µ(r1 +Kr2).

(3.4.7)

By combining (3.4.6) and (3.4.7), and using the fact that η′′(µ,x) ≥ 0, we obtain

φ′′(µ,x(µ)) ≤ r1 + 2Kr2

µ. (3.4.8)

Applying the Mean-Value Theorem and using Lemma 3.4.3 and (3.4.8) gives

φ(µ+,x) = φ(µ,x) + φ′(µ,x)(µ+ − µ) +

∫ µ+

µ

∫ τ

µ

φ′′(υ,x) dυdτ

≤ µ

1− δ+ ln(1− δ)

)−√r1 +Kr2 ln(1− δ) (µ− µ+)

+(r1 + 2Kr2)

∫ µ+

µ

∫ τ

µ

υ−1 dυdτ

71

Page 83: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

= µ

1− δ+ ln(1− δ)

)−√r1 +Kr2 ln(1− δ) (µ− µ+)

+(r1 + 2Kr2) (µ− µ+) ln τµ

≤ µ

1− δ+ ln(1− δ)

)−√r1 +Kr2 ln(1− δ) (µ− µ+)

+(r1 + 2Kr2) (µ− µ+) lnγ−1

(recall γ−1 = µ+/µ ≥ τ/µ). Since δ and γ are constants, the lemma is established. 2

Notice that the previous lemma requires δ < 1. However, evaluating δ explicitly may

not be possible. In the next lemma we will see that δ is actually proportional to δ, which

can be evaluated.

Lemma 3.4.5. For any µ > 0 and x ∈ F0, let ∆x := −∇2xxη(µ,x)−1∇xη(µ,x) and

∆x := x− x(µ). We denote

δ := δ(µ,x) =

√− 1

µ∆xT∇2

xxη(µ,x)∆x and δ := δ(µ,x) =

√− 1

µ∆xT∇2

xxη(µ,x)∆x.

If δ < 1/6, then 23δ ≤ δ ≤ 2δ.

Proof. Let H := ∇2xxη(µ,x) and g := ∇xη(µ,x), and denote g := g + H∆x. From

(3.4.3), we have ∆x = −H−1g. This gives us ∆x = ∆x+H−1g. We then have

δ =

√− 1

µ(∆x+H−1g)T∇2

xxη(µ,x)(∆x+H−1g)

√√√√− 1

µ∆xT∇2

xxη(µ,x)∆x− 1

µ(H−1g)T∇2

xxη(µ,x)︸ ︷︷ ︸H

(H−1g)

≤ δ +

√− 1

µgTH−1g,

(3.4.9)

where we used the triangle inequality to obtain the last inequality. Note that, by (3.4.3),

72

Page 84: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

we have ∇xη(µ,x− ∆x) = 0. Applying the Mean-Value Theorem gives

−hTg = −hT(H∆x+ g)

= −hT(∇2xxη(µ,x)∆x+∇xη(µ,x))

= −hT(∇2xxη(µ,x)∆x− (∇xη(µ,x− ∆x)−∇xη(µ,x)))

= −hT(∇2xxη(µ,x)−∇2

xxη(µ,x− (1−$)∆x))∆x, for some $ ∈ (0, 1).

Now in view of Proposition 3.4.1(ii) we have

−hTg = −∫ 1

0

hT(∇2xxη(µ,x)−∇2

xxη(µ,x− (1− τ)∆x))∆x dτ

≤√−∆xTH∆x

√−hTHh

∫ 1

0

((1− (1− τ)δ)−2 − 1)dτ

=

1− δ

) √−∆xTH∆x

√−hTHh

=

(√µ δ2

1− δ

)√−hTHh.

It can be verified that −gTH−1g = maxhTHh − 2hTg : h ∈ Rm1. It then follows

that

−gTH−1g ≤ max

hTHh+

(2√µ δ2

1− δ

)√−hTHh : h ∈ Rm1

=

µ δ4

(1− δ)2. (3.4.10)

From (3.4.9) and (3.4.10), we obtain δ ≤ δ+ δ2

1−δ , or equivalently, 2δ2−(1+δ) δ+δ ≥ 0.

Therefore, the condition δ ≤ 1/6 implies that 12δ2− 7 δ+ 1 ≥ 0, which in turn gives that

δ ≤ 1/3 = 2δ.

From (3.4.9), by exchanging positions of ∆x and ∆x and following the above steps,

we get

δ ≤ δ +δ2

1− δ, or equivalently, δ ≤ δ

1− δ≤ δ

1− 13

=3

2δ.

Thus, the condition δ < 1/6 implies that 23δ ≤ δ ≤ 2δ. This completes the proof. 2

73

Page 85: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

Combining Lemmas 3.4.1(ii), 3.4.4, and 3.4.5, we have the following complexity result

for long-step algorithms.

Theorem 3.4.2. Consider Algorithm 1 and let µ0 be the initial barrier parameter, ε > 0

the stopping criterion, and β = 1/6. If the starting point x0 is sufficiently close to the

central path, i.e., δ(µ0,x0) ≤ β, then the long-step algorithm reduces the barrier parameter

µ at a linear rate and terminates with at most O((r1 +Kr2) ln(µ0/ε)) iterations.

74

Page 86: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

Chapter 4

A Class of Polynomial Volumetric

Barrier Decomposition Algorithms

for Stochastic Symmetric

Programming

Ariyawansa and Zhu [10] have derived a class of polynomial time decomposition-based

algorithms for solving the SSDP problem (Problem 4) based on a volumetric barrier

analogous to work of Mehrotra and Ozevin [25] by utilizing the work of Anstreicher [7]

for DSDP. In this chapter, we extend Ariyawansa and Zhu’s work [10] to the case of

SSPs by deriving a class of volumetric barrier decomposition algorithms for the general

SSP problem and establishing polynomial complexity of certain members of the class of

algorithms. The results of this chapter have been submitted for publication [4].

75

Page 87: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

4.1 The volumetric barrier problem for SSPs

In this section we formulate appropriate volumetric barrier function for the SSP problem

(with finite event space Ω) and obtain expressions for the derivatives required in the rest

of the paper. Our procedure closely follows that in §3 of [10], although our setting is much

more general.

4.1.1 Formulation and assumptions

We now examine (2.1.3, 2.1.4) when the event space Ω is finite. Let (T (k),W (k),h(k),d(k)) :

k = 1, 2, . . . , K be the set of the possible values of the random variables(T (ω),W (ω),h(ω),

d(ω))

and let pk := P(T (ω),W (ω),h(ω),d(ω)

)=(T (k),W (k),h(k),d(k)

)be the associated

probability for k = 1, 2, . . . , K. Then Problem (2.1.3, 2.1.4) becomes

max cTx+K∑k=1

pkQ(k)(x)

s.t. Ax KJ1 b,(4.1.1)

where, for k = 1, 2, . . . , K, Q(k)(x) is the maximum of the problem

max d(k)Ty(k)

s.t. W (k)y(k) KJ2 h(k) − T (k)x,

(4.1.2)

where x ∈ Rm1 is the first-stage decision variable, and y(k) ∈ Rm2 is the second-stage

variable for k = 1, 2, . . . , K.

We notice that the constraints in (4.1.1, 4.1.2) are negative symmetric while the com-

mon practice in the DSP literature is to use positive symmetric constraints. So for conve-

nience we redefine d(k) as d(k) := pkd(k) for k = 1, 2, . . . , K, and rewrite Problem (4.1.1,

76

Page 88: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

4.1.2) as

min cTx+K∑k=1

Q(k)(x)

s.t. Ax− b KJ1 0,

(4.1.3)

where, for k = 1, 2, . . . , K, Q(k)(x) is the maximum of the problem

min d(k)Ty(k)

s.t. W (k)y(k) + T (k)x− h(k) KJ2 0.(4.1.4)

In the rest of this paper our attention will be on Problem (4.1.3, 4.1.4), and from now

on when we use the acronym “SSP” in this paper we mean Problem (4.1.3, 4.1.4).

4.1.2 The volumetric barrier problem for SSPs

In this section we formulate a volumetric barrier for SSPs and obtain expressions for the

derivatives required in our subsequent development.

In order to define the volumetric barrier problem for the SSP (4.1.3, 4.1.4), we need

to make some assumptions. First we define

F1 :=x : s1(x) := Ax− b KJ1 0

;

F (k)(x) :=y(k) : s

(k)2 (x,y(k)) := W (k)y(k) + T (k)x− h(k) KJ2 0

for k = 1, 2, . . . , K;

F2 :=x : F (k)(x) 6= ∅, k = 1, 2, . . . , K

;

F0 := F1

⋂F2.

Now we make

Assumption 4.1.1. The matrix A and every matrix T (k) have full column rank.

Assumption 4.1.2. The set F0 is nonempty.

Assumption 4.1.3. For each x ∈ F0 and for k = 1, 2, . . . , K, Problem (4.1.4) has a

nonempty isolated compact set of minimizers.

77

Page 89: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

Assumption 4.1.1 is for convenience. Under Assumption 4.1.2, the set F1 is nonempty.

The logarithmic barrier [30] for F1 is the function l1 : F1 → R defined by

l1(x) := − ln det(s1(x)), ∀x ∈ F1,

and the volumetric barrier [30, 40] for F1 is the function v1 : F1 → R defined by

v1(x) :=1

2ln det(∇2

xxl1(x)), ∀x ∈ F1.

Also under Assumption 4.1.2, F2 is nonempty and for x ∈ F2, F (k)(x) is nonempty for k =

1, 2, . . . , K. The logarithmic barrier [30] for F (k)(x) is the function l(k)2 : F2×F (k)(x)→ R

defined by

l(k)2 (x,y(k)) := −ln det(s

(k)2 (x,y(k))), ∀y(k) ∈ F (k)(x), x ∈ F2,

and the volumetric barrier [30, 40] for F (k)(x) is the function v(k)2 : F2 × F (k)(x) → R

defined by

v(k)2 (x,y(k)) :=

1

2ln det(∇2

y(k)y(k)l(k)2 (x,y(k))), ∀y(k) ∈ F (k)(x), x ∈ F2.

Now, we define the volumetric barrier problem for the SSP (4.1.3, 4.1.4) as

min η(µ,x) : = cTx+K∑k=1

ρk(µ,x) + µc1v1(x), (4.1.5)

where for k = 1, 2, . . . , K and x ∈ F0, ρk(µ,x) is the minimum of the problem

min d(k)Ty(k) + µc2v(k)2 (x,y(k)). (4.1.6)

78

Page 90: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

Here c1 := 225√n1 and c2 := 450n3

2 are constants, and µ > 0 is the barrier parameter.

Now, we will show that (4.1.6) has a unique minimizer for each x ∈ F0 and for

k = 1, 2, . . . , K. For this purpose, we present the following theorem.

Theorem 4.1.1 (Fiacco and Mccormick [19, Theorem 8]). Consider the inequality con-

strained problem

min f(x)

s.t. gi(x) ≥ 0, i = 1, 2, . . . ,m,(4.1.7)

where the functions f, g1, . . . , gm : Rn → R are continuous. Let I be a scalar-valued

function of x with the following two properties: I(x) is continuous in the region R0 :=

x : gi(x) > 0, i = 1, 2, . . . ,m, which is assumed to be nonempty; if xk is any infinite

sequence of points in R0 converging to xB such that gi(xB) = 0 for at least one i, then

limk→∞ I(xk) = +∞. Let τ be a scalar-valued function of the single variable s with the

following two properties: if s1 > s2 > 0, then τ(s1) > τ(s2) > 0; if sk is an infinite

sequence of points such that limk→∞ sk = 0, then limk→∞ τ(sk) = 0. Let U : R0×R+ → R

be defined by U(x, s) := f(x)+τ(s)I(x). If (4.1.7) has a nonempty, isolated compact set of

local minimizers and sk is a strictly decreasing infinite sequence, then the unconstrained

local minimizers of U(·, sk) exist for sk small.

Lemma 4.1.1. If Assumptions 4.1.2 and 4.1.3 hold, then for each x ∈ F0 and k =

1, 2, . . . , K, the Problem (4.1.6) has a unique minimizer for µ small.

Proof. For any x ∈ F0, v(k)2 (x,y(k)) is defined on the nonempty set F (k)(x). By Theorem

1.3.1, there exist real numbers λ(k)1 , λ

(k)2 , · · · , λ(k)

r2 and a Jordan frame c(k)1 , c

(k)2 , · · · , c(k)

r2

such that s(k)2 (x,y(k)) = λ

(k)1 c

(k)1 + λ

(k)2 c

(k)2 + · · ·+ λ

(k)r2 c

(k)r2 , and λ

(k)1 , λ

(k)2 , · · · , λ(k)

r2 are the

eigenvalues of s(k)2 (x,y(k)). Moreover, λ

(k)j can be viewed as a function of y(k) ∈ F (k)(x)

for j = 1, 2, . . . , r2. Then λ(k)j is continuous for j = 1, 2, . . . , r2 and hence the constraint

s(k)2 (x,y(k)) KJ2 0 can be replaced by the constraints: λ

(k)j (y(k)) > 0, j = 1, 2, . . . , r2.

So (4.1.4) can be rewritten in the form of (4.1.7). Therefore, by Theorem 4.1.1, local

79

Page 91: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

minimizers of (4.1.6) exist for each x ∈ F0 and k = 1, 2, . . . K for µ small. The uniqueness

of the minimizer follows from the fact that v(k)2 is strictly convex. 2

In view of Lemma 4.1.1, Problem (4.1.5) is well-defined, and its feasible set is F0.

4.1.3 Computation of ∇xη(µ,x) and ∇2xxη(µ,x)

In order to compute the derivatives of η we need to determine the derivatives of ρk, k =

1, 2, . . . , K, which in turn require the derivatives of v(k)2 and l

(k)2 for k = 1, 2, . . . , K. We

will drop the superscript (k) when it does not lead to confusion.

Note that

∇xs1(x) = A, ∇xs2(x,y) = T, and ∇ys2(x,y) = W.

Hence

∇xl1(x) = −(∇xs1)Ts−11 = −ATs−1

1 , and ∇yl2(x,y) = −(∇ys2)Ts−12 = −WTs−1

2 .

This implies that

∇2xxl1(x) = ATds−1

1 e∇xs1 = ATds−11 eA,

and

H := ∇2yyl2(x,y) = WTds−1

2 e∇xs2 = WTds−12 eW.

We need the matrix calculus result for our computation.

Proposition 4.1.1. Let X ∈ Rn×n be nonsingular . Then

∂Xij

X−1 = −X−1eieTjX

−1,

for i, j = 1, 2, . . . , n. Here, ei is the ith vector in the standard basis for Rn.

80

Page 92: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

To compute the first partial derivatives of v2(x,y), we start by observing that

∂xk

⌈s−1

2

⌉=

∑ij

∂ ds2eij

⌈s−1

2

⌉ ∂

∂xkds2eij

=∑ij

∂ ds2eijds2e−1 ∂

∂xkds2eij

= −2∑ij

ds2e−1 eieTj ds2e−1

⌈s2,

∂xks2

⌉ij

= −2∑ij

ds2e−1 eieTj ds2e−1 ds2, tkeij

= −2⌈s−1

2

⌉ds2, tke

⌈s−1

2

⌉.

(4.1.8)

Then, for i = 1, 2, . . . ,m1, we have

∂xiv2(x,y) = 1

2

∂xiln det H

= 12H−1 • ∂

∂xiH

= 12H−1 • ∂

∂xi(WTds−1

2 eW )

= 12H−1 •WT

(∂

∂xids−1

2 e)W

= −H−1 •WT ds2e−1 ds2, tie ds2e−1W

= −WTH−1W • ds2e−1 ds2, tie ds2e−1

= −⌈s−1/22

⌉WTH−1W

⌈s−1/22

⌉•⌈s−1/22

⌉ds2, tie

⌈s−1/22

⌉By defining

P :=⌈s−1/22

⌉WTH−1W

⌈s−1/22

⌉,

which acts as the orthogonal projection onto the range of⌈s−1/22

⌉W , we get

∂xiv2(x,y) = −WTH−1W •

⌈s−1

2

⌉ds2, tie

⌈s−1

2

⌉= −P •

⌈s−1/22

⌉ds2, tie

⌈s−1/22

⌉,

(4.1.9)

for i = 1, 2, . . . ,m1.

81

Page 93: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

Similarly, we have

∂yiv2(x,y) = −WTH−1W •

⌈s−1

2

⌉ds2,wie

⌈s−1

2

⌉= −P •

⌈s−1/22

⌉ds2,wie

⌈s−1/22

⌉,

for i = 1, 2, . . . ,m2.

To compute the second partial derivatives of v2(x,y), we start by observing that

∂xkH−1 =

∑ij

∂Hij

H−1 ∂

∂xkHij

= −∑ij

(H−1eieTjH−1)

∂xk

[WT

⌈s−1

2

⌉W]ij

= −∑ij

(H−1eieTjH−1)[WT

(∂

∂xk

⌈s−1

2

⌉)W]ij

= 2∑ij

(H−1eieTjH−1)[WT ds2e−1 ds2, tke ds2e−1W

]ij

= 2H−1WT⌈s−1

2

⌉ds2, tke

⌈s−1

2

⌉WH−1,

(4.1.10)

and, using (4.1.8) and Lemma 1.3.2, that

∂xj

(⌈s−1

2

⌉ds2, tie

⌈s−1

2

⌉)=

(∂

∂xj

⌈s−1

2

⌉)ds2, tie

⌈s−1

2

⌉+⌈s−1

2

⌉( ∂

∂xjds2, tie

)⌈s−1

2

⌉+⌈s−1

2

⌉ds2, tie

(∂

∂xj

⌈s−1

2

⌉)= −2

⌈s−1

2

⌉ds2, tje

⌈s−1

2

⌉ds2, tie

⌈s−1

2

⌉+⌈s−1

2

⌉dtj, tie

⌈s−1

2

⌉−2

⌈s−1

2

⌉ds2, tie

⌈s−1

2

⌉ds2, tje

⌈s−1

2

⌉= −2

⌈s−1

2

⌉ds2, tje

⌈s−1

2

⌉ds2, tie

⌈s−1

2

⌉+⌈s−1

2

⌉ (dtj, tie − 2 ds2, tie

⌈s−1

2

⌉ds2, tje

) ⌈s−1

2

⌉.

(4.1.11)

By combining (4.1.9), (4.1.10) and (4.1.11), we get

∇2xxv2(y,x) =

∂2

∂x∂xv2(x,y) = 2Qxx +Rxx − 2Txx, (4.1.12)

82

Page 94: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

where

Qxxi,j = (WH−1WT) •

⌈s−1

2

⌉ds2, tje

⌈s−1

2

⌉ds2, tie

⌈s−1

2

⌉,

Rxxi,j = (WH−1WT) •

⌈s−1

2

⌉ (2 ds2, tie

⌈s−1

2

⌉ds2, tje − dtj, tie

) ⌈s−1

2

⌉,

Txxi,j = (WH−1WT) •

⌈s−1

2

⌉ds2, tje

⌈s−1

2

⌉WH−1W

⌈s−1

2

⌉ds2, tie

⌈s−1

2

⌉.

By following similar steps as above, we have that

∇2yyv2(y,x) =

∂2

∂y∂yv2(x,y) = 2Qyy +Ryy − 2T yy,

where

Qyyi,j = (WH−1WT) •

⌈s−1

2

⌉ds2,wje

⌈s−1

2

⌉ds2,wie

⌈s−1

2

⌉,

Ryyi,j = (WH−1WT) •

⌈s−1

2

⌉ (2 ds2,wie

⌈s−1

2

⌉ds2,wje − dwj,wie

) ⌈s−1

2

⌉,

T yyi,j = (WH−1WT) •

⌈s−1

2

⌉ds2,wje

⌈s−1

2

⌉WH−1W

⌈s−1

2

⌉ds2,wie

⌈s−1

2

⌉;

∇2xyv2(x,y) =

∂2

∂y∂xv2(x,y) = 2Qxy +Rxy − 2Txy,

where

Qxyi,j = (WH−1WT) •

⌈s−1

2

⌉ds2,wje

⌈s−1

2

⌉ds2, tie

⌈s−1

2

⌉,

Rxyi,j = (WH−1WT) •

⌈s−1

2

⌉ (2 ds2, tie

⌈s−1

2

⌉ds2,wje − dwj, tie

) ⌈s−1

2

⌉,

Txyi,j = (WH−1WT) •

⌈s−1

2

⌉ds2,wje

⌈s−1

2

⌉WH−1W

⌈s−1

2

⌉ds2, tie

⌈s−1

2

⌉;

and

∇2yxv2(y,x) =

∂2

∂x∂yv2(x,y) = 2Qyx +Ryx − 2T yx,

where

Qyyi,j = (WH−1WT) •

⌈s−1

2

⌉ds2, tje

⌈s−1

2

⌉ds2,wie

⌈s−1

2

⌉,

83

Page 95: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

Ryyi,j = (WH−1WT) •

⌈s−1

2

⌉ (2 ds2,wie

⌈s−1

2

⌉ds2, tje − dtj,wie

) ⌈s−1

2

⌉,

T yxi,j = (WH−1WT) •

⌈s−1

2

⌉ds2, tje

⌈s−1

2

⌉WH−1W

⌈s−1

2

⌉ds2,wie

⌈s−1

2

⌉.

Now, define ϕk : R+ ×F0 ×F (k)(x) −→ R by

ϕk(µ,x,y) := dTy + µc2v2(x,y).

By (4.1.6) we then have

ρk(µ,x) = miny∈F(k)(x)

ϕk(µ,x,y)

and

ρk(µ,x) = ϕk(µ,x,y)|y=y = ϕk(µ,x, y),

where y is the minimizer of (4.1.6). Observe that y is a function of x and is defined by

∇yϕk(µ,x,y)|y=y = 0. (4.1.13)

Note that, by (4.1.13), we have ∇yv2(x, y) = ∇yv2(x,y)|y=y = − 1µc2d. This implies that

∇2yxv2(x, y) = ∇x∇yv2(x, y) = 0 and ∇3

yxxv2(x, y) = ∇2xx∇yv2(x, y) = 0.

Now we are ready to calculate the first and second order derivatives of ρk with respect to

x. We have

∇xρk(µ,x) = [∇xϕk(µ,x,y) +∇yϕk(µ,x,y) ∇xy]|y=y

= ∇xϕk(µ,x,y)|y=y +∇yϕk(µ,x,y)|y=y ∇xy|y=y

= ∇xϕk(µ,x,y)|y=y

= µc2∇xv2(x,y)|y=y

84

Page 96: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

= µc2∇xv2(x, y),

∇2xxρk(µ,x) = µ∇x∇xv2(x, y)

= µc2∇2xxv2(x, y) +∇2

yxv2(x, y) [∇xy]|y=y

= µc2∇2xxv2(x, y),

∇3xxxρk(µ,x) = µ∇x∇2

xxv2(x, y

= µc2∇3xxxv2(x, y) +∇3

yxxv2(x, y) [∇xy]|y=y

= µc2∇3xxxv2(x, y).

In summary we have

∇xρk(µ,x) = µc2∇xv(k)2 (x, y(k)),

∇2xxρk(µ,x) = µc2∇2

xxv(k)2 (x, y(k)),

∇3xxxρk(µ,x) = µc2∇3

xxxv(k)2 (x, y(k)),

(4.1.14)

and

∇xη(µ,x) = c+ µc1∇xv1(x) +K∑k=1

µc2∇xv(k)2 (x, y(k)),

∇2xxη(µ,x) = µc1∇2

xxv1(x) +K∑k=1

µc2∇2xxv

(k)2 (x, y(k)),

(4.1.15)

where ∇xv(k)2 (x, y(k)), ∇2

xxv(k)2 (x, y(k)), and ∇3

xxxv(k)2 (x, y(k)) are calculated in (4.1.9),

(4.1.12), and (4.2.4) respectively.

4.2 Self-concordance properties of the volumetric bar-

rier recourse

In this section we prove that the recourse function with volumetric barrier is a strongly

self-concordant function leading to a strongly self-concordant family with appropriate

85

Page 97: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

parameters. Establishing this allows us to develop volumetric barrier polynomial time

path following interior point methods for solving SSPs.

We need the following proposition for proving some results.

Proposition 4.2.1. Let A,B,C ∈ Rn×n. Then

1. A,B 0 implies that A •B ≥ 0;

2. if A 0 and B C, then A •B ≥ A • C.

4.2.1 Self-Concordance of η(µ, ·)

This subsection is devoted to show that η(µ, ·) is a strongly self-concordant barrier on F0

(see Definition 3.2.1). Throughout this subsection, with respect to h ∈ Rm1 , we define

b := b(h) :=

m1∑i=1

hiti and b :=⌈s−1/22

⌉ds2, be

⌈s−1/22

⌉e2.

Our proof relies on the following lemmas.

Lemma 4.2.1. Let (x,y) be such that s2(x,y) KJ2 0. Then we have

0 Qxx ∇2xxv2(x,y). (4.2.1)

Proof. Let h ∈ Rm1 , h 6= 0. We have

hTQxxh =∑i,j

Qxxij hihj

= (WH−1WT) •∑i,j

( ⌈s−1

2

⌉ds2, tje

⌈s−1

2

⌉ds2, tie

⌈s−1

2

⌉hihj

)= (WH−1WT) •

⌈s−1

2

⌉ (∑i,j

ds2, hjtje⌈s−1

2

⌉ds2, hitie

) ⌈s−1

2

⌉= (WH−1WT) •

⌈s−1

2

⌉ [∑j

ds2, hjtje] ⌈s−1

2

⌉ [∑i

ds2, hitie] ⌈s−1

2

⌉= (WH−1WT) •

⌈s−1

2

⌉ds2, be

⌈s−1

2

⌉ds2, be

⌈s−1

2

86

Page 98: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

=(⌈s−1/22

⌉WH−1WT

⌈s−1/22

⌉)•(⌈s−1/22

⌉ds2, be

⌈s−1/22

⌉)2

= P •⌊b2⌋.

Similarly we have

hTRxxh = (WH−1WT) •(

2⌈s−1

2

⌉ds2, be

⌈s−1

2

⌉ds2, be

⌈s−1

2

⌉−⌈s−1

2

⌉dbe⌈s−1

2

⌉ )=

(⌈s−1/22

⌉WH−1WT

⌈s−1/22

⌉)•(

2(⌈s−1/22

⌉ds2, be

⌈s−1/22

⌉)2

−⌈s−1/22

⌉dbe⌈s−1/22

⌉)= P •

⌈b⌉,

and

hTTxxh = (WH−1WT) • ds2e−1 ds2, be⌈s−1

2

⌉WH−1W

⌈s−1

2

⌉ds2, be

⌈s−1

2

⌉= P •

⌈s−1/22

⌉ds2, be

⌈s−1/22

⌉P⌈s−1/22

⌉ds2, be

⌈s−1/22

⌉= P •

⌊b⌋P⌊b⌋.

Using Proposition 4.2.1 and observing that P 0 and⌊b2⌋ 0, we conclude that

Qxx 0.

Since P is a projection, we have that I − P 0 and therefore

⌊b⌋P⌊b⌋⌊b⌋2

=1

2

(⌊b2⌋

+⌈b⌉). (4.2.2)

This implies that

P •⌊b⌋P⌊b⌋≤ 1

2P •

(⌊b2⌋

+⌈b⌉),

which is exactly hTTxxh ≤ 1

2hT(Qxx +Rxx)h. Since h is arbitrary, we have shown that

Txx 1

2(Qxx +Rxx), which together with Qxx 0 establishes the result. 2

87

Page 99: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

Lemma 4.2.2. For any h ∈ Rm1, and (x,y) be such that s2(x,y) KJ2 0. Then

‖b‖ ≤√

2 r3/22 (hTQxxh)1/2. (4.2.3)

Proof. By Theorem 1.3.1, there exist real numbers λ1, λ2, · · · , λr2 and a Jordan frame

c1, c2, · · · , cr2 such that b = λ1c1 +λ2c2 + · · ·+λr2cr2 , and λ1, λ2, · · · , λr2 are the eigenval-

ues of b. Without loss of generality (scaling h as needed, and re-ordering indices), we may

assume that 1 = |λ1| ≥ |λ2| ≥ . . . ≥ λr2 . In view of Lemma 1.3.3, the matrix⌊b2⌋

has a

full set of orthonormal eigenvectors cij with corresponding eigenvalues (1/2)(λ2i +λ2

j), for

1 ≤ i ≤ j ≤ r2. It follows that

hTQxxh =1

2P

[r2∑

i,j=1

(λ2i + λ2

j) cijcTij

]

=1

2

r2∑i,j=1

(λ2i + λ2

j) cTijPcij.

Recall that P is a projection onto an m2-dimensional space. So, we can write P as

P =

m2∑l=1

uluTl ,

where u1,u2, . . . ,um2 are the orthonormal eigenvectors of P corresponding to the nonzero

eigenvalues of P . Consider uk for some k, we have

uk =

r2∑i,j=1

αijcij,

for some constants αij, for i, j = 1, 2, . . . , r2, and

1 = ‖uk‖ =∥∥∥ r2∑i,j=1

αijcij

∥∥∥ ≤ r2∑i,j=1

‖αijcij‖ =

r2∑i,j=1

|αij|.

88

Page 100: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

This means that there exist ik, jk such that

|αikjk | ≥1

r22

.

Thus

hTQxxh =1

2

∑i,j

(λ2i + λ2

j)cTij

(m2∑l=1

uluTl

)cij

=1

2

∑i,j

(λ2i + λ2

j)

m2∑l=1

cTijuluTl cij

=1

2

∑i,j

(λ2i + λ2

j)

m2∑l=1

‖uTl cij‖2

≥ 1

2

∑i,j

(λ2i + λ2

j)‖uTk cij‖2

=1

2

∑i,j

(λ2i + λ2

j)|αikjk |2

≥ 1

2

∑i,j

(λ2i + λ2

j)1

r42

≥ 1

2r42

∑j

(λ2i + λ2

j)

≥ 1

2r42

∑j

λ2i

=1

2r42

∑j

‖b‖22

=1

2r32

‖b‖22.

The result is established. 2

We will next compute the third partial derivative of v2(x,y) with respect to x. To

start, let (x,y) be such that s2(x,y) KJ2 0, and h ∈ Rn2 . We have

∂xihTQxxh = W

(∂

∂xiH−1

)WT •

⌈s−1

2

⌉ds2, be

⌈s−1

2

⌉ds2, be

⌈s−1

2

⌉+ (WH−1WT) • ∂

∂xj

(⌈s−1

2

⌉ds2, be

⌈s−1

2

⌉ds2, be

⌈s−1

2

⌉)= 2WH−1WT ds2e−1 ds2, tie ds2e−1WH−1WT •

(⌈s−1

2

⌉ds2, be

)2 ⌈s−1

2

⌉+ (WH−1WT) • ∂

∂xi

(⌈s−1

2

⌉ds2, be

⌈s−1

2

⌉ds2, be

⌈s−1

2

⌉),

89

Page 101: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

where

∂xi

(⌈s−1

2

⌉ds2, be

⌈s−1

2

⌉ds2, be

⌈s−1

2

⌉)= −2

⌈s−1

2

⌉ (ds2, tie

⌈s−1

2

⌉ds2, be

⌈s−1

2

⌉ds2, be

+ ds2, be⌈s−1

2

⌉ds2, tie

⌈s−1

2

⌉ds2, be

+ ds2, be⌈s−1

2

⌉ds2, be

⌈s−1

2

⌉ds2, tie

) ⌈s−1

2

⌉+⌈s−1

2

⌉ (dti, be

⌈s−1

2

⌉ds2, be

+ ds2, be⌈s−1

2

⌉dti, be

) ⌈s−1

2

⌉.

We conclude that the first directional derivative of hTQxxh with respect to x, in the

direction h, is given by

∇xhTQxxh [h] =

m1∑i=1

hi∂

∂xihTQxxh

= 2P •⌊b⌋P⌊b2⌋− 3P •

⌊b3⌋− P

⌈b2, b

⌉.

By following similar steps as above, we obtain

∇xhTRxxh [h] = 2P •

⌊b⌋P⌈b⌉− 4P •

⌈b2, b

⌉,

∇xhTTxxh [h] = 4P •

⌊b⌋P⌊b⌋P⌊b⌋− 4P •

⌊b⌋P⌊b2⌋− 2P •

⌊b⌋P⌈b⌉.

By combining the previous results, we get

∇3xxxv2(x,y) [h,h,h] = 12P •

⌊b⌋P⌊b2⌋− 6P •

⌊b3⌋− 6P •

⌈b2, b

⌉+6P •

⌊b⌋P⌈b⌉− 8P •

⌊b⌋P⌊b⌋P⌊b⌋.

(4.2.4)

We need the following lemma which bounds ∇3xxxv2(x,y) [h,h,h].

Lemma 4.2.3. For any h ∈ Rm1 and (x,y) be such that s2(x,y) KJ2 0. Then

|∇3xxxv2(x,y) [h,h,h] | ≤ 30‖b‖hTQxxh. (4.2.5)

90

Page 102: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

Proof. Note that ⌊b2⌋ ⌊b⌋

=1

2

(⌊b3⌋

+⌈b2, b

⌉).

Then (4.2.4) can be rewritten as

∇3xxxv2(x,y) [h,h,h] = P

⌊b⌋P •

(12⌊b2⌋

+ 6⌈b⌉− 8

⌊b⌋P⌊b⌋)− 12P •

⌊b2⌋ ⌊b⌋.

(4.2.6)

From (4.2.2) we have

12⌊b2⌋

+ 6⌈b⌉− 8

⌊b⌋P⌊b⌋ 8

⌊b2⌋

+ 2⌈b⌉.

By observing that that −⌊b2⌋⌈b⌉⌊b2⌋, we obtain

6⌊b2⌋ 12

⌊b2⌋

+ 6⌈b⌉− 8

⌊b⌋P⌊b⌋ 18

⌊b2⌋. (4.2.7)

Let λ1, λ2, . . . , λr2 be the eigenvalues of B. Then, for i, j = 1, 2, . . . , r2, the eigenvalues of⌊b⌋

are of the form (1/2)(λi + λj) (see Lemma 1.3.3). We then have

−‖b‖ I ⌊b⌋ ‖b‖ I

and hence

−‖b‖P P⌊b⌋P ‖b‖P. (4.2.8)

Using (4.2.7), (4.2.8), and the face that⌊b2⌋ 0, we get

∣∣P ⌊b⌋P • (12⌊b2⌋

+ 6⌈b⌉− 8

⌊b⌋P⌊b⌋)∣∣ ≤ 18||b‖P •

⌊b2⌋. (4.2.9)

In addition, since the elements b2 and b have the same eigenvectors, the matrices⌊b2⌋

91

Page 103: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

and⌊b⌋

also have the same eigenvectors. This implies that

−‖b‖⌊b2⌋⌊b2⌋ ⌊b⌋ ‖b‖

⌊b2⌋.

Thus we have that ∣∣P • ⌊b2⌋ ⌊b⌋∣∣ ≤ ‖b‖P • ⌊b2

⌋. (4.2.10)

The result follows from (4.2.6), (4.2.9) and (4.2.10). 2

We can now state the proof of Theorem 4.2.1.

Theorem 4.2.1. For any fixed µ > 0, ρk(µ, ·) is µ-self-concordant on F0, for k =

1, 2, . . . , K.

Proof. By combining the results of (4.2.1), (4.2.3) and (4.2.5), we get

∣∣∇3xxxv2(x, y) [h,h,h]

∣∣ ≤ 30√

2n3/22 (hT∇2

xxv2(x, y)h)3/2,

which combined with (4.1.14) implies that

|∇3xxxρk(y) [h,h,h]| ≤ 30

√2µc2r

3/22 (∇2

xxv2(x, y) [h,h])3/2

= 2µ−1/2(c2µ∇2xxv2(x,y) [h,h])3/2

= 2µ−1/2(∇2xxρk(x) [h,h])3/2.

The theorem is established. 2

Corollary 4.2.1. For any fixed µ > 0, η(µ, ·) is a µ-self-concordant function on F0.

Proof. It is easy to verify that µc1v1 is µ-self-concordant on F1. The corollary follows

from [30, Proposition 2.1.1]. 2

92

Page 104: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

4.2.2 Parameters of the self-concordant family

In this subsection, we show that the family of functions η(µ, ·) : µ > 0 is a strongly

self-concordant family with appropriate parameters (see definition 3.2.2).

The proof of self-concordancy of the family η(µ, ·) : µ > 0 relies on the following

two lemmas.

Lemma 4.2.4. For any µ > 0 and x ∈ F0, the following inequality holds:

|∇2xxη(µ,x)′[h,h]| ≤ 1

µ∇2

xxη(µ,x)[h,h], ∀h ∈ Rm1 .

Proof. Differentiating ∇2xxη(µ,x) in (4.1.15) with respect to µ, we obtain

∇2xxη(µ,x)′ = ∇2

xxv1(x) +K∑k=1

∇2xxv

(k)2 (x, y(k)) + µ∇3

xxyv(k)2 (x, y(k)) (y(k))′

= ∇2xxv1(x) +

K∑k=1

∇2xxv

(k)2 (x, y(k)) =

1

µ∇2

xxη(µ,x).

The result immediately follows by observing that ∇2xxη(µ,x) 0, and therefore, for

any h ∈ Rn, we have that 1µ∇2

xxη(µ,x)[h,h] ≥ 0. 2

For fixed (x, y) with s2(x, y) KJ2 0, let

ti =⌈s−1/22

⌉ds2, tie

⌈s−1/22

⌉e2 and wj =

⌈s−1/22

⌉ds2,wje

⌈s−1/22

⌉e2,

for i = 1, 2, . . . ,m1 and j = 1, 2, . . . ,m2. We can apply a Gram-Schmidt procedure to

bwic and obtain buic with ‖ buic ‖ = 1 for all i and ui • uj = 0, i 6= j. Then the

linear span of buic , i = 1, 2, . . . ,m2 is equal to the span of bwic , i = 1, 2, . . . ,m2.

Let U = [u1; u2; . . . ; um2 ] ∈ Rn22×m2 and u =

∑m2

k=1 ui. It follows that P = UUT. We

93

Page 105: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

then have∂v2(x, y)

∂xi= −P • btic

= −UUT • btic

= −trace(U bticU)

= −m2∑k=1

uk • (ti uk)

= −m2∑k=1

ti • u2k

= −ti •m2∑k=1

u2k

= −ti • u,

(4.2.11)

and

Qxxi,j = P • btic btjc

= trace(U btic btjcU)

=

m2∑k=1

(uk • (ti (tj uk)))

=

m2∑k=1

ti • (u2k tj)

= ti •

((m2∑k=1

u2k

) tj

)= ti • (u tj).

(4.2.12)

Lemma 4.2.5. Let (x, y) be such that s2(x, y) KJ2 0. Then

∇xv2(x, y)T∇2xxv2(x, y)−1∇xv2(x, y) ≤ m2. (4.2.13)

Proof. Let T = [t1; t2; . . . ; tm1 ] ∈ Rn22×m1 . From (4.2.12) we have that

Qxxi,j = ti • (u tj) = ti • due tj,

94

Page 106: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

and hence

Qxx = TT due T .

From (4.2.11), we also have

∇xv2(x, y)T = −TTu.

Thus

∇xv2(x, y)T(Qxx)−1∇xv2(x, y) = u • T (TT due T )−1TTu

= u1/2 •⌈u1/2

⌉T (TT due T )−1TT

⌈u1/2

⌉u1/2

≤ u1/2 • u1/2

= trace(u)

= m2,

where the last equality follows from the fact that u =∑m2

k=1 u2k, and that trace(u2

k) = uk •

uk = 1 for each k. In addition, Qxx ∇2xxv2(x, y) implies ∇2

xxv2(x, y)−1 (Qxx)−1.

This completes the proof. 2

Lemma 4.2.6. For any µ > 0 and x ∈ F0, we have

|∇xη′(µ,x)T[h]| ≤

√(m1c1 +m2c2)(1 +K)

µ∇2

xxη(µ,x)[h,h], ∀h ∈ Rm1 .

Proof. Differentiating ∇xη(µ,x) in (4.1.15) with respect to µ, we obtain

∇xη′(µ,x) = c1∇xv1(x) +

K∑k=1

c1∇xv(k)2 (x, y(k)) + µ∇2

xyv(k)2 (x, y(k)) · (y(k))′

= c1∇xv1(x) +K∑k=1

c2∇xv(k)2 (x, y(k)) .

95

Page 107: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

In Lemma 4.2.13, we have shown that

∇xv(k)2 (x, y(k))T∇2

xxv(k)2 (x, y(k))−1∇xv

(k)2 (x, y(k)) ≤ m2,

which is equivalent to

|∇xv(k)2 (x, y(k))[h]| ≤

√m2∇2

xxv(k)2 (x, y(k))[h,h], ∀h ∈ Rm2 . (4.2.14)

Similarly, we can show that

∇xv1(x)T∇2xxv1(x)−1∇xv1(x) ≤ m1,

which is equivalent to

|∇xv1(x)[h]| ≤√m1∇2

xxv1(x)[h,h], ∀h ∈ Rm1 . (4.2.15)

Then, using (4.2.14) and (4.2.15), we have that for all h ∈ Rm1

|∇xη′(µ,x)[h]| =

∣∣∣∣∣(c1∇xv1(x) +

K∑k=1

c2∇xv(k)2 (x, y(k))

)[h]

∣∣∣∣∣≤ |c1∇xv1(x)[h]|+

K∑k=1

|c2∇xv(k)2 (x, y(k))[h]|

≤√m1c2

1∇2xxv1(x)[h,h] +

K∑k=1

√m2c2

2∇2xxv

(k)2 (x, y(k))[h,h]

≤√

(m1c1)c1∇2xxv1(x)[h,h] +

K∑k=1

√(m2c2)c2∇2

xxv(k)2 (x, y(k))[h,h]

√√√√(m1c1 +m2c2)(1 +K)

(c1∇2

xxv1(x)[h,h] +K∑k=1

c2∇2xxv

(k)2 (x, y(k))[h,h]

)

=

√(m1c1 +m2c2)(1 +K)

µ∇2

xxη(µ,x)[h,h].

96

Page 108: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

The result is established. 2

Theorem 4.2.2. The family η(µ, ·) : µ > 0 is a strongly self-concordant family with

the following parameters

α1(µ) = µ, α2(µ) = α3(µ) = 1, α4(µ) =

√(1 +K)(m1c1 +m2c2)

µ, α5(µ) =

1

µ.

Proof. It is direct to see that condition (i) of Definition 3.2.2 holds. Corollary 4.2.1

shows that condition (ii) is satisfied and Lemmas 4.2.4 and 4.2.6 show that condition (iii)

is satisfied. 2

4.3 A class of volumetric barrier algorithms for solv-

ing SSPs

In §5 we have established that the parametric functions η(µ, ·) is a strongly self-concordant

family. Therefore, it becomes straightforward to develop primal path following interior

point algorithms for solving SSP (4.1.3, 4.1.4). In this section we introduce a class of

volumetric barrier algorithms for solving this problem. This class is stated formally in

Algorithm 2.

Our algorithm is initialized with a starting point x0 ∈ F0 and a starting value µ0 > 0

for the barrier parameter µ, and is indexed by a parameter γ ∈ (0, 1). We use δ as a

measure of the proximity of the current point x to the central path, and β as a threshold

for that measure. If the current x is too far away from the central path in the sense that

δ > β, Newton’s method is applied to find a point close to the central path. Then the

value of µ is reduced by a factor γ and the whole precess is repeated until the value of

µ is within the tolerance ε. By tracing the central path as µ approaches zero, a strictly

97

Page 109: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

Algorithm 2 Volumetric Barrier Algorithm for Solving SSP (4.1.3,4.1.4)

Require: ε > 0, γ ∈ (0, 1), θ > 0, β > 0, x0 ∈ F0 and µ0 > 0.x := x0, µ := µ0

while µ ≥ ε dofor k = 1, 2, . . . , K do

solve (4.1.6) to obtain y(k)

end forcompute ∆x := −∇2

xxη(µ,x)−1∇xη(µ,x) using (4.1.15)

compute δ(µ,x) :=

√− 1

µ∆xT∇2

xxη(µ,x)∆x using (4.1.15)

while δ > β dox := x+ θ∆xfor k = 1, 2, . . . , K do

solve (4.1.6) to obtain y(k)

end forcompute ∆x := −∇2

xxη(µ,x)−1∇xη(µ,x) using (4.1.15)

compute δ(µ,x) :=

√− 1

µ∆xT∇2

xxη(µ,x)∆x using (4.1.15)

end whileµ := γµ

end while

feasible ε-solution to (4.1.6) will be generated.

4.4 Complexity analysis

Theorems 4.4.1 and 4.4.2 present the complexity analysis for two variants of algorithms:

short-step algorithms and long-step algorithms, which are controlled by the manner of

selection γ that we have made in Algorithm 2.

In the short-step version of the algorithm, we decrease the barrier parameter in each

iteration by a factor γ := 1−σ/√

(1 +K)(m1c1 +m2c2), σ < 0.1. The kth iteration of the

short-step algorithms is performed as follows: at the beginning of the iteration, we have

µ(k−1) and x(k−1) on hand and x(k−1) is close to the center path, i.e., δ(µ(k−1),x(k−1)) ≤ β.

After the barrier parameter µ is reduced from µ(k−1) to µk := γµ(k−1), we have that

δ(µk,x(k−1)) ≤ 2β. Then a full Newton step with size θ = 1 is taken to produce a new

98

Page 110: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

point xk with δ(µk,xk) ≤ β. We now show that, in this class of algorithms, only one

Newton step is sufficient for recentering after updating the parameter µ. We have the

following theorem.

Theorem 4.4.1. Consider Algorithm 2 and let µ0 be the initial barrier parameter, ε > 0

the stopping criterion, and β = (2 −√

3)/2. If the starting point x0 is sufficiently close

to the central path, i.e., δ(µ0,x0) ≤ β, then the short-step algorithm reduces the barrier

parameter µ at a linear rate and terminates with at most O(√

(1 +K)(m1c1 +m2c2)

ln(µ0/ε)) iterations.

In the long-step version of the algorithm, the barrier parameter is decreased by an

arbitrary constant factor γ ∈ (0, 1). It has a potential for larger decrease on the objective

function value, however, several damped Newton steps might be needed for restoring the

proximity to the central path. The kth iteration of the long-step algorithms is performed

as follows: at the beginning of the iteration we have a point x(k−1), which is sufficiently

close to x(µ(k−1)), where x(µ(k−1)) is the solution to (4.1.5) for µ := µ(k−1). The barrier

parameter is reduced from µ(k−1) to µk := γµ(k−1), where γ ∈ (0, 1), and then searching

is started to find a point xk that is sufficiently close to x(µk). The long-step algorithm

generates a finite sequence consisting of N points in F0, and we finally take xk to be

equal to the last point of this sequence. We want to determine an upper bound on N , the

number of Newton iterations that are needed to find the point xk. We have the following

theorem.

Theorem 4.4.2. Consider Algorithm 2 and let µ0 be the initial barrier parameter, ε > 0

the stopping criterion, and β = 1/6. If the starting point x0 is sufficiently close to the

central path, i.e., δ(µ0,x0) ≤ β, then the long-step algorithm reduces the barrier parameter

µ at a linear rate and terminates with at most O((1+K)(m1c1+m2c2) ln(µ0/ε)) iterations.

The proofs of the Theorems 4.4.1 and 4.4.1 are similar to the proofs of Theorems 3.4.1

99

Page 111: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

and 3.4.2 and are given in the subsections below. For convenience, we restate Proposition

3.4.1(i) and Lemma 3.4.1.

Proposition 4.4.1. For any µ > 0 and x ∈ F0, we denote ∆x := −∇2xxη(µ,x)−1

∇xη(µ,x) and δ :=√

1µ∇2

xxη(µ,x)[∆x,∆x]. Then for δ < 1, τ ∈ [0, 1] and any h ∈ Rm2

we have

∇2xxη(µ,x+ τ∆x)[h,h] ≤ (1− τδ)−2∇2

xxη(µ,x)[h,h].

Lemma 4.4.1. For any µ > 0 and x ∈ F0, let ∆x be the Newton direction defined

by ∆x := −∇2xxη(µ,x)−1∇xη(µ,x); δ := δ(µ,x) =

√1µ∇2

xxη(µ,x)[∆x,∆x], x+ =

x+ ∆x, ∆x+ be the Newton direction calculated at x+, and δ(µ,x+) :=√1µ∇2

xxη(µ,x+)[∆x+,∆x+]. Then the following relations hold:

(i) If δ < 2−√

3, then δ(µ,x+) ≤(

δ

1− δ

)2

≤ δ

2.

(ii) If δ ≥ 2−√

3, then η(µ,x)− η(µ,x+ θ∆x) ≥ µ(δ− ln(1 + δ)), where θ = (1 + δ)−1.

4.4.1 Complexity for short-step algorithm

For the purpose of proving Theorems 4.4.1, we present the following proposition which is

a restatement of [30, Theorem 3.1.1].

Proposition 4.4.2. Let χκ(η;µ, µ+) :=

(1 +

√(1+K)(m1c1+m2c2)

κ

)lnγ−1. Assume that

δ(µ,x) < κ and µ+ := γµ satisfies

χκ(η;µ, µ+) ≤ 1− δ(µ,x)

κ.

Then δ(µ+,x) < κ.

Lemma 4.4.2. Let µ+ = γµ, where γ = 1 − σ/√

(1 +K)(m1c1 +m2c2) and σ ≤ 0.1,

and let β = (2−√

3)/2. If δ(µ,x) ≤ β, then δ(µ+,x) ≤ 2β.

100

Page 112: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

Proof. Let κ := 2β = 2 −√

3. Since δ(µ,x) ≤ κ/2, one can verify that for σ ≤ 0.1, µ+

satisfies

χκ(η;µ, µ+) ≤ 1

2≤ 1− δ(µ,x)

κ.

By Proposition 4.4.2, we have δ(µ+,x) ≤ κ. 2

By Lemmas 4.4.1(i) and 4.4.2, we conclude that we can reduce the parameter µ by

the factor γ := 1 − σ/√

(1 +K)(m1c1 +m2c2), σ < 0.1, at each iteration, and that only

one Newton step is sufficient to restore proximity to the central path. Hence, Theorem

4.4.1 follows.

4.4.2 Complexity for long-step algorithm

For x ∈ F0 and µ > 0, we define the function φ(µ,x) := η(µ,x(µ)) − η(µ,x), which

represents the difference between the objective value η(µk,x(k)) at the end of kth iteration

and the minimum objective value η(µk,x(µk−1)) at the beginning of kth iteration. Then

the task is to find an upper bound on φ(µ,x). To do so, we first give upper bounds on

φ(µ,x) and φ′(µ,x) respectively. We have the following lemma.

Lemma 4.4.3. Let µ > 0 and x ∈ F0, we denote ∆x := x− x(µ) and define

δ := δ(µ,x) =

√1

µ∇2

xxη(µ,x)[∆x, ∆x].

For any µ > 0 and x ∈ F0, if δ < 1, then the following inequalities hold:

φ(µ,x) ≤ µ

1− δ+ ln(1− δ)

), (4.4.1)

|φ′(µ,x)| ≤ −√

(1 +K)(m1c1 +m2c2) ln(1− δ). (4.4.2)

101

Page 113: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

Proof.

φ(µ,x) := η(µ,x)− η(µ,x(µ)) :=

∫ 1

0

∇xη(µ,x+ (1− τ)∆x)[∆x]dτ.

Since x(µ) is the optimal solution, we have

∇xη(µ,x(µ)) = 0. (4.4.3)

Hence, with the aid of Proposition 4.4.1, we get

φ(µ,x) =

∫ 1

0

∫ 1

0

∇2xxη(µ,x(µ) + (1− α)∆x)[∆x, ∆x] dαdτ

≤∫ 1

0

∫ 1

0

∇2xxη(µ,x)[∆x, ∆x]

(1− δ + αδ)2dαdτ

=

∫ 1

0

∫ 0

1

µδ2

(1− δ + αδ)2dαdτ

= µ

1− δ+ ln(1− δ)

),

which establishes (4.4.1).

Now, for any µ > 0, by applying the chain rule, using (4.4.3), and applying the

Mean-Value Theorem, we get

φ′(µ,x) = η′(µ,x)− η′(µ,x(µ))−∇xη(µ,x(µ))Tx′(µ)

= η′(µ,x)− η′(µ,x(µ))

= ∇xη(µ,x(µ) +$∆x)T∆x,

(4.4.4)

for some $ ∈ (0, 1). Hence

|φ′(µ,x)| =

∣∣∣∣∫ 1

0

∇xη(µ,x(µ) + τ∆x)′[∆x] dτ

∣∣∣∣

102

Page 114: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

≤∫ 1

0

√∇2

xxη(µ,x(µ) + τ∆x)[∆x, ∆x]√∇xη(µ,x(µ) + τ∆x)′ • (∇2

xxη(µ,x(µ) + τ∆x)−1∇xη(µ,x(µ) + τ∆x)′) dτ.

In view of Lemma 4.2.6 we have the following estimation

−∇xη(µ,x)′ • (∇2xxη(µ,x)−1∇xη(µ,x)′) ≤ (1 +K)(m1c1 +m2c2)

µ. (4.4.5)

Then, by using (4.4.5), Proposition 4.4.1, and the observation x(µ)+τ∆x = x−(1−τ)∆x,

we obtain

|φ′(µ,x)| ≤∫ 1

0

√∇2

xxη(µ,x)[∆x, ∆x]

1− (1− τ)δ

√(1 +K)(m1c1 +m2c2)

µdτ

=

∫ 1

0

δ√µ

1− δ + τ δ

√(1 +K)(m1c1 +m2c2)

µdτ

=√

(1 +K)(m1c1 +m2c2)

∫ 1

0

δ

1− δ + τ δdτ

= −√

(1 +K)(m1c1 +m2c2) ln(1− δ),

which establishes (4.4.2). 2

Lemma 4.4.4. Let µ > 0 and x ∈ F0 be such that δ < 1, where δ is as defined in Lemma

4.4.3. Let µ+ := γµ with γ ∈ (0, 1). Then

η(µ+,x)− η(µ+,x(µ+)) ≤ O(1)[(1 +K)(m1c1 +m2c2)]µ+.

Proof. Differentiating (4.4.4) with respect to µ, we get

φ′′(µ,x) = η′′(µ,x)− η′′(µ,x(µ))− ∇xη(µ,x(µ))′Tx′(µ). (4.4.6)

We will work on the right-hand side of (4.4.6) by bounding the second and the last

terms. Observe that η′′(µ,x(µ)) =∑K

k=1 ρ(k)′′(µ,x). Differentiating ρ(k)(µ,x) with re-

103

Page 115: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

spect to µ we obtain ρ(k)′(µ,x) = c2v2(x, y). Now, differentiating ρ(k)′(µ,x) with respect

to µ we obtain ρ(k)′′(µ,x) = c2∇yv2(x, y)Ty′. Then, differentiating (4.1.13) yields

y′ = − 1

µ∇2

yyv2(x, y)−1∇yv2(x, y).

Therefore we have

ρ′′k(µ,x) = − 1

µc2∇yv2(x, y)T∇2

yyv2(x, y)−1∇yv2(x, y).

It can be shown (see also [7, Theorem 4.4]) that −ρ′′k(µ, y) ≤ 1

µc2m2. Thus

−η′′(µ,x(µ)) = −K∑k=1

ρ(k)′′(µ,x) ≤ m2c2K

µr2

. (4.4.7)

By differentiating (4.4.3) with respect to µ, we get

∇xη(µ,x(µ))′ +∇2xxη(µ,x(µ))x′(µ) = 0,

or equivalently,

x′(µ) = −∇2xxη(µ,x(µ))−1∇xη

′(µ,x(µ)).

Hence, by using (4.4.5), we have

−∇xη(µ,x(µ))′Tx′(µ) = ∇xη(µ,x(µ))′T∇2xxη(µ,x(µ))−1∇xη(µ,x(µ))′

≤ (1 +K)(m1c1 +m2c2)

µ.

(4.4.8)

Observe that η(µ,x) is concave in µ, it follows that η′′(µ,x) ≤ 0. Combining this with

(4.4.7) and (4.4.8), we obtain

104

Page 116: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

φ′′(µ,x(µ)) ≤ m(1 +K)(m1c1 + 2m2c2)

µ. (4.4.9)

Applying the Mean-Value Theorem and using Lemma 4.4.3 and (4.4.9) to get

φ(µ+,x) = φ(µ,x) + φ′(µ,x)(µ+ − µ) +

∫ µ

µ+

∫ µ

τ

φ′′(υ,x) dυdτ

≤ µ

1− δ+ ln(1− δ)

)−√

(1 +K)(m1c1 +m2c2) ln(1− δ) (µ+ − µ)

+m(1 +K)(m1c1 + 2m2c2)

∫ µ

µ+

∫ τ

µ

υ−1 dυdτ

= µ

1− δ+ ln(1− δ)

)−√

(1 +K)(m1c1 +m2c2) ln(1− δ) (µ+ − µ)

+m(1 +K)(m1c1 + 2m2c2) (µ− µ+) ln τµ

≤ µ

1− δ+ ln(1− δ)

)−√

(1 +K)(m1c1 +m2c2) ln(1− δ) (µ− µ+)

+m(1 +K)(m1c1 + 2m2c2) (µ− µ+) lnγ−1.

(Recall that γ−1 = µ+/µ ≥ τ/µ.)

Since δ and γ are constants, the lemma is established. 2

Note that the previous lemma requires δ < 1. However, evaluating δ explicitly may

not be possible. In the next lemma we will see that δ is actually proportional to δ, which

can be evaluated.

Lemma 4.4.5. For any µ > 0 and x ∈ F0, let ∆x := −∇2xxη(µ,x)−1∇xη(µ,x) and

∆x := x− x(µ). We denote

δ := δ(µ,x) =

√− 1

µ∇2

xxη(µ,x)[∆x,∆x] and δ := δ(µ,x) =

√− 1

µ∇2

xxη(µ,x)[∆x, ∆x].

If δ < 1/6, then 23δ ≤ δ ≤ 2δ.

Proof. See the proof of Lemma 3.4.5. 2

Theorem 4.4.2 follows by combining Lemmas 4.4.1(ii), 4.4.4, and 4.4.5.

105

Page 117: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

Chapter 5

Some Applications

In this chapter of the dissertation, we turn our attention to four applications of SSOCPs and SRQCPs. Namely, we describe the stochastic Euclidean facility location problem and the portfolio optimization problem with loss risk constraints as two applications of SSOCPs; we then describe the optimal covering random ellipsoid problem and an application in structural optimization as two applications of SRQCPs. We also refer the reader to a paper by Maggioni et al. [24], which describes another important application of SRQCPs in mobile ad hoc networks. The results of this chapter have been submitted for publication [2].

5.1 Two applications of SSOCPs

Each of the following two subsections is devoted to describing an application of SSOCPs.

5.1.1 Stochastic Euclidean facility location problem

In facility location problems (FLPs) we are interested in choosing a location to build a new facility, or locations to build multiple new facilities, so that an appropriate measure of distance from the new facilities to existing facilities is minimized. FLPs arise in locating airports, regional campuses, wireless communication towers, etc.

The following are two ways of classifying FLPs (see also [39]). We can classify FLPs based on the number of new facilities, in the following sense: if we add only one new facility, then we get a problem known as a single facility location problem (SFLP), while if we add multiple new facilities instead of only one, then we get a more general problem known as a multiple facility location problem (MFLP). Another way of classification is based on the distance measure used in the model between the facilities. If we use the Euclidean distance, then these problems are called Euclidean facility location problems (EFLPs); if we use the rectilinear distance, then these problems are called rectilinear facility location problems (RFLPs).

In the (deterministic) Euclidean single facility location problem (ESFLP), we are given $r$ existing facilities represented by the fixed points $a_1, a_2, \ldots, a_r$ in $\mathbb{R}^n$, and we plan to place a new facility represented by $x$ so that we minimize the weighted sum of the distances between $x$ and each of the points $a_1, a_2, \ldots, a_r$. This leads us to the problem
$$\min\ \sum_{i=1}^{r} w_i\,\|x - a_i\|_2,$$
or, alternatively, to the problem (see, for example, [42])
$$\begin{array}{ll}
\min & \sum_{i=1}^{r} w_i\, t_i\\
\text{s.t.} & (t_1;\, x-a_1;\, \cdots;\, t_r;\, x-a_r) \succeq 0,
\end{array}$$
where $\succeq$ denotes the partial order induced by the product of $r$ second-order cones, each of dimension $n+1$, and $w_i$ is the weight associated with the ith existing facility and the new facility for $i = 1, 2, \ldots, r$. The resulting model is a DSOCP model, so there is no stochasticity in it. But what if we assume that some of the fixed existing facilities are random? Where could such randomness be found in the real world?

Figure 5.1: The multi-national military facility locations before and after the start of the Iraq war. Pictures taken from www.globalsecurity.org/military/facility/centcom.htm.

Picture (A) in Figure 5.1 shows the locations of multi-national military facilities led by troops from the United States and the United Kingdom in the Middle East region as of December 31, 2002, i.e., before the beginning of the Iraq war (it is known that the war began on March 20, 2003). This picture shows the multi-national military facilities, including the navy facility locations and the air force facility locations, but not the army facility locations. Assume that the random existing facilities are the Iraqi military facilities, whose locations were unknown at the time picture (A) was taken, before the beginning of the war. Assume also that the new facilities are the multi-national army facilities, whose locations had to be determined at that same time. Then it is reasonable to view a problem of this kind as a stochastic ESFLP. Picture (B) in Figure 5.1 shows the locations of all the multi-national military facilities, including the army facility locations, after the start of the Iraq war.

So, in some applications, the locations of existing facilities cannot be fully specified, because the locations of some of them depend on information that is not available at the time when the decision needs to be made, but that will only become available at a later point in time. In general, in order to be precise, only the latest information about the random facilities is used. This may require increasing or decreasing the number of new facilities after the latest information about the random facilities becomes available. In this case, we are interested in stochastic facility location problems (abbreviated as stochastic FLPs). When the locations of all old facilities are fully specified, FLPs are called deterministic facility location problems (abbreviated as deterministic FLPs).

In this section we consider (both single and multiple) stochastic Euclidean facility location problems, and in the next chapter we describe four different models of FLPs, two of which can be viewed as generalizations of the models presented in this section.

Stochastic ESFLPs

Let $a_1, a_2, \ldots, a_{r_1}$ be fixed points in $\mathbb{R}^n$ representing the coordinates of $r_1$ existing fixed facilities, and let $a_1(\omega), a_2(\omega), \ldots, a_{r_2}(\omega)$ be random points in $\mathbb{R}^n$ representing the coordinates of $r_2$ random facilities whose realizations depend on an underlying outcome $\omega$ in an event space $\Omega$ with a known probability function $P$.

Suppose that at present we do not know the realizations of the $r_2$ random facilities, and that at some point in time in the future the realizations of these $r_2$ random facilities become known.

Our goal is to locate a new facility $x$ that minimizes the weighted sum of the Euclidean distances between the new facility and each of the existing fixed facilities, and that also minimizes the expected weighted sum of the distances between the new facility and the realization of each of the random facilities. Note that this decision needs to be made before the realizations of the $r_2$ random facilities become available. This leads us to the following SSOCP model:
$$\begin{array}{ll}
\min & \sum_{i=1}^{r_1} w_i\, t_i + E[Q(x,\omega)]\\
\text{s.t.} & (t_1;\, x-a_1;\, \ldots;\, t_{r_1};\, x-a_{r_1}) \succeq 0,
\end{array}$$
where $Q(x,\omega)$ is the minimum value of the problem
$$\begin{array}{ll}
\min & \sum_{j=1}^{r_2} w_j(\omega)\, t_j\\
\text{s.t.} & (t_1;\, x-a_1(\omega);\, \ldots;\, t_{r_2};\, x-a_{r_2}(\omega)) \succeq 0,
\end{array}$$
and
$$E[Q(x,\omega)] := \int_\Omega Q(x,\omega)\,P(d\omega).$$
Here each $\succeq$ constraint is with respect to the product of second-order cones of dimension $n+1$ (with $r_1$ and $r_2$ factors, respectively), $w_i$ is the weight associated with the ith existing fixed facility and the new facility for $i = 1, 2, \ldots, r_1$, and $w_j(\omega)$ is the weight associated with the jth random existing facility and the new facility for $j = 1, 2, \ldots, r_2$.
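For readers who wish to experiment with this model, the following is a minimal sketch in which the event space $\Omega$ is replaced by $S$ equally likely sampled realizations of the random facilities, so that the expectation becomes a sample average; the data and the sampling scheme are illustrative assumptions, and the epigraph constraints $t_i \ge \|x - a_i\|_2$ are the scalarized form of the second-order cone constraints above.

```python
import cvxpy as cp
import numpy as np

# Scenario-based sketch of the stochastic ESFLP above (all data illustrative).
rng = np.random.default_rng(1)
n, r1, r2, S = 2, 3, 2, 10
a = rng.normal(size=(r1, n))            # fixed facilities a_1, ..., a_{r1}
a_rand = rng.normal(size=(S, r2, n))    # sampled random facilities
w = np.ones(r1)                         # weights w_i
w_rand = np.ones(r2)                    # weights w_j(omega), constant here

x = cp.Variable(n)
t = cp.Variable(r1)
u = cp.Variable((S, r2))                # second-stage epigraph variables
constraints = [t[i] >= cp.norm(x - a[i], 2) for i in range(r1)]
constraints += [u[s, j] >= cp.norm(x - a_rand[s, j], 2)
                for s in range(S) for j in range(r2)]
objective = w @ t + sum(w_rand @ u[s] for s in range(S)) / S
cp.Problem(cp.Minimize(objective), constraints).solve()
print("new facility location:", x.value)
```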

Stochastic EMFLPs

Assume now that we need to add $m$ new facilities, namely $x_1, x_2, \ldots, x_m \in \mathbb{R}^n$, instead of adding only one. This may require increasing or decreasing the number of new facilities after the latest information about the random facilities becomes available. For simplicity, let us assume that the number of new facilities is known in advance and fixed.

Then we have two cases, depending on whether or not there is interaction among the new facilities in the underlying model. If there is no interaction between the new facilities, we are concerned only with minimizing the weighted sums of the distances between each of the new facilities on one hand, and each of the fixed facilities and the realization of each of the random facilities on the other hand. In other words, we solve the following SSOCP model:
$$\begin{array}{ll}
\min & \sum_{j=1}^{m}\sum_{i=1}^{r_1} w_{ij}\, t_{ij} + E[Q(x_1;\ldots;x_m,\omega)]\\
\text{s.t.} & (t_{1j};\, x_j-a_1;\, \ldots;\, t_{r_1 j};\, x_j-a_{r_1}) \succeq 0, \quad j = 1, 2, \ldots, m,
\end{array}$$
where $Q(x_1;\ldots;x_m,\omega)$ is the minimum value of the problem
$$\begin{array}{ll}
\min & \sum_{j=1}^{m}\sum_{i=1}^{r_2} w_{ij}(\omega)\, t_{ij}\\
\text{s.t.} & (t_{1j};\, x_j-a_1(\omega);\, \ldots;\, t_{r_2 j};\, x_j-a_{r_2}(\omega)) \succeq 0, \quad j = 1, 2, \ldots, m,
\end{array}$$
and
$$E[Q(x_1;\ldots;x_m,\omega)] := \int_\Omega Q(x_1;\ldots;x_m,\omega)\,P(d\omega),$$
where $w_{ij}$ is the weight associated with the ith existing fixed facility and the jth new facility for $j = 1, 2, \ldots, m$ and $i = 1, 2, \ldots, r_1$, and $w_{ij}(\omega)$ is the weight associated with the ith random existing facility and the jth new facility for $j = 1, 2, \ldots, m$ and $i = 1, 2, \ldots, r_2$.

If interaction exists among the new facilities then, in addition to the above requirements, we need to minimize the sum of the Euclidean distances between each pair of the new facilities. In this case, we are interested in a model of the form
$$\begin{array}{ll}
\min & \sum_{j=1}^{m}\sum_{i=1}^{r_1} w_{ij}\, t_{ij} + \sum_{j=2}^{m}\sum_{j'=1}^{j-1} w_{jj'}\, t_{jj'} + E[Q(x_1;\ldots;x_m,\omega)]\\
\text{s.t.} & (t_{1j};\, x_j-a_1;\, \ldots;\, t_{r_1 j};\, x_j-a_{r_1}) \succeq 0, \quad j = 1, 2, \ldots, m\\
& (t_{j(j+1)};\, x_j-x_{j+1};\, \ldots;\, t_{jm};\, x_j-x_m) \succeq 0, \quad j = 1, 2, \ldots, m-1,
\end{array}$$
where $Q(x_1;\ldots;x_m,\omega)$ is the minimum value of the problem
$$\begin{array}{ll}
\min & \sum_{j=1}^{m}\sum_{i=1}^{r_2} w_{ij}(\omega)\, t_{ij} + \sum_{j=2}^{m}\sum_{j'=1}^{j-1} w_{jj'}\, t_{jj'}\\
\text{s.t.} & (t_{1j};\, x_j-a_1(\omega);\, \ldots;\, t_{r_2 j};\, x_j-a_{r_2}(\omega)) \succeq 0, \quad j = 1, 2, \ldots, m\\
& (t_{j(j+1)};\, x_j-x_{j+1};\, \ldots;\, t_{jm};\, x_j-x_m) \succeq 0, \quad j = 1, 2, \ldots, m-1,
\end{array}$$
and
$$E[Q(x_1;\ldots;x_m,\omega)] := \int_\Omega Q(x_1;\ldots;x_m,\omega)\,P(d\omega),$$
where $w_{jj'}$ is the weight associated with the new facilities $j'$ and $j$ for $j' = 1, 2, \ldots, j-1$ and $j = 2, 3, \ldots, m$.

5.1.2 Portfolio optimization with loss risk constraints

The application in this subsection is a well-known problem from portfolio optimization. We consider the problem of maximizing the expected return subject to loss risk constraints. The one-period version of this problem was cited as an application of DSOCPs (see Lobo et al. [23]). Some extensions of this problem will also be described.

The problem

Consider a portfolio problem with $n$ assets or stocks over two periods. We start by letting $x_i$ denote the amount of asset $i$ held at the beginning of (and throughout) the first period, and letting $p_i$ denote the price change of asset $i$ over this period. So, the vector $p \in \mathbb{R}^n$ is the price vector over the first period. For simplicity, we let $p$ be Gaussian with known mean $\bar p$ and covariance $\Sigma$, so the return over this period is the (scalar) Gaussian random variable $r = p^{\mathsf T} x$ with mean $\bar r = \bar p^{\mathsf T} x$ and variance $\sigma = x^{\mathsf T}\Sigma\, x$, where $x = (x_1; x_2; \ldots; x_n)$.

Let $y_i$ denote the amount of asset $i$ held at the beginning of (and throughout) the second period, and let $q_i(\omega)$ denote the random price change of asset $i$ over this period, whose realization depends on an underlying outcome $\omega$ in an event space $\Omega$ with a known probability function $P$. Similarly, we take the price vector $q(\omega) \in \mathbb{R}^n$ over the second period to be Gaussian with mean $\bar q(\omega)$ and covariance $\Sigma(\omega)$, so the return over this period is the (scalar) Gaussian random variable $\tilde r(\omega) = q(\omega)^{\mathsf T} y$ with mean $\bar{\tilde r}(\omega) = \bar q(\omega)^{\mathsf T} y$ and variance $\tilde\sigma(\omega) = y^{\mathsf T}\Sigma(\omega)\, y$, where $y = (y_1; y_2; \ldots; y_n)$. Suppose that in the first period we do not know the realization of the Gaussian vector $q(\omega) \in \mathbb{R}^n$, and that at some point in time in the future (after the first period) the realization of this Gaussian vector becomes known. As pointed out in [23], the choices of the portfolio variables $x$ and $y$ involve the classical Markowitz tradeoff between the mean and the variance of the random return.

The optimization variables are the portfolio vectors $x \in \mathbb{R}^n$ (of the first stage) and $y \in \mathbb{R}^n$ (of the second stage). For these portfolio vectors, we make the simplest assumptions: $x \ge 0$, $y \ge 0$ (i.e., no short positions [23]) and $\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i = 1$ (i.e., unit total budget [23]).

Let $\alpha$ and $\tilde\alpha$ be given unwanted return levels over the first and second periods, respectively, and let $\beta$ and $\tilde\beta$ be given maximum probabilities over the first and second periods, respectively. Assuming the above data are given, our goal is to determine the amount of asset $i$ held ($x_i$ over the first period and $y_i$ over the second period), i.e., to determine $x$ and $y$, such that the expected returns over these two periods are maximized subject to the following two types of loss risk constraints: the constraint $P(r \le \alpha) \le \beta$ must be satisfied over the first period, and the constraint $P(\tilde r(\omega) \le \tilde\alpha) \le \tilde\beta$ must be satisfied over the second period. This determination needs to be made before the realizations become available.

Formulation of the model

As noted in [23], the constraint $P(r \le \alpha) \le \beta$ is equivalent to the second-order cone constraint
$$(\alpha - \bar r;\ \Phi^{-1}(\beta)(\Sigma^{1/2}x)) \preceq 0,$$
provided $\beta \le 1/2$ (i.e., $\Phi^{-1}(\beta) \le 0$), where
$$\Phi(z) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z} e^{-t^2/2}\,dt$$
is the cumulative distribution function of a zero-mean, unit-variance Gaussian random variable. To prove this (see also [23]), notice that the constraint $P(r \le \alpha) \le \beta$ can be written as
$$P\left(\frac{r - \bar r}{\sqrt{\sigma}} \le \frac{\alpha - \bar r}{\sqrt{\sigma}}\right) \le \beta.$$
Since $(r - \bar r)/\sqrt{\sigma}$ is a zero-mean, unit-variance Gaussian random variable, the probability above is simply $\Phi((\alpha - \bar r)/\sqrt{\sigma})$; thus the constraint $P(r \le \alpha) \le \beta$ can be expressed as $\Phi((\alpha - \bar r)/\sqrt{\sigma}) \le \beta$, or $(\alpha - \bar r)/\sqrt{\sigma} \le \Phi^{-1}(\beta)$, or equivalently $\bar r + \Phi^{-1}(\beta)\sqrt{\sigma} \ge \alpha$. Since
$$\sqrt{\sigma} = \sqrt{x^{\mathsf T}\Sigma\, x} = \sqrt{(\Sigma^{1/2}x)^{\mathsf T}(\Sigma^{1/2}x)} = \|\Sigma^{1/2}x\|_2,$$
the constraint $P(r \le \alpha) \le \beta$ is equivalent to the second-order cone inequality $\bar r + \Phi^{-1}(\beta)\|\Sigma^{1/2}x\|_2 \ge \alpha$, or equivalently to the second-order cone constraint
$$(\alpha - \bar r;\ \Phi^{-1}(\beta)(\Sigma^{1/2}x)) \preceq 0.$$

Similarly, provided $\tilde\beta \le 1/2$, the constraint $P(\tilde r(\omega) \le \tilde\alpha) \le \tilde\beta$ is equivalent to the second-order cone constraint
$$(\tilde\alpha - \bar{\tilde r}(\omega);\ \Phi^{-1}(\tilde\beta)(\Sigma(\omega)^{1/2}y)) \preceq 0.$$
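The equivalence just derived is easy to sanity-check numerically. The snippet below (a sketch on illustrative random data, not part of the dissertation) compares the probabilistic form $P(r \le \alpha) \le \beta$ with the second-order cone form $\bar r - \alpha \ge -\Phi^{-1}(\beta)\,\|\Sigma^{1/2}x\|_2$; the two tests always agree.

```python
import numpy as np
from scipy.stats import norm

# Numerical check (illustrative data): for Gaussian r = p^T x, the test
# P(r <= alpha) <= beta agrees with rbar - alpha >= -Phi^{-1}(beta)*||Sigma^(1/2) x||_2.
rng = np.random.default_rng(0)
n = 4
L = rng.normal(size=(n, n))
Sigma = L @ L.T                          # a positive definite covariance
pbar = rng.normal(size=n)
x = rng.dirichlet(np.ones(n))            # a feasible portfolio: 1^T x = 1, x >= 0

alpha, beta = -0.5, 0.2                  # beta <= 1/2, so Phi^{-1}(beta) <= 0
rbar = pbar @ x
std = np.sqrt(x @ Sigma @ x)             # sqrt(sigma) = ||Sigma^(1/2) x||_2

prob_test = norm.cdf((alpha - rbar) / std) <= beta
soc_test = rbar - alpha >= -norm.ppf(beta) * std
print(prob_test, soc_test)               # the two tests agree
```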

Our goal is to determine the amount of asset $i$ held ($x_i$ over the first period and $y_i$ over the second period), i.e., to determine $x$ and $y$, such that the expected returns over these two periods are maximized. This problem can be cast as the two-stage SSOCP
$$\begin{array}{ll}
\max & \bar p^{\mathsf T} x + E[Q(x,\omega)]\\
\text{s.t.} & (\alpha - \bar p^{\mathsf T}x;\ \Phi^{-1}(\beta)(\Sigma^{1/2}x)) \preceq 0\\
& \mathbf{1}^{\mathsf T}x = 1,\ x \ge 0,
\end{array} \qquad (5.1.1)$$
where $Q(x,\omega)$ is the maximum value of the problem
$$\begin{array}{ll}
\max & \bar q(\omega)^{\mathsf T} y\\
\text{s.t.} & (\tilde\alpha - \bar q(\omega)^{\mathsf T}y;\ \Phi^{-1}(\tilde\beta)(\Sigma(\omega)^{1/2}y)) \preceq 0\\
& \mathbf{1}^{\mathsf T}y = 1,\ y \ge 0,
\end{array} \qquad (5.1.2)$$
and
$$E[Q(x,\omega)] := \int_\Omega Q(x,\omega)\,P(d\omega).$$

Note that this formulation is different from the one suggested by Mehrotra and Ozevin [27], which minimizes the variance (a two-stage extension of Markowitz's mean-variance model). Here we formulate a model with a linear objective function, leading to another approach for solving this problem.
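As a concrete companion to (5.1.1)-(5.1.2), the following is a minimal sketch in which the second stage is discretized into $S$ equally likely scenarios; the data, the common levels $\alpha$ and $\beta$ used for both periods, and the sample-average expectation are all illustrative assumptions.

```python
import cvxpy as cp
import numpy as np
from scipy.stats import norm

# Scenario-based sketch of (5.1.1)-(5.1.2) (all data illustrative).
rng = np.random.default_rng(2)
n, S = 5, 8
pbar = rng.normal(0.05, 0.02, size=n)
L = rng.normal(size=(n, n))
Sigma = L @ L.T * 1e-3
qbar = rng.normal(0.05, 0.02, size=(S, n))
Sigma_s = [Sigma + 1e-4 * np.eye(n) for _ in range(S)]
alpha, beta = -0.02, 0.05
kappa = -norm.ppf(beta)                  # = |Phi^{-1}(beta)| since beta <= 1/2

Lc = np.linalg.cholesky(Sigma)           # Sigma = Lc Lc^T, so ||Lc^T x||_2 = sqrt(x^T Sigma x)
x = cp.Variable(n)
y = cp.Variable((S, n))
cons = [cp.sum(x) == 1, x >= 0,
        pbar @ x - alpha >= kappa * cp.norm(Lc.T @ x, 2)]
for s in range(S):
    Ls = np.linalg.cholesky(Sigma_s[s])
    cons += [cp.sum(y[s]) == 1, y[s] >= 0,
             qbar[s] @ y[s] - alpha >= kappa * cp.norm(Ls.T @ y[s], 2)]
objective = pbar @ x + sum(qbar[s] @ y[s] for s in range(S)) / S
cp.Problem(cp.Maximize(objective), cons).solve()
print("first-stage portfolio:", x.value)
```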

Extensions

The simple problem described above has many extensions [23]. One of these extensions is imposing several loss risk constraints, i.e., requiring the constraints $P(r \le \alpha_i) \le \beta_i$, $i = 1, 2, \ldots, k_1$ (where $\beta_i \le 1/2$ for $i = 1, 2, \ldots, k_1$), or equivalently
$$(\alpha_i - \bar r;\ \Phi^{-1}(\beta_i)(\Sigma^{1/2}x)) \preceq 0, \quad i = 1, 2, \ldots, k_1,$$
to be satisfied over the first period, and the constraints $P(\tilde r(\omega) \le \tilde\alpha_j) \le \tilde\beta_j$, $j = 1, 2, \ldots, k_2$ (where $\tilde\beta_j \le 1/2$ for $j = 1, 2, \ldots, k_2$), or equivalently
$$(\tilde\alpha_j - \bar{\tilde r}(\omega);\ \Phi^{-1}(\tilde\beta_j)(\Sigma(\omega)^{1/2}y)) \preceq 0, \quad j = 1, 2, \ldots, k_2,$$
to be satisfied over the second period. So our problem becomes
$$\begin{array}{ll}
\max & \bar p^{\mathsf T} x + E[Q(x,\omega)]\\
\text{s.t.} & (\alpha_i - \bar r;\ \Phi^{-1}(\beta_i)(\Sigma^{1/2}x)) \preceq 0, \quad i = 1, 2, \ldots, k_1\\
& \mathbf{1}^{\mathsf T}x = 1,\ x \ge 0,
\end{array}$$
where $Q(x,\omega)$ is the maximum value of the problem
$$\begin{array}{ll}
\max & \bar q(\omega)^{\mathsf T} y\\
\text{s.t.} & (\tilde\alpha_j - \bar{\tilde r}(\omega);\ \Phi^{-1}(\tilde\beta_j)(\Sigma(\omega)^{1/2}y)) \preceq 0, \quad j = 1, 2, \ldots, k_2\\
& \mathbf{1}^{\mathsf T}y = 1,\ y \ge 0,
\end{array}$$
and
$$E[Q(x,\omega)] := \int_\Omega Q(x,\omega)\,P(d\omega).$$

As another extension, observe that the statistical models $(\bar p; \Sigma)$ for the price changes during the first period and $(\bar q(\omega); \Sigma(\omega))$ for the price changes during the second period are both uncertain (regardless of the fact that the latter depends on $\omega$ while the former does not), and one of the limitations of the model (5.1.1)-(5.1.2) is its need to estimate these statistical models. In [11], Bawa et al. indicated that using estimates of unknown expected returns and unknown covariance matrices leads to an estimation risk in portfolio choice. To handle this uncertainty, as in [11], we take expected returns and covariance matrices that belong to bounded intervals; i.e., we consider statistical models $(\bar p; \Sigma)$ and $(\bar q(\omega); \Sigma(\omega))$ that belong to the bounded sets
$$\aleph := \{(\bar p; \Sigma) : \bar p_l \le \bar p \le \bar p_u,\ \Sigma_l \le \Sigma \le \Sigma_u\}$$
and
$$\Re := \{(\bar q(\omega); \Sigma(\omega)) : \bar q_l(\omega) \le \bar q(\omega) \le \bar q_u(\omega),\ \Sigma_l(\omega) \le \Sigma(\omega) \le \Sigma_u(\omega)\},$$
respectively, where $\bar p_l, \bar p_u, \bar q_l(\omega), \bar q_u(\omega), \Sigma_l, \Sigma_u, \Sigma_l(\omega)$ and $\Sigma_u(\omega)$ are the extreme values of the intervals mentioned above. We then take a worst-case realization of the statistical models $(\bar p; \Sigma)$ and $(\bar q(\omega); \Sigma(\omega))$ by maximizing the minimum of the expected returns over all $(\bar p; \Sigma) \in \aleph$ and $(\bar q(\omega); \Sigma(\omega)) \in \Re$.

Let us consider here only a discrete version of this development. Suppose we have $N_1$ different possible scenarios over the first period, each of which is modeled by a simple Gaussian model for the price change vector $p_i$ with mean $\bar p_i$ and covariance $\Sigma_i$. Similarly, we have $N_2$ different possible scenarios over the second period, each of which is modeled by a simple Gaussian model for the price change vector $q_j(\omega)$ with mean $\bar q_j(\omega)$ and covariance $\Sigma_j(\omega)$. We can then take a worst-case realization of this information by maximizing the minimum of the expected returns over these different scenarios, subject to a constraint on the loss risk for each scenario, to get the following SSOCP model:
$$\begin{array}{ll}
\max & \bar p^{\mathsf T} x + E[Q(x,\omega)]\\
\text{s.t.} & (\alpha - \bar p_i^{\mathsf T}x;\ \Phi^{-1}(\beta)(\Sigma_i^{1/2}x)) \preceq 0, \quad i = 1, 2, \ldots, N_1\\
& \mathbf{1}^{\mathsf T}x = 1,\ x \ge 0,
\end{array}$$
where $Q(x,\omega)$ is the maximum value of the problem
$$\begin{array}{ll}
\max & \bar q(\omega)^{\mathsf T} y\\
\text{s.t.} & (\tilde\alpha - \bar q_j(\omega)^{\mathsf T}y;\ \Phi^{-1}(\tilde\beta)(\Sigma_j(\omega)^{1/2}y)) \preceq 0, \quad j = 1, 2, \ldots, N_2\\
& \mathbf{1}^{\mathsf T}y = 1,\ y \ge 0,
\end{array}$$
and
$$E[Q(x,\omega)] := \int_\Omega Q(x,\omega)\,P(d\omega).$$


5.2 Two applications of SRQCPs

The stochastic versions of the two problems in this section were described and formulated by Ariyawansa and Zhu [49] as SSDP models, but it has been found recently [1, 23, 24] that, on numerical grounds, solving DSOCPs (SSOCPs) or DRQCPs (SRQCPs) by treating them as DSDPs (SSDPs) is inefficient, and that DSOCPs (SSOCPs) or DRQCPs (SRQCPs) can be solved more efficiently by exploiting their structure. (The acronym DRQCP stands for deterministic rotated quadratic cone program.) In fact, in [1, 23] we see that many problems first formulated as DSDPs in [38, 41, 30] have been reformulated as DSOCPs or DRQCPs mainly for this reason. In view of this fact, in this section we formulate SRQCP models (as these problems should be solved as such) for the minimum-volume covering random ellipsoid problem and the structural optimization problem.

5.2.1 Optimal covering random ellipsoid problem

The problem

The problem description in this part is taken from Ariyawansa and Zhu [9, Subsection 3.2]. Suppose we have $n_1$ fixed ellipsoids
$$\mathcal{E}_i := \{x \in \mathbb{R}^n : x^{\mathsf T}H_i x + 2g_i^{\mathsf T}x + v_i \le 0\} \subset \mathbb{R}^n, \quad i = 1, 2, \ldots, n_1,$$
where $H_i \in \mathcal{S}^n_+$, $g_i \in \mathbb{R}^n$ and $v_i \in \mathbb{R}$ are deterministic data for $i = 1, 2, \ldots, n_1$.

Suppose we also have $n_2$ random ellipsoids
$$\mathcal{E}_i(\omega) := \{x \in \mathbb{R}^n : x^{\mathsf T}H_i(\omega)\, x + 2g_i(\omega)^{\mathsf T}x + v_i(\omega) \le 0\} \subset \mathbb{R}^n, \quad i = 1, 2, \ldots, n_2,$$
where, for $i = 1, 2, \ldots, n_2$, $H_i(\omega) \in \mathcal{S}^n_+$, $g_i(\omega) \in \mathbb{R}^n$ and $v_i(\omega) \in \mathbb{R}$ are random data whose realizations depend on an underlying outcome $\omega$ in an event space $\Omega$ with a known probability function $P$.


We assume that, at the present time, we do not know the realizations of the $n_2$ random ellipsoids, and that at some point in the future the realizations of these $n_2$ ellipsoids become known.

Given this, we need to determine a ball that contains all $n_1$ fixed ellipsoids and the realizations of the $n_2$ random ellipsoids. This decision needs to be made before the realizations of the random ellipsoids become available. Consequently, when the realizations of the random ellipsoids do become available, the ball that has already been determined may or may not contain all the realized random ellipsoids. In order to guarantee that the modified ball contains all (fixed and realized random) ellipsoids, we assume that at that stage we are allowed to change the radius of the ball (but not its center), if necessary.

We consider the same assumptions as in [49]. We assume that the cost of choosing the ball has three components:

• the cost of the center, which is proportional to the Euclidean distance to the center from the origin;

• the cost of the initial radius, which is proportional to the square of the radius;

• the cost of changing the radius.

The center and the radius of the initial ball are to be determined so that the expected total cost is minimized.

In [49], Ariyawansa and Zhu describe the following concrete version of this generic application. Let $n := 2$. The fixed ellipsoids contain targets that need to be destroyed, and the random ellipsoids contain targets that also need to be destroyed but are moving. Fighter aircraft take off from the origin with a planned disk of coverage that contains the fixed ellipsoids. In order to be accurate, only the latest information about the moving targets is used. This may require increasing the radius of the planned disk of coverage after the latest information about the random ellipsoids becomes available, which may occur after the initially planned fighter aircraft have taken off. This increase, dependent on the initial disk of coverage and the specific information about the moving targets, may result in an additional cost. The initial disk of coverage needs to be determined so that the expected total cost is minimized.

Our first goal is to determine $\bar x \in \mathbb{R}^n$ and $\gamma \in \mathbb{R}$ such that the ball $B$ defined by
$$B := \{x \in \mathbb{R}^n : x^{\mathsf T}x - 2\bar x^{\mathsf T}x + \gamma \le 0\}$$
contains the fixed ellipsoids $\mathcal{E}_i$ for $i = 1, 2, \ldots, n_1$. As we mentioned, this determination needs to be made before the realizations of the random ellipsoids become available. When the realizations of the random ellipsoids become available, if necessary, we need to determine $\bar\gamma$ so that the new ball
$$\bar B := \{x \in \mathbb{R}^n : x^{\mathsf T}x - 2\bar x^{\mathsf T}x + \bar\gamma \le 0\}$$
contains all the realizations of the random ellipsoids $\mathcal{E}_i(\omega)$ for $i = 1, 2, \ldots, n_2$.

Notice that the center of the ball $B$ is $\bar x$, its radius is $r := \sqrt{\bar x^{\mathsf T}\bar x - \gamma}$, and the distance from the origin to its center is $\sqrt{\bar x^{\mathsf T}\bar x}$. Notice also that the new ball $\bar B$ has the same center $\bar x$ as $B$ but a larger radius $\bar r := \sqrt{\bar x^{\mathsf T}\bar x - \bar\gamma}$.

Formulation of the model

We introduce the constraints $d_1^2 \ge \bar x^{\mathsf T}\bar x$ and $d_2 \ge r^2 = \bar x^{\mathsf T}\bar x - \gamma$. That is, $d_1$ is an upper bound on the distance $\sqrt{\bar x^{\mathsf T}\bar x}$ between the center of the ball $B$ and the origin, and $d_2$ is an upper bound on the square of the radius of the ball $B$.

In order to proceed, we recall the following fact.

Fact 1 (Sun and Freund [37]). Suppose that we are given two ellipsoids $\mathcal{E}_i \subset \mathbb{R}^n$, $i = 1, 2$, defined by $\mathcal{E}_i := \{x \in \mathbb{R}^n : x^{\mathsf T}H_i x + 2g_i^{\mathsf T}x + v_i \le 0\}$, where $H_i \in \mathcal{S}^n_+$, $g_i \in \mathbb{R}^n$ and $v_i \in \mathbb{R}$ for $i = 1, 2$. Then $\mathcal{E}_1$ contains $\mathcal{E}_2$ if and only if there exists $\tau \ge 0$ such that the linear matrix inequality
$$\begin{pmatrix} H_1 & g_1\\ g_1^{\mathsf T} & v_1 \end{pmatrix} \preceq \tau \begin{pmatrix} H_2 & g_2\\ g_2^{\mathsf T} & v_2 \end{pmatrix}$$
holds.
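Fact 1 can be checked on a simple instance by posing the LMI as a small semidefinite feasibility problem. The sketch below is illustrative only: the unit ball $\mathcal{E}_2$ is contained in the ball $\mathcal{E}_1$ of radius $R = 2$, so the LMI is feasible for some $\tau \ge 0$; with $R < 1$ the same program reports infeasibility.

```python
import cvxpy as cp
import numpy as np

# Sanity check of Fact 1 (illustrative data): containment of the unit ball
# E2 in the ball E1 of radius R is equivalent to feasibility of the LMI.
n, R = 3, 2.0
H1, g1, v1 = np.eye(n), np.zeros(n), -R**2      # E1: ||x||_2 <= R
H2, g2, v2 = np.eye(n), np.zeros(n), -1.0       # E2: ||x||_2 <= 1

def blk(H, g, v):
    # assemble the (n+1) x (n+1) block matrix [H g; g^T v]
    return np.block([[H, g.reshape(-1, 1)], [g.reshape(1, -1), [[v]]]])

tau = cp.Variable(nonneg=True)
prob = cp.Problem(cp.Minimize(0),
                  [tau * blk(H2, g2, v2) - blk(H1, g1, v1) >> 0])
prob.solve()
print(prob.status)   # 'optimal' since R >= 1; infeasible if R < 1
```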

In view of Fact 1 and the requirement that the ball $B$ contain the fixed ellipsoids $\mathcal{E}_i$ for $i = 1, 2, \ldots, n_1$, and that $\bar B$ contain the realizations of the random ellipsoids $\mathcal{E}_i(\omega)$ for $i = 1, 2, \ldots, n_2$, we accordingly add the following constraints:
$$\begin{pmatrix} I & -\bar x\\ -\bar x^{\mathsf T} & \gamma \end{pmatrix} \preceq \tau_i \begin{pmatrix} H_i & g_i\\ g_i^{\mathsf T} & v_i \end{pmatrix}, \quad i = 1, 2, \ldots, n_1,$$
and
$$\begin{pmatrix} I & -\bar x\\ -\bar x^{\mathsf T} & \bar\gamma \end{pmatrix} \preceq \delta_i \begin{pmatrix} H_i(\omega) & g_i(\omega)\\ g_i(\omega)^{\mathsf T} & v_i(\omega) \end{pmatrix}, \quad i = 1, 2, \ldots, n_2,$$
or equivalently,
$$M_i \succeq 0,\ \ i = 1, 2, \ldots, n_1, \quad\text{and}\quad \bar M_i(\omega) \succeq 0,\ \ i = 1, 2, \ldots, n_2,$$
where, for each $i = 1, \ldots, n_1$,
$$M_i := \begin{pmatrix} \tau_i H_i - I & \tau_i g_i + \bar x\\ (\tau_i g_i + \bar x)^{\mathsf T} & \tau_i v_i - \gamma \end{pmatrix},$$
and, for each $i = 1, \ldots, n_2$,
$$\bar M_i(\omega) := \begin{pmatrix} \delta_i H_i(\omega) - I & \delta_i g_i(\omega) + \bar x\\ (\delta_i g_i(\omega) + \bar x)^{\mathsf T} & \delta_i v_i(\omega) - \bar\gamma \end{pmatrix}.$$


Since we are looking to minimize $d_2$, where $d_2$ is an upper bound on the square of the radius of the ball $B$, we can write the constraint $d_2 \ge \bar x^{\mathsf T}\bar x - \gamma$ as $d_2 = \bar x^{\mathsf T}\bar x - \gamma$. The matrix $M_i$ can then be written as
$$M_i = \begin{pmatrix} \tau_i H_i - I & \tau_i g_i + \bar x\\ (\tau_i g_i + \bar x)^{\mathsf T} & \tau_i v_i + d_2 - \bar x^{\mathsf T}\bar x \end{pmatrix}.$$

Now, let $H_i := \Xi_i \Lambda_i \Xi_i^{\mathsf T}$ be the spectral decomposition of $H_i$, where $\Lambda_i := \operatorname{diag}(\lambda_{i1}; \ldots; \lambda_{in})$, and let $u_i := \Xi_i^{\mathsf T}(\tau_i g_i + \bar x)$. Then, for each $i = 1, \ldots, n_1$, we have
$$\tilde M_i := \begin{pmatrix} \Xi_i^{\mathsf T} & 0\\ 0^{\mathsf T} & 1 \end{pmatrix} M_i \begin{pmatrix} \Xi_i & 0\\ 0^{\mathsf T} & 1 \end{pmatrix} = \begin{pmatrix} \tau_i\Lambda_i - I & u_i\\ u_i^{\mathsf T} & \tau_i v_i + d_2 - \bar x^{\mathsf T}\bar x \end{pmatrix}.$$
Consequently, $M_i \succeq 0$ if and only if $\tilde M_i \succeq 0$, for each $i = 1, 2, \ldots, n_1$. Our formulation of the problem as an SRQCP now depends on the following lemma (see also [23]).

Lemma 5.2.1. For each $i = 1, 2, \ldots, n_1$, $\tilde M_i \succeq 0$ if and only if $\tau_i\lambda_{\min}(H_i) \ge 1$ and $\bar x^{\mathsf T}\bar x \le d_2 + \tau_i v_i - \mathbf{1}^{\mathsf T}s_i$, where $s_i = (s_{i1}; \ldots; s_{in})$, $s_{ij} = u_{ij}^2/(\tau_i\lambda_{ij} - 1)$ for all $j$ such that $\tau_i\lambda_{ij} > 1$, and $s_{ij} = 0$ otherwise.

Proof. For each $i = 1, 2, \ldots, n_1$, it is known that $\tilde M_i \succeq 0$ if and only if every principal minor of $\tilde M_i$ is nonnegative. Since
$$\det(\tau_i\Lambda_i - I) = \begin{vmatrix} \tau_i\lambda_{i1}-1 & 0 & \cdots & 0\\ 0 & \tau_i\lambda_{i2}-1 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \tau_i\lambda_{in}-1 \end{vmatrix} = \prod_{j=1}^{n}(\tau_i\lambda_{ij}-1),$$
it follows that $\tilde M_i \succeq 0$ if and only if $\prod_{j=1}^{s}(\tau_i\lambda_{ij}-1) \ge 0$ for all $s = 1, \ldots, n$ and $\det(\tilde M_i) \ge 0$. Thus, $\tilde M_i \succeq 0$ if and only if $\tau_i\lambda_{ij} \ge 1$ for all $j = 1, \ldots, n$ and $\det(\tilde M_i) \ge 0$. Notice that
$$\det(\tilde M_i) = \left(\prod_{j=1}^{n}(\tau_i\lambda_{ij}-1)\right)\left((\tau_i v_i + d_2 - \bar x^{\mathsf T}\bar x) - \sum_{j:\,\tau_i\lambda_{ij}>1}\frac{u_{ij}^2}{\tau_i\lambda_{ij}-1}\right),$$
where $u_{ij}$ must vanish for every $j$ with $\tau_i\lambda_{ij} = 1$ (otherwise $\det(\tilde M_i) < 0$). Hence, $\det(\tilde M_i) \ge 0$ if and only if
$$(\tau_i v_i + d_2 - \bar x^{\mathsf T}\bar x) - \mathbf{1}^{\mathsf T}s_i = (\tau_i v_i + d_2 - \bar x^{\mathsf T}\bar x) - \sum_{\tau_i\lambda_{ij}>1}\frac{u_{ij}^2}{\tau_i\lambda_{ij}-1} \ge 0.$$
Therefore, $\tilde M_i \succeq 0$ if and only if $\tau_i\lambda_{\min}(H_i) \ge 1$ and $d_2 \ge \bar x^{\mathsf T}\bar x - \tau_i v_i + \mathbf{1}^{\mathsf T}s_i$. □

So far, we have shown that each constraint $M_i \succeq 0$ can be replaced by the constraints $\tau_i\lambda_{\min}(H_i) \ge 1$, $\bar x^{\mathsf T}\bar x \le \sigma_1$ and $\sigma_1 \le d_2 + \tau_i v_i - \mathbf{1}^{\mathsf T}s_i$.
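Lemma 5.2.1 can also be spot-checked numerically. The snippet below is an illustrative aside (random data, not part of the dissertation) comparing the eigenvalue test on $\tilde M_i$ with the lemma's conditions; the data are drawn so that $\tau_i\lambda_{ij} > 1$ for all $j$.

```python
import numpy as np

# Numerical spot check of Lemma 5.2.1 (illustrative data).
rng = np.random.default_rng(3)
n = 4
lam = rng.uniform(0.5, 2.0, size=n)       # eigenvalues lambda_ij of H_i
tau = 1.2 / lam.min()                     # so that tau * lambda_ij > 1 for all j
u = rng.normal(size=n)
v, d2, xx = -1.0, 5.0, rng.uniform(0.0, 3.0)   # v_i, d_2 and xbar^T xbar

M = np.zeros((n + 1, n + 1))              # the matrix M_i~
M[:n, :n] = np.diag(tau * lam - 1.0)
M[:n, n] = M[n, :n] = u
M[n, n] = tau * v + d2 - xx

s = u**2 / (tau * lam - 1.0)              # s_ij (all tau*lambda_ij > 1 here)
lemma_test = tau * lam.min() >= 1.0 and xx <= d2 + tau * v - s.sum()
psd_test = np.linalg.eigvalsh(M).min() >= -1e-9
print(lemma_test, psd_test)               # the two tests agree
```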

Similarly, for each $i = 1, 2, \ldots, n_2$, letting $H_i(\omega) := \bar\Xi_i\bar\Lambda_i\bar\Xi_i^{\mathsf T}$ be the spectral decomposition of $H_i(\omega)$, $\bar u_i := \bar\Xi_i^{\mathsf T}(\delta_i g_i(\omega) + \bar x)$, and $\bar s_i := (\bar s_{i1}; \bar s_{i2}; \ldots; \bar s_{in})$, where $\bar s_{ij} := \bar u_{ij}^2/(\delta_i\bar\lambda_{ij} - 1)$ for all $j$ with $\delta_i\bar\lambda_{ij} > 1$ and $\bar s_{ij} := 0$ otherwise, we can show that each constraint $\bar M_i(\omega) \succeq 0$ can be replaced by the constraints $\delta_i\lambda_{\min}(H_i(\omega)) \ge 1$, $\bar x^{\mathsf T}\bar x \le \sigma_2(\omega)$, and $\sigma_2(\omega) \le d_2 + \delta_i v_i(\omega) - \mathbf{1}^{\mathsf T}\bar s_i(\omega)$.

Since we are minimizing $d_2$ and $z$, for all $j = 1, 2, \ldots, n$ we can relax the definitions of $s_{ij}$ and $\bar s_{ij}$ by replacing them with $u_{ij}^2 \le s_{ij}(\tau_i\lambda_{ij} - 1)$ for all $i = 1, 2, \ldots, n_1$ and $\bar u_{ij}^2 \le \bar s_{ij}(\delta_i\bar\lambda_{ij} - 1)$ for all $i = 1, 2, \ldots, n_2$, respectively.

When the realizations of the random ellipsoids become available, if necessary, we determine $\bar\gamma$ so that the new ball
$$\bar B := \{x \in \mathbb{R}^n : x^{\mathsf T}x - 2\bar x^{\mathsf T}x + \bar\gamma \le 0\}$$
contains all the realizations of the random ellipsoids. This new ball $\bar B$ has the same center $\bar x$ as $B$ but a larger radius $\bar r := \sqrt{\bar x^{\mathsf T}\bar x - \bar\gamma}$. We note that $\bar r^2 - r^2 = (\bar x^{\mathsf T}\bar x - \bar\gamma) - (\bar x^{\mathsf T}\bar x - \gamma) = \gamma - \bar\gamma$, and thus we introduce the constraint $0 \le \gamma - \bar\gamma \le z$, where $z$ is an upper bound on $\bar r^2 - r^2$.

Let $c > 0$ denote the cost per unit of the Euclidean distance between the center of the ball $B$ and the origin; let $\alpha > 0$ be the cost per unit of the square of the radius of $B$; and let $\beta > 0$ be the cost per unit increase of the square of the radius, if such an increase becomes necessary after the realizations of the random ellipsoids are available.

We now define the decision variables
$$\mathbf{x} := (d_1; d_2; \bar x; \gamma; \tau) \quad\text{and}\quad \mathbf{y} := (z; \bar\gamma; \delta).$$
Then, by introducing the unit cost vectors
$$\mathbf{c} := (c; \alpha; 0; 0; 0) \quad\text{and}\quad \mathbf{q} := (\beta; 0; 0),$$
and combining all of the above, we get the following SRQCP model:
$$\begin{array}{ll}
\min & \mathbf{c}^{\mathsf T}\mathbf{x} + E[Q(\mathbf{x},\omega)]\\
\text{s.t.} & u_i = \Xi_i^{\mathsf T}(\tau_i g_i + \bar x), \quad i = 1, 2, \ldots, n_1\\
& u_{ij}^2 \le s_{ij}(\tau_i\lambda_{ij} - 1), \quad i = 1, 2, \ldots, n_1,\ j = 1, 2, \ldots, n\\
& \bar x^{\mathsf T}\bar x \le \sigma_1\\
& \sigma_1 \le d_2 + \tau_i v_i - \mathbf{1}^{\mathsf T}s_i, \quad i = 1, 2, \ldots, n_1\\
& \tau_i \ge 1/\lambda_{\min}(H_i), \quad i = 1, 2, \ldots, n_1\\
& \bar x^{\mathsf T}\bar x \le d_1^2,
\end{array}$$
where $Q(\mathbf{x},\omega)$ is the minimum value of the problem
$$\begin{array}{ll}
\min & \mathbf{q}^{\mathsf T}\mathbf{y}\\
\text{s.t.} & \bar u_i(\omega) = \bar\Xi_i(\omega)^{\mathsf T}(\delta_i g_i(\omega) + \bar x), \quad i = 1, 2, \ldots, n_2\\
& \bar u_{ij}(\omega)^2 \le \bar s_{ij}(\omega)(\delta_i\bar\lambda_{ij}(\omega) - 1), \quad i = 1, 2, \ldots, n_2,\ j = 1, 2, \ldots, n\\
& \bar x^{\mathsf T}\bar x \le \sigma_2(\omega)\\
& \sigma_2(\omega) \le d_2 + \delta_i v_i(\omega) - \mathbf{1}^{\mathsf T}\bar s_i(\omega), \quad i = 1, 2, \ldots, n_2\\
& \delta_i \ge 1/\lambda_{\min}(H_i(\omega)), \quad i = 1, 2, \ldots, n_2\\
& 0 \le \gamma - \bar\gamma \le z,
\end{array}$$
and
$$E[Q(\mathbf{x},\omega)] := \int_\Omega Q(\mathbf{x},\omega)\,P(d\omega).$$

5.2.2 Structural optimization

Ben-Tal and Bendsøe [12] and Nemirovski [13] consider the following problem from structural optimization. A structure of $k$ linear elastic bars connects a set of $p$ nodes. They assume that the geometry (topology and lengths of bars) and the material are fixed. The goal is to size the bars, i.e., to determine appropriate cross-sectional areas of the bars.

For $i = 1, 2, \ldots, k$ and $j = 1, 2, \ldots, p$, we define the following decision variables and parameters:

• $f_j$ := the external force applied on the jth node;

• $d_j$ := the (small) displacement of the jth node resulting from the load force $f_j$;

• $x_i$ := the cross-sectional area of the ith bar;

• $\underline{x}_i$ := the lower bound on the cross-sectional area of the ith bar;

• $\bar{x}_i$ := the upper bound on the cross-sectional area of the ith bar;

• $l_i$ := the length of the ith bar;

• $v$ := the maximum allowed volume of the bars of the structure;

• $G(x) := \sum_{i=1}^{k} x_i G_i$ is the stiffness matrix, where the matrices $G_i \in \mathcal{S}^p$, $i = 1, 2, \ldots, k$, depend only on fixed parameters (such as the lengths of the bars and the material).

In the simplest version of the problem, they consider one fixed set of externally applied nodal forces $f_j$, $j = 1, 2, \ldots, p$. Given this, the elastic stored energy within the structure is
$$\varepsilon = f^{\mathsf T} d,$$
which is a measure of the inverse of the stiffness of the structure. In view of the definition of the stiffness matrix $G(x)$, we also have the following linear relationship between $f$ and $d$:
$$f = G(x)\, d.$$

The objective is to find the stiffest truss by minimizing $\varepsilon$ subject to the inequality $l^{\mathsf T} x \le v$ as a constraint on the total volume (or, equivalently, weight), and the constraint $\underline{x} \le x \le \bar{x}$ as upper and lower bounds on the cross-sectional areas.

For simplicity, we assume that $\underline{x} > 0$ and $G(x) \succ 0$ for all $x > 0$. In this case we can express the elastic stored energy in terms of the inverse of the stiffness matrix and the externally applied nodal forces as
$$\varepsilon = f^{\mathsf T} G(x)^{-1} f.$$

In summary, they consider the problem
$$\begin{array}{ll}
\min & f^{\mathsf T} G(x)^{-1} f\\
\text{s.t.} & \underline{x} \le x \le \bar{x}\\
& l^{\mathsf T} x \le v,
\end{array}$$


which is equivalent to
$$\begin{array}{ll}
\min & s\\
\text{s.t.} & f^{\mathsf T} G(x)^{-1} f \le s\\
& \underline{x} \le x \le \bar{x}\\
& l^{\mathsf T} x \le v.
\end{array} \qquad (5.2.1)$$

The first inequality constraint in (5.2.1) is just a fractional quadratic inequality constraint, and it can be formulated as a hyperbolic inequality. In §2 of [1] (see also §2 of [23]), the authors demonstrate that this inequality is satisfied if and only if there exist $u_i \in \mathbb{R}^{r_i}$ and $t_i \in \mathbb{R}$ with $t_i > 0$, $i = 1, 2, \ldots, k$, such that
$$\sum_{i=1}^{k} D_i^{\mathsf T} u_i = f, \quad u_i^{\mathsf T} u_i \le x_i t_i \ \text{ for } i = 1, 2, \ldots, k, \quad\text{and}\quad \mathbf{1}^{\mathsf T} t \le s,$$
where $r_i = \operatorname{rank}(G_i)$ and the matrix $D_i \in \mathbb{R}^{r_i\times p}$ is the Cholesky factor of $G_i$ (i.e., $D_i^{\mathsf T} D_i = G_i$) for each $i = 1, 2, \ldots, k$. Using this result, problem (5.2.1) becomes the problem
$$\begin{array}{ll}
\min & s\\
\text{s.t.} & \sum_{i=1}^{k} D_i^{\mathsf T} u_i = f\\
& u_i^{\mathsf T} u_i \le x_i t_i, \quad i = 1, \ldots, k\\
& \mathbf{1}^{\mathsf T} t \le s\\
& \underline{x} \le x \le \bar{x}\\
& l^{\mathsf T} x \le v,
\end{array} \qquad (5.2.2)$$
which includes only linear and hyperbolic constraints.
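Problem (5.2.2) is straightforward to prototype. The sketch below uses illustrative random stiffness data (every $G_i$ taken to have full rank, so $r_i = p$); the hyperbolic constraints $u_i^{\mathsf T}u_i \le x_i t_i$ are the rotated quadratic cone constraints, expressed here through the quad-over-lin atom.

```python
import cvxpy as cp
import numpy as np

# Sketch of problem (5.2.2) on illustrative random data.
rng = np.random.default_rng(4)
k, p = 6, 4
D = [rng.normal(size=(p, p)) + 2 * np.eye(p) for _ in range(k)]  # D_i^T D_i = G_i
f = rng.normal(size=p)
l = rng.uniform(1.0, 2.0, size=k)
x_lo, x_hi, vol = 0.1, 10.0, 20.0

x = cp.Variable(k)
t = cp.Variable(k)
s = cp.Variable()
u = [cp.Variable(p) for _ in range(k)]
cons = [sum(D[i].T @ u[i] for i in range(k)) == f,
        cp.sum(t) <= s, x >= x_lo, x <= x_hi, l @ x <= vol]
# u_i^T u_i <= x_i t_i: hyperbolic (rotated quadratic cone) constraints
cons += [cp.quad_over_lin(u[i], t[i]) <= x[i] for i in range(k)]
cp.Problem(cp.Minimize(s), cons).solve()
print("optimal compliance bound s =", s.value)
```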

The model (5.2.2) rests on the "simple" assumption that the external forces applied to the nodes are fixed. As more complicated versions, they consider multiple loading scenarios as well. In [49], Ariyawansa and Zhu consider the case in which the external forces applied to the nodes are random variables with known distribution functions, and they formulate an SSDP model for this problem. In this subsection, we formulate an SRQCP model for this problem under the assumption that some of the external forces applied to the nodes are fixed and the rest are random variables with known distribution functions. Let us denote the external force applied on the jth node by $f_j$ if it is fixed and by $\tilde f_j(\omega)$ if it is random. Due to changes in environmental conditions (such as wind speed and temperature), we believe that allowing some of the external forces to be random is much closer to reality. Without loss of generality, let us assume that $f(\omega) = (f; \tilde f(\omega))$, where $\tilde f(\omega)$ depends on an underlying outcome $\omega$ in an event space $\Omega$ with a known probability function $P$.

Accordingly, the displacements of the nodes resulting from the random forces and the elastic stored energy within the structure are also random. Then we have the following relations:
$$\varepsilon = f^{\mathsf T} d, \quad f = G(x)\, d, \quad \tilde\varepsilon(\omega) = \tilde f(\omega)^{\mathsf T}\tilde d(\omega), \quad\text{and}\quad \tilde f(\omega) = G(x)\,\tilde d(\omega).$$

Suppose we design a structure for a customer. The structure will be installed in an open environment. From past experience, the customer can provide us with sufficient information so that we can model the external forces that will be applied to the nodes of the structure as random variables with known distribution functions. Given this information, we can formulate a model of this problem that guarantees that our structure will be able to continue to function and stand up against the worst environmental conditions.


In summary, we solve the following SRQCP problem:
$$\begin{array}{ll}
\min & s + E[Q(x,\omega)]\\
\text{s.t.} & \sum_{i=1}^{k} D_i^{\mathsf T} u_i = f\\
& u_i^{\mathsf T} u_i \le x_i t_i, \quad i = 1, \ldots, k\\
& \mathbf{1}^{\mathsf T} t \le s\\
& \underline{x} \le x \le \bar{x}\\
& l^{\mathsf T} x \le v,
\end{array}$$
where $Q(x,\omega)$ is the minimum value of the problem
$$\begin{array}{ll}
\min & \tilde s\\
\text{s.t.} & \sum_{i=1}^{k} D_i^{\mathsf T}\tilde u_i(\omega) = \tilde f(\omega)\\
& \tilde u_i(\omega)^{\mathsf T}\tilde u_i(\omega) \le x_i\tilde t_i, \quad i = 1, \ldots, k\\
& \mathbf{1}^{\mathsf T}\tilde t \le \tilde s,
\end{array}$$
and
$$E[Q(x,\omega)] := \int_\Omega Q(x,\omega)\,P(d\omega).$$


Chapter 6

Related Open Problems: Multi-Order Cone Programming Problems

In this chapter we introduce a new class of convex optimization problems that can be viewed as an extension of second-order cone programs. We present primal and dual forms of multi-order cone programs (MOCPs), in which we minimize a linear function over a Cartesian product of pth-order cones (we allow different values of p for different cones in the product). We then indicate weak and strong duality relations for the problem. We also introduce mixed integer multi-order cone programs (MIMOCPs) to handle MOCPs with integer-valued variables, and two-stage stochastic multi-order cone programs (SMOCPs) with recourse to handle uncertainty in data defining (deterministic) MOCPs. We demonstrate how decision making problems associated with facility location problems lead to MOCP, MIMOCP, and SMOCP models. It is interesting to investigate other application settings leading to (deterministic, stochastic, and mixed integer) MOCPs. The development of algorithms for such multi-order cone programs, which in turn will benefit from a duality theory, is equally interesting and remains one of the important open problems for future research.

We begin by introducing some notation that we use in the sequel.

Given $p \ge 1$, the pth-order cone of dimension $n$ is defined as
$$Q^n_p := \{x = (x_0; \bar x) \in \mathbb{R}\times\mathbb{R}^{n-1} : x_0 \ge \|\bar x\|_p\},$$
where $\|\cdot\|_p$ denotes the p-norm. The cone $Q^n_p$ is regular (see, for example, [45]). As special cases, when $p = 2$ we obtain $Q^n_2 = \mathcal{E}^n_+$, the second-order cone of dimension $n$, and when $p = 1$ or $\infty$, $Q^n_p$ is a polyhedral cone.

We write $x \succeq^{\langle n\rangle}_{\langle p\rangle} 0$ to mean that $x \in Q^n_p$. Given $1 \le p_i \le \infty$ for $i = 1, 2, \ldots, r$, we write $x \succeq^{\langle n_1,n_2,\ldots,n_r\rangle}_{\langle p_1,p_2,\ldots,p_r\rangle} 0$ to mean that $x \in Q^{n_1}_{p_1}\times Q^{n_2}_{p_2}\times\cdots\times Q^{n_r}_{p_r}$. It is immediately seen that, for every vector $x \in \mathbb{R}^n$ where $n = \sum_{i=1}^{r} n_i$, we have $x \succeq^{\langle n_1,n_2,\ldots,n_r\rangle}_{\langle p_1,p_2,\ldots,p_r\rangle} 0$ if and only if, with $x$ partitioned conformally as $x = (x_1; x_2; \ldots; x_r)$, we have $x_i \succeq^{\langle n_i\rangle}_{\langle p_i\rangle} 0$ for $i = 1, 2, \ldots, r$. For simplicity, we write:

• $Q^n_p$ as $Q_p$, and $x \succeq^{\langle n\rangle}_{\langle p\rangle} 0$ as $x \succeq_{\langle p\rangle} 0$, when $n$ is known from the context;

• $Q^{n_1}_{p_1}\times Q^{n_2}_{p_2}\times\cdots\times Q^{n_r}_{p_r}$ as $Q_{\langle p_1,p_2,\ldots,p_r\rangle}$, and $x \succeq^{\langle n_1,n_2,\ldots,n_r\rangle}_{\langle p_1,p_2,\ldots,p_r\rangle} 0$ as $x \succeq_{\langle p_1,p_2,\ldots,p_r\rangle} 0$, when $n_1, n_2, \ldots, n_r$ are known from the context;

• $x \succeq_{\langle p,p,\ldots,p\rangle} 0$ (with $r$ copies of $p$) as $x \succeq_{r\langle p\rangle} 0$.

The set of all interior points of $Q^n_p$ is denoted by $\operatorname{int}(Q^n_p) := \{x = (x_0; \bar x) \in \mathbb{R}\times\mathbb{R}^{n-1} : x_0 > \|\bar x\|_p\}$. We write $x \succ_{\langle p_1,p_2,\ldots,p_r\rangle} 0$ to mean that $x \in \operatorname{int}(Q_{\langle p_1,p_2,\ldots,p_r\rangle}) := \operatorname{int}(Q_{p_1})\times\operatorname{int}(Q_{p_2})\times\cdots\times\operatorname{int}(Q_{p_r})$.
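Membership in $Q^n_p$ is a single norm comparison, as the following tiny illustrative helper (not part of the dissertation) makes explicit:

```python
import numpy as np

# Test membership of x = (x0; xbar) in the pth-order cone Q^n_p.
def in_Qnp(x, p):
    x0, xbar = x[0], np.asarray(x[1:], dtype=float)
    return x0 >= np.linalg.norm(xbar, ord=p)

print(in_Qnp([2.0, 1.0, 1.0], 2))        # True:  2 >= sqrt(2)
print(in_Qnp([1.5, 1.0, 1.0], 1))        # False: 1.5 < 2 = ||(1,1)||_1
print(in_Qnp([1.5, 1.0, 1.0], np.inf))   # True:  1.5 >= 1
```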

6.1 Multi-order cone programming problems

Let $r \ge 1$ be an integer, and let $p_1, p_2, \ldots, p_r$ be such that $1 \le p_i \le \infty$ for $i = 1, 2, \ldots, r$. Let $m, n, n_1, n_2, \ldots, n_r$ be positive integers such that $n = \sum_{i=1}^{r} n_i$. Then we define an MOCP in primal standard form as
$$\begin{array}{lll}
& \min & c^{\mathsf T}x\\
(P) & \text{s.t.} & A\,x = b\\
& & x \succeq_{\langle p_1,p_2,\ldots,p_r\rangle} 0,
\end{array}$$
where $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^m$ and $c \in \mathbb{R}^n$ constitute given data, and $x \in \mathbb{R}^n$ is the primal decision variable. We define an MOCP in dual standard form as
$$\begin{array}{lll}
& \max & b^{\mathsf T}y\\
(D) & \text{s.t.} & A^{\mathsf T}y + z = c\\
& & z \succeq_{\langle q_1,q_2,\ldots,q_r\rangle} 0,
\end{array}$$
where $y \in \mathbb{R}^m$ and $z \in \mathbb{R}^n$ are the dual decision variables, and $q_1, q_2, \ldots, q_r$ are such that $1 \le q_i \le \infty$ for $i = 1, 2, \ldots, r$.

If (P) and (D) are defined by the same data, and $q_i$ is conjugate to $p_i$, in the sense that $1/p_i + 1/q_i = 1$ for $i = 1, 2, \ldots, r$, then we can prove relations between (P) and (D) (see §6.2) that justify referring to (D) as the dual of (P) and vice versa.

A pth-order cone programming (POCP) problem in primal standard form is
$$\begin{array}{ll}
\min & c^{\mathsf T}x\\
\text{s.t.} & A\,x = b\\
& x \succeq_{r\langle p\rangle} 0,
\end{array} \qquad (6.1.1)$$
where $m, n, n_1, n_2, \ldots, n_r$ are positive integers such that $n = \sum_{i=1}^{r} n_i$, $p \in [1,\infty]$, $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^m$ and $c \in \mathbb{R}^n$ constitute given data, and $x \in \mathbb{R}^n$ is the primal variable. According to (D), the dual problem associated with the POCP (6.1.1) is
$$\begin{array}{ll}
\max & b^{\mathsf T}y\\
\text{s.t.} & A^{\mathsf T}y + z = c\\
& z \succeq_{r\langle q\rangle} 0,
\end{array} \qquad (6.1.2)$$
where $y \in \mathbb{R}^m$ and $z \in \mathbb{R}^n$ are the dual variables and $q$ is conjugate to $p$.

Clearly, second-order cone programs are the special case of POCPs obtained when $p = 2$ in (6.1.1) (hence $q = 2$ in (6.1.2)).

Example 21. Norm minimization problems.
In [1], Alizadeh and Goldfarb presented second-order cone programming formulations of three norm minimization problems in which the norm is the Euclidean norm. Here we indicate extensions of these three problems in which we use arbitrary p-norms, leading to MOCPs. Let $v_i = A_ix + b_i \in \mathbb{R}^{n_i-1}$, $i = 1, 2, \ldots, r$. The following norm minimization problems can be cast as MOCPs (a computational sketch follows the list):

1. Minimization of the sum of norms: The problem $\min \sum_{i=1}^{r}\|v_i\|_{p_i}$ can be formulated as
$$\begin{array}{ll}
\min & \sum_{i=1}^{r} t_i\\
\text{s.t.} & A_ix + b_i = v_i, \quad i = 1, 2, \ldots, r\\
& (t_1; v_1; t_2; v_2; \ldots; t_r; v_r) \succeq_{\langle p_1,p_2,\ldots,p_r\rangle} 0.
\end{array}$$

2. Minimization of the maximum of norms: The problem $\min \max_{1\le i\le r}\|v_i\|_{p_i}$ can be expressed as the MOCP problem
$$\begin{array}{ll}
\min & t\\
\text{s.t.} & A_ix + b_i = v_i, \quad i = 1, 2, \ldots, r\\
& (t; v_1; t; v_2; \ldots; t; v_r) \succeq_{\langle p_1,p_2,\ldots,p_r\rangle} 0.
\end{array}$$

3. Minimization of the sum of the k largest norms: More generally, the problem of minimizing the sum of the k largest norms can also be cast as an MOCP. Let the norms $\|v_{[1]}\|_{p_{[1]}}, \|v_{[2]}\|_{p_{[2]}}, \ldots, \|v_{[r]}\|_{p_{[r]}}$ be the norms $\|v_1\|_{p_1}, \|v_2\|_{p_2}, \ldots, \|v_r\|_{p_r}$ sorted in nonincreasing order. Then the problem $\min \sum_{i=1}^{k}\|v_{[i]}\|_{p_{[i]}}$ can be formulated (see also [1] and [23] and the related references contained therein) as the MOCP problem
$$\begin{array}{ll}
\min & \sum_{i=1}^{r} s_i + kt\\
\text{s.t.} & A_ix + b_i = v_i, \quad i = 1, 2, \ldots, r\\
& (s_1 + t; v_1; s_2 + t; v_2; \ldots; s_r + t; v_r) \succeq_{\langle p_1,p_2,\ldots,p_r\rangle} 0\\
& s_i \ge 0, \quad i = 1, 2, \ldots, r.
\end{array}$$
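As a small computational companion to item 1, the sketch below solves a sum-of-norms MOCP with $r = 2$ blocks and $(p_1, p_2) = (3, 1)$ on illustrative random data; the p-norm epigraph constraints play the role of the pth-order cone constraints $(t_i; v_i) \in Q_{p_i}$.

```python
import cvxpy as cp
import numpy as np

# Sketch of "minimization of the sum of norms" with (p1, p2) = (3, 1).
rng = np.random.default_rng(5)
m, d = 4, 3
A1, b1 = rng.normal(size=(m, d)), rng.normal(size=m)
A2, b2 = rng.normal(size=(m, d)), rng.normal(size=m)

x = cp.Variable(d)
t = cp.Variable(2)
cons = [t[0] >= cp.norm(A1 @ x + b1, 3),   # (t1; v1) in Q_{p1}, p1 = 3
        t[1] >= cp.norm(A2 @ x + b2, 1)]   # (t2; v2) in Q_{p2}, p2 = 1
cp.Problem(cp.Minimize(cp.sum(t)), cons).solve()
print(x.value, t.value)
```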

6.2 Duality

Since MOCPs are a class of convex optimization problems, we can develop a duality theory for them. Here we indicate weak and strong duality for the pair (P, D) as justification for referring to them as a primal-dual pair.

It was shown by Faraut and Koranyi [18, Chapter I.2] that the second-order cone is self-dual. We now prove the more general result that the dual of the pth-order cone of dimension $n$ is the qth-order cone of dimension $n$, where $q$ is the conjugate of $p$.

Lemma 6.2.1. $Q_p^* = Q_q$, where $1 \le p \le \infty$ and $q$ is the conjugate of $p$. More generally, $Q_{\langle p_1,p_2,\ldots,p_r\rangle}^* = Q_{\langle q_1,q_2,\ldots,q_r\rangle}$, where $1 \le p_i \le \infty$ and $q_i$ is the conjugate of $p_i$ for $i = 1, 2, \ldots, r$.

Proof. The second part follows trivially from the first part and the fact that $(K_1\times K_2\times\cdots\times K_r)^* = K_1^*\times K_2^*\times\cdots\times K_r^*$. To prove the first part, we first prove that $Q_q \subseteq Q_p^*$. Let $x = (x_0; \bar x) \in Q_q$; we show that $x \in Q_p^*$ by verifying that $x^{\mathsf T}y \ge 0$ for any $y \in Q_p$. So let $y = (y_0; \bar y) \in Q_p$. Then
$$x^{\mathsf T}y = x_0\,y_0 + \bar x^{\mathsf T}\bar y \ge \|\bar x\|_q\|\bar y\|_p + \bar x^{\mathsf T}\bar y \ge |\bar x^{\mathsf T}\bar y| + \bar x^{\mathsf T}\bar y \ge 0,$$
where the first inequality follows from the fact that $x \in Q_q$ and $y \in Q_p$, and the second from Hölder's inequality. Now we show $Q_p^* \subseteq Q_q$. Let $y = (y_0; \bar y) \in Q_p^*$; we show that $y \in Q_q$ by verifying that $y_0 \ge \|\bar y\|_q$. This is trivial if $\bar y = 0$ or $p = \infty$. If $\bar y \ne 0$ and $1 \le p < \infty$, let $\bar u := (y_1^{q/p}; y_2^{q/p}; \ldots; y_{n-1}^{q/p})$ and consider $x := (\|\bar u\|_p; -\bar u) \in Q_p$. Then, by using Hölder's inequality, where the equality is attained, we obtain
$$0 \le x^{\mathsf T}y = \|\bar u\|_p\, y_0 - \bar u^{\mathsf T}\bar y = \|\bar u\|_p\, y_0 - \|\bar u\|_p\|\bar y\|_q = \|\bar u\|_p\,(y_0 - \|\bar y\|_q).$$
This gives $y_0 \ge \|\bar y\|_q$. □
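The inclusion $Q_q \subseteq Q_p^*$ can be illustrated by Monte Carlo sampling. The snippet below (an illustrative aside) fixes a point on the boundary of $Q_q$ with $p = 3$ (so $q = 3/2$) and checks that its inner product with random points of $Q_p$ is never negative:

```python
import numpy as np

# Monte Carlo illustration of Lemma 6.2.1 with p = 3 (so q = 3/2).
rng = np.random.default_rng(6)
p = 3.0
q = p / (p - 1.0)                       # conjugate exponent: 1/p + 1/q = 1

ybar = rng.normal(size=5)
y = np.concatenate(([np.linalg.norm(ybar, ord=q)], ybar))   # y on bd(Q_q)
for _ in range(10000):
    xbar = rng.normal(size=5)
    x0 = np.linalg.norm(xbar, ord=p) + rng.uniform(0.0, 1.0)
    x = np.concatenate(([x0], xbar))    # x in Q_p
    assert x @ y >= -1e-9               # hence x^T y >= 0, i.e. y in Q_p^*
print("all inner products nonnegative")
```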

From this lemma we deduce that the pth-order cone is reflexive, i.e., $Q_p^{**} = Q_p$; more generally, $Q_{\langle p_1,p_2,\ldots,p_r\rangle}$ is also reflexive. On the basis of this fact, it is natural to infer that the dual of the dual is the primal.

In view of the above lemma, problem (D) can be derived from (P) through the usual Lagrangian approach. The Lagrangian function for (P) is
$$L(x,\lambda,\nu) = c^{\mathsf T}x - \lambda^{\mathsf T}(A\,x - b) - \nu^{\mathsf T}x.$$
The dual objective is
$$q(\lambda,\nu) := \inf_x L(x,\lambda,\nu) = \inf_x\,(c - A^{\mathsf T}\lambda - \nu)^{\mathsf T}x + \lambda^{\mathsf T}b.$$
In fact, we may refer to the constraint $x \succeq_{\langle p_1,p_2,\ldots,p_r\rangle} 0$ as the "nonnegativity of $x$", but with respect to the multi-order cone $Q_{\langle p_1,p_2,\ldots,p_r\rangle}$. Note that the Lagrange multiplier $\nu$ corresponding to the inequality constraint $x \succeq_{\langle p_1,p_2,\ldots,p_r\rangle} 0$ is restricted to be nonnegative with respect to the dual of $Q_{\langle p_1,p_2,\ldots,p_r\rangle}$ (i.e., $\nu \succeq_{\langle q_1,q_2,\ldots,q_r\rangle} 0$), whereas the Lagrange multiplier $\lambda$ corresponding to the equality constraint $Ax - b = 0$ is unrestricted. Hence the dual problem is obtained by maximizing $q(\lambda,\nu)$ subject to $\nu \succeq_{\langle q_1,q_2,\ldots,q_r\rangle} 0$.

If $c - A^{\mathsf T}\lambda - \nu \ne 0$, the infimum is clearly $-\infty$, so we can exclude the $\lambda$ and $\nu$ for which $c - A^{\mathsf T}\lambda - \nu \ne 0$. When $c - A^{\mathsf T}\lambda - \nu = 0$, the dual objective function is simply $\lambda^{\mathsf T}b$.


$$\begin{array}{|l|l|}
\hline
\textbf{Primal (minimize)} & \textbf{Dual (maximize)}\\
\hline
\text{constraints:} & \text{variables:}\\
\quad\text{vector: } \succeq_{\langle p_1,\ldots,p_r\rangle} & \quad\text{vector: } \succeq_{\langle q_1,\ldots,q_r\rangle}\\
\quad\text{vector: } \succ_{\langle p_1,\ldots,p_r\rangle} & \quad\text{vector: } \succ_{\langle q_1,\ldots,q_r\rangle}\\
\quad\text{vector or scalar: } \ge & \quad\text{vector or scalar: } \ge\\
\quad\text{vector or scalar: } \le & \quad\text{vector or scalar: } \le\\
\quad\text{vector or scalar: } = & \quad\text{vector or scalar: free}\\
\hline
\text{variables:} & \text{constraints:}\\
\quad\text{vector: } \succeq_{\langle p_1,\ldots,p_r\rangle} & \quad\text{vector: } \succeq_{\langle q_1,\ldots,q_r\rangle}\\
\quad\text{vector: } \succ_{\langle p_1,\ldots,p_r\rangle} & \quad\text{vector: } \succ_{\langle q_1,\ldots,q_r\rangle}\\
\quad\text{vector or scalar: } \ge & \quad\text{vector or scalar: } \le\\
\quad\text{vector or scalar: } \le & \quad\text{vector or scalar: } \ge\\
\quad\text{vector or scalar: free} & \quad\text{vector or scalar: } =\\
\hline
\end{array}$$

Table 6.1: Correspondence rules between primal and dual MOCPs.

Hence, we can write the dual problem as follows:
$$\begin{array}{ll}
\max & b^{\mathsf T}\lambda\\
\text{s.t.} & A^{\mathsf T}\lambda + \nu = c\\
& \nu \succeq_{\langle q_1,q_2,\ldots,q_r\rangle} 0.
\end{array} \qquad (6.2.1)$$
Replacing $\lambda$ by $y$ and $\nu$ by $z$ in (6.2.1), we get (D).

In general, MOCPs can be written in a variety of forms different from the standard forms (P) and (D). The situation for MOCPs is similar to that for linear programs: any MOCP problem can be written in standard form. However, if we consider MOCPs in other forms, then it is more convenient to apply the duality rules directly. Table 6.1 is a summary of these rules; it is a generalization of a similar table in [14, Section 4.2].

For instance, using this table, the dual of (P) is the problem
$$\begin{array}{ll}
\max & b^{\mathsf T}\lambda\\
\text{s.t.} & A^{\mathsf T}\lambda \preceq_{\langle q_1,q_2,\ldots,q_r\rangle} c,
\end{array}$$
which is equivalent to problem (6.2.1), where $A^{\mathsf T}\lambda \preceq_{\langle q_1,q_2,\ldots,q_r\rangle} c$ means that $c - A^{\mathsf T}\lambda \succeq_{\langle q_1,q_2,\ldots,q_r\rangle} 0$.


Using Lemma 6.2.1, we can prove the following weak duality property.

Theorem 6.2.1. (Weak duality) If $x$ is any feasible solution of (P) and $(y, z)$ is any feasible solution of (D), then the duality gap $c^{\mathsf T}x - b^{\mathsf T}y = x^{\mathsf T}z \ge 0$.

Proof. Note that
$$c^{\mathsf T}x - b^{\mathsf T}y = (A^{\mathsf T}y + z)^{\mathsf T}x - b^{\mathsf T}y = y^{\mathsf T}Ax + z^{\mathsf T}x - y^{\mathsf T}b = y^{\mathsf T}(Ax - b) + z^{\mathsf T}x = x^{\mathsf T}z.$$
Since $x \in Q_{\langle p_1,p_2,\ldots,p_r\rangle}$ and $z \in Q_{\langle q_1,q_2,\ldots,q_r\rangle} = Q_{\langle p_1,p_2,\ldots,p_r\rangle}^*$, we conclude that $x^{\mathsf T}z \ge 0$. □

We now give conditions for strong duality to hold. We say that problem (P) is strictly feasible if there exists a primal feasible point $x$ such that $x \succ_{\langle p_1,p_2,\ldots,p_r\rangle} 0$. In the remainder of this section, we assume that the $m$ rows of the matrix $A$ are linearly independent. Using the Karush-Kuhn-Tucker (KKT) conditions, we state and prove the following strong duality result.

Theorem 6.2.2. (Strong duality I) Consider the primal-dual pair (P, D). If (P) is strictly feasible and solvable with a solution $\bar x$, then (D) is solvable and the optimal values of (P) and (D) are equal.

Proof. By the assumptions of the theorem, $\bar x$ is an optimal solution of (P) at which we can apply the KKT conditions. This implies that there are Lagrange multiplier vectors $\bar\lambda$ and $\bar\nu$ such that $(\bar x, \bar\lambda, \bar\nu)$ satisfies
$$\begin{array}{ll}
A\,\bar x = b, & \bar x \succeq_{\langle p_1,p_2,\ldots,p_r\rangle} 0,\\
A^{\mathsf T}\bar\lambda + \bar\nu = c, & \bar\nu \succeq_{\langle q_1,q_2,\ldots,q_r\rangle} 0,\\
\bar x_i^{\mathsf T}\bar\nu_i = 0, & \text{for } i = 1, 2, \ldots, r.
\end{array}$$
This implies that $(\bar\lambda, \bar\nu)$ is feasible for the dual problem (D). Let $(y, z)$ be any feasible solution of (D); then we have $b^{\mathsf T}y \le c^{\mathsf T}\bar x = \bar x^{\mathsf T}\bar\nu + b^{\mathsf T}\bar\lambda = b^{\mathsf T}\bar\lambda$, where we used weak duality to obtain the inequality and complementary slackness to obtain the last equality. Thus, $(\bar\lambda, \bar\nu)$ is an optimal solution of (D) and $c^{\mathsf T}\bar x = b^{\mathsf T}\bar\lambda$, as desired. □


Note that the result in Theorem 6.2.1 is symmetric between (P) and (D). The following strong duality result can also be obtained by applying the duality relations [30, Theorem 4.2.1] to our problem formulation.

Theorem 6.2.3. (Strong duality II) Consider the primal-dual pair (P, D). If both (P) and (D) have strictly feasible solutions, then they both have optimal solutions $x^*$ and $(y^*, z^*)$, respectively, and $p^* := c^{\mathsf T}x^* = d^* := b^{\mathsf T}y^*$ (i.e., $x^{*\mathsf T}z^* = 0$, complementary slackness).

From the above results, we get the following corollary.

Corollary 6.2.1. (Optimality conditions) Assume that both (P) and (D) are strictly feasible. Then $(x, y, z) \in \mathbb{R}^{n+m+n}$ is a pair of optimal solutions if and only if
$$\begin{array}{ll}
A\,x = b, & x \succeq_{\langle p_1,p_2,\ldots,p_r\rangle} 0,\\
A^{\mathsf T}y + z = c, & z \succeq_{\langle q_1,q_2,\ldots,q_r\rangle} 0,\\
x_i^{\mathsf T}z_i = 0, & \text{for } i = 1, 2, \ldots, r.
\end{array}$$

6.3 Multi-order cone programming problems over integers

In this section we introduce two important related problems that result when decision variables in an MOCP can take only integer values. Consider the MOCP problem (P). If we add the constraint that a subset of the variables must attain 0-1 values, then we are interested in an optimization problem of the form
$$\begin{array}{ll}
\min & c^{\mathsf T}x\\
\text{s.t.} & Ax = b\\
& x \succeq_{\langle p_1,p_2,\ldots,p_r\rangle} 0\\
& x_k \in \{0,1\}, \quad k \in \Gamma,
\end{array}$$
where $\Gamma \subseteq \{1, 2, \ldots, n\}$; that is, the decision variable $x \in \mathbb{R}^n$ has some of its components $x_k$ ($k \in \Gamma$) restricted to the values 0 and 1. This class of optimization problems may be termed 0-1 multi-order cone programs (0-1MOCPs).

A more general and interesting problem arises when, in an MOCP, some variables can take only integer values. Given the same data $A$, $b$, and $c$ as in (P), we are interested in the problem
$$\begin{array}{ll}
\min & c^{\mathsf T}x\\
\text{s.t.} & Ax = b\\
& x \succeq_{\langle p_1,p_2,\ldots,p_r\rangle} 0\\
& x_k \in [\alpha_k, \beta_k]\cap\mathbb{Z}, \quad k \in \Gamma,
\end{array} \qquad (6.3.1)$$
where $\Gamma \subseteq \{1, 2, \ldots, n\}$; the decision variable $x \in \mathbb{R}^n$ has some of its components $x_k$ ($k \in \Gamma$) taking integer values bounded by $\alpha_k, \beta_k \in \mathbb{R}$. This class of optimization problems may be termed mixed integer multi-order cone programs (MIMOCPs).

6.4 Multi-order cone programming problems under uncertainty

In this section we define two-stage stochastic multi-order cone programs (SMOCPs) with recourse to handle uncertainty in data defining (deterministic) MOCPs. Let $r_1, r_2 \ge 1$ be integers. For $i = 1, 2, \ldots, r_1$ and $j = 1, 2, \ldots, r_2$, let $p_{1i}, p_{2j} \in [1,\infty]$, and let $m_1, m_2, n_1, n_2, n_{1i}, n_{2j}$ be positive integers such that $n_1 = \sum_{i=1}^{r_1} n_{1i}$ and $n_2 = \sum_{j=1}^{r_2} n_{2j}$. An SMOCP with recourse in primal standard form is defined based on deterministic data $A \in \mathbb{R}^{m_1\times n_1}$, $b \in \mathbb{R}^{m_1}$ and $c \in \mathbb{R}^{n_1}$, and random data $T \in \mathbb{R}^{m_2\times n_1}$, $W \in \mathbb{R}^{m_2\times n_2}$, $h \in \mathbb{R}^{m_2}$ and $d \in \mathbb{R}^{n_2}$, whose realizations depend on an underlying outcome $\omega$ in an event space $\Omega$ with a known probability function $P$. Given this data, an SMOCP with recourse in primal standard form is
$$\begin{array}{ll}
\min & c^{\mathsf T}x + E[Q(x,\omega)]\\
\text{s.t.} & Ax = b\\
& x \succeq_{\langle p_{11},p_{12},\ldots,p_{1r_1}\rangle} 0,
\end{array}$$
where $x \in \mathbb{R}^{n_1}$ is the first-stage decision variable and $Q(x,\omega)$ is the minimum value of the problem
$$\begin{array}{ll}
\min & d(\omega)^{\mathsf T}y\\
\text{s.t.} & W(\omega)\,y = h(\omega) - T(\omega)\,x\\
& y \succeq_{\langle p_{21},p_{22},\ldots,p_{2r_2}\rangle} 0,
\end{array}$$
where $y \in \mathbb{R}^{n_2}$ is the second-stage variable, and
$$E[Q(x,\omega)] := \int_\Omega Q(x,\omega)\,P(d\omega).$$
Two-stage stochastic pth-order (respectively, second-order) cone programs with recourse are the special case of SMOCPs in which $p_{1i} = p_{2j} = p \ge 1$ (respectively, $p_{1i} = p_{2j} = 2$) for all $i = 1, 2, \ldots, r_1$ and $j = 1, 2, \ldots, r_2$.
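When $\Omega$ is finite (or is sampled), the SMOCP admits an extensive-form deterministic equivalent. The following is a minimal sketch with $S$ equally likely scenarios, one $p_1$-cone in stage one and one $p_2$-cone per scenario; all data are illustrative assumptions, constructed from strictly feasible points so that the instance is feasible.

```python
import cvxpy as cp
import numpy as np

# Extensive form of an SMOCP with S equally likely scenarios (illustrative).
rng = np.random.default_rng(7)
n1, n2, m1, m2, S = 4, 4, 2, 2, 5
p1, p2 = 3, 2

def interior_point(n, p):
    # returns a point in int(Q^n_p)
    v = rng.normal(size=n - 1)
    return np.concatenate(([np.linalg.norm(v, ord=p) + 1.0], v))

x_feas = interior_point(n1, p1)
A = rng.normal(size=(m1, n1)); b = A @ x_feas
c = rng.uniform(1.0, 2.0, size=n1)
T = [rng.normal(size=(m2, n1)) for _ in range(S)]
W = [rng.normal(size=(m2, n2)) for _ in range(S)]
d = [rng.uniform(1.0, 2.0, size=n2) for _ in range(S)]
h = [T[s] @ x_feas + W[s] @ interior_point(n2, p2) for s in range(S)]

x = cp.Variable(n1)
y = [cp.Variable(n2) for _ in range(S)]
cons = [A @ x == b, x[0] >= cp.norm(x[1:], p1)]
for s in range(S):
    cons += [W[s] @ y[s] == h[s] - T[s] @ x,
             y[s][0] >= cp.norm(y[s][1:], p2)]
objective = c @ x + sum(d[s] @ y[s] for s in range(S)) / S
cp.Problem(cp.Minimize(objective), cons).solve()
print("first-stage decision:", x.value)
```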

6.5 An application

Our application consists of four versions of the FLP (see Subsection 5.1.1). For these four versions we present problem descriptions leading to an MOCP model, a 0-1MOCP model, an MIMOCP model, and an SMOCP model.

As we mentioned in Subsection 5.1.1, FLPs can be classified based on the distance measure used in the model between the facilities. If we use the Euclidean distance, then these problems are called Euclidean facility location problems (EFLPs); if we use the rectilinear distance (also known as the L1 distance, city block distance, or Manhattan distance), then these problems are called rectilinear facility location problems (RFLPs). Furthermore, in some applications we use both the Euclidean and the rectilinear distances (based on the relationships between the pairs of facilities) as the distance measures in the model, obtaining a mix of EFLPs and RFLPs that we refer to as Euclidean-rectilinear facility location problems (ERFLPs). Another way of classifying this problem is based on where we can place the new facilities in the solution space. When the new facilities can be placed anywhere in the solution space, the problem is called a continuous facility location problem (CFLP); but often the decision maker needs the new facilities to be placed at specific locations (called nodes) and not just anywhere in the solution space. In this case the problem is called a discrete facility location problem (DFLP).

Each of the next subsections is devoted to a version of ERFLPs. More specifically, we consider (deterministic) continuous Euclidean-rectilinear facility location problems (CERFLPs), which lead to an MOCP model; discrete Euclidean-rectilinear facility location problems (DERFLPs), which lead to a 0-1MOCP model; ERFLPs with integrality constraints, which lead to an MIMOCP model; and stochastic continuous Euclidean-rectilinear facility location problems (stochastic CERFLPs), which lead to an SMOCP model.


6.5.1 CERFLPs—An MOCP model

In single ERFLPs, we are interested in choosing a location to build a new facility among existing facilities so that this location minimizes the sum of weighted (either Euclidean or rectilinear) distances to all existing facilities.

Assume that we are given $r + s$ existing facilities represented by the fixed points $a_1, a_2, \ldots, a_r, a_{r+1}, a_{r+2}, \ldots, a_{r+s}$ in $\mathbb{R}^n$, and we plan to place a new facility represented by $x$ so that we minimize the weighted sum of the Euclidean distances between $x$ and each of the points $a_1, a_2, \ldots, a_r$, plus the weighted sum of the rectilinear distances between $x$ and each of the points $a_{r+1}, a_{r+2}, \ldots, a_{r+s}$. This leads us to the problem
$$\min\ \sum_{i=1}^{r} w_i\,\|x - a_i\|_2 + \sum_{i=r+1}^{r+s} w_i\,\|x - a_i\|_1,$$
or, alternatively, to the problem
$$\begin{array}{ll}
\min & \sum_{i=1}^{r+s} w_i\, t_i\\
\text{s.t.} & (t_1; x-a_1; \ldots; t_r; x-a_r) \succeq_{r\langle 2\rangle} 0\\
& (t_{r+1}; x-a_{r+1}; \ldots; t_{r+s}; x-a_{r+s}) \succeq_{s\langle 1\rangle} 0,
\end{array}$$
where $w_i$ is the weight associated with the ith existing facility and the new facility for $i = 1, 2, \ldots, r+s$.

In multiple ERFLPs we add m new facilities, namely x1,x2, . . . ,xm ∈ Rn, instead of

adding only one. We have two cases depending whether or not there is an interaction

among the new facilities in the underlying model. If there is no interaction between the

new facilities, we are just concerned in minimizing the weighted sums of the distance

between each one of the new facilities and each one of the fixed facilities. In other words,

we solve the following MOCP model:

142

Page 154: research.wsulibs.wsu.edu€¦ · ACKNOWLEDGEMENTS My greatest appreciation and my most sincere \Thank You!" go to my advisor Pro-fessor Ari Ariyawansa for his guidance, advice, and

$$\begin{array}{lll}
\min & \displaystyle \sum_{j=1}^{m}\sum_{i=1}^{r+s} w_{ij}\, t_{ij} & \\[4pt]
\mbox{s.t.} & (t_{1j};\, \mathbf{x}_j-\mathbf{a}_1;\, \ldots;\, t_{rj};\, \mathbf{x}_j-\mathbf{a}_r) \succeq_{r\langle 2\rangle} 0, & j = 1,2,\ldots,m \\[4pt]
 & (t_{(r+1)j};\, \mathbf{x}_j-\mathbf{a}_{r+1};\, \ldots;\, t_{(r+s)j};\, \mathbf{x}_j-\mathbf{a}_{r+s}) \succeq_{s\langle 1\rangle} 0, & j = 1,2,\ldots,m,
\end{array}\eqno(6.5.1)$$

where wij is the weight associated with the ith existing facility and the jth new facility

for j = 1, 2, . . . ,m and i = 1, 2, . . . , r + s.
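It may help to unpack the block notation just used. Under the reading that $\succeq_{r\langle 2\rangle}$ denotes membership in a Cartesian product of $r$ second-order cones, and $\succeq_{s\langle 1\rangle}$ membership in a product of $s$ first-order cones, the two block constraints in (6.5.1) amount to the scalar inequalities

$$t_{ij} \ge \|\mathbf{x}_j - \mathbf{a}_i\|_2 \;\; (i = 1,\ldots,r), \qquad t_{ij} \ge \|\mathbf{x}_j - \mathbf{a}_i\|_1 \;\; (i = r+1,\ldots,r+s)$$

for each $j$. Since each $t_{ij}$ enters the objective with a nonnegative weight, every bound is tight at an optimum, so the optimal value of (6.5.1) is exactly the minimum weighted sum of distances.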

If interaction exists among the new facilities, then, in addition to the above require-

ments, we need to minimize the sum of the (either Euclidean or rectilinear) distances

between each pair of the new facilities. Let 1 ≤ l ≤ m and assume that we are required

to minimize the weighted sum of the Euclidean distances between each pair of the new

facilities x1,x2, . . . ,xl and the weighted sum of the rectilinear distances between each

pair of the new facilities xl+1,xl+2, . . . ,xm. In this case, we are interested in a model of

the form:

$$\begin{array}{lll}
\min & \displaystyle \sum_{j=1}^{m}\sum_{i=1}^{r+s} w_{ij}\, t_{ij} \; + \; \sum_{j=2}^{m}\sum_{j'=1}^{j-1} w_{jj'}\, t_{jj'} & \\[4pt]
\mbox{s.t.} & (t_{1j};\, \mathbf{x}_j-\mathbf{a}_1;\, \ldots;\, t_{rj};\, \mathbf{x}_j-\mathbf{a}_r) \succeq_{r\langle 2\rangle} 0, & j = 1,2,\ldots,m \\[4pt]
 & (t_{(r+1)j};\, \mathbf{x}_j-\mathbf{a}_{r+1};\, \ldots;\, t_{(r+s)j};\, \mathbf{x}_j-\mathbf{a}_{r+s}) \succeq_{s\langle 1\rangle} 0, & j = 1,2,\ldots,m \\[4pt]
 & (t_{j(j+1)};\, \mathbf{x}_j-\mathbf{x}_{j+1};\, \ldots;\, t_{jl};\, \mathbf{x}_j-\mathbf{x}_l) \succeq_{(l-j)\langle 2\rangle} 0, & j = 1,2,\ldots,l-1 \\[4pt]
 & (t_{j(j+1)};\, \mathbf{x}_j-\mathbf{x}_{j+1};\, \ldots;\, t_{jm};\, \mathbf{x}_j-\mathbf{x}_m) \succeq_{(m-j)\langle 1\rangle} 0, & j = l+1, l+2,\ldots,m-1,
\end{array}\eqno(6.5.2)$$

where wjj′ is the weight associated with the new facilities j′ and j for j′ = 1, 2, . . . , j − 1

and j = 2, 3, . . . ,m.
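Model (6.5.2) can be prototyped the same way; in the following CVXPY sketch (again with invented sizes, weights, and data) the pairwise interaction terms appear as extra norm expressions, and the epigraph variables $t_{ij}$ and $t_{jj'}$ remain implicit:

\begin{verbatim}
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
n, m, r, s, l = 2, 4, 3, 2, 2          # sizes are illustrative only
a = rng.standard_normal((r + s, n))    # existing facilities a_i
W = np.abs(rng.standard_normal((r + s, m)))           # weights w_ij
Wp = np.abs(np.triu(rng.standard_normal((m, m)), 1))  # pair weights

x = [cp.Variable(n) for _ in range(m)]  # new facility locations

# distances to existing facilities: Euclidean for i < r, rectilinear after
fixed_cost = sum(W[i, j] * cp.norm(x[j] - a[i], 2 if i < r else 1)
                 for i in range(r + s) for j in range(m))

# interaction: Euclidean among x[0..l-1], rectilinear among x[l..m-1]
pair_cost = (sum(Wp[j, jp] * cp.norm(x[j] - x[jp], 2)
                 for j in range(l) for jp in range(j + 1, l))
             + sum(Wp[j, jp] * cp.norm(x[j] - x[jp], 1)
                   for j in range(l, m) for jp in range(j + 1, m)))

cp.Problem(cp.Minimize(fixed_cost + pair_cost)).solve()
print([xi.value for xi in x])
\end{verbatim}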

6.5.2 DERFLPs—A 0-1MOCP model

We consider the discrete version of the problem by assuming that the new facilities $\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m$ need to be placed at specific locations rather than anywhere in 2- or 3- (or higher) dimensional space. Let the points $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_k \in \mathbb{R}^n$ represent these specific locations, where $k \ge m$. So, we add the constraint $\mathbf{x}_i \in \{\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_k\}$ for $i = 1,2,\ldots,m$. Clearly, for $i = 1,2,\ldots,m$, this constraint can be replaced by the following linear and binary constraints:

$$\begin{array}{l}
\mathbf{x}_i = \mathbf{v}_1\, y_{i1} + \mathbf{v}_2\, y_{i2} + \cdots + \mathbf{v}_k\, y_{ik}, \\[4pt]
y_{i1} + y_{i2} + \cdots + y_{ik} = 1, \;\mbox{ and} \\[4pt]
\mathbf{y}_i = (y_{i1};\, y_{i2};\, \ldots;\, y_{ik}) \in \{0,1\}^k.
\end{array}$$

We also assume that we cannot place more than one facility at each location. Consequently, we add the following constraints:

$$(1;\, y_{1l};\, y_{2l};\, \ldots;\, y_{ml}) \succeq_{\langle 1\rangle} 0, \quad \mbox{for } l = 1,2,\ldots,k.$$
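A one-line check shows why these constraints enforce the assumption: by the definition of the first-order cone,

$$(1;\, y_{1l};\, y_{2l};\, \ldots;\, y_{ml}) \succeq_{\langle 1\rangle} 0 \iff 1 \ge \sum_{j=1}^{m} |y_{jl}| = \sum_{j=1}^{m} y_{jl},$$

where the last equality uses $y_{jl} \in \{0,1\}$. Hence at most one of $y_{1l},\ldots,y_{ml}$ can equal 1, that is, at most one new facility occupies location $\mathbf{v}_l$.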

If there is no interaction between the new facilities, then the MOCP model (6.5.1) becomes

the following 0-1MOCP model:

$$\begin{array}{lll}
\min & \displaystyle \sum_{j=1}^{m}\sum_{i=1}^{r+s} w_{ij}\, t_{ij} & \\[4pt]
\mbox{s.t.} & (t_{1j};\, \mathbf{x}_j-\mathbf{a}_1;\, \ldots;\, t_{rj};\, \mathbf{x}_j-\mathbf{a}_r) \succeq_{r\langle 2\rangle} 0, & j = 1,2,\ldots,m \\[4pt]
 & (t_{(r+1)j};\, \mathbf{x}_j-\mathbf{a}_{r+1};\, \ldots;\, t_{(r+s)j};\, \mathbf{x}_j-\mathbf{a}_{r+s}) \succeq_{s\langle 1\rangle} 0, & j = 1,2,\ldots,m \\[4pt]
 & \mathbf{x}_i = \mathbf{v}_1\, y_{i1} + \mathbf{v}_2\, y_{i2} + \cdots + \mathbf{v}_k\, y_{ik}, & i = 1,2,\ldots,m \\[4pt]
 & (1;\, y_{1l};\, y_{2l};\, \ldots;\, y_{ml}) \succeq_{\langle 1\rangle} 0, & l = 1,2,\ldots,k \\[4pt]
 & \mathbf{1}^T \mathbf{y}_i = 1, \;\; \mathbf{y}_i \in \{0,1\}^k, & i = 1,2,\ldots,m.
\end{array}$$

If interaction exists among the new facilities, then the MOCP model (6.5.2) becomes the


following 0-1MOCP model:

$$\begin{array}{lll}
\min & \displaystyle \sum_{j=1}^{m}\sum_{i=1}^{r+s} w_{ij}\, t_{ij} \; + \; \sum_{j=2}^{m}\sum_{j'=1}^{j-1} w_{jj'}\, t_{jj'} & \\[4pt]
\mbox{s.t.} & (t_{1j};\, \mathbf{x}_j-\mathbf{a}_1;\, \ldots;\, t_{rj};\, \mathbf{x}_j-\mathbf{a}_r) \succeq_{r\langle 2\rangle} 0, & j = 1,2,\ldots,m \\[4pt]
 & (t_{(r+1)j};\, \mathbf{x}_j-\mathbf{a}_{r+1};\, \ldots;\, t_{(r+s)j};\, \mathbf{x}_j-\mathbf{a}_{r+s}) \succeq_{s\langle 1\rangle} 0, & j = 1,2,\ldots,m \\[4pt]
 & (t_{j(j+1)};\, \mathbf{x}_j-\mathbf{x}_{j+1};\, \ldots;\, t_{jl};\, \mathbf{x}_j-\mathbf{x}_l) \succeq_{(l-j)\langle 2\rangle} 0, & j = 1,2,\ldots,l-1 \\[4pt]
 & (t_{j(j+1)};\, \mathbf{x}_j-\mathbf{x}_{j+1};\, \ldots;\, t_{jm};\, \mathbf{x}_j-\mathbf{x}_m) \succeq_{(m-j)\langle 1\rangle} 0, & j = l+1, l+2,\ldots,m-1 \\[4pt]
 & \mathbf{x}_i = \mathbf{v}_1\, y_{i1} + \mathbf{v}_2\, y_{i2} + \cdots + \mathbf{v}_k\, y_{ik}, & i = 1,2,\ldots,m \\[4pt]
 & (1;\, y_{1l};\, y_{2l};\, \ldots;\, y_{ml}) \succeq_{\langle 1\rangle} 0, & l = 1,2,\ldots,k \\[4pt]
 & \mathbf{1}^T \mathbf{y}_i = 1, \;\; \mathbf{y}_i \in \{0,1\}^k, & i = 1,2,\ldots,m.
\end{array}$$

For $l = 1,2,\ldots,k$, let $z_l = 1$ if the location $\mathbf{v}_l$ is chosen, and $z_l = 0$ otherwise. Then we can go further and consider more assumptions. Let $k_1, k_2, k_3, k_4 \in [1,k]$ be integers such that $k_1 \le k_2$ and $k_3 \le k_4$. If we must choose at most $k_1$ of the locations $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_{k_2}$, then we impose the constraints:

$$(k_1;\, z_1;\, z_2;\, \ldots;\, z_{k_2}) \succeq_{\langle 1\rangle} 0, \quad \mbox{and} \quad \mathbf{z} \in \{0,1\}^k.$$

If we must choose either at most $k_1$ of the locations $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_{k_2}$ or at most $k_3$ of the locations $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_{k_4}$, then we impose the constraints:

$$(k_1 f;\, z_1;\, z_2;\, \ldots;\, z_{k_2}) \succeq_{\langle 1\rangle} 0, \quad (k_3(1-f);\, z_1;\, z_2;\, \ldots;\, z_{k_4}) \succeq_{\langle 1\rangle} 0, \quad \mathbf{z} \in \{0,1\}^k, \;\; f \in \{0,1\}.$$
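Reading these constraints case by case (our own gloss on the formulas above) makes the role of the binary switch $f$ explicit:

$$f = 1: \;\; \sum_{l=1}^{k_2} z_l \le k_1 \;\mbox{ and }\; \sum_{l=1}^{k_4} z_l \le 0; \qquad f = 0: \;\; \sum_{l=1}^{k_2} z_l \le 0 \;\mbox{ and }\; \sum_{l=1}^{k_4} z_l \le k_3.$$

Thus $f$ selects which of the two cardinality bounds is enforced, with the selection variables of the inactive branch forced to zero.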

6.5.3 ERFLPs with integrality constraints—An MIMOCP model

In some problems we may need the locations to have integer-valued coordinates. In most cities, streets are laid out on a grid, so that the city is subdivided into small numbered blocks that are square or rectangular. In this case, the decision maker usually needs the new facilities to be placed at the corners of the city blocks. Thus, for each $i \in \Delta \subset \{1,2,\ldots,m\}$, let us assume that the variable $\mathbf{x}_i$ lies in the hyperrectangle $\Xi_i^n \equiv \{\mathbf{x}_i : \boldsymbol{\zeta}_i \le \mathbf{x}_i \le \boldsymbol{\eta}_i\}$, where $\boldsymbol{\zeta}_i, \boldsymbol{\eta}_i \in \mathbb{R}^n$, and has to be integer-valued, i.e., $\mathbf{x}_i$ must lie in the grid $\Xi_i^n \cap \mathbb{Z}^n$. Thus, if there is no interaction between the new facilities, then instead of solving the MOCP model (6.5.1), we solve the following MIMOCP model:

$$\begin{array}{lll}
\min & \displaystyle \sum_{j=1}^{m}\sum_{i=1}^{r+s} w_{ij}\, t_{ij} & \\[4pt]
\mbox{s.t.} & (t_{1j};\, \mathbf{x}_j-\mathbf{a}_1;\, \ldots;\, t_{rj};\, \mathbf{x}_j-\mathbf{a}_r) \succeq_{r\langle 2\rangle} 0, & j = 1,2,\ldots,m \\[4pt]
 & (t_{(r+1)j};\, \mathbf{x}_j-\mathbf{a}_{r+1};\, \ldots;\, t_{(r+s)j};\, \mathbf{x}_j-\mathbf{a}_{r+s}) \succeq_{s\langle 1\rangle} 0, & j = 1,2,\ldots,m \\[4pt]
 & \mathbf{x}_k \in \Xi_k^n \cap \mathbb{Z}^n, & k \in \Delta.
\end{array}$$

If interaction exists among the new facilities, then instead of solving the MOCP model

(6.5.2), we solve the following MIMOCP model:

$$\begin{array}{lll}
\min & \displaystyle \sum_{j=1}^{m}\sum_{i=1}^{r+s} w_{ij}\, t_{ij} \; + \; \sum_{j=2}^{m}\sum_{j'=1}^{j-1} w_{jj'}\, t_{jj'} & \\[4pt]
\mbox{s.t.} & (t_{1j};\, \mathbf{x}_j-\mathbf{a}_1;\, \ldots;\, t_{rj};\, \mathbf{x}_j-\mathbf{a}_r) \succeq_{r\langle 2\rangle} 0, & j = 1,2,\ldots,m \\[4pt]
 & (t_{(r+1)j};\, \mathbf{x}_j-\mathbf{a}_{r+1};\, \ldots;\, t_{(r+s)j};\, \mathbf{x}_j-\mathbf{a}_{r+s}) \succeq_{s\langle 1\rangle} 0, & j = 1,2,\ldots,m \\[4pt]
 & (t_{j(j+1)};\, \mathbf{x}_j-\mathbf{x}_{j+1};\, \ldots;\, t_{jl};\, \mathbf{x}_j-\mathbf{x}_l) \succeq_{(l-j)\langle 2\rangle} 0, & j = 1,2,\ldots,l-1 \\[4pt]
 & (t_{j(j+1)};\, \mathbf{x}_j-\mathbf{x}_{j+1};\, \ldots;\, t_{jm};\, \mathbf{x}_j-\mathbf{x}_m) \succeq_{(m-j)\langle 1\rangle} 0, & j = l+1, l+2,\ldots,m-1 \\[4pt]
 & \mathbf{x}_k \in \Xi_k^n \cap \mathbb{Z}^n, & k \in \Delta.
\end{array}$$
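For prototyping such grid-constrained models, integer decision variables can be declared directly; the sketch below uses CVXPY with hypothetical data, and assumes a mixed-integer second-order cone solver is installed and accessible through CVXPY (a plain call to solve() will fail otherwise):

\begin{verbatim}
import numpy as np
import cvxpy as cp

n = 2
a = np.array([[0.3, 4.2], [5.1, 1.7], [2.8, 3.3]])  # fixed facilities
zeta, eta = np.zeros(n), 6.0 * np.ones(n)           # box bounds

x = cp.Variable(n, integer=True)   # x restricted to the integer grid

cost = sum(cp.norm(x - a[i], 2) for i in range(len(a)))  # unit weights
constraints = [zeta <= x, x <= eta]

prob = cp.Problem(cp.Minimize(cost), constraints)
prob.solve()   # requires a mixed-integer conic (MISOCP) solver
print("grid location:", x.value)
\end{verbatim}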

6.5.4 Stochastic CERFLPs—An SMOCP model

Before we describe the stochastic version of this generic application, we indicate a more

concrete version of it. Assume that we have a new growing city with many suburbs and

we want to build a hospital for treating the residents of this city. Some people live in

the city at the present time. As the city expands, many houses in new suburbs need to

be built and the locations of these suburbs will be known in the future in different parts

of the city. Our goal is to find the best location of this hospital so that it can serve the


current suburbs and the new ones. This location must be determined at the current time, before information about the locations of the new suburbs becomes available. For those houses that are close enough to the location of the hospital, we use the rectilinear distance as the distance measure between them and the hospital, while for the new suburbs that will be located far away from the hospital, we use the Euclidean distance. See Figure 6.1.

Figure 6.1: A more concrete version of the stochastic CERFLP: a new, growing city with many houses expected to be built in different possible parts of the city.

Generally speaking, let $\mathbf{a}_1,\mathbf{a}_2,\ldots,\mathbf{a}_{r_1},\mathbf{a}_{r_1+1},\mathbf{a}_{r_1+2},\ldots,\mathbf{a}_{r_1+s_1}$ be fixed points in $\mathbb{R}^n$ representing the coordinates of $r_1+s_1$ existing fixed facilities, and let $\mathbf{a}_1(\omega),\mathbf{a}_2(\omega),\ldots,\mathbf{a}_{r_2}(\omega),\mathbf{a}_{r_2+1}(\omega),\mathbf{a}_{r_2+2}(\omega),\ldots,\mathbf{a}_{r_2+s_2}(\omega)$ be random points in $\mathbb{R}^n$ representing the coordinates of $r_2+s_2$ random facilities whose realizations depend on an underlying outcome $\omega$ in an event space $\Omega$ with a known probability function $P$.

Suppose that at present we do not know the realizations of the $r_2+s_2$ random facilities, and that at some future point in time these realizations become known.

Our goal is to locate $m$ new facilities $\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m \in \mathbb{R}^n$ that minimize the following sums:

• the weighted sums of the Euclidean distances between each of the new facilities and each of the fixed facilities $\mathbf{a}_1,\mathbf{a}_2,\ldots,\mathbf{a}_{r_1}$;

• the weighted sums of the rectilinear distances between each of the new facilities and each of the fixed facilities $\mathbf{a}_{r_1+1},\mathbf{a}_{r_1+2},\ldots,\mathbf{a}_{r_1+s_1}$;

• the expected weighted sums of the Euclidean distances between each of the new facilities and each of the random facilities $\mathbf{a}_1(\omega),\mathbf{a}_2(\omega),\ldots,\mathbf{a}_{r_2}(\omega)$;

• the expected weighted sums of the rectilinear distances between each of the new facilities and each of the random facilities $\mathbf{a}_{r_2+1}(\omega),\mathbf{a}_{r_2+2}(\omega),\ldots,\mathbf{a}_{r_2+s_2}(\omega)$.

Note that this decision needs to be made before the realizations of the r2 + s2 random

facilities become available. If there is no interaction between the new facilities, then the

(deterministic) MOCP model (6.5.1) becomes the following SMOCP model:

$$\begin{array}{lll}
\min & \displaystyle \sum_{j=1}^{m}\sum_{i=1}^{r_1+s_1} w_{ij}\, t_{ij} \; + \; E\left[Q(\mathbf{x}_1;\ldots;\mathbf{x}_m,\omega)\right] & \\[4pt]
\mbox{s.t.} & (t_{1j};\, \mathbf{x}_j-\mathbf{a}_1;\, \ldots;\, t_{r_1 j};\, \mathbf{x}_j-\mathbf{a}_{r_1}) \succeq_{r_1\langle 2\rangle} 0, & j = 1,2,\ldots,m \\[4pt]
 & (t_{(r_1+1)j};\, \mathbf{x}_j-\mathbf{a}_{r_1+1};\, \ldots;\, t_{(r_1+s_1)j};\, \mathbf{x}_j-\mathbf{a}_{r_1+s_1}) \succeq_{s_1\langle 1\rangle} 0, & j = 1,2,\ldots,m,
\end{array}$$

where Q(x1; . . . ;xm, ω) is the minimum value of the problem

$$\begin{array}{lll}
\min & \displaystyle \sum_{j=1}^{m}\sum_{i=1}^{r_2+s_2} w_{ij}(\omega)\, t_{ij} & \\[4pt]
\mbox{s.t.} & (t_{1j};\, \mathbf{x}_j-\mathbf{a}_1(\omega);\, \ldots;\, t_{r_2 j};\, \mathbf{x}_j-\mathbf{a}_{r_2}(\omega)) \succeq_{r_2\langle 2\rangle} 0, & j = 1,2,\ldots,m \\[4pt]
 & (t_{(r_2+1)j};\, \mathbf{x}_j-\mathbf{a}_{r_2+1}(\omega);\, \ldots;\, t_{(r_2+s_2)j};\, \mathbf{x}_j-\mathbf{a}_{r_2+s_2}(\omega)) \succeq_{s_2\langle 1\rangle} 0, & j = 1,2,\ldots,m,
\end{array}$$


and

$$E\left[Q(\mathbf{x}_1;\ldots;\mathbf{x}_m,\omega)\right] := \int_{\Omega} Q(\mathbf{x}_1;\ldots;\mathbf{x}_m,\omega)\, P(d\omega),$$

where wij is the weight associated with the ith existing facility and the jth new facility for

j = 1, 2, . . . ,m and i = 1, 2, . . . , r1 + s1, and wij(ω) is the weight associated with the ith

random existing facility and the jth new facility for j = 1, 2, . . . ,m and i = 1, 2, . . . , r2+s2.
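When the event space $\Omega$ is finite with $K$ realizations, the expectation above reduces to a probability-weighted sum, and the two-stage model collapses to a single conic program (its extensive form). The following is a minimal CVXPY sketch of this finite-scenario case for one new facility, with data and probabilities invented for illustration:

\begin{verbatim}
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(7)
n, K = 2, 4
a_fixed = rng.standard_normal((3, n))   # r1 + s1 = 3 fixed facilities
p = np.full(K, 1.0 / K)                 # scenario probabilities
a_rand = [rng.standard_normal((2, n)) for _ in range(K)]  # r2 + s2 = 2

x = cp.Variable(n)   # here-and-now decision, shared by all scenarios

# first stage: Euclidean distances to the fixed facilities (unit weights)
first_stage = sum(cp.norm(x - a_fixed[i], 2) for i in range(3))

# E[Q(x, w)]: probability-weighted recourse, rectilinear distances here
recourse = sum(p[k] * sum(cp.norm(x - a_rand[k][i], 1) for i in range(2))
               for k in range(K))

cp.Problem(cp.Minimize(first_stage + recourse)).solve()
print("here-and-now location:", x.value)
\end{verbatim}

Note that the single variable x is shared by all scenarios, which captures the requirement that the location be fixed before the realizations become known.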

If interaction exists among the new facilities, then the (deterministic) MOCP model

(6.5.2) becomes the following SMOCP model:

$$\begin{array}{lll}
\min & \displaystyle \sum_{j=1}^{m}\sum_{i=1}^{r_1+s_1} w_{ij}\, t_{ij} \; + \; \sum_{j=2}^{m}\sum_{j'=1}^{j-1} w_{jj'}\, t_{jj'} \; + \; E\left[Q(\mathbf{x}_1;\ldots;\mathbf{x}_m,\omega)\right] & \\[4pt]
\mbox{s.t.} & (t_{1j};\, \mathbf{x}_j-\mathbf{a}_1;\, \ldots;\, t_{r_1 j};\, \mathbf{x}_j-\mathbf{a}_{r_1}) \succeq_{r_1\langle 2\rangle} 0, & j = 1,2,\ldots,m \\[4pt]
 & (t_{(r_1+1)j};\, \mathbf{x}_j-\mathbf{a}_{r_1+1};\, \ldots;\, t_{(r_1+s_1)j};\, \mathbf{x}_j-\mathbf{a}_{r_1+s_1}) \succeq_{s_1\langle 1\rangle} 0, & j = 1,2,\ldots,m \\[4pt]
 & (t_{j(j+1)};\, \mathbf{x}_j-\mathbf{x}_{j+1};\, \ldots;\, t_{jl};\, \mathbf{x}_j-\mathbf{x}_l) \succeq_{(l-j)\langle 2\rangle} 0, & j = 1,2,\ldots,l-1 \\[4pt]
 & (t_{j(j+1)};\, \mathbf{x}_j-\mathbf{x}_{j+1};\, \ldots;\, t_{jm};\, \mathbf{x}_j-\mathbf{x}_m) \succeq_{(m-j)\langle 1\rangle} 0, & j = l+1, l+2,\ldots,m-1,
\end{array}$$

where Q(x1; . . . ;xm, ω) is the minimum value of the problem

$$\begin{array}{lll}
\min & \displaystyle \sum_{j=1}^{m}\sum_{i=1}^{r_2+s_2} w_{ij}(\omega)\, t_{ij} \; + \; \sum_{j=2}^{m}\sum_{j'=1}^{j-1} w_{jj'}\, t_{jj'} & \\[4pt]
\mbox{s.t.} & (t_{1j};\, \mathbf{x}_j-\mathbf{a}_1(\omega);\, \ldots;\, t_{r_2 j};\, \mathbf{x}_j-\mathbf{a}_{r_2}(\omega)) \succeq_{r_2\langle 2\rangle} 0, & j = 1,2,\ldots,m \\[4pt]
 & (t_{(r_2+1)j};\, \mathbf{x}_j-\mathbf{a}_{r_2+1}(\omega);\, \ldots;\, t_{(r_2+s_2)j};\, \mathbf{x}_j-\mathbf{a}_{r_2+s_2}(\omega)) \succeq_{s_2\langle 1\rangle} 0, & j = 1,2,\ldots,m \\[4pt]
 & (t_{j(j+1)};\, \mathbf{x}_j-\mathbf{x}_{j+1};\, \ldots;\, t_{jl};\, \mathbf{x}_j-\mathbf{x}_l) \succeq_{(l-j)\langle 2\rangle} 0, & j = 1,2,\ldots,l-1 \\[4pt]
 & (t_{j(j+1)};\, \mathbf{x}_j-\mathbf{x}_{j+1};\, \ldots;\, t_{jm};\, \mathbf{x}_j-\mathbf{x}_m) \succeq_{(m-j)\langle 1\rangle} 0, & j = l+1, l+2,\ldots,m-1,
\end{array}$$

and

$$E\left[Q(\mathbf{x}_1;\ldots;\mathbf{x}_m,\omega)\right] := \int_{\Omega} Q(\mathbf{x}_1;\ldots;\mathbf{x}_m,\omega)\, P(d\omega),$$

where wjj′ is the weight associated with the new facilities j′ and j for j′ = 1, 2, . . . , j − 1

and j = 2, 3, . . . ,m.


Chapter 7

Conclusion

In this dissertation we have introduced two-stage stochastic symmetric programs (SSPs) with recourse. While one can take the viewpoint that SSPs are a way of dealing with uncertainty in the data defining deterministic symmetric programs, it is also possible to take the dual viewpoint that SSPs are a way of allowing any symmetric cone in place of the nonnegative orthant in stochastic linear programs, or of the positive semidefinite cone in stochastic semidefinite programs. It has also been seen that the SSP problem includes some important general classes of optimization problems (Problems 1-6) as special cases.

We have presented a logarithmic barrier interior point algorithm for solving SSPs using Benders decomposition. We have proved the convergence results by showing that the logarithmic barrier associated with the recourse function is a strongly self-concordant barrier on the first-stage solutions. We have described and analyzed short- and long-step variants of the algorithm that follow the primal central trajectory of the first-stage problem. We concluded that, given two symmetric cones with ranks $r_1$ and $r_2$ in the first- and second-stage problems, respectively, and $K$ realizations, the short-step class of the algorithm needs at most $O(\sqrt{r_1 + Kr_2}\,\ln(\mu^0/\varepsilon))$ Newton iterations to follow the first-stage central path from a starting value $\mu^0$ of the barrier parameter to a terminating value $\varepsilon$. We have also shown that the long-step class of the algorithm needs at most


$O((r_1 + Kr_2)\ln(\mu^0/\varepsilon))$ Newton iterations.
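To give a feel for these bounds (with illustrative numbers of our own, ignoring the constant hidden in the $O(\cdot)$): taking $r_1 = 10$, $r_2 = 10$, $K = 100$, and $\mu^0/\varepsilon = 10^6$ gives

$$\sqrt{r_1 + Kr_2}\,\ln(\mu^0/\varepsilon) = \sqrt{1010}\,(6\ln 10) \approx 31.8 \times 13.8 \approx 440$$

Newton iterations for the short-step variant, a count that grows only like $\sqrt{K}$ as realizations are added; the corresponding long-step quantity $(r_1+Kr_2)\ln(\mu^0/\varepsilon)$ is about $1.4 \times 10^4$.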

An alternative to the logarithmic barrier is the volumetric barrier of Vaidya [40]. In this

dissertation, we have also presented a class of volumetric barrier decomposition algorithms

for SSPs. We have proved the convergence results by showing that the volumetric barrier

associated with the recourse function is a strongly self-concordant barrier on the first-

stage solutions. We have also described and analyzed short- and long-step variants of the algorithm that follow the primal central trajectory of the first-stage problem. Given two symmetric cones with ranks $r_1$ and $r_2$ and dimensions $n_1$ and $n_2$ in the first- and second-stage problems, respectively, and $K$ realizations, we have seen that the short-step class of the algorithm needs at most $O(\sqrt{(1+K)(m_1c_1+m_2c_2)}\,\ln(\mu^0/\varepsilon))$ Newton iterations to follow the first-stage central path from a starting value $\mu^0$ of the barrier parameter to a terminating value $\varepsilon$, and that the long-step class needs at most $O((1+K)(m_1c_1+m_2c_2)\ln(\mu^0/\varepsilon))$ Newton iterations.

Chen and Mehrotra [16] have proposed a prototype primal interior point decomposition algorithm for the two-stage stochastic convex optimization problem. Since stochastic symmetric programming is a subclass of general stochastic convex optimization, our problem could be solved by their prototype algorithm; doing so, however, would forgo the special algebraic structure of the symmetric cone, which is inadvisable both theoretically and practically. Therefore, a separate study of SSP algorithms is warranted.

Concerning applications, we have presented four application areas that lead to SSPs when the underlying symmetric cones are second-order cones and rotated quadratic cones. This has been accomplished simply by relaxing the assumption that the problem data are deterministic so as to allow random data.

Problems 1-6 in Chapter 2 are all important special cases of the SSP problem. It is therefore interesting to investigate other application settings leading to these general problems, such as the ones obtained when the underlying symmetric cones are cones of real symmetric or complex Hermitian positive semidefinite matrices.

Concerning implementation of the proposed SSP algorithms, it will be interesting to implement these methods in the future to determine their practical performance. In this connection we refer the reader to the paper by Mehrotra and Ozevin [26], where efficient practical implementations of the SSDP algorithm proposed in [25] are developed.

We have ended the dissertation by introducing deterministic and stochastic multi-order cone programs (MOCPs) as new conic optimization problems that are natural extensions of deterministic and stochastic second-order cone programs. We presented an application leading to deterministic, 0-1, integer, and stochastic multi-order cone programs. The MOCP problems are posed over non-symmetric cones; hence we have left them as "logarithmically unsolved" open problems to contemplate in our future research in conic programming.


Bibliography

[1] F. Alizadeh and D. Goldfarb. Second-order cone programming. Math. Program., Ser. B, 95:3–51, 2003.

[2] B. Alzalg. Stochastic second-order cone programming: Application models. Submitted.

[3] B. Alzalg. Stochastic symmetric optimization. Submitted.

[4] B. Alzalg and K. A. Ariyawansa. A class of polynomial volumetric barrier decomposition algorithms for stochastic symmetric programming. Submitted.

[5] K. M. Anstreicher. On Vaidya's volumetric cutting plane method for convex programming. Mathematics of Operations Research, 22(1):63–89, 1997.

[6] K. M. Anstreicher. Volumetric path following algorithms for linear programming. Mathematical Programming, 76(1B):245–263, 1997.

[7] K. M. Anstreicher. The volumetric barrier for semidefinite programming. Mathematics of Operations Research, 25(3):365–380, 2000.

[8] K. A. Ariyawansa and A. J. Felt. On a new collection of stochastic linear programming test problems. INFORMS Journal on Computing, 16(3):291–299, 2004.

[9] K. A. Ariyawansa and Y. Zhu. Stochastic semidefinite programming: A new paradigm for stochastic optimization. 4OR—The Quarterly Journal of the Belgian, French and Italian OR Societies, 4(3):65–79, 2006.

[10] K. A. Ariyawansa and Y. Zhu. A class of polynomial volumetric barrier decomposition algorithms for stochastic semidefinite programming. Mathematics of Computation, 80:1639–1661, 2011.

[11] V. S. Bawa, S. J. Brown, and R. W. Klein. Estimation risk and optimal portfolio choice: A survey. Proceedings of the American Statistical Association, 53–58, 1979.

[12] A. Ben-Tal and M. P. Bendsøe. A new method for optimal truss topology design. SIAM J. Optimization, 3(2):322–358, 1993.

[13] A. Ben-Tal and A. Nemirovski. Interior point polynomial-time method for truss topology design. Technical report, Faculty of Industrial Engineering and Management, Technion, 1992.

[14] D. Bertsimas and J. Tsitsiklis. Introduction to Linear Optimization. Athena Scientific, 1997.

[15] J. R. Birge. Stochastic programming computation and applications. INFORMS Journal on Computing, 9:111–133, 1997.

[16] M. Chen and S. Mehrotra. Self-concordance tree and decomposition based interior point methods for the two-stage stochastic convex optimization problem. Technical report, Northwestern University, 2007.

[17] M. A. H. Dempster. Stochastic Programming. Academic Press, London, UK, 1980.

[18] J. Faraut and A. Koranyi. Analysis on Symmetric Cones. Oxford University Press, Oxford, UK, 1994.

[19] A. V. Fiacco and G. P. McCormick. Nonlinear Programming: Sequential Unconstrained Minimization Techniques. John Wiley, New York, NY, 1968.

[20] C. Helmberg, F. Rendl, R. J. Vanderbei, and H. Wolkowicz. An interior-point method for semidefinite programming. SIAM J. of Optimization, 6:342–361, 1996.

[21] P. L. Jiang. Polynomial Cutting Plane Algorithms for Stochastic Programming and Related Problems. Ph.D. dissertation, Department of Pure and Applied Mathematics, Washington State University, Pullman, WA, 1997.

[22] M. Kojima, S. Shindoh, and S. Hara. Interior-point methods for the monotone semidefinite linear complementarity problem in symmetric matrices. SIAM J. of Optimization, 7(1):86–125, 1997.

[23] M. S. Lobo, L. Vandenberghe, S. Boyd, and H. Lebret. Applications of second-order cone programming. Linear Algebra Appl., 284:193–228, 1998.

[24] F. Maggioni, F. A. Potra, M. I. Bertocchi, and E. Allevi. Stochastic second-order cone programming in mobile ad hoc networks. J. Optim. Theory Appl., 143:309–328, 2009.

[25] S. Mehrotra and M. G. Ozevin. Decomposition-based interior point methods for two-stage stochastic semidefinite programming. SIAM J. of Optimization, 18(1):206–222, 2007.

[26] S. Mehrotra and M. G. Ozevin. On the implementation of interior point decomposition algorithms for two-stage stochastic conic programs. SIAM J. Optim., 19(4):1846–1880, 2008.

[27] S. Mehrotra and M. G. Ozevin. Decomposition based interior point methods for two-stage stochastic convex quadratic programs with recourse. Operations Research, 57(4):964–974, 2009.

[28] R. D. Monteiro. Primal-dual path-following algorithms for semidefinite programming. SIAM J. Optim., 7:663–678, 1997.

[29] Yu. E. Nesterov and A. S. Nemirovskii. Conic formulation of a convex programming problem and duality. Optimization Methods and Software, 1(2):95–115, 1992.

[30] Yu. E. Nesterov and A. S. Nemirovskii. Interior Point Polynomial Algorithms in Convex Programming. SIAM, Philadelphia, PA, USA, 1994.

[31] A. Prekopa. On probabilistic constrained programming. Proceedings of the Princeton Symposium on Mathematical Programming, 113–138, Princeton Univ. Press, Princeton, NJ, USA, 1970.

[32] A. Prekopa. Stochastic Programming. Kluwer Academic Publishers, Boston, MA, USA, 1995.

[33] A. Prekopa. The use of discrete moment bounds in probabilistic constrained stochastic programming models. Ann. Oper. Res., 85:21–38, 1999.

[34] B. K. Rangarajan. Polynomial convergence of infeasible-interior-point methods over symmetric cones. SIAM J. of Optimization, 16(4):1211–1229, 2006.

[35] S. H. Schmieta and F. Alizadeh. Associative and Jordan algebras, and polynomial time interior point algorithms for symmetric cones. Mathematics of Operations Research, 26(3):543–564, 2001.

[36] S. H. Schmieta and F. Alizadeh. Extension of primal-dual interior point methods to symmetric cones. Math. Program., Ser. A, 96:409–438, 2003.

[37] P. Sun and R. M. Freund. Computation of minimum-cost covering ellipsoids. Operations Research, 52(5):690–706, 2004.

[38] M. J. Todd. Semidefinite optimization. Acta Numerica, 10:515–560, 2001.

[39] J. A. Tompkins, J. A. White, Y. A. Bozer, and J. M. Tanchoco. Facilities Planning. 3rd edn. Wiley, Chichester, 2003.

[40] P. M. Vaidya. A new algorithm for minimizing convex functions over convex sets. Math. Program., Ser. A, 73:291–341, 1996.

[41] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Rev., 38:49–95, 1996.

[42] R. J. Vanderbei and H. Yurttan. Using LOQO to solve second-order cone programming problems. Report SOR 98-09, Princeton University, Princeton, NJ, USA, 1998.

[43] D. S. Watkins. Fundamentals of Matrix Computations. 3rd edn. Wiley, New York, 2010.

[44] R. J-B. Wets. Stochastic programming: Solution techniques and approximation schemes. In Mathematical Programming: The State of the Art, A. Bachem, M. Grötschel, and B. Korte, eds., Springer-Verlag, Berlin, 566–603, 1982.

[45] M. M. Wiecek. Advances in cone-based preference modeling for decision making with multiple criteria. Decision Making in Manufacturing and Services, 1(1-2):153–173, 2007.

[46] F. Zhang. Quaternions and matrices of quaternions. Linear Algebra Appl., 251:21–57, 1997.

[47] Y. Zhang. On extending some primal-dual interior-point algorithms from linear programming to semidefinite programming. SIAM J. Optim., 8:365–386, 1998.

[48] G. Zhao. A log-barrier method with Benders decomposition for solving two-stage stochastic linear programs. Math. Program., Ser. A, 90:507–536, 2001.

[49] Y. Zhu and K. A. Ariyawansa. A preliminary set of applications leading to stochastic semidefinite programs and chance-constrained semidefinite programs. Applied Mathematical Modelling, 35(5):2425–2442, 2011.