Comprehensive Robustness via Moment-Based Optimization: Theory and Applications
by
Jonathan Yu-Meng Li
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Department of Mechanical and Industrial Engineering, University of Toronto
Copyright © 2012 by Jonathan Yu-Meng Li
Abstract
Comprehensive Robustness via Moment-Based Optimization: Theory and Applications
Jonathan Yu-Meng Li
Doctor of Philosophy
Department of Mechanical and Industrial Engineering
University of Toronto
2012
The use of a stochastic model to predict the likelihood of future outcomes forms an
integral part of decision optimization under uncertainty. In classical stochastic modeling,
uncertain parameters are often assumed to be driven by a particular form of probability
distribution. In practice, however, the distributional form is often difficult to infer from
the observed data, and the incorrect choice of distribution can lead to significant quality
deterioration of resultant decisions and unexpected losses. In this thesis, we present
new approaches for evaluating expected future performance that do not rely on an exact
distributional specification and can be robust against the errors related to committing to
a particular specification. The notion of comprehensive robustness is promoted, where
various degrees of model misspecification are studied. These include fundamental ones,
such as an unknown distributional form, and more involved ones, such as stochastic moments
and moment outliers. The approaches are developed based on the techniques of moment-based
optimization, where bounds on the expected performance are sought based solely
on partial moment information. They can be integrated into decision optimization and
generate decisions that are robust against model misspecification in a comprehensive
manner. In the first part of the thesis, we extend the applicability of moment-based
optimization to incorporate new objective functions such as convex risk measures and
richer moment information such as higher-order multivariate moments. In the second
part, new tractable optimization frameworks are developed that account for various forms
of moment uncertainty in the context of decision analysis and optimization. Financial
applications such as portfolio selection and option pricing are studied.
To my love, Lily,
and to my parents, John and Jean Li.
Acknowledgements
Along the course of my PhD, I have been fortunate to have the guidance and support of
a number of professors. My supervisor, Professor Roy H. Kwon, has been a role model
for implementing the principle of “being passionate about what you do”. His energy and
creativity have been a key ingredient in initiating many of our research discussions. The
past year, which was filled with the stress of looking for an academic job placement,
would have been much more difficult without his support and advice.
I would like to thank my committee members, Timothy Chan, Samir Elhedhli, Sebas-
tian Jaimungal, and Yuri Lawryshyn for their valuable time, and insightful comments and
suggestions. Special thanks go to Sebastian, who brought up the idea of penalty-based
optimization in my second year seminar, which later led to the fruitful developments
in Chapter 4. Special thanks also go to Samir who, although already occupied by
many leadership duties, still kindly agreed to be my external examiner, to read my thesis,
and to adjust his schedule to attend my defense.
I am indebted to Professor Michael J. Best and Professor Tamas Terlaky for their
continuous support for many years since my Masters studies. Tamas introduced me to
the field of mathematical programming, whereas Michael opened my eyes in the area of
financial optimization. In particular, I would like to thank Michael for going out of his
way to help me on many occasions.
I would also like to thank my Masters supervisor, Antoine Deza, who helped me
transition from someone with a pure physics background to someone pursuing a research
career in Operations Research.
I have had the pleasure of sharing my PhD years with numerous good friends. I met
many of them during the development of the University of Toronto Operations Research
Group (UTORG), in particular, Mike and Kimia from the very first day. Special thanks
to Mike, Kimia, Velibor, and Jenya, who have made UTORG a family and not just a
research group to me. I am also thankful to Steve who made my start at U of T easier,
and to Tim who gave me much advice during my job search.
Last, yet most important, I owe my family all the love and gratitude. For many
years, I have been away from my parents and have always been “too busy” to go back
to visit them. They, however, never ceased supporting me in any form they could.
They and my sister, Ann, continuously prayed for me, and my Lord, Jesus Christ, has
always provided me with more than I could ever expect. Beyond any doubt, without their
unconditional love I would not have been able to make it this far. I reserve this very last
part of the acknowledgement for a special person whom I met during the journey of graduate
studies; ever since, she has been not only a part of my family but a kindred spirit.
Without her companionship, I would have lost the true taste of life. Thank you, Lily. Thank
you for your love, compassion, and support.
Contents
1 Introduction and Thesis Outline
  1.1 Introduction
  1.2 Thesis Outline and Contribution
2 Moment Problems, Tractable Counterparts, and Application
  2.1 Moment Problems
  2.2 Application in Model-Risk Management
    2.2.1 Market Price-Based Convex Risk Measures
    2.2.2 A Moment-Based Distribution-Free Optimization Approach
    2.2.3 Numerical Examples
  2.3 Tractability of Accounting for Multivariate Moment Information
  2.4 Conclusion
3 Accounting for Stochastic Moments
  3.1 Deterministic Semidefinite Optimization Models
  3.2 A Stochastic Semidefinite Optimization Approach
  3.3 Solution Features
  3.4 Application in Bounding Option Prices
    3.4.1 A Moment-Based Lattice under Regime Switching
    3.4.2 Implementation and Experiments
  3.5 Conclusion
4 Distributionally Robust Optimization under Extreme Moment Uncertainty
  4.1 Moment Outliers
  4.2 Comprehensive Distributionally Robust Optimization
  4.3 General Complexity Results
  4.4 Connection with Classical Minimax Approaches
  4.5 Semidefinite Optimization Reformulations
    4.5.1 Variations of Moment Uncertainty Structures
    4.5.2 Extensions to Factor Models
  4.6 Application in Portfolio Selection
    4.6.1 Portfolio Selection under Model Uncertainty
    4.6.2 Implementation and Experiments
  4.7 Conclusion
5 Conclusion and Future Research
A Additional Tables
  A.1 Tables of Section 2.2.3
  A.2 Tables of Section 3.4.2
B Additional Figures
  B.1 Section 3.4.2
Bibliography
List of Tables
2.1 ϑ(V∗) of Qfin for various values of parameters s′, K, τ
2.2 ϑ(V∗) of Qmom for various values of parameters s′, K, τ, where w′ = 1
3.1 Pseudo code for scenario generation
4.1 Comparison of different approaches in the period: 1997/01-2003/12
4.2 Comparison of different approaches in the period: 2004/01-2010/12
4.3 Comparison of different approaches in the period: 2004/01-2007/06
4.4 Comparison of different approaches in the period: 2007/06-2010/12
A.1 CB (resp. CM) denotes the call option price of the diffusion (resp. jump-diffusion) model with Lo’s specification. Cb denotes the call option prices of the benchmark model, i.e. the jump-diffusion model with k = 1, φ2 = 0.15, λ = 0.25
A.2 ϑ(V∗) of Qmom for various values of parameters s′, K, τ, where w′ = 2
A.3 ϑ(V∗) of Qmom for various values of parameters s′, K, τ, where w′ = 5
A.4 Upper/lower bounds and prices for different strike prices K, b+(b−)-values and time to maturity under 2 regimes
List of Figures
3.1 Regime switching lattices
3.2 The case of 2 regimes and K = 1200
3.3 The case of 2 regimes and K = 1325
3.4 The case of 2 regimes and K = 1400
4.1 Cumulative wealth
4.2 Cumulative wealth
B.1 The case of 3 regimes and K = 1325
B.2 The case of 4 regimes and K = 1325
B.3 The case of 5 regimes and K = 1325
Chapter 1
Introduction and Thesis Outline
1.1 Introduction
Consider a portfolio manager who follows a forecast of return distributions to determine
the optimal investment policy. What if the forecast model poorly represents the realized
returns? Is the resultant policy still reliable, or could it actually lead to unexpected
losses?
Central to the advance of the decision sciences has been the development of probability and
optimization theories that enable decision makers to model the randomness of decision
environments and make decisions that best utilize available data. Stochastic Optimization
(SO), for example, has been a popular decision optimization tool that allows parameters
in decision optimization problems to be modeled as random variables and be driven by a
probabilistic model. These quantitative approaches, however, have lately been found
unreliable due to deficiencies in the probabilistic models used to capture today’s extremely
volatile environments. A classic example is the 2008 financial crisis, where the failure
to model extreme correlations led to devastating global losses.
The challenge today of applying mathematical modeling in decision making can perhaps
be best summarized by a quote from the famous statistician George Box, “Essentially,
all models are wrong, but some are useful”. In particular, decision makers often face
challenges in deciding which probability distribution to employ in their decision analysis.
The information that they can acquire about the underlying probability distribution is
often fairly limited. For example, in financial decision making, one rarely has full in-
formation about the joint distribution of asset returns but only partial information such
as first and second order moments [Popescu, 2007]. In such cases, it is often tempting
to assume a particular distributional form such as a multivariate normal distribution in
evaluating expected future performance. This however can be potentially misleading and
often underestimates the true level of downside performance. In addition to the difficulty
of specifying an exact distributional form, in practice even moments of the underlying
distribution can be hard to estimate accurately. It has been found in time series studies
that moments are in many cases stochastic and change over time. Overlooking this level
of uncertainty can also lead to a false sense of risk exposure, since the estimated
volatility level (second order moment) can be completely different from the realized one.
The theme of this thesis is to develop more robust decision analysis by taking into
account all levels of uncertainty associated with distributional form and moments in
evaluating expected future performance. An evaluation approach and associated decision
analysis are considered robust if the evaluation relies only on partial information that
decision makers could acquire about the distribution, and is not sensitive to a particular
realization of distributional form or moments. In particular, throughout this thesis we
consider the following three layers of uncertainty associated with distributional forms and
moments:
1. Distribution uncertainty with fixed moments
2. Distribution uncertainty with stochastic moments
3. Distribution uncertainty with extreme moments (moment outliers)
The above three forms of uncertainty share a common feature: none of them assumes
any particular form of the distribution. Additional complexities are introduced in the
latter layers as richer forms of moment uncertainty are considered in addition to the
uncertainty of distributional forms. In the second layer, moments are considered random
and governed by a finite-state stochastic model, where each state corresponds to a
possible realization of the moments. This layer of uncertainty can be useful, for example,
to model the random switching of moments exhibited in many time series. The third
layer of uncertainty is motivated by radical behavior in modern decision environments,
where the changes of moments can be fully unpredictable based on available historical
data. For example, many crises from a statistical point of view are outliers, which are
highly improbable but often have devastating impact. In these crises, volatility often
soars to an unprecedented level. Such a layer of uncertainty can also be applied in
cases where only a limited amount of data is available to estimate moments. In these cases,
the estimated range of moments may fail to capture the true moments, and the need
arises to model the true moments as outliers. The idea of accounting for the above three
layers of uncertainties, which include all plausible realizations of distributional forms and
moments, in decision analysis and optimization constitutes the notion of comprehensive
robustness promoted in this thesis. The resultant analysis of performance evaluation
and optimal decision making is expected to be minimally impacted by all conceivable
misspecifications of the underlying probability/stochastic model.
The research challenges here are multifold. First and foremost, since no distributional
form is assumed in any of these layers of uncertainty, there are infinitely many probability
measures involved in evaluating expected performance. In the first layer of uncertainty
for example, an uncountable set of probability distributions that are consistent with given
moments is considered. This raises the following questions: What are the best-possible
estimates on the expected performance inferred from such a set, and how efficiently can
we generate the estimates? The questions become more challenging when the moment
information used to characterize the distribution set can only be provided in a stochastic
manner or even incomplete as outlined in the second and third layer of uncertainty.
Second, there is no clear rule for how the extreme moments described in the third layer
should be handled. Clearly, the values of the extreme moments involved in evaluating
expected performance cannot be arbitrary, as this would lead to meaningless evaluations.
This raises the question: how should we decide which extreme moments matter most in
the evaluation, and discard the rest? Lastly, in the context of decision optimization, it
is essential to investigate if decisions based on the new performance-evaluation approaches
can be optimized in a tractable manner.
The methodology developed in this thesis can be viewed, from a modeling perspective,
as an application of moment problems that arise in probability theory. Classical
moment problems are concerned with deriving conditions for the existence of a probability
measure that matches a given sequence of moments. In a more generalized setting,
the evaluation of a certain expected-value functional is sought based on a sequence of moments.
Such evaluation typically involves infinitely many probability measures that satisfy the
same set of moments, and naturally leads to the problems of deriving upper and lower
bounds on the expected-values over the set of measures. While such a generalized setting
is exactly the framework we consider to find the best-possible estimates on the expected
performance inferred from moment information, the bounds derived based on probability
theory can however be too loose to be informative in decision analysis. Our approach to
derive tight bounds instead hinges on the connection between modern conic optimization
theory and moment problems. In particular, the exploitation of various optimization tools
such as duality and semi-definite optimization theories enables us not only to generate
sharp bounds in moment problems, but also to provide theoretical evidence that the best-
possible estimate can be generated efficiently. The main contribution of this thesis lies in
extending the applicability of these moment-based optimization approaches to resolve the
aforementioned research challenges in developing the notion of comprehensive robustness.
1.2 Thesis Outline and Contribution
The thesis is organized as follows.
Chapter 2. Moment Problems, Tractable Counterparts, and Ap-
plication
In this chapter we begin with a brief introduction to moment problems and their connection
with modern optimization theory. Two main streams of conic optimization approaches
that tackle the problems from distinct perspectives, one from the primal and another
from the dual perspective, are reviewed. We first present the application of the dual
approach in developing a special form of risk measures used for measuring the impact
of model uncertainty in derivative pricing. The new application is accompanied with
numerical studies that highlight the benefit of using a moment-based setting. We then
present new tractability results in generating tight (the tightest) bounds for moment
problems involving higher order marginal moments.
Chapter 3. Accounting for Stochastic Moments
We consider a new setting of moment problems, where moments are stochastic and driven
by a finite-state stochastic model that captures the dynamics of a non-stationary decision
environment. To account for distribution information about the states and associated
moments, we present two-stage stochastic semidefinite optimization models as robust
counterparts of semidefinite optimization models arising from moment problems with
fixed moments. The framework is comprehensive in the sense that it includes as special
limiting cases the deterministic and robust optimization counterparts. The central result
is a closed-form solution for the optimal value of the proposed optimization model, which
is equivalent to a Value at Risk quantity. The framework is applied in the area of
option pricing to derive upper and lower bounds for the price of a European-style call
option under regime switching, where only conditional moments of regime switching
distributions are assumed. Computational experiments using the S&P 500 index as the
underlying asset are performed that illustrate the advantages of the two-stage stochastic
programming approach over the deterministic strategy.
Chapter 4. Distributionally Robust Optimization under Ex-
treme Moment Uncertainty
The focus of this chapter is to develop a new robust formulation of stochastic program-
ming problems in the presence of rare but high-impact realizations of moment uncertainty.
Such an extreme form of moment uncertainty can be treated as moment outliers, which
are difficult to infer from historical data. Prior robust formulations hedge moment un-
certainty by assuming a fixed range of values that moments can possibly fall into; this,
however, cannot effectively account for moment outliers. Our robust model can be seen as
a moment-based extension from classical penalized minimax frameworks, where a penalty
function is re-designed to account for extreme moment uncertainty. We prove that under
very mild conditions, the decision optimization model is guaranteed to be solvable in
a tractable manner, and show that for a wide range of specifications the model can be
recast as a semidefinite program (SDP) and solved very efficiently. The framework is then
applied to portfolio selection problems. Computational experiments based on real-life
market data are presented, which highlight the utility of our approach during financial
market turmoil.
Chapter 2
Moment Problems, Tractable
Counterparts, and Application
In this chapter, we review and exploit the theory of moment problems to generate
best-possible estimates on expected performance when no distributional information except
a finite number of moments is available for the underlying distribution. Moment
problems have their roots in probability theory. For example, fundamental probability
inequalities such as Markov’s and Chebyshev’s attempt to derive bounds on
the probability of certain events based only on the mean and/or variance of an underlying
random variable. In decision analysis, risk-averse decision makers are often keen to esti-
mate the worst-case expected performance based on available distributional information
such as moments. As the performance measures can be of all sorts, the estimation of the
worst-case performance naturally leads to a more generalized setting of moment problems,
where bounds based on moments are sought on a wide range of expected functionals.
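To make such a bound concrete, recall that Chebyshev's inequality, $P(|\xi - \mu| \ge k\sigma) \le 1/k^2$, is the tightest bound obtainable from the mean and variance alone: a three-point distribution attains it with equality. A minimal sketch in Python (illustrative values, not taken from the thesis):

```python
# Chebyshev's inequality P(|X - mu| >= k*sigma) <= 1/k**2 is tight:
# the three-point distribution below matches the given mean and variance
# and attains the bound with equality. (Illustrative values only.)
def extremal_three_point(mu, sigma, k):
    """Distribution on {mu - k*sigma, mu, mu + k*sigma} attaining Chebyshev's bound."""
    p = 1.0 / k**2                    # total tail mass, split evenly over the two extremes
    return [(mu - k * sigma, p / 2), (mu, 1 - p), (mu + k * sigma, p / 2)]

dist = extremal_three_point(mu=0.0, sigma=1.0, k=2.0)
mean = sum(x * p for x, p in dist)
var = sum((x - mean) ** 2 * p for x, p in dist)
tail = sum(p for x, p in dist if abs(x - mean) >= 2.0)
print(mean, var, tail)  # mean 0, variance 1, tail probability exactly 1/4
```

Since a distribution consistent with the moment information attains the bound, no tighter bound is possible without further moment information.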
The development of these bounds involves two fundamental questions: How tight are
the bounds? Is it tractable (analytically or computationally) to generate tight bounds?
In the cases where the tightest bounds can be generated, i.e. a probability distribution
exists that attains the bound, the bounds can be informative as they represent
the best-possible estimates of the expected performance that can be inferred from just
moment information. Developing such bounds can also be viewed as a robust way to
estimate certain expected-values. It does not rely on a particular form of distribution
and thus the estimates are free from the errors related to committing to a particular
distributional form. From here on, for simplicity, we may call such bounds moment-based
bounds.
The focus of this chapter is to study the tractability of generating tight or the tightest
moment-based bounds via modern optimization theory. The idea of using optimization
theory such as duality theory to formulate a dual optimization problem for which the opti-
mal value attains the tightest moment-based bound can be traced back to the earlier work
of Isii (1960). Smith (1995) then shed new light on the synthesis of the dual optimization
problem, computational strategies, and applications in decision analysis. A major break-
through was by Bertsimas et al. (2000) and Bertsimas and Popescu (2002) who exploited
modern conic optimization theory to show that a large class of moment-based bounds can
be efficiently computed by reformulating the corresponding dual optimization problems
as semidefinite programming problems (SDP). Another related, but more generalized
approach using semidefinite programming in generating moment-based bounds was pro-
posed by Lasserre (2001). Instead of relying on the use of duality theory to seek the
tightest bound, which may not always be feasible, Lasserre resorted to the theory related
to the characterization of moment sequences and developed SDP relaxation techniques
that tighten the bounds by a hierarchy of SDP relaxation problems.
In Section 2.1, we present the problem of moments and review briefly the modern
solution approaches proposed by Bertsimas et al. (2000), Bertsimas and Popescu (2002),
and Lasserre (2001). In the later sections, we will extend the applicability of these
approaches and present new tractability results. In Section 2.2, the application in
model-risk management is presented, where we develop new moment-based convex risk measures.
In Section 2.3, new SDP reformulations are presented for computing tight and the tightest
moment-based bounds that account for higher-order multivariate moments.
2.1 Moment Problems
Let $(\Re^n, \mathcal{B}, Q)$ denote a probability space, where $\mathcal{B}$ is the Borel $\sigma$-algebra on $\Re^n$. Suppose
that the expected value of $h(\xi)$ needs to be estimated, where $\xi$ denotes an $n$-dimensional
random vector and $h : \Re^n \to \Re$. If complete knowledge of the probability
measure $Q$ is available, this leads to the evaluation of the integral
$$E_Q[h(\xi)] := \int_C h(\xi)\, dQ(\xi), \qquad (2.1)$$
where $\xi \in \Re^n$ and $C$ denotes the support of $Q$. However, in most cases only partial
moment information is available for the measure $Q$: $E_Q[\phi_j(\xi)] = b_j$, $j = 1, \dots, J$, where each $\phi_j$
is a polynomial function of $\xi$. Since the measure in these cases cannot be uniquely
determined, the best we can do to evaluate the integral is to find the tightest possible
bounds on it. We thus arrive at the following optimization problem, also known as the
generalized moment problem (c.f. [Lasserre, 2010]):
$$\max_Q \left(\min_Q\right) \quad E_Q[h(\xi)] \quad \text{subject to} \quad E_Q[\phi_j(\xi)] = b_j, \quad j = 1, \dots, J. \qquad (2.2)$$
Isii (1960, 1963) was the first to apply duality theory to study the above problem. Following
the spirit of linear programming duality, we can derive the dual problem of (2.2):
$$\min_z \left(\max_z\right) \quad z^T b \quad \text{subject to} \quad z^T \phi(\xi) \ge h(\xi), \ \forall \xi \in C, \qquad (2.3)$$
where $b$ (resp. $\phi(\xi)$) denotes the vector form of $b_j$ (resp. $\phi_j(\xi)$). The most powerful
connection between solving the dual problem (2.3) and generating the tightest
moment-based bound (2.2) is established via the following strong duality result by Isii (1963). The
theorem implies that under a mild condition, if the problem (2.3) can be solved exactly,
its optimal value is the tightest moment-based bound.
Theorem 2.1.1. [Isii, 1963] If the vector of moments $b$ is interior to the feasible moment
set $\mathcal{M} = \{E[\phi(\xi)] \mid \xi \text{ follows an arbitrary multivariate distribution}\}$, then strong duality holds.
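To illustrate the dual constraint $z^T\phi(\xi) \ge h(\xi)$, take $h(\xi) = 1\{\xi \ge k\}$, $\phi = (1, \xi, \xi^2)$, and moments $b = (1, 0, 1)$ (mean 0, variance 1). The quadratic $(k\xi + 1)^2/(k^2+1)^2$ majorizes the indicator, so the dual objective $z^T b = 1/(1+k^2)$ is a valid upper bound on $P(\xi \ge k)$; this is the classical one-sided Chebyshev (Cantelli) bound, which is in fact tight. A small numeric sanity check (illustrative values, not from the thesis):

```python
# Dual feasibility check for h(x) = 1{x >= k} with phi = (1, x, x^2)
# and moments b = (1, 0, 1) (mean 0, variance 1). The quadratic
# g(x) = (k*x + 1)^2 / (k^2 + 1)^2 majorizes h, so z^T b = 1/(k^2 + 1)
# is a valid upper bound (Cantelli's inequality); it is in fact tight.
k = 2.0
z = (1 / (k**2 + 1) ** 2, 2 * k / (k**2 + 1) ** 2, k**2 / (k**2 + 1) ** 2)

def g(x):                      # the dual polynomial z0 + z1*x + z2*x^2
    return z[0] + z[1] * x + z[2] * x * x

def h(x):                      # indicator of the event {x >= k}
    return 1.0 if x >= k else 0.0

# g majorizes h on a grid of the support, so z is dual feasible
assert all(g(x) >= h(x) - 1e-12 for x in [i / 100.0 for i in range(-500, 501)])

b = (1.0, 0.0, 1.0)            # moments E[1], E[x], E[x^2]
bound = sum(zj * bj for zj, bj in zip(z, b))
print(bound)                   # the bound 1/(k^2 + 1)
```

By weak duality, the value of any feasible dual solution upper-bounds the primal optimum; Theorem 2.1.1 guarantees that the best such solution closes the gap entirely.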
While solving the dual problem (2.3) is known to be NP-hard in its generic form
[Bertsimas and Popescu, 2005], Bertsimas and Popescu (2000, 2002) showed that there is
a large class of instances motivated from real-life applications that can be solved in poly-
nomial time and in a practically efficient manner. Their results hinge on the findings that
a wide range of constraint forms in (2.3) can be reformulated as semidefinite constraints,
also known as semidefinite representable [Ben-Tal and Nemirovski, 2001]. Recall that
a convex set $K \subset \Re^{n'}$ is called semidefinite representable (SDr) if it can be expressed
as $K = \{k^* \mid \exists t^* : A(k^*, t^*) - B \succeq 0\}$, where $A$ denotes a linear operator, $B$ denotes a
constant matrix, and the notation $\succeq 0$ indicates that the left-hand side of the expression is
a positive semidefinite matrix. In addition, a convex function $f^* : \Re^{n'} \to \Re \cup \{\infty\}$ is called SDr if
its epigraph $\{(k^*, t^*) \mid f^*(k^*) \le t^*\}$ is an SDr set. Here we present as an example one of
the key results related to semidefinite reformulations of (2.3) in [Bertsimas and Popescu,
2002] when both functions h and φj in (2.2) are univariate polynomial functions. The
result will also be used in the latter sections.
Proposition 2.1.1. [Bertsimas and Popescu, 2002], [Gotoh and Konno, 2002]

1. The polynomial $g(x) = \sum_{r=0}^{n} y_r x^r$ satisfies $g(x) \ge 0$ for all $x \in [0, a)$ if and only if
there exists a positive semidefinite matrix $X = [x_{ij}]_{i,j=0,\dots,n}$ such that
$$0 = \sum_{i,j:\, i+j=2l-1} x_{ij}, \quad l = 1, \dots, n,$$
$$\sum_{r=0}^{l} y_r \binom{n-r}{l-r} a^r = \sum_{i,j:\, i+j=2l} x_{ij}, \quad l = 0, \dots, n.$$

2. The polynomial $g(x) = \sum_{r=0}^{n} y_r x^r$ satisfies $g(x) \ge 0$ for all $x \in [a, b]$ if and only if
there exists a positive semidefinite matrix $X = [x_{ij}]_{i,j=0,\dots,n}$ such that
$$0 = \sum_{i,j:\, i+j=2l-1} x_{ij}, \quad l = 1, \dots, n,$$
$$\sum_{m=0}^{l} \sum_{r=m}^{n+m-l} y_r \binom{r}{m} \binom{n-r}{l-m} a^{r-m} b^{m} = \sum_{i,j:\, i+j=2l} x_{ij}, \quad l = 0, \dots, n.$$

3. The polynomial $g(x) = \sum_{r=0}^{n} y_r x^r$ satisfies $g(x) \ge 0$ for all $x \in [a, \infty)$ if and only if
there exists a positive semidefinite matrix $X = [x_{ij}]_{i,j=0,\dots,n}$ such that
$$0 = \sum_{i,j:\, i+j=2l-1} x_{ij}, \quad l = 1, \dots, n,$$
$$\sum_{r=l}^{n} y_r \binom{r}{l} a^{r-l} = \sum_{i,j:\, i+j=2l} x_{ij}, \quad l = 0, \dots, n.$$
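The "if" direction of case 3 (specialized to $a = 0$) admits a quick numeric sanity check: for a positive semidefinite $X$ whose odd anti-diagonals sum to zero, the even anti-diagonal sums $y_l$ define a polynomial with $g(x^2) = z^T X z$ for $z = (1, x, \dots, x^n)$, hence $g \ge 0$ on $[0, \infty)$. A sketch with an arbitrarily chosen PSD matrix (illustrative, not from the thesis):

```python
# Sketch of Proposition 2.1.1 (case 3, with a = 0): a PSD matrix X whose
# odd anti-diagonals sum to zero certifies g(x) = sum_r y_r x^r >= 0 on
# [0, inf), where y_l is the sum of the 2l-th anti-diagonal of X.
# Here X = v v^T + w w^T is PSD by construction (illustrative choice).
def antidiag_sums(X):
    n = len(X)
    return [sum(X[i][j] for i in range(n) for j in range(n) if i + j == s)
            for s in range(2 * n - 1)]

v, w = [1.0, 0.0, 2.0], [0.0, 1.0, 0.0]
X = [[v[i] * v[j] + w[i] * w[j] for j in range(3)] for i in range(3)]

s = antidiag_sums(X)                 # s[2l] = y_l; s[2l-1] must vanish
assert all(abs(s[2 * l - 1]) < 1e-12 for l in (1, 2))
y = [s[0], s[2], s[4]]               # coefficients of the certified g

g = lambda x: sum(c * x**r for r, c in enumerate(y))
assert all(g(i / 10.0) >= -1e-12 for i in range(0, 200))
print(y)
```

The certificate works because $z^T X z = \sum_s \big(\sum_{i+j=s} x_{ij}\big) x^s = g(x^2) \ge 0$ whenever $X \succeq 0$.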
Since the work of [Bertsimas et al., 2000], [Bertsimas and Popescu, 2002], and [Bert-
simas and Popescu, 2005], the idea of applying conic optimization theory to efficiently
generate the tightest moment-based bounds has been considerably generalized (see
[Zuluaga and Pena, 2005]). In Section 2.3, we will present new tractability results for a
special class of moment problems, where random variables are multivariate and marginal
moment information is incorporated.
Generating the tightest moment-based bounds, while most desirable, is not always
computationally tractable. Lasserre (2001) proposed an alternative conic optimization
approach, based on moment relaxation techniques, to solve moment problems approximately.
Instead of taking a dual perspective on the moment problem, Lasserre's approach directly tackles
the primal form of the problem, i.e. (2.2), using a “change of variables”-type method. The
idea is to replace each monomial moment $E_Q[\prod_{i=1}^{n} \xi_i^{p_i}]$ by a new scalar $u_p \in \Re$,
where $p = (p_1, \dots, p_n) \in \mathbb{Z}_+^n$ is an index variable and $|p| = p_1 + \cdots + p_n$. In the cases
that functions $h$ and $\phi_j$ in (2.2) are polynomials, the problem (2.2) can be reformulated
as follows:
$$\max_u \left(\min_u\right) \quad c^T u \quad \text{subject to} \quad \kappa_j^T u = b_j, \quad j = 1, \dots, J, \qquad (2.4)$$
where $u$ is a vector form of $u_p$, and $c$ (resp. $\kappa_j$) denotes the coefficients of the polynomial
$h$ (resp. $\phi_j$). Clearly, the above problem is a relaxation of the problem (2.2). To ensure
the bounds generated from the above relaxation can be reasonably tight, Lasserre (2001)
further employed the notion of moment matrices to strengthen the relaxation.
Definition 2.1.1. The moment matrix $M_r(u)$ is defined by
$$M_r(u)(1, i^*) = M_r(u)(i^*, 1) = u^*_{i^*-1}, \quad \text{for } i^* = 1, \dots, 2r+1,$$
$$M_r(u)(1, j^*) = u^*_{\alpha^*} \ \text{and} \ M_r(u)(i^*, 1) = u^*_{\beta^*} \;\Rightarrow\; M_r(u)(i^*, j^*) = u^*_{\alpha^* + \beta^*},$$
where $\{u^*_{i^*} \in \mathbb{R} : i^* \in \mathbb{Z}_+\}$ is the sequence obtained by ordering $u$ so that it conforms with
the indexing implied by the usual basis
$$1, \ \xi_1, \dots, \xi_n, \ \xi_1^2, \ \xi_1\xi_2, \dots, \ \xi_1^{2r}, \ \xi_1^{2r-1}\xi_2, \dots, \ \xi_n^{2r} \qquad (2.5)$$
of the vector space of $\mathbb{R}$-valued polynomials in $n$ variables of degree at most $2r$.
It is easy to verify that if the sequence u is a feasible moment sequence, i.e. there
exists a probability measure having u as its moments, the corresponding moment matrix
Mr(u) must be positive semidefinite. The fact that such a semidefinite condition is also
sufficient for an infinite sequence $u = \{u_p : p \in \mathbb{Z}_+^n\}$ to be a feasible moment sequence
is established by Curto and Fialkow (1996).
Theorem 2.1.2. [Curto and Fialkow, 1996] For an infinite sequence u = up : |p| =∞,
if Mr(u) 0 and Mr(u) has finite rank r, then u has a unique r-atomic representing
measure.
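To make Definition 2.1.1 and Theorem 2.1.2 concrete, the sketch below (Python with NumPy; the support points and probabilities are hypothetical) builds the order-1 moment matrix M_1(u) of a small bivariate discrete distribution in the basis (1, ξ_1, ξ_2) and confirms it is positive semidefinite:

```python
import numpy as np

# A discrete bivariate distribution: support points and probabilities (hypothetical).
points = np.array([[0.0, 1.0], [2.0, 0.5], [1.0, 2.0]])
probs = np.array([0.2, 0.5, 0.3])

def moment(p1, p2):
    """u_p = E_Q[xi_1^p1 * xi_2^p2] under the discrete measure."""
    return float(np.sum(probs * points[:, 0]**p1 * points[:, 1]**p2))

# Monomial basis for r = 1: (1, xi_1, xi_2); entry (a, b) of M_1(u) pairs exponents.
basis = [(0, 0), (1, 0), (0, 1)]
M1 = np.array([[moment(a1 + b1, a2 + b2) for (b1, b2) in basis]
               for (a1, a2) in basis])

print(np.linalg.eigvalsh(M1).min() >= -1e-9)  # True: feasible moments give PSD M_r(u)
```

Since M_1(u) = E_Q[v v^T] with v = (1, ξ_1, ξ_2) for any genuine measure, positive semidefiniteness holds by construction here; an arbitrary scalar sequence u would generally fail this test.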
The semidefinite condition given in Theorem 2.1.2 is intractable from a computational
perspective due to the infinite size of the sequence. Lasserre (2001) proposed the use of a
truncated sequence and the associated moment matrix, i.e. fixing the value r in (2.5), to
develop a hierarchy of relaxation counterparts. By increasing the value r, the relaxation
problem (2.4) can be strengthened in a systematic manner. Such relaxation techniques are
powerful because each relaxation can be solved efficiently as a semidefinite programming
problem. Lasserre (2001) also proved several asymptotic convergence results as r → ∞.
Later, we will revisit Lasserre's type of approach to develop further tractability
results.
In the cases where necessary and sufficient conditions are both available for a finite
sequence u, the relaxation becomes exact. This is the case for univariate random
variables. The following theorem is due to Hamburger (1920, 1921).
Theorem 2.1.3. [Hamburger, 1920], [Hamburger, 1921] For univariate random variables
supported on the whole real line, a necessary and sufficient condition for a vector
u := [u_0, u_1, ..., u_{2r}] to be a feasible moment sequence is that it belongs to the
following set Ω, which is a positive semidefinite cone:

           ⎧     ⎡ u_0    u_1    · · ·  u_r     ⎤        ⎫
           ⎪     ⎢ u_1    u_2    · · ·  u_{r+1} ⎥        ⎪
    Ω :=   ⎨ u | ⎢  ⋮      ⋮     ⋱      ⋮       ⎥ ⪰ 0   ⎬.
           ⎪     ⎣ u_r    u_{r+1} · · · u_{2r}  ⎦        ⎪
           ⎩                                             ⎭
Other related positive semidefinite conditions for univariate random variables with
different ranges of support can be found in the early work of Stieltjes (1894, 1895).
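As a quick illustration of Theorem 2.1.3, the sketch below (Python with NumPy; the moment vectors are hypothetical) assembles the Hankel matrix of a candidate moment vector [u_0, ..., u_{2r}] and tests the positive semidefiniteness condition:

```python
import numpy as np

def hankel_moment_matrix(m):
    """Build the (r+1)x(r+1) Hankel matrix from raw moments m = [u_0, ..., u_{2r}]."""
    r = (len(m) - 1) // 2
    return np.array([[m[i + j] for j in range(r + 1)] for i in range(r + 1)])

def is_feasible_moment_sequence(m, tol=1e-9):
    """Hamburger condition: m is realizable on the real line iff its Hankel matrix is PSD."""
    return bool(np.linalg.eigvalsh(hankel_moment_matrix(m)).min() >= -tol)

# Raw moments of the standard normal: E[xi^k] = 0, 1, 0, 3 for k = 1, ..., 4.
print(is_feasible_moment_sequence([1.0, 0.0, 1.0, 0.0, 3.0]))  # True

# Infeasible: E[xi^4] = 0.5 < (E[xi^2])^2 = 1 would violate Jensen's inequality.
print(is_feasible_moment_sequence([1.0, 0.0, 1.0, 0.0, 0.5]))  # False
```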
2.2 Application in Model-Risk Management
In this section, we present an application of moment-based optimization in developing
a special form of risk measures used for measuring the impact of model uncertainty
in derivative pricing. The pricing of derivatives, such as options or futures, remains
challenging as modern markets more than ever are exhibiting complex and non-stationary
behaviors. In the classical Black-Scholes pricing formula [Black and Scholes, 1973], it is
assumed that the price dynamics of an underlying security follows a geometric Brownian
motion (GBM) with constant volatility. This assumption, however, is at odds with most
empirical findings, which indicate that the trend and the volatility of markets are in
general non-stationary.
Despite the rapid development of derivative pricing models, no single model
suits all cases. Practitioners often find it difficult to choose the right model
and are concerned about the possible losses associated with model misspecification.
Devastating financial losses due to derivative mispricing resulting from model
misspecification have long been reported. To help manage this type of risk, known as model
risk, Cont (2006) was the first to provide a comprehensive treatment of the design of
new risk measures for quantifying the impact of model risk. Such new measures can be
viewed as special instances of the popular convex risk measures, where the optimization
representation of convex risk measures is specialized with its solution space refined to a
set of derivative-pricing models (distributions) that are ambiguous to traders. In
particular, a market price-based penalty function was introduced in the measure that gives
higher preference to the pricing model that can better reproduce the market prices of
existing derivative instruments. In the rest of this section, we also call such measures
market price-based convex risk measures.
Cont gave examples that illustrate the evaluation of the risk measure based on finite
families of probability (pricing) measures. However, these families of measures often
require additional assumptions on the functional forms of the pricing distributions, which
can be very difficult to verify in practice. This can also lead to underestimation of the
impact of model misspecification, since the true pricing distribution may not even be
considered when evaluating the risk measures. In this section, we consider the case
of infinite families of measures that share common moments, e.g. mean and variance
for European-style options, and present a new approach to evaluate Cont’s convex risk
measures. Examples are given that illustrate the benefits of evaluating the risk measure
with infinite families of measures and shed light on the limitations of considering only
finite families of measures.
2.2.1 Market Price-Based Convex Risk Measures
We first briefly review the properties of convex risk measures and then provide some
background on Cont's market price-based risk measures. Given a sample space {ω : ω ∈ Ω},
let 𝒳 denote a linear space of bounded functions V : Ω → ℝ. Note that for any
V_1, V_2 ∈ 𝒳 the notation V_1 ≥ V_2 stands for the relation V_1(ω) ≥ V_2(ω) ∀ ω ∈ Ω. A
function ρ : 𝒳 → ℝ is called a convex risk measure if it satisfies the following axioms, for
all V_1, V_2 ∈ 𝒳:

1. if V_1 ≥ V_2, then ρ(V_1) ≤ ρ(V_2);

2. if c′ ∈ ℝ, then ρ(V_1 + c′) = ρ(V_1) − c′;

3. ρ(λ′V_1 + (1 − λ′)V_2) ≤ λ′ρ(V_1) + (1 − λ′)ρ(V_2) ∀ λ′ ∈ [0, 1].
The convexity property 3 plays a significant role as it supports the notion that
diversification typically helps to reduce risk. Under some mild conditions, Föllmer and
Schied (2002) show a particularly useful representation: any convex risk measure
ρ can be represented as

    ρ(V) = sup_{Q∈D} { E_Q[−V] − α(Q) },
where α : D → ℝ is a convex function.
Assume now that there exists a financial market within which derivative instruments
are traded. Consider a set of L financial instruments whose future payoffs are H_l : Ω →
ℝ, l = 1, ..., L, and whose current market prices are h_l ∈ ℝ, l = 1, ..., L. Then there is no
arbitrage opportunity if and only if there exists a probability measure Q such that
    E_Q[H_l] = h_l,   l = 1, ..., L,
holds. Such a Q may or may not be unique, depending on the assumptions of the mar-
ket. Detailed discussions of these assumptions and explanations are referred to relevant
literature. In the cases that Q is not unique or cannot be uniquely determined, traders
then face model uncertainty. In practice, based on their knowledge of the market, traders
usually specify a family of possible measures D for pricing a target payoff V* : Ω → ℝ; as
a consequence, the prices generated from different measures in D may not be the same.
One important source of information that helps in specifying a measure Q is the
market prices of derivative instruments traded in the market. In [Cont, 2006], these
options are called benchmark options, as their market prices can serve as a useful reference.
Thus, based on a set of benchmark options with payoffs (Hl)l=1,...,L and market prices
(h_l)_{l=1,...,L}, Cont suggests the following metric ϑ to quantify the uncertainty (risk) with
respect to a given target payoff V* and a set of pricing models D, where

    ϑ(V*) = π*(V*) − π_*(V*),

and

    π*(V*) = sup_{Q∈D} { E_Q[V*] − ||h* − E_Q[H*]|| },
    π_*(V*) = inf_{Q∈D} { E_Q[V*] + ||h* − E_Q[H*]|| },

and h* (resp. H*) denotes the aggregated vector form of (h_l)_{l=1,...,L} (resp. (H_l)_{l=1,...,L}).
The operator || · || is a norm over the aggregated vector space. More generally, the norm
function can be represented as

    ||h* − E_Q[H*]|| = Σ_{l=1}^{L} w′_l · |h_l − E_Q[H_l]|,
where w′_l denotes a penalty parameter. The upper (resp. lower) bound measure is closely
related to the convex risk measure ρ(V), where

    ρ(V*) = π*(−V*)   (resp. ρ(V*) = −π_*(V*)),

and the penalty function α(Q) is defined as α(Q) := ||h* − E_Q[H*]||. The upper/lower
bound measure without the penalty term simply evaluates the most extreme value of
derivative prices under each measure in the set D. With the addition of the
penalty term, each measure further takes into account the "calibration error" of each
possible Q, i.e. the capability of each measure Q to reproduce the market prices of the given
benchmark instruments. One of the most useful features of the penalty construction is
that, given a set of ambiguous pricing measures D, the metric ϑ requires only one measure
Q ∈ D, not all of D, to replicate the market prices of the benchmark options in order for
ϑ to be considered a "good" measure [Cont, 2006]. In other words, the metric ϑ can be
applied with almost no difficulty to additional measures that are difficult to calibrate,
alongside the initially specified measures (at least one of which is assumed to calibrate
sufficiently well to the benchmark prices). This is beneficial since it provides
the flexibility to incorporate a wider class of pricing measures into D, especially those
that may be more compatible with a trader's view of future market scenarios despite the
difficulty of calibration.
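To make the construction concrete, the sketch below (Python with NumPy; the price grid, measures, and prices are all hypothetical) evaluates ϑ(V*) = π*(V*) − π_*(V*) over a small finite family D of discrete pricing measures with the weighted L1 penalty:

```python
import numpy as np

# Hypothetical price grid for the underlying at maturity.
prices = np.array([30.0, 40.0, 50.0, 60.0])

# A finite family D of discrete pricing measures (each row sums to one).
D = np.array([
    [0.10, 0.50, 0.30, 0.10],
    [0.00, 0.50, 0.50, 0.00],
    [0.25, 0.25, 0.25, 0.25],
])

# One benchmark call (strike 40) with an assumed market price h and weight w'.
H = np.maximum(prices - 40.0, 0.0)
h, w = 5.0, 1.0

# Target payoff: call with strike 45.
V = np.maximum(prices - 45.0, 0.0)

def penalty(q):
    """Weighted L1 calibration error ||h - E_Q[H]||."""
    return w * abs(h - q @ H)

upper = max(q @ V - penalty(q) for q in D)   # pi^*(V*)
lower = min(q @ V + penalty(q) for q in D)   # pi_*(V*)
theta = upper - lower                        # model-uncertainty metric
print(upper, lower, theta)                   # 3.0 2.5 0.5
```

Here the first two measures both reproduce the benchmark price exactly (zero penalty) yet price the target differently, so ϑ = 0.5 > 0 reflects genuine model uncertainty, while the poorly calibrated uniform measure is discounted by its penalty.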
2.2.2 A Moment-Based Distribution-Free Optimization Approach
Here, we characterize the set of infinite families of probability measures that we will use
for the market price-based risk measures. Let ξ denote the random price of a single
underlying asset. The set D is specified through a set of moment conditions, i.e.
    D := { Q : E_Q[φ_j(ξ)] = b_j,  j = 1, ..., J },     (2.6)
where φ_j : ℝ → ℝ is continuous and b_j ∈ ℝ. The focus of this section is to provide a
method to reformulate the evaluation problems π*(V*) and π_*(V*) under (2.6) as convex
optimization problems so that they can be solved efficiently. Our approach is based on the
theories of semi-infinite and semi-definite programming. From here on, we assume that
the norm function || · || within the penalty function is semidefinite representable (SDr).
Many norm functions, including those discussed in [Cont, 2006], are SDr (cf. [Ben-Tal
and Nemirovski, 2001]). The following lemma is essential for our development.
Lemma 2.2.1. Consider the problems

    p_sup = sup_Q { ∫_C ψ(ζ) dQ(ζ) : ∫_C E(ζ) dQ(ζ) = E_0,  ∫_C dQ(ζ) = 1 },

and

    p_inf = inf_Q { ∫_C ψ(ζ) dQ(ζ) : ∫_C E(ζ) dQ(ζ) = E_0,  ∫_C dQ(ζ) = 1 },

where Q is a non-negative measure on the measurable space (ℝ^n, B), ψ : ℝ^n → ℝ and
E : ℝ^n → ℝ^m are continuous, and E_0 ∈ ℝ^m. The dual problems can be respectively
written as

    d_sup = inf_{λ_0, Λ_e} { λ_0 + Λ_e^T E_0 : λ_0 + Λ_e^T E(ζ) ≥ ψ(ζ)  ∀ ζ ∈ C }     (2.7)
          = inf_{Λ_e} sup_{ζ∈C} { ψ(ζ) − Λ_e^T E(ζ) } + Λ_e^T E_0,                     (2.8)

and

    d_inf = sup_{λ_0, Λ_e} { λ_0 + Λ_e^T E_0 : λ_0 + Λ_e^T E(ζ) ≤ ψ(ζ)  ∀ ζ ∈ C }     (2.9)
          = sup_{Λ_e} inf_{ζ∈C} { ψ(ζ) − Λ_e^T E(ζ) } + Λ_e^T E_0,                     (2.10)

where λ_0 ∈ ℝ and Λ_e ∈ ℝ^m. Then strong duality holds, i.e. p_sup = d_sup (p_inf = d_inf), if

    E_0 ∈ int( { ∫_C E(ζ) dQ(ζ) } ),     (2.11)

where the set is taken over all such measures Q.
Proof. One can derive the first dual reformulation (2.7) (resp. (2.9)) by following duality
theory for semi-infinite linear problems (cf. [Shapiro, 2001]). The second dual formulation
(2.8) (resp. (2.10)) can be derived by first converting the constraint into

    λ_0 ≥ sup_{ζ∈C} { ψ(ζ) − Λ_e^T E(ζ) }   (resp. λ_0 ≤ inf_{ζ∈C} { ψ(ζ) − Λ_e^T E(ζ) }).

Since the right-hand side of the above inequality provides a lower (resp. upper) bound
on λ_0, we can replace λ_0 in the objective function by sup_{ζ∈C} { ψ(ζ) − Λ_e^T E(ζ) } (resp.
inf_{ζ∈C} { ψ(ζ) − Λ_e^T E(ζ) }).
We present here a general framework to generate tractable reformulations for the
problems of evaluating π*(V*) and π_*(V*) under (2.6). Note that the target payoff V*
and the future payoffs H_l are functions of the random price ξ.
Theorem 2.2.1. Given that the interior condition (2.11) holds, the problems of evaluating
π*(V*) and π_*(V*) under (2.6) are equivalent to solving the following two problems:

    π* := inf  s + t
    s.t.  V*(ξ) − Σ_{j=1}^{J} λ_{mj}(φ_j(ξ) − b_j) − Σ_{l=1}^{L} λ_{hl} H_l(ξ) ≤ s   ∀ ξ ≥ 0,
          sup_{(q_l)_{l=1,...,L}} { Σ_{l=1}^{L} λ_{hl} q_l − ||h* − vec((q_l)_{l=1,...,L})|| } ≤ t,     (2.12)

    π_* := sup  s + t
    s.t.  V*(ξ) − Σ_{j=1}^{J} λ_{mj}(φ_j(ξ) − b_j) − Σ_{l=1}^{L} λ_{hl} H_l(ξ) ≥ s   ∀ ξ ≥ 0,
          inf_{(q_l)_{l=1,...,L}} { Σ_{l=1}^{L} λ_{hl} q_l + ||h* − vec((q_l)_{l=1,...,L})|| } ≥ t,     (2.13)

where for each problem (λ_{mj})_{j=1,...,J}, (λ_{hl})_{l=1,...,L}, s, t are variables, with λ_{mj}, λ_{hl}, s, t ∈ ℝ.
Furthermore, the constraints (2.12) and (2.13) are SDr. Note that vec(q_γ) denotes the
aggregated vector form of the set of scalar variables q_γ.
Proof. We present here only the reformulation of π*, since π_* can be reformulated in an
identical manner. We first introduce slack variables (q_l)_{l=1,...,L} ∈ ℝ and reformulate the
problem as follows:

    sup_{Q, (q_l)_{l=1,...,L}}  E_Q[V*(ξ)] − ||h* − vec((q_l)_{l=1,...,L})||
    subject to  E_Q[φ_j(ξ)] = b_j,  j = 1, ..., J,
                E_Q[H_l(ξ)] = q_l,  l = 1, ..., L.

Consider maximizing the above problem first with respect to the measure Q; based on
Lemma 2.2.1, the problem can be reformulated as

    sup_q inf_{λ_m, λ_h} sup_{ξ≥0}  V*(ξ) − Σ_{j=1}^{J} λ_{mj}(φ_j(ξ) − b_j)
                                    − Σ_{l=1}^{L} λ_{hl}(H_l(ξ) − q_l) − ||h* − q||,

where q := vec((q_l)_{l=1,...,L}), λ_m := vec((λ_{mj})_{j=1,...,J}) and λ_h := vec((λ_{hl})_{l=1,...,L}). Since the
operator sup_{ξ≥0} preserves convexity, the problem is concave with respect to (q_l)_{l=1,...,L} and
convex with respect to λ_m, λ_h. Therefore, using Sion's minimax theorem we can switch
(sup_q) and (inf_{λ_m,λ_h}) and arrive at

    inf_{λ_m, λ_h} sup_q sup_{ξ≥0}  V*(ξ) − Σ_{j=1}^{J} λ_{mj}(φ_j(ξ) − b_j)
                                    − Σ_{l=1}^{L} λ_{hl}(H_l(ξ) − q_l) − ||h* − q||.

By introducing slack variables s and t, the problem can be equivalently written as

    inf_{λ_m, λ_h, s, t}  s + t
    s.t.  sup_{ξ≥0} { V*(ξ) − Σ_{j=1}^{J} λ_{mj}(φ_j(ξ) − b_j) − Σ_{l=1}^{L} λ_{hl} H_l(ξ) } ≤ s,     (2.14)
          sup_q { Σ_{l=1}^{L} λ_{hl} q_l − ||h* − q|| } ≤ t.

Then the first constraint is equivalent to a feasibility constraint which requires that the
inequality in (2.14) holds for all ξ ≥ 0. Now consider the second constraint in the above
problem. By introducing a slack variable z′, the constraint can be reformulated as follows:

    sup_{q, z′}  Σ_{l=1}^{L} λ_{hl} · q_l − z′ ≤ t
    subject to  ||h* − q|| ≤ z′.

Given that the norm || · || is SDr, the problem can be reformulated as an SDP maximization
problem. By applying SDP duality theory, we can derive an equivalent SDP minimization
dual problem. Notice that the Slater condition holds for the above problem, and therefore
strong duality holds. The resulting constraint is then of the form min_{y∈S} c^T y ≤ t, where c
is a coefficient vector and S is an SDr set, which is equivalent to the condition

    ∃ y ∈ S : c^T y ≤ t.

This condition is a set of SDP constraints.
We now consider the problem of evaluating Cont’s convex risk measures for European
call/put options when only a finite number of moments are available for the underlying
security. Based on Proposition 2.1.1 and Theorem 2.2.1, we show in Corollary 2.2.1
that the problem can be reformulated as a semidefinite programming problem. Similar
settings of moment conditions can also be found in [Grundy, 1991], [Boyle and Lin, 1997],
[Bertsimas and Popescu, 2002], [Gotoh and Konno, 2002].
Corollary 2.2.1. Consider the evaluations of π* and π_* for a European call (put) option
with strike price K_0, given as benchmark options a set of European call options with
H_l(ξ) = max(0, ξ − K_l), l = 1, ..., o, and put options with H_l(ξ) = max(0, K_l − ξ),
l = o + 1, ..., L, where K_l ∈ ℝ_+. In the case that a vector of raw moments b_j, j = 1, ..., J is given,
the evaluation problems π* and π_* can be solved efficiently as semidefinite optimization
problems.
Proof. For brevity, we consider only reformulating the problem of evaluating π*. The
evaluation problem of π_* can be reformulated using the same approach. Based on Theorem
2.2.1, the only constraint that needs to be further reformulated is

    V*(ξ) − Σ_{j=1}^{J} λ_{mj}(φ_j(ξ) − b_j) − Σ_{l=1}^{L} λ_{hl} H_l(ξ) ≤ s   ∀ ξ ≥ 0.

Thus, for the case that the target option is a European call option, i.e. V*(ξ) =
max(0, ξ − K_0), given that φ_j(ξ) = ξ^j, j = 1, ..., J, the constraint can be equivalently
written as

    max(0, ξ − K_0) − Σ_{j=1}^{J} λ_{mj}(ξ^j − b_j) − Σ_{l=1}^{o} λ_{hl} max(0, ξ − K_l)
        − Σ_{l=o+1}^{L} λ_{hl} max(0, K_l − ξ) ≤ s   ∀ ξ ≥ 0.

Now, let k_1, ..., k_I denote the ordered sequence of the breakpoints K_l, l = 0, ..., L, where
k_{s′} ≤ k_{s′+1}. We can partition the space of ξ ∈ ℝ_+ according to the sequence k_1, ..., k_I,
and the above constraint can thus be decomposed and written generally as the following
set of constraints:

                        ⎧ (a′_0^T λ_q) ξ + b′_0,        ξ ∈ [0, k_1]
                        ⎪ (a′_1^T λ_q) ξ + b′_1,        ξ ∈ [k_1, k_2]
    Σ_{j=1}^{J} λ_{mj} ξ^j  ≥  ⎨   ⋮                            ⋮                    (2.15)
                        ⎪ (a′_{I−1}^T λ_q) ξ + b′_{I−1},  ξ ∈ [k_{I−1}, k_I]
                        ⎩ (a′_I^T λ_q) ξ + b′_I,        ξ ∈ [k_I, ∞)

where λ_q = vec((λ_{hl})_{l=1,...,L}). Proposition 2.1.1 can then be applied to convert each
constraint in (2.15) to its SDP counterpart based on the end points of the respective partition.
This completes the proof that the overall problem can be reformulated as a semidefinite
optimization problem. It is trivial to see that the same approach also applies to the case
of a European put option, i.e. V*(ξ) = max(0, K_0 − ξ).
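The exact reformulation above requires SDP machinery, but a crude numerical check is sometimes handy. The sketch below (Python with NumPy/SciPy; the grid, moments, and the benchmark price are all hypothetical) approximates the primal problem behind π* by restricting Q to a finite price grid and solving the resulting linear program:

```python
import numpy as np
from scipy.optimize import linprog

# Discretized support of the underlying price xi.
xi = np.linspace(0.0, 100.0, 401)
n = len(xi)

# Moment constraints: E_Q[xi] = b1, E_Q[xi^2] = b2 (mean 40, std 8; hypothetical).
b1, b2 = 40.0, 40.0**2 + 8.0**2

# One benchmark call (strike 40) with assumed market price h and penalty weight w'.
H = np.maximum(xi - 40.0, 0.0)
h, w = 4.0, 1.0

# Target payoff: call with strike 45.
V = np.maximum(xi - 45.0, 0.0)

# Variables: grid probabilities q plus a slack t >= |h - E_Q[H]|.
# Maximize E_Q[V] - w*t  <=>  minimize -V^T q + w*t.
c = np.concatenate([-V, [w]])
A_eq = np.vstack([np.ones(n), xi, xi**2])        # sum(q)=1 and the two moments
A_eq = np.hstack([A_eq, np.zeros((3, 1))])
b_eq = [1.0, b1, b2]
A_ub = np.vstack([np.concatenate([-H, [-1.0]]),  #  h - q@H <= t
                  np.concatenate([H, [-1.0]])])  #  q@H - h <= t
b_ub = [-h, h]
bounds = [(0, None)] * (n + 1)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
pi_upper = -res.fun  # discretized approximation of pi^*(V*)
print(res.status, round(pi_upper, 4))
```

The two-point measure placing half its mass at 32 and half at 48 is feasible here and yields objective 1.5 with zero penalty, so the LP optimum is at least that large; refining the grid tightens the approximation toward the true bound.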
2.2.3 Numerical Examples
In this section, we compare the values of the metric ϑ(V*) calculated with a set of finite
families of pricing measures against those calculated with a set of infinite families of
measures. The infinite families
of pricing measures are in particular defined via the first two raw moments. We follow the
numerical example presented in [Lo, 1987], which illustrates the practical relevance of his
semi-parametric bound when there are several competing specifications for the stochastic
process of an underlying security.
In Lo's experiment, he considered two leading classes of price processes, lognormal
diffusions and mixed diffusion-jump processes, as two candidates driving the price
dynamics of the underlying security. A remarkable fact is that for any given dataset,
risk-neutral variances of these two processes are numerically identical, which implies that
the semi-parametric bound derived based on any specification within these two classes of
processes immediately applies to all other specifications. In our experiment, Lo’s setup
for the two processes will be the choice for the set of finite families of pricing measures,
and the associated moments will be the condition for an alternative set of infinite families
of pricing measures to satisfy. We compare the values of ϑ(V∗) evaluated between these
two sets. The lognormal diffusion and mixed diffusion-jump processes are defined as
follows. Note that the notation below is the same as that used in [Lo, 1987], and may
overlap with the notation used in other parts of this thesis.
    dS_1 = α_1 S_1 dt + σ_1 S_1 dW,
    dS_2 = [α_2 − λ(k − 1)] S_2 dt + σ_2 S_2 dW + (γ − 1) S_2 dN_λ,

where ln γ ~ N(β, δ²) and k = E[γ]. We omit the details of the above popular processes,
but provide the analytical forms of their risk-neutral variances:

    V_1 = e^{2rτ} · [e^{σ_1² τ} − 1],
    V_2 = e^{2rτ} · [e^{(λ(k−1)² + σ_2² + λσ_γ²) τ} − 1],

where σ_γ² = var[γ] = e^{2β+δ²}(e^{δ²} − 1). Besides the parameters of the stochastic processes,
parameter r (resp. τ) denotes the risk-free rate (resp. expiration time). Having identical
risk-neutral variances (V_1 = V_2) implies that

    σ_1² = λ(k − 1)² + σ_2² + λσ_γ².     (2.16)
In [Lo, 1987], a diffusion model is selected by setting σ_1 = s′/√52, where s′ is the annual
compound standard deviation. A mixed diffusion-jump model is selected by setting k = 1,
λ = 0.25, σ_γ² = φ_1 · σ_1², and σ_2² = φ_2 · σ_1², where φ_1 = 3.6, φ_2 = 0.1. This setting ensures
that the condition (2.16) holds. From here on, Q_B (resp. Q_M) denotes the diffusion
model (resp. mixed diffusion-jump model) with Lo's parameter setting. The European
call option prices of these two models for various τ, s′, K are presented in Table A.1 under
the columns indexed by C_B and C_M.
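As a quick sanity check on this parameterization (Python with NumPy; the value of s′ is an arbitrary choice), the sketch below verifies that Lo's setting satisfies condition (2.16):

```python
import numpy as np

s_prime = 0.4                                 # annual compound standard deviation (arbitrary)
sigma1_sq = (s_prime / np.sqrt(52.0)) ** 2    # weekly diffusion variance

# Lo's mixed diffusion-jump setting: k = 1, lambda = 0.25, phi1 = 3.6, phi2 = 0.1.
k, lam, phi1, phi2 = 1.0, 0.25, 3.6, 0.1
sigma_gamma_sq = phi1 * sigma1_sq
sigma2_sq = phi2 * sigma1_sq

# Condition (2.16): sigma1^2 = lambda*(k-1)^2 + sigma2^2 + lambda*sigma_gamma^2.
rhs = lam * (k - 1.0) ** 2 + sigma2_sq + lam * sigma_gamma_sq
print(np.isclose(sigma1_sq, rhs))  # True, since phi2 + lam*phi1 = 0.1 + 0.9 = 1
```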
We now consider a set of European call options as benchmark options. We generate
their prices based on an alternative mixed diffusion-jump model Q_b parameterized by
k = 1, φ_2 = 0.15, λ = 0.25, which is different from the one selected by Lo but has the
same risk-neutral variance. Note that φ_1 is uniquely determined after setting k, φ_2, λ.
The prices for various τ, s′, K are also listed in Table A.1 under the columns indexed by C_b.
In particular, we choose only the portion of the generated option prices with strike
prices K = 30, 35, 40 as the prices of the benchmark options (h_l) and leave the rest
for comparison with the target payoff (V*). We then conduct the experiment as follows.
First, we consider the set of finite families of measures

    Q_fin := {Q_B, Q_M, Q_b},

and the set of infinite families of measures based on the first two raw moments

    Q_mom := { Q | E_Q[S_T] = S_0 e^{rτ},  E_Q[S_T²] = V_1 },

where S_0 (resp. S_T) here denotes the initial (resp. terminal) price of the underlying asset.
Based on each set, we will evaluate the metric ϑ(V∗) for the target payoff V∗(ξ) = (ξ−K)+
for K = 45, 50. Then we add to the set Q_fin a sequence of mixed diffusion-jump models
with parameters φ_2 = 0, 0.1, ..., 0.9, 0.99 and λ = 2.5e−7, and examine the values of ϑ. Note
that the maximum of ϑ(V*) among all possible specifications of the mixed diffusion-jump
models is attained by adding such a sequence, which is verified via optimization.
              φ_2 = 0.1, λ = 0.25      φ_2 = 0.5~0.99, λ = 2.5e−7   φ_2 = 0~0.99, λ = 2.5e−7
 s′   K     τ = 1  τ = 12  τ = 24     τ = 1  τ = 12  τ = 24        τ = 1  τ = 12  τ = 24
 0.2  45    0.000  0.000   0.000      0.000  0.014   0.002         0.000  0.014   0.002
 0.2  50    0.000  0.000   0.000      0.000  0.000   0.000         0.000  0.000   0.000
 0.4  45    0.000  0.000   0.000      0.027  0.013   0.009         0.027  0.013   0.009
 0.4  50    0.000  0.000   0.000      0.000  0.024   0.012         0.000  0.024   0.012
 0.6  45    0.000  0.000   0.001      0.044  0.022   0.053         0.051  0.022   0.057
 0.6  50    0.000  0.000   0.000      0.000  0.030   0.024         0.000  0.030   0.024
 0.8  45    0.000  0.000   0.007      0.039  0.033   0.104         0.056  0.033   0.148
 0.8  50    0.000  0.000   0.000      0.000  0.052   0.067         0.000  0.052   0.067

Table 2.1: ϑ(V*) of Q_fin for various values of parameters s′, K, τ.
In Table 2.1, the first three columns present the values of ϑ(V*) evaluated based on
Q_fin, and columns 4–6 (resp. 7–9) present the values when additional diffusion-jump
models with parameters φ_2 = 0.5, ..., 0.9, 0.99 (resp. φ_2 = 0, ..., 0.9, 0.99) and λ = 2.5e−7
are added. First, notice in Table 2.1 that in several cases the values of ϑ(V*) are simply
zero; that is, except for the benchmark model Q_b, which achieves optimality for both the
upper and lower bound problems (π* and π_*), all other models are discarded. Some of these
discarded models have a significant impact on the price of the target payoff; however, their
impact is mostly discounted by the respective calibration errors. This sheds some light
on one potential limitation of evaluating the measure ϑ based solely on finite families of
pricing measures: if only restrictive functional forms of distributions are available for
traders to represent their views of price dynamics, this can lead to a trivial conclusion
such as zero model uncertainty, which forgoes all the information that traders have
provided. On the other hand, as shown in Table 2.2, the evaluation
 s′   K     τ = 1   τ = 12   τ = 24
 0.2  45    0.044   0.552    1.215
 0.2  50    0.021   0.182    0.439
 0.4  45    0.167   1.944    2.792
 0.4  50    0.070   0.910    2.269
 0.6  45    0.370   2.842    3.402
 0.6  50    0.151   2.431    4.351
 0.8  45    0.657   3.454    3.566
 0.8  50    0.285   3.908    5.801

Table 2.2: ϑ(V*) of Q_mom for various values of parameters s′, K, τ, where w′ = 1.
based on the first two moments of infinite families of pricing measures always provides
nontrivial values of ϑ(V*), which embody traders' concern about other possible
specifications through their moment information. Meanwhile, the approach retains the feature
of re-weighting the price impact of each model with respect to the associated calibration
error. Also, observe in Table 2.2 that with one week to maturity the values of ϑ(V*)
are fairly tight with respect to the values evaluated based on finite families of pricing
measures, which follows closely the behavior of Lo's bound. Thus, in cases where zero
model uncertainty is reported by the approach that takes into account only finite
families of pricing measures but is not trusted by the traders, these tight bounds can be
particularly useful as a second reference for traders. In general, the values of
ϑ(V*) presented in Table 2.2 increase with the variance and the time to maturity. This
behavior is plausible: the larger the variance, the larger the weight that can possibly
be placed on the tails of distributions, and therefore the larger the impact on the price. In
addition, this impact can only be magnified as the time to maturity increases.
One additional aspect worth noting is that the evaluation of ϑ based on finite families
of pricing measures can in some cases be very sensitive to adjustments of the penalty
parameter w′. By slightly increasing w′ to 1.6, the values of ϑ(V*) in the setting of Table
A.2 all turn to zero. This may make it difficult for traders to adjust their
aversion towards calibration errors, as the trivial conclusion of zero model uncertainty
can easily result from a slight adjustment of the penalty parameter. On the other hand, as
shown in Tables A.2 and A.3, the evaluation of ϑ(V*) based on the moments of infinite
families of pricing measures continues to report meaningful values when we increase w′ to
2 and to 5. In fact, the result in Table A.3 gives the "minimum" possible values of ϑ(V*), which
are invariant to any further increase of w′. These minimum values of ϑ(V*) are attained
by a pricing measure that perfectly replicates the benchmark prices, so that the penalty
is always zero.
2.3 Tractability of Accounting for Multivariate Moment Information
Real-life applications often involve multiple random quantities of interest. For example,
in option pricing the value of an option may depend on multiple assets, as in popular
basket options. The incorporation of multivariate moment information, while practically
useful, is however much more challenging than the univariate case. Several instances,
for example, are known to be computationally intractable (NP-hard). The focus of this
section is to shed light on the tractability of a special class of multivariate moment
problems that not only allow for the incorporation of high-order moments but are also
amenable to SDP reformulations.
To unify the presentation of all relevant results, in the rest of this section the notation
D denotes a set of distributions that captures available information about Q. The objective
here is to solve the following optimization problem efficiently:

    sup_{Q∈D} E_Q[h(ξ)].     (2.17)

In the cases that the objective function is "piecewise concave" in ξ, the problem (2.17)
is known to be tractable for incorporating the following forms of distribution sets: D
characterized by fixed support and mean [Dupacova, 1987], by fixed mean and covariance
[Bertsimas and Popescu, 2002], and by fixed ranges of support, mean, and an upper bound
on covariance [Delage and Ye, 2010]. The problem (2.17), however, has also been proven
intractable when incorporating a set D that fixes the support, mean, and covariance of
distributions, or that fixes the first d moments with d ≥ 4 [Bertsimas and Popescu, 2005].
We consider here two special forms of distribution sets that incorporate the information
of marginal higher moments. In the following descriptions, f_i denotes a univariate
distribution, and Q(f_1, ..., f_n) represents a multivariate probability measure Q whose
marginal distributions are f_i, i = 1, ..., n. From here on, for simplicity, the notation Q is
also used as shorthand for Q(f_1, ..., f_n).
• Marginal higher moments:

    D_m := { Q(f_1, ..., f_n) | E_{f_i}[φ_(i)(ξ)] = b_(i) },

where φ_(i)(ξ) = [1 ξ_i ξ_i² · · · ξ_i^d]^T, and b_(i) denotes a vector of associated univariate
moments.

• Marginal higher moments and a covariance matrix:

    D_mc := { Q(f_1, ..., f_n) | E_{f_i}[φ_(i)(ξ)] = b_(i),  E_Q[ξ] = µ,  E_Q[(ξ − µ)(ξ − µ)^T] = Σ },

where φ_(i)(ξ) = [1 ξ_i³ ξ_i⁴ · · · ξ_i^d]^T, and b_(i) (resp. µ, Σ) denotes a vector of
associated univariate moments (resp. a mean vector, a covariance matrix).
In addition, we focus on functions h(ξ) of the following form:

    h(ξ) := max_{k∈{1,...,K}} h_k(ξ),    h_k(ξ) := Σ_{i=1}^{n} c_{k,i}^T h_i(ξ),

where h_i(ξ) := [1 ξ_i ξ_i² · · · ξ_i^{d′}]^T, ξ_i denotes the i-th univariate component of the
random vector ξ, and d′ denotes the order of ξ_i. We shall call such a class of functions
piecewise separable functions. For example, in portfolio selection problems, one would
consider the special instance h_k(ξ) = Σ_{i=1}^{n} a_k · x_i · ξ_i + b_k, where x_i (resp. ξ_i) denotes the
money invested in (resp. the random return of) a single asset, and Σ_{i=1}^{n} x_i · ξ_i represents a
portfolio. The piecewise structure of h(ξ) in this case allows for modeling a wide range
of utility and risk measure functions (see the more detailed discussion in Section 4.2).
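As a minimal illustration (Python with NumPy; the investments, returns, and piece coefficients are all hypothetical), the sketch below evaluates a two-piece separable function of a portfolio return:

```python
import numpy as np

# Hypothetical portfolio: money invested per asset and a realized return vector.
x = np.array([100.0, 50.0, 50.0])     # x_i: investment in asset i
xi = np.array([0.02, -0.01, 0.03])    # xi_i: random return of asset i

# Pieces h_k(xi) = a_k * sum_i x_i*xi_i + b_k (a two-piece function, for illustration).
a = np.array([1.0, 2.0])              # slopes a_k
b = np.array([0.0, 0.5])              # intercepts b_k

portfolio = x @ xi                    # sum_i x_i * xi_i = 3.0
h = np.max(a * portfolio + b)         # piecewise separable h(xi) = max_k h_k(xi)
print(portfolio, h)                   # 3.0 6.5
```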
It is often more practical in real-life settings to consider marginal higher moments
rather than joint higher moments. In the latter case, the number of parameters to be
estimated can be extremely large, as it grows exponentially in the dimension of the
random quantities of interest, which could potentially lead to unstable estimates of
joint moments. In the case that only marginal higher moments are available, we show in
the following proposition that problem (2.17) based on the distribution set D_m can be
solved efficiently.
Proposition 2.3.1. The problem

    sup_{Q(f_1,...,f_n)}  E_Q[ max_{k∈{1,...,K}} Σ_{i=1}^{n} c_{k,i}^T h_i(ξ) ]
    subject to  E_{f_i}[φ_(i)(ξ)] = b_(i),  i = 1, ..., n′,

can be solved efficiently via a semidefinite programming problem, where h_i(ξ) = [1 ξ_i · · ·
ξ_i^{d′}]^T and φ_(i)(ξ) = [1 ξ_i · · · ξ_i^d]^T.

Proof. Using duality theory of semi-infinite programming, we can derive the following
dual problem:

    minimize_{z_(i)}  Σ_{i=1}^{n′} b_(i)^T z_(i) :
        Σ_{i=1}^{n′} z_(i)^T φ_(i)(ξ) ≥ max_{k∈{1,...,K}} Σ_{i=1}^{n} c_{k,i}^T h_i(ξ),  ∀ ξ.     (2.18)

The constraint in the above problem can be equivalently reformulated as a system of K
constraints:

    Σ_{i=1}^{n′} z_(i)^T φ_(i)(ξ) ≥ Σ_{i=1}^{n} c_{k,i}^T h_i(ξ),  ∀ ξ,  k = 1, ..., K.     (2.19)
Based on (2.19), the problem (2.18) can be equivalently expressed in the following general
form:

    minimize_{z_(i)}  Σ_{i=1}^{n′} b_(i)^T z_(i) :
        Σ_{i=1}^{max(n′,n)} θ_k(z_(i))^T g_(i)(ξ) ≥ 0,  ∀ ξ,  k = 1, ..., K,

where g_(i)(ξ) = [1 ξ_i · · · ξ_i^{d″(i)}]^T, and θ_k(z_(i)) denotes the coefficient vector after shifting
the terms on the right-hand side of the inequality in (2.19) to the left-hand side and
re-grouping the coefficients with respect to each variable ξ_i. The above optimization
problem can be equivalently expressed as

    minimize_{z_(i)}  Σ_{i=1}^{n′} b_(i)^T z_(i) :
        inf_ξ Σ_{i=1}^{max(n′,n)} θ_k(z_(i))^T g_(i)(ξ) ≥ 0,  k = 1, ..., K.

By introducing free variables t_{k,i}, each k-th constraint can be equivalently reformulated
into the following constraints:

    Σ_{i=1}^{max(n′,n)} t_{k,i} ≥ 0,    inf_{ξ_i} θ_k(z_(i))^T g_(i)(ξ) ≥ t_{k,i},  i = 1, ..., max(n′, n).

The constraints on the right-hand side are equivalent to

    θ_k^*(t_{k,i}, z_(i))^T g_(i)(ξ) ≥ 0,  ∀ ξ_i,  i = 1, ..., max(n′, n),     (2.20)

where θ_k^*(t_{k,i}, z_(i)) denotes the coefficient vector after the same kind of shifting operation
done for θ_k(z_(i)). Based on Proposition 2.1.1, each i-th constraint in (2.20) is known to
be SDP-representable. This completes the proof.
Our motivation to study the distribution set Dmc, which additionally accounts for
a covariance structure, comes from the application of a linear factor model in moment
problems. Linear factor models are a popular approach used to reduce the dimensionality
of the random quantities. In these models, the random vector ξ is assumed to be driven
by a lower-dimension factor vector ζ so that ξ = Vζ+ε holds, where V is a factor loading
matrix, and ε is a vector of residual returns with zero mean and zero correlation. The
application of factor models in portfolio selection problem can be found in Section 4.5.2.
It is often assumed in factor models that the components of $\zeta$ are mutually uncorrelated and that $\zeta$ is uncorrelated with $\varepsilon$. Thus, if we reformulate moment problems based on a factor model, not only the higher marginal moments of the random factors $\zeta$ but also the zero-correlation structure among $\zeta$ and $\varepsilon$ needs to be incorporated into a distribution set, which is exactly the distribution set $\mathcal{D}_{mc}$ with a matrix $\Sigma$ whose off-diagonal elements are zero.
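As a small illustration of the covariance structure such a factor model induces, the sketch below builds $\mathrm{Cov}(\xi) = V\,\mathrm{Cov}(\zeta)\,V^T + \mathrm{Cov}(\varepsilon)$ for a three-asset, two-factor example with uncorrelated factors and residuals. The loadings and variances are hypothetical illustrative numbers, not taken from the thesis.

```python
# Sketch: covariance implied by a linear factor model xi = V zeta + eps,
# with uncorrelated factors zeta and uncorrelated residuals eps.

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Factor loadings: 3 assets driven by 2 factors (hypothetical values).
V = [[1.0, 0.2],
     [0.5, 0.8],
     [0.3, 0.3]]

# Zero correlation among factors and among residuals -> diagonal covariances.
Sigma_zeta = [[0.04, 0.0],
              [0.0, 0.09]]
Sigma_eps_diag = [0.01, 0.02, 0.015]

# Cov(xi) = V Cov(zeta) V^T + Cov(eps).
Sigma_xi = matmul(matmul(V, Sigma_zeta), transpose(V))
for i in range(3):
    Sigma_xi[i][i] += Sigma_eps_diag[i]
```

The resulting matrix is symmetric by construction, with only the diagonal entries carrying the residual variances, which is the structure exploited by the set $\mathcal{D}_{mc}$ above.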
The problem (2.17) based on the set $\mathcal{D}_{mc}$ is known to be tractable for generating the tightest bound when the order of the polynomials in the function $h(\xi)$ is less than or equal to two, i.e. $d' \le 2$, and only up to second-order moment information is specified in the distribution set $\mathcal{D}_{mc}$. In Theorem 2.3.1, we consider the same maximum order of polynomials in $h(\xi)$, i.e. $d' = 2$, but further address higher marginal moments in the distribution set $\mathcal{D}_{mc}$, i.e. $d > 2$. An SDP relaxation formulation is provided that generates a tight upper bound. The bound is usefully tight in the sense that it is guaranteed to be tighter than the bound generated based only on the first two moments (mean and covariance) or only on marginal higher moments.
Theorem 2.3.1. Consider the following optimization problem:
\[
\begin{aligned}
\sup_{Q(f_1,\dots,f_n)} \quad & \mathbb{E}_Q\Big[\max_{k\in\{1,\dots,K\}} \sum_{i=1}^{n} c_{k,i}^T h_i(\xi)\Big] \\
\text{subject to} \quad & \mathbb{E}_{f_i}[\phi_{(i)}(\xi)] = b_{(i)}, \;\; i=1,\dots,n, \quad \mathbb{E}_Q[\xi] = \mu, \quad \mathbb{E}_Q[(\xi-\mu)(\xi-\mu)^T] = \Sigma,
\end{aligned}
\]
where $h_i(\xi) = [\,1\;\; \xi_i\;\; \xi_i^2\,]^T$ and $\phi_{(i)}(\xi) = [\,1\;\; \xi_i\;\; \cdots\;\; \xi_i^{d}\,]^T$. The following SDP problem provides an upper bound on the optimal value of the above problem, which is at least as tight as the one generated based on the set $\mathcal{D}_{mc}$ with only mean and covariance or with only marginal moments:
\[
\begin{aligned}
\underset{\nu_k,\,\varrho_k,\,\Gamma_k,\,\eta_{k,i}}{\text{maximize}} \quad & \sum_{k=1}^{K}\sum_{i=1}^{n} c_{k,i}^T\, [\,\nu_k \;\; \varrho_k(i) \;\; \Gamma_k(i,i)\,]^T \\
\text{subject to} \quad & \sum_{k=1}^{K} \varrho_k = \mu, \quad \sum_{k=1}^{K} \Gamma_k = \Sigma + \mu\mu^T, \qquad (2.21)\\
& \sum_{k=1}^{K} \eta_{k,i} = b_{(i)}, \quad \sum_{k=1}^{K} \nu_k = 1, \\
& \begin{bmatrix} \Gamma_k & \varrho_k \\ \varrho_k^T & \nu_k \end{bmatrix} \succeq 0, \quad (\nu_k,\, \varrho_k(i),\, \Gamma_k(i,i),\, \eta_{k,i}) \in \mathcal{K}_i, \;\; i = 1,\dots,n,
\end{aligned}
\]
where each $\mathcal{K}_i$ denotes a positive semidefinite cone $\Omega$ (Theorem 2.1.3), and $\varrho_k(i)$ (resp. $\Gamma_k(i,i)$) denotes the $i$th (resp. $(i,i)$th) entry of the vector $\varrho_k$ (resp. the matrix $\Gamma_k$).
Proof. Our first step is to partition the whole domain $\Re^n$ according to the piecewise structure of the objective function $h(\xi) := \max_{k\in\{1,\dots,K\}} h_k(\xi)$. Let $P_k$ denote the partition on which the function $h_k(\xi)$ attains the maximum of $\max_{k\in\{1,\dots,K\}} h_k(\xi)$, i.e. $\Re^n = P_1 \cup P_2 \cup \cdots \cup P_K$ and $P_{k'} \cap P_{k''} = \emptyset$ for $k' \ne k''$. We accordingly define new measures $F_k$, $k = 1,\dots,K$, whose domains are respectively $P_k$, $k = 1,\dots,K$. Without loss of generality, $Q = \sum_{k=1}^{K} F_k$ follows. Based on the new measures $F_k$, $k = 1,\dots,K$, we can reformulate the problem as follows:
\[
\begin{aligned}
\underset{F_k}{\text{maximize}} \quad & \sum_{k=1}^{K} \mathbb{E}_{F_k}[h_k(\xi)] \\
\text{subject to} \quad & \sum_{k=1}^{K} \mathbb{E}_{F_k}[\xi] = \mu, \quad \sum_{k=1}^{K} \mathbb{E}_{F_k}[\xi\xi^T] = \Sigma + \mu\mu^T, \qquad (2.22)\\
& \sum_{k=1}^{K} \mathbb{E}_{F_k}[\phi_{(i)}(\xi)] = b_{(i)}, \quad \sum_{k=1}^{K} \mathbb{E}_{F_k}[I_{\Re^n}] = 1. \qquad (2.23)
\end{aligned}
\]
Note that in the above formulation we relax the condition that the measure $F_k$ is supported on the partition $P_k$. Next, we replace $\mathbb{E}_{F_k}[\xi]$ by $\varrho_k$, $\mathbb{E}_{F_k}[\xi\xi^T]$ by $\Gamma_k$, $\mathbb{E}_{F_k}[\phi_{(i)}(\xi)]$ by $\eta_{k,i}$, and $\mathbb{E}_{F_k}[I_{\Re^n}]$ by $\nu_k$, where $\varrho_k$ and $\eta_{k,i}$ are in vector form and $\Gamma_k$ is in matrix
form. Let $\varrho_k(i)$ (resp. $\Gamma_k(i,i)$) denote the $i$th (resp. $(i,i)$th) entry of $\varrho_k$ (resp. $\Gamma_k$). We thus arrive at the following relaxed problem:
\[
\begin{aligned}
\underset{\nu_k,\,\varrho_k,\,\Gamma_k,\,\eta_{k,i}}{\text{maximize}} \quad & \sum_{k=1}^{K}\sum_{i=1}^{n} c_{k,i}^T\, [\,\nu_k \;\; \varrho_k(i) \;\; \Gamma_k(i,i)\,]^T \\
\text{subject to} \quad & \sum_{k=1}^{K} \varrho_k = \mu, \quad \sum_{k=1}^{K} \Gamma_k = \Sigma + \mu\mu^T, \qquad (2.24)\\
& \sum_{k=1}^{K} \eta_{k,i} = b_{(i)}, \quad \sum_{k=1}^{K} \nu_k = 1.
\end{aligned}
\]
Now, we add the following constraint:
\[
(\nu_k,\, \varrho_k(i),\, \Gamma_k(i,i),\, \eta_{k,i}) \in \mathcal{K}_i,
\]
where each $\mathcal{K}_i$ denotes a positive semidefinite cone $\Omega$ (Theorem 2.1.3). We now show that the relaxation (2.21) provides a bound that is tighter than the one generated based on $\mathcal{D}_{mc}$ with only marginal moments. Let $(\nu_k^*, \varrho_k^*, \Gamma_k^*, \eta_{k,i}^*)$ denote the optimal solution of (2.24). Without loss of generality, we assume first that all $\nu_k^* > 0$. Due to the cone property of $\mathcal{K}_i$, there must exist marginal distributions $f_{ik}$ that satisfy the marginal moments $\varrho_k^*(i)/\nu_k^*$, $\Gamma_k^*(i,i)/\nu_k^*$, and $\eta_{k,i}^*/\nu_k^*$. We can always construct a product measure $F_k := f_{1k} \times \cdots \times f_{nk}$ based on the marginal distributions $f_{ik}$. We can finally define a new measure $F = \sum_{k=1}^{K} F_k \cdot \nu_k^*$ that straightforwardly satisfies $\mathbb{E}_F[\xi] = \mu$ and $\mathbb{E}_F[\phi_{(i)}(\xi)] = b_{(i)}$. Using such a measure, we can derive the following inequalities:
\[
\begin{aligned}
\sup_{Q \in \mathcal{D}_m} \mathbb{E}_Q[h(\xi)] &\ge \mathbb{E}_F\big[\max_{k=1,\dots,K} h_k(\xi)\big] \\
&= \sum_{k=1}^{K} \mathbb{E}_{F_k}\big[\max_{k=1,\dots,K} h_k(\xi)\big] \cdot \nu_k^* \\
&\ge \sum_{k=1}^{K} \mathbb{E}_{F_k}[h_k(\xi)] \cdot \nu_k^* \qquad (2.25)\\
&= \sum_{k=1}^{K}\sum_{i=1}^{n} c_{k,i}^T\, [\,\nu_k^* \;\; \varrho_k^*(i) \;\; \Gamma_k^*(i,i)\,]^T.
\end{aligned}
\]
This implies that the relaxation problem (2.21) is at least as tight as the bound generated based only on marginal moments. Next, we add the constraint
\[
\begin{bmatrix} \Gamma_k & \varrho_k \\ \varrho_k^T & \nu_k \end{bmatrix} \succeq 0, \qquad (2.26)
\]
which is a necessary and sufficient condition for the existence of a multivariate probability measure $F_k$ that satisfies
\[
\int_{\Re^n} I_{\Re^n}\, dF_k(\xi) = 1, \quad \int_{\Re^n} \xi\, dF_k(\xi) = \varrho_k/\nu_k, \quad \int_{\Re^n} \xi\xi^T\, dF_k(\xi) = \Gamma_k/\nu_k,
\]
given that $\nu_k > 0$. This is because constraint (2.26) is equivalent to $\frac{\Gamma_k}{\nu_k} - \frac{\varrho_k}{\nu_k}\frac{\varrho_k^T}{\nu_k} \succeq 0$, obtained by dividing (2.26) by $\nu_k$ and using the Schur complement. Thus, we can always construct a distribution, e.g. a multivariate normal distribution, with mean $\frac{\varrho_k}{\nu_k}$ and covariance $\frac{\Gamma_k}{\nu_k} - \frac{\varrho_k}{\nu_k}\frac{\varrho_k^T}{\nu_k}$. Following similar arguments, we can construct, based on the optimal solution $(\nu_k^*, \varrho_k^*, \Gamma_k^*, \eta_{k,i}^*)$ of (2.24), a new measure $F = \sum_{k=1}^{K} F_k \cdot \nu_k^*$, where $\int_{\Re^n} I_{\Re^n}\, dF_k(\xi) = 1$, $\int_{\Re^n} \xi\, dF_k(\xi) = \varrho_k^*/\nu_k^*$, and $\int_{\Re^n} \xi\xi^T\, dF_k(\xi) = \Gamma_k^*/\nu_k^*$, and such a measure satisfies the given mean $\mu$ and second-order moment $\Sigma + \mu\mu^T$ conditions. Following inequality arguments similar to (2.25), we conclude that the relaxation problem (2.21) is at least as tight as the bound generated based only on mean and covariance.
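The Schur-complement step used above can be checked numerically in the scalar case: the block matrix with entries $\Gamma_k$, $\varrho_k$, $\nu_k$ is positive semidefinite exactly when the implied covariance $\Gamma_k/\nu_k - (\varrho_k/\nu_k)^2$ is nonnegative (for $\nu_k > 0$). A minimal sketch with hypothetical numbers:

```python
# Scalar illustration of the Schur-complement equivalence:
# [[gamma, rho], [rho, nu]] PSD  <=>  gamma/nu - (rho/nu)^2 >= 0 when nu > 0.

def block_psd(gamma, rho, nu):
    # A symmetric 2x2 matrix is PSD iff its diagonal entries and determinant are >= 0.
    return gamma >= 0 and nu >= 0 and gamma * nu - rho * rho >= 0

def schur_cov_psd(gamma, rho, nu):
    # Nonnegativity of the covariance implied by the normalized moments of F_k.
    return gamma / nu - (rho / nu) ** 2 >= 0

# The two conditions agree on PSD and non-PSD instances alike.
for gamma, rho, nu in [(2.0, 1.0, 1.0), (0.5, 1.0, 1.0), (1.0, 0.5, 0.5)]:
    assert block_psd(gamma, rho, nu) == schur_cov_psd(gamma, rho, nu)
```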
Remark 2.3.1. Proposition 2.3.1 and Theorem 2.3.1 rely on the piecewise-separable
structure of objective functions, which is a useful structure in generalizing the tractability
of univariate moment problems to multivariate settings.
2.4 Conclusion
In this chapter, we first exploit the connection between the theory of moment problems
and modern conic optimization to develop a tractable moment-based setting in evaluating
market-price based convex risk measures. The moment-based setting is useful as it allows
for incorporation of a much wider class of distributions that results in more revealing
measures than would otherwise be possible when quantifying model misspecification in
derivative pricing. New tractability results of solving multivariate moment problems
are also presented, where several SDP reformulations are provided for moment problems
incorporating higher marginal moments. This class of multivariate moment problems can
serve as powerful modeling tools in the applications for which only marginal moments
are available. This is for example the case for many financial applications, where joint
moments of random returns such as correlations are often hard to estimate. In our future work, we will apply this class of moment problems to the problem of portfolio selection to verify its practical value.
Chapter 3
Accounting for Stochastic Moments
In this chapter, we deal with the second layer of uncertainty addressed in the introduction in developing the notion of comprehensive robustness, namely stochastic moments. It has been assumed in the formulation of classical moment problems that moments are static, either having fixed values or falling into a fixed range of values. This is opposed to the notion in time-series studies that moments are dynamic and often driven by an underlying stochastic process. Such a discrepancy makes it difficult to apply moment-based bounds in practice, since they may not conform well to richer distributional information governing the true dynamics of moments. In particular, many decision environments such as financial markets are known to undergo phase transitions in a repetitive manner, changing from one state to another. Such a transition is often accompanied by abrupt changes of moments, such as the soaring of volatility. In such instances, the use of moment-based bounds that account for only a single state can be misleading and give a false sense of robustness.
While the presence of multiple states in many environments, each having for example a distinct trend and volatility, is rarely disputed, the exact form of distribution that characterizes each state is often hard to specify. This leads to the following question: is it possible to derive distribution-free bounds that account for the existence of multiple states, together with their associated likelihoods and moment characterizations? This motivates the development of a new framework presented in this chapter: a stochastic semidefinite optimization model that incorporates the settings of classical moment problems as building blocks and uses the notion of recourse borrowed from Stochastic Programming to further take into account the stochastic nature of moments. As a result, more robust bounds are generated with respect to possible state realizations.
The remainder of the chapter is structured as follows. In Section 3.1, we first review the deterministic semidefinite models proposed by Bertsimas and Popescu (2002). In Section 3.2, we present the development of two-stage stochastic semidefinite models. We show in Section 3.3 that the framework is comprehensive and includes the deterministic and robust optimization counterparts as special limiting cases. We also show that the optimal bound value is equivalent to a Value-at-Risk quantity, and that the optimal solution can be obtained via simple sorting. Finally, in Section 3.4, the framework is applied to bounding the price of a European-style call option under regime switching. A moment-based lattice is additionally constructed for generating scenario-based moments. Computational experiments using the S&P 500 index as the underlying asset are performed to illustrate the advantages of the stochastic programming approach over the deterministic strategy.
3.1 Deterministic Semidefinite Optimization Models
In this section, we briefly review a class of deterministic SDP models introduced in [Bertsimas and Popescu, 2002]. This class of models, which applies to generating moment-based bounds for univariate random variables, can be solved efficiently for a wide range of specifications. To be specific, consider now the following formulation of the moment problems:
\[
\max_{Q} \;\; \mathbb{E}_Q[h(\xi)] \quad \text{subject to} \quad \mathbb{E}_Q[\xi^p] = b_p, \;\; p = 0, 1, \dots, d, \qquad (3.1)
\]
and
\[
\min_{Q} \;\; \mathbb{E}_Q[h(\xi)] \quad \text{subject to} \quad \mathbb{E}_Q[\xi^p] = b_p, \;\; p = 0, 1, \dots, d, \qquad (3.2)
\]
where $\xi$ here is a univariate random variable. The general tractability results for the above problems are given in the following theorem.

Theorem 3.1.1. [Bertsimas and Popescu, 2002] The tightest bounds for the problems (3.1) and (3.2) with a piecewise polynomial function $h : \mathbb{R} \to \mathbb{R}$, given moments of a univariate random variable $\xi$, can be computed efficiently as a semidefinite optimization problem.
From here on, for simplicity we will use the following generic SDP formulations to represent the SDP problems used for bounding the expected value of a pre-specified piecewise polynomial function $h(\xi)$, given moments $b_p$, $p = 0,\dots,d$:
\[
\begin{aligned}
\text{UB}_{\text{SDP}}(\mathbf{b}) := \underset{\mathbf{y},X,Z}{\text{minimize}} \quad & \sum_{p=0}^{d} b_p y_p \qquad (3.3)\\
\text{subject to} \quad & (X, Z, \mathbf{y}) \in G \subset (\mathcal{J}^{m_1}, \mathcal{J}^{m_2}, \mathbb{R}^{d+1}), \\
& X, Z \succeq 0,
\end{aligned}
\]
and
\[
\begin{aligned}
\text{LB}_{\text{SDP}}(\mathbf{b}) := \underset{\mathbf{y},X,Z}{\text{maximize}} \quad & \sum_{p=0}^{d} b_p y_p \qquad (3.4)\\
\text{subject to} \quad & (X, Z, \mathbf{y}) \in H \subset (\mathcal{J}^{m_1}, \mathcal{J}^{m_2}, \mathbb{R}^{d+1}), \\
& X, Z \succeq 0,
\end{aligned}
\]
where $\mathbf{b} := (b_0,\dots,b_d)$. $G$ and $H$ represent polyhedral sets corresponding to linear constraints introduced along the problem reformulation, and $\mathcal{J}^m$ denotes the set of real symmetric matrices of order $m$. In general, the matrices $X$ and $Z$ can be further reduced to a single positive semidefinite matrix; the use of two matrices here is for consistency with later results.
It has been assumed in the above model formulations that the moments $b_p$, $p = 0,\dots,d$, are deterministic parameters. As discussed earlier, such an assumption can be problematic in many practical applications. In the next section, we present a new optimization approach that further accounts for the stochastic nature of moments and generates more robust bounds with respect to possible state realizations.
3.2 A Stochastic Semidefinite Optimization Approach
In this section, we formulate models from which upper and lower bounds can be computed
in the presence of stochastic moments. In particular, we consider in our model that there are $S$ possible realizations of states, and each realization $s$ ($s = 1,\dots,S$) corresponds to a distribution characterized by its vector of moments $\mathbf{b}(w_s)$. Each realization of a state corresponds to a scenario $s$, and $P(w_s)$ is the probability that scenario $s$ will realize. $h(w_s)$ denotes the optimal bound value obtained from the deterministic model (3.1) or (3.2) using just the vector of moments $\mathbf{b}(w_s)$. The models are two-stage stochastic
versions of the semidefinite programs defined in (3.3) and (3.4). In particular, the model
is a two-stage stochastic semidefinite program with recourse [Ariyawansa and Zhu, 2006]
that is analogous to the two-stage stochastic linear programming with recourse framework
[Kall and Wallace, 1994], [Birge and Louveaux, 1997].
The stochastic model seeks to find a semidefinite matrix in the first-stage that results
in a bound such that the expected penalized difference between the first-stage bound and
the bound for a given state realization in the second stage is minimized. The first stage
objective and constraints are as in the problems (3.3) and (3.4). Thus, the first stage
bound can be seen to be robust with respect to possible state realizations. The bound
values for each state realization in the second stage are computed offline using (3.3) for
upper bounds and (3.4) for lower bounds before the formulation of the model. Thus,
the recourse decision for each scenario is determined upon realization of the state in the
second stage and so the model is a stochastic program with simple recourse [Everitt and
Ziemba, 1979].
Robust Upper Bound with Stochastic Moments
We present the two-stage stochastic semidefinite programming model (SSDP) for the
upper bound, where the optimal value is denoted as UBSSDP(b) given that b is a moment
vector for the first stage. The two-stage stochastic semidefinite programming model is as
follows.
minimizey,X,Z
d∑p=0
bpyp +R(y), (3.5)
subject to (X,Z,y) ∈ G ⊂ (Jm1 ,Jm2 ,Rd+1),
X,Z 0.
The recourse function R(y) is defined as R(y) := Ew[Q(y, w)], where the function
Q(y, w) is defined as follows
Q(y, w) := minimizey+,y− b+y+ + b−y−, (3.6)
subject to y+ − y− = h(w)−d∑p=0
bpyp,
y+, y− ≥ 0,
40
CHAPTER 3. ACCOUNTING FOR STOCHASTIC MOMENTS
where y = (y0, ..., yd), b+, b− ≥ 0, w denotes the random outcome (scenario), and
h(w) := minimizey,X,Z
d∑p=0
bp(w)yp, (3.7)
subject to (X, Z, y) ∈ G ⊂ (Jm1 ,Jm2 ,Rd+1),
X, Z 0.
In (3.7), bp(w) denotes the moments with respect to the random outcome (scenario)
w. Thus, we define a new upper bound based on the optimal solution of (3.5):
UBSSDP(b) :=d∑p=0
bpyoptp ,
where yoptp denotes the optimal solution of (3.5).
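Since (3.6) is a simple-recourse problem, its optimal value is available in closed form: $Q(\mathbf{y}, w) = b^+ (h(w) - x)^+ + b^- (h(w) - x)^-$ with $x = \sum_p b_p y_p$, so $R(\mathbf{y})$ reduces to a finite probability-weighted sum over scenarios. A minimal sketch (the scenario values are hypothetical, not from the thesis):

```python
# Evaluate the simple-recourse function R(y) = E_w[Q(y, w)] in closed form:
# Q(y, w) = b_plus * (h(w) - x)^+ + b_minus * (h(w) - x)^-,
# where x = sum_p b_p * y_p is the first-stage bound value.

def recourse(x, scenarios, b_plus, b_minus):
    """scenarios: list of (probability, h_w) pairs."""
    total = 0.0
    for prob, h_w in scenarios:
        diff = h_w - x
        total += prob * (b_plus * max(diff, 0.0) + b_minus * max(-diff, 0.0))
    return total

# Hypothetical first-stage bound value and scenario bounds.
x = 5.0
scenarios = [(0.5, 4.0), (0.5, 6.0)]
print(recourse(x, scenarios, b_plus=2.0, b_minus=1.0))  # 0.5*1*1 + 0.5*2*1 = 1.5
```

Underestimating a scenario bound is charged at rate $b^+$ and overestimating at rate $b^-$, which is exactly the penalty structure the first-stage problem (3.5) trades off against the bound itself.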
Robust Lower Bound with Stochastic Moments
To formulate a stochastic semidefinite programming model for the robust lower bound, we change the sign of $R(\mathbf{y})$ from positive to negative, since the problem (3.4) is a maximization problem:
\[
\begin{aligned}
\underset{\mathbf{y},X,Z}{\text{maximize}} \quad & \sum_{p=0}^{d} b_p y_p - R(\mathbf{y}) \qquad (3.8)\\
\text{subject to} \quad & (X, Z, \mathbf{y}) \in H \subset (\mathcal{J}^{m_1}, \mathcal{J}^{m_2}, \mathbb{R}^{d+1}), \\
& X, Z \succeq 0.
\end{aligned}
\]
To formulate the recourse function $R(\mathbf{y})$ in (3.8), we only need to modify $h(w)$ in (3.6) as follows:
\[
\begin{aligned}
h(w) := \underset{\mathbf{y},X,Z}{\text{maximize}} \quad & \sum_{p=0}^{d} b_p(w)\, y_p \qquad (3.9)\\
\text{subject to} \quad & (X, Z, \mathbf{y}) \in H \subset (\mathcal{J}^{m_1}, \mathcal{J}^{m_2}, \mathbb{R}^{d+1}), \\
& X, Z \succeq 0.
\end{aligned}
\]
Thus, we define a new lower bound based on the optimal solution of (3.8):
\[
\text{LB}_{\text{SSDP}}(\mathbf{b}) := \sum_{p=0}^{d} b_p y_p^{\text{opt}},
\]
where $y_p^{\text{opt}}$ denotes the optimal solution of (3.8).
3.3 Solution Features
In this section, we discuss several features of the robust bounds $\text{UB}_{\text{SSDP}}(\mathbf{b})$ and $\text{LB}_{\text{SSDP}}(\mathbf{b})$. In particular, we highlight their relation to the deterministic bounds $\text{UB}_{\text{SDP}}(\mathbf{b})$ and $\text{LB}_{\text{SDP}}(\mathbf{b})$.
We first consider the case that an identical moment vector b is used for both the first
stage in UBSSDP(b) (LBSSDP(b)) and in UBSDP(b) (LBSDP(b)). Such an identical moment
vector can for example be the average of the moment vectors associated with all possible
regime realizations. We show that under certain conditions on the penalty parameters
b+ and b− the bounds generated by the stochastic programming model are equivalent to
the ones generated by the deterministic model.
We then consider the case that the deterministic bounds UBSDP(b) and LBSDP(b) are
computed based on moment vectors that give the worst possible values of the bounds.
We show that the bounds UBSSDP(b) and LBSSDP(b) are always less conservative than the
worst-case bounds and therefore can be useful alternatives when additional information
of the underlying regime dynamics is revealed. Finally, we show that the robust bound
UBSSDP(b) (LBSSDP(b)) is actually equivalent to a Value at Risk (VaR) quantity where
the confidence level is a function of the penalty parameters of the stochastic model, and
thus can be computed easily via a sorting algorithm over a finite number of deterministic
bounds.
For simplicity, we present only the proofs of the upper bound results, since the derivations of the corresponding lower bound results are analogous. In the following theorem, we show that the robust bounds $\text{UB}_{\text{SSDP}}(\mathbf{b})$ and $\text{LB}_{\text{SSDP}}(\mathbf{b})$ are always more conservative than the deterministic bounds $\text{UB}_{\text{SDP}}(\mathbf{b})$ and $\text{LB}_{\text{SDP}}(\mathbf{b})$ given the same moment vector $\mathbf{b}$. The difference between a robust bound and a deterministic bound, i.e. $(\text{UB}_{\text{SSDP}}(\mathbf{b}) - \text{UB}_{\text{SDP}}(\mathbf{b}))$, can be seen as the extra premium that relates to the cost of hedging against over- or under-estimation of the bound.

Theorem 3.3.1. The optimal value $\text{UB}_{\text{SSDP}}(\mathbf{b})$ (resp. $\text{LB}_{\text{SSDP}}(\mathbf{b})$) satisfies
\[
\text{UB}_{\text{SSDP}}(\mathbf{b}) \ge \text{UB}_{\text{SDP}}(\mathbf{b}) \quad \big(\text{resp. } \text{LB}_{\text{SSDP}}(\mathbf{b}) \le \text{LB}_{\text{SDP}}(\mathbf{b})\big)
\]
for any $\mathbf{b}$.
Proof. Suppose that $\text{UB}_{\text{SSDP}}(\mathbf{b}) < \text{UB}_{\text{SDP}}(\mathbf{b})$ for some $\mathbf{b}$. Since the optimal solution of (3.5) is feasible for the problem (3.3), and the form of the function $\text{UB}_{\text{SSDP}}(\mathbf{b}) := \sum_{p=0}^{d} b_p y_p^{\text{opt}}$ is identical to the objective function of (3.3), this contradicts the fact that $\text{UB}_{\text{SDP}}(\mathbf{b})$ is the optimal value of (3.3).
The penalty parameters b+ and b− determine the risk aversion attitude of users to-
wards the difference between the first-stage bound and the bound on the option price for
a given regime realization in the second stage. Intuitively, the higher (lower) the b+ and
b− are, the more (less) sensitive the users are towards the difference. In the following
theorem, we show that when b+ ≤ 1 (resp. b− ≤ 1) the upper (resp. lower) bound
generated by the stochastic programming model is equivalent to the upper (resp. lower)
bound generated by the corresponding deterministic model.
Theorem 3.3.2. If b+ ≤ 1 (resp. b− ≤ 1), the optimal value UBSSDP(b) = UBSDP(b)
(resp. LBSSDP(b) = LBSDP(b)) for any b.
Proof. Based on Theorem 3.3.1, $\text{UB}_{\text{SSDP}}(\mathbf{b}) \ge \text{UB}_{\text{SDP}}(\mathbf{b})$ for any $\mathbf{b}$. Here, we further show that $\text{UB}_{\text{SSDP}}(\mathbf{b}) \le \text{UB}_{\text{SDP}}(\mathbf{b})$ for any $\mathbf{b}$ given that $b^+ \le 1$. First, let $\mathbf{y}' := (y'_0,\dots,y'_d)$ be the optimal solution of the problem (3.5) and let $\mathbf{y}'' := (y''_0,\dots,y''_d)$ be the optimal solution of the problem (3.3). Suppose now that $\text{UB}_{\text{SSDP}}(\mathbf{b}) > \text{UB}_{\text{SDP}}(\mathbf{b})$ given that $b^+ \le 1$, which can be equivalently written as
\[
\sum_{p=0}^{d} b_p y'_p = \sum_{p=0}^{d} b_p y''_p + \delta, \quad \delta > 0. \qquad (3.10)
\]
Then, by substituting the right-hand side of (3.10) for the optimal-value quantity associated with the solution $\mathbf{y}'$ in the objective function of (3.5), and further rearranging the objective function based on the following partitions of the scenarios $w_s$,
\[
\begin{aligned}
I_1 &:= \Big\{ w_s \;\Big|\; h(w_s) \ge \sum_{p=0}^{d} b_p y'_p \Big\}, \\
I_2 &:= \Big\{ w_s \;\Big|\; \sum_{p=0}^{d} b_p y''_p \le h(w_s) < \sum_{p=0}^{d} b_p y'_p \Big\}, \\
I_3 &:= \Big\{ w_s \;\Big|\; h(w_s) < \sum_{p=0}^{d} b_p y''_p \Big\},
\end{aligned}
\]
we obtain the following quantity:
\[
\sum_{p=0}^{d} b_p y''_p + \delta + \sum_{w_s \in I_1} P(w_s) \cdot s(w_s) + \sum_{w_s \in I_2} P(w_s) \cdot s(w_s) + \sum_{w_s \in I_3} P(w_s) \cdot s(w_s), \qquad (3.11)
\]
where
\[
s(w_s) =
\begin{cases}
b^+\big(h(w_s) - \sum_{p=0}^{d} b_p y''_p - \delta\big) & w_s \in I_1, \\[2pt]
b^-\big(\sum_{p=0}^{d} b_p y''_p + \delta - h(w_s)\big) & w_s \in I_2, \\[2pt]
b^-\big(\sum_{p=0}^{d} b_p y''_p + \delta - h(w_s)\big) & w_s \in I_3.
\end{cases}
\]
The quantity (3.11) can be rewritten as
\[
\sum_{p=0}^{d} b_p y''_p + R(\mathbf{y}'') + \Delta + \delta\Big(1 - \sum_{w_s \in I_1} P(w_s)\, b^+ + \sum_{w_s \in I_2} P(w_s)\, b^- + \sum_{w_s \in I_3} P(w_s)\, b^-\Big),
\]
where $\Delta := b^- \sum_{w_s \in I_2} P(w_s) \cdot \big(\sum_{p=0}^{d} b_p y''_p - h(w_s)\big) - b^+ \sum_{w_s \in I_2} P(w_s) \cdot \big(h(w_s) - \sum_{p=0}^{d} b_p y''_p\big)$. Due to (3.10), $h(w_s) - \sum_{p=0}^{d} b_p y''_p \le \delta$ holds for $w_s \in I_2$, and thus $\Delta \ge \delta\big(-b^+ \sum_{w_s \in I_2} P(w_s) - b^- \sum_{w_s \in I_2} P(w_s)\big)$. Based on this and some algebraic manipulation, it is easy to see that $\Delta + \delta\big(1 - \sum_{w_s \in I_1} P(w_s)\, b^+ + \sum_{w_s \in I_2} P(w_s)\, b^- + \sum_{w_s \in I_3} P(w_s)\, b^-\big) > 0$ if $b^+ \le 1$, which leads to the conclusion that $\mathbf{y}''$ is better than $\mathbf{y}'$ for the problem (3.5). This is a contradiction, and thus if $b^+ \le 1$, $\text{UB}_{\text{SSDP}}(\mathbf{b}) \le \text{UB}_{\text{SDP}}(\mathbf{b})$ must hold.
Consider now the worst-case formulations of the problems (3.3) and (3.4) in the spirit of modern robust optimization [Ben-Tal et al., 2009]. From here on, let $C^*$ denote the union of the first-stage moments $\mathbf{b}$ and the moments $\mathbf{b}(w_s)$, $s = 1,\dots,S$, associated with each scenario realization $s$, i.e. $C^* := \{\mathbf{b}(w_s), s = 1,\dots,S\} \cup \{\mathbf{b}\}$. The worst-case formulation for the upper bound problem is given in (3.12), whereas the lower bound formulation is given in (3.13):
\[
\begin{aligned}
\text{WUB}_{\text{SDP}} = \min_{\mathbf{y},X,Z} \; \max_{\mathbf{b}(w) \in C^*} \quad & \sum_{p=0}^{d} b_p(w)\, y_p \qquad (3.12)\\
\text{subject to} \quad & (X, Z, \mathbf{y}) \in G \subset (\mathcal{J}^{m_1}, \mathcal{J}^{m_2}, \mathbb{R}^{d+1}), \\
& X, Z \succeq 0,
\end{aligned}
\]
\[
\begin{aligned}
\text{WLB}_{\text{SDP}} = \max_{\mathbf{y},X,Z} \; \min_{\mathbf{b}(w) \in C^*} \quad & \sum_{p=0}^{d} b_p(w)\, y_p \qquad (3.13)\\
\text{subject to} \quad & (X, Z, \mathbf{y}) \in H \subset (\mathcal{J}^{m_1}, \mathcal{J}^{m_2}, \mathbb{R}^{d+1}), \\
& X, Z \succeq 0,
\end{aligned}
\]
where $\mathbf{b}(w) := (b_0(w),\dots,b_d(w))$. In the worst-case formulations (3.12) and (3.13), the
moment vector b is determined by the inner optimization problem, which seeks the worst
possible value of the objective function. In Theorem 3.3.3, we show that the robust
bound UBSSDP(b) (resp. LBSSDP(b)) is always less conservative than the bound WUBSDP
(resp. WLBSDP), which is based on the moment vector that results in the most extreme
objective value. Before showing the main theorem, we first present the following lemma,
which is useful for proving Theorem 3.3.3 and Theorem 3.3.4.
Lemma 3.3.1. The function $h : \mathbf{y} \mapsto \sum_{p=0}^{d} b_p y_p$, given any $\mathbf{b} := (b_0,\dots,b_d)$, is continuous and unbounded above over the feasible set $\{(X, Z, \mathbf{y}) \in G \subset (\mathcal{J}^{m_1}, \mathcal{J}^{m_2}, \mathbb{R}^{d+1}) \mid X, Z \succeq 0\}$, where $\mathbf{y} := (y_0,\dots,y_d)$.

Proof. The continuity of $h$ is obvious since the feasible set is a convex set. To see the unboundedness, consider maximizing instead of minimizing the objective function in (3.3). The dual problem of the maximization form of (3.3) is as follows:
\[
\begin{aligned}
\underset{Q}{\text{minimize}} \quad & \int -h(\xi)\, dQ(\xi) \qquad (3.14)\\
\text{subject to} \quad & \int \xi^p\, dQ(\xi) = -b_p, \quad p = 0, 1, \dots, d, \\
& Q(\xi) \ge 0,
\end{aligned}
\]
where $b_0 = 1$. Clearly, no feasible solution exists for the above problem. Based on duality theory, this implies that the primal problem, the maximization form of (3.3), is unbounded. This completes the proof.
Now, we are ready to prove Theorem 3.3.3.
Theorem 3.3.3. The optimal value $\text{UB}_{\text{SSDP}}(\mathbf{b})$ (resp. $\text{LB}_{\text{SSDP}}(\mathbf{b})$) satisfies
\[
\text{UB}_{\text{SSDP}}(\mathbf{b}) \le \text{WUB}_{\text{SDP}} \quad \big(\text{resp. } \text{LB}_{\text{SSDP}}(\mathbf{b}) \ge \text{WLB}_{\text{SDP}}\big)
\]
for any $\mathbf{b} \in C^*$, where $C^* := \{\mathbf{b}(w_s), s = 1,\dots,S\} \cup \{\mathbf{b}\}$.

Proof. Consider the problem (3.5) with the first-stage parameter $\mathbf{b} := (b_0,\dots,b_d) \in C^*$. Let $\mathbf{y}' := (y'_0,\dots,y'_d)$ be the optimal solution of the problem (3.5). Suppose now that $\text{UB}_{\text{SSDP}}(\mathbf{b}) > \text{WUB}_{\text{SDP}}$. This implies that
\[
\sum_{p=0}^{d} b_p y'_p > \text{WUB}_{\text{SDP}} \ge \max_{\mathbf{b}(w) \in C^*} \; \min_{\mathbf{y},X,Z :\, (X,Z,\mathbf{y}) \in G,\; X,Z \succeq 0} \; \sum_{p=0}^{d} b_p(w)\, y_p. \qquad (3.15)
\]
To see why the last inequality in (3.15) is true, let $\mathbf{b}^{\text{opt}}$ and $\mathbf{y}^{\text{opt}}$ denote the optimal $\mathbf{b}(w)$ and $\mathbf{y}$ in the optimization problem in (3.15). Then, in the worst-case upper bound problem (3.12), if the optimal $\mathbf{y}^*$ in (3.12) equals $\mathbf{y}^{\text{opt}}$, the inequality in (3.15) must hold. Consider now the case where the optimal $\mathbf{y}^*$ in (3.12) does not equal $\mathbf{y}^{\text{opt}}$; then the following must hold by the definition of the optimization problem in (3.15):
\[
\sum_{p=0}^{d} b^{\text{opt}}_p y^*_p \ge \sum_{p=0}^{d} b^{\text{opt}}_p y^{\text{opt}}_p.
\]
Thus, the inequalities below follow immediately:
\[
\min_{\mathbf{y},X,Z :\, (X,Z,\mathbf{y}) \in G,\; X,Z \succeq 0} \; \max_{\mathbf{b}(w) \in C^*} \sum_{p=0}^{d} b_p(w)\, y_p = \max_{\mathbf{b}(w) \in C^*} \sum_{p=0}^{d} b_p(w)\, y^*_p \ge \sum_{p=0}^{d} b^{\text{opt}}_p y^*_p \ge \sum_{p=0}^{d} b^{\text{opt}}_p y^{\text{opt}}_p.
\]
This completes the verification of the last inequality in (3.15). The inequality in (3.15) implies the following two sets of inequalities, (3.16) and (3.17):
\[
\sum_{p=0}^{d} b_p y'_p > \max_{\mathbf{b}(w) \in C^*} \; \min_{\mathbf{y},X,Z :\, (X,Z,\mathbf{y}) \in G,\; X,Z \succeq 0} \; \sum_{p=0}^{d} b_p(w)\, y_p \ge \min_{\mathbf{y},X,Z :\, (X,Z,\mathbf{y}) \in G,\; X,Z \succeq 0} \; \sum_{p=0}^{d} b_p y_p = \sum_{p=0}^{d} b_p y''_p, \qquad (3.16)
\]
\[
\sum_{p=0}^{d} b_p y'_p > \max_{\mathbf{b}(w) \in C^*} \; \min_{\mathbf{y},X,Z :\, (X,Z,\mathbf{y}) \in G,\; X,Z \succeq 0} \; \sum_{p=0}^{d} b_p(w)\, y_p \ge \text{UB}_{\text{SDP}}(\mathbf{b}(w)), \quad \forall \mathbf{b}(w) \in C^*, \qquad (3.17)
\]
where $(y''_0,\dots,y''_d)$ denotes the optimal solution of the last optimization problem in (3.16). The inequalities (3.16) imply that there exists a feasible $\mathbf{y}''$ such that
\[
\sum_{p=0}^{d} b_p y'_p > \sum_{p=0}^{d} b_p y''_p. \qquad (3.18)
\]
Thus, based on Lemma 3.3.1 there must exist $\mathbf{y}''' := (y'''_0,\dots,y'''_d)$ such that
\[
\sum_{p=0}^{d} b_p y'_p > \sum_{p=0}^{d} b_p y'''_p = \text{WUB}_{\text{SDP}}, \quad \text{and thus} \quad \sum_{p=0}^{d} b_p y'''_p \ge \text{UB}_{\text{SDP}}(\mathbf{b}(w)), \;\; \forall \mathbf{b}(w) \in C^*. \qquad (3.19)
\]
Finally, based on (3.17) and (3.19), it is easy to verify that $\mathbf{y}'''$ is better than $\mathbf{y}'$ for the problem (3.5), which is a contradiction. Thus, $\text{UB}_{\text{SSDP}}(\mathbf{b}) \le \text{WUB}_{\text{SDP}}$ must hold for any $\mathbf{b} \in C^*$.
Interestingly, the two-stage stochastic semidefinite programming model for a robust upper (resp. lower) bound can in fact be recast as a newsvendor problem with a simple lower (resp. upper) bound constraint (cf. [Shapiro et al., 2009]). For simplicity, we present and discuss only the reformulation of the upper bound problem as a newsvendor problem.

Theorem 3.3.4. Given that $\text{UB}_{\text{SDP}}(\mathbf{b}) < \infty$, the two-stage stochastic semidefinite programming model (3.5) is equivalent to the following newsvendor problem:
\[
\underset{x' \ge l}{\text{minimize}} \quad x' + \mathbb{E}_w\big[\, b^+ (h(w) - x')^+ + b^- (h(w) - x')^- \,\big],
\]
where $b^+, b^- \ge 0$ and $l := \text{UB}_{\text{SDP}}(\mathbf{b})$.

Proof. To prove the equivalence, it suffices to show that for any $x'$ satisfying $x' \ge \text{UB}_{\text{SDP}}(\mathbf{b})$, there exists a solution $\mathbf{y} := (y_0,\dots,y_d)$ that is feasible with respect to both the equality $\sum_{p=0}^{d} b_p y_p = x'$ and the constraints in (3.5). Given that $\text{UB}_{\text{SDP}}(\mathbf{b}) < \infty$, i.e. there exists $(y_0,\dots,y_d)$ such that $\sum_{p=0}^{d} b_p y_p \le x'$ for any $x' \ge l$, the existence of a feasible $\mathbf{y}$ satisfying $\sum_{p=0}^{d} b_p y_p = x'$ can be proven by showing that the function $\sum_{p=0}^{d} b_p y_p$ is continuous over the feasible set $\{(X, Z, \mathbf{y}) \in G \subset (\mathcal{J}^{m_1}, \mathcal{J}^{m_2}, \mathbb{R}^{d+1}) \mid X, Z \succeq 0\}$ and unbounded above, which is the result of Lemma 3.3.1.
Based on the above theorem, a closed-form solution of the two-stage stochastic semidefinite programming model for the upper bound (3.5) can be derived using the approach for deriving the optimal solution of the newsvendor problem.

Theorem 3.3.5. (cf. [Shapiro et al., 2009])
\[
\text{UB}_{\text{SSDP}}(\mathbf{b}) = \max\big\{\text{UB}_{\text{SDP}}(\mathbf{b}),\; F^{-1}(\kappa^*)\big\},
\]
where $\kappa^* = (b^+ - 1)/(b^+ + b^-)$, $F^{-1}(\kappa^*) = \inf\{h^* : F(h^*) \ge \kappa^*\}$, and $F(\cdot)$ denotes the cumulative distribution function of the random outcome $h(w)$.

It is important to note that the quantity $F^{-1}(\kappa^*)$ is also the popular risk measure Value-at-Risk (VaR). This result reveals that the flexibility we provide users to control their risk-aversion attitude towards over- or under-estimation of the bounds has a direct interpretation as the quantile selected for the Value-at-Risk measure. Later, in the computational experiments, we will report both the values of $b^+$ and $b^-$ and the quantile to highlight the usefulness of this connection.

Finally, we highlight in the following corollary that, as a result of Theorem 3.3.5, to compute the robust bounds $\text{UB}_{\text{SSDP}}(\mathbf{b})$ and $\text{LB}_{\text{SSDP}}(\mathbf{b})$ it suffices to consider only a finite number of deterministic bounds, e.g. $h(w_s)$, and find the bounds that are optimal with respect to the problems (3.5) and (3.8). This procedure can be easily carried out using a sorting algorithm.

Corollary 3.3.1. (cf. [Shapiro et al., 2009]) The optimal value
\[
\text{UB}_{\text{SSDP}}(\mathbf{b}) \in \{\text{UB}_{\text{SDP}}(\bar{\mathbf{b}}) \mid \bar{\mathbf{b}} \in C^*\} \quad \big(\text{resp. } \text{LB}_{\text{SSDP}}(\mathbf{b}) \in \{\text{LB}_{\text{SDP}}(\bar{\mathbf{b}}) \mid \bar{\mathbf{b}} \in C^*\}\big),
\]
where $C^* := \{\mathbf{b}(w_s), s = 1,\dots,S\} \cup \{\mathbf{b}\}$.
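The sorting procedure implied by Theorem 3.3.5 can be sketched as follows. The scenario values below are hypothetical; in practice the deterministic bounds $h(w_s)$ and $\text{UB}_{\text{SDP}}(\mathbf{b})$ would be computed offline by solving the SDPs.

```python
# Compute UB_SSDP(b) = max{UB_SDP(b), F^{-1}(kappa*)} by sorting the finitely
# many scenario bounds h(w_s), following Theorem 3.3.5.

def ub_ssdp(ub_sdp, scenarios, b_plus, b_minus):
    """scenarios: list of (probability, h_ws) pairs; assumes b_plus > 1 so that
    the quantile level kappa* lies in (0, 1)."""
    kappa = (b_plus - 1.0) / (b_plus + b_minus)  # quantile level kappa*
    # Empirical quantile F^{-1}(kappa): smallest h with cumulative prob >= kappa.
    cum = 0.0
    quantile = None
    for prob, h in sorted(scenarios, key=lambda s: s[1]):
        cum += prob
        if cum >= kappa:
            quantile = h
            break
    return max(ub_sdp, quantile)

# Hypothetical scenario bounds and first-stage deterministic bound.
scenarios = [(0.2, 1.0), (0.5, 2.0), (0.3, 3.0)]
print(ub_ssdp(1.5, scenarios, b_plus=3.0, b_minus=1.0))  # kappa* = 0.5 -> 2.0
```

With $b^+ = 3$, $b^- = 1$ we get $\kappa^* = 0.5$; the empirical CDF reaches $0.5$ at the scenario bound $2.0$, which dominates the deterministic bound $1.5$.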
3.4 Application in Bounding Option Prices
The problem of computing bounds on option prices has been of recent (e.g. [Gotoh and
Konno, 2002], [Bertsimas and Popescu, 2002], [Dalakouras et al., 2006], [Popescu, 2007])
and past interest (e.g. [Ritchken, 1985], [Lo, 1987], [Grundy, 1991], [Boyle and Lin,
1997]). This problem is important because it arises from the consideration of alternative
models to the standard geometric Brownian motion that is assumed in the Black-Scholes
framework for modeling the price of an asset, since geometric Brownian motion often
results in pricing biases. A variant of this problem [Bertsimas and Popescu, 2002], [Gotoh
and Konno, 2002] considers computing the tightest possible upper and lower bounds of
the price of an option given that there is no arbitrage in the market and that only the
first several moments of a risk-neutral distribution are known. This approach can be seen as a relaxation of the Black-Scholes approach in that a particular model is not assumed, i.e. the approach is considered model- or distribution-free.
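As a concrete instance of such a moment-based bound, the classical mean-variance upper bound on a call payoff, in the spirit of [Lo, 1987], has the closed form $\mathbb{E}[(S_T - K)^+] \le \tfrac{1}{2}\big((\mu - K) + \sqrt{(\mu - K)^2 + \sigma^2}\big)$, where $\mu$ and $\sigma^2$ are the mean and variance of $S_T$ under the risk-neutral distribution. The numbers in the sketch below are hypothetical:

```python
import math

# Distribution-free upper bound on E[(S_T - K)^+] given only the mean mu and
# variance sigma^2 of S_T (mean-variance bound in the spirit of Lo 1987).
def call_upper_bound(mu, sigma, K):
    return 0.5 * ((mu - K) + math.sqrt((mu - K) ** 2 + sigma ** 2))

# Hypothetical risk-neutral moments of the terminal price.
mu, sigma, K = 100.0, 20.0, 105.0
bound = call_upper_bound(mu, sigma, K)
# The bound always dominates the value max(mu - K, 0) implied by Jensen's inequality,
# and collapses to it as sigma -> 0.
assert bound >= max(mu - K, 0.0)
```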
In this section, we consider bounds on the price of a European-style call option under regime switching. The two-stage stochastic semidefinite programming model is applied, incorporating a lattice generated by a finite-state Markov chain regime-switching model as a representation of the scenarios (uncertainty) used to compute bounds. Our objective here is to have a distribution-free approach for computing bounds on a European-style call option, but with a regime-switching process that does not necessarily assume a lognormal distribution for each regime. We incorporate a finite-state Markov chain regime-switching process as in [Hamilton, 1989] to generate a discrete lattice for computing option bounds. The strategy is to use the lattice as a discrete set of scenarios representing uncertainty in the regimes that will realize in the second stage of a stochastic programming with recourse framework. The use of our stochastic semidefinite programming model generates a first-stage (here-and-now) bound that accounts for the regime-switching dynamics of the underlying asset. We demonstrate the value of the stochastic solution (bound), and computational experiments using the S&P 500 index illustrate the advantages of the stochastic programming approach over the deterministic strategy.
3.4.1 A Moment-Based Lattice under Regime Switching
We present here the construction of a lattice used for generating input parameters within
the recourse of the stochastic program in (3.5) and (3.8): scenarios s (s = 1, ..., S),
associated probability P (ws) and moments b(ws). The lattice is constructed based on
the information of conditional risk-neutral moments of a discrete-time regime switching
process, which captures the switching dynamics of the moments.
We assume that the continuous compound return of an underlying security follows a Markovian regime-switching model. The model assumes that there exist multiple regimes $\Phi_t \in \{1,\dots,m'\}$ for the value of the security at time $t$, and for different times $t$ the regimes can "switch" according to a transition probability matrix $P_\Phi$. The switching follows a Markov process, $p_{k'l'} = P_\Phi(\Phi_t = l' \mid \Phi_{t-1} = k') = P_\Phi(\Phi_t = l' \mid I'_{t-1})$, where $I'_{t-1}$ refers to the information set of prices available up to $t-1$. The model is defined as follows:
\[
\begin{aligned}
R_t &= \mu_{\Phi_t} + \varepsilon_t, \qquad (3.20)\\
\varepsilon_t \mid \Phi_t &\sim \mathcal{N}(0, \sigma^2_{\Phi_t}),
\end{aligned}
\]
where $R_t$ is the variable of interest, e.g. the continuous compound return, $\mu_{\Phi_t}$ is the regime-dependent mean, and $\varepsilon_t$ is a normally distributed random variable with regime-dependent variance $\sigma^2_{\Phi_t}$.
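The Markov switching among regimes governed by the transition matrix $P_\Phi$ can be simulated directly; a minimal sketch with hypothetical two-regime transition probabilities:

```python
import random

# Simulate a regime path Phi_1, ..., Phi_n from a finite-state Markov chain
# with transition matrix P (rows sum to one; values are hypothetical).
def simulate_regimes(P, phi0, n, rng):
    path = [phi0]
    for _ in range(n - 1):
        u = rng.random()
        row = P[path[-1]]
        # Pick the next regime by inverting the cumulative transition probabilities.
        cum = 0.0
        for regime, p in enumerate(row):
            cum += p
            if u < cum:
                path.append(regime)
                break
        else:
            path.append(len(row) - 1)  # guard against floating-point rounding

    return path

P = [[0.95, 0.05],   # calm regime tends to persist
     [0.10, 0.90]]   # volatile regime is also persistent
path = simulate_regimes(P, phi0=0, n=12, rng=random.Random(7))
```

Each simulated path is one realization of the regime sequence that the lattice in this section enumerates exhaustively rather than by sampling.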
Obtaining risk-neutral moments for a regime-switching process is in general not possible; however, by conditioning on a sequence of realized regimes $\{\Phi_T = \bar\Phi_T, \dots, \Phi_{t+1} = \bar\Phi_{t+1}\}$, where $\bar\Phi_T, \dots, \bar\Phi_{t+1} \in \{1,\dots,m'\}$, the risk-neutral moments can be obtained as follows. From here on, the notation $S_t$ denotes the stock price at time $t$, and $T$ denotes a terminal time.

Lemma 3.4.1. Suppose the continuous compound return of the security price $S_t$ follows the mean-variance regime-switching model (3.20). Then the conditional risk-neutral moment is
\[
\mathbb{E}\big[S_T^d \mid \Phi_T = \bar\Phi_T, \dots, \Phi_{t+1} = \bar\Phi_{t+1}, S_t = S_0\big] = S_0^d \exp\Big(d\Big(nr - \tfrac{1}{2}\sum_{\kappa'=t+1}^{T} \sigma^2_{\Phi_{\kappa'}}\Big)h + \tfrac{1}{2}\, d^2 \sum_{\kappa'=t+1}^{T} \sigma^2_{\Phi_{\kappa'}}\, h\Big),
\]
where $\sigma^2_{\Phi_{\kappa'}}$ is the variance with respect to regime $\Phi_{\kappa'}$, $n$ is the number of time steps, $h$ is the length of a time step, and $r$ is the risk-free rate.
Proof. For each time step of length $h$ at time $t$, the process can be written as
$$\log\Big(\frac{S_{t+1}}{S_t}\Big) = \mu_{\Phi_{t+1}} h + \sigma_{\Phi_{t+1}} \sqrt{h}\,\varepsilon_{t+1}, \qquad \varepsilon_{t+1} \sim N(0, 1),$$
where $\mu_{\Phi_{t+1}}$ and $\sigma_{\Phi_{t+1}}$ are independent of $\varepsilon_{t+1}$. Using the transformation of measures (Maruyama-Girsanov theorem, cf. [Kariya and Liu, 2003]), we can derive an equivalent martingale measure by the change of variable $\varepsilon_{t+1} = \varepsilon^*_{t+1} + \delta'_{t+1}\sqrt{h}$, $\varepsilon^*_{t+1} \sim N(0, 1)$, where $\delta'_{t+1} = -(\mu_{\Phi_{t+1}} - r + \frac{1}{2}\sigma^2_{\Phi_{t+1}})/\sigma_{\Phi_{t+1}}$, and the process for each time step $h$ at time $t$ under the martingale measure becomes
$$S_{t+1} = S_t \exp\Big(\big(r - \tfrac{1}{2}\sigma^2_{\Phi_{t+1}}\big)h + \sigma_{\Phi_{t+1}}\sqrt{h}\,\varepsilon^*_{t+1}\Big), \qquad \varepsilon^*_{t+1} \sim N(0, 1).$$
By summing the sequence of normally distributed increments with mean $(r - \tfrac{1}{2}\sigma^2_{\Phi_t})h$ and variance $\sigma^2_{\Phi_t} h$, one obtains
$$S_T = S_0 \exp\Big(\Big(nr - \frac{1}{2}\sum_{\kappa'=t+1}^{T}\sigma^2_{\Phi_{\kappa'}}\Big)h + \sqrt{h}\,\sqrt{\sum_{\kappa'=t+1}^{T}\sigma^2_{\Phi_{\kappa'}}}\;\varepsilon^*\Big), \qquad \varepsilon^* \sim N(0, 1),$$
which is clearly a martingale conditional on a given sequence of regimes. Finally, the risk-neutral moments can be obtained using the moment function formula.
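The formula in Lemma 3.4.1 admits a quick numerical sanity check: for $d = 1$ the variance terms cancel, leaving $E[S_T] = S_0 e^{nrh}$ regardless of the regime path, as the martingale property under the risk-neutral measure requires. A minimal sketch (argument names are ours):

```python
from math import exp

def cond_rn_moment(S0, d, cum_var, n, r, h):
    """Conditional risk-neutral moment E[S_T^d | regime path] from
    Lemma 3.4.1; cum_var is the path's cumulative variance
    sum_{kappa'=t+1}^T sigma^2_{Phi_kappa'}."""
    return S0 ** d * exp(d * (n * r - 0.5 * cum_var) * h
                         + 0.5 * d ** 2 * cum_var * h)
```

For $d = 2$ the formula multiplies the squared forward by the convexity correction $e^{\sum \sigma^2 h}$, matching the second moment of a lognormal variable.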
A conditioning approach may seem computationally expensive or even intractable due to the exponential number of regime switching paths, where a single path is a realization of regimes for the time periods from the start to the time to maturity. Each time period can be in one of $m'$ regimes, so the total number of regime switching paths over $n'$ periods is $m'^{n'}$. Running $m'^{n'}$ SDP subroutines can be computationally intractable for large $m'$ or $n'$. However, a careful look at the quantity of interest, $E[S_T^d \mid \Phi_T = \bar\Phi_T, \ldots, \Phi_{t+1} = \bar\Phi_{t+1}, S_t = S_0] = S_0^d \exp\big(d(nr - \tfrac{1}{2}\sum_{\kappa'=t+1}^{T}\sigma^2_{\Phi_{\kappa'}})h + \tfrac{1}{2}d^2\sum_{\kappa'=t+1}^{T}\sigma^2_{\Phi_{\kappa'}} h\big)$, shows that different regime-switching paths can result in the same quantity $\sum_{\kappa'=t+1}^{T}\sigma^2_{\Phi_{\kappa'}}$. This observation is visualized in Figures 3.1(a) and 3.1(b), which illustrate the lattice construction for regime switching under 2 and 3 regimes.
Each axis represents the state of one regime. As the process proceeds, the switching of regimes can be viewed as taking incremental steps along the edges of the grid/mesh. For 2 regimes, the construction coincides with the structure of a binomial lattice, and all paths that traverse to the same node at time $T$ in the lattice are associated with the same quantity $\sum_{\kappa'=t+1}^{T}\sigma^2_{\Phi_{\kappa'}}$. For 3 regimes, the lattice construction requires an additional axis (dimension) so that the paths resulting in the same quantity $\sum_{\kappa'=t+1}^{T}\sigma^2_{\Phi_{\kappa'}}$ can merge. Similar ideas apply to larger numbers of regimes.

[Figure 3.1: Regime switching lattices. (a) Lattice for 2 regimes (axes $w_1$, $w_2$); (b) lattice for 3 regimes (axes $w_1$, $w_2$, $w_3$).]
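The saving from merging can be quantified: a path's cumulative variance depends only on how many of the $T$ periods are spent in each regime, i.e., on the multiset of visited regimes, so the $m'^T$ paths collapse to $\binom{m'+T-1}{T}$ lattice nodes. A small sketch (function name ours):

```python
from math import comb

def lattice_sizes(m, T):
    """Paths vs. merged lattice nodes for m regimes over T periods: the
    cumulative variance is permutation-invariant, so m**T paths merge
    into C(m + T - 1, T) multisets of regimes."""
    return m ** T, comb(m + T - 1, T)
```

For instance, 2 regimes over 8 periods give 256 paths but only 9 lattice nodes, exactly the terminal nodes of a binomial lattice.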
Thus, to generate scenarios for $m'$ regimes with the set of variances $(\sigma^2)^s := \{\sigma^2_{\Phi_1}, \ldots, \sigma^2_{\Phi_{m'}}\}$ given $T$ time periods, we first generate all combinations with replacement of $T$ selections from the set $(\sigma^2)^s$; that is, we generate a sequence such as $\{1, 1, 2, 3\}$ if $m' = 3$ and $T = 4$, but do not allow a repeated combination such as $\{2, 1, 1, 3\}$. Each combination corresponds to a scenario, and the respective risk-neutral moments can be derived using the formula in Lemma 3.4.1. Thereafter, to derive the probability of each scenario, we generate all possible permutations (i.e., the exact regime switching paths) of each combination (scenario), and for each path $i$ calculate its probability by $P(i) = \sum_{j=1}^{m'} P(i \mid \Phi_0 = j)P(\Phi_0 = j)$, where $\Phi_0$ is the initial state at time 0. With the estimated filtered probability $P(\Phi_0 = j)$ and the switching probabilities $p_{k'l'}$, $P(i)$ can then be computed. Finally, by summing the probabilities of all paths within each scenario, we obtain a complete set of risk-neutral moments and respective probabilities for all scenarios. The overall procedure is summarized by the pseudo-code algorithm in Table 3.1.
1. Initialize
   $S_0$: initial price, $r$: risk-free rate, $(\sigma^2)^s$: set of variances from the $m'$ regimes,
   $T$: number of time periods until maturity,
   $P(\Phi_0 = j)$: filtered probability of the initial state,
   $p_{k'l'}$: transition probability from regime $k'$ to regime $l'$.

2. Generate scenarios $w_s$, $s = 1, \ldots, S$
   (2.a) generate the list of combinations with replacement $w_s$, $s = 1, \ldots, S$, from
         the sequence $\{1, \ldots, m'\}$, where the number of elements in each combination is $T$.
   for $s = 1$ to $S$
      (2.b) compute the cumulative variance $\sum_{\kappa'=t+1}^{T} \sigma^2_{\Phi_{\kappa'}}$
      (2.c) compute the risk-neutral moments using the formula
            $E[S_T^d \mid \Phi_T = \bar\Phi_T, \ldots, \Phi_{t+1} = \bar\Phi_{t+1}, S_t = S_0]
             = S_0^d \exp\big(d(nr - \tfrac{1}{2}\sum_{\kappa'=t+1}^{T}\sigma^2_{\Phi_{\kappa'}})h
             + \tfrac{1}{2}d^2 \sum_{\kappa'=t+1}^{T}\sigma^2_{\Phi_{\kappa'}}\,h\big)$
   end

3. Compute the probability of each scenario
   for $s = 1$ to $S$
      (3.a) generate all permutations $c'' = 1, \ldots, C''$ of the combination representing $w_s$
      for $c'' = 1$ to $C''$
         (3.b) compute $P(c'') = \sum_{j=1}^{m'} P(c'' \mid \Phi_0 = j) P(\Phi_0 = j)$
      end
      (3.c) compute $P(w_s) = \sum_{c''=1}^{C''} P(c'')$
   end

Table 3.1: Pseudo-code for scenario generation
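The procedure of Table 3.1 can be sketched in Python with the standard library. The helper names are ours, as is the assumption that the first regime of a path is reached from $\Phi_0$ by one transition; this is an illustrative sketch, not the thesis's implementation:

```python
from itertools import combinations_with_replacement, permutations
from math import exp

def generate_scenarios(S0, r, sigma2, T, p0, P_trans, h, d_max=4):
    """Moment-based lattice scenarios for an m'-regime switching model.
    sigma2[j] is regime j's variance, p0[j] the filtered probability of
    starting in regime j, P_trans the transition matrix. Returns a list
    of (moments, probability) pairs, one per scenario."""
    m = len(sigma2)
    scenarios = []
    # (2.a) each sorted T-tuple of regimes is one scenario: paths merge on
    # the cumulative variance, which does not depend on the regime order
    for combo in combinations_with_replacement(range(m), T):
        # (2.b) cumulative variance along the path
        cum_var = sum(sigma2[k] for k in combo)
        # (2.c) conditional risk-neutral moments E[S_T^d | path]
        # (Lemma 3.4.1), with n = T time steps of length h
        moments = [S0 ** d * exp(d * (T * r - 0.5 * cum_var) * h
                                 + 0.5 * d ** 2 * cum_var * h)
                   for d in range(d_max + 1)]
        # (3.a)-(3.c) scenario probability: sum over the distinct
        # orderings (regime switching paths) of this combination
        prob = 0.0
        for path in set(permutations(combo)):
            for j in range(m):  # condition on the initial state Phi_0 = j
                p = p0[j] * P_trans[j][path[0]]
                for a, b in zip(path, path[1:]):
                    p *= P_trans[a][b]
                prob += p
        scenarios.append((moments, prob))
    return scenarios
```

Because every regime path belongs to exactly one scenario, the scenario probabilities sum to one, and the first moment of every scenario equals the forward price $S_0 e^{Trh}$.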
3.4.2 Implementation and Experiments
Let $T$ and $K$ denote, respectively, the time to maturity (exercise time) and the strike price (exercise price) of a European call option, and let $r$ be the risk-free rate. Then the call option price is given by
$$e^{-r\tau} E_Q[\max(0, S_T - K)], \qquad (3.21)$$
where $\tau = T - t$. The deterministic moment problems that bound the price of the call option are as follows:
$$\mathrm{UB}_{\mathrm{SDP}}(b) := e^{-r\tau} \max_{Q}\; E_Q[\max(0, S_T - K)] \quad \text{subject to } E_Q[S_T^p] = b_p,\; p = 0, 1, \ldots, d, \qquad (3.22)$$
and
$$\mathrm{LB}_{\mathrm{SDP}}(b) := e^{-r\tau} \min_{Q}\; E_Q[\max(0, S_T - K)] \quad \text{subject to } E_Q[S_T^p] = b_p,\; p = 0, 1, \ldots, d, \qquad (3.23)$$
where $S_T$ is a non-negative random variable. The above problems can be considered special cases of the general moment problems addressed in Section 3.1.
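For intuition about what (3.22) computes, the special case $d = 2$ (only mean and second moment constrained) has a classical closed-form solution, the Scarf/Lo-type two-moment bound. This is our illustrative addition, not the thesis's SDP machinery, and the form below assumes the strike is large enough that the non-negativity constraint on $S_T$ is inactive:

```python
from math import exp, sqrt

def two_moment_upper_bound(mu, m2, K, r, tau):
    """Closed-form d = 2 instance of UB_SDP: the maximum of
    e^{-r tau} E_Q[max(0, S_T - K)] over all Q with E[S_T] = mu and
    E[S_T^2] = m2 (Scarf/Lo-type two-moment bound)."""
    var = m2 - mu ** 2
    assert var >= 0, "inconsistent moments: need E[S^2] >= (E[S])^2"
    return exp(-r * tau) * 0.5 * ((mu - K) + sqrt((mu - K) ** 2 + var))
```

When the variance is zero the bound collapses to the discounted intrinsic value, and it grows with the variance, reflecting the extra optionality the adversarial distribution can exploit.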
In this section we investigate the empirical performance of the robust bounds for
European-style call option prices under regime switching. In particular, we are interested
in S&P 500 stock index options, as empirical studies show strong evidence of regime
switching behavior [Turner et al., 1989], [So et al., 1998], [Hardy, 2001], [Freeland et al.,
2009]. Our results show that the deterministic SDP bound UBSDP(•) can be inadequate
for bounding the option price when the volatility of the underlying asset is non-deterministic. In
general, our estimation methodology follows [Christoffersen and Jacobs, 2004] and [Hsieh
and Ritchken, 2005]; that is, we minimize the sum of squared option-valuation errors and
combine cross-sectional information from option prices and asset prices. In the
computational experiments, we focus on options with maturities ranging from months to a
year rather than shorter durations, since regime switching in underlying assets such as
the S&P 500 most often occurs over horizons longer than a few months.
The main intent of the experiments is to demonstrate the computational feasibility
of computing quality bounds on option prices for assets with regime switching dynamics
using a stochastic semidefinite programming approach, rather than to emphasize the
calibration of the model or to test in- and out-of-sample performance. We collected option
data with maturities in multiples of five weeks. The data covers approximately the four-year
period from October 2004 to March 2008. The data is collected on the third Friday
of each month and is obtained from the OptionMetrics database through the Rotman
Financial and Trading Lab at the University of Toronto. We adjust the index level
according to the dividends paid out over the time to maturity. The actual cash dividend
payments made during the life of the option are used as a proxy for the expected dividend
payments, as suggested in [Jackwerth and Rubinstein, 1996] and [Bakshi et al., 1997].
We then subtract the present value of all the dividends from the index levels to obtain
contemporaneous adjusted index levels. We also normalize the option and strike prices
by the adjusted index price so that the adjusted index price is $1. We use the T-bill
term structure to deduce the discount rates and estimate the regime switching model by
minimizing the sum of squared errors between theoretical and actual prices.
However, in order to combine cross-sectional information from option prices with the
time-series behavior of the underlying asset, the transition probability matrix $P_\Phi$ and the
filtered probabilities of the regimes are estimated using Maximum Likelihood Estimation
(MLE). Then, we minimize the following quantity:
$$\$\mathrm{RMSE} = \sqrt{\frac{1}{N_t}\sum_{i^*}\big(C_{i^*,t} - C_{i^*,t}(h^*_t)\big)^2},$$
where $C_{i^*,t}$ is the market price of contract $i^*$ at time $t$, $C_{i^*,t}(h^*_t)$ is the respective model price, and $N_t$ is the total number of contracts available at time $t$. At each time $t$,
the MLE-estimated filtered probability of each regime is taken as the probability of the
initial state, and the MLE-estimated transition probability matrix $P_\Phi$ is assumed to
hold for all contracts. Hence, only the volatility of each regime needs to be estimated.
This choice of estimation approach ensures that the volatility estimation is consistent
with traditional implied-volatility estimation approaches. We generate 10,000 simulated
paths and employ the antithetic variate technique for variance reduction. To verify that
10,000 draws are adequate, we generated 25,000 paths for several cases and obtained
identical results.
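The antithetic variate technique pairs each standard normal draw $z$ with $-z$; for a monotone payoff the two estimates are negatively correlated, so their average has lower variance than two independent draws. A self-contained sketch under a plain lognormal model (parameter choices illustrative, not the thesis's calibration):

```python
import random
from math import exp, sqrt

def call_price_antithetic(S0, K, r, sigma, tau, n_pairs, seed=0):
    """Monte Carlo estimate of e^{-r tau} E[max(0, S_T - K)] under a
    lognormal model, using antithetic variates: each normal draw z is
    paired with -z, which reduces variance for monotone payoffs."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * tau
    vol = sigma * sqrt(tau)
    total = 0.0
    for _ in range(n_pairs):
        z = rng.gauss(0.0, 1.0)
        for zz in (z, -z):  # antithetic pair
            ST = S0 * exp(drift + vol * zz)
            total += max(0.0, ST - K)
    return exp(-r * tau) * total / (2 * n_pairs)
```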
The following option bound computations are based on estimation using data up to
March 20, 2008; details are provided in [Kwon and Li, 2011]. The initial price
is $S_0 = 1329.51$ and the yearly risk-free rate is $r = 0.01$. From the estimation results, in which
up to five regimes are estimated, the initial state of the price process is consistently
found to be in one of two regimes, as determined by the estimated filtered probability of
each regime. The volatilities of the two regimes are distinctly different. Let $b_{low}$ (resp.
$b_{high}$) denote the first four moments of the risk-neutral distribution for the regime with
low (resp. high) volatility. The first four moments are usually considered to capture most
of the essential distributional information, such as skewness and kurtosis. In our experiments
we consider that a trader may either assume the price process follows the regime with
low volatility and compute the deterministic SDP upper bound UBSDP($b_{low}$), or assume
the process follows the regime with high volatility and compute the deterministic SDP
lower bound LBSDP($b_{high}$).
To verify the quality of the bounds, we compute the European call option prices
under a lognormal regime switching model and use them as references for true option
prices. We also compute the respective robust bounds UBSSDP(blow) and LBSSDP(bhigh)
that further account for the dynamics of regime switching. For simplicity, we set b− = 1
(resp. b+ = 1) when computing upper (resp. lower) bounds for different values of b+
(resp. b−); we present the respective quantile (%) based on Theorem 3.3.5, which helps
to highlight its connection with the VaR risk measure. Finally, the worst-case bounds
WUBSDP and WLBSDP are generated as well. The bounds and prices are computed over
various maturities in multiples of 5-week periods for regime switching processes with 2, 3,
4, and 5 regimes. From here on, RS denotes the value of the European call option prices
under a lognormal regime switching model, and BS (Low) (resp. BS (High)) refers to
the Black-Scholes call option price computed based on the regime with low (resp. high)
volatility.

[Figure 3.2: The case of 2 regimes and K = 1200. (a) Upper bounds and prices; (b) lower bounds and prices. Each panel plots BS (Low)/BS (High), RS, the deterministic bound UBSDP/LBSDP, the robust bounds UBSSDP/LBSSDP with b+ (resp. b−) equal to 10 and 10², and the worst-case bound, against time to maturity (5 weeks/unit).]
In Table A.4, we provide prices/bounds for various strike prices K that correspond
to the cases of in-the-money (K=1200), at-the-money (K=1325), and out-of-the-money
(K=1400).
(a) Quality of Bounds. As seen in Figures 3.2-3.4 and Figures B.1-B.3, except for the
lower-bound case of 2 regimes and strike price 1200, the deterministic SDP bounds are
inadequate for bounding the European call option prices in the presence of regime switching.
Despite relying the least on the form of the distribution, the deterministic SDP bounds
still significantly under- or overestimate how extreme the option price can be. More
importantly, if the distribution for each regime deviates from the lognormal distribution
assumed in RS pricing, e.g., a fat-tailed return distribution in a bear market, the actual under- or overestimation resulting from the use of the deterministic SDP bounds can be even worse.

[Figure 3.3: The case of 2 regimes and K = 1325. (a) Upper bounds and prices; (b) lower bounds and prices; same quantities plotted as in Figure 3.2.]
On the other hand, the robust bounds UBSSDP($b_{low}$) (LBSSDP($b_{high}$)) consistently bound
the RS prices for fixed penalty parameters and thus can serve as useful alternatives
for evaluating the extremeness of option prices. In addition, the robust bounds are
much more reasonable than the worst-case upper bounds WUBSDP, which are
too conservative to have any practical value. The most extreme example is found
in Figure B.3(a), where the worst-case upper bound is meaningless. Interestingly, even
when we increase the value of b+ to an extent that corresponds to covering 99.999% of
the possible bounds, the robust upper bounds UBSSDP($b_{low}$) remain significantly tighter
than the worst-case bounds WUBSDP. This sheds light on the practical value of robust
bounds that allow control over the degree to which the regime switching dynamics
are incorporated.
[Figure 3.4: The case of 2 regimes and K = 1400. (a) Upper bounds and prices; (b) lower bounds and prices; same quantities plotted as in Figure 3.2.]
(b) The Impact of the Structure of the Regime Switching Lattice on the Option
Bounds. It can be observed in Figures 3.2-3.4 that for different K values, the features of
the robust bounds are mostly identical apart from the actual values. This is plausible since
a change in K is unrelated to the structure of the lattice and therefore affects
only the exact values of the bounds. In addition, the robust bounds exhibit two trends.
First, all bounds increase as a function of time to maturity (reflecting the fact that
option prices are higher for longer maturities). Second, as the penalty parameter b+
(b−) increases, the bounds become increasingly extreme and reflect the regime switching
process, as indicated by the undulating nature of the bounds corresponding to different
regime realizations that may occur in the future.
Consider the upper bound results as an example. In the case b+ = 1, i.e., when
UBSSDP(•) is equivalent to the deterministic bound UBSDP(•), the curve is smooth and
the bound increases steadily as the time to maturity increases. The difference between
the features of the curves highlights the effectiveness of the robust bounds in further
taking into account the structure of a regime switching lattice. The parameter b+ corresponds
to a certain quantile of the distribution of paths constructed via a regime switching
lattice, where a path is a particular sequence of regime realizations from the
start of the second stage up to time t. Thus, if an increment of b+ leads to only a
marginal increase in the bound UBSSDP(•) at a time to maturity t, this implies that,
among the regime switching paths ending at time t, those that would lead to
higher bounds remain much less likely under the increased quantile and do not
influence the bound. If the bounds at a time to maturity t are sensitive to incremental
changes in b+, it is because the paths leading to higher bounds are also more
likely at time t under the increased quantile.
It can also be observed (see Table A.4) that the bounds become more responsive to
increases in the parameter b+ (b−) as the number of regimes increases. With a larger
number of regimes, the bound at each time to maturity changes more frequently
as b+ (b−) increases. This can be explained by observing that as the number of
regimes increases, the regime-switching lattice becomes finer; thus, at each time to
maturity the number of possible second-stage option bounds increases, since the
number of realizations of regime paths increases. As a result, it becomes more likely
to switch from a bound based on one realized regime path to another as the
parameter b+ (b−) increases. Similar reasoning explains why the robust bounds
generally become more responsive to increases in the parameter b+ as the time to
maturity increases. Overall, this feature can be useful in practice, since it implies
that the sensitivity of the bounds to a user's risk-aversion attitude (represented
by the parameter b+ (b−)) toward over- or underestimation of the first-stage bound
can be controlled through the complexity of the underlying regime switching lattice.
3.5 Conclusion
In this chapter, stochastic semidefinite programming models were developed that
incorporate as scenarios (uncertainty) a moment-based lattice generated by a finite-state
stochastic model to compute bounds on expected future performance. The stochastic
programming approach provides an effective means of mitigating the risk associated with
stochastic moments, in that the models are tractable and controllable through penalty
parameters expressing risk aversion. The use of a general finite-state lattice in the
stochastic programming framework is not an ad hoc approach for computing bounds:
the deterministic and robust optimization counterparts are limiting cases, and all bounds
are equivalent to Value-at-Risk quantities whose confidence level is a function of the
penalty parameters. Extensive computational experiments generating bounds on the
price of European-style call options under regime switching illustrate the flexibility
and advantages of the bounds over deterministic approaches.
Chapter 4
Distributionally Robust
Optimization under Extreme
Moment Uncertainty
In this chapter, our focus is to tackle the third layer of uncertainty, i.e., extreme moment
uncertainty, which completes the notion of comprehensive robustness proposed at the
beginning of this thesis. This extreme form of moment uncertainty, moment outliers, is
addressed in the context of decision optimization. Recently, a growing body of research
known as Distributionally Robust Optimization (DRO) has focused on stochastic
optimization problems for which only partial moment information about the underlying
probability measure is available. DRO stems from the minimax approaches pioneered
by Scarf (1958), in which decisions are sought that minimize the worst-case (maximum)
expected cost over the set of distributions sharing a common mean and variance. The inner
maximization problem, in its general form, is precisely the moment problem addressed in
Chapter 2. The complexity of DRO, which involves infinitely many moment problems
as sub-problems, has been studied in various contexts. For example, El Ghaoui et al.
(2003) considered a portfolio selection problem that minimizes the worst-case value-at-risk
of portfolios when only mean and covariance information is available. Variants and
extensions of El Ghaoui et al.'s work can be found in [Natarajan et al., 2008] and [Zhu and
Fukushima, 2009], among others. Popescu (2007) considered a wide range of utility
functions in decision analysis and studied the problem of maximizing expected utility given
only mean and covariance values. To account for moment uncertainty, Goh and Sim
(2010) developed a general optimization framework with recourse that takes into account
uncertainty in the mean. Delage and Ye (2010) modeled moment uncertainty via an
ellipsoidal set of mean vectors and a conic set of covariance matrices, and proved the
tractability of solving a general class of stochastic optimization problems with piecewise-concave
objective functions.
The aforementioned DRO approaches, however, assume that the range of moments can
be specified completely from historical data, which overlooks the limitations of data in
capturing extreme events. We present in this chapter a new DRO-type framework, which
we call comprehensive distributionally robust optimization, that enables decision makers
to seek a reasonably robust policy in the presence of rare but high-impact realizations
of moment uncertainty. Our framework can be viewed as a moment-based extension
of the penalized maxmin framework studied in [Anderson et al., 2000], [Uppal and
Wang, 2003], and [Maenhout, 2004], where a penalty function is used to account for
the ambiguity of a prior reference measure. In our framework, the reference measure
is replaced by a confidence region of reference moments, and alternative measures are
replaced by alternative moments.
Besides the possible modeling benefit of being distribution-free, another advantage of
our moment-based approach is its tractability: a penalized distribution-based approach
typically results in a computationally overwhelming optimization problem unless strong
assumptions are made, e.g., normality or discrete random returns (e.g., [Calafiore, 2007]).
Without these assumptions, a sampling-based approximation such as Monte Carlo
simulation is typically required, which can lead to an extremely computationally
intensive problem. In contrast, our problem is moment-based and is thus expected to
be free of this challenge. We provide two computationally tractable methods for
solving the problem. The first is developed using the classical ellipsoid method,
well suited to a general convex formulation; the second is based on semidefinite
programming reformulations and state-of-the-art semidefinite programming algorithms.
The structure of this chapter is as follows. We begin in Section 4.1 by highlighting
moment outliers. In Section 4.2, we present a new comprehensive distributionally robust
optimization approach that does not rely on full distributional information and
requires only the first two moments. In Section 4.3, we show that under very mild
conditions the newly developed optimization model is guaranteed to be solvable in
polynomial time, which provides a firm basis for the future development of efficient
algorithms. We also highlight in Section 4.4 the relation between our comprehensive
robust optimization framework and the classical worst-case (minimax) approach. In
Section 4.5, we further specialize the problem to a particular class of convex problems
and show that this class can be reformulated as semidefinite programming problems
that can be solved in a practically efficient manner. Variations and extensions of the
problem are addressed, such as the incorporation of alternative moment structures and
the extension to factor models. Finally, in Section 4.6, we apply the distributionally
robust approach to a portfolio selection problem, presenting an extensive numerical
study based on real-life data.
4.1 Moment Outliers
Prior DRO approaches account for moment uncertainty by constructing a region in which
realized moments can possibly fall. For example, in Delage and Ye (2010) a high-percentile
confidence region revolving around a pair of sampled mean and covariance
is constructed and incorporated into decision optimization. They showed via portfolio-selection
experiments that the resulting performance is superior to that of a portfolio
obtained using only a fixed pair of mean and covariance. What remains to be
investigated, however, is the effect on overall portfolio performance of extreme moment
values falling outside the region. Owing to their extremeness, moments at tail
percentiles may significantly change the portfolio selection. Moreover, such outliers have
become increasingly non-negligible in modern portfolio risk management, as several
severe losses in recent financial markets are due to rare events. Unfortunately, a
fixed-bound DRO approach, like Delage and Ye's, may not provide a satisfactory
solution, since there is no clear rule for deciding a bound within this tail percentile.
Including all physically possible realizations of moments in the uncertainty set yields
an overly pessimistic solution. Alternatively, if one specifies the uncertainty set based
on his/her confidence region of mean and covariance, investors may be left fully
unguarded when the realized mean and covariance fall outside the uncertainty set. In
short, any fixed bound can turn out to give an overly conservative solution or a
solution vulnerable to worst-case scenarios.
In the next section, we provide a new optimization framework offering a mechanism
that can be seen as "endogenously" achieving bounds for extreme moment uncertainty.
The degree to which the bounds are enlarged depends on the performance deterioration
that the enlargement can cause. Such a mechanism is made possible by a novel
penalty-type construction.
4.2 Comprehensive Distributionally Robust Optimization
We begin this section by considering the following scenario. A decision maker intends to optimize his/her resource allocation $\xi^T x$ according to a certain convex measure function $G_c$, where $x \in \Re^n$ is a resource allocation vector assigned over $n$ resources associated with the vector of random payoffs $\xi$. Let $Q$ denote the probability measure of the random payoffs $\xi$. The allocation vector $x$ is subject to a convex feasible set $X_c \subseteq \Re^n$, which is typically specified by real-life constraints. He/she is uncertain about the exact distributional form of the probability measure $Q$, and the information he/she can acquire about $Q$ is that it belongs to a distribution set characterized via a set of first two moments $(\mu_c, \Sigma_c) = \{(\mu_i, \Sigma_i) \mid i \in C\}$. From here on, the notation $Q(\cdot\,; \mu, \Sigma)$ denotes a probability measure $Q$ with mean $\mu$ and covariance $\Sigma$. The set $(\mu_s, \Sigma_s) = \{(\mu_i, \Sigma_i) \mid i \in S\}$ comprises all pairs of ambiguous moments $(\mu, \Sigma)$ of $Q$. Thus, if $\mu \in \Re^n$ and $\Sigma \in \Re^{n \times n}$, then both sets $(\mu_c, \Sigma_c)$ and $(\mu_s, \Sigma_s)$ are subsets of the space $\Re^n \times \Re^{n \times n}$. Note that the confidence region of moments $(\mu_c, \Sigma_c)$ can be either a singleton or an uncountable set. Now, we formulate the penalized moment-based framework in its generic form:
$$\inf_{x \in X_c}\; \sup_{(\mu,\Sigma) \in (\mu_s,\Sigma_s),\; Q(\cdot\,;\mu,\Sigma)} E_{Q(\cdot\,;\mu,\Sigma)}[G_c(\xi^T x)] - T_w(\mu, \Sigma \mid \mu_c, \Sigma_c). \qquad (4.1)$$
For technical reasons, we consider the probability measure $Q$ associated with the measurable space $(\Re^n, \mathcal{B})$, where $\mathcal{B}$ is the Borel $\sigma$-algebra on $\Re^n$.

In the above formulation, $T_w$ is a newly introduced function that measures the discrepancy between a pair of moments $(\mu, \Sigma)$ and the set of moments specified by the confidence region $(\mu_c, \Sigma_c)$. The subscript $w$ is a user-defined penalty parameter. We refer to this discrepancy as the "moment discrepancy" throughout the chapter. The function $T_w$ is assumed to satisfy the following property:
$$(\mu,\Sigma) \in (\mu_c,\Sigma_c) \;\Leftrightarrow\; T_w(\mu,\Sigma \mid \mu_c,\Sigma_c) = 0, \qquad (\mu,\Sigma) \notin (\mu_c,\Sigma_c) \;\Leftrightarrow\; T_w(\mu,\Sigma \mid \mu_c,\Sigma_c) > 0.$$
The magnitude of $T_w(\cdot)$ is assumed to be positively correlated with the moment discrepancy. Thus, the larger the moment discrepancy between $(\mu, \Sigma)$ and $(\mu_c, \Sigma_c)$, the less likely the measure $Q(\cdot\,; \mu, \Sigma)$ is to be chosen for evaluating the expectation. From a modeling perspective, the penalized moment-based problem provides a comprehensive treatment for decision makers holding different conservative attitudes towards the following three ranges within which $(\mu, \Sigma)$ can possibly take values:

– When the candidate mean and covariance $(\mu, \Sigma)$ stay within the confidence region $(\mu_c, \Sigma_c)$, the problem recovers the standard minmax setting. In other words, when the decision maker is certain about some realizations of moments, he/she naturally holds a strictly conservative attitude and pursues only robust performance of the portfolio selection.

– When $(\mu, \Sigma) \notin (\mu_c, \Sigma_c)$ but lies in the set $(\mu_s, \Sigma_s)$, which contains all "physically possible" realizations of moments, we are in an "ambiguity" region in which the decision maker seeks a balance between relying on his/her prior knowledge and properly hedging the risk of model uncertainty. In this region, using a standard minmax setting can lead to an impractical solution. Instead, the moment-based problem helps the decision maker to decide the appropriate degree of conservativeness based on the possible performance deterioration resulting from each $(\mu, \Sigma) \notin (\mu_c, \Sigma_c)$. This leads to a less conservative setting.

– When $(\mu, \Sigma) \notin (\mu_s, \Sigma_s)$, the moments are in a region with no "physically possible" realizations. Therefore, the decision is optimized without taking this scenario into account when evaluating worst-case performance. The decision maker holds no conservative attitude for this region.
Comprehensive Distributionally Robust Optimization
Now, we further specialize the formulation (4.1) by refining the structure of the penalty function $T_w$. We first consider two separate distance functions $d_\mu : \Re^{n_1} \times \Re^{n_2} \to \Re_+$ and $d_\Sigma : \Re^{n_1 \times n_1} \times \Re^{n_2 \times n_2} \to \Re_+$ that are used to measure the deviation of $(\mu, \Sigma)$ from the confidence region $(\mu_c, \Sigma_c)$. Specifically, we define the distance functions $d_\mu(\mu, \mu_c) := \inf_{\nu' \in \mu_c} \|\mu - \nu'\|$ and $d_\Sigma(\Sigma, \Sigma_c) := \inf_{\sigma' \in \Sigma_c} \|\Sigma - \sigma'\|$, where the notation $\|\cdot\|$ denotes a norm satisfying positive homogeneity and subadditivity. From here on, for tractability we assume that the sets $\mu_c$ and $\Sigma_c$ are closed, bounded, and convex. In some cases it can be useful to let the penalty function $T_w$ depend non-linearly on the moment discrepancy, and we assume only that $T_w$ is jointly convex in $d_\mu(\mu, \mu_c)$ and $d_\Sigma(\Sigma, \Sigma_c)$.
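For a concrete instance of the set distance $d_\mu(\mu, \mu_c) = \inf_{\nu' \in \mu_c}\|\mu - \nu'\|$: when $\mu_c$ is a box (one admissible closed, bounded, convex set; our illustrative choice, not the thesis's) and the norm is Euclidean, the infimum is attained by componentwise projection:

```python
def dist_to_box(mu, lo, hi):
    """d_mu(mu, mu_c) for a box confidence region mu_c = [lo, hi]
    (componentwise) under the Euclidean norm: project mu onto the box,
    then measure the distance to the projection."""
    proj = [min(max(m, l), h) for m, l, h in zip(mu, lo, hi)]
    return sum((m - p) ** 2 for m, p in zip(mu, proj)) ** 0.5
```

The distance is zero exactly on the confidence region, mirroring the defining property of $T_w$.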
To implement the overall problem in a tractable manner, we now propose the following comprehensive distributionally robust optimization model:
$$(P_p)\qquad \min_{x \in X_c}\; \max_{\gamma,\, \mu,\, \Sigma,\, Q(\cdot\,;\mu,\Sigma)}\; \int G_c(\xi^T x)\, dQ(\xi) - r_w(\gamma)$$
subject to
$$\inf_{\nu' \in \mu_c} \|\mu - \nu'\| \le \gamma_1, \qquad (4.2)$$
$$\inf_{\sigma' \in \Sigma_c} \|\Sigma - \sigma'\| \le \gamma_2, \qquad (4.3)$$
$$0 \le \gamma \le a. \qquad (4.4)$$
In the above model, the penalty function $T_w$ is implemented via an alternative convex penalty function $r_w$ together with the constraints (4.2) and (4.3). The variable $\gamma$ denotes the vector $(\gamma_1, \gamma_2)$; the variables $\gamma_1$ and $\gamma_2$ are introduced to bound the mean and covariance discrepancies. The function $r_w$ is assumed to satisfy the properties of a norm and is used to measure the magnitude of the vector $\gamma$, thereby translating the moment discrepancy into a penalty. The constraint (4.4) places a hard bound on $\gamma$ and models the "physically possible" region $(\mu_s, \Sigma_s)$.
Our last two refinements of the model (P_p) are as follows. First, the objective function
G_c is assumed to be a piecewise-linear convex function
$$
G_c(z) := \max_{k=1,\dots,K}\ a_k \cdot z + b_k.
$$
This general piecewise-linear structure gives decision makers the flexibility to maximize
a piecewise-linear utility function U(z) := min_{k=1,...,K} {c_k · z + d_k} by setting
a_k = −c_k and b_k = −d_k. The structure also extends easily to the popular CVaR risk
measure and to the more general optimized certainty equivalent (OCE) risk measure (see
[Natarajan et al., 2010]).
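To make the duality between G_c and the utility U concrete, here is a minimal Python sketch with made-up coefficients (the values of a, b are illustrative only): minimizing G_c with a_k = −c_k, b_k = −d_k is the same as maximizing U, since G_c(z) = −U(z) pointwise.

```python
import numpy as np

# Illustrative piecewise-linear pair: utility pieces c = (1, 2), d = (0, -1),
# so a_k = -c_k and b_k = -d_k as in the text.
a = np.array([-1.0, -2.0])
b = np.array([0.0, 1.0])

def G_c(z):
    return np.max(a * z + b)       # convex objective G_c(z) = max_k a_k z + b_k

def U(z):
    return np.min(-a * z - b)      # the concave utility U(z) = min_k c_k z + d_k

z = 0.7
val = G_c(z)                       # equals -U(z) for every z
```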
Furthermore, the penalty function rw(γ) is assumed to admit the form
$$
r_w(\gamma) := \sum_{l=1}^{L} w_l\, r_l(\gamma), \qquad w_l \ge 0, \qquad (4.5)
$$
where each r_l(γ) is a convex norm function. In this form, the penalty parameter w is
expanded from a scalar to a vector. This expansion allows a more flexible way to adjust
decision makers’ aversion towards model ambiguity based on particular structures of
(γ₁, γ₂). Thus, the index w in r_w(·) corresponds to the vector (w₁, ..., w_L) on the
right-hand side of (4.5), and w² > w¹ means that w²_l > w¹_l for l = 1, ..., L. The
following example considers how an investor may adjust his or her ambiguity-aversion
attitude using (4.5).
Example 4.2.1. Consider the penalized problem (P_p) with r_w(γ) = w₁ · γ₁ + w₂ · γ₂ +
w₃ · ||γ||₂. When w₃ = 0, the ambiguity of the mean and of the covariance can only be
adjusted independently. For instance, a risk-management-oriented investor may be less
sensitive to ambiguity in the mean and thus tend to increase the value of w₁, while
hesitating to increase w₂ out of concern for unexpected volatility; a return-driven
investor may do the opposite. When w₃ ≠ 0, the ambiguity of the mean and covariance can
be adjusted both independently and jointly. Thus, an investor who believes there is only
a small chance that both the mean and the covariance fall outside the confidence region
can serve this need by increasing w₃.
Remark 4.2.1. Classical penalized approaches based on a relative entropy penalty
function can in fact be viewed as a special instance of our moment-based approach when
the standard normality assumption is made. The relevant discussion is provided in [Li
and Kwon, 2011].
4.3 General Complexity Results
The goal of this section is to obtain a globally optimal solution of the problem (P_p)
in a computationally tractable manner. To aid the discussion, we first define the two
functions
$$
\mathcal{F}(x, \gamma) := \max_{\mu,\, \Sigma,\, Q(\cdot\,;\,\mu,\Sigma)} \left\{ \int G_c(\xi^T x)\, dQ(\xi) \;\middle|\; (4.2) \sim (4.4) \right\}, \qquad (4.6)
$$
$$
S_w(x, \gamma) := \mathcal{F}(x, \gamma) - r_w(\gamma).
$$
Thus, S_w(x, γ) denotes the optimal value of the inner problem of (P_p) for fixed w, x
and γ. The solution method is developed from two observations. First, for fixed w*, x*,
the functional S_{w*}(x*, γ) is concave with respect to γ. This concavity, together with
the convexity of the feasible region X_c (of the allocation vector x), allows us to
reformulate the problem by exchanging min_{x∈X_c} and max_γ; thus, the problem (P_p) can
be reformulated as
$$
(P_\nu)\quad \max_{0 \le \gamma \le a}\ \nu(\gamma) - r_w(\gamma),
$$
where
$$
\nu(\gamma) := \min_{x \in X_c}\ \max_{\mu,\, \Sigma,\, Q(\cdot\,;\,\mu,\Sigma)} \left\{ \int G_c(\xi^T x)\, dQ(\xi) \;\middle|\; (4.2) \sim (4.4) \right\}. \qquad (4.7)
$$
In addition, the concavity observation certifies that a local search method suffices to
find a globally optimal γ*, provided ν(γ) can be evaluated for any γ. Our second
observation is that, for a fixed γ*, there exists a computationally tractable approach to
solve the dual of the inner optimization problem in (4.7), and strong duality holds for
that problem. That is, for fixed γ* the functional ν(γ*) can be evaluated efficiently.
Combining these two observations, a direct search method (see [Kolda et al., 2003]) can
be applied to solve (P_ν). For such a two-dimensional problem with box constraints, a
straightforward approach with global convergence is to examine steps along the coordinate
directions: if a feasible improving direction exists, the iterate is updated; otherwise,
the four possible steps are bisected and examined again. Although a direct search method
may not be as efficient as a derivative-based optimization method, the problem (P_ν) is
small and simple enough to be tractable by such a method. Furthermore, if one is
interested in penalizing only the bound γ** = max(γ₁, γ₂), the problem (P_ν) with the
bounds unified as γ₁ = γ₂ can be solved in polynomial time using a binary search
algorithm.
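The coordinate direct search just described can be sketched as follows. Here `phi` is a stand-in concave test objective: in the actual method each evaluation of ν(γ) − r_w(γ) requires solving an inner optimization problem, so this is only an illustration of the search logic, not of the thesis model itself.

```python
import numpy as np

# Compass/direct-search sketch for (P_nu): maximize a concave phi over the box
# 0 <= gamma <= a by probing the four coordinate directions and halving the
# step when no probe improves the objective.
def direct_search(phi, a, step=0.25, tol=1e-6):
    g = 0.5 * a                                  # start at the box center
    while step > tol:
        improved = False
        for d in (np.array([1.0, 0.0]), np.array([-1.0, 0.0]),
                  np.array([0.0, 1.0]), np.array([0.0, -1.0])):
            trial = np.clip(g + step * d, 0.0, a)    # stay inside the box
            if phi(trial) > phi(g):
                g, improved = trial, True
        if not improved:
            step *= 0.5                          # bisect the step, probe again
    return g

a = np.array([1.0, 1.0])
phi = lambda g: -(g[0] - 0.3) ** 2 - (g[1] - 0.6) ** 2   # concave test objective
g_star = direct_search(phi, a)
```

Because the objective is concave and the feasible set is a box, stalling at a given step size bounds the distance to the maximizer, so halving the step to a small tolerance yields global convergence, as the text asserts.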
Up to now, we have stated the two observations as facts without justifying their
validity. The first observation is proven in Theorem 4.3.1, which hinges on the
following lemma.
Lemma 4.3.1. Given that the distance functions d_µ(·, µ_c), d_Σ(·, Σ_c) are convex, let
Q_α(· ; µ_α, Σ_α) (resp. Q_β(· ; µ_β, Σ_β)) denote a probability measure that satisfies
d_µ(µ_α, µ_c) ≤ α_µ (resp. d_µ(µ_β, µ_c) ≤ β_µ) for some α_µ (resp. β_µ) and
d_Σ(Σ_α, Σ_c) ≤ α_Σ (resp. d_Σ(Σ_β, Σ_c) ≤ β_Σ) for some α_Σ (resp. β_Σ). Then, there
exists a probability measure Q_η(· ; µ_η, Σ_η) = λ′Q_α + (1 − λ′)Q_β that satisfies
d_µ(µ_η, µ_c) ≤ η_µ and d_Σ(Σ_η, Σ_c) ≤ η_Σ, where
$$
\begin{pmatrix} \eta_\mu \\ \eta_\Sigma \end{pmatrix}
= \lambda' \begin{pmatrix} \alpha_\mu \\ \alpha_\Sigma \end{pmatrix}
+ (1 - \lambda') \begin{pmatrix} \beta_\mu \\ \beta_\Sigma \end{pmatrix}
\quad \text{and} \quad 0 \le \lambda' \le 1.
$$
Proof. Given that d_µ (resp. d_Σ) is a convex function, by definition the epigraph
S_µ := {(µ, t) | d_µ(µ, µ_c) ≤ t} (resp. S_Σ := {(Σ, s) | d_Σ(Σ, Σ_c) ≤ s}) is a convex
set. Since (µ_α, α_µ), (µ_β, β_µ) ∈ S_µ and (Σ_α, α_Σ), (Σ_β, β_Σ) ∈ S_Σ, convexity of
these sets implies that for any 0 ≤ λ′₁ ≤ 1, 0 ≤ λ′₂ ≤ 1,
$$
\lambda'_1(\mu_\alpha, \alpha_\mu) + (1 - \lambda'_1)(\mu_\beta, \beta_\mu) \in S_\mu,
$$
$$
\lambda'_2(\Sigma_\alpha, \alpha_\Sigma) + (1 - \lambda'_2)(\Sigma_\beta, \beta_\Sigma) \in S_\Sigma.
$$
Thus, given that
$$
\begin{pmatrix} \eta_\mu \\ \eta_\Sigma \end{pmatrix}
= \lambda' \begin{pmatrix} \alpha_\mu \\ \alpha_\Sigma \end{pmatrix}
+ (1 - \lambda') \begin{pmatrix} \beta_\mu \\ \beta_\Sigma \end{pmatrix}
$$
and 0 ≤ λ′ ≤ 1, setting λ′₁ = λ′₂ = λ′ shows that µ_η := λ′µ_α + (1 − λ′)µ_β and
Σ_η := λ′Σ_α + (1 − λ′)Σ_β satisfy
$$
d_\mu(\mu_\eta, \mu_c) \le \eta_\mu, \qquad d_\Sigma(\Sigma_\eta, \Sigma_c) \le \eta_\Sigma.
$$
Finally, it is straightforward to see that the probability measure λ′Q_α + (1 − λ′)Q_β
indeed satisfies
$$
\mathbb{E}_{\lambda' Q_\alpha + (1-\lambda')Q_\beta}[X] = \lambda'\, \mathbb{E}_{Q_\alpha}[X] + (1 - \lambda')\, \mathbb{E}_{Q_\beta}[X],
$$
where X is a random variable. This completes the proof.
Theorem 4.3.1. Given that the penalty function r_w(γ) is convex in γ and that w*, x* are
fixed, the functional S_{w*}(x*, γ) is concave with respect to γ.
Proof. Since −r_w(γ) is concave, it suffices to show that for fixed x* the function
F(x*, γ) in (4.6) is concave with respect to γ. Consider the functional
λ′F(x*, γ_{α′}) + (1 − λ′)F(x*, γ_{β′}), and let
$$
Q'_{\alpha'(\beta')} := \arg\max_{Q \in \{Q(\cdot\,;\,\mu,\Sigma)\,:\, [d_\mu\ d_\Sigma]^T \le \gamma_{\alpha'(\beta')}\}} \int G_c(\xi^T x^*)\, dQ(\xi),
$$
where we abbreviate d_µ(µ, µ_c) and d_Σ(Σ, Σ_c) as d_µ and d_Σ. Then,
$$
\lambda'\, \mathcal{F}(x^*, \gamma_{\alpha'}) + (1 - \lambda')\, \mathcal{F}(x^*, \gamma_{\beta'})
= \int G_c(\xi^T x^*)\, d\big(\lambda' Q'_{\alpha'} + (1 - \lambda') Q'_{\beta'}\big)(\xi). \qquad (4.8)
$$
Lemma 4.3.1 gives that there exists Q′_{η′} ∈ {Q(· ; µ, Σ) : [d_µ d_Σ]ᵀ ≤ λ′γ_{α′} + (1 − λ′)γ_{β′}}
such that
$$
Q'_{\eta'} = \lambda' Q'_{\alpha'} + (1 - \lambda') Q'_{\beta'}.
$$
Suppose that
$$
Q''_{\eta'} = \arg\max_{Q \in \{Q(\cdot\,;\,\mu,\Sigma)\,:\, [d_\mu\ d_\Sigma]^T \le \lambda'\gamma_{\alpha'} + (1-\lambda')\gamma_{\beta'}\}} \int G_c(\xi^T x^*)\, dQ(\xi).
$$
It follows that
$$
(4.8) = \int G_c(\xi^T x^*)\, dQ'_{\eta'}(\xi) \le \int G_c(\xi^T x^*)\, dQ''_{\eta'}(\xi)
= \mathcal{F}\big(x^*,\ \lambda'\gamma_{\alpha'} + (1 - \lambda')\gamma_{\beta'}\big).
$$
This shows the concavity of F(x*, γ) with respect to γ.
Next, we validate the second observation, namely that there exists a computationally
tractable method to evaluate ν(γ*) for each given γ*. We resort to an ellipsoid method,
which is applicable to a general class of convex optimization problems based on the
equivalence of convex set separation and convex optimization. Specifically, Grotschel
et al. (1981) showed that for a convex optimization problem with a linear objective
function and a convex feasible region C, given that the set of optimal solutions is
nonempty, the problem can be solved by an ellipsoid method in polynomial time if and
only if the following procedure can be implemented in polynomial time: for an arbitrary
point c, check whether c ∈ C and, if not, generate a hyperplane that separates c from C.
It should be noted that the application of the ellipsoid method must be handled
with care. This is because additional complexity associated with distance functions is
introduced, and careful analysis is needed to verify the existence of an optimal solution
and the applicability of the ellipsoid method to each embedded optimization problem.
Theorem 4.3.2 below shows the tractability of evaluating ν(γ∗). The theorem requires
only the following mild assumptions:
• The set Xc (resp. µc, Σc) is nonempty, convex and compact (closed and bounded).
• Let N(·) := || · || denote the chosen norm in the distance functions d_µ, d_Σ. The
evaluation of N(·) and of a subgradient ∇N(·) can be provided in polynomial time.
• There exists an oracle that, for any x (resp. ν, σ), either verifies feasibility with
respect to the set X_c (resp. µ_c, Σ_c) or provides a hyperplane separating x (resp.
ν, σ) from the feasible set in polynomial time.
Theorem 4.3.2. For any given γ∗, under the above assumptions, the optimal value of
ν(γ∗) is finite and the evaluation of ν(γ∗) can be done in polynomial time.
Proof. Given that G_c(z) := max_{k=1,...,K} {a_k · z + b_k}, duality theory for infinite
linear programming allows the optimization problem associated with ν(γ*) in (4.7) to be
reformulated as follows (cf. Theorem 2.1 in [Natarajan et al., 2010]):
$$
\nu(\gamma^*) := \inf_{x \in X_c,\, r,\, q,\, y,\, s,\, t \ge 0} \quad r + q \qquad (4.9)
$$
$$
\text{subject to}\quad r \ge a_k(\mu^T x) + b_k + a_k^2 y + a_k s \quad \forall\, \mu \in S_\mu,\ \forall\, k = 1, \dots, K,
$$
$$
4yq \ge t^2 + s^2, \qquad y \ge 0,
$$
$$
t^2 \ge x^T \Sigma x \quad \forall\, \Sigma \in S_\Sigma,
$$
where S_µ := {µ | d_µ(µ, µ_c) ≤ γ*₁} and S_Σ := {Σ ⪰ 0 | d_Σ(Σ, Σ_c) ≤ γ*₂}. We now show
that a separation approach can be applied to the above problem in polynomial time.
First, the hyperplanes t ≥ 0, y ≥ 0 can be generated. Then, reformulating the second and
third constraints as
$$
g_2(t, s, y, q) := \sqrt{t^2 + s^2 + (y - q)^2} - (y + q) \le 0, \qquad \sqrt{x^T \Sigma x} - t \le 0,
$$
shows that the feasible set of (x, r, q, y, s, t) is convex for any µ ∈ S_µ and Σ ∈ S_Σ.
For the second constraint, it is straightforward to verify whether an assignment
v* := (x*, r*, q*, y*, s*, t*) is feasible, i.e., g₂(t*, s*, y*, q*) ≤ 0, or else to
generate a valid separating hyperplane based on the convexity of the feasible set:
$$
\nabla_t g_2(v^*)(t - t^*) + \nabla_s g_2(v^*)(s - s^*) + \nabla_y g_2(v^*)(y - y^*) + \nabla_q g_2(v^*)(q - q^*) + g_2(v^*) \le 0.
$$
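The reformulation of the rotated-cone constraint through g₂, and the gradient used in the cutting plane, can be checked numerically. The sketch below (arbitrary test points, chosen only for illustration) relies on the identity (y + q)² − (y − q)² = 4yq:

```python
import numpy as np

# g2(t,s,y,q) <= 0 is equivalent to 4yq >= t^2 + s^2 with y, q >= 0, because
# (y + q)^2 - (y - q)^2 = 4yq. g2 is convex, so an infeasible point v* is cut
# off by the linearization g2(v*) + grad g2(v*)^T (v - v*) <= 0.
def g2(t, s, y, q):
    return np.sqrt(t**2 + s**2 + (y - q)**2) - (y + q)

def grad_g2(t, s, y, q):
    r = np.sqrt(t**2 + s**2 + (y - q)**2)
    return np.array([t / r, s / r, (y - q) / r - 1.0, (q - y) / r - 1.0])

feasible = g2(1.0, 1.0, 1.0, 1.0) <= 0      # 4*1*1 >= 1 + 1: feasible point
cut_needed = g2(2.0, 2.0, 0.5, 0.5) > 0     # 4*0.25 < 8: infeasible point
coeffs = grad_g2(2.0, 2.0, 0.5, 0.5)        # coefficients of the cutting plane
```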
For the first constraint, feasibility can be checked for each k-th constraint by solving
the optimization problem
$$
\phi_k := \sup_{\mu \in S_\mu}\ a_k(\mu^T x^*) + b_k + a_k^2 y^* + a_k s^* - r^*.
$$
The above problem can be equivalently reformulated as
$$
\sup_{\mu,\, \nu}\ \left\{ a_k(\mu^T x^*) + b_k + a_k^2 y^* + a_k s^* - r^* \ :\ \|\mu - \nu\| \le \gamma_1^*,\ \nu \in \mu_c \right\} \qquad (4.10)
$$
by dropping the inf_ν in the original distance function. Under the assumptions that the
chosen norm || · || and its subgradient can be evaluated in polynomial time and that an
oracle exists for µ_c, we can apply the oracle to an infeasible ν* ∉ µ_c, and/or generate
for an infeasible (µ*, ν*) the hyperplane
$$
\nabla_\mu \mathcal{N}(\mu^*, \nu^*)(\mu - \mu^*) + \nabla_\nu \mathcal{N}(\mu^*, \nu^*)(\nu - \nu^*) + \mathcal{N}(\mu^*, \nu^*) \le \gamma_1^*
$$
in polynomial time. Verification of feasibility is straightforward. In addition, since
the set µ_c is compact and γ*₁ is finite, the set of optimal solutions of (4.10) is
nonempty, given that at least one feasible solution {µ, ν | µ = ν, ν ∈ µ_c} exists.
Thus, φ_k can be evaluated in polynomial time. Then, if φ_k ≤ 0, feasibility of
(r*, x*, y*, s*) is verified; if φ_k > 0 for some optimal µ*, we generate the hyperplane
$$
a_k(\mu^{*T} x) + a_k^2 y + a_k s - r \le -b_k.
$$
Similarly, for the third constraint, feasibility can be checked by solving the
optimization problem
$$
\rho := \sup_{\Sigma \in S_\Sigma}\ (x^*)^T \Sigma x^*. \qquad (4.11)
$$
The polynomial solvability of (4.11) and the nonemptiness of its set of optimal
solutions can be justified as for the first constraint, except for the constraint Σ ⪰ 0.
To verify feasibility of Σ ⪰ 0, a polynomial QR algorithm can be applied; if any
eigenvalue is negative, one may use the lowest eigenvalue to construct a separating
hyperplane. As a result, if ρ ≤ (t*)², feasibility of (t*, x*) is verified; if ρ > (t*)²
for some optimal Σ*, the hyperplane
$$
(x^*)^T \Sigma^* x - \sqrt{(x^*)^T \Sigma^* x^*}\; t \le 0
$$
can be generated.
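The eigenvalue-based separation step for Σ ⪰ 0 can be sketched as follows; numpy's symmetric eigendecomposition stands in for the QR algorithm mentioned above, and the cut uses the eigenvector u of the most negative eigenvalue, since Σ′ ↦ uᵀΣ′u is nonnegative on the PSD cone but negative at the infeasible Σ:

```python
import numpy as np

# Separation oracle sketch for the constraint Sigma >= 0 (PSD).
def psd_separation(Sigma):
    vals, vecs = np.linalg.eigh(Sigma)       # eigenvalues in ascending order
    if vals[0] >= 0:
        return None                          # feasible: Sigma is PSD, no cut
    u = vecs[:, 0]                           # eigenvector of lowest eigenvalue
    return np.outer(u, u)                    # cut: <uu^T, Sigma'> >= 0 separates

Sigma_bad = np.array([[1.0, 2.0], [2.0, 1.0]])   # eigenvalues are -1 and 3
H = psd_separation(Sigma_bad)
violation = np.sum(H * Sigma_bad)                # Frobenius inner product < 0
```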
Finally, to see that the optimal value of ν(γ*) is finite, it suffices to show that for
any x ∈ X_c the optimal value of problem (4.9) is finite. Consider the original
formulation of ν(γ) in (4.7). Given a feasible pair µ, Σ, one can always construct a
probability measure Q, e.g., a normal distribution, having µ and Σ as its mean and
covariance. This implies that ν(γ*) is bounded below and thus its optimal value is
finite. Since the sets X_c, S_µ, S_Σ are nonempty and compact, the feasible set of (4.9)
is easily shown to be nonempty, and we conclude that the set of optimal solutions of
(4.9) is nonempty. Hence, given that the separation problem can be solved in polynomial
time, for any fixed γ* the evaluation of ν(γ*) can be done in polynomial time.
4.4 Connection with Classical Minimax Approaches

We start this section with the following observation about the problem (P_p).

Theorem 4.4.1. Suppose that (x_{w_i}, γ_{w_i}) denotes the optimal solution of the
problem (P_p) associated with a penalty vector w_i. Given an increasing sequence of
penalty vectors {w_i}_{i=1}^∞, γ_{w_i} is monotonically decreasing for a fixed x*_{w_i}.
Furthermore, the sequence F(x_{w_i}, γ_{w_i}) is also monotonically decreasing, where
$$
\mathcal{F}(x, \gamma) := \max_{\mu,\, \Sigma,\, Q(\cdot\,;\,\mu,\Sigma)} \left\{ \int G_c(\xi^T x)\, dQ(\xi) \;\middle|\; (4.2) \sim (4.4) \right\}.
$$
Proof. To show that γ_{w_i} is monotonically decreasing as w_i increases, it suffices to
consider increasing a single component w_l^i. Suppose that w_l^1 and w_l^2 are fixed
with w_l^1 < w_l^2. Let γ_{w^1} be the optimal γ with respect to w_l^1 and γ_{w^2} the
optimal γ with respect to w_l^2. By definition, the following inequalities hold:
$$
\mathcal{F}(x^*, \gamma_{w^1}) - w_l^1 \cdot r_l(\gamma_{w^1}) \ \ge\ \mathcal{F}(x^*, \gamma_{w^2}) - w_l^1 \cdot r_l(\gamma_{w^2}),
$$
$$
\mathcal{F}(x^*, \gamma_{w^2}) - w_l^2 \cdot r_l(\gamma_{w^2}) \ \ge\ \mathcal{F}(x^*, \gamma_{w^1}) - w_l^2 \cdot r_l(\gamma_{w^1}).
$$
Adding the first inequality to the second yields
(w_l^1 − w_l^2)(r_l(γ_{w^2}) − r_l(γ_{w^1})) ≥ 0, and w_l^1 < w_l^2 then implies
r_l(γ_{w^1}) ≥ r_l(γ_{w^2}). Since r_l(γ) ≥ 0 and r_l(γ) is non-decreasing, we obtain
γ_{w^2} ≤ γ_{w^1}. For two vectors w^1 < w^2, one can increase one entry of w^1 at a
time until w^2 is reached; since γ is monotonically decreasing at each step, γ_{w^2} ≤
γ_{w^1} still holds for w^1 < w^2. This also implies that F(x*, γ_{w^2}) ≤ F(x*, γ_{w^1})
for fixed x*.

Now consider the relation between F(x_{w^1}, γ_{w^1}) and F(x_{w^2}, γ_{w^2}), where
x_{w^1} and x_{w^2} are the respective optimal solutions for w^1 and w^2. By the above
result, the inequality F(x_{w^1}, γ_{w^2}) ≤ F(x_{w^1}, γ_{w^1}) holds. In addition,
since x_{w^2} is the minimizer with respect to w^2, the inequality F(x_{w^2}, γ_{w^2}) ≤
F(x_{w^1}, γ_{w^2}) must hold as well. These two inequalities imply that
F(x_{w^2}, γ_{w^2}) ≤ F(x_{w^1}, γ_{w^1}).
The above theorem indicates that, as decision makers gain more confidence in their prior
reference models, by increasing the penalty parameter w they can always improve the
worst-case performance of the resulting optimal solutions. This improvement results
directly from the decrease of the optimal bounds γ on the mean-covariance discrepancy.
It also reveals a close relation between our comprehensive robust optimization framework
and the classical worst-case (minimax) approach. In Theorem 4.4.2, we formalize this
relation by proving that the optimal solution generated from our framework implicitly
corresponds to the optimal decision generated using the following minimax formulation:
$$
(P_c)\quad \min_{x \in X_c}\ \max_{\gamma,\, \mu,\, \Sigma,\, Q(\cdot\,;\,\mu,\Sigma)} \left\{ \int G_c(\xi^T x)\, dQ(\xi) \;\middle|\; (4.2) \sim (4.4) \right\}
$$
$$
\text{subject to}\quad r_l(\gamma) \le b_l, \qquad l = 1, \dots, L,
$$
where b_l parameterizes the constraint.
Theorem 4.4.2. The following two problems provide an identical set of optimal solutions.
That is, if (x*, γ*) is an optimal solution of the first problem for some w*_l,
l = 1, ..., L, then there exist b*_l, l = 1, ..., L, such that (x*, γ*) is also optimal
for the second problem, and vice versa:
$$
\min_{x \in X_c}\ \max_{\gamma \le a}\ \mathcal{F}(x, \gamma) - \sum_{l=1}^{L} w_l\, r_l(\gamma), \qquad w_l \ge 0,
$$
$$
\min_{x \in X_c}\ \max_{\gamma \le a}\ \left\{ \mathcal{F}(x, \gamma) \;\middle|\; r_l(\gamma) \le b_l,\ l = 1, \dots, L \right\},
$$
where F(x, γ) := max_{µ,Σ,Q(· ; µ,Σ)} { ∫ G_c(ξᵀx) dQ(ξ) | (4.2) ∼ (4.4) }.
Proof. It suffices to prove that, for fixed x*, if γ* is optimal for the inner
optimization problem of the first problem with parameter w*, then there exists a b* for
the second problem such that γ* is also optimal for its inner optimization problem given
x*. By the optimality conditions of convex optimization problems, for γ* to be optimal
for the first problem it is required that
$$
\mathcal{F}(x^*, \gamma^*) - \sum_{l=1}^{L} w_l\, r_l(\gamma^*) - \sum_j \lambda_j(\gamma_j^* - a_j)
\ \ge\ \mathcal{F}(x^*, \gamma) - \sum_{l=1}^{L} w_l\, r_l(\gamma) - \sum_j \lambda_j(\gamma_j - a_j), \quad \forall \gamma,
$$
with λ_j(γ*_j − a_j) = 0 and λ_j ≥ 0. Similarly, for γ* to be optimal for the second
problem, it is required that
$$
\mathcal{F}(x^*, \gamma^*) - \sum_{l=1}^{L} \rho_l\big(r_l(\gamma^*) - b_l\big) - \sum_j v_j(\gamma_j^* - a_j)
\ \ge\ \mathcal{F}(x^*, \gamma) - \sum_{l=1}^{L} \rho_l\big(r_l(\gamma) - b_l\big) - \sum_j v_j(\gamma_j - a_j), \quad \forall \gamma,
$$
with ρ_l(r_l(γ*) − b_l) = 0, v_j(γ*_j − a_j) = 0, and ρ_l, v_j ≥ 0. This optimality
condition is equivalent to
$$
\mathcal{F}(x^*, \gamma^*) - \sum_{l=1}^{L} \rho_l\, r_l(\gamma^*) - \sum_j v_j(\gamma_j^* - a_j)
\ \ge\ \mathcal{F}(x^*, \gamma) - \sum_{l=1}^{L} \rho_l\, r_l(\gamma) - \sum_j v_j(\gamma_j - a_j), \quad \forall \gamma,
$$
with ρ_l(r_l(γ*) − b_l) = 0, v_j(γ*_j − a_j) = 0, ρ_l, v_j ≥ 0. Then, if (γ*, λ_j) is a
solution of the first system, (γ*, λ_j) is also a solution of the second system with
b_l = r_l(γ*), v_j = λ_j, and ρ_l = w_l. Conversely, if (γ*, ρ_l, v_j) is a solution of
the second system, then (γ*, ρ_l, v_j) is also a solution of the first system with
w_l = ρ_l and λ_j = v_j.
Intuitively, the constraint form of the penalty function can be interpreted as an
additional constraint on the mean-covariance discrepancy, which provides greater
flexibility in modeling ambiguity. The practical value of this additional flexibility is
illustrated in Section 4.5.1. Finally, the above result also supports the view that the
penalty construction within our framework “endogenously” achieves bounds for hedging
against extreme moment uncertainty: the bounds are determined according to the
performance deterioration that changing them would cause.
4.5 Semidefinite Optimization Reformulations

In this section, we reformulate the problem (P_p) as a semidefinite programming problem
(SDP) by further assuming that the confidence region (µ_c, Σ_c) is semidefinite
representable (SDr). In addition, we assume that both the norm used in the discrepancy
measurement and the penalty functions are SDr; that is, the epigraph of each function is
an SDr set. Based on the SDP reformulations, further efficiency in solving (P_p) can be
gained using polynomial-time interior-point methods. A wide class of SDr functions can
be found in [Ben-Tal and Nemirovski, 2001].

Throughout the rest of this section, the binary operator • denotes the Frobenius inner
product. We first consider the general case in which the confidence region µ_c (resp.
Σ_c) is an uncountable but bounded set, parameterized by a sampled mean vector µ₀ (resp.
a sampled covariance Σ₀). We further consider the matrix Σ as the centered second-moment
matrix, i.e., Σ := E[(ξ − µ₀)(ξ − µ₀)ᵀ], and assume that Σ ≻ 0. This overall setting
allows one to exploit the sampled mean and covariance information. In Theorem 4.5.1
below, we provide a fairly general method for generating SDP reformulations of the
problem (P_p): we first maximize with respect to Q(· ; µ, Σ) and then with respect to
(µ, Σ) within the feasible region. This strategy yields a flexible SDP reformulation,
which will be extended to other practical settings later.
Theorem 4.5.1. Assume that the confidence region (µ_c, Σ_c), the norm measurement
|| · ||, and the penalty functions r_l(γ) are SDr, and suppose that the confidence
region is uncountable. Then, the SDP reformulation of the problem (P_p) can be generated
from the following problem, which is equivalent to (P_p):
$$
\min_{x \in X_c,\, \lambda,\, \Lambda,\, r,\, s} \quad r + s - \Lambda \bullet \mu_0\mu_0^T
$$
$$
\text{subject to}\quad (P_s) \le r, \qquad
\begin{pmatrix} \Lambda & \tfrac{1}{2}(\lambda - 2\Lambda\mu_0 + a_k x) \\[2pt] \tfrac{1}{2}(\lambda - 2\Lambda\mu_0 + a_k x)^T & s + b_k \end{pmatrix} \succeq 0, \quad k = 1, \dots, K,
$$
where (P_s) denotes the optimal value of the following problem:
$$
\max_{0 \le \gamma \le a,\, t,\, \mu,\, \Sigma,\, \nu,\, \sigma} \quad \lambda^T\mu + \Lambda \bullet \Sigma - w^T t
$$
$$
\text{subject to}\quad \|\mu - \nu\| \le \gamma_1,\ \ \|\Sigma - \sigma\| \le \gamma_2,\ \ \nu \in \mu_c,\ \ \sigma \in \Sigma_c,\ \ r_l(\gamma) \le t_l,\ \ l = 1, \dots, L.
$$
Proof. To ease the exposition of the proof, we first define two sets S₁(γ₁) and S₂(γ₂):
$$
S_1(\gamma_1) := \Big\{\mu' \ \Big|\ \inf_{\nu \in \mu_c} \|\mu' - \nu\| \le \gamma_1\Big\}, \qquad
S_2(\gamma_2) := \Big\{\Sigma' \ \Big|\ \inf_{\sigma \in \Sigma_c} \|\Sigma' - \sigma\| \le \gamma_2\Big\}.
$$
Given that µ := E[ξ] and Σ := E[(ξ − µ₀)(ξ − µ₀)ᵀ], the problem (P_p) can be reformulated
as the following semi-infinite linear problem:
$$
\min_{x \in X_c}\ \max_{0 \le \gamma \le a,\, \mu \in S_1(\gamma_1),\, \Sigma \in S_2(\gamma_2)}\ \max_{Q}\ \int G_c(\xi^T x)\, dQ(\xi) - r_w(\gamma)
$$
$$
\text{s.t.}\quad \int dQ(\xi) = 1, \qquad \int \xi\, dQ(\xi) = \mu,
$$
$$
\int \big(\xi\xi^T - \xi\mu_0^T - \mu_0\xi^T\big)\, dQ(\xi) = \Sigma - \mu_0\mu_0^T.
$$
Using Lemma 2.2.1, we thus have
$$
\min_{x \in X_c}\ \max_{0 \le \gamma \le a,\, \mu \in S_1(\gamma_1),\, \Sigma \in S_2(\gamma_2)}\ \min_{\lambda,\, \Lambda}\ \max_{\xi}\
\Big\{ -r_w(\gamma) + G_c(\xi^T x) + \lambda^T(\mu - \xi) + \Lambda \bullet \big(\Sigma - \mu_0\mu_0^T - \xi\xi^T + \xi\mu_0^T + \mu_0\xi^T\big) \Big\}.
$$
Since Σ ≻ 0, the interior condition holds and thus strong duality holds for the above
dual problem. Note that the inner maximization with respect to ξ can be written in the
form max_{k=1,...,K} max_ξ {−ξᵀΛξ + p_kᵀξ + q_k} for some p_k and q_k; hence, for the
problem to have a finite optimal value, Λ ⪰ 0 must hold. Given that the operator max_ξ
preserves convexity, the overall problem is convex in (λ, Λ) and concave in (γ, µ, Σ).
Applying Sion's minimax theorem, we can exchange
max_{0≤γ≤a, µ∈S₁(γ₁), Σ∈S₂(γ₂)} and min_{λ, Λ⪰0} to obtain an equivalent problem. After
some algebraic manipulation and the addition of variables r, s, the problem can be
reformulated as
$$
\min_{x \in X_c,\, \lambda,\, \Lambda,\, r,\, s} \quad r + s - \Lambda \bullet \mu_0\mu_0^T
$$
$$
\text{subject to}\quad \max_{0 \le \gamma \le a,\, \mu \in S_1(\gamma_1),\, \Sigma \in S_2(\gamma_2)}\ \lambda^T\mu + \Lambda \bullet \Sigma - r_w(\gamma) \ \le\ r,
$$
$$
G_c(\xi^T x) + \xi^T(-\lambda + 2\Lambda\mu_0) - \Lambda \bullet \xi\xi^T \le s \quad \forall \xi \in \mathbb{R}^n, \qquad \Lambda \succeq 0.
$$
The second constraint, expanded via G_c(ξᵀx) into
$$
\Lambda \bullet \xi\xi^T + \xi^T(\lambda - 2\Lambda\mu_0 + a_k x) + s + b_k \ge 0 \quad \forall \xi \in \mathbb{R}^n,\ k = 1, \dots, K,
$$
can be reformulated as
$$
\begin{pmatrix} \Lambda & \tfrac{1}{2}(\lambda - 2\Lambda\mu_0 + a_k x) \\[2pt] \tfrac{1}{2}(\lambda - 2\Lambda\mu_0 + a_k x)^T & s + b_k \end{pmatrix} \succeq 0, \quad k = 1, \dots, K,
$$
using the Schur complement. For the first constraint, the left-hand side can be
re-expressed as
$$
\max_{0 \le \gamma \le a,\, t,\, \mu,\, \Sigma}\ \Big\{ \lambda^T\mu + \Lambda \bullet \Sigma - w^T t \ :\ \inf_{\nu \in \mu_c}\|\mu - \nu\| \le \gamma_1,\ \inf_{\sigma \in \Sigma_c}\|\Sigma - \sigma\| \le \gamma_2,\ r_l(\gamma) \le t_l,\ l = 1, \dots, L \Big\},
$$
which is equivalent to
$$
\max_{0 \le \gamma \le a,\, t,\, \mu,\, \Sigma,\, \nu,\, \sigma} \quad \lambda^T\mu + \Lambda \bullet \Sigma - w^T t
$$
$$
\text{subject to}\quad \|\mu - \nu\| \le \gamma_1,\ \|\Sigma - \sigma\| \le \gamma_2,\ \nu \in \mu_c,\ \sigma \in \Sigma_c,\ r_l(\gamma) \le t_l,\ l = 1, \dots, L.
$$
Given that there exists (γ*, t*, µ*, Σ*, ν*, σ*) satisfying the Slater condition, which
is easily verified, applying strong duality theory for SDP and dropping the minimization
operator of the dual shows that the constraint is SDr. Thus, the overall problem can be
reformulated as a semidefinite programming problem.
Remark 4.5.1. The dual problem of (P_s) has a particularly useful structure: the penalty
parameter w appears in constraints of the form (· · ·) ≥ −w, where (· · ·) denotes terms
that depend linearly on the dual variables. This dependency allows the parameter w to be
treated as an additional variable. Thus, if τ is the upper bound of the original
objective function, one may replace the objective function by τ + κ(w), where κ is a
user-defined function. A similar discussion can be found in [Ben-Tal et al., 2006].
Remark 4.5.2. When the confidence region µ_c (resp. Σ_c) is a singleton, the
reformulation simplifies. In that case, the distance measurement (inf || · ||) reduces
to the norm measurement (|| · ||), and the constraints (4.2) and (4.3) can be formulated
directly as semi-infinite conic constraints. Lemma 2.2.1 can be extended to handle
problems with semi-infinite conic constraints (cf. Shapiro [2001]), and the rest of the
reformulation follows closely the proof of Theorem 4.5.1.
The focus so far has been on deriving a general class of efficiently solvable SDP
formulations for the problem. Except for the SDr property, no additional structure has
been imposed on the norm measurement || · ||, the confidence region (µ_c, Σ_c), or the
penalty function r_w. One natural choice of || · || for the discrepancies d_µ(µ, µ_c)
and d_Σ(Σ, Σ_c) is suggested by the connection between moment discrepancy and
KL-divergence, namely
$$
(\mu - \nu)^T \sigma^{-1} (\mu - \nu) \le \gamma_1, \qquad (4.12)
$$
$$
-\gamma_2 \bar{\Sigma} \preceq \Sigma - \sigma \preceq \gamma_2 \bar{\Sigma}, \qquad (4.13)
$$
i.e., the ellipsoidal norm || · ||_{σ⁻¹} (in (4.12)) and the spectral norm of a matrix
(in (4.13)), where the fixed matrix Σ̄ ≻ 0. For defining a confidence region, Delage and
Ye (2010) consider the mean and covariance to be bounded as follows:
$$
(\nu - \mu_0)^T \Sigma_0^{-1} (\nu - \mu_0) \le \rho_1, \qquad (4.14)
$$
$$
\theta_3 \Sigma_0 \preceq \sigma \preceq \theta_2 \Sigma_0, \qquad (4.15)
$$
where σ = E[(ξ − µ₀)(ξ − µ₀)ᵀ]. This structure coincides with our choice of discrepancy
measurements by setting θ₂ := (1 + ρ₂), θ₃ := (1 − ρ₂). Thus, combining (4.12), (4.13),
(4.14), and (4.15) provides a coherent way to specialize the result of Theorem 4.5.1.
We provide an SDP reformulation of (P_p) for the penalty function
r_w(γ) := w₁γ₁ + w₂γ₂ + w₃||γ||₂ in the following Corollary 4.5.1.
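A small numerical sketch of these discrepancy measurements, with assumed sample data (µ₀, Σ₀, ρ₁, ρ₂ are made-up values, chosen only to exercise the inequalities):

```python
import numpy as np

# Assumed data: sampled moments and a candidate (mu, sigma) pair.
mu0 = np.array([0.0, 0.0])
Sigma0 = np.eye(2)
mu = np.array([0.3, -0.1])
sigma = 1.2 * np.eye(2)          # a candidate covariance from the region (4.15)

# Ellipsoidal-norm mean discrepancy as in (4.12).
gamma1 = (mu - mu0) @ np.linalg.inv(sigma) @ (mu - mu0)

# Delage-Ye style covariance band (4.15) with theta2 = 1 + rho2, theta3 = 1 - rho2:
# both theta3*Sigma0 <= sigma and sigma <= theta2*Sigma0 hold iff the differences
# have nonnegative eigenvalues.
rho2 = 0.5
theta2, theta3 = 1 + rho2, 1 - rho2
in_band = (np.all(np.linalg.eigvalsh(sigma - theta3 * Sigma0) >= 0) and
           np.all(np.linalg.eigvalsh(theta2 * Sigma0 - sigma) >= 0))
```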
Corollary 4.5.1. Given that the penalty function is defined as r_w(γ) := w₁γ₁ + w₂γ₂ +
w₃||γ||₂, and that the constraints associated with the variables µ, Σ, ν, σ in (P_s)
(Theorem 4.5.1) are replaced by (4.12), (4.13), (4.14) and (4.15), the problem (P_p) can
be reformulated as
$$
(P_J)\quad \min_{x \in X_c,\; \lambda,\; \Lambda,\; r,\; s,\; y_1, y_2,\; \zeta_1, \zeta_2,\; S^\flat_1, \dots, S^\flat_4,\; l_1, l_2} \quad r + s - \Lambda \bullet \mu_0\mu_0^T
$$
subject to
$$
a_1 y_1 + a_2 y_2 + \rho_1 \zeta_2 + \mu_0^T\lambda + \Sigma_0 \bullet S^\flat_2 - \theta_3\,(\Sigma_0 \bullet S^\flat_3) + \theta_2\,(\Sigma_0 \bullet S^\flat_4) \le r,
$$
$$
l_1 + \zeta_1 \le y_1 + w_1, \qquad (4.16)
$$
$$
l_2 + \bar{\Sigma} \bullet \Lambda \le y_2 + w_2, \qquad (4.17)
$$
$$
\sqrt{l_1^2 + l_2^2} \le w_3,
$$
$$
S^\flat_4 - S^\flat_1 - S^\flat_3 - \Lambda \succeq 0, \qquad
\begin{pmatrix} S^\flat_1 & -\tfrac{\lambda}{2} \\[2pt] -\tfrac{\lambda^T}{2} & \zeta_1 \end{pmatrix} \succeq 0, \qquad
\begin{pmatrix} S^\flat_2 & -\tfrac{\lambda}{2} \\[2pt] -\tfrac{\lambda^T}{2} & \zeta_2 \end{pmatrix} \succeq 0,
$$
$$
S^\flat_3 \succeq 0, \qquad S^\flat_4 \succeq 0, \qquad y_1, y_2 \ge 0,
$$
and
$$
\begin{pmatrix} \Lambda & \tfrac{1}{2}(\lambda - 2\Lambda\mu_0 + a_k x) \\[2pt] \tfrac{1}{2}(\lambda - 2\Lambda\mu_0 + a_k x)^T & s + b_k \end{pmatrix} \succeq 0, \qquad k = 1, \dots, K,
$$
where a = (a₁, a₂), and the constraint √(l₁² + l₂²) ≤ w₃ is also SDr.
Proof. The objective function and the first constraint follow from Theorem 4.5.1. Only
the sub-problem (P_s) of Theorem 4.5.1 needs to be further reformulated with respect to
the penalty function r_w(γ) := w₁γ₁ + w₂γ₂ + w₃||γ||₂ and the constraints (4.12), (4.13),
(4.14) and (4.15); that is, we need to reformulate the problem
$$
\max_{\gamma,\, t,\, \mu,\, \Sigma,\, \nu,\, \sigma} \quad \lambda^T\mu + \Lambda \bullet \Sigma - w_1\gamma_1 - w_2\gamma_2 - w_3 t \ \le\ r \qquad (4.18)
$$
$$
\text{subject to}\quad (\mu - \nu)^T \sigma^{-1} (\mu - \nu) \le \gamma_1, \qquad -\gamma_2\bar{\Sigma} \preceq \Sigma - \sigma \preceq \gamma_2\bar{\Sigma},
$$
$$
(\nu - \mu_0)^T \Sigma_0^{-1} (\nu - \mu_0) \le \rho_1, \qquad \theta_3\Sigma_0 \preceq \sigma \preceq \theta_2\Sigma_0,
$$
$$
\|\gamma\|_2 \le t, \qquad 0 \le \gamma \le a.
$$
We can first replace the variable Σ by (σ + γ₂Σ̄), since the optimal value can always be
attained by such a replacement. To see why, assume instead that the optimal solution
(γ*, t*, µ*, Σ*, ν*, σ*) satisfies Σ* ≺ σ* + γ*₂Σ̄, and let c denote the corresponding
optimal value. Then Λ ⪰ 0 together with the constraint Σ − σ ⪯ γ₂Σ̄ implies that
$$
c \le \lambda^T\mu^* + \Lambda \bullet (\sigma^* + \gamma_2^*\bar{\Sigma}) - w_1\gamma_1^* - w_2\gamma_2^* - w_3 t^*.
$$
Hence the alternative solution (γ*, t*, µ*, Σ**, ν*, σ*), where Σ** = σ* + γ*₂Σ̄, must
also be optimal.
Then, reformulating the constraints involving µ, ν as SDP constraints via the Schur
complement lemma and the constraint ||γ||₂ ≤ t as an SOCP constraint (see [Ben-Tal and
Nemirovski, 2001]), the problem can be rewritten as
$$
\max_{\gamma,\, t,\, \mu,\, \nu,\, \sigma} \quad \lambda^T\mu + \Lambda \bullet (\sigma + \gamma_2\bar{\Sigma}) - w_1\gamma_1 - w_2\gamma_2 - w_3 t
$$
$$
\text{subject to}\quad
\begin{pmatrix} \sigma & \mu - \nu \\ (\mu - \nu)^T & \gamma_1 \end{pmatrix} \succeq 0, \qquad
\begin{pmatrix} \Sigma_0 & \nu - \mu_0 \\ (\nu - \mu_0)^T & \rho_1 \end{pmatrix} \succeq 0,
$$
$$
\theta_3\Sigma_0 \preceq \sigma \preceq \theta_2\Sigma_0, \qquad
\left\| \begin{pmatrix} \gamma_1 \\ \gamma_2 \end{pmatrix} \right\|_2 \le t, \qquad
0 \le \gamma_1 \le a_1, \quad 0 \le \gamma_2 \le a_2.
$$
As a result, the dual problem can be derived using conic duality theory, which yields
the problem (P_J).

Numerical examples of (P_J) are provided in a later section, and its practical value is
verified in a real-world application.
Remark 4.5.3. It is worth noting that, when solving the reformulated problem (P_J), the
dual optimal solutions associated with the constraints (4.16) and (4.17) are exactly the
optimal γ₁ and γ₂ of the original problem, as is clear from a careful reading of the
above derivation. This allows one to apply SDP sensitivity analysis to study the impact
of perturbing the penalty parameter w on γ₁ and γ₂, which could be difficult to study
using a penalized distribution-based approach. In addition, by setting w₁ = w₂ = w₃ = 0
in (P_J), the optimal y₁ and y₂ give the values of the penalty parameters that lead to
γ₁ = a₁ and γ₂ = a₂ in the original problem. This fact will be used later in our
computational experiments.
In the following sections, we present variations and extensions of the problem (P_p).
Most of this work is based on, or closely related to, Theorem 4.5.1. In particular, we
show that the problem (P_p) can easily be extended to more flexible moment structures
and to a factor model by modifying the sub-problem (P_s) of Theorem 4.5.1:
$$
\max_{0 \le \gamma \le a,\, t,\, \mu,\, \Sigma,\, \nu,\, \sigma} \quad \lambda^T\mu + \Lambda \bullet \Sigma - w^T t
$$
$$
\text{subject to}\quad \|\mu - \nu\| \le \gamma_1,\ \|\Sigma - \sigma\| \le \gamma_2,\ \nu \in \mu_c,\ \sigma \in \Sigma_c,\ r_l(\gamma) \le t_l,\ l = 1, \dots, L.
$$
As a result, these models can also be solved efficiently via a semidefinite programming
approach.
4.5.1 Variations of Moment Uncertainty Structures
The sub-problem (Ps) can accommodate a wide class of moment uncertainty structures,
including those considered in [Tutuncu and Koenig, 2004], [Goldfarb and Iyengar, 2003],
[Natarajan et al., 2010], and [Delage and Ye, 2010]. In this section, we highlight some
useful variations that provide additional flexibility in the structure of moment uncertainty.
Affine Parametric Uncertainty In (P_s) the mean vector µ (resp. second-moment matrix
Σ) is assumed to be perturbed directly, subject to its respective SDr constraint.
Alternatively, a more flexible setting assumes µ and Σ to depend affinely on a set of
perturbation vectors ζ_i required to lie in SDr sets. This follows closely the
affine-parametric-uncertainty structure widely adopted in the robust optimization
literature. Specifically, µ and Σ can be expressed in terms of ν, σ as
$$
\mu = \nu + \sum_i \zeta'_i\, \mu_i, \quad \zeta'_i \in U_\mu, \qquad
\Sigma = \sigma + \sum_j \zeta''_j\, \Sigma_j, \quad \zeta''_j \in U_\Sigma,
$$
where µ_i, Σ_j are user-specified parameters and U_µ, U_Σ are SDr sets. Clearly, the
original moment structure is a special instance of this expression. To incorporate this
moment structure, we can modify the problem (P_s) as follows while retaining its SDr
property:
$$
\max_{0 \le \gamma \le a,\, t,\, \nu,\, \sigma,\, \zeta'_i,\, \zeta''_j} \quad \lambda^T\Big(\nu + \sum_i \zeta'_i \mu_i\Big) + \Lambda \bullet \Big(\sigma + \sum_j \zeta''_j \Sigma_j\Big) - w^T t
$$
$$
\text{subject to}\quad \|\zeta'\| \le \gamma_1,\ \|\zeta''\| \le \gamma_2,\ \nu \in \mu_c,\ \sigma \in \Sigma_c,\ r_l(\gamma) \le t_l,\ l = 1, \dots, L.
$$
Applying the above formulation, one can for example further consider the case in which
the perturbation vector ζ′ is subject to a “cardinality constrained uncertainty set”
(see [Bertsimas and Sim, 2004]), e.g.,
$$
-1 \le \zeta'_i \le 1, \qquad \sum_i |\zeta'_i| \le \gamma_1.
$$
This perturbation structure in particular allows the moment discrepancy to be defined as
the maximum number of parameters that may deviate from ν, σ.
Partitioned Moments The framework we have considered so far relies only on mean and covariance information. While using only mean/covariance information helps to remove possible bias from a particular choice of distribution, the framework may be criticized for overlooking possible distributional skewness. In [Natarajan et al., 2010], partitioned statistics information of the random return is exploited to capture skewness behavior. In summary, the random return ξ is partitioned into its positive and negative parts (ξ+, ξ−), where ξ+_i = max{ξi, 0} and ξ−_i = max{−ξi, 0}, so that ξ = ξ+ − ξ−. Then, the triple (µ+, µ−, Σp) is called the partitioned statistics information of ξ if it satisfies
µ+ = EQ[ξ+],   µ− = EQ[ξ−],   Σp = EQ[ [ξ+ − µ+0; ξ− − µ−0] [ξ+ − µ+0; ξ− − µ−0]T ],

where [·; ·] denotes vertical stacking and µ+0, µ−0 are the partitioned sampled means. By modifying the objective function accordingly, i.e. ξTx = (ξ+)Tx − (ξ−)Tx, incorporating such a partitioned moment structure into (Ps) is straightforward, as shown in the following theorem. We note, however, that the reformulated problem provides only an upper bound on the optimal value, since the support condition associated with (ξ+, ξ−) must be relaxed in order to apply Theorem 4.5.1 and obtain a tractable problem.
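The partitioned statistics are easy to form from sample data. The following numpy sketch (a hypothetical helper, not the thesis's code) takes the negative part as ξ− = max{−ξ, 0}, so that ξ = ξ+ − ξ−, and measures deviations about the in-sample partitioned means:

```python
import numpy as np

def partitioned_stats(xi):
    """Sample partitioned statistics (mu+, mu-, Sigma_p) in the spirit of
    Natarajan et al. (2010): split returns into positive/negative parts and
    form the second moment matrix of the stacked deviation vector."""
    xi = np.asarray(xi, dtype=float)                 # shape (T, n): T samples
    xi_pos = np.maximum(xi, 0.0)                     # xi+ = max{xi, 0}
    xi_neg = np.maximum(-xi, 0.0)                    # xi- = max{-xi, 0}
    mu_pos, mu_neg = xi_pos.mean(axis=0), xi_neg.mean(axis=0)
    dev = np.hstack([xi_pos - mu_pos, xi_neg - mu_neg])   # shape (T, 2n)
    return mu_pos, mu_neg, dev.T @ dev / xi.shape[0]
```

Note that ξ+ − ξ− recovers the original return, matching the objective rewriting ξTx = (ξ+)Tx − (ξ−)Tx.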
Theorem 4.5.2. Given that the confidence regions of the partitioned mean and second moment matrix (µ+c, µ−c, Σpc) are uncountable convex sets, consider the problem (Pp) in which candidate measures are associated with the partitioned moments µ+ := E[ξ+], µ− := E[ξ−], and

Σp = [ Σ11  Σ12
       Σ12  Σ22 ].

Then, the SDP reformulation of the problem that provides
the upper bound of (Pp) can be generated using the following problem
min_{x∈Xc, r, s, λ+, λ−, Λ11, Λ12, Λ22}   r + s − Λ11 • µ+0(µ+0)T − 2Λ12 • µ+0(µ−0)T − Λ22 • µ−0(µ−0)T

subject to (∗), (∗∗),

where (∗) denotes the following constraint

max   (λ+)Tµ+ + (λ−)Tµ− + Λ11 • Σ11 + 2 Λ12 • Σ12 + Λ22 • Σ22 − wTt ≤ r

subject to   || [µ+; µ−] − [ν+; ν−] || ≤ γ1,   || [Σ11 Σ12; Σ12 Σ22] − [σ11 σ12; σ12 σ22] || ≤ γ2,

[ν+; ν−] ∈ µ+c × µ−c,   [σ11 σ12; σ12 σ22] ∈ Σpc,   0 ≤ γ ≤ a,   rl(γ) ≤ tl ,   l = 1, ..., L,

where γ1, γ2, t, µ+, µ−, Σ11, Σ12, Σ22, ν+, ν−, σ11, σ12, σ22 are decision variables, and (∗∗) denotes the following positive semidefinite constraint

[ [Λ11 Λ12; Λ12 Λ22]   ½(· · · )
  ½(· · · )T           s + bk ]  ⪰ 0,   k = 1, ..., K,

where (· · · ) is replaced by the vector

(λ+ − 2Λ11µ+0 − 2Λ12µ−0 + akx,   λ− − 2Λ22µ−0 − 2Λ12µ+0 − akx),

given that the penalty function rl(·) and the norm measuring the moment discrepancy are SDr.
4.5.2 Extensions to Factor Models
Up to now, we have assumed that either a pair of reference mean and covariance or a confidence region of possible mean and covariance values among assets is readily available. In some cases, this assumption may pose difficulty when the number of underlying assets becomes
large. Fortunately, the behavior of the random returns can often be captured by a smaller number of major sources of randomness (see [Luenberger, 1998]). In these cases, a factor model that corresponds directly to those major sources (factors) is commonly used.
In a similar vein, we show that our penalized problem can be further extended to the
case of a factor model. Consider a factor model of the return vector ξ defined as follows
ξ = Vζ + ε,
where ζ is a vector of m factors (m ≤ n), V is a factor loading matrix, and ε is a vector
of residual returns with zero mean and covariance Σε. Let µζ denote the mean vector of
ζ. The mean µ of the random return ξ is thus expressed as µ = Vµζ. For re-expressing
the second moment matrix Σ, one has to decide whether or not to keep the information
of a sampled mean µ0. Since the estimation of a sampled mean is not a difficult task,
and including such information does not add much complexity to the problem, we keep
the information and find a vector µ′0 that approximately satisfies µ0 ≈ Vµ′0. Thus, by
further defining a second moment matrix of ζ as Σζ := E[(ζ−µ′0)(ζ−µ′0)T], the matrix
Σ can alternatively be expressed as Σ ≈ VΣζVT + Σε.
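As a quick sanity check of this reconstruction, the asset-level moments can be assembled from the factor-level ones in a couple of lines. The sketch below uses illustrative names; V and Σε are taken as given, as in the text:

```python
import numpy as np

def factor_moments(V, mu_f, Sigma_f, Sigma_eps):
    """Asset-level moments implied by the factor model xi = V zeta + eps:
    mu = V mu_f and Sigma ~= V Sigma_f V' + Sigma_eps."""
    V = np.asarray(V, dtype=float)
    mu = V @ np.asarray(mu_f, dtype=float)
    Sigma = V @ np.asarray(Sigma_f, dtype=float) @ V.T + np.asarray(Sigma_eps, dtype=float)
    return mu, Sigma
```

For n assets and m factors only an m-vector and an m × m matrix need to be specified, which is the point of the factor model when n is large.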
Given fixed V and Σε, one straightforward way to extend our model is to modify the
problem (Ps) as follows
max_{γ, t, µ, Σ, ν, σ, µζ, Σζ}   λTµ + Λ • Σ − wTt

subject to   ||µ − ν|| ≤ γ1,   ||Σ − σ|| ≤ γ2,
             ν = Vµζ,   σ = VΣζVT + Σε,
             µζ ∈ ζ1,   Σζ ∈ ζ2,   0 ≤ γ ≤ a,   rl(γ) ≤ tl ,   l = 1, ..., L,
where ζ1, ζ2 are SDr sets that correspond to the confidence regions of the factor moments. The model can also be viewed as a penalty-based extension of the factor model considered in [El Ghaoui et al., 2003]. We should note that in the above model the deviation of the factor moments from the respective confidence regions ζ1, ζ2 may not be effectively taken into
account. Alternatively, one may consider replacing the first two constraints in the above
model by
µ = Vµ′ζ,   Σ = VΣ′ζVT + Σε,
||µ′ζ − µζ|| ≤ γ1,   ||Σ′ζ − Σζ|| ≤ γ2,

where µ′ζ, Σ′ζ are new variables that correspond directly to the ambiguous factor moments. Thus, the formulation directly penalizes the discrepancy of the factor moments.
4.6 Application in Portfolio Selection
4.6.1 Portfolio Selection under Model Uncertainty
Modern portfolio theory sheds light on the relationship between risk and return over
available assets, guiding investors to evaluate and achieve more efficient asset allocations.
The theory requires specification of a model, e.g. a distribution of returns or the moments of a distribution. To avoid any ambiguity, from here on model refers to the probability measure or moments that characterize the stochastic nature of a financial market. In practice, practitioners cannot ensure the correct choice of model due to the complex nature of model determination and validation. Ellsberg (1961) also found that investors in fact hold averse attitudes toward model ambiguity. As a classical example, even with a lower expected return, investors show a higher preference for investments that are geographically closer, due to their better understanding of the return distribution. This finding implies that investors tend to pay an additional ambiguity premium, if possible, when investing. Therefore, portfolio selection models that do not take this ambiguity-aversion attitude into account may be unacceptable to such investors.
The maxmin (worst-case) approaches pioneered by Gilboa and Schmeidler (1989) account for investors' ambiguity-aversion attitude by allowing investors to maximize the
expected utility of terminal wealth, while minimizing over a set of ambiguity measures. Unlike classical approaches to decision making such as expected utility theory, which neglect an agent's preference among multiple probability models, Gilboa and Schmeidler provided a system of axioms under which an agent's preference over the choice of models can be characterized by the worst-case approach. In this regard, Distributionally Robust Optimization can in fact be seen as a special class of Gilboa and Schmeidler's approach, where the set of ambiguity measures is defined via moment information. Several facets of constructing a robust portfolio based on limited statistical information can be found in [Goldfarb and Iyengar, 2003], [Tutuncu and Koenig, 2004], and [Zhu and Fukushima, 2009]. The most recent DRO applications in portfolio selection are the works of [Natarajan et al., 2010] and [Delage and Ye, 2010].
To examine the strength of our comprehensive distributionally robust optimization
approach, we specialize the framework based on the portfolio selection model employed
in [Delage and Ye, 2010], and compare it with the approaches in [Delage and Ye, 2010]
and [Popescu, 2007], and with a sample-based approach. The details of implementation
and experiments based on real market data are presented in the following section.
4.6.2 Implementation and Experiments
In this section, we provide numerical examples to illustrate the performance of our penalized approach. In particular, we consider the problem (PJ) and examine its performance by comparing it to the approaches of [Popescu, 2007] and [Delage and Ye, 2010], and to a sample-based approach. Except for the sample-based approach, which evaluates the expectation using the empirical distribution constructed from sample data, the other two approaches are both DRO approaches that evaluate the expectation based on the worst-possible distribution subject to certain constraints on the first two moments. In [Popescu, 2007], the mean µ and the covariance Σ are assumed to be equal to the sampled mean and covariance, while
in [Delage and Ye, 2010] µ, Σ are assumed to be bounded within a confidence region around a pair of sampled mean and covariance. The objective of these computational experiments is to contrast the performance of “fixed-bound” DRO approaches with that of the penalized problem (PJ), which “endogenously” determines the bound on the moments according to the level of deterioration in worst-case performance.
We compare the performance of the four approaches on real market data. In particular, we consider in this experiment the popular CVaR risk measure as the performance measure to be minimized for each portfolio. Recall that the CVaR risk measure is defined as

CVaRδ(z) := min_λ { λ + (1/δ) E[(z − λ)+] },

where (t)+ = max{0, t}, z denotes the loss distribution, λ is an auxiliary variable to be minimized over, and δ denotes a certain probability level. CVaR is thus the conditional expectation of the loss above its (1 − δ)-quantile. Although in general a wide range of performance measures can be modeled using (PJ), our intent here is to avoid those associated with specific investors' preferences, e.g. a specific functional form of a utility function, and rather to select one that is widely accepted by practitioners. We believe that the tradeoff between downside risk and associated return gives the most direct comparison among all approaches.
We also specialize further the moment structure in the penalized model (PJ) by setting
σ = Σ0 in (4.12) and Σ = Σ0 in (4.13), which is more consistent with the one used in
[Delage and Ye, 2010] and helps to compare the two models.
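For reference, the minimization form of CVaR above can be evaluated directly on an empirical loss sample: the minimizing λ is a (1 − δ)-quantile of the losses, and CVaRδ reduces to the average of the worst δ-fraction of outcomes. A small sketch (one standard empirical estimator; the function name is illustrative):

```python
import numpy as np

def cvar(losses, delta):
    """Empirical CVaR_delta(z) = min_lam { lam + E[(z - lam)^+] / delta }.
    The minimum is attained at a (1 - delta)-quantile of the losses."""
    z = np.asarray(losses, dtype=float)
    lam = np.quantile(z, 1.0 - delta)            # empirical VaR level
    return lam + np.maximum(z - lam, 0.0).mean() / delta
```

For instance, with δ = 0.5 this returns the mean of the worst half of the losses.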
Our list of stocks consists of 46 major stocks of the S&P500 index across 10 industry categories. We collected from Yahoo! Finance the historical daily prices of the 46 stocks from January 1st, 1992 to December 31st, 2010, 19 years in total. Our experiment setting follows closely the one considered in [Delage and Ye, 2010]. Among the 46 stocks, for each experiment we randomly choose 4 stocks as the default portfolio and then rebalance the portfolio every 15 days. At each time of constructing/rebalancing a portfolio, the prior 30 days of daily data are used to estimate the sampled mean and covariance. As Delage and
Ye have shown that their approach outperforms other approaches under such a setting, our hope is to carry their high-quality result over to this experiment and compare it with our penalized approach. Our choice of time period for examining the performance of each approach is inspired by the choices in [Goldfarb and Iyengar, 2003], where the time period January 1997 – December 2000 is chosen, and in [Delage and Ye, 2010], where the time period 2001 – 2007 is chosen. To further cover the most recent financial crisis, the entire time period that we consider for evaluating the performance is from January 1997 to December 2010. The dataset for the time period January 1992 – December 1996 was used for initial parameter estimation.
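The estimation loop described above (a 30-day trailing window, re-run at each 15-day rebalancing date) can be sketched as follows; the helper and its defaults are illustrative, not the thesis's actual code:

```python
import numpy as np

def rolling_estimates(prices, window=30, step=15):
    """At each rebalancing date, estimate the sample mean and covariance of
    daily log-returns from the prior `window` trading days."""
    rets = np.diff(np.log(np.asarray(prices, dtype=float)), axis=0)  # (T-1, n)
    estimates = []
    for t in range(window, rets.shape[0] + 1, step):
        chunk = rets[t - window:t]
        estimates.append((chunk.mean(axis=0), np.cov(chunk, rowvar=False)))
    return estimates
```

Each pair in the returned list feeds one solve of the chosen portfolio model at the corresponding rebalancing date.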
We assume in this experiment that investors hold strictly conservative attitudes and pursue only robust performance when the moments are realized within the 90% confidence region. To estimate the parameters ρ1 and ρ2 that correspond to the 90% confidence region, we apply a statistical analysis similar to the one used in [Delage and Ye, 2010]. It is, however, difficult to determine the “right” amount of data that gives the “best” estimation of ρ1 and ρ2. To mitigate possible bias due to the choice of the amount of data, in addition to the initial estimation based on the data from January 1992 to December 1996, another re-estimation based on the data from January 1992 to December 2003 is performed in the middle of the rebalancing period, i.e. January 2004. Thus, in our later analysis the portfolio performance of the first 7-year period (1997-2003) will be presented separately from that of the latter 7-year period (2004-2010). The estimates of ρ1 and ρ2 with respect to the 90% confidence region are given as follows:
ρ1−90% = 0.1816, ρ2−90% = 3.7356, (1992− 1996)
ρ1−90% = 0.1860, ρ2−90% = 4.3827, (1992− 2003).
In addition to the parameters ρ1 and ρ2, the penalty parameters w1, w2, w3 are also required to be estimated for our model (PJ). Various approaches may be considered for estimating the penalty parameters. For example, one may attempt to find those values which generally
lead to superior portfolio performance by solving (PJ) repeatedly on some historical data. However, this additional calibration procedure, which may (or may not) give unfair advantages over classical DRO approaches, may hinder us from providing a consistent comparison and weaken the illustration of the benefit accrued solely from the bounds that are endogenously generated by our penalized approach. As an alternative, in this experiment we generate the penalty parameters by the following procedure. At the time that we estimate ρ1−90% and ρ2−90%, we additionally estimate another set of parameters ρ1−99% and ρ2−99% that corresponds to a 99% confidence region:
ρ1−99% = 0.3779, ρ2−99% = 9.3773 (1992− 1996)
ρ1−99% = 0.4161, ρ2−99% = 12.1698 (1992− 2003).
We assume that the penalty parameters are calibrated in such a way that the optimal portfolio generated by model (PJ) with a 90% confidence region is identical to the one generated by Delage and Ye's model with a 99% confidence region at the time of parameter estimation. Following Remark 4.5.3, we can compute the values of the penalty parameters by solving (PJ), where the differences a1 = ρ1−99% − ρ1−90% and a2 = ρ2−99% − ρ2−90% are set as the upper bounds of γ1, γ2, and w1 = w2 = w3 = 0. This overall estimation procedure helps to fairly compare the following three models: Delage and Ye's model with parameters ρ = (ρ1−90%, ρ2−90%) (denoted by DY-90), the same model with parameters ρ = (ρ1−99%, ρ2−99%) (denoted by DY-99), and our penalized model (PJ) with parameters ρ = (ρ1−90%, ρ2−90%) and penalty parameters estimated via a1, a2 (denoted by LK-90). Note that as the sampled mean and covariance are re-estimated at each rebalancing point, DY-90 and DY-99 keep ρ1, ρ2 unchanged; that is, their fixed bounds remain the same, while LK-90 instead keeps its penalty parameters unchanged.
In addition to the above three models, the performance of Popescu's model (denoted by P) and a sample-based approach (denoted by SP) will also be compared. The comparison in terms of average (avg.), geometric mean (geo.) and CVaR measures at various quantiles
δ among all models for the time periods 1997-2003 and 2004-2010 is given in Table 4.1 and Table 4.2.
avg. geo. δ = 0.01 δ = 0.1 δ = 1 δ = 5 δ = 10 yr. ret.
P 1.0043 1.0014 0.4375 0.6631 0.7553 0.8321 0.8662 1.0685
DY-90 1.0062 1.0046 0.6931 0.733 0.7986 0.8721 0.9000 1.0931
DY-99 1.007 1.0053 0.6908 0.7328 0.8002 0.8752 0.9027 1.1042
LK-90 1.0073 1.0056 0.6911 0.7328 0.8005 0.8762 0.9036 1.1087
SP 1.0043 1.0008 0.4375 0.5577 0.7301 0.8168 0.8535 1.0703
Table 4.1: Comparison of different approaches in the period: 1997/01-2003/12
avg. geo. δ = 0.01 δ = 0.1 δ = 1 δ = 5 δ = 10 yr. ret.
P 1.0042 1.0018 0.5634 0.5799 0.7233 0.8297 0.8723 1.0597
DY-90 1.004 1.0027 0.6219 0.6835 0.7717 0.8676 0.9046 1.0642
DY-99 1.0044 1.0032 0.6314 0.6878 0.7772 0.8739 0.9098 1.0718
LK-90 1.0047 1.0036 0.6417 0.6918 0.7803 0.8763 0.9115 1.0772
SP 1.0043 1.0013 0.5634 0.5786 0.6992 0.8158 0.8605 1.0599
Table 4.2: Comparison of different approaches in the period: 2004/01-2010/12
Various CVaR measures are provided to ensure the consistency of the performance in terms of downside risk. As the economy experienced a dramatic change before and after the 2008 financial crisis, we further provide the comparison for the time periods 2004-2007 and 2007-2010, given separately in Table 4.3 and Table 4.4. As shown in the tables, among 300 experiments LK-90 exhibits overall superior performance among all the models, except for having a lower mean and geometric mean than the P and SP models during 2004-2007. For that time period, it appears that even though the P and SP models are still exposed to higher downside risk than the other approaches, they take the most advantage of the upward trend of the market and achieve better average returns. One possible reason for this is that the market in the time period 2004-2007 was less volatile (compared with the other time periods), in which case a sample-based approach can possibly
avg. geo. δ = 0.01 δ = 0.1 δ = 1 δ = 5 δ = 10 yr. ret.
P 1.0091 1.0081 0.8142 0.8411 0.8686 0.9101 0.9292 1.0597
DY-90 1.0074 1.0069 0.8784 0.8955 0.92 0.9421 0.9529 1.0642
DY-99 1.0073 1.0069 0.8743 0.8955 0.9246 0.9459 0.956 1.0718
LK-90 1.0073 1.0069 0.8737 0.8963 0.9251 0.9461 0.956 1.0772
SP 1.0095 1.0083 0.782 0.8245 0.861 0.9019 0.9218 1.0599
Table 4.3: Comparison of different approaches in the period: 2004/01-2007/06
avg. geo. δ = 0.01 δ = 0.1 δ = 1 δ = 5 δ = 10 yr. ret.
P 0.9994 0.9957 0.5634 0.5634 0.6776 0.7847 0.8334 0.901
DY-90 1.0008 0.9987 0.6142 0.6675 0.7366 0.8265 0.8689 0.9545
DY-99 1.0016 0.9996 0.6253 0.674 0.7429 0.8325 0.8752 0.9716
LK-90 1.0022 1.0003 0.6199 0.677 0.7479 0.8357 0.8776 0.9826
SP 0.9991 0.9946 0.5634 0.5634 0.6563 0.7671 0.819 0.887
Table 4.4: Comparison of different approaches in the period: 2007/06-2010/12
benefit the most from using only sample data. On the other hand, in all other time periods Delage and Ye's approach and our penalized approach not only perform better than the P and SP approaches in terms of CVaR values, where the improvement reaches 5∼10% for δ = 1, but also achieve superior average performance, where the improvement reaches around 0.3%. This overall superior performance also carries over to the comparison of long-term performance; for example, the average yearly return is improved by up to 3∼10% by using Delage and Ye's model or our penalized model. This verifies the importance of taking moment uncertainty into account in real-life portfolio selection, which helps to achieve more efficient portfolios.
By comparing the performance of DY-90, DY-99 and LK-90, we can first see that LK-90 has a clear advantage over DY-90. Since DY-99 also outperforms DY-90, this verifies the intuition that if there is any additional gain from increasing the fixed bound of the confidence region, our penalized approach can effectively benefit from that gain as well.
Explaining why DY-99 outperforms DY-90 is not easy since, as discussed earlier, deciding appropriate bounds is highly non-trivial. What is intriguing, however, is that in most cases LK-90 outperforms DY-99 in terms of both average return and downside risk. Although the improvement is not as substantial as it is over the other models, which is plausible since we enforce consistency of the initial setting between DY-99 and LK-90, we believe that this overall superior performance does reflect the benefit of using a penalized approach, which endogenously determines the bound at each rebalancing point according to the level of deterioration in worst-case performance. Furthermore, as shown in Table 4.2, the improvement of the CVaR value can still reach 1.5% while the improvement of the average return is 0.03%. Another important observation is that in the time period 2007-2010, where the market is most volatile, the improvement of LK-90 over DY-99 is most substantial in terms of average return, and the improvement of average yearly return is as large as the improvement of DY-99 over DY-90. By contrasting the improvement of LK-90 over DY-99 between the time periods 2004-2007 and 2007-2010, we find that the more volatile the market is, the more one can possibly benefit from using our penalized approach.
In Figures 4.1 - 4.2 we also provide the average evolution of cumulative wealth for each model for the time periods 1997-2003, 2004-2010, 2004-2007, and 2007-2010. Note that in all figures the evolution of a unit price of the S&P500 index is also provided for reference. As seen, for the time period 1997-2003, the P and SP models show their vulnerability in a constantly volatile market, and their associated cumulative wealth dropped greatly as the market crashed around 2001-2002, whereas DY-90, DY-99 and LK-90 have much better downside risk performance. One can also observe the strength of the penalized model LK-90 compared with DY-90 and DY-99: its greater wealth is accumulated by consistently providing more stable performance in a volatile market. A similar observation can be made for the time period 2004-2010. This comparison
[Figure 4.1 plots, for each model (P, DY-90, DY-99, LK-90, SP) and for the S&P500 index, the evolution of cumulative wealth ($) against time (15 days/unit): (a) 1997/01-2003/12; (b) 2004/01-2010/12.]

Figure 4.1: Cumulative wealth
[Figure 4.2 plots the same cumulative-wealth comparison as Figure 4.1: (a) 2004/01-2007/06; (b) 2007/06-2010/12.]

Figure 4.2: Cumulative wealth
contrasts further a “fixed-bound” approach with our “endogenous-bound” approach. The overall computational results support well the idea that the penalized problem (PJ), which endogenously decides the bound on the moments based on the level of deterioration in worst-case performance, improves the overall performance.
4.7 Conclusion
In this chapter, we address the difficulty of providing a “reasonably” robust policy in the presence of rare but high-impact realizations of moment uncertainty. A penalized moment-based framework is proposed that extends the classical penalized maxmin framework to incorporate richer forms of moment uncertainty. While classical DRO approaches focus on ensuring the solution is robust against a bounded set of moment vectors, our approach provides an additional level of robustness when the realized moments fall outside the set. Under some mild conditions, the penalized moment-based problem turns out to be computationally tractable for a wide range of specifications. Computational experiments were conducted, in which we specialized the penalized problem to a portfolio selection model and found promising performance of our approach on historical data. The improvement in performance was found to be more substantial the more volatile the market is. This highlights the potential benefit of endogenously obtaining bounds for moment uncertainty using our penalized approach. We have also provided a few practical extensions of the problem. The practical performance of those extensions remains to be examined, and we leave those examinations for future work.
Chapter 5
Conclusion and Future Research
This thesis has focused on developing a comprehensive set of moment-based optimization models that account for various forms of uncertainty associated with distributional specifications in decision evaluation and optimization. Various financial applications that can benefit from the development of these models were presented. We began this thesis by presenting a novel application in model-risk management, where resorting to moment-based optimization was shown to be extremely useful in providing meaningful risk evaluations. Prior to our work, moment-based optimization was only known to be applicable in fairly restrictive settings, where the moments considered were, in general, low-order and assumed to be deterministic. In the first part of the thesis, we presented new tractability results for incorporating high-order marginal moments. These results advance the existing knowledge of tractable foundations that can be used for modeling richer moment information, and lay the groundwork for studying other possible tractable instances.

In the second part of the thesis, two new moment-based optimization models were proposed that address the uncertain nature of moments. In the first model, a special form of recourse function was constructed to account for the stochastic information of moments, whereas in the second model, a convex penalty function was designed to
capture the extreme moments falling outside a pre-specified confidence region. Although these two models were developed from completely different perspectives, some light can be shed on the common features that they share from a high-level perspective. From a modeling point of view, both models present effective approaches to mitigate the risk associated with the uncertainty of moments, in that both models are controllable through penalty parameters that express risk aversion. From a theoretical point of view, both models are consistent with their deterministic robust counterparts in that the deterministic models can be shown to be limiting cases. Finally, from a computational point of view, the complexity of the solution methods for both models can be shown to be equivalent to that of solving a finite number of their deterministic robust counterparts. That is, our new models, while accounting for additional levels of uncertainty, do not add much computational burden compared with their deterministic counterparts. We believe that, because of these prominent features, the models developed in this thesis can add significant value to the stream of research related to moment-based optimization.
There are a number of research directions that are important to pursue. We briefly describe them here. First, the SDP formulation provided in Theorem 2.3.1 is not guaranteed to generate the tightest upper bounds. This leads to the question: is it possible to find a polynomial-time algorithm that generates the tightest bounds? We suspect this question can be quite challenging to answer. There is reason to believe that the problem of generating the tightest bounds in Theorem 2.3.1 may be NP-hard, given that incorporating joint multivariate moments up to the fourth order is NP-hard. Even so, it is not clear how to prove such a result. An alternative approach to improving the tightness of the bounds is to employ Lasserre's SDP relaxation techniques (Lasserre (2001)). However, to achieve reasonably tight bounds, the size of the associated SDP relaxation problems can become extremely large for high-dimensional problems. This imposes significant computational challenges in seeking tighter bounds even with
the use of modern SDP solvers. To resolve this, specialized algorithms that exploit the structure of the resulting SDP problems need to be developed.

Another research direction that may generate a wealth of applications is to investigate the tractability and applicability of two-stage stochastic semidefinite programming models more general than the ones considered in Chapter 3. It should be clear that, except for special instances such as the ones in Chapter 3, solving a general stochastic semidefinite programming instance naturally gives rise to the problem of solving a large-scale SDP. One of the challenges is that decomposition-type algorithms that work well for large-scale stochastic linear programming problems may not be immediately applicable to stochastic semidefinite programming problems. For example, many decomposition algorithms employ a cutting-plane type of strategy that generates cuts from sub-problems at each iteration and re-solves the master problem repeatedly. This re-solving procedure can become cumbersome when applied to SDP problems, because existing SDP algorithms are in general not as compatible with “warmstarting” as linear programming solvers are, warmstarting being an essential technique that determines the efficiency of re-solving. The detailed study of these computational challenges and the exploration of relevant applications will be part of our future work.
Appendix A
Additional Tables
A.1 Tables of Section 2.2.3
τ = 1 τ = 12 τ = 24
s′ K [CB CM Cb] [CB CM Cb] [CB CM Cb]
0.2 30 10.034 10.034 10.035 10.401 10.405 10.416 10.812 10.821 10.843
0.2 35 5.039 5.041 5.042 5.565 5.584 5.595 6.221 6.228 6.251
0.2 40 0.465 0.330 0.351 1.804 1.747 1.764 2.710 2.673 2.695
0.2 45 0.000 0.005 (0.004) 0.285 0.295 (0.295) 0.855 0.838 (0.848)
0.2 50 0.000 0.000 (0.000) 0.022 0.039 (0.038) 0.199 0.214 (0.215)
0.4 30 10.034 10.036 10.037 10.566 10.600 10.608 11.361 11.369 11.388
0.4 35 5.044 5.093 5.090 6.378 6.353 6.364 7.639 7.595 7.618
0.4 40 0.907 0.635 0.677 3.316 3.193 3.217 4.817 4.731 4.756
0.4 45 0.015 0.083 (0.077) 1.493 1.426 (1.436) 2.875 2.798 (2.818)
0.4 50 0.000 0.012 (0.011) 0.594 0.611 (0.610) 1.641 1.605 (1.616)
0.6 30 10.034 10.059 10.057 11.154 11.160 11.168 12.535 12.487 12.509
0.6 35 5.108 5.206 5.199 7.543 7.434 7.453 9.395 9.283 9.312
0.6 40 1.349 0.939 1.001 4.825 4.626 4.660 6.919 6.769 6.800
0.6 45 0.129 0.226 (0.217) 2.947 2.784 (2.808) 5.029 4.881 (4.909)
0.6 50 0.004 0.075 (0.069) 1.735 1.668 (1.676) 3.623 3.505 (3.527)
0.8 30 10.039 10.114 10.109 12.019 11.950 11.962 13.976 13.847 13.875
0.8 35 5.265 5.346 5.335 8.817 8.610 8.641 11.231 11.034 11.070
0.8 40 1.791 1.240 1.323 6.325 6.034 6.079 8.994 8.759 8.799
0.8 45 0.355 0.396 (0.383) 4.463 4.194 (4.232) 7.194 6.953 (6.992)
0.8 50 0.042 0.186 (0.175) 3.112 2.928 (2.950) 5.755 5.533 (5.567)
Table A.1: CB (resp. CM) denotes the call option price of the diffusion (resp. jump-
diffusion) model with Lo’s specification. Cb denotes the call option prices of the bench-
mark model, i.e. the jump-diffusion model with k = 1, φ2 = 0.15, λ = 0.25.
s′ K τ = 1 τ = 12 τ = 24
0.2 45 0.044 0.496 1.201
0.2 50 0.021 0.182 0.436
0.4 45 0.157 1.921 2.072
0.4 50 0.070 0.908 2.269
0.6 45 0.350 2.140 1.812
0.6 50 0.151 2.431 3.987
0.8 45 0.655 1.951 1.660
0.8 50 0.285 3.716 4.560
Table A.2: ϑ(V∗) of Qmom for various values of parameters s′, K, τ , where w′ = 2.
s′ K τ = 1 τ = 12 τ = 24
0.2 45 0.044 0.496 1.201
0.2 50 0.021 0.182 0.436
0.4 45 0.157 1.897 1.618
0.4 50 0.070 0.908 2.269
0.6 45 0.350 1.679 1.469
0.6 50 0.151 2.431 2.938
0.8 45 0.655 1.571 1.393
0.8 50 0.285 3.142 2.785
Table A.3: ϑ(V∗) of Qmom for various values of parameters s′, K, τ , where w′ = 5.
A.2 Tables of Section 3.4.2
Strike Price (K) Time to Maturity
1 2 3 4 5 6 7 8
K=1200
BS Low 130.7 132.4 134.6 137.2 139.9 142.7 145.5 148.4
High 143.1 160.3 175.3 188.8 201.0 212.2 222.7 232.6
RS 136.2 141.0 148.2 154.6 160.8 166.8 172.7 178.0
UB SSDP
b+=1(0%) 131.2 133.6 136.6 139.8 143.1 146.5 150.0 153.4
b+=10(81%) 149.0 152.5 155.9 177.0 179.9 182.7 200.2 202.6
b+=102(98%) 149.0 152.5 174.1 177.0 195.1 197.7 200.2 215.7
WUB SDP b+=∞ 149.0 171.1 190.0 206.4 221.2 234.6 247.0 258.7
LB SSDP
b−=1(0%) 130.7 133.5 139.2 145.0 150.4 155.3 159.8 163.8
b−=10(81%) 130.7 131.9 133.1 134.3 135.7 137.1 138.4 139.7
b−=102(98%) 130.7 131.9 133.1 134.3 135.5 136.7 137.9 139.1
WLB SDP b−=∞ 130.7 131.9 133.1 134.3 135.5 136.7 137.9 139.1
K=1325
BS Low 22.3 30.9 37.6 43.5 48.6 53.4 57.8 62.0
High 60.2 84.5 103.2 119.1 133.1 145.8 157.5 168.4
RS 40.6 53.7 65.0 74.2 82.7 90.8 97.6 105.2
UB SSDP
b+=1(0%) 27.0 37.6 45.8 52.8 59.0 64.7 70.0 74.9
b+=10(81%) 74.2 78.6 82.9 110.3 113.4 116.5 137.1 139.7
b+=102(98%) 74.2 78.6 107.1 110.3 131.7 134.4 137.1 154.6
WUB SDP b+=∞ 74.2 103.8 126.3 145.1 161.5 176.2 189.7 202.1
LB SSDP
b−=1(0%) 43.5 60.0 72.1 81.9 90.2 97.3 103.6 109.1
b−=10(81%) 16.8 23.1 28.1 32.3 53.8 56.1 58.3 60.5
b−=102(98%) 16.8 23.1 28.1 32.3 36.1 39.6 42.8 45.8
WLB SDP b−=∞ 16.8 23.1 28.1 32.3 36.1 39.6 42.8 45.8
K=1400
BSLow 1.8 6.2 10.8 15.2 19.4 23.4 27.2 30.9
High 30.6 53.6 72.0 87.7 101.7 114.5 126.3 137.3
RS 17.1 26.4 37.7 46.5 54.1 62.2 69.3 76.1
UB SSDP
b+=1(0%) 3.1 9.4 16.0 22.2 28.0 33.5 38.6 43.5
b+=10(81%) 43.5 47.9 52.1 80.2 83.4 86.5 107.7 110.3
b+=102(98%) 43.5 47.9 76.9 80.2 102.3 105.0 107.7 125.8
WUB SDP b+=∞ 43.5 73.5 96.8 116.2 133.1 148.2 162.0 174.7
LB SSDP
b−=1(0%) 15.3 32.1 45.0 55.6 64.6 72.4 79.3 85.4
b−=10(81%) 0.0 0.1 1.8 4.6 25.1 27.4 29.6 31.8
b−=102(98%) 0.0 0.1 1.8 4.6 7.6 10.6 13.5 16.3
WLB SDP b−=∞ 0.0 0.1 1.8 4.6 7.6 10.6 13.5 16.3
Table A.4: Upper/lower bounds and prices for different strike prices K, b+(b−)-values
and time to maturity under 2 regimes.
Appendix B
Additional Figures
B.1 Section 3.4.2
[Figure: two panels plotting bounds and prices against time to maturity (5 weeks/unit).
(a) Upper bounds and prices: BSlow, RS, UBSDP, UBSSDP (b+=10), UBSSDP (b+=10^2), WUBSSDP.
(b) Lower bounds and prices: BShigh, RS, LBSDP, LBSSDP (b−=10), LBSSDP (b−=10^2), WLBSDP.]
Figure B.1: The case of 3 regimes and K = 1325
[Figure: two panels plotting bounds and prices against time to maturity (5 weeks/unit).
(a) Upper bounds and prices: BSlow, RS, UBSDP, UBSSDP (b+=10), UBSSDP (b+=10^2), WUBSSDP.
(b) Lower bounds and prices: BShigh, RS, LBSDP, LBSSDP (b−=10), LBSSDP (b−=10^2), WLBSDP.]
Figure B.2: The case of 4 regimes and K = 1325
[Figure: two panels plotting bounds and prices against time to maturity (5 weeks/unit).
(a) Upper bounds and prices: BSlow, RS, UBSDP, UBSSDP (b+=10), UBSSDP (b+=10^2), WUBSSDP.
(b) Lower bounds and prices: BShigh, RS, LBSDP, LBSSDP (b−=10), LBSSDP (b−=10^2), WLBSDP.]
Figure B.3: The case of 5 regimes and K = 1325
Bibliography
Anderson, E. W., Hansen, L. P., and Sargent, T. J. (2000): Robustness, detection and the price of risk.
Ariyawansa, K. A. and Zhu, Y. (2006): Stochastic semidefinite programming: a new
paradigm for stochastic optimization, 4OR-A Quarterly Journal of Operations Re-
search 4, 239–253.
Bakshi, G., Cao, C., and Chen, Z. (1997): Empirical performance of alternative option
pricing models, Journal of Finance 53, 499–547.
Ben-Tal, A., Boyd, S., and Nemirovski, A. (2006): Extending scope of robust optimiza-
tion: comprehensive robust counterparts of uncertain problems, Mathematical Pro-
gramming Series B 107(1), 63–89.
Ben-Tal, A. and Nemirovski, A. (2001): Lectures on Modern Convex Optimization: Anal-
ysis, Algorithms, and Engineering Applications, MPS/SIAM Series on Optimization,
SIAM, Philadelphia, PA, USA.
Ben-Tal, A. and Nemirovski, A. (2002): Robust optimization - methodology and applications, Mathematical Programming Series B 92(3), 453–480.
Ben-Tal, A., El Ghaoui, L., and Nemirovski, A. (2009): Robust Optimization, Princeton
University Press, Princeton, NJ, USA.
Bertsimas, D., Popescu, I., and Sethuraman, J. (2000): Moment problems and semidefi-
nite programming, Handbook on Semidefinite Programming: Theory, Algorithms, and
Applications, Kluwer Academic Publishers, Dordrecht, Netherlands.
Bertsimas, D. and Popescu, I. (2002): On the relation between option and stock prices:
a convex optimization approach, Operations Research 50, 358–374.
Bertsimas, D. and Popescu, I. (2005): Optimal inequalities in probability theory: a
convex optimization approach, SIAM Journal on Optimization 15(3), 780–804.
Bertsimas, D. and Sim, M. (2004): The price of robustness, Operations Research 52(1),
35–53.
Birge, J. R. and Louveaux, F. (1997): Introduction to Stochastic Programming, Springer-
Verlag, New York, NY, USA.
Black, F. and Scholes, M. (1973): The pricing of options and corporate liabilities, Journal
of Political Economy 81, 637–654.
Boyle, P. B. and Lin, X. S. (1997): Bounds on contingent claims based on several assets,
Journal of Financial Economics 46, 383–400.
Calafiore, G. (2007): Ambiguous risk measures and optimal robust portfolios, SIAM Journal on Optimization 18(3), 853–877.
Christoffersen, P. and Jacobs, K. (2004): Which GARCH model for option valuation?,
Management Science 50, 1204–1221.
Cont, R. (2006): Model uncertainty and its impact on the pricing of derivative instru-
ments, Mathematical Finance 16, 519–547.
Curto, R. E. and Fialkow, L. A. (1996): Solution of the truncated complex moment problem for flat data, Memoirs of the American Mathematical Society 119(568).
Dalakouras, G. V., Kwon, R. H., and Pardalos, P. M. (2008): Semidefinite programming approaches for bounding Asian option prices, Computational Methods in Financial Engineering, Springer-Verlag, Berlin, Germany.
Delage, E. and Ye, Y. (2010): Distributionally robust optimization under moment uncertainty with application to data-driven problems, Operations Research 58(3), 595–612.
Dupacova, J. (1987): The minimax approach to stochastic programming and an illustrative application, Stochastics 20, 73–88.
El Ghaoui, L., Oks, M., and Oustry, F. (2003): Worst-case value-at-risk and robust
portfolio optimization: a conic programming approach, Operations Research 51(3),
543–556.
Ellsberg, D. (1961): Risk, ambiguity, and the savage axioms, Quarterly Journal of Eco-
nomics 75(4), 643–669.
Everitt, R. and Ziemba, W. T. (1979): Stochastic programs with simple recourse, Oper-
ations Research 27, 485–502.
Follmer, H. and Schied, A. (2002): Convex measures of risk and trading constraints,
Finance and Stochastics 6(4), 429–447.
Freeland, R. K., Hardy, M. R., and Till, M. (2009): Assessing regime switching equity
return models, Technical report, University of Waterloo, Ontario, Canada.
Gilboa, I. and Schmeidler, D. (1989): Maxmin expected utility with a non-unique prior,
Journal of Mathematical Economics 18(2), 141–153.
Goh, J. and Sim, M. (2010): Distributionally robust optimization and its tractable approximations, Operations Research 58(4), 902–917.
Goldfarb, D. and Iyengar, G. (2003): Robust portfolio selection problems, Mathematics
of Operations Research 28(1), 1–38.
Gotoh, J. and Konno, H. (2002): Bounding option prices by semidefinite programming:
a cutting plane algorithm, Management Science 48(5), 665–678.
Grotschel, M., Lovasz, L., and Schrijver, A. (1981): The ellipsoid method and its consequences in combinatorial optimization, Combinatorica 1(2), 169–197.
Grundy, B. (1991): Option prices and the underlying asset’s return distribution, Journal
of Finance 46(3), 1045–1070.
Hamburger, H. (1920): Uber eine Erweiterung des Stieltjesschen Momentenproblems,
Mathematische Annalen 81(2), 235–319.
Hamburger, H. (1921): Uber eine Erweiterung des Stieltjesschen Momentenproblems,
Mathematische Annalen 82(3), 168–187.
Hamilton, J. D. (1989): A new approach to the economic analysis of nonstationary time
series and the business cycle, Econometrica 57, 357–384.
Hardy, M. R. (2001): A regime switching model of long-term stock returns, North Amer-
ican Actuarial Journal 5, 41–53.
Hsieh, K. and Ritchken, P. (2005): An empirical comparison of GARCH option pricing
models, Review of Derivatives Research 8, 129–150.
Isii, K. (1960): The extrema of probability determined by generalized moments (i)
bounded random variables, Annals of the Institute of Statistical Mathematics 12, 119–
133.
Isii, K. (1963): On sharpness of Tchebycheff-type inequalities, Annals of the Institute of
Statistical Mathematics 14, 185–197.
Jackwerth, J. and Rubinstein, M. (1996): Recovering probability distributions from op-
tion prices, Journal of Finance 51, 1611–1631.
Kall, P. and Wallace, S. W. (1994): Stochastic Programming, John Wiley and Sons,
Chichester, West Sussex, England.
Kariya, T. and Liu, R. Y. (2003): Asset Pricing: Discrete-Time Approach, Kluwer Academic Publishers, Boston, MA, USA.
Kolda, T. G., Lewis, R. M., and Torczon, V. (2003): Optimization by direct search: new perspectives on some classical and modern methods, SIAM Review 45(3), 385–482.
Kwon, R. H. and Li, J. Y. (2011): A stochastic semidefinite programming approach for
bounds on option pricing under regime switching, Working Paper.
Lasserre, J. B. (2001): Global optimization with polynomials and the problems of mo-
ments, SIAM Journal on Optimization 11, 796–817.
Lasserre, J. B. (2010): Moments, positive polynomials and their applications, Imperial
College Press, London, UK.
Li, J. Y. and Kwon, R. H. (2011): Portfolio selection under model uncertainty: a penal-
ized moment-based optimization approach, Working Paper.
Li, J. Y. and Kwon, R. H. (2012): Market price-based convex risk measures: a
distribution-free optimization approach, Operations Research Letters 40(2), 128–133.
Lo, A. W. (1987): Semi-parametric upper bounds for option prices and expected payoffs,
Journal of Financial Economics 19(2), 373–387.
Luenberger, D. G. (1998): Investment Science, Oxford University Press, New York, NY,
USA.
Maenhout, P. J. (2004): Robust portfolio rules and asset pricing, Review of Financial
Studies 17(4), 951–983.
Natarajan, K., Pachamanova, D., and Sim, M. (2008): Incorporating asymmetric distri-
butional information in robust value-at-risk optimization, Management Science 54(3),
573–585.
Natarajan, K., Sim, M., and Uichanco, J. (2010): Tractable robust expected utility and
risk models for portfolio optimization, Mathematical Finance 20(4), 695–731.
Popescu, I. (2007): Robust mean-covariance solutions for stochastic optimization, Oper-
ations Research 55(1), 98–112.
Primbs, J. A. (2010): SDP relaxation of arbitrage pricing bounds based on option prices
and moments, Journal of Optimization Theory and Applications 144, 137–155.
Ritchken, P. (1985): On option pricing bounds, Journal of Finance 40, 1219–1233.
Scarf, H. (1958): A min-max solution of an inventory problem, Studies in the Mathematical Theory of Inventory and Production, 201–209.
Shapiro, A. (2001): On duality theory of conic linear problems, Semi-Infinite Program-
ming: Recent Advances, Kluwer Academic Publishers, Netherlands.
Shapiro, A., Dentcheva, D., and Ruszczynski, A. (2009): Lectures on Stochastic Program-
ming: Modeling and Theory, MPS/SIAM Series on Optimization, SIAM, Philadelphia,
PA, USA.
Smith, J. (1995): Generalized Chebyshev inequalities: theory and applications in decision
analysis, Operations Research 43(5), 807–825.
Stieltjes, T. J. (1894): Recherches sur les fractions continues, Annales de la Faculte des
Sciences de Toulouse 8, 1–122.
Stieltjes, T. J. (1895): Recherches sur les fractions continues, Annales de la Faculte des
Sciences de Toulouse 9, 1–47.
So, M. K. P., Lam, K., and Li, W. K. (1998): A stochastic volatility model with Markov switching, Journal of Business and Economic Statistics 16, 244–253.
Turner, C. M., Startz, R., and Nelson, C. R. (1989): A Markov model of heteroscedasticity, risk and learning in the stock market, Journal of Financial Economics 25, 3–22.
Tutuncu, R. H. and Koenig, M. (2004): Robust asset allocation, Annals of Operations Research 132, 157–187.
Uppal, R. and Wang, T. (2003): Model misspecification and under-diversification, Jour-
nal of Finance 58(6), 2465–2486.
Zhu, S. S. and Fukushima, M. (2009): Worst-case conditional value-at-risk with applica-
tion to robust portfolio management, Operations Research 57(5), 1155–1168.
Zuluaga, L. F. and Pena, J. F. (2005): A conic programming approach to generalized Tchebycheff inequalities, Mathematics of Operations Research 30(2), 369–388.