Models of Price Impact Part III Essay Daniel Ritter

Models of Price Impact

Part III Essay

by

Daniel Ritter

Date:

24th of April 2015

University of Cambridge

Faculty of Mathematics

Contents

1 Introduction

2 The Almgren Model

2.1 The Setup

2.2 The Linear Case

3 Adaptive Strategies

3.1 Mean-Variance Optimal Strategies

3.2 CARA Investors

4 Conclusion

1 Introduction

On the first pages of many introductory books on financial mathematics, the author lists a number of assumptions and simplifications for the markets considered. These simplifications typically hold approximately in most applications, but for a thorough understanding of financial markets they have to be dropped, and novel, stronger and more complicated models have to be developed. Two of these simplifications are a vanishing bid-ask spread and infinite market depth, which means that any desired number of shares is available at the quoted price of an asset and that trading this asset has no influence on the evolution of the price.

For small market participants these assumptions reflect real markets satisfactorily. Bid-ask spreads are typically one or a few ticks wide, and the real market depth is much bigger than the volume traded by those small investors, so they can execute the whole trade at the given bid or ask price. For market participants who want to sell (buy) a large number of shares of a certain stock, on the other hand, those simplifications no longer reflect real-world behaviour well enough, since they would consume the entire depth of the market at the given bid (ask) price and thus drive down (up) those prices. This phenomenon is called price impact. Typically, a market participant is considered to be large if they trade 10 000 shares or more, which is then called a block trade. This figure, however, should depend on the specific stock, since liquidity, and therefore market depth, varies from stock to stock.

As described in the chapter “The Block Trader” of Sebastian Mallaby’s book “More Money Than God” [11], those block trades gained significant importance in the period from 1965 to 1984, when the percentage of trading volume on the markets represented by block trades rose from less than 5 per cent to about 50 per cent. This was caused by an increasing concentration of money, which had been privately invested before, in pension funds, insurance funds, and mutual funds. [11, p. 52f.][16] Nowadays, due to electronic trading, it is possible for traders to split block trades of more than 10 000 shares into a number of smaller trades with volumes of a few hundred shares. Each of these small trades has a small impact on the price, and these impacts add up to the total price impact. This total impact, however, depends significantly on how the large trade was split. An important question today is therefore not only how trades influence prices and how this can be modelled, but even more which trading strategy is best for market participants wanting to sell or buy a large number of shares. A recurring difficulty in these considerations is that, on the one hand, slower selling or buying typically leads to less price impact, since the market has more time to recover. On the other hand, faster sales or purchases minimise the risk of random price changes during the trading period. Finding the optimal middle course is one of today’s problems for companies, such as hedge funds, that trade large volumes on the market.

In this essay, we want to discuss one approach to modelling price impact and the properties of optimal trading strategies in this model. We use the approach taken by Almgren and Chriss in [1]. It suggests that the price impact of a trade consists of a permanent impact and a temporary impact. The permanent impact reflects the fact that other market participants assume our trading decision to be driven by some information about future price changes. It can also be based on speculation about further trades following the first one in the near future. It is called permanent impact since it influences the price for the whole period we consider. The temporary impact is assumed to influence the price only for the particular trade itself and stems from an imbalance in supply and demand which moves the price away from the equilibrium price to a less favourable price. This imbalance, however, is assumed to be compensated by the time the next trade is made. The basis for that is the widely discussed and empirically supported resilience of the order book, which means that market makers tend to fill the gap created by trading. For trades at intervals of some minutes the assumption of full compensation seems to be sensible if the number of traded shares was not too large.

The essay is structured as follows: In section 2, we will look at the Almgren model in more

detail. We will begin with a formal setup and then determine optimal strategies for a

special case. Then in section 3, we will discuss the benefit from permitting adaptive

strategies in our model. First, we will show that one can construct an optimal adaptive

trading strategy in the case of mean-variance optimisation. In the second part of section 3,

we will see that in the case of CARA (constant absolute risk aversion) utility optimisation

the optimal strategy is still a static one.

2 The Almgren Model

In this section, we will first formalise the setup of the Almgren model, which was broached

in the introduction. After that, we will explicitly solve the problem of mean-variance

optimisation for the linear case of the Almgren model, i. e. if we assume temporary

impact to be linear. The presented ideas come from Almgren and Chriss in [1, p. 7-14].

2.1 The Setup

The situation we want to consider in the following is that we are a trader on the market and hold $X_0 \in \mathbb{R}^+_0$ shares of an asset at time 0. We want to liquidate this portfolio by a prescribed time $T < \infty$. Note that we restrict ourselves to solving the problem for this sale programme. The theory is equally applicable to buy programmes, where we start with 0 shares at time 0 and have to build up a certain number of shares by the prescribed time $T$. Also note that the situation can be generalised to a portfolio of $d$ assets whose price processes may be correlated, which makes the calculation more complicated. A discussion of this generalisation can be found in [1, p. 36ff.] and [15], and to some extent in section 3.2.

The discrete times at which we are allowed to trade are given by $t_k := k\tau = k \cdot \frac{T}{N}$, where $0 \le k \le N$ and $\tau := \frac{T}{N}$ is the period between two consecutive trades, typically a few minutes to some hours. If empirical data suggest some intra-day seasonality in the traded volume, as in [6, p. 1671], one can interpret time $t$ as ‘volume time’ rather than physical time.

We follow a model for the underlying price process chosen by Almgren and Chriss in [1]

which is that of a discrete arithmetic random walk. That is, in periods in which we do

not trade any shares, the price would develop according to

\[ S_k = S_{k-1} + \sigma \tau^{1/2} \xi_k, \]

where $(\xi_k)_{k \ge 1}$ is a sequence of independent random variables with mean zero and variance one on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$, and $S_0$ is the deterministic, quoted price of the asset at time 0. We assume that the process $(\xi_k)_{k \ge 1}$, or equivalently $(S_k)_{k \ge 0}$, induces a filtration $(\mathcal{F}_k)_{k \ge 0} \subset \mathcal{F}$ and that $\mathcal{F}_N = \mathcal{F}$ is complete. Possible examples for $\xi_k$ would be $\xi_k \sim \mathrm{Unif}\{-1, +1\}$ or $\xi_k \sim \mathcal{N}(0, 1)$. Note that we have not added any drift to the price process. This is a sensible choice if we have no estimates of future price movements. A discussion of optimal execution strategies with a non-vanishing drift term can be found in [1, p. 26ff.] and to some extent in section 3.2. Using an arithmetic random walk to model the price process has the drawback that prices can become negative with positive probability. This probability, however, is very small and can be neglected for typical trading periods $T$ of some hours or days, for which $S_0 \gg \sigma\sqrt{T}$, the standard deviation of $S_N$. The reason for choosing an arithmetic random walk rather than some positive process, such as a discretised geometric Brownian motion, is that it is mathematically more tractable and allows an easier analysis, since price changes are independent of the current price level.
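The claim that negative prices can be neglected can be checked by simulation. The following is a minimal sketch, not part of the original essay: the function names, the starting price $S_0 = 50$, and the number of simulated paths are assumptions chosen for illustration only.

```python
import math
import random


def simulate_price_path(s0, sigma, tau, n_steps, rng, gaussian=True):
    """Return [S_0, ..., S_N] with S_k = S_{k-1} + sigma * sqrt(tau) * xi_k."""
    path = [s0]
    for _ in range(n_steps):
        # Both choices of xi_k have mean zero and variance one.
        xi = rng.gauss(0.0, 1.0) if gaussian else rng.choice((-1.0, 1.0))
        path.append(path[-1] + sigma * math.sqrt(tau) * xi)
    return path


def fraction_of_negative_paths(s0, sigma, tau, n_steps, n_paths=10_000, seed=0):
    """Estimate by simulation how often the price drops below zero before T."""
    rng = random.Random(seed)
    negative = sum(
        1
        for _ in range(n_paths)
        if min(simulate_price_path(s0, sigma, tau, n_steps, rng)) < 0.0
    )
    return negative / n_paths


if __name__ == "__main__":
    # Hypothetical inputs: S_0 = 50, sigma^2 = 0.95 per day, daily trades, 5 days.
    print(fraction_of_negative_paths(50.0, math.sqrt(0.95), 1.0, 5))
```

With a starting price many standard deviations above zero, one would expect the estimated fraction to be zero for any realistic number of simulated paths.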

A trading strategy $X = (X_k)_{k \ge 0}$ is specified by the random variables $X_k := X_{t_k} \in \mathbb{R}$, which give the number of shares that we hold at time $t_k$. For now, we only consider static trading strategies $X$, i. e. $(X_k)_{k \ge 0}$ is a sequence of deterministic (constant) random variables. Later, we will loosen this restriction and also permit certain dynamic strategies which are previsible in the filtered probability space. That is, each random variable $X_k$ may depend on information about the outcome of the random variables $\xi_1, \ldots, \xi_{k-1}$. Note that here and throughout the essay we allow non-integer values for the portfolio holdings $X_k$. In practice, $X_0$ would be an integer and all results for $X_k$ would have to be rounded to the nearest integer.

To simplify the notation in the sequel, define the number of shares of the asset sold between times $t_{k-1}$ and $t_k$ to be $n_k := X_{k-1} - X_k$. This yields the following relation:

\[ X_k = X_0 - \sum_{j=1}^{k} n_j = \sum_{j=k+1}^{N} n_j \]

As mentioned before, in Almgren’s model we assume the price impact to be composed of a permanent part $I^{\mathrm{perm}}$ and a temporary part $I^{\mathrm{temp}}$ such that the actual equilibrium price evolves according to

\[ S_k = S_{k-1} + \sigma \tau^{1/2} \xi_k - I^{\mathrm{perm}}_k \]

and the realised price for the $k$th trade is given by

\[ \tilde{S}_k = S_{k-1} - I^{\mathrm{temp}}_k . \]

So the price changes in the model are due to an exogenous factor, the volatility, which is independent of the trading, and the endogenous factors permanent and temporary impact, which are a reaction of the market to the trades. Both impacts should depend on the rate of trading in the $k$th interval, which is given by $n_k/\tau$. We therefore set

\[ I^{\mathrm{perm}}_k = \tau g\!\left(\frac{n_k}{\tau}\right), \qquad I^{\mathrm{temp}}_k = h\!\left(\frac{n_k}{\tau}\right) \]

for some functions $g, h : \mathbb{R} \to \mathbb{R}$. A reasonable choice for $h$ has to be non-decreasing, non-positive on $(-\infty, 0]$ and non-negative on $[0, \infty)$. Further, we assume $f(v) := v \cdot h(v)$ to be strictly convex, since larger trades should be penalised more heavily than smaller ones.

It is a result due to Huberman and Stanzl in [9] that the permanent impact has to be linear if one wants to rule out quasi-arbitrage which “[...] is the availability of a series of trades that generate infinite expected profits with an infinite Sharpe ratio.” [9, p. 1] That is, not only would the expected gain from such a strategy be infinite, but so would the expected gain divided by its standard deviation. This observation leads to a linear choice for the permanent impact in our model. Since not trading should not be penalised, we then get

\[ g(v) = \gamma v, \]

where $\gamma > 0$ and there is no constant term.

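Now we can investigate how different trading strategies lead to different revenues and which strategies we favour. In the absence of any price impact, we could just liquidate the whole portfolio instantaneously and would receive $S_0 X_0$. Taking price impact in the above sense into account and writing $\tilde{S}_k$ for the realised price of the $k$th trade, our revenue from liquidating $X_0$ shares with strategy $X$ until time $T$ is

\[
\begin{aligned}
R^X_T &= \sum_{k=1}^{N} n_k \tilde{S}_k \\
&= \sum_{k=1}^{N} (X_{k-1} - X_k) S_{k-1} - \sum_{k=1}^{N} n_k h\!\left(\frac{n_k}{\tau}\right) \\
&= \sum_{k=1}^{N} \left[ X_{k-1} S_{k-1} - X_k \left( S_k - \sigma \tau^{1/2} \xi_k + \tau g\!\left(\frac{n_k}{\tau}\right) \right) \right] - \sum_{k=1}^{N} \tau f\!\left(\frac{n_k}{\tau}\right) \\
&= \sum_{k=0}^{N-1} X_k S_k - \sum_{k=1}^{N} X_k S_k + \sum_{k=1}^{N} X_k \left( \sigma \tau^{1/2} \xi_k - \gamma n_k \right) - \sum_{k=1}^{N} \tau f\!\left(\frac{n_k}{\tau}\right) \\
&= S_0 X_0 + \sum_{k=1}^{N} \left( \sigma \tau^{1/2} \xi_k - \gamma n_k \right) X_k - \sum_{k=1}^{N} \tau f\!\left(\frac{n_k}{\tau}\right),
\end{aligned}
\]

where the last step uses $X_N = 0$. We do this calculation without discounting the revenues from different times since we assume a short time horizon in which we liquidate the whole portfolio. The transaction costs we have to pay when using execution strategy $X$ then are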

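\[ C(X_0, N, X) := S_0 X_0 - R^X_T = \sum_{k=1}^{N} \left( -\sigma \tau^{1/2} \xi_k + \gamma n_k \right) X_k + \sum_{k=1}^{N} \tau f\!\left(\frac{n_k}{\tau}\right), \]

where we identify $\sum_{k=1}^{N} -\sigma \tau^{1/2} \xi_k X_k$ as the effect of volatility, $\sum_{k=1}^{N} \tau f\!\left(\frac{n_k}{\tau}\right)$ as the effect of the temporary impact, and

\[
\begin{aligned}
\sum_{k=1}^{N} \gamma n_k X_k &= \gamma \sum_{k=1}^{N} (X_{k-1} - X_k) X_k \\
&= \frac{\gamma}{2} \sum_{k=1}^{N} \left[ X_{k-1}^2 - X_k^2 - (X_k - X_{k-1})^2 \right] \\
&= \frac{\gamma}{2} X_0^2 - \frac{\gamma}{2} \sum_{k=1}^{N} n_k^2
\end{aligned}
\]

as the effect of the permanent impact. One could imagine models where the temporary impact carries some randomness. This case is discussed to some extent in [3]. Here, however, the only randomness in the transaction costs lies in the volatility term of the unaffected price process. One computes the expected transaction costs and their variance as

\[ E(X) := \mathbb{E}\left[ C(X_0, N, X) \right] = \frac{\gamma}{2} X_0^2 - \frac{\gamma}{2} \sum_{k=1}^{N} n_k^2 + \sum_{k=1}^{N} \tau f\!\left(\frac{n_k}{\tau}\right), \tag{1} \]

\[ V(X) := \operatorname{Var}\left[ C(X_0, N, X) \right] = \sum_{k=1}^{N} \sigma^2 \tau X_k^2 . \tag{2} \]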

Since the square function is strictly convex, so is V (X) on the set of all liquidating

strategies X. Under sensible choices for γ and f , the expected costs E(X) are also

strictly convex. This has to be checked for each particular choice, however. For the rest

of this section, we will assume that γ and f are such that E(X) is strictly convex.
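Equations (1) and (2) translate directly into two small functions of the holdings $(X_0, \ldots, X_N)$. The sketch below is illustrative only: the function names and the example values are assumptions, and the impact function `f` in the usage part is made up rather than taken from [1].

```python
def trades_from_holdings(holdings):
    """n_k = X_{k-1} - X_k for k = 1, ..., N."""
    return [holdings[k - 1] - holdings[k] for k in range(1, len(holdings))]


def expected_cost(holdings, tau, gamma, f):
    """Equation (1); assumes a liquidating strategy, i.e. holdings[-1] == 0."""
    x0 = holdings[0]
    trades = trades_from_holdings(holdings)
    permanent = 0.5 * gamma * x0 ** 2 - 0.5 * gamma * sum(n * n for n in trades)
    temporary = sum(tau * f(n / tau) for n in trades)
    return permanent + temporary


def cost_variance(holdings, tau, sigma2):
    """Equation (2): sum of sigma^2 * tau * X_k^2 over k = 1, ..., N."""
    return sum(sigma2 * tau * x * x for x in holdings[1:])


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    f = lambda v: 0.1 * abs(v) + 0.01 * v * v  # f(v) = v * h(v) for a made-up h
    instantaneous = [100.0, 0.0, 0.0, 0.0]
    even_split = [100.0, 200.0 / 3, 100.0 / 3, 0.0]
    for name, x in (("instantaneous", instantaneous), ("even split", even_split)):
        print(name, expected_cost(x, 1.0, 0.001, f), cost_variance(x, 1.0, 0.04))
```

Comparing the two printed lines shows the trade-off discussed in the text: selling everything at once has zero variance, while spreading the sale out changes the expected cost at the price of a positive variance.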

Note that the strategy of immediately selling all shares, that is $X_1 = X_2 = \ldots = X_N = 0$, is the unique minimiser of $V(X)$ since it yields a variance of 0. We call this strategy the instantaneous one and denote it by $X^{\mathrm{inst}}$. The expected (and deterministic) trading costs in this case are

\[ E(X^{\mathrm{inst}}) = \frac{\gamma}{2} X_0^2 - \frac{\gamma}{2} X_0^2 + \tau f\!\left(\frac{X_0}{\tau}\right) = \tau f\!\left(\frac{X_0}{\tau}\right). \]

So without taking any risk, we have to pay costs of $\tau f(X_0/\tau)$ for liquidating the portfolio. Strategies with higher expected costs are therefore typically considered bad choices, unless one is a risk-loving trader. Whether one is a risk-averse, risk-neutral, or risk-loving trader, however, one would always choose an execution strategy with lower expected costs over one with the same variance but higher expected costs. This leads to the concept of the efficient frontier, introduced by Almgren and Chriss in [1]. In practice, the purpose of portfolio liquidation is to make its value available in cash, not to speculate with it. So it is reasonable to assume a risk-averse trader who is trying to minimise expected costs for a given maximum level of variance. Therefore, we call a strategy $X$ efficient or optimal if it minimises expected costs for a given maximum level of variance $V_*$:

\[ \min_{X :\, V(X) \le V_*} E(X) \tag{3} \]

Due to the convexity of $V(X)$, the set $\{X : V(X) \le V_*\}$ is convex. It is also bounded since $V(X) = \sum_{k=1}^{N} \sigma^2 \tau X_k^2$. Hence there exists a unique minimiser $X^*$ of the strictly convex function $E(X)$ on this set. Since $X^*$ has variance $V(X^*)$, we can rewrite (3) by introducing a Lagrange multiplier $\lambda$:

\[ \min_{X, \lambda} \; E(X) + \lambda \left( V(X) - V(X^*) \right) \]

Note that $\lambda$ has to be non-negative by the equivalence to (3). If we wanted to solve this explicitly, we would have to know $V(X^*)$. Instead, we fix $\lambda$ now and determine the solution of

\[ \min_{X} \; E(X) + \lambda \left( V(X) - V(X^*) \right) \]

which is the same as the solution of

\[ \min_{X} \; E(X) + \lambda V(X), \tag{4} \]

where we got rid of the unknown constant $V(X^*)$ again. Since both $E(X)$ and $V(X)$ are strictly convex, we find a unique solution $X^*(\lambda)$ for each positive $\lambda$. As we vary $\lambda$ between 0 and $\infty$, we get the set of all efficient strategies $X^*$, for which we can determine the corresponding variances $V_*$ again. What we did here was to change the parametrising variable of the efficient frontier from $V_*$ to $\lambda$. The parameter $\lambda$ also has an economic interpretation, since we can identify equation (4) as the typical approach of mean-variance optimisation for the given level of risk aversion $\lambda$.

For some temporary impact functions $h$, the minimisation problem can be solved explicitly. In the next section, we will do this for linear impact.

2.2 The Linear Case

Throughout this section we will assume the temporary impact to be linear:

\[ h\!\left(\frac{n_k}{\tau}\right) = \varepsilon \, \mathrm{sgn}(n_k) + \frac{\eta}{\tau} n_k \]

for constants $\varepsilon, \eta > 0$. The term $\varepsilon \, \mathrm{sgn}(n_k)$ can be interpreted as transaction costs, consisting of fees and half the bid-ask spread. Note that empirical studies refute the assumption of linear impact and suggest instead a power law with an exponent of one half [4][7] or 3/5 [2, p. 20], or logarithmic behaviour [13, p. 6]. Solving the linear case is easy, however, and provides some insight into the behaviour of efficient strategies.

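Plugging the special form of $h$ into (1), we can compute the expected transaction costs for the linear model:

\[
\begin{aligned}
E(X) &= \frac{\gamma}{2} X_0^2 - \frac{\gamma}{2} \sum_{k=1}^{N} n_k^2 + \sum_{k=1}^{N} \tau f\!\left(\frac{n_k}{\tau}\right) \\
&= \frac{\gamma}{2} X_0^2 - \frac{\gamma}{2} \sum_{k=1}^{N} n_k^2 + \sum_{k=1}^{N} n_k \left( \varepsilon \, \mathrm{sgn}(n_k) + \frac{\eta}{\tau} n_k \right) \\
&= \frac{\gamma}{2} X_0^2 + \sum_{k=1}^{N} \left[ \varepsilon |n_k| + \left( \frac{\eta}{\tau} - \frac{\gamma}{2} \right) n_k^2 \right]
\end{aligned}
\]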

Note that for η/τ ≤ γ/2 we could make this expression as small as we like by first selling

a huge amount of shares and then buying them again (or vice versa). This is due to

the fact that the parameters γ and η would be such that the permanent impact drives

down (up) the prices in the future even more than the temporary impact drives them up

(down) for the current sale. This, of course, makes no sense for real markets and violates

all assumptions we made on the model. Thus, we assume η/τ > γ/2 in the following.

But then we see that one can never improve a strategy by intermediate buying, since this

drives up both the expected transaction costs E(X) and their variance V (X) from (2).

From now on, we therefore restrict ourselves to pure sell programmes without intermediate

buying. That is, |nk| = nk and we get that

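\[ E(X) = \frac{\gamma}{2} X_0^2 + \varepsilon X_0 + \sum_{k=1}^{N} \left( \frac{\eta}{\tau} - \frac{\gamma}{2} \right) n_k^2, \]

which is a strictly convex function on the set of strategies that liquidate $X_0$ shares in time $T$. In order to minimise $E(X)$, note that it is minimal if and only if $\sum_{k=1}^{N} n_k^2$ is. We can solve this constrained optimisation problem (recall that $\sum_{k=1}^{N} n_k = X_0$) by introducing a Lagrange multiplier, which we call $\tilde{\lambda}$ to avoid confusion with the risk aversion $\lambda$ from before:

\[ 0 = \frac{\partial}{\partial n_k} \left( \sum_{j=1}^{N} n_j^2 - \tilde{\lambda} \left( \sum_{j=1}^{N} n_j - X_0 \right) \right) = 2 n_k - \tilde{\lambda} \]

This yields $\tilde{\lambda} = 2 n_k$ for all $k$; in particular, all $n_k$ are equal. So it must hold that $n_k = X_0/N$. We call this strategy the linear one and denote it by $X^{\mathrm{lin}}$. Plugging it into our formula for $E(X)$ gives us the smallest possible expected costs of

\[ E_{\mathrm{lin}}(X_0, N) := E(X^{\mathrm{lin}}) = \frac{\gamma}{2} X_0^2 + \varepsilon X_0 + \sum_{k=1}^{N} \left( \frac{\eta}{\tau} - \frac{\gamma}{2} \right) \left( \frac{X_0}{N} \right)^2 = \frac{\gamma}{2} X_0^2 + \varepsilon X_0 + \left( \eta - \frac{\gamma \tau}{2} \right) \frac{X_0^2}{T}. \]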

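We can also compute the variance of this strategy:

\[
\begin{aligned}
V_{\mathrm{lin}}(X_0, N) := V(X^{\mathrm{lin}}) &= \sum_{k=1}^{N} \sigma^2 \tau X_k^2 = \sum_{k=1}^{N} \sigma^2 \tau \left( \frac{N - k}{N} X_0 \right)^2 = \sigma^2 \tau X_0^2 \sum_{\ell=0}^{N-1} \left( \frac{\ell}{N} \right)^2 \\
&= \sigma^2 \frac{T}{N} X_0^2 \, \frac{(N-1) N (2N-1)}{6 N^2} = \sigma^2 T X_0^2 \cdot \frac{1}{6} \left( 1 - \frac{1}{N} \right) \left( 2 - \frac{1}{N} \right)
\end{aligned}
\]

So for all $V_* \ge V_{\mathrm{lin}}(X_0, N) = \sigma^2 T X_0^2 \cdot \frac{1}{6} \left( 1 - \frac{1}{N} \right) \left( 2 - \frac{1}{N} \right)$ the strategy $X^{\mathrm{lin}}$ is the optimal one. It corresponds to the risk aversion $\lambda = 0$ since it minimises $E(X)$. Also, we already know that the optimal strategy for $\lambda = \infty$ is the instantaneous one, which yields a variance of 0 and therefore minimises $\lim_{\lambda \to \infty} E(X) + \lambda V(X)$. Its expected costs are

\[ E(X^{\mathrm{inst}}) = \varepsilon X_0 + \frac{\eta}{\tau} X_0^2 . \]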
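The closed forms for the linear and the instantaneous strategy can be cross-checked against the defining sums. The following sketch is illustrative and not part of the original essay; the function names are assumptions, and the parameter values are the ones this essay quotes from [1, Table 1] when discussing the figures.

```python
import math


def linear_strategy(x0, n_steps):
    """Holdings X_k = X_0 * (N - k) / N, i.e. equal trades n_k = X_0 / N."""
    return [x0 * (n_steps - k) / n_steps for k in range(n_steps + 1)]


def expected_cost(holdings, tau, gamma, eta, eps):
    """E(X) for linear temporary impact h(v) = eps * sgn(v) + eta * v."""
    x0 = holdings[0]
    trades = [a - b for a, b in zip(holdings, holdings[1:])]
    quad = eta / tau - 0.5 * gamma
    return 0.5 * gamma * x0 ** 2 + sum(eps * abs(n) + quad * n * n for n in trades)


def cost_variance(holdings, tau, sigma2):
    """V(X) = sum of sigma^2 * tau * X_k^2 over k = 1, ..., N."""
    return sum(sigma2 * tau * x * x for x in holdings[1:])


# Parameter values quoted in this essay from [1, Table 1].
x0, T, N = 1.0e6, 5.0, 5
tau = T / N
sigma2, eta, gamma, eps = 0.95, 2.5e-6, 2.5e-7, 0.0625

X_lin = linear_strategy(x0, N)
e_closed = 0.5 * gamma * x0 ** 2 + eps * x0 + (eta - 0.5 * gamma * tau) * x0 ** 2 / T
v_closed = sigma2 * T * x0 ** 2 * (1 - 1 / N) * (2 - 1 / N) / 6
e_inst = eps * x0 + eta / tau * x0 ** 2

if __name__ == "__main__":
    print(expected_cost(X_lin, tau, gamma, eta, eps), e_closed)
    print(cost_variance(X_lin, tau, sigma2), v_closed)
    print(expected_cost([x0] + [0.0] * N, tau, gamma, eta, eps), e_inst)
```

Each printed pair should agree up to floating-point rounding, which confirms that the sums and the closed-form expressions describe the same quantities for these inputs.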

We now want to compute the minimiser of U(X) := E(X)+λV (X) for general λ ∈ (0,∞)

for the case of linear temporary impact, as it was done in [1, p. 13f.].

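Proposition 2.1 In the Almgren model with linear temporary impact function, the unique minimiser of the mean-variance criterion with risk aversion $\lambda$ is given by the strategy $X = (X_0, \ldots, X_N)$ with

\[ X_j = X_0 \, \frac{\sinh(\kappa (T - t_j))}{\sinh(\kappa T)}, \]

where $\kappa$ is a solution of the equation

\[ 2 \left( \cosh(\kappa \tau) - 1 \right) \left( \frac{\eta}{\tau} - \frac{\gamma}{2} \right) = \lambda \sigma^2 \tau . \]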

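Proof. We have

\[ U(X) = E(X) + \lambda V(X) = \frac{\gamma}{2} X_0^2 + \varepsilon X_0 + \sum_{k=1}^{N} \left( \frac{\eta}{\tau} - \frac{\gamma}{2} \right) (X_{k-1} - X_k)^2 + \lambda \sum_{k=1}^{N} \sigma^2 \tau X_k^2 . \]

This yields, for $1 \le j \le N - 1$,

\[
\begin{aligned}
0 = \frac{\partial U}{\partial X_j} &= \left( \frac{\eta}{\tau} - \frac{\gamma}{2} \right) \left[ 2 (X_j - X_{j+1}) - 2 (X_{j-1} - X_j) \right] + 2 \lambda \sigma^2 \tau X_j \\
&= 2 \lambda \sigma^2 \tau X_j - 2 \left( \frac{\eta}{\tau} - \frac{\gamma}{2} \right) (X_{j-1} - 2 X_j + X_{j+1}),
\end{aligned}
\]

which is equivalent to

\[ X_{j-1} - 2 X_j + X_{j+1} = \frac{\lambda \sigma^2}{\eta - \frac{\gamma \tau}{2}} \, \tau^2 X_j = \tilde{\kappa}^2 \tau^2 X_j, \]

where $\tilde{\kappa}^2 := \lambda \sigma^2 / \left( \eta - \frac{\gamma \tau}{2} \right)$. Since $\tilde{\kappa}^2 \tau^2 > 0$, a solution to this difference equation must be unique, if it exists. We guess that the solution is of the form

\[ X_j = c_- e^{-\kappa t_j} + c_+ e^{\kappa t_j} = c_- e^{-\kappa j \tau} + c_+ e^{\kappa j \tau} . \]

Such a $\kappa$ must solve the difference equation, i. e.

\[
\begin{aligned}
0 &= X_{j-1} - 2 X_j + X_{j+1} - \tilde{\kappa}^2 \tau^2 X_j \\
&= c_- e^{-\kappa j \tau} \left( e^{\kappa \tau} - (2 + \tilde{\kappa}^2 \tau^2) + e^{-\kappa \tau} \right) + c_+ e^{\kappa j \tau} \left( e^{-\kappa \tau} - (2 + \tilde{\kappa}^2 \tau^2) + e^{\kappa \tau} \right) \\
&= \left( e^{-\kappa \tau} - (2 + \tilde{\kappa}^2 \tau^2) + e^{\kappa \tau} \right) X_j .
\end{aligned}
\]

Since $X_0 \ne 0$ it follows that $e^{-\kappa \tau} - (2 + \tilde{\kappa}^2 \tau^2) + e^{\kappa \tau} = 0$ and therefore

\[ 2 \cosh(\kappa \tau) - 2 = e^{\kappa \tau} + e^{-\kappa \tau} - 2 = \tilde{\kappa}^2 \tau^2 . \]

For positive $\tilde{\kappa}^2 \tau^2$ there exist exactly two solutions $\kappa$ to this equation, one positive and one negative, both with the same absolute value. This is not really surprising and comes from the symmetry of our ansatz. So choose $\kappa$ to be the positive solution of $2 \cosh(\kappa \tau) - 2 = \tilde{\kappa}^2 \tau^2$, say. We still have to determine the coefficients $c_\pm$ and do this by using the constraints on our solution at times 0 and $N$:

\[
\begin{aligned}
X_0 &= c_- e^{-\kappa \cdot 0} + c_+ e^{\kappa \cdot 0} = c_- + c_+ \\
0 = X_N &= c_- e^{-\kappa N \tau} + c_+ e^{\kappa N \tau}
\end{aligned}
\]

This yields

\[ c_- = \frac{X_0 e^{\kappa N \tau}}{e^{\kappa N \tau} - e^{-\kappa N \tau}} \quad \text{and} \quad c_+ = - \frac{X_0 e^{-\kappa N \tau}}{e^{\kappa N \tau} - e^{-\kappa N \tau}} . \]

Altogether we have:

\[
\begin{aligned}
X_j &= \frac{X_0 e^{\kappa N \tau}}{e^{\kappa N \tau} - e^{-\kappa N \tau}} \, e^{-\kappa j \tau} - \frac{X_0 e^{-\kappa N \tau}}{e^{\kappa N \tau} - e^{-\kappa N \tau}} \, e^{\kappa j \tau} \\
&= X_0 \, \frac{\sinh(\kappa (N - j) \tau)}{\sinh(\kappa N \tau)} \\
&= X_0 \, \frac{\sinh(\kappa (T - t_j))}{\sinh(\kappa T)}
\end{aligned}
\]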
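The proposition can be turned into a few lines of code that compute $\kappa$ and the optimal holdings and then check the difference equation numerically. This is a sketch under the assumptions of this section (linear temporary impact and $\eta/\tau > \gamma/2$); the function names are hypothetical, and the case $\lambda = 0$ is handled separately by the limit $\kappa \to 0$, which gives the linear strategy.

```python
import math


def kappa(lam, sigma2, eta, gamma, tau):
    """Non-negative root of 2 * (cosh(kappa * tau) - 1) = kappa_tilde^2 * tau^2."""
    kt2 = lam * sigma2 * tau ** 2 / (eta - 0.5 * gamma * tau)  # kappa_tilde^2 * tau^2
    return math.acosh(1.0 + 0.5 * kt2) / tau


def optimal_holdings(x0, T, n_steps, lam, sigma2, eta, gamma):
    """Holdings X_j = X_0 * sinh(kappa * (T - t_j)) / sinh(kappa * T)."""
    tau = T / n_steps
    k = kappa(lam, sigma2, eta, gamma, tau)
    if k == 0.0:
        # lam = 0: the limit kappa -> 0 of the formula is the linear strategy.
        return [x0 * (n_steps - j) / n_steps for j in range(n_steps + 1)]
    return [
        x0 * math.sinh(k * (T - j * tau)) / math.sinh(k * T)
        for j in range(n_steps + 1)
    ]


if __name__ == "__main__":
    # Parameter values quoted in this essay from [1, Table 1].
    for lam in (0.0, 5e-7, 3e-6):
        x = optimal_holdings(1e6, 5.0, 5, lam, 0.95, 2.5e-6, 2.5e-7)
        print(lam, [round(v) for v in x])
```

Larger values of $\lambda$ should produce trajectories that fall more steeply at the beginning, in line with the interpretation of $\lambda$ as risk aversion.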

From the uniqueness of the solution, one can see a very important feature of optimal

trading strategies, namely that they are time-homogeneous. That is, re-evaluation of the

strategy at later times tk always yields the strategy obtained at time 0. The only thing

that changes is the start value Xk which replaces the start value X0. But at time tk the

difference equation stays the same and is uniquely determined by its boundary values Xk

and XN = 0. This statement can also be generalised to other impact functions h, as long

as the corresponding utility function U is quadratic. [1, p. 19]

In figure 1 you can see the trajectories of the optimal strategies for the values $\lambda_1 = 0$, $\lambda_2 = 5 \cdot 10^{-7}$, $\lambda_3 = 3 \cdot 10^{-6}$, and $\lambda_4 = \infty$. The underlying values for the parameters are $X_0 = 10^6$, $T = 5\,\mathrm{d}$, $\tau = 1\,\mathrm{d}$, $N = 5$, $\sigma^2 = 0.95$, $\eta = 2.5 \cdot 10^{-6}$, and $\gamma = 2.5 \cdot 10^{-7}$. All those values are adopted from [1, Table 1] and describe a typical situation when liquidating a portfolio of one million shares within 5 days and being allowed to trade once a day. In figure 2, one can see the efficient frontier for the linear case. The parameters are the same as for figure 1. In addition, $\varepsilon$ is chosen to be $\varepsilon = 0.0625$, which comes from [1, Table 1] as well.

Note the flatness of the curve near $\lambda_1$. By accepting only slightly higher expected costs (e. g. 17% for $\lambda_2$), one can enormously reduce the variance (e. g. 51% for $\lambda_2$). Further reduction of the variance is accompanied by strongly increasing costs, however.
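Points on the efficient frontier can be recomputed from the closed-form strategy of Proposition 2.1. The sketch below is illustrative: the function names are assumptions, the parameters are the values quoted above from [1, Table 1], and the code relies on the pure sell assumption $|n_k| = n_k$ made in section 2.2.

```python
import math

# Parameter values quoted in the essay from [1, Table 1].
X0, T, N = 1.0e6, 5.0, 5
TAU = T / N
SIGMA2, ETA, GAMMA, EPS = 0.95, 2.5e-6, 2.5e-7, 0.0625


def optimal_holdings(lam):
    """Optimal static holdings for risk aversion lam (linear temporary impact)."""
    kt2 = lam * SIGMA2 * TAU ** 2 / (ETA - 0.5 * GAMMA * TAU)
    k = math.acosh(1.0 + 0.5 * kt2) / TAU
    if k == 0.0:  # lam = 0 gives the linear strategy
        return [X0 * (N - j) / N for j in range(N + 1)]
    return [X0 * math.sinh(k * (T - j * TAU)) / math.sinh(k * T) for j in range(N + 1)]


def frontier_point(lam):
    """Return (E(X), V(X)) of the optimal strategy for risk aversion lam."""
    x = optimal_holdings(lam)
    trades = [a - b for a, b in zip(x, x[1:])]
    quad = ETA / TAU - 0.5 * GAMMA
    e = 0.5 * GAMMA * X0 ** 2 + EPS * X0 + sum(quad * n * n for n in trades)
    v = sum(SIGMA2 * TAU * xk * xk for xk in x[1:])
    return e, v


if __name__ == "__main__":
    e_lin, v_lin = frontier_point(0.0)
    for lam in (0.0, 5e-7, 3e-6):
        e, v = frontier_point(lam)
        print(f"lambda={lam:g}  E/E_lin={e / e_lin:.3f}  V/V_lin={v / v_lin:.3f}")
```

The printed ratios relative to the linear strategy can be compared with the percentages quoted in the text for $\lambda_2$.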

Figure 1: Optimal Trading Trajectories for Different Levels of Risk Aversion

Figure 2: The Efficient Frontier

3 Adaptive Strategies

Up to now, we have only considered static trading strategies, and we remarked that re-evaluation in the middle of trading has no effect on further trades. But as mentioned in the beginning, dynamic strategies (fixed at the initial time for all possible outcomes) are possible as well. In the first part of this section, we will construct an optimal adaptive strategy which strictly improves one’s mean-variance criterion, measured at the initial time $t = 0$, compared with the optimal static strategy obtained in section 2.2. Here, we will follow Lorenz and Almgren in [10, p. 11-16]. In the second part, we discuss the situation for an investor with constant absolute risk aversion (CARA). We will show that in this case there is no gain from permitting adaptive strategies. For that, we will follow Schied, Schöneborn, and Tehranchi in [15, p. 3-16].

3.1 Mean-Variance Optimal Strategies

For this section, we keep the setting from the linear case and we want to allow previsible

strategies as well now.

We want to obtain optimal previsible strategies by using dynamic programming, i. e. we will derive optimal dynamic strategies recursively. Before we can formulate our results, however, we first have to introduce some new notation. Let

\[
\mathcal{D}(X_0, N) = \left\{ (X, C) \;\middle|\;
\begin{array}{l}
X = (X_0, X_1, \ldots, X_N) \text{ with } X_N = 0 \\
X \text{ is previsible} \\
X_0 \ge X_1 \ge \ldots \ge X_N \\
C(X_0, N, X) \le C \text{ a.s.}
\end{array}
\right\}.
\]

This denotes the set of all strategies which liquidate $X_0$ shares in $N$ steps. In addition, with each strategy in this set we get an upper bound $C$ for the trading costs, which is itself an $\mathcal{F}_N$-measurable random variable; $C(X_0, N, X)$, for example, will always do the job. Sometimes, however, it can be favourable to deliberately increase trading costs if this reduces the mean-variance criterion. This inconvenience is due to the fact that variance is not monotone and penalises upward deviations in the same way as downward deviations. That is, it can happen that increasing trading costs decreases the variance so much that the mean-variance criterion in total goes down. Now define for a given level $E \in \mathbb{R}$ the set

\[ \mathcal{A}(X_0, N, E) = \left\{ (X, C) \in \mathcal{D}(X_0, N) \;\middle|\; \mathbb{E}[C] \le E \right\} \]

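which describes the set of strategies whose expected costs are bounded from above by the constant $E$. Note that this set is empty if $\mathbb{E}[C(X_0, N, X)] > E$ for all possible execution strategies $X$. In section 2.2 we have shown that the linear strategy $X^{\mathrm{lin}}$ minimises expected trading costs on the set of static strategies. The result stays the same even if we allow previsible strategies: Using the previsibility of $X$ and Jensen’s inequality we get

\[
\begin{aligned}
\mathbb{E}\left[ C(X_0, N, X) \right] &= \mathbb{E}\left[ \frac{\gamma}{2} X_0^2 + \sum_{k=1}^{N} \left( -\sigma \tau^{1/2} \xi_k \right) X_k + \varepsilon X_0 + \sum_{k=1}^{N} \left( \frac{\eta}{\tau} - \frac{\gamma}{2} \right) n_k^2 \right] \\
&= \frac{\gamma}{2} X_0^2 + \sum_{k=1}^{N} \left( -\sigma \tau^{1/2} \right) \mathbb{E}\left[ \mathbb{E}\left[ \xi_k \mid \mathcal{F}_{k-1} \right] X_k \right] + \varepsilon X_0 + \sum_{k=1}^{N} \left( \frac{\eta}{\tau} - \frac{\gamma}{2} \right) \mathbb{E}\left[ n_k^2 \right] \\
&= \frac{\gamma}{2} X_0^2 + \varepsilon X_0 + \sum_{k=1}^{N} \left( \frac{\eta}{\tau} - \frac{\gamma}{2} \right) \mathbb{E}\left[ n_k^2 \right] \\
&\ge \frac{\gamma}{2} X_0^2 + \varepsilon X_0 + \sum_{k=1}^{N} \left( \frac{\eta}{\tau} - \frac{\gamma}{2} \right) \mathbb{E}\left[ n_k \right]^2 ,
\end{aligned}
\]

and as in section 2.2 this becomes minimal only for $\mathbb{E}[n_k] = X_0/N$. Since the above inequality is an equality if and only if every $n_k$ is almost surely constant, we get the unique minimising strategy $n_k = X_0/N$, which is the linear strategy $X^{\mathrm{lin}}$. Using this, we get that $\mathcal{A}(X_0, N, E)$ is empty if and only if $E < E(X^{\mathrm{lin}})$.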

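We can also describe the efficient frontier from section 2.1 using the sets $\mathcal{A}(X_0, N, E)$:

\[ V_{\min}(E) = \inf \left\{ \operatorname{Var}(C) \;\middle|\; (X, C) \in \mathcal{A}(X_0, N, E) \right\} \]

For $E < E(X^{\mathrm{lin}})$ this is $V_{\min}(E) = \infty$. For $E \ge E(X^{\mathrm{lin}})$ we get $V_{\min}(E) \le V(X^{\mathrm{lin}}) = \sigma^2 T X_0^2 \cdot \frac{1}{6} \left( 1 - \frac{1}{N} \right) \left( 2 - \frac{1}{N} \right)$, and for $E \ge E(X^{\mathrm{inst}})$ it holds that $V_{\min}(E) = 0$, since the strategy of instantaneously selling all shares at time $t = 0$ has deterministic costs $E(X^{\mathrm{inst}})$ without any variance. Also, we define the set of all efficient strategies as

\[ \mathcal{E}(X_0, N) = \left\{ (X, C) \;\middle|\; \operatorname{Var}(\tilde{C}) \ge \operatorname{Var}(C) \text{ for all } (\tilde{X}, \tilde{C}) \in \mathcal{A}(X_0, N, \mathbb{E}[C]) \right\}. \]

That is, a strategy is called efficient if there is no strategy with the same or lower expected costs but lower variance.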

We want to show that an efficient strategy for $N$ steps is also efficient at each intermediate step. For that, we denote the tail of a trading strategy $(X, C) \in \mathcal{D}(X_0, N)$, with $X = (X_0, X_1, \ldots, X_{N-1}, 0)$, by $(X, C)_{\xi_1} \in \mathcal{D}(X_1, N - 1)$, where the subscript $\xi_1$ indicates that the strategy is conditioned on the outcome of $\xi_1$, since all random variables $X_2, \ldots, X_N$ can make use of information about this outcome.

In the linear case, the trading costs are given by

\[ C(X_0, N, X) = \frac{\gamma}{2} X_0^2 + \sum_{k=1}^{N} \left( \frac{\eta}{\tau} - \frac{\gamma}{2} \right) n_k^2 + \varepsilon X_0 - \sum_{k=1}^{N} \sigma \tau^{1/2} \xi_k X_k . \]

Since $(\gamma/2) X_0^2 + \varepsilon X_0$ is independent of the trading strategy, we will drop it in the following and assume

\[ C(X_0, N, X) = \sum_{k=1}^{N} \left( \frac{\eta}{\tau} - \frac{\gamma}{2} \right) n_k^2 - \sum_{k=1}^{N} \sigma \tau^{1/2} \xi_k X_k . \]

Then we have that

\[ C = C_{\xi_1} + \left( \frac{\eta}{\tau} - \frac{\gamma}{2} \right) (X_0 - X_1)^2 - \sigma \tau^{1/2} \xi_1 X_1 , \]

where $C_{\xi_1}$ describes the cost bound of the tail strategy $(X, C)_{\xi_1}$. In the same manner, we also denote the trade schedule of the tail strategy by $X_{\xi_1}$, which should not be confused with the asset holdings $X_k$ of strategy $X$. It will always be clear which of them is meant in the following. With all the introduced notation, we can now formulate the following lemma from [10, p. 12]. Note that we have added some more details to the proof from [10, p. 12f.] here, but the idea is the same.

Lemma 3.1 For $N \ge 2$, let $(X, C) \in \mathcal{E}(X_0, N)$ be an efficient execution strategy with $X = (X_0, X_1, \ldots, X_{N-1}, 0)$. Then almost surely it holds that $(X, C)_{\xi_1} \in \mathcal{E}(X_1, N - 1)$, i. e. $B = \left\{ (X, C)_{\xi_1} \notin \mathcal{E}(X_1, N - 1) \right\} \subseteq \Omega$ has probability zero.

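Proof. If $B = \emptyset$, then we are done. Suppose now $B \ne \emptyset$. Then by definition, on $B$ the strategy $(X, C)_{\xi_1}$ is not efficient and there exists a strategy $(X^*_{\xi_1}, C^*_{\xi_1}) \in \mathcal{D}(X_1, N - 1)$ on $B$ such that $\operatorname{Var}\left( C^*_{\xi_1} \mid \xi_1 \right) < \operatorname{Var}\left( C_{\xi_1} \mid \xi_1 \right)$ while $\mathbb{E}\left[ C^*_{\xi_1} \mid \xi_1 \right] \le \mathbb{E}\left[ C_{\xi_1} \mid \xi_1 \right]$. Without loss of generality, we can assume equality here by adding the $\sigma(\xi_1)$-measurable term $\mathbb{E}\left[ C_{\xi_1} \mid \xi_1 \right] - \mathbb{E}\left[ C^*_{\xi_1} \mid \xi_1 \right]$, which has no effect on the conditional variance.

Define now a new strategy $(\tilde{X}, \tilde{C})$ by replacing the trade schedules at times $t_2, \ldots, t_N$ and the cost bound in $(X, C)$ by $(X^*_{\xi_1}, C^*_{\xi_1})$ on $B$ and keeping the original strategy on $B^c$. Since then $(\tilde{X}_{\xi_1}, \tilde{C}_{\xi_1}) \in \mathcal{D}(X_1, N - 1)$ on the whole of $\Omega$, we have that

\[ \tilde{C}(\omega) = \tilde{C}_{\xi_1(\omega)}(\omega) + g(\xi_1(\omega)) \ge C(X_1, N - 1, \tilde{X}_{\xi_1})(\omega) + g(\xi_1(\omega)) = C(X_0, N, \tilde{X})(\omega), \]

where we use the abbreviation $g(\xi_1) := \left( \frac{\eta}{\tau} - \frac{\gamma}{2} \right) (X_0 - X_1)^2 - \sigma \tau^{1/2} \xi_1 X_1$, which is a $\sigma(\xi_1)$-measurable expression. Therefore $(\tilde{X}, \tilde{C}) \in \mathcal{D}(X_0, N)$. Also, from the fact that $\mathbb{E}\left[ C^*_{\xi_1} \mid \xi_1 \right] = \mathbb{E}\left[ C_{\xi_1} \mid \xi_1 \right]$ on $B$, it follows that

\[
\begin{aligned}
\mathbb{E}[\tilde{C}] = \mathbb{E}\left[ \mathbb{E}[\tilde{C} \mid \xi_1] \right] &= \mathbb{E}\left[ \mathbb{E}\left[ C^*_{\xi_1} \mid \xi_1 \right] 1_B + \mathbb{E}\left[ C_{\xi_1} \mid \xi_1 \right] 1_{B^c} + g(\xi_1) \right] \\
&= \mathbb{E}\left[ \mathbb{E}\left[ C_{\xi_1} \mid \xi_1 \right] + g(\xi_1) \right] = \mathbb{E}[C]
\end{aligned}
\]

and

\[
\begin{aligned}
\operatorname{Var}(\tilde{C}) &= \mathbb{E}\left[ \operatorname{Var}(\tilde{C} \mid \xi_1) \right] + \operatorname{Var}\left( \mathbb{E}[\tilde{C} \mid \xi_1] \right) \\
&= \mathbb{E}\left[ \operatorname{Var}\left( \tilde{C}_{\xi_1} + g(\xi_1) \mid \xi_1 \right) \right] + \operatorname{Var}\left( \mathbb{E}\left[ \tilde{C}_{\xi_1} + g(\xi_1) \mid \xi_1 \right] \right) \\
&= \mathbb{E}\left[ \operatorname{Var}\left( \tilde{C}_{\xi_1} \mid \xi_1 \right) \right] + \operatorname{Var}\left( \mathbb{E}\left[ \tilde{C}_{\xi_1} \mid \xi_1 \right] + g(\xi_1) \right) \\
&= \mathbb{E}\left[ \operatorname{Var}\left( C^*_{\xi_1} \mid \xi_1 \right) 1_B + \operatorname{Var}\left( C_{\xi_1} \mid \xi_1 \right) 1_{B^c} \right] + \operatorname{Var}\left( \mathbb{E}\left[ C_{\xi_1} \mid \xi_1 \right] + g(\xi_1) \right).
\end{aligned}
\]

If $\mathbb{P}(B) > 0$, we could then conclude

\[
\begin{aligned}
\operatorname{Var}(\tilde{C}) &< \mathbb{E}\left[ \operatorname{Var}\left( C_{\xi_1} \mid \xi_1 \right) 1_B + \operatorname{Var}\left( C_{\xi_1} \mid \xi_1 \right) 1_{B^c} \right] + \operatorname{Var}\left( \mathbb{E}\left[ C_{\xi_1} + g(\xi_1) \mid \xi_1 \right] \right) \\
&= \mathbb{E}\left[ \operatorname{Var}\left( C_{\xi_1} \mid \xi_1 \right) \right] + \operatorname{Var}\left( \mathbb{E}[C \mid \xi_1] \right) \\
&= \mathbb{E}\left[ \operatorname{Var}\left( C_{\xi_1} + g(\xi_1) \mid \xi_1 \right) \right] + \operatorname{Var}\left( \mathbb{E}[C \mid \xi_1] \right) \\
&= \mathbb{E}\left[ \operatorname{Var}(C \mid \xi_1) \right] + \operatorname{Var}\left( \mathbb{E}[C \mid \xi_1] \right) = \operatorname{Var}(C),
\end{aligned}
\]

which would contradict our assumption of $(X, C) \in \mathcal{E}(X_0, N)$.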

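We now extend our previous definition of $V_{\min}$ to shorter intervals. For $1 \le k \le N$ and $x \ge 0$ let

\[ J_k(x, c) = \inf \left\{ \operatorname{Var}(C) \;\middle|\; (X, C) \in \mathcal{A}(x, k, c) \right\}. \]

Then clearly, $V_{\min}(E) = J_N(X_0, E)$. Also we get the following properties:

\[
J_k(x, c) =
\begin{cases}
\infty, & c < \left( \frac{\eta}{\tau} - \frac{\gamma}{2} \right) \frac{x^2}{k} \\
V_{\mathrm{lin}}(x, k), & c = \left( \frac{\eta}{\tau} - \frac{\gamma}{2} \right) \frac{x^2}{k} \\
\text{non-increasing in } c, & \left( \frac{\eta}{\tau} - \frac{\gamma}{2} \right) \frac{x^2}{k} \le c \le \left( \frac{\eta}{\tau} - \frac{\gamma}{2} \right) x^2 \\
0, & c \ge \left( \frac{\eta}{\tau} - \frac{\gamma}{2} \right) x^2
\end{cases}
\]

This is due to the fact that the linear liquidation minimises expected costs, with variance $V_{\mathrm{lin}}(x, k)$ and costs $\left( \frac{\eta}{\tau} - \frac{\gamma}{2} \right) \frac{x^2}{k}$. For $c \ge \left( \frac{\eta}{\tau} - \frac{\gamma}{2} \right) x^2$ we can choose the instantaneous liquidation, which yields variance 0. In between, since we are always minimising, the variance can only be non-increasing. For $k = 1$ linear and instantaneous liquidation coincide, and hence

\[
J_1(x, c) =
\begin{cases}
\infty, & c < \left( \frac{\eta}{\tau} - \frac{\gamma}{2} \right) x^2 \\
0, & c \ge \left( \frac{\eta}{\tau} - \frac{\gamma}{2} \right) x^2 .
\end{cases}
\]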

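The following relations between $J_k(x, c)$ and $\mathcal{E}(x, k)$ obviously hold:

\[ (X^*, C^*) = \operatorname*{argmin}_{(X, C) \in \mathcal{A}(x, k, c)} \operatorname{Var}(C) \;\Rightarrow\; (X^*, C^*) \in \mathcal{E}(x, k), \tag{5} \]

\[ (X, C) \in \mathcal{E}(x, k) \;\Rightarrow\; \operatorname{Var}(C) = J_k(x, \mathbb{E}[C]) \tag{6} \]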

We can now formulate the main statement of this section. It helps to derive optimal previsible strategies by a recursive scheme which minimises some value function in each trading period. In each step we give ourselves two control parameters, the number of shares $y$ to keep in the portfolio at the end of the period and the cost limit function $z(\xi)$. When following an optimal strategy, we commit ourselves to selling the remaining $y$ shares using an efficient strategy with cost bound $z(\xi)$ dependent on the price change $\sigma \tau^{1/2} \xi$. The theorem and its proof come from [10, p. 14f.].

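Theorem 3.2 Let the stock price change in the next trading period be $\sigma \tau^{1/2} \xi$. Define

\[ G_k(x, c) = \left\{ (y, z) \in \mathbb{R} \times L^1(\Omega, \mathbb{R}) \;\middle|\; \mathbb{E}[z(\xi)] + \left( \frac{\eta}{\tau} - \frac{\gamma}{2} \right) (x - y)^2 \le c, \; 0 \le y \le x \right\}. \]

Then for $k \ge 2$,

\[ J_k(x, c) = \min_{(y, z) \in G_k(x, c)} \left( \operatorname{Var}\left( z(\xi) - \sigma \tau^{1/2} \xi y \right) + \mathbb{E}\left[ J_{k-1}(y, z(\xi)) \right] \right). \]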

Proof. For given $x \ge 0$ and $c \ge E_{\mathrm{lin}}(x,k)$, let

$$(X^*, C^*) = \operatorname*{argmin}_{(X,C)\in\mathcal{A}(x,k,c)} \operatorname{Var}(C).$$

That means that $X^* = (x, y, \ldots)$ is an optimal strategy for selling $x$ shares in $k$ trading periods with expected costs not exceeding $c$. By (5) this implies $(X^*, C^*) \in \mathcal{E}(x,k)$ and further $\operatorname{Var}(C^*) = J_k(x, \mathbb{E}[C^*])$ by (6). Contained in this strategy $X^*$ we identify both the number of shares to be sold in the first period, $x - y$, and the strategy of selling the remaining $y$ shares in the remaining $k-1$ periods. We denote this tail strategy, which may depend on the outcome of $\xi$, by $(X^*, C^*)_\xi$.

By lemma 3.1, we know that $(X^*, C^*)_\xi \in \mathcal{E}(y, k-1)$ almost surely. Writing $z(\xi)$ for $\mathbb{E}[C^*_\xi]$, this implies

$$\operatorname{Var}(C^*_\xi) = J_{k-1}(y, z(\xi)),$$

again by (5) and (6). Also, since minimal expected costs are achieved by the linear trading strategy and since $(X^*, C^*)_\xi \in \mathcal{E}(y, k-1)$, it must hold that

$$z(\xi) \ge E_{\mathrm{lin}}(y, k-1).$$

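We can then write

$$\mathbb{E}[C^* \mid \xi] = z(\xi) + \left(\frac{\eta}{\tau} - \frac{\gamma}{2}\right)(x-y)^2 - \sigma\tau^{1/2}\xi y,$$

$$\operatorname{Var}(C^* \mid \xi) = J_{k-1}(y, z(\xi)),$$

and derive, using the law of total variance for the second identity,

$$\mathbb{E}[C^*] = \mathbb{E}[z(\xi)] + \left(\frac{\eta}{\tau} - \frac{\gamma}{2}\right)(x-y)^2,$$

$$\operatorname{Var}(C^*) = \operatorname{Var}\big(z(\xi) - \sigma\tau^{1/2}\xi y\big) + \mathbb{E}\big[J_{k-1}(y, z(\xi))\big].$$

Since $(X^*, C^*)$ was chosen to be the minimiser of $\operatorname{Var}(C)$ over the set $\mathcal{A}(x,k,c)$, we can equivalently minimise

$$\operatorname{Var}(C^*) = \operatorname{Var}\big(z(\xi) - \sigma\tau^{1/2}\xi y\big) + \mathbb{E}\big[J_{k-1}(y, z(\xi))\big]$$

over all $(z(\xi), y)$ constrained to

$$\mathbb{E}[z(\xi)] + \left(\frac{\eta}{\tau} - \frac{\gamma}{2}\right)(x-y)^2 \le c, \qquad 0 \le y \le x, \qquad z(\xi) \ge E_{\mathrm{lin}}(y, k-1).$$

Since $J_{k-1}(y, z(\xi))$ becomes $\infty$ for $z(\xi) < E_{\mathrm{lin}}(y, k-1)$, such a pair can never minimise the expression above and we can drop the last constraint.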

Using the theorem, we can then describe optimal solutions recursively: We start with the original problem

$$V_{\min}(E) = \min_{(X,C)\in\mathcal{A}(X_0,N,E)} \operatorname{Var}(C).$$

By the theorem, we get that

$$V_{\min}(E) = J_N(X_0, E) = \min_{(y,z)\in G_N(X_0,E)} \Big( \operatorname{Var}\big(z(\xi) - \sigma\tau^{1/2}\xi y\big) + \mathbb{E}\big[J_{N-1}(y, z(\xi))\big] \Big),$$

which we can solve if $J_{N-1}(y, z(\xi))$ is known for all $(y,z) \in G_N(X_0, E)$. We can repeat this minimisation down to $J_1$, where the instantaneous execution strategy is always the optimal (and only) one. Plugging in backwards yields all minimal values and their minimisers.

Let $k \ge 2$ and $(y, z(\xi))$ be the minimiser in

$$J_k(x,c) = \min_{(y,z)\in G_k(x,c)} \Big( \operatorname{Var}\big(z(\xi) - \sigma\tau^{1/2}\xi y\big) + \mathbb{E}\big[J_{k-1}(y, z(\xi))\big] \Big),$$

and $\big(X^*_{k-1}(y, z(\xi)), C^*_{k-1}(y, z(\xi))\big)$ the minimiser in

$$J_{k-1}(y, z(\xi)) = \min_{(X,C)\in\mathcal{A}(y,k-1,z(\xi))} \operatorname{Var}(C).$$

Then recursively,

$$X^*_k(x,c) = \big(x, X^*_{k-1}(y, z(\xi))\big),$$

$$C^*_k(x,c) = C^*_{k-1}(y, z(\xi)) + \left(\frac{\eta}{\tau} - \frac{\gamma}{2}\right)(x-y)^2 - \sigma\tau^{1/2}\xi y,$$

where $X^*_1(x,c) = (x, 0)$ and $C^*_1(x,c) = \max\left\{\left(\frac{\eta}{\tau} - \frac{\gamma}{2}\right)x^2,\ c\right\}$. Combining the strategies for all steps yields a (in general previsible) strategy $X^*$ and a cost bound $C^* \ge C(X_0, N, X^*)$ which solve the original problem of minimising $\operatorname{Var}(C)$ with expected costs bounded by some constant $E$. Note that, with our cost bound in the last step, we allow ourselves to give away some money if $c$ is greater than the actually incurred costs $\left(\frac{\eta}{\tau} - \frac{\gamma}{2}\right)x^2$. As mentioned before, this is sensible in the case where lower costs increase the variance of the costs. By construction of our cost bounds we will never have total expected costs higher than $E$, even when giving away money in the last step.

Figure 3: Adaptive Trajectories [10, p. 26]

It is not immediately clear that this optimisation process will not yield the same static execution strategy as proposition 2.1, regardless of the additional information in each period. Numerical simulations conducted by Lorenz and Almgren in [10], however, show that this is not the case. Figure 3 shows the optimal static solution (drawn in black) for some point on the static efficient frontier, together with simulated optimal adaptive strategies in two rather extreme cases of price movements of the asset (red and blue). Besides the fact that the adaptive trajectories do not coincide with the static one, the figure also shows that adaptive strategies are aggressive in-the-money, which means that we increase our trading speed when the asset price rises and decrease it when the asset price falls. This is because when the asset price rises we can spend the gains we make on higher impact costs to decrease future variance due to both market volatility and the unexpected gains. If the asset price falls, we decrease future trading costs by trading more slowly in order to compensate for the losses due to the price fall. This argument, of course, only makes sense if price movements are uncorrelated, as is the case in our model, or even negatively correlated. For a positive correlation, it would be reasonable to keep holdings while prices are in an upward trend and to get rid of them when there is a downward trend.

3.2 CARA Investors

In the previous sections, we have discussed optimal trading strategies with respect to the mean-variance criterion and, in section 3.1, we have seen that permitting adaptive strategies can strictly improve the result of our optimisation. In this section, we follow Schied, Schöneborn, and Tehranchi [15], who instead consider optimisation under the utility function $u(r) = -\exp(-\alpha r)$, which is called the exponential or CARA (constant absolute risk aversion) utility function. It has the advantage that an optimal solution is time consistent and does not depend on the investor's initial wealth. Moreover, for a CARA utility function, optimal trading strategies turn out to be deterministic and cannot, unlike in the mean-variance case, be improved by allowing information-dependent trading. This will be the main result of this section. From now on we adapt our setting slightly and change to a continuous-time model. Also, we consider a multi-asset market with drift.

We assume that our probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is equipped with a filtration $(\mathcal{F}_t)_{t\ge 0}$ satisfying the usual conditions. For the price process, we consider a Bachelier model, which is the continuous-time analogue of the previously considered model. That is, the price processes are given by

$$S^i_t = S^i_0 + \sum_{j=1}^m \sigma^{ij} B^j_t + b^i t, \qquad i = 1, \ldots, d,$$

where $S_0 \in \mathbb{R}^d$ is the initial price vector, $B$ is an $m$-dimensional Brownian motion adapted to the filtration $(\mathcal{F}_t)_{t\ge 0}$ and starting at $B_0 = 0$, $\sigma \in \mathbb{R}^{d\times m}$ is the volatility matrix, and $b \in \mathbb{R}^d$ is the drift vector. To rule out arbitrage possibilities in the unperturbed market, we assume that $b \perp \ker\Sigma$, where $\Sigma := \sigma\sigma^\top$ is the covariance matrix. Otherwise one could follow a constant strategy $H \in \ker\Sigma = \ker\sigma^\top$ with $b^\top H > 0$. Then

$$d(H \cdot S_t) = H \cdot dS_t = H \cdot (\sigma\, dB_t + b\, dt) = (\sigma^\top H)^\top dB_t + b^\top H\, dt = b^\top H\, dt,$$

and $H$ would be an arbitrage possibility.
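The role of the condition $b \perp \ker\Sigma$ can be seen in a toy market. The sketch below uses a hypothetical example with $d = 2$ assets driven by $m = 1$ Brownian motion; the matrices, vectors and helper names are illustrative assumptions, not part of the model in [15].

```python
def sigma_transpose_times(sigma, h):
    """Return sigma^T h for a d x m matrix sigma (list of rows) and h in R^d."""
    d, m = len(sigma), len(sigma[0])
    return [sum(sigma[i][j] * h[i] for i in range(d)) for j in range(m)]

def dot(a, b):
    """Euclidean inner product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))

sigma = [[1.0], [1.0]]   # both assets load on the same Brownian motion
H = [1.0, -1.0]          # spans ker(sigma^T) = ker(Sigma)

b_violating = [0.25, 0.0]    # not orthogonal to H
b_admissible = [0.25, 0.25]  # orthogonal to H

noise_exposure = sigma_transpose_times(sigma, H)  # diffusion part of d(H . S_t)
drift_violating = dot(b_violating, H)             # riskless drift b^T H
drift_admissible = dot(b_admissible, H)
```

The portfolio `H` carries no Brownian risk, so a non-zero `drift_violating` is a riskless gain, whereas the admissible drift vector leaves nothing to exploit.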

The trading strategies $(X_t)_{t\in[0,T]}$ are continuous-time processes adapted to the filtration $(\mathcal{F}_t)_{t\ge 0}$ and we assume them to be absolutely continuous. This expresses that we cannot trade a positive number of shares in an infinitesimal period of time. It ensures that the derivative $\dot{X}_t$ exists almost everywhere and is an $L^1$-function. Namely, if $\varepsilon > 0$ is given, by the absolute continuity of $X$ we find a $\delta > 0$ such that for all intervals $I_1 = (s_1, t_1), \ldots, I_m = (s_m, t_m)$ with $0 \le s_1 \le t_1 \le s_2 \le t_2 \le \ldots \le s_m \le t_m \le T$ and $\sum_{k=1}^m |t_k - s_k| < \delta$, it follows that $\sum_{k=1}^m |X_{t_k} - X_{s_k}| < \varepsilon$. But then for each coordinate $X^i$ of $X$ we have

$$\varepsilon > \sum_{k=1}^m |X_{t_k} - X_{s_k}| \ge \sum_{k=1}^m |X^i_{t_k} - X^i_{s_k}|,$$

and hence $X^i$ is absolutely continuous. Thus, we know that the derivatives $\dot{X}^i$ exist almost everywhere and are in $L^1 = L^1([0,T], \mathcal{B}([0,T]), \mathrm{Leb})$. The same holds for $\dot{X}$, which is given by $\dot{X}_t = (\dot{X}^1_t, \ldots, \dot{X}^d_t)^\top$.

In addition, we assume that $|X_t(\omega)|$ is bounded for almost all $\omega \in \Omega$ and all $t \in [0,T]$. This assumption is sensible since we cannot buy or sell short arbitrarily many assets.

We introduce the following classification:

$$\mathcal{X}_{\det}(T, X_0) = \big\{ X : [0,T] \to \mathbb{R}^d \text{ absolutely continuous with given } X_0 \text{ and } X_T = 0 \big\},$$

$$\mathcal{X}(T, X_0) = \Big\{ (X_t)_{t\in[0,T]} \text{ adapted with } t \mapsto X_t(\omega) \in \mathcal{X}_{\det}(T, X_0) \text{ almost surely and } \sup_{t\in[0,T]} |X_t| \in L^\infty(\mathbb{P}) \Big\}.$$

$\mathcal{X}_{\det}(T, X_0)$ describes the set of all admissible deterministic strategies which liquidate the portfolio $X_0$ in time $T$. It is a proper subset of the set $\mathcal{X}(T, X_0)$, which denotes the set of all admissible adapted strategies liquidating the portfolio.

Analogously to the previous sections, we see that the execution costs of a strategy $X$ are given by

$$C(X_0, T, X) = -\int_0^T X_t \cdot dS_t + F(X_0, T, X),$$

where the functional $F$ is given by

$$F(X_0, T, X) := \int_0^T v_t \cdot \big[\Gamma(X_0 - X_t) + h(v_t)\big]\, dt = \frac{1}{2} X_0^\top \Gamma X_0 + \int_0^T f(v_t)\, dt.$$

Here $v_t := -\dot{X}_t$ is the trading speed at time $t$ and $\Gamma \in \mathbb{R}^{d\times d}$ describes the linear permanent impact. Further, $h : \mathbb{R}^d \to \mathbb{R}^d$ is the temporary impact, and $f(v) := v \cdot h(v)$, which we assume to be non-negative, strictly convex, to have superlinear growth and to be continuously differentiable. Also analogously to before, the revenues of the strategy $X \in \mathcal{X}(T, X_0)$ are

$$R^X_T = S_0 \cdot X_0 - C(X_0, T, X).$$

For the CARA utility function $u(r) = -\exp(-\alpha r)$, $\alpha > 0$, we can then formulate the following theorem from [15, p. 6]. We largely follow the proof from [15, p. 12f.], but fill in some details and adapt it to our case.

Theorem 3.3 We have

$$\sup_{X\in\mathcal{X}(T,X_0)} \mathbb{E}\big[u(R^X_T)\big] = \sup_{X\in\mathcal{X}_{\det}(T,X_0)} \mathbb{E}\big[u(R^X_T)\big]. \tag{7}$$

In particular, when there exists a deterministic strategy $X^*$ that maximises the expected utility $\mathbb{E}[u(R^X_T)]$ within the class $\mathcal{X}_{\det}(T, X_0)$ of deterministic strategies, then $X^*$ also maximises the expected utility within the class $\mathcal{X}(T, X_0)$ of all strategies.

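Proof. We have to consider the expression

$$\mathbb{E}\big[u(R^X_T)\big] = -e^{-\alpha X_0\cdot S_0}\, \mathbb{E}\left[\exp\left(-\alpha\int_0^T X_t \cdot dS_t + \alpha F(X_0, T, X)\right)\right].$$

First, we note that for deterministic $X$ it holds that

$$\begin{aligned} \mathbb{E}\left[\exp\left(-\alpha\int_0^T X_t \cdot dS_t\right)\right] &= \mathbb{E}\left[\exp\left(\int_0^T (-\alpha X_t) \cdot (\sigma\, dB_t + b\, dt)\right)\right] \\ &= \exp\left(\frac{1}{2}\int_0^T (-\alpha X_t)^\top \Sigma (-\alpha X_t)\, dt + \int_0^T (-\alpha X_t)^\top b\, dt\right) \end{aligned} \tag{8}$$

since $\sigma^\top(-\alpha X_t)$ is bounded and therefore

$$\int_0^T (-\alpha X_t) \cdot \sigma\, dB_t \sim N\left(0, \int_0^T (-\alpha X_t)^\top \Sigma (-\alpha X_t)\, dt\right).$$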

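Denoting the log-moment generating function of $S_1 - S_0$ by $\Lambda : \mathbb{R}^d \to \mathbb{R}$, we can compute

$$\Lambda(\theta) = \log\Big(\mathbb{E}\big[e^{\theta\cdot(S_1 - S_0)}\big]\Big) = \log\Big(\mathbb{E}\big[e^{\theta\cdot(\sigma B_1 + b)}\big]\Big) = \log\Big(e^{\frac{1}{2}\theta^\top\Sigma\theta} \cdot e^{\theta^\top b}\Big) = \frac{1}{2}\theta^\top\Sigma\theta + \theta^\top b.$$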
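The closed form of $\Lambda$ can be checked numerically in one dimension. The sketch below integrates against the standard normal density with the trapezoidal rule; the integration bounds, grid size and parameter values are illustrative assumptions.

```python
import math

def log_mgf(theta, sigma, b):
    """Closed form Lambda(theta) = 0.5 * sigma^2 * theta^2 + theta * b (d = m = 1)."""
    return 0.5 * sigma ** 2 * theta ** 2 + theta * b

def log_mgf_numeric(theta, sigma, b, lo=-12.0, hi=12.0, n=20000):
    """log E[exp(theta * (sigma * Z + b))] for Z ~ N(0, 1), trapezoidal rule."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        z = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0   # endpoint weights of the rule
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * math.exp(theta * (sigma * z + b)) * density
    return math.log(total * h)

closed_form = log_mgf(-0.7, 0.3, 0.1)
numeric = log_mgf_numeric(-0.7, 0.3, 0.1)
```

Here `theta = -0.7` plays the role of $-\alpha X_t$ for a fixed time $t$; the two values should agree up to quadrature error.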

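Plugging this into (8) then yields

$$\mathbb{E}\left[\exp\left(-\alpha\int_0^T X_t \cdot dS_t\right)\right] = \exp\left(\int_0^T \Lambda(-\alpha X_t)\, dt\right),$$

and for the original expression we get

$$\mathbb{E}\big[u(R^X_T)\big] = -\exp\left(-\alpha X_0 \cdot S_0 + \int_0^T \Lambda(-\alpha X_t)\, dt + \alpha F(X_0, T, X)\right),$$

if $X \in \mathcal{X}_{\det}(T, X_0)$ and $F(X_0, T, X)$ therefore is deterministic.

Now define

$$M := \inf_{X\in\mathcal{X}_{\det}(T,X_0)} \left(\int_0^T \Lambda(-\alpha X_t)\, dt + \alpha F(X_0, T, X)\right).$$

If $M = -\infty$, then obviously both sides in (7) equal zero and the statement of the theorem holds. Suppose now $M > -\infty$ and take $\varepsilon > 0$ and $X^\varepsilon \in \mathcal{X}_{\det}(T, X_0)$ such that

$$\int_0^T \Lambda(-\alpha X^\varepsilon_t)\, dt + \alpha F(X_0, T, X^\varepsilon) \le M + \varepsilon.$$

We now want to bound the expression $\mathbb{E}\big[\exp\big(-\alpha\int_0^T X_t \cdot dS_t + \alpha F(X_0, T, X)\big)\big]$ for arbitrary $X \in \mathcal{X}(T, X_0)$ using the deterministic strategy $X^\varepsilon$. In order to do so, we change to the measure $\mathbb{P}^X$, given by the Radon-Nikodym density

$$\frac{d\mathbb{P}^X}{d\mathbb{P}} = \exp\left(-\alpha\int_0^T X_t \cdot dS_t - \int_0^T \Lambda(-\alpha X_t)\, dt\right).$$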

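This expression is always positive and we get that $\mathbb{P}^X \sim \mathbb{P}$. To derive that $\mathbb{P}^X$ is indeed a probability measure we have to show that

$$\mathbb{E}\left[\exp\left(-\alpha\int_0^T X_t \cdot dS_t - \int_0^T \Lambda(-\alpha X_t)\, dt\right)\right] = 1. \tag{9}$$

To prove this, we define the simple previsible processes

$$X^n := \sum_{k=0}^{n-1} X_{t_k} \mathbf{1}_{(t_k, t_{k+1}]},$$

with $t_k := kT/n$. Further, we define the processes $Z^n$ by

$$Z^n_t := \exp\left(-\alpha\int_0^t X^n_u \cdot dS_u - \int_0^t \Lambda(-\alpha X^n_u)\, du\right).$$

For $u \ge v$, it holds $B_u - B_v \sim N(0, (u-v)I_m)$, and so for $\theta$ an $\mathcal{F}_v$-measurable random variable we get

$$\mathbb{E}\big[e^{\theta\cdot(S_u - S_v)} \,\big|\, \mathcal{F}_v\big] = \mathbb{E}\big[e^{\theta\cdot(\sigma(B_u - B_v) + b(u-v))} \,\big|\, \mathcal{F}_v\big] = e^{\frac{1}{2}(u-v)\theta^\top\Sigma\theta + \theta^\top b(u-v)} = e^{(u-v)\Lambda(\theta)}.$$

Thus, we compute

$$\begin{aligned} \mathbb{E}[Z^n_T] &= \mathbb{E}\left[\exp\left(-\alpha\int_0^T X^n_u \cdot dS_u - \int_0^T \Lambda(-\alpha X^n_u)\, du\right)\right] \\ &= \mathbb{E}\left[\exp\left(\sum_{k=0}^{n-1} (-\alpha X_{t_k}) \cdot (S_{t_{k+1}} - S_{t_k}) - \sum_{k=0}^{n-1} \Lambda(-\alpha X_{t_k})(t_{k+1} - t_k)\right)\right] \\ &= \mathbb{E}\Bigg[\exp\left(\sum_{k=0}^{n-2} (-\alpha X_{t_k}) \cdot (S_{t_{k+1}} - S_{t_k}) - \sum_{k=0}^{n-2} \Lambda(-\alpha X_{t_k})(t_{k+1} - t_k)\right) \\ &\qquad\quad \times \mathbb{E}\Big[\exp\Big(-\alpha X_{t_{n-1}} \cdot (S_{t_n} - S_{t_{n-1}}) - \Lambda(-\alpha X_{t_{n-1}})(t_n - t_{n-1})\Big) \,\Big|\, \mathcal{F}_{t_{n-1}}\Big]\Bigg] \\ &= \mathbb{E}\left[\exp\left(\sum_{k=0}^{n-2} (-\alpha X_{t_k}) \cdot (S_{t_{k+1}} - S_{t_k}) - \sum_{k=0}^{n-2} \Lambda(-\alpha X_{t_k})(t_{k+1} - t_k)\right)\right], \end{aligned}$$

and taking conditional expectations with respect to $\mathcal{F}_{t_{n-2}}, \ldots, \mathcal{F}_{t_0}$ we get $\mathbb{E}[Z^n_T] = 1$.

Also, using that the exponential function is continuous and that

$$-\alpha\int_0^T X^n_t \cdot dS_t - \int_0^T \Lambda(-\alpha X^n_t)\, dt \longrightarrow -\alpha\int_0^T X_t \cdot dS_t - \int_0^T \Lambda(-\alpha X_t)\, dt$$

in probability by the definition of the integrals, we derive that

$$Z^n_T \longrightarrow \exp\left(-\alpha\int_0^T X_t \cdot dS_t - \int_0^T \Lambda(-\alpha X_t)\, dt\right)$$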

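in probability. In order to derive that $(Z^n_T)_{n\ge 1}$ is uniformly integrable, it is now sufficient to show that it is bounded in $L^2$. But we can write $\mathbb{E}[(Z^n_T)^2]$ as

$$\mathbb{E}\big[(Z^n_T)^2\big] = \mathbb{E}\left[\exp\left(-\int_0^T 2\alpha X^n_t \cdot dS_t - \int_0^T \Lambda(-2\alpha X^n_t)\, dt\right) Y^n\right], \tag{10}$$

where we define the sequence $(Y^n)_{n\ge 1}$ by

$$Y^n := \exp\left(\int_0^T \big(\Lambda(-2\alpha X^n_t) - 2\Lambda(-\alpha X^n_t)\big)\, dt\right).$$

Using that $\Lambda(\theta) = \frac{1}{2}\theta^\top\Sigma\theta + \theta^\top b$ is continuous we get that

$$\sup_{\theta\in B(0, 2\alpha C)} |\Lambda(\theta)| =: m < \infty,$$

where $C$ is the bound we assumed on $|X_t(\omega)|$ for almost all $\omega$. Thus, it holds almost surely that

$$Y^n \le \exp\left(\int_0^T (m + 2m)\, dt\right) = e^{3mT} =: K.$$

Iteratively taking conditional expectations in (10), we also get $\mathbb{E}[(Z^n_T)^2] \le K$. So $(Z^n_T)_{n\ge 1}$ is bounded in $L^2$ and hence uniformly integrable. This, however, shows (9), since on the one hand $\mathbb{E}[Z^n_T] = 1$ for all $n$ and on the other hand by uniform integrability

$$\mathbb{E}[Z^n_T] \longrightarrow \mathbb{E}\left[\exp\left(-\alpha\int_0^T X_t \cdot dS_t - \int_0^T \Lambda(-\alpha X_t)\, dt\right)\right].$$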

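Now we can conduct the measure change and derive that

$$\begin{aligned} \mathbb{E}\left[\exp\left(-\alpha\int_0^T X_t \cdot dS_t + \alpha F(X_0, T, X)\right)\right] &= \mathbb{E}\left[\frac{d\mathbb{P}^X}{d\mathbb{P}} \exp\left(\int_0^T \Lambda(-\alpha X_t)\, dt + \alpha F(X_0, T, X)\right)\right] \\ &= \mathbb{E}^X\left[\exp\left(\int_0^T \Lambda(-\alpha X_t)\, dt + \alpha F(X_0, T, X)\right)\right] \\ &\ge \mathbb{E}^X\big[\exp(M)\big] \\ &\ge e^{-\varepsilon}\, \mathbb{E}^X\left[\exp\left(\int_0^T \Lambda(-\alpha X^\varepsilon_t)\, dt + \alpha F(X_0, T, X^\varepsilon)\right)\right] \\ &= e^{-\varepsilon}\, \mathbb{E}^X\Big[-e^{\alpha X_0\cdot S_0}\, \mathbb{E}\big[u(R^{X^\varepsilon}_T)\big]\Big] \\ &= -e^{-\varepsilon} e^{\alpha X_0\cdot S_0}\, \mathbb{E}\big[u(R^{X^\varepsilon}_T)\big], \end{aligned}$$

where the first inequality holds since $\mathbb{P}^X \sim \mathbb{P}$ and for $\mathbb{P}$-almost all $\omega \in \Omega$ we have $X(\omega) \in \mathcal{X}_{\det}(T, X_0)$, which implies that $\mathbb{P}^X$-almost surely

$$\int_0^T \Lambda(-\alpha X_t)\, dt + \alpha F(X_0, T, X) \ge M.$$

Then, we further derive

$$\begin{aligned} \sup_{X\in\mathcal{X}(T,X_0)} \mathbb{E}\big[u(R^X_T)\big] &= \sup_{X\in\mathcal{X}(T,X_0)} -e^{-\alpha X_0\cdot S_0}\, \mathbb{E}\left[\exp\left(-\alpha\int_0^T X_t \cdot dS_t + \alpha F(X_0, T, X)\right)\right] \\ &\le e^{-\varepsilon}\, \mathbb{E}\big[u(R^{X^\varepsilon}_T)\big] \\ &\le e^{-\varepsilon} \sup_{X\in\mathcal{X}_{\det}(T,X_0)} \mathbb{E}\big[u(R^X_T)\big], \end{aligned}$$

and by sending $\varepsilon \downarrow 0$ we get one inequality in (7). The reverse inequality is immediate since $\mathcal{X}_{\det}(T, X_0) \subset \mathcal{X}(T, X_0)$, which gives the result.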

Remark 3.4 The proof does not depend on our specific choices for the price process and the cost functional. It still holds if we consider the price process $(S_t)_{t\ge 0}$ to be a $d$-dimensional Lévy process for the filtration $(\mathcal{F}_t)_{t\ge 0}$ which has all exponential moments $\mathbb{E}[e^{\lambda\cdot S_t}] < \infty$. Also, we can replace our specific choice for the functional $F(X_0, T, X)$ in the continuous Almgren model by an arbitrary functional $F : \mathcal{X}_{\det}(T, X_0) \to \mathbb{R} \cup \{\infty\}$ which yields the execution costs for each $X(\omega)$. To be sure that execution costs are not infinite for all strategies, we would assume that $F(X^{\mathrm{lin}}) < \infty$, where $X^{\mathrm{lin}}$ is the strategy of continuously trading at the same speed.

For our model, we can make use of the result to prove the following even stronger theorem given in [15, p. 9]:

Theorem 3.5 For a CARA utility function, $u(x) = -e^{-\alpha x}$, $\alpha > 0$, there exists a $\mathbb{P}$-almost surely unique optimal strategy $X^* \in \mathcal{X}(T, X_0)$. This strategy $X^*$ is a deterministic function of time.

Before we can prove theorem 3.5, we have to make some preparatory observations which will show us how to approach the problem.

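Remark 3.6 First, note that if in (7) a maximiser exists, it is unique since $\mathcal{X}(T, X_0)$ is a convex set and we can check that $\mathbb{E}[u(R^X_T)]$ as a function of $X$ is strictly concave: Let $\mu \in (0,1)$ and $X \ne Y \in \mathcal{X}(T, X_0)$, then with $v^X_t = -\dot{X}_t$ and $v^Y_t = -\dot{Y}_t$ we have

$$\begin{aligned} R^{\mu X + (1-\mu)Y}_T &= S_0 \cdot X_0 + \int_0^T (\mu X_t + (1-\mu)Y_t) \cdot dS_t - \frac{1}{2}X_0^\top\Gamma X_0 - \int_0^T f\big(\mu v^X_t + (1-\mu)v^Y_t\big)\, dt \\ &> S_0 \cdot X_0 + \mu\int_0^T X_t \cdot dS_t + (1-\mu)\int_0^T Y_t \cdot dS_t - \frac{1}{2}X_0^\top\Gamma X_0 - \mu\int_0^T f(v^X_t)\, dt - (1-\mu)\int_0^T f(v^Y_t)\, dt \\ &= \mu R^X_T + (1-\mu)R^Y_T, \end{aligned}$$

since $f$ is strictly convex. As a utility function, $u$ is strictly increasing and strictly concave. Therefore, we conclude

$$\mathbb{E}\big[u(R^{\mu X + (1-\mu)Y}_T)\big] > \mathbb{E}\big[u(\mu R^X_T + (1-\mu)R^Y_T)\big] > \mu\,\mathbb{E}\big[u(R^X_T)\big] + (1-\mu)\,\mathbb{E}\big[u(R^Y_T)\big].$$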

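In order to maximise $\mathbb{E}[u(R^X_T)]$ or equivalently to minimise $\mathbb{E}[\exp(-\alpha R^X_T)]$, we now define the value function of the problem by

$$\begin{aligned} V(T, X_0, R_0) &:= \inf_{X\in\mathcal{X}(T,X_0)} \mathbb{E}\big[e^{-\alpha R^X_T}\big] \qquad (11) \\ &= \inf_{X\in\mathcal{X}(T,X_0)} \mathbb{E}\left[\exp\left(-\alpha R_0 - \alpha\int_0^T X_t \cdot dS_t + \alpha\int_0^T f(v_t)\, dt\right)\right], \end{aligned}$$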

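where $R_0 := S_0 \cdot X_0 - \frac{1}{2}X_0^\top\Gamma X_0$. More generally, we set $R_t := R_0 + \int_0^t X_u \cdot dS_u - \int_0^t f(v_u)\, du$, which can be understood as the revenues we have secured until time $t$. This will later allow us to evaluate $V$ at different times $t$. We can rewrite expression (11) using theorem 3.3, the normal distribution of $C(X_0, T, X)$ if $X$ is deterministic, and the fact that the exponential function is both continuous and monotonically increasing:

$$\begin{aligned} V(T, X_0, R_0) &= \inf_{X\in\mathcal{X}(T,X_0)} \mathbb{E}\big[e^{-\alpha R^X_T}\big] = \inf_{X\in\mathcal{X}_{\det}(T,X_0)} \mathbb{E}\big[e^{-\alpha R^X_T}\big] \\ &= \inf_{X\in\mathcal{X}_{\det}(T,X_0)} \mathbb{E}\big[e^{-\alpha S_0\cdot X_0 + \alpha C(X_0,T,X)}\big] \\ &= \inf_{X\in\mathcal{X}_{\det}(T,X_0)} e^{-\alpha S_0\cdot X_0} \cdot e^{\alpha\mathbb{E}[C(X_0,T,X)] + \frac{\alpha^2}{2}\operatorname{Var}(C(X_0,T,X))} \\ &= \exp\left[-\alpha S_0 \cdot X_0 + \inf_{X\in\mathcal{X}_{\det}(T,X_0)} \left(\alpha\,\mathbb{E}[C(X_0,T,X)] + \frac{\alpha^2}{2}\operatorname{Var}(C(X_0,T,X))\right)\right]. \end{aligned}$$

So as noticed in [15, p. 10], the problem we actually have to solve is the mean-variance minimisation with level of risk aversion $\lambda = \alpha/2$:

$$\inf_{X\in\mathcal{X}_{\det}(T,X_0)} \left(\mathbb{E}[C(X_0,T,X)] + \frac{\alpha}{2}\operatorname{Var}(C(X_0,T,X))\right).$$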

This observation strongly simplifies the search for an optimal trading strategy. Note, however, that we really have to restrict ourselves to deterministic strategies this time, and we cannot improve the result by allowing adaptive strategies as in section 3.1, since in this case $C(X_0, T, X)$ is not normally distributed any more and the above equation fails. For

$$C(X_0, T, X) = -\int_0^T X_t \cdot dS_t + \frac{1}{2}X_0^\top\Gamma X_0 + \int_0^T f(v_t)\, dt,$$

we can now calculate the expectation as

$$\mathbb{E}[C(X_0, T, X)] = \frac{1}{2}X_0^\top\Gamma X_0 + \int_0^T \big(-b^\top X_t + f(v_t)\big)\, dt,$$

and the variance as

$$\operatorname{Var}(C(X_0, T, X)) = \int_0^T X_t^\top\Sigma X_t\, dt.$$

Omitting the constant term $\frac{1}{2}X_0^\top\Gamma X_0$, we then obtain the Lagrangian problem

$$\inf_X \int_0^T L(X_t, \dot{X}_t)\, dt,$$

where we minimise over absolutely continuous curves $X$ starting at $X_0$ and ending at $X_T = 0$, and the Lagrangian $L$ is given by

$$L(q, p) := \frac{\alpha}{2}q^\top\Sigma q - b^\top q + f(-p) \tag{12}$$

for $q, p \in \mathbb{R}^d$. The Hamiltonian corresponding to $L$ is

$$H(q, p) := -\frac{\alpha}{2}q^\top\Sigma q + b^\top q + f^*(-p), \tag{13}$$

where $f^*(z) = \sup_{x\in\mathbb{R}^d}\big(x^\top z - f(x)\big)$ is the Fenchel-Legendre transform of $f$ for $z \in \mathbb{R}^d$.
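To make the Fenchel-Legendre transform concrete, the following sketch evaluates it in one dimension for the quadratic choice $f(x) = \eta x^2$, for which $f^*(z) = z^2/(4\eta)$. The quadratic form of $f$, the value of `eta` and the grid bounds are assumptions made for this illustration only.

```python
def f(x, eta=0.5):
    """Illustrative temporary-impact cost f(x) = eta * x^2 (an assumption)."""
    return eta * x * x

def legendre_numeric(z, eta=0.5, lo=-50.0, hi=50.0, n=100001):
    """Approximate f*(z) = sup_x (x * z - f(x)) by a maximum over a grid."""
    step = (hi - lo) / (n - 1)
    return max(z * (lo + i * step) - f(lo + i * step, eta) for i in range(n))

def legendre_exact(z, eta=0.5):
    """Closed form z^2 / (4 * eta) for the quadratic f above."""
    return z * z / (4.0 * eta)

approx = legendre_numeric(3.0)
exact = legendre_exact(3.0)
```

The supremum is attained at $x = z/(2\eta)$, so the grid has to contain that point; for a general superlinear $f$ the bounds would have to be chosen accordingly.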

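Altogether, we obtain

$$\begin{aligned} V(T, X_0, R_0) &= \exp\left[-\alpha S_0 \cdot X_0 + \inf_{X\in\mathcal{X}_{\det}(T,X_0)} \left(\frac{\alpha}{2}X_0^\top\Gamma X_0 + \alpha\int_0^T \big(-b^\top X_t + f(v_t)\big)\, dt + \frac{\alpha^2}{2}\int_0^T X_t^\top\Sigma X_t\, dt\right)\right] \\ &= \exp\left(-\alpha R_0 + \alpha \inf_{X\in\mathcal{X}_{\det}(T,X_0)} \int_0^T L(X_t, \dot{X}_t)\, dt\right). \end{aligned} \tag{14}$$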
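The variational problem in (14) can be explored numerically in the simplest setting. The sketch below assumes one asset, no drift ($b = 0$), $\Sigma = \sigma^2$ and the quadratic cost $f(v) = \eta v^2$, so that the integrand becomes $\frac{\alpha}{2}\sigma^2 X_t^2 + \eta \dot{X}_t^2$. Under these assumptions the Euler-Lagrange equation gives the hyperbolic-sine trajectory with $\kappa = \sqrt{\alpha\sigma^2/(2\eta)}$; all parameter values are hypothetical.

```python
import math

def objective(path, T, alpha, sigma, eta):
    """Approximate int_0^T (alpha/2 * sigma^2 * X^2 + eta * Xdot^2) dt.

    `path` holds X on an equidistant grid; each cell uses the average of its
    endpoint values for X and the finite-difference slope for Xdot.
    """
    n = len(path) - 1
    dt = T / n
    total = 0.0
    for k in range(n):
        x_mid = 0.5 * (path[k] + path[k + 1])
        x_dot = (path[k + 1] - path[k]) / dt
        total += (0.5 * alpha * sigma ** 2 * x_mid ** 2 + eta * x_dot ** 2) * dt
    return total

def linear_path(x0, n):
    """Constant-speed liquidation from x0 to 0."""
    return [x0 * (1.0 - k / n) for k in range(n + 1)]

def sinh_path(x0, T, n, alpha, sigma, eta):
    """X_t = x0 * sinh(kappa * (T - t)) / sinh(kappa * T)."""
    kappa = math.sqrt(alpha * sigma ** 2 / (2.0 * eta))
    return [x0 * math.sinh(kappa * (T - k * T / n)) / math.sinh(kappa * T)
            for k in range(n + 1)]

x0, T, alpha, sigma, eta, n = 1.0, 1.0, 2.0, 1.0, 0.1, 1000
cost_linear = objective(linear_path(x0, n), T, alpha, sigma, eta)
cost_sinh = objective(sinh_path(x0, T, n, alpha, sigma, eta), T, alpha, sigma, eta)
```

In this setting the minimal value is $\eta\kappa X_0^2\coth(\kappa T)$, which `cost_sinh` should reproduce up to discretisation error, while `cost_linear` is strictly larger.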

In order to prove theorem 3.5, we still have to show that there is, indeed, a strategy $X \in \mathcal{X}_{\det}(T, X_0)$ minimising the integral in (14). To this end, consider the following theorem, which combines the results from Lemma 7.3 in [5, p. 73] and Theorem 7.1 in [5, p. 74] for our particular case, and whose proof is beyond the scope of this essay:

Theorem 3.7 For a function $L : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, consider the variational problem

$$\inf \int_0^T L(Y_t, \dot{Y}_t)\, dt, \tag{15}$$

where the infimum is taken over all Lipschitz continuous curves $(Y_t)_{0\le t\le T}$ such that $Y_0 = 0$ and $Y_T = X_0$. Assume that the Hamiltonian $H(q, p)$ corresponding to $L(q, p)$ satisfies the following conditions.

(H1) $H(q, p)$ is strictly convex in $p$.

(H2) $H$ is continuously differentiable.

(H3) $H(q, p)/|p| \to \infty$ as $|p| \to \infty$ for each $q$.

(H4) $|\nabla_q H| \le c_1(p \cdot \nabla_p H - H) + c_2$ for some constants $c_1, c_2$ with $c_1 \ge 0$.

(H5) $p \cdot \nabla_p H - H \ge c_3$ for some constant $c_3$.

(H6) $H, |\nabla_p H| \le g(|p|)$ for some non-negative, increasing function $g : [0, \infty) \to \mathbb{R}$.

Then (15) has an extremal. Furthermore,

$$u(T, X_0) := \min \int_0^T L(Y_t, \dot{Y}_t)\, dt,$$

with the minimum taken over all Lipschitz curves with $Y_0 = 0$ and $Y_T = X_0$, solves the Hamilton-Jacobi equation

$$\frac{\partial}{\partial T}u + H(X_0, \nabla_{X_0}u) = 0$$

with initial condition $u(0, 0) = 0$.

N.B. The difference between theorem 3.7 and the original theorem and lemma in [5, p. 73f.] is that, as suggested in [15, p. 14], we have already chosen the boundary set $B = \{(0, 0)\} \subset \mathbb{R} \times \mathbb{R}^d$ and the boundary data $f \in C(B)$ as $f(0, 0) = 0$, such that all conditions required of $f$ in the original theorem are trivially satisfied. Note that this boundary data $f$ has nothing to do with our function $f(v) = v \cdot h(v)$. Also, in theorem 3.7 we restrict ourselves to time-homogeneous Lagrangians and Hamiltonians, so that functions of time $t$ in the original theorem become constants in our case.

Remark 3.8 While in our original problem the strategies $X$ start at $X_0$ and end at $X_T = 0$, in theorem 3.7 they have to start at 0 and end at $X_0$. Simply by considering $Y_t := X_{T-t}$, however, we are in the right position. We just have to change the Lagrangian and Hamiltonian slightly [15, p. 14]:

$$\tilde{L}(q, p) := L(q, -p) = \frac{\alpha}{2}q^\top\Sigma q - b^\top q + f(p),$$

$$\tilde{H}(q, p) := H(q, -p) = -\frac{\alpha}{2}q^\top\Sigma q + b^\top q + f^*(p).$$

This yields:

$$\begin{aligned} \int_0^T \tilde{L}(Y_t, \dot{Y}_t)\, dt &= \int_0^T \tilde{L}(X_{T-t}, -\dot{X}_{T-t})\, dt = \int_0^T L(X_{T-t}, \dot{X}_{T-t})\, dt \\ &= \int_T^0 -L(X_t, \dot{X}_t)\, dt = \int_0^T L(X_t, \dot{X}_t)\, dt. \end{aligned}$$

Theorem 3.7 is now almost applicable to our case. However, while in (14) we take the infimum over the set of absolutely continuous curves, in theorem 3.7 we minimise over the strictly smaller set of Lipschitz continuous curves. The following lemma shows that, for the time-reversed Lagrangian $\tilde{L}(q, p) := L(q, -p)$ of remark 3.8, we can approximate each absolutely continuous curve $Y$ by a sequence $(Y^n)_{n\ge 1}$ of Lipschitz continuous curves such that $\int_0^T \tilde{L}(Y^n_t, \dot{Y}^n_t)\, dt \to \int_0^T \tilde{L}(Y_t, \dot{Y}_t)\, dt$ as $n \to \infty$. In particular, we find Lipschitz continuous curves $Y^n$ such that $\int_0^T \tilde{L}(Y^n_t, \dot{Y}^n_t)\, dt \to \inf \int_0^T \tilde{L}(Y_t, \dot{Y}_t)\, dt$ and there is no loss in restricting ourselves to this smaller set. Note that it is enough to show the lemma under the assumption that $\int_0^T \tilde{L}(Y_t, \dot{Y}_t)\, dt < \infty$. Otherwise, we can either leave out $Y$ in the sequence which approaches the infimum, or if the infimum is infinite itself, we can simply choose an arbitrary sequence of Lipschitz continuous curves which then also has to approach infinity. While this fact was already observed in [15, p. 15], its justification was left out there. Here, however, we want to give a detailed proof.

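Lemma 3.9 For each absolutely continuous curve $Y : [0,T] \to \mathbb{R}^d$ with $Y_0 = 0$, $Y_T = X_0$, and $\int_0^T \tilde{L}(Y_t, \dot{Y}_t)\, dt < \infty$, we can find a sequence of Lipschitz curves $Y^n : [0,T] \to \mathbb{R}^d$ such that $Y^n_0 = 0$, $Y^n_T = X_0$, and $\int_0^T \tilde{L}(Y^n_t, \dot{Y}^n_t)\, dt \to \int_0^T \tilde{L}(Y_t, \dot{Y}_t)\, dt$, where $\tilde{L}(q, p) = L(q, -p)$ is the Lagrangian from remark 3.8.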

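Proof. Since $Y$ is absolutely continuous, as discussed before, we know that the derivative $\dot{Y} = (\dot{Y}^1, \ldots, \dot{Y}^d)^\top$ exists almost everywhere and is in $L^1 = L^1([0,T], \mathcal{B}([0,T]), \mathrm{Leb})$. In particular, $\dot{Y}$ is a measurable function and hence it follows that $\{0 \le t \le T : |\dot{Y}_t| \le R\} \in \mathcal{B}([0,T])$ for all $R \ge 0$. Since $\dot{Y} \in L^1$, we find an $R \ge 0$ such that $B := \{0 \le t \le T : |\dot{Y}_t| \le R\} \in \mathcal{B}([0,T])$ has measure $\mathrm{Leb}(B) > 0$. Define now

$$Z^n_t := \dot{Y}_t \cdot \frac{|\dot{Y}_t| \wedge n}{|\dot{Y}_t|} + \frac{\mathbf{1}_B(t)}{\mathrm{Leb}(B)} \int_0^T \dot{Y}_s \cdot \frac{(|\dot{Y}_s| - n)^+}{|\dot{Y}_s|}\, ds.$$

Here and in the following, we always set $\dot{Y}_t/|\dot{Y}_t| = 0$ if $\dot{Y}_t = 0$. Since $\dot{Y} \in L^1$, we know that

$$\left|\frac{\mathbf{1}_B(t)}{\mathrm{Leb}(B)} \int_0^T \dot{Y}_s \cdot \frac{(|\dot{Y}_s| - n)^+}{|\dot{Y}_s|}\, ds\right| \le \frac{1}{\mathrm{Leb}(B)} \int_0^T |\dot{Y}_s| \cdot \frac{(|\dot{Y}_s| - n)^+}{|\dot{Y}_s|}\, ds \le \frac{1}{\mathrm{Leb}(B)} \int_0^T |\dot{Y}_s|\, ds =: \beta < \infty,$$

and hence

$$|Z^n_t| \le |\dot{Y}_t| \cdot \frac{|\dot{Y}_t| \wedge n}{|\dot{Y}_t|} + \beta \le n + \beta.$$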

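Now, define $Y^n_t := \int_0^t Z^n_s\, ds$. We get that

$$|Y^n_t - Y^n_s| = \left|\int_s^t Z^n_u\, du\right| \le \left|\int_s^t |Z^n_u|\, du\right| \le \left|\int_s^t (n + \beta)\, du\right| = (n + \beta)|t - s|,$$

and hence $Y^n$ is a sequence of Lipschitz continuous curves. Also, we have $Y^n_0 = 0$ and

$$Y^n_T = \int_0^T Z^n_t\, dt = \int_0^T \dot{Y}_t \cdot \frac{|\dot{Y}_t| \wedge n}{|\dot{Y}_t|}\, dt + \frac{\mathrm{Leb}(B)}{\mathrm{Leb}(B)} \int_0^T \dot{Y}_t \cdot \frac{(|\dot{Y}_t| - n)^+}{|\dot{Y}_t|}\, dt = \int_0^T \dot{Y}_t\, dt = X_0,$$

for all $n$.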
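The construction of $Z^n$ can be illustrated on a grid: the derivative is truncated at level $n$ and the removed mass is spread evenly over the set $B$, so that the endpoint is preserved. The sketch below is a discrete, scalar analogue; the example curve $Y_t = \sqrt{t}$ on $[0,1]$, the grid and the function name are assumptions chosen for illustration.

```python
def lipschitz_derivative(y_dot, dt, n, R):
    """Discrete analogue of Z^n for a scalar curve.

    y_dot : derivative values on grid cells of width dt
    n     : truncation level for |y_dot|
    R     : level defining B = {t : |y_dot(t)| <= R}
    """
    in_B = [abs(d) <= R for d in y_dot]
    leb_B = sum(dt for flag in in_B if flag)
    if leb_B <= 0.0:
        raise ValueError("R is too small: B has measure zero")
    # Signed mass removed by truncating |y_dot| at level n.
    excess = sum((d / abs(d)) * max(abs(d) - n, 0.0) * dt
                 for d in y_dot if d != 0.0)
    z = []
    for d, flag in zip(y_dot, in_B):
        truncated = 0.0 if d == 0.0 else d * min(abs(d), n) / abs(d)
        z.append(truncated + (excess / leb_B if flag else 0.0))
    return z

# Y_t = sqrt(t) is absolutely continuous but not Lipschitz near t = 0.
cells = 10000
dt = 1.0 / cells
y_dot = [0.5 / ((k + 0.5) * dt) ** 0.5 for k in range(cells)]
z = lipschitz_derivative(y_dot, dt, n=5, R=1.0)
```

Summing `z` over the grid gives the same endpoint as summing `y_dot`, while `z` itself stays bounded by roughly $n + \beta$, which is the discrete counterpart of the Lipschitz bound above.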

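It remains to show that $\int_0^T \tilde{L}(Y^n_t, Z^n_t)\, dt \to \int_0^T \tilde{L}(Y_t, \dot{Y}_t)\, dt$ as $n \to \infty$, where $\tilde{L}(q, p) = L(q, -p)$:

First note that $Z^n_t \to \dot{Y}_t$ as $n \to \infty$ for almost all $t \in [0,T]$, where we use that $\dot{Y} \in L^1$ and hence

$$\int_0^T \dot{Y}_t \cdot \frac{(|\dot{Y}_t| - n)^+}{|\dot{Y}_t|}\, dt \to 0, \quad \text{as } n \to \infty.$$

Now, we want to use the dominated convergence theorem for the $i$th component of $Y^n_t$,

$$Y^{i,n}_t = \int_0^t Z^{i,n}_s\, ds = \int_0^T Z^{i,n}_s \mathbf{1}_{[0,t]}(s)\, ds,$$

where $Z^{i,n}_s$ is the $i$th component of $Z^n_s$. But since on $B$ it holds $R \ge |\dot{Y}_t| \ge |\dot{Y}^i_t|$, we get that

$$\big|Z^{i,n}_s \mathbf{1}_{[0,t]}(s)\big| \le \big|Z^{i,n}_s\big| \le |Z^n_s| \le \mathbf{1}_{B^c}(s)\,|\dot{Y}_s| + \mathbf{1}_B(s)\,(R + \beta),$$

which is an integrable function of $s$ since $\dot{Y} \in L^1$. The dominated convergence theorem then yields

$$Y^{i,n}_t = \int_0^T Z^{i,n}_s \mathbf{1}_{[0,t]}(s)\, ds \to \int_0^T \dot{Y}^i_s \mathbf{1}_{[0,t]}(s)\, ds = \int_0^t \dot{Y}^i_s\, ds = Y^i_t, \quad \text{as } n \to \infty,$$

and hence $Y^n_t \to Y_t$ for all $t$. Since the function $q \mapsto \frac{\alpha}{2}q^\top\Sigma q - b^\top q$ is continuous, we then have

$$\frac{\alpha}{2}(Y^n_t)^\top\Sigma Y^n_t - b^\top Y^n_t \to \frac{\alpha}{2}(Y_t)^\top\Sigma Y_t - b^\top Y_t, \quad \text{as } n \to \infty,$$

for all $t \in [0,T]$. Further, since $\dot{Y} \in L^1$, we have

$$|Y^n_t| = \left|\int_0^t Z^n_s\, ds\right| \le \int_0^t |Z^n_s|\, ds \le \int_0^T |Z^n_s|\, ds \le \int_0^T \big(\mathbf{1}_{B^c}(s)\,|\dot{Y}_s| + \mathbf{1}_B(s)(R + \beta)\big)\, ds =: \gamma < \infty,$$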

and hence again with continuity of $q \mapsto \frac{\alpha}{2}q^\top\Sigma q - b^\top q$,

$$\left|\frac{\alpha}{2}(Y^n_t)^\top\Sigma Y^n_t - b^\top Y^n_t\right| \le \sup_{z\in\mathbb{R}^d :\, |z|\le\gamma} \left|\frac{\alpha}{2}z^\top\Sigma z - b^\top z\right| < \infty.$$

By the dominated convergence theorem we then have

$$\int_0^T \left(\frac{\alpha}{2}(Y^n_t)^\top\Sigma Y^n_t - b^\top Y^n_t\right) dt \to \int_0^T \left(\frac{\alpha}{2}(Y_t)^\top\Sigma Y_t - b^\top Y_t\right) dt.$$

Now it only remains to show that $\int_0^T f(Z^n_t)\, dt \to \int_0^T f(\dot{Y}_t)\, dt$:

Since $f$ is continuous, we know that $f(Z^n_t) \to f(\dot{Y}_t)$ as $n \to \infty$ for almost all $t$. Again, we want to use dominated convergence and since $f$ is non-negative we see that

$$f(Z^n_t) = \mathbf{1}_{B^c}(t) f(Z^n_t) + \mathbf{1}_B(t) f(Z^n_t) \le f\left(\dot{Y}_t \cdot \frac{|\dot{Y}_t| \wedge n}{|\dot{Y}_t|}\right) + \max_{z\in\mathbb{R}^d :\, |z|\le R+\beta} f(z).$$

The maximum is taken over a compact set and so the second summand is only a constant for the continuous function $f$. To deal with the first summand, we note that $f$ is assumed strictly convex and non-negative, and $f(0) = 0$. Since for $|\dot{Y}_t| > 0$, $(|\dot{Y}_t| \wedge n)/|\dot{Y}_t| \in [0,1]$, we then get

$$f\left(\dot{Y}_t \cdot \frac{|\dot{Y}_t| \wedge n}{|\dot{Y}_t|}\right) \le \frac{|\dot{Y}_t| \wedge n}{|\dot{Y}_t|} \cdot f(\dot{Y}_t) + \left(1 - \frac{|\dot{Y}_t| \wedge n}{|\dot{Y}_t|}\right) f(0) \le f(\dot{Y}_t).$$

For $\dot{Y}_t = 0$ the statement trivially holds. Thus,

$$f(Z^n_t) \le f(\dot{Y}_t) + \text{constant},$$

which is in $L^1$ by the assumption that $\int_0^T \tilde{L}(Y_t, \dot{Y}_t)\, dt < \infty$. So we can apply the dominated convergence theorem once more and get

$$\int_0^T f(Z^n_t)\, dt \to \int_0^T f(\dot{Y}_t)\, dt,$$

which finishes the proof.

We can now prove theorem 3.5. Note that it only remains to check conditions (H1)-(H6) from theorem 3.7. This will yield the existence of a minimiser in (14), which then is deterministic and unique as remarked before. In proving conditions (H1)-(H6), we follow the idea in [15, p. 15]. Again we have added a couple of details to the original proof.

Proof. First, we note that $f^*(p)$ is a strictly convex, continuously differentiable and superlinearly growing function, since $f$ was strictly convex, continuously differentiable and superlinearly growing. See Theorem 26.6 in [14, p. 259] for instance, where we use the equivalence of co-finiteness and superlinear growth in our case, and that a differentiable convex function on the open set $\mathbb{R}^d$ is always continuously differentiable. Then, since in

$$\tilde{H}(q, p) = H(q, -p) = -\frac{\alpha}{2}q^\top\Sigma q + b^\top q + f^*(p)$$

the only dependence on $p$ comes from $f^*(p)$, (H1) and (H3) are satisfied and $\tilde{H}$ is continuously differentiable with respect to $p$. By computing

$$\nabla_q \tilde{H}(q, p) = -\alpha\Sigma q + b$$

we see that $\tilde{H}$ is also continuously differentiable with respect to $q$, and hence it is altogether continuously differentiable and (H2) holds.

Theorem 26.6 in [14, p. 259] also gives us the equation

$$f^*(p) = p \cdot (\nabla f)^{-1}(p) - f\big((\nabla f)^{-1}(p)\big).$$

Since $f$ is continuous on the closed set $\mathbb{R}^d$, it is in particular a closed convex function and we can also apply Theorem 26.5 in [14, p. 258], which gives us the equation

$$(\nabla f^*)(p) = (\nabla f)^{-1}(p).$$

Thus, we get

$$\begin{aligned} p \cdot \nabla_p \tilde{H}(q, p) - \tilde{H}(q, p) &= p \cdot (\nabla f^*)(p) - f^*(p) + \frac{\alpha}{2}q^\top\Sigma q - b^\top q \\ &= p \cdot (\nabla f^*)(p) - \big[p \cdot (\nabla f^*)(p) - f((\nabla f^*)(p))\big] + \frac{\alpha}{2}q^\top\Sigma q - b^\top q \\ &= f((\nabla f^*)(p)) + \frac{\alpha}{2}q^\top\Sigma q - b^\top q \\ &\ge \frac{\alpha}{2}q^\top\Sigma q - b^\top q \qquad (16) \\ &\ge \frac{\alpha}{4}q^\top\Sigma q - b^\top q, \qquad (17) \end{aligned}$$

where we used $f \ge 0$ and $\frac{\alpha}{4} q^\top\Sigma q \ge 0$.

(H5) says that $p \cdot \nabla_p \tilde{H} - \tilde{H}$ is bounded below by a constant. Therefore, we consider the expression

$$\frac{\alpha}{4}q^\top\Sigma q - b^\top q$$

and show that it has a global minimum (not necessarily attained at a single point but possibly on an affine subspace of $\mathbb{R}^d$). Since $\alpha > 0$ and $\Sigma = \sigma\sigma^\top$ is positive semidefinite, this is the case if its gradient becomes zero for some $q \in \mathbb{R}^d$. But we can compute

$$\nabla_q\left(\frac{\alpha}{4}q^\top\Sigma q - b^\top q\right) = \frac{\alpha}{2}\Sigma q - b,$$

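and so we have to solve the linear system

$$\frac{\alpha}{2}\Sigma q = b.$$

Since $\Sigma = \sigma\sigma^\top$ is a symmetric, positive semidefinite matrix, we find an orthonormal basis $B = (v_1, v_2, \ldots, v_d)$ of eigenvectors of $\Sigma$ with corresponding eigenvalues $e_i \ge 0$, $i = 1, \ldots, d$. With respect to this basis the linear system becomes

$$\frac{\alpha}{2} \begin{pmatrix} e_1 & & & \\ & e_2 & & \\ & & \ddots & \\ & & & e_d \end{pmatrix} \begin{pmatrix} q_1 \\ q_2 \\ \vdots \\ q_d \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_d \end{pmatrix}$$

and we solve it by setting

$$q_i := \begin{cases} \dfrac{2 b_i}{\alpha e_i}, & \text{if } e_i \ne 0, \\ \text{arbitrary (e.g. 0)}, & \text{if } e_i = 0, \end{cases} \qquad i = 1, \ldots, d.$$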
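The coordinate-wise solution can be traced in a small example. The following sketch takes an orthonormal eigenbasis as given and solves $\frac{\alpha}{2}\Sigma q = b$ as described; the concrete matrix $\Sigma = \begin{pmatrix}1 & 1\\ 1 & 1\end{pmatrix}$, the vector $b$ and the helper names are hypothetical.

```python
def solve_in_eigenbasis(eigvals, eigvecs, b, alpha):
    """Solve (alpha/2) * Sigma * q = b given an orthonormal eigenbasis of Sigma.

    eigvals : eigenvalues e_i >= 0
    eigvecs : matching orthonormal eigenvectors v_i (lists of length d)
    """
    q = [0.0] * len(b)
    for e, v in zip(eigvals, eigvecs):
        b_i = sum(bj * vj for bj, vj in zip(b, v))  # coordinate of b along v_i
        if e != 0.0:
            coeff = 2.0 * b_i / (alpha * e)
        else:
            if abs(b_i) > 1e-12:
                raise ValueError("b is not orthogonal to ker(Sigma)")
            coeff = 0.0                             # arbitrary choice
        q = [qj + coeff * vj for qj, vj in zip(q, v)]
    return q

def apply_sigma(eigvals, eigvecs, q):
    """Compute Sigma * q from the spectral decomposition of Sigma."""
    out = [0.0] * len(q)
    for e, v in zip(eigvals, eigvecs):
        c = e * sum(qj * vj for qj, vj in zip(q, v))
        out = [oj + c * vj for oj, vj in zip(out, v)]
    return out

s = 0.5 ** 0.5
eigvals = [2.0, 0.0]            # spectrum of Sigma = [[1, 1], [1, 1]]
eigvecs = [[s, s], [s, -s]]
b, alpha = [0.25, 0.25], 2.0    # b is orthogonal to ker(Sigma)
q = solve_in_eigenbasis(eigvals, eigvecs, b, alpha)
residual = [0.5 * alpha * sq - bj
            for sq, bj in zip(apply_sigma(eigvals, eigvecs, q), b)]
```

If `b` had a component along the second eigenvector, the system would have no solution, which is why the sketch raises an error in that case.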

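At this point we have to recall that we assumed $b \perp \ker\Sigma$ in the beginning. So we have that $b_i = 0$ whenever $e_i = 0$, since this means that $v_i \in \ker\Sigma$. Hence, $q$ as constructed above indeed solves the linear system. Defining $c_3$ as the minimum of $\frac{\alpha}{4}q^\top\Sigma q - b^\top q$, we then have $p \cdot \nabla_p \tilde{H} - \tilde{H} \ge c_3$ for $\tilde{H}(q, p) = H(q, -p)$, and this is (H5).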

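Using the same inequality in (16), we also get for $\tilde{H}(q, p) = H(q, -p)$ that

$$p \cdot \nabla_p \tilde{H}(q, p) - \tilde{H}(q, p) \ge \frac{\alpha}{4}q^\top\Sigma q + c_3. \tag{18}$$

Again, we consider the orthonormal basis $B = (v_1, \ldots, v_d)$ of eigenvectors of $\Sigma$ and we denote the coordinates of $q$ with respect to $B$ by $q_i$. That is, $q = \sum_{i=1}^d q_i v_i$. Using

$$e_i q_i^2 \ge \begin{cases} e_i|q_i| \ge e_i|q_i| - e_i, & \text{if } |q_i| \ge 1, \\ 0 \ge e_i|q_i| - e_i, & \text{if } |q_i| < 1, \end{cases} \qquad i = 1, \ldots, d,$$

(18) then becomes

$$p \cdot \nabla_p \tilde{H}(q, p) - \tilde{H}(q, p) \ge \frac{\alpha}{4}\sum_{i=1}^d e_i q_i^2 + c_3 \ge \frac{\alpha}{4}\sum_{i=1}^d e_i|q_i| + c_3 - \frac{\alpha}{4}\sum_{i=1}^d e_i.$$

Now, we compute

$$|\nabla_q \tilde{H}(q, p)| = |-\alpha\Sigma q + b| \le \alpha|\Sigma q| + |b| \le \alpha\sum_{i=1}^d e_i|q_i| + |b|,$$

and we see that

$$|\nabla_q \tilde{H}(q, p)| \le c_1\big(p \cdot \nabla_p \tilde{H}(q, p) - \tilde{H}(q, p)\big) + c_2,$$

for $c_1 := 4$ and $c_2 := \alpha\sum_{i=1}^d e_i - 4c_3 + |b|$. Since $4 > 0$ this is condition (H4).

Lastly, we have to check condition (H6). But we can simply set

$$g_1(x) = \sup_{p\in\mathbb{R}^d :\, |p|\le x} f^*(p) + x$$

and

$$g_2(x) = \sup_{p\in\mathbb{R}^d :\, |p|\le x} |(\nabla f^*)(p)| + x.$$

Since $f^*$ and $\nabla f^*$ are continuous, $g_1 : [0,\infty) \to \mathbb{R}$ and $g_2 : [0,\infty) \to \mathbb{R}$ are well defined. Also, by construction these functions are strictly increasing, and while $g_2$ trivially is non-negative, $g_1$ also is since $f^*(0) = \sup_{q\in\mathbb{R}^d}(-f(q)) \ge -f(0) = 0$. We can now set $g = g_1 \vee g_2$, and have condition (H6).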

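So we have shown the main result of this section. In the process of proving it, we came across the value function of the problem. In the following, we want to prove a further property of it.

Having in mind the Martingale Principle of Optimal Control, it is suggested that the process $(Y_t)_{0\le t\le T}$ given by the time-$t$ value of the objective

$$\begin{aligned} Y_t &:= \exp\left(-\alpha R_0 - \alpha\int_0^t X_u \cdot dS_u + \alpha\int_0^t f(v_u)\, du\right) \times \inf_{(X_u)_{t\le u\le T}\in\mathcal{X}(T-t, X_t)} \mathbb{E}\left[\exp\left(-\alpha\int_t^T X_u \cdot dS_u + \alpha\int_t^T f(v_u)\, du\right)\right] \\ &= e^{-\alpha R_t} \cdot \inf_{(X_u)_{t\le u\le T}\in\mathcal{X}(T-t, X_t)} \mathbb{E}\left[\exp\left(-\alpha\int_t^T X_u \cdot dS_u + \alpha\int_t^T f(v_u)\, du\right)\right] \\ &= V(T - t, X_t, R_t) \end{aligned}$$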

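is a martingale under optimal control and a submartingale otherwise. Using $dX_t = -v_t\, dt$, we get that

$$\begin{aligned} dY_t &= dV(T - t, X_t, R_t) \\ &= \frac{\partial V}{\partial T}\frac{\partial(T-t)}{\partial t}\, dt + \nabla_{X_0}V \cdot dX_t + \frac{\partial V}{\partial R_0}\, dR_t + \frac{1}{2}\frac{\partial^2 V}{(\partial R_0)^2}\, d\langle R\rangle_t \\ &= -\frac{\partial V}{\partial T}\, dt - v_t^\top\nabla_{X_0}V\, dt + \frac{\partial V}{\partial R_0}\big(X_t^\top\sigma\, dB_t + b^\top X_t\, dt - f(v_t)\, dt\big) + \frac{1}{2}\frac{\partial^2 V}{(\partial R_0)^2}X_t^\top\Sigma X_t\, dt \end{aligned}$$

and looking at the drift term we deduce that

$$0 = \inf_{v_t\in\mathbb{R}^d}\left(-\frac{\partial V}{\partial T} - v_t^\top\nabla_{X_0}V + \frac{\partial V}{\partial R_0}b^\top X_t - \frac{\partial V}{\partial R_0}f(v_t) + \frac{1}{2}\frac{\partial^2 V}{(\partial R_0)^2}X_t^\top\Sigma X_t\right)$$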

or equivalently the Hamilton-Jacobi-Bellman equation

$$\frac{\partial V}{\partial T} = \frac{1}{2}X_t^\top\Sigma X_t\frac{\partial^2 V}{(\partial R_0)^2} + b^\top X_t\frac{\partial V}{\partial R_0} + \inf_{\xi\in\mathbb{R}^d}\left(-\xi^\top\nabla_{X_0}V - \frac{\partial V}{\partial R_0}f(\xi)\right). \tag{19}$$

Note that we only have to take the infimum over the trading speeds $v_t$ since $X_t$ is already known at time $t$, but $v_t$ can be chosen freely. Also note the typing error in [15, p. 14], where the authors gave this equation with a minus sign in front of the infimum.

In addition to this differential equation it is sensible to assume the singular initial condition

$$\lim_{T\downarrow 0} V(T, X_0, R_0) = \begin{cases} e^{-\alpha R_0}, & \text{if } X_0 = 0, \\ \infty, & \text{otherwise.} \end{cases} \tag{20}$$

Here the singularity reflects the fact that we must finish the liquidation by time $T$. As $T \downarrow 0$, this is only possible if we already start with an empty portfolio.

Both the HJB equation and the initial condition were observed by Schied, Schöneborn, and Tehranchi in [15, p. 9].

We can show that the solution for the value function we obtained before indeed solves this heuristically suggested differential equation with initial condition [15, p. 14f.].

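Proposition 3.10 The function $V(T, X_0, R_0)$ as given in (14) solves the singular Cauchy problem (19) and (20).

Proof. We define

$$S(T, X_0) := \inf_{X\in\mathcal{X}_{\det}(T,X_0)} \int_0^T L(X_t, \dot{X}_t)\, dt = \inf \int_0^T \tilde{L}(Y_t, \dot{Y}_t)\, dt,$$

where $\tilde{L}(q, p) = L(q, -p)$ and, as before, the second infimum is taken over all Lipschitz continuous curves starting at $Y_0 = 0$ and ending at $Y_T = X_0$ by lemma 3.9. In the proof of theorem 3.5, we have already shown that conditions (H1)-(H6) hold for the Hamiltonian $\tilde{H}(q, p) = H(q, -p)$ corresponding to the Lagrangian $\tilde{L}$. Therefore by theorem 3.7, we get

$$0 = \frac{\partial S}{\partial T}(T, X_0) + \tilde{H}\big(X_0, \nabla_{X_0}S(T, X_0)\big) = \frac{\partial S}{\partial T}(T, X_0) + H\big(X_0, -\nabla_{X_0}S(T, X_0)\big).$$

Using this, we can now check that $V(T, X_0, R_0) = \exp(-\alpha R_0 + \alpha S(T, X_0))$ solves (19):

$$\frac{\partial V}{\partial R_0} = \frac{\partial}{\partial R_0}e^{-\alpha R_0 + \alpha S(T,X_0)} = -\alpha V,$$

$$\frac{\partial^2 V}{(\partial R_0)^2} = \alpha^2 V,$$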

∇X0V = ∇X0e−αR0+αS(T,X0) = αV∇X0S(T,X0) = − ∂V

∂R0

∇X0S(T,X0),

∂V

∂T=

∂

∂Te−αR0+αS(T,X0)

= αV∂S

∂T(T,X0)

= αV (−H(X0,−∇X0S(T,X0))

= −αV(−α

2X>0 ΣX0 + b>X0 + f ∗(∇X0S(T,X0))

)=

1

2X>0 ΣX0

∂2V

(∂R0)2+ b>X0

∂V

∂R0

+∂V

∂R0

f ∗

(−∇X0V

∂V∂R0

)

=1

2X>0 ΣX0

∂2V

(∂R0)2+ b>X0

∂V

∂R0

+ (−αV ) supξ∈Rd

(−ξ>∇X0V

∂V∂R0

− f(ξ)

)

=1

2X>0 ΣX0

∂2V

(∂R0)2+ b>X0

∂V

∂R0

+ infξ∈Rd

(−ξ>∇X0V −

∂V

∂R0

f(ξ)

),

where we used that −αV < 0 when replacing the supremum by the infimum.
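As a sanity check of the Hamilton-Jacobi equation used in this proof, the following sketch considers a special case that is an assumption made only for illustration: one asset, b = 0, and a quadratic cost f(ξ) = ηξ². In that case the Euler-Lagrange equation can be solved explicitly and yields S(T, X0) = ηκ X0² coth(κT) with κ² = αΣ/(2η), and f*(p) = p²/(4η). The parameter values and the names S, hamiltonian and hj_residual are hypothetical. The code checks by finite differences that ∂S/∂T + H(X0, ∇S) is numerically zero and prints S for shrinking T.

```python
import math

# Illustrative one-asset parameters (assumptions, not taken from the essay)
ALPHA = 2.0   # constant absolute risk aversion alpha
SIGMA = 0.3   # volatility, so that Sigma = SIGMA**2
ETA = 0.5     # cost of trading speed: f(xi) = ETA * xi**2
KAPPA = math.sqrt(ALPHA * SIGMA**2 / (2.0 * ETA))


def S(T, x0):
    """Closed-form minimal cost for b = 0 and quadratic f (assumed special case)."""
    return ETA * KAPPA * x0**2 / math.tanh(KAPPA * T)


def hamiltonian(q, p):
    """H(q, p) = -alpha/2 * Sigma * q^2 + f*(p), with f*(p) = p^2 / (4 * ETA).

    H is even in p here, so the sign of the gradient argument does not matter.
    """
    return -0.5 * ALPHA * SIGMA**2 * q**2 + p**2 / (4.0 * ETA)


def hj_residual(T, x0, h=1e-5):
    """Central-difference estimate of dS/dT + H(x0, dS/dx0)."""
    dS_dT = (S(T + h, x0) - S(T - h, x0)) / (2.0 * h)
    dS_dx = (S(T, x0 + h) - S(T, x0 - h)) / (2.0 * h)
    return dS_dT + hamiltonian(x0, dS_dx)


if __name__ == "__main__":
    print("HJ residual at T=1, X0=1:", hj_residual(1.0, 1.0))
    # S grows without bound as T shrinks when X0 != 0, and is 0 when X0 = 0.
    for T in (1.0, 0.1, 0.01):
        print(T, S(T, 1.0), S(T, 0.0))
```

Since V = exp(−αR0 + αS), the same behaviour carries over to V: it stays at exp(−αR0) for X0 = 0 and grows without bound for X0 ≠ 0 as T ↓ 0, in line with the singular initial condition (20).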

It remains to show the singular initial condition. First we note that for X0 = 0, it holds

S(T, 0) = 0 for all T and therefore

\[
\lim_{T \downarrow 0} V(T, 0, R_0) = \lim_{T \downarrow 0} e^{-\alpha R_0 + \alpha S(T, 0)}
= \lim_{T \downarrow 0} e^{-\alpha R_0} = e^{-\alpha R_0}.
\]

Further, we derive that

\[
\begin{aligned}
\int_0^T L(X_t, \dot{X}_t)\,dt
&= \int_0^T \Bigl(\frac{\alpha}{2} X_t^\top \Sigma X_t - b^\top X_t\Bigr)\,dt + \int_0^T f(\dot{X}_t)\,dt \\
&\ge \int_0^T c_3\,dt + T f\!\left(\int_0^T \frac{\dot{X}_t}{T}\,dt\right) \\
&= T c_3 + T f\!\left(-\frac{X_0}{T}\right),
\end{aligned}
\]

using the lower bound c3 on α/2 · q⊤Σq − b⊤q from the proof of condition (H5) in theorem

3.5 and Jensen's inequality for the convex function f . If X0 ≠ 0, by our assumption of f

having superlinear growth, the last term blows up as we send T ↓ 0. So S, as well as V ,

approaches ∞ as T ↓ 0.
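To make the blow-up concrete, consider as an illustrative assumption (not the general setting of the proposition) a power-law cost f(ξ) = η|ξ|^p with hypothetical constants η > 0 and p > 1. For X0 ≠ 0 the lower bound then becomes

\[
T c_3 + T f\!\left(-\frac{X_0}{T}\right) = T c_3 + \eta\, |X_0|^p\, T^{1-p} \longrightarrow \infty \quad \text{as } T \downarrow 0 .
\]

For a merely linear cost (p = 1) the last term would equal η|X0| for every T and stay bounded, which shows why superlinear growth of f is what drives the singularity.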


4 Conclusion

The aim of this essay was to introduce the Almgren model as an example of a model

for price impact on markets. We saw that under the strong assumptions of a linear

temporary impact and static trading strategies one can explicitly solve the mean-variance

minimisation of the costs incurred. Also, we were able to show that with linear impact one

can use dynamic programming to derive optimal adaptive strategies which do strictly

better in minimising mean-variance since they react to price changes and adjust the

trading speed. In the last part of the essay we were able to prove a rather counter-intuitive

fact, namely that if one changes to an optimisation with respect to CARA utility there

is also a unique optimal trading strategy but it is a deterministic one. Interestingly, one

can get this optimal strategy by a mean-variance optimisation over static strategies again

where the level of risk aversion λ is given by half the constant absolute risk aversion α

from the CARA utility function. For the case of a one-asset market with linear temporary

impact, we then see that the optimal strategy is the one we derived in section 2.2.

We have, however, mentioned that a linear temporary impact is strongly refuted by em-

pirical analyses. Unfortunately, the Almgren model itself makes no prediction about the

shape of the impact. It only relies on some assumptions which are economically necessary

but do not describe the impact in more detail. It is, therefore, required to estimate the

impact from real data or to establish another model which describes its shape. One way

of doing this is to look at the so-called limit order book (LOB for short), which contains

all current limit orders, i.e. buy orders at or below the bid price and sell orders at or

above the ask price. When trading many shares, one would first fill all limit orders at the

bid or ask price, respectively, and then continue with orders at lower or higher prices.

So the number of shares available at each price determines the actual price one has to

pay for an order of a certain number of shares, and this can be converted into a price

impact. In order to determine permanent and temporary price impact, one would have to

know how the LOB behaves over time. If all filled orders are instantaneously replaced

by new ones, we identify the price impact as temporary. Although the resilience of the

order book is an empirical fact, complete recovery until the next trade is only plausible

for long times between the trades or small trade sizes. The assumption in the Almgren

model that price impact only consists of permanent and temporary impact, therefore,

may be too strong to make it applicable to trading at short intervals. It would make sense

to introduce some decay of the impact as it was done by Gatheral in [8] or in the LOB

approach first introduced by Obizhaeva and Wang in [12]. For sufficiently long intervals

between the executions of trades, however, the LOB is a good tool to estimate temporary

impact. The permanent impact can also be modelled using this approach by identifying

the new bid or ask price which ensues after the recovery.
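The conversion from order book depth to a price impact described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name walk_the_book, the snapshot asks, and the choice of measuring impact as the average fill price minus the best ask are hypothetical and not taken from the essay. The sketch also ignores any refilling of the book over time, so it says nothing about the split into temporary and permanent impact.

```python
def walk_the_book(ask_levels, quantity):
    """Average price paid when buying `quantity` shares against the ask side.

    ask_levels: list of (price, shares_available) tuples, in any order.
    Orders are filled at the best (lowest) price first, then at higher prices.
    """
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    remaining = quantity
    cost = 0.0
    for price, available in sorted(ask_levels):
        filled = min(remaining, available)
        cost += filled * price
        remaining -= filled
        if remaining == 0:
            break
    if remaining > 0:
        raise ValueError("order exceeds the visible depth of the book")
    return cost / quantity


# Hypothetical snapshot of the ask side of a limit order book
asks = [(100.00, 500), (100.01, 300), (100.02, 1000)]
best_ask = min(price for price, _ in asks)

if __name__ == "__main__":
    for size in (100, 500, 1000):
        average_price = walk_the_book(asks, size)
        # Impact measured as the average fill price in excess of the best ask
        print(size, average_price, average_price - best_ask)
```

Orders no larger than the depth at the best ask are filled at the quoted price, while larger orders pay progressively more per share. This is the mechanism by which the shape of the book determines the impact.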


List of Figures

1 Optimal Trading Trajectories for Different Levels of Risk Aversion

2 The Efficient Frontier

3 Adaptive Trajectories


References

[1] R. Almgren and N. Chriss. Optimal execution of portfolio transactions. J. Risk 3: 5–39 (2000)

[2] R. Almgren, C. Thum, H. L. Hauptmann and H. Li. Direct estimation of equity market impact. Risk 18: 57–62 (2005)

[3] R. F. Almgren. Optimal execution with nonlinear impact functions and trading-enhanced risk. Applied Mathematical Finance 10(1): 1–18 (2003)

[4] BARRA. Market impact model handbook (Berkeley, California, Barra, 1997)

[5] S. H. Benton. The Hamilton-Jacobi equation: a global approach. Mathematics in Science and Engineering. Academic Press, New York (1977)

[6] B. Biais, P. Hillion and C. Spatt. An empirical analysis of the limit order book and the order flow in the Paris Bourse. Journal of Finance 50(5): 1655–1689 (1995)

[7] X. Gabaix, P. Gopikrishnan, V. Plerou and H. E. Stanley. A theory of power law distributions in financial market fluctuations. Nature 423: 267–270 (2003)

[8] J. Gatheral. No-dynamic-arbitrage and market impact. Quantitative Finance 10(7): 749–759 (2010)

[9] G. Huberman and W. Stanzl. Price manipulation and quasi-arbitrage. Econometrica 72(4): 1247–1275 (2004)

[10] J. Lorenz and R. Almgren. Mean-variance optimal adaptive execution. Applied Mathematical Finance 18(5-6): 395–422 (2011)

[11] S. Mallaby. More Money Than God. The Penguin Press, New York (2010)

[12] A. Obizhaeva and J. Wang. Optimal trading strategy and supply/demand dynamics. Journal of Financial Markets 16(1): 1–32 (2013)

[13] M. Potters and J. P. Bouchaud. More statistical properties of order books and price impact. Physica A: Statistical Mechanics and its Applications 324(1): 133–140 (2003)

[14] R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, New Jersey (1970)

[15] A. Schied, T. Schöneborn and M. Tehranchi. Optimal basket liquidation for CARA investors is deterministic. Applied Mathematical Finance 17(5-6): 471–489 (2010)

[16] R. Smith. Street hazard. The Wall Street Journal (1985)
