
  • Numerical Methods for Finance
    Dr Robert Nurnberg

    This course introduces the major numerical methods needed for quantitative work in

    finance. To this end, the course will strike a balance between a general survey of

    significant numerical methods anyone working in a quantitative field should know, and a

    detailed study of some numerical methods specific to financial mathematics. In the first

    part the course will cover e.g.

    linear and nonlinear equations, interpolation and optimization,

    while the second part introduces e.g.

    binomial and trinomial methods, finite difference methods, Monte-Carlo simulation,

    random number generators, option pricing and hedging.

    The assessment consists of an exam (80%) and a project (20%).

    1

  • 1. References

    1. Burden and Faires (2004), Numerical Analysis.

    2. Clewlow and Strickland (1998), Implementing Derivative Models.

    3. Fletcher (2000), Practical Methods of Optimization.

    4. Glasserman (2004), Monte Carlo Methods in Financial Engineering.

    5. Higham (2004), An Introduction to Financial Option Valuation.

    6. Hull (2005), Options, Futures, and Other Derivatives.

    7. Kwok (1998), Mathematical Models of Financial Derivatives.

    8. Press et al. (1992), Numerical Recipes in C. (online)

    9. Press et al. (2002), Numerical Recipes in C++.

    10. Seydel (2006), Tools for Computational Finance.

    11. Wilmott, Dewynne, and Howison (1993), Option Pricing.

    2

  • 12. Winston (1994), Operations Research: Applications and Algorithms.

    3

  • 2. Preliminaries

    1. Algorithms.

    An algorithm is a set of instructions to construct an approximate solution to a math-

    ematical problem.

    A basic requirement for an algorithm is that the error can be made as small as we like.

    Usually, the higher the accuracy we demand, the greater is the amount of computation

    required.

    An algorithm is convergent if it produces a sequence of values which converge to the

    desired solution of the problem.

    Example: Given c > 1 and ε > 0, use the bisection method to seek an approximation
    to √c with error not greater than ε.

    4

  • Example

    Find x̄ = √c, c > 1 constant.

    Answer

    x = √c  ⟺  x² = c  ⟺  f(x) := x² − c = 0

    f(1) = 1 − c < 0 and f(c) = c² − c > 0  ⟹  ∃ x̄ ∈ (1, c) s.t. f(x̄) = 0

    f′(x) = 2x > 0  ⟹  f monotonically increasing  ⟹  x̄ is unique.

    Denote In := [an, bn] with I0 = [a0, b0] = [1, c]. Let xn := (an + bn)/2.

    (i) If f(xn) = 0 then x̄ = xn.

    (ii) If f(an) f(xn) > 0 then x̄ ∈ (xn, bn); let an+1 := xn, bn+1 := bn.

    (iii) If f(an) f(xn) < 0 then x̄ ∈ (an, xn); let an+1 := an, bn+1 := xn.

    5

  • Length of In:  m(In) = (1/2) m(In−1) = · · · = (1/2^n) m(I0) = (c − 1)/2^n

    Algorithm

    The algorithm stops if m(In) < ε, and we let x⋆ := xn.

    Error as small as we like?

    x̄, x⋆ ∈ In  ⟹  error |x⋆ − x̄| = |xn − x̄| ≤ m(In) → 0 as n → ∞. ✓

    Convergence?

    I0 ⊃ I1 ⊃ · · · ⊃ In ⊃ · · ·  ⟹  ∃! x̄ ∈ ∩_{n≥0} In with f(x̄) = 0, i.e. x̄ = √c. ✓

    Implementation:

    No need to define In = [an, bn]. It is sufficient to store only 3 points throughout.

    Suppose x̄ ∈ (a, b) and define x := (a + b)/2. If x̄ ∈ (a, x) let a := a, b := x; otherwise let a := x, b := b.

    6
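    A minimal C++ sketch of this bisection loop for √c (illustrative only; the function name and the use of a tolerance eps for ε are my own choices):

        #include <cstdio>

        // Bisection for the root of f(x) = x*x - c on [1, c], c > 1,
        // storing only the three points a, b, x as described above.
        double bisect_sqrt(double c, double eps) {
            double a = 1.0, b = c;               // f(a) < 0 < f(b)
            while (b - a >= eps) {
                double x = 0.5 * (a + b);
                double fx = x * x - c;
                if (fx == 0.0) return x;         // hit the root exactly
                if (fx < 0.0) a = x;             // root lies in (x, b)
                else          b = x;             // root lies in (a, x)
            }
            return 0.5 * (a + b);
        }

        int main() {
            std::printf("%.7f\n", bisect_sqrt(7.0, 1e-7));   // approx 2.6457513
            return 0;
        }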

  • 2. Errors.

    There are various errors in computed solutions, such as

    discretization error (discrete approximation to continuous systems), truncation error (termination of an infinite process), and rounding error (finite digit limitation in computer arithmetic).

    If a is a number and ã is an approximation to a, then

    the absolute error is |a − ã| and the relative error is |a − ã| / |a|, provided a ≠ 0.

    Example: Discuss sources of errors in deriving the numerical solution of the nonlinear differential equation x′ = f(x) on the interval [a, b] with initial condition x(a) = x0.

  • Example

    discretization error

    x′ = f(x)  [differential equation]

    (x(t + h) − x(t)) / h = f(x(t))  [difference equation]

    DE = | (x(t + h) − x(t)) / h − x′(t) |

    truncation error

    lim_{n→∞} xn = x; approximate x with xN, N a large number.

    TE = |x − xN|

    rounding error

    We cannot express x exactly, due to the finite digit limitation. We get x̃ instead.

    RE = |x − x̃|

    Total error = DE + TE + RE.

    8

  • 3.Well/Ill Conditioned Problems.

    A problem is well-conditioned (or ill-conditioned) if every small perturbation of the

    data results in a small (or large) change in the solution.

    Example: Show that the solution to equations x + y = 2 and x + 1.01 y = 2.01 is

    ill-conditioned.

    Exercise: Show that the following problems are ill-conditioned:

    (a) the solution to the differential equation x″ − 10 x′ − 11 x = 0 with initial conditions x(0) = 1 and x′(0) = −1,
    (b) the value of q(x) = x² + x − 1150 if x is near a root.

    9

  • Example

    x + y = 2,  x + 1.01 y = 2.01   ⟹   x = 1,  y = 1

    Change 2.01 to 2.02:  x + y = 2,  x + 1.01 y = 2.02   ⟹   x = 0,  y = 2

    I.e. a 0.5% change in the data produces a 100% change in the solution: ill-conditioned!

    [reason: det (1 1; 1 1.01) = 0.01, i.e. the matrix is nearly singular]

    10

  • 4. Taylor Polynomials.

    Suppose f, f′, . . . , f^(n) are continuous on [a, b] and f^(n+1) exists on (a, b). Let x0 ∈ [a, b]. Then for every x ∈ [a, b] there exists ξ between x0 and x with

    f(x) = Σ_{k=0}^{n} f^(k)(x0)/k! (x − x0)^k + Rn(x),

    where Rn(x) = f^(n+1)(ξ)/(n + 1)! (x − x0)^{n+1} is the remainder.

    [Equivalently: Rn(x) = ∫_{x0}^{x} f^(n+1)(t)/n! (x − t)^n dt.]

    Examples:

    exp(x) = Σ_{k=0}^{∞} x^k/k!,   sin(x) = Σ_{k=0}^{∞} (−1)^k/(2k + 1)! x^{2k+1}

    11

  • 5. Gradient and Hessian Matrix.

    Assume f : R^n → R.

    The gradient of f at a point x, written as ∇f(x), is a column vector in R^n with ith component ∂f/∂xi (x).

    The Hessian matrix of f at x, written as ∇²f(x), is an n × n matrix with (i, j)th component ∂²f/∂xi∂xj (x). [As ∂²f/∂xi∂xj = ∂²f/∂xj∂xi, ∇²f(x) is symmetric.]

    Examples:

    f(x) = aᵀx, a ∈ R^n  ⟹  ∇f = a, ∇²f = 0

    f(x) = ½ xᵀAx, A symmetric  ⟹  ∇f(x) = Ax, ∇²f = A

    f(x) = exp(½ xᵀAx), A symmetric  ⟹  ∇f(x) = exp(½ xᵀAx) Ax,  ∇²f(x) = exp(½ xᵀAx) AxxᵀA + exp(½ xᵀAx) A

    12

  • 6. Taylor's Theorem.

    Suppose that f : R^n → R is continuously differentiable and that p ∈ R^n. Then we have

    f(x + p) = f(x) + ∇f(x + t p)ᵀ p,   for some t ∈ (0, 1).

    Moreover, if f is twice continuously differentiable, we have

    ∇f(x + p) = ∇f(x) + ∫₀¹ ∇²f(x + t p) p dt,   and

    f(x + p) = f(x) + ∇f(x)ᵀ p + ½ pᵀ ∇²f(x + t p) p,   for some t ∈ (0, 1).

    13

  • 7. Positive Definite Matrices.

    An n × n matrix A = (aij) is positive definite if it is symmetric (i.e. Aᵀ = A) and xᵀAx > 0 for all x ∈ R^n \ {0}. [I.e. xᵀAx ≥ 0 with equality only if x = 0.]

    The following statements are equivalent:

    (a) A is a positive definite matrix,

    (b) all eigenvalues of A are positive,

    (c) all leading principal minors of A are positive.

    The leading principal minors of A are the determinants Δk, k = 1, 2, . . . , n, defined by

    Δ1 = det[a11],   Δ2 = det [a11 a12; a21 a22],   . . . ,   Δn = det A.

    A matrix A is symmetric and positive semi-definite if xᵀAx ≥ 0 for all x ∈ R^n.

    Exercise.

    Show that any positive definite matrix A has only positive diagonal entries.

    14

  • 8. Convex Sets and Functions.

    A set S ⊂ R^n is a convex set if the straight line segment connecting any two points in S lies entirely inside S, i.e., for any two points x, y ∈ S we have

    λ x + (1 − λ) y ∈ S   ∀ λ ∈ [0, 1].

    A function f : D → R is a convex function if its domain D ⊂ R^n is a convex set and if for any two points x, y ∈ D we have

    f(λ x + (1 − λ) y) ≤ λ f(x) + (1 − λ) f(y)   ∀ λ ∈ [0, 1].

    Exercise.

    Let D ⊂ R^n be a convex, open set.

    (a) If f : D → R is continuously differentiable, then f is convex if and only if

    f(y) ≥ f(x) + ∇f(x)ᵀ (y − x)   ∀ x, y ∈ D.

    (b) If f is twice continuously differentiable, then f is convex if and only if ∇²f(x) is positive semi-definite for all x in the domain.

    15

  • Exercise 2.8.

    (a) As f is convex we have for any x, y in the convex set D that

    f(λ y + (1 − λ) x) ≤ λ f(y) + (1 − λ) f(x)   ∀ λ ∈ [0, 1].

    Hence

    f(y) ≥ [f(x + λ (y − x)) − f(x)] / λ + f(x).

    Letting λ → 0 yields f(y) ≥ f(x) + ∇f(x)ᵀ (y − x).

    For any x1, x2 ∈ D and λ ∈ [0, 1] let x := λ x1 + (1 − λ) x2 ∈ D and y := x1. On noting that y − x = x1 − λ x1 − (1 − λ) x2 = (1 − λ) (x1 − x2) we have that

    f(x1) = f(y) ≥ f(x) + ∇f(x)ᵀ (y − x) = f(x) + (1 − λ) ∇f(x)ᵀ (x1 − x2).   (∗)

    Similarly, letting x := λ x1 + (1 − λ) x2 and y := x2 gives, on noting that y − x = λ (x2 − x1), that

    f(x2) ≥ f(x) + λ ∇f(x)ᵀ (x2 − x1).   (∗∗)

  • Combining λ (∗) + (1 − λ) (∗∗) gives λ f(x1) + (1 − λ) f(x2) ≥ f(x) = f(λ x1 + (1 − λ) x2)  ⟹  f is convex.

    (b) For any x, x0 ∈ D use Taylor's theorem at x0:

    f(x) = f(x0) + ∇f(x0)ᵀ (x − x0) + ½ (x − x0)ᵀ ∇²f(x0 + λ (x − x0)) (x − x0),   λ ∈ (0, 1).

    As ∇²f is positive semi-definite, this immediately gives f(x) ≥ f(x0) + ∇f(x0)ᵀ (x − x0)  ⟹  f is convex.

    Assume ∇²f is not positive semi-definite in the domain D. Then there exist x0 ∈ D and x ∈ R^n s.t. xᵀ ∇²f(x0) x < 0. As D is open we can find x1 := x0 + λ x ∈ D, for λ > 0 sufficiently small. Hence (x1 − x0)ᵀ ∇²f(x0) (x1 − x0) < 0. For x1 sufficiently close to x0 the continuity of ∇²f gives (x1 − x0)ᵀ ∇²f(x0 + λ (x1 − x0)) (x1 − x0) < 0 for all λ ∈ (0, 1). Taylor's theorem then yields f(x1) < f(x0) + ∇f(x0)ᵀ (x1 − x0). This contradicts f being convex, see (a).

    17

  • 9. Vector Norms.

    A vector norm on R^n is a function ‖·‖ from R^n into R with the following properties:

    (i) ‖x‖ ≥ 0 for all x ∈ R^n, and ‖x‖ = 0 if and only if x = 0.

    (ii) ‖λ x‖ = |λ| ‖x‖ for all λ ∈ R and x ∈ R^n.

    (iii) ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ R^n.

    Common vector norms are the l1, l2 (Euclidean), and l∞ norms:

    ‖x‖1 = Σ_{i=1}^{n} |xi|,   ‖x‖2 = { Σ_{i=1}^{n} xi² }^{1/2},   ‖x‖∞ = max_{1≤i≤n} |xi|.

    Exercise.

    (a) Prove that ‖·‖1, ‖·‖2 and ‖·‖∞ are norms.

    (b) Given a symmetric positive definite matrix A, prove that

    ‖x‖A := √(xᵀ A x)

    is a norm.

    18

  • Example.

    Draw the regions defined by ‖x‖1 ≤ 1, ‖x‖2 ≤ 1, ‖x‖∞ ≤ 1 when n = 2.

    [Figure: the unit balls of the l1, l2 and l∞ norms.]

    Exercise. Prove that for all x, y ∈ R^n we have

    (a)  Σ_{i=1}^{n} |xi yi| ≤ ‖x‖2 ‖y‖2   [Cauchy–Schwarz inequality]

    and

    (b)  (1/√n) ‖x‖2 ≤ ‖x‖∞ ≤ ‖x‖1 ≤ √n ‖x‖2.

    19

  • 10. Spectral Radius.

    The spectral radius of a matrix A ∈ R^{n×n} is defined by ρ(A) = max_{1≤i≤n} |λi|, where λ1, . . . , λn are all the eigenvalues of A.

    11. Matrix Norms.

    For an n × n matrix A, the natural matrix norm ‖A‖ for a given vector norm ‖·‖ is defined by

    ‖A‖ = max_{‖x‖=1} ‖Ax‖.

    The common matrix norms are

    ‖A‖1 = max_{1≤j≤n} Σ_{i=1}^{n} |aij|,   ‖A‖2 = √(ρ(AᵀA))  (= ρ(A) if A = Aᵀ),   ‖A‖∞ = max_{1≤i≤n} Σ_{j=1}^{n} |aij|.

    Exercise: Compute ‖A‖1, ‖A‖∞, and ‖A‖2 for

    A = [ 1 1 0 ; 1 2 1 ; −1 1 2 ].

    (Answer: ‖A‖1 = ‖A‖∞ = 4 and ‖A‖2 = √(7 + √7) ≈ 3.1.)

    20

  • 12. Convergence.

    A sequence of vectors {x^(k)} ⊂ R^n is said to converge to a vector x ∈ R^n if ‖x^(k) − x‖ → 0 as k → ∞ for an arbitrary vector norm ‖·‖. This is equivalent to componentwise convergence, i.e., x_i^(k) → x_i as k → ∞, i = 1, . . . , n.

    A square matrix A ∈ R^{n×n} is said to be convergent if A^k → 0 as k → ∞, which is equivalent to (A^k)ij → 0 as k → ∞ for all i, j.

    The following statements are equivalent:

    (i) A is a convergent matrix,

    (ii) ρ(A) < 1,

    (iii) lim_{k→∞} A^k x = 0 for every x ∈ R^n.

    Exercise. Show that A is convergent, where

    A = [ 1/2 0 ; 1/4 1/2 ].

    21

  • 3. Algebraic Equations

    1. Decomposition Methods for Linear Equations.

    A matrix A ∈ R^{n×n} is said to have an LU decomposition if A = LU, where L ∈ R^{n×n} is a lower triangular matrix (lij = 0 if 1 ≤ i < j ≤ n) and U ∈ R^{n×n} is an upper triangular matrix (uij = 0 if 1 ≤ j < i ≤ n).

    The decomposition is unique if one assumes e.g. lii = 1 for 1 ≤ i ≤ n.

    Here L is lower triangular with entries l11; l21, l22; l31, l32, l33; . . . ; ln1, ln2, ln3, . . . , lnn, and U is upper triangular with entries u11, u12, u13, . . . , u1n; u22, u23, . . . , u2n; . . . ; u(n−1,n−1), u(n−1,n); unn.

    22

  • In general, the diagonal elements of either L or U are given and the remaining elements of the matrices are determined by directly comparing the two sides of the equation.

    The linear system Ax = b is then equivalent to Ly = b and Ux = y.

    Exercise.

    Show that the solution to Ly = b is

    y1 = b1/l11,   yi = (bi − Σ_{k=1}^{i−1} lik yk)/lii,   i = 2, . . . , n

    (forward substitution) and the solution to Ux = y is

    xn = yn/unn,   xi = (yi − Σ_{k=i+1}^{n} uik xk)/uii,   i = n − 1, . . . , 1

    (backward substitution).

    23

  • 2. Crout Algorithm.

    Exercise.

    Let A be tridiagonal, i.e. aij = 0 if |i − j| > 1 (aij = 0 except perhaps a(i−1,i), aii and a(i,i+1)), and strictly diagonally dominant (|aii| > Σ_{j≠i} |aij| for i = 1, . . . , n).

    Show that A can be factorized as A = LU where lii = 1 for i = 1, . . . , n, u11 = a11, and

    u(i,i+1) = a(i,i+1),   l(i+1,i) = a(i+1,i)/uii,   u(i+1,i+1) = a(i+1,i+1) − l(i+1,i) u(i,i+1)

    for i = 1, . . . , n − 1. [Note: L and U are tridiagonal.]

    C++ Exercise: Write a program to solve a tridiagonal and strictly diagonally dominant linear equation Ax = b by the Crout algorithm. The inputs are the number of variables n, the matrix A, and the vector b. The output is the solution x.

    24
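    A hedged C++ sketch of this Crout factorisation and solve for a tridiagonal system (the three-diagonal storage and the function name are my own choices, not prescribed by the exercise):

        #include <cstdio>
        #include <vector>

        // Solve A x = b for tridiagonal A with sub-diagonal sub[i] = a_{i+2,i+1},
        // diagonal dia[i] = a_{i+1,i+1} and super-diagonal sup[i] = a_{i+1,i+2} (0-based arrays).
        std::vector<double> croutSolve(const std::vector<double>& sub,
                                       const std::vector<double>& dia,
                                       const std::vector<double>& sup,
                                       const std::vector<double>& b) {
            int n = (int)dia.size();
            std::vector<double> u(n), l(n), y(n), x(n);
            u[0] = dia[0];
            for (int i = 0; i + 1 < n; ++i) {          // l_{i+1,i} and u_{i+1,i+1}
                l[i + 1] = sub[i] / u[i];
                u[i + 1] = dia[i + 1] - l[i + 1] * sup[i];
            }
            y[0] = b[0];                                // forward substitution, L y = b (l_ii = 1)
            for (int i = 1; i < n; ++i) y[i] = b[i] - l[i] * y[i - 1];
            x[n - 1] = y[n - 1] / u[n - 1];             // backward substitution, U x = y
            for (int i = n - 2; i >= 0; --i) x[i] = (y[i] - sup[i] * x[i + 1]) / u[i];
            return x;
        }

        int main() {
            // test data (my own): A = tridiag(1, 4, 1), b chosen so that x = (1, 1, 1)
            std::vector<double> x = croutSolve({1, 1}, {4, 4, 4}, {1, 1}, {5, 6, 5});
            std::printf("%g %g %g\n", x[0], x[1], x[2]);
            return 0;
        }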

  • Exercise 3.2.

    u11 = a11 and u(i,i+1) = a(i,i+1), l(i+1,i) = a(i+1,i)/uii, u(i+1,i+1) = a(i+1,i+1) − l(i+1,i) u(i,i+1), for i = 1, . . . , n − 1, can easily be shown.

    It remains to show that uii ≠ 0 for i = 1, . . . , n. We proceed by induction to show that

    |uii| > |a(i,i+1)|,

    where for convenience we define a(n,n+1) := 0.

    i = 1:  |u11| = |a11| > |a12|. ✓

    i ↦ i + 1:

    |u(i+1,i+1)| = |a(i+1,i+1) − a(i+1,i) a(i,i+1)/uii| ≥ |a(i+1,i+1)| − |a(i+1,i)| |a(i,i+1)|/|uii| ≥ |a(i+1,i+1)| − |a(i+1,i)| > |a(i+1,i+2)|. ✓

    Overall we have that |uii| > 0, and so the Crout algorithm is well defined. Moreover, det(A) = det(L) det(U) = det(U) = Π_{i=1}^{n} uii ≠ 0.

  • 3. Choleski Algorithm.

    Exercise.

    Let A be a positive definite matrix. Show that A can be factorized as A = LLᵀ, where L is a lower triangular matrix.

    (i) Compute the 1st column:

    l11 = √a11,   li1 = ai1/l11,   i = 2, . . . , n.

    (ii) For j = 2, . . . , n − 1 compute the jth column:

    ljj = (ajj − Σ_{k=1}^{j−1} ljk²)^{1/2},   lij = (aij − Σ_{k=1}^{j−1} lik ljk)/ljj,   i = j + 1, . . . , n.

  • (iii) Compute the nth column:

    lnn = (ann − Σ_{k=1}^{n−1} lnk²)^{1/2}.

    27
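    A hedged C++ sketch of these three steps (dense storage, no pivoting; the test matrix is my own example of a positive definite matrix, with exact factor L = [2 0 0; 1 2 0; 1 1 2]):

        #include <cmath>
        #include <cstdio>
        #include <vector>

        int main() {
            const int n = 3;
            double A[3][3] = {{4, 2, 2}, {2, 5, 3}, {2, 3, 6}};
            std::vector<std::vector<double>> L(n, std::vector<double>(n, 0.0));
            for (int j = 0; j < n; ++j) {
                double s = A[j][j];
                for (int k = 0; k < j; ++k) s -= L[j][k] * L[j][k];
                L[j][j] = std::sqrt(s);                        // l_jj; steps (i)-(iii) combined
                for (int i = j + 1; i < n; ++i) {
                    double t = A[i][j];
                    for (int k = 0; k < j; ++k) t -= L[i][k] * L[j][k];
                    L[i][j] = t / L[j][j];                     // l_ij below the diagonal
                }
            }
            for (int i = 0; i < n; ++i, std::printf("\n"))
                for (int j = 0; j < n; ++j) std::printf("%8.4f ", L[i][j]);
            return 0;
        }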

  • 4. Iterative Methods for Linear Equations.

    Split A into A = M + N with M nonsingular, and convert the equation Ax = b into an equivalent equation x = Cx + d with C = −M⁻¹N and d = M⁻¹b.

    Choose an initial vector x^(0) and then generate a sequence of vectors by

    x^(k) = C x^(k−1) + d,   k = 1, 2, . . .

    The resulting sequence converges to the solution of Ax = b, for an arbitrary initial vector x^(0), if and only if ρ(C) < 1.

    The objective is to choose M such that M⁻¹ is easy to compute and ρ(C) < 1.

    The iteration stops if ‖x^(k) − x^(k−1)‖ < ε.

    [Note: In practice one solves M x^(k) = −N x^(k−1) + b, for k = 1, 2, . . ..]

    28

  • Claim.

    The iteration x^(k) = C x^(k−1) + d is convergent if and only if ρ(C) < 1.

    Proof.

    Define e^(k) := x^(k) − x, the error of the kth iterate. Then

    e^(k) = C x^(k−1) + d − (C x + d) = C (x^(k−1) − x) = C e^(k−1) = C² e^(k−2) = . . . = C^k e^(0),

    where e^(0) = x^(0) − x is an arbitrary vector.

    Assume C is similar to the diagonal matrix Λ = diag(λ1, . . . , λn), where the λi are the eigenvalues of C.

    ∃ X nonsingular s.t. C = X Λ X⁻¹

    ⟹  e^(k) = C^k e^(0) = X Λ^k X⁻¹ e^(0) = X diag(λ1^k, . . . , λn^k) X⁻¹ e^(0) → 0 as k → ∞  ⟺  |λi| < 1, i = 1, . . . , n  ⟺  ρ(C) < 1.

    29

  • 5. Jacobi Algorithm.

    Exercise: Let M = D and N = L + U (L the strict lower triangular part of A, D the diagonal, U the strict upper triangular part). Show that the ith component at the kth iteration is

    x_i^(k) = (1/aii) ( bi − Σ_{j=1}^{i−1} aij x_j^(k−1) − Σ_{j=i+1}^{n} aij x_j^(k−1) )

    for i = 1, . . . , n.

    6. Gauss–Seidel Algorithm.

    Exercise: Let M = D + L and N = U. Show that the ith component at the kth iteration is

    x_i^(k) = (1/aii) ( bi − Σ_{j=1}^{i−1} aij x_j^(k) − Σ_{j=i+1}^{n} aij x_j^(k−1) )

    for i = 1, . . . , n.

    30

  • 7. SOR Algorithm.

    Exercise.

    Let M = (1/ω) D + L and N = U + (1 − 1/ω) D, where 0 < ω < 2. Show that the ith component at the kth iteration is

    x_i^(k) = (1 − ω) x_i^(k−1) + (ω/aii) ( bi − Σ_{j=1}^{i−1} aij x_j^(k) − Σ_{j=i+1}^{n} aij x_j^(k−1) )

    for i = 1, . . . , n.

    C++ Exercise: Write a program to solve a diagonally dominant linear equation Ax = b by the SOR algorithm. The inputs are the number of variables n, the matrix A, the vector b, the initial vector x0, the relaxation parameter ω, and the tolerance ε. The output is the number of iterations k and the approximate solution xk.

    31
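    A hedged C++ sketch of this SOR loop (dense storage; as test data I use the tridiagonal matrix that reappears in Exercise 3.8 below, with a right-hand side of my own for which the solution is (3, 4, −5)):

        #include <algorithm>
        #include <cmath>
        #include <cstdio>
        #include <vector>

        int main() {
            const int n = 3;
            double A[3][3] = {{4, 3, 0}, {3, 4, -1}, {0, -1, 4}};
            double b[3] = {24, 30, -24};
            double omega = 1.24, eps = 1e-8;
            std::vector<double> x(n, 1.0);                   // initial vector x^(0)
            int k = 0;
            double diff = 1.0;
            while (diff >= eps && k < 1000) {
                diff = 0.0;
                for (int i = 0; i < n; ++i) {
                    double s = b[i];
                    for (int j = 0; j < n; ++j)
                        if (j != i) s -= A[i][j] * x[j];     // new x_j for j < i, old for j > i
                    double xi = (1.0 - omega) * x[i] + omega * s / A[i][i];
                    diff = std::max(diff, std::fabs(xi - x[i]));
                    x[i] = xi;
                }
                ++k;
            }
            std::printf("k = %d, x = (%.6f, %.6f, %.6f)\n", k, x[0], x[1], x[2]);
            return 0;
        }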

  • 8. Special Matrices.

    If A is strictly diagonally dominant, then Jacobi and Gauss–Seidel converge for any initial vector x^(0). In addition, SOR converges for ω ∈ (0, 1].

    If A is positive definite and 0 < ω < 2, then the SOR method converges for any initial vector x^(0).

    If A is positive definite and tridiagonal, then ρ(CGS) = [ρ(CJ)]² < 1 and the optimal choice of ω for the SOR method is

    ω = 2 / (1 + √(1 − ρ(CGS))) ∈ [1, 2).

    With this choice of ω, ρ(CSOR) = ω − 1 ≤ ρ(CGS).

    Exercise.

    Find the optimal ω for the SOR method for the matrix

    A = [ 4 3 0 ; 3 4 −1 ; 0 −1 4 ].

    (Answer: ω ≈ 1.24.)

    32

  • 9. Condition Numbers.

    The condition number of a nonsingular matrix A relative to a norm ‖·‖ is defined by

    κ(A) = ‖A‖ ‖A⁻¹‖.

    Note that κ(A) = ‖A‖ ‖A⁻¹‖ ≥ ‖A A⁻¹‖ = ‖I‖ = max_{‖x‖=1} ‖x‖ = 1.

    A matrix A is well-conditioned if κ(A) is close to one, and is ill-conditioned if κ(A) is much larger than one.

    Suppose ‖δA‖ < 1/‖A⁻¹‖. Then the solution x̃ to (A + δA) x̃ = b + δb approximates the solution x of Ax = b with error estimate

    ‖x̃ − x‖/‖x‖ ≤ κ(A)/(1 − ‖δA‖ ‖A⁻¹‖) ( ‖δb‖/‖b‖ + ‖δA‖/‖A‖ ).

    In particular, if δA = 0 (no perturbation to the matrix A) then

    ‖x̃ − x‖/‖x‖ ≤ κ(A) ‖δb‖/‖b‖.

    33

  • Example.

    Consider Example 1.3.

    A =

    (1 1

    1 1.01

    )

    A1 = 1detA

    (1.01 11 1

    )=

    1

    0.01

    (1.01 11 1

    )=

    (101 100100 100

    )Recall

    A1 = max1jn

    ni=1

    |aij| .Hence

    A1 = max(2, 2.01) = 2.01 , A11 = max(201, 200) = 201 .

    1(A) = A1 A11 = 404.01 1 (ill-conditioned!)

    Similarly = 404.01 and 2 = (A) (A1) = maxmin = 402.0075.

    34

  • 10. Hilbert Matrix.

    An n × n Hilbert matrix Hn = (hij) is defined by hij = 1/(i + j − 1) for i, j = 1, 2, . . . , n.

    Hilbert matrices are notoriously ill-conditioned and κ(Hn) → ∞ very rapidly as n → ∞.

    Hn has first row 1, 1/2, . . . , 1/n, second row 1/2, 1/3, . . . , 1/(n + 1), and last row 1/n, 1/(n + 1), . . . , 1/(2n − 1).

    Exercise.

    Compute the condition numbers κ1(H3) and κ1(H6).

    35

  • (Answer: κ1(H3) = 748 and κ1(H6) ≈ 2.9 × 10⁶.)

    36

  • 11. Fixed Point Method for Nonlinear Equations.

    A function g : R → R has a fixed point x̄ if g(x̄) = x̄.

    A function g is a contraction mapping on [a, b] if g : [a, b] → [a, b] and

    |g′(x)| ≤ L < 1   ∀ x ∈ (a, b),

    where L is a constant.

    Exercise.

    Assume g is a contraction mapping on [a, b]. Prove that g has a unique fixed point x̄ in [a, b], and that for any x0 ∈ [a, b] the sequence defined by

    xn+1 = g(xn),   n ≥ 0,

    converges to x̄. The algorithm is called fixed point iteration.

    37

  • Exercise 3.11.

    Existence:

    Define h(x) = x − g(x) on [a, b]. Then h(a) = a − g(a) ≤ 0 and h(b) = b − g(b) ≥ 0. As h is continuous, ∃ c ∈ [a, b] s.t. h(c) = 0, i.e. c = g(c). ✓

    Uniqueness:

    Suppose p, q ∈ [a, b] are two fixed points. Then

    |p − q| = |g(p) − g(q)| = |g′(ξ) (p − q)| ≤ L |p − q|   [MVT, ξ ∈ (a, b)]

    ⟹  (1 − L) |p − q| ≤ 0  ⟹  |p − q| ≤ 0  ⟹  p = q. ✓

    Convergence:

    |xn − x̄| = |g(xn−1) − g(x̄)| = |g′(ξ) (xn−1 − x̄)| ≤ L |xn−1 − x̄| ≤ . . . ≤ L^n |x0 − x̄| → 0 as n → ∞.

    Hence xn → x̄ as n → ∞. ✓

  • 12. Newton Method for Nonlinear Equations.

    Assume that f ∈ C¹([a, b]), f(x̄) = 0 (x̄ is a root or zero) and f′(x̄) ≠ 0.

    The Newton method can be used to find the root x̄ by generating a sequence {xn} satisfying

    xn+1 = xn − f(xn)/f′(xn),   n = 0, 1, . . . ,

    provided f′(xn) ≠ 0 for all n.

    The sequence xn converges to the root x̄ as long as the initial point x0 is sufficiently close to x̄.

    The algorithm stops if |xn+1 − xn| < ε, a prescribed error tolerance, and xn+1 is taken as an approximation to x̄.

    Geometric Interpretation:

    The tangent line at (xn, f(xn)) is

    Y = f(xn) + f′(xn) (X − xn).

  • Setting Y = 0 yields xn+1 := X = xn − f(xn)/f′(xn).

    40

  • 13. Choice of Initial Point.

    Suppose f ∈ C²([a, b]) and f(x̄) = 0 with f′(x̄) ≠ 0. Then there exists δ > 0 such that the Newton method generates a sequence xn converging to x̄ for any initial point x0 ∈ [x̄ − δ, x̄ + δ] (x0 can only be chosen locally).

    However, if f satisfies the following additional conditions:

    1. f(a) f(b) < 0,

    2. f″ does not change sign on [a, b],

    3. the tangent lines to the curve y = f(x) at both a and b cut the x-axis within [a, b] (i.e. a − f(a)/f′(a), b − f(b)/f′(b) ∈ [a, b]),

    then f(x) = 0 has a unique root x̄ in [a, b] and the Newton method converges to x̄ for any initial point x0 ∈ [a, b] (x0 can be chosen globally).

    Example.

    Use the Newton method to compute x̄ = √c, c > 1, and show that the initial point can be any point in [1, c].

    41

  • Example.

    Find x̄ = √c, c > 1.

    Answer.

    x̄ is the root of f(x) := x² − c.

    Newton:  xn+1 = xn − f(xn)/f′(xn) = xn − (xn² − c)/(2 xn) = ½ (xn + c/xn),   n ≥ 0.

    How to choose x0?

    Check the 3 conditions on [1, c].

    1. f(1) = 1 − c < 0, f(c) = c² − c > 0.  ⟹  f(1) f(c) < 0 ✓

    2. f″ = 2 ✓

    3. Tangent line at 1:  Y = f(1) + f′(1) (X − 1) = 1 − c + 2 (X − 1).

    Let Y = 0, then X = 1 + (c − 1)/2 ∈ (1, c). ✓

    Tangent line at c:  Y = f(c) + f′(c) (X − c) = c² − c + 2 c (X − c).

    Let Y = 0, then X = c − (c − 1)/2 ∈ (1, c). ✓

  • Newton convergence for any x0 [1, c].

    43

  • Numerical Example.

    Find √7. (From a calculator: √7 = 2.6457513.)

    Newton converges for all x0 ∈ [1, 7]. Choose x0 = 4.

    x1 = ½ (x0 + 7/x0) = 2.875

    x2 = 2.6548913

    x3 = 2.6457670

    x4 = 2.6457513

    Comparison to the bisection method with I0 = [1, 7]:

    I1 = [1, 4]

    I2 = [2.5, 4]

    I3 = [2.5, 3.25]

    I4 = [2.5, 2.875]

    ...

    I25 = [2.6457512, 2.6457513]

    44
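    A minimal C++ sketch of this iteration (illustrative; the stopping rule |x_{n+1} − x_n| < eps and variable names are my own choices):

        #include <cmath>
        #include <cstdio>

        int main() {
            double c = 7.0, x = 4.0, eps = 1e-7;            // x0 = 4 as in the example above
            for (int n = 0; n < 100; ++n) {
                double xnew = 0.5 * (x + c / x);            // Newton step for f(x) = x*x - c
                std::printf("x_%d = %.7f\n", n + 1, xnew);
                if (std::fabs(xnew - x) < eps) break;       // stop when the update is small
                x = xnew;
            }
            return 0;
        }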

  • 14. Pitfalls.

    Here are some difficulties which may be encountered with the Newton method:

    1. {xn} may wander around and not converge (there are only complex roots to theequation),

    2. initial approximation x0 is too far away from the desired root and {xn} convergesto some other root (this usually happens when f (x0) is small),

    3. {xn} may diverge to + (the function f is positive and monotonically decreasingon an unbounded interval), and

    4. {xn} may repeat (cycling).

    45

  • 15. Rate of Convergence.

    Suppose {xn} is a sequence that converges to x̄.

    The convergence is said to be linear if there is a constant r ∈ (0, 1) such that

    |xn+1 − x̄| / |xn − x̄| ≤ r,   for all n sufficiently large.

    The convergence is said to be superlinear if

    lim_{n→∞} |xn+1 − x̄| / |xn − x̄| = 0.

    In particular, the convergence is said to be quadratic if

    |xn+1 − x̄| / |xn − x̄|² ≤ M,   for all n sufficiently large,

    where M is a positive constant, not necessarily less than 1.

    Example.  xn = x̄ + 0.5^n is linear;  xn = x̄ + 0.5^(2^n) is quadratic.

    Example. Show that the Newton method converges quadratically.

    46

  • Example.

    Define g(x) = x f (x)f (x)

    . Then the Newton method is given by

    xn+1 = g(xn) .

    Moreover, f (x) = 0 and f (x) 6= 0 imply thatg(x) = x ,

    g(x) = 1 (f)2 f f (f )2

    (x) = f (x)f (x)(f (x))2

    = 0 ,

    g(x) =f (x)f (x)

    .

    Assuming that xn x we have that|xn+1 x||xn x|2 =

    |g(xn) g(x)||xn x|2 =Taylor |g

    (x) (xn x) + 12 g(n) (xn x)2||xn x|2

    =1

    2|g(n)| 1

    2|g(x)| =: as n .

    47

  • Hence |xn+1 x| |xn x|2 quadratic convergence rate.

    48

  • 4. Interpolations

    1. Polynomial Approximation.

    For any continuous function f defined on an interval [a, b], there exist polynomials P

    that can be as close to the given function as desired.

    Taylor polynomials agree closely with a given function at a specific point, but they

    concentrate their accuracy only near that point.

    A good polynomial needs to provide a relatively accurate approximation over an entire

    interval.

    49

  • 2. Interpolating Polynomial Lagrange Form.

    Suppose xi [a, b], i = 0, 1, . . . , n, are pairwise distinct mesh points in [a, b].The Lagrange polynomial p is a polynomial of degree n such that

    p(xi) = f (xi), i = 0, 1, . . . , n.

    p can be constructed explicitly as

    p(x) =

    ni=0

    Li(x)f (xi)

    where Li is a polynomial of degree n satisfying

    Li(xj) = 0, j 6= i, Li(xi) = 1.This results in

    Li(x) =j 6=i

    (x xjxi xj

    )i = 0, 1, . . . , n.

    p is called linear interpolation if n = 1 and quadratic interpolation if n = 2.

    50

  • Exercise.

    Find the Lagrange polynomial p for the following points (x, f (x)): (1, 0), (1,3),and (2,4). Assume that a new point (0, 2) is observed, and construct a Lagrangepolynomial to incorporate this new information in it.

    Error formula.

    Suppose f is n + 1 times differentiable on [a, b]. Then it holds that

    f(x) = p(x) + f^(n+1)(ξ)/(n + 1)! (x − x0) · · · (x − xn),

    where ξ = ξ(x) lies in (a, b).

    Proof.

    Define g(x) = f(x) − p(x) + α Π_{j=0}^{n} (x − xj), where α ∈ R is a constant. Clearly, g(xj) = 0 for j = 0, . . . , n. To estimate the error at x = μ ∉ {x0, . . . , xn}, choose α such that g(μ) = 0.

    51

  • Hence

    g(x) = f (x) p(x) (f () p())nj=0

    (x xj xj

    ).

    g has at least n + 2 zeros: x0, . . . , xn, .Mean Value Theorem yields that

    g has at least n + 1 zeros...

    g(n+1) has at least 1 zero, say

    Moreover, as p is polynomial of degree n, it holds that p(n+1) = 0.

    52

  • Hence

    0 = g(n+1)() = f (n+1)() (f () p()) (n + 1)!nj=0( xj)

    Error = f () p() = f(n+1)()

    (n + 1)!

    nj=0

    ( xj) .

    53

  • 3. Trapezoid Rule.

    We can use linear interpolation (n = 1, x0 = a, x1 = b) to approximate f(x) on [a, b] and then compute ∫_a^b f(x) dx to get the trapezoid rule:

    ∫_a^b f(x) dx ≈ ½ (b − a) [f(a) + f(b)].

    If we partition [a, b] into n equal subintervals with mesh points xi = a + i h, i = 0, . . . , n, and step size h = (b − a)/n, we can derive the composite trapezoid rule:

    ∫_a^b f(x) dx ≈ (h/2) [ f(x0) + 2 Σ_{i=1}^{n−1} f(xi) + f(xn) ].

    The approximation error is of the order O(h²) if |f″| is bounded.

    54

  • Use linear interpolation (n = 1, x0 = a, x1 = b) to approximate f (x) on [a, b] and then

    compute baf (x) dx.

    Answer.

    The linear interpolating polynomial is p(x) = f (a)L0(x) + f (b)L1(x), where

    L0(x) =x ba b , and L1(x) =

    x ab a .

    ba

    f (x) dx ba

    p(x) dx = f (a)

    ba

    x ba b dx + f (b)

    ba

    x ab a dx

    = f (a)1

    a b1

    2(x b)2 |ba +f (b)

    1

    b a1

    2(x a)2 |ba

    = f (a)1

    a b1

    2((a b)2) + f (b) 1

    b a1

    2(b a)2

    =b a2

    (f (a) + f (b)) Trapezoid Rule

    [Of course, one could have arrived at this formula with a simple geometric argument.]

    55

  • Error Analysis.

    Let f (x) = p(x) + E(x), where E(x) =f ()2

    (x a) (x b) with (a, b). Assumethat |f | M is bounded. Then

    ba

    E(x) dx

    ba

    |E(x)| dx M2

    ba

    (x a) (b x) dx

    =M

    2

    ba

    (x a) [(b a) (x a)] dx

    =M

    2

    ba

    [(x a)2 + (b a) (x a)] dx=M

    2

    [13(b a)3 + 1

    2(b a)3

    ]dx

    =M

    12(b a)3 .

    56

  • The composite formula can be obtained by considering the partitioning of [a, b] into

    a = x0 < x1 < . . . < xn1 < xn = b, where xi = a + i h with h :=b an

    .

    ba

    f (x) dx =n1i=0

    xi+1xi

    f (x) dx n1i=0

    xi+1 xi2

    (f (xi) + f (xi+1))

    =n1i=0

    h

    2(f (xi) + f (xi+1))

    = h

    [1

    2f (a) + f (x1) + . . . + f (xn1) +

    1

    2f (b)

    ].

    Error analysis then yields that

    Error M12

    h3 n =M (b a)

    12h2 = O(h2) .

    57

  • 4. Simpson's Rule.

    Exercise.

    Use quadratic interpolation (n = 2, x0 = a, x1 = (a + b)/2, x2 = b) to approximate f(x) on [a, b] and then compute ∫_a^b f(x) dx to get Simpson's rule:

    ∫_a^b f(x) dx ≈ (1/6) (b − a) [ f(a) + 4 f((a + b)/2) + f(b) ].

    Derive the composite Simpson's rule:

    ∫_a^b f(x) dx ≈ (h/3) [ f(x0) + 2 Σ_{i=2}^{n/2} f(x_{2i−2}) + 4 Σ_{i=1}^{n/2} f(x_{2i−1}) + f(xn) ],

    where n is an even number and the xi and h are chosen as in the composite trapezoid rule.

    [One can show that the approximation error is of the order O(h⁴) if |f^(4)| is bounded.]

    58
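    A hedged C++ sketch of the composite Simpson rule (the function name and the test integrand 1/(1 + x), whose integral over [0, 1] is ln 2, are my own choices):

        #include <cstdio>

        double g(double x) { return 1.0 / (1.0 + x); }          // test integrand

        // composite Simpson rule on [a, b] with an even number n of subintervals
        double simpson(double (*f)(double), double a, double b, int n) {
            double h = (b - a) / n, s = f(a) + f(b);
            for (int i = 1; i < n; ++i)
                s += (i % 2 == 1 ? 4.0 : 2.0) * f(a + i * h);   // weight 4 at odd, 2 at even nodes
            return s * h / 3.0;
        }

        int main() {
            std::printf("%.10f\n", simpson(g, 0.0, 1.0, 10));   // approx ln 2 = 0.6931472
            return 0;
        }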

  • 5. NewtonCotes Formula.

    Suppose x0, . . . , xn are mesh points in [a, b], usually mesh points are equally spaced

    and x0 = a, xn = b, then integral can be approximated by the NewtonCotes

    formula: ba

    f (x) dx ni=0

    Ai f (xi)

    where parameters Ai are determined in such a way that the integral is exact for all

    polynomials of degree n.[Note: n+1 unknownsAi and n+1 coefficients for polynomial of degree n.] Exercise.

    Use NewtonCotes formula to derive the trapezoid rule and the Simpsons rule. Prove

    that if f is n + 1 times differentiable and |f (n+1)| M on [a, b] then

    | ba

    f (x) dxni=0

    Ai f (xi)| M(n + 1)!

    ba

    ni=0

    |x xi| dx.

    59

  • Exercise 4.5.

    We have that ba

    q(x) dx =

    ni=0

    Ai q(xi) for all polynomials q of degree n.

    Let q(x) = Lj(x), where Lj is the jth Lagrange polynomial for the data points x0, x1, . . . , xn.

    I.e. Lj is of degree n and satisfies Lj(xi) = ij =

    1 i = j0 i 6= j . Now ba

    Lj dx =ni=0

    Ai Lj(xi) = Aj

    ba

    f (x) dx ni=0

    Ai f (xi) =ni=0

    f (xi)

    ba

    Li(x) dx

    =

    ba

    ni=0

    f (xi)Li(x) dx =

    ba

    p(x) dx,

    where p(x) is the interpolating Lagrange polynomial to f . Hence we find trapezoid

    60

  • (n = 1) and Simpsons rule (n = 2, with x1 =a+b2 ).

    The Lagrange polynomial has the error term

    f (x) = p(x) + E(x), E(x) :=f (n+1)()

    (n + 1)!(x x0) (x xn),

    where = (x) lies in (a, b). Hence ba

    f (x) dx ba

    p(x) dx

    = ba

    E(x) dx

    ba

    |E(x)| dx

    M(n + 1)!

    ba

    ni=0

    |x xi| dx.

    61

  • 6. Ordinary Differential Equations.

    An initial value problem for an ODE has the form

    x(t) = f (t, x(t)), a t b and x(a) = x0. (1)(1) is equivalent to the integral equation:

    x(t) = x0 +

    ta

    f (s, x(s)) ds, a t b. (2)

    To solve (2) numerically we divide [a, b] into subintervals with mesh points ti = a+ih,

    i = 0, . . . , n, and step size h = (b a)/n. (2) implies

    x(ti+1) = x(ti) +

    ti+1ti

    f (s, x(s)) ds, i = 0, . . . , n 1.

    62

  • (a) If we approximate f (s, x(s)) on [ti, ti+1] by f (ti, x(ti)), we get the Euler (explicit)

    method for equation (1):

    wi+1 = wi + hf (ti, wi), w0 = x0.

    We have x(ti+1) wi+1 if h is sufficiently small.[Taylor: x(ti+1) = x(ti) + x

    (ti)h +O(h2) = x(ti) + f (ti, x(ti))h +O(h2).]

    (b) If we approximate f (s, x(s)) on [ti, ti+1] by linear interpolation with points (ti, f (ti, x(ti)))

    and (ti+1, f (ti+1, x(ti+1))), we get the trapezoidal (implicit) method for equation

    (1):

    wi+1 = wi +h

    2[f (ti, wi) + f (ti+1, wi+1)], w0 = x0.

    (c) If we combine the Euler method with the trapezoidal method, we get the modified

    Euler (explicit) method (or RungeKutta 2nd order method):

    wi+1 = wi +h

    2[f (ti, wi) + f (ti+1, wi + hf (ti, wi)

    wi+1)], w0 = x0.

    63
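    A hedged C++ sketch comparing the explicit Euler method (a) and the modified Euler / RK2 method (c) above; the test problem x′ = x, x(0) = 1 with exact solution exp(t) is my own choice:

        #include <cmath>
        #include <cstdio>

        double f(double t, double x) { (void)t; return x; }     // right-hand side of the ODE

        int main() {
            double a = 0.0, b = 1.0, x0 = 1.0;
            int n = 10;
            double h = (b - a) / n;
            double wEuler = x0, wRK2 = x0;
            for (int i = 0; i < n; ++i) {
                double t = a + i * h;
                double k1 = f(t, wRK2);
                wRK2   += 0.5 * h * (k1 + f(t + h, wRK2 + h * k1));  // modified Euler step
                wEuler += h * f(t, wEuler);                          // explicit Euler step
            }
            std::printf("Euler: %.6f  RK2: %.6f  exact: %.6f\n", wEuler, wRK2, std::exp(1.0));
            return 0;
        }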

  • 7. Divided Differences.

    Suppose a function f and (n + 1) distinct points x0, x1, . . . , xn are given. Divided differences of f can be expressed in a table format as follows:

    xk   0DD      1DD         2DD             3DD                 . . .
    x0   f[x0]
    x1   f[x1]   f[x0, x1]
    x2   f[x2]   f[x1, x2]   f[x0, x1, x2]
    x3   f[x3]   f[x2, x3]   f[x1, x2, x3]   f[x0, x1, x2, x3]
    ...

  • where f[xi] = f(xi),

    f[xi, xi+1] = (f[xi+1] − f[xi]) / (xi+1 − xi),

    f[xi, xi+1, . . . , xi+k] = (f[xi+1, xi+2, . . . , xi+k] − f[xi, xi+1, . . . , xi+k−1]) / (xi+k − xi),

    and in particular

    f[x1, . . . , xn] = (f[x2, . . . , xn] − f[x1, . . . , xn−1]) / (xn − x1).

    65

  • 8. Interpolating Polynomial Newton Form.

    One drawback of Lagrange polynomials is that there is no recursive relationship

    between Pn1 and Pn, which implies that each polynomial has to be constructedindividually. Hence, in practice one uses the Newton polynomials.

    The Newton interpolating polynomial Pn of degree n that agrees with f at the points

    x0, x1, . . . , xn is given by

    Pn(x) = f [x0] +

    nk=1

    f [x0, x1, . . . , xk]

    k1i=0

    (x xi).

    Note that Pn can be computed recursively using the relation

    Pn(x) = Pn1(x) + f [x0, x1, . . . , xn](x x0)(x x1) (x xn1).[Note that f [x0, x1, . . . , xk] can be found on the diagonal of the DD table.]

    Exercise.

    Suppose values (x, y) are given as (1,2), (2,56), (0,2), (3, 4), (1,16), and(7, 376). Is there a cubic polynomial that takes these values? (Answer: 2x3 7x2 +

    66

  • 5x 2.)

    67

  • Exercise 4.8.

    Data points: (1, −2), (−2, −56), (0, −2), (3, 4), (−1, −16), (7, 376).

    xk   0DD    1DD   2DD   3DD   4DD   5DD
     1    −2
    −2   −56    18
     0    −2    27    −9
     3     4     2    −5     2
    −1   −16     5    −3     2     0
     7   376    49    11     2     0     0

    Newton polynomial:

    p(x) = −2 + 18 (x − 1) − 9 (x − 1) (x + 2) + 2 (x − 1) (x + 2) x = 2x³ − 7x² + 5x − 2.

    [Note: We could have stopped at the row for x3 = 3 and checked whether p3 goes through the remaining data points.]

    68
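    A hedged C++ sketch of the divided-difference table and the Newton-form evaluation (the in-place column update and array names are my own choices; the data are those of Exercise 4.8):

        #include <cstdio>
        #include <vector>

        int main() {
            std::vector<double> x = {1, -2, 0, 3, -1, 7};
            std::vector<double> c = {-2, -56, -2, 4, -16, 376};  // starts as f[x_i], ends as coefficients
            int n = (int)x.size();
            // column by column: after step j, c[i] holds f[x_{i-j}, ..., x_i]
            for (int j = 1; j < n; ++j)
                for (int i = n - 1; i >= j; --i)
                    c[i] = (c[i] - c[i - 1]) / (x[i] - x[i - j]);
            // evaluate P(t) = c[0] + c[1](t-x0) + ... by nested multiplication
            double t = 2.0, p = c[n - 1];
            for (int i = n - 2; i >= 0; --i) p = p * (t - x[i]) + c[i];
            std::printf("P(%.1f) = %.6f\n", t, p);               // 2t^3 - 7t^2 + 5t - 2 at t = 2 gives -4
            return 0;
        }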

  • 9. Piecewise Polynomial Approximations.

    Another drawback of interpolating polynomials is that Pn tends to oscillate widely

    when n is large, which implies that Pn(x) may be a poor approximation to f (x) if x

    is not close to the interpolating points.

    If an interval [a, b] is divided into a set of subintervals [xi, xi+1], i = 0, 1, . . . , n1, andon each subinterval a different polynomial is constructed to approximate a function

    f , such an approximation is called spline.

    The simplest spline is the linear spline P which approximates the function f on the

    interval [xi, xi+1], i = 0, 1, . . . , n 1, with P agreeing with f at xi and xi+1.Linear splines are easy to construct but are not smooth at points x1, x2, . . . , xn1.

    69

  • 10. Natural Cubic Splines.

    Given a function f defined on [a, b] and a set of points a = x0 < x1 < · · · < xn = b, a function S is called a natural cubic spline if there exist n cubic polynomials Si such that:

    (a) S(x) = Si(x) for x in [xi, xi+1] and i = 0, 1, . . . , n − 1;

    (b) Si(xi) = f(xi) and Si(xi+1) = f(xi+1) for i = 0, 1, . . . , n − 1;

    (c) S′_{i+1}(xi+1) = S′_i(xi+1) for i = 0, 1, . . . , n − 2;

    (d) S″_{i+1}(xi+1) = S″_i(xi+1) for i = 0, 1, . . . , n − 2;

    (e) S″(x0) = S″(xn) = 0.

    70

  • Natural Cubic Splines.

    | | |

    xi xi+1 xi+2

    Si Si+1

    (a) 4n parameters

    (b) 2n equations

    (c) n 1 equations(d) n 1 equations(e) 2 equations

    4n equations

    71

  • Example. Assume S is a natural cubic spline that interpolates f C2([a, b]) atthe nodes a = x0 < x1 < < xn = b. We have the following smoothness propertyof cubic splines: b

    a

    [S (x)]2 dx ba

    [f (x)]2 dx.

    In fact, it even holds that ba

    [S (x)]2 dx = mingG

    ba

    [g(x)]2 dx,

    where G := {g C2([a, b]) : g(xi) = f (xi) i = 0, 1, . . . , n}.Exercise: Determine the parameters a to h so that S(x) is a natural cubic spline,

    where

    S(x) = ax3 + bx2 + cx + d for x [1, 0] andS(x) = ex3 + fx2 + gx + h for x [0, 1]with interpolation conditions S(1) = 1, S(0) = 2, and S(1) = 1.(Answer: a = 1, b = 3, c = 1, d = 2, e = 1, f = 3, g = 1, h = 2.)

    72

  • 11. Computation of Natural Cubic Splines.

    Denote

    ci = S″(xi),   i = 0, 1, . . . , n.

    Then c0 = cn = 0.

    Since Si is a cubic function on [xi, xi+1], we know that S″_i is a linear function on [xi, xi+1]. Hence it can be written as

    S″_i(x) = ci (xi+1 − x)/hi + ci+1 (x − xi)/hi,

    where hi := xi+1 − xi.

    73

  • Exercise.

    Show that Si is given by

    Si(x) = ci/(6 hi) (xi+1 − x)³ + ci+1/(6 hi) (x − xi)³ + pi (xi+1 − x) + qi (x − xi),

    where

    pi = f(xi)/hi − ci hi/6,   qi = f(xi+1)/hi − ci+1 hi/6,

    and c1, . . . , cn−1 satisfy the linear equations

    hi−1 ci−1 + 2 (hi−1 + hi) ci + hi ci+1 = ui,

    where

    ui = 6 (di − di−1),   di = (f(xi+1) − f(xi))/hi,

    for i = 1, 2, . . . , n − 1.

    C++ Exercise: Write a program to construct a natural cubic spline. The inputs are the number of points n + 1 and all the points (xi, yi), i = 0, 1, . . . , n. The output is a natural cubic spline expressed in terms of the functions Si defined on the intervals [xi, xi+1] for i = 0, 1, . . . , n − 1.
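    A hedged C++ sketch of this construction (it sets up the tridiagonal system for the ci, eliminates it in the spirit of the Crout algorithm of Section 3.2, and evaluates S at one point; the data are those of Exercise 4.10 and the evaluation point is my own choice):

        #include <cstdio>
        #include <vector>

        int main() {
            std::vector<double> x = {-1, 0, 1}, y = {1, 2, -1};
            int n = (int)x.size() - 1;
            std::vector<double> h(n), d(n);
            for (int i = 0; i < n; ++i) { h[i] = x[i+1] - x[i]; d[i] = (y[i+1] - y[i]) / h[i]; }
            // interior equations: h_{i-1} c_{i-1} + 2(h_{i-1}+h_i) c_i + h_i c_{i+1} = 6(d_i - d_{i-1})
            std::vector<double> c(n + 1, 0.0);                     // c_0 = c_n = 0 (natural spline)
            std::vector<double> dia(n - 1), sub(n - 1), sup(n - 1), u(n - 1);
            for (int i = 1; i < n; ++i) {
                dia[i-1] = 2.0 * (h[i-1] + h[i]);
                sub[i-1] = h[i-1]; sup[i-1] = h[i];
                u[i-1]   = 6.0 * (d[i] - d[i-1]);
            }
            for (int i = 1; i < n - 1; ++i) {                      // forward elimination
                double m = sub[i] / dia[i-1];
                dia[i] -= m * sup[i-1];
                u[i]   -= m * u[i-1];
            }
            if (n > 1) c[n-1] = u[n-2] / dia[n-2];                 // back substitution
            for (int i = n - 2; i >= 1; --i) c[i] = (u[i-1] - sup[i-1] * c[i+1]) / dia[i-1];
            // evaluate S on [x_i, x_{i+1}] with the formula of Section 4.11
            int i = 0; double t = -0.5;
            double p = y[i] / h[i] - c[i] * h[i] / 6.0, q = y[i+1] / h[i] - c[i+1] * h[i] / 6.0;
            double S = c[i] / (6.0 * h[i]) * (x[i+1]-t)*(x[i+1]-t)*(x[i+1]-t)
                     + c[i+1] / (6.0 * h[i]) * (t-x[i])*(t-x[i])*(t-x[i])
                     + p * (x[i+1] - t) + q * (t - x[i]);
            std::printf("S(%.2f) = %.6f\n", t, S);
            return 0;
        }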

  • 5. Basic Probability Theory

    1. CDF and PDF.

    Let (,F , P ) be a probability space, X be a random variable.The cumulative distribution function (cdf) F of X is defined by

    F (x) = P (X x), x R.F is an increasing right-continuous function satisfying

    F () = 0, F (+) = 1.If F is absolutely continuous thenX has a probability density function (pdf) f defined

    by

    f (x) = F (x), x R.F can be recovered from f by the relation

    F (x) =

    x

    f (u) du.

    75

  • 2. Normal Distribution.

    A random variable X has a normal distribution with parameters and 2, written

    X N(, 2), if X has the pdf(x) =

    1

    2e(x)2

    2 2

    for x R.If = 0 and 2 = 1 then X is called a standard normal random variable and its cdf

    is usually written as

    (x) =

    x

    12e

    u2

    2 du.

    If X N(, 2) then the characteristic function (Fourier transform) of X is givenby

    c(s) = E(ei sX) = ei t2 s2

    2 ,

    where i =1. The moment generating function (mgf) is given by

    m(s) = E(esX) = e t+2 s2

    2 .

    76

  • 3. Approximation of Normal CDF.

    It is suggested that the standard normal cdf Φ(x) can be approximated by a polynomial expression Φ̃(x) as follows:

    Φ̃(x) := 1 − φ(x) (a1 k + a2 k² + a3 k³ + a4 k⁴ + a5 k⁵)   (3)

    when x ≥ 0, and Φ̃(x) := 1 − Φ̃(−x) when x < 0, where φ is the standard normal pdf.

    The parameters are given by k = 1/(1 + γ x), γ = 0.2316419, a1 = 0.319381530, a2 = −0.356563782, a3 = 1.781477937, a4 = −1.821255978, and a5 = 1.330274429. This approximation has a maximum absolute error less than 7.5 × 10⁻⁸ for all x.

    C++ Exercise: Write a program to compute Φ(x) with (3) and compare the result with that derived with the composite Simpson rule (see Exercise 4.4). The input is a variable x and the number of partitions n over the interval [0, x]. The output is the results from the two methods and their error.

    77
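    A hedged C++ sketch of the approximation (3) (function name and the nested-multiplication form of the polynomial are my own choices):

        #include <cmath>
        #include <cstdio>

        double Phi(double x) {
            if (x < 0.0) return 1.0 - Phi(-x);                 // Phi(x) = 1 - Phi(-x) for x < 0
            const double gamma = 0.2316419,
                a1 = 0.319381530, a2 = -0.356563782, a3 = 1.781477937,
                a4 = -1.821255978, a5 = 1.330274429;
            const double inv_sqrt2pi = 0.3989422804014327;     // 1 / sqrt(2*pi)
            double k    = 1.0 / (1.0 + gamma * x);
            double phi  = inv_sqrt2pi * std::exp(-0.5 * x * x);   // standard normal pdf
            double poly = k * (a1 + k * (a2 + k * (a3 + k * (a4 + k * a5))));
            return 1.0 - phi * poly;
        }

        int main() {
            std::printf("Phi(1.96) = %.7f\n", Phi(1.96));      // approx 0.9750021
            return 0;
        }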

  • 4. Lognormal Random Variable.

    Let Y = e^X and X be a N(μ, σ²) random variable. Then Y is a lognormal random variable.

    Exercise: Show that

    E(Y) = e^{μ + σ²/2},   E(Y²) = e^{2μ + 2σ²}.

    5. An Important Formula in Pricing European Options.

    If V is lognormally distributed and the standard deviation of ln V is s, then¹

    E(max(V − K, 0)) = E(V) Φ(d1) − K Φ(d2),

    where

    d1 = (1/s) ln(E(V)/K) + s/2   and   d2 = d1 − s.

    ¹ K will later be denoted by X. We use a different notation here to avoid an abuse of notation, as K is not a random variable.

    78

  • E(V − K)⁺ = E(V) Φ(d1) − K Φ(d2),   d1 = (1/s) ln(E(V)/K) + s/2,   d2 = d1 − s.

    Proof.

    Let g be the pdf of V . Then

    E(V K)+ =

    (v K)+ g(v) dv = K

    (v K) g(v) dv.

    As V is lognormal, ln V is normal N(m, s2), where m = ln(E(V )) 12 s2.Let Y := lnVm

    s, i.e. V = em+s Y . Then Y N(0, 1) with pdf: (y) = 1

    2e

    y2

    2 .

    E(V K)+ = E(em+s Y K)+ = lnKm

    s

    (em+s y K) (y) dy

    =

    lnKm

    s

    em+s y (y) dy K lnKm

    s

    (y) dy

    = I1 K I2 .

    79

  • I1 =

    lnKm

    s

    12

    ey2

    2 +m+s y dy

    =

    lnKm

    s

    12

    e(ys)2

    2 +m+s2

    2 dy

    = em+s2

    2

    lnKm

    s s

    12

    ey2

    2 dy [y s 7 y]

    = em+s2

    2

    [1

    (lnK m

    s s)]

    = em+s2

    2

    (lnK m

    s+ s

    )= elnE(V )

    ( lnK + lnE(V ) s22

    s+ s

    )= E(V )

    (1

    slnE(V )

    K+s

    2

    )= E(V ) (d1) ,

    on recalling m = ln(E(V )) 12 s2.

    80

  • Similarly

    I2 = 1 (lnK m

    s

    )=

    (lnK m

    s

    )= (d1 s) = (d2) .

    81
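    A hedged C++ sketch of the formula E(V − K)⁺ = E(V) Φ(d1) − K Φ(d2); here Φ is computed with std::erfc from <cmath> rather than the polynomial approximation (3), and the numerical inputs are my own example:

        #include <cmath>
        #include <cstdio>

        double Phi(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }   // standard normal cdf

        int main() {
            double EV = 100.0, K = 95.0, s = 0.25;          // E(V), K and the std. dev. of ln V
            double d1 = std::log(EV / K) / s + 0.5 * s;     // d1 = (1/s) ln(E(V)/K) + s/2
            double d2 = d1 - s;
            std::printf("E(V-K)^+ = %.6f\n", EV * Phi(d1) - K * Phi(d2));
            return 0;
        }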

  • 6. Correlated Random Variables.

    Assume X = (X1, . . . , Xn) is an n-vector of random variables.

    The mean of X is an n-vector = (E(X1), . . . , E(Xn)).

    The covariance of X is an n n-matrix with componentsij = (CovX)ij = E((Xi i)(Xj j)) .

    The variance of Xi is given by 2i = ii and the correlation between Xi and Xj is

    given by ij =ijij

    .

    X is called a multi-dimensional normal vector, written as X N(,), if X has pdf

    f (x) =1

    (2)n/21

    (det)1/2exp

    {12(x )T1(x )

    }where is a symmetric positive definite matrix.

    82

  • 7. Convergence.

    Let {Xn} be a sequence of random variables. There are four types of convergenceconcepts associated with {Xn}: Almost sure convergence, written Xn a.s. X , if there exists a null set N suchthat for all \N one has

    Xn() X(), n. Convergence in probability, written Xn P X , if for every > 0 one has

    P (|Xn X| > ) 0, n. Convergence in norm, written Xn L

    p X , if Xn,X Lp andE|Xn X|p 0, n.

    Convergence in distribution, written Xn D X , if for any x at which P (X x)is continuous one has

    P (Xn x) P (X x), n.83

  • 8. Strong Law of Large Numbers.

    Let {Xn} be independent, identically distributed (iid) random variables with finiteexpectation E(X1) = . Then

    Zn

    n

    a.s. where Zn = X1 + +Xn.

    9. Central Limit Theorem. Let {Xn} be iid random variables with finite expecta-tion and finite variance 2 > 0. For each n, let

    Zn = X1 + +Xn.Then

    Znn n

    =Zn n

    n

    D Z

    where Z is a N(0, 1) random variable, i.e.,

    P (Zn n

    n z) 1

    2

    z

    ex2

    2 dx, n.

    84

  • 10. LindebergFeller Central Limit Theorem.

    Suppose X is a triangular array of random variables, i.e.,

    X = {Xn1 , Xn2 , . . . , Xnk(n) : n {1, 2, . . .}}, with k(n) as n,such that, for each n, Xn1 , . . . , X

    nk(n) are independently distributed and are bounded

    in absolute value by a constant yn with yn 0. LetZn = X

    n1 + +Xnk(n).

    If E(Zn) and var(Zn) 2 > 0, then Zn converges in distribution to a normallydistributed random variable with mean and variance 2.

    Note: LindebergFeller implies the Central Limit Theorem (5.9).

    85

  • If X1, X2, . . . are iid with expectation and variance 2, then define

    Xni :=Xi

    n, i = 1, 2, . . . , k(n) := n.

    For each n, Xn1 , . . . , Xnk(n) are independent and

    E(Xni ) =E(Xi)

    n= n

    = 0 ,

    Var(Xni ) =1

    n2VarXi = 1

    n22 =

    1

    n.

    Let Zn = Xn1 + +Xnk(n) = (

    ni=1Xi n) 1n , then

    E(Zn) =

    k(n)i=1

    E(Xni ) = 0 ,

    Var(Zn) =

    k(n)i=1

    Var(Xni ) = 1 .

    Hence, by LindebergFeller,

    ZnD Z N(0, 1) .

    86

  • 6. Optimization

    1. Unconstrained Optimization.

    Given f : Rn R.Minimize f (x) over x Rn.f has a local minimum at a point x if f (x) f (x) for all x near x, i.e.

    > 0 s.t. f (x) f (x) x : x x < .

    f has a global minimum at x if

    f (x) f (x) x Rn.

    87

  • 2. Optimality Conditions.

    First order necessary conditions:Suppose that f has a local minimum at x and that f is continuously differentiable

    in an open neighbourhood of x. Thenf (x) = 0. (x is called a stationary point.) Second order sufficient Conditions:Suppose that f is twice continuously differentiable in an open neighbourhood of

    x and that f (x) = 0 and 2f (x) is positive definite. Then x is a strict localminimizer of f .

    Example: Show that f = (2x21 x2)(x21 2x2) has a minimum at (0, 0) along anystraight line passing through the origin, but f has no minimum at (0, 0).

    Exercise: Find the minimum solution of

    f (x1, x2) = 2x21 + x1x2 + x

    22 x1 3x2. (4)

    (Answer. 17 (1, 11).)

    88

  • Sufficient Condition.

    Taylor gives for any d Rn:f (x + d) = f (x) +f (x)T d + 12 dT 2f (x + d) d (0, 1) .

    If x is not strict local minimizer, then

    {xk} Rn \ {x} : xk x s.t. f (xk) f (x) .Define dk :=

    xkxxkx. Then dk = 1 and there exists a subsequence {dkj} such that

    dkj d? as j and d? = 1. W.l.o.g. we assume dk d? as k .f (x) f (xk) = f (x + xk x dk)

    = f (x) + xk xf (x)T dk + 12 xk x2 dTk 2f (x + k xk x dk) dk= f (x) + 12 xk x2 dTk 2f (x + k xk x dk) dk .

    Hence dTk 2f (x + k xk x dk) dk 0, and on letting k dT? 2f (x) d? 0 .

    As d? 6= 0, this is a contradiction to 2f (x) being symmetric positive definite. Hence xis a strict local minimizer.

    89

  • Example 6.2. Show that f = (2x21x2)(x21 2x2) has a minimum at (0, 0) along anystraight line passing through the origin, but f has no minimum at (0, 0).

    Answer.

    Straight line through (0, 0): x2 = x1, R fixed.g(r) := f (r, r) = (2 r2 r) (r2 2 r)g(r) = 8 r3 15 r2 + 42 r, g(r) = 24 r2 30 r + 42

    g(0) = 0 and g(0) = 42 > 0 .Hence r = 0 is a minimizer for g (0, 0) is a minimizer for f along any straightline.

    Now let (xk1, xk2) = (

    1k, 1k2) (0, 0) as k . Then

    f (xk1, xk2) =

    1

    k21

    k2< 0 = f (0, 0) k .

    Hence (0, 0) is not a minimizer for f .

    [Note: f (0, 0) = 0, but 2f (0, 0) =(0 0

    0 4

    ).]

    90

  • 3. Convex Optimization.

    Exercise. When f is convex, any local minimizer x is a global minimizer of f . If in

    addition f is differentiable, then any stationary point x is a global minimizer of f .

    (Hint. Use a contradiction argument.)

    91

  • Exercise 6.3.

    When f is convex, any local minimizer x is a global minimizer of f .

    Proof.

    Suppose x is a local minimizer, but not a global minimizer. Then

    x s.t. f (x) < f (x).Since f is convex, we have that

    f ( x + (1 ) x) f (x) + (1 ) f (x)< f (x) + (1 ) f (x) = f (x) (0, 1] .

    Let x := x + (1 ) x. Thenx x and f (x) < f (x) as 0.

    This is a contradiction to x being a local minimizer.

    Hence x is a global minimizer for f .

    92

  • 4. Line Search.

    The basic procedure to solve numerically an unconstrained problem (minimize f (x)

    over x Rn) is as follows.(i) Choose an initial point x0 Rn and an initial search direction d0 Rn and set

    k = 0.

    (ii) Choose a step size k and define a new point xk+1 = xk + k d

    k. Check if the

    stopping criterion is satisfied (f (xk+1) < ?). If yes, xk+1 is the optimalsolution, stop. If no, go to (iii).

    (iii) Choose a new search direction dk+1 (descent direction) and set k = k + 1. Go to

    (ii).

    The essential and most difficult part in any search algorithm is to choose a descent

    direction dk and a step size k with good convergence and stability properties.

    93

  • 5. Steepest Descent Method.

    f is differentiable.

    Choose dk = gk, where gk = f (xk), and choose k s.t.f (xk + k d

    k) = minR

    f (xk + dk).

    Note that the successive descent directions are orthogonal to each other, i.e. (gk)T gk+1 =

    0, and the convergence for some functions may be very slow, called zigzagging.

    Exercise.

    Use the steepest descent (SD) method to solve (4) with the initial point x0 = (1, 1).

    (Answer. First three iterations give x1 = (0, 1), x2 = (0, 32), and x3 = (18, 32).)

    94

  • Steepest Descent.

    Taylor gives:

    f (xk + dk) = f (xk) + f (xk)T dk +O(2).

    As

    f (xk)T dk = f (xk) dk cos k,with k the angle between dk and f (xk), we see that dk is a descent direction ifcos k < 0. The descent is steepest when k = cos k = 1.Zigzagging.

    k is minimizer of () := f (xk + dk) with dk = gk.

    Hence

    0 = (k) = f (xk + k dk)T dk = f (xk+1)T (gk) = (gk+1)T gk .Hence dk+1 dk, which leads to zigzagging.

    95

  • Exercise 6.5.

    Use the SD method to solve (4) with the initial point x0 = (1, 1). [min: (1/7) (−1, 11).]

    Answer.

    ∇f = (4 x1 + x2 − 1, 2 x2 + x1 − 3).

    Iteration 0:  d0 = −∇f(x0) = (−4, 0) ≠ (0, 0).

    φ(α) = f(x0 + α d0) = f(1 − 4α, 1) = 2 (1 − 4α)² − 2

    ⟹  minimum at α0 = 1/4

    ⟹  x1 = x0 + α0 d0 = (0, 1),  d1 = −∇f(x1) = −(0, −1) = (0, 1) ≠ (0, 0).

    Iteration 1:  x2 = (0, 3/2),  d2 = (−1/2, 0).

    Iteration 2:  x3 = (−1/8, 3/2),  d3 = (0, 1/8).

    96
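    A hedged C++ sketch of this steepest descent iteration for the quadratic (4); since f is quadratic with Hessian H, the exact line-search step has the closed form α = gᵀg / (gᵀ H g), which I use here instead of minimising φ(α) symbolically as in the notes:

        #include <cmath>
        #include <cstdio>

        int main() {
            double H[2][2] = {{4, 1}, {1, 2}};                 // Hessian of f
            double x[2] = {1, 1};                              // initial point x^0
            for (int k = 0; k < 20; ++k) {
                double g[2] = {4*x[0] + x[1] - 1, x[0] + 2*x[1] - 3};   // gradient of f
                double gg = g[0]*g[0] + g[1]*g[1];
                if (std::sqrt(gg) < 1e-10) break;              // stop when the gradient is small
                double Hg0 = H[0][0]*g[0] + H[0][1]*g[1];
                double Hg1 = H[1][0]*g[0] + H[1][1]*g[1];
                double alpha = gg / (g[0]*Hg0 + g[1]*Hg1);     // exact line-search step size
                x[0] -= alpha * g[0];
                x[1] -= alpha * g[1];
                std::printf("x^%d = (%.6f, %.6f)\n", k + 1, x[0], x[1]);
            }
            return 0;
        }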

  • 6. Newton Method.

    f is twice differentiable.

    Choose dk = [Hk]1gk, where Hk = 2f (xk).Set xk+1 = xk + dk.

    If Hk is positive definite then dk is a descent direction.

    The main drawback of the Newton method is that it requires the computation of

    2f (xk) and its inverse, which can be difficult and time-consuming.Exercise.

    Use the Newton method to solve (4) with x0 = (1, 1).

    (Answer. First iteration gives x1 = 17 (1, 11).)

    97

  • Newton Method.

    Taylor gives

    f (xk + d) f (xk) + dT f (xk) + 12 dT 2f (xk) d =: m(d)mindm(d) m(d) = 0

    f (xk) +2f (xk) d = 0 .Hence choose dk = [2f (xk)]1f (xk) = [Hk]1gk.If Hk is positive definite, then so is (Hk)1, and we get

    (dk)T gk = (gk)T (Hk)1 gk k gk2 < 0for some k > 0.

    Hence dk is a descent direction.

    [Aside: The Newton method for minx f (x) is equivalent to the Newton method for finding

    a root of the system of nonlinear equations f (x) = 0.]

    98

  • Exercise 6.6.

    Use the Newton method to minimize

    f (x1, x2) = 2x21 + x1x2 + x

    22 x1 3x2

    with x0 = (1, 1)T .

    Answer.

    f =(4 x1 + x2 12 x2 + x1 3

    ), H := 2f =

    (4 1

    1 2

    ).

    H1 =1

    detH

    (2 11 4

    )= 17

    (2 11 4

    ).

    Iteration 0: x0 = (1, 1)T , f (x0) = (4, 0)T .x1 = x0 [H0]1f (x0) =

    (1

    1

    ) 17

    (2 11 4

    )(4

    0

    )= 17

    (111

    ).

    f (x1) = (0, 0)T and H positive definite. x1 is minimum point.

    99

  • 7. Choice of Stepsize.

    In computing the step size αk we face a tradeoff. We would like to choose αk to give a substantial reduction of f, but at the same time we do not want to spend too much time making the choice. The ideal choice would be the global minimizer of the univariate function φ : R → R defined by

    φ(α) = f(xk + α dk),   α > 0,

    but in general it is too expensive to identify this value.

    A common strategy is to perform an inexact line search to identify a step size that achieves adequate reductions in f at minimum cost.

    αk is normally chosen to satisfy the Wolfe conditions:

    f(xk + αk dk) ≤ f(xk) + c1 αk (gk)ᵀ dk   (5)

    ∇f(xk + αk dk)ᵀ dk ≥ c2 (gk)ᵀ dk,   (6)

    with 0 < c1 < c2 < 1. (5) is called the sufficient decrease condition, and (6) is the curvature condition.

    100

  • Choice of Stepsize.

    The simple condition

    f (xk + k dk) < f (xk) ()

    is not appropriate, as it may not lead to a sufficient reduction.

    Example: f (x) = (x 1)2 1. So min f (x) = 1, but we can choose xk satisfying ()such that f (xk) = 1

    k 0.

    Note that the sufficient decrease condition (5)

    () = f (xk + dk) `() := f (xk) + c1 (gk)Tdkyields acceptable regions for . Here () < `() for small > 0, as (gk)Tdk < 0 for

    descent directions.

    The curvature condition (6) is equivalent to

    () c2 (0) [ > (0) ]i.e. a condition on the desired slope and so rules out unacceptably short steps . In

    practice c1 = 104 and c2 = 0.9.

    101

  • 8. Convergence of Line Search Methods.

    An algorithm is said to be globally convergent if

    limk

    gk = 0.

    It can be shown that if the step sizes satisfy the Wolfe conditions

    then the steepest descent method is globally convergent, so is the Newton method provided the Hessian matrices 2f (xk) have a boundedcondition number and are positive definite.

    Exercise. Show that the steepest descent method is globally convergent if the

    following conditions hold

    (a) k satisfies the Wolfe conditions,

    (b) f (x) M x Rn,(c) f C1 and f is Lipschitz: f (x)f (y) L x y x, y Rn.

    102

  • [Hint: Show thatk=0

    gk2

  • Exercise 6.8.

    Assume that dk is a descent direction, i.e. (gk)T dk < 0, where gk := f (xk). Then if1. k satisfies the Wolfe conditions,

    2. f (x) M x Rn,3. f C1 and f is Lipschitz, i.e. f (x)f (y) L x y x, y Rn,

    it holds thatk=0

    cos2 k gk2

  • The Lipschitz condition yields that

    (gk+1 gk)T dk gk+1 gk dk = f (xk+1)f (xk) dk L xk+1 xk dk = k L dk2 . ()

    Combining () and () gives k c2 1L

    (gk)T dk

    dk2 , and hence

    k (gk)T dk c2 1

    L

    [(gk)T dk]2

    dk2Together with Wolfe condition (5) we get

    f (xk+1) f (xk) + c1 c2 1L

    [(gk)T dk]2

    dk2 = f (xk) c cos2 k gk2 ,

    105

  • where c := c11c2L

    > 0.

    f (xk+1) f (xk) c cos2 k gk2 f (x0) ck

    j=0

    cos2 j gj2

    k

    j=0

    cos2 j gj2 1c(f (x0)M) k

    j=0

    cos2 j gj2

  • 9. Popular Search Methods.

    In practice the steepest descent method and the Newton method are rarely used

    due to the slow convergence rate and the difficulty in computing Hessian matrices,

    respectively.

    The popular search methods are

    the conjugate gradient method (variation of SD method with superlinear conver-gence) and

    the quasi-Newton method (variation of Newton method without computation ofHessian matrices).

    There are some efficient algorithms based on the trust-region approach. See Fletcher

    (2000) for details.

    107

  • 10. Constrained Optimization.

    Minimize f (x) over x Rn subject tothe equality constraints

    hi(x) = 0, i = 1, . . . , l ,

    and the inequality constraints

    gj(x) 0, j = 1, . . . ,m .

    Assume that all functions involved are differentiable.

    108

  • 11. Linear Programming.

    The problem is to minimize

    z = c1 x1 + + cn xnsubject to

    ai1 x1 + + ain xn bi, i = 1, . . . ,m ,and

    x1, . . . , xn 0.LPs can be easily and efficiently solved with the simplex algorithm or the interior

    point method.

    MS-Excel has a good in-built LP solver capable of solving problems up to 200 vari-

    ables. MATLAB with optimization toolbox also provides a good LP solver.

    109

  • 12. Graphic Method.

    If an LP problem has only two decision variables (x1, x2), then it can be solved by

    the graphic method as follows:

    First draw the feasible region from the given constraints and a contour line of theobjective function,

    then, on establishing the increasing direction perpendicular to the contour line,find the optimal point on the boundary of the feasible region,

    then find two linear equations which define that point, and finally solve the two equations to obtain the optimal point.Exercise. Use the graphic method to solve the LP:

    minimize z = 3 x1 2 x2subject to x1 + x2 80, 2x1 + x2 100, x1 40, and x1, x2 0.(Answer. x1 = 20, x2 = 60.)

    110

  • 13. Quadratic Programming.

    Minimize

    xTQx + cTx

    subject to

    Ax b and x 0,where Q is an n n symmetric positive definite matrix, A is an n m matrix,x, c Rn, b Rm.To solve a QP problem, one

    first derives a set of equations from the KuhnTucker conditions, and then applies the Wolfe algorithm or the Lemke algorithm to find the optimalsolution.

    The MS-Excel solver is capable of solving reasonably sized QP problems, similarly

    for MATLAB.

    111

  • 14. Kuhn–Tucker Conditions.

    min f(x) over x ∈ R^n s.t. hi(x) = 0, i = 1, . . . , l;  gj(x) ≤ 0, j = 1, . . . , m.

    Assume that x∗ is an optimal solution.

    Under some regularity conditions, called the constraint qualifications, there exist two vectors u = (u1, . . . , ul) and v = (v1, . . . , vm), called the Lagrange multipliers, such that the following set of conditions is satisfied:

    ∂L/∂xk (x∗, u, v) = 0,   k = 1, . . . , n

    hi(x∗) = 0,   i = 1, . . . , l

    gj(x∗) ≤ 0,   j = 1, . . . , m

    vj gj(x∗) = 0,  vj ≥ 0,   j = 1, . . . , m

    where

    L(x, u, v) = f(x) + Σ_{i=1}^{l} ui hi(x) + Σ_{j=1}^{m} vj gj(x)

    is called the Lagrange function or Lagrangian.

    112

  • Furthermore, if f : Rn R and hi, gj : Rn R are convex, then x is an optimalsolution if and only if (x, u, v) satisfies the KuhnTucker conditions.

    This holds in particular, when f is convex and hi, gj are linear.

    Example.

    Find the minimum solution to the function x2 x1 subject to x21 + x22 1.Exercise.

    Find the minimum solution to the function x21+x222x14x2 subject to x1+2x2 2

    and x2 0.(Answer. (x1, x2) =

    15 (2, 4).)

    113

  • Interpretation of the Kuhn-Tucker conditions

    Assume that no equality constraints are present.

    If x* is an interior point, i.e. no constraints are active, then we recover the usual
    optimality condition: ∇f (x*) = 0.

    Now assume that x* lies on the boundary of the feasible set and let g_{j_1}, . . . , g_{j_K} be
    the active constraints at x*. Then a necessary condition for optimality is that we cannot
    find a descent direction for f at x* that is also a feasible direction. Such a vector cannot
    exist if

    −∇f (x*) = Σ_{k=1}^{K} v_{j_k} ∇g_{j_k}(x*)   with   v_{j_k} ≥ 0 .   (*)

    This is because, if d ∈ Rn is a descent direction, then ∇f (x*)^T d < 0 and hence
    Σ_k v_{j_k} ∇g_{j_k}(x*)^T d > 0.

    So there must exist a j_k such that ∇g_{j_k}(x*)^T d > 0. But that means that d is an
    ascent direction for g_{j_k}, and as g_{j_k} is active at x*, d is not a feasible direction.

    If we require vj gj(x*) = 0 for all inactive constraints, then we can re-write (*) as

    0 = ∇f (x*) + Σ_{j=1}^{m} vj ∇gj(x*)   with   vj ≥ 0 .

    These are the KT conditions.

    115

  • Application of Kuhn-Tucker: LP Duality

    Let b ∈ Rm, c ∈ Rn and A ∈ Rm×n.

    min cT x s.t. A x ≥ b, x ≥ 0 .   (P)

    Equivalent to min cT x s.t. b − A x ≤ 0, −x ≤ 0 .

    Lagrangian: L = cT x + vT (b − A x) + yT (−x).

    Hence x* is the solution if there exist v and y such that

    ∇L = c − AT v − y = 0  ⟹  y = c − AT v,

    KT conditions: vT (b − A x*) = 0, yT x* = 0,

    v, y ≥ 0, b − A x* ≤ 0, x* ≥ 0.

    116

  • Eliminate y to find v, x*:

    A x* ≥ b, x* ≥ 0   (feasible region: primal)

    AT v ≤ c, v ≥ 0   (feasible region: dual)

    vT (b − A x*) = 0

    (x*)T (c − AT v) = 0

    ⟹  (x*)T c = (x*)T AT v = vT b

    Hence v ∈ Rm solves the dual:

    max bT v s.t. AT v ≤ c, v ≥ 0 .   (D)

    117

  • Here we have used that

    cT x* = min_{x≥0, A x≥b} cT x = min_{x≥0} max_{v≥0} ( cT x + vT (b − A x) )

          ≥ max_{v≥0} min_{x≥0} ( cT x + vT (b − A x) )

          = max_{v≥0} min_{x≥0} ( vT b + xT (c − AT v) )

          = max_{v≥0, AT v ≤ c} vT b

          ≥ vT b = cT x* ,

    where in the last line v is the multiplier constructed above. Hence all inequalities are in
    fact equalities, and v attains the maximum in (D).

    118

  • Example 6.14.

    Find the minimum of the function x2 − x1 subject to x1² + x2² ≤ 1.

    Answer.

    L = x2 − x1 + v (x1² + x2² − 1), so the KT conditions become

    Lx1 = −1 + 2 v x1 = 0   (1)

    Lx2 = 1 + 2 v x2 = 0   (2)

    x1² + x2² ≤ 1   (3)

    v (x1² + x2² − 1) = 0, v ≥ 0   (4)

    (1) ⟹ v > 0 and hence x1 = 1/(2v), x2 = −1/(2v).

    Plugging this into (4) yields 2/(4 v²) = 1 and hence

    v = 1/√2  ⟹  x1 = 1/√2, x2 = −1/√2 ;

    with the optimal value being z = −√2.

    119
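    A minimal numerical check of this answer (not part of the original notes, standard C++ only):
    it evaluates the stationarity condition, the constraint and the complementarity product at the
    candidate point computed above.

```cpp
// Kuhn-Tucker check for Example 6.14: min x2 - x1 s.t. g(x) = x1^2 + x2^2 - 1 <= 0.
// Candidate from the slides: x = (1/sqrt(2), -1/sqrt(2)), v = 1/sqrt(2).
#include <cmath>
#include <cstdio>

int main() {
    const double x1 = 1.0 / std::sqrt(2.0), x2 = -1.0 / std::sqrt(2.0);
    const double v  = 1.0 / std::sqrt(2.0);

    // grad f = (-1, 1), grad g = (2 x1, 2 x2); stationarity: grad f + v grad g = 0
    const double st1 = -1.0 + v * 2.0 * x1;
    const double st2 =  1.0 + v * 2.0 * x2;
    const double g   = x1 * x1 + x2 * x2 - 1.0;

    std::printf("stationarity: (%g, %g)\n", st1, st2);          // both ~ 0
    std::printf("g(x) = %g, v*g(x) = %g, v = %g >= 0\n", g, v * g, v);
    std::printf("z = %g (expected -sqrt(2) = %g)\n", x2 - x1, -std::sqrt(2.0));
    return 0;
}
```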

  • Example.

    min x1 s.t. x2 − x1³ ≤ 0, x1 ≤ 1, −x2 ≤ 0.

    Since x1³ ≥ x2 ≥ 0 we have x1 ≥ 0, and hence x* = (0, 0) is the unique minimizer.

    Lagrangian: L = x1 + v1 (x2 − x1³) + v2 (x1 − 1) + v3 (−x2).

    KT conditions for a feasible point x:

    ∇L = ( 1 − 3 v1 x1² + v2 , v1 − v3 )^T = 0   (1)

    v1 (x2 − x1³) = 0, v2 (x1 − 1) = 0, v3 (−x2) = 0   (2)

    v1, v2, v3 ≥ 0   (3)

    Check the KT conditions at x* = (0, 0):

    (1) ⟹ v2 = −1 < 0, which is impossible!

    The KT conditions are not satisfied, since the constraint qualifications do not hold.

    Here g1 = x2 − x1³ and g3 = −x2 are active at (0, 0), and ∇g1 = (0, 1)^T, ∇g3 = (0, −1)^T.
    Hence ∇g1 and ∇g3 are not linearly independent!

  • Exercise 6.14

    Find the minimum of the function x1² + x2² − 2 x1 − 4 x2 subject to x1 + 2 x2 ≤ 2

    and x2 ≥ 0.

    Answer.

    Lagrangian: L = x1² + x2² − 2 x1 − 4 x2 + v1 (x1 + 2 x2 − 2) + v2 (−x2)

    KT conditions:

    ∇L = ( 2 x1 − 2 + v1 , 2 x2 − 4 + 2 v1 − v2 )^T = 0   (1)

    v1 (x1 + 2 x2 − 2) = 0, v2 x2 = 0, v1, v2 ≥ 0   (2)

    x1 + 2 x2 ≤ 2, x2 ≥ 0   (3)

    If x1 + 2 x2 − 2 < 0 then v1 = 0 ⟹ x1 = 1, x2 = 2 + (1/2) v2 ≥ 2. Hence from (2),
    v2 = 0, and so x1 = 1, x2 = 2. But that contradicts (3), and so it must hold that
    x1 + 2 x2 − 2 = 0.

    If x2 > 0, then v2 = 0; solving (1) together with x1 + 2 x2 − 2 = 0 gives x1 = 2/5,
    x2 = 4/5; and v1 = 6/5, v2 = 0. ✓

    121

  • If x2 = 0 then x1 = 2 − 2 x2 = 2. But then from (1), v1 = −2 < 0. Impossible.

    122

  • 7. Lattice (Tree) Methods

    1. Concepts in Option Pricing Theory.

    A call (or put) option is a contract that gives the holder the right to buy (or sell)

    a prescribed asset (underlying asset) by a certain date T (expiration date) for a

    predetermined price X (exercise price).

    A European option can only be exercised on the expiration date while an American

    option can be exercised at any time prior to the expiration date.

    The other party to the holder of the option contract is called the writer.

    The holder and the writer are said to be in long and short positions of the contract,

    respectively.

    The terminal payoff of a European call (or put) option is (S(T ) − X)+ := max(S(T ) − X, 0)
    (or (X − S(T ))+, respectively), where S(T ) is the underlying asset price at time T .

    123

  • 2. Random Walk Model and Assumption.

    Assume S(t) and V (t) are the asset price and the option price at time t, respectively,

    and the current time is t and the current asset price is S, i.e. S(t) = S.

    After one period of time Δt, the asset price S(t + Δt) is either uS (up state)

    with probability q or dS (down state) with probability 1 − q, where q is a real

    (subjective) probability.

    To avoid riskless arbitrage opportunities, we must have

    u > R > d ,

    where R := e^{r Δt} and r is the riskless interest rate.

    Here r is the riskless interest rate that allows for unlimited borrowing or lending at

    the same rate r.

    Investing $1 at time t yields a value (return) at time t + Δt of $R = $ e^{r Δt}

    (continuous compounding).

    124

  • Continuous compounding

    Saving an amount B at time t yields, with interest rate r and without compounding,

    the value (1 + r Δt) B at time t + Δt.

    Compounding the interest n times yields

    (1 + r Δt / n)^n B → e^{r Δt} B  as n → ∞ .

    Example:

    Consider Δt = 1 and r = 0.05.

    Then R = e^r = 1.0512711, equivalent to an AER of 5.127%.

    n      (1 + r/n)^n    AER

    1      1.05           5 %
    4      1.05094534     5.095 %
    12     1.05116190     5.116 %
    52     1.05124584     5.125 %
    365    1.05126750     5.127 %

    125
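    A short sketch (standard C++ only) that reproduces the compounding table above and its
    continuous-compounding limit e^r; the rate r = 0.05 is the example value from the slide.

```cpp
// (1 + r/n)^n for various n, converging to e^r as n grows.
#include <cmath>
#include <cstdio>

int main() {
    const double r = 0.05;                 // annual rate, dt = 1 year
    const int periods[] = {1, 4, 12, 52, 365};

    std::printf("%8s %14s %10s\n", "n", "(1+r/n)^n", "AER");
    for (int n : periods) {
        const double growth = std::pow(1.0 + r / n, n);
        std::printf("%8d %14.8f %9.3f%%\n", n, growth, 100.0 * (growth - 1.0));
    }
    std::printf("limit: e^r = %.8f\n", std::exp(r));
    return 0;
}
```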

  • No Arbitrage: u > R > d.

    If R ≥ u, then short sell α > 0 units of the asset and deposit αS in the bank, giving a
    portfolio value at time t of:

    Π(t) = 0

    and at time t + Δt of

    Π(t + Δt) = R (αS) − α S(t + Δt) ≥ R (αS) − α (uS) ≥ 0 .

    Moreover, in the down state (with probability 1 − q > 0) the value is

    Π(t + Δt) = R (αS) − α (dS) > 0 .

    Hence we can make a riskless profit with positive probability.

    No arbitrage implies u > R.

    A similar argument yields that d < R.

    126

  • 3. Replicating Portfolio.

    We form a portfolio consisting of

    Δ units of the underlying asset (Δ > 0 buying, Δ < 0 short selling) and

    a cash amount B in a riskless cash bond (B > 0 lending, B < 0 borrowing).

    If we choose

    Δ = ( V (uS, t + Δt) − V (dS, t + Δt) ) / ( uS − dS ) ,

    B = ( u V (dS, t + Δt) − d V (uS, t + Δt) ) / ( uR − dR ) ,

    we have replicated the payoff of the option at time t + Δt, no matter how the asset

    price changes.

    127

  • Replicating Portfolio.

    Value of the portfolio at time t : Π(t) = Δ S + B, and at time t + Δt:

    Π(t + Δt) = Δ S(t + Δt) + R B = { Δ uS + R B =: Π_u^{Δt}  with prob. q
                                      Δ dS + R B =: Π_d^{Δt}  with prob. 1 − q

    Call option value at time t + Δt (expiration time):

    C(t + Δt) = { (uS − X)+ =: C_u^{Δt}  with prob. q
                  (dS − X)+ =: C_d^{Δt}  with prob. 1 − q

    To replicate the call option, we must have Π_u^{Δt} = C_u^{Δt} and Π_d^{Δt} = C_d^{Δt}. Hence

    Δ uS + R B = C_u^{Δt} ,   Δ dS + R B = C_d^{Δt}

    ⟹  Δ = ( C_u^{Δt} − C_d^{Δt} ) / ( uS − dS ) ,   B = ( u C_d^{Δt} − d C_u^{Δt} ) / ( uR − dR )

    This gives the fair price for the call option at time t: Π(t) = Δ S + B.

    128

  • Replicating Portfolio.

    The value of the call option at time t is:

    C(t) = Π(t) = Δ S + B = ( C_u^{Δt} − C_d^{Δt} ) / ( uS − dS ) · S + ( u C_d^{Δt} − d C_u^{Δt} ) / ( uR − dR )

         = (1/R) [ (R − d)/(u − d) · C_u^{Δt} + (u − R)/(u − d) · C_d^{Δt} ]

         = (1/R) [ p C_u^{Δt} + (1 − p) C_d^{Δt} ] ,

    where

    p = (R − d)/(u − d) .

    No arbitrage argument: If C(t) < Π(t), then buy the call option at price C(t) and sell the
    portfolio at Π(t) (that is, short sell Δ units of the asset and lend B amount of cash to the
    bank). This gives an instantaneous profit of Π(t) − C(t) > 0. And we know that at
    time t + Δt, the payoff from our call option will exactly compensate for the value of the
    portfolio, as C(t + Δt) = Π(t + Δt).

    A similar argument for the case C(t) > Π(t) yields that C(t) = Π(t).

    129

  • 4. Risk Neutral Option Price.

    The current option price is given by

    V (S, t) = (1/R) ( p V (uS, t + Δt) + (1 − p) V (dS, t + Δt) ) ,   (7)

    where p = (R − d)/(u − d). Note that

    1. the real probability q does not appear in the option pricing formula,

    2. p is a probability (0 < p < 1) since d < R < u, and

    3. p (uS) + (1 − p) (dS) = R S, so p is called the risk neutral probability and the
       process of finding V is called risk neutral valuation.

    130
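    A minimal one-period sketch of the above (standard C++ only): the inputs S, X, u, d, r, Δt
    are illustrative values, not taken from the notes. It prices a one-period call both by
    replication (Δ, B) and by the risk-neutral formula (7); the two must agree.

```cpp
// One-period binomial call: (i) replicating portfolio Delta*S + B, (ii) formula (7).
#include <algorithm>
#include <cmath>
#include <cstdio>

int main() {
    const double S = 100.0, X = 100.0;      // spot and strike
    const double u = 1.1, d = 0.9;          // up/down factors
    const double r = 0.05, dt = 1.0;        // riskless rate, period length
    const double R = std::exp(r * dt);      // growth factor over one period

    const double Cu = std::max(u * S - X, 0.0);   // payoff in up state
    const double Cd = std::max(d * S - X, 0.0);   // payoff in down state

    // (i) replication: Delta units of stock plus cash B
    const double Delta = (Cu - Cd) / (u * S - d * S);
    const double B     = (u * Cd - d * Cu) / (u * R - d * R);
    std::printf("replication:  Delta = %.6f, B = %.6f, price = %.6f\n",
                Delta, B, Delta * S + B);

    // (ii) risk-neutral valuation, p = (R - d)/(u - d)
    const double p = (R - d) / (u - d);
    std::printf("risk-neutral: p = %.6f, price = %.6f\n",
                p, (p * Cu + (1.0 - p) * Cd) / R);
    return 0;
}
```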

  • 5. Riskless Hedging Principle.

    We can also derive the option pricing formula (7) as follows:

    form a portfolio with a long position of one unit of the option and a short position of

    Δ units of the underlying asset.

    By choosing an appropriate Δ we can ensure such a portfolio is riskless.

    Exercise.

    Find Δ (called the delta) and derive the pricing formula (7).

    131

  • 6. Asset Price Process.

    In a risk neutral continuous time model, the asset price S is assumed to follow a

    lognormal process

    (as discussed in detail in Stochastic Processes I, omitted here).

    Over a period [t, t + τ ] the asset price S(t + τ ) can be expressed in terms of S(t) = S

    as

    S(t + τ ) = S e^{(r − σ²/2) τ + σ √τ Z}

    where r is the riskless interest rate, σ the volatility, and Z ∼ N(0, 1) a standard

    normal random variable.

    S(t + τ ) has the first moment S e^{r τ} and the second moment S² e^{(2 r + σ²) τ}.

    132

  • 7. Relations Between u, d and p.

    By equating the first and second moments of the asset price S(t + Δt) in both

    continuous and discrete time models, we obtain the relations

    S ( p u + (1 − p) d ) = S e^{r Δt},

    S² ( p u² + (1 − p) d² ) = S² e^{(2 r + σ²) Δt},

    or equivalently,

    p u + (1 − p) d = e^{r Δt},   (8)

    p u² + (1 − p) d² = e^{(2 r + σ²) Δt}.   (9)

    There are two equations and three unknowns u, d, p. An extra condition is needed to

    uniquely determine a solution.

    [Note that (8) implies p = (R − d)/(u − d), where R = e^{r Δt} as before.]

    133

  • 8. Cox-Ross-Rubinstein Model.

    The extra condition is

    u d = 1.   (10)

    The solutions to (8), (9), (10) are

    u = (1/(2R)) ( W + 1 + √( (W + 1)² − 4 R² ) ),

    d = (1/(2R)) ( W + 1 − √( (W + 1)² − 4 R² ) ),

    where

    W = e^{(2 r + σ²) Δt}.

    If the higher order term O((Δt)^{3/2}) is ignored in u, d, then

    u = e^{σ √Δt},   d = e^{−σ √Δt},   p = (R − d)/(u − d) .

    These are the parameters chosen by Cox-Ross-Rubinstein for their model.

    Note that with this choice (9) is satisfied up to O((Δt)²).

    134
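    A small sketch (standard C++ only, illustrative parameters) comparing the exact solution of
    the moment equations with u d = 1 against the CRR approximation, and checking the residual
    in (9):

```cpp
// Exact u from (8)-(10) versus CRR u = exp(sigma*sqrt(dt)), d = 1/u, p = (R-d)/(u-d).
#include <cmath>
#include <cstdio>

int main() {
    const double r = 0.05, sigma = 0.2, dt = 0.01;
    const double R = std::exp(r * dt);
    const double W = std::exp((2.0 * r + sigma * sigma) * dt);

    // exact solution of the moment-matching equations with u*d = 1
    const double u_true = ((W + 1.0) + std::sqrt((W + 1.0) * (W + 1.0) - 4.0 * R * R)) / (2.0 * R);
    const double d_true = 1.0 / u_true;

    // CRR parameters (higher-order terms dropped)
    const double u = std::exp(sigma * std::sqrt(dt));
    const double d = 1.0 / u;
    const double p = (R - d) / (u - d);

    std::printf("u_true = %.10f, u_CRR = %.10f, diff = %.2e\n", u_true, u, u - u_true);
    std::printf("d_true = %.10f, d_CRR = %.10f\n", d_true, d);
    std::printf("p = %.10f, residual in (9): %.2e\n",
                p, p * u * u + (1.0 - p) * d * d - W);
    return 0;
}
```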

  • Proof.

    W = e^{(2 r + σ²) Δt}

      = p (u² − d²) + d²

      = (R − d)/(u − d) · (u + d) (u − d) + d²

      = R u − d u + R d = R u − 1 + R/u

    ⟹  R u² − (1 + W) u + R = 0 .

    As u > d, we get the unique solutions

    u_true = ( 1 + W + √( (1 + W)² − 4 R² ) ) / (2R) = (1 + W)/(2R) + √( ((1 + W)/(2R))² − 1 )

    and

    d_true = 1/u_true = ( 1 + W − √( (1 + W)² − 4 R² ) ) / (2R) = (1 + W)/(2R) − √( ((1 + W)/(2R))² − 1 ) .

    135

  • Moreover

    (1 + W)/(2R) = (1/2) ( e^{(2 r + σ²) Δt} + 1 ) e^{−r Δt}

                 = (1/2) ( 2 + (2 r + σ²) Δt + O((Δt)²) ) ( 1 − r Δt + O((Δt)²) )

                 = (1/2) ( 2 + σ² Δt + O((Δt)²) )

                 = 1 + (1/2) σ² Δt + O((Δt)²) ,

    and

    √( ((1 + W)/(2R))² − 1 ) = √( (1 + (1/2) σ² Δt + O((Δt)²))² − 1 )

                             = √( 1 + σ² Δt + O((Δt)²) − 1 )

                             = σ √Δt · √( 1 + O(Δt) )

                             = σ √Δt ( 1 + O(Δt) )

                             = σ √Δt + O((Δt)^{3/2}) .

    ⟹  u_true = 1 + (1/2) σ² Δt + σ √Δt + O((Δt)^{3/2}) .

    136

  • The Cox-Ross-Rubinstein (CRR) model uses

    u = e^{σ √Δt} = 1 + σ √Δt + (1/2) σ² Δt + O((Δt)^{3/2}) .

    Hence the first three terms in the Taylor series of the CRR value u and the true value

    u_true match. So

    u = u_true + O((Δt)^{3/2}) .

    Estimating the error in (9) for this choice of u gives

    Error = p u² + (1 − p) d² − e^{(2 r + σ²) Δt}

          = R (u + d) − 1 − e^{(2 r + σ²) Δt}

          = e^{r Δt} ( e^{σ √Δt} + e^{−σ √Δt} ) − 1 − e^{(2 r + σ²) Δt}

          = ( 1 + r Δt + O((Δt)²) ) ( 2 + σ² Δt + O((Δt)²) ) − 1 − ( 1 + (2 r + σ²) Δt + O((Δt)²) )

          = O((Δt)²) .

    137

  • 9. Jarrow-Rudd Model.

    The extra condition is

    p = 1/2.   (11)

    Exercise.

    Show that the solutions to (8), (9), (11) are

    u = R ( 1 + √( e^{σ² Δt} − 1 ) ),

    d = R ( 1 − √( e^{σ² Δt} − 1 ) ).

    Show that if O((Δt)^{3/2}) is ignored then

    u = e^{(r − σ²/2) Δt + σ √Δt},

    d = e^{(r − σ²/2) Δt − σ √Δt}.

    (These are the parameters chosen by Jarrow-Rudd for their model.)

    Also show that (8) and (9) are satisfied up to O((Δt)²).

    138

  • 10. Tian Model.

    The extra condition is

    p u³ + (1 − p) d³ = e^{3 r Δt + 3 σ² Δt}.   (12)

    Exercise.

    Show that the solutions to (8), (9), (12) are

    u = (R Q / 2) ( Q + 1 + √( Q² + 2 Q − 3 ) ),

    d = (R Q / 2) ( Q + 1 − √( Q² + 2 Q − 3 ) ),

    p = (R − d)/(u − d),

    where R = e^{r Δt} and Q = e^{σ² Δt}.

    Also show that if O((Δt)^{3/2}) is ignored, then

    p = 1/2 − (3/4) σ √Δt.

    139

  • (Note that u d = R² Q² instead of u d = 1 and that the binomial tree loses its

    symmetry about S whenever u d ≠ 1.)

    140

  • 11. Black-Scholes Equation (BSE).

    As the time interval Δt tends to zero, the one period option pricing formula, (7) with

    (8) and (9), tends to the Black-Scholes Equation

    ∂V/∂t + r S ∂V/∂S + (1/2) σ² S² ∂²V/∂S² − r V = 0.   (13)

    Proof.

    A two variable Taylor expansion at (S, t) gives

    V (uS, t + Δt) = V + VS (uS − S) + Vt Δt
                     + (1/2) ( VSS (uS − S)² + 2 VSt (uS − S) Δt + Vtt (Δt)² )
                     + higher order terms .

    Similarly,

    V (dS, t + Δt) = V + VS (dS − S) + Vt Δt
                     + (1/2) ( VSS (dS − S)² + 2 VSt (dS − S) Δt + Vtt (Δt)² )
                     + higher order terms .

    141

  • Substituting into the right hand side of (7) gives

    V = { V + S VS [ p (u − 1) + (1 − p) (d − 1) ] + Vt Δt

          + (1/2) [ S² VSS ( p (u − 1)² + (1 − p) (d − 1)² )

                    + 2 S VSt ( p (u − 1) + (1 − p) (d − 1) ) Δt

                    + Vtt (Δt)² ]

          + higher order terms } e^{−r Δt} .

    Now it follows from (8) that

    p (u − 1) + (1 − p) (d − 1) = p u + (1 − p) d − 1 = e^{r Δt} − 1 = r Δt + O((Δt)²) .

    Similarly, (8) and (9) give that

    p (u − 1)² + (1 − p) (d − 1)² = p u² + (1 − p) d² − 2 ( p u + (1 − p) d ) + 1

                                  = e^{(2 r + σ²) Δt} − 2 e^{r Δt} + 1 = σ² Δt + O((Δt)²) .

    142

  • So, for the RHS of (7) we get

    V = { V + S VS ( r Δt + O((Δt)²) ) + Vt Δt

          + (1/2) [ S² VSS ( σ² Δt + O((Δt)²) ) + 2 S VSt ( r Δt + O((Δt)²) ) Δt ]

          + O((Δt)²) } e^{−r Δt}

      = { V + ( r S VS + Vt + (1/2) σ² S² VSS ) Δt + O((Δt)²) } ( 1 − r Δt + O((Δt)²) )

      = V + ( −r V + r S VS + Vt + (1/2) σ² S² VSS ) Δt + O((Δt)²) .

    Cancelling V and dividing by Δt yields

    −r V + r S VS + Vt + (1/2) σ² S² VSS + O(Δt) = 0 .

    Ignoring the O(Δt) term, we get the BSE.

    The binomial model approximates the BSE to first order accuracy.

    143

  • 12. n-Period Option Price Formula.

    For a multiplicative n-period binomial process, the call value is

    C = R^{−n} Σ_{j=0}^{n} \binom{n}{j} p^j (1 − p)^{n−j} max(u^j d^{n−j} S − X, 0) ,   (14)

    where \binom{n}{j} = n! / ( j! (n − j)! ) is the binomial coefficient.

    Define k to be the smallest non-negative integer such that u^k d^{n−k} S ≥ X .

    The call pricing formula can then be simplified to

    C = S Φ(n, k, p̂) − X R^{−n} Φ(n, k, p) ,   (15)

    where p̂ = u p / R and Φ is the complementary binomial distribution function defined

    by

    Φ(n, k, p) = Σ_{j=k}^{n} \binom{n}{j} p^j (1 − p)^{n−j} .

    144
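    A direct evaluation of formula (14) as a sketch (standard C++ only, CRR parameters, the
    numerical inputs are illustrative); the result can be compared with (15) or with backward
    induction on the tree.

```cpp
// n-period binomial call price from formula (14).
#include <algorithm>
#include <cmath>
#include <cstdio>

int main() {
    const double S = 100.0, X = 100.0, r = 0.05, sigma = 0.2, T = 1.0;
    const int n = 100;

    const double dt = T / n;
    const double R  = std::exp(r * dt);
    const double u  = std::exp(sigma * std::sqrt(dt));
    const double d  = 1.0 / u;
    const double p  = (R - d) / (u - d);

    // sum_{j=0}^{n} binom(n,j) p^j (1-p)^{n-j} (u^j d^{n-j} S - X)^+ , then discount by R^{-n}
    double sum  = 0.0;
    double prob = std::pow(1.0 - p, n);   // binom(n,0) p^0 (1-p)^n
    for (int j = 0; j <= n; ++j) {
        const double ST = S * std::pow(u, j) * std::pow(d, n - j);
        sum += prob * std::max(ST - X, 0.0);
        // next binomial weight: binom(n,j+1) p^(j+1) (1-p)^(n-j-1)
        prob *= (double)(n - j) / (j + 1) * p / (1.0 - p);
    }
    std::printf("n-period binomial call price: %.6f\n", sum / std::pow(R, n));
    return 0;
}
```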

  • Two-period recombined tree (times t, t + Δt, t + 2Δt):

    Asset:   S  →  uS, dS  →  u² S, u d S, d² S

    Call:    C  →  C_u^{Δt}, C_d^{Δt}  →  C_uu^{2Δt} = (u² S − X)+ ,
                                          C_ud^{2Δt} = (u d S − X)+ ,
                                          C_dd^{2Δt} = (d² S − X)+

    145

  • Call price at time t + Δt:

    C_u^{Δt} = (1/R) ( p C_uu^{2Δt} + (1 − p) C_ud^{2Δt} ) ,   C_d^{Δt} = (1/R) ( p C_ud^{2Δt} + (1 − p) C_dd^{2Δt} ) .

    Call price at time t:

    C = (1/R) ( p C_u^{Δt} + (1 − p) C_d^{Δt} )

      = (1/R²) ( p² C_uu^{2Δt} + 2 p (1 − p) C_ud^{2Δt} + (1 − p)² C_dd^{2Δt} ) .

    Proof of (14) by induction:

    Assume (14) holds for n ≤ k − 1; then for n = k, there are k − 1 periods between time
    t + Δt and t + kΔt. So

    C_u^{Δt} = (1/R^{k−1}) Σ_{j=0}^{k−1} \binom{k−1}{j} p^j (1 − p)^{k−1−j} ( u^j d^{k−1−j} (uS) − X )+

             = (1/R^{k−1}) Σ_{j=1}^{k} \binom{k−1}{j−1} p^{j−1} (1 − p)^{k−j} ( u^j d^{k−j} S − X )+ .

    146

  • ⟹  p C_u^{Δt} = (1/R^{k−1}) Σ_{j=1}^{k} \binom{k−1}{j−1} p^j (1 − p)^{k−j} ( u^j d^{k−j} S − X )+ ,

    and similarly

    (1 − p) C_d^{Δt} = (1/R^{k−1}) Σ_{j=0}^{k−1} \binom{k−1}{j} p^j (1 − p)^{k−j} ( u^j d^{k−j} S − X )+ .

    Combining gives

    C = (1/R) ( p C_u^{Δt} + (1 − p) C_d^{Δt} )

      = (1/R^k) [ p^k ( u^k S − X )+ + (1 − p)^k ( d^k S − X )+

                  + Σ_{j=1}^{k−1} { \binom{k−1}{j−1} + \binom{k−1}{j} } p^j (1 − p)^{k−j} ( u^j d^{k−j} S − X )+ ] ,

    where \binom{k−1}{j−1} + \binom{k−1}{j} = \binom{k}{j}.

    This proves (14).

    147

  • Let k be the smallest non-negative integer such that

    u^k d^{n−k} S ≥ X  ⟺  (u/d)^k ≥ X/(S d^n)  ⟺  k ≥ ln( X/(S d^n) ) / ln(u/d) .

    Then

    ( u^j d^{n−j} S − X )+ = { 0                    if j < k
                               u^j d^{n−j} S − X    if j ≥ k .

    Hence

    C = (1/R^n) Σ_{j=k}^{n} \binom{n}{j} p^j (1 − p)^{n−j} ( u^j d^{n−j} S − X )

      = S Σ_{j=k}^{n} \binom{n}{j} [ p u / R ]^j [ (1 − p) d / R ]^{n−j}  −  X R^{−n} Σ_{j=k}^{n} \binom{n}{j} p^j (1 − p)^{n−j} .

    On setting p̂ = p u / R, it holds that

    1 − p̂ = 1 − (R − d)/(u − d) · u/R = ( −R d + u d ) / ( R (u − d) ) = (u − R)/(u − d) · d/R = (1 − p) d / R .

    This proves (15).

    148

  • 13. Black-Scholes Call Price Formula.

    As the time interval Δt tends to zero, the n-period call price formula (15) tends to

    the Black-Scholes call option pricing formula

    c = S N(d1) − X e^{−r (T−t)} N(d2)   (16)

    where N(·) is the standard normal distribution function,

    d1 = ( ln(S/X) + (r + σ²/2) (T − t) ) / ( σ √(T − t) )

    and

    d2 = ( ln(S/X) + (r − σ²/2) (T − t) ) / ( σ √(T − t) ) = d1 − σ √(T − t) .

    Proof.

    The n-period call price (15) is given by

    C = S Φ(n, k, p̂) − X R^{−n} Φ(n, k, p) = S Φ(n, k, p̂) − X e^{−r n Δt} Φ(n, k, p) ,

    where e^{−r n Δt} = e^{−r (T−t)}.

    149
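    A minimal sketch of formula (16) (standard C++ only; the function names bs_call and
    norm_cdf and the numerical inputs are my own, illustrative choices). N(·) is evaluated via
    the complementary error function, N(x) = erfc(−x/√2)/2.

```cpp
// Black-Scholes European call price, formula (16) with t = 0 and tau = T - t.
#include <cmath>
#include <cstdio>

double norm_cdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

double bs_call(double S, double X, double r, double sigma, double tau) {
    const double vol = sigma * std::sqrt(tau);
    const double d1  = (std::log(S / X) + (r + 0.5 * sigma * sigma) * tau) / vol;
    const double d2  = d1 - vol;
    return S * norm_cdf(d1) - X * std::exp(-r * tau) * norm_cdf(d2);
}

int main() {
    std::printf("BS call price: %.6f\n", bs_call(100.0, 100.0, 0.05, 0.2, 1.0));
    return 0;
}
```

    The binomial prices computed earlier converge to this value as the number of periods grows.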

  • Hence we need to prove

    Φ(n, k, p̂) → N(d1) and Φ(n, k, p) → N(d2) as Δt → 0.

    150

  • Let j be the binomial random variable counting the number of upward moves in n periods. Then

    Sn = S u^j d^{n−j} = S (u/d)^j d^n ,

    where Sn is the asset price at time T = t + n Δt.

    ln(Sn/S) = j ln(u/d) + n ln d = j ln u + (n − j) ln d ,

    E(j) = n p ,   Var(j) = n p (1 − p) ,

    E( ln(Sn/S) ) = n p ln(u/d) + n ln d ,   Var( ln(Sn/S) ) = n p (1 − p) ( ln(u/d) )² .

    Moreover,

    1 − Φ(n, k, p) = P ( j ≤ k − 1 ) = P ( ln(Sn/S) ≤ (k − 1) ln(u/d) + n ln d ) .

    As k is the smallest integer such that k ≥ ln( X/(S d^n) ) / ln(u/d), there exists an α ∈ (0, 1] such

    that

    k − 1 = ln( X/(S d^n) ) / ln(u/d) − α ,

    151

  • and so

    1 − Φ(n, k, p) = P ( ln(Sn/S) ≤ ln(X/S) − α ln(u/d) ) .

    152

  • Define

    X_i^n := { ln u  with prob. p
               ln d  with prob. 1 − p      i = 1, . . . , n ,

    and Zn := X_1^n + . . . + X_n^n .

    Then, for each n, {X_1^n, . . . , X_n^n} are independent and Zn ∼ ln(Sn/S). Hence

    1 − Φ(n, k, p) = P ( Zn ≤ ζn ) ,   ζn := ln(X/S) − α ln(u/d) .

    Assuming e.g. the CRR model, we get

    E(Zn) = n p ln u + n (1 − p) ln d → μ (T − t),   where μ := r − σ²/2 ,

    Var(Zn) = n p (1 − p) ln²(u/d) → σ² (T − t) ,

    ζn = ln(X/S) − α ln(u/d) → ln(X/S) ,

    and |X_i^n| ≤ σ √Δt = σ √((T − t)/n) =: yn → 0 as n → ∞ .

    153

  • Hence Theorem 5.10 implies that

    Zn →_D Z ∼ N( μ (T − t), σ² (T − t) ) .

    154

  • So, as n → ∞,

    1 − Φ(n, k, p) = P ( Zn ≤ ζn ) → 1/√( 2 π σ² (T − t) ) ∫_{−∞}^{ln(X/S)} e^{ −(s − μ (T − t))² / (2 σ² (T − t)) } ds .

    On letting

    y := ( s − μ (T − t) ) / ( σ √(T − t) ) ,

    we have

    1 − Φ(n, k, p) → 1/√(2π) ∫_{−∞}^{ ( ln(X/S) − μ (T − t) ) / ( σ √(T − t) ) } e^{−y²/2} dy

                   = N( ( ln(X/S) − μ (T − t) ) / ( σ √(T − t) ) ) .

    Hence

    Φ(n, k, p) → 1 − N( ( ln(X/S) − μ (T − t) ) / ( σ √(T − t) ) ) = N( −( ln(X/S) − μ (T − t) ) / ( σ √(T − t) ) )

               = N( ( ln(S/X) + (r − σ²/2) (T − t) ) / ( σ √(T − t) ) ) = N(d2) .

    Similarly one can show Φ(n, k, p̂) → N(d1).

  • 14. Implementation.

    Choose the number of time steps required to march from the current time t to the expiration

    time T to be n, so that Δt = (T − t)/n.

    At time ti = t + i Δt there are i + 1 nodes.

    Denote these nodes as (i, j) with i = 0, 1, . . . , n and j = 0, 1, . . . , i, where the first

    component i refers to time ti and the second component j refers to the asset price

    S u^j d^{i−j}.

    Compute the option prices V_j^n = V (S u^j d^{n−j}, T ), j = 0, 1, . . . , n, at expiration time

    tn = T over n + 1 nodes.

    Then go backwards one step and compute the option prices at time tn−1 = T − Δt using

    the one period binomial option pricing formula over n nodes.

    Continue until reaching the current time t0 = t.

    156

  • The recursive formula is

    V_j^i = (1/R) ( p V_{j+1}^{i+1} + (1 − p) V_j^{i+1} ) ,   j = 0, . . . , i and i = n − 1, . . . , 0 ,

    where

    p = (R − d)/(u − d) .

    The value V_0^0 = V (S, t) is the option price at the current time t.

    C++ Exercise: Write a program to implement the Cox-Ross-Rubinstein model to

    price options with a non-dividend paying underlying asset. The inputs are the asset

    price S, the strike price X , the riskless interest rate r, the current time t and the

    maturity time T , the volatility σ, and the number of steps n. The outputs are the

    European call and put option prices.

    157
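    One possible outline for the C++ exercise above, as a sketch rather than a model solution;
    the function name crr_price and the sample inputs in main are my own choices.

```cpp
// CRR backward induction for European call and put prices.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// payoff_call = true prices a call, false a put
double crr_price(double S, double X, double r, double sigma,
                 double t, double T, int n, bool payoff_call) {
    const double dt = (T - t) / n;
    const double R  = std::exp(r * dt);
    const double u  = std::exp(sigma * std::sqrt(dt));
    const double d  = 1.0 / u;
    const double p  = (R - d) / (u - d);

    // option values at expiry; node (n, j) carries asset price S * u^j * d^(n-j)
    std::vector<double> V(n + 1);
    for (int j = 0; j <= n; ++j) {
        const double ST = S * std::pow(u, j) * std::pow(d, n - j);
        V[j] = payoff_call ? std::max(ST - X, 0.0) : std::max(X - ST, 0.0);
    }
    // backward induction: V_j^i = (p V_{j+1}^{i+1} + (1-p) V_j^{i+1}) / R
    for (int i = n - 1; i >= 0; --i)
        for (int j = 0; j <= i; ++j)
            V[j] = (p * V[j + 1] + (1.0 - p) * V[j]) / R;

    return V[0];
}

int main() {
    const double S = 100, X = 100, r = 0.05, sigma = 0.2, t = 0, T = 1; // illustrative
    std::printf("call: %.6f  put: %.6f\n",
                crr_price(S, X, r, sigma, t, T, 200, true),
                crr_price(S, X, r, sigma, t, T, 200, false));
    return 0;
}
```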

  • 15. Greeks.

    Assume the current time is t = 0. Some common sensitivity measures of a European

    call option price c (the rate of change of the call price with respect to the underlying

    factors) are delta, gamma, vega, rho, theta, defined by

    Δ = ∂c/∂S ,   Γ = ∂²c/∂S² ,   v = ∂c/∂σ ,   ρ = ∂c/∂r ,   Θ = −∂c/∂T .

    Exercise.

    Derive the following relations from the call price (16).

    Δ = N(d1) ,

    Γ = N′(d1) / ( S σ √T ) ,

    v = S √T N′(d1) ,

    ρ = X T e^{−r T} N(d2) ,

    Θ = − S N′(d1) σ / ( 2 √T ) − r X e^{−r T} N(d2) .

    158
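    A sketch (standard C++ only) that simply evaluates the closed-form expressions listed above;
    it is not a solution to the derivation exercise, and the parameters are illustrative.

```cpp
// Black-Scholes greeks of a European call at t = 0.
#include <cmath>
#include <cstdio>

double N(double x)  { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }             // normal CDF
double Np(double x) { return std::exp(-0.5 * x * x) / std::sqrt(2.0 * std::acos(-1.0)); } // N'(x)

int main() {
    const double S = 100, X = 100, r = 0.05, sigma = 0.2, T = 1.0;
    const double d1 = (std::log(S / X) + (r + 0.5 * sigma * sigma) * T) / (sigma * std::sqrt(T));
    const double d2 = d1 - sigma * std::sqrt(T);

    const double delta = N(d1);
    const double gamma = Np(d1) / (S * sigma * std::sqrt(T));
    const double vega  = S * std::sqrt(T) * Np(d1);
    const double rho   = X * T * std::exp(-r * T) * N(d2);
    const double theta = -S * Np(d1) * sigma / (2.0 * std::sqrt(T))
                         - r * X * std::exp(-r * T) * N(d2);

    std::printf("delta %.6f  gamma %.6f  vega %.6f  rho %.6f  theta %.6f\n",
                delta, gamma, vega, rho, theta);
    return 0;
}
```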

  • 16. Dynamic Hedging.

    Greeks can be used to hedge an option position against changes of underlying factors.

    For example, to hedge a short position in a European call against changes of the asset

    price, one should hold a long position of Δ units of the underlying asset.

    This strategy requires continuous trading and is prohibitively expensive if there are

    any transaction costs.

    159

  • 17. Dividends.

    (a) If the asset pays a continuous dividend yield at a rate q, then use r − q instead of

    r in building the asset price tree, but still use r in discounting the option values

    in the backward calculation.

    In the risk-neutral model, the asset then has a rate of return of r − q. So

    (8) becomes p u + (1 − p) d = e^{(r − q) Δt} ,

    (9) becomes p u² + (1 − p) d² = e^{(2 r − 2 q + σ²) Δt} .

    However, the discount factor 1/R remains the same: 1/R = e^{−r Δt} .

    160

  • (b) If the asset pays a discrete proportional dividend δ S between ti−1 and ti, then the
    asset price at node (i, j) drops to (1 − δ) u^j d^{i−j} S instead of the usual u^j d^{i−j} S.
    The tree is still recombined.

    161

  • [Figure: recombined tree after a proportional dividend, with nodes such as
    (1 − δ) u² S, (1 − δ) u d S, (1 − δ) d² S and later (1 − δ) u⁴ S, . . . , (1 − δ) d⁴ S.]

    (c) If the asset pays a discrete cash dividend D between ti−1 and ti, then the asset
    price at node (i, j) drops to u^j d^{i−j} S − D instead of the usual u^j d^{i−j} S. There are
    i + 1 new recombined trees emanating from the nodes at time ti, but the original

    tree is no longer recombined.

    162

  • Suppose a cash dividend D is paid between ti−1 and ti; then at time ti+m there

    are (i + 1) (m + 1) nodes, instead of i + m + 1 in a recombined tree.

    If there are several cash dividends, then the number of nodes is much larger than

    in a normal tree.

    Example. Assume there is a cash dividend D between periods 1 and 2. Then

    the possible asset prices at time n = 2 are {u² S − D, u d S − D, d² S − D}. At
    n = 3 there are 6 nodes: u · {u² S − D, u d S − D, d² S − D} and d · {u² S − D,
    u d S − D, d² S − D}, instead of 4.

    [Reason: d (u² S − D) ≠ u (u d S − D).]

    163

  • [Figure: non-recombining tree after a cash dividend D, with nodes S; u² S − D,
    u d S − D, d² S − D; and then u² (u² S − D), . . . , d² (d² S − D).]

    164

  • 18. Trinomial Model.

    Suppose the asset price is S at time t. At time t + Δt the asset price can be

    uS (up-state) with probability pu, mS (middle-state) with probability pm, and dS

    (down-state) with probability pd. Then the risk neutral one period option pricing

    formula is

    V (S, t) = (1/R) ( pu V (uS, t + Δt) + pm V (mS, t + Δt) + pd V (dS, t + Δt) ) .

    The trinomial tree approach is equivalent to the explicit finite difference method. By

    equating the first and second moments of the asset price S(t + Δt) in both continuous

    and discrete models, we obtain the following equations:

    pu + pm + pd = 1 ,

    pu u + pm m + pd d = e^{r Δt} ,

    pu u² + pm m² + pd d² = e^{(2 r + σ²) Δt} .

    There are three equations and six variables u, m, d and pu, pm, pd. We need three

    more equations to uniquely determine these variables.

    165

  • 19. Boyle Model.

    The three additional conditions are

    m = 1,   u d = 1,   u = e^{λ σ √Δt} ,

    where λ is a free parameter.

    Exercise.

    Show that the risk-neutral probabilities are given by

    pu = ( (W − R) u − (R − 1) ) / ( (u − 1) (u² − 1) ) ,   pm = 1 − pu − pd ,

    pd = ( (W − R) u² − (R − 1) u³ ) / ( (u − 1) (u² − 1) ) ,

    where R = e^{r Δt} and W = e^{(2 r + σ²) Δt}.

    Note that if λ = 1, which corresponds to the choice of u as in the CRR model, certain

    sets of parameters can lead to pm < 0. To rectify this, choose λ > 1.

    166

  • Boyle claimed that if pu ≈ pm ≈ pd ≈ 1/3, then the trinomial scheme with 5 steps is
    comparable to the CRR binomial scheme with 20 steps.

    167

  • 20. Hull-White Model.

    This is the same as the Boyle model with λ = √3.

    Exercise.

    Show that the risk-neutral probabilities are

    pd = 1/6 − √( Δt / (12 σ²) ) ( r − σ²/2 ) ,

    pm = 2/3 ,

    pu = 1/6 + √( Δt / (12 σ²) ) ( r − σ²/2 ) ,

    if terms of order O(Δt) are ignored.

    168

  • 21. Kamrad-Ritchken Model.

    If S follows a lognormal process, then we can write ln S(t + Δt) = ln S(t) + Z, where

    Z is a normal random variable with mean (r − σ²/2) Δt and variance σ² Δt.

    KR suggested approximating Z by a discrete random variable Za as follows: Za =

    Δx with probability pu, 0 with probability pm, and −Δx with probability pd, where

    Δx = λ σ √Δt and λ ≥ 1.

    The corresponding u, m, d in the trinomial tree are u = e^{Δx}, m = 1, d = e^{−Δx}.

    By omitting the higher order term O((Δt)²), they showed that the risk-neutral prob-

    abilities are

    pu = 1/(2 λ²) + ( 1/(2 λ σ) ) ( r − σ²/2 ) √Δt ,

    pm = 1 − 1/λ² ,

    pd = 1/(2 λ²) − ( 1/(2 λ σ) ) ( r − σ²/2 ) √Δt .

    Note that if λ = 1 then pm = 0 and the trinomial scheme is reduced to the binomial

    scheme.

    KR claimed that if pu = pm = pd = 1/3, then the trinomial scheme with 15 steps is

    comparable to the binomial scheme (pm = 0) with 55 steps. They also discussed the

    trinomial scheme with two correlated state variables.

    170
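    A pricing sketch on a Kamrad-Ritchken trinomial tree (standard C++ only, illustrative
    parameters; λ = √1.5 makes the three probabilities roughly equal to 1/3). The backward
    induction uses the risk-neutral one-period formula from the trinomial model above.

```cpp
// European call on a KR trinomial tree with the probabilities listed above.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const double S = 100, X = 100, r = 0.05, sigma = 0.2, T = 1.0;
    const int n = 100;
    const double lambda = std::sqrt(1.5);

    const double dt = T / n;
    const double dx = lambda * sigma * std::sqrt(dt);
    const double mu = r - 0.5 * sigma * sigma;
    const double pu = 1.0 / (2.0 * lambda * lambda) + mu * std::sqrt(dt) / (2.0 * lambda * sigma);
    const double pm = 1.0 - 1.0 / (lambda * lambda);
    const double pd = 1.0 / (2.0 * lambda * lambda) - mu * std::sqrt(dt) / (2.0 * lambda * sigma);
    const double disc = std::exp(-r * dt);

    // terminal payoffs on the 2n+1 levels S * exp(j * dx), j = -n..n
    std::vector<double> V(2 * n + 1);
    for (int j = -n; j <= n; ++j)
        V[j + n] = std::max(S * std::exp(j * dx) - X, 0.0);

    // backward induction; the range of levels shrinks by one per step
    for (int i = n - 1; i >= 0; --i) {
        std::vector<double> Vnew(2 * i + 1);
        for (int j = -i; j <= i; ++j)
            Vnew[j + i] = disc * (pu * V[j + 1 + (i + 1)]
                                  + pm * V[j + (i + 1)]
                                  + pd * V[j - 1 + (i + 1)]);
        V.swap(Vnew);
    }
    std::printf("pu=%.4f pm=%.4f pd=%.4f  KR trinomial call: %.6f\n", pu, pm, pd, V[0]);
    return 0;
}
```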

  • Za = { Δx    with prob. pu
           0     with prob. pm
           −Δx   with prob. pd

    and the equations become

    pu + pm + pd = 1 ,   (1)

    E(Za) = Δx (pu − pd) = (r − σ²/2) Δt ,   (2)

    Var(Za) = (Δx)² (pu + pd) − (Δx)² (pu − pd)² = σ² Δt ,

    where the last term, (Δx)² (pu − pd)², is O((Δt)²).

    Dropping the O((Δt)²) term leads to

    (Δx)² (pu + pd) = σ² Δt .   (3)

    171

  • Hence

    pu = (1/2) [ σ² Δt/(Δx)² + (Δt/Δx) (r − σ²/2) ] ,   pd = (1/2) [ σ² Δt/(Δx)² − (Δt/Δx) (r − σ²/2) ] .

    ⟹  pu = (1/2) [ 1/λ² + (1/(λ σ)) (r − σ²/2) √Δt ] ,   pd = (1/2) [ 1/λ² − (1/(λ σ)) (r − σ²/2) √Δt ] .

    172

  • 8. Finite Difference Methods

    1. Diffusion Equations of One State Variable.

    ∂u/∂t = c² ∂²u/∂x² ,   (x, t) ∈ D ,   (17)

    where t is a time variable, x is a state variable, and u(x, t) is an unknown function

    satisfying the equation.

    To find a well-defined solution, we need to impose the initial condition

    u(x, 0) = u0(x)   (18)

    and, if D = [a, b] × [0, ∞), the boundary conditions

    u(a, t) = ga(t) and u(b, t) = gb(t) ,   (19)

    where u0, ga, gb are continuous functions.

    173

  • If D = (−∞, ∞) × (0, ∞), we need to impose the boundary conditions

    lim_{|x|→∞} u(x, t) e^{−a x²} = 0 for any a > 0.   (20)

    (20) implies that u(x, t) does not grow too fast as |x| → ∞.

    The diffusion equation (17) with the initial condition (18) and the boundary conditions

    (19) is well-posed, i.e. there exists a unique solution that depends continuously on u0,

    ga and gb.

    174

  • 2. Grid Points.

    To find a numerical solution to equation (17) with finite difference methods, we first

    need to define a set of grid points in the domain D as follows:

    Choose a state step size Δx = (b − a)/N (N is an integer) and a time step size Δt, draw

    a set of horizontal and vertical lines across D, and get all intersection points (xj, tn),

    or simply (j, n),

    where xj = a + j Δx, j = 0, . . . , N , and tn = n Δt, n = 0, 1, . . . .

    If D = [a, b] × [0, T ] then choose Δt = T/M (M is an integer) and tn = n Δt, n =

    0, . . . , M .

    175

  • [Grid: x0 = a, x1, x2, . . . , xN = b in the state direction;
     t0 = 0, t1, t2, t3, . . . , tM = T in the time direction.]

    176

  • 3. Finite Differences.

    The partial derivatives

    ux := ∂u/∂x and uxx := ∂²u/∂x²

    are always approximated by central difference quotients, i.e.

    ux ≈ ( u_{j+1}^n − u_{j−1}^n ) / (2 Δx)   and   uxx ≈ ( u_{j+1}^n − 2 u_j^n + u_{j−1}^n ) / (Δx)²   (21)

    at a grid point (j, n). Here u_j^n = u(xj, tn).

    Depending on how ut is approximated, we have three basic schemes: the explicit, implicit,

    and Crank-Nicolson schemes.

    177
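    A small check of the central difference quotients (21) on a smooth test function (standard
    C++ only; u(x) = sin(x) is my own choice of test function): halving Δx reduces both errors
    by roughly a factor of four, i.e. the approximations are second-order accurate.

```cpp
// Central differences for u_x and u_xx applied to u(x) = sin(x) at x = 1.
#include <cmath>
#include <cstdio>

int main() {
    const double x = 1.0;
    for (double dx = 0.1; dx > 1e-3; dx /= 2.0) {
        const double um = std::sin(x - dx), u0 = std::sin(x), up = std::sin(x + dx);
        const double ux  = (up - um) / (2.0 * dx);            // ~ cos(x)
        const double uxx = (up - 2.0 * u0 + um) / (dx * dx);  // ~ -sin(x)
        std::printf("dx=%7.4f  err(ux)=%.3e  err(uxx)=%.3e\n",
                    dx, std::fabs(ux - std::cos(x)), std::fabs(uxx + std::sin(x)));
    }
    return 0;
}
```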

  • 4. Explicit Scheme.

    If ut is approximated by a forward difference quotient

    ut ≈ ( u_j^{n+1} − u_j^n ) / Δt   at (j, n),

    then the corresponding difference equation