Numerical Methods for Finance
Dr Robert Nurnberg
This course introduces the major numerical methods needed for quantitative work in finance. To this end, the course will strike a balance between a general survey of significant numerical methods anyone working in a quantitative field should know, and a detailed study of some numerical methods specific to financial mathematics. The first part of the course covers e.g.
linear and nonlinear equations, interpolation and optimization,
while the second part introduces e.g.
binomial and trinomial methods, finite difference methods, Monte-Carlo simulation,
random number generators, option pricing and hedging.
The assessment consists of an exam (80%) and a project (20%).
1. References
1. Burden and Faires (2004), Numerical Analysis.
2. Clewlow and Strickland (1998), Implementing Derivative Models.
3. Fletcher (2000), Practical Methods of Optimization.
4. Glasserman (2004), Monte Carlo Methods in Financial Engineering.
5. Higham (2004), An Introduction to Financial Option Valuation.
6. Hull (2005), Options, Futures, and Other Derivatives.
7. Kwok (1998), Mathematical Models of Financial Derivatives.
8. Press et al. (1992), Numerical Recipes in C. (online)
9. Press et al. (2002), Numerical Recipes in C++.
10. Seydel (2006), Tools for Computational Finance.
11. Wilmott, Dewynne, and Howison (1993), Option Pricing.
12. Winston (1994), Operations Research: Applications and Algorithms.
2. Preliminaries
1. Algorithms.
An algorithm is a set of instructions to construct an approximate solution to a mathematical problem.
A basic requirement for an algorithm is that the error can be made as small as we like.
Usually, the higher the accuracy we demand, the greater is the amount of computation
required.
An algorithm is convergent if it produces a sequence of values which converge to the
desired solution of the problem.
Example: Given $c > 1$ and $\varepsilon > 0$, use the bisection method to seek an approximation to $\sqrt{c}$ with error not greater than $\varepsilon$.
Example
Find $x = \sqrt{c}$, $c > 1$ constant.
Answer
$x = \sqrt{c} \iff x^2 = c \iff f(x) := x^2 - c = 0$.
$f(1) = 1 - c < 0$ and $f(c) = c^2 - c > 0$ $\Rightarrow$ $\exists\, x \in (1, c)$ s.t. $f(x) = 0$.
$f'(x) = 2x > 0$ $\Rightarrow$ $f$ monotonically increasing $\Rightarrow$ $x$ is unique.
Denote $I_n := [a_n, b_n]$ with $I_0 = [a_0, b_0] = [1, c]$. Let $x_n := \frac{a_n + b_n}{2}$.
(i) If $f(x_n) = 0$ then $x = x_n$.
(ii) If $f(a_n) f(x_n) > 0$ then $x \in (x_n, b_n)$; let $a_{n+1} := x_n$, $b_{n+1} := b_n$.
(iii) If $f(a_n) f(x_n) < 0$ then $x \in (a_n, x_n)$; let $a_{n+1} := a_n$, $b_{n+1} := x_n$.
Length of $I_n$: $m(I_n) = \frac{1}{2}\, m(I_{n-1}) = \dots = \frac{1}{2^n}\, m(I_0) = \frac{c-1}{2^n}$.
Algorithm
The algorithm stops if $m(I_n) < \varepsilon$, and we let $x^\star := x_n$.
Error as small as we like?
$x, x^\star \in I_n$ $\Rightarrow$ error $|x^\star - x| = |x_n - x| \le m(I_n) \to 0$ as $n \to \infty$. ✓
Convergence?
$I_0 \supset I_1 \supset \dots \supset I_n \supset \dots$ and $\exists!\, x \in \bigcap_{n=0}^{\infty} I_n$ with $f(x) = 0$, i.e. $x = \sqrt{c}$. ✓
Implementation:
There is no need to store the intervals $I_n = [a_n, b_n]$; it is sufficient to store only 3 points throughout.
Suppose $x \in (a, b)$ and define $\bar{x} := \frac{a+b}{2}$. If $x \in (a, \bar{x})$ let $a := a$, $b := \bar{x}$; otherwise let $a := \bar{x}$, $b := b$.
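As a sketch, the three-point implementation above might look as follows in C++ (the function name `bisect_sqrt` and the fixed starting bracket $[1, c]$ are illustrative choices, not part of the notes):

```cpp
#include <cassert>
#include <cmath>

// Bisection sketch for solving x^2 = c on [1, c], c > 1.
// Only the three points a, x, b are stored; the loop stops
// once the bracketing interval is shorter than eps.
double bisect_sqrt(double c, double eps) {
    double a = 1.0, b = c;
    while (b - a >= eps) {
        double x = 0.5 * (a + b);
        double fx = x * x - c;
        if (fx == 0.0) return x;                 // case (i): exact hit
        if ((a * a - c) * fx > 0.0) a = x;       // case (ii): root in (x, b)
        else b = x;                              // case (iii): root in (a, x)
    }
    return 0.5 * (a + b);
}
```

The returned midpoint is within $\varepsilon$ of $\sqrt{c}$, matching the error bound $m(I_n) < \varepsilon$.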
2. Errors.
There are various errors in computed solutions, such as
discretization error (discrete approximation to continuous systems),
truncation error (termination of an infinite process), and
rounding error (finite digit limitation in computer arithmetic).
If $a$ is a number and $\tilde{a}$ is an approximation to $a$, then the absolute error is $|a - \tilde{a}|$ and the relative error is $\frac{|a - \tilde{a}|}{|a|}$, provided $a \neq 0$.
Example: Discuss sources of errors in deriving the numerical solution of the nonlinear differential equation $x' = f(x)$ on the interval $[a, b]$ with initial condition $x(a) = x_0$.
Example
discretization error:
$$x' = f(x) \ \text{[differential equation]} \quad \to \quad \frac{x(t + h) - x(t)}{h} = f(x(t)) \ \text{[difference equation]}$$
$$\mathrm{DE} = \Big| \frac{x(t + h) - x(t)}{h} - x'(t) \Big|$$
truncation error:
$\lim_{n \to \infty} x_n = x$; approximate $x$ with $x_N$, $N$ a large number.
$$\mathrm{TE} = |x - x_N|$$
rounding error:
We cannot express $x$ exactly, due to the finite digit limitation; we get $\tilde{x}$ instead.
$$\mathrm{RE} = |x - \tilde{x}|$$
Total error = DE + TE + RE.
3. Well/Ill Conditioned Problems.
A problem is well-conditioned if every small perturbation of the data results in a small change in the solution, and ill-conditioned if some small perturbation of the data results in a large change in the solution.
Example: Show that the solution to equations x + y = 2 and x + 1.01 y = 2.01 is
ill-conditioned.
Exercise: Show that the following problems are ill-conditioned:
(a) solution to the differential equation $x'' - 10x' - 11x = 0$ with initial conditions $x(0) = 1$ and $x'(0) = -1$,
(b) value of $q(x) = x^2 + x - 1150$ if $x$ is near a root.
Example
$$x + y = 2, \quad x + 1.01\,y = 2.01 \quad \Rightarrow \quad x = 1, \; y = 1.$$
Change 2.01 to 2.02:
$$x + y = 2, \quad x + 1.01\,y = 2.02 \quad \Rightarrow \quad x = 0, \; y = 2.$$
I.e. a 0.5% change in the data produces a 100% change in the solution: ill-conditioned!
[reason: $\det \begin{pmatrix} 1 & 1 \\ 1 & 1.01 \end{pmatrix} = 0.01$, nearly singular]
4. Taylor Polynomials.
Suppose $f, f', \dots, f^{(n)}$ are continuous on $[a, b]$ and $f^{(n+1)}$ exists on $(a, b)$. Let $x_0 \in [a, b]$. Then for every $x \in [a, b]$, there exists a $\xi$ between $x_0$ and $x$ with
$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k + R_n(x),$$
where $R_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!} (x - x_0)^{n+1}$ is the remainder.
[Equivalently: $R_n(x) = \int_{x_0}^{x} \frac{f^{(n+1)}(t)}{n!} (x - t)^n \, dt$.]
Examples:
$$\exp(x) = \sum_{k=0}^{\infty} \frac{x^k}{k!}, \qquad \sin(x) = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k+1)!}\, x^{2k+1}.$$
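As a quick numerical illustration of the Taylor series above, a partial sum of $\exp$ can be accumulated term by term (a minimal sketch; the name `exp_taylor` is illustrative):

```cpp
#include <cassert>
#include <cmath>

// Partial sum of the Taylor series of exp:
// exp_taylor(x, n) = sum_{k=0}^{n} x^k / k!.
// Each term is obtained from the previous one by multiplying by x/k.
double exp_taylor(double x, int n) {
    double term = 1.0, s = 1.0;          // k = 0 term
    for (int k = 1; k <= n; ++k) {
        term *= x / k;
        s += term;
    }
    return s;
}
```

For $x = 1$ and $n = 20$ the truncation error is of order $1/21!$, far below double precision.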
5. Gradient and Hessian Matrix.
Assume $f : \mathbb{R}^n \to \mathbb{R}$.
The gradient of $f$ at a point $x$, written as $\nabla f(x)$, is a column vector in $\mathbb{R}^n$ with $i$th component $\frac{\partial f}{\partial x_i}(x)$.
The Hessian matrix of $f$ at $x$, written as $\nabla^2 f(x)$, is an $n \times n$ matrix with $(i, j)$th component $\frac{\partial^2 f}{\partial x_i \partial x_j}(x)$. [As $\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}$, $\nabla^2 f(x)$ is symmetric.]
Examples:
$f(x) = a^T x$, $a \in \mathbb{R}^n$ $\Rightarrow$ $\nabla f = a$, $\nabla^2 f = 0$.
$f(x) = \frac{1}{2}\, x^T A x$, $A$ symmetric $\Rightarrow$ $\nabla f(x) = A x$, $\nabla^2 f = A$.
$f(x) = \exp(\frac{1}{2}\, x^T A x)$, $A$ symmetric $\Rightarrow$
$$\nabla f(x) = \exp(\tfrac{1}{2}\, x^T A x)\, A x, \qquad \nabla^2 f(x) = \exp(\tfrac{1}{2}\, x^T A x)\, A x x^T A + \exp(\tfrac{1}{2}\, x^T A x)\, A.$$
6. Taylor's Theorem.
Suppose that $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and that $p \in \mathbb{R}^n$. Then we have
$$f(x + p) = f(x) + \nabla f(x + t\,p)^T p, \quad \text{for some } t \in (0, 1).$$
Moreover, if $f$ is twice continuously differentiable, we have
$$\nabla f(x + p) = \nabla f(x) + \int_0^1 \nabla^2 f(x + t\,p)\, p \; dt,$$
and
$$f(x + p) = f(x) + \nabla f(x)^T p + \tfrac{1}{2}\, p^T \nabla^2 f(x + t\,p)\, p, \quad \text{for some } t \in (0, 1).$$
7. Positive Definite Matrices.
An $n \times n$ matrix $A = (a_{ij})$ is positive definite if it is symmetric (i.e. $A^T = A$) and $x^T A x > 0$ for all $x \in \mathbb{R}^n \setminus \{0\}$. [I.e. $x^T A x \ge 0$ with equality only if $x = 0$.]
The following statements are equivalent:
(a) $A$ is a positive definite matrix,
(b) all eigenvalues of $A$ are positive,
(c) all leading principal minors of $A$ are positive.
The leading principal minors of $A$ are the determinants $\Delta_k$, $k = 1, 2, \dots, n$, defined by
$$\Delta_1 = \det[a_{11}], \quad \Delta_2 = \det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad \dots, \quad \Delta_n = \det A.$$
A matrix $A$ is symmetric and positive semi-definite if $x^T A x \ge 0$ for all $x \in \mathbb{R}^n$.
Exercise.
Show that any positive definite matrix $A$ has only positive diagonal entries.
8. Convex Sets and Functions.
A set $S \subset \mathbb{R}^n$ is a convex set if the straight line segment connecting any two points in $S$ lies entirely inside $S$, i.e., for any two points $x, y \in S$ we have
$$\lambda x + (1 - \lambda)\, y \in S \quad \forall\, \lambda \in [0, 1].$$
A function $f : D \to \mathbb{R}$ is a convex function if its domain $D \subset \mathbb{R}^n$ is a convex set and if for any two points $x, y \in D$ we have
$$f(\lambda x + (1 - \lambda)\, y) \le \lambda f(x) + (1 - \lambda)\, f(y) \quad \forall\, \lambda \in [0, 1].$$
Exercise.
Let $D \subset \mathbb{R}^n$ be a convex, open set.
(a) If $f : D \to \mathbb{R}$ is continuously differentiable, then $f$ is convex if and only if
$$f(y) \ge f(x) + \nabla f(x)^T (y - x) \quad \forall\, x, y \in D.$$
(b) If $f$ is twice continuously differentiable, then $f$ is convex if and only if $\nabla^2 f(x)$ is positive semi-definite for all $x$ in the domain.
Exercise 2.8.
(a) As $f$ is convex we have for any $x, y$ in the convex set $D$ that
$$f(\lambda y + (1 - \lambda)\, x) \le \lambda f(y) + (1 - \lambda)\, f(x) \quad \forall\, \lambda \in [0, 1].$$
Hence
$$f(y) \ge f(x) + \frac{f(x + \lambda (y - x)) - f(x)}{\lambda}.$$
Letting $\lambda \to 0$ yields $f(y) \ge f(x) + \nabla f(x)^T (y - x)$.
Conversely, for any $x_1, x_2 \in D$ and $\lambda \in [0, 1]$ let $x := \lambda x_1 + (1 - \lambda)\, x_2 \in D$ and $y := x_1$. On noting that $y - x = x_1 - \lambda x_1 - (1 - \lambda)\, x_2 = (1 - \lambda)(x_1 - x_2)$, we have that
$$f(x_1) = f(y) \ge f(x) + \nabla f(x)^T (y - x) = f(x) + (1 - \lambda)\, \nabla f(x)^T (x_1 - x_2). \quad (\ast)$$
Similarly, letting $x := \lambda x_1 + (1 - \lambda)\, x_2$ and $y := x_2$ gives, on noting that $y - x = \lambda\, (x_2 - x_1)$, that
$$f(x_2) \ge f(x) + \lambda\, \nabla f(x)^T (x_2 - x_1). \quad (\ast\ast)$$
Combining $\lambda\, (\ast) + (1 - \lambda)\, (\ast\ast)$ gives $\lambda f(x_1) + (1 - \lambda)\, f(x_2) \ge f(x) = f(\lambda x_1 + (1 - \lambda)\, x_2)$, so $f$ is convex.
(b) For any $x, x_0 \in D$ use Taylor's theorem at $x_0$:
$$f(x) = f(x_0) + \nabla f(x_0)^T (x - x_0) + \tfrac{1}{2}\, (x - x_0)^T \nabla^2 f(x_0 + t\, (x - x_0))\, (x - x_0), \quad t \in (0, 1).$$
As $\nabla^2 f$ is positive semi-definite, this immediately gives $f(x) \ge f(x_0) + \nabla f(x_0)^T (x - x_0)$, so $f$ is convex by (a).
Conversely, assume $\nabla^2 f$ is not positive semi-definite in the domain $D$. Then there exist $x_0 \in D$ and $x \in \mathbb{R}^n$ s.t. $x^T \nabla^2 f(x_0)\, x < 0$. As $D$ is open we can find $x_1 := x_0 + \varepsilon x \in D$ for $\varepsilon > 0$ sufficiently small. Hence $(x_1 - x_0)^T \nabla^2 f(x_0)\, (x_1 - x_0) < 0$. For $x_1$ sufficiently close to $x_0$ the continuity of $\nabla^2 f$ gives $(x_1 - x_0)^T \nabla^2 f(x_0 + t\, (x_1 - x_0))\, (x_1 - x_0) < 0$ for all $t \in (0, 1)$. Taylor's theorem then yields $f(x_1) < f(x_0) + \nabla f(x_0)^T (x_1 - x_0)$. This contradicts $f$ being convex, see (a).
9. Vector Norms.
A vector norm on $\mathbb{R}^n$ is a function, $\|\cdot\|$, from $\mathbb{R}^n$ into $\mathbb{R}$ with the following properties:
(i) $\|x\| \ge 0$ for all $x \in \mathbb{R}^n$, and $\|x\| = 0$ if and only if $x = 0$.
(ii) $\|\alpha x\| = |\alpha|\, \|x\|$ for all $\alpha \in \mathbb{R}$ and $x \in \mathbb{R}^n$.
(iii) $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in \mathbb{R}^n$.
Common vector norms are the $l_1$, $l_2$ (Euclidean), and $l_\infty$ norms:
$$\|x\|_1 = \sum_{i=1}^{n} |x_i|, \quad \|x\|_2 = \Big\{ \sum_{i=1}^{n} x_i^2 \Big\}^{1/2}, \quad \|x\|_\infty = \max_{1 \le i \le n} |x_i|.$$
Exercise.
(a) Prove that $\|\cdot\|_1$, $\|\cdot\|_2$ and $\|\cdot\|_\infty$ are norms.
(b) Given a symmetric positive definite matrix $A$, prove that $\|x\|_A := \sqrt{x^T A x}$ is a norm.
Example.
Draw the graphs defined by $\|x\|_1 \le 1$, $\|x\|_2 \le 1$, $\|x\|_\infty \le 1$ when $n = 2$.
[Figure: unit balls of the $l_1$, $l_2$ and $l_\infty$ norms]
Exercise. Prove that for all $x, y \in \mathbb{R}^n$ we have
(a) $\sum_{i=1}^{n} |x_i y_i| \le \|x\|_2\, \|y\|_2$ [Schwarz inequality]
and
(b) $\frac{1}{\sqrt{n}}\, \|x\|_2 \le \|x\|_\infty \le \|x\|_1 \le \sqrt{n}\, \|x\|_2$.
10. Spectral Radius.
The spectral radius of a matrix $A \in \mathbb{R}^{n \times n}$ is defined by $\rho(A) = \max_{1 \le i \le n} |\lambda_i|$, where $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $A$.
11. Matrix Norms.
For an $n \times n$ matrix $A$, the natural matrix norm $\|A\|$ for a given vector norm $\|\cdot\|$ is defined by
$$\|A\| = \max_{\|x\| = 1} \|A x\|.$$
The common matrix norms are
$$\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^{n} |a_{ij}|, \quad \|A\|_2 = \sqrt{\rho(A^T A)} \;\big(= \rho(A) \text{ if } A = A^T\big), \quad \|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}|.$$
Exercise: Compute $\|A\|_1$, $\|A\|_\infty$, and $\|A\|_2$ for
$$A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ -1 & 1 & 2 \end{bmatrix}.$$
(Answer: $\|A\|_1 = \|A\|_\infty = 4$ and $\|A\|_2 = \sqrt{7 + \sqrt{7}} \approx 3.1$.)
12. Convergence.
A sequence of vectors $\{x^{(k)}\} \subset \mathbb{R}^n$ is said to converge to a vector $x \in \mathbb{R}^n$ if $\|x^{(k)} - x\| \to 0$ as $k \to \infty$ for an arbitrary vector norm $\|\cdot\|$. This is equivalent to componentwise convergence, i.e., $x^{(k)}_i \to x_i$ as $k \to \infty$, $i = 1, \dots, n$.
A square matrix $A \in \mathbb{R}^{n \times n}$ is said to be convergent if $A^k \to 0$ as $k \to \infty$, which is equivalent to $(A^k)_{ij} \to 0$ as $k \to \infty$ for all $i, j$.
The following statements are equivalent:
(i) $A$ is a convergent matrix,
(ii) $\rho(A) < 1$,
(iii) $\lim_{k \to \infty} A^k x = 0$ for every $x \in \mathbb{R}^n$.
Exercise. Show that $A$ is convergent, where
$$A = \begin{bmatrix} 1/2 & 0 \\ 1/4 & 1/2 \end{bmatrix}.$$
3. Algebraic Equations
1. Decomposition Methods for Linear Equations.
A matrix $A \in \mathbb{R}^{n \times n}$ is said to have an LU decomposition if $A = LU$, where $L \in \mathbb{R}^{n \times n}$ is a lower triangular matrix ($l_{ij} = 0$ if $1 \le i < j \le n$) and $U \in \mathbb{R}^{n \times n}$ is an upper triangular matrix ($u_{ij} = 0$ if $1 \le j < i \le n$). The decomposition is unique if one assumes e.g. $l_{ii} = 1$ for $1 \le i \le n$.
$$L = \begin{bmatrix} l_{11} & & & \\ l_{21} & l_{22} & & \\ \vdots & & \ddots & \\ l_{n1} & l_{n2} & \cdots & l_{nn} \end{bmatrix}, \qquad U = \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ & u_{22} & \cdots & u_{2n} \\ & & \ddots & \vdots \\ & & & u_{nn} \end{bmatrix}.$$
In general, the diagonal elements of either $L$ or $U$ are given and the remaining elements of the matrices are determined by directly comparing the two sides of the equation $A = LU$.
The linear system $Ax = b$ is then equivalent to $Ly = b$ and $Ux = y$.
Exercise.
Show that the solution to $Ly = b$ is
$$y_1 = b_1 / l_{11}, \quad y_i = \Big( b_i - \sum_{k=1}^{i-1} l_{ik}\, y_k \Big) / l_{ii}, \quad i = 2, \dots, n$$
(forward substitution), and the solution to $Ux = y$ is
$$x_n = y_n / u_{nn}, \quad x_i = \Big( y_i - \sum_{k=i+1}^{n} u_{ik}\, x_k \Big) / u_{ii}, \quad i = n-1, \dots, 1$$
(backward substitution).
2. Crout Algorithm.
Exercise.
Let $A$ be tridiagonal, i.e. $a_{ij} = 0$ if $|i - j| > 1$ ($a_{ij} = 0$ except perhaps $a_{i-1,i}$, $a_{ii}$ and $a_{i,i+1}$), and strictly diagonally dominant ($|a_{ii}| > \sum_{j \neq i} |a_{ij}|$ holds for $i = 1, \dots, n$).
Show that $A$ can be factorized as $A = LU$, where $l_{ii} = 1$ for $i = 1, \dots, n$, $u_{11} = a_{11}$, and
$$u_{i,i+1} = a_{i,i+1}, \quad l_{i+1,i} = a_{i+1,i} / u_{ii}, \quad u_{i+1,i+1} = a_{i+1,i+1} - l_{i+1,i}\, u_{i,i+1}$$
for $i = 1, \dots, n-1$. [Note: $L$ and $U$ are bidiagonal.]
C++ Exercise: Write a program to solve a tridiagonal and strictly diagonally dominant linear equation $Ax = b$ by the Crout algorithm. The input are the number of variables $n$, the matrix $A$, and the vector $b$. The output is the solution $x$.
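A sketch of such a solver follows (the interface, passing the three diagonals as separate vectors, and the name `crout_solve` are my own choices, not prescribed by the exercise):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Crout sketch for a tridiagonal, strictly diagonally dominant A x = b.
// Diagonals (0-based): sub[i] = a_{i+1,i}, diag[i] = a_{ii}, sup[i] = a_{i,i+1}.
std::vector<double> crout_solve(const std::vector<double>& sub,
                                const std::vector<double>& diag,
                                const std::vector<double>& sup,
                                const std::vector<double>& b) {
    int n = static_cast<int>(diag.size());
    std::vector<double> u(n), l(n > 0 ? n - 1 : 0), y(n), x(n);
    u[0] = diag[0];
    for (int i = 0; i + 1 < n; ++i) {            // factorize A = L U
        l[i] = sub[i] / u[i];                    // l_{i+1,i}
        u[i + 1] = diag[i + 1] - l[i] * sup[i];  // u_{i+1,i+1}
    }
    y[0] = b[0];                                 // forward: L y = b
    for (int i = 1; i < n; ++i) y[i] = b[i] - l[i - 1] * y[i - 1];
    x[n - 1] = y[n - 1] / u[n - 1];              // backward: U x = y
    for (int i = n - 2; i >= 0; --i)
        x[i] = (y[i] - sup[i] * x[i + 1]) / u[i];
    return x;
}
```

Only $O(n)$ storage and work are needed, which is the point of exploiting the tridiagonal structure.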
Exercise 3.2.
That $u_{11} = a_{11}$ and $u_{i,i+1} = a_{i,i+1}$, $l_{i+1,i} = a_{i+1,i} / u_{ii}$, $u_{i+1,i+1} = a_{i+1,i+1} - l_{i+1,i}\, u_{i,i+1}$, for $i = 1, \dots, n-1$, can easily be shown.
It remains to show that $u_{ii} \neq 0$ for $i = 1, \dots, n$. We proceed by induction to show that
$$|u_{ii}| > |a_{i,i+1}|,$$
where for convenience we define $a_{n,n+1} := 0$.
$i = 1$: $|u_{11}| = |a_{11}| > |a_{12}|$. ✓
$i \mapsto i + 1$:
$$|u_{i+1,i+1}| = \Big| a_{i+1,i+1} - \frac{a_{i+1,i}\, a_{i,i+1}}{u_{ii}} \Big| \ge |a_{i+1,i+1}| - |a_{i+1,i}|\, \frac{|a_{i,i+1}|}{|u_{ii}|} \ge |a_{i+1,i+1}| - |a_{i+1,i}| > |a_{i+1,i+2}|. ✓$$
Overall we have that $|u_{ii}| > 0$, and so the Crout algorithm is well defined. Moreover, $\det(A) = \det(L)\, \det(U) = \det(U) = \prod_{i=1}^{n} u_{ii} \neq 0$.
3. Choleski Algorithm.
Exercise.
Let $A$ be a positive definite matrix. Show that $A$ can be factorized as $A = L L^T$, where $L$ is a lower triangular matrix.
(i) Compute the 1st column:
$$l_{11} = \sqrt{a_{11}}, \quad l_{i1} = a_{i1} / l_{11}, \quad i = 2, \dots, n.$$
(ii) For $j = 2, \dots, n-1$ compute the $j$th column:
$$l_{jj} = \Big( a_{jj} - \sum_{k=1}^{j-1} l_{jk}^2 \Big)^{\frac{1}{2}}, \qquad l_{ij} = \Big( a_{ij} - \sum_{k=1}^{j-1} l_{ik}\, l_{jk} \Big) / l_{jj}, \quad i = j+1, \dots, n.$$
(iii) Compute the $n$th column:
$$l_{nn} = \Big( a_{nn} - \sum_{k=1}^{n-1} l_{nk}^2 \Big)^{\frac{1}{2}}.$$
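The three steps above can be sketched in C++ as follows (a minimal sketch; the dense row-major storage and the name `choleski` are my own choices):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Choleski sketch: factor a positive definite A (dense, row-major,
// n x n) as A = L L^T and return L in the same storage layout.
std::vector<double> choleski(const std::vector<double>& A, int n) {
    std::vector<double> L(n * n, 0.0);
    for (int j = 0; j < n; ++j) {
        double s = A[j * n + j];
        for (int k = 0; k < j; ++k) s -= L[j * n + k] * L[j * n + k];
        L[j * n + j] = std::sqrt(s);                   // l_jj
        for (int i = j + 1; i < n; ++i) {
            double t = A[i * n + j];
            for (int k = 0; k < j; ++k) t -= L[i * n + k] * L[j * n + k];
            L[i * n + j] = t / L[j * n + j];           // l_ij, i > j
        }
    }
    return L;
}
```

Positive definiteness guarantees that the argument of `std::sqrt` stays positive, so the algorithm never breaks down.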
4. Iterative Methods for Linear Equations.
Split $A$ into $A = M + N$ with $M$ nonsingular, and convert the equation $Ax = b$ into an equivalent equation $x = Cx + d$ with $C = -M^{-1} N$ and $d = M^{-1} b$.
Choose an initial vector $x^{(0)}$ and then generate a sequence of vectors by
$$x^{(k)} = C x^{(k-1)} + d, \quad k = 1, 2, \dots$$
The resulting sequence converges to the solution of $Ax = b$, for an arbitrary initial vector $x^{(0)}$, if and only if $\rho(C) < 1$.
The objective is to choose $M$ such that $M^{-1}$ is easy to compute and $\rho(C) < 1$.
The iteration stops if $\|x^{(k)} - x^{(k-1)}\| < \varepsilon$.
[Note: In practice one solves $M x^{(k)} = -N x^{(k-1)} + b$, for $k = 1, 2, \dots$.]
Claim.
The iteration $x^{(k)} = C x^{(k-1)} + d$ is convergent if and only if $\rho(C) < 1$.
Proof.
Define $e^{(k)} := x^{(k)} - x$, the error of the $k$th iterate. Then
$$e^{(k)} = C x^{(k-1)} + d - (C x + d) = C\, (x^{(k-1)} - x) = C e^{(k-1)} = C^2 e^{(k-2)} = \dots = C^k e^{(0)},$$
where $e^{(0)} = x^{(0)} - x$ is an arbitrary vector.
Assume $C$ is similar to the diagonal matrix $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$, where the $\lambda_i$ are the eigenvalues of $C$:
$\exists\, X$ nonsingular s.t. $C = X \Lambda X^{-1}$. Then
$$e^{(k)} = C^k e^{(0)} = X \Lambda^k X^{-1} e^{(0)} = X\, \mathrm{diag}(\lambda_1^k, \dots, \lambda_n^k)\, X^{-1} e^{(0)} \to 0 \text{ as } k \to \infty \iff |\lambda_i| < 1 \;\forall\, i = 1, \dots, n \iff \rho(C) < 1.$$
5. Jacobi Algorithm.
Exercise: Let $M = D$ and $N = L + U$ ($L$ strict lower triangular part of $A$, $D$ diagonal, $U$ strict upper triangular part). Show that the $i$th component at the $k$th iteration is
$$x^{(k)}_i = \frac{1}{a_{ii}} \Big( b_i - \sum_{j=1}^{i-1} a_{ij}\, x^{(k-1)}_j - \sum_{j=i+1}^{n} a_{ij}\, x^{(k-1)}_j \Big)$$
for $i = 1, \dots, n$.
6. Gauss-Seidel Algorithm.
Exercise: Let $M = D + L$ and $N = U$. Show that the $i$th component at the $k$th iteration is
$$x^{(k)}_i = \frac{1}{a_{ii}} \Big( b_i - \sum_{j=1}^{i-1} a_{ij}\, x^{(k)}_j - \sum_{j=i+1}^{n} a_{ij}\, x^{(k-1)}_j \Big)$$
for $i = 1, \dots, n$.
7. SOR Algorithm.
Exercise.
Let $M = \frac{1}{\omega} D + L$ and $N = U + (1 - \frac{1}{\omega}) D$, where $0 < \omega < 2$. Show that the $i$th component at the $k$th iteration is
$$x^{(k)}_i = (1 - \omega)\, x^{(k-1)}_i + \frac{\omega}{a_{ii}} \Big( b_i - \sum_{j=1}^{i-1} a_{ij}\, x^{(k)}_j - \sum_{j=i+1}^{n} a_{ij}\, x^{(k-1)}_j \Big)$$
for $i = 1, \dots, n$.
C++ Exercise: Write a program to solve a diagonally dominant linear equation $Ax = b$ by the SOR algorithm. The input are the number of variables $n$, the matrix $A$, the vector $b$, the initial vector $x_0$, the relaxation parameter $\omega$, and the tolerance $\varepsilon$. The output is the number of iterations $k$ and the approximate solution $x_k$.
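A minimal SOR sketch might look as follows (the zero initial vector, the iteration cap, and the name `sor` are my own simplifications of the exercise's interface):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// SOR sketch for A x = b with dense row-major A, relaxation
// parameter omega and stopping tolerance eps on the update size.
std::vector<double> sor(const std::vector<double>& A,
                        const std::vector<double>& b,
                        int n, double omega, double eps) {
    std::vector<double> x(n, 0.0);
    for (int k = 0; k < 10000; ++k) {              // cap on iterations
        double diff = 0.0;
        for (int i = 0; i < n; ++i) {
            double s = b[i];
            for (int j = 0; j < n; ++j)
                if (j != i) s -= A[i * n + j] * x[j];  // x[j], j < i, is new
            double xi = (1.0 - omega) * x[i] + omega * s / A[i * n + i];
            diff = std::max(diff, std::fabs(xi - x[i]));
            x[i] = xi;
        }
        if (diff < eps) break;
    }
    return x;
}
```

Setting $\omega = 1$ recovers Gauss-Seidel, and values $\omega \in (1, 2)$ over-relax the update.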
8. Special Matrices.
If $A$ is strictly diagonally dominant, then Jacobi and Gauss-Seidel converge for any initial vector $x^{(0)}$. In addition, SOR converges for $\omega \in (0, 1]$.
If $A$ is positive definite and $0 < \omega < 2$, then the SOR method converges for any initial vector $x^{(0)}$.
If $A$ is positive definite and tridiagonal, then $\rho(C_{GS}) = [\rho(C_J)]^2 < 1$ and the optimal choice of $\omega$ for the SOR method is
$$\omega = \frac{2}{1 + \sqrt{1 - \rho(C_{GS})}} \in [1, 2).$$
With this choice of $\omega$, $\rho(C_{SOR}) = \omega - 1$.
Exercise.
Find the optimal $\omega$ for the SOR method for the matrix
$$A = \begin{bmatrix} 4 & 3 & 0 \\ 3 & 4 & -1 \\ 0 & -1 & 4 \end{bmatrix}.$$
(Answer: $\omega \approx 1.24$.)
9. Condition Numbers.
The condition number of a nonsingular matrix $A$ relative to a norm $\|\cdot\|$ is defined by
$$\kappa(A) = \|A\|\, \|A^{-1}\|.$$
Note that $\kappa(A) \ge \|A A^{-1}\| = \|I\| = \max_{\|x\|=1} \|x\| = 1$.
A matrix $A$ is well-conditioned if $\kappa(A)$ is close to one, and is ill-conditioned if $\kappa(A)$ is much larger than one.
Suppose $\|\delta A\| < 1 / \|A^{-1}\|$. Then the solution $\tilde{x}$ to $(A + \delta A)\, \tilde{x} = b + \delta b$ approximates the solution $x$ of $Ax = b$ with error estimate
$$\frac{\|x - \tilde{x}\|}{\|x\|} \le \frac{\kappa(A)}{1 - \|\delta A\|\, \|A^{-1}\|} \Big( \frac{\|\delta b\|}{\|b\|} + \frac{\|\delta A\|}{\|A\|} \Big).$$
In particular, if $\delta A = 0$ (no perturbation to matrix $A$) then
$$\frac{\|x - \tilde{x}\|}{\|x\|} \le \kappa(A)\, \frac{\|\delta b\|}{\|b\|}.$$
Example.
Consider Example 2.3:
$$A = \begin{pmatrix} 1 & 1 \\ 1 & 1.01 \end{pmatrix}, \quad A^{-1} = \frac{1}{\det A} \begin{pmatrix} 1.01 & -1 \\ -1 & 1 \end{pmatrix} = \frac{1}{0.01} \begin{pmatrix} 1.01 & -1 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} 101 & -100 \\ -100 & 100 \end{pmatrix}.$$
Recall
$$\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^{n} |a_{ij}|.$$
Hence
$$\|A\|_1 = \max(2, 2.01) = 2.01, \qquad \|A^{-1}\|_1 = \max(201, 200) = 201.$$
$$\kappa_1(A) = \|A\|_1\, \|A^{-1}\|_1 = 404.01 \gg 1 \quad \text{(ill-conditioned!)}$$
Similarly $\kappa_\infty = 404.01$ and $\kappa_2 = \|A\|_2\, \|A^{-1}\|_2 = \lambda_{\max} / \lambda_{\min} = 402.0075$.
10. Hilbert Matrix.
An $n \times n$ Hilbert matrix $H_n = (h_{ij})$ is defined by $h_{ij} = 1/(i + j - 1)$ for $i, j = 1, 2, \dots, n$.
Hilbert matrices are notoriously ill-conditioned and $\kappa(H_n) \to \infty$ very rapidly as $n \to \infty$.
$$H_n = \begin{bmatrix} 1 & \frac{1}{2} & \cdots & \frac{1}{n} \\ \frac{1}{2} & \frac{1}{3} & \cdots & \frac{1}{n+1} \\ \vdots & & & \vdots \\ \frac{1}{n} & \frac{1}{n+1} & \cdots & \frac{1}{2n-1} \end{bmatrix}$$
Exercise.
Compute the condition numbers $\kappa_1(H_3)$ and $\kappa_1(H_6)$.
(Answer: $\kappa_1(H_3) = 748$ and $\kappa_1(H_6) \approx 2.9 \times 10^7$.)
11. Fixed Point Method for Nonlinear Equations.
A function $g : \mathbb{R} \to \mathbb{R}$ has a fixed point $x$ if $g(x) = x$.
A function $g$ is a contraction mapping on $[a, b]$ if $g : [a, b] \to [a, b]$ and
$$|g'(x)| \le L < 1 \quad \forall\, x \in (a, b),$$
where $L$ is a constant.
Exercise.
Assume $g$ is a contraction mapping on $[a, b]$. Prove that $g$ has a unique fixed point $x$ in $[a, b]$, and that for any $x_0 \in [a, b]$ the sequence defined by
$$x_{n+1} = g(x_n), \quad n \ge 0,$$
converges to $x$. The algorithm is called fixed point iteration.
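The iteration can be sketched generically in C++ as follows. The test map $g(x) = \cos(x)$ on $[0, 1]$ is my own illustrative choice (it is a contraction there since $|g'(x)| = |\sin x| \le \sin 1 < 1$):

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Fixed point iteration sketch: x_{n+1} = g(x_n), stopping once
// successive iterates differ by less than eps.
double fixed_point(std::function<double(double)> g, double x0, double eps) {
    double x = x0, xn = g(x0);
    while (std::fabs(xn - x) >= eps) {
        x = xn;
        xn = g(x);
    }
    return xn;
}
```

By the contraction estimate, $|x_n - x| \le L^n |x_0 - x|$, so the stopping test is eventually met for any tolerance.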
Exercise 3.11.
Existence:
Define $h(x) = x - g(x)$ on $[a, b]$. Then $h(a) = a - g(a) \le 0$ and $h(b) = b - g(b) \ge 0$. As $h$ is continuous, $\exists\, c \in [a, b]$ s.t. $h(c) = 0$, i.e. $c = g(c)$. ✓
Uniqueness:
Suppose $p, q \in [a, b]$ are two fixed points. Then
$$|p - q| = |g(p) - g(q)| \overset{\text{MVT},\ \xi \in (a,b)}{=} |g'(\xi)\, (p - q)| \le L\, |p - q|$$
$$\Rightarrow (1 - L)\, |p - q| \le 0 \;\Rightarrow\; |p - q| \le 0 \;\Rightarrow\; p = q. ✓$$
Convergence:
$$|x_n - x| = |g(x_{n-1}) - g(x)| = |g'(\xi)\, (x_{n-1} - x)| \le L\, |x_{n-1} - x| \le \dots \le L^n\, |x_0 - x| \to 0 \text{ as } n \to \infty.$$
Hence $x_n \to x$ as $n \to \infty$. ✓
12. Newton Method for Nonlinear Equations.
Assume that $f \in C^1([a, b])$, $f(x) = 0$ ($x$ is a root or zero) and $f'(x) \neq 0$.
The Newton method can be used to find the root $x$ by generating a sequence $\{x_n\}$ satisfying
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, \quad n = 0, 1, \dots,$$
provided $f'(x_n) \neq 0$ for all $n$.
The sequence $x_n$ converges to the root $x$ as long as the initial point $x_0$ is sufficiently close to $x$.
The algorithm stops if $|x_{n+1} - x_n| < \varepsilon$, a prescribed error tolerance, and $x_{n+1}$ is taken as an approximation to $x$.
Geometric Interpretation:
The tangent line at $(x_n, f(x_n))$ is
$$Y = f(x_n) + f'(x_n)\, (X - x_n).$$
Setting $Y = 0$ yields $x_{n+1} := X = x_n - \frac{f(x_n)}{f'(x_n)}$.
13. Choice of Initial Point.
Suppose $f \in C^2([a, b])$ and $f(x) = 0$ with $f'(x) \neq 0$. Then there exists $\delta > 0$ such that the Newton method generates a sequence $x_n$ converging to $x$ for any initial point $x_0 \in [x - \delta, x + \delta]$ ($x_0$ can only be chosen locally).
However, if $f$ satisfies the following additional conditions:
1. $f(a)\, f(b) < 0$,
2. $f''$ does not change sign on $[a, b]$,
3. the tangent lines to the curve $y = f(x)$ at both $a$ and $b$ cut the $x$-axis within $[a, b]$ (i.e. $a - \frac{f(a)}{f'(a)},\; b - \frac{f(b)}{f'(b)} \in [a, b]$),
then $f(x) = 0$ has a unique root $x$ in $[a, b]$ and the Newton method converges to $x$ for any initial point $x_0 \in [a, b]$ ($x_0$ can be chosen globally).
Example.
Use the Newton method to compute $x = \sqrt{c}$, $c > 1$, and show that the initial point can be any point in $[1, c]$.
Example.
Find $x = \sqrt{c}$, $c > 1$.
Answer.
$x$ is a root of $f(x) := x^2 - c$.
Newton:
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} = x_n - \frac{x_n^2 - c}{2 x_n} = \frac{1}{2} \Big( x_n + \frac{c}{x_n} \Big), \quad n \ge 0.$$
How to choose $x_0$?
Check the 3 conditions on $[1, c]$:
1. $f(1) = 1 - c < 0$, $f(c) = c^2 - c > 0$, so $f(1)\, f(c) < 0$. ✓
2. $f'' = 2$. ✓
3. Tangent line at 1: $Y = f(1) + f'(1)\, (X - 1) = 1 - c + 2\, (X - 1)$. Setting $Y = 0$ gives $X = 1 + \frac{c - 1}{2} \in (1, c)$. ✓
Tangent line at $c$: $Y = f(c) + f'(c)\, (X - c) = c^2 - c + 2c\, (X - c)$. Setting $Y = 0$ gives $X = c - \frac{c - 1}{2} \in (1, c)$. ✓
Hence Newton converges for any $x_0 \in [1, c]$.
Numerical Example.
Find $\sqrt{7}$. (From calculator: $\sqrt{7} = 2.6457513$.)
Newton converges for all $x_0 \in [1, 7]$. Choose $x_0 = 4$:
$$x_1 = \frac{1}{2} \Big( x_0 + \frac{7}{x_0} \Big) = 2.875, \quad x_2 = 2.6548913, \quad x_3 = 2.6457670, \quad x_4 = 2.6457513.$$
Comparison to the bisection method with $I_0 = [1, 7]$:
$$I_1 = [1, 4], \quad I_2 = [2.5, 4], \quad I_3 = [2.5, 3.25], \quad I_4 = [2.5, 2.875], \quad \dots, \quad I_{25} = [2.6457512, 2.6457513].$$
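The Newton iteration for $\sqrt{c}$ can be sketched as follows (the starting point $x_0 = c \in [1, c]$ and the name `newton_sqrt` are illustrative choices):

```cpp
#include <cassert>
#include <cmath>

// Newton sketch for sqrt(c), c > 1: x_{n+1} = (x_n + c/x_n)/2.
// Stops once successive iterates differ by less than eps.
double newton_sqrt(double c, double eps) {
    double x = c;                       // any x0 in [1, c] works here
    for (;;) {
        double xn = 0.5 * (x + c / x);
        if (std::fabs(xn - x) < eps) return xn;
        x = xn;
    }
}
```

The quadratic convergence visible in the table above (4 Newton steps versus 25 bisection steps) is typical.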
14. Pitfalls.
Here are some difficulties which may be encountered with the Newton method:
1. $\{x_n\}$ may wander around and not converge (there are only complex roots to the equation),
2. the initial approximation $x_0$ is too far away from the desired root and $\{x_n\}$ converges to some other root (this usually happens when $f'(x_0)$ is small),
3. $\{x_n\}$ may diverge to $+\infty$ (the function $f$ is positive and monotonically decreasing on an unbounded interval), and
4. $\{x_n\}$ may repeat (cycling).
15. Rate of Convergence.
Suppose $\{x_n\}$ is a sequence that converges to $x$.
The convergence is said to be linear if there is a constant $r \in (0, 1)$ such that
$$\frac{|x_{n+1} - x|}{|x_n - x|} \le r, \quad \text{for all } n \text{ sufficiently large}.$$
The convergence is said to be superlinear if
$$\lim_{n \to \infty} \frac{|x_{n+1} - x|}{|x_n - x|} = 0.$$
In particular, the convergence is said to be quadratic if
$$\frac{|x_{n+1} - x|}{|x_n - x|^2} \le M, \quad \text{for all } n \text{ sufficiently large},$$
where $M$ is a positive constant, not necessarily less than 1.
Example. $x_n = x + 0.5^n$ converges linearly, $x_n = x + 0.5^{2^n}$ quadratically.
Example. Show that the Newton method converges quadratically.
Example.
Define $g(x) = x - \frac{f(x)}{f'(x)}$. Then the Newton method is given by
$$x_{n+1} = g(x_n).$$
Moreover, $f(x) = 0$ and $f'(x) \neq 0$ imply that
$$g(x) = x, \qquad g'(x) = 1 - \frac{(f')^2 - f f''}{(f')^2}(x) = \frac{f(x)\, f''(x)}{(f'(x))^2} = 0, \qquad g''(x) = \frac{f''(x)}{f'(x)}.$$
Assuming that $x_n \to x$ we have that
$$\frac{|x_{n+1} - x|}{|x_n - x|^2} = \frac{|g(x_n) - g(x)|}{|x_n - x|^2} \overset{\text{Taylor}}{=} \frac{|g'(x)\, (x_n - x) + \frac{1}{2}\, g''(\eta_n)\, (x_n - x)^2|}{|x_n - x|^2} = \frac{1}{2}\, |g''(\eta_n)| \to \frac{1}{2}\, |g''(x)| =: M \quad \text{as } n \to \infty.$$
Hence $|x_{n+1} - x| \approx M\, |x_n - x|^2$: quadratic convergence rate.
4. Interpolations
1. Polynomial Approximation.
For any continuous function f defined on an interval [a, b], there exist polynomials P
that can be as close to the given function as desired.
Taylor polynomials agree closely with a given function at a specific point, but they
concentrate their accuracy only near that point.
A good polynomial needs to provide a relatively accurate approximation over an entire
interval.
2. Interpolating Polynomial (Lagrange Form).
Suppose $x_i \in [a, b]$, $i = 0, 1, \dots, n$, are pairwise distinct mesh points in $[a, b]$.
The Lagrange polynomial $p$ is a polynomial of degree $\le n$ such that
$$p(x_i) = f(x_i), \quad i = 0, 1, \dots, n.$$
$p$ can be constructed explicitly as
$$p(x) = \sum_{i=0}^{n} L_i(x)\, f(x_i),$$
where $L_i$ is a polynomial of degree $n$ satisfying
$$L_i(x_j) = 0, \; j \neq i, \qquad L_i(x_i) = 1.$$
This results in
$$L_i(x) = \prod_{j \neq i} \frac{x - x_j}{x_i - x_j}, \quad i = 0, 1, \dots, n.$$
$p$ is called linear interpolation if $n = 1$ and quadratic interpolation if $n = 2$.
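Evaluating $p$ directly from the definition can be sketched as follows (the name `lagrange_eval` is illustrative; this is the naive $O(n^2)$ evaluation, not an optimized scheme):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Lagrange form sketch: evaluate p(x) = sum_i L_i(x) f(x_i),
// with L_i(x) = prod_{j != i} (x - x_j)/(x_i - x_j).
double lagrange_eval(const std::vector<double>& xs,
                     const std::vector<double>& fs, double x) {
    double p = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        double Li = 1.0;
        for (std::size_t j = 0; j < xs.size(); ++j)
            if (j != i) Li *= (x - xs[j]) / (xs[i] - xs[j]);
        p += Li * fs[i];
    }
    return p;
}
```

The mesh points must be pairwise distinct, exactly as required above, or a division by zero occurs.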
Exercise.
Find the Lagrange polynomial $p$ for the following points $(x, f(x))$: $(1, 0)$, $(-1, 3)$, and $(2, 4)$. Assume that a new point $(0, 2)$ is observed, and construct a Lagrange polynomial that incorporates this new information.
Error formula.
Suppose $f$ is $n+1$ times differentiable on $[a, b]$. Then it holds that
$$f(x) = p(x) + \frac{f^{(n+1)}(\xi)}{(n + 1)!}\, (x - x_0) \cdots (x - x_n),$$
where $\xi = \xi(x)$ lies in $(a, b)$.
Proof.
Define $g(x) = f(x) - p(x) + \alpha \prod_{j=0}^{n} (x - x_j)$, where $\alpha \in \mathbb{R}$ is a constant. Clearly, $g(x_j) = 0$ for $j = 0, \dots, n$. To estimate the error at $\bar{x} \notin \{x_0, \dots, x_n\}$, choose $\alpha$ such that $g(\bar{x}) = 0$.
Hence
$$g(x) = f(x) - p(x) - (f(\bar{x}) - p(\bar{x})) \prod_{j=0}^{n} \Big( \frac{x - x_j}{\bar{x} - x_j} \Big).$$
$g$ has at least $n + 2$ zeros: $x_0, \dots, x_n, \bar{x}$. The Mean Value Theorem yields that
$g'$ has at least $n + 1$ zeros,
...
$g^{(n+1)}$ has at least 1 zero, say $\xi$.
Moreover, as $p$ is a polynomial of degree $\le n$, it holds that $p^{(n+1)} = 0$. Hence
$$0 = g^{(n+1)}(\xi) = f^{(n+1)}(\xi) - (f(\bar{x}) - p(\bar{x}))\, \frac{(n + 1)!}{\prod_{j=0}^{n} (\bar{x} - x_j)}$$
$$\Rightarrow \quad \text{Error} = f(\bar{x}) - p(\bar{x}) = \frac{f^{(n+1)}(\xi)}{(n + 1)!} \prod_{j=0}^{n} (\bar{x} - x_j).$$
3. Trapezoid Rule.
We can use linear interpolation ($n = 1$, $x_0 = a$, $x_1 = b$) to approximate $f(x)$ on $[a, b]$ and then compute $\int_a^b f(x)\,dx$ to get the trapezoid rule:
$$\int_a^b f(x)\,dx \approx \frac{1}{2}\, (b - a)\, [f(a) + f(b)].$$
If we partition $[a, b]$ into $n$ equal subintervals with mesh points $x_i = a + ih$, $i = 0, \dots, n$, and step size $h = (b - a)/n$, we can derive the composite trapezoid rule:
$$\int_a^b f(x)\,dx \approx \frac{h}{2} \Big[ f(x_0) + 2 \sum_{i=1}^{n-1} f(x_i) + f(x_n) \Big].$$
The approximation error is of order $O(h^2)$ if $|f''|$ is bounded.
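The composite rule can be sketched in a few lines (the name `trapezoid` and the `std::function` interface are my own choices):

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Composite trapezoid rule sketch on [a, b] with n subintervals:
// h * [ f(x_0)/2 + f(x_1) + ... + f(x_{n-1}) + f(x_n)/2 ].
double trapezoid(std::function<double(double)> f,
                 double a, double b, int n) {
    double h = (b - a) / n;
    double s = 0.5 * (f(a) + f(b));
    for (int i = 1; i < n; ++i) s += f(a + i * h);
    return h * s;
}
```

For smooth integrands, halving $h$ should roughly quarter the error, consistent with the $O(h^2)$ bound.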
Use linear interpolation ($n = 1$, $x_0 = a$, $x_1 = b$) to approximate $f(x)$ on $[a, b]$ and then compute $\int_a^b f(x)\,dx$.
Answer.
The linear interpolating polynomial is $p(x) = f(a)\, L_0(x) + f(b)\, L_1(x)$, where
$$L_0(x) = \frac{x - b}{a - b}, \qquad L_1(x) = \frac{x - a}{b - a}.$$
$$\int_a^b f(x)\,dx \approx \int_a^b p(x)\,dx = f(a) \int_a^b \frac{x - b}{a - b}\,dx + f(b) \int_a^b \frac{x - a}{b - a}\,dx$$
$$= f(a)\, \frac{1}{a - b}\, \frac{1}{2}\, (x - b)^2 \Big|_a^b + f(b)\, \frac{1}{b - a}\, \frac{1}{2}\, (x - a)^2 \Big|_a^b$$
$$= f(a)\, \frac{1}{a - b}\, \frac{1}{2} \big( -(a - b)^2 \big) + f(b)\, \frac{1}{b - a}\, \frac{1}{2}\, (b - a)^2$$
$$= \frac{b - a}{2}\, \big( f(a) + f(b) \big) \qquad \text{Trapezoid Rule}$$
[Of course, one could have arrived at this formula with a simple geometric argument.]
Error Analysis.
Let $f(x) = p(x) + E(x)$, where $E(x) = \frac{f''(\xi)}{2}\, (x - a)(x - b)$ with $\xi \in (a, b)$. Assume that $|f''| \le M$ is bounded. Then
$$\Big| \int_a^b E(x)\,dx \Big| \le \int_a^b |E(x)|\,dx \le \frac{M}{2} \int_a^b (x - a)(b - x)\,dx$$
$$= \frac{M}{2} \int_a^b (x - a)\, [(b - a) - (x - a)]\,dx = \frac{M}{2} \Big[ -\frac{1}{3}\, (b - a)^3 + \frac{1}{2}\, (b - a)^3 \Big] = \frac{M}{12}\, (b - a)^3.$$
The composite formula can be obtained by partitioning $[a, b]$ into $a = x_0 < x_1 < \dots < x_{n-1} < x_n = b$, where $x_i = a + i h$ with $h := \frac{b - a}{n}$:
$$\int_a^b f(x)\,dx = \sum_{i=0}^{n-1} \int_{x_i}^{x_{i+1}} f(x)\,dx \approx \sum_{i=0}^{n-1} \frac{x_{i+1} - x_i}{2}\, \big( f(x_i) + f(x_{i+1}) \big)$$
$$= \sum_{i=0}^{n-1} \frac{h}{2}\, \big( f(x_i) + f(x_{i+1}) \big) = h \Big[ \frac{1}{2}\, f(a) + f(x_1) + \dots + f(x_{n-1}) + \frac{1}{2}\, f(b) \Big].$$
Error analysis then yields that
$$\text{Error} \le \frac{M}{12}\, h^3\, n = \frac{M\, (b - a)}{12}\, h^2 = O(h^2).$$
4. Simpson's Rule.
Exercise.
Use quadratic interpolation ($n = 2$, $x_0 = a$, $x_1 = \frac{a+b}{2}$, $x_2 = b$) to approximate $f(x)$ on $[a, b]$ and then compute $\int_a^b f(x)\,dx$ to get Simpson's rule:
$$\int_a^b f(x)\,dx \approx \frac{1}{6}\, (b - a) \Big[ f(a) + 4\, f\Big(\frac{a + b}{2}\Big) + f(b) \Big].$$
Derive the composite Simpson's rule:
$$\int_a^b f(x)\,dx \approx \frac{h}{3} \Big[ f(x_0) + 2 \sum_{i=2}^{n/2} f(x_{2i-2}) + 4 \sum_{i=1}^{n/2} f(x_{2i-1}) + f(x_n) \Big],$$
where $n$ is an even number and the $x_i$ and $h$ are chosen as in the composite trapezoid rule.
[One can show that the approximation error is of order $O(h^4)$ if $|f^{(4)}|$ is bounded.]
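The composite rule above can be sketched as follows (the name `simpson` and the interface are my own choices; $n$ must be even):

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Composite Simpson sketch on [a, b] with an even number n of
// subintervals: interior points get weight 4 (odd index) or 2 (even).
double simpson(std::function<double(double)> f,
               double a, double b, int n) {
    double h = (b - a) / n;
    double s = f(a) + f(b);
    for (int i = 1; i < n; ++i)
        s += (i % 2 == 1 ? 4.0 : 2.0) * f(a + i * h);
    return s * h / 3.0;
}
```

Since Simpson's rule is exact for polynomials up to degree 3, a cubic integrand is reproduced exactly even with $n = 2$.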
5. Newton-Cotes Formula.
Suppose $x_0, \dots, x_n$ are mesh points in $[a, b]$ (usually the mesh points are equally spaced with $x_0 = a$, $x_n = b$). Then the integral can be approximated by the Newton-Cotes formula:
$$\int_a^b f(x)\,dx \approx \sum_{i=0}^{n} A_i\, f(x_i),$$
where the parameters $A_i$ are determined in such a way that the integral is exact for all polynomials of degree $\le n$. [Note: $n+1$ unknowns $A_i$ and $n+1$ coefficients for a polynomial of degree $n$.]
Exercise.
Use the Newton-Cotes formula to derive the trapezoid rule and Simpson's rule. Prove that if $f$ is $n+1$ times differentiable and $|f^{(n+1)}| \le M$ on $[a, b]$ then
$$\Big| \int_a^b f(x)\,dx - \sum_{i=0}^{n} A_i\, f(x_i) \Big| \le \frac{M}{(n + 1)!} \int_a^b \prod_{i=0}^{n} |x - x_i|\,dx.$$
Exercise 4.5.
We have that
$$\int_a^b q(x)\,dx = \sum_{i=0}^{n} A_i\, q(x_i) \quad \text{for all polynomials } q \text{ of degree } \le n.$$
Let $q(x) = L_j(x)$, where $L_j$ is the $j$th Lagrange polynomial for the data points $x_0, x_1, \dots, x_n$. I.e. $L_j$ is of degree $n$ and satisfies $L_j(x_i) = \delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}$. Now
$$\int_a^b L_j\,dx = \sum_{i=0}^{n} A_i\, L_j(x_i) = A_j$$
$$\Rightarrow \quad \int_a^b f(x)\,dx \approx \sum_{i=0}^{n} A_i\, f(x_i) = \sum_{i=0}^{n} f(x_i) \int_a^b L_i(x)\,dx = \int_a^b \sum_{i=0}^{n} f(x_i)\, L_i(x)\,dx = \int_a^b p(x)\,dx,$$
where $p(x)$ is the interpolating Lagrange polynomial to $f$. Hence we find the trapezoid rule ($n = 1$) and Simpson's rule ($n = 2$, with $x_1 = \frac{a+b}{2}$).
The Lagrange polynomial has the error term
$$f(x) = p(x) + E(x), \qquad E(x) := \frac{f^{(n+1)}(\xi)}{(n + 1)!}\, (x - x_0) \cdots (x - x_n),$$
where $\xi = \xi(x)$ lies in $(a, b)$. Hence
$$\Big| \int_a^b f(x)\,dx - \int_a^b p(x)\,dx \Big| = \Big| \int_a^b E(x)\,dx \Big| \le \int_a^b |E(x)|\,dx \le \frac{M}{(n + 1)!} \int_a^b \prod_{i=0}^{n} |x - x_i|\,dx.$$
6. Ordinary Differential Equations.
An initial value problem for an ODE has the form
$$x'(t) = f(t, x(t)), \quad a \le t \le b, \qquad x(a) = x_0. \tag{1}$$
(1) is equivalent to the integral equation
$$x(t) = x_0 + \int_a^t f(s, x(s))\,ds, \quad a \le t \le b. \tag{2}$$
To solve (2) numerically we divide $[a, b]$ into subintervals with mesh points $t_i = a + ih$, $i = 0, \dots, n$, and step size $h = (b - a)/n$. (2) implies
$$x(t_{i+1}) = x(t_i) + \int_{t_i}^{t_{i+1}} f(s, x(s))\,ds, \quad i = 0, \dots, n-1.$$
(a) If we approximate $f(s, x(s))$ on $[t_i, t_{i+1}]$ by $f(t_i, x(t_i))$, we get the (explicit) Euler method for equation (1):
$$w_{i+1} = w_i + h\, f(t_i, w_i), \quad w_0 = x_0.$$
We have $x(t_{i+1}) \approx w_{i+1}$ if $h$ is sufficiently small.
[Taylor: $x(t_{i+1}) = x(t_i) + x'(t_i)\, h + O(h^2) = x(t_i) + f(t_i, x(t_i))\, h + O(h^2)$.]
(b) If we approximate $f(s, x(s))$ on $[t_i, t_{i+1}]$ by linear interpolation with points $(t_i, f(t_i, x(t_i)))$ and $(t_{i+1}, f(t_{i+1}, x(t_{i+1})))$, we get the (implicit) trapezoidal method for equation (1):
$$w_{i+1} = w_i + \frac{h}{2}\, [f(t_i, w_i) + f(t_{i+1}, w_{i+1})], \quad w_0 = x_0.$$
(c) If we combine the Euler method with the trapezoidal method, we get the (explicit) modified Euler method (or Runge-Kutta 2nd order method):
$$w_{i+1} = w_i + \frac{h}{2}\, [f(t_i, w_i) + f(t_{i+1}, w_i + h\, f(t_i, w_i))], \quad w_0 = x_0.$$
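Scheme (a) can be sketched as follows (only the explicit Euler method is shown; the name `euler` and the `std::function` interface are my own choices):

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Explicit Euler sketch for x'(t) = f(t, x), x(a) = x0, n steps:
// w_{i+1} = w_i + h f(t_i, w_i). Returns the approximation of x(b).
double euler(std::function<double(double, double)> f,
             double a, double b, double x0, int n) {
    double h = (b - a) / n, w = x0, t = a;
    for (int i = 0; i < n; ++i) {
        w += h * f(t, w);
        t += h;
    }
    return w;
}
```

The global error of this scheme is $O(h)$, which the test below reflects: with $x' = x$, $x(0) = 1$ on $[0, 1]$, the result approaches $e$ as $h \to 0$.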
7. Divided Differences.
Suppose a function $f$ and $n + 1$ distinct points $x_0, x_1, \dots, x_n$ are given. Divided differences of $f$ can be expressed in a table format as follows:
$$\begin{array}{c|cccc}
x_k & \text{0DD} & \text{1DD} & \text{2DD} & \text{3DD} \\ \hline
x_0 & f[x_0] & & & \\
x_1 & f[x_1] & f[x_0, x_1] & & \\
x_2 & f[x_2] & f[x_1, x_2] & f[x_0, x_1, x_2] & \\
x_3 & f[x_3] & f[x_2, x_3] & f[x_1, x_2, x_3] & f[x_0, x_1, x_2, x_3] \\
\vdots & & & & \ddots
\end{array}$$
where
$$f[x_i] = f(x_i), \qquad f[x_i, x_{i+1}] = \frac{f[x_{i+1}] - f[x_i]}{x_{i+1} - x_i},$$
$$f[x_i, x_{i+1}, \dots, x_{i+k}] = \frac{f[x_{i+1}, x_{i+2}, \dots, x_{i+k}] - f[x_i, x_{i+1}, \dots, x_{i+k-1}]}{x_{i+k} - x_i},$$
e.g. $f[x_1, \dots, x_n] = \frac{f[x_2, \dots, x_n] - f[x_1, \dots, x_{n-1}]}{x_n - x_1}$.
8. Interpolating Polynomial (Newton Form).
One drawback of Lagrange polynomials is that there is no recursive relationship between $P_{n-1}$ and $P_n$, which implies that each polynomial has to be constructed individually. Hence, in practice one uses the Newton polynomials.
The Newton interpolating polynomial $P_n$ of degree $\le n$ that agrees with $f$ at the points $x_0, x_1, \dots, x_n$ is given by
$$P_n(x) = f[x_0] + \sum_{k=1}^{n} f[x_0, x_1, \dots, x_k] \prod_{i=0}^{k-1} (x - x_i).$$
Note that $P_n$ can be computed recursively using the relation
$$P_n(x) = P_{n-1}(x) + f[x_0, x_1, \dots, x_n]\, (x - x_0)(x - x_1) \cdots (x - x_{n-1}).$$
[Note that $f[x_0, x_1, \dots, x_k]$ can be found on the diagonal of the DD table.]
Exercise.
Suppose values $(x, y)$ are given as $(1, -2)$, $(-2, -56)$, $(0, -2)$, $(3, 4)$, $(-1, -16)$, and $(7, 376)$. Is there a cubic polynomial that takes these values? (Answer: $2x^3 - 7x^2 + 5x - 2$.)
Exercise 4.8.
Data points: $(1, -2)$, $(-2, -56)$, $(0, -2)$, $(3, 4)$, $(-1, -16)$, $(7, 376)$.
$$\begin{array}{c|cccccc}
x_k & \text{0DD} & \text{1DD} & \text{2DD} & \text{3DD} & \text{4DD} & \text{5DD} \\ \hline
1 & -2 & & & & & \\
-2 & -56 & 18 & & & & \\
0 & -2 & 27 & -9 & & & \\
3 & 4 & 2 & -5 & 2 & & \\
-1 & -16 & 5 & -3 & 2 & 0 & \\
7 & 376 & 49 & 11 & 2 & 0 & 0
\end{array}$$
Newton polynomial:
$$p(x) = -2 + 18\, (x - 1) - 9\, (x - 1)(x + 2) + 2\, (x - 1)(x + 2)\, x = 2x^3 - 7x^2 + 5x - 2.$$
[Note: We could have stopped at the row for $x_3 = 3$ and checked whether $p_3$ goes through the remaining data points.]
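Building the divided difference table and evaluating the Newton form can be sketched as follows (the in-place table update and the name `newton_eval` are my own choices):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Newton form sketch: overwrite fs so that fs[k] = f[x_0,...,x_k]
// (column-by-column DD table), then evaluate P_n(x) by nested
// multiplication, P = (...(c_n (x - x_{n-1}) + c_{n-1})...) + c_0.
double newton_eval(std::vector<double> xs, std::vector<double> fs,
                   double x) {
    int n = static_cast<int>(xs.size());
    for (int k = 1; k < n; ++k)               // k-th divided differences
        for (int i = n - 1; i >= k; --i)
            fs[i] = (fs[i] - fs[i - 1]) / (xs[i] - xs[i - k]);
    double p = fs[n - 1];
    for (int i = n - 2; i >= 0; --i) p = p * (x - xs[i]) + fs[i];
    return p;
}
```

With the six data points of Exercise 4.8, the higher-order coefficients vanish and the evaluation reproduces the cubic $2x^3 - 7x^2 + 5x - 2$.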
9. Piecewise Polynomial Approximations.
Another drawback of interpolating polynomials is that Pn tends to oscillate widely
when n is large, which implies that Pn(x) may be a poor approximation to f (x) if x
is not close to the interpolating points.
If an interval $[a, b]$ is divided into a set of subintervals $[x_i, x_{i+1}]$, $i = 0, 1, \dots, n-1$, and on each subinterval a different polynomial is constructed to approximate a function $f$, such an approximation is called a spline.
The simplest spline is the linear spline $P$, which approximates the function $f$ on each interval $[x_i, x_{i+1}]$, $i = 0, 1, \dots, n-1$, with $P$ agreeing with $f$ at $x_i$ and $x_{i+1}$. Linear splines are easy to construct but are not smooth at the points $x_1, x_2, \dots, x_{n-1}$.
10. Natural Cubic Splines.
Given a function $f$ defined on $[a, b]$ and a set of points $a = x_0 < x_1 < \dots < x_n = b$, a function $S$ is called a natural cubic spline if there exist $n$ cubic polynomials $S_i$ such that:
(a) $S(x) = S_i(x)$ for $x$ in $[x_i, x_{i+1}]$ and $i = 0, 1, \dots, n-1$;
(b) $S_i(x_i) = f(x_i)$ and $S_i(x_{i+1}) = f(x_{i+1})$ for $i = 0, 1, \dots, n-1$;
(c) $S'_{i+1}(x_{i+1}) = S'_i(x_{i+1})$ for $i = 0, 1, \dots, n-2$;
(d) $S''_{i+1}(x_{i+1}) = S''_i(x_{i+1})$ for $i = 0, 1, \dots, n-2$;
(e) $S''(x_0) = S''(x_n) = 0$.
[Figure: adjacent spline pieces $S_i$ on $[x_i, x_{i+1}]$ and $S_{i+1}$ on $[x_{i+1}, x_{i+2}]$]
Counting parameters and conditions:
(a) $4n$ parameters; (b) $2n$ equations; (c) $n - 1$ equations; (d) $n - 1$ equations; (e) 2 equations: $4n$ equations in total.
Example. Assume $S$ is a natural cubic spline that interpolates $f \in C^2([a, b])$ at the nodes $a = x_0 < x_1 < \dots < x_n = b$. We have the following smoothness property of cubic splines:
$$\int_a^b [S''(x)]^2\,dx \le \int_a^b [f''(x)]^2\,dx.$$
In fact, it even holds that
$$\int_a^b [S''(x)]^2\,dx = \min_{g \in G} \int_a^b [g''(x)]^2\,dx,$$
where $G := \{ g \in C^2([a, b]) : g(x_i) = f(x_i) \;\forall\, i = 0, 1, \dots, n \}$.
Exercise: Determine the parameters $a$ to $h$ so that $S(x)$ is a natural cubic spline, where
$$S(x) = a x^3 + b x^2 + c x + d \;\text{ for } x \in [-1, 0] \qquad \text{and} \qquad S(x) = e x^3 + f x^2 + g x + h \;\text{ for } x \in [0, 1],$$
with interpolation conditions $S(-1) = 1$, $S(0) = 2$, and $S(1) = 1$.
(Answer: $a = -\tfrac{1}{2}$, $b = -\tfrac{3}{2}$, $c = 0$, $d = 2$, $e = \tfrac{1}{2}$, $f = -\tfrac{3}{2}$, $g = 0$, $h = 2$.)
11. Computation of Natural Cubic Splines.
Denote
$$c_i = S''(x_i), \quad i = 0, 1, \dots, n.$$
Then $c_0 = c_n = 0$.
Since $S_i$ is a cubic function on $[x_i, x_{i+1}]$, we know that $S''_i$ is a linear function on $[x_i, x_{i+1}]$. Hence it can be written as
$$S''_i(x) = c_i\, \frac{x_{i+1} - x}{h_i} + c_{i+1}\, \frac{x - x_i}{h_i},$$
where $h_i := x_{i+1} - x_i$.
Exercise.
Show that $S_i$ is given by
$$S_i(x) = \frac{c_i}{6 h_i}\, (x_{i+1} - x)^3 + \frac{c_{i+1}}{6 h_i}\, (x - x_i)^3 + p_i\, (x_{i+1} - x) + q_i\, (x - x_i),$$
where
$$p_i = \frac{f(x_i)}{h_i} - \frac{c_i\, h_i}{6}, \qquad q_i = \frac{f(x_{i+1})}{h_i} - \frac{c_{i+1}\, h_i}{6},$$
and $c_1, \dots, c_{n-1}$ satisfy the linear equations
$$h_{i-1}\, c_{i-1} + 2\, (h_{i-1} + h_i)\, c_i + h_i\, c_{i+1} = u_i,$$
where
$$u_i = 6\, (d_i - d_{i-1}), \qquad d_i = \frac{f(x_{i+1}) - f(x_i)}{h_i},$$
for $i = 1, 2, \dots, n-1$.
C++ Exercise: Write a program to construct a natural cubic spline. The inputs are the number of points $n + 1$ and all the points $(x_i, y_i)$, $i = 0, 1, \dots, n$. The output is a natural cubic spline expressed in terms of the functions $S_i$ defined on the intervals $[x_i, x_{i+1}]$ for $i = 0, 1, \dots, n-1$.
5. Basic Probability Theory
1. CDF and PDF.
Let (Ω, F, P) be a probability space and X be a random variable.
The cumulative distribution function (cdf) F of X is defined by
F(x) = P(X ≤ x), x ∈ ℝ.
F is an increasing right-continuous function satisfying
F(−∞) = 0, F(+∞) = 1.
If F is absolutely continuous then X has a probability density function (pdf) f defined by
f(x) = F'(x), x ∈ ℝ.
F can be recovered from f by the relation
F(x) = ∫_{−∞}^x f(u) du.
2. Normal Distribution.
A random variable X has a normal distribution with parameters μ and σ², written X ∼ N(μ, σ²), if X has the pdf
φ(x) = 1/(σ √(2π)) e^{−(x−μ)²/(2σ²)} for x ∈ ℝ.
If μ = 0 and σ² = 1 then X is called a standard normal random variable and its cdf is usually written as
Φ(x) = ∫_{−∞}^x 1/√(2π) e^{−u²/2} du.
If X ∼ N(μ, σ²) then the characteristic function (Fourier transform) of X is given by
c(s) = E(e^{i s X}) = e^{i μ s − σ² s²/2},
where i = √(−1). The moment generating function (mgf) is given by
m(s) = E(e^{s X}) = e^{μ s + σ² s²/2}.
3. Approximation of Normal CDF.
It is suggested that the standard normal cdf Φ(x) can be approximated by a polynomial expression Φ̃(x) as follows:
Φ̃(x) := 1 − φ(x) (a_1 k + a_2 k² + a_3 k³ + a_4 k⁴ + a_5 k⁵)  (3)
when x ≥ 0, and Φ̃(x) := 1 − Φ̃(−x) when x < 0.
The parameters are given by k = 1/(1 + γ x), γ = 0.2316419, a_1 = 0.319381530, a_2 = −0.356563782, a_3 = 1.781477937, a_4 = −1.821255978, and a_5 = 1.330274429. This approximation has a maximum absolute error less than 7.5 × 10⁻⁸ for all x.
C++ Exercise: Write a program to compute Φ(x) with (3) and compare the result with that derived with the composite Simpson rule (see Exercise 4.4). The inputs are a variable x and the number of partitions n over the interval [0, x]. The output is the results from the two methods and their error.
4. Lognormal Random Variable.
Let Y = e^X, where X is a N(μ, σ²) random variable. Then Y is a lognormal random variable.
Exercise: Show that
E(Y) = e^{μ + σ²/2},  E(Y²) = e^{2μ + 2σ²}.
5. An Important Formula in Pricing European Options.
If V is lognormally distributed and the standard deviation of ln V is s, then¹
E(max(V − K, 0)) = E(V) Φ(d_1) − K Φ(d_2),
where
d_1 = (1/s) ln(E(V)/K) + s/2 and d_2 = d_1 − s.
¹ K will later be denoted by X. We use a different notation here to avoid an abuse of notation, as K is not a random variable.
E(V − K)⁺ = E(V) Φ(d_1) − K Φ(d_2), d_1 = (1/s) ln(E(V)/K) + s/2 and d_2 = d_1 − s.
Proof.
Let g be the pdf of V. Then
E(V − K)⁺ = ∫_0^∞ (v − K)⁺ g(v) dv = ∫_K^∞ (v − K) g(v) dv.
As V is lognormal, ln V is normal N(m, s²), where m = ln(E(V)) − s²/2.
Let Y := (ln V − m)/s, i.e. V = e^{m + s Y}. Then Y ∼ N(0, 1) with pdf φ(y) = 1/√(2π) e^{−y²/2}.
E(V − K)⁺ = E(e^{m + s Y} − K)⁺ = ∫_{(ln K − m)/s}^∞ (e^{m + s y} − K) φ(y) dy
= ∫_{(ln K − m)/s}^∞ e^{m + s y} φ(y) dy − K ∫_{(ln K − m)/s}^∞ φ(y) dy
=: I_1 − K I_2.
I_1 = ∫_{(ln K − m)/s}^∞ 1/√(2π) e^{−y²/2 + m + s y} dy
= ∫_{(ln K − m)/s}^∞ 1/√(2π) e^{−(y − s)²/2 + m + s²/2} dy
= e^{m + s²/2} ∫_{(ln K − m)/s − s}^∞ 1/√(2π) e^{−y²/2} dy  [y − s ↦ y]
= e^{m + s²/2} [1 − Φ((ln K − m)/s − s)]
= e^{m + s²/2} Φ(−(ln K − m)/s + s)
= e^{ln E(V)} Φ((−ln K + ln E(V) − s²/2)/s + s)
= E(V) Φ((1/s) ln(E(V)/K) + s/2)
= E(V) Φ(d_1),
on recalling m = ln(E(V)) − s²/2.
Similarly,
I_2 = 1 − Φ((ln K − m)/s) = Φ(−(ln K − m)/s) = Φ((m − ln K)/s) = Φ(d_1 − s) = Φ(d_2).
6. Correlated Random Variables.
Assume X = (X_1, . . . , X_n) is an n-vector of random variables.
The mean of X is an n-vector μ = (E(X_1), . . . , E(X_n)).
The covariance of X is an n × n matrix Σ with components
Σ_{ij} = (Cov X)_{ij} = E((X_i − μ_i)(X_j − μ_j)).
The variance of X_i is given by σ_i² = Σ_{ii} and the correlation between X_i and X_j is given by ρ_{ij} = Σ_{ij}/(σ_i σ_j).
X is called a multi-dimensional normal vector, written as X ∼ N(μ, Σ), if X has pdf
f(x) = 1/((2π)^{n/2} (det Σ)^{1/2}) exp{−½ (x − μ)^T Σ^{−1} (x − μ)},
where Σ is a symmetric positive definite matrix.
7. Convergence.
Let {X_n} be a sequence of random variables. There are four types of convergence concepts associated with {X_n}:
• Almost sure convergence, written X_n →a.s. X, if there exists a null set N such that for all ω ∈ Ω \ N one has
X_n(ω) → X(ω), n → ∞.
• Convergence in probability, written X_n →P X, if for every ε > 0 one has
P(|X_n − X| > ε) → 0, n → ∞.
• Convergence in norm, written X_n →L^p X, if X_n, X ∈ L^p and
E|X_n − X|^p → 0, n → ∞.
• Convergence in distribution, written X_n →D X, if for any x at which P(X ≤ x) is continuous one has
P(X_n ≤ x) → P(X ≤ x), n → ∞.
8. Strong Law of Large Numbers.
Let {X_n} be independent, identically distributed (iid) random variables with finite expectation E(X_1) = μ. Then
Z_n/n →a.s. μ, where Z_n = X_1 + · · · + X_n.
9. Central Limit Theorem. Let {X_n} be iid random variables with finite expectation μ and finite variance σ² > 0. For each n, let
Z_n = X_1 + · · · + X_n.
Then
(Z_n/n − μ)/(σ/√n) = (Z_n − n μ)/(σ √n) →D Z,
where Z is a N(0, 1) random variable, i.e.,
P((Z_n − n μ)/(σ √n) ≤ z) → 1/√(2π) ∫_{−∞}^z e^{−x²/2} dx, n → ∞.
10. Lindeberg–Feller Central Limit Theorem.
Suppose X is a triangular array of random variables, i.e.,
X = {X^n_1, X^n_2, . . . , X^n_{k(n)} : n ∈ {1, 2, . . .}}, with k(n) → ∞ as n → ∞,
such that, for each n, X^n_1, . . . , X^n_{k(n)} are independently distributed and are bounded in absolute value by a constant y_n with y_n → 0. Let
Z_n = X^n_1 + · · · + X^n_{k(n)}.
If E(Z_n) → μ and var(Z_n) → σ² > 0, then Z_n converges in distribution to a normally distributed random variable with mean μ and variance σ².
Note: Lindeberg–Feller implies the Central Limit Theorem (5.9).
If X_1, X_2, . . . are iid with expectation μ and variance σ², then define
X^n_i := (X_i − μ)/(σ √n), i = 1, 2, . . . , k(n) := n.
For each n, X^n_1, . . . , X^n_{k(n)} are independent and
E(X^n_i) = (E(X_i) − μ)/(σ √n) = (μ − μ)/(σ √n) = 0,
Var(X^n_i) = (1/(σ² n)) Var X_i = (1/(σ² n)) σ² = 1/n.
Let Z_n = X^n_1 + · · · + X^n_{k(n)} = (Σ_{i=1}^n X_i − n μ) · 1/(σ √n); then
E(Z_n) = Σ_{i=1}^{k(n)} E(X^n_i) = 0,
Var(Z_n) = Σ_{i=1}^{k(n)} Var(X^n_i) = 1.
Hence, by Lindeberg–Feller,
Z_n →D Z ∼ N(0, 1).
6. Optimization
1. Unconstrained Optimization.
Given f : ℝⁿ → ℝ.
Minimize f(x) over x ∈ ℝⁿ.
f has a local minimum at a point x* if f(x*) ≤ f(x) for all x near x*, i.e.
∃ δ > 0 s.t. f(x*) ≤ f(x) ∀ x : ‖x − x*‖ < δ.
f has a global minimum at x* if
f(x*) ≤ f(x) ∀ x ∈ ℝⁿ.
2. Optimality Conditions.
• First order necessary condition:
Suppose that f has a local minimum at x* and that f is continuously differentiable in an open neighbourhood of x*. Then ∇f(x*) = 0. (x* is called a stationary point.)
• Second order sufficient condition:
Suppose that f is twice continuously differentiable in an open neighbourhood of x* and that ∇f(x*) = 0 and ∇²f(x*) is positive definite. Then x* is a strict local minimizer of f.
Example: Show that f = (2x_1² − x_2)(x_1² − 2x_2) has a minimum at (0, 0) along any straight line passing through the origin, but f has no minimum at (0, 0).
Exercise: Find the minimum solution of
f(x_1, x_2) = 2x_1² + x_1 x_2 + x_2² − x_1 − 3x_2.  (4)
(Answer. (1/7)(−1, 11).)
Sufficient Condition.
Taylor gives for any d ∈ ℝⁿ:
f(x* + d) = f(x*) + ∇f(x*)^T d + ½ d^T ∇²f(x* + θ d) d, θ ∈ (0, 1).
If x* is not a strict local minimizer, then
∃ {x^k} ⊂ ℝⁿ \ {x*} : x^k → x* s.t. f(x^k) ≤ f(x*).
Define d^k := (x^k − x*)/‖x^k − x*‖. Then ‖d^k‖ = 1 and there exists a subsequence {d^{k_j}} such that d^{k_j} → d* as j → ∞ and ‖d*‖ = 1. W.l.o.g. we assume d^k → d* as k → ∞.
f(x*) ≥ f(x^k) = f(x* + ‖x^k − x*‖ d^k)
= f(x*) + ‖x^k − x*‖ ∇f(x*)^T d^k + ½ ‖x^k − x*‖² (d^k)^T ∇²f(x* + θ_k ‖x^k − x*‖ d^k) d^k
= f(x*) + ½ ‖x^k − x*‖² (d^k)^T ∇²f(x* + θ_k ‖x^k − x*‖ d^k) d^k.
Hence (d^k)^T ∇²f(x* + θ_k ‖x^k − x*‖ d^k) d^k ≤ 0, and on letting k → ∞,
(d*)^T ∇²f(x*) d* ≤ 0.
As d* ≠ 0, this is a contradiction to ∇²f(x*) being symmetric positive definite. Hence x* is a strict local minimizer.
Example 6.2. Show that f = (2x_1² − x_2)(x_1² − 2x_2) has a minimum at (0, 0) along any straight line passing through the origin, but f has no minimum at (0, 0).
Answer.
Straight line through (0, 0): x_2 = λ x_1, λ ∈ ℝ fixed.
g(r) := f(r, λ r) = (2r² − λ r)(r² − 2λ r)
g'(r) = 8r³ − 15λ r² + 4λ² r, g''(r) = 24r² − 30λ r + 4λ²
⇒ g'(0) = 0 and g''(0) = 4λ² > 0 for λ ≠ 0 (for λ = 0 and for the vertical line x_1 = 0 the restriction is 2r⁴ resp. 2x_2², which also have a minimum at 0).
Hence r = 0 is a minimizer for g ⇒ (0, 0) is a minimizer for f along any straight line.
Now let (x_1^k, x_2^k) = (1/k, 1/k²) → (0, 0) as k → ∞. Then
f(x_1^k, x_2^k) = (1/k²)(−1/k²) = −1/k⁴ < 0 = f(0, 0) ∀ k.
Hence (0, 0) is not a minimizer for f.
[Note: ∇f(0, 0) = 0, but ∇²f(0, 0) = (0 0; 0 4) is only positive semi-definite.]
3. Convex Optimization.
Exercise. When f is convex, any local minimizer x* is a global minimizer of f. If in addition f is differentiable, then any stationary point x* is a global minimizer of f.
(Hint. Use a contradiction argument.)
Exercise 6.3.
When f is convex, any local minimizer x* is a global minimizer of f.
Proof.
Suppose x* is a local minimizer, but not a global minimizer. Then
∃ x̄ s.t. f(x̄) < f(x*).
Since f is convex, we have that
f(λ x̄ + (1 − λ) x*) ≤ λ f(x̄) + (1 − λ) f(x*) < λ f(x*) + (1 − λ) f(x*) = f(x*) ∀ λ ∈ (0, 1].
Let x_λ := λ x̄ + (1 − λ) x*. Then
x_λ → x* and f(x_λ) < f(x*) as λ → 0.
This is a contradiction to x* being a local minimizer.
Hence x* is a global minimizer for f.
4. Line Search.
The basic procedure to solve numerically an unconstrained problem (minimize f(x) over x ∈ ℝⁿ) is as follows.
(i) Choose an initial point x⁰ ∈ ℝⁿ and an initial search direction d⁰ ∈ ℝⁿ and set k = 0.
(ii) Choose a step size α_k and define a new point x^{k+1} = x^k + α_k d^k. Check if the stopping criterion is satisfied (‖∇f(x^{k+1})‖ < ε?). If yes, x^{k+1} is the optimal solution, stop. If no, go to (iii).
(iii) Choose a new search direction d^{k+1} (descent direction) and set k = k + 1. Go to (ii).
The essential and most difficult part in any search algorithm is to choose a descent direction d^k and a step size α_k with good convergence and stability properties.
5. Steepest Descent Method.
f is differentiable.
Choose d^k = −g^k, where g^k = ∇f(x^k), and choose α_k s.t.
f(x^k + α_k d^k) = min_{α ∈ ℝ} f(x^k + α d^k).
Note that the successive descent directions are orthogonal to each other, i.e. (g^k)^T g^{k+1} = 0, and the convergence for some functions may be very slow, called zigzagging.
Exercise.
Use the steepest descent (SD) method to solve (4) with the initial point x⁰ = (1, 1).
(Answer. The first three iterations give x¹ = (0, 1), x² = (0, 3/2), and x³ = (−1/8, 3/2).)
Steepest Descent.
Taylor gives:
f(x^k + α d^k) = f(x^k) + α ∇f(x^k)^T d^k + O(α²).
As
∇f(x^k)^T d^k = ‖∇f(x^k)‖ ‖d^k‖ cos θ_k,
with θ_k the angle between d^k and ∇f(x^k), we see that d^k is a descent direction if cos θ_k < 0. The descent is steepest when θ_k = π, i.e. cos θ_k = −1.
Zigzagging.
α_k is a minimizer of φ(α) := f(x^k + α d^k) with d^k = −g^k.
Hence
0 = φ'(α_k) = ∇f(x^k + α_k d^k)^T d^k = ∇f(x^{k+1})^T (−g^k) = −(g^{k+1})^T g^k.
Hence d^{k+1} ⊥ d^k, which leads to zigzagging.
Exercise 6.5.
Use the SD method to solve (4) with the initial point x⁰ = (1, 1). [min: (1/7)(−1, 11).]
Answer.
∇f = (4x_1 + x_2 − 1, 2x_2 + x_1 − 3).
Iteration 0: d⁰ = −∇f(x⁰) = −(4, 0) ≠ (0, 0).
φ(α) = f(x⁰ + α d⁰) = f(1 − 4α, 1) = 2(1 − 4α)² − 2
⇒ minimum point at α_0 = 1/4
⇒ x¹ = x⁰ + α_0 d⁰ = (0, 1),
d¹ = −∇f(x¹) = −(0, −1) = (0, 1) ≠ (0, 0).
Iteration 1: x² = (0, 3/2), d² = (−1/2, 0).
Iteration 2: x³ = (−1/8, 3/2), d³ = (0, 1/8).
6. Newton Method.
f is twice differentiable.
Choose d^k = −[H^k]^{−1} g^k, where H^k = ∇²f(x^k).
Set x^{k+1} = x^k + d^k.
If H^k is positive definite then d^k is a descent direction.
The main drawback of the Newton method is that it requires the computation of ∇²f(x^k) and its inverse, which can be difficult and time-consuming.
Exercise.
Use the Newton method to solve (4) with x⁰ = (1, 1).
(Answer. The first iteration already gives x¹ = (1/7)(−1, 11).)
Newton Method.
Taylor gives
f(x^k + d) ≈ f(x^k) + d^T ∇f(x^k) + ½ d^T ∇²f(x^k) d =: m(d)
min_d m(d) ⇒ ∇m(d) = 0 ⇔ ∇f(x^k) + ∇²f(x^k) d = 0.
Hence choose d^k = −[∇²f(x^k)]^{−1} ∇f(x^k) = −[H^k]^{−1} g^k.
If H^k is positive definite, then so is (H^k)^{−1}, and we get
(d^k)^T g^k = −(g^k)^T (H^k)^{−1} g^k ≤ −λ_k ‖g^k‖² < 0
for some λ_k > 0.
Hence d^k is a descent direction.
[Aside: The Newton method for min_x f(x) is equivalent to the Newton method for finding a root of the system of nonlinear equations ∇f(x) = 0.]
Exercise 6.6.
Use the Newton method to minimize
f(x_1, x_2) = 2x_1² + x_1 x_2 + x_2² − x_1 − 3x_2
with x⁰ = (1, 1)^T.
Answer.
∇f = (4x_1 + x_2 − 1, 2x_2 + x_1 − 3)^T, H := ∇²f = (4 1; 1 2).
H^{−1} = (1/det H) (2 −1; −1 4) = (1/7) (2 −1; −1 4).
Iteration 0: x⁰ = (1, 1)^T, ∇f(x⁰) = (4, 0)^T.
x¹ = x⁰ − [H⁰]^{−1} ∇f(x⁰) = (1, 1)^T − (1/7) (2 −1; −1 4) (4, 0)^T = (1, 1)^T − (1/7) (8, −4)^T = (1/7) (−1, 11)^T.
∇f(x¹) = (0, 0)^T and H positive definite ⇒ x¹ is the minimum point.
7. Choice of Stepsize.
In computing the step size α_k we face a tradeoff. We would like to choose α_k to give a substantial reduction of f, but at the same time we do not want to spend too much time making the choice. The ideal choice would be the global minimizer of the univariate function φ : ℝ → ℝ defined by
φ(α) = f(x^k + α d^k), α > 0,
but in general, it is too expensive to identify this value.
A common strategy is to perform an inexact line search to identify a step size that achieves adequate reductions in f with minimum cost.
α_k is normally chosen to satisfy the Wolfe conditions:
f(x^k + α_k d^k) ≤ f(x^k) + c_1 α_k (g^k)^T d^k  (5)
∇f(x^k + α_k d^k)^T d^k ≥ c_2 (g^k)^T d^k,  (6)
with 0 < c_1 < c_2 < 1. (5) is called the sufficient decrease condition, and (6) is the curvature condition.
Choice of Stepsize.
The simple condition
f(x^k + α_k d^k) < f(x^k)  (∗)
is not appropriate, as it may not lead to a sufficient reduction.
Example: f(x) = (x − 1)² − 1. So min f(x) = −1, but we can choose x^k satisfying (∗) such that f(x^k) = 1/k → 0.
Note that the sufficient decrease condition (5),
φ(α) = f(x^k + α d^k) ≤ ℓ(α) := f(x^k) + c_1 α (g^k)^T d^k,
yields acceptable regions for α. Here φ(α) < ℓ(α) for small α > 0, as (g^k)^T d^k < 0 for descent directions.
The curvature condition (6) is equivalent to
φ'(α_k) ≥ c_2 φ'(0)  [> φ'(0)],
i.e. a condition on the desired slope, and so rules out unacceptably short steps α_k. In practice c_1 = 10⁻⁴ and c_2 = 0.9.
8. Convergence of Line Search Methods.
An algorithm is said to be globally convergent if
lim_{k→∞} ‖g^k‖ = 0.
It can be shown that if the step sizes satisfy the Wolfe conditions then
• the steepest descent method is globally convergent,
• so is the Newton method, provided the Hessian matrices ∇²f(x^k) have a bounded condition number and are positive definite.
Exercise. Show that the steepest descent method is globally convergent if the following conditions hold:
(a) α_k satisfies the Wolfe conditions,
(b) f(x) ≥ M ∀ x ∈ ℝⁿ,
(c) f ∈ C¹ and ∇f is Lipschitz: ‖∇f(x) − ∇f(y)‖ ≤ L ‖x − y‖ ∀ x, y ∈ ℝⁿ.
[Hint: Show that Σ_{k=0}^∞ ‖g^k‖² < ∞.]
Exercise 6.8.
Assume that d^k is a descent direction, i.e. (g^k)^T d^k < 0, where g^k := ∇f(x^k). Then if
1. α_k satisfies the Wolfe conditions,
2. f(x) ≥ M ∀ x ∈ ℝⁿ,
3. f ∈ C¹ and ∇f is Lipschitz, i.e. ‖∇f(x) − ∇f(y)‖ ≤ L ‖x − y‖ ∀ x, y ∈ ℝⁿ,
it holds that Σ_{k=0}^∞ cos² θ_k ‖g^k‖² < ∞, where θ_k is the angle between d^k and −g^k.
Proof. The curvature condition (6) gives
(g^{k+1} − g^k)^T d^k ≥ (c_2 − 1) (g^k)^T d^k.  (∗)
The Lipschitz condition yields that
(g^{k+1} − g^k)^T d^k ≤ ‖g^{k+1} − g^k‖ ‖d^k‖ = ‖∇f(x^{k+1}) − ∇f(x^k)‖ ‖d^k‖ ≤ L ‖x^{k+1} − x^k‖ ‖d^k‖ = α_k L ‖d^k‖².  (∗∗)
Combining (∗) and (∗∗) gives α_k ≥ ((c_2 − 1)/L) (g^k)^T d^k / ‖d^k‖², and hence
α_k (g^k)^T d^k ≤ ((c_2 − 1)/L) [(g^k)^T d^k]² / ‖d^k‖².
Together with Wolfe condition (5) we get
f(x^{k+1}) ≤ f(x^k) + c_1 ((c_2 − 1)/L) [(g^k)^T d^k]² / ‖d^k‖² = f(x^k) − c cos² θ_k ‖g^k‖²,
where c := c_1 (1 − c_2)/L > 0.
f(x^{k+1}) ≤ f(x^k) − c cos² θ_k ‖g^k‖² ≤ · · · ≤ f(x⁰) − c Σ_{j=0}^k cos² θ_j ‖g^j‖²
⇒ Σ_{j=0}^k cos² θ_j ‖g^j‖² ≤ (1/c) (f(x⁰) − M) ∀ k
⇒ Σ_{j=0}^∞ cos² θ_j ‖g^j‖² < ∞.
9. Popular Search Methods.
In practice the steepest descent method and the Newton method are rarely used
due to the slow convergence rate and the difficulty in computing Hessian matrices,
respectively.
The popular search methods are
• the conjugate gradient method (a variation of the SD method with superlinear convergence) and
• the quasi-Newton method (a variation of the Newton method without computation of Hessian matrices).
There are also some efficient algorithms based on the trust-region approach. See Fletcher (2000) for details.
10. Constrained Optimization.
Minimize f(x) over x ∈ ℝⁿ subject to
the equality constraints
h_i(x) = 0, i = 1, . . . , l,
and the inequality constraints
g_j(x) ≤ 0, j = 1, . . . , m.
Assume that all functions involved are differentiable.
11. Linear Programming.
The problem is to minimize
z = c_1 x_1 + · · · + c_n x_n
subject to
a_{i1} x_1 + · · · + a_{in} x_n ≥ b_i, i = 1, . . . , m,
and
x_1, . . . , x_n ≥ 0.
LPs can be easily and efficiently solved with the simplex algorithm or the interior point method.
MS-Excel has a good in-built LP solver capable of solving problems of up to 200 variables. MATLAB with the optimization toolbox also provides a good LP solver.
12. Graphic Method.
If an LP problem has only two decision variables (x_1, x_2), then it can be solved by the graphic method as follows:
• First draw the feasible region from the given constraints and a contour line of the objective function,
• then, on establishing the direction of improvement perpendicular to the contour line, find the optimal point on the boundary of the feasible region,
• then find the two linear equations which define that point,
• and finally solve the two equations to obtain the optimal point.
Exercise. Use the graphic method to solve the LP:
minimize z = −3x_1 − 2x_2
subject to x_1 + x_2 ≤ 80, 2x_1 + x_2 ≤ 100, x_1 ≤ 40, and x_1, x_2 ≥ 0.
(Answer. x_1 = 20, x_2 = 60.)
13. Quadratic Programming.
Minimize
x^T Q x + c^T x
subject to
A x ≤ b and x ≥ 0,
where Q is an n × n symmetric positive definite matrix, A is an m × n matrix, x, c ∈ ℝⁿ, b ∈ ℝᵐ.
To solve a QP problem, one
• first derives a set of equations from the Kuhn–Tucker conditions, and
• then applies the Wolfe algorithm or the Lemke algorithm to find the optimal solution.
The MS-Excel solver is capable of solving reasonably sized QP problems; similarly for MATLAB.
14. Kuhn–Tucker Conditions.
min f(x) over x ∈ ℝⁿ s.t. h_i(x) = 0, i = 1, . . . , l; g_j(x) ≤ 0, j = 1, . . . , m.
Assume that x* is an optimal solution.
Under some regularity conditions, called the constraint qualifications, there exist two vectors u = (u_1, . . . , u_l) and v = (v_1, . . . , v_m), called the Lagrange multipliers, such that the following set of conditions is satisfied:
∂L/∂x_k (x*, u, v) = 0, k = 1, . . . , n
h_i(x*) = 0, i = 1, . . . , l
g_j(x*) ≤ 0, j = 1, . . . , m
v_j g_j(x*) = 0, v_j ≥ 0, j = 1, . . . , m
where
L(x, u, v) = f(x) + Σ_{i=1}^l u_i h_i(x) + Σ_{j=1}^m v_j g_j(x)
is called the Lagrange function or Lagrangian.
Furthermore, if f : ℝⁿ → ℝ and h_i, g_j : ℝⁿ → ℝ are convex, then x* is an optimal solution if and only if (x*, u, v) satisfies the Kuhn–Tucker conditions.
This holds in particular when f is convex and the h_i, g_j are linear.
Example.
Find the minimum of the function x_2 − x_1 subject to x_1² + x_2² ≤ 1.
Exercise.
Find the minimum of the function x_1² + x_2² − 2x_1 − 4x_2 subject to x_1 + 2x_2 ≤ 2 and x_2 ≥ 0.
(Answer. (x_1, x_2) = (1/5)(2, 4).)
Interpretation of Kuhn–Tucker conditions
Assume that no equality constraints are present.
If x* is an interior point, i.e. no constraints are active, then we recover the usual optimality condition: ∇f(x*) = 0.
Now assume that x* lies on the boundary of the feasible set and let g_{j_k} be the active constraints at x*. Then a necessary condition for optimality is that we cannot find a descent direction for f at x* that is also a feasible direction. Such a vector cannot exist if
−∇f(x*) = Σ_k v_{j_k} ∇g_{j_k}(x*) with v_{j_k} ≥ 0.  (∗)
This is because, if d ∈ ℝⁿ is a descent direction, then ∇f(x*)^T d < 0 and so Σ_k v_{j_k} ∇g_{j_k}(x*)^T d > 0. So there must exist a j_k such that ∇g_{j_k}(x*)^T d > 0. But that means that d is an ascent direction for g_{j_k}, and as g_{j_k} is active at x*, it is not a feasible direction.
If we require v_j g_j(x*) = 0 for all inactive constraints, then we can re-write (∗) as
0 = ∇f(x*) + Σ_{j=1}^m v_j ∇g_j(x*) with v_j ≥ 0.
These are the KT conditions.
Application of Kuhn–Tucker: LP Duality
Let b ∈ ℝᵐ, c ∈ ℝⁿ and A ∈ ℝ^{m×n}.
min c^T x s.t. A x ≥ b, x ≥ 0.  (P)
Equivalent to min c^T x s.t. b − A x ≤ 0, −x ≤ 0.
Lagrangian: L = c^T x + v^T (b − A x) + y^T (−x).
Hence, x* is the solution, if there exist v and y such that
∇L = c − A^T v − y = 0 ⇔ y = c − A^T v,
KT conditions: v^T (b − A x*) = 0, y^T (−x*) = 0,
v, y ≥ 0, b − A x* ≤ 0, x* ≥ 0.
Eliminate y to find v, x*:
A x* ≥ b, x* ≥ 0  (feasible region: primal)
A^T v ≤ c, v ≥ 0  (feasible region: dual)
v^T (b − A x*) = 0
(x*)^T (c − A^T v) = 0
} ⇒ (x*)^T c = (x*)^T A^T v = v^T b.
Hence v ∈ ℝᵐ solves the dual:
max b^T v s.t. A^T v ≤ c, v ≥ 0.  (D)
Here we have used that
c^T x* = min_{x ≥ 0, A x ≥ b} c^T x = min_{x ≥ 0} max_{v ≥ 0} [c^T x + v^T (b − A x)]
≥ max_{v ≥ 0} min_{x ≥ 0} [c^T x + v^T (b − A x)]
= max_{v ≥ 0} min_{x ≥ 0} [v^T b + x^T (c − A^T v)]
= max_{v ≥ 0, A^T v ≤ c} b^T v ≥ b^T v = c^T x*  (for the KT multiplier v),
so equality holds throughout.
Example 6.14.
Find the minimum of the function x_2 − x_1 subject to x_1² + x_2² ≤ 1.
Answer.
L = x_2 − x_1 + v (x_1² + x_2² − 1), so the KT conditions become
L_{x_1} = −1 + 2v x_1 = 0  (1)
L_{x_2} = 1 + 2v x_2 = 0  (2)
x_1² + x_2² ≤ 1  (3)
v (x_1² + x_2² − 1) = 0, v ≥ 0  (4)
(1) ⇒ v > 0 and hence x_1 = 1/(2v), x_2 = −1/(2v).
Plugging this into (4) yields 2/(4v²) = 1 and hence
v = 1/√2 ⇒ x_1 = 1/√2, x_2 = −1/√2;
with the optimal value being z = −√2.
Example.
min x_1 s.t. x_2 − x_1³ ≤ 0, x_1 ≤ 1, −x_2 ≤ 0.
Since x_1³ ≥ x_2 ≥ 0 we have x_1 ≥ 0, and hence x* = (0, 0) is the unique minimizer.
Lagrangian: L = x_1 + v_1 (x_2 − x_1³) + v_2 (x_1 − 1) + v_3 (−x_2).
KT conditions for a feasible point x*:
∇L = (1 − 3v_1 x_1² + v_2, v_1 − v_3) = 0  (1)
v_1 (x_2 − x_1³) = 0, v_2 (x_1 − 1) = 0, v_3 (−x_2) = 0  (2)
v_1, v_2, v_3 ≥ 0  (3)
Check the KT conditions at x* = (0, 0):
(1) ⇒ v_2 = −1 < 0: impossible!
The KT conditions are not satisfied, since the constraint qualifications do not hold.
Here g_1 = x_2 − x_1³ and g_3 = −x_2 are active at (0, 0), and ∇g_1 = (0, 1), ∇g_3 = (0, −1). Hence ∇g_1 and ∇g_3 are not linearly independent!
Exercise 6.14.
Find the minimum of the function x_1² + x_2² − 2x_1 − 4x_2 subject to x_1 + 2x_2 ≤ 2 and x_2 ≥ 0.
Answer.
Lagrangian: L = x_1² + x_2² − 2x_1 − 4x_2 + v_1 (x_1 + 2x_2 − 2) + v_2 (−x_2)
KT conditions:
∇L = (2x_1 − 2 + v_1, 2x_2 − 4 + 2v_1 − v_2) = 0  (1)
v_1 (x_1 + 2x_2 − 2) = 0, v_2 x_2 = 0, v_1, v_2 ≥ 0  (2)
x_1 + 2x_2 ≤ 2, x_2 ≥ 0  (3)
If x_1 + 2x_2 − 2 < 0 then v_1 = 0 ⇒ x_1 = 1, x_2 = 2 + ½ v_2 ≥ 2. Hence from (2), v_2 = 0, and so x_1 = 1, x_2 = 2. But that contradicts (3), and so it must hold that x_1 + 2x_2 − 2 = 0.
If x_2 > 0, then v_2 = 0; solving (1) together with x_1 + 2x_2 − 2 = 0 gives x_1 = 2/5, x_2 = 4/5, and v_1 = 6/5, v_2 = 0. ✓
If x_2 = 0 then x_1 = 2 − 2x_2 = 2. But then from (1), v_1 = 2 − 2x_1 = −2 < 0. Impossible.
7. Lattice (Tree) Methods
1. Concepts in Option Pricing Theory.
A call (or put) option is a contract that gives the holder the right to buy (or sell)
a prescribed asset (underlying asset) by a certain date T (expiration date) for a
predetermined price X (exercise price).
A European option can only be exercised on the expiration date while an American
option can be exercised at any time prior to the expiration date.
The other party to the holder of the option contract is called the writer.
The holder and the writer are said to be in long and short positions of the contract,
respectively.
The terminal payoff of a European call (or put) option is (S(T) − X)⁺ := max(S(T) − X, 0) (or (X − S(T))⁺), where S(T) is the underlying asset price at time T.
2. Random Walk Model and Assumption.
Assume S(t) and V (t) are the asset price and the option price at time t, respectively,
and the current time is t and current asset price is S, i.e. S(t) = S.
After one period of time Δt, the asset price S(t + Δt) is either uS (up state) with probability q or dS (down state) with probability 1 − q, where q is a real (subjective) probability.
To avoid riskless arbitrage opportunities, we must have
u > R > d,
where R := e^{r Δt} and r is the riskless interest rate.
Here r is the riskless interest rate that allows for unlimited borrowing or lending at the same rate r.
Investing $1 at time t yields a value (return) at time t + Δt of $R = $e^{r Δt} (continuous compounding).
Continuous compounding
Saving an amount B at time t yields, with interest rate r at time t + Δt without compounding, (1 + r Δt) B.
Compounding the interest n times yields
(1 + r Δt / n)^n B → e^{r Δt} B as n → ∞.
Example:
Consider Δt = 1 and r = 0.05.
Then R = e^r = 1.0512711, equivalent to an AER of 5.127%.

n    (1 + r/n)^n    AER
1    1.05           5 %
4    1.05094534     5.095 %
12   1.05116190     5.116 %
52   1.05124584     5.125 %
365  1.05126750     5.127 %
No Arbitrage: u > R > d.
If R ≥ u, then short sell α > 0 units of the asset and deposit αS in the bank, giving a portfolio value at time t of
Π(t) = 0
and at time t + Δt of
Π(t + Δt) = R (αS) − α S(t + Δt) ≥ R (αS) − α (uS) ≥ 0.
Moreover, in the down state (with probability 1 − q > 0) the value is
Π(t + Δt) = R (αS) − α (dS) > 0.
Hence we can make a riskless profit with positive probability.
No arbitrage implies u > R.
A similar argument yields that d < R.
3. Replicating Portfolio.
We form a portfolio consisting of
• Δ units of the underlying asset (Δ > 0 buying, Δ < 0 short selling) and
• a cash amount B in a riskless cash bond (B > 0 lending, B < 0 borrowing).
If we choose
Δ = (V(uS, t + Δt) − V(dS, t + Δt)) / (uS − dS),
B = (u V(dS, t + Δt) − d V(uS, t + Δt)) / (uR − dR),
we have replicated the payoffs of the option at time t + Δt, no matter how the asset price changes.
Replicating Portfolio.
Value of the portfolio at time t: Π(t) = ΔS + B, and at time t + Δt:
Π(t + Δt) = Δ S(t + Δt) + RB = { Δ uS + RB =: Π_u^{Δt} w. prob. q; Δ dS + RB =: Π_d^{Δt} w. prob. 1 − q }.
Call option value at time t + Δt (expiration time):
C(t + Δt) = { (uS − X)⁺ =: C_u^{Δt} w. prob. q; (dS − X)⁺ =: C_d^{Δt} w. prob. 1 − q }.
To replicate the call option, we must have Π_u^{Δt} = C_u^{Δt} and Π_d^{Δt} = C_d^{Δt}. Hence
Δ uS + RB = C_u^{Δt}, Δ dS + RB = C_d^{Δt}
⇒ Δ = (C_u^{Δt} − C_d^{Δt}) / (uS − dS), B = (u C_d^{Δt} − d C_u^{Δt}) / (uR − dR).
This gives the fair price for the call option at time t: Π(t) = ΔS + B.
Replicating Portfolio.
The value of the call option at time t is:
C(t) = Π(t) = ΔS + B = ((C_u^{Δt} − C_d^{Δt}) / (uS − dS)) S + (u C_d^{Δt} − d C_u^{Δt}) / (uR − dR)
= (1/R) [ ((R − d)/(u − d)) C_u^{Δt} + ((u − R)/(u − d)) C_d^{Δt} ]
= (1/R) [ p C_u^{Δt} + (1 − p) C_d^{Δt} ],
where
p = (R − d)/(u − d).
No arbitrage argument: If C(t) < Π(t), then buy the call option at price C(t) and sell the portfolio at Π(t) (that is, short sell Δ units of the asset and lend B amounts of cash to the bank). This gives an instantaneous profit of Π(t) − C(t) > 0. And we know that at time t + Δt, the payoff from our call option will exactly compensate for the value of the portfolio, as C(t + Δt) = Π(t + Δt).
A similar argument for the case C(t) > Π(t) yields that C(t) = Π(t).
4. Risk Neutral Option Price.
The current option price is given by
V(S, t) = (1/R) ( p V(uS, t + Δt) + (1 − p) V(dS, t + Δt) ),  (7)
where p = (R − d)/(u − d). Note that
1. the real probability q does not appear in the option pricing formula,
2. p is a probability (0 < p < 1) since d < R < u, and
3. p (uS) + (1 − p)(dS) = RS, so p is called the risk neutral probability and the process of finding V is called risk neutral valuation.
5. Riskless Hedging Principle.
We can also derive the option pricing formula (7) as follows:
form a portfolio with a long position of one unit of the option and a short position of Δ units of the underlying asset.
By choosing an appropriate Δ we can ensure such a portfolio is riskless.
Exercise.
Find Δ (called delta) and derive the pricing formula (7).
6. Asset Price Process.
In a risk neutral continuous time model, the asset price S is assumed to follow a
lognormal process.
(as discussed in detail in Stochastic Processes I, omitted here).
Over a period [t, t + τ] the asset price S(t + τ) can be expressed in terms of S(t) = S as
S(t + τ) = S e^{(r − σ²/2) τ + σ √τ Z},
where r is the riskless interest rate, σ the volatility, and Z ∼ N(0, 1) a standard normal random variable.
S(t + τ) has the first moment S e^{r τ} and the second moment S² e^{(2r + σ²) τ}.
7. Relations Between u, d and p.
By equating the first and second moments of the asset price S(t + Δt) in both continuous and discrete time models, we obtain the relations
S (p u + (1 − p) d) = S e^{r Δt},
S² (p u² + (1 − p) d²) = S² e^{(2r + σ²) Δt},
or equivalently,
p u + (1 − p) d = e^{r Δt},  (8)
p u² + (1 − p) d² = e^{(2r + σ²) Δt}.  (9)
There are two equations and three unknowns u, d, p. An extra condition is needed to uniquely determine a solution.
[Note that (8) implies p = (R − d)/(u − d), where R = e^{r Δt} as before.]
10. Cox–Ross–Rubinstein Model.
The extra condition is
u d = 1.  (10)
The solutions to (8), (9), (10) are
u = (1/(2R)) ( γ² + 1 + √((γ² + 1)² − 4R²) ),
d = (1/(2R)) ( γ² + 1 − √((γ² + 1)² − 4R²) ),
where
γ² = e^{(2r + σ²) Δt}.
If the higher order terms O((Δt)^{3/2}) are ignored in u, d, then
u = e^{σ √Δt}, d = e^{−σ √Δt}, p = (R − d)/(u − d).
These are the parameters chosen by Cox–Ross–Rubinstein for their model.
Note that with this choice (9) is satisfied up to O((Δt)²).
Proof.
γ² = e^{(2r + σ²) Δt}
= p (u² − d²) + d²
= ((R − d)/(u − d)) (u + d)(u − d) + d²
= R u − d u + R d
= R u − 1 + R/u  [using d = 1/u]
⇔ R u² − (1 + γ²) u + R = 0.
As u > d, we get the unique solutions
u_true = (1 + γ² + √((1 + γ²)² − 4R²)) / (2R) = (1 + γ²)/(2R) + √( ((1 + γ²)/(2R))² − 1 )
and
d_true = 1/u = (1 + γ² − √((1 + γ²)² − 4R²)) / (2R) = (1 + γ²)/(2R) − √( ((1 + γ²)/(2R))² − 1 ).
Moreover,
(1 + γ²)/(2R) = ½ (e^{(2r + σ²) Δt} + 1) e^{−r Δt}
= ½ (2 + (2r + σ²) Δt + O((Δt)²)) (1 − r Δt + O((Δt)²))
= ½ (2 + σ² Δt + O((Δt)²))
= 1 + ½ σ² Δt + O((Δt)²),
and
√( ((1 + γ²)/(2R))² − 1 ) = √( (1 + ½ σ² Δt + O((Δt)²))² − 1 )
= √( 1 + σ² Δt + O((Δt)²) − 1 )
= σ √Δt √(1 + O(Δt))
= σ √Δt (1 + O(Δt))
= σ √Δt + O((Δt)^{3/2}).
⇒ u_true = 1 + ½ σ² Δt + σ √Δt + O((Δt)^{3/2}).
The Cox–Ross–Rubinstein (CRR) model uses
u = e^{σ √Δt} = 1 + σ √Δt + ½ σ² Δt + O((Δt)^{3/2}).
Hence the first three terms in the Taylor series of the CRR value u and the true value u_true match. So
u = u_true + O((Δt)^{3/2}).
Estimating the error in (9) for this choice of u gives
Error = p u² + (1 − p) d² − e^{(2r + σ²) Δt}
= R (u + d) − 1 − e^{(2r + σ²) Δt}
= e^{r Δt} (e^{σ √Δt} + e^{−σ √Δt}) − 1 − e^{(2r + σ²) Δt}
= (1 + r Δt + O((Δt)²)) (2 + σ² Δt + O((Δt)²)) − 1 − (1 + (2r + σ²) Δt + O((Δt)²))
= O((Δt)²).
9. Jarrow–Rudd Model.
The extra condition is
p = ½.  (11)
Exercise.
Show that the solutions to (8), (9), (11) are
u = R (1 + √(e^{σ² Δt} − 1)),
d = R (1 − √(e^{σ² Δt} − 1)).
Show that if the O((Δt)^{3/2}) terms are ignored then
u = e^{(r − σ²/2) Δt + σ √Δt},
d = e^{(r − σ²/2) Δt − σ √Δt}.
(These are the parameters chosen by Jarrow–Rudd for their model.)
Also show that (8) and (9) are satisfied up to O((Δt)²).
10. Tian Model.
The extra condition is
p u³ + (1 − p) d³ = e^{3r Δt + 3σ² Δt}.  (12)
Exercise.
Show that the solutions to (8), (9), (12) are
u = (RQ/2) ( Q + 1 + √(Q² + 2Q − 3) ),
d = (RQ/2) ( Q + 1 − √(Q² + 2Q − 3) ),
p = (R − d)/(u − d),
where R = e^{r Δt} and Q = e^{σ² Δt}.
Also show that if the O((Δt)^{3/2}) terms are ignored, then
p = ½ − ¾ σ √Δt.
(Note that u d = R² Q² instead of u d = 1 and that the binomial tree loses its symmetry about S whenever u d ≠ 1.)
11. Black–Scholes Equation (BSE).
As the time interval Δt tends to zero, the one period option pricing formula, (7) with (8) and (9), tends to the Black–Scholes Equation
∂V/∂t + r S ∂V/∂S + ½ σ² S² ∂²V/∂S² − r V = 0.  (13)
Proof.
A two variable Taylor expansion at (S, t) gives
V(uS, t + Δt) = V + V_S (uS − S) + V_t Δt + ½ ( V_SS (uS − S)² + 2 V_St (uS − S) Δt + V_tt (Δt)² ) + higher order terms.
Similarly,
V(dS, t + Δt) = V + V_S (dS − S) + V_t Δt + ½ ( V_SS (dS − S)² + 2 V_St (dS − S) Δt + V_tt (Δt)² ) + higher order terms.
Substituting into the right hand side of (7) gives
V = { V + S V_S [p (u − 1) + (1 − p)(d − 1)] + V_t Δt
+ ½ [ S² V_SS (p (u − 1)² + (1 − p)(d − 1)²) + 2 S V_St (p (u − 1) + (1 − p)(d − 1)) Δt + V_tt (Δt)² ]
+ higher order terms } e^{−r Δt}.
Now it follows from (8) that
p (u − 1) + (1 − p)(d − 1) = p u + (1 − p) d − 1 = e^{r Δt} − 1 = r Δt + O((Δt)²).
Similarly, (8) and (9) give that
p (u − 1)² + (1 − p)(d − 1)² = p u² + (1 − p) d² − 2 (p u + (1 − p) d) + 1 = e^{(2r + σ²) Δt} − 2 e^{r Δt} + 1 = σ² Δt + O((Δt)²).
So, for the RHS of (7) we get
V = { V + S V_S (r Δt + O((Δt)²)) + V_t Δt
+ ½ [ S² V_SS (σ² Δt + O((Δt)²)) + 2 V_St (r Δt + O((Δt)²)) Δt ]
+ O((Δt)²) } e^{−r Δt}
= { V + (r S V_S + V_t + ½ σ² S² V_SS) Δt + O((Δt)²) } (1 − r Δt + O((Δt)²))
= V + (−r V + r S V_S + V_t + ½ σ² S² V_SS) Δt + O((Δt)²).
Cancelling V and dividing by Δt yields
−r V + r S V_S + V_t + ½ σ² S² V_SS + O(Δt) = 0.
Ignoring the O(Δt) term, we get the BSE.
The binomial model approximates the BSE to 1st order accuracy.
12. n-Period Option Price Formula.
For a multiplicative n-period binomial process, the call value is
C = R^{−n} Σ_{j=0}^n (n choose j) p^j (1 − p)^{n−j} max(u^j d^{n−j} S − X, 0),  (14)
where (n choose j) = n!/(j!(n−j)!) is the binomial coefficient.
Define k to be the smallest non-negative integer such that u^k d^{n−k} S ≥ X.
The call pricing formula can be simplified as
C = S Φ(n, k, p′) − X R^{−n} Φ(n, k, p),  (15)
where p′ = u p / R and Φ is the complementary binomial distribution function defined by
Φ(n, k, p) = Σ_{j=k}^n (n choose j) p^j (1 − p)^{n−j}.
Asset (recombining tree):
            u²S
      uS
S           u d S
      dS
            d²S
Call:
            C_uu^{2Δt} = (u²S − X)⁺
      C_u^{Δt}
C           C_ud^{2Δt} = (u d S − X)⁺
      C_d^{Δt}
            C_dd^{2Δt} = (d²S − X)⁺
t        t + Δt        t + 2Δt
Call price at time t + Δt:
C_u^{Δt} = (1/R) ( p C_uu^{2Δt} + (1 − p) C_ud^{2Δt} ),  C_d^{Δt} = (1/R) ( p C_ud^{2Δt} + (1 − p) C_dd^{2Δt} ).
Call price at time t:
C = (1/R) ( p C_u^{Δt} + (1 − p) C_d^{Δt} ) = (1/R²) ( p² C_uu^{2Δt} + 2 p (1 − p) C_ud^{2Δt} + (1 − p)² C_dd^{2Δt} ).
Proof of (14) by induction:
Assume (14) holds for n ≤ k − 1; then for n = k, there are k − 1 periods between time t + Δt and t + kΔt. So
C_u^{Δt} = R^{−(k−1)} Σ_{j=0}^{k−1} (k−1 choose j) p^j (1 − p)^{k−1−j} (u^j d^{k−1−j} (uS) − X)⁺
= R^{−(k−1)} Σ_{j=1}^{k} (k−1 choose j−1) p^{j−1} (1 − p)^{k−j} (u^j d^{k−j} S − X)⁺.
⇒ p C_u^{Δt} = R^{−(k−1)} Σ_{j=1}^{k} (k−1 choose j−1) p^j (1 − p)^{k−j} (u^j d^{k−j} S − X)⁺,
and similarly
(1 − p) C_d^{Δt} = R^{−(k−1)} Σ_{j=0}^{k−1} (k−1 choose j) p^j (1 − p)^{k−j} (u^j d^{k−j} S − X)⁺.
Combining gives
C = (1/R) ( p C_u^{Δt} + (1 − p) C_d^{Δt} )
= R^{−k} [ p^k (u^k S − X)⁺ + (1 − p)^k (d^k S − X)⁺
+ Σ_{j=1}^{k−1} { (k−1 choose j−1) + (k−1 choose j) } p^j (1 − p)^{k−j} (u^j d^{k−j} S − X)⁺ ],
and since (k−1 choose j−1) + (k−1 choose j) = (k choose j), this proves (14).
Let k be the smallest non-negative integer such that
u^k d^{n−k} S ≥ X ⇔ (u/d)^k ≥ X/(S dⁿ) ⇔ k ≥ ln(X/(S dⁿ)) / ln(u/d).
Then
(u^j d^{n−j} S − X)⁺ = { 0 for j < k; u^j d^{n−j} S − X for j ≥ k }.
Hence
C = R^{−n} Σ_{j=k}^n (n choose j) p^j (1 − p)^{n−j} (u^j d^{n−j} S − X)
= S Σ_{j=k}^n (n choose j) [p u / R]^j [(1 − p) d / R]^{n−j} − X R^{−n} Σ_{j=k}^n (n choose j) p^j (1 − p)^{n−j}.
On setting p′ = p u / R, it holds that
1 − p′ = 1 − ((R − d)/(u − d)) (u/R) = (−R d + u d)/(R (u − d)) = ((u − R)/(u − d)) (d/R) = (1 − p) d / R.
This proves (15).
13. Black–Scholes Call Price Formula.
As the time interval Δt tends to zero, the n-period call price formula (15) tends to the Black–Scholes call option pricing formula
c = S Φ(d_1) − X e^{−r (T−t)} Φ(d_2),  (16)
where
d_1 = ( ln(S/X) + (r + σ²/2)(T − t) ) / (σ √(T − t))
and
d_2 = ( ln(S/X) + (r − σ²/2)(T − t) ) / (σ √(T − t)) = d_1 − σ √(T − t).
Proof.
The n-period call price (15) is given by
C = S Φ(n, k, p′) − X R^{−n} Φ(n, k, p) = S Φ(n, k, p′) − X e^{−r n Δt} Φ(n, k, p), with e^{−r n Δt} = e^{−r (T−t)}.
Hence we need to prove
Φ(n, k, p′) → Φ(d_1) and Φ(n, k, p) → Φ(d_2) as Δt → 0.
Let $j$ be the binomial random variable counting the number of upward moves in $n$ periods. Then
$$S_n = S\,u^{j} d^{n-j} = S \left(\frac{u}{d}\right)^{j} d^{n}\,,$$
where $S_n$ is the asset price at time $T = t + n\,\Delta t$. Hence
$$\ln\frac{S_n}{S} = j\,\ln\frac{u}{d} + n\,\ln d = j\,\ln u + (n-j)\,\ln d\,,$$
$$E(j) = n\,p\,, \qquad \mathrm{Var}(j) = n\,p\,(1-p)\,,$$
$$E\!\left(\ln\frac{S_n}{S}\right) = n\,p\,\ln\frac{u}{d} + n\,\ln d\,, \qquad \mathrm{Var}\!\left(\ln\frac{S_n}{S}\right) = n\,p\,(1-p)\left(\ln\frac{u}{d}\right)^{2}.$$
Moreover,
$$1 - \Phi(n,k,p) = P(j \le k-1) = P\!\left(\ln\frac{S_n}{S} \le (k-1)\,\ln\frac{u}{d} + n\,\ln d\right).$$
As $k$ is the smallest integer such that $k \ge \ln\frac{X}{S\,d^{n}} \big/ \ln\frac{u}{d}$, there exists an $\alpha \in (0,1]$ such that
$$k - 1 = \ln\frac{X}{S\,d^{n}} \Big/ \ln\frac{u}{d} - \alpha\,,$$
and so
$$1 - \Phi(n,k,p) = P\!\left(\ln\frac{S_n}{S} \le \ln\frac{X}{S} - \alpha\,\ln\frac{u}{d}\right).$$
Define
$$X^{n}_{i} := \begin{cases} \ln u & \text{with prob. } p \\ \ln d & \text{with prob. } 1-p \end{cases} \quad i = 1,\dots,n\,, \qquad\text{and}\qquad Z_n := X^{n}_{1} + \dots + X^{n}_{n}\,.$$
Then, for each $n$, $\{X^{n}_{1},\dots,X^{n}_{n}\}$ are independent and $Z_n \sim \ln\frac{S_n}{S}$. Hence
$$1 - \Phi(n,k,p) = P(Z_n \le \zeta_n)\,, \qquad \zeta_n := \ln\frac{X}{S} - \alpha\,\ln\frac{u}{d}\,.$$
Assuming e.g. the CRR model, we get
$$E(Z_n) = n\,p\,\ln u + n\,(1-p)\,\ln d \to \mu\,(T-t)\,, \qquad\text{where } \mu := r - \frac{\sigma^2}{2}\,,$$
$$\mathrm{Var}(Z_n) = n\,p\,(1-p)\,\ln^2\frac{u}{d} \to \sigma^2\,(T-t)\,,$$
$$\zeta_n = \ln\frac{X}{S} - \alpha\,\ln\frac{u}{d} \to \ln\frac{X}{S}\,,$$
and $|X^{n}_{i}| \le \sigma\sqrt{\Delta t} = \sigma\sqrt{\tfrac{T-t}{n}} =: y_n \to 0$ as $n \to \infty$.
Hence Theorem 5.10 implies that
$$Z_n \overset{D}{\longrightarrow} Z \sim N\big(\mu\,(T-t),\ \sigma^2\,(T-t)\big)\,.$$
So, as $n \to \infty$,
$$1 - \Phi(n,k,p) = P(Z_n \le \zeta_n) \to \frac{1}{\sqrt{2\pi\,\sigma^2\,(T-t)}} \int_{-\infty}^{\ln\frac{X}{S}} e^{-\frac{(s-\mu(T-t))^2}{2\sigma^2(T-t)}}\,ds\,.$$
On letting
$$y := \frac{s - \mu\,(T-t)}{\sigma\sqrt{T-t}}\,,$$
we have
$$1 - \Phi(n,k,p) \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\frac{\ln\frac{X}{S} - \mu(T-t)}{\sigma\sqrt{T-t}}} e^{-\frac{y^2}{2}}\,dy = N\!\left(\frac{\ln\frac{X}{S} - \mu\,(T-t)}{\sigma\sqrt{T-t}}\right).$$
Hence
$$\Phi(n,k,p) \to 1 - N\!\left(\frac{\ln\frac{X}{S} - \mu\,(T-t)}{\sigma\sqrt{T-t}}\right) = N\!\left(-\,\frac{\ln\frac{X}{S} - \mu\,(T-t)}{\sigma\sqrt{T-t}}\right) = N\!\left(\frac{\ln\frac{S}{X} + \left(r - \frac{\sigma^2}{2}\right)(T-t)}{\sigma\sqrt{T-t}}\right) = N(d_2)\,.$$
Similarly one can show $\Phi(n,k,\hat p) \to N(d_1)$.
14. Implementation.
Choose the number of time steps $n$ required to reach the expiration time $T$, so that $\Delta t = \frac{T-t}{n}$.
At time $t_i = t + i\,\Delta t$ there are $i+1$ nodes.
Denote these nodes by $(i,j)$ with $i = 0,1,\dots,n$ and $j = 0,1,\dots,i$, where the first component $i$ refers to time $t_i$ and the second component $j$ refers to the asset price $S\,u^{j} d^{i-j}$.
Compute the option prices $V^{n}_{j} = V(S\,u^{j} d^{n-j}, T)$, $j = 0,1,\dots,n$, at expiration time $t_n = T$ over the $n+1$ nodes.
Then go backwards one step and compute the option prices at time $t_{n-1} = T - \Delta t$ using the one-period binomial option pricing formula over $n$ nodes.
Continue until reaching the current time $t_0 = t$.
The recursive formula is
$$V^{i}_{j} = \frac{1}{R}\left(p\,V^{i+1}_{j+1} + (1-p)\,V^{i+1}_{j}\right), \qquad j = 0,\dots,i \quad\text{and}\quad i = n-1,\dots,0\,,$$
where
$$p = \frac{R-d}{u-d}\,.$$
The value $V^{0}_{0} = V(S,t)$ is the option price at the current time $t$.
C++ Exercise: Write a program to implement the Cox–Ross–Rubinstein model to price options on a non-dividend-paying underlying asset. The inputs are the asset price $S$, the strike price $X$, the riskless interest rate $r$, the current time $t$ and the maturity time $T$, the volatility $\sigma$, and the number of steps $n$. The outputs are the European call and put option prices.
15. Greeks.
Assume the current time is $t = 0$. Some common sensitivity measures of a European call option price $c$ (the rates of change of the call price with respect to the underlying factors) are delta, gamma, vega, rho and theta, defined by
$$\Delta = \frac{\partial c}{\partial S}\,, \qquad \Gamma = \frac{\partial^2 c}{\partial S^2}\,, \qquad v = \frac{\partial c}{\partial \sigma}\,, \qquad \rho = \frac{\partial c}{\partial r}\,, \qquad \Theta = -\,\frac{\partial c}{\partial T}\,.$$
Exercise.
Derive the following relations from the call price (16):
$$\Delta = N(d_1)\,, \qquad \Gamma = \frac{N'(d_1)}{S\,\sigma\sqrt{T}}\,, \qquad v = S\,\sqrt{T}\,N'(d_1)\,, \qquad \rho = X\,T\,e^{-rT}\,N(d_2)\,, \qquad \Theta = -\,\frac{S\,\sigma\,N'(d_1)}{2\sqrt{T}} - r\,X\,e^{-rT}\,N(d_2)\,.$$
16. Dynamic Hedging.
Greeks can be used to hedge an option position against changes of underlying factors.
For example, to hedge a short position in a European call against changes of the asset
price, one should hold a long position of $\Delta$ units of the underlying asset.
This strategy requires continuous trading and is prohibitively expensive if there are
any transaction costs.
17. Dividends.
(a) If the asset pays a continuous dividend yield at a rate $q$, then use $r - q$ instead of $r$ in building the asset price tree, but still use $r$ in discounting the option values in the backward calculation.
In the risk-neutral model, the asset has a rate of return of $r - q$. So
(8) becomes $p\,u + (1-p)\,d = e^{(r-q)\Delta t}$, and
(9) becomes $p\,u^2 + (1-p)\,d^2 = e^{(2r - 2q + \sigma^2)\Delta t}$.
However, the discount factor $\frac{1}{R}$ remains the same: $\frac{1}{R} = e^{-r\Delta t}$.
(b) If the asset pays a discrete proportional dividend $\delta S$ between $t_{i-1}$ and $t_i$, then the asset price at node $(i,j)$ drops to $(1-\delta)\,u^{j} d^{i-j} S$ instead of the usual $u^{j} d^{i-j} S$. The tree is still recombined.
[Tree diagram: after the dividend date every node carries the factor $1-\delta$, e.g. $(1-\delta)u^2S$, $(1-\delta)udS$, $(1-\delta)d^2S$, and later $(1-\delta)u^4S, \dots, (1-\delta)d^4S$; the tree remains recombined.]
(c) If the asset pays a discrete cash dividend $D$ between $t_{i-1}$ and $t_i$, then the asset price at node $(i,j)$ drops to $u^{j} d^{i-j} S - D$ instead of the usual $u^{j} d^{i-j} S$. There are $i+1$ new recombined trees emanating from the nodes at time $t_i$, but the original tree is no longer recombined.
Suppose a cash dividend $D$ is paid between $t_{i-1}$ and $t_i$. Then at time $t_{i+m}$ there are $(i+1)(m+1)$ nodes, instead of $i+m+1$ in a recombined tree.
If there are several cash dividends, then the number of nodes is much larger than in a normal tree.
Example. Assume there is a cash dividend $D$ between periods 1 and 2. Then the possible asset prices at time $n = 2$ are $\{u^2S - D,\ udS - D,\ d^2S - D\}$. At $n = 3$ there are 6 nodes: $u\,\{u^2S - D,\ udS - D,\ d^2S - D\}$ and $d\,\{u^2S - D,\ udS - D,\ d^2S - D\}$, instead of 4. [Reason: $d\,(u^2S - D) \ne u\,(udS - D)$.]
[Tree diagram: from $S$, the nodes at time 2 are $u^2S - D$, $udS - D$ and $d^2S - D$; a separate subtree emanates from each of them, reaching e.g. $u^2(u^2S - D)$ and $d^2(d^2S - D)$.]
18. Trinomial Model.
Suppose the asset price is $S$ at time $t$. At time $t+\Delta t$ the asset price can be $uS$ (up-state) with probability $p_u$, $mS$ (middle-state) with probability $p_m$, or $dS$ (down-state) with probability $p_d$. Then the risk-neutral one-period option pricing formula is
$$V(S,t) = \frac{1}{R}\big(p_u\,V(uS, t+\Delta t) + p_m\,V(mS, t+\Delta t) + p_d\,V(dS, t+\Delta t)\big)\,.$$
The trinomial tree approach is equivalent to the explicit finite difference method. By equating the first and second moments of the asset price $S(t+\Delta t)$ in both the continuous and discrete models, we obtain the following equations:
$$p_u + p_m + p_d = 1\,,$$
$$p_u\,u + p_m\,m + p_d\,d = e^{r\Delta t}\,,$$
$$p_u\,u^2 + p_m\,m^2 + p_d\,d^2 = e^{(2r + \sigma^2)\Delta t}\,.$$
There are three equations and six variables $u, m, d$ and $p_u, p_m, p_d$. We need three more equations to uniquely determine these variables.
19. Boyle Model.
The three additional conditions are
$$m = 1\,, \qquad u\,d = 1\,, \qquad u = e^{\lambda\sigma\sqrt{\Delta t}}\,,$$
where $\lambda$ is a free parameter.
Exercise.
Show that the risk-neutral probabilities are given by
$$p_u = \frac{(W - R)\,u - (R - 1)}{(u-1)\,(u^2-1)}\,, \qquad p_d = \frac{(W - R)\,u^2 - (R - 1)\,u^3}{(u-1)\,(u^2-1)}\,, \qquad p_m = 1 - p_u - p_d\,,$$
where $R = e^{r\Delta t}$ and $W = e^{(2r+\sigma^2)\Delta t}$.
Note that if $\lambda = 1$, which corresponds to the choice of $u$ as in the CRR model, certain sets of parameters can lead to $p_m < 0$. To rectify this, choose $\lambda > 1$.
Boyle claimed that if $p_u \approx p_m \approx p_d \approx \frac{1}{3}$, then the trinomial scheme with 5 steps is comparable to the CRR binomial scheme with 20 steps.
20. Hull–White Model.
This is the same as the Boyle model with $\lambda = \sqrt{3}$.
Exercise.
Show that the risk-neutral probabilities are
$$p_d = \frac{1}{6} - \sqrt{\frac{\Delta t}{12\,\sigma^2}}\left(r - \frac{1}{2}\sigma^2\right), \qquad p_m = \frac{2}{3}\,, \qquad p_u = \frac{1}{6} + \sqrt{\frac{\Delta t}{12\,\sigma^2}}\left(r - \frac{1}{2}\sigma^2\right),$$
if terms of order $O(\Delta t)$ are ignored.
21. Kamrad–Ritchken Model.
If $S$ follows a lognormal process, then we can write $\ln S(t+\Delta t) = \ln S(t) + Z$, where $Z$ is a normal random variable with mean $\left(r - \frac{1}{2}\sigma^2\right)\Delta t$ and variance $\sigma^2\,\Delta t$.
KR suggested approximating $Z$ by a discrete random variable $Z_a$ as follows: $Z_a = \Delta x$ with probability $p_u$, $0$ with probability $p_m$, and $-\Delta x$ with probability $p_d$, where $\Delta x = \lambda\,\sigma\sqrt{\Delta t}$ and $\lambda \ge 1$.
The corresponding $u, m, d$ in the trinomial tree are $u = e^{\Delta x}$, $m = 1$, $d = e^{-\Delta x}$.
By omitting the higher-order term $O((\Delta t)^2)$, they showed that the risk-neutral probabilities are
$$p_u = \frac{1}{2\lambda^2} + \frac{1}{2\lambda\sigma}\left(r - \frac{1}{2}\sigma^2\right)\sqrt{\Delta t}\,, \qquad p_m = 1 - \frac{1}{\lambda^2}\,, \qquad p_d = \frac{1}{2\lambda^2} - \frac{1}{2\lambda\sigma}\left(r - \frac{1}{2}\sigma^2\right)\sqrt{\Delta t}\,.$$
Note that if $\lambda = 1$ then $p_m = 0$ and the trinomial scheme is reduced to the binomial scheme.
KR claimed that if $p_u = p_m = p_d = \frac{1}{3}$, then the trinomial scheme with 15 steps is comparable to the binomial scheme ($p_m = 0$) with 55 steps. They also discussed the trinomial scheme with two correlated state variables.
$$Z_a = \begin{cases} \Delta x & \text{with prob. } p_u \\ 0 & \text{with prob. } p_m \\ -\Delta x & \text{with prob. } p_d \end{cases}$$
and the equations become
$$p_u + p_m + p_d = 1\,, \qquad (1)$$
$$E(Z_a) = \Delta x\,(p_u - p_d) = \left(r - \tfrac{1}{2}\sigma^2\right)\Delta t\,, \qquad (2)$$
$$\mathrm{Var}(Z_a) = (\Delta x)^2\,(p_u + p_d) - \underbrace{(\Delta x)^2\,(p_u - p_d)^2}_{O((\Delta t)^2)} = \sigma^2\,\Delta t\,.$$
Dropping the $O((\Delta t)^2)$ term leads to
$$(\Delta x)^2\,(p_u + p_d) = \sigma^2\,\Delta t\,. \qquad (3)$$
Hence
$$p_u = \frac{1}{2}\left[\frac{\sigma^2\,\Delta t}{(\Delta x)^2} + \frac{\Delta t}{\Delta x}\left(r - \frac{1}{2}\sigma^2\right)\right], \qquad p_d = \frac{1}{2}\left[\frac{\sigma^2\,\Delta t}{(\Delta x)^2} - \frac{\Delta t}{\Delta x}\left(r - \frac{1}{2}\sigma^2\right)\right],$$
and, with $\Delta x = \lambda\,\sigma\sqrt{\Delta t}$,
$$p_u = \frac{1}{2}\left[\frac{1}{\lambda^2} + \frac{1}{\lambda\sigma}\left(r - \frac{1}{2}\sigma^2\right)\sqrt{\Delta t}\right], \qquad p_d = \frac{1}{2}\left[\frac{1}{\lambda^2} - \frac{1}{\lambda\sigma}\left(r - \frac{1}{2}\sigma^2\right)\sqrt{\Delta t}\right].$$
8. Finite Difference Methods
1. Diffusion Equations of One State Variable.
$$\frac{\partial u}{\partial t} = c^2\,\frac{\partial^2 u}{\partial x^2}\,, \qquad (x,t) \in D\,, \qquad (17)$$
where $t$ is a time variable, $x$ is a state variable, and $u(x,t)$ is an unknown function satisfying the equation.
To find a well-defined solution, we need to impose the initial condition
$$u(x,0) = u_0(x) \qquad (18)$$
and, if $D = [a,b] \times [0,\infty)$, the boundary conditions
$$u(a,t) = g_a(t) \qquad\text{and}\qquad u(b,t) = g_b(t)\,, \qquad (19)$$
where $u_0$, $g_a$, $g_b$ are continuous functions.
If $D = (-\infty,\infty) \times (0,\infty)$, we need to impose the boundary condition
$$\lim_{|x|\to\infty} u(x,t)\,e^{-a\,x^2} = 0 \qquad\text{for any } a > 0\,. \qquad (20)$$
(20) implies that $u(x,t)$ does not grow too fast as $|x| \to \infty$.
The diffusion equation (17) with the initial condition (18) and the boundary conditions (19) is well-posed, i.e. there exists a unique solution that depends continuously on $u_0$, $g_a$ and $g_b$.
2. Grid Points.
To find a numerical solution to equation (17) with finite difference methods, we first need to define a set of grid points in the domain $D$ as follows:
Choose a state step size $\Delta x = \frac{b-a}{N}$ ($N$ an integer) and a time step size $\Delta t$, draw a set of horizontal and vertical lines across $D$, and take all intersection points $(x_j, t_n)$, or simply $(j,n)$, where $x_j = a + j\,\Delta x$, $j = 0,\dots,N$, and $t_n = n\,\Delta t$, $n = 0,1,\dots$.
If $D = [a,b] \times [0,T]$, then choose $\Delta t = \frac{T}{M}$ ($M$ an integer) and $t_n = n\,\Delta t$, $n = 0,\dots,M$.
[Grid diagram: state nodes $x_0 = a, x_1, x_2, \dots, x_N = b$ along the horizontal axis and time levels $t_0 = 0, t_1, t_2, \dots, t_M = T$ along the vertical axis.]
3. Finite Differences.
The partial derivatives
$$u_x := \frac{\partial u}{\partial x} \qquad\text{and}\qquad u_{xx} := \frac{\partial^2 u}{\partial x^2}$$
are always approximated by central difference quotients, i.e.
$$u_x \approx \frac{u^n_{j+1} - u^n_{j-1}}{2\,\Delta x} \qquad\text{and}\qquad u_{xx} \approx \frac{u^n_{j+1} - 2\,u^n_j + u^n_{j-1}}{(\Delta x)^2} \qquad (21)$$
at a grid point $(j,n)$. Here $u^n_j = u(x_j, t_n)$.
Depending on how $u_t$ is approximated, we have three basic schemes: the explicit, implicit, and Crank–Nicolson schemes.
4. Explicit Scheme.
If $u_t$ is approximated by the forward difference quotient
$$u_t \approx \frac{u^{n+1}_j - u^n_j}{\Delta t}$$
at $(j,n)$, then the corresponding difference equation is
$$\frac{u^{n+1}_j - u^n_j}{\Delta t} = c^2\,\frac{u^n_{j+1} - 2\,u^n_j + u^n_{j-1}}{(\Delta x)^2}\,.$$