
  • Numerical Methods for Finance
    Dr Robert Nurnberg

    This course introduces the major numerical methods needed for quantitative work in

    finance. To this end, the course will strike a balance between a general survey of

    significant numerical methods anyone working in a quantitative field should know, and a

    detailed study of some numerical methods specific to financial mathematics. In the first

    part the course will cover e.g.

    linear and nonlinear equations, interpolation and optimization,

    while the second part introduces e.g.

    binomial and trinomial methods, finite difference methods, Monte-Carlo simulation,

    random number generators, option pricing and hedging.

    The assessment consists of an exam (80%) and a project (20%).

    1

  • 1. References

    1. Burden and Faires (2004), Numerical Analysis.

    2. Clewlow and Strickland (1998), Implementing Derivative Models.

    3. Fletcher (2000), Practical Methods of Optimization.

    4. Glasserman (2004), Monte Carlo Methods in Financial Engineering.

    5. Higham (2004), An Introduction to Financial Option Valuation.

    6. Hull (2005), Options, Futures, and Other Derivatives.

    7. Kwok (1998), Mathematical Models of Financial Derivatives.

    8. Press et al. (1992), Numerical Recipes in C. (online)

    9. Press et al. (2002), Numerical Recipes in C++.

    10. Seydel (2006), Tools for Computational Finance.

    11. Wilmott, Dewynne, and Howison (1993), Option Pricing.

    2

  • 12. Winston (1994), Operations Research: Applications and Algorithms.

    3

  • 2. Preliminaries

    1. Algorithms.

    An algorithm is a set of instructions to construct an approximate solution to a math-

    ematical problem.

    A basic requirement for an algorithm is that the error can be made as small as we like.

    Usually, the higher the accuracy we demand, the greater is the amount of computation

    required.

    An algorithm is convergent if it produces a sequence of values which converge to the

    desired solution of the problem.

    Example: Given c > 1 and ε > 0, use the bisection method to seek an approximation
    to √c with error not greater than ε.

    4

  • Example

    Find x̄ = √c, c > 1 constant.

    Answer

    x = √c  ⟺  x² = c  ⟺  f(x) := x² − c = 0

    f(1) = 1 − c < 0 and f(c) = c² − c > 0  ⟹  ∃ x̄ ∈ (1, c) s.t. f(x̄) = 0

    f′(x) = 2x > 0  ⟹  f monotonically increasing  ⟹  x̄ is unique.

    Denote In := [an, bn] with I0 = [a0, b0] = [1, c]. Let xn := (an + bn)/2.

    (i) If f(xn) = 0 then x̄ = xn.

    (ii) If f(an) f(xn) > 0 then x̄ ∈ (xn, bn); let an+1 := xn, bn+1 := bn.

    (iii) If f(an) f(xn) < 0 then x̄ ∈ (an, xn); let an+1 := an, bn+1 := xn.

    5

  • Length of In:  m(In) = (1/2) m(In−1) = · · · = (1/2^n) m(I0) = (c − 1)/2^n

    Algorithm

    The algorithm stops if m(In) < ε, and we let x⋆ := xn.

    Error as small as we like?

    x̄, x⋆ ∈ In  ⟹  error |x⋆ − x̄| = |xn − x̄| ≤ m(In) → 0 as n → ∞. ✓

    Convergence?

    I0 ⊃ I1 ⊃ · · · ⊃ In ⊃ · · ·  ⟹  ∃! x̄ ∈ ∩_{n≥0} In with f(x̄) = 0, i.e. x̄ = √c. ✓

    Implementation:

    No need to define In = [an, bn]. It is sufficient to store only 3 points throughout.

    Suppose x̄ ∈ (a, b) and define x := (a + b)/2. If x̄ ∈ (a, x) let a := a, b := x; otherwise let a := x, b := b.

    6
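    A minimal C++ sketch of this bisection loop for √c (illustrative only; the function name and the use of a tolerance eps for ε are my own choices):

        #include <cstdio>

        // Bisection for the root of f(x) = x*x - c on [1, c], c > 1,
        // storing only the three points a, b, x as described above.
        double bisect_sqrt(double c, double eps) {
            double a = 1.0, b = c;               // f(a) < 0 < f(b)
            while (b - a >= eps) {
                double x = 0.5 * (a + b);
                double fx = x * x - c;
                if (fx == 0.0) return x;         // hit the root exactly
                if (fx < 0.0) a = x;             // root lies in (x, b)
                else          b = x;             // root lies in (a, x)
            }
            return 0.5 * (a + b);
        }

        int main() {
            std::printf("%.7f\n", bisect_sqrt(7.0, 1e-7));   // approx 2.6457513
            return 0;
        }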

  • 2. Errors.

    There are various errors in computed solutions, such as

    discretization error (discrete approximation to continuous systems), truncation error (termination of an infinite process), and rounding error (finite digit limitation in computer arithmetic).

    If a is a number and ã is an approximation to a, then

    the absolute error is |a − ã| and the relative error is |a − ã| / |a|, provided a ≠ 0.

    Example: Discuss sources of errors in deriving the numerical solution of the nonlinear differential equation x′ = f(x) on the interval [a, b] with initial condition x(a) = x0.

  • Example

    discretization error

    x′ = f(x)  [differential equation]

    (x(t + h) − x(t)) / h = f(x(t))  [difference equation]

    DE = | (x(t + h) − x(t)) / h − x′(t) |

    truncation error

    lim_{n→∞} xn = x; approximate x with xN, N a large number.

    TE = |x − xN|

    rounding error

    We cannot express x exactly, due to the finite digit limitation. We get x̃ instead.

    RE = |x − x̃|

    Total error = DE + TE + RE.

    8

  • 3.Well/Ill Conditioned Problems.

    A problem is well-conditioned (or ill-conditioned) if every small perturbation of the

    data results in a small (or large) change in the solution.

    Example: Show that the solution to equations x + y = 2 and x + 1.01 y = 2.01 is

    ill-conditioned.

    Exercise: Show that the following problems are ill-conditioned:

    (a) the solution to the differential equation x″ − 10 x′ − 11 x = 0 with initial conditions x(0) = 1 and x′(0) = −1,
    (b) the value of q(x) = x² + x − 1150 if x is near a root.

    9

  • Example

    x + y = 2,  x + 1.01 y = 2.01   ⟹   x = 1,  y = 1

    Change 2.01 to 2.02:  x + y = 2,  x + 1.01 y = 2.02   ⟹   x = 0,  y = 2

    I.e. a 0.5% change in the data produces a 100% change in the solution: ill-conditioned!

    [reason: det (1 1; 1 1.01) = 0.01, i.e. the matrix is nearly singular]

    10

  • 4. Taylor Polynomials.

    Suppose f, f′, . . . , f^(n) are continuous on [a, b] and f^(n+1) exists on (a, b). Let x0 ∈ [a, b]. Then for every x ∈ [a, b] there exists ξ between x0 and x with

    f(x) = Σ_{k=0}^{n} f^(k)(x0)/k! (x − x0)^k + Rn(x),

    where Rn(x) = f^(n+1)(ξ)/(n + 1)! (x − x0)^{n+1} is the remainder.

    [Equivalently: Rn(x) = ∫_{x0}^{x} f^(n+1)(t)/n! (x − t)^n dt.]

    Examples:

    exp(x) = Σ_{k=0}^{∞} x^k/k!,   sin(x) = Σ_{k=0}^{∞} (−1)^k/(2k + 1)! x^{2k+1}

    11

  • 5. Gradient and Hessian Matrix.

    Assume f : R^n → R.

    The gradient of f at a point x, written as ∇f(x), is a column vector in R^n with ith component ∂f/∂xi (x).

    The Hessian matrix of f at x, written as ∇²f(x), is an n × n matrix with (i, j)th component ∂²f/∂xi∂xj (x). [As ∂²f/∂xi∂xj = ∂²f/∂xj∂xi, ∇²f(x) is symmetric.]

    Examples:

    f(x) = aᵀx, a ∈ R^n  ⟹  ∇f = a, ∇²f = 0

    f(x) = ½ xᵀAx, A symmetric  ⟹  ∇f(x) = Ax, ∇²f = A

    f(x) = exp(½ xᵀAx), A symmetric  ⟹  ∇f(x) = exp(½ xᵀAx) Ax,  ∇²f(x) = exp(½ xᵀAx) AxxᵀA + exp(½ xᵀAx) A

    12

  • 6. Taylor's Theorem.

    Suppose that f : R^n → R is continuously differentiable and that p ∈ R^n. Then we have

    f(x + p) = f(x) + ∇f(x + t p)ᵀ p,   for some t ∈ (0, 1).

    Moreover, if f is twice continuously differentiable, we have

    ∇f(x + p) = ∇f(x) + ∫₀¹ ∇²f(x + t p) p dt,   and

    f(x + p) = f(x) + ∇f(x)ᵀ p + ½ pᵀ ∇²f(x + t p) p,   for some t ∈ (0, 1).

    13

  • 7. Positive Definite Matrices.

    An n × n matrix A = (aij) is positive definite if it is symmetric (i.e. Aᵀ = A) and xᵀAx > 0 for all x ∈ R^n \ {0}. [I.e. xᵀAx ≥ 0 with equality only if x = 0.]

    The following statements are equivalent:

    (a) A is a positive definite matrix,

    (b) all eigenvalues of A are positive,

    (c) all leading principal minors of A are positive.

    The leading principal minors of A are the determinants Δk, k = 1, 2, . . . , n, defined by

    Δ1 = det[a11],   Δ2 = det [a11 a12; a21 a22],   . . . ,   Δn = det A.

    A matrix A is symmetric and positive semi-definite if xᵀAx ≥ 0 for all x ∈ R^n.

    Exercise.

    Show that any positive definite matrix A has only positive diagonal entries.

    14

  • 8. Convex Sets and Functions.

    A set S ⊂ R^n is a convex set if the straight line segment connecting any two points in S lies entirely inside S, i.e., for any two points x, y ∈ S we have

    λ x + (1 − λ) y ∈ S   ∀ λ ∈ [0, 1].

    A function f : D → R is a convex function if its domain D ⊂ R^n is a convex set and if for any two points x, y ∈ D we have

    f(λ x + (1 − λ) y) ≤ λ f(x) + (1 − λ) f(y)   ∀ λ ∈ [0, 1].

    Exercise.

    Let D ⊂ R^n be a convex, open set.

    (a) If f : D → R is continuously differentiable, then f is convex if and only if

    f(y) ≥ f(x) + ∇f(x)ᵀ (y − x)   ∀ x, y ∈ D.

    (b) If f is twice continuously differentiable, then f is convex if and only if ∇²f(x) is positive semi-definite for all x in the domain.

    15

  • Exercise 2.8.

    (a) As f is convex we have for any x, y in the convex set D that

    f(λ y + (1 − λ) x) ≤ λ f(y) + (1 − λ) f(x)   ∀ λ ∈ [0, 1].

    Hence

    f(y) ≥ [f(x + λ (y − x)) − f(x)] / λ + f(x).

    Letting λ → 0 yields f(y) ≥ f(x) + ∇f(x)ᵀ (y − x).

    For any x1, x2 ∈ D and λ ∈ [0, 1] let x := λ x1 + (1 − λ) x2 ∈ D and y := x1. On noting that y − x = x1 − λ x1 − (1 − λ) x2 = (1 − λ) (x1 − x2) we have that

    f(x1) = f(y) ≥ f(x) + ∇f(x)ᵀ (y − x) = f(x) + (1 − λ) ∇f(x)ᵀ (x1 − x2).   (∗)

    Similarly, letting x := λ x1 + (1 − λ) x2 and y := x2 gives, on noting that y − x = λ (x2 − x1), that

    f(x2) ≥ f(x) + λ ∇f(x)ᵀ (x2 − x1).   (∗∗)

  • Combining λ (∗) + (1 − λ) (∗∗) gives λ f(x1) + (1 − λ) f(x2) ≥ f(x) = f(λ x1 + (1 − λ) x2)  ⟹  f is convex.

    (b) For any x, x0 ∈ D use Taylor's theorem at x0:

    f(x) = f(x0) + ∇f(x0)ᵀ (x − x0) + ½ (x − x0)ᵀ ∇²f(x0 + λ (x − x0)) (x − x0),   λ ∈ (0, 1).

    As ∇²f is positive semi-definite, this immediately gives f(x) ≥ f(x0) + ∇f(x0)ᵀ (x − x0)  ⟹  f is convex.

    Assume ∇²f is not positive semi-definite in the domain D. Then there exist x0 ∈ D and x ∈ R^n s.t. xᵀ ∇²f(x0) x < 0. As D is open we can find x1 := x0 + λ x ∈ D, for λ > 0 sufficiently small. Hence (x1 − x0)ᵀ ∇²f(x0) (x1 − x0) < 0. For x1 sufficiently close to x0 the continuity of ∇²f gives (x1 − x0)ᵀ ∇²f(x0 + λ (x1 − x0)) (x1 − x0) < 0 for all λ ∈ (0, 1). Taylor's theorem then yields f(x1) < f(x0) + ∇f(x0)ᵀ (x1 − x0). This contradicts f being convex, see (a).

    17

  • 9. Vector Norms.

    A vector norm on R^n is a function ‖·‖ from R^n into R with the following properties:

    (i) ‖x‖ ≥ 0 for all x ∈ R^n, and ‖x‖ = 0 if and only if x = 0.

    (ii) ‖λ x‖ = |λ| ‖x‖ for all λ ∈ R and x ∈ R^n.

    (iii) ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ R^n.

    Common vector norms are the l1, l2 (Euclidean), and l∞ norms:

    ‖x‖1 = Σ_{i=1}^{n} |xi|,   ‖x‖2 = { Σ_{i=1}^{n} xi² }^{1/2},   ‖x‖∞ = max_{1≤i≤n} |xi|.

    Exercise.

    (a) Prove that ‖·‖1, ‖·‖2 and ‖·‖∞ are norms.

    (b) Given a symmetric positive definite matrix A, prove that

    ‖x‖A := √(xᵀ A x)

    is a norm.

    18

  • Example.

    Draw the regions defined by ‖x‖1 ≤ 1, ‖x‖2 ≤ 1, ‖x‖∞ ≤ 1 when n = 2.

    [Figure: the unit balls of the l1, l2 and l∞ norms.]

    Exercise. Prove that for all x, y ∈ R^n we have

    (a)  Σ_{i=1}^{n} |xi yi| ≤ ‖x‖2 ‖y‖2   [Cauchy–Schwarz inequality]

    and

    (b)  (1/√n) ‖x‖2 ≤ ‖x‖∞ ≤ ‖x‖1 ≤ √n ‖x‖2.

    19

  • 10. Spectral Radius.

    The spectral radius of a matrix A ∈ R^{n×n} is defined by ρ(A) = max_{1≤i≤n} |λi|, where λ1, . . . , λn are all the eigenvalues of A.

    11. Matrix Norms.

    For an n × n matrix A, the natural matrix norm ‖A‖ for a given vector norm ‖·‖ is defined by

    ‖A‖ = max_{‖x‖=1} ‖Ax‖.

    The common matrix norms are

    ‖A‖1 = max_{1≤j≤n} Σ_{i=1}^{n} |aij|,   ‖A‖2 = √(ρ(AᵀA))  (= ρ(A) if A = Aᵀ),   ‖A‖∞ = max_{1≤i≤n} Σ_{j=1}^{n} |aij|.

    Exercise: Compute ‖A‖1, ‖A‖∞, and ‖A‖2 for

    A = [ 1 1 0 ; 1 2 1 ; −1 1 2 ].

    (Answer: ‖A‖1 = ‖A‖∞ = 4 and ‖A‖2 = √(7 + √7) ≈ 3.1.)

    20

  • 12. Convergence.

    A sequence of vectors {x^(k)} ⊂ R^n is said to converge to a vector x ∈ R^n if ‖x^(k) − x‖ → 0 as k → ∞ for an arbitrary vector norm ‖·‖. This is equivalent to componentwise convergence, i.e., x_i^(k) → x_i as k → ∞, i = 1, . . . , n.

    A square matrix A ∈ R^{n×n} is said to be convergent if A^k → 0 as k → ∞, which is equivalent to (A^k)ij → 0 as k → ∞ for all i, j.

    The following statements are equivalent:

    (i) A is a convergent matrix,

    (ii) ρ(A) < 1,

    (iii) lim_{k→∞} A^k x = 0 for every x ∈ R^n.

    Exercise. Show that A is convergent, where

    A = [ 1/2 0 ; 1/4 1/2 ].

    21

  • 3. Algebraic Equations

    1. Decomposition Methods for Linear Equations.

    A matrix A ∈ R^{n×n} is said to have an LU decomposition if A = LU, where L ∈ R^{n×n} is a lower triangular matrix (lij = 0 if 1 ≤ i < j ≤ n) and U ∈ R^{n×n} is an upper triangular matrix (uij = 0 if 1 ≤ j < i ≤ n).

    The decomposition is unique if one assumes e.g. lii = 1 for 1 ≤ i ≤ n.

    Here L is lower triangular with entries l11; l21, l22; l31, l32, l33; . . . ; ln1, ln2, ln3, . . . , lnn, and U is upper triangular with entries u11, u12, u13, . . . , u1n; u22, u23, . . . , u2n; . . . ; u(n−1,n−1), u(n−1,n); unn.

    22

  • In general, the diagonal elements of either L or U are given and the remaining elements of the matrices are determined by directly comparing the two sides of the equation.

    The linear system Ax = b is then equivalent to Ly = b and Ux = y.

    Exercise.

    Show that the solution to Ly = b is

    y1 = b1/l11,   yi = (bi − Σ_{k=1}^{i−1} lik yk)/lii,   i = 2, . . . , n

    (forward substitution) and the solution to Ux = y is

    xn = yn/unn,   xi = (yi − Σ_{k=i+1}^{n} uik xk)/uii,   i = n − 1, . . . , 1

    (backward substitution).

    23

  • 2. Crout Algorithm.

    Exercise.

    Let A be tridiagonal, i.e. aij = 0 if |i − j| > 1 (aij = 0 except perhaps a(i−1,i), aii and a(i,i+1)), and strictly diagonally dominant (|aii| > Σ_{j≠i} |aij| for i = 1, . . . , n).

    Show that A can be factorized as A = LU where lii = 1 for i = 1, . . . , n, u11 = a11, and

    u(i,i+1) = a(i,i+1),   l(i+1,i) = a(i+1,i)/uii,   u(i+1,i+1) = a(i+1,i+1) − l(i+1,i) u(i,i+1)

    for i = 1, . . . , n − 1. [Note: L and U are tridiagonal.]

    C++ Exercise: Write a program to solve a tridiagonal and strictly diagonally dominant linear equation Ax = b by the Crout algorithm. The inputs are the number of variables n, the matrix A, and the vector b. The output is the solution x.

    24
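    A hedged C++ sketch of this Crout factorisation and solve for a tridiagonal system (the three-diagonal storage and the function name are my own choices, not prescribed by the exercise):

        #include <cstdio>
        #include <vector>

        // Solve A x = b for tridiagonal A with sub-diagonal sub[i] = a_{i+2,i+1},
        // diagonal dia[i] = a_{i+1,i+1} and super-diagonal sup[i] = a_{i+1,i+2} (0-based arrays).
        std::vector<double> croutSolve(const std::vector<double>& sub,
                                       const std::vector<double>& dia,
                                       const std::vector<double>& sup,
                                       const std::vector<double>& b) {
            int n = (int)dia.size();
            std::vector<double> u(n), l(n), y(n), x(n);
            u[0] = dia[0];
            for (int i = 0; i + 1 < n; ++i) {          // l_{i+1,i} and u_{i+1,i+1}
                l[i + 1] = sub[i] / u[i];
                u[i + 1] = dia[i + 1] - l[i + 1] * sup[i];
            }
            y[0] = b[0];                                // forward substitution, L y = b (l_ii = 1)
            for (int i = 1; i < n; ++i) y[i] = b[i] - l[i] * y[i - 1];
            x[n - 1] = y[n - 1] / u[n - 1];             // backward substitution, U x = y
            for (int i = n - 2; i >= 0; --i) x[i] = (y[i] - sup[i] * x[i + 1]) / u[i];
            return x;
        }

        int main() {
            // test data (my own): A = tridiag(1, 4, 1), b chosen so that x = (1, 1, 1)
            std::vector<double> x = croutSolve({1, 1}, {4, 4, 4}, {1, 1}, {5, 6, 5});
            std::printf("%g %g %g\n", x[0], x[1], x[2]);
            return 0;
        }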

  • Exercise 3.2.

    u11 = a11 and u(i,i+1) = a(i,i+1), l(i+1,i) = a(i+1,i)/uii, u(i+1,i+1) = a(i+1,i+1) − l(i+1,i) u(i,i+1), for i = 1, . . . , n − 1, can easily be shown.

    It remains to show that uii ≠ 0 for i = 1, . . . , n. We proceed by induction to show that

    |uii| > |a(i,i+1)|,

    where for convenience we define a(n,n+1) := 0.

    i = 1:  |u11| = |a11| > |a12|. ✓

    i ↦ i + 1:

    |u(i+1,i+1)| = |a(i+1,i+1) − a(i+1,i) a(i,i+1)/uii| ≥ |a(i+1,i+1)| − |a(i+1,i)| |a(i,i+1)|/|uii| ≥ |a(i+1,i+1)| − |a(i+1,i)| > |a(i+1,i+2)|. ✓

    Overall we have that |uii| > 0, and so the Crout algorithm is well defined. Moreover, det(A) = det(L) det(U) = det(U) = Π_{i=1}^{n} uii ≠ 0.

  • 3. Choleski Algorithm.

    Exercise.

    Let A be a positive definite matrix. Show that A can be factorized as A = LLᵀ, where L is a lower triangular matrix.

    (i) Compute the 1st column:

    l11 = √a11,   li1 = ai1/l11,   i = 2, . . . , n.

    (ii) For j = 2, . . . , n − 1 compute the jth column:

    ljj = (ajj − Σ_{k=1}^{j−1} ljk²)^{1/2},   lij = (aij − Σ_{k=1}^{j−1} lik ljk)/ljj,   i = j + 1, . . . , n.

  • (iii) Compute the nth column:

    lnn = (ann − Σ_{k=1}^{n−1} lnk²)^{1/2}.

    27
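    A hedged C++ sketch of these three steps (dense storage, no pivoting; the test matrix is my own example of a positive definite matrix, with exact factor L = [2 0 0; 1 2 0; 1 1 2]):

        #include <cmath>
        #include <cstdio>
        #include <vector>

        int main() {
            const int n = 3;
            double A[3][3] = {{4, 2, 2}, {2, 5, 3}, {2, 3, 6}};
            std::vector<std::vector<double>> L(n, std::vector<double>(n, 0.0));
            for (int j = 0; j < n; ++j) {
                double s = A[j][j];
                for (int k = 0; k < j; ++k) s -= L[j][k] * L[j][k];
                L[j][j] = std::sqrt(s);                        // l_jj; steps (i)-(iii) combined
                for (int i = j + 1; i < n; ++i) {
                    double t = A[i][j];
                    for (int k = 0; k < j; ++k) t -= L[i][k] * L[j][k];
                    L[i][j] = t / L[j][j];                     // l_ij below the diagonal
                }
            }
            for (int i = 0; i < n; ++i, std::printf("\n"))
                for (int j = 0; j < n; ++j) std::printf("%8.4f ", L[i][j]);
            return 0;
        }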

  • 4. Iterative Methods for Linear Equations.

    Split A into A = M + N with M nonsingular, and convert the equation Ax = b into an equivalent equation x = Cx + d with C = −M⁻¹N and d = M⁻¹b.

    Choose an initial vector x^(0) and then generate a sequence of vectors by

    x^(k) = C x^(k−1) + d,   k = 1, 2, . . .

    The resulting sequence converges to the solution of Ax = b, for an arbitrary initial vector x^(0), if and only if ρ(C) < 1.

    The objective is to choose M such that M⁻¹ is easy to compute and ρ(C) < 1.

    The iteration stops if ‖x^(k) − x^(k−1)‖ < ε.

    [Note: In practice one solves M x^(k) = −N x^(k−1) + b, for k = 1, 2, . . ..]

    28

  • Claim.

    The iteration x^(k) = C x^(k−1) + d is convergent if and only if ρ(C) < 1.

    Proof.

    Define e^(k) := x^(k) − x, the error of the kth iterate. Then

    e^(k) = C x^(k−1) + d − (C x + d) = C (x^(k−1) − x) = C e^(k−1) = C² e^(k−2) = . . . = C^k e^(0),

    where e^(0) = x^(0) − x is an arbitrary vector.

    Assume C is similar to the diagonal matrix Λ = diag(λ1, . . . , λn), where the λi are the eigenvalues of C.

    ∃ X nonsingular s.t. C = X Λ X⁻¹

    ⟹  e^(k) = C^k e^(0) = X Λ^k X⁻¹ e^(0) = X diag(λ1^k, . . . , λn^k) X⁻¹ e^(0) → 0 as k → ∞  ⟺  |λi| < 1, i = 1, . . . , n  ⟺  ρ(C) < 1.

    29

  • 5. Jacobi Algorithm.

    Exercise: Let M = D and N = L + U (L the strict lower triangular part of A, D the diagonal, U the strict upper triangular part). Show that the ith component at the kth iteration is

    x_i^(k) = (1/aii) ( bi − Σ_{j=1}^{i−1} aij x_j^(k−1) − Σ_{j=i+1}^{n} aij x_j^(k−1) )

    for i = 1, . . . , n.

    6. Gauss–Seidel Algorithm.

    Exercise: Let M = D + L and N = U. Show that the ith component at the kth iteration is

    x_i^(k) = (1/aii) ( bi − Σ_{j=1}^{i−1} aij x_j^(k) − Σ_{j=i+1}^{n} aij x_j^(k−1) )

    for i = 1, . . . , n.

    30

  • 7. SOR Algorithm.

    Exercise.

    Let M = (1/ω) D + L and N = U + (1 − 1/ω) D, where 0 < ω < 2. Show that the ith component at the kth iteration is

    x_i^(k) = (1 − ω) x_i^(k−1) + (ω/aii) ( bi − Σ_{j=1}^{i−1} aij x_j^(k) − Σ_{j=i+1}^{n} aij x_j^(k−1) )

    for i = 1, . . . , n.

    C++ Exercise: Write a program to solve a diagonally dominant linear equation Ax = b by the SOR algorithm. The inputs are the number of variables n, the matrix A, the vector b, the initial vector x0, the relaxation parameter ω, and the tolerance ε. The output is the number of iterations k and the approximate solution xk.

    31
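    A hedged C++ sketch of this SOR loop (dense storage; as test data I use the tridiagonal matrix that reappears in Exercise 3.8 below, with a right-hand side of my own for which the solution is (3, 4, −5)):

        #include <algorithm>
        #include <cmath>
        #include <cstdio>
        #include <vector>

        int main() {
            const int n = 3;
            double A[3][3] = {{4, 3, 0}, {3, 4, -1}, {0, -1, 4}};
            double b[3] = {24, 30, -24};
            double omega = 1.24, eps = 1e-8;
            std::vector<double> x(n, 1.0);                   // initial vector x^(0)
            int k = 0;
            double diff = 1.0;
            while (diff >= eps && k < 1000) {
                diff = 0.0;
                for (int i = 0; i < n; ++i) {
                    double s = b[i];
                    for (int j = 0; j < n; ++j)
                        if (j != i) s -= A[i][j] * x[j];     // new x_j for j < i, old for j > i
                    double xi = (1.0 - omega) * x[i] + omega * s / A[i][i];
                    diff = std::max(diff, std::fabs(xi - x[i]));
                    x[i] = xi;
                }
                ++k;
            }
            std::printf("k = %d, x = (%.6f, %.6f, %.6f)\n", k, x[0], x[1], x[2]);
            return 0;
        }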

  • 8. Special Matrices.

    If A is strictly diagonally dominant, then Jacobi and Gauss–Seidel converge for any initial vector x^(0). In addition, SOR converges for ω ∈ (0, 1].

    If A is positive definite and 0 < ω < 2, then the SOR method converges for any initial vector x^(0).

    If A is positive definite and tridiagonal, then ρ(CGS) = [ρ(CJ)]² < 1 and the optimal choice of ω for the SOR method is

    ω = 2 / (1 + √(1 − ρ(CGS))) ∈ [1, 2).

    With this choice of ω, ρ(CSOR) = ω − 1 ≤ ρ(CGS).

    Exercise.

    Find the optimal ω for the SOR method for the matrix

    A = [ 4 3 0 ; 3 4 −1 ; 0 −1 4 ].

    (Answer: ω ≈ 1.24.)

    32

  • 9. Condition Numbers.

    The condition number of a nonsingular matrix A relative to a norm ‖·‖ is defined by

    κ(A) = ‖A‖ ‖A⁻¹‖.

    Note that κ(A) = ‖A‖ ‖A⁻¹‖ ≥ ‖A A⁻¹‖ = ‖I‖ = max_{‖x‖=1} ‖x‖ = 1.

    A matrix A is well-conditioned if κ(A) is close to one, and is ill-conditioned if κ(A) is much larger than one.

    Suppose ‖δA‖ < 1/‖A⁻¹‖. Then the solution x̃ to (A + δA) x̃ = b + δb approximates the solution x of Ax = b with error estimate

    ‖x̃ − x‖/‖x‖ ≤ κ(A)/(1 − ‖δA‖ ‖A⁻¹‖) ( ‖δb‖/‖b‖ + ‖δA‖/‖A‖ ).

    In particular, if δA = 0 (no perturbation to the matrix A) then

    ‖x̃ − x‖/‖x‖ ≤ κ(A) ‖δb‖/‖b‖.

    33

  • Example.

    Consider Example 1.3.

    A =

    (1 1

    1 1.01

    )

    A1 = 1detA

    (1.01 11 1

    )=

    1

    0.01

    (1.01 11 1

    )=

    (101 100100 100

    )Recall

    A1 = max1jn

    ni=1

    |aij| .Hence

    A1 = max(2, 2.01) = 2.01 , A11 = max(201, 200) = 201 .

    1(A) = A1 A11 = 404.01 1 (ill-conditioned!)

    Similarly = 404.01 and 2 = (A) (A1) = maxmin = 402.0075.

    34

  • 10. Hilbert Matrix.

    An n × n Hilbert matrix Hn = (hij) is defined by hij = 1/(i + j − 1) for i, j = 1, 2, . . . , n.

    Hilbert matrices are notoriously ill-conditioned and κ(Hn) → ∞ very rapidly as n → ∞.

    Hn has first row 1, 1/2, . . . , 1/n, second row 1/2, 1/3, . . . , 1/(n + 1), and last row 1/n, 1/(n + 1), . . . , 1/(2n − 1).

    Exercise.

    Compute the condition numbers κ1(H3) and κ1(H6).

    35

  • (Answer: κ1(H3) = 748 and κ1(H6) ≈ 2.9 × 10⁶.)

    36

  • 11. Fixed Point Method for Nonlinear Equations.

    A function g : R → R has a fixed point x̄ if g(x̄) = x̄.

    A function g is a contraction mapping on [a, b] if g : [a, b] → [a, b] and

    |g′(x)| ≤ L < 1   ∀ x ∈ (a, b),

    where L is a constant.

    Exercise.

    Assume g is a contraction mapping on [a, b]. Prove that g has a unique fixed point x̄ in [a, b], and that for any x0 ∈ [a, b] the sequence defined by

    xn+1 = g(xn),   n ≥ 0,

    converges to x̄. The algorithm is called fixed point iteration.

    37

  • Exercise 3.11.

    Existence:

    Define h(x) = x − g(x) on [a, b]. Then h(a) = a − g(a) ≤ 0 and h(b) = b − g(b) ≥ 0. As h is continuous, ∃ c ∈ [a, b] s.t. h(c) = 0, i.e. c = g(c). ✓

    Uniqueness:

    Suppose p, q ∈ [a, b] are two fixed points. Then

    |p − q| = |g(p) − g(q)| = |g′(ξ) (p − q)| ≤ L |p − q|   [MVT, ξ ∈ (a, b)]

    ⟹  (1 − L) |p − q| ≤ 0  ⟹  |p − q| ≤ 0  ⟹  p = q. ✓

    Convergence:

    |xn − x̄| = |g(xn−1) − g(x̄)| = |g′(ξ) (xn−1 − x̄)| ≤ L |xn−1 − x̄| ≤ . . . ≤ L^n |x0 − x̄| → 0 as n → ∞.

    Hence xn → x̄ as n → ∞. ✓

  • 12. Newton Method for Nonlinear Equations.

    Assume that f ∈ C¹([a, b]), f(x̄) = 0 (x̄ is a root or zero) and f′(x̄) ≠ 0.

    The Newton method can be used to find the root x̄ by generating a sequence {xn} satisfying

    xn+1 = xn − f(xn)/f′(xn),   n = 0, 1, . . . ,

    provided f′(xn) ≠ 0 for all n.

    The sequence xn converges to the root x̄ as long as the initial point x0 is sufficiently close to x̄.

    The algorithm stops if |xn+1 − xn| < ε, a prescribed error tolerance, and xn+1 is taken as an approximation to x̄.

    Geometric Interpretation:

    The tangent line at (xn, f(xn)) is

    Y = f(xn) + f′(xn) (X − xn).

  • Setting Y = 0 yields xn+1 := X = xn − f(xn)/f′(xn).

    40

  • 13. Choice of Initial Point.

    Suppose f ∈ C²([a, b]) and f(x̄) = 0 with f′(x̄) ≠ 0. Then there exists δ > 0 such that the Newton method generates a sequence xn converging to x̄ for any initial point x0 ∈ [x̄ − δ, x̄ + δ] (x0 can only be chosen locally).

    However, if f satisfies the following additional conditions:

    1. f(a) f(b) < 0,

    2. f″ does not change sign on [a, b],

    3. the tangent lines to the curve y = f(x) at both a and b cut the x-axis within [a, b] (i.e. a − f(a)/f′(a), b − f(b)/f′(b) ∈ [a, b]),

    then f(x) = 0 has a unique root x̄ in [a, b] and the Newton method converges to x̄ for any initial point x0 ∈ [a, b] (x0 can be chosen globally).

    Example.

    Use the Newton method to compute x̄ = √c, c > 1, and show that the initial point can be any point in [1, c].

    41

  • Example.

    Find x̄ = √c, c > 1.

    Answer.

    x̄ is the root of f(x) := x² − c.

    Newton:  xn+1 = xn − f(xn)/f′(xn) = xn − (xn² − c)/(2 xn) = ½ (xn + c/xn),   n ≥ 0.

    How to choose x0?

    Check the 3 conditions on [1, c].

    1. f(1) = 1 − c < 0, f(c) = c² − c > 0.  ⟹  f(1) f(c) < 0 ✓

    2. f″ = 2 ✓

    3. Tangent line at 1:  Y = f(1) + f′(1) (X − 1) = 1 − c + 2 (X − 1).

    Let Y = 0, then X = 1 + (c − 1)/2 ∈ (1, c). ✓

    Tangent line at c:  Y = f(c) + f′(c) (X − c) = c² − c + 2 c (X − c).

    Let Y = 0, then X = c − (c − 1)/2 ∈ (1, c). ✓

  • Newton convergence for any x0 [1, c].

    43

  • Numerical Example.

    Find √7. (From a calculator: √7 = 2.6457513.)

    Newton converges for all x0 ∈ [1, 7]. Choose x0 = 4.

    x1 = ½ (x0 + 7/x0) = 2.875

    x2 = 2.6548913

    x3 = 2.6457670

    x4 = 2.6457513

    Comparison to the bisection method with I0 = [1, 7]:

    I1 = [1, 4]

    I2 = [2.5, 4]

    I3 = [2.5, 3.25]

    I4 = [2.5, 2.875]

    ...

    I25 = [2.6457512, 2.6457513]

    44
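    A minimal C++ sketch of this iteration (illustrative; the stopping rule |x_{n+1} − x_n| < eps and variable names are my own choices):

        #include <cmath>
        #include <cstdio>

        int main() {
            double c = 7.0, x = 4.0, eps = 1e-7;            // x0 = 4 as in the example above
            for (int n = 0; n < 100; ++n) {
                double xnew = 0.5 * (x + c / x);            // Newton step for f(x) = x*x - c
                std::printf("x_%d = %.7f\n", n + 1, xnew);
                if (std::fabs(xnew - x) < eps) break;       // stop when the update is small
                x = xnew;
            }
            return 0;
        }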

  • 14. Pitfalls.

    Here are some difficulties which may be encountered with the Newton method:

    1. {xn} may wander around and not converge (there are only complex roots to theequation),

    2. initial approximation x0 is too far away from the desired root and {xn} convergesto some other root (this usually happens when f (x0) is small),

    3. {xn} may diverge to + (the function f is positive and monotonically decreasingon an unbounded interval), and

    4. {xn} may repeat (cycling).

    45

  • 15. Rate of Convergence.

    Suppose {xn} is a sequence that converges to x̄.

    The convergence is said to be linear if there is a constant r ∈ (0, 1) such that

    |xn+1 − x̄| / |xn − x̄| ≤ r,   for all n sufficiently large.

    The convergence is said to be superlinear if

    lim_{n→∞} |xn+1 − x̄| / |xn − x̄| = 0.

    In particular, the convergence is said to be quadratic if

    |xn+1 − x̄| / |xn − x̄|² ≤ M,   for all n sufficiently large,

    where M is a positive constant, not necessarily less than 1.

    Example.  xn = x̄ + 0.5^n is linear;  xn = x̄ + 0.5^(2^n) is quadratic.

    Example. Show that the Newton method converges quadratically.

    46

  • Example.

    Define g(x) = x f (x)f (x)

    . Then the Newton method is given by

    xn+1 = g(xn) .

    Moreover, f (x) = 0 and f (x) 6= 0 imply thatg(x) = x ,

    g(x) = 1 (f)2 f f (f )2

    (x) = f (x)f (x)(f (x))2

    = 0 ,

    g(x) =f (x)f (x)

    .

    Assuming that xn x we have that|xn+1 x||xn x|2 =

    |g(xn) g(x)||xn x|2 =Taylor |g

    (x) (xn x) + 12 g(n) (xn x)2||xn x|2

    =1

    2|g(n)| 1

    2|g(x)| =: as n .

    47

  • Hence |xn+1 x| |xn x|2 quadratic convergence rate.

    48

  • 4. Interpolations

    1. Polynomial Approximation.

    For any continuous function f defined on an interval [a, b], there exist polynomials P

    that can be as close to the given function as desired.

    Taylor polynomials agree closely with a given function at a specific point, but they

    concentrate their accuracy only near that point.

    A good polynomial needs to provide a relatively accurate approximation over an entire

    interval.

    49

  • 2. Interpolating Polynomial Lagrange Form.

    Suppose xi [a, b], i = 0, 1, . . . , n, are pairwise distinct mesh points in [a, b].The Lagrange polynomial p is a polynomial of degree n such that

    p(xi) = f (xi), i = 0, 1, . . . , n.

    p can be constructed explicitly as

    p(x) =

    ni=0

    Li(x)f (xi)

    where Li is a polynomial of degree n satisfying

    Li(xj) = 0, j 6= i, Li(xi) = 1.This results in

    Li(x) =j 6=i

    (x xjxi xj

    )i = 0, 1, . . . , n.

    p is called linear interpolation if n = 1 and quadratic interpolation if n = 2.

    50

  • Exercise.

    Find the Lagrange polynomial p for the following points (x, f (x)): (1, 0), (1,3),and (2,4). Assume that a new point (0, 2) is observed, and construct a Lagrangepolynomial to incorporate this new information in it.

    Error formula.

    Suppose f is n + 1 times differentiable on [a, b]. Then it holds that

    f(x) = p(x) + f^(n+1)(ξ)/(n + 1)! (x − x0) · · · (x − xn),

    where ξ = ξ(x) lies in (a, b).

    Proof.

    Define g(x) = f(x) − p(x) + α Π_{j=0}^{n} (x − xj), where α ∈ R is a constant. Clearly, g(xj) = 0 for j = 0, . . . , n. To estimate the error at x = μ ∉ {x0, . . . , xn}, choose α such that g(μ) = 0.

    51

  • Hence

    g(x) = f (x) p(x) (f () p())nj=0

    (x xj xj

    ).

    g has at least n + 2 zeros: x0, . . . , xn, .Mean Value Theorem yields that

    g has at least n + 1 zeros...

    g(n+1) has at least 1 zero, say

    Moreover, as p is polynomial of degree n, it holds that p(n+1) = 0.

    52

  • Hence

    0 = g(n+1)() = f (n+1)() (f () p()) (n + 1)!nj=0( xj)

    Error = f () p() = f(n+1)()

    (n + 1)!

    nj=0

    ( xj) .

    53

  • 3. Trapezoid Rule.

    We can use linear interpolation (n = 1, x0 = a, x1 = b) to approximate f(x) on [a, b] and then compute ∫_a^b f(x) dx to get the trapezoid rule:

    ∫_a^b f(x) dx ≈ ½ (b − a) [f(a) + f(b)].

    If we partition [a, b] into n equal subintervals with mesh points xi = a + i h, i = 0, . . . , n, and step size h = (b − a)/n, we can derive the composite trapezoid rule:

    ∫_a^b f(x) dx ≈ (h/2) [ f(x0) + 2 Σ_{i=1}^{n−1} f(xi) + f(xn) ].

    The approximation error is of the order O(h²) if |f″| is bounded.

    54

  • Use linear interpolation (n = 1, x0 = a, x1 = b) to approximate f (x) on [a, b] and then

    compute baf (x) dx.

    Answer.

    The linear interpolating polynomial is p(x) = f (a)L0(x) + f (b)L1(x), where

    L0(x) =x ba b , and L1(x) =

    x ab a .

    ba

    f (x) dx ba

    p(x) dx = f (a)

    ba

    x ba b dx + f (b)

    ba

    x ab a dx

    = f (a)1

    a b1

    2(x b)2 |ba +f (b)

    1

    b a1

    2(x a)2 |ba

    = f (a)1

    a b1

    2((a b)2) + f (b) 1

    b a1

    2(b a)2

    =b a2

    (f (a) + f (b)) Trapezoid Rule

    [Of course, one could have arrived at this formula with a simple geometric argument.]

    55

  • Error Analysis.

    Let f (x) = p(x) + E(x), where E(x) =f ()2

    (x a) (x b) with (a, b). Assumethat |f | M is bounded. Then

    ba

    E(x) dx

    ba

    |E(x)| dx M2

    ba

    (x a) (b x) dx

    =M

    2

    ba

    (x a) [(b a) (x a)] dx

    =M

    2

    ba

    [(x a)2 + (b a) (x a)] dx=M

    2

    [13(b a)3 + 1

    2(b a)3

    ]dx

    =M

    12(b a)3 .

    56

  • The composite formula can be obtained by considering the partitioning of [a, b] into

    a = x0 < x1 < . . . < xn1 < xn = b, where xi = a + i h with h :=b an

    .

    ba

    f (x) dx =n1i=0

    xi+1xi

    f (x) dx n1i=0

    xi+1 xi2

    (f (xi) + f (xi+1))

    =n1i=0

    h

    2(f (xi) + f (xi+1))

    = h

    [1

    2f (a) + f (x1) + . . . + f (xn1) +

    1

    2f (b)

    ].

    Error analysis then yields that

    Error M12

    h3 n =M (b a)

    12h2 = O(h2) .

    57

  • 4. Simpson's Rule.

    Exercise.

    Use quadratic interpolation (n = 2, x0 = a, x1 = (a + b)/2, x2 = b) to approximate f(x) on [a, b] and then compute ∫_a^b f(x) dx to get Simpson's rule:

    ∫_a^b f(x) dx ≈ (1/6) (b − a) [ f(a) + 4 f((a + b)/2) + f(b) ].

    Derive the composite Simpson's rule:

    ∫_a^b f(x) dx ≈ (h/3) [ f(x0) + 2 Σ_{i=2}^{n/2} f(x_{2i−2}) + 4 Σ_{i=1}^{n/2} f(x_{2i−1}) + f(xn) ],

    where n is an even number and the xi and h are chosen as in the composite trapezoid rule.

    [One can show that the approximation error is of the order O(h⁴) if |f^(4)| is bounded.]

    58
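    A hedged C++ sketch of the composite Simpson rule (the function name and the test integrand 1/(1 + x), whose integral over [0, 1] is ln 2, are my own choices):

        #include <cstdio>

        double g(double x) { return 1.0 / (1.0 + x); }          // test integrand

        // composite Simpson rule on [a, b] with an even number n of subintervals
        double simpson(double (*f)(double), double a, double b, int n) {
            double h = (b - a) / n, s = f(a) + f(b);
            for (int i = 1; i < n; ++i)
                s += (i % 2 == 1 ? 4.0 : 2.0) * f(a + i * h);   // weight 4 at odd, 2 at even nodes
            return s * h / 3.0;
        }

        int main() {
            std::printf("%.10f\n", simpson(g, 0.0, 1.0, 10));   // approx ln 2 = 0.6931472
            return 0;
        }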

  • 5. NewtonCotes Formula.

    Suppose x0, . . . , xn are mesh points in [a, b], usually mesh points are equally spaced

    and x0 = a, xn = b, then integral can be approximated by the NewtonCotes

    formula: ba

    f (x) dx ni=0

    Ai f (xi)

    where parameters Ai are determined in such a way that the integral is exact for all

    polynomials of degree n.[Note: n+1 unknownsAi and n+1 coefficients for polynomial of degree n.] Exercise.

    Use NewtonCotes formula to derive the trapezoid rule and the Simpsons rule. Prove

    that if f is n + 1 times differentiable and |f (n+1)| M on [a, b] then

    | ba

    f (x) dxni=0

    Ai f (xi)| M(n + 1)!

    ba

    ni=0

    |x xi| dx.

    59

  • Exercise 4.5.

    We have that ba

    q(x) dx =

    ni=0

    Ai q(xi) for all polynomials q of degree n.

    Let q(x) = Lj(x), where Lj is the jth Lagrange polynomial for the data points x0, x1, . . . , xn.

    I.e. Lj is of degree n and satisfies Lj(xi) = ij =

    1 i = j0 i 6= j . Now ba

    Lj dx =ni=0

    Ai Lj(xi) = Aj

    ba

    f (x) dx ni=0

    Ai f (xi) =ni=0

    f (xi)

    ba

    Li(x) dx

    =

    ba

    ni=0

    f (xi)Li(x) dx =

    ba

    p(x) dx,

    where p(x) is the interpolating Lagrange polynomial to f . Hence we find trapezoid

    60

  • (n = 1) and Simpsons rule (n = 2, with x1 =a+b2 ).

    The Lagrange polynomial has the error term

    f (x) = p(x) + E(x), E(x) :=f (n+1)()

    (n + 1)!(x x0) (x xn),

    where = (x) lies in (a, b). Hence ba

    f (x) dx ba

    p(x) dx

    = ba

    E(x) dx

    ba

    |E(x)| dx

    M(n + 1)!

    ba

    ni=0

    |x xi| dx.

    61

  • 6. Ordinary Differential Equations.

    An initial value problem for an ODE has the form

    x(t) = f (t, x(t)), a t b and x(a) = x0. (1)(1) is equivalent to the integral equation:

    x(t) = x0 +

    ta

    f (s, x(s)) ds, a t b. (2)

    To solve (2) numerically we divide [a, b] into subintervals with mesh points ti = a+ih,

    i = 0, . . . , n, and step size h = (b a)/n. (2) implies

    x(ti+1) = x(ti) +

    ti+1ti

    f (s, x(s)) ds, i = 0, . . . , n 1.

    62

  • (a) If we approximate f (s, x(s)) on [ti, ti+1] by f (ti, x(ti)), we get the Euler (explicit)

    method for equation (1):

    wi+1 = wi + hf (ti, wi), w0 = x0.

    We have x(ti+1) wi+1 if h is sufficiently small.[Taylor: x(ti+1) = x(ti) + x

    (ti)h +O(h2) = x(ti) + f (ti, x(ti))h +O(h2).]

    (b) If we approximate f (s, x(s)) on [ti, ti+1] by linear interpolation with points (ti, f (ti, x(ti)))

    and (ti+1, f (ti+1, x(ti+1))), we get the trapezoidal (implicit) method for equation

    (1):

    wi+1 = wi +h

    2[f (ti, wi) + f (ti+1, wi+1)], w0 = x0.

    (c) If we combine the Euler method with the trapezoidal method, we get the modified

    Euler (explicit) method (or RungeKutta 2nd order method):

    wi+1 = wi +h

    2[f (ti, wi) + f (ti+1, wi + hf (ti, wi)

    wi+1)], w0 = x0.

    63
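    A hedged C++ sketch comparing the explicit Euler method (a) and the modified Euler / RK2 method (c) above; the test problem x′ = x, x(0) = 1 with exact solution exp(t) is my own choice:

        #include <cmath>
        #include <cstdio>

        double f(double t, double x) { (void)t; return x; }     // right-hand side of the ODE

        int main() {
            double a = 0.0, b = 1.0, x0 = 1.0;
            int n = 10;
            double h = (b - a) / n;
            double wEuler = x0, wRK2 = x0;
            for (int i = 0; i < n; ++i) {
                double t = a + i * h;
                double k1 = f(t, wRK2);
                wRK2   += 0.5 * h * (k1 + f(t + h, wRK2 + h * k1));  // modified Euler step
                wEuler += h * f(t, wEuler);                          // explicit Euler step
            }
            std::printf("Euler: %.6f  RK2: %.6f  exact: %.6f\n", wEuler, wRK2, std::exp(1.0));
            return 0;
        }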

  • 7. Divided Differences.

    Suppose a function f and (n + 1) distinct points x0, x1, . . . , xn are given. Divided differences of f can be expressed in a table format as follows:

    xk   0DD      1DD         2DD             3DD                 . . .
    x0   f[x0]
    x1   f[x1]   f[x0, x1]
    x2   f[x2]   f[x1, x2]   f[x0, x1, x2]
    x3   f[x3]   f[x2, x3]   f[x1, x2, x3]   f[x0, x1, x2, x3]
    ...

  • where f[xi] = f(xi),

    f[xi, xi+1] = (f[xi+1] − f[xi]) / (xi+1 − xi),

    f[xi, xi+1, . . . , xi+k] = (f[xi+1, xi+2, . . . , xi+k] − f[xi, xi+1, . . . , xi+k−1]) / (xi+k − xi),

    and in particular

    f[x1, . . . , xn] = (f[x2, . . . , xn] − f[x1, . . . , xn−1]) / (xn − x1).

    65

  • 8. Interpolating Polynomial Newton Form.

    One drawback of Lagrange polynomials is that there is no recursive relationship

    between Pn1 and Pn, which implies that each polynomial has to be constructedindividually. Hence, in practice one uses the Newton polynomials.

    The Newton interpolating polynomial Pn of degree n that agrees with f at the points

    x0, x1, . . . , xn is given by

    Pn(x) = f [x0] +

    nk=1

    f [x0, x1, . . . , xk]

    k1i=0

    (x xi).

    Note that Pn can be computed recursively using the relation

    Pn(x) = Pn1(x) + f [x0, x1, . . . , xn](x x0)(x x1) (x xn1).[Note that f [x0, x1, . . . , xk] can be found on the diagonal of the DD table.]

    Exercise.

    Suppose values (x, y) are given as (1,2), (2,56), (0,2), (3, 4), (1,16), and(7, 376). Is there a cubic polynomial that takes these values? (Answer: 2x3 7x2 +

    66

  • 5x 2.)

    67

  • Exercise 4.8.

    Data points: (1, −2), (−2, −56), (0, −2), (3, 4), (−1, −16), (7, 376).

    xk   0DD    1DD   2DD   3DD   4DD   5DD
     1    −2
    −2   −56    18
     0    −2    27    −9
     3     4     2    −5     2
    −1   −16     5    −3     2     0
     7   376    49    11     2     0     0

    Newton polynomial:

    p(x) = −2 + 18 (x − 1) − 9 (x − 1) (x + 2) + 2 (x − 1) (x + 2) x = 2x³ − 7x² + 5x − 2.

    [Note: We could have stopped at the row for x3 = 3 and checked whether p3 goes through the remaining data points.]

    68
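    A hedged C++ sketch of the divided-difference table and the Newton-form evaluation (the in-place column update and array names are my own choices; the data are those of Exercise 4.8):

        #include <cstdio>
        #include <vector>

        int main() {
            std::vector<double> x = {1, -2, 0, 3, -1, 7};
            std::vector<double> c = {-2, -56, -2, 4, -16, 376};  // starts as f[x_i], ends as coefficients
            int n = (int)x.size();
            // column by column: after step j, c[i] holds f[x_{i-j}, ..., x_i]
            for (int j = 1; j < n; ++j)
                for (int i = n - 1; i >= j; --i)
                    c[i] = (c[i] - c[i - 1]) / (x[i] - x[i - j]);
            // evaluate P(t) = c[0] + c[1](t-x0) + ... by nested multiplication
            double t = 2.0, p = c[n - 1];
            for (int i = n - 2; i >= 0; --i) p = p * (t - x[i]) + c[i];
            std::printf("P(%.1f) = %.6f\n", t, p);               // 2t^3 - 7t^2 + 5t - 2 at t = 2 gives -4
            return 0;
        }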

  • 9. Piecewise Polynomial Approximations.

    Another drawback of interpolating polynomials is that Pn tends to oscillate widely

    when n is large, which implies that Pn(x) may be a poor approximation to f (x) if x

    is not close to the interpolating points.

    If an interval [a, b] is divided into a set of subintervals [xi, xi+1], i = 0, 1, . . . , n1, andon each subinterval a different polynomial is constructed to approximate a function

    f , such an approximation is called spline.

    The simplest spline is the linear spline P which approximates the function f on the

    interval [xi, xi+1], i = 0, 1, . . . , n 1, with P agreeing with f at xi and xi+1.Linear splines are easy to construct but are not smooth at points x1, x2, . . . , xn1.

    69

  • 10. Natural Cubic Splines.

    Given a function f defined on [a, b] and a set of points a = x0 < x1 < · · · < xn = b, a function S is called a natural cubic spline if there exist n cubic polynomials Si such that:

    (a) S(x) = Si(x) for x in [xi, xi+1] and i = 0, 1, . . . , n − 1;

    (b) Si(xi) = f(xi) and Si(xi+1) = f(xi+1) for i = 0, 1, . . . , n − 1;

    (c) S′_{i+1}(xi+1) = S′_i(xi+1) for i = 0, 1, . . . , n − 2;

    (d) S″_{i+1}(xi+1) = S″_i(xi+1) for i = 0, 1, . . . , n − 2;

    (e) S″(x0) = S″(xn) = 0.

    70

  • Natural Cubic Splines.

    | | |

    xi xi+1 xi+2

    Si Si+1

    (a) 4n parameters

    (b) 2n equations

    (c) n 1 equations(d) n 1 equations(e) 2 equations

    4n equations

    71

  • Example. Assume S is a natural cubic spline that interpolates f C2([a, b]) atthe nodes a = x0 < x1 < < xn = b. We have the following smoothness propertyof cubic splines: b

    a

    [S (x)]2 dx ba

    [f (x)]2 dx.

    In fact, it even holds that ba

    [S (x)]2 dx = mingG

    ba

    [g(x)]2 dx,

    where G := {g C2([a, b]) : g(xi) = f (xi) i = 0, 1, . . . , n}.Exercise: Determine the parameters a to h so that S(x) is a natural cubic spline,

    where

    S(x) = ax3 + bx2 + cx + d for x [1, 0] andS(x) = ex3 + fx2 + gx + h for x [0, 1]with interpolation conditions S(1) = 1, S(0) = 2, and S(1) = 1.(Answer: a = 1, b = 3, c = 1, d = 2, e = 1, f = 3, g = 1, h = 2.)

    72

  • 11. Computation of Natural Cubic Splines.

    Denote

    ci = S″(xi),   i = 0, 1, . . . , n.

    Then c0 = cn = 0.

    Since Si is a cubic function on [xi, xi+1], we know that S″_i is a linear function on [xi, xi+1]. Hence it can be written as

    S″_i(x) = ci (xi+1 − x)/hi + ci+1 (x − xi)/hi,

    where hi := xi+1 − xi.

    73

  • Exercise.

    Show that Si is given by

    Si(x) = ci/(6 hi) (xi+1 − x)³ + ci+1/(6 hi) (x − xi)³ + pi (xi+1 − x) + qi (x − xi),

    where

    pi = f(xi)/hi − ci hi/6,   qi = f(xi+1)/hi − ci+1 hi/6,

    and c1, . . . , cn−1 satisfy the linear equations

    hi−1 ci−1 + 2 (hi−1 + hi) ci + hi ci+1 = ui,

    where

    ui = 6 (di − di−1),   di = (f(xi+1) − f(xi))/hi,

    for i = 1, 2, . . . , n − 1.

    C++ Exercise: Write a program to construct a natural cubic spline. The inputs are the number of points n + 1 and all the points (xi, yi), i = 0, 1, . . . , n. The output is a natural cubic spline expressed in terms of the functions Si defined on the intervals [xi, xi+1] for i = 0, 1, . . . , n − 1.
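    A hedged C++ sketch of this construction (it sets up the tridiagonal system for the ci, eliminates it in the spirit of the Crout algorithm of Section 3.2, and evaluates S at one point; the data are those of Exercise 4.10 and the evaluation point is my own choice):

        #include <cstdio>
        #include <vector>

        int main() {
            std::vector<double> x = {-1, 0, 1}, y = {1, 2, -1};
            int n = (int)x.size() - 1;
            std::vector<double> h(n), d(n);
            for (int i = 0; i < n; ++i) { h[i] = x[i+1] - x[i]; d[i] = (y[i+1] - y[i]) / h[i]; }
            // interior equations: h_{i-1} c_{i-1} + 2(h_{i-1}+h_i) c_i + h_i c_{i+1} = 6(d_i - d_{i-1})
            std::vector<double> c(n + 1, 0.0);                     // c_0 = c_n = 0 (natural spline)
            std::vector<double> dia(n - 1), sub(n - 1), sup(n - 1), u(n - 1);
            for (int i = 1; i < n; ++i) {
                dia[i-1] = 2.0 * (h[i-1] + h[i]);
                sub[i-1] = h[i-1]; sup[i-1] = h[i];
                u[i-1]   = 6.0 * (d[i] - d[i-1]);
            }
            for (int i = 1; i < n - 1; ++i) {                      // forward elimination
                double m = sub[i] / dia[i-1];
                dia[i] -= m * sup[i-1];
                u[i]   -= m * u[i-1];
            }
            if (n > 1) c[n-1] = u[n-2] / dia[n-2];                 // back substitution
            for (int i = n - 2; i >= 1; --i) c[i] = (u[i-1] - sup[i-1] * c[i+1]) / dia[i-1];
            // evaluate S on [x_i, x_{i+1}] with the formula of Section 4.11
            int i = 0; double t = -0.5;
            double p = y[i] / h[i] - c[i] * h[i] / 6.0, q = y[i+1] / h[i] - c[i+1] * h[i] / 6.0;
            double S = c[i] / (6.0 * h[i]) * (x[i+1]-t)*(x[i+1]-t)*(x[i+1]-t)
                     + c[i+1] / (6.0 * h[i]) * (t-x[i])*(t-x[i])*(t-x[i])
                     + p * (x[i+1] - t) + q * (t - x[i]);
            std::printf("S(%.2f) = %.6f\n", t, S);
            return 0;
        }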

  • 5. Basic Probability Theory

    1. CDF and PDF.

    Let (,F , P ) be a probability space, X be a random variable.The cumulative distribution function (cdf) F of X is defined by

    F (x) = P (X x), x R.F is an increasing right-continuous function satisfying

    F () = 0, F (+) = 1.If F is absolutely continuous thenX has a probability density function (pdf) f defined

    by

    f (x) = F (x), x R.F can be recovered from f by the relation

    F (x) =

    x

    f (u) du.

    75

  • 2. Normal Distribution.

    A random variable X has a normal distribution with parameters and 2, written

    X N(, 2), if X has the pdf(x) =

    1

    2e(x)2

    2 2

    for x R.If = 0 and 2 = 1 then X is called a standard normal random variable and its cdf

    is usually written as

    (x) =

    x

    12e

    u2

    2 du.

    If X N(, 2) then the characteristic function (Fourier transform) of X is givenby

    c(s) = E(ei sX) = ei t2 s2

    2 ,

    where i =1. The moment generating function (mgf) is given by

    m(s) = E(esX) = e t+2 s2

    2 .

    76

  • 3. Approximation of Normal CDF.

    It is suggested that the standard normal cdf Φ(x) can be approximated by a polynomial expression Φ̃(x) as follows:

    Φ̃(x) := 1 − φ(x) (a1 k + a2 k² + a3 k³ + a4 k⁴ + a5 k⁵)   (3)

    when x ≥ 0, and Φ̃(x) := 1 − Φ̃(−x) when x < 0, where φ is the standard normal pdf.

    The parameters are given by k = 1/(1 + γ x), γ = 0.2316419, a1 = 0.319381530, a2 = −0.356563782, a3 = 1.781477937, a4 = −1.821255978, and a5 = 1.330274429. This approximation has a maximum absolute error less than 7.5 × 10⁻⁸ for all x.

    C++ Exercise: Write a program to compute Φ(x) with (3) and compare the result with that derived with the composite Simpson rule (see Exercise 4.4). The input is a variable x and the number of partitions n over the interval [0, x]. The output is the results from the two methods and their error.

    77
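    A hedged C++ sketch of the approximation (3) (function name and the nested-multiplication form of the polynomial are my own choices):

        #include <cmath>
        #include <cstdio>

        double Phi(double x) {
            if (x < 0.0) return 1.0 - Phi(-x);                 // Phi(x) = 1 - Phi(-x) for x < 0
            const double gamma = 0.2316419,
                a1 = 0.319381530, a2 = -0.356563782, a3 = 1.781477937,
                a4 = -1.821255978, a5 = 1.330274429;
            const double inv_sqrt2pi = 0.3989422804014327;     // 1 / sqrt(2*pi)
            double k    = 1.0 / (1.0 + gamma * x);
            double phi  = inv_sqrt2pi * std::exp(-0.5 * x * x);   // standard normal pdf
            double poly = k * (a1 + k * (a2 + k * (a3 + k * (a4 + k * a5))));
            return 1.0 - phi * poly;
        }

        int main() {
            std::printf("Phi(1.96) = %.7f\n", Phi(1.96));      // approx 0.9750021
            return 0;
        }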

  • 4. Lognormal Random Variable.

    Let Y = e^X and X be a N(μ, σ²) random variable. Then Y is a lognormal random variable.

    Exercise: Show that

    E(Y) = e^{μ + σ²/2},   E(Y²) = e^{2μ + 2σ²}.

    5. An Important Formula in Pricing European Options.

    If V is lognormally distributed and the standard deviation of ln V is s, then¹

    E(max(V − K, 0)) = E(V) Φ(d1) − K Φ(d2),

    where

    d1 = (1/s) ln(E(V)/K) + s/2   and   d2 = d1 − s.

    ¹ K will later be denoted by X. We use a different notation here to avoid an abuse of notation, as K is not a random variable.

    78

  • E(V − K)⁺ = E(V) Φ(d1) − K Φ(d2),   d1 = (1/s) ln(E(V)/K) + s/2,   d2 = d1 − s.

    Proof.

    Let g be the pdf of V . Then

    E(V K)+ =

    (v K)+ g(v) dv = K

    (v K) g(v) dv.

    As V is lognormal, ln V is normal N(m, s2), where m = ln(E(V )) 12 s2.Let Y := lnVm

    s, i.e. V = em+s Y . Then Y N(0, 1) with pdf: (y) = 1

    2e

    y2

    2 .

    E(V K)+ = E(em+s Y K)+ = lnKm

    s

    (em+s y K) (y) dy

    =

    lnKm

    s

    em+s y (y) dy K lnKm

    s

    (y) dy

    = I1 K I2 .

    79

  • I1 =

    lnKm

    s

    12

    ey2

    2 +m+s y dy

    =

    lnKm

    s

    12

    e(ys)2

    2 +m+s2

    2 dy

    = em+s2

    2

    lnKm

    s s

    12

    ey2

    2 dy [y s 7 y]

    = em+s2

    2

    [1

    (lnK m

    s s)]

    = em+s2

    2

    (lnK m

    s+ s

    )= elnE(V )

    ( lnK + lnE(V ) s22

    s+ s

    )= E(V )

    (1

    slnE(V )

    K+s

    2

    )= E(V ) (d1) ,

    on recalling m = ln(E(V )) 12 s2.

    80

  • Similarly

    I2 = 1 (lnK m

    s

    )=

    (lnK m

    s

    )= (d1 s) = (d2) .

    81
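    A hedged C++ sketch of the formula E(V − K)⁺ = E(V) Φ(d1) − K Φ(d2); here Φ is computed with std::erfc from <cmath> rather than the polynomial approximation (3), and the numerical inputs are my own example:

        #include <cmath>
        #include <cstdio>

        double Phi(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }   // standard normal cdf

        int main() {
            double EV = 100.0, K = 95.0, s = 0.25;          // E(V), K and the std. dev. of ln V
            double d1 = std::log(EV / K) / s + 0.5 * s;     // d1 = (1/s) ln(E(V)/K) + s/2
            double d2 = d1 - s;
            std::printf("E(V-K)^+ = %.6f\n", EV * Phi(d1) - K * Phi(d2));
            return 0;
        }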

  • 6. Correlated Random Variables.

    Assume X = (X1, . . . , Xn) is an n-vector of random variables.

    The mean of X is an n-vector = (E(X1), . . . , E(Xn)).

    The covariance of X is an n n-matrix with componentsij = (CovX)ij = E((Xi i)(Xj j)) .

    The variance of Xi is given by 2i = ii and the correlation between Xi and Xj is

    given by ij =ijij

    .

    X is called a multi-dimensional normal vector, written as X N(,), if X has pdf

    f (x) =1

    (2)n/21

    (det)1/2exp

    {12(x )T1(x )

    }where is a symmetric positive definite matrix.

    82

  • 7. Convergence.

    Let {Xn} be a sequence of random variables. There are four types of convergenceconcepts associated with {Xn}: Almost sure convergence, written Xn a.s. X , if there exists a null set N suchthat for all \N one has

    Xn() X(), n. Convergence in probability, written Xn P X , if for every > 0 one has

    P (|Xn X| > ) 0, n. Convergence in norm, written Xn L

    p X , if Xn,X Lp andE|Xn X|p 0, n.

    Convergence in distribution, written Xn D X , if for any x at which P (X x)is continuous one has

    P (Xn x) P (X x), n.83

  • 8. Strong Law of Large Numbers.

    Let {Xn} be independent, identically distributed (iid) random variables with finiteexpectation E(X1) = . Then

    Zn

    n

    a.s. where Zn = X1 + +Xn.

    9. Central Limit Theorem. Let {Xn} be iid random variables with finite expecta-tion and finite variance 2 > 0. For each n, let

    Zn = X1 + +Xn.Then

    Znn n

    =Zn n

    n

    D Z

    where Z is a N(0, 1) random variable, i.e.,

    P (Zn n

    n z) 1

    2

    z

    ex2

    2 dx, n.

    84

  • 10. LindebergFeller Central Limit Theorem.

    Suppose X is a triangular array of random variables, i.e.,

    X = {Xn1 , Xn2 , . . . , Xnk(n) : n {1, 2, . . .}}, with k(n) as n,such that, for each n, Xn1 , . . . , X

    nk(n) are independently distributed and are bounded

    in absolute value by a constant yn with yn 0. LetZn = X

    n1 + +Xnk(n).

    If E(Zn) and var(Zn) 2 > 0, then Zn converges in distribution to a normallydistributed random variable with mean and variance 2.

    Note: LindebergFeller implies the Central Limit Theorem (5.9).

    85

  • If X1, X2, . . . are iid with expectation and variance 2, then define

    Xni :=Xi

    n, i = 1, 2, . . . , k(n) := n.

    For each n, Xn1 , . . . , Xnk(n) are independent and

    E(Xni ) =E(Xi)

    n= n

    = 0 ,

    Var(Xni ) =1

    n2VarXi = 1

    n22 =

    1

    n.

    Let Zn = Xn1 + +Xnk(n) = (

    ni=1Xi n) 1n , then

    E(Zn) =

    k(n)i=1

    E(Xni ) = 0 ,

    Var(Zn) =

    k(n)i=1

    Var(Xni ) = 1 .

    Hence, by LindebergFeller,

    ZnD Z N(0, 1) .

    86

  • 6. Optimization

    1. Unconstrained Optimization.

    Given f : Rn R.Minimize f (x) over x Rn.f has a local minimum at a point x if f (x) f (x) for all x near x, i.e.

    > 0 s.t. f (x) f (x) x : x x < .

    f has a global minimum at x if

    f (x) f (x) x Rn.

    87

  • 2. Optimality Conditions.

    First order necessary conditions:Suppose that f has a local minimum at x and that f is continuously differentiable

    in an open neighbourhood of x. Thenf (x) = 0. (x is called a stationary point.) Second order sufficient Conditions:Suppose that f is twice continuously differentiable in an open neighbourhood of

    x and that f (x) = 0 and 2f (x) is positive definite. Then x is a strict localminimizer of f .

    Example: Show that f = (2x21 x2)(x21 2x2) has a minimum at (0, 0) along anystraight line passing through the origin, but f has no minimum at (0, 0).

    Exercise: Find the minimum solution of

    f (x1, x2) = 2x21 + x1x2 + x

    22 x1 3x2. (4)

    (Answer. 17 (1, 11).)

    88

  • Sufficient Condition.

    Taylor gives for any d Rn:f (x + d) = f (x) +f (x)T d + 12 dT 2f (x + d) d (0, 1) .

    If x is not strict local minimizer, then

    {xk} Rn \ {x} : xk x s.t. f (xk) f (x) .Define dk :=

    xkxxkx. Then dk = 1 and there exists a subsequence {dkj} such that

    dkj d? as j and d? = 1. W.l.o.g. we assume dk d? as k .f (x) f (xk) = f (x + xk x dk)

    = f (x) + xk xf (x)T dk + 12 xk x2 dTk 2f (x + k xk x dk) dk= f (x) + 12 xk x2 dTk 2f (x + k xk x dk) dk .

    Hence dTk 2f (x + k xk x dk) dk 0, and on letting k dT? 2f (x) d? 0 .

    As d? 6= 0, this is a contradiction to 2f (x) being symmetric positive definite. Hence xis a strict local minimizer.

    89

  • Example 6.2. Show that f = (2x21x2)(x21 2x2) has a minimum at (0, 0) along anystraight line passing through the origin, but f has no minimum at (0, 0).

    Answer.

    Straight line through (0, 0): x2 = x1, R fixed.g(r) := f (r, r) = (2 r2 r) (r2 2 r)g(r) = 8 r3 15 r2 + 42 r, g(r) = 24 r2 30 r + 42

    g(0) = 0 and g(0) = 42 > 0 .Hence r = 0 is a minimizer for g (0, 0) is a minimizer for f along any straightline.

    Now let (xk1, xk2) = (

    1k, 1k2) (0, 0) as k . Then

    f (xk1, xk2) =

    1

    k21

    k2< 0 = f (0, 0) k .

    Hence (0, 0) is not a minimizer for f .

    [Note: f (0, 0) = 0, but 2f (0, 0) =(0 0

    0 4

    ).]

    90

  • 3. Convex Optimization.

    Exercise. When f is convex, any local minimizer x is a global minimizer of f . If in

    addition f is differentiable, then any stationary point x is a global minimizer of f .

    (Hint. Use a contradiction argument.)

    91

  • Exercise 6.3.

    When f is convex, any local minimizer x is a global minimizer of f .

    Proof.

    Suppose x is a local minimizer, but not a global minimizer. Then

    x s.t. f (x) < f (x).Since f is convex, we have that

    f ( x + (1 ) x) f (x) + (1 ) f (x)< f (x) + (1 ) f (x) = f (x) (0, 1] .

    Let x := x + (1 ) x. Thenx x and f (x) < f (x) as 0.

    This is a contradiction to x being a local minimizer.

    Hence x is a global minimizer for f .

    92

  • 4. Line Search.

    The basic procedure to solve numerically an unconstrained problem (minimize f (x)

    over x Rn) is as follows.(i) Choose an initial point x0 Rn and an initial search direction d0 Rn and set

    k = 0.

    (ii) Choose a step size k and define a new point xk+1 = xk + k d

    k. Check if the

    stopping criterion is satisfied (f (xk+1) < ?). If yes, xk+1 is the optimalsolution, stop. If no, go to (iii).

    (iii) Choose a new search direction dk+1 (descent direction) and set k = k + 1. Go to

    (ii).

    The essential and most difficult part in any search algorithm is to choose a descent

    direction dk and a step size k with good convergence and stability properties.

    93

  • 5. Steepest Descent Method.

    f is differentiable.

    Choose dk = gk, where gk = f (xk), and choose k s.t.f (xk + k d

    k) = minR

    f (xk + dk).

    Note that the successive descent directions are orthogonal to each other, i.e. (gk)T gk+1 =

    0, and the convergence for some functions may be very slow, called zigzagging.

    Exercise.

    Use the steepest descent (SD) method to solve (4) with the initial point x0 = (1, 1).

    (Answer. First three iterations give x1 = (0, 1), x2 = (0, 32), and x3 = (18, 32).)

    94

  • Steepest Descent.

    Taylor gives:

    f (xk + dk) = f (xk) + f (xk)T dk +O(2).

    As

    f (xk)T dk = f (xk) dk cos k,with k the angle between dk and f (xk), we see that dk is a descent direction ifcos k < 0. The descent is steepest when k = cos k = 1.Zigzagging.

    k is minimizer of () := f (xk + dk) with dk = gk.

    Hence

    0 = (k) = f (xk + k dk)T dk = f (xk+1)T (gk) = (gk+1)T gk .Hence dk+1 dk, which leads to zigzagging.

    95

  • Exercise 6.5.

    Use the SD method to solve (4) with the initial point x0 = (1, 1). [min: (1/7) (−1, 11).]

    Answer.

    ∇f = (4 x1 + x2 − 1, 2 x2 + x1 − 3).

    Iteration 0:  d0 = −∇f(x0) = (−4, 0) ≠ (0, 0).

    φ(α) = f(x0 + α d0) = f(1 − 4α, 1) = 2 (1 − 4α)² − 2

    ⟹  minimum at α0 = 1/4

    ⟹  x1 = x0 + α0 d0 = (0, 1),  d1 = −∇f(x1) = −(0, −1) = (0, 1) ≠ (0, 0).

    Iteration 1:  x2 = (0, 3/2),  d2 = (−1/2, 0).

    Iteration 2:  x3 = (−1/8, 3/2),  d3 = (0, 1/8).

    96
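    A hedged C++ sketch of this steepest descent iteration for the quadratic (4); since f is quadratic with Hessian H, the exact line-search step has the closed form α = gᵀg / (gᵀ H g), which I use here instead of minimising φ(α) symbolically as in the notes:

        #include <cmath>
        #include <cstdio>

        int main() {
            double H[2][2] = {{4, 1}, {1, 2}};                 // Hessian of f
            double x[2] = {1, 1};                              // initial point x^0
            for (int k = 0; k < 20; ++k) {
                double g[2] = {4*x[0] + x[1] - 1, x[0] + 2*x[1] - 3};   // gradient of f
                double gg = g[0]*g[0] + g[1]*g[1];
                if (std::sqrt(gg) < 1e-10) break;              // stop when the gradient is small
                double Hg0 = H[0][0]*g[0] + H[0][1]*g[1];
                double Hg1 = H[1][0]*g[0] + H[1][1]*g[1];
                double alpha = gg / (g[0]*Hg0 + g[1]*Hg1);     // exact line-search step size
                x[0] -= alpha * g[0];
                x[1] -= alpha * g[1];
                std::printf("x^%d = (%.6f, %.6f)\n", k + 1, x[0], x[1]);
            }
            return 0;
        }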

  • 6. Newton Method.

    f is twice differentiable.

    Choose dk = [Hk]1gk, where Hk = 2f (xk).Set xk+1 = xk + dk.

    If Hk is positive definite then dk is a descent direction.

    The main drawback of the Newton method is that it requires the computation of

    2f (xk) and its inverse, which can be difficult and time-consuming.Exercise.

    Use the Newton method to solve (4) with x0 = (1, 1).

    (Answer. First iteration gives x1 = 17 (1, 11).)

    97

  • Newton Method.

    Taylor gives

    f (xk + d) f (xk) + dT f (xk) + 12 dT 2f (xk) d =: m(d)mindm(d) m(d) = 0

    f (xk) +2f (xk) d = 0 .Hence choose dk = [2f (xk)]1f (xk) = [Hk]1gk.If Hk is positive definite, then so is (Hk)1, and we get

    (dk)T gk = (gk)T (Hk)1 gk k gk2 < 0for some k > 0.

    Hence dk is a descent direction.

    [Aside: The Newton method for minx f (x) is equivalent to the Newton method for finding

    a root of the system of nonlinear equations f (x) = 0.]

    98

  • Exercise 6.6.

    Use the Newton method to minimize

    f (x1, x2) = 2x21 + x1x2 + x

    22 x1 3x2

    with x0 = (1, 1)T .

    Answer.

    f =(4 x1 + x2 12 x2 + x1 3

    ), H := 2f =

    (4 1

    1 2

    ).

    H1 =1

    detH

    (2 11 4

    )= 17

    (2 11 4

    ).

    Iteration 0: x0 = (1, 1)T , f (x0) = (4, 0)T .x1 = x0 [H0]1f (x0) =

    (1

    1

    ) 17

    (2 11 4

    )(4

    0

    )= 17

    (111

    ).

    f (x1) = (0, 0)T and H positive definite. x1 is minimum point.

    99

  • 7. Choice of Stepsize.

    In computing the step size αk we face a tradeoff. We would like to choose αk to give a substantial reduction of f, but at the same time we do not want to spend too much time making the choice. The ideal choice would be the global minimizer of the univariate function φ : R → R defined by

    φ(α) = f(xk + α dk),   α > 0,

    but in general it is too expensive to identify this value.

    A common strategy is to perform an inexact line search to identify a step size that achieves adequate reductions in f at minimum cost.

    αk is normally chosen to satisfy the Wolfe conditions:

    f(xk + αk dk) ≤ f(xk) + c1 αk (gk)ᵀ dk   (5)

    ∇f(xk + αk dk)ᵀ dk ≥ c2 (gk)ᵀ dk,   (6)

    with 0 < c1 < c2 < 1. (5) is called the sufficient decrease condition, and (6) is the curvature condition.

    100

  • Choice of Stepsize.

    The simple condition

    f (xk + k dk) < f (xk) ()

    is not appropriate, as it may not lead to a sufficient reduction.

    Example: f (x) = (x 1)2 1. So min f (x) = 1, but we can choose xk satisfying ()such that f (xk) = 1

    k 0.

    Note that the sufficient decrease condition (5)

    () = f (xk + dk) `() := f (xk) + c1 (gk)Tdkyields acceptable regions for . Here () < `() for small > 0, as (gk)Tdk < 0 for

    descent directions.

    The curvature condition (6) is equivalent to

    () c2 (0) [ > (0) ]i.e. a condition on the desired slope and so rules out unacceptably short steps . In

    practice c1 = 104 and c2 = 0.9.

    101

  • 8. Convergence of Line Search Methods.

    An algorithm is said to be globally convergent if

    limk

    gk = 0.

    It can be shown that if the step sizes satisfy the Wolfe conditions

    then the steepest descent method is globally convergent, so is the Newton method provided the Hessian matrices 2f (xk) have a boundedcondition number and are positive definite.

    Exercise. Show that the steepest descent method is globally convergent if the

    following conditions hold

    (a) k satisfies the Wolfe conditions,

    (b) f (x) M x Rn,(c) f C1 and f is Lipschitz: f (x)f (y) L x y x, y Rn.

    102

  • [Hint: Show thatk=0

    gk2

  • Exercise 6.8.

    Assume that dk is a descent direction, i.e. (gk)T dk < 0, where gk := f (xk). Then if1. k satisfies the Wolfe conditions,

    2. f (x) M x Rn,3. f C1 and f is Lipschitz, i.e. f (x)f (y) L x y x, y Rn,

    it holds thatk=0

    cos2 k gk2

  • The Lipschitz condition yields that

    (gk+1 gk)T dk gk+1 gk dk = f (xk+1)f (xk) dk L xk+1 xk dk = k L dk2 . ()

    Combining () and () gives k c2 1L

    (gk)T dk

    dk2 , and hence

    k (gk)T dk c2 1

    L

    [(gk)T dk]2

    dk2Together with Wolfe condition (5) we get

    f (xk+1) f (xk) + c1 c2 1L

    [(gk)T dk]2

    dk2 = f (xk) c cos2 k gk2 ,

    105

  • where c := c11c2L

    > 0.

    f (xk+1) f (xk) c cos2 k gk2 f (x0) ck

    j=0

    cos2 j gj2

    k

    j=0

    cos2 j gj2 1c(f (x0)M) k

    j=0

    cos2 j gj2

  • 9. Popular Search Methods.

    In practice the steepest descent method and the Newton method are rarely used

    due to the slow convergence rate and the difficulty in computing Hessian matrices,

    respectively.

    The popular search methods are

    the conjugate gradient method (variation of SD method with superlinear conver-gence) and

    the quasi-Newton method (variation of Newton method without computation ofHessian matrices).

    There are some efficient algorithms based on the trust-region approach. See Fletcher

    (2000) for details.

    107

  • 10. Constrained Optimization.

    Minimize f (x) over x Rn subject tothe equality constraints

    hi(x) = 0, i = 1, . . . , l ,

    and the inequality constraints

    gj(x) 0, j = 1, . . . ,m .

    Assume that all functions involved are differentiable.

    108

  • 11. Linear Programming.

    The problem is to minimize

    z = c1 x1 + + cn xnsubject to

    ai1 x1 + + ain xn bi, i = 1, . . . ,m ,and

    x1, . . . , xn 0.LPs can be easily and efficiently solved with the simplex algorithm or the interior

    point method.

    MS-Excel has a good in-built LP solver capable of solving problems up to 200 vari-

    ables. MATLAB with optimization toolbox also provides a good LP solver.

    109

  • 12. Graphic Method.

    If an LP problem has only two decision variables (x1, x2), then it can be solved by

    the graphic method as follows:

    First draw the feasible region from the given constraints and a contour line of theobjective function,

    then, on establishing the increasing direction perpendicular to the contour line,find the optimal point on the boundary of the feasible region,

    then find two linear equations which define that point, and finally solve the two equations to obtain the optimal point.Exercise. Use the graphic method to solve the LP:

    minimize z = 3 x1 2 x2subject to x1 + x2 80, 2x1 + x2 100, x1 40, and x1, x2 0.(Answer. x1 = 20, x2 = 60.)

    110

  • 13. Quadratic Programming.

    Minimize

    xTQx + cTx

    subject to

    Ax b and x 0,where Q is an n n symmetric positive definite matrix, A is an n m matrix,x, c Rn, b Rm.To solve a QP problem, one

    first derives a set of equations from the KuhnTucker conditions, and then applies the Wolfe algorithm or the Lemke algorithm to find the optimalsolution.

    The MS-Excel solver is capable of solving reasonably sized QP problems, similarly

    for MATLAB.

    111

  • 14. Kuhn–Tucker Conditions.

    min f(x) over x ∈ R^n s.t. hi(x) = 0, i = 1, . . . , l;  gj(x) ≤ 0, j = 1, . . . , m.

    Assume that x∗ is an optimal solution.

    Under some regularity conditions, called the constraint qualifications, there exist two vectors u = (u1, . . . , ul) and v = (v1, . . . , vm), called the Lagrange multipliers, such that the following set of conditions is satisfied:

    ∂L/∂xk (x∗, u, v) = 0,   k = 1, . . . , n

    hi(x∗) = 0,   i = 1, . . . , l

    gj(x∗) ≤ 0,   j = 1, . . . , m

    vj gj(x∗) = 0,  vj ≥ 0,   j = 1, . . . , m

    where

    L(x, u, v) = f(x) + Σ_{i=1}^{l} ui hi(x) + Σ_{j=1}^{m} vj gj(x)

    is called the Lagrange function or Lagrangian.

    112

  • Furthermore, if f : Rn R and hi, gj : Rn R are convex, then x is an optimalsolution if and only if (x, u, v) satisfies the KuhnTucker conditions.

    This holds in particular, when f is convex and hi, gj are linear.

    Example.

    Find the minimum solution to the function x2 x1 subject to x21 + x22 1.Exercise.

    Find the minimum solution to the function x21+x222x14x2 subject to x1+2x2 2

    and x2 0.(Answer. (x1, x2) =

    15 (2, 4).)

    113

  • Interpretation of the Kuhn-Tucker conditions

    Assume that no equality constraints are present.

    If x* is an interior point, i.e. no constraints are active, then we recover the usual
    optimality condition: ∇f (x*) = 0.

    Now assume that x* lies on the boundary of the feasible set and let g_{j_1}, . . . , g_{j_K} be
    the active constraints at x*. Then a necessary condition for optimality is that we cannot
    find a descent direction for f at x* that is also a feasible direction. Such a vector cannot
    exist if

    −∇f (x*) = Σ_{k=1}^{K} v_{j_k} ∇g_{j_k}(x*)   with   v_{j_k} ≥ 0 .   (*)

    This is because, if d ∈ Rn is a descent direction, then ∇f (x*)^T d < 0 and hence
    Σ_k v_{j_k} ∇g_{j_k}(x*)^T d > 0.

    So there must exist a j_k such that ∇g_{j_k}(x*)^T d > 0. But that means that d is an
    ascent direction for g_{j_k}, and as g_{j_k} is active at x*, d is not a feasible direction.

    If we require vj gj(x*) = 0 for all inactive constraints, then we can re-write (*) as

    0 = ∇f (x*) + Σ_{j=1}^{m} vj ∇gj(x*)   with   vj ≥ 0 .

    These are the KT conditions.

    115

  • Application of Kuhn-Tucker: LP Duality

    Let b ∈ Rm, c ∈ Rn and A ∈ Rm×n.

    min cT x s.t. A x ≥ b, x ≥ 0 .   (P)

    Equivalent to min cT x s.t. b − A x ≤ 0, −x ≤ 0 .

    Lagrangian: L = cT x + vT (b − A x) + yT (−x).

    Hence x* is the solution if there exist v and y such that

    ∇L = c − AT v − y = 0  ⟹  y = c − AT v,

    KT conditions: vT (b − A x*) = 0, yT x* = 0,

    v, y ≥ 0, b − A x* ≤ 0, x* ≥ 0.

    116

  • Eliminate y to find v, x*:

    A x* ≥ b, x* ≥ 0   (feasible region: primal)

    AT v ≤ c, v ≥ 0   (feasible region: dual)

    vT (b − A x*) = 0

    (x*)T (c − AT v) = 0

    ⟹  (x*)T c = (x*)T AT v = vT b

    Hence v ∈ Rm solves the dual:

    max bT v s.t. AT v ≤ c, v ≥ 0 .   (D)

    117

  • Here we have used that

    cT x* = min_{x≥0, A x≥b} cT x = min_{x≥0} max_{v≥0} ( cT x + vT (b − A x) )

          ≥ max_{v≥0} min_{x≥0} ( cT x + vT (b − A x) )

          = max_{v≥0} min_{x≥0} ( vT b + xT (c − AT v) )

          = max_{v≥0, AT v ≤ c} vT b

          ≥ vT b = cT x* ,

    where in the last line v is the multiplier constructed above. Hence all inequalities are in
    fact equalities, and v attains the maximum in (D).

    118

  • Example 6.14.

    Find the minimum of the function x2 − x1 subject to x1² + x2² ≤ 1.

    Answer.

    L = x2 − x1 + v (x1² + x2² − 1), so the KT conditions become

    Lx1 = −1 + 2 v x1 = 0   (1)

    Lx2 = 1 + 2 v x2 = 0   (2)

    x1² + x2² ≤ 1   (3)

    v (x1² + x2² − 1) = 0, v ≥ 0   (4)

    (1) ⟹ v > 0 and hence x1 = 1/(2v), x2 = −1/(2v).

    Plugging this into (4) yields 2/(4 v²) = 1 and hence

    v = 1/√2  ⟹  x1 = 1/√2, x2 = −1/√2 ;

    with the optimal value being z = −√2.

    119
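    A minimal numerical check of this answer (not part of the original notes, standard C++ only):
    it evaluates the stationarity condition, the constraint and the complementarity product at the
    candidate point computed above.

```cpp
// Kuhn-Tucker check for Example 6.14: min x2 - x1 s.t. g(x) = x1^2 + x2^2 - 1 <= 0.
// Candidate from the slides: x = (1/sqrt(2), -1/sqrt(2)), v = 1/sqrt(2).
#include <cmath>
#include <cstdio>

int main() {
    const double x1 = 1.0 / std::sqrt(2.0), x2 = -1.0 / std::sqrt(2.0);
    const double v  = 1.0 / std::sqrt(2.0);

    // grad f = (-1, 1), grad g = (2 x1, 2 x2); stationarity: grad f + v grad g = 0
    const double st1 = -1.0 + v * 2.0 * x1;
    const double st2 =  1.0 + v * 2.0 * x2;
    const double g   = x1 * x1 + x2 * x2 - 1.0;

    std::printf("stationarity: (%g, %g)\n", st1, st2);          // both ~ 0
    std::printf("g(x) = %g, v*g(x) = %g, v = %g >= 0\n", g, v * g, v);
    std::printf("z = %g (expected -sqrt(2) = %g)\n", x2 - x1, -std::sqrt(2.0));
    return 0;
}
```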

  • Example.

    min x1 s.t. x2 − x1³ ≤ 0, x1 ≤ 1, −x2 ≤ 0.

    Since x1³ ≥ x2 ≥ 0 we have x1 ≥ 0, and hence x* = (0, 0) is the unique minimizer.

    Lagrangian: L = x1 + v1 (x2 − x1³) + v2 (x1 − 1) + v3 (−x2).

    KT conditions for a feasible point x:

    ∇L = ( 1 − 3 v1 x1² + v2 , v1 − v3 )^T = 0   (1)

    v1 (x2 − x1³) = 0, v2 (x1 − 1) = 0, v3 (−x2) = 0   (2)

    v1, v2, v3 ≥ 0   (3)

    Check the KT conditions at x* = (0, 0):

    (1) ⟹ v2 = −1 < 0, which is impossible!

    The KT conditions are not satisfied, since the constraint qualifications do not hold.

    Here g1 = x2 − x1³ and g3 = −x2 are active at (0, 0), and ∇g1 = (0, 1)^T, ∇g3 = (0, −1)^T.
    Hence ∇g1 and ∇g3 are not linearly independent!

  • Exercise 6.14

    Find the minimum of the function x1² + x2² − 2 x1 − 4 x2 subject to x1 + 2 x2 ≤ 2

    and x2 ≥ 0.

    Answer.

    Lagrangian: L = x1² + x2² − 2 x1 − 4 x2 + v1 (x1 + 2 x2 − 2) + v2 (−x2)

    KT conditions:

    ∇L = ( 2 x1 − 2 + v1 , 2 x2 − 4 + 2 v1 − v2 )^T = 0   (1)

    v1 (x1 + 2 x2 − 2) = 0, v2 x2 = 0, v1, v2 ≥ 0   (2)

    x1 + 2 x2 ≤ 2, x2 ≥ 0   (3)

    If x1 + 2 x2 − 2 < 0 then v1 = 0 ⟹ x1 = 1, x2 = 2 + (1/2) v2 ≥ 2. Hence from (2),
    v2 = 0, and so x1 = 1, x2 = 2. But that contradicts (3), and so it must hold that
    x1 + 2 x2 − 2 = 0.

    If x2 > 0, then v2 = 0; solving (1) together with x1 + 2 x2 − 2 = 0 gives x1 = 2/5,
    x2 = 4/5; and v1 = 6/5, v2 = 0. ✓

    121

  • If x2 = 0 then x1 = 2 − 2 x2 = 2. But then from (1), v1 = −2 < 0. Impossible.

    122

  • 7. Lattice (Tree) Methods

    1. Concepts in Option Pricing Theory.

    A call (or put) option is a contract that gives the holder the right to buy (or sell)

    a prescribed asset (underlying asset) by a certain date T (expiration date) for a

    predetermined price X (exercise price).

    A European option can only be exercised on the expiration date while an American

    option can be exercised at any time prior to the expiration date.

    The other party to the holder of the option contract is called the writer.

    The holder and the writer are said to be in long and short positions of the contract,

    respectively.

    The terminal payoff of a European call (or put) option is (S(T ) − X)+ := max(S(T ) − X, 0)
    (or (X − S(T ))+, respectively), where S(T ) is the underlying asset price at time T .

    123

  • 2. Random Walk Model and Assumption.

    Assume S(t) and V (t) are the asset price and the option price at time t, respectively,

    and the current time is t and the current asset price is S, i.e. S(t) = S.

    After one period of time Δt, the asset price S(t + Δt) is either uS (up state)

    with probability q or dS (down state) with probability 1 − q, where q is a real

    (subjective) probability.

    To avoid riskless arbitrage opportunities, we must have

    u > R > d ,

    where R := e^{r Δt} and r is the riskless interest rate.

    Here r is the riskless interest rate that allows for unlimited borrowing or lending at

    the same rate r.

    Investing $1 at time t yields a value (return) at time t + Δt of $R = $ e^{r Δt}

    (continuous compounding).

    124

  • Continuous compounding

    Saving an amount B at time t yields, with interest rate r and without compounding,

    the value (1 + r Δt) B at time t + Δt.

    Compounding the interest n times yields

    (1 + r Δt / n)^n B → e^{r Δt} B  as n → ∞ .

    Example:

    Consider Δt = 1 and r = 0.05.

    Then R = e^r = 1.0512711, equivalent to an AER of 5.127%.

    n      (1 + r/n)^n    AER

    1      1.05           5 %
    4      1.05094534     5.095 %
    12     1.05116190     5.116 %
    52     1.05124584     5.125 %
    365    1.05126750     5.127 %

    125
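    A short sketch (standard C++ only) that reproduces the compounding table above and its
    continuous-compounding limit e^r; the rate r = 0.05 is the example value from the slide.

```cpp
// (1 + r/n)^n for various n, converging to e^r as n grows.
#include <cmath>
#include <cstdio>

int main() {
    const double r = 0.05;                 // annual rate, dt = 1 year
    const int periods[] = {1, 4, 12, 52, 365};

    std::printf("%8s %14s %10s\n", "n", "(1+r/n)^n", "AER");
    for (int n : periods) {
        const double growth = std::pow(1.0 + r / n, n);
        std::printf("%8d %14.8f %9.3f%%\n", n, growth, 100.0 * (growth - 1.0));
    }
    std::printf("limit: e^r = %.8f\n", std::exp(r));
    return 0;
}
```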

  • No Arbitrage: u > R > d.

    If R ≥ u, then short sell α > 0 units of the asset and deposit αS in the bank, giving a
    portfolio value at time t of:

    Π(t) = 0

    and at time t + Δt of

    Π(t + Δt) = R (αS) − α S(t + Δt) ≥ R (αS) − α (uS) ≥ 0 .

    Moreover, in the down state (with probability 1 − q > 0) the value is

    Π(t + Δt) = R (αS) − α (dS) > 0 .

    Hence we can make a riskless profit with positive probability.

    No arbitrage implies u > R.

    A similar argument yields that d < R.

    126

  • 3. Replicating Portfolio.

    We form a portfolio consisting of

    Δ units of the underlying asset (Δ > 0 buying, Δ < 0 short selling) and

    a cash amount B in a riskless cash bond (B > 0 lending, B < 0 borrowing).

    If we choose

    Δ = ( V (uS, t + Δt) − V (dS, t + Δt) ) / ( uS − dS ) ,

    B = ( u V (dS, t + Δt) − d V (uS, t + Δt) ) / ( uR − dR ) ,

    we have replicated the payoff of the option at time t + Δt, no matter how the asset

    price changes.

    127

  • Replicating Portfolio.

    Value of the portfolio at time t : Π(t) = Δ S + B, and at time t + Δt:

    Π(t + Δt) = Δ S(t + Δt) + R B = { Δ uS + R B =: Π_u^{Δt}  with prob. q
                                      Δ dS + R B =: Π_d^{Δt}  with prob. 1 − q

    Call option value at time t + Δt (expiration time):

    C(t + Δt) = { (uS − X)+ =: C_u^{Δt}  with prob. q
                  (dS − X)+ =: C_d^{Δt}  with prob. 1 − q

    To replicate the call option, we must have Π_u^{Δt} = C_u^{Δt} and Π_d^{Δt} = C_d^{Δt}. Hence

    Δ uS + R B = C_u^{Δt} ,   Δ dS + R B = C_d^{Δt}

    ⟹  Δ = ( C_u^{Δt} − C_d^{Δt} ) / ( uS − dS ) ,   B = ( u C_d^{Δt} − d C_u^{Δt} ) / ( uR − dR )

    This gives the fair price for the call option at time t: Π(t) = Δ S + B.

    128

  • Replicating Portfolio.

    The value of the call option at time t is:

    C(t) = Π(t) = Δ S + B = ( C_u^{Δt} − C_d^{Δt} ) / ( uS − dS ) · S + ( u C_d^{Δt} − d C_u^{Δt} ) / ( uR − dR )

         = (1/R) [ (R − d)/(u − d) · C_u^{Δt} + (u − R)/(u − d) · C_d^{Δt} ]

         = (1/R) [ p C_u^{Δt} + (1 − p) C_d^{Δt} ] ,

    where

    p = (R − d)/(u − d) .

    No arbitrage argument: If C(t) < Π(t), then buy the call option at price C(t) and sell the
    portfolio at Π(t) (that is, short sell Δ units of the asset and lend B amount of cash to the
    bank). This gives an instantaneous profit of Π(t) − C(t) > 0. And we know that at
    time t + Δt, the payoff from our call option will exactly compensate for the value of the
    portfolio, as C(t + Δt) = Π(t + Δt).

    A similar argument for the case C(t) > Π(t) yields that C(t) = Π(t).

    129

  • 4. Risk Neutral Option Price.

    The current option price is given by

    V (S, t) = (1/R) ( p V (uS, t + Δt) + (1 − p) V (dS, t + Δt) ) ,   (7)

    where p = (R − d)/(u − d). Note that

    1. the real probability q does not appear in the option pricing formula,

    2. p is a probability (0 < p < 1) since d < R < u, and

    3. p (uS) + (1 − p) (dS) = R S, so p is called the risk neutral probability and the
       process of finding V is called risk neutral valuation.

    130
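    A minimal one-period sketch of the above (standard C++ only): the inputs S, X, u, d, r, Δt
    are illustrative values, not taken from the notes. It prices a one-period call both by
    replication (Δ, B) and by the risk-neutral formula (7); the two must agree.

```cpp
// One-period binomial call: (i) replicating portfolio Delta*S + B, (ii) formula (7).
#include <algorithm>
#include <cmath>
#include <cstdio>

int main() {
    const double S = 100.0, X = 100.0;      // spot and strike
    const double u = 1.1, d = 0.9;          // up/down factors
    const double r = 0.05, dt = 1.0;        // riskless rate, period length
    const double R = std::exp(r * dt);      // growth factor over one period

    const double Cu = std::max(u * S - X, 0.0);   // payoff in up state
    const double Cd = std::max(d * S - X, 0.0);   // payoff in down state

    // (i) replication: Delta units of stock plus cash B
    const double Delta = (Cu - Cd) / (u * S - d * S);
    const double B     = (u * Cd - d * Cu) / (u * R - d * R);
    std::printf("replication:  Delta = %.6f, B = %.6f, price = %.6f\n",
                Delta, B, Delta * S + B);

    // (ii) risk-neutral valuation, p = (R - d)/(u - d)
    const double p = (R - d) / (u - d);
    std::printf("risk-neutral: p = %.6f, price = %.6f\n",
                p, (p * Cu + (1.0 - p) * Cd) / R);
    return 0;
}
```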

  • 5. Riskless Hedging Principle.

    We can also derive the option pricing formula (7) as follows:

    form a portfolio with a long position of one unit of the option and a short position of

    Δ units of the underlying asset.

    By choosing an appropriate Δ we can ensure such a portfolio is riskless.

    Exercise.

    Find Δ (called the delta) and derive the pricing formula (7).

    131

  • 6. Asset Price Process.

    In a risk neutral continuous time model, the asset price S is assumed to follow a

    lognormal process

    (as discussed in detail in Stochastic Processes I, omitted here).

    Over a period [t, t + τ ] the asset price S(t + τ ) can be expressed in terms of S(t) = S

    as

    S(t + τ ) = S e^{(r − σ²/2) τ + σ √τ Z}

    where r is the riskless interest rate, σ the volatility, and Z ∼ N(0, 1) a standard

    normal random variable.

    S(t + τ ) has the first moment S e^{r τ} and the second moment S² e^{(2 r + σ²) τ}.

    132

  • 7. Relations Between u, d and p.

    By equating the first and second moments of the asset price S(t + Δt) in both

    continuous and discrete time models, we obtain the relations

    S ( p u + (1 − p) d ) = S e^{r Δt},

    S² ( p u² + (1 − p) d² ) = S² e^{(2 r + σ²) Δt},

    or equivalently,

    p u + (1 − p) d = e^{r Δt},   (8)

    p u² + (1 − p) d² = e^{(2 r + σ²) Δt}.   (9)

    There are two equations and three unknowns u, d, p. An extra condition is needed to

    uniquely determine a solution.

    [Note that (8) implies p = (R − d)/(u − d), where R = e^{r Δt} as before.]

    133

  • 8. Cox-Ross-Rubinstein Model.

    The extra condition is

    u d = 1.   (10)

    The solutions to (8), (9), (10) are

    u = (1/(2R)) ( W + 1 + √( (W + 1)² − 4 R² ) ),

    d = (1/(2R)) ( W + 1 − √( (W + 1)² − 4 R² ) ),

    where

    W = e^{(2 r + σ²) Δt}.

    If the higher order term O((Δt)^{3/2}) is ignored in u, d, then

    u = e^{σ √Δt},   d = e^{−σ √Δt},   p = (R − d)/(u − d) .

    These are the parameters chosen by Cox-Ross-Rubinstein for their model.

    Note that with this choice (9) is satisfied up to O((Δt)²).

    134
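    A small sketch (standard C++ only, illustrative parameters) comparing the exact solution of
    the moment equations with u d = 1 against the CRR approximation, and checking the residual
    in (9):

```cpp
// Exact u from (8)-(10) versus CRR u = exp(sigma*sqrt(dt)), d = 1/u, p = (R-d)/(u-d).
#include <cmath>
#include <cstdio>

int main() {
    const double r = 0.05, sigma = 0.2, dt = 0.01;
    const double R = std::exp(r * dt);
    const double W = std::exp((2.0 * r + sigma * sigma) * dt);

    // exact solution of the moment-matching equations with u*d = 1
    const double u_true = ((W + 1.0) + std::sqrt((W + 1.0) * (W + 1.0) - 4.0 * R * R)) / (2.0 * R);
    const double d_true = 1.0 / u_true;

    // CRR parameters (higher-order terms dropped)
    const double u = std::exp(sigma * std::sqrt(dt));
    const double d = 1.0 / u;
    const double p = (R - d) / (u - d);

    std::printf("u_true = %.10f, u_CRR = %.10f, diff = %.2e\n", u_true, u, u - u_true);
    std::printf("d_true = %.10f, d_CRR = %.10f\n", d_true, d);
    std::printf("p = %.10f, residual in (9): %.2e\n",
                p, p * u * u + (1.0 - p) * d * d - W);
    return 0;
}
```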

  • Proof.

    W = e^{(2 r + σ²) Δt}

      = p (u² − d²) + d²

      = (R − d)/(u − d) · (u + d) (u − d) + d²

      = R u − d u + R d = R u − 1 + R/u

    ⟹  R u² − (1 + W) u + R = 0 .

    As u > d, we get the unique solutions

    u_true = ( 1 + W + √( (1 + W)² − 4 R² ) ) / (2R) = (1 + W)/(2R) + √( ((1 + W)/(2R))² − 1 )

    and

    d_true = 1/u_true = ( 1 + W − √( (1 + W)² − 4 R² ) ) / (2R) = (1 + W)/(2R) − √( ((1 + W)/(2R))² − 1 ) .

    135

  • Moreover

    (1 + W)/(2R) = (1/2) ( e^{(2 r + σ²) Δt} + 1 ) e^{−r Δt}

                 = (1/2) ( 2 + (2 r + σ²) Δt + O((Δt)²) ) ( 1 − r Δt + O((Δt)²) )

                 = (1/2) ( 2 + σ² Δt + O((Δt)²) )

                 = 1 + (1/2) σ² Δt + O((Δt)²) ,

    and

    √( ((1 + W)/(2R))² − 1 ) = √( (1 + (1/2) σ² Δt + O((Δt)²))² − 1 )

                             = √( 1 + σ² Δt + O((Δt)²) − 1 )

                             = σ √Δt · √( 1 + O(Δt) )

                             = σ √Δt ( 1 + O(Δt) )

                             = σ √Δt + O((Δt)^{3/2}) .

    ⟹  u_true = 1 + (1/2) σ² Δt + σ √Δt + O((Δt)^{3/2}) .

    136

  • The Cox-Ross-Rubinstein (CRR) model uses

    u = e^{σ √Δt} = 1 + σ √Δt + (1/2) σ² Δt + O((Δt)^{3/2}) .

    Hence the first three terms in the Taylor series of the CRR value u and the true value

    u_true match. So

    u = u_true + O((Δt)^{3/2}) .

    Estimating the error in (9) for this choice of u gives

    Error = p u² + (1 − p) d² − e^{(2 r + σ²) Δt}

          = R (u + d) − 1 − e^{(2 r + σ²) Δt}

          = e^{r Δt} ( e^{σ √Δt} + e^{−σ √Δt} ) − 1 − e^{(2 r + σ²) Δt}

          = ( 1 + r Δt + O((Δt)²) ) ( 2 + σ² Δt + O((Δt)²) ) − 1 − ( 1 + (2 r + σ²) Δt + O((Δt)²) )

          = O((Δt)²) .

    137

  • 9. Jarrow-Rudd Model.

    The extra condition is

    p = 1/2.   (11)

    Exercise.

    Show that the solutions to (8), (9), (11) are

    u = R ( 1 + √( e^{σ² Δt} − 1 ) ),

    d = R ( 1 − √( e^{σ² Δt} − 1 ) ).

    Show that if O((Δt)^{3/2}) is ignored then

    u = e^{(r − σ²/2) Δt + σ √Δt},

    d = e^{(r − σ²/2) Δt − σ √Δt}.

    (These are the parameters chosen by Jarrow-Rudd for their model.)

    Also show that (8) and (9) are satisfied up to O((Δt)²).

    138

  • 10. Tian Model.

    The extra condition is

    p u³ + (1 − p) d³ = e^{3 r Δt + 3 σ² Δt}.   (12)

    Exercise.

    Show that the solutions to (8), (9), (12) are

    u = (R Q / 2) ( Q + 1 + √( Q² + 2 Q − 3 ) ),

    d = (R Q / 2) ( Q + 1 − √( Q² + 2 Q − 3 ) ),

    p = (R − d)/(u − d),

    where R = e^{r Δt} and Q = e^{σ² Δt}.

    Also show that if O((Δt)^{3/2}) is ignored, then

    p = 1/2 − (3/4) σ √Δt.

    139

  • (Note that u d = R² Q² instead of u d = 1 and that the binomial tree loses its

    symmetry about S whenever u d ≠ 1.)

    140

  • 11. Black-Scholes Equation (BSE).

    As the time interval Δt tends to zero, the one period option pricing formula, (7) with

    (8) and (9), tends to the Black-Scholes Equation

    ∂V/∂t + r S ∂V/∂S + (1/2) σ² S² ∂²V/∂S² − r V = 0.   (13)

    Proof.

    A two variable Taylor expansion at (S, t) gives

    V (uS, t + Δt) = V + VS (uS − S) + Vt Δt
                     + (1/2) ( VSS (uS − S)² + 2 VSt (uS − S) Δt + Vtt (Δt)² )
                     + higher order terms .

    Similarly,

    V (dS, t + Δt) = V + VS (dS − S) + Vt Δt
                     + (1/2) ( VSS (dS − S)² + 2 VSt (dS − S) Δt + Vtt (Δt)² )
                     + higher order terms .

    141

  • Substituting into the right hand side of (7) gives

    V = { V + S VS [ p (u − 1) + (1 − p) (d − 1) ] + Vt Δt

          + (1/2) [ S² VSS ( p (u − 1)² + (1 − p) (d − 1)² )

                    + 2 S VSt ( p (u − 1) + (1 − p) (d − 1) ) Δt

                    + Vtt (Δt)² ]

          + higher order terms } e^{−r Δt} .

    Now it follows from (8) that

    p (u − 1) + (1 − p) (d − 1) = p u + (1 − p) d − 1 = e^{r Δt} − 1 = r Δt + O((Δt)²) .

    Similarly, (8) and (9) give that

    p (u − 1)² + (1 − p) (d − 1)² = p u² + (1 − p) d² − 2 ( p u + (1 − p) d ) + 1

                                  = e^{(2 r + σ²) Δt} − 2 e^{r Δt} + 1 = σ² Δt + O((Δt)²) .

    142

  • So, for the RHS of (7) we get

    V = { V + S VS ( r Δt + O((Δt)²) ) + Vt Δt

          + (1/2) [ S² VSS ( σ² Δt + O((Δt)²) ) + 2 S VSt ( r Δt + O((Δt)²) ) Δt ]

          + O((Δt)²) } e^{−r Δt}

      = { V + ( r S VS + Vt + (1/2) σ² S² VSS ) Δt + O((Δt)²) } ( 1 − r Δt + O((Δt)²) )

      = V + ( −r V + r S VS + Vt + (1/2) σ² S² VSS ) Δt + O((Δt)²) .

    Cancelling V and dividing by Δt yields

    −r V + r S VS + Vt + (1/2) σ² S² VSS + O(Δt) = 0 .

    Ignoring the O(Δt) term, we get the BSE.

    The binomial model approximates the BSE to first order accuracy.

    143

  • 12. n-Period Option Price Formula.

    For a multiplicative n-period binomial process, the call value is

    C = R^{−n} Σ_{j=0}^{n} \binom{n}{j} p^j (1 − p)^{n−j} max(u^j d^{n−j} S − X, 0) ,   (14)

    where \binom{n}{j} = n! / ( j! (n − j)! ) is the binomial coefficient.

    Define k to be the smallest non-negative integer such that u^k d^{n−k} S ≥ X .

    The call pricing formula can then be simplified to

    C = S Φ(n, k, p̂) − X R^{−n} Φ(n, k, p) ,   (15)

    where p̂ = u p / R and Φ is the complementary binomial distribution function defined

    by

    Φ(n, k, p) = Σ_{j=k}^{n} \binom{n}{j} p^j (1 − p)^{n−j} .

    144
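    A direct evaluation of formula (14) as a sketch (standard C++ only, CRR parameters, the
    numerical inputs are illustrative); the result can be compared with (15) or with backward
    induction on the tree.

```cpp
// n-period binomial call price from formula (14).
#include <algorithm>
#include <cmath>
#include <cstdio>

int main() {
    const double S = 100.0, X = 100.0, r = 0.05, sigma = 0.2, T = 1.0;
    const int n = 100;

    const double dt = T / n;
    const double R  = std::exp(r * dt);
    const double u  = std::exp(sigma * std::sqrt(dt));
    const double d  = 1.0 / u;
    const double p  = (R - d) / (u - d);

    // sum_{j=0}^{n} binom(n,j) p^j (1-p)^{n-j} (u^j d^{n-j} S - X)^+ , then discount by R^{-n}
    double sum  = 0.0;
    double prob = std::pow(1.0 - p, n);   // binom(n,0) p^0 (1-p)^n
    for (int j = 0; j <= n; ++j) {
        const double ST = S * std::pow(u, j) * std::pow(d, n - j);
        sum += prob * std::max(ST - X, 0.0);
        // next binomial weight: binom(n,j+1) p^(j+1) (1-p)^(n-j-1)
        prob *= (double)(n - j) / (j + 1) * p / (1.0 - p);
    }
    std::printf("n-period binomial call price: %.6f\n", sum / std::pow(R, n));
    return 0;
}
```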

  • Two-period recombined tree (times t, t + Δt, t + 2Δt):

    Asset:   S  →  uS, dS  →  u² S, u d S, d² S

    Call:    C  →  C_u^{Δt}, C_d^{Δt}  →  C_uu^{2Δt} = (u² S − X)+ ,
                                          C_ud^{2Δt} = (u d S − X)+ ,
                                          C_dd^{2Δt} = (d² S − X)+

    145

  • Call price at time t + Δt:

    C_u^{Δt} = (1/R) ( p C_uu^{2Δt} + (1 − p) C_ud^{2Δt} ) ,   C_d^{Δt} = (1/R) ( p C_ud^{2Δt} + (1 − p) C_dd^{2Δt} ) .

    Call price at time t:

    C = (1/R) ( p C_u^{Δt} + (1 − p) C_d^{Δt} )

      = (1/R²) ( p² C_uu^{2Δt} + 2 p (1 − p) C_ud^{2Δt} + (1 − p)² C_dd^{2Δt} ) .

    Proof of (14) by induction:

    Assume (14) holds for n ≤ k − 1; then for n = k, there are k − 1 periods between time
    t + Δt and t + kΔt. So

    C_u^{Δt} = (1/R^{k−1}) Σ_{j=0}^{k−1} \binom{k−1}{j} p^j (1 − p)^{k−1−j} ( u^j d^{k−1−j} (uS) − X )+

             = (1/R^{k−1}) Σ_{j=1}^{k} \binom{k−1}{j−1} p^{j−1} (1 − p)^{k−j} ( u^j d^{k−j} S − X )+ .

    146

  • ⟹  p C_u^{Δt} = (1/R^{k−1}) Σ_{j=1}^{k} \binom{k−1}{j−1} p^j (1 − p)^{k−j} ( u^j d^{k−j} S − X )+ ,

    and similarly

    (1 − p) C_d^{Δt} = (1/R^{k−1}) Σ_{j=0}^{k−1} \binom{k−1}{j} p^j (1 − p)^{k−j} ( u^j d^{k−j} S − X )+ .

    Combining gives

    C = (1/R) ( p C_u^{Δt} + (1 − p) C_d^{Δt} )

      = (1/R^k) [ p^k ( u^k S − X )+ + (1 − p)^k ( d^k S − X )+

                  + Σ_{j=1}^{k−1} { \binom{k−1}{j−1} + \binom{k−1}{j} } p^j (1 − p)^{k−j} ( u^j d^{k−j} S − X )+ ] ,

    where \binom{k−1}{j−1} + \binom{k−1}{j} = \binom{k}{j}.

    This proves (14).

    147

  • Let k be the smallest non-negative integer such that

    u^k d^{n−k} S ≥ X  ⟺  (u/d)^k ≥ X/(S d^n)  ⟺  k ≥ ln( X/(S d^n) ) / ln(u/d) .

    Then

    ( u^j d^{n−j} S − X )+ = { 0                    if j < k
                               u^j d^{n−j} S − X    if j ≥ k .

    Hence

    C = (1/R^n) Σ_{j=k}^{n} \binom{n}{j} p^j (1 − p)^{n−j} ( u^j d^{n−j} S − X )

      = S Σ_{j=k}^{n} \binom{n}{j} [ p u / R ]^j [ (1 − p) d / R ]^{n−j}  −  X R^{−n} Σ_{j=k}^{n} \binom{n}{j} p^j (1 − p)^{n−j} .

    On setting p̂ = p u / R, it holds that

    1 − p̂ = 1 − (R − d)/(u − d) · u/R = ( −R d + u d ) / ( R (u − d) ) = (u − R)/(u − d) · d/R = (1 − p) d / R .

    This proves (15).

    148

  • 13. Black-Scholes Call Price Formula.

    As the time interval Δt tends to zero, the n-period call price formula (15) tends to

    the Black-Scholes call option pricing formula

    c = S N(d1) − X e^{−r (T−t)} N(d2)   (16)

    where N(·) is the standard normal distribution function,

    d1 = ( ln(S/X) + (r + σ²/2) (T − t) ) / ( σ √(T − t) )

    and

    d2 = ( ln(S/X) + (r − σ²/2) (T − t) ) / ( σ √(T − t) ) = d1 − σ √(T − t) .

    Proof.

    The n-period call price (15) is given by

    C = S Φ(n, k, p̂) − X R^{−n} Φ(n, k, p) = S Φ(n, k, p̂) − X e^{−r n Δt} Φ(n, k, p) ,

    where e^{−r n Δt} = e^{−r (T−t)}.

    149
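    A minimal sketch of formula (16) (standard C++ only; the function names bs_call and
    norm_cdf and the numerical inputs are my own, illustrative choices). N(·) is evaluated via
    the complementary error function, N(x) = erfc(−x/√2)/2.

```cpp
// Black-Scholes European call price, formula (16) with t = 0 and tau = T - t.
#include <cmath>
#include <cstdio>

double norm_cdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

double bs_call(double S, double X, double r, double sigma, double tau) {
    const double vol = sigma * std::sqrt(tau);
    const double d1  = (std::log(S / X) + (r + 0.5 * sigma * sigma) * tau) / vol;
    const double d2  = d1 - vol;
    return S * norm_cdf(d1) - X * std::exp(-r * tau) * norm_cdf(d2);
}

int main() {
    std::printf("BS call price: %.6f\n", bs_call(100.0, 100.0, 0.05, 0.2, 1.0));
    return 0;
}
```

    The binomial prices computed earlier converge to this value as the number of periods grows.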

  • Hence we need to prove

    Φ(n, k, p̂) → N(d1) and Φ(n, k, p) → N(d2) as Δt → 0.

    150

  • Let j be the binomial random variable counting the number of upward moves in n periods. Then

    Sn = S u^j d^{n−j} = S (u/d)^j d^n ,

    where Sn is the asset price at time T = t + n Δt.

    ln(Sn/S) = j ln(u/d) + n ln d = j ln u + (n − j) ln d ,

    E(j) = n p ,   Var(j) = n p (1 − p) ,

    E( ln(Sn/S) ) = n p ln(u/d) + n ln d ,   Var( ln(Sn/S) ) = n p (1 − p) ( ln(u/d) )² .

    Moreover,

    1 − Φ(n, k, p) = P ( j ≤ k − 1 ) = P ( ln(Sn/S) ≤ (k − 1) ln(u/d) + n ln d ) .

    As k is the smallest integer such that k ≥ ln( X/(S d^n) ) / ln(u/d), there exists an α ∈ (0, 1] such

    that

    k − 1 = ln( X/(S d^n) ) / ln(u/d) − α ,

    151

  • and so

    1 − Φ(n, k, p) = P ( ln(Sn/S) ≤ ln(X/S) − α ln(u/d) ) .

    152

  • Define

    X_i^n := { ln u  with prob. p
               ln d  with prob. 1 − p      i = 1, . . . , n ,

    and Zn := X_1^n + . . . + X_n^n .

    Then, for each n, {X_1^n, . . . , X_n^n} are independent and Zn ∼ ln(Sn/S). Hence

    1 − Φ(n, k, p) = P ( Zn ≤ ζn ) ,   ζn := ln(X/S) − α ln(u/d) .

    Assuming e.g. the CRR model, we get

    E(Zn) = n p ln u + n (1 − p) ln d → μ (T − t),   where μ := r − σ²/2 ,

    Var(Zn) = n p (1 − p) ln²(u/d) → σ² (T − t) ,

    ζn = ln(X/S) − α ln(u/d) → ln(X/S) ,

    and |X_i^n| ≤ σ √Δt = σ √((T − t)/n) =: yn → 0 as n → ∞ .

    153

  • Hence Theorem 5.10 implies that

    Zn →_D Z ∼ N( μ (T − t), σ² (T − t) ) .

    154

  • So, as n → ∞,

    1 − Φ(n, k, p) = P ( Zn ≤ ζn ) → 1/√( 2 π σ² (T − t) ) ∫_{−∞}^{ln(X/S)} e^{ −(s − μ (T − t))² / (2 σ² (T − t)) } ds .

    On letting

    y := ( s − μ (T − t) ) / ( σ √(T − t) ) ,

    we have

    1 − Φ(n, k, p) → 1/√(2π) ∫_{−∞}^{ ( ln(X/S) − μ (T − t) ) / ( σ √(T − t) ) } e^{−y²/2} dy

                   = N( ( ln(X/S) − μ (T − t) ) / ( σ √(T − t) ) ) .

    Hence

    Φ(n, k, p) → 1 − N( ( ln(X/S) − μ (T − t) ) / ( σ √(T − t) ) ) = N( −( ln(X/S) − μ (T − t) ) / ( σ √(T − t) ) )

               = N( ( ln(S/X) + (r − σ²/2) (T − t) ) / ( σ √(T − t) ) ) = N(d2) .

    Similarly one can show Φ(n, k, p̂) → N(d1).

  • 14. Implementation.

    Choose the number of time steps required to march from the current time t to the expiration

    time T to be n, so that Δt = (T − t)/n.

    At time ti = t + i Δt there are i + 1 nodes.

    Denote these nodes as (i, j) with i = 0, 1, . . . , n and j = 0, 1, . . . , i, where the first

    component i refers to time ti and the second component j refers to the asset price

    S u^j d^{i−j}.

    Compute the option prices V_j^n = V (S u^j d^{n−j}, T ), j = 0, 1, . . . , n, at expiration time

    tn = T over n + 1 nodes.

    Then go backwards one step and compute the option prices at time tn−1 = T − Δt using

    the one period binomial option pricing formula over n nodes.

    Continue until reaching the current time t0 = t.

    156

  • The recursive formula is

    V_j^i = (1/R) ( p V_{j+1}^{i+1} + (1 − p) V_j^{i+1} ) ,   j = 0, . . . , i and i = n − 1, . . . , 0 ,

    where

    p = (R − d)/(u − d) .

    The value V_0^0 = V (S, t) is the option price at the current time t.

    C++ Exercise: Write a program to implement the Cox-Ross-Rubinstein model to

    price options with a non-dividend paying underlying asset. The inputs are the asset

    price S, the strike price X , the riskless interest rate r, the current time t and the

    maturity time T , the volatility σ, and the number of steps n. The outputs are the

    European call and put option prices.

    157
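    One possible outline for the C++ exercise above, as a sketch rather than a model solution;
    the function name crr_price and the sample inputs in main are my own choices.

```cpp
// CRR backward induction for European call and put prices.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// payoff_call = true prices a call, false a put
double crr_price(double S, double X, double r, double sigma,
                 double t, double T, int n, bool payoff_call) {
    const double dt = (T - t) / n;
    const double R  = std::exp(r * dt);
    const double u  = std::exp(sigma * std::sqrt(dt));
    const double d  = 1.0 / u;
    const double p  = (R - d) / (u - d);

    // option values at expiry; node (n, j) carries asset price S * u^j * d^(n-j)
    std::vector<double> V(n + 1);
    for (int j = 0; j <= n; ++j) {
        const double ST = S * std::pow(u, j) * std::pow(d, n - j);
        V[j] = payoff_call ? std::max(ST - X, 0.0) : std::max(X - ST, 0.0);
    }
    // backward induction: V_j^i = (p V_{j+1}^{i+1} + (1-p) V_j^{i+1}) / R
    for (int i = n - 1; i >= 0; --i)
        for (int j = 0; j <= i; ++j)
            V[j] = (p * V[j + 1] + (1.0 - p) * V[j]) / R;

    return V[0];
}

int main() {
    const double S = 100, X = 100, r = 0.05, sigma = 0.2, t = 0, T = 1; // illustrative
    std::printf("call: %.6f  put: %.6f\n",
                crr_price(S, X, r, sigma, t, T, 200, true),
                crr_price(S, X, r, sigma, t, T, 200, false));
    return 0;
}
```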

  • 15. Greeks.

    Assume the current time is t = 0. Some common sensitivity measures of a European

    call option price c (the rate of change of the call price with respect to the underlying

    factors) are delta, gamma, vega, rho, theta, defined by

    Δ = ∂c/∂S ,   Γ = ∂²c/∂S² ,   v = ∂c/∂σ ,   ρ = ∂c/∂r ,   Θ = −∂c/∂T .

    Exercise.

    Derive the following relations from the call price (16).

    Δ = N(d1) ,

    Γ = N′(d1) / ( S σ √T ) ,

    v = S √T N′(d1) ,

    ρ = X T e^{−r T} N(d2) ,

    Θ = − S N′(d1) σ / ( 2 √T ) − r X e^{−r T} N(d2) .

    158
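    A sketch (standard C++ only) that simply evaluates the closed-form expressions listed above;
    it is not a solution to the derivation exercise, and the parameters are illustrative.

```cpp
// Black-Scholes greeks of a European call at t = 0.
#include <cmath>
#include <cstdio>

double N(double x)  { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }             // normal CDF
double Np(double x) { return std::exp(-0.5 * x * x) / std::sqrt(2.0 * std::acos(-1.0)); } // N'(x)

int main() {
    const double S = 100, X = 100, r = 0.05, sigma = 0.2, T = 1.0;
    const double d1 = (std::log(S / X) + (r + 0.5 * sigma * sigma) * T) / (sigma * std::sqrt(T));
    const double d2 = d1 - sigma * std::sqrt(T);

    const double delta = N(d1);
    const double gamma = Np(d1) / (S * sigma * std::sqrt(T));
    const double vega  = S * std::sqrt(T) * Np(d1);
    const double rho   = X * T * std::exp(-r * T) * N(d2);
    const double theta = -S * Np(d1) * sigma / (2.0 * std::sqrt(T))
                         - r * X * std::exp(-r * T) * N(d2);

    std::printf("delta %.6f  gamma %.6f  vega %.6f  rho %.6f  theta %.6f\n",
                delta, gamma, vega, rho, theta);
    return 0;
}
```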

  • 16. Dynamic Hedging.

    Greeks can be used to hedge an option position against changes of underlying factors.

    For example, to hedge a short position in a European call against changes of the asset

    price, one should hold a long position of Δ units of the underlying asset.

    This strategy requires continuous trading and is prohibitively expensive if there are

    any transaction costs.

    159

  • 17. Dividends.

    (a) If the asset pays a continuous dividend yield at a rate q, then use r − q instead of

    r in building the asset price tree, but still use r in discounting the option values

    in the backward calculation.

    In the risk-neutral model, the asset then has a rate of return of r − q. So

    (8) becomes p u + (1 − p) d = e^{(r − q) Δt} ,

    (9) becomes p u² + (1 − p) d² = e^{(2 r − 2 q + σ²) Δt} .

    However, the discount factor 1/R remains the same: 1/R = e^{−r Δt} .

    160

  • (b) If the asset pays a discrete proportional dividend δ S between ti−1 and ti, then the
    asset price at node (i, j) drops to (1 − δ) u^j d^{i−j} S instead of the usual u^j d^{i−j} S.
    The tree is still recombined.

    161

  • [Figure: recombined tree after a proportional dividend, with nodes such as
    (1 − δ) u² S, (1 − δ) u d S, (1 − δ) d² S and later (1 − δ) u⁴ S, . . . , (1 − δ) d⁴ S.]

    (c) If the asset pays a discrete cash dividend D between ti−1 and ti, then the asset
    price at node (i, j) drops to u^j d^{i−j} S − D instead of the usual u^j d^{i−j} S. There are
    i + 1 new recombined trees emanating from the nodes at time ti, but the original

    tree is no longer recombined.

    162

  • Suppose a cash dividend D is paid between ti−1 and ti; then at time ti+m there

    are (i + 1) (m + 1) nodes, instead of i + m + 1 in a recombined tree.

    If there are several cash dividends, then the number of nodes is much larger than

    in a normal tree.

    Example. Assume there is a cash dividend D between periods 1 and 2. Then

    the possible asset prices at time n = 2 are {u² S − D, u d S − D, d² S − D}. At
    n = 3 there are 6 nodes: u · {u² S − D, u d S − D, d² S − D} and d · {u² S − D,
    u d S − D, d² S − D}, instead of 4.

    [Reason: d (u² S − D) ≠ u (u d S − D).]

    163

  • [Figure: non-recombining tree after a cash dividend D, with nodes S; u² S − D,
    u d S − D, d² S − D; and then u² (u² S − D), . . . , d² (d² S − D).]

    164

  • 18. Trinomial Model.

    Suppose the asset price is S at time t. At time t + Δt the asset price can be

    uS (up-state) with probability pu, mS (middle-state) with probability pm, and dS

    (down-state) with probability pd. Then the risk neutral one period option pricing

    formula is

    V (S, t) = (1/R) ( pu V (uS, t + Δt) + pm V (mS, t + Δt) + pd V (dS, t + Δt) ) .

    The trinomial tree approach is equivalent to the explicit finite difference method. By

    equating the first and second moments of the asset price S(t + Δt) in both continuous

    and discrete models, we obtain the following equations:

    pu + pm + pd = 1 ,

    pu u + pm m + pd d = e^{r Δt} ,

    pu u² + pm m² + pd d² = e^{(2 r + σ²) Δt} .

    There are three equations and six variables u, m, d and pu, pm, pd. We need three

    more equations to uniquely determine these variables.

    165

  • 19. Boyle Model.

    The three additional conditions are

    m = 1,   u d = 1,   u = e^{λ σ √Δt} ,

    where λ is a free parameter.

    Exercise.

    Show that the risk-neutral probabilities are given by

    pu = ( (W − R) u − (R − 1) ) / ( (u − 1) (u² − 1) ) ,   pm = 1 − pu − pd ,

    pd = ( (W − R) u² − (R − 1) u³ ) / ( (u − 1) (u² − 1) ) ,

    where R = e^{r Δt} and W = e^{(2 r + σ²) Δt}.

    Note that if λ = 1, which corresponds to the choice of u as in the CRR model, certain

    sets of parameters can lead to pm < 0. To rectify this, choose λ > 1.

    166

  • Boyle claimed that if pu ≈ pm ≈ pd ≈ 1/3, then the trinomial scheme with 5 steps is
    comparable to the CRR binomial scheme with 20 steps.

    167

  • 20. Hull-White Model.

    This is the same as the Boyle model with λ = √3.

    Exercise.

    Show that the risk-neutral probabilities are

    pd = 1/6 − √( Δt / (12 σ²) ) ( r − σ²/2 ) ,

    pm = 2/3 ,

    pu = 1/6 + √( Δt / (12 σ²) ) ( r − σ²/2 ) ,

    if terms of order O(Δt) are ignored.

    168

  • 21. Kamrad-Ritchken Model.

    If S follows a lognormal process, then we can write ln S(t + Δt) = ln S(t) + Z, where

    Z is a normal random variable with mean (r − σ²/2) Δt and variance σ² Δt.

    KR suggested approximating Z by a discrete random variable Za as follows: Za =

    Δx with probability pu, 0 with probability pm, and −Δx with probability pd, where

    Δx = λ σ √Δt and λ ≥ 1.

    The corresponding u, m, d in the trinomial tree are u = e^{Δx}, m = 1, d = e^{−Δx}.

    By omitting the higher order term O((Δt)²), they showed that the risk-neutral prob-

    abilities are

    pu = 1/(2 λ²) + ( 1/(2 λ σ) ) ( r − σ²/2 ) √Δt ,

    pm = 1 − 1/λ² ,

    pd = 1/(2 λ²) − ( 1/(2 λ σ) ) ( r − σ²/2 ) √Δt .

    Note that if λ = 1 then pm = 0 and the trinomial scheme is reduced to the binomial

    scheme.

    KR claimed that if pu = pm = pd = 1/3, then the trinomial scheme with 15 steps is

    comparable to the binomial scheme (pm = 0) with 55 steps. They also discussed the

    trinomial scheme with two correlated state variables.

    170
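    A pricing sketch on a Kamrad-Ritchken trinomial tree (standard C++ only, illustrative
    parameters; λ = √1.5 makes the three probabilities roughly equal to 1/3). The backward
    induction uses the risk-neutral one-period formula from the trinomial model above.

```cpp
// European call on a KR trinomial tree with the probabilities listed above.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const double S = 100, X = 100, r = 0.05, sigma = 0.2, T = 1.0;
    const int n = 100;
    const double lambda = std::sqrt(1.5);

    const double dt = T / n;
    const double dx = lambda * sigma * std::sqrt(dt);
    const double mu = r - 0.5 * sigma * sigma;
    const double pu = 1.0 / (2.0 * lambda * lambda) + mu * std::sqrt(dt) / (2.0 * lambda * sigma);
    const double pm = 1.0 - 1.0 / (lambda * lambda);
    const double pd = 1.0 / (2.0 * lambda * lambda) - mu * std::sqrt(dt) / (2.0 * lambda * sigma);
    const double disc = std::exp(-r * dt);

    // terminal payoffs on the 2n+1 levels S * exp(j * dx), j = -n..n
    std::vector<double> V(2 * n + 1);
    for (int j = -n; j <= n; ++j)
        V[j + n] = std::max(S * std::exp(j * dx) - X, 0.0);

    // backward induction; the range of levels shrinks by one per step
    for (int i = n - 1; i >= 0; --i) {
        std::vector<double> Vnew(2 * i + 1);
        for (int j = -i; j <= i; ++j)
            Vnew[j + i] = disc * (pu * V[j + 1 + (i + 1)]
                                  + pm * V[j + (i + 1)]
                                  + pd * V[j - 1 + (i + 1)]);
        V.swap(Vnew);
    }
    std::printf("pu=%.4f pm=%.4f pd=%.4f  KR trinomial call: %.6f\n", pu, pm, pd, V[0]);
    return 0;
}
```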

  • Za = { Δx    with prob. pu
           0     with prob. pm
           −Δx   with prob. pd

    and the equations become

    pu + pm + pd = 1 ,   (1)

    E(Za) = Δx (pu − pd) = (r − σ²/2) Δt ,   (2)

    Var(Za) = (Δx)² (pu + pd) − (Δx)² (pu − pd)² = σ² Δt ,

    where the last term, (Δx)² (pu − pd)², is O((Δt)²).

    Dropping the O((Δt)²) term leads to

    (Δx)² (pu + pd) = σ² Δt .   (3)

    171

  • Hence

    pu = (1/2) [ σ² Δt/(Δx)² + (Δt/Δx) (r − σ²/2) ] ,   pd = (1/2) [ σ² Δt/(Δx)² − (Δt/Δx) (r − σ²/2) ] .

    ⟹  pu = (1/2) [ 1/λ² + (1/(λ σ)) (r − σ²/2) √Δt ] ,   pd = (1/2) [ 1/λ² − (1/(λ σ)) (r − σ²/2) √Δt ] .

    172

  • 8. Finite Difference Methods

    1. Diffusion Equations of One State Variable.

    ∂u/∂t = c² ∂²u/∂x² ,   (x, t) ∈ D ,   (17)

    where t is a time variable, x is a state variable, and u(x, t) is an unknown function

    satisfying the equation.

    To find a well-defined solution, we need to impose the initial condition

    u(x, 0) = u0(x)   (18)

    and, if D = [a, b] × [0, ∞), the boundary conditions

    u(a, t) = ga(t) and u(b, t) = gb(t) ,   (19)

    where u0, ga, gb are continuous functions.

    173

  • If D = (−∞, ∞) × (0, ∞), we need to impose the boundary conditions

    lim_{|x|→∞} u(x, t) e^{−a x²} = 0 for any a > 0.   (20)

    (20) implies that u(x, t) does not grow too fast as |x| → ∞.

    The diffusion equation (17) with the initial condition (18) and the boundary conditions

    (19) is well-posed, i.e. there exists a unique solution that depends continuously on u0,

    ga and gb.

    174

  • 2. Grid Points.

    To find a numerical solution to equation (17) with finite difference methods, we first

    need to define a set of grid points in the domain D as follows:

    Choose a state step size Δx = (b − a)/N (N is an integer) and a time step size Δt, draw

    a set of horizontal and vertical lines across D, and get all intersection points (xj, tn),

    or simply (j, n),

    where xj = a + j Δx, j = 0, . . . , N , and tn = n Δt, n = 0, 1, . . . .

    If D = [a, b] × [0, T ] then choose Δt = T/M (M is an integer) and tn = n Δt, n =

    0, . . . , M .

    175

  • [Grid: x0 = a, x1, x2, . . . , xN = b in the state direction;
     t0 = 0, t1, t2, t3, . . . , tM = T in the time direction.]

    176

  • 3. Finite Differences.

    The partial derivatives

    ux := ∂u/∂x and uxx := ∂²u/∂x²

    are always approximated by central difference quotients, i.e.

    ux ≈ ( u_{j+1}^n − u_{j−1}^n ) / (2 Δx)   and   uxx ≈ ( u_{j+1}^n − 2 u_j^n + u_{j−1}^n ) / (Δx)²   (21)

    at a grid point (j, n). Here u_j^n = u(xj, tn).

    Depending on how ut is approximated, we have three basic schemes: the explicit, implicit,

    and Crank-Nicolson schemes.

    177
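    A small check of the central difference quotients (21) on a smooth test function (standard
    C++ only; u(x) = sin(x) is my own choice of test function): halving Δx reduces both errors
    by roughly a factor of four, i.e. the approximations are second-order accurate.

```cpp
// Central differences for u_x and u_xx applied to u(x) = sin(x) at x = 1.
#include <cmath>
#include <cstdio>

int main() {
    const double x = 1.0;
    for (double dx = 0.1; dx > 1e-3; dx /= 2.0) {
        const double um = std::sin(x - dx), u0 = std::sin(x), up = std::sin(x + dx);
        const double ux  = (up - um) / (2.0 * dx);            // ~ cos(x)
        const double uxx = (up - 2.0 * u0 + um) / (dx * dx);  // ~ -sin(x)
        std::printf("dx=%7.4f  err(ux)=%.3e  err(uxx)=%.3e\n",
                    dx, std::fabs(ux - std::cos(x)), std::fabs(uxx + std::sin(x)));
    }
    return 0;
}
```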

  • 4. Explicit Scheme.

    If ut is approximated by a forward difference quotient

    ut ≈ ( u_j^{n+1} − u_j^n ) / Δt   at (j, n),

    then the corresponding difference equation