Taylor's Theorem for Matrix Functions and Pseudospectral Bounds on the Condition Number

Samuel Relton (UoM)

@sdrelton (http://www.twitter.com/sdrelton)

http://www.samrelton.com, http://blog.samrelton.com

Joint work with Edvin Deadman

SIAM LA15, Atlanta, October 28th, 2015

Overview

• Taylor’s theorem for scalars

• Matrix functions and their derivatives

• Taylor’s theorem for matrix functions

• Pseudospectral bounds


Taylor’s theorem

Theorem

Let $f : \mathbb{R} \to \mathbb{R}$ be $k$ times continuously differentiable at $a \in \mathbb{R}$. Then there exists $R_k : \mathbb{R} \to \mathbb{R}$ such that

$$f(x) = \sum_{j=0}^{k} \frac{f^{(j)}(a)}{j!}(x - a)^j + R_k(x),$$

with $R_k(x) = o(|x - a|^k)$ as $x \to a$.


Remainder Formulae

There are various formulae for the remainder, for example:

• Lagrange form, for some $c$ between $a$ and $x$:

$$R_k(x) = \frac{f^{(k+1)}(c)}{(k+1)!}(x - a)^{k+1}$$

• Integral form:

$$R_k(x) = \int_a^x \frac{(x - t)^k}{k!} f^{(k+1)}(t)\,dt$$
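The two remainder formulae can be checked numerically. The following is a minimal sketch for $f = \exp$ (chosen because every derivative is again $\exp$); the helper names `taylor_exp` and `integral_remainder`, and the use of Simpson's rule for the integral, are assumptions made for illustration.

```python
import math

def taylor_exp(x, a, k):
    """Degree-k Taylor polynomial of exp about a, evaluated at x."""
    return sum(math.exp(a) * (x - a) ** j / math.factorial(j) for j in range(k + 1))

def integral_remainder(x, a, k, n=2000):
    """Integral form of R_k for f = exp, via composite Simpson's rule (n even)."""
    h = (x - a) / n
    # Integrand (x - t)^k / k! * f^(k+1)(t), with f^(k+1) = exp.
    g = lambda t: (x - t) ** k / math.factorial(k) * math.exp(t)
    s = g(a) + g(x)
    s += 4 * sum(g(a + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2 * sum(g(a + 2 * i * h) for i in range(1, n // 2))
    return s * h / 3

a, x, k = 0.0, 0.5, 3
true_remainder = math.exp(x) - taylor_exp(x, a, k)

# Lagrange form: R_k = exp(c) (x - a)^(k+1) / (k+1)! for some c in (a, x),
# so the true remainder must lie between the values obtained at c = a and c = x.
lagrange_lo = math.exp(a) * (x - a) ** (k + 1) / math.factorial(k + 1)
lagrange_hi = math.exp(x) * (x - a) ** (k + 1) / math.factorial(k + 1)
```

The integral form reproduces the remainder to quadrature accuracy, while the Lagrange form only brackets it because $c$ is not known explicitly.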


Taylor’s theorem for complex functions

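We can generalize Taylor’s theorem to complex analytic functions $f : \mathbb{C} \to \mathbb{C}$. Expanding about a point $a \in \mathbb{C}$:

$$f(z) = \sum_{j=0}^{k} \frac{f^{(j)}(a)}{j!}(z - a)^j + R_k(z),$$

$$R_k(z) = \frac{(z - a)^{k+1}}{2\pi i} \int_\Gamma \frac{f(\omega)\,d\omega}{(\omega - a)^{k+1}(\omega - z)}.$$

• $R_k$ is now expressed as a contour integral

• $\Gamma$ is a circle, centred at $a$ and enclosing $z$, such that $f$ is analytic within $\Gamma$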
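The contour-integral remainder is easy to evaluate numerically, because the trapezoidal rule converges very quickly for periodic analytic integrands on a circle. The sketch below is an illustration only: the function name `contour_remainder`, the test function $f(\omega) = 1/(1+\omega)$, and the contour radius are assumptions, not part of the talk.

```python
import cmath
import math

def contour_remainder(f, z, a, k, radius, n=400):
    """R_k(z) via the trapezoidal rule on the circle |w - a| = radius."""
    total = 0.0
    for m in range(n):
        w = a + radius * cmath.exp(2j * math.pi * m / n)
        # dw = i (w - a) dtheta, so the factor 1/(2 pi i) * dw becomes (w - a) / n.
        total += f(w) * (w - a) / ((w - a) ** (k + 1) * (w - z))
    return (z - a) ** (k + 1) * total / n

f = lambda w: 1.0 / (1.0 + w)      # analytic for |w| < 1
a, z, k = 0.0, 0.3 + 0.2j, 4

# Taylor polynomial of 1/(1+z) about 0 is sum_j (-z)^j.
exact_remainder = f(z) - sum((-z) ** j for j in range(k + 1))
numeric_remainder = contour_remainder(f, z, a, k, radius=0.8)
```

The radius 0.8 keeps the contour inside the disc where $f$ is analytic while still enclosing $z$.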




Matrix functions and their derivatives

We are interested in functions $f : \mathbb{C}^{n \times n} \to \mathbb{C}^{n \times n}$ that generalize scalar functions, e.g.,

$$\exp(A) = \sum_{k=0}^{\infty} \frac{A^k}{k!},$$

$$\log(I + A) = \sum_{k=1}^{\infty} \frac{(-1)^{k+1} A^k}{k}, \quad \rho(A) < 1.$$

Applications include:

• Differential equations: $\frac{du}{dt} = Au(t)$, $u(t) = \exp(tA)u(0)$.

• Second order ODEs with sine and cosine.

• Ranking importance of nodes in a graph, etc.
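As a small illustration of the differential-equation application, the sketch below solves $u'' = -u$ as a first-order system using the power series for $\exp$. The helper `expm_series` is a hypothetical name, and summing the truncated series directly is only sensible for matrices of modest norm; it is not how production software computes the exponential.

```python
import numpy as np

def expm_series(A, terms=40):
    """Truncated power series sum_{k < terms} A^k / k!; adequate only for modest ||A||."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k          # term now holds A^k / k!
        result = result + term
    return result

# u'' = -u rewritten as du/dt = A u with u = (position, velocity).
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
t = 1.3
u0 = np.array([1.0, 0.0])

u_t = expm_series(t * A) @ u0                   # u(t) = exp(tA) u(0)
expected = np.array([np.cos(t), -np.sin(t)])    # known solution: cos t and its derivative
```

The sine and cosine appearing in `expected` are exactly the link to second order ODEs mentioned in the list above.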




Fréchet derivatives

Definition (Fréchet derivative)

The Fréchet derivative of $f$ at $A$ is $L_f(A, \cdot) : \mathbb{C}^{n \times n} \to \mathbb{C}^{n \times n}$, which is linear and, for any $E$, satisfies

$$f(A + E) - f(A) = L_f(A, E) + o(\|E\|).$$

• $L_f(A, E)$ is a linear approximation to $f(A + E) - f(A)$.

• Higher order derivatives are defined recursively (Higham & R., 2014).

• Applications include matrix optimization, image processing, model reduction, etc.
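The defining property can be checked numerically for a function whose Fréchet derivative is known in closed form. The sketch below uses $f(A) = (I + A)^{-1}$, for which $L_f(A, E) = -(I + A)^{-1} E (I + A)^{-1}$; the function names, the random test matrices and the step sizes are assumptions made for illustration.

```python
import numpy as np

def f(A):
    """f(A) = (I + A)^{-1}."""
    return np.linalg.inv(np.eye(A.shape[0]) + A)

def L_f(A, E):
    """Frechet derivative of f at A in the direction E."""
    M = f(A)
    return -M @ E @ M

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A *= 0.5 / np.linalg.norm(A, 2)     # keep I + A safely invertible
E = rng.standard_normal((4, 4))
E /= np.linalg.norm(E, 2)

# The linearization error is o(||E||); for this f it shrinks like ||E||^2.
errors = []
for h in (1e-2, 1e-3):
    err = np.linalg.norm(f(A + h * E) - f(A) - L_f(A, h * E), 2)
    errors.append(err)
ratio = errors[0] / errors[1]       # roughly 100 when the error is O(h^2)
```

Reducing the perturbation by a factor of 10 reduces the linearization error by a factor of about 100, which is the behaviour the $o(\|E\|)$ term requires.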


Condition numbers

A condition number describes the sensitivity of $f(A)$ to small perturbations in $A$, which arise from rounding errors etc.

The absolute condition number is given by

$$\mathrm{cond}_{\mathrm{abs}}(f, A) := \lim_{\varepsilon \to 0} \sup_{\|E\| \le \varepsilon} \frac{\|f(A + E) - f(A)\|}{\varepsilon} = \max_{\|E\| = 1} \|L_f(A, E)\|,$$

whilst the relative condition number is

$$\mathrm{cond}_{\mathrm{rel}}(f, A) := \mathrm{cond}_{\mathrm{abs}}(f, A) \frac{\|A\|}{\|f(A)\|}.$$
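For small matrices the maximum over $\|E\| = 1$ can be computed exactly in the Frobenius norm by assembling the matrix of the linear map $E \mapsto L_f(A, E)$ and taking its 2-norm. The sketch below does this for $f(A) = (I + A)^{-1}$; the helper names `kronecker_form` and `L_inv` and the choice of the Frobenius norm are assumptions made for illustration. For this particular $f$ the result should equal $\|(I + A)^{-1}\|_2^2$, which gives an independent check.

```python
import numpy as np

def kronecker_form(L, A):
    """Matrix K with vec(L(A, E)) = K vec(E), built one column at a time."""
    n = A.shape[0]
    K = np.zeros((n * n, n * n))
    for col in range(n * n):
        e = np.zeros(n * n)
        e[col] = 1.0
        E = e.reshape((n, n), order="F")              # unit matrix E_ij (column-major vec)
        K[:, col] = L(A, E).reshape(-1, order="F")
    return K

def L_inv(A, E):
    """Frechet derivative of f(A) = (I + A)^{-1} in the direction E."""
    M = np.linalg.inv(np.eye(A.shape[0]) + A)
    return -M @ E @ M

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A *= 0.5 / np.linalg.norm(A, 2)
M = np.linalg.inv(np.eye(4) + A)

# Frobenius-norm absolute condition number = 2-norm of the Kronecker matrix.
cond_abs = np.linalg.norm(kronecker_form(L_inv, A), 2)
cond_rel = cond_abs * np.linalg.norm(A, "fro") / np.linalg.norm(M, "fro")
closed_form = np.linalg.norm(M, 2) ** 2
```

Building the $n^2 \times n^2$ matrix costs $n^2$ derivative evaluations, so this approach is only practical for small $n$.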


Taylor’s theorem for matrix functions - 1

Previous work on Taylor’s theorem includes the following.

• Expanding $f(A)$ about a matrix $\alpha I$ (Higham, 2008):

$$f(A) = \sum_{j=0}^{\infty} \frac{f^{(j)}(\alpha)}{j!}(A - \alpha I)^j.$$

• Expansion in higher-order Fréchet derivatives (Al-Mohy and Higham, 2010):

$$f(A + E) = \sum_{j=0}^{\infty} \frac{1}{j!} D_f^{[j]}(A, E).$$

We give an explicit remainder term for the latter.






Theorem (Deadman and R.)

Let $f$ have a power series with radius of convergence $r$ and let $D$ be a simply connected set within the circle of radius $r$ centered at 0. Let $A, E \in \mathbb{C}^{n \times n}$ be such that $\Lambda(A), \Lambda(A + E) \subset D$. Then for any $k \in \mathbb{N}$

$$f(A + E) = T_k(A, E) + R_k(A, E),$$

where

$$T_k(A, E) = \sum_{j=0}^{k} \frac{1}{j!} D_f^{[j]}(A, E),$$

$$R_k(A, E) = \frac{1}{2\pi i} \int_\Gamma f(z)(zI - A - E)^{-1}\left[E(zI - A)^{-1}\right]^{k+1} dz,$$

and $\Gamma$ is a closed contour in $D$ enclosing $\Lambda(A)$ and $\Lambda(A + E)$.
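One way to see where the remainder comes from is to expand the resolvent of $A + E$ in terms of the resolvent of $A$. This is a sketch and not necessarily the argument used by the authors. The shorthand $R(z) = (zI - A)^{-1}$ and $S(z) = (zI - A - E)^{-1}$ is introduced only for this sketch, and $D_f^{[j]}(A, E)$ is assumed to denote $\frac{d^j}{dt^j} f(A + tE)$ at $t = 0$.

```latex
\begin{align*}
f(A+E) &= \frac{1}{2\pi i}\int_\Gamma f(z)\,S(z)\,dz, \\
S(z) &= R(z) + S(z)\,E\,R(z)
      \qquad \text{(because } R(z)^{-1} - S(z)^{-1} = E\text{)} \\
     &= \sum_{j=0}^{k} R(z)\,[E R(z)]^{j} \;+\; S(z)\,[E R(z)]^{k+1}, \\
\frac{1}{j!}\,D_f^{[j]}(A,E) &= \frac{1}{2\pi i}\int_\Gamma f(z)\,R(z)\,[E R(z)]^{j}\,dz .
\end{align*}
```

The third line follows by substituting the second line into itself $k$ times. Integrating it against $f(z)/(2\pi i)$ and using the last line turns the finite sum into $T_k(A, E)$ and leaves the final term as $R_k(A, E)$.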


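Example - Taylor’s theorem for $f(A) = (I + A)^{-1}$

If $f(A) = (I + A)^{-1}$ (with $\rho(A) < 1$) then

$$D_f^{[1]}(A, E) = -(I + A)^{-1} E (I + A)^{-1},$$

$$D_f^{[2]}(A, E) = 2(I + A)^{-1} E (I + A)^{-1} E (I + A)^{-1}.$$

Therefore, taking $k = 2$, we have

$$\begin{aligned}
f(A + E) ={}& (I + A)^{-1} \\
&- (I + A)^{-1} E (I + A)^{-1} \\
&+ (I + A)^{-1} E (I + A)^{-1} E (I + A)^{-1} \\
&+ \frac{1}{2\pi i} \int_\Gamma \frac{1}{1 + z}(zI - A - E)^{-1}\left[E(zI - A)^{-1}\right]^{3} dz.
\end{aligned}$$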
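The expansion above can be verified numerically by evaluating the contour integral with the trapezoidal rule. The sketch below is an illustration only: the matrix sizes, the norms $\|A\|_2 = 0.3$ and $\|E\|_2 = 0.1$, the circular contour $|z| = 0.7$ and the number of quadrature nodes are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
A *= 0.3 / np.linalg.norm(A, 2)
E = rng.standard_normal((n, n))
E *= 0.1 / np.linalg.norm(E, 2)
I = np.eye(n)

M = np.linalg.inv(I + A)
T2 = M - M @ E @ M + M @ E @ M @ E @ M          # Taylor polynomial T_2(A, E)
exact_remainder = np.linalg.inv(I + A + E) - T2

# Trapezoidal rule on the circle |z| = 0.7, which encloses the spectra of A and
# A + E (spectral radii at most 0.4) and stays inside the unit disc where
# 1/(1 + z) is analytic.
N, radius = 200, 0.7
R2 = np.zeros((n, n), dtype=complex)
for m in range(N):
    z = radius * np.exp(2j * np.pi * m / N)
    resolvent_A = np.linalg.inv(z * I - A)
    resolvent_AE = np.linalg.inv(z * I - A - E)
    ER = E @ resolvent_A
    # dz = i z dtheta, so the factor (1 / (2 pi i)) dz reduces to the weight z / N.
    R2 += (1.0 / (1.0 + z)) * resolvent_AE @ ER @ ER @ ER * (z / N)

difference = np.linalg.norm(R2 - exact_remainder, 2)
```

The remainder is of order $\|E\|^3$, so it is small but well above rounding error, and the quadrature reproduces it to near machine precision.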






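Application to Padé approximants

Let $p_m(z)/q_n(z)$ be the $[m, n]$ Padé approximant to $f(z)$, so that $f(z) = p_m(z)/q_n(z) + O(z^{m+n+1})$, and let $S_{m,n}(z)$ denote its truncation error. Then

$$f(X) = \frac{p_m(X)}{q_n(X)} + S_{m,n}(X).$$

After some rearrangement, and application of our formula for the remainder term to $q_n f - p_m$ (whose Taylor coefficients up to order $m + n$ vanish), we find

$$S_{m,n}(X) = \frac{q_n(X)^{-1} X^{m+n+1}}{2\pi i} \int_\Gamma \frac{q_n(z) f(z) (zI - X)^{-1}}{z^{m+n+1}}\,dz.$$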
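The contour-integral expression can be compared numerically with the directly computed difference $f(X) - q_n(X)^{-1} p_m(X)$. The sketch below does this for $f = \exp$ and the $[2, 2]$ Padé approximant $(1 + z/2 + z^2/12)/(1 - z/2 + z^2/12)$. The helper names `polyvalm` and `expm_series`, the test matrix, and the circular contour $|z| = 2$ (which encloses both 0 and the spectrum of $X$) are assumptions made for illustration.

```python
import numpy as np

def polyvalm(coeffs, X):
    """Evaluate sum_j coeffs[j] X^j by Horner's rule (coefficients in ascending order)."""
    n = X.shape[0]
    result = np.zeros((n, n), dtype=complex)
    for c in reversed(coeffs):
        result = result @ X + c * np.eye(n)
    return result

def expm_series(X, terms=40):
    """Truncated power series for exp(X); adequate here because ||X|| is small."""
    result = np.eye(X.shape[0])
    term = np.eye(X.shape[0])
    for k in range(1, terms):
        term = term @ X / k
        result = result + term
    return result

p = [1.0, 0.5, 1.0 / 12.0]       # p_2(z) = 1 + z/2 + z^2/12
q = [1.0, -0.5, 1.0 / 12.0]      # q_2(z) = 1 - z/2 + z^2/12
m = n = 2

rng = np.random.default_rng(3)
X = rng.standard_normal((4, 4))
X *= 0.5 / np.linalg.norm(X, 2)
I = np.eye(4)

qX_inv = np.linalg.inv(polyvalm(q, X))
pade_error = expm_series(X) - qX_inv @ polyvalm(p, X)     # f(X) - p(X)/q(X)

# Trapezoidal rule on |z| = 2; (1 / (2 pi i)) dz reduces to the weight z / N.
N, radius = 128, 2.0
integral = np.zeros((4, 4), dtype=complex)
for j in range(N):
    z = radius * np.exp(2j * np.pi * j / N)
    qz = q[0] + q[1] * z + q[2] * z**2
    integral += qz * np.exp(z) * np.linalg.inv(z * I - X) / z ** (m + n + 1) * (z / N)

contour_error = qX_inv @ np.linalg.matrix_power(X, m + n + 1) @ integral
mismatch = np.linalg.norm(contour_error - pade_error, 2)
```

The two quantities agree to rounding error, and both are of order $\|X\|^{m+n+1}$ as the $O(z^{m+n+1})$ term suggests.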




Applying pseudospectrum - 1

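Recall that the ε-pseudospectrum of $X$ is the set

$$\Lambda_\varepsilon(X) = \{z \in \mathbb{C} : \|(zI - X)^{-1}\| \ge \varepsilon^{-1}\}.$$

The ε-pseudospectral radius is $\rho_\varepsilon = \max\{|z| : z \in \Lambda_\varepsilon(X)\}$.

Using this we can bound the remainder term by

$$\|R_k(A, E)\| \le \frac{\|E\|^{k+1} L_\varepsilon}{2\pi \varepsilon^{k+2}} \max_{z \in \Gamma_\varepsilon} |f(z)|,$$

since the integrand of $R_k(A, E)$ contains $k + 2$ resolvents, each of norm at most $\varepsilon^{-1}$ on $\Gamma_\varepsilon$. Here

• $\Gamma_\varepsilon$ is a contour enclosing $\Lambda_\varepsilon(A)$ and $\Lambda_\varepsilon(A + E)$.

• $L_\varepsilon$ is the length of the contour $\Gamma_\varepsilon$.

• ε is a parameter to be chosen.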
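In the 2-norm, $\|(zI - X)^{-1}\|_2 \ge \varepsilon^{-1}$ is equivalent to $\sigma_{\min}(zI - X) \le \varepsilon$, which gives a simple membership test. The sketch below uses it to estimate the pseudospectral radius on a grid; the function names, the grid resolution and the test matrix are assumptions, and a grid search like this is a crude illustration rather than a reliable algorithm.

```python
import numpy as np

def in_pseudospectrum(X, z, eps):
    """z lies in the eps-pseudospectrum iff sigma_min(zI - X) <= eps (2-norm)."""
    smin = np.linalg.svd(z * np.eye(X.shape[0]) - X, compute_uv=False)[-1]
    return smin <= eps

def pseudospectral_radius_grid(X, eps, points=121):
    """Crude grid estimate (from below) of the eps-pseudospectral radius."""
    bound = np.linalg.norm(X, 2) + eps     # the pseudospectrum lies inside this disc
    xs = np.linspace(-bound, bound, points)
    best = 0.0
    for re in xs:
        for im in xs:
            z = complex(re, im)
            if abs(z) > best and in_pseudospectrum(X, z, eps):
                best = abs(z)
    return best

# A nonnormal example: the spectrum is {0}, yet ||X||_2 = 2.
X = np.array([[0.0, 2.0], [0.0, 0.0]])
eps = 0.1
rho_eps = pseudospectral_radius_grid(X, eps)
```

For this matrix the estimate is several times larger than ε, even though the only eigenvalue is 0, which shows how nonnormality enlarges the pseudospectrum and hence the contour $\Gamma_\varepsilon$.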



Applying this to $R_0(A, E) = f(A + E) - f(A)$ gives a bound on the condition number:

$$\mathrm{cond}_{\mathrm{abs}}(f, A) \le \frac{L_\varepsilon}{2\pi \varepsilon^{2}} \max_{z \in \Gamma_\varepsilon} |f(z)|,$$

where $\Gamma_\varepsilon$ encloses $\Lambda_\varepsilon(A)$ and has length $L_\varepsilon$.

Interesting because:

• Usually only lower bounds on the condition number are known.

• Computing (or estimating) this efficiently could be of considerable interest in practice or for algorithm design.



We can get a much simpler version of this using the following result.

Lemma (Reddy, Schmid, and Henningson)

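Let $W(A)$ be the numerical range of $A$ and $\Delta_\delta$ be a closed disk of radius $\delta$. Then for all $\varepsilon > 0$

$$\Lambda_\varepsilon(A) \subset W(A) + \Delta_\varepsilon,$$

and therefore $\rho_\varepsilon(A) \le \|A\|_2 + \varepsilon$.

Take $\Gamma_\varepsilon$ to be a circle of radius $\|A\|_2 + \varepsilon$ in our bound:

Corollary (Deadman and R., 2015)

$$\mathrm{cond}_{\mathrm{abs}}(f, A) \le \frac{\|A\|_2 + \varepsilon}{\varepsilon^{2}} \max_{|z| = \|A\|_2 + \varepsilon} |f(z)|.$$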
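The corollary is cheap to evaluate, so ε can simply be scanned to find the tightest bound. The sketch below does this for $f(A) = (I + A)^{-1}$, where the 2-norm absolute condition number is known to be $\|(I + A)^{-1}\|_2^2$ and $\max_{|z| = r} |1/(1 + z)| = 1/(1 - r)$ for $r < 1$. The test matrix, the range of ε and the function name `corollary_bound` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
A *= 0.4 / np.linalg.norm(A, 2)      # ||A||_2 = 0.4, leaving room for eps below 0.6
norm_A = np.linalg.norm(A, 2)

# For f(A) = (I + A)^{-1}: L_f(A, E) = -M E M with M = (I + A)^{-1}, so the
# 2-norm absolute condition number equals ||M||_2^2.
M = np.linalg.inv(np.eye(5) + A)
cond_abs = np.linalg.norm(M, 2) ** 2

def corollary_bound(eps):
    """Right-hand side of the corollary for f(z) = 1/(1 + z)."""
    r = norm_A + eps                  # contour radius; must stay below 1
    max_f = 1.0 / (1.0 - r)           # maximum of |1/(1 + z)| on |z| = r
    return r / eps**2 * max_f

eps_grid = np.linspace(0.05, 0.55, 51)
bounds = np.array([corollary_bound(e) for e in eps_grid])
best_eps = eps_grid[np.argmin(bounds)]
```

Small ε inflates the factor $1/\varepsilon^2$ while large ε pushes the contour towards the singularity of $f$, so the best value lies in between. In this example the bound is valid but noticeably larger than the true condition number.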





Summary

So far:

• Obtained an explicit remainder term for Taylor polynomials of matrix functions

• Used pseudospectra to obtain a (computable!) upper bound on the condition number

• Shown how this can be applied to the analysis of Padé approximants

Future work:

• Use to analyze current matrix function algorithms in more detail

