The Search for the Nearest Defective Matrix

Michael L. Overton
Courant Institute of Mathematical Sciences, New York University

Joint work with R. Alam, S. Bora, R. Byers

RPB Workshop, Berlin, 2006

Brent Memories

• 1975

• 1986-1987

The Nearest Defective Matrix to A

• Defective means having an eigenvalue with algebraic multiplicity > geometric multiplicity

• Equivalently, not diagonalizable

• Equivalently, with a nontrivial Jordan block

• Equivalently, with a nonlinear elementary divisor

• Equivalently, with an infinitely ill-conditioned eigenvalue

• The distance to it is the same as the distance to the nearest matrix with a multiple eigenvalue

• It is also the same as the distance to the nearest matrix with a double eigenvalue

• The search for this matrix began with Wilkinson

Wilkinson, AEP, 1965

• Defined the condition number of a simple eigenvalue as $1/|y^*x|$, where $y$ and $x$ are respectively normalized left and right eigenvectors

• “Even if the eigenvalues are well separated they may still be very ill-conditioned.”

• An example of such A is given

• “It might be expected that there is a matrix close to A which has some nonlinear elementary divisors and we can readily see that this is true…”
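To illustrate Wilkinson's definition, the following sketch computes $1/|y^*x|$ for every eigenvalue of a diagonalizable matrix. It assumes NumPy, and the function name `eig_condition_numbers` is hypothetical. The method relies on the fact that, when the columns of X are right eigenvectors, the rows of $X^{-1}$ are left eigenvectors scaled so that $y_i^* x_i = 1$. The 2 by 2 triangular example has well-separated eigenvalues 1 and 2, yet both have condition numbers of about $10^4$, in the spirit of Wilkinson's remark.

```python
import numpy as np

def eig_condition_numbers(A):
    """Eigenvalues of A and Wilkinson's condition numbers 1/|y^* x|.

    Assumes A is diagonalizable with simple eigenvalues (X must be invertible).
    With A = X diag(lam) X^{-1}, row i of X^{-1} is a left eigenvector y_i^*
    scaled so that y_i^* x_i = 1; after normalizing both vectors,
    1/|y_i^* x_i| = ||x_i|| * ||row i of X^{-1}||.
    """
    lam, X = np.linalg.eig(A)            # columns of X: right eigenvectors
    Y_star = np.linalg.inv(X)            # rows: scaled left eigenvectors
    kappa = np.linalg.norm(X, axis=0) * np.linalg.norm(Y_star, axis=1)
    return lam, kappa

# Well-separated eigenvalues 1 and 2, yet both are ill-conditioned.
A = np.array([[1.0, 1e4], [0.0, 2.0]])
lam, kappa = eig_condition_numbers(A)
print(lam, kappa)   # condition numbers close to 1e4 (sqrt(1 + 1e8) in exact arithmetic)
```

For a 2 by 2 upper triangular matrix with diagonal entries a, b and off-diagonal entry c, both condition numbers equal $\sqrt{1 + |c/(a-b)|^2}$. They therefore become large when c is large relative to the eigenvalue gap.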

Ruhe, Numer. Math., 1970

• “It is a well known fact that a matrix which is close to one with multiple eigenvalues has an ill-conditioned eigensystem. We show a converse, that if a matrix has an ill-conditioned eigensystem, it is also close to a matrix having multiple eigenvalues.”

Golub and Wilkinson, SIREV, 1976

• “The solution of the eigenvalue problem for a nonnormal matrix presents severe practical difficulties when A is defective or close to a defective matrix.”

• “We now show that when an eigenvalue of A is ill-conditioned, A is necessarily relatively close to a matrix with a multiple eigenvalue.”

• Improves on Ruhe’s bound and an unpublished bound of Kahan

Demmel, PhD thesis, 1983

• Defines diss_path(A) as the distance from the n by n matrix A to the nearest matrix with multiple eigenvalues (“path” refers to the path traveled by the eigenvalues under a smoothly varying perturbation)

• Defines diss_region(A) as the largest $\varepsilon$ such that the “area swept out by the eigenvalues under perturbation”, $\{z : z \text{ is an eigenvalue of } A + E \text{ for some } E \text{ with } \|E\| \le \varepsilon\}$, consists of n disjoint regions

• Notes that for all norms diss_path ≥ diss_region, since in the second definition different eigenvalues may be moved by different perturbations

• Observes that for some norms diss_path > diss_region

• Says it is an interesting open question as to whether diss_path = diss_region in the case of the 2 and Frobenius norms

Pseudospectra

• Pseudospectra Gateway: www.comlab.ox.ac/pseudospectra/

• Spectra and Pseudospectra, Trefethen and Embree, Princeton, 2005

• In the language of pseudospectra, Demmel’s question asks whether, for the 2 and F norms, the distance to the nearest defective matrix is the same as the largest $\varepsilon$ for which the $\varepsilon$-pseudospectrum consists of n disjoint regions

• That is, the smallest $\varepsilon$ for which two $\varepsilon$-pseudospectral components coalesce

• EigTool (T. Wright and Trefethen, 2003) (2 and F norm)

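Fundamental Thm of Pseudospectra

• The following two definitions of the 2- and F-norm $\varepsilon$-pseudospectrum of A are equivalent:
  – $\{z : \det(A + E - zI) = 0 \text{ for some } E \text{ with } \|E\| \le \varepsilon\}$
  – $\{z : \sigma_n(A - zI) \le \varepsilon\}$, where $\sigma_n$ denotes the smallest singular value

• Easy to prove via the SVD:
  – If z satisfies the first condition, then $A - zI + E$ is singular, so $\sigma_n(A - zI) \le \|E\| \le \varepsilon$, because the distance from $A - zI$ to the nearest singular matrix is $\sigma_n(A - zI)$
  – If z satisfies the second condition, then $A - \sigma_n(A - zI)\, u_n v_n^*$, where $u_n$ and $v_n$ are the left and right singular vectors of $A - zI$ for $\sigma_n$, must have an eigenvalue z

• Thus EigTool plots contours of $\sigma_n$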

• Extends to other norms

• Importance emphasized by Trefethen for many years

• Who first made this observation?
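The SVD argument on the "Fundamental Thm of Pseudospectra" slide translates into a few lines of NumPy. This is a minimal 2-norm illustration, not EigTool's implementation, and the helper names are hypothetical.

- `rank_one_perturbation` builds $E = -\sigma_n u_n v_n^*$. Then z is an eigenvalue of $A + E$ and $\|E\|_2 = \sigma_n(A - zI)$.
- `sigma_min_grid` evaluates $\sigma_n(A - zI)$ on a grid of points. This is the quantity whose level curves bound the $\varepsilon$-pseudospectra.

```python
import numpy as np

def sigma_min(A, z):
    """Smallest singular value sigma_n(A - zI)."""
    n = A.shape[0]
    return np.linalg.svd(A - z * np.eye(n), compute_uv=False)[-1]

def rank_one_perturbation(A, z):
    """Return E = -sigma_n u_n v_n^* so that z is an eigenvalue of A + E.

    u_n, v_n are the left/right singular vectors of A - zI for its smallest
    singular value, so ||E||_2 = sigma_n(A - zI).
    """
    n = A.shape[0]
    U, s, Vh = np.linalg.svd(A - z * np.eye(n))
    u = U[:, -1]
    v = Vh[-1, :].conj()          # (A - zI) v = s[-1] * u
    return -s[-1] * np.outer(u, v.conj())

def sigma_min_grid(A, xs, ys):
    """sigma_n(A - (x+iy)I) on a grid (rows indexed by y, columns by x).

    The eps-pseudospectrum is the set of points where these values are <= eps.
    """
    return np.array([[sigma_min(A, x + 1j * y) for x in xs] for y in ys])

A = np.array([[1.0, 2.0], [0.0, -1.0]])
z = 0.3 + 0.4j
E = rank_one_perturbation(A, z)
print(np.linalg.norm(E, 2), sigma_min(A, z))   # the two numbers agree
print(sigma_min(A + E, z))                     # zero up to rounding: z is an eigenvalue of A + E
```

The grid array can be passed to any contour-plotting routine. Computing one SVD per grid point is the brute-force approach.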

Wilkinson, Utilitas Math, 1984, 71 pp.

• “A problem of primary interest to us is the distance, measured in the 2 norm, of our matrix A from matrices having multiple eigenvalues.”

• “We expect that when the condition number is large A will be, at least in some sense, close to a matrix having a double eigenvalue. A major objective of this paper is to quantify this statement.”

• “The nearest defective matrix will often have an eigenvalue of multiplicity 2 and the nearest matrix with an eigenvalue of larger multiplicity may be much further away.”

• Still no mention of pseudospectra

Wilkinson, Utilitas Math 30, 1986, 43 pp.

• “The domain $D(\varepsilon)$ in the complex plane defined by $\|(A - zI)^{-1}\|^{-1} \le \varepsilon$ uniquely identifies all z which can be induced as eigenvalues by perturbations E with $\|E\| \le \varepsilon$”

• “We would like to emphasize the sheer economy of this theorem.”

• May be the first explicit observation of the fundamental theorem of pseudospectra, although Varah observed the less trivial direction in 1979 (possible exception: Landau)

• “The behaviour of the domain $D(\varepsilon)$ as $\varepsilon$ increases from 0 is of fundamental interest to us.”

• “When A has distinct eigenvalues and $\varepsilon$ is sufficiently small, $D(\varepsilon)$ consists of n isolated domains” [connected components]

Wilkinson, Util Math 30, continued

• “A problem of basic interest to us is the smallest value of $\varepsilon$ for which one of these domains [connected components] coalesces with one of the others.”

• Observes that coalescence of two components of $D(\varepsilon)$ at z for a particular $\varepsilon$ shows that each of two eigenvalues can be moved to the same z by a perturbation of norm $\varepsilon$, but this does not imply that a single perturbation exists that can induce a double eigenvalue.

• Like Demmel, gives examples where this is indeed not possible for the 1 and $\infty$ norms

• Observes that “it might be felt that the 2 norm is more satisfactory…” but that “computation of the smallest singular value for a range of values of z is generally prohibitive”.

The Editors, Util Math 31, 1987

• When we sent Vol 30 to press, the last paper was a 40-page study by Wilkinson on sensitivity of eigenvalues. At that time we little thought that Vol 31 would commence with an “in memoriam” notice…

Demmel, 1987

• In proceedings of a conference in memory of Wilkinson

• “No counterexample [to a conjecture that the answer to his question is yes] is yet known.”

• “A simple guaranteed way to compute the distance to the nearest defective matrix remains elusive.”

Malyshev, Numer. Math., 1999

• Distance to nearest defective matrix in 2-norm is

• Inner minimization is unimodal, but outer is potentially a hard global optimization problem.

• Inspired by algorithm to compute the “real stability radius”

Edelman and Lippert, 1998-1999

• Used ideas from both pseudospectra and differential geometry

• Used these to argue that, generically, the distance to the nearest defective matrix in the F-norm is the height of the lowest saddle point of $f(x,y) = \sigma_n(A - (x+iy) I)$

• The implication is that the answer to Demmel’s question is yes, at least generically, but this is not addressed

• No algorithm is given to find this lowest critical point
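In the smooth (generic) case, a standard first-order perturbation argument makes the saddle-point condition concrete. Assume $\sigma_n = \sigma_n(A - zI)$ is simple and positive, with unit singular vectors satisfying $(A - zI)v_n = \sigma_n u_n$. Differentiating $\sigma_n = u_n^*(A - zI)v_n$ gives the gradient of f. Its critical points are therefore exactly the points where $u_n^* v_n = 0$:

```latex
\begin{aligned}
\sigma_n(A - zI) &= u_n^*(A - zI)\,v_n, \qquad z = x + iy,\\
d\sigma_n &= \operatorname{Re}\bigl(u_n^*\,d(A - zI)\,v_n\bigr)
           = \operatorname{Re}\bigl(-(dx + i\,dy)\,u_n^* v_n\bigr)
           = -\operatorname{Re}(u_n^* v_n)\,dx + \operatorname{Im}(u_n^* v_n)\,dy,\\
\nabla f(x,y) &= \bigl(-\operatorname{Re}(u_n^* v_n),\ \operatorname{Im}(u_n^* v_n)\bigr),
\qquad \nabla f(x,y) = 0 \iff u_n^* v_n = 0 .
\end{aligned}
```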

Alam and Bora, LAA, 2005

• First proof that the answer to Demmel’s question is yes

• Furthermore, the infimum is always achieved by a defective matrix with a double eigenvalue

• Proof is not easy

• A. Lewis has now obtained a more topologically-based proof

• This confirms that the nearest defective matrix is $A - \sigma_n(A - zI)\, u_n v_n^*$, where $z = x + iy$ and $(x,y)$ is the lowest saddle point of $f(x,y) = \sigma_n(A - (x+iy) I)$, for the 2 and F norms, as long as $\sigma_n(A - zI)$ is simple
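A short derivation, using only the SVD relations for $A - zI$, shows why this rank-one perturbation produces a defective matrix. Write $B = A - \sigma_n u_n v_n^*$. Then $v_n$ and $u_n$ are right and left eigenvectors of B for the eigenvalue z, so by Wilkinson's formula the condition number of z is $1/|u_n^* v_n|$.

At a smooth saddle point, $u_n^* v_n = 0$, because the gradient of f is $(-\operatorname{Re}\, u_n^* v_n, \operatorname{Im}\, u_n^* v_n)$. A simple eigenvalue can never satisfy this, so z must be a multiple eigenvalue. Meanwhile $B - zI = \sum_{i<n} \sigma_i u_i v_i^*$ has rank $n-1$ when $\sigma_{n-1} > 0$. Hence z has a single eigenvector, and B is defective:

```latex
\begin{aligned}
(A - zI)\,v_n &= \sigma_n u_n, \qquad u_n^*(A - zI) = \sigma_n v_n^*, \qquad B = A - \sigma_n u_n v_n^*,\\
B\,v_n &= A v_n - \sigma_n u_n\,(v_n^* v_n) = z\,v_n,\\
u_n^* B &= u_n^* A - \sigma_n (u_n^* u_n)\,v_n^* = z\,u_n^*,\\
\|A - B\|_2 &= \|A - B\|_F = \sigma_n, \qquad \frac{1}{|u_n^* v_n|} = \infty \ \text{ when } u_n^* v_n = 0 .
\end{aligned}
```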

Nongeneric Case

• What if $\sigma_{n-1}(A - zI) = \sigma_n(A - zI)$?

• Then $A - \sigma_{n-1}(A - zI)\, u_{n-1} v_{n-1}^* - \sigma_n(A - zI)\, u_n v_n^*$ is a nearest matrix with a multiple eigenvalue, in the 2 norm

• This eigenvalue is nondefective (it has algebraic multiplicity 2 and geometric multiplicity 2)

• One can also construct a nearest defective matrix, for both 2 and F norms

• Example: normal matrices: pseudospectral components are circles

• More interesting: nonnormal block diagonal matrices, when two coalescing eigenvalues come from different blocks: pseudospectral components are not circles but coalesce tangentially
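The simplest nongeneric example is a normal matrix. The sketch below assumes NumPy, and `two_term_perturbation` is a hypothetical helper name. It takes $A = \mathrm{diag}(0, 1, 5)$ and $z = 1/2$, the midpoint of the two closest eigenvalues, where $\sigma_{n-1}(A - zI) = \sigma_n(A - zI) = 1/2$. Removing the two smallest singular triplets of $A - zI$ leaves $\mathrm{diag}(1/2, 1/2, 5)$. This matrix has a double eigenvalue $1/2$ with two independent eigenvectors, reached by a perturbation of 2-norm $1/2$.

```python
import numpy as np

def two_term_perturbation(A, z):
    """Return (A - P, s): P = sigma_{n-1} u_{n-1} v_{n-1}^* + sigma_n u_n v_n^*
    is built from the two smallest singular triplets of M = A - zI, and s holds
    the singular values of M in decreasing order.

    (A - P) - zI has rank at most n - 2, and ||P||_2 = sigma_{n-1}(M), which
    equals sigma_n(M) in the nongeneric case sigma_{n-1}(M) = sigma_n(M).
    """
    n = A.shape[0]
    U, s, Vh = np.linalg.svd(A - z * np.eye(n))
    P = U[:, -2:] @ np.diag(s[-2:]) @ Vh[-2:, :]
    return A - P, s

A = np.diag([0.0, 1.0, 5.0])   # normal (diagonal) example
z = 0.5                        # midpoint of the two closest eigenvalues
B, s = two_term_perturbation(A, z)
print(s)                                          # singular values 4.5, 0.5, 0.5
print(np.linalg.norm(A - B, 2))                   # about 0.5
print(np.linalg.matrix_rank(B - z * np.eye(3)))   # 1: z = 0.5 has two independent eigenvectors
```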

Clarke Generalized Gradient

• Clarke (PhD thesis, 1973; book, 1982)

• Assume f is locally Lipschitz

• The Clarke generalized gradient of f at $x \in \mathbb{R}^N$ is the convex hull of limits of gradients of f at points converging to x:
$\partial_C f(x) = \operatorname{conv}\{\lim_{r\to\infty} \nabla f(x_r) : x_r \to x,\ x_r \in Q\}$, where $Q = \{w : f \text{ is differentiable at } w\}$

Alam, Bora, Byers, Overton, 2006

• Theorem: in all cases, for the 2 and F norms, the distance to the nearest defective matrix is the height of the lowest generalized saddle point of $f(x,y) = \sigma_n(A - (x+iy) I)$

• That is, a point such that $0 \in \partial_C f(x,y)$, but $(x,y)$ is not a local extremum of f

• Covers the generic case $\sigma_n < \sigma_{n-1}$ (smooth saddle point, where $\sigma_n$ has a zero gradient)

• And the nongeneric case $\sigma_{n-1} = \sigma_n$ (nonsmooth saddle point: $\sigma_n$ is not differentiable at the saddle point, and its gradient has two different limits, which point in opposite directions when the point of coalescence is approached from the two different pseudospectral components)
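This line shows why opposite limiting gradients still give a generalized saddle point. Suppose two limits $g_1, g_2$ of $\nabla f$ at the point of coalescence satisfy $g_2 = -\alpha g_1$ with $\alpha > 0$. Then 0 is a convex combination of them, and hence lies in the Clarke generalized gradient. The one-variable analogue is $f(x) = |x|$ at $x = 0$:

```latex
g_2 = -\alpha g_1,\ \alpha > 0
\;\Longrightarrow\;
0 = \frac{\alpha}{1+\alpha}\,g_1 + \frac{1}{1+\alpha}\,g_2 \in \operatorname{conv}\{g_1, g_2\} \subseteq \partial_C f(x,y);
\qquad
\partial_C |\cdot|\,(0) = \operatorname{conv}\{-1, 1\} = [-1, 1].
```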

Algorithm

• Still no guaranteed algorithm known

• Heuristic: apply Newton’s method in 2 real variables (1 complex variable) to search for a zero of the gradient of $\sigma_n(A - (x+iy) I)$

• This breaks down in the nongeneric case
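The heuristic can be sketched in NumPy as follows. The sketch makes three assumptions:

- $\sigma_n(A - zI)$ stays simple throughout the iteration.
- The gradient comes from the first-order formula $\nabla f = (-\operatorname{Re}\, u_n^* v_n, \operatorname{Im}\, u_n^* v_n)$.
- The 2 by 2 Hessian is approximated by central differences of that gradient. The slides do not say how the Hessian is formed.

`newton_saddle` and its parameters are hypothetical names. Like the heuristic on the slide, it carries no guarantee and breaks down in the nongeneric case.

```python
import numpy as np

def grad_sigma_min(A, z):
    """Gradient of f(x, y) = sigma_n(A - (x+iy)I), assuming sigma_n is simple.

    With (A - zI) v = sigma_n u, first-order perturbation theory gives
    grad f = (-Re(u^* v), Im(u^* v)). Also returns sigma_n, u and v.
    """
    n = A.shape[0]
    U, s, Vh = np.linalg.svd(A - z * np.eye(n))
    u, v = U[:, -1], Vh[-1, :].conj()
    w = np.vdot(u, v)                            # u^* v (independent of the common phase of u, v)
    return np.array([-w.real, w.imag]), s[-1], u, v

def newton_saddle(A, z0, tol=1e-12, maxit=50, h=1e-6):
    """Newton's method for a zero of grad f in the real variables (x, y).

    The Hessian is approximated by central differences of the analytic gradient.
    Heuristic only: no convergence guarantee, and not valid where sigma_n is a
    repeated singular value (the nongeneric case).
    Returns z, sigma_n(A - zI) and the candidate B = A - sigma_n u v^*.
    """
    p = np.array([z0.real, z0.imag])
    for _ in range(maxit):
        g = grad_sigma_min(A, complex(*p))[0]
        if np.linalg.norm(g) < tol:
            break
        H = np.empty((2, 2))
        for j in range(2):
            e = h * np.eye(2)[j]
            H[:, j] = (grad_sigma_min(A, complex(*(p + e)))[0]
                       - grad_sigma_min(A, complex(*(p - e)))[0]) / (2 * h)
        p = p - np.linalg.solve(H, g)
    z = complex(*p)
    _, sigma, u, v = grad_sigma_min(A, z)
    return z, sigma, A - sigma * np.outer(u, v.conj())

A = np.array([[1.0, 2.0], [0.0, -1.0]])
z, sigma, B = newton_saddle(A, 0.1 + 0.1j)
print(z, sigma)               # expected: z near 0, sigma near sqrt(2) - 1
print(np.linalg.eigvals(B))   # both eigenvalues should be close to z
```

For this A, $\det(A - zI) = z^2 - 1$ and $\|A - zI\|_F^2 = 6 + 2|z|^2$. For a 2 by 2 matrix these two quantities determine the singular values, so f is even in z, and z = 0 is a critical point with $f(0) = \sqrt{2} - 1$. Near 0, f decreases along the real axis toward the eigenvalues ±1 and increases along the imaginary axis, so z = 0 is a saddle point. The returned B should have a double eigenvalue at 0.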

An Iteration Covering Both Cases

• Idea: apply Newton’s method to find a zero of the following function mapping $\mathbb{C} \times \mathbb{R}$ to $\mathbb{C} \times \mathbb{R}$

• Use known bounds of Wilkinson and Alam to provide starting points (weighted averages of eigenvalue pairs)

Convergence Rate

• Smooth saddle point: quadratic in theory and practice

• Nonsmooth saddle point: typically also see quadratic convergence

• But the function to which we are applying Newton’s method is not smooth on $\mathbb{C} \times \mathbb{R}$!

• We think we can show quadratic convergence by looking at the behaviour along lines through the nonsmooth saddle point

Transition Between Cases

• If we break the block diagonal structure by adding a small top right entry connecting the blocks, we see an abrupt transition from the nongeneric to the generic case

• The condition number of the Jacobian jumps to arbitrarily large values and then drops as we increase the size of the perturbation

• When we increase it enough, algorithm transitions to discovering that = 0, indicating the saddle point is smooth

Sep

• Given $A \in \mathbb{C}^{n\times n}$ and $B \in \mathbb{C}^{m\times m}$, find the smallest $\max\{\|E\|, \|F\|\}$ such that $A + E$ and $B + F$ share a common eigenvalue

• Equivalently, find the smallest $\varepsilon$ such that the $\varepsilon$-pseudospectra of A and B intersect

• Posed by Varah, 1979, and Demmel, 1986

• First algorithm: Gu-Overton, 2006

• This is a nearest defective matrix problem for Diag(A,B) with the requirement that the block diagonal structure be preserved
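By the fundamental theorem of pseudospectra, in the 2-norm this quantity equals $\min_z \max\{\sigma_{\min}(A - zI), \sigma_{\min}(B - zI)\}$. The reason is that a common eigenvalue z can be induced with $\|E\|, \|F\| \le \varepsilon$ exactly when both smallest singular values are at most $\varepsilon$. The brute-force grid search below assumes NumPy and uses hypothetical names. It only gives an upper bound on that minimum and is not the Gu-Overton algorithm.

```python
import numpy as np

def sigma_min(M, z):
    """Smallest singular value of M - zI."""
    return np.linalg.svd(M - z * np.eye(M.shape[0]), compute_uv=False)[-1]

def sep_grid(A, B, xs, ys):
    """Crude grid estimate of min_z max(sigma_min(A - zI), sigma_min(B - zI)).

    This is the smallest eps (2-norm) for which the eps-pseudospectra of A and
    B intersect. A finite grid only yields an upper bound and the minimizing
    grid point; it is NOT the Gu-Overton algorithm.
    """
    best, best_z = np.inf, None
    for x in xs:
        for y in ys:
            z = x + 1j * y
            val = max(sigma_min(A, z), sigma_min(B, z))
            if val < best:
                best, best_z = val, z
    return best, best_z

# 1-by-1 example: eigenvalues 0 and 1 can be made to meet at z = 1/2 with eps = 1/2.
A = np.array([[0.0]])
B = np.array([[1.0]])
print(sep_grid(A, B, np.linspace(-1, 2, 301), np.linspace(-1, 1, 201)))
```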

Thanks To

• Volker Mehrmann, our host in Berlin in 2004

• Gene Golub and Nick Trefethen, from whom I learned about numerical linear algebra and pseudospectra

• The Brent family, for many special memories