
Non-Square Matrices

There is a useful variation on the concept of eigenvalues and eigenvectors which is defined for both square and non-square matrices. Throughout this discussion, for $A \in \mathbb{C}^{m \times n}$, let $\|A\|$ denote the operator norm induced by the Euclidean norms on $\mathbb{C}^n$ and $\mathbb{C}^m$ (which we denote by $\|\cdot\|$), and let $\|A\|_F$ denote the Frobenius norm of $A$. Note that we still have $\langle y, Ax\rangle_{\mathbb{C}^m} = y^*Ax = \langle A^*y, x\rangle_{\mathbb{C}^n}$ for $x \in \mathbb{C}^n$, $y \in \mathbb{C}^m$.

From $A \in \mathbb{C}^{m \times n}$ one can construct the square matrices $A^*A \in \mathbb{C}^{n \times n}$ and $AA^* \in \mathbb{C}^{m \times m}$. Both of these are Hermitian positive semi-definite. In particular, $A^*A$ and $AA^*$ are diagonalizable with real non-negative eigenvalues. Except for the multiplicities of the zero eigenvalue, these matrices have the same eigenvalues; in fact, we have:

Proposition. Let $A \in \mathbb{C}^{m \times n}$ and $B \in \mathbb{C}^{n \times m}$ with $m \le n$. Then the eigenvalues of $BA$ (counting multiplicity) are the eigenvalues of $AB$, together with $n - m$ zeroes. (Remark: For $n = m$, this was Problem 4 on Problem Set 5.)

Proof. Consider the $(n+m) \times (n+m)$ matrices
$$C_1 = \begin{bmatrix} AB & 0 \\ B & 0 \end{bmatrix} \quad\text{and}\quad C_2 = \begin{bmatrix} 0 & 0 \\ B & BA \end{bmatrix}.$$

These are similar since $S^{-1}C_1S = C_2$, where
$$S = \begin{bmatrix} I & A \\ 0 & I \end{bmatrix} \quad\text{and}\quad S^{-1} = \begin{bmatrix} I & -A \\ 0 & I \end{bmatrix}.$$
But the eigenvalues of $C_1$ are those of $AB$ along with $n$ zeroes, and the eigenvalues of $C_2$ are those of $BA$ along with $m$ zeroes. The result follows.

So for any $m, n$, the eigenvalues of $A^*A$ and $AA^*$ differ by $|n - m|$ zeroes. Let $p = \min(m, n)$ and let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ $(\ge 0)$ be the joint eigenvalues of $A^*A$ and $AA^*$.

Definition. The singular values of $A$ are the numbers
$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0,$$
where $\sigma_i = \sqrt{\lambda_i}$. (When $n > m$, one often also defines singular values $\sigma_{m+1} = \cdots = \sigma_n = 0$.)

It is a fundamental result that one can choose orthonormal bases for $\mathbb{C}^n$ and $\mathbb{C}^m$ so that $A$ maps one basis into the other, scaled by the singular values. Let $\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_p) \in \mathbb{C}^{m \times n}$ be the diagonal matrix whose $ii$ entry is $\sigma_i$ $(1 \le i \le p)$.

Singular Value Decomposition (SVD). If $A \in \mathbb{C}^{m \times n}$, then there exist unitary matrices $U \in \mathbb{C}^{m \times m}$, $V \in \mathbb{C}^{n \times n}$ such that $A = U\Sigma V^*$, where $\Sigma \in \mathbb{C}^{m \times n}$ is the diagonal matrix of singular values.


Proof. By the same argument as in the square case, $\|A\|^2 = \|A^*A\|$. But $\|A^*A\| = \lambda_1 = \sigma_1^2$, so $\|A\| = \sigma_1$.

So we can choose $x \in \mathbb{C}^n$ with $\|x\| = 1$ and $\|Ax\| = \sigma_1$. Write $Ax = \sigma_1 y$ where $\|y\| = 1$. Complete $x$ and $y$ to unitary matrices
$$V_1 = [x, v_2, \ldots, v_n] \in \mathbb{C}^{n \times n} \quad\text{and}\quad U_1 = [y, u_2, \ldots, u_m] \in \mathbb{C}^{m \times m}.$$
Since $U_1^*AV_1 \equiv A_1$ is the matrix of $A$ in these bases, it follows that
$$A_1 = \begin{bmatrix} \sigma_1 & w^* \\ 0 & B \end{bmatrix}$$
for some $w \in \mathbb{C}^{n-1}$ and $B \in \mathbb{C}^{(m-1) \times (n-1)}$. Now observe that
$$\sigma_1^2 + w^*w \le \left\| \begin{bmatrix} \sigma_1^2 + w^*w \\ Bw \end{bmatrix} \right\| = \left\| A_1 \begin{bmatrix} \sigma_1 \\ w \end{bmatrix} \right\| \le \sigma_1 \left\| \begin{bmatrix} \sigma_1 \\ w \end{bmatrix} \right\| = \sigma_1 (\sigma_1^2 + w^*w)^{1/2}$$
since $\|A_1\| = \|A\| = \sigma_1$ by the invariance of $\|\cdot\|$ under unitary multiplication. It follows that $(\sigma_1^2 + w^*w)^{1/2} \le \sigma_1$, so $w = 0$, and thus
$$A_1 = \begin{bmatrix} \sigma_1 & 0 \\ 0 & B \end{bmatrix}.$$

Now apply the same argument to $B$ and repeat to get the result. For this, observe that
$$\begin{bmatrix} \sigma_1^2 & 0 \\ 0 & B^*B \end{bmatrix} = A_1^*A_1 = V_1^*A^*AV_1$$
is unitarily similar to $A^*A$, so the eigenvalues of $B^*B$ are $\lambda_2 \ge \cdots \ge \lambda_n$ $(\ge 0)$. Observe also that the same argument shows that if $A \in \mathbb{R}^{m \times n}$, then $U$ and $V$ can be taken to be real orthogonal matrices.

The proof given above is direct, but it masks some of the key ideas. We now sketch an alternative proof that reveals more of the underlying structure of the SVD.

Alternative Proof of SVD: Let $\{v_1, \ldots, v_n\}$ be an orthonormal basis of $\mathbb{C}^n$ consisting of eigenvectors of $A^*A$ associated with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ $(\ge 0)$, respectively, and let $V = [v_1 \cdots v_n] \in \mathbb{C}^{n \times n}$. Then $V$ is unitary, and
$$V^*A^*AV = \operatorname{diag}(\lambda_1, \ldots, \lambda_n) \in \mathbb{R}^{n \times n}.$$


For $1 \le i \le n$,
$$\|Av_i\|^2 = e_i^*V^*A^*AVe_i = \lambda_i = \sigma_i^2.$$
Choose the integer $r$ such that
$$\sigma_1 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_n = 0$$
($r$ turns out to be the rank of $A$). Then for $1 \le i \le r$, $Av_i = \sigma_i u_i$ for a unique $u_i \in \mathbb{C}^m$ with $\|u_i\| = 1$. Moreover, for $1 \le i, j \le r$,
$$u_i^*u_j = \frac{1}{\sigma_i\sigma_j} v_i^*A^*Av_j = \frac{\lambda_j}{\sigma_i\sigma_j} e_i^*e_j = \delta_{ij}.$$
So we can append vectors $u_{r+1}, \ldots, u_m \in \mathbb{C}^m$ (if necessary) so that $U = [u_1 \cdots u_m] \in \mathbb{C}^{m \times m}$ is unitary. It follows easily that $AV = U\Sigma$, so $A = U\Sigma V^*$.

The ideas in this second proof are derivable from the equality $A = U\Sigma V^*$ expressing the SVD of $A$ (no matter how it is constructed). The SVD equality is equivalent to $AV = U\Sigma$. Interpreting this equation columnwise gives
$$Av_i = \sigma_i u_i \quad (1 \le i \le p), \qquad\text{and}\qquad Av_i = 0 \ \text{ for } i > m \text{ if } n > m,$$
where $\{v_1, \ldots, v_n\}$ are the columns of $V$ and $\{u_1, \ldots, u_m\}$ are the columns of $U$. So $A$ maps the orthonormal vectors $\{v_1, \ldots, v_p\}$ into the orthogonal directions $\{u_1, \ldots, u_p\}$ with the singular values $\sigma_1 \ge \cdots \ge \sigma_p$ as scale factors. (Of course if $\sigma_i = 0$ for an $i \le p$, then $Av_i = 0$, and the direction of $u_i$ is not represented in the range of $A$.) The vectors $v_1, \ldots, v_n$ are called the right singular vectors of $A$, and $u_1, \ldots, u_m$ are called the left singular vectors of $A$.

Observe that
$$A^*A = V\Sigma^*\Sigma V^* \quad\text{and}\quad \Sigma^*\Sigma = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_n^2) \in \mathbb{R}^{n \times n}$$
even if $m < n$. So
$$V^*A^*AV = \Sigma^*\Sigma = \operatorname{diag}(\lambda_1, \ldots, \lambda_n),$$
and thus the columns of $V$ form an orthonormal basis consisting of eigenvectors of $A^*A \in \mathbb{C}^{n \times n}$. Similarly $AA^* = U\Sigma\Sigma^*U^*$, so
$$U^*AA^*U = \Sigma\Sigma^* = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_p^2, \underbrace{0, \ldots, 0}_{m-n \text{ zeroes if } m > n}) \in \mathbb{R}^{m \times m},$$
and thus the columns of $U$ form an orthonormal basis of $\mathbb{C}^m$ consisting of eigenvectors of $AA^* \in \mathbb{C}^{m \times m}$.

Caution. We cannot choose the bases of eigenvectors $\{v_1, \ldots, v_n\}$ of $A^*A$ (corresponding to $\lambda_1, \ldots, \lambda_n$) and $\{u_1, \ldots, u_m\}$ of $AA^*$ (corresponding to $\lambda_1, \ldots, \lambda_p, 0, \ldots, 0$) independently: we must have $Av_i = \sigma_i u_i$ whenever $\sigma_i > 0$.
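These relations are easy to check numerically. The following is a small sketch of mine (not part of the notes), assuming NumPy is available; it computes an SVD with numpy.linalg.svd and verifies $A = U\Sigma V^*$ and the columnwise relations $Av_i = \sigma_i u_i$.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 5, 3
    A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

    # Full SVD: U is m x m, Vh = V* is n x n, s holds the p = min(m, n) singular values.
    U, s, Vh = np.linalg.svd(A, full_matrices=True)

    # Rebuild the m x n "diagonal" matrix Sigma and check A = U Sigma V*.
    Sigma = np.zeros((m, n))
    Sigma[:min(m, n), :min(m, n)] = np.diag(s)
    print(np.allclose(A, U @ Sigma @ Vh))                 # True

    # Columnwise: A v_i = sigma_i u_i for 1 <= i <= p.
    V = Vh.conj().T
    for i in range(min(m, n)):
        print(np.allclose(A @ V[:, i], s[i] * U[:, i]))   # True for each i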


In general, the SVD is not unique. $\Sigma$ is uniquely determined, but if $A^*A$ has multiple eigenvalues, then one has freedom in the choice of bases in the corresponding eigenspaces, so $V$ (and thus $U$) is not uniquely determined. One has complete freedom of choice of orthonormal bases of $N(A^*A)$ and $N(AA^*)$: these form the right-most columns of $V$ and $U$, respectively. For a nonzero multiple singular value, one can choose the basis of the eigenspace of $A^*A$ (choosing columns of $V$), but then the corresponding columns of $U$ are determined; or, one can choose the basis of the eigenspace of $AA^*$ (choosing columns of $U$), but then the corresponding columns of $V$ are determined. If all the singular values $\sigma_1, \ldots, \sigma_n$ of $A$ are distinct, then each column of $V$ is uniquely determined up to a factor of modulus 1, i.e., $V$ is determined up to right multiplication by a diagonal matrix
$$D = \operatorname{diag}(e^{i\theta_1}, \ldots, e^{i\theta_n}).$$
Such a change in $V$ must be compensated for by multiplying the first $n$ columns of $U$ by $D$ (the first $n-1$ columns of $U$ by $\operatorname{diag}(e^{i\theta_1}, \ldots, e^{i\theta_{n-1}})$ if $\sigma_n = 0$); of course if $m > n$, then the last $m - n$ columns of $U$ have further freedom (they are in $N(AA^*)$).

There is an abbreviated form of the SVD useful in computation. Since rank is preserved under unitary multiplication, $\operatorname{rank}(A) = r$ iff $\sigma_1 \ge \cdots \ge \sigma_r > 0 = \sigma_{r+1} = \cdots$. Let $U_r \in \mathbb{C}^{m \times r}$, $V_r \in \mathbb{C}^{n \times r}$ be the first $r$ columns of $U$, $V$, respectively, and let $\Sigma_r = \operatorname{diag}(\sigma_1, \ldots, \sigma_r) \in \mathbb{R}^{r \times r}$. Then $A = U_r\Sigma_r V_r^*$ (exercise).
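As a numerical illustration of the abbreviated form (my own sketch, assuming NumPy), take a rank-deficient matrix: only the first $r$ columns of $U$ and $V$ and the leading $r \times r$ block of $\Sigma$ are needed to reconstruct $A$.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))   # 6 x 4, rank 2

    U, s, Vh = np.linalg.svd(A, full_matrices=False)   # "economy" SVD
    r = int(np.sum(s > 1e-12 * s[0]))                  # numerical rank (here 2)

    Ur, Sr, Vr = U[:, :r], np.diag(s[:r]), Vh[:r, :].conj().T
    print(r, np.allclose(A, Ur @ Sr @ Vr.conj().T))    # 2 True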

Applications of SVD

If $m = n$, then $A \in \mathbb{C}^{n \times n}$ has eigenvalues as well as singular values. These can differ significantly. For example, if $A$ is nilpotent, then all of its eigenvalues are 0, but all of the singular values of $A$ vanish iff $A = 0$. However, for $A$ normal, we have:

Proposition. Let $A \in \mathbb{C}^{n \times n}$ be normal, and order the eigenvalues of $A$ as
$$|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|.$$
Then the singular values of $A$ are $\sigma_i = |\lambda_i|$, $1 \le i \le n$.

Proof. By the Spectral Theorem for normal operators, there is a unitary $V \in \mathbb{C}^{n \times n}$ for which $A = V\Lambda V^*$, where $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$. For $1 \le i \le n$, choose $d_i \in \mathbb{C}$ for which $d_i\lambda_i = |\lambda_i|$ and $|d_i| = 1$, and let $D = \operatorname{diag}(d_1, \ldots, d_n)$. Then $D$ is unitary, and
$$A = (VD^*)(D\Lambda)V^* \equiv U\Sigma V^*,$$
where $U = VD^*$ is unitary and $\Sigma = D\Lambda = \operatorname{diag}(|\lambda_1|, \ldots, |\lambda_n|)$ is diagonal with decreasing nonnegative diagonal entries.

Note that both the right and left singular vectors (columns of $V$, $U$) are eigenvectors of $A$; the columns of $U$ have been scaled by the complex numbers $\bar{d}_i$ of modulus 1.

The Frobenius and Euclidean operator norms of $A \in \mathbb{C}^{m \times n}$ are easily expressed in terms of the singular values of $A$:
$$\|A\|_F = \left( \sum_{i=1}^{n} \sigma_i^2 \right)^{1/2} \quad\text{and}\quad \|A\| = \sigma_1 = \sqrt{\rho(A^*A)},$$


as follows from the unitary invariance of these norms. There are no such simple expressions (in general) for these norms in terms of the eigenvalues of $A$ if $A$ is square (but not normal). Also, one cannot use the spectral radius $\rho(A)$ as a norm on $\mathbb{C}^{n \times n}$ because it is possible to have $\rho(A) = 0$ with $A \neq 0$; however, on the subspace of $\mathbb{C}^{n \times n}$ consisting of the normal matrices, $\rho(A)$ is a norm since it agrees with the Euclidean operator norm for normal matrices.
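A quick numerical check of these two formulas (a sketch of mine, assuming NumPy): the Frobenius norm is the root sum of squares of the singular values, and the operator 2-norm is $\sigma_1$.

    import numpy as np

    A = np.array([[3.0, 1.0, 0.0],
                  [1.0, 2.0, 4.0]])
    s = np.linalg.svd(A, compute_uv=False)

    print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2))))   # True
    print(np.isclose(np.linalg.norm(A, 2), s[0]))                        # True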

The SVD is useful computationally for questions involving rank. The rank of $A \in \mathbb{C}^{m \times n}$ is the number of nonzero singular values of $A$, since rank is invariant under pre- and post-multiplication by invertible matrices. There are stable numerical algorithms for computing the SVD (try it in MATLAB). In the presence of round-off error, row-reduction to echelon form usually fails to find the rank of $A$ when its rank is $< \min(m, n)$; for such a matrix, the computed SVD has the zero singular values computed to be on the order of machine $\epsilon$, and these are often identifiable as numerical zeroes. For example, if the computed singular values of $A$ are $10^2, 10, 1, 10^{-1}, 10^{-2}, 10^{-3}, 10^{-4}, 10^{-15}, 10^{-15}, 10^{-16}$ with machine $\epsilon \approx 10^{-16}$, one can safely expect $\operatorname{rank}(A) = 7$.
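The thresholding just described might look like the following sketch (my own; the particular cutoff, a small multiple of machine epsilon times $\sigma_1$, is a common heuristic rather than something prescribed in the notes).

    import numpy as np

    def numerical_rank(A, rtol=None):
        """Count the singular values lying above a cutoff tied to machine precision."""
        s = np.linalg.svd(A, compute_uv=False)
        if rtol is None:
            rtol = max(A.shape) * np.finfo(A.dtype).eps
        return int(np.sum(s > rtol * s[0]))

    # Example in the spirit of the text: the tiny singular values are "numerical zeroes".
    s_true = np.array([1e2, 10, 1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-15, 1e-15, 1e-16])
    rng = np.random.default_rng(2)
    U, _ = np.linalg.qr(rng.standard_normal((10, 10)))
    V, _ = np.linalg.qr(rng.standard_normal((10, 10)))
    A = U @ np.diag(s_true) @ V.T
    print(numerical_rank(A))   # expect 7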

Another application of the SVD is to derive the polar form of a matrix. This is the analogue of the polar form $z = re^{i\theta}$ in $\mathbb{C}$. (Note from Problem 1 on Problem Set 6 that $U \in \mathbb{C}^{n \times n}$ is unitary iff $U = e^{iH}$ for some Hermitian $H \in \mathbb{C}^{n \times n}$.)

Polar Form

Every $A \in \mathbb{C}^{n \times n}$ may be written as $A = PU$, where $P$ is positive semi-definite Hermitian and $U$ is unitary.

Proof. Let $A = U\Sigma V^*$ be an SVD of $A$, and write
$$A = (U\Sigma U^*)(UV^*).$$
Then $U\Sigma U^*$ is positive semi-definite Hermitian and $UV^*$ is unitary.

Observe in the proof that the eigenvalues of $P$ are the singular values of $A$; this is true for any polar decomposition of $A$ (exercise). We note that in the polar form $A = PU$, $P$ is always uniquely determined, and $U$ is uniquely determined if $A$ is invertible (as in $z = re^{i\theta}$). The uniqueness of $P$ follows from the following two facts:

(i) $AA^* = PUU^*P^* = P^2$, and

(ii) every positive semi-definite Hermitian matrix has a unique positive semi-definite Hermitian square root (see H-J, Theorem 7.2.6).

If $A$ is invertible, then so is $P$, so $U = P^{-1}A$ is also uniquely determined. There is also a version of the polar form for non-square matrices; see H-J for details.
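The proof translates directly into a computation (a sketch of mine, assuming NumPy; scipy.linalg.polar does the same job, but here the factors are built explicitly from the SVD). The unitary factor is called W below to avoid a clash with the U of the SVD.

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((4, 4))

    # A = U Sigma V*  gives the polar form  A = P W  with  P = U Sigma U*  and  W = U V*.
    U, s, Vh = np.linalg.svd(A)
    P = U @ np.diag(s) @ U.conj().T      # positive semi-definite Hermitian
    W = U @ Vh                           # unitary

    print(np.allclose(A, P @ W))                           # True
    print(np.allclose(P, P.conj().T))                      # P is Hermitian
    print(np.all(np.linalg.eigvalsh(P) >= -1e-12))         # P is positive semi-definite
    print(np.allclose(W.conj().T @ W, np.eye(4)))          # W is unitary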

Linear Least Squares Problems

If $A \in \mathbb{C}^{m \times n}$ and $b \in \mathbb{C}^m$, the linear system $Ax = b$ might not be solvable. Instead, we can solve the minimization problem: find $x \in \mathbb{C}^n$ to attain
$$\inf_{x \in \mathbb{C}^n} \|Ax - b\|^2 \quad \text{(Euclidean norm)}.$$


This is called a least-squares problem since the square of the Euclidean norm is a sum of squares. At a minimum of $\varphi(x) = \|Ax - b\|^2$ we must have $\nabla\varphi(x) = 0$, or equivalently
$$\varphi'(x; v) = 0 \quad \forall\, v \in \mathbb{C}^n,$$
where
$$\varphi'(x; v) = \left.\frac{d}{dt}\varphi(x + tv)\right|_{t=0}$$
is the directional derivative. If $y(t)$ is a differentiable curve in $\mathbb{C}^m$, then
$$\frac{d}{dt}\|y(t)\|^2 = \langle y'(t), y(t)\rangle + \langle y(t), y'(t)\rangle = 2\operatorname{Re}\langle y(t), y'(t)\rangle.$$
Taking $y(t) = A(x + tv) - b$, we obtain that
$$\nabla\varphi(x) = 0 \iff (\forall\, v \in \mathbb{C}^n)\ 2\operatorname{Re}\langle Ax - b, Av\rangle = 0 \iff A^*(Ax - b) = 0,$$
i.e.,
$$A^*Ax = A^*b.$$
These are called the normal equations (they say $(Ax - b) \perp R(A)$).
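As an illustration (a sketch of mine, assuming NumPy), one can form an overdetermined system, solve the normal equations directly, and compare with a library least-squares routine; for a full-rank, well-conditioned $A$ the two agree. (In practice one prefers QR- or SVD-based solvers to forming $A^*A$ explicitly.)

    import numpy as np

    rng = np.random.default_rng(5)
    m, n = 8, 3
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)

    # Solve the normal equations  A* A x = A* b  (A has full column rank here).
    x_ne = np.linalg.solve(A.conj().T @ A, A.conj().T @ b)

    # Reference solution from a library least-squares routine.
    x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

    print(np.allclose(x_ne, x_ls))                       # True
    print(np.allclose(A.conj().T @ (A @ x_ne - b), 0))   # residual is orthogonal to R(A)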

Linear Least Squares, SVD, and Moore-Penrose Pseudoinverse

The Projection Theorem (for finite dimensional $S$). Let $V$ be an inner product space and let $S \subset V$ be a finite dimensional subspace. Then

(1) $V = S \oplus S^\perp$, i.e., given $v \in V$, there exist unique $y \in S$ and $z \in S^\perp$ for which
$$v = y + z$$
(so $y = Pv$, where $P$ is the orthogonal projection of $V$ onto $S$; also $z = (I - P)v$, and $I - P$ is the orthogonal projection of $V$ onto $S^\perp$).

(2) Given $v \in V$, the $y$ in (1) is the unique element of $S$ which satisfies $v - y \perp S$.

(3) Given $v \in V$, the $y$ in (1) is the unique element of $S$ realizing the minimum
$$\min_{s \in S} \|v - s\|^2.$$


Remark. The content of the Projection Theorem is contained in the following picture: [figure: $v$ decomposed as $v = y + z$ with $y \in S$ and $z \in S^\perp$].

Proof. (1) Let $\{\varphi_1, \ldots, \varphi_r\}$ be an orthonormal basis of $S$. Given $v \in V$, let
$$y = \sum_{j=1}^{r} \langle \varphi_j, v\rangle \varphi_j \quad\text{and}\quad z = v - y.$$
Then $v = y + z$ and $y \in S$. For $1 \le k \le r$,
$$\langle \varphi_k, z\rangle = \langle \varphi_k, v\rangle - \langle \varphi_k, y\rangle = \langle \varphi_k, v\rangle - \langle \varphi_k, v\rangle = 0,$$
so $z \in S^\perp$. Uniqueness follows from the fact that $S \cap S^\perp = \{0\}$.

(2) Since $z = v - y$, this is just a restatement of $z \in S^\perp$.

(3) For any $s \in S$,
$$v - s = \underbrace{(y - s)}_{\in S} + \underbrace{z}_{\in S^\perp},$$
so by the Pythagorean Theorem ($p \perp q \Rightarrow \|p + q\|^2 = \|p\|^2 + \|q\|^2$),
$$\|v - s\|^2 = \|y - s\|^2 + \|z\|^2.$$
Therefore, $\|v - s\|^2$ is minimized iff $s = y$, and then $\|v - y\|^2 = \|z\|^2$.
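The construction in part (1) of the proof is easy to compute with (a sketch of mine, assuming NumPy): if the columns of $Q$ form an orthonormal basis of $S$, then $y = QQ^*v$ and $z = v - y$.

    import numpy as np

    rng = np.random.default_rng(6)
    # Orthonormal basis of a 2-dimensional subspace S of R^5 (columns of Q).
    Q, _ = np.linalg.qr(rng.standard_normal((5, 2)))
    v = rng.standard_normal(5)

    y = Q @ (Q.T @ v)    # projection onto S: sum_j <q_j, v> q_j
    z = v - y            # component in the orthogonal complement of S

    print(np.allclose(Q.T @ z, 0))                                   # z is orthogonal to S
    print(np.isclose(np.linalg.norm(v)**2,
                     np.linalg.norm(y)**2 + np.linalg.norm(z)**2))   # Pythagorean theorem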

Theorem [Normal Equations for Linear Least Squares]. Let $A \in \mathbb{C}^{m \times n}$, $b \in \mathbb{C}^m$, and let $\|\cdot\|$ be the Euclidean norm. Then $x \in \mathbb{C}^n$ realizes the minimum
$$\min_{x \in \mathbb{C}^n} \|b - Ax\|^2$$
if and only if $x$ is a solution to the normal equations $A^*Ax = A^*b$.

Proof. Recall from early in the course that we showed that $R(L)^a = N(L')$ for any linear transformation $L$. If we identify $\mathbb{C}^n$ and $\mathbb{C}^m$ with their duals using the Euclidean inner product and take $L$ to be multiplication by $A$, this can be rewritten as
$$R(A)^\perp = N(A^*).$$


Now apply the Projection Theorem, taking $S = R(A)$ and $v = b$. Any $s \in S$ can be represented as $Ax$ for some $x \in \mathbb{C}^n$ (not necessarily unique if $\operatorname{rank}(A) < n$). We conclude that $y = Ax$ realizes the minimum iff
$$b - Ax \in R(A)^\perp = N(A^*),$$
or equivalently $A^*Ax = A^*b$.

The minimizing element $s = y \in S$ is unique. Since $y \in R(A)$, there exists $x \in \mathbb{C}^n$ for which $Ax = y$, or equivalently, there exists $x \in \mathbb{C}^n$ minimizing $\|b - Ax\|^2$. Consequently, there is an $x \in \mathbb{C}^n$ for which $A^*Ax = A^*b$; that is, the normal equations are consistent. If $\operatorname{rank}(A) = n$, then there is a unique $x \in \mathbb{C}^n$ for which $Ax = y$. This $x$ is the unique minimizer of $\|b - Ax\|^2$ as well as the unique solution of the normal equations $A^*Ax = A^*b$. However, if $\operatorname{rank}(A) = r < n$, then the minimizing vector $x$ is not unique; $x$ can be modified by adding any element of $N(A)$. (Exercise: Show $N(A) = N(A^*A)$.) However, there is a unique
$$x^\dagger \in \{x \in \mathbb{C}^n : x \text{ minimizes } \|b - Ax\|^2\} = \{x \in \mathbb{C}^n : Ax = y\}$$
of minimum norm ($x^\dagger$ is read "$x$ dagger").

[figure: the affine set $\{x : Ax = y\}$ in $\mathbb{C}^n$, with $x^\dagger$ its point closest to 0]

To see this, note that since $\{x \in \mathbb{C}^n : Ax = y\}$ is an affine translate of the subspace $N(A)$, a translated version of the Projection Theorem shows that there is a unique $x^\dagger \in \{x \in \mathbb{C}^n : Ax = y\}$ for which $x^\dagger \perp N(A)$, and this $x^\dagger$ is the unique element of $\{x \in \mathbb{C}^n : Ax = y\}$ of minimum norm. Notice also that $\{x : Ax = y\} = \{x : A^*Ax = A^*b\}$.

In summary: given $b \in \mathbb{C}^m$, $x \in \mathbb{C}^n$ minimizes $\|b - Ax\|^2$ over $x \in \mathbb{C}^n$ iff $Ax$ is the orthogonal projection of $b$ onto $R(A)$, and among this set of solutions there is a unique $x^\dagger$ of minimum norm. Alternatively, $x^\dagger$ is the unique solution of the normal equations $A^*Ax = A^*b$ which also satisfies $x \perp N(A)$.

The map $A^\dagger : \mathbb{C}^m \to \mathbb{C}^n$ which maps $b \in \mathbb{C}^m$ into the unique minimizer $x^\dagger$ of $\|b - Ax\|^2$ of minimum norm is called the Moore-Penrose pseudo-inverse of $A$. We will see momentarily that $A^\dagger$ is linear, so it is represented by an $n \times m$ matrix which we also denote by $A^\dagger$ (and we also call this matrix the Moore-Penrose pseudo-inverse of $A$). If $m = n$ and $A$ is invertible, then every $b \in \mathbb{C}^n$ is in $R(A)$, so $y = b$, and the solution of $Ax = b$ is unique, given by $x = A^{-1}b$. In this case $A^\dagger = A^{-1}$. So the pseudo-inverse is a generalization of the inverse to possibly non-square, non-invertible matrices.

The linearity of the map $A^\dagger$ can be seen as follows. For $A \in \mathbb{C}^{m \times n}$, the above considerations show that $A|_{N(A)^\perp}$ is injective and maps onto $R(A)$. Thus $A|_{N(A)^\perp} : N(A)^\perp \to R(A)$ is an isomorphism. The definition of $A^\dagger$ amounts to the formula
$$A^\dagger = \left( A|_{N(A)^\perp} \right)^{-1} \circ P_1,$$


where $P_1 : \mathbb{C}^m \to R(A)$ is the orthogonal projection onto $R(A)$. Since $P_1$ and $(A|_{N(A)^\perp})^{-1}$ are linear transformations, so is $A^\dagger$.

The pseudo-inverse of $A$ can be written nicely in terms of the SVD of $A$. Let $A = U\Sigma V^*$ be an SVD of $A$, and let $r = \operatorname{rank}(A)$ (so $\sigma_1 \ge \cdots \ge \sigma_r > 0 = \sigma_{r+1} = \cdots$). Define
$$\Sigma^\dagger = \operatorname{diag}(\sigma_1^{-1}, \ldots, \sigma_r^{-1}, 0, \ldots, 0) \in \mathbb{C}^{n \times m}.$$
(Note: It is appropriate to call this matrix $\Sigma^\dagger$, as it is easily shown that the pseudo-inverse of $\Sigma \in \mathbb{C}^{m \times n}$ is this matrix (exercise).)

Proposition. If $A = U\Sigma V^*$ is an SVD of $A$, then $A^\dagger = V\Sigma^\dagger U^*$ is an SVD of $A^\dagger$.

Proof. Denote by $u_1, \ldots, u_m$ the columns of $U$ and by $v_1, \ldots, v_n$ the columns of $V$. The statement that $A = U\Sigma V^*$ is an SVD of $A$ is equivalent to the three conditions:

1. $\{u_1, \ldots, u_m\}$ is an orthonormal basis of $\mathbb{C}^m$ such that $\operatorname{span}\{u_1, \ldots, u_r\} = R(A)$;

2. $\{v_1, \ldots, v_n\}$ is an orthonormal basis for $\mathbb{C}^n$ such that $\operatorname{span}\{v_{r+1}, \ldots, v_n\} = N(A)$;

3. $Av_i = \sigma_i u_i$ for $1 \le i \le r$.

The conditions on the spans in 1. and 2. are equivalent to $\operatorname{span}\{u_{r+1}, \ldots, u_m\} = R(A)^\perp$ and $\operatorname{span}\{v_1, \ldots, v_r\} = N(A)^\perp$. The formula
$$A^\dagger = \left( A|_{N(A)^\perp} \right)^{-1} \circ P_1$$
shows that $\operatorname{span}\{u_{r+1}, \ldots, u_m\} = N(A^\dagger)$, $\operatorname{span}\{v_1, \ldots, v_r\} = R(A^\dagger)$, and that $A^\dagger u_i = \frac{1}{\sigma_i}v_i$ for $1 \le i \le r$. Thus the conditions 1.-3. hold for $A^\dagger$ with $U$ and $V$ interchanged and with $\sigma_i$ replaced by $1/\sigma_i$. Hence $A^\dagger = V\Sigma^\dagger U^*$ is an SVD for $A^\dagger$.

A similar formula can be written for the abbreviated form of the SVD. If $U_r \in \mathbb{C}^{m \times r}$ and $V_r \in \mathbb{C}^{n \times r}$ are the first $r$ columns of $U$, $V$, respectively, and
$$\Sigma_r = \operatorname{diag}(\sigma_1, \ldots, \sigma_r) \in \mathbb{C}^{r \times r},$$
then the abbreviated form of the SVD of $A$ is $A = U_r\Sigma_r V_r^*$. The above Proposition shows that $A^\dagger = V_r\Sigma_r^{-1}U_r^*$.

One rarely actually computes $A^\dagger$. Instead, to minimize $\|b - Ax\|^2$ using the SVD of $A$, one computes
$$x^\dagger = V_r\left( \Sigma_r^{-1}(U_r^*b) \right).$$
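This recipe is only a couple of lines of code (a sketch of mine, assuming NumPy); for a rank-deficient $A$ it produces the minimum-norm least-squares solution and agrees with numpy.linalg.pinv applied to $b$.

    import numpy as np

    rng = np.random.default_rng(7)
    A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))   # 6 x 4, rank 2
    b = rng.standard_normal(6)

    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > 1e-12 * s[0]))                  # numerical rank

    # x_dagger = V_r ( Sigma_r^{-1} ( U_r^* b ) )
    x_dag = Vh[:r, :].conj().T @ ((U[:, :r].conj().T @ b) / s[:r])

    print(np.allclose(x_dag, np.linalg.pinv(A, rcond=1e-12) @ b))   # True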

For $b \in \mathbb{C}^m$, we saw above that if $x = A^\dagger b$, then $Ax = y$ is the orthogonal projection of $b$ onto $R(A)$. Thus $AA^\dagger$ is the orthogonal projection of $\mathbb{C}^m$ onto $R(A)$. This is also clear directly from the SVD:
$$AA^\dagger = U_r\Sigma_r V_r^* V_r\Sigma_r^{-1}U_r^* = U_r\Sigma_r\Sigma_r^{-1}U_r^* = U_rU_r^* = \sum_{j=1}^{r} u_ju_j^*,$$
which is clearly the orthogonal projection onto $R(A)$. (Note that $V_r^*V_r = I_r$ since the columns of $V_r$ are orthonormal.)


Similarly, since $w = A^\dagger(Ax)$ is the vector of least length satisfying $Aw = Ax$, $A^\dagger A$ is the orthogonal projection of $\mathbb{C}^n$ onto $N(A)^\perp$. Again, this also is clear directly from the SVD:
$$A^\dagger A = V_r\Sigma_r^{-1}U_r^* U_r\Sigma_r V_r^* = V_rV_r^* = \sum_{j=1}^{r} v_jv_j^*$$
is the orthogonal projection onto $R(V_r) = N(A)^\perp$. These relationships are substitutes for $AA^{-1} = A^{-1}A = I$ for invertible $A \in \mathbb{C}^{n \times n}$. Similarly, one sees that

(i) $AXA = A$,

(ii) $XAX = X$,

(iii) $(AX)^* = AX$,

(iv) $(XA)^* = XA$,

where $X = A^\dagger$. In fact, one can show that $X \in \mathbb{C}^{n \times m}$ is $A^\dagger$ if and only if $X$ satisfies (i), (ii), (iii), (iv). (Exercise; see section 5.5.4 in Golub and Van Loan.)

The pseudo-inverse can be used to extend the (Euclidean operator norm) condition number to general matrices: $\kappa(A) = \|A\|\,\|A^\dagger\| = \sigma_1/\sigma_r$ (where $r = \operatorname{rank} A$).
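The four conditions and the condition-number formula are easy to verify numerically (a sketch of mine, assuming NumPy).

    import numpy as np

    rng = np.random.default_rng(8)
    A = rng.standard_normal((5, 3))
    X = np.linalg.pinv(A)                         # Moore-Penrose pseudo-inverse

    print(np.allclose(A @ X @ A, A))              # (i)   AXA = A
    print(np.allclose(X @ A @ X, X))              # (ii)  XAX = X
    print(np.allclose((A @ X).conj().T, A @ X))   # (iii) AX is Hermitian
    print(np.allclose((X @ A).conj().T, X @ A))   # (iv)  XA is Hermitian

    s = np.linalg.svd(A, compute_uv=False)
    r = int(np.sum(s > 1e-12 * s[0]))
    kappa = np.linalg.norm(A, 2) * np.linalg.norm(X, 2)
    print(np.isclose(kappa, s[0] / s[r - 1]))     # kappa(A) = sigma_1 / sigma_r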

LU Factorization

All of the matrix factorizations we have studied so far are spectral factorizations in the sense that in obtaining these factorizations, one is obtaining the eigenvalues and eigenvectors of $A$ (or matrices related to $A$, like $A^*A$ and $AA^*$ for the SVD). We end our discussion of matrix factorizations by mentioning two non-spectral factorizations. These non-spectral factorizations can be determined directly from the entries of the matrix, and are computationally less expensive than spectral factorizations. Each of these factorizations amounts to a reformulation of a procedure you are already familiar with. The LU factorization is a reformulation of Gaussian Elimination, and the QR factorization is a reformulation of Gram-Schmidt orthogonalization.

Recall the method of Gaussian Elimination for solving a system $Ax = b$ of linear equations, where $A \in \mathbb{C}^{n \times n}$ is invertible and $b \in \mathbb{C}^n$. If the coefficient of $x_1$ in the first equation is nonzero, one eliminates all occurrences of $x_1$ from all the other equations by adding appropriate multiples of the first equation. These operations do not change the set of solutions to the equation. Now if the coefficient of $x_2$ in the new second equation is nonzero, it can be used to eliminate $x_2$ from the further equations, etc. In matrix terms, if
$$A = \begin{bmatrix} a & v^T \\ u & \tilde{A} \end{bmatrix} \in \mathbb{C}^{n \times n}$$
with $a \neq 0$, $a \in \mathbb{C}$, $u, v \in \mathbb{C}^{n-1}$, and $\tilde{A} \in \mathbb{C}^{(n-1) \times (n-1)}$, then using the first row to zero out $u$ amounts to left multiplication of the matrix $A$ by the matrix
$$\begin{bmatrix} 1 & 0 \\ -\frac{u}{a} & I \end{bmatrix}$$


to get
$$(*) \qquad \begin{bmatrix} 1 & 0 \\ -\frac{u}{a} & I \end{bmatrix} \begin{bmatrix} a & v^T \\ u & \tilde{A} \end{bmatrix} = \begin{bmatrix} a & v^T \\ 0 & A_1 \end{bmatrix}.$$
Define
$$L_1 = \begin{bmatrix} 1 & 0 \\ \frac{u}{a} & I \end{bmatrix} \in \mathbb{C}^{n \times n} \quad\text{and}\quad U_1 = \begin{bmatrix} a & v^T \\ 0 & A_1 \end{bmatrix}$$
and observe that
$$L_1^{-1} = \begin{bmatrix} 1 & 0 \\ -\frac{u}{a} & I \end{bmatrix}.$$
Hence $(*)$ becomes $L_1^{-1}A = U_1$, or equivalently, $A = L_1U_1$.

Note that $L_1$ is lower triangular and $U_1$ is block upper-triangular with one $1 \times 1$ block and one $(n-1) \times (n-1)$ block on the block diagonal. The components of $\frac{u}{a} \in \mathbb{C}^{n-1}$ are called multipliers; they are the multiples of the first row subtracted from subsequent rows, and they are computed in the Gaussian Elimination algorithm. The multipliers are usually denoted
$$\frac{u}{a} = \begin{bmatrix} m_{21} \\ m_{31} \\ \vdots \\ m_{n1} \end{bmatrix}.$$

Now, if the $(1,1)$ entry of $A_1$ is not 0, we can apply the same procedure to $A_1$: if
$$A_1 = \begin{bmatrix} a_1 & v_1^T \\ u_1 & \tilde{A}_1 \end{bmatrix} \in \mathbb{C}^{(n-1) \times (n-1)}$$
with $a_1 \neq 0$, letting
$$\tilde{L}_2 = \begin{bmatrix} 1 & 0 \\ \frac{u_1}{a_1} & I \end{bmatrix} \in \mathbb{C}^{(n-1) \times (n-1)}$$
and forming
$$\tilde{L}_2^{-1}A_1 = \begin{bmatrix} 1 & 0 \\ -\frac{u_1}{a_1} & I \end{bmatrix} \begin{bmatrix} a_1 & v_1^T \\ u_1 & \tilde{A}_1 \end{bmatrix} = \begin{bmatrix} a_1 & v_1^T \\ 0 & A_2 \end{bmatrix} \equiv \tilde{U}_2 \in \mathbb{C}^{(n-1) \times (n-1)}$$
(where $A_2 \in \mathbb{C}^{(n-2) \times (n-2)}$) amounts to using the second row to zero out elements of the second column below the diagonal. Setting
$$L_2 = \begin{bmatrix} 1 & 0 \\ 0 & \tilde{L}_2 \end{bmatrix} \quad\text{and}\quad U_2 = \begin{bmatrix} a & v^T \\ 0 & \tilde{U}_2 \end{bmatrix},$$
we have
$$L_2^{-1}L_1^{-1}A = \begin{bmatrix} 1 & 0 \\ 0 & \tilde{L}_2^{-1} \end{bmatrix} \begin{bmatrix} a & v^T \\ 0 & A_1 \end{bmatrix} = U_2,$$


which is block upper triangular with two $1 \times 1$ blocks and one $(n-2) \times (n-2)$ block on the block diagonal. The components of $\frac{u_1}{a_1}$ are multipliers, usually denoted
$$\frac{u_1}{a_1} = \begin{bmatrix} m_{32} \\ m_{42} \\ \vdots \\ m_{n2} \end{bmatrix}.$$
Notice that these multipliers appear in $L_2$ in the second column, below the diagonal. Continuing in a similar fashion,
$$L_{n-1}^{-1} \cdots L_2^{-1}L_1^{-1}A = U_{n-1} \equiv U$$
is upper triangular (provided along the way that the $(1,1)$ entries of $A, A_1, A_2, \ldots, A_{n-2}$ are nonzero so the process can continue). Define $L = (L_{n-1}^{-1} \cdots L_1^{-1})^{-1} = L_1L_2 \cdots L_{n-1}$. Then $A = LU$. (Remark: A lower triangular matrix with 1's on the diagonal is called a unit lower triangular matrix, so $L_j$, $L_j^{-1}$, $L_j^{-1} \cdots L_1^{-1}$, $L_1 \cdots L_j$, $L^{-1}$, $L$ are all unit lower triangular.) For an invertible $A \in \mathbb{C}^{n \times n}$, writing $A = LU$ as a product of a unit lower triangular matrix $L$ and a (necessarily invertible) upper triangular matrix $U$ (both in $\mathbb{C}^{n \times n}$) is called the LU factorization of $A$.
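The elimination procedure above, recording the multipliers $m_{ij}$ as it goes, can be written in a few lines (a sketch of mine, assuming NumPy and that no pivoting is needed, i.e., that the $(1,1)$ entries encountered along the way are all nonzero).

    import numpy as np

    def lu_no_pivot(A):
        """Return unit lower triangular L and upper triangular U with A = L U.
        Assumes the leading principal submatrices of A are invertible (no pivoting)."""
        U = np.array(A, dtype=float)
        n = U.shape[0]
        L = np.eye(n)
        for j in range(n - 1):
            # Multipliers m_{ij} = u_{ij} / u_{jj} for i > j; store them in column j of L.
            L[j+1:, j] = U[j+1:, j] / U[j, j]
            # Subtract multiples of row j from the rows below, zeroing column j below the diagonal.
            U[j+1:, :] -= np.outer(L[j+1:, j], U[j, :])
        return L, U

    A = np.array([[2.0, 1.0, 1.0],
                  [4.0, 3.0, 3.0],
                  [8.0, 7.0, 9.0]])
    L, U = lu_no_pivot(A)
    print(np.allclose(A, L @ U), np.allclose(L, np.tril(L)), np.allclose(U, np.triu(U)))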

Remarks:

(1) If $A \in \mathbb{C}^{n \times n}$ is invertible and has an LU factorization, it is unique (exercise).

(2) One can show that $A \in \mathbb{C}^{n \times n}$ has an LU factorization iff for $1 \le j \le n$, the upper left $j \times j$ principal submatrix
$$\begin{bmatrix} a_{11} & \cdots & a_{1j} \\ \vdots & & \vdots \\ a_{j1} & \cdots & a_{jj} \end{bmatrix}$$
is invertible.

(3) Not every invertible $A \in \mathbb{C}^{n \times n}$ has an LU factorization. (Example: $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ doesn't.) Typically, one must permute the rows of $A$ to move nonzero entries to the appropriate spot for the elimination to proceed. Recall that a permutation matrix $P \in \mathbb{C}^{n \times n}$ is the identity $I$ with its rows (or columns) permuted. Any such $P \in \mathbb{R}^{n \times n}$ is orthogonal, so $P^{-1} = P^T$. Permuting the rows of $A$ amounts to left multiplication by a permutation matrix $P^T$; then $P^TA$ has an LU factorization, so $A = PLU$ (called the PLU factorization of $A$).

(4) Fact: Every invertible $A \in \mathbb{C}^{n \times n}$ has a (not necessarily unique) PLU factorization.

(5) It turns out that
$$L = L_1 \cdots L_{n-1} = \begin{bmatrix} 1 & & & 0 \\ m_{21} & 1 & & \\ \vdots & & \ddots & \\ m_{n1} & \cdots & m_{n,n-1} & 1 \end{bmatrix}$$
has the multipliers $m_{ij}$ below the diagonal.


(6) The LU factorization can be used to solve linear systems $Ax = b$ (where $A = LU \in \mathbb{C}^{n \times n}$ is invertible). The system $Ly = b$ can be solved by forward substitution (the first equation gives $y_1$, etc.), and then $Ux = y$ can be solved by back-substitution (the $n$th equation gives $x_n$, etc.), giving the solution of $Ax = LUx = b$, as illustrated below. See section 3.5 of H-J.
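A small illustration of remark (6) (a sketch of mine, assuming NumPy and SciPy; scipy.linalg.lu returns the PLU factorization $A = PLU$, and the two triangular solves are written out explicitly).

    import numpy as np
    from scipy.linalg import lu   # PLU factorization: A = P @ L @ U

    def forward_sub(L, b):
        """Solve L y = b for lower triangular L."""
        y = np.zeros_like(b, dtype=float)
        for i in range(len(b)):
            y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
        return y

    def back_sub(U, y):
        """Solve U x = y for upper triangular U."""
        n = len(y)
        x = np.zeros_like(y, dtype=float)
        for i in range(n - 1, -1, -1):
            x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
        return x

    A = np.array([[2.0, 1.0, 1.0],
                  [4.0, 3.0, 3.0],
                  [8.0, 7.0, 9.0]])
    b = np.array([5.0, 13.0, 33.0])

    P, L, U = lu(A)                           # A = P L U, so  L U x = P^T b
    x = back_sub(U, forward_sub(L, P.T @ b))
    print(np.allclose(A @ x, b))              # True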

QR Factorization

Recall first the Gram-Schmidt orthogonalization process. Let $V$ be an inner product space, and suppose $a_1, \ldots, a_n \in V$ are linearly independent. Define $q_1, \ldots, q_n$ inductively, as follows: let $p_1 = a_1$ and $q_1 = p_1/\|p_1\|$; then for $2 \le j \le n$, let
$$p_j = a_j - \sum_{i=1}^{j-1} \langle q_i, a_j\rangle q_i \quad\text{and}\quad q_j = p_j/\|p_j\|.$$
Since clearly for $1 \le k \le n$ we have $q_k \in \operatorname{span}\{a_1, \ldots, a_k\}$, each $p_j$ is nonzero by the linear independence of $\{a_1, \ldots, a_n\}$, so each $q_j$ is well-defined. It is easily seen that $\{q_1, \ldots, q_n\}$ is an orthonormal basis for $\operatorname{span}\{a_1, \ldots, a_n\}$. The relations above can be solved for $a_j$ to express $a_j$ in terms of the $q_i$ with $i \le j$. Defining $r_{jj} = \|p_j\|$ (so $p_j = r_{jj}q_j$) and $r_{ij} = \langle q_i, a_j\rangle$ for $1 \le i < j \le n$, we have: $a_1 = r_{11}q_1$, $a_2 = r_{12}q_1 + r_{22}q_2$, and in general $a_j = \sum_{i=1}^{j} r_{ij}q_i$.

Remarks:

(1) If $a_1, a_2, \ldots$ is a linearly independent sequence in $V$, we can apply the Gram-Schmidt process to obtain an orthonormal sequence $q_1, q_2, \ldots$ with the property that for $k \ge 1$, $\{q_1, \ldots, q_k\}$ is an orthonormal basis for $\operatorname{span}\{a_1, \ldots, a_k\}$.

(2) If the $a_j$'s are linearly dependent, then for some value(s) of $k$, $a_k \in \operatorname{span}\{a_1, \ldots, a_{k-1}\}$, and then $p_k = 0$. The process can be modified by setting $q_k = 0$ and proceeding. We end up with orthogonal $q_j$'s, some of which have $\|q_j\| = 1$ and some of which have $q_j = 0$. Then for $k \ge 1$, the nonzero vectors in the set $\{q_1, \ldots, q_k\}$ form an orthonormal basis for $\operatorname{span}\{a_1, \ldots, a_k\}$.

(3) The classical Gram-Schmidt algorithm described above, applied to $n$ linearly independent vectors $a_1, \ldots, a_n \in \mathbb{C}^m$ (where of course $m \ge n$), does not behave well computationally. Due to the accumulation of round-off error, the computed $q_j$'s are not as orthogonal as one would want (or need in applications): $\langle q_j, q_k\rangle$ is small for $j \neq k$ with $j$ near $k$, but not so small for $j \ll k$ or $j \gg k$. An alternate version, Modified Gram-Schmidt, is equivalent in exact arithmetic, but behaves better numerically. In the following pseudo-codes, $p$ denotes a temporary storage vector used to accumulate the sums defining the $p_j$'s.


Classic Gram-Schmidt:

    For j = 1, ..., n do
        p := a_j
        For i = 1, ..., j-1 do
            r_ij := <q_i, a_j>
            p := p - r_ij q_i
        r_jj := ||p||
        q_j := p / r_jj

Modified Gram-Schmidt:

    For j = 1, ..., n do
        p := a_j
        For i = 1, ..., j-1 do
            r_ij := <q_i, p>
            p := p - r_ij q_i
        r_jj := ||p||
        q_j := p / r_jj

The only difference is in the computation of $r_{ij}$: in Modified Gram-Schmidt, we orthogonalize the accumulated partial sum for $p_j$ against each $q_i$ successively.
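A direct transcription of the Modified Gram-Schmidt pseudo-code (a sketch of mine, assuming NumPy and linearly independent columns); it produces the condensed factors that appear in the Proposition below.

    import numpy as np

    def modified_gram_schmidt(A):
        """Return Q with orthonormal columns and upper triangular R with A = Q R.
        Assumes the columns of A are linearly independent."""
        A = np.array(A, dtype=complex)
        m, n = A.shape
        Q = np.zeros((m, n), dtype=complex)
        R = np.zeros((n, n), dtype=complex)
        for j in range(n):
            p = A[:, j].copy()
            for i in range(j):
                R[i, j] = np.vdot(Q[:, i], p)    # <q_i, p>, not <q_i, a_j> as in classical G-S
                p = p - R[i, j] * Q[:, i]
            R[j, j] = np.linalg.norm(p)
            Q[:, j] = p / R[j, j]
        return Q, R

    A = np.random.default_rng(9).standard_normal((6, 4))
    Q, R = modified_gram_schmidt(A)
    print(np.allclose(A, Q @ R), np.allclose(Q.conj().T @ Q, np.eye(4)))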

Proposition. Suppose $A \in \mathbb{C}^{m \times n}$ with $m \ge n$. Then there exist $Q \in \mathbb{C}^{m \times m}$ which is unitary and an upper triangular $R \in \mathbb{C}^{m \times n}$ (i.e., $r_{ij} = 0$ for $i > j$) for which $A = QR$. If $\hat{Q} \in \mathbb{C}^{m \times n}$ denotes the first $n$ columns of $Q$ and $\hat{R} \in \mathbb{C}^{n \times n}$ denotes the first $n$ rows of $R$, then clearly also
$$A = QR = Q\begin{bmatrix} \hat{R} \\ 0 \end{bmatrix} = \hat{Q}\hat{R}.$$
Moreover:

(a) We may choose an $R$ with nonnegative diagonal entries.

(b) If $A$ is of full rank (i.e., $\operatorname{rank}(A) = n$, or equivalently the columns of $A$ are linearly independent), then we may choose an $R$ with positive diagonal entries, in which case the condensed factorization $A = \hat{Q}\hat{R}$ is unique (and thus in this case if $m = n$, the factorization $A = QR$ is unique since then $Q = \hat{Q}$ and $R = \hat{R}$).

(c) If $A$ is of full rank, the condensed factorization $A = \hat{Q}\hat{R}$ is essentially unique: if $A = \hat{Q}_1\hat{R}_1 = \hat{Q}_2\hat{R}_2$, then there is a unitary diagonal matrix $D \in \mathbb{C}^{n \times n}$ for which $\hat{Q}_2 = \hat{Q}_1D$ (rescaling the columns of $\hat{Q}_1$) and $\hat{R}_2 = D^*\hat{R}_1$ (rescaling the rows of $\hat{R}_1$).

Proof. If the columns of $A$ are linearly independent, we can apply the Gram-Schmidt process described above. Let $\hat{Q} = [q_1, \ldots, q_n] \in \mathbb{C}^{m \times n}$, and define $\hat{R} \in \mathbb{C}^{n \times n}$ by setting $r_{ij} = 0$ for $i > j$, and $r_{ij}$ to be the value computed in Gram-Schmidt for $i \le j$. Then $A = \hat{Q}\hat{R}$. Extending $\{q_1, \ldots, q_n\}$ to an orthonormal basis $\{q_1, \ldots, q_m\}$ of $\mathbb{C}^m$, and setting $Q = [q_1, \ldots, q_m]$ and $R = \begin{bmatrix} \hat{R} \\ 0 \end{bmatrix} \in \mathbb{C}^{m \times n}$, we have $A = QR$. Since $r_{jj} > 0$ in Gram-Schmidt, we have the existence part of (b). Uniqueness follows by induction passing through the Gram-Schmidt process again, noting that at each step we have no choice. (c) follows easily from (b), since if $\operatorname{rank}(A) = n$, then $\operatorname{rank}(\hat{R}) = n$ in any QR factorization of $A$.

If the columns of $A$ are linearly dependent, we alter the Gram-Schmidt algorithm as in Remark (2) above. Notice that $q_k = 0$ iff $r_{kj} = 0$ for all $j$, so if $\{q_{k_1}, \ldots, q_{k_r}\}$ are the nonzero vectors in $\{q_1, \ldots, q_n\}$ (where of course $r = \operatorname{rank}(A)$), then the nonzero rows in $\hat{R}$ are precisely rows $k_1, \ldots, k_r$. So if we define $\tilde{Q} = [q_{k_1} \cdots q_{k_r}] \in \mathbb{C}^{m \times r}$ and $\tilde{R} \in \mathbb{C}^{r \times n}$ to be these


(or Givens rotations), quite analogous to computing a QR factorization. Here, however, similarity transformations are being performed, so they require left and right multiplication by the Householder transformations, leading to an inability to zero out the first subdiagonal ($i = j + 1$) in the process. If $A$ is Hermitian and upper-Hessenberg, then $A$ is tridiagonal. This initial reduction is done to decrease the computational cost of the iterations in the QR algorithm. It is successful because upper-Hessenberg form is preserved by the iterations: if $A_k$ is upper Hessenberg, so is $A_{k+1}$.

There are many sophisticated variants of the QR algorithm (shifts to speed up convergence, implicit shifts to allow computing a real quasi-upper triangular matrix similar to a real matrix using only real arithmetic, etc.). We consider the basic algorithm over $\mathbb{C}$.

The (Basic) QR Algorithm

Given $A \in \mathbb{C}^{n \times n}$, let $A_0 = A$. For $k = 0, 1, 2, \ldots$, starting with $A_k$, do a QR factorization of $A_k$: $A_k = Q_kR_k$, and then define $A_{k+1} = R_kQ_k$.

Remark. $R_k = Q_k^*A_k$, so $A_{k+1} = Q_k^*A_kQ_k$ is unitarily similar to $A_k$. The algorithm uses the $Q$ of the QR factorization of $A_k$ to perform the next unitary similarity transformation.
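The basic iteration takes only a few lines (a sketch of mine, assuming NumPy; a symmetric test matrix with well-separated eigenvalues is used so that convergence of the diagonal to the eigenvalues is easy to see).

    import numpy as np

    def qr_algorithm(A, iterations=100):
        """Basic (unshifted) QR algorithm: A_{k+1} = R_k Q_k where A_k = Q_k R_k."""
        Ak = np.array(A, dtype=float)
        for _ in range(iterations):
            Q, R = np.linalg.qr(Ak)
            Ak = R @ Q
        return Ak

    # Symmetric test matrix with eigenvalues 4, 2, 1, 0.5.
    rng = np.random.default_rng(10)
    Q0, _ = np.linalg.qr(rng.standard_normal((4, 4)))
    A = Q0 @ np.diag([4.0, 2.0, 1.0, 0.5]) @ Q0.T

    Ak = qr_algorithm(A)
    print(np.round(Ak, 6))   # approximately diag(4, 2, 1, 0.5); subdiagonal entries -> 0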

Convergence of the QR Algorithm

We will show under mild hypotheses that all of the subdiagonal elements of $A_k$ converge to 0 as $k \to \infty$. See section 2.6 in H-J for examples where the QR algorithm does not converge. See also sections 7.5, 7.6, 8.2 in Golub and Van Loan for more discussion.

Lemma. Let $Q_j$ ($j = 1, 2, \ldots$) be a sequence of unitary matrices in $\mathbb{C}^{n \times n}$ and $R_j$ ($j = 1, 2, \ldots$) be a sequence of upper triangular matrices in $\mathbb{C}^{n \times n}$ with positive diagonal entries. Suppose $Q_jR_j \to I$ as $j \to \infty$. Then $Q_j \to I$ and $R_j \to I$.

Proof Sketch. Let $Q_{j_k}$ be any subsequence of $Q_j$. Since the set of unitary matrices in $\mathbb{C}^{n \times n}$ is compact, there is a sub-subsequence $Q_{j_{k_l}}$ and a unitary $Q$ with $Q_{j_{k_l}} \to Q$. So $R_{j_{k_l}} = Q_{j_{k_l}}^*(Q_{j_{k_l}}R_{j_{k_l}}) \to Q^*I = Q^*$. So $Q^*$ is unitary, upper triangular, with nonnegative diagonal elements, which implies easily that $Q^* = I$. Thus every subsequence of $Q_j$ has in turn a sub-subsequence converging to $I$. By standard metric space theory, $Q_j \to I$, and thus $R_j = Q_j^*(Q_jR_j) \to I \cdot I = I$.

Theorem. Suppose $A \in \mathbb{C}^{n \times n}$ has eigenvalues $\lambda_1, \ldots, \lambda_n$ with $|\lambda_1| > |\lambda_2| > \cdots > |\lambda_n| > 0$. Choose $X \in \mathbb{C}^{n \times n}$ with $X^{-1}AX = \Lambda \equiv \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$, and suppose $X^{-1}$ has an LU decomposition. Generate the sequence $A_0 = A, A_1, A_2, \ldots$ using the QR algorithm. Then the subdiagonal entries of $A_k \to 0$ as $k \to \infty$, and for $1 \le j \le n$, the $j$th diagonal entry $\to \lambda_j$.

Proof. Define $\widetilde{Q}_k = Q_0Q_1 \cdots Q_k$ and $\widetilde{R}_k = R_k \cdots R_0$. Then $A_{k+1} = \widetilde{Q}_k^*A\widetilde{Q}_k$.

Claim: $\widetilde{Q}_k\widetilde{R}_k = A^{k+1}$.

Proof: Clear for $k = 0$. Suppose $\widetilde{Q}_{k-1}\widetilde{R}_{k-1} = A^k$. Then
$$R_k = A_{k+1}Q_k^* = \widetilde{Q}_k^*A\widetilde{Q}_kQ_k^* = \widetilde{Q}_k^*A\widetilde{Q}_{k-1},$$


so
$$\widetilde{R}_k = R_k\widetilde{R}_{k-1} = \widetilde{Q}_k^*A\widetilde{Q}_{k-1}\widetilde{R}_{k-1} = \widetilde{Q}_k^*A^{k+1},$$
so $\widetilde{Q}_k\widetilde{R}_k = A^{k+1}$.

Now, choose a QR factorization of $X$ and an LU factorization of $X^{-1}$: $X = QR$, $X^{-1} = LU$ (where $Q$ is unitary, $L$ is unit lower triangular, $R$ and $U$ are upper triangular with nonzero diagonal entries). Then
$$A^{k+1} = X\Lambda^{k+1}X^{-1} = QR\Lambda^{k+1}LU = QR\left(\Lambda^{k+1}L\Lambda^{-(k+1)}\right)\Lambda^{k+1}U.$$

Let $E_{k+1} = \Lambda^{k+1}L\Lambda^{-(k+1)} - I$ and $F_{k+1} = RE_{k+1}R^{-1}$.

Claim: $E_{k+1} \to 0$ (and thus $F_{k+1} \to 0$) as $k \to \infty$.

Proof: Let $l_{ij}$ denote the elements of $L$. $E_{k+1}$ is strictly lower triangular, and for $i > j$ its $ij$ element is $\left(\frac{\lambda_i}{\lambda_j}\right)^{k+1}l_{ij} \to 0$ as $k \to \infty$ since $\left|\frac{\lambda_i}{\lambda_j}\right| < 1$.

Then if $x \in \mathbb{C}^n$ has nonzero component in the direction of the eigenvector corresponding to $\lambda_1$ when expanded in terms of the eigenvectors of $A$, it follows that the sequence $A^kx/\|A^kx\|$ converges to a unit eigenvector corresponding to $\lambda_1$. The condition in the Theorem above that $X^{-1}$ has an LU factorization implies that the $(1,1)$ entry of $X^{-1}$ is nonzero, so when $e_1$ is expanded in terms of the eigenvectors $x_1, \ldots, x_n$ (the columns of $X$), the $x_1$-coefficient is nonzero. So
$$A^{k+1}e_1/\|A^{k+1}e_1\| \to \alpha x_1$$
for some $\alpha \in \mathbb{C}$ with $|\alpha| = 1$. Let $(\tilde{q}_k)_1$ denote the first column of $\widetilde{Q}_k$ and $(\tilde{r}_k)_{11}$ denote the $(1,1)$-entry of $\widetilde{R}_k$; then

    so (q k )1 x 1. Since Ak+1 = Q

    k AQk , the rst column of Ak +1 converges to

    10

    ...0

    .

Further insight into the relationship between the QR algorithm and the power method, inverse power method, and subspace iteration can be found in this delightful paper: "Understanding the QR Algorithm" by D. S. Watkins (SIAM Review, vol. 24, 1982, pp. 427-440).