
Iterative methods with special structures


DESCRIPTION

In a talk at the Institute for Physics and Computational Mathematics in Beijing, I discuss two types of special structure in iterative methods: circulant structure in tensors, and localized structure in the matrix exponential.

Text of Iterative methods with special structures

1. Iterative methods with special structures. David F. Gleich, Purdue University.

2. Two projects. (1) Circulant structure in tensors and a linear system solver. (2) Localized structure in the matrix exponential and relaxation methods.

3. Circulant algebra: introduction. Kilmer, Martin, and Perrone (2008) presented a circulant algebra: a set of operations that generalizes matrix algebra to three-way data, and provided an SVD. The essence of this approach amounts to viewing three-dimensional objects as two-dimensional arrays (i.e., matrices) of one-dimensional arrays (i.e., vectors). Braman (2010) developed spectral and other decompositions. We have extended this algebra with the ingredients required for iterative methods such as the power method and the Arnoldi method, and have characterized the behavior of these algorithms. Joint work with Chen Greif and James Varah at UBC.

4. Three-way arrays. Given an m-by-n-by-k table of data, we view it as an m-by-n matrix in which each scalar is a vector of length k: A in K_k^{m x n}. We denote the space of length-k scalars by K_k; these scalars interact like circulant matrices.

5. The circ operation. Circulant matrices form a commutative class that is closed under the standard matrix operations. A scalar alpha = {alpha_1 ... alpha_k} in K_k corresponds to the k-by-k circulant matrix

    circ(alpha) = [ alpha_1   alpha_k      ...  alpha_2
                    alpha_2   alpha_1      ...  alpha_3
                      ...       ...        ...    ...
                    alpha_k   alpha_{k-1}  ...  alpha_1 ],

so that alpha + beta corresponds to circ(alpha) + circ(beta), and alpha * beta corresponds to circ(alpha) circ(beta). The additive and multiplicative identities are 0 = {0 0 ... 0} and 1 = {1 0 ... 0}. We'll see more of their properties shortly.

6. Scalars to matrix-vector products: the cft and icft operations. We define the circulant Fourier transform cft : K_k -> C^{k x k} and its inverse icft : C^{k x k} -> K_k by

    cft(alpha) = diag[lambda_1, ..., lambda_k] = F^* circ(alpha) F,    icft(diag[lambda_1, ..., lambda_k]) = alpha,

where the lambda_j are the eigenvalues of circ(alpha) as produced by the Fourier transform. These transformations satisfy icft(cft(alpha)) = alpha and provide a convenient way of moving between operations in K_k and the more familiar environment of diagonal matrices in C^{k x k}. More operations are simplified in Fourier space too; because the lambda_j are the eigenvalues of circ(alpha), we have

    abs(alpha)   = icft(diag[|lambda_1|, ..., |lambda_k|]),
    conj(alpha)  = icft(diag[conj(lambda_1), ..., conj(lambda_k)]) = icft(cft(alpha)^*),
    angle(alpha) = icft(diag[lambda_1/|lambda_1|, ..., lambda_k/|lambda_k|]).

For a matrix A in K_k^{m x n} and a vector x in K_k^n, the product A x has entries (A x)_i = sum_{j=1}^n A_{i,j} * x_j, and circ extends to matrices blockwise:

    circ(A) = [ circ(A_{1,1}) ... circ(A_{1,n})
                     ...      ...      ...
                circ(A_{m,1}) ... circ(A_{m,n}) ].

7. The special structure. This circulant structure is our special structure for the first problem. We look at two types of iterative methods: (1) the power method and (2) the Arnoldi method.

8. A perplexing result! Example: run the power method on

    A = [ {2 3 1}  {0 0 0}
          {0 0 0}  {3 1 1} ].

Result: lambda = (1/3) {10 4 4}.

9. Some understanding through decoupling. (Figure slide; back to the example.)
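Since the decoupling works entirely through the cft and icft operations above, a small numerical aside may help before the worked example. The following is a minimal MATLAB sketch of the K_k scalar operations under the standard fft/ifft convention; it is my own illustration (the variable names and the toeplitz construction are mine), not the camat package.

    k = 3;
    alpha = [2; 3; 1];                        % the scalar {2 3 1} from slide 8
    beta  = [3; 1; 1];                        % the scalar {3 1 1}
    C = toeplitz(alpha, alpha([1, k:-1:2]));  % circ(alpha): circulant with first column alpha
    lam = fft(alpha);                         % diagonal of cft(alpha) = eigenvalues of C
    ab  = ifft(fft(alpha) .* fft(beta));      % the scalar product alpha*beta (a cyclic convolution)
    abs_alpha = real(ifft(abs(lam)));         % abs(alpha) = icft(diag[|lambda_j|]); real() strips round-off

With this convention, icft of a diagonal matrix is just ifft of its vector of eigenvalues; for instance, ifft([6; 2; 2]) returns (1/3){10 4 4}, the eigenvalue the power method finds on slide 8.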
10. Some understanding through decoupling (continued). Example: let

    A = [ {2 3 1}   {8 -2 0}
          {-2 0 2}  {3 1 1} ].

The circ and cft operations give

    circ(A) = [  2  1  3   8  0 -2
                 3  2  1  -2  8  0
                 1  3  2   0 -2  8
                -2  2  0   3  1  1
                 0 -2  2   1  3  1
                 2  0 -2   1  1  3 ].

Conjugating each block by the DFT matrix F, i.e. forming (I (x) F)^* circ(A) (I (x) F), diagonalizes every block; permuting to group the Fourier components then gives cft(A), which is block diagonal with the 2-by-2 blocks

    A_1 = [ 6  6 ;  0  5 ],
    A_2 = [ -sqrt(3)i   9+sqrt(3)i ;  -3+sqrt(3)i   2 ],
    A_3 = [  sqrt(3)i   9-sqrt(3)i ;  -3-sqrt(3)i   2 ].

11. Some understanding through decoupling (the earlier example). For A = [ {2 3 1} {0 0 0} ; {0 0 0} {3 1 1} ],

    A_1 = [ 6  0 ;  0  5 ],   A_2 = [ -sqrt(3)i  0 ;  0  2 ],   A_3 = [ sqrt(3)i  0 ;  0  2 ].

Combining one eigenvalue from each Fourier component gives, for example,

    lambda_1 = icft(diag[6, 2, 2])                 = (1/3) {10 4 4},
    lambda_2 = icft(diag[5, -sqrt(3)i, sqrt(3)i])  = (1/3) {5 8 2},
    lambda_3 = icft(diag[6, -sqrt(3)i, sqrt(3)i])  = {2 3 1},
    lambda_4 = icft(diag[5, 2, 2])                 = {3 1 1},

with corresponding eigenvectors

    x_1 = [ {1/3 1/3 1/3} ; {2/3 -1/3 -1/3} ],   x_2 = [ {2/3 -1/3 -1/3} ; {1/3 1/3 1/3} ],
    x_3 = [ {1 0 0} ; {0 0 0} ],                 x_4 = [ {0 0 0} ; {1 0 0} ].

12. The power method converges, and convergence is in terms of the individual blocks. Theorem: let A in K_k^{n x n} have a canonical set of eigenvalues lambda_1, ..., lambda_n with abs(lambda_1) > abs(lambda_2); then the power method in the circulant algebra converges to an eigenvector x_1 with eigenvalue lambda_1. Here we use the ordering alpha < beta iff cft(alpha) < cft(beta) elementwise. Definition: a canonical set of eigenvalues and eigenvectors is a set of minimum size, ordered so that abs(lambda_1) >= abs(lambda_2) >= ... , which contains the information needed to reproduce any eigenvalue or eigenvector of A. In the example above there are more eigenvalue combinations than the dimension of the matrix, e.g.

    lambda_5 = icft(diag[6, -sqrt(3)i, 2]),   lambda_6 = icft(diag[6, 2, sqrt(3)i]),
    lambda_7 = icft(diag[5, -sqrt(3)i, 2]),   lambda_8 = icft(diag[5, 2, sqrt(3)i]),

but the only canonical set here is {(lambda_1, x_1), (lambda_2, x_2)}: we need two pairs, and abs(lambda_1) >= abs(lambda_2). (Figure: the convergence behavior of the power method in the circulant algebra; the gray lines show the error in each eigenvalue in Fourier space, and the curves track the predictions made from the eigenvalues as discussed in the text.)

13. The Arnoldi method. (Figure: the convergence behavior of a GMRES procedure using the circulant Arnoldi process; the gray lines show the error in each Fourier component and the red line shows the magnitude of the residual. We observe poor convergence in one component.) Background, the standard Arnoldi process: let A be an n-by-n matrix with real-valued entries. The Arnoldi method builds an orthogonal basis for the Krylov subspace K_t(A, v) = span{v, Av, ..., A^{t-1} v}, where v is an initial vector, and produces the decomposition A Q_t = Q_{t+1} H_{t+1,t}, where Q_t is an n-by-t matrix with orthonormal columns and H_{t+1,t} is a (t+1)-by-t upper Hessenberg matrix.
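As a reference point for the statement that follows, here is a minimal MATLAB sketch of the standard Arnoldi process just described (modified Gram-Schmidt). It is an illustrative sketch, not the camat code, and the function name is mine.

    function [Q, H] = arnoldi_sketch(A, v, t)
    % Build an orthonormal basis Q of K_t(A,v) and an upper Hessenberg H
    % with A*Q(:,1:t) = Q*H, where Q is n-by-(t+1) and H is (t+1)-by-t.
    n = size(A, 1);
    Q = zeros(n, t+1);
    H = zeros(t+1, t);
    Q(:,1) = v / norm(v);
    for j = 1:t
        w = A * Q(:,j);
        for i = 1:j                     % modified Gram-Schmidt orthogonalization
            H(i,j) = Q(:,i)' * w;
            w = w - H(i,j) * Q(:,i);
        end
        H(j+1,j) = norm(w);
        if H(j+1,j) == 0, break; end    % hit an invariant subspace
        Q(:,j+1) = w / H(j+1,j);
    end
    end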
Using our repertoire of operations, the Arnoldi method in the circulant algebra is equivalent to individual Arnoldi processes on each matrix A_j; that is, it is equivalent to a block Arnoldi process. Using the cft and icft operations, we produce an Arnoldi factorization in the algebra: A Q_t = Q_{t+1} H_{t+1,t}.

14. A number of interesting mathematical results come from this algebra. (1) A case study of how decoupled block iterations arise and are meaningful for an application. (2) It is a beautiful algebra: for example, abs, conj, and angle are all defined through the Fourier transform (slide 6), and proofs are simple, e.g. conj(angle(alpha)) * angle(alpha) = 1. (Live Matlab demo.)

15. Conclusion to the circulant algebra. Paper available from https://www.cs.purdue.edu/homes/dgleich/: "The power and Arnoldi methods in an algebra of circulants," Gleich, Greif, and Varah, NLA 2013. Code available from https://www.cs.purdue.edu/homes/dgleich/codes/camat

16. Project 2: fast relaxation methods to estimate a column of the matrix exponential. With Kyle Kloster.

17. Matrix exponentials. exp(A) is defined as sum_{k=0}^{infinity} (1/k!) A^k, which always converges; here A is n-by-n and real. It is the evolution operator for an ODE: dx/dt = A x(t) iff x(t) = exp(tA) x(0). The exponential is a special case of a function of a matrix f(A); other examples are f(x) = 1/x and f(x) = sinh(x).

18. Matrix exponentials on large networks. If A is the adjacency matrix, then A^k counts the number of length-k paths between node pairs, so large entries of exp(A) = sum_k (1/k!) A^k denote important nodes or edges; this is used for link prediction and centrality [Estrada 2000; Farahat et al. 2002, 2006]. If P is a transition matrix (column stochastic), then P^k gives the probability of a length-k walk between node pairs, and exp(P) = sum_k (1/k!) P^k is used for link prediction, kernels, and clustering or community detection [Kondor & Lafferty 2002; Kunegis & Lommatzsch 2009; Chung 2007].

19. This talk: a column of the matrix exponential, x = exp(P) e_c, where x is the solution, P is the matrix, and e_c indicates the column.

20. The same column, annotated: the solution x is localized; the matrix P is large, sparse, and stochastic; e_c selects the column.

21. Uniformly localized solutions in livejournal. (Figure: the magnitudes of the entries of x = exp(P) e_c for the livejournal graph, and the 1-norm error as only the largest nonzeros are retained; nnz(x) = 4,815,948.) Gleich & Kloster, arXiv:1310.3423.

22. Our mission: exploit the structure. Find the solution with work roughly proportional to the localization, not to the size of the matrix.

23. Our algorithms for uniform localization (www.cs.purdue.edu/homes/dgleich/codes/nexpokit). (Figure: 1-norm error versus number of nonzeros for gexpm, gexpmq, and expmimv.) For 1-norm error eps,

    work = O( log(1/eps) (1/eps)^{3/2} d^2 (log d)^2 ),
    nnz  = O( log(1/eps) (1/eps)^{3/2} d log d ).
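For orientation, here is a minimal MATLAB sketch that computes the column by a plain truncated Taylor series and then measures how localized it is, in the spirit of the livejournal plot on slide 21. It is a naive baseline, not the gexpm/gexpmq/expmimv algorithms above; the sparse column-stochastic matrix P, the seed node c, and the truncation order N = 11 are assumptions of the sketch.

    n = size(P, 1);
    N = 11;                             % truncation order (an assumption here)
    term = sparse(c, 1, 1, n, 1);       % term = P^0 e_c / 0!
    x = term;
    for k = 1:N
        term = (P * term) / k;          % term = P^k e_c / k!
        x = x + term;                   % x approximates exp(P) e_c
    end
    xs = sort(nonzeros(x), 'descend');  % entries are nonnegative for stochastic P
    coverage = cumsum(xs) / sum(xs);    % 1-norm mass captured by the largest entries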
24. Matrix exponentials on large networks: is a single column interesting? Yes! exp(P) e_c = sum_{k=0}^{infinity} (1/k!) P^k e_c gives link-prediction scores for node c and a community relative to node c. But modern networks are large (~10^9 nodes), sparse (~10^11 edges), and constantly changing, so we would like speed over accuracy.

25. "Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty-Five Years Later," Cleve Moler and Charles Van Loan, SIAM Review 45(1), pp. 3-49, 2003.

26. Our underlying method: direct expansion,

    x = exp(P) e_c  ~  sum_{k=0}^{N} (1/k!) P^k e_c  =  x_N.

A few matvecs suffice, but sparsity is quickly lost due to fill-in. This method is stable for stochastic P: no cancellation, no unbounded norms, etc.

27. Our underlying method as a linear system. The same truncated expansion can be written as a block bidiagonal system:

    [  I                     ] [ v_0 ]   [ e_c ]
    [ -P/1   I               ] [ v_1 ]   [  0  ]
    [       -P/2   I         ] [ ... ] = [ ... ]
    [             ...   ...  ] [ ... ]   [ ... ]
    [                -P/N  I ] [ v_N ]   [  0  ]

with x_N = sum_{i=0}^{N} v_i; compactly, (I_{N+1} (x) I - S_N (x) P) v = e_1 (x) e_c, where S_N carries the coefficients 1/j on its subdiagonal. Lemma: we approximate x_N well if we approximate v well.

28. Our mission (2): approximately solve A x = b when A and b are sparse and x is localized.

29. Coordinate descent, Gauss-Southwell, Gauss-Seidel, relaxation, and push methods. Procedurally:

    Solve(A, b)
      x = sparse(size(A,1), 1)
      r = b
      while (1)
        pick j where r(j) != 0
        z = r(j)
        x(j) = x(j) + z
        for i where A(i,j) != 0
          r(i) = r(i) - z*A(i,j)

Algebraically: with r^(k) = b - A x^(k),

    x^(k+1) = x^(k) + e_j e_j^T r^(k),
    r^(k+1) = r^(k) - r_j^(k) A e_j.

30. Back to the exponential. Solve the block system

    (I_{N+1} (x) I - S_N (x) P) v = e_1 (x) e_c,    x_N = sum_{i=0}^{N} v_i,

via the same relaxation method. Optimization 1: build the system implicitly. Optimization 2: do not store the iterates v_i, just store their running sum x_N. (A standalone sketch of this combined procedure appears after slide 34 below.)

31. Error analysis for Gauss-Southwell. Theorem: assume P is column stochastic and v^(0) = 0. (Nonnegativity, easy) The iterates and residuals stay nonnegative: v^(l) >= 0 and r^(l) >= 0. (Convergence, annoying) The residual goes to 0:

    ||r^(l)||_1 <= prod_{k=1}^{l} (1 - 1/(2dk)) ~ l^{-1/(2d)},

where d is the largest degree.

32. Proof sketch. Gauss-Southwell picks the largest residual entry; bound the update by the average number of nonzeros in the residual (sloppy). This gives algebraic convergence with a slow rate, but each update is REALLY fast: O(d_max log n). If d is log log n, then our method runs in sublinear time (but so does just about anything).

33. Overall error analysis. The components: truncation to N terms, the residual-to-error relationship, and the approximate solve. Theorem: after l steps of Gauss-Southwell,

    ||x_N^(l) - x||_1 <= 1/(N! N) + (1 + e) l^{-1/(2d)}.

34. More recent error analysis. Theorem (Gleich and Kloster, 2013, arXiv:1310.3423): consider computing the matrix exponential with the Gauss-Southwell relaxation method on a graph whose degrees follow a Zipf law with exponent p = 1 and max degree d; then the work involved in getting a solution with 1-norm error eps is

    work = O( log(1/eps) (1/eps)^{3/2} d^2 (log d)^2 ).
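To tie slides 27-33 together, here is a minimal MATLAB sketch of the relaxation idea applied to the block system, with both optimizations (implicit system, accumulate only the sum). It is an illustrative sketch under my own simplifications, using dense residual storage and a first-nonzero selection rule rather than the Gauss-Southwell largest-entry rule; it is not the gexpm/gexpmq code from nexpokit. The sparse column-stochastic P, the seed node c, the truncation order N, and the tolerance tol are assumed inputs.

    n = size(P, 1);
    x = zeros(n, 1);                   % running sum of the blocks v_0, ..., v_N
    R = zeros(n, N+1);                 % R(:, j+1) holds the residual of block j
    R(c, 1) = 1;                       % initial residual e_1 (x) e_c
    while sum(R(:)) > tol              % residuals stay nonnegative for stochastic P
        [i, j] = find(R, 1);           % pick a nonzero residual entry (Gauss-Southwell
                                       % would pick the largest one instead)
        z = R(i, j);
        R(i, j) = 0;                   % relaxing v_{j-1}(i) += z zeroes this residual...
        x(i) = x(i) + z;               % ...and only the sum x_N is accumulated
        if j <= N
            R(:, j+1) = R(:, j+1) + (z/j) * P(:, i);   % push z forward to block j, scaled by 1/j
        end
    end
    % x approximates x_N = sum_i v_i; the remaining error is controlled by the
    % residual norm plus the Taylor truncation error (slide 33).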
35. Problem size and runtimes. (Figure: the median runtime of our methods for the seven graphs over 100 trials, plotted against |V| + nnz(P); legend: expmv, half, gexpmq, gexpm, expmimv.) Table 3: the real-world datasets we use in our experiments span three orders of magnitude in size.

    Graph          |V|           nnz(P)          nnz(P)/|V|
    itdk0304           190,914       1,215,220        6.37
    dblp-2010          226,413       1,432,920        6.33
    flickr-scc         527,476       9,357,071       17.74
    ljournal-2008    5,363,260      77,991,514       14.54
    webbase-2001   118,142,155   1,019,903,190        8.63
    twitter-2010    33,479,734   1,394,440,635       41.65
    friendster      65,608,366   3,612,134,270       55.06

Real-world networks. The datasets used are summarized in Table 3. They include a version of the flickr graph from [Bonchi et al., 2012] containing just the largest strongly connected component of the original graph; dblp-2010 from [Boldi et al., 2011]; itdk0304 from [The Cooperative Association for Internet Data Analysis, 2005]; ljournal-2008 from [Boldi et al., 2011; Chierichetti et al., 2009]; twitter-2010 from [Kwak et al., 2010]; webbase-2001 from [Hirai et al., 2000; Boldi and Vigna, 2005]; and the friendster graph from [Yang and Leskovec, 2012].

Implementation details. All experiments were performed on either a dual-processor Xeon E5-2670 system with 16 cores (total) and 256 GB of RAM, or a single-processor Intel i7-990X (3.47 GHz) with 24 GB of RAM. Our algorithms were implemented in C++ using the Matlab MEX interface. All data structures used are memory-efficient: the solution and residual are stored as hash tables using Google's sparsehash package. The precise code for the algorithms and the experiments below is available via https://www.cs.purdue.edu/homes/dgleich/codes/nexpokit/.

Comparison. We compare our implementation with a state-of-the-art Matlab function for computing the exponential of a matrix times a vector, expmv [Al-Mohy and Higham, 2011]. We customized this method with the knowledge that ||P||_1 = 1; this single change results in a great improvement to the runtime of their code. In each experiment, we use as the true solution the result of a call to expmv with the 'single' option, which guarantees a 1-norm error bounded by 2^{-24}, or, for smaller problems, a Taylor approximation with the number of terms predicted by Lemma 12.

36. References and ongoing work. Kloster and Gleich, Workshop on Algorithms for the Web-graph, 2013; also see the journal version on arXiv. Code: www.cs.purdue.edu/homes/dgleich/codes/nexpokit. Ongoing: error analysis using the queue (almost done), better linear systems for faster convergence, asynchronous coordinate descent methods, and scaling up to billion-node graphs (done). Supported by NSF CAREER 1149756-CCF. www.cs.purdue.edu/homes/dgleich