Divide and conquer algorithms for large eigenvalue problems

Yousef Saad
Department of Computer Science and Engineering
University of Minnesota

Applied Math. Seminar, Berkeley, Apr. 22, 2015

Collaborators:

ä Joint work with

• Vasilis Kalantzis [grad student]

• Ruipeng Li [grad student]

• Haw-ren Fang [former post-doc]

• Grady Schofield and Jim Chelikowsky [UT Austin] [windowing into PARSEC]

ä Work supported by DOE: Scalable Computational Tools for Discovery and Design: Excited State Phenomena in Energy Materials [Involves 3 institutions: UT Austin, UC Berkeley, U Minn]

Introduction & Motivation

ä Density Functional Theory deals with ground states

ä Excited states involve transitions and invariably lead to much more complex computations

ä Problem: very large number of eigenpairs to compute

An illustration: Time-Dependent Density Functional Theory (TDDFT). In the so-called Casida approach we need to compute the eigenvalues of a dense matrix K which is built using both occupied and unoccupied states

ä Each pair → one column of K
ä To compute each column, need to solve a Poisson eqn. on the domain
ä Intense calculation, lots of parallelism

[Figure: one column of K corresponds to a pair (i, j) of an occupied state λ_i (λ_1, . . . , λ_v) and an empty state λ_j (λ_{v+1}, . . . , λ_c)]

An illustration: Si34 H36 [Done in 2004-2005]

ä Number of Occupied States (Valence band) nv = 86

ä Number of Unoccupied States (Conduction band) nc = 154

ä Size of Coupling Matrix: N = nv ∗ nc = 13,244

ä Number of Poisson solves = 13,244

ä Similar types of calculations in the GW approach [see, e.g., BerkeleyGW]

ä But more complex

ä Challenge:

‘Hamiltonian of size ∼ 1 Million, get 10% of bands’

Solution: Spectrum Slicing

Rationale. Eigenvectors on both ends of the wanted spectrum need not be orthogonalized against each other:

ä Idea: Get the spectrum by ‘slices’ or ’windows’ [e.g., a few hundreds or thousands of pairs at a time]

ä Can use polynomial or rational filters

Compute slices separately

ä Deceptively simple-looking idea.

ä Issues:

• Deal with interfaces : duplicate/missing eigenvalues

• Window size [need estimate of eigenvalues]

• How to compute each slice? [polynomial / rational filters?, ...]

Computing a slice of the spectrum

Q: How to compute eigenvalues in the middle of the spectrum of a large Hermitian matrix?

A: Common practice: Shift-and-invert + some projection process (Lanczos, subspace iteration, ...)

Main steps:

1) Select a shift (or sequence of shifts) σ
2) Factor A − σI: A − σI = L D L^T
3) Apply Lanczos algorithm to (A − σI)^{-1}

ä Solves with A− σI carried out using factorization

ä Limitation: factorization

ä First Alternative: Polynomial filtering

POLYNOMIAL FILTERS

Polynomial filtering

ä Apply Lanczos or subspace iteration to M = φ(A), where φ(t) is a polynomial

ä Each matvec y = Av is replaced by y = φ(A)v

ä Eigenvalues in high part of filter will be computed first

ä Consider Subspace Iteration. In following script:

• B = (A− cI)/h so Λ(B) ⊂ [−1, 1]

• Rayleigh-Ritz with B [shifted-scaled A]

• Compute residuals w.r.t. B

• Exit when enough eigenpairs have converged

for iter = 1:maxit
   Y = PolFilt(B, V, mu);          % apply polynomial filter to the basis
   [V, R] = qr(Y, 0);              % orthonormalize
   %--- Rayleigh-Ritz with B
   C = V'*(B*V);
   [X, D] = eig(C);
   d = diag(D);
   %--- get e-values closest to center of interval
   [~, idx] = sort(abs(d - gam), 'ascend');
   %--- get their residuals ...
   %--- if enough pairs have converged, exit ...
end
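
ä PolFilt and the coefficient vector mu are not spelled out on the slide; below is a minimal sketch, assuming mu holds the Chebyshev expansion coefficients of the filter (degree = numel(mu) − 1) and Λ(B) ⊂ [−1, 1]:

function Y = PolFilt(B, V, mu)
% Apply the polynomial filter p(B) = sum_k mu(k+1)*T_k(B) to the block V,
% using the Chebyshev three-term recurrence T_{k+1}(B) = 2*B*T_k(B) - T_{k-1}(B).
% Assumes the spectrum of B has already been mapped into [-1, 1].
  k  = numel(mu) - 1;              % polynomial degree
  V0 = V;                          % T_0(B)*V
  Y  = mu(1)*V0;
  if k == 0, return; end
  V1 = B*V;                        % T_1(B)*V
  Y  = Y + mu(2)*V1;
  for j = 2:k
     V2 = 2*(B*V1) - V0;           % T_j(B)*V
     Y  = Y + mu(j+1)*V2;
     V0 = V1;   V1 = V2;
  end
end

Any of the polynomial filters discussed on the next slides (least-squares, Chebyshev-Jackson, Dirac-δ) can be represented by such a coefficient vector.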

What polynomials?

ä For end-intervals can just use Chebyshev

ä For inside intervals: several choices

ä Recall the main goal: a polynomial that has large values for λ ∈ [a, b] and small values elsewhere

[Figure: φ(λ) for a polynomial of degree 32 approximating δ(0.5) in [−1, 1]; markers show the values φ(λ_i) at the eigenvalues λ_i]

Least-squares approach

ä Two-stage approach used in filtlan [H.-r. Fang, YS 2011]

ä First select an “ideal filter”

ä e.g., a piecewise polynomial function [a spline]

ä For example φ = Hermite interpolating pol. in [0, a], and φ = 1 in [a, b]

ä Referred to as the ‘Base filter’

• Then approximate the base filter by a degree k polynomial in a least-squares sense.

• Can do this without numerical integration

Main advantage: Extremely flexible.

Method: Build a sequence of polynomials φ_k which approximate the ideal piecewise-polynomial (PP) filter φ in the L2 sense.

ä In filtlan, we used a form of Conjugate Residual technique in polynomial space. Details skipped.

Low-pass, high-pass, & barrier (mid-pass) filters

[Figures: base filter ψ(λ) and polynomial filter ρ(λ) on [a, b] = [0, 3]. Left: [ξ, η] = [0, 1], degree d = 10. Right: [ξ, η] = [0.5, 1], degree d = 20]

ä See reference on Lanczos + pol. filtering: Bekas, Kokiopoulou, YS (2008) for motivation, etc.

ä H.-r. Fang and YS “Filtlan” paper [SISC, 2012] and code

Misconception: High degree polynomials are bad

[Figures: zooms near the interval edge for polynomial filters of degree d = 1000 and d = 1600, each plotted with the base filter ψ(λ)]

A simpler approach: Chebyshev + Jackson damping

ä Simply seek the best LS approximation to step function

ä Add damping coefficients to reduce oscillations

Chebyshev-Jackson approximation of a function f:

    f(x) ≈ Σ_{i=0}^{k} g_i^k γ_i T_i(x)

with, for the step function on [a, b]:

    γ_0 = [arccos(a) − arccos(b)] / π
    γ_i = 2 [sin(i arccos(a)) − sin(i arccos(b))] / (iπ),   i > 0

ä Expression for g_i^k: see L. O. Jay, YS, J. R. Chelikowsky, ’99
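
ä A minimal sketch of how these damped coefficients could be assembled (assuming [a, b] has already been mapped inside [−1, 1]); the Jackson factors below follow the standard KPM formula [Weisse et al. ’06], whose indexing may differ slightly from the g_i^k of the ’99 paper:

function mu = cheb_jackson_coeffs(a, b, k)
% Damped Chebyshev coefficients mu(i+1) = g_i * gamma_i, i = 0..k, for the
% step function on [a, b] in [-1, 1]; can be passed to the PolFilt sketch above.
  mu = zeros(k+1, 1);
  mu(1) = (acos(a) - acos(b))/pi;                            % gamma_0
  for i = 1:k
     mu(i+1) = 2*(sin(i*acos(a)) - sin(i*acos(b)))/(i*pi);   % gamma_i
  end
  N  = k + 1;                       % number of Chebyshev moments
  al = pi/(N + 1);
  for i = 0:k                       % Jackson damping factors g_i
     g = ((N - i + 1)*cos(i*al) + sin(i*al)*cot(al))/(N + 1);
     mu(i+1) = g*mu(i+1);
  end
end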

[Figure: mid-pass polynomial filter for [−1, 0.3, 0.6, 1], degree = 80: standard Chebyshev vs. Jackson-Chebyshev]

Polynomial filters: An even simpler approach

ä Simply seek the LS approximation to the Dirac δ function
ä Centered at the middle of the interval
ä Can use same damping: Jackson, or Lanczos σ damping

[Figure: polynomial of degree 32 approximating δ(0.5) in [−1, 1] (same plot as shown earlier)]

ä Filters for 3 slices

[Figure: the polynomial filters φ(λ) for 3 slices of the interval [−1, −0.5]]

Tests – Test matrices

** From Filtlan article with H-R Fang

ä Experiments on two dual-core AMD Opteron(tm) Processors 2214 @ 2.2GHz and 16GB memory.

Test matrices:

* Five Hamiltonians from electronic structure calculations,

* Andrews matrix: N = 60,000, nnz ≈ 760K, interval [4, 5]; nev = 1,844 eigenvalues (3,751 to the left of η)

* A discretized Laplacian (FD), n = 10^6, interval = [1, 1.01], nev = 276 (>17,000 to the left of η)

ä Here: report only on Andrews and Laplacian

Results for Andrews - set 1 of stats

method            degree    # iter    # matvecs    memory
filt. Lan.        d = 20     9,440      188,800     4,829
(mid-pass)        d = 30     6,040      180,120     2,799
                  d = 50     3,800      190,000     1,947
                  d = 100    2,360      236,000     1,131
filt. Lan.        d = 10     5,990       59,900     2,799
(high-pass)       d = 20     4,780       95,600     2,334
                  d = 30     4,360      130,800     2,334
                  d = 50     4,690      234,500     2,334
Part. ⊥ Lanczos     --      22,345       22,345    10,312
ARPACK              --      30,716       30,716     6,129

Results for Andrews - CPU times (sec.)

method            degree    ρ(A)v     reorth     eigvec      total
filt. Lan.        d = 20     2,797       192      4,834      9,840
(mid-pass)        d = 30     2,429       115      2,151      5,279
                  d = 50     3,040        65        521      3,810
                  d = 100    3,757        93        220      4,147
filt. Lan.        d = 10     1,152     2,911      2,391      7,050
(high-pass)       d = 20     1,335     1,718      1,472      4,874
                  d = 30     1,806     1,218      1,274      4,576
                  d = 50     3,187     1,032      1,383      5,918
Part. ⊥ Lanczos     --         217    30,455     64,223    112,664
ARPACK              --         345  †423,492    †18,094    441,934

Results for Laplacian – Matvecs and Memory

method            degree    # iter    # matvecs    memory
mid-pass filter      600     1,400      840,000    10,913
                   1,000       950      950,000     7,640
                   1,600       710    1,136,000     6,358

Results for Laplacian – CPU times

method            degree    ρ(A)v     reorth    eigvec      total
mid-pass filter      600    97,817       927       241     99,279
                   1,000   119,242       773       162    120,384
                   1,600   169,741       722       119    170,856

Spectrum slicing in PARSEC

** From: A Spectrum Slicing Method for the Kohn-Sham Problem, G. Schofield, J. R. Chelikowsky and YS, Computer Physics Comm., vol 183 (2011) pp. 487-505.

ä Preliminary implementation in our code: PARSEC

ä Uses the simpler Jackson-Chebyshev filters

ä For details on windowing, etc., see paper

ä Illustration shown next using 16 slices: States for unrelaxed Si275H172 as computed using 16 slices. Each slice → solid line. Original full spectrum → dashed line.

[Figure: 16 panels, one per slice, each showing the computed states in its slice against the full spectrum; horizontal axis: Energy (Ry), from −1.2 to 0.0]

How do I slice my spectrum?

Analogue question:

How would I slice an onion if I want each slice to have about the same mass?

ä A good tool: Density of States – see:

• L. Lin, YS, Chao Yang recent paper.

• KPM method – see, e.g.: [Weisse, Wellein, Alvermann, Fehske, ’06]

• Interesting instance of a tool from physics used in linear algebra.

ä Misconception: ‘load balancing will be assured by just having slices with roughly equal numbers of eigenvalues’

ä Situation is much more complex

[Figure: a DOS curve and the resulting partition; slice spectrum into 8 with the DOS]

ä We must have:

    ∫_{t_i}^{t_{i+1}} φ(t) dt  =  (1/nslices) ∫_a^b φ(t) dt
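
ä A minimal sketch of this rule, assuming the DOS φ has already been evaluated (e.g., by KPM) on a grid of points t covering [a, b]; the names phi, t, nslices are illustrative:

function ti = dos_slices(t, phi, nslices)
% Choose slice boundaries t_0 < t_1 < ... < t_nslices so that each slice
% carries (approximately) the same fraction of the integrated DOS.
  cphi = cumtrapz(t, phi);          % cumulative integral of the DOS
  cphi = cphi / cphi(end);          % normalize to [0, 1]
  ti = zeros(nslices+1, 1);
  ti(1) = t(1);   ti(end) = t(end);
  for k = 1:nslices-1
     idx = find(cphi >= k/nslices, 1, 'first');
     ti(k+1) = t(idx);              % first point where the cumulative DOS reaches k/nslices
  end
end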

RATIONAL FILTERS

Why use rational filters?

ä Consider a spectrum like this one:

[Figure: a spectrum spread over an interval of width on the order of 10^9]

ä Polynomial filtering approach would be utterly ineffective for this case

ä Second issue: situation when matrix-vector products are expensive

ä Alternative is to use rational filters:

    φ(z) = Σ_j α_j / (z − σ_j)

ä We now need to solve linear systems

ä Tool: Cauchy integral representations of spectral projectors

    P = −(1/2iπ) ∫_Γ (A − sI)^{-1} ds

• Numerical integration: P → P̃
• Use Krylov or Subspace Iteration on P̃

ä Sakurai-Sugiura approach [Krylov]

ä Polizzi [FEAST, Subsp. Iter. ]
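
ä For reference, a minimal sketch of how such a filter is applied to a block of vectors inside subspace iteration (the name rat_filt and the use of sparse backslash solves are illustrative, not the code of any particular package):

function Y = rat_filt(A, V, sigma, alpha)
% Apply the rational filter phi(A)V = sum_j alpha_j (A - sigma_j I)^{-1} V.
% One (sparse) solve per pole per iteration; in practice the factorizations
% of (A - sigma_j I) are computed once and reused across iterations.
  n = size(A, 1);
  I = speye(n);
  Y = zeros(size(V));
  for j = 1:numel(sigma)
     Y = Y + alpha(j) * ((A - sigma(j)*I) \ V);
  end
  Y = real(Y);   % real (up to roundoff) when poles/weights come in conjugate pairs
end

Y then plays the same role as PolFilt(B, V, mu) did in the polynomial-filtering script shown earlier.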

Better rational filters

ä When using rational filters often one resorts to Cauchy integrals:

Spectral projector  P = −(1/2iπ) ∫_Γ (A − sI)^{-1} ds  +  numerical integration

ä Approximation theory viewpoint: find a rational function that approximates the step function in [−1, 1] [after shifting + scaling]
ä e.g., work by S. Guettel et al. 2014. Uniform approx.

[Figure: three rational approximations of the step function on [−1, 1]: LS, Mid-pt, Gauss]

Different Look at the 3 curves

[Figure: the same LS, Mid-pt, and Gauss curves plotted on a larger vertical scale]

What about this case?

[Figure: a second comparison of LS, Mid-pt, and Gauss filters, in which one of the curves takes much larger values inside [−1, 1] than the others]

ä The blue curve on the right is likely to give better results than the other 2 when subspace iteration is used.

ä Want large values inside [−1, 1] and small ones outside, but do not care much how well the step function is approximated

What makes a good filter

[Figures: filters with real(σ) = 0.0, imag(σ) = 1.1, pow = 3; curves: Opt. pole^3, single pole, single pole^3, Gauss 2-poles, Gauss 2-poles^2. Left: full view on [−2, 2]; right: zoom near λ = −1]

ä Assume subspace iteration is used with the above filters. Which filter will give better convergence?

ä Simplest and best indicator of performance of a filter is the magnitude of its derivative at −1 (or 1)

The Cauchy integral viewpoint

ä Standard Mid-point, Gauss-Chebyshev (1st, 2nd) and Gauss-Legendre quadratures. Left: filters, right: poles

Scaling used so that φ(−1) = 1/2
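
ä For illustration only, a minimal sketch of how the mid-point poles and weights can be generated for a circle enclosing [a, b]; the φ(−1) = 1/2 rescaling used in the plots is omitted, and the name midpoint_poles is not from any package:

function [s, alpha] = midpoint_poles(a, b, N)
% Poles s_j and weights alpha_j of the N-point mid-point rule applied to
%   P = -(1/(2i*pi)) * int_Gamma (A - sI)^{-1} ds,
% with Gamma a circle of center c = (a+b)/2 and radius r = (b-a)/2.
  c  = (a + b)/2;    r = (b - a)/2;
  th = 2*pi*((1:N).' - 0.5)/N;      % mid-point quadrature angles
  s  = c + r*exp(1i*th);            % poles on the circle
  alpha = -(r/N)*exp(1i*th);        % quadrature weights
end

These (s, alpha) can be fed directly to the rat_filt sketch above; with a = 0, b = 0.2 and N = 4 this reproduces the 4-pole mid-point setting used in the example a few slides ahead.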

[Figures: filters (left) and poles (right) for the mid-point, Gauss-Chebyshev (1st and 2nd kind), and Gauss-Legendre rules; number of poles = 4]

[Figures: the same comparison with number of poles = 6]

ä Notice how the sharper curves have poles close to real axis

The Gauss viewpoint: Least-squares rational filters

ä Given: poles σ1, σ2, · · · , σp

ä Related basis functions φ_j(z) = 1/(z − σ_j)

Find φ(z) = Σ_{j=1}^{p} α_j φ_j(z) that minimizes ∫_{−∞}^{∞} w(t) |h(t) − φ(t)|^2 dt

ä h(t) = step function χ_[−1,1].

ä w(t) = weight function. For example a = 10, β = 0.1

    w(t) = 0   if |t| > a
           β   if |t| ≤ 1
           1   otherwise
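
ä A minimal numerical stand-in for this minimization (the actual approach works with the integrals directly): discretize the weighted least-squares problem on a fine grid and solve for the α_j; all names here are illustrative:

function alpha = ls_rat_coeffs(sigma, a, beta)
% LS rational filter coefficients alpha for given poles sigma:
%   min_alpha  int w(t) |h(t) - sum_j alpha_j/(t - sigma_j)|^2 dt,
% with h = step function on [-1,1] and w as defined on the slide,
% approximated here on a dense grid over [-a, a] (w = 0 outside).
  t   = linspace(-a, a, 20000).';
  h   = double(abs(t) <= 1);             % step function chi_[-1,1]
  w   = ones(size(t));  w(abs(t) <= 1) = beta;
  Phi = 1 ./ (t - sigma(:).');           % Phi(:,j) = 1/(t - sigma_j)
  W   = sqrt(w);
  alpha = (W.*Phi) \ (W.*h);             % weighted least-squares solve
end

The resulting alpha can be used in place of the Cauchy quadrature weights in the rat_filt sketch above (same poles, same factorizations).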

How does this work?

ä A small example: Laplacian on a 43 × 53 grid (n = 2279)

ä Take 4 poles obtained from mid-point rule (Nc = 2 on each half plane)

ä Want: eigenvalues inside [0, 0.2]. There are nev = 31 of them.

ä Use 1) standard subspace iteration + Cauchy (FEAST), then 2) subspace iteration + LS Rat. Approx.

ä Use subspace of dim nev + 6

ä β = 0.2

[Figures: left: the Cauchy filter vs. the LS rational filter built from the same 4 poles; right: convergence histories (residual vs. iteration) of subspace iteration with each filter]

ä LS uses the same poles + same factorizations as Cauchy but

ä ... much faster as expected from a look at the curves of the functions

ä Other advantages:

• Can select poles far away from real axis → faster iterative solvers [E. Di Napoli, YS, et al., work in progress]

• Can use multiple poles (!)

• Very flexible – can be adapted to many situations

Influence of parameter β

ä Recall: large β → better approximation to h – not needed

ä Smaller values are better

[Figure: derivative of φ_β at z = −1 as a function of β, for the Mid-pt, LS-Mid-pt, Gauss, and LS-Gauss filters]

Better rational filters: Multiple poles

ä Current choices: Mid-point/Trapezoidal rule, Gauss, Zolotarev,...

ä None of these allows for repeated (’multiple’) poles, e.g., with one pole of multiplicity k:

    φ(z) = α_1/(z − σ) + α_2/(z − σ)^2 + · · · + α_k/(z − σ)^k

ä Advantage: Fewer exact or incomplete factorizations needed for solves
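
ä A minimal sketch of why repeated poles save factorizations, for a single pole σ of multiplicity k and a sparse A (the name multipole_filt is illustrative): one factorization of A − σI serves all the powers (z − σ)^{-j}.

function y = multipole_filt(A, v, sigma, alpha)
% Apply phi(A)v = sum_{j=1..k} alpha_j (A - sigma I)^{-j} v,
% reusing a single sparse LU factorization of (A - sigma I).
  n = size(A, 1);
  [L, U, P, Q] = lu(A - sigma*speye(n));   % one factorization, reused k times
  y = zeros(n, 1);   w = v;
  for j = 1:numel(alpha)
     w = Q*(U\(L\(P*w)));                  % now w = (A - sigma I)^{-j} v
     y = y + alpha(j)*w;
  end
end

In practice poles come in conjugate pairs, so the real part of the combined contributions is taken, as in the earlier rational-filter sketch.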

ä Next: Illustration with same example as before

Better rational filters: Example

ä Take same example as before: 43 × 53 Laplacian

ä Now take 6 poles [3× 2 midpoint rule]

ä Repeat each pole [double poles.]

[Figures: left: Cauchy, LS rational, and LS multiple-pole filters; right: the corresponding convergence histories]

ä General form

    φ(z) = Σ_{i=1}^{Np} Σ_{j=1}^{k_i} α_ij / (z − z_i)^j

ä We then need to minimize (as before) ∫_{−∞}^{∞} w(t) |h(t) − φ(t)|^2 dt

ä Need to exploit partial fraction expansions

ä Straightforward extension [but somewhat more cumbersome]

ä Next: See what we can do with *one* double pole

[Figure: filter from one double pole compared with the Gauss 1-pole and Gauss 2-poles filters]

Who needs a circle? Two poles^2 + comparison with compounding

[Figure: filters with poles σ = ±0.6 ± 1i, pow = 2: multiple pole, single pole, Gauss 2-poles, Gauss 2-poles^2]

SPECTRAL SCHUR COMPLEMENT TECHNIQUES

Introduction: Domain-Decomposition

* Joint work with Vasilis Kalantzis and Ruipeng Li

[Figure: a small grid graph with nodes numbered 1–25, partitioned into subdomains by edge separators]

ä Partition graph using edge-separators (‘vertex-based partitioning’)

Distributed graph and its matrix representation

[Figure: a subdomain X_i with its internal points, local interface points, and external interface points, and the corresponding local matrix A_i]

ä Stack all interior variables u_1, u_2, · · · , u_p into a vector u, then the interface variables y

ä Result:

    PAP^T  [u_1; u_2; ...; u_p; y]  =  λ [u_1; u_2; ...; u_p; y],   where

    PAP^T  =  [ B_1                       E_1 ]
              [       B_2                 E_2 ]
              [              . . .        ... ]
              [                    B_p    E_p ]
              [ E_1^T  E_2^T  ...  E_p^T    C ]

Notation: write as

    A = [ B     E ]
        [ E^T   C ]

[Figure: sparsity pattern of the reordered matrix PAP^T, nz = 4380]

The spectral Schur complement

• Eliminating the u_i’s we get

    [ S_1(λ)   E_12     · · ·   E_1p   ] [ y_1 ]
    [ E_21     S_2(λ)   · · ·   E_2p   ] [ y_2 ]
    [   ...              . . .   ...   ] [ ... ]  =  0
    [ E_p1^T   E_p2^T   · · ·   S_p(λ) ] [ y_p ]

• S_i(λ) = C_i − λI − E_i^T (B_i − λI)^{-1} E_i

• Interface problem (non-linear): S(λ)y(λ) = 0.

• Top part can be recovered as u_i = −(B − λI)^{-1} E_i y(λ).

• See also AMLS [Bennighof, Lehoucq, 2003]

Spectral Schur complement (cont.)

State problem as: • Find σ ∈ R such that one eigenvalue of S(σ) ≡ 0, or,

ä µ(σ) = 0, where µ(σ) = smallest (in modulus) eigenvalue of S(σ).

ä Can treat µ(σ) as a function → root-finding problem.

ä The function µ(σ) is analytic for any σ ∉ Λ(B), with

    dµ(σ)/dσ  =  −1 − ‖(B − σI)^{-1} E y(σ)‖_2^2 / ‖y(σ)‖_2^2

Basic algorithm - Newton’s scheme

ä We can formulate a Newton-based algorithm.

ALGORITHM 1: Newton Scheme

1  Select initial σ
2  Repeat:
3     Compute µ(σ) = smallest eigenvalue in modulus
4        of S(σ) & associated eigenvector y(σ)
5     Set η := ‖(B − σI)^{-1} E y(σ)‖_2
6     Set σ := σ + µ(σ)/(1 + η^2)
7  Until: |µ(σ)| ≤ tol
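
ä A minimal dense-matrix sketch of this scheme, assuming A has already been permuted into the 2 × 2 block form [B E; E^T C] used above; eig is used here for the smallest-modulus eigenpair of S(σ), whereas in practice inverse-iteration or Lanczos schemes are used, as discussed on the “Evaluating µ(σ)” slide:

function sigma = newton_mu(B, E, C, sigma, tol, maxit)
% Sketch of Algorithm 1 for one eigenvalue near the initial shift sigma.
  I_B = eye(size(B));   I_C = eye(size(C));
  for it = 1:maxit
     S = C - sigma*I_C - E'*((B - sigma*I_B)\E);  % spectral Schur complement S(sigma)
     [Y, M] = eig(S);
     [~, k] = min(abs(diag(M)));                  % smallest eigenvalue in modulus ...
     mu = M(k,k);   y = Y(:,k);                   % ... and its eigenvector (||y|| = 1)
     if abs(mu) <= tol, break; end
     eta   = norm((B - sigma*I_B)\(E*y));
     sigma = sigma + mu/(1 + eta^2);              % Newton update
  end
end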

Short illustration - eigen-branches between two poles

[Figures: eigen-branches µ_i(σ). Left: eigenvalues 1 through 9 in the pole interval [0.133, 0.241]. Right: eigenvalues 26 to 32 in the pole interval [2.358, 2.377]]

ä There may be a few, or no, eigenvalues λ between two poles.

Eigen-branches across the poles

[Figure: eigen-branches µ_i(σ) for eigenvalues 26 through 32 across the pole interval [2.3580, 2.4000]]

    S(σ) = C − σI − E^T (B − σI)^{-1} E  ≡  C − σI − Σ_{i=1}^{m} w_i w_i^T / (θ_i − σ)

Branch-hopping

ä Once we converge ...

ä ... start Newton from the point on the branch immediately above the current root.

[Figure: branch-hopping: computed values vs. iteration number, shown against the eigenvalues of A between 3.0 and 3.07]

Evaluating µ(σ)

Inverse Iteration

• For any σ we just need one (two) eigenvalues of S(σ).

• Good setting for “Inverse-Iteration” type approaches.

Lanczos algorithm

• Tends to be expensive for large p and eigenvalues deep inside the spectrum

Numerical experiments

Some details

• Implementation in MPI-C++ using PETSc

• Tests performed on Itasca Linux cluster @ MSI.

The model problem

• Tests on 3-D discretized Laplacians (7pt. st. – FD).

• We use nx, ny, nz to denote the three dimensions.

• tol set to 1e-12.

• Single-level partitioning – One node per sub-domain.

Example of convergence history

[Figures: convergence of the 8 eigenvalues next to σ = 1.0 (left) and σ = 2.0 (right); residual vs. iteration of Algorithm 3.2]

Rel. res. for a few consecutive eigenvalues. Left: 40 × 40 × 20 grid. Right: 20 × 20 × 20.

Effect of number of domains p on convergence

[Figures] Eigen-branches µ_1(σ), . . . , µ_9(σ) in [0.133, 0.241] for a 33 × 23 × 1 grid. Left: p = 4, right: p = 16.

Numerical experiments: Parallel tests

• Tests performed on Itasca Linux cluster @ MSI.

• Each node is a two-socket, quad-core 2.8 GHz Intel Xeon X5560 “Nehalem EP” with 24 GB of system memory.

• Interconnection : 40-gigabit QDR InfiniBand (IB).

The model problem

• Tests on 3-D discretized Laplacians (7pt. st. – FD).

• We use nx, ny, nz to denote the three dimensions.

• tol set to 1e-12.

Wall-clock timings to compute the first k = 1 and k = 5 eigenpairs to the right of σ.

                                   σ = 0.0          σ = 0.5
 grid          (p, k)       s       Sec    It        Sec    It
 71×70×69      (32, 1)    65647      1.66   3        11.7    4
               (32, 5)      −−      10.1   15        68.2   19
               (64, 1)    83358      0.47   3         5.20   4
               (64, 5)      −−       3.20  14        31.2   19
               (128, 1)  108508      0.16   3         1.90   4
               (128, 5)     −−       1.10  14        11.7   18
 101×100×99    (64, 1)   181901      5.90   3        41.1    3
               (64, 5)      −−      28.3   15       221.5   15
               (128, 1)  230849      1.18   3         7.80   3
               (128, 5)     −−       6.24  15        41.5   14
               (256, 1)  293626      0.65   3         2.90   3
               (256, 5)     −−       3.80  15        14.8   14

                                   σ = 0.0          σ = 0.5
 grid          (p, k)       s       Sec    It        Sec    It
 601×600       (16, 1)     7951      0.65   3         3.10   3
               (16, 5)      −−       4.42  15        15.9   15
               (32, 1)    12377      0.26   3         2.90   3
               (32, 5)      −−       1.60  14        16.4   15
               (64, 1)    18495      0.20   3         1.10   3
               (64, 5)      −−       1.10  14         5.40  14
 801×800       (32, 1)    16673      1.83   3        11.1    3
               (32, 5)      −−      10.2   15        65.1   15
               (64, 1)    24945      0.73   3         5.00   3
               (64, 5)      −−       4.10  15        28.20  14
               (128, 1)   36611      0.31   3         2.10   3
               (128, 5)     −−       1.80  15         9.70  14

Conclusion

Part I: Polynomial filtering

ä Polynomial filtering appealing when # of eigenvectors to be computed is large and when matvecs are inexpensive

ä Will not work too well for generalized eigenvalue problem

ä Will not work well for spectra with very large outliers.

Part II: Rational filtering

ä We must rethink the way we view rational filtering - away from Cauchy and into approximation of functions. LS approach is flexible, easy to implement, easy to understand.

Part III: Domain Decomposition

ä We *must* combine DD with any filtering technique [rational or polynomial]

ä Many ideas still to explore in Domain Decomposition for interior eigenvalue problems

ä Filtlan code available here:

www.cs.umn.edu/~saad/software/filtlan
