Solving Laplacian Systems: Some Contributions from Theoretical Computer Science

Solving Laplacian Systems:Some Contributions from

Theoretical Computer Science

Nick HarveyUBC Department of Computer Science

Today’s Problem• Solve the system of linear equations

L x = bwhere L is the Laplacian matrix of a graph.

• Assumptions:– Want fast algorithms for sparse matrices:

running time ¼ # non-zero entries of L– Want provable, asymptotic running time bounds– Computational model: single CPU, random access

memory, infinite-precision arithmetic– Ignore numerical stability, etc.

Think: Symmetric, Diagonally Dominant

What is the Laplacian of a Graph?a c

b

d

L =

2 -1 -1-1 2 -1-1 -1 3 -1

-1 1

a

b

c

d

a b c d degree of node c= # edges that hit c

-1 indicates an edgefrom a to c

• L is symmetric, diagonally dominant) positive semidefinite) all eigenvalues real, non-negative

• Kernel(L) = span( [1,1,…,1] ) (assuming G connected)

d

L =

2 -1 -1-1 2 -1-1 -1 3 -1

-1 1

• What is effective resistance between a and d?– Highschool Calculation: 1 + 1/(1+1/(1+1)) = 5/3– Linear Algebra: xT L+ x, where x = [1, 0, 0, -1]

Replace edges with 1-ohm resistors

L+ =

17 1 -3 -151 17 -3 -15-3 -3 9 -3

-15 -15 -3 33

/48

“Pseudoinverse”

ca

b

Why is L useful?

PDEs

Laplace’s EquationGiven a function f : ¡ ! R,

find a function g : ! R such that

¡

Image from Wikipedia.

q2 r2g(e) ¼ (g(f)-g(e))-(g(e)-g(d)) + (g(b)-g(e))-(g(e)-g(h)) = g(b)+g(d)+g(f)+g(h)-4g(e)

So q2r2g(v) ¼ -L g(v) for all vertices v in grid interior

Conclusion: can compute approximate solution toLaplace’s equation by solving a linear system involving L.

ba c

fed

g h i

L =

2 -1 -1

-1 3 -1 -1

-1 2 -1

-1 3 -1 -1

-1 -1 4 -1 -1

-1 -1 3 -1

-1 2 -1

-1 -1 3 -1

-1 -1 2

q

Discretization

Aren’t existing algorithms good enough?• Preconditioned Conjugate Gradient– Exact solution in O(nm) time (L has size nxn, m non-zeros)

• Caveat: this bound fails with inexact arithmetic

– Better bounds with a good preconditioner?• Multigrid– Provable performance for specific types of PDEs in

low-dimensional spaces– Not intended for general “Lx = b” Laplacian systems

• Algebraic Multigrid– This is close to what we want– I am unaware of an existing theorem of the form:

For any matrix from class X we can efficiently compute coarsenings such that convergence rate is Y

Some Contributions fromTheoretical Computer Science

• Let L be Laplacian of a graph with n vertices• Vaidya [unpublished ‘91],

Bern-Gilbert-Hendrickson-Nguyen-Toledo [SIAM J. Matrix Anal. ’06]:– Key Idea: Use Graph Theory to design preconditioners for L– Can find a preconditioner P in O((¢n)1.5) time such that:• relative condition number κ(L,P) = O((¢n)1.5)• solving systems in P takes O(¢n) time

– Using conjugate gradient method, get a solutionfor “Lx=b” with relative error ² in O((¢n)1.75 log(1/²)) time

– i.e., let x := L+ b and let y be the algorithm’s outputThen ky-xkL · ² kxkL

(y-x)T L (y-x) xT L x

, max degree ¢

• Let L be Laplacian of a graph with n vertices, m edges• Spielman-Teng [STOC ‘04]:– Can find a solution for “Lx = b” with relative error ²

in O(m1+o(1) log(1/²)) time.– Can view as a rigorous implementation of Algebraic Multigrid– Highly technical: journal version is 116 pages

• Koutis-Miller-Peng [FOCS ‘11]:– Can find a solution for “Lx = b” with relative error ²

in O(m log n (log log n)2 log(1/²)) time.– Significantly simpler: only 16 pages

• Ingredients: low-stretch trees, random matrix theory

Some Contributions fromTheoretical Computer Science

Iterative methods & Preconditioning• Suppose you want to solve Lx = b (m non-zeros)• Instead choose a preconditioner matrix P and solve

P-1/2 L P-1/2 y = P-1/2 b• Setting x = P-1/2 y gives a solution to Lx = b• The relative condition number is

• Iterative methods (e.g., conjugate gradient) give– a solution with relative error ²– after iterations– each iteration takes time

O(m + (time to solve a linear system in P) )

Tool #1: Low-Stretch Trees

Image from D. Spielman. “Algorithms, Graph Theory, and Linear Equations in Laplacians”. Talk at ICM 2010.

Laplacians of Subgraphs• Suppose you want to solve Lx = b, where

L is the Laplacian of graph G=(V,E)• For any FµE, we can write L = LF + LE\F, where– LF is the Laplacian of (V,F)– LE\F is the Laplacian of (V,E\F)

• Easy Fact: λmin(LF+ L) ¸ 1 ) κ(L,LF) · λmax(LF

+ L)a c

b

d

L =

2 -1 -1-1 2 -1-1 -1 3 -1

-1 1

abcd

a b c d

LF =

2 -1 -1-1 1-1 1

abcd

a b c d

LE\F =1 -1-1 2 -1

-1 1

abcd

a b c d

F

F

Subtree Preconditioners• Let L be the Laplacian of G=(V,E) (n = #vertices, m = #edges)

• Let T=(V,F) be a subtree of G• Consider using P=LF as a preconditioner• Useful Property: Solving linear systems in P takes

O(n) time, essentially by back-substitution

G T

Subtree Preconditioners• Let L be the Laplacian of G=(V,E) (n = #vertices, m = #edges)

• Let T=(V,F) be a subtree of G• Consider using P=LF as a preconditioner• Useful Property: Solving linear systems in P takes

O(n) time, essentially by back-substitution• Easy Fact: κ(L,P) · λmax(P+ L)• CG gives a solution with relative error ² in time

Low-Stretch Trees• Let L be the Laplacian of G=(V,E) (n = #vertices, m = #edges)

• Let T=(V,F) be a subtree of G and P=LF

• Lemma [Boman-Hendrickson ‘01]: λmax(P-1 L) · stretch(G,T),where

• Theorem [Alon-Karp-Peleg-West ‘91]:Every graph G has a tree T with stretch(G,T) = m1+o(1).

G Tu

v

u

v

distance fromu to v is 5




• Theorem [Alon-Karp-Peleg-West ‘91]:Every graph G has a tree T with stretch(G,T) = m1+o(1).

• Theorem [Abraham-Bartal-Neiman ‘08]:Every G has a tree T with stretch(G,T) · m log n (log log n)2.Moreover, T can be found in O(m log2 n) time.




• Theorem [Alon-Karp-Peleg-West ‘91]:Every graph G has a tree T with stretch(G,T) = m1+o(1)

and T can be found in O(m log2 n) time.• Corollary: Every Laplacian L has a preconditioner P s.t.– κ(L,P) · λmax(P-1 L) · m1+o(1)

– CG gives ²-approx solution in time

– Tighter analysis gives [Spielman-Woo ‘09]

Tool #2: Random Sampling

Data from L. Adamic, N. Glance. “The Political Blogosphere and the 2004 U.S. Election: Divided They Blog”, 2005.

Spectral Sparsifiers• Let LG be the Laplacian of G=(V,E) (n = #vertices, m = #edges)

• A spectral sparsifier of G is a (weighted) graph H=(V,F) s.t.• |F| is small• 1-² · λmin(LH

+ L) and λmax(LH+ L) · 1+²

• Useful notation: (1-²) LG ¹ LH ¹ (1+²) LG

• Consider the linear system LG x = b.Actual solution is x := LG

+ b.Instead, compute y := LH

+ b.

• Then y has low multiplicative error: ky-xkLG · 2² kxkLG

• Computing y is fast since H is sparse:conjugate gradient method takes O(n|F|) time

Spectral Sparsifiers• Let LG be the Laplacian of G=(V,E) (n = #vertices, m = #edges)

• A spectral sparsifier of G is a (weighted) graph H=(V,F) s.t.• |F| is small• (1-²) LG ¹ LH ¹ (1+²) LG

• Theorem [Spielman-Srivastava ‘08]: Every G has aspectral sparsifier H with |F| = O(n log n / ²2).Moreover, H can be constructed in O(m log3 n) time.

• Algorithm: Using CG, we get an ²-approx solutionto “Lx = b” in O(m log3 n + n2 log n/ ²2) time– Caveat: this algorithm uses circular logic. To construct the

sparsifier H, we need to solve several linear systems “Lx = b”.

Decomposing Laplacian into Edges• Let LG be the Laplacian of G=(V,E)

• For every e2E, let Le be the Laplacian of (V,{e})

• Then LG = e Le

a c

b

d LG =2 -1 -1-1 2 -1-1 -1 3 -1

-1 1

Lab

1 -1-1 1

1 -1-1 1

1 -1

-1 1

1 -1-1 1

Lac Lbc Lcd

• Let LG be the Laplacian of G=(V,E)

• For every e2E, let Le be the Laplacian of (V,{e})

• Then LG = e Le

• Sparsification:– Find coefficients we for every e2E

– Let H be the weighted graph where e has weight we

– So LH = e we Le

• Goals:• Most of the we are zero

• (1-²) LG ¹ LH ¹ (1+²) LG

Decomposing Laplacian into Edges

The General Problem:Sparsifying Sums of PSD Matrices

• General Problem: Given PSD matrices Le s.t. e Le = L, find coefficients we, mostly zero, such that (1-²) L ¹ e we Le ¹ (1+²) L

• Theorem: [Ahlswede-Winter ’02]Random sampling gives w with O( n log n/²2 ) non-zeros.

• Theorem: [de Carli Silva-Harvey-Sato ‘11],building on [Batson-Spielman-Srivastava ‘09]There exists w with O( n/²2 ) non-zeros.

Concentration Inequalities• Theorem: [Chernoff ‘52, Hoeffding ‘63]

Let Y1,…,Yk be i.i.d. random non-negative real numbers s.t. E[ Yi ] = Z and Yi·uZ. Then

• Theorem: [Ahlswede-Winter ‘02]Let Y1,…,Yk be i.i.d. random PSD nxn matricess.t. E[ Yi ] = Z and Yi¹uZ. Then

The only difference

Solving the General Problem• General Problem: Given PSD matrices Le s.t. e Le = L,

find coefficients we, mostly zero, such that (1-²) L ¹ e we Le ¹ (1+²) L

• AW Theorem: Let Y1,…,Yk be i.i.d. random PSD matricessuch that E[ Yi ] = Z and Yi¹uZ. Then

• To solve General Problem with O(n log n/²2) non-zeros• Repeat k:=£(n log n /²2) times• Pick an edge e with probability pe := Tr(Le L+) / n• Increment we by 1/k¢pe

Main Caveat: Samplingprobabilities are hard to compute.

Low-Stretch Trees+ Random Sampling

+ Recursion

= Nearly-Optimal Algorithm

Obstacles encountered so far1. Low-stretch trees are easy to compute but only

give preconditioners with κ ¼ m• Low-stretch tree: fast, low-quality preconditioner

2. Sparsifiers give preconditioners with κ ¼ 1+², but they are harder to compute• Sparsifier: slow, high-quality preconditioner

3. Using a sparsifier H as a preconditioner is not very efficient because solving LH

+ x = b is slow• Sparsifier: slow to construct and slow to use

Bypassing the Obstacles• Idea #1: Get a “medium-quality” preconditioner by

combining the low- and high-quality preconditioners.• Specifically, do random sampling according to stretch

• Intuition: The path is a bad approximation to the cycle

G

T

Κ(LG,LT) = n

High condition numbercaused by missing edge,which has high stretch.

Bypassing the Obstacles• Idea #1: Get a “medium-quality” preconditioner by

combining the low- and high-quality preconditioners.• Specifically, do random sampling according to stretch• Compute a low-stretch tree T. For every edge uv, set

• Construct sparsifier H by making O(m / log2 n) samples, where e is sampled with probability pe.Then add log4 n copies of T to H.

• Theorem [Koutis, Miller, Peng FOCS’10]:• H has at most O(m / log2 n) edges• LG ¹ LH ¹ log4 n LG

G

H1

H2Solving LH2 x = b - Output ²-approx solution to LH2 x = b

…

LG ¹ LH1 ¹ log4 n LG

LH1 ¹ LH2 ¹ log4 n LH1


• Idea #2 [Joshi ‘97, Spielman-Teng ‘04]: To solve linear systems in the sparsifier “LH x = b”, use recursion.

Solving LG x = b - Construct sparsifier H1

- Recursively solve LH1 x = b - Use Chebyshev iterations to improve x - Output ²-approx solution to LG x = b

Solving LH1 x = b - Construct sparsifier H2

- Recursively solve LH2 x = b - Use Chebyshev iterations to improve x - Output ²-approx solution to LH1 x = b

H2

…

LG ¹ LH1 ¹ log4 n LG



Sketch of Analysis: - Few Chebyshev iterations because Hi+1 isa good approximation of Hi. - Few levels of recursion because Hi+1 isa constant factor smaller than Hi.

G

H1

• Idea #2 [Joshi ‘97, Spielman-Teng ‘04]: To solve linear systems in the sparsifier “LH x = b”, use recursion.

Solving LG x = b - Construct sparsifier H1

- Recursively solve LH1 x = b - Use Chebyshev iterations to improve x - Output ²-approx solution to LG x = b

Solving LH1 x = b - Construct sparsifier H2

- Recursively solve LH2 x = b - Use Chebyshev iterations to improve x - Output ²-approx solution to LH1 x = b

Conclusion• Let A be a symmetric, diagonally dominant matrix

of size nxn with m non-zero entries.• There is an algorithm to solve “Ax = b” with relative

error ² in O(m log n (log log n)2 log(1/²)) time.• Ingredients: Low-stretch trees, concentration of

random matrices

• Parallelization?• Practical implementation?• Numerical stability?• Arbitrary PSD matrices?

Open Questions

Documents

Solving Laplacian Systems: Some Contributions from Theoretical Computer Science