
Page 1

Low-Rank Tensor Techniques for High-Dimensional Problems

Daniel Kressner
CADMOS Chair for Numerical Algorithms and HPC
MATHICSE, EPFL

Page 2

Contents
• What is a tensor?
• Applications
• Matrices and low rank
• CP and Tucker
• Hierarchical Tucker
• Algorithms based on low-rank tensors
• Conclusions

Page 3

What is a tensor?
• Vectors, matrices, and tensors
• Basic calculus with tensors
• Vectorization and matricization
• µ-mode matrix products
• Two classes of tensor problems

Page 4

Vectors, matrices, and tensors

[Figure: a vector, a matrix, and a third-order tensor]

• scalar = tensor of order 0
• (column) vector = tensor of order 1
• matrix = tensor of order 2
• tensor of order 3 = n1·n2·n3 numbers arranged in an n1 × n2 × n3 array

Page 5

Tensors of arbitrary order
A d-th order tensor X of size n1 × n2 × ··· × nd is a d-dimensional array with entries

X_{i1,i2,...,id},  iµ ∈ {1, ..., nµ} for µ = 1, ..., d.

In the following, the entries of X are real (for simplicity):

X ∈ R^{n1×n2×···×nd}.

Multi-index notation:

I = {1, ..., n1} × {1, ..., n2} × ··· × {1, ..., nd}.

Then i ∈ I is a tuple of d indices:

i = (i1, i2, ..., id).

This allows us to write the entries of X as X_i for i ∈ I.

Page 6

Two important points
1. A matrix A ∈ R^{m×n} has a natural interpretation as a linear operator in terms of matrix-vector multiplications:

   A : R^n → R^m,  A : x ↦ A·x.

   There is no such (unique and natural) interpretation for tensors! ⇒ fundamental difficulty to define a meaningful general notion of eigenvalues and singular values of tensors.

2. The number of entries in a tensor grows exponentially with d ⇒ curse of dimensionality.

   Example: A tensor of order 30 with n1 = n2 = ··· = nd = 10 has 10^30 entries = 8 × 10^12 exabytes of storage!¹

   For d ≫ 1: Cannot afford to store the tensor explicitly (in terms of its entries).

¹Global data storage calculated at 295 exabytes, see http://www.bbc.co.uk/news/technology-12419672.

Page 7

Basic calculus
• Addition of two equal-sized tensors X, Y:

  Z = X + Y  ⇔  Z_i = X_i + Y_i  ∀ i ∈ I.

• Scalar multiplication with α ∈ R:

  Z = αX  ⇔  Z_i = αX_i  ∀ i ∈ I.

  ⇒ vector space structure.

• Inner product of two equal-sized tensors X, Y:

  〈X, Y〉 := Σ_{i∈I} X_i Y_i.

  Induced norm:

  ‖X‖ := ( Σ_{i∈I} X_i² )^{1/2}.

  For a 2nd order tensor (= matrix) this corresponds to the Frobenius norm.

Page 8

Vectorization
A tensor X of size n1 × n2 × ··· × nd has n1·n2···nd entries ⇒ many ways to stack the entries into a (loooong) column vector.
One possible choice:
The vectorization of X is denoted by vec(X), where

vec : R^{n1×n2×···×nd} → R^{n1·n2···nd}

stacks the entries of a tensor in reverse lexicographical order into a long column vector.

Remark: For d = 2, this is the usual way matrices are vectorized:

A = [a11 a12; a21 a22; a31 a32]  ⇒  vec(A) = (a11, a21, a31, a12, a22, a32)^T.

Page 9

Vectorization
Example: d = 3, n1 = 3, n2 = 2, n3 = 3 (first index runs fastest, as in the matrix case):

vec(X) = (x111, x211, x311, x121, x221, x321, x112, ..., x321, x322, x323... wait, ending with x321? No: ..., x123, x223, x323)^T,

i.e., vec(X) = (x111, x211, x311, x121, x221, x321, ..., x123, x223, x323)^T.
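A minimal MATLAB sketch (not part of the slides; sizes chosen for illustration) showing that vec coincides with a column-major reshape:

% vec(X) as a column-major reshape (MATLAB stores the first index fastest).
X = reshape(1:18, [3 2 3]);    % a 3 x 2 x 3 tensor with entries 1,...,18
v = reshape(X, [], 1);         % vec(X), a column vector of length 18
% For d = 2 this is the usual vectorization of a matrix:
A = [1 4; 2 5; 3 6];
isequal(reshape(A, [], 1), (1:6)')   % returns true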

Page 10

Matricization
• A matrix has two modes (column mode and row mode).
• A d-th order tensor X has d modes (µ = 1, µ = 2, ..., µ = d).

Let us fix all but one mode, e.g., µ = 1: Then

X(:, i2, i3, ..., id)   (abuse of MATLAB notation)

is a vector of length n1 for each choice of i2, ..., id.

View the tensor X as a bunch of column vectors:

Page 11

Matricization
Stack the vectors into an n1 × (n2···nd) matrix:

X ∈ R^{n1×n2×···×nd}  ⇒  X^(1) ∈ R^{n1×(n2·n3···nd)}

For µ = 1, ..., d, the µ-mode matricization of X is a matrix

X^(µ) ∈ R^{nµ×(n1···nµ−1·nµ+1···nd)}

with entries

(X^(µ))_{iµ, (i1,...,iµ−1,iµ+1,...,id)} = X_i  ∀ i ∈ I.

Page 12

Matricization
In MATLAB: a = rand(2,3,4,5);

• 1-mode matricization:
  reshape(a,2,3*4*5)
• 2-mode matricization:
  b = permute(a,[2 1 3 4]); reshape(b,3,2*4*5)
• 3-mode matricization:
  b = permute(a,[3 1 2 4]); reshape(b,4,2*3*5)
• 4-mode matricization:
  b = permute(a,[4 1 2 3]); reshape(b,5,2*3*4)

For a matrix A ∈ R^{n1×n2}:

A^(1) = A,  A^(2) = A^T.
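The four cases above generalize to the following small helper (a sketch, not part of any toolbox; save as matricize.m):

function Xmu = matricize(X, mu)
% mu-mode matricization X^(mu) via permute/reshape.
  sz = size(X);
  d  = ndims(X);
  Xmu = reshape(permute(X, [mu, 1:mu-1, mu+1:d]), sz(mu), []);
end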

Page 13

µ-mode matrix products
Consider the 1-mode matricization X^(1) ∈ R^{n1×(n2···nd)}:

It seems natural to multiply an m × n1 matrix A from the left:

Y^(1) := A X^(1) ∈ R^{m×(n2···nd)}.

We can rearrange Y^(1) back into an m × n2 × ··· × nd tensor Y.
This is called the 1-mode matrix product:

Y = A ∘1 X  ⇔  Y^(1) = A X^(1).

More formally (and more ugly):

Y_{i1,i2,...,id} = Σ_{k=1}^{n1} a_{i1,k} X_{k,i2,...,id}.

Page 14

µ-mode matrix products
General definition of the µ-mode matrix product with A ∈ R^{m×nµ}:

Y = A ∘µ X  ⇔  Y^(µ) = A X^(µ).

More formally (and more ugly):

Y_{i1,i2,...,id} = Σ_{k=1}^{nµ} a_{iµ,k} X_{i1,...,iµ−1,k,iµ+1,...,id}.

For matrices:
• 1-mode multiplication = multiplication from the left:
  Y = A ∘1 X = A X.
• 2-mode multiplication = transposed multiplication from the right:
  Y = A ∘2 X = X A^T.
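Using the matricize helper from above, the µ-mode matrix product can be sketched in a few lines of MATLAB (save as mmode_mult.m; all names are made up):

function Y = mmode_mult(A, X, mu)
% Apply A in mode mu of X, i.e., Y^(mu) = A * X^(mu).
  sz = size(X);
  d  = ndims(X);
  Ymu = A * matricize(X, mu);
  sz(mu) = size(A, 1);
  perm = [mu, 1:mu-1, mu+1:d];
  Y = ipermute(reshape(Ymu, sz(perm)), perm);
end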

Page 15

Kronecker product
For an m × n matrix A and a k × ℓ matrix B, the Kronecker product is defined as

B ⊗ A := [b11·A ··· b1ℓ·A; ...; bk1·A ··· bkℓ·A] ∈ R^{km×ℓn}.

Most important properties (for our purposes):
1. vec(A X) = (I ⊗ A) vec(X).
2. vec(X A^T) = (A ⊗ I) vec(X).
3. (B ⊗ A)(D ⊗ C) = (BD ⊗ AC).
4. I_m ⊗ I_n = I_{mn}.
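A quick numerical check of property 1 in MATLAB (a sketch with random data):

A = randn(3, 3); X = randn(3, 4);
norm(reshape(A*X, [], 1) - kron(eye(4), A) * reshape(X, [], 1))
% returns ~1e-15: vec(A*X) = (I (x) A) vec(X)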

Page 16

µ-mode matrix products and vectorization
By definition,

vec(X) = vec(X^(1)).

Consequently, also

vec(A ∘1 X) = vec(A X^(1)).

Vectorized version of the 1-mode matrix product:

vec(A ∘1 X) = (I_{n2···nd} ⊗ A) vec(X)
            = (I_{nd} ⊗ ··· ⊗ I_{n2} ⊗ A) vec(X).

Relation between µ-mode matrix product and matrix-vector product:

vec(A ∘µ X) = (I_{nd} ⊗ ··· ⊗ I_{nµ+1} ⊗ A ⊗ I_{nµ−1} ⊗ ··· ⊗ I_{n1}) vec(X).
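A check of this general relation, reusing mmode_mult from above (sizes chosen arbitrarily):

X = randn(3, 4, 5); A = randn(6, 4);            % apply A in mode mu = 2
lhs = reshape(mmode_mult(A, X, 2), [], 1);
rhs = kron(speye(5), kron(A, speye(3))) * reshape(X, [], 1);
norm(lhs - rhs)                                 % ~1e-15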

Page 17

Two classes of tensor problems
Class 1: function-related tensors
Consider a function u(ξ1, ..., ξd) ∈ R in d variables ξ1, ..., ξd.
A tensor U ∈ R^{n1×···×nd} represents a discretization of u:
• U contains function values of u evaluated on a grid; or
• U contains coefficients of a truncated expansion in tensorized basis functions:

  u(ξ1, ..., ξd) ≈ Σ_{i∈I} U_i φ_{i1}(ξ1) φ_{i2}(ξ2) ··· φ_{id}(ξd).

Typical setting:
• U only given implicitly, e.g., as the solution of a discretized PDE;
• approximations to U are sought with very low storage and tolerable accuracy;
• d may become very large.

The focus of this lecture is on function-related tensors!

Page 18

Discretization of a function in d variables
ξ1, ..., ξd ∈ [0, 1] ⇒ the number of function values grows exponentially with d.

Page 19

Separability helps
Ideal situation: the function f is separable:

f(ξ1, ξ2, ..., ξd) = f1(ξ1) f2(ξ2) ··· fd(ξd)

⇒ the discretized f is a Kronecker product of the discretized factors fj:
O(n^d) memory reduces to O(dn) memory.

Of course: Exact separability is rarely satisfied in practice.

Page 20

Two classes of tensor problems
Class 2: data-related tensors
A tensor U ∈ R^{n1×···×nd} contains multi-dimensional data.

Example 1: U_{2011,3,2} denotes the number of papers published in 2011 by author 3 in mathematical journal 2.

Example 2: A video of 1000 frames with resolution 640 × 480 can be viewed as a 640 × 480 × 1000 tensor.

Typical setting:
• entries of U given explicitly (at least partially);
• extraction of dominant features from U;
• usually moderate values of d.

Page 21

Summary
• A tensor X ∈ R^{n1×···×nd} is a d-dimensional array.
• There are various ways of reshaping the entries of a tensor X into a vector or matrix.
• µ-mode matrix multiplication can be expressed with Kronecker products.

Further reading:
• T. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Rev. 51 (2009), no. 3, 455–500.

Software:
• MATLAB offers basic functionality to work with d-dimensional arrays.
• MATLAB Tensor Toolbox: http://www.csmr.ca.sandia.gov/~tgkolda/TensorToolbox/

Page 22

Applications in scientific computing
• High-dimensional elliptic PDEs
• High-dimensional PDE-eigenvalue problems
• Quantum many-body problems
• Stochastic Automata Networks
• further applications

Page 23

High-dimensional elliptic PDEs: 3D model problem
• Consider

  −∆u = f in Ω,  u|_∂Ω = 0,

  on the unit cube Ω = [0,1]³.
• Discretize on a tensor grid. Uniform grid for simplicity:

  ξµ^(j) = jh,  h = 1/(n+1),  for µ = 1, 2, 3.

• Approximate solution tensor U ∈ R^{n×n×n}:

  U_{i1,i2,i3} ≈ u(ξ1^(i1), ξ2^(i2), ξ3^(i3)).

Page 24

High-dimensional elliptic PDEs: 3D model problem
• Discretization of the 1D Laplace operator:

  −∂xx ≈ tridiag(−1, 2, −1) =: A,

  the n × n tridiagonal matrix with 2 on the diagonal and −1 on the sub- and superdiagonals.
• Application in each coordinate direction:

  −∂²u/∂ξ1² ≈ A ∘1 U,  −∂²u/∂ξ2² ≈ A ∘2 U,  −∂²u/∂ξ3² ≈ A ∘3 U.

• Hence,

  −∆u ≈ A ∘1 U + A ∘2 U + A ∘3 U,

  or in vectorized form with u = vec(U):

  −∆u ≈ (I ⊗ I ⊗ A + I ⊗ A ⊗ I + A ⊗ I ⊗ I) u.

Page 25

High-dimensional elliptic PDEs: 3D model problem
Finite difference discretization of the model problem

−∆u = f in Ω,  u|_∂Ω = 0

for Ω = [0,1]³ takes the form

(I ⊗ I ⊗ A + I ⊗ A ⊗ I + A ⊗ I ⊗ I) u = f.

Similar structure for a finite element discretization with tensorized finite elements:

V ⊗ W ⊗ Z = { Σ_{ijk} α_{ijk} v_i(ξ1) w_j(ξ2) z_k(ξ3) : α_{ijk} ∈ R }

with

V = span{v1(ξ1), ..., vn(ξ1)},  W = span{w1(ξ2), ..., wn(ξ2)},  Z = span{z1(ξ3), ..., zn(ξ3)}.

Galerkin discretization:

(K_V ⊗ M_W ⊗ M_Z + M_V ⊗ K_W ⊗ M_Z + M_V ⊗ M_W ⊗ K_Z) u = f,

with 1D mass/stiffness matrices M_V, M_W, M_Z, K_V, K_W, K_Z.

Page 26

High-dimensional elliptic PDEs: Arbitrary dimensions
Finite difference discretization of the model problem

−∆u = f in Ω,  u|_∂Ω = 0

for Ω = [0,1]^d takes the form

( Σ_{j=1}^{d} I ⊗ ··· ⊗ I ⊗ A ⊗ I ⊗ ··· ⊗ I ) u = f.

To obtain such a Kronecker structure in general, one needs:
• a tensorized domain;
• a highly structured grid;
• coefficients that can be written/approximated as a sum of separable functions.

A minimal MATLAB sketch assembling this Kronecker sum is given below.
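The following sketch forms the matrix explicitly for small n and d only; in practice it must never be assembled for large d:

n = 10; d = 3;
e = ones(n, 1);
A = spdiags([-e 2*e -e], -1:1, n, n);   % 1D Laplace (scaling by 1/h^2 omitted)
L = sparse(n^d, n^d);
for j = 1:d
  L = L + kron(kron(speye(n^(d-j)), A), speye(n^(j-1)));
end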

Page 27

High-dimensional PDE-eigenvalue problems
PDE-eigenvalue problem:

∆u(ξ) + V(ξ) u(ξ) = λ u(ξ) in Ω = [0,1]^d,  u(ξ) = 0 on ∂Ω.

Assumption: The potential is represented as

V(ξ) = Σ_{j=1}^{s} V_j^(1)(ξ1) V_j^(2)(ξ2) ··· V_j^(d)(ξd).

⇒ finite difference discretization

A u = (A_L + A_V) u = λ u,

with

A_L = Σ_{j=1}^{d} I ⊗ ··· ⊗ I (d−j times) ⊗ A_L ⊗ I ⊗ ··· ⊗ I (j−1 times),

A_V = Σ_{j=1}^{s} A_{V,j}^(d) ⊗ ··· ⊗ A_{V,j}^(2) ⊗ A_{V,j}^(1).

Page 28

Quantum many-body problems
• spin-1/2 particles: proton, neutron, electron, and quark
• two states: spin-up, spin-down
• quantum state of each spin represented by a vector in C² (spinor)
• quantum state of a system of d spins represented by a vector in C^{2^d}
• quantum mechanical operators expressed in terms of Pauli matrices

  P_x = [0 1; 1 0],  P_y = [0 −i; i 0],  P_z = [1 0; 0 −1].

• spin Hamiltonian: sum of Kronecker products of Pauli matrices and identities; each term describes a physical (inter)action of spins
• interaction of spins described by a graph
• Goal: Compute the ground state of the spin Hamiltonian.

Page 29

Quantum many-body problems
Example: 1D chain of 5 spins with periodic boundary conditions:

1 — 2 — 3 — 4 — 5 (— 1)

Hamiltonian describing pairwise interaction between nearest neighbors:

H = P_z ⊗ P_z ⊗ I ⊗ I ⊗ I
  + I ⊗ P_z ⊗ P_z ⊗ I ⊗ I
  + I ⊗ I ⊗ P_z ⊗ P_z ⊗ I
  + I ⊗ I ⊗ I ⊗ P_z ⊗ P_z
  + P_z ⊗ I ⊗ I ⊗ I ⊗ P_z

Page 30

Quantum many-body problems
• Ising (ZZ) model for a 1D chain of d spins with open boundary conditions:

  H = Σ_{k=1}^{d−1} I ⊗ ··· ⊗ I ⊗ P_z ⊗ P_z ⊗ I ⊗ ··· ⊗ I
    + λ Σ_{k=1}^{d} I ⊗ ··· ⊗ I ⊗ P_x ⊗ I ⊗ ··· ⊗ I,

  where λ = ratio between the strength of the magnetic field and the pairwise interactions.
• 1D Heisenberg (XY) model
• Current research: 2D models.
• More details in:
  Huckle/Waldherr/Schulte-Herbrüggen: Computations in Quantum Tensor Networks.
  Schollwöck: The density-matrix renormalization group in the age of matrix product states.

Page 31

Stochastic Automata Networks (SANs)
• 3 stochastic automata A1, A2, A3 having 3 states each.
• Vector x_t^(i) ∈ R³ describes the probabilities of states (1), (2), (3) in A_i at time t.
• No coupling between automata ⇒ local transition x_t^(i) ↦ x_{t+1}^(i) described by a Markov chain:

  x_{t+1}^(i) = E_i x_t^(i),

  with a stochastic matrix E_i.
• Stationary distribution of A_i = Perron vector of E_i (eigenvector for eigenvalue 1).

Page 32

Stochastic Automata Networks (SANs)
• 3 stochastic automata A1, A2, A3 having 3 states each.
• Coupling between automata ⇒ local transition x_t^(i) ↦ x_{t+1}^(i) no longer described by a Markov chain on each automaton alone.
• Need to consider all possible combinations of states in (A1, A2, A3):

  (1,1,1), (1,1,2), (1,1,3), (1,2,1), (1,2,2), ....

• Vector x_t ∈ R^{3³} (or tensor X^(t) ∈ R^{3×3×3}) describes the probabilities of the combined states.

Page 33

Stochastic Automata Networks (SANs)
• Transition x_t ↦ x_{t+1} described by a Markov chain:

  x_{t+1} = E x_t,

  with a large stochastic matrix E.
• Oversimplified example:

  E = I ⊗ I ⊗ E1 + I ⊗ E2 ⊗ I + E3 ⊗ I ⊗ I   (local transitions)
    + I ⊗ E21 ⊗ E12   (interaction between A1, A2)
    + E32 ⊗ E23 ⊗ I   (interaction between A2, A3).

• Goal: Compute the stationary distribution = Perron vector of E (see the sketch below).
• More details in:
  Stewart: Introduction to the Numerical Solution of Markov Chains. Chapter 9.
  Buchholz: Product Form Approximations for Communicating Markov Processes.
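As a sketch, the Perron vector can be computed with eigs; the matrix E below is a random column-stochastic stand-in, not an actual SAN transition matrix:

E = rand(27); E = E ./ sum(E, 1);   % column-stochastic stand-in
[v, ~] = eigs(E, 1);                % eigenvector for the dominant eigenvalue 1
v = v / sum(v);                     % normalized stationary distribution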

Page 34

Further applications
Other applications in scientific computing featuring low-rank tensor concepts:
• Boltzmann equation [Ibragimov/Rjasanow'2009]
• Dynamical systems [Koch/Lubich'2009]
• Parabolic PDEs [Andreev/Tobler'2011], [Khoromskij'2009]
• Stochastic PDEs [Khoromskij/Schwab'2010], [Matthies/Zander'2011], [Kressner/Tobler'2011], [Ballani/Grasedyck/Kluge'2011], ...
• Electronic structure calculation [Chinnamsetty et al.'2007], [Flad et al.'2009], [Khoromskij/Khoromskaja'2009], [Limpanuparb/Gill'2009], [Benedikt et al.'2011], [Mohlenkamp'2011], ...
• Evaluation of boundary integrals (in BEM): [Grasedyck], [Khoromskij/Sauter/Veit'2011]
• ...

Page 35

Summary
• Large diversity of applications leading to linear systems / eigenvalue problems with Kronecker product structure.
• For many problems of practical interest: explicit storage / computation of the solution is infeasible.
• Increasing use of low-rank tensor techniques. Heaviest use currently: DMRG for quantum many-body problems.
• Remark: For PDE-related applications, high dimensionality can also be addressed during the discretization phase (sparse grids, adaptive sparse discretization, ...). This has advantages and disadvantages.

Page 36

Approximate low-rank matrices
• Singular value decomposition
• Separability and low rank
• Separability by polynomial interpolation
• Separability by exponential sums
• Low rank of snapshot matrices

Page 37

Low-rank approximation
Setting: Matrix X ∈ R^{n×m}, with m and n too large to compute/store X explicitly.
Idea: Replace X by R S^T with R ∈ R^{n×r}, S ∈ R^{m×r} and r ≪ m, n.

          X            R S^T
Memory    nm           nr + rm
Cost      ops(m,n)     ops(m,n) × r/min{m,n}  (?)

Best approximation error:

min{ ‖X − R S^T‖₂ : R ∈ R^{n×r}, S ∈ R^{m×r} } = σ_{r+1},

with the singular values σ1 ≥ σ2 ≥ ··· ≥ σ_{min{m,n}} of X.

Page 38

Construction from the singular value decomposition
SVD: Let X ∈ R^{n×m} and k = min{m,n}. Then there exist orthonormal matrices

U = [u1, u2, ..., uk] ∈ R^{n×k},  V = [v1, v2, ..., vk] ∈ R^{m×k},

such that

X = U Σ V^T,  Σ = diag(σ1, σ2, ..., σk).

Choose r ≤ k and partition

X = [U1, U2] [Σ1 0; 0 Σ2] [V1, V2]^T = U1 Σ1 V1^T + U2 Σ2 V2^T =: R S^T + U2 Σ2 V2^T,

with R := U1 Σ1 and S := V1. Then

‖X − R S^T‖₂ = ‖Σ2‖₂ = σ_{r+1}.

⇒ Good low-rank approximation if the singular values decay sufficiently fast.

Also: span(X) ≈ span(R), span(X^T) ≈ span(S).
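The truncation and the error identity are easy to check in MATLAB (a sketch with random data):

X = randn(100, 80); r = 10;
[U, S, V] = svd(X, 'econ');
R  = U(:, 1:r) * S(1:r, 1:r);      % R = U1 * Sigma1
St = V(:, 1:r);                    % S, so that R*St' is the rank-r approximation
norm(X - R*St') - S(r+1, r+1)      % ~0: the 2-norm error equals sigma_{r+1}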

Page 39

Discretization of a bivariate function
• Bivariate function f(x, y) : [xmin, xmax] × [ymin, ymax] → R.
• Function values on the tensor grid [x1, ..., xm] × [y1, ..., yn]:

  F = [f(x1,y1) ··· f(x1,yn); ...; f(xm,y1) ··· f(xm,yn)] ∈ R^{m×n}.

Basic but crucial observation: If f(x, y) = g(x) h(y), then

F = [g(x1)h(y1) ··· g(x1)h(yn); ...; g(xm)h(y1) ··· g(xm)h(yn)]
  = [g(x1); ...; g(xm)] · [h(y1) ··· h(yn)].

⇒ Separability implies rank 1.
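A three-line MATLAB illustration (the separable f is chosen arbitrarily):

x = linspace(0, 1, 50)'; y = linspace(0, 1, 60)';
F = exp(x) * sin(pi*y)';   % f(x,y) = e^x * sin(pi*y) is separable
rank(F)                    % returns 1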

Page 40

Separability and low rank
Approximation by a sum of separable functions:

f(x, y) = g1(x)h1(y) + ··· + gr(x)hr(y) + error,  with fr(x, y) := g1(x)h1(y) + ··· + gr(x)hr(y).

Define

Fr = [fr(x1,y1) ··· fr(x1,yn); ...; fr(xm,y1) ··· fr(xm,yn)].

Then Fr has rank ≤ r and ‖F − Fr‖_F ≤ √(mn) × error. Hence

σ_{r+1}(F) ≤ ‖F − Fr‖₂ ≤ ‖F − Fr‖_F ≤ √(mn) × error.

⇒ Semi-separable approximation implies low-rank approximation.

Page 41

Semi-separable approximation by polynomials
Solving the approximation problem

f(x, y) = g1(x)h1(y) + ··· + gr(x)hr(y) + error

is not trivial; gj, hj can be chosen arbitrarily!

General construction by polynomial interpolation:
1. Lagrange interpolation of f(x, y) in the y-coordinate:

   I_y[f](x, y) = Σ_{j=1}^{r} f(x, θj) L_j(y)

   with Lagrange polynomials L_j of degree r − 1 on [ymin, ymax].

2. Interpolation of I_y[f] in the x-coordinate:

   I_x[I_y[f]](x, y) = Σ_{i,j=1}^{r} f(ξi, θj) L_i(x) L_j(y) = Σ_{i=1}^{r} L̃_{i,x}(x) L̃_{i,y}(y),

   where the matrix [f(ξi, θj)]_{i,j} is "diagonalized" by an SVD.

Page 42

Semi-separable approximation by polynomials

error ≤ ‖f − I_x[I_y[f]]‖_∞
      = ‖f − I_x[f] + I_x[f] − I_x[I_y[f]]‖_∞
      ≤ ‖f − I_x[f]‖_∞ + ‖I_x‖_∞ ‖f − I_y[f]‖_∞,

with Lebesgue constant ‖I_x‖_∞ ∼ log r when using Chebyshev interpolation nodes.

⇒ This polynomial interpolation bound is typically much too pessimistic.

• Lebesgue constants hit hard in high dimensions: (log r)^{d−1}.
• Severe theoretical barriers for general smooth multivariate functions:
  E. Novak and H. Wozniakowski: Tractability of Multivariate Problems, Volume I and II. EMS.

Page 43

Semi-separable approximation of 1/(x + y)
Consider

f(x, y) = 1/(x + y),  x, y ∈ [α, β],  0 < α < β.

Apply numerical quadrature to the integral representation:

1/z = ∫₀^∞ e^{−tz} dt = Σ_{j=1}^{r} ωj e^{−γj z} + error.

Inserting z = x + y:

1/(x + y) = Σ_{j=1}^{r} ωj e^{−γj(x+y)} + error = Σ_{j=1}^{r} ωj e^{−γj x} e^{−γj y} + error.

Choosing the nodes γj > 0 and weights ωj > 0 as in [Stenger'93, Braess'86, Braess/Hackbusch'05] yields

error ≤ (8/|α|) exp( −r π² / log(8β/α) ).
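The implied singular value decay is easy to observe numerically (a sketch; the grid is arbitrary):

x = linspace(1, 10, 200); y = linspace(1, 10, 200);
F = 1 ./ (x' + y);        % samples of f(x,y) = 1/(x+y) on a tensor grid
semilogy(svd(F), '.')     % near-straight line: exponential decay, as the bound predicts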

Page 44

Semi-separable approximation by exponential sums
• Consider the more general case of a function f(x, y) := g(x + y).
• Approximation of g(z) with z := x + y by an exponential sum

  g(z) ≈ Σ_{j=1}^{r} ωj exp(γj z)   (1)

  for some coefficients γj, ωj ∈ R.
• (1) gives a semi-separable approximation for f:

  f(x, y) = g(x + y) ≈ Σ_{j=1}^{r} ωj exp(γj(x + y)) = Σ_{j=1}^{r} ωj exp(γj x) exp(γj y).

• Naturally extends to arbitrarily many variables.
• Problem: (1) is a nontrivial approximation problem [Braess'1986], [Hackbusch'2006], ...

Page 45

Low-rank approximation of snapshot matrices
Vector-valued function

x(α) : [αmin, αmax] → R^n.

Sampling at α1, ..., αm ∈ [αmin, αmax] gives the snapshot matrix

X = [x(α1), x(α2), ..., x(αm)].

Page 46

Example: Baking 1 cookie
Stationary heat equation with piecewise constant heat conductivity σ(x, α):

−∇·(σ(x, α)∇u) = f in Ω = [−1,1]²,  u = 0 on ∂Ω,

• σ(baking tray) = 1
• σ(cookie) = 1 + α
• Undetermined parameter α ∈ [αmin, αmax].

[Figure: FE mesh — # Vertices: 455, # Elements: 825, # Edges: 1279]

Standard FE discretization results in the linearly parameter-dependent linear system

(A0 + α A1) x(α) = b.

Page 47

Singular value decay – observation
• 1 cookie: n = 371, m = 101.

[Figure: log10 of the singular values of the snapshot matrix; rapid decay over many orders of magnitude]

• Foundation of Proper Orthogonal Decomposition and Reduced Basis Methods.

Page 48

Singular value decay – explanation
Polynomial approximation:

x(α) = x0 + α x1 + α² x2 + ··· + α^{k−1} x_{k−1} + error.

Approximation error:
• Assume b(·), A(·) analytic ⇒ x(·) analytic.
• Then

  error ≲ ρ^{−k},

  where ρ > 1 depends on the domain of analyticity of A, b.
  (Proof: Direct extension of a classical result for scalar-valued functions.)

Page 49

Singular value decay – explanation
Polynomial approximation:

x(α) = x0 + α x1 + α² x2 + ··· + α^{k−1} x_{k−1} + error.

Snapshot matrix:

X = [x(α1), x(α2), ..., x(αm)]
  = [x0, x1, ..., x_{k−1}] · [1 1 ··· 1; α1 α2 ··· αm; ...; α1^{k−1} α2^{k−1} ··· αm^{k−1}] + error
  = matrix of rank k + error

⇒ σ_{k+1}(X) ≤ error ≲ ρ^{−k}.

Remark: Trivially extends to the piecewise analytic case.

Page 50

Singular value decay – piecewise analytic case
Example: Consider the smallest singular value σ(z) and the corresponding right singular vector v(z) of B(z) = A − izI for z ∈ [−1, 1].

• σ(z) is only Lipschitz continuous, but piecewise analytic.
• v(z) is discontinuous, but piecewise analytic.
• A = 2 × 2 block diagonal randn, n = 400.
• Snapshot matrix of singular vectors:

  X = [v(z1), v(z2), ..., v(z100)]

  for equidistant samples zj ∈ [−1, 1].

[Figures: σ(z) over z ∈ [−1, 1]; singular values of X, again showing rapid decay]

Page 51

Summary

Need strong singular value decay for good low-rank approximations.

For function-related matrices/tensors: strong link to semi-separable approximations.

Smoothness seems to be important... at least somehow.
• Fortunately, smoothness is not necessary. Piecewise smoothness can be enough.
• Unfortunately, smoothness is not sufficient for higher-order tensors.
• Need to impose stronger regularity as the dimension/order d increases, based, e.g., on mixed weak derivatives [Yserentant: Regularity and approximability of electronic wave functions. 2010].

Page 52

Low-rank tensors: CP and Tucker
• CP
• Tucker
• Higher-order SVD
• Tensor networks

Page 53

CP decomposition
• Aim: Generalize the concept of low rank from matrices to tensors.
• One possibility, motivated by

  X = [a1, a2, ..., aR][b1, b2, ..., bR]^T = a1 b1^T + a2 b2^T + ··· + aR bR^T,

  and its vectorization

  vec(X) = b1 ⊗ a1 + b2 ⊗ a2 + ··· + bR ⊗ aR.

The Canonical Polyadic (CP) decomposition of a tensor X ∈ R^{n1×n2×n3} is defined via

vec(X) = c1 ⊗ b1 ⊗ a1 + c2 ⊗ b2 ⊗ a2 + ··· + cR ⊗ bR ⊗ aR

for vectors aj ∈ R^{n1}, bj ∈ R^{n2}, cj ∈ R^{n3}.

CP directly corresponds to semi-separable approximation.
Tensor rank of X = minimal possible R. (A small MATLAB sketch follows below.)
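A sketch forming a rank-R CP tensor from random factor vectors (all names made up; small sizes only, since the full array is formed):

n = [4 5 6]; R = 3;
Afac = randn(n(1), R); Bfac = randn(n(2), R); Cfac = randn(n(3), R);
vecX = zeros(prod(n), 1);
for j = 1:R
  vecX = vecX + kron(Cfac(:, j), kron(Bfac(:, j), Afac(:, j)));
end
X = reshape(vecX, n);   % the CP tensor as a full array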

Page 54

CP decomposition
Illustration of the CP decomposition

vec(X) = c1 ⊗ b1 ⊗ a1 + c2 ⊗ b2 ⊗ a2 + ··· + cR ⊗ bR ⊗ aR.

[Figure: X as a sum of R rank-1 tensors with factor vectors a_r, b_r, c_r]

Page 55

CP decomposition
• The CP decomposition offers low data-complexity; for constant R: linear complexity in d.
• For matrices:
  • rank ≤ r is upper semi-continuous ⇒ closedness property: a sequence of rank-r matrices can only converge to a matrix of rank ≤ r.
  • best low-rank approximation possible by successive rank-1 approximations.
  • Robust black-box algorithms/software available (svd, Lanczos).

For tensors of order d ≥ 3:
• tensor rank R is not upper semi-continuous ⇒ lack of closedness
• successive rank-1 approximations fail
• all algorithms based on optimization techniques (ALS, Gauss-Newton)

[Picture taken from [Kolda/Bader'2009].]

Page 56

Tucker decomposition
• Aim: Generalize the concept of low rank from matrices to tensors.
• Alternative possibility, motivated by

  A = U · Σ · V^T,  U ∈ R^{n1×r}, V ∈ R^{n2×r}, Σ ∈ R^{r×r},

  and its vectorization

  vec(X) = (V ⊗ U) · vec(Σ).

  Ignore the diagonal structure of Σ and call it C.

The Tucker decomposition of a tensor X ∈ R^{n1×n2×n3} is defined via

vec(X) = (W ⊗ V ⊗ U) · vec(C)

with U ∈ R^{n1×r1}, V ∈ R^{n2×r2}, W ∈ R^{n3×r3}, and core tensor C ∈ R^{r1×r2×r3}.

In terms of µ-mode matrix products (see the sketch below):

X = U ∘1 V ∘2 W ∘3 C =: (U, V, W) ∘ C.
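A sketch forming a Tucker tensor from random factors via the vectorized formula (small sizes only):

n = [10 12 14]; r = [3 4 5];
U = randn(n(1), r(1)); V = randn(n(2), r(2)); W = randn(n(3), r(3));
C = randn(r);                                        % random core tensor
X = reshape(kron(W, kron(V, U)) * reshape(C, [], 1), n);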

Page 57

Tucker decomposition
Illustration of the Tucker decomposition X = (U, V, W) ∘ C:

[Figure: core tensor C multiplied by the basis matrices U, V, W in modes 1, 2, 3]

Page 58

Tucker decomposition
Consider all three matricizations:

X^(1) = U · C^(1) · (W ⊗ V)^T,
X^(2) = V · C^(2) · (W ⊗ U)^T,
X^(3) = W · C^(3) · (V ⊗ U)^T.

These are low-rank decompositions:

rank(X^(1)) ≤ r1,  rank(X^(2)) ≤ r2,  rank(X^(3)) ≤ r3.

The multilinear rank of a tensor X ∈ R^{n1×n2×n3} is the tuple

(r1, r2, r3), with rµ = rank(X^(µ)).

Page 59

Higher-order SVD (HOSVD)
Goal: Approximate a given tensor X by a Tucker decomposition with prescribed multilinear rank (r1, r2, r3).

1. Calculate the SVDs of the matricizations:

   X^(µ) = Uµ Σµ Vµ^T for µ = 1, 2, 3.

2. Truncate the basis matrices:

   Ũµ := Uµ(:, 1:rµ) for µ = 1, 2, 3.

3. Form the core tensor:

   vec(C) := (Ũ3^T ⊗ Ũ2^T ⊗ Ũ1^T) · vec(X).

Truncated tensor produced by the HOSVD [Lathauwer/De Moor/Vandewalle'2000]:

vec(X̃) := (Ũ3 ⊗ Ũ2 ⊗ Ũ1) · vec(C).

Remark: This is an orthogonal projection, X̃ := (π1 π2 π3) X with πµ X := Ũµ Ũµ^T ∘µ X. (A MATLAB sketch is given below.)
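A compact HOSVD sketch for a third-order tensor, reusing the matricize and mmode_mult helpers from earlier (save as hosvd.m; the truncation ranks r(1:3) are prescribed):

function [C, U] = hosvd(X, r)
% Tucker approximation of X with multilinear rank r via the HOSVD.
  U = cell(1, 3);
  C = X;
  for mu = 1:3
    [Umu, ~, ~] = svd(matricize(X, mu), 'econ');
    U{mu} = Umu(:, 1:r(mu));
    C = mmode_mult(U{mu}', C, mu);   % project mode mu onto the truncated basis
  end
end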

Page 60

Higher-order SVD (HOSVD)
The tensor X̃ resulting from the HOSVD satisfies the quasi-optimality condition

‖X − X̃‖ ≤ √d ‖X − X_best‖,

where X_best is the best approximation of X with multilinear ranks (r1, ..., rd).

Proof:

‖X − X̃‖² = ‖X − (π1 π2 π3) X‖²
          = ‖X − π1 X‖² + ‖π1 X − (π1 π2) X‖² + ‖(π1 π2) X − (π1 π2 π3) X‖²
          ≤ ‖X − π1 X‖² + ‖X − π2 X‖² + ‖X − π3 X‖².

Using

‖X − πµ X‖ ≤ ‖X − X_best‖ for µ = 1, 2, 3

leads to

‖X − X̃‖² ≤ 3 · ‖X − X_best‖².

Best approximation: See [Kolda/Bader'09].

Page 61

Tucker decomposition – Summary
For general tensors:
• multilinear rank is upper semi-continuous ⇒ closedness property.
• HOSVD – simple and robust algorithm to obtain a quasi-optimal low-rank approximation.
• quasi-optimality good enough for most applications in scientific computing.
• robust black-box algorithms/software available (e.g., Tensor Toolbox).

Drawback: storage of the core tensor ∼ r^d ⇒ curse of dimensionality.

Page 62

Tensor network diagrams
Tensor network = undirected graph where:
• each node is a tensor;
• each outgoing (free) edge is a mode;
• each connecting edge represents a contraction; example:

  Z_{i1,i2,i3,i4} = Σ_{j=1}^{r} X_{i1,i2,j} Y_{j,i3,i4}.

• number of free edges = order of the tensor represented by the entire network.

Researchers on quantum many-body problems think² in terms of tensor networks!

²and dream

Page 63

Tensor network diagrams
Examples:

[Figure: five tensor network diagrams]
(i) vector;
(ii) matrix;
(iii) matrix-matrix multiplication;
(iv) Tucker decomposition;
(v) hierarchical Tucker decomposition.

Page 64

Low-rank tensors: Hierarchical Tucker
• Introduction to the Hierarchical Tucker Decomposition (HTD)
• MATLAB toolbox htucker
• Basic operations: µ-mode matrix multiplication, addition, ...
• Advanced operations: inner product, elementwise multiplication, ...

Page 65

Introduction
• CP offers low data complexity but difficult truncation;
• Tucker offers simple truncation but high data complexity.

Recently developed formats:
• Matrix Product States (MPS),
• TT decomposition,
• Hierarchical Tucker decomposition (HTD).

These aim to offer a compromise between CP and Tucker.

Focus in this lecture: HTD.
• L. Grasedyck. Hierarchical singular value decomposition of tensors. SIAM J. Matrix Anal. Appl., 31(4):2029–2054, 2010.
• W. Hackbusch and S. Kühn. A new scheme for the tensor representation. J. Fourier Anal. Appl., 15(5):706–722, 2009.
• D. Kressner and C. Tobler. htucker – A MATLAB toolbox for the hierarchical Tucker decomposition. In preparation. See http://www.math.ethz.ch/~ctobler.

Page 66

More general matricizations
Recall: µ-mode matricization of a tensor X:

X^(µ) ∈ R^{nµ×(n1···nµ−1·nµ+1···nd)},  µ = 1, ..., d.

It is getting ugly...

General matricization for a mode decomposition {1, ..., d} = t ∪ s:

X^(t) ∈ R^{(n_{t1}···n_{tk}) × (n_{s1}···n_{s_{d−k}})}

with

(X^(t))_{(i_{t1},...,i_{tk}), (i_{s1},...,i_{s_{d−k}})} := X_{i1,...,id}.

[Figure: X and its matricizations X^(1) and X^({1,2})]

Page 67

Hierarchical construction
Singular value decomposition: X^(t) = U_t Σ_t U_s^T.

Column spaces are nested:

t = t1 ∪ t2  ⇒  span(U_t) ⊂ span(U_{t2} ⊗ U_{t1})
             ⇒  ∃ B_t : U_t = (U_{t2} ⊗ U_{t1}) B_t.

Size of U_t: U_t ∈ R^{n_{t1}···n_{tk} × r_t} with r_t = rank(X^(t)).

For d = 4:

U_{12} = (U_2 ⊗ U_1) B_{12},
U_{34} = (U_4 ⊗ U_3) B_{34},
vec(X) = X^({1,2,3,4}) = (U_{34} ⊗ U_{12}) B_{1234}

⇒ vec(X) = (U_4 ⊗ U_3 ⊗ U_2 ⊗ U_1)(B_{34} ⊗ B_{12}) B_{1234}.

Page 68

Dimension tree
Tree structure for d = 4, with node sizes:

B_{1234} (r12·r34 × 1)
├─ B_{12} (r1·r2 × r12)
│  ├─ U_1 (n1 × r1)
│  └─ U_2 (n2 × r2)
└─ B_{34} (r3·r4 × r34)
   ├─ U_3 (n3 × r3)
   └─ U_4 (n4 × r4)

Reshape the transfer matrices into transfer tensors:

B_{12} ∈ R^{r1·r2 × r12}   ⇒  B_{12} ∈ R^{r1×r2×r12},
B_{34} ∈ R^{r3·r4 × r34}   ⇒  B_{34} ∈ R^{r3×r4×r34},
B_{1234} ∈ R^{r12·r34 × 1} ⇒  B_{1234} ∈ R^{r12×r34}.

Page 69

Dimension tree

[Figure: dimension tree with leaf matrices U_1, U_2, U_3, U_4 and transfer tensors B_{12}, B_{34}, B_{1234}]

• Often, U_1, U_2, U_3, U_4 are orthonormal. This is advantageous but not required.
• Storage requirements for general d:

  O(dnr) + O(dr³),

  where r = max{r_t}, n = max{nµ}.

Page 70

Constructors for the MATLAB class htensor

x = htensor([4 5 6 7]) constructs a zero htensor of size 4 × 5 × 6 × 7, with a balanced dimension tree.

x = htensor([4 5 6 7], 'TT') constructs a zero htensor of size 4 × 5 × 6 × 7, with a TT-style dimension tree.

x = htensor(U1, U2, U3) constructs an htensor from a tensor in CP decomposition, X(i1,i2,i3) = Σ_j U1(i1,j) U2(i2,j) U3(i3,j).

x = htenrandn([4 5 6 7]) constructs an htensor of size 4 × 5 × 6 × 7, with random ranks and random entries.

x = htenones([4 5 6 7]) constructs an htensor of size 4 × 5 × 6 × 7, with all entries one.

...

Page 71

Basic functionality for the MATLAB class htensor
Example: x is an htensor of order 4.

x(1, 3, 4, 2) returns an entry of X.
x(1, 3, :, :) returns a slice of X as an htensor.
full(x) returns the full tensor represented by X (use with care).

disp_tree(htenrandn([5 4 6 3])) returns the tree structure/ranks:

ans is an htensor of size 5 x 4 x 6 x 3
1-4  1; 6 3 1
1-2  2; 3 4 6
1    4; 5 3
2    5; 4 4
3-4  3; 3 3 3
3    6; 6 3
4    7; 3 3

spy(x) displays spy plots of U_t, B_t on the dimension tree.
change_root(x, i) switches the root node.

Page 72

Singular value tree
plot_sv(x) plots the singular values of the corresponding matricizations in the dimension tree of a tensor X.

Example: Singular value tree of the solution to an elliptic PDE with 4 parameters.

[Figure: singular value plots at the tree nodes Dim. 1,2 / Dim. 3,4,5 / Dim. 1 / Dim. 2 / Dim. 3 / Dim. 4,5 / Dim. 4 / Dim. 5]

Remark: The singular values are computed from Gramians.

Page 73

Basic ops: µ-mode matrix multiplication
Application of a matrix A ∈ R^{m×nµ} to mode µ of X ∈ R^{n1×···×nd}:

Y = A ∘µ X  ⇔  Y^(µ) = A X^(µ).

Nearly trivial if X is in H-Tucker format:

A ∘µ X = A ∘µ ((U1, ..., Ud) ∘ C) = (U1, ..., Uµ−1, A Uµ, Uµ+1, ..., Ud) ∘ C.

• Almost no operations required.
• Ranks stay the same.
• Orthogonality destroyed.

ttm(x, A, 2) applies the matrix A to the htensor X in mode 2.
y = ttm(x, A, B, C, [2, 3, 4]) applies A, B, C in modes 2, 3, 4.
y = ttm(x, @(x)(fft(x)), 2) applies the FFT in mode 2.
y = ttm(x, A, B, C, [2, 3, 4], 'h') successively applies the matrices A^T, B^T, C^T in modes 2, 3, 4.

Page 74

Addition of low-rank matrices
Addition of two matrices in low-rank format:

A = U1 ΣA U2^T,  B = V1 ΣB V2^T

⇒ A + B = [U1 V1] [ΣA 0; 0 ΣB] [U2 V2]^T.

• No operations required.
• Rank increases.
• Orthogonality destroyed.

Page 75

Addition of low-rank tensors
Addition of four tensors X1, X2, X3, X4 in H-Tucker format:

X1 + X2 + X3 + X4.

Proceed as in the matrix case by embedding the factors in larger matrices.
• No operations required.
• H-Tucker rank increases.
• Orthogonality destroyed.

Command in htucker: x1 + x2 + x3 + x4

Page 76

[Figure: after the addition X1 + X2 + X3 + X4, each leaf matrix is the concatenation [Uµ^[1], Uµ^[2], Uµ^[3], Uµ^[4]] and the transfer tensors B_{12}, B_{34}, B_{1234} have block-diagonal structure]

Page 77

Orthogonalization
Any tensor X in H-Tucker format can be orthogonalized in the sense that all factors in the dimension tree, except for the root node, contain orthonormal columns.

Example: vec(X) = (U4 ⊗ U3 ⊗ U2 ⊗ U1)(B34 ⊗ B12) B1234.

Step 1: QR decompositions U_t = Q_t R_t ⇒

vec(X) = (Q4 ⊗ Q3 ⊗ Q2 ⊗ Q1)(B̃34 ⊗ B̃12) B1234

with B̃34 := (R4 ⊗ R3) B34, B̃12 := (R2 ⊗ R1) B12.

Step 2: QR decompositions B̃34 = Q34 R34, B̃12 = Q12 R12 ⇒

vec(X) = (Q4 ⊗ Q3 ⊗ Q2 ⊗ Q1)(Q34 ⊗ Q12) B̃1234

with B̃1234 := (R34 ⊗ R12) B1234.

Computational requirements for general d: O(dnr²) + O(dr⁴).

Command in htucker: x = orthog(x)

Page 78

Norms and inner products
Inner product of two tensors X, Y ∈ R^{n1×···×nd}:

〈X, Y〉 = 〈vec(X), vec(Y)〉 = Σ_{i1=1}^{n1} ··· Σ_{id=1}^{nd} X_{i1,...,id} Y_{i1,...,id}.

Can be performed efficiently in H-Tucker, provided that X, Y have compatible dimension trees.

Example: Two tensors of order 4:

〈X, Y〉 = (B^x_{1234})^T (B^x_{34} ⊗ B^x_{12})^T (U^x_4 ⊗ U^x_3 ⊗ U^x_2 ⊗ U^x_1)^T (U^y_4 ⊗ U^y_3 ⊗ U^y_2 ⊗ U^y_1)(B^y_{34} ⊗ B^y_{12}) B^y_{1234}.

Norm: After X has been orthogonalized,

‖X‖ = √〈X, X〉 = ‖B_{12···d}‖_F.

Possibly the most accurate way to compute the norm. Used in norm(x).

Page 79

Computation of inner products

〈X, Y〉 = Σ_{i1=1}^{n1} ··· Σ_{id=1}^{nd} X_{i1,...,id} Y_{i1,...,id}.

[Figures (Pages 80–83): the contraction proceeds step by step from the leaves to the root of the two dimension trees]

Page 84

Computation of inner products – contraction step
At an inner node t with sons t1, t2:

(U^x_t)^T U^y_t = (B^x_t)^T ((U^x_{t2})^T U^y_{t2} ⊗ (U^x_{t1})^T U^y_{t1}) B^y_t.

• htucker command: innerprod(x,y) (usage sketch below)
• Overall cost: O(dnr²) + O(dr⁴).
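Usage sketch with the htucker toolbox (assuming it is on the MATLAB path; sizes are arbitrary):

x = htenrandn([8 8 8 8]); y = htenrandn([8 8 8 8]);
ip  = innerprod(x, y);   % inner product via tree contractions
nrm = norm(x);           % norm; internally orthogonalizes first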

Page 85

Reduced Gramians in H-Tucker

[Figure: matricization X^(t) = U_t V_t^T at a node t]

X^(t) = U_t V_t^T  ⇒  X^(t) (X^(t))^T = U_t V_t^T V_t U_t^T = U_t G_t U_t^T,  with G_t := V_t^T V_t.

If U_t is orthonormal ⇒ sv(X^(t)) = √eig(G_t) (used in plot_sv).

Reduced Gramians in H-Tucker

[Figures (Pages 86–90): recursive computation of the reduced Gramians G_t, proceeding from the root to the leaves of the dimension tree]

Page 91

Reduced Gramians in H-Tucker

Implemented in the htucker command gramians(x).

Page 92

Advanced operations
• Truncation
• Combined addition + truncation
• Elementwise multiplication
• Elementwise reciprocal

Page 93

Truncation of an explicit tensor
Let X ∈ R^{n1×n2×···×nd} be explicitly given.

• For each tree node t, let W_t contain the r_t dominant left singular vectors of X^(t) and define the projection

  πt X = W_t W_t^T ∘t X  ⇔  πt X^(t) = W_t W_t^T X^(t).

• Truncated tensor:

  X̃ := ( Π_{t∈T_L} πt ) ··· ( Π_{t∈T_1} πt ) X,

  where T_ℓ contains all nodes on level ℓ.

• [Grasedyck'2010]: ‖X − X̃‖ ≤ √(2d − 3) ‖X − X_best‖.
  Proof similar to that for the HOSVD.

Page 94

Truncation of an explicit tensor
Example:

vec(X̃) = (W4 W4^T ⊗ W3 W3^T ⊗ W2 W2^T ⊗ W1 W1^T)(W34 W34^T ⊗ W12 W12^T) vec(X)
        = (W4 ⊗ W3 ⊗ W2 ⊗ W1) ( [W4^T ⊗ W3^T] W34 ⊗ [W2^T ⊗ W1^T] W12 ) ( [W34^T ⊗ W12^T] vec(X) ),

with B34 := [W4^T ⊗ W3^T] W34,  B12 := [W2^T ⊗ W1^T] W12,  B1234 := [W34^T ⊗ W12^T] vec(X).

opts.max_rank = 10 sets the maximal rank at truncation.
opts.rel_eps = 1e-6 sets the maximal relative truncation error.
opts.abs_eps = 1e-6 sets the maximal absolute truncation error.
The condition max_rank takes precedence over rel_eps and abs_eps.
xt = htensor.truncate_rtl(x, opts) returns the truncated tensor X̃ of a multidimensional array.

Remark: There is also a significantly faster htensor.truncate_ltr (which proceeds successively from the leaves to the root), for which the same error bound holds [Tobler'10].

Page 95

Truncation of an H-Tucker tensor
Let X ∈ R^{n1×n2×···×nd} be in H-Tucker format and orthogonalized.

• Compute the left singular vectors of X^(t) = U_t V_t^T from the eigenvectors of

  X^(t) (X^(t))^T = U_t G_t U_t^T,  G_t = V_t^T V_t,

  with the reduced Gramian G_t. If S_t contains the r_t dominant eigenvectors of G_t ⇒ W_t = U_t S_t.

• Traverse the tree from the root to the leaves. In each step, the projection S_t S_t^T is inserted between node t and its parent: S_t^T compresses the transfer tensor B_t, while S_t is absorbed into the parent transfer tensor B_{tp}.

• In htucker: truncate(x, opts). Complexity O(dnr² + dr⁴).

Page 96

Combined addition + truncation
Sum of more than two tensors:

Y = X1 + X2 + ··· + Xs.

Two possibilities to incorporate the truncation operator T:
1. Y ≈ T(X1 + X2 + X3 + ··· + Xs)
2. Y ≈ T(··· T(T(X1 + X2) + X3) ··· + Xs)

Option 2 is usually significantly cheaper but may suffer from severe cancellation.

Artificial example: X1, X2, X3 ∈ R^{101×101×101} truncated tensor grid discretizations of the summands of

f(x1, x2, x3) = tan(x1 + x2 + x3) + (x1 + x2 + x3)^{−1} − tan(x1 + x2 + x3).

Error(Option 1) ≈ 10^{−7}. Error(Option 2) ≈ 1.3.

What is wrong with Option 1?

Page 97

Combined addition + truncation

[Figure: block structure of the factors for the sum of four H-Tucker tensors, as on Page 76]

• Orthogonalization (needed before truncation) destroys the block-diagonal structure.
• Complexity O(dns²r² + ds⁴r⁴) for s summands.

Page 98

Combined addition + truncation
Idea: A new variant delays orthogonalization to keep the block-diagonal structure in the transfer tensors as long as possible.

Reduces O(dns²r² + ds⁴r⁴) to O(dns²r² + ds²r⁴ + ds³r³).

[Figure: runtime vs. number of summands for standard, combined, and successive truncation, with reference slopes O(t⁴), O(t²), O(t)]

• htucker command: add_truncate({x1, x2, x3, x4}, opts).

Page 99

Elementwise multiplication
Elementwise multiplication (also called Hadamard or Schur product) of two low-rank matrices A = U1 ΣA U2^T, B = V1 ΣB V2^T:

A ⋆ B = (U1 ⊙ V1)(ΣA ⊗ ΣB)(U2 ⊙ V2)^T,

with the row-wise Khatri-Rao product

C ⊙ D = [c1^T; ...; cn^T] ⊙ [d1^T; ...; dn^T] = [c1^T ⊗ d1^T; ...; cn^T ⊗ dn^T].

• Orthogonality destroyed.
• Rank increases significantly.

But: the singular value decay of ΣA ⊗ ΣB may become significantly stronger ⇒ additional opportunities for truncation.

Page 100

Elementwise multiplication
Elementwise multiplication of two tensors X, Y in H-Tucker format:

• Row-wise Khatri-Rao product of the leaf matrices.
• "Kronecker product" of the non-leaf transfer tensors.
• Optional: Products are only formed after suitable truncation to avoid excessive memory requirements.

Commands in htucker:
x.*y (without truncation)
x.^2 (without truncation)
elem_mult(x, y, opt) (with truncation)

Page 101

Elementwise reciprocal
Goal: Compute the reciprocal of each entry of a tensor X.

Basic idea: The Newton-Schulz iteration

y0 = 1,  y_{i+1} = y_i + y_i (1 − x y_i),   (2)

converges to 1/x for 0 < x < 2.

Apply (2) simultaneously to all entries.

Code snippet of elem_reciprocal(x, opt) in htucker:

all_ones = htenones(size(x));
y = all_ones;
for it = 1:maxit
  xy = elem_mult(x, y);
  xy = truncate(all_ones - xy, opts);
  xy = elem_mult(xy, y);
  y = truncate(y + xy, opts);
end

See also [Oseledets et al. 2009].

Page 102

Elementwise reciprocal
Example: (x1 + x2 + x3 + x4)^{−1} with xi ∈ [10^{−3}, 1].

c = laplace_core(4);
U = [ones(100, 1), linspace(1e-3, 1, 100)'];
x = ttm(c, U, U, U, U);
inv_x = elem_reciprocal(x, opts);

[Figures: convergence of ‖X ⋆ Y_k − 1‖/‖1‖ over the iterations; singular value tree upon convergence]

Page 103

Summary
• HTD offers a good compromise between CP and Tucker.
• Algorithms often quite technical but conceptually simple.
• Computational complexity ∼ d but often ∼ r⁴:
  curse of dimensionality ⇒ curse of rank?
• Important to keep in mind:
  Unless d is tiny, the tensor X can/should never be formed explicitly.
  All operations need to be performed implicitly in HTD.
  This can pose severe problems even for seemingly simple operations: min(X), max(X), abs(X), 1./X, ...

Page 105

Algorithms based on low-rank tensors
• Inexact LOBPCG
• ALS / MALS

105

Page 106: Low-Rank Tensor Techniques for High-Dimensional …Low-Rank Tensor Techniques for High-Dimensional Problems Daniel Kressner CADMOS Chair for Numerical Algorithms and HPC MATHICSE,

Strategies for solving tensor equations

- In many practical situations, the tensor X is given implicitly, as the solution to a linear system A(X) = B, an eigenvalue problem A(X) = λX, a nonlinear system, an ODE, ...

Two main strategies to use low-rank tensor techniques:

1. Combine an existing iterative solver (e.g., CG, LOBPCG, GMRES) with repeated low-rank truncation of the iterates (→ inexact CG); see the sketch after this list.
   - Straightforward to derive and implement (based, e.g., on htucker).
   - Hard to analyze the impact of nonnegligible truncations on accuracy and convergence.
   - Intermediate rank growth may result in excessive computing times and/or harm accuracy and convergence.
2. Formulate an optimization problem, constrain it to low-rank tensors, and iteratively optimize with respect to the individual factors of the low-rank format.
   - Works well in practice.
   - Convergence theory not well understood.
   - Not straightforward to implement.
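
A truncated-Richardson sketch of strategy 1, assuming hypothetical operator handles apply_A and apply_Binv (operator and preconditioner, possibly with internal truncation) and htucker's truncate and norm for htensor objects:

opt.max_rank = 50;  opt.rel_eps = 1e-10;  % truncation options (assumed fields)
X = apply_Binv(b);                        % b: right-hand side htensor
for it = 1:100
    R = truncate(b - apply_A(X), opt);    % truncated residual
    X = truncate(X + apply_Binv(R), opt); % truncated update
    if norm(R) <= 1e-6 * norm(b), break; end
end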


Example: PDE-eigenvalue problem

Goal: compute the smallest eigenvalue for

    −∆u(ξ) + V(ξ)u(ξ) = λu(ξ)   in Ω = [0,1]^d,
            u(ξ) = 0             on ∂Ω.

Assumption: the potential is represented as

    V(ξ) = Σ_{j=1}^s V_j^(1)(ξ1) V_j^(2)(ξ2) ··· V_j^(d)(ξd).

→ finite difference discretization

    A u = (A_L + A_V) u = λ u,

with

    A_L = Σ_{j=1}^d I ⊗ ··· ⊗ I ⊗ A_L^(j) ⊗ I ⊗ ··· ⊗ I   (d−j identity factors to the left, j−1 to the right),
    A_V = Σ_{j=1}^s A_{V,j}^(d) ⊗ ··· ⊗ A_{V,j}^(2) ⊗ A_{V,j}^(1),

where A_L^(j) denotes the one-dimensional discrete Laplacian acting on dimension j.
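
A plain-MATLAB illustration of the Kronecker-sum structure of A_L, assembled explicitly only for small d (for large d the sum is never formed; it is applied implicitly):

n = 10;  d = 3;  h = 1/(n+1);
e  = ones(n,1);
A1 = spdiags([-e 2*e -e], -1:1, n, n) / h^2;   % 1D discrete Laplacian
AL = sparse(n^d, n^d);
for j = 1:d
    AL = AL + kron(speye(n^(d-j)), kron(A1, speye(n^(j-1))));
end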


LOBPCG method

LOBPCG with block size 1 [Knyazev 2001] for computing the smallest eigenvalue of

    A x = λ x,   A symmetric.

    λ0 = ⟨x0, x0⟩_A,  p0 = 0
    for k = 0, 1, ... (until converged) do
        rk = B^{-1}(A xk − λk xk)
        U = [xk, rk, pk]
        Â = U^T A U,  M̂ = U^T U
        Find the eigenpair (λ_{k+1}, y), with ‖y‖2 = 1, for the smallest eigenvalue of the matrix pencil Â − λ M̂.
        p_{k+1} = y2 · rk + y3 · pk
        x_{k+1} = y1 · xk + p_{k+1}
        x_{k+1} ← x_{k+1} / ‖x_{k+1}‖2
    end for
    Return (λmin, x) = (λ_{k+1}, x_{k+1}).


Tensor low-rank LOBPCG

Truncated LOBPCG with block size 1 for computing the smallest eigenvalue of

    A(X) = λ X,   A symmetric, X a tensor.

    λ0 = ⟨X0, X0⟩_A,  P0 = 0 · X0
    for k = 0, 1, ... (until converged) do
        Rk = B^{-1}(A(Xk) − λk Xk),   Rk ← T(Rk)
        U1 = Xk, U2 = Rk, U3 = Pk
        Âij = ⟨Ui, Uj⟩_A,  M̂ij = ⟨Ui, Uj⟩
        Find the eigenpair (λ_{k+1}, y), with ‖y‖2 = 1, for the smallest eigenvalue of the matrix pencil Â − λ M̂.
        P_{k+1} = y2 · Rk + y3 · Pk,   P_{k+1} ← T(P_{k+1})
        X_{k+1} = y1 · Xk + P_{k+1},   X_{k+1} ← T(X_{k+1})
        X_{k+1} ← X_{k+1} / √⟨X_{k+1}, X_{k+1}⟩
    end for
    Return (λmin, X) = (λ_{k+1}, X_{k+1}).

T = truncation to hierarchical low rank
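
One iteration of the truncated LOBPCG, sketched in MATLAB with htucker primitives (innerprod, truncate); apply_A and apply_Binv are hypothetical operator handles, opt holds the truncation options:

R = truncate(apply_Binv(apply_A(X) - lambda*X), opt);
U = {X, R, P};
Ah = zeros(3);  Mh = zeros(3);
for i = 1:3
    for j = 1:3
        Ah(i,j) = innerprod(U{i}, apply_A(U{j}));  % exact: A(U_j) not truncated
        Mh(i,j) = innerprod(U{i}, U{j});
    end
end
[Y, D] = eig(Ah, Mh);                  % 3-by-3 pencil Ah - lambda*Mh
[lambda, k] = min(diag(D));
y = Y(:,k) / norm(Y(:,k));
P = truncate(y(2)*R + y(3)*P, opt);
X = truncate(y(1)*X + P, opt);
X = (1/sqrt(innerprod(X, X))) * X;     % normalize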


Implementation details

Orthogonalization
In standard LOBPCG, orthogonalization of U is recommended [Knyazev 2010]. This is not practical with low-rank tensors, as the ranks would grow and subsequent truncation would destroy orthogonality again.

Truncation
Xk, Rk, Pk are truncated in every step. Moreover, the application of A(·) and of the preconditioner B^{-1}(·) may itself involve truncation.

Inner product
The reduced matrix Â is very sensitive to truncation in A(·). The computation of Âij = ⟨Ui, Uj⟩_A must be exact.


Numerical Experiments – Sine potential

PDE-eigenvalue problem with Ω = [0, π]^d and sine potential

    V(ξ) = q · ∏_{i=1}^d sin(ξi)

for some constant q > 0. We choose d = 10, n = 128.

Preconditioner [Grasedyck 2004]:

    A_L^{-1} = ∫_0^∞ exp(−t A_L) dt ≈ Σ_{j=−M}^{M} ωj exp(−αj A_L^(d)) ⊗ ··· ⊗ exp(−αj A_L^(1)) =: B^{-1},

for a certain, optimized and tabulated choice of coefficients αj, ωj > 0. We choose M = 10.
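
A sketch of how this preconditioner can be applied to an htensor R, assuming the tabulated coefficients alpha, omega are given and AL1 is the 1D discrete Laplacian (ttm and truncate as in htucker):

function Y = apply_Binv(R, AL1, alpha, omega, opt)
% Apply B^{-1} = sum_j omega_j exp(-alpha_j*AL^(d)) ⊗ ... ⊗ exp(-alpha_j*AL^(1)).
d = ndims(R);
for j = 1:numel(alpha)
    E = expm(-alpha(j) * AL1);          % small n-by-n matrix exponential
    T = omega(j) * R;
    for mu = 1:d
        T = ttm(T, E, mu);              % apply E in mode mu
    end
    if j == 1, Y = T; else, Y = truncate(Y + T, opt); end
end
end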


Numerical Experiments – Sine potential

[Figure: residual norm and maximal hierarchical rank over the LOBPCG iterations, for q = 1 (truncation tolerances eps = 1e−2, 1e−4, 1e−8) and q = 1000 (eps = 1e−2, 1e−4).]


ALS

Originally from computational quantum physics [Schollwöck 2011]; recently investigated by [Huckle et al. 2010; Oseledets, Khoromskij 2010; Holtz et al. 2010; Dolgov, Oseledets 2011].

Goal:

    min { ⟨X, A(X)⟩ / ⟨X, X⟩ : X ∈ H-Tucker((rt)_{t∈T}), X ≠ 0 }

Method: choose one node t, fix all other nodes, and set the new tensor at node t so that the Rayleigh quotient ⟨X, A(X)⟩ / ⟨X, X⟩ is minimized. This is done for all nodes (a sweep), and sweeps are continued until convergence.

Sketch:

    X^(t) = Ut Vt^T = (Utr ⊗ Utl) Bt Vt^T,
    vec(X) = (Vt ⊗ Utr ⊗ Utl) vec(Bt) =: 𝒰t vec(Bt).

    ⇒ min { y^T (𝒰t^T A 𝒰t) y / (y^T (𝒰t^T 𝒰t) y) : y ∈ R^{rtl·rtr·rt}, y ≠ 0 }.
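
Once the reduced matrices At = 𝒰t^T A 𝒰t and Mt = 𝒰t^T 𝒰t have been assembled (see the next slide), the node update is a small generalized eigenproblem; a plain-MATLAB sketch, with At, Mt and the ranks rtl, rtr, rt assumed from the context above:

[Y, D] = eig(At, Mt);            % pencil At - lambda*Mt, size rtl*rtr*rt
[~, k] = min(diag(D));
y  = Y(:,k) / norm(Y(:,k));
Bt = reshape(y, [rtl, rtr, rt]); % updated transfer tensor at node t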


Computation of reduced matrices

Consider A = Ad ⊗ ··· ⊗ A1 (other operators can be treated similarly).

Compute

    𝒜t := 𝒰t^T A 𝒰t = (Vt ⊗ Utr ⊗ Utl)^T A (Vt ⊗ Utr ⊗ Utl) = Ât ⊗ Atr ⊗ Atl,

where

    Atl = Utl^T (⊗_{i∈tl} Ai) Utl,   Atr = Utr^T (⊗_{i∈tr} Ai) Utr,   Ât = Vt^T (⊗_{i∉t} Ai) Vt.

Additionally,

    Mt := 𝒰t^T 𝒰t = Vt^T Vt ⊗ Utr^T Utr ⊗ Utl^T Utl = M̂t ⊗ Mtr ⊗ Mtl.
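
A quick numeric check of this Kronecker factorization in plain MATLAB (random data; d = 3, node t = {1, 2} with left child {1}, right child {2}, complement {3}):

n = 6;  r = 2;
A1 = randn(n); A2 = randn(n); A3 = randn(n);
Utl = orth(randn(n,r)); Utr = orth(randn(n,r)); Vt = orth(randn(n,r));
Ut  = kron(Vt, kron(Utr, Utl));
lhs = Ut' * kron(A3, kron(A2, A1)) * Ut;
rhs = kron(Vt'*A3*Vt, kron(Utr'*A2*Utr, Utl'*A1*Utl));
norm(lhs - rhs, 'fro')           % ~1e-14: mixed-product property of kron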


Computation of reduced matrices

[Figure: dimension tree with leaf matrices A1, ..., A8; the reduced matrices A12, A34, A1234 at the inner nodes are accumulated level by level from the leaves.]


MALS

Method:
- Select an edge of the tensor network.
- Combine the tensors at the two adjacent nodes to form a higher-order tensor.
- Set this tensor to minimize the Rayleigh quotient.
- Use a low-rank approximation to split the combined tensor again into two tensors at the adjacent nodes of the selected edge (see the sketch below).
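
A plain-MATLAB sketch of the split step, with W the optimized combined tensor matricized so that its rows belong to the left node of the selected edge (W and the tolerance tol are assumed from context):

[Uw, Sw, Vw] = svd(W, 'econ');
s = diag(Sw);
r = find(s/s(1) > tol, 1, 'last');     % adaptive rank from singular values
left  = Uw(:, 1:r);                    % factor assigned to the left node
right = Sw(1:r, 1:r) * Vw(:, 1:r)';    % factor assigned to the right node

This split is where MALS adapts the rank of the selected edge on the fly, in contrast to ALS, where all ranks stay fixed.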


MALS – Illustration

[Figure: illustration of a MALS step on the tensor network.]


Numerical Experiments – Sine potential

PDE-eigenvalue problem with Ω = [0, π]^d and sine potential

    V(ξ) = q · ∏_{i=1}^d sin(ξi)

for some constant q > 0. Choose d = 10, n = 128, q = 1000. Preconditioner as before [Grasedyck 2004], with an optimized choice of coefficients αj, ωj > 0 and M = 10.


Numerical Experiments – Sine potential

[Figure: error in λ, residual norm, and iteration count over the execution time (up to 500 s).
ALS (left): hierarchical ranks fixed to 40; shown: err_lambda, res, nr_iter.
MALS (right): maximal hierarchical rank 30; shown: err_lambda, res, eps, rank, nr_iter.]



Conclusions and Outlook

- Scientific computing with low-rank tensors is a rapidly evolving and highly technical field.
- The precise scope of applications is far from clear; many applications remain to be explored. More analysis, and comparison to alternative techniques (sparse grids, adaptive tensor discretization, Monte Carlo, ...), is needed.

Some current trends:
- Tensorization of vectors + low rank (a discrete Chebfun?) by Hackbusch, Khoromskij, Oseledets, Tyrtyshnikov, ...
- Computational differential geometry on low-rank tensor manifolds by Koch, Lubich, Schneider, Uschmajew, Vandereycken, ...
- Robust low rank (Candès et al.) for tensors → a suitable way of dealing with singularities?
- ...

Acknowledgments: This presentation heavily benefited from joint work with Christine Tobler (ETH Zurich).
