
  • Numerical Linear Algebra: A biased survey

    Daniel Kressner

    Chair of Numerical Algorithms and HPC, MATHICSE / SMA / SB / EPF Lausanne

    daniel.kressner@epfl.ch, http://anchp.epfl.ch

    Banff, 7.10.2014


  • Purpose of this talk

    - give a flavor of the general field of numerical linear algebra¹

    - discuss the role of sparsity¹

    - discuss the role of optimization in numerical linear algebra¹

    - point out some exciting new developments¹

    ¹ according to the strongly biased/limited view of the speaker

  • Outline

    - Numerical Linear Algebra
      - general principles
      - the beauty of black boxes
      - the beauty of error analysis

    - Numerical Linear Algebra and Sparsity and Optimization
      - sparse and data-sparse matrices
      - sparse and data-sparse vectors

  • General principles

  • Numerical linear algebra

    Typical tasks for a matrix A:

    - Linear systems: Ax = b

    - Eigenvalue problems: Ax = λx

    - Matrix functions: exp(A), log(A), √A, sign(A), ...
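
    As a concrete illustration (not part of the original slides), these three tasks map directly onto standard black-box library calls; a minimal sketch using NumPy/SciPy with a randomly generated symmetric matrix:

```python
import numpy as np
from scipy.linalg import expm, sqrtm

rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n))
A = A + A.T + 2 * n * np.eye(n)       # symmetric, positive definite toy matrix
b = rng.standard_normal(n)

x = np.linalg.solve(A, b)             # linear system        A x = b
lam, V = np.linalg.eigh(A)            # eigenvalue problem   A x = lambda x
E = expm(A)                           # matrix function      exp(A)
S = sqrtm(A)                          # matrix function      sqrt(A)
```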

    Recurring principle: Exploitation of structure.

    Well-established: Structure in A (sparsity, symmetry, low-rank, ...)

    Current trends: Structure in x (sparsity, low-rank, incorporation of information from the underlying application, ...)

  • Numerical linear algebra in scientific computing...

    [Diagram: Application → Mathematical Model → Discretization / Linearization → Linear Algebra problem]

    Picture taken from http://en.wikipedia.org/wiki/Food_chain


  • ...a fundamental component

    Numerical linear algebra
    - often dominates computational time in scientific computing ⇒ imposes limitations on model complexity, discretization accuracy, ...
    - transfers knowledge across different disciplines
    - deeply dives into algorithmic design and analysis
    - provides black-box solvers
    - provides software libraries (LAPACK) at the heart of virtually every scientific computing package (Maple, Mathematica, NumPy, MATLAB, R, Trilinos)
    - is at the forefront of high-performance computing (LINPACK benchmark)

  • Software development – a recent example


    before 2005. Eigenvalue solver in ScaLAPACK 1.x known to be slow and buggy.

    2005–2009. Algorithmic developments (pipelined multi-shifts, parallel aggressive early deflation) and research code. Jointly with R. Granat and B. Kågström. Main publication: A novel parallel QR algorithm for hybrid distributed memory HPC systems. SIAM J. Sci. Comput., 32(4):2345–2378, 2010.

    2009–2011. Further algorithmic improvements, debugging, bug fixing, documentation, testing/benchmarking, incorporation of feedback from Intel and others. Efforts joined by PhD student M. Shao.

    November 2011. Release of production code as routine PxHSEQR in ScaLAPACK 2.0.

    2012–∞. Code maintenance.

    201? ACM TOMS software publication.

  • Software development – a recent example

    Old eigenvalue solver in ScaLAPACK 1.x vs. new eigenvalue solver (research code) vs. new eigenvalue solver in ScaLAPACK 2.0 (production code)

    [Bar charts: execution times]
    - 16 000 × 16 000 matrix on 100 cores¹: ScaLAPACK 1.x 16 755 sec, 2009 research code 575 sec, ScaLAPACK 2.0 320 sec
    - 100 000 × 100 000 matrix on 1 024 cores: 2009 research code ≈ 7 hours, ScaLAPACK 2.0 ≈ 2 hours

    ¹ Intel Xeon quadcore L5420 2.5 GHz nodes

  • The beauty of black boxes

    No need to care about what is inside.

  • A black box

    eig: Eigenvalues and eigenvectors.
    E = eig(A) produces a column vector E containing the eigenvalues of a square matrix A.
    ...

    Underlying variant of QR algorithm:
    - Braman/Byers/Mathias: The multishift QR algorithm. Part I: Maintaining well-focused shifts and level 3 performance. SIAM J. Matrix Anal., 2002.
    - Braman/Byers/Mathias: The multishift QR algorithm. Part II: Aggressive early deflation.² SIAM J. Matrix Anal., 2002.

    ² Awarded SIAG/Linear Algebra + SIAM outstanding paper prizes.

  • Not a black box

    eig: Eigenvalues and eigenvectors.
    E = eig(A,abstol,reltol,maxit,ns,aed,adhoc,...) produces a column vector E containing the eigenvalues of a square matrix A, where

    - abstol is the tolerance for declaring eigenvalues converged

    - reltol is a relative convergence criterion, which may or may not improve accuracy. Choose at your own risk. Wilkinson [quote from Stewart'2002]: "... if you want to argue about it, I would rather be on your side."

    - ns is the number of shifts used in every iteration. Choose carefully! The correct choice depends on your machine as well as on your application!

    - aedsize is the size of the aggressive early deflation window. Choose carefully! The correct choice depends on your machine as well as on your application!

    - adhoc is a magic number used when things start falling apart

    - ...

  • The beauty of error analysis

  • Arnoldi/Lanczos/CG in a nutshell

    - Assume A allows for cheap matrix-vector multiplications.
    - For starting vector x_0, consider the Krylov subspace

        K_k(A, x_0) := span{x_0, Ax_0, A^2 x_0, ..., A^{k-1} x_0}.

    - Arnoldi method = Gram-Schmidt + twist applied to K_k(A, x_0).
    - Produces orthonormal basis U_k ∈ R^{n×k} of K_k(A, x_0).
    - Arnoldi decomposition:

        A U_k = U_k H_k + rank 1,

      where H_k is k × k Hessenberg.

    Symmetric A ⇒
    - H_k tridiagonal
    - three-term recurrence ⇒ Lanczos method
    - no need to store full basis U_k
    - CG/Lanczos = Galerkin with K_k(A, x_0)
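
    A minimal sketch of the Arnoldi process just described (plain Gram-Schmidt, no reorthogonalization; the function name and test matrix below are my own choices, not from the talk):

```python
import numpy as np

def arnoldi(A, x0, k):
    """Orthonormal basis U of K_k(A, x0) and Hessenberg H with
    A U[:, :k] = U[:, :k+1] @ H (the 'rank 1' remainder sits in the last row of H)."""
    n = len(x0)
    U = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    U[:, 0] = x0 / np.linalg.norm(x0)
    for j in range(k):
        w = A @ U[:, j]
        for i in range(j + 1):                 # Gram-Schmidt against previous basis vectors
            H[i, j] = U[:, i] @ w
            w = w - H[i, j] * U[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        U[:, j + 1] = w / H[j + 1, j]          # breakdown (H[j+1, j] = 0) not handled here
    return U, H

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200)); A = (A + A.T) / 2   # symmetric => H is tridiagonal
U, H = arnoldi(A, rng.standard_normal(200), 30)
print(np.linalg.norm(A @ U[:, :30] - U @ H))             # Arnoldi relation holds to roundoff
```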

  • Numerical example

    Lanczos applied to symmetric matrix with isolated eigenvalue:

    [Plots: ‖U_k^T U_k − I‖_2 over iterations k = 0, ..., 100, and convergence of the 3 largest Ritz values]

    Total loss of orthogonality of U_k in finite-precision arithmetic ⇒ total failure of CG/Lanczos?
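
    A small experiment in the spirit of this slide (the matrix below is my own toy choice, not the one from the talk): run the plain three-term Lanczos recurrence without reorthogonalization and monitor ‖U_k^T U_k − I‖_2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Symmetric matrix with one isolated eigenvalue (100) and a cluster in [1, 2]
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.concatenate(([100.0], np.linspace(1, 2, n - 1)))) @ Q.T

u_prev, beta = np.zeros(n), 0.0
u = rng.standard_normal(n); u /= np.linalg.norm(u)
basis = [u]
for k in range(1, 100):
    w = A @ u - beta * u_prev
    alpha = u @ w
    w = w - alpha * u                         # three-term recurrence, no reorthogonalization
    beta = np.linalg.norm(w)
    u_prev, u = u, w / beta
    basis.append(u)
    if k % 20 == 0:
        Uk = np.column_stack(basis)
        print(k, np.linalg.norm(Uk.T @ Uk - np.eye(k + 1), 2))   # orthogonality loss grows
```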

  • Error analysis

    Paige'1976/1980: Lanczos performed in finite-precision arithmetic yields the perturbed Arnoldi decomposition

      A U_k = U_k H_k + rank 1 + ∆U_k,    ‖∆U_k‖ = O(machine precision)

    - Starting point for a deep understanding of the behavior of Lanczos in finite-precision arithmetic.

    - Subsequent results by Greenbaum, Meurant, Strakoš, Wülling, Zemke, and others.

    - Restores the reputation of several properties that hold in exact arithmetic, but there are subtle differences!

    - Survey [Meurant/Strakoš'2006] and book [Meurant'2006].

  • Data-sparse matrices

  • A tridiagonal matrix

    [Spy plot of A: 50 × 50 tridiagonal matrix, nz = 148]

    Exact matrix sparsity is a shy deer.

  • Inverse of tridiagonal matrix

    [Spy plot of A^{-1}: fully dense, nz = 2500]

    - Explicit inverse needed, e.g., in sparse covariance matrix estimation.

    - More common situation: Inverse implicitly present in Schur complements, e.g., direct sparse factorizations / signal processing on graphs.

  • Inverse of symmetric tridiagonal matrix, cond(A) ≈ 3

    [Plot of |A^{-1}|: entries with |value| ≤ 10^{-15} shown in white]

    - Classical result [Demko/Moss/Smith'1984] for banded matrices: Exponential decay of [A^{-1}]_{ij} with respect to |i − j|. Decay rate for pos. def.:

        (√cond(A) − 1) / (√cond(A) + 1)

    - Proof based on polynomial approximation of 1/x on [λ_min, λ_max].
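
    A quick numerical check of this decay rate (my own toy example; the full bound also involves a constant factor, so the comparison below is up to that constant):

```python
import numpy as np

n = 50
# Symmetric positive definite tridiagonal Toeplitz matrix
A = 2.5 * np.eye(n) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
Ainv = np.linalg.inv(A)

kappa = np.linalg.cond(A)
q = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)    # decay rate from Demko/Moss/Smith

i = n // 2
for j in range(i, i + 8):
    print(j - i, abs(Ainv[i, j]), q ** (j - i))    # observed entry vs. q^{|i-j|}
```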

  • Inverse of symmetric tridiagonal matrix, cond(A) ≈ 5, 40, 10^3

    [Same plot repeated for increasing condition numbers: the decay of |[A^{-1}]_{ij}| away from the diagonal becomes slower as cond(A) grows, in line with the decay rate (√cond(A) − 1) / (√cond(A) + 1).]

  • Inverse of symmetric tridiagonal matrix, cond(A) ≈ 10^3

    [Plot of |A^{-1}|: entries with |value| ≤ 10^{-15} shown in white]

    Extensions:
    - More general graph structures: decay with respect to the distance between two nodes [Benzi/Razouk'2007].
    - 2D structures [Canuto/Simoncini/Verani'2014].
    - Operator algebra framework [Bickel/Lindner'2012].

  • Inverse of symmetric tridiagonal matrix, cond(A) ≈ 10^3

    [Plot of A^{-1} with off-diagonal blocks highlighted]

    - Each off-diagonal block has rank 1.
      + Nestedness of singular vectors.
      = semi-separable matrix (represented by 2 vectors of length n)

    - 2 books by [Vandebril/Van Barel/Mastronardi'2007/2008].
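
    A numerical confirmation of the rank-1 structure of the off-diagonal blocks (toy example, my own):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
off = rng.uniform(-1, 1, n - 1)
A = np.diag(4.0 + rng.random(n)) + np.diag(off, 1) + np.diag(off, -1)   # symmetric tridiagonal
Ainv = np.linalg.inv(A)

block = Ainv[:20, 20:]                       # an off-diagonal block of the (dense) inverse
s = np.linalg.svd(block, compute_uv=False)
print(s[:3])                                 # one dominant singular value, rest at roundoff level
```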

  • Hierarchical matrices

    - cluster tree of column/row indices ⇒ block partitioning

    - each admissible block replaced by a low-rank matrix using, e.g., adaptive cross approximation

    - clustering/admissibility decided via analytic properties (smoothness of kernel) or topological properties (graph clustering)

    - LU factorization, inversion, QR factorization, ... can be performed within the format

    - Most frequently used in applications featuring dense matrices: integral operators with nonlocal kernel.

    - HSS matrices / H^2 matrices = Hierarchical matrices + nestedness of low-rank factors. O(n log n) storage ⇒ O(n) storage

    - Books by Bebendorf, Börm, and Hackbusch.
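
    To illustrate why admissible blocks are compressible (a toy example, not from the talk): a kernel-matrix block coupling two well-separated clusters has rapidly decaying singular values, so a truncated SVD, or ACA, yields an accurate low-rank factorization.

```python
import numpy as np

# Two well-separated 1D clusters and the nonlocal kernel log|x - y|
x = np.linspace(0.0, 1.0, 300)               # row cluster
y = np.linspace(3.0, 4.0, 300)               # column cluster (admissible: well separated from x)
B = np.log(np.abs(x[:, None] - y[None, :]))

s = np.linalg.svd(B, compute_uv=False)
print(s[:8] / s[0])                          # fast singular value decay
print(np.sum(s / s[0] > 1e-10))              # numerical rank at tolerance 1e-10, far below 300
```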

  • Hierarchical matrices: Current trends

    - Inject additional analytical information to limit ranks: Gillman, Greengard, Hao, Martinsson, Zorin, ...
    - Randomized algorithms for low-rank compression: Chiu, Demanet, Greengard, Martinsson, ...
    - Parallel implementation: Dongarra et al., Keyes et al., Kriemann, Li et al., ...
    - Combination with sparse LU factorization: Chandrasekaran, Li, Xia, ...
    - Application to machine learning: kernel methods [Si/Hsieh/Dhillon'2014], covariance matrix estimation [Ballani/DK'2014].

  • Data-sparse vectors

  • Setting

    - Linear system or eigenvalue problem

        Ax = b,    Ax = λx,

      with x ∈ R^n.
    - Very large/HUGE n requires exploiting additional info beyond the structure of A.

    ⇒ Exploit compressibility of x.

    Examples:
    - stable wavelet/frame discretizations of PDEs ⇒ approximate sparsity
    - Lyapunov matrix equations in control theory/model reduction ⇒ approximate low (matrix) rank
    - discretizations of certain high-dimensional PDEs ⇒ approximate low (tensor) rank

  • General algorithms

    1. Iterate and compress:
    - Combine existing iterative solver with compression of iterates.
    - Analyzed only for stationary iterations.
    - Examples:
      - Adaptive wavelet method by Cohen/Dahmen/DeVore'2001/2002.
      - Power method + low tensor rank by Beylkin/Mohlenkamp'2002.
      - Richardson iteration + low tensor rank for stochastic PDEs by Khoromskij/Schwab'2011.
      - CG/BiCGstab + low tensor rank for parametric PDEs by DK/Tobler'2011.
      - Combination of adaptive wavelet + low tensor rank by Bachmayr/Dahmen'2014.
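
    A minimal sketch of the iterate-and-compress idea for d = 2 (toy data, my own): a power-type iteration for the dominant eigenpair of a Kronecker-structured operator, where the iterate is kept as an n × n matrix and truncated back to rank r after every step. For this particular Kronecker-sum operator the dominant eigenvector is exactly rank 1, so the truncation is lossless.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 60, 5
sym = lambda M: (M + M.T) / 2
A1, B2 = sym(rng.standard_normal((n, n))), sym(rng.standard_normal((n, n)))

# Operator A = I (x) A1 + B2 (x) I acting on X reshaped from x:  A(X) = A1 X + X B2
apply_A = lambda X: A1 @ X + X @ B2

def truncate(X, r):                           # compress: best rank-r approximation via SVD
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

X = truncate(rng.standard_normal((n, n)), r)
for _ in range(300):
    X = truncate(apply_A(X), r)               # iterate ... and compress
    X /= np.linalg.norm(X)
print(np.sum(X * apply_A(X)))                 # estimate of the largest-magnitude eigenvalue
```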

  • General algorithms

    2. Constrain and optimize:
    - Reformulate linear algebra problem as optimization problem.
    - Constrain admissible set to compressed vectors.
    - Works best for low-rank structures. Often much more efficient than iterate + compress.
    - Allows for greedy strategies.

    3. Specialized algorithms:
    - Low-rank solvers for Lyapunov and Riccati equations; see [Benner/Saak'2013], [Simoncini'2014] for surveys.
    - ...

  • Example: PDE eigenvalue problem

    Goal: Compute smallest eigenvalue for

      ∆u(ξ) + V(ξ)u(ξ) = λu(ξ)  in Ω = [0,1]^d,
      u(ξ) = 0  on ∂Ω.

    Assumption: Potential represented as

      V(ξ) = ∑_{j=1}^s V_j^{(1)}(ξ_1) V_j^{(2)}(ξ_2) · · · V_j^{(d)}(ξ_d).

    Finite difference / tensorized finite elements discretization ⇒

      A u = (A_L + A_V) u = λ u,

    with

      A_L = ∑_{j=1}^d  I ⊗ · · · ⊗ I (d−j times) ⊗ A_L ⊗ I ⊗ · · · ⊗ I (j−1 times),

      A_V = ∑_{j=1}^s  A_{V,j}^{(d)} ⊗ · · · ⊗ A_{V,j}^{(2)} ⊗ A_{V,j}^{(1)}.
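
    A small sketch of assembling A_L and A_V with dense Kronecker products, for a toy separable potential with s = 1 (my own choice); this is only feasible for tiny n and d, while for large problems these operators are applied matrix-free:

```python
import numpy as np
from functools import reduce

n, d = 10, 3
h = 1.0 / (n + 1)
# 1D finite-difference Laplacian on [0, 1] with Dirichlet boundary conditions
L1 = (2.0 * np.eye(n) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)) / h**2
I = np.eye(n)
kron_all = lambda mats: reduce(np.kron, mats)

# A_L = sum_j  I (x) ... (x) I  (x) L1 (x)  I (x) ... (x) I   (L1 in position j)
AL = sum(kron_all([I] * (d - 1 - j) + [L1] + [I] * j) for j in range(d))

# A_V for one separable term V(xi) = prod_j v_j(xi_j): A_V = v_d (x) ... (x) v_1
xi = np.linspace(h, 1 - h, n)
v = [np.diag(np.sin((j + 1) * np.pi * xi)) for j in range(d)]
AV = kron_all(v[::-1])

A = AL + AV                                   # n^d x n^d, here only 1000 x 1000
print(np.linalg.eigvalsh(A)[:3])              # smallest eigenvalues of the discretized operator
```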

  • Example: Henon-Heiles potential

    Consider Ω = [−10, 2]^d and potential ([Meyer et al. 1990; Raab et al. 2000; Faou et al. 2009])

      V(ξ) = (1/2) ∑_{j=1}^d σ_j ξ_j^2 + ∑_{j=1}^{d−1} ( σ_* (ξ_j ξ_{j+1}^2 − (1/3) ξ_j^3) + (σ_*^2/16) (ξ_j^2 + ξ_{j+1}^2)^2 )

    with σ_j ≡ 1, σ_* = 0.2.

    Discretization with n = 128 dof/dimension for d = 20 dimensions.

    - Eigenvector has n^d ≈ 10^42 entries.
    - Explicit storage of eigenvector would require 10^25 exabyte!


    Solved with accuracy 10^{-12} in less than 1 hour on a laptop.

  • Rayleigh quotients wrt low-rank matrices

    d = 2: symmetric n^2 × n^2 matrix A.

      λ_min(A) = min_{x ≠ 0} ⟨x, Ax⟩ / ⟨x, x⟩.

    We now...
    - reshape vector x into an n × n matrix X;
    - reinterpret Ax as a linear operator A: X ↦ A(X);
    - for example, if A = ∑_{k=1}^s B_k ⊗ A_k then

        A(X) = ∑_{k=1}^s A_k X B_k^T.

  • Rayleigh quotients wrt low-rank matrices

    d = 2: symmetric n^2 × n^2 matrix A.

      λ_min(A) = min_{X ≠ 0} ⟨X, A(X)⟩ / ⟨X, X⟩

    with matrix inner product ⟨·,·⟩. We now...
    - restrict X to low-rank matrices.

  • Rayleigh quotients wrt low-rank matrices

    d = 2: symmetric n^2 × n^2 matrix A.

      λ_min(A) ≈ min_{X = UV^T ≠ 0} ⟨X, A(X)⟩ / ⟨X, X⟩.

    - Approximation error governed by low-rank approximability of X.
    - Solved by Riemannian optimization techniques or the alternating linear scheme (ALS).

  • ALS

    ALS for solving

      λ_min(A) ≈ min_{X = UV^T ≠ 0} ⟨X, A(X)⟩ / ⟨X, X⟩.

    Initially:
    - fix target rank r
    - choose U ∈ R^{m×r}, V ∈ R^{n×r} randomly, such that V is ONB

      λ̃ − λ = 6 × 10^3,   residual = 3 × 10^3

  • ALS

    ALS for solving

      λ_min(A) ≈ min_{X = UV^T ≠ 0} ⟨X, A(X)⟩ / ⟨X, X⟩.

    Fix V, optimize for U.

      ⟨X, A(X)⟩ = vec(UV^T)^T A vec(UV^T)
                = vec(U)^T (V ⊗ I)^T A (V ⊗ I) vec(U)

    ⇒ Compute the smallest eigenvalue of the reduced (rn × rn) matrix

      (V ⊗ I)^T A (V ⊗ I).

    Note: Computation of the reduced matrix benefits from the Kronecker structure of A.
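
    A compact sketch of these ALS half-steps for a Kronecker-structured symmetric operator A = ∑_k B_k ⊗ A_k (random toy data, my own; the reduced rn × rn matrices are formed explicitly, which is fine for moderate nr):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, s = 40, 3, 2
sym = lambda M: (M + M.T) / 2
Aks = [sym(rng.standard_normal((n, n))) for _ in range(s)]
Bks = [sym(rng.standard_normal((n, n))) for _ in range(s)]
orth = lambda M: np.linalg.qr(M)[0]

V = orth(rng.standard_normal((n, r)))
for sweep in range(8):
    # Fix V (ONB), optimize U: reduced matrix (V (x) I)^T A (V (x) I) = sum_k (V^T B_k V) (x) A_k
    M = sum(np.kron(V.T @ Bk @ V, Ak) for Ak, Bk in zip(Aks, Bks))
    lam, W = np.linalg.eigh(M)
    U = orth(W[:, 0].reshape((n, r), order="F"))        # vec(U) -> U, then orthonormalize
    # Fix U (ONB), optimize V: reduced matrix (I (x) U)^T A (I (x) U) = sum_k B_k (x) (U^T A_k U)
    M = sum(np.kron(Bk, U.T @ Ak @ U) for Ak, Bk in zip(Aks, Bks))
    lam, W = np.linalg.eigh(M)
    V = orth(W[:, 0].reshape((r, n), order="F").T)      # vec(V^T) -> V, then orthonormalize
    print(sweep, lam[0])                                # current value of the Rayleigh quotient

# For this small n the rank-constrained value can be compared with the exact lambda_min
print(np.linalg.eigvalsh(sum(np.kron(Bk, Ak) for Ak, Bk in zip(Aks, Bks)))[0])
```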

  • ALS

    Fix V, optimize for U.

      λ̃ − λ = 2 × 10^3,   residual = 2 × 10^3

  • ALS

    Orthonormalize U, fix U, optimize for V.

      ⟨X, A(X)⟩ = vec(UV^T)^T A vec(UV^T)
                = vec(V^T)^T (I ⊗ U)^T A (I ⊗ U) vec(V^T)

    ⇒ Compute the smallest eigenvalue of the reduced (rn × rn) matrix

      (I ⊗ U)^T A (I ⊗ U).

    Note: Computation of the reduced matrix benefits from the Kronecker structure of A.

  • ALS

    Orthonormalize U, fix U, optimize for V.

      λ̃ − λ = 1.5 × 10^{-7},   residual = 7.7 × 10^{-3}

  • ALS

    Orthonormalize V, fix V, optimize for U.

      λ̃ − λ = 1 × 10^{-12},   residual = 6 × 10^{-7}

  • ALS

    Orthonormalize U, fix U, optimize for V.

      λ̃ − λ = 7.6 × 10^{-13},   residual = 7.2 × 10^{-8}

  • d ≫ 1: Low-rank tensor formats

  • Tensor network diagrams

    - Introduced by Roger Penrose.
    - Heavily used in quantum mechanics (spin networks).

  • These are two matrices A,B

  • This is the matrix product C = AB

      C_{ij} = ∑_{k=1}^r A_{ik} B_{kj}

  • This is the matrix product C = UΣV^T

      C_{ij} = ∑_{k=1}^r ∑_{ℓ=1}^r U_{ik} Σ_{kℓ} V_{jℓ}

    If r ≪ n: Implicit representation of C via smaller matrices U, V, Σ.

  • This is a tensor X of order 3

    - X ∈ R^{n1×n2×n3} is a 3D array
    - X_{ijk} denotes entry (i, j, k)

  • This is a tensor X of order 3 in Tucker decomposition

      X_{ijk} = ∑_{ℓ1=1}^{r1} ∑_{ℓ2=1}^{r2} ∑_{ℓ3=1}^{r3} C_{ℓ1 ℓ2 ℓ3} U_{i ℓ1} V_{j ℓ2} W_{k ℓ3}

    Implicit representation of X via
    - r1 × r2 × r3 core tensor C
    - n1 × r1 matrix U spans first mode
    - n2 × r2 matrix V spans second mode
    - n3 × r3 matrix W spans third mode.
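
    The entry formula above written out with einsum for hypothetical small sizes; the mode-1 unfolding at the end illustrates that its rank equals r1.

```python
import numpy as np

rng = np.random.default_rng(0)
(n1, n2, n3), (r1, r2, r3) = (20, 25, 30), (4, 5, 6)
C = rng.standard_normal((r1, r2, r3))                  # core tensor
U, V, W = (rng.standard_normal((n, r)) for n, r in [(n1, r1), (n2, r2), (n3, r3)])

# X_ijk = sum_{l1,l2,l3} C_{l1 l2 l3} U_{i l1} V_{j l2} W_{k l3}
X = np.einsum("abc,ia,jb,kc->ijk", C, U, V, W)

X1 = X.reshape(n1, n2 * n3)                            # mode-1 unfolding
print(np.linalg.matrix_rank(X1))                       # = r1 (with probability 1)
```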

  • Tucker decomposition & multilinear rank

    Reshape tensor into matrix by slicing, e.g. for the first dimension:

      X(1) ∈ R^{n1×(n2·n3)}    (mode-1 unfolding)

    Multilinear rank of tensor X ∈ R^{n1×n2×n3} defined by the tuple

      r = (r1, r2, r3),  with r_i = rank(X(i)).

    [Tensor network diagram of the Tucker decomposition: core C connected to factors U, V, W]

    Representation of a rank-r tensor (Tucker decomposition):

      X = C ×_1 U ×_2 V ×_3 W,

    with U ∈ R^{n1×r1}, V ∈ R^{n2×r2}, W ∈ R^{n3×r3}, and core tensor C ∈ R^{r1×r2×r3}.

  • This is a tensor X of order 6 in TT decomposition

    - X implicitly represented by four r × n × r tensors and two n × r matrices
    - Quantum mechanics: MPS (matrix product states)
    - Introduced in numerical analysis by Oseledets and Tyrtyshnikov.

  • Ranks of a tensor in TT decomposition

    This partition corresponds to the low-rank factorization

      X^{(1,2,3)} = U V^T,   X^{(1,2,3)} ∈ R^{n1 n2 n3 × n4 n5 n6},  U ∈ R^{n1 n2 n3 × r3},  V ∈ R^{n4 n5 n6 × r3}.

    X^{(1,2,3)} is a matricization/unfolding/flattening/reshape of X:
    merge multi-indices (1,2,3) into row indices and multi-indices (4,5,6) into column indices.

    The ranks of X^{(1,...,µ)} for µ = 1, ..., d − 1 are the TT ranks of X.

  • When to expect good low-rank approximations?

    - Consider a given function-related tensor and study the best approximation error wrt TT ranks r.
    - Approximation error from separation wrt {x_1, ..., x_a}:

        f(x_1, ..., x_a, x_{a+1}, ..., x_d) ≈ ∑_{k=1}^r g_k(x_1, ..., x_a) h_k(x_{a+1}, ..., x_d)

      for a = 1, ..., d − 1.
    - Well-known: For analytic functions

        error ≲ exp(−r^{max{1/a, 1/(d−a)}}).

    - [Temlyakov'1992, Uschmajew/Schneider'2013]: For f ∈ B^{s,mix}

        error ≲ r^{−2s} (log r)^{2s(max{a, d−a}−1)}.

    Smoothness is neither sufficient nor necessary for high dimensions!

    Need to take into account the topology of the problem (e.g., [Hastings'2008] for quantum ground states). Extended in [DK/Uschmajew'2014].

  • Two tensors X, Y of order 6 in TT decomposition

  • Inner product of two tensors in TT decomposition

    - Carrying out the contractions requires O(d n r^4) instead of O(n^d) operations for tensors of order d.
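
    A minimal sketch of this contraction (random TT cores, my own toy sizes): sweep through the cores, contracting one mode at a time, and compare with the inner product of the explicitly formed tensors (only feasible here because d and n are small).

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, r = 6, 10, 3
ranks = [1] + [r] * (d - 1) + [1]
random_tt = lambda: [rng.standard_normal((ranks[m], n, ranks[m + 1])) for m in range(d)]
X, Y = random_tt(), random_tt()

# Inner product <X, Y> by sweeping through the cores
M = np.ones((1, 1))
for Gx, Gy in zip(X, Y):
    T = np.einsum("ab,aic->bic", M, Gx)      # contract M with the current core of X
    M = np.einsum("bic,bid->cd", T, Gy)      # contract with the current core of Y
print(M.item())

# Reference value from the full tensors
def tt_to_full(cores):
    T = cores[0]
    for G in cores[1:]:
        T = np.tensordot(T, G, axes=([T.ndim - 1], [0]))
    return T.reshape([n] * d)
print(np.sum(tt_to_full(X) * tt_to_full(Y)))
```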

  • This is a tensor X of order 16 in PEPS

    - PEPS = Projected Entangled Pair States
    - Schuch/Wolf/Verstraete/Cirac'2007: inner product of two PEPS is NP hard
    - Landsberg/Qi/Ye'2012: PEPS not Zariski closed

  • ALS for TT decompositions

    - Originates from quantum mechanics = one-site DMRG.
    - More general (numerical analysis) viewpoint developed in [Holtz/Rohwedder/Schneider'2012; Dolgov/Oseledets'2012].

    Goal:

      min { ⟨X, A(X)⟩ / ⟨X, X⟩ : X ∈ M_r, X ≠ 0 }

    [Diagram: M_r = set of tensors in TT decomposition with fixed TT ranks r]

    ALS: Choose one node t, fix all other nodes, determine the new tensor at node t by minimizing the Rayleigh quotient ⟨X, A(X)⟩ / ⟨X, X⟩. This is done for all nodes (a sweep), and sweeps are continued until convergence.

  • ALS for TT decompositions: Nuts & Bolts

    - Subproblems are of size nr^2 × nr^2:
      - Iterative method (LOBPCG) needed if nr^2 > O(10^2).
      - Availability of a good preconditioner is crucial.
      - Preconditioner can be inherited from the full problem.

    - Rank adaptation strategies:
      - DMRG: Merge/optimize/split neighbouring cores.
      - Cheaper alternative: Enrich neighbouring cores with local (preconditioned) gradient information [Dolgov/Savostyanov'2013], [DK/Steinlechner/Uschmajew'2013].

    - Computation of several eigenvalues possible with block-TT format [Dolgov/Khoromskij/Oseledets/Savostyanov'2014].

    - Local convergence results in [Rohwedder/Uschmajew'2013, Uschmajew/Vandereycken'2013].

  • Numerical Experiments - Henon-Heiles, d = 20

    [Plot (ALS): eigenvalue error (err_lambda), residual (res), and iteration count (nr_iter) vs. execution time [s]]

    Size = 128^20 ≈ 10^42. Maximal TT rank 40.

  • Numerical Experiments - Henon-Heiles, d = 100

    [Plot: residual and eigenvalue error vs. execution time [s]]

    - spectral discretization with 10 dof/dimension ⇒ size 10^100
    - Algorithm: Combination of ALS with preconditioned residuals [DK/Steinlechner/Uschmajew'2013].
    - rank adaptivity
    - computed smallest eigenvalue = 70.7415

  • Low-rank tensor techniques

    - Emerged during the last five years in numerical analysis.
    - Successfully applied to:
      - parameter-dependent / multi-dimensional integrals;
      - electronic structure calculations: Hartree-Fock / DFT;
      - stochastic and parametric PDEs;
      - high-dimensional Boltzmann / chemical master / Fokker-Planck / Schrödinger equations;
      - micromagnetism;
      - rational approximation problems;
      - computational homogenization;
      - computational finance;
      - multivariate regression and machine learning;
      - queuing models;
      - context-aware recommender systems;
      - ...
    - For references on these applications, see
      - L. Grasedyck, DK, Ch. Tobler (2013). A literature survey of low-rank tensor approximation techniques. GAMM-Mitteilungen, 36(1).
      - W. Hackbusch (2012). Tensor Spaces and Numerical Tensor Calculus, Springer.

  • Conclusions

    - Numerical linear algebra is alive and healthy.
    - Lots of potential to incorporate techniques from compressed sensing/optimization/... into numerical linear algebra algorithms.

    Some important aspects not covered:

    - Preconditioning.
    - Randomized algorithms.
    - Communication-avoiding algorithms.
    - Merging iterative methods with (adaptive) discretization.
    - ...