
Une journée autour de Ron DeVore, Sep. 13, 2019

Approximation with tree tensor networks

Anthony Nouy

Centrale Nantes, Laboratoire de Mathématiques Jean Leray


Statistical learning

Two typical tasks of statistical learning are to

• approximate a random variable $Y$ by a function of a set of variables $X = (X_1, \dots, X_d)$, from samples of the pair $Z = (X, Y)$ (supervised learning);

• approximate the probability distribution of a random vector $Z = (Z_1, \dots, Z_d)$ from samples of the distribution (unsupervised learning).


Risk

A classical approach is to introduce a risk functional $\mathcal{R}(v)$ whose minimizer over a set of functions $v$ is the target function $u$, and such that

$\mathcal{R}(v) - \mathcal{R}(u)$

measures some distance between the target $u$ and the function $v$.

The risk is defined as an expectation

$\mathcal{R}(v) = \mathbb{E}(\gamma(v, Z))$

where $\gamma$ is called a contrast (or loss) function.

• For least-squares regression in supervised learning, $\mathcal{R}(v) = \mathbb{E}((Y - v(X))^2)$, $u(X) = \mathbb{E}(Y \mid X)$, and $\mathcal{R}(v) - \mathcal{R}(u) = \mathbb{E}((u(X) - v(X))^2) = \|u - v\|_{L^2_\mu}^2$ with $X \sim \mu$.

• For unsupervised learning with $L^2$ loss, $\mathcal{R}(v) = \mathbb{E}(\|v\|_{L^2_\mu}^2 - 2 v(Z))$, and $\mathcal{R}(v) - \mathcal{R}(u) = \|u - v\|_{L^2_\mu}^2$ is the $L^2$ distance between $v$ and the probability density $u$ of $Z$ with respect to a reference measure $\mu$.
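For concreteness, the two contrasts can be written down directly; a minimal Python sketch (the function names and the toy target are ours, for illustration only):

```python
import numpy as np

# Least-squares contrast for supervised learning, z = (x, y):
def gamma_ls(v, z):
    x, y = z
    return (y - v(x)) ** 2

# Contrast for L2 density estimation, where v_sq_norm = ||v||^2_{L2(mu)}
# must be computed separately (it does not depend on the sample z):
def gamma_density(v, z, v_sq_norm):
    return v_sq_norm - 2.0 * v(z)

# Monte Carlo estimate of the least-squares risk R(v) = E(gamma(v, Z))
rng = np.random.default_rng(0)
u = lambda t: np.sin(np.pi * t)      # target: u(X) = E(Y | X)
v = lambda t: t                      # a candidate approximation
x = rng.uniform(-1.0, 1.0, 100_000)
y = u(x) + 0.1 * rng.standard_normal(100_000)
print(np.mean((y - v(x)) ** 2))      # approx. E((u - v)^2) + noise variance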


Risk

Variational methods for PDEs [Eigel et al 2018]: with $Z$ uniformly distributed on $D = (0,1)^d$ and a risk

$\mathcal{R}(v) = \mathbb{E}(|\nabla v(Z)|^2 - 2 v(Z) f(Z)),$

the target function $u \in H^1_0$ is such that

$-\Delta u = f$ on $D$,

and $\mathcal{R}(v) - \mathcal{R}(u) = \|v - u\|_{H^1_0}^2$.


Empirical risk minimization

Given i.i.d. samples $\{z_i\}_{i=1}^n$ of $Z$, an approximation $u_F^n$ of $u$ is obtained by minimization of the empirical risk

$\mathcal{R}_n(v) = \frac{1}{n} \sum_{i=1}^n \gamma(v, z_i)$

over a certain model class $F$.

Denoting by $u_F$ the minimizer of the risk over $F$, the error decomposes as

$\mathcal{R}(u_F^n) - \mathcal{R}(u) = \underbrace{\mathcal{R}(u_F^n) - \mathcal{R}(u_F)}_{\text{estimation error}} + \underbrace{\mathcal{R}(u_F) - \mathcal{R}(u)}_{\text{approximation error}}$

For a given sample, when taking larger and larger model classes, the approximation error decreases (↘) while the estimation error increases (↗).

Methods should be proposed for the selection of a model class, taking the best from the available information.
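The trade-off can be observed on a toy least-squares problem; the sketch below (our own illustration, not from the talk) fits polynomials of increasing degree on a fixed sample and estimates both the empirical and the true risk:

```python
import numpy as np

rng = np.random.default_rng(1)
u = lambda t: np.cos(3 * t)                    # target function
n = 30
x = rng.uniform(-1, 1, n)
y = u(x) + 0.1 * rng.standard_normal(n)        # i.i.d. samples of Z = (X, Y)

x_test = rng.uniform(-1, 1, 100_000)           # large sample to estimate the true risk
for degree in [1, 3, 6, 12, 20]:
    # empirical risk minimizer over polynomials of the given degree
    coef = np.polynomial.polynomial.polyfit(x, y, degree)
    v = lambda t: np.polynomial.polynomial.polyval(t, coef)
    emp_risk = np.mean((y - v(x)) ** 2)
    true_risk = np.mean((u(x_test) - v(x_test)) ** 2) + 0.1 ** 2  # + noise variance
    print(degree, emp_risk, true_risk)
```

As the degree grows, the empirical risk keeps decreasing while the true risk eventually increases: the estimation error dominates.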


Outline

1 Tree tensor networks

2 Estimation error

3 Approximation error

4 Learning algorithms

1 Tree tensor networks

Tensor ranks

We consider a space $H$ of functions defined on the set $\mathcal{X} = \mathcal{X}_1 \times \dots \times \mathcal{X}_d$ equipped with a product measure $\mu = \mu_1 \otimes \dots \otimes \mu_d$.

For a subset of variables $\alpha \subset \{1, \dots, d\} := D$, a function $v \in H$ can be identified with a bivariate function

$v(x_\alpha, x_{\alpha^c})$

where $x_\alpha$ and $x_{\alpha^c}$ are complementary groups of variables.

The canonical rank of this bivariate function is called the $\alpha$-rank of $v$, denoted $\mathrm{rank}_\alpha(v)$. It is the minimal integer $r_\alpha$ such that

$v(x) = \sum_{k=1}^{r_\alpha} v_k^\alpha(x_\alpha)\, w_k^{\alpha^c}(x_{\alpha^c})$
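On a finite product set, the $\alpha$-rank is just the matrix rank of the matricization grouping the axes in $\alpha$ against the others; a small numpy sketch (our illustration):

```python
import numpy as np

def alpha_rank(tensor, alpha):
    """Rank of the matricization grouping the axes in alpha against the rest."""
    d = tensor.ndim
    alpha = list(alpha)
    alpha_c = [i for i in range(d) if i not in alpha]
    mat = np.transpose(tensor, alpha + alpha_c).reshape(
        np.prod([tensor.shape[i] for i in alpha]), -1)
    return np.linalg.matrix_rank(mat)

# Example: an elementary (rank-1) tensor has alpha-rank 1 for every alpha
v = np.einsum('i,j,k->ijk', np.ones(3), np.arange(4.0), np.ones(5))
print(alpha_rank(v, [0]), alpha_rank(v, [0, 2]))  # -> 1 1
```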


Tensor formats

For $T \subset 2^D$ a collection of subsets of $D$ and a given tuple $r = (r_\alpha)_{\alpha \in T}$, a tensor format is defined by

$\mathcal{T}_r^T(H) = \{v \in H : \mathrm{rank}_\alpha(v) \le r_\alpha, \ \alpha \in T\}.$

In the particular case where $T$ is a dimension partition tree, $\mathcal{T}_r^T$ is a tree-based tensor format.

Figure: three dimension partition trees over $D = \{1,2,3,4,5\}$. Tucker: root $\{1,\dots,5\}$ with leaves $\{1\},\dots,\{5\}$. Tensor Train: linear tree with interior nodes $\{1,2\}$, $\{1,2,3\}$, $\{1,2,3,4\}$. Hierarchical Tucker: balanced tree with interior nodes $\{1,2,3\}$, $\{2,3\}$, $\{4,5\}$.


Tree-based formats as tensor networks

Consider a tensor space $H = H_1 \otimes \dots \otimes H_d$ of functions in $L^2_\mu(\mathcal{X})$, and let $\{\phi_{i_\nu}^\nu : i_\nu \in I^\nu\}$ be a basis of $H_\nu \subset L^2_{\mu_\nu}(\mathcal{X}_\nu)$, typically polynomials, wavelets...

A function $v$ in $\mathcal{T}_r^T(H) = \{v \in H : \mathrm{rank}_T(v) \le r\}$ admits an explicit representation

$v(x) = \sum_{\substack{i_\alpha \in I^\alpha \\ \alpha \in \mathcal{L}(T)}} \ \sum_{\substack{1 \le k_\beta \le r_\beta \\ \beta \in T}} \ \prod_{\alpha \in T \setminus \mathcal{L}(T)} a^\alpha_{(k_\beta)_{\beta \in S(\alpha)}, k_\alpha} \ \prod_{\alpha \in \mathcal{L}(T)} a^\alpha_{i_\alpha, k_\alpha} \phi^\alpha_{i_\alpha}(x_\alpha)$

where each parameter $a^\alpha$ is in a tensor space $\mathbb{R}^{K_\alpha}$.

Figure: tensor network diagram for the Hierarchical Tucker tree, with tensors $a^{1,2,3,4,5}$, $a^{1,2,3}$, $a^{2,3}$, $a^{4,5}$ at the interior nodes and $a^1, \dots, a^5$ at the leaves.

Representation complexity

$C(T, r, H) = \sum_{\alpha \in T \setminus \mathcal{L}(T)} r_\alpha \prod_{\beta \in S(\alpha)} r_\beta + \sum_{\alpha \in \mathcal{L}(T)} r_\alpha \dim(H_\alpha).$
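A dimension partition tree and the complexity $C(T, r, H)$ are straightforward to encode; a sketch under our own conventions (a tree maps each node, a frozenset of dimensions, to its list of children):

```python
def storage_complexity(tree, ranks, leaf_dims):
    """C(T, r, H): interior nodes store r_alpha * prod of children ranks,
    leaves store r_alpha * dim(H_alpha)."""
    c = 0
    for node, children in tree.items():
        if children:                      # interior node
            p = ranks[node]
            for child in children:
                p *= ranks[child]
            c += p
        else:                             # leaf node
            c += ranks[node] * leaf_dims[node]
    return c

# Hierarchical Tucker tree on {1,...,5} from the slide
f = frozenset
tree = {
    f({1, 2, 3, 4, 5}): [f({1, 2, 3}), f({4, 5})],
    f({1, 2, 3}): [f({1}), f({2, 3})],
    f({2, 3}): [f({2}), f({3})],
    f({4, 5}): [f({4}), f({5})],
    f({1}): [], f({2}): [], f({3}): [], f({4}): [], f({5}): [],
}
ranks = {node: 1 if len(node) == 5 else 3 for node in tree}  # rank 1 at the root
leaf_dims = {node: 10 for node in tree if not tree[node]}
print(storage_complexity(tree, ranks, leaf_dims))
```

The Tucker and tensor-train trees above fit the same encoding with different children lists.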


Tree-based tensor format as a deep neural network

By identifying a tensor $a^{(\alpha)} \in \mathbb{R}^{n_1 \times \dots \times n_s \times r_\alpha}$ with an $\mathbb{R}^{r_\alpha}$-valued multilinear function

$f^{(\alpha)} : \mathbb{R}^{n_1} \times \dots \times \mathbb{R}^{n_s} \to \mathbb{R}^{r_\alpha},$

a function $v$ in $\mathcal{T}_r^T$ admits a representation as a tree-structured composition of multilinear functions $\{f^{(\alpha)}\}_{\alpha \in T}$:

Figure: the same tree with $f^{1,2,3,4,5}$ at the root, $f^{1,2,3}$, $f^{2,3}$, $f^{4,5}$ at the interior nodes, and $f^1, \dots, f^5$ at the leaves.

$v(x) = f^D\big(f^{1,2,3}\big(f^1(\Phi_1(x_1)), f^{2,3}(f^2(\Phi_2(x_2)), f^3(\Phi_3(x_3)))\big), f^{4,5}\big(f^4(\Phi_4(x_4)), f^5(\Phi_5(x_5))\big)\big)$

where $\Phi_\nu(x_\nu) = (\phi^\nu_{i_\nu}(x_\nu))_{i_\nu \in I^\nu} \in \mathbb{R}^{\# I^\nu}$.

It corresponds to a deep network with a sparse architecture (given by $T$), a depth bounded by $d - 1$, and width at level $\ell$ related to the $\alpha$-ranks of the nodes $\alpha$ at level $\ell$.
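The network view can be made concrete by evaluating $v(x)$ recursively, each interior node applying a multilinear map (a tensor contraction) to the outputs of its children; a sketch with random parameters and monomial features (all conventions are ours):

```python
import numpy as np

f = frozenset
tree = {  # Hierarchical Tucker tree on {1,...,5}
    f({1, 2, 3, 4, 5}): [f({1, 2, 3}), f({4, 5})],
    f({1, 2, 3}): [f({1}), f({2, 3})],
    f({2, 3}): [f({2}), f({3})],
    f({4, 5}): [f({4}), f({5})],
    f({1}): [], f({2}): [], f({3}): [], f({4}): [], f({5}): [],
}
root = f({1, 2, 3, 4, 5})
r, m = 3, 4                                   # ranks and feature dimension
rng = np.random.default_rng(0)
params = {}
for node, children in tree.items():
    out = 1 if node == root else r            # the root outputs a scalar
    ins = [m] if not children else [r] * len(children)
    params[node] = rng.standard_normal((*ins, out))

def evaluate(node, x):
    """Output of the multilinear map f^alpha at node, a vector of size r_alpha."""
    children = tree[node]
    if not children:                          # leaf: apply a^alpha to the features
        nu = next(iter(node))
        phi = x[nu - 1] ** np.arange(m)       # monomial features Phi_nu(x_nu)
        return phi @ params[node]
    zs = [evaluate(c, x) for c in children]   # outputs of the children
    a = params[node]
    for z in zs:                              # contract one child mode at a time
        a = np.tensordot(z, a, axes=(0, 0))
    return a

x = rng.uniform(-1, 1, 5)
print(evaluate(root, x))                      # v(x), a scalar (shape (1,))
```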


2 Estimation error

Estimation error

Given a model class $F$, a minimizer $u_F$ of the risk $\mathcal{R}$ over $F$ and a minimizer $u_F^n$ of the empirical risk $\mathcal{R}_n$ over $F$, the estimation error satisfies

$\mathcal{R}(u_F^n) - \mathcal{R}(u_F) \le \mathcal{R}(u_F^n) - \mathcal{R}_n(u_F^n) + \mathcal{R}_n(u_F) - \mathcal{R}(u_F)$

so that

$\mathbb{E}(\mathcal{R}(u_F^n) - \mathcal{R}(u_F)) \le \mathbb{E}\Big(\sup_{v \in F} |\mathcal{R}_n(v) - \mathcal{R}(v)|\Big) = \mathbb{E}\Big(\sup_{v \in F} \Big|\frac{1}{n}\sum_{i=1}^n \gamma(v, z_i) - \mathbb{E}(\gamma(v, Z))\Big|\Big)$


Estimation error

Assume that $F$ is compact in $L^\infty$, and that for all $v, w \in F$, the contrast is uniformly bounded and Lipschitz:

$|\gamma(v, Z)| \le M, \qquad |\gamma(v, Z) - \gamma(w, Z)| \le L \|v - w\|_{L^\infty}.$

Then using a standard concentration inequality (here Hoeffding), we obtain for each fixed $v$

$\mathbb{P}\Big(\Big|\frac{1}{n}\sum_{i=1}^n \gamma(v, z_i) - \mathbb{E}(\gamma(v, Z))\Big| \ge \varepsilon M\Big) \le 2 e^{-n\varepsilon^2/2},$

and a union bound over an $\frac{\varepsilon M}{2L}$-covering of $F$ yields

$\mathbb{P}\Big(\sup_{v \in F}\Big|\frac{1}{n}\sum_{i=1}^n \gamma(v, z_i) - \mathbb{E}(\gamma(v, Z))\Big| \ge \varepsilon M\Big) \le 2 \mathcal{N}_{\frac{\varepsilon M}{2L}} e^{-n\varepsilon^2/2} := \eta(n, \varepsilon, F)$

where $\mathcal{N}_{\frac{\varepsilon M}{2L}} = \mathcal{N}(\frac{\varepsilon M}{2L}, F, \|\cdot\|_{L^\infty})$ is the covering number of $F$.

We deduce that for any $\varepsilon$,

$\mathbb{E}(\mathcal{R}(u_F^n) - \mathcal{R}(u_F)) \le \varepsilon M + 2M\eta(n, \varepsilon, F),$

and a bound can be obtained by taking the infimum over $\varepsilon$.
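For given constants, this bound can be optimized numerically over $\varepsilon$; a sketch with an illustrative metric entropy bound of the form appearing below (all constants are made up):

```python
import numpy as np

def estimation_error_bound(n, M, L, log_covering):
    """Minimize eps*M + 2*M*eta(n, eps, F) over a grid of eps,
    with eta = 2 * N_{eps*M/(2L)} * exp(-n * eps**2 / 2)."""
    eps = np.logspace(-4, 0, 400)
    log_eta = np.log(2) + log_covering(eps * M / (2 * L)) - n * eps ** 2 / 2
    bound = eps * M + 2 * M * np.exp(np.minimum(log_eta, 700.0))  # clip: avoid overflow
    return eps[np.argmin(bound)], bound.min()

# Illustrative entropy bound: log N(delta) = C * log(3 * R / delta)
C, R, M, L, n = 200.0, 1.0, 1.0, 1.0, 10_000
log_cov = lambda delta: C * np.log(3 * R / delta)
print(estimation_error_bound(n, M, L, log_cov))
```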


Metric entropy of tree-based tensor formats [with Bertrand Michel]

Assume $H \subset L^\infty(\mathcal{X})$ with basis functions $\{\phi_i\}_{i \in I}$ normalized in $L^\infty(\mathcal{X})$.

For any representation of $v \in \mathcal{T}_r^T(H)$ with parameters $\{f^\alpha : \alpha \in T\}$, the multilinearity of the parametrization implies

$\|v\|_{L^\infty} \le \prod_\alpha \|f^\alpha\|_{\alpha,\infty},$

with a suitable choice of norms

$\|f^\alpha\|_{\alpha,\infty} = \sup_{\|z_\beta\|_\infty \le 1} \|f^\alpha((z_\beta)_{\beta \in S(\alpha)})\|_\infty.$

Considering the model class

$F = R\, F_1, \qquad F_1 = \{v \in \mathcal{T}_r^T(H) : \max_{\alpha \in T} \|f^\alpha\|_{\alpha,\infty} \le 1\},$

we obtain an upper bound on the metric entropy

$\log \mathcal{N}(\varepsilon, F, \|\cdot\|_{L^\infty}) \le C(T, r, H) \log(3 \varepsilon^{-1} R \# T),$

where $C(T, r, H)$ is the representation complexity of elements of $F$.


Coming back to estimation error

We conclude that

$\mathbb{E}(\mathcal{R}(u_F^n) - \mathcal{R}(u_F)) \le \varepsilon M + 2M\eta$

for

$n \ge 2\varepsilon^{-2}\Big(\log(2\eta^{-1}) + C(T, r, H) \log(6 \varepsilon^{-1} R L M^{-1} \# T)\Big),$

and

$\mathbb{E}(\mathcal{R}(u_F^n) - \mathcal{R}(u_F)) \lesssim M \sqrt{\frac{C(T, r, H)}{n}}$

up to log factors.


Improving estimation error

• Improved bounds may be obtained using refined concentration inequalities for suprema of empirical processes

$\sup_{v \in F} \Big|\frac{1}{n}\sum_{i=1}^n \gamma(v, z_i) - \mathbb{E}(\gamma(v, Z))\Big|.$

• A better performance can be obtained by using as empirical risk

$\mathcal{R}_n(v) = \frac{1}{n}\sum_{i=1}^n w_i \gamma(v, z_i)$

where the $z_i$ are i.i.d. samples from a measure $d\widetilde{P}_Z = w(z)\, dP_Z$ and the weights are $w_i = w(z_i)^{-1}$.

• The choice of sampling measure should be adapted to the risk and model class, and may be deduced from concentration inequalities. See [Cohen and Migliorati 2017] for least-squares regression and linear models.

• For tree tensor networks, use a different sample for each parameter. Link to empirical PCA [Nouy 2019] [Haberstich, Nouy, Perrin 2020].
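A sketch of the weighted empirical risk for least-squares regression with polynomial models, sampling from the arcsine (Chebyshev) density, a common choice in this setting (the specific weight and target are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# Reference measure: uniform on [-1, 1] (density 1/2).
# Sampling measure dP~ = w(z) dP with more mass near the endpoints:
# the arcsine (Chebyshev) density, of density 1 / (pi * sqrt(1 - z^2)).
z = np.cos(np.pi * rng.uniform(0, 1, n))                 # samples from the arcsine law
w = lambda t: (1 / (np.pi * np.sqrt(1 - t ** 2))) / 0.5  # density ratio dP~/dP

u = lambda t: np.exp(t)
y = u(z) + 0.05 * rng.standard_normal(n)

# Weighted least-squares ERM over polynomials of degree 5
deg = 5
V = np.vander(z, deg + 1)                    # design matrix
weights = 1.0 / w(z)                         # w_i = w(z_i)^{-1}
sqrtw = np.sqrt(weights)
coef, *_ = np.linalg.lstsq(V * sqrtw[:, None], y * sqrtw, rcond=None)
print(np.mean(weights * (y - V @ coef) ** 2))  # weighted empirical risk
```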


3 Approximation error

Approximation properties of tree tensor networks

We want to quantify the approximation error

$\min_{v \in \mathcal{T}_r^T} \mathcal{R}(v) - \mathcal{R}(u)$

for a target function $u$ in a given function class, i.e. study the expressive power of tree tensor networks and compare it with other approximation tools.


Approximation properties of tree tensor networks

We consider least-squares regression and $L^2$ density estimation, where

$\min_{v \in \mathcal{T}_r^T} \mathcal{R}(v) - \mathcal{R}(u) = \min_{v \in \mathcal{T}_r^T} \|u - v\|_{L^2_\mu}^2 := e_{T,r}(u)^2.$

Since

$\mathcal{T}_r^T = \bigcap_{\alpha \in T} \{v : \mathrm{rank}_\alpha(v) \le r_\alpha\},$

a lower bound is given by

$e_{T,r}(u) \ge \max_{\alpha \in T} e_{\alpha, r_\alpha}(u)$

where

$e_{\alpha, r_\alpha}(u) = \min_{\mathrm{rank}_\alpha(v) \le r_\alpha} \|u - v\|_{L^2_\mu}.$


Singular value decomposition

$u(x_1, \dots, x_d)$ can be identified with a bivariate function $u(x_\alpha, x_{\alpha^c})$ in $L^2_{\mu_\alpha \otimes \mu_{\alpha^c}}(\mathcal{X}_\alpha \times \mathcal{X}_{\alpha^c})$ which admits a singular value decomposition

$u(x_\alpha, x_{\alpha^c}) = \sum_{k=1}^{\mathrm{rank}_\alpha(u)} \sigma_k^\alpha v_k^\alpha(x_\alpha) v_k^{\alpha^c}(x_{\alpha^c}).$

The problem of best approximation of $u$ by a function with $\alpha$-rank $r_\alpha$ admits as a solution the truncated singular value decomposition $u_{r_\alpha}$ of $u$,

$u_{r_\alpha}(x_\alpha, x_{\alpha^c}) = \sum_{k=1}^{r_\alpha} \sigma_k^\alpha v_k^\alpha(x_\alpha) v_k^{\alpha^c}(x_{\alpha^c}),$

where $\{v_1^\alpha, \dots, v_{r_\alpha}^\alpha\}$ are the $r_\alpha$ $\alpha$-principal components of $u$, and

$e_{\alpha, r_\alpha}(u) = \sqrt{\sum_{k > r_\alpha} (\sigma_k^\alpha)^2}.$
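On a discretized tensor (with $L^2$ taken with respect to the uniform measure on a grid), $e_{\alpha, r_\alpha}(u)$ is the tail of the singular values of the corresponding matricization; a numpy check (our illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.standard_normal((8, 9, 10))             # tensor on a finite product grid

alpha = [0, 2]                                   # group axes {1, 3} against {2}
mat = np.transpose(u, alpha + [1]).reshape(8 * 10, 9)
s = np.linalg.svd(mat, compute_uv=False)         # alpha-singular values

r = 4
tail = np.sqrt(np.sum(s[r:] ** 2))               # e_{alpha, r}(u)

# Check against the error of the truncated SVD (best alpha-rank-r approximation)
U, sv, Vt = np.linalg.svd(mat, full_matrices=False)
mat_r = (U[:, :r] * sv[:r]) @ Vt[:r]
print(np.allclose(np.linalg.norm(mat - mat_r), tail))   # -> True
```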


Linear widths of multivariate functions

The subspace of principal components $U_\alpha = \mathrm{span}\{v_1^\alpha, \dots, v_{r_\alpha}^\alpha\}$ is a solution of

$\min_{\dim(U_\alpha) = r_\alpha} \int \|u(\cdot, x_{\alpha^c}) - P_{U_\alpha} u(\cdot, x_{\alpha^c})\|_{L^2_{\mu_\alpha}}^2 \, d\mu_{\alpha^c}(x_{\alpha^c}) = e_{\alpha, r_\alpha}(u)^2,$

where $P_{U_\alpha}$ is the orthogonal projection onto $U_\alpha$.

Considering the set of partial evaluations of $u$,

$K_\alpha(u) = \{u(\cdot, x_{\alpha^c}) : x_{\alpha^c} \in \mathcal{X}_{\alpha^c}\} \subset L^2_{\mu_\alpha}(\mathcal{X}_\alpha),$

we have

$e_{\alpha, r_\alpha}(u) \le \min_{\dim(U_\alpha) = r_\alpha} \sup_{v \in K_\alpha(u)} \|v - P_{U_\alpha} v\|_{L^2_{\mu_\alpha}} = d_{r_\alpha}(K_\alpha(u))_{L^2_{\mu_\alpha}},$

the upper bound being the Kolmogorov $r_\alpha$-width of $K_\alpha(u)$.

Furthermore, since $e_{\alpha, r_\alpha}(u) = e_{\alpha^c, r_\alpha}(u)$, we have

$e_{\alpha, r_\alpha}(u) \le \min\{d_{r_\alpha}(K_\alpha(u))_{L^2_{\mu_\alpha}}, \ d_{r_\alpha}(K_{\alpha^c}(u))_{L^2_{\mu_{\alpha^c}}}\}.$


Upper bound through higher order singular value decomposition

Given the spaces of principal components $U_\alpha$, $\alpha \in T$, we can define an approximation

$u_r = \prod_{\alpha \in T} P_{U_\alpha} u \in \mathcal{T}_r^T$

obtained by successive orthogonal projections (suitably ordered) such that [Grasedyck 2010]

$e_{T,r}(u)^2 \le \|u - u_r\|^2 \le \sum_{\alpha \in T} e_{\alpha, r_\alpha}(u)^2.$

Error bounds can then be obtained from information on the $\alpha$-singular values of $u$, or on Kolmogorov widths of the sets $K_\alpha(u)$ and $K_{\alpha^c}(u)$ of partial evaluations of $u$.
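The bound is easy to verify numerically in the Tucker case, where $T$ consists of the leaves and $u_r$ is obtained by projecting each mode onto its principal components; a sketch (our illustration, uniform measure on a grid):

```python
import numpy as np

rng = np.random.default_rng(4)
u = rng.standard_normal((10, 11, 12))

r = (3, 4, 5)
errs_sq, proj = [], []
for mode in range(3):
    mat = np.moveaxis(u, mode, 0).reshape(u.shape[mode], -1)
    U, s, _ = np.linalg.svd(mat, full_matrices=False)
    proj.append(U[:, :r[mode]] @ U[:, :r[mode]].T)   # projector P_{U_alpha}
    errs_sq.append(np.sum(s[r[mode]:] ** 2))          # e_{alpha, r_alpha}(u)^2

ur = u
for mode in range(3):                                  # successive projections
    ur = np.moveaxis(np.tensordot(proj[mode], np.moveaxis(ur, mode, 0), axes=1),
                     0, mode)

err_sq = np.sum((u - ur) ** 2)
print(err_sq <= sum(errs_sq))                          # -> True (HOSVD bound)
```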


Expressive power of tree tensor networks

For standard regularity classes, tree tensor networks perform almost as well as standard approximation tools.

For example, for $u \in H^k_{\mathrm{mix}}((0,1)^d)$, $K_\alpha(u) \subset H^k_{\mathrm{mix}}((0,1)^{\#\alpha})$ for any $\alpha$. From bounds on Kolmogorov widths of Sobolev balls,

$e_{\alpha, r_\alpha}(u) \le d_{r_\alpha}(K_\alpha(u)) \lesssim r_\alpha^{-k} \log(r_\alpha)^{k(\#\alpha - 1)},$

we obtain that the complexity to achieve a precision $\varepsilon$ (with binary trees) is

$C(\varepsilon) \lesssim \varepsilon^{-3/k} d^{1 + 3/(2k)}$, up to powers of $\log(\varepsilon^{-1})$.

This performs almost as well as hyperbolic cross approximation (sparse tensors).

Similar results in [Schneider and Uschmajew 2014] using results on bilinear approximation [Temlyakov 1989]. See also [Griebel and Harbrecht 2019].


Expressive power of tree tensor networks [with M. Bachmayr and R. Schneider]

But they can perform much better for non-standard classes of functions, e.g. a tree-structured composition of regular functions $\{f_\alpha : \alpha \in T\}$; see [Mhaskar, Liao, Poggio 2016] for deep neural networks.

$f_{1,2,3,4}\big(f_{1,2}(f_1(x_1), f_2(x_2)), f_{3,4}(f_3(x_3), f_4(x_4))\big)$

Figure: the corresponding tree, with root $\{1,2,3,4\}$, children $\{1,2\}$ and $\{3,4\}$, and leaves $\{1\}, \{2\}, \{3\}, \{4\}$.

Assuming that the functions $f_\alpha \in W^{k,\infty}$ with $\|f_\alpha\|_{L^\infty} \le 1$ and $\|f_\alpha\|_{W^{k,\infty}} \le B$, the complexity to achieve an accuracy $\varepsilon$ is

$C(\varepsilon) \lesssim \varepsilon^{-3/k} (L+1)^3 B^{3L} d^{1 + 3/(2k)},$

with $L = \log_2(d)$ for a balanced tree and $L + 1 = d$ for a linear tree.

• Bad influence of the depth through the norm $B$ of the functions $f_\alpha$ (roughness).
• For $B \le 1$ (and even for 1-Lipschitz functions), the complexity only scales polynomially in $d$: no curse of dimensionality!


Expressive power of tree tensor networks

A function in canonical format (shallow network)

$u(x) = \sum_{k=1}^r u_k^1(x_1) \cdots u_k^d(x_d)$

can be represented in tree-based format with a similar complexity.

Conversely, a typical function in tree-based format $\mathcal{T}_r^T$ has a canonical rank depending exponentially on $d$.

Deep is better! For a balanced or linear binary tree $T$, the subset of tensors $v$ in $\mathcal{T}_r^T(\mathbb{R}^{n \times \dots \times n})$ with canonical rank less than $\min\{n, r\}^{d/2}$ is of Lebesgue measure 0 [Cohen et al. 2016, Khrulkov et al 2018].

But a typical function in $\mathcal{T}_r^T$ may admit a representation complexity exponential in $d$ when using another tree.


Influence of the tree

As an example, consider the probability distribution $f(x) = \mathbb{P}(X = x)$ of a Markov chain $X = (X_1, \dots, X_d)$ given by

$f(x) = f_1(x_1) f_{2|1}(x_2|x_1) \cdots f_{d|d-1}(x_d|x_{d-1}),$

where the bivariate functions $f_{i|i-1}$ have a rank bounded by $r$.

• With the linear tree $T$ containing the interior nodes $\{1,2\}, \{1,2,3\}, \dots, \{1,\dots,d-1\}$, $f$ admits a representation in tree-based format with storage complexity in $r^4$.
• The canonical rank of $f$ is exponential in $d$.
• But when considering the linear tree $T_\sigma = \{\sigma(\alpha) : \alpha \in T\}$ obtained by applying the permutation $\sigma = (1, 3, \dots, d-1, 2, 4, \dots, d)$ to the tree $T$, the storage complexity in tree-based format is also exponential in $d$.
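A numeric illustration (our construction): for the joint pmf of a small discrete Markov chain, unfoldings along contiguous groups of variables have rank at most the state-space size, while interleaved groups give much larger ranks.

```python
import numpy as np

rng = np.random.default_rng(5)
d, m = 6, 4                                            # chain length, state-space size
P = rng.random((m, m)); P /= P.sum(1, keepdims=True)   # transition matrix
f = np.full(m, 1 / m)                                  # initial distribution
for _ in range(d - 1):                                 # joint pmf, one variable at a time
    f = f[..., None] * P[(None,) * (f.ndim - 1)]

def unfold_rank(t, axes):
    rest = [i for i in range(t.ndim) if i not in axes]
    mat = np.transpose(t, list(axes) + rest).reshape(m ** len(axes), -1)
    return np.linalg.matrix_rank(mat)

print(unfold_rank(f, [0, 1, 2]))   # contiguous split: rank <= m
print(unfold_rank(f, [0, 2, 4]))   # interleaved split: much larger rank
```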


Approximation properties of tree tensor networks

Choosing a good tree (the architecture of the network) is a crucial but combinatorial problem...

Figure: four different dimension partition trees over $\{1, \dots, 8\}$, with different shapes and leaf orderings, illustrating the size of the search space.


4 Learning algorithms

Learning algorithm for tree tensor networks

A function $v$ in the model class $\mathcal{T}_r^T(H)$ has a representation $v(x) = \Psi(x)((a^\alpha)_{\alpha \in T})$ where each parameter $a^\alpha$ is in a tensor space $\mathbb{R}^{K_\alpha}$ and $\Psi(x)$ is a multilinear map.

The empirical risk minimization problem over the nonlinear model class $\mathcal{T}_r^T$,

$\min_{(a^\alpha)_{\alpha \in T}} \frac{1}{n} \sum_{i=1}^n \gamma(\Psi(\cdot)((a^\alpha)_{\alpha \in T}), z_i),$

can be solved using an alternating minimization algorithm, solving at each step an empirical risk minimization problem with a linear model

$\Psi(x)((a^\alpha)_{\alpha \in T}) = \sum_{k \in K_\alpha} \Psi_k^\alpha(x) a_k^\alpha,$

with functions $\Psi_k^\alpha(x)$ depending on the fixed parameters $a^\beta$, $\beta \ne \alpha$ (see the sketch after the list below).

• In an $L^2$ setting, a possible re-parametrization yields orthonormal functions $\Psi_k^\alpha(x)$.
• Sparsity in the tensors $a^\alpha$ can be exploited in different ways, e.g. by proposing different sparsity patterns and using a model selection technique (e.g. based on validation).
• For a leaf node $\nu$, the approximation space $H_\nu$ can be selected from a candidate sequence of spaces $H_0^\nu \subset \dots \subset H_L^\nu \subset \dots$
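A minimal sketch of the alternating scheme for a tensor-train (linear tree) model with the least-squares contrast; this is our simplified illustration, not the full algorithm of the talk:

```python
import numpy as np

rng = np.random.default_rng(6)
n, d, m, r = 2000, 3, 6, 3                        # samples, variables, features, TT-rank
X = rng.uniform(-1, 1, (n, d))
y = np.prod(2 + X, axis=1)                         # a target with TT-rank 1

Phi = [np.vander(X[:, nu], m) for nu in range(d)]  # polynomial features, shape (n, m)
shapes = [(1, m, r), (r, m, r), (r, m, 1)]         # TT cores with boundary ranks 1
G = [0.3 * rng.standard_normal(s) for s in shapes]

def core_features(nu):
    # contract the features of variable nu into its core: shape (n, r_in, r_out)
    return np.einsum('nm,amb->nab', Phi[nu], G[nu])

for sweep in range(15):
    for alpha in range(d):
        L = np.ones((n, 1))
        for nu in range(alpha):                    # contract the cores left of alpha
            L = np.einsum('na,nab->nb', L, core_features(nu))
        R = np.ones((n, 1))
        for nu in range(d - 1, alpha, -1):         # contract the cores right of alpha
            R = np.einsum('nab,nb->na', core_features(nu), R)
        # v(x_i) is linear in the core G[alpha]: solve a least-squares problem
        A = np.einsum('na,nm,nb->namb', L, Phi[alpha], R).reshape(n, -1)
        g, *_ = np.linalg.lstsq(A, y, rcond=None)
        G[alpha] = g.reshape(G[alpha].shape)

pred = np.ones((n, 1))
for nu in range(d):                                # evaluate the learned model
    pred = np.einsum('na,nab->nb', pred, core_features(nu))
print(np.sqrt(np.mean((pred[:, 0] - y) ** 2)))     # training RMSE, typically near 0 here
```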


Learning algorithm for tree tensor networks

Selecting an optimal model class $\mathcal{T}_r^T(H)$ is a combinatorial problem.

An algorithm is proposed in [Grelier, Nouy, Chevreuil 2018] that performs adaptation of the tree $T$ (architecture), the rank $r$ (widths) and the approximation space $H$.

Start with an initial tree $T$ and learn an approximation $v \in \mathcal{T}_r^T(H)$ with rank $r = (1, \dots, 1)$. Then repeat:

• Increase some ranks $r_\alpha$ based on estimates of the truncation errors

$\min_{\mathrm{rank}_\alpha(v) \le r_\alpha} \mathcal{R}(v) - \mathcal{R}(u).$

• Learn an approximation $v$ in $\mathcal{T}_r^T(H)$, with adaptive selection of $H$.
• Optimize the tree to reduce the storage complexity of $v$ (stochastic algorithm using a suitable distribution over the set of trees; a toy illustration is sketched below),

$\min_T C(T, \mathrm{rank}_T(v), H).$
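The tree optimization step can be illustrated on the Markov-chain example of the previous section: a random search over leaf orderings of a linear tree, keeping the ordering with the smallest storage. This toy stand-in only mimics the stochastic algorithm mentioned above:

```python
import numpy as np

rng = np.random.default_rng(8)
d, m = 6, 3
P = rng.random((m, m)); P /= P.sum(1, keepdims=True)
f = np.full(m, 1 / m)
for _ in range(d - 1):                        # joint pmf of a Markov chain
    f = f[..., None] * P[(None,) * (f.ndim - 1)]

def tt_ranks(t, order):
    """TT-ranks of t for the linear tree with leaves ordered by `order`."""
    t = np.transpose(t, order)
    return [np.linalg.matrix_rank(t.reshape(m ** (i + 1), -1))
            for i in range(d - 1)]

def storage(ranks):                           # proxy for C(T, r, H), cores of size r*m*r
    r = [1] + ranks + [1]
    return sum(r[i] * m * r[i + 1] for i in range(d))

best = list(range(d))
best_cost = storage(tt_ranks(f, best))
for _ in range(200):                          # random search over leaf orderings
    cand = list(rng.permutation(d))
    cost = storage(tt_ranks(f, cand))
    if cost < best_cost:
        best, best_cost = cand, cost
print(best, best_cost)                        # low-storage orderings follow the chain
```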


Example in supervised learning: composition of functions

Consider a tree-structured composition of functions

$u(X) = h(h(h(X_1, X_2), h(X_3, X_4)), h(h(X_5, X_6), h(X_7, X_8))),$

where $h(t, s) = 9^{-1}(2 + ts)^2$ is a bivariate function and where the $d = 8$ random variables $X_1, \dots, X_8$ are independent and uniform on $[-1, 1]$.

Figure: the binary tree of compositions, with $h$ at each interior node and the variables $X_1, \dots, X_8$ at the leaves.

We use polynomial approximation spaces $H$ (with adaptive selection of the degree), so that the function $u$ could (in principle) be recovered exactly for any choice of tree with sufficiently high ranks.
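The target is easy to reproduce; a sketch generating training data (we assume noiseless observations of $u$, which the talk does not specify):

```python
import numpy as np

def h(t, s):                       # bivariate function from the slide
    return (2 + t * s) ** 2 / 9

def u(X):                          # tree-structured composition, d = 8
    x1, x2, x3, x4, x5, x6, x7, x8 = X.T
    return h(h(h(x1, x2), h(x3, x4)), h(h(x5, x6), h(x7, x8)))

rng = np.random.default_rng(7)
n = 10 ** 5
X = rng.uniform(-1, 1, (n, 8))     # X_1, ..., X_8 i.i.d. uniform on [-1, 1]
y = u(X)                           # samples of the target
```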


Example in supervised learning: composition of functions

We consider the tree $T^1$ coinciding with the structure of $u$, for which

$C(T^1, \mathrm{rank}_{T^1}(u), H) = 2427.$

Figure: (a) Tree $T^1$, a balanced binary tree with leaves $\{1\}, \dots, \{8\}$ in natural order; (b) Tree $T^1_\sigma$, the same tree with leaves reordered as $\{8\}, \{1\}, \{6\}, \{4\}, \{7\}, \{2\}, \{3\}, \{5\}$.

By considering a permutation $T^1_\sigma = \{\sigma(\alpha) : \alpha \in T^1\}$ of $T^1$, with $\sigma = (8, 1, 6, 4, 7, 2, 3, 5)$, we have a complexity

$C(T^1_\sigma, \mathrm{rank}_{T^1_\sigma}(u), H) \ge 9 \times 10^6.$


Example in supervised learning: composition of functions

We consider a linear tree $T^2$ and start the algorithm from a tree $T^2_\sigma = \{\sigma(\alpha) : \alpha \in T^2\}$ obtained by applying a random permutation $\sigma$ to $T^2$.

Figure: (e) Tree $T^2$, a linear tree with leaves $\{1\}, \{2\}, \dots, \{8\}$ in natural order; (f) Tree $T^2_\sigma$ for $\sigma = (6, 8, 4, 5, 1, 7, 2, 3)$.


Example in supervised learning: composition of functions

Behavior of the algorithm with a sample size $n = 10^5$. (At each iteration, the slides also display the current tree over $\{1, \dots, 8\}$.)

Iteration | rank $r$ | $\varepsilon_{\mathrm{test}}(v)$ | $C(T, r, H)$
1 | (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1) | 3.38 × 10⁻² | 79
2 | (1,1,2,1,1,1,1,1,1,1,1,1,2,1,1) | 2.95 × 10⁻² | 100
3 | (1,1,2,1,2,1,1,1,2,1,1,1,2,1,1) | 2.45 × 10⁻² | 121
4 | (1,1,2,1,2,1,1,1,2,1,2,1,2,2,1) | 1.85 × 10⁻² | 142
5 | (1,1,2,1,2,1,2,1,2,1,2,1,2,2,2) | 8.97 × 10⁻³ | 163
6 | (1,2,2,2,2,2,2,2,2,1,2,2,2,2,2) | 8.89 × 10⁻³ | 188
7 | (1,2,2,2,2,2,2,2,2,1,2,2,2,2,2) | 8.87 × 10⁻³ | 188
8 | (1,2,2,2,2,2,2,2,2,1,2,2,2,2,2) | 3.97 × 10⁻³ | 188
9 | (1,2,3,2,3,2,3,2,3,2,3,2,3,3,3) | 1.55 × 10⁻⁴ | 308
10 | (1,3,3,3,3,2,3,3,3,2,3,3,3,3,3) | 1.18 × 10⁻⁴ | 364
11 | (1,3,4,3,4,2,4,3,4,2,4,3,4,4,4) | 6.65 × 10⁻⁶ | 520
12 | (1,3,5,3,5,3,5,3,5,3,5,3,5,5,5) | 1.19 × 10⁻⁶ | 723
13 | (1,4,5,4,5,3,5,4,5,3,5,4,5,5,5) | 1.72 × 10⁻⁷ | 865
14 | (1,4,6,4,6,3,6,4,6,3,6,4,6,6,6) | 1.47 × 10⁻⁸ | 1113
15 | (1,5,6,5,6,3,6,5,6,3,6,5,6,6,6) | 7.02 × 10⁻⁹ | 1311
16 | (1,5,7,5,7,3,7,5,7,3,7,5,7,7,7) | 1.27 × 10⁻¹⁰ | 1643
17 | (1,5,8,5,8,3,8,5,8,3,8,5,8,8,8) | 3.87 × 10⁻¹² | 2015
18 | (1,5,9,5,9,3,9,5,9,3,9,5,9,9,9) | 2.95 × 10⁻¹⁴ | 2427

Example in supervised learning: composition of functions

Behavior of the algorithm for different sample sizes $n$:

n | P(T = T¹) | $\varepsilon_{\mathrm{test}}(v)$ | $C(T, r, H)$
$10^3$ | 90% | [1.75 × 10⁻⁵, 1.75 × 10⁻⁴] | [360, 1062]
$10^4$ | 90% | [2.15 × 10⁻⁸, 4.10 × 10⁻³] | [185, 2741]
$10^5$ | 100% | [4.67 × 10⁻¹⁵, 8.92 × 10⁻³] | [163, 2594]

Table: training sample size $n$, estimated probability of obtaining the ideal tree $T^1$, and ranges (over the 10 trials) of the test error and the storage complexity.



Example in unsupervised learning

We consider a truncated normal distribution with zero mean and covariance matrix Σ. Its support is X = ×_{ν=1}^{6} [−5σ_ν, 5σ_ν], with σ_ν^2 = Σ_{νν}, and its density (with respect to the Lebesgue measure) satisfies

    f(x) ∝ exp(−(1/2) x^T Σ^{−1} x) 1_{x ∈ X}.

We consider polynomial approximation spaces Hν (with adaptive selection of the degree).
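For concreteness, here is a minimal Python sketch (not from the talk) of the unnormalized density and a rejection sampler for this truncated normal; it uses the 6 × 6 covariance matrix Σ given on the next slide.

```python
import numpy as np

# Unnormalized density of the example's truncated normal:
# f(x) is proportional to exp(-x^T Sigma^{-1} x / 2) on the box X = x_nu [-5 sigma_nu, 5 sigma_nu].
Sigma = np.array([[2,   0,   0.5, 1,   0,   0.5],
                  [0,   1,   0,   0,   0.5, 0  ],
                  [0.5, 0,   2,   0,   0,   1  ],
                  [1,   0,   0,   3,   0,   0  ],
                  [0,   0.5, 0,   0,   1,   0  ],
                  [0.5, 0,   1,   0,   0,   2  ]])
Sigma_inv = np.linalg.inv(Sigma)
sigma = np.sqrt(np.diag(Sigma))

def f_unnormalized(x):
    inside = np.all(np.abs(x) <= 5 * sigma)
    return np.exp(-0.5 * x @ Sigma_inv @ x) if inside else 0.0

# Rejection sampling: draw from N(0, Sigma) and keep points inside the box.
# The box extends to +-5 standard deviations, so almost nothing is rejected.
rng = np.random.default_rng(0)
def sample(n):
    out = []
    while len(out) < n:
        x = rng.multivariate_normal(np.zeros(6), Sigma)
        if np.all(np.abs(x) <= 5 * sigma):
            out.append(x)
    return np.array(out)
```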



Example in unsupervised learning

Consider

    Σ =
      [ 2    0    0.5  1    0    0.5 ]
      [ 0    1    0    0    0.5  0   ]
      [ 0.5  0    2    0    0    1   ]
      [ 1    0    0    3    0    0   ]
      [ 0    0.5  0    0    1    0   ]
      [ 0.5  0    1    0    0    2   ]

After the permutation (3, 6, 1, 4, 2, 5) of its rows and columns, we obtain the matrix

      [ 2    1    0.5  0    0    0   ]
      [ 1    2    0.5  0    0    0   ]
      [ 0.5  0.5  2    1    0    0   ]
      [ 0    0    1    3    0    0   ]
      [ 0    0    0    0    1    0.5 ]
      [ 0    0    0    0    0.5  1   ]

(X1, X3, X4, X6) and (X2, X5) are independent, as well as X4 and (X3, X6), so that

    f(x) = f_{1,3,4,6}(x1, x3, x4, x6) f_{2,5}(x2, x5) = f_{4,1}(x4, x1) f_{1,3,6}(x1, x3, x6) f_{2,5}(x2, x5)

with rank_{{2,5}}(f) = rank_{{1,3,4,6}}(f) = 1 and rank_{{1,4}}(f) = rank_{{1}}(f_{1,3,6}) = rank_{{3,6}}(f_{1,3,6}) = rank_{{3,6}}(f).
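This block structure is easy to check numerically; a small illustrative snippet (again not from the talk's code):

```python
import numpy as np

# Covariance matrix of the example.
Sigma = np.array([[2,   0,   0.5, 1,   0,   0.5],
                  [0,   1,   0,   0,   0.5, 0  ],
                  [0.5, 0,   2,   0,   0,   1  ],
                  [1,   0,   0,   3,   0,   0  ],
                  [0,   0.5, 0,   0,   1,   0  ],
                  [0.5, 0,   1,   0,   0,   2  ]])

# Apply the permutation (3, 6, 1, 4, 2, 5) to rows and columns (0-based indices below).
p = np.array([3, 6, 1, 4, 2, 5]) - 1
S = Sigma[np.ix_(p, p)]
print(S)

# The permuted matrix is block-diagonal with blocks (X3, X6, X1, X4) and (X2, X5).
assert np.all(S[:4, 4:] == 0) and np.all(S[4:, :4] == 0)
```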


Example in unsupervised learning

f(x) = f_{4,1}(x4, x1) f_{1,3,6}(x1, x3, x6) f_{2,5}(x2, x5)

n       Risk × 10^−2       L2-error        T          C(T, r, H)
10^2    [−5.50, 119]       [0.53, 4.06]    Fig. (a)   [311, 311]
10^3    [−7.29, −5.93]     [0.22, 0.47]    Fig. (b)   [311, 637]
10^4    [−7.60, −6.85]     [0.11, 0.33]    Fig. (c)   [521, 911]
10^5    [−7.68, −7.66]     [0.04, 0.07]    Fig. (c)   [911, 1213]
10^6    [−7.70, −7.69]     [0.01, 0.01]    Fig. (c)   [1283, 1546]

Table: Ranges over 10 trials.

Figure: (a) Best tree over 10 trials for n = 10^2: a linear tree in which {1,2,3,4,5,6} splits into {1} and {2,3,4,5,6}, then {3} and {2,4,5,6}, then {6} and {2,4,5}, then {2} and {4,5}, then {4} and {5}.

Figure: (b) Best tree over 10 trials for n = 10^3: {1,2,3,4,5,6} splits into {1,3,4,6} and {2,5}; {1,3,4,6} into {1} and {3,4,6}; {3,4,6} into {4} and {3,6}; {3,6} and {2,5} into their singletons.

Figure: (c) Best tree over 10 trials for n = 10^4, 10^5, 10^6: {1,2,3,4,5,6} splits into {1,3,4,6} and {2,5}; {1,3,4,6} into {1,4} and {3,6}; {1,4}, {3,6} and {2,5} into their singletons. This tree matches the factorization of f.

Convergence rate close to O(n^−1/2), the minimax rate for the estimation of analytic densities.
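As a rough check of that rate (illustrative only), a log-log fit of the best-case L2-errors from the table against n gives a slope close to −1/2:

```python
import numpy as np

# Empirical convergence rate from the lower ends of the table's L2-error ranges.
n   = np.array([1e2, 1e3, 1e4, 1e5, 1e6])
err = np.array([0.53, 0.22, 0.11, 0.04, 0.01])
slope, _ = np.polyfit(np.log(n), np.log(err), 1)
print(f"empirical rate: n^{slope:.2f}")  # about n^-0.43, close to the n^-1/2 minimax rate
```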

Concluding remarks

Open questions for a complete learning theory...

The theoretical results obtained concern the minimizer u_F^n of the empirical risk over the model class, but the available algorithms are not guaranteed to find a solution of

    min_{v ∈ T^T_r(H)} R_n(v).

Algorithms generate a sequence of estimations in different model classes. Model selection strategies should be proposed (for selecting the tree, the ranks and the background approximation space) that guarantee oracle inequalities.

Convexification of tree tensor networks?

Concluding remarks

A fundamental problem would be to characterize the approximation classes A^γ of tree tensor networks, i.e. the functions for which tree tensor networks achieve a certain performance

    inf_{v ∈ T^T_r(H)} R(v) − R(u) ≲ γ(C)^{−1}

for some growth function γ, where C is a measure of the complexity of T^T_r(H).

These are not standard regularity classes (see [Dahmen et al 2016, Bachmayr and Dahmen 2015]).

For an approximation class A^γ, we would like to devise (black box) algorithms that select a model class T^T_r(H) and provide an estimation û ∈ T^T_r(H) such that

    E(R(û) − R(u)) ≲ γ(C)^{−1},

possibly with γ replaced by another growth function.


Thank you for your attention

References

A. Falco, W. Hackbusch, and A. Nouy. Tree-based tensor formats. SeMA Journal, Oct 2018.

E. Grelier, A. Nouy, and M. Chevreuil. Learning with tree-based tensor formats. ArXiv e-prints, Nov. 2018.

A. Nouy. Higher-order principal component analysis for the approximation of tensors in tree-based low-rank formats. Numerische Mathematik, 141(3):743–789, Mar 2019.

E. Grelier, R. Lebrun, and A. Nouy. Learning high-dimensional probability distributions using tree tensor networks. Coming soon...
