High-order modeling of parametric systems in uncertainty quantification
Akil Narayan (Department of Mathematics, University of Massachusetts Dartmouth; Scientific Computing and Imaging Institute, University of Utah)
IMA short course in UQ: Minneapolis, MN



Page 1

High-order modeling of parametric systems in uncertainty quantification

Akil Narayan1,2

1Department of Mathematics, University of Massachusetts Dartmouth

2Scientific Computing and Imaging Institute, University of Utah

IMA short course in UQ: Minneapolis, MN

Page 2

Parameterized problems

We consider the simulation/experiment of a parameterized problem:

L [u(x);Z] = 0,

As we’ve seen, this is a pervasive problem in UQ.

u is a desired solution field that depends on x and Z

x ∈ Rp (p = 1, 2, 3, 4) is a spatial/temporal variable

Z ∈ Rd (d > 1) is a parametric, random variable

Z ∼ fZ can model free parameters or any kind of uncertainty (random fields, stochastic fluctuations, geometric variability, epistemic uncertainty, etc.)

Page 3

Parameterized problems: topics in UQ

L [u(x);Z] = 0,

Our focus is on approximation of u in the Z coordinate, Z ∈ Rd.

What kinds of discretizations are appropriate?

How to compute probabilistic quantities?

What is possible when d ≫ 1?

I will mostly ignore the x variable in all that follows.

We are interested in methods that are “high order” for approximation in Z.

Page 4

What does “high order” mean?

[Slide figure: excerpt from "Computation of connection coefficients and measure modifications", showing convergence of the operator QpN applied to f(−s) for regularity parameters s = 0, 1, 2, 3, with the expected asymptotic rate N^{−1/2−s} confirmed numerically, together with a discussion (Section 5.3, "Non-polynomial modifications") of orthogonal polynomials for non-classical densities ωX arising in polynomial chaos expansions for uncertainty quantification.]

It means that we try to exploit smoothness (say, existence of s derivatives) to accelerate convergence.

Recall that vanilla Monte Carlo converges at a 1/√N rate.

Page 5

Parameterized problems: today’s topics

L [u(x);Z] = 0,

Today we will focus on some popular methods from the current literature:

Common types of UQ approximations; intrusive and non-intrusive methods

high-order approximations: generalized Polynomial Chaos

one-dimensional theory and algorithms

basic multidimensional theory

Tomorrow will focus more on novel methods:

sparse grids, QMC

nonlinear approximations, compressive sampling

near-optimal stability and equilibrium measures

Page 6

What can you expect from high-order methods?

High-order methods are great!

If your response surface is smooth: extreme bang for your buck

For UQ: generally less mathematically painful than traditional high-order algorithms for PDEs

So. Much. Existing. Theory.

Like other methods, can wrap existing software non-intrusively

High-order methods are awful!

You need to be really careful about how you compute things

Approximations are (generally) global in nature, and vanilla approximations are non-adaptive

You really want/need control on experimental design

If you cannot get into the “region of convergence”, you might as well just do Monte Carlo

Page 7

The standard approximation approach

Most situations: can treat Z as a standard Euclidean parameter

Then in principle, we can approximate in the extended variable (x,Z).

But in practice, we treat the approximation of x and Z in separation-of-variables format.

For these lectures, the x variable will be suppressed. For notation, we will replace Z by its lowercase version z.

Page 8

Types of data

We approximate u(z) using familiar mathematical methods. Create an approximation for u in a basis with N degrees of freedom:

u(z) = ∑_{n=1}^{N} c_n ϕ_n(z)

(Note: the coefficients cn depend on x!)

Given M data points for u, create an approximation for u in a basis.

“Data points” can mean nearly anything (any reasonable algebraic functional):

u(z_m),   E[u(Z) Z^m] = ∫_D u(z) z^m f_Z(z) dz,   E[u(Z) 1_A(Z)]

NB: M and N should be related to each other, but very general relations are possible.

Page 9

Intrusive and non-intrusive

So how do we choose these M pieces of data?

For UQ, point-evaluations are much easier to obtain than most other types of data.

“Intrusive” methods – any data that is not point-evaluation (or is not easily derivable from point-evaluation)

are typically more accurate for any given N and have attractive error estimates (cf. Galerkin solutions to PDEs vs. collocation solutions)

typically require much more investment in mathematical derivations and programming man-hours

“Non-intrusive” methods – data that comes directly from point-evaluations.

can quickly and easily be implemented if existing deterministic black-box or legacy software is available

error estimates are elusive or difficult to derive

Page 10

Examples

Intrusive methods: variational finite-element discretizations, (Petrov-)Galerkin methods

Non-intrusive methods: Monte-Carlo estimators, finite-difference methods,spectral collocation methods

Among many UQ practitioners, non-intrusive methods play a much greater role. (But much research still addresses intrusive methods.)

Page 11

The choice of approximation space

Our representation is

u(z) = ∑_{n=1}^{N} c_n ϕ_n(z)

What should the ϕ_n be? (Or: what should Φ_N = span{ϕ_1, . . . , ϕ_N} be?)

For now, assume z ∈ R. We will choose Φ_N = span{1, z, z^2, . . . , z^{N−1}}.

This is the gPC approach.

(How do we know this is a good choice? Stay tuned....)

In particular, we choose the ϕ_n as an orthonormal set of polynomials¹:

deg ϕ_n = n − 1,   E[ϕ_n(Z) ϕ_m(Z)] = δ_{m,n}

¹Yep, we need conditions on f_Z to do this.

Page 12

Univariate orthogonal polynomials

Given f_Z, the family ϕ_1, ϕ_2, . . . is uniquely defined (modulo sign changes)

three-term recurrence

discrete Fourier transform on point-evaluation data of √f_Z u

if we know u_N = ∑_{n=1}^{N} c_n ϕ_n(z), then

E u_N(Z) = c_1,   Var u_N(Z) = ∑_{n=2}^{N} c_n^2

Explicit forms for the ϕ_n are known for standard distributions:

Z Uniform: Legendre polynomials
Z Beta: Jacobi polynomials
Z Normal: Hermite polynomials
Z Exponential/Gamma: Laguerre polynomials

Most of today: ϕn are an orthonormal polynomial system
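The recurrence and the mean/variance identities above are easy to make concrete. Below is a minimal, stdlib-only Python sketch for the uniform case Z ∼ U(−1, 1) (so the ϕ_n are normalized Legendre polynomials); the helper names `phi_uniform` and `simpson` are illustrative, and indexing is 0-based, so the mean is c_0 rather than c_1:

```python
import math

def phi_uniform(n_max, z):
    """Evaluate orthonormal Legendre polynomials (w.r.t. the uniform
    density f_Z = 1/2 on [-1, 1]) at z via the three-term recurrence."""
    b = [k / math.sqrt(4.0 * k * k - 1.0) for k in range(1, n_max + 1)]
    vals = [1.0]                     # phi_0 = 1
    if n_max >= 1:
        vals.append(z / b[0])        # phi_1 = sqrt(3) z
    for k in range(1, n_max):
        vals.append((z * vals[k] - b[k - 1] * vals[k - 1]) / b[k])
    return vals

def simpson(f, a, b, m=2000):
    # composite Simpson rule, accurate enough for these smooth integrands
    h = (b - a) / m
    s = f(a) + f(b)
    for j in range(1, m):
        s += (4 if j % 2 else 2) * f(a + j * h)
    return s * h / 3.0

# Projection coefficients of u(z) = z^2, then mean/variance read
# directly off the gPC coefficients.
u = lambda z: z * z
c = [simpson(lambda z, n=n: u(z) * phi_uniform(2, z)[n] * 0.5, -1.0, 1.0)
     for n in range(3)]
mean = c[0]                          # E u_N(Z) = c_0
var = sum(cn * cn for cn in c[1:])   # Var u_N(Z) = sum_{n>=1} c_n^2
```

For u(z) = z², the exact answers are E u = 1/3 and Var u = 4/45, which the computed coefficients recover.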

Page 13

How can we reconstruct polynomial approximations?

Given u(z), we are trying to construct an approximation

u_N(z) = ∑_{n=1}^{N} c_n ϕ_n(z),

with ϕn polynomials, and we hope that uN ≈ u.

There are various ways to do this:

continuous/discrete projection

interpolation

enforce coefficient sparsity (‘compressive sampling’)

regression/least-squares

We will briefly cover many of these.

Page 14

Projections

Given u(z), we are trying to construct an approximation

u_N(z) = ∑_{n=1}^{N} c_n ϕ_n(z),

and we hope that u_N ≈ u. For now, Z ∈ [−1, 1] ⊂ R. Assume u ∈ L^2_{f_Z}([−1, 1]), i.e.,

E u^2(Z) = ∥u∥^2_{L^2_{f_Z}} = ∫_{−1}^{1} u^2(z) f_Z(z) dz < ∞.

Then a well-defined optimal projection onto a polynomial space is

u_N = argmin_{v ∈ Φ_N} ∥u − v∥_{L^2_{f_Z}}.

And if the ϕ_n are an orthonormal basis for Φ_N, then

u_N = ∑_{n=1}^{N} c_n ϕ_n(z),   c_n = E[u(Z) ϕ_n(Z)] = ∫_{−1}^{1} u(z) ϕ_n(z) f_Z(z) dz

This is not so easy! u is a complicated response function ... and we need to take several integrals of it.
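When u can be evaluated cheaply, the projection is a few lines. A stdlib-only sketch for uniform Z on [−1, 1] (`phi_uniform` and `simpson` are illustrative helpers): compute c_n by quadrature and use Parseval, ∥u − u_N∥² = ∥u∥² − ∑ c_n², to watch the projection error fall as N grows:

```python
import math

def phi_uniform(n_max, z):
    # orthonormal Legendre polynomials w.r.t. f_Z = 1/2 on [-1, 1]
    b = [k / math.sqrt(4.0 * k * k - 1.0) for k in range(1, n_max + 1)]
    vals = [1.0]
    if n_max >= 1:
        vals.append(z / b[0])
    for k in range(1, n_max):
        vals.append((z * vals[k] - b[k - 1] * vals[k - 1]) / b[k])
    return vals

def simpson(f, a, b, m=2000):
    h = (b - a) / m
    s = f(a) + f(b)
    for j in range(1, m):
        s += (4 if j % 2 else 2) * f(a + j * h)
    return s * h / 3.0

u = math.exp
norm2 = simpson(lambda z: u(z) ** 2 * 0.5, -1.0, 1.0)   # ||u||^2 in L^2_{f_Z}

errors = []
for N in (1, 2, 4, 8):
    c = [simpson(lambda z, n=n: u(z) * phi_uniform(N - 1, z)[n] * 0.5, -1.0, 1.0)
         for n in range(N)]
    # Parseval: squared projection error = ||u||^2 - sum_n c_n^2
    err2 = max(norm2 - sum(cn * cn for cn in c), 0.0)
    errors.append(math.sqrt(err2))
```

For the analytic function u(z) = e^z, the errors drop very rapidly with N, previewing the "high order" behavior discussed earlier.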

Page 15

Computing projections – the Stochastic Galerkin method

We cannot directly compute the moments

∫_{−1}^{1} u(z) ϕ_n(z) f_Z(z) dz

because we only have access to an equation that u satisfies.

One way to approximate these moments of u is to impose that the moments of the equation’s residual vanish:

du/dt = a(Z) u,   u(0, Z) = b(Z)

d/dx [ a(x, Z) du/dx ] = b(x, Z),   u(0, Z) = u(1, Z) = 0

Page 16

Stochastic Galerkin

One can usually derive estimates for the Galerkin solution that are optimal with respect to the sought projection. I.e.,

E[u_N(Z) − u(Z)]^2 ∼ min_{v ∈ Φ_N} ∥u − v∥^2_{L^2_{f_Z}}

So this is a mathematically attractive method... but in practice this can be quite painful.

In addition, it is intrusive.

Page 17

Non-intrusive methods

Since u(x, Z) is the response from a simulation or experiment, then given Z_m an iid copy of Z, it’s usually easy to generate u(x, Z_m).

I.e., using u(x, Z_m) to discern properties of u is non-intrusive, usually not requiring retooling of legacy software or existing data.

For the L^2_{f_Z} projection,

c_n = E[u(Z) ϕ_n(Z)] = ∫_{−1}^{1} u(z) ϕ_n(z) f_Z(z) dz

(Requires that ϕn are orthonormal.)

Approximate cn with an integration rule.

Page 18

Monte Carlo

We can use a simple MC estimate:

ĉ_n = (1/M) ∑_{j=1}^{M} u(Z_j) ϕ_n(Z_j),   Z_j ∼ f_Z

c_n = (G^{−1} ĉ)_n,   (G)_{n,k} = (1/M) ∑_{j=1}^{M} ϕ_n(Z_j) ϕ_k(Z_j)

This can be used to derive an estimate for the error:

E(u(Z) − u_N(Z))^2 ∼ min_{v ∈ Φ_N} E(u(Z) − v(Z))^2

if M ≳ K(N) N log N.

The quantity K(N) is an oversampling/stability factor associated with this problem:

K(N) = (1/N) sup_{z ∈ [−1,1]} ∑_{n=1}^{N} ϕ_n^2(z)

This stability factor depends heavily on the distribution of Z. (The ϕ_n are f_Z-orthonormal.)
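K(N) is easy to estimate numerically. A stdlib-only sketch for uniform Z (`phi_uniform` and `K` are illustrative helpers): for normalized Legendre polynomials, ∑_{n=1}^{N} ϕ_n²(±1) = ∑ (2n − 1) = N², so K(N) = N and the sampling condition becomes M ≳ N² log N, which illustrates why plain MC sampling is expensive here:

```python
import math

def phi_uniform(n_max, z):
    # orthonormal Legendre polynomials w.r.t. f_Z = 1/2 on [-1, 1]
    b = [k / math.sqrt(4.0 * k * k - 1.0) for k in range(1, n_max + 1)]
    vals = [1.0]
    if n_max >= 1:
        vals.append(z / b[0])
    for k in range(1, n_max):
        vals.append((z * vals[k] - b[k - 1] * vals[k - 1]) / b[k])
    return vals

def K(N, ngrid=2001):
    """Approximate K(N) = (1/N) sup_z sum_{n=1}^N phi_n(z)^2 on a grid
    that includes the endpoints, where the supremum is attained."""
    sup = 0.0
    for i in range(ngrid):
        z = -1.0 + 2.0 * i / (ngrid - 1)
        sup = max(sup, sum(p * p for p in phi_uniform(N - 1, z)))
    return sup / N
```

With this, `K(10)` evaluates to 10 and `K(20)` to 20 (up to roundoff), i.e. linear growth in N for the uniform density.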

Page 19

Optimal stability

For most random variables of interest, requiring

M ≳ K(N) N log N

is simply infeasible.

Let’s try an IS-MC approach:

ĉ_n = (1/M) ∑_{j=1}^{M} u(Z_j) ϕ_n(Z_j) f_Z(Z_j) / g(Z_j),   Z_j ∼ g

This problem has a new (similar) stability factor K.

Can we tweak this biasing distribution to minimize K?

Even if we optimize g for a given N, do we need to re-optimize every time we change N?

Page 20

The arcsine distribution

It turns out one can show that g ∼ Beta(1/2, 1/2) is asymptotically optimal:

Let the Z_j be iid samples from the density 1/(π√(1 − z^2)). Then with M ≳ N log N, and the estimator ĉ_n from above:

E(u(Z) − ũ_N(Z))^2 ≲ [1 + ε_N] E(u(Z) − u_N(Z))^2,

for “large” N, where ũ_N is the computed approximation and u_N the exact projection.

This 1D polynomial recovery result is true for essentially any distribution of Z.

The optimality of this g – the arcsine or “Chebyshev” distribution – will appear throughout.
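The importance-sampled estimator is simple to set up. A stdlib-only sketch for uniform Z (helper names `phi_uniform` and `solve` are illustrative): draw from the arcsine density by the inverse transform Z = cos(πU), weight each sample by f_Z/g = (π/2)√(1 − z²), and solve the weighted normal equations. Since the test function lies in the span of ϕ_0, ϕ_1, ϕ_2, the weighted least-squares fit recovers its coefficients essentially exactly:

```python
import math
import random

def phi_uniform(n_max, z):
    # orthonormal Legendre polynomials w.r.t. f_Z = 1/2 on [-1, 1]
    b = [k / math.sqrt(4.0 * k * k - 1.0) for k in range(1, n_max + 1)]
    vals = [1.0]
    if n_max >= 1:
        vals.append(z / b[0])
    for k in range(1, n_max):
        vals.append((z * vals[k] - b[k - 1] * vals[k - 1]) / b[k])
    return vals

def solve(A, b):
    # small dense linear solve by Gauss-Jordan elimination with pivoting
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

random.seed(0)
N, M = 3, 200
u = lambda z: 1.0 + z * z            # lies in span{phi_0, phi_1, phi_2}

# arcsine (Chebyshev) draws, with importance weights f_Z(z)/g(z)
samples = [math.cos(math.pi * random.random()) for _ in range(M)]
wts = [0.5 * math.pi * math.sqrt(max(0.0, 1.0 - z * z)) for z in samples]

G = [[sum(w * phi_uniform(N - 1, z)[i] * phi_uniform(N - 1, z)[j]
          for z, w in zip(samples, wts)) / M for j in range(N)]
     for i in range(N)]
rhs = [sum(w * u(z) * phi_uniform(N - 1, z)[i]
           for z, w in zip(samples, wts)) / M for i in range(N)]
c = solve(G, rhs)
# exact coefficients of 1 + z^2: c_0 = 4/3, c_1 = 0, c_2 = 2/(3 sqrt(5))
```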

Page 21

Chebyshev-sampled stabilization

[Slide figures: excerpts from “Christoffel least squares” (A. Narayan, J. D. Jakeman, and T. Zhou). Figure 3 plots the inverse of the minimum eigenvalue of the matrix Q_k for one-dimensional Jacobi (symmetric parameters), Hermite, and Laguerre polynomial families; Q_k is quite well-conditioned for classical one-dimensional problems, though this is unlikely to persist in high dimensions. Theorem 5.2 bounds the mean-square L^2_w error of a truncated Christoffel least squares (CLS) approximation under a sampling condition on S. Figure 7 shows condition number versus polynomial degree k for 10-dimensional Laguerre total-degree and hyperbolic-cross spaces: the CLS approach remains stable while the standard MC sampling strategy becomes unstable as the degree increases. Figure 8 shows approximation error versus degree for Legendre (d = 2) and Hermite (d = 3) test functions.]

Page 22

Interpolative recovery

We want a reconstruction

u_N = ∑_{n=1}^{N} c_n ϕ_n

using M samples. Requiring M > N is a bit onerous.

One concrete way to form a reconstruction with M = N is interpolation.

u_N(z_n) = u(z_n),   n = 1, . . . , N

I.e.,

Φ c = u,   (Φ)_{m,n} = ϕ_n(z_m)

Page 23

Interpolatory stability

Like the least-squares regression before, there is a stability constant for thisrecovery problem. First, compute cardinal functions:

ℓn(zm) = δn,m, ℓn ∈ ΦN

The stability factor is

ΛN = supz∈[−1,1]

N∑n=1

|ℓn(z)|

The error in the reconstruction behaves like

∥uN (z)− u(z)∥L∞ ≤ ΛN argminv∈ΠN

∥u− v∥L∞

Interpolatory reconstructions (rather, ΛN ) are extremely sensitive to thechoice of zm.
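The sensitivity to the nodes is easy to demonstrate numerically. A stdlib-only sketch (`lebesgue_constant` is an illustrative helper; the sup is approximated on a fine grid) comparing Λ_N for N = 10 equispaced versus Chebyshev nodes:

```python
import math

def lebesgue_constant(nodes, ngrid=4001):
    """Approximate sup_z sum_n |l_n(z)| over [-1, 1] for the Lagrange
    cardinal functions l_n on the given nodes."""
    best = 0.0
    for i in range(ngrid):
        z = -1.0 + 2.0 * i / (ngrid - 1)
        total = 0.0
        for n, zn in enumerate(nodes):
            ln = 1.0
            for m, zm in enumerate(nodes):
                if m != n:
                    ln *= (z - zm) / (zn - zm)
            total += abs(ln)
        best = max(best, total)
    return best

N = 10
equi = [-1.0 + 2.0 * n / (N - 1) for n in range(N)]
cheb = [math.cos(math.pi * (2 * n + 1) / (2 * N)) for n in range(N)]
L_equi = lebesgue_constant(equi)
L_cheb = lebesgue_constant(cheb)
# Chebyshev nodes: Lambda_N grows like log N; equispaced: exponentially in N.
```

Already at N = 10, the equispaced constant is an order of magnitude larger than the Chebyshev one, and the gap widens exponentially with N.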

Page 24

Arcsine again

But how badly does Λ_N behave? Turns out... really badly.

It’s so bad, we can only distinguish exponential dependence from non-exponential dependence:

Let z_1, z_2, . . . be a sequence of samples, with Λ_N the stability factor for the first N samples. Then

Λ_N exp(−cN) → 0 for any c > 0   =⇒   (1/N) ∑_{n=1}^{N} δ_{z_n} ⇒ 1/(π√(1 − z^2))

I.e., if one wants to avoid exponentially-bad stability, the samples must asymptotically distribute like the arcsine measure.

Page 25

Quadrature

We’ve discussed MC quadrature/regression and interpolation for non-intrusive recovery of the representation

u_N(z) = ∑_{n=1}^{N} c_n ϕ_n(z)

But there’s one very nice quadrature method we haven’t explored.

For any f_Z that supports an orthogonal polynomial basis, then given any M > 0, there exist M sample locations z_1, . . . , z_M and M weights w_1, . . . , w_M such that

∑_{m=1}^{M} w_m p(z_m) = ∫_D p(z) f_Z(z) dz,

for all polynomials p of degree 2M − 1 or less.

This set of samples and weights is the M-point Gauss quadrature rule.
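The 2M − 1 exactness claim can be checked directly. A stdlib-only sketch specialized to the uniform density f_Z = 1/2 (helper names `legendre` and `gauss_legendre` are illustrative): compute the Gauss nodes by Newton iteration on the Legendre recurrence, then compare quadrature sums of z^k against the exact moments for all k ≤ 2M − 1:

```python
import math

def legendre(n, x):
    """Return (P_n(x), P_n'(x)) via the three-term recurrence."""
    p0, p1 = 1.0, x
    for k in range(1, n):
        p0, p1 = p1, ((2 * k + 1) * x * p1 - k * p0) / (k + 1)
    dp = n * (x * p1 - p0) / (x * x - 1.0)
    return p1, dp

def gauss_legendre(M):
    """M-point Gauss rule for the uniform density f_Z = 1/2 on [-1, 1]."""
    nodes, weights = [], []
    for i in range(1, M + 1):
        x = math.cos(math.pi * (i - 0.25) / (M + 0.5))   # initial guess
        for _ in range(100):                             # Newton iteration
            p, dp = legendre(M, x)
            dx = p / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        _, dp = legendre(M, x)
        nodes.append(x)
        # standard weight 2/((1-x^2)P'^2), halved for the density 1/2
        weights.append(1.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights

M = 5
z, w = gauss_legendre(M)
moments = [sum(wi * zi ** k for zi, wi in zip(z, w)) for k in range(2 * M)]
exact = [0.0 if k % 2 else 1.0 / (k + 1) for k in range(2 * M)]
```

With M = 5 points, all moments up to degree 9 are reproduced to machine precision.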

Page 26

Gaussian quadrature

Gauss quadrature essentially makes the transformation between √f_Z(z) u and the coefficients c_n an isometry.

So Gaussian quadrature is really good. How do these samples distribute?

More importantly, can we use their distribution to tell us something about a “best” measure to sample from?

Page 27

Nonstandard 1D distributions

If Z has density fZ , but is not a standard distribution, are we stuck?

“Modification” algorithms are one way to overcome this.

Page 28

Multivariate approximation

Suppose Z ∈ Rd takes values in D ⊂ Rd with density fZ .

If f_Z has finite polynomial moments of all orders, the same gPC idea works, but is far more technical.

If Z has independent components, then

fZ(z) = fZ1(z1) fZ2(z2) · · · fZd(zd)

and

D = ×_{q=1}^{d} I_q,

where Iq ⊂ R are one-dimensional domains.

Page 29

Multivariate polynomials

If Z has independent components, we usually use products of univariate polynomials to form d-variate basis functions:

z = (z_1, . . . , z_d)

α = (α_1, . . . , α_d)   “multi-index”

|α| = α_1 + · · · + α_d   “multi-index norm”

ϕ_α(z) = ∏_{j=1}^{d} ϕ_{j,α_j}(z_j)

u_k(z) = ∑_{|α| ≤ k} c_α ϕ_α(z)   “expansion of order/degree k”

Things could start to get messy... we’ll use simpler notation:

{1, . . . , N} ←→ {α : |α| ≤ k},   u_k(z) = ∑_{|α| ≤ k} c_α ϕ_α(z) = ∑_{n=1}^{N} c_n ϕ_n(z)

Page 30

Multivariate polynomials

Oh right, one more small detail:

Degree-k expansion =⇒ N = C(d + k, k) = (d + k)! / (k! d!) ∼ k^d

(That’s a big N)
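These counts are one `math.comb` call away; a small stdlib-only illustration of how fast N grows with d:

```python
from math import comb

# N = C(d + k, k): total-degree-k gPC dimension in d variables
table = {d: [comb(d + k, k) for k in (1, 2, 3, 5)] for d in (2, 5, 10, 20)}
# e.g. d = 10, k = 5 gives N = 3003; d = 20, k = 5 gives N = 53130,
# before accounting for the M > N samples needed to fit the coefficients.
```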

Oh, and much of the computational 1D theory does not carry over to the multidimensional case, even if Z has independent components.

Page 31

Multivariate odds and ends

Polynomial spaces are sparsely or adaptively constructed.

[Slide figure: excerpt from a paper of A. Narayan and T. Zhou, whose Figure 1 shows the indices of the tensor-product set P_{d,k} (left), the total-degree set T_{d,k} (center), and the hyperbolic-cross set H_{d,k} (right), all for d = 2 and degree k = 25. The accompanying definitions:

P_{d,k} = {α : max_j α_j ≤ k},   T_{d,k} = {α : |α| ≤ k},   H_{d,k} = {α : ∏_j (α_j + 1) ≤ k + 1},

with dim T^d_k = C(d + k, k), dim P^d_k = (k + 1)^d, and dim H^d_k ≤ ⌊(k + 1)(1 + log(k + 1))^{d−1}⌋. Generalizations include dimensional anisotropy and ℓ^p (0 < p < 1) norms on index space. The dimensions of T^d_k and P^d_k grow exponentially with d, highlighting the curse of dimensionality for any non-adapted multivariate approximation scheme.]

And there are analogues of univariate optimal sampling measures.

Page 32

Polynomial methods

We’ve seen that approximation with polynomials works well. In fact, for even mildly smooth functions, this is an almost-universal truth:

Let u : [−1, 1]^d → R be a function of d variables.

Assume that u has s ≥ 0 derivatives. Suppose we try to approximate u with a degree-k polynomial:

u(z) ≈ u_N(z) = ∑_{|α| ≤ k} c_α ϕ_α(z)

The best possible u_N we can find obeys

∥u − u_N∥ ≲ k^{−s}

Page 33

Polynomial methods

The best possible u_N we can find obeys

∥u − u_N∥ ≲ k^{−s}

Note that

convergence holds in most norms of interest

the smoother the function (s ↑), the faster the convergence

if s = ∞, we get exponential convergence in k!

the degree k is not linearly related to the number of degrees of freedom N

so essentially, ∥u − u_N∥ ≲ N^{−s/d}
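The smoothness dependence is easy to see in one dimension. A stdlib-only sketch using barycentric interpolation at Chebyshev points (`cheb_interp_error` is an illustrative helper), contrasting an analytic function with one that has only one derivative at the origin:

```python
import math

def cheb_interp_error(f, n, ngrid=1001):
    """Max error on [-1, 1] of the degree-n interpolant of f at Chebyshev
    (second-kind) points, evaluated via the barycentric formula."""
    xs = [math.cos(math.pi * j / n) for j in range(n + 1)]
    ws = [(0.5 if j in (0, n) else 1.0) * (-1.0) ** j for j in range(n + 1)]
    fs = [f(x) for x in xs]
    err = 0.0
    for i in range(ngrid):
        z = -1.0 + 2.0 * i / (ngrid - 1)
        num = den = 0.0
        hit = None
        for x, w, fx in zip(xs, ws, fs):
            if z == x:           # evaluation point coincides with a node
                hit = fx
                break
            t = w / (z - x)
            num += t * fx
            den += t
        val = hit if hit is not None else num / den
        err = max(err, abs(val - f(z)))
    return err

smooth = cheb_interp_error(math.exp, 20)   # analytic: spectral convergence
rough = cheb_interp_error(abs, 20)         # only ~1 derivative: slow decay
```

At degree 20, e^z is already resolved to near roundoff, while |z| still has an error of order 10^{−2}: "blessing of smoothness" in action.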

Page 34

N-widths: A peek down the rabbit hole

Perhaps we can do better than N^{−s/d} by abandoning polynomials?

Still represent u with N degrees of freedom:

u ≈ u_N(z) = ∑_{n=1}^{N} c_n ϕ_n(z),

but allow ourselves to pick the ϕ_n as arbitrary functions in a given space.

The Gelfand/Kolmogorov N-width of a space of functions V is the best (smallest) error we obtain by using the N-dimensional subspace spanned by the ϕ_n:

d_N(V) = inf_{ϕ_1,...,ϕ_N} sup_{u ∈ V} ∥u − u_N∥

This definition is not constructive, and does not really yield any computational algorithms. (Well, if V = R^N, this is SVD-ish.)

Page 35

Polynomials again

The interesting surprise is that, for V a space of d-dimensional functions that have s derivatives:

d_N(V) ∼ N^{−s/d}

With polynomials, we got N^{−s/d} error.

By relaxing our approximation to general functions, we again get N^{−s/d} error.

The good news: polynomials are essentially optimal.

The bad news: for a fixed error E, the best we can do is require

N ≳ E^{−d/s} = (1/E)^{d/s}

For a fixed error, the minimum N needed scales exponentially with d. This is the approximation theory manifestation of the curse of dimensionality.

Of course, with large s, the required N scales inverse-exponentially with s. Tongue-in-cheek: “blessing of smoothness”

Page 36

Multidimensional integration – sparse grids

Recall that for polynomials we are frequently interested in approximating

∫_D u(z) ϕ_n(z) f_Z(z) dz

If D is a hypercube, we can use tensor-product quadrature:

∑_{j_1=1}^{N} ∑_{j_2=1}^{N} · · · ∑_{j_d=1}^{N} w_{j_1,...,j_d} u(z_{j_1}, . . . , z_{j_d}) ϕ_n(z_{j_1}, . . . , z_{j_d})

This requires N^d samples.

An alternative that frequently attains similar accuracy is sparse-grid quadrature.
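The tensor-product construction above is a few lines of code. A stdlib-only sketch for d = 3 independent uniforms with a hardcoded 3-point Gauss rule (variable names are illustrative; the weights are halved because f_Z = 1/2 per dimension):

```python
import math
from itertools import product

# 3-point Gauss rule for the uniform density f_Z = 1/2 on [-1, 1]
nodes1d = [-math.sqrt(0.6), 0.0, math.sqrt(0.6)]
wts1d = [5.0 / 18.0, 8.0 / 18.0, 5.0 / 18.0]

d = 3
u = lambda z: z[0] ** 2 * z[1] ** 2 * z[2] ** 2

# Tensor-product rule: N^d = 3^3 = 27 evaluations of u
total = 0.0
count = 0
for idx in product(range(3), repeat=d):
    z = [nodes1d[i] for i in idx]
    w = math.prod(wts1d[i] for i in idx)
    total += w * u(z)
    count += 1
# E[u(Z)] = (1/3)^3 for independent uniforms
```

Even this tiny rule already costs 27 samples; at N = 10 points per dimension and d = 10 the tensor grid would need 10^10 samples, which is what sparse grids aim to avoid.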

Page 37

Sparse grids

A simple sparse grid is a union of tensor-product grids.

Suppose Q_1, Q_2, . . . are a sequence of one-dimensional grids.

∪_{d ≤ |α| ≤ d+k} ∏_{j=1}^{d} Q_{α_j}

[Slide figure: excerpt from Bungartz and Griebel, including Figure 3.3 (scheme of subspaces W_l for d = 2, each square representing one subspace with its associated grid points) and the observation that the dimension of the full grid space V^(∞)_n, i.e., the number of inner grid points, is |V^(∞)_n| = (2^n − 1)^d = O(2^{dn}) = O(h_n^{−d}).]

Bungartz and Griebel, Acta Numerica 2004

Page 38

Sparse grids

[Excerpt from Bungartz and Griebel, Acta Numerica 2004.] Additionally, we introduce the hierarchical increments $W_l$,

$$W_l := \mathrm{span}\bigl\{\phi_{l,i} : 1 \le i \le 2^l - 1,\ i_j \text{ odd for all } 1 \le j \le d\bigr\}, \qquad (3.11)$$

for which the relation

$$V_l = \bigoplus_{k \le l} W_k \qquad (3.12)$$

can easily be seen. Note that the supports of all basis functions $\phi_{l,i}$ spanning $W_l$ are mutually disjoint. Thus, with the index set

$$I_l := \bigl\{i \in \mathbb{N}^d : 1 \le i \le 2^l - 1,\ i_j \text{ odd for all } 1 \le j \le d\bigr\}, \qquad (3.13)$$

we get another basis of $V_l$, the hierarchical basis

$$\bigl\{\phi_{k,i} : i \in I_k,\ k \le l\bigr\}, \qquad (3.14)$$

which generalizes the well-known 1D basis shown in Figure 3.2 to the $d$-dimensional case by means of a tensor-product approach. With these hierarchical difference spaces $W_l$, we can define

$$V := \bigoplus_{l_1 = 1}^{\infty} \cdots \bigoplus_{l_d = 1}^{\infty} W_{(l_1, \ldots, l_d)} = \bigoplus_{l \in \mathbb{N}^d} W_l \qquad (3.15)$$

with its natural hierarchical basis

$$\bigl\{\phi_{l,i} : i \in I_l,\ l \in \mathbb{N}^d\bigr\}. \qquad (3.16)$$

Except for completion with respect to the $H^1$-norm, $V$ is simply the underlying Sobolev space $H^1_0(\Omega)$, i.e., $V = H^1_0(\Omega)$.

Figure 3.2 (Bungartz and Griebel): piecewise linear hierarchical basis (solid) vs. nodal point basis (dashed).

Local or global basis functions can be constructed.

Integration proceeds by integrating individual functions.

[Excerpt from "Multivariate Quadrature on Adaptive Sparse Grids", p. 95.] For higher-order interpolants, neither the shape or size of the supports nor the number of degrees of freedom per element should be changed. In the hierarchical context, this can be obtained by using a corresponding number of a node's ancestors to determine the local interpolant (their Fig. 5 shows the needed hierarchical ancestors and descendants of two grid points for the quartic case). Accordingly, the hierarchical basis polynomials of degree $p$ can be constructed by prescribing a value of 1 in the local point and 0 in this point's next $p$ hierarchical ancestors; note that this implies that basis functions of degree $p$ cannot appear earlier than on level $l = p - 1$. (Their Fig. 6 shows the construction for $p = 4$ via hierarchical Lagrangian interpolation and the restriction to the respective hierarchical support.) With these hierarchical polynomial bases, an increased interpolation accuracy of $O(N^{-p-1} \cdot |\log_2 N|^{(p+2)(d-1)})$ with respect to both the $L_\infty$- and the $L_2$-norm can be proved (for correspondingly higher regularity), and numerical examples show the same behaviour for the finite element solution of elliptic BVPs. The application of polynomial bases of this type to multivariate quadrature reveals a first drawback for non-vanishing boundary values: here, as in the linear case, the construction starts with linear basis functions located on the boundary.

Barthelmann, Advances in Comp. Math 2000


Sparse grids

Hierarchical construction is useful for adaptivity.

[Excerpt from J.D. Jakeman et al., Journal of Computational Physics 242 (2013) 790–808.] One benchmark is

$$f_d^{\text{plane}}(x) = \begin{cases} f^1(x), & 3x_1 + 2x_2 - 0.01 > 0, \\ f_d^2(x) + \tfrac{1}{2}\cos(\pi(x_1 + x_2 + 0.3)) + 1, & \text{otherwise}, \end{cases} \qquad (7d)$$

where

$$f^1(x) = \exp\Bigl(-\sum_{i=1}^{2} x_i^2\Bigr) - x_1^3 - x_2^3, \qquad f_d^2(x) = 1 + f^1(x) + \frac{1}{4d}\sum_{i=2}^{d} x_i^2.$$

Their Fig. 4 displays the collocation nodes generated by the discontinuity detector for each of four two-dimensional benchmark problems; red points represent intervals identified as containing a discontinuity. The classification of the Monte Carlo samples, which requires no function evaluations, is visualized in their Fig. 5: each sample is classified as belonging to element $E_1$ (blue points) or element $E_2$ (green points), with red points representing samples that cannot be classified with certainty because they fall within the resolution $\delta$ employed by the detector. The detector has two main tunable parameters, the minimum resolution $\delta$ and the maximum order $m$ of the polynomial annihilation scheme; in all examples $\delta = 2 \times 10^{-6}$ and $m = 6$. Their Fig. 12 shows discontinuity detection applied to a two-dimensional function with multiple discontinuities: the true function, the points generated by the detection algorithm, and 10,000 randomly classified points; four smooth regions are identified, with green points representing the subset of random points that cannot be classified with certainty.

5.3 A Good and a Bad Example [excerpt from "Multivariate Quadrature on Adaptive Sparse Grids", p. 101]

Obviously, adaptive schemes are advantageous if the function's rough parts are concentrated on a confined part of the domain, as in the following example with surpluses tending towards infinity close to parts of the boundary:

$$f := \frac{1}{4^3} \int_{[0,1]^3} \prod_{i=1}^{3} \frac{1}{\sqrt{x_i}}\, dx. \qquad (12)$$

Their Fig. 12 ("Successful adaptive quadrature of (12): convergence behaviour (left) and adaptive grid for $x_3 = 0.5$ (right)") shows how the adaptive refinement improves the convergence process. A third curve in the convergence diagram, denoted "sorted nodes", was plotted to evaluate the theoretical power of the refinement criterion (the missing monotonicity is discussed in their next subsection): the volume contributions of all nodes were sorted a posteriori according to their absolute value, and the $N$ most important points were chosen to obtain what the algorithm assumes to be the optimal selection of points. Although this supports the chosen criterion, the next example shows that volume contribution might not be the only property one has to look at.

Of course, an adaptive algorithm cannot be expected to considerably improve the integration of a very smooth function, because the resulting grids are always close to regular ones. However, a straightforward adaptation can even be harmful. For that, consider

$$f := \frac{1}{(e - 1)^5} \int_{[0,1]^5} \prod_{i=1}^{5} e^{x_i}\, dx. \qquad (13)$$

As before, their Fig. 13 shows the three curves for the regular grid, the adaptive grid, and the sum of the sorted nodes. Here, adaptivity worsens the results by up to four digits, and generally even the sum of sorted nodes performs worse, which indicates serious flaws in the adaptation.


Quasi Monte Carlo

Quasi Monte Carlo methods can deterministically approximate integrals:

$$\int_{[0,1]^d} u(z)\,dz \approx \frac{1}{M} \sum_{m=1}^{M} u(z_m)$$

The samples $z_m$ need to be "space-filling".

The theoretical underpinning of this method is the Koksma-Hlawka inequality:

$$\left| \int_{[0,1]^d} u(z)\,dz - \frac{1}{M} \sum_{m=1}^{M} u(z_m) \right| \le V(u)\, D_M^*.$$

The quantity $V(u)$ is a measure of the size/stability of $u$, and $D_M^*$ measures the degree to which $z_1, \ldots, z_M$ is space-filling.

In practice, $V(u)$ is not computable, difficult to estimate, and is infinite for many functions of interest.
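A minimal sketch of QMC integration with a Halton sequence, built from the standard radical-inverse construction (the test integrand and sample count are illustrative, not from the slides):

```python
def radical_inverse(n, base):
    """Van der Corput radical inverse: mirror the base-b digits of n about the radix point."""
    inv, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        inv += digit / denom
    return inv

def halton(M, d, primes=(2, 3, 5, 7, 11, 13)):
    """First M points of the d-dimensional Halton sequence (one prime base per axis)."""
    return [tuple(radical_inverse(n, p) for p in primes[:d]) for n in range(1, M + 1)]

# QMC estimate of the integral of z_1 z_2 z_3 over [0,1]^3; the exact value is 1/8.
d, M = 3, 4096
estimate = sum(z[0] * z[1] * z[2] for z in halton(M, d)) / M
print("QMC error:", abs(estimate - 0.125))
```

The same $M$ i.i.d. uniform samples would typically give a noticeably larger (and random) error; the deterministic Halton points trade that randomness for low discrepancy.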


Discrepancy

The quantity $D_M^*$ is the discrepancy between the empirical distribution and Lebesgue/Borel measure:

$$D_M^* = \sup_{A = \prod_{j=1}^{d} [0, y_j]} \left| \frac{\#(A \cap \{z_m\}_m)}{M} - \mu(A) \right|$$

https://en.wikipedia.org/wiki/Halton_sequence
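In one dimension the star discrepancy of a finite point set can be computed exactly from the sorted points (Niederreiter's closed-form expression), which makes a small self-check possible; the van der Corput point set below is an illustrative example:

```python
def star_discrepancy_1d(points):
    """Exact 1D star discrepancy via Niederreiter's formula:
    D*_M = max_i max( i/M - x_(i), x_(i) - (i-1)/M ) over the sorted points x_(i)."""
    xs = sorted(points)
    M = len(xs)
    return max(max((i + 1) / M - x, x - i / M) for i, x in enumerate(xs))

def van_der_corput(M, base=2):
    """First M points of the van der Corput sequence (radical inverses of 1..M)."""
    pts = []
    for n in range(1, M + 1):
        inv, denom, m = 0.0, 1.0, n
        while m > 0:
            m, digit = divmod(m, base)
            denom *= base
            inv += digit / denom
        pts.append(inv)
    return pts

for M in (16, 256):
    print(M, star_discrepancy_1d(van_der_corput(M)))
```

For these dyadic sample sizes the discrepancy comes out to exactly $1/M$, i.e., the points are essentially as space-filling as possible.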


How low can you go?

Obviously we want $D_M^*$ to be small:

$$\left| \int_{[0,1]^d} u(z)\,dz - \frac{1}{M} \sum_{m=1}^{M} u(z_m) \right| \le V(u)\, D_M^*.$$

However, the best known low-discrepancy points satisfy

$$D_M^* \gtrsim \frac{(\log M)^{d-1}}{M}$$

Several point sets approach this lower bound: Halton sequences, Sobol sequences, van der Corput sequences, $(t, n, s)$-nets, etc.

We effectively get $O(1/M)$ convergence in the end.


Nonlinear approximations

Now for something completely different.

Recall that most approximations have the form

$$u_N(z) = \sum_{n=1}^{N} c_n u_n(z)$$

The map $z \mapsto u(z)$ is frequently low-rank.

This depends on what you’re interested in:

$$\mathbb{E}\, u(x, Z), \qquad u(x^*, Z), \qquad \Pr\left[ u(x^*, Z) \in A \right]$$

The parametric dimensionality is frequently an extreme overestimate of the effective dimensionality.
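One way to see this low-rank structure is to compute the SVD of a snapshot matrix whose columns are $u(\cdot, z_m)$. The parameterized family $u(x, z) = 1/(1 + xz)$ below is a stand-in chosen purely for illustration:

```python
import numpy as np

# Snapshot matrix: column m holds u(., z_m) for the illustrative family
# u(x, z) = 1 / (1 + x z) on x in [0, 1], z in [0, 1].
x = np.linspace(0.0, 1.0, 200)
zs = np.linspace(0.0, 1.0, 100)
U = 1.0 / (1.0 + np.outer(x, zs))

s = np.linalg.svd(U, compute_uv=False)          # singular values, descending
energy = np.cumsum(s**2) / np.sum(s**2)          # fraction of "energy" captured
effective_rank = int(np.searchsorted(energy, 1.0 - 1e-10) + 1)
print("s[10]/s[0] =", s[10] / s[0], " effective rank:", effective_rank)
```

Although the snapshot matrix is 200-by-100, a handful of modes capture essentially all of its content: the singular values decay exponentially, which is exactly the "low-rank solution manifold" phenomenon above.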


Fractional differentiation

The solutions of the PDE

$$(-\nabla_x)^{(1 - z/2)}\, u(x, z) = f(x, z), \qquad z \in [\alpha_{\min}, 2]$$

for various $z$ have low-rank structure.

[Figures from P.N. Vabishchevich, Journal of Computational Physics 282 (2015) 289–302: Fig. 17, the solution for $\alpha = 0.75$ ($v_{\max} = 0.152168$); Figs. 18–19, the error of the solution for $\delta = 0.5\lambda_1$ and $\delta = 0.25\lambda_1$.]

[Plot: singular values of $\{u(\cdot, \alpha) : \alpha \in (\alpha_{\min}, 1.9)\}$ for $\alpha_{\min} \in \{0.0001, 0.001, 0.01, 0.1\}$; the singular values fall from roughly $10^5$ to $10^{-15}$ within about 30 indices.]

Bonito and Pasciak, Math Comp 2014


Hippocampus classification

[Excerpt from Fletcher et al., IEEE Transactions on Medical Imaging 2004, p. 1003: principal geodesic analysis (PGA) of m-rep shape models. Their Fig. 6 compares the average of corresponding medial atoms in the hippocampus models under symmetric-space averaging and linear averaging; the linear average is not a valid medial atom, since its two spoke vectors have lengths in ratio 1.2 to 1. The PGA algorithm for m-reps adapts linear PCA to the Riemannian symmetric space of medial atoms: compute the intrinsic mean, then take eigenvectors/eigenvalues of the covariance of the data mapped to the tangent space at the mean. New m-rep models can be generated from a subset of the principal directions, analogous to linear PCA. Their Fig. 7 displays the first three PGA modes of variation for the hippocampus m-reps as implied boundaries, and their Fig. 8 plots the eigenvalues of the modes of variation and their cumulative sums. The first 30 modes capture 95 percent of the total variability, a significant reduction from the original 192 dimensions of the hippocampus m-rep model.]

About 30 components can describe ∼95% of the variation in a model with about 190 apparently important dimensions.

Fletcher, Lu, Pizer, and Joshi, IEEE Transactions on Medical Imaging 2004


Subspace detection

So many models have apparently low-rank solution manifolds. How to exploitthis?

One strategy: find a subspace of the z coordinate that is most important.

E.g., active subspaces seek a linear transformation

$$W y = z,$$

with $y \in \mathbb{R}^s$ for $s < d$ ($W$ has orthonormal columns).

Now, given $u = u(z)$, just approximate as a function of $y$:

$$u(z) \approx v(y) = \mathbb{E}\left[\, u(Z) \mid y \,\right]$$
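A sketch of the usual way an active subspace is estimated: eigendecompose the gradient covariance $C = \mathbb{E}\bigl[\nabla u\, \nabla u^{\top}\bigr]$ and keep the dominant eigenvectors. The ridge function $u(z) = \exp(w \cdot z)$ and the hidden direction $w$ are assumptions of this toy example, chosen so the answer is known:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
w = np.full(d, 1.0 / np.sqrt(d))       # hidden active direction (toy assumption)

def grad_u(z):
    # u(z) = exp(w . z)  =>  grad u(z) = exp(w . z) * w : a perfect 1D ridge function
    return np.exp(w @ z) * w

Z = rng.uniform(-1.0, 1.0, size=(1000, d))
G = np.array([grad_u(z) for z in Z])
C = G.T @ G / len(Z)                    # Monte Carlo estimate of E[grad u grad u^T]
eigvals, W = np.linalg.eigh(C)          # eigenvalues in ascending order
active_dir = W[:, -1]                   # dominant eigenvector spans the active subspace
print(abs(active_dir @ w), eigvals[-2] / eigvals[-1])
```

The dominant eigenvector recovers $w$ (up to sign), and the remaining eigenvalues are negligible, signalling that a single coordinate $y = w^{\top} z$ suffices for this $u$.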


The $N$-width again

One explanation for all of these observations is the $N$-width of a set $U$:

$$d_N(U) = \inf_{\phi_1, \ldots, \phi_N} \; \sup_{u \in U} \; \inf_{u_N \in \mathrm{span}\{\phi_1, \ldots, \phi_N\}} \| u - u_N \|$$

NB: U need not be a linear space, and ϕj are general.

$$U = \{\, u(\cdot, z) \mid z \in D \,\}$$

It turns out that several manifolds $U$ of interest have provably exponentially decaying $N$-widths.

In principle, then, $N$ pieces of information can be exponentially accurate, regardless of parametric dimension.


Greedy schemes

Greedy schemes form approximations sequentially:

$$\phi_{n+1} = \operatorname*{argmax}_{u \in U} \; \inf_{u_n \in \mathrm{span}\{\phi_1, \ldots, \phi_n\}} \| u - u_n \|$$

reduced order/basis models

empirical interpolation

(greedy) optimal experimental design

skeleton factorizations
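The greedy selection above can be sketched over a finite training set standing in for $U$: at each step, pick the worst-approximated snapshot and orthonormalize it into the basis. The kernel family below is purely illustrative (in a weak greedy scheme, practical implementations replace the exact argmax with a cheaper surrogate):

```python
import numpy as np

# Training snapshots standing in for the solution manifold U (illustrative family).
x = np.linspace(0.0, 1.0, 200)
snapshots = [1.0 / (1.0 + z * x) for z in np.linspace(0.0, 1.0, 60)]

def residual(u, basis):
    """u minus its projection onto span(basis); basis vectors are orthonormal."""
    r = u.copy()
    for q in basis:
        r -= (q @ r) * q
    return r

basis, worst_errors = [], []
for n in range(6):
    errs = [np.linalg.norm(residual(u, basis)) for u in snapshots]
    worst = int(np.argmax(errs))            # greedy step: worst-approximated snapshot
    worst_errors.append(errs[worst])
    r = residual(snapshots[worst], basis)
    basis.append(r / np.linalg.norm(r))     # Gram-Schmidt orthonormalization
print(worst_errors)
```

The worst-case error over the training set drops rapidly with each greedy step, consistent with the small $N$-widths discussed above.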


Weak greedy schemes

Weak greedy schemes only optimize approximately: $\phi_{n+1}$ is chosen so that

$$\inf_{u_n \in \mathrm{span}\{\phi_1, \ldots, \phi_n\}} \| \phi_{n+1} - u_n \| \ge \delta \, \max_{u \in U} \; \inf_{u_n \in \mathrm{span}\{\phi_1, \ldots, \phi_n\}} \| u - u_n \|,$$

where $0 < \delta < 1$.

Computational implementations are all weakly greedy.

δ allows us to make “mistakes”.


Close enough to the $N$-width

Weak greedy schemes get very close to the true $N$-width:

If $\phi_1, \ldots, \phi_N$ are chosen with a weakly greedy scheme, then

$$\sup_{u \in U} \; \inf_{u_N \in \mathrm{span}\{\phi_1, \ldots, \phi_N\}} \| u - u_N \| \lesssim \sqrt{d_N(U)}$$

Questions for the ages:

Can we show that a general reduced-order algorithm is weakly greedy?

Can we show in general that N -widths are small?

How much of a mistake (δ) can we make?

Even the inner “inf” can be approximate. How approximate?

These weak greedy schemes are the modern, practical high-order algorithms.


References

“Fast Numerical Methods for Stochastic Computations: A Review”, DXiu, Communications in Computational Physics, v5 (2009)

“Sparse grids”, H-J Bungartz, M Griebel, Acta Numerica, v13 (2004)

“When Are Quasi-Monte Carlo Algorithms Efficient for HighDimensional Integrals?”, I.H. Sloan, H. Wozniakowski, Journal ofComplexity, v14 (1998)

N -widths in approximation theory, A. Pinkus, Springer 1985

“Nonlinear approximation”, R.A. DeVore, Acta Numerica, v7 (1998)
