
Statistical guarantees for Bayesian uncertaintyquantification in inverse problems

Richard Nickl

Statistical LaboratoryUniversity of Cambridge (UK)

Isaac Newton Institute, 9th April 2018

Richard Nickl (U. of Cambridge) Bayesian Inverse Problems INI 2018 1 / 23

Bayesian statistics and uncertainty quantification

Consider observing data Y drawn at random from some unknown probability distribution Pθ0 belonging to a statistical model P = {Pθ : θ ∈ Θ} indexed by θ in some parameter space Θ.

The Bayesian statistician specifies a prior distribution Π on Θ, assumes

Y |θ ∼ Pθ,

and bases inferences on the ‘updated’ posterior distribution

θ|Y ∼ Π(·|Y ),

typically derived from Bayes theorem.

We can use posterior characteristics (such as the mean or mode) as an algorithmic output θ̂(Y) to estimate θ. Bayesian credible sets

C(α, Y) = {θ : |θ − θ̂(Y)| ≤ R(α, Y)},

with R(α, Y) the 1 − α posterior quantile, quantify the uncertainty in θ.
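As a toy numerical illustration (not part of the talk), a credible set of this form can be read off Monte Carlo draws from the posterior: centre at the posterior mean and take R(α, Y) to be the empirical 1 − α quantile of the centred draws.

```python
import numpy as np

def credible_set(samples, alpha=0.05):
    """C(alpha, Y) = {theta : |theta - theta_hat(Y)| <= R(alpha, Y)} with
    theta_hat the posterior mean and R the empirical (1 - alpha) posterior
    quantile of |theta - theta_hat|, computed from posterior draws."""
    theta_hat = samples.mean()
    radius = np.quantile(np.abs(samples - theta_hat), 1 - alpha)
    return theta_hat, radius

# Toy posterior: N(2, 0.1^2), approximated by Monte Carlo draws.
rng = np.random.default_rng(0)
samples = rng.normal(2.0, 0.1, size=100_000)
center, radius = credible_set(samples, alpha=0.05)
# For a Gaussian posterior, radius is close to 1.96 * (posterior sd).
```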



Credible sets for uncertainty quantification


The Bernstein-von Mises (BvM) theorem

First discovered by Laplace (1812), expanded by Bernstein and von Mises in the early 20th century, and proved in its general form by Le Cam (1986), the BvM theorem states, for large sample size n:

Π(·|Y) ≈ N(θ_MLE, (nI(θ0))^{-1})  for Y ∼ Pθ0, θ0 ∈ Θ ⊂ R^p, p ∈ N,

whenever the prior Π has a positive density on Θ and the Fisher information I(θ0) is invertible. The approximation holds in total variation distance on probability measures.

The posterior mean E^Π(θ|Y) can typically replace the maximum likelihood estimator θ_MLE in the above approximation.


Consequences of BvM theorems for UQ

Computing posterior probabilities is then approximately the same as computing them under a N(θ_MLE, (nI(θ0))^{-1}) distribution, and so for p fixed and n → ∞:

Cn s.t. Π(Cn|Y) = 1 − α  (Bayesian credible set)

⇒ Pθ0(θ0 ∈ Cn) → 1 − α,  (frequentist confidence set)

|Cn|_{R^p} = O_{Pθ0}(1/√n)  (optimal diameter)
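A hedged numerical check of this coverage statement in the simplest parametric example (Bernoulli data with a uniform Beta(1,1) prior; the model choice is ours, not the talk's): over repeated draws of the data, θ0 should fall in the 1 − α credible interval about 95% of the time, with interval width of order 1/√(nI(θ0)).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, theta0, alpha, reps = 500, 0.3, 0.05, 400

covered, width = 0, 0.0
for _ in range(reps):
    y = rng.binomial(1, theta0, size=n)
    # Posterior under a Beta(1,1) prior is Beta(1 + S, 1 + n - S), S = sum(y).
    post = stats.beta(1 + y.sum(), 1 + n - y.sum())
    lo, hi = post.ppf(alpha / 2), post.ppf(1 - alpha / 2)  # credible set C_n
    covered += (lo <= theta0 <= hi)
    width += hi - lo

coverage = covered / reps   # frequentist coverage, should approach 1 - alpha
mean_width = width / reps   # ~ 2 * 1.96 * sqrt(theta0(1-theta0)/n) = O(1/sqrt(n))
```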


Bernstein-von Mises (BvM) in infinite dimensions?

When the parameter space Θ is high- or infinite-dimensional, the justification of Bayes methods is less clear.

In his 1999 Wald lecture (published in AoS), David Freedman showed that a natural Euclidean L²-credible ball Cn in a standard ('direct') nonparametric regression model with Gaussian priors is NOT a valid frequentist confidence set, and in fact may have frequentist coverage probability

Pθ0(θ0 ∈ Cn) → 0.

With I. Castillo (2013, 2014, AoS) we showed that while BvM results do not hold in L², they can hold in 'negative smoothness' function spaces, such as Sobolev spaces H^γ for γ < −d/2.

Here we present Bernstein-von Mises theorems for some infinite-dimensional PDE-constrained inverse problems.



Inverse problems

Notation: For H1, H2 Hilbert spaces consider an injective map

u = u_f = u(f),  u : H1 → H2.

The goal is to recover f from u_f, or from some discrete linear measurement functionals (L_i(u_f))_{i=1}^n, such as point evaluations u_f(x_i) at some grid (x_i)_{i=1}^n on an underlying domain.

Real observations usually exhibit statistical behaviour, e.g., we observe

Y_i = u_f(x_i) + ε_i,  i = 1, …, n,  ε_i ∼ i.i.d. N(0, 1),

or equivalently, for W a Gaussian white noise in H2, the functional equation

Y = u_f + εW,  ε > 0 (≈ 1/√n) a noise level.

If H2 is infinite-dimensional, W is not defined as a proper Borel random variable in H2. "The data are rougher than u_f."
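A small simulation of this measurement model (with a stand-in forward solution u_f of our own choosing): the raw data carry O(1) noise, while local averaging over k neighbouring design points recovers u_f up to O(1/√k), the sense in which "the data are rougher than u_f".

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x = np.linspace(0.0, 1.0, n)            # measurement grid (x_i)
u_f = np.sin(2 * np.pi * x)             # stand-in forward solution u_f (our choice)
Y = u_f + rng.normal(size=n)            # Y_i = u_f(x_i) + eps_i, eps_i ~ N(0,1)

# Local averaging over a window of k points cuts the noise to O(1/sqrt(k)),
# consistent with the continuum noise level eps ≈ 1/sqrt(n).
k = 50
smoothed = np.convolve(Y, np.ones(k) / k, mode="same")
err_raw = np.abs(Y - u_f).mean()                  # ~ E|N(0,1)| ≈ 0.80
err_smooth = np.abs(smoothed - u_f)[k:-k].mean()  # much smaller
```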



Bayes solutions of statistical inverse problems

A principled approach to statistical inverse problems is the Bayesian one.

Model the function f by some prior distribution Π on function space and use Bayes' rule to compute the conditional posterior distribution

f ∼ Π,  Y|f ∼ P_{u_f}  ⇒  f|Y ∼ dP_{u_f}(Y) dΠ(f) / ∫ dP_{u_f}(Y) dΠ(f),

where dP_f ≡ dP_{u_f} is the law of a Gaussian white noise shifted by u_f.

The prior provides a regularisation step, common in inverse problems.

But the posterior distribution also provides ‘uncertainty quantification’.
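In the linear-conjugate special case the posterior is available in closed form, which makes the regularisation role of the prior explicit. A sketch in a diagonal (singular-basis) model of our own choosing, with forward coefficients a_j = 1/j and Gaussian prior f_j ∼ N(0, j^{-2}):

```python
import numpy as np

rng = np.random.default_rng(3)
p, eps = 100, 0.01
j = np.arange(1, p + 1)
a = 1.0 / j                     # diagonal forward map: (u_f)_j = a_j f_j (mildly ill-posed)
sigma2 = j ** -2.0              # Gaussian prior variances, f_j ~ N(0, sigma2_j)
f_true = j ** -2.0              # a smooth 'truth' with decaying coefficients
Y = a * f_true + eps * rng.normal(size=p)

# Conjugacy: the posterior of f_j given Y_j is Gaussian with
#   mean_j = sigma2_j a_j Y_j / (sigma2_j a_j^2 + eps^2),
#   var_j  = sigma2_j eps^2   / (sigma2_j a_j^2 + eps^2).
mean_post = sigma2 * a * Y / (sigma2 * a**2 + eps**2)
var_post = sigma2 * eps**2 / (sigma2 * a**2 + eps**2)

err_bayes = np.linalg.norm(mean_post - f_true)
err_naive = np.linalg.norm(Y / a - f_true)   # unregularised inversion amplifies the noise
```

The prior shrinks the unstable high-frequency coordinates, so the posterior mean is a regularised estimator, while the naive inversion Y_j / a_j blows up the noise.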



Example I: X-ray transforms and transport PDEs

For M a 'simple' manifold in R^d and unknown f : M → R, consider evaluating integrals along geodesics γ_{(x,v)}:

u_f(x, v) = ∫_0^{τ(x,v)} f(γ_{(x,v)}(t)) dt,  x ∈ ∂M,

where τ(x, v) is the exit time of the geodesic from x into M in direction v.

In integral geometry the map u(f) = u_f is known as the X-ray transform of f. When M equals the 'flat' disk, u_f is the classical Radon transform of f.

The goal is to reconstruct f from noisy measurements Y = u_f + εW of u_f, where W is white noise in 'geodesic space' L²(∂+SM).
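In the flat-disk (Radon) case the geodesics are straight chords, so u_f can be evaluated by ordinary quadrature. A minimal sketch (our own discretisation, not from the talk):

```python
import numpy as np

def xray_flat_disk(f, s, phi, n_steps=4001):
    """Radon transform of f on the unit disk: the integral of f along the
    chord at signed distance s from the origin, with unit normal
    (cos phi, sin phi), computed by the trapezoidal rule."""
    if abs(s) >= 1.0:
        return 0.0
    half = np.sqrt(1.0 - s**2)               # half the chord length
    t = np.linspace(-half, half, n_steps)    # arclength parameter along the chord
    x = s * np.cos(phi) - t * np.sin(phi)
    y = s * np.sin(phi) + t * np.cos(phi)
    vals = f(x, y)
    dt = t[1] - t[0]
    return dt * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

# For f(x, y) = x^2 + y^2 the chord integral is 2*h*s^2 + 2*h^3/3 with
# h = sqrt(1 - s^2), and it is the same for every direction phi (f is radial).
val = xray_flat_disk(lambda x, y: x**2 + y**2, s=0.5, phi=0.3)
```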



Bernstein-von Mises theorems for X-ray transforms

Draw f from any standard Gaussian process prior Π, and rescale it near the boundary by 1/√d_M, where d_M(x), x ∈ M, is the distance function to ∂M.

Sampling from the Gaussian posterior Π(·|Y), and computation of the MAP estimate f̄, are fairly straightforward here due to linearity.

Theorem (Monard, N, Paternain (2017))

If f ∼ Π(·|Y), then for every ψ ∈ C^∞(M) we have, as ε → 0 in P^Y_{f0}-probability,

L(ε^{-1}〈f − f̄, ψ〉_{L²(M)} | Y) →_weakly N(0, ‖u[u*u]^{-1}(ψ)‖²_{L²(∂+SM)}).

The crucial invertibility result for u*u is in itself novel, and is based on arguments from micro-local analysis of pseudo-differential operators.

Notice that (u*u)(1) ∉ C^∞(M), a complication related to 'boundary effects' necessitating the re-weighting near ∂M.



Asymptotic normality of the MAP estimate

• From convergence of moments in the Bernstein-von Mises theorem we can deduce asymptotic optimality of the MAP estimate f̄ maximising

Q_ε(f) = 2〈Y, u_f〉_{L²(∂+SM)} − ‖u_f‖²_{L²(∂+SM)} − λ‖√d_M f‖²_{H^β},  β > d/2.

Theorem (Monard, N, Paternain (2017))

If f0 ∈ C^α(M), α > 0, we have for every ψ ∈ C^∞(M) and under P^Y_{f0}, as ε → 0,

(1/ε)〈f̄ − f0, ψ〉_{L²(M)} →_d Z ∼ N(0, ‖u[u*u]^{-1}(ψ)‖²_{L²(∂+SM)}).


A confident credible set for the Tikhonov regulariser

• Consider a Bayesian credible interval

C_ε = {x ∈ R : |〈f̄, ψ〉 − x| ≤ R_ε},  ψ ∈ C^∞(M),

where R_ε is chosen, for some given significance level α, such that

Π(C_ε|Y) = 1 − α,  0 < α < 1.

• The previous theorems imply frequentist coverage properties of C_ε, namely

P^Y_{f0}(〈f0, ψ〉 ∈ C_ε) → 1 − α,  and  ε^{-1}R_ε → Φ^{-1}(1 − α) in P^Y_{f0}-probability,

as ε → 0. Here Φ = Pr(|Z| ≤ ·) with Z ∼ N(0, ‖u[u*u]^{-1}(ψ)‖²_{L²(∂+SM)}).


Implementation with Gaussian priors


Example II: Schrödinger equation

We now turn to a non-linear inverse problem arising in PDEs.

For O a bounded regular domain in R^d, let f : O → [0,∞) be an unknown potential, and for ∆ the standard Laplacian consider solutions u = u_f to the time-independent Schrödinger equation

(∆/2)u − fu = 0 on O,  s.t. u = g on ∂O,

where g : ∂O → [0,∞) prescribes Dirichlet boundary conditions.

From the Feynman-Kac formula, for (W_s : s ≥ 0) a Brownian motion started at x with exit time τ_O from O,

u_f(x) = E^x[g(W_{τ_O}) e^{−∫_0^{τ_O} f(W_s) ds}],  x ∈ O.

Here the forward map f ↦ u(f) is non-linear.
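The Feynman-Kac representation suggests a simple Monte Carlo evaluation of the forward map: run Brownian paths from x until they exit O and average g(W_τ) times the exponential weight. A sketch for O the unit disk (Euler scheme; the discretisation choices are ours):

```python
import numpy as np

def feynman_kac(x0, f, g, n_paths=20000, dt=1e-3, rng=None):
    """Monte Carlo sketch of u_f(x0) = E^x0[ g(W_tau) exp(-int_0^tau f(W_s) ds) ]
    for Brownian motion W (generator Delta/2) killed on exiting the unit disk.
    Euler scheme; exit points are projected back onto the boundary. f and g
    act on arrays of points of shape (m, 2)."""
    rng = rng or np.random.default_rng(0)
    w = np.tile(np.asarray(x0, dtype=float), (n_paths, 1))
    integral = np.zeros(n_paths)          # accumulates int_0^tau f(W_s) ds per path
    exit_pt = np.zeros_like(w)
    active = np.ones(n_paths, dtype=bool)
    while active.any():
        integral[active] += f(w[active]) * dt
        w[active] += np.sqrt(dt) * rng.normal(size=(int(active.sum()), 2))
        out = active & ((w**2).sum(axis=1) >= 1.0)
        exit_pt[out] = w[out] / np.linalg.norm(w[out], axis=1, keepdims=True)
        active &= ~out
    return float(np.mean(g(exit_pt) * np.exp(-integral)))

# Sanity check: with f = 0 this returns the harmonic extension of g, and
# g(x) = x_1 is itself harmonic, so u_f((0.3, 0)) should be close to 0.3.
u0 = feynman_kac((0.3, 0.0), f=lambda w: np.zeros(len(w)), g=lambda w: w[:, 0])
```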



Inverse Schrodinger equation as a scattering problem

• Physically, f models an 'attenuation' or 'cooling' of the steady-state temperature distribution in the classical heat equation, with initial boundary temperatures prescribed by g. The local amount of 'cooling' is described by f. For applications see Bal and Uhlmann (2010) and Bao and Li (2005).


A series prior & a posterior contraction theorem

We are observing Y = u_f + εW where u = u_f is the solution of

(∆/2)u − fu = 0 on O,  s.t. u = g on ∂O,

with smooth g > 0. The inverse problem is to infer the potential f > 0 from the data as the noise level ε → 0.

We model f by a prior Π induced by the random function

log f = Σ_{l≤J,r} b_{l,r} Φ^O_{l,r},  b_{l,r} ∼ i.i.d. U(−B2^{−l(s+d/2)}, B2^{−l(s+d/2)}),

where the (Φ^O_{l,r}) form a (boundary-corrected) wavelet basis of L²(O).

We regard s, B as given and choose J such that 2^J ≈ ε^{−2/(2s+4+d)}.

Theorem (N 2017)

Suppose f0 > 0 satisfies ‖log f0‖_{C^s_c(O)} ≤ B for some s > 2 + d/2. If Π(·|Y) is the posterior distribution arising from the above prior Π, then, as ε → 0, we have

Π(f : ‖f − f0‖_{L²} ≥ Mε^{2s/(2s+4+d)} log^γ(1/ε) | Y) → 0  in P^Y_{f0}-probability.



Formulation of the Bernstein-von Mises theorem

For a fixed function ψ we can now ask whether a Bernstein-von Mises theorem holds true:

Let f ∼ Π(·|Y) and f̄ the posterior mean. As ε → 0, do we have

ε^{-1} ∫_O (f − f̄)ψ | Y →_d N(0, I^{-1}_{f0}(ψ))

in P^Y_{f0}-probability? If so, what is the 'correct' asymptotic variance I^{-1}_{f0}?

In fact we want more: we wish to prove simultaneous convergence of the stochastic processes

(ε^{-1} ∫_O (f − f̄)ψ : ψ ∈ Ψ) | Y →_d (X(ψ) : ψ ∈ Ψ),

where Ψ is a maximal class of test functions and X(ψ) ∼ N(0, I^{-1}_{f0}(ψ)).



A canonical limiting Gaussian distribution

• Define a Gaussian process

X = (X(ψ)),  EX(ψ)X(ψ′) = 〈S_{f0}(ψ/u_{f0}), S_{f0}(ψ′/u_{f0})〉_{L²(O)},  ψ, ψ′ ∈ C^α_c(O),

as the image of a standard Gaussian white noise W under the Schrödinger-type operator ψ ↦ S_{f0}(ψ/u_{f0}),  S_{f0}(h) = (∆/2)h − f0 h.

• If a sequence of stochastic processes (X_n(ψ)) is to converge uniformly in ψ ∈ Ψ towards X, then the law N_{f0} of X needs to be tight for the supremum norm on Ψ.

• If Ψ = Ψ_α consists of the unit ball of an α-Hölder space C^α_c(O), α > 0, then the maximal spaces where this is possible are characterised in the following

Theorem (N 2017)

The Gaussian measure N_{f0} induces a tight Gaussian probability measure on the topological dual space (C^α_c(O))* when α > 2 + d/2, but not when α ≤ 2 + d/2.


The Bernstein-von Mises theorem

Theorem (N 2017)

Let α > 2 + d/2. Assume ‖log f0‖_{C^s_c(O)} < B for s > max(2 + d/2, d). Let f ∼ Π(·|Y) with posterior mean f̄, and denote by β any metric for weak convergence of probability distributions on (C^α_c(O))*. Then as ε → 0,

β(L(ε^{-1}(f − f̄)|Y), N_{f0}) → 0  in P^Y_{f0}-probability.

Thus for α > 2 + d/2 a full infinite-dimensional Gaussian approximation of the posterior is available in the dual space (C^α_c)*.

From the direct problem in Castillo and Nickl (2013), we may expect to be able to 'denoise the white noise' in

(C^{d/2+γ}_c)*,  γ > 0,

but the structure of the limiting Gaussian measure N_{f0} implies that we pay a price for the ill-posedness, and that the BvM theorem holds only in

(C^{d/2+2+γ}_c)*,  γ > 0.



Optimal credible sets I

• For a fixed 'credibility level' 1 − β, β > 0, and test function ψ, let

C_ε = {x ∈ R : |x − 〈f̄, ψ〉_{L²}| ≤ R_ε}

with posterior quantile constants R_ε chosen such that Π(C_ε|Y) = 1 − β.

Corollary

Let ψ ∈ C^α_c(O) with α > 2 + d/2. Then as ε → 0 we have

P^Y_{f0}(〈f0, ψ〉_{L²} ∈ C_ε) → 1 − β,

and the diameter R_ε of C_ε satisfies

ε^{-1}R_ε → Φ^{-1}(1 − β)  in P^Y_{f0}-probability,

where Φ^{-1} is the inverse of the map t ↦ N(0, ‖S_{f0}(ψ/u_{f0})‖²_{L²(O)})([−t, t]).


Optimal credible sets II

• Intersecting all the previous credible sets in ψ ∈ Ψ and using the simultaneous-in-ψ Gaussian approximation, we construct a global confidence set.

• Formally, let C_ε ⊂ supp(Π(·|Y)) be the smallest (C^α_c(O))*-ball centred at the posterior mean f̄ for which Π(C_ε|Y) = 1 − β, β > 0.

Theorem (N 2017)

The above credible set satisfies

P^Y_{f0}(f0 ∈ C_ε) → 1 − β

as ε → 0. Its diameter in L¹(K)-norm, for K any compact subset of O, is of (near) minimax-optimal order: for any κ > 0,

|C_ε|_{L¹(K)} = O_{P^Y_{f0}}(ε^{2s/(2s+4+d)} ε^{−κ}).

• The proof uses the BvM theorem in (C^α(O))*, α = 2 + d/2 + ε, combined with ideas from interpolation theory.


Outlook & Extensions: Statistics and PDEs

• Our techniques give a template for obtaining results of a similar nature in other, possibly non-linear, inverse problems.

• Crucial invertibility results for the 'information operator' require potentially novel PDE techniques.

• Our results do NOT provide universal guarantees for blind Bayesian methods: the results depend heavily on the inverse problem and the kind of regularisation used. Many open questions remain, particularly for non-linear inverse problems.

• In particular, the validity of Gaussian process priors for non-linear problems in PDEs seems a delicate question; some positive results are work in progress.

Richard Nickl (U. of Cambridge) Bayesian Inverse Problems INI 2018 22 / 23

References

R. Nickl, Bernstein-von Mises theorems for non-linear inverse problems I: Schrödinger equation, arXiv, 2017.

F. Monard, R. Nickl, G. Paternain, Efficient nonparametric Bayesian inference for X-ray transforms, arXiv, 2017; Annals of Statistics, to appear.

