Data assimilation – Schrödinger's perspective · Data assimilation – Schrödinger’s perspective Sebastian Reich () Universität Potsdam/ University of Reading IMS NUS, August

Data assimilation – Schrödinger’s perspective

Sebastian Reich (www.sfb1294.de)

Universität Potsdam/ University of Reading

IMS NUS, August 30, 2018

Universität Potsdam/ University of Reading 1

Core components of DA

Interacting Particle Systems

Coupling of Measures

Mean Field Equations


Entropic couplings


Ensemble prediction I

Ensemble prediction system with M members:d

dtZit

= f (Zit) , Zi0 ∼ π0 , i = 1, . . . ,M .

Source: ECMWF


Ensemble prediction II

Source: The quiet revolution of numerical weather prediction, Nature, 2015


Ensemble–based data assimilation

Continuous–in–time assimilation of precipitation data yt:

d

dtZit

= f (Zit) + α1Qt(Z

it− Zt) + α2Kt(yt − h(Zi

t))

Additional terms:

É Inflation: α1 > 0, Qt ∈ RNz×Nz spd,

Zt =1

M

∑

i

Zit

É Nudging: α2 > 0, gain matrix Kt ∈ RNz×Ny , forward operator h.


SDEs & inflation I

Forward SDEdZ+

t = ft(Z+t )dt + γ1/2dW+

t ,

X+0 ∼ π0, t ∈ [0,T], W+

t standard Brownian motion forward in time.

Generates probability measure P[0,T] over C([0,T],RNz ) with marginaldensities πt, i.e. Zt ∼ πt.

The same measure is generated by backward SDE

dZ−t

= bt(Z−t

)dt + γ1/2dW−t,

W−t Brownian motion backward in time, X−T ∼ πT .

It holds thatbt(z) = ft(z)− γ∇z logπt(z) .


SDEs & inflation II

Fokker-Planck equation for marginals:

∂tπt = −∇z · (πtft) +γ

2∆zπt = −∇z · (πtbt) −

γ

2∆zπt

= −∇z · (πtut)

with

ut(z) =1

2(ft(z) + bt(z)) = ft(z)−

γ

2∇z logπt(z) .

Replace forward SDE by mean field equation

d

dtZt = ft(Zt)−

γ

2∇z logπt(Zt) , Z0 ∼ π0 .

Remark. Generates path measure Q[0,T] which is different from SDEmeasure P[0,T]; only marginals πt agree!


SDEs & inflation III

Lagrangian interacting particles (Gaussian approximation to πt):

d

dtZit

= ft(Zit) +

γ

2(Pt)

−1(Zit− Zt) ,

Zi0 ∼ π0, i = 1, . . . ,M, empirical covariance matrix

Pt =1

M− 1

∑

i

(Zit− Zt)(Zi

t− Zt)T .

Connection to inflation:

Qt = (Pt)−1, α1 = γ/2 .


References

É Daley, R., Atmospheric Data Analysis, Cambridge University Press,1991.

É Nelson, E., Dynamical Theories of Brownian Motion, PrincetonUniversity Press, 1967.

É del Moral, P., Mean field simulation for Monte Carlo integration, CRCPress, 2013.

É SR, Cotter, J., Probabilistic Forecasting and Bayesian DataAssimilation, Cambridge University Press, 2015.

É Law, K. et al., Data Assimilation – A Mathematical Introduction,Springer, 2015.

É Qiang Liu, Stein Variational Gradient Descent as Gradient Flow,arXiv:1704.07520v2, 2017.


Recap: Assimilation of precipitation data

Continuous–in–time assimilation of precipitation data yt:

d

dtZit

= f (Zit) + α1Qt(Z

it− Zt) + α2Kt(yt − h(Zi

t))

É Inflation: α1 > 0, Qt ∈ RNz×Nz spd,

Zt =1

M

∑

i

Zit

É Nudging: α2 > 0, gain matrix Kt ∈ RNz×Ny , forward operator h.


SDEs & nudging I

Given a likelihood function

L(z[0,T]) := exp

�

−∫ T

0Vt(zt)dt

�

.

For example

Vt(z) =β

2‖h(z)− yt‖2 .

Bayes theorem (Radon–Nikodym):

dbP[0,T]

dP[0,T](z[0,T]) :=

L(z[0,T])

P[0,T][L].

The measure bP[0,T] solves the filtering/smoothing problem of SDEinference.


SDEs & nudging II

Mean–field formulation:

dbZt =�

ft(bZt) + Pt∇zψt(bZt)

dt +p

γdWt

with the potential ψt satisfying the elliptic PDE

∇z · (bπtPt∇zψt) = bπt(Vt − Vt)

bZ0 ∼ bπ0 = π0, Vt = bπt[Vt], Pt = cov (bZt).


SDEs & nudging III

If bπt Gaussian and h(z) = Hz in Vt, then

Pt∇zψt(z) = βPtHT

yt −Hz +HZt

2

!

.

Compare to nudging scheme:

α2Kt(yt − Hz) ,

i.e., Kt = PtHT, β = α2, but innovation different.


Combining nudging and inflation

Ensemble Kalman–Bucy filter (Gaussian approximation to πt):

d

dtZit

= ft(Zit) +

γ

2(Pt)

−1(Zit− Zt) + βKt

yt −h(Zi

t) + ht

2

!

Zi0 ∼ π0, i = 1, . . . ,M, empirical covariance matrices

Pt =1

M− 1

∑

i

(Zit− Zt)(Zi

t− Zt)T ,

Kt =1

M− 1

∑

i

(Zit− Zt)(h(Zi

t)− ht)T .

Remark. Feedback particle filter for likelihood with

Vt(z)dt⇒1

2‖h(z)‖2dt − h(z)Tdyt .


References

É Moser, J., On the volume elements on a manifold, Trans. Amer. Math.Soc., 1965.

É SR, A dynamical systems framework for intermittent dataassimilation, BIT, 2010.

É Daum, F., Huang, J., Particle filter for nonlinear filters, ICASSP, IEEE,2011.

É Bergemann, K, SR, An ensemble Kalman–Bucy filter forcontinuous–time data assimilation, Meteorolog. Zeitschrift, 2012.

É Yang, T. et al., Feedback particle filter, IEEE Trans. Autom. Control,2013.

É Taghvaei, A. et al., Kalman filter and its modern extensions for thecontinuous–time nonlinear filtering problem, J. Dyn. Sys., Meas., andControl, 2018.

É de Wiljes, et al., Long–time stability and accuracy of the ensembleKalman–Bucy filter for fully observed processes and smallmeasurement noise, SIAM J. Appl. Dyn. Sys., 2018.


Discrete–time observations I

StateState StateModel Model

Data

Assimilation

Data

Assimilation

Data

Assimilation

time


Discrete–time observations II

Discrete–time observations:

ytn = h(Ztn) +R1/2 Ξtn , n = 1, . . . ,N .

Likelihood function:

L(z[0,T]) := exp

�

−1

2

∑

n

(ytn − h(ztn))TR−1(ytn − h(ztn))

�

.

Bayes:dbP[0,T]

dP[0,T](Z+

[0,T]) :=L(Z+

[0,T])

P[0,T][L].


Discrete–time observations III

For simplicity: single observation, i.e.

N = 1, R = I, t1 = T, L(z) =1

2‖yT − h(z)‖2 .

But keep recursive nature of sequential DA in mind!

Four main players:

É last analysis: π0

É forecast based on last analysis: πTÉ new analysis at time t = T (Bayes, filtering distribution): bπTÉ smoothing distribution at t = 0: bπ0


Discrete–time observations IV

⇡0 ⇡1Prediction

Filtering

b⇡1b⇡0

Smoothing

Optimal Control

Data

Timet = 0 t = 1

Schrödinger

y1


Example: Gaussian mixture I

π1 is a Gaussian mixture, Gaussian likelihood, bπ1 weighted Gaussianmixture.

prior

-1 -0.5 0 0.5 1

space

0

0.02

0.04

0.06

0.08

0.1

-4 -2 0 2 4

space

0

0.1

0.2

0.3

0.4

0.5prediction

smoothing

-1 -0.5 0 0.5 1

space

0

0.05

0.1

0.15

0.2

-4 -2 0 2 4

space

0

0.5

1

1.5filtering


Example: Gaussian mixture II

-2 -1.5 -1 -0.5 0 0.5 1 1.5

space

0

0.5

1

1.5

2

2.5optimal control proposals

-2 -1.5 -1 -0.5 0 0.5 1 1.5

space

0

0.5

1

1.5

2

2.5Schroedinger proposal

Applied to smoothing PDF bπ0. Applied to π0 directly!


Example: Brownian dynamics

Scalar Brownian dynamics under a double well potential (γ = 0.5):

-2 -1.5 -1 -0.5 0 0.5 1 1.5 2

variable z

0

2

4

6

8

10

po

ten

tia

l

initial mean

observation

-5 0 5

0

0.5

1

1.5

210

4 last analysis

-5 0 5

0

1

2

3

410

4 forecast

-5 0 5

0

1

2

3

410

4 smoother

-5 0 5

0

2

4

610

4new analysis

The forecast and the new analysis at T = 0.5 are nearly singular withrespect to each other.

The relation between the last analysis (π0) and the smoother (bπ0) issomewhat better.


Discrete–time observations III

Forward-backward smoother iteration:

É Forward:dZ+

t = f (Z+t )dt +

p

γW+t ,

Z+0 ∼ π0. Yields πt.

É Backward:

dbZ−t

= f (bZ−t

)dt − γ∇z logπt(bZ−t )dt +p

γW−t,

with bZ−T ∼ bπT andbπT(z) ∝ L(z)πT(z) .

Yields bπt.

Smoother:

dbZ+t = f (bZ+

t )dt + γ∇z logbπt

πt(bZ+

t )dt +p

γW+t

bZ+0 ∼ bπ0, bZ+

T ∼ bπT .


Example: Brownian dynamics II

Scalar Brownian dynamics under a double well potential (γ = 0.5):

-2 -1.5 -1 -0.5 0 0.5 1 1.5 2

variable z

0

2

4

6

8

10

po

ten

tia

l

initial mean

observation

-5 0 5

0

0.5

1

1.5

210

4 last analysis

-5 0 5

0

1

2

3

410

4 forecast

-5 0 5

0

1

2

3

410

4 smoother

-5 0 5

0

2

4

610

4new analysis

The forward smoother SDE links the smoother measure bπ0 with bπT .

Still requires transforming π0 into bπ0 (but now at t = 0).


Schrödinger problem of DA I

A different perspective on sequential DA:

Schrödinger problem. Find the measureeP[0,T] which minimises the Kullback-Leiblerdivergence

eP[0,T] = arg infQ�P

KL(Q[0,T]||P[0,T])

subject to the constraints

eπ0 = q0 = π0, eπT = qT = bπT .

-5 0 5

0

0.5

1

1.5

210

4 last analysis

-5 0 5

0

1

2

3

410

4 forecast

-5 0 5

0

1

2

3

410

4 smoother

-5 0 5

0

2

4

610

4new analysis

The measure eP[0,T] is generated by a controlled SDE

deZ+t = f (eZ+

t )dt + ut(eZ+t )dt +

p

γdW+t .


Schrödinger problem of DA II

Find an initial distribution ϕ+0 and its evolution ϕ+

t under the forward SDE

dZ+t = f (Z+

t ) dt +p

γdW+t

such that the associated backward SDE

dZ−t

=�

f (Z−t

)− γ∇z logϕ+t (Z−

t)�

dt +p

γdW−t

with final condition ϕ−T := bπT leads to marginals ϕ−t such that

ϕ−0 = π0 .

Then the control

ut = γ∇z logϕ−t

ϕ+t

solves the Schrödinger problem.


Example I: Lorenz–63

Starting point:

(i) Euler-Maruyama discretization

Zn+1 = Zn + f (Zn)∆t + (γ∆t)1/2Ξn

giving rise to transition kernel

q+(zn+1|zn) .

(ii) Given M samples zjn from Zn and M samples zi

n+1 from Zn+1 definethe M× M matrix Q by

qij = q+(zin+1|z

jn

) .

(iii) Observations at time tn+1 lead to importance weights win+1.


Example II: Lorenz–63

Schrödinger problem:

P∗ = argminP∈Π

KL(P||Q) , KL(P||Q) :=M∑

i,j=1

pij logpij

qij,

subject to

Π =

(

P ∈ RM×M : pij ≥ 0,M∑

i=1

pij = 1/M,M∑

j=1

pij = wi

)

Application to DA

a) Schrödinger resample: Use P∗ to sample from bπn+1

P[bZjn+1] = zi

n+1] = Mp∗ij.

b) Schrödinger transform: Set

bzjn+1 = M

M∑

i=1

zin+1p

∗ij

+ (γ∆t)Ξj .


Example III: Lorenz–63

0.05 0.1 0.15 0.2

1/M

10-3

10-2

RM

SE

RMS errors as a function M

particle filter

Schrodinger transform

Schrodinger resample

EnKBF

Stochastic Lorenz–63, observations at every time–step, observation errorscaled with inverse time–step.


References

É Schrödinger, E., Über die Umkehrung der Naturgesetze,Sitzungsberichte der Preuss. Phys. Math. Klasse, 1932.

É Yongxin Chen et al., On the relation between optimal transport andSchrödinger bridges: A stochastic control viewpoint, J. Optim. Theoryand Appl., 2016.

É Fearnhead, P., Künsch, H.R., Particle filters and data assimilation,Annual Review of Statistics and Its Application, 2018.

É Guarniero et al., The iterated auxiliary particle filter, J. AmericanStatist. Assoc., 2017.

É van Leeuwen, P.-J., Nonlinear data assimilation, Frontiers in AppliedDynamical Systems, Vol. 2, 2015.

É Peyre, G., Cuturi, M., Computational Optimal Transport,arXiv:1803.00567, 2018.

É SR, Data assimilation, Acta Numerica, 2019, preliminary version onarXiv 1807.08351.


OT meets IS I

Importance sampling (IS): Available realisations ZiT∼ πT with

importance weights

wi ∝bπT

πT(Zi

T) .

Optimal transport (OT): Instead of resampling, findcoupling/transformation

bZT = ∇zψ(ZT) ,

ZT ∼ πT and bZ ∼ bπT .

More abstractly,

bZT(a) =

∫

ZT(a′)δ(a′ − ∇aψ(a))da′ ,

where A is some random reference variable. For example, A = ZT .


OT meets IS II

Replace the integral by a sum and formally write

bZjT =

M∑

i=1

ZiTdij

RequiresM∑

i=1

dij = 1,1

M

∑

j

dij = wi .

Select an "optimal" transformation through maximising correlation

V(D) =1

M

∑

ij

dijZiT· ZjT =

1

M

∑

j

bZjT · Z

jT .

In addition, either dij ≥ 0 (Ensemble Transform Particle Filter) or

1

M− 1

M∑

i=1

(bZiT− bz̄T)(bZi

T− bz̄T)T =

M∑

i=1

wi(ZiT− bz̄T)(Zi

T− bz̄T)T

(Nonlinear Ensemble Transform Filter).


Numerical example I

Lorenz-63 model, first component observed infrequently (∆t = 0.12) andwith large measurement noise (R = 8):

Figure: RMSEs for various second-order accurate LETFs compared to the ETPF, theESRF, and the SIR PF as a function of the sample size, M.


Numerical example II

Hybrid filter: P := PESRF(α) PETPF(1− α).

Figure: RMSEs for hybrid ESRF (α = 0) and 2nd-order corrected NETF/ETPF (α = 1)as a function of the sample size, M.


References

É Evensen, G., Data assimilation – the ensemble Kalman filter,Springer, 2009.

É SR, A nonparametric ensemble transform method for Bayesianinference, SIAM J. Sci. Comput., 2013.

É SR, Cotter, J., Probabilistic Forecasting and Bayesian DataAssimilation, Cambridge University Press, 2015.

É Tödter J., Ahrens, B., A second–order exact ensemble square rootfilter for nonlinear data assimilation, Month. Weather Rev., 2015.

É Chustagulprom, N. et al., A hybrid ensemble transform particle filterfor nonlinear and spatially extended systems, SIAM/ASA J. UQ, 2016.

É Acevedo, W., et al., Second–order accurate ensemble transformparticle filters, SIAM J. Sci. Comput., 2017.

É Fearnhead, P., Künsch, H.R., Particle filters and data assimilation,Annual Review of Statistics and Its Application, 2018.


Summary

É Continuous–in–time DA naturally leads to various interacting particlesystems.

É Schrödinger problem provides an "optimal" mathematical frameworkfor sequential DA with discrete–in–time observations.

É Numerical implementation nontrivial; good drift corrections can bederived using Gaussian approximations or kernel methods.

É Coupling arguments are central to derivation of interacting particlesystems.

É Relevant to rare event simulations, optimal control problems andderivative–free optimization.


The End

(source: J.B. Handelsman, New Yorker, 05/31/1969)


Collaborators

É Walter Acevedo

É Kay Bergemann

É Yuan Cheng

É Nawinda Chustagulprom

É Colin Cotter

É Jana de Wiljes

É Prashant Mehta

É Wilhelm Stannat

É Amari Taghvaei


Documents

Data assimilation – Schrödinger's perspective · Data assimilation – Schrödinger’s perspective Sebastian Reich () Universität Potsdam/ University of Reading IMS NUS, August